WO2007051425A1 - Multimedia communication method and terminal thereof - Google Patents

Multimedia communication method and terminal thereof

Info

Publication number
WO2007051425A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
fault
transmission
real
Prior art date
Application number
PCT/CN2006/002961
Other languages
English (en)
Chinese (zh)
Inventor
Zhong Luo
Bin Song
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2007051425A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023 Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0025 Transmission of mode-switching indication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0023 Systems modifying transmission characteristics according to link quality, e.g. power backoff characterised by the signalling
    • H04L1/0028 Formatting
    • H04L1/0029 Reduction of the amount of signalling, e.g. retention of useful signalling or differential signalling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0078 Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
    • H04L1/0079 Formats for control data

Definitions

  • Multimedia communication method and terminal thereof
  • The present invention relates to the field of multimedia communication, in particular to multimedia communication technology that supports error tolerance and error resilience, and specifically to a multimedia communication method and a terminal thereof.
  • Background
  • Video communication is gradually becoming the dominant form of communication.
  • 3G (3rd Generation): third-generation mobile communication systems
  • IP: Internet Protocol
  • Two-way or multi-party video communication services, such as video telephony, video conferencing, and mobile terminal multimedia services, impose strict requirements on the transmission of multimedia data streams and on quality of service. Network transmission must offer better real-time performance, and video compression coding must equally be more efficient.
  • The purpose of the H.264 standard is to improve video coding efficiency and the adaptability of coded video to the network.
  • The H.264 video compression coding standard has quickly become the mainstream standard in multimedia communication.
  • A large number of H.264 real-time multimedia communication products, such as conference TV, videophones, 3G mobile communication terminals, and network streaming media products, have been released. With the official promulgation and widespread use of H.264, multimedia communication over IP networks and over 3G and post-3G wireless networks will inevitably enter a new stage of rapid development.
  • The H.264 standard uses a layered model that defines the video coding layer (VCL, Video Coding Layer) and the network abstraction layer (NAL, Network Abstraction Layer).
  • H.264 introduces an encoding mechanism oriented to IP packets, which facilitates packet transmission and supports streaming of video over the network; it has strong error resilience and is especially suitable for wireless video transmission with a high packet loss rate and serious interference.
  • All H.264 data to be transmitted, including image data and other messages, is encapsulated into packets of a uniform format, namely network abstraction layer units (NALU, NAL Unit).
  • Each NALU is a variable-length byte string with a defined syntax, consisting of one byte of header information, which indicates the data type, followed by an integer number of bytes of payload data.
  • A NAL unit can carry a coded slice, a data partition of a given type, or a sequence or picture parameter set. To enhance data reliability, each frame is divided into several slices (Slice); each slice is carried by one NALU and is in turn composed of several smaller macroblocks, which are the smallest processing units.
  • The slices at corresponding positions in preceding and succeeding frames are related to each other, while slices at different positions are independent of each other, so that errors cannot spread from one slice to another.
  • H.264 data includes texture data of non-reference frames, sequence parameters, picture parameters, Supplemental Enhancement Information (SEI), texture data of reference frames, and the like.
  • SEI messages is a general term for messages that assist in the decoding, display, and other aspects of H.264 video.
  • The prior art defines various types of SEI messages and also keeps reserved SEI messages, leaving room for extension by possible future applications.
  • SEI messages are not required to reconstruct luminance and chrominance images during decoding. A decoder conforming to the H.264 standard is not required to process SEI at all.
  • H.264 provides a variety of mechanisms for message extension, including SEI.
  • Supplemental Enhancement Information (SEI) is defined in H.264, and its data representation area is independent of the coded video data. Its usage is given in the description of the NAL in the H.264 protocol.
  • The basic unit of the H.264 bitstream is the NALU.
  • A NALU can carry various H.264 data types, such as video sequence parameters, picture parameters, slice data (i.e., the actual image data), and SEI message data.
  • SEI is used to deliver various messages and supports message extension. The SEI field can therefore be used to transmit messages customized for a specific purpose without affecting the compatibility of H.264-based video communication systems.
  • A NALU carrying SEI messages is called an SEI NALU.
  • An SEI NALU contains one or more SEI messages.
  • Each SEI message contains variables, mainly the payload type (payloadType) and the payload size (payloadSize), which indicate the type and size of the message payload.
  • The syntax and semantics of some commonly used H.264 SEI messages are defined in H.264 Annex D.8, D.9.
  • The payload contained in a NALU is called the Raw Byte Sequence Payload (RBSP), and SEI is one type of RBSP.
  • The data representation area of the SEI is called the SEI field for short.
  • Each SEI field contains one or more SEI messages, each of which consists of SEI header information and an SEI payload.
  • The SEI header information includes two fields: one identifies the type of the payload in the SEI message, and the other indicates the size of the payload. Users can define their own payload types. H.264 decoders that do not support parsing such user-defined information automatically discard the data in the SEI field. Carrying useful custom information in the SEI field therefore does not affect the compatibility of H.264-based video communication systems. As described above, multimedia communication requires not only efficient media compression coding but also real-time network transmission.
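  • The SEI header layout described above can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes the input is an SEI RBSP with emulation-prevention bytes and trailing bits already removed, and it relies on the H.264 convention that payloadType and payloadSize are each coded as a run of 0xFF bytes (255 each) plus one final byte. The function names are hypothetical.

```python
from typing import List, Tuple


def _read_ff_coded(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one 0xFF-run coded value; return (value, next position)."""
    value = 0
    while buf[pos] == 0xFF:  # each 0xFF byte contributes 255
        value += 255
        pos += 1
    value += buf[pos]  # the final byte (less than 0xFF) ends the value
    return value, pos + 1


def parse_sei_messages(rbsp: bytes) -> List[Tuple[int, bytes]]:
    """Split an SEI RBSP into (payloadType, payload) pairs.

    Assumption: emulation-prevention bytes and RBSP trailing bits
    have already been stripped by the caller.
    """
    messages = []
    pos = 0
    while pos < len(rbsp):
        payload_type, pos = _read_ff_coded(rbsp, pos)
        payload_size, pos = _read_ff_coded(rbsp, pos)
        payload = rbsp[pos:pos + payload_size]
        if len(payload) != payload_size:
            raise ValueError("truncated SEI payload")
        messages.append((payload_type, payload))
        pos += payload_size
    return messages
```

  • A decoder that does not recognize a given payloadType can simply skip payloadSize bytes, which is why user-defined messages do not break compatibility.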
  • The Real-time Transport Protocol (RTP) and the Real-time Transport Control Protocol (RTCP) are transport protocols for multimedia data streams over the Internet, published by the Internet Engineering Task Force (IETF).
  • RTP is defined to work in one-to-one or one-to-many transmission, with the goal of providing timing information and stream synchronization.
  • RTP typically runs over the User Datagram Protocol (UDP), but it can also work over other protocols such as the Transmission Control Protocol (TCP) or Asynchronous Transfer Mode (ATM).
  • RTP itself only handles the transport of real-time data; it does not provide reliable delivery, flow control, or congestion control, and relies on RTCP to provide these services.
  • RTCP is responsible for managing transmission quality and for exchanging control information between the current application processes.
  • During an RTP session, each participant periodically transmits RTCP packets, which contain statistics such as the number of packets sent and the number of packets lost. The server can use this information to change the transmission rate dynamically, or even to change the payload type.
  • RTP and RTCP work together to optimize transmission efficiency with effective feedback and minimal overhead, which makes them suitable for real-time data transmission over the network.
  • H.264 multimedia data is transmitted over IP networks on top of UDP and the RTP protocol above it.
  • RTP itself is structurally applicable to different media data types, but for each higher-layer protocol or media compression coding standard used in multimedia communication (e.g., H.261, H.263, MPEG-1/-2/-4, MP3), the IETF develops a specification of the RTP payload format for that protocol, detailing an RTP encapsulation method optimized for that specific protocol.
  • The corresponding IETF standard for H.264 is RFC 3984: RTP Payload Format for H.264 Video.
  • This standard is currently the main standard for H.264 video stream transmission over IP networks and is widely used. In the field of video communication, the products of the major manufacturers are based on RFC 3984, and it is currently the only H.264/RTP transmission method.
  • H.264 defines a new layer, the Network Abstraction Layer (NAL), which opens up the capabilities of the underlying network through a standard interface, shields the differences between underlying networks, and abstracts them into a service capability layer.
  • To increase the separation between its video coding layer (VCL, Video Coding Layer) and the specific network transport protocol layer below it, and thereby gain greater application flexibility, H.264 defines the new NAL layer, which earlier ITU-T video compression coding protocols such as H.261 and H.263/H.263+/H.263++ do not have.
  • How to design a more efficient solution in which the NAL and the RTP bearer cooperate, so that RTP carries H.264 better and the advantages of H.264 are exploited, is a practical problem worth studying.
  • The method proposed in the RFC 3984 specification, in which RTP carries the NAL layer data of H.264, is the current mainstream transmission method.
  • The scheme builds on the RTP protocol (RFC 3550) and encapsulates the NAL layer data in the RTP payload.
  • The NAL layer is located between the VCL and RTP, and specifies that the video bitstream is divided into a series of NAL data units (NALU, NAL Units) according to defined rules and structures.
  • The RTP payload format for the NALU is defined in RFC 3984. The following briefly introduces the frame format of RTP and the encapsulation method of the NALU in the prior art.
  • RTP is typically carried over UDP to take advantage of its multiplexing and checksum capabilities. If the underlying network provides multicast distribution, RTP supports transfer to multiple destinations. Features provided by RTP include payload type identification, sequence numbering, timestamping, and delivery monitoring.
  • The detailed structure of the header information of the RTP packet is shown in Figure 1.
  • From front to back, the RTP header information shown in Figure 1 is as follows: the first byte contains fields describing the header structure itself; the second byte carries the defined payload type; the third and fourth bytes are the packet sequence number (Sequence Number); bytes 5-8 are the timestamp; bytes 9-12 are the synchronization source identifier (SSRC ID); and finally comes the list of contributing source identifiers (CSRC IDs, Contributing Source Identifiers), whose length is variable.
  • The first 12 bytes appear in every type of RTP packet, while other data in the header information, such as the contributing source identifiers, is present only when inserted by a mixer.
  • The V field is the version (Version) information, which is 2 bits.
  • VoIP: voice over IP
  • The P field is a padding flag (Padding), which is 1 bit. If P is set, the packet contains one or more padding bytes at the end, and the padding is not part of the payload;
  • The X field is an extension flag bit (Extension), which occupies 1 bit.
  • The format of the header extension is described in detail in section 5.3.1 of RFC 3550.
  • The CC field is the contributing source count (CSRC Count), which is 4 bits and indicates the number of CSRC identifiers at the end of the header information.
  • The receiver can determine the length of the CSRC ID list following the header information from the CC field.
  • The M field is a marker bit (Marker), which occupies 1 bit.
  • The interpretation of the marker bit is defined by a specific profile. It allows important events in the packet stream to be marked; these are agreed specifically by the communicating parties and are not restricted by the protocol.
  • The PT field is the payload type (PT, Payload Type), which is 7 bits; it identifies the format of the RTP payload and determines its interpretation by the application. The relationship between PT values and media formats can also be negotiated dynamically through signaling outside RTP.
  • The RTP source can change the PT.
  • The next field is the 16-bit sequence number, which the receiver can use to detect packet loss and restore the packet order.
  • The timestamp occupies 32 bits and reflects the sampling time of the first byte in the RTP packet; the receiver uses it to adjust the media playback time or to synchronize.
  • The synchronization source identifier (SSRC ID) is 32 bits. Its value can be chosen randomly and uniquely identifies a media source. If a source changes its source transport address, a new SSRC identifier must be chosen.
  • The contributing source (CSRC) list contains 0 to 15 entries of 32 bits each, identifying the sources that contributed to the payload of this packet.
  • The CSRC IDs are inserted by the mixer.
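  • The fixed header layout described above can be illustrated with a short parser. This is a minimal sketch under the field widths given in the text (V 2 bits, P 1, X 1, CC 4, M 1, PT 7, sequence number 16, timestamp 32, SSRC 32, then CC entries of 32 bits); the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions are not handled.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header plus the CSRC list."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    # Network byte order: two single bytes, 16-bit seq, 32-bit ts, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # number of CSRC identifiers
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("RTP packet shorter than its CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

  • The payload starts at byte offset 12 + 4 * CC when the X bit is clear.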
  • RTP packages the NALUs of H.264 into an RTP packet stream.
  • The encapsulation of the NALU is mainly defined in RFC 3984, and the description here is based on it.
  • The RTP encapsulation format of the NALU is shown in Figure 2.
  • Figure 2 shows the encapsulation structure of NALUs in the RTP payload: each NALU consists of NALU header information and NALU data content, and multiple NALUs are placed end to end in the payload of the RTP packet.
  • The NALU header information is the first byte and contains three fields, whose names and meanings are as follows:
  • The F field is defined as the forbidden bit (forbidden_zero_bit), which is 1 bit; it is used to flag syntax errors and the like, and is set to 1 if there is a syntax violation.
  • The NRI field is defined as the NAL reference identifier (nal_ref_idc), which is 2 bits and indicates the importance of the NALU data.
  • A value of 00 indicates that the content of the NALU is not used to reconstruct reference pictures for inter prediction; a value other than 00 indicates that the current NALU is important data, such as a slice belonging to a reference frame, a sequence parameter set (SPS, Sequence Parameter Set), or a picture parameter set (PPS, Picture Parameter Set). The larger the value, the more important the current NALU;
  • The Type field is defined as the NALU type (nal_unit_type), which is 5 bits, so there can be 32 types of NALU. The correspondence between the value and the specific type is given in Table 1.
  • The information given in the one byte of NALU header information mainly covers the validity and the importance level of the NALU. Based on this information, the importance of the data carried by RTP can be determined.
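  • As an illustration of the one-byte NALU header described above, the three fields can be extracted with simple bit operations. This is a minimal sketch; the function name `parse_nalu_header` is hypothetical.

```python
from typing import Dict


def parse_nalu_header(first_byte: int) -> Dict[str, int]:
    """Split the first NALU byte into F (1 bit), NRI (2 bits), Type (5 bits)."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("NALU header must be a single byte")
    return {
        "forbidden_zero_bit": first_byte >> 7,      # F: 1 signals a syntax violation
        "nal_ref_idc": (first_byte >> 5) & 0x03,    # NRI: 0 = not used for reference
        "nal_unit_type": first_byte & 0x1F,         # Type: one of 32 values
    }
```

  • For example, the byte 0x67 yields F = 0, NRI = 3, and Type = 7.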
  • QoS: quality of service
  • RTCP is the control protocol of RTP and is mainly used for control and reporting for the RTP protocol.
  • The underlying protocol provides multiplexing of data and control packets (e.g., by using separate UDP port numbers).
  • The types and structure of RTCP packets are described below.
  • RTCP defines the following packet types to carry a variety of control information: sender report (SR, Sender Report), carrying transmission and reception statistics from active senders; receiver report (RR, Receiver Report), carrying reception statistics from participants that are not active senders; source description items (SDES, Source Description), including the CNAME; the participant exit indication (BYE); and application-specific functions (APP, Application-specific function).
  • The V field is version information;
  • The P field is a padding flag bit (Padding);
  • The RC field is the reception report count (RC, Reception Report Count), indicating the number of reception report blocks contained in the packet;
  • The PT field is the packet type (PT, Packet Type);
  • The SSRC of the sender is the synchronization source identifier (SSRC) of the originator of the SR packet, where the synchronization source uniquely identifies a media data source, such as the source of the video;
  • The NTP timestamp field is a Network Time Protocol (NTP) timestamp, which indicates wall-clock time (absolute date and time) and is used in conjunction with RTP timestamps;
  • The RTP timestamp field is an RTP timestamp, that is, a timestamp generated by the RTP protocol;
  • The sender's packet count field indicates the total number of RTP packets transmitted by the sender from the start of transmission until the generation of the SR packet;
  • The sender's byte count field indicates the total number of payload bytes (not including headers or padding) that the sender has transmitted in RTP packets from the start of transmission until the generation of the SR packet. This field can be used to estimate the average payload rate;
  • The following fields contain zero or more reception report blocks, each of which carries statistics on the RTP packets received from a single synchronization source, including the fraction lost and the cumulative number of packets lost;
  • The extended highest sequence number received and the interarrival jitter both reflect the network transmission status;
  • The last SR (LSR, Last SR) timestamp is 32 bits: it is the middle 32 bits of the NTP timestamp of the most recent SR received from the source;
  • The delay since last SR (DLSR, Delay since Last SR), which is 32 bits, is the interval between receiving the last SR from the source and sending this report. These parameters are key to computing the QoS report.
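  • One QoS parameter that is commonly derived from LSR and DLSR is the round-trip time. The sketch below follows the calculation given in RFC 3550 (round trip = A - LSR - DLSR, in units of 1/65536 second, where A is the arrival time of the report expressed as the middle 32 bits of an NTP timestamp); the source text does not spell this formula out, so it is shown as an assumption about how the fields are typically used. The function names are hypothetical.

```python
def ntp_middle32(ntp_seconds: int, ntp_fraction: int) -> int:
    """Take the middle 32 bits of a 64-bit NTP timestamp.

    ntp_seconds and ntp_fraction are the two 32-bit halves of the timestamp.
    """
    return ((ntp_seconds & 0xFFFF) << 16) | ((ntp_fraction >> 16) & 0xFFFF)


def round_trip_seconds(arrival_mid32: int, lsr: int, dlsr: int) -> float:
    """Round-trip time in seconds from A, LSR and DLSR (units of 1/65536 s).

    The subtraction is done modulo 2**32 so that wrap-around of the
    16-bit seconds part does not produce a negative result.
    """
    rtt_units = (arrival_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0
```

  • With A = 0xB7108000, LSR = 0xB7052000, and DLSR = 0x00054000 (the worked example in RFC 3550), the result is 6.125 seconds.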
  • The Receiver Report (RR) packet format differs from the Sender Report (SR) packet format as follows: the value of the packet type field is 201, and there is no sender information portion.
  • The functions of RTCP are as follows:
  • RTCP carries a persistent transport-layer identifier for each RTP source, called the canonical name (CNAME, Canonical Name).
  • The SSRC identifier may change when a conflict is found or the program is restarted, so the receiver needs to track each participant through the CNAME;
  • QoS reports are transmitted using the RTCP protocol: QoS information is reported according to the report content specified by RTCP, and QoS monitoring for carried media such as H.264 is implemented on this basis.
  • RTCP thus provides a QoS reporting mechanism.
  • The periodic reporting method results in additional network bandwidth overhead of up to 5%. If the network is congested and the transmission QoS drops as a result, the extra traffic generated by RTCP makes the situation worse.
  • H.264 will be the main video protocol for multimedia communication in the future.
  • The networks used by future multimedia communication applications will mainly be IP-based packet-switched networks and wireless networks.
  • The IP network provides "best effort" transmission and does not guarantee the QoS of the transmitted video data. The problem is especially prominent for H.264 bitstreams, which have been compressed very efficiently. Under best-effort delivery over IP networks, packet loss, delay, and delay jitter all affect the quality of the restored video in real-time video communication.
  • Error resilience is the ability of a delivery mechanism to prevent errors from occurring, or to correct them to some extent after they have occurred. In a multimedia communication environment, it is critical that the video delivery mechanism be error resilient.
  • Error-resilience techniques include forward error correction (FEC, Forward Error Correction), automatic retransmission request (ARQ, Automatic Retransmission Request), joint source-channel coding (JSCC, Joint Source-Channel Coding), interleaving, and the elimination of error propagation.
  • Using various error correction codes to encode the data to be protected essentially creates data redundancy, thereby increasing resistance to errors.
  • The main error affecting packets on the network is packet loss, which is called an erasure error (Erasure Error) in error correction coding theory.
  • Error correction codes for erasure errors form a large class called erasure codes (Erasure Codes).
  • An erasure code divides the data stream into segments (units) of the same size, also called data nodes (Data Nodes).
  • Check nodes (Parity Nodes or Check Nodes) are computed from the data nodes.
  • The check nodes may in turn be used to compute a second layer of check nodes, and so on.
  • A third layer, a fourth layer, and further layers up to an Nth layer of check nodes can be generated.
  • The number of nodes in each layer decreases according to a certain rule relative to the previous layer, forming a layered multi-node structure. It can be visualized as a pyramid rotated 90 degrees to the right: the leftmost layer is the data node layer, followed to the right by the first check node layer, the second check node layer, ..., and the Nth check node layer.
  • The Tornado erasure code (Tornado code) has a very important property: the time complexity required for processing is linear in the number n of data nodes, so it is called a linear-time code.
  • Many other erasure codes, such as the well-known Reed-Solomon code, require much higher time complexity, on the order of n*log2n*log(logn). Linear-time erasure codes are therefore much better suited to real-time communication.
  • In a Tornado code, multiple check node layers are generated layer by layer from the data nodes. Both the check nodes and the data nodes are sent by the sender to the receiver through the network. If some nodes are lost during transmission, then, because the upper-layer nodes participate in the generation of the lower-layer nodes, the information of an upper-layer node is already contained in the lower-layer nodes, so a lost node can be fully recovered from a sufficient number of nodes in the same layer or in lower layers. Let the number of data nodes be n and the number of check nodes generated be l.
  • Figure 4 shows the relationship between the data nodes of a typical Tornado code and the check nodes of each layer. A line between two nodes in the figure is called an edge, and the node on the left end of an edge participates in the calculation of the node on the right end. It can be seen that there is a many-to-many logical relationship between the nodes of two adjacent layers.
  • The higher the code rate, and hence the lower the redundancy rate, the higher the efficiency of the erasure code.
  • The structure and performance of a Tornado code are mainly determined by three factors: (a) the number of data nodes and the rule by which the layers shrink, which is generally a constant ratio; (b) the calculation method used to generate the next layer of nodes; (c) the association between the nodes of two adjacent layers.
  • The number of data nodes is denoted n.
  • The number of check nodes is denoted m.
  • The scaling ratio is denoted p.
  • The number of check node layers is denoted i.
  • The first i-1 layers contain $np, np^2, \ldots, np^{i-1}$ check nodes respectively, and the last (i-th) layer contains $np^i/(1-p)$ check nodes, from which the total number of nodes is obtained.
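  • The total can be worked out from the layer sizes. The following derivation is a sketch under two assumptions that the source does not state explicitly: that $0 < p < 1$, and that the total in question is the number of check nodes $m$, with layers of $np, np^2, \ldots, np^{i-1}$ nodes followed by a last layer of $np^i/(1-p)$ nodes.

```latex
\begin{aligned}
m &= \sum_{j=1}^{i-1} n p^{j} + \frac{n p^{i}}{1-p}
   = \frac{n\,(p - p^{i})}{1-p} + \frac{n p^{i}}{1-p}
   = \frac{n p}{1-p}, \\
n + m &= \frac{n}{1-p}.
\end{aligned}
```

  • Under these assumptions the total does not depend on the number of layers i. For example, n = 1000 and p = 0.5 give m = 1000 check nodes, that is, 2000 nodes in all.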
  • The most commonly used operation in Tornado code generation is exclusive OR (XOR), because XOR offers a very convenient recovery property.
  • For two bit sequences A and B of equal length, bitwise XOR yields a bit sequence C of the same length with the following properties: A XOR C = B and B XOR C = A. The same holds for XOR among multiple sequences, for which a corresponding recovery method exists. It can be seen that after the XOR operation the data nodes or check nodes are linked to one another, and if any one node is lost, it can be restored from all the remaining nodes. Since the final layer of check nodes has a different scaling ratio, it is generally computed using a conventional error correction code, such as a Reed-Solomon code.
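  • The XOR recovery property can be demonstrated in a few lines. This is a minimal sketch of a single XOR check node over equal-length data nodes, not a full Tornado encoder (there is no bipartite graph or layering here); the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("nodes must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_check_node(nodes: List[bytes]) -> bytes:
    """Compute one check node as the XOR of all given nodes."""
    return reduce(xor_bytes, nodes)


def recover_lost_node(nodes: List[Optional[bytes]], check: bytes) -> bytes:
    """Recover the single lost node (marked None) from the rest plus the check node."""
    missing = [i for i, node in enumerate(nodes) if node is None]
    if len(missing) != 1:
        raise ValueError("exactly one node must be missing")
    survivors = (node for node in nodes if node is not None)
    # XOR of the check node with every surviving node leaves the lost node.
    return reduce(xor_bytes, survivors, check)
```

  • A single XOR check node can repair only one loss among the nodes it covers, which is why the layered structure connects each node to several check nodes.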
  • Adjacent layers of the Tornado code are associated with each other; that is, each node of a lower layer is computed from specific nodes of the previous layer.
  • The nodes of two adjacent layers form a bipartite graph, and the association between the nodes of the two layers is determined by the association between the left and right nodes of the bipartite graph.
  • The parameters n, m, i, p, etc. are determined by the required protection capability and other requirements, such as a reasonable data node size and the maximum acceptable network delay. Given these parameters and a randomly distributed node degree vector, Tornado encoding can be performed.
  • In fact, the family of erasure codes is very large, and Tornado codes are only one member. There are also RS (Reed-Solomon) codes and Low-Density Parity-Check (LDPC) codes.
  • An important performance indicator of an erasure code is its error correction capability (or protection capability), which is directly reflected in the maximum number of lost packets that can be tolerated (for a given number of packets), or in the percentage of packets that can still be correctly recovered when losses exceed this maximum.
  • Under the same conditions, the higher the protection capability, the higher the redundancy rate.
  • Protection capability applies not only to erasure codes; more broadly, all FEC codes can be measured by their protection capability.
  • Some data is relatively important, such as the structural parameters of video sequences, the structural parameters of pictures, header information, etc.
  • Other data is relatively less important, such as image content data.
  • A stronger code is used for relatively important data, and a weaker code is used for relatively unimportant data. This balances protection and efficiency.
  • UEP: Unequal Protection
  • QoS guarantees for video communication services are readily achieved through unequal protection.
  • The idea of unequal protection is to protect data of different (relative) importance within the multimedia data using protection mechanisms of different strength.
  • Different protection mechanisms can differ at a coarse or a fine level: at the coarse level they differ in principle, and at the fine level they differ only in structure or parameters.
  • Hierarchical protection divides the protection mechanism into multiple levels according to protection capability.
  • Hierarchical protection is in effect an adaptive strategy. Combining unequal protection with hierarchical protection yields more complex and more powerful protection strategies.
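  • To make the idea of unequal protection concrete, the sketch below assigns more redundancy to more important data, using the 2-bit importance value nal_ref_idc from the NALU header as the importance measure. The redundancy percentages and the function name are purely hypothetical values chosen for illustration; the source does not specify any mapping.

```python
# Hypothetical redundancy levels (percent of extra check packets) for each
# nal_ref_idc value; 0 = least important, 3 = most important.
REDUNDANCY_PERCENT_BY_NRI = {0: 0, 1: 10, 2: 25, 3: 50}


def check_packets_needed(nal_ref_idc: int, data_packets: int) -> int:
    """Number of check packets to add for a group of data packets.

    Rounds up so that any non-zero redundancy level adds at least one packet.
    """
    if nal_ref_idc not in REDUNDANCY_PERCENT_BY_NRI:
        raise ValueError("nal_ref_idc must be in the range 0..3")
    if data_packets < 0:
        raise ValueError("data_packets must be non-negative")
    percent = REDUNDANCY_PERCENT_BY_NRI[nal_ref_idc]
    return (percent * data_packets + 99) // 100  # integer ceiling
```

  • A hierarchical scheme could keep several such tables, one per protection level, and switch between them as reported network conditions change.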
  • the existing anti-drop error smear can be roughly divided into two categories: (a) Active error-proof type: Take pre-protection measures, such as introducing a redundancy mechanism, try to ensure that the data packet is not lost or that the receiving end can recover a small amount of loss. (b) Error compensation type: Take certain compensation measures in case of error, for example, in the case of serious deterioration of network conditions, the packet loss rate is very high, and the active error prevention method loses its effect. The error is compensated.
  • error compensation method for error compensation is divided into two types: error masking and error spreading.
  • error concealment is focused on compensating the current impact of the error, and the error re-distribution elimination is to eliminate the subsequent influence of the error in spatial and temporal diffusion.
  • Error concealment can also lead to the spread of bit errors.
  • the codec and decoder decoding image cache contents do not match, resulting in the spread of bit errors in the time domain.
  • the existing H.264/RTP transport architecture and the RTCP-based QoS reporting method use RTP to directly encapsulate the NALU for transmission, and use the RTCP SR/RR report to monitor QoS information.
  • the related technical details have been introduced.
  • Tornado code used in the prior art is a relatively complicated solution.
  • the data transmission protection method based on H.261/H.263/H.263+/H.263++/H.264 video compression coding is implemented by using Tornado code.
  • The existing error elimination methods are either standalone error concealment methods or standalone error diffusion elimination methods. The error concealment methods include time domain masking, spatial domain masking, and space-time joint masking.
  • The error diffusion elimination methods include intraframe coding, identification, adaptive intra block refresh, and the like.
  • the time domain masking method uses the information of adjacent frames on the time axis to estimate the missing data.
  • The following methods can be used: simply replace the missing data with the data at the same position in an adjacent frame; or take motion into account and perform motion prediction based on the adjacent frame data. More complicated masking strategies also exist, but their computational cost is very high.
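  • As a minimal sketch of the simplest time domain masking method above (co-located replacement), the following Python fragment fills each lost block with the block at the same position in the previous frame. The block layout, the use of `None` to mark a lost block, and the function name are assumptions made for illustration; motion-compensated masking is not shown.

```python
from typing import List, Optional

Block = List[int]  # hypothetical layout: one block's pixel values, flattened


def conceal_temporal(current: List[Optional[Block]],
                     previous: List[Block]) -> List[Block]:
    """Replace every lost block (None) with the co-located block of the previous frame."""
    if len(current) != len(previous):
        raise ValueError("frames must contain the same number of blocks")
    return [list(prev) if cur is None else cur
            for cur, prev in zip(current, previous)]


prev_frame = [[10, 10], [20, 20], [30, 30]]
cur_frame = [[11, 11], None, [31, 31]]  # the middle block was lost in transit
concealed = conceal_temporal(cur_frame, prev_frame)
```

Because the substituted block is only an approximation of the lost one, the decoder's reference picture may differ from the encoder's, which is the mismatch that later causes error diffusion.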
  • the spatial domain masking method utilizes spatially adjacent regions of the lost data region for error concealment. This method is computationally intensive.
  • The space-time joint masking method combines spatial domain and time domain error concealment, that is, spatial data and temporal data are used together for masking.
  • The error diffusion elimination method based on intraframe coding uses the forward dependence of motion vectors to perform accurate error tracking and applies intraframe coding to the macroblocks affected by bit errors, which can effectively prevent error diffusion.
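  • The error tracking step can be pictured with a small sketch. It assumes, purely for illustration, that the encoder keeps for each later frame a map from each macroblock index to the set of macroblocks of the preceding frame that its motion vectors reference; the data structure and names are hypothetical and not taken from the text.

```python
from typing import Dict, List, Set


def track_affected(lost: Set[int],
                   frame_refs: List[Dict[int, Set[int]]]) -> Set[int]:
    """Follow motion-vector dependencies forward from the lost macroblocks.

    frame_refs[k] maps each macroblock of the k-th later frame to the
    macroblocks of the preceding frame that it predicts from.
    Returns the macroblocks of the last frame that inherit the error.
    """
    affected = set(lost)
    for refs in frame_refs:
        affected = {mb for mb, sources in refs.items() if sources & affected}
    return affected


refs_after_loss = [
    {0: {0}, 1: {1}, 2: {1, 2}},          # first frame after the loss
    {0: {0, 1}, 1: {3}, 2: {2}, 3: {3}},  # second frame after the loss
]
to_intra_code = track_affected({1}, refs_after_loss)
```

The macroblocks in `to_intra_code` are the candidates for intraframe coding in the next encoded frame; an intra-coded macroblock has no references and therefore stops the spread.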
  • multi-level protection and unequal protection are not realized because there is no convenient solution for providing network condition monitoring and description of the relative importance of data.
  • The Tornado code scheme in the prior art is too complicated and inefficient; when it is applied to the protection of video data the delay is large, so it cannot meet the performance requirements of real-time communication.
  • The prior art protects the video communication stream with a fixed erasure code strategy and cannot adapt to changes in network conditions; the substitution mechanism adopted by the error concealment method may cause error diffusion; and the error diffusion elimination method requires a complicated mechanism or an additional feedback channel, which consumes system resources and network bandwidth.
  • the header information of the NALU is completely encapsulated in the payload, so that the RTP protocol cannot directly know the attributes, levels, importance, and the like of the payload, and thus the QoS mechanism based on this cannot be implemented.
  • Such an encapsulation format also causes the NALU header information to occupy payload resources, because each NALU has its own header information. In many cases the header information of multiple NALUs of the same type in one RTP packet is identical, so RTP transmission bandwidth is wasted.
  • the H.264/RTP multimedia communication framework uses a generic coordination control protocol RTCP to transmit QoS reports for QoS monitoring.
  • RTCP itself is not necessarily the best fit for specific video communication applications such as H.264. Opening a separate out-of-band logical channel to transmit QoS reports affects network conditions and can lead to conflicts.
  • The key point is that the prior art does not implement a fault-tolerant and flexible protection strategy at the transport layer, and therefore cannot guarantee the reliability and communication quality of multimedia transmission.
  • a primary object of the present invention is to provide a multimedia communication method and terminal thereof that improve transmission reliability and communication quality.
  • the embodiment of the invention provides a multimedia communication method, including:
  • the transmitting end selects the encoding mode according to the fault-tolerant elastic protection policy, encodes the multimedia data, and sends the encoded multimedia data, encapsulated by the real-time transport protocol, to the receiving end; the receiving end receives the multimedia data and, if the received multimedia data has a transmission error, recovers or partially recovers the erroneous multimedia data;
  • the receiving end collects communication quality statistics, generates a quality of service report, and sends it back to the sending end;
  • the sending end adjusts the fault-tolerant elastic protection policy according to the quality of service report. Preferably, the method further comprises:
  • the receiving end implements an error concealment strategy according to the transmission error information of the erroneous multimedia data;
  • the receiving end feeds back the transmission error information to the sending end;
  • the transmitting end implements an error diffusion elimination strategy according to the transmission error information.
  • the real-time transport protocol header information carries code-related information
  • the receiving end recovers or partially recovers the multimedia data according to the code-related information.
  • the transmitting end obtains the positioning information of the lost slice according to the transmission error information, and performs segmented successive intra coding on the lost slice to implement the error diffusion elimination strategy.
  • a multimedia communication terminal has a basic function module for implementing multimedia communication, and includes a codec module for implementing a multimedia codec function, and further includes:
  • the fault-tolerant elastic transmission control protocol module is configured to receive the multimedia data encoded by the codec module, apply fault-tolerant elastic protection to the multimedia data, and send the protected data to the network side for transmission.
  • the fault-tolerant elastic transmission control protocol module is further configured to receive multimedia data from the network side, perform error correction on the multimedia data, and pass the multimedia data to the codec module for decoding.
  • the terminal further comprises:
  • the protection method and policy negotiation module is configured to negotiate the fault-tolerant elastic protection policy between the two communicating parties and to determine a protection policy set from which the fault-tolerant elastic transmission control protocol module selects;
  • the terminal further comprises:
  • Error masking module for implementing error concealment function
  • the codec module is used to implement codec of the H.264 codec standard, and is also used for error diffusion elimination function;
  • a network condition analysis calculation module is also included for analyzing and calculating the network condition and providing the resulting information to the error masking module and the codec module.
  • the terminal further comprises:
  • a supplementary enhanced message extension processing module is configured to implement a quality of service report and a network status report function, and send the report to the network status analysis calculation module.
  • The technical solution of the present invention adopts a fault-tolerant elastic real-time transmission protocol (ERRTP), which provides, on the basis of the existing RTP, a transport layer encapsulation format that can carry information related to the fault-tolerant elastic coding scheme, so that the multimedia data carried in ERRTP is transmitted together with the corresponding fault-tolerant elastic coding scheme information, thereby integrating the fault-tolerant elastic mechanism into the transport layer;
  • A dedicated ERRTP encapsulation method and protocol header information transformation scheme is given for the H.264 NALU structure: the header information bytes of all NALUs in the same ERRTP packet are combined into the packet's header information, using a combination that does not affect the operation of the existing protocol and devices and that directly reflects the attributes of the NALU payload in ERRTP. On the one hand the bearer efficiency is greatly improved, and on the other hand the basis for implementing the QoS mechanism is provided;
  • The communication quality is measured by the receiving end and fed back to the transmitting end, and the extended message mechanism of the high-level media protocol H.264 itself is used directly to carry the QoS report information, which avoids the use of additional channels and realizes an "in-band" QoS reporting mechanism;
  • Unequal protection is combined with the mixed and alternate use of multiple fault-tolerant elastic schemes: the transmitting end selects protection strategies of different levels according to the QoS report fed back by the receiving end and the related network transmission status messages; based on the data importance level reflected in the ERRTP header information, it can also select an appropriate protection strategy for data of different levels;
  • An efficient Tornado code scheme is also provided. While ensuring that the data transmission protection capability is not significantly degraded, an erasure code with only one layer of check nodes is used, which reduces the amount of calculation needed to generate the check node layers and shortens the data transmission delay, so that the ratio of protection performance to cost is improved;
  • the above various multimedia communication related enhancement technologies are integrated on the multimedia communication system, and various technologies and protocol architectures are modularized, and various technologies work in coordination with each other to further enhance the reliability of multimedia communication.
  • The transmission structure saves network transmission bandwidth; unequal protection achieves a balance between protection capability and transmission efficiency, facilitates QoS guarantees for multimedia transmission, further improves the quality of service, reduces redundancy, and improves transmission efficiency; compatibility with the prior art is achieved, which improves the robustness of the new ERRTP method.
  • FIG. 1 is a schematic diagram showing the structure of the header information of an RTP data packet
  • FIG. 2 is a schematic diagram of a package format of an RTP packet payload to NALU data
  • Figure 3 is a schematic diagram of a format of a QoS report data packet based on the RTCP protocol
  • Figure 4 is a schematic diagram of the Tornado erasure code principle
  • FIG. 5 is a schematic structural diagram of a module supporting a fault tolerant elastic multimedia communication terminal according to a first embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a multimedia communication protocol stack according to a first embodiment of the present invention
  • FIG. 7 is a schematic diagram showing a header information structure of an ERRTP data packet according to second and third embodiments of the present invention
  • FIG. 8 is a schematic diagram of an SEI encapsulation format for carrying a QoS report according to a fourth embodiment of the present invention.
  • Figure 9 is a diagram showing the principle of error spread elimination based on segmented successive intra coding according to a sixth embodiment of the present invention.
  • FIG. 10 is a block diagram showing the structure of the erasure code of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
  • the present invention integrates various enhancement techniques on a multimedia communication system, combining the respective advantages of various enhancement techniques to improve system performance, transmission reliability, and communication quality.
  • The enhancements include the fault-tolerant elastic real-time transport protocol (ERRTP, Error Resilience Real-time Transport Protocol), which integrates FEC into the RTP protocol.
  • The invention combines various enhancement technologies in a multimedia communication system to realize fault-tolerant elastic H.264 video communication. The system includes a general control module, a user interface, a network communication module, I/O and underlying driver modules, various service modules, a communication process control module, an application protocol module, and so on. To implement the various enhancement technologies it also includes a protection method and policy negotiation module, an FEC module, an ERRTP module, an RTCP module, an H.264 NAL module, an H.264 encoder module, an H.264 decoder module, an audio codec module, an error masking module, an SEI message extension processing module, and a network condition analysis and calculation module.
  • a plurality of enhancement technologies are implemented and modularized in a multimedia communication system, mainly referring to a multimedia communication terminal.
  • The device is described in terms of the functional modules of the terminal; a complete internal module structure diagram of the terminal is shown in Figure 5. It should be noted that the functional modules mentioned here are all defined functionally, and the specific implementation may be software, hardware, firmware, or a combination of software and hardware.
  • a complete multimedia communication terminal must first contain the following modules:
  • Main control module: responsible for the control of the entire terminal system;
  • User interface module: responsible for user input and output interaction; the user operates through interface control elements such as menus and buttons, and the module displays feedback information such as current system status, parameters, and network status;
  • Network communication module: responsible for communication with the network, providing TCP, UDP, IP, and lower-layer communication protocol stacks such as Ethernet, PPP, and ATM;
  • I/O and underlying driver modules: responsible for driving hardware devices, such as video and audio capture devices and display/playback devices, and for video and audio data input and output;
  • Service modules: implement various specific services, such as videophone, multi-party conference, video mail, instant messaging, and video chat;
  • Communication process control module: performs control during the specific communication process, such as applying for the chair, releasing the chair, applying to speak, controlling the broadcast of a given site, and site browsing in a multi-party conference;
  • Application protocol module: a specific application protocol such as the H.323 system (including H.225.0, RAS, H.245, H.235, H.460, etc.) or SIP (Session Initiation Protocol).
  • Protection method and policy negotiation module: responsible for negotiating the protection methods between the communicating parties, determining the allowed set, and then negotiating, based on the allowed set, a strategy for the mixed and alternate use of protection methods. The negotiation is completed through the "application protocol module". This module controls the FEC module, which implements the different FEC protection modes and functions such as unequal protection and adaptive hierarchical protection;
  • FEC module: supports a variety of FEC protection methods, which can be subclasses within multiple large classes. It is assumed that a total of T different methods are supported. According to the result of the negotiation (from the "protection method and policy negotiation module"), H.264 video data and audio data (not in the scope of this patent) are protected. The module stores the generation rules and parameters for the various FEC subclasses, so it contains an internal database for this data. This module enables the mixed and alternate application of different protection methods;
  • ERRTP module: implements the ERRTP protocol; the ERRTP encapsulation format and the related encapsulation and decapsulation steps for H.264 are described in detail in the following embodiments;
  • RTCP module: implements the normal RTCP function. Although the present invention provides a reporting mechanism based on the H.264 SEI message extension that can report the main RTCP information, the use of RTCP is not excluded and the two reporting mechanisms can coexist. This is mainly for compatibility and interoperability, since the other terminal may not support the SEI message extension reporting mechanism;
  • H.264 NAL module: implements the function of the H.264 network abstraction layer;
  • H.264 encoder module: in addition to the normal H.264 encoder function, implements the error diffusion elimination function of the present invention, using information from the "network condition analysis and calculation module";
  • H.264 decoder module: implements the normal H.264 decoder function;
  • Audio codec module: implements the audio codec function; the supported protocol can be ITU-T
  • Error masking module: implements the error concealment function provided by the present invention, based on information from the "network condition analysis and calculation module" and the "H.264 encoder module";
  • SEI message extension processing module: implements the QoS and network status report function based on the SEI message extension of the present invention. At the transmitting end it collects data to form RTCP SR and RR reports and sends them out encapsulated in SEI extended messages; at the receiving end it extracts the RTCP SR and RR reports from the SEI extended messages and passes the data to the "network condition analysis and calculation module" for analysis and calculation;
  • Network condition analysis and calculation module: analyzes the data from the "SEI message extension processing module" to obtain network status data, such as packet loss rate, jitter, delay, and instantaneous end-to-end bandwidth, then uses this data to control the "H.264 encoder module" and the "error masking module", and also sends it to the "user interface module", which can display it to the user.
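  • To make the role of the network condition analysis and calculation module more concrete, the sketch below derives two of the quantities named above. The loss estimate from received sequence numbers is a simplification that ignores sequence number wrap-around and reordering; the jitter update follows the interarrival jitter estimator defined for RTP in RFC 3550. The function names are illustrative only.

```python
from typing import List


def loss_rate(received_seq: List[int]) -> float:
    """Fraction of packets missing between the lowest and highest sequence number seen."""
    if not received_seq:
        return 0.0
    expected = max(received_seq) - min(received_seq) + 1
    return 1.0 - len(set(received_seq)) / expected


def update_jitter(jitter: float, transit_prev: float, transit_cur: float) -> float:
    """One step of the RFC 3550 estimator: J = J + (|D| - J) / 16."""
    d = abs(transit_cur - transit_prev)
    return jitter + (d - jitter) / 16.0


rate = loss_rate([1, 2, 4, 5])          # packet 3 never arrived
jitter = update_jitter(0.0, 10.0, 26.0)  # transit time changed by 16 units
```

Values of this kind are what the module would hand to the encoder module and the error masking module when they choose a protection level.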
  • Fig. 6 is a block diagram showing the structure of a multimedia communication protocol stack in accordance with a first embodiment of the present invention.
  • The H.264/ERRTP multimedia delivery architecture of the present invention differs from the traditional H.264/RTP architecture mainly in that:
  • An "SEI extended reporting layer" is added between the H.264 VCL layer and the NAL layer. This layer implements QoS monitoring and network transmission status reporting based on SEI extended messages.
  • the "FEC layer” is added between the H.264 NAL layer and the ERRTP/RTCP layer. This layer implements node partitioning, encoding, and encapsulation for the H.264 NALU data stream.
  • The first embodiment of the present invention provides a basic modular structure and protocol stack composition, taking a typical H.264 service as an example. For other multimedia communication protocols or applications that may appear in the future, it is only necessary to implement the relevant technical details for the specific application based on the principle of the present invention, which achieves the object of the invention without departing from its essence and scope.
  • The present invention proposes an improved RTP protocol supporting fault-tolerant resilience, which aims to integrate a fault-tolerant elastic mechanism into the transport layer protocol. This not only simplifies the transmission structure and reduces complexity, but also improves the flexibility of the fault-tolerant elastic mechanism and enhances transmission reliability. Because of its fault tolerance, the present invention calls this improved RTP protocol the fault-tolerant elastic real-time transport protocol (ERRTP or ER2TP, Error Resilience Real-time Transport Protocol).
  • The main difference between ERRTP and RTP is that the ERRTP packet header information extension can carry information about the fault-tolerant elastic coding scheme, such as FEC type, protection capability, and coding parameters.
  • On this basis the present invention conveniently realizes unequal protection. First, various protection measures with different protection capabilities are made available for selection; the sender then collects information such as the network status and the importance of the multimedia data, and uses these factors to select appropriate protection measures, achieving unequal protection and a balance between protection capability and transmission efficiency. Since the FEC related information is carried in each ERRTP data packet, the transmitting end only needs to fill the information of the selected scheme into the ERRTP header information, and the receiving end can correctly recover the data or correct errors according to it.
  • the specific implementation method based on erasure code protection is given, including the steps of dividing, generating, encapsulating and decapsulating data nodes and check nodes.
  • A series of NALUs is divided equally into several data nodes, and the Tornado code is then used to generate the check nodes. All of these nodes are distributed over several ERRTP packets, and the receiver performs the inverse process.
  • the transmitting and receiving parties implement unequal protection based on ERRTP.
  • the main steps are as follows:
  • the transmitting end selects a fault-tolerant elastic coding scheme and performs erasure coding on the multimedia data;
  • the transmitting end encapsulates the encoded multimedia data in ERRTP, carries the information related to the fault-tolerant elastic coding scheme in the ERRTP header information, and then sends the packets to the receiving end;
  • the receiving end decapsulates the received ERRTP packet, and extracts the information about the fault-tolerant elastic coding scheme from the ERRTP header information, and then selects the fault-tolerant elastic coding scheme for fault-tolerant elastic decoding according to the information of the fault-tolerant elastic coding scheme to obtain the multimedia data.
  • the unequal protection is reflected in the fact that the transmitting end selects the fault tolerant elastic coding scheme according to the current network transmission status and/or the quality of service level of the multimedia data to be transmitted.
  • FIG. 7 is a diagram showing the structure of an ERRTP header information according to a first embodiment of the present invention.
  • the header information extension is also followed by the relevant information field about the fault-tolerant elastic coding scheme.
  • the fault-tolerant elastic coding type field, the fault-tolerant elastic coding parameter field, the packet length field, and the number of packets field are included.
  • The fault-tolerant elastic coding type field indicates the erasure code type used by the fault-tolerant elastic coding scheme. It may also be referred to as the FEC Type field, that is, it indicates the FEC coding type. It occupies 4 bits and can represent 16 different FEC types, which is sufficient in practical applications.
  • the types defined here are actually large types, and will continue to be subdivided into various schemes, called subtypes.
  • the large types in actual applications are, for example, 0010 for Tornado code and 0011 for RS code.
  • This field identifies 16 different types of FEC codes.
  • The two parties are required to agree on a look-up table (LUT), called FECTypeLUT, that gives the correspondence between each FEC encoding type and its encoding type code.
  • The fault-tolerant elastic coding subtype field indicates the parameter settings of the fault-tolerant elastic coding scheme. For each type of FEC coding, the settings of various parameters must also be determined before it can be implemented, and this field serves to identify those specific parameters. Since the resources in the ERRTP header information are limited, it is impossible to list the specific parameters and rules of every FEC encoding scheme there, so the first embodiment of the present invention uses the concept of subtypes to indicate the various alternative parameter setting schemes.
  • This field is also called the FEC encoding subtype field, FEC Subtype, which occupies 9 bits. This field mainly represents subtypes further subdivided under each of the large types defined in the FECTypeLUT.
  • MTU Maximum Transport Unit
  • The number-of-packets field, also called the Packet Number field, occupies 8 bits and indicates the number of data nodes carried by the ERRTP packet, for example the number of data nodes carried in each ERRTP packet after several NALUs have been protected by the forward error correction code and encapsulated in multiple ERRTP packets.
  • the decoding end or the network node can verify the received data packet according to the FEC code type and the check type of the data packet given by the field, and recover the lost data packet.
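  • The four fields described above can be pictured as one packed header extension word. The sketch below is an illustration under stated assumptions: the text gives 4 bits for FEC Type, 9 bits for FEC Subtype and 8 bits for Packet Number, but the width of the packet length field and the order of the fields are not given here. An 11-bit length field and a type, subtype, length, number ordering are assumed so that the total is 32 bits; the actual layout is defined by FIG. 7.

```python
from typing import Tuple

# Field widths in bits; PACKET_LEN_BITS is an assumption (see lead-in).
FEC_TYPE_BITS, FEC_SUBTYPE_BITS, PACKET_LEN_BITS, PACKET_NUM_BITS = 4, 9, 11, 8


def pack_fec_fields(fec_type: int, fec_subtype: int,
                    packet_length: int, packet_number: int) -> bytes:
    """Pack the four fields into one 32-bit big-endian word (assumed order)."""
    for name, value, bits in (("fec_type", fec_type, FEC_TYPE_BITS),
                              ("fec_subtype", fec_subtype, FEC_SUBTYPE_BITS),
                              ("packet_length", packet_length, PACKET_LEN_BITS),
                              ("packet_number", packet_number, PACKET_NUM_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    word = (fec_type << 28) | (fec_subtype << 19) | (packet_length << 8) | packet_number
    return word.to_bytes(4, "big")


def unpack_fec_fields(data: bytes) -> Tuple[int, int, int, int]:
    """Inverse of pack_fec_fields."""
    word = int.from_bytes(data[:4], "big")
    return (word >> 28) & 0xF, (word >> 19) & 0x1FF, (word >> 8) & 0x7FF, word & 0xFF


header = pack_fec_fields(0b0010, 5, 1000, 12)  # 0010 is the Tornado code type
fields = unpack_fec_fields(header)
```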
  • The FEC Subtype field mentioned above has a total of 9 bits, which encode the various alternative parameter setting schemes. The technical details of how this coding indication is performed in the first embodiment of the present invention are given below.
  • the sending and receiving parties need to negotiate to determine the field indicating the relationship correspondence table.
  • the sender and the receiver negotiate to determine: for various types of FEC codes, the correspondence between the value of the FEC Subtype and the related parameter setting scheme of the FEC code indicated, and various alternatives. Specific parameter settings.
  • the sender and the receiver both establish a correspondence table according to the negotiation result, and are configured to query the corresponding FEC coding type or FEC codec processing module according to the FEC Type and FEC Subtype fields;
  • the transmitting end calls the corresponding erasure coding processing module to perform erasure coding
  • the receiving end calls the corresponding erasure decoding processing module to perform erasure decoding.
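  • The negotiation and lookup steps above can be sketched as follows. The two type codes (0010 for the Tornado code, 0011 for the RS code) come from the text; the subtype numbers, the representation of a scheme as a (type, subtype) pair, and the function names are assumptions for illustration, and a real terminal would carry the negotiation over H.323 or SIP signalling.

```python
from typing import Dict, Set, Tuple

SchemeKey = Tuple[int, int]  # (FEC Type, FEC Subtype)

FEC_TYPE_LUT: Dict[int, str] = {0b0010: "Tornado", 0b0011: "RS"}


def negotiate(local: Set[SchemeKey], remote: Set[SchemeKey]) -> Set[SchemeKey]:
    """The allowed set is what both terminals support."""
    return local & remote


def lookup(allowed: Set[SchemeKey], fec_type: int, fec_subtype: int) -> str:
    """Resolve the header fields to the name of the processing module to call."""
    if (fec_type, fec_subtype) not in allowed:
        raise KeyError("scheme was not agreed during negotiation")
    return f"{FEC_TYPE_LUT[fec_type]}/subtype-{fec_subtype}"


allowed = negotiate({(0b0010, 0), (0b0010, 1), (0b0011, 0)},
                    {(0b0010, 1), (0b0011, 0), (0b0011, 4)})
module_name = lookup(allowed, 0b0010, 1)
```

Both ends run the same lookup on the FEC Type and FEC Subtype fields, so the receiver selects the decoding module that matches the sender's encoding module without any extra signalling per packet.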
  • The so-called generation rule is the rule or algorithm by which the data nodes are processed at the transmitting end to generate each check node. The opposite is done at the receiving end: if packet loss occurs during transmission, that is, some nodes are lost, the lost nodes can be recovered or partially recovered according to the generation rule. The generation rule is therefore very important information, on the basis of which both communicating parties can work with the FEC mechanism.
  • Each of the FEC types listed in the FECTypeLUT has a different generation rule. Within each class, such as the Tornado code, the generation rule of each subclass is combined with specific generation parameters. So for each subclass here, the generation rule is combined with the generation parameters.
  • The generation parameters include the following data: the total number of data nodes, the total number of check nodes, the number of check node layers, the scaling ratio of the number of nodes between successive layers, and the association matrix describing the node associations between successive layers. If there are L layers of check nodes, there are L such association matrices or, equivalently, L parametric mathematical representations of the bipartite graph relating the nodes of two successive layers. When the large-class generation rules are the same, the generation parameters largely determine the protection strength of the subtype.
  • For the Tornado code, among the generation parameters given above, the total number of data nodes and the total number of check nodes largely determine the protection capability (strictly speaking, all generation parameters are needed to determine it fully).
  • Using the main parameters that have the greatest effect as representative generation parameters, the subclasses under a large class can be arranged in order of protection from weak to strong (ascending order), creating a LUT called FECSubTypeLUT.
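  • One way to picture the construction of FECSubTypeLUT is sketched below. It assumes, as a simplification suggested by the text, that the ratio of check nodes to data nodes is an adequate proxy for protection strength; the class name, its fields, and the candidate parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GenerationParams:
    data_nodes: int        # total number of data nodes
    check_nodes: int       # total number of check nodes
    check_layers: int = 1  # number of check node layers

    @property
    def redundancy(self) -> float:
        """Check nodes per data node, used here as a proxy for protection strength."""
        return self.check_nodes / self.data_nodes


def build_subtype_lut(candidates: List[GenerationParams]) -> Dict[int, GenerationParams]:
    """Number the subtypes in ascending order of protection (weak to strong)."""
    ordered = sorted(candidates, key=lambda p: p.redundancy)
    return dict(enumerate(ordered))


subtype_lut = build_subtype_lut([GenerationParams(16, 8),
                                 GenerationParams(16, 2),
                                 GenerationParams(16, 4)])
```

With the table ordered this way, moving to a larger subtype number always means stronger protection at the cost of more redundancy.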
  • How many subtypes each large type supports is determined by the specific application, the communication capabilities (CPU processing speed, memory, program complexity, etc.), and the needs of the terminal. If the communication environment changes a lot and network performance fluctuates widely, more subtypes generally need to be supported; otherwise fewer are needed. This can be agreed by the communicating parties through the capability negotiation process before communication begins. Negotiation can be carried out through the current mainstream multimedia communication framework protocols such as H.323 or the Session Initiation Protocol (SIP).
  • In an actual system, each large type corresponds to an FEC processing module at the transmitting end, which is responsible for generating the check nodes; at the receiving end it also corresponds to an FEC processing module, which is responsible for restoring the nodes.
  • both parties of the communication decide which FEC processing module to call and which generation parameters to read based on the information of the two information fields FEC Type and FEC Subtype.
  • The second embodiment of the present invention gives the specific steps for FEC encoding and decoding of the H.264 NALU data stream with ERRTP, as follows.
  • The sender merges multiple (S) H.264 NALUs into a group for unified coding and transmission.
  • The S NALUs are re-divided into equal-length blocks, whose number is set to M. These M blocks are the data nodes.
  • First, S H.264 NALUs are grouped into one group; the S NALUs are then concatenated end to end to form one large block, and the large block is divided equally into M data blocks, each K bytes long.
  • When the total length TB of the large block, in bytes, is not an exact multiple of M, a rounding-up operation is performed so that the length of each data block is Ceiling(TB/M) bytes, where Ceiling(x) is the smallest integer not less than x, for any real number x. Zero padding is then applied to the data blocks that need it, so that each has exactly Ceiling(TB/M) bytes.
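  • The division into data nodes can be sketched as follows, taking TB as the total byte length of the concatenated NALUs. The function name is illustrative, and the sketch places all zero padding at the end of the large block, which is one possible reading of the padding step. How the receiver learns the original NALU boundaries is not covered here.

```python
from typing import List, Tuple


def split_into_data_nodes(nalus: List[bytes], m: int) -> Tuple[List[bytes], int]:
    """Concatenate the NALUs and cut the result into M equal data nodes.

    Returns the data nodes and their common length K = Ceiling(TB / M).
    """
    big_block = b"".join(nalus)
    tb = len(big_block)
    if m <= 0 or tb == 0:
        raise ValueError("need at least one byte of data and M > 0")
    k = -(-tb // m)  # integer ceiling of TB / M
    padded = big_block.ljust(k * m, b"\x00")  # zero padding up to M * K bytes
    return [padded[i * k:(i + 1) * k] for i in range(m)], k


nodes, k = split_into_data_nodes([b"abcde", b"fgh", b"ij"], 4)  # TB = 10, M = 4
```

Here TB/M = 2.5, so every node is 3 bytes long and the last node carries two padding bytes.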
  • FEC encoding is performed on the M data nodes to obtain N check nodes.
  • The FEC code is applied to the M data blocks to generate N check blocks. The generation process uses the method described above, determining from the FEC Type and FEC Subtype information which FEC processing module to call to generate the check blocks.
  • The sender encapsulates all data nodes and check nodes in ERRTP packets for transmission.
  • the fields should be set as follows:
  • Type field: FEC Type = 0010, indicating the use of the Tornado code;
  • Packet Number = (M+N)/P, which represents the number of data nodes carried in an ERRTP payload, where P is the number of ERRTP packets in a group.
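  • As a worked illustration of the Packet Number setting, the sketch below spreads the M data nodes and N check nodes over P packets. It assumes that M+N is an exact multiple of P and that nodes are placed in order, data nodes first; neither point is stated in the text, and the function names are hypothetical.

```python
from typing import List


def nodes_per_packet(m: int, n: int, p: int) -> int:
    """Value of the Packet Number field: (M + N) / P."""
    if p <= 0 or (m + n) % p != 0:
        raise ValueError("M + N must be a positive multiple of P")
    return (m + n) // p


def packetize(data_nodes: List[bytes], check_nodes: List[bytes], p: int) -> List[List[bytes]]:
    """Group all nodes into P payloads of equal node count."""
    per = nodes_per_packet(len(data_nodes), len(check_nodes), p)
    all_nodes = data_nodes + check_nodes
    return [all_nodes[i * per:(i + 1) * per] for i in range(p)]


payloads = packetize([b"d0", b"d1", b"d2", b"d3", b"d4", b"d5"], [b"c0", b"c1"], 4)
```

With M = 6, N = 2 and P = 4, each ERRTP payload carries two nodes.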
  • After receiving the ERRTP packets, the receiver decapsulates the data nodes and the check nodes.
  • The receiving end works in groups of P packets and starts decoding and recovery each time a group of P packets has been received. The number of packets in a group is agreed between the two parties.
  • The receiving end performs fault-tolerant elastic decoding on the data nodes according to the check nodes.
  • The FEC processing module decodes and recovers, fully or partially, the lost data. Finally, once the complete data nodes are obtained, they are re-merged into the large block, which is divided back into the S NALUs in the same way as at the sender.
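  • The segmentation and re-merging steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, zero padding is assumed to go at the tail of the large block, and the list of original NALU lengths is assumed to reach the receiver as side information (the text does not say how the receiver learns the NALU boundaries).

```python
import math


def split_into_data_nodes(nalus, m):
    """Concatenate the NALUs end to end and cut the result into m equal blocks."""
    big_block = b"".join(nalus)
    total_bytes = len(big_block)                # TB in the text
    k = math.ceil(total_bytes / m)              # Ceiling(TB/M) bytes per block
    padded = big_block.ljust(k * m, b"\x00")    # zero padding (assumed at the tail)
    blocks = [padded[i * k:(i + 1) * k] for i in range(m)]
    lengths = [len(n) for n in nalus]           # assumed side information
    return blocks, lengths


def merge_data_nodes(blocks, lengths):
    """Rebuild the large block and cut it back into the original NALUs."""
    big_block = b"".join(blocks)
    nalus, pos = [], 0
    for length in lengths:
        nalus.append(big_block[pos:pos + length])
        pos += length
    return nalus                                # trailing padding is dropped
```

  • For example, three NALUs totalling 13 bytes with M = 4 give four data nodes of Ceiling(13/4) = 4 bytes each, the last one carrying three padding bytes.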
  • The above example uses the ERRTP-based packet-loss-resistance algorithm, which can greatly improve the packet-loss resistance of the video code stream at the cost of less than 17% additional code words.
  • Only 4 bytes have been added to the RTP payload header structure, so transmission efficiency is essentially unaffected, and significant practical results are achieved.
  • Another key technical point of the present invention, mentioned above, is the implementation of unequal protection. It is embodied mainly in two aspects. One is to select the appropriate coding scheme or parameters according to the importance level of the multimedia data, that is, to determine the aforementioned FEC coding type and subtype; the other is to select them according to the network conditions at different times. These two aspects correspond to the hybrid and the alternate use of various FEC coding schemes. Hybrid use refers to using multiple FEC subtypes at the same time, mainly to protect data of different importance. Alternate use refers to using different FEC subtypes at different times (under different network conditions).
  • The header byte reflects the importance of the data, so the sender can evaluate the QoS level according to the NRI field or the Type field in the NALU header information and then select the fault-tolerant elastic coding scheme, that is, determine the FEC Type field and the FEC Subtype field.
  • General network transmission has a corresponding network condition monitoring mechanism. Through these mechanisms the transmitting end can obtain the transmission report fed back by the receiving end, evaluate the network transmission status, and then select the fault-tolerant elastic coding scheme, that is, determine the FEC Type field and the FEC Subtype field.
  • the H.264 code stream is transmitted or stored based on the NALU, which consists of NAL header information and NAL payload.
  • Different NALU types have different effects on decoding and restoring images. For example, an NRI of 0 means that the NALU holds a slice or slice data partition of a non-reference image, which does not affect subsequent decoding; a non-zero NRI indicates that the NALU holds a sequence/picture parameter set or a slice or slice data partition of a reference image, which seriously affects subsequent decoding.
  • H.264 data can be classified into two types according to the value of NRI or nal_unit_type: one is relatively important image data (for example, nal_ref_idc equal to 1), and the other is secondary image data (for example, nal_ref_idc equal to 0).
  • The important image data is protected by the FEC1 code, which has high redundancy and strong packet-loss resistance; the secondary image data can be protected by the FEC2 code, which has less redundancy and weaker packet-loss resistance.
  • FEC1 and FEC2 are just general designations, representing any two subtypes. These two subtypes can belong to the same major type or to different major types.
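  • As a hedged illustration of this two-class rule, the sketch below splits the one-byte NALU header into its F (1 bit), NRI (2 bits) and Type (5 bits) fields and picks a protection class from NRI. The labels "FEC1" and "FEC2" are the generic placeholders used in the text, and the function names are hypothetical.

```python
def parse_nalu_header(header_byte):
    """Split the one-byte H.264 NALU header into (F, NRI, Type)."""
    f = (header_byte >> 7) & 0x01        # forbidden bit
    nri = (header_byte >> 5) & 0x03      # nal_ref_idc
    nal_type = header_byte & 0x1F        # nal_unit_type
    return f, nri, nal_type


def select_fec_class(header_byte):
    """Two-class unequal protection: non-zero NRI gets the stronger code."""
    _, nri, _ = parse_nalu_header(header_byte)
    return "FEC1" if nri != 0 else "FEC2"
```

  • A header byte of 0x65 (NRI = 3, Type = 5) would be routed to FEC1, while 0x01 (NRI = 0, Type = 1) would be routed to FEC2.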
  • The above method can be extended to a more general case in which the data is divided into more classes according to the value of nal_unit_type, for example five classes: the most important data, the second most important data, data of average importance, less important data, and the least important data. The data can also be divided into 7 or more classes and protected with the same number of FEC subtypes, each class of data corresponding to a different subtype. As long as their protection capability ranges from weak to strong, these subtypes need not belong to the same major type.
  • For image information that still cannot be restored even under the FEC code with the strongest protection capability, techniques such as error concealment and prevention of error propagation are adopted.
  • Another form of unequal protection that is also within the scope of the present invention is to select FECs of different protection capabilities depending on the real-time conditions of the network.
  • the ERRTP header information is then used to inform both parties of the communication so that they can correctly decode the data and recover the lost data.
  • For image information that still cannot be recovered under the FEC code with the strongest protection, error concealment and error-propagation prevention techniques are adopted. Network conditions can be sensed through various existing QoS monitoring methods.
  • the data importance level and the network status level are in ascending order.
  • The FEC mechanism is indexed by a two-dimensional subscript: the fault-tolerant elastic mechanism FEC(i, j), 0 ≤ i < U, 0 ≤ j < V, in the table may be any one of the above T FEC schemes.
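  • The two-dimensional selection can be pictured as a lookup table indexed by data importance level i and network status level j. The sketch below is illustrative only: the sizes U = 3 and V = 2, the subtype numbers, and the name lookup_fec are assumptions; only the FEC Type value 0010 for the Tornado code comes from the text.

```python
FEC_TYPE_TORNADO = "0010"   # FEC Type value given in the text for the Tornado code

U, V = 3, 2                 # hypothetical numbers of importance and network levels

# FEC_TABLE[i][j] -> (FEC Type, FEC Subtype); the subtype numbers are made up
FEC_TABLE = [
    [(FEC_TYPE_TORNADO, 0), (FEC_TYPE_TORNADO, 1)],
    [(FEC_TYPE_TORNADO, 1), (FEC_TYPE_TORNADO, 2)],
    [(FEC_TYPE_TORNADO, 2), (FEC_TYPE_TORNADO, 3)],
]


def lookup_fec(importance_level, network_level):
    """Return the (FEC Type, FEC Subtype) pair for FEC(i, j)."""
    if not (0 <= importance_level < U and 0 <= network_level < V):
        raise ValueError("level out of range")
    return FEC_TABLE[importance_level][network_level]
```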
  • an improved Tornado erasure code is specifically employed.
  • the improved Tornado erasure code generates only one layer of the check node for a group of data nodes.
  • the coding delay is greatly reduced to meet the requirements of real-time communication.
  • Protecting packets in groups with an FEC code introduces a delay, the size of which is related to the size of the image data group.
  • The S NALUs are grouped into one group, and one NALU contains the stream data of one slice. If each frame of image is coded as one slice, the encoding end incurs a delay of S frames, and the decoding end likewise incurs a delay of S frames.
  • the relationship between NALU and the number of data nodes is as follows:
  • The delay, counted in frame times, is basically determined by the value of S, and the number of data nodes (DataNode) in turn greatly affects the value of S. Therefore, provided that the packet-loss resistance of the video communication is ensured, the delay introduced by FEC should be minimized so that the QoS of real-time video communication is further ensured.
  • the present invention employs an improved Tornado code protection algorithm in the case where the DataNode is limited.
  • The improved Tornado method does not use encoding over multi-level bipartite graphs, but only uses a single layer of check nodes.
  • the improved coding method greatly improves the flexibility of the algorithm.
  • the number of data nodes and check nodes can be arbitrarily set, and the complexity of the codec algorithm is also reduced. It can be used for real-time video communication.
  • In addition, the packet-loss resistance of the improved Tornado code is essentially not reduced when the number of data nodes is limited. The specific principles and detailed steps of the improved Tornado coding method will be described in detail later.
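  • To make the idea of a single layer of check nodes concrete, here is a generic sketch of an XOR erasure code over one bipartite graph with an iterative (peeling) decoder. It is not the improved Tornado construction of the invention, whose graph and parameters are described elsewhere: the random graph builder, the fixed check-node degree, and all function names are assumptions made for illustration.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def build_graph(m, n, degree, seed=0):
    """Hypothetical graph: each of n check nodes covers `degree` of the m data nodes."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(m), degree)) for _ in range(n)]


def encode(data_nodes, graph):
    """Each check node is the XOR of the equal-length data nodes it is connected to."""
    checks = []
    for neighbours in graph:
        acc = bytes(len(data_nodes[0]))
        for i in neighbours:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def decode(data_nodes, checks, graph):
    """Lost nodes are None. Repeatedly solve checks with exactly one lost neighbour."""
    data = list(data_nodes)
    progress = True
    while progress:
        progress = False
        for neighbours, check in zip(graph, checks):
            if check is None:
                continue
            missing = [i for i in neighbours if data[i] is None]
            if len(missing) == 1:
                acc = check
                for i in neighbours:
                    if data[i] is not None:
                        acc = xor_bytes(acc, data[i])
                data[missing[0]] = acc
                progress = True
    return data   # entries still None could not be recovered
```

  • Because there is only one layer, encoding is a single pass over the check nodes, which is in line with the low coding delay the text attributes to the single-layer design.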
  • ERRTP processes the same type of NALU and integrates the header information into the ERRTP header information.
  • the most basic difference from RTP is that in the ERRTP encapsulation process, the header information of the NALU packet with the same header information is integrated into the header information of ERRTP.
  • NALU header information structure has already been mentioned.
  • The NALU header information includes: a 1-bit F field, which indicates whether the NALU is in error;
  • a 2-bit NRI field, which indicates the importance of the NALU;
  • a 5-bit Type field, which indicates the type of the NALU.
  • the execution steps of both the sender and the receiver are as follows.
  • the sender encapsulates multiple NALU data nodes or check nodes with the same header information in the same ERRTP packet in the ERRTP encapsulation format.
  • One method is to accumulate NALUs of the same type until a certain number is reached and then package them into an ERRTP packet. If the number of NALUs of the same type does not reach that number, RTP padding can be used; this wastes some bandwidth, but the waste is insignificant. Another method is to use ordinary RTP encapsulation when there are many NALUs of different types;
  • in any case, the receiving end can tell the two apart by the ERRTP identifier and process each accordingly.
  • The header information shared by the carried NALUs is integrated into the header information of the ERRTP packet; the carried NALUs, with their header information removed, are then partitioned, encoded, and encapsulated according to the aforementioned process and filled into the payload of the ERRTP packet. How, then, is the NALU header integrated into the ERRTP header? Two solutions are given below.
  • The NRI field and the Type field in the NALU header information are filled into the PT field of the ERRTP header information, as described above; the PT field occupies the last 7 bits of the second byte of the ERRTP header information.
  • the format of such an ERRTP header has been given in Figure 7, where the difference from RTP has been indicated in bold, and some places in the other figures are explained later.
  • The V field in the ERRTP header is used as the ERRTP identifier, as mentioned above. The F field in the NALU header information is filled into the M field of the ERRTP header information, which is the first bit of the second byte of the ERRTP header information.
  • At the receiving end, the M field of the ERRTP packet is used to judge whether the NALU carried by the ERRTP packet is in error, which realizes the forbidden-bit function of the F field. It can be seen that, through the different version number, this scheme tells the receiver of the RTP data packet that the protocol is ERRTP, so that subsequent processing follows the processing flow specific to the ERRTP protocol.
  • The NALU header information byte (8 bits) is carried in the 1-bit marker (M) field and the 7-bit PT field of the original RTP header information, 8 bits in total.
  • The specific replacement order can be as follows:
  • the 2 NRI bits replace the highest 2 bits of the 7 PT bits;
  • the 5 Type bits replace the lowest 5 bits of the 7 PT bits.
  • The coding-related information in the ERRTP header, such as FEC Type, FEC Subtype, and Packet Number, is used to identify the coding mode used and the multimedia data packet.
  • the receiving end restores or partially restores the multimedia data according to the encoding related information.
  • the PT 7 bits are inherently free to use, as mentioned earlier.
  • The use of the M field is specified in RTP (RFC 3550) as follows: a specific profile can specify that the M bit is not used but merged with the PT, so that the PT can have up to 8 bits and distinguish 256 different types. Therefore, replacing the M bit with the F bit is completely RTP-compliant and does not cause interworking problems between ERRTP and traditional RTP.
  • The encapsulation format of the ERRTP of the present invention has three obvious advantages. First, the overhead is small; especially when there are multiple NALUs in one RTP packet, the number of transmitted bits is clearly reduced. Second, the relative importance of these NALUs can be determined without decoding the H.264 NALU data in the RTP packet. Third, without decoding the H.264 NALU data in the RTP packets, it can be determined whether the RTP packets can still be decoded correctly or have been damaged by bit loss.
  • At the receiving end, the 7 bits of the PT in the ERRTP header information are copied to the lowest 7 bits of a byte H (8 bits), and the highest bit of H is set to 0 as the F bit.
  • The generated byte H is then prepended to each extracted NALU, thus restoring each NALU.
  • If the F field in the ERRTP header is 1, the NALUs in the ERRTP packet are in error, so the packet can be discarded directly, which saves processing time.
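  • The bit mapping of this first solution can be sketched as below. Because F maps to M, NRI to the top two PT bits, and Type to the low five PT bits, the second ERRTP header byte ends up with the same bit layout as the NALU header byte. The function names are hypothetical, and the rest of the ERRTP header (V field, sequence number, FEC fields) is left out.

```python
def nalu_header_to_m_pt(nalu_header):
    """Map the NALU header byte onto the RTP marker bit (M) and payload type (PT)."""
    f = (nalu_header >> 7) & 0x01
    nri = (nalu_header >> 5) & 0x03
    nal_type = nalu_header & 0x1F
    m = f                              # F field fills the M field
    pt = (nri << 5) | nal_type         # NRI: top 2 PT bits, Type: low 5 PT bits
    return m, pt


def second_header_byte(m, pt):
    """Second byte of the header: M in the first bit, PT in the remaining 7 bits."""
    return ((m & 0x01) << 7) | (pt & 0x7F)


def restore_nalus(second_byte, payloads):
    """Receiver side: rebuild byte H from PT and prepend it to every carried NALU."""
    if second_byte >> 7:               # M (that is, F) set: erroneous, discard
        return []
    h = second_byte & 0x7F             # highest bit 0 serves as the F bit
    return [bytes([h]) + body for body in payloads]
```

  • In this sketch a packet whose second byte is 0x65 and whose payload holds two header-stripped NALU bodies yields two NALUs that both start with 0x65.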
  • The second solution is given below. As in the first, the NRI and Type fields in the NALU header are filled into the 7 bits of the PT field of the ERRTP header.
  • The difference is that the M field is used to identify ERRTP. One problem this raises is that the F field has nowhere to be filled in.
  • The answer is that an erroneous NALU is still transmitted with the original RTP, while a normal NALU is transmitted with ERRTP and its F bit is ignored. The specific details are as follows.
  • The M field is set to 1 to identify the ERRTP packet; it is the first bit of the second byte of the ERRTP header information.
  • Regarding the F bit, the H.264 protocol specifies that it is 1 if there is a syntax violation or an error.
  • When the network recognizes a bit error in this unit, it can set the bit to 1 so that the receiver drops the unit. This is mainly used to adapt to different kinds of network environments, such as combined wired and wireless environments.
  • The specific usage principle is: generally, when the transmitting end and the receiving end of the communication perform H.264 encoding and decoding on the video, the encoding end does not perform a "write" operation on this bit, and the decoding end performs a "read" operation on it.
  • If the bit is set, the receiving end discards the NALU during the decoding process.
  • The "write" operation on the F bit is performed mainly at a gateway between two different networks, for example in the case of transcoding (MPEG-4 to H.264, H.263 to H.264, etc.).
  • The present invention therefore ignores the F bit, which does not conflict with the original H.264 definition.
  • The M field, which was originally used to carry the F bit, is thereby freed; it is used to identify the ERRTP packet and leaves room for future extensions to carry more information.
  • The present invention handles this case as follows. In the ERRTP encapsulation format, the F field in the NALU header information is ignored. At the transmitting end, an erroneous NALU whose F field is set is still encapsulated in an RTP packet, and only normal NALUs are encapsulated in ERRTP. At the receiving end, it is judged whether a packet is an ERRTP or an RTP packet, and the packet is processed according to the corresponding encapsulation format.
  • When the F bit is used in some special cases, it serves the purpose defined by the original H.264, that is, to indicate a possible H.264 NALU syntax error. If an intermediate device such as a gateway, while coding video according to the H.264 protocol, finds that a certain NALU has a syntax error, that NALU is encapsulated separately.
  • the sender first determines whether the F field in the header information of at least one NALU is valid, and accordingly divides it into a normal NALU and an error NALU;
  • the normal NALU is encapsulated into an ERRTP packet, and the ERRTP identifier is set; the error NALU is encapsulated into an RTP packet according to the RTP encapsulation format;
  • the receiving end first determines whether the header information of the received packet is set to the ERRTP identifier, and divides it into an ERRTP packet and an RTP packet;
  • the ERRTP packet is then processed according to the ERRTP encapsulation format, and the RTP packet is processed according to the RTP packet encapsulation format.
  • For normal NALUs, the gateway performs ERRTP encapsulation of H.264 NALUs of the same type according to the method described above and according to certain rules (determined by the specific application, mainly stipulating how many NALUs of the same type are encapsulated in each ERRTP packet);
  • for an erroneous NALU, regular RTP encapsulation is required.
  • the regular RTP packet may contain only one H.264 NALU.
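  • The sender-side split and receiver-side check of this second solution might look like the sketch below. It only models the second header byte and assumes that ordinary RTP packets in this profile leave the M bit at 0, which the text implies but does not state; the function names are hypothetical.

```python
def classify_nalus(nalus):
    """Split NALUs into normal ones (F = 0, go to ERRTP) and erroneous ones (F = 1, go to RTP)."""
    normal = [n for n in nalus if (n[0] >> 7) == 0]
    erroneous = [n for n in nalus if (n[0] >> 7) == 1]
    return normal, erroneous


def errtp_second_byte(nalu_header):
    """M = 1 marks the packet as ERRTP; PT carries NRI and Type; F is ignored."""
    return 0x80 | (nalu_header & 0x7F)


def is_errtp(second_byte):
    """Receiver side: the M bit decides which encapsulation format to apply."""
    return bool(second_byte & 0x80)
```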
  • The biggest advantage of integrating the NALU header information into the ERRTP header information is that a multimedia transmission device can learn the relevant information of the NALUs it carries directly from the ERRTP header information, and accordingly implement QoS policies for the real-time delivery of H.264 multimedia data.
  • This is not possible in existing RTP, because the RTP layer is not concerned with NALU-layer information and cannot know the header information of each NALU in the payload, so QoS policies cannot be implemented.
  • On the basis of ERRTP, in order to obtain feedback from the receiving end, an enhancement is provided in which SEI carries the QoS report.
  • RTCP undertakes the QoS reporting mechanism, but it is actually a general reporting method that can report QoS and can also report other information. For specific video communication applications, reporting with RTCP is not necessarily the most appropriate.
  • H.264 itself can be considered for carrying the reported content. Starting from this point, the present invention directly uses H.264 to carry QoS report information, which avoids the use of additional channels and implements an "in-band" reporting mechanism.
  • Another basis for transmitting QoS reports by H.264 higher layer protocols is that in current video communication applications, the adaptation measures for network transmission are mainly based on terminals, rather than network intermediate devices such as routers, switches or gateways. Therefore, the encapsulation of the QoS report does not depend on the underlying protocol.
  • the terminal can understand the QoS report information carried in the H.264 to implement QoS monitoring, so it can be independent of the underlying RTCP and other protocols.
  • the "in-band" reporting mechanism of H.264 it does not mean to exclude the application of the RTCP reporting mechanism.
  • The two mechanisms can be used alternatively or together, and the use of H.264 can reduce the reporting traffic of RTCP.
  • H.264 packets can be given various protection measures. An H.264 packet bearing a QoS report can be regarded as important data and, according to the principle of Unequal Protection (UEP), given high-strength protection. This ensures the correct arrival of the report data and improves the reliability of the QoS monitoring.
  • Carrying QoS reports with the H.264 extended message mechanism is roughly divided into the following three basic steps.
  • each multimedia communication terminal statistically generates a QoS report of H.264 multimedia communication.
  • The content of these reports may be the same as the SR and RR report contents of RTCP, or may of course differ, but the information they describe, such as the quality of service of the H.264 media communication and the network status, is consistent;
  • the terminal carries the QoS report by using the H.264 extended message and sends it to other communication terminals.
  • the H.264 extended message mechanism has been mentioned above.
  • the SEI message is basically used by the present invention.
  • Later extensions of H.264 can also use other extended message payloads;
  • While sending its own QoS report, the terminal also receives the QoS reports sent by other terminals. Each terminal then executes its QoS policy according to these QoS reports.
  • the present invention uses the SEI message to carry the QoS report.
  • The main content of the SR and RR reports of RTCP can be used directly as the payload of the H.264 SEI message, so that these messages are carried by the extended SEI message.
  • a specific SEI extended message is defined specifically for carrying QoS reports.
  • the invention stores the SR and RR report messages similar to RTCP in the SEI domain, which not only ensures the transmission efficiency, but also effectively feeds back the channel state and the decoded information, and facilitates the interactive anti-data packet loss between the encoding end and the decoding end.
  • the specific structure is shown in Figure 8, except that the header information is arranged according to the SEI message structure, and other QoS report contents are drawn from the format of the SR and RR reports of RTCP.
  • The first byte (byte 0) is a payload type field (SEI Type), which is used to indicate that the payload is a corresponding QoS report.
  • the second and third bytes are the payload length field (SEI Packet-Length), which is used to indicate the corresponding QoS report length, which is the same as the length field in the RTCP QoS report;
  • the 4th byte and later are the payload of the SEI message, that is, used to fill the corresponding QoS report.
  • the QoS report is also divided into the sender report and the receiver report.
  • the load type field indicates the difference, that is, the SEI Type value is different.
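  • The byte layout described above (one type byte, two length bytes, then the report) can be sketched with Python's struct module. The two SEI Type values are invented placeholders, the length is treated as a big-endian byte count of the report, and the function names are hypothetical; the actual length convention follows the RTCP report format and Figure 8, which may differ from this simplification.

```python
import struct

SEI_TYPE_SENDER_REPORT = 0x80     # hypothetical value
SEI_TYPE_RECEIVER_REPORT = 0x81   # hypothetical value


def pack_qos_sei(sei_type, report):
    """Byte 0: SEI Type; bytes 1-2: SEI Packet-Length; byte 3 onward: QoS report."""
    return struct.pack("!BH", sei_type, len(report)) + report


def unpack_qos_sei(message):
    """Return (sei_type, report) or raise ValueError if the message is truncated."""
    if len(message) < 3:
        raise ValueError("message shorter than the 3-byte header")
    sei_type, length = struct.unpack("!BH", message[:3])
    report = message[3:3 + length]
    if len(report) != length:
        raise ValueError("truncated QoS report")
    return sei_type, report
```

  • The receiver can tell a sender report from a receiver report by the first byte alone, before parsing the report body.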
  • the specific content of the QoS report can be the same as the RTCP SR and RR reports, as shown in Figure 2:
  • the version information field (V), which is 2 bits, the same as RTCP;
  • the padding field (P), which is 1 bit, is used to indicate whether there is padding content, the same as RTCP;
  • the reception report count field, which is 5 bits, used to indicate the number of reception report blocks contained in the QoS report;
  • the sender SSRC field which is 32 bits, is used to identify the sender of the quality of service report
  • sender information block for describing the information about the sender of the report
  • a plurality of receiving report blocks are included for describing multimedia statistical information from different sources, each block containing the identifier of the source and related statistical indicators of the multimedia stream, and the meanings of various indicators have been described in the previous RTCP;
  • the content of the QoS report given in Figure 8 is basically the same as that of RTCP.
  • Because the RR and the SR are written into the SEI field, the RTCP information can be transmitted without a dedicated logical channel, which saves part of the bandwidth overhead.
  • The essence of the present invention lies in the in-band bearer with the SEI message.
  • As for how the QoS report is statistically generated, as long as the purpose of QoS monitoring is achieved, the essence and scope of the present invention are not affected.
  • Various QoS policies can be carried out on this basis. For example, the cumulative packet loss field of RTCP can be used to feed back decoding information in two-way video communication (where each terminal has both an encoder and a decoder), which facilitates interactive packet-loss resistance.
  • the rate control algorithm can further ensure that the encoding end rate is nearly constant according to the information in the arrival delay jitter field; the sender byte count field can estimate the average rate of the payload, so that the sending end can reset the encoder parameters according to the network state. This includes adjusting the target frame rate, restoring the image quality, and the resolution of the original image.
  • With the H.264 "in-band" reporting mode, H.264 data packets can adopt various protection measures. H.264 data packets carrying QoS reports can be regarded as important data and, according to the principle of unequal protection, given high-strength protection. This ensures the correct arrival of the report data.
  • The SEI carrying the QoS report is itself carried by a NALU and, as described above, the NALU header information indicates the importance of the content. The communication terminal can therefore set the NALU header according to the reliability required for transmitting the QoS report:
  • the nal_ref_idc field can be set to 1, 2, 3, and so on, and in the fault-tolerant elastic coding, protection measures of different strength are taken according to the level of this field.
  • The communication terminal can also dynamically adjust the transmission period of the SEI-based QoS report according to the current network state and the requirements of the high-level application.
  • the interval for writing RTCP information to the SEI domain (that is, the reporting period) is the same as the recommended RTCP transmission interval in RFC3550.
  • the possible reporting period may not be exactly the same as that specified in RFC 3550, but may be adjusted.
  • the reporting period is determined by the needs of the specific application. For example, an important use of reporting data is to dynamically estimate network performance: packet loss rate, latency, jitter, and more. If these data need to be detected frequently, the reporting period should be short, otherwise the reporting period can be long.
  • The SEI message can carry not only the QoS report of the H.264 video but also, mixed in, the QoS reports of multiple media streams; it is only necessary to append the corresponding reception report blocks of the various media streams to the QoS report. For an audio stream, for example, it suffices to add to the SR report the report block identified by the SSRC of that source.
  • The communication terminal may also choose the existing RTCP transmission, or may use either or both of the H.264 extended message and RTCP.
  • The present invention provides an adaptive-protection video transmission method that estimates the current communication status and adjusts the protection policy accordingly. First, according to the factors that affect the performance of the protection method, different parameter configurations are given and a multi-level series of protection policies with different protection capabilities is set, from which an efficient and reliable policy is selected under different communication conditions. Second, the network status and communication quality are obtained from communication statistics at the receiving end and sent back to the sender. Finally, the sender selects the most appropriate protection policy level according to the returned communication quality statistics.
  • The key to the scheme therefore also lies in the method of collecting communication quality statistics and the channel for sending the statistics back.
  • the information of the packet loss rate and its location can be counted by using the sequence number loss of the H.264 NALU, and the extended SEI message structure of the payload part of the NALU is defined to carry the statistical information, and the statistical data is transmitted from the receiving end to the transmitting end.
  • Although this feedback mechanism differs from the SR/RR format of the QoS report, those skilled in the art will understand that the fundamental principles of the two methods are the same and only the content carried by the SEI differs. The following description therefore does not separately distinguish this embodiment, in which the SEI carries the network packet loss ratio, from the QoS reporting scheme proposed above.
  • Tornado erasure codes require parameters such as the number of data nodes, the number of check nodes, the scaling ratio, the number of check node layers, and the bipartite graphs of each level used to calculate the check nodes.
  • the transmitting end divides the video stream data into data nodes, and then generates a check node according to the Tornado encoding method, and sends it to the receiving end together; the receiving end performs error correction according to the Tornado decoding method to obtain video stream data.
  • This embodiment presets a series of protection strategies with different levels of protection strength, used to protect the video stream data at different communication quality levels. Different levels of protection policy can thus adapt to changes in network communication quality: they meet the protection requirements when the channel degrades, and they allow the protection strength to be reduced appropriately when the channel improves, thereby reducing system overhead and saving processing and bandwidth resources.
  • To give different levels of protection strategy, Tornado erasure codes with different parameters must be set. As noted above, the parameters that affect the protection performance of a Tornado erasure code are mainly the number of data nodes, the number of check nodes, and the random distribution of node degrees on both sides of the bipartite graph. For simplicity, Tornado codes of different capabilities generally use a unified bipartite-graph construction, and protection strategies of different strengths are obtained by using different numbers of data nodes and check nodes. According to the Tornado erasure code principle, different numbers of data nodes and check nodes yield Tornado erasure codes of different code rates or redundancy rates, and thus different protection strengths and system overheads.
  • the receiving end receives the data and performs Tornado erasure code decoding to obtain the video stream data, and performs statistics according to the data loss situation, and obtains statistical information to represent the communication quality.
  • The sender needs to adjust the protection policy according to the communication quality, so transmission statistics must be collected.
  • The receiver collects the transmission statistics according to the sequence numbers of the NALUs of the H.264 video data.
  • each terminal of the communication system has both an encoder and a decoder.
  • the NALU is sequence numbered, that is, the NALU sent by all the senders has a uniform sequence number. Therefore, the receiver can determine whether there is a NALU loss according to the sequence number of the received NALU. If the NALU sequence number is discontinuous, it indicates that there is a NALU loss.
  • The missing sequence numbers are the sequence numbers of the lost NALUs, and their count is the number of lost NALUs.
  • the receiving end can also send the packet loss information directly to the sending end, and the sending end performs statistics. Using the NALU sequence number for statistics not only ensures that the statistics are accurate, but also directly uses the existing data information without additional bearer overhead.
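  • A minimal sketch of this sequence-number check follows. It assumes sequence numbers do not wrap around within one statistics window, and it cannot see losses before the first or after the last received NALU; the function name and the loss-rate definition (lost divided by expected) are illustrative assumptions.

```python
def loss_statistics(received_seq_numbers):
    """Return (sorted list of lost sequence numbers, loss rate) for one window."""
    if not received_seq_numbers:
        return [], 0.0
    seen = set(received_seq_numbers)
    first, last = min(seen), max(seen)
    lost = [s for s in range(first, last + 1) if s not in seen]
    expected = last - first + 1
    return lost, len(lost) / expected
```

  • For the received sequence 10, 11, 13, 14, 17 this reports NALUs 12, 15 and 16 as lost, a loss rate of 3/8.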
  • The receiving end sends the statistical information and other data loss information back to the sending end through the extended SEI message. After the receiving end has collected statistics on the transmission status, they need to be sent back to the sender.
  • the extended SEI message structure is specifically configured to carry the transmission status statistics sent back from the receiving end. After completing the statistics, the receiving end writes the information into the specifically defined extended SEI message body, and then writes it into the SEI field of the encoded code stream sent back by the terminal, and sends it back to the transmitting end. After receiving the SEI message, the sender can directly learn the statistics or obtain the ALSR, so as to establish a true perception mechanism of the packet loss rate of the network.
  • the SEI message is also carried by the basic unit NALU of the H.264 code stream.
  • Each SEI field contains one or more SEI messages, and the SEI message is composed of SEI header information and SEI payload.
  • the SEI header information includes two codewords: payload type and payload size.
  • The length of the payload type is not fixed. For example, when the type is between 0 and 254, it is represented by one byte; when the type is between 255 and 509, it is represented by two bytes, 0xFF00 to 0xFFFE, and so on, so that any number of payload types can be defined. In the existing H.264 standard, types 0 to 18 have been defined as specific information such as buffering period, picture timing, and the like. It can be seen that the SEI field defined in H.264 can store enough user-defined information as required.
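  • The variable-length coding of the SEI payload type in H.264 works by emitting one 0xFF byte for every 255 in the value and then a final byte below 255, so values 0 to 254 take one byte and 255 to 509 take two bytes (0xFF00 to 0xFFFE). The sketch below illustrates that rule; the function names are hypothetical.

```python
def encode_payload_type(payload_type):
    """Emit 0xFF for every 255 in the value, then the remainder as the last byte."""
    out = bytearray()
    while payload_type >= 255:
        out.append(0xFF)
        payload_type -= 255
    out.append(payload_type)
    return bytes(out)


def decode_payload_type(data):
    """Return (payload_type, number of bytes consumed)."""
    value, pos = 0, 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    value += data[pos]
    return value, pos + 1
```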
  • an extended SEI message for carrying statistical information is defined in the reserved SEI payload type.
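The variable-length header codewords can be illustrated with a short sketch. It assumes the byte-wise rule in which every leading 0xFF byte adds 255 and the final byte, which is below 0xFF, adds its own value; under that assumed rule a single byte covers values up to 254 and the two-byte forms 0xFF00 to 0xFFFE decode to 255 through 509. The payload type 200 used for the statistics message is hypothetical, and the NAL unit header and emulation prevention bytes of a real stream are left out.

```python
def encode_sei_value(value):
    """Encode a payload type or payload size as 0xFF-prefixed bytes (assumed rule)."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)      # each 0xFF byte stands for 255
        value -= 255
    out.append(value)         # final byte is always below 0xFF
    return bytes(out)


def decode_sei_value(data, pos=0):
    """Decode one codeword starting at pos; return (value, next position)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1


def build_sei_message(payload_type, payload):
    """SEI header (type, size) followed by the payload bytes."""
    return encode_sei_value(payload_type) + encode_sei_value(len(payload)) + payload


EXT_STATS_TYPE = 200  # hypothetical reserved type for the extended statistics message
msg = build_sei_message(EXT_STATS_TYPE, b"\x03\x00\x08")
ptype, pos = decode_sei_value(msg)
psize, pos = decode_sei_value(msg, pos)
print(ptype, psize, msg[pos:pos + psize])
```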
  • The sender adjusts the Tornado erasure code according to the statistics sent back and adopts a protection strategy better suited to the current transmission conditions; that is, it selects the protection strategy of the appropriate level.
  • The transmitting end also presets a series of decision thresholds corresponding to the different protection levels, one threshold for entering each level, and then selects the level whose threshold interval the ALSR falls into.
  • Different protection strategy series are used for data of different importance. Because critical and non-critical data have different protection requirements, two separate protection strategy series are set up, one for critical data and one for non-critical data, to improve the fit further. The two kinds of data can then be handled independently, each with a protection strength suited to its requirements, which improves system efficiency.
  • $n$ is the number of data nodes;
  • $l$ is the number of check nodes.
  • The Tornado code protection scheme determined by the parameters $n$ and $l$ is denoted $TN(n+l, n)$. The protection scheme series corresponding to the key data is therefore $TN_K(n_0+l_0, n_0), \ldots$
  • $TN_K(n_0+l_0, n_0)$ is used to protect the key data
  • and $TN_{NK}(n_0+l'_0, n_0)$ is used to protect the non-critical data.
  • When $G_{L-1} \le \mathrm{ALSR} \le 1$, $TN_K(n_{L-1}+l_{L-1}, n_{L-1})$ is used to protect the key data
  • and $TN_{NK}(n_{L-1}+l'_{L-1}, n_{L-1})$ is used to protect the non-critical data.
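The threshold-based choice of a protection level can be sketched as a lookup. The threshold values, the node counts, and the names `G`, `KEY_CHECKS`, `NON_KEY_CHECKS`, and `select_protection` are all hypothetical; the sketch further assumes that level i applies when the ALSR lies between threshold i and the next threshold, with the first threshold equal to 0.

```python
from bisect import bisect_right

# Hypothetical thresholds: level i applies when G[i] <= ALSR < G[i + 1],
# and the last level applies up to ALSR = 1.
G = [0.0, 0.02, 0.05, 0.10]

N_DATA = 20                      # hypothetical number of data nodes n
KEY_CHECKS = [2, 4, 6, 10]       # hypothetical check-node counts for key data
NON_KEY_CHECKS = [1, 2, 3, 5]    # hypothetical, weaker series for non-critical data


def select_level(alsr):
    """Index of the protection level whose threshold interval contains alsr."""
    return bisect_right(G, alsr) - 1


def select_protection(alsr):
    """Return the TN(n + l, n) parameters for key and non-critical data."""
    level = select_level(alsr)
    return {
        "level": level,
        "key": (N_DATA + KEY_CHECKS[level], N_DATA),
        "non_key": (N_DATA + NON_KEY_CHECKS[level], N_DATA),
    }


print(select_protection(0.03))
```

Keeping the two series in separate tables mirrors the idea that key and non-critical data are tuned independently.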
  • The sender retransmits data according to the lost-data information sent back by the receiver.
  • When the receiving end counts the lost NALUs, it obtains the positioning information of the image frame corresponding to each lost NALU; this information includes the sequence number of the frame and the position within the frame.
  • The receiving end sends the positioning information back to the sending end, which can then locate the corresponding video stream data and resend it.
  • Video stream data that arrives with too long a delay has lost its value, but in some service scenarios or under certain mechanisms, such as a large buffer range, data with a certain delay is still valuable.
  • Such data can be used to avoid interrupting video stream playback. The retransmission mechanism is therefore valuable for improving the reliability and quality of service of video stream communication.
  • The basic idea of the scheme is to use the NALU sequence number statistics at the receiving end to find information about the missing data, such as the location of the lost slice.
  • On the one hand, an efficient algorithm simply replaces the lost data to conceal the loss; on the other hand, the error information is fed back to the sender.
  • The extended SEI message of H.264 establishes an error information feedback channel from the receiving end to the transmitting end. As soon as the sender learns of the error, it adopts the strategy of segmented successive intra-frame coding on the affected slice to prevent the error from spreading.
  • The transmitting end encodes the video data to obtain a video stream, encapsulates it in NALUs, and transmits them to the receiving end in packets.
  • The receiving end receives the packets and decodes them. At this point it needs to determine whether video stream data has been lost so that the subsequent error elimination operations can be performed.
  • The error elimination process is roughly divided into three major steps: concealment, feedback, and diffusion elimination.
  • The receiving end judges whether data has been lost from interruptions in the NALU sequence, and compiles the information about the lost data, that is, the error information.
  • The NALU is the basic unit of H.264 video stream data transfer, and each NALU has a unique sequence number. The receiving end therefore knows which NALUs are lost from gaps in the NALU sequence numbers, and can apply an error concealment strategy to the lost data.
  • Using the NALU sequence number for statistics not only keeps the statistical information accurate but also reuses existing data, so no additional bearer overhead is required.
  • The receiving end learns the sequence number from the header information of each received NALU and detects that an error has occurred when the sequence numbers are discontinuous.
  • From the NALU preceding the lost one, the receiving end infers which video data the missing NALU should have carried, and so locates the data lost through the error. For example, if the NALU preceding the lost NALU carried the first slice of the Nth frame, the position of the slice carried by the lost NALU can be inferred from the order of transmission: it should be the next slice of the same frame.
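The inference from the preceding NALU can be written down as a small helper. It is only a sketch under simplifying assumptions: each NALU carries exactly one slice, every frame has the same number of slices, and slices are sent in order. The name `locate_lost_slices` is hypothetical.

```python
def locate_lost_slices(prev_frame, prev_slice, lost_count, slices_per_frame):
    """Infer (frame, slice index) for each lost NALU from the last NALU received.

    Assumptions: one slice per NALU, a fixed number of slices per frame,
    and slices transmitted in order.
    """
    positions = []
    frame, slice_idx = prev_frame, prev_slice
    for _ in range(lost_count):
        slice_idx += 1
        if slice_idx == slices_per_frame:   # move on to the next frame
            frame, slice_idx = frame + 1, 0
        positions.append((frame, slice_idx))
    return positions


# The last NALU received carried slice 0 of frame 7 and one NALU is missing,
# so the lost NALU carried slice 1 of frame 7.
print(locate_lost_slices(prev_frame=7, prev_slice=0, lost_count=1, slices_per_frame=3))
```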
  • The receiving end needs to resynchronize with the video information. Because the H.264 video code stream is transmitted continuously, the receiving end must be synchronized with the data stream before it can receive correctly, and once the data stream is interrupted it must resynchronize. The decoder resynchronizes by finding the next NALU header after the interruption. After that, the receiving end performs error concealment: the lost NALU is discarded, so the entire slice it carried is lost.
  • The error concealment strategy is to replace the lost data with data that is adjacent in the time domain or the spatial domain. For example, the lost slice is concealed with the image data of the slice at the corresponding position in the frame preceding the one in which the data was lost.
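Time-domain concealment of this kind amounts to copying the co-located region from the previous decoded frame. The sketch below assumes, for simplicity, that a frame is a list of pixel rows and that a slice covers a contiguous range of rows; the name `conceal_slice` is hypothetical.

```python
def conceal_slice(current, previous, row_start, row_end):
    """Replace rows [row_start, row_end) of `current` with the co-located
    rows of `previous` (temporal concealment of one lost slice)."""
    for r in range(row_start, row_end):
        current[r] = list(previous[r])   # copy, so the frames stay independent
    return current


previous_frame = [[1, 1], [2, 2], [3, 3]]
current_frame = [[9, 9], [0, 0], [7, 7]]   # the middle row belongs to the lost slice
print(conceal_slice(current_frame, previous_frame, 1, 2))
```

A real decoder would work on macroblocks and could also use motion information, which this sketch leaves out.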
  • After obtaining the error information, the receiving end feeds it back to the transmitting end.
  • Feeding back the error information requires a feedback channel.
  • The first embodiment of the present invention uses the existing H.264 communication mechanism, defining an extended SEI message that carries the error information, to establish the feedback channel, so that the sender can use the error information to prevent the error from spreading. Combining the error information feedback mechanism with the error diffusion elimination strategy at the transmitting end avoids the error spread caused by the error concealment strategy applied earlier at the receiving end.
  • The extended SEI message of H.264 provides an information feedback mechanism from the receiving end to the transmitting end, so that the sending end learns in time which NALUs were lost and can eliminate error spreading promptly, preventing later error spread caused by the lost data.
  • The advantages of establishing the information feedback mechanism within the H.264 system are that it saves network bandwidth overhead, saves system processing resources, and does not affect interoperability.
  • The SEI message is also carried by NALUs, the basic unit of the H.264 code stream.
  • Each SEI field contains one or more SEI messages, and each SEI message consists of SEI header information and an SEI payload.
  • The SEI header information includes two codewords: payload type and payload size.
  • The length of the payload type is not fixed. For example, when the type is between 0 and 255 it is represented by one byte.
  • When the type is between 256 and 511 it is represented by two bytes, 0xFF00 to 0xFFFE, and so on, so that the user can define any number of payload types.
  • Types 0 to 18 have already been defined to carry specific information such as buffering period and picture timing. The SEI field defined in H.264 can therefore hold as much user-defined information as required.
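The error information carried in the extended SEI payload could be serialized as in the following sketch. The byte layout (a one-byte count followed by a four-byte frame number and a one-byte slice index per lost slice, in network byte order) is purely an assumption for illustration; the patent text here does not specify a layout, and the function names are hypothetical.

```python
import struct


def pack_error_info(lost):
    """Serialize a list of (frame_number, slice_index) pairs.

    Hypothetical layout: 1-byte count, then per entry a 4-byte frame number
    and a 1-byte slice index, all in network byte order.
    """
    body = struct.pack("!B", len(lost))
    for frame, slice_idx in lost:
        body += struct.pack("!IB", frame, slice_idx)
    return body


def unpack_error_info(body):
    """Inverse of pack_error_info."""
    (count,) = struct.unpack_from("!B", body, 0)
    return [struct.unpack_from("!IB", body, 1 + 5 * k) for k in range(count)]


payload = pack_error_info([(120, 1), (121, 0)])
print(len(payload), unpack_error_info(payload))
```

The resulting bytes would form the SEI payload, preceded by the payload type and payload size codewords of the SEI header.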
  • The transmitting end starts error diffusion elimination according to the error information fed back.
  • Error diffusion elimination that uses the error information is more effective than the existing error diffusion elimination without feedback.
  • The sender can take targeted precautions regarding the lost slice, such as avoiding the lost slice as a reference in later encoding, which minimizes the receiver's dependence on that slice when decoding.
  • Error diffusion is also confined to the inside of the same slice.
  • A strategy of segmented successive intra-frame coding is adopted: after the error has occurred, the region of the lost slice in each subsequent frame is split into new slices, for example by splitting off P macroblocks as a new slice, which is then intra-coded to remove the slice's reference to, or dependency on, the previously lost slice.
  • The H.264 real-time video transmission system uses a data rate control scheme to limit the fluctuation of the data in each frame, so that the amount of data per frame is roughly equal and video transmission is more stable. The amount of data intra-coded in any one frame, that is, the number of macroblocks, therefore cannot be too large, or it will exceed the H.264 data rate control range.
  • Figure 9 shows the principle of error spread elimination by segmented successive intra-frame coding.
  • The error is detected and the error information is fed back to the transmitting end; that is, the frame containing the slice of lost data and its intra-frame positioning information are sent back to the transmitting end through the extended SEI message.
  • The sender extracts the location of the lost slice from the SEI message. For example, each frame in FIG. 9 is divided into three slices, Slice #0, Slice #1, and Slice #2, and Slice #1 of the nth frame is lost in transmission, so segmented successive intra-frame coding is required.
  • The encoding end splits P macroblocks, counted from the starting position in macroblock scanning order, into a new Slice #3; the remaining macroblocks still form Slice #1, so there are now four slices, of which the new Slice #3 is to be intra-coded.
  • Slice #3, newly split off in the previous step, is intra-coded and transmitted as Slice #3; the other slices are encoded as usual.
  • The number of macroblocks P split off each time should be as large as possible, to reduce the number of splits, the processing delay, and the range affected, while still staying within the H.264 data rate control range mentioned above.
  • The number of macroblocks split off can differ from one time to the next, but the final split must ensure that all macroblocks in the lost slice have been processed.
  • For example, one frame of video stream data consists of 240 macroblocks, initially divided into slices of 80 macroblocks each: macroblocks 1 - 80 are Slice #0, macroblocks 81 - 160 are Slice #1, and macroblocks 161 - 240 are Slice #2.
  • The appropriate segment size P is determined to be 12 macroblocks.
  • In the (n+1)th frame, the first 12 macroblocks of the lost slice are intra-coded to form Slice #3.
  • In the (n+2)th frame, Slice #3 can use conventional predictive coding, and the next 12 macroblocks are intra-coded to form Slice #4; and so on, until in the (n+7)th frame the last remaining 8 macroblocks are intra-coded to form Slice #9, which completes the error spread elimination by segmented successive intra-frame coding.
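The worked example can be reproduced with a short scheduling sketch. It assumes a constant segment size P (the text allows P to vary) and one new intra-coded segment per frame; the function name `intra_refresh_schedule` and the tuple layout are hypothetical.

```python
def intra_refresh_schedule(first_mb, last_mb, p, loss_frame, next_slice_id):
    """Plan segmented successive intra coding for a lost slice.

    Returns one tuple (frame, slice id, first macroblock, last macroblock)
    per frame, starting with the frame after the loss. Assumes a constant
    segment size p and one new intra-coded segment per frame.
    """
    schedule = []
    start, frame, slice_id = first_mb, loss_frame + 1, next_slice_id
    while start <= last_mb:
        end = min(start + p - 1, last_mb)   # the last segment may be shorter
        schedule.append((frame, slice_id, start, end))
        start, frame, slice_id = end + 1, frame + 1, slice_id + 1
    return schedule


# Slice #1 (macroblocks 81..160) of frame n is lost; take n = 0 and P = 12.
for entry in intra_refresh_schedule(81, 160, p=12, loss_frame=0, next_slice_id=3):
    print(entry)
```

With these inputs the plan has seven entries, covering frames n+1 to n+7 and Slice #3 to Slice #9, and the final segment holds the remaining 8 macroblocks.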
  • The seventh embodiment of the present invention sets up an erasure code that has only one check node layer and performs data transfer protection based on that erasure code.
  • This Tornado erasure code scheme has only one check node layer: the intermediate check node layers of the Tornado code are removed, and so is the inherent requirement that the last check node layer be generated by a Reed-Solomon code. The erasure code of the present invention thus has only one data node layer and one check node layer, as shown in FIG. 10. It can be described as a structurally simplified Tornado code, that is, an improved Tornado code.
  • The data node size L1 of the improved Tornado code of the present invention, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer can be determined according to actual needs.
  • The data node size L1, the number n of data nodes in the data node layer, and the number L of check nodes in the check node layer are determined according to factors such as the data transmission rate, the data type (for example audio or video data), the required data protection capability, and the maximum network delay that can be tolerated.
  • In the prior-art Tornado code, the number of nodes scales by a fixed ratio between adjacent layers, with a separate ratio between the mth layer and the last layer, and these ratios determine the total number of nodes, Total N, of the code.
  • Because the improved Tornado code of the present invention has no intermediate check node layers, the implicit condition that the node counts be integers is no longer required.
  • The number L of check nodes in the check node layer of the improved Tornado code of the present invention satisfies $L \le n$. The scaling ratio between the number of nodes in the data node layer and in the check node layer can be set arbitrarily; that is, given the number n of data nodes, L can be set flexibly.
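An erasure code with one data node layer and one check node layer can be sketched as follows. The sketch assumes that each check node is the XOR of the data nodes it is connected to, as is usual for Tornado-style codes, and it recovers erasures by repeatedly solving check nodes that have exactly one missing neighbour. The connection graph and all function names are hypothetical; the patent's actual graph construction is not reproduced here.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized nodes."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_nodes, graph):
    """graph[j] lists the data-node indices connected to check node j."""
    size = len(data_nodes[0])
    checks = []
    for neighbours in graph:
        acc = bytes(size)
        for i in neighbours:
            acc = xor_bytes(acc, data_nodes[i])
        checks.append(acc)
    return checks


def decode(received, checks, graph):
    """Recover erased data nodes (None entries) by iterative peeling."""
    data = list(received)
    progress = True
    while progress:
        progress = False
        for j, neighbours in enumerate(graph):
            if checks[j] is None:
                continue                  # this check node was lost as well
            missing = [i for i in neighbours if data[i] is None]
            if len(missing) == 1:         # exactly one unknown: solvable
                acc = checks[j]
                for i in neighbours:
                    if data[i] is not None:
                        acc = xor_bytes(acc, data[i])
                data[missing[0]] = acc
                progress = True
    return data


data = [b"ab", b"cd", b"ef", b"gh"]       # n = 4 data nodes of size 2 bytes
graph = [[0, 1], [1, 2], [2, 3]]          # 3 check nodes, hypothetical graph
checks = encode(data, graph)
print(decode([b"ab", None, None, b"gh"], checks, graph))
```

Any data node that no surviving check node can isolate stays `None`, so the caller can tell which losses were not recoverable.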
  • The present invention integrates the above six enhancement technologies, modularizes the entire H.264/ERRTP transmission architecture, and combines the technologies on one protocol stack, so that they not only deliver their individual advantages but also reinforce one another, yielding better reliability and quality of service.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention relates to a multimedia communication method and a terminal thereof, intended to improve transmission reliability and communication quality. Based on the existing Real-time Transport Protocol (RTP), the error-resilient real-time transport protocol (ERRTP) provides a transport-layer encapsulation format that carries information about the error-resilience coding method, so that multimedia data is marked with the corresponding error-resilience coding information when it is transmitted over ERRTP, which integrates the error-resilience mechanism into the transport layer. Using ERRTP's private encapsulation method, adapted to the protocol header information given by the H.264 NALU (Network Abstraction Layer Unit) architecture, the header information bytes of all NALUs carried in the same ERRTP packet can be integrated into the ERRTP header, which places the important NALU information in the ERRTP header information and improves transmission efficiency. The extended message mechanism of H.264 itself is used to carry QoS report information, implementing an "in-band" QoS reporting mechanism in the upper-layer protocol and thus avoiding the additional overhead of using a separate channel.
PCT/CN2006/002961 2005-11-03 2006-11-03 Procede de communication multimedia et terminal de celui-ci WO2007051425A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB2006100690163A CN100466725C (zh) 2005-11-03 2005-11-03 多媒体通信方法及其终端
CN200510110013.5 2005-11-03

Publications (1)

Publication Number Publication Date
WO2007051425A1 true WO2007051425A1 (fr) 2007-05-10

Family

ID=37390610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2006/002961 WO2007051425A1 (fr) 2005-11-03 2006-11-03 Procede de communication multimedia et terminal de celui-ci

Country Status (2)

Country Link
CN (1) CN100466725C (fr)
WO (1) WO2007051425A1 (fr)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8356331B2 (en) * 2007-05-08 2013-01-15 Qualcomm Incorporated Packet structure for a mobile display digital interface
JP4488027B2 (ja) * 2007-05-17 2010-06-23 ソニー株式会社 情報処理装置および方法、並びに、情報処理システム
WO2009099510A1 (fr) 2008-02-05 2009-08-13 Thomson Licensing Procédés et appareil pour la segmentation implicite de blocs dans le codage et le décodage vidéo
CN102075312B (zh) * 2011-01-10 2013-03-20 西安电子科技大学 基于视频服务质量的混合选择重传方法
CN102438002B (zh) * 2011-08-10 2016-08-03 中山大学深圳研究院 一种基于Ad hoc网络下的视频文件数据传输方法
US8549570B2 (en) * 2012-02-23 2013-10-01 Ericsson Television Inc. Methods and apparatus for managing network resources used by multimedia streams in a virtual pipe
CN102956233B (zh) * 2012-10-10 2015-07-08 深圳广晟信源技术有限公司 数字音频编码的附加数据的扩展结构及相应的扩展装置
CN105653530B (zh) * 2014-11-12 2021-11-30 上海交通大学 一种高效可伸缩的多媒体传送、存储和呈现方法
FR3031428A1 (fr) * 2015-01-07 2016-07-08 Orange Systeme de transmission de paquets de donnees selon un protocole d' acces multiple
CN105307050B (zh) * 2015-10-26 2018-10-26 何震宇 一种基于hevc的网络流媒体应用系统及方法
CN107181783B (zh) * 2016-03-11 2020-06-23 上汽通用汽车有限公司 在车辆中利用以太网传输数据的方法和装置
CN105916058B (zh) * 2016-05-05 2019-09-20 青岛海信宽带多媒体技术有限公司 一种流媒体缓冲播放方法、装置及显示设备
CN106921843B (zh) * 2017-01-18 2020-06-26 苏州科达科技股份有限公司 数据传输方法及装置
CN109756468B (zh) * 2017-11-07 2021-08-17 中兴通讯股份有限公司 一种数据包的修复方法、基站及计算机可读存储介质
CN108702487A (zh) * 2017-11-20 2018-10-23 深圳市大疆创新科技有限公司 无人机的图像传输方法和装置
CN109891817B (zh) * 2018-02-08 2020-09-08 Oppo广东移动通信有限公司 无线通信方法、终端和网络设备
CN110139150A (zh) * 2019-04-12 2019-08-16 北京物资学院 一种视频处理方法及装置
CN110233716A (zh) * 2019-05-31 2019-09-13 北京文香信息技术有限公司 一种通信交互方法、装置、存储介质、终端设备及服务器
CN110740135A (zh) * 2019-10-21 2020-01-31 湖南新云网科技有限公司 一种多媒体教室的同屏数据传输方法、装置及系统
CN111010593A (zh) * 2019-11-08 2020-04-14 深圳市麦谷科技有限公司 基于flv格式封装h.265视频数据的方法和装置
CN110769206B (zh) * 2019-11-19 2022-01-07 深圳开立生物医疗科技股份有限公司 一种电子内窥镜信号传输方法、装置和系统及电子设备
CN112866178B (zh) * 2019-11-27 2023-09-05 北京沃东天骏信息技术有限公司 音频数据传输的方法和装置
CN111083510A (zh) * 2019-12-18 2020-04-28 深圳市麦谷科技有限公司 推送hevc视频的方法和装置
CN111490984B (zh) * 2020-04-03 2022-03-29 上海宽创国际文化科技股份有限公司 一种网络数据编码及其加密算法
CN111629282B (zh) * 2020-04-13 2021-02-09 北京创享苑科技文化有限公司 一种实时的纠删码编码冗余度动态调节方法
CN111629279B (zh) * 2020-04-13 2021-04-16 北京创享苑科技文化有限公司 一种基于定长格式的视频数据传输方法
CN111800388A (zh) * 2020-06-09 2020-10-20 盐城网之易传媒有限公司 一种媒体信息处理方法及媒体信息处理装置
CN113873340B (zh) * 2021-09-18 2024-01-16 恒安嘉新(北京)科技股份公司 一种数据处理方法、装置、设备、系统及存储介质
CN113938881A (zh) * 2021-10-18 2022-01-14 上海华讯网络系统有限公司 适用于互联网数据的传输系统及方法
CN114615549B (zh) * 2022-05-11 2022-09-20 北京搜狐新动力信息技术有限公司 流媒体seek方法、客户端、存储介质和移动设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1065168A (zh) * 1991-02-19 1992-10-07 菲利浦光灯制造公司 传输系统以及用于该系统的接收机
US20030229822A1 (en) * 2002-04-24 2003-12-11 Joohee Kim Methods and systems for multiple substream unequal error protection and error concealment
WO2004036760A1 (fr) * 2002-10-15 2004-04-29 Koninklijke Philips Electronics N.V. Systeme et procede de recouvrement d'erreur pour la diffusion en continu de donnees video fgs codees dans un reseai internet
US6944802B2 (en) * 2000-03-29 2005-09-13 The Regents Of The University Of California Method and apparatus for transmitting and receiving wireless packet

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002071640A1 (fr) * 2001-03-05 2002-09-12 Intervideo, Inc. Systemes et procedes de codage et de decodage de vecteurs de mouvement redondants dans des trains de bits video comprimes

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101800751A (zh) * 2010-03-09 2010-08-11 上海雅海网络科技有限公司 分布式实时数据编码传输方法
CN103167319A (zh) * 2011-12-16 2013-06-19 中国移动通信集团公司 一种流媒体的传送处理方法、装置及系统
CN103167319B (zh) * 2011-12-16 2016-06-22 中国移动通信集团公司 一种流媒体的传送处理方法、装置及系统
CN103118241A (zh) * 2012-02-24 2013-05-22 金三立视频科技(深圳)有限公司 基于3g网络的移动视频监控流媒体传输自适应调整算法
WO2021180065A1 (fr) * 2020-03-09 2021-09-16 华为技术有限公司 Procédé de transmission de données et appareil de communication
US12003322B2 (en) 2020-03-09 2024-06-04 Huawei Technologies Co., Ltd. Data transmission method and communication apparatus
CN114070458A (zh) * 2020-08-04 2022-02-18 成都鼎桥通信技术有限公司 数据传输方法、装置、设备及存储介质
CN112311802A (zh) * 2020-11-05 2021-02-02 维沃移动通信有限公司 信息传输方法和信息传输装置
CN112311802B (zh) * 2020-11-05 2023-10-27 维沃移动通信有限公司 信息传输方法和信息传输装置
CN115189810A (zh) * 2022-07-07 2022-10-14 福州大学 一种面向低时延实时视频fec编码传输控制方法
CN115189810B (zh) * 2022-07-07 2024-04-16 福州大学 一种面向低时延实时视频fec编码传输控制方法

Also Published As

Publication number Publication date
CN100466725C (zh) 2009-03-04
CN1863302A (zh) 2006-11-15

Similar Documents

Publication Publication Date Title
WO2007051425A1 (fr) Procede de communication multimedia et terminal de celui-ci
AU2006321552B2 (en) Systems and methods for error resilience and random access in video communication systems
Wenger et al. RTP payload format for H. 264 video
US8462856B2 (en) Systems and methods for error resilience in video communication systems
Wang et al. RTP payload format for H. 264 video
EP1936868B1 (fr) Méthode pour surveiller une qualité de service dans une communication multimédia
WO2007045141A1 (fr) Procede de prise en charge de transmission de donnees multimedias avec tolerance aux erreurs
CN100558167C (zh) 多媒体视频通信方法及系统
WO2006105713A1 (fr) Procede de protection d'emission video fonde sur la technologie h.264
Wenger et al. RFC 3984: RTP payload format for H. 264 video
JP2005033556A (ja) データ送信装置、データ送信方法、データ受信装置、データ受信方法
Wang et al. RFC 6184: RTP Payload Format for H. 264 Video
AU2012216587B2 (en) Systems and methods for error resilience and random access in video communication systems
AU2012201576A1 (en) Improved systems and methods for error resilience in video communication systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 06805162; Country of ref document: EP; Kind code of ref document: A1)