WO2023168133A2 - Packet wash of rtp aggregation packets in a video stream - Google Patents


Info

Publication number
WO2023168133A2
Authority
WO
WIPO (PCT)
Prior art keywords
aggregation
packet
rtp
nalus
value
Application number
PCT/US2023/021254
Other languages
French (fr)
Other versions
WO2023168133A3 (en)
Inventor
Lijun Dong
Original Assignee
Futurewei Technologies, Inc.
Application filed by Futurewei Technologies, Inc.
Priority to PCT/US2023/021254
Publication of WO2023168133A2
Publication of WO2023168133A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/643 Communication protocols
    • H04N21/6437 Real-time Transport Protocol [RTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/752 Media network packet handling adapting media to network capabilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2402 Monitoring of the downstream path of the transmission network, e.g. bandwidth available
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24 Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2407 Monitoring of transmitted content, e.g. distribution time, number of downloads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8451 Structuring of content, e.g. decomposing content into time segments using Advanced Video Coding [AVC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/85406 Content authoring involving a specific file format, e.g. MP4 format

Definitions

  • the present disclosure is generally related to packetized communication of a video stream and specifically to improved methods of packet washing video stream packets.
  • Bursty loss and longer-than-expected delay have catastrophic effects on the quality of experience (QoE) for end users in video streaming. Such loss and delay are often caused by network congestion.
  • Various congestion control mechanisms target various goals, e.g. link utilization improvement, loss reduction, and fairness enhancement. For media streaming, reducing the possibility of network congestion may be achieved by rate control and video adaptation methods.
  • Video codecs such as H.264 Advanced Video Coding (AVC), H.264 Scalable Video Coding (SVC), H.265 High Efficiency Video Coding (HEVC), and H.266 Versatile Video Coding (VVC) use a syntax structure based on Network Abstraction Layer (NAL) units.
  • NAL unit structure provides convenient packetization/framing of video data to be transmitted in packet-based systems using transport protocols such as Real-time Transport Protocol (RTP).
  • RTP Real-time Transport Protocol
  • a first aspect relates to a method implemented in a video source, the method comprising: receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), wherein each aggregation NALU contains a portion of a frame of video data; generating a Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs, wherein the aggregation NALUs are ordered in a payload of the RTP aggregation packet in decreasing order of their relative importance to the process of decoding a video stream, the relative importance of the aggregation NALU being determined by an encoder of the video data; and transmitting the RTP aggregation packet.
  • NAL Network Abstraction Layer
  • another implementation of the aspect provides receiving the plurality of aggregation NALUs comprises receiving an aggregation packet in which the aggregation NALUs are arranged in sequential decoding order; and generating the RTP aggregation packet comprises reordering the aggregation NALUs in decreasing order of their relative importance.
  • another implementation of the aspect provides the payload of the RTP aggregation packet comprises: a decoding order number (DON) Base (DONB) value comprising a lowest DON value for the aggregation NALUs in the RTP aggregation packet; and for each aggregation NALU, a Decoding Order Number Difference (DOND) value, indicating a difference between the DON value of that aggregation NALU and the DONB value.
  • DON Decoding Order Number
  • DONB Decoding Order Number Base
  • DOND Decoding Order Number Difference
  • another implementation of the aspect provides the DON value of the aggregation NALU equals DONB + DOND.
  • the aggregation NALUs comprise video data encoded using one of an Advanced Video Coding (AVC) standard or a Scalable Video Coding (SVC) standard;
  • the importance value of an aggregation NALU comprises a NAL reference indication (NRI) value determined by the encoder of the video data;
  • the aggregation NALUs are ordered in the payload of the RTP aggregation packet in decreasing order of their NRI value;
  • the importance threshold value comprises a minimum NRI value to be retained in the RTP aggregation packet during the packet wash procedure;
  • the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose NRI value is less than the importance threshold value.
  • the aggregation NALUs comprise video data encoded using High Efficiency Video Coding (HEVC) standard;
  • the importance value of an aggregation NALU comprises a TemporalID (TID) value determined by the encoder of the video data;
  • the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of the TID value;
  • the importance threshold value comprises a maximum TID value to be retained in the RTP aggregation packet during the packet wash procedure;
  • the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose TID value is greater than the importance threshold value.
  • the aggregation NALUs comprise video data encoded using Versatile Video Coding (VVC) standard;
  • VVC Versatile Video Coding
  • the importance value of an aggregation NALU comprises a combined importance value based on combining a TID value and a LayerID value determined by the encoder of the video data;
  • the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of the combined importance value;
  • the importance threshold value comprises a maximum combined importance value to be retained in the RTP aggregation packet during the packet wash procedure;
  • the maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during the packet wash procedure is equal to a number of NALUs in the RTP aggregation packet whose combined importance value is greater than the importance threshold value.
  • another implementation of the aspect provides the combined importance value of the aggregation NALU is the TID value of the aggregation NALU multiplied by the LayerID value of the aggregation NALU.
  • a second aspect relates to a method of performing a packet wash on a Real-time Transport Protocol (RTP) aggregation packet, implemented in a network node, wherein the RTP aggregation packet comprises a plurality of received Network Abstraction Layer (NAL) units (NALUs), the method comprising: receiving a received RTP aggregation packet; performing the packet wash procedure when a size of the received RTP aggregation packet is larger than an available space in an outbound queue of the network node, the packet wash procedure comprising: generating a reduced RTP aggregation packet by successively removing, beginning with a last received NALU, one or more received NALUs from the payload of the reduced RTP aggregation packet; terminating the packet wash procedure when one of (i) the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue, or (ii) a total number of received NALUs removed from the payload of the reduced RTP aggregation packet is equal
  • another implementation of the aspect provides a network layer header that encapsulates the RTP aggregation packet comprises a WashAllowance value indicating a maximum number of received NALUs that may be removed from the received RTP aggregation packet during the packet wash procedure, removing a received NALU from the payload of the reduced RTP aggregation packet comprises decrementing by one the WashAllowance value in the network layer header, and the total number of received NALUs removed from the payload is equal to the maximum number of received NALUs that may be removed when the WashAllowance value in the network layer header equals zero.
  • a third aspect relates to a network apparatus, comprising: a memory configured to store instructions; and a processor coupled to the memory and configured to execute the instructions to perform the method of any of the preceding aspects.
  • the memory comprises two or more memory components, and wherein the processor comprises two or more processor components.
  • a fourth aspect relates to a network apparatus, comprising: a receiving means for receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), wherein each of the plurality of NALUs contains a portion of a frame of video data; a processing means for generating a decreasing-order Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs; and a transmitting means for transmitting the RTP aggregation packet.
  • another implementation of the aspect provides the network apparatus is further configured to perform the method of any of the preceding aspects.
  • a fifth aspect relates to a network apparatus, comprising: a receiving means for receiving a decreasing-order Real-time Transport Protocol (RTP) aggregation packet; a processing means for generating a reduced RTP aggregation packet when a size of the received RTP aggregation packet is larger than an available space in an outbound queue of the network node; and a transmitting means for transmitting the reduced RTP aggregation packet when the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue.
  • another implementation of the aspect provides the network apparatus is further configured to perform the method of any of the preceding aspects.
  • any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
  • FIG. 1 is a diagram of a telecommunication network including a network domain.
  • FIG. 2 is a diagram of a payload format for a Real-time Transport Protocol (RTP) aggregation packet according to an embodiment of the disclosure.
  • FIGS. 3A-3B illustrate methods for generating an RTP aggregation packet according to an embodiment of the disclosure.
  • FIG. 4 illustrates a method for performing a packet wash process on an RTP aggregation packet according to an embodiment of the disclosure.
  • FIG. 5 is a diagram of a network apparatus according to an embodiment of the disclosure.
  • FIG. 6 is a diagram of an apparatus configured to implement one or more of the methods described herein according to embodiments of the disclosure.
  • Packet payload formation and processing according to the disclosure avoids dropping an entire packet when a packet error or network congestion occurs, and instead selectively drops parts of the packet to reduce the packet size, such that the remainder of the packet may be able to reach its destination.
  • Techniques according to the disclosure may be applied to RTP aggregation packets of NAL units (or NALUs) of video data encoded in the H.264/AVC, H.264/SVC, H.265/HEVC, or H.266/VVC formats.
  • RTP aggregation packets according to the disclosure further include information indicating a threshold maximum number of NALUs that may be removed. If, after the threshold number of NALUs have been removed, the reduced packet size is still too large to fit in an outbound queue of a network element, the packet may be dropped.
  • FIG. 1 is a diagram of a telecommunication network 100 including a network domain 102.
  • the telecommunications network 100 includes a plurality of network nodes 104, 106, 108, and 110.
  • the network nodes 104-110 may be operating using layer three (L3) routing technology and, as such, may be referred to as L3 network nodes. While eight network nodes 104-110 are shown in the network domain 102, more or fewer nodes may be included in practical applications.
  • L3 layer three
  • Each of the network nodes 104-110 is configured to send, receive, and/or route packets containing media content (e.g., streaming media content).
  • one or more of the network nodes 104-110 may be a router, a switch, or a gateway.
  • Network nodes 104, 106, and 110 are disposed at an edge of the network domain 102, and may therefore be referred to as edge nodes.
  • the network nodes 104, 106, and 110 that receive packets from outside the network domain 102 may be referred to as ingress network nodes (e.g., ingress routers).
  • the network nodes 104, 106, and 110 that transmit packets out of the network domain 102 may be referred to as egress network nodes (e.g., egress routers).
  • each of the network nodes 104, 106, and 110 may function as an ingress network node, an egress network node, or both.
  • Network nodes 108 are not on an edge of the network domain 102. Such nodes 108 may be referred to as internal nodes and may not be configured to communicate outside of the network domain 102 without passing through an edge node.
  • Each of the network nodes 104-110 has one or more neighbor network nodes.
  • a neighbor network node refers to a network node which is only one hop away from another network node.
  • the network nodes 104-110 are coupled to, and communicate with, each other via links 120.
  • the links 120 may be wired, wireless, electrical, optical, or some combination thereof.
  • the pattern or arrangement of links 120 in FIG. 1 is for the purpose of illustration only. More or fewer links 120 coupling the network nodes 104-110 to each other in a different configuration may be used in practical applications.
  • the network node 104 is coupled to a content source 130.
  • the content source 130 may be a video source, e.g., a content provider that streams packets of video data for viewing by an end user (e.g., streaming television, video conferencing, etc.).
  • the content source 130 is configured to provide packets of media content such as, for example, streaming media content, Moving Picture Experts Group (MPEG) video samples, etc.
  • MPEG Moving Picture Experts Group
  • the network node 104 is configured to request and receive media content from the content source 130.
  • the content source 130 may utilize a video encoder (a.k.a., codec) capable of implementing various video coding techniques, including the H.264/AVC, H.264/SVC, H.265/HEVC, and H.266/VVC formats.
  • the network node 104 may receive the encoded video as NAL units, a format that packages video data for transmission in packet-based systems using RTP.
  • the network node 104 may receive the encoded video in existing-format single-time aggregation packets (STAPs) or multi-time aggregation packets (MTAPs), or in RTP aggregation packets according to the disclosure. If H.264/AVC or H.264/SVC aggregation packets are received, the network node 104 may reorder the NAL units into RTP aggregation packets according to the disclosure.
  • the network node 110 is coupled to a destination node 140.
  • the destination node 140 may represent a user equipment (UE) of an end user.
  • the UE may be a smartphone, a tablet device, a laptop computer, and so on (e.g., a device configured to receive and display streaming video content).
  • the destination node 140 is configured to consume the media content provided by the content source 130.
  • the end user may utilize their UE to request media content (e.g., a streaming media video).
  • the request is routed by the network nodes 104-110 through the network domain 102 and delivered to the content source 130.
  • the content source 130 transmits the requested media content to be routed by the network nodes 104-110 back through the network domain 102 and delivered to the destination node 140, where the requested media content may be consumed.
  • the destination node 140 may utilize a video decoder (or codec).
  • the video decoder of the destination node 140 is capable of implementing various video decoding techniques corresponding to those used by the video encoder of the content source 130.
  • Network nodes 104-110 each contain one or more receivers, processors, memories, and transmitters.
  • a packet is received by a receiver, processed by a processor, and stored in an outbound queue in memory until the packet can be transmitted by a transmitter.
  • Network nodes 104-110 may include many receivers and transmitters, and may communicate many flows of packets at the same time. When a current packet is received and all transmitters are transmitting other packets, the current packet is retained in an outbound queue in memory until a transmitter is available to transmit the current packet.
  • Network congestion occurs when a network node 104-110 receives packets faster than such packets can be transmitted.
  • outbound queues in the memory in the congested node fill up. Once outbound queues in the memory are full, the congested node must either discontinue storing incoming packets or must evict already received packets from outbound queues in the memory to make room for the new packets. This function is known as packet drop.
  • network nodes 104-110 can be configured to perform a packet wash.
  • packets can be encoded with NALUs arranged according to an order of importance to the process of decoding and reconstructing a video signal.
  • a NALU header may include information indicating a relative importance of the NALU, as determined by the encoder of the video data.
  • the packets can be arranged with the less important NALUs toward the end of the packet and the more important NALUs towards the beginning of the packet.
  • a packet wash according to the disclosure drops data from the end of the packet, continuing towards the front of the packet until the packet is small enough to be stored in an outbound queue for transmission. This can be done to reduce network congestion.
  • In other embodiments, the NALUs are not ordered in the packet in an order of importance.
  • the packet header (or packet metadata in some embodiments) includes information indicating the importance of each NALU.
  • a packet wash according to the disclosure removes NALUs from the packet based on the NALU importance information in the packet header or metadata. In either such embodiment, the video quality of the slice in the packet is reduced by less than if the packet were dropped entirely.
  • Because the NALUs for an image may be split across multiple aggregation packets for transmission, information relating to the order for decoding the NALUs (the decoding order) is sent in an aggregation packet along with the NALUs themselves.
  • The standards for the H.264/AVC, H.264/SVC, H.265/HEVC, and H.266/VVC video formats, and any codecs that apply to such data structures, define different techniques for sending such decoding order information.
  • NALUs that are received in RTP payloads of STAP-A or STAP-B format have identical NALU-times (the RTP timestamp value that the NALU would have if it were transported in its own RTP packet) and are arranged in sequential decoding order.
  • a STAP-A payload does not include a decoding order number (DON) for its NALUs.
  • a STAP-B payload includes a 16-bit unsigned DON, specifying a DON value for the first NALU in the payload. For each successive NAL unit in a STAP-B payload, the value of DON is equal to the value of DON of the previous NAL unit in the STAP-B plus 1, modulo 65536.
  • NALUs that are received in RTP payloads of MTAP-16 or MTAP-24 format may have different NALU-times: each NALU has an individual timestamp offset, and its decoding order is given by an individual DOND value relative to a DONB value carried in the payload.
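As a minimal illustration of the STAP-B numbering rule described above, the Python sketch below derives the DON of every NALU in a STAP-B payload from the single DON field in its header. The helper name `stap_b_dons` is hypothetical; only the "previous DON plus 1, modulo 65536" rule comes from the text.

```python
DON_MODULUS = 65536  # DON is a 16-bit unsigned value


def stap_b_dons(first_don: int, nalu_count: int) -> list[int]:
    """Return the DON of each NALU in a STAP-B payload, in payload order.

    The STAP-B header carries the DON of the first NALU; each subsequent
    NALU's DON is the previous DON plus 1, modulo 65536.
    """
    return [(first_don + i) % DON_MODULUS for i in range(nalu_count)]


# A STAP-B whose DON field is 65534 and which carries four NALUs wraps around.
print(stap_b_dons(65534, 4))  # [65534, 65535, 0, 1]
```

Because the DON of every NALU after the first is implied by its position, reordering a STAP-B's NALUs by importance would silently change their implied DONs, which is why the disclosure carries explicit per-NALU decoding order information.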
  • FIG. 2 is a diagram of a payload format 200 for an RTP aggregation packet according to an embodiment of the disclosure.
  • the payload format 200 may be used with the NALUs received in a STAP-A or STAP-B aggregation packet. Once the NALUs of a STAP-A or STAP-B aggregation packet are rearranged in decreasing order of their importance in an RTP aggregation packet according to the disclosure, information about the NALUs' decoding order could be lost.
  • the payload format 200 includes a 16-bit unsigned Decoding Order Number Base (DONB) field 202 that indicates the lowest DON value for any of the NALUs in the RTP aggregation packet according to the disclosure.
  • the fields for each NALU are as defined for STAP-A and STAP-B aggregation packets.
  • the exception is that the payload format 200 adds an 8-bit Decoding Order Number Difference (DOND) field 204 for each NALU of the RTP aggregation packet according to the disclosure.
  • the DOND field 204 indicates the difference between the decoding order number (DON) value of the associated NALU and DONB, so that the NALU's DON equals DONB + DOND.
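A sketch of payload format 200 follows, written in Python. The DONB/DOND arithmetic (DONB is the lowest DON in the packet, and each NALU's DON is DONB + DOND) follows the text. Everything else is an assumption of this sketch: the exact field order (an 8-bit DOND placed before the 16-bit NALU size, similar to an MTAP), the omission of the aggregation packet's own one-byte NAL unit header, the requirement that DON values in one packet do not wrap past 65535, and the function names, which are hypothetical.

```python
import struct

DON_MODULUS = 65536  # DON and DONB are 16-bit unsigned values


def build_payload_200(nalus: list[tuple[int, bytes]]) -> bytes:
    """Serialize (DON, NALU bytes) pairs that are already in importance order.

    Assumed layout (hypothetical): 16-bit DONB, then for each NALU an 8-bit
    DOND, a 16-bit NALU size, and the NALU bytes, all in network byte order.
    """
    if not nalus:
        raise ValueError("an aggregation packet needs at least one NALU")
    donb = min(don for don, _ in nalus)  # lowest DON; assumes no 16-bit wraparound
    out = bytearray(struct.pack("!H", donb))
    for don, data in nalus:
        dond = don - donb
        if dond > 0xFF:
            raise ValueError(f"DOND {dond} does not fit in 8 bits")
        out += struct.pack("!BH", dond, len(data))  # "!" = no alignment padding
        out += data
    return bytes(out)


def parse_payload_200(payload: bytes) -> list[tuple[int, bytes]]:
    """Recover (DON, NALU bytes) pairs in payload order; DON = DONB + DOND (mod 65536)."""
    (donb,) = struct.unpack_from("!H", payload, 0)
    pos, result = 2, []
    while pos < len(payload):
        dond, size = struct.unpack_from("!BH", payload, pos)
        pos += 3
        result.append(((donb + dond) % DON_MODULUS, payload[pos:pos + size]))
        pos += size
    return result


def decoding_order(nalus: list[tuple[int, bytes]]) -> list[bytes]:
    """Restore decoding order at the receiver by sorting on DON."""
    return [data for _, data in sorted(nalus, key=lambda item: item[0])]


# Importance order differs from decoding order: DONs 102, 100, 101.
payload = build_payload_200([(102, b"third"), (100, b"first"), (101, b"second")])
print(decoding_order(parse_payload_200(payload)))  # [b'first', b'second', b'third']
```

With an 8-bit DOND, the NALUs in one aggregation packet can span at most 256 consecutive DON values, which is ample for NALUs that were originally carried in a single STAP.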
  • FIGS. 3A-3B illustrate methods 300 and 334 for generating an RTP aggregation packet according to an embodiment of the disclosure.
  • the method 300 of FIG. 3A illustrates a method for generating an RTP aggregation packet according to the disclosure.
  • the method 300 may be performed by a video source.
  • the method 334 of FIG. 3B illustrates a method for performing one step (i.e., step 304) of the method 300 of FIG. 3A.
  • the method 300 of FIG. 3A begins with step 302, in which a module of a video source configured to generate an RTP aggregation packet according to embodiments of the disclosure receives a plurality of aggregation NALUs. Each aggregation NALU contains a portion of a frame of video data.
  • the video source module generates an RTP aggregation packet comprising the aggregation NALUs.
  • the aggregation NALUs are ordered in decreasing order of their relative importance to the process of decoding a video stream. The relative importance of the aggregation NALU is determined by an encoder of the video stream.
  • the video source module transmits the RTP aggregation packet.
  • the RTP aggregation packet may be transmitted by being stored in an outbound queue that is configured to transmit the packet (or by any other suitable process).
  • the method 334 of FIG. 3B provides further details of the step 304 of the method 300 of FIG. 3A.
  • the plurality of aggregation NALUs are received by the video source module in an RTP single-time aggregation packet (STAP) of format STAP-A or STAP-B, or in an RTP multi-time aggregation packet (MTAP) of format MTAP-16 or MTAP-24.
  • STAP single-time aggregation packet
  • MTAP multi-time aggregation packet
  • For video data encoded using the H.264/AVC or H.264/SVC format, the NAL unit header includes a NAL reference indication (NRI) field, indicating the relative importance of the associated NAL unit to the process of decoding the encoded video signal, as determined by the encoder that encoded the video signal.
  • NRI NAL reference indication
  • a higher value of NRI indicates a NALU of higher importance.
  • the value of the NRI field indicates an importance value of the H.264 NALU.
  • NALUs of H.264/AVC or H.264/SVC format are ordered in the payload of the RTP aggregation packet in decreasing order of NRI value, in some embodiments of the disclosure.
  • For video data encoded using the H.265/HEVC format, the NAL unit header includes a TemporalID (TID) field, indicating the relative importance of the associated NAL unit to the process of decoding the encoded video signal, as determined by the encoder that encoded the video signal.
  • TID TemporalID
  • a lower value of TID indicates a NALU of higher importance.
  • the value of the TID field indicates an importance value of the H.265/HEVC NALU.
  • NALUs of H.265/HEVC format are ordered in the payload of the RTP aggregation packet in increasing order of TID values, in some embodiments of the disclosure.
  • For video data encoded using the H.266/VVC format, the NAL unit header includes a TID field indicating the relative importance of the associated NAL unit, with a lower value of TID indicating a NALU of higher importance.
  • An H.266/VVC format NAL unit header also includes a LayerID field, where a lower value of LayerID indicates a NALU of higher importance.
  • the TID and LayerID values of the NALU may be combined by multiplication, addition, or another function that preserves the characteristic that a lower combined importance value indicates a NALU of higher importance.
  • NALUs of H.266/VVC format are ordered in the payload of the RTP aggregation packet in increasing order of the combined importance value, in some embodiments of the disclosure.
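The importance fields used above sit at fixed bit positions in the NAL unit header (one byte for H.264, two bytes for H.265 and H.266), so a video source could derive the payload order roughly as in the Python sketch below. The codec labels and function names are hypothetical; `combine` defaults to multiplication, as in the implementation where the combined importance value is TID multiplied by LayerID, and `operator.add` gives the addition variant the text also mentions.

```python
import operator
from typing import Callable


def h264_nri(nalu: bytes) -> int:
    """NRI: the two bits after forbidden_zero_bit in the 1-byte H.264 NAL header."""
    return (nalu[0] >> 5) & 0x03


def hevc_tid(nalu: bytes) -> int:
    """TemporalID = nuh_temporal_id_plus1 - 1 (low 3 bits of the 2nd header byte)."""
    return (nalu[1] & 0x07) - 1


def vvc_tid_and_layer(nalu: bytes) -> tuple[int, int]:
    """(TemporalID, nuh_layer_id) from the 2-byte H.266/VVC NAL unit header."""
    return (nalu[1] & 0x07) - 1, nalu[0] & 0x3F


def order_by_importance(
    nalus: list[bytes],
    codec: str,
    combine: Callable[[int, int], int] = operator.mul,
) -> list[bytes]:
    """Return NALUs most-important-first for a hypothetical codec label."""
    if codec in ("h264-avc", "h264-svc"):
        # Higher NRI = more important, so sort by NRI descending.
        return sorted(nalus, key=h264_nri, reverse=True)
    if codec == "h265":
        # Lower TID = more important, so sort by TID ascending.
        return sorted(nalus, key=hevc_tid)
    if codec == "h266":
        # Lower combined value = more important; combine(TID, LayerID).
        return sorted(nalus, key=lambda n: combine(*vvc_tid_and_layer(n)))
    raise ValueError(f"unsupported codec label: {codec}")


# Two-byte H.266 headers with (TID, LayerID) = (2, 2), (0, 3), (1, 1), (0, 0).
vvc = [bytes([0x02, 0x03]), bytes([0x03, 0x01]), bytes([0x01, 0x02]), bytes([0x00, 0x01])]
print([vvc_tid_and_layer(n) for n in order_by_importance(vvc, "h266")])
# [(0, 3), (0, 0), (1, 1), (2, 2)]
```

Python's `sorted` is stable (also with `reverse=True`), so NALUs of equal importance keep their input order; the text does not specify a tie-break. With multiplication, any NALU whose TID or LayerID is 0 gets a combined value of 0, which is why the (0, 3) NALU ties with the (0, 0) NALU in the example.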
  • In step 310 of FIG. 3B, the video source module orders the plurality of aggregation NALUs received in step 302 in the RTP aggregation packet in decreasing order of importance, based on each NALU's importance value or combined importance value.
  • Steps 312 and 314 are shown as optional because they are performed only for NALUs received in STAP-A or STAP-B format packets.
  • In step 312, a value of DONB is stored in the packet header.
  • In step 314, a DOND value is added for each NALU in the payload of the RTP aggregation packet according to the disclosure.
  • the value of DOND reflects the decoding order of the NALU in the payload of the STAP-A or STAP-B packet in which the H.264/AVC or H.264/SVC NALUs were received.
  • Steps 312 and 314 may be performed before or after the NALUs are ordered in step 310.
  • an importance threshold value indicates a threshold value for the importance values (or combined importance values) of aggregation NALUs that are to be retained in the RTP aggregation packet during a later packet wash procedure (if such a procedure is applied).
  • for H.264/AVC or H.264/SVC NALUs, the importance threshold value is a minimum value of NRI that should be retained in the RTP aggregation packet.
  • for H.265/HEVC or H.266/VVC NALUs, the importance threshold value is a maximum value of the TID or the combined importance value, respectively, that should be retained in the RTP aggregation packet.
  • the WashAllowance value indicates a maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during the packet wash procedure.
  • the WashAllowance value is set equal to a number of NALUs whose importance value (or combined importance value) is smaller (or larger, as appropriate) than the packet’s importance threshold value.
  • for H.264/AVC or H.264/SVC NALUs, the WashAllowance value is the number of NALUs in the packet whose NRI value is smaller than the importance threshold value.
  • for H.265/HEVC or H.266/VVC NALUs, the WashAllowance value is the number of NALUs in the packet whose TID value (or combined importance value) is larger than the importance threshold value.
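Once the payload is importance-ordered, WashAllowance reduces to a count, as the small Python sketch below shows. The function name and the `higher_is_more_important` flag are hypothetical; the two comparisons mirror the NRI rule and the TID/combined-value rule above.

```python
def wash_allowance(
    importance_values: list[int], threshold: int, higher_is_more_important: bool
) -> int:
    """Count the NALUs that fall on the 'may be washed' side of the threshold.

    NRI (H.264, higher = more important): count values below the minimum to retain.
    TID or combined value (H.265/H.266, lower = more important): count values
    above the maximum to retain.
    """
    if higher_is_more_important:
        return sum(1 for v in importance_values if v < threshold)
    return sum(1 for v in importance_values if v > threshold)


print(wash_allowance([3, 3, 2, 1, 0], threshold=2, higher_is_more_important=True))   # 2
print(wash_allowance([0, 0, 1, 2, 2], threshold=1, higher_is_more_important=False))  # 2
```

Because the NALUs are ordered by importance, the counted NALUs are exactly the trailing ones, so a wash that removes at most WashAllowance NALUs from the end never touches a NALU that meets the importance threshold.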
  • a list of the sizes of the aggregation NALUs may also be carried in the network layer header of the RTP aggregation packet, ordered to correspond to the order of the associated aggregation NALUs in the payload. This list of sizes may allow a packet wash operation to be performed based on the information stored in the network layer header of the RTP aggregation packet, rather than requiring access to the individual NALUs in the packet payload.
  • FIG. 4 illustrates a method 400 for performing a packet wash process on an RTP aggregation packet according to an embodiment of the disclosure.
  • the network node receives an RTP aggregation packet.
  • In step 404, the network node determines a port of the node by which the packet will be forwarded and determines whether an outbound queue for the port has sufficient space to store the received packet. If the outbound queue has sufficient space, in step 406 the packet is stored in the outbound queue, where the outbound queue is configured to transmit the packet.
  • If it is determined in step 404 that the outbound queue for the port does not have sufficient space to store the received packet, a packet wash process is begun. In step 408, the last NALU of the packet (i.e., the NALU of lowest importance) is removed from the packet and the value of WashAllowance in the network header of the packet is decremented by one. In step 410, the network node determines whether the outbound queue for the port has sufficient space to store the reduced-size packet. If so, in step 406, the reduced-size packet is stored in the outbound queue for transmission.
  • In step 412, the network node determines whether the value of WashAllowance in the network header of the packet equals zero, indicating that the maximum allowed number of NALUs have been removed from the packet. If so, in step 414, the packet is dropped. If the value of WashAllowance is greater than zero, the method branches back to step 408 and another NALU is removed from the packet. In this way, the packet wash procedure continues until either (i) the size of the reduced-size packet is less than or equal to the available space in the outbound queue, or (ii) the maximum allowed number of NALUs have been removed from the packet and the reduced-size packet is still too large for the outbound queue. In the first outcome, the reduced-size packet is forwarded toward its destination. In the second outcome, the reduced-size packet is dropped.
  • FIG. 5 is a diagram of a network apparatus 500 (e.g., a video source, a router, a network node, etc.) according to an embodiment of the disclosure.
  • the network apparatus 500 is suitable for implementing the disclosed embodiments as described herein.
  • the network apparatus 500 comprises ingress ports/ingress means 510 coupled to receiver units (Rx)/receiving means 520 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 530 (coupled to the Rx/receiving means 520) to process the data; transmitter units (Tx)/transmitting means 540 and egress ports/egress means 550 (coupled to the processor/processing means 530) for transmitting the data; and a memory/memory means 560 (coupled to the processor/processing means 530) for storing the data.
  • the network apparatus 500 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 510, the receiver units/receiving means 520, the transmitter units/transmitting means 540, and the egress ports/egress means 550 for egress or ingress of optical or electrical signals.
  • the processor/processing means 530 is implemented by hardware and software.
  • the processor/processing means 530 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs).
  • the processor/processing means 530 is in communication with the ingress ports/ingress means 510, receiver units/receiving means 520, transmitter units/transmitting means 540, egress ports/egress means 550, and memory/memory means 560.
  • the processor/processing means 530 comprises an RTP packet aggregation module 570.
  • the RTP packet aggregation module 570 is stored in onboard memory of the processor/processing means 530.
  • the RTP packet aggregation module 570 is able to implement the method 300 as described with reference to FIGS. 3A-3B.
  • the processor/processing means 530 may further comprise an RTP aggregation packet wash module 572.
  • the RTP aggregation packet wash module 572 is stored in onboard memory of the processor/processing means 530.
  • the RTP aggregation packet wash module 572 is able to implement the method 400 as described with reference to FIG. 4, for example.
  • the inclusion of the RTP packet aggregation module 570 and/or the RTP aggregation packet wash module 572 therefore provides a substantial improvement to the functionality of the network apparatus 500 and effects a transformation of the network apparatus 500 to a different state.
  • the RTP packet aggregation module 570 and/or the RTP aggregation packet wash module 572 are implemented as instructions stored in the memory/memory means 560 and executed by the processor/processing means 530.
  • the network apparatus 500 may also include input and/or output (I/O) devices/I/O means 580 for communicating data to and from a user.
  • the I/O devices or means 580 may be coupled to the processor/processing means 530.
  • the I/O devices/I/O means 580 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc.
  • the I/O devices or means 580 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such input devices.
  • the memory/memory means 560 comprises one or more disks, tape drives, or solid-state drives and may be used as an over-flow data storage device, may be used to store programs when such programs are selected for execution, to store instructions that are read during program execution and to store data for execution or generated during execution.
  • the memory/memory means 560 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
  • FIG. 6 illustrates a network apparatus 600 configured to implement one or more of the methods for RTP packet aggregation and/or RTP aggregation packet wash as described herein.
  • the network apparatus 600 is configured to implement the methods described with reference to FIGS. 3A-3B and 4.
  • the network apparatus 600 may be implemented in the network apparatus 500.
  • the network apparatus 600 comprises a means 602 for generating an RTP aggregation packet according to embodiments of the disclosure, as described with reference to the methods described with reference to FIGS. 3A-3B.
  • the network apparatus 600 may further comprise a means 604 for packet washing an RTP aggregation packet according to embodiments of the disclosure, as described with reference to the method described with reference to FIG. 4.
  • the disclosed embodiments may be a system, an apparatus, a method, and/or a computer program product at any possible technical detail level of integration.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
  • the computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the disclosure is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.

Abstract

A video source receives Network Abstraction Layer (NAL) units (NALUs), generates a Real-time Transport Protocol (RTP) aggregation packet comprising the NALUs, ordered in the packet payload in decreasing order of their relative importance to video stream decoding, as determined by the NALU encoder. Once generated, the video source transmits the packet. A network node performs a packet wash on a received RTP aggregation packet that includes a plurality of NALUs. The network node performs the packet wash when the packet size is larger than available space in an outbound queue of the network node. NALUs are successively removed from the payload of the packet, beginning with a last NALU. The packet wash terminates when either the packet will fit in the outbound queue or a maximum number of NALUs have been removed from the packet. If the packet will fit, it is stored in the outbound queue for transmission.

Description

Packet Wash of RTP Aggregation Packets in a Video Stream
TECHNICAL FIELD
[0001] The present disclosure is generally related to packetized communication of a video stream and specifically to improved methods of packet washing video stream packets.
BACKGROUND
[0002] Bursty loss and longer-than-expected delay have catastrophic effects on the quality of experience (QoE) for end users in video streaming. Such loss and delay are often caused by network congestion. Various congestion control mechanisms target various goals, e.g., link utilization improvement, loss reduction, and fairness enhancement. For media streaming, the likelihood of network congestion may be reduced by rate control and video adaptation methods.
[0003] Video codecs such as H.264 Advanced Video Coding (AVC), H.264 Scalable Video Coding (SVC), H.265 High Efficiency Video Coding (HEVC), and H.266 Versatile Video Coding (VVC) use a syntax structure based on Network Abstraction Layer (NAL) units. The NAL unit structure provides convenient packetization/framing of video data to be transmitted in packet-based systems using transport protocols such as Real-time Transport Protocol (RTP).
SUMMARY
[0004] A first aspect relates to a method implemented in a video source, the method comprising: receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), wherein each aggregation NALU contains a portion of a frame of video data; generating a Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs, wherein the aggregation NALUs are ordered in a payload of the RTP aggregation packet in decreasing order of their relative importance to the process of decoding a video stream, the relative importance of the aggregation NALU being determined by an encoder of the video data; and transmitting the RTP aggregation packet.
[0005] Optionally, in any of the preceding aspects, another implementation of the aspect provides receiving the plurality of aggregation NALUs comprises receiving an aggregation packet in which the aggregation NALUs are arranged in sequential decoding order; and generating the RTP aggregation packet comprises reordering the aggregation NALUs in decreasing order of their relative importance.
[0006] Optionally, in any of the preceding aspects, another implementation of the aspect provides the payload of the RTP aggregation packet comprises: a decoding order number (DON) Base (DONB) value comprising a lowest DON value for the aggregation NALUs in the RTP aggregation packet; and for each aggregation NALU, a Decoding Order Number Difference (DOND) value, indicating a difference between a DON value of the received NALU and the DONB value.
[0007] Optionally, in any of the preceding aspects, another implementation of the aspect provides the DON value of the aggregation NALU equals DONB + DOND.
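As a rough illustration of [0006]-[0007], the Python sketch below derives a DONB value and per-NALU DOND values for NALUs that have already been reordered by importance. It then recovers each DON as DONB + DOND. The 16-bit DONB and 8-bit DOND widths follow the payload format 200 described later. The function names are hypothetical. The sketch also assumes that the DONs in one packet do not wrap around past 65535; the modulo in `decode_don` is only a guard.

```python
from typing import List, Tuple

DONB_MAX = 0xFFFF  # DONB is a 16-bit unsigned field
DOND_MAX = 0xFF    # DOND is an 8-bit field per NALU


def encode_don(dons: List[int]) -> Tuple[int, List[int]]:
    """Return (DONB, DONDs) for NALUs listed in payload (importance) order.

    Hypothetical helper; assumes no 16-bit wraparound inside one packet.
    """
    donb = min(dons)  # DONB: lowest DON among the NALUs in the packet
    if donb > DONB_MAX:
        raise ValueError("DON does not fit in 16 bits")
    donds = [d - donb for d in dons]
    if max(donds) > DOND_MAX:
        raise ValueError("DOND does not fit in the 8-bit field")
    return donb, donds


def decode_don(donb: int, donds: List[int]) -> List[int]:
    """Recover each NALU's DON as DONB + DOND (mod 2**16 as a guard)."""
    return [(donb + dd) % 65536 for dd in donds]


# NALUs in importance order, carrying DONs from the original decoding order.
donb, donds = encode_don([102, 100, 103, 101])
print(donb, donds)              # 100 [2, 0, 3, 1]
print(decode_don(donb, donds))  # [102, 100, 103, 101]
```

A receiver could then restore decoding order by sorting the NALUs on the recovered DON values.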
[0008] Optionally, in any of the preceding aspects, another implementation of the aspect provides a network layer header that encapsulates the RTP aggregation packet comprises: an importance threshold value indicating a threshold value for importance values of aggregation NALUs to be retained in the RTP aggregation packet during a packet wash procedure; a WashAllowance value indicating a maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during a packet wash procedure; and a list of sizes of the aggregation NALUs in the RTP aggregation packet, the list ordered in an order corresponding to the order of the aggregation NALUs in the RTP aggregation packet payload.
[0009] Optionally, in any of the preceding aspects, another implementation of the aspect provides the aggregation NALUs comprise video data encoded using one of an Advanced Video Coding (AVC) standard or a Scalable Video Coding (SVC) standard; the importance value of an aggregation NALU comprises a NAL reference indication (NRI) value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in decreasing order of their NRI value; the importance threshold value comprises a minimum NRI value to be retained in the RTP aggregation packet during the packet wash procedure; and the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose NRI value is less than the importance threshold value.
[0010] Optionally, in any of the preceding aspects, another implementation of the aspect provides the aggregation NALUs comprise video data encoded using High Efficiency Video Coding (HEVC) standard; the importance value of an aggregation NALU comprises a TemporalID (TID) value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of the TID value; the importance threshold value comprises a maximum TID value to be retained in the RTP aggregation packet during the packet wash procedure; and the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose TID value is greater than the importance threshold value.
[0011] Optionally, in any of the preceding aspects, another implementation of the aspect provides the aggregation NALUs comprise video data encoded using Versatile Video Coding (VVC) standard; the importance value of an aggregation NALU comprises a combined importance value based on combining a TID value and a LayerID value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of the combined importance value; the importance threshold value comprises a maximum combined importance value to be retained in the RTP aggregation packet during the packet wash procedure; and the maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during the packet wash procedure is equal to a number of NALUs in the RTP aggregation packet whose combined importance value is greater than the importance threshold value.
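In [0009]-[0011], the WashAllowance value is a count derived from the importance threshold value. The sketch below is a hypothetical helper, not taken from the disclosure, that computes this count for both comparison directions. For NRI, a higher value means more important. For TID or the VVC combined value, a lower value means more important.

```python
from typing import Sequence


def wash_allowance(importance: Sequence[int], threshold: int,
                   higher_is_more_important: bool) -> int:
    """Count the NALUs that fall on the removable side of the threshold.

    higher_is_more_important=True models H.264 NRI (retain NRI >= threshold).
    False models H.265 TID or the H.266 combined value (retain value <= threshold).
    """
    if higher_is_more_important:
        return sum(1 for v in importance if v < threshold)
    return sum(1 for v in importance if v > threshold)


# H.264: NRI values in payload order (decreasing), threshold 2.
print(wash_allowance([3, 2, 2, 1, 0], threshold=2, higher_is_more_important=True))   # 2
# H.265: TID values in payload order (increasing), threshold 1.
print(wash_allowance([0, 0, 1, 2, 3], threshold=1, higher_is_more_important=False))  # 2
```

The payload is already sorted by importance, so the counted NALUs are exactly the trailing ones. Removing at most WashAllowance NALUs from the end therefore never touches a NALU that meets the threshold.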
[0012] Optionally, in any of the preceding aspects, another implementation of the aspect provides the combined importance value of the aggregation NALU is the TID value of the aggregation NALU multiplied by the LayerID value of the aggregation NALU.
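A minimal sketch of [0011]-[0012] follows. It assumes the TemporalID and LayerID values have already been parsed from each NAL unit header. The combined importance value is their product, and the payload is sorted in increasing order of that value. The `VvcNalu` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VvcNalu:
    tid: int       # TemporalID from the NAL unit header
    layer_id: int  # LayerID from the NAL unit header
    size: int      # NALU size in bytes


def combined_importance(n: VvcNalu) -> int:
    # Per [0012]: TID multiplied by LayerID; a lower value means more important.
    return n.tid * n.layer_id


def order_for_payload(nalus: List[VvcNalu]) -> List[VvcNalu]:
    # Increasing combined value, so the least important NALUs end up last.
    return sorted(nalus, key=combined_importance)


nalus = [VvcNalu(2, 1, 10), VvcNalu(0, 1, 20), VvcNalu(1, 2, 30), VvcNalu(1, 1, 40)]
print([n.size for n in order_for_payload(nalus)])  # [20, 40, 10, 30]
```

Python's `sorted` is stable, so NALUs with equal combined values keep their original decoding order. With a plain product, any NALU with TID 0 gets a combined value of 0 regardless of its layer. The sketch follows [0012] as written.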
[0013] A second aspect relates to a method of performing a packet wash on a Real-time Transport Protocol (RTP) aggregation packet, implemented in a network node, wherein the RTP aggregation packet comprises a plurality of received Network Abstraction Layer (NAL) units (NALUs), the method comprising: receiving a received RTP aggregation packet; performing the packet wash procedure when a size of the received RTP aggregation packet is larger than an available space in an outbound queue of the network node, the packet wash procedure comprising: generating a reduced RTP aggregation packet by successively removing, beginning with a last received NALU, one or more received NALUs from the payload of the reduced RTP aggregation packet; terminating the packet wash procedure when one of (i) the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue, or (ii) a total number of received NALUs removed from the payload of the reduced RTP aggregation packet is equal to the maximum number of received NALUs that may be removed from the received RTP aggregation packet during the packet wash procedure; and storing the reduced size packet in the outbound queue when the packet wash procedure is terminated, if the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue, wherein the outbound queue is configured to transmit the reduced size packet.
[0014] Optionally, in any of the preceding aspects, another implementation of the aspect provides a network layer header that encapsulates the RTP aggregation packet comprises a WashAllowance value indicating a maximum number of received NALUs that may be removed from the received RTP aggregation packet during the packet wash procedure, removing a received NALU from the payload of the reduced RTP aggregation packet comprises decrementing by one the WashAllowance value in the network layer header, and the total number of received NALUs removed from the payload is equal to the maximum number of received NALUs that may be removed when the WashAllowance value in the network layer header equals zero.
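The packet wash of the second aspect (and of FIG. 4) can be sketched as below. This is a simplified model, not the disclosure's implementation, and it rests on three assumptions. First, per-NALU sizes are available, for example from the list of sizes in the network layer header described in [0008]. Second, all other overhead is treated as a fixed `header_size`. Third, WashAllowance is checked before each removal, so a packet that arrives with WashAllowance equal to zero is dropped rather than washed. `WashablePacket` and `packet_wash` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WashablePacket:
    """Hypothetical model of an RTP aggregation packet and its network-layer header."""
    header_size: int       # bytes outside the NALU payload (assumed constant)
    nalu_sizes: List[int]  # NALU sizes in payload order, most important first
    wash_allowance: int    # WashAllowance field from the network-layer header

    def size(self) -> int:
        return self.header_size + sum(self.nalu_sizes)


def packet_wash(pkt: WashablePacket, queue_space: int) -> Optional[WashablePacket]:
    """Return the packet to enqueue (possibly reduced), or None if it is dropped."""
    if pkt.size() <= queue_space:
        return pkt                  # steps 404/406: fits as received
    while pkt.wash_allowance > 0 and pkt.nalu_sizes:
        pkt.nalu_sizes.pop()        # step 408: remove the last (least important) NALU
        pkt.wash_allowance -= 1     # step 408: decrement WashAllowance by one
        if pkt.size() <= queue_space:
            return pkt              # steps 410/406: reduced packet fits
    return None                     # steps 412/414: allowance exhausted, drop


washed = packet_wash(WashablePacket(20, [500, 300, 200, 100], 2), queue_space=850)
print(washed)  # WashablePacket(header_size=20, nalu_sizes=[500, 300], wash_allowance=0)
print(packet_wash(WashablePacket(20, [500, 300, 200, 100], 2), queue_space=700))  # None
```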
[0015] A third aspect relates to a network apparatus, comprising: a memory configured to store instructions; and a processor coupled to the memory and configured to execute the instructions to perform the method of any of the preceding aspects.
[0016] Optionally, in any of the preceding aspects, another implementation of the aspect provides the memory comprises two or more memory components, and wherein the processor comprises two or more processor components.
[0017] A fourth aspect relates to a network apparatus, comprising: a receiving means for receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), wherein each of the plurality of NALUs contains a portion of a frame of video data; a processing means for generating a decreasing-order Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs; and a transmitting means for transmitting the RTP aggregation packet.
[0018] Optionally, in any of the preceding aspects, another implementation of the aspect provides the network apparatus is further configured to perform the method of any of the preceding aspects.
[0019] A fifth aspect relates to a network apparatus, comprising: a receiving means for receiving a decreasing-order Real-time Transport Protocol (RTP) aggregation packet; a processing means for generating a reduced RTP aggregation packet when a size of the received RTP aggregation packet is larger than an available space in an outbound queue of the network node; and a transmitting means for transmitting the reduced RTP aggregation packet when the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue.
[0020] Optionally, in any of the preceding aspects, another implementation of the aspect provides the network apparatus is further configured to perform the method of any of the preceding aspects.
[0021] For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
[0022] These and other features, and the advantages thereof, will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
[0024] FIG. 1 is a diagram of a telecommunication network including a network domain.
[0025] FIG. 2 is a diagram of a payload format for a Real-time Transport Protocol (RTP) aggregation packet according to an embodiment of the disclosure.
[0026] FIGS. 3A-3B illustrate methods for generating an RTP aggregation packet according to an embodiment of the disclosure.
[0027] FIG. 4 illustrates a method for performing a packet wash process on an RTP aggregation packet according to an embodiment of the disclosure.
[0028] FIG. 5 is a diagram of a network apparatus according to an embodiment of the disclosure.
[0029] FIG. 6 is a diagram of an apparatus configured to implement one or more of the methods described herein according to embodiments of the disclosure.
DETAILED DESCRIPTION
[0030] It should be understood at the outset that, although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0031] Packet payload formation and processing according to the disclosure avoids dropping an entire packet when a packet error or network congestion occurs, and instead selectively drops parts of the packet to reduce the packet size, such that the remainder of the packet may be able to reach its destination. Techniques according to the disclosure may be applied to RTP aggregation packets of NAL units (or NALUs) of video data encoded in the H.264/AVC, H.264/SVC, H.265/HEVC, or H.266/VVC formats. When a packet size is to be reduced in response to network congestion (a process referred to as “packet washing”), NALUs are successively removed from the end of the packet. Because the NALUs at the end of the packet are those with the least effect on the decoded image, this limits the loss of quality in the resulting image.
[0032] RTP aggregation packets according to the disclosure further include information indicating a threshold maximum number of NALUs that may be removed. If, after the threshold number of NALUs have been removed, the reduced packet size is still too large to fit in an outbound queue of a network element, the packet may be dropped.
[0033] FIG. 1 is a diagram of a telecommunication network 100 including a network domain 102. As shown, the telecommunications network 100 includes a plurality of network nodes 104, 106, 108, and 110. The network nodes 104-110 may be operating using layer three (L3) routing technology and, as such, may be referred to as L3 network nodes. While eight network nodes 104-110 are shown in the network domain 102, more or fewer nodes may be included in practical applications.
[0034] Each of the network nodes 104-110 is configured to send, receive, and/or route packets containing media content (e.g., streaming media content). Thus, one or more of the network nodes 104-110 may be a router, a switch, or a gateway. Network nodes 104, 106, and 110 are disposed at an edge of the network domain 102, and may therefore be referred to as edge nodes. The network nodes 104, 106, and 110 that receive packets from outside the network domain 102 may be referred to as an ingress network node (e.g., an ingress router). The network nodes 104, 106, and 110 that transmit packets out of the network domain 102 may be referred to as an egress network node (e.g., an egress router). Depending on the direction of packet traffic, each of the network nodes 104, 106, and 110 may function as an ingress network node, an egress network node, or both. Network nodes 108 are not on an edge of the network domain 102. Such nodes 108 may be referred to as internal nodes and may not be configured to communicate outside of the network domain 102 without passing through an edge node.
[0035] Each of the network nodes 104-110 has one or more neighbor network nodes. As used herein, a neighbor network node refers to a network node which is only one hop away from another network node. The network nodes 104-110 are coupled to, and communicate with each other, via links 120. The links 120 may be wired, wireless, electrical, optical, or some combination thereof. The pattern or arrangement of links 120 in FIG. 1 is for the purpose of illustration only. More or fewer links 120 coupling the network nodes 104-110 to each other in a different configuration may be used in practical applications.
[0036] As shown in FIG. 1, the network node 104 is coupled to a content source 130. The content source 130 may be a video source, e.g., a content provider that streams packets of video data for viewing by an end user (e.g., streaming television, video conferencing, etc.). The content source 130 is configured to provide packets of media content such as, for example, streaming media content, Moving Picture Experts Group (MPEG) video samples, etc. Thus, the network node 104 is configured to request and receive media content from the content source 130.
[0037] The content source 130 may utilize a video encoder (a.k.a., codec) capable of implementing various video coding techniques, including the H.264/AVC, H.264/SVC, H.265/HEVC, and H.266/VVC formats. The network node 104 may receive the encoded video in the NAL unit format, which provides packets of video data to be transmitted in packet-based systems using the RTP protocol. The network node 104 may receive the encoded video in existing-format single-time aggregation packets (STAPs), multi-time aggregation packets (MTAPs), or in RTP aggregation packets according to the disclosure. If H.264/AVC or H.264/SVC aggregation packets are received, the network node 104 may reorder the NAL units into RTP aggregation packets according to the disclosure.
[0038] The network node 110 is coupled to a destination node 140. The destination node 140 may represent a user equipment (UE) of an end user. The UE may be a smartphone, a tablet device, a laptop computer, and so on (e.g., a device configured to receive and display streaming video content). The destination node 140 is configured to consume the media content provided by the content source 130. For example, the end user may utilize their UE to request media content (e.g., a streaming media video). The request is routed by the network nodes 104-110 through the network domain 102 and delivered to the content source 130. In response to the request, the content source 130 transmits the requested media content to be routed by the network nodes 104-110 back through the network domain 102 and delivered to the destination node 140, where the requested media content may be consumed.
[0039] In order to decode the media content (encoded by the video encoder at the content source 130), the destination node 140 may utilize a video decoder (or codec). The video decoder of the destination node 140 is capable of implementing various video decoding techniques corresponding to those used by the video encoder of the content source 130.
[0040] The present disclosure focuses on the issue of network congestion. Network nodes 104-110 each contain one or more receivers, processors, memories, and transmitters. A packet is received by a receiver, processed by a processor, and stored in an outbound queue in memory until the packet can be transmitted by a transmitter. Network nodes 104-110 may include many receivers and transmitters, and may communicate many flows of packets at the same time. When a current packet is received and all transmitters are transmitting other packets, the current packet is retained in an outbound queue in memory until a transmitter is available to transmit the current packet. Network congestion occurs when a network node 104-110 receives packets faster than such packets can be transmitted. In such a case, outbound queues in the memory in the congested node fill up. Once outbound queues in the memory are full, the congested node must either discontinue storing incoming packets or must evict already received packets from outbound queues in the memory to make room for the new packets. This function is known as packet drop.
[0041] In some examples, network nodes 104-110 can be configured to perform a packet wash. In some embodiments, packets can be encoded with NALUs arranged according to an order of importance to the process of decoding and reconstructing a video signal. For example, a NALU header may include information indicating a relative importance of the NALU, as determined by the encoder of the video data. In some embodiments, the packets can be arranged with the less important NALUs toward the end of the packet and the more important NALUs towards the beginning of the packet. A packet wash according to the disclosure drops data from the end of the packet, continuing towards the front of the packet until the packet is small enough to be stored in an outbound queue for transmission. This can be done to reduce network congestion. In other embodiments, the NALUs are not ordered in the packet in an order of importance. Instead, the packet header (or packet metadata in some embodiments) includes information indicating the importance of each NALU. In such embodiments, a packet wash according to the disclosure removes NALUs from the packet based on the NALU importance information in the packet header or metadata. In either such embodiment, the video quality of the slice in the packet is reduced by less than if the packet were dropped entirely.
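In the embodiment where NALUs are not ordered by importance, the header or metadata carries each NALU's importance instead. One possible removal policy for that case is sketched below. It removes the least important NALU first, stops as soon as the packet fits, and gives up after a maximum number of removals. The disclosure does not specify the policy at this level of detail. The tie-break (later NALU first), the higher-is-more-important convention, and the function name are all assumptions.

```python
from typing import List, Optional


def wash_by_metadata(sizes: List[int], importance: List[int], overhead: int,
                     queue_space: int, max_removals: int) -> Optional[List[int]]:
    """Return payload indices of the NALUs to keep, or None if the packet is dropped.

    importance[i] is the per-NALU importance from the header or metadata;
    higher values are treated as more important (an assumption).
    """
    total = overhead + sum(sizes)
    removed = set()
    # Least important first; among equals, remove the later NALU first.
    for k in sorted(range(len(sizes)), key=lambda k: (importance[k], -k)):
        if total <= queue_space or len(removed) == max_removals:
            break
        removed.add(k)
        total -= sizes[k]
    if total > queue_space:
        return None
    return [k for k in range(len(sizes)) if k not in removed]


# Four NALUs in decoding order with unordered importance values.
print(wash_by_metadata([400, 100, 300, 200], [3, 0, 2, 1], overhead=20,
                       queue_space=800, max_removals=2))  # [0, 2]
```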
[0042] Because NALUs for an image may be split into multiple aggregation packets for transmission, information relating to the order for decoding the NALUs (the decoding order) is sent in an aggregation packet along with the NALUs themselves. Standards for each of the H.264/AVC, H.264/SVC, H.265/HEVC, and H.266/VVC video formats and any codecs that apply to such data structures define different techniques for sending such decoding order information.
[0043] NALUs that are received in RTP payloads of STAP-A or STAP-B format have identical NALU-times (the RTP timestamp value that the NALU would have if it were transported in its own RTP packet) and are arranged in sequential decoding order. A STAP-A payload does not include a decoding order number (DON) for its NALUs. A STAP-B payload includes a 16-bit unsigned DON, specifying a DON value for the first NALU in the payload. For each successive NAL unit in a STAP-B payload, the value of DON is equal to the value of DON of the previous NAL unit in the STAP-B plus 1, modulo 65536. NALUs that are received in RTP payloads of MTAP-16 or MTAP-24 format have individual timestamp offsets associated with each NALU that determine a decoding order for the NALUs.
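The STAP-B rule in [0043] reduces to a one-line derivation. This minimal sketch uses a hypothetical function name. It takes the DON field of the STAP-B payload and the number of NALUs, and returns each NALU's DON in payload (decoding) order.

```python
from typing import List


def stap_b_dons(first_don: int, count: int) -> List[int]:
    """DONs for a STAP-B: the first comes from the DON field, each next one is +1 mod 65536."""
    return [(first_don + k) % 65536 for k in range(count)]


print(stap_b_dons(41, 3))     # [41, 42, 43]
print(stap_b_dons(65534, 4))  # [65534, 65535, 0, 1]
```

The second call shows the 16-bit wraparound. The disclosure does not state how the "lowest DON" used as DONB is chosen when the DONs of one packet straddle that boundary.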
[0044] FIG. 2 is a diagram of a payload format 200 for an RTP aggregation packet according to an embodiment of the disclosure. The payload format 200 may be used with the NALUs received in a STAP-A or STAP-B aggregation packet. Once the NALUs of a STAP-A or STAP-B aggregation packet are rearranged in decreasing order of their importance in an RTP aggregation packet according to the disclosure, information about the NALUs' original decoding order would otherwise be lost.
[0045] The payload format 200 includes a 16-bit unsigned Decoding Order Number Base (DONB) field 202 that indicates the lowest DON value for any of the NALUs in the RTP aggregation packet according to the disclosure. With one exception, the fields for each NALU are as defined for the STAP-A and STAP-B aggregation packets. The exception is that the payload format 200 adds an 8-bit Decoding Order Number Difference (DOND) field 204 for each NALU of the RTP aggregation packet according to the disclosure. The DOND field 204 indicates the difference between the DON value of the associated NALU and DONB.
[0046] FIGS. 3A-3B illustrate methods 300 and 334 for generating an RTP aggregation packet according to an embodiment of the disclosure. The method 300 of FIG. 3A illustrates a method for generating an RTP aggregation packet according to the disclosure. The method 300 may be performed by a video source. The method 334 of FIG. 3B illustrates a method for performing one step (i.e., step 304) of the method 300 of FIG. 3A.
[0047] The method 300 of FIG. 3A begins with step 302, in which a module of a video source configured to generate an RTP aggregation packet according to embodiments of the disclosure receives a plurality of aggregation NALUs. Each aggregation NALU contains a portion of a frame of video data. In step 304, the video source module generates an RTP aggregation packet comprising the aggregation NALUs. In the payload of the RTP aggregation packet, the aggregation NALUs are ordered in decreasing order of their relative importance to the process of decoding a video stream. The relative importance of each aggregation NALU is determined by an encoder of the video stream. In step 306, the video source module transmits the RTP aggregation packet. The RTP aggregation packet may be transmitted by being stored in an outbound queue that is configured to transmit the packet, or by any other suitable process.
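Steps 302 to 306 can be outlined as a short Python sketch. The names `AggregationNalu`, `generate_aggregation_packet`, `transmit`, and `outbound_queue` are hypothetical, and a Python `deque` stands in for the outbound queue. The caller supplies a `rank` key in which a smaller key means more important, so the same ordering step works for codecs where a high field value marks importance (negate it) and for codecs where a low value does.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class AggregationNalu:
    data: bytes      # one NAL unit, including its NAL unit header
    importance: int  # encoder-assigned importance value (codec-specific field)


def generate_aggregation_packet(
    nalus: list[AggregationNalu],
    rank: Callable[[AggregationNalu], int],
) -> list[AggregationNalu]:
    """Step 304: most important NALU first, least important last.

    rank returns a sort key where a smaller key means more important. Python's
    sort is stable, so NALUs of equal importance keep their received order.
    """
    return sorted(nalus, key=rank)


outbound_queue: deque = deque()  # stands in for the egress queue


def transmit(packet: list[AggregationNalu]) -> None:
    """Step 306: store the packet in an outbound queue for transmission."""
    outbound_queue.append(packet)


# Step 302: NALUs as received; here a higher importance value means more important.
received = [AggregationNalu(b"\x01", 0), AggregationNalu(b"\x41", 2),
            AggregationNalu(b"\x67", 3)]
transmit(generate_aggregation_packet(received, rank=lambda n: -n.importance))
print([n.importance for n in outbound_queue[0]])  # [3, 2, 0]
```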
[0048] The method 334 of FIG. 3B provides further details of step 304 of the method 300 of FIG. 3A. In various embodiments, the plurality of aggregation NALUs is received by the video source module in an RTP single-time aggregation packet (STAP) of format STAP-A or STAP-B, or in an RTP multi-time aggregation packet (MTAP) of format MTAP-16 or MTAP-24.
[0049] For video data encoded using the H.264 format (both AVC and SVC), the NAL unit header includes a NAL reference indication (NRI) field indicating the relative importance of the associated NAL unit to the process of decoding the encoded video signal, as determined by the encoder that encoded the video signal. A higher NRI value indicates a NALU of higher importance, and the value of the NRI field serves as the importance value of the H.264 NALU. In some embodiments of the disclosure, NALUs of H.264/AVC or H.264/SVC format are ordered in the payload of the RTP aggregation packet in decreasing order of NRI value.
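For H.264, the NRI is the two bits that follow the forbidden_zero_bit in the one-byte NAL unit header, so the ordering in [0049] needs only the first byte of each NALU. A minimal Python sketch with hypothetical function names:

```python
def h264_nri(nalu: bytes) -> int:
    """nal_ref_idc (NRI): bits 6-5 of the one-byte H.264 NAL unit header.

    Header layout: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5).
    """
    return (nalu[0] >> 5) & 0x03


def order_h264(nalus: list[bytes]) -> list[bytes]:
    """Decreasing NRI; the stable sort keeps equal-NRI NALUs in decoding order."""
    return sorted(nalus, key=lambda n: -h264_nri(n))


# 0x67 = SPS (NRI 3), 0x41 = non-IDR slice (NRI 2), 0x01 / 0x06 = NRI 0
received = [b"\x01\xaa", b"\x41\xbb", b"\x67\xcc", b"\x06\xdd"]
print([hex(n[0]) for n in order_h264(received)])  # ['0x67', '0x41', '0x1', '0x6']
```

Because the sort is stable, the two NRI-0 units stay in the order in which they arrived.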
[0050] For video data encoded using the H.265/HEVC format, the NAL unit header includes a TemporalId (TID) field, indicating the relative importance of the associated NAL unit to the process of decoding the encoded video signal, as determined by the encoder that encoded the video signal. A lower value of TID indicates a NALU of higher importance. The value of the TID field indicates an importance value of the H.265/HEVC NALU. NALUs of H.265/HEVC format are ordered in the payload of the RTP aggregation packet in increasing order of TID values, in some embodiments of the disclosure.
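In the two-byte H.265/HEVC NAL unit header, the last three bits carry nuh_temporal_id_plus1, so TemporalId is that value minus one. The following Python sketch (function names hypothetical) extracts TID and orders NALUs by increasing TID, as described in [0050].

```python
def hevc_temporal_id(nalu: bytes) -> int:
    """TemporalId from the two-byte H.265/HEVC NAL unit header.

    Header layout: F(1) | nal_unit_type(6) | nuh_layer_id(6) |
    nuh_temporal_id_plus1(3), so the low three bits of byte 2 hold TID + 1.
    """
    tid_plus1 = nalu[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be zero")
    return tid_plus1 - 1


def order_hevc(nalus: list[bytes]) -> list[bytes]:
    """Increasing TemporalId: a lower TID means a more important NALU."""
    return sorted(nalus, key=hevc_temporal_id)


# TRAIL_R slices (nal_unit_type 1, so first byte 0x02) at TIDs 2, 0, 1.
received = [b"\x02\x03", b"\x02\x01", b"\x02\x02"]
print([hevc_temporal_id(n) for n in order_hevc(received)])  # [0, 1, 2]
```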
[0051] For video data encoded using the H.266/VVC format, the NAL unit header includes a TID field indicating the relative importance of the associated NAL unit, with a lower TID value indicating a NALU of higher importance. An H.266/VVC NAL unit header also includes a LayerId field, where a lower LayerId value likewise indicates a NALU of higher importance. To generate a combined importance value representing the importance of an H.266/VVC NALU, the TID and LayerId values of the NALU may be combined by multiplication, by addition, or by another function that preserves the property that a lower combined importance value indicates a NALU of higher importance. In some embodiments of the disclosure, NALUs of H.266/VVC format are ordered in the payload of the RTP aggregation packet in increasing order of the combined importance value.
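The H.266/VVC header places nuh_layer_id in the low six bits of the first byte and nuh_temporal_id_plus1 in the low three bits of the second. The following minimal Python sketch (function names hypothetical) computes the combined importance value with either of the two combinations named in [0051] and orders the NALUs by it.

```python
import operator
from typing import Callable


def vvc_tid_and_layer(nalu: bytes) -> tuple[int, int]:
    """(TemporalId, LayerId) from the two-byte H.266/VVC NAL unit header.

    Layout: forbidden_zero_bit(1) | nuh_reserved_zero_bit(1) | nuh_layer_id(6)
            | nal_unit_type(5) | nuh_temporal_id_plus1(3)
    """
    layer_id = nalu[0] & 0x3F
    tid = (nalu[1] & 0x07) - 1
    return tid, layer_id


def order_vvc(nalus: list[bytes],
              combine: Callable[[int, int], int] = operator.add) -> list[bytes]:
    """Increasing combined importance value; lower means more important."""
    return sorted(nalus, key=lambda n: combine(*vvc_tid_and_layer(n)))


# (TID, LayerId): a=(2, 0), b=(0, 1), c=(1, 1), d=(0, 0)
a, b, c, d = bytes([0, 3]), bytes([1, 1]), bytes([1, 2]), bytes([0, 1])
print(order_vvc([a, b, c, d]) == [d, b, a, c])                # True (addition)
print(order_vvc([a, b, c, d], operator.mul) == [a, b, d, c])  # True (multiplication)
```

The two combinations can order the same NALUs differently. With multiplication, every NALU whose TID or LayerId is 0 scores 0, so `a` (TID 2) ties with `d` (TID 0, LayerId 0). Addition keeps both identifiers in play.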
[0052] In step 310 of FIG. 3B, the video source module orders the plurality of aggregation NALUs received in step 302 in decreasing order of importance within the RTP aggregation packet, based on each NALU's importance value or combined importance value. Steps 312 and 314 are shown as optional because they are performed only for NALUs received in STAP-A or STAP-B format packets. In step 312, a DONB value is stored in the DONB field of the RTP aggregation packet payload. In step 314, a DOND value is added for each NALU in the payload of the RTP aggregation packet. The DOND value reflects the decoding order of the NALU in the payload of the STAP-A or STAP-B packet in which the H.264/AVC or H.264/SVC NALUs were received. Steps 312 and 314 may be performed before or after the NALUs are ordered in step 310.
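Steps 310 to 314 can be sketched in Python for NALUs taken from a STAP-A or STAP-B packet. The helper names are hypothetical, and the sketch makes two assumptions. First, because successive STAP NALUs have consecutive DONs, each NALU's DOND is taken to be its original index. Second, DONB is the STAP-B DON, or 0 for STAP-A, which carries no DON. The receiver-side function shows that sorting on DOND restores the original decoding order after the importance-based reordering.

```python
from typing import Callable, Optional


def build_reordered_payload(
    stap_nalus: list[bytes],
    rank: Callable[[bytes], int],
    stap_b_don: Optional[int] = None,
) -> tuple[int, list[tuple[int, bytes]]]:
    """Steps 310-314 for NALUs given in their STAP (decoding) order.

    Successive STAP NALUs have consecutive DONs, so each NALU's DOND is its
    original index. DONB is the STAP-B DON; STAP-A has no DON (assumed 0).
    """
    if len(stap_nalus) > 256:
        raise ValueError("an 8-bit DOND can tag at most 256 NALUs")
    donb = 0 if stap_b_don is None else stap_b_don
    tagged = list(enumerate(stap_nalus))         # (DOND, NALU) pairs
    tagged.sort(key=lambda pair: rank(pair[1]))  # stable: ties stay in DON order
    return donb, tagged


def restore_decoding_order(tagged: list[tuple[int, bytes]]) -> list[bytes]:
    """Receiver side: sorting by DOND recovers the original decoding order."""
    return [nalu for _, nalu in sorted(tagged, key=lambda pair: pair[0])]


def nri_rank(nalu: bytes) -> int:
    return -((nalu[0] >> 5) & 0x03)  # H.264: a higher NRI sorts first


stap = [b"\x06", b"\x67", b"\x41"]  # NRI 0, 3, 2
donb, payload = build_reordered_payload(stap, nri_rank, stap_b_don=500)
print(donb, [(dond, nalu.hex()) for dond, nalu in payload])
# 500 [(1, '67'), (2, '41'), (0, '06')]
print(restore_decoding_order(payload) == stap)  # True
```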
[0053] In some embodiments, in step 316, an importance threshold value, a WashAllowance value, and a list of the sizes of the aggregation NALUs are added to a network layer header of the RTP aggregation packet. The importance threshold value indicates a threshold for the importance values (or combined importance values) of aggregation NALUs that are to be retained in the RTP aggregation packet during a later packet wash procedure, if such a procedure is applied. For AVC or SVC NALUs, the importance threshold value is the minimum NRI value of the NALUs that should be retained in the RTP aggregation packet. For HEVC and VVC NALUs, the importance threshold value is the maximum TID value or the maximum combined importance value, respectively, of the NALUs that should be retained in the RTP aggregation packet.
[0054] The WashAllowance value indicates the maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during the packet wash procedure. The WashAllowance value is set equal to the number of NALUs whose importance value (or combined importance value) is smaller (or larger, as appropriate) than the packet's importance threshold value. For AVC or SVC NALUs, the WashAllowance value is the number of NALUs in the packet whose NRI value is smaller than the importance threshold value. For HEVC and VVC NALUs, the WashAllowance value is the number of NALUs in the packet whose TID value (or combined importance value) is larger than the importance threshold value.
[0055] The list of sizes of the aggregation NALUs is ordered to correspond to the order of the associated aggregation NALUs in the payload of the RTP aggregation packet. This list of sizes may allow a packet wash operation to be performed using only the information stored in the network layer header of the RTP aggregation packet, rather than requiring access to the individual NALUs in the packet payload.
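The header fields of [0053] to [0055] can be computed as in the following Python sketch. The dataclass and function names are hypothetical. The threshold is taken as an input because the text does not say how it is chosen, and each size is assumed to include the NAL unit header. Because the payload is already ordered most-important-first, the NALUs counted by WashAllowance are exactly the tail of the payload, which is what allows a node to wash from the end.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WashHeaderFields:
    importance_threshold: int
    wash_allowance: int
    nalu_sizes: list[int]  # same order as the NALUs in the payload


def build_wash_fields(
    ordered_nalus: list[bytes],
    importance: Callable[[bytes], int],
    threshold: int,
    higher_is_more_important: bool,
) -> WashHeaderFields:
    """Step 316 for NALUs already ordered most-important-first.

    H.264 AVC/SVC (NRI): higher_is_more_important=True, and NALUs below the
    threshold are washable. HEVC (TID) and VVC (combined value): False, and
    NALUs above the threshold are washable.
    """
    if higher_is_more_important:
        washable = sum(1 for n in ordered_nalus if importance(n) < threshold)
    else:
        washable = sum(1 for n in ordered_nalus if importance(n) > threshold)
    return WashHeaderFields(threshold, washable, [len(n) for n in ordered_nalus])


def nri(nalu: bytes) -> int:
    return (nalu[0] >> 5) & 0x03


ordered = [b"\x67" + bytes(9), b"\x41" + bytes(1199), b"\x01" + bytes(799)]
print(build_wash_fields(ordered, nri, threshold=2, higher_is_more_important=True))
# WashHeaderFields(importance_threshold=2, wash_allowance=1, nalu_sizes=[10, 1200, 800])
```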
[0056] FIG. 4 illustrates a method 400 for performing a packet wash process on an RTP aggregation packet according to an embodiment of the disclosure. In step 402, the network node receives an RTP aggregation packet. In step 404, the network node determines a port of the node by which the packet will be forwarded and determines whether an outbound queue for the port has sufficient space to store the received packet. If the outbound queue has sufficient space, in step 406 the packet is stored in the outbound queue, where the outbound queue is configured to transmit the packet.
[0057] If it is determined in step 404 that the outbound queue for the port does not have sufficient space to store the received packet, a packet wash process begins. In step 408, the last NALU of the packet (i.e., the NALU of lowest importance) is removed from the packet, and the WashAllowance value in the network layer header of the packet is decremented by one. In step 410, the network node determines whether the outbound queue for the port has sufficient space to store the reduced-size packet. If so, in step 406, the reduced-size packet is stored in the outbound queue for transmission.
[0058] If the outbound queue does not have sufficient space to store the reduced-size packet, in step 412, the network node determines whether the WashAllowance value in the network layer header of the packet equals zero, indicating that the maximum allowed number of NALUs has been removed from the packet. If so, in step 414, the packet is dropped. If the WashAllowance value is greater than zero, the method returns to step 408, and another NALU is removed from the packet. In this way, the packet wash procedure continues until either (i) the size of the reduced-size packet is less than or equal to the available space in the outbound queue, or (ii) the maximum allowed number of NALUs has been removed from the packet and the reduced-size packet is still too large for the outbound queue. In the first outcome, the reduced-size packet is forwarded toward its destination. In the second outcome, the reduced-size packet is dropped.
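The method 400 of FIG. 4 can be sketched in Python as follows, working only from the network layer header fields as [0055] suggests. All names are hypothetical, and the sketch simplifies in several ways. It assumes a fixed header overhead and models a removed NALU by dropping the last entry of the size list. A real node would also truncate the payload. The WashAllowance test of step 412 is placed at the top of the loop, so a packet that arrives with WashAllowance 0 and does not fit is dropped without removing a NALU. The text does not address that case.

```python
from dataclasses import dataclass


@dataclass
class WashablePacket:
    fixed_overhead: int    # assumption: header bytes that do not change on wash
    wash_allowance: int    # WashAllowance from the network layer header
    nalu_sizes: list[int]  # size list from the header, most important first

    def size(self) -> int:
        return self.fixed_overhead + sum(self.nalu_sizes)


def forward(packet: WashablePacket, free_bytes: int, queue: list) -> str:
    """Method 400 against one outbound queue with free_bytes of space."""
    if packet.size() <= free_bytes:      # step 404
        queue.append(packet)             # step 406
        return "forwarded"
    while packet.wash_allowance > 0:     # step 412, checked before each removal
        packet.nalu_sizes.pop()          # step 408: drop the last NALU ...
        packet.wash_allowance -= 1       # ... and decrement WashAllowance
        if packet.size() <= free_bytes:  # step 410
            queue.append(packet)         # step 406
            return "washed"
    return "dropped"                     # step 414


q: list = []
print(forward(WashablePacket(40, 2, [500, 300, 200, 100]), 900, q))  # washed
print(forward(WashablePacket(40, 2, [500, 300, 200, 100]), 700, q))  # dropped
print(q[0].nalu_sizes, q[0].wash_allowance)                          # [500, 300] 0
```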
[0059] FIG. 5 is a diagram of a network apparatus 500 (e.g., a video source, a router, a network node, etc.) according to an embodiment of the disclosure. The network apparatus 500 is suitable for implementing the disclosed embodiments as described herein. The network apparatus 500 comprises ingress ports/ingress means 510 coupled to receiver units (Rx)/receiving means 520 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 530 (coupled to the Rx/receiving means 520) to process the data; transmitter units (Tx)/transmitting means 540 and egress ports/egress means 550 (coupled to the processor/processing means 530) for transmitting the data; and a memory/memory means 560 (coupled to the processor/processing means 530) for storing the data. The network apparatus 500 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 510, the receiver units/receiving means 520, the transmitter units/transmitting means 540, and the egress ports/egress means 550 for egress or ingress of optical or electrical signals.
[0060] The processor/processing means 530 is implemented by hardware and software. The processor/processing means 530 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or digital signal processors (DSPs). The processor/processing means 530 is in communication with the ingress ports/ingress means 510, receiver units/receiving means 520, transmitter units/transmitting means 540, egress ports/egress means 550, and memory/memory means 560. The processor/processing means 530 comprises an RTP packet aggregation module 570. In some embodiments, the RTP packet aggregation module 570 is stored in onboard memory of the processor/processing means 530. The RTP packet aggregation module 570 is able to implement the method 300 as described with reference to FIGS. 3A-3B. The processor/processing means 530 may further comprise an RTP aggregation packet wash module 572. In some embodiments, the RTP aggregation packet wash module 572 is stored in onboard memory of the processor/processing means 530. The RTP aggregation packet wash module 572 is able to implement, for example, the method 400 as described with reference to FIG. 4. The inclusion of the RTP packet aggregation module 570 and/or the RTP aggregation packet wash module 572 therefore provides a substantial improvement to the functionality of the network apparatus 500 and effects a transformation of the network apparatus 500 to a different state. Alternatively, the RTP packet aggregation module 570 and/or the RTP aggregation packet wash module 572 are implemented as instructions stored in the memory/memory means 560 and executed by the processor/processing means 530.
[0061] The network apparatus 500 may also include input and/or output (I/O) devices/I/O means 580 for communicating data to and from a user. The I/O devices or means 580 may be coupled to the processor/processing means 530. The I/O devices or I/O means 580 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc. The I/O devices or means 580 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such devices.
[0062] The memory/memory means 560 comprises one or more disks, tape drives, or solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, to store instructions that are read during program execution, and to store data used or generated during execution. The memory/memory means 560 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
[0063] FIG. 6 illustrates a network apparatus 600 configured to implement one or more of the methods for RTP packet aggregation and/or RTP aggregation packet wash described herein. For example, the network apparatus 600 is configured to implement the methods described with reference to FIGS. 3A-3B and 4. The network apparatus 600 may be implemented in the network apparatus 500. The network apparatus 600 comprises a means 602 for generating an RTP aggregation packet according to embodiments of the disclosure, as described with reference to the methods of FIGS. 3A-3B. The network apparatus 600 may further comprise a means 604 for packet washing an RTP aggregation packet according to embodiments of the disclosure, as described with reference to the method of FIG. 4.
[0064] The disclosed embodiments may be a system, an apparatus, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure. The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the disclosure is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
[0065] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

Claims

CLAIMS What is claimed is:
1. A method implemented in a video source, the method comprising: receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), each aggregation NALU containing a portion of a frame of video data; generating a Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs, the aggregation NALUs being ordered in a payload of the RTP aggregation packet in decreasing order of their relative importance to the process of decoding a video stream, the relative importance of an individual aggregation NALU being determined by an encoder of the video data; and transmitting the RTP aggregation packet.
2. The method of claim 1, wherein: the receiving the plurality of aggregation NALUs comprises receiving an aggregation packet in which the aggregation NALUs are arranged in a sequential decoding order; and the generating the RTP aggregation packet comprises ordering the aggregation NALUs in decreasing order of relative importance.
3. The method of any of claims 1-2, wherein the payload of the RTP aggregation packet comprises: a decoding order number (DON) Base (DONB) value comprising a lowest DON value for the aggregation NALUs in the RTP aggregation packet; and for each aggregation NALU, a Decoding Order Number Difference (DOND) value, indicating a difference between a DON value of the received NALU and the DONB value.
4. The method of any of claims 1-3, wherein the DON value of the aggregation NALU equals DONB + DOND.
5. The method of any of claims 1-4, wherein a network layer header that encapsulates the RTP aggregation packet comprises: an importance threshold value indicating a threshold value for importance values of aggregation NALUs to be retained in the RTP aggregation packet during a packet wash procedure; a WashAllowance value indicating a maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during a packet wash procedure; and a list of sizes of the aggregation NALUs in the RTP aggregation packet, the list ordered in an order corresponding to the order of the aggregation NALUs in the RTP aggregation packet payload.
6. The method of any of claims 1-5, wherein: the aggregation NALUs comprise video data encoded using one of an Advanced Video Coding (AVC) standard or a Scalable Video Coding (SVC) standard; the importance value of each aggregation NALU comprises a NAL reference indication (NRI) value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in decreasing order of NRI values; the importance threshold value comprises a minimum NRI value to be retained in the RTP aggregation packet during the packet wash procedure; and the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose NRI value is less than the importance threshold value.
7. The method of any of claims 1-6, wherein: the aggregation NALUs comprise video data encoded using a High Efficiency Video Coding (HEVC) standard; the importance value of each aggregation NALU comprises a TemporalId (TID) value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of TID values; the importance threshold value comprises a maximum TID value to be retained in the RTP aggregation packet during the packet wash procedure; and the WashAllowance value is equal to a number of aggregation NALUs in the RTP aggregation packet whose TID value is greater than the importance threshold value.
8. The method of any of claims 1-7, wherein: the aggregation NALUs comprise video data encoded using a Versatile Video Coding (VVC) standard; the importance value of each aggregation NALU comprises a combined importance value based on combining a TID value and a LayerId value determined by the encoder of the video data; the aggregation NALUs are ordered in the payload of the RTP aggregation packet in increasing order of combined importance values; the importance threshold value comprises a maximum combined importance value to be retained in the RTP aggregation packet during the packet wash procedure; and the maximum number of aggregation NALUs that may be removed from the RTP aggregation packet during the packet wash procedure is equal to a number of NALUs in the RTP aggregation packet whose combined importance value is greater than the importance threshold value.
9. The method of any of claims 1-8, wherein the combined importance value of the aggregation NALU is the TID value of the aggregation NALU multiplied by the LayerId value of the aggregation NALU.
10. A method of performing a packet wash on a Real-time Transport Protocol (RTP) aggregation packet, implemented in a network node, the RTP aggregation packet comprising a plurality of received Network Abstraction Layer (NAL) units (NALUs), the method comprising: receiving an RTP aggregation packet; performing the packet wash when a size of the RTP aggregation packet is larger than an available space in an outbound queue of the network node, the packet wash comprising: generating a reduced RTP aggregation packet by removing, beginning with a last received NALU, one or more received NALUs from the payload of the reduced RTP aggregation packet; terminating the packet wash procedure when one of (i) the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue, or (ii) a total number of received NALUs removed from the payload of the reduced RTP aggregation packet is equal to a maximum number of received NALUs that may be removed from the received RTP aggregation packet during the packet wash procedure; and storing the reduced size packet in the outbound queue when the packet wash procedure is terminated if the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue, wherein the outbound queue is configured to transmit the reduced size packet.
11. The method of claim 10, wherein: a network layer header that encapsulates the RTP aggregation packet comprises a WashAllowance value indicating a maximum number of received NALUs that may be removed from the received RTP aggregation packet during the packet wash procedure; the removing a received NALU from the payload of the reduced RTP aggregation packet comprises decrementing by one the WashAllowance value in the network layer header; and the total number of received NALUs removed from the payload is equal to the maximum number of received NALUs that may be removed when the WashAllowance value in the network layer header equals zero.
12. A network apparatus, comprising: a memory configured to store instructions; and a processor coupled to the memory and configured to execute the instructions to perform the method of any of claims 1-11.
13. The network apparatus of claim 12, wherein the memory comprises two or more memory components and the processor comprises two or more processor components.
14. A non-transitory computer readable medium comprising a computer program product for use by an electronic device, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium that, when executed by one or more processors, cause the electronic device to execute the method of any of claims 1-11.
15. A network apparatus, comprising: a receiving means for receiving a plurality of aggregation Network Abstraction Layer (NAL) units (NALUs), each aggregation NALU of the plurality of aggregation NALUs containing a portion of a frame of video data; a processing means for generating a decreasing-order Real-time Transport Protocol (RTP) aggregation packet comprising the aggregation NALUs; and a transmitting means for transmitting the RTP aggregation packet.
16. The network apparatus of claim 15, wherein the network apparatus is further configured to perform the method of any of claims 1-9.
17. A network apparatus, comprising: a receiving means for receiving a decreasing-order Real-time Transport Protocol (RTP) aggregation packet; a processing means for generating a reduced RTP aggregation packet when a size of the received RTP aggregation packet is larger than an available space in an outbound queue of the network node; and a transmitting means for transmitting the reduced RTP aggregation packet when the size of the reduced RTP aggregation packet is less than or equal to the available space in the outbound queue.
18. The network apparatus of claim 17, wherein the network apparatus is further configured to perform the method of any of claims 10-11.
PCT/US2023/021254 2023-05-05 2023-05-05 Packet wash of rtp aggregation packets in a video stream WO2023168133A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2023/021254 WO2023168133A2 (en) 2023-05-05 2023-05-05 Packet wash of rtp aggregation packets in a video stream

Publications (2)

Publication Number Publication Date
WO2023168133A2 true WO2023168133A2 (en) 2023-09-07
WO2023168133A3 WO2023168133A3 (en) 2024-02-01

Family

ID=86732131
