WO2024058782A1 - Group of pictures affected packet drop - Google Patents

Publication number: WO2024058782A1
Authority: WO (WIPO, PCT)
Application number: PCT/US2022/043664
Other languages: French (fr)
Inventor: Lijun Dong
Original Assignee: Futurewei Technologies, Inc.
Application filed by Futurewei Technologies, Inc.

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455: Structuring of content, e.g. decomposing content into time segments, involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00: Traffic control in data switching networks
    • H04L47/10: Flow control; Congestion control
    • H04L47/32: Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60: Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63: Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STBs; Communication protocols; Addressing
    • H04N21/647: Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64784: Data processing by the network
    • H04N21/64792: Controlling the complexity of the content stream, e.g. by dropping packets

Definitions

  • the present disclosure is generally related to video streaming in telecommunication networks and is specifically related to packet dropping or partial packet dropping in telecommunications networks.
  • Bursty loss and longer-than-expected delay, usually caused by network congestion, have catastrophic effects on end-user quality of experience (QoE) in video streaming. Although many congestion control mechanisms have been developed over the decades, they often target different goals, e.g., link utilization improvement, loss reduction, or fairness enhancement. For media streaming, the possibility of network congestion can often be minimized by rate control and video adaptation methods.
  • each packet in a video stream may include a loss propagation depth (LPD).
  • An LPD value indicates a number of frames that directly or indirectly rely on video data from a current frame in a current packet to be decodable.
  • frames can be organized into multiple groups of pictures (GOPs). Each GOP includes one I frame and a series of P and/or B frames.
  • each P frame and B frame directly or indirectly relies on the I frame in the same GOP.
  • a number of B frames (e.g., four) in a subsequent GOP may also directly or indirectly rely on the current frame.
  • when no such cross-GOP references exist, the LPD value may be set to the number of frames in the GOP.
  • otherwise, the LPD value may be set to the number of frames in the current GOP plus four.
  • the LPD value is set as the number of frames that directly or indirectly reference the current frame plus one.
  • the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
  • a first aspect relates to a method implemented in a network node, the method comprising: receiving a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data; obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
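The lowest-LPD drop selection of the first aspect can be sketched in a few lines of Python; the `Packet` fields and function name below are illustrative, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    frame_id: int
    lpd: int  # loss propagation depth carried as packet metadata

def select_drop_candidate(packets):
    # Dropping the packet with the lowest LPD disturbs the fewest
    # dependent frames in the video stream.
    return min(packets, key=lambda p: p.lpd)

queue = [Packet(frame_id=1, lpd=8),
         Packet(frame_id=2, lpd=3),
         Packet(frame_id=3, lpd=1)]
victim = select_drop_candidate(queue)  # frame 3: no other frame depends on it
```

Under congestion, the node would wash or drop `victim` before any higher-LPD packet in the queue.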
  • each of the plurality of packets contains an LPD field including metadata indicating a corresponding LPD value.
  • another implementation of the aspect provides reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
  • each of the plurality of packets contains a reference frame identifier (ID), and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
  • each of the plurality of packets contains a frame identifier (ID), and wherein the frame ID indicates the current frame associated with the current packet.
  • a second aspect relates to a method implemented in a network node, the method comprising: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and transmitting the plurality of packets.
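Determining LPD values from frame dependencies, as in the second aspect, amounts to counting direct and indirect dependents of each frame. A minimal Python sketch, assuming the sender knows each frame's reference list (the function name and GOP layout are illustrative):

```python
def lpd_values(deps):
    """deps maps each frame ID to the list of frame IDs it references.
    LPD(f) = 1 + number of frames that directly or indirectly reference f."""
    def dependents(f):
        seen, stack = set(), [f]
        while stack:
            cur = stack.pop()
            for g, refs in deps.items():
                if cur in refs and g not in seen:
                    seen.add(g)
                    stack.append(g)
        return seen
    return {f: 1 + len(dependents(f)) for f in deps}

# A small GOP: I(1) <- P(2) <- P(3), with B(4) referencing frames 2 and 3.
gop = {1: [], 2: [1], 3: [2], 4: [2, 3]}
lpds = lpd_values(gop)  # I frame gets the highest LPD, the B frame gets 1
```

The resulting values would then be written into an LPD metadata field of each packet before transmission.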
  • another implementation of the aspect provides that the LPD values are encoded into LPD fields in each of the plurality of packets.
  • another implementation of the aspect provides encoding a reference frame identifier (ID) into each of the plurality of packets, and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
  • another implementation of the aspect provides encoding a frame identifier (ID) into each of the plurality of packets, and wherein the frame ID indicates the current frame associated with the current packet.
  • another implementation of the aspect provides that the reference frame IDs and the frame IDs support reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
  • a third aspect relates to a network node comprising a processor, a receiver coupled to the processor, a memory coupled to the processor, and a transmitter coupled to the processor, wherein the processor, receiver, memory, and transmitter are configured to perform the method of any of the preceding aspects.
  • a fourth aspect relates to a non-transitory computer readable medium comprising a computer program product for use by a router, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that when executed by a processor cause the router to perform the method of any of the preceding aspects.
  • a fifth aspect relates to a network device comprising: a receiving means for receiving a plurality of packets, wherein each of the plurality of packets contains a frame of video data; and a processing means for: obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
  • another implementation of the aspect provides that the network device is further configured to perform the method of any of the preceding aspects.
  • a sixth aspect relates to a network device comprising: a processing means for: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and a transmitting means for transmitting the plurality of packets.
  • another implementation of the aspect provides that the network device is further configured to perform the method of any of the preceding aspects.
  • any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
  • FIG. 1 is a schematic diagram of an example telecommunication network including a network domain.
  • FIG. 2 is a schematic diagram illustrating an example process of uni-directional inter prediction using a plurality of frames.
  • FIG. 3 is a schematic diagram illustrating an example process of bidirectional inter prediction using a plurality of frames.
  • FIG. 4 is a schematic diagram illustrating an example packet configured to carry metadata to support packet washing and/or dropping based on loss propagation depth (LPD).
  • FIG. 5 is an example method of performing a packet wash based on LPD.
  • FIG. 6 is an example method of performing a downgrade of packet priority for a packet carrying a frame that is dependent on another frame in another packet that has been washed and/or dropped.
  • FIG. 7 is an example method of encoding packets to support packet wash based on LPD and priority downgrades for dependent packets.
  • FIG. 8 is an example method of performing a packet wash based on LPD and priority downgrades for dependent packets.
  • FIG. 9 is a schematic diagram of a network apparatus according to an embodiment of the disclosure.
  • a packet wash is an example mechanism that can be applied to streams of video data in order to selectively reduce the amount of video data, rather than completely dropping video data, when network resources are insufficient to correctly route the data.
  • a packet wash can be defined as an operation that selectively drops chunks of packet data to reduce packet payload size, for example in the event of network congestion.
  • the packet wash process may use information related to the way a video stream is encoded to determine which information has the least effect on playback at the destination. The packet wash process can then prioritize dropping the least important information first.
  • the video packets may each contain an image frame.
  • each packet can be organized into chunks such that data in the first chunk of the packet can be used to decode the frame at a low resolution and each subsequent chunk provides enough information to decode the frame at a higher resolution.
  • the packet wash process can drop chunks from the end of the packet in response to network congestion rather than dropping the entire packet. Accordingly, the destination receives some packets with frames at lower quality rather than receiving a video stream that is completely missing some packets and frames. In such an example, the packets are only completely dropped as a last resort to overcome congestion.
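The chunk-based wash described above (a base chunk that decodes a low-resolution frame, followed by refinement chunks) can be sketched as follows; the chunk contents and function name are illustrative assumptions:

```python
def packet_wash(chunks, max_size):
    """Drop chunks from the tail (least important first) until the payload
    fits; an empty result means the whole packet must be dropped."""
    kept = list(chunks)
    while kept and sum(len(c) for c in kept) > max_size:
        kept.pop()  # tail chunks only refine quality; drop them first
    return kept

# First chunk decodes a low-resolution frame; later chunks refine it.
chunks = [b"base", b"refine1", b"refine2"]
washed = packet_wash(chunks, max_size=12)  # keeps base + first refinement
```

Only when even the base chunk does not fit does the wash degenerate into a full packet drop, matching the last-resort behavior described above.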
  • network congestion avoidance mechanisms can provide a priority to each packet and wash and/or drop packets based on priority.
  • a video stream may include intra prediction (I) frames, unidirectional prediction (P) frames, and bidirectional (B) frames.
  • I frame is independently decodable
  • P frame is decodable based on an I frame
  • B frame is decodable based on an I frame and/or a P frame.
  • an I frame can be given a highest priority because dropping an I frame affects the decodability of related P and B frames.
  • P frames can then be given a lower priority because dropping a P frame affects the decodability of related B frames.
  • B frames can then be given the lowest priority because dropping a B frame generally does not affect other frames.
  • packets can be completely dropped based on priority.
  • each packet can be assigned a priority based on the relative importance of the packet as discussed above.
  • the network node can begin by dropping packets of the lowest priority first. Packets of subsequently higher priority can then be dropped as needed. In this way, disruption of the video stream is minimized.
  • packet wash and packet drop may be applied together. For example, if a packet contains a frame or parts of a frame (e.g., slices), the network node may apply packet dropping. However, if the packet contains multiple frames, the network node can wash the packet by dropping some, but not all, of the frames in the packet.
  • each packet in a video stream may include an LPD.
  • An LPD value indicates a number of frames that directly or indirectly rely on video data from a current frame in a current packet to be decodable.
  • frames can be organized into multiple groups of pictures (GOPs). Each GOP includes one I frame and a series of P and/or B frames.
  • each P frame and B frame directly or indirectly relies on the I frame in the same GOP.
  • a number of B frames (e.g., four) in a subsequent GOP may also directly or indirectly rely on the current frame.
  • when no such cross-GOP references exist, the LPD value may be set to the number of frames in the GOP.
  • otherwise, the LPD value may be set to the number of frames in the current GOP plus four.
  • the LPD value is set as the number of frames that directly or indirectly reference the current frame plus one.
  • the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
  • each packet may contain a frame identifier (ID) and a reference frame ID.
  • the frame ID indicates the frame contained in the current packet and the reference frame ID indicates the list of frame IDs that the current frame relies upon to be decodable.
  • a network node can keep track of the frame ID of any packet that is washed or dropped. Further, when a current packet contains a reference ID that includes a frame ID of a dropped packet, this indicates that the frame in the current packet is unlikely to be decodable at the destination absent a retransmission of the dropped packet. Accordingly, the network node can reduce the priority of the current packet to the lowest available priority and wash or drop the current packet before washing or dropping other packets that are still decodable. In this way, the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
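The frame-ID tracking and priority downgrade described above can be sketched in Python; the `VideoPacket` fields, priority scale, and function name are illustrative, not from the disclosure:

```python
from dataclasses import dataclass

@dataclass
class VideoPacket:
    frame_id: int
    ref_ids: list   # reference frame IDs the carried frame depends on
    priority: int   # higher value = more important

def downgrade_dependents(packets, dropped_ids, lowest=0):
    # A frame whose reference frame was dropped is unlikely to be
    # decodable at the destination, so its packet is demoted to the
    # lowest priority and becomes the next wash/drop candidate.
    for p in packets:
        if dropped_ids.intersection(p.ref_ids):
            p.priority = lowest

flow = [VideoPacket(5, ref_ids=[1], priority=2),
        VideoPacket(6, ref_ids=[2], priority=2)]
downgrade_dependents(flow, dropped_ids={2})  # frame 2 was dropped upstream
```

After the call, the packet carrying frame 6 has the lowest priority while the packet carrying frame 5 is untouched.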
  • FIG. 1 is a schematic diagram of a telecommunication network 100 including a network domain 102.
  • the telecommunications network 100 includes a plurality of network nodes 104, 106, 108, and 110.
  • the network nodes 104-110 may be operating using layer three (L3) routing technology and, as such, may be referred to as L3 network nodes. While eight network nodes 104-110 are shown in the network domain 102, more or fewer nodes may be included in practical applications.
  • Each of the network nodes 104-110 is configured to send, receive, and/or route packets containing media content (e.g., streaming media content).
  • one or more of the network nodes 104-110 may be a router, a switch, or a gateway.
  • Network nodes 104, 106, and 110 are disposed at an edge of the network domain 102, and may therefore be referred to as edge nodes.
  • any of the network nodes 104, 106, and 110 that receives packets from outside the network domain 102 may be referred to as an ingress network node (e.g., an ingress router).
  • any of the network nodes 104, 106, and 110 that transmits packets out of the network domain 102 may be referred to as an egress network node (e.g., an egress router). Depending on the direction of packet traffic, each of the network nodes 104, 106, and 110 may function as an ingress network node, an egress network node, or both.
  • Network nodes 108 are not on an edge of the network domain 102. Such nodes 108 may be referred to as internal nodes and may not be configured to communicate outside of the network domain 102 without passing through an edge node.
  • Each of the network nodes 104-110 has one or more neighbor network nodes.
  • a neighbor network node refers to a network node which is only one hop away from the network node.
  • the network nodes 104-110 are coupled to, and communicate with each other, via links 120.
  • the links 120 may be wired, wireless, electrical, optical, or some combination thereof.
  • the pattern or arrangement of links 120 in FIG. 1 is for the purpose of illustration only. More or fewer links 120 coupling the network nodes 104-110 to each other in a different configuration may be used in practical applications.
  • the network node 104 is coupled to a content source 130.
  • the content source 130 may represent a content provider that streams packets of video data for viewing by an end user (e.g., streaming television, video conferencing, etc.)
  • the content source 130 is configured to provide packets of media content such as, for example, streaming media content, motion picture experts group (MPEG) video samples, etc.
  • the network node 104 is configured to request and receive media content from the content source 130.
  • the content source 130 may utilize a video encoder (a.k.a., codec) capable of implementing various video coding techniques.
  • the video encoder of the content source 130 may perform intra- and inter-coding of video blocks within video slices.
  • a video block, or simply a block, is an MxN (M-column by N-row) array of samples (e.g., pixels), or an MxN array of transform coefficients.
  • Several of the video blocks may be aggregated to form one of the video slices. All of the video slices or all of the video blocks collectively form a picture that can be displayed to a user.
  • Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture.
  • Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence.
  • Intra-mode may refer to any of several spatial based coding modes.
  • Inter-modes such as uni-directional (a.k.a., uni prediction) prediction (P mode) or bi-prediction (a.k.a., bi prediction) (B mode), may refer to any of several temporal-based coding modes.
  • Video slices of pictures may be included as payloads in network packets, and such network packets may be denoted by the mode used to encode the included slices.
  • a packet containing a picture/slice coded according to I mode is said to be an I frame packet or a packet carrying an I frame
  • a packet containing a picture/slice coded according to P mode is said to be a P frame packet or a packet carrying a P frame
  • a packet containing a picture/slice coded according to B mode is said to be a B frame packet or a packet carrying a B frame.
  • the network node 110 is coupled to a destination node 140.
  • the destination node 140 may represent a user equipment (UE) of an end user.
  • the UE may be a smartphone, a tablet device, a laptop computer, and so on (e.g., a device configured to receive and display streaming video content).
  • the destination node 140 is configured to consume the media content provided by the content source 130.
  • the end user may utilize their UE to request media content (e.g., a streaming media video).
  • the request is routed by the network nodes 104-110 through the network domain 102 and delivered to the content source 130.
  • the content source 130 transmits the requested media content to be routed by the network nodes 104-110 back through the network domain 102 and delivered to the destination node 140 where the requested media content may be consumed.
  • the destination node 140 may utilize a video decoder (a.k.a., codec).
  • the video decoder of the destination node 140 is capable of implementing various video coding techniques corresponding to those used by the video encoder of the content source 130. That is, the video decoder of the destination node 140 is able to decode the video blocks encoded using intra- and inter-coding by the video encoder as described above.
  • Network nodes 104-110 each contain one or more receivers, processors, memories, and transmitters.
  • a packet is received by a receiver, processed by a processor, and stored in memory until the packet can be transmitted by a transmitter.
  • Network nodes 104-110 may include many receivers and transmitters, and may communicate many flows of packets at the same time. When a current packet is received and all transmitters are transmitting other packets, the current packet is retained in memory until a transmitter is available to transmit the current packet.
  • Network congestion occurs when a network node 104-110 receives packets faster than such packets can be transmitted. In such a case, the memory in the congested node fills up. Once the memory is full, the congested node must either discontinue storing incoming packets or must evict already received packets from memory to make room for the new packets. This function is known as a packet drop.
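The full-memory eviction behavior above reduces to a simple bounded-buffer check; this tail-drop sketch (discarding the incoming packet when the buffer is full) is one of the two options named, with all names illustrative:

```python
def enqueue_or_drop(buffer, packet, capacity):
    # Tail drop: a full memory forces the incoming packet to be discarded
    # instead of stored (the alternative is evicting an already-stored one).
    if len(buffer) >= capacity:
        return False  # packet dropped
    buffer.append(packet)
    return True

buffer = []
outcomes = [enqueue_or_drop(buffer, n, capacity=2) for n in range(3)]
```

With a capacity of two, the first two packets are stored and the third is dropped.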
  • network nodes 104-110 can also be configured to perform a packet wash.
  • packets can be encoded with a slice and the data can be arranged in order of importance.
  • a slice can be described by coding mode(s) that indicate reference blocks that can be used to reconstruct current blocks in the slice. Any difference between the reference blocks and the current blocks is known as residual. Residual can be transformed into coefficients and carried in the packets. Coefficients have varying effects on reconstructed video quality.
  • the packets can be arranged with the least important coefficients toward the end of the packet and the most important coefficients towards the beginning of the packet.
  • a packet wash can drop data from the end of the packet towards the front of the packet until a packet is small enough to be transmitted despite the network congestion, or to reduce network congestion in general. In this way, the video quality of the slice in the packet is reduced as needed instead of the packet and corresponding slice being dropped entirely.
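Arranging coefficients most-important-first, so a tail-truncating wash removes the least important data, can be sketched as below. Ranking importance by coefficient magnitude is a simplifying assumption for illustration, not a rule stated in the disclosure:

```python
def pack_coefficients(coeffs):
    # Place high-magnitude (assumed most important) coefficients first so
    # a wash that truncates the packet tail removes the least important data.
    return sorted(coeffs, key=abs, reverse=True)

payload = pack_coefficients([1, -9, 3, 0])  # largest-magnitude first
```

A wash that cuts the tail of `payload` then discards the near-zero coefficients before the dominant ones.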
  • the network packets can be provided with a priority.
  • the network nodes 104-110 can then use the priority to select which packets to drop and/or wash. In this way, the least important packets can be dropped or washed before the more important packets.
  • I frames may be more important than P or B frames because P and B frames rely on I frames to be decodable.
  • P frames may be more important than B frames because B frames rely on P frames to be decodable.
  • B frames are generally not relied upon by other frames for decoding and hence can have the lowest priority.
  • using priority allows the network nodes 104-110 to drop B frames first, then P frames, and then I frames so that packet drop based disruptions to the user at the destination node 140 are minimized as much as possible given the level of congestion across the path between the source node 130 and the destination node 140.
  • the present disclosure includes the case where a network node 104-110 has to choose between two packets of the same priority when determining to wash and/or drop a packet.
  • This can be accomplished by employing an LPD value.
  • an LPD value can be encoded into each packet.
  • the LPD value indicates the number of packets that rely on a current packet to be decodable.
  • the LPD may indicate the number of frames that rely on the frame in the current packet to be decodable. Dropping a packet with a higher LPD value is more disruptive to a video stream than dropping a packet with a lower LPD. Therefore, when two packets have the same priority, the network node 104-110 can wash and/or drop the packet with the lowest LPD first when overcoming network congestion.
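The combined rule (priority first, LPD as tie-breaker among equal-priority packets) maps directly onto a two-key sort; the dictionary keys and function name below are illustrative:

```python
def drop_order(packets):
    # Lowest priority is dropped first; among equal priorities, the
    # lowest LPD breaks the tie, since its loss propagates the least.
    return sorted(packets, key=lambda p: (p["priority"], p["lpd"]))

queue = [{"id": "a", "priority": 1, "lpd": 6},
         {"id": "b", "priority": 1, "lpd": 2},
         {"id": "c", "priority": 3, "lpd": 1}]
victims = [p["id"] for p in drop_order(queue)]  # "b" is washed/dropped first
```

Packet "b" loses the tie against "a" despite equal priority, and high-priority "c" survives longest even though its LPD is the smallest.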
  • the calculation of LPD for each packet and the corresponding mechanism for use is discussed in greater detail below.
  • each packet can include a frame ID.
  • a network node 104-110 can keep a log of the frame IDs of any packets that are dropped and/or washed.
  • each packet can contain a reference frame ID that includes values indicating the frame IDs of all frames upon which the current frame depends upon to be decodable.
  • the network node 104-110 can reduce the priority of the current packet (e.g., to zero). In this way, packets that are unlikely to be useable at the destination node 140 are washed and/or dropped first to reduce video streaming disruption.
  • the mechanism for downgrading priority is also discussed in greater detail below.
  • FIG. 2 is a schematic diagram illustrating the process of uni-directional inter prediction 200 using a plurality of frames 202.
  • the frames 202 individually and collectively represent media content. As shown, some of the frames 202 have been labeled with the letter “I.”
  • These frames 202 are intra prediction (I) frames coded using the I mode discussed above.
  • An I-frame comprises macroblocks that only rely on intra prediction. That is, the macroblocks within the I-frame are predicted based only on information inside the I-frame and not information outside the I-frame. Therefore, the I-frame can be predicted without having to reference any other frame.
  • the term macroblock is used in relation to Advanced Video Coding (AVC) standards.
  • High efficiency video coding (HEVC) and versatile video coding (VVC) instead partition slices into coding tree units (CTUs). Accordingly, in an HEVC or VVC context an I-frame 202 comprises CTUs that only rely on intra prediction, etc.
  • a P-frame (a.k.a., a predicted frame) allows macroblocks or CTUs to be compressed using temporal prediction in addition to spatial prediction.
  • a P-frame references other frames 202 that have been previously encoded (e.g., an I-frame or a neighboring P-frame). Every macroblock or CTU in a P-frame can be temporally predicted, spatially predicted, or skipped (e.g., where the video decoder copies the co-located block from the previous frame - i.e., a “zero” motion vector).
  • the process of uni-directional inter prediction 200 may proceed according to a decode order 204.
  • the frame 202 with position 1 in the decode order 204 is an I-frame.
  • the I-frame is decoded without referencing any other frame 202 (i.e., there is no arrow from the I-frame pointing to any preceding frame 202).
  • the process of uni-directional inter prediction 200 proceeds to decode the frame 202 in position 2 of the decode order 204.
  • the frame in position 2 is a P-frame.
  • the P-frame is decoded by referencing the preceding I-frame in position 1 of the decode order 204 as shown by the arrow.
  • the process of uni-directional inter prediction 200 proceeds to decode the frame in position 3 of the decode order 204.
  • the frame in position 3 is a P-frame.
  • the P-frame is decoded by referencing the preceding P-frame in position 2 of the decode order 204 as shown by the arrow.
  • This process of decoding frames 202 as described above continues until all frames are decoded.
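For a GOP decoded as the pure I-to-P chain above, the LPD of each frame follows directly from its position: every later frame depends, directly or indirectly, on every earlier one. A small illustrative helper (the function name is an assumption):

```python
def chain_lpd(gop_size):
    # The frame at 1-based position k is relied upon by the
    # gop_size - k frames after it, plus itself.
    return [gop_size - k + 1 for k in range(1, gop_size + 1)]

lpds = chain_lpd(4)  # GOP of I, P, P, P
```

The I frame carries the highest LPD and the final P frame the lowest, so the final P frame would be the first wash/drop candidate.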
  • the frames 202 may be displayed on a display screen of a UE according to the display order 206, which corresponds to the decode order 204.
  • frames may also be referred to as video frames, images, and/or pictures depending on the example.
  • FIG. 3 is a schematic diagram illustrating the process of bidirectional inter prediction 300 using a plurality of frames 302.
  • the frames 302 individually and collectively represent media content. As with the process of uni-directional inter prediction 200, some of these frames 302 are labeled with the letter “I” or “P” to represent I-frames and P-frames, respectively. In addition, some of the frames 302 are labeled with the letter “B.”
  • These frames 302 are bidirectional prediction (B) frames coded using the B mode discussed above.
  • a B-frame is a frame that can refer to frames 302 that occur both before the B-frame and after the B-frame in decode order 304.
  • the video decoder may use macroblock compression (e.g., like Advanced Video Coding (AVC)/ITU-T H.264 does).
  • Each macroblock or CTU of a B-frame can be predicted in a variety of ways. For example, each macroblock or CTU of the B-frame can be predicted using backward prediction (e.g., using frames 302 that occur after the current frame), using forward prediction (e.g. , using frames 302 that occur before the current frame), or without inter prediction (e.g., only intra prediction).
  • each macroblock or CTU of a B-frame can be skipped completely (e.g., the decoder copies the co-located block without coded residual).
  • the process of bidirectional inter prediction 300 may proceed according to the decode order 304.
  • the frame 302 with position 1 in the decode order 304 is an I-frame.
  • the I-frame is decoded without referencing any other frame 302 (i.e., there is no arrow from the I-frame pointing to any preceding frame 302).
  • the process of bidirectional inter prediction 300 proceeds to decode the frame 302 in position 2 of the decode order 304.
  • the frame in position 2 is a P-frame.
  • the P-frame is decoded by referencing the preceding I-frame in position 1 of the decode order 304 as shown by the arrow.
  • the process of bidirectional inter prediction 300 proceeds to decode the frame in position 3 of the decode order 304.
  • the frame in position 3 is a B-frame.
  • the B-frame is decoded by referencing the P-frame in position 2 of the decode order 304 as well as the I-frame in position 1 of the decode order 304 as shown by the arrows.
  • the process of bidirectional inter prediction 300 proceeds to decode the frame in position 4 of the decode order 304.
  • the frame in position 4 is a B-frame.
  • the B-frame is decoded by referencing the P-frame in position 2 of the decode order 304 as well as the I-frame in position 1 of the decode order 304 as shown by the arrows.
  • This process of decoding frames as described above continues until all frames 302 are decoded.
  • the frames 302 may be displayed on a display screen of a UE according to the display order 306.
  • the display order 306 does not necessarily correspond to the decode order 304. That is, the frames 302 are decoded in a different order than they are displayed.
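The decode-versus-display reordering described above can be sketched with a small hypothetical example (the frame types and positions below are illustrative assumptions, not values read from FIG. 3):

```python
# Hypothetical example of reordering decoded frames into display order.
decoded = [  # (decode position, frame type, display position)
    (1, "I", 1),
    (2, "P", 4),
    (3, "B", 2),
    (4, "B", 3),
]

# The decoder emits frames in decode order; the player re-sorts them by
# display position before presenting them on screen.
display = sorted(decoded, key=lambda f: f[2])
assert [t for _, t, _ in display] == ["I", "B", "B", "P"]
```

Note how the P-frame is decoded second (the B-frames reference it) but displayed last.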
  • MPEG video sequences are made up of groups of pictures (GOPs).
  • Each GOP comprises a preset number of coded frames, including one I frame and one or more P and B frames.
  • FIG. 2 depicts two complete GOPs and FIG. 3 depicts a single GOP.
  • GOP lengths may differ depending on the encoded video. Longer GOPs may be used for low-motion video content because frames in low-motion video have reduced dependency on the I frames and hence fewer I frames are needed. This in turn improves the video compression efficiency for long GOPs. However, long GOPs may have reduced error resilience.
  • the first frame is an intracoded I frame and the last frame is a P frame.
  • P or B frames are used for all other frames in the GOP. Closed GOPs are self-contained since none of the frames refer to another frame outside the GOP.
  • the I frame is directly referenced by up to three frames, whereas a P frame is directly referenced by up to five frames.
  • an open GOP may use both I and P frames for forward or backward prediction.
  • the last P frame in a previous GOP is referenced by B frames in the current GOP.
  • the last P frame in a current GOP is referenced by B frames in a subsequent GOP.
  • an open GOP ends with a P frame.
  • the open GOP fully exploits the last P frame, which is used as a reference for four B frames.
  • fewer P frames may be employed in open GOP when compared to closed GOP structures. This may result in a slight improvement in compression efficiency.
  • an I frame serves as a reference frame for more frames (e.g., 5 frames), possibly as many as a P frame does.
  • the drawback of an open GOP is that an open GOP is no longer self-contained, and hence cannot be decoded independently.
  • An I frame in a closed GOP can be referenced by up to 3 frames.
  • a loss of an I frame in a closed GOP affects the entire GOP, because such a loss indirectly prevents all other frames from being decoded.
  • loss of an I frame prevents decoding of associated P frames, which results in prevention of decoding corresponding B frames.
  • the loss of an I frame may propagate to the entire GOP.
  • the loss of an I frame located in a longer GOP would yield a worse quality of experience (QoE) for the end user than the loss of an I frame located in a shorter GOP. Accordingly, an I frame in a longer GOP is more important than an I frame in a shorter GOP.
  • the loss of an I frame in an open GOP may affect more frames than the entire GOP, because the last P frame in a current GOP is used as a reference for four B frames in a next GOP.
  • the loss propagation depth (LPD) for an I frame in an open GOP may be determined as LPDI = GOPLength + 4, where LPDI is the LPD for the I frame, GOPLength is the number of frames in the GOP, and 4 is the number of B frames in a subsequent GOP that reference the last P frame in the current open GOP. For an I frame in a closed GOP, LPDI = GOPLength.
  • for the first P frame (e.g., the P frame following the first I frame), a direct reference could include up to five frames, but the indirect references could propagate to the end of the GOP.
  • the LPD for a P frame is LPDP = nfP + 1, where nfP is the number of frames (e.g., B frames or P frames) that directly or indirectly reference the P frame.
  • the direct references to a P frame may include up to 5 frames, but indirect references could propagate outside of the current GOP.
  • the LPD for a B frame is LPDB = nfB + 1, where nfB is the number of other B frames in the GOP that reference the current B frame.
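The three LPD formulas above can be collected into a short sketch (the function names are illustrative assumptions, not terms from the disclosure):

```python
def lpd_i_frame(gop_length: int, open_gop: bool) -> int:
    """LPD for an I frame: the whole GOP, plus the 4 B frames in the next
    GOP that reference the last P frame when the GOP is open."""
    return gop_length + 4 if open_gop else gop_length

def lpd_p_frame(nf_p: int) -> int:
    """LPDP = nfP + 1, where nfP counts frames that directly or indirectly
    reference the P frame."""
    return nf_p + 1

def lpd_b_frame(nf_b: int) -> int:
    """LPDB = nfB + 1, where nfB counts other B frames in the GOP that
    reference the current B frame."""
    return nf_b + 1

# Example: a 16-frame GOP, closed versus open.
assert lpd_i_frame(16, open_gop=False) == 16
assert lpd_i_frame(16, open_gop=True) == 20
```

The "+1" for P and B frames counts the frame itself, so even an unreferenced B frame has an LPD of 1.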
  • a B frame can be referenced by other B frames, but such references are not generally used.
  • a network node When a network node needs to decide between two packets for washing and/or dropping and such packets contain frames with the same type, i.e., two I frames, two P frames, or two B frames, the network node compares the LPD value of the frames and washes and/or drops the packet with the smallest LPD value. For example, a drop may occur without a packet wash. Further, a packet wash that cannot reduce the packet to a small enough size to allow for transmission despite congestion results in a packet drop.
  • the network node first checks a priority level for different types of frames, and washes and/or drops the packet with lower priority. When further packet dropping is needed and the candidate packets have the same priority level, then the LPD of the packets is used to decide how to further wash and/or drop the packets.
  • FIG. 4 is a schematic diagram illustrating an example packet 400 configured to carry metadata to support packet washing and/or dropping based on LPD.
  • the packet 400 may be configured to carry an I frame, a P frame, or a B frame via telecommunications network 100 as discussed above.
  • the packet 400 includes a GOP ID 402, a frame ID 404, a reference frame ID 406, and an LPD 408, for example in a packet header.
  • the packet 400 also includes a frame 410, which may be a frame of a video stream, such as an I frame, a P frame, a B frame, or a slice thereof as described above.
  • the GOP ID 402 is an identifier that identifies the GOP that includes frame 410 carried by the packet.
  • the GOP ID 402 may be unique within the flow containing the video stream.
  • the flow is identified by a source address, a destination address, and a flow label.
  • the GOP ID 402 may include some combination of a unique ID, a source address, a destination address, and a flow label.
  • the GOP ID 402 may have varying values depending on the length of the video.
  • the frame ID 404 is an ID that identifies the frame 410.
  • a GOP generally has a maximum length of 16 frames, and hence the frame ID 404 could be represented by four bits.
  • the reference frame ID 406 includes zero or more identifier(s) of the frames referenced by the current frame 410. I frames do not reference other frames, and hence the reference frame ID 406 may be zero bits for an I frame.
  • the reference frame ID 406 may be four bits for a P frame and eight bits for a B frame, and indicates the one or more frames referenced by the P frame or B frame included in the frame 410. This is the case for a closed GOP.
  • the reference frame ID 406 may also include a previous GOP ID if the frame 410 references a frame in the previous GOP.
  • the LPD 408 is the LPD value of the current frame 410.
  • the LPD 408 may be determined based on the number of frames that directly or indirectly reference the current frame 410 as described above.
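One possible in-memory representation of the metadata carried by packet 400 is sketched below; the field names mirror the discussion (GOP ID 402, frame ID 404, reference frame ID 406, LPD 408), while the types and the optional previous-GOP field are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VideoPacketMetadata:
    gop_id: int                        # GOP ID 402: GOP containing the frame
    frame_id: int                      # frame ID 404: e.g., 4 bits for a 16-frame GOP
    reference_frame_ids: List[int]     # reference frame ID 406: empty for an I frame
    lpd: int                           # LPD 408: loss propagation depth
    prev_gop_id: Optional[int] = None  # set when referencing the previous GOP

# An I frame references nothing; a B frame may reference two frames.
i_meta = VideoPacketMetadata(gop_id=7, frame_id=1, reference_frame_ids=[], lpd=16)
b_meta = VideoPacketMetadata(gop_id=7, frame_id=3, reference_frame_ids=[1, 2], lpd=1)
assert i_meta.reference_frame_ids == []
```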
  • the network node may record the source address, destination address, flow label, GOP ID 402, and frame ID 404. This information can then be used to reduce the priority of any subsequent frame that references the current frame 410.
  • FIG. 5 is an example method 500 of performing a packet wash and/or packet drop based on LPD as described above.
  • the method 500 may be performed by a network node experiencing network congestion or otherwise having insufficient resources to properly route a corresponding flow.
  • the node determines that a packet wash and/or a packet drop is needed to continue normal operation.
  • the node employs method 500 to determine which packet to wash and/or drop.
  • the node determines whether the decision to select which packet to wash and/or drop is based on selecting between packets containing the same frame type.
  • the node determines whether two packets under consideration for packet washing and/or dropping have the same priority and/or are of the same frame type (e.g., two I frames, two P frames, or two B frames). If the packets are not of the same type, the method 500 proceeds to step 506.
  • the node selects the packet with the lowest priority and washes and/or drops the lowest priority packet. For example, when a first packet contains an I frame and a second packet contains a P frame, the node selects the P frame as having the lowest priority and washes and/or drops the packet containing the P frame. As another example, when a first packet contains a P frame and a second packet contains a B frame, the node selects the B frame as having the lowest priority and washes and/or drops the packet containing the B frame.
  • the method 500 proceeds from step 502 to step 504.
  • the node selects the packet with the lowest LPD value and washes and/or drops the packet.
  • the node can determine which packet has the lowest LPD value by reading the LPD values in the respective packet headers.
  • the LPD values can be determined by the content source based on the equations described above and encoded into each packet when the packets are sent across the network.
  • the node can store the frame ID and flow information for each packet dropped at step 504 and/or 506 in a dropped packet list for use in connection with method 600.
  • the method 500 can also be used recursively. For example, if there are initially three packets under consideration for packet washing and/or dropping with the same priority, then at step 502 the node can determine that two of the packets do not contain the same frame type. Then, at step 506, one of the two packets can be wash/dropped. The node can return to step 502 and process the remaining two packets accordingly.
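The pairwise decision in method 500 might be sketched as follows; the dictionary packet representation and the numeric priority ordering (I highest, then P, then B) are assumptions for illustration:

```python
# Assumed priority ordering: lower number = higher priority, so I < P < B.
FRAME_PRIORITY = {"I": 0, "P": 1, "B": 2}

def select_packet_to_drop(pkt_a: dict, pkt_b: dict) -> dict:
    """Return the candidate to wash/drop: the lower-priority frame type
    (step 506); on a type tie, the smaller LPD value (step 504)."""
    if pkt_a["frame_type"] != pkt_b["frame_type"]:
        # Different types: drop the one with the lower priority.
        return max(pkt_a, pkt_b, key=lambda p: FRAME_PRIORITY[p["frame_type"]])
    # Same type: drop the one whose loss propagates the least.
    return min(pkt_a, pkt_b, key=lambda p: p["lpd"])

p_pkt = {"frame_type": "P", "lpd": 6}
b_pkt = {"frame_type": "B", "lpd": 2}
assert select_packet_to_drop(p_pkt, b_pkt) is b_pkt  # B outranked by P

same_type = select_packet_to_drop({"frame_type": "P", "lpd": 6},
                                  {"frame_type": "P", "lpd": 3})
assert same_type["lpd"] == 3  # same type: smallest LPD is dropped
```

Applying this function pairwise to a set of three or more candidates reproduces the recursive use of method 500 described above.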
  • FIG. 6 is an example method 600 of performing a downgrade of packet priority for a packet carrying a frame that is dependent on another frame in another packet that has been washed and/or dropped.
  • the method 600 may be performed by a network node performing packet washes and packet drops based on LPD values.
  • the network node can check whether priority degradation is implicated for each packet containing a P frame or a B frame. For example, when no packets have been washed or dropped at the node over a predetermined time frame, then priority degradation is not needed. However, when the node has performed packet washing or dropping during the predetermined time frame, then priority degradation may be needed and the node proceeds to step 604.
  • the node checks the dropped packet list to determine whether a packet has been washed or dropped from the same flow. This can be accomplished by comparing the source address, the destination address, and flow label of the current packet with the dropped packet list. When a packet from the same flow has been dropped and/or washed, the node can further compare the reference frame ID of the current packet to the frame ID of the dropped packets in the dropped packet list that are associated with the same flow.
  • the node can determine whether a match is found as a result of step 604. When a match is not found, the node proceeds to step 610 and maintains the original priority of the current packet. When a match is found, the node proceeds from step 606 to step 608. A match of the flow and the reference frame ID to the frame ID in the dropped packet list indicates the current frame references a dropped packet and therefore is likely to be undecodable at the destination. Accordingly, when a match is found, the node downgrades the priority of the current packet to the lowest priority. In this way, the current packet is the first packet washed and/or dropped in the event of further network congestion, as the current packet is unlikely to be useful to the end user.
  • FIG. 7 is an example method 700 of encoding packets to support packet wash based on LPD and priority downgrades for dependent packets.
  • Method 700 may occur at a content source.
  • the content source may employ method 700 to encode video frames into packets in a manner that supports packet wash and/or packet drops based on LPD and/or priority downgrades as described above.
  • the node determines LPD values for each of a plurality of packets.
  • Each of the plurality of packets contains at least a slice of a frame of video data.
  • each LPD value indicates, for a current slice of a current frame contained in a current packet, an amount of video data (e.g., a number of frames) that relies on video data from the current frame to be decodable.
  • the node encodes the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets, for example in case of network congestion.
  • the packet drop may include dropping the packet with the lowest LPD value.
  • the LPD values may be encoded into LPD fields in each of the plurality of packets.
  • the node encodes a frame ID into each of the plurality of packets.
  • the frame ID indicates the current frame associated with the current packet (e.g., the current frame contained in the packet or the current frame containing the slice contained in the packet).
  • the node encodes a reference frame ID into each of the plurality of packets.
  • the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
  • the reference frame IDs and the frame IDs support reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped. Packets with a lower priority are dropped prior to packets with a higher priority.
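The metadata named in method 700 could be serialized in many ways; the disclosure does not fix a wire format, so the byte layout below (one byte each for frame ID, reference count, and LPD) is purely an illustrative assumption:

```python
import struct

def encode_header(frame_id: int, ref_ids: list, lpd: int) -> bytes:
    """Pack frame ID, reference frame IDs, and LPD into a header blob:
    1 byte frame ID, 1 byte ref count, N bytes of refs, 1 byte LPD."""
    return struct.pack(f"BB{len(ref_ids)}BB", frame_id, len(ref_ids), *ref_ids, lpd)

def decode_header(data: bytes):
    """Inverse of encode_header."""
    frame_id, n = struct.unpack_from("BB", data)
    refs = list(struct.unpack_from(f"{n}B", data, 2))
    (lpd,) = struct.unpack_from("B", data, 2 + n)
    return frame_id, refs, lpd

# A B frame referencing frames 1 and 2, with an LPD of 2.
hdr = encode_header(frame_id=3, ref_ids=[1, 2], lpd=2)
assert decode_header(hdr) == (3, [1, 2], 2)
```

An I frame simply carries an empty reference list, matching the zero-bit reference frame ID discussed for packet 400.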
  • FIG. 8 is an example method 800 of performing a packet wash based on LPD and priority downgrades for dependent packets.
  • a network node may perform method 800 on a video stream transmitted from a content source and encoded according to method 700.
  • the network node receives a plurality of packets. Such packets may have a same priority and hence the network node may have to determine which packet to drop.
  • Each of the plurality of packets contains at least a slice of a frame of video data.
  • the network node obtains LPD values from each of the plurality of packets.
  • Each LPD value indicates, for a current slice of a current frame contained in a current packet, an amount of video data (e.g., a number of frames) that relies on video data from the current frame to be decodable.
  • each of the plurality of packets may contain an LPD field including metadata indicating a corresponding LPD value.
  • the network node performs a packet drop on a packet with a lowest LPD value from the plurality of packets, for example in response to network congestion.
  • the packet drop may include dropping the packet with the lowest LPD value from the packets with the same priority.
  • each of the plurality of packets may contain a reference frame ID.
  • the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
  • each of the plurality of packets contains a frame ID.
  • the frame ID indicates the current frame associated with the current packet.
  • the network node may match the reference frame ID to the current frame ID in a dropped packets list (along with a flow match) to determine when to reduce the priority of the dependent packet. Packets with lower priority are dropped prior to packets with higher priority.
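The dropped-packet matching used for the priority downgrade in methods 600 and 800 might look like the following sketch; the data shapes and the sentinel lowest-priority value are illustrative assumptions:

```python
LOWEST_PRIORITY = 255  # assumed "drop me first" value

def maybe_downgrade(packet: dict, dropped: list) -> int:
    """Return the packet's effective priority after checking the dropped
    packet list: flow match first, then reference frame ID vs frame ID."""
    for entry in dropped:
        same_flow = (entry["src"], entry["dst"], entry["label"]) == \
                    (packet["src"], packet["dst"], packet["label"])
        if same_flow and entry["frame_id"] in packet["ref_frame_ids"]:
            # The frame references a dropped frame and is likely
            # undecodable at the destination: downgrade it.
            return LOWEST_PRIORITY
    return packet["priority"]  # no match: keep the original priority

dropped = [{"src": "a", "dst": "b", "label": 9, "frame_id": 2}]
pkt = {"src": "a", "dst": "b", "label": 9, "ref_frame_ids": [2], "priority": 10}
assert maybe_downgrade(pkt, dropped) == LOWEST_PRIORITY
```

A packet from a different flow, or one whose references survive, keeps its original priority.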
  • the network apparatus 900 is suitable for implementing the disclosed embodiments as described herein.
  • the network apparatus 900 comprises ingress ports/ingress means 910 and receiver units (Rx)/receiving means 920 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 930 to process the data; transmitter units (Tx)/transmitting means 940 and egress ports/egress means 950 for transmitting the data; and a memory/memory means 960 for storing the data.
  • the network apparatus 900 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 910, the receiver units/receiving means 920, the transmitter units/transmitting means 940, and the egress ports/egress means 950 for egress or ingress of optical or electrical signals.
  • the processor/processing means 930 is implemented by hardware and software.
  • the processor/processing means 930 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs).
  • the processor/processing means 930 is in communication with the ingress ports/ingress means 910, receiver units/receiving means 920, transmitter units/transmitting means 940, egress ports/egress means 950, and memory/memory means 960.
  • the processor/processing means 930 comprises a packet LPD module 970.
  • the packet LPD module 970 is able to implement the methods disclosed herein.
  • the inclusion of the packet LPD module 970 therefore provides a substantial improvement to the functionality of the network apparatus 900 and effects a transformation of the network apparatus 900 to a different state.
  • the packet LPD module 970 is implemented as instructions stored in the memory/memory means 960 and executed by the processor/processing means 930.
  • the network apparatus 900 may also include input and/or output (I/O) devices/I/O means 980 for communicating data to and from a user.
  • the I/O devices/I/O means 980 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc.
  • the I/O devices/I/O means 980 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
  • the memory/memory means 960 comprises one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
  • the memory/memory means 960 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

Abstract

A network node receives a plurality of packets. Each of the plurality of packets contains a portion of a frame of video data. Loss propagation depth (LPD) values are obtained from each of the plurality of packets. Each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable. A packet drop is performed on a packet with a lowest LPD value from the plurality of packets.

Description

Group of Pictures Affected Packet Drop
TECHNICAL FIELD
[0001] The present disclosure is generally related to video streaming in telecommunication networks and is specifically related to packet dropping or partial packet dropping in telecommunications networks.
BACKGROUND
[0002] Bursty loss and longer-than-expected delay have catastrophic effects on the quality of experience (QoE) of end-users in video streaming. Bursty loss and longer-than-expected delay are usually caused by network congestion. Despite the many congestion control mechanisms developed in the community over the decades, such mechanisms often target different goals, e.g., link utilization improvement, loss reduction, or fairness enhancement. For media streaming, minimizing the possibility of network congestion can often be achieved by rate control and video adaptation methods.
SUMMARY
[0003] The disclosed aspects/embodiments provide techniques to address the scenario where a network node determines to packet wash and/or drop a packet and the packets under consideration have the same priority. This may occur when considering whether to packet wash or drop one of two packets of the same type (e.g., two or more I frames, two or more P frames, or two or more B frames). In an example, each packet in a video stream may include a loss propagation depth (LPD). An LPD value indicates a number of frames that directly or indirectly rely on video data from a current frame in a current packet to be decodable. For example, frames can be organized into multiple groups of pictures (GOPs). Each GOP includes one I frame and a series of P and/or B frames. In a closed GOP, each P frame and B frame directly or indirectly relies on the I frame in the same GOP. In an open GOP, a number of B frames (e.g., four) in a next GOP can also reference the current GOP. Accordingly, for an I frame in a closed GOP the LPD value may be set to the number of frames in the GOP, and for an I frame in an open GOP the LPD value may be set to the number of frames in the current GOP plus four. Hence, when considering whether to packet wash or drop two I frames, the I frame with the smaller GOP is washed or dropped. For P and B frames, the LPD value is set as the number of frames that directly or indirectly reference the current frame plus one. Hence, when considering whether to packet wash or drop two P frames or two B frames, the packet with the smaller dependency is always dropped. In this way, the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
[0004] A first aspect relates to a method implemented in a network node, the method comprising: receiving a plurality of packets, wherein each of the plurality of packets contain a portion of a frame of video data; obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
[0005] Optionally, in any of the preceding aspects, another implementation of the aspect provides that each of the plurality of packets contains an LPD field including metadata indicating a corresponding LPD value.
[0006] Optionally, in any of the preceding aspects, another implementation of the aspect provides reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
[0007] Optionally, in any of the preceding aspects, another implementation of the aspect provides that each of the plurality of packets contains a reference frame identifier (ID), and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
[0008] Optionally, in any of the preceding aspects, another implementation of the aspect provides that each of the plurality of packets contains a frame identifier (ID), and wherein the frame ID indicates the current frame associated with the current packet.
[0009] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
[0010] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4 where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
[0011] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1 where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame.
[0012] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfB + 1 where LPDB is the LPD value for the B frame and nfB is a number of frames that reference the B frame.
[0013] A second aspect relates to a method implemented in a network node, the method comprising: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contain a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and transmitting the plurality of packets.
[0014] Optionally, in any of the preceding aspects, another implementation of the aspect provides that the LPD values are encoded into LPD fields in each of the plurality of packets.
[0015] Optionally, in any of the preceding aspects, another implementation of the aspect provides encoding a reference frame identifier (ID) into each of the plurality of packets, and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
[0016] Optionally, in any of the preceding aspects, another implementation of the aspect provides that encoding a frame identifier (ID) into each of the plurality of packets, and wherein the frame ID indicates the current frame associated with the current packet.
[0017] Optionally, in any of the preceding aspects, another implementation of the aspect provides that the reference frame IDs and the frame IDs support reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
[0018] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
[0019] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4 where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
[0020] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1 where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame.
[0021] Optionally, in any of the preceding aspects, another implementation of the aspect provides that an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfB + 1 where LPDB is the LPD value for the B frame and nfB is a number of frames that reference the B frame.
[0022] A third aspect relates to a network node comprising a processor, a receiver coupled to the processor, a memory coupled to the processor, and a transmitter coupled to the processor, wherein the processor, receiver, memory, and transmitter are configured to perform the method of any of the preceding aspects.
[0023] A fourth aspect relates to a non-transitory computer readable medium comprising a computer program product for use by a router, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that when executed by a processor cause the router to perform the method of any of the preceding aspects.
[0024] A fifth aspect relates to a network device comprising: a receiving means for receiving a plurality of packets, wherein each of the plurality of packets contain a frame of video data; and a processing means for: obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
[0025] Optionally, in any of the preceding aspects, another implementation of the aspect provides that the network device is further configured to perform the method of any of the preceding aspects.
[0026] A sixth aspect relates to a network device comprising: a processing means for: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contain a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and a transmitting means for transmitting the plurality of packets.
[0027] Optionally, in any of the preceding aspects, another implementation of the aspect provides that the network device is further configured to perform the method of any of the preceding aspects.
[0028] For the purpose of clarity, any one of the foregoing embodiments may be combined with any one or more of the other foregoing embodiments to create a new embodiment within the scope of the present disclosure.
[0029] These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0030] For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
[0031] FIG. 1 is a schematic diagram of an example telecommunication network including a network domain.
[0032] FIG. 2 is a schematic diagram illustrating an example process of uni-directional inter prediction using a plurality of frames.
[0033] FIG. 3 is a schematic diagram illustrating an example process of bidirectional inter prediction using a plurality of frames.
[0034] FIG. 4 is a schematic diagram illustrating an example packet configured to carry metadata to support packet washing and/or dropping based on loss propagation depth (LPD).
[0035] FIG. 5 is an example method of performing a packet wash based on LPD.
[0036] FIG. 6 is an example method of performing a downgrade of packet priority for a packet carrying a frame that is dependent on another frame in another packet that has been washed and/or dropped.
[0037] FIG. 7 is an example method of encoding packets to support packet wash based on LPD and priority downgrades for dependent packets.
[0038] FIG. 8 is an example method of performing a packet wash based on LPD and priority downgrades for dependent packets.
[0039] FIG. 9 is a schematic diagram of a network apparatus according to an embodiment of the disclosure.
DETAILED DESCRIPTION
[0040] It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0041] A packet wash is an example mechanism that can be applied to streams of video data in order to selectively reduce the amount of video data rather than completely dropping video data when network resources are insufficient to correctly route the data. A packet wash can be defined as an operation that selectively drops chunks of packet data to reduce packet payload size, for example in the event of network congestion. For example, the packet wash process may use information related to the way a video stream is encoded to determine which information has the least effect on playback at the destination. The packet wash process can then prioritize dropping the least important information first. For example, the video packets may each contain an image frame. Further, each packet can be organized into chunks such that data in the first chunk of the packet can be used to decode the frame at a low resolution and each subsequent chunk provides enough information to decode the frame at a higher resolution. In this way, the packet wash process can drop chunks from the end of the packet in response to network congestion rather than dropping the entire packet. Accordingly, the destination receives some packets with frames at lower quality rather than receiving a video stream that is completely missing some packets and frames. In such an example, the packets are only completely dropped as a last resort to overcome congestion.
[0042] In another example, network congestion avoidance mechanisms can provide a priority to each packet and wash and/or drop packets based on priority. For example, a video stream may include intra prediction (I) frames, unidirectional prediction (P) frames, and bidirectional (B) frames. An I frame is independently decodable, a P frame is decodable based on an I frame, and a B frame is decodable based on an I frame and/or a P frame. Accordingly, an I frame can be given a highest priority because dropping an I frame affects the decodability of related P and B frames. P frames can then be given a lower priority because dropping a P frame affects the decodability of related B frames. B frames can then be given the lowest priority because dropping a B frame generally does not affect other frames.
[0043] In a particular example, packets can be completely dropped based on priority. For example, each packet can be assigned a priority based on the relative importance of the packet as discussed above. When a network node needs to drop a packet, the network node can begin by dropping packets of the lowest priority first. Packets of subsequently higher priority can then be dropped as needed. In this way, disruption of the video stream is minimized. In some examples, packet wash and packet drop may be applied together. For example, if a packet contains a frame or parts of a frame (e.g., slices), the network node may apply packet dropping. However, if the packet contains multiple frames, the network node can wash the packet by dropping some, but not all, of the frames in the packet.
[0044] Disclosed herein are techniques to address the scenario where a network node determines to packet wash and/or drop a packet and the packets under consideration have the same priority. This may occur when considering whether to packet wash or drop one of two packets of the same type (e.g., two or more I frames, two or more P frames, or two or more B frames). In an example, each packet in a video stream may include an LPD. An LPD value indicates a number of frames that directly or indirectly rely on video data from a current frame in a current packet to be decodable. For example, frames can be organized into multiple groups of pictures (GOPs). Each GOP includes one I frame and a series of P and/or B frames. In a closed GOP, each P frame and B frame directly or indirectly relies on the I frame in the same GOP. In an open GOP, a number of B frames (e.g., four) in a next GOP can also reference the current GOP. Accordingly, for an I frame in a closed GOP the LPD value may be set to the number of frames in the GOP, and for an I frame in an open GOP the LPD value may be set to the number of frames in the current GOP plus four. Hence, when considering whether to packet wash or drop two I frames, the I frame with the smaller GOP is washed or dropped. For P and B frames, the LPD value is set as the number of frames that directly or indirectly reference the current frame plus one. Hence, when considering whether to packet wash or drop two P frames or two B frames, the packet with the smaller dependency is dropped. In this way, the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
[0045] In a further example, each packet may contain a frame identifier (ID) and a reference frame ID. The frame ID indicates the frame contained in the current packet and the reference frame ID indicates the list of frame IDs that the current frame relies upon to be decodable. In an example, a network node can keep track of the frame ID of any packet that is washed or dropped. Further, when a current packet contains a reference ID that includes a frame ID of a dropped packet, this indicates that the frame in the current packet is unlikely to be decodable at the destination absent a retransmission of the dropped packet. Accordingly, the network node can reduce the priority of the current packet to the lowest available priority and wash or drop the current packet before washing or dropping other packets that are still decodable. In this way, the network node handling the packets is capable of intelligently packet washing or dropping the packet containing the frame that has the least impact on the video stream.
[0046] FIG. 1 is a schematic diagram of a telecommunication network 100 including a network domain 102. As shown, the telecommunications network 100 includes a plurality of network nodes 104, 106, 108, and 110. The network nodes 104-110 may be operating using layer three (L3) routing technology and, as such, may be referred to as L3 network nodes. While eight network nodes 104-110 are shown in the network domain 102, more or fewer nodes may be included in practical applications.
[0047] Each of the network nodes 104-110 is configured to send, receive, and/or route packets containing media content (e.g., streaming media content). Thus, one or more of the network nodes 104-110 may be a router, a switch, or a gateway. Network nodes 104, 106, and 110 are disposed at an edge of the network domain 102, and may therefore be referred to as edge nodes. The network nodes 104, 106, and 110 that receive packets from outside the network domain 102 may be referred to as ingress network nodes (e.g., ingress routers). The network nodes 104, 106, and 110 that transmit packets out of the network domain 102 may be referred to as egress network nodes (e.g., egress routers). Depending on the direction of packet traffic, each of the network nodes 104, 106, and 110 may function as an ingress network node, an egress network node, or both. Network nodes 108 are not on an edge of the network domain 102. Such nodes 108 may be referred to as internal nodes and may not be configured to communicate outside of the network domain 102 without passing through an edge node.
[0048] Each of the network nodes 104-110 has one or more neighbor network nodes. As used herein, a neighbor network node refers to a network node which is only one hop away from the network node. The network nodes 104-110 are coupled to, and communicate with each other, via links 120. The links 120 may be wired, wireless, electrical, optical, or some combination thereof. The pattern or arrangement of links 120 in FIG. 1 is for the purpose of illustration only. More or fewer links 120 coupling the network nodes 104-110 to each other in a different configuration may be used in practical applications.
[0049] As shown in FIG. 1, the network node 104 is coupled to a content source 130. The content source 130 may represent a content provider that streams packets of video data for viewing by an end user (e.g., streaming television, video conferencing, etc.). The content source 130 is configured to provide packets of media content such as, for example, streaming media content, motion picture experts group (MPEG) video samples, etc. Thus, the network node 104 is configured to request and receive media content from the content source 130.
[0050] The content source 130 may utilize a video encoder (a.k.a., codec) capable of implementing various video coding techniques. For example, the video encoder of the content source 130 may perform intra- and inter-coding of video blocks within video slices. A video block, or simply a block, is an MxN (M-column by N-row) array of samples (e.g., pixels), or an MxN array of transform coefficients. Several of the video blocks may be aggregated to form one of the video slices. All of the video slices or all of the video blocks collectively form a picture that can be displayed to a user. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial based coding modes. Inter-modes, such as uni-directional (a.k.a., uni prediction) prediction (P mode) or bi-prediction (a.k.a., bi prediction) (B mode), may refer to any of several temporal-based coding modes. Video slices of pictures may be included as payloads in network packets, and such network packets may be denoted by the mode used to encode the included slices. Accordingly, a packet containing a picture/slice coded according to I mode is said to be an I frame packet or a packet carrying an I frame, a packet containing a picture/slice coded according to P mode is said to be a P frame packet or a packet carrying a P frame, and a packet containing a picture/slice coded according to B mode is said to be a B frame packet or a packet carrying a B frame.
[0051] The network node 110 is coupled to a destination node 140. The destination node 140 may represent a user equipment (UE) of an end user. The UE may be a smartphone, a tablet device, a laptop computer, and so on (e.g., a device configured to receive and display streaming video content). The destination node 140 is configured to consume the media content provided by the content source 130. For example, the end user may utilize their UE to request media content (e.g., a streaming media video). The request is routed by the network nodes 104-110 through the network domain 102 and delivered to the content source 130. In response to the request, the content source 130 transmits the requested media content to be routed by the network nodes 104-110 back through the network domain 102 and delivered to the destination node 140 where the requested media content may be consumed.
[0052] In order to decode the media content encoded by the video encoder at the content source 130, the destination node 140 may utilize a video decoder (a.k.a., codec). The video decoder of the destination node 140 is capable of implementing various video coding techniques corresponding to those used by the video encoder of the content source 130. That is, the video decoder of the destination node 140 is able to decode the video blocks encoded using intra- and inter-coding by the video encoder as described above.
[0053] The present disclosure focuses on the issue of network congestion. Network nodes 104-110 each contain one or more receivers, processors, memories, and transmitters. A packet is received by a receiver, processed by a processor, and stored in memory until the packet can be transmitted by a transmitter. Network nodes 104-110 may include many receivers and transmitters, and may communicate many flows of packets at the same time. When a current packet is received and all transmitters are transmitting other packets, the current packet is retained in memory until a transmitter is available to transmit the current packet. Network congestion occurs when a network node 104-110 receives packets faster than such packets can be transmitted. In such a case, the memory in the congested node fills up. Once the memory is full, the congested node must either discontinue storing incoming packets or must evict already received packets from memory to make room for the new packets. This function is known as a packet drop.
[0054] In some examples, network nodes 104-110 can also be configured to perform a packet wash. In an example, packets can be encoded with a slice and the data can be arranged in order of importance. For example, a slice can be described by coding mode(s) that indicate reference blocks that can be used to reconstruct current blocks in the slice. Any difference between the reference blocks and the current blocks is known as residual. Residual can be transformed into coefficients and carried in the packets. Coefficients have varying effects on reconstructed video quality. Hence, the packets can be arranged with the least important coefficients toward the end of the packet and the most important coefficients towards the beginning of the packet. A packet wash can drop data from the end of the packet towards the front of the packet until a packet is small enough to be transmitted despite the network congestion, or to reduce network congestion in general. In this way, the video quality of the slice in the packet is reduced as needed instead of the packet and corresponding slice being dropped entirely.
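The chunk-dropping behavior described in paragraph [0054] can be sketched as follows. This is a minimal illustration only, assuming the payload is represented as a list of byte chunks ordered from most to least important; the chunk layout and the `max_payload` threshold are hypothetical, not part of the disclosure:

```python
def packet_wash(chunks, max_payload):
    """Drop chunks from the end of the payload (least important last)
    until the remaining payload fits within max_payload bytes."""
    washed = list(chunks)
    while washed and sum(len(c) for c in washed) > max_payload:
        washed.pop()  # discard the currently least important chunk
    return washed
```

Note that when even the first (most important) chunk does not fit, the wash degenerates into an empty payload, which corresponds to the packet drop described above.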
[0055] In some examples, the network packets can be provided with a priority. The network nodes 104-110 can then use the priority to select which packets to drop and/or wash. In this way, the least important packets can be dropped or washed before the more important packets. For example, I frames may be more important than P or B frames because P and B frames rely on I frames to be decodable. Further, P frames may be more important than B frames because B frames rely on P frames to be decodable. B frames are generally not relied upon by other frames for decoding and hence can have the lowest priority. As such, using priority allows the network nodes 104-110 to drop B frames first, then P frames, and then I frames so that packet drop based disruptions to the user at the destination node 140 are minimized as much as possible given the level of congestion across the path between the content source 130 and the destination node 140.
[0056] The present disclosure includes the case where a network node 104-110 has to choose between two packets of the same priority when determining to wash and/or drop a packet. This can be accomplished by employing an LPD value. For example, an LPD value can be encoded into each packet. The LPD value indicates the number of packets that rely on a current packet to be decodable. In another example, the LPD may indicate the number of frames that rely on the frame in the current packet to be decodable. Dropping a packet with a higher LPD value is more disruptive to a video stream than dropping a packet with a lower LPD. Therefore, when two packets have the same priority, the network node 104-110 can wash and/or drop the packet with the lowest LPD first when overcoming network congestion. The calculation of LPD for each packet and the corresponding mechanism for use is discussed in greater detail below.
[0057] In addition, when a current packet relies on a dropped packet (or portion of a washed packet) to be decodable, the destination node 140 is unlikely to be able to use the current packet. In some examples, each packet can include a frame ID. Further, a network node 104-110 can keep a log of the frame IDs of any packets that are dropped and/or washed. In addition, each packet can contain a reference frame ID that includes values indicating the frame IDs of all frames upon which the current frame depends to be decodable. When a network node 104-110 determines that the reference frame ID of a current packet indicates a frame ID that has already been dropped, the network node 104-110 can reduce the priority of the current packet (e.g., to zero). In this way, packets that are unlikely to be useable at the destination node 140 are washed and/or dropped first to reduce video streaming disruption. The mechanism for downgrading priority is also discussed in greater detail below.
[0058] FIG. 2 is a schematic diagram illustrating the process of uni-directional inter prediction 200 using a plurality of frames 202. The frames 202 individually and collectively represent media content. As shown, some of the frames 202 have been labeled with the letter “I.” These frames 202 are intra prediction (I) frames coded using the I mode discussed above. An I-frame comprises macroblocks that only rely on intra prediction. That is, the macroblocks within the I-frame are predicted based only on information inside the I-frame and not information outside the I-frame. Therefore, the I-frame can be predicted without having to reference any other frame. It should be noted that the term macroblock is used in relation to Advanced Video Coding (AVC) standards. High efficiency video coding (HEVC) and versatile video coding (VVC) standards instead partition slices into coding tree units (CTUs). Accordingly, in a VVC context an I-frame 202 comprises CTUs that only rely on intra prediction, etc.
[0059] Some of the frames 202 have been labeled with the letter “P.” These frames 202 are inter prediction (P) frames coded using the P mode discussed above. A P-frame (a.k.a., a predicted frame) allows macroblocks or CTUs to be compressed using temporal prediction in addition to spatial prediction. For motion estimation, a P-frame references other frames 202 that have been previously encoded (e.g., an I-frame or a neighboring P-frame). Every macroblock or CTU in a P-frame can be temporally predicted, spatially predicted, or skipped (e.g., where the video decoder copies the co-located block from the previous frame - i.e., a “zero” motion vector).
[0060] Still referring to FIG. 2, the process of uni-directional inter prediction 200 may proceed according to a decode order 204. As shown, the frame 202 with position 1 in the decode order 204 is an I-frame. Thus, the I-frame is decoded without referencing any other frame 202 (i.e., there is no arrow from the I-frame pointing to any preceding frame 202). Once the I-frame in position 1 of the decode order 204 has been decoded, the process of uni-directional inter prediction 200 proceeds to decode the frame 202 in position 2 of the decode order 204. The frame in position 2 is a P-frame. Thus, the P-frame is decoded by referencing the preceding I-frame in position 1 of the decode order 204 as shown by the arrow.
[0061] Once the P-frame in position 2 of the decode order 204 has been decoded, the process of uni-directional inter prediction 200 proceeds to decode the frame in position 3 of the decode order 204. The frame in position 3 is a P-frame. Thus, the P-frame is decoded by referencing the preceding P-frame in position 2 of the decode order 204 as shown by the arrow. This process of decoding frames 202 as described above continues until all frames are decoded. Once decoded, the frames 202 may be displayed on a display screen of a UE according to the display order 206, which corresponds to the decode order 204. It should be noted that, as used herein, frames may also be referred to as video frames, images, and/or pictures depending on the example.
[0062] FIG. 3 is a schematic diagram illustrating the process of bidirectional inter prediction 300 using a plurality of frames 302. The frames 302 individually and collectively represent media content. As with the process of uni-directional inter prediction 200, some of these frames 302 are labeled with the letter “I” or “P” to represent I-frames and P-frames, respectively. In addition, some of the frames 302 are labeled with the letter “B.” These frames 302 are bidirectional prediction (B) frames coded using the B mode discussed above. A B-frame is a frame that can refer to frames 302 that occur both before the B-frame and after the B-frame in decode order 304.
[0063] The video decoder may use macroblock compression (e.g., like Advanced Video Coding (AVC)/ITU-T H.264 does). Each macroblock or CTU of a B-frame can be predicted in a variety of ways. For example, each macroblock or CTU of the B-frame can be predicted using backward prediction (e.g., using frames 302 that occur after the current frame), using forward prediction (e.g., using frames 302 that occur before the current frame), or without inter prediction (e.g., only intra prediction). In addition, each macroblock or CTU of a B-frame can be skipped completely (e.g., without intra or inter prediction).
[0064] Still referring to FIG. 3, the process of bidirectional inter prediction 300 may proceed according to the decode order 304. As shown, the frame 302 with position 1 in the decode order 304 is an I-frame. Thus, the I-frame is decoded without referencing any other frame 302 (i.e., there is no arrow from the I-frame pointing to any preceding frame 302). Once the I-frame in position 1 of the decode order 304 has been decoded, the process of bidirectional inter prediction 300 proceeds to decode the frame 302 in position 2 of the decode order 304. The frame in position 2 is a P-frame. Thus, the P-frame is decoded by referencing the preceding I-frame in position 1 of the decode order 304 as shown by the arrow.
[0065] Once the P-frame in position 2 of the decode order 304 has been decoded, the process of bidirectional inter prediction 300 proceeds to decode the frame in position 3 of the decode order 304. The frame in position 3 is a B-frame. The B-frame is decoded by referencing the P-frame in position 2 of the decode order 304 as well as the I-frame in position 1 of the decode order 304 as shown by the arrows.
[0066] Once the B-frame in position 3 of the decode order 304 has been decoded, the process of bidirectional inter prediction 300 proceeds to decode the frame in position 4 of the decode order 304. The frame in position 4 is a B-frame. The B-frame is decoded by referencing the P-frame in position 2 of the decode order 304 as well as the I-frame in position 1 of the decode order 304 as shown by the arrows. This process of decoding frames as described above continues until all frames 302 are decoded. Once decoded, the frames 302 may be displayed on a display screen of a UE according to the display order 306. Notably, in the process of bidirectional inter prediction 300 the display order 306 does not necessarily correspond to the decode order 304. That is, the frames 302 are decoded in a different order than they are displayed.
[0067] It should be noted that MPEG video sequences are made up of groups of pictures (GOPs). Each GOP comprises a preset number of coded frames, including one I frame and one or more P and B frames. For example, FIG. 2 depicts two complete GOPs and FIG. 3 depicts a single GOP. Further, GOP lengths may differ depending on the encoded video. Longer GOPs may be used for low-motion video content because frames in low-motion video have reduced dependency on the I frames and hence fewer I frames are needed. This in turn improves the video compression efficiency for long GOPs. However, long GOPs may have reduced error resilience. For example, when an error occurs in a current frame, all frames that are decoded in reference to the current frame may use the error as input, which causes the error to propagate to such dependent frames. An I frame resets inter picture dependency. Therefore, an error in a longer GOP may be propagated between a larger number of frames until the next I frame resets the inter picture dependency and removes the error. Longer GOPs also increase the latency in the transmission of the frames from a source to a destination since the entire GOP must be assembled before transmission can occur.
[0068] In temporal prediction for a closed GOP, the first frame is an intracoded I frame and the last frame is a P frame. For all other frames in the GOP, P or B frames are used. Closed GOPs are self-contained since none of the frames refer to another frame outside the GOP. The I frame is directly referenced by up to three frames, whereas a P frame is directly referenced by up to five frames.
[0069] Unlike the closed GOP, an open GOP may use both I and P frames for forward or backward prediction. In addition, the last P frame in a previous GOP is referenced by B frames in the current GOP. Stated differently, the last P frame in a current GOP is referenced by B frames in a subsequent GOP. Like a closed GOP, an open GOP ends with a P frame. However, unlike a closed GOP, the open GOP fully exploits the last P frame, which is used as a reference for four B frames. As a consequence, fewer P frames may be employed in open GOP when compared to closed GOP structures. This may result in a slight improvement in compression efficiency. For an open GOP, an I frame serves as a reference frame for more frames (e.g., 5 frames), possibly as many as the P frame. The drawback of an open GOP is that an open GOP is no longer self-contained, and hence cannot be decoded independently.
[0070] An I frame in a closed GOP can be referenced by up to 3 frames. However, a loss of an I frame in a closed GOP affects the entire GOP, because such a loss indirectly prevents all other frames from being decoded. For example, loss of an I frame prevents decoding of associated P frames, which results in prevention of decoding corresponding B frames. As such, the loss of an I frame may propagate to the entire GOP. Thus, the loss of an I frame located in a longer GOP would yield worse quality of experience (QoE) to the end user than the loss of an I frame located in a shorter GOP. Accordingly, an I frame in a longer GOP is more important than an I frame in a shorter GOP. The loss of an I frame in an open GOP may affect more frames than the entire GOP, because the last P frame in a current GOP is used as a reference for four B frames in a next GOP.
[0071] To account for these issues, the present disclosure defines a loss propagation depth (LPD) of a current frame as the number of frames that would not be decodable if the current frame is lost/dropped during the transmission, including the current frame.
[0072] For an I frame in a closed GOP, the LPD can be determined according to LPDI = GOPLength, where LPDI is the LPD for an I frame and GOPLength is the number of frames in the GOP. For an I frame in an open GOP, the LPD can be determined according to LPDI = GOPLength + 4, where LPDI is the LPD for an I frame, GOPLength is the number of frames in the GOP, and 4 is the number of B frames in a subsequent GOP that reference the last P frame in the current open GOP.
[0073] For a P frame in a closed GOP, the LPD can be determined according to LPDP = nfP + 1, where LPDP is the LPD for a P frame and nfP is the number of frames (e.g., B frames or P frames) that directly or indirectly reference the P frame. For example, a P frame may be directly referenced by up to five frames, but indirect references could propagate to the end of the GOP.
[0074] For example, in FIG. 2, for the first P frame (e.g., the P frame following the first I frame), LPDP = 5 because the four P frames following the first P frame directly or indirectly reference it, and hence the first P frame (e.g., 1) and all four following frames (e.g., +4) would not be decodable if the first P frame were lost. For the second P frame, LPDP = 4 because the second P frame is referenced by three other P frames. For the third P frame, LPDP = 3 because the third P frame is referenced by two other P frames. For the fourth P frame, LPDP = 2 because the fourth P frame is referenced by one other P frame. For the fifth P frame, LPDP = 1 because the fifth P frame is not referenced by any other P frames. Likewise, for a P frame in an open GOP, LPDP = nfP + 1, where nfP is the number of frames (e.g., B frames or P frames) that directly or indirectly reference the P frame. For example, in an open GOP, the direct references to a P frame may include up to 5 frames, but indirect references could propagate outside of the current GOP. For the last P frame in an open GOP, LPDP = 4 + 1 = 5, because the last P frame in a current open GOP is referenced by the four B frames in the next open GOP.
[0075] For a B frame (called a reference-B frame) in a closed GOP or open GOP, LPDB = nfB + 1, where nfB is the number of other B frames in the GOP which reference the current B frame. A B frame can be referenced by other B frames, but such references are not generally used.
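The LPD rules of paragraphs [0072]-[0075] can be collected into a short sketch. The function names and argument conventions here are illustrative, not part of the disclosure:

```python
def lpd_i(gop_length, open_gop=False):
    # I frame: losing it makes the whole GOP undecodable; in an open
    # GOP, four B frames of the next GOP also reference this GOP.
    return gop_length + 4 if open_gop else gop_length

def lpd_p(nf_p):
    # P frame: frames that directly or indirectly reference it,
    # plus 1 for the P frame itself.
    return nf_p + 1

def lpd_b(nf_b):
    # B frame: other B frames referencing it (commonly zero),
    # plus 1 for the B frame itself.
    return nf_b + 1
```

For the uni-directional chain of FIG. 2, the five P frames have nfP = 4, 3, 2, 1, 0, giving LPDP = 5, 4, 3, 2, 1 as in paragraph [0074].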
[0076] When a network node needs to decide between two packets for washing and/or dropping and such packets contain frames with the same type, i.e., two I frames, two P frames, or two B frames, the network node compares the LPD value of the frames and washes and/or drops the packet with the smallest LPD value. For example, a drop may occur without a packet wash. Further, a packet wash that cannot reduce the packet to a small enough size to allow for transmission despite congestion results in a packet drop.
[0077] Accordingly, the network node first checks a priority level for different types of frames, and washes and/or drops the packet with lower priority. When further packet dropping is needed and the candidate packets have the same priority level, then the LPD of the packets is used to decide how to further wash and/or drop the packets.
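The two-stage decision above can be sketched as follows, assuming each candidate packet carries a numeric priority (larger meaning more important) and its LPD value; the dictionary-based packet representation is purely illustrative:

```python
def choose_victim(candidates):
    """Select the packet to wash and/or drop: lowest priority first,
    with ties on priority broken by the smallest LPD value."""
    return min(candidates, key=lambda p: (p["priority"], p["lpd"]))
```

For example, between two P frames of equal priority, the one whose loss would affect fewer frames (smaller LPD) is selected.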
[0078] Further, when a frame A is dropped during transmission, other frames that reference frame A should generally not be transmitted. This is because such frames cannot be decoded at the receiver side unless the frame A is re-transmitted and successfully delivered. Under this circumstance, the decoding of other frames has unpredictable latency. For example, in many cases the decode order is not the same as transmission order. For example, in FIG. 3, if the first P frame is dropped during the transmission, and if the same network node which dropped the first P frame detects the packet containing the second P frame, then this packet containing the second P frame could be of much lower priority than its original priority. The priority level of the second P frame can be downgraded to be lower due to the dropping of the previous P frame. Likewise, associated B frames cannot be decoded without the first P frame and can also be downgraded to a lower priority. This allows traffic that is rendered useless to be dropped and thus reduces congestion on an already overloaded network node.
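The downgrade rule can be sketched as follows; a hypothetical node keeps the set of frame IDs it has already dropped and demotes any packet whose reference list intersects that set (the field names and the lowest priority value of 0 are assumptions for illustration):

```python
def effective_priority(pkt, dropped_frame_ids, lowest=0):
    # A frame that references an already-dropped frame cannot be
    # decoded at the destination absent retransmission, so it is
    # demoted to the lowest available priority.
    if set(pkt["ref_frame_ids"]) & dropped_frame_ids:
        return lowest
    return pkt["priority"]
```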
[0079] FIG. 4 is a schematic diagram illustrating an example packet 400 configured to carry metadata to support packet washing and/or dropping based on LPD. The packet 400 may be configured to carry an I frame, a P frame, or a B frame via telecommunications network 100 as discussed above. The packet 400 includes a GOP ID 402, a frame ID 404, a reference frame ID 406, and an LPD 408, for example in a packet header. The packet 400 also includes a frame 410, which may be a frame of a video stream, such as an I frame, a P frame, a B frame, or a slice thereof as described above.
[0080] The GOP ID 402 is an identifier that identifies the GOP that includes the frame 410 carried by the packet. The GOP ID 402 may be unique within the flow containing the video stream. The flow is identified by a source address, a destination address, and a flow label. As such, the GOP ID 402 may include some combination of a unique ID, a source address, a destination address, and a flow label. The GOP ID 402 may take on varying values depending on the length of the video.
[0081] The frame ID 404 is an ID that identifies the frame 410. A GOP generally has a maximum length of 16 frames, and hence the frame ID 404 could be represented by four bits.

[0082] The reference frame ID 406 includes zero or more identifier(s) of the frames referenced by the current frame 410. I frames do not reference other frames, and hence the reference frame ID 406 may be zero bits for an I frame. The reference frame ID 406 may be four bits for a P frame and eight bits for a B frame, respectively, and indicates the one or more frames referenced by the P frame or B frame included in the frame 410. This is the case for a closed GOP. For an open GOP, the reference frame ID 406 may also include a previous GOP ID if the frame 410 references a frame in the previous GOP.
[0083] The LPD 408 is the LPD value of the current frame 410. The LPD 408 may be determined based on the number of frames that directly or indirectly reference the current frame 410 as described above.
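As a non-normative illustration, the metadata fields of packet 400 might be serialized as follows. The field widths are assumptions drawn from the description above (a 4-bit frame ID for GOPs of at most 16 frames, zero to two reference frame IDs, one LPD byte); an actual header encoding may differ.

```python
import struct

def encode_packet_metadata(gop_id: int, frame_id: int,
                           reference_frame_ids: list[int], lpd: int) -> bytes:
    """Sketch of a header carrying GOP ID 402, frame ID 404,
    reference frame ID(s) 406, and LPD 408.

    Assumed layout (illustrative only): 32-bit GOP ID, then one byte
    packing the 4-bit frame ID with a 4-bit reference count, then one
    byte per reference frame ID (zero for I, one for P, two for B
    frames), then an 8-bit LPD value.
    """
    assert 0 <= frame_id < 16             # a GOP has at most 16 frames
    assert len(reference_frame_ids) <= 2  # B frames reference up to two frames
    header = struct.pack("!IB", gop_id, (frame_id << 4) | len(reference_frame_ids))
    for ref in reference_frame_ids:       # reference frame ID 406
        header += struct.pack("!B", ref)
    header += struct.pack("!B", lpd)      # LPD 408
    return header

# Example: a P frame (frame 3 of GOP 7) referencing the I frame (frame 0).
meta = encode_packet_metadata(gop_id=7, frame_id=3, reference_frame_ids=[0], lpd=2)
```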
[0084] When a network node washes and/or drops a packet 400, the network node may record the source address, destination address, flow label, GOP ID 402, and frame ID 404. This information can then be used to reduce the priority of any subsequent frame that references the current frame 410.
[0085] FIG. 5 is an example method 500 of performing a packet wash and/or packet drop based on LPD as described above. The method 500 may be performed by a network node experiencing network congestion or otherwise having insufficient resources to properly route a corresponding flow. The node determines that a packet wash and/or a packet drop is needed to continue normal operation. The node employs method 500 to determine which packet to wash and/or drop. At step 502, the node determines whether the decision to select which packet to wash and/or drop is based on selecting between packets containing the same frame type. Stated differently, the node determines whether two packets under consideration for packet washing and/or dropping have the same priority and/or are of the same frame type (e.g., two I frames, two P frames, or two B frames). If the packets are not of the same type, the method 500 proceeds to step 506. At step 506, the node selects the packet with the lowest priority and washes and/or drops the lowest priority packet. For example, when a first packet contains an I frame and a second packet contains a P frame, the node selects the P frame as having the lowest priority and washes and/or drops the packet containing the P frame. As another example, when a first packet contains a P frame and a second packet contains a B frame, the node selects the B frame as having the lowest priority and washes and/or drops the packet containing the B frame.
[0086] When the packets are of the same type, the method 500 proceeds from step 502 to step 504. At step 504, the node selects the packet with the lowest LPD value and washes and/or drops the packet. The node can determine which packet has the lowest LPD value by reading the LPD values in the respective packet headers. The LPD values can be determined by the content source based on the equations described above and encoded into each packet when the packets are sent across the network. In some examples, the node can store the frame ID and flow information for each packet dropped at step 504 and/or 506 in a dropped packet list for use in connection with method 600.
[0087] The method 500 can also be used recursively. For example, if there are initially three packets under consideration for packet washing and/or dropping with the same priority, then at step 502 the node can determine that two of the packets do not contain the same frame type. Then, at step 506, one of the two packets can be washed and/or dropped. The node can return to step 502 and process the remaining two packets accordingly.
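A minimal sketch of the selection logic of method 500, assuming each candidate packet carries a priority derived from its frame type (lower number meaning lower priority) and the LPD value read from its header; the dictionary keys are illustrative, not part of the disclosed packet format:

```python
def select_packet_to_drop(packets):
    """Return the candidate packet to wash and/or drop per method 500."""
    lowest_priority = min(p["priority"] for p in packets)
    candidates = [p for p in packets if p["priority"] == lowest_priority]
    if len(candidates) == 1:
        # Step 506: frame types/priorities differ; pick the lowest priority.
        return candidates[0]
    # Step 504: same priority; pick the packet with the lowest LPD value.
    return min(candidates, key=lambda p: p["lpd"])

# Example: the B frame is selected over the P and I frames.
victim = select_packet_to_drop([
    {"frame": "I", "priority": 2, "lpd": 16},
    {"frame": "P", "priority": 1, "lpd": 4},
    {"frame": "B", "priority": 0, "lpd": 1},
])
```

Calling the function repeatedly on the remaining candidates mirrors the recursive use of method 500 described above.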
[0088] FIG. 6 is an example method 600 of performing a downgrade of packet priority for a packet carrying a frame that is dependent on another frame in another packet that has been washed and/or dropped. The method 600 may be performed by a network node performing packet washes and packet drops based on LPD values. At step 602, the network node can check whether priority degradation is implicated for each packet containing a P frame or a B frame. For example, when no packets have been washed or dropped at the node over a predetermined time frame, then priority degradation is not needed. However, when the node has performed packet washing or dropping during the predetermined time frame, then priority degradation may be needed and the node proceeds to step 604.
[0089] At step 604, the node checks the dropped packet list to determine whether a packet has been washed or dropped from the same flow. This can be accomplished by comparing the source address, the destination address, and flow label of the current packet with the dropped packet list. When a packet from the same flow has been dropped and/or washed, the node can further compare the reference frame ID of the current packet to the frame ID of the dropped packets in the dropped packet list that are associated with the same flow.
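The flow comparison and reference-frame matching of step 604 might be sketched as follows, assuming a simple in-memory dropped packet list; the data structures and field names are illustrative only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DropRecord:
    """Assumed entry in the dropped packet list: the flow tuple plus
    the GOP ID and frame ID recorded when a packet was washed/dropped."""
    src: str
    dst: str
    flow_label: int
    gop_id: int
    frame_id: int

LOWEST_PRIORITY = 0  # assumed priority scale; lowest is dropped first

def downgrade_if_dependent(packet, dropped: list[DropRecord]) -> int:
    """Return the packet's (possibly downgraded) priority.

    `packet` is a dict with assumed keys for the flow tuple, GOP ID,
    reference frame IDs, and original priority.
    """
    for rec in dropped:
        same_flow = (rec.src == packet["src"] and rec.dst == packet["dst"]
                     and rec.flow_label == packet["flow_label"])
        # A match on flow, GOP, and a referenced frame ID means the
        # current frame depends on a dropped frame.
        if (same_flow and rec.gop_id == packet["gop_id"]
                and rec.frame_id in packet["reference_frame_ids"]):
            return LOWEST_PRIORITY      # downgrade: likely undecodable
    return packet["priority"]           # no match: keep original priority

dropped = [DropRecord("10.0.0.1", "10.0.0.2", 5, 7, 4)]
dependent = {"src": "10.0.0.1", "dst": "10.0.0.2", "flow_label": 5,
             "gop_id": 7, "reference_frame_ids": [4], "priority": 3}
```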
[0090] At step 606, the node can determine whether a match is found as a result of step 604. When a match is not found, the node proceeds to step 610 and maintains the original priority of the current packet. When a match is found, the node proceeds from step 606 to step 608. A match of the flow and of the reference frame ID to a frame ID in the dropped packet list indicates that the current frame references a dropped frame and therefore is likely to be undecodable at the destination. Accordingly, when a match is found, the node downgrades the priority of the current packet to the lowest priority. In this way, the current packet is the first packet washed and/or dropped in the event of further network congestion, as the current packet is unlikely to be useful to the end user.

[0091] FIG. 7 is an example method 700 of encoding packets to support packet wash based on LPD and priority downgrades for dependent packets. Method 700 may occur at a content source. For example, the content source may employ method 700 to encode video frames into packets in a manner that supports packet washes and/or packet drops based on LPD and/or priority downgrades as described above.
[0092] At step 702, the node determines LPD values for each of a plurality of packets. Each of the plurality of packets contains at least a slice of a frame of video data. Further, each LPD value indicates, for a current slice of a current frame contained in a current packet, an amount of video data (e.g., a number of frames) that relies on video data from the current frame to be decodable.
[0093] At step 704, the node encodes the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets, for example in case of network congestion. The packet drop may include dropping the packet with the lowest LPD value. The LPD values may be encoded into LPD fields in each of the plurality of packets. In an example, an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength, where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures. In an example, an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4, where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures. In an example, an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1, where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame. In an example, an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfp + 1, where LPDB is the LPD value for the B frame and nfp is a number of frames that reference the B frame.

[0094] At step 706, the node encodes a frame ID into each of the plurality of packets. The frame ID indicates the current frame associated with the current packet (e.g., the current frame contained in the packet or the current frame containing the slice contained in the packet).
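The LPD assignments described for step 704 can be summarized in a short sketch. The function name and parameter names are illustrative; nfp denotes the number of frames that reference a given P or B frame, as in the formulas above.

```python
def lpd_value(frame_type: str, gop_length: int = 0,
              n_fp: int = 0, open_gop: bool = False) -> int:
    """Compute an LPD value per the example formulas above.

    frame_type is "I", "P", or "B"; gop_length applies to I frames;
    n_fp is the number of frames referencing a P or B frame.
    """
    if frame_type == "I":
        # Closed GOP: every frame in the GOP depends on the I frame.
        # Open GOP: the formula above adds 4 to GOPLength.
        return gop_length + 4 if open_gop else gop_length
    # P and B frames: the frame itself plus every frame that references it.
    return n_fp + 1
```

For a 16-frame closed GOP this yields 16 for the I frame, while a P frame referenced by three other frames yields 4.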
[0095] At step 708, the node encodes a reference frame ID into each of the plurality of packets. The reference frame ID indicates any frame IDs of any frames upon which the current frame depends. The reference frame IDs and the frame IDs support reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped. Packets with a lower priority are dropped prior to packets with a higher priority.
[0096] At step 710, the node transmits the plurality of packets.

[0097] FIG. 8 is an example method 800 of performing a packet wash based on LPD and priority downgrades for dependent packets. For example, a network node may perform method 800 on a video stream transmitted from a content source and encoded according to method 700.

[0098] At step 802, the network node receives a plurality of packets. Such packets may have the same priority, and hence the network node may have to determine which packet to drop. Each of the plurality of packets contains at least a slice of a frame of video data.
[0099] At step 804, the network node obtains LPD values from each of the plurality of packets. Each LPD value indicates, for a current slice of a current frame contained in a current packet, an amount of video data (e.g., a number of frames) that relies on video data from the current frame to be decodable. For example, each of the plurality of packets may contain an LPD field including metadata indicating a corresponding LPD value. In an example, an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength, where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures. In an example, an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4, where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures. In an example, an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1, where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame. In an example, an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfp + 1, where LPDB is the LPD value for the B frame and nfp is a number of frames that reference the B frame.
[00100] At step 806, the network node performs a packet drop on a packet with a lowest LPD value from the plurality of packets, for example in response to network congestion. The packet drop may include dropping the packet with the lowest LPD value from the packets with the same priority.
[00101] At step 808, the network node reduces a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped. For example, each of the plurality of packets may contain a reference frame ID. The reference frame ID indicates any frame IDs of any frames upon which the current frame depends. Further, each of the plurality of packets contains a frame ID. The frame ID indicates the current frame associated with the current packet. The network node may match the reference frame ID to the current frame ID in a dropped packet list (along with a flow match) to determine when to reduce the priority of the dependent packet. Packets with lower priority are dropped prior to packets with higher priority.

[00102] FIG. 9 is a schematic diagram of a network apparatus 900 (e.g., a content source, a network node, etc.). The network apparatus 900 is suitable for implementing the disclosed embodiments as described herein. The network apparatus 900 comprises ingress ports/ingress means 910 and receiver units (Rx)/receiving means 920 for receiving data; a processor, logic unit, or central processing unit (CPU)/processing means 930 to process the data; transmitter units (Tx)/transmitting means 940 and egress ports/egress means 950 for transmitting the data; and a memory/memory means 960 for storing the data. The network apparatus 900 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports/ingress means 910, the receiver units/receiving means 920, the transmitter units/transmitting means 940, and the egress ports/egress means 950 for egress or ingress of optical or electrical signals.
[00103] The processor/processing means 930 is implemented by hardware and software. The processor/processing means 930 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor/processing means 930 is in communication with the ingress ports/ingress means 910, receiver units/receiving means 920, transmitter units/transmitting means 940, egress ports/egress means 950, and memory/memory means 960. The processor/processing means 930 comprises a packet LPD module 970. The packet LPD module 970 is able to implement the methods disclosed herein. The inclusion of the packet LPD module 970 therefore provides a substantial improvement to the functionality of the network apparatus 900 and effects a transformation of the network apparatus 900 to a different state. Alternatively, the packet LPD module 970 is implemented as instructions stored in the memory/memory means 960 and executed by the processor/processing means 930.
[00104] The network apparatus 900 may also include input and/or output (I/O) devices/I/O means 980 for communicating data to and from a user. The I/O devices/I/O means 980 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc. The I/O devices/I/O means 980 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
[00105] The memory/memory means 960 comprises one or more disks, tape drives, and solid-state drives and may be used as an over-flow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory/memory means 960 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

[00106] While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.
[00107] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

CLAIMS

What is claimed is:
1. A method implemented in a network node, the method comprising: receiving a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data; obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
2. The method of claim 1, wherein each of the plurality of packets contains an LPD field including metadata indicating a corresponding LPD value.
3. The method of any of claims 1-2, further comprising reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
4. The method of any of claims 1-3, wherein each of the plurality of packets contains a reference frame identifier (ID), and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
5. The method of any of claims 1-4, wherein each of the plurality of packets contains a frame identifier (ID), and wherein the frame ID indicates the current frame associated with the current packet.
6. The method of any of claims 1-5, wherein an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
7. The method of any of claims 1-6, wherein an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4 where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
8. The method of any of claims 1-7, wherein an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1 where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame.
9. The method of any of claims 1-8, wherein an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfp + 1 where LPDB is the LPD value for the B frame and nfp is a number of frames that reference the B frame.
10. A method implemented in a network node, the method comprising: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and transmitting the plurality of packets.
11. The method of claim 10, wherein the LPD values are encoded into LPD fields in each of the plurality of packets.
12. The method of any of claims 10-11, further comprising encoding a reference frame identifier (ID) into each of the plurality of packets, and wherein the reference frame ID indicates any frame IDs of any frames upon which the current frame depends.
13. The method of any of claims 10-12, further comprising encoding a frame identifier (ID) into each of the plurality of packets, and wherein the frame ID indicates the current frame associated with the current packet.
14. The method of any of claims 10-13, wherein the reference frame IDs and the frame IDs support reducing a priority of a dependent packet containing a dependent frame when any packet containing a frame upon which the dependent frame depends is dropped, wherein packets with lower priority are dropped prior to packets with higher priority.
15. The method of any of claims 10-14, wherein an LPD value for an intra-predicted (I) frame in a closed group of pictures (GOP) is determined according to LPDI = GOPLength where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
16. The method of any of claims 10-15, wherein an LPD value for an I frame in an open GOP is determined according to LPDI = GOPLength + 4 where LPDI is the LPD value for the I frame and GOPLength is a number of frames in the group of pictures.
17. The method of any of claims 10-16, wherein an LPD value for a unidirectional inter-predicted (P) frame is determined according to LPDP = nfp + 1 where LPDP is the LPD value for the P frame and nfp is a number of frames that reference the P frame.
18. The method of any of claims 10-17, wherein an LPD value for a bidirectional inter-predicted (B) frame is determined according to LPDB = nfp + 1 where LPDB is the LPD value for the B frame and nfp is a number of frames that reference the B frame.
19. A network node comprising: a processor, a receiver coupled to the processor, a memory coupled to the processor, and a transmitter coupled to the processor, wherein the processor, receiver, memory, and transmitter are configured to perform the method of any of claims 1-18.
20. A non-transitory computer readable medium comprising a computer program product for use by a router, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium such that when executed by a processor cause the router to perform the method of any of claims 1-18.
21. A network device comprising: a receiving means for receiving a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data; and a processing means for: obtaining loss propagation depth (LPD) values from each of the plurality of packets, wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and performing a packet drop on a packet with a lowest LPD value from the plurality of packets.
22. The network device of claim 21, wherein the network device is further configured to perform the method of any of claims 1-9.
23. A network device comprising: a processing means for: determining loss propagation depth (LPD) values for each of a plurality of packets, wherein each of the plurality of packets contains a portion of a frame of video data, and wherein each LPD value indicates, for a current portion of a current frame contained in a current packet, an amount of video data that relies on video data from the current frame to be decodable; and encoding the LPD values into the plurality of packets as metadata to support packet dropping on a packet with a lowest LPD value from the plurality of packets; and a transmitting means for transmitting the plurality of packets.
24. The network device of claim 23, wherein the network device is further configured to perform the method of any of claims 10-18.
PCT/US2022/043664 2022-09-15 2022-09-15 Group of pictures affected packet drop WO2024058782A1 (en)
