CN113453006A - Picture packaging method, equipment and storage medium - Google Patents

Picture packaging method, equipment and storage medium

Info

Publication number: CN113453006A; granted as CN113453006B
Application number: CN202110234142.4A
Authority: CN (China)
Prior art keywords: last, packets, bit, header, nal unit
Legal status: Granted; Active
Inventors: Stephan Wenger, Byeongdoo Choi, Shuai Zhao
Original and current assignee: Tencent America LLC
Priority claimed from US application No. 17/077,546 (US11539820B2)
Other languages: Chinese (zh)
Application filed by Tencent America LLC

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

The embodiments of the present disclosure provide a picture packaging method, a device, and a storage medium, the method comprising: obtaining a plurality of NAL units, the plurality of NAL units comprising a first NAL unit of a picture and a last NAL unit of the picture; partitioning the first NAL unit of the picture into a plurality of first fragments and the last NAL unit of the picture into a plurality of last fragments; encapsulating the plurality of first fragments into a plurality of first Fragmentation Unit (FU) packets and encapsulating the plurality of last fragments into a plurality of last FU packets; and transmitting the plurality of first FU packets and the plurality of last FU packets, wherein a last FU packet of the plurality of last FU packets includes a last FU header including a last R bit, and wherein the last R bit is set to 1.

Description

Picture packaging method, equipment and storage medium
Cross-reference
This application claims priority from United States provisional application No. 62/994,563, filed with the United States Patent and Trademark Office on March 25, 2020, and United States application No. 17/077,546, filed with the United States Patent and Trademark Office on October 22, 2020, which are hereby incorporated by reference in their entirety.
Technical Field
The disclosed subject matter relates to video encoding and decoding, and more particularly to signaling of picture boundary information for supporting individual access of pictures in a video payload format.
Background
Real-time Transport Protocol (RTP), a network protocol for transmitting video over IP networks, has been used in communication systems that employ streaming media, such as video conferencing applications. RTP payload formats have recently received attention for carrying video data in compliance with ITU-T Recommendation [H.266] and the ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) international standard [ISO 23090-3], also known as Versatile Video Coding (VVC) and developed by the Joint Video Experts Team (JVET). The RTP payload format allows for the encapsulation of at least one Network Abstraction Layer (NAL) unit in each RTP packet payload and for the fragmentation of a NAL unit into multiple RTP packets.
At least some video coding standards recognize the concept of an Access Unit (AU). In the single layer case, an access unit may consist of a single coded picture. In other cases, particularly those involving layered coding and multiview coding, the AU may comprise multiple coded pictures that share certain timing information, e.g. have the same presentation time.
The RTP header may include a so-called "marker" bit (M bit). Conventionally, in almost all RTP payload formats that identify the concept of an AU, the M bit is specified to be equal to one for the RTP packet carrying the last bit string of an AU, and set to zero otherwise. When a receiver receives an RTP packet with the M bit set, it knows that the RTP packet is the last data packet of the AU and can process it accordingly. Some details of this processing can be found in the RTP specification.
At least some video coding standards further identify the concept of a coded picture, which may differ from an AU. An AU and a coded picture may differ, for example, when an AU consists of several coded pictures, as when spatial or SNR scalability is used, or in the case of redundant pictures.
If the sending endpoint retrieves the video bitstream it sends from a storage device such as a hard disk drive, the file may not include easily accessible meta-information about access unit or coded picture boundaries, for example because the bitstream may be stored in a format commonly referred to as an "Annex B bitstream". In this scenario, there may be no Application Program Interface (API) information available from an encoder to the RTP encapsulator signaling that a given bit string of the bitstream is the final bit string of an AU or of a coded picture. Instead, the RTP encapsulator may have to identify the bit string that includes the end of the AU or coded picture without the side information that is typically available to an encoder.
Disclosure of Invention
In an embodiment, there is provided a method of encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, the method comprising: obtaining the plurality of NAL units, the plurality of NAL units comprising a first NAL unit of the picture and a last NAL unit of the picture; partitioning the first NAL unit of the picture into a plurality of first fragments and partitioning the last NAL unit of the picture into a plurality of last fragments; encapsulating the plurality of first fragments into a plurality of first Fragmentation Unit (FU) packets and encapsulating the plurality of last fragments into a plurality of last FU packets; and transmitting the first FU packets and the last FU packets, wherein a last FU packet among the last FU packets includes a last FU header, the last FU header includes a last R bit, and the last R bit is set to 1.
In an embodiment, there is provided an apparatus for encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, the apparatus comprising: an acquisition module configured to obtain the plurality of NAL units, the plurality of NAL units comprising a first NAL unit of the picture and a last NAL unit of the picture; a partitioning module configured to partition the first NAL unit of the picture into a plurality of first fragments and the last NAL unit of the picture into a plurality of last fragments; an encapsulation module configured to encapsulate the plurality of first fragments into a plurality of first Fragmentation Unit (FU) packets and encapsulate the plurality of last fragments into a plurality of last FU packets; and a transmitting module configured to transmit the plurality of first FU packets and the plurality of last FU packets, wherein a last FU packet of the plurality of last FU packets comprises a last FU header, the last FU header comprises a last R bit, and the last R bit is set to 1.
In an embodiment, a computing device is provided, comprising a processor and a memory; the memory stores a computer program that, when executed by the processor, causes the processor to perform the method of embodiments of the disclosure.
In an embodiment, a non-transitory computer-readable medium storing instructions is provided, the instructions including at least one instruction that, when executed by at least one processor of a device for encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, causes the at least one processor to perform a method as described in embodiments of the present disclosure.
In the above technical solution of the embodiments of the present disclosure, a plurality of network abstraction layer (NAL) units of a picture are encapsulated, and a flag bit is set in the header of the last FU packet among the plurality of last FU packets of the picture's last NAL unit, so that the boundary information of the picture can be readily identified and the picture can be accessed efficiently and independently.
Drawings
Further features, properties, and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:
fig. 1 is a schematic diagram of a simplified block diagram of a communication system according to an embodiment.
Fig. 2 is a schematic illustration of an RTP header according to an embodiment.
Fig. 3 is a schematic illustration of an RTP packet including a payload header and the actual payload, according to an embodiment.
Fig. 4 is a schematic diagram of a NAL unit header in a VVC with bit boundaries according to an embodiment.
Fig. 5 is a schematic diagram of a Fragmentation Unit (FU) payload format, according to an embodiment.
Fig. 6 is a schematic diagram of an FU header for Versatile Video Coding (VVC), according to an embodiment.
Fig. 7 is a schematic diagram of a VCL NAL unit header with two FU structures according to an embodiment.
Fig. 8 is a flowchart of an example method for encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, according to an embodiment.
Fig. 9 is a schematic diagram of a computer system, according to an embodiment.
Detailed Description
In an embodiment, methods for signaling and identifying picture boundaries in the Real-time Transport Protocol (RTP) payload format for Versatile Video Coding (VVC), as well as in other protocols and codecs, are described. The indication of picture boundaries may allow for efficient play-out buffer processing.
Referring to fig. 1, a communication system may include a plurality of endpoints (11, 12, 13) that communicate with each other over an IP network (14), such as the internet, using real-time media such as voice, video, and/or other media. The system may further comprise at least one media-aware network element (15) configured to manipulate media sent by one endpoint before forwarding the media to other endpoints.
In some such system designs, an endpoint and/or a Media Aware Network Element (MANE) may include an RTP encapsulator that transmits RTP packets over a Network to an RTP receiver located, for example, in another endpoint or MANE. In some cases, the sending endpoint may include a video camera functionally coupled to a video encoder, which in turn is coupled to an encapsulator such that video captured by the video camera is transmitted over a network (14) from the sending endpoint, e.g., endpoint (11), to the receiving endpoint, e.g., endpoint (12), using RTP packets.
In some cases, the sending endpoint may not include a video encoder. Instead, it may retrieve video from a file stored on a hard drive or similar storage (16) coupled to the endpoint (11).
Some real-time communication techniques for video over the internet and other IP networks rely on RTP as specified in RFC 3550. In some cases, RTP packets are transmitted over UDP over IP from one endpoint or MANE to another endpoint or MANE. Each RTP packet begins with an RTP header; fig. 2 illustrates the RTP header format specified in RFC 3550.
The Version (V) field (201) identifies the version of RTP and is equal to 2. The Padding (P) field (202) indicates whether the end of the packet contains at least one additional padding octet. The Extension (X) field (203) indicates whether the fixed header is followed by exactly one header extension. The CSRC Count (CC) field (204) contains the number of CSRC identifiers (210) following the fixed header. The Marker (M) field (205) allows significant events, such as an access unit boundary, to be marked in the packet stream. The Payload Type (PT) field (206) indicates the type of media in use, e.g., video encoded according to ITU-T Recommendation H.264 using the RTP payload format of RFC 6184 with a certain set of parameters; the PT is in many cases selected/negotiated by a call control protocol. The RTP sequence number (207) is incremented by one for each RTP packet sent, until it wraps around. The RTP timestamp (208) indicates the instant at which the first sample represented in the packet was sampled (the acquisition time) and is typically used as the presentation time. At least some video codecs use a 90 kHz timestamp clock, while for many audio codecs the timestamp clock equals the sampling rate, e.g., 8 kHz, 44.1 kHz, or 48 kHz. The synchronization source (209) and contributing sources (210) are described below only to the extent necessary.
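The fixed-header layout above can be sketched as a small packing helper for the 12-byte RTP header of RFC 3550; the function name and default values below are illustrative, not part of the disclosure:

```python
import struct

def pack_rtp_header(pt, seq, timestamp, ssrc, marker=False,
                    version=2, padding=False, extension=False, csrc_list=()):
    """Pack the 12-byte fixed RTP header of RFC 3550; any CSRC
    identifiers follow the fixed part (illustrative sketch)."""
    byte0 = (version << 6) | (int(padding) << 5) | (int(extension) << 4) | len(csrc_list)
    byte1 = (int(marker) << 7) | (pt & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    for csrc in csrc_list:
        header += struct.pack("!I", csrc)
    return header

hdr = pack_rtp_header(pt=96, seq=1, timestamp=90000, ssrc=0x1234, marker=True)
assert len(hdr) == 12
assert hdr[0] == 0x80          # V=2, P=0, X=0, CC=0
assert hdr[1] == 0x80 | 96     # M=1, PT=96
```

With the 90 kHz timestamp clock used by most video payload formats, consecutive pictures at 30 fps differ by 3000 timestamp ticks.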
RTP follows the general approach of application layer framing, so adaptation to certain payloads, such as coded video formats specified according to certain video coding standards, can be specified by a secondary specification outside the main RTP specification, referred to as the RTP payload format. Some RTP payload formats reuse the bits of a Network Abstraction Header (Network Abstraction Header) as their payload Header, which is present in some video coding standards, such as h.264 or h.265. In such RTP payload formats and video coding standards, a Network Abstraction Layer Unit (NAL Unit or NALU) may be a finite-sized bit string covering a coded picture or a well-defined part thereof, such as a slice (slice), a tile (tile), a GOB, etc.
The bit string may comprise, at its beginning, a relatively short data structure, e.g., 8 or 16 bits in length, which contains minimal information about the type of the bit string and, in some scenarios, layering information.
As described above, the RTP header may include a so-called "marker" bit (M bit) (205). Conventionally, in almost all RTP payload formats that identify the AU concept, the M bit is specified to be equal to one for the RTP packet carrying the last bit string of an AU, and set to zero otherwise. When a receiver receives an RTP packet with the M bit set, it knows that this is the last packet of the AU and can process the RTP packet accordingly; some details of this processing can be found in the RTP specification. Referring again to fig. 1, if the sending endpoint (11) retrieves the video bitstream it sends from the storage device/hard disk drive (16), the file may not include easily accessible meta-information about access unit or coded picture boundaries, for example because the bitstream may be stored in a format commonly referred to as an "Annex B bitstream". In this scenario, there may be no Application Program Interface (API) information available from an encoder to the RTP encapsulator signaling that a bit string of the bitstream is the final bit string of an AU or of a coded picture. Instead, the RTP encapsulator may have to identify the bit string that includes the end of the AU or coded picture without the side information that is typically available to an encoder.
In an embodiment, the transport layer may use RTP packets to communicate media data including video and audio. Referring to fig. 3, each RTP packet starts with an RTP header. The RTP header fields have been described above. In the same or another embodiment, these RTP header fields may be set according to RFC3550 and the applicable RTP payload specification.
In the same or another embodiment, the RTP packet may further include an RTP payload header (302). For example, the RTP payload header format may be specified in an RTP payload specification applicable to a given payload. The given payload may be, for example, video encoded according to the VVC specification (also known as ITU-T rec.h.266). The purpose of the RTP payload header may include, for example:
a) control information relating to the payload, useful for decapsulators (depacketizers), jitter buffer management, and so on, is provided to the extent that it is not available in the RTP header (301) and/or not easily retrievable from the payload (303) itself. For example, the payload (303) may be coded using complex variable-length codes, arithmetic coding, etc., which may be adequate for decoding purposes but too heavyweight for a decapsulator located in a MANE;
b) additional functionality is provided. Examples include fragmentation of video units (e.g., coded pictures, coded slices, NAL units, etc.); aggregation of video units; and redundant copies of some syntax elements to enable easy access and/or redundancy in case of packet loss.
The RTP payload header (302) may be followed by an RTP payload (303). The RTP payload may be encoded according to a media codec specification, such as an audio codec or video codec specification, and may include, for example, at least one compressed or uncompressed audio sample, a compressed or uncompressed picture or portion thereof, and/or the like.
The embodiments hereafter relate to video encoded according to the VVC specification and a corresponding RTP payload format.
VVC uses a NAL-unit-based video bitstream structure. A NAL unit may be a bit string representing control data (a non-VCL NAL unit) or coded video bits of compressed video data (a VCL NAL unit) for a picture, slice, tile, or similar structure. According to certain RTP payload formats, one RTP packet may carry in its payload (303) a single NAL unit (in which case the NAL unit header also acts as the RTP payload format header), multiple NAL units (an aggregation packet, with its own NAL-unit-like structure as the RTP payload header, followed by two or more NAL units), or a fragment of a NAL unit (in which case the RTP payload header carries control information for fragmentation and is followed by the fragment of the NAL unit).
Regardless of how many NAL units (or fragments thereof) an RTP packet carries, it is advantageous for the decapsulator to be able to identify the last packet of a given coded picture. In some non-layered environments, this may be achieved by the marker (M) bit of the RTP header (205), in particular according to certain RTP profiles and RTP payload formats.
In the same or another embodiment, when the marker bit is set equal to 1, it indicates that the current packet may be the last packet of an access unit in the current RTP stream; when set equal to 0, it indicates that the current packet may not be the last packet of the access unit. Since in some non-layered environments AU boundaries coincide with coded picture boundaries, the marker bit may then indicate picture boundaries. However, in layered environments, and in some non-layered environments involving, for example, redundant pictures, the marker bit, which is set at AU boundaries, cannot also indicate coded picture boundaries, since there may be more picture boundaries than AU boundaries.
Referring to fig. 4, in the same or another embodiment, the VVC NAL unit header may include two bytes (16 bits). The forbidden_zero bit F (401) is always zero. Five bits indicate the NAL unit Type (404), meaning that up to 32 types of NAL units or NAL-unit-like structures may be present; VCL NAL unit types range between zero and 12, and non-VCL NAL unit types range between 13 and 31. The Z bit (402), LayerID (403), and Temporal ID (405) are used to manage the spatial/SNR and temporal layering, respectively, and are not described in detail herein.
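A minimal parser for this two-byte layout, following the field order of Fig. 4 (the helper name is hypothetical):

```python
def parse_vvc_nuh(b0: int, b1: int):
    """Split the two-byte VVC NAL unit header into its fields:
    F (forbidden_zero), Z (reserved), 6-bit LayerID, 5-bit Type,
    and 3-bit Temporal ID (illustrative sketch)."""
    f = b0 >> 7
    z = (b0 >> 6) & 1
    layer_id = b0 & 0x3F
    nal_type = (b1 >> 3) & 0x1F
    tid = b1 & 0x07
    return f, z, layer_id, nal_type, tid

# A VCL NAL unit: layer 0, type 1, temporal id 1
f, z, lid, t, tid = parse_vvc_nuh(0x00, (1 << 3) | 1)
assert (f, z, lid, t, tid) == (0, 0, 0, 1, 1)
assert t <= 12          # VCL NAL unit types range from 0 to 12
```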
In the VVC RTP payload format, three different types of RTP packet payload structures are indicated. The receiver may identify the type of RTP packet payload by a type field in the payload header. A single NAL unit packet contains a single NAL unit in the payload, and the NAL unit header of the NAL unit also serves as the payload header. Aggregation Packets (AP) contain more than one NAL unit within one access unit and are not described further herein. Fragmentation Packets contain Fragmentation Units (FUs), which in turn contain a subset of a single NAL Unit.
A Fragmentation Unit (FU) enables the fragmentation of a single NAL unit into multiple RTP packets. A fragment of a NAL unit consists of an integer number of consecutive octets of the NAL unit. The fragments of the same NAL unit may be sent in consecutive order with increasing RTP sequence numbers. When a NAL unit is fragmented and transported within FUs, it is referred to as a fragmented NAL unit.
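The rule above, an integer number of consecutive octets per fragment, can be sketched as follows; the helper and the fragment-size parameter are illustrative assumptions:

```python
def fragment_nal_unit(nal_unit: bytes, max_fragment_size: int):
    """Split a NAL unit into fragments of consecutive octets; each
    fragment would then travel in its own FU packet, sent in order
    with increasing RTP sequence numbers (illustrative sketch)."""
    if max_fragment_size <= 0:
        raise ValueError("fragment size must be positive")
    return [nal_unit[i:i + max_fragment_size]
            for i in range(0, len(nal_unit), max_fragment_size)]

frags = fragment_nal_unit(b"\x00" * 10, 4)
assert len(frags) == 3                      # 4 + 4 + 2 octets
assert b"".join(frags) == b"\x00" * 10      # concatenation restores the NAL unit
```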
Referring to fig. 5, in the same or another embodiment, an FU packet may include a NAL-unit-like payload header (501) indicating that the packet is a fragmentation packet, followed by an FU header (502), conditionally a Decoding Order Number (DONL) field encoded in network byte order (504), the FU payload (505), and optional RTP padding (506).
In the same or another embodiment, referring to fig. 6, the NAL unit type of the NAL unit whose fragment is carried in the FU is signaled in the 5-bit FuType field (604). The FU header may further include an S bit, an E bit, and an R bit. The S bit (601) is set for the first fragment of a NAL unit and cleared otherwise; the E bit (602) is set for the last fragment of a NAL unit and cleared otherwise.
In the same or another embodiment, the R bit (603) may be reserved for subsequent use; the R bit (603) is set to, for example, 0 by the encapsulator and ignored (603) by the decapsulator.
In the same or another embodiment, the R bit (603) may indicate the first fragment of the first NAL unit in decoding order of the coded picture. This bit may be set to 1 if the fragment is the first fragment of the first NAL unit in decoding order of the coded picture, and to 0 otherwise. The RTP payload specification may also reverse these semantics, where the bit may be set to 0 if the fragment is the first fragment of the first NAL unit in decoding order of the coded picture, and to 1 otherwise.
In the same or another embodiment, the R bit (603) may indicate the last fragment of the last NAL unit in decoding order of the coded picture. This bit may be set to 1 if the fragment is the last fragment of the last NAL unit in decoding order of the coded picture, and to 0 otherwise. The RTP payload specification may also reverse these semantics, where the bit may be set to 0 if the fragment is the last fragment of the last NAL unit in decoding order of the coded picture, and to 1 otherwise.
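Under the second alternative above (the R bit marking the last fragment of the picture's last NAL unit), the one-byte FU header of Fig. 6 could be assembled as follows; this is a sketch, not the normative format definition:

```python
def make_fu_header(s: int, e: int, r: int, fu_type: int) -> int:
    """Build the FU header byte: S (first fragment of the NAL unit),
    E (last fragment of the NAL unit), R (here: last fragment of the
    picture's last NAL unit), and the 5-bit FuType (sketch)."""
    assert 0 <= fu_type < 32
    return (int(bool(s)) << 7) | (int(bool(e)) << 6) | (int(bool(r)) << 5) | fu_type

# First fragment of a NAL unit of type 1: S=1, E=0, R=0
assert make_fu_header(1, 0, 0, 1) == 0x81
# Last fragment of the picture's last NAL unit: S=0, E=1, R=1
assert make_fu_header(0, 1, 1, 1) == 0x61
```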
In the same or another embodiment, if a NAL unit is the last NAL unit of a bitstream, it may be determined that the NAL unit is the last NAL unit of a picture. A NAL unit naluX may also be determined to be the last NAL unit of a picture if, for the next VCL NAL unit naluY in decoding order, one of the following conditions is true: 1) naluY has nal_unit_type equal to 19 (i.e., PH_NUT), or 2) the high-order bit of the first byte following its NAL unit header (i.e., picture_header_in_slice_header_flag) is equal to 1.
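These conditions can be sketched directly, assuming the two-byte NAL unit header layout of Fig. 4 and that bitstream framing has already been resolved; the function and constant names are illustrative:

```python
from typing import Optional

PH_NUT = 19  # nal_unit_type of a picture header NAL unit

def is_last_nalu_of_picture(nalu_x: bytes,
                            next_vcl_nalu: Optional[bytes]) -> bool:
    """naluX ends a picture if it is the bitstream's last NAL unit, if
    the next VCL NAL unit is a picture header (nal_unit_type == PH_NUT),
    or if the next VCL NAL unit carries the picture header in its slice
    header (high-order bit of the first byte after the two-byte NAL unit
    header equal to 1). Illustrative sketch."""
    if next_vcl_nalu is None:
        return True                               # last NAL unit of the bitstream
    nal_type = (next_vcl_nalu[1] >> 3) & 0x1F     # 5-bit type (Fig. 4)
    if nal_type == PH_NUT:
        return True
    return (next_vcl_nalu[2] >> 7) == 1           # picture_header_in_slice_header_flag

assert is_last_nalu_of_picture(b"..", None)
assert is_last_nalu_of_picture(b"..", bytes([0x00, PH_NUT << 3, 0x00]))
assert is_last_nalu_of_picture(b"..", bytes([0x00, 1 << 3, 0x80]))
assert not is_last_nalu_of_picture(b"..", bytes([0x00, 1 << 3, 0x00]))
```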
In the same or another embodiment, syntax elements or bits similar to the R bits may not be placed in the FU header, but in another appropriate syntax structure of the RTP payload header; e.g., in the payload header itself, the aggregate packet header and the aggregate unit header, etc.
Referring to fig. 7, in the same or another embodiment, a NAL unit (713) is shown that has been split into two RTP packets to illustrate the use of FUs. When transmitted over an IP network using RTP, the fragments of the same NAL unit may be transmitted in consecutive order with increasing RTP sequence numbers.
NAL unit (713) may be split into two fragments, each carried in its own RTP packet; it may equally be divided into more than two packets.
For example, the NAL unit (713) may contain n bits and be divided into two fragments, carried as a first FU payload of k bits (710) and a second FU payload of n-k bits (712). Each of the two FU payloads follows its respective FU header; e.g., FU payload (710) follows FU header (709), and FU payload (712) follows FU header (711).
In an embodiment, within the first FU header (709), the S bit (701) may be set and the E bit (702) may be cleared to indicate that this is the first fragment of the NAL unit. The Type field (704) is set to the type of the NAL unit. The R bit (703) may be set as described in one of the alternatives above. For example, if NAL unit (713) is the first NAL unit of a picture, the R bit (703) may be set to indicate that the fragment included in FU payload (710) is the first fragment of the first NAL unit of the picture.
In the second FU header (711), the S bit (705) is cleared and the E bit (706) is set to indicate that this is the last fragment of the NAL unit. The Type field (708) is set to the type of the NAL unit. The R bit (707) is set as described in one of the alternatives above. For example, if NAL unit (713) is the last NAL unit of a picture, the R bit (707) may be set to indicate that the fragment included in FU payload (712) is the last fragment of the last NAL unit of the picture.
In an embodiment, a method of encapsulating a NAL unit into a plurality of RTP packets, performed by an encapsulator according to at least one RTP payload specification, may comprise: partitioning the NAL unit into a plurality of fragments; and encapsulating each fragment into an RTP packet that includes an FU header, the FU header including an R bit. In an embodiment, the R bit may be set by the encapsulator if the NAL unit is the last NAL unit of a coded picture, and cleared otherwise.
In an embodiment, a method of depacketizing a NAL unit from a plurality of RTP packets, performed by a decapsulator according to at least one RTP payload specification, may comprise: depacketizing each fragment from an RTP packet that includes an FU header, the FU header including an R bit; and assembling the plurality of fragments into the NAL unit for decoding. In an embodiment, the R bit observed by the decapsulator may be equal to one if the NAL unit is the last NAL unit of a coded picture, and equal to zero otherwise.
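The decapsulator-side steps can be sketched as below. The packet representation (an FU-header byte plus payload per packet) is a simplification that ignores RTP sequence-number handling and the reconstruction of the NAL unit header from FuType:

```python
def reassemble_nal_unit(fu_packets):
    """Reassemble a fragmented NAL unit payload from its FU packets,
    given as (fu_header_byte, payload) pairs in arrival order, and
    report whether the R bit of the last fragment was set, i.e. whether
    this was the picture's last NAL unit (illustrative sketch)."""
    first_hdr, _ = fu_packets[0]
    last_hdr, _ = fu_packets[-1]
    assert first_hdr & 0x80, "S bit must be set on the first fragment"
    assert last_hdr & 0x40, "E bit must be set on the last fragment"
    nal_unit = b"".join(payload for _, payload in fu_packets)
    last_of_picture = bool(last_hdr & 0x20)   # R bit of the last FU header
    return nal_unit, last_of_picture

nalu, last = reassemble_nal_unit([(0x81, b"ab"), (0x61, b"cd")])
assert nalu == b"abcd" and last               # E and R set on the final fragment
```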
Fig. 8 is a flow diagram of an example method 800 for encapsulating multiple NAL units of a picture using at least one processor. In some embodiments, at least one of the method blocks in fig. 8 may be performed by an encapsulator or decapsulator, for example, as discussed above. In some embodiments, at least one of the method blocks in FIG. 8 may be performed by another device or group of devices, such as the endpoints and MANEs discussed above.
As shown in fig. 8, method 800 may include obtaining a plurality of NAL units including a first NAL unit of a picture and a last NAL unit of the picture (block 810).
As further shown in fig. 8, method 800 may include partitioning the first NAL unit of the picture into a plurality of first fragments and partitioning the last NAL unit of the picture into a plurality of last fragments (block 820).
As further shown in fig. 8, method 800 may include encapsulating the plurality of first fragments into a plurality of first Fragmentation Unit (FU) packets and encapsulating the plurality of last fragments into a plurality of last FU packets (block 830). In an embodiment, a last FU packet of the plurality of last FU packets may include a last FU header including a last R bit, and the last R bit may be set, e.g., to 1.
As further shown in fig. 8, the method 800 may include transmitting a plurality of first FU packets and a plurality of last FU packets (block 840).
In an embodiment, the plurality of first FU packets and the plurality of last FU packets may be Real-time Transport Protocol (RTP) packets.
In an embodiment, a first FU packet of the plurality of first FU packets may include a first FU header including a first R bit, and the first R bit may be set to 0.
In an embodiment, a first FU packet of the plurality of first FU packets may include a first FU header including a first S bit, and a last FU header may include a last S bit.
In an embodiment, the first S bit may be set to 1 and the last S bit may be set to 0.
In an embodiment, the plurality of NAL units may include an intermediate NAL unit between the first NAL unit and the last NAL unit, the intermediate NAL unit may be partitioned into a plurality of intermediate fragments, and the plurality of intermediate fragments may be encapsulated into a plurality of intermediate FU packets.
In an embodiment, the first FU packet of the plurality of first FU packets may include a first FU header including a first E bit, the last FU packet of the plurality of intermediate FU packets may include an intermediate FU header including an intermediate E bit, and the last FU header may include a last E bit.
In an embodiment, the first E bit may be set to 0, the intermediate E bit may be set to 1, and the last E bit may be set to 0.
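Since the S, E, and R flags described above are single bits carried in an FU header, their packing can be illustrated as follows. This is a hypothetical sketch: the one-byte layout and the 5-bit FU type field are assumptions for exposition, not a layout defined by this disclosure or by any published RTP payload format.

```python
# Hypothetical one-byte FU header: S | E | R | FuType (5 bits).
# Bit positions are illustrative assumptions only.

def pack_fu_header(fu_type: int, s: int, e: int, r: int) -> int:
    """Pack the S, E, and R flags and a 5-bit FU type into one byte."""
    if not (0 <= fu_type < 32 and s in (0, 1) and e in (0, 1) and r in (0, 1)):
        raise ValueError("field out of range")
    return (s << 7) | (e << 6) | (r << 5) | fu_type


def unpack_fu_header(header: int):
    """Recover (s, e, r, fu_type) from a packed header byte."""
    return (header >> 7) & 1, (header >> 6) & 1, (header >> 5) & 1, header & 0x1F
```

For example, under this assumed layout the last FU packet of the picture's last NAL unit (S=0, E=0, R=1, as in the embodiments above) packs into a single byte that a receiver can test with one mask.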
In the above technical solutions of the embodiments of the present disclosure, a plurality of Network Abstraction Layer (NAL) units of a picture are encapsulated, and a flag bit is set in the header of the last FU packet of the plurality of last FU packets of the last NAL unit of the picture. The boundary of the picture can thus be identified conveniently, which in turn allows the picture to be accessed efficiently and independently.
Although fig. 8 shows example blocks of the method 800, in some embodiments, the method 800 may include additional blocks, fewer blocks, different blocks, or a different arrangement of blocks than those depicted in fig. 8. Additionally or alternatively, two or more of the blocks of method 800 may be performed in parallel.
Further, the proposed methods may be implemented by processing circuitry (e.g., at least one processor or at least one integrated circuit). In one example, the at least one processor executes a program stored in a non-transitory computer-readable medium to perform at least one of the proposed methods.
Corresponding to the method 800, an embodiment of the present disclosure further provides an apparatus for encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, the apparatus comprising:
an acquisition module configured to acquire the plurality of NAL units, the plurality of NAL units comprising a first NAL unit of the picture and a last NAL unit of the picture;
a partitioning module configured to partition the first NAL unit of the picture into a plurality of first segments and to partition the last NAL unit of the picture into a plurality of last segments;
an encapsulating module configured to encapsulate the plurality of first segments into a plurality of first Fragment Unit (FU) packets and to encapsulate the plurality of last segments into a plurality of last FU packets; and
a transmitting module configured to transmit the plurality of first FU packets and the plurality of last FU packets,
wherein a last FU packet of the plurality of last FU packets comprises a last FU header, the last FU header comprises a last R bit, and the last R bit is set to 1.
In some embodiments, the plurality of first FU packets and the plurality of last FU packets comprise Real-time Transport Protocol (RTP) packets.
In some embodiments, a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprises a first R bit, and the first R bit is set to 0.
In some embodiments, a first FU packet of said plurality of first FU packets comprises a first FU header, said first FU header comprising a first S bit, wherein said last FU header comprises a last S bit.
In some embodiments, the first S bit is set to 1 and the last S bit is set to 0.
In some embodiments, the plurality of NAL units includes an intermediate NAL unit between the first NAL unit and the last NAL unit, wherein the intermediate NAL unit is partitioned into a plurality of intermediate segments, wherein the plurality of intermediate segments are encapsulated into a plurality of intermediate FU packets.
In some embodiments, a first FU packet of said plurality of first FU packets comprises a first FU header, said first FU header comprising a first E bit, wherein a last FU packet of said plurality of intermediate FU packets comprises an intermediate FU header, said intermediate FU header comprising an intermediate E bit, wherein said last FU header comprises a last E bit.
In some embodiments, the first E bit is set to 0, wherein the intermediate E bit is set to 1, and wherein the last E bit is set to 0.
The above-described techniques for signaling and identifying picture boundaries in a video payload format over an IP network may be implemented as computer software using computer-readable instructions and physically stored in one or more computer-readable media. For example, FIG. 9 illustrates a computer system (900) suitable for implementing certain embodiments of the disclosed subject matter.
The computer software may be coded in any suitable machine code or computer language and may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that can be executed directly by one or more computer Central Processing Units (CPUs), Graphics Processing Units (GPUs), etc., or through interpretation, micro-code execution, and the like.
The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablets, servers, smartphones, gaming devices, internet of things devices, and so forth.
The components illustrated in FIG. 9 for the computer system (900) are exemplary in nature and are not intended to limit the scope of use or functionality of the computer software implementing embodiments of the present application in any way. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of the computer system (900).
The computer system (900) may include certain human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not shown). The human interface devices may also be used to capture certain media not necessarily directly related to conscious input by a human, such as audio (e.g., speech, music, ambient sound), images (e.g., scanned images, photographic images obtained from a still-image camera), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
The human interface input device may include one or more of the following (only one of which is depicted): keyboard (901), mouse (902), touch pad (903), touch screen (910), data glove (not shown), joystick (905), microphone (906), scanner (907), camera (908).
The computer system (900) may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and olfactory/gustatory output. Such human interface output devices may include tactile output devices (e.g., tactile feedback through the touch screen (910), data glove (904), or joystick (905), although tactile feedback devices that do not serve as input devices may also exist), audio output devices (e.g., speakers (909), headphones (not shown)), visual output devices (e.g., screens (910), including cathode ray tube screens, liquid crystal screens, plasma screens, and organic light-emitting diode screens, each with or without touch screen input capability and each with or without tactile feedback capability, some of which may be capable of outputting two-dimensional visual output or output of more than three dimensions by means such as stereoscopic output; virtual reality glasses (not shown), holographic displays, and smoke boxes (not shown)), and printers (not shown).
The computer system (900) may also include human-accessible storage devices and their associated media, such as optical media including a CD/DVD ROM/RW drive (920) with CD/DVD or similar media (921), a thumb drive (922), a removable hard drive or solid-state drive (923), legacy magnetic media such as magnetic tape and floppy disk (not shown), specialized ROM/ASIC/PLD-based devices such as security dongles (not shown), and the like.
Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.
The computer system (900) may also include an interface to one or more communication networks. The networks may be, for example, wireless, wired, or optical. The networks may further be local-area, wide-area, metropolitan, vehicular, industrial, real-time, delay-tolerant, and so on. Examples of networks include local-area networks such as Ethernet and wireless LANs, cellular networks (including GSM, 3G, 4G, 5G, LTE, and the like), TV wired or wireless wide-area digital networks (including cable TV, satellite TV, and terrestrial broadcast TV), vehicular and industrial networks (including CANBus), and so forth. Certain networks commonly require external network interface adapters attached to certain general-purpose data ports or peripheral buses (949) (e.g., USB ports of the computer system (900)); others are commonly integrated into the core of the computer system (900) by attachment to a system bus as described below (e.g., an Ethernet interface into a PC computer system, or a cellular network interface into a smartphone computer system). Using any of these networks, the computer system (900) may communicate with other entities. Such communication may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., CANBus to certain CANBus devices), or bidirectional, for example to other computer systems over local or wide-area digital networks. Certain protocols and protocol stacks may be used on each of the networks and network interfaces described above.
The human interface device, human accessible storage device, and network interface described above may be connected to the core (940) of the computer system (900).
The core (940) may include one or more Central Processing Units (CPUs) (941), Graphics Processing Units (GPUs) (942), specialized programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) (943), hardware accelerators (944) for certain tasks, and so forth. These devices, along with Read-Only Memory (ROM) (945), Random Access Memory (RAM) (946), and internal mass storage (947) such as internal non-user-accessible hard drives and solid-state drives, may be connected through a system bus (948). In some computer systems, the system bus (948) may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. The peripheral devices may be attached either directly to the core's system bus (948) or through a peripheral bus (949). Architectures for a peripheral bus include Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), and the like.
The CPU (941), GPU (942), FPGA (943), and accelerator (944) may execute certain instructions that, in combination, may constitute the aforementioned computer code. The computer code may be stored in the ROM (945) or RAM (946). Transitional data may also be stored in the RAM (946), whereas permanent data may be stored, for example, in the internal mass storage (947). Fast storage and retrieval from any of the memory devices may be enabled through the use of cache memory, which may be closely associated with one or more CPUs (941), GPUs (942), mass storage (947), ROM (945), RAM (946), and the like.
The computer-readable medium may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present application, or they may be of the kind well known and available to those having skill in the computer software arts.
By way of example, and not limitation, a computer system having the architecture (900), and specifically the core (940), may provide functionality as a result of one or more processors (including CPUs, GPUs, FPGAs, accelerators, and the like) executing software embodied in one or more tangible computer-readable media. Such computer-readable media may be media associated with the user-accessible mass storage introduced above, as well as certain storage of the core (940) of a non-transitory nature, such as the core-internal mass storage (947) or ROM (945). Software implementing various embodiments of the present application may be stored in such devices and executed by the core (940). The computer-readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core (940), and specifically the processors therein (including CPUs, GPUs, FPGAs, and the like), to perform particular processes or particular portions of particular processes described herein, including defining data structures stored in the RAM (946) and modifying such data structures according to software-defined processes. Additionally or alternatively, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in circuitry (e.g., the accelerator (944)), which may operate in place of or together with software to perform particular processes or particular portions of particular processes described herein. Reference to software may encompass logic, and vice versa, where appropriate. Reference to a computer-readable medium may encompass circuitry storing software for execution (e.g., an Integrated Circuit (IC)), circuitry embodying logic for execution, or both, where appropriate. The present application encompasses any suitable combination of hardware and software.
While the application has described several exemplary embodiments, various modifications, arrangements, and equivalents of the embodiments are within the scope of the application. It will thus be appreciated that those skilled in the art will be able to devise various systems and methods which, although not explicitly shown or described herein, embody the principles of the application and are thus within its spirit and scope.

Claims (15)

1. A method of encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, the method comprising:
obtaining the plurality of NAL units, the plurality of NAL units comprising a first NAL unit of the picture and a last NAL unit of the picture;
partitioning the first NAL unit of the picture into a plurality of first segments and partitioning the last NAL unit of the picture into a plurality of last segments;
encapsulating the plurality of first segments into a plurality of first Fragment Unit (FU) packets and encapsulating the plurality of last segments into a plurality of last FU packets; and
transmitting the plurality of first FU packets and the plurality of last FU packets,
wherein a last FU packet of the plurality of last FU packets comprises a last FU header, the last FU header comprising a last R bit, and
the last R bit is set to 1.
2. The method of claim 1, wherein said plurality of first FU packets and said plurality of last FU packets comprise real-time transport protocol (RTP) packets.
3. The method of claim 1, wherein a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprising a first R bit, and
wherein the first R bit is set to 0.
4. The method of claim 1, wherein a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprising a first S bit,
wherein the last FU header comprises a last S bit.
5. The method of claim 4, wherein the first S bit is set to 1 and the last S bit is set to 0.
6. The method of any of claims 1-5, wherein the plurality of NAL units includes an intermediate NAL unit between the first NAL unit and the last NAL unit,
wherein the intermediate NAL unit is partitioned into a plurality of intermediate segments,
wherein the plurality of intermediate segments are encapsulated into a plurality of intermediate FU packets.
7. The method of claim 6, wherein a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprising a first E bit,
wherein a last FU packet of the plurality of intermediate FU packets comprises an intermediate FU header, the intermediate FU header comprising an intermediate E bit,
wherein the last FU header includes a last E bit.
8. The method of claim 7, wherein the first E bit is set to 0,
wherein the intermediate E bit is set to 1, and
wherein the last E bit is set to 0.
9. An apparatus for encapsulating a plurality of Network Abstraction Layer (NAL) units of a picture, the apparatus comprising:
an acquisition module configured to acquire the plurality of NAL units, the plurality of NAL units comprising a first NAL unit of the picture and a last NAL unit of the picture;
a partitioning module configured to partition the first NAL unit of the picture into a plurality of first segments and to partition the last NAL unit of the picture into a plurality of last segments;
an encapsulating module configured to encapsulate the plurality of first segments into a plurality of first FU packets and to encapsulate the plurality of last segments into a plurality of last FU packets; and
a transmitting module configured to transmit the plurality of first FU packets and the plurality of last FU packets,
wherein a last FU packet of the plurality of last FU packets comprises a last FU header, the last FU header comprising a last R bit, and
the last R bit is set to 1.
10. The apparatus of claim 9, wherein said plurality of first FU packets and said plurality of last FU packets comprise real-time transport protocol (RTP) packets.
11. The apparatus of claim 9, wherein a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprising a first R bit, and
wherein the first R bit is set to 0.
12. The apparatus of claim 9, wherein a first FU packet of the plurality of first FU packets comprises a first FU header, the first FU header comprising a first S bit,
wherein the last FU header comprises a last S bit.
13. The apparatus of claim 9, wherein the plurality of NAL units includes an intermediate NAL unit between the first NAL unit and the last NAL unit,
wherein the intermediate NAL unit is partitioned into a plurality of intermediate segments,
wherein the plurality of intermediate segments are encapsulated into a plurality of intermediate FU packets.
14. A computer device comprising a processor and a memory; the memory stores a computer program which, when executed by the processor, causes the processor to perform the method of any one of claims 1 to 8.
15. A non-transitory computer-readable medium storing instructions, the instructions comprising: at least one instruction that, when executed by at least one processor of a device that encapsulates a plurality of Network Abstraction Layer (NAL) units of a picture, causes the at least one processor to perform the method of any one of claims 1-8.
CN202110234142.4A 2020-03-25 2021-03-03 Picture packaging method, device and storage medium Active CN113453006B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202062994563P 2020-03-25 2020-03-25
US62/994,563 2020-03-25
US17/077,546 2020-10-22
US17/077,546 US11539820B2 (en) 2020-03-25 2020-10-22 Signaling and identifying picture boundary in video payload format over IP network

Publications (2)

Publication Number Publication Date
CN113453006A true CN113453006A (en) 2021-09-28
CN113453006B CN113453006B (en) 2024-04-16


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101611612A (en) * 2007-02-23 2009-12-23 诺基亚公司 The back compatible characteristic of aggregated media data cell
US20150189331A1 (en) * 2013-12-27 2015-07-02 Electronics And Telecommunications Research Institute Image data communication method and image data communication device
CN105052151A (en) * 2013-03-29 2015-11-11 高通股份有限公司 Improved RTP payload format designs
JP2017225148A (en) * 2012-04-16 2017-12-21 エレクトロニクス アンド テレコミュニケーションズ リサーチ インスチチュートElectronics And Telecommunications Research Institute Video decoding method, video encoding method, and digital storage medium
CN109691103A (en) * 2016-07-14 2019-04-26 皇家Kpn公司 Video coding


Similar Documents

Publication Publication Date Title
US11581022B2 (en) Method and apparatus for storage and signaling of compressed point clouds
EP3780626A1 (en) Device and method for processing data in multimedia system
KR101972951B1 (en) Method of delivering media data based on packet with header minimizing delivery overhead
US11595670B2 (en) Method and apparatus for storage and signaling of sub-sample entry descriptions
EP2721814B1 (en) Method and apparatus for transmitting/receiving media contents in multimedia system
RU2639725C2 (en) Data send/receive method and device for multimedia transmitting system
JP7329066B2 (en) Method, apparatus and computer program for decoding video data
CN113287323A (en) Multi-decoder interface for streaming media data
EP3888375A1 (en) Method, device, and computer program for encapsulating media data into a media file
CN111034203A (en) Processing omnidirectional media with dynamic zone-by-zone encapsulation
WO2022262858A1 (en) Image transmission method, image display and processing device, and image transmission system
WO2023029858A1 (en) Encapsulation and decapsulation methods and apparatuses for point cloud media file, and storage medium
JP2022510325A (en) Methods, systems, and computer programs for decoding coded video streams
KR20190003729A (en) Method and apparatus for mpeg media transport integration in content distribution networks
JP2004519908A (en) Method and apparatus for encoding MPEG4 video data
US11792432B2 (en) Techniques for signaling and identifying access unit boundaries
CN113453006B (en) Picture packaging method, device and storage medium
CN113453006A (en) Picture packaging method, equipment and storage medium
US11539820B2 (en) Signaling and identifying picture boundary in video payload format over IP network
KR102046903B1 (en) Apparatus and Method for MMT Payload Header Structure
CN114616801A (en) Signaling bandwidth cap using combined index segment tracks in media streaming
KR102153554B1 (en) MMT apparatus and method for processing media data
CN112995680A (en) Method and apparatus for reconstructing a coded enhancement layer picture
US20240129537A1 (en) Method and apparatus for signaling cmaf switching sets in isobmff
US11553017B1 (en) Timed media HTTP request aggregation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40052324

Country of ref document: HK

GR01 Patent grant