MX2008009353A - Backward-compatible aggregation of pictures in scalable video coding - Google Patents

Backward-compatible aggregation of pictures in scalable video coding

Info

Publication number
MX2008009353A
Authority
MX
Mexico
Prior art keywords
unit
access unit
elementary data
data unit
layer
Prior art date
Application number
MX/A/2008/009353A
Other languages
Spanish (es)
Inventor
Hannuksela Miska
Wang Yekui
Original Assignee
Hannuksela Miska
Nokia Corporation
Nokia Inc
Wang Yekui
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hannuksela Miska, Nokia Corporation, Nokia Inc, Wang Yekui
Publication of MX2008009353A


Abstract

An indirect aggregator NAL unit for the SVC file format and RTP payload format for video coding. The indirect aggregator NAL unit of the present invention enables easy identification of scalability dependencies within a bit stream, thereby enabling fast and efficient stream manipulation. Furthermore, the indirect aggregator NAL unit of the present invention ensures that a base layer of the streams can still be processed with an H.264/AVC decoder, an AVC file format parser, and an H.264/AVC RTP payload parser.

Description

BACKWARD-COMPATIBLE AGGREGATION OF PICTURES IN SCALABLE VIDEO CODING

FIELD OF THE INVENTION

The present invention relates generally to video coding. More particularly, the present invention relates to the coding, storage and transport of scalable video.
BACKGROUND OF THE INVENTION

This section is intended to provide a background or context for the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.

Scalable Video Coding (SVC) provides scalable video bit streams. A scalable video bit stream contains a non-scalable base layer and one or more enhancement layers. An enhancement layer may improve the temporal resolution (i.e., the frame rate), the spatial resolution, or the quality of the video content represented by the lower layer or a part thereof. The scalable layers can be added to a single real-time transport protocol (RTP) stream or transported independently.

The concepts of a video coding layer (VCL) and a network abstraction layer (NAL) are inherited from advanced video coding (AVC). The VCL contains the signal processing functionality of the codec: mechanisms such as transform, quantization, motion-compensated prediction, loop filter, and inter-layer prediction. A coded picture of a base or enhancement layer consists of one or more slices. The NAL encapsulates each slice generated by the VCL into one or more NAL units. Each SVC layer is formed by NAL units, which represent the coded video bits of the layer. An RTP stream carrying only one layer would carry NAL units belonging to that layer only. An RTP stream carrying a complete scalable video bit stream would carry NAL units of a base layer and one or more enhancement layers. The SVC specification defines the decoding order of these NAL units.
The concept of scaling the quality of the visual content by omitting the transport and decoding of complete enhancement layers is denoted coarse-grained scalability (CGS). In some cases, the bit rate of a given enhancement layer can be reduced by truncating bits from individual NAL units. Truncation leads to a graceful degradation of the playback quality of the video. This concept is known as fine-grained scalability (FGS).

According to the H.264/AVC video coding standard, an access unit comprises one primary coded picture. In some systems, detection of access unit boundaries can be simplified by inserting an access unit delimiter NAL unit into the bit stream. In SVC, an access unit may comprise multiple primary coded pictures, but at most one picture for each unique combination of dependency_id, temporal_level, and quality_level.

Scalable video coding involves the coding of a "base layer" with a certain minimum quality, as well as the coding of enhancement information that increases the quality up to a maximum level. The base layer of SVC streams is typically compliant with advanced video coding (AVC). In other words, AVC decoders can decode the base layer of an SVC stream and ignore the SVC-specific data. This has been made possible by specifying that the coded slice NAL unit types that are specific to SVC were reserved for future use in AVC and must be skipped according to the AVC specification.

The identification of the pictures and their scalability characteristics within an SVC access unit is important for at least two purposes. First, this identification is important for the thinning of streams in the compressed domain in servers or gateways. Due to the requirement to handle large amounts of data, these elements have to identify removable pictures as quickly as possible.
Second, this identification is important for the playback of a stream at a desired quality and complexity. Receivers and players should be able to identify those pictures in a scalable stream that they are unable or unwilling to decode.

One known function of gateways, or RTP mixers (which can be multipoint conference control units, gateways between circuit-switched and packet-switched video telephony, push-to-talk over cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks, for example), is to control the bit rate of the forwarded stream according to the prevailing downlink network conditions. It is desirable to control the forwarded data rate without extensive processing of the incoming data, for example by simply dropping easily identified packets or packet parts. For layered coding, gateways must deliver complete pictures or sequences of pictures so as not to affect the decoding of the forwarded stream.

The interleaved packetization mode of the H.264/AVC RTP payload specification allows the encapsulation of virtually any NAL units of any access units in the same RTP payload (referred to as an aggregation packet). In particular, it is not required to encapsulate complete coded pictures in an RTP payload; rather, the NAL units of a coded picture can be split into multiple RTP packets. While this freedom of packet aggregation is welcome for many applications, it causes a number of complications in a gateway operation. First, given an aggregation packet, it is not known to which pictures its NAL units belong until the header of each NAL unit contained in the aggregation packet has been parsed. Therefore, when the interleaved packetization mode is applied to SVC, the layers to which the contained NAL units belong are not known before parsing the header of each NAL unit in the packet.
Consequently, a gateway has to parse each NAL unit header before deciding whether any, all or some of the NAL units of the packet are forwarded. Second, for some NAL units, such as Supplemental Enhancement Information (SEI) NAL units and parameter set NAL units, it is not possible to identify the access unit to which they belong before the video coding layer (VCL) NAL units of the same access unit are received. Therefore, the gateway may need to maintain a buffer and some other state information to resolve the mapping of non-VCL NAL units to their associated pictures.

In conventional video coding standards, a picture header is used to separate the coded pictures. However, in the H.264/AVC standard and in SVC, no picture header is included in the syntax. Additionally, although parsers may have the ability to parse the scalability information for each NAL unit in a stream, this requires a considerably larger amount of processing power, and some parsers may not have this ability.

In addition to the above, an aggregate NAL unit has been previously proposed in a draft of the SVC file format (MPEG document M7586). In that system, the aggregate NAL unit is a container that includes the associated NAL units in its payload. The aggregate NAL unit has a type that is not specified in the H.264/AVC and SVC specifications and must be ignored by H.264/AVC and SVC decoders. However, when a base layer picture according to the H.264/AVC standard is enclosed within an aggregate NAL unit, it is no longer decodable with an H.264/AVC decoder, nor is it parsable with an H.264/AVC RTP depacketizer or an AVC file format parser.
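To illustrate the gateway burden described above, the following is a minimal sketch, not part of the patent, of walking a single-time aggregation packet payload as defined in RFC 3984 (each aggregation unit is a 16-bit big-endian NALU size followed by the NALU), collecting the nal_unit_type of every contained NAL unit. The packet bytes in the example are fabricated.

```python
import struct

def contained_nal_types(payload):
    """payload: bytes of an RFC 3984 STAP-A payload, excluding the packet's
    own NAL unit header byte. Each aggregation unit is a 16-bit big-endian
    NALU size followed by the NALU itself."""
    types = []
    i = 0
    while i + 2 <= len(payload):
        (size,) = struct.unpack_from(">H", payload, i)  # aggregation unit size
        i += 2
        nal_header = payload[i]
        types.append(nal_header & 0x1F)  # low 5 bits: nal_unit_type
        i += size
    return types

# Two tiny fabricated NALUs with types 1 and 20.
pkt = b"\x00\x02" + bytes([0x01, 0xAA]) + b"\x00\x02" + bytes([0x14, 0xBB])
assert contained_nal_types(pkt) == [1, 20]
```

Even in this simplified form, every contained NAL unit header must be visited before a forwarding decision can be made, which is exactly the overhead the indirect aggregator of the invention avoids.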
BRIEF DESCRIPTION OF THE INVENTION

The present invention provides an indirect aggregate NAL unit for the SVC file format and the SVC RTP payload format. The indirect aggregate NAL unit of the present invention enables easy identification of the scalability dependencies within the bit stream, thereby making fast and efficient stream manipulation possible. In addition, the indirect aggregate NAL unit of the present invention ensures that the base layer of the streams can still be processed with an H.264/AVC decoder, an AVC file format parser, and an H.264/AVC RTP payload parser.

These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying figures, wherein like elements have like numerals throughout the several figures described below.
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a schematic representation of the circuitry included in an electronic device that is capable of serving as an encoder or decoder to implement the functionality of the present invention;

Figure 2 shows a generic multimedia communication system for use with the present invention; and

Figure 3 shows an IP multicast arrangement where each router can prune the bit stream according to its capabilities.
DETAILED DESCRIPTION OF THE INVENTION

The present invention provides an indirect aggregate NAL unit, or more generally a scalability information elementary data unit, for use in scalable video coding. The indirect aggregate NAL unit does not contain other NAL units. Rather, the indirect aggregate NAL unit of the present invention contains mechanisms for associating itself with other NAL units. These mechanisms include, but are not limited to, the number of subsequent bytes, the number of subsequent NAL units, and the number of remaining NAL units within the enclosing higher-level structure. For example, the remaining NAL units within the higher-level structure are those in the same RTP payload in which the indirect aggregate NAL unit also appears.

The structure of the indirect aggregate NAL unit of the present invention further comprises property or feature information that is common to all associated NAL units. Common property or feature information includes, but is not limited to, scalability information, and whether or not the associated NAL units form a scalable layer switching point, at which a different scalable layer can switch to the current layer. The scalability information may include at least the "extended" NAL unit header specified in the SVC specification, which includes the syntax elements simple_priority_id, disposable_flag, dependency_id, temporal_level, and quality_level.

The indirect aggregate NAL unit of the present invention is selected from those NAL unit types that are specified as types to be ignored by processing units for the H.264/AVC base layer only. In other words, an H.264/AVC decoder, an AVC file format parser, and an H.264/AVC RTP depacketizer must ignore the indirect aggregate NAL unit of the present invention. In addition, the indirect aggregate NAL unit can be ignored by an SVC decoder, since it has no normative effect on the decoding process.
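The association mechanism just described can be sketched with a small, purely illustrative in-memory model (all names here are invented for illustration and do not appear in the patent): the aggregator carries the scalability properties common to a run of subsequent NAL units, rather than containing those units.

```python
from dataclasses import dataclass

@dataclass
class IndirectAggregator:
    # association mechanism: how many of the following NAL units are covered
    num_subsequent_units: int
    # property information common to the associated NAL units
    dependency_id: int
    temporal_level: int
    quality_level: int
    layer_switching_point: bool = False

def associate(units):
    """units: sequence mixing IndirectAggregator objects and NAL unit blobs,
    in payload order. Returns (aggregator_or_None, nal_unit) pairs."""
    out, current, remaining = [], None, 0
    for u in units:
        if isinstance(u, IndirectAggregator):
            current, remaining = u, u.num_subsequent_units
        else:
            out.append((current if remaining > 0 else None, u))
            remaining = max(0, remaining - 1)
    return out

agg = IndirectAggregator(2, dependency_id=1, temporal_level=0, quality_level=0)
pairs = associate([agg, b"nal1", b"nal2", b"nal3"])
assert [a is agg for a, _ in pairs] == [True, True, False]
```

Because the aggregator merely precedes its associated units, a base-layer unit is never wrapped and so remains directly accessible, mirroring the backward-compatibility property of the invention.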
An exemplary syntax and semantics of the aggregate NAL unit for the SVC file format, and another example for the SVC RTP payload format, are provided below. It should be noted that the present invention is not limited to these particular examples of encapsulation and coding formats.

In terms of the SVC file format, aggregate NAL units make it possible for NALU map group entries to be constant and repetitive. Aggregate NAL units are used to group the SVC NAL units that belong to the same sample and that have the same scalability information. Aggregate NAL units use the same header as the scalable extension NAL units (SVC NAL units) but a new NAL unit type. An aggregate NAL unit may comprise extractor NAL units, and an extractor NAL unit may refer to aggregate NAL units. When the stream is scanned, if an aggregate NALU is not needed (for example, it belongs to an unwanted layer), it and its contents are easily discarded (using its length field). If the aggregate NALU is needed, its header is easily discarded and its contents retained.

An aggregate NAL unit encapsulates two or more SVC NAL units in a new NAL unit. The aggregate NAL unit uses a NAL unit header with the same syntax as the SVC NAL units (as specified in the SVC specification). An aggregate NAL unit is stored within a sample like any other NAL unit. All NAL units remain in decoding order within an aggregate NAL unit. If NAL units belonging to the same quality_level are grouped, then the order of the NAL units with quality_level > 0 can change.

The syntax of the aggregate NAL unit is as follows.

    class aligned(8) NALAggregateUnit(NALAggregateUnitSize) {
        unsigned int i = 2;
        /* NALUnitHeader as specified in SVC spec */
        bit(1) forbidden_zero_bit;
        bit(2) nal_ref_idc;
        bit(5) nal_unit_type = NALAggregateUnitType = const(30);
        bit(6) simple_priority_id;
        bit(1) disposable_flag;
        bit(1) extension_flag;
        if (extension_flag) {
            bit(3) temporal_level;
            bit(3) dependency_id;
            bit(2) quality_level;
            i++;
        }
        /* end of NAL unit header */
        do {
            unsigned int((LengthSizeMinusOne + 1) * 8) NALUnitLength;
            bit(NALUnitLength * 8) SVCNALUnit;
            i += (LengthSizeMinusOne + 1) + NALUnitLength;
        } while (i < NALAggregateUnitSize);
    }

The semantics of an aggregate NAL unit are as follows.

NALUnitHeader: (8 or 16 bits) as specified in the SVC specification; nal_unit_type is set to the aggregate NAL unit type (type 30). The scalability information (nal_ref_idc, simple_priority_id, disposable_flag, extended scalability information) shall have the same values as within the header of each aggregated NAL unit.

NALUnitLength: specifies the size of the following NAL unit. The size of this field is specified by the LengthSizeMinusOne field.

SVCNALUnit: an SVC NAL unit as specified in the SVC specification, including the SVC NAL unit header. The size of the SVC NAL unit is specified by NALUnitLength.

It is assumed above that an aggregate NAL unit collects SVC NAL units of the same scalability layer. It could group SVC NAL units of different layers as well (for example, grouping all quality levels (FGS fragments), or grouping all NAL units with the same dependency_id). In this case, the header of the aggregate NAL unit could signal the scalability information of the SVC NAL units with the lowest dependency_id and/or temporal_level, quality_level. Aggregate NAL units can also be used to group the SVC NAL units that belong to a scalability level that cannot be signaled by the NAL unit header (for example, the SVC NAL units that belong to a region of interest). The description of such an aggregate NAL unit can be made with the layer description and the NAL unit map groups. In this case, more than one aggregate NAL unit with the same scalability information may appear in a sample.
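The aggregate NAL unit layout described above (a 2- or 3-byte header followed by length-prefixed SVC NAL units) can be parsed with a short sketch. This is illustrative only, assuming a 2-byte NALUnitLength field (i.e., LengthSizeMinusOne = 1) and the bit positions given in the syntax; the function and variable names are not from the patent.

```python
def parse_nal_aggregate_unit(data, length_size=2):
    """data: bytes of an aggregate NAL unit body, starting at its NAL unit
    header. Returns (header_bytes, [svc_nal_unit_bytes, ...])."""
    extension_flag = data[1] & 0x01          # last bit of the second header byte
    hdr_len = 3 if extension_flag else 2     # header is 2 or 3 bytes
    header = data[:hdr_len]
    units = []
    i = hdr_len
    while i < len(data):
        size = int.from_bytes(data[i:i + length_size], "big")  # NALUnitLength
        i += length_size
        units.append(data[i:i + size])       # SVCNALUnit, header included
        i += size
    return header, units

# Fabricated example: 2-byte header (type 30, no extension), two inner units.
data = bytes([0x1E, 0x00]) + b"\x00\x02\xAA\xBB" + b"\x00\x03\x01\x02\x03"
header, units = parse_nal_aggregate_unit(data)
assert header == bytes([0x1E, 0x00])
assert units == [b"\xAA\xBB", b"\x01\x02\x03"]
```

Note how discarding an unwanted aggregate is cheap: the caller can skip the whole byte range without looking inside, exactly as the file-format text above describes.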
Aggregate NAL units can lead to a constant number of NAL units for each layer in each access unit (AU). To ensure a constant pattern, the following can be done. The NALUs of the AVC base layer can be grouped into an aggregate NAL unit (if used in an SVC stream); in this case, temporal_level, dependency_id and quality_level are all set to 0. The NALUs of the AVC base layer can be referred to by an extractor NALU. If, for some reason, no NALU of a particular layer exists in an AU, then an empty aggregate NAL unit may exist in that position.
In terms of an RTP payload format for SVC video, a payload content scalability information (PACSI) NAL unit is generally as follows. An SVC NAL unit includes a header of one, two or three bytes and the payload byte string. The header indicates the type of the NAL unit, the (potential) presence of bit errors or syntax violations in the NAL unit payload, information regarding the relative importance of the NAL unit for the decoding process, and (optionally, when the header is three bytes) the scalable layer decoding dependency information. The NAL unit header co-serves as the payload header of this RTP payload format. The payload of a NAL unit follows immediately.

The syntax and semantics of the NAL unit header are specified in [SVC], but the essential properties of the NAL unit header are summarized below. The first byte of the NAL unit header has the following format:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |F|NRI|  Type   |
    +---------------+

forbidden_zero_bit (F): 1 bit. The H.264 specification declares a value of 1 as a syntax violation.

nal_ref_idc (NRI): 2 bits. A value of 00 indicates that the content of the NAL unit is not used to reconstruct reference pictures for inter-picture prediction. Such NAL units can be discarded without risking the integrity of the reference pictures in the same layer. Values greater than 00 indicate that the decoding of the NAL unit is required to maintain the integrity of the reference pictures. For a coded slice or slice data partition NAL unit, an NRI value of 11 indicates that the NAL unit contains data of a key picture, as specified in [SVC]. Informative note: the concept of a key picture has been introduced in SVC, and no assumption should be made that pictures in bit streams conforming to the 2003 and 2005 versions of H.264 follow this rule.

nal_unit_type (Type): 5 bits. This component specifies the payload type of the NAL unit.
Previously, NAL unit types 20 and 21 (among others) had been reserved for future extensions. SVC uses these two NAL unit types. They indicate the presence of one more byte that is helpful from a transport point of view:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    |PRID       |D|E|
    +---------------+

simple_priority_id (PRID): 6 bits. This component specifies a priority identifier for the NAL unit. When extension_flag is equal to 0, simple_priority_id is used to infer the values of dependency_id, temporal_level, and quality_level. When simple_priority_id is not present, it is inferred to be equal to 0.

disposable_flag (D): 1 bit. A value of 1 indicates that the content of the NAL unit is not used in the decoding process of NAL units with a greater value of dependency_id. Such NAL units can be discarded without risking the integrity of higher scalable layers with larger values of dependency_id. disposable_flag equal to 0 indicates that the decoding of the NAL unit is required to maintain the integrity of higher scalable layers with larger values of dependency_id.

extension_flag (E): 1 bit. A value of 1 indicates that the third byte of the NAL unit header is present. When the E bit of the second byte is 1, the NAL unit header is extended to a third byte:

    +---------------+
    |0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+
    | TL  | DID |QL |
    +---------------+

temporal_level (TL): 3 bits. This component is used to indicate temporal scalability or frame rate. A layer consisting of pictures with a smaller temporal_level value has a smaller frame rate.

dependency_id (DID): 3 bits. This component is used to indicate the inter-layer coding dependency hierarchy. At any temporal location, a picture with a smaller dependency_id value can be used for inter-layer prediction for coding a picture with a larger dependency_id value.

quality_level (QL): 2 bits. This component is used to indicate the FGS layer hierarchy.
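The 1-, 2- and 3-byte header layout summarized above can be decoded with a few shifts and masks. The sketch below is illustrative, not from the patent; in particular, treating type 30 (the aggregate/PACSI type used elsewhere in this description) like types 20 and 21 as carrying the extra bytes is an assumption made here for the example.

```python
def parse_svc_nal_header(data):
    """Decode a 1-, 2- or 3-byte SVC NAL unit header per the field layout
    above. Returns a dict; extended fields appear only when signalled."""
    b0 = data[0]
    fields = {
        "forbidden_zero_bit": b0 >> 7,       # F: 1 bit
        "nal_ref_idc": (b0 >> 5) & 0x3,      # NRI: 2 bits
        "nal_unit_type": b0 & 0x1F,          # Type: 5 bits
    }
    # Assumption: types 20, 21 and 30 carry the second (and maybe third) byte.
    if fields["nal_unit_type"] in (20, 21, 30):
        b1 = data[1]
        fields["simple_priority_id"] = b1 >> 2   # PRID: 6 bits
        fields["disposable_flag"] = (b1 >> 1) & 0x1
        fields["extension_flag"] = b1 & 0x1
        if fields["extension_flag"]:
            b2 = data[2]
            fields["temporal_level"] = b2 >> 5       # TL: 3 bits
            fields["dependency_id"] = (b2 >> 2) & 0x7  # DID: 3 bits
            fields["quality_level"] = b2 & 0x3         # QL: 2 bits
    return fields

# Fabricated 3-byte header: NRI=2, type 20, PRID=5, E=1, TL=1, DID=2, QL=3.
h = parse_svc_nal_header(bytes([0x54, 0x15, 0x2B]))
assert h["nal_unit_type"] == 20 and h["simple_priority_id"] == 5
assert (h["temporal_level"], h["dependency_id"], h["quality_level"]) == (1, 2, 3)
```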
At any temporal location, among pictures with an identical value of dependency_id, an FGS picture with a quality_level value equal to QL uses the FGS picture or the base quality picture (the non-FGS picture when QL - 1 = 0) with a quality_level value equal to QL - 1 for inter-layer prediction. When QL is greater than 0, the NAL unit contains an FGS slice or a part thereof.

In this mode, a new NAL unit type, named the payload content scalability information (PACSI) NAL unit, is specified. The PACSI NAL unit, if present, must be the first NAL unit in aggregation packets, and it must not be present in other types of packets. The PACSI NAL unit indicates the scalability characteristics that are common for all the remaining NAL units in the payload, thus making it easier for MANEs (media-aware network elements) to decide whether to forward or discard the packet. Senders can create PACSI NAL units, and receivers can ignore them.

The NAL unit type for the PACSI NAL unit is selected from those values that are not specified in the H.264/AVC specification and in RFC 3984. In this way, SVC streams that have an H.264/AVC base layer and that include PACSI NAL units can be processed by RFC 3984 receivers and H.264/AVC decoders.

When the first aggregation unit of an aggregation packet contains a PACSI NAL unit, there must be at least one additional aggregation unit present in the same packet. The RTP header fields are set according to the remaining NAL units in the aggregation packet. When a PACSI NAL unit is included in a multi-time aggregation packet, the decoding order number for the PACSI NAL unit must be set to indicate either that the PACSI NAL unit is the first NAL unit in decoding order among the NAL units in the aggregation packet, or that the PACSI NAL unit has a decoding order number identical to that of the first NAL unit, in decoding order, among the remaining NAL units in the aggregation packet.

The structure of the PACSI NAL unit is specified as follows.
     0                   1                   2
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|NRI|  Type   |PRID       |D|E| TL  | DID |QL |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The values of the fields in the PACSI NAL unit are set as follows:
- The F bit must be set to 1 if the F bit in any of the remaining NAL units in the payload is equal to 1. Otherwise, the F bit must be set to 0.
- The NRI field must be set to the highest value of the NRI field among the remaining NAL units in the payload.
- The Type field must be set to 30.
- The PRID field must be set to the lowest value of the PRID field among the remaining NAL units in the payload. If the PRID field is not present in one of the remaining NAL units in the payload, the PRID field in the PACSI NAL unit must be set to 0.
- The D bit must be set to 0 if the D bit in any of the remaining NAL units in the payload is equal to 0. Otherwise, the D bit must be set to 1.
- The E bit must be set to 1.
- The TL field must be set to the lowest value of the TL field among the remaining NAL units in the payload.
- The DID field must be set to the lowest value of the DID field among the remaining NAL units in the payload.
- The QL field must be set to the lowest value of the QL field among the remaining NAL units in the payload.

The indirect aggregate NAL unit of the present invention makes it possible to easily identify the scalability dependencies within the bit stream, thereby making stream manipulation fast and efficient. The indirect aggregate NAL unit ensures that the base layer of the streams can still be processed with an H.264/AVC decoder, an AVC file format parser, and an H.264/AVC RTP payload parser.

In the case of decoding, it should be noted that the bit stream to be decoded can be received from a remote device located within virtually any type of network. Additionally, the bit stream can be received from local hardware or software.
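The field-derivation rules just listed are mechanical and can be expressed directly. The sketch below is illustrative (function name and dict representation are invented); it computes the PACSI field values from the fields of the remaining NAL units in the payload, with a missing PRID treated as 0 as stated above.

```python
def pacsi_fields(units):
    """units: list of dicts with keys F, NRI, D, TL, DID, QL (PRID optional)
    for each NAL unit remaining in the payload. Returns the PACSI fields."""
    return {
        "F": 1 if any(u["F"] for u in units) else 0,      # OR of F bits
        "NRI": max(u["NRI"] for u in units),              # highest NRI
        "Type": 30,                                       # fixed PACSI type
        "PRID": min(u.get("PRID", 0) for u in units),     # lowest PRID, 0 if absent
        "D": 0 if any(u["D"] == 0 for u in units) else 1, # AND of D bits
        "E": 1,                                           # third byte always present
        "TL": min(u["TL"] for u in units),                # lowest temporal_level
        "DID": min(u["DID"] for u in units),              # lowest dependency_id
        "QL": min(u["QL"] for u in units),                # lowest quality_level
    }

u1 = {"F": 0, "NRI": 3, "PRID": 2, "D": 1, "TL": 0, "DID": 0, "QL": 0}
u2 = {"F": 1, "NRI": 2, "PRID": 5, "D": 0, "TL": 1, "DID": 1, "QL": 1}
p = pacsi_fields([u1, u2])
assert p == {"F": 1, "NRI": 3, "Type": 30, "PRID": 2, "D": 0,
             "E": 1, "TL": 0, "DID": 0, "QL": 0}
```

A MANE reading only these nine derived values can decide whether the whole aggregation packet is needed, without inspecting each contained NAL unit.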
It should also be understood that, although the text and examples contained herein may specifically describe an encoding process, a person skilled in the art would readily understand that the same concepts and principles also apply to the corresponding decoding process and vice versa.

Figure 1 shows a representative electronic device 12 within which the present invention can be implemented, on the encoding side and the decoding side. It should be understood, however, that the present invention is not intended to be limited to one particular type of electronic device 12. The electronic device 12 of Figure 1 includes a display 32, a keypad 34, a microphone 36, an earpiece 38, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to an embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56 and a memory 58. The individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.

Figure 2 shows a generic multimedia communication system for use with the present invention. A data source 100 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 110 encodes the source signal into a coded media bit stream. The encoder 110 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 110 may be required to encode different media types of the source signal. The encoder 110 may also get synthetically produced input, such as graphics and text, or it may be capable of producing coded bit streams of synthetic media. In the following, only the processing of one coded media bit stream of one media type is considered to simplify the description.
It should be noted, however, that typical real-time broadcast services comprise several streams (typically at least one audio, video and text subtitling stream). It should also be noted that the system may include many encoders, but in the following only one encoder 110 is considered to simplify the description without a lack of generality.

The coded media bit stream is transferred to a storage 120. The storage 120 may comprise any type of mass memory to store the coded media bit stream. The format of the coded media bit stream in the storage 120 may be an elementary self-contained bit stream format, or one or more coded media bit streams may be encapsulated into a container file. Some systems operate "live", i.e., omit storage and transfer the coded media bit stream from the encoder 110 directly to a sender 130. The coded media bit stream is then transferred to the sender 130, also referred to as the server, on a need basis. The format used in the transmission may be an elementary self-contained bit stream format, a packet stream format, or one or more coded media bit streams may be encapsulated into a container file. The encoder 110, the storage 120, and the server 130 may reside in the same physical device or they may be included in separate devices. The encoder 110 and the server 130 may operate with live real-time content, in which case the coded media bit stream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 110 and/or in the server 130 to smooth out variations in processing delay, transfer delay, and coded media bit rate.

The server 130 sends the coded media bit stream using a communication protocol stack.
The stack may include, but is not limited to, the Real-Time Transport Protocol (RTP), the User Datagram Protocol (UDP), and the Internet Protocol (IP). When the communication protocol stack is packet-oriented, the server 130 encapsulates the coded media bit stream into packets. For example, when RTP is used, the server 130 encapsulates the coded media bit stream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should again be noted that a system may contain more than one server 130, but for the sake of simplicity, the following description considers only one server 130.

The server 130 may or may not be connected to a gateway 140 through a communication network. The gateway 140 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of the data stream according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 140 include multipoint conference control units (MCUs), gateways between circuit-switched and packet-switched video telephony, push-to-talk over cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, or set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 140 is called an RTP mixer and acts as an endpoint of an RTP connection.

The system includes one or more receivers 150, typically capable of receiving, demodulating and de-encapsulating the transmitted signal into a coded media bit stream.
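The gateway's bit rate control can be reduced to a simple filter once the scalability information of each NAL unit is known (for example, via the indirect aggregator described earlier). The following is a deliberately simplified sketch, not from the patent: a real gateway would read dependency_id from the extended NAL unit header or a PACSI NAL unit rather than receive it pre-extracted.

```python
def thin_stream(nal_units, max_dependency_id):
    """nal_units: list of (dependency_id, payload) pairs in decoding order.
    Keep only the units at or below the target layer."""
    return [p for did, p in nal_units if did <= max_dependency_id]

stream = [(0, b"base"), (1, b"enh1"), (2, b"enh2"), (0, b"base2")]
assert thin_stream(stream, 0) == [b"base", b"base2"]
assert thin_stream(stream, 1) == [b"base", b"enh1", b"base2"]
```

Because whole enhancement layers are dropped (never partial pictures), the forwarded stream remains decodable, as required for layered coding above.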
The coded media bit stream is typically processed further by a decoder 160, whose output is one or more uncompressed media streams. Finally, a renderer 170 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 150, the decoder 160, and the renderer 170 may reside in the same physical device or they may be included in separate devices.

Scalability in terms of bit rate, decoding complexity, and picture size is a desirable property for heterogeneous and error-prone environments. This property is desirable in order to counter limitations such as constraints on bit rate, display resolution, network throughput, and computational power in a receiving device. Scalability can be used to improve error resilience in a transport system where layered coding is combined with transport prioritization. The term "transport prioritization" refers to various mechanisms for providing different qualities of service in transport, including unequal error protection, to provide different channels having different error/loss rates. Depending on their nature, the data are assigned differently; for example, the base layer may be delivered through a channel with a high degree of error protection, and the enhancement layers may be transmitted through more error-prone channels.

In multipoint and broadcast multimedia applications, constraints on network throughput may not be foreseen at the time of encoding. Thus, a scalable bit stream should be used. Figure 3 shows an IP multicast arrangement where each router can prune the bit stream according to its capabilities. Figure 3 shows a server S that provides a bit stream to a number of clients C1-C3. The bit streams are routed to the clients by the routers R1-R3. In this example, the server is providing a clip that can be scaled to at least three bit rates, including 120 kbit/s and 28 kbit/s.
If the client and the server are connected via a normal unicast connection, the server may try to adjust the bit rate of the transmitted media clip according to the instantaneous channel throughput. One solution is to use a layered bitstream and adapt to changes in bandwidth by varying the number of enhancement layers transmitted.

The present invention is described in the general context of method steps, which may be implemented in one embodiment by a program product including computer-executable instructions, such as program code, executed by computers in networked environments. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing the steps of the methods described herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Software and network implementations of the present invention could be accomplished with standard programming techniques with rule-based logic and other logic to carry out the various database searching steps, correlation steps, comparison steps, and decision steps. It should be noted that the words "component" and "module", as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
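The layer-count adaptation described above can be sketched as follows; the per-layer bit rates and the greedy fitting rule are illustrative assumptions, not the patent's method.

```python
def layers_for_bandwidth(layer_rates_kbps, available_kbps):
    """Return how many layers (base + enhancements) fit the available bit rate.

    layer_rates_kbps[0] is the base layer; each further entry is the extra
    rate one more enhancement layer adds. The base layer is always sent.
    """
    total, count = 0, 0
    for rate in layer_rates_kbps:
        total += rate
        if total > available_kbps:
            break
        count += 1
    return max(count, 1)   # never drop below the base layer
```

For example, with layer rates of 28, 52, and 40 kbit/s, a 100 kbit/s channel would carry the base layer plus one enhancement layer.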
The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described in order to explain the principles of the present invention and its practical application, to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated.
It is noted that, as of this date, the best method known to the applicant for carrying out the aforementioned invention is that which is clear from the present description of the invention.

Claims (25)

CLAIMS Having described the invention as above, the content of the following claims is claimed as property:

1. A method for encapsulating a scalable encoded video signal that includes a base layer of a picture that is decodable according to a first algorithm, and at least one enhancement layer of the picture that is decodable according to a second algorithm, characterized in that it comprises: coding the base layer and the at least one enhancement layer into an access unit, the access unit including: at least one elementary data unit used for decoding, and a scalability information elementary data unit associated with at least a portion of the access unit, wherein the scalability information elementary data unit is configured to be ignored during decoding according to the first algorithm.
2. The method according to claim 1, characterized in that the scalability information elementary data unit is associated with a picture in the access unit.
3. The method according to claim 1, characterized in that the scalability information elementary data unit comprises information related to at least a portion of the access unit.
4. The method according to claim 3, characterized in that the information is selected from the group consisting of a priority, a temporal level, a dependency order indicator, an indicator of whether elementary data units associated with a higher dependency order indicator require at least a portion of the access unit for decoding, an indicator of whether at least a part of the access unit is a layer switching point at which decoding from a different layer may switch to the current layer, and combinations thereof.
5. The method according to claim 1, characterized in that it further comprises encapsulating the scalable encoded video signal in a file.
6. The method according to claim 5, characterized in that the scalable encoded video signal is encapsulated in the file according to at least one file format of an ISO base media file format, an AVC file format, an SVC file format, a 3GP file format, and a 3G2 file format.
7. The method according to claim 1, characterized in that it further comprises encapsulating the scalable encoded video signal in a packet stream.
8. The method according to claim 7, characterized in that the packet stream comprises an RTP stream.
9. The method according to claim 1, characterized in that the scalability information elementary data unit is configured to be ignored by at least one of an H.264/AVC decoder, an AVC file format parser, an H.264/AVC RTP payload parser, and an SVC decoder.
10. The method according to claim 1, characterized in that it further comprises removing the associated portion of the access unit from the encoded video signal based on the scalability information elementary data unit.
11. The method according to claim 1, characterized in that it further comprises stopping processing of the associated portion of the access unit from the encoded video signal based on the scalability information elementary data unit.
12. A computer program product, embodied on a computer-readable medium, for encapsulating a scalable encoded video signal that includes a base layer of a picture that is decodable according to a first algorithm and at least one enhancement layer of the picture that is decodable according to a second algorithm, characterized in that it comprises: computer code for coding the base layer and the at least one enhancement layer into an access unit, the access unit including: at least one elementary data unit used for decoding, and a scalability information elementary data unit associated with at least a portion of the access unit, wherein the scalability information elementary data unit is configured to be ignored during decoding according to the first algorithm.
13. The computer program product according to claim 12, characterized in that the scalability information elementary data unit is associated with a picture in the access unit.
14. The computer program product according to claim 12, characterized in that the scalability information elementary data unit comprises information related to at least a portion of the access unit.
15. The computer program product according to claim 14, characterized in that the information is selected from the group consisting of a priority, a temporal level, a dependency order indicator, an indicator of whether elementary data units associated with a higher dependency order indicator require at least a portion of the access unit for decoding, an indicator of whether at least a part of the access unit is a layer switching point at which decoding from a different layer may switch to the current layer, and combinations thereof.
16. The computer program product according to claim 12, characterized in that it further comprises encapsulating the scalable encoded video signal in a file.
17. The computer program product according to claim 16, characterized in that the scalable encoded video signal is encapsulated in the file according to at least one file format of an ISO base media file format, an AVC file format, an SVC file format, a 3GP file format, and a 3G2 file format.
18. The computer program product according to claim 12, characterized in that it further comprises encapsulating the scalable encoded video signal in a packet stream.
19. The computer program product according to claim 18, characterized in that the packet stream comprises an RTP stream.
20. The computer program product according to claim 12, characterized in that the scalability information elementary data unit is configured to be ignored by at least one of an H.264/AVC decoder, an AVC file format parser, an H.264/AVC RTP payload parser, and an SVC decoder.
21. The computer program product according to claim 12, characterized in that it further comprises removing the associated portion of the access unit from the encoded video signal based on the scalability information elementary data unit.
22. The computer program product according to claim 12, characterized in that it further comprises stopping processing of the associated portion of the access unit from the encoded video signal based on the scalability information elementary data unit.
23. An electronic device, characterized in that it comprises: a processor; and a memory unit communicatively connected to the processor and including a computer program product for encapsulating a scalable encoded video signal that includes a base layer of a picture that is decodable according to a first algorithm, and at least one enhancement layer of the picture that is decodable according to a second algorithm, the computer program product comprising: computer code for coding the base layer and the at least one enhancement layer into an access unit, the access unit including: at least one elementary data unit used for decoding, and a scalability information elementary data unit associated with at least a portion of the access unit, wherein the scalability information elementary data unit is configured to be ignored during decoding according to the first algorithm.
24. An encapsulated, scalable, encoded video signal, characterized in that it comprises: an access unit that includes a coded base layer of a picture that is decodable according to a first algorithm, and at least one enhancement layer of the picture that is decodable according to a second algorithm, the access unit comprising: at least one elementary data unit used for decoding, and a scalability information elementary data unit associated with at least a portion of the access unit, wherein the scalability information elementary data unit is configured to be ignored during decoding according to the first algorithm.
25. A method for decoding an encapsulated, scalable, encoded video signal that includes a base layer of a picture that is decodable according to a first algorithm and at least one enhancement layer of the picture that is decodable according to a second algorithm, characterized in that it comprises: decoding the base layer and the at least one enhancement layer from an access unit, the access unit including: at least one elementary data unit used for decoding, and a scalability information elementary data unit associated with at least a portion of the access unit, wherein the scalability information elementary data unit is configured to be ignored during decoding according to the first algorithm.
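The backward-compatibility idea running through the claims — a scalability information elementary data unit that a first-algorithm (H.264/AVC) decoder simply ignores — can be illustrated with a hypothetical sketch. The set of known NAL unit types and the reserved type value 30 below are assumptions chosen for illustration, not normative values from the specification.

```python
# NAL unit types a baseline H.264/AVC-only parser understands (illustrative subset)
AVC_KNOWN_TYPES = {1, 5, 6, 7, 8}   # non-IDR slice, IDR slice, SEI, SPS, PPS
SCALABILITY_INFO_TYPE = 30          # assumed reserved type for the new unit

def legacy_filter(nal_units):
    """Simulate an AVC-only decoder: keep only NAL units it recognises."""
    kept = []
    for unit in nal_units:
        nal_type = unit[0] & 0x1F   # nal_unit_type: low 5 bits of the first byte
        if nal_type in AVC_KNOWN_TYPES:
            kept.append(unit)       # processed as ordinary H.264/AVC data
        # unknown types (e.g. the scalability information unit) are ignored
    return kept
```

Because the new unit occupies a type value the legacy decoder skips, the base layer decodes unchanged, while an SVC-aware decoder can read the same unit to locate scalability dependencies without parsing every slice.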
MX/A/2008/009353A 2006-01-11 2008-07-21 Backward-compatible aggregation of pictures in scalable video coding MX2008009353A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US60/758,254 2006-01-11

Publications (1)

Publication Number Publication Date
MX2008009353A true MX2008009353A (en) 2008-09-26


Similar Documents

Publication Publication Date Title
AU2007204168B2 (en) Backward-compatible aggregation of pictures in scalable video coding
CN107431819B (en) Method, apparatus, computer-readable storage medium, and video decoder for video decoding
US9161032B2 (en) Picture delimiter in scalable video coding
RU2435235C2 (en) System and method of indicating interconnections of tracks in multimedia file
RU2697741C2 (en) System and method of providing instructions on outputting frames during video coding
TWI471015B (en) Generic indication of adaptation paths for scalable multimedia
US8699583B2 (en) Scalable video coding and decoding
EP2100459B1 (en) System and method for providing and using predetermined signaling of interoperability points for transcoded media streams
US8442109B2 (en) Signaling of region-of-interest scalability information in media files
TWI482498B (en) Signaling of multiple decoding times in media files
Wenger et al. RTP payload format for scalable video coding
WO2015065804A1 (en) Method and apparatus for decoding an enhanced video stream
MX2008009353A (en) Backward-compatible aggregation of pictures in scalable video coding
Sun et al. MPEG-4/XML FGS approach to multicast video synchronization