WO2008084443A1 - System and method for implementing improved decoded picture buffer management for scalable video coding and multiview video coding - Google Patents

System and method for implementing improved decoded picture buffer management for scalable video coding and multiview video coding

Info

Publication number
WO2008084443A1
Authority
WO
WIPO (PCT)
Prior art keywords
pictures
decoded
picture
decoding
coded picture
Application number
PCT/IB2008/050053
Other languages
French (fr)
Inventor
Ying Chen
Ye-Kui Wang
Miska Hannuksela
Original Assignee
Nokia Corporation
Nokia, Inc.
Application filed by Nokia Corporation, Nokia, Inc. filed Critical Nokia Corporation
Publication of WO2008084443A1 publication Critical patent/WO2008084443A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation, characterised by memory arrangements
    • H04N 19/43 Hardware specially adapted for motion estimation or compensation
    • H04N 19/46 Embedding additional information in the video signal during the compression process
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/573 Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H04N 19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N 19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N 19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A system and method for implementing improved decoded picture buffer (DPB) management for scalable video coding and multiview video coding. To address the issue of mismatches in reference picture markings, all prior unnecessary reference pictures can be marked as unused after the decoding of an open decoding refresh (ODR), open layer refresh (OLR), or open view refresh (OVR) picture or, when scalable video coding (SVC) is at issue, an indication can be made in the stream that previous and current reference picture markings are identical. To address the occurrence of potential overflows in the decoded picture buffer, ODR, OLR, or OVR pictures may not be encoded if a switch can cause an overflow in the decoded picture buffer; all prior pictures may be removed from the decoded picture buffer; or prior pictures may be removed from the DPB based on certain control indications. Solutions are also provided to address the issue of mismatches in bi-dependent leading pictures.

Description

SYSTEM AND METHOD FOR IMPLEMENTING IMPROVED
DECODED PICTURE BUFFER MANAGEMENT FOR SCALABLE VIDEO CODING AND MULTIVIEW VIDEO
CODING
FIELD OF THE INVENTION
[0001] The present invention relates generally to video coding. More particularly, the present invention relates to decoded picture buffer management when layer switching in scalable video coding or view(s) switching in multiview video coding is performed.
BACKGROUND OF THE INVENTION
[0002] This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
[0003] Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 Part 10, or AVC). In addition, there are currently efforts underway with regard to the development of new video coding standards. One such standard under development is the scalable video coding (SVC) standard, which will become the scalable extension to H.264/AVC. Another such standard under development is the multiview video coding (MVC) standard, which will become another extension to the H.264/AVC standard (referred to herein as "H.264/AVC").
[0004] Decoded pictures are often used both for predicting subsequent coded pictures and for future output. Such pictures are buffered in a decoded picture buffer (DPB). To efficiently utilize the buffer memory, the DPB management processes, including the storage process of decoded pictures into the DPB, the marking process of reference pictures, and the output and removal processes of decoded pictures from the DPB, should be specified.
[0005] The process for reference picture marking in Advanced Video Coding (AVC) is summarized as follows. The maximum number of reference pictures used for inter prediction, referred to as Mref, is indicated in the active sequence parameter set. The sequence parameter set contains parameters that are common for a sequence of coded pictures. The sequence parameter set that is in use for a coded video sequence is referred to as the active sequence parameter set. When a reference picture is decoded, it is marked as "used for reference." If the decoding of the reference picture causes more than Mref pictures to be marked as "used for reference," at least one picture must be marked as "unused for reference." The DPB removal process then removes pictures marked as "unused for reference" from the DPB if they are not needed for output either. There are two types of operation for reference picture marking: adaptive memory control and sliding window. The operation mode for reference picture marking is selected on a picture basis. The adaptive memory control requires the presence of memory management control operation (MMCO) commands in the bitstream. The memory management control operations enable explicit signaling of which pictures are marked as "unused for reference," assigning long-term indices to short-term reference pictures, storage of the current picture as a long-term picture, changing a short-term picture to a long-term picture, and assigning the maximum allowed long-term index (MaxLongTermFrameIdx) for long-term pictures. If the sliding window operation mode is in use and there are Mref pictures marked as "used for reference," the short-term reference picture that was the first decoded picture among those short-term reference pictures that are marked as "used for reference" is marked as "unused for reference." In other words, the sliding window operation mode results in a first-in-first-out buffering operation among short-term reference pictures. [0006] Each short-term picture is associated with a variable PicNum that is derived from the syntax element frame_num, and each long-term picture is associated with a variable LongTermPicNum that is derived from long_term_frame_idx, which is signaled by an MMCO command. [0007] A hypothetical reference decoder (HRD), specified in Annex C of H.264/AVC, is used to check bitstream and decoder conformance. The HRD contains a coded picture buffer (CPB), an instantaneous decoding process, a decoded picture buffer (DPB), and an output picture cropping block. The CPB and the instantaneous decoding process are specified similarly to any other video coding standard, and the output picture cropping block simply crops those samples from the decoded picture that are outside the signaled output picture extents. The DPB was introduced in H.264/AVC in order to control the required memory resources for decoding of conformant bitstreams. As mentioned above, there are two reasons to buffer decoded pictures: for references in inter prediction and for reordering decoded pictures into output order. The DPB includes a unified decoded picture buffering process for reference pictures and output reordering. A decoded picture is removed from the DPB when it is no longer used as a reference and no longer needed for output. The maximum size of the DPB that bitstreams are allowed to use is specified in the Level definitions (Annex A) of H.264/AVC.
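By way of a non-normative illustration only, the sliding-window behaviour summarized in paragraph [0005] can be sketched in C++ as follows. The RefPicture type and slidingWindowMark function are illustrative names, and the sketch deliberately ignores frame_num wrapping, field pictures and MMCO handling; it is not taken from the H.264/AVC specification or reference software.

```cpp
#include <algorithm>
#include <vector>

// Illustrative model of a short-term reference picture; names are assumed.
struct RefPicture {
    int frameNum = 0;          // simplified decoding-order identifier (no wrapping)
    bool usedForReference = false;
    bool longTerm = false;     // long-term pictures are exempt from the sliding window
};

// Sliding-window marking: when more than maxRefFrames (Mref) pictures are
// marked "used for reference", the earliest-decoded short-term reference
// picture is marked "unused for reference" (first-in-first-out).
void slidingWindowMark(std::vector<RefPicture>& dpb, int maxRefFrames) {
    int numRef = static_cast<int>(std::count_if(dpb.begin(), dpb.end(),
        [](const RefPicture& p) { return p.usedForReference; }));
    while (numRef > maxRefFrames) {
        auto oldest = dpb.end();
        for (auto it = dpb.begin(); it != dpb.end(); ++it) {
            if (it->usedForReference && !it->longTerm &&
                (oldest == dpb.end() || it->frameNum < oldest->frameNum)) {
                oldest = it;
            }
        }
        if (oldest == dpb.end()) break;   // only long-term reference pictures remain
        oldest->usedForReference = false; // mark as "unused for reference"
        --numRef;
    }
}
```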
[0008] There are two types of conformance for decoders: output timing conformance and output order conformance. For output timing conformance, a decoder must output pictures at times identical to those of the HRD. For output order conformance, only the correct order of output pictures is taken into account. The output order DPB is assumed to contain a maximum allowed number of frame buffers. A frame is removed from the DPB when it is no longer used as a reference and no longer needed for output. When the DPB becomes full, the earliest frame in output order is output until at least one frame buffer becomes unoccupied.
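The output-order "bumping" behaviour described in paragraph [0008] can likewise be sketched as below. The Frame type, the POC-based ordering and the bumpUntilFree function are assumptions made for illustration; the normative process is the one specified in Annex C of H.264/AVC.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

// Assumed frame record: POC stands in for output order.
struct Frame {
    int poc = 0;
    bool neededForOutput = true;
    bool usedForReference = false;
};

// When the DPB is full, output the earliest frame in output order until at
// least one frame buffer becomes unoccupied. A buffer is freed only when its
// frame is neither used for reference nor needed for output.
void bumpUntilFree(std::vector<Frame>& dpb, std::size_t maxFrameBuffers) {
    while (dpb.size() >= maxFrameBuffers) {
        auto earliest = std::min_element(dpb.begin(), dpb.end(),
            [](const Frame& a, const Frame& b) {
                if (a.neededForOutput != b.neededForOutput)
                    return a.neededForOutput;     // frames still awaiting output come first
                return a.poc < b.poc;             // then earliest output order
            });
        if (earliest == dpb.end()) break;         // empty DPB, nothing to bump
        if (earliest->neededForOutput) {
            std::cout << "output frame with POC " << earliest->poc << '\n';
            earliest->neededForOutput = false;
        }
        if (!earliest->usedForReference)
            dpb.erase(earliest);                  // frame buffer becomes unoccupied
        else
            break;                                // still a reference; marking must free it first
    }
}
```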
[0009] The SVC Joint Draft 8.0 is described in JVT-U201, "Joint Draft 8 of SVC Amendment," 21st JVT meeting, Hangzhou, China, Oct. 2006 (available at ftp3.itu.ch/av-arch/jvt-site/2006_10_Hangzhou/JVT-U201.zip), incorporated herein by reference in its entirety. In SVC Joint Draft 8.0, the latest SVC specification, the target layer is identified by dependency_id being equal to DependencyIdMax. When layer switching is invoked, DependencyIdMax is changed. Correspondingly, a change of DependencyIdMax indicates that a layer switching has happened. The SVC specification specifies a single-loop decoding process, wherein only a single DPB for the target layer is maintained, as there are no decoded pictures with dependency_id other than DependencyIdMax to be processed. [0010] The latest joint draft of the MVC standard is described in JVT-U209, "Joint Draft 1.0 on Multiview Video Coding," 21st JVT meeting, Hangzhou, China, Oct. 2006 (available at ftp3.itu.ch/av-arch/jvt-site/2006_10_Hangzhou/JVT-U209.zip), incorporated herein by reference. The latest draft of the video model of MVC is described in JVT-U207, "Joint Multiview Video Model (JMVM) 2.0," 21st JVT meeting, Hangzhou, China, Oct. 2006 (available at ftp3.itu.ch/av-arch/jvt-site/2006_10_Hangzhou/JVT-U207.zip), also incorporated herein by reference. [0011] In multiview video coding, video sequences output from different cameras, each corresponding to a different view, are encoded into one bitstream. After decoding, to display a certain view, referred to herein as the target view, the decoded pictures belonging to that view are reconstructed and displayed. The decoded pictures from other views that the target view depends upon also need to be decoded. It is also possible for more than one view to be reconstructed and displayed. In this case, there is more than one target view. Assume that a total of N views are coded, while only P (1 <= P <= N) views are to be output. In this case, the P views to be output are the target views. When a view switching occurs, at least one of the following happens: (1) the number of views to output, i.e. P, changes, or (2) any of the views to output changes. In other words, if P changes, or if any of the views to output changes, a view switching has occurred.
[0012] In multiview video coding, each view is identified by its identifier, i.e. view_id, which is associated with a camera. Therefore, the above view switching condition can be formulated as follows. If the integer data set {view_id_o0, view_id_o1, ..., view_id_oP-1} changes, a view switching has occurred, and vice versa. Here, view_id_oi denotes the view_id of the i-th view to be output. [0013] It is assumed that there are K views that the P target views depend on. These K views are known from the view dependency information signalled in the MVC sequence parameter set (SPS) extension, which is an extension of the sequence parameter set as specified in the MVC specification. Therefore, pictures of all of the P target views and the K dependent views (collectively referred to as the to-be-decoded views) need to be decoded. Consequently, the decoded pictures of all the to-be-decoded views need to be processed by the DPB management process. The DPB management process is summarized as follows. All of the decoded pictures are stored in conceptually one decoded picture buffer (DPB). However, the reference picture marking process, including the sliding window mechanism and the adaptive memory control, still follows the H.264/AVC process, but applies to each view independently. Those processes in MVC are constrained to the same view and are performed separately for each view. For non-reference pictures (pictures that are not used for intra-view inter prediction) that are used for inter-view prediction, implicit marking according to the view dependency relationship signaled in the MVC SPS extension is specified. In the latest MVC draft, an assumption is made that the output times of pictures from different views at the same time instance are identical. [0014] An instantaneous decoding refresh (IDR) picture of H.264/AVC contains only intra-coded slices and causes all reference pictures except for the current picture to be marked as "unused for reference." A coded video sequence is defined as a sequence of consecutive access units in decoding order from an IDR access unit, inclusive, to the next IDR access unit, exclusive, or to the end of the bitstream, whichever appears earlier. A group of pictures (GOP) in H.264/AVC refers to a number of pictures that are contiguous in decoding order, starting with an intra-coded picture and ending with the first picture (exclusive) of the next GOP or coded video sequence in decoding order. All of the pictures within the GOP following the intra picture in output order can be correctly decoded, regardless of whether any previous pictures were decoded. An open GOP is a group of pictures in which pictures preceding the initial intra picture in output order may not be correctly decodable. An H.264/AVC decoder can recognize an intra picture starting an open GOP from the recovery point SEI message in the H.264/AVC bitstream. The picture starting an open GOP is referred to herein as an open decoding refresh (ODR) picture. A closed GOP is a group of pictures in which all pictures can be correctly decoded. In H.264/AVC, a closed GOP starts from an IDR access unit. [0015] Particularly in unicast streaming services, a technique known as stream switching is often used for bitrate adaptation to the prevailing throughput in the network and for congestion avoidance. In stream switching, multiple streams are coded from the same original video content.
The transmitted stream can be changed in the middle of the streaming session according to the prevailing network conditions. The stream switching can take place when the stream to switch to starts with an IDR picture.
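As a minimal illustration of the view-switching condition formulated in paragraph [0012], the following sketch treats a switch as any change in the set of output view_id values. The function and parameter names are assumptions for illustration and are not part of any draft text.

```cpp
#include <set>

// A view switching has occurred if and only if the set
// {view_id_o0, ..., view_id_oP-1} of views to be output has changed,
// i.e. P changed or any output view_id changed.
bool viewSwitchOccurred(const std::set<int>& previousOutputViewIds,
                        const std::set<int>& currentOutputViewIds) {
    return previousOutputViewIds != currentOutputViewIds;
}
```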
[0016] H.264/AVC and its extensions, SVC and MVC, specify the concept of a video coding layer (VCL) and a network abstraction layer (NAL). The VCL contains the signal processing functionality of the codec: mechanisms such as transform, quantization, motion-compensated prediction, loop filter and inter-layer prediction. A coded picture of a base or enhancement layer comprises one or more slices. The NAL encapsulates each slice generated by the VCL into one or more NAL units. A NAL unit is comprised of a NAL unit header and a NAL unit payload. The NAL unit header contains, among other items, the NAL unit type indicating whether the NAL unit contains a coded slice, a coded slice data partition, a sequence or picture parameter set, etc. A NAL unit stream is a concatenation of a number of NAL units. An encoded bitstream according to H.264/AVC or its extensions, e.g. SVC, is either a NAL unit stream or a byte stream formed by prefixing a start code to each NAL unit in a NAL unit stream.
[0017] In SVC, the target layer of an SVC bitstream can be switched at any coded picture (an SVC coded picture is defined as all of the coded slices having the same value of dependency_id in one access unit) that has NAL unit type equal to 5 or 21. Such a coded picture is referred to as an IDR coded picture or an IDR picture. No coded picture having a dependency_id identical to the dependency_id of the IDR picture refers to any picture preceding the IDR picture in decoding order. After decoding an IDR picture in the target layer, all reference pictures except for the IDR picture are marked as "unused for reference," similar to the process for an IDR picture in H.264/AVC. United States Application Serial No. 11/546638, filed October 11, 2006 and incorporated herein by reference in its entirety, describes a system in which some coded pictures can be IDR pictures in one access unit, while other coded pictures in the same access unit are not IDR pictures. [0018] A scalable nesting supplemental enhancement information (SEI) message specified in SVC contains an ordinary H.264/AVC SEI message and indicates the scope that the message concerns. Consequently, the scalable nesting SEI message enables the reuse of the syntax of H.264/AVC SEI messages for SVC coarse-granular scalability (CGS) or fine-granular scalability (FGS) enhancement layers. It should be noted that CGS enhancement layers may have the same or different spatial resolutions as their respective base layers. The semantics of the recovery point SEI message in the SVC context were proposed in JVT-U110 (available at ftp3.itu.ch/av-arch/jvt-site/2006_10_Hangzhou/JVT-U110.zip) to be the following: When this SEI message is included in a scalable nesting SEI message, then the following applies for each pair of dependency_id[ i ] and quality_level[ i ], referred to as targetDependencyId and targetQualityLevel, respectively. When the decoding process is started from the access unit in decoding order associated with the recovery point SEI message, and video coding layer (VCL) network abstraction layer (NAL) units having dependency_id less than targetDependencyId, or both dependency_id equal to targetDependencyId and quality_level equal to or less than targetQualityLevel, are decoded, all decoded pictures at or subsequent to the recovery point in output order specified in this SEI message are indicated to be correct or approximately correct in content. An enhancement layer picture associated with a recovery point SEI message with recovery_frame_cnt equal to 0 is herein referred to as an open layer refresh (OLR) picture.
[0019] An anchor picture in MVC is a coded picture in which all slices reference only slices with the same temporal index, i.e., only slices in other views and not slices in earlier pictures of the current view. An anchor picture is signaled by setting an anchor_pic_flag in the NAL unit header to 1. After decoding the anchor picture, all subsequent coded pictures in output order are capable of being decoded without inter-prediction from any picture decoded prior to the anchor picture. If a picture in one view is an anchor picture, then all pictures with the same temporal index in other views are also anchor pictures. Consequently, the decoding of any view can be initiated from a temporal index that corresponds to anchor pictures. [0020] In U.S. Provisional Patent Application Serial No. 60/852,223, filed October 16, 2006 and incorporated herein by reference, a system is discussed in which a view_refresh_flag is included in the NAL unit header of MVC NAL units. There are a number of ways to specify the semantics of the view_refresh_flag. A first way for specifying the semantics of the view_refresh_flag involves having the view_refresh_flag indicate that the current picture and all subsequent pictures in output order in the same view can be correctly decoded when all of the directly depend-on view pictures of the current and subsequent pictures in the same view are also (possibly partially) decoded without decoding any preceding picture in the same view or other views. This implies that (1) none of the depend-on view pictures relies on any preceding picture in decoding order in any view, or (2) if any of the depend-on view pictures relies on any preceding picture in decoding order in any view, then only the constrainedly intra-coded areas of the directly depend-on view pictures of the current and subsequent pictures in the same view are used for inter-view prediction. A constrainedly intra-coded area uses no data from inter-coded neighboring areas for intra prediction. This definition is analogous to an intra picture starting an open GOP in single-view coding.
[0021] A second way for specifying the semantics of the view_refresh_flag involves having the view_refresh_flag indicate that the current picture and all subsequent pictures in decoding order in the same view can be correctly decoded when all of the directly depend-on view pictures of the current picture and subsequent pictures in the same view are also completely or, in one embodiment, partially decoded without decoding any preceding picture. This definition is analogous to an intra picture starting a closed GOP in single-view coding.
[0022] A third way for specifying the semantics of the view_refresh_flag involves having the view_refresh_flag indicate that the current picture and all subsequent pictures in output order in the same view can be correctly decoded when all of the depend-on view pictures of the current and subsequent pictures in the same view are also completely or, in one embodiment, partially decoded. This definition is also analogous to an intra picture starting an open GOP in single-view coding. In terms of standard specification text, this option can be written as follows: A view_refresh_flag equal to 1 indicates that the current picture and any subsequent picture in decoding order in the same view as the current picture and following the current picture in output order do not refer to a picture preceding the current picture in decoding order in the inter prediction process. A view_refresh_flag equal to 0 indicates that the current picture or a subsequent picture in decoding order in the same view as the current picture and following the current picture in output order may refer to a picture preceding the current picture in decoding order in the inter prediction process. [0023] A fourth way for specifying the semantics of the view_refresh_flag involves having the view_refresh_flag indicate that the current picture and all subsequent pictures in decoding order in the same view can be correctly decoded when all of the depend-on view pictures of the current and subsequent pictures in the same view are also completely or, in one embodiment, partially decoded. This definition is also analogous to an intra picture starting a closed GOP in single-view coding. [0024] A picture associated with a view_refresh_flag specified according to the first or third mechanism discussed above is herein referred to as an open view refresh (OVR) picture. It is noted that an anchor picture of MVC is also an OVR picture. A picture associated with a view_refresh_flag specified according to the second or fourth way above is herein referred to as an independent view refresh (IVR) picture. OVR and IVR pictures are collectively referred to as view refresh (VR) pictures. [0025] A leading picture refers to a picture succeeding an ODR, OLR, or OVR picture in decoding order and preceding the ODR, OLR, or OVR picture in output order. A bi-dependent leading picture is predicted from at least one picture preceding the ODR, OLR, or OVR picture in decoding order.
[0026] Open GOPs, and correspondingly ODR, OLR, and OVR pictures, are used for increased compression efficiency compared to closed GOPs. It would therefore be desirable for ODR, OLR, and OVR pictures to be used for switching purposes as well. However, as ODR, OLR, view refresh and anchor pictures do not reset the marking of all reference pictures as "unused for reference," and as the decoded picture buffer operation of the HRD is not specified for switching occurring at these pictures, various problems occur. These problems are discussed below. [0027] Currently, AVC stream switching and SVC layer switching can occur only at pictures providing a complete reset of reference picture markings, i.e. marking of all reference pictures as unused. The possible switch pictures are IDR pictures. [0028] For MVC, the problems discussed below are also valid for view switchings that occur at IDR picture positions (coded slices of one view in one access unit that have NAL unit type equal to 5 or 21).
[0029] A first problem associated with using ODR, OLR and OVR pictures for switching purposes involves the occurrence of mismatches in reference picture marking states. In order to describe this problem, it is helpful to define the state of reference picture marking to include information on all reference pictures marked as "used for reference." The information contains the "spatial identification" (whether the picture is a frame, a top field or a bottom field), the "temporal identification" (the values of frame_num and picture order count) or the long-term index associated with the picture, the layer dependency identification (dependency_id) when applicable, and the view identification (view_id) when applicable. Let the state of the reference picture marking for an ODR, OLR, or OVR picture be equal to normalState when the target stream, layer, or view (respectively) has been decoded from the beginning of the coded video sequence. Let the state of the reference picture marking for the ODR, OLR, or OVR picture be equal to switchedState when a switch to the target stream, layer, or view (respectively) happened at this ODR, OLR, or OVR picture. A normal decoding path refers to pictures that are decoded when the same stream, layer, or view is decoded from the beginning of the coded video sequence, and a switched decoding path refers to pictures that are decoded when a switch to another stream, layer, or view happened in the middle of the coded video sequence. It is assumed that the target stream, layer, or view in the normal and switched decoding paths is the same after the switch occurred at the ODR, OLR, or OVR picture, respectively, and that both the normal decoding path and the switched decoding path contained an IDR picture. To have identical decoder output for the normal and switched decoding paths after the switch occurred, it is necessary that the states normalState and switchedState be identical for each picture at and after the decoding of the ODR, OLR, or OVR picture. [0030] At least the following problems arise if normalState and switchedState are not identical. First, initial reference picture lists are constructed based on the state of reference picture marking. Consequently, the initial reference picture lists would be different in the two cases. This problem can be avoided when all of the pictures at and after the ODR, OLR, or OVR picture explicitly reorder all used reference pictures with reference picture list reordering (RPLR) commands to the final reference picture lists, if all the required reference pictures are present in the DPB. However, RPLR commands increase the bit rate to some extent.
[0031] Second, if the number of pictures marked as "used for long-term reference" in switchedState is greater than in normalState, then the window size for the sliding window memory management control operation is smaller in the switched decoding path. Consequently, pictures that are still used for reference in the normal decoding path may be marked as "unused for reference" in the switched decoding path, and the decoded sample values in the switched decoding path may become incorrect due to missing reference pictures. The same problem also occurs if certain long-term picture indices are occupied in switchedState but not in normalState.
[0032] Third, if an OVR picture is used for view switching and there is a change in the views that are required for decoding of the target view, then those reference pictures that are in the views no longer needed for reference are not marked as "unused for reference" according to the current MVC draft (JVT-U209). Consequently, the number of frame buffers needed for the decoded picture buffer operation unnecessarily becomes larger and may exceed the limit given in the level of the stream.
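The reference picture marking state defined in paragraph [0029] can be modelled, purely for illustration, as a set of per-picture entries that is compared between the normal and switched decoding paths. All type and field names below are assumptions; only the requirement that normalState and switchedState be identical is taken from the text above.

```cpp
#include <set>
#include <tuple>

enum class PicStructure { Frame, TopField, BottomField };

// One entry per picture marked "used for reference": spatial identification,
// temporal identification, long-term index, and (when applicable) the SVC
// dependency_id and MVC view_id.
struct RefPicMarkingEntry {
    PicStructure structure;
    int frameNum;
    int picOrderCnt;
    int longTermIdx;    // -1 for short-term pictures
    int dependencyId;   // -1 if not applicable
    int viewId;         // -1 if not applicable

    bool operator<(const RefPicMarkingEntry& o) const {
        return std::tie(structure, frameNum, picOrderCnt, longTermIdx, dependencyId, viewId)
             < std::tie(o.structure, o.frameNum, o.picOrderCnt, o.longTermIdx, o.dependencyId, o.viewId);
    }
    bool operator==(const RefPicMarkingEntry& o) const {
        return std::tie(structure, frameNum, picOrderCnt, longTermIdx, dependencyId, viewId)
            == std::tie(o.structure, o.frameNum, o.picOrderCnt, o.longTermIdx, o.dependencyId, o.viewId);
    }
};

using MarkingState = std::set<RefPicMarkingEntry>;

// Identical decoder output after the switch requires normalState and
// switchedState to match at and after the ODR, OLR, or OVR picture.
bool statesMatch(const MarkingState& normalState, const MarkingState& switchedState) {
    return normalState == switchedState;
}
```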
[0033] The following demonstrates the above issues in terms of SVC. After layer switching at an OLR picture, some decoded pictures from the previous target layer that are stored in the DPB and marked as "used for reference" may not be marked as "unused for reference" and removed from the DPB during the decoding of the following GOP, even though they are actually no longer needed for inter prediction reference. This effectively reduces the usable buffer size, which may cause a buffer overflow in decoding the remaining part of the bitstream. Consequently, the decoding result becomes unpredictable and the decoder may crash. [0034] Figure 1 presents an example where layer switching occurs several times at anchor picture positions. After each layer switching, some reference pictures in the previous target layer may still remain in the DPB and be marked as "used for reference." Cumulatively, the number of such pictures may become close or equal to the maximum number of reference pictures used for inter prediction, i.e. Mref, for the subsequent bitstream. A similar problem as identified above may occur after layer switching if there are decoded pictures from the previous target layer stored in the DPB and waiting for output. When they do not cause a DPB buffer overflow in decoding of the following decoded pictures after layer switching, it is better to output those pictures. When they would cause a DPB overflow, they should not be output. [0035] The same problems identified above can occur in MVC when view switching occurs at OVR picture positions. Figure 2 shows an example where the target views before switching are views 1 and 2, while the target views after view switching are views 4, 5, and 6. Views 0 and 3 are dependent but not-to-output views. If there are some reference pictures in views 1 and 2 stored in the DPB after the view switching, because there is no process specified to mark those pictures as "unused for reference," they are always stored in the DPB. The situation is similar to that in SVC for pictures from views 1 and 2 stored in the DPB waiting for output after the view switching.
[0036] Another set of problems with using ODR, OLR and OVR pictures for switching purposes involves a potential overflow in the decoded picture buffer. If the required size of the decoded picture buffer in the target stream, layer, or view at the beginning of the video sequence is smaller than the DPB size in the stream, layer, or view that is switched to (when that is decoded using the normal decoding path), switching may cause an overflow in the DPB, which was allocated at the beginning of the video sequence.
[0037] A third set of problems relating to the use of ODR, OLR and OVR pictures for switching purposes involves mismatches in bi-dependent leading pictures. In certain situations, when an OLR picture is used for layer switching, the bi-dependent leading pictures in neither the old target layer nor the new target layer can be correctly decoded, as at least one of the reference pictures referred to by the bi-dependent leading pictures is not decoded. Consequently, switching at an OLR picture can suffer either from incorrectly reconstructed bi-dependent leading pictures or a temporary drop in output picture rate. When an OVR picture is used for view switching to view r and view u becomes unnecessary for decoding, the bi-dependent leading pictures of view r cannot be correctly decoded due to missing reference pictures in view r. Yet, the bi-dependent leading pictures of view r are present in the stream by default. It should be noted that for view switching, the switching can occur at several OVR pictures of the current target K+P views, and the bi-dependent leading pictures in the views that become unnecessary for decoding all have the same problem.
[0038] If AVC stream concatenation is done in certain manners, bi-dependent leading pictures whose reference pictures are not present in the concatenated stream are removed from the concatenated stream as well. Alternatively, the broken_link_flag in the recovery point SEI message can be set to 1 for such bi-dependent leading pictures. Although these methods can be used to handle bi-dependent leading pictures in AVC stream switching, other methods are still desirable.
SUMMARY OF THE INVENTION
[0039] Various embodiments of the present invention provide mechanisms for addressing each of the issues discussed above. In order to address the issue of mismatches in reference picture marking states after stream, layer or view switching at ODR, OLR or OVR pictures, all prior unnecessary reference pictures can be marked as unused after the decoding of an ODR, OLR, or OVR picture. Additionally, when SVC is at issue, an indication can be made in the stream that the previous and current reference picture marking states are identical. Regarding the occurrence of potential overflows in the decoded picture buffer, a number of options are available. In one embodiment, ODR, OLR, or OVR pictures are not encoded if a switch can cause an overflow in the decoded picture buffer. In another embodiment, prior pictures are removed from the DPB based on certain control indications. Solutions are also provided to address the issue of mismatches in bi-dependent leading pictures. Various embodiments of the present invention are applicable both to SVC, wherein layers are involved, and to MVC, where views are involved. For simplicity purposes, the term "perspective" is used herein to refer to layers (for SVC) and views (for MVC) where necessary or desired.
[0040] These and other advantages and features of the invention, together with the organization and manner of operation thereof, will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, wherein like elements have like numerals throughout the several drawings described below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0041] Figure 1 is a representation of a conventional SVC layer switching process, where layer switching occurs several times at OLR picture positions, potentially causing the number of pictures that are decoded before the layer switching, stored in the decoded picture buffer, and marked as "used for reference" to approach or equal the maximum number of reference pictures used for inter prediction in the subsequent bitstream;
[0042] Figure 2 is a representation of a conventional MVC view switching process, where view switching occurs at OVR picture positions;
[0043] Figure 3 is a representation of a generic multimedia communications system for use with the present invention;
[0044] Figure 4 is a perspective view of an electronic device that can be used in the implementation of the present invention; and
[0045] Figure 5 is a schematic representation of the circuitry of the electronic device of Figure 4.
DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS
[0046] Various embodiments of the present invention provide mechanisms for addressing each of the issues discussed above. In order to address the issue of mismatches in reference picture marking states, a number of solutions are presented in accordance with different embodiments of the present invention. In one embodiment, all prior unnecessary reference pictures can be marked as unused after the decoding of an ODR, OLR, or OVR picture. After stream switching and decoding of the ODR picture at which the stream switching happened, all reference pictures preceding the ODR picture are marked as "unused for reference." Similarly, after the decoding of an OLR picture that was used to switch the target layer, all reference pictures preceding the OLR picture in decoding order are marked as "unused for reference." Still further, after the decoding of an OVR picture in a view that was not previously decoded, all reference pictures in those views that are not among the target views or their reference views are marked as "unused for reference."
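A hedged sketch of the marking behaviour of this embodiment is given below. The structure and function names are illustrative, not draft-standard text; for stream or layer switching the set of still-needed views would simply be empty, so every reference picture preceding the switch point is marked unused.

```cpp
#include <set>
#include <vector>

// Assumed record of a decoded picture in the DPB.
struct DecodedPic {
    long decodingOrder = 0;
    int viewId = 0;
    bool usedForReference = false;
};

// After decoding the ODR/OLR/OVR picture at which the switch happened,
// reference pictures preceding it in decoding order are marked unused.
// For stream or layer switching, pass an empty neededViewIds set; for view
// switching, pass the target views and their reference views so that only
// pictures of no-longer-needed views are marked.
void markPriorPicsUnusedAfterSwitch(std::vector<DecodedPic>& dpb,
                                    long switchPointDecodingOrder,
                                    const std::set<int>& neededViewIds) {
    for (DecodedPic& pic : dpb) {
        bool precedesSwitch = pic.decodingOrder < switchPointDecodingOrder;
        bool viewStillNeeded = neededViewIds.count(pic.viewId) != 0;
        if (precedesSwitch && !viewStillNeeded)
            pic.usedForReference = false;   // "unused for reference"
    }
}
```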
[0047] In an alternative embodiment, and when SVC is at issue, an indication can be made in the stream that the previous and current reference picture markings are identical. For example, if an identical temporal scalability hierarchy is used in different coded pictures, the state of reference picture marking may be identical regardless of the decoded layer. The bitstream may contain a syntax element indicating this status, e.g., in the sequence parameter set. The indication, herein referred to as identical_ref_pic_marking_flag, may couple the layers for which the reference picture marking is done identically. The decoder operation may be adaptive based on the indication of identical reference picture marking. If the marking is not identical, all prior reference pictures are marked as "unused for reference." Otherwise, no marking as "unused for reference" is performed. [0048] Regarding the occurrence of potential overflows in the decoded picture buffer, a number of options are available according to various embodiments of the present invention. In one embodiment, ODR, OLR, or OVR pictures are not encoded if a switch can cause an overflow in the decoded picture buffer. In another embodiment, all prior pictures are removed from the DPB. After layer switching, the frame buffers containing all of the decoded pictures that are before the switching point in decoding order are emptied, and the DPB fullness is decremented by the number of frame buffers emptied. After view switching, the frame buffers containing all of the decoded pictures that are before the switching point in decoding order and do not belong to the new target views or the views the new target views depend on are emptied, and the DPB fullness is decremented by the number of frame buffers emptied. [0049] In still another embodiment, prior pictures are removed from the DPB based on certain control indications. For SVC, a no_output_of_prior_layers_flag is included in the slice header of an OLR picture. This flag is meaningful only if a layer switching happens at an OLR picture, i.e., if the target layer has changed from another layer to the layer containing the OLR picture. If a layer switching occurs at an OLR picture, then the following applies after the layer switching. If the flag is equal to 1, then the frame buffers containing all of the decoded pictures before the switching point in decoding order are emptied without outputting, and the DPB fullness is decremented by the number of frame buffers that are emptied. Otherwise (i.e., if the flag is equal to 0), all of the decoded pictures before the switching point in decoding order are output according to their output timestamps. The flag is set by the encoder. When setting the flag to 0, the encoder ensures that the bitstream after switching can be correctly decoded.
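The DPB behaviour controlled by no_output_of_prior_layers_flag, as described in paragraph [0049], might be realized along the following lines. The DecodedPictureBuffer and FrameBuffer types are assumptions, and outputting strictly by output timestamp is simplified to an immediate output call.

```cpp
#include <vector>

// Assumed frame buffer record and DPB wrapper.
struct FrameBuffer {
    long decodingOrder = 0;
    long outputTimestamp = 0;
};

struct DecodedPictureBuffer {
    std::vector<FrameBuffer> buffers;
    int fullness = 0;

    void output(const FrameBuffer& /*fb*/) { /* hand the picture to the renderer */ }

    // Applied when a layer switching occurs at an OLR picture.
    void onLayerSwitchAtOlr(bool noOutputOfPriorLayersFlag,
                            long switchPointDecodingOrder) {
        for (auto it = buffers.begin(); it != buffers.end();) {
            if (it->decodingOrder < switchPointDecodingOrder) {
                if (!noOutputOfPriorLayersFlag)
                    output(*it);          // flag == 0: output (by output timestamp)
                it = buffers.erase(it);   // empty the frame buffer
                --fullness;               // decrement the DPB fullness
            } else {
                ++it;
            }
        }
    }
};
```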
[0050] For MVC, a no_output_of_prior_views_flag is included in the slice header of an OVR picture. The flag is meaningful only if a view switching happens at an OVR picture, and the view containing the OVR picture is one of the new target views after the view switching. If a view switching happens at an OVR picture, then the following applies after the view switching. If the flag of any OVR picture that is at the switching point and is in the new target views or the views the new target views depend on is equal to 1, then the frame buffers containing all of the decoded pictures that are before the switching point in decoding order and do not belong to the new target views or the views the new target views depend on are emptied without outputting, and the DPB fullness is decremented by the number of frame buffers emptied. Otherwise (i.e., if the flags of all of the OVR pictures that are at the switching point and are in the new target views or the views the new target views depend on are equal to 0), then all of the decoded pictures that are before the switching point in decoding order and belong to the old target views before the view switching are output according to their output timestamps. The flag is set by the encoder. When setting the flag to 0, the encoder shall make sure that the bitstream after switching can be correctly decoded. [0051] Solutions are also provided to address the issue of mismatches in bi-dependent leading pictures. For MVC, bi-dependent leading pictures that are within the K views that are no longer needed as reference for decoding the P target views after a view switch can be removed from the bit stream or not decoded. Alternatively, those bi-dependent leading pictures in the P target views before view switching that do not depend on any picture subsequent to the OVR picture and that are associated with the OVR picture may remain in the bit stream and are decoded, provided that the value of no_output_of_prior_views_flag is equal to 0. The reference pictures of those bi-dependent leading pictures are marked as "unused for reference" when the pictures where the view switch happened are output, or some other means, such as a signal element in the bit stream, are used for concluding when the last bi-dependent leading picture for the to-be-decoded views (before view switching) has been decoded. The use of this arrangement may enable a steadier picture rate. Bi-dependent leading pictures that are within the new views to be decoded should not be included in the bitstream or decoded.
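A corresponding sketch for the MVC case of paragraph [0050] is shown below. It is a simplification: all names are assumptions, and pictures of every view that is no longer needed are treated uniformly, whereas the text above outputs only the pictures of the old target views when all flags are equal to 0.

```cpp
#include <set>
#include <vector>

struct OvrPictureAtSwitchPoint {
    int viewId = 0;
    bool noOutputOfPriorViewsFlag = false;
};

struct StoredPicture {
    long decodingOrder = 0;
    int viewId = 0;
    long outputTimestamp = 0;
};

// neededViewIds holds the new target views and the views they depend on.
// If any OVR picture at the switch point within those views carries the flag
// equal to 1, prior pictures of no-longer-needed views are dropped without
// output; otherwise they are output before removal.
void handleViewSwitch(std::vector<StoredPicture>& dpb, int& dpbFullness,
                      const std::vector<OvrPictureAtSwitchPoint>& ovrPicsAtSwitchPoint,
                      const std::set<int>& neededViewIds,
                      long switchPointDecodingOrder,
                      void (*outputPicture)(const StoredPicture&)) {
    bool dropWithoutOutput = false;
    for (const auto& ovr : ovrPicsAtSwitchPoint)
        if (neededViewIds.count(ovr.viewId) != 0 && ovr.noOutputOfPriorViewsFlag)
            dropWithoutOutput = true;

    for (auto it = dpb.begin(); it != dpb.end();) {
        bool priorToSwitch = it->decodingOrder < switchPointDecodingOrder;
        bool noLongerNeeded = neededViewIds.count(it->viewId) == 0;
        if (priorToSwitch && noLongerNeeded) {
            if (!dropWithoutOutput)
                outputPicture(*it);   // simplified: the text outputs only old target views
            it = dpb.erase(it);
            --dpbFullness;
        } else {
            ++it;
        }
    }
}
```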
[0052] For SVC, bit-exact decoding of bi-dependent leading pictures should be allowed. Bit-exact decoding can be achieved as follows. (1) The state of the decoder, including e.g. the state of the reference picture marking and the decoded picture buffer, is stored first. (2) The picture corresponding to the old target dependency_id is decoded from the access unit containing the OLR picture. (3) The bi-dependent leading pictures for the old target dependency_id are decoded. (4) The stored state of the decoder is recovered. (5) The OLR picture is decoded (i.e. the picture corresponding to the new target dependency_id is decoded). (6) The bi-dependent leading pictures for the new target dependency_id are not decoded. Other pictures for the new target dependency_id are decoded.
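The six-step procedure of paragraph [0052] maps naturally onto a save/restore of decoder state. The Decoder and DecoderState interfaces below are hypothetical stand-ins whose methods only copy state or log; they illustrate the ordering of the steps, not an actual SVC decoder.

```cpp
#include <iostream>
#include <vector>

struct DecoderState {
    // In a real decoder this would hold the reference picture marking
    // and the decoded picture buffer contents.
    int placeholder = 0;
};

class Decoder {
public:
    DecoderState saveState() const { return state_; }              // used in step (1)
    void restoreState(const DecoderState& s) { state_ = s; }       // used in step (4)
    void decodePicture(int dependencyId, long accessUnit) {
        std::cout << "decode AU " << accessUnit
                  << " at dependency_id " << dependencyId << '\n';
    }
private:
    DecoderState state_;
};

void decodeAroundOlrSwitch(Decoder& dec, int oldTargetDid, int newTargetDid,
                           long olrAccessUnit,
                           const std::vector<long>& leadingAccessUnits) {
    DecoderState saved = dec.saveState();               // (1) store the decoder state
    dec.decodePicture(oldTargetDid, olrAccessUnit);     // (2) old target layer of the OLR access unit
    for (long au : leadingAccessUnits)
        dec.decodePicture(oldTargetDid, au);            // (3) bi-dependent leading pictures, old layer
    dec.restoreState(saved);                            // (4) recover the stored state
    dec.decodePicture(newTargetDid, olrAccessUnit);     // (5) decode the OLR picture itself
    // (6) bi-dependent leading pictures of the new target dependency_id are
    //     not decoded; later pictures of the new layer are decoded normally.
}
```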
[0053] A dec_of_bidep_leading_pics_allowed_flag is included in the bit stream and is associated with an OLR picture. This flag is included only if identical_ref_pic_marking_flag is equal to 1, or these two indications may be semantically combined into one flag. Allowed layers to switch from may also be signaled when the value of the flag is equal to 1. When the value of dec_of_bidep_leading_pics_allowed_flag is equal to 1, then bi-dependent leading pictures should be decoded. The sample values of bi-dependent leading pictures in the switched decoding path may not be equal to those in the normal decoding path, but represent the original content well enough to be output from the decoder. When the value of dec_of_bidep_leading_pics_allowed_flag is equal to 0, then bi-dependent leading pictures are not decoded.
[0054] There are multiple design options for the signaling of ODR, OLR, and OVR pictures in the bit stream. A significant factor affecting the design choice for the signaling is whether the normative decoder operation depends on the detection of ODR, OLR, and OVR pictures. If the detection of ODR, OLR, and OVR pictures is necessary for normative decoder operation, then the indication should appear in the NAL unit header or slice header, for example. If the detection of ODR, OLR, and OVR pictures is not necessary for normative decoder operation, then an SEI message is sufficient. The recovery point SEI message can be used for AVC and SVC, although it should be noted that for SVC, the recovery point SEI message may be included in the nesting SEI message.
[0055] The straightforward detection of OLR and OVR pictures is useful for senders or middleboxes that manipulate the transmitted or forwarded streams by including or excluding layers or views. Decoders may detect a change in decoded layers and views based on those layers and views that are present in the bit stream. [0056] In terms of the detection of layer switching in SVC and view switching in MVC, the decoder specifications of SVC and MVC will likely assume that only the target layer/views and the layers/views that the target layer/views depend on will be presented to the decoder. Practical decoder implementations may, however, receive a bit stream that contains additional layers or views. The following discusses how decoders may become aware of the target layer or views in different types of applications.
[0057] For local playback applications, the target layer or views are determined based on the capability of the device and user input. The information on the target layer or views is then passed to the decoder. Therefore, the decoder knows immediately when a layer or view switching occurs. [0058] For networked applications, the initial target layer or target views are selected or negotiated in the session setup. If a switch in target layer or views occurs as a response to a receiver decision (e.g. an end-user action), then the decoder can again expect a layer or view switch to occur soon. If the layer or view switching is decided by the sender or an intermediate network element without notifying the client, e.g., as a result of rate adaptation to detected network congestion, then the client may detect the switching event by observing the received layers or views within a certain time window and make its decision based on the observation. For example, the client may sense a layer switching if the highest received layer has remained changed for one second. The client passes the information of the layer or view switching to the decoder. Alternatively, the sender or the intermediate network element may create a notification in the bitstream about the change in layers or views. A change of layers can be indicated with a "scalability information layer not present" SEI message. A similar SEI message can be created for a change in views. [0059] The following is a description of the changed operation of the DPB according to the various embodiments of the present invention. This description is based upon Section 8.4.2 of JVT-U209 discussed previously.
[0060] For the operation of the DPB for multiview video, the decoded picture buffer contains frame buffers. Each of the frame buffers may contain a decoded frame, a decoded complementary field pair or a single (non-paired) decoded field that is marked as "used for reference" (reference pictures) or is held for future output (reordered or delayed pictures). Prior to initialization, the DPB is empty (the DPB fullness is set to zero). The following steps of the subclauses of this subclause all happen instantaneously at t_r( n ) and in the sequence listed. [0061] For the decoding of gaps in frame_num and storage of "non-existing" frames, the specifications in Annex C.2.1 of JVT-U209 apply independently for each view. The view_id of the generated pictures should be set to the value of the view_id of the current picture being processed.
[0062] For picture decoding and output, a picture n belonging to the set of to-be-decoded views that are required to be decoded is decoded in its decoding order. Pictures belonging to the set of K depended-on views are marked as "not needed for output". The DPB output time t_o,dpb( n ) for a picture n belonging to the set of P views is derived by (C-12) and is equal to the DPB output time for all other pictures at the same time instant belonging to the set of P views. The pictures with the same DPB output time are output in the order of ascending view_id.
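The output ordering rule at the end of paragraph [0062] (equal DPB output times resolved by ascending view_id) corresponds to a simple lexicographic sort, sketched below with an assumed OutputPicture type.

```cpp
#include <algorithm>
#include <vector>

// Assumed picture record carrying its DPB output time and view_id.
struct OutputPicture {
    double outputTime = 0.0;  // t_o,dpb( n )
    int viewId = 0;
};

// Pictures are output in ascending output time; pictures sharing the same
// DPB output time are output in ascending view_id order.
void sortIntoOutputOrder(std::vector<OutputPicture>& pendingOutput) {
    std::sort(pendingOutput.begin(), pendingOutput.end(),
              [](const OutputPicture& a, const OutputPicture& b) {
                  if (a.outputTime != b.outputTime)
                      return a.outputTime < b.outputTime;
                  return a.viewId < b.viewId;
              });
}
```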
[0063] The output of the current picture is specified as follows. If t_o,dpb( n ) = t_r( n ), then the current picture is output. (When the current picture is a reference picture, it is stored in the DPB.) Otherwise ( t_o,dpb( n ) > t_r( n ) ), the current picture is output later, is stored in the DPB (as specified in subclause G.8.4.2.4 of JVT-U209), and is output at time t_o,dpb( n ) unless indicated not to be output by the decoding or inference of no_output_of_prior_pics_flag equal to 1 at a time that precedes t_o,dpb( n ). [0064] The output picture is cropped using the cropping rectangle specified in the sequence parameter set for the sequence. When the picture n is a picture that is output and is not the last picture of the bitstream that is output, the value of Δt_o,dpb( n ) is defined by (C-13). The decoded picture is then temporarily stored (but not in the DPB).
[0065] The removal of pictures from the DPB before possible insertion of the current picture proceeds as follows. If a view switching is detected, then all reference pictures in the DPB belonging to the previous to-be-decoded views while not belonging to the current to-be-decoded views are marked as "unused for reference". If the decoded picture is an IDR picture, then all reference pictures in the DPB with the same view_id as the decoded picture are marked as "unused for reference" as specified in subclause 8.2.5.1 of JVT-U209. When the IDR picture is not the first IDR picture decoded, and when the value of PicWidthInMbs or FrameHeightInMbs or max_dec_frame_buffering derived from the active sequence parameter set is different from the value of PicWidthInMbs or FrameHeightInMbs or max_dec_frame_buffering derived from the sequence parameter set that was active for the preceding sequence, respectively, then no_output_of_prior_pics_flag is inferred to be equal to 1 by the HRD, regardless of the actual value of no_output_of_prior_pics_flag. (It should be noted that decoder implementations should try to handle frame or DPB size changes more gracefully than the HRD in regard to changes in PicWidthInMbs or FrameHeightInMbs.) When no_output_of_prior_pics_flag is equal to 1 or is inferred to be equal to 1, then all frame buffers in the DPB with the same view_id as the decoded picture are emptied without output of the pictures they contain, and the DPB fullness is decremented by the number of pictures emptied.
[0066] In the event that the decoded picture is not an IDR picture, the following applies. If the slice header of the current picture includes memory_management_control_operation equal to 5, then all reference pictures in the DPB with the same view_id as the decoded picture are marked as "unused for reference." Otherwise (i.e., when the slice header of the current picture does not include memory_management_control_operation equal to 5), the decoded reference picture marking process specified in subclause 8.2.5 of JVT-U209 is invoked. [0067] All pictures m in the DPB with the same view_id as the decoded picture, for which all of the following conditions are true, are removed from the DPB. The first condition is that picture m is marked as "unused for reference" or picture m is a non-reference picture. When a picture is a reference frame, it is considered to be marked as "unused for reference" only when both of its fields have been marked as "unused for reference." The second condition is that picture m is marked as "non-existing," it belongs to the set of K depended-on views (i.e., it is marked as "not needed for output"), or its DPB output time is less than or equal to the CPB removal time of the current picture n, i.e., t_o,dpb( m ) <= t_r( n ).
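The two removal conditions of paragraph [0067] can be expressed, for illustration only, as a single predicate over an assumed DpbPicture record; the timing fields mirror t_o,dpb( m ) and t_r( n ).

```cpp
// Assumed record of a picture m stored in the DPB.
struct DpbPicture {
    bool usedForReference = false;
    bool nonReference = false;
    bool nonExisting = false;
    bool neededForOutput = true;   // false once marked "not needed for output"
    double outputTime = 0.0;       // t_o,dpb( m )
};

// Both conditions must hold for picture m to be removed:
// (1) unused for reference, or a non-reference picture, and
// (2) "non-existing", not needed for output, or already due for output.
bool removableFromDpb(const DpbPicture& m, double cpbRemovalTimeOfCurrent /* t_r( n ) */) {
    bool cond1 = !m.usedForReference || m.nonReference;
    bool cond2 = m.nonExisting || !m.neededForOutput ||
                 m.outputTime <= cpbRemovalTimeOfCurrent;
    return cond1 && cond2;
}
```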
[0068] When a view switching is detected, the following applies. If the no_output_of_prior_views_flag of any anchor picture at the switching point and in the current (to-be-decoded) views is equal to 1, then all of the frame buffers containing decoded pictures that are before the switching point in decoding order and do not belong to the current (to-be-decoded) views are removed from the DPB. Otherwise (i.e., the no_output_of_prior_views_flag of all the anchor pictures at the switching point and in the current (to-be-decoded) views is equal to 0), all pictures m in the DPB that are before the switching point in decoding order and do not belong to the current (to-be-decoded) views, for which all of the following conditions are true, are removed from the DPB. [0069] The first condition is that picture m is either marked as "unused for reference" or is a non-reference picture. When a picture is a reference frame, it is considered to be marked as "unused for reference" only when both of its fields have been marked as "unused for reference." The second condition is that picture m is marked as "non-existing" or its DPB output time is less than or equal to the CPB removal time of the current picture n, i.e., t_o,dpb( m ) <= t_r( n ). [0070] When a frame or the last field in a frame buffer is removed from the DPB, the DPB fullness is decremented by one.
[0071] For current decoded picture marking and storage, the specifications in Annex C.2.4 of JVT-U209 apply independently for each view.
[0072] Figure 3 shows a generic multimedia communications system for use with the present invention. A data source 100 provides a source signal in an analog, uncompressed digital, or compressed digital format, or any combination of these formats. An encoder 110 encodes the source signal into a coded media bitstream. The encoder 110 may be capable of encoding more than one media type, such as audio and video, or more than one encoder 110 may be required to code different media types of the source signal. The encoder 110 may also receive synthetically produced input, such as graphics and text, or it may be capable of producing coded bitstreams of synthetic media. In the following, only processing of one coded media bitstream of one media type is considered to simplify the description. It should be noted, however, that typically real-time broadcast services comprise several streams (typically at least one audio, video and text sub-titling stream). It should also be noted that the system may include many encoders, but in the following only one encoder 110 is considered to simplify the description without a lack of generality. [0073] The coded media bitstream is transferred to a storage 120. The storage 120 may comprise any type of mass memory to store the coded media bitstream. The format of the coded media bitstream in the storage 120 may be an elementary self-contained bitstream format, or one or more coded media bitstreams may be encapsulated into a container file. Some systems operate "live", i.e. omit storage and transfer the coded media bitstream from the encoder 110 directly to the sender 130. The coded media bitstream is then transferred to the sender 130, also referred to as the server, on a need basis. The format used in the transmission may be an elementary self-contained bitstream format, a packet stream format, or one or more coded media bitstreams may be encapsulated into a container file. The encoder 110, the storage 120, and the server 130 may reside in the same physical device or they may be included in separate devices. The encoder 110 and server 130 may operate with live real-time content, in which case the coded media bitstream is typically not stored permanently, but rather buffered for small periods of time in the content encoder 110 and/or in the server 130 to smooth out variations in processing delay, transfer delay, and coded media bitrate.
[0074] The server 130 sends the coded media bitstream using a communication protocol stack. The stack may include, but is not limited to, the Real-Time Transport Protocol (RTP), the User Datagram Protocol (UDP), and the Internet Protocol (IP). When the communication protocol stack is packet-oriented, the server 130 encapsulates the coded media bitstream into packets. For example, when RTP is used, the server 130 encapsulates the coded media bitstream into RTP packets according to an RTP payload format. Typically, each media type has a dedicated RTP payload format. It should again be noted that a system may contain more than one server 130, but for the sake of simplicity the following description considers only one server 130.

[0075] The server 130 may or may not be connected to a gateway 140 through a communication network. The gateway 140 may perform different types of functions, such as translation of a packet stream according to one communication protocol stack to another communication protocol stack, merging and forking of data streams, and manipulation of data streams according to the downlink and/or receiver capabilities, such as controlling the bit rate of the forwarded stream according to prevailing downlink network conditions. Examples of gateways 140 include multipoint conference control units (MCUs), gateways between circuit-switched and packet-switched video telephony, Push-to-talk over Cellular (PoC) servers, IP encapsulators in digital video broadcasting-handheld (DVB-H) systems, and set-top boxes that forward broadcast transmissions locally to home wireless networks. When RTP is used, the gateway 140 is called an RTP mixer or an RTP translator and typically acts as an endpoint of an RTP connection. A gateway may also sometimes be referred to as a middlebox.
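To make the RTP encapsulation step concrete, the sketch below prepends the fixed 12-byte RTP header of RFC 3550 to an arbitrary payload. It deliberately ignores payload-format-specific details such as NAL unit fragmentation (for H.264, these are defined by the RTP payload format of RFC 6184), and the payload type, sequence number, and SSRC values are merely illustrative.

```python
import struct

def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
               payload_type: int = 96, marker: bool = False) -> bytes:
    """Prepend a minimal RTP header: version 2, no padding/extension, CC=0."""
    first_byte = 2 << 6                                   # V=2
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

packet = rtp_packet(b"coded-access-unit", seq=1, timestamp=90000, ssrc=0x4E4F4B49)
print(len(packet))   # 12-byte header + 17-byte payload = 29
```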
[0076] The system includes one or more receivers 150, typically capable of receiving, de-modulating, and de-capsulating the transmitted signal into a coded media bitstream. The coded media bitstream is typically processed further by a decoder 160, whose output is one or more uncompressed media streams. It should be noted that the bitstream to be decoded can be received from a remote device located within virtually any type of network. Additionally, the bitstream can be received from local hardware or software. Finally, a renderer 170 may reproduce the uncompressed media streams with a loudspeaker or a display, for example. The receiver 150, decoder 160, and renderer 170 may reside in the same physical device or they may be included in separate devices.
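The receiver-side chain can be sketched in the same spirit: a minimal de-capsulation step strips a 12-byte RTP header to recover the coded bitstream, and hypothetical Decoder and Renderer classes stand in for actual media decoding and presentation.

```python
import struct

def decapsulate(packet: bytes) -> bytes:
    """Strip a minimal 12-byte RTP header (assumes no CSRC list, extension, or padding)."""
    _flags, _pt, _seq, _timestamp, _ssrc = struct.unpack("!BBHII", packet[:12])
    return packet[12:]

class Decoder:
    def decode(self, access_unit: bytes) -> bytes:
        return access_unit.upper()            # placeholder for real decoding

class Renderer:
    def render(self, picture: bytes) -> None:
        print(f"rendering {len(picture)} decoded bytes")

# A hand-built packet stands in for the received, de-modulated signal.
received = struct.pack("!BBHII", 2 << 6, 96, 1, 90000, 0x1234) + b"coded-access-unit"
Renderer().render(Decoder().decode(decapsulate(received)))
```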
[0077] Figures 4 and 5 show one representative electronic device 12 within which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of electronic device 12 or other electronic device. The electronic device 12 of Figures 4 and 5 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an ear-piece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56, a memory 58 and a battery 80. Individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones. These circuits and components can be incorporated into virtually all of the devices discussed herein, including an encoder, a converter and a decoder.
[0078] Communication devices of the present invention may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, IEEE 802.11, etc. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, cable connection, and the like.
[0079] The various embodiments described herein are described in the general context of method steps or processes, which may be implemented in one embodiment by a computer program product, embodied in a computer-readable medium, including computer-executable instructions, such as program code, executed by computers in networked environments. A computer-readable medium may include removable and non-removable storage devices including, but not limited to, Read Only Memory (ROM), Random Access Memory (RAM), compact discs (CDs), digital versatile discs (DVDs), etc. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
[0080] Software and web implementations of various embodiments can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps or processes, correlation steps or processes, comparison steps or processes and decision steps or processes. It should be noted that the words "component" and "module," as used herein and in the following claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.
[0081] The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit embodiments of the present invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments discussed herein were chosen and described in order to explain the principles and the nature of various embodiments and their practical application, to enable one skilled in the art to utilize the present invention in various embodiments and with various modifications as are suited to the particular use contemplated. The features of the embodiments described herein may be combined in all possible combinations of methods, apparatus, modules, systems and computer program products.

Claims

WHAT IS CLAIMED IS:
1. A method for decoding coded pictures from a bitstream, each coded picture being associated with a perspective, comprising: associating, with a first coded picture, a change in decoded perspectives compared to previously decoded perspectives; in response to the association, deriving a decision for marking individual decoded pictures as being unused for reference; and based upon the decision, marking decoded pictures preceding the first coded picture, in decoding order, as unused for reference for the decoding of subsequent pictures.
2. The method of claim 1, wherein each perspective comprises a layer.
3. The method of claim 1, wherein each perspective comprises a view.
4. The method of claim 3, wherein the individual pictures being unused for reference are those decoded coded pictures preceding the first coded picture, in decoding order, and not residing among those views that are decoded from an access unit containing the first coded picture.
5. The method of claim 1, wherein the associated change is based upon an indication in the bitstream.
6. The method of claim 1, wherein the associated change is based upon the presence of perspectives in the first coded picture.
7. The method of claim 1, wherein the derived decision is always to mark decoded pictures preceding the first coded picture, in decoding order, as unused for reference for the decoding of subsequent pictures.
8. The method of claim 1, wherein the derived decision is to mark decoded pictures as unused for reference based upon an indication associated with the bitstream.
9. The method of claim 8, wherein the indication is included within the bitstream.
10. The method of claim 1, wherein the first coded picture and a second coded picture corresponding to the immediately previously decoded perspective are included within an access unit, and further comprising: decoding the second coded picture; decoding coded pictures succeeding the second coded picture in decoding order and preceding the second coded picture in output order and residing on the same perspective compared to the second coded picture; and decoding the first coded picture.
11. A computer program product, embodied in a computer-readable medium, for performing the processes of claim 1.
12. A decoding apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for associating, with a first coded picture, a change in decoded perspectives compared to previously decoded perspectives; computer code for, in response to the association, deriving a decision for marking individual decoded pictures as being unused for reference; and computer code for, based upon the decision, marking decoded pictures preceding the first coded picture, in decoding order, as unused for reference for the decoding of subsequent pictures.
13. The decoding apparatus of claim 12, wherein each perspective comprises a layer.
14. The decoding apparatus of claim 12, wherein each perspective comprises a view.
15. The decoding apparatus of claim 14, wherein the individual pictures being unused for reference are those decoded coded pictures preceding the first coded picture, in decoding order, and not residing among those views that are decoded from an access unit containing the first coded picture.
16. The decoding apparatus of claim 12, wherein the associated change is based upon an indication in the bitstream.
17. The decoding apparatus of claim 12, wherein the associated change is based upon the presence of perspectives in the first coded picture.
18. The decoding apparatus of claim 12, wherein the derived decision is always to mark decoded pictures preceding the first coded picture, in decoding order, as unused for reference for the decoding of subsequent pictures.
19. The decoding apparatus of claim 12, wherein the derived decision is to mark decoded pictures as unused for reference based upon an indication associated with the bitstream.
20. The decoding apparatus of claim 19, wherein the indication is included within the bitstream.
21. The decoding apparatus of claim 12, wherein the first coded picture and a second coded picture corresponding to the immediately previously decoded perspective are included within an access unit, and wherein the memory unit further comprises: computer code for decoding the second coded picture; computer code for decoding coded pictures succeeding the second coded picture in decoding order and preceding the second coded picture in output order and residing on the same perspective compared to the second coded picture; and computer code for decoding the first coded picture.
22. A method for encoding coded pictures to a bitstream, each coded picture being associated with a perspective, comprising: encoding a first picture into the bitstream; and noting a change in decoded perspectives compared to previously decoded perspectives with regard to the first coded picture.
23. The method of claim 22, wherein each perspective comprises a layer.
24. The method of claim 22, wherein each perspective comprises a view.
25. The method of claim 22, wherein the change is noted through an indication coded into the bitstream.
26. The method of claim 22, wherein the change is noted via the presence of perspectives in the first coded picture.
27. A computer program product, embodied in a computer-readable medium, for performing the processes of claim 22.
28. An encoding apparatus, comprising: a processor; and a memory unit communicatively connected to the processor and including: computer code for encoding a first picture into the bitstream; and computer code for noting a change in decoded perspectives compared to previously decoded perspectives with regard to the first coded picture.
29. The encoding apparatus of claim 28, wherein each perspective comprises a layer.
30. The encoding apparatus of claim 28, wherein each perspective comprises a view.
31. The encoding apparatus of claim 28, wherein the change is noted through an indication coded into the bitstream.
32. The encoding apparatus of claim 28, wherein the change is noted via the presence of perspectives in the first coded picture.
33. A decoding apparatus, comprising: means for associating, with a first coded picture, a change in decoded perspectives compared to previously decoded perspectives; means for, in response to the association, deriving a decision for marking individual decoded pictures as being unused for reference; and means for, based upon the decision, marking decoded pictures preceding the first coded picture, in decoding order, as unused for reference for the decoding of subsequent pictures.
34. An encoding apparatus, comprising: means for encoding a first picture into the bitstream; and means for noting a change in decoded perspectives compared to previously decoded perspectives with regard to the first coded picture.
PCT/IB2008/050053 2007-01-09 2008-01-08 System and method for implementing improved decoded picture buffer management for scalable video coding and multiview video coding WO2008084443A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US88418907P 2007-01-09 2007-01-09
US60/884,189 2007-01-09

Publications (1)

Publication Number Publication Date
WO2008084443A1 true WO2008084443A1 (en) 2008-07-17

Family

ID=39608410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2008/050053 WO2008084443A1 (en) 2007-01-09 2008-01-08 System and method for implementing improved decoded picture buffer management for scalable video coding and multiview video coding

Country Status (1)

Country Link
WO (1) WO2008084443A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004043071A1 (en) * 2002-11-06 2004-05-21 Nokia Corporation Picture buffering for prediction references and display
WO2006109154A1 (en) * 2005-04-13 2006-10-19 Nokia Corporation Coding of frame number in scalable video coding
US20070086521A1 (en) * 2005-10-11 2007-04-19 Nokia Corporation Efficient decoded picture buffer management for scalable video coding
WO2007080223A1 (en) * 2006-01-10 2007-07-19 Nokia Corporation Buffering of decoded reference pictures
WO2007110741A2 (en) * 2006-03-27 2007-10-04 Nokia Corporation Reference picture marking in scalable video encoding and decoding
WO2008005574A2 (en) * 2006-07-06 2008-01-10 Thomson Licensing Method and apparatus for decoupling frame number and/or picture order count (poc) for multi-view video encoding and decoding
WO2008047303A2 (en) * 2006-10-16 2008-04-24 Nokia Corporation System and method for implementing efficient decoded buffer management in multi-view video coding
WO2008047316A1 (en) * 2006-10-20 2008-04-24 Nokia Corporation Virtual decoded reference picture marking and reference picture list

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2504917C2 (en) * 2008-10-07 2014-01-20 Телефонактиеболагет Лм Эрикссон (Пабл) Media container file
WO2013082438A1 (en) * 2011-11-30 2013-06-06 Qualcomm Incorporated Depth component removal for multiview video coding (mvc) compatible three-dimensional video coding (3dvc)
US10200708B2 (en) 2011-11-30 2019-02-05 Qualcomm Incorporated Sequence level information for multiview video coding (MVC) compatible three-dimensional video coding (3DVC)
US10158873B2 (en) 2011-11-30 2018-12-18 Qualcomm Incorporated Depth component removal for multiview video coding (MVC) compatible three-dimensional video coding (3DVC)
US10154276B2 (en) 2011-11-30 2018-12-11 Qualcomm Incorporated Nested SEI messages for multiview video coding (MVC) compatible three-dimensional video coding (3DVC)
CN104137550A (en) * 2011-11-30 2014-11-05 高通股份有限公司 Depth component removal for multiview video coding (mvc) compatible three-dimensional video coding (3dvc)
US9473752B2 (en) 2011-11-30 2016-10-18 Qualcomm Incorporated Activation of parameter sets for multiview video coding (MVC) compatible three-dimensional video coding (3DVC)
US9237344B2 (en) 2012-03-22 2016-01-12 Qualcomm Incorporated Deriving context for last position coding for video coding
US10051264B2 (en) 2012-04-20 2018-08-14 Qualcomm Incorporated Marking reference pictures in video sequences having broken link pictures
US9979959B2 (en) 2012-04-20 2018-05-22 Qualcomm Incorporated Video coding with enhanced support for stream adaptation and splicing
US9979958B2 (en) 2012-04-20 2018-05-22 Qualcomm Incorporated Decoded picture buffer processing for random access point pictures in video sequences
RU2630173C2 (en) * 2012-04-20 2017-09-05 Квэлкомм Инкорпорейтед Video coding with extended support for adaptation of flow and connection
RU2630181C2 (en) * 2012-04-20 2017-09-05 Квэлкомм Инкорпорейтед Marking of reference images in video sequences having images with discharged link
US9479776B2 (en) 2012-07-02 2016-10-25 Qualcomm Incorporated Signaling of long-term reference pictures for video coding
KR20150103117A (en) * 2013-01-04 2015-09-09 퀄컴 인코포레이티드 Multi-resolution decoded picture buffer management for multi-layer coding
WO2014107583A1 (en) * 2013-01-04 2014-07-10 Qualcomm Incorporated Multi-resolution decoded picture buffer management for multi-layer coding
KR101724222B1 (en) 2013-01-04 2017-04-06 퀄컴 인코포레이티드 Multi-resolution decoded picture buffer management for multi-layer video coding
CN104885459B (en) * 2013-01-04 2018-12-04 高通股份有限公司 Multiresolution decoded picture buffer management for multi-level decoding
CN104885459A (en) * 2013-01-04 2015-09-02 高通股份有限公司 Multi-resolution decoded picture buffer management for multi-layer coding
US10616573B2 (en) 2013-01-07 2020-04-07 Nokia Technologies Oy Method and apparatus for video coding and decoding
CN105027567A (en) * 2013-01-07 2015-11-04 诺基亚技术有限公司 Method and apparatus for video coding and decoding
US20140219346A1 (en) * 2013-01-07 2014-08-07 Nokia Corporation Method and apparatus for video coding and decoding
WO2014106692A1 (en) * 2013-01-07 2014-07-10 Nokia Corporation Method and apparatus for video coding and decoding
US10194146B2 (en) 2013-03-26 2019-01-29 Qualcomm Incorporated Device and method for scalable coding of video information
JP2016526855A (en) * 2013-07-11 2016-09-05 クゥアルコム・インコーポレイテッドQualcomm Incorporated Device and method for scalable coding of video information
WO2015006168A1 (en) * 2013-07-11 2015-01-15 Qualcomm Incorporated Device and method for scalable coding of video information
CN105519111A (en) * 2013-07-11 2016-04-20 高通股份有限公司 Device and method for scalable coding of video information
FR3042368A1 (en) * 2015-10-08 2017-04-14 Orange MULTI-VIEW ENCODING AND DECODING METHOD, MULTI-VIEW ENCODING AND DECODING DEVICE AND CORRESPONDING COMPUTER PROGRAMS
WO2017060587A1 (en) * 2015-10-08 2017-04-13 Orange Multi-view coding and decoding
US10893295B2 (en) 2015-10-08 2021-01-12 Orange Multi-view coding and decoding
WO2020130922A1 (en) * 2018-12-20 2020-06-25 Telefonaktiebolaget Lm Ericsson (Publ) Normative indication of recovery point
US11956471B2 (en) 2018-12-20 2024-04-09 Telefonaktiebolaget Lm Ericsson (Publ) Normative indication of recovery point

Similar Documents

Publication Publication Date Title
US10986357B2 (en) Signaling change in output layer sets
CA2666452C (en) System and method for implementing efficient decoded buffer management in multi-view video coding
WO2008084443A1 (en) System and method for implementing improved decoded picture buffer management for scalable video coding and multiview video coding
US8855199B2 (en) Method and device for video coding and decoding
JP4903877B2 (en) System and method for providing a picture output indicator in video encoding
US9712833B2 (en) System and method for indicating temporal layer switching points
US20070230567A1 (en) Slice groups and data partitioning in scalable video coding
US20070086521A1 (en) Efficient decoded picture buffer management for scalable video coding
US20170134742A1 (en) Slice type and decoder conformance
WO2008007337A2 (en) Scalable video coding and decoding
AU2016201810B2 (en) System and method for implementing efficient decoded buffer management in multi-view video coding
AU2012216719B2 (en) System and method for implementing efficient decoded buffer management in multi-view video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08702380

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08702380

Country of ref document: EP

Kind code of ref document: A1