KR20080066784A

KR20080066784A - Efficient decoded picture buffer management for scalable video coding

Info

Publication number: KR20080066784A
Application number: KR1020087011093A
Authority: KR
Inventors: Ye-Kui Wang; Miska Hannuksela; Stephan Wenger
Original assignee: Nokia Corporation
Priority date: 2005-10-11
Filing date: 2006-10-11
Publication date: 2008-07-16
Also published as: JP2009512306A; CN101317459A; WO2007042914A1; US20070086521A1; EP1949701A1

Abstract

A system and method for enabling the removal of decoded pictures from a decoded picture buffer as soon as the decoded pictures are no longer needed for prediction reference and future output. An indication is introduced into the bitstream as to whether a picture may be used for inter-layer prediction reference, together with a decoded picture buffer management method that uses the indication. The present invention includes a process for marking a picture as used for inter-layer reference or unused for inter-layer reference, a storage process of decoded pictures into the decoded picture buffer, a marking process of reference pictures, and output and removal processes of decoded pictures from the decoded picture buffer.

Description

Efficient decoded picture buffer management for scalable video coding

The present invention relates to the field of video coding. More specifically, the present invention relates to scalable video coding.

Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC). Work is also under way on new video coding standards. One of these is the scalable video coding (SVC) standard, which will become the scalable extension of H.264/AVC. Another effort involves the development of Chinese video coding standards.

Scalable video coding can provide scalable video bit streams. A portion of a scalable video bitstream can be extracted and decoded with degraded playback visual quality. In today's concepts, a scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. An enhancement layer may enhance the temporal resolution (i.e., the frame rate) or the spatial resolution. It may also simply enhance the quality of the video content represented by its lower layer or part of that layer. In some cases, the data of an enhancement layer can be truncated after a certain location, or even at arbitrary positions. Each truncation position may include additional data representing increasingly enhanced visual quality. Such scalability is referred to as fine-grained (granularity) scalability (FGS). In contrast to FGS, the scalability provided by a quality enhancement layer that does not provide fine-grained scalability is referred to as coarse-grained scalability (CGS). Base layers can also be designed to be FGS scalable. However, no current video compression standard or draft standard implements this concept.

The scalable layer structure in the current draft SVC standard is characterized by three variables, referred to as temporal_level, dependency_id and quality_level. These are signaled in the bit stream or can be derived according to the specification.

- temporal_level indicates the temporal scalability or frame rate. A layer comprising pictures with a smaller temporal_level value has a lower frame rate than a layer comprising pictures with a larger temporal_level value.
- dependency_id indicates the inter-layer coding dependency hierarchy. At any temporal location, a picture with a smaller dependency_id value may be used for inter-layer prediction when coding a picture with a larger dependency_id value.
- quality_level indicates the FGS layer hierarchy. At any temporal location and with an identical dependency_id value, an FGS picture with quality_level equal to QL uses the picture with quality_level equal to QL-1 for inter-layer prediction. That picture is either an FGS picture or, when QL-1 = 0, the base quality (non-FGS) picture.
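The prediction rules above can be captured in a small predicate. This is a minimal sketch under stated assumptions. The names `LayerId` and `may_predict_from` are hypothetical, both pictures are assumed to lie at the same temporal location, and the FGS rule is simplified to "quality_level QL predicts only from QL-1" within one dependency_id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerId:
    """Hypothetical container for the three SVC layer variables."""
    temporal_level: int
    dependency_id: int
    quality_level: int


def may_predict_from(ref: LayerId, cur: LayerId) -> bool:
    """Return True if a picture in layer `ref` may serve as an inter-layer
    prediction reference for a picture in layer `cur` at the same temporal
    location (simplified reading of the rules described above)."""
    if ref.dependency_id < cur.dependency_id:
        # A smaller dependency_id may be used to predict a larger one.
        return True
    if ref.dependency_id == cur.dependency_id:
        # Same dependency_id: quality_level QL predicts from QL - 1,
        # where QL - 1 == 0 is the base quality (non-FGS) picture.
        return cur.quality_level >= 1 and ref.quality_level == cur.quality_level - 1
    # A higher dependency_id is never used to predict a lower one.
    return False
```

Temporal_level is carried along but ignored here, because the check assumes both pictures share one temporal location.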

Figure 1 depicts a temporal segment of an exemplary scalable video stream, showing the values of the three variables discussed above. The time values are relative. For example, time = 0 does not necessarily mean the time of the first picture in display order in the bit stream. A typical prediction reference relationship for this example is shown in Figure 2. In that figure, solid arrows indicate the inter prediction reference relationship in the horizontal direction, and dashed block arrows indicate the inter-layer prediction reference relationship. The pointed-to instance uses the instance at the other end of the arrow for prediction reference.

As discussed herein, a layer is defined as the set of pictures that share the same temporal_level, dependency_id and quality_level values. To decode and play back an enhancement layer, its lower layers, typically including the base layer, must also be available. This is because the lower layers may be used directly or indirectly for inter-layer prediction when the enhancement layer is decoded. The following examples from Figures 1 and 2 use the notation (t, T, D, Q):

- Pictures (0,0,0,0) and (8,0,0,0) belong to the base layer, which can be decoded independently of any enhancement layers.
- The picture (4,1,0,0) belongs to an enhancement layer that doubles the frame rate of the base layer. Decoding this layer requires the base layer pictures.
- Pictures (0,0,0,1) and (8,0,0,1) belong to an enhancement layer that improves the quality and bit rate of the base layer in the FGS manner. Decoding this layer also requires the base layer pictures.
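The layer dependencies in this example can be illustrated by selecting the pictures needed for a given target layer. The sketch below is hypothetical. The names `Pic` and `extract` are illustrative, and the selection rule is an assumption: it keeps every picture whose temporal_level does not exceed the target, every quality level of lower dependency_id values, and quality levels up to the target within the target dependency_id. It is not the normative SVC extraction process.

```python
from typing import List, NamedTuple


class Pic(NamedTuple):
    t: int  # relative time
    T: int  # temporal_level
    D: int  # dependency_id
    Q: int  # quality_level


def extract(pictures: List[Pic], max_T: int, max_D: int, max_Q: int) -> List[Pic]:
    """Keep the pictures assumed necessary to decode layer (max_T, max_D, max_Q)."""
    return [
        p for p in pictures
        if p.T <= max_T and (p.D < max_D or (p.D == max_D and p.Q <= max_Q))
    ]


# The five pictures named in the text for Figures 1 and 2.
pics = [Pic(0, 0, 0, 0), Pic(8, 0, 0, 0), Pic(4, 1, 0, 0), Pic(0, 0, 0, 1), Pic(8, 0, 0, 1)]
```

For example, `extract(pics, 0, 0, 0)` yields only the two base layer pictures. `extract(pics, 1, 0, 0)` adds (4,1,0,0), and `extract(pics, 0, 0, 1)` adds the two FGS pictures on top of the base layer.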

In the current draft SVC standard, a coded picture in a spatial or CGS enhancement layer includes an indication of its inter-layer prediction reference (i.e., the base_id_plus1 syntax element in the slice header). Inter-layer prediction covers the coding mode, motion information and sample residual. Using inter-layer prediction can significantly improve the coding efficiency of enhancement layers. Inter-layer prediction always uses lower layers as the reference for prediction. In other words, a higher layer is never required for decoding a lower layer.

In a scalable video bitstream, an enhancement layer picture may freely choose which lower layer to use for inter-layer prediction. For example, suppose there are three layers, base_layer_0, CGS_layer_1 and spatial_layer_2, all with the same frame rate. A picture in spatial_layer_2 may then select either of its lower layers (base_layer_0 or CGS_layer_1) for inter-layer prediction.

A typical inter-layer prediction dependency hierarchy is shown in Figure 3. In Figure 3, inter-layer prediction is indicated by arrows pointing in the direction of dependency, and a pointed-to object requires the pointed-from object for inter-layer prediction. The pair of values to the right of each layer gives the dependency_id and quality_level values as specified in the current draft SVC standard. However, a picture in spatial_layer_2 may also choose to use base_layer_0 for inter-layer prediction, as shown in Figure 4. It is also possible for a picture in spatial_layer_2 to select base_layer_0 for inter-layer prediction while the picture in CGS_layer_1 at the same temporal location uses no inter-layer prediction at all, as shown in Figure 5.

When FGS layers are involved, the inter-layer prediction for coding mode and motion information may come from a different base layer than the inter-layer prediction for sample residual. For example, as shown in Figure 6, a spatial_layer_2 picture may take its inter-layer prediction for coding mode and motion information from a CGS_layer_1 picture. Its inter-layer prediction for sample residual then comes from an FGS_layer_1_1 picture. In another example, shown in Figure 7, the inter-layer prediction for coding mode and motion still comes from a CGS_layer_1 picture, while the inter-layer prediction for sample residual comes from an FGS_layer_1_0 picture. Expressed more conceptually, these relationships can be represented as the inter-layer prediction for coding mode, motion information and sample residual all coming from the same FGS layer. Figures 8 and 9 show this for the two examples, respectively.

In video coding standards, a bit stream is defined as compliant when it can be decoded by a hypothetical reference decoder. This decoder is conceptually connected to the output of an encoder and comprises a pre-decoder buffer, a decoder and an output/display unit. It is known as the hypothetical reference decoder (HRD) in H.263 and H.264, and as the video buffering verifier (VBV) in MPEG. Annex G of the 3GPP packet-switched streaming service standard (3GPP TS 26.234) specifies a server buffering verifier that can also be regarded as an HRD. The difference is that it is conceptually connected to the output of a streaming server. Throughout this specification, technologies such as the virtual decoder and the buffering verifier are collectively referred to as the hypothetical reference decoder (HRD).

A stream is compliant if the HRD can decode it without buffer overflow or underflow. Buffer overflow occurs when more bits must be placed into the buffer while it is already full. Buffer underflow occurs when the buffer is empty at the moment bits must be fetched from it for decoding or playback.
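The overflow and underflow conditions can be made concrete with a toy leaky-bucket check of a pre-decoder (coded picture) buffer. This is a simplified sketch, not the H.264 Annex C algorithm. It assumes bits arrive at a constant rate from time 0 until the whole stream has arrived, and that each picture is removed instantaneously at its removal time. The function name `check_cpb` is hypothetical.

```python
def check_cpb(sizes_bits, removal_times_s, rate_bps, cpb_size_bits):
    """Return "compliant", or a message naming the first picture at which the
    coded picture buffer overflows or underflows (simplified model)."""
    total_bits = sum(sizes_bits)
    removed_bits = 0
    for i, (size, t) in enumerate(zip(sizes_bits, removal_times_s)):
        # Bits arrive at a constant rate until the whole stream has arrived.
        arrived_bits = min(rate_bps * t, total_bits)
        # Fullness just before picture i is removed; with constant-rate
        # arrival and instantaneous removal this is where fullness peaks.
        fullness = arrived_bits - removed_bits
        if fullness > cpb_size_bits:
            return f"overflow before picture {i}"
        if fullness < size:
            # Picture i has not fully arrived when it must be decoded.
            return f"underflow at picture {i}"
        removed_bits += size
    return "compliant"
```

For example, three pictures of 1000, 500 and 500 bits removed at 1, 2 and 3 seconds from a 1000 bit/s stream fit in a 1500-bit buffer. Moving the first removal to 0.5 s causes an underflow, and shrinking the buffer to 800 bits causes an overflow.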

HRD parameters can be used to impose constraints on the encoded sizes of pictures and to help determine the required buffer sizes and start-up delay.

In HRD specifications earlier than PSS Annex G and H.264, only the operation of the pre-decoder buffer is specified. In H.264 this buffer is called the coded picture buffer (CPB). The HRDs in PSS Annex G and in H.264 also specify the operation of the post-decoder buffer, which H.264 calls the decoded picture buffer (DPB). Furthermore, earlier HRD specifications allow only one HRD operation point, whereas the HRDs in PSS Annex G and H.264 allow multiple HRD operation points. Each HRD operation point corresponds to a set of HRD parameter values.

According to the draft SVC standard, decoded pictures used for predicting subsequent coded pictures and for future output are buffered in the decoded picture buffer (DPB). To use the buffer memory efficiently, DPB management processes are specified. These include the storage process of decoded pictures into the DPB, the marking process of reference pictures, and the output and removal processes of decoded pictures from the DPB.

The DPB management processes specified in the current draft SVC standard cannot efficiently manage decoded pictures that must be buffered for inter-layer prediction, particularly when those pictures are non-reference pictures. This is because the DPB management processes were designed for traditional single-layer coding, which supports at most temporal scalability.

In traditional single-layer coding such as H.264/AVC, some decoded pictures must be buffered for inter prediction reference or future output. These can be removed from the buffer once they are no longer needed for either purpose. The reference picture marking process is specified so that it is known as soon as a reference picture is no longer needed for inter prediction reference. This allows the picture to be removed as soon as it is needed for neither inter prediction reference nor future output.

For pictures used for inter-layer prediction reference, however, no current mechanism tells the decoder, as early as possible, that a picture is no longer needed for that purpose. One possible approach is to remove certain pictures from the DPB after decoding each picture of the desired scalable layer. A picture would be removed if all of the following conditions are true:

1) the picture is a non-reference picture;
2) the picture is in the same access unit as the just-decoded picture; and
3) the picture is in a layer lower than the desired scalable layer.

This removal happens only after the picture of the desired layer has been decoded. As a result, pictures used for inter-layer prediction reference stay buffered in the DPB longer than necessary, which reduces the efficiency of buffer memory usage. For example, the required DPB becomes larger than technically necessary.
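The approach described above can be sketched as a filter applied after each picture of the desired layer is decoded. This is a hypothetical illustration. The names `DecodedPic` and `naive_cleanup` are invented. "Lower layer" is assumed to mean a smaller (dependency_id, quality_level) pair compared lexicographically, and `is_reference` stands for "used for inter prediction reference".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecodedPic:
    access_unit: int
    dependency_id: int
    quality_level: int
    is_reference: bool  # needed for inter prediction reference


def naive_cleanup(dpb: List[DecodedPic], just_decoded: DecodedPic,
                  target: Tuple[int, int]) -> List[DecodedPic]:
    """After decoding `just_decoded` in the desired layer `target`
    (dependency_id, quality_level), drop every picture meeting all three
    conditions listed above and return the pictures that stay."""
    def removable(p: DecodedPic) -> bool:
        return (not p.is_reference                              # 1) non-reference
                and p.access_unit == just_decoded.access_unit   # 2) same access unit
                and (p.dependency_id, p.quality_level) < target)  # 3) lower layer
    return [p for p in dpb if not removable(p)]
```

Until the desired-layer picture arrives, nothing in the access unit is freed, even a lower-layer picture whose last inter-layer use has already passed. That delay is the inefficiency described in the text.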

In addition, in scalable video coding, decoded pictures of any scalable layer lower than the layer desired for playback are never output. Storing such pictures in the DPB when they are not needed for inter prediction or inter-layer prediction is therefore simply a waste of buffer memory.

It would therefore be desirable to provide a system and method for removing decoded pictures from the DPB as soon as they are no longer needed for prediction reference (inter prediction or inter-layer prediction) and future output.

The present invention provides a system and method that enable decoded pictures to be removed from the DPB as soon as they are no longer needed for inter prediction reference, inter-layer prediction reference and future output. The system and method introduce into the bitstream an indication of whether a picture may be used for inter-layer prediction reference, together with a DPB management method that uses this indication. The DPB management method includes:

- a process for marking a picture as used or unused for inter-layer reference;
- a storage process of decoded pictures into the DPB;
- a marking process of reference pictures; and
- output and removal processes of decoded pictures from the DPB.

A new memory management control operation (MMCO) is defined, and its signaling in the bit stream is specified. The MMCO marks a picture as unused for inter-layer reference, so that the decoder knows as soon as possible when a picture is no longer needed for inter-layer prediction reference.
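The DPB management idea can be sketched as a buffer that keeps three markings per picture and frees it once all three are cleared. This is an illustrative model, not the normative process. The names `StoredPic` and `LayeredDpb` are hypothetical. Only pictures of the desired (dependency_id, quality_level) layer are treated as needed for output. `end_access_unit` is also an assumption: it clears the inter-layer marking of all pictures in an access unit once that unit has been decoded, on the basis that inter-layer prediction stays within one access unit.

```python
from dataclasses import dataclass


@dataclass
class StoredPic:
    access_unit: int
    dependency_id: int
    quality_level: int
    used_for_inter_ref: bool        # marked "used for (inter prediction) reference"
    used_for_inter_layer_ref: bool  # initialised from inter_layer_ref_flag
    needed_for_output: bool


class LayeredDpb:
    """Hypothetical DPB that frees a picture as soon as none of its uses remain."""

    def __init__(self, target_dependency_id, target_quality_level):
        self.target = (target_dependency_id, target_quality_level)
        self.pics = []

    def store(self, au, dependency_id, quality_level, is_reference,
              inter_layer_ref_flag, inter_layer_mmco=()):
        # Each (dependency_id[i], quality_level[i]) pair carried by the current
        # picture marks the matching picture of the same access unit as
        # "unused for inter-layer prediction".
        for dep_i, q_i in inter_layer_mmco:
            for p in self.pics:
                if p.access_unit == au and (p.dependency_id, p.quality_level) == (dep_i, q_i):
                    p.used_for_inter_layer_ref = False
        # Pictures of layers below the desired layer are never output.
        pic = StoredPic(au, dependency_id, quality_level, is_reference,
                        bool(inter_layer_ref_flag),
                        needed_for_output=(dependency_id, quality_level) == self.target)
        self.pics.append(pic)
        self._remove_unneeded()
        return pic

    def output(self, pic):
        pic.needed_for_output = False
        self._remove_unneeded()

    def end_access_unit(self, au):
        # Assumption: inter-layer prediction stays within one access unit.
        for p in self.pics:
            if p.access_unit == au:
                p.used_for_inter_layer_ref = False
        self._remove_unneeded()

    def _remove_unneeded(self):
        self.pics = [p for p in self.pics
                     if p.used_for_inter_ref or p.used_for_inter_layer_ref
                     or p.needed_for_output]
```

In this model, the lower-layer pictures of an access unit are freed as soon as the picture carrying the matching MMCO is stored. They do not wait for the end of the access unit.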

The present invention provides a decoded picture buffer management process that can reduce the memory required for decoding scalable video bit streams. It can be used in the context of the scalable extension of the H.264/AVC video coding standard, as well as in other scalable video coding schemes.

These and other advantages and features of the invention, together with its organization and manner of operation, will become apparent from the following detailed description taken in conjunction with the accompanying drawings. Like elements have like reference numerals throughout the drawings described below.

Figure 1 shows a temporal segment of an exemplary scalable video stream, with the values of the three variables temporal_level, dependency_id and quality_level indicated.

Figure 2 shows a typical prediction reference relationship for the temporal segment depicted in Figure 1.

Figure 3 is a representation of a typical inter-layer prediction dependency hierarchy. An arrow indicates that the pointed-to object uses the pointed-from object for inter-layer prediction reference.

Figure 4 is a flow chart showing how a picture in spatial_layer_2 may also choose to use base_layer_0 for inter-layer prediction.

Figure 5 shows an example in which a picture in spatial_layer_2 selects base_layer_0 for inter-layer prediction, while the picture in CGS_layer_1 at the same temporal location uses no inter-layer prediction.

Figure 6 shows an example in which the inter-layer prediction for coding mode and motion information comes from a different base layer than the inter-layer prediction for sample residual.

Figure 7 shows an example in which, for a spatial_layer_2 picture, the inter-layer prediction for coding mode and motion comes from a CGS_layer_1 picture, while the inter-layer prediction for sample residual comes from an FGS_layer_1_0 picture.

Figure 8 shows an example in which the inter-layer prediction for coding mode, motion information and sample residual all comes from an FGS_layer_1_1 picture, with the coding mode and motion information inherited from the base quality layer.

Figure 9 shows an example in which the inter-layer prediction for coding mode, motion information and sample residual all comes from an FGS_layer_1_0 picture, with the coding mode and motion information inherited from the base quality layer.

Figure 10 shows an example of the status evolution process for a number of coded pictures in an access unit, according to conventionally known systems.

Figure 11 shows an example of the status evolution process for a number of coded pictures in an access unit, according to the system and method of the present invention.

Figure 12 is a schematic diagram of a system in which the present invention may be implemented.

Figure 13 is a front view of an electronic device that may incorporate the principles of the present invention.

Figure 14 is a schematic representation of the circuitry of the electronic device of Figure 13.

Figure 15 illustrates a generic multimedia data streaming system to which the scalable coding hierarchy of the present invention can be applied.

Referring to Figure 15, a typical multimedia streaming system, which is one system to which the procedures of the present invention can be applied, is described below.

A multimedia data streaming system typically comprises one or more multimedia sources 100. Examples are a video camera and a microphone, or video image or computer graphics files stored in a memory carrier. Raw data obtained from the different multimedia sources 100 is combined into a multimedia file in an encoder 102, which can also be referred to as an editor. The raw data from the one or more multimedia sources 100 is first captured using capturing means 104 included in the encoder 102. The capturing means are typically implemented as various interface cards, driver software, or application software controlling the function of a card. For example, video data may be captured using a video capture card and the associated software. The output of the capturing means 104 is typically an uncompressed or slightly compressed data flow. When a video capture card is used, this is, for example, uncompressed video frames in YUV 4:2:0 format or in Motion JPEG image format.

An editor 106 links the different media flows together, synchronizing the video and audio flows so that they can be reproduced simultaneously as desired. The editor 106 may also edit each media flow, such as a video flow, for example by halving the frame rate or by reducing the spatial resolution. The media flows, although synchronized, are compressed separately in a compressor 108, using for each media flow a compressor suitable for that flow. For example, video frames in YUV 4:2:0 format may be compressed using ITU-T Recommendation H.263 or H.264. The separate, synchronized and compressed media flows are typically interleaved in a multiplexer 110. The output of the encoder 102 is then a single bit flow containing the data of several media flows, which may be referred to as a multimedia file. Forming a multimedia file does not necessarily require multiplexing the media flows into a single file. Instead, the streaming server may interleave the media flows just before transmitting them.
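Interleaving several compressed media flows into one bit flow can be illustrated with a timestamp-ordered merge. This is a sketch under assumptions not stated in the text. Each flow is modeled as a list of hypothetical `(timestamp, media_type, payload)` tuples that is already sorted by timestamp, and the multiplexer is assumed to interleave purely by timestamp. The merge uses the standard-library `heapq.merge`.

```python
import heapq


def interleave(*flows):
    """Merge time-ordered media flows into a single flow ordered by timestamp.

    Each flow is assumed to be an iterable of (timestamp, media_type, payload)
    tuples that is already sorted by timestamp."""
    return list(heapq.merge(*flows, key=lambda unit: unit[0]))


video = [(0, "video", "I0"), (40, "video", "P1")]
audio = [(0, "audio", "A0"), (20, "audio", "A1"), (40, "audio", "A2")]
merged = interleave(video, audio)
```

A real multiplexer would also add container headers and handle decoding versus presentation order, which this sketch omits.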

The multimedia files are transferred to a streaming server 112, which can carry out the streaming either as real-time streaming or as progressive downloading. In progressive downloading, the multimedia files are first stored in the memory of the server 112, from which they are retrieved for transmission when needed. In real-time streaming, the editor 102 transmits a continuous media flow of multimedia files to the streaming server 112, and the server 112 forwards the flow directly to a client 114. As a further option, the multimedia files may be stored in storage accessible from the server 112. Real-time streaming is then driven from that storage, and a continuous media flow of the multimedia files is started when needed. In that case, the editor 102 does not necessarily control the streaming at all.
The streaming server 112 shapes the traffic of the multimedia data according to the available bandwidth or the maximum decoding and playback rate of the client 114. For example, it can adjust the bit rate of the media flow by leaving B-frames out of the transmission or by adjusting the number of scalability layers transmitted. The streaming server 112 may also modify the header fields of a multiplexed media flow to reduce their size. It may then encapsulate the multimedia data into data packets suitable for transmission over the telecommunications network in use. The client 114 can typically adjust the operation of the server 112, at least to some extent, using a suitable control protocol. At a minimum, the client 114 can control the server 112 so that a desired multimedia file is selected for transmission to the client. In addition, the client can typically stop and interrupt the transmission of a multimedia file.
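One way a streaming server could adjust the number of scalability layers is to send the largest prefix of layers whose cumulative bit rate fits the client's available bandwidth. This is a hypothetical sketch. The function name `layers_to_send` is invented, each list entry is assumed to be the incremental bit rate of one layer (base layer first), and real servers may apply more elaborate policies.

```python
def layers_to_send(layer_bitrates_kbps, available_kbps):
    """Return how many layers (base layer first) fit within the available
    bandwidth, given each layer's incremental bit rate in kbit/s."""
    total = 0
    count = 0
    for rate in layer_bitrates_kbps:
        if total + rate > available_kbps:
            # Higher layers depend on this one, so stop at the first misfit.
            break
        total += rate
        count += 1
    return count
```

The loop stops at the first layer that does not fit, because enhancement layers cannot be decoded without the layers below them.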

The following text describes one specific embodiment of the present invention, written in the text format used for the SVC standard. In this embodiment, the decoded reference picture marking syntax is as follows.

Decoded reference picture marking syntax

The slice header in scalable extension syntax is as follows.

Slice header in scalable extension syntax

In the decoded reference picture marking semantics, "num_inter_layer_mmco" specifies the number of memory_management_control operations used to mark decoded pictures in the DPB as "unused for inter-layer prediction". "dependency_id[i]" indicates the dependency_id of the picture to be marked as "unused for inter-layer prediction". dependency_id[i] is less than or equal to the dependency_id of the current picture. "quality_level[i]" indicates the quality_level of the picture to be marked as "unused for inter-layer prediction". When dependency_id[i] is equal to dependency_id, quality_level[i] is less than quality_level. The decoded picture that is in the same access unit as the current picture and has dependency_id equal to dependency_id[i] and quality_level equal to quality_level[i] shall have inter_layer_ref_flag equal to 1.
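The ordering constraints above lend themselves to a mechanical check. The sketch below is only an illustration under stated assumptions, not part of the described embodiment. It assumes the inter-layer loop of dec_ref_pic_marking() has already been parsed into a list of `(dependency_id[i], quality_level[i])` pairs, and `InterLayerMmco` and `check_inter_layer_mmco` are hypothetical names. The third constraint, that the target picture has inter_layer_ref_flag equal to 1, depends on the DPB contents and is not checked here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InterLayerMmco:
    """Hypothetical parsed form of the inter-layer MMCO loop in dec_ref_pic_marking()."""
    # Each entry is (dependency_id[i], quality_level[i]).
    pairs: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def num_inter_layer_mmco(self) -> int:
        return len(self.pairs)


def check_inter_layer_mmco(mmco: InterLayerMmco,
                           cur_dependency_id: int,
                           cur_quality_level: int) -> bool:
    """Return True if every signalled pair satisfies the ordering constraints."""
    for dep_i, qual_i in mmco.pairs:
        # dependency_id[i] must be less than or equal to the current dependency_id.
        if dep_i > cur_dependency_id:
            return False
        # With an equal dependency_id, quality_level[i] must be strictly smaller.
        if dep_i == cur_dependency_id and qual_i >= cur_quality_level:
            return False
    return True
```

For example, a picture with dependency_id 1 and quality_level 1 may target pairs (0, 0) and (1, 0), but it may not target (1, 1) or (2, 0).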

When present, the values of the slice header syntax elements pic_parameter_set_id, frame_num, inter_layer_ref_flag, field_pic_flag, bottom_field_flag, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_bottom, delta_pic_order_cnt[0], delta_pic_order_cnt[1], and slice_group_change_cycle in scalable extension syntax are the same in all slice headers of a coded picture. "frame_num" has the same semantics as frame_num in subclause S.7.4.3 of the current SVC standard draft. "inter_layer_ref_flag" equal to 0 specifies that the current picture is not used as an inter-layer prediction reference for decoding any picture with a greater dependency_id value than that of the current picture. "inter_layer_ref_flag" equal to 1 specifies that the current picture may be used as an inter-layer prediction reference for decoding pictures with a greater dependency_id value than that of the current picture. "field_pic_flag" has the same semantics as field_pic_flag in subclause S.7.4.3 of the current SVC standard draft.
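A decoder could enforce the first rule above by checking that the picture-level elements agree across every slice of a coded picture. This is a minimal sketch that assumes each slice header has already been parsed into a plain dict keyed by syntax element name. Both the dict representation and the function name are hypothetical.

```python
from typing import Dict, List

# Elements that, when present, must be identical in all slice headers of one
# coded picture (names as listed in the text above).
PICTURE_LEVEL_ELEMENTS = (
    "pic_parameter_set_id", "frame_num", "inter_layer_ref_flag",
    "field_pic_flag", "bottom_field_flag", "idr_pic_id", "pic_order_cnt_lsb",
    "delta_pic_order_cnt_bottom", "delta_pic_order_cnt[0]",
    "delta_pic_order_cnt[1]", "slice_group_change_cycle",
)


def slice_headers_consistent(headers: List[Dict[str, int]]) -> bool:
    """Return True if each listed element takes a single value across all slices."""
    for name in PICTURE_LEVEL_ELEMENTS:
        values = {h[name] for h in headers if name in h}
        if len(values) > 1:
            return False
    return True
```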

In the sequence of operations of the decoded picture marking process, when "inter_layer_ref_flag" is equal to 1, the current picture is marked as "used for inter-layer reference".

The process for marking a picture as "unused for inter-layer reference" is invoked when "num_inter_layer_mmco" is not equal to 0. Every picture in the DPB that satisfies all of the following conditions is marked as "unused for inter-layer reference": (1) the picture belongs to the same access unit as the current picture; (2) the picture has "inter_layer_ref_flag" equal to 1 and is marked as "used for inter-layer reference"; (3) the picture has dependency_id and quality_level values equal to one of the pairs dependency_id[i] and quality_level[i] signalled in dec_ref_pic_marking() for the current picture; (4) the picture is a non-reference picture.
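The following sketch covers two marking steps: marking the current picture when inter_layer_ref_flag is 1, and the MMCO-driven marking as "unused for inter-layer reference". It is a minimal illustration. The `Picture` record and its field names are hypothetical simplifications of decoder state and do not come from the standard draft.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Picture:
    # Hypothetical per-picture DPB state; field names are illustrative only.
    access_unit: int
    dependency_id: int
    quality_level: int
    inter_layer_ref_flag: int
    is_reference: bool
    used_for_inter_layer_ref: bool = False


def mark_current_picture(cur: Picture) -> None:
    """Mark the current picture "used for inter-layer reference" when its flag is 1."""
    if cur.inter_layer_ref_flag == 1:
        cur.used_for_inter_layer_ref = True


def apply_inter_layer_mmco(dpb: List[Picture], cur: Picture,
                           pairs: Iterable[Tuple[int, int]]) -> None:
    """Mark DPB pictures "unused for inter-layer reference" per conditions (1)-(4)."""
    targets = set(pairs)  # signalled (dependency_id[i], quality_level[i]) pairs
    if not targets:       # invoked only when num_inter_layer_mmco != 0
        return
    for pic in dpb:
        if (pic.access_unit == cur.access_unit                            # (1)
                and pic.inter_layer_ref_flag == 1
                and pic.used_for_inter_layer_ref                          # (2)
                and (pic.dependency_id, pic.quality_level) in targets     # (3)
                and not pic.is_reference):                                # (4)
            pic.used_for_inter_layer_ref = False
```

Condition (4) means that a reference picture matching a signalled pair keeps its inter-layer marking. It stays in the DPB anyway because it is still needed for ordinary inter prediction.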

Regarding the operation of the decoded picture buffer: the decoded picture buffer contains frame buffers. Each frame buffer may contain a decoded frame, a decoded complementary field pair, or a single (non-paired) decoded field. The stored picture is marked as "used for reference" (reference pictures), is marked as "used for inter-layer reference", or is held for future output (reordered or delayed pictures). Prior to initialization, the DPB is empty (the DPB fullness is set to zero). The steps in the following subclauses of this subclause all happen instantaneously at $t_r(n)$, in the sequence listed.

Regarding the decoding of gaps in frame_num and the storage of "non-existing" frames: if applicable, gaps in frame_num are detected by the decoding process. The generated frames are marked as specified in subclause 8.2.5.2 of the current SVC standard draft and inserted into the DPB as specified below. After each generated frame is marked, each picture m that the "sliding window" process has marked as "unused for reference" is removed from the DPB in either of two cases: it is also marked as "non-existing", or its DPB output time is less than or equal to the coded picture buffer (CPB) removal time of the current picture n, i.e., $t_{o,dpb}(m) \le t_r(n)$. When a frame, or the last field in a frame buffer, is removed from the DPB, the DPB fullness is decremented by one. When a "non-existing" generated frame is inserted into the DPB, the DPB fullness is incremented by one.
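The removal rule applied after sliding-window marking can be written as a filter over the DPB. The model in this minimal sketch is a simplification: each list entry stands for one whole frame buffer, so the DPB fullness is simply the list length. `FrameBuffer` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameBuffer:
    # Hypothetical state needed for the removal test; one entry = one frame buffer.
    name: str
    unused_for_reference: bool  # marked by the "sliding window" process
    non_existing: bool          # generated frame for a frame_num gap
    t_o_dpb: float              # DPB output time t_{o,dpb}(m)


def remove_after_sliding_window(dpb: List[FrameBuffer], t_r_n: float) -> int:
    """Drop pictures that are "unused for reference" and either "non-existing" or
    already due for output (t_{o,dpb}(m) <= t_r(n)); return the new DPB fullness."""
    dpb[:] = [m for m in dpb
              if not (m.unused_for_reference
                      and (m.non_existing or m.t_o_dpb <= t_r_n))]
    return len(dpb)
```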

Regarding picture decoding and output: picture n is decoded and temporarily stored (not in the DPB). If picture n is in the desired scalable layer, the following applies. The DPB output time of picture n is derived by $t_{o,dpb}(n) = t_r(n) + t_c \cdot \mathrm{dpb\_output\_delay}(n)$. The output of the current picture is specified as follows. If $t_{o,dpb}(n) = t_r(n)$, the current picture is output. Note that when the current picture is a reference picture, it will be stored in the DPB. Otherwise ($t_{o,dpb}(n) \ne t_r(n)$), $t_{o,dpb}(n) > t_r(n)$. In that case the current picture is output later and will be stored in the DPB (as specified in subclause C.2.4 of the current SVC standard draft). It is output at time $t_{o,dpb}(n)$ unless it is indicated not to be output by the decoding or inference of no_output_of_prior_pics_flag equal to 1 at a time preceding $t_{o,dpb}(n)$. The output picture is cropped using the cropping rectangle specified in the sequence parameter set for the sequence.
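The output-time formula and the resulting decision can be illustrated as follows. In this minimal sketch, times are plain floats and `t_c` is the clock tick used by the HRD. The function names and the returned strings are hypothetical. Because dpb_output_delay is non-negative, only the "equal" and "greater than" cases can occur.

```python
def dpb_output_time(t_r: float, t_c: float, dpb_output_delay: int) -> float:
    """t_{o,dpb}(n) = t_r(n) + t_c * dpb_output_delay(n)."""
    return t_r + t_c * dpb_output_delay


def output_action(t_r: float, t_c: float, dpb_output_delay: int,
                  in_desired_layer: bool) -> str:
    """Classify what happens to picture n at its CPB removal time t_r(n)."""
    if not in_desired_layer:
        # The output process is specified only for pictures in the desired layer.
        return "not output"
    t_o = dpb_output_time(t_r, t_c, dpb_output_delay)
    if t_o == t_r:
        return "output immediately"
    # Otherwise t_o > t_r: the picture waits in the DPB until t_{o,dpb}(n),
    # unless no_output_of_prior_pics_flag equal to 1 suppresses its output.
    return "store and output later"
```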

When picture n is a picture that is output and is not the last picture of the bitstream that is output, the value of $\Delta t_{o,dpb}(n)$ is defined as

$$\Delta t_{o,dpb}(n) = t_{o,dpb}(n_n) - t_{o,dpb}(n)$$

where $n_n$ indicates the picture that follows picture n in output order.
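With the output times of the output pictures listed in output order, $\Delta t_{o,dpb}(n)$ is the gap between each consecutive pair. It is undefined for the last output picture. This minimal sketch assumes the list is already sorted in output order, and the function name is hypothetical.

```python
from typing import List


def output_time_deltas(output_times: List[float]) -> List[float]:
    """Return Δt_{o,dpb}(n) = t_{o,dpb}(n_n) - t_{o,dpb}(n) for every output
    picture except the last one; input must be in output order."""
    return [nxt - cur for cur, nxt in zip(output_times, output_times[1:])]
```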

The removal of pictures from the DPB before the possible insertion of the current picture proceeds as follows. If the decoded picture is an IDR picture, the following applies. All reference pictures in the DPB that have the same dependency_id and quality_level values as the current picture are marked as "unused for reference", as specified in subclause 8.2.5.1 of the current SVC standard draft. Consider an IDR picture that is not the first IDR picture decoded, where the value of PicWidthInMbs, FrameHeightInMbs, or max_dec_frame_buffering derived from the active sequence parameter set differs from the corresponding value derived from the sequence parameter set that was active for the preceding sequence having the same dependency_id and quality_level values as the current coded video sequence. In that case the HRD infers no_output_of_prior_pics_flag to be equal to 1, regardless of the actual value of no_output_of_prior_pics_flag. Note that decoder implementations should try to handle frame or DPB size changes caused by changes in PicWidthInMbs or FrameHeightInMbs more gracefully than the HRD does.

When no_output_of_prior_pics_flag is equal to 1 or is inferred to be equal to 1, all frame buffers in the DPB containing decoded pictures with the same dependency_id and quality_level values as the current picture are emptied without output of the pictures they contain. The DPB fullness is decremented by the number of emptied frame buffers. Otherwise (the decoded picture is not an IDR picture), the following applies. If the slice header of the current picture includes memory_management_control_operation equal to 5, all pictures in the DPB with the same dependency_id and quality_level values as the current picture are marked as "unused for reference". Otherwise (the slice header of the current picture does not include memory_management_control_operation equal to 5), the decoded reference picture marking process specified in subclause 8.2.5 of the current SVC standard draft is invoked. The process for marking a picture as "unused for inter-layer reference", specified in subclause 8.2.5.5 of the current SVC standard draft, is then invoked.
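The per-layer IDR handling, the HRD inference of no_output_of_prior_pics_flag, and the memory_management_control_operation 5 branch can be sketched as below. This is a simplified illustration. Each list entry stands for one frame buffer, the names are hypothetical, and the general marking process of subclause 8.2.5 (sliding window and adaptive MMCOs) is not modelled. The key point is that every operation is restricted to pictures sharing the current picture's (dependency_id, quality_level).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pic:
    # Hypothetical DPB entry; one entry stands for one frame buffer.
    dependency_id: int
    quality_level: int
    used_for_reference: bool = True


@dataclass
class SeqParams:
    pic_width_in_mbs: int
    frame_height_in_mbs: int
    max_dec_frame_buffering: int


def same_layer(p: Pic, cur: Pic) -> bool:
    return (p.dependency_id, p.quality_level) == (cur.dependency_id, cur.quality_level)


def mark_same_layer_unused(dpb: List[Pic], cur: Pic) -> None:
    """Mark every same-layer picture "unused for reference" (IDR or MMCO 5)."""
    for p in dpb:
        if same_layer(p, cur):
            p.used_for_reference = False


def infer_no_output_of_prior_pics(flag: int, first_idr: bool, active: SeqParams,
                                  previous: Optional[SeqParams]) -> int:
    """A size change versus the previous same-layer sequence forces the flag to 1."""
    if not first_idr and previous is not None and active != previous:
        return 1
    return flag


def handle_idr(dpb: List[Pic], cur: Pic, no_output_flag: int) -> int:
    """Apply IDR handling; return how many frame buffers were emptied without output."""
    mark_same_layer_unused(dpb, cur)
    if no_output_flag != 1:
        return 0
    before = len(dpb)
    dpb[:] = [p for p in dpb if not same_layer(p, cur)]
    return before - len(dpb)  # DPB fullness drops by this amount
```

For a non-IDR picture whose slice header carries memory_management_control_operation 5, only `mark_same_layer_unused` would apply.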

If the current picture is in the desired scalable layer, every decoded picture in the DPB that satisfies all of the following conditions is marked as "unused for inter-layer reference": (1) the picture belongs to the same access unit as the current picture; (2) the picture has inter_layer_ref_flag equal to 1 and is marked as "used for inter-layer reference"; (3) the picture has a smaller dependency_id value than the current picture, or the same dependency_id value but a smaller quality_level value than the current picture.
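This implicit marking step for the desired scalable layer can be sketched in the same style. The "lower layer" test in condition (3) is a lexicographic comparison on (dependency_id, quality_level). This is a minimal illustration, and `Picture`, `is_lower_layer`, and `release_lower_layers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Picture:
    # Hypothetical per-picture DPB state.
    access_unit: int
    dependency_id: int
    quality_level: int
    inter_layer_ref_flag: int
    used_for_inter_layer_ref: bool


def is_lower_layer(pic: Picture, cur: Picture) -> bool:
    """Condition (3): smaller dependency_id, or equal dependency_id and smaller quality_level."""
    return (pic.dependency_id, pic.quality_level) < (cur.dependency_id, cur.quality_level)


def release_lower_layers(dpb: List[Picture], cur: Picture, cur_in_desired_layer: bool) -> None:
    """Mark lower-layer pictures of the same access unit "unused for inter-layer reference"."""
    if not cur_in_desired_layer:
        return
    for pic in dpb:
        if (pic.access_unit == cur.access_unit                                # (1)
                and pic.inter_layer_ref_flag == 1 and pic.used_for_inter_layer_ref  # (2)
                and is_lower_layer(pic, cur)):                                # (3)
            pic.used_for_inter_layer_ref = False
```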

Every picture m in the DPB that satisfies all of the following conditions is removed from the DPB: (1) picture m is marked as "unused for reference", or picture m is a non-reference picture; a reference frame is considered to be marked as "unused for reference" only when both of its fields have been marked as "unused for reference"; (2) picture m is marked as "unused for inter-layer reference", or picture m has inter_layer_ref_flag equal to 0; (3) picture m is marked as "non-existing", or it is not in the desired scalable layer, or its DPB output time is less than or equal to the CPB removal time of the current picture n, i.e., $t_{o,dpb}(m) \le t_r(n)$. When a frame, or the last field in a frame buffer, is removed from the DPB, the DPB fullness is decremented by one.
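The three removal conditions combine into a single predicate. This minimal sketch is simplified: `used_for_reference` is a frame-level flag that is assumed to stay True while either field of a reference frame is still used for reference, and each list entry is one frame buffer. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DpbPicture:
    # Hypothetical DPB entry; one entry = one frame buffer.
    is_reference: bool
    used_for_reference: bool        # True while either field is still used for reference
    used_for_inter_layer_ref: bool
    inter_layer_ref_flag: int
    non_existing: bool
    in_desired_layer: bool
    t_o_dpb: float                  # DPB output time t_{o,dpb}(m)


def removable(m: DpbPicture, t_r_n: float) -> bool:
    cond1 = (not m.is_reference) or (not m.used_for_reference)
    cond2 = (not m.used_for_inter_layer_ref) or m.inter_layer_ref_flag == 0
    cond3 = m.non_existing or (not m.in_desired_layer) or m.t_o_dpb <= t_r_n
    return cond1 and cond2 and cond3


def remove_pictures(dpb: List[DpbPicture], t_r_n: float) -> int:
    """Remove all pictures satisfying (1)-(3); return how much the fullness drops."""
    before = len(dpb)
    dpb[:] = [m for m in dpb if not removable(m, t_r_n)]
    return before - len(dpb)
```

Condition (3) is what lets a non-reference picture from a lower layer leave the DPB without being output. Such a picture is never output, because it is not in the desired scalable layer.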

The marking and storage of the current decoded picture are discussed next. For the marking of a reference decoded picture and its storage into the DPB: when the current picture is a reference picture, it is stored in the DPB as follows. If the current decoded picture is the second field (in decoding order) of a complementary reference field pair, and the first field of the pair is still in the DPB, the current decoded picture is stored in the same frame buffer as the first field of the pair. Otherwise, the current decoded picture is stored in an empty frame buffer, and the DPB fullness is incremented by one.

For the storage of a non-reference picture into the DPB: when the current picture is a non-reference picture, the following applies. The current picture is stored in the DPB in either of two cases: it is not in the desired scalable layer, or it is in the desired scalable layer and has $t_{o,dpb}(n) > t_r(n)$. Storage proceeds as follows. If the current decoded picture is the second field (in decoding order) of a complementary non-reference field pair, and the first field of the pair is still in the DPB, the current decoded picture is stored in the same frame buffer as the first field of the pair. Otherwise, the current decoded picture is stored in an empty frame buffer, and the DPB fullness is incremented by one.
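The storage rules for reference and non-reference pictures can be folded into one function. This minimal sketch uses hypothetical names, and `completes_field_pair` is assumed to be computed elsewhere. It returns whether the picture is stored and the resulting DPB fullness.

```python
from typing import Tuple


def store_current_picture(is_reference: bool, in_desired_layer: bool,
                          t_o_dpb_n: float, t_r_n: float,
                          completes_field_pair: bool, fullness: int) -> Tuple[bool, int]:
    """Return (stored, new_fullness) for the current decoded picture n.

    completes_field_pair: True when the picture is the second field (in decoding
    order) of a complementary field pair whose first field is still in the DPB.
    """
    if is_reference:
        store = True
    else:
        # A non-reference picture is kept if it is outside the desired layer
        # (presumably because it may still serve as an inter-layer reference)
        # or if its output time lies in the future.
        store = (not in_desired_layer) or t_o_dpb_n > t_r_n
    if not store:
        return False, fullness
    if completes_field_pair:
        return True, fullness      # shares the first field's frame buffer
    return True, fullness + 1      # occupies an empty frame buffer
```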

In the embodiment described above, the indication of whether a picture may be used for inter-layer prediction reference is signalled in the slice header, as the syntax element inter_layer_ref_flag. There are various alternative ways to signal such an indication. For example, the indication can be signalled in the NAL unit header or by other means.

The memory management control operations (MMCOs) can also be signalled in alternative ways, as long as the pictures to be marked as unused for inter-layer reference can be identified. For example, the syntax element dependency_id[i] can be coded as a difference relative to the dependency_id value of the current picture to which the slice header belongs.
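The delta-coding alternative works because dependency_id[i] never exceeds the current dependency_id, so the difference is always non-negative. The sketch below assumes a sign convention of current minus target, which the text does not fix. It also assumes the delta would be written with the unsigned Exp-Golomb ue(v) code of H.264, which the text does not specify either. All function names are hypothetical.

```python
def encode_dependency_id_delta(cur_dependency_id: int, dependency_id_i: int) -> int:
    """Hypothetical delta coding of dependency_id[i] relative to the current picture."""
    delta = cur_dependency_id - dependency_id_i
    if delta < 0:
        raise ValueError("dependency_id[i] must not exceed the current dependency_id")
    return delta


def decode_dependency_id_delta(cur_dependency_id: int, delta: int) -> int:
    return cur_dependency_id - delta


def ue_v(code_num: int) -> str:
    """Unsigned Exp-Golomb bit string, as used for ue(v) syntax elements in H.264."""
    value = code_num + 1
    return "0" * (value.bit_length() - 1) + format(value, "b")
```

Under these assumptions, marking the layer directly below the current one (delta 1) costs the three bits "010". Marking a picture in the current picture's own dependency layer (delta 0) costs the single bit "1".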

The main differences between the embodiment described above and the original DPB management process are as follows. (1) In the above embodiment, decoded pictures are marked as "used for inter-layer reference" when inter_layer_ref_flag is equal to 1. (2) The decoded picture output process of the above embodiment is specified only for pictures in the desired scalable layer. (3) In the above embodiment, the process of marking pictures as "unused for inter-layer reference" takes place before pictures are removed from the DPB prior to the possible insertion of the current picture. (4) In the above embodiment, the conditions for removing pictures from the DPB before the possible insertion of the current picture are changed. They now take into account whether a picture is marked as "unused for inter-layer reference" or has inter_layer_ref_flag equal to 0, and whether the picture is in the desired scalable layer. (5) In the above embodiment, the conditions for storing pictures in the DPB are changed to take into account whether a picture is in the desired scalable layer.

FIG. 10 shows an example of how the DPB status evolves while the coded pictures of one access unit are decoded according to conventionally known systems, and FIG. 11 shows the same example according to the present invention. In the conventional system shown in FIG. 10, the DPB status evolves as follows, assuming that layer 4 is the desired scalable layer for decoding and playback. Pictures from earlier decoded access units will also be stored in the DPB, but for simplicity they are not considered below. After the decoding of the layer 0 picture and the corresponding DPB management process, the DPB contains only the picture from layer 0. After the layer 1 picture, the DPB contains two pictures, from layers 0 and 1 respectively. After the layer 2 picture, the DPB contains three pictures, from layers 0 to 2 respectively. After the layer 3 picture, the DPB contains four pictures, from layers 0 to 3 respectively. After the layer 4 picture, the DPB contains two pictures, from layers 0 and 4 respectively.

With the invention, as shown in FIG. 11, the DPB status evolves as follows, again assuming that layer 4 is the desired scalable layer for decoding and playback. Pictures from earlier decoded access units may also be stored in the DPB, but for simplicity they are not considered below. After the decoding of the layer 0 picture and the corresponding DPB management process, the DPB contains only the picture from layer 0. After the layer 1 picture, the DPB contains two pictures, from layers 0 and 1 respectively. After the layer 2 picture, the DPB contains two pictures, from layers 0 and 2 respectively. After the layer 3 picture, the DPB contains two pictures, from layers 0 and 3 respectively. After the layer 4 picture, the DPB contains two pictures, from layers 0 and 4 respectively.

As FIG. 11 shows, the present invention can reduce buffer memory requirements. In the example of FIG. 11, the peak DPB occupancy within the access unit drops from four pictures to two, which saves buffer memory for two decoded pictures.
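A toy model reproduces the occupancy figures of FIG. 10 and FIG. 11. The sketch assumes, consistent with the described end states, that only the layer 0 and layer 4 pictures are reference pictures and that layers 1 to 3 carry non-reference pictures used only for inter-layer prediction. With the invention, each lower-layer picture is assumed to be released as soon as the next layer's picture arrives, for example through the signalled inter-layer MMCOs. Conventionally, lower-layer pictures are released only when the desired-layer picture is processed. The function and parameter names are hypothetical.

```python
from typing import List, Set


def simulate_dpb(num_layers: int, reference_layers: Set[int],
                 early_release: bool) -> List[List[int]]:
    """Return the DPB contents (as layer numbers) after each layer's picture of
    one access unit has been decoded and the DPB management process has run."""
    target = num_layers - 1  # the desired scalable layer is the top one
    dpb: List[int] = []
    history: List[List[int]] = []
    for layer in range(num_layers):
        # With early release, lower non-reference pictures are dropped before
        # every insertion; conventionally this happens only at the target layer.
        if early_release or layer == target:
            dpb = [p for p in dpb if p in reference_layers]
        dpb.append(layer)
        history.append(list(dpb))
    return history


conventional = simulate_dpb(5, {0, 4}, early_release=False)
proposed = simulate_dpb(5, {0, 4}, early_release=True)
print(max(map(len, conventional)), max(map(len, proposed)))  # 4 2
```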

FIG. 12 shows a system 10 in which the present invention can be utilized. The system comprises multiple communication devices that can communicate through a network. The system 10 may comprise any combination of wired or wireless networks including, but not limited to, a mobile telephone network, a wireless Local Area Network (LAN), a Bluetooth personal area network, an Ethernet LAN, a token ring LAN, a wide area network, the Internet, and the like. The system 10 may include both wired and wireless communication devices.

By way of example, the system 10 shown in FIG. 12 includes a mobile telephone network 11 and the Internet 28. Connectivity to the Internet 28 may include, but is not limited to, long-range wireless connections, short-range wireless connections, and various wired connections such as telephone lines, cable lines, and power lines.

The exemplary communication devices of the system 10 may include, but are not limited to, a mobile telephone 12, a combination PDA and mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, and a notebook computer 22. The communication devices may be stationary, or mobile, as when carried by an individual who is moving. The communication devices may also be located in a mode of transportation including, but not limited to, a car, truck, taxi, bus, boat, airplane, bicycle, or motorcycle. Some or all of the communication devices may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28. The system 10 may include additional communication devices and communication devices of different types.

The communication devices may communicate using various transmission technologies including, but not limited to, Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Transmission Control Protocol/Internet Protocol (TCP/IP), Short Messaging Service (SMS), Multimedia Messaging Service (MMS), e-mail, Instant Messaging Service (IMS), Bluetooth, and IEEE 802.11. A communication device may communicate using various media including, but not limited to, radio, infrared, laser, and cable connections.

FIGS. 13 and 14 show one representative mobile telephone 12 in which the present invention may be implemented. It should be understood, however, that the present invention is not intended to be limited to one particular type of mobile telephone 12 or other electronic device. The mobile telephone 12 of FIGS. 13 and 14 includes a housing 30, a display 32 in the form of a liquid crystal display, a keypad 34, a microphone 36, an earpiece 38, a battery 40, an infrared port 42, an antenna 44, a smart card 46 in the form of a UICC according to one embodiment of the invention, a card reader 48, radio interface circuitry 52, codec circuitry 54, a controller 56, and a memory 58. The individual circuits and elements are all of a type well known in the art, for example in the Nokia range of mobile telephones.

The present invention is described in the general context of method steps. In one embodiment, these steps may be implemented by a program product including computer-executable instructions, such as program code, executed by computers in networked environments.

Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of program code for executing the steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Software and web implementations of the present invention could be accomplished with standard programming techniques, using rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps, and decision steps. It should be noted that the words "component" and "module", as used in the specification and claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving manual inputs.

The foregoing description of embodiments of the present invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the present invention. The embodiments were chosen and described to explain the principles of the present invention and its practical application, so that one skilled in the art can utilize the present invention in various embodiments and with the various modifications suited to the particular use contemplated.

Claims

A method of managing a decoded picture buffer for scalable video coding,

Receiving a first decoded picture belonging to a first layer on a bit stream into the decoded picture buffer;

Receiving a second decoded image belonging to a second layer;

Determining whether the first decoded picture is needed for inter-layer prediction reference in view of the reception of the second decoded picture; And

Removing the first decoded picture from the decoded picture buffer when the first decoded picture is no longer needed for inter-layer prediction reference and subsequent output.

The method of claim 1,

Conveying information about possible inter-layer predictive reference indications of subsequent pictures in the decoding order signaled over the bit stream.

3. The method of claim 2, wherein the possible inter-layer prediction reference indication is signaled in a slice header.

4. The method of claim 2, wherein the possible inter-layer prediction reference indication is signaled in a Network Abstraction Layer (NAL) unit header.

5. The method of claim 2, wherein determining whether the first decoded picture is needed for inter-layer prediction reference comprises selectively marking the first decoded picture as "unused for inter-layer reference."

6. The method of claim 5, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture belongs to the same access unit as the second picture.

7. The method of claim 6, wherein the determination as to whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

8. The method of claim 5, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture is marked as "used for inter-layer reference" with its possible inter-layer prediction reference indication being positive.

9. The method of claim 8, wherein the determination of whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

10. The method of claim 5, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture has a smaller dependency_id than the second picture, or has the same dependency_id as the second picture but a smaller quality_level.

11. The method of claim 10, wherein the determination of whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

12. The method of claim 2, wherein the first decoded picture is determined to be no longer needed for inter-layer prediction reference when the first picture is marked as "unused for reference" or is a non-reference picture; the first picture is marked as "unused for inter-layer reference" or its possible inter-layer prediction reference indication is negative; and the first picture is marked as "non-existing," is not in the desired scalable layer, or has a decoded picture buffer output time less than or equal to the coded picture buffer removal time of the second picture.

13. The method of claim 12, wherein, if the first decoded picture is a reference frame, the first decoded picture is considered to be marked as "unused for reference" only when all fields of the first decoded picture are marked as "unused for reference."

14. The method of claim 1, wherein the first decoded picture is not needed for future output if it is not within the scalable layer desired for playback.

15. The method of claim 1, wherein the bit stream includes a first sub-bitstream and a second sub-bitstream, the first sub-bitstream including coded pictures belonging to the first layer and the second sub-bitstream including coded pictures belonging to the second layer.

16. A decoder for decoding an encoded stream of a plurality of pictures,

wherein the plurality of pictures are defined as reference pictures or non-reference pictures and information relating to the decoding order and output order of a picture is defined for the pictures of the stream, the decoder being configured to perform the method of claim 1.

17. A computer program product for managing a decoded picture buffer for scalable video coding, comprising:

computer code for receiving, into the decoded picture buffer, a first decoded picture belonging to a first layer of a bit stream;

computer code for receiving a second decoded picture belonging to a second layer;

computer code for determining, in light of the reception of the second decoded picture, whether the first decoded picture is needed for inter-layer prediction reference; and

computer code for removing the first decoded picture from the decoded picture buffer when the first decoded picture is no longer needed for inter-layer prediction reference and future output.

18. The computer program product of claim 17, further comprising:

computer code for conveying information about possible inter-layer prediction reference indications of subsequent pictures in decoding order, signaled through the bit stream.

19. The computer program product of claim 18, wherein the possible inter-layer prediction reference indication is signaled in a slice header.

20. The computer program product of claim 18, wherein the possible inter-layer prediction reference indication is signaled in a Network Abstraction Layer (NAL) unit header.

21. The computer program product of claim 18, wherein determining whether the first decoded picture is needed for inter-layer prediction reference comprises selectively marking the first decoded picture as "unused for inter-layer reference."

22. The computer program product according to claim 21, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture belongs to the same access unit as the second picture.

23. The computer program product of claim 22, wherein the determination of whether the first decoded picture is marked "unused for inter-layer reference" is based on signaling over the bit stream.

24. The computer program product of claim 21, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture is marked as "used for inter-layer reference" with its possible inter-layer prediction reference indication being positive.

25. The computer program product of claim 24, wherein the determination of whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling via the bit stream.

26. The computer program product of claim 21, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture has a smaller dependency_id than the second picture, or has the same dependency_id as the second picture but a smaller quality_level.

27. The computer program product of claim 26, wherein the determination of whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling via the bit stream.

28. The computer program product of claim 17, wherein the first decoded picture is determined to be no longer needed for inter-layer prediction reference when the first picture is marked as "unused for reference" or is a non-reference picture; the first picture is marked as "unused for inter-layer reference" or its possible inter-layer prediction reference indication is negative; and the first picture is marked as "non-existing," is not in the desired scalable layer, or has a decoded picture buffer output time less than or equal to the coded picture buffer removal time of the second picture.

29. The computer program product of claim 28, wherein, if the first decoded picture is a reference frame, the first decoded picture is considered to be marked as "unused for reference" only when all fields of the first decoded picture are marked as "unused for reference."

30. The computer program product of claim 16, wherein the first decoded picture is not needed for future output if it is not within the scalable layer desired for playback.

31. The computer program product of claim 16, wherein the bit stream includes a first sub-bitstream and a second sub-bitstream, the first sub-bitstream including coded pictures belonging to the first layer and the second sub-bitstream including coded pictures belonging to the second layer.

32. An electronic device, comprising:

a processor; and

a memory unit operatively coupled to the processor, the memory unit including a computer program product for managing a decoded picture buffer for scalable video coding,

the computer program product comprising:

computer code for determining, in view of the reception of the second decoded picture, whether the first decoded picture is needed for inter-layer prediction reference; and

33. The electronic device of claim 32, wherein the memory unit further comprises computer code for conveying information about a possible inter-layer prediction reference indication of a next picture in decoding order, signaled through the bit stream.

34. The electronic device of claim 33, wherein the possible inter-layer prediction reference indication is signaled in a slice header.

35. The electronic device of claim 33, wherein the possible inter-layer prediction reference indication is signaled in a Network Abstraction Layer (NAL) unit header.

36. The electronic device of claim 33, wherein determining whether the first decoded picture is needed for inter-layer prediction reference comprises selectively marking the first decoded picture as "unused for inter-layer reference."

37. The electronic device according to claim 36, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture belongs to the same access unit as the second picture.

38. The electronic device of claim 37, wherein determining whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

39. The electronic device of claim 36, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture is marked as "used for inter-layer reference" with its possible inter-layer prediction reference indication being positive.

40. The electronic device of claim 39, wherein determining whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

41. The electronic device of claim 36, wherein the first decoded picture is marked as "unused for inter-layer reference" when the first picture has a smaller dependency_id than the second picture, or has the same dependency_id as the second picture but a smaller quality_level.

42. The electronic device of claim 41, wherein the determination of whether the first decoded picture is marked as "unused for inter-layer reference" is based on signaling through the bit stream.

43. The electronic device of claim 36, wherein the first decoded picture is determined to be no longer needed for inter-layer prediction reference when the first picture is marked as "unused for reference" or is a non-reference picture; the first picture is marked as "unused for inter-layer reference" or its possible inter-layer prediction reference indication is negative; and the first picture is marked as "non-existing," is not in the desired scalable layer, or has a decoded picture buffer output time less than or equal to the coded picture buffer removal time of the second picture.

44. The electronic device of claim 43, wherein, if the first decoded picture is a reference frame, the first decoded picture is considered to be marked as "unused for reference" only when all fields of the first decoded picture are marked as "unused for reference."

45. The electronic device of claim 32, wherein the first decoded picture is not needed for future output if it is not within the scalable layer desired for playback.

46. The electronic device of claim 32, wherein the bit stream includes a first sub-bitstream and a second sub-bitstream, the first sub-bitstream including coded pictures belonging to the first layer and the second sub-bitstream including coded pictures belonging to the second layer.

47. The electronic device of claim 32, wherein the electronic device comprises a decoder configured to read, from the bit stream, syntax elements for possible reference indication and memory management control operations.

48. An encoder for forming an encoded stream of pictures, wherein:

the pictures are defined as reference pictures or non-reference pictures, and information relating to the decoding order and output order of a picture is defined for the pictures of the stream; and

the encoder places, in the stream, syntax elements for possible reference indication and memory management control operations, the syntax elements being generated by an electronic device according to claim 32.

49. A bit stream, comprising:

a syntax element that provides an indication to selectively remove a first decoded picture of a first layer from a decoded picture buffer in light of a second decoded picture of a second layer.

50. A computer device comprising an encoder for generating a bit stream according to claim 48.

51. A bit stream, comprising:

a syntax element that provides an indication to selectively remove a first decoded picture of a first layer from a decoded picture buffer in light of a second decoded picture of a second layer,

wherein the syntax element is set according to the method of claim 1.

52. A method of managing a decoded picture buffer for scalable video coding, comprising:

receiving a second decoded picture belonging to a second layer;

determining, in view of the reception of the second decoded picture, whether the first decoded picture is needed for inter-layer prediction reference, inter prediction reference, and future output; and

removing the first decoded picture from the decoded picture buffer if the first decoded picture is no longer needed for inter-layer prediction reference, inter prediction reference, and future output.
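The buffer-management rule recited in the claims above can be illustrated with a short Python sketch. It is a hypothetical model, not the applicant's implementation or any standard's reference decoder. The `DecodedPicture` record, every field other than `dependency_id` and `quality_level` (which the claims name), and all helper functions are assumptions made for illustration. The sketch applies the marking rule of claim 10 to every picture already in the buffer when a new picture arrives. It also reads the three semicolon-separated condition groups of claim 12 as jointly required before a picture is removed. The frame/field rule of claim 13 is modelled as one "unused for reference" mark per field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DecodedPicture:
    """Hypothetical DPB entry. Only dependency_id and quality_level come from
    the claims; every other field name is an illustrative assumption."""
    dependency_id: int
    quality_level: int
    is_reference: bool = True  # False for a non-reference picture
    # One "unused for reference" mark per field: two for a frame, one for a field.
    unused_for_reference_marks: List[bool] = field(default_factory=lambda: [False, False])
    unused_for_inter_layer_reference: bool = False
    possible_inter_layer_ref: bool = True  # the signalled indication (claims 2-4)
    non_existing: bool = False
    in_target_layer: bool = True  # inside the scalable layer chosen for playback
    dpb_output_time: float = 0.0


def not_needed_for_inter_prediction(pic: DecodedPicture) -> bool:
    # Claims 12-13: a non-reference picture, or a reference picture whose
    # fields are all marked "unused for reference".
    return not pic.is_reference or all(pic.unused_for_reference_marks)


def not_needed_for_inter_layer_prediction(pic: DecodedPicture) -> bool:
    # Claim 12: marked "unused for inter-layer reference", or the possible
    # inter-layer prediction reference indication is negative.
    return pic.unused_for_inter_layer_reference or not pic.possible_inter_layer_ref


def not_needed_for_output(pic: DecodedPicture, current_cpb_removal_time: float) -> bool:
    # Claims 12 and 14: non-existing, outside the target layer, or output time
    # not later than the coded picture buffer removal time of the new picture.
    return (pic.non_existing
            or not pic.in_target_layer
            or pic.dpb_output_time <= current_cpb_removal_time)


def removable(pic: DecodedPicture, current_cpb_removal_time: float) -> bool:
    # Assumption: the three condition groups of claim 12 are read conjunctively.
    return (not_needed_for_inter_prediction(pic)
            and not_needed_for_inter_layer_prediction(pic)
            and not_needed_for_output(pic, current_cpb_removal_time))


def mark_lower_layers(dpb: List[DecodedPicture], current: DecodedPicture) -> None:
    # Claim 10: lower dependency_id, or equal dependency_id with lower quality_level.
    for pic in dpb:
        if (pic.dependency_id < current.dependency_id
                or (pic.dependency_id == current.dependency_id
                    and pic.quality_level < current.quality_level)):
            pic.unused_for_inter_layer_reference = True


def store_picture(dpb: List[DecodedPicture], current: DecodedPicture,
                  current_cpb_removal_time: float) -> None:
    # Update the marks, drop every picture no longer needed, then store the new one.
    mark_lower_layers(dpb, current)
    dpb[:] = [p for p in dpb if not removable(p, current_cpb_removal_time)]
    dpb.append(current)


# Usage: a non-reference base-layer picture followed by its enhancement-layer picture.
dpb: List[DecodedPicture] = []
base = DecodedPicture(dependency_id=0, quality_level=0, is_reference=False)
enhancement = DecodedPicture(dependency_id=1, quality_level=0, is_reference=False)
store_picture(dpb, base, current_cpb_removal_time=0.0)
store_picture(dpb, enhancement, current_cpb_removal_time=0.0)
# base is now marked unused for inter-layer reference and has been removed.
```

In the usage lines, the base-layer picture leaves the buffer as soon as the enhancement-layer picture arrives. It is a non-reference picture, claim 10 marks it unused for inter-layer reference, and its output time does not exceed the current removal time. A reference frame with one field still in use would stay in the buffer under the same rule.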