KR20140043483A

KR20140043483A - MVC based 3DVC codec supporting inside view motion prediction (IVMP) mode

Info

Publication number: KR20140043483A
Application number: KR1020147004636A
Authority: KR
Inventors: Ying Chen; Li Zhang; Marta Karczewicz
Original assignee: Qualcomm Incorporated
Priority date: 2011-07-22
Filing date: 2012-07-20
Publication date: 2014-04-09
Also published as: EP2735152A1; BR112014001247A2; TW201320754A; CN103748882A; JP6141386B2; WO2013016231A1; KR101628582B1; JP2014526193A; US20160301936A1; HUE040195T2; ES2686936T3; US20130188013A1; RU2014106666A; EP2735152B1; JP2016067009A; CA2842569A1

Abstract

This disclosure describes features and techniques applicable to three-dimensional (3D) video coding. In one example, a technique includes coding a texture view video block and coding a depth view video block, wherein the depth view video block is associated with the texture view video block. Coding the depth view video block may include coding a syntax element indicating whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

Description

MVC BASED 3DVC CODEC SUPPORTING INSIDE VIEW MOTION PREDICTION (IVMP) MODE

This application claims the benefit of:

U.S. Provisional Application No. 61/561,800, filed November 18, 2011;

U.S. Provisional Application No. 61/563,771, filed November 26, 2011;

U.S. Provisional Application No. 61/522,559, filed August 11, 2011;

U.S. Provisional Application No. 61/510,738, filed July 22, 2011;

U.S. Provisional Application No. 61/522,584, filed August 11, 2011;

U.S. Provisional Application No. 61/563,772, filed November 26, 2011; and

U.S. Provisional Application No. 61/624,031, filed August 13, 2011,

the entire content of each of which is incorporated herein by reference.

This disclosure relates to three-dimensional (3D) video coding.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. By implementing such video compression techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. In block-based video coding, a video slice (e.g., a video frame or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for the block to be coded. Residual data represents the pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
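The residual, transform, quantization, and scan steps described above can be illustrated for a single 4x4 block. The following is a minimal sketch only: it uses the 4x4 integer core transform and the 4x4 frame zigzag order of H.264, but the quantizer is a simplified uniform quantizer assumed for illustration (it is not the H.264 scaling and quantization process), and all function names are hypothetical.

```python
# Sketch: residual -> 4x4 integer core transform -> simplified quantization
# -> zigzag scan into a 1-D coefficient vector. Entropy coding is omitted.

# Forward core transform matrix Cf used for 4x4 blocks in H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Raster positions (row * 4 + col) visited by the 4x4 frame zigzag scan.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def residual(original, prediction):
    """Pixel-wise difference between the block being coded and its prediction."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(original, prediction)]


def forward_core_transform(x):
    """Y = Cf * X * Cf^T (integer arithmetic, no normalization)."""
    return matmul(matmul(CF, x), transpose(CF))


def quantize(coeffs, step):
    """Simplified uniform quantizer with round-half-away-from-zero (assumption)."""
    def q(v):
        sign = -1 if v < 0 else 1
        return sign * ((abs(v) + step // 2) // step)
    return [[q(v) for v in row] for row in coeffs]


def zigzag_scan(block):
    """Turn a 4x4 array into a 16-entry vector in zigzag order."""
    flat = [v for row in block for v in row]
    return [flat[pos] for pos in ZIGZAG_4X4]


def encode_block(original, prediction, step):
    return zigzag_scan(quantize(forward_core_transform(residual(original, prediction)), step))
```

For example, a flat block of value 11 predicted by a flat block of value 10 leaves a residual of all ones; the transform puts all of its energy into the DC coefficient (16), which quantizes to 4 with a step of 4, so the scanned vector is `[4, 0, 0, ..., 0]`. Placing low-frequency positions first in the scan is what groups the trailing zeros for the entropy coder.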

Although three-dimensional (3D) video is highly desirable for many applications, 3D video coding presents many challenges.

This disclosure describes features and techniques applicable to three-dimensional (3D) video coding. In one example, a technique includes coding a texture view video block and coding a depth view video block, wherein the depth view video block is associated with the texture view video block. Coding the depth view video block may include coding a syntax element that indicates whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The described techniques may correspond to a coding mode referred to herein as the inside view motion prediction (IVMP) mode. In this case, the depth view component (e.g., the depth view video block) may not include any additional delta values for its motion information, and may instead adopt the motion information of the texture view component as its own motion information. Defining a mode that fully adopts the motion information of the texture view as the motion information of the depth view may improve compression, because no delta values need to be signaled for this motion information.

In another example, this disclosure describes a device for coding 3D video data, the device comprising one or more processors configured to code a texture view video block and to code a depth view video block, wherein the depth view video block is associated with the texture view video block. Coding the depth view video block includes coding a syntax element that indicates whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

In another example, this disclosure describes a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to code a texture view video block and to code a depth view video block, wherein the depth view video block is associated with the texture view video block. Coding the depth view video block includes coding a syntax element that indicates whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

In another example, this disclosure describes a device configured to code 3D video data, the device comprising means for coding a texture view video block and means for coding a depth view video block, wherein the depth view video block is associated with the texture view video block, and wherein the means for coding the depth view video block includes means for coding a syntax element that indicates whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.
FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.
FIG. 4 is a conceptual diagram illustrating the bitstream order of video coding layer (VCL) network abstraction layer (NAL) units of view components within one access unit.
FIG. 5 is a conceptual illustration of a sequence of pictures forming a video sequence, in which the motion vector of the co-located macroblock (MB) in the fourth picture of the texture view is reused for the identified MB in the fourth picture of the depth view component.
FIG. 6 is a conceptual diagram illustrating a prediction structure that may be used by a three-dimensional video coding (3DVC) codec.
FIG. 7 is a conceptual diagram illustrating a prediction structure of a 3DVC codec that does not allow inter-view prediction for depth view components.
FIG. 8 is a conceptual diagram illustrating an example of asymmetric inter-view prediction, in which the left view (VL) and the right view (VR) have half width.
FIG. 9 is a flowchart illustrating a technique that may be performed by a video encoder consistent with this disclosure.
FIG. 10 is a flowchart illustrating a technique that may be performed by a video decoder consistent with this disclosure.

The techniques of this disclosure relate to three-dimensional (3D) video coding based on the ITU-T H.264/AVC standard and one or more of its extensions that support multiview video coding (MVC), such as Annex H of the ITU-T H.264/AVC standard. However, the techniques may also be applied to other video coding standards or techniques, such as the emerging HEVC standard presently under development, extensions of the ITU-T H.264/AVC standard or of the emerging HEVC standard, or proprietary video coding techniques such as On2 VP6/VP7/VP8.

In 3D video coding, there are multiple different views that are collectively used to define a 3D video presentation. Each of the different views may include both a texture view component and a depth view component. Texture view components may be coded in blocks of video data, referred to as "video blocks," which are commonly called "macroblocks" in the H.264 context. Similarly, depth view components are coded as "video blocks," commonly called "macroblocks" in the H.264 context. Each texture video block may have a corresponding depth video block. The different video blocks (texture and depth), however, are usually coded separately. Other video coding standards may refer to video blocks as treeblocks or coding units (CUs).

In inter coding, motion vectors (or motion vector difference values relative to motion vector predictors) may be used to define predictive blocks, which are then used to predict the values of the coded video blocks. In this case, so-called "residual values" or "difference values" are included in the encoded bitstream, along with the motion vectors (or motion vector difference values relative to motion vector predictors) that identify the corresponding predictive blocks. The decoder receives the motion vectors and the residual values, and uses the motion vectors to identify the predictive blocks within previously decoded video data. To reconstruct the encoded video blocks, the decoder combines the residual values with the corresponding predictive blocks identified by the motion vectors.
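The decoder-side reconstruction described above can be sketched as follows. This is an illustration under simplifying assumptions, not a conforming decoder: motion vectors are integer-sample only (no sub-sample interpolation), references outside the picture are clamped to the nearest edge sample, and the function names are hypothetical.

```python
def motion_compensate(ref, x, y, mv, size):
    """Fetch the size x size predictive block at (x + mvx, y + mvy) in `ref`.

    Assumes integer-sample motion vectors; coordinates outside the picture
    are clamped to the picture boundary (edge padding)."""
    mvx, mvy = mv
    height, width = len(ref), len(ref[0])
    block = []
    for r in range(size):
        yy = min(max(y + mvy + r, 0), height - 1)
        row = []
        for c in range(size):
            xx = min(max(x + mvx + c, 0), width - 1)
            row.append(ref[yy][xx])
        block.append(row)
    return block


def reconstruct(ref, x, y, mv, residual, bit_depth=8):
    """Add the decoded residual to the motion-compensated prediction and clip
    the result to the valid sample range."""
    size = len(residual)
    pred = motion_compensate(ref, x, y, mv, size)
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + d, 0), max_val) for p, d in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]
```

With a reference picture whose sample at row r, column c equals 10r + c, the 2x2 block at (0, 0) with motion vector (1, 2) is predicted from rows 2 and 3, columns 1 and 2, giving [[21, 22], [31, 32]]; adding a residual of [[1, -1], [300, 0]] and clipping yields [[22, 21], [255, 32]].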

3D video coding raises many potential issues. For example, when coding multiview video data, the following issues may need to be solved to realize an efficient codec:

1. Providing the ability to jointly code the texture and depth components of one or more views;

2. Providing the ability to exploit the motion redundancy between texture and depth;

3. Providing the ability to transmit camera parameters in a simple and efficient manner;

4. During view adaptation, inter_view_flag may be used to discard a view component if it does not belong to a view that is being used for output. However, in the asymmetric 3DV case, a network abstraction layer (NAL) unit may still be required for predicting views with a different resolution, even if the flag is equal to 0.

To address the above issues, several techniques may be used, including the following:

1. A framework that supports joint coding of depth views and texture views.

2. A new inside view motion prediction (IVMP) mode may be used at the macroblock (or other video block or CU) level to enable reuse of motion vectors between depth and texture views. Aspects of the IVMP mode are described in detail in this disclosure.

3. Camera parameters and depth ranges may be added to the sequence parameter set (SPS) or signaled in new supplemental enhancement information (SEI) messages; if these parameters change on a picture basis, a view parameter set (VPS) or an SEI message may be added.

4. The semantics of inter_view_flag may be changed, or a new syntax element may be defined in the network abstraction layer (NAL) unit header, to indicate whether a view component that is not discardable for views with a different resolution is nevertheless discardable for views with the same resolution.

5. In addition to a nal_unit_type (e.g., 21) to be used by depth view components, one example further includes a new nal_unit_type (e.g., 22) for texture view components that are not compatible with H.264/MVC.
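Item 4 above can be made concrete with a small decision function for sub-bitstream extraction. The disclosure does not name the new syntax element, so `needed_for_diff_res` below is a hypothetical stand-in, and the rule "keep the component if any output view has a different resolution" is an assumed reading of the proposed semantics rather than a normative process.

```python
from dataclasses import dataclass


@dataclass
class ViewComponentInfo:
    view_id: int
    resolution: tuple          # (width, height) of this view
    inter_view_flag: int       # existing MVC NAL unit header flag
    # Hypothetical stand-in for the new syntax element: 1 if this component is
    # still needed for inter-view prediction of views with a DIFFERENT resolution.
    needed_for_diff_res: int


def can_discard(vc, output_views):
    """Return True if `vc` may be dropped when only `output_views`
    (a dict mapping view_id -> resolution) are to be decoded and output."""
    if vc.view_id in output_views:
        return False           # the view itself is output, so keep it
    if vc.inter_view_flag:
        return False           # used for inter-view prediction (same resolution)
    # inter_view_flag == 0: discardable for same-resolution views, but an output
    # view with a different resolution may still predict from this component.
    other_res = any(res != vc.resolution for res in output_views.values())
    return not (vc.needed_for_diff_res and other_res)
```

In the asymmetric case of FIG. 8, a half-width view component with inter_view_flag equal to 0 could be dropped when only other half-width views are output, but would be kept when a full-width view is output, which is exactly the gap that item 4 of the issue list describes.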

This disclosure may use the following definitions.

View component: A coded representation of a view in a single access unit. When a view includes both coded texture and depth representations, a view component consists of a texture view component and a depth view component.

Texture view component: A coded representation of the texture of a view in a single access unit.

Depth view component: A coded representation of the depth of a view in a single access unit.

Coded video coding layer (VCL) network abstraction layer (NAL) units in a depth view component may be assigned nal_unit_type 21, a new type of coded slice extension specifically for depth view components. Texture view components and depth view components may also be referred to herein as texture view video blocks and depth view video blocks.

An example bitstream order is described below. In some examples, within each view component, any coded slice NAL unit of the depth view component (with nal_unit_type 21) shall follow all coded slice NAL units of the texture view component. For simplicity, this disclosure may refer to the coded slice NAL units of a depth view component as depth NAL units.

A depth NAL unit may have the same NAL unit header structure as a NAL unit with nal_unit_type equal to 20. FIG. 4 is a conceptual diagram illustrating the bitstream order of VCL NAL units of view components within one access unit.
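Because a depth NAL unit reuses the header structure of nal_unit_type 20, a parser can handle both with the same code. The sketch below reads the one-byte H.264 NAL unit header and, for types 14 and 20 (and, following the proposal above, type 21), the three-byte extension whose MVC form is nal_unit_header_mvc_extension() from H.264 Annex H. Treating type 21 this way reflects this disclosure's proposal and is not part of the published H.264 standard; the function name is hypothetical.

```python
def parse_nal_header(data: bytes) -> dict:
    """Parse the NAL unit header (and MVC extension, if present) of one NAL unit.

    `data` starts at the first header byte (start code already removed)."""
    b0 = data[0]
    hdr = {
        "forbidden_zero_bit": b0 >> 7,
        "nal_ref_idc": (b0 >> 5) & 0x3,
        "nal_unit_type": b0 & 0x1F,
    }
    # Types 14 (prefix) and 20 (coded slice extension) carry a 3-byte extension;
    # the proposal above gives depth NAL units (type 21) the same structure.
    if hdr["nal_unit_type"] in (14, 20, 21):
        ext = int.from_bytes(data[1:4], "big")   # 24 bits
        hdr["svc_extension_flag"] = (ext >> 23) & 0x1
        if hdr["svc_extension_flag"] == 0:       # MVC form of the extension
            hdr.update(
                non_idr_flag=(ext >> 22) & 0x1,      # u(1)
                priority_id=(ext >> 16) & 0x3F,      # u(6)
                view_id=(ext >> 6) & 0x3FF,          # u(10)
                temporal_id=(ext >> 3) & 0x7,        # u(3)
                anchor_pic_flag=(ext >> 2) & 0x1,    # u(1)
                inter_view_flag=(ext >> 1) & 0x1,    # u(1)
                reserved_one_bit=ext & 0x1,          # u(1)
            )
    return hdr
```

The anchor_pic_flag and inter_view_flag fields parsed here are the ones this section relies on: IVMP may be restricted to non-anchor pictures, and inter_view_flag drives the discard decision during view adaptation.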

As shown in FIG. 4, according to this disclosure, an access unit contains multiple NAL units belonging to multiple view components. Each view component may consist of one texture view component and one depth view component. The texture view component of the base view, with view order index (VOIdx) equal to 0, contains one prefix NAL unit (with NAL unit type equal to 14) and one or more AVC VCL NAL units (with NAL unit type equal to, e.g., 1 or 5). The texture view components in the other views contain only MVC VCL NAL units (with NAL unit type equal to 20). In both the base view and the non-base views, the depth view components contain depth NAL units with NAL unit type equal to 21. In any view component, the depth NAL units follow the NAL units of the texture view component in decoding/bitstream order.
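The ordering rule above lends itself to a simple conformance check. The sketch below models an access unit as a list of (VOIdx, nal_unit_type) pairs in bitstream order and reports texture NAL units that appear after the depth NAL units of the same view component, as well as NAL unit types that do not match the base or non-base view. The input representation and function name are assumptions made for illustration.

```python
TEXTURE_BASE = {1, 5, 14}   # AVC slices (non-IDR, IDR) + prefix NAL unit, base view
TEXTURE_NON_BASE = {20}     # MVC coded slice extension, non-base views
DEPTH = {21}                # depth NAL unit type proposed in this disclosure


def check_access_unit(nal_units):
    """nal_units: list of (view_order_idx, nal_unit_type) in bitstream order.

    Returns a list of violation messages; an empty list means the order
    satisfies the texture-before-depth rule for every view component."""
    errors = []
    seen_depth = set()      # VOIdx values whose depth NAL units have started
    for pos, (voidx, nut) in enumerate(nal_units):
        texture_types = TEXTURE_BASE if voidx == 0 else TEXTURE_NON_BASE
        if nut in DEPTH:
            seen_depth.add(voidx)
        elif nut in texture_types:
            if voidx in seen_depth:
                errors.append(f"NAL {pos}: texture of view {voidx} follows its depth")
        else:
            errors.append(f"NAL {pos}: type {nut} not expected for VOIdx {voidx}")
    return errors
```

For example, the order [(0, 14), (0, 5), (0, 21), (1, 20), (1, 21)] passes, while [(0, 14), (0, 21), (0, 1)] is flagged because a base-view texture slice follows that view's depth NAL unit.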

Because a texture view component and its associated depth view component have similar object silhouettes, they typically have similar object boundaries and motion. There is therefore redundancy in their motion fields. A texture view block and a depth view block may be "associated" if they are present in the same NAL unit and/or if they correspond to the same (overlapping) spatial and/or temporal instance of the 3D video data. The techniques of this disclosure may exploit this redundancy to a large extent by allowing a mode in which the depth view component fully adopts the motion information of the associated texture view component, in a manner similar to the so-called "merge" mode. In this case, the depth view component may not include any additional delta values for its motion information, and may instead adopt the motion information of the texture view component as its own. Defining a mode that fully adopts the motion information of the texture view as the motion information of the depth view may improve compression, because no delta values need to be signaled for this motion information.

In particular, motion prediction from a texture view component to its associated depth view component may be enabled by a new mode that merges the motion information of the texture view as the motion information of the depth view. In some examples, this so-called inside view motion prediction (IVMP) mode may be enabled only for inter-coded MBs of depth view components. In IVMP mode, the motion information of the co-located MB in the texture view component, including mb_type, sub_mb_type, reference indices, and motion vectors, is reused by the depth view component of the same view. A flag indicating whether IVMP mode is used may be signaled in each MB; that is, the flag may be defined at the video block level, e.g., the macroblock level. The flag may be included with the depth video blocks (e.g., depth macroblocks). As shown in FIG. 5, the flag may be true for the identified MB in the fourth picture of the depth view, in which case the motion vector of the co-located MB in the fourth picture of the texture view is reused for the highlighted MB in the depth view component. Note that, in some examples, IVMP mode applies only to non-anchor pictures.
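The reuse rule in IVMP mode can be sketched as a function that returns the motion used for a depth MB. The `MbMotion` container and `depth_mb_motion` function are hypothetical; the fields mirror the items the text lists (mb_type, sub_mb_type, reference indices, motion vectors), and rejecting IVMP on anchor pictures models only the "in some examples" restriction mentioned above.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class MbMotion:
    mb_type: str
    sub_mb_type: list = field(default_factory=list)
    ref_idx: list = field(default_factory=list)   # one reference index per partition
    mvs: list = field(default_factory=list)       # one (mvx, mvy) per partition


def depth_mb_motion(ivmp_flag, texture_mb, parsed_depth_mb=None, is_anchor=False):
    """Motion information used for one depth MB.

    texture_mb: motion of the co-located MB in the texture view component
    of the same view and access unit."""
    if ivmp_flag:
        if is_anchor:
            # Assumed restriction: IVMP applies only to non-anchor pictures.
            raise ValueError("IVMP is not used for anchor pictures")
        # mb_type, sub_mb_type, reference indices and motion vectors are all
        # reused unchanged; a deep copy keeps the texture MB's data independent.
        return copy.deepcopy(texture_mb)
    if parsed_depth_mb is None:
        raise ValueError("without IVMP the depth MB must carry its own motion")
    return parsed_depth_mb
```

Copying every field, rather than only the motion vectors, is what distinguishes this mode from ordinary motion vector prediction: partitioning and reference selection are inherited too, so nothing but the flag is needed for the depth MB's motion.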

Moreover, compared with techniques that predict the motion vector of one view from the motion of another view, the techniques of this disclosure may achieve additional compression. For example, some scalable video coding (SVC) techniques may allow motion prediction of an enhancement view based on the motion information of a base view, and in some cases the base view may be a texture view and the enhancement view a depth view. In such cases, however, motion vector difference data (e.g., a delta) is always coded in addition to the prediction information (or flag) indicating that the base view is used to predict the enhancement view. In contrast, the techniques of this disclosure may use an IVMP mode in which no delta information is coded or allowed (e.g., no motion vector difference value is coded or allowed). Instead, in IVMP mode, the motion information of the texture view is adopted as the motion information of the depth view.
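The contrast drawn above can be reduced to what each scheme must signal per motion vector. In the sketch below, both functions return the resulting motion vector together with the list of elements that would be coded; the element names are hypothetical labels for illustration, not syntax element names from SVC or from this disclosure.

```python
def svc_style_mv(base_mv, mvd):
    """Inter-layer style prediction: predictor from the base view plus a coded
    motion vector difference, which is present even when it is zero."""
    mv = (base_mv[0] + mvd[0], base_mv[1] + mvd[1])
    coded = ["prediction_flag", "mvd_x", "mvd_y"]   # hypothetical labels
    return mv, coded


def ivmp_mv(texture_mv):
    """IVMP: the texture motion vector is adopted as-is; no difference is coded."""
    return texture_mv, ["ivmp_flag"]                # hypothetical label
```

Even when the depth motion is identical to the texture motion (a zero difference), the SVC-style path still spends bits on the difference, whereas the IVMP path codes only the flag.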

When the motion information of the texture view is adopted as the motion information of the depth view, the decoder can decode the depth view (e.g., the corresponding depth block) using the motion information of the texture view (e.g., the texture block), without receiving or decoding any other motion information for the depth view. In particular, the decoder can be configured to interpret the IVMP flag in this manner. Thus, when the IVMP flag is enabled, motion information may be omitted from the depth video block, and the decoder can be configured to recognize that an enabled IVMP flag means that the motion information for the depth video block is to be obtained from the corresponding texture video block.
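On the parsing side, the flag decides whether any motion syntax follows for the depth MB. The sketch below works on an iterator of already entropy-decoded syntax element values; the element order, the `num_partitions` parameter (which a real decoder would derive from mb_type), and the omission of motion vector prediction are all simplifying assumptions.

```python
def parse_depth_mb(symbols, texture_motion, num_partitions):
    """Parse the motion part of one depth MB.

    symbols: iterator over already entropy-decoded syntax element values.
    Returns (motion, taken_from_texture)."""
    ivmp_flag = next(symbols)
    if ivmp_flag:
        # No motion syntax follows: use the co-located texture MB's motion.
        return texture_motion, True
    mb_type = next(symbols)
    partitions = []
    for _ in range(num_partitions):
        ref_idx = next(symbols)
        mvd = (next(symbols), next(symbols))   # MV prediction omitted here
        partitions.append((ref_idx, mvd))
    return {"mb_type": mb_type, "partitions": partitions}, False
```

When the flag is 1, the parser consumes exactly one value and leaves the rest of the MB's syntax (for example, residual data) untouched, which is how the decoder "knows" the motion information is absent from the depth block.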

An encoder consistent with this disclosure may generally follow the joint multiview video coding (JMVC) encoder scheme, in which views are encoded one by one. Within each view, the texture sequence is encoded first and the depth sequence is encoded next.

When IVMP mode is enabled, the motion field of each texture view component is written to a motion file during texture view component encoding; the name of this file can be specified in the configuration file. When the associated depth sequence of the same view is encoded, the motion file can be read for reference.
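The two-pass flow for one view (texture first, motion written to a file, depth second with the file read back) can be sketched as below. The JSON file format, the `estimate_motion` callback, and the function name are assumptions for illustration; the actual JMVC-based motion file format is not described here, and real encoding of the pictures is omitted.

```python
import json
import os
import tempfile


def encode_view(texture_pictures, depth_pictures, motion_path, estimate_motion):
    """Hypothetical per-view loop: texture pass, then depth pass.

    estimate_motion(picture) -> list of per-MB (mvx, mvy) tuples."""
    # Pass 1: "encode" the texture sequence and record each motion field.
    motion = {str(poc): estimate_motion(pic) for poc, pic in enumerate(texture_pictures)}
    with open(motion_path, "w") as f:
        json.dump(motion, f)          # tuples are stored as JSON arrays

    # Pass 2: encode the depth sequence, reading the texture motion back.
    with open(motion_path) as f:
        stored = json.load(f)
    reused = []
    for poc, _depth_pic in enumerate(depth_pictures):
        reused.append([tuple(mv) for mv in stored[str(poc)]])
    return reused


if __name__ == "__main__":
    # The motion file name would normally come from the configuration file.
    path = os.path.join(tempfile.mkdtemp(), "motion_view0.json")
    print(encode_view(["t0", "t1"], ["d0", "d1"], path, lambda pic: [(int(pic[1]), -1)]))
```

Writing the motion field to a file keeps the texture and depth passes decoupled, which matches encoding the two sequences one after the other rather than interleaving them picture by picture.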

The decoder may be similar to the JMVC decoder, modified in some aspects to also decode and output the depth sequence of each view. When IVMP mode is enabled, the motion of each texture view component is stored and adopted as the motion of the corresponding depth view. For any blocks in which IVMP mode is disabled, the depth view may include its own motion information, or may include other syntax elements that identify where to obtain, predict, and/or adopt its motion information. If IVMP mode is enabled, however, the depth view does not include its own motion information, and the decoder obtains the motion information from the corresponding texture view component. Thus, when IVMP mode is enabled, the depth view video block adopts the motion information of the corresponding texture view video block, so that the depth view video block does not include its own motion information.

The following description of FIGS. 1, 2, and 3 explains some example scenarios in which the MVC-based 3DVC techniques of this disclosure may be used.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data via a link 16. Link 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Alternatively, encoded data may be output from output interface 22 to a storage device 32. Similarly, encoded data may be accessed from storage device 32 by an input interface. Storage device 32 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 32 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access stored video data from storage device 32 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 32 may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure, however, are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored onto storage device 32 for later access by destination device 14 or other devices, for decoding and/or playback.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 31. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives the encoded video data over link 16. The encoded video data communicated over link 16, or provided on storage device 32, may include a variety of syntax elements generated by video encoder 20 for use by a video decoder, such as video decoder 30, in decoding the video data. Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.

Display device 31 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may itself be a display device. In general, display device 31 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263. Proprietary coding techniques, such as those referred to as On2 VP6/VP7/VP8, may also implement one or more of the techniques described herein.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some examples, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. A treeblock has a purpose similar to that of a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes. Treeblocks may be referred to as LCUs in some examples.
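The recursive splitting described above can be illustrated with a short Python sketch. It assumes square blocks and uses a hypothetical `should_split` callback in place of the encoder's actual decision (which would be signaled in the bitstream, e.g., as split flags); `min_size` plays the role of the minimum coding node size defined by the syntax data.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in luma samples


def split_treeblock(x: int, y: int, size: int, min_size: int,
                    should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a treeblock (LCU) into leaf coding units (CUs)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Visit the four children in z-order: top-left, top-right,
        # bottom-left, bottom-right; each child may split again.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_treeblock(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]  # leaf node: an unsplit coding node
```

With a 64x64 treeblock, `min_size=8`, and a callback that splits the root and then only the top-left 32x32 child, the function returns four 16x16 CUs followed by three 32x32 CUs, whose areas add up to the full 64x64 treeblock.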

A CU includes a coding node and the prediction units (PUs) and transform units (TUs) associated with the coding node. The size of the CU corresponds to the size of the coding node, and the CU must be square in shape. The size of the CU may range from 8x8 pixels up to the size of the treeblock, with a maximum of 64x64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square (e.g., rectangular) in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size as or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quad tree" (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.

In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more transform units (TUs). Following prediction, video encoder 20 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding. This disclosure typically uses the term "video block" to refer to a coding node of a CU. In some specific cases, this disclosure may also use the term "video block" to refer to a treeblock, i.e., an LCU, or a CU, which includes a coding node and PUs and TUs.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, in a header of one or more of the pictures, or elsewhere, that describes the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N x 2N, the HM supports intra-prediction in PU sizes of 2N x 2N or N x N, and inter-prediction in symmetric PU sizes of 2N x 2N, 2N x N, N x 2N, or N x N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N x nU, 2N x nD, nL x 2N, and nR x 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N x nU" refers to a 2N x 2N CU that is partitioned horizontally, with a 2N x 0.5N PU on top and a 2N x 1.5N PU on bottom.
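The PU dimensions implied by each partition mode can be tabulated with a small Python function. The mode strings and the function name are illustrative assumptions; dimensions are given as (width, height), so "2N x nU" yields a 2N x 0.5N PU above a 2N x 1.5N PU.

```python
from typing import List, Tuple


def pu_partitions(mode: str, cu_size: int) -> List[Tuple[int, int]]:
    """Return (width, height) of each PU for a 2Nx2N CU with side cu_size.

    Intra prediction uses only 2Nx2N and NxN; all modes apply to inter prediction.
    """
    n = cu_size // 2   # N
    q = cu_size // 4   # 0.5N, i.e., the 25% share of the split direction
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "2NxN": [(cu_size, n)] * 2,
        "Nx2N": [(n, cu_size)] * 2,
        "NxN": [(n, n)] * 4,
        # Asymmetric modes: the 25% partition is on the named side.
        "2NxnU": [(cu_size, q), (cu_size, cu_size - q)],
        "2NxnD": [(cu_size, cu_size - q), (cu_size, q)],
        "nLx2N": [(q, cu_size), (cu_size - q, cu_size)],
        "nRx2N": [(cu_size - q, cu_size), (q, cu_size)],
    }
    return table[mode]
```

For a 32 x 32 CU, `pu_partitions("2NxnU", 32)` returns `[(32, 8), (32, 24)]`.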

In this disclosure, "N x N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16 x 16 pixels or 16 by 16 pixels. In general, a 16 x 16 block will have 16 pixels in a vertical direction (y = 16) and 16 pixels in a horizontal direction (x = 16). Likewise, an N x N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise N x M pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (also referred to as the pixel domain), and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 20 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
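As a rough illustration of the residual and transform steps, the sketch below computes a residual block and applies a floating-point orthonormal 2-D DCT-II. This is a simplification under stated assumptions: H.264/AVC and HEVC specify integer approximations of the DCT, which this direct O(N^4) formula does not reproduce, and the function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]


def residual(orig: Block, pred: Block) -> Block:
    """Pixel differences between the original block and its prediction."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block, computed directly."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * acc
    return out
```

A constant residual of 2 over a 4 x 4 block compacts into a single DC coefficient of 8 (the 16 samples summed and scaled by 1/4), with every other coefficient essentially zero.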

Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
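The bit-depth reduction mentioned above, and the more general idea of uniform scalar quantization, can be sketched as follows. Both functions are hypothetical simplifications; real codecs derive the step size from a quantization parameter and apply rounding offsets, which are omitted here.

```python
from typing import List


def truncate_bits(value: int, n: int, m: int) -> int:
    """Round an n-bit magnitude down to m bits by dropping its (n - m) LSBs."""
    if not n > m:
        raise ValueError("n must be greater than m")
    sign = -1 if value < 0 else 1
    return sign * (abs(value) >> (n - m))


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform scalar quantization; int() truncates toward zero."""
    return [int(c / step) for c in coeffs]
```

For instance, `truncate_bits(182, 8, 4)` keeps the four most significant bits of the 8-bit value 0b10110110 and returns 11 (0b1011).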

In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding methodology. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
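A predefined scan order of the kind described above can be sketched with the classic zig-zag pattern. This is only one example of a fixed scan (H.264/AVC uses a zig-zag scan for frame-coded blocks, while HEVC defines other scan patterns); the function names are assumptions for illustration.

```python
from typing import List, Tuple


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order: List[Tuple[int, int]] = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()     # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block: List[List[int]]) -> List[int]:
    """Serialize a square block of quantized coefficients into a 1-D vector."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Scanning the 4 x 4 block `[[9, 3, 0, 0], [2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]` yields `[9, 3, 2, 1, 1]` followed by eleven zeros, which shows how the scan gathers the low-frequency coefficients at the front of the vector.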

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on the context assigned to the symbol.
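The principle that more probable symbols receive shorter codewords can be demonstrated with a Huffman construction. Note that CAVLC itself selects among predefined code tables rather than building a Huffman code at run time, so this is a generic illustration of variable length coding, not the CAVLC procedure; the function names are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code(probs: Dict[str, float]) -> Dict[str, str]:
    """Build a prefix code in which more probable symbols get shorter codewords."""
    if len(probs) == 1:
        return {sym: "0" for sym in probs}
    tiebreak = count()  # unique second key so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing 0 and 1.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def average_length(probs: Dict[str, float], code: Dict[str, str]) -> float:
    """Expected codeword length in bits per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())
```

With probabilities {a: 0.5, b: 0.25, c: 0.125, d: 0.125}, the resulting code uses 1, 2, 3, and 3 bits respectively, for an average of 1.75 bits per symbol versus 2 bits for an equal-length code.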

FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement techniques in accordance with this disclosure. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-directional prediction (B mode), may refer to any of several temporal-based compression modes.

In the example of FIG. 2, video encoder 20 includes a partitioning unit 35, prediction module 41, reference picture memory 64, summer 50, transform module 52, quantization unit 54, and entropy encoding unit 56. Prediction module 41 includes motion estimation unit 42, motion compensation unit 44, and intra prediction module 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform module 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter.

As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction module 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction module 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.
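Selecting a mode based on error results such as coding rate and distortion is commonly done by minimizing a Lagrangian rate-distortion cost J = D + lambda * R. The sketch below assumes that cost function, which the text does not prescribe; the function name, the candidate mode names, and the numbers in the usage note are made up for illustration.

```python
from typing import Dict, Tuple


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the mode with the smallest cost J = D + lam * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    """
    def cost(mode: str) -> float:
        distortion, rate = candidates[mode]
        return distortion + lam * rate

    return min(candidates, key=cost)
```

With `candidates = {"intra_dc": (1200.0, 40), "inter_16x16": (900.0, 95)}`, `lam=4.0` selects `"inter_16x16"` (cost 1280 versus 1360), while `lam=10.0` selects `"intra_dc"` (1600 versus 1850), showing how a larger lambda favors modes that are cheaper to code.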

Intra prediction module 46 within prediction module 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block, to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction module 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures, to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for the video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within the current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block found to closely match the PU of the video block to be coded in terms of pixel difference, which may be measured by the sum of absolute differences (SAD), the sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values at one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of a reference picture. Motion estimation unit 42 may therefore perform a motion search over both full and fractional pixel positions and output a motion vector with fractional pixel precision.
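The two matching metrics named above can be sketched as follows. This is a minimal illustration, not the encoder's actual implementation. It assumes blocks are lists of rows of integer sample values, and the function names `sad` and `ssd` are hypothetical.

```python
# Block-matching cost metrics: sum of absolute differences (SAD) and
# sum of squared differences (SSD). Blocks are assumed to be equally sized
# lists of rows of integer sample values.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences; penalizes large errors more than SAD."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


cur = [[10, 12], [14, 16]]
ref = [[11, 12], [12, 19]]
print(sad(cur, ref), ssd(cur, ref))  # prints 6 14
```

SAD is cheaper to compute. SSD corresponds more directly to squared-error distortion measures such as PSNR.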

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
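The position comparison described above can be illustrated with an integer-pel full search. Every displacement in a square window is tried, and the one with the lowest SAD becomes the motion vector. This is only a sketch under simplifying assumptions: an exhaustive search, integer precision only, and candidates outside the picture skipped rather than padded. Practical encoders typically use faster search patterns and fractional refinement. All function names are hypothetical.

```python
# Integer-pel full-search motion estimation using SAD as the matching cost.
# Pictures are lists of rows of samples; (x, y) is the top-left corner.

def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def get_block(picture, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def full_search(cur_pic, ref_pic, bx, by, w, h, search_range):
    """Return ((dx, dy), cost) for the w x h block at (bx, by) in cur_pic.

    The motion vector (dx, dy) is the displacement of the best-matching
    block in ref_pic relative to the current block's position.
    """
    pic_h, pic_w = len(ref_pic), len(ref_pic[0])
    cur = get_block(cur_pic, bx, by, w, h)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            # Skip candidates that would fall outside the reference picture.
            if x < 0 or y < 0 or x + w > pic_w or y + h > pic_h:
                continue
            cost = sad(cur, get_block(ref_pic, x, y, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

For example, if the content of the current block appears two samples to the right and one sample down in the reference picture, `full_search` returns `((2, 1), 0)`.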

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, producing pixel difference values. The pixel difference values form the residual data for the block and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice, for use by video decoder 30 in decoding the video blocks of the video slice.

Intra prediction module 46 may intra-predict the current block as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra prediction module 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra prediction module 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes, and intra prediction module 46 (or, in some examples, mode select unit 40) may select an appropriate intra-prediction mode from the tested modes. For example, intra prediction module 46 may calculate rate-distortion values using a rate-distortion analysis of the various tested intra-prediction modes, and select the intra-prediction mode with the best rate-distortion characteristics among them. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce it, as well as the bit rate (that is, the number of bits) used to produce the encoded block. Intra prediction module 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
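The text describes comparing distortions and rates across tested modes. A common way to combine the two is a Lagrangian cost J = D + λ·R. The sketch below uses that Lagrangian form as a stand-in, since the passage does not say how the ratios are formed. The candidate numbers are made up, and the function names are hypothetical.

```python
# Rate-distortion mode decision using a Lagrangian cost J = D + lam * R.
# Each candidate is (mode_name, distortion, rate_in_bits), as would be
# measured by trial-encoding the block in that mode.

def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost."""
    return distortion + lam * rate_bits


def select_mode(candidates, lam):
    """Return the candidate tuple with the smallest Lagrangian cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


candidates = [("DC", 1200, 20), ("vertical", 900, 45), ("horizontal", 1000, 30)]
print(select_mode(candidates, lam=10)[0])  # prints horizontal
print(select_mode(candidates, lam=2)[0])   # prints vertical
```

A larger λ favors cheaper modes (fewer bits), and a smaller λ favors lower distortion. This is why the chosen mode changes between the two calls.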

In some cases, prediction module 41 may select an IVMP mode for coding one or more depth video blocks. In this case, the motion information of the corresponding texture video block may be adopted for the depth block, as described herein. The depth block and the texture block may be coded in the same NAL unit, and an IVMP flag may be encoded so that a decoder can properly decode the depth video block by reusing the motion information of the corresponding texture view video block.

In any case, after selecting an intra-prediction mode for a block, intra prediction module 46 may provide information indicating the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include configuration data in the transmitted bitstream. This data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and, for each of the contexts, indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use.

After prediction module 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform module 52. Transform module 52 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform module 52 may convert the residual video data from the pixel domain to a transform domain, such as the frequency domain.
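As one concrete example of a "conceptually similar transform", H.264/AVC uses a 4x4 integer approximation of the DCT, applied as Y = Cf · X · Cfᵀ. The sketch below shows only that core transform. The per-coefficient normalization that H.264 folds into quantization is omitted. The helper names are hypothetical.

```python
# Forward 4x4 integer core transform of H.264/AVC: Y = CF * X * CF^T.
# Scaling/normalization (folded into quantization in H.264) is omitted.

CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(residual_4x4):
    """Transform a 4x4 residual block into 4x4 (unscaled) coefficients."""
    return matmul(matmul(CF, residual_4x4), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])  # prints 16
```

A flat residual block puts all of its energy into the DC coefficient Y[0][0]. This compaction is what the subsequent quantization and entropy coding exploit.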

Transform module 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
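Quantization and scanning can be sketched with the H.264 relationship between the quantization parameter (QP) and the quantizer step size Qstep, which doubles every 6 QP values. The example also uses the conventional 4x4 zig-zag order. This is a simplification: real H.264 quantization uses integer multipliers and shifts that also absorb the transform normalization, whereas here coefficients are divided by Qstep directly. The rounding offset of 0.5 is also an assumption. Function names are hypothetical.

```python
# Simplified scalar quantization driven by an H.264-style QP, plus zig-zag scan.

# Qstep for QP 0..5; Qstep doubles for every increase of 6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def quantize(coeffs, qp, rounding=0.5):
    """Map each coefficient to an integer level: sign(c) * floor(|c|/Qstep + rounding)."""
    step = qstep(qp)
    out = []
    for row in coeffs:
        levels = []
        for c in row:
            level = int(abs(c) / step + rounding)
            levels.append(level if c >= 0 else -level)
        out.append(levels)
    return out


def dequantize(levels, qp):
    """Inverse quantization (decoder side): reconstructed value = level * Qstep."""
    step = qstep(qp)
    return [[lv * step for lv in row] for row in levels]


# Raster indices of a 4x4 block visited in zig-zag order (low to high frequency).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block_4x4):
    flat = [v for row in block_4x4 for v in row]
    return [flat[i] for i in ZIGZAG_4X4]
```

For example, at QP 28 the step size is 16, so `quantize([[40, -24]], 28)` yields `[[3, -2]]`. Raising QP by 6 halves the levels and roughly halves the coefficient precision.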

Following quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology or technique. Following entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements for the current video slice being coded.
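CAVLC and CABAC are too involved for a short example. The sketch below therefore shows a simpler variable-length code that H.264 uses for many syntax elements: the unsigned and signed Exp-Golomb codes ue(v) and se(v). It illustrates the idea of variable-length entropy coding only. It is not how entropy encoding unit 56 codes transform coefficients. Codewords are returned as bit strings for readability, and the function names are hypothetical.

```python
# Unsigned and signed Exp-Golomb codes, as bit strings.

def ue(k):
    """ue(v): write M leading zeros followed by the (M+1)-bit binary form of k+1."""
    if k < 0:
        raise ValueError("ue(v) is defined for non-negative integers only")
    info = bin(k + 1)[2:]              # binary of k+1; its first bit is always '1'
    return "0" * (len(info) - 1) + info


def se(v):
    """se(v): map 1, -1, 2, -2, ... to codeNum 1, 2, 3, 4, ... then apply ue(v)."""
    k = 2 * v - 1 if v > 0 else -2 * v
    return ue(k)


def decode_ue(bits, pos=0):
    """Decode one ue(v) codeword starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    value = int(bits[start:start + zeros + 1], 2) - 1
    return value, start + zeros + 1


print(ue(0), ue(1), ue(3), se(-1))  # prints 1 010 00100 011
```

Small values get short codewords, so syntax elements that are usually zero or near zero cost very few bits.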

Inverse quantization unit 58 and inverse transform module 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64. Motion estimation unit 42 and motion compensation unit 44 may use the reference block to inter-predict a block in a subsequent video frame or picture.

FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG. 3, video decoder 30 includes entropy decoding unit 80, prediction module 81, inverse quantization unit 86, inverse transform module 88, summer 90, and reference picture memory 92. Prediction module 81 includes motion compensation unit 82 and intra prediction module 84. Video decoder 30 may, in some examples, perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20 of FIG. 2.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video bitstream that represents the video blocks of an encoded video slice and the associated syntax elements. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction module 81. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra prediction module 84 of prediction module 81 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P, or GPB) slice, motion compensation unit 82 of prediction module 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in reference picture memory 92.
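The passage does not spell out the "default construction techniques". As one example, the H.264 default initialization of List 0 and List 1 for a B frame orders short-term reference pictures by picture order count (POC). The sketch below covers only that simplified case: short-term references in frame coding. Long-term references, field coding, and list modification commands are omitted, and the function name is hypothetical.

```python
# Simplified H.264-style default initialization of List 0 / List 1 for a
# B frame, using only short-term reference pictures identified by POC.

def init_b_lists(current_poc, short_term_pocs):
    before = sorted((p for p in short_term_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in short_term_pocs if p > current_poc)
    list0 = before + after   # closest past pictures first, then future pictures
    list1 = after + before   # closest future pictures first, then past pictures
    # When List 1 has more than one entry and is identical to List 0,
    # its first two entries are swapped.
    if len(list1) > 1 and list1 == list0:
        list1[0], list1[1] = list1[1], list1[0]
    return list0, list1


print(init_b_lists(4, [0, 2, 8, 6]))  # prints ([2, 0, 6, 8], [6, 8, 2, 0])
```

For P slices, the default List 0 is instead ordered by descending picture number. That case is not shown here.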

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine the prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, the inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-coded video block of the slice, the inter-prediction status for each inter-coded video block of the slice, and other information needed to decode the video blocks in the current video slice.

In some cases, prediction module 81 may interpret a flag in a NAL unit and select an IVMP mode for decoding one or more depth video blocks of the NAL unit. In this case, the motion information of the corresponding texture video block may be adopted for the depth block, as described herein. The depth block and the texture block may be coded in the same NAL unit, and the IVMP flag may be decoded from the bitstream so that video decoder 30 can properly decode the depth video block by reusing the motion information of the corresponding texture view video block.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use the same interpolation filters that video encoder 20 used during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use those filters to produce the predictive blocks.
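As an example of such an interpolation filter, H.264/AVC derives luma half-sample values with the 6-tap filter (1, −5, 20, 20, −5, 1)/32, with rounding and clipping to the sample range. The one-dimensional sketch below illustrates that filter. Edge samples are repeated to imitate reference-picture padding. Quarter-sample averaging and the two-dimensional case are omitted. Function names are hypothetical.

```python
# 1-D luma half-sample interpolation with the H.264 6-tap filter.

TAPS = (1, -5, 20, 20, -5, 1)


def clip1(v, bit_depth=8):
    """Clip to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, v))


def half_pel(samples, i):
    """Half-sample value between samples[i] and samples[i + 1]."""
    n = len(samples)
    acc = 0
    for k, tap in enumerate(TAPS):
        idx = min(max(i - 2 + k, 0), n - 1)   # repeat edge samples outside the row
        acc += tap * samples[idx]
    return clip1((acc + 16) >> 5)             # divide by 32 with rounding


print(half_pel([0, 10, 20, 30, 40, 50], 2))  # prints 25
```

On a linear ramp, the filter returns the exact midpoint. Near sharp edges, its negative taps can overshoot, which is why the result is clipped.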

Inverse quantization unit 86 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include using a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform module 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients to produce residual blocks in the pixel domain.

After prediction module 81 generates the predictive block for the current video block based on either inter- or intra-prediction, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform module 88 with the corresponding predictive blocks generated by prediction module 81. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks to remove blockiness artifacts. Other loop filters (either in the coding loop or after it) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores the reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores decoded video for later presentation on a display device, such as display device 31 of FIG. 1.
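The summation at summer 90 mirrors the subtraction at the encoder's summer 50. The sketch below shows both operations on sample blocks. It assumes 8-bit samples by default and clips the reconstructed values to the valid range, which codecs do because the prediction plus a lossy residual can fall outside it. Function names are hypothetical.

```python
# Residual formation (encoder side) and reconstruction with clipping (decoder side).

def residual(cur, pred):
    """Encoder: residual = current block - predictive block, sample by sample."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def reconstruct(pred, resid, bit_depth=8):
    """Decoder: decoded = clip(predictive block + residual) to [0, 2^bit_depth - 1]."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, resid)]


cur = [[100, 200], [50, 255]]
pred = [[90, 210], [60, 250]]
print(residual(cur, pred))                            # prints [[10, -10], [-10, 5]]
print(reconstruct(pred, residual(cur, pred)) == cur)  # prints True
```

With a lossless residual the round trip is exact. In practice, the residual the decoder adds has been quantized, so the decoded block only approximates the original.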

In 3D video coding, a texture view component and its associated depth view component may have similar object silhouettes, and these different view components may have similar object boundaries and motion. Thus, there is redundancy in the motion fields of an associated texture view component and depth view component. The techniques of this disclosure may exploit this redundancy to a greater extent than conventional techniques by allowing a mode in which the depth view component fully adopts the motion information of the texture view component, in a manner similar to the so-called "merge" mode. In this case, the depth view component may not include any additional delta values for its motion information (i.e., may not include any motion vector difference values), and instead adopts the motion information of the texture view component as its own.

Specifically, motion prediction from a texture view component to the associated depth view component may be enabled according to a new mode that merges the motion information of the texture view as the motion information of the depth view. In some examples, this so-called IVMP mode may be enabled only for inter-coded MBs of depth view components. In IVMP mode, the motion information of the co-located MB in the texture view component, including its mb_type, sub_mb_type, reference indices, and motion vectors, is reused by the depth view component of the same view. A flag may be signaled in each MB to indicate whether that MB uses the IVMP mode. As shown in FIG. 5, the flag may be true for the identified MB in the fourth picture of the depth view, in which case the motion vector of the co-located MB in the fourth picture of the texture view is reused for the highlighted MB in the depth view component. Note that in some examples, the IVMP mode applies only to non-anchor pictures. The term "anchor picture" may be defined as any random access point (RAP) other than an instantaneous decoding refresh (IDR) picture.
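The decoder-side behavior described above can be sketched as follows. When a depth MB's IVMP flag is set, the decoder copies the co-located texture MB's motion information wholesale and parses no motion vector differences. The field list (mb_type, sub_mb_type, reference indices, motion vectors) comes from the text. Everything else is a hypothetical illustration: the `MbMotion` class, the `depth_mb_motion` function, the mb_type strings, and the choice to raise errors for anchor pictures and intra-coded texture MBs.

```python
# Hypothetical sketch of IVMP: a depth MB adopts the co-located texture MB's
# motion information when its IVMP flag is set.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MbMotion:
    mb_type: str                   # e.g. "P_L0_16x16" (illustrative value)
    sub_mb_type: List[str]         # empty unless the MB is split into sub-MBs
    ref_idx: List[int]             # one reference index per partition
    mvs: List[Tuple[int, int]]     # one motion vector per partition
    is_inter: bool = True


def depth_mb_motion(mb_addr: int,
                    ivmp_flag: bool,
                    is_anchor_picture: bool,
                    texture_motion: Dict[int, MbMotion],
                    explicit_motion: Optional[MbMotion] = None) -> MbMotion:
    """Return the motion information used for the depth MB at mb_addr."""
    if ivmp_flag:
        if is_anchor_picture:
            # Assumed restriction: IVMP applies only to non-anchor pictures.
            raise ValueError("IVMP flag set in an anchor picture")
        tex = texture_motion[mb_addr]   # co-located MB of the same view
        if not tex.is_inter:
            raise ValueError("co-located texture MB is not inter-coded")
        # Adopt everything as-is: no motion vector difference is parsed or added.
        return MbMotion(tex.mb_type, list(tex.sub_mb_type),
                        list(tex.ref_idx), list(tex.mvs), True)
    if explicit_motion is None:
        raise ValueError("a non-IVMP MB needs its own decoded motion information")
    return explicit_motion
```

The copy is made field by field so that later changes to the depth MB's motion do not alter the stored texture motion field.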

As noted above, the techniques of this disclosure may achieve further compression compared with conventional techniques that predict a motion vector for one view based on the motion of another view. For example, some conventional scalable techniques may allow motion prediction of an enhancement view based on the motion information of a base view, and in some cases the base view may be a texture view and the enhancement view a depth view. In such cases, however, a motion vector difference value (e.g., a delta) is always coded in addition to the prediction information (or flag) indicating that the base view is used to predict the enhancement view. In contrast, the techniques of this disclosure may use an IVMP mode in which no delta is coded or allowed. Instead, in IVMP mode, the motion information of the texture view is adopted as the motion information of the depth view.
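The saving can be made concrete by counting bits. Even a zero delta costs bits when it must always be coded. The sketch below assumes a 1-bit flag and codes each MVD component with signed Exp-Golomb se(v). Both are stand-ins chosen for illustration: in practice the flag and MVDs would be entropy coded with CAVLC or CABAC, and the true bit counts differ. The function names are hypothetical.

```python
# Illustrative per-block motion signaling cost: IVMP (flag only) versus
# "predict from the texture view plus always-coded delta" (flag + MVD).

def ue_len(k):
    """Length in bits of the unsigned Exp-Golomb codeword for k >= 0."""
    return 2 * (k + 1).bit_length() - 1


def se_len(v):
    """Length in bits of the signed Exp-Golomb codeword for v."""
    return ue_len(2 * v - 1 if v > 0 else -2 * v)


def signalled_motion_bits(mvd, use_ivmp):
    """Assumed cost: a 1-bit flag, plus both MVD components unless IVMP is used."""
    if use_ivmp:
        return 1
    return 1 + se_len(mvd[0]) + se_len(mvd[1])


print(signalled_motion_bits((0, 0), use_ivmp=True))   # prints 1
print(signalled_motion_bits((0, 0), use_ivmp=False))  # prints 3
print(signalled_motion_bits((1, -1), use_ivmp=False)) # prints 7
```

Under these assumptions, a zero delta still triples the per-block motion cost. Because texture and depth motion are highly redundant, many depth blocks would otherwise pay that cost.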

Details of several additional techniques for signaling compressed video data are described below. A view parameter set (VPS) may be signaled "in-band," meaning that the parameter set is associated with the coded picture and transmitted together with it in one channel or session. If present in an access unit (AU), which is the coded representation of one time instance of the bitstream, the VPS may be required to precede any VCL NAL units. Multiple frames may carry duplicate copies of the same VPS to improve error resiliency.
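The ordering constraint above lends itself to a simple conformance check. The sketch below models an access unit as an ordered list of NAL unit labels. The labels ("VPS", "SPS", "VCL", and so on) are hypothetical stand-ins for real NAL unit types, since the text does not give the VPS NAL unit type value. The function name is also hypothetical.

```python
# Check that every VPS in an access unit precedes the first VCL NAL unit.

def vps_placement_ok(nal_units):
    """nal_units: ordered NAL unit labels of one access unit, e.g. ["VPS", "VCL"]."""
    first_vcl = next((i for i, t in enumerate(nal_units) if t == "VCL"),
                     len(nal_units))
    return all(i < first_vcl for i, t in enumerate(nal_units) if t == "VPS")


print(vps_placement_ok(["VPS", "SPS", "VCL", "VCL"]))  # prints True
print(vps_placement_ok(["VCL", "VPS"]))                # prints False
```

An access unit without a VPS trivially passes. The constraint applies only when a VPS is present.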

In some examples, the techniques of this disclosure may address inter_view_flag and extend its semantics. In one example, inter_view_flag equal to 0 specifies that the current view component is not used for inter-view prediction by any other view component in the current access unit, whether that view component has the same or a different spatial resolution. In this example, inter_view_flag equal to 1 specifies that the current view component may be used for inter-view prediction by other view components in the current access unit.

The value of inter_view_flag may be the same for all VCL NAL units of a view component.

In one embodiment, the left and right views are half resolution and the center view is full resolution. In an asymmetric 3DV profile, this flag may be set to 1 for the right view, for example. However, if the MVC sub-bitstream is extracted, this flag need not be 1.

Define a flag called inter_asy_view_flag:

In some examples, inter_asy_view_flag equal to 0 specifies that the current view component is not used for inter-view prediction by any other view component in the current access unit that has a different spatial resolution. inter_asy_view_flag equal to 1 specifies that the current view component may be used for inter-view prediction by other view components in the current access unit that have a different spatial resolution.

In the above example, NAL units in the left view may have inter_view_flag equal to 1 and inter_asy_view_flag equal to 1. NAL units in the right view may have inter_view_flag equal to 0 and inter_asy_view_flag equal to 1, and in the center view, all NAL units may have both flags equal to 0.
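One way to read the flag values in this example is as follows. In the asymmetric profile, inter_view_flag governs referencing by views of the same resolution, and inter_asy_view_flag governs referencing by views of a different resolution. The sketch encodes that reading. It is an assumed interpretation chosen because it makes the example self-consistent, not the normative semantics. The `FLAGS` table, the `half_res` field, and the function name are hypothetical.

```python
# Hypothetical reading of the left/right/center example: which view may serve
# as an inter-view reference for which, given the per-view flags.

FLAGS = {
    "left":   {"inter_view_flag": 1, "inter_asy_view_flag": 1, "half_res": True},
    "right":  {"inter_view_flag": 0, "inter_asy_view_flag": 1, "half_res": True},
    "center": {"inter_view_flag": 0, "inter_asy_view_flag": 0, "half_res": False},
}


def may_be_inter_view_reference(ref_view, referencing_view, flags=FLAGS):
    """Assumed rule: same resolution -> inter_view_flag; different -> inter_asy_view_flag."""
    ref = flags[ref_view]
    same_res = ref["half_res"] == flags[referencing_view]["half_res"]
    if same_res:
        return ref["inter_view_flag"] == 1
    return ref["inter_asy_view_flag"] == 1


print(may_be_inter_view_reference("left", "right"))   # prints True
print(may_be_inter_view_reference("right", "center")) # prints True
print(may_be_inter_view_reference("center", "left"))  # prints False
```

Under this reading, the left view can be referenced by both other views, the right view only by the full-resolution center view, and the center view by none.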

This disclosure may provide a response to the call for proposals (CfP) on 3D video coding issued by MPEG. The proposal is based on the H.264/MVC reference software JMVC, with several enhancements and additions that may include joint coding of the texture and depth of multiple views. The proposal of this disclosure may include joint coding of texture and depth, prediction from texture to depth within a view, and asymmetric coding of view components with different resolutions. In the proposal, the MPEG view synthesis software may be used for view generation without any modification.

Compared with the JMVC 8.3.1 anchor, the proposal of this disclosure may achieve a rate reduction of up to 22.6% (11.7% on average) for the 2-view case, and up to 15.8% (7.3% on average) for the 3-view case. These figures apply when the bitrates are the total bitrates of both the texture and depth of two views, and the peak signal-to-noise ratio (PSNR) values are the average PSNR values of the two decoded texture views.

2-뷰 케이스에 대해, 총 비트레이트들 대 합성된 뷰의 PSNR 값들이 이용되는 경우, BD 레이트 감소는 최대 24.7% (및 평균적으로 13.9%) 이며, 3-뷰 케이스에 대해, 총 비트레이트들 대 두개의 합성 뷰들의 평균 PSNR 값들이 이용되는 경우, BD 레이트 감소는 최대 19.0% (및 평균적으로 15.0%) 이다.For 2-view case, when the total bitrates versus PSNR values of the synthesized view are used, the BD rate reduction is up to 24.7% (and on average 13.9%), and for the 3-view case, the total bitrates If average PSNR values of the two composite views are used, the BD rate reduction is up to 19.0% (and 15.0% on average).

This disclosure may also provide the following:

● Compatibility with both the H.264/AVC High Profile and the H.264/MVC Stereo High Profile, and a possible Multiview High Profile;

● Joint coding of texture and depth for multiview sequences;

● Symmetric spatial and temporal resolutions for the texture and depth view components of each view;

● Asymmetric spatial resolutions for different views.

Additional codec changes on top of the H.264/MVC codec may also include:

● High-level syntax to support joint coding of texture view components and depth view components;

● Motion vector prediction between texture and depth view components, and a mode in which the depth view motion is adopted from the associated texture view motion.

This disclosure also describes other tools. These include tools that allow prediction between view components at different resolutions, and prediction of slice headers from a texture view component to the corresponding depth view component. A texture view component and a depth view component may together form a view component, which is the coded picture of one view in an access unit. Thus, the techniques may allow the motion information of the texture view to be adopted according to the described IVMP mode. Alternatively, they may allow the motion information of the depth view to be predicted relative to the texture view, including deltas. Both tools may allow coding flexibility, but the best compression may be achieved by restricting the tools to some degree. For example, the IVMP mode described herein may be limited to non-anchor pictures.

Throughout this document, AVC refers to the H.264/AVC High Profile. If any other H.264/AVC profile or amendment is referred to, the amendment or profile name is specified explicitly. For example, H.264/MVC or MVC refers to the multiview extension of H.264/AVC. However, any amendment or profile of H.264/AVC belongs to the AVC family. Thus, if the proposed codec is compatible with the MVC Stereo High Profile, it is also compatible with the AVC Stereo High Profile.

The codec description is provided below. In this section, the proposed 3DVC codec is described in two aspects: the high-level framework and the low-level coding techniques. It may be desirable to define a 3DV format that can have two views or three views, corresponding to different applications. In that case, the techniques in the 3-view case may form a superset of those in the 2-view case. This section therefore proceeds in three steps:

1. the high-level framework applicable to both cases;
2. the codec description of the 2-view techniques, which also apply to the 3-view case;
3. the techniques used only in the 3-view case.

The high-level framework may use the following definitions.

View component: A coded representation of a view in a single access unit. When a view includes both coded texture and depth representations, the view component consists of a texture view component and a depth view component.

Coded VCL NAL units in a depth view component may be assigned nal_unit_type 21, a new type of coded slice extension specifically for depth view components.

The bitstream order is described below. Within each view component, any coded slice NAL unit of the depth view component (with nal_unit_type 21) may need to follow all coded slice NAL units of the texture view component. For simplicity, this disclosure refers to the coded slice NAL units of a depth view component as depth NAL units.

A depth NAL unit has the same NAL unit header structure as a NAL unit with nal_unit_type equal to 20. FIG. 4 illustrates an example bitstream order of the VCL NAL units of view components within one access unit.

As shown in FIG. 4, in the example 3D video codec, an access unit includes multiple view components. Each view component consists of one texture view component and one depth view component.

- The texture view component of the base view, with view order index (VOIdx) equal to 0, includes one prefix NAL unit (NAL unit type 14) and one or more AVC VCL NAL units (NAL unit type 1 or 5).
- Texture view components in the other views include only MVC VCL NAL units (NAL unit type 20).
- In both the base view and the non-base views, depth view components include depth NAL units with NAL unit type 21.

In any view component, the depth NAL units follow the NAL units of the texture view component in decoding/bitstream order.
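The ordering rule for one view component can be checked mechanically from the nal_unit_type values alone. The following is a minimal sketch under a simplifying assumption: a view component is given as a plain list of nal_unit_type integers. The helper name depth_follows_texture is hypothetical and not part of any reference software.

```python
# Hypothetical sketch: NAL unit types as listed in the text above.
TEXTURE_NAL_TYPES = {1, 5, 14, 20}  # AVC slices (1, 5), prefix NAL (14), MVC slice extension (20)
DEPTH_NAL_TYPE = 21                 # coded slice extension for depth view components


def depth_follows_texture(nal_unit_types):
    """Return True if, within one view component, no texture NAL unit
    appears after a depth NAL unit (nal_unit_type 21)."""
    seen_depth = False
    for nal_type in nal_unit_types:
        if nal_type == DEPTH_NAL_TYPE:
            seen_depth = True
        elif nal_type in TEXTURE_NAL_TYPES and seen_depth:
            return False  # a texture slice follows a depth slice: order violated
    return True


# Base view (VOIdx 0): prefix NAL unit, AVC IDR slices, then depth slices.
print(depth_follows_texture([14, 5, 5, 21, 21]))  # True
# Non-base view with a depth slice placed before the last texture slice.
print(depth_follows_texture([20, 21, 20]))        # False
```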

In the 2-view case, this disclosure may adopt half-resolution encoding for both the left and right views. Characteristics of the proposed codec may include:

● Half horizontal or half vertical spatial resolution;

● The same resolution for the texture view components and depth view components of each view;

● An AVC High Profile compatible half-resolution base view (texture only);

● AVC Stereo High Profile compatible half-resolution stereoscopic views (texture only);

● Inter-view prediction from the depth view component of the base view to the depth view component of the non-base view;

● Texture-to-depth prediction within a view component.

Half-spatial-resolution MVC is referenced below and mentioned in Table 1 below. All sequences may be coded at half spatial resolution. Compared to H.264/AVC frame-compatible coding, half-spatial-resolution MVC is more efficient and more convenient for meeting the following requirements:

● Forward compatibility: The 2-view 3DVC bitstream contains an MVC sub-bitstream, which in turn contains an AVC sub-bitstream. Thus, the proposed codec meets the following requirement: all compressed bitstreams conforming to this mode shall enable existing AVC decoders to reconstruct samples from mono and stereo views from the bitstream.

● Stereo/mono compatibility: The MVC or AVC sub-bitstream may be obtained simply by extracting VCL NAL units based on their NAL unit type. Thus, the proposed codec in particular meets two requirements. First, the compressed data format shall include a mode that enables simple extraction of bitstreams for stereo and mono output. Second, it shall support high-fidelity reconstruction of samples from the left and right views of stereo video.
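The sub-bitstream extraction described in the stereo/mono compatibility requirement can be sketched as a filter on nal_unit_type. The sketch below makes several assumptions:

- NAL units are represented as (nal_unit_type, payload) tuples.
- The MVC sub-bitstream is formed by dropping only depth NAL units (type 21).
- The AVC sub-bitstream additionally drops prefix NAL units (14), subset SPS NAL units (15), and MVC slice extensions (20).

A real extractor would also need to handle parameter sets, SEI messages, and view selection. Those details are omitted here.

```python
# Simplified sketch: NAL unit types dropped for each target sub-bitstream.
DROP_FOR_TARGET = {
    "mvc": {21},              # drop depth NAL units only
    "avc": {14, 15, 20, 21},  # also drop prefix NAL, subset SPS, and MVC slices
}


def extract_sub_bitstream(nal_units, target):
    """Keep the NAL units of a 3DVC bitstream that belong to the requested
    sub-bitstream. Each NAL unit is a (nal_unit_type, payload) tuple."""
    try:
        drop = DROP_FOR_TARGET[target]
    except KeyError:
        raise ValueError(f"unknown target: {target!r}") from None
    return [nal for nal in nal_units if nal[0] not in drop]


access_unit = [(14, b"p"), (5, b"t0"), (21, b"d0"), (20, b"t1"), (21, b"d1")]
print([t for t, _ in extract_sub_bitstream(access_unit, "mvc")])  # [14, 5, 20]
print([t for t, _ in extract_sub_bitstream(access_unit, "avc")])  # [5]
```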

Half-spatial-resolution sequences may be obtained, for both texture and depth sequences, with the MPEG 13-tap downsampling filter ([2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2]/64). To achieve better quality, downsampling can be applied in either the horizontal or the vertical direction. For sequences with dominant horizontal high-frequency components, half vertical resolution can be used. In some examples, only one sequence is considered to belong to this category: "Poznan_Hall2". The other sequences are considered to have dominant vertical high-frequency components, so horizontal downsampling is applied to obtain half-horizontal-resolution sequences.
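A minimal sketch of horizontal 2:1 decimation with the 13-tap filter quoted above follows. The filter coefficients come from the text. The remaining details are assumptions, not details of the MPEG downsampling tool:

- the sampling phase (one output sample per even input position);
- edge replication at the picture borders;
- integer rounding ((sum + 32) >> 6) and clipping to 8 bits.

Vertical downsampling would apply the same routine to columns instead of rows.

```python
# Taps of the MPEG 13-tap downsampling filter quoted above; they sum to 64.
TAPS_13 = (2, 0, -4, -3, 5, 19, 26, 19, 5, -3, -4, 0, 2)
CENTER = len(TAPS_13) // 2  # index of the centre tap (26)


def downsample_row(row):
    """Halve the width of one row of 8-bit samples (hypothetical sketch)."""
    n = len(row)
    out = []
    for x in range(0, n, 2):  # filter, then keep every second sample
        acc = 0
        for k, tap in enumerate(TAPS_13):
            idx = min(max(x + k - CENTER, 0), n - 1)  # replicate edge samples
            acc += tap * row[idx]
        out.append(min(max((acc + 32) >> 6, 0), 255))  # /64 with rounding, clip
    return out


def downsample_horizontal(picture):
    """Apply the filter to every row to get a half-horizontal-resolution picture."""
    return [downsample_row(row) for row in picture]


print(downsample_row([100] * 8))  # [100, 100, 100, 100]
```

Because the taps sum to 64, a flat area keeps its value after filtering. The negative taps sharpen edges slightly, which is why the result is clipped.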

Symmetric resolution for texture and depth may be used. A depth view component may be coded as an 8-bit monochrome sequence with the same resolution as the texture view component of the same view. With this setting, prediction from the texture view component to the depth view component can be performed without any scaling, e.g., of motion vectors or of pixels in a macroblock (MB).

Inter-view prediction for depth view components may be supported. A depth view component may be predicted from other depth view components in the same access unit, in the same manner as inter-view prediction in MVC. A depth view component refers to a subset sequence parameter set (SPS) whose extension signals the view dependency.

Typically, the prediction dependencies of the depth view components are the same as the view dependencies of the texture view components, as shown in FIG. 6. Note also that several sequences do not benefit from inter-view prediction between depth views. In such cases, inter-view prediction for depth views can simply be disabled. FIG. 6 shows the prediction structure of the 3DVC codec. Depth view components (shown cross-hatched) have the same prediction structure as texture view components (shown without shading).

Thus, a flag (disable_depth_inter_view_flag) may be signaled in the SPS to disable or enable inter-view prediction for depth views. More detailed SPS designs for the 2-view and 3-view cases are described below. For depth map sequences that can benefit from inter-view prediction, the depth view components have the same inter prediction and inter-view prediction structures as the texture view, as shown in FIG. 6.

FIG. 7 shows a prediction structure of the 3DVC codec that does not allow inter-view prediction for depth view components. In FIG. 7, the shaded components are texture views and the cross-hatched components are depth views. As shown in FIG. 7, inter-view prediction may be enabled for texture view components but disabled entirely for depth view components. In this case, a depth view component may have a different slice type than the corresponding texture view component.
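The effect of disable_depth_inter_view_flag on reference lists, and why it can change the slice type, can be illustrated with a small sketch. It makes three simplifying assumptions:

- A reference list is just temporal references followed by inter-view references.
- A slice is "P" when its list is non-empty and "I" otherwise. B slices and list modification are ignored.
- The helper names are hypothetical.

```python
def build_ref_list(temporal_refs, inter_view_refs, is_depth,
                   disable_depth_inter_view_flag):
    """Return a simplified reference list for one view component."""
    refs = list(temporal_refs)
    if not (is_depth and disable_depth_inter_view_flag):
        refs.extend(inter_view_refs)  # inter-view refs appended after temporal refs
    return refs


def slice_type_for(refs):
    """Simplification: any available reference allows a P slice."""
    return "P" if refs else "I"


# Anchor picture of a non-base view: no temporal references are available.
texture_refs = build_ref_list([], ["VL_texture"], is_depth=False,
                              disable_depth_inter_view_flag=1)
depth_refs = build_ref_list([], ["VL_depth"], is_depth=True,
                            disable_depth_inter_view_flag=1)
print(slice_type_for(texture_refs), slice_type_for(depth_refs))  # P I
```

In this sketch, the anchor texture view component can still use an inter-view reference, while the depth view component has none. This matches the remark that the two may end up with different slice types.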

Motion prediction from texture to depth is described below. A texture view component and its associated depth view component have similar object silhouettes, and therefore similar object boundaries and motion. As a result, their motion fields contain redundancy.

According to this disclosure, motion prediction from a texture view component to its associated depth view component may be enabled as a new mode in the proposed codec. In some examples, the inside view motion prediction (IVMP) mode is enabled only for inter-coded MBs in depth view components. In IVMP mode, the depth view component of the same view reuses the motion information of the co-located MB in the texture view component. This motion information includes mb_type, sub_mb_type, reference indices, and motion vectors. A flag indicating whether the IVMP mode is used may be signaled in each MB. Consistent with FIG. 5, the flag may be true for the fourth picture of the depth view. In that case, the motion vectors of the co-located MB in the fourth picture of the texture view (labeled 4th) are reused for the depth view component. In some examples, the IVMP mode applies only to non-anchor pictures.
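The decoder-side behaviour of IVMP for one depth MB can be sketched as follows. The sketch rests on three assumptions:

- The texture motion field is a dict keyed by MB address, with the same MB grid for texture and depth, as in the symmetric-resolution setting above.
- The MbMotion record and the depth_mb_motion helper are hypothetical names.
- The anchor-picture restriction is enforced because the text mentions it for some examples.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MbMotion:
    """Motion information that IVMP copies from texture to depth."""
    mb_type: str
    sub_mb_type: Tuple[str, ...]
    ref_idx: Tuple[int, ...]
    mvs: Tuple[Tuple[int, int], ...]


def depth_mb_motion(mb_addr: int,
                    ivmp_flag: bool,
                    texture_motion: Dict[int, MbMotion],
                    parsed_motion: Optional[MbMotion] = None,
                    is_anchor: bool = False) -> MbMotion:
    """Return the motion used for the depth MB at mb_addr."""
    if ivmp_flag:
        if is_anchor:
            raise ValueError("IVMP is restricted to non-anchor pictures in this sketch")
        # Same resolution for texture and depth: the co-located MB has the same
        # address, and its motion information is reused without scaling.
        return texture_motion[mb_addr]
    if parsed_motion is None:
        raise ValueError("motion must be parsed from the depth bitstream when IVMP is off")
    return parsed_motion


texture = {7: MbMotion("P_L0_16x16", (), (0,), ((4, -2),))}
print(depth_mb_motion(7, True, texture).mvs)  # ((4, -2),)
```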

Slice header prediction is described below. Within each view component, the slice headers of the depth view component and the texture view component may contain redundant information. Thus, once the slice header of the texture view component is known, most of the slice header information of the depth view component in the same view of the same access unit is already determined.

According to this disclosure, a depth view component shares most of the slice header syntax elements of the corresponding texture view component. The syntax elements that may differ are:

- pic_parameter_set_id;
- slice_qp_delta;
- possibly, syntax elements related to reference picture list construction, including num_ref_idx_l0_active_minus1, num_ref_idx_l1_active_minus1, and the reference picture list modification syntax table.

The slice header of a depth view component may be signaled in the slice header depth extension. Note that pred_slice_header_depth_idc may be signaled in the sequence parameter set. In some examples, the encoder may always set it to 1.

An example slice header depth extension syntax may follow the example in Table 1 below.

Table 1
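The inheritance rule described above can be sketched by modelling slice headers as dicts. A depth slice header copies every texture element and overrides only the elements that the text says may differ. This is an illustration, not the Table 1 syntax, whose contents are not reproduced here. The key "ref_pic_list_modification" is a hypothetical stand-in for the reference picture list modification syntax table.

```python
# Elements that, per the text above, may differ between texture and depth slice headers.
# "ref_pic_list_modification" is a hypothetical key standing in for the whole syntax table.
DEPTH_OVERRIDABLE = frozenset({
    "pic_parameter_set_id",
    "slice_qp_delta",
    "num_ref_idx_l0_active_minus1",
    "num_ref_idx_l1_active_minus1",
    "ref_pic_list_modification",
})


def derive_depth_slice_header(texture_header, depth_extension):
    """Predict a depth slice header from the co-located texture slice header."""
    unexpected = set(depth_extension) - DEPTH_OVERRIDABLE
    if unexpected:
        raise ValueError(f"not carried in the depth extension: {sorted(unexpected)}")
    header = dict(texture_header)    # copy: inherit every texture element
    header.update(depth_extension)   # overwrite only what the extension carries
    return header


texture_header = {"first_mb_in_slice": 0, "slice_type": "P",
                  "pic_parameter_set_id": 0, "slice_qp_delta": 0,
                  "num_ref_idx_l0_active_minus1": 1}
depth_header = derive_depth_slice_header(
    texture_header, {"pic_parameter_set_id": 1, "slice_qp_delta": 4})
print(depth_header["slice_type"], depth_header["slice_qp_delta"])  # P 4
```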

The 3-view case is described below. The techniques of this disclosure may adopt half-resolution encoding for both the left and right views and full resolution for the center view. The encoding methods enabled in the 2-view case may also be supported by the codec in the 3-view case. The codec may include the following features for the 3-view case:

● Asymmetric spatial resolution across different views;

● Inter-view prediction from low-resolution views to the high-resolution view;

● A sub-bitstream containing the texture view components of the low-resolution views that is compatible with the H.264/MVC Stereo High Profile;

● Signaling of inter-view prediction dependencies for high-resolution views.

Inter-view prediction in the asymmetric 3DVC codec is described below. Prediction from a reconstructed low-resolution view to a high-resolution view may be enabled both between texture view components and between depth view components.

More specifically, in the 3-view case, the left and right views may be coded at half resolution and the center view at full resolution. Inter-view prediction may occur from a half-resolution view component to a full-resolution (texture or depth) view component. In that case, the decoded picture of the half-resolution view component is upsampled with the AVC 6-tap filter [1, -5, 20, 20, -5, 1]/32 before it is used for inter-view prediction. The low-resolution picture (needed for output) and the upsampled picture may then need to coexist temporarily in the buffer. The upsampled pictures from the left and right views can then be placed in the reference picture lists of the center view's view component in the same access unit.
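A minimal sketch of upsampling one half-width row with the AVC 6-tap filter quoted above follows. Even output positions copy the decoded samples. Odd output positions are interpolated with the H.264 half-sample rule, (sum + 16) >> 5 clipped to 8 bits. Three aspects are assumptions, not taken from the proposal:

- the sample-phase alignment;
- the edge replication at picture borders;
- the choice to treat each row independently.

```python
TAPS_6 = (1, -5, 20, 20, -5, 1)  # AVC 6-tap filter; taps sum to 32


def upsample_row(row):
    """Double the width of one row of 8-bit samples (hypothetical sketch)."""
    n = len(row)

    def sample(i):
        return row[min(max(i, 0), n - 1)]  # replicate edge samples

    out = []
    for x in range(n):
        out.append(row[x])  # integer position: copy the decoded sample
        # Half-sample position between x and x+1 uses samples x-2 .. x+3.
        acc = sum(tap * sample(x - 2 + k) for k, tap in enumerate(TAPS_6))
        out.append(min(max((acc + 16) >> 5, 0), 255))  # /32 with rounding, clip
    return out


print(upsample_row([10, 20, 30, 40])[2:5])  # [20, 25, 30]
```

Keeping the decoded samples at even positions means the low-resolution picture needed for output can be recovered exactly from the upsampled one. The text nevertheless describes both pictures coexisting in the buffer.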

Asymmetric inter-view prediction is illustrated in FIG. 8, where both the left view (VL) and the right view (VR) have half width. The view dependency allows both views to be used as inter-view references for the center view (VC), so both are upsampled to intermediate pictures.

For simplicity, the low-resolution views that are compatible with MVC (when only texture is considered) are called MVC views. This naming applies regardless of whether "MVC view" refers to both the texture and depth parts or only to the texture part. The other views, which have full resolution, are called additional views. Thus, the 3-view case has two MVC views and one additional view. Each MVC view contains both texture and depth at the same resolution, which is half the resolution of the additional view.

The sequence parameter set design is described below. In some aspects of this disclosure, a new SPS extension may be introduced. If the profile indicated in seq_parameter_set_data() is related to 3DV, the new SPS extension is added to the subset SPS. According to this disclosure, two possible profiles are considered, one for each case:

- the "3DV profile", which applies to the 2-view case;
- the "asymmetric 3DV profile", which applies to the 3-view case.

In MVC, a new sequence-level parameter set, the SPS MVC extension, is introduced and signaled in the subset SPS. With MVC considered the base specification, in either of the newly added profiles the subset SPS is further extended. It signals the sequence parameter set 3DVC extension on top of the SPS MVC extension.

In one proposed codec, the new SPS extension (the sequence parameter set 3DVC extension) contains syntax to signal the inter-view dependencies of depth view components. This signaling applies to both the 3DV profile and the asymmetric 3DV profile. For the asymmetric 3DV profile, the extension additionally signals the inter-view dependencies of the high-resolution views.

In 3DV-related applications, other syntax elements may also be signaled in the SPS, e.g., those related to camera parameters and to depth range and/or depth quantization. However, in one proposed codec, this information may be considered available and is therefore not transmitted in the coded bitstream.

Table 2 shows an example of the subset sequence parameter set RBSP (raw byte sequence payload) syntax.

Table 2

Table 3 shows an example of the sequence parameter set 3DVC extension syntax.

Table 3

In one proposed 3DVC codec, the camera parameters and depth ranges may not be included in the bitstream, because they have no normative effect on the decoded views. However, they can help view synthesis. They can also help possible coding tools that use view synthesis, for example as a specific mode. If camera parameters or depth ranges are required by a particular coding tool, they may be transmitted in a normative and mandatory manner within a parameter set. This may be the SPS or the Picture Parameter Set (PPS). If the information can change on a per-frame basis, it may instead be a new type of parameter set, a View Parameter Set (VPS). If they are not needed to decode any transmitted texture or depth, they can be signaled in SEI messages (at the sequence level or picture level).

This section describes how the above information can be signaled in the bitstream. The signaling of camera parameters and depth ranges may be implemented in software, but it is not enabled when generating bitstreams.

Table 4 shows an example of the camera parameters and depth ranges in the SPS 3DVC extension.

Table 4

In the camera parameters syntax table, a floating-point value V may be represented by its precision P and its integer value I, such that $V = I \times 10^{P}$. The precision P is the number of digits before or after the decimal point. The sign of V is the same as that of I. This representation may be sufficiently accurate for camera parameters and depth ranges, and it may make floating-point values relatively easy to parse and construct.
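The representation $V = I \times 10^{P}$ can be sketched with exact rational arithmetic, which avoids binary floating-point rounding. This sketch assumes that P is a signed exponent, negative when V has digits after the decimal point. The function names are hypothetical.

```python
from fractions import Fraction


def decode_value(integer_value, precision):
    """V = I * 10**P, computed exactly; the sign of V follows the sign of I."""
    return Fraction(integer_value) * Fraction(10) ** precision


def encode_value(value, precision):
    """Return the integer I such that I * 10**P is closest to value."""
    return round(Fraction(value) / Fraction(10) ** precision)


i = encode_value("1732.5", -1)  # one digit after the decimal point -> P = -1
print(i, float(decode_value(i, -1)))  # 17325 1732.5
```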

The CfP states the requirement that "source video data should be rectified to avoid misalignment of camera geometry and colors". Given this, multiple views may be assumed to share the same intrinsic parameters and most of the extrinsic parameters, except for horizontal translation.

Table 5 and the following paragraphs show example camera parameters syntax and semantics.

Table 5

In Table 5, cam_param_present_flag equal to 1 may indicate that camera parameters are signaled in this SPS. cam_param_present_flag equal to 0 may indicate that camera parameters are not signaled in this SPS.

In Table 5, focal_length_precision specifies the precision of the values of focal_length_x and focal_length_y. These are the x-coordinate and y-coordinate focal lengths of all cameras.

In Table 5, focal_length_x_I specifies the integer part of the value of focal_length_x.

$$\mathrm{focal\_length\_x} = \mathrm{focal\_length\_x\_I} \times 10^{\mathrm{focal\_length\_precision}}$$

In Table 5, focal_length_y_I_diff_x plus focal_length_x_I specifies the integer part of the value of focal_length_y.

$$\mathrm{focal\_length\_y} = (\mathrm{focal\_length\_x\_I} + \mathrm{focal\_length\_y\_I\_diff\_x}) \times 10^{\mathrm{focal\_length\_precision}}$$

In Table 5, principal_precision specifies the precision of the values of principal_point_x and principal_point_y. These are the x-coordinate and y-coordinate principal points of all cameras.

In Table 5, principal_point_x_I specifies the integer part of the value of principal_point_x.

$$\mathrm{principal\_point\_x} = \mathrm{principal\_point\_x\_I} \times 10^{\mathrm{principal\_precision}}$$

In Table 5, principal_point_y_I_diff_x plus principal_point_x_I specifies the integer part of the value of principal_point_y.

$$\mathrm{principal\_point\_y} = (\mathrm{principal\_point\_x\_I} + \mathrm{principal\_point\_y\_I\_diff\_x}) \times 10^{\mathrm{principal\_precision}}$$
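The four intrinsic-parameter derivations above share one pattern: an integer part, optionally offset by a signalled difference, scaled by a power of ten. The sketch below applies them to a dict of already-parsed Table 5 syntax element values. The dict representation and the numeric values in the usage example are illustrative assumptions, not data from any test sequence.

```python
from fractions import Fraction


def scaled(integer_value, precision):
    """integer_value * 10**precision, computed exactly."""
    return Fraction(integer_value) * Fraction(10) ** precision


def derive_intrinsics(p):
    """p maps the Table 5 syntax element names to their decoded values."""
    fx_int = p["focal_length_x_I"]
    px_int = p["principal_point_x_I"]
    return {
        "focal_length_x": scaled(fx_int, p["focal_length_precision"]),
        # The y values are coded as differences from the x integer parts.
        "focal_length_y": scaled(fx_int + p["focal_length_y_I_diff_x"],
                                 p["focal_length_precision"]),
        "principal_point_x": scaled(px_int, p["principal_precision"]),
        "principal_point_y": scaled(px_int + p["principal_point_y_I_diff_x"],
                                    p["principal_precision"]),
    }


# Illustrative (made-up) values.
params = {"focal_length_precision": -1, "focal_length_x_I": 17325,
          "focal_length_y_I_diff_x": 0, "principal_precision": -1,
          "principal_point_x_I": 5120, "principal_point_y_I_diff_x": -2240}
print({k: float(v) for k, v in derive_intrinsics(params).items()})
```

Coding the y values as differences from the x values keeps the signalled numbers small when, as is typical, the two focal lengths are equal or close.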

The rotation matrix R of each camera may be expressed as follows:

In Table 5, rotation_kl_half_pi indicates the diagonal components of the rotation matrix R, where kl is equal to xy, yz, or xz, and $R_{kl} = (-1)^{\mathrm{rotation\_kl\_half\_pi}}$. This flag equal to 0 indicates that $R_{kl} = 1$, and this flag equal to 1 indicates that $R_{kl} = -1$.

In Table 5, translation_precision specifies the precision of the translation values of all views. This precision applies to all translation values of the views that refer to this SPS.

In Table 5, numViewsMinus1 is derived as num_views_minus1 + num_add_views_minus1 + 1.

In Table 5, anchor_view_id specifies the view_id of the view whose translation is used as an anchor to compute the translations of the other views.

In Table 5, zero_translation_present_flag equal to 1 indicates that the translation of the view with view_id equal to anchor_view_id is 0. This flag equal to 0 indicates that the translation of the view with view_id equal to anchor_view_id is signaled.

translation_anchor_view_I specifies the integer part of the translation of the anchor view. The translation of the anchor view is denoted translation_anchor_view. When zero_translation_present_flag is equal to 1, translation_anchor_view is equal to 0. Otherwise, the translation is calculated as follows:

테이블 5 에서, In table 5,

translation_anchor_view = translation _ anchor _ view _I*10 ^translation ^_ ^precision translation_anchor_view = translation _ anchor _ view _I * 10 ^translation ^_ ^precision

In Table 5, translation_diff_anchor_view_I[i] plus translation_anchor_view_I specifies the integer part of the translation of the view with view_id equal to view_id[i], and is denoted translation_view_I[i].

The translation of the view with view_id equal to view_id[i] is denoted translation_view[i]:

$$\mathrm{translation\_view}[i] = \left(\mathrm{translation\_diff\_anchor\_view\_I}[i] + \mathrm{translation\_anchor\_view\_I}\right) \times 10^{\mathrm{translation\_precision}}$$
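The anchor-plus-difference derivation of the per-view translations can be sketched as below. This is a minimal illustration, assuming the anchor's integer part is already known (signaled, or 0 when not signaled); the helper names are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used so that a negative translation_precision does not introduce floating-point rounding.

```python
from fractions import Fraction


def scale(integer_part: int, precision: int) -> Fraction:
    """Return integer_part * 10**precision exactly; precision may be negative."""
    return integer_part * Fraction(10) ** precision


def derive_translations(translation_anchor_view_I: int,
                        translation_diff_anchor_view_I: list[int],
                        translation_precision: int) -> tuple[Fraction, list[Fraction]]:
    """Return (translation_anchor_view, [translation_view[i] for each view i])."""
    anchor = scale(translation_anchor_view_I, translation_precision)
    # Each view's integer part is coded as a difference from the anchor's integer part.
    views = [scale(diff + translation_anchor_view_I, translation_precision)
             for diff in translation_diff_anchor_view_I]
    return anchor, views


anchor, views = derive_translations(5, [0, -2, 3], translation_precision=-1)
print(anchor, views)  # 1/2 [Fraction(1, 2), Fraction(3, 10), Fraction(4, 5)]
```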

Table 6 and the following paragraphs show an example depth ranges syntax and semantics.

Table 6

In Table 6, depth_range_present_flag equal to 1 indicates that the depth ranges of all views are signaled in this SPS; depth_range_present_flag equal to 0 indicates that the depth ranges of all views are not signaled in this SPS.

In Table 6, z_near_precision specifies the precision of the z_near values. The precision of z_near specified in the SPS applies to all z_near values of the views referenced in this SPS.

In Table 6, z_far_precision specifies the precision of the z_far values. The precision of z_far specified in the SPS applies to all z_far values of the views referenced in this SPS.

In Table 6, different_depth_range_flag equal to 0 indicates that the depth ranges of all views are the same and lie in the range z_near to z_far, inclusive. different_depth_range_flag equal to 1 indicates that the depth ranges of the views may differ: z_near and z_far are the depth range of the anchor view, and z_near[i] and z_far[i] are further specified in this SPS as the depth range of the view with view_id equal to view_id[i].

In Table 6, z_near_integer specifies the integer part of the value of z_near:

$$\mathrm{z\_near} = \mathrm{z\_near\_integer} \times 10^{\mathrm{z\_near\_precision}}$$

In Table 6, z_far_integer specifies the integer part of the value of z_far:

$$\mathrm{z\_far} = \mathrm{z\_far\_integer} \times 10^{\mathrm{z\_far\_precision}}$$

In Table 6, z_near_diff_anchor_view_I[i] plus z_near_integer specifies the integer part of the shallowest depth value of the view with view_id equal to view_id[i], and is denoted z_near_I[i].

The z_near of the view with view_id equal to view_id[i] is denoted z_near[i]:

$$\mathrm{z\_near}[i] = \left(\mathrm{z\_near\_diff\_anchor\_view\_I}[i] + \mathrm{z\_near\_integer}\right) \times 10^{\mathrm{z\_near\_precision}}$$

In Table 6, z_far_diff_anchor_view_I[i] plus z_far_integer specifies the integer part of the deepest depth value of the view with view_id equal to view_id[i], and is denoted z_far_I[i].

$$\mathrm{z\_far}[i] = \left(\mathrm{z\_far\_diff\_anchor\_view\_I}[i] + \mathrm{z\_far\_integer}\right) \times 10^{\mathrm{z\_far\_precision}}$$
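The depth-range semantics above, including the two cases of different_depth_range_flag, can be sketched as follows. The function name and argument layout are hypothetical; the sketch assumes the per-view difference lists are ordered like view_id[i] and uses exact rational arithmetic.

```python
from fractions import Fraction
from typing import Optional


def derive_depth_ranges(num_views: int,
                        z_near_integer: int, z_near_precision: int,
                        z_far_integer: int, z_far_precision: int,
                        different_depth_range_flag: int,
                        z_near_diff_anchor_view_I: Optional[list[int]] = None,
                        z_far_diff_anchor_view_I: Optional[list[int]] = None,
                        ) -> list[tuple[Fraction, Fraction]]:
    """Return (z_near[i], z_far[i]) for each view i."""
    near_scale = Fraction(10) ** z_near_precision
    far_scale = Fraction(10) ** z_far_precision
    if different_depth_range_flag == 0:
        # All views share the single range [z_near, z_far].
        shared = (z_near_integer * near_scale, z_far_integer * far_scale)
        return [shared] * num_views
    if (z_near_diff_anchor_view_I is None or z_far_diff_anchor_view_I is None
            or len(z_near_diff_anchor_view_I) != num_views
            or len(z_far_diff_anchor_view_I) != num_views):
        raise ValueError("per-view differences are required when different_depth_range_flag is 1")
    # Per-view integer parts are differences from the anchor view's integer parts.
    return [((dn + z_near_integer) * near_scale, (df + z_far_integer) * far_scale)
            for dn, df in zip(z_near_diff_anchor_view_I, z_far_diff_anchor_view_I)]


print(derive_depth_ranges(2, 10, 0, 100, 0, 1, [0, 5], [0, -20]))
```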

Table 7 shows an example view parameter set RBSP syntax.

Table 7

A NAL unit containing the view parameter set RBSP may be assigned a new NAL unit type, for example, 16.

Table 8 and the following paragraphs show an example view parameter set syntax and semantics.

Table 8

The depth range and translation of a camera may change on a picture basis. Updated depth ranges or camera parameters may apply to the view components of the current access unit and to the subsequent view components in the bitstream, until a new VPS following the current VPS updates these values for the related views.

For simplicity, the semantics of the syntax elements are not given. For the translation and depth range of each view, the integer part of the difference between the new value and the value signaled in the SPS (the SPS with identifier equal to seq_para_set_id) may be signaled in this VPS. The updated translation and depth range values can be calculated as follows:

$$\mathrm{translation\_view}[i] = \left(\mathrm{translation\_view\_integer}[i] + \mathrm{translation\_update\_view\_I}[i]\right) \times 10^{\mathrm{translation\_precision}}$$

$$\mathrm{z\_near}[i] = \left(\mathrm{z\_near\_integer}[i] + \mathrm{z\_near\_update\_view\_I}[i]\right) \times 10^{\mathrm{z\_near\_precision}}$$

$$\mathrm{z\_far}[i] = \left(\mathrm{z\_far\_integer}[i] + \mathrm{z\_far\_update\_view\_I}[i]\right) \times 10^{\mathrm{z\_far\_precision}}$$

where translation_view_integer[i], z_near_integer[i], and z_far_integer[i] are the integer parts of translation_view[i], z_near[i], and z_far[i], respectively, and can be calculated based on the signaling in the SPS.
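The VPS update rule, together with the persistence behavior described above (values stay in effect until a later VPS updates them), can be sketched as follows for the translation values. The class name and its methods are hypothetical. The sketch assumes each update is relative to the SPS integer part rather than to the previous VPS, as the formulas indicate; z_near and z_far would follow the same pattern with their own precisions.

```python
from fractions import Fraction


class CameraParamTracker:
    """Hypothetical tracker of per-view translation values across VPS updates."""

    def __init__(self, sps_translation_integer: dict[int, int],
                 translation_precision: int) -> None:
        self._sps_int = dict(sps_translation_integer)   # translation_view_integer[i]
        self._scale = Fraction(10) ** translation_precision
        # Before any VPS arrives, the values derived from the SPS apply.
        self.current = {v: i * self._scale for v, i in self._sps_int.items()}

    def apply_vps(self, translation_update_view_I: dict[int, int]) -> None:
        # Each update is added to the SPS integer part, not to the previous VPS value.
        # Views not mentioned keep their current values.
        for view_id, update in translation_update_view_I.items():
            self.current[view_id] = (self._sps_int[view_id] + update) * self._scale


tracker = CameraParamTracker({0: 10, 1: 20}, translation_precision=-1)
tracker.apply_vps({1: 3})
print(tracker.current)  # {0: Fraction(1, 1), 1: Fraction(23, 10)}
```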

One or more of the techniques of this disclosure may be used to improve coding in terms of compression and/or quality. Encoding time and complexity may be improved by using one or more of these techniques, and decoding time and complexity may also be improved. In addition, memory usage at the encoder and decoder may be improved or reduced relative to other techniques.

In some examples, both the encoder and the decoder may have the same memory consumption level as the JMVC encoder and decoder. Memory usage may therefore be considered proportional to the number of view components, e.g., per access unit. Because a depth view component is always stored as 4:0:0, for the same number of views the proposed solution may consume approximately 5/3 of the memory used by JMVC at the encoder or decoder (an increase of about 67%). Note that, to simplify operations such as viewing depth maps and using them for view synthesis, the encoder and decoder may still read and output depth files in the 4:2:0 chroma sampling format.
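The 5/3 figure follows from counting stored samples per pixel. A quick check, assuming texture and depth have the same resolution and sample bit depth and that the JMVC baseline stores only 4:2:0 texture:

```python
from fractions import Fraction

# Stored samples per pixel for each format (same resolution and bit depth assumed).
SAMPLES_PER_PIXEL = {
    "4:2:0": Fraction(3, 2),  # one luma sample plus two chroma planes at 1/4 size each
    "4:0:0": Fraction(1),     # luma only
}


def memory_ratio(texture_format: str = "4:2:0", depth_format: str = "4:0:0") -> Fraction:
    """Memory for texture plus depth, relative to texture alone."""
    texture = SAMPLES_PER_PIXEL[texture_format]
    depth = SAMPLES_PER_PIXEL[depth_format]
    return (texture + depth) / texture


print(memory_ratio())                      # 5/3, about 67% more than texture alone
print(memory_ratio(depth_format="4:2:0"))  # 2, if depth were buffered as 4:2:0
```

The second call shows why storing depth as 4:0:0 internally matters: buffering it as 4:2:0 would double the memory instead of increasing it by about two thirds.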

The complexity characteristics of the decoder are described below. In some examples, both the encoder and the decoder according to the techniques of this disclosure may have the same complexity level as the JMVC encoder and decoder. Compared with JMVC, the computational complexity of a codec according to this disclosure may relate to the spatial resolution of each view and to the number of views. That is, a codec according to this disclosure may require the same amount of computation as the JMVC codec, as long as both process video with the same total number of pixels.

At the decoder side, normative picture-level upsampling may be required for the asymmetric 3DV profile. However, this process may be considered less complex than the other decoding processes for a high-resolution view component, so the complexity may still be characterized by, for example, how many MBs must be processed per second.

An encoder according to the techniques described herein may follow the current JMVC encoder scheme, in which the views are encoded one by one. Within each view, the texture sequence is encoded first, and the depth sequence is encoded next.

When the IVMP mode is enabled, the motion of each texture view component is written to a motion file during texture view component encoding; the name of this file can be specified in the configuration file. When the associated depth sequence of the same view is encoded, the motion file is read as a reference.
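The motion-file handoff between the texture pass and the depth pass can be sketched as a simple serialize/parse pair. The record layout (reference index and motion vector as little-endian signed 16-bit integers per block) and all function names are hypothetical; the text does not specify the format of the JMVC motion file.

```python
import struct
from pathlib import Path

# Hypothetical record layout: ref_idx, mv_x, mv_y as little-endian signed 16-bit ints.
_RECORD = struct.Struct("<hhh")


def encode_motion(motion: list[tuple[int, int, int]]) -> bytes:
    """Serialize per-block motion of one texture view component."""
    return b"".join(_RECORD.pack(*block) for block in motion)


def decode_motion(data: bytes) -> list[tuple[int, int, int]]:
    """Parse motion written by encode_motion, e.g. while encoding the depth sequence."""
    if len(data) % _RECORD.size:
        raise ValueError("truncated motion file")
    return [_RECORD.unpack_from(data, offset)
            for offset in range(0, len(data), _RECORD.size)]


def write_motion_file(path: str, motion: list[tuple[int, int, int]]) -> None:
    Path(path).write_bytes(encode_motion(motion))      # texture pass


def read_motion_file(path: str) -> list[tuple[int, int, int]]:
    return decode_motion(Path(path).read_bytes())      # depth pass
```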

The encoder may use the same configuration as JMVC, with the following additional items.

MotionFile

String, default: "Motion"

Specifies the file name (without the .dat extension) of the motion sequence to be generated. This sequence is provided to the IVMP mode. Files such as motion_0.dat and motion_1.dat are generated automatically by the encoder.

HalfSizeDimension

Unsigned Int, default: 0

Indicates whether asymmetric spatial resolution is used and, if so, the subsampling dimension. The following values are supported:

0 - All views are encoded with the same spatial resolution.

1 - Asymmetric spatial resolution is used, and half-resolution views have half the width of the other views.

2 - Asymmetric spatial resolution is used, and half-resolution views have half the height of the other views.
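The effect of HalfSizeDimension on the coded size of a view can be sketched as follows. The helper is hypothetical; it assumes even picture dimensions (so integer halving is exact) and that the caller knows whether the view is a half-resolution view (for example, from HalfRes).

```python
def coded_view_size(width: int, height: int, half_size_dimension: int,
                    is_half_res_view: bool) -> tuple[int, int]:
    """Return (width, height) of a view under the HalfSizeDimension setting."""
    if half_size_dimension not in (0, 1, 2):
        raise ValueError("HalfSizeDimension must be 0, 1 or 2")
    if half_size_dimension == 0 or not is_half_res_view:
        return width, height          # full spatial resolution
    if half_size_dimension == 1:
        return width // 2, height     # half width
    return width, height // 2         # half height


print(coded_view_size(1024, 768, 1, True))  # (512, 768)
```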

BasisQP_texture

Double, default: 26

Specifies the base quantization parameter of texture view components with half spatial resolution.

BasisQP_depth

Double, default: 26

Specifies the base quantization parameter of depth view components with half spatial resolution.

BasisQP_texture_delta

Unsigned Int, default: 0

Specifies the offset of the base quantization parameter of texture view components with full spatial resolution relative to the base quantization parameter of texture view components with half spatial resolution. The base quantization parameter of texture view components with full spatial resolution is calculated as BasisQP_texture (full spatial resolution) = BasisQP_texture + BasisQP_texture_delta.

BasisQP_depth_delta

Unsigned Int, default: 0

Specifies the offset of the base quantization parameter of depth view components with full spatial resolution relative to the base quantization parameter of depth view components with half spatial resolution. The base quantization parameter of depth view components with full spatial resolution is calculated as BasisQP_depth (full spatial resolution) = BasisQP_depth + BasisQP_depth_delta.
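The base-QP rule shared by the texture and depth items reduces to a one-line selection. This hypothetical helper simply applies "half resolution uses BasisQP_*, full resolution adds BasisQP_*_delta":

```python
def base_qp(basis_qp: float, basis_qp_delta: int, full_resolution: bool) -> float:
    """Base QP of a texture or depth view component (hypothetical helper)."""
    if basis_qp_delta < 0:
        raise ValueError("BasisQP_*_delta is an unsigned integer")
    # Half-resolution components use BasisQP_* directly; full-resolution ones add the delta.
    return basis_qp + basis_qp_delta if full_resolution else basis_qp


print(base_qp(26.0, 2, full_resolution=True))   # 28.0
print(base_qp(26.0, 2, full_resolution=False))  # 26.0
```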

NoDepthInterViewFlag

Flag (0 or 1), default: 0

Specifies whether inter-view prediction is enabled for depth view components. If NoDepthInterViewFlag is equal to 0, inter-view prediction is enabled; if NoDepthInterViewFlag is equal to 1, inter-view prediction is disabled.

HalfRes

Flag (0 or 1), default: 0

This value is associated with a View_ID value, as part of the characteristics of each reference view signaled in the view dependency section.

Specifies whether the view identified by View_ID has half spatial resolution. If HalfRes is 0, the view is a full spatial resolution view; if HalfRes is 1, it is a half spatial resolution view.
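The additional configuration items and their defaults can be collected in one container, sketched below. The class, field, and method names are hypothetical (each field's comment gives the configuration key it mirrors), and the validation only encodes the value ranges stated above; it does not parse an actual JMVC configuration file.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeDVEncoderConfig:
    """Hypothetical container for the extra encoder items; defaults follow the text."""
    motion_file: str = "Motion"          # MotionFile
    half_size_dimension: int = 0         # HalfSizeDimension: 0, 1 or 2
    basis_qp_texture: float = 26.0       # BasisQP_texture
    basis_qp_depth: float = 26.0         # BasisQP_depth
    basis_qp_texture_delta: int = 0      # BasisQP_texture_delta (unsigned)
    basis_qp_depth_delta: int = 0        # BasisQP_depth_delta (unsigned)
    no_depth_inter_view_flag: int = 0    # NoDepthInterViewFlag
    half_res: dict[int, int] = field(default_factory=dict)  # HalfRes per View_ID

    def __post_init__(self) -> None:
        if self.half_size_dimension not in (0, 1, 2):
            raise ValueError("HalfSizeDimension must be 0, 1 or 2")
        if self.basis_qp_texture_delta < 0 or self.basis_qp_depth_delta < 0:
            raise ValueError("BasisQP deltas are unsigned")
        flags = [self.no_depth_inter_view_flag, *self.half_res.values()]
        if any(f not in (0, 1) for f in flags):
            raise ValueError("NoDepthInterViewFlag and HalfRes must be 0 or 1")

    @property
    def depth_inter_view_prediction_enabled(self) -> bool:
        return self.no_depth_inter_view_flag == 0

    def is_half_res(self, view_id: int) -> bool:
        # Views without an explicit HalfRes entry use the default, 0 (full resolution).
        return self.half_res.get(view_id, 0) == 1
```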

The encoder can be used to generate bitstreams. An example encoder invocation is shown in the following example.

Here, mcfg represents the file name of the configuration file, which may be specified for each encoder call. The element view_id indicates the view to be encoded. The element component_idx indicates whether the current sequence of the particular view being encoded is texture (component_idx equal to 1) or depth (component_idx equal to 0). The encoder may be run for each view component of each view to be encoded.
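Since the encoder is run once per view component, with texture before depth inside each view, the sequence of runs can be sketched as a list of (view_id, component_idx) pairs. The function is hypothetical and does not reproduce the encoder's actual command line; it only shows the ordering, assuming the caller supplies view_ids in coding order.

```python
TEXTURE, DEPTH = 1, 0  # component_idx values described above


def encoder_runs(view_ids: list[int]) -> list[tuple[int, int]]:
    """Return (view_id, component_idx) for each encoder run, in encoding order."""
    runs = []
    for view_id in view_ids:
        # Texture first, so that its motion file exists when IVMP is used...
        runs.append((view_id, TEXTURE))
        # ...then the depth sequence of the same view.
        runs.append((view_id, DEPTH))
    return runs


for view_id, component_idx in encoder_runs([0, 2, 1]):
    print(f"encode view_id={view_id} component_idx={component_idx}")
```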

The decoder may be similar to the JMVC decoder; the main change is that it also decodes and outputs the depth sequence of each view. In the asymmetric 3DV profile, upsampling is required to convert an MVC view (left or right) to high resolution for prediction of the additional (center) view.

The assembler may need only very minor changes, to discard duplicated parameter set NAL units, and its complexity is the same as that of the JMVC assembler.

No changes to the JMVC synthesizer may be required.

Several features of an H.264/MVC-based 3DVC codec have been described. Such a codec can meet all of the "shall" requirements of this proposal and may provide good coding performance with relatively few additional coding methods. These methods include a high-level framework for joint coding of texture and depth, texture-to-depth prediction inside a view component, and inter-view prediction between texture or depth view components with asymmetric spatial resolutions.

An MVC-based 3DV codec may be standardized to meet short-term market needs, and the proposed features of this disclosure may serve as the basis of the reference software and working draft of such a 3DV codec.

FIG. 9 is a flowchart illustrating a technique that may be performed by a video encoder in accordance with this disclosure. Although FIG. 9 is described from the perspective of video encoder 20 of FIG. 2, other video encoders may also be used. As shown in FIG. 9, prediction module 41 receives 3D video, e.g., video blocks representing a 3D rendition (901). The 3D video includes a texture view video block and an associated depth view video block (901). Prediction module 41 encodes the texture view video block (902). In addition, prediction module 41 encodes the depth view video block (903).

According to this disclosure, prediction module 41 supports the IVMP mode. In particular, prediction module 41 generates a syntax element that indicates whether the motion information for the depth view is adopted from the texture view (903). In this way, if the IVMP mode is enabled, the depth view component may not include any additional delta values for its motion information and may instead adopt the motion information of the texture view component as its own. In particular, in the IVMP mode, the depth view component may not include any motion vector difference values and may fully adopt the motion vector of the corresponding texture view component. Defining a mode that fully adopts the motion information of the texture view as the motion information of the depth view may improve compression, because no motion vector delta values need to be signaled for this motion information.
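The encoder-side signaling described above can be sketched as follows. Everything here is hypothetical: the syntax is modeled as (name, value) pairs, the element names ivmp_flag, ref_idx, mvd_x, and mvd_y are illustrative, and the decision rule (reuse the texture motion only when it is identical to the motion the encoder wants for the depth block) is a toy stand-in for whatever rate-distortion decision a real encoder would make. ivmp_allowed is assumed to combine "IVMP enabled" with any picture-type restriction.

```python
from typing import NamedTuple


class Motion(NamedTuple):
    ref_idx: int
    mv_x: int
    mv_y: int


def encode_depth_block_motion(texture_motion: Motion, depth_motion: Motion,
                              mv_pred: tuple[int, int],
                              ivmp_allowed: bool) -> list[tuple[str, int]]:
    """Return hypothetical (name, value) syntax elements for one depth block."""
    if ivmp_allowed and depth_motion == texture_motion:
        # Adopt the texture motion: no reference index or MVD is written.
        return [("ivmp_flag", 1)]
    elements = [("ivmp_flag", 0)] if ivmp_allowed else []
    # Otherwise the depth block carries its own motion, coded as a difference
    # from a motion vector predictor.
    elements += [("ref_idx", depth_motion.ref_idx),
                 ("mvd_x", depth_motion.mv_x - mv_pred[0]),
                 ("mvd_y", depth_motion.mv_y - mv_pred[1])]
    return elements
```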

The texture view video block and the depth view video block may be coded together in a network abstraction layer (NAL) unit, and the syntax element may comprise a flag in the NAL unit indicating whether the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block. In this case, if the syntax element indicates that the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block, the depth view video block does not include any additional delta for its motion information. NAL units are one particular type of unit used to code video data, and the techniques may also be used with other types of video units.

More specifically, the syntax element may include one or more bits indicating whether the IVMP mode is enabled. If the IVMP mode is disabled, the motion information associated with the texture view video block is included in the NAL unit, and the motion information associated with the depth view video block is separately included in the NAL unit. If, instead, the IVMP mode is enabled, the motion information associated with the texture view video block is included in the NAL unit and is adopted as the motion information associated with the depth view video block. Accordingly, when the IVMP mode is enabled, the depth view video block does not include any additional delta for its motion information. In some examples, the IVMP mode applies only to non-anchor pictures and not to anchor pictures.
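On the decoder side, the behavior above (adopt the texture motion when the flag is set, otherwise parse the depth block's own motion, and apply IVMP only to non-anchor pictures) can be sketched as follows. As with any such sketch, the (name, value) syntax model and the element names are hypothetical, and the motion vector predictor is assumed to be computed elsewhere.

```python
from typing import Iterator, NamedTuple


class Motion(NamedTuple):
    ref_idx: int
    mv_x: int
    mv_y: int


def _read(elements: Iterator[tuple[str, int]], expected: str) -> int:
    name, value = next(elements)
    if name != expected:
        raise ValueError(f"expected {expected}, got {name}")
    return value


def decode_depth_block_motion(elements: Iterator[tuple[str, int]],
                              texture_motion: Motion, mv_pred: tuple[int, int],
                              ivmp_enabled: bool, is_anchor_picture: bool) -> Motion:
    """Reconstruct the motion of one depth block from hypothetical syntax elements."""
    # In this sketch the flag is present only for non-anchor pictures.
    if ivmp_enabled and not is_anchor_picture:
        if _read(elements, "ivmp_flag") == 1:
            return texture_motion  # adopted as-is, with no delta
    ref_idx = _read(elements, "ref_idx")
    mvd_x = _read(elements, "mvd_x")
    mvd_y = _read(elements, "mvd_y")
    return Motion(ref_idx, mv_pred[0] + mvd_x, mv_pred[1] + mvd_y)
```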

FIG. 10 is a flowchart illustrating a technique that may be performed by a video decoder in accordance with this disclosure. Although FIG. 10 is described from the perspective of video decoder 30 of FIG. 3, other video decoders may also be used. As shown in FIG. 10, prediction module 81 receives 3D video, e.g., video blocks representing 3D video data (1001). The 3D video includes a texture view video block and an associated depth view video block (1001). Prediction module 81 decodes the texture view video block (1002). In addition, prediction module 81 decodes the depth view video block (1003).

According to this disclosure, prediction module 81 supports the IVMP mode. In particular, prediction module 81 decodes a syntax element that indicates whether the motion information for the depth view is adopted from the texture view (1003), and the decoder interprets the syntax element accordingly. If the IVMP mode is enabled, the depth view component may not include any additional delta values for its motion information and may instead adopt the motion information of the texture view component as its own. Again, defining a mode that fully adopts the motion information of the texture view as the motion information of the depth view may improve compression, because no delta values need to be signaled for this motion information.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

In still other examples, this disclosure may be directed to a computer-readable storage medium storing a data structure. The data structure may include 3D video data compressed in the manner described herein, for example, by using the IVMP mode to code a depth view relative to a texture view.

Various examples of this disclosure have been described. These and other examples are within the scope of the following claims.

Claims

A method of coding three-dimensional (3D) video data, the method comprising:
coding a texture view video block; and
coding a depth view video block, wherein the depth view video block is associated with the texture view video block,
wherein coding the depth view video block includes coding a syntax element indicating whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The method of claim 1,
wherein the texture view video block and the depth view video block are coded together in an access unit, and
wherein the syntax element comprises a flag defined at a video block level that indicates whether the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block.

The method of claim 2,
wherein, if the syntax element indicates that the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The method of claim 2,
wherein the syntax element defines whether an inside view motion prediction (IVMP) mode is enabled.

The method of claim 4,
wherein, if the IVMP mode is disabled, the motion information associated with the texture view video block is included in the access unit and the motion information associated with the depth view video block is separately included in the access unit, and
wherein, if the IVMP mode is enabled, the motion information associated with the texture view video block is included in the access unit and is adopted as the motion information associated with the depth view video block.

The method of claim 5,
wherein, if the IVMP mode is enabled, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The method of claim 1,
wherein the coding comprises encoding, and coding the syntax element comprises generating the syntax element.

The method of claim 1,
wherein the coding comprises decoding, and coding the syntax element comprises decoding the syntax element from an encoded bitstream, the syntax element being included in the encoded bitstream.

A device for coding three-dimensional (3D) video data,
the device comprising one or more processors,
wherein the one or more processors are configured to:
code a texture view video block; and
code a depth view video block,
wherein the depth view video block is associated with the texture view video block, and
wherein coding the depth view video block includes coding a syntax element indicating whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The device of claim 9,
wherein the texture view video block and the depth view video block are coded together in an access unit, and
wherein the syntax element comprises a flag defined at a video block level that indicates whether the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block.

The device of claim 10,
wherein, if the syntax element indicates that the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The device of claim 10,
wherein the syntax element defines whether an inside view motion prediction (IVMP) mode is enabled.

The device of claim 12,
wherein, if the IVMP mode is disabled, the motion information associated with the texture view video block is included in the access unit and the motion information associated with the depth view video block is separately included in the access unit, and
wherein, if the IVMP mode is enabled, the motion information associated with the texture view video block is included in the access unit and is adopted as the motion information associated with the depth view video block.

The device of claim 13,
wherein, if the IVMP mode is enabled, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The device of claim 9,
wherein the coding comprises encoding, and coding the syntax element comprises generating the syntax element.

The device of claim 9,
wherein the coding comprises decoding, and coding the syntax element comprises decoding the syntax element from an encoded bitstream, the syntax element being included in the encoded bitstream.

The device of claim 9,
wherein the device comprises a wireless handset.

The device of claim 9,
wherein the device comprises one or more of:
a digital television,
a device in a digital direct broadcast system,
a device in a wireless broadcast system,
a personal digital assistant (PDA),
a laptop computer,
a desktop computer,
a tablet computer,
an e-book reader,
a digital camera,
a digital recording device,
a digital media player,
a video gaming device,
a video game console,
a cellular radio telephone,
a satellite radio telephone,
a smartphone,
a video teleconferencing device, and
a video streaming device.

A computer-readable storage medium having stored thereon instructions
that, when executed, cause one or more processors to:
code a texture view video block; and
code a depth view video block,
wherein the depth view video block is associated with the texture view video block, and
wherein coding the depth view video block includes coding a syntax element indicating whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The computer-readable storage medium of claim 19,
wherein the texture view video block and the depth view video block are coded together in an access unit, and
wherein the syntax element comprises a flag defined at a video block level that indicates whether the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block.

The computer-readable storage medium of claim 20,
wherein, if the syntax element indicates that the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The computer-readable storage medium of claim 20,
wherein the syntax element defines whether an inside view motion prediction (IVMP) mode is enabled.

The computer-readable storage medium of claim 22,
wherein, if the IVMP mode is disabled, the motion information associated with the texture view video block is included in the access unit and the motion information associated with the depth view video block is separately included in the access unit, and
wherein, if the IVMP mode is enabled, the motion information associated with the texture view video block is included in the access unit and is adopted as the motion information associated with the depth view video block.

The computer-readable storage medium of claim 23,
wherein, if the IVMP mode is enabled, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The computer-readable storage medium of claim 19,
wherein the coding comprises encoding, and coding the syntax element comprises generating the syntax element.

The computer-readable storage medium of claim 19,
wherein the coding comprises decoding, and coding the syntax element comprises decoding the syntax element from an encoded bitstream, the syntax element being included in the encoded bitstream.

A device configured to code three-dimensional (3D) video data, the device comprising:
means for coding a texture view video block; and
means for coding a depth view video block, wherein the depth view video block is associated with the texture view video block,
wherein the means for coding the depth view video block includes means for coding a syntax element indicating whether motion information associated with the texture view video block is adopted as motion information associated with the depth view video block.

The device of claim 27,
wherein the texture view video block and the depth view video block are coded together in an access unit, and
wherein the syntax element comprises a flag defined at a video block level that indicates whether the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block.

The device of claim 28,
wherein, if the syntax element indicates that the motion information associated with the texture view video block is adopted as the motion information associated with the depth view video block, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The device of claim 28,
wherein the syntax element defines whether an inside view motion prediction (IVMP) mode is enabled.

The device of claim 30,
wherein, if the IVMP mode is disabled, the motion information associated with the texture view video block is included in the access unit and the motion information associated with the depth view video block is separately included in the access unit, and
wherein, if the IVMP mode is enabled, the motion information associated with the texture view video block is included in the access unit and is adopted as the motion information associated with the depth view video block.

The device of claim 31,
wherein, if the IVMP mode is enabled, the depth view video block does not include any delta to the motion information associated with the depth view video block.

The device of claim 27,
wherein the means for coding comprises means for encoding, and the means for coding the syntax element comprises means for generating the syntax element.

The device of claim 27,
wherein the means for coding comprises means for decoding, and the means for coding the syntax element comprises means for decoding the syntax element from an encoded bitstream, the syntax element being included in the encoded bitstream.
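The encoding and decoding variants recited in the claims (generating the syntax element when encoding, and decoding it from an encoded bitstream when decoding) can be sketched as a round trip. This minimal Python sketch assumes a simplified layout that is not taken from the disclosure: a one-byte flag, followed, only when IVMP is disabled, by the depth block's own motion information as three signed 16-bit big-endian integers. A real 3DVC bitstream would entropy-code these elements; the function names, the `Motion` tuple, and the byte layout are all hypothetical.

```python
import struct
from typing import Optional, Tuple

# Hypothetical motion information: (mv_x, mv_y, ref_idx).
Motion = Tuple[int, int, int]

# Assumed fixed-width layout for illustration only: three signed 16-bit ints, big-endian.
_MOTION_FMT = ">hhh"


def encode_depth_block(ivmp_enabled: bool, depth_motion: Optional[Motion]) -> bytes:
    """Encoder side: generate the flag, then depth motion only when IVMP is disabled."""
    out = bytes([1 if ivmp_enabled else 0])  # the block-level syntax element
    if ivmp_enabled:
        # Motion is inherited from the texture block, so no delta is written.
        return out
    if depth_motion is None:
        raise ValueError("motion info required when IVMP is disabled")
    return out + struct.pack(_MOTION_FMT, *depth_motion)


def decode_depth_block(data: bytes, texture_motion: Motion) -> Tuple[Motion, int]:
    """Decoder side: parse the flag and derive the depth block's motion.

    Returns the motion information and the number of bytes consumed.
    """
    ivmp_enabled = data[0] == 1
    if ivmp_enabled:
        # Adopt the texture block's motion information as the depth motion.
        return texture_motion, 1
    size = struct.calcsize(_MOTION_FMT)
    motion = struct.unpack(_MOTION_FMT, data[1:1 + size])
    return motion, 1 + size


tex = (4, -2, 0)
print(decode_depth_block(encode_depth_block(True, None), tex))       # ((4, -2, 0), 1)
print(decode_depth_block(encode_depth_block(False, (1, 1, 2)), tex))  # ((1, 1, 2), 7)
```

Placing the flag before any depth motion fields lets the decoder determine from a single parsed element whether depth motion data follows at all; when IVMP is enabled nothing follows, which mirrors the claimed absence of any delta.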