KR20160002194A

KR20160002194A - Adaptive merging candidate selection method and apparatus

Info

Publication number: KR20160002194A
Application number: KR1020140081190A
Authority: KR
Inventors: 방건; 이광순; 허남호; 허영수; 박광훈
Original assignee: 한국전자통신연구원; 경희대학교 산학협력단
Priority date: 2014-06-30
Filing date: 2014-06-30
Publication date: 2016-01-07

Abstract

The present invention relates to a method for encoding/decoding a video and a device thereof. The method comprises a method for adaptively selecting a motion candidate index without transmitting the motion candidate index. According to the present invention, video encoding efficiency can be increased.

Description

METHOD AND APPARATUS FOR SELECTION OF ADAPTIVE MERGE CANDIDATE

The present invention relates to a video encoding/decoding method and apparatus.

Through a 3D stereoscopic display device, 3D video vividly gives users the same sense of depth that they see and feel in the real world. In related work, 3D video standardization is in progress in JCT-3V (The Joint Collaborative Team on 3D Video Coding Extension Development), a joint standardization group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). The 3D video standard covers advanced data formats, and the technologies related to them, that use real images and their depth information maps to support playback of autostereoscopic as well as stereoscopic images.

The present invention proposes a method for increasing video coding efficiency.

The present invention includes a method for adaptively selecting a motion candidate index without transmitting it.

The present invention can increase the coding efficiency of video coding.

Figure 1. Basic structure and data format of a 3D video system
Figure 2. An example of a real image and a depth information map image of the "balloons" sequence: (a) real image, (b) depth information map
Figure 3. An example of a method of dividing a CTU (LCU) into CUs
Figure 4. An example of the partition structure of a PU
Figure 5. An example of an inter-view prediction structure in a 3D video codec
Figure 6. An example embodiment of a 3D video encoder/decoder
Figure 7. An example of the prediction structure of a 3D video codec
Figure 8. An example of the neighboring blocks of the current block used for the merge candidate list
Figure 9. The existing process by which merge candidates are added to the list
Figure 10. The proposed adaptive merge candidate selection method
Figure 11. An example of the proposed adaptive merge candidate selection method

The basic 3D video system considered in the 3D video standard is shown in FIG. 1. The transmitting side acquires video content of N (N ≥ 2) views using a stereo camera, a depth camera, a multi-view camera, 2D-to-3D image conversion, and the like. The acquired content may include the video information of the N views, the corresponding depth map information, and camera-related side information. The N-view content is compressed with a multi-view video encoding method, and the compressed bitstream is transmitted to a terminal over a network. The receiving side decodes the received bitstream with a multi-view video decoding method to reconstruct the N-view images. From the reconstructed N-view images, depth-image-based rendering (DIBR) generates virtual-view images for N or more views. The generated virtual-view images are played back to suit various stereoscopic display devices, giving the user images with a sense of depth.

The depth information map used to generate virtual-view images represents, with a fixed number of bits, the real-world distance between the camera and an actual object (depth information for each pixel, at the same resolution as the real image). A depth information map can be acquired with a camera or generated automatically from ordinary texture images. Acquisition with a camera has the drawback that a depth camera works only within a limited range. Automatic generation instead uses the disparity between two texture images: a pixel in the current view is compared with pixels in a neighboring view to find the best-matching pixel, and the distance between the matched pixels is expressed as depth information. As an example of a depth information map generated automatically from texture images, FIG. 2 shows the "balloons" image (FIG. 2(a)) used in the 3D video coding standard of MPEG, an international standardization body, together with its depth information map (FIG. 2(b)). The depth information map in FIG. 2 represents the depth visible on screen with 8 bits per pixel.

As one example of a method of encoding a real image and its depth information map, encoding can be performed with H.264/AVC (MPEG-4 Part 10 Advanced Video Coding). As another example, HEVC (High Efficiency Video Coding), the international video standard jointly standardized by MPEG (Moving Picture Experts Group) and VCEG (Video Coding Experts Group), can be used.

To encode an image efficiently, HEVC performs encoding in units of a CU (Coding Unit; hereinafter 'CU'). FIG. 3 illustrates how CUs are partitioned within a CTU (Coding Tree Unit; hereinafter 'CTU', also called an LCU (Largest Coding Unit)) when an image is encoded.

As shown in FIG. 3, HEVC first divides an image sequentially into CTUs (LCUs) and then determines a partition structure for each CTU (LCU). The partition structure is the distribution of CUs used to encode the image efficiently within the CTU (LCU). It is determined by deciding whether a CU is split into four CUs, each with half the width and half the height. A split CU can be split recursively in the same way into four CUs of half the width and height. Splitting can continue down to a predefined depth. The depth indicates the size of a CU and is stored in every CU. The CTU (LCU), which is the base, has depth 0, and the SCU (Smallest Coding Unit) has the predefined maximum depth. Each time a CU is split in half horizontally and vertically, starting from the CTU (LCU), the CU depth increases by 1. At each depth, a CU that is not split has size 2Nx2N, and a CU that is split is divided into four CUs of size NxN. N is halved each time the depth increases by 1. FIG. 3 shows an example in which the CTU (LCU), at the minimum depth of 0, is 64x64 pixels and the SCU, at the maximum depth of 3, is 8x8 pixels. The 64x64-pixel CU (the CTU (LCU)) has depth 0, a 32x32-pixel CU has depth 1, a 16x16-pixel CU has depth 2, and an 8x8-pixel CU (the SCU) has depth 3. Whether a particular CU is split is expressed by split information, a 1-bit value per CU. The split information is included in every CU except the SCU; it is set to '0' when the CU is not split and to '1' when it is split.
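The relationship between depth and CU size, and the 1-bit split information, can be sketched in Python. This is a minimal illustration rather than the HEVC syntax: the names `cu_size` and `parse_cu_tree`, the depth-first order of the split flags, and the top-left, top-right, bottom-left, bottom-right visiting order are assumptions made for the example.

```python
def cu_size(ctu_size, depth):
    """Side length of a CU at the given depth: halved once per depth step."""
    return ctu_size >> depth


def parse_cu_tree(split_flags, ctu_size=64, max_depth=3):
    """Walk depth-first split flags and return the leaf CUs of one CTU.

    Each leaf is (x, y, size, depth). A CU at max_depth is an SCU and
    carries no split flag, so no flag is consumed for it.
    """
    flags = iter(split_flags)
    leaves = []

    def visit(x, y, depth):
        size = cu_size(ctu_size, depth)
        # Only CUs above the maximum depth read a split flag.
        split = depth < max_depth and next(flags) == 1
        if not split:
            leaves.append((x, y, size, depth))
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                visit(x + dx, y + dy, depth + 1)

    visit(0, 0, 0)
    return leaves
```

With the 64x64 CTU and maximum depth 3 of the example above, `cu_size(64, 3)` gives 8, and `parse_cu_tree([1, 0, 0, 0, 0])` yields four 32x32 CUs at depth 1.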

The CU is the coding unit, and each CU has its own coding mode. That is, each CU is coded either in intra-picture coding mode (MODE_INTRA, or INTRA) or in inter-picture coding mode (MODE_INTER, or INTER). The inter-picture coding mode is further divided into MODE_INTER mode and MODE_SKIP (or SKIP) mode.

The PU (Prediction Unit; hereinafter 'PU') is the unit of prediction. As shown in FIG. 4, one CU may be divided into several PUs for prediction. If the coding mode of a CU is INTRA, all PUs in that CU are coded as INTRA, and if the coding mode of a CU is INTER, all PUs in that CU are coded as INTER. As one example, when a CU is INTRA, the PU partition structure can only be PU 2Nx2N or PU NxN. As another example, when a CU is INTER, the PU can have any of the partition structures in FIG. 4.

The real image and its depth information map may be acquired by several cameras rather than just one. Images acquired by several cameras can be encoded independently, using an ordinary 2D video codec. Because images acquired by several cameras are correlated across views, they can also be encoded with prediction between different views to increase coding efficiency. As one embodiment, FIG. 5 shows an example of an inter-view prediction structure for images acquired by three cameras.

In FIG. 5, view 1 (View 1) is the image acquired by the camera located to the left of view 0 (View 0), and view 2 (View 2) is the image acquired by the camera located to the right of view 0. View 1 and view 2 perform inter-view prediction using view 0 as a reference image, so view 0 must be encoded before view 1 and view 2. Because view 0 can be encoded independently of the other views, it is called an independent view. Views 1 and 2 use view 0 as a reference image and are therefore called dependent views. An independent-view image can be encoded with an ordinary 2D video codec. A dependent-view image requires inter-view prediction, so it can be encoded with a 3D video codec that includes an inter-view prediction process.
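The coding-order constraint described for FIG. 5 (a reference view must be coded before the views that depend on it) can be sketched as a simple dependency ordering. The function name `coding_order` and the dictionary layout are assumptions for illustration, and the sketch assumes the reference relationships contain no cycles.

```python
def coding_order(reference_views):
    """Order views so that each view comes after every view it references.

    reference_views maps a view id to the list of view ids it uses as
    inter-view references. An empty list marks an independent view.
    """
    order = []
    done = set()

    def visit(view):
        if view in done:
            return
        for ref in reference_views[view]:
            visit(ref)  # references are coded first
        done.add(view)
        order.append(view)

    for view in sorted(reference_views):
        visit(view)
    return order


# The three-camera structure of FIG. 5: views 1 and 2 both reference view 0.
fig5_refs = {0: [], 1: [0], 2: [0]}
independent_views = [v for v, refs in fig5_refs.items() if not refs]
```

For `fig5_refs`, the order is view 0 followed by views 1 and 2, and view 0 is the only independent view.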

Views 1 and 2 can also be encoded using the depth information map to increase their coding efficiency. As one example, a real image and its depth information map can be encoded/decoded independently of each other. Alternatively, they can be encoded/decoded dependently on each other, as shown in FIG. 6. As one embodiment, a real image can be encoded/decoded using an already encoded/decoded depth information map, and conversely a depth information map can be encoded/decoded using an already encoded/decoded real image.

As one embodiment, FIG. 7 shows a coding prediction structure for encoding the real images acquired by three cameras and their depth information maps. In FIG. 7, the three real images are denoted T0, T1, and T2 according to view, and the three depth information maps at the same positions as the real images are denoted D0, D1, and D2. T0 and D0 are acquired at view 0, T1 and D1 at view 1, and T2 and D2 at view 2. Each picture can be coded as I (Intra Picture), P (Uni-prediction Picture), or B (Bi-prediction Picture). The arrows in FIG. 7 indicate prediction directions. That is, the real images and their depth information maps are encoded/decoded dependently on each other.

Methods for deriving the motion information of the current block in a real image are broadly divided into temporal prediction and inter-view prediction. Here, motion information may mean only the motion vector, or it may mean the motion vector together with the reference picture number, whether prediction is unidirectional or bidirectional, and whether it is inter-view prediction, temporal prediction, or another kind of prediction. Temporal prediction exploits temporal correlation within the same view, and inter-view prediction exploits inter-view correlation between adjacent views. Temporal prediction and inter-view prediction can be mixed within one picture.

The merge method is used as one of the methods of coding motion information in video encoding/decoding. Here, motion information includes at least one of a motion vector, a reference picture index, and a prediction direction (unidirectional, bidirectional, etc.). Depending on which reference picture list (RefPicList) is used, the prediction direction is broadly either unidirectional or bidirectional. Unidirectional prediction is divided into forward prediction (Pred_L0; Prediction L0), which uses the forward reference picture list (LIST 0), and backward prediction (Pred_L1; Prediction L1), which uses the backward reference picture list (LIST 1). Bidirectional prediction (Pred_BI; Prediction BI) uses both LIST 0 and LIST 1, meaning that forward and backward prediction are both present. The case in which LIST 0 is copied to LIST 1, so that two forward predictions exist, may also be counted as bidirectional prediction. The prediction direction can be expressed with predFlagL0 and predFlagL1. For example, for unidirectional forward prediction predFlagL0 may be '1' and predFlagL1 '0'; for unidirectional backward prediction predFlagL0 may be '0' and predFlagL1 '1'; and for bidirectional prediction both predFlagL0 and predFlagL1 may be '1'.
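The mapping between prediction direction and the two flags can be written as a small lookup. This is only a sketch of the mapping stated above: the `PredDir` enum and the two helper functions are hypothetical names, not identifiers from any codec implementation.

```python
from enum import Enum


class PredDir(Enum):
    PRED_L0 = "Pred_L0"  # forward prediction, uses LIST 0
    PRED_L1 = "Pred_L1"  # backward prediction, uses LIST 1
    PRED_BI = "Pred_BI"  # bidirectional prediction, uses both lists


def pred_flags(direction):
    """Return (predFlagL0, predFlagL1) for a prediction direction."""
    uses_l0 = direction in (PredDir.PRED_L0, PredDir.PRED_BI)
    uses_l1 = direction in (PredDir.PRED_L1, PredDir.PRED_BI)
    return int(uses_l0), int(uses_l1)


def direction_from_flags(pred_flag_l0, pred_flag_l1):
    """Inverse mapping; any other flag pair, such as (0, 0), is rejected."""
    table = {
        (1, 0): PredDir.PRED_L0,
        (0, 1): PredDir.PRED_L1,
        (1, 1): PredDir.PRED_BI,
    }
    try:
        return table[(pred_flag_l0, pred_flag_l1)]
    except KeyError:
        raise ValueError("unsupported predFlagL0/predFlagL1 combination") from None
```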

Merging can be performed per coding unit (CU) or per prediction unit (PU). When merging is performed per CU or PU (hereinafter 'block' for convenience), two pieces of information need to be transmitted: whether merging is performed for each block partition, and which of the blocks neighboring the current block (the left neighbor, the upper neighbor, the temporal neighbor of the current block, and so on) is merged with.

The merge candidate list is a list in which motion information is stored, and it is generated before merging is performed. The motion information stored in the merge candidate list may be the motion information of a block neighboring the current block, or the motion information of the block collocated with the current block in a reference picture. It may also be new motion information created by combining motion information already in the merge candidate list.

To build the merge candidate list, each of the neighboring blocks (A, B, C, D, E) in FIG. 8 and the collocated candidate block (H or M) is checked to determine whether its motion information can be used for merging the current block; if so, that motion information can be entered in the merge candidate list. The neighboring blocks are also checked against each other, and motion information identical to that of another neighboring block is not added to the list. As an example, when the merge candidate list is generated for block X in FIG. 8, once neighboring block A is available and has been added to the list, neighboring block B can be added only if its motion information differs from that of A. In the same way, neighboring block C can be added only if its motion information differs from that of B, and the same applies to neighboring blocks D and E. Here, identical motion information can mean the same motion vector, the same reference picture, and the same prediction direction (unidirectional (forward or backward) or bidirectional). Finally, the candidates for block X in FIG. 8 can be added to the list in a predetermined order, for example A→B→C→D→E→H (or M).
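The list construction described above can be sketched as follows. This is an illustrative reading of the text, not the normative HEVC derivation: the `Motion` type and `build_merge_list` are hypothetical names, each spatial neighbor is compared with the most recently examined available spatial neighbor (an assumption for the case where a neighbor in between is unavailable), and the collocated block is appended without a duplicate check because the text describes none for it.

```python
from typing import NamedTuple


class Motion(NamedTuple):
    mv: tuple       # motion vector (x, y)
    ref_idx: int    # reference picture index
    pred_dir: str   # "L0", "L1" or "BI"


SPATIAL_ORDER = ("A", "B", "C", "D", "E")


def build_merge_list(neighbors, collocated="H"):
    """Build a merge candidate list in the order A, B, C, D, E, H.

    neighbors maps a block name to its Motion, or to None when the
    block's motion information is not available for merging.
    """
    merge_list = []
    previous = None  # motion of the last available spatial neighbor
    for name in SPATIAL_ORDER:
        motion = neighbors.get(name)
        if motion is None:
            continue  # unavailable neighbor: skip it
        if motion != previous:
            merge_list.append((name, motion))
        previous = motion
    col = neighbors.get(collocated)
    if col is not None:
        merge_list.append((collocated, col))
    return merge_list
```

Two `Motion` values compare equal only when the motion vector, reference picture, and prediction direction all match, which is the notion of identical motion information used in the text.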

In video coding, neighboring motion information can be used to encode motion information efficiently. Merge candidates derived from neighboring motion information are inserted into the merge candidate list as shown in FIG. 9. The encoder then performs trial encoding and finally selects the candidate with the best coding efficiency. The selected candidate is encoded using entropy coding.

To make entropy coding more efficient, the candidate placed first in the list should be selected as often as possible, so that index 0 is the one encoded most often. The candidates therefore have priorities, and they are added to the list in order of how often they are statistically selected. For example, current 3D-HEVC inserts candidates into the merge candidate list in the following priority order: MPI (inheritance from texture), IvMC (inter-view motion), A1 (left), B1 (above), B0 (above-right), IvDC (disparity), VSP (view synthesis), A0 (below-left), B2 (above-left), ShiftIV (corrected disparity), Col (collocated), Bi (bidirectional), and Zero (zero vector).
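Why a frequently selected candidate belongs at index 0 can be illustrated by counting bins per index. The sketch below assumes a truncated unary binarization of the candidate index, which is a common choice for merge indices but is not stated in this document, and it counts bins rather than actual bits after arithmetic coding. The selection ratios are hypothetical except for the 78% share of index 0 quoted for Table 1.

```python
def truncated_unary(index, max_index):
    """Bin string of an index: `index` ones, then a zero unless index is the maximum."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    return "1" * index + ("0" if index < max_index else "")


def expected_bins(selection_ratios):
    """Average number of bins per coded index for the given selection ratios."""
    max_index = len(selection_ratios) - 1
    return sum(
        ratio * len(truncated_unary(index, max_index))
        for index, ratio in enumerate(selection_ratios)
    )


# Hypothetical ratios for a six-entry list; only the first value (78%)
# is taken from the text.
ratios = [0.78, 0.10, 0.06, 0.03, 0.02, 0.01]
```

With these illustrative ratios the average is about 1.43 bins per index, against about 4.78 bins if the same candidates were listed in the reverse order.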

Table 1 is an example of merge candidate selection ratios for texture. Because the MPI candidate is not used for texture, IvMC occupies index 0 and is selected at the highest rate. Since 78% of the candidates are entropy-coded with the value 0, coding efficiency is fairly good. The remaining candidates are also listed roughly, though not exactly, in descending order of selection ratio.

With so many kinds of merge candidates, a block coded with motion merging must transmit a candidate index indicating which candidate in the list is used, and this index accounts for a large share of the compressed bitstream. If the candidate index could be selected adaptively so that its transmission can be omitted, video coding efficiency could be increased.

Several kinds of candidates are used when the merge candidate list is constructed in video coding. For candidates that are used frequently, the choice of motion candidate is highly correlated between a neighboring block and the current block.

The present invention proposes that, when a neighboring block has used a specific candidate, the current block is encoded without encoding the index of the candidate to be used as motion information in the motion candidate list; instead, the candidate used by the neighboring block is used as the motion candidate of the current block. FIG. 10 is a conceptual diagram of the proposed method. Because the motion candidate index does not have to be transmitted, the proposed method can increase coding efficiency.

FIG. 11 is an example of applying the proposed method to motion merging in 3D-HEVC. If the current block does not belong to the base-view texture and satisfies one of the following conditions, the merge candidate (that is, its index) is not encoded.

- Condition 1: the current picture is not a depth map, the collocated block in the previous picture of the same view as the current view used the inter-view motion candidate (IvMC, inter-view merging candidate), and the inter-view motion candidate is available for the current block.

- Condition 2: the current picture is a depth map, the collocated block in the texture corresponding to the current picture used the motion parameter inheritance candidate (MPI, Motion Parameter Inheritance), and the motion parameter inheritance candidate is available for the current block.
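The two conditions can be sketched as a single decision function. This is a hedged reading of the FIG. 11 example, not the inventors' implementation: `BlockContext`, its field names, and `inferred_candidate` are hypothetical. The caller is assumed to supply the candidate used by the relevant collocated block, which is the block in the previous picture of the same view for texture and the block in the corresponding texture for a depth map.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class BlockContext:
    is_base_view_texture: bool
    is_depth_map: bool
    # Candidate used by the collocated block, e.g. "IvMC" or "MPI";
    # None if that block did not use merging.
    collocated_candidate: Optional[str]
    # Candidates that can be derived for the current block.
    available_candidates: FrozenSet[str]


def inferred_candidate(ctx):
    """Return the candidate to use without coding an index, or None.

    None means neither condition holds and the merge candidate index
    has to be coded as usual.
    """
    if ctx.is_base_view_texture:
        return None
    # Condition 2 applies to depth maps (MPI), condition 1 to texture (IvMC).
    wanted = "MPI" if ctx.is_depth_map else "IvMC"
    if ctx.collocated_candidate == wanted and wanted in ctx.available_candidates:
        return wanted
    return None
```

Because the decision uses only information from already coded pictures, a decoder could presumably evaluate the same function and recover the candidate without receiving an index.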

Claims

A method of selecting one of the candidates in a motion candidate list and using it as the motion information of a current block, wherein the motion candidate used by a neighboring block is used as the motion candidate of the current block and transmission of the motion candidate index is omitted.

The method of claim 1, further comprising: using a motion candidate used in neighboring blocks spatially adjacent to the current block

The method according to claim 1, further comprising the steps of: using a motion candidate used in a block temporally adjacent to the current block as a neighboring block

The method according to claim 1, further comprising: using a motion candidate used as a neighboring block in a block adjacent to the current block