KR100780840B1

KR100780840B1 - Temporal Prediction Apparatus and Method for Hierarchical Depth Image Coding of Multi-view Video

Info

Publication number: KR100780840B1
Application number: KR1020060053355A
Authority: KR
Inventors: 호요성; 윤승욱; 김성열
Original assignee: 광주과학기술원
Priority date: 2006-06-14
Filing date: 2006-06-14
Publication date: 2007-11-29

Abstract

A temporal prediction apparatus for hierarchical depth image coding of a multi-view video according to the present invention includes: an LDI frame generator that generates, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view; a partial image generator that generates a partial color image and a partial depth image for each time instant from the color and depth layers of the hierarchical depth image sequence generated by the LDI frame generator; an image combiner that combines the partial color images into one image and the partial depth images into one image; a partial color image predictor that receives the combined partial color image from the image combiner and predicts it along the time axis using an I, B, B, P structure; and a partial depth image predictor that receives the combined partial depth image from the image combiner and predicts it along the time axis using an I, P structure. By predicting the hierarchical depth image sequence with temporal prediction, the prediction scheme that yields the largest coding gain for multi-view video, the apparatus can improve overall multi-view video coding efficiency.

Description

Temporal Prediction Apparatus and Method for Hierarchical Depth Image Coding of Multi-view Video

FIG. 1 is a block diagram of a temporal prediction apparatus for hierarchical depth image coding of a multi-view video according to a preferred embodiment of the present invention.

FIG. 2 is a diagram illustrating the temporal prediction process for hierarchical depth images used to encode a multi-view video that includes depth information, according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating, among the temporal prediction methods for hierarchical depth images according to an embodiment of the present invention, a method that combines the color layer images into one image and predicts it along the time axis.

FIG. 4 is a diagram illustrating, among the temporal prediction methods for hierarchical depth images according to an embodiment of the present invention, a method that combines the depth layer images into one image and predicts it along the time axis.

FIGS. 5 to 8 are diagrams illustrating, among the temporal prediction methods for hierarchical depth images according to an embodiment of the present invention, a method that predicts the color layer images individually along the time axis without combining them into one image.

FIGS. 9 and 10 are diagrams illustrating, among the temporal prediction methods for hierarchical depth images according to an embodiment of the present invention, a method that predicts the depth layer images individually along the time axis without combining them into one image.

<Description of reference numerals for the main parts of the drawings>

100/1 to 100/n: cameras

110: LDI frame generator

120: partial image generator

130: image combiner

140: partial color image predictor

150: partial depth image predictor

The present invention relates to a multi-view video service system and, more particularly, to a temporal prediction apparatus and method for hierarchical depth image coding of a multi-view video, which represent the multi-view video on the basis of hierarchical depth images and can improve coding efficiency.

Multi-view video is used in a variety of applications to provide a greater sense of realism, but it involves a vast amount of data and therefore requires enormous bandwidth for transmission. Using the concept of the hierarchical depth image, which exploits three-dimensional information rather than extending a conventional two-dimensional video coding scheme, a multi-view video can be encoded with relatively little bandwidth.

Unlike the common approach of representing a three-dimensional model with a mesh, a hierarchical depth image (Layered Depth Image; LDI) represents an object using an array of pixels seen from a single camera position. Each pixel is described by its color, depth information giving the distance from the pixel to the camera, and several other attributes that support rendering. In other words, a hierarchical depth image consists of pixels much like an ordinary two-dimensional image, but each pixel carries depth information and auxiliary rendering information in addition to color. An image at an arbitrary viewpoint within a limited field of view can therefore be generated easily from a hierarchical depth image constructed at one viewpoint. Specifically, an LDI comprises Y, Cb, Cr, and Alpha color information, depth information giving the distance between the camera and the object, and a splat table index used to determine the pixel size at rendering time. Because each LDI pixel uses a total of 63 bits to hold all of this information, a single LDI contains anywhere from a few MB to tens of MB of data.
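
As a rough illustration of the data volume described above, the following Python sketch computes the raw size of an LDI from its pixel count. The 63-bit total comes from the text; the split into individual field widths (8 bits for each color component, 20 bits for depth, 11 bits for the splat table index) and the function names are assumptions chosen only so that the fields add up to 63.

```python
# Assumed per-field bit widths; the text states only the 63-bit total.
LDI_PIXEL_BITS = {
    "Y": 8,
    "Cb": 8,
    "Cr": 8,
    "Alpha": 8,
    "depth": 20,
    "splat_index": 11,
}


def ldi_pixel_bits():
    """Total number of bits stored for one LDI pixel."""
    return sum(LDI_PIXEL_BITS.values())


def ldi_size_megabytes(num_pixels):
    """Raw size, in MB (10**6 bytes), of an LDI holding num_pixels pixels."""
    return num_pixels * ldi_pixel_bits() / 8 / 1e6
```

Under these assumptions, an LDI holding one million pixels across all of its layers occupies 7.875 MB before any compression, which falls within the range of a few MB to tens of MB given in the text.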

Meanwhile, conventional multi-view video coding techniques are based on H.264/AVC. However, these H.264/AVC-based techniques are all extensions of two-dimensional video coding technology and do not use three-dimensional information. To give users or viewers a more realistic sense of depth and immersion, the three-dimensional information contained in the multi-view video should be exploited effectively, yet the conventional techniques use only two-dimensional information.

As a result, there are many constraints on providing services that use multi-view video, such as three-dimensional TV and free-viewpoint TV.

An object of the present invention is to solve these problems of the prior art by providing a temporal prediction apparatus and method for hierarchical depth image coding of a multi-view video that improve coding efficiency by generating a hierarchical depth image sequence from depth information, which is the three-dimensional information contained in the multi-view video, and then predicting the sequence along the time axis.

To achieve the above object, the present invention provides a temporal prediction apparatus for hierarchical depth image coding of a multi-view video based on hierarchical depth images, the apparatus including: an LDI (Layered Depth Image) frame generator that generates, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view; a partial image generator that generates a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the hierarchical depth image sequence generated by the LDI frame generator; an image combiner that combines the partial color images generated by the partial image generator into one image and the partial depth images into one image; a partial color image predictor that receives the combined partial color image from the image combiner and predicts the partial color image along the time axis using an I, B, B, P structure; and a partial depth image predictor that receives the combined partial depth image from the image combiner and predicts the partial depth image along the time axis using an I, P structure.

The present invention also provides a temporal prediction apparatus for hierarchical depth image coding of a multi-view video based on hierarchical depth images, the apparatus including: an LDI (Layered Depth Image) frame generator that generates, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view; a partial image generator that generates a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the hierarchical depth image sequence generated by the LDI frame generator; a partial color image predictor that predicts the partial color image along the time axis by applying an I, P, P prediction structure to the per-time-instant partial color images provided by the partial image generator; and a partial depth image predictor that predicts the partial depth image along the time axis by applying an I, P prediction structure to the per-time-instant partial depth images provided by the partial image generator.

In another aspect, the present invention provides a temporal prediction method for hierarchical depth image coding of a multi-view video based on hierarchical depth images, the method including: generating, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view; generating a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the generated hierarchical depth image sequence; combining the generated partial color images into one image and predicting the partial color image along the time axis using an I, B, B, P structure; and combining the generated partial depth images into one image and predicting the partial depth image along the time axis using an I, P structure.

The present invention also provides a temporal prediction method for hierarchical depth image coding of a multi-view video based on hierarchical depth images, the method including: generating, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view; generating a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the generated hierarchical depth image sequence; predicting the partial color image along the time axis by applying an I, P, P prediction structure to the generated per-time-instant partial color images; and predicting the partial depth image along the time axis by applying an I, P prediction structure to the generated per-time-instant partial depth images.

Hereinafter, preferred embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a temporal prediction apparatus in a multi-view video service system based on hierarchical depth images according to an embodiment of the present invention.

Referring to FIG. 1, the temporal prediction apparatus includes n cameras (100/1 to 100/n), a hierarchical depth image (Layered Depth Image; hereinafter "LDI") frame generator (110), a partial image generator (120), an image combiner (130), a partial color image predictor (140), and a partial depth image predictor (150).

The LDI frame generator (110) generates a hierarchical depth image sequence, composed of the color and depth layers of each view, for the multi-view video with depth information acquired from the n cameras (100/1 to 100/n). Each hierarchical depth image is represented as a set of component images with at most n layers; because each hierarchical depth image contains both color and depth layers, it is represented by a set of 2n images in total.

In addition, when generating the hierarchical depth image sequence from the multi-view video acquired from the n cameras (100/1 to 100/n), the LDI frame generator (110) predicts and generates a hierarchical depth image as time changes. That is, as shown in FIG. 2, the right-hand part of the hierarchical depth image sequence arranges, in time order, the hierarchical depth images generated at each time T (T = 0, 1, 2, ..., M), namely hierarchical depth image 0, hierarchical depth image 1, ..., hierarchical depth image M, while the left-hand part contains the color and depth layers of each video in the multi-view video acquired from the n cameras (100/1 to 100/n). In other words, one hierarchical depth image consists of at most n layers, and each layer consists of a color layer part and a depth layer part.
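
The layer organization described above can be pictured with a small Python data structure. This is only a sketch: the class names, the use of nested lists for images, and the use of `None` for an empty pixel are illustrative assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# A 2-D image as rows of pixel values; None marks an empty pixel (assumption).
Image = List[List[Optional[int]]]


@dataclass
class LdiLayer:
    """One layer of a hierarchical depth image: a color part and a depth part."""
    color: Image
    depth: Image


@dataclass
class LdiFrame:
    """The hierarchical depth image generated at one time instant."""
    time: int
    layers: List[LdiLayer] = field(default_factory=list)

    def component_images(self) -> List[Image]:
        """All color images followed by all depth images: 2 * len(layers) in total."""
        colors = [layer.color for layer in self.layers]
        depths = [layer.depth for layer in self.layers]
        return colors + depths
```

A frame with n layers therefore yields 2n component images, and a hierarchical depth image sequence is simply a list of such frames ordered by `time`.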

The partial image generator (120) generates a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the hierarchical depth image sequence.

First, the partial image generator (120) generates partial color images by extracting only the color layers, time instant by time instant, from the hierarchical depth images generated by the LDI frame generator (110). As shown in FIG. 3, it extracts the color layers for each time instant and generates partial color images that include an image obtained by horizontal and vertical data aggregation of the color layers and an image in which the empty pixels of the layers are filled.

Likewise, the partial image generator (120) generates partial depth images by extracting only the depth layers, time instant by time instant, from the hierarchical depth images generated by the LDI frame generator (110). As shown in FIG. 4, it extracts the depth layers for each time instant and generates partial depth images that include an image obtained by horizontal and vertical data aggregation of the depth layers and an image in which the empty pixels of the layers are filled.

Here, when the partial depth and color images are generated, the empty pixels of a layer are filled using the information of the layer in front of it, or by another method such as interpolation.
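
The front-layer filling option mentioned above can be sketched in Python as follows. The representation (layers as nested lists ordered front to back, `None` for an empty pixel) and the function name are assumptions; the sketch also assumes that each layer is filled from the already filled layer directly in front of it, which is one possible reading of the text.

```python
def fill_from_front_layer(layers):
    """Fill empty pixels of each layer with the co-located pixel of the layer in front.

    layers: list of equally sized 2-D lists, front layer first; None marks an
    empty pixel. Returns new layers and leaves the input untouched.
    """
    if not layers:
        return []
    filled = [[row[:] for row in layers[0]]]  # the front layer is copied as-is
    for layer in layers[1:]:
        front = filled[-1]
        filled.append([
            [front[y][x] if value is None else value for x, value in enumerate(row)]
            for y, row in enumerate(layer)
        ])
    return filled
```

Filling the holes this way gives every layer the same dense rectangular shape, which suits a block-based video encoder; interpolation, the other option named in the text, is not shown here.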

The horizontal and vertical data aggregation in the partial image generator (120) sorts the layers at each time instant by gathering the empty layers, that is, the layers with no partial color image, to the right and the layers that contain pixels to the left.
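
The gathering step can be sketched in Python as a stable partition that moves occupied entries toward one side and empty entries toward the other. The text describes the operation only briefly, so applying it to the pixels of each row (horizontal) and each column (vertical) of a single layer image, with `None` as the empty marker and the vertical pass packing toward the top, is an assumption made for illustration.

```python
def aggregate_row(row):
    """Stable partition: occupied entries to the left, empty (None) entries to the right."""
    occupied = [value for value in row if value is not None]
    return occupied + [None] * (len(row) - len(occupied))


def aggregate_horizontal(image):
    """Pack the occupied pixels of every row toward the left edge."""
    return [aggregate_row(row) for row in image]


def aggregate_vertical(image):
    """Pack the occupied pixels of every column toward the top edge."""
    columns = [aggregate_row(list(column)) for column in zip(*image)]
    return [list(row) for row in zip(*columns)]
```

A decoder would also need the original pixel positions to undo the packing; how that side information is carried is not covered by this sketch.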

The image combiner (130) then combines the partial color images generated by the partial image generator (120) into one image and the partial depth images into one image, outputs the combined partial color image to the partial color image predictor (140), and outputs the combined partial depth image to the partial depth image predictor (150).
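
A minimal Python sketch of the combining step is given below. The text does not specify how the layer images are arranged inside the combined image, so placing them side by side from left to right is an assumption, as is the function name.

```python
def combine_layers(layers):
    """Concatenate equally sized layer images left to right into one image.

    layers: list of 2-D lists (rows of pixel values), all with the same height.
    """
    if not layers:
        return []
    height = len(layers[0])
    if any(len(layer) != height for layer in layers):
        raise ValueError("all layers must have the same height")
    combined = []
    for y in range(height):
        row = []
        for layer in layers:
            row.extend(layer[y])  # append this layer's row to the combined row
        combined.append(row)
    return combined
```

The same function would be called once on the partial color images and once on the partial depth images, giving one combined image of each kind per time instant.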

The partial color image predictor (140) predicts the partial color image along the time axis using an I, B, B, P structure. As shown in FIG. 3, the partial color image arriving at the first time instant (T = t) is defined as I, the partial color image input at T = t+3 is predicted as P from I, and the partial color images at T = t+1 and T = t+2 are predicted as B from I and P.

Thus, in the time-axis partial color image prediction method using the I, B, B, P structure, the maximum number of B pictures is two.
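
The reference relationships of one I, B, B, P group can be written down as a small Python table. The frame types and references follow the description of FIG. 3; the function names and the derived coding order (each picture coded only after its references) are illustrative assumptions.

```python
def ibbp_schedule(t):
    """Map each time instant of one I, B, B, P group to (frame_type, reference_times)."""
    return {
        t: ("I", ()),                # coded without temporal reference
        t + 1: ("B", (t, t + 3)),    # predicted from both I and P
        t + 2: ("B", (t, t + 3)),    # predicted from both I and P
        t + 3: ("P", (t,)),          # predicted from I
    }


def coding_order(schedule):
    """Order the time instants so that every picture follows its references."""
    rank = {"I": 0, "P": 1, "B": 2}
    return sorted(schedule, key=lambda time: (rank[schedule[time][0]], time))
```

For a group starting at t = 0, the display order is 0, 1, 2, 3 while the coding order is 0, 3, 1, 2, because both B pictures depend on the P picture at t + 3.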

As shown in FIG. 4, the partial depth image predictor (150) predicts the partial depth image using the I, P structure of H.264/AVC. That is, taking the partial depth image at time T = t as "I", the depth information at the next time T = t+1 is predicted as "P". This depth information is important for applications such as stereoscopic displays, three-dimensional TV, and free-viewpoint TV.

Although the present invention takes prediction with the "I, P" structure as the example for depth information, when the application does not require accurate depth information, the depth information may instead be predicted using the "I, B, B, P" structure used by the partial color image predictor (140).

Because the same scene is captured by the n cameras (100/1 to 100/n), the number of layers generated is nearly constant. If the number of layers differs between time instants, the number of layers at the other times (t = 1, 2, 3, ...) is adjusted to match the number generated at the first time (t = 0). That is, when the number of layers at another time differs from that at the first time, arbitrary layers are combined so that the number of layers at that time corresponds to the number generated at t = 0. The resolution of the combined layers is matched to the resolution of the image at the first time (t = 0).
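
The layer-count adjustment can be sketched in Python as follows. The text only says that arbitrary layers are combined to reach the count of the first time instant; merging the rearmost layers pixel by pixel, and padding with empty layers when there are too few, are assumptions made for this illustration, as are the function names and the use of `None` for an empty pixel.

```python
def merge_layers(front, back):
    """Pixelwise merge: keep the front pixel, fall back to the back pixel where front is empty."""
    return [
        [f if f is not None else b for f, b in zip(front_row, back_row)]
        for front_row, back_row in zip(front, back)
    ]


def match_layer_count(layers, target_count, height, width):
    """Return a copy of layers adjusted to exactly target_count layers.

    height and width are the resolution of the image at the first time instant;
    any padded layer is created at that resolution.
    """
    if target_count < 1:
        raise ValueError("target_count must be at least 1")
    adjusted = [[row[:] for row in layer] for layer in layers]
    while len(adjusted) > target_count:      # too many layers: merge the two rearmost
        back = adjusted.pop()
        adjusted[-1] = merge_layers(adjusted[-1], back)
    while len(adjusted) < target_count:      # too few layers: pad with empty layers (assumption)
        adjusted.append([[None] * width for _ in range(height)])
    return adjusted
```

Here `target_count` would be the number of layers generated at t = 0, so that every later time instant presents the same number of layers to the predictor.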

The embodiment above combines the generated partial color and depth images into one image each and then predicts them along the time axis. In another embodiment of the present invention, the images may instead be processed separately without being combined into one image. That is, the temporal prediction apparatus according to this other embodiment has the same configuration as the apparatus shown in FIG. 1 with the image combiner (130) omitted, and the partial image generator (120) outputs the generated partial color images to the partial color image predictor (140) and the generated partial depth images to the partial depth image predictor (150).

In this case, the partial color image predictor (140) predicts the partial color images using an I, P, P structure, that is, by treating three time instants as one group. The partial color image prediction method can be divided into four cases.

In the first case, shown in FIG. 5, the number of layers generated at T = t is smaller than the number generated at T = t+1, and the number of layers at T = t+1 is larger than the number at T = t+2. When no image exists at the previous time, the current image is coded as I, and the layers at each later time (T = t+1, t+2) that correspond to an I-coded layer are coded as P.

In the second case, shown in FIG. 6, the number of layers generated at T = t is smaller than the number generated at T = t+1, and the number of layers at T = t+1 is smaller still than the number at T = t+2. When no image exists at the previous time, the current image is coded as I, and the layers at each later time (T = t+1, t+2) that correspond to an I-coded layer are coded as P.

In the third case, shown in FIG. 7, the number of layers generated at T = t is larger than the number generated at T = t+1, but the number of layers at T = t+1 is smaller than the number at T = t+2. If a previous image exists, the current image is coded as P; otherwise it is coded as I.

In the fourth case, shown in FIG. 8, the number of layers is largest at T = t, smaller at T = t+1, and smallest at T = t+2. I, P, P prediction toward T = t+2 is performed starting from the partial color images at T = t.
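
The four cases share one rule: a layer is coded as P when the corresponding layer exists at the previous time instant and as I otherwise. The Python sketch below applies that rule to the layer counts of one three-instant group. It assumes that layer i at one time instant corresponds to layer i at the next, which the text implies but does not state; the function name is hypothetical.

```python
def layer_frame_types(layer_counts):
    """Assign a frame type to every layer of every time instant in one group.

    layer_counts[k] is the number of layers at time t + k. A layer is coded as
    "P" if a layer with the same index exists at the previous time instant,
    otherwise as "I". All layers of the first time instant are coded as "I".
    """
    types = []
    previous_count = 0
    for count in layer_counts:
        types.append(["P" if index < previous_count else "I" for index in range(count)])
        previous_count = count
    return types
```

For example, layer counts of 2, 3, 2 (the situation of FIG. 5) give I, I at T = t, then P, P, I at T = t+1, then P, P at T = t+2, while counts of 3, 2, 1 (FIG. 8) give a pure I, P, P chain for the layers that persist.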

Meanwhile, because the partial depth image information is important for selecting suitable pixels during multi-view reconstruction, the partial depth image predictor (150) predicts the partial depth image along the time axis using an I, P, P prediction structure.

That is, the partial depth image predictor (150) predicts the partial depth image at each time instant by applying an I, P prediction structure to the partial depth images provided by the partial image generator (120).

For example, as shown in FIG. 9, when the number of layers generated at T = t is larger than the number generated at T = t+1, the images are predicted directly using the I, P prediction structure. Otherwise (when the number of layers at T = t+1 is larger than the number at T = t), as shown in FIG. 10, a current image that cannot be predicted from the previous time is coded as "I".

According to the present invention, coding efficiency can be improved by generating a hierarchical depth image sequence from depth information, which is the three-dimensional information contained in the multi-view video, and then predicting the sequence along the time axis.

The present invention is not limited to the specific preferred embodiments described above. Anyone of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

As described above, the present invention predicts the hierarchical depth image sequence using temporal prediction, the prediction scheme that yields the largest coding gain for multi-view video, and can thereby improve overall multi-view video coding efficiency.

Claims

A temporal prediction apparatus for hierarchical depth image coding of a multi-view video based on hierarchical depth images, the apparatus comprising:

a layered depth image (LDI) frame generator that generates, for a multi-view video having depth information, a per-time-instant hierarchical depth image sequence composed of the color and depth layers of each view;

a partial image generator that generates a partial color image and a partial depth image for each time instant using the per-time-instant color and depth layers of the hierarchical depth image sequence generated by the LDI frame generator;

an image combiner that combines the partial color images generated by the partial image generator into one image and the partial depth images into one image;

a partial color image predictor that receives the combined partial color image from the image combiner and predicts the partial color image along the time axis using an I, B, B, P structure; and

a partial depth image predictor that receives the combined partial depth image from the image combiner and predicts the partial depth image along the time axis using an I, P structure.

The apparatus of claim 1,

wherein the partial image generator

generates a partial color image and a partial depth image each including an image obtained through horizontal and vertical data aggregation for each time instant and an image in which empty pixels are filled.

The apparatus of claim 2,

wherein the partial image generator

fills the empty pixels using the information of the layer in front of the layer having the empty pixels.

The apparatus of claim 1,

wherein the partial image generator,

when the number of layers of the partial color image and the partial depth image differs between time instants, adjusts the number of layers at the other time instants to match the number of layers at the first time instant.

The apparatus of claim 4,

wherein the resolution of a layer newly generated for the other time instant is matched to the resolution of the image at the first time instant.

The apparatus of claim 1,

wherein the partial depth image predictor

receives the combined partial depth image from the image combiner and predicts the partial depth image for each time instant using an I, B, B, P structure.

A temporal prediction apparatus for hierarchical depth image coding in a multi-view video service system based on hierarchical depth images, comprising:

A partial color image predictor for predicting a partial color image on the time axis by applying I, P, and P prediction structures to the partial color image for each time instant provided by the partial image generator; and

A partial depth image predictor for predicting a partial depth image on the time axis by applying I and P prediction structures to the partial color image for each time instant provided by the partial image generator.

The apparatus of claim 7,

wherein the partial depth image predictor

predicts the partial depth image by applying I, P, and P prediction structures to the partial color image for each time instant provided by the partial image generator.

A temporal prediction method for hierarchical depth image coding of multi-view video based on hierarchical depth image,

Generating, for a multi-view video having depth information, a hierarchical depth image sequence for each time instant composed of the color and depth layers of each view;

Generating a partial color image and a partial depth image for each time instant by using the color and depth layers for each time instant in the generated hierarchical depth image sequence;

Combining the generated partial color images into one image and predicting the partial color images on the time axis using I, B, B, and P structures;

Combining the generated partial depth images into one image, and predicting the partial depth images on the time axis using I and P structures.

A temporal prediction method for hierarchical depth image encoding of a multi-view video, comprising the above steps.

The method of claim 9,

wherein generating the partial color image and the partial depth image

comprises generating, for each time instant, a partial color image and a partial depth image, each comprising an image obtained through horizontal and vertical data collection and an image in which empty pixels have been filled.

The method of claim 10,

wherein generating the partial color image and the partial depth image

comprises filling each empty pixel using information from the front layer of the layer containing that empty pixel.

The method of claim 9,

wherein generating the partial color image and the partial depth image

comprises, when the number of layers of the partial color image and the partial depth image differs between time instants, adjusting the number of layers at the other time instants to match the number of layers at the first time instant.

The method of claim 12,

wherein a new layer at another time instant is generated at a resolution matching that of the image at the first time instant.

The method of claim 12,

wherein predicting the partial depth image

comprises predicting, on the time axis, the partial depth image combined into one image using I, B, B, and P structures.

Predicting a partial color image on the time axis by applying I, P, and P prediction structures to the generated partial color image for each time instant; and

Predicting a partial depth image on the time axis by applying I and P prediction structures to the generated partial color image for each time instant.

The method of claim 15,

wherein predicting the partial depth image

comprises predicting the partial depth image for each time instant by applying I, P, and P prediction structures to the partial color image for each time instant.