KR100737808B1

KR100737808B1 - Method for efficiently compressing 2d multi-view images

Info

Publication number: KR100737808B1
Application number: KR1020050094449A
Authority: KR
Inventors: 최병호; 김태완; 김제우; 김용환; 혁 송; 배진우
Original assignee: 전자부품연구원
Priority date: 2005-10-07
Filing date: 2005-10-07
Publication date: 2007-07-10
Also published as: KR20070039290A

Abstract

The present invention relates to a method of compressing multiview images from a two-dimensional camera array. The method comprises: generating an accumulated disparity image that contains the disparity information of objects with respect to a reference image of the multiview images; estimating a global disparity from the multiview images; synthesizing an accumulated texture image for each viewpoint by adjusting the reference image according to the estimated global disparity; and, using the accumulated disparity image and the accumulated texture image, simultaneously performing spatial-axis prediction from the original image of each viewpoint and time-axis prediction from a previously input image of the same viewpoint. According to the invention, prediction along both the time axis and the spatial axis reduces the amount of data needed to transmit multiview images, and the hidden (occluded) image regions seen from each camera's viewpoint can be predicted accurately.

Multiview, Disparity (parallax), Depth, Compression, Prediction, Synthesis, Dense disparity, View

Description

Multi-view image compression method of two-dimensional structure {METHOD FOR EFFICIENTLY COMPRESSING 2D MULTI-VIEW IMAGES}

FIG. 1 is a conceptual diagram of a multiview prediction method according to the prior art.

FIG. 2 illustrates the triangulation used to perform spatial prediction from two or more camera inputs in a multiview image compression method according to an embodiment of the present invention.

FIG. 3 is a conceptual diagram of a method of predicting the current disparity vector (dense disparity) from the vectors of adjacent blocks within an object in the multiview image compression method.

FIG. 4 is an overall flowchart of a multiview image compression method according to an embodiment of the present invention.

FIGS. 5 and 6 conceptually illustrate the multiview image compression method according to the flowchart of FIG. 4.

The present invention relates to a method that, when multiview images are compressed for transmission, compresses them along both the spatial (inter-view) axis and the time axis using images synthesized and reconstructed from the images of a two-dimensional camera array, thereby reducing the amount of image data compared with existing compression methods.

Recently, with the arrival of the era of high-definition interactive multimedia and in response to viewers' demand for richer visual information, ISO/IEC MPEG has been working to standardize technology for receiving images from a multiview camera system and compressing, transmitting, and reconstructing them for the production of 3D stereoscopic video. Accordingly, many companies, universities, and research institutes are conducting related research.

The multiview image compression algorithms proposed so far include: the cascade method, which compresses each view directly using prediction along the time axis only; methods that predict each image both along the time axis and from the inputs of adjacent cameras; methods that predict unidirectionally along the time axis and from adjacent cameras; and wavelet-based algorithms that synthesize the images through high-pass and low-pass filters.

For example, Korean Patent Application No. 10-2003-0073241, "Adaptive multiplexing/demultiplexing apparatus and method for three-dimensional multiview multimedia processing" (hereinafter "Prior Art 1"), discloses a method of compressing and reconstructing the inputs from several cameras in cascade fashion using multiplexing/demultiplexing (Mux/Demux).

That is, according to Prior Art 1, images are received from several cameras, and a preprocessor corrects parameters and the like for differences in camera characteristics. A multiview video compressor then removes as much redundancy as possible from the images of each viewpoint and compresses them into several video elementary streams (ES), which are fed to a multiview video multiplexer that combines the independent streams into a single multiplexed stream. At the receiver, a multiview video demultiplexer demultiplexes the multiplexed multiview video and feeds each video stream to the video decoder for the corresponding viewpoint inside the receiver's multiview video decompressor, which reconstructs each image compressed by the compressor. A multiview video synthesizer then generates intermediate images between the reconstructed multiview images and outputs them to the display.

Thus the cascade method, which predicts and compresses images along the time axis, is the most basic way to compress and reconstruct images from multiview cameras, and the size of the compressed data is proportional to the number of cameras. Compression efficiency therefore does not improve at all. Moreover, even if the compressed data is repackaged into a single video stream, the decoder and encoder must be reconfigured according to the stream structure. This is not an algorithm for improving compression efficiency but a Mux/Demux arrangement built by adapting an existing compression standard.

As an improvement on the cascade method, Korean Patent Application No. 10-2003-0002116, "Apparatus and method for compressing/reconstructing multiview images" (hereinafter "Prior Art 2"), proposes predicting both from the images of neighboring cameras and along the time axis, as shown in FIG. 1.

According to the spatio-temporal prediction method of Prior Art 2, a center-image encoder estimates and compensates the motion of the input center image while encoding it to generate a center-image data stream, and also provides the reconstructed center image as a reference image. A left-image encoder estimates and compensates the motion of the input left image and, referring to the reconstructed center image provided by the center-image encoder as a reference image, performs disparity estimation and disparity compensation while encoding the left image to generate a left-image data stream. Likewise, a right-image encoder estimates and compensates the motion of the input right image and, referring to the reconstructed center image as a reference image, performs disparity estimation and disparity compensation while encoding the right image to generate a right-image data stream.

For example, as shown in FIG. 1, the center-image encoder encodes the center image of the multiview images as an I picture and then repeatedly encodes a B picture, a B picture, and a P picture; the center images reconstructed during the B, B, and P encoding are provided to the left-image and right-image encoders. The left-image and right-image encoders each encode their input as an I picture and then repeatedly encode P pictures while referring to the reconstructed center image provided by the center-image encoder as a reference image.

Thus the spatio-temporal prediction method predicts each video stream temporally and, where the images overlap, also predicts from other cameras to raise efficiency. With time-axis prediction alone, the overlap between images and the disparity caused by the camera arrangement leave regions at both edges of an image that do not appear in the images from other cameras. Efficiency for these regions can be raised by spatial prediction that takes the camera arrangement into account. Compared with time-axis-only prediction, spatio-temporal prediction performs somewhat better.

However, actual prediction does not find the perceptually correct position; it selects the block with the smallest error under a comparison function. Efficiency is therefore not very high in practice, and because each camera has different characteristics, the proportion of blocks actually predicted from other cameras is low. In addition, because prediction between images ignores three-dimensional rotation, spatial prediction between cameras contains many errors. As for the approach that passes the sequential images through high-pass and low-pass filters, downsamples them, and then reconstructs them, the filtered data has already lost much of the original image data, so the quality of the reconstructed image is not very high.

To solve these problems, an object of the present invention is to provide a multiview image compression method that reduces the amount of data needed to transmit multiview images through prediction along the time axis and the spatial axis, and that can accurately predict the hidden image regions seen from each camera's viewpoint.

To achieve this object, according to one aspect of the present invention there is provided a method of compressing multiview images, comprising: generating an accumulated disparity image that contains the disparity information of objects with respect to a reference image of the multiview images; estimating a global disparity from the multiview images; synthesizing an accumulated texture image for each viewpoint by adjusting the reference image according to the estimated global disparity; and a spatial-axis and time-axis prediction step of simultaneously performing, using the accumulated disparity image and the accumulated texture image, spatial-axis prediction from the original image of each viewpoint and time-axis prediction from a previously input image of the same viewpoint.

Preferably, the spatial-axis and time-axis prediction step comprises: predicting the image of each viewpoint on the spatial axis by compensating for object rotation in the accumulated texture image according to the disparity information of the accumulated disparity image; and a time-axis prediction step of performing time-axis prediction from the original image of each viewpoint using the rotation-compensated accumulated texture image.

In addition, the step of generating the accumulated disparity image may generate it by object-based disparity estimation. When the multiview images include depth information, the depth information may be converted into the disparity information; otherwise, the disparity information of the objects may be estimated by triangulation.

According to another aspect of the present invention, there is provided a computer-readable recording medium on which instructions for performing each step of the above multiview image compression method are recorded.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings.

FIG. 2 conceptually illustrates the triangulation used to perform spatial prediction from two or more camera inputs in a multiview image compression method according to an embodiment of the present invention.

Unlike prediction between images taken by the same camera a short time apart, spatial prediction between cameras must take three-dimensional geometry into account. That is, inter-view prediction must include rotation, not just simple translation. When two cameras capture the same object simultaneously, the object as seen by each camera is not merely the background and object shifted; by triangulation, it appears rotated. Simple spatial prediction therefore cannot be accurate or efficient.

In FIG. 2, b denotes the baseline, that is, the distance between the cameras. Human eyes typically have a baseline of about 6.5 cm. xl and xr are the coordinates, in the images captured by the two cameras, of a single point in real space. Cx is the image center, which is related to the image size. The depth Z in real space (the distance from the real focal point to the image plane) can be expressed as follows.

$$Z = f - \frac{f \times b}{x_l - x_r} \qquad \text{(Equation 1)}$$

where f denotes the focal length.
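
The depth-to-disparity conversion described later for step S430 relies on this relation, so it may help to see it solved for the disparity. The rearrangement below takes the equation exactly as printed; the symbol d for the disparity x_l - x_r is introduced here only for illustration and does not appear in the original text.

```latex
\begin{aligned}
Z &= f - \frac{f\,b}{d}, \qquad d = x_l - x_r \\
\frac{f\,b}{d} &= f - Z \\
d &= \frac{f\,b}{f - Z} \qquad (Z \neq f,\ d \neq 0)
\end{aligned}
```

The sign of d depends on the coordinate conventions of FIG. 2, which is not reproduced here, so it should be checked against that figure before use.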

Thus, when two different cameras look at the same point on an object, the images projected onto the cameras differ by a rotation rather than a simple translation, so disparity estimation must account for rotation.

For this purpose, the present invention uses an accumulated dense disparity image (hereinafter "accumulated disparity image"). A disparity image not only gives the disparity value at every pixel but also carries information about the rotation of objects. The accumulated disparity image is obtained by object-based disparity estimation. Rather than simply searching for the vector with the smallest error, this method predicts under the assumption that adjacent vectors within an object have similar magnitudes, so the disparity vectors within an object take similar values and the correct vector can be found.

FIG. 3 shows an algorithm for predicting the current disparity vector (dense disparity) from the vectors of adjacent blocks within an object in the multiview image compression method.

First, to obtain object-based disparity vectors, the inside and outside of an object must be distinguished; the technique applied here is object extraction using neighboring blocks. Since vectors within an object are similar, the method does not simply look for the vector with the smallest error but looks for a block whose vector is similar to those of the neighboring blocks and whose error is also small.

The vector of the current block is determined from the vectors already obtained for the neighboring blocks under the following conditions. The median of the neighboring blocks' vectors is taken as a candidate, and if the error obtained with this candidate vector is sufficiently small, it is accepted as the disparity vector. Otherwise, if a vector equal in magnitude to the neighboring blocks' vectors yields a sufficiently small error, that vector is accepted as the disparity vector. If neither condition is satisfied, a full search is used.

More specifically, under Condition 1 the vector of the current block (blue) is compared with the vectors of the neighboring blocks (purple): if the error of the prediction made with their median is at or below a threshold, the median is accepted as the vector of the current block. Under Condition 2, if all the neighboring vectors have the same value and the error obtained with that vector is at or below the threshold, it is accepted as the vector of the current block. Finally, under Condition 3, when neither Condition 1 nor Condition 2 is satisfied, the block is not regarded as belonging to the same object and a new search is performed.
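
The three conditions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: disparity vectors are reduced to horizontal integer shifts, the error measure is a sum of absolute differences (SAD), and the names `block_error`, `full_search`, `predict_disparity`, `t_median`, `t_uniform`, and `search_range` are all hypothetical.

```python
from statistics import median_low


def block_error(cur, ref, x, y, d, size):
    """SAD between the size x size block at (x, y) in `cur` and the block
    displaced horizontally by d pixels in `ref` (inf if it leaves the image)."""
    width = len(ref[0])
    if x + d < 0 or x + d + size > width:
        return float("inf")
    return sum(
        abs(cur[y + j][x + i] - ref[y + j][x + d + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, x, y, size, search_range):
    """Exhaustive search: the displacement with the smallest block error."""
    return min(
        range(-search_range, search_range + 1),
        key=lambda d: block_error(cur, ref, x, y, d, size),
    )


def predict_disparity(cur, ref, x, y, neighbours, size=2,
                      t_median=4, t_uniform=4, search_range=4):
    """Return (disparity, condition) for the block at (x, y)."""
    if neighbours:
        # Condition 1: try the median of the neighbouring vectors.
        candidate = median_low(neighbours)
        if block_error(cur, ref, x, y, candidate, size) <= t_median:
            return candidate, 1
        # Condition 2: all neighbours agree and that vector is good enough.
        # With t_uniform == t_median this branch cannot fire, because the
        # median of identical values was already tested above.
        if len(set(neighbours)) == 1:
            if block_error(cur, ref, x, y, neighbours[0], size) <= t_uniform:
                return neighbours[0], 2
    # Condition 3: treat the block as outside the object and search again.
    return full_search(cur, ref, x, y, size, search_range), 3


# Toy data: `cur` is `ref` shifted left by two pixels (edge value repeated).
ref = [[0, 10, 20, 30, 40, 50, 60, 70] for _ in range(2)]
cur = [[row[min(i + 2, 7)] for i in range(8)] for row in ref]

print(predict_disparity(cur, ref, 1, 0, [2, 2, 3]))  # median accepted
print(predict_disparity(cur, ref, 1, 0, [0, 0, 0]))  # falls back to search
```

The text gives a single threshold for Conditions 1 and 2; the sketch exposes two so that the second condition can be exercised separately, which is a design choice of the example rather than something the description states.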

Next, FIG. 4 is an overall flowchart of a multiview image compression method according to an embodiment of the present invention, and FIGS. 5 and 6 conceptually illustrate the multiview image compression method according to the flowchart of FIG. 4.

First, a global disparity is needed to build the accumulated reference image (accumulated image). Global disparity is the disparity caused by the arrangement of the cameras; for example, the left image shows more of the background on its left side than the reference image does. The accumulated reference image includes such regions, and the image from each camera is predicted using the information in the accumulated disparity image.

Referring to FIG. 4, a multiview video stream is first input (S410); the data may follow a parallel camera model or a convergent camera model. Next, it is determined whether the stream has a depth map (S420). If a depth map is available, the depth information is converted into disparity (S430), which can be obtained from the depth according to Equation 1 above. If no depth map is available in step S420, disparity estimation is performed (S440), for which the disparity prediction method of FIG. 3 can be used.

Following step S430 or S440, a frame of the accumulated dense disparity image, which contains the disparity information of objects with respect to the reference image, is constructed; this is the accumulated disparity frame (S450).

Next, for the texture (RGB) image, global disparity estimation is applied to the multiview video stream of step S410 to obtain the global disparity, which is used to adjust the image for each viewpoint on the basis of the reference image (image rectification, S460), thereby synthesizing an accumulated texture frame, or accumulated texture image (S470). For a 3 × 3 camera array with the center image as the reference image, a total of eight accumulated texture images are obtained, one for each surrounding viewpoint.
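
As a rough illustration of steps S460 and S470, the sketch below estimates one global shift between the reference image and another view and then shifts the reference accordingly. It is a simplification under several assumptions that the text does not state: images are small grayscale grids, the global disparity is a single horizontal integer shift chosen by minimum mean absolute difference, and the function names `global_disparity` and `shift_reference` are hypothetical.

```python
def global_disparity(reference, view, max_shift):
    """Return the shift s that minimises the mean absolute difference
    between view[y][x] and reference[y][x + s] over their overlap."""
    height, width = len(reference), len(reference[0])
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        total, count = 0, 0
        for y in range(height):
            for x in range(width):
                if 0 <= x + s < width:
                    total += abs(view[y][x] - reference[y][x + s])
                    count += 1
        if count and total / count < best_cost:
            best_shift, best_cost = s, total / count
    return best_shift


def shift_reference(reference, s, fill=None):
    """Shift the reference by s; positions the reference cannot supply
    (regions only the other view sees) are marked with `fill`."""
    width = len(reference[0])
    return [
        [row[x + s] if 0 <= x + s < width else fill for x in range(width)]
        for row in reference
    ]


# Toy data: `view` sees the reference one pixel further right and gains
# one new column (value 6) that the reference does not contain.
reference = [[1, 5, 2, 8, 3, 9, 4, 7] for _ in range(2)]
view = [[5, 2, 8, 3, 9, 4, 7, 6] for _ in range(2)]

s = global_disparity(reference, view, max_shift=3)
texture = shift_reference(reference, s)
print(s, texture[0])
```

The `None` entries mark where the shifted reference has no data. In a 3 × 3 array the same idea would have to be applied with a vertical component as well, once per surrounding viewpoint.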

Finally, using the accumulated disparity frame and the accumulated texture frame generated with the global disparity, time-axis prediction (the prediction method of existing MPEG) and inter-view (spatial) prediction are performed for each camera input (S480).

More specifically, as shown in FIG. 5, the disparity information of objects contained in the accumulated disparity frame is used to compensate for object rotation in the accumulated texture frame, and the T-th segment frame of each viewpoint's image is thereby predicted on the spatial axis. At the same time, on the time axis, the T-th segment frame is predicted for each viewpoint by motion compensation from an already compressed image of an earlier time, namely the (T-1)-th frame. That is, as shown in FIG. 6, the prediction of the T-th segment frame of each viewpoint reflects the results of both spatial-axis and time-axis prediction. As many predicted images are produced as there are input images.
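
The text does not spell out how the spatial-axis and time-axis results are combined for the T-th frame. One plausible reading, sketched below purely as an assumption, is a per-block mode decision: each block takes whichever predictor (the rotation-compensated texture frame or the motion-compensated (T-1)-th frame) has the smaller SAD against the current image. The function name `predict_frame` and the 2 × 2 block size are hypothetical.

```python
def predict_frame(current, spatial_pred, temporal_pred, block=2):
    """Build a prediction of `current` block by block, choosing the
    spatial or temporal predictor with the smaller SAD for each block.
    Returns (prediction, modes), where modes maps the top-left corner
    (row, col) of each block to "spatial" or "temporal"."""
    height, width = len(current), len(current[0])
    prediction = [[0] * width for _ in range(height)]
    modes = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            coords = [
                (y, x)
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ]
            cost_s = sum(abs(current[y][x] - spatial_pred[y][x]) for y, x in coords)
            cost_t = sum(abs(current[y][x] - temporal_pred[y][x]) for y, x in coords)
            # Ties go to the temporal predictor.
            use_spatial = cost_s < cost_t
            source = spatial_pred if use_spatial else temporal_pred
            for y, x in coords:
                prediction[y][x] = source[y][x]
            modes[(by, bx)] = "spatial" if use_spatial else "temporal"
    return prediction, modes


# Toy data: the left block matches the spatial predictor, the right block
# matches the temporal predictor.
current = [[10, 10, 50, 50], [10, 10, 50, 50]]
spatial_pred = [[10, 10, 0, 0], [10, 10, 0, 0]]
temporal_pred = [[0, 0, 50, 50], [0, 0, 50, 50]]

prediction, modes = predict_frame(current, spatial_pred, temporal_pred)
print(modes)
```

An encoder built this way would transmit the chosen mode and the residual `current - prediction` per block; other combinations, such as averaging the two predictors, are equally compatible with the description.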

Meanwhile, for a 3 × 3 camera array as in FIG. 6, if the surrounding images are predicted directly from the center reference image, the occlusion area grows, and because an object appears rotated when seen from another view, it may not be predicted accurately. In the preferred embodiment, therefore, the accumulated texture image is predicted using disparity compensation based on the object rotation information obtained from the disparity information, together with viewpoint adjustment based on the global disparity, and each view is then predicted from it.

Thus, according to the preferred embodiment, the global disparity and the accumulated dense disparity image are used to synthesize, from the camera inputs, the accumulated texture image that serves as the accumulated reference image. Spatial-axis prediction from the original image is performed with it, and at the same time time-axis prediction is performed from the already compressed image of an earlier time (frame).

If simple spatial prediction is performed from the center reference image to each camera input, the disparity between cameras and the disparity of objects close to the cameras become very large, and errors arise when predicting the unseen regions caused by the global disparity; as a result, image quality drops and computation increases when the whole image is predicted. If instead the image from each camera is predicted using the accumulated image predicted from the global disparity and the accumulated dense disparity image, the reduced disparity lowers the computation needed for prediction, and because the accumulated image contains hidden regions that the reference image lacks, prediction is more accurate and prediction error decreases.

Preferred embodiments of the present invention have been described above, but they are merely illustrative, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. The scope of protection of the present invention should therefore be determined by the following claims.

As described above, according to the present invention, prediction between the accumulated image and each camera input reduces the data and the network share needed to transmit multiview images, and prediction error decreases because hidden regions absent from the reference image can be predicted accurately. Furthermore, because the accumulated disparity image makes image modeling possible, it enables various applications such as object extraction, 3D rendering, and blue-screen effects.

Claims

1. A method of compressing multiview images, comprising:

generating an accumulated disparity image that contains disparity information of an object with respect to a reference image of the multiview images;

estimating a global disparity from the multiview images;

synthesizing an accumulated texture image of each viewpoint by adjusting the reference image according to the estimated global disparity; and

a spatial-axis and time-axis prediction step of simultaneously performing, using the accumulated disparity image and the accumulated texture image, spatial-axis prediction from the original image of each viewpoint and time-axis prediction from a previously input image of the same viewpoint.

2. The method of claim 1, wherein the spatial-axis and time-axis prediction step compensates for object rotation in the accumulated texture image according to the disparity information of the accumulated disparity image to predict the image of each viewpoint on the spatial axis, and at the same time performs time-axis prediction from the previously input image of the same viewpoint.

3. The method of claim 1 or 2, wherein the step of generating the accumulated disparity image generates the accumulated disparity image by object-based disparity estimation.

4. The method of claim 3, wherein the object-based disparity estimation comprises extracting an object by finding a block that is similar to its neighboring blocks and has a small error.

5. The multiview image compression method wherein, when the multiview images include depth information, the depth information is converted into the disparity information.

6. The method of claim 5, wherein, when the multiview images do not include depth information, the step of generating the accumulated disparity image estimates the disparity information of the object by triangulation.

7. A computer-readable recording medium having recorded thereon instructions for performing each step of the multiview image compression method according to claim 1.