KR101812664B1

KR101812664B1 - Method and apparatus for extracting multi-view object with fractional boundaries

Info

Publication number: KR101812664B1
Application number: KR1020160142239A
Authority: KR
Inventors: 권인소; 김성흠
Original assignee: 한국과학기술원
Priority date: 2016-10-28
Filing date: 2016-10-28
Publication date: 2017-12-27

Abstract

An operating method of an object extraction apparatus operated by at least one processor includes: extracting a volume of interest containing an object in 3D space based on a plurality of view images that capture the object from multiple viewpoints; obtaining an initial foreground region containing the object in each view image by projecting the volume of interest onto each view image; extracting the foreground region of each view image, converged from the initial foreground region, while updating a geometric energy model initialized from the initial foreground region and an appearance energy model related to color and texture; and outputting a binary mask that separates the foreground and background of each view image. Accordingly, the present invention can precisely extract an alpha channel mask of the object.

Description

METHOD AND APPARATUS FOR EXTRACTING MULTI-VIEW OBJECT WITH FRACTIONAL BOUNDARIES

The present invention relates to multi-view object co-segmentation.

Multi-view object co-segmentation refers to simultaneously segmenting an arbitrary foreground object observed from two or more viewpoints. The multi-view object co-segmentation methods studied so far obtain an initial foreground using input camera pose information. They then build a probabilistic model of the color statistics of the initial foreground and background, and update the foreground and background regions based on that model. In this process, optimization using a Markov random field (MRF) is performed iteratively.

However, the multi-view object co-segmentation methods studied so far obtain the initial foreground region from camera pose information, and an initial foreground region obtained in this way is often insufficient. A probabilistic model built on such insufficient information therefore blurs the distinction between the foreground and background characteristics of the input images. In addition, depending on the object extraction application, these methods may require additional user intervention to reach the final result. In particular, they cannot obtain an accurate alpha channel mask of an object with a complex boundary such as fur.

The problem to be solved by the present invention is to provide a method and apparatus that co-segment a multi-view object by restricting the initial foreground region of each view image to a volume of interest in the space containing the object, and that extract an alpha channel mask of an object with fractional boundaries by searching, with dynamic windows, the uncertain regions not identified as foreground or background in the binary mask of each view, and matting them.

According to another embodiment of the present invention, an operating method of an object extraction apparatus operated by at least one processor includes: extracting a volume of interest containing an object in three-dimensional space based on a plurality of view images that capture the object from multiple viewpoints; projecting the volume of interest onto each view image to obtain an initial foreground region containing the object in each view image; extracting the foreground region of each view image, converged from the initial foreground region, while updating a geometric energy model initialized from the initial foreground region and an appearance energy model related to color and texture; and outputting a binary mask that separates the foreground and background of each view image.

The extracting of the volume of interest may extract, as the volume of interest, the three-dimensional space in which the frustums of the camera viewpoints associated with the respective view images overlap.
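The frustum-overlap idea can be illustrated with a minimal sketch. It assumes each camera is described by a 3x4 pinhole projection matrix and an image size, and it keeps only candidate 3D points that project inside every image with positive depth. The names `project`, `in_all_frustums`, and `volume_of_interest` are hypothetical and are not taken from the patent; near and far clipping planes are ignored.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]            # 3x4 projection matrix, row-major
Point3 = Tuple[float, float, float]
Camera = Tuple[Matrix, int, int]      # (P, image width, image height)


def project(P: Matrix, X: Point3) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a pinhole matrix; None if behind the camera."""
    x, y, z = X
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def in_all_frustums(X: Point3, cameras: List[Camera]) -> bool:
    """True if X projects inside the image bounds of every camera."""
    for P, width, height in cameras:
        uv = project(P, X)
        if uv is None:
            return False
        u, v = uv
        if not (0 <= u < width and 0 <= v < height):
            return False
    return True


def volume_of_interest(candidates: List[Point3], cameras: List[Camera]) -> List[Point3]:
    """Keep the candidate points that lie in the overlap of all frustums."""
    return [X for X in candidates if in_all_frustums(X, cameras)]
```

In practice the candidates could be a dense grid of voxels or sparse points from structure from motion; that choice is left open here.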

The geometric energy model may be modeled to measure inter-view geometric coherence, which indicates, with each view image binary-segmented into an inferred foreground and background, whether a region inferred as foreground or background in one view image is likewise inferred as foreground or background in the other view images to which it is warped.

The extracting of the foreground region of each view image may determine the inter-view geometric coherence by linking the superpixels of each view image to corresponding three-dimensional points in the current volume of interest based on the geometric information of each view image, where a superpixel is a unit into which an image is segmented.

The extracting of the foreground region of each view image may update the current volume of interest by adding three-dimensional points around points in the current volume of interest whose geometric coherence score is higher than a reference value and by deleting from the current volume of interest three-dimensional points that are not visible in the plurality of images, and may update the geometric energy model and the appearance energy model related to color and texture based on the updated volume of interest.

The appearance energy model related to texture may be modeled to compute energy using texture information shared across the plurality of view images, and the appearance energy model related to color may be modeled to compute energy using color information of the foreground and background inferred in each view image.

The appearance energy model related to texture may be modeled to compute a texture-related classification score for each superpixel based on a classifier trained on the texture characteristics of the foreground and background inferred from the plurality of view images, where a superpixel is a unit into which an image is segmented.

The operating method may further include: generating a trimap that divides the vicinity of the boundary of the binary mask into background, foreground, and an uncertain boundary region based on color information and geometric information around the boundary; and generating an alpha channel mask by matting the uncertain boundary region of the trimap.

According to another embodiment of the present invention, an operating method of an object extraction apparatus operated by at least one processor includes: receiving a binary mask of an input image; computing the entropy of the foreground and background at pixels sampled around the boundary of the binary mask to dynamically determine a local window size for each pixel; generating a trimap by dividing the region of each local window into background, foreground, and an uncertain boundary region; and generating an alpha channel mask by extracting alpha channel values of the trimap with a matting equation.

The dynamically determining of the local window size may compute the Kullback-Leibler divergence at the sampled pixels for a plurality of window sizes and select the window size at which the Kullback-Leibler divergence is maximized.
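A minimal sketch of this window-size selection follows, assuming a grayscale image with intensities in [0, 1), a binary mask of the same shape, and intensity histograms as the foreground and background distributions. The histogram binning, the smoothing constant, and the function names are illustrative assumptions, not details from the patent.

```python
import math
from typing import List, Optional, Sequence, Tuple


def histogram(values: Sequence[float], bins: int = 8, eps: float = 1e-6) -> List[float]:
    """Normalized histogram of values in [0, 1), smoothed so no bin is zero."""
    counts = [eps] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) of two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def best_window_size(image: List[List[float]], mask: List[List[int]],
                     center: Tuple[int, int], sizes: Sequence[int]) -> Optional[int]:
    """Return the window size whose foreground/background KL divergence is largest."""
    cy, cx = center
    best_size, best_kl = None, float("-inf")
    for size in sizes:
        r = size // 2
        fg, bg = [], []
        for y in range(max(0, cy - r), min(len(image), cy + r + 1)):
            for x in range(max(0, cx - r), min(len(image[0]), cx + r + 1)):
                (fg if mask[y][x] else bg).append(image[y][x])
        if not fg or not bg:
            continue  # window does not straddle the boundary
        kl = kl_divergence(histogram(fg), histogram(bg))
        if kl > best_kl:
            best_size, best_kl = size, kl
    return best_size
```

Calling `best_window_size` once per sampled boundary pixel yields a per-pixel window size; windows that contain only foreground or only background are skipped.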

The generating of the trimap may divide the region of each local window into background, foreground, and an uncertain boundary region using a Markov random field (MRF).

The operating method may further include obtaining the binary mask by co-segmenting the object contained in a plurality of view images, and the input image may be any one of the plurality of view images that capture the object from multiple viewpoints.

The matting equation may include a constraint for geometrically sharing the matting results of the plurality of view images.

According to yet another embodiment of the present invention, an object extraction apparatus operated by at least one processor includes: a co-segmentation unit that extracts a volume of interest containing an object in three-dimensional space based on a plurality of view images that capture the object from multiple viewpoints, and co-segments the foreground and background in the plurality of view images using a geometric constraint on the volume of interest and the color and texture information of the plurality of view images; and an extraction unit that receives from the co-segmentation unit a binary mask, which is the foreground/background segmentation result for each view image, searches around the boundary of the binary mask to generate the region requiring matting as a trimap, and generates an alpha channel mask by extracting alpha channel values of the trimap with a matting equation.

The co-segmentation unit may extract, as the volume of interest, the three-dimensional space in which the frustums of the camera viewpoints associated with the respective view images overlap, project the volume of interest onto each view image to obtain an initial foreground region containing the object in each view image, and extract the foreground region of each view image, converged from the initial foreground region, while updating a geometric energy model initialized from the initial foreground region and an appearance energy model related to color and texture.

The geometric energy model may be modeled to measure inter-view geometric coherence, which indicates, with each view image binary-segmented into an inferred foreground and background, whether a superpixel inferred as foreground or background in one view image is likewise inferred as foreground or background in the other view images to which it is warped. The appearance energy model related to texture may be modeled to compute a texture-related classification score for each superpixel based on a classifier trained on the texture characteristics of the foreground and background inferred from the plurality of view images, where a superpixel is a unit into which an image is segmented.

The extraction unit may compute the entropy of the foreground and background at pixels sampled around the boundary of the binary mask to dynamically determine a local window size for each pixel, and may generate the trimap by dividing the region of each local window into background, foreground, and an uncertain boundary region.

According to embodiments of the present invention, an alpha channel mask of an object with complex fractional boundaries can be extracted precisely. The object segmentation and alpha channel mask extraction methods according to embodiments of the present invention can be applied to advanced image processing, robotics, and computer vision and graphics applications to extract objects accurately.

FIG. 1 is a configuration diagram of an object extraction apparatus according to an embodiment of the present invention.
FIGS. 2 and 3 are diagrams illustrating a method of extracting an initial foreground region and a binary mask for each view image.
FIG. 4 is an example of an alpha channel mask extracted by an object extraction apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart of a multi-view object co-segmentation method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a geometric constraint according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a texture energy model according to an embodiment of the present invention.
FIG. 8 is a flowchart of an alpha channel mask extraction method according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram illustrating an alpha channel mask extraction method according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a method of detecting a matting region through local window optimization according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating the setting of an initial foreground region by a user according to an embodiment of the present invention.
FIG. 12 is an example of multi-view object extraction results according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit", "...er", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

In the following, the object extraction apparatus 100 is described as co-segmenting multi-view images to extract an alpha channel mask of an object, but it may also extract an alpha channel mask of an object contained in a single-view image.

FIG. 1 is a configuration diagram of an object extraction apparatus according to an embodiment of the present invention, FIGS. 2 and 3 are diagrams illustrating a method of extracting an initial foreground region and a binary mask for each view image, and FIG. 4 is an example of an alpha channel mask extracted by an object extraction apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the object extraction apparatus 100 co-segments an object with fractional boundaries in each view image, based on a plurality of view images that capture the object from multiple viewpoints, while sharing inter-view geometric information and texture information. The object extraction apparatus 100 extracts an alpha channel mask of the object by matting the uncertain boundary of the binary mask of each view image. The object extraction apparatus 100 can then apply the alpha channel mask to the captured image to extract only the object and replace the background in various ways. The alpha channel mask is a mask that renders the object region white and the background black so that the object in a color image can be separated from the background; it may also be called an alpha mask, an alpha channel, and so on.
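Applying an alpha channel mask to swap the background can be sketched with the standard compositing rule C = αF + (1 − α)B. The sketch below assumes images stored as nested lists of RGB tuples and uses the captured image itself as the foreground color F, which is a common simplification rather than something stated in the patent; the function name `composite` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def composite(foreground: Image, background: Image, alpha: List[List[float]]) -> Image:
    """Blend per pixel: alpha * foreground + (1 - alpha) * background."""
    out = []
    for f_row, b_row, a_row in zip(foreground, background, alpha):
        out.append([
            tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b))
            for f, b, a in zip(f_row, b_row, a_row)
        ])
    return out
```

Fractional alpha values along the boundary (for example 0.5) are what let fine structures such as fur blend smoothly into the new background, which a binary mask cannot do.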

The object extraction apparatus 100 is operated by at least one processor and includes an input unit 110, a multi-view object co-segmentation unit 130, and an alpha channel mask extraction unit 150, each operated by at least one processor. Each of the input unit 110, the multi-view object co-segmentation unit 130, and the alpha channel mask extraction unit 150 may be implemented physically or as a software functional block.

The input unit 110 receives a plurality of view images that capture an object from multiple viewpoints.

The multi-view object co-segmentation unit 130 extracts a volume of interest containing the object in three-dimensional space based on the plurality of view images that capture the object from multiple viewpoints. The multi-view object co-segmentation unit 130 may extract, as the volume of interest, the space in which the frustums of the camera viewpoints overlap. It may optimize the size of the volume of interest based on the initial structure of the foreground (three-dimensional points within the volume of interest) or a volume of interest input by the user (see FIG. 10). In particular, the initial foreground region for multi-view image co-segmentation is obtained under the assumption that the object is fully visible to all cameras. The camera pose information here may be obtained with a conventional multi-view reconstruction algorithm (structure from motion), and the multi-view object co-segmentation unit 130 may use the reconstructed three-dimensional points to further restrict the initial foreground region, which increases the accuracy of the first foreground/background model.

The multi-view object co-segmentation unit 130 projects the volume of interest onto each view image to obtain an initial foreground region containing the object. It then repeatedly updates the geometric model initialized from the initial foreground region of each view image and the appearance model related to color and texture, thereby obtaining a final foreground region that has converged to a specific region (the object region). The multi-view object co-segmentation unit 130 extracts a binary mask that divides each view image into foreground and background. Here, as in Equation 1, it models the geometric model and the appearance model each as an energy term of a Markov random field (MRF) and binary-segments the image into foreground and background through MRF optimization. The appearance model and the geometric model are designed to share the common color and texture of the multi-view images and the geometric assumptions, which makes it possible to obtain binary masks in which inter-view features are shared.

As in Equation 1, the MRF energy may consist of a data term (E_d) and a smoothness term (E_n). The data term (E_d) may be designed as an appearance term (E_a), which models the appearance of the foreground and of the background, and a geometric term (E_g), which is modeled to account for geometric constraints. The relative influence of E_a and E_g may be determined according to the reliability of the appearance model. The smoothness term may be designed as a color smoothness term (E_nc), which mainly uses only the color part of the appearance information, and a geometric smoothness term (E_ng), which uses the geometric information applied across all views. λ_nc and λ_ng are weights.
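The structure just described can be written out as a short sketch. Equation 1 itself is not reproduced in this text, so the additive form below is an assumption based only on the terms named above, and the weight symbol w (standing for the relative influence of E_a and E_g) is a placeholder rather than the patent's notation.

```latex
% Assumed structure only; w is a placeholder for the relative-influence weight.
\begin{aligned}
E(x) &= E_d(x) + E_n(x), \\
E_d(x) &= E_a(x) + w \, E_g(x), \\
E_n(x) &= \lambda_{nc} \, E_{nc}(x) + \lambda_{ng} \, E_{ng}(x),
\end{aligned}
```

Here x denotes the binary foreground/background labeling of the superpixels or pixels of a view image.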

Referring to FIG. 2, the multi-view object co-segmentation unit 130 projects the volume of interest onto each view image to obtain an initial foreground region 10 containing the object. Starting from the initial foreground region 10, it applies graph cut iteratively (iterative graph cut) to obtain the boundary 20 between foreground and background. The boundary 20 binary-segments the image into foreground and background.

Referring to FIG. 3, whether the input is (a) a plurality of view images acquired as the camera approaches the object, (b) a plurality of view images acquired as the camera moves around the object, or (c) a plurality of view images acquired as the camera passes the object at various distances, the multi-view object co-segmentation unit 130 can extract the space commonly visible from every viewpoint as the volume of interest (red convex volume). It then projects the volume of interest onto each view image to obtain an initial foreground region (blue) containing the object, and obtains the foreground region (green) converged to the object boundary.

In this way, the multi-view object co-segmentation unit 130 extracts the space commonly visible from every viewpoint as the volume of interest and derives the initial foreground region of each view image from it, so it places no restriction on camera placement or motion. That is, it does not necessarily require multiple cameras and can process images selected from those captured by a single moving camera. Compared with conventional multi-view object co-segmentation methods that obtain the initial foreground region from camera pose information, the present invention therefore has the advantage of being usable for extracting objects from multi-view images captured under a variety of conditions.

Referring again to FIG. 1, the alpha channel mask extraction unit 150 examines the boundary of the binary mask output by the multi-view object co-segmentation unit 130, detects the matting region that requires matting, and generates a trimap.
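To make the notion of a trimap concrete, here is a deliberately simplified sketch. It marks every pixel within a fixed radius of the mask boundary as unknown, which is a plain stand-in for the dynamic-window, MRF-based procedure the patent describes, not an implementation of it. The label constants and `make_trimap` are hypothetical names.

```python
from typing import List

BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2


def make_trimap(mask: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Label pixels near a foreground/background transition as UNKNOWN."""
    h, w = len(mask), len(mask[0])
    trimap = [[FOREGROUND if mask[y][x] else BACKGROUND for x in range(w)]
              for y in range(h)]
    for y in range(h):
        for x in range(w):
            label = mask[y][x]
            # A pixel is uncertain if any neighbor within `radius` has the other label.
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if mask[ny][nx] != label:
                        trimap[y][x] = UNKNOWN
    return trimap
```

A fixed radius treats smooth and furry stretches of the boundary alike; choosing the window per boundary pixel is what allows the unknown band to widen only where the boundary is complex.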

The alpha channel mask extraction unit 150 dynamically determines the shape and size of windows along the boundary of the object to segment the trimap, and then estimates the alpha channel value (alpha matte, α) of the object for each window to matte the boundary. The alpha channel mask extraction unit 150 can obtain the alpha channel value (α) through a matting equation such as Equation 2.

In Equation 2, L is the matting affinity matrix, a normalized similarity matrix. W is a diagonal matrix. m_c is a value obtained by converting the binary segmentation result into a trimap and acts as a hard constraint; m_g is a value for using the binary segmentation result obtained from the geometric model shared by all view images and acts as a soft constraint, and through m_g the matting results of the individual views are shared geometrically. λ_c and λ_g are weights.
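Since the matting equation is not reproduced in this text, the following is only a sketch of a quadratic objective that is consistent with the symbols defined above. Its exact form, and the use of the same diagonal matrix W in both constraint terms, are assumptions.

```latex
% Assumed quadratic form; not the patent's Equation 2 verbatim.
\alpha^{*} = \arg\min_{\alpha} \;
  \alpha^{\top} L \, \alpha
  + \lambda_c \, (\alpha - m_c)^{\top} W \, (\alpha - m_c)
  + \lambda_g \, (\alpha - m_g)^{\top} W \, (\alpha - m_g)
```

Under this assumed form, with L symmetric, setting the gradient to zero gives the sparse linear system

```latex
\left( L + (\lambda_c + \lambda_g) \, W \right) \alpha
  = W \left( \lambda_c \, m_c + \lambda_g \, m_g \right)
```

A large λ_c makes the trimap term behave as a hard constraint, while a smaller λ_g lets the geometry-derived term act as a soft constraint.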

Referring to FIG. 4, the alpha channel mask extraction unit 150 mattes the boundary of each view to obtain the alpha channel masks 30 and 31 of the respective views. Only the objects 40 and 41 of the respective views can then be extracted using the alpha channel masks.

FIG. 5 is a flowchart of a multi-view object co-segmentation method according to an embodiment of the present invention, FIG. 6 is a diagram illustrating a geometric constraint according to an embodiment of the present invention, and FIG. 7 is a diagram illustrating a texture energy model according to an embodiment of the present invention.

Referring to FIG. 5, the multi-view object co-segmentation unit 130 projects the volume of interest onto each view image to obtain an initial foreground region containing the object (S110). The volume of interest may be a convex space that contains the object and consists of three-dimensional points observed from all viewpoints.
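Step S110 can be sketched as projecting the 3D points of the volume of interest into a view and taking the convex hull of the projections as the initial foreground region, since a convex volume projects to a convex image region. The pinhole projection matrix, the use of Andrew's monotone chain for the hull, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def project_points(P: Matrix, points: List[Point3]) -> List[Point2]:
    """Project 3D points with a 3x4 matrix, skipping points behind the camera."""
    out = []
    for x, y, z in points:
        h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
        if h[2] > 0:
            out.append((h[0] / h[2], h[1] / h[2]))
    return out


def cross(o: Point2, a: Point2, b: Point2) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point2]) -> List[Point2]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point2] = []
    upper: List[Point2] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def initial_foreground_polygon(P: Matrix, volume: List[Point3]) -> List[Point2]:
    """Outline of the projected volume of interest in one view."""
    return convex_hull(project_points(P, volume))
```

Rasterizing the returned polygon gives the initial foreground mask for that view.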

The multi-view object co-segmentation unit 130 extracts features of each view image (S120). The features may include image vectors (I) and information on the superpixels (S) into which the image is segmented. A superpixel may be a set of pixels.

The multi-view object co-segmentation unit 130 determines whether the currently inferred foreground has converged, based on the geometric information of the foreground and background inferred in each view image and the appearance information related to color and texture (S130). The multi-view object co-segmentation unit 130 determines whether the MRF energy designed as in Equation 1 is at its maximum.

If the foreground has not converged, the multi-view object co-segmentation unit 130 updates the region inferred as foreground in each view image (S140). All data terms and smoothness terms of the MRF are recomputed each time the foreground/background is updated. The updated foreground gradually converges to the object region enclosed by the fractional boundary.

If the foreground has converged, the multi-view object co-segmentation unit 130 outputs a binary mask that separates the current foreground from the background (S150).
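
The control flow of steps S130 to S150 can be sketched as a simple loop. This is only an illustrative skeleton: the callables `update_step` and `energy`, the tolerance `tol`, and the stopping rule (the energy no longer changing between iterations) are assumptions, since the text states only that convergence is judged from the MRF energy of Equation 1.

```python
def cosegment(initial_labels, update_step, energy, max_iters=50, tol=1e-6):
    """Iterate foreground updates (S140) until the energy stops changing (S130)."""
    labels = initial_labels
    previous = energy(labels)
    for _ in range(max_iters):
        new_labels = update_step(labels)   # S140: re-infer foreground/background
        current = energy(new_labels)       # data and smoothness terms recomputed
        if abs(current - previous) <= tol: # S130: treated as converged (assumed rule)
            return new_labels              # S150: labels serve as the binary mask
        labels, previous = new_labels, current
    return labels
```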

The following describes in detail how the multi-view object co-segmentation unit 130, in steps S130 and S140, performs binary segmentation of an image based on an MRF that models geometric energy and appearance energy.

First, a method of computing and constraining the geometric energy of the MRF is described.

Referring to FIG. 6, in the current state in which each view image is binary-segmented into foreground and background, the multi-view object co-segmentation unit 130 measures inter-view geometric coherence, which indicates whether a region inferred as foreground/background in one view image is also inferred as foreground/background in the other view images. If a region inferred as foreground/background in one view image is inferred identically in the other view images, that region can be regarded as true foreground/background. To this end, based on the geometric information of each view image, the multi-view object co-segmentation unit 130 warps a point (a superpixel or pixel) inferred as foreground/background in one view image into each of the other view images and evaluates whether the warped point lies in a region that carries the same foreground/background label.
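
One way to make the coherence measurement concrete is to count, for a single three-dimensional point, the fraction of other views whose label at the warped location agrees with the reference view. The sketch below assumes this fraction as the score; the text does not give the exact formula, and the data layout (`masks`, `pixels`) and function name are hypothetical.

```python
def coherence_score(masks, pixels, ref_view):
    """Fraction of other views that agree with the reference view's label.

    masks[i][v][u] is 1 for foreground and 0 for background in view i.
    pixels[i] is the (u, v) pixel that one 3D point projects to in view i,
    or None when the point is not visible in that view.
    """
    if pixels[ref_view] is None:
        return 0.0
    u, v = pixels[ref_view]
    ref_label = masks[ref_view][v][u]
    agree = total = 0
    for i, px in enumerate(pixels):
        if i == ref_view or px is None:
            continue
        total += 1
        if masks[i][px[1]][px[0]] == ref_label:
            agree += 1
    return agree / total if total else 0.0
```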

The multi-view object co-segmentation unit 130 models the geometric energy (E_g) using the inter-view geometric coherence score.

The geometric energy for the background may be modeled as in Equation 3, and the geometric energy for the foreground may be modeled by a corresponding expression (the energy symbols and the foreground expression appear as images in the original publication and are not reproduced in this text). In Equation 3, k is a pixel, and en_b is the maximum value of the energy linked to the background, which may be 10, for example. V_k is the inter-view geometric coherence score for pixel k, and V_th is a threshold. The geometric energy for smoothing (Eng) is modeled to take into account the inter-view links of superpixels.

The multi-view object co-segmentation unit 130 may update the geometric energy model by further restricting the volume of interest. That is, the multi-view object co-segmentation unit 130 samples three-dimensional points in the volume of interest and, based on the geometric information, links each three-dimensional point to superpixels of the view images. Referring to FIG. 6(a), point A is linked to superpixels of two view images (that is, it is visible from at least two viewpoints), so it is kept as a valid three-dimensional point in the volume of interest. Point B, which is occluded and not visible, is excluded from the volume of interest. Point C has no inter-view geometric link in the MRF optimization but is referenced when judging the geometric coherence of superpixels. The multi-view object co-segmentation unit 130 may use a graph model such as that of FIG. 6(b) for the geometric modeling.

The multi-view object co-segmentation unit 130 may assign more three-dimensional points to regions with a high coherence score in order to verify geometric coherence, and may delete three-dimensional points that are not visible from any viewpoint. The foreground region of each view image is then updated with the corrected volume of interest.
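
The treatment of points A, B, and C and the densification around high-coherence points can be sketched as follows. The function name, the reference score `score_ref`, the offset `step`, and the six-neighbor sampling pattern are illustrative assumptions; the text specifies only that points are added where coherence is high and that invisible points are removed.

```python
def refine_volume(points, visible_views, scores, score_ref=0.8, step=0.01):
    """Split sampled 3D points into linked and reference-only sets.

    visible_views[i] lists the views in which points[i] is visible.
    A point seen in two or more views (like A) gets inter-view links, a point
    seen in exactly one view (like C) is kept for reference only, and a point
    seen in no view (like B) is dropped.
    """
    linked, reference = [], []
    for X, views, score in zip(points, visible_views, scores):
        n = len(views)
        if n == 0:
            continue  # occluded everywhere: removed from the volume of interest
        (linked if n >= 2 else reference).append(X)
        if n >= 2 and score > score_ref:
            # Densify around coherent points; the visibility of these new
            # samples would be re-checked in the next iteration (assumed).
            x, y, z = X
            linked.extend([
                (x + step, y, z), (x - step, y, z),
                (x, y + step, z), (x, y - step, z),
                (x, y, z + step), (x, y, z - step),
            ])
    return linked, reference
```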

Next, a method of computing the appearance energy of the MRF is described. In the current state in which each view image is binary-segmented into foreground and background, the multi-view object co-segmentation unit 130 obtains the appearance energy (E_a) related to color and texture, as in Equation 4. Here, the multi-view object co-segmentation unit 130 computes the color energy (E_c) using the color information of the regions inferred as foreground and background in each view image, and computes the texture energy (E_t) using texture information shared across all views. A known energy model may be used for the color energy (E_c). The texture energy model of the present invention can compensate for the ambiguity of the initial color energy model.

A method of modeling the texture energy (E_t) is described with reference to FIG. 7.

Referring to FIG. 7(a), the multi-view object co-segmentation unit 130 generates a feature vector that represents the texture of each superpixel. The multi-view object co-segmentation unit 130 trains a classifier on the regions previously inferred as foreground/background. The multi-view object co-segmentation unit 130 then uses the trained classifier to assign a classification score to each superpixel of the regions currently inferred as foreground/background, and models the texture energy (E_t) using these texture-related classification scores. The texture energy model provides an important cue when a foreground object in which a particular pattern recurs is observed against a background with relatively little distinctive texture. The feature vector may be a descriptor that expresses color and texture characteristics simultaneously.
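
The train-then-score cycle described above can be illustrated with a deliberately simple classifier. The text does not name the classifier or the descriptor, so the nearest-centroid model, the distance-ratio score, and all function names below are assumptions chosen only to keep the sketch self-contained.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def train(features, labels):
    """Fit one centroid per class from the previous foreground (1) / background (0) labels."""
    fg = [f for f, l in zip(features, labels) if l == 1]
    bg = [f for f, l in zip(features, labels) if l == 0]
    return centroid(fg), centroid(bg)


def texture_score(model, feature):
    """Score in [0, 1]; values above 0.5 mean the superpixel looks more like foreground."""
    c_fg, c_bg = model
    d_fg = math.dist(feature, c_fg)
    d_bg = math.dist(feature, c_bg)
    total = d_fg + d_bg
    return 0.5 if total == 0 else d_bg / total
```

In each iteration the model would be retrained on the current labels from all views, which is what lets the texture cue be shared across viewpoints.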

Referring to FIG. 7(b), in the MRF each superpixel is linked to its neighboring superpixels, and superpixels that are not neighbors may also be linked according to the similarity of their color and texture.

Compared with single-view image segmentation, multi-view object co-segmentation places more weight on recall than on precision. Because multi-view object co-segmentation applies geometric constraints repeatedly, noise in a segmentation result that lowers precision can be suppressed by the other views. In contrast, when an important part such as a hand or a foot is not inferred as foreground and recall drops, the structure of the method propagates that omission to all views.

In the early stage of the MRF iterations, texture information contributes more than color information or geometric constraints to detecting the approximate foreground region with high recall. As the precision of the inferred foreground increases, however, the geometric constraints may become more effective. Accordingly, the weights of the texture, color, and geometric information may be adjusted as the MRF optimization is repeated.

Unlike texture or geometric information, color information is modeled separately for the foreground and background of each view, so adaptive weights may be applied to improve the stability of the algorithm. That is, large changes are allowed early in the MRF iterations, but as the boundary of the foreground object emerges, the weight of the geometric energy term is gradually increased so that only changes as small as possible occur across all views.
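
The weight adjustment described in the two preceding paragraphs could, for example, follow a linear schedule. The text gives no numeric weights, so the values 0.6, 0.3, and 0.1 and the linear interpolation below are purely illustrative assumptions.

```python
def energy_weights(iteration, total_iterations):
    """Shift weight from texture (early) to geometry (late); color stays fixed here."""
    t = iteration / max(total_iterations - 1, 1)
    t = min(max(t, 0.0), 1.0)
    return {
        "texture": 0.6 * (1 - t) + 0.1 * t,   # high recall cue in early iterations
        "color": 0.3,                         # per-view model, held constant in this sketch
        "geometry": 0.1 * (1 - t) + 0.6 * t,  # dominates once boundaries emerge
    }
```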

After the superpixel-level segmentation is complete, the multi-view object co-segmentation unit 130 may perform pixel-level segmentation at least once. In this case, the multi-view object co-segmentation unit 130 may obtain the final binary segmentation result (binary mask) of each view image using only the color information and the geometric constraints.

FIG. 8 is a flowchart of an alpha channel mask extraction method according to an embodiment of the present invention, and FIG. 9 is an example illustrating the alpha channel mask extraction method according to an embodiment of the present invention.

Referring to FIG. 8, the alpha channel mask extraction unit 150 examines the boundary of each binary mask output from the multi-view object co-segmentation unit 130 (see 50 and 51 in FIG. 9(a)), detects the matting regions that require matting, and generates an initial trimap (S210). Referring to FIG. 9(b), the trimap 60 marks definite background (black), definite foreground (white), and the uncertain boundary region (gray). The alpha channel mask extraction unit 150 detects the matting region by determining, at boundary pixels, the window size that maximizes the entropy between foreground and background (that is, the window size that separates foreground from background effectively). As shown in FIG. 9, the size of the window (yellow box) differs depending on the sample point on the object. To limit the amount of computation, the alpha channel mask extraction unit 150 may sample only the necessary boundary pixels and determine the shape and size of each window dynamically by examining the entropy. The entropy between foreground and background may be measured by the Kullback-Leibler divergence. In a window whose size maximizes the Kullback-Leibler divergence, the two probability distributions of the mixed foreground/background samples overlap the least, so the two groups are easily separable.
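
The window-size selection can be sketched for a single boundary pixel as follows. This is a simplified illustration under stated assumptions: square windows, grayscale intensities in [0, 1], 8-bin histograms with a small smoothing constant, and the divergence taken from the foreground distribution to the background distribution. None of these choices is specified in the text, and the function names are hypothetical.

```python
import math


def histogram(values, bins=8, lo=0.0, hi=1.0, eps=1e-6):
    """Normalized histogram; eps keeps every bin positive so the logarithm is defined."""
    counts = [eps] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def best_window(image, mask, cx, cy, half_sizes):
    """Return (half_size, divergence) of the square window that best separates fg from bg."""
    best = None
    for h in half_sizes:
        fg, bg = [], []
        for y in range(max(cy - h, 0), min(cy + h + 1, len(image))):
            for x in range(max(cx - h, 0), min(cx + h + 1, len(image[0]))):
                (fg if mask[y][x] else bg).append(image[y][x])
        if not fg or not bg:
            continue  # a window must contain both classes to be compared
        d = kl_divergence(histogram(fg), histogram(bg))
        if best is None or d > best[1]:
            best = (h, d)
    return best
```

A small window is selected where the local foreground and background colors are cleanly distinct, while a larger one wins where more context is needed to tell them apart.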

The alpha channel mask extraction unit 150 optimizes the trimap so that definite background, definite foreground, and the uncertain region are optimally separated in each local window along the boundary (S220). Here, the alpha channel mask extraction unit 150 uses color information and geometric information in an MRF (α-expansion) to segment each local window into definite background, definite foreground, and the uncertain boundary region (trimap segmentation), thereby generating an optimized trimap. The key assumption is that the foreground and background samples of a local region can be represented linearly; under this assumption, the samples farthest from both groups are reclassified as samples that require matting. Referring to FIG. 9(b), after the trimap optimization, the initial trimap may be corrected by a post-processing operation that enlarges the uncertain region.

The alpha channel mask extraction unit 150 mattes the uncertain region of the trimap using a matting equation that includes a constraint (m_g) for preserving the binary segmentation result obtained from the geometric model shared by the view images (S230). The alpha channel mask extraction unit 150 may obtain the alpha channel values by solving the matting equation of Equation 2.

The alpha channel mask extraction unit 150 outputs an alpha channel mask in which the matting region of the trimap is finely filled in (S240). FIG. 9(c) shows the alpha channel mask 70 extracted from the trimap. After matting, the foreground and background color values may be inferred more accurately, depending on the application.
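
To illustrate how an alpha channel mask is typically used once it has been extracted, the sketch below applies the standard compositing relation C = αF + (1 − α)B to place an extracted object over a new background. This is general matting practice rather than a step recited in the text, and the function name and data layout are assumptions.

```python
def composite(alpha, foreground, background):
    """Blend per pixel: C = alpha * F + (1 - alpha) * B.

    alpha is a 2D list of values in [0, 1]; foreground and background are
    2D lists of color tuples with the same dimensions.
    """
    out = []
    for a_row, f_row, b_row in zip(alpha, foreground, background):
        out.append([
            tuple(a * f + (1 - a) * b for f, b in zip(fp, bp))
            for a, fp, bp in zip(a_row, f_row, b_row)
        ])
    return out
```

Fractional alpha values along the boundary are what allow fine structures such as hair or fur to blend into the new background without a hard edge.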

FIG. 10 illustrates a method of detecting the matting region through local window optimization according to an embodiment of the present invention.

Referring to FIG. 10(a), the alpha channel mask extraction unit 150 sets up windows of various shapes and sizes. Referring to FIG. 10(b), the alpha channel mask extraction unit 150 computes the Kullback-Leibler divergence for the various windows along a constrained path and then determines the window shape and size that maximize the Kullback-Leibler divergence. Here, the alpha channel mask extraction unit 150 may vary the window shape and size smoothly along the path.
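
One possible way to obtain window sizes that maximize the divergence while varying smoothly along the path is dynamic programming over the sampled boundary pixels. The text does not state how smoothness is enforced, so the Viterbi-style formulation, the absolute-difference penalty, and the parameter `smooth_weight` below are assumptions made for illustration.

```python
def smooth_window_sizes(kl_scores, sizes, smooth_weight=1.0):
    """Choose one window size per path position.

    kl_scores[i][j] is the divergence at path position i for sizes[j].
    Maximizes the summed divergence minus smooth_weight times the summed
    absolute size change between consecutive positions.
    """
    m = len(sizes)
    best = list(kl_scores[0])
    back = []
    for i in range(1, len(kl_scores)):
        new_best, pointers = [], []
        for j in range(m):
            cand = [best[k] - smooth_weight * abs(sizes[j] - sizes[k]) for k in range(m)]
            k_star = max(range(m), key=cand.__getitem__)
            new_best.append(cand[k_star] + kl_scores[i][j])
            pointers.append(k_star)
        best = new_best
        back.append(pointers)
    # Trace the best sequence back from the last path position.
    j = max(range(m), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [sizes[j] for j in path]
```

Setting `smooth_weight` to zero reduces this to picking the best window independently at every sample, which makes the effect of the smoothness term easy to compare.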

FIG. 11 illustrates the setting of an initial foreground region by a user according to an embodiment of the present invention.

Referring to FIG. 11, a co-segmentation method that shares information across views can propagate a user's manual input effectively. For example, FIG. 11(a) shows the result when co-segmentation converges from an ambiguous initial foreground region.

To redefine the foreground, as in FIG. 11(b), the object extraction apparatus 100 receives from the user a box 80 containing the object in a single image from a particular viewpoint (for example, the front view). The region of interest in space is further restricted according to the given box 80, and this information is propagated to all views, so that the MRF data terms are recomputed according to the updated region. As a result, the segmentation converges to accurate foregrounds in all views, as in FIG. 11(c) and (d).
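
The restriction of the volume of interest by a user-supplied box can be sketched as a filter on the sampled three-dimensional points. The 3x4 camera matrix `P` of the annotated view, the box format `(u_min, v_min, u_max, v_max)`, and the function name are assumptions for illustration.

```python
def restrict_volume_to_box(points, P, box):
    """Keep only 3D points whose projection in the annotated view falls inside the box."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
        if h[2] <= 0:
            continue  # behind the camera of the annotated view
        u, v = h[0] / h[2], h[1] / h[2]
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept
```

Because the remaining points are shared by every view, re-projecting them updates the foreground estimate in all views from a single annotation.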

FIG. 12 shows example results of multi-view object extraction according to an embodiment of the present invention.

Referring to FIG. 12, the datasets used for verification are Sofa, Teddy Bear, Tree, Lion 1, and Lion 2. Tree and Lion 1 each include 12 view images captured with a turntable (SNRT400-Solutionix) and two cameras (Flea2-PGR) with a resolution of 1280×960. Lion 2 includes images captured while moving a DSLR camera (Canon DSLR Mark3) with a resolution of 5184×3456. Eight of the calibrated camera inputs were selected and the algorithm of the present invention was applied to them; the remaining views are solved by sharing the accurate geometric model.

In FIG. 12, the first column shows the input image of each object from the front view. The second column shows the binary segmentation result, which is an intermediate output of the object extraction apparatus 100. The third column shows the alpha channel mask estimated from that intermediate output and the object extracted using it. As FIG. 12 shows, the present invention obtains co-segmentation results that share information among the multi-view images and can extract the alpha channel mask of an object with very fine detail.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

1. A method of operating an object extraction apparatus operated by at least one processor, the method comprising:
extracting a volume of interest containing an object in a three-dimensional space, based on a plurality of view images obtained by capturing the object from multiple viewpoints;
projecting the volume of interest onto each view image to obtain an initial foreground region containing the object in each view image;
extracting a foreground region of each view image converged from the initial foreground region, while updating a geometric energy model initialized from the initial foreground region and an appearance energy model related to color and texture; and
outputting a binary mask that separates the foreground and background of each view image,
wherein the extracting of the volume of interest comprises
extracting, as the volume of interest, a three-dimensional space in which frustums of the camera views associated with each of the plurality of view images overlap.

2. (Deleted)

3. The method of claim 1,
wherein the geometric energy model
is modeled to measure inter-view geometric coherence, which indicates whether a point inferred as foreground or background in one view image is also inferred as foreground or background in the other view images into which it is warped.

4. The method of claim 3,
wherein the extracting of the foreground region of each view image comprises
linking superpixels of each view image to corresponding three-dimensional points in the current volume of interest, based on the geometric information of each view image, to determine the inter-view geometric coherence,
wherein a superpixel is a unit into which an image is divided.

5. The method of claim 4,
wherein the extracting of the foreground region of each view image comprises:
updating the current volume of interest by adding three-dimensional points around a point whose geometric coherence score is higher than a reference value in the current volume of interest and by deleting, from the current volume of interest, three-dimensional points that are not visible in the plurality of view images; and
updating the geometric energy model and the appearance energy model related to color and texture based on the updated volume of interest.

6. The method of claim 1,
wherein the appearance energy model related to texture
is modeled to compute energy using texture information shared across the plurality of view images, and
the appearance energy model related to color
is modeled to compute energy using color information of the foreground and background inferred in each view image.

7. The method of claim 6,
wherein the appearance energy model related to texture
computes a classification score related to the texture of each superpixel based on a classifier trained on texture features of the foreground and background inferred in the plurality of view images,
wherein a superpixel is a unit into which an image is divided.

8. The method of claim 1, further comprising:
generating a trimap that divides the area around the boundary of the binary mask into background, foreground, and an uncertain boundary region, based on color information and geometric information around the boundary; and
matting the uncertain boundary region of the trimap to generate an alpha channel mask.

9. A method of operating an object extraction apparatus operated by at least one processor, the method comprising:
extracting a volume of interest in which frustums of a plurality of view images capturing an object from multiple viewpoints overlap;
projecting the volume of interest onto an arbitrary view image among the plurality of view images to separate the arbitrary view image into a background and a foreground containing the object;
obtaining a binary mask of the separated arbitrary view image;
computing the entropy between foreground and background at pixels sampled around the boundary of the binary mask to dynamically determine a local window size for each pixel;
dividing the region of each local window into background, foreground, and an uncertain boundary region to generate a trimap; and
extracting alpha channel values of the trimap using a matting equation to generate an alpha channel mask.

10. The method of claim 9,
wherein the dynamically determining of the local window size comprises
computing the Kullback-Leibler divergence for a plurality of window sizes at the sampled pixels and selecting the window size that maximizes the Kullback-Leibler divergence.

11. The method of claim 9,
wherein the generating of the trimap comprises
dividing the region of each local window into background, foreground, and an uncertain boundary region using a Markov random field (MRF).

12. (Deleted)

13. The method of claim 9,
wherein the matting equation
includes a constraint for geometrically sharing the matting results of the plurality of view images.

14. An object extraction apparatus operated by at least one processor, comprising:
a multi-view object co-segmentation unit that extracts a volume of interest containing an object in a three-dimensional space, based on a plurality of view images obtained by capturing the object from multiple viewpoints, and co-segments the foreground and background in the plurality of view images using geometric constraints on the volume of interest and color and texture information of the plurality of view images; and
an alpha channel mask extraction unit that receives, from the multi-view object co-segmentation unit, a binary mask that is the foreground/background segmentation result for each view image, generates a trimap for the area around the boundary of the binary mask, and extracts alpha channel values of the trimap to generate an alpha channel mask,
wherein the multi-view object co-segmentation unit
extracts, as the volume of interest, a three-dimensional space in which frustums of the camera views associated with each of the plurality of view images overlap.

15. The object extraction apparatus of claim 14,
wherein the multi-view object co-segmentation unit
projects the volume of interest onto each view image to obtain an initial foreground region containing the object in each view image, and extracts a foreground region of each view image converged from the initial foreground region while updating a geometric energy model initialized from the initial foreground region and an appearance energy model related to color and texture.

16. The object extraction apparatus of claim 15,
wherein the geometric energy model
is modeled to measure inter-view geometric coherence, which indicates whether a superpixel inferred as foreground or background in one view image is also inferred as foreground or background in the other view images into which it is warped, and
the appearance energy model related to texture
computes a classification score related to the texture of each superpixel based on a classifier trained on texture features of the foreground and background inferred in the plurality of view images,
wherein a superpixel is a unit into which an image is divided.

17. The object extraction apparatus of claim 14,
wherein the alpha channel mask extraction unit
computes the entropy between foreground and background at pixels sampled around the boundary of the binary mask to dynamically determine the size of a local window for each pixel, and divides the region of each local window into background, foreground, and an uncertain boundary region to generate the trimap.