KR20220071935A

KR20220071935A - Method and Apparatus for Deriving High-Resolution Depth Video Using Optical Flow

Info

Publication number: KR20220071935A
Application number: KR1020210162409A
Authority: KR
Inventors: 방건; 강정원; 김수웅; 배성준; 이진호; 이하현; 임성창; 김민혁; 강다현
Original assignee: 한국전자통신연구원; 한국과학기술원
Priority date: 2020-11-24
Filing date: 2021-11-23
Publication date: 2022-05-31

Abstract

The present disclosure relates to a method and apparatus for deriving a high-resolution depth image using an optical flow. According to an embodiment of the present disclosure, the method for deriving a depth image using an optical flow comprises: estimating an optical flow between a first viewpoint image and a second viewpoint image; generating a ray direction for each pixel of the first viewpoint image based on the estimated optical flow; and deriving a depth image of the first viewpoint image using the generated ray direction, wherein the ray direction is expressed as a direction vector in a 3D space, and the depth image may be derived based on parameters of a camera that has captured the first viewpoint image. According to the present disclosure, it is possible to efficiently and quickly derive a high-resolution depth image using an optical flow and provide an optical flow-based 3D coordinate estimation scheme.

Description

Method and Apparatus for Deriving High-Resolution Depth Video Using Optical Flow

The present disclosure relates to a method and apparatus for estimating a high-resolution depth image using optical flow.

Recently, the world has been changing rapidly under the strong wave of the Fourth Industrial Revolution. The change that began with so-called smartization is innovatively transforming every field. Owing to the Fourth Industrial Revolution, we now live in an era of complete digitalization and artificial intelligence, in which devices can be operated and data collected anytime and anywhere through mobile and Internet connectivity. Core technologies leading the era of the Fourth Industrial Revolution include augmented reality and virtual reality.

Augmented reality (AR) is a system that augments the real-world experience by overlaying additional virtual information on real-world information. Virtual reality (VR) refers to a technology that uses artificial techniques to provide experiences or environments that are difficult or impossible to obtain in the real world, stimulating the five human senses so that they are experienced as if real. Recently, VR technology has been developing into a future computing environment technology that moves between the virtual and the real and communicates through free action and the five senses, and it is expected to change people's lives and to influence the development of a wide range of industries.

Virtual reality services are evolving toward maximizing immersion and the sense of presence by generating omnidirectional video in live-action or computer graphics (CG) form and playing it back on head mounted displays (HMDs), smartphones, and the like.

Accordingly, AR/VR services increasingly require high-resolution, high-quality images. Virtual viewpoint images must be provided for the images within the virtual space offered by VR/AR, and this likewise requires rendering technology for high resolution and high quality. The quality of a virtual viewpoint image may depend on the quality of the depth image, and improving the quality of the depth image incurs considerable time and cost. Therefore, a technique for estimating an accurate depth image is required.

Depth images are mainly estimated by finding corresponding points between left and right images, obtaining the displacement between them, and converting that displacement into depth (distance). Block matching is the method generally used to find the corresponding points. With block matching, depth estimation may be inaccurate when the difference between corresponding pixels of the left and right images is not large.

In addition, stereo matching-based depth estimation, which can scan the corresponding points of each image at high speed, relies on the conventional pinhole camera model. Because the epipolar geometry constraint cannot be applied to an image distorted by a fisheye lens, using the conventional method on such an image may yield inaccurate depth estimates.

An object of the present disclosure is to provide an efficient and fast technique for estimating a high-resolution depth image using optical flow.

Another object of the present disclosure is to provide a depth estimation technique based on optical flow calculation, thereby enabling flexible and more accurate depth estimation.

Another object of the present disclosure is to estimate a more accurate depth image regardless of whether the captured image is distorted by the lens.

Other objects and advantages of the present disclosure can be understood from the following description and will become more apparent from the embodiments of the present disclosure. It will also be readily appreciated that the objects and advantages of the present disclosure can be realized by the means set forth in the claims and combinations thereof.

According to an embodiment of the present disclosure, a method for estimating a depth image using optical flow includes: estimating an optical flow between a first viewpoint image and a second viewpoint image; generating a ray direction for each pixel of the first viewpoint image based on the estimated optical flow; and estimating a depth image of the first viewpoint image using the generated ray direction, wherein the ray direction is expressed as a direction vector in a three-dimensional space, and the depth image may be estimated based on parameters of a camera that captured the first viewpoint image.

Meanwhile, the optical flow may be estimated based on deep learning.

Meanwhile, the camera parameters may include a center position of the camera.

Meanwhile, the estimating of the depth image may include obtaining coordinates in the three-dimensional space for each pixel of the first viewpoint image by using the generated ray direction.

Meanwhile, the coordinates may be obtained based on gradient descent using the minimum mean squared error (MSE).
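As a rough illustration of how a three-dimensional coordinate could be obtained by gradient descent on a mean squared error, the following sketch finds the point that minimizes the mean squared distance to a set of rays. The cost function, the helper name `fit_point_to_rays`, the learning rate, and the step count are assumptions made for illustration only; the disclosure does not specify them, and the rays are treated here as infinite lines.

```python
import math

def fit_point_to_rays(origins, directions, lr=0.1, steps=2000):
    """Gradient descent on the mean squared distance from a point to rays.

    origins, directions: lists of 3-tuples (ray origin, ray direction).
    Hypothetical helper; the cost function is an assumption, and each ray
    is treated as an infinite line.
    """
    # Normalize directions so the projection formula below is valid.
    dirs = []
    for d in directions:
        n = math.sqrt(sum(c * c for c in d))
        dirs.append(tuple(c / n for c in d))

    # Start from the mean of the ray origins.
    x = [sum(o[i] for o in origins) / len(origins) for i in range(3)]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for o, d in zip(origins, dirs):
            v = [x[i] - o[i] for i in range(3)]
            t = sum(v[i] * d[i] for i in range(3))
            # Residual: the component of v perpendicular to the ray.
            r = [v[i] - t * d[i] for i in range(3)]
            for i in range(3):
                # Gradient of the squared distance |r|^2 is 2 * r.
                grad[i] += 2.0 * r[i] / len(origins)
        x = [x[i] - lr * grad[i] for i in range(3)]
    return tuple(x)
```

With two rays that start at (-1, 0, 0) and (1, 0, 0) and both pass through (0, 0, 5), the iteration converges toward that intersection point. When the rays are nearly parallel the cost surface is flat along the viewing direction, so convergence is slow and a closed-form least-squares solve may be preferable in practice.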

According to an embodiment of the present disclosure, an apparatus for estimating a depth image using optical flow includes a memory that stores data and a processor that controls the memory, wherein the processor estimates an optical flow between a first viewpoint image and a second viewpoint image, generates a ray direction for each pixel of the first viewpoint image based on the estimated optical flow, and estimates a depth image of the first viewpoint image using the generated ray direction, wherein the ray direction is expressed as a direction vector in a three-dimensional space, and the depth image may be estimated based on parameters of a camera that captured the first viewpoint image.

According to an embodiment of the present disclosure, a program for estimating a depth image using optical flow, stored in a non-transitory computer-readable medium, causes a computer to perform: estimating an optical flow between a first viewpoint image and a second viewpoint image; generating a ray direction for each pixel of the first viewpoint image based on the estimated optical flow; and estimating a depth image of the first viewpoint image using the generated ray direction, wherein the ray direction is expressed as a direction vector in a three-dimensional space, and the depth image may be estimated based on parameters of a camera that captured the first viewpoint image.

According to the present disclosure, high-resolution depth image estimation can be performed efficiently and quickly using optical flow.

According to the present disclosure, an optical flow-based three-dimensional coordinate estimation technique using images of neighboring viewpoints can be provided.

According to the present disclosure, light rays generated from multiple optical flows can be optimized.

According to the present disclosure, a more accurate depth image is estimated regardless of whether the captured image is distorted by the lens, so that flexible and accurate high-resolution depth image estimation can be performed in a variety of lens-based environments.

The effects obtainable from the embodiments of the present disclosure are not limited to those mentioned above. Other effects not mentioned can be clearly derived and understood, from the following description of the embodiments, by those of ordinary skill in the art to which the technical configuration of the present disclosure is applied. That is, unintended effects resulting from implementing the configurations described in the present disclosure may also be derived from the embodiments by those of ordinary skill in the art.

FIG. 1 shows an example of generating a plenoptic point cloud.
FIG. 2 is a diagram for explaining a method of expressing attribute information assigned to a plenoptic point according to the position of a viewpoint.
FIG. 3 is a diagram for explaining a method of generating a multi-view image according to an embodiment of the present disclosure.
FIGS. 4 and 5 are diagrams for explaining the concept of the conventional epipolar geometry constraint.
FIG. 6 is a diagram for explaining a conventional method of estimating a depth image based on stereo matching.
FIG. 7 is a diagram for explaining a method of estimating a depth image using optical flow according to an embodiment of the present disclosure.
FIG. 8 is a diagram for explaining an apparatus for estimating a depth image using optical flow according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can easily implement them. However, the present disclosure may be embodied in several different forms and is not limited to the embodiments described herein.

In describing the embodiments of the present disclosure, a detailed description of a known configuration or function is omitted where it is determined that the description may obscure the gist of the present disclosure. In the drawings, parts unrelated to the description of the present disclosure are omitted, and similar parts are given similar reference numerals.

In the present disclosure, components are distinguished from one another in order to describe their respective features clearly; this does not necessarily mean that the components are separate. That is, a plurality of components may be integrated into a single hardware or software unit, or a single component may be distributed across a plurality of hardware or software units. Accordingly, even where not specifically mentioned, such integrated or distributed embodiments are also included in the scope of the present disclosure.

In the present disclosure, the components described in the various embodiments are not necessarily essential components, and some may be optional. Accordingly, an embodiment consisting of a subset of the components described in one embodiment is also included in the scope of the present disclosure. Embodiments that include other components in addition to those described in the various embodiments are likewise included in the scope of the present disclosure.

Meanwhile, in describing the embodiments of the present disclosure, where images for two or more viewpoints may be used, the viewpoints are divided into a current viewpoint and other viewpoints in order to distinguish one viewpoint from another. However, the current viewpoint is not fixed to one specific viewpoint; at any given moment, the target viewpoint for which a depth image is to be estimated may be referred to as the current viewpoint.

Hereinafter, the present disclosure is described in detail with reference to the drawings.

FIG. 1 shows an example of generating a plenoptic point cloud. A plenoptic point is a data form that includes one piece of geometry information, expressed as three-dimensional coordinates such as X, Y, and Z in a three-dimensional space, and N pieces of attribute information, such as RGB or YUV, obtained when the point is observed from N camera viewpoints. A plenoptic point cloud is a set of plenoptic points and may include at least one plenoptic point.

A plenoptic point cloud may be generated using the two-dimensional image and depth information for each of N input viewpoints. Here, the three-dimensional space may be defined as a space that contains all of the generated points.

Here, the 2D images may be images acquired by one or more cameras, such as multi-view images or light field images. A multi-view image may consist of images of a specific region captured simultaneously by multiple cameras with different viewpoints.

The defined three-dimensional space may then be divided into voxels of a predetermined unit, and the points within a voxel may be merged so as to have a single geometry information value. In addition, all color information held by the three-dimensional points may be stored, and the plenoptic point cloud may be generated using information about which viewpoint each point was generated from.

The color information of a viewpoint for which no three-dimensional point was generated may be inferred from the color information of other viewpoints included in the same voxel. For example, it may be derived using at least one of the average, maximum, and minimum of the color information of other viewpoints or points in the voxel. As another example, it may be derived from the color information of a viewpoint or point adjacent to the viewpoint or point in question.

Meanwhile, when a single voxel contains several three-dimensional points generated from one viewpoint, the plenoptic point cloud may be generated by storing at least one of the average, maximum, and minimum of the color values of that viewpoint.

As another example, when a single voxel contains several three-dimensional points generated from one viewpoint, the plenoptic point cloud may be generated by storing the color information of the point with the smallest depth information or the largest depth information.

Here, a voxel having one or more pieces of attribute information may be referred to as a multi-attribute voxel. That is, a multi-attribute voxel may mean a plenoptic point.
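The voxel merging described above can be sketched as follows. This is a minimal illustration under stated assumptions: colors are single scalar values, the average is used both for merging several points of one viewpoint and for inferring missing viewpoints, and the function name `build_plenoptic_points` and its input layout are hypothetical rather than taken from the disclosure.

```python
from collections import defaultdict

def build_plenoptic_points(points, voxel_size, num_views):
    """Merge per-view colored 3D points into multi-attribute voxels.

    points: iterable of (x, y, z, view_index, color) with a scalar color.
    Returns {voxel_index: [color for view 0, ..., color for view N-1]}.
    Hypothetical helper; averaging is one of the options the text allows.
    """
    # Collect the colors contributed by each view to each voxel.
    buckets = defaultdict(lambda: [[] for _ in range(num_views)])
    for x, y, z, view, color in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key][view].append(color)

    cloud = {}
    for key, per_view in buckets.items():
        # Several points from one view in one voxel: store their average.
        attrs = [sum(c) / len(c) if c else None for c in per_view]
        # Views with no point: infer from the average of the other views.
        known = [a for a in attrs if a is not None]
        fallback = sum(known) / len(known)
        cloud[key] = [fallback if a is None else a for a in attrs]
    return cloud
```

Each dictionary entry corresponds to one plenoptic point: the key plays the role of the single geometry value and the list holds one attribute per viewpoint. Swapping the average for a maximum, a minimum, or the color of the nearest-depth point would cover the other variants mentioned in the text.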

FIG. 2 is a diagram for explaining a method of expressing attribute information assigned to a plenoptic point according to the position of a viewpoint.

As in the example of FIG. 2, the attribute information assigned to a generated plenoptic point may be expressed in a two-dimensional form, according to the position of the viewpoint, using an angle and h. Here, the angle may be expressed as a real value, an integer value, or the like, and h may denote a magnitude expressed as a real value, an integer value, or the like. That is, the attribute information assigned to a plenoptic point may have a two-dimensional coordinate value expressed by the angle and h.

To efficiently remove redundancy among the attribute information of the plenoptic points of the N viewpoints expressed in two-dimensional form, attribute decomposition (analysis) may be performed. Here, attribute decomposition of a plenoptic point may mean performing, on the attribute information, at least one of prediction between attribute information, padding, extrapolation, interpolation, transform, and quantization of the attribute information.

Attribute decomposition may mean decomposing an attribute into view-dependent attribute information and view-independent attribute information. Here, view-dependent attribute information may mean attribute information unique to each viewpoint, and view-independent attribute information may mean attribute information shared by one or more viewpoints. For example, view-dependent attribute information may mean a specular component, and view-independent attribute information may mean a diffuse component.

FIG. 3 is a diagram for explaining a method of generating a multi-view image according to an embodiment of the present disclosure.

First, a three-dimensional object may be projected onto a two-dimensional image using the geometry information of the plenoptic point cloud (S300). A three-dimensional object (point) may also be projected onto the two-dimensional image using the occlusion pattern information of the plenoptic point cloud. For example, when the occlusion pattern information corresponding to a given viewpoint indicates meaningless attribute information, the point may not be projected to that viewpoint. The occlusion pattern information may have '1' as a first value and '0' as a second value. For example, when the occlusion pattern information of a given viewpoint has the first value, the pixel value at the position projected to that viewpoint may be determined as the attribute information of the point for that viewpoint. When the occlusion pattern information has the second value, the point may not be projected to that viewpoint.

When only one point is projected onto the current pixel, the attribute information of the current pixel may be determined using the attribute information of that viewpoint (point) (S305, S310). When multiple points are projected onto the current pixel, the attribute information of the pixel may be determined using the attribute information of the viewpoint (point) closest to the camera (S305, S320).

When no plenoptic point is projected onto the current pixel, the attribute information of N neighboring pixels of the current pixel may be referenced. In this case, the attribute information of the current pixel may be determined using the attribute information of the neighboring pixel closest to the camera (S315, S330). Here, N may be a natural number greater than 0, for example 4 or 8.

In addition, when no plenoptic point is projected onto the current pixel and the N neighboring pixels of the current pixel have no attribute information or no projected point, hole filling may be performed on the current sample using an interpolation method with an NxN mask on the two-dimensional image (S315, S340). Here, N may be a natural number greater than 0, for example 5.
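The per-pixel decisions of FIG. 3 can be sketched roughly as follows. The sketch assumes that the points have already been projected to integer pixel positions, that each point carries one scalar attribute and one occlusion flag per viewpoint, and that four neighbors are used; the function name `render_view` and the input layout are hypothetical. The final NxN interpolation (S340) is left out, so pixels with no valid neighbor stay `None`.

```python
def render_view(projected, width, height, view):
    """Fill a 2D image for one view from already-projected plenoptic points.

    projected: list of (u, v, depth, attrs, occlusion), where attrs[view] is
    the color for that view and occlusion[view] is 1 (valid) or 0 (skip).
    Hypothetical helper; S340 hole filling is not implemented here.
    """
    image = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    # S300 to S320: project points, keeping the point nearest to the camera.
    for u, v, z, attrs, occlusion in projected:
        if occlusion[view] == 0:
            continue  # the point is not projected to this view
        if z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = attrs[view]

    # S330: an empty pixel takes the attribute of its nearest-depth neighbor.
    filled = [row[:] for row in image]
    for v in range(height):
        for u in range(width):
            if image[v][u] is not None:
                continue
            best = None
            for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nu, nv = u + du, v + dv
                inside = 0 <= nu < width and 0 <= nv < height
                if inside and image[nv][nu] is not None:
                    if best is None or depth[nv][nu] < best[0]:
                        best = (depth[nv][nu], image[nv][nu])
            if best is not None:
                filled[v][u] = best[1]
    return filled
```

Neighbor filling reads from the image as it stood after projection, so a filled pixel never propagates further within the same pass.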

FIGS. 4 and 5 are diagrams for explaining the concept of the conventional epipolar geometry constraint. More specifically, FIG. 4 illustrates the epipolar geometry constraint that exists when images are captured from two or more viewpoints with a pinhole camera, and FIG. 5 illustrates the epipolar line used, under that constraint, to find the point in another image that may correspond to a point in one image.

As an example, the pinhole camera model may be applied to an ordinary camera that captures a perspective view on a two-dimensional plane. When images are captured from two or more viewpoints with such a pinhole camera, an epipolar geometry constraint may exist.

As an example, as shown in FIG. 4, an image for a left viewpoint (left view) and an image for a right viewpoint (right view) may exist, each with its own center point. Points lying on the line segment that connects the center point of the left view and a point X in three-dimensional space are, when viewed from the left viewpoint, represented by only a single point on the left image plane. That is, from the left viewpoint these points appear at the same position as one point and differ only in depth, whereas, when viewed from the right viewpoint, they fall at distinct positions on the image plane. On that image plane, however, they all lie on the same line.

The line on which such points lie is called an epipolar line. In other words, the epipolar geometry constraint refers to the situation in which points in three-dimensional space that satisfy a certain condition with respect to one viewpoint need only be searched for on an epipolar line (epiline) of the image plane of another viewpoint.

Referring to FIG. 5, the epipolar geometry constraint can be characterized by the following two properties. The point in the image plane of the other viewpoint that may correspond to the red point p on the image plane of one viewpoint is some point on epipolar line 1.

Likewise, the point in the first image plane that may correspond to the red point p' on the image plane of the other viewpoint is some point on epipolar line 2.
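For readers who want to see the constraint in numbers, the sketch below uses the standard pinhole formulation in which a fundamental matrix F maps a pixel p in one view to the coefficients of its epipolar line in the other view. The fundamental matrix is not named in this text, so its use here, the helper names, and the rectified-stereo example matrix are assumptions for illustration.

```python
def epipolar_line(F, p):
    """Return (a, b, c) of the line a*x + b*y + c = 0 in the other view.

    F: 3x3 fundamental matrix as nested lists; p: pixel (x, y) in this view.
    Hypothetical helper based on the standard relation l' = F p.
    """
    x, y = p
    h = (x, y, 1.0)  # homogeneous pixel coordinates
    return tuple(sum(F[r][k] * h[k] for k in range(3)) for r in range(3))

def on_line(line, q, tol=1e-9):
    """Check whether pixel q = (x, y) lies on the line (a, b, c)."""
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) < tol

# Fundamental matrix of an ideal rectified stereo pair (pure horizontal
# shift between the cameras): every epipolar line is an image row.
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]
```

For the rectified matrix above, the pixel (3, 4) maps to the line y = 4, so any candidate match must sit on the same row. This is the property that lets stereo matching restrict its search to one dimension, and it is the property that no longer holds as a straight line once a fisheye lens distorts the image.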

FIG. 6 is a diagram for explaining a conventional method of estimating a depth image based on stereo matching.

As an example, conventional stereo matching may be based on multi-view images; the description here assumes that images captured from at least two viewpoints are used. It may also be based on the process of searching for corresponding points along the epipolar line mentioned above.

The epipolar line generation process (S601) may include aligning the images captured from the two viewpoints and finding the epipolar line between those images using camera parameters and the like. Here, the camera parameters may refer to the parameters of the cameras that captured the images for the two viewpoints.

The process of searching for corresponding points based on the generated epipolar line (S602) may include a disparity estimation step, in which corresponding points between the current viewpoint image (first viewpoint image) and another viewpoint image (second viewpoint image) are found along the generated epipolar line and the disparity is estimated.

Thereafter, a depth image may be estimated based on the camera parameters using the obtained corresponding points (S603); that is, the estimated disparity may be converted into depth. The camera parameters used here may include the camera position, viewpoint-related values, and the like.
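Steps S602 and S603 can be illustrated with a small sketch for a rectified pinhole pair, where the epipolar line is an image row. The sum-of-absolute-differences cost, the window size, the function names, and the conversion depth = focal length x baseline / disparity are standard choices assumed here for illustration; the text itself only says that corresponding points are found and that disparity is converted to depth.

```python
def match_disparity(left_row, right_row, x, half=1, max_disp=8):
    """Find the disparity of pixel x on one scanline by SAD block matching.

    left_row, right_row: grey values of the same row in both views.
    The caller must keep x at least `half` pixels away from both row ends.
    Hypothetical helper; the SAD cost and window size are assumptions.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break  # the candidate block would leave the image
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half, half + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def disparity_to_depth(disparity, focal_px, baseline):
    """Rectified pinhole stereo: depth = focal length * baseline / disparity."""
    if disparity <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline / disparity
```

The block search also shows the weakness noted earlier in the background: when neighboring candidate blocks look alike, several disparities produce nearly the same cost and the chosen match becomes unreliable.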

Meanwhile, the stereo matching-based depth image estimation method of FIG. 6 is best suited to the conventional pinhole camera model. For example, when the method of FIG. 6 is applied to multi-view images distorted by a fisheye lens, the epipolar geometry constraint cannot be applied.

FIG. 7 is a diagram for explaining a depth image estimation method using an optical flow according to an embodiment of the present disclosure.

As an embodiment, the depth image estimation method of FIG. 7 may be performed by a depth image estimation apparatus or the like, including the apparatus of FIG. 8.

As an example, the depth image estimation method using an optical flow may use multi-view images. For example, images for two or more viewpoints may be used. The multi-view images used here may be acquired by a single camera scanning while it moves, or by one or more cameras.

First, the depth image estimation method using an optical flow may include a process of estimating an optical flow between the multi-view images (S701). As an example, this may include estimating the optical flow using images for two or more viewpoints: for images captured at two or more viewpoints, the optical flow between the current viewpoint (first viewpoint) and each of the other viewpoints may be estimated. For example, artificial intelligence may be used to obtain the optical flow, and the estimation may be based on deep learning, but the present disclosure is not limited thereto.

Thereafter, a process of generating ray directions based on the estimated optical flow (S702) may be performed. As an example, this process may include using the ray direction of each arbitrary pixel (for example, pixel i) of each viewpoint image, obtained from the optical flow, to generate a direction vector in 3D space for each such pixel. This may also be a process of generating, with the image of a specific viewpoint among the multi-view images as the reference, the ray directions for the images of the other viewpoints.
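As a non-limiting illustration of turning a pixel and its optical flow into a direction vector in 3D space, the sketch below assumes an equidistant fisheye projection (image radius r = f * theta) and a camera-to-world rotation matrix for the second camera. The projection model, the function names `pixel_to_ray` and `flow_to_ray`, and all parameters are assumptions for illustration; the present disclosure does not fix a particular camera model.

```python
import math


def pixel_to_ray(u, v, cx, cy, focal_px):
    """Unit ray in the camera frame for pixel (u, v), assuming an
    equidistant fisheye model: r = focal_px * theta."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)  # pixel on the optical axis
    theta = r / focal_px  # angle between the ray and the optical axis
    s = math.sin(theta) / r
    return (s * dx, s * dy, math.cos(theta))


def flow_to_ray(u, v, flow_uv, cx, cy, focal_px, rotation):
    """World-frame ray of the second camera through the pixel that the
    optical flow associates with pixel (u, v) of the first image.

    rotation is a 3x3 camera-to-world matrix given as nested sequences
    (assumed to be known from calibration)."""
    du, dv = flow_uv
    d = pixel_to_ray(u + du, v + dv, cx, cy, focal_px)
    return tuple(sum(rotation[i][j] * d[j] for j in range(3)) for i in range(3))


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(flow_to_ray(320.0, 240.0, (30.0, -40.0), 320.0, 240.0, 200.0, identity))
```

Because each ray is computed per pixel from the lens model, no epipolar line is needed, which is why this kind of formulation remains usable for distorted images.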

In addition, a step of estimating the depth image using the generated ray directions (S703) may be performed. The generated ray direction may be the direction vector for each pixel mentioned above. As an example, the depth image may be estimated based on the parameters of the camera that captured the images for the two or more viewpoints mentioned above. As an example, the camera parameters may include the position of an arbitrary n-th camera, the angle of view of the n-th camera, and the like, where the position of a camera may mean the center of the camera's field of view, that is, the center of the image acquired by the camera. The depth image may further be estimated using the direction vectors. That is, the direction vectors may be used to calculate an optimal solution for the 3D coordinates of an arbitrary pixel of the current viewpoint image. As an example, the optimal solution may be obtained by various methods, such as gradient descent that minimizes the mean square error (MSE). The depth value at the current pixel position is estimated from the optimal solution obtained in this way, that is, the coordinates P=(X,Y,Z) in 3D space, and repeating this for all pixels yields the depth image. As an example, the optimal coordinates may be obtained as follows.

[Equation 1]
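The following sketch is not Equation 1 itself; it is a non-limiting illustration of one way the optimal coordinates P=(X,Y,Z) could be found by gradient descent on an MSE objective. It assumes the objective is the mean squared perpendicular distance from P to each ray, where ray n starts at camera position c_n and has unit direction d_n. The objective, the step size, the iteration count, the use of Euclidean distance as the depth value, and all function names are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def mse_and_gradient(point, centers, directions):
    """Mean squared point-to-ray distance and its gradient w.r.t. point.

    directions must be unit vectors. For each ray, the residual is the
    component of (point - center) perpendicular to the ray."""
    n = len(centers)
    mse = 0.0
    grad = [0.0, 0.0, 0.0]
    for c, d in zip(centers, directions):
        diff = [point[k] - c[k] for k in range(3)]
        along = sum(diff[k] * d[k] for k in range(3))
        perp = [diff[k] - along * d[k] for k in range(3)]
        mse += sum(p * p for p in perp) / n
        for k in range(3):
            grad[k] += 2.0 * perp[k] / n
    return mse, grad


def triangulate(centers, directions, lr=0.5, steps=500):
    """Plain gradient descent started from the mean camera position."""
    directions = [normalize(d) for d in directions]
    point = [sum(c[k] for c in centers) / len(centers) for k in range(3)]
    for _ in range(steps):
        _, grad = mse_and_gradient(point, centers, directions)
        point = [point[k] - lr * grad[k] for k in range(3)]
    return tuple(point)


def depth_from_point(point, center):
    """Depth taken here as the distance from a camera position to P."""
    return math.sqrt(sum((point[k] - center[k]) ** 2 for k in range(3)))


# Synthetic check: three rays that all pass through a known 3D point.
target = (1.0, 1.0, 3.0)
centers = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
directions = [tuple(target[k] - c[k] for k in range(3)) for c in centers]
p = triangulate(centers, directions)
print(p, depth_from_point(p, centers[0]))
```

Since this objective is quadratic in P, a closed-form least-squares solution also exists; gradient descent is shown only because it is the method named in the description. The rays must not all be parallel, otherwise the minimizer is not unique.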

Meanwhile, the flowchart of FIG. 7 corresponds to one embodiment of the present disclosure, and the present disclosure is not limited thereto. Accordingly, the order of some steps may be changed, other steps may be added, some steps may be omitted, or steps may be performed simultaneously.

FIG. 8 is a diagram for explaining a depth image estimation apparatus using an optical flow according to an embodiment of the present disclosure.

As an example, the depth image estimation apparatus using an optical flow may perform the depth image estimation described above, including the depth image estimation method of FIG. 7.

As an example, the depth image estimation apparatus 801 using an optical flow according to the embodiment shown in FIG. 8 may include a memory 802 for storing data and a processor 803 for controlling the memory. Although not shown in FIG. 8, the apparatus may further include a transceiver for transmitting and receiving data to and from an external device, a user input/output interface, and the like, and the present disclosure is not limited thereto.

As an example, the processor may estimate an optical flow between multi-view images, generate a ray direction for each pixel of the current viewpoint image among the multi-view images based on the estimated optical flow, and estimate a depth image of a given viewpoint image using the generated ray directions, wherein the ray direction is expressed as a direction vector in 3D space and the depth image may be estimated based on the parameters of the camera that captured the first viewpoint image. As an example, the optical flow may be estimated based on deep learning. The camera parameters may include the center position of the camera that captured the multi-view images. When the depth image is estimated, the coordinates in 3D space of each pixel of the current viewpoint image may be obtained using the generated ray directions. The coordinates may be obtained by gradient descent that minimizes the MSE. This is as described above with reference to the other drawings.

According to the present disclosure, a more accurate depth image can be estimated regardless of whether the captured image is distorted by the lens, and the optical flow between images for a plurality of viewpoints can be calculated using the images of neighboring viewpoints centered on the image of the viewpoint of interest. In addition, since 3D coordinates can be estimated on this basis, the light rays generated from the optical flow can be optimized.

The various embodiments of the present disclosure do not list all possible combinations but are intended to describe representative aspects of the present disclosure, and the matters described in the various embodiments may be applied independently or in a combination of two or more.

In addition, the various embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. They may also be implemented by a combination of two or more pieces of software rather than a single one, and a single entity need not perform all of the processes. For example, a machine learning process that requires substantial computing power and a large amount of memory may be performed in the cloud or on a server, while the user side uses only the trained neural network; it is self-evident that the present disclosure is not limited thereto.

In the case of implementation by hardware, the embodiments may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, and the like. For example, the implementation may take various forms, including the general-purpose processor. It is apparent that the disclosure may also be implemented as hardware formed from a combination of one or more of these.

The scope of the present disclosure includes software or machine-executable instructions (for example, an operating system, an application, firmware, a program, and the like) that cause operations according to the methods of the various embodiments to be executed on a device or a computer, and a non-transitory computer-readable medium in which such software or instructions are stored and from which they are executable on a device or a computer.

Meanwhile, the content described with reference to each drawing is not limited to that drawing and, unless contradictory, may be applied in a mutually complementary manner.

The present disclosure described above may be variously substituted, modified, and changed by those of ordinary skill in the art to which it pertains without departing from its technical spirit, and the scope of the present disclosure is therefore not limited by the foregoing embodiments and the accompanying drawings.

Claims

A depth image estimation method using an optical flow, the method comprising:
estimating an optical flow between a first viewpoint image and a second viewpoint image;
generating a ray direction for each pixel of the first viewpoint image based on the estimated optical flow; and
estimating a depth image of the first viewpoint image by using the generated ray direction,
wherein the ray direction is expressed as a direction vector in a three-dimensional space, and the depth image is estimated based on a parameter of a camera that captured the first viewpoint image.