KR102034024B1

KR102034024B1 - Scene flow learning method for scene flow estimation and scene flow estimation method

Info

Publication number: KR102034024B1
Application number: KR1020170146416A
Authority: KR
Inventors: 최영식; 황금별
Original assignee: 한국항공대학교산학협력단
Priority date: 2017-11-06
Filing date: 2017-11-06
Publication date: 2019-10-18
Also published as: KR20190051114A

Abstract

The present disclosure relates to a scene flow estimation method based on a deep neural network structure. The scene flow estimation method may include: (a) extracting a visual disparity descriptor at a first target resolution by sequentially down-sampling the visual disparity descriptor, with a first-view image and a second-view image at time t as input, and extracting a visual optical flow descriptor at a second target resolution by sequentially down-sampling the visual optical flow descriptor, with the first-view image at time t and a first-view image at time (t-1), which precedes time t, as input; and (b) estimating disparity probability information at the first target resolution using a matching degree for a disparity correspondence candidate group at the first target resolution, calculated in consideration of the extracted visual disparity descriptor, and estimating optical flow probability information at the second target resolution using a matching degree for an optical flow correspondence candidate group at the second target resolution, calculated in consideration of the extracted visual optical flow descriptor.

Description

Scene flow learning method and scene flow estimation method for scene flow estimation

The present disclosure relates to a scene flow learning method and a scene flow estimation method for scene flow estimation, and more particularly to a deep neural network structure and a learning method for accurate real-time scene flow estimation.

Scene flow estimation is one of the key technologies required by mobile robots such as autonomously flying drones and self-driving cars. Here, scene flow refers to disparity and optical flow. Disparity is the horizontal displacement between two corresponding pixels caused by the difference in viewpoint between stereo images, and it represents the distance between the camera and an object. Optical flow is the horizontal and vertical displacement between two corresponding pixels caused by the time difference between consecutive images, and it represents the motion of the camera and of objects.

Because stereo camera sensors are inexpensive and can extract various kinds of visual information in addition to scene flow information, scene flow estimation techniques based on stereo images are widely used in related fields.

When scene flow is estimated with a stereo camera, correspondence matching, which searches for corresponding pixels between two images, is mainly used. The sum of absolute differences (SAD), which is easy to implement and computationally cheap, is generally used as the operation that expresses the matching degree between two pixels. However, it is difficult to find accurate correspondences with SAD alone, so various post-processing steps must be applied after SAD, and this makes real-time scene flow estimation difficult.
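As an illustration of the SAD-based correspondence matching described above, the following minimal sketch searches one scanline of a rectified stereo pair for the disparity with the lowest SAD cost. The function names, the window size, and the toy pixel rows are assumptions made for this example; they do not come from the disclosure, and no post-processing is included.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equally sized 1-D patches."""
    return sum(abs(a - b) for a, b in zip(patch_a, patch_b))


def sad_disparity(left_row, right_row, x, half_window, max_disparity):
    """Return the disparity d that minimises SAD between the left patch
    centred at x and the right patch centred at x - d (one scanline)."""
    lo, hi = x - half_window, x + half_window + 1
    reference = left_row[lo:hi]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # the candidate window would leave the image
        cost = sad(reference, right_row[lo - d:hi - d])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy scanlines (hypothetical data): the bright feature sits 3 pixels
# further left in the right image than in the left image.
left_row = [10, 10, 10, 10, 10, 80, 90, 80, 10, 10, 10, 10]
right_row = [10, 10, 80, 90, 80, 10, 10, 10, 10, 10, 10, 10]
disparity = sad_disparity(left_row, right_row, x=6, half_window=1, max_disparity=5)
```

On real images the minimum-cost candidate is often ambiguous in textureless or repetitive regions, which is why the post-processing mentioned above becomes necessary.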

Recently, methods that estimate scene flow using deep learning, which has produced breakthrough results in artificial intelligence, have been proposed. Among them, deep learning matching methods, which perform comparably to or better than existing methods, have attracted considerable attention. A deep learning matching method learns the matching degree between image patches with a convolutional neural network (CNN) and then searches for pixel correspondences. However, because conventional deep learning matching methods search for pixel correspondences in the image at its given resolution, they run too slowly, unlike the computationally cheap SAD, and this makes it difficult to estimate scene flow in real time.

The background art of the present disclosure is described in the paper [Jure Zbontar, Yann LeCun, "Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches", Journal of Machine Learning Research 17 (2016) 1-32].

The paper proposes a neural network structure and a learning method that use a CNN to extract features from stereo images and to measure the similarity between the two images. However, the proposed structure and learning method require ground-truth depth, measure similarity slowly, and need several stages of post-processing, so execution takes a long time and their applicability to real-time scene flow estimation is limited.

The present disclosure is intended to solve the problems of the prior art described above. Its object is to provide a deep neural network structure for scene flow estimation, together with a scene flow learning method and a scene flow estimation method based on that structure, which effectively improve the correspondence search speed of deep learning matching while also improving correspondence accuracy, the main strength of the deep learning approach.

However, the technical problems to be solved by the embodiments of the present disclosure are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a scene flow estimation method based on a deep neural network structure according to a first aspect of the present disclosure may include: (a) extracting a visual disparity descriptor at a first target resolution by sequentially down-sampling the visual disparity descriptor, with a first-view image and a second-view image at time t as input, and extracting a visual optical flow descriptor at a second target resolution by sequentially down-sampling the visual optical flow descriptor, with the first-view image at time t and a first-view image at time (t-1), which precedes time t, as input; and (b) estimating disparity probability information at the first target resolution using a matching degree for a disparity correspondence candidate group at the first target resolution, calculated in consideration of the extracted visual disparity descriptor, and estimating optical flow probability information at the second target resolution using a matching degree for an optical flow correspondence candidate group at the second target resolution, calculated in consideration of the extracted visual optical flow descriptor.

As a technical means for achieving the above technical object, a scene flow estimation method based on a deep neural network structure according to a second aspect of the present disclosure may include: (a) extracting a visual disparity descriptor at a target resolution by sequentially down-sampling the visual disparity descriptor, with a first-view image and a second-view image at time t as input; and (b) estimating disparity probability information at the target resolution using a matching degree for a disparity correspondence candidate group at the target resolution, calculated in consideration of the extracted visual disparity descriptor.
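The two steps of the disparity-only aspect above can be sketched as follows. This is only an illustrative sketch under stated assumptions: pairwise averaging stands in for a learned down-sampling layer, the matching degree is taken to be an inner product of descriptor vectors, and the probability information is obtained with a softmax. None of these choices, nor the function names and toy descriptors, is specified in this section of the disclosure.

```python
import math


def downsample(descriptors):
    """Halve the resolution of a row of descriptor vectors by averaging
    neighbouring pairs. Assumption: a stand-in for one learned CNN layer."""
    return [
        [(a + b) / 2.0 for a, b in zip(descriptors[i], descriptors[i + 1])]
        for i in range(0, len(descriptors) - 1, 2)
    ]


def matching_degree(u, v):
    """Assumed matching degree: inner product of two descriptor vectors."""
    return sum(a * b for a, b in zip(u, v))


def disparity_probabilities(left, right, x, max_disparity):
    """Softmax (an assumption) over the matching degrees of candidates x - d."""
    candidates = [d for d in range(max_disparity + 1) if x - d >= 0]
    scores = [matching_degree(left[x], right[x - d]) for d in candidates]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return {d: w / total for d, w in zip(candidates, weights)}


# Hypothetical full-resolution descriptors; the right row is the left row
# shifted by 2 positions, i.e. 1 position after one down-sampling step.
left_full = [[0, 0], [0, 0], [0, 0], [0, 0], [4, 0], [4, 0], [0, 4], [0, 4]]
right_full = [[0, 0], [0, 0], [4, 0], [4, 0], [0, 4], [0, 4], [0, 0], [0, 0]]
left_low, right_low = downsample(left_full), downsample(right_full)
probs = disparity_probabilities(left_low, right_low, x=2, max_disparity=2)
best = max(probs, key=probs.get)
```

Here the candidate group is searched only at the reduced resolution, so a disparity of 1 at that resolution corresponds to 2 pixels in the original row.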

As a technical means for achieving the above technical object, a scene flow estimation method based on a deep neural network structure according to a third aspect of the present disclosure may include: (a) extracting a visual optical flow descriptor at a target resolution by sequentially down-sampling the visual optical flow descriptor, with a first-view image at time t and a first-view image at time (t-1), which precedes time t, as input; and (b) estimating optical flow probability information at the target resolution using a matching degree for an optical flow correspondence candidate group at the target resolution, calculated in consideration of the extracted visual optical flow descriptor.

As a technical means for achieving the above technical object, a scene flow learning method for scene flow estimation based on a deep neural network structure according to a fourth aspect of the present disclosure may include: (a) calculating, as learning information, target disparity probability information corresponding to a first-view image and a second-view image at time t for a down-sampling layer that is any one of a plurality of layers; (b) extracting a visual disparity descriptor by applying a multi-layer CNN included in the down-sampling layer; and (c) estimating disparity probability information at the down-sampling layer using a matching degree for a disparity correspondence candidate group at the down-sampling layer, calculated in consideration of the extracted visual disparity descriptor, and then learning so that the difference from the target disparity probability information for the down-sampling layer is minimized. Steps (a) to (c) may be performed in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially down-sampled to a target resolution, with the first-view image and the second-view image at time t as input.

As a technical means for achieving the above technical object, a scene flow learning method for scene flow estimation based on a deep neural network structure according to a fifth aspect of the present disclosure may include: (a) calculating target optical flow probability information corresponding to a first-view image at time t and a first-view image at time (t-1), which precedes time t, for a down-sampling layer that is any one of a plurality of layers; (b) extracting a visual optical flow descriptor by applying a multi-layer CNN included in the down-sampling layer; and (c) estimating optical flow probability information at the down-sampling layer using a matching degree for an optical flow correspondence candidate group at the down-sampling layer, calculated in consideration of the extracted visual optical flow descriptor, and then learning the visual optical flow descriptor so that the difference from the target optical flow probability information for the down-sampling layer is minimized. Steps (a) to (c) may be performed in turn for each of the plurality of layers through which the visual optical flow descriptor is sequentially down-sampled to a target resolution, with the first-view image at time t and the first-view image at time (t-1) as input.

As a technical means for achieving the above technical object, a scene flow estimation apparatus based on a deep neural network structure according to a sixth aspect of the present disclosure may include: a visual descriptor extraction unit that extracts a visual disparity descriptor at a first target resolution by sequentially down-sampling the visual disparity descriptor, with a first-view image and a second-view image at time t as input, and extracts a visual optical flow descriptor at a second target resolution by sequentially down-sampling the visual optical flow descriptor, with the first-view image at time t and a first-view image at time (t-1), which precedes time t, as input; and a probability information estimation unit that estimates disparity probability information at the first target resolution using a matching degree for a disparity correspondence candidate group at the first target resolution, calculated in consideration of the extracted visual disparity descriptor, and estimates optical flow probability information at the second target resolution using a matching degree for an optical flow correspondence candidate group at the second target resolution, calculated in consideration of the extracted visual optical flow descriptor.

As a technical means for achieving the above technical object, a scene flow estimation apparatus based on a deep neural network structure according to a seventh aspect of the present disclosure may include: a visual disparity descriptor extraction unit that extracts a visual disparity descriptor at a target resolution by sequentially down-sampling the visual disparity descriptor, with a first-view image and a second-view image at time t as input; and a disparity probability information estimation unit that estimates disparity probability information at the target resolution using a matching degree for a disparity correspondence candidate group at the target resolution, calculated in consideration of the extracted visual disparity descriptor.

As a technical means for achieving the above technical object, a scene flow estimation apparatus based on a deep neural network structure according to an eighth aspect of the present disclosure may include: a visual optical flow descriptor extraction unit that extracts a visual optical flow descriptor at a target resolution by sequentially down-sampling the visual optical flow descriptor, with a first-view image at time t and a first-view image at time (t-1), which precedes time t, as input; and an optical flow probability information estimation unit that estimates optical flow probability information at the target resolution using a matching degree for an optical flow correspondence candidate group at the target resolution, calculated in consideration of the extracted visual optical flow descriptor.

As a technical means for achieving the above technical object, a scene flow learning apparatus for scene flow learning according to a ninth aspect of the present disclosure may include a disparity learning unit that, for a down-sampling layer that is any one of a plurality of layers, calculates target disparity probability information corresponding to a first-view image and a second-view image at time t, extracts a visual disparity descriptor by applying a multi-layer CNN included in the down-sampling layer, estimates disparity probability information for the down-sampling layer using a matching degree for a disparity correspondence candidate group at the down-sampling layer, calculated in consideration of the extracted visual disparity descriptor, and then learns the visual disparity descriptor so that the difference from the target disparity probability distribution for the down-sampling layer is minimized. The disparity learning unit may perform learning in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially down-sampled to a target resolution, with the first-view image and the second-view image at time t as input.

As a technical means for achieving the above technical object, a scene flow learning apparatus for scene flow learning according to a tenth aspect of the present disclosure may include an optical flow learning unit that, for a down-sampling layer that is any one of a plurality of layers, calculates target optical flow probability information corresponding to a first-view image at time t and a first-view image at time (t-1), which precedes time t, extracts a visual optical flow descriptor by applying a multi-layer CNN included in the down-sampling layer, estimates optical flow probability information for the down-sampling layer using a matching degree for an optical flow correspondence candidate group at the down-sampling layer, calculated in consideration of the extracted visual optical flow descriptor, and then learns the visual optical flow descriptor so that the difference from the target optical flow probability information for the down-sampling layer is minimized. The optical flow learning unit may perform learning in turn for each of the plurality of layers through which the visual optical flow descriptor is sequentially down-sampled to a target resolution, with the first-view image at time t and the first-view image at time (t-1) as input.

The means for solving the problems described above are merely exemplary and should not be construed as limiting the present disclosure. In addition to the exemplary embodiments described above, further embodiments may exist in the drawings and the detailed description of the invention.

According to the means for solving the problems described above, it is possible to provide a deep neural network structure for scene flow estimation, together with a scene flow learning method and a scene flow estimation method based on that structure, which effectively improve the correspondence search speed of deep learning matching while also improving correspondence accuracy, the main strength of the deep learning approach.

According to the means for solving the problems described above, down-sampling is performed multiple times and correspondence matching is performed at the target resolution, which is the lowest resolution. This effectively reduces the amount of matching computation between correspondences, so that scene flow can be estimated in real time.
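The reduction in matching computation can be made concrete with a rough count of matching-degree evaluations for disparity. The sketch below assumes that each down-sampling step halves the width, the height, and the disparity search range; the halving factor, the image size, and the search range are illustrative assumptions and are not values given in this section.

```python
def matching_operations(width, height, max_disparity, num_downsamplings):
    """Count matching-degree evaluations for a 1-D disparity search when
    matching is done after `num_downsamplings` halvings (assumed factor 2)."""
    scale = 2 ** num_downsamplings
    positions = (width // scale) * (height // scale)
    candidates = max_disparity // scale + 1  # disparities 0..max at that scale
    return positions * candidates


# Hypothetical 1024 x 512 image with a 128-pixel disparity search range.
full_resolution_ops = matching_operations(1024, 512, 128, num_downsamplings=0)
target_resolution_ops = matching_operations(1024, 512, 128, num_downsamplings=3)
reduction_factor = full_resolution_ops / target_resolution_ops
```

Under these assumptions the count shrinks roughly cubically with the scale factor for disparity, because both image dimensions and the search range shrink together; for optical flow, whose candidates span two directions, the reduction would be steeper still.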

According to the means for solving the problems described above, the matching degree for the correspondence candidate group is computed at a target resolution (the lowest resolution) that is lower than the original resolution, which effectively reduces the amount of computation compared with the conventional approach of computing the matching degree at the original resolution. In addition, for each down-sampling layer, learning is performed so as to minimize the difference between the estimated probability information and target probability information that is calculated to be inversely proportional to the distance between corresponding candidate points on the image at that layer's resolution. This enables effective learning that secures at least a certain level of accuracy and reliability in scene flow estimation.
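One possible reading of a target distribution that is inversely proportional to distance, together with a measure of its difference from an estimated distribution, is sketched below. The weighting 1 / (distance + epsilon), the use of cross-entropy as the difference measure, and all names are assumptions made for illustration; this section of the disclosure does not fix the exact formula or loss.

```python
import math


def target_probabilities(true_disparity, num_candidates, epsilon=1.0):
    """Target distribution whose weight falls off inversely with the distance
    between each candidate disparity and the ground-truth disparity.
    Assumption: weight = 1 / (distance + epsilon), normalised to sum to 1."""
    weights = [1.0 / (abs(d - true_disparity) + epsilon)
               for d in range(num_candidates)]
    total = sum(weights)
    return [w / total for w in weights]


def cross_entropy(target, estimate):
    """Assumed difference measure between target and estimated distributions."""
    return -sum(t * math.log(e) for t, e in zip(target, estimate) if t > 0.0)


# Hypothetical layer with 5 disparity candidates and ground truth at d = 2.
target = target_probabilities(true_disparity=2.0, num_candidates=5)
uniform = [0.2] * 5
loss_uniform = cross_entropy(target, uniform)
loss_matched = cross_entropy(target, target)
```

Because the ground-truth disparity at a down-sampled layer generally falls between integer candidates, a soft target of this kind lets neighbouring candidates share the probability mass instead of forcing a single hard label.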

However, the effects obtainable from the present disclosure are not limited to those described above, and other effects may exist.

FIG. 1 is a diagram showing the schematic configuration of a scene flow estimation apparatus based on a deep neural network structure according to a first embodiment of the present disclosure.
FIG. 2 is a diagram showing a visual disparity descriptor extraction structure for extracting visual disparity descriptors in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 3A is a diagram showing one implementation of a disparity estimation structure for estimating disparity in the original image in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 3B is a diagram showing another implementation of the disparity estimation structure for estimating disparity in the original image in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 4 is a diagram showing a visual optical flow descriptor extraction structure for extracting visual optical flow descriptors in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 5A is a diagram showing one implementation of an optical flow estimation structure for estimating optical flow in the original image in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 5B is a diagram showing another implementation of the optical flow estimation structure for estimating optical flow in the original image in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of a target disparity probability distribution considered during scene flow learning for scene flow estimation according to an embodiment of the present disclosure.
FIG. 7 is a diagram showing an example of a target optical flow probability distribution considered during scene flow learning for scene flow estimation according to an embodiment of the present disclosure.
FIG. 8 is an operational flowchart of a scene flow estimation method according to the first embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted in order to describe the disclosure clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram showing the schematic configuration of a scene flow estimation apparatus 100 based on a deep neural network structure according to the first embodiment of the present disclosure. Hereinafter, for convenience of description, the scene flow estimation apparatus 100 based on a deep neural network structure according to the first embodiment is referred to as the present scene flow estimation apparatus 100.

Referring to FIG. 1, the present scene flow estimation apparatus 100 can estimate scene flow, which comprises disparity and optical flow, using images (video) acquired from a stereo camera 1.

Scene flow means disparity and optical flow. Here, disparity is the horizontal displacement between two corresponding pixels that arises from the viewpoint difference between the stereo images obtained from the stereo camera 1, and it represents the distance between the camera and an object. Optical flow is the horizontal and vertical displacement between two corresponding pixels that arises from the time difference between consecutive images obtained from the stereo camera 1, and it represents the motion of the camera and the object.

The present scene flow estimation apparatus 100 may include a visual descriptor extraction unit 110 and a probability information estimation unit 120. The visual descriptor extraction unit 110 may include a visual disparity descriptor extraction unit 111 and a visual optical flow descriptor extraction unit 112, and the probability information estimation unit 120 may include a disparity probability information estimation unit 121 and an optical flow probability information estimation unit 122. In addition, the present scene flow estimation apparatus 100 may include an estimation unit 130, which may include a disparity estimation unit 131 and an optical flow estimation unit 132.

Hereinafter, for convenience of description, the disparity estimation process is described in detail first, followed by the optical flow estimation process.

The visual disparity descriptor extraction unit 111 of the visual descriptor extraction unit 110 may take as input the first-view image and the second-view image at time t obtained through the stereo camera 1, and may extract the visual disparity descriptor at a first target resolution while sequentially downsampling the visual disparity descriptor.

Here, as an example, the first-view image at time t may mean the left image acquired at time t through the stereo camera 1, and the second-view image at time t may mean the right image acquired at time t through the stereo camera 1.

In addition, the visual disparity descriptors at the first target resolution may include the visual disparity descriptor at the target resolution for the first-view image at time t and the visual disparity descriptor at the target resolution for the second-view image at time t. That is, the first target resolution may mean the target resolution of each of the first-view image at time t and the second-view image at time t. The target resolution may mean the lowest resolution.

The process of extracting the visual disparity descriptor can be understood more easily with reference to FIG. 2.

FIG. 2 is a diagram showing a visual disparity descriptor extraction structure for extracting the visual disparity descriptor in the scene flow estimation apparatus 100 based on a deep neural network structure according to the first embodiment of the present application.

Referring to FIG. 2, the present scene flow estimation apparatus 100 may hierarchically include a plurality of layers 10 for extracting the visual disparity descriptor. Here, the plurality of layers 10 may be referred to as the disparity-corresponding layers 10, which may include a first layer 11 (layer #1), a second layer 12 (layer #2), a third layer 13 (layer #3), and so on. In the description below, the first layer 11 may also be called the first tier or the first downsampling tier, and the second layer 12 may also be called the second tier or the second downsampling tier.

Each of the disparity-corresponding layers 10 may include a multilayer convolutional neural network (CNN) (that is, a multilayer CNN) and a downsampling unit. As a specific example, the first layer 11 may include a first multilayer CNN 11a (multilayer convolutional neural network #1) and a first downsampling unit 11b (downsampling #1). The second layer 12 may include a second multilayer CNN 12a (multilayer convolutional neural network #2) and a second downsampling unit 12b (downsampling #2).

The visual disparity descriptor extraction unit 111 may extract the visual disparity descriptor at the first target resolution while sequentially downsampling, based on the visual disparity descriptor extraction structure shown in FIG. 2.

Specifically, when performing the sequential downsampling, the visual disparity descriptor extraction unit 111 may extract a per-resolution visual disparity descriptor for each of the hierarchically arranged disparity-corresponding layers 10 by applying the multilayer CNN included in that layer, and may then downsample that per-resolution visual disparity descriptor.

As an example, the first-view image at time t may be applied as the input to the first multilayer CNN 11a in the first layer 11. By applying the first multilayer CNN 11a in the first layer 11 to the first-view image at time t, the visual disparity descriptor extraction unit 111 may extract a first visual disparity descriptor 11c (visual disparity descriptor #1) at the original resolution of the first-view image at time t. The visual disparity descriptor extraction unit 111 may then downsample the first visual disparity descriptor 11c through the first downsampling unit 11b. The output of the first downsampling unit 11b may be applied as the input to the second multilayer CNN 12a in the second layer 12. The description given for the first-view image at time t applies equally or similarly to the second-view image at time t, and the overlapping description is omitted below. By repeating this process sequentially for each of the disparity-corresponding layers 10, the visual disparity descriptor extraction unit 111 may extract the visual disparity descriptor at the first target resolution, which is the lowest resolution.

In other words, by applying the first multilayer CNN 11a in the first layer 11 to an input image patch P, the visual disparity descriptor extraction unit 111 may extract the first visual disparity descriptor (11c, f₁) corresponding to the first layer 11 as a visual disparity descriptor for disparity estimation. The visual disparity descriptor extraction unit 111 may then downsample the first visual disparity descriptor (11c, f₁) through the first downsampling unit 11b to obtain a downsampled version of the first visual disparity descriptor f₁. Next, the visual disparity descriptor extraction unit 111 may apply the second multilayer CNN 12a in the second layer 12 to this output of the first downsampling unit 11b, thereby extracting the visual disparity descriptor f₂ at the resolution of the image output from the first layer 11 (that is, the second visual disparity descriptor corresponding to the second layer 12). This process may proceed sequentially for each of the disparity-corresponding layers 10, through which the visual disparity descriptor extraction unit 111 may extract the visual disparity descriptor at the first target resolution.

In extracting the visual disparity descriptor at the first target resolution, the visual disparity descriptor extraction unit 111 may sequentially extract, for each tier corresponding to each of the first plurality of layers, the visual disparity descriptor corresponding to each resolution as it is gradually reduced through downsampling. In other words, the visual disparity descriptor extraction unit 111 may extract a visual disparity descriptor for each resolution by performing the downsampling sequentially, that is, a visual descriptor for disparity measurement at each tier (layer).
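The layer-by-layer extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed network: the function `smooth3` is a hypothetical stand-in for a multilayer CNN, `downsample2` assumes 2x2 average pooling (the specification does not fix the downsampling factor), and the output of the last layer's CNN is assumed to serve as the descriptor at the first target resolution.

```python
from typing import Callable, List

Grid = List[List[float]]  # single-channel feature map, row-major


def smooth3(grid: Grid) -> Grid:
    """Hypothetical stand-in for a multilayer CNN: horizontal 3-tap mean."""
    out = []
    for row in grid:
        w = len(row)
        out.append([
            (row[max(x - 1, 0)] + row[x] + row[min(x + 1, w - 1)]) / 3.0
            for x in range(w)
        ])
    return out


def downsample2(grid: Grid) -> Grid:
    """Assumed downsampling step: 2x2 average pooling."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
             + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def extract_pyramid(image: Grid, cnns: List[Callable[[Grid], Grid]]) -> List[Grid]:
    """Return the per-layer descriptors f_1, f_2, ..., coarsest last."""
    descriptors = []
    current = image
    for i, cnn in enumerate(cnns):
        f = cnn(current)              # descriptor at this layer's resolution
        descriptors.append(f)
        if i < len(cnns) - 1:         # feed the downsampled descriptor to the next layer
            current = downsample2(f)
    return descriptors


image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = extract_pyramid(image, [smooth3, smooth3, smooth3])
shapes = [(len(f), len(f[0])) for f in pyramid]
print(shapes)  # [(8, 8), (4, 4), (2, 2)]
```

The same routine would be run on both the first-view and the second-view image, giving one descriptor pyramid per view.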

After the visual disparity descriptor at the first target resolution has been extracted by the visual disparity descriptor extraction unit 111, the disparity probability information estimation unit 121 may estimate disparity probability information at the first target resolution using the degree of matching (similarity) for the disparity correspondence candidate group at the first target resolution, calculated in consideration of the visual disparity descriptor at the first target resolution. That is, the disparity probability information estimation unit 121 may estimate probability information on the position of the disparity correspondence point at the first target resolution in consideration of the degree of matching. The disparity probability information estimation unit 121 may then estimate the scene flow in terms of disparity based on the estimated disparity probability information at the first target resolution. Here, because disparity is a depth-map concept in which only horizontal displacement is considered, the disparity probability distribution may take a two-dimensional form.

Here, the disparity correspondence candidate group may be, for any one pixel of the first-view image at the first target resolution, all pixels of the second-view image at the first target resolution that could correspond to it in terms of disparity. For example, a pixel in the third row of the first-view image at the first target resolution could correspond, in terms of disparity, to any pixel in the third row of the second-view image at the first target resolution, so its pairings with all pixels in that third row can be regarded as the correspondence candidate group.
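The third-row example above can be written out in a few lines. The function name `disparity_candidates` and the zero-based (row, column) indexing are assumptions made only for this illustration.

```python
from typing import List, Tuple


def disparity_candidates(row: int, width: int) -> List[Tuple[int, int]]:
    """All (row, column) pixels of the second-view image that share the given row."""
    return [(row, x) for x in range(width)]


# A pixel in the third row (index 2) of a target-resolution image that is 4 pixels wide
candidates = disparity_candidates(row=2, width=4)
print(candidates)  # [(2, 0), (2, 1), (2, 2), (2, 3)]
```

Because only the row is shared, the number of candidates per pixel equals the image width at the target resolution.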

Meanwhile, conventional approaches search for corresponding pixels in the images at the originally given resolution when calculating the degree of matching between images for scene flow estimation. Because a large number of pixels must be searched, calculating the degree of matching (that is, the matching operation between corresponding points) takes a long time.

To solve this problem, the disparity probability information estimation unit 121 may calculate the degree of matching (similarity) at the first target resolution, which is the lowest resolution, so as to effectively narrow the search range when searching for disparity correspondence points. That is, in the present application, after the visual disparity descriptor extraction unit 111 has extracted the visual disparity descriptor at the lowest-resolution first target resolution through downsampling, the disparity probability information estimation unit 121 may calculate the degree of matching for the disparity correspondence candidate group contained in the first target resolution. The disparity probability information estimation unit 121 may also estimate the disparity probability information at the first target resolution from the calculated degree of matching; in other words, it may estimate disparity probability information on the position of the disparity correspondence point based on the calculated degree of matching.

Accordingly, compared with the computation conventionally required to calculate the degree of matching at the original resolution, the present application can reduce the computation exponentially with the number of downsampling steps. That is, by calculating the degree of matching for the correspondence candidate group at the first target resolution, which has been reduced by downsampling, the present application effectively narrows the disparity correspondence search range, reduces the computation required to calculate the degree of matching, and thereby effectively reduces the time required for scene flow estimation.
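The size of this reduction can be illustrated with a short count. The sketch assumes that each downsampling step halves both width and height (the specification does not state the factor) and that every pixel is compared against all pixels in its own row, as in the candidate group described above; the image size is an arbitrary example.

```python
def matching_cost(height: int, width: int, num_downsamplings: int, factor: int = 2) -> int:
    """Number of descriptor comparisons when each pixel is matched against its whole row."""
    h = height // factor ** num_downsamplings
    w = width // factor ** num_downsamplings
    return h * w * w  # (pixels in the image) x (candidates per pixel)


full = matching_cost(256, 512, num_downsamplings=0)
coarse = matching_cost(256, 512, num_downsamplings=3)
print(full, coarse, full // coarse)  # 67108864 131072 512
```

Under these assumptions each halving step divides the comparison count by 8, so three steps give a 512-fold reduction.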

The disparity probability information estimation unit 121 may calculate the degree of matching (similarity) for the disparity correspondence candidate group by an inner product operation between the visual disparity descriptors at the first target resolution. The inner product is used here only as one example; the calculation is not limited to it, and various other methods of calculating the degree of matching (similarity) may be used.

The disparity probability information estimation unit 121 may calculate the degree of matching between the visual disparity descriptors extracted from the first-view image at the first target resolution and from the second-view image at the first target resolution. As an example, the disparity probability information estimation unit 121 may calculate (measure) the degree of matching (similarity) by applying the dot product s(p₁, p₂) = <f(p₁), f(p₂)> of the visual disparity descriptors f(p₁) and f(p₂) at pixel coordinates p₁ and p₂.
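The dot-product similarity s(p₁, p₂) = <f(p₁), f(p₂)> can be sketched as below. The two-element descriptors and the names `similarity` and `row_similarities` are hypothetical and chosen only to keep the example small.

```python
from typing import List, Sequence


def similarity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Dot product of two descriptor vectors: s(p1, p2) = <f(p1), f(p2)>."""
    return sum(a * b for a, b in zip(f1, f2))


def row_similarities(left_row: List[List[float]],
                     right_row: List[List[float]],
                     x1: int) -> List[float]:
    """Similarity of the left pixel at column x1 to every pixel in the same right row."""
    return [similarity(left_row[x1], f2) for f2 in right_row]


# One image row of descriptors per view (three pixels, two channels each)
left_row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right_row = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
scores = row_similarities(left_row, right_row, x1=2)
print(scores)  # [1.0, 2.0, 1.0]
```

The highest score marks the right-image pixel whose descriptor is most aligned with the left-image descriptor.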

The disparity probability information estimation unit 121 may estimate the disparity probability information at the first target resolution using the calculated (measured) degree of matching, and the disparity probability information estimated in this way may be a normalized probability distribution.

Specifically, the disparity probability information estimation unit 121 may estimate a probability distribution over the position of the disparity correspondence point at the first target resolution (that is, a disparity probability distribution) by applying a softmax function, for normalization, to the calculated degrees of matching with the candidate correspondence points. In other words, the disparity probability distribution may be estimated by applying the softmax function to the degree of matching for the disparity correspondence candidate group at the first target resolution. The disparity probability information estimation unit 121 may estimate the disparity probability distribution using Equation 1.

Here, Equation 1 gives the probability that the horizontal displacement at a pixel coordinate of the downsampled first target resolution takes a given disparity value. The disparity ranges from a minimum of 0 up to a maximum disparity. The two coordinates in Equation 1 denote the downsampled pixel coordinates of the first-view image and the second-view image, respectively; in other words, they denote the pixel coordinates at the target resolution corresponding to the first-view image at time t and to the second-view image at time t.
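The softmax normalization over disparity candidates can be sketched as follows. This is an illustration under assumptions rather than a reproduction of Equation 1: it assumes a left pixel at column x₁ is paired with the right pixel at column x₁ − d for disparity d (the usual rectified-stereo convention, which the text does not state), and the names `softmax` and `disparity_distribution` are hypothetical.

```python
import math
from typing import List, Sequence


def dot(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Dot-product similarity between two descriptors."""
    return sum(a * b for a, b in zip(f1, f2))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disparity_distribution(left_row: List[List[float]],
                           right_row: List[List[float]],
                           x1: int,
                           max_disparity: int) -> List[float]:
    """Probability of each disparity d = 0..max_disparity for the left pixel at x1."""
    scores = []
    for d in range(max_disparity + 1):
        x2 = x1 - d                 # assumed pairing convention
        if x2 < 0:                  # candidate falls outside the image
            break
        scores.append(dot(left_row[x1], right_row[x2]))
    return softmax(scores)


left_row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right_row = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
probs = disparity_distribution(left_row, right_row, x1=2, max_disparity=2)
best = probs.index(max(probs))
print(best)  # 1
```

The resulting list sums to one, so it can be read directly as a normalized probability distribution over disparities at that pixel.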

Once the disparity probability information at the first target resolution has been estimated through Equation 1, the disparity probability information estimation unit 121 may estimate the scene flow in terms of disparity based on it.

Meanwhile, the disparity estimation unit 131 may estimate the disparity (disparity correspondence points) between the first-view image and the second-view image at time t based on disparity probability information obtained by applying sequential upsampling, starting from the first target resolution, to the disparity probability information estimated by the disparity probability information estimation unit 121. By applying sequential upsampling to the disparity probability information, the disparity estimation unit 131 may obtain disparity probability information at the original resolution corresponding to the first-view image and the second-view image, and may estimate the disparity at the original resolution based on it. That is, the disparity estimation unit 131 may estimate the disparity at the resolution (original resolution) of the first-view image and the second-view image acquired through the stereo camera 1. The process of estimating the disparity at the original resolution can be understood more easily with reference to FIGS. 3A and 3B.

FIG. 3A is a diagram showing a disparity estimation structure for estimating the disparity in the original image in the scene flow estimation apparatus 100 based on a deep neural network structure according to the first embodiment of the present application. FIG. 3B is a diagram showing another implementation of the disparity estimation structure for estimating the disparity in the original image in the scene flow estimation apparatus based on a deep neural network structure according to the first embodiment of the present application.

Referring to FIGS. 3A and 3B, the present scene flow estimation apparatus 100 may hierarchically include a plurality of layers 20 for estimating the disparity at the original resolution. Here, the plurality of layers 20 may be referred to as the disparity-corresponding layers 20, which may include a first layer 21 (layer #1), ..., and an n-th layer 29 (layer #n). Each of the disparity-corresponding layers 20 may include two multilayer CNNs. In addition, each of the disparity-corresponding layers 20 other than the last layer 29 may include an upsampling unit. As a specific example, the first layer 21 may include a 1a multilayer CNN 21a (multilayer convolutional neural network #1-1), a 1b multilayer CNN 21b (multilayer convolutional neural network #1-2), and a first upsampling unit 21c (upsampling #1). The last layer 29 among the disparity-corresponding layers 20 may include a multilayer CNN 29a (multilayer convolutional neural network #n-1) and a multilayer CNN 29b (multilayer convolutional neural network #n-2).

Based on the disparity estimation structure for the original resolution shown in FIGS. 3A and 3B, the disparity estimation unit 131 may estimate the disparity at the original resolution while sequentially upsampling the degree of matching, or the disparity probability information estimated from the degree of matching.

Specifically, when performing the sequential upsampling, the disparity estimation unit 131 may apply a multilayer CNN to the degree of matching corresponding to each of the hierarchically arranged disparity-corresponding layers 20, or to the disparity probability information estimated from that degree of matching, and then perform upsampling. It may also apply another multilayer CNN to the CNN output of each of the disparity-corresponding layers 20 to estimate the per-resolution disparity corresponding to that layer.

The disparity estimation unit 131 may take as input the degree of matching for the disparity correspondence positions at the first target resolution (lowest resolution) estimated by the disparity probability information estimation unit 121, or the disparity probability information estimated from that degree of matching. While sequentially upsampling up to the original resolution, it passes the degree of matching, or the matching-based disparity probability information, from each layer to the layer that follows it. In this way, the last layer 29 among the disparity-corresponding layers 20 obtains (estimates) the disparity probability information (probability distribution) at the original resolution, and the disparity at the original resolution may be estimated based on it.

In other words, the disparity estimation unit 131 may receive, from the disparity probability information estimation unit 121, the degree of matching for the disparity correspondence candidate group at the first target resolution, which is the lowest resolution, or the disparity probability information estimated from that degree of matching. The disparity estimation unit 131 may then apply the 1a multilayer CNN 21a and the first upsampling unit 21c in the first layer 21 to the degree of matching, or the matching-based disparity probability information, at the first target resolution, and pass the resulting degree of matching or disparity probability information to the 1a multilayer CNN in the second layer 22, which is the tier one step above the first layer 21. In the first layer 21, the disparity at the resolution corresponding to the first layer 21 (disparity #1) may be estimated by applying the other multilayer CNN, the 1b multilayer CNN 21b, to the CNN output of the 1a multilayer CNN 21a. By carrying out this process sequentially for each of the disparity-corresponding layers 20, the disparity estimation unit 131 may estimate the disparity at the original resolution (disparity #n). Here, the original resolution may include the original resolution of the first-view image at time t and the original resolution of the second-view image at time t. That is, the disparity estimation unit 131 may estimate the disparity at the original resolution corresponding to the first-view image and the second-view image at time t.

Put differently, the disparity estimation unit 131 may apply the multilayer CNN 21a in the first layer 21 to the input degree of matching or probability information P₁ for the disparity correspondence positions at the first target resolution (lowest resolution) to calculate a degree of matching or disparity probability information, and may pass the output of the first upsampling unit 21c, obtained by upsampling that result, to the second layer 22, which is the tier above the first layer 21. The output of the first upsampling unit 21c may be applied as the input to the 1a multilayer CNN in the second layer 22. The disparity estimation unit 131 may then apply the multilayer CNN in the second layer 22 to the output of the first upsampling unit 21c (the degree of matching or the matching-based disparity probability information) to calculate the degree of matching or matching-based disparity probability information at the resolution corresponding to the second layer 22, and may pass the output of the second upsampling unit in the second layer, obtained by upsampling that result, to a third layer (not shown), which is the tier above the second layer. By repeating this process up to the original resolution, the disparity estimation unit 131 may estimate (obtain) the degree of matching or matching-based disparity probability information at the original resolution and, based on it, estimate the disparity at the original resolution.
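The coarse-to-fine pass described above can be sketched for a single image row. Everything here is a simplified assumption: `upsample2` uses nearest-neighbor copying and doubles both the width and the number of disparity bins, the refiners passed in are hypothetical stand-ins for the first multilayer CNN of each layer, and a simple argmax readout stands in for the second multilayer CNN that produces the per-layer disparity.

```python
from typing import Callable, List, Tuple

Volume = List[List[float]]  # volume[x][d]: probability of disparity d at pixel x


def normalize(volume: Volume) -> Volume:
    """Rescale every per-pixel distribution so that it sums to one."""
    return [[p / sum(dist) for p in dist] for dist in volume]


def upsample2(volume: Volume) -> Volume:
    """Assumed upsampling: double the width and the disparity range by copying."""
    width, bins = len(volume), len(volume[0])
    fine = [[volume[x // 2][d // 2] for d in range(2 * bins)]
            for x in range(2 * width)]
    return normalize(fine)


def coarse_to_fine(volume: Volume,
                   refiners: List[Callable[[Volume], Volume]]
                   ) -> Tuple[Volume, List[List[int]]]:
    """Refine, read out a disparity per layer, and upsample except at the last layer."""
    current = volume
    per_layer = []
    for i, refine in enumerate(refiners):
        current = normalize(refine(current))          # stand-in for the first CNN
        per_layer.append([max(range(len(dist)), key=dist.__getitem__)
                          for dist in current])       # stand-in for the second CNN
        if i < len(refiners) - 1:
            current = upsample2(current)              # pass to the next layer
    return current, per_layer


coarse = [[0.2, 0.8], [0.9, 0.1]]
identity = lambda v: v
final, disparities = coarse_to_fine(coarse, [identity, identity])
print(disparities)  # [[1, 0], [2, 2, 0, 0]]
```

Each entry of `disparities` is the per-pixel estimate at one layer's resolution, so the last entry corresponds to the finest resolution reached.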

In estimating the disparity at the original resolution, the disparity estimator 131 may estimate a disparity at each resolution reached by up-sampling. In other words, for each level corresponding to one of the first plurality of layers 20, the disparity estimator 131 may sequentially estimate the disparity at each of the progressively higher resolutions produced by up-sampling. That is, the disparity estimator 131 passes the matching degree or matching-degree-based disparity probability information from layer to layer within the disparity-corresponding plurality of layers 20 and, in addition, may estimate the disparity at the resolution corresponding to each layer.

In addition, the order of the resolutions (layers) in the sequential up-sampling from the first target resolution may correspond to (match) the reverse of the order of the resolutions (layers) in the sequential down-sampling to the first target resolution. In other words, layer #1, layer #2, ..., layer #n of the sequential up-sampling that starts from the first target resolution (see FIG. 3A) may correspond to layer #1, layer #2, ..., layer #n of the sequential down-sampling toward the first target resolution (see FIG. 2), taken in reverse order. In addition, the visual disparity descriptor extracted at each layer during down-sampling may be applied (used) at the corresponding layer during up-sampling.
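The reverse-order correspondence described above can be sketched as a simple index mapping. This is an illustrative sketch only: the function name `pair_layers` is hypothetical, and it assumes that the down-sampling path and the up-sampling path have the same number of layers n, so that up-sampling layer #1 (lowest resolution) pairs with down-sampling layer #n.

```python
from typing import List, Tuple


def pair_layers(n: int) -> List[Tuple[int, int]]:
    """Pair each up-sampling layer with the down-sampling layer of equal resolution.

    Assumption (not stated numerically in the text): both paths have n layers,
    so up-sampling layer #k reuses the descriptor of down-sampling layer #(n - k + 1).
    """
    return [(up, n - up + 1) for up in range(1, n + 1)]


# Each tuple is (up-sampling layer index, down-sampling layer index).
pairs = pair_layers(3)
```

With three layers, the descriptor extracted at the last down-sampling layer is the one reused first during up-sampling.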

Meanwhile, the per-level (per-resolution) visual disparity descriptors extracted by the visual disparity descriptor extractor 111, the matching degree or matching-degree-based disparity probability information estimated by the disparity probability information estimator 121, and the per-level (per-resolution) disparity probability information (disparity) estimated by the disparity estimator 131 may be learned by a scene flow learning apparatus for scene flow estimation. The learning is described in detail later.

The optical flow estimation process is described in detail below. The description of the optical flow estimation process may be understood in the same or a similar manner as the description of the disparity estimation process given above.

The visual optical flow descriptor extractor 112 of the visual descriptor extractor 110 may take as input the first viewpoint image at time t, acquired through the stereo camera 1, and the first viewpoint image at time (t-1), which precedes time t, and may extract the visual optical flow descriptor at the second target resolution while sequentially down-sampling the visual optical flow descriptors.

Here, for example, the first viewpoint image at time t may be the left image acquired through the stereo camera 1 at time t, and the first viewpoint image at time (t-1), which precedes time t, may be the left image acquired through the stereo camera 1 at time t-1. However, this is not limiting. As another example, the first viewpoint image at time t may be the right image acquired through the stereo camera 1 at time t, and the first viewpoint image at time (t-1) may be the right image acquired through the stereo camera 1 at time t-1.

In addition, the visual optical flow descriptors at the second target resolution may include the visual optical flow descriptor at the target resolution for the first viewpoint image at time t and the visual optical flow descriptor at the target resolution for the first viewpoint image at time (t-1), which precedes time t. That is, the second target resolution may refer to the target resolution of each of the first viewpoint image at time t and the first viewpoint image at time (t-1). The target resolution may be the lowest resolution.

The extraction of the visual optical flow descriptors may be more easily understood with reference to FIG. 4.

FIG. 4 is a diagram illustrating the visual optical flow descriptor extraction structure used to extract visual optical flow descriptors in the deep neural network structure-based scene flow estimation apparatus 100 according to the first embodiment of the present disclosure.

Referring to FIG. 4, the scene flow estimation apparatus 100 may include a plurality of layers 30 arranged hierarchically to extract the visual optical flow descriptors. The plurality of layers 30 may be referred to as the optical-flow-corresponding plurality of layers 30, which may include a first layer 31 (layer #1), a second layer 32 (layer #2), a third layer 33 (layer #3), and so on. In the following description, the first layer 31 may also be called the first level, the first down-sampling level, and the like, and the second layer 32 may also be called the second level, the second down-sampling level, and the like.

Each of the optical-flow-corresponding plurality of layers 30 may include a multilayer convolutional neural network (CNN) (that is, a multilayer CNN) and a down-sampling unit. As a specific example, the first layer 31 may include a first multilayer CNN 31a (multilayer convolutional neural network #1) and a first down-sampling unit 31b (down-sampling #1). The second layer 32 may include a second multilayer CNN 32a (multilayer convolutional neural network #2) and a second down-sampling unit 32b (down-sampling #2).

The visual optical flow descriptor extractor 112 may extract the visual optical flow descriptor at the second target resolution while sequentially down-sampling, based on the visual optical flow descriptor extraction structure shown in FIG. 4.

Specifically, when performing the sequential down-sampling, the visual optical flow descriptor extractor 112 may apply the multilayer CNN included in each of the hierarchically arranged optical-flow-corresponding plurality of layers 30 to extract a per-resolution visual optical flow descriptor corresponding to each of those layers, and may down-sample each per-resolution visual optical flow descriptor.

For example, the first viewpoint image at time t may be applied as the input of the first multilayer CNN 31a in the first layer 31. By applying the first multilayer CNN 31a in the first layer 31 to the first viewpoint image at time t, the visual optical flow descriptor extractor 112 may extract the first visual optical flow descriptor 31c (visual optical flow descriptor #1) at the original resolution of the first viewpoint image at time t. The visual optical flow descriptor extractor 112 may then down-sample the first visual optical flow descriptor 31c through the first down-sampling unit 31b. The output of the first down-sampling unit 31b may be applied as the input of the second multilayer CNN 32a in the second layer 32. The description given for the first viewpoint image at time t applies in the same or a similar manner to the first viewpoint image at time t-1, and redundant description is omitted below. By repeating this process sequentially for each of the optical-flow-corresponding plurality of layers 30, the visual optical flow descriptor extractor 112 may extract the visual optical flow descriptor at the second target resolution, which is the lowest resolution.

Put differently, by applying the first multilayer CNN 31a in the first layer 31 to an input image patch P, the visual optical flow descriptor extractor 112 may extract the first visual optical flow descriptor 31c (g₁) corresponding to the first layer 31 as a visual optical flow descriptor for optical flow estimation. The visual optical flow descriptor extractor 112 may then down-sample the first visual optical flow descriptor 31c (g₁) through the first down-sampling unit 31b to obtain a down-sampled version of g₁. Next, by applying the second multilayer CNN 32a in the second layer 32 to this output of the first down-sampling unit 31b, the visual optical flow descriptor extractor 112 may extract the visual optical flow descriptor g₂ at the resolution of the image output from the first layer 31 (that is, the second visual optical flow descriptor corresponding to the second layer 32). This process may proceed sequentially for each of the optical-flow-corresponding plurality of layers 30, through which the visual optical flow descriptor extractor 112 may extract the visual optical flow descriptor at the second target resolution.
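The layer-by-layer extraction described above (apply a multilayer CNN, keep its output as the descriptor for that resolution, down-sample, and feed the next layer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed network: the multilayer CNN is replaced by a hypothetical stand-in `box_filter_3x3`, the down-sampling unit is assumed to be 2x2 average pooling, and the output of the last layer's CNN is assumed to serve as the descriptor at the target (lowest) resolution.

```python
from typing import Callable, List

Grid = List[List[float]]  # single-channel feature map stored as rows of floats


def box_filter_3x3(x: Grid) -> Grid:
    """Hypothetical stand-in for a multilayer CNN: 3x3 mean filter with edge clamping."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += x[rr][cc]
            out[r][c] = total / 9.0
    return out


def downsample_2x(x: Grid) -> Grid:
    """Assumed down-sampling unit: 2x2 average pooling (height and width must be even)."""
    h, w = len(x), len(x[0])
    return [
        [(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def extract_pyramid(image: Grid, cnns: List[Callable[[Grid], Grid]]) -> List[Grid]:
    """Return the per-layer descriptors g_1 .. g_n, from original to lowest resolution."""
    descriptors: List[Grid] = []
    x = image
    for i, cnn in enumerate(cnns):
        g = cnn(x)                # descriptor at this layer's resolution
        descriptors.append(g)
        if i < len(cnns) - 1:
            x = downsample_2x(g)  # becomes the input of the next layer's CNN
    return descriptors


# Toy 8x8 input and three layers: descriptors at 8x8, 4x4, and 2x2.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = extract_pyramid(image, [box_filter_3x3] * 3)
```

The same routine would be run once on the image at time t and once on the image at time (t-1), giving one descriptor pyramid per image.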

In extracting the visual optical flow descriptor at the second target resolution, the visual optical flow descriptor extractor 112 may, for each level corresponding to one of the second plurality of layers, sequentially extract the visual optical flow descriptor for each resolution as it is progressively reduced by down-sampling. In other words, by performing down-sampling sequentially, the visual optical flow descriptor extractor 112 may extract a visual optical flow descriptor for each resolution. The visual optical flow descriptor extractor 112 may thus extract, for each level (each layer), a visual descriptor for optical flow estimation.

After the visual optical flow descriptor at the second target resolution has been extracted by the visual optical flow descriptor extractor 112, the optical flow probability information estimator 122 may estimate the optical flow probability distribution at the second target resolution using the matching degree (similarity) for the optical flow correspondence point candidate group at the second target resolution, computed from the visual optical flow descriptors at the second target resolution. That is, the optical flow probability information estimator 122 may estimate, from the matching degree, probability information on the optical flow correspondence point positions at the second target resolution. The optical flow probability information estimator 122 may estimate the scene flow in terms of optical flow based on the estimated optical flow probability information at the second target resolution. Because optical flow accounts for both horizontal and vertical displacement, the optical flow probability distribution may take a three-dimensional form.

Here, the optical flow correspondence point candidate group may be all pixels of the first viewpoint image at time (t-1) at the second target resolution that could correspond, in terms of optical flow, to a given pixel of the first viewpoint image at time t at the second target resolution. For example, a pixel in the third row of the first viewpoint image at time t at the second target resolution may correspond, in terms of optical flow, to any pixel in the third row of the first viewpoint image at time (t-1) at the second target resolution and to any pixel in the column that contains the given pixel. The pairings of the given pixel with all of those row and column pixels may therefore be regarded as the correspondence point candidate group.

Conventionally, when computing the matching degree between images for scene flow estimation, correspondence points are searched for in the images at the original resolution as first given. Searching over such a large number of pixels makes the matching degree computation time-consuming.

To address this problem, the optical flow probability information estimator 122 may compute the matching degree (similarity) at the second target resolution, which is the lowest resolution, so as to effectively narrow the search range for optical flow correspondence points. That is, in the present disclosure, the visual optical flow descriptor extractor 112 first extracts, through down-sampling, the visual optical flow descriptors at the second target resolution, the lowest resolution, for the consecutive images (that is, the first viewpoint image at time t and the first viewpoint image at time (t-1)). The optical flow probability information estimator 122 then computes, at that second target resolution, the matching degree for the optical flow correspondence point candidate group contained in the second target resolution. The optical flow probability information estimator 122 may also estimate the optical flow probability information at the second target resolution according to the computed matching degree; in other words, it may estimate the optical flow probability information on the optical flow correspondence point positions based on the computed matching degree.

Compared with the amount of computation conventionally required to compute the matching degree at the original resolution, the present disclosure can reduce the amount of computation by a factor that grows exponentially with the number of down-sampling steps. That is, by computing the matching degree over the correspondence point candidate group at the second target resolution, which has been reduced by down-sampling, the present disclosure effectively narrows the search range for optical flow correspondence points, reduces the computation required for the matching degree, and thereby reduces the time required for scene flow estimation.
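The scale of this reduction can be illustrated with a short calculation. The sketch below is illustrative only and rests on assumptions not fixed by the text: each down-sampling step is assumed to halve both height and width, the candidate group for each pixel is assumed to be every pixel of the other image, and the image size is a hypothetical example. The function name `matching_cost` is likewise hypothetical.

```python
def matching_cost(height: int, width: int, num_downsamples: int, factor: int = 2) -> int:
    """Number of pixel-pair similarity evaluations for an exhaustive search.

    Assumptions: each down-sampling step divides height and width by `factor`,
    and every pixel is compared against every pixel of the other image.
    """
    h = height // factor ** num_downsamples
    w = width // factor ** num_downsamples
    pixels = h * w
    return pixels * pixels


# Hypothetical 256 x 512 input, compared at full resolution and after 3 down-sampling steps.
full = matching_cost(256, 512, 0)
reduced = matching_cost(256, 512, 3)
ratio = full // reduced  # each step divides the cost by factor**4 = 16, so 16**3 overall
```

Under these assumptions every additional down-sampling step cuts the number of similarity evaluations by a further factor of 16, which is the exponential dependence on the number of steps noted above.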

The optical flow probability information estimator 122 may compute the matching degree (similarity) for the optical flow correspondence point candidate group as an inner product between visual optical flow descriptors at the second target resolution. The inner product is used here as one example of how to compute the matching degree (similarity); the disclosure is not limited to it, and various other methods of computing the matching degree (similarity) may be used.

The optical flow probability information estimator 122 may compute the matching degree between the visual optical flow descriptors extracted from the first viewpoint image at time t at the second target resolution and from the first viewpoint image at time (t-1) at the second target resolution. For example, for pixel coordinates p₁ and p₂, the optical flow probability information estimator 122 may compute (measure) the matching degree (similarity) as the inner product (dot product) of the visual optical flow descriptors g(p₁) and g(p₂): s(p₁, p₂) = <g(p₁), g(p₂)>.
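The dot-product matching degree s(p₁, p₂) = <g(p₁), g(p₂)> can be sketched as follows. This is a minimal sketch, assuming descriptors are stored as nested lists indexed `[row][column][channel]` and that the candidate group is every pixel of the other image; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Descriptor = List[List[List[float]]]  # indexed as [row][column][channel]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two descriptor vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def matching_scores(g_t: Descriptor, g_prev: Descriptor,
                    p1: Tuple[int, int]) -> List[List[float]]:
    """Score pixel p1 of the time-t descriptor map against every pixel p2 at time (t-1)."""
    r1, c1 = p1
    query = g_t[r1][c1]
    return [[dot(query, vec) for vec in row] for row in g_prev]


# Tiny 1x2 descriptor maps with 2 channels each.
g_t = [[[1.0, 0.0], [0.0, 1.0]]]
g_prev = [[[0.0, 1.0], [1.0, 0.0]]]
scores = matching_scores(g_t, g_prev, (0, 0))
```

In this toy example the descriptor at (0, 0) in the time-t map matches the descriptor at (0, 1) in the time-(t-1) map, so that position receives the highest score.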

The optical flow probability information estimator 122 may estimate the optical flow probability information at the second target resolution using the computed matching degree, and the estimated optical flow probability information may be normalized probability information (a probability distribution).

Specifically, the optical flow probability information estimator 122 may apply a softmax function to the computed matching degrees with the correspondence points, for normalization, to estimate the probability information on the optical flow correspondence point positions at the second target resolution (that is, the optical flow probability information). In other words, the optical flow probability information may be estimated by applying the softmax function to the matching degrees for the optical flow correspondence point candidate group at the second target resolution. The optical flow probability information estimator 122 may estimate the optical flow probability information using Equation 2 below.

Here, the left-hand side of Equation 2 denotes the probability that, at a given pixel coordinate of the down-sampled second target resolution, the horizontal displacement and the vertical displacement take given values. The optical flow is bounded between a minimum optical flow and a maximum optical flow. The two pixel coordinates in Equation 2 denote the down-sampled pixel coordinates of the first viewpoint image at time t and of the first viewpoint image at time (t-1), respectively. In other words, they denote the pixel coordinate at the target resolution corresponding to the first viewpoint image at time t and the pixel coordinate at the target resolution corresponding to the first viewpoint image at time t-1.
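Softmax normalization of matching degrees into a probability distribution over candidate positions can be sketched as follows. This is a generic softmax over a two-dimensional grid of scores and is not necessarily the exact form of Equation 2; the function name `softmax_2d` and the layout of the score grid (one entry per candidate vertical and horizontal displacement) are assumptions made for illustration.

```python
import math
from typing import List


def softmax_2d(scores: List[List[float]]) -> List[List[float]]:
    """Normalize a 2D grid of matching scores so that all entries sum to 1.

    Assumption: scores[v][u] is the matching degree for one candidate
    (vertical, horizontal) displacement of a single pixel.
    """
    peak = max(s for row in scores for s in row)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


# One candidate has a higher matching degree than the other three.
probs = softmax_2d([[0.0, 1.0], [0.0, 0.0]])
```

The candidate with the highest matching degree receives the highest probability, and the probabilities over all candidate displacements of the pixel sum to one.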

Once the optical flow probability information at the second target resolution has been estimated through Equation 2, the optical flow probability information estimator 122 may estimate the scene flow in terms of optical flow on that basis.

Meanwhile, the optical flow estimator 132 may apply sequential up-sampling, starting from the second target resolution, to the optical flow probability information estimated by the optical flow probability information estimator 122, and may estimate the optical flow (optical flow correspondence points) between the first viewpoint image at time t and the first viewpoint image at time (t-1), which precedes time t, based on the optical flow probability information thus obtained. By applying sequential up-sampling to the optical flow probability information, the optical flow estimator 132 may obtain the optical flow probability information at the original resolution corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), and may estimate the optical flow at the original resolution on that basis. That is, the optical flow estimator 132 may estimate the optical flow at the resolution (original resolution) of the first viewpoint image at time t and the first viewpoint image at time (t-1) acquired through the stereo camera 1. The estimation of the optical flow at the original resolution may be more easily understood with reference to FIGS. 5A and 5B.

FIG. 5A is a diagram illustrating one implementation of the optical flow estimation structure for estimating the optical flow in the original image in the deep neural network structure-based scene flow estimation apparatus 100 according to the first embodiment of the present disclosure. FIG. 5B is a diagram illustrating another implementation of the optical flow estimation structure for estimating the optical flow in the original image in the same apparatus.

Referring to FIGS. 5A and 5B, the scene flow estimation apparatus 100 may include a plurality of layers 40 arranged hierarchically to estimate the optical flow at the original resolution. The plurality of layers 40 may be referred to as the optical-flow-corresponding plurality of layers 40, which may include a first layer 41 (layer #1), ..., and an n-th layer 49 (layer #n). Each of the optical-flow-corresponding plurality of layers 40 may include two multilayer CNNs. In addition, each layer other than the last layer 49 may include an up-sampling unit. As a specific example, the first layer 41 may include a 1a multilayer CNN 41a (multilayer convolutional neural network #1-1), a 1b multilayer CNN 41b (multilayer convolutional neural network #1-2), and a first up-sampling unit 41c (up-sampling #1). The last layer 49 of the optical-flow-corresponding plurality of layers 40 may include a multilayer CNN 49a (multilayer convolutional neural network #n-1) and a multilayer CNN 49b (multilayer convolutional neural network #n-2).

Based on the optical flow estimation structure for the original resolution shown in FIGS. 5A and 5B, the optical flow estimator 132 may estimate the optical flow at the original resolution while sequentially up-sampling the matching degree or the optical flow probability information estimated from the matching degree.

Specifically, when performing the sequential up-sampling, the optical flow estimator 132 may, at each of the hierarchically arranged optical-flow-corresponding plurality of layers 40, apply a multilayer CNN to the matching degree, or to the optical flow probability information estimated from the matching degree, and up-sample the result. It may also apply the other multilayer CNN of the layer to the CNN output produced at that layer to estimate the per-resolution optical flow corresponding to each of the optical-flow-corresponding plurality of layers 40.

The optical flow estimator 132 may take as input the matching degree for the optical flow correspondence point positions at the second target resolution (lowest resolution), or the optical flow probability information estimated from that matching degree by the optical flow probability information estimator 122. While sequentially up-sampling up to the original resolution, it passes the matching degree or matching-degree-based optical flow probability information of each layer to the next layer. Through this process, the last layer 49 of the optical-flow-corresponding plurality of layers 40 yields (estimates) the optical flow probability information at the original resolution, on the basis of which the optical flow at the original resolution may be estimated.

In other words, the optical flow estimator 132 may receive, from the optical flow probability information estimator 122, the matching degree for the optical flow correspondence point candidate group at the second target resolution, which is the lowest resolution, or the optical flow probability information estimated from that matching degree. The optical flow estimator 132 may then apply the 1a multilayer CNN 41a and the first up-sampling unit 41c in the first layer 41 to the matching degree or matching-degree-based optical flow probability information at the second target resolution, and pass the resulting matching degree or matching-degree-based optical flow probability information to the 1a multilayer CNN in the second layer 42, which is the layer above the first layer 41. In the first layer 41, the optical flow (optical flow #1) at the target resolution corresponding to the first layer 41 may be estimated by applying the other multilayer CNN, the 1b multilayer CNN 41b, to the CNN output of the 1a multilayer CNN 41a. By carrying out this process sequentially for each of the optical-flow-corresponding plurality of layers 40, the optical flow estimator 132 may estimate the optical flow (optical flow #n) at the original resolution corresponding to the second target resolution. Here, the original resolution may include the original resolution of the first viewpoint image at time t and the original resolution of the first viewpoint image at time (t-1). That is, the optical flow estimator 132 may estimate the optical flow at the original resolution corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1).

Put differently, the optical flow estimator 132 may apply the multi-layer CNN 41a in the first layer 41 to the input matching degree or probability information P₁ for the optical flow corresponding point positions at the second target resolution (the lowest resolution) to compute a matching degree or optical flow probability information, and may pass the output of the first up-sampling unit 41c, obtained by up-sampling that result, to the second layer 42, the layer above the first layer 41. The output of the first up-sampling unit 41c may be applied as the input of the 1a multi-layer CNN in the second layer 42. The optical flow estimator 132 may then apply the multi-layer CNN in the second layer 42 to the output of the first up-sampling unit 41c (the matching degree or matching-degree-based optical flow probability information) to compute the matching degree or matching-degree-based optical flow probability information at the resolution corresponding to the second layer 42, and may pass the output of the second up-sampling unit in the second layer, obtained by up-sampling that result, to a third layer (not shown) above the second layer. By repeating this process up to the original resolution, the optical flow estimator 132 may estimate (obtain) the matching degree or matching-degree-based optical flow probability information at the original resolution and, on that basis, estimate the optical flow at the original resolution.
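The layer-by-layer hand-off described above can be illustrated with a minimal Python sketch. It is not the patented network: the per-layer multi-layer CNN is replaced by a hypothetical placeholder (`identity_refine`), up-sampling is assumed to be nearest-neighbour by a factor of two, and the last layer is assumed not to up-sample further. All function names are illustrative.

```python
from typing import Callable, List

Grid = List[List[float]]


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour 2x up-sampling of a 2-D map (assumed up-sampling rule)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def identity_refine(grid: Grid) -> Grid:
    """Hypothetical stand-in for a layer's multi-layer CNN; returns a copy."""
    return [list(row) for row in grid]


def coarse_to_fine(p_lowest: Grid,
                   refiners: List[Callable[[Grid], Grid]]) -> List[Grid]:
    """Refine at each layer, then up-sample and pass the result to the next layer.

    Returns the refined map of every layer, lowest resolution first.
    """
    per_layer: List[Grid] = []
    current = p_lowest
    for index, refine in enumerate(refiners):
        refined = refine(current)
        per_layer.append(refined)
        if index < len(refiners) - 1:      # assumed: the last layer does not up-sample
            current = upsample2x(refined)
    return per_layer


if __name__ == "__main__":
    maps = coarse_to_fine([[0.25, 0.75]], [identity_refine] * 3)
    for level, m in enumerate(maps, start=1):
        print(f"layer #{level}: {len(m)}x{len(m[0])}")
```

Returning the map of every layer, rather than only the final one, mirrors the text's point that a flow estimate is available at each intermediate resolution.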

In estimating the optical flow at the original resolution, the optical flow estimator 132 may estimate the optical flow at each resolution reached through up-sampling. Put differently, the optical flow estimator 132 may sequentially estimate, for each of the plurality of optical-flow layers 40, the optical flow at each of the progressively higher resolutions produced by up-sampling. That is, for each layer in the plurality of optical-flow layers 40, the optical flow estimator 132 may both pass on the matching degree or matching-degree-based optical flow probability information and estimate the optical flow at the resolution corresponding to that layer.

In addition, the order of the resolutions (layers) in the sequential up-sampling from the second target resolution may correspond to (match) the reverse of the order of the resolutions (layers) in the sequential down-sampling to the second target resolution. In other words, layer #1, layer #2, …, layer #n of the sequential up-sampling that starts from the second target resolution (see FIG. 5A) may correspond to (match) the reverse order of layer #1, layer #2, …, layer #n of the sequential down-sampling toward the second target resolution (see FIG. 4). In addition, the visual optical flow descriptor extracted for each layer during down-sampling may be applied (used) in the corresponding layer during up-sampling.
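As a small illustration of this reverse-order correspondence, the sketch below pairs each up-sampling layer with the descriptor saved at the matching down-sampling layer. It assumes the two paths have the same number of layers and that down-sampling layer #n is the one at the lowest resolution; `pair_layers` and the string labels are hypothetical.

```python
from typing import Dict, List


def pair_layers(down_descriptors: List[str]) -> Dict[int, str]:
    """Map up-sampling layer k (1-based) to the descriptor of down-sampling layer n-k+1.

    down_descriptors[0] belongs to down-sampling layer #1 (highest resolution),
    down_descriptors[-1] to down-sampling layer #n (lowest resolution).
    """
    n = len(down_descriptors)
    return {k: down_descriptors[n - k] for k in range(1, n + 1)}


if __name__ == "__main__":
    # Descriptors are represented by labels only; real ones would be feature maps.
    print(pair_layers(["desc_down_1", "desc_down_2", "desc_down_3"]))
```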

In addition, the first target resolution described above may be the same resolution as the second target resolution. In this case, the first target resolution and the second target resolution may be referred to collectively as the target resolution, and this target resolution may be a resolution lower than the original resolution, for example the lowest resolution.

Meanwhile, the per-layer (per-resolution) visual optical flow descriptors extracted by the visual optical flow descriptor extractor 112, the matching degree or matching-degree-based optical flow probability information estimated by the optical flow probability information estimator 122, and the per-layer (per-resolution) optical flow probability information (optical flow) estimated by the optical flow estimator 132 may be learned by a scene flow learning apparatus for scene flow estimation, which is described in detail below.

Meanwhile, the deep neural network structure-based scene flow estimation apparatus 100 according to the first embodiment of the present application has been illustrated as including the visual disparity descriptor extractor 111, the visual optical flow descriptor extractor 112, the disparity probability information estimator 121, the optical flow probability information estimator 122, the disparity estimator 131, and the optical flow estimator 132 in a single apparatus. However, the present application is not limited thereto, and a deep neural network structure-based scene flow estimation apparatus according to another embodiment of the present application may be provided with only some of these components. Specific examples are as follows.

A deep neural network structure-based scene flow estimation apparatus according to the second embodiment of the present application may include a visual disparity descriptor extractor and a disparity probability information estimator. Here, the visual disparity descriptor extractor may correspond to the visual disparity descriptor extractor 111 described above, and the disparity probability information estimator may correspond to the disparity probability information estimator 121. Accordingly, even where omitted below, the description given for the visual disparity descriptor extractor 111 and the disparity probability information estimator 121 applies equally to the scene flow estimation apparatus according to the second embodiment.

Briefly, in the scene flow estimation apparatus according to the second embodiment of the present application, the visual disparity descriptor extractor may take the first viewpoint image and the second viewpoint image at time t as input and extract the visual disparity descriptor at the target resolution while sequentially down-sampling the visual disparity descriptor. Here, the target resolution may mean the first target resolution described above, and redundant description is omitted below.

The disparity probability information estimator may estimate the disparity probability information at the target resolution using the matching degree for the disparity corresponding point candidate group at the target resolution, calculated in consideration of the extracted visual disparity descriptor. From this, the disparity probability information estimator may estimate the scene flow in terms of disparity based on the estimated disparity probability information.

In addition, the scene flow estimation apparatus according to the second embodiment of the present application may include a disparity estimator. This disparity estimator may correspond to the disparity estimator 131 described above, so redundant description is omitted below.

A deep neural network structure-based scene flow estimation apparatus according to the third embodiment of the present application may include a visual optical flow descriptor extractor and an optical flow probability information estimator. Here, the visual optical flow descriptor extractor may correspond to the visual optical flow descriptor extractor 112 described above, and the optical flow probability information estimator may correspond to the optical flow probability information estimator 122. Accordingly, even where omitted below, the description given for the visual optical flow descriptor extractor 112 and the optical flow probability information estimator 122 applies equally to the scene flow estimation apparatus according to the third embodiment.

Briefly, in the scene flow estimation apparatus according to the third embodiment of the present application, the visual optical flow descriptor extractor may take as input the first viewpoint image at time t and the first viewpoint image at time (t-1), which precedes time t, and extract the visual optical flow descriptor at the target resolution while sequentially down-sampling the visual optical flow descriptor. Here, the target resolution may mean the second target resolution described above, and redundant description is omitted below.

The optical flow probability information estimator may estimate the optical flow probability information at the target resolution using the matching degree for the optical flow corresponding point candidate group at the target resolution, calculated in consideration of the extracted visual optical flow descriptor. From this, the optical flow probability information estimator may estimate the scene flow in terms of optical flow based on the estimated optical flow probability information.

In addition, the scene flow estimation apparatus according to the third embodiment of the present application may include an optical flow estimator. This optical flow estimator may correspond to the optical flow estimator 132 described above, so redundant description is omitted below.

The following describes learning in terms of disparity by the scene flow learning apparatus according to the fourth embodiment of the present application. The scene flow learning apparatus according to the fourth embodiment may have the same structure (configuration) as the deep neural network structure-based scene flow estimation apparatus according to the second embodiment described above, sharing the same or corresponding technical features. Accordingly, even where omitted below, the structure of the scene flow estimation apparatus according to the second embodiment and the description given for that apparatus apply equally to the scene flow learning apparatus according to the fourth embodiment.

The scene flow learning apparatus according to the fourth embodiment of the present application may include a visual disparity descriptor extractor, a disparity probability information estimator, a disparity estimator, and a disparity learning unit. The visual disparity descriptor extractor, the disparity probability information estimator, and the disparity estimator have been described in detail above, so their description is omitted below.

The disparity learning unit may calculate, as the learning target, target disparity probability information corresponding to the first viewpoint image and the second viewpoint image at time t for a down-sampling layer, which is any one of a plurality of layers (step 1). Next, the disparity learning unit may extract the visual disparity descriptor by applying the multi-layer CNN included in the first down-sampling layer (step 2). Next, the disparity learning unit may estimate the disparity probability information for the down-sampling layer using the matching degree for the disparity corresponding point candidate group in the down-sampling layer, calculated in consideration of the visual disparity descriptor, and then learn the visual disparity descriptor so that the difference from the target disparity probability information for the down-sampling layer is minimized (step 3).

Here, the disparity learning unit may perform learning for each of the plurality of layers through which the visual disparity descriptor is sequentially down-sampled to the target resolution (the lowest resolution), with the first viewpoint image and the second viewpoint image at time t as input. In other words, the disparity learning unit may perform steps 1 to 3 in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially down-sampled to the target resolution, with the first viewpoint image and the second viewpoint image at time t, which are the learning targets, as input.

In addition, the target disparity probability information calculated by the disparity learning unit may be calculated so as to be inversely proportional to distance, based on the distance relationship between candidate corresponding points on the image corresponding to the resolution of the first down-sampling layer of the first viewpoint image (the first-viewpoint corresponding image) and on the image corresponding to the resolution of the down-sampling layer of the second viewpoint image (the second-viewpoint corresponding image). Here, being inversely proportional to distance may mean that a pair of candidate corresponding points has a lower probability the farther it lies from the pair of points that actually correspond. As one example, the inverse proportionality may be linear, but it is not limited thereto.

Put differently, the disparity learning unit may express (generate, calculate) the target disparity probability information in order to learn the visual disparity descriptor extracted in step 2. Here, the target disparity probability information may express the pixel position on the second-viewpoint corresponding image that corresponds to a pixel position on the first-viewpoint corresponding image so that, as shown for example in FIG. 6, its probability is inversely proportional to the distance from the pixel position on the second-viewpoint corresponding image that actually corresponds to that pixel position on the first-viewpoint corresponding image.

FIG. 6 is a diagram showing an example of the target disparity probability information considered in scene flow learning for scene flow estimation according to an embodiment (the fourth embodiment) of the present application. In other words, FIG. 6 shows an example of the target disparity probability information for the disparity corresponding point position as a function of the distance between the disparity corresponding point and neighboring pixels.

Referring to FIG. 6, the target disparity probability information may be expressed (generated) as an inverse relationship in which the matching probability decreases as the position of a candidate pixel moves farther from the pixel position of the actual corresponding point (that is, as the distance between the neighboring pixels increases).
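A target distribution of this kind can be sketched in a few lines of Python. The sketch assumes a one-dimensional row of candidate positions and a linear decay that reaches zero just beyond a chosen `radius`; the function name, the `radius` parameter, and the normalisation to a probability distribution are illustrative assumptions, not details given in the text.

```python
from typing import List


def target_distribution(true_index: int, num_candidates: int,
                        radius: int = 2) -> List[float]:
    """Target probabilities that fall off linearly with distance from the true match.

    A candidate at distance d from true_index gets weight max(0, 1 - d / (radius + 1)),
    and the weights are normalised so that they sum to one.
    """
    if not 0 <= true_index < num_candidates:
        raise ValueError("true_index must lie inside the candidate range")
    weights = [max(0.0, 1.0 - abs(i - true_index) / (radius + 1))
               for i in range(num_candidates)]
    total = sum(weights)  # > 0 because the true position itself has weight 1
    return [w / total for w in weights]


if __name__ == "__main__":
    for position, p in enumerate(target_distribution(3, 7)):
        print(position, round(p, 3))
```

Spreading some probability onto neighbors of the true match, instead of using a one-hot target, lets near misses be penalised less than distant ones.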

The disparity learning unit may perform learning so that the difference between the disparity probability information estimated in step 3 and the target disparity probability distribution is minimized. Here, learning may mean learning the disparity probability information, but it is not limited thereto and may also mean learning the visual disparity descriptor. In addition, the disparity learning unit may learn the disparity probability information per resolution (per layer) and may learn the visual disparity descriptor per resolution (per layer).

Meanwhile, the matching degree between visual disparity descriptors in step 3 may be calculated by an inner product operation. This has been described above, so redundant description is omitted below.

In addition, the disparity probability information in step 3 may be estimated by applying a softmax function to the matching degrees with the corresponding points, which yields probability information on the disparity position (in particular, the disparity probability information for the down-sampling layer). The probability information on the disparity corresponding point position may be estimated based on Equation 3.

Here, the random variable for the disparity corresponding point position may take integer values within a given range. p^L denotes pixel coordinates on the first viewpoint image at time t, and p^R denotes pixel coordinates on the second viewpoint image at time t.
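The two operations named in the text, an inner product as the matching degree and a softmax over the candidates, can be sketched as follows. Descriptors are represented as plain lists of floats, and the function names are hypothetical; the sketch illustrates the general technique and is not a reproduction of Equation 3.

```python
import math
from typing import List, Sequence


def matching_degree(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two descriptors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the maximum is subtracted before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disparity_probabilities(left_descriptor: Sequence[float],
                            right_candidates: Sequence[Sequence[float]]) -> List[float]:
    """Probability of each candidate position in the second-view image for one
    pixel of the first-view image."""
    scores = [matching_degree(left_descriptor, c) for c in right_candidates]
    return softmax(scores)


if __name__ == "__main__":
    probs = disparity_probabilities([1.0, 0.0],
                                    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print([round(p, 3) for p in probs])
```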

To minimize the difference between the estimated disparity probability information for the disparity corresponding point position and the target disparity probability information, the disparity learning unit may, as one example, use the cross entropy satisfying Equation 4 as the cost function and learn so that the difference between the two sets of probability information is minimized.

Here, the two distributions compared in Equation 4 are the probability information estimate for the disparity corresponding point position, that is, the disparity probability information estimated by the disparity probability information estimator 121, and the target disparity probability information.
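For reference, a cross-entropy cost between a target distribution and an estimated distribution can be sketched as below. The sketch assumes the standard definition, the negative sum of target probability times the logarithm of the estimated probability, over a single pixel's candidate positions; the `eps` clamp and the function name are illustrative additions.

```python
import math
from typing import Sequence


def cross_entropy(target: Sequence[float], estimate: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Standard cross entropy -sum(t * log(e)); estimates are clamped to eps
    so that a zero estimate does not produce log(0)."""
    return -sum(t * math.log(max(e, eps)) for t, e in zip(target, estimate))


if __name__ == "__main__":
    target = [0.0, 1.0, 0.0]
    print(cross_entropy(target, [0.1, 0.8, 0.1]))  # estimate close to the target
    print(cross_entropy(target, [0.4, 0.2, 0.4]))  # estimate far from the target
```

The cost shrinks as the estimated distribution concentrates on the positions the target favors, which is the sense in which minimizing it reduces the difference between the two.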

The disparity learning unit may learn the probability information on the disparity corresponding point position (the disparity probability information) for each layer. That is, by applying the method of learning probability information on the disparity corresponding point position to the per-layer visual disparity descriptors at each resolution, the disparity learning unit may learn the probability information on the disparity corresponding point position, that is, the disparity probability information, for each layer.

In this way, the disparity learning unit may learn the estimated disparity probability information so that, at each resolution, the difference between the visual descriptors of disparity corresponding points in the first viewpoint image and the second viewpoint image at time t is minimized. The disparity learning unit may perform this learning using Equation 5.

Equation 5 is an example of a cost function for estimating the disparity of the first viewpoint image at time t. Its terms refer to the pixel coordinates of the first viewpoint image at time t, the pixel coordinates of the second viewpoint image at time t, and the disparity estimated for the pixel coordinates of the first viewpoint image at time t, that is, the disparity estimate at those pixel coordinates.

In addition, the disparity learning unit may estimate the disparity corresponding to the first viewpoint image and the second viewpoint image at time t for an up-sampling layer, which is any one of a plurality of layers, by applying the multi-layer CNN included in that up-sampling layer to one or more of the disparity probability information passed from the layer below the up-sampling layer and the disparity matching degree information corresponding to the up-sampling layer (see FIG. 3B) (step 4). Next, the disparity learning unit may learn so that the matching-degree difference between the target disparity in the up-sampling layer and the disparity estimated in step 4 is minimized (step 5). Here, the disparity matching degree information may be the matching degree (disparity matching degree) calculated using the visual disparity descriptor learned, through step 3, for the down-sampling layer corresponding to the up-sampling layer, or the disparity probability information estimated from that matching degree (see FIG. 3B). The target disparity may likewise be calculated using the visual disparity descriptor learned, through step 3, for the down-sampling layer corresponding to the up-sampling layer. That is, during up-sampling the visual disparity descriptor is not extracted again for each layer. Instead, up-sampling is performed in correspondence with the down-sampling layers, so that the visual disparity descriptors extracted during down-sampling can also be used in the up-sampling layers.

Here, steps 4 and 5 may be performed in turn for each of the plurality of layers through which, taking as input the disparity probability information learned at the target resolution through step 3, one or more of the disparity probability information and the disparity matching degree information corresponding to each up-sampling layer is passed sequentially up to the original resolution by applying the multi-layer CNN and the up-sampling included in each up-sampling layer.

As described above, this scene flow learning apparatus may be the same apparatus as the scene flow estimation apparatus described above. That is, a single apparatus may perform both estimation and learning of the scene flow, so an apparatus that performs both may be called a scene flow estimation/learning apparatus and may be given the same reference numeral 100 as in FIG. 1. The scene flow estimation/learning apparatus may also mean an apparatus that performs estimation and learning using the same deep learning network structure. This deep learning network structure may be understood with reference to FIGS. 1 to 5B.

The following describes learning in terms of optical flow by the scene flow learning apparatus according to the fifth embodiment of the present application. The scene flow learning apparatus according to the fifth embodiment may have the same structure (configuration) as the deep neural network structure-based scene flow estimation apparatus according to the third embodiment described above, sharing the same or corresponding technical features. Accordingly, even where omitted below, the structure of the scene flow estimation apparatus according to the third embodiment and the description given for that apparatus apply equally to the scene flow learning apparatus according to the fifth embodiment.

The scene flow learning apparatus according to the fifth embodiment of the present application may include a visual optical flow descriptor extractor, an optical flow probability information estimator, an optical flow estimator, and an optical flow learning unit. The visual optical flow descriptor extractor, the optical flow probability information estimator, and the optical flow estimator have been described in detail above, so their description is omitted below.

For a down-sampling layer, which is any one of the plurality of layers, the optical-flow learning unit may compute, as the learning target, target optical-flow probability information corresponding to the first-viewpoint image at time t and the first-viewpoint image at time (t-1), which precedes time t (step 1). Next, the optical-flow learning unit may extract a visual optical-flow descriptor by applying the multi-layer CNN included in the first down-sampling layer (step 2). Next, the optical-flow learning unit may estimate optical-flow probability information for the first down-sampling layer using the matching scores for the candidate optical-flow correspondence points in the down-sampling layer, computed from the visual optical-flow descriptor, and may then train the visual optical-flow descriptor so that the difference from the target optical-flow probability information for the first down-sampling layer is minimized (step 3).

Here, the optical-flow learning unit may perform learning for each of the plurality of layers through which the visual optical-flow descriptor, taking the first-viewpoint image at time t and the first-viewpoint image at time (t-1) as input, is sequentially down-sampled to the target resolution (the lowest resolution). In other words, the optical-flow learning unit may carry out steps 1 to 3 in turn for each of those layers.

The target optical-flow probability information computed by the optical-flow learning unit may be computed to be inversely proportional to distance, based on the distance relationship between candidate corresponding points on two images: the first-viewpoint image at time t at the resolution of the first down-sampling layer (the time-t image), and the first-viewpoint image at time (t-1) at the resolution of the down-sampling layer (the time-(t-1) image).

In other words, the optical-flow learning unit may express (generate, compute) the target optical-flow probability information in order to train the visual optical-flow descriptor extracted in step 2. For a pixel position on the time-t image, the target optical-flow probability information may express each candidate pixel position on the time-(t-1) image so that, as in the example of FIG. 7, its value is inversely proportional to its distance from the pixel position on the time-(t-1) image that actually corresponds to that pixel position on the time-t image.

FIG. 7 shows an example of the target optical-flow probability information considered during scene flow learning for scene flow estimation according to an embodiment (the fifth embodiment) of the present application. Specifically, FIG. 7 shows an example of the target optical-flow probability information for the optical-flow correspondence position as a function of the distance between the optical-flow correspondence point and neighboring pixels.

Referring to FIG. 7, the target optical-flow probability information may be expressed (generated) as an inverse relationship: the farther a candidate pixel position lies from the pixel position of the actual corresponding point (that is, the greater the distance to the neighboring pixel), the lower the matching probability.
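The inverse relationship described for FIG. 7 can be sketched as follows. This is only an illustration under stated assumptions: the source says the target probability falls with distance from the true correspondence but does not give the functional form, so the weight `1 / (1 + distance)`, the square search window, and the names `target_flow_probability`, `true_u`, `true_v`, and `radius` are all hypothetical.

```python
import math


def target_flow_probability(true_u, true_v, radius):
    """Target probability over integer displacements (u, v) in [-radius, radius]^2.

    Assumption: weight = 1 / (1 + Euclidean distance to the true displacement),
    normalized to sum to 1. The source only states an inverse relation.
    """
    offsets = range(-radius, radius + 1)
    weights = {
        (u, v): 1.0 / (1.0 + math.hypot(u - true_u, v - true_v))
        for v in offsets
        for u in offsets
    }
    total = sum(weights.values())
    return {uv: w / total for uv, w in weights.items()}


# True displacement is one pixel to the right; search window is 5 x 5.
target = target_flow_probability(true_u=1, true_v=0, radius=2)
best = max(target, key=target.get)
```

Under these assumptions the largest target probability sits at the true displacement, and candidates farther away receive progressively smaller values.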

The optical-flow learning unit may perform learning so that the difference between the optical-flow probability information estimated in step 3 and the target optical-flow probability distribution is minimized. Here, learning may mean learning the optical-flow probability information, but is not limited to this and may also mean learning the visual optical-flow descriptor. The optical-flow learning unit may learn the optical-flow probability information, and likewise the visual optical-flow descriptor, per resolution (per layer).

In step 3, the matching score between visual optical-flow descriptors may be computed by an inner product. This was described above, so the redundant description is omitted.
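A minimal sketch of inner-product matching, assuming descriptors are stored as an H x W grid of equal-length vectors and that candidates are searched in a square window around the pixel. The function name `matching_scores`, the window parameter `radius`, and the choice to skip out-of-image candidates are illustrative assumptions, not details taken from the source.

```python
def matching_scores(desc_t, desc_prev, y, x, radius):
    """Inner-product scores between the descriptor at (y, x) in the time-t map
    and the descriptors at (y + v, x + u) in the time-(t-1) map.

    desc_t / desc_prev: H x W grids of equal-length descriptor vectors (lists).
    Candidates that fall outside the image are skipped (an assumption).
    """
    height, width = len(desc_prev), len(desc_prev[0])
    anchor = desc_t[y][x]
    scores = {}
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            yy, xx = y + v, x + u
            if 0 <= yy < height and 0 <= xx < width:
                candidate = desc_prev[yy][xx]
                scores[(u, v)] = sum(a * b for a, b in zip(anchor, candidate))
    return scores


# Toy 3 x 3 maps with 2-D descriptors: the pattern [1, 0] sits at (y=1, x=1)
# at time t and at (y=1, x=2) at time (t-1).
desc_prev = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
desc_prev[1][2] = [1.0, 0.0]
desc_t = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
desc_t[1][1] = [1.0, 0.0]
scores = matching_scores(desc_t, desc_prev, y=1, x=1, radius=1)
```

In this toy case the highest score falls on the displacement (u, v) = (1, 0), where the two descriptors agree.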

Also in step 3, the optical-flow probability information may be estimated by applying a softmax function to the matching scores with the candidate corresponding points, which yields probability information for the optical-flow position (in particular, the optical-flow probability information for the first down-sampling layer). The probability information for the optical-flow correspondence position may be estimated based on Equation 6 below.

Here, the horizontal displacement of the optical-flow correspondence position is a random variable that takes integer values within a specified range, and the vertical displacement of the optical-flow correspondence position is likewise a random variable that takes integer values within a specified range. p^(t-1) denotes pixel coordinates on the first-viewpoint image at time (t-1), and p^t denotes pixel coordinates on the first-viewpoint image at time t.
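The softmax step can be sketched as below, assuming the matching scores are held in a dictionary keyed by integer displacement (u, v). The exact form of Equation 6 is not reproduced here, so this is the standard softmax only; the name `softmax_over_candidates` and the sample scores are hypothetical.

```python
import math


def softmax_over_candidates(scores):
    """Turn matching scores {(u, v): score} into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {uv: math.exp(s - peak) for uv, s in scores.items()}
    total = sum(exps.values())
    return {uv: e / total for uv, e in exps.items()}


# Three horizontal candidates with made-up matching scores.
scores = {(-1, 0): 0.2, (0, 0): 1.5, (1, 0): 3.0}
probs = softmax_over_candidates(scores)
```

Because softmax is monotonic, the candidate with the highest matching score also receives the highest probability.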

To minimize the difference between the estimated probability information for the optical-flow correspondence position (that is, the estimated optical-flow probability information) and the target optical-flow probability information, the optical-flow learning unit (not shown) may, for example, use the cross entropy satisfying Equation 7 below as the cost function and learn so that the difference between the two is minimized.

Here, the two quantities compared in Equation 7 are the estimated probability information for the optical-flow correspondence position, that is, the optical-flow probability information estimated by the optical-flow probability information estimation unit 122, and the target optical-flow probability information.
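A cross-entropy cost between a target distribution and an estimated distribution might look like the sketch below. It assumes the standard definition, the negative sum of target times log estimate, since the exact form of Equation 7 is not reproduced here; the name `cross_entropy`, the small constant `eps`, and the sample distributions are illustrative.

```python
import math


def cross_entropy(target, estimate, eps=1e-12):
    """H(target, estimate) = -sum_k target[k] * log(estimate[k]).

    eps guards against log(0); it is an implementation convenience.
    """
    return -sum(t * math.log(estimate[k] + eps) for k, t in target.items())


target = {(-1, 0): 0.1, (0, 0): 0.2, (1, 0): 0.7}
good = {(-1, 0): 0.1, (0, 0): 0.2, (1, 0): 0.7}  # matches the target
poor = {(-1, 0): 0.7, (0, 0): 0.2, (1, 0): 0.1}  # mass on the wrong candidate

loss_good = cross_entropy(target, good)
loss_poor = cross_entropy(target, poor)
```

The cost is lower when the estimate agrees with the target, which is the property the learning step relies on.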

The optical-flow learning unit may learn the probability information for the optical-flow correspondence position (the optical-flow probability information) per layer. That is, by applying the learning method for this probability information to the per-layer visual optical-flow descriptors at each resolution, the optical-flow learning unit may learn the optical-flow probability information layer by layer.

In this way, the optical-flow learning unit may learn the estimated optical-flow probability information so that the difference between the visual descriptors of optical-flow corresponding points is minimized in the per-resolution images of the consecutive image sequence, that is, the first-viewpoint images at time t and time (t-1). The optical-flow learning unit may perform learning using Equation 8 below.

Equation 8 is an example of a cost function for estimating the optical flow of the first-viewpoint image at time (t-1). Its two coordinate arguments are the pixel coordinates of the first-viewpoint image at time (t-1) and the pixel coordinates of the first-viewpoint image at time t, respectively. The horizontal term is the horizontal optical-flow estimate at the pixel coordinates of the first-viewpoint image at time (t-1), and the vertical term is the vertical optical-flow estimate at the pixel coordinates of the first-viewpoint image at time t.

In addition, for an up-sampling layer that is any one of the plurality of layers, the optical-flow learning unit may estimate the optical flow corresponding to the first-viewpoint image at time t and the first-viewpoint image at time (t-1) by applying the multi-layer CNN included in the up-sampling layer to at least one of the optical-flow probability information passed up from the layer below and the optical-flow matching information corresponding to the up-sampling layer (see FIG. 5B) (step 4). Next, the optical-flow learning unit may learn so that the difference in matching score between the target optical flow for the up-sampling layer and the optical flow estimated in step 4 is minimized (step 5). Here, the optical-flow matching information may be the matching score (optical-flow matching score) computed using the visual optical-flow descriptor learned in step 3 for the down-sampling layer corresponding to the up-sampling layer, or the optical-flow probability information estimated from that matching score (see FIG. 5B). The target optical flow may likewise be computed using the visual optical-flow descriptor learned in step 3 for the down-sampling layer corresponding to the up-sampling layer. That is, during up-sampling the visual optical-flow descriptor is not extracted again for each layer; instead, each up-sampling layer is paired with its corresponding down-sampling layer, so that the descriptor extracted during down-sampling can also be used in the up-sampling layer.

Steps 4 and 5 may be performed in turn for each of the plurality of layers through which, taking as input the optical-flow probability information learned at the target resolution in step 3, at least one of the optical-flow probability information and the optical-flow matching information corresponding to each up-sampling layer is passed sequentially up to the original resolution by applying the multi-layer CNN included in each up-sampling layer together with up-sampling.

The preceding examples describe learning on the disparity side and learning on the optical-flow side as being carried out by separate learning apparatuses, but the present application is not limited to this: both may be performed by a single learning apparatus. That is, an implementation in which one scene flow learning apparatus performs disparity-side and optical-flow-side learning together may also be provided.

The present scene flow estimation apparatus (the present scene flow estimation/learning apparatus) provides a hierarchical visual disparity descriptor extraction structure and a hierarchical visual optical-flow descriptor extraction structure that down-sample the input images in multiple stages so that matching scores can be measured per layer. On this basis it can extract, and learn, per-resolution visual disparity descriptors and per-resolution visual optical-flow descriptors. Using the visual disparity descriptors and visual optical-flow descriptors extracted hierarchically (per resolution, per layer), the apparatus can measure the disparity matching score and the optical-flow matching score between two images at the target resolution (the lowest resolution), and can estimate probability information for the position of the corresponding point from the measured matching scores. From the estimated probability information, the scene flow on the disparity side and on the optical-flow side can be estimated.

The present scene flow estimation apparatus (the present scene flow estimation/learning apparatus) also provides a disparity estimation structure and an optical-flow estimation structure. These take as input the learned matching score for the disparity correspondence position at the target resolution (the lowest resolution), namely the disparity matching score, or the probability information estimated from it (the disparity probability information at the first target resolution), together with the learned matching score for the optical-flow correspondence position at the target resolution (the lowest resolution), namely the optical-flow matching score, or the probability information estimated from it (the optical-flow probability information at the second target resolution). They up-sample stage by stage, passing the matching score or probability information to the next higher stage, and finally estimate the disparity and the optical flow at the original resolution. On this basis the apparatus can estimate, and learn, the disparity and the optical flow at the original resolution.

The 'probability information' estimated (obtained) during down-sampling may be normalized probability information, that is, a 'probability distribution', whereas the probability information estimated (obtained) during up-sampling may be probability information that has not been normalized.

The present application can thus estimate scene flow in real time and can be applied to technologies such as autonomous mobile robots and autonomous vehicles.

Below, the operation flow of the present application is briefly reviewed on the basis of the details described above.

FIG. 8 is an operation flowchart of the deep-neural-network-based scene flow estimation method according to the first embodiment of the present application.

The scene flow estimation method shown in FIG. 8 may be performed by the deep-neural-network-based scene flow estimation apparatus 100 according to the first embodiment of the present application described above. Accordingly, even where omitted below, the description of the scene flow estimation apparatus 100 according to the first embodiment applies equally to the scene flow estimation method according to the first embodiment.

Referring to FIG. 8, in step S11 visual descriptors, comprising a visual disparity descriptor and a visual optical-flow descriptor, may be extracted. Specifically, in step S11 the visual disparity descriptor at a first target resolution may be extracted by taking the first-viewpoint image and the second-viewpoint image at time t as input and sequentially down-sampling the visual disparity descriptor. Also in step S11, the visual optical-flow descriptor at a second target resolution may be extracted by taking the first-viewpoint image at time t and the first-viewpoint image at time (t-1), which precedes time t, as input and sequentially down-sampling the visual optical-flow descriptor.

During the sequential down-sampling in step S11, a per-resolution visual disparity descriptor may be extracted for each of the hierarchically arranged disparity layers by applying the multi-layer CNN included in that layer, and down-sampling may then be performed on the per-resolution visual disparity descriptor. Likewise, a per-resolution visual optical-flow descriptor may be extracted for each of the hierarchically arranged optical-flow layers by applying the multi-layer CNN included in that layer, and down-sampling may then be performed on the per-resolution visual optical-flow descriptor.
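The extract-then-down-sample pattern of step S11 can be sketched as a simple pyramid loop. Several things here are assumptions rather than details from the source: the down-sampling operator (2 x 2 average pooling), the scalar-valued grid, and the `extract` callback, which merely stands in for each layer's multi-layer CNN. The names `average_pool_2x` and `build_pyramid` are hypothetical.

```python
def average_pool_2x(grid):
    """Halve the resolution of an H x W grid of numbers by 2 x 2 average pooling."""
    pooled = []
    for y in range(0, len(grid) - 1, 2):
        row = []
        for x in range(0, len(grid[0]) - 1, 2):
            block = (grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
            row.append(sum(block) / 4.0)
        pooled.append(row)
    return pooled


def build_pyramid(grid, extract, levels):
    """Return per-resolution descriptors, from the input resolution downwards.

    extract: stand-in for the multi-layer CNN of each layer (hypothetical).
    """
    pyramid = []
    current = grid
    for _ in range(levels):
        current = extract(current)           # per-resolution descriptor
        pyramid.append(current)
        current = average_pool_2x(current)   # pass down to the next layer
    return pyramid


image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, extract=lambda g: g, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
```

With an 8 x 8 input and three levels, the loop keeps one descriptor map per resolution, so matching can later be measured at any layer rather than only at the coarsest one.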

Next, in step S12 probability information, comprising disparity probability information and optical-flow probability information, may be estimated. Specifically, in step S12 the disparity probability information at the first target resolution may be estimated using the matching scores for the candidate disparity correspondence points at the first target resolution, computed from the extracted visual disparity descriptor. Also in step S12, the optical-flow probability information at the second target resolution may be estimated using the matching scores for the candidate optical-flow correspondence points at the second target resolution, computed from the extracted visual optical-flow descriptor.

In step S12, the scene flow on the disparity side and on the optical-flow side may be estimated from the estimated probability information.

In step S12, the matching scores for the candidate disparity correspondence points may be computed by an inner product between visual disparity descriptors at the first target resolution, and the matching scores for the candidate optical-flow correspondence points may be computed by an inner product between visual optical-flow descriptors at the second target resolution. Here, the disparity probability information and the optical-flow probability information may be normalized probability information (probability distributions).
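For the disparity side, the inner-product scores and their normalization can be sketched in one function. The sketch assumes rectified stereo, so that candidates lie on the same row and the match for first-viewpoint pixel (y, x) is second-viewpoint pixel (y, x - d); that convention, the softmax normalization, and the names `disparity_probability` and `max_disparity` are assumptions for illustration.

```python
import math


def disparity_probability(desc_left, desc_right, y, x, max_disparity):
    """Softmax over inner-product scores for disparities d = 0..max_disparity.

    Assumption: rectified stereo, so the match for left pixel (y, x) is
    right pixel (y, x - d). Candidates left of the image border are skipped.
    """
    anchor = desc_left[y][x]
    scores = {}
    for d in range(max_disparity + 1):
        if x - d >= 0:
            candidate = desc_right[y][x - d]
            scores[d] = sum(a * b for a, b in zip(anchor, candidate))
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {d: math.exp(s - peak) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


# One-row toy maps: the pattern [2, 0] is at x=3 on the left, x=1 on the right.
desc_left = [[[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [2.0, 0.0]]]
desc_right = [[[0.0, 1.0], [2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]]
probs = disparity_probability(desc_left, desc_right, y=0, x=3, max_disparity=3)
```

In this toy case the distribution peaks at a disparity of 2, the horizontal shift between the two matching descriptors.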

Although not shown in the drawings, the scene flow estimation method according to the first embodiment of the present application may further include a step (not shown) of estimating the disparity and the optical flow by up-sampling. In this step, the disparity corresponding to the first-viewpoint image and the second-viewpoint image at time t is estimated by taking as input the matching scores for the candidate disparity correspondence points at the first target resolution, or the disparity probability information estimated from those scores, and applying sequential up-sampling from the first target resolution. The optical flow corresponding to the first-viewpoint image at time t and the first-viewpoint image at time (t-1), which precedes time t, is estimated by taking as input the matching scores for the candidate optical-flow correspondence points at the second target resolution, or the optical-flow probability information estimated from those scores, and applying sequential up-sampling from the second target resolution.

In the step of estimating probability information through up-sampling (not shown), up-sampling may be performed by applying a multi-layer CNN to the matching score (disparity matching score), or to the matching-score-based disparity probability information, corresponding to each of the hierarchically arranged disparity layers. Per-resolution disparity probability information for each disparity layer may then be estimated by applying another multi-layer CNN to the CNN output produced for that layer.

Likewise, in the step of estimating probability information through up-sampling (not shown), up-sampling may be performed by applying a multi-layer CNN to the matching score (optical-flow matching score), or to the matching-score-based optical-flow probability information, corresponding to each of the hierarchically arranged optical-flow layers. Per-resolution optical-flow probability information for each optical-flow layer may then be estimated by applying another multi-layer CNN to the CNN output produced for that layer.

In the description above, steps S11 to S12 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. Some steps may be omitted as needed, and the order of the steps may be changed.

The deep-neural-network-based scene flow estimation method according to the second embodiment of the present application may be performed by the deep-neural-network-based scene flow estimation apparatus according to the second embodiment described above. Accordingly, even where omitted below, the description of the scene flow estimation apparatus according to the second embodiment applies equally to the scene flow estimation method according to the second embodiment.

In brief, the scene flow estimation method according to the second embodiment of the present application may extract the visual disparity descriptor at the target resolution by taking the first-viewpoint image and the second-viewpoint image at time t as input and sequentially down-sampling the visual disparity descriptor. Next, the method may estimate the disparity probability information at the target resolution using the matching scores for the candidate disparity correspondence points at the target resolution, computed from the extracted visual disparity descriptor. In this way the method can estimate the scene flow on the disparity side.

The deep-neural-network-based scene flow estimation method according to the third embodiment of the present application may be performed by the deep-neural-network-based scene flow estimation apparatus according to the third embodiment described above. Accordingly, even where omitted below, the description of the scene flow estimation apparatus according to the third embodiment applies equally to the scene flow estimation method according to the third embodiment.

In brief, the scene flow estimation method according to the third embodiment of the present application may extract the visual optical-flow descriptor at the target resolution by taking the first-viewpoint image at time t and the first-viewpoint image at time (t-1), which precedes time t, as input and sequentially down-sampling the visual optical-flow descriptor. Next, the method may estimate the optical-flow probability information at the target resolution using the matching scores for the candidate optical-flow correspondence points at the target resolution, computed from the extracted visual optical-flow descriptor. In this way the method can estimate the scene flow on the optical-flow side.

The scene flow learning method for deep-neural-network-based scene flow estimation according to the fourth embodiment of the present application may be performed by the scene flow learning apparatus according to the fourth embodiment described above. Accordingly, even where omitted below, the description of the scene flow learning apparatus according to the fourth embodiment applies equally to the scene flow learning method according to the fourth embodiment.

In brief, for a down-sampling layer that is any one of the plurality of layers, the scene flow learning method according to the fourth embodiment of the present application may compute, as the learning target, target disparity probability information corresponding to the first-viewpoint image and the second-viewpoint image at time t (step 1). Next, the method may extract a visual disparity descriptor by applying the multi-layer CNN included in the down-sampling layer (step 2). Next, the method may estimate the disparity probability information for the down-sampling layer using the matching scores for the candidate disparity correspondence points in that layer, computed from the extracted visual disparity descriptor, and may then learn so that the difference from the target disparity probability information for the down-sampling layer is minimized (step 3).

Here, steps 1 to 3 may be performed in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially downsampled to the target resolution, with the first viewpoint image and the second viewpoint image at time t as input.
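Steps 1 to 3 at a single downsampling layer can be sketched as follows. This is a minimal illustration, not the patented implementation: the per-pixel descriptors are assumed to be already extracted, the matching degree is taken as an inner product between descriptors (as recited in the claims), the normalization is assumed to be a softmax, and the "difference" to be minimized is assumed to be a cross-entropy. The function names and the rectified-stereo convention (left pixel `x` matches right pixel `x - d`) are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two descriptor vectors (the matching degree)."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Normalize matching degrees into probabilities (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def disparity_probabilities(left_desc, right_desc, x, max_disp):
    """Probabilities over candidate disparities d = 0..max_disp for left pixel x.

    Assumes one rectified scanline: candidate d pairs left pixel x with
    right pixel x - d. Candidates outside the image get zero probability.
    """
    scores = []
    for d in range(max_disp + 1):
        xr = x - d
        if xr < 0:
            scores.append(float("-inf"))
        else:
            scores.append(dot(left_desc[x], right_desc[xr]))
    return softmax(scores)


def cross_entropy(target, predicted, eps=1e-12):
    """Assumed training loss between target and estimated probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    # Toy scanline: each left descriptor equals the right descriptor one pixel
    # to the left, so the true disparity is 1.
    right = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    left = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
    probs = disparity_probabilities(left, right, x=2, max_disp=2)
    print(probs, cross_entropy([0.0, 1.0, 0.0], probs))
```

In a real network the loss would be back-propagated into the CNN that produces the descriptors, and the same procedure would be repeated at each downsampling layer.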

In addition, the target disparity probability information may be calculated to be inversely proportional to distance, based on the distance relationship between corresponding candidate points on the image corresponding to the resolution of the downsampling layer of the first viewpoint image and on the image corresponding to the resolution of the downsampling layer of the second viewpoint image.
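One way to build such a target is sketched below. The text only states that the target probability is inversely proportional to distance; the specific weight `1 / (1 + distance)`, the normalization to sum to one, and the function name are assumptions made for illustration.

```python
def target_disparity_probabilities(true_disp, max_disp):
    """Target probabilities over candidate disparities 0..max_disp.

    Each candidate is weighted by 1 / (1 + |d - true_disp|), an assumed form of
    "inversely proportional to distance", then normalized to sum to 1.
    true_disp may be fractional, e.g. a ground-truth disparity that has been
    rescaled to the resolution of the downsampling layer.
    """
    weights = [1.0 / (1.0 + abs(d - true_disp)) for d in range(max_disp + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(target_disparity_probabilities(true_disp=2, max_disp=4))
```

Compared with a one-hot target, a distance-weighted target still rewards candidates that are close to the true correspondence, which is useful at coarse resolutions where the true disparity rarely falls exactly on a candidate.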

In addition, the scene flow learning method according to the fourth embodiment of the present application may estimate, for an upsampling layer which is any one of the plurality of layers, the disparity corresponding to the first viewpoint image and the second viewpoint image at time t by applying the multi-layer CNN included in the upsampling layer to one or more of the disparity probability information passed from the layer below the upsampling layer and the disparity matching degree information corresponding to the upsampling layer (step 4). Next, the method may learn so as to minimize the difference in matching degree between the target disparity at the upsampling layer and the disparity estimated in step 4 (step 5). Here, the disparity matching degree information may be the matching degree (disparity matching degree) calculated using the visual disparity descriptor learned through step 3 for the downsampling layer corresponding to the upsampling layer, or disparity probability information estimated from that matching degree. The target disparity may likewise be calculated using the visual disparity descriptor learned through step 3 for the downsampling layer corresponding to the upsampling layer. Steps 4 and 5 may be performed in turn for each of the plurality of layers through which, starting from the disparity probability information learned at the target resolution through step 3, one or more of the disparity probability information passed to the upper layer by applying the multi-layer CNN and upsampling included in each upsampling layer and the disparity matching degree information corresponding to each upsampling layer are sequentially passed up to the original resolution.
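The data flow of steps 4 and 5 can be illustrated with a deliberately simplified stand-in. In the sketch below the learned multi-layer CNN of the upsampling layer is replaced by a fixed nearest-neighbour upsampling, the disparity is read out of the probabilities as an expected value, and the quantity minimized in step 5 is assumed to be a mean absolute error. None of these choices comes from the source; all function names are hypothetical.

```python
def expected_disparity(probs):
    """Read a single disparity out of probabilities over candidates 0..N-1."""
    return sum(d * p for d, p in enumerate(probs))


def upsample_disparity_row(coarse_row, factor=2):
    """Nearest-neighbour upsampling of one row of disparities.

    Each value is repeated `factor` times and multiplied by `factor`, because a
    shift of d pixels at the coarse resolution spans d * factor pixels at the
    finer one. This stands in for the learned upsampling CNN.
    """
    fine = []
    for d in coarse_row:
        fine.extend([d * factor] * factor)
    return fine


def l1_loss(estimated, target):
    """Mean absolute difference between estimated and target disparities."""
    return sum(abs(e - t) for e, t in zip(estimated, target)) / len(target)


if __name__ == "__main__":
    coarse_probs = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # two coarse pixels
    coarse_row = [expected_disparity(p) for p in coarse_probs]
    fine_row = upsample_disparity_row(coarse_row)
    print(fine_row, l1_loss(fine_row, [2.0, 3.0, 4.0, 4.0]))
```

Repeating this from the target resolution up to the original resolution mirrors the layer-by-layer order in which steps 4 and 5 are applied.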

The scene flow learning method for scene flow estimation based on a deep neural network structure according to the fifth embodiment of the present application may be performed by the scene flow learning apparatus according to the fifth embodiment described above. Therefore, even where details are omitted below, the description given for the scene flow learning apparatus according to the fifth embodiment applies equally to the scene flow learning method according to the fifth embodiment.

In brief, the scene flow learning method according to the fifth embodiment of the present application may calculate, for a downsampling layer which is any one of a plurality of layers, target optical flow probability information corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), which is earlier than time t (step 1). Next, the method may extract a visual optical flow descriptor by applying the multi-layer CNN included in the downsampling layer (step 2). Next, the method may estimate optical flow probability information at the downsampling layer using the matching degree for the candidate group of optical flow correspondence points at that layer, calculated in consideration of the extracted visual optical flow descriptor, and may then learn the visual optical flow descriptor so as to minimize the difference from the target optical flow probability information for the downsampling layer (step 3).

Here, steps 1 to 3 may be performed in turn for each of the plurality of layers through which the visual optical flow descriptor is sequentially downsampled to the target resolution, with the first viewpoint image at time t and the first viewpoint image at time (t-1) as input.
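For optical flow the candidate correspondence points lie in a two-dimensional neighbourhood rather than along a scanline. The sketch below assumes a square search window of a given radius, an inner-product matching degree (as recited in the claims), and a softmax normalization; the window shape, the function name, and the data layout (`desc[y][x]` is a descriptor vector) are illustrative assumptions.

```python
import math


def flow_probabilities(desc_t, desc_prev, y, x, radius):
    """Probabilities over candidate displacements (u, v) for pixel (y, x).

    desc_t and desc_prev hold per-pixel descriptors of the first viewpoint
    image at time t and at time (t-1). Candidate (u, v) pairs pixel (y, x) at
    time t with pixel (y + v, x + u) at time (t-1); candidates that fall
    outside the image are skipped.
    """
    height, width = len(desc_prev), len(desc_prev[0])
    candidates, scores = [], []
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            yy, xx = y + v, x + u
            if 0 <= yy < height and 0 <= xx < width:
                score = sum(a * b for a, b in zip(desc_t[y][x], desc_prev[yy][xx]))
                candidates.append((u, v))
                scores.append(score)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


if __name__ == "__main__":
    def one_hot(i, n=9):
        return [1.0 if k == i else 0.0 for k in range(n)]

    # 3x3 toy images: every pixel at time (t-1) has a distinct descriptor.
    prev = [[one_hot(3 * r + c) for c in range(3)] for r in range(3)]
    cur = [[one_hot(3 * r + c) for c in range(3)] for r in range(3)]
    cur[1][1] = prev[1][2]  # centre pixel matches its right neighbour
    probs = flow_probabilities(cur, prev, y=1, x=1, radius=1)
    print(max(probs, key=probs.get))
```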

In addition, the target optical flow probability information may be calculated to be inversely proportional to distance, based on the distance relationship between corresponding candidate points on the image corresponding to the resolution of the downsampling layer of the first viewpoint image and on the image corresponding to the resolution of the downsampling layer of the first viewpoint image at time (t-1).
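A two-dimensional target of this kind might be constructed as in the following sketch. The Euclidean distance, the weight `1 / (1 + distance)`, the square candidate window, and the function name are assumptions; the text only requires that the target probability decrease in inverse proportion to distance.

```python
import math


def target_flow_probabilities(true_flow, radius):
    """Target probabilities over displacements (u, v) in a square window.

    Each candidate is weighted by 1 / (1 + Euclidean distance to the true
    displacement), an assumed form of "inversely proportional to distance",
    and the weights are normalized to sum to 1.
    """
    tu, tv = true_flow
    weights = {}
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            weights[(u, v)] = 1.0 / (1.0 + math.hypot(u - tu, v - tv))
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


if __name__ == "__main__":
    target = target_flow_probabilities(true_flow=(1, 0), radius=1)
    print(max(target, key=target.get))
```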

In addition, the scene flow learning method according to the fifth embodiment of the present application may estimate, for an upsampling layer which is any one of the plurality of layers, the optical flow corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), which is earlier than time t, by applying the multi-layer CNN included in the upsampling layer to one or more of the optical flow probability information passed from the layer below the upsampling layer and the optical flow matching degree information corresponding to the upsampling layer (step 4). Next, the method may learn so as to minimize the difference in matching degree between the target optical flow at the upsampling layer and the optical flow estimated in step 4 (step 5). Here, the optical flow matching degree information may be the matching degree (optical flow matching degree) calculated using the visual optical flow descriptor learned through step 3 for the downsampling layer corresponding to the upsampling layer, or optical flow probability information estimated from that matching degree. The target optical flow may likewise be calculated using the visual optical flow descriptor learned through step 3 for the downsampling layer corresponding to the upsampling layer. Steps 4 and 5 may be performed in turn for each of the plurality of layers through which, starting from the optical flow probability information learned at the target resolution through step 3, one or more of the optical flow probability information passed to the upper layer by applying the multi-layer CNN and upsampling included in each upsampling layer and the optical flow matching degree information corresponding to each upsampling layer are sequentially passed up to the original resolution.

The scene flow estimation method and the scene flow learning method for scene flow estimation according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description of the present application is given for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can readily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: scene flow estimation apparatus
110: visual descriptor extraction unit
120: probability information estimation unit
130: estimation unit

Claims

A scene flow estimation method based on a deep neural network structure, performed by a scene flow estimation apparatus based on a deep neural network structure, the method comprising:
(a) extracting a visual disparity descriptor at a first target resolution while sequentially downsampling the visual disparity descriptor, with a first viewpoint image and a second viewpoint image at time t as input, and extracting a visual optical flow descriptor at a second target resolution while sequentially downsampling the visual optical flow descriptor, with the first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, as input; and
(b) estimating disparity probability information at the first target resolution using a matching degree for a candidate group of disparity correspondence points at the first target resolution, calculated in consideration of the extracted visual disparity descriptor, and estimating optical flow probability information at the second target resolution using a matching degree for a candidate group of optical flow correspondence points at the second target resolution, calculated in consideration of the extracted visual optical flow descriptor.

The method of claim 1, further comprising:
(c) estimating a disparity corresponding to the first viewpoint image and the second viewpoint image at time t by applying sequential upsampling from the first target resolution, with the matching degree for the candidate group of disparity correspondence points at the first target resolution, or the disparity probability information estimated from that matching degree, as input; and estimating an optical flow corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), which is earlier than time t, by applying sequential upsampling from the second target resolution, with the matching degree for the candidate group of optical flow correspondence points at the second target resolution, or the optical flow probability information estimated from that matching degree, as input.

The method of claim 1,
wherein, in step (a),
when the downsampling is performed sequentially, a visual disparity descriptor for each resolution corresponding to each of a plurality of hierarchically arranged disparity layers is extracted by applying a multi-layer CNN included in each of the disparity layers, and downsampling is performed on the visual disparity descriptor for each resolution, and
when the downsampling is performed sequentially, a visual optical flow descriptor for each resolution corresponding to each of a plurality of hierarchically arranged optical flow layers is extracted by applying a multi-layer CNN included in each of the optical flow layers, and downsampling is performed on the visual optical flow descriptor for each resolution.

The method of claim 1,
wherein, in step (b), the matching degree for the candidate group of disparity correspondence points is calculated by an inner product operation between visual disparity descriptors at the first target resolution,
and the matching degree for the candidate group of optical flow correspondence points is calculated by an inner product operation between visual optical flow descriptors at the second target resolution.

The method of claim 1,
wherein the disparity probability information and the optical flow probability information are normalized probability information.

The method of claim 2,
wherein, in step (c),
when the upsampling is performed sequentially, upsampling is carried out by applying a multi-layer CNN to the matching degree, or the matching-degree-based disparity probability information, corresponding to each of a plurality of hierarchically arranged disparity layers, and the disparity for each resolution corresponding to each of the disparity layers is estimated by applying another multi-layer CNN to the CNN output value produced for each of the disparity layers, and
when the upsampling is performed sequentially, upsampling is carried out by applying a multi-layer CNN to the matching degree, or the matching-degree-based optical flow probability information, corresponding to each of a plurality of hierarchically arranged optical flow layers, and the optical flow for each resolution corresponding to each of the optical flow layers is estimated by applying another multi-layer CNN to the CNN output value produced for each of the optical flow layers.

A scene flow estimation method based on a deep neural network structure, performed by a scene flow estimation apparatus based on a deep neural network structure, the method comprising:
(a) extracting a visual disparity descriptor at a target resolution while sequentially downsampling the visual disparity descriptor, with a first viewpoint image and a second viewpoint image at time t as input; and
(b) estimating disparity probability information at the target resolution using a matching degree for a candidate group of disparity correspondence points at the target resolution, calculated in consideration of the extracted visual disparity descriptor.

A scene flow estimation method based on a deep neural network structure, performed by a scene flow estimation apparatus based on a deep neural network structure, the method comprising:
(a) extracting a visual optical flow descriptor at a target resolution while sequentially downsampling the visual optical flow descriptor, with a first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, as input; and
(b) estimating optical flow probability information at the target resolution using a matching degree for a candidate group of optical flow correspondence points at the target resolution, calculated in consideration of the extracted visual optical flow descriptor.

A scene flow learning method for scene flow estimation based on a deep neural network structure, performed by a scene flow learning apparatus, the method comprising:
(a) calculating, as a learning target, target disparity probability information corresponding to a first viewpoint image and a second viewpoint image at time t for a downsampling layer, which is any one of a plurality of layers;
(b) extracting a visual disparity descriptor by applying a multi-layer CNN included in the downsampling layer; and
(c) estimating disparity probability information at the downsampling layer using a matching degree for a candidate group of disparity correspondence points at the downsampling layer, calculated in consideration of the extracted visual disparity descriptor, and then learning so as to minimize the difference from the target disparity probability information for the downsampling layer,
wherein steps (a) to (c) are performed in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially downsampled to a target resolution, with the first viewpoint image and the second viewpoint image at time t as input.

The method of claim 9,
wherein the target disparity probability information is calculated to be inversely proportional to distance, based on the distance relationship between corresponding candidate points on the image corresponding to the resolution of the downsampling layer of the first viewpoint image and on the image corresponding to the resolution of the downsampling layer of the second viewpoint image.

The method of claim 9, further comprising:
(d) estimating, for an upsampling layer which is any one of the plurality of layers, a disparity corresponding to the first viewpoint image and the second viewpoint image at time t by applying a multi-layer CNN included in the upsampling layer to at least one of disparity probability information passed from the layer below the upsampling layer and disparity matching degree information corresponding to the upsampling layer; and
(e) learning so as to minimize the difference in matching degree between a target disparity at the upsampling layer and the disparity estimated in step (d),
wherein the disparity matching degree information is the matching degree calculated using the visual disparity descriptor learned through step (c) for the downsampling layer corresponding to the upsampling layer, or disparity probability information estimated from that matching degree,
wherein the target disparity is calculated using the visual disparity descriptor learned through step (c) for the downsampling layer corresponding to the upsampling layer, and
wherein steps (d) and (e) are performed in turn for each of the plurality of layers through which, starting from the disparity probability information learned at the target resolution through step (c), at least one of the disparity probability information passed to the upper layer by applying the multi-layer CNN and upsampling included in each upsampling layer and the disparity matching degree information corresponding to each upsampling layer is sequentially passed up to the original resolution.

A scene flow learning method for scene flow estimation based on a deep neural network structure, performed by a scene flow learning apparatus, the method comprising:
(a) calculating target optical flow probability information corresponding to a first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, for a downsampling layer, which is any one of a plurality of layers;
(b) extracting a visual optical flow descriptor by applying a multi-layer CNN included in the downsampling layer; and
(c) estimating optical flow probability information at the downsampling layer using a matching degree for a candidate group of optical flow correspondence points at the downsampling layer, calculated in consideration of the extracted visual optical flow descriptor, and then learning the visual optical flow descriptor so as to minimize the difference from the target optical flow probability information for the downsampling layer,
wherein steps (a) to (c) are performed in turn for each of the plurality of layers through which the visual optical flow descriptor is sequentially downsampled to a target resolution, with the first viewpoint image at time t and the first viewpoint image at time (t-1) as input.

The method of claim 12,
wherein the target optical flow probability information is calculated to be inversely proportional to distance, based on the distance relationship between corresponding candidate points on the image corresponding to the resolution of the downsampling layer of the first viewpoint image and on the image corresponding to the resolution of the downsampling layer of the first viewpoint image at time (t-1).

The method of claim 12, further comprising:
(d) estimating, for an upsampling layer which is any one of the plurality of layers, an optical flow corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), which is earlier than time t, by applying a multi-layer CNN included in the upsampling layer to at least one of optical flow probability information passed from the layer below the upsampling layer and optical flow matching degree information corresponding to the upsampling layer; and
(e) learning so as to minimize the difference in matching degree between a target optical flow at the upsampling layer and the optical flow estimated in step (d),
wherein the optical flow matching degree information is the matching degree calculated using the visual optical flow descriptor learned through step (c) for the downsampling layer corresponding to the upsampling layer, or optical flow probability information estimated from that matching degree,
wherein the target optical flow is calculated using the visual optical flow descriptor learned through step (c) for the downsampling layer corresponding to the upsampling layer, and
wherein steps (d) and (e) are performed in turn for each of the plurality of layers through which, starting from the optical flow probability information learned at the target resolution through step (c), at least one of the optical flow probability information passed to the upper layer by applying the multi-layer CNN and upsampling included in each upsampling layer and the optical flow matching degree information corresponding to each upsampling layer is sequentially passed up to the original resolution.

A scene flow estimation apparatus based on a deep neural network structure, comprising:
a visual descriptor extraction unit that extracts a visual disparity descriptor at a first target resolution while sequentially downsampling the visual disparity descriptor, with a first viewpoint image and a second viewpoint image at time t as input, and extracts a visual optical flow descriptor at a second target resolution while sequentially downsampling the visual optical flow descriptor, with the first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, as input; and
a probability information estimation unit that estimates disparity probability information at the first target resolution using a matching degree for a candidate group of disparity correspondence points at the first target resolution, calculated in consideration of the extracted visual disparity descriptor, and estimates optical flow probability information at the second target resolution using a matching degree for a candidate group of optical flow correspondence points at the second target resolution, calculated in consideration of the extracted visual optical flow descriptor.

The apparatus of claim 15, further comprising:
an estimation unit that estimates a disparity corresponding to the first viewpoint image and the second viewpoint image at time t by applying sequential upsampling from the first target resolution, with the disparity probability information at the first target resolution as input, and estimates an optical flow corresponding to the first viewpoint image at time t and the first viewpoint image at time (t-1), which is earlier than time t, by applying sequential upsampling from the second target resolution, with the optical flow probability information at the second target resolution as input.

A scene flow estimation apparatus based on a deep neural network structure, comprising:
a visual disparity descriptor extraction unit that extracts a visual disparity descriptor at a target resolution while sequentially downsampling the visual disparity descriptor, with a first viewpoint image and a second viewpoint image at time t as input; and
a disparity probability information estimation unit that estimates disparity probability information at the target resolution using a matching degree for a candidate group of disparity correspondence points at the target resolution, calculated in consideration of the extracted visual disparity descriptor.

A scene flow estimation apparatus based on a deep neural network structure, comprising:
a visual optical flow descriptor extraction unit that extracts a visual optical flow descriptor at a target resolution while sequentially downsampling the visual optical flow descriptor, with a first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, as input; and
an optical flow probability information estimation unit that estimates optical flow probability information at the target resolution using a matching degree for a candidate group of optical flow correspondence points at the target resolution, calculated in consideration of the extracted visual optical flow descriptor.

A scene flow learning apparatus for scene flow learning, comprising:
a disparity learning unit that calculates target disparity probability information corresponding to a first viewpoint image and a second viewpoint image at time t for a downsampling layer, which is any one of a plurality of layers,
extracts a visual disparity descriptor by applying a multi-layer CNN included in the downsampling layer,
estimates disparity probability information at the downsampling layer using a matching degree for a candidate group of disparity correspondence points at the downsampling layer, calculated in consideration of the extracted visual disparity descriptor, and then learns the visual disparity descriptor so as to minimize the difference from the target disparity probability distribution,
wherein the disparity learning unit performs the learning in turn for each of the plurality of layers through which the visual disparity descriptor is sequentially downsampled to a target resolution, with the first viewpoint image and the second viewpoint image at time t as input.

A scene flow learning apparatus for scene flow learning, comprising:
an optical flow learning unit that calculates target optical flow probability information corresponding to a first viewpoint image at time t and a first viewpoint image at time (t-1), which is earlier than time t, for a downsampling layer, which is any one of a plurality of layers,
extracts a visual optical flow descriptor by applying a multi-layer CNN included in the downsampling layer,
estimates optical flow probability information at the downsampling layer using a matching degree for a candidate group of optical flow correspondence points at the downsampling layer, calculated in consideration of the extracted visual optical flow descriptor, and then learns the visual optical flow descriptor so as to minimize the difference from the target optical flow probability information,
wherein the optical flow learning unit performs the learning in turn for each of the plurality of layers through which the visual optical flow descriptor is sequentially downsampled to a target resolution, with the first viewpoint image at time t and the first viewpoint image at time (t-1) as input.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 14.