KR20200095251A

KR20200095251A - Apparatus and method for estimating optical flow and disparity via cycle consistency

Info

Publication number: KR20200095251A
Application number: KR1020190013102A
Authority: KR
Inventors: 함범섭; 박현종
Original assignee: 연세대학교 산학협력단
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2020-08-10
Also published as: KR102186764B1

Abstract

The present invention provides an apparatus and a method for estimating optical flow and disparity that can reduce training time. The apparatus comprises: a stereo image acquisition unit that acquires stereo images of a plurality of frames; and an estimation unit that includes a plurality of convolutional neural networks (hereinafter, CNNs) having the same structure and, having been trained in advance on a pattern recognition method, the same learned weights. From a stereo image set of two consecutive frames received from the stereo image acquisition unit, the estimation unit simultaneously estimates and outputs the optical flow between the images of consecutive frames and the disparity between the left image and the right image, which are distinguished by parallax. During training, a total loss is backpropagated to the plurality of CNNs. The total loss includes a cycle consistency loss obtained from a cycle transition result, which is the sum of the position changes of the corresponding points found by cycling, from each pixel of one of the four images of the input stereo image set of two consecutive frames, through the remaining images in predetermined forward and reverse directions.

Description

Apparatus and method for estimating optical flow and disparity via cycle consistency

The present invention relates to an apparatus and method for estimating optical flow and disparity, and more particularly to an apparatus and method for simultaneously estimating optical flow and disparity based on cycle consistency.

Analysis of dense correspondence between multiple images is a fundamental task in many image processing and computer vision applications, and is used especially widely in vehicle applications, including advanced driver assistance systems (ADAS) and autonomous driving systems.

Correspondence analysis between images is mainly used in image stitching, alignment, recognition, stereo matching, and optical flow.

Meanwhile, as research on deep learning with artificial neural networks has advanced, deep learning is increasingly used for disparity estimation for stereo matching and for optical flow estimation. Although much research has been done on deep-learning-based disparity and optical flow estimation, disparity estimation, which matches stereo images, and optical flow estimation, which infers object motion across consecutive frames, have largely been studied independently.

Recently, however, it has been found that learning to estimate disparity and optical flow jointly with deep learning performs better than estimating each separately. Even so, whether the two are learned separately or jointly, existing work has been based on supervised learning, which uses training data whose images are labeled with ground truth.

Training data for supervised learning of disparity or optical flow individually is already scarce, and training data for supervised learning of both at once is scarcer still. This lack of training data greatly reduces the accuracy of optical flow and disparity estimation.

Generating training data with ground-truth labels, however, takes a great deal of time, effort, and cost. Methods that synthesize virtual images together with ground truth to produce synthetic training data have therefore been studied, but networks trained on synthetic data mostly fail to reach the required performance because the synthetic images lack realism and variability.

Accordingly, there is a need for a technique by which an optical flow and disparity estimation apparatus is trained, without ground-truth-labeled training data, to infer disparity and optical flow simultaneously from ordinary stereo video in an unsupervised manner.

Korean Registered Patent No. 10-1849605 (registered on 2018.04.11)

An object of the present invention is to provide an apparatus and method for estimating optical flow and disparity that achieve high simultaneous estimation performance even when trained in an unsupervised manner that does not require labeled training data.

Another object of the present invention is to provide an apparatus and method for estimating optical flow and disparity that can reduce training time by training for optical flow estimation and disparity estimation at the same time.

Still another object of the present invention is to provide an apparatus and method for estimating optical flow and disparity that are trained without supervision, based on the cycle consistency of corresponding points between the left and right images of consecutive frames in a multi-frame stereo video that contains no ground truth.

To achieve the above objects, an optical flow and disparity estimation apparatus according to an embodiment of the present invention comprises: a stereo image acquisition unit that acquires stereo images of a plurality of frames; and an estimation unit that includes a plurality of convolutional neural networks (hereinafter, CNNs) having the same structure and, having been trained in advance on a pattern recognition method, the same learned weights, and that, from a stereo image set of two consecutive frames received from the stereo image acquisition unit, simultaneously estimates and outputs the optical flow between the images of consecutive frames and the disparity between the left image and the right image, which are distinguished by parallax. The plurality of CNNs of the estimation unit may be trained with learned weights updated by backpropagating a total loss. The total loss includes a cycle consistency loss obtained from a cycle transition result, which represents the sum of the position changes of the corresponding points found by cycling, from each pixel of one of the four images of the stereo image set of two consecutive frames input during training, through the remaining images in each of a predetermined forward direction and reverse direction.
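The cycle transition result described above can be illustrated with a toy sketch. The code below is not the patented implementation: it assumes integer pixel positions and represents each offset field as a plain dictionary from a pixel position to a displacement, whereas a trained network would produce dense, sub-pixel fields. The function name `cycle_transition` and the toy fields are hypothetical.

```python
def cycle_transition(start, offset_fields):
    """Follow a pixel through a chain of offset fields and return the
    net displacement, i.e. the sum of the per-step position changes.

    start         -- (x, y) pixel position in the first image
    offset_fields -- list of dicts mapping (x, y) -> (dx, dy); each dict is a
                     simplified stand-in for one estimated offset field
    """
    x, y = start
    for field in offset_fields:
        dx, dy = field[(x, y)]
        x, y = x + dx, y + dy
    return (x - start[0], y - start[1])


# Toy offset fields for one pixel travelling around four images and back.
# The image order l1 -> l2 -> r2 -> r1 -> l1 is an assumption for illustration.
f_l1_l2 = {(5, 5): (2, 0)}
f_l2_r2 = {(7, 5): (-3, 0)}
f_r2_r1 = {(4, 5): (-2, 0)}
f_r1_l1 = {(2, 5): (3, 0)}

consistent = cycle_transition((5, 5), [f_l1_l2, f_l2_r2, f_r2_r1, f_r1_l1])

# Perturbing one field breaks the cycle and leaves a nonzero residual.
f_r1_l1_bad = {(2, 5): (4, 0)}
inconsistent = cycle_transition((5, 5), [f_l1_l2, f_l2_r2, f_r2_r1, f_r1_l1_bad])
```

When the four offsets agree, the pixel returns to where it started and the net displacement is zero; any residual is what a cycle consistency loss would penalize.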

The cycle consistency loss may be obtained by applying to the cycle transition result a cycle confidence map, which indicates whether a corresponding point exists between each pair of images along the forward and reverse cycle paths through the four images.

When the per-pixel cycle consistency loss in the forward or reverse direction exceeds a predetermined threshold, the cycle consistency loss for that pixel may be output as the threshold value.
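A minimal sketch of how the confidence map and the threshold might combine is given below. The text does not specify the norm, the order of masking and truncation, or the averaging, so the L1 norm, mask-then-truncate order, and mean over pixels used here are all assumptions, as is the function name `masked_truncated_cycle_loss`.

```python
def masked_truncated_cycle_loss(cycle_displacements, confidence, threshold):
    """Average per-pixel cycle loss with confidence masking and truncation.

    cycle_displacements -- list of (dx, dy) net displacements after one cycle
    confidence          -- list of 0/1 values; 0 marks pixels assumed to have
                           no valid corresponding point (e.g. occlusion)
    threshold           -- per-pixel loss is capped at this value
    """
    if not cycle_displacements:
        return 0.0
    total = 0.0
    for (dx, dy), c in zip(cycle_displacements, confidence):
        per_pixel = c * (abs(dx) + abs(dy))   # assumed L1 norm, masked
        total += min(per_pixel, threshold)    # truncate at the threshold
    return total / len(cycle_displacements)


loss = masked_truncated_cycle_loss(
    cycle_displacements=[(0, 0), (3, 4), (1, 0), (1, 1)],
    confidence=[1, 1, 0, 1],
    threshold=2.0,
)
```

Truncation keeps a few badly mismatched pixels from dominating the gradient, and the mask removes pixels for which no correspondence is expected to exist.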

The total loss may further include a reconstruction loss obtained during training from the pixel value and the gradient value at the corresponding point of each pixel, computed for each of the following among the four images: two images in frame order, two images of the same frame, and two images that differ in both frame and parallax.

The reconstruction loss may be obtained by additionally applying a confidence map indicating whether a corresponding point exists, for each of: two images in frame order, two images of the same frame, and two images that differ in both frame and parallax.

The total loss may further include an optical flow smoothness loss, which limits variation in the optical flow obtained across frame order in the four images, and a disparity smoothness loss, which limits variation in the disparity obtained across parallax.

The optical flow and disparity estimation apparatus may further include a learning unit that is coupled while the plurality of CNNs of the estimation unit are being trained and that obtains the learned weights. The learning unit may include: an offset acquisition unit composed of a plurality of Siamese CNNs that have the same structure as the plurality of CNNs of the estimation unit and that obtain, as offsets, the position changes of corresponding points for different pairwise combinations of the four images; a loss measurement unit that calculates the cycle consistency loss, the reconstruction loss, the optical flow smoothness loss, and the disparity smoothness loss using the plurality of offsets obtained from the respective Siamese CNNs of the offset acquisition unit; and a loss backpropagation unit that obtains the total loss by applying a predetermined loss weight to each of the cycle consistency loss, the reconstruction loss, the optical flow smoothness loss, and the disparity smoothness loss, backpropagates the total loss to the plurality of Siamese CNNs to update their learned weights, and, when training of the plurality of Siamese CNNs is complete, transfers the learned weights to the plurality of CNNs of the estimation unit.
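The weighted combination performed by the loss backpropagation unit can be sketched as a simple weighted sum. The dictionary keys and the numeric weights below are hypothetical placeholders; the text says only that each loss has a predetermined weight and does not give values.

```python
def total_loss(losses, loss_weights):
    """Combine individual loss terms into one scalar using per-term weights.

    losses       -- dict mapping a loss name to its scalar value
    loss_weights -- dict mapping the same names to predetermined weights
    """
    return sum(loss_weights[name] * value for name, value in losses.items())


# Hypothetical values, chosen only to make the arithmetic easy to follow.
example_losses = {
    "cycle_consistency": 2.0,
    "reconstruction": 4.0,
    "flow_smoothness": 1.0,
    "disparity_smoothness": 1.0,
}
example_weights = {
    "cycle_consistency": 1.0,
    "reconstruction": 0.5,
    "flow_smoothness": 0.25,
    "disparity_smoothness": 0.25,
}

combined = total_loss(example_losses, example_weights)
```

In a real training loop this scalar would be the quantity that is backpropagated to the shared network weights.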

To achieve the above objects, an optical flow and disparity estimation method according to another embodiment of the present invention comprises: acquiring stereo images of a plurality of frames; and, using a plurality of convolutional neural networks (hereinafter, CNNs) having the same structure and, having been trained in advance on a pattern recognition method, the same learned weights, simultaneously estimating and outputting, from a stereo image set of two consecutive frames among the stereo images of the plurality of frames, the optical flow between the images of consecutive frames and the disparity between the left image and the right image, which are distinguished by parallax. The plurality of CNNs may be trained with learned weights updated by backpropagating a total loss. The total loss includes a cycle consistency loss obtained from a cycle transition result, which represents the sum of the position changes of the corresponding points found by cycling, from each pixel of one of the four images of the stereo image set of two consecutive frames input during training, through the remaining images in each of a predetermined forward direction and reverse direction.

Accordingly, the optical flow and disparity estimation apparatus and method according to embodiments of the present invention can train optical flow and disparity estimation simultaneously and without supervision by exploiting cycle consistency, that is, the requirement that cycles through corresponding points in the stereo images of consecutive frames agree. Because training is unsupervised and requires no training data containing ground truth, the apparatus and method achieve very high optical flow and disparity estimation performance while greatly reducing training time. In addition, the confidence map improves the accuracy of training based on cycle consistency, and the reconstruction loss and smoothness losses, which are also backpropagated during training, further improve optical flow and disparity estimation performance.

FIG. 1 shows a schematic structure of an optical flow and disparity estimation apparatus according to an embodiment of the present invention.
FIG. 2 shows a detailed configuration of the learning unit of FIG. 1.
FIG. 3 is a diagram for explaining the operation of the offset acquisition unit of FIG. 2.
FIG. 4 shows a detailed configuration of the loss measurement unit of FIG. 2.
FIG. 5 is a diagram for explaining the concept of the cycle consistency loss.
FIG. 6 is a diagram for explaining the concept of the confidence map.
FIGS. 7 and 8 are diagrams for explaining the concept of the reconstruction loss.
FIG. 9 shows an optical flow and disparity estimation method according to an embodiment of the present invention.
FIGS. 10 and 11 show results comparing the performance of the optical flow and disparity estimation apparatus and method according to the present embodiment.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the contents described in those drawings.

Hereinafter, the present invention is described in detail by explaining preferred embodiments with reference to the accompanying drawings. The present invention may, however, be implemented in various different forms and is not limited to the embodiments described. Parts irrelevant to the description are omitted in order to describe the present invention clearly, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 shows a schematic structure of an optical flow and disparity estimation apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the optical flow and disparity estimation apparatus of the present embodiment includes a stereo image acquisition unit 100, an estimation unit 200, and a learning unit 300.

The stereo image acquisition unit 100 acquires the stereo images for which optical flow and disparity are to be estimated. Here, a stereo image is an image that can be obtained from a stereo camera and consists of a plurality of consecutive frames captured from two different viewpoints. Depending on the structure of the stereo camera, the stereo image may be obtained as top and bottom images or as left and right images; here it is assumed, as an example, that a left image and a right image are acquired.

The stereo image acquisition unit 100 transfers the stereo images of two consecutive frames, among the acquired stereo images of the plurality of frames, to the estimation unit 200.

As an example, the stereo image acquisition unit 100 transfers the t-th frame and the (t+1)-th frame to the estimation unit 200. That is, it transfers the left and right images of the t-th frame, the stereo image set (l₁, r₁), and the stereo image set (l₂, r₂) of the (t+1)-th frame.

The estimation unit 200 includes an optical flow estimation unit 210 and a disparity estimation unit 220, each implemented with a pre-trained artificial neural network, for example a convolutional neural network (hereinafter, CNN), and estimates and outputs optical flow and disparity from the two stereo image sets received from the stereo image acquisition unit 100.

The optical flow estimation unit 210 may include two CNNs that estimate optical flow by finding the movement of corresponding points, that is, the motion of objects, in the left images and in the right images of the two consecutive frames. The disparity estimation unit 220 may include at least one CNN that estimates disparity, which is the positional difference between corresponding points in the left image and the right image of each frame.

Here, the plurality of CNNs included in the optical flow estimation unit 210 and the disparity estimation unit 220 may be implemented as Siamese CNNs, which have the same structure and are trained together so that the same weights apply to all of them.

Optical flow tracks the motion of objects across consecutive frames, whereas disparity measures the difference between the left and right images of the same frame. Both, however, estimate the positional shift between corresponding pixels in two images, that is, the correspondence shift, and are similar in that respect. The networks can therefore be implemented as Siamese CNNs that have the same structure and are trained together with the same weights.

The optical flow estimation unit 210 includes two CNNs and estimates, as optical flow, the correspondence shift (F_{l1,l2}) between the left images (l₁, l₂) and the correspondence shift (F_{r1,r2}) between the right images (r₁, r₂) of the t-th and (t+1)-th frames in the two stereo image sets ((l₁, r₁), (l₂, r₂)) received from the stereo image acquisition unit 100.

The disparity estimation unit 220 estimates, as disparity, the correspondence shift (F_{l1,r1}) from the left image (l₁) to the right image (r₁) of the t-th frame in the two stereo image sets ((l₁, r₁), (l₂, r₂)) received from the stereo image acquisition unit 100. The disparity estimation unit 220 may include a plurality of CNNs and also estimate the correspondence shift (F_{l2,r2}) from the left image (l₂) to the right image (r₂) of the (t+1)-th frame. In some cases it may also estimate the correspondence shifts (F_{r1,l1}, F_{r2,l2}) from the right images (r₁, r₂) to the left images (l₁, l₂) of the t-th and (t+1)-th frames, respectively.

As described above, in the optical flow and disparity estimation apparatus according to the present embodiment, the estimation unit 200 is composed of a plurality of pre-trained Siamese CNNs and can simultaneously estimate and output optical flow and disparity from the received stereo images of two frames.

For the plurality of Siamese CNNs of the estimation unit 200 to estimate optical flow and disparity accurately, however, they must have been trained in advance.

Accordingly, the optical flow and disparity estimation apparatus according to the embodiment may further include a learning unit 300 for training the estimation unit 200 without separate training data labeled with ground truth.

Like the estimation unit 200, the learning unit 300 receives the stereo image sets of two frames ((l₁, r₁), (l₂, r₂)) from the stereo image acquisition unit 100. For these stereo image sets, it obtains eight offsets using eight Siamese CNNs that have the same configuration as those of the estimation unit 200, measures losses from the eight offsets, and backpropagates the losses to the eight Siamese CNNs to update their weights. In other words, it trains the eight Siamese CNNs.

Because the eight Siamese CNNs have the same configuration as those of the estimation unit 200, the updated weights of the eight Siamese CNNs are applied as the weights of the plurality of Siamese CNNs included in the estimation unit 200, which in effect trains the Siamese CNNs of the estimation unit 200.

To improve training, that is, to increase the optical flow and disparity estimation accuracy of the estimation unit 200, the learning unit 300 may measure several kinds of loss from the eight offsets and backpropagate them. Here, as an example, it measures and backpropagates a cycle consistency loss, a reconstruction loss, and smoothness losses.

The learning unit 300 exists only to train the estimation unit 200 and may be omitted once the plurality of Siamese CNNs of the estimation unit have been trained. That is, the learning unit 300 may be excluded when the optical flow and disparity estimation apparatus is actually in use. The learning unit 300 may also be configured as a separate training apparatus that obtains the weights for the plurality of Siamese CNNs.

FIG. 2 shows a detailed configuration of the learning unit of FIG. 1, FIG. 3 is a diagram for explaining the operation of the offset acquisition unit of FIG. 2, and FIG. 4 shows a detailed configuration of the loss measurement unit of FIG. 2.

Referring to FIG. 2, the learning unit 300 includes an offset acquisition unit 310, a loss measurement unit 320, and a loss backpropagation unit 330.

As shown in FIG. 3, the offset acquisition unit 310 includes eight Siamese CNNs. Each has an encoder that receives a pair of corresponding images from the two stereo image sets ((l₁, r₁), (l₂, r₂)) transferred from the stereo image acquisition unit 100 and extracts a feature map, and a decoder that obtains an offset from the extracted feature map. Four of the eight Siamese CNNs constitute an optical flow offset acquisition unit 311, and the remaining four constitute a disparity offset acquisition unit 312.

Table 1 shows an example of the encoder and decoder structure of a Siamese CNN.

As shown in Table 1, the encoder may include, for example, 10 convolution layers, and the decoder may include 12 layers that combine convolution layers and up-convolution layers.

The optical flow offset acquisition unit 311 includes four Siamese CNNs of the same structure. For the left images (l₁, l₂) and the right images (r₁, r₂) of the two stereo image sets ((l₁, r₁), (l₂, r₂)) received from the stereo image acquisition unit 100, the CNNs obtain four optical flow offsets (F_{l1,l2}, F_{r2,r1}, F_{l2,l1}, F_{r1,r2}), one in the forward and one in the reverse direction of time for each view.

Two of the four Siamese CNNs of the optical flow offset acquisition unit 311 operate in the same way as the optical flow estimation unit 210 of the estimation unit 200. In the forward direction of time, from the t-th frame to the (t+1)-th frame, they obtain as optical flow offsets the correspondence shift (F_{l1,l2}) between the left images (l₁, l₂) and the correspondence shift (F_{r1,r2}) between the right images (r₁, r₂).

The remaining two Siamese CNNs of the optical flow offset acquisition unit 311 operate in the reverse direction of time, from the (t+1)-th frame to the t-th frame. They obtain as optical flow offsets the correspondence shift (F_{l2,l1}) between the left images (l₂, l₁) and the correspondence shift (F_{r2,r1}) between the right images (r₂, r₁).

The disparity offset acquisition unit 312 likewise includes four Siamese CNNs. For each of the t-th and (t+1)-th frames of the two stereo image sets ((l₁, r₁), (l₂, r₂)) delivered by the stereo image acquisition unit 100, these CNNs obtain the correspondence displacements from the left image to the right image (F_{l1,r1}, F_{l2,r2}) and from the right image to the left image (F_{r1,l1}, F_{r2,l2}) as disparity offsets.

In other words, the offset acquisition unit 310 uses eight Siamese CNNs to search for correspondence displacements over different pairings of the four images in the two stereo image sets ((l₁, r₁), (l₂, r₂)) delivered by the stereo image acquisition unit 100. It thereby obtains eight offsets: four optical flow offsets (F_{l1,l2}, F_{r2,r1}, F_{l2,l1}, F_{r1,r2}) and four disparity offsets (F_{l1,r1}, F_{r2,l2}, F_{r1,l1}, F_{l2,r2}).
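The eight image pairings can be listed explicitly. The following is a minimal sketch, not the patented implementation: the string labels and the helper names `offset_names` and `is_flow` are illustrative assumptions that only mirror the notation F_{a,b} used in the text.

```python
# Image labels follow the text: l1, r1 belong to frame t; l2, r2 to frame t+1.
# Each (source, target) pair corresponds to one Siamese CNN and one offset F_{a,b}.
FLOW_PAIRS = [("l1", "l2"), ("r1", "r2"), ("l2", "l1"), ("r2", "r1")]
DISPARITY_PAIRS = [("l1", "r1"), ("l2", "r2"), ("r1", "l1"), ("r2", "l2")]


def offset_names():
    """Return the eight offset names, flow offsets first, then disparity offsets."""
    return [f"F_{a},{b}" for a, b in FLOW_PAIRS + DISPARITY_PAIRS]


def is_flow(a, b):
    """A pair is optical flow when both images come from the same camera (l or r)."""
    return a[0] == b[0]


print(offset_names())
```

Pairs that share a camera but differ in frame are optical flow; pairs that share a frame but differ in camera are disparity.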

Here, when a pixel p in image a is located at (p_x, p_y) and its corresponding point q in image b is located at (q_x, q_y), the correspondence displacement from image a to image b, that is, the offset F_{a,b}, is calculated as in Equation 1.
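Equation 1 is not reproduced in this text. Based only on the definitions in the preceding sentence, a form consistent with them would be the following; the argument notation F_{a,b}(p) is an assumption.

```latex
F_{a,b}(p) = q - p = \left( q_x - p_x,\; q_y - p_y \right)
```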

In this embodiment, the offset acquisition unit 310 includes eight Siamese CNNs and obtains eight offsets so that the learning unit 300 can perform accurate unsupervised learning based on the cycle consistency described below, without separate training data.

Referring to FIG. 4, the loss measurement unit 320 may include a cycle consistency loss measurement unit 321, a reconstruction loss measurement unit 322, a smoothing loss measurement unit 323 and a reliability map generation unit 324.

As described above, in this embodiment the loss measurement unit 320 measures three kinds of loss in order to train the eight Siamese CNNs of the offset acquisition unit 310: cycle consistency loss, reconstruction loss and smoothing loss.

First, the cycle consistency loss measurement unit 321 measures loss using cycle consistency: across the four images of the two-frame stereo image sets ((l₁, r₁), (l₂, r₂)) delivered by the stereo image acquisition unit 100, the correspondence displacements accumulated around a cycle in a predetermined order must sum to a zero offset. The reconstruction loss measurement unit 322 measures loss using pixel consistency: the pixel values at corresponding points should be similar. Finally, the smoothing loss measurement unit 323 measures loss using a smoothness property: except at motion and disparity boundaries, the values around a corresponding point should vary smoothly.

The reliability map generation unit 324 generates a reliability map so that the cycle consistency loss measurement unit 321 and the reconstruction loss measurement unit 322 do not measure loss over regions that have no corresponding point, such as regions that disappear because of object motion and occluded regions. Measuring loss over such regions would introduce errors into the loss measurement.

The operation of the loss measurement unit 320 of FIG. 4 is described in detail below with reference to FIGS. 5 to 9.

FIG. 5 is a diagram for explaining the concept of cycle consistency loss.

Cycle consistency means that corresponding points followed around a cycle through the four images ((l₁, r₁), (l₂, r₂)) of the stereo image sets of two consecutive frames must coincide. Assume that the eight Siamese CNNs of the offset acquisition unit 310 have been trained properly. Then, as shown in (a) of FIG. 5, starting from a given pixel of the left image l₁ of the t-th frame and passing through the left image l₂ of the (t+1)-th frame, the right image r₂ of the (t+1)-th frame and the right image r₁ of the t-th frame, the point reached on returning to l₁ must be the same as the starting pixel. That is, the correspondence displacements obtained by the offset acquisition unit 310 around the cycle must sum to 0.

This cycle consistency must hold not only in the forward direction (l₁ -> l₂ -> r₂ -> r₁) shown in (a) of FIG. 5 but also in the reverse direction (l₁ -> r₁ -> r₂ -> l₂) shown in (b).

If the sum of the correspondence displacements is nonzero in at least one of the forward and reverse cycles, at least one of the eight CNNs can be judged to have produced a displacement for a wrong corresponding point, and this appears as cycle consistency loss.

Accordingly, the cycle consistency loss measurement unit 321 calculates the sum of the correspondence displacements in the forward and reverse directions. To calculate this sum, this embodiment first defines a transition operator as in Equation 2.

In Equation 2, the transition operator represents, for three images a, b and c, the sum of the offsets (of the form F_{a,b}) accumulated in the transition from image a through image b to image c.

Similarly to FIG. 5, the cycle transition result obtained by transitioning sequentially through four images a, b, c and d can be expressed with the transition operator of Equation 2, as in Equation 3.

In Equation 3, the cycle transition result is the result of the cycle over the four images a, b, c and d that starts at image a and proceeds in the direction of image b.

Applying Equation 3 to (a) and (b) of FIG. 5, the forward cycle transition result of (a) is the instance of Equation 3 for the sequence l₁ -> l₂ -> r₂ -> r₁, and the reverse cycle transition result of (b) is the instance for the sequence l₁ -> r₁ -> r₂ -> l₂.
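The transition operator and the cycle transition result can be illustrated with a short sketch. Equations 2 and 3 are not reproduced in this text, so the form below (each offset is evaluated at the point reached so far, and the displacements are added) is an assumption drawn from the verbal description. Offset fields are modeled as plain functions from a pixel position to a displacement, and the constant fields in the example are invented values.

```python
def transition(f_ab, f_bc, p):
    """Offset accumulated going a -> b -> c from pixel p (assumed form of Equation 2)."""
    d_ab = f_ab(p)
    q = (p[0] + d_ab[0], p[1] + d_ab[1])  # corresponding point in image b
    d_bc = f_bc(q)
    return (d_ab[0] + d_bc[0], d_ab[1] + d_bc[1])


def cycle_transition(offsets, p):
    """Total displacement after following a closed chain of offset fields from p."""
    total = (0.0, 0.0)
    q = p
    for f in offsets:
        d = f(q)  # displacement evaluated at the point reached so far
        total = (total[0] + d[0], total[1] + d[1])
        q = (q[0] + d[0], q[1] + d[1])
    return total


# Hypothetical constant offset fields for the forward cycle l1 -> l2 -> r2 -> r1 -> l1.
f_l1_l2 = lambda p: (2.0, 1.0)    # optical flow, left images, forward in time
f_l2_r2 = lambda p: (-3.0, 0.0)   # disparity, frame t+1, left to right
f_r2_r1 = lambda p: (-2.0, -1.0)  # optical flow, right images, backward in time
f_r1_l1 = lambda p: (3.0, 0.0)    # disparity, frame t, right to left

forward = cycle_transition([f_l1_l2, f_l2_r2, f_r2_r1, f_r1_l1], (10.0, 10.0))
print(forward)  # a consistent set of offsets returns to the starting pixel
```

With consistent offsets the total displacement is zero; any residual is what the cycle consistency loss penalizes.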

To satisfy cycle consistency, both the forward cycle transition result and the reverse cycle transition result must be 0. A nonzero transition result indicates that cycle consistency loss has occurred. The cycle consistency loss L_c can therefore be calculated as the sum of the forward cycle transition result and the reverse cycle transition result.

As noted above, however, the four images ((l₁, r₁), (l₂, r₂)) of the two stereo image sets may contain pixels that have no corresponding point in another image. Simply summing the forward and reverse cycle transition results over every pixel p of the image would therefore introduce errors into the calculated cycle consistency loss L_c.

For this reason the loss measurement unit 320 additionally includes the reliability map generation unit 324, which prevents such errors in the cycle consistency loss L_c. For each pairing of two images among the four images ((l₁, r₁), (l₂, r₂)) of the two stereo image sets, the reliability map generation unit 324 determines the reliability of each pixel from the difference between the bidirectional round-trip correspondence displacements, that is, from the difference between offsets.

FIG. 6 is a diagram for explaining the concept of the reliability map. In FIG. 6, (a) and (b) illustrate the reliability map generated between two images, image a (here l₁) and image b (here l₂).

The reliability map generation unit 324 calculates the reliability map between image a and image b from the difference between two offset sums: the sum shown in (a), for the transition from image a to image b and back to image a, and the sum shown in (b), for the transition from image b to image a and back to image b.

Referring to FIG. 6, when the CNNs of the offset acquisition unit 310 have been trained properly, the offset sum shown in (a) and the offset sum shown in (b) should both be 0, or at least close to 0.

The difference between the two offset sums should therefore also be approximately 0. Accordingly, the reliability map M_{a,b} between the two images a and b can be calculated according to Equation 4.

In Equation 4, H(x) is a step function that outputs 1 when x is 0 or greater and 0 otherwise, and ∥·∥₂ is the L2 norm. α₁ and α₂ are margins with predetermined values.

The first term of Equation 4 outputs 1, meaning the pixel is reliable, when one round-trip offset sum is smaller than the other round-trip offset sum plus the margin α₁. If reliability were calculated from the first term alone, however, it could be misjudged when the two offset sums differ by less than α₁ but are both very large. The second term therefore outputs 1 only when the offset sum is smaller than the margin α₂. In short, Equation 4 marks a pixel as reliable when the two offset sums differ by no more than the margin α₁ and the offset sum is smaller than the margin α₂.
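The two-term test described for Equation 4 can be sketched as follows. Equation 4 itself is not reproduced in this text, so several details here are assumptions: that the two terms are multiplied, that the second term tests the a -> b -> a round trip, and the function and argument names. The default margins are the values reported later in the experiments (α₁ = 1, α₂ = 20).

```python
import math


def step(x):
    """H(x): 1 when x is 0 or greater, otherwise 0."""
    return 1 if x >= 0 else 0


def l2_norm(v):
    """L2 norm of a 2-D displacement."""
    return math.hypot(v[0], v[1])


def reliability(round_trip_aba, round_trip_bab, alpha1=1.0, alpha2=20.0):
    """Hypothetical per-pixel reliability from the two round-trip offset sums."""
    n_aba = l2_norm(round_trip_aba)
    n_bab = l2_norm(round_trip_bab)
    first = step(n_bab + alpha1 - n_aba)  # the sums agree to within alpha1
    second = step(alpha2 - n_aba)         # the sum itself stays below alpha2
    return first * second                 # assumed combination: both must hold


print(reliability((0.5, 0.0), (0.2, 0.0)))    # small, similar round trips
print(reliability((30.0, 0.0), (29.5, 0.0)))  # similar but both very large
```

The second call shows why the second term is needed: the two sums agree to within α₁, yet both are far too large for the pixel to be trusted.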

Because the cycle consistency loss measurement unit 321 calculates forward and reverse cycle transition results over the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets, the reliability map generation unit 324 can correspondingly generate forward and reverse reliability maps over the four images.

When the reliability map generation unit 324 generates a reliability map for the cycle transition over four images in the order a, b, c, d, starting at image a and proceeding in the direction of image b, the cycle reliability map can be calculated as in Equation 5.

Based on the cycle reliability map obtained by the reliability map generation unit 324, the cycle consistency loss measurement unit 321 can obtain the cycle loss L_c according to Equation 6.

Here, the two terms of Equation 6 are the forward cycle loss and the reverse cycle loss shown in (a) and (b) of FIG. 5, and each can be calculated as in Equation 7.

In Equation 7, ψ() is a truncation function, ∥·∥₁ is the L1 norm, and T is a predetermined threshold. That is, ψ(x) outputs the smaller of the L1 norm of x and the threshold T, and it is applied to suppress outlier values of the cycle loss.
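The truncation function follows directly from its description. The masked sum below it is only a sketch of how a cycle reliability map might gate the per-pixel terms: Equations 6 and 7 are not reproduced in this text, so the plain unnormalized sum and the names `psi` and `masked_cycle_loss` are assumptions. The default threshold is the value T = 10 reported in the experiments.

```python
def psi(x, threshold=10.0):
    """Truncation: the smaller of the L1 norm of x and the threshold T."""
    return min(sum(abs(v) for v in x), threshold)


def masked_cycle_loss(cycle_results, reliabilities, threshold=10.0):
    """Sum truncated cycle transition results, counting reliable pixels only.

    cycle_results: one 2-D cycle transition result per pixel.
    reliabilities: one 0/1 cycle reliability value per pixel.
    """
    return sum(m * psi(c, threshold) for c, m in zip(cycle_results, reliabilities))


results = [(3.0, -4.0), (30.0, 0.0), (1.0, 1.0)]
mask = [1, 1, 0]  # the third pixel has no reliable correspondence
print(masked_cycle_loss(results, mask))
```

The second pixel contributes only T rather than its full L1 norm of 30, and the third pixel contributes nothing because its reliability is 0.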

Even when the cycle consistency loss measurement unit 321 measures the cycle loss in both the forward and reverse directions and finds it very low, the low value may in some cases result from an incorrect cycle path. That is, a transition through the four images may pass through inaccurate corresponding points and still end at the initial pixel, so that the cycle loss is measured as 0 or a value close to 0.

To compensate for this problem, the reconstruction loss measurement unit 322 measures the pixel value error and the gradient consistency error between corresponding points as the reconstruction loss. Given the offset F_{a,b} between image a and image b, the reconstruction loss L^r_{a,b} between the two images, with the reliability map M_{a,b} applied, can be obtained according to Equation 8.

Here, Equation 8 uses a gradient operator, and γ is a balance value that adjusts the relative contribution of the pixel value term and the gradient term. The gradient operator is applied to make the reconstruction loss robust to illumination changes, and it is already widely used in optical flow estimation.

As in the cycle consistency loss calculation, the reliability map M_{a,b} is applied in Equation 8 so that reconstruction loss from regions without corresponding points is not reflected.
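A per-pixel version of this idea can be sketched as below. Equation 8 is not reproduced in this text, so the absolute-difference (L1) penalties and the function name `reconstruction_term` are assumptions. The values for image b are assumed to have been sampled already at the corresponding point p + F_{a,b}(p).

```python
def reconstruction_term(i_a, i_b, g_a, g_b, reliable, gamma=1.0):
    """Hypothetical per-pixel reconstruction term.

    i_a, i_b: pixel values at p in image a and at the corresponding point in image b.
    g_a, g_b: 2-D image gradients at those same two points.
    reliable: reliability map value M_{a,b}(p), 0 or 1.
    gamma:    balance between the pixel value term and the gradient term.
    """
    photometric = abs(i_a - i_b)
    gradient = abs(g_a[0] - g_b[0]) + abs(g_a[1] - g_b[1])
    return reliable * (photometric + gamma * gradient)


print(reconstruction_term(0.5, 0.25, (1.0, 0.0), (0.5, 0.0), reliable=1))
print(reconstruction_term(0.5, 0.25, (1.0, 0.0), (0.5, 0.0), reliable=0))
```

A pixel whose reliability is 0 contributes nothing, which keeps occluded regions out of the loss.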

FIGS. 7 and 8 are diagrams for explaining the concept of reconstruction loss. For the two stereo image sets ((l₁, r₁), (l₂, r₂)), (a) of FIG. 7 shows the forward and reverse reconstruction losses of optical flow and (b) shows the forward and reverse reconstruction losses of disparity. In FIG. 8, (a) and (b) show the forward and reverse reconstruction losses of the combination of optical flow and disparity.

As shown in FIGS. 7 and 8, the reconstruction loss, like the cycle consistency loss, must account for both the forward and the reverse losses along the transition paths: for optical flow, for disparity, and for the combination of optical flow and disparity.

In FIG. 8, the reconstruction loss for the combination of optical flow and disparity is not obtained directly between the left image l₁ of the t-th frame and the right image r₂ of the (t+1)-th frame but by way of the left image l₂ of the (t+1)-th frame. This is because none of the eight Siamese CNNs of the offset acquisition unit 310 obtains the offset between l₁ and r₂. If the offset acquisition unit 310 included more than eight Siamese CNNs, the reconstruction loss for the combination could, unlike in FIG. 8, be obtained directly from the offset between l₁ and r₂.

However, increasing the number of Siamese CNNs complicates the structure of the offset acquisition unit 310 and raises the cost of implementing the learning unit 300. For efficiency, this embodiment therefore assumes that the offset acquisition unit 310 has eight Siamese CNNs.

Accordingly, for the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets shown in FIGS. 7 and 8, the reconstruction loss measurement unit 322 obtains the total reconstruction loss according to Equation 9, reflecting the forward and reverse reconstruction losses of optical flow, of disparity, and of the combination of optical flow and disparity.

The smoothing loss measurement unit 323 measures the smoothing loss based on the property that optical flow and disparity change abruptly at object boundaries, that is, at discontinuities in the image, but change smoothly inside objects, that is, in continuous regions.

The smoothing loss measurement unit 323 can obtain the smoothing loss for two images, image a and image b, according to Equation 10.

Here, β is a smoothness bandwidth parameter with a predetermined value.

However, the optical flow offset, which represents the difference between consecutive frames, is frequently smaller than the disparity offset, which represents the difference between viewpoints. It is therefore undesirable for the loss backpropagation unit 330 to apply the same loss weight to the smoothing loss for optical flow (L_o) and the smoothing loss for disparity (L_d).

Accordingly, the smoothing loss measurement unit 323 can measure the smoothing loss for optical flow (L_o) and the smoothing loss for disparity (L_d) separately, so that different loss weights (w_o, w_d) can be applied to them.

The loss backpropagation unit 330 applies the predetermined loss weights (w_c, w_r, w_o, w_d) to the cycle consistency loss (L_c), the reconstruction loss (L_r) and the smoothing losses (L_o, L_d) measured by the loss measurement unit 320, as in Equation 11, and sums them to calculate the total loss (L).
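Equation 11 is not reproduced in this text. From the description of weighting each loss and summing, it would presumably take the following form; the pairing of each weight with the loss of the same subscript is an assumption based on the naming.

```latex
L = w_c L_c + w_r L_r + w_o L_o + w_d L_d
```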

The loss backpropagation unit 330 then trains the eight Siamese CNNs of the offset acquisition unit 310 by backpropagating the calculated total loss (L) to them and thereby updating their weights.

In other words, the learning unit 300 can perform unsupervised learning of the eight Siamese CNNs of the offset acquisition unit 310, using ordinary stereo images without ground-truth labels as training data. The learning unit 300 may repeat training until the total loss falls to or below a predetermined reference loss value, or until a predetermined reference number of training iterations is reached.
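The two stopping conditions can be sketched as a simple loop. This is an illustration only: the text offers the two conditions as alternatives, while the sketch checks both at once as a design choice, and `step_fn` is a hypothetical callable standing in for one update of the eight Siamese CNNs that returns the total loss.

```python
def train(step_fn, loss_target, max_iterations):
    """Repeat training steps until the loss target or the iteration cap is reached.

    step_fn:        hypothetical callable; runs one weight update, returns total loss.
    loss_target:    reference loss value; training stops once loss <= loss_target.
    max_iterations: reference number of training iterations.
    """
    loss = float("inf")
    iterations = 0
    while iterations < max_iterations and loss > loss_target:
        loss = step_fn()
        iterations += 1
    return iterations, loss


# Example with a made-up sequence of losses standing in for real training.
fake_losses = iter([5.0, 3.0, 1.0, 0.5])
print(train(lambda: next(fake_losses), loss_target=1.0, max_iterations=10))
```

Setting `loss_target` to a negative number reduces the loop to a fixed iteration count, which matches the flow described for FIG. 9.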

Among the weights of the eight trained Siamese CNNs, the weights of those CNNs that correspond to the CNNs included in the estimation unit 200 are transferred to the CNNs of the estimation unit 200. This has the same effect as training the CNNs of the estimation unit 200 directly.

In the description above, the offset acquisition unit 310 of the learning unit 300 and the estimation unit 200 are configured separately. In some cases, however, the offset acquisition unit 310 of the learning unit 300 may itself be used as the estimation unit 200. That is, the CNNs of the estimation unit 200 may be trained directly from the loss backpropagated by the loss backpropagation unit 330.

FIG. 9 shows a method for estimating optical flow and disparity according to an embodiment of the present invention.

The method of FIG. 9 is described with reference to FIGS. 1 to 8. First, the stereo image acquisition unit 100 acquires training stereo images for training the eight Siamese CNNs of the learning unit 300, and delivers the stereo image sets ((l₁, r₁), (l₂, r₂)) of two consecutive frames from the acquired training stereo images to the learning unit 300 (S10).

Here, a training stereo image is simply a stereo image used during training, not conventional training data with separately labeled ground truth.

Once the training stereo images have been acquired, the learning unit 300 uses them to perform training that yields the learned weights for the CNNs of the estimation unit 200 (S20).

The offset acquisition unit 310 of the learning unit 300 uses eight Siamese CNNs, which have the same structure as the CNNs of the estimation unit 200, to obtain a total of eight offsets from the two-frame stereo image sets ((l₁, r₁), (l₂, r₂)): four optical flow offsets (F_{l1,l2}, F_{r2,r1}, F_{l2,l1}, F_{r1,r2}) and four disparity offsets (F_{l1,r1}, F_{r2,l2}, F_{r1,l1}, F_{l2,r2}) (S21).

The learning unit 300 trains the eight Siamese CNNs by measuring losses from the eight obtained offsets and backpropagating them to the eight Siamese CNNs.

To train the eight Siamese CNNs, the reliability map generation unit 324 of the loss measurement unit 320 first uses the eight offsets to generate reliability maps between pairs of images, for different predetermined pairings of the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets, and then generates cycle reliability maps from the generated reliability maps (S22).

The cycle consistency loss measurement unit 321 obtains, from the eight offsets, the cycle transition results in the predetermined forward and reverse directions over the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets. Based on the obtained cycle transition results and the generated cycle reliability maps, it calculates the cycle consistency loss (L_c) in the predetermined forward and reverse directions according to Equations 6 and 7 (S23).

The reconstruction loss measurement unit 322 calculates the reconstruction loss (L_r) according to Equations 8 and 9, using the pixel value differences and gradient differences between corresponding points in the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets, together with the reliability maps (S24). As shown in Equation 9, the reconstruction loss measurement unit 322 can calculate the total reconstruction loss (L_r) by summing the forward and reverse reconstruction losses of optical flow, the forward and reverse reconstruction losses of disparity, and the forward and reverse reconstruction losses of the combination of optical flow and disparity.

For the four images ((l₁, r₁), (l₂, r₂)) of the two-frame stereo image sets, the smoothing loss measurement unit 323 calculates, each according to Equation 10, the smoothing loss (L_o) between the image pairs related by optical flow ((l₁, l₂), (r₁, r₂)) and the smoothing loss (L_d) between the image pairs related by disparity ((l₁, r₁), (l₂, r₂)) (S25).

The loss backpropagation unit 330 applies the predetermined loss weights (w_c, w_r, w_o, w_d) to the cycle consistency loss (L_c), the reconstruction loss (L_r) and the smoothing losses (L_o, L_d) measured by the loss measurement unit 320, as in Equation 11, and sums them to calculate the total loss (L) (S26).

The calculated total loss (L) is then backpropagated to the eight Siamese CNNs of the offset acquisition unit 310, which are trained by updating their learned weights (S27).

The learning unit 300 determines whether the number of training iterations of the eight Siamese CNNs of the offset acquisition unit 310 has reached the predetermined reference number (S28). If it is below the reference number, the eight offsets are obtained again (S21). If it has reached the reference number, the repetition ends, and among the learned weights of the eight Siamese CNNs, those corresponding to the CNNs of the estimation unit 200 are transferred to the corresponding CNNs (S29). That is, the learned weights are transferred to the CNNs of the estimation unit 200, which puts those CNNs into the trained state.

Thereafter, the stereo image acquisition unit 100 acquires the stereo images for which optical flow and disparity are to be estimated and delivers them to the estimation unit 200 as stereo image sets ((l₁, r₁), (l₂, r₂)) of two consecutive frames at a time (S30).

The CNNs of the estimation unit 200 are configured identically to the eight Siamese CNNs of the learning unit 300 and are put into the trained state by receiving the learned weights. Among them, the two CNNs that make up the optical flow estimation unit 210 estimate, as optical flow, the correspondence displacement F_{l1,l2} between the left images (l₁, l₂) and the correspondence displacement F_{r1,r2} between the right images (r₁, r₂) of the t-th and (t+1)-th frames of the two stereo image sets ((l₁, r₁), (l₂, r₂)). At least one CNN that makes up the disparity estimation unit 220 estimates, as disparity, the correspondence displacement F_{l1,r1} from the left image l₁ to the right image r₁ of the t-th frame (S40). The correspondence displacement F_{l2,r2} from the left image l₂ to the right image r₂ of the (t+1)-th frame can also be estimated as disparity, and in some cases so can the correspondence displacements (F_{r1,l1}, F_{r2,l2}) from the right images (r₁, r₂) to the left images (l₁, l₂) of the t-th and (t+1)-th frames.

Hereinafter, the performance of the optical flow and disparity estimation apparatus and method according to the present embodiment is reviewed.

Table 2 compares the optical flow estimation performance of the present embodiment with that of conventional optical flow estimation methods on the FlyingThings3D, Sintel clean/final, and KITTI 2012/2015 data sets.

Among the data sets used here, FlyingThings3D provides stereo images labeled with ground truth for optical flow, disparity, and disparity change, and comprises 21,818 labeled and 4,248 test stereo images. Sintel includes a clean version and a final version of the data set, each containing 1064 stereo images. KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) 2012 and KITTI 2015 provide data sets comprising 197 and 200 labeled stereo images and 195 and 200 test stereo images, respectively.

The stereo images used for training have a size of 512 × 256, and training was run for 1700k, 850k, and 200k iterations on the respective data sets. The loss weights (w_c, w_r, w_o, w_d) were set to w_c = 1, w_r = 1, w_o = 13.25, and w_d = 3, and the remaining parameters were set to α₁ = 1, α₂ = 20, β = 1, γ = 1, and T = 10.

For the KITTI data sets, however, w_r = 0.3 and γ = 1 were applied to account for changes in illumination.
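As a minimal illustrative sketch, the loss weights above could be combined into a total loss as a weighted sum of the four loss terms named in this description. The assignment of w_c, w_r, w_o, and w_d to the cycle consistency, restoration, optical flow smoothing, and disparity smoothing losses is assumed here from the subscripts, and the names `DEFAULT_WEIGHTS`, `KITTI_WEIGHTS`, and `total_loss` are hypothetical. The loss values passed in are placeholders, not measured results.

```python
# Loss weights reported in the description (default setting).
DEFAULT_WEIGHTS = {"w_c": 1.0, "w_r": 1.0, "w_o": 13.25, "w_d": 3.0}

# KITTI setting: only w_r differs from the default, per the description.
KITTI_WEIGHTS = {**DEFAULT_WEIGHTS, "w_r": 0.3}


def total_loss(cycle, restoration, flow_smooth, disp_smooth, weights=None):
    """Weighted sum of the four loss terms.

    Assumption: w_c, w_r, w_o, w_d weight the cycle consistency,
    restoration, optical flow smoothing, and disparity smoothing
    losses, respectively.
    """
    w = DEFAULT_WEIGHTS if weights is None else weights
    return (
        w["w_c"] * cycle
        + w["w_r"] * restoration
        + w["w_o"] * flow_smooth
        + w["w_d"] * disp_smooth
    )


if __name__ == "__main__":
    # Placeholder loss values chosen only to show the call shape.
    print(total_loss(0.2, 0.5, 0.01, 0.02))
    print(total_loss(0.2, 0.5, 0.01, 0.02, weights=KITTI_WEIGHTS))
```

With all four placeholder losses equal, the only difference between the two settings is the contribution of the restoration term.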

In Table 2, the suffix -FC indicates that the optical flow estimation method was trained on the FlyingChairs data set, -FT that it was trained on the FlyingThings3D data set, -K that it was trained on the KITTI data set, and -S that it was trained on the Sintel data set. +ft- denotes a result fine-tuned using two data sets.

All results in Table 2 are the average endpoint error (EPE), in pixels, measured against the ground truth. P1-all is the percentage of error pixels, that is, pixels whose estimated optical flow is regarded as erroneous because the EPE exceeds 3% or 5%.
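The two measures can be sketched in a few lines of Python. This is an illustrative sketch only: flow fields are represented as flat lists of (u, v) vectors, the function names are hypothetical, and the error threshold is left to the caller because how the 3% or 5% criterion is applied per pixel is not spelled out here.

```python
import math


def endpoint_errors(pred, gt):
    """Per-pixel Euclidean distance between predicted and ground-truth flow vectors."""
    return [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]


def average_epe(pred, gt):
    """Mean endpoint error over all pixels."""
    errors = endpoint_errors(pred, gt)
    return sum(errors) / len(errors)


def error_pixel_percentage(pred, gt, threshold):
    """Percentage of pixels whose endpoint error exceeds a caller-supplied threshold."""
    errors = endpoint_errors(pred, gt)
    bad = sum(1 for e in errors if e > threshold)
    return 100.0 * bad / len(errors)


if __name__ == "__main__":
    # Toy two-pixel example: the first pixel is off by a (3, 4) vector.
    pred = [(3.0, 4.0), (0.0, 0.0)]
    gt = [(0.0, 0.0), (0.0, 0.0)]
    print(average_epe(pred, gt))
    print(error_pixel_percentage(pred, gt, threshold=3.0))
```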

Referring to Table 2, the present embodiment trained without supervision on the FlyingChairs data set outperforms the existing DSTFlow-FC and Occlusion-aware-FC, which were likewise trained without supervision on the FlyingChairs data set, on the KITTI 2012/2015 data sets. On the Sintel clean/final data sets in particular, it shows the best performance among all unsupervised learning methods.

The present embodiment trained without supervision on the KITTI 2012/2015 data sets shows the best performance on the KITTI 2012/2015 data sets, and the present embodiment trained without supervision on the Sintel clean/final data sets shows performance on the KITTI 2012/2015 data sets similar to that of FlowNet, which uses supervised learning.

Meanwhile, Table 3 compares the disparity estimation performance of the present embodiment with that of conventional disparity estimation methods on the FlyingThings3D and KITTI 2012/2015 data sets.

Table 3 reports the mean absolute error (MAE) and the log root mean square error (log RMSE) of the estimated disparity measured against the ground truth. D1-all is the percentage of error pixels, that is, pixels regarded as erroneous because the MAE is 3% or 5% or more.
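For reference, the two disparity measures can be sketched as follows. The log RMSE formula used here, the root of the mean squared difference between the logarithms of the estimated and ground-truth values, is the commonly used definition and is assumed rather than taken from this description. The function names are hypothetical, and disparities are assumed to be positive so that the logarithm is defined.

```python
import math


def mean_absolute_error(pred, gt):
    """Mean of the absolute per-pixel differences."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def log_rmse(pred, gt):
    """Root mean square error between log(pred) and log(gt).

    Assumption: all values are strictly positive.
    """
    squared = [(math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy two-pixel example with positive disparities.
    pred = [2.0, 4.0]
    gt = [1.0, 4.0]
    print(mean_absolute_error(pred, gt))
    print(log_rmse(pred, gt))
```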

Referring to Table 3, the present embodiment trained without supervision on the FlyingThings3D data set provides, on the KITTI 2012/2015 data sets, performance similar to that of DispNet trained with supervision on the same FlyingThings3D data set. In addition, the present embodiment trained without supervision on the KITTI 2012/2015 data sets performs better on the KITTI 2012/2015 data sets than all supervised and unsupervised learning methods in terms of average MAE and log RMSE.

Table 4 shows a quantitative comparison of the disparity estimation performance of the present embodiment.

Referring to Table 4, the disparity estimation performance of the present embodiment is superior to that of all unsupervised learning methods and close to that of supervised learning methods.

FIGS. 10 and 11 show the results of comparing the performance of the optical flow and disparity estimation apparatus and method according to the present embodiment.

FIG. 10 shows a qualitative comparison of the optical flow and disparity estimation performance of the present embodiment. In FIG. 10, (a) shows the original stereo images, (b) shows the ground-truth optical flow, and (c) shows the optical flow estimated after training with the unsupervised learning method of the present embodiment. Further, (d) shows the ground-truth disparity, and (e) shows the disparity estimated after training with the unsupervised learning method of the present embodiment.

The first six rows from the top of FIG. 10 show the case of training first with FT and then fine-tuning with +ft-S. The next six rows show the optical flow and disparity estimated on the KITTI 2012/2015 data sets after unsupervised training on the KITTI 2012/2015 data sets. The bottom three rows show the optical flow and disparity estimated on the FlyingThings3D data set after unsupervised training on the FlyingThings3D data set.

As shown in FIG. 10, the present embodiment can estimate optical flow and disparity with a quality comparable to the ground truth.

Table 5 shows a performance comparison for the cases in which the learning unit 300 uses the reliability map and cycle consistency individually or together, and learns optical flow and disparity individually or together.

Referring to Table 5, the best performance on FlyingThings3D and Sintel clean/final is obtained when both the reliability map and cycle consistency are used and optical flow and disparity are learned simultaneously.

FIG. 11 qualitatively shows the optical flow and disparity estimation performance according to the learning method; the upper part shows the optical flow estimation results and the lower part shows the disparity estimation results. Here, (a) shows the case of training with neither the cycle consistency loss nor the reliability map, (b) the case without the cycle consistency loss, (c) the case without the reliability map, (d) the case in which optical flow and disparity are learned separately, (e) the case in which optical flow and disparity are learned simultaneously using the cycle consistency loss and the reliability map according to the present embodiment, and (f) the ground truth.

As shown in FIG. 11, the EPE and MAE measured against the ground truth are best when optical flow and disparity are learned simultaneously using the cycle consistency loss and the reliability map according to the present embodiment.

As a result, the optical flow and disparity estimation apparatus and method according to the present embodiment do not need to learn optical flow and disparity separately. They can therefore not only improve the learning speed but also achieve better performance than separate learning, and they allow optical flow and disparity to be estimated simultaneously.

The method according to the present invention may be implemented as a computer program stored on a medium for execution on a computer. Here, the computer-readable medium may be any available medium that can be accessed by a computer, and may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom.

Therefore, the true technical scope of protection of the present invention should be defined by the technical spirit of the appended claims.

100: stereo image acquisition unit 200: estimation unit
300: learning unit 210: optical flow estimation unit
220: disparity estimation unit 310: offset acquisition unit
320: loss measurement unit 330: loss back propagation unit
311: optical flow offset acquisition unit 312: disparity offset acquisition unit
321: cycle consistency loss measurement unit 322: restoration loss measurement unit
323: smoothing loss measurement unit 324: reliability map generation unit

Claims

An apparatus for estimating optical flow and disparity, comprising:
a stereo image acquisition unit that acquires stereo images of a plurality of frames; and
an estimation unit that includes a plurality of convolutional neural networks (hereinafter, CNNs) having the same structure and the same learning weights, a pattern recognition method having been learned in advance, and that simultaneously estimates and outputs, from a stereo image set of two consecutive frames transferred from the stereo image acquisition unit, the optical flow of the images of the consecutive frames and the disparity between the left image and the right image distinguished according to parallax,
wherein the plurality of CNNs of the estimation unit are
trained with learning weights updated by backpropagating a total loss including a cycle consistency loss, the cycle consistency loss being obtained according to a cycle transition result that represents the sum of the position changes of the corresponding points found by cycling, from each pixel of one of the four images of the stereo image set of two consecutive frames input during learning, through the remaining images in each of a predetermined forward direction and reverse direction.

The apparatus of claim 1, wherein the cycle consistency loss is
obtained by applying, to the cycle transition result, a cycle reliability map indicating whether a corresponding point exists between each two images on the cycle paths in the forward direction and the reverse direction among the four images.

The apparatus of claim 2, wherein the cycle consistency loss is
output, for a given pixel, as a predetermined threshold value when the per-pixel cycle consistency loss in the forward direction and the reverse direction exceeds the threshold value.

The apparatus of claim 1, wherein the total loss
further includes a restoration loss obtained, during learning, according to the pixel value and the gradient value of the corresponding point for each pixel between two images according to the frame order of the four images, between two images of the same frame, and between two images having different parallaxes from the frame.

The apparatus of claim 4, wherein the restoration loss is
obtained by further applying a reliability map indicating whether a corresponding point exists between two images according to the frame order, between two images of the same frame, and between two images having different parallaxes from the frame.

The apparatus of claim 4, wherein the total loss
further includes an optical flow smoothing loss for limiting changes in the optical flow obtained according to the frame order among the four images, and a disparity smoothing loss for limiting changes in the disparity obtained according to parallax.

The apparatus of claim 6, wherein the optical flow and disparity estimation apparatus
further comprises a learning unit that is coupled during training of the plurality of CNNs of the estimation unit to obtain the learning weights,
wherein the learning unit comprises:
an offset acquisition unit comprising a plurality of Siamese CNNs that have the same structure as the plurality of CNNs of the estimation unit and that each obtain the position change of corresponding points for two images of a different combination among the four images;
a loss measurement unit that calculates the cycle consistency loss, the restoration loss, the optical flow smoothing loss, and the disparity smoothing loss using the plurality of offsets obtained from the respective Siamese CNNs of the offset acquisition unit; and
a loss backpropagation unit that obtains the total loss by applying a predetermined loss weight to each of the cycle consistency loss, the restoration loss, the optical flow smoothing loss, and the disparity smoothing loss, backpropagates the total loss to the plurality of Siamese CNNs to update the learning weights of the plurality of Siamese CNNs, and, when training of the plurality of Siamese CNNs is completed, transfers the learning weights to the plurality of CNNs of the estimation unit.

A method for estimating optical flow and disparity, comprising:
obtaining stereo images of a plurality of frames; and
simultaneously estimating and outputting, using a plurality of convolutional neural networks (hereinafter, CNNs) having the same structure and the same learning weights, a pattern recognition method having been learned in advance, from a stereo image set of two consecutive frames among the stereo images of the plurality of frames, the optical flow of the images of the consecutive frames and the disparity between the left image and the right image distinguished according to parallax,
wherein the plurality of CNNs are
trained with learning weights updated by backpropagating a total loss including a cycle consistency loss, the cycle consistency loss being obtained according to a cycle transition result that represents the sum of the position changes of the corresponding points found by cycling, from each pixel of one of the four images of the stereo image set of two consecutive frames input during learning, through the remaining images in each of a predetermined forward direction and reverse direction.

The method of claim 8, wherein the optical flow and disparity estimation method
further comprises a learning step of training the plurality of CNNs,
wherein the learning step comprises:
obtaining, using a plurality of Siamese CNNs having the same structure as the plurality of CNNs, a plurality of offsets each representing the position change of corresponding points for two images of a different combination among the four images input during learning;
obtaining, using the plurality of offsets, the cycle transition result and a cycle reliability map indicating whether a corresponding point exists between each two images on the cycle paths in the forward direction and the reverse direction among the four images, and applying the cycle reliability map to the cycle transition result to obtain the cycle consistency loss;
calculating a restoration loss obtained according to the pixel value and the gradient value of the corresponding point for each pixel between two images according to the frame order of the four images, between two images of the same frame, and between two images having different parallaxes from the frame;
calculating an optical flow smoothing loss for limiting changes in the optical flow obtained according to the frame order among the four images and a disparity smoothing loss for limiting changes in the disparity obtained according to parallax;
obtaining the total loss by applying a predetermined loss weight to each of the cycle consistency loss, the restoration loss, the optical flow smoothing loss, and the disparity smoothing loss;
backpropagating the total loss to the plurality of Siamese CNNs to update the learning weights of the plurality of Siamese CNNs; and
transferring the learning weights to the plurality of CNNs when training of the plurality of Siamese CNNs is completed.

The method of claim 9, wherein the learning step further comprises:
generating a reliability map indicating whether a corresponding point exists between two images of each different combination among the four images; and
generating, using the reliability map, a cycle reliability map indicating whether a corresponding point exists between each two images on the cycle paths in the forward direction and the reverse direction among the four images,
wherein the step of obtaining the cycle consistency loss
applies the cycle reliability map to the cycle transition result to obtain the cycle consistency loss, and
the step of calculating the restoration loss
obtains the restoration loss by applying the reliability maps obtained between two images according to the frame order, between two images of the same frame, and between two images having different parallaxes.