KR101957089B1

KR101957089B1 - Method and system on deep self-guided cost aggregation for stereo matching

Info

Publication number: KR101957089B1
Application number: KR1020180002272A
Authority: KR
Inventors: 박인규; 윌리엄
Original assignee: 인하대학교 산학협력단
Priority date: 2018-01-08
Filing date: 2018-01-08
Publication date: 2019-03-11

Abstract

According to an embodiment of the present invention, a deep self-guided cost aggregation method for stereo matching can obtain an accurate disparity map. To this end, the deep self-guided cost aggregation method for stereo matching comprises the steps of: generating a cost volume slice from a pair of stereo images; matching the stereo images based on the generated cost volume slice; and aggregating self-guided matching costs by obtaining a disparity map as the stereo images are matched.

Description

METHOD AND SYSTEM FOR DEEP SELF-GUIDED COST AGGREGATION FOR STEREO MATCHING

The following description relates to a self-guided cost aggregation technique for stereo matching.

Human vision is one of the senses used to acquire information about the surrounding environment; with two eyes, a person can perceive the position of objects and how near or far they are. Robots that can stand in for humans have been developed by implementing this visual structure in machines. The vision system of such a robot consists of a stereo camera (that is, a left camera and a right camera), reads images from the two cameras, and reconstructs them into a single set of three-dimensional information. A single image is a projection of three-dimensional space onto a two-dimensional plane. Three-dimensional distance information (depth) is lost in this process, so directly restoring the three-dimensional space from one image is very difficult. When two or more images taken from different positions are available, however, the three-dimensional space can be reconstructed. That is, when a point in real space is imaged in two images, the position of that point in real space can be found by locating its corresponding points in the two images and applying the geometry of the camera setup. Finding corresponding points between two rectified images (hereinafter, stereo matching) has been a popular research topic for decades. In general, a stereo matching method consists of four steps: cost computation, cost aggregation, disparity computation, and disparity refinement.

Cost aggregation plays an important role in both local and global stereo matching. Most cost aggregation methods share a concept similar to joint image filtering with a guidance image, and therefore require a guidance image. They assume that pixels with similar colors in the guidance image should have similar disparity values. However, this assumption often fails in highly textured regions that have similar disparity values.

References: Korean Patent Publication No. 10-2015-0121179; Korean Patent Publication No. 10-2014-0030635

A self-guided cost aggregation method and system can be provided for obtaining an accurate disparity map by performing stereo matching based on a cost volume slice using a deep convolutional neural network.

A deep self-guided cost aggregation method for stereo matching may include: matching stereo images based on a cost volume slice; and aggregating self-guided matching costs by obtaining a disparity map as the stereo images are matched.

The step of matching the stereo images may include executing a deep convolutional neural network (DCNN) to match the stereo images based on the generated cost volume slice, and the DCNN may be composed of a dynamic weight network and a descending filtering network.

The DCNN uses D convolution layers, with a rectified linear unit (ReLU) layer following each of the first D-1 layers. Every layer except the first and the last has 64 kernels of size 3 x 3 x 64; the first convolution layer consists of 64 kernels of size 3 x 3 x 1, and the last convolution layer consists of a filter of size 3 x 3 x 64. The number of convolution layers may be set according to the size of the support region.

The step of matching the stereo images may include using the DCNN to estimate the weights for the cost volume slice and to perform the weighted summation at the same time.

The step of matching the stereo images may include selecting the truncated absolute difference of color and gradient as the matching cost for the cost volume slice and performing weighted cost aggregation following the matching cost computation.

The step of aggregating the self-guided matching costs by obtaining the disparity map as the stereo images are matched may include performing a consistency check on a left disparity map and a right disparity map to detect pixels that are occluded or have an incorrect disparity.

A deep self-guided cost aggregation system for stereo matching may include: a matching unit that matches stereo images based on a cost volume slice; and an aggregation unit that aggregates self-guided matching costs by obtaining a disparity map as the stereo images are matched.

By matching the stereo images based on the cost volume slice through a deep convolutional neural network, without using a guidance image, the disparity map can be obtained more quickly and accurately.

FIG. 1 is a block diagram illustrating the configuration of a self-guided cost aggregation system according to an embodiment.
FIG. 2 is a flowchart illustrating a self-guided cost aggregation method in the self-guided cost aggregation system according to an embodiment.
FIG. 3 is a diagram giving an overview of the deep self-guided cost aggregation method in the self-guided cost aggregation system according to an embodiment.
FIG. 4 is a diagram showing examples of training data in the self-guided cost aggregation system according to an embodiment.
FIG. 5 shows an example comparing results produced by the self-guided cost aggregation system according to an embodiment with results of conventional methods.
FIG. 6 is a diagram giving an overview of the deep learning network proposed in the self-guided cost aggregation system according to an embodiment.
FIG. 7 shows an example of input images and disparity maps in the self-guided cost aggregation system according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the configuration of a self-guided cost aggregation system according to an embodiment, and FIG. 2 is a flowchart illustrating a self-guided cost aggregation method in the self-guided cost aggregation system according to an embodiment.

The self-guided cost aggregation system 100 may include a matching unit 110 and an aggregation unit 120. The components of the self-guided cost aggregation system 100 may control the system 100 to perform steps 210 to 220 of the self-guided cost aggregation method of FIG. 2.

In step 210, the matching unit 110 may match the stereo images based on the cost volume slice. The matching unit 110 may execute a deep convolutional neural network (DCNN) to match the stereo images based on the cost volume slice. Using the DCNN, the matching unit 110 may estimate the weights for the cost volume slice and perform the weighted summation at the same time. The matching unit 110 may select the truncated absolute difference of color and gradient as the matching cost for the cost volume slice and perform weighted cost aggregation following the matching cost computation.

In step 220, the aggregation unit 120 may aggregate the self-guided matching costs by obtaining a disparity map as the stereo images are matched. The aggregation unit 120 may perform a consistency check on the left disparity map and the right disparity map to detect pixels that are occluded or have an incorrect disparity.

FIG. 3 is a diagram giving an overview of the deep self-guided cost aggregation method in the self-guided cost aggregation system according to an embodiment.

The self-guided cost aggregation system may select the truncated absolute difference of color and gradient as the matching cost model. After the matching cost computation, weighted cost aggregation may be performed. Conventionally, a guidance image is needed to estimate the weight of each pixel in the window. In the embodiment, the cost volume slice is used instead of a guidance color image. A deep learning technique is thus applied to estimate the weights and perform the weighted summation at the same time.
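The matching cost named above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `tad_cost_volume`, the blending factor `alpha`, and the truncation thresholds `tau_color` and `tau_grad` are hypothetical placeholders, since the description does not give their values.

```python
import numpy as np

def tad_cost_volume(left, right, max_disp, alpha=0.1, tau_color=7.0, tau_grad=2.0):
    """Truncated absolute difference (TAD) of color and horizontal gradient.

    left, right: arrays of shape (H, W, 3) holding a rectified stereo pair.
    Returns a cost volume of shape (max_disp, H, W); cost[d] is one slice.
    alpha, tau_color and tau_grad are assumed values, not taken from the patent.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w, _ = left.shape
    # Horizontal gradient of the grayscale images.
    grad_l = np.gradient(left.mean(axis=2), axis=1)
    grad_r = np.gradient(right.mean(axis=2), axis=1)
    # Pixels without a counterpart in the right image get the fully truncated cost.
    border = (1.0 - alpha) * tau_color + alpha * tau_grad
    cost = np.full((max_disp, h, w), border)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        color = np.abs(left[:, d:] - right[:, :w - d]).mean(axis=2)
        grad = np.abs(grad_l[:, d:] - grad_r[:, :w - d])
        cost[d, :, d:] = ((1.0 - alpha) * np.minimum(color, tau_color)
                          + alpha * np.minimum(grad, tau_grad))
    return cost
```

Each `cost[d]` is a cost volume slice in the sense used here: the per-pixel matching cost for one disparity candidate, which the aggregation step then filters.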

Finally, a winner-take-all (WTA) strategy may be applied to obtain the disparity values. After the left disparity map and the right disparity map are computed, a left-right consistency check may be used to detect pixels that are occluded or have an incorrect disparity. To fill the resulting holes, each hole may be assigned the lowest disparity value among the nearest pixels with a valid disparity on the same scan line. The disparity may then be refined with an adaptive median filter to preserve depth boundaries.
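The post-processing chain described above (WTA, left-right consistency check, scan-line hole filling) can be sketched as below. The function names, the tolerance `tol`, and the reading of the hole-filling rule as "the lower of the nearest valid disparities to the left and to the right" are assumptions made for illustration; the adaptive median filter is omitted.

```python
import numpy as np

def wta(cost):
    """Winner-take-all: per pixel, pick the disparity with the lowest cost."""
    return np.argmin(cost, axis=0)

def lr_check(disp_left, disp_right, tol=1):
    """Boolean mask, True where the (integer) left disparity is consistent
    with the right disparity map. `tol` is an assumed tolerance in pixels."""
    h, w = disp_left.shape
    xs = np.tile(np.arange(w), (h, 1))
    x_right = xs - disp_left                  # matching column in the right view
    inside = x_right >= 0
    rows = np.arange(h)[:, None]
    d_right = disp_right[rows, np.clip(x_right, 0, w - 1)]
    return inside & (np.abs(disp_left - d_right) <= tol)

def fill_holes(disp, valid):
    """Fill invalid pixels with the lower of the nearest valid disparities
    to the left and right on the same scan line."""
    filled = disp.copy()
    for y in range(disp.shape[0]):
        cols = np.flatnonzero(valid[y])
        if cols.size == 0:
            continue  # no reliable pixel on this scan line
        for x in np.flatnonzero(~valid[y]):
            left, right = cols[cols < x], cols[cols > x]
            candidates = []
            if left.size:
                candidates.append(disp[y, left[-1]])
            if right.size:
                candidates.append(disp[y, right[0]])
            filled[y, x] = min(candidates)
    return filled
```

Taking the lower of the two neighbours favours the background surface, which is the usual choice for occluded pixels because occlusions expose background rather than foreground.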

FIG. 3 shows an overview of the network structure. The deep convolutional neural network may use D convolution layers, with a rectified linear unit (ReLU) layer following each of the first D-1 layers.

Every layer except the first and the last may include 64 kernels of size 3 x 3 x 64; that is, small filters operate across the 64 channels of each layer. The first convolution layer consists of 64 kernels of size 3 x 3 x 1, and the last convolution layer may consist of a filter of size 3 x 3 x 64.

The number of convolution layers may be set according to the size of the support region to be handled. For example, for a 19 x 19 support region, D may be set to 9 so that the center pixel of the support region can affect every other pixel in the region.
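The link between the layer count and the support region follows from how receptive fields grow: each stride-1 3 x 3 convolution widens the receptive field by 2 pixels, so D layers cover (2D + 1) x (2D + 1) pixels, and D = 9 gives 19 x 19. The short sketch below works this out together with the kernel shapes listed above. The helper names are hypothetical, and the single output channel of the last layer is an assumption, since the text only says that the last layer consists of a filter of size 3 x 3 x 64.

```python
def receptive_field(num_layers, kernel=3):
    """Side length of the receptive field of stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1

def layers_for_support(support, kernel=3):
    """Smallest number of stride-1 layers whose receptive field covers `support`."""
    return -(-(support - 1) // (kernel - 1))  # ceiling division

def layer_shapes(num_layers, width=64, kernel=3):
    """(kernel_h, kernel_w, in_channels, out_channels) of each convolution layer.

    Assumes num_layers >= 2 and a single output channel for the last layer.
    """
    shapes = [(kernel, kernel, 1, width)]
    shapes += [(kernel, kernel, width, width)] * (num_layers - 2)
    shapes.append((kernel, kernel, width, 1))
    return shapes

def weight_count(shapes):
    """Total number of convolution weights, biases excluded."""
    return sum(kh * kw * cin * cout for kh, kw, cin, cout in shapes)
```

For example, `layers_for_support(19)` evaluates to 9, matching the setting given in the text.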

FIG. 4 is a diagram showing examples of training data in the self-guided cost aggregation system according to an embodiment.

FIG. 4 shows examples from the training data set in two kinds of regions. The first and second rows are patches around depth discontinuities, and the third row shows patches of uniform depth.

Because a ground-truth cost volume slice is difficult to obtain, it is hard to apply a deep learning network to the cost aggregation step. A guided-filtered cost volume may therefore be used instead, and only filtered cost volumes that lead to the correct disparity are selected.

Because the deep learning network proposed in the embodiment is trained end to end, the data set may be generated as a set of patches rather than a set of whole images. The data set consists of two subsets: depth-discontinuity patches and uniform-depth patches. The two subsets are mixed at a 50:50 ratio so that the trained parameters work properly in both kinds of regions.
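One way to build such a balanced patch set is sketched below. The description does not say how a patch is classified, so the rule used here (a patch counts as a depth-discontinuity patch when its ground-truth disparity range exceeds `jump`) is an assumption, as are the function name and all parameter values.

```python
import numpy as np

def sample_balanced_patches(gt_disp, patch=19, per_class=50, jump=2.0, seed=0):
    """Return two equally sized lists of (y, x) top-left patch corners:
    patches around depth discontinuities and patches of uniform depth.

    The `jump` threshold that separates the two classes is an assumed rule.
    """
    rng = np.random.default_rng(seed)
    h, w = gt_disp.shape
    edge, flat = [], []
    for y in range(h - patch + 1):
        for x in range(w - patch + 1):
            window = gt_disp[y:y + patch, x:x + patch]
            if window.max() - window.min() > jump:
                edge.append((y, x))
            else:
                flat.append((y, x))
    n = min(per_class, len(edge), len(flat))   # keep the 50:50 ratio
    if n == 0:
        return [], []

    def pick(items):
        idx = rng.choice(len(items), size=n, replace=False)
        return [items[i] for i in idx]

    return pick(edge), pick(flat)
```

Capping both classes at the size of the smaller one keeps the ratio exact even when an image contains few discontinuities.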

FIG. 5 shows an example comparing results produced by the self-guided cost aggregation system according to an embodiment with results of conventional methods.

To evaluate the performance of the proposed deep self-guided cost aggregation method (DEEP), it may be compared with conventional methods such as GF and NL. The results may also be compared with a simple averaging filter (BOX).

FIG. 5 shows a qualitative comparison on the Books data set, with zoomed-in views of the disparity maps: (a) left input image, (b) ground truth, (c) BOX, (d) GF, (e) NL, and (f) the proposed method (DEEP). The comparison shows that the proposed method performs well in highly textured regions with similar disparities. The red boxes in FIG. 5 mark the highly textured regions and their zoomed-in versions for easier inspection.

In highly textured regions, weights derived from a guidance image may differ from weights derived from the ground-truth disparity.

Whereas the existing methods fail to obtain accurate and smooth disparities in such regions, the proposed deep self-guided method can obtain better results.

Table 1 shows the bad-pixel rate for each data set.

Table 1:

The bad-pixel rate may be computed over the non-occluded regions. The proposed method was shown to achieve results comparable to those of other conventional cost aggregation methods. The method according to the embodiment is also free from the limitations of relying on a guidance image.

The present invention proposes a self-guided cost aggregation method for obtaining accurate stereo matching results. Training patches are first generated at random from the training data set, and a deep convolutional neural network is run to learn the weight parameters. Instead of using a guidance image, cost aggregation is performed with the cost volume slice guiding itself.

FIG. 6 is a diagram giving an overview of the deep learning network proposed in the self-guided cost aggregation system according to an embodiment.

The self-guided cost aggregation system may use a deep cost aggregation network (for example, a deep convolutional neural network) to perform self-guided cost aggregation. The deep cost aggregation network proposed in the embodiment may be composed of two sub-networks: a dynamic weight network and a descending filtering network. The dynamic weight network generates a set of dynamic weights for each input pixel, and these weights serve as guidance for the filtering network. The descending filtering network performs edge-aware filtering on the input cost volume slice, and the resulting edge-aware filtered cost volume slice is used.

FIG. 6 shows an overview of the deep self-guided cost aggregation network, where W and H denote the width and height of the input, and d and r denote the diameter and radius of the support region. The loss function may consist of a feature reconstruction loss and a mean squared error loss.
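To make the role of the per-pixel dynamic weights concrete, the sketch below applies a separate d x d kernel at every pixel of a cost volume slice. It illustrates only the weighted summation, not the learned networks; the function name, the (H, W, d, d) weight layout, the edge padding, and the normalization by the weight sum are all assumptions.

```python
import numpy as np

def apply_dynamic_weights(cost_slice, weights):
    """Weighted summation of a cost slice with one kernel per pixel.

    cost_slice: (H, W) matching costs for a single disparity.
    weights:    (H, W, d, d) per-pixel weights over a d x d support region,
                assumed non-negative with a positive sum at every pixel.
    """
    h, w = cost_slice.shape
    d = weights.shape[2]
    r = d // 2                                  # radius of the support region
    padded = np.pad(cost_slice, r, mode="edge")
    out = np.zeros((h, w))
    norm = np.zeros((h, w))
    for dy in range(d):
        for dx in range(d):
            wk = weights[:, :, dy, dx]
            # Neighbour at offset (dy - r, dx - r) from each pixel.
            out += wk * padded[dy:dy + h, dx:dx + w]
            norm += wk
    return out / norm
```

With uniform weights this reduces to a box filter, and with all weight on the center tap it returns the slice unchanged; edge-aware behaviour comes from weights that fall off across depth boundaries.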

FIG. 7 shows an example of input images and disparity maps in the self-guided cost aggregation system according to an embodiment. For example, for the KITTI 2015 training data, the left side shows the left input image and the right side shows the disparity map produced by the proposed method.

Because no ground-truth cost volume slice exists, it is difficult to apply a deep learning network directly. A guided-filtered cost volume may therefore be used, with the ground-truth disparity map serving as the guidance.

In the embodiment, 1000 patches may be randomly extracted from each training image, with 80% of them taken from around depth discontinuities. The data may also be augmented to account for rotation and contrast invariance.

Referring to FIG. 7, the Middlebury data set and the KITTI data set may be used to evaluate and verify performance. Table 2 compares the average bad-pixel rates over the 200 KITTI training images.

The same evaluation metric as the KITTI benchmark may be used. Table 2 confirms that the proposed method (DEEP) achieves the smallest average error.
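A bad-pixel rate of the kind reported in the tables can be computed as sketched below. The function name and the error threshold are assumptions; the description does not state the threshold, and the exact KITTI criterion is not reproduced here. The `mask` argument restricts the evaluation to chosen pixels, for example the non-occluded region.

```python
import numpy as np

def bad_pixel_rate(disp, gt, mask, threshold=1.0):
    """Percentage of evaluated pixels whose absolute disparity error
    exceeds `threshold` (an assumed value, in pixels).

    mask: boolean array, True for pixels that take part in the evaluation.
    """
    mask = np.asarray(mask, dtype=bool)
    err = np.abs(np.asarray(disp, dtype=np.float64)
                 - np.asarray(gt, dtype=np.float64))
    return 100.0 * np.count_nonzero(err[mask] > threshold) / np.count_nonzero(mask)
```

Averaging this value over all images of a data set gives a per-method figure comparable to the average bad-pixel rates discussed above.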

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A deep self-guided cost aggregation method for stereo matching, comprising:
matching stereo images based on a cost volume slice; and
aggregating self-guided matching costs by obtaining a disparity map as the stereo images are matched,
wherein matching the stereo images comprises executing a deep convolutional neural network (DCNN) to match the stereo images based on the cost volume slice, and
wherein the DCNN is composed of a dynamic weight network and a descending filtering network.

(deleted)

The method according to claim 1,
wherein the DCNN uses D convolution layers, with a rectified linear unit (ReLU) layer following each of the first D-1 layers,
every layer except the first and the last has 64 kernels of size 3 x 3 x 64,
the first convolution layer consists of 64 kernels of size 3 x 3 x 1, and the last convolution layer consists of a filter of size 3 x 3 x 64, and
the number of convolution layers is set according to the size of the support region.

The method according to claim 1,
wherein matching the stereo images comprises estimating weights for the cost volume slice using the DCNN and simultaneously performing a weighted summation.

A deep self-guided cost aggregation method for stereo matching, comprising:
matching stereo images based on a cost volume slice; and
aggregating self-guided matching costs by obtaining a disparity map as the stereo images are matched,
wherein matching the stereo images comprises selecting the truncated absolute difference of color and gradient as the matching cost for the cost volume slice and performing weighted cost aggregation following the matching cost computation.

A deep self-guided cost aggregation method for stereo matching, comprising:
matching stereo images based on a cost volume slice; and
aggregating self-guided matching costs by obtaining a disparity map as the stereo images are matched,
wherein aggregating the self-guided matching costs by obtaining the disparity map as the stereo images are matched comprises performing a consistency check on a left disparity map and a right disparity map to detect pixels that are occluded or have an incorrect disparity.

A deep self-guided cost aggregation system for stereo matching, comprising:
a matching unit that matches stereo images based on a cost volume slice; and
an aggregation unit that aggregates self-guided matching costs by obtaining a disparity map as the stereo images are matched,
wherein the matching unit executes a deep convolutional neural network (DCNN) to match the stereo images based on the cost volume slice, and
wherein the DCNN is composed of a dynamic weight network and a descending filtering network.