KR102229861B1

KR102229861B1 - Depth estimation apparatus and method using low-channel lidar and stereo camera

Info

Publication number: KR102229861B1
Application number: KR1020190132701A
Authority: KR
Inventors: 손광훈; 박기홍
Original assignee: 연세대학교 산학협력단
Priority date: 2019-10-24
Filing date: 2019-10-24
Publication date: 2021-03-18

Abstract

The present invention provides a depth estimation device and method using a low-channel LIDAR and a stereo camera, which use the color information obtained from a camera as a guide, and mix pixel depth information obtained from the low-channel LIDAR and inaccurate depth information obtained from the stereo camera to generate a high-density and accurate depth map even with a low-cost and low-channel LIDAR.

Description

Depth estimation apparatus and method using low-channel lidar and stereo camera {DEPTH ESTIMATION APPARATUS AND METHOD USING LOW-CHANNEL LIDAR AND STEREO CAMERA}

The present invention relates to a depth estimation apparatus and method, and more particularly to a depth estimation apparatus and method using a low-channel lidar and a stereo camera.

Recognizing the 3D geometric structure of a scene is essential to numerous tasks in robotics and computer vision applications such as autonomous vehicles, mobile robots, localization and mapping, obstacle avoidance and path planning, and 3D reconstruction.

At present, reliable depth information for a scene is estimated using either an active 3D scanner, such as an RGB-D sensor or a 3D lidar (LiDAR), or a passive matching algorithm applied to stereo images.

In outdoor environments, an RGB-D sensor loses accuracy under ambient lighting and offers only a limited sensing range, so lidar, which can provide very accurate depth information with an error on the order of centimeters, is mainly used.

FIG. 1 is a diagram showing the relationship between lidar density and accuracy.

In FIG. 1, (a) shows a depth map scanned by a 64-channel high-channel lidar, (b) shows a depth map scanned by a 16-channel low-channel lidar, and (c) shows the depth map that would be obtained if the lidar performed a uniformly distributed scan at the same density as the low-channel lidar of (b).

Comparing (a) and (b) of FIG. 1, in (a) depth information is acquired over 64 channels at an input density of 4.47%, giving a root mean square error (RMSE) of 856, whereas in (b) depth information is acquired over 16 channels at a density of 1.16%, giving an RMSE of 1573. That is, the high-density depth map scanned by the high-channel lidar in (a) is markedly more accurate than the low-density depth map of (b). However, the cost of a lidar rises exponentially as its scan density increases, so low-density low-channel lidars are generally used. With a low-channel lidar, the sparse scan density can cause some objects in the scene to be missed because they are never scanned. In particular, because most lidars scan line by line, with one line per channel, the sparsity is especially pronounced in the vertical direction.

Comparing (b) and (c), even though both scans have the same density (1.16%), the depth map of the distributed scan in (c) has an RMSE of 1181 and is therefore more accurate than (b). As noted above, however, hardware limitations force a lidar to scan line by line as in (a) and (b), and it cannot perform a distributed scan as in (c). Consequently, when a low-channel lidar is used to reduce cost, it is difficult to obtain accurate depth information.

Lidar also has the limitation that it cannot provide color information, which is useful for understanding a scene.

Meanwhile, when depth information is obtained by applying a matching algorithm to the stereo images captured by a stereo camera, depth information and color information can be obtained together at the required high density. However, the error of the depth information obtained by a stereo matching algorithm grows in proportion to the square of the distance from the camera, so at present it is recommended to measure depth only up to approximately 20 m.
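The quadratic growth of stereo depth error with distance follows from the standard pinhole stereo relation Z = f·b/d: a small disparity error δd maps to a depth error of roughly Z²/(f·b)·δd. The sketch below illustrates this under that assumption. The function names and the rig parameters (`FOCAL_PX`, `BASELINE_M`) are hypothetical and are not taken from the patent.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from the standard pinhole stereo relation Z = f * b / d."""
    return focal_px * baseline_m / disparity_px


def stereo_depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth error for a given disparity error: Z**2 / (f * b) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical rig parameters, chosen only for illustration.
FOCAL_PX = 720.0
BASELINE_M = 0.54

if __name__ == "__main__":
    for z in (10.0, 20.0, 40.0):
        # Doubling the distance quadruples the expected depth error.
        print(z, stereo_depth_error(z, FOCAL_PX, BASELINE_M))
```

Under this model the error at 40 m is four times the error at 20 m, which is consistent with the practice of limiting stereo depth to short ranges.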

Accordingly, various fusion techniques have been proposed that combine a low-channel lidar with a stereo camera so that the low-density sparse depth information from the lidar is complemented by the high-density depth information from the stereo camera. However, these techniques either fail to provide reliable depth information in outdoor environments or suffer a large drop in accuracy in regions without lidar depth information, so a fusion technique capable of obtaining accurate depth information is still needed.

Korean Registered Patent No. 10-1976290 (registered April 30, 2019)

An object of the present invention is to provide a depth estimation apparatus and method capable of providing accurate depth information by fusing the depth information obtained with a low-cost low-channel lidar and a stereo camera.

Another object of the present invention is to provide a depth estimation apparatus and method capable of providing accurate high-density depth information by fusing the sparse depth information obtained from a low-channel lidar with the inaccurate depth information obtained from a stereo camera, using color information as a guide.

A depth estimation apparatus according to an embodiment of the present invention for achieving the above objects includes: a lidar depth feature extraction unit that obtains a lidar input label from depth data acquired by a lidar and extracts lidar depth features by encoding the lidar input label stage by stage according to a pre-learned pattern estimation method; a stereo depth feature extraction unit that obtains a stereo input label from a disparity map acquired by a stereo camera and extracts stereo depth features by encoding the stereo input label stage by stage according to a pre-learned pattern estimation method; a color guide feature extraction unit that obtains fusion features by fusing, according to a pre-learned method, guide features extracted from a color image acquired by the stereo camera according to a pre-learned pattern estimation method with the lidar depth features and the stereo depth features extracted at the corresponding stage, and that extracts guide features stage by stage by encoding the fusion features obtained at each stage; and a decoder that receives the fusion feature finally extracted by the color guide feature extraction unit, decodes it according to a pre-learned feature reconstruction method to obtain a reconstructed feature, combines the obtained reconstructed feature with the corresponding one of the fusion features extracted stage by stage to obtain a combined reconstructed feature, obtains a further reconstructed feature from the combined reconstructed feature, and outputs the finally obtained reconstructed feature as a depth map.

The lidar depth feature extraction unit may include: a lidar depth map acquisition unit that obtains the lidar input label by projecting the 3D low-density depth data acquired by the lidar onto 2D image coordinates; and a plurality of lidar feature extraction units connected in a multi-stage structure, each of which combines the lidar input label, or the lidar depth feature obtained at the previous stage, with a mask obtained by learning and encodes the result according to a pre-learned pattern estimation method to obtain a lidar depth feature.

The stereo depth feature extraction unit may include: a stereo depth map acquisition unit that receives the disparity map acquired by the stereo camera, matches it to the scan region of the lidar to obtain a stereo depth map, and converts the density of the stereo depth map to a density corresponding to the lidar input label to obtain the stereo input label; and a plurality of stereo feature extraction units connected in a multi-stage structure, each of which combines the stereo input label, or the stereo depth feature obtained at the previous stage, with a mask obtained by learning and encodes the result according to a pre-learned pattern estimation method to obtain a stereo depth feature.

The color guide feature extraction unit may include: a plurality of fusion feature extraction units, each of which receives the color image or the fusion feature obtained at the previous stage and extracts a guide feature according to a pre-learned pattern estimation method; and a plurality of feature fusion units, each of which obtains a fusion feature by fusing, according to a pre-learned method, the guide feature extracted by the corresponding fusion feature extraction unit with the lidar depth feature supplied by the corresponding lidar feature extraction unit and the stereo depth feature supplied by the corresponding stereo feature extraction unit.

Each of the feature fusion units may also receive the masks used by the corresponding lidar feature extraction unit and the corresponding stereo feature extraction unit, and may output the fusion feature by fusing the lidar depth feature, the stereo depth feature, and the masks according to weights determined by learning.

The decoder may include: a plurality of decoding units arranged in the reverse order of the fusion feature extraction units, each of which receives either the fusion feature finally extracted by the color guide feature extraction unit or a combined reconstructed feature and decodes it according to a pre-learned feature reconstruction method to output a reconstructed feature; and a plurality of combining units, each of which combines the reconstructed feature output by the corresponding decoding unit with the fusion feature passed from the corresponding fusion feature extraction unit to output a combined reconstructed feature.

A depth estimation method according to another embodiment of the present invention for achieving the above objects includes: obtaining a lidar input label from depth data acquired by a lidar and extracting lidar depth features by encoding the lidar input label stage by stage according to a pre-learned pattern estimation method; obtaining a stereo input label from a disparity map acquired by a stereo camera and extracting stereo depth features by encoding the stereo input label stage by stage according to a pre-learned pattern estimation method; obtaining fusion features by fusing, according to a pre-learned method, guide features extracted from a color image acquired by the stereo camera according to a pre-learned pattern estimation method with the lidar depth features and the stereo depth features extracted at the corresponding stage, and extracting guide features by encoding the fusion features obtained at each stage; and receiving the finally extracted fusion feature, decoding it according to a pre-learned feature reconstruction method to obtain a reconstructed feature, combining the obtained reconstructed feature with the corresponding one of the fusion features extracted stage by stage to obtain a combined reconstructed feature, obtaining a further reconstructed feature from the combined reconstructed feature, and outputting the finally obtained reconstructed feature as a depth map.

Accordingly, the depth estimation apparatus and method using a low-channel lidar and a stereo camera according to embodiments of the present invention fuse the sparse depth information obtained from the low-channel lidar with the inaccurate depth information obtained from the stereo camera, using the color information obtained from the camera as a guide, and can thereby generate a dense, accurate depth map even with a low-cost low-channel lidar.

FIG. 1 is a diagram showing the relationship between lidar density and accuracy.
FIG. 2 is a diagram for explaining the concept by which the depth estimation apparatus according to the present embodiment lets the depth maps obtained from a lidar and a stereo camera complement each other.
FIG. 3 shows the average distance measured per channel in a lidar depth map and the error as a function of the distance sensed with a stereo camera.
FIG. 4 shows a schematic structure of a depth estimation apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the operation of the depth estimation apparatus of FIG. 4.
FIG. 6 is a diagram for explaining the operation of the fusion feature extraction unit of FIG. 4.
FIG. 7 shows the results of measuring the performance of the depth estimation apparatus according to the present embodiment on benchmark data.
FIG. 8 is a diagram for explaining how the obtained depth map differs with the number of lidar channels and the scan direction.
FIG. 9 shows the results of simulating the error as a function of the number of lidar channels.
FIG. 10 shows a depth estimation method according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the content described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described. To describe the present invention clearly, parts irrelevant to the description are omitted, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 2 is a diagram for explaining the concept by which the depth estimation apparatus according to the present embodiment lets the depth maps obtained from a lidar and a stereo camera complement each other, and FIG. 3 shows the average distance measured per channel in a lidar depth map and the error as a function of the distance sensed with a stereo camera.

In FIG. 2, (a) is a depth map measured by the lidar as one line per channel, with each channel drawn in a different color. Identifiers (IDs) are assigned sequentially starting from the uppermost of the lidar's channels; green indicates channels with low IDs and red indicates channels with high IDs. In most lidars mounted on vehicles and the like, laser light is emitted in a spherical pattern according to the sensor angle, so the depth measured by channels that emit laser light in the horizontal direction, that is, channels with low IDs, is generally larger.

(b) is a depth error map obtained using the stereo camera, colored according to the magnitude of the error relative to the actual distance. In (b), blue indicates a small error, and the error increases toward red.

In FIG. 3, (a) shows the average of the distances measured by each lidar channel, as illustrated in (a) of FIG. 2, and (b) shows the average error as a function of distance in the depth error map of (b) of FIG. 2. Referring to (b) of FIG. 2 and (b) of FIG. 3, the error of the stereo camera increases very sharply with distance.

That is, (a) and (b) of FIG. 2 and (a) and (b) of FIG. 3 show that a lidar can measure depth accurately even at long range but, depending on its number of channels, leaves a very large fraction of the scene unmeasured. A stereo camera, by contrast, can measure most of the scene, but its measurement error grows substantially with distance.

Accordingly, in view of the complementary characteristics of the lidar and the stereo camera, the present embodiment has the low-channel lidar concentrate its scan on the horizontal direction, where it measures relatively long distances, and supplements the depth information of the remaining regions with data obtained from the stereo camera as in (c), so that an accurate depth map can be obtained as shown in (d). In particular, the present embodiment uses the color information of the color image as a guide for fusing the depth information acquired by the lidar with the depth information acquired by the stereo camera, so that an accurate depth map can be extracted.

FIG. 4 shows a schematic structure of a depth estimation apparatus according to an embodiment of the present invention, FIG. 5 is a diagram for explaining the operation of the depth estimation apparatus of FIG. 4, and FIG. 6 is a diagram for explaining the operation of the fusion feature extraction unit of FIG. 4.

Referring to FIGS. 4 and 5, the depth estimation apparatus according to the present embodiment may include an encoder 100, which extracts depth features of the valid region according to a pre-learned pattern estimation method based on the depth data acquired by the low-channel lidar and on the disparity map and color image acquired by the stereo camera, and a decoder 200, which decodes the features extracted by the encoder 100 according to a pre-learned method to generate a depth map in which the depth information of the invalid region is filled in.

The encoder 100 may include a color guide feature extraction unit 110, a lidar depth feature extraction unit 120, and a stereo depth feature extraction unit 130.

The lidar depth feature extraction unit 120 receives the low-density depth data acquired by the low-channel lidar and extracts depth features of the valid region from it. The lidar depth feature extraction unit 120 may include a lidar depth map acquisition unit 121, which obtains a lidar input label (S_L), a low-density depth map, from the low-density depth data acquired by the lidar, and a plurality of lidar feature extraction units 122 to 124, connected in a multi-stage structure, which extract depth features of the valid region from the lidar input label (S_L).

The lidar depth map acquisition unit 121 converts the low-density depth data, which the low-channel lidar acquires as 3D world coordinate values, into 2D image coordinates so that it can be matched with the stereo depth map and the color image, thereby obtaining the lidar input label (S_L). The emission direction of the low-channel lidar's lasers may be adjusted so that depth information is acquired in the horizontal direction as far as possible. As noted above, the stereo camera has a very large error when measuring depth at long range, so this adjustment lets the lidar measure long-range depth values as accurately as possible even with a small number of channels.

The lidar input label (S_L) can be obtained by projecting the 3D lidar point cloud (d_L) of the low-density depth data, as in Equation 1, onto the 2D image coordinates of one of the pair of stereo images acquired by the stereo camera (here, for example, the left image) (I_L), based on a calibration matrix (H) and an intrinsic camera matrix (P).

Here, (x, y, z) and (u, v) denote the position of a lidar point in 3D world coordinates and the position of the corresponding pixel in 2D image coordinates, respectively.
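Equation 1 is not reproduced in this text. A projection of this kind is commonly written in homogeneous coordinates as w·(u, v, 1)ᵀ = P·H·(x, y, z, 1)ᵀ, and the sketch below assumes that form. The shapes chosen for H (4×4 lidar-to-camera transform) and P (3×4 projection matrix), the function names, and the convention that 0.0 marks an invalid pixel are all assumptions made for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def project_lidar_point(point_xyz, H, P):
    """Project one 3D lidar point to (u, v, depth); None if it lies behind the camera.

    H: assumed 4x4 lidar-to-camera calibration matrix.
    P: assumed 3x4 camera projection matrix.
    """
    x, y, z = point_xyz
    cam = mat_vec(H, [x, y, z, 1.0])   # lidar frame -> camera frame
    uvw = mat_vec(P, cam)              # camera frame -> homogeneous pixel
    if uvw[2] <= 0:
        return None
    return uvw[0] / uvw[2], uvw[1] / uvw[2], uvw[2]


def lidar_input_label(points, H, P, width, height):
    """Build a sparse depth map; pixels with no lidar return stay 0.0 (invalid)."""
    label = [[0.0] * width for _ in range(height)]
    for point in points:
        projected = project_lidar_point(point, H, P)
        if projected is None:
            continue
        u, v, depth = projected
        col, row = int(round(u)), int(round(v))  # nearest pixel
        if 0 <= col < width and 0 <= row < height:
            label[row][col] = depth
    return label
```

Because a low-channel lidar produces few returns, most entries of the resulting map remain invalid, which is why the later stages pair the label with a validity mask.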

As shown in FIG. 5, the lidar feature extraction units 122 to 124 each combine the lidar input label (S_L) obtained by the lidar depth map acquisition unit 121, or the lidar depth feature obtained by the preceding lidar feature extraction unit (122 or 123), with the corresponding mask and encode the result according to the learned pattern estimation method to extract valid depth features. The mask is combined with the lidar input label (S_L) or the lidar depth feature because the lidar input label (S_L), a low-density depth map, has only sparse valid values: the aim is to extract depth features from regions that hold valid values while preventing values in invalid regions from being extracted as depth features.

Meanwhile, the stereo depth feature extraction unit 130 extracts depth features from the disparity map (D_S) acquired by the stereo camera. The stereo depth feature extraction unit 130 may include a stereo depth map acquisition unit 131, which receives the disparity map (D_S) acquired by the stereo camera, aligns it in consideration of the imaging characteristics of the camera to obtain a stereo depth map, and converts the density of the stereo depth map to match that of the lidar input label (S_L) to obtain a stereo input label (S_S), a low-density stereo depth map; and a plurality of stereo feature extraction units 132 to 134, connected in a multi-stage structure, which extract depth features of the valid region from the stereo input label (S_S).

The stereo depth map acquisition unit 131 obtains the stereo depth map according to Equation 2 by applying the horizontal focal length (f_u) and the baseline (b_s) of the stereo camera to the disparity map (D_S), which the stereo camera derives from the difference between the left and right images. This is done to match the disparity map (D_S) acquired by the stereo camera to the region scanned by the lidar.
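Equation 2 is not reproduced in this text. Given that it applies the horizontal focal length f_u and the baseline b_s to the disparity map, the sketch below assumes the usual pinhole stereo relation depth = f_u · b_s / disparity. The function name and the choice to map non-positive disparities to 0.0 (invalid) are assumptions for illustration.

```python
def disparity_to_depth(disparity_map, focal_u, baseline):
    """Convert a disparity map (pixels) to a depth map via depth = f_u * b_s / d.

    Non-positive disparities carry no depth and are mapped to 0.0 (invalid).
    """
    return [
        [focal_u * baseline / d if d > 0 else 0.0 for d in row]
        for row in disparity_map
    ]
```

With the focal length in pixels and the baseline in meters, the resulting depth is in meters and can be compared directly with projected lidar depths.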

However, because the stereo depth map contains high-density depth information compared with the lidar input label (S_L), which is a low-density depth map, the dense information of the stereo depth map could influence the subsequent fusion of depth information more strongly than the lidar input label (S_L). The stereo depth map acquisition unit 131 therefore applies a randomly generated initial mask to the stereo depth map, as in Equation 3, to obtain a stereo input label (S_S) whose density corresponds to that of the lidar input label (S_L).

Here, ⊙ denotes the element-wise multiplication operator.
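The sparsification step can be sketched as an element-wise product of the dense stereo depth map with a random binary mask. This is only an illustration: the helper names and the `keep_ratio` parameter are hypothetical, and the source states only that the resulting density should correspond to that of the lidar input label.

```python
import random


def random_mask(height, width, keep_ratio, seed=None):
    """Return a binary mask whose entries are 1 with probability keep_ratio."""
    rng = random.Random(seed)
    return [[1 if rng.random() < keep_ratio else 0 for _ in range(width)]
            for _ in range(height)]


def apply_mask(depth, mask):
    """Element-wise product of a depth map and a binary mask."""
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]


dense_depth = [[10.0, 5.0, 8.0], [12.0, 50.0, 7.5]]
mask = random_mask(2, 3, keep_ratio=0.3, seed=0)  # keep_ratio is hypothetical
stereo_input_label = apply_mask(dense_depth, mask)
```

In practice `keep_ratio` would be chosen so that the fraction of retained pixels is close to the fraction of valid pixels in the lidar input label.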

Each of the stereo feature extraction units 132 to 134 combines a corresponding mask with either the stereo input label (S_S) obtained by the stereo depth map acquisition unit 131 or the stereo depth feature obtained by the preceding stereo feature extraction unit (132 or 133), and extracts valid depth features. The mask is combined because the stereo input label (S_S), which has been converted to a low density to match the lidar input label (S_L), has only sparse valid values, and the depth features should be extracted from the regions that hold valid values.

Here, the lidar feature extraction units 122 to 124 and the stereo feature extraction units 132 to 134 are artificial neural networks, and each may in particular be implemented as a partial convolution (PC) layer.

A partial convolution layer is a convolution layer in which not only the weights of each layer but also the mask are learned together during training, so that the influence of invalid pixel regions is minimized. It may be configured to perform the partial convolution filtering operation of Equation 4.

Here, X is the two-dimensional matrix input to the partial convolution layer, whose elements are the feature values (depth values) x at each pixel (u,v); it is the lidar input label (S_L) or a lidar depth feature, or the stereo input label (S_S) or a stereo depth feature. W is a weight matrix whose elements are the weights w, and b is a bias value. M is a mask matrix whose elements are m. The mask elements m of the mask matrix M, which is learned to correspond to the lidar input label (S_L) or lidar depth feature and to the stereo input label (S_S) or stereo depth feature, may be updated according to Equation 5.

According to Equation 5, each element m of the mask matrix M is set to 1 or 0 depending on its sum with the values of the neighboring elements.

However, when the mask element m(u,v) at position (u,v) of the mask matrix M is updated to 0 according to Equation 5, the output of the partial convolution layer at (u,v) is set to a zero vector so that no division-by-zero error occurs in Equation 4.
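Equations 4 and 5 are not reproduced in the text, so the following is a minimal single-channel sketch of a partial convolution in a commonly used form. It rests on stated assumptions: the weighted sum over valid inputs is normalized by the number of valid inputs in the window, the mask is updated to 1 wherever at least one valid input exists, and positions outside the map count as invalid. The function name and the toy data are hypothetical.

```python
def partial_conv2d(x, m, w, b=0.0):
    """Single-channel partial convolution, square kernel, stride 1.

    x: input feature map, m: binary validity mask with the same shape,
    w: k x k kernel weights, b: bias.
    Returns (output map, updated mask).
    """
    rows, cols, k = len(x), len(x[0]), len(w)
    half = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    new_m = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc, valid = 0.0, 0
            for i in range(k):
                for j in range(k):
                    p, q = r + i - half, c + j - half
                    # Only valid (mask == 1), in-range neighbours contribute.
                    if 0 <= p < rows and 0 <= q < cols and m[p][q]:
                        acc += w[i][j] * x[p][q]
                        valid += 1
            if valid > 0:
                # Assumed normalisation: divide by the count of valid inputs.
                out[r][c] = acc / valid + b
                new_m[r][c] = 1
            # With no valid input the output and mask stay 0,
            # so the division above is never performed with valid == 0.
    return out, new_m


# Toy input: a single valid depth value of 6.0 at row 1, column 1.
x = [[0.0] * 5, [0.0, 6.0, 0.0, 0.0, 0.0], [0.0] * 5]
m = [[0] * 5, [0, 1, 0, 0, 0], [0] * 5]
ones = [[1.0] * 3 for _ in range(3)]
out, new_m = partial_conv2d(x, m, ones)
```

In the toy example the valid region grows by one pixel in every direction, which illustrates how stacking such layers gradually fills in a sparse depth map while invalid pixels never contribute to the sums.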

Here, the feature extraction units 124 and 134, that is, those other than the first and second feature extraction units 122, 123, 132, and 133, may be set to have a stride of 2.

For convenience of description, FIG. 4 shows the lidar depth feature extraction unit 120 and the stereo depth feature extraction unit 130 as each including three feature extraction units (122 to 124 and 132 to 134), but the number of feature extraction units can be varied. Accordingly, FIG. 5 is drawn on the assumption that the lidar depth feature extraction unit 120 and the stereo depth feature extraction unit 130 each include five feature extraction units.

However, the lidar depth feature extraction unit 120 and the stereo depth feature extraction unit 130 must include the same number (N_O) of feature extraction units, and the fusion feature extraction units 112 to 114 of the color guide feature extraction unit 110 and the decoding units 212 to 214 of the decoder 200 must also be provided in that same number.

Meanwhile, each of the lidar feature extraction units 122 to 124 of the lidar depth feature extraction unit 120 and each of the stereo feature extraction units 132 to 134 of the stereo depth feature extraction unit 130 passes its extracted depth feature not only to the next-stage feature extraction unit but also to the corresponding feature fusion unit (F1 to F3) of the color guide feature extraction unit 110.

The lidar intermediate feature output from the lidar feature extraction units 122 to 124 can be expressed according to Equation 4, and the stereo intermediate feature output from the stereo feature extraction units 132 to 134 can be expressed in the same way. Here, o denotes the index of the feature extraction units 122 to 124 and 132 to 134, and the elements of the lidar intermediate feature and of the stereo intermediate feature are each denoted individually.

In addition, each of the lidar feature extraction units 122 to 124 of the lidar depth feature extraction unit 120 and each of the stereo feature extraction units 132 to 134 of the stereo depth feature extraction unit 130 passes the corresponding mask (M_L, M_S), together with its intermediate feature, to the corresponding feature fusion unit (F1 to F3) of the color guide feature extraction unit 110.

As shown in FIG. 5, when there are five lidar feature extraction units and five stereo feature extraction units, they may be implemented as partial convolution layers as listed in Table 1.

In Table 1, LN denotes layer normalization and LReLU denotes a leaky rectified linear unit.

The color guide feature extraction unit 110 receives a color image obtained from the stereo camera and, based on the color information of that image, fuses the lidar depth features extracted by the lidar depth feature extraction unit 120 with the stereo depth features extracted by the stereo depth feature extraction unit 130 to extract fusion features.

When the lidar depth features and the stereo depth features are to be fused using color information, it is necessary to consider which pixels' depth information is important. The color guide feature extraction unit 110 therefore uses the color information so that the lidar depth features and the stereo depth features are fused selectively, allowing optimal depth features to be extracted.

The color guide feature extraction unit 110 may include a color image acquisition unit 111, a plurality of fusion feature extraction units 112 to 114, and a plurality of feature fusion units F1 to F3. The color image acquisition unit 111 acquires one of the left image (I_L) and the right image (I_R), which are the color images captured by the stereo camera. The description here assumes, as an example, that the left image (I_L) is acquired, but the right image (I_R) may be acquired instead.

The fusion feature extraction units 112 to 114 receive the color image or the fusion feature output from the preceding feature fusion unit (F1 to F3) and extract guide features according to a pre-learned pattern estimation method.

Here, so that the characteristics of the color image used to guide the fusion of the lidar depth features and the stereo depth features are preserved and learned accurately even as the number of units increases, the fusion feature extraction units 112 to 114 may have a residual network (ResNet) structure in which, as shown in FIG. 5, the input fusion feature is carried over a skip connection and concatenated with the output guide feature.

Each of the feature fusion units F1 to F3 selectively fuses, in a learned manner, the guide feature extracted by the corresponding fusion feature extraction unit (112 to 114), the lidar depth feature supplied by the corresponding lidar feature extraction unit (122 to 124) of the lidar depth feature extraction unit 120, and the stereo depth feature supplied by the corresponding stereo feature extraction unit (132 to 134) of the stereo depth feature extraction unit 130, and outputs a fusion feature.

At this time, the feature fusion units F1 to F3 also receive the corresponding lidar mask (M_L) and stereo mask (M_S), along with the depth features, from the lidar feature extraction units 122 to 124 and the stereo feature extraction units 132 to 134, and may obtain the fusion feature by fusing the depth features and the masks together.

For convenience of description, FIGS. 4 and 5 show the feature fusion units F1 to F3 separately from the fusion feature extraction units 112 to 114, but the feature fusion units F1 to F3 may be included in the fusion feature extraction units 112 to 114.

As shown in FIG. 6, each of the feature fusion units F1 to F3 selectively fuses the guide feature extracted by the corresponding fusion feature extraction unit (112 to 114), the lidar depth feature and lidar mask supplied by the corresponding lidar feature extraction unit (122 to 124), and the stereo depth feature and stereo mask supplied by the corresponding stereo feature extraction unit (132 to 134), and outputs a fusion feature.

The feature fusion units F1 to F3 may output the fusion feature by fusing the features and the masks according to Equation 6.

Here, the fusion feature, the guide feature, the lidar depth feature, and the stereo depth feature are each defined at pixel (u,v), as are the lidar mask and the stereo mask. The guide weight, the lidar weight, and the stereo weight are likewise defined for pixel (u,v).

In Equation 6, the lidar weight and the stereo weight can each be divided, as shown in FIG. 6, into a weight for the lidar or stereo depth feature and a weight for the lidar or stereo mask, but they are expressed here as integrated weights.
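Equation 6 itself is not recoverable from the text. Purely as an illustration of the description, the sketch below treats the fusion at one pixel as a weighted sum of the guide feature, the lidar depth feature with its mask, and the stereo depth feature with its mask. The linear form, the scalar weights, and the function name are all assumptions; in a trained network the weights would be learned convolution kernels rather than fixed scalars.

```python
def fuse_pixel(g, l, m_l, s, m_s, w_g, w_l, w_s):
    """Fuse guide, lidar, and stereo values at one pixel (assumed linear form).

    w_l and w_s are (feature_weight, mask_weight) pairs, mirroring the split
    into feature weights and mask weights described for FIG. 6.
    """
    lidar_term = w_l[0] * l + w_l[1] * m_l
    stereo_term = w_s[0] * s + w_s[1] * m_s
    return w_g * g + lidar_term + stereo_term


# Hypothetical values: lidar is valid at this pixel, stereo is not.
fused = fuse_pixel(g=0.5, l=2.0, m_l=1, s=0.0, m_s=0,
                   w_g=2.0, w_l=(0.5, 1.0), w_s=(0.25, 1.0))
```

Passing the masks alongside the features lets the fusion distinguish a pixel whose depth feature is zero because it is invalid from one whose feature is genuinely small.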

In this embodiment, the lidar depth feature extraction unit 120 and the stereo depth feature extraction unit 130 of the encoder 100 independently extract, in a multi-stage structure, the depth features of the lidar depth map and of the stereo depth map, and the depth features extracted at each stage are passed to the corresponding stage of the separately configured multi-stage color guide feature extraction unit 110 to be fused. This preserves the depth features of the lidar depth map obtained through the lidar and of the stereo depth map obtained through the stereo camera as far as possible, while fusing them repeatedly at each stage with the color information as a guide, so that the depth features are extracted accurately.

The decoder 200 receives the fusion feature obtained by the encoder 100 and decodes it to obtain a depth map. The decoder 200 may include a plurality of decoding units 212 to 214, a plurality of combining units C1 and C2, and a depth map output unit 211.

The decoding units 212 to 214 are arranged to correspond, in reverse order, to the fusion feature extraction units 112 to 114. According to a learned feature restoration method, they sequentially decode the fusion feature supplied by the encoder 100 or the combined restored feature output from the combining units C1 and C2, and output restored features.

The combining units C1 and C2 combine the restored features output from the corresponding decoding units 213 and 214 with the fusion features output from the corresponding feature fusion units F2 and F3, and output the combined restored features to the next-stage decoding units 212 and 213. Because the combining units C1 and C2 receive the fusion features output from the corresponding feature fusion units F2 and F3 over skip connections and combine them with the restored features, the decoding units 212 to 214 can take into account the information from before the depth features were extracted and thus restore the depth accurately.

The depth map output unit 211 outputs the depth map restored by the decoding units 212 to 214. The restored depth map is one in which the sparse, low-density lidar depth information and the stereo depth information have been fused and then densified based on the color information of the color image.

As shown in FIG. 5, when there are five fusion feature extraction units and five decoding units, they may be implemented with ResNet blocks and convolution layers as listed in Table 2.

Because the depth estimation apparatus of this embodiment shown in FIG. 4 is implemented as an artificial neural network, it must be trained. Training may be performed using, as training data, color images labeled with previously measured depth values. During training, the loss may be obtained as in Equation 7 by computing the L2 norm of the difference between the depth values (S_G) of the depth map output from the depth map output unit 211 and the depth values (S_V) of the training data.

Here, p denotes a coordinate (u,v) in the depth map, and Ω(S_V) denotes the set of coordinates that hold valid depth values. Ω(S_V) is used so that the loss is evaluated only at valid positions, since most currently available training data also contain missing values.
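A loss restricted to valid ground-truth positions can be sketched as follows. Equation 7 is not reproduced in the text, so the exact form is an assumption: here the loss is the L2 norm of the difference over the valid set, and invalid ground-truth pixels are assumed to be encoded as 0.0. The function name and sample values are hypothetical.

```python
import math


def masked_l2_loss(pred, target, invalid=0.0):
    """L2 norm of (pred - target) over pixels where the target is valid."""
    squared_sum = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t != invalid:  # the pixel belongs to the valid set
                squared_sum += (p - t) ** 2
    return math.sqrt(squared_sum)


pred = [[10.0, 5.0], [7.0, 50.0]]
target = [[13.0, 0.0], [3.0, 50.0]]  # 0.0 marks a missing ground-truth value
loss = masked_l2_loss(pred, target)
```

The prediction at the missing ground-truth pixel does not affect the loss, so the network is not penalized where no reference depth exists.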

Once the loss is calculated according to Equation 7, it is backpropagated through the decoder 200 and the encoder 100 to perform training.

FIG. 7 shows the results of measuring the performance of the depth estimation apparatus according to this embodiment using benchmark data.

FIG. 7 uses the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) benchmark data. (a) shows the input color image, and (b) shows the depth data obtained from the low-channel lidar. (c) shows the result of extracting depth information when the stereo feature extraction units 132 to 134 do not send the extracted stereo features to the corresponding feature fusion units F1 to F3, and (d) shows the case in which the combining units C1 and C2 of the decoder 200 do not combine the restored features with the fusion features. (e) shows the depth map obtained according to this embodiment, and (f) shows the depth map whose actual depth was measured with, for example, a high-channel lidar.

As shown in FIG. 7(b), this embodiment is configured so that all channels of the low-channel lidar scan concentrated in the horizontal direction, so that distances as far as possible can be measured, and the regions not scanned by the low-channel lidar are supplemented using the depth information and color information obtained from the stereo camera.

Table 3 presents a quantitative evaluation of the results in FIG. 7.

Table 3 compares the performance of conventional depth estimation techniques with that of this embodiment under each of the following conditions: when the lidar depth map acquisition unit 121 does not acquire the lidar input label (S_L) or the stereo input label (S_S); when the stereo feature extraction units 132 to 134 do not send the extracted stereo features to the corresponding feature fusion units F1 to F3; when the combining units C1 and C2 of the decoder 200 do not combine the restored features with the fusion features; and when the feature fusion units F1 to F3 do not selectively fuse the guide features, the lidar depth features, and the stereo depth features.

As shown in FIG. 7 and Table 3, the depth estimation apparatus according to this embodiment has a lower root mean square error (RMSE) and mean absolute error (MAE) than the other cases. That is, it can measure depth accurately.
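For reference, RMSE and MAE over the valid ground-truth pixels can be computed as in the sketch below. The convention that invalid ground-truth pixels are stored as 0.0, the function name, and the sample values are assumptions for illustration and are not the data behind Table 3.

```python
import math


def rmse_mae(pred, target, invalid=0.0):
    """Return (RMSE, MAE) over pixels whose ground-truth value is valid."""
    errors = [p - t
              for p_row, t_row in zip(pred, target)
              for p, t in zip(p_row, t_row)
              if t != invalid]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Hypothetical 2 x 2 maps: one pixel is off by 2.0, the rest are exact.
rmse, mae = rmse_mae([[10.0, 5.0], [7.0, 52.0]],
                     [[10.0, 5.0], [7.0, 50.0]])
```

Because RMSE squares the errors before averaging, a few large depth errors raise it more than they raise MAE, which is why the two metrics are usually reported together.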

FIG. 8 is a diagram for explaining the differences in the depth maps obtained depending on the number of lidar channels and the scan direction.

In FIG. 8, (a) shows the color image and the measured depth map, and (b) shows the result of applying lidar depth data obtained from a high-channel lidar to this embodiment.

(c) shows the lidar depth data obtained when the channels of the low-channel lidar are spread at equal angles for scanning, together with the depth map obtained by applying that data. (d) to (f) show the depth data for each scan direction when the low-channel lidar performs a concentrated scan, together with the depth maps obtained by applying that data.

As shown in FIG. 8(f), when the low-channel lidar scans concentrated in the horizontal direction, the accuracy is much higher than when a distributed scan is performed as in (c) or when the scan is concentrated toward the ground as in (f), and a result similar to that obtained with the high-channel lidar shown in (b) can be achieved.

FIG. 9 shows the results of simulating the error as a function of the number of lidar channels.

FIG. 9 compares the depth estimation apparatus according to this embodiment with the existing Sparse-To-Dense technique. Referring to FIG. 9, when the lidar scans with the same number of channels, the error of the depth estimation apparatus according to this embodiment is greatly reduced even with a small number of scan lines. That is, an accurate depth map can be obtained even with a low-channel lidar.

FIG. 10 shows a depth estimation method according to an embodiment of the present invention.

The depth estimation method of FIG. 10 is broadly divided into an encoding step and a decoding step.

Referring to FIGS. 4 to 6, in the encoding step the 3D low-density depth data obtained from the low-channel lidar is first projected onto 2D image coordinates as in Equation 1 to obtain the lidar input label (S_L) (S11). The lidar feature extraction units 122 to 124 then extract lidar depth features from the obtained lidar input label (S_L) stage by stage according to a pre-learned pattern estimation method (S12).
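Step S11 can be illustrated with a simple pinhole projection. Equation 1 does not appear in this section, so the sketch rests on assumptions: the lidar points are already expressed in the camera coordinate frame, the camera is described by focal lengths `f_u`, `f_v` and a principal point `c_u`, `c_v` (the last three are hypothetical names), pixels that receive no point are stored as 0.0, and the nearest point wins when several fall on the same pixel.

```python
def project_lidar_to_image(points, f_u, f_v, c_u, c_v, height, width):
    """Project 3D points (x, y, z) in the camera frame to a sparse depth map.

    Pixels that receive no point keep the value 0.0 (treated as invalid).
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # the point lies behind the camera
        col = round(f_u * x / z + c_u)  # horizontal image coordinate
        row = round(f_v * y / z + c_v)  # vertical image coordinate
        if 0 <= col < width and 0 <= row < height:
            # Keep the nearest point when several project to one pixel.
            if depth[row][col] == 0.0 or z < depth[row][col]:
                depth[row][col] = z
    return depth


# Hypothetical intrinsics and points, chosen only to exercise each branch.
points = [(0.0, 0.0, 10.0), (0.0, 0.0, 8.0), (1.0, 0.0, 50.0),
          (0.0, 0.0, -1.0), (10.0, 0.0, 10.0)]
lidar_input_label = project_lidar_to_image(
    points, f_u=100.0, f_v=100.0, c_u=2.0, c_v=1.0, height=3, width=5)
```

With a real sensor, an extrinsic lidar-to-camera transform would be applied to the points before this projection.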

At the same time, the characteristics of the camera are applied to the disparity map (D_S) obtained from the stereo camera according to Equation 2, and the result is converted to the same low density as the lidar input label (S_L) as in Equation 3 to obtain the stereo input label (S_S) (S13). The stereo feature extraction units 132 to 134 then extract stereo depth features from the obtained stereo input label (S_S) stage by stage according to a pre-learned pattern estimation method (S14).

Here, because the lidar input label (S_L) and the stereo input label (S_S) are low-density 2D images, the lidar depth features and the stereo depth features can each be extracted by partial convolution operations whose masks and weights are determined by training.

Meanwhile, one color image of the pair of images obtained from the stereo camera is acquired (S15). Color features are then extracted from the acquired color image according to a pre-learned pattern estimation method (S16).

Thereafter, the extracted color features are fused, according to a pre-learned method, with the lidar depth features and stereo depth features extracted at the corresponding stage to obtain a fusion feature (S17). At this time, the masks used to extract the lidar depth features and the stereo depth features may be fused together with them so that selective fusion is performed.

Features are then extracted again from the fusion feature to obtain a guide feature, and the guide feature is repeatedly fused with the lidar depth feature and stereo depth feature extracted at the corresponding stage to extract fusion features (S18).

When this has been repeated over all stages and the fusion features have been extracted, the decoding step is performed.

In the decoding step, the fusion feature finally obtained in the encoding step is first received and decoded according to a pre-learned feature restoration method to obtain a restored feature (S21). The restored feature is then combined with the fusion feature generated at the corresponding stage of the encoding step to obtain a combined restored feature (S22). Decoding is then performed again, stage by stage, on the combined restored feature to obtain a restored feature, which is combined again to obtain a combined restored feature (S23). Finally, the restored feature obtained by decoding at the last stage is output as the depth map (S24).
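The control flow of steps S21 to S24 can be sketched with placeholder stages. Everything here is illustrative: the decoding stages and the combining operation are passed in as plain callables, features are represented as lists of labels, and the pairing of skip features with decoding stages is assumed to be supplied in decoding order (deepest first).

```python
def run_decoder(final_fused, skip_features, decoders, combine):
    """Decode the final fusion feature, merging one skip feature before
    each decoding stage after the first.

    skip_features must hold one entry fewer than decoders.
    """
    x = decoders[0](final_fused)          # S21: first restored feature
    for decode, skip in zip(decoders[1:], skip_features):
        x = decode(combine(x, skip))      # S22 and S23: combine, then decode
    return x                              # S24: output as the depth map


# Toy stand-ins: each "decoder" appends a tag, "combine" concatenates lists.
decoders = [lambda f: f + ["d3"], lambda f: f + ["d2"], lambda f: f + ["d1"]]
result = run_decoder(["F_final"], [["F_b"], ["F_a"]], decoders,
                     combine=lambda x, s: x + s)
```

With three decoding stages there are two combining steps, which matches the two combining units C1 and C2 placed between the three decoding units.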

The method according to the present invention may be implemented as a computer program stored on a medium for execution on a computer. The computer-readable medium may be any available medium that can be accessed by a computer and may include all computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely examples, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: encoder 200: decoder
110: color guide feature extraction unit 112 to 114: fusion feature extraction units
F1 to F3: feature fusion units 120: lidar depth feature extraction unit
121: lidar depth map acquisition unit 122 to 124: lidar feature extraction units
130: stereo depth feature extraction unit 131: stereo depth map acquisition unit
132 to 134: stereo feature extraction units 212 to 214: decoding units
C1, C2: combining units 211: depth map output unit

Claims

A depth estimation apparatus comprising:
a lidar depth feature extraction unit that obtains a lidar input label from depth data obtained from a lidar and extracts lidar depth features by encoding the lidar input label in stages according to a pre-learned pattern estimation method;
a stereo depth feature extraction unit that obtains a stereo input label from a disparity map obtained from a stereo camera and extracts stereo depth features by encoding the stereo input label in stages according to a pre-learned pattern estimation method;
a color guide feature extraction unit that extracts guide features in stages by obtaining a fusion feature, in which a guide feature extracted from a color image obtained from the stereo camera according to a pre-learned pattern estimation method is fused, according to a pre-learned method, with the lidar depth feature and the stereo depth feature extracted in the corresponding stage, and by encoding the fusion feature obtained at each stage; and
a decoder that receives the fusion feature finally extracted by the color guide feature extraction unit and decodes it according to a pre-learned feature restoration method to obtain a restoration feature, combines the obtained restoration feature with the corresponding fusion feature among the fusion features extracted in stages to obtain a combined restoration feature, obtains a restoration feature again from the combined restoration feature, and outputs the finally obtained restoration feature as a depth map,
wherein the lidar depth feature extraction unit comprises:
a lidar depth map acquisition unit that obtains the lidar input label by projecting three-dimensional low-density depth data obtained from the lidar onto two-dimensional image coordinates; and
a plurality of lidar feature extraction units connected in a multi-stage structure, each of which combines the lidar input label or the lidar depth feature obtained from the previous stage with a mask obtained by learning and encodes the result according to a pre-learned pattern estimation method to obtain the lidar depth feature.
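As an illustration of projecting three-dimensional low-density lidar data onto two-dimensional image coordinates, the following minimal sketch uses a standard pinhole camera model. It assumes the points are already expressed in the camera coordinate frame; the function name and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the claim.

```python
def project_lidar_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project (x, y, z) camera-frame points into a sparse depth image.

    Pixels that receive no lidar return keep the value 0.0. When several
    points land on the same pixel, the nearest one is kept (an assumption).
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # behind the camera, cannot be projected
        u = int(round(fx * x / z + cx))  # column index
        v = int(round(fy * y / z + cy))  # row index
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Toy example with made-up intrinsics and three points.
sparse_depth = project_lidar_to_depth_map(
    [(0.0, 0.0, 5.0), (1.0, 0.0, 10.0), (0.0, 0.0, -1.0)],
    fx=10.0, fy=10.0, cx=2.0, cy=1.0, width=4, height=3,
)
```

With a low-channel lidar, most pixels of the resulting map stay at zero, which is why the later stages work with masks.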

delete

The depth estimation apparatus of claim 1, wherein the stereo depth feature extraction unit comprises:
a stereo depth map acquisition unit that receives the disparity map obtained from the stereo camera, matches it to the scan area of the lidar to obtain a stereo depth map, and converts the density of the stereo depth map to a density corresponding to the lidar input label to obtain the stereo input label; and
a plurality of stereo feature extraction units connected in a multi-stage structure, each of which combines the stereo input label or the stereo depth feature obtained from the previous stage with a mask obtained by learning and encodes the result according to a pre-learned pattern estimation method to obtain the stereo depth feature.
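The claim does not state how the disparity map is converted into depth values. A minimal sketch, assuming a rectified stereo pair and the standard relation depth = focal length × baseline / disparity, could look like this; `focal_px` and `baseline_m` are hypothetical parameter names, and the matching to the lidar scan area and the density conversion are not covered here.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (metres).

    Pixels with non-positive disparity are treated as invalid and set to 0.0.
    """
    depth = []
    for row in disparity:
        depth.append([
            focal_px * baseline_m / d if d > 0.0 else 0.0
            for d in row
        ])
    return depth


# Toy example with made-up calibration values.
stereo_depth = disparity_to_depth(
    [[10.0, 0.0], [20.0, 5.0]], focal_px=100.0, baseline_m=0.5
)
```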

The depth estimation apparatus of claim 3, wherein each of the plurality of lidar feature extraction units and the plurality of stereo feature extraction units
extracts the lidar depth feature or the stereo depth feature by combining the lidar depth feature or the stereo depth feature obtained in the previous stage with a mask and performing a partial convolution operation.
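For orientation, a partial convolution in its commonly used form can be sketched as below for a single channel. The sketch assumes a binary validity mask (1 where a depth value exists), whereas the claims refer to a mask obtained by learning, so this is an approximation of the idea rather than the claimed operation. The function name and the renormalisation by the window size follow the usual formulation and are not taken from the source.

```python
def partial_conv2d(feature, mask, kernel, bias=0.0):
    """Single-channel partial convolution without padding.

    Only positions where mask == 1 contribute. The response is rescaled by
    (window size / number of valid positions), and the output mask is 1
    wherever at least one valid input fell inside the window.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature) - kh + 1
    out_w = len(feature[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            valid = 0
            for di in range(kh):
                for dj in range(kw):
                    m = mask[i + di][j + dj]
                    acc += kernel[di][dj] * feature[i + di][j + dj] * m
                    valid += m
            if valid > 0:
                out[i][j] = acc * (kh * kw) / valid + bias
                new_mask[i][j] = 1
    return out, new_mask


# Toy example: two valid pixels inside a 3x3 window, all-ones kernel.
feature = [[0.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 2.0]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
ones = [[1.0] * 3 for _ in range(3)]
out, out_mask = partial_conv2d(feature, mask, ones)
```

Because the output mask grows with every stage, holes in the sparse input are filled in progressively as the features pass through the multi-stage structure.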

The depth estimation apparatus of claim 3, wherein the color guide feature extraction unit comprises:
a plurality of fusion feature extraction units, each of which receives the color image or the fusion feature obtained from the previous stage and extracts a guide feature according to a pre-learned pattern estimation method; and
a plurality of feature fusion units, each of which obtains a fusion feature by fusing, according to a pre-learned method, the guide feature extracted by the corresponding fusion feature extraction unit with the lidar depth feature received from the corresponding lidar feature extraction unit among the plurality of lidar feature extraction units and the stereo depth feature received from the corresponding stereo feature extraction unit among the plurality of stereo feature extraction units.

The depth estimation apparatus of claim 5, wherein each of the plurality of feature fusion units
also receives the masks used by the corresponding lidar feature extraction unit and the corresponding stereo feature extraction unit, and outputs the fusion feature by fusing the lidar depth feature, the stereo depth feature, and the masks according to weights determined by learning.

The depth estimation apparatus of claim 5, wherein each of the plurality of fusion feature extraction units
has a residual network structure that extracts a feature from the received fusion feature, combines the extracted feature with the received fusion feature, and outputs the result as the guide feature.
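The residual structure described here amounts to adding the input back onto the transformed input. A minimal sketch, assuming element-wise addition as the combining operation and a hypothetical `transform` callable in place of the learned layers:

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element.

    `transform` stands in for the learned feature extraction (an assumption);
    it must return a sequence of the same length as x.
    """
    return [xi + ti for xi, ti in zip(x, transform(x))]


# Toy example: the "learned" transform simply halves each value.
guide = residual_block([1.0, 2.0], lambda v: [0.5 * a for a in v])
```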

The depth estimation apparatus of claim 5, wherein the decoder comprises:
a plurality of decoding units arranged in the reverse order of the plurality of fusion feature extraction units, each of which receives the fusion feature finally extracted by the color guide feature extraction unit or a combined restoration feature, decodes it according to a pre-learned feature restoration method, and outputs a restoration feature; and
a plurality of combining units, each of which outputs a combined restoration feature by combining the restoration feature output from the corresponding decoding unit among the plurality of decoding units with the fusion feature transmitted from the corresponding fusion feature extraction unit.

The depth estimation apparatus of claim 1, wherein the depth data obtained from the lidar
is depth data obtained by scanning in the horizontal direction at the levels that the plurality of channels of the lidar can cover.

The depth estimation apparatus of claim 1, wherein, during learning, the depth estimation apparatus
receives as training data a two-dimensional image whose depth values have been measured in advance, calculates the difference between the measured depth values and the depth map output from the decoder as a loss using the L2 norm, and backpropagates the loss.
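The claim names only the L2 norm, so the exact form of the loss is not specified. One common choice, shown here as an assumption, is the mean of squared differences between the predicted depth map and the measured depth values, together with its gradient with respect to the prediction, which is the quantity that backpropagation would pass to the decoder. The function name is hypothetical.

```python
def l2_loss_and_grad(pred, target):
    """Mean squared difference between two depth maps and its gradient.

    pred, target: 2-D lists of equal shape.
    Returns (loss, grad), where grad has the shape of pred.
    """
    n = sum(len(row) for row in pred)
    loss = 0.0
    grad = []
    for p_row, t_row in zip(pred, target):
        g_row = []
        for p, t in zip(p_row, t_row):
            diff = p - t
            loss += diff * diff
            g_row.append(2.0 * diff / n)  # d(loss)/d(p)
        grad.append(g_row)
    return loss / n, grad


# Toy example: one pixel is correct, the other is off by 2.
loss, grad = l2_loss_and_grad([[1.0, 3.0]], [[1.0, 1.0]])
```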

A depth estimation method comprising:
extracting lidar depth features by obtaining a lidar input label from depth data obtained from a lidar and encoding the lidar input label in stages according to a pre-learned pattern estimation method;
extracting stereo depth features by obtaining a stereo input label from a disparity map obtained from a stereo camera and encoding the stereo input label in stages according to a pre-learned pattern estimation method;
extracting guide features in stages by obtaining a fusion feature, in which a guide feature extracted from a color image obtained from the stereo camera according to a pre-learned pattern estimation method is fused, according to a pre-learned method, with the lidar depth feature and the stereo depth feature extracted in the corresponding stage, and by encoding the fusion feature obtained at each stage; and
outputting a depth map by receiving the finally extracted fusion feature and decoding it according to a pre-learned feature restoration method to obtain a restoration feature, combining the obtained restoration feature with the corresponding fusion feature among the fusion features extracted in stages to obtain a combined restoration feature, obtaining a restoration feature again from the combined restoration feature, and outputting the finally obtained restoration feature as the depth map,
wherein extracting the lidar depth features comprises:
obtaining the lidar input label by projecting three-dimensional low-density depth data obtained from the lidar onto two-dimensional image coordinates; and
repeatedly obtaining the lidar depth feature by combining the lidar input label or the previously obtained lidar depth feature with a mask obtained by learning and encoding the result according to a pre-learned pattern estimation method.

delete

The method of claim 11, wherein extracting the stereo depth features comprises:
obtaining the stereo input label by receiving the disparity map obtained from the stereo camera, matching it to the scan area of the lidar to obtain a stereo depth map, and converting the density of the stereo depth map to a density corresponding to the lidar input label; and
repeatedly obtaining the stereo depth feature by combining the stereo input label or the previously obtained stereo depth feature with a mask obtained by learning and encoding the result according to a pre-learned pattern estimation method.

The method of claim 13, wherein extracting the guide features comprises:
repeatedly extracting a guide feature according to a pre-learned pattern estimation method by receiving the color image or a previously obtained fusion feature; and
obtaining a fusion feature by fusing, according to a pre-learned method, the extracted guide feature with the corresponding lidar depth feature among the plurality of repeatedly extracted lidar depth features and the corresponding stereo depth feature among the plurality of repeatedly extracted stereo depth features.

The method of claim 14, wherein obtaining the fusion feature comprises
receiving the masks used when extracting the corresponding lidar depth feature and the corresponding stereo depth feature, and outputting the fusion feature by fusing the lidar depth feature, the stereo depth feature, and the masks according to weights determined by learning.

The method of claim 14, wherein outputting the depth map comprises:
receiving the fusion feature finally extracted in the step of extracting the guide features, or a combined restoration feature, decoding it according to a pre-learned feature restoration method, and outputting a restoration feature; and
outputting a combined restoration feature by combining the restoration feature with the corresponding fusion feature among the plurality of fusion features.

The method of claim 11, wherein the depth data obtained from the lidar
is depth data obtained by scanning in the horizontal direction at the levels that the plurality of channels of the lidar can cover.

The method of claim 11, further comprising a learning step,
wherein, in the learning step, a two-dimensional image whose depth values have been measured in advance is applied as training data, the difference between the measured depth values and the output depth map is calculated as a loss using the L2 norm, and the loss is backpropagated.