KR102660946B1 - Dense depth extraction system and method using camera and lidar

Info

Publication number: KR102660946B1
Application number: KR1020210144185A
Authority: KR
Inventors: 윤국진; 류권영; 이강일; 조제경
Original assignee: 한국과학기술원; 국방과학연구소
Priority date: 2021-10-27
Filing date: 2021-10-27
Publication date: 2024-04-26
Also published as: KR20230060011A

Abstract

The present invention relates to a system and method for estimating precise depth information using a camera and LiDAR. More specifically, it relates to a system and method that, using LiDAR information with a small number of channels, can provide depth information as precise as depth information estimated using the full set of channels.

Description

Precise depth information estimation system and method using camera and lidar {Dense depth extraction system and method using camera and lidar}

The present invention relates to a system and method for estimating precise depth information using a camera and LiDAR, and more specifically to a system and method that can estimate depth information at a precise resolution even when LiDAR data with a low scanline resolution is input.

Estimating high-resolution, dense depth maps is a key task for a variety of computer vision applications, including autonomous driving navigation and augmented reality.

Depth prediction methods that use only a single image have been proposed. For better performance, however, approaches that estimate high-resolution, dense depth by additionally using data from sensors that provide sparse depth measurements, such as LiDAR sensors and Time-of-Flight (ToF) sensors, are attracting attention.

LiDAR sensors in particular have a long sensing range within each horizontal scan line and provide very accurate distance measurements. However, they provide only sparse distance measurements, and when these are projected onto the image plane the spacing is irregular and the resolution is low.

Because of these problems, LiDAR is applied only in limited ways despite its high accuracy. Accordingly, research is being conducted on obtaining dense depth information while maintaining the accuracy of LiDAR measurements.

However, research so far has addressed only LiDAR data with a 64-scanline resolution, which requires a correspondingly high-resolution LiDAR sensor. This brings drawbacks in price, weight, and energy efficiency when applying the approach in practice across various fields.

Accordingly, for the precise depth information estimation system and method using a camera and LiDAR according to an embodiment of the present invention, research was carried out on estimating depth information as precise as that estimated from LiDAR data with the full-channel scanline resolution, even when using LiDAR data with a lower-channel scanline resolution.

In this regard, Korean Patent Application Publication No. 10-2021-0022016 ("Method and system for improving depth information of image feature points using LiDAR and camera") discloses a method and system that can refine depth values using LiDAR-based SLAM and images.

Korean Patent Application Publication No. 10-2021-0022016 (publication date: 2021.03.02.)

The present invention was devised to solve the problems of the prior art described above. Its object is to provide a precise depth information estimation system and method using a camera and LiDAR that, even when applying LiDAR data with a lower channel resolution rather than the full channel resolution, achieves depth estimation performance on the level of that obtained with full-channel LiDAR data.

A precise depth information estimation system using a camera and LiDAR according to an embodiment of the present invention preferably includes: an RGB input unit (100) that receives RGB image data of a predetermined area acquired using at least one camera; a LiDAR input unit (200) that receives point cloud data of the predetermined area acquired using a three-dimensional LiDAR sensor; a training unit (300) that receives the RGB image data from the RGB input unit (100) and the full-channel point cloud data from the LiDAR input unit (200) and outputs first depth prediction information using a pre-stored depth completion network, receives the RGB image data from the RGB input unit (100) and random-channel point cloud data, consisting of fewer channels than the full set, from the LiDAR input unit (200) and outputs second depth prediction information using the depth completion network, compares and analyzes the first and second depth prediction information to calculate a loss function, and trains the depth completion network based on the calculated loss function; and a precision estimation unit (400) that estimates precise depth information by applying input RGB image data and point cloud data with an arbitrary number of channels to the depth completion network finally trained by the training unit (300).
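The random-channel input described above can be illustrated with a minimal Python sketch. It assumes that each LiDAR point carries its scanline (ring) index and that the channels are chosen uniformly at random; the text only says "random channel", so the function name `sample_random_channels`, the `(ring, x, y, z)` tuple layout, and the `seed` parameter are all hypothetical.

```python
import random


def sample_random_channels(points, num_channels, num_keep, seed=None):
    """Keep only the points whose scanline (ring) index is in a random subset.

    points       -- list of (ring, x, y, z) tuples; the per-point ring index
                    is an assumed input format, not taken from the patent
    num_channels -- number of scanlines of the full-channel sensor
    num_keep     -- number of scanlines to keep (fewer than num_channels)
    """
    if not 0 < num_keep < num_channels:
        raise ValueError("num_keep must be between 1 and num_channels - 1")
    rng = random.Random(seed)
    # Assumption: channels are drawn uniformly without replacement.
    kept = set(rng.sample(range(num_channels), num_keep))
    subset = [p for p in points if p[0] in kept]
    return subset, kept


# Toy full-channel cloud: 16 scanlines with 3 points each.
full_cloud = [(ring, float(i), 0.0, 1.0) for ring in range(16) for i in range(3)]
sparse_cloud, kept_rings = sample_random_channels(full_cloud, 16, 4, seed=0)
```

The full cloud and the subsampled cloud would then both be paired with the same RGB image, which is what allows the two predictions to be compared.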

Furthermore, the training unit (300) preferably further includes a first output unit (310) that applies the RGB image data and the full-channel point cloud data to the depth completion network and obtains the first depth prediction information, and a first comparison unit (320) that calculates a first loss function value and a second loss function value for the first depth prediction information, using the first depth prediction information, ground truth (GT) data of the predetermined area input from outside, and the RGB image data delivered from the RGB input unit (100).

Furthermore, the training unit (300) preferably further includes a second output unit (330) that applies the RGB image data and the random-channel point cloud data to the depth completion network and obtains the second depth prediction information; a second comparison unit (340) that calculates a third loss function value for the second depth prediction information, using the second depth prediction information and the first depth prediction information; and an integrated processing unit (350) that trains the depth completion network using the first, second, and third loss function values.
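The data flow through units 310 to 340 can be sketched as follows. This is only a wiring illustration: the network and the three loss functions are passed in as callables because this part of the text does not give their form, and the function name `training_loss_values` and all parameter names are hypothetical.

```python
def training_loss_values(network, rgb, full_points, random_points, gt,
                         first_loss, second_loss, third_loss):
    """Return the three loss values for one training sample."""
    # First output unit (310): prediction from the full-channel point cloud.
    pred_full = network(rgb, full_points)
    # Second output unit (330): same network, random-channel point cloud.
    pred_random = network(rgb, random_points)
    # First comparison unit (320): both losses may use the prediction,
    # the GT data, and the RGB image, as stated in the text.
    l1 = first_loss(pred_full, gt, rgb)
    l2 = second_loss(pred_full, gt, rgb)
    # Second comparison unit (340): compares the two predictions.
    l3 = third_loss(pred_random, pred_full)
    return l1, l2, l3


# Toy stand-ins (not the patent's network or losses), scalar "depth" only.
toy_network = lambda rgb, pts: float(len(pts))
loss_values = training_loss_values(
    toy_network, rgb=None, full_points=[1, 2, 3, 4], random_points=[1, 2],
    gt=5.0,
    first_loss=lambda pred, gt, rgb: abs(pred - gt),
    second_loss=lambda pred, gt, rgb: 0.0,
    third_loss=lambda pred_random, pred_full: abs(pred_random - pred_full),
)
```

The integrated processing unit (350) would then use the three returned values to update the network; how they are combined is not fixed by this sketch.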

Furthermore, the training unit (300) preferably configures the depth completion network from a first encoder that extracts features of the input RGB image data, a second encoder that extracts features of the input point cloud data, a first module that receives and combines the features from the first and second encoders, a second module that receives the combined values from the first module and extracts multi-scale features, and a decoder that receives the multi-scale features from the second module and outputs depth prediction information.
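The order in which the five components are applied can be sketched as a plain Python container. The class name `DepthCompletionNetwork` and its field names are hypothetical, and each component is an arbitrary callable here; the actual layers are not specified in this part of the text.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DepthCompletionNetwork:
    rgb_encoder: Callable[[Any], Any]         # first encoder (RGB features)
    lidar_encoder: Callable[[Any], Any]       # second encoder (point cloud features)
    fusion_module: Callable[[Any, Any], Any]  # first module (combines both features)
    multiscale_module: Callable[[Any], Any]   # second module (multi-scale features)
    decoder: Callable[[Any], Any]             # outputs depth prediction information

    def __call__(self, rgb, points):
        rgb_feat = self.rgb_encoder(rgb)
        lidar_feat = self.lidar_encoder(points)
        fused = self.fusion_module(rgb_feat, lidar_feat)
        multi = self.multiscale_module(fused)
        return self.decoder(multi)


# Toy components that only record the call order.
net = DepthCompletionNetwork(
    rgb_encoder=lambda rgb: ("rgb_feat", rgb),
    lidar_encoder=lambda pts: ("lidar_feat", pts),
    fusion_module=lambda a, b: ("fused", a, b),
    multiscale_module=lambda f: ("multiscale", f),
    decoder=lambda m: ("depth", m),
)
trace = net("image", "cloud")
```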

Furthermore, the depth prediction information preferably includes initial depth prediction information output from the decoder and late depth prediction information output by applying the initial depth prediction information to a pre-stored resolution enhancement network. The first depth prediction information includes first initial and first late depth prediction information, and the second depth prediction information includes second initial and second late depth prediction information.

Furthermore, the first comparison unit (320) preferably calculates a first loss function value and a second loss function value for each of the first initial depth prediction information and the first late depth prediction information, and the second comparison unit (340) preferably calculates a third loss function value for each of the second initial depth prediction information and the second late depth prediction information.

Furthermore, the integrated processing unit (350) preferably trains the depth completion network using the first and second loss function values for the first initial depth prediction information, the first and second loss function values for the first late depth prediction information, the third loss function value for the second initial depth prediction information, and the third loss function value for the second late depth prediction information.
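One way to read the six loss values listed above is as a single weighted sum over the initial and late predictions. The text states only that all six values are used for training, so the weighted-sum form and the weights $\lambda_1, \lambda_2, \lambda_3$ below are assumptions, and the symbols are introduced here for illustration: $D^{(1)}_s$ and $D^{(2)}_s$ denote the first (full-channel) and second (random-channel) depth predictions at stage $s$.

```latex
\mathcal{L}_{\text{total}}
  = \sum_{s \in \{\text{init},\,\text{late}\}}
    \Big[ \lambda_1\, \mathcal{L}_1\big(D^{(1)}_s\big)
        + \lambda_2\, \mathcal{L}_2\big(D^{(1)}_s\big)
        + \lambda_3\, \mathcal{L}_3\big(D^{(2)}_s,\, D^{(1)}_s\big) \Big]
```

Here $\mathcal{L}_1$ and $\mathcal{L}_2$ are the first and second loss functions computed by the first comparison unit (320), and $\mathcal{L}_3$ is the third loss function computed by the second comparison unit (340).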

A precise depth information estimation method using a camera and LiDAR according to an embodiment of the present invention is a method in which each step is performed by a computer-implemented precise depth information estimation system using a camera and LiDAR, and preferably includes: a first data input step (S100) in which the RGB input unit receives RGB image data of a predetermined area using at least one camera; a second data input step (S200) in which the LiDAR input unit receives point cloud data of the predetermined area using a three-dimensional LiDAR sensor; a first prediction output step (S300) in which the training unit receives the RGB image data from the first data input step (S100) and the full-channel point cloud data from the second data input step (S200), and outputs first depth prediction information using a pre-stored depth completion network; a second prediction output step (S400) in which the training unit receives the RGB image data from the first data input step (S100) and, from the second data input step (S200), random-channel point cloud data consisting of fewer channels than were input to the first prediction output step (S300), and outputs second depth prediction information using the pre-stored depth completion network; a third data input step (S500) in which the training unit receives ground truth (GT) data of the predetermined area from outside; a first loss calculation step (S600) in which the training unit calculates a first loss function value and a second loss function value for the first depth prediction information, using the GT data from the third data input step (S500), the RGB image data from the first data input step (S100), and the first depth prediction information from the first prediction output step (S300); a second loss calculation step (S700) in which the training unit calculates a third loss function value for the second depth prediction information, using the first depth prediction information from the first prediction output step (S300) and the second depth prediction information from the second prediction output step (S400); a training step (S800) in which the training unit trains the depth completion network using the first and second loss function values from the first loss calculation step (S600) and the third loss function value from the second loss calculation step (S700); and a precise output step (S900) in which the precision estimation unit estimates precise depth information by applying input RGB image data and point cloud data with an arbitrary number of channels to the depth completion network finally trained in the training step (S800).

Furthermore, the depth completion network preferably consists of a first encoder that extracts features of the input RGB image data, a second encoder that extracts features of the input point cloud data, a first module that receives and combines the features from the first and second encoders, a second module that receives the combined values from the first module and extracts multi-scale features, and a decoder that receives the multi-scale features from the second module and outputs depth prediction information.

Furthermore, the first loss calculation step (S600) preferably calculates a first loss function value and a second loss function value for each of the first initial depth prediction information and the first late depth prediction information, and the second loss calculation step (S700) preferably calculates a third loss function value for each of the second initial depth prediction information and the second late depth prediction information.

Furthermore, the training step (S800) preferably trains the depth completion network using the first and second loss function values for the first initial depth prediction information, the first and second loss function values for the first late depth prediction information, the third loss function value for the second initial depth prediction information, and the third loss function value for the second late depth prediction information.

With the configuration described above, the precise depth information estimation system and method using a camera and LiDAR of the present invention have the advantage that, even when using LiDAR data with a lower-channel scanline resolution rather than the full-channel scanline resolution, they can estimate depth information as precise as that estimated from full-channel LiDAR data.

As a result, the invention can be used in various fields that require precise depth information, even with low-resolution LiDAR data that has few channels.

In detail, the same depth completion network outputs one depth prediction from LiDAR data with the full channel resolution and another from LiDAR data with a lower channel resolution. A loss function between the two predictions is calculated, and the network is trained on that basis. The finally trained network can therefore estimate precise depth information even from LiDAR data with an arbitrary channel resolution.

Figure 1 is an example diagram comparing the prior art and the present invention on depth information estimated from LiDAR data with different numbers of channels.
Figure 2 is an example configuration diagram of a precise depth information estimation system using a camera and LiDAR according to an embodiment of the present invention.
Figure 3 is an example diagram of the process by which the training unit of the system outputs first depth prediction information using all channels.
Figure 4 is an example diagram of the process by which the training unit of the system, after outputting the first depth prediction information, outputs second depth prediction information using random channels.
Figure 5 is an example configuration diagram of the depth completion network applied in the training unit of the system.
Figure 6 is an example configuration diagram of the resolution enhancement network applied in the training unit of the system.
Figure 7 is a detailed example configuration diagram of the depth completion network and the resolution enhancement network applied in the training unit of the system.
Figure 8 is an example flowchart of a precise depth information estimation method using a camera and LiDAR according to an embodiment of the present invention.

Hereinafter, the precise depth information estimation system and method using a camera and LiDAR of the present invention are described in detail with reference to the attached drawings. The drawings introduced below are provided as examples so that the idea of the present invention can be fully conveyed to those skilled in the art. Accordingly, the present invention is not limited to the drawings presented below and may be embodied in other forms. In addition, like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, the technical and scientific terms used here have the meanings commonly understood by a person of ordinary skill in the art to which this invention belongs. In the following description and the attached drawings, descriptions of known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted.

In addition, a system means a set of components, including devices, mechanisms, and means, that are organized and interact regularly to perform a required function.

The precise depth information estimation system and method using a camera and LiDAR according to an embodiment of the present invention aim to estimate depth information as precise as that estimated from LiDAR data with the full-channel scanline resolution, even when using LiDAR data with a lower-channel scanline resolution.

That is, as shown in Figure 1, consider depth information estimated from LiDAR data with a 16-channel scanline resolution and from LiDAR data with a 4-channel scanline resolution. With the prior art, as in (a), the depth information from the 4-channel data is less precise than that from the 16-channel data. In contrast, research confirmed that when the system and method according to an embodiment of the present invention are applied, as in (b), depth information estimated from the 4-channel LiDAR data is as precise as that estimated from the 16-channel LiDAR data.

In detail, conventional deep-learning-based techniques for estimating (acquiring, analyzing, etc.) depth information are suited only to high-resolution LiDAR measurements and cannot predict reliable dense depth information (a depth map) from low-resolution LiDAR measurements. However, reducing the number of LiDAR channels is very important in many respects (cost, device weight, power consumption, etc.). The system and method according to an embodiment of the present invention therefore present a technique that can obtain dense depth information even from measurements at various, relatively low LiDAR scanline resolutions. To this end, they define a consistency loss between the depth predictions output from LiDAR measurements at different scanline resolutions and, based on it, provide a technique that can estimate depth information from relatively low-resolution LiDAR measurements at a precision comparable to that obtained from high-resolution LiDAR measurements.

That is, it is assumed that when depth information is estimated by pairing the same image data with two sets of LiDAR measurements that have different scanline resolutions, the same depth information should result. To improve the precision of depth information estimated from low-scanline-resolution LiDAR measurements and image data, a consistency loss function is applied, yielding a network model that can estimate precise depth information regardless of the input resolution. In addition, a feature fusion module was designed to take into account the high-level interaction between the RGB image data and the sparse depth features of the LiDAR data.
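The consistency assumption in the preceding paragraph can be made concrete with a small sketch. The mean absolute difference used here is an assumed form chosen for illustration; this passage does not state the exact formula of the consistency loss, and the function name `consistency_loss` is hypothetical.

```python
def consistency_loss(pred_low, pred_full):
    """Mean absolute difference between two dense depth maps.

    Both arguments are lists of rows of depth values with the same shape.
    pred_low  -- prediction from the lower-scanline-resolution input
    pred_full -- prediction from the full-scanline-resolution input
    """
    total, count = 0.0, 0
    for row_low, row_full in zip(pred_low, pred_full):
        for d_low, d_full in zip(row_low, row_full):
            total += abs(d_low - d_full)
            count += 1
    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count


pred_full = [[1.0, 2.0], [3.0, 4.0]]
pred_low = [[1.0, 2.5], [2.0, 4.0]]
loss_value = consistency_loss(pred_low, pred_full)  # (0 + 0.5 + 1.0 + 0) / 4
```

The loss is zero exactly when the two predictions agree at every pixel, which matches the stated assumption that both inputs should lead to the same depth information.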

In this way, a network can be trained to predict dense depth information for inputs at various scanline resolutions. Applying the consistency loss function lets the prediction at the full scanline resolution guide an improvement in performance on lower-resolution sparse depth. In addition, by designing a feature fusion module that processes the features extracted by two different encoders and provides the combined features to the decoder, precise depth information can be estimated even from LiDAR data with an arbitrary channel resolution.

Figure 2 is an example configuration diagram of a precise depth information estimation system using a camera and LiDAR according to an embodiment of the present invention. The system is described in detail below with reference to Figure 2.

As shown in Figure 2, the precise depth information estimation system using a camera and LiDAR according to an embodiment of the present invention preferably includes an RGB input unit (100), a LiDAR input unit (200), a training unit (300), and a precision estimation unit (400). Each component preferably operates on a processing means that includes one or more computers.

Each component is described in detail below.

The RGB input unit (100) preferably receives RGB image data of a predetermined area acquired using at least one camera. Here, the predetermined area is a location set arbitrarily for training the depth completion network used to obtain precise depth information, and it is not limited to any particular place.

The LiDAR input unit (200) preferably receives point cloud data of the predetermined area acquired using a three-dimensional LiDAR sensor.

The process of converting the sensing values from the LiDAR sensor into point cloud data is a known technique, so a detailed description of it is omitted.

The predetermined area is preferably the same area as the one for which the RGB input unit 100 receives the RGB image data.

To this end, the camera and the LiDAR sensor are preferably installed under the same conditions so that both sense data from the same area.

The training unit 300 operates as a two-stage framework. In the first stage, to improve the performance of a pre-stored depth completion network, it trains the network with the full scanline resolution input of the LiDAR sensor. In the second stage, it trains the network on arbitrary scanline resolutions with a consistency loss, so that the network performs well at every scanline resolution while retaining its full scanline resolution performance.

As shown in FIG. 2, the training unit 300 preferably includes a first output unit 310, a first comparison unit 320, a second output unit 330, a second comparison unit 340, and an integrated processing unit 350.

The first output unit 310 and the first comparison unit 320 perform the training on the full scanline resolution input.

As shown in FIG. 3, the first output unit 310 preferably receives the RGB image data from the RGB input unit 100 and the point cloud data having the scanline resolution of all channels from the LiDAR input unit 200, and obtains first depth prediction information as the output of the pre-stored depth completion network.

The configuration of the depth completion network is described in detail later.

As shown in FIG. 3, the first comparison unit 320 preferably uses the GT (Ground Truth) data of the predetermined area, which is input from the outside, and the RGB image data delivered from the RGB input unit 100 to calculate a first loss function value and a second loss function value for the first depth prediction information output by the first output unit 310.

Specifically, the first comparison unit 320 calculates the loss functions for training the depth completion network by comparing the first depth prediction information, which is output from the point cloud data having the scanline resolution of all channels (for example, 64-scanline sparse depth), against the GT data, which is defined as the correct-answer data.

Here, the first loss function value is the Depth Loss and the second loss function value is the Smoothness Loss, defined as in Equation 1 below.

(Here, the symbols of Equation 1 denote, in order: the depth estimation result for the K-th input; the ground truth data of the depth information; the gradient of the depth estimation result for the K-th input at pixel (i, j), computed in the x and y directions respectively; and the gradient of the input RGB image at pixel (i, j), computed in the x and y directions respectively.)
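The two losses described for Equation 1 can be illustrated with a minimal NumPy sketch. The exact form of Equation 1 is not reproduced in this text, so the sketch assumes a commonly used formulation: an L1 depth loss over pixels that have ground truth, and an edge-aware smoothness term in which depth gradients are down-weighted by the exponential of the image gradients. The function names, the `gt > 0` validity convention, and the choice of L1 are assumptions made for illustration only.

```python
import numpy as np

def depth_loss(pred, gt):
    """Mean absolute error over pixels that have ground truth (assumed: gt > 0)."""
    valid = gt > 0
    if not valid.any():
        return 0.0
    return float(np.abs(pred[valid] - gt[valid]).mean())

def smoothness_loss(pred, rgb):
    """Edge-aware smoothness (assumed form): depth gradients cost less
    where the RGB image itself has strong gradients."""
    # Forward differences of the depth map in x (columns) and y (rows).
    d_dx = np.abs(pred[:, 1:] - pred[:, :-1])
    d_dy = np.abs(pred[1:, :] - pred[:-1, :])
    # Image gradients in x and y, averaged over the colour channels.
    i_dx = np.abs(rgb[:, 1:, :] - rgb[:, :-1, :]).mean(axis=2)
    i_dy = np.abs(rgb[1:, :, :] - rgb[:-1, :, :]).mean(axis=2)
    return float((d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean())
```

With a flat image, the smoothness term reduces to the plain mean depth gradient; at a strong image edge the same depth step is penalized less, which is the behaviour the description attributes to using RGB gradients alongside depth gradients.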

The second output unit 330 and the second comparison unit 340 perform the training on arbitrary scanline resolutions, so that the network performs well at every scanline resolution while retaining its full scanline performance.

As shown in FIG. 4, the second output unit 330 receives the RGB image data from the RGB input unit 100; this is the same RGB image data that is delivered to the first output unit 310. It also preferably receives, from the LiDAR input unit 200, the point cloud data of random channels consisting of fewer channels than the full set, and obtains second depth prediction information as the output of the pre-stored depth completion network.

The depth completion network is likewise the same network that is applied in the first output unit 310.
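One way to obtain the reduced-channel input from a full-channel scan is to keep only the returns that belong to a random subset of scanlines. The sketch below assumes the point cloud has already been projected to a sparse depth image and that a per-pixel scanline (ring) index is available; both the `ring` map and the function name are hypothetical and are not taken from the description.

```python
import numpy as np

def subsample_scanlines(depth, ring, keep, rng):
    """Keep only LiDAR returns whose scanline index is in a random subset.

    depth: (H, W) sparse depth, 0 where there is no return.
    ring:  (H, W) integer scanline index per pixel, -1 where there is no return.
    keep:  number of scanlines to retain (must be fewer than those present).
    rng:   a numpy.random.Generator.
    """
    rings = np.unique(ring[ring >= 0])
    if keep >= rings.size:
        raise ValueError("keep must be smaller than the number of scanlines present")
    chosen = rng.choice(rings, size=keep, replace=False)
    mask = np.isin(ring, chosen)
    # Returns on discarded scanlines are zeroed, i.e. treated as missing.
    return np.where(mask, depth, 0.0)
```

For example, `subsample_scanlines(depth, ring, keep=16, rng=np.random.default_rng())` would turn a 64-scanline sparse depth image into a 16-scanline one while leaving the RGB image untouched.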

The second comparison unit 340 preferably uses the first depth prediction information output by the first output unit 310 and the second depth prediction information output by the second output unit 330 to calculate a third loss function value for the second depth prediction information.

The reason is that, when the given RGB image data is the same, the same depth prediction information should be output regardless of the scanline resolution of the input. A consistency loss function that applies equally to any input resolution is therefore calculated.

Here, the third loss function value is the Consistency Loss, defined as in Equation 2 below.

(Here, the first symbol of Equation 2 denotes the depth estimation result for the K-th input, and n is the number of input depth maps processed in one training step. Index 1 corresponds to the 64-channel LiDAR depth input, and indices 2 and above correspond to arbitrary-channel LiDAR depth inputs; according to an embodiment of the present invention, n = 2.)
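The idea behind the consistency loss can be sketched as follows. Equation 2 is not reproduced in this text, so the sketch assumes a mean absolute difference between the prediction from the full-channel input (index 1) and each prediction from a reduced-channel input (indices 2 to n); the choice of norm and the averaging are assumptions.

```python
import numpy as np

def consistency_loss(preds):
    """preds[0]: prediction from the full-scanline input.
    preds[1:]: predictions from reduced-scanline inputs of the same scene."""
    if len(preds) < 2:
        raise ValueError("need at least two predictions")
    reference = preds[0]
    # Average per-pixel disagreement with the full-scanline prediction.
    diffs = [np.abs(p - reference).mean() for p in preds[1:]]
    return float(np.mean(diffs))
```

With n = 2, as in the described embodiment, `preds` holds exactly one full-channel prediction and one reduced-channel prediction.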

The integrated processing unit 350 preferably uses the first and second loss function values calculated by the first comparison unit 320 and the third loss function value calculated by the second comparison unit 340 to repeatedly retrain the depth completion network so that the loss function values are minimized.

Here, the integrated processing unit 350 preferably combines the first, second, and third loss function values as in Equation 3 below to train the depth completion network.

(Here, the symbol denotes the weight of each loss function. The main loss is computed from the Depth Loss, and the Smoothness Loss and the Consistency Loss are each weighted by 0.1.)
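The weighting described for Equation 3 can be written as a one-line combination. The sketch assumes the Depth Loss enters with weight 1, since it is called the main loss, and uses the stated 0.1 weights for the other two terms; the function and parameter names are illustrative.

```python
def total_loss(depth, smoothness, consistency, w_smooth=0.1, w_cons=0.1):
    """Weighted sum of the three loss values (depth weight of 1 is an assumption)."""
    return depth + w_smooth * smoothness + w_cons * consistency
```

Keeping the two auxiliary weights small means the supervised depth term dominates the gradient, while the smoothness and consistency terms act as regularizers.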

Here, in order to predict dense depth information from sparse depth that has few scanlines and is non-uniform, the depth completion network is preferably designed with two encoders, two modules, and a decoder, as shown in FIG. 5.

Specifically, the first encoder preferably extracts features from the input RGB data, and the second encoder extracts features from the input point cloud data. As shown in FIG. 7, the residual blocks of the first encoder use average pooling, whereas the second encoder preferably consists of residual blocks that use max pooling in order to preserve the sparse input values.
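The reason max pooling suits the sparse depth branch can be seen on a toy example: averaging a window that contains one measurement and several empty (zero) pixels dilutes the measurement, while taking the maximum keeps it. The sketch below uses plain 2x2 pooling on a NumPy array; it illustrates the pooling behaviour only and is not the residual block of the described encoders.

```python
import numpy as np

def pool2x2(x, mode):
    """2x2 pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(mode)

sparse = np.zeros((4, 4))
sparse[0, 1] = 8.0    # a single return at 8 m
sparse[3, 2] = 20.0   # a single return at 20 m

print(pool2x2(sparse, "max"))  # measured depths survive: 8 and 20
print(pool2x2(sparse, "avg"))  # measured depths are diluted: 2 and 5
```

After max pooling, each coarse cell that contained a return still carries a physically meaningful depth, which is why the pooled depth features become denser at lower resolutions.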

The first module is a feature fusion module. Conventional feature fusion has mainly relied on one of two methods: encoding the concatenated inputs, or encoding each input separately and then concatenating the features. Both are limited in how much interaction they induce between features of different modalities. To overcome this, the first module is designed to combine the encoder features and then process them with additional residual blocks.

As shown in FIG. 7, the features extracted by each encoder are aggregated by element-wise summation and residual blocks. Because the features of every layer are fused, both early fusion and late fusion are applied, which enables high-quality fusion of the sparse depth and the RGB image.

That is, the first module takes as its input the sum of the two features extracted at each layer by the first encoder and the second encoder. The features that have passed through the first layer of the two encoders form the first input. Thereafter, the next input is the sum of the feature output by the preceding layer of the module and the features of the first and second encoders at the corresponding layer. The first module is an encoder with four layers in total, and by fusing at every layer it gains the advantages of both early fusion and late fusion.
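The layer-by-layer fusion rule can be sketched as a short loop. In this sketch the residual blocks of the module are replaced by a placeholder that simply halves the resolution, and features are single-channel arrays; `fuse_features`, `halve`, and the feature shapes are assumptions made for illustration and do not describe the actual blocks.

```python
import numpy as np

def fuse_features(rgb_feats, depth_feats, blocks):
    """Layer-wise fusion by element-wise summation.

    rgb_feats, depth_feats: per-layer features with matching shapes at each layer.
    blocks: blocks[k] maps the fused feature of layer k to the shape of layer k + 1.
    Returns the fused feature of every layer.
    """
    # First input: sum of the two first-layer encoder features.
    fused = [rgb_feats[0] + depth_feats[0]]
    for k in range(1, len(rgb_feats)):
        carried = blocks[k - 1](fused[-1])  # output of the preceding fusion layer
        fused.append(carried + rgb_feats[k] + depth_feats[k])
    return fused

def halve(x):
    """Placeholder for a residual block: 2x2 average pooling (assumption)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Four layers, as in the described module: 16x16, 8x8, 4x4, 2x2.
sizes = [16, 8, 4, 2]
rgb_feats = [np.ones((s, s)) for s in sizes]
depth_feats = [np.ones((s, s)) for s in sizes]
fused = fuse_features(rgb_feats, depth_feats, [halve] * 3)
```

Because every layer receives both the carried fused feature and the fresh encoder features, information from the two modalities is mixed at shallow layers and again at deep layers.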

The second module is a spatial pyramid pooling module. It receives the fused features from the first module and extracts multi-scale features.

Instead of concatenating the pooled features, the module aggregates them by element-wise summation for efficient computation. Because the max pooling in the encoder makes the 1/16-scale features sufficiently dense, the module is applied to these features to extract multi-scale features.
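A spatial pyramid pooling step that sums rather than concatenates can be sketched as below. The sketch pools a single-channel feature map into grids of several sizes, upsamples each grid back to the input size by repetition, and adds the results to the input. The bin sizes, the use of average pooling inside the pyramid, and the nearest-neighbour upsampling are assumptions; the description only states that pooled features are aggregated by element-wise summation.

```python
import numpy as np

def pyramid_pool_sum(feat, bins=(1, 2, 4)):
    """feat: (H, W) array with H and W divisible by every bin count."""
    h, w = feat.shape
    out = feat.astype(float)  # astype returns a copy, so feat is not modified
    for b in bins:
        bh, bw = h // b, w // b
        # Average over each of the b x b regions.
        pooled = feat.reshape(b, bh, b, bw).mean(axis=(1, 3))
        # Nearest-neighbour upsampling back to (H, W).
        up = np.repeat(np.repeat(pooled, bh, axis=0), bw, axis=1)
        out += up
    return out
```

Summation keeps the channel count fixed no matter how many pyramid levels are used, whereas concatenation would multiply it by the number of levels and enlarge the following layer accordingly.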

The decoder receives the multi-scale features from the first module and outputs the depth prediction information.

In addition, as shown in FIG. 6, the depth completion network applies a resolution enhancement network to improve the resolution of the depth prediction information output from the decoder (the initial depth prediction information), and outputs the post-processed result as late depth prediction information.

Here, the resolution enhancement network preferably consists of two stacked hourglass networks.

The resolution enhancement network is designed to predict more accurate depth information by refining the depth prediction information output from the decoder using RGB texture information and an edge-aware smoothness loss. As shown in FIG. 7, it consists of a small encoder and decoder; the initial prediction is made from the sparse depth and the RGB image, and a more refined depth is predicted using skip connections. Because the resolution enhancement network has fewer layers than the depth completion network, which is the main network, it can improve the depth prediction of the main network while smoothing the less blended depth using the two different inputs.

In view of this, the depth prediction information output through the first output unit 310 or the second output unit 330 preferably includes the initial depth prediction information (initial prediction depth) output from the decoder of the depth completion network and the late depth prediction information (refined prediction depth) output by applying the initial depth prediction information to the resolution enhancement network.

Accordingly, the first depth prediction information preferably includes first initial depth prediction information and first late depth prediction information, and the second depth prediction information likewise preferably includes second initial depth prediction information and second late depth prediction information.

Accordingly, the first comparison unit 320 preferably calculates the first loss function values (Depth Loss_init, Depth Loss_refined) and the second loss function values (Smoothness Loss_init, Smoothness Loss_refined) for each of the first initial depth prediction information, which is output from the decoder of the depth completion network, and the first late depth prediction information, which is output by applying the first initial depth prediction information to the resolution enhancement network.

Likewise, the second comparison unit 340 preferably calculates the third loss function values (Consistency Loss_init, Consistency Loss_refined) for each of the second initial depth prediction information, which is output from the decoder of the depth completion network, and the second late depth prediction information, which is output by applying the second initial depth prediction information to the resolution enhancement network.

Accordingly, the integrated processing unit 350 can define the final loss function value as in Equation 4 below.

(Here, the symbols of Equation 4 denote, in order: the loss function value for the depth information before it is input to the resolution enhancement network; the loss function value for the depth information output after being input to the resolution enhancement network; and a weight on a loss term. The weight is set to 0.1 in the early stage of training so that training concentrates on a designated term, after which performance is improved in the later stage of training.)

Based on this, the integrated processing unit 350 preferably trains the depth completion network.

The precision estimation unit 400 preferably receives the depth completion network as finally trained by the training unit 300, and then estimates precise depth information by applying externally input RGB image data and point cloud data of arbitrary channels to the finally trained network.

Here, even when the input point cloud data of arbitrary channels does not have the scanline resolution of all channels, the precise depth information is predicted with a precision comparable to that obtained from full-channel point cloud data.

FIG. 8 is an exemplary flowchart of a precise depth information estimation method using a camera and LiDAR according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 8.

As shown in FIG. 8, the precise depth information estimation method using a camera and LiDAR according to an embodiment of the present invention preferably includes a first data input step (S100), a second data input step (S200), a first prediction output step (S300), a second prediction output step (S400), a third data input step (S500), a first loss calculation step (S600), a second loss calculation step (S700), a training step (S800), and a precision output step (S900).

Each step is described in detail below.

In the first data input step (S100), the RGB input unit 100 receives RGB image data of a predetermined area acquired using at least one camera. Here, the predetermined area is a location arbitrarily chosen for training the depth completion network used to obtain precise depth information, and it is not limited to any particular place.

In the second data input step (S200), the LiDAR input unit 200 receives point cloud data of the predetermined area acquired using a 3D LiDAR sensor.

The predetermined area is preferably the same area as the one for which the RGB image data is received in the first data input step (S100). To this end, the camera and the LiDAR sensor are preferably installed under the same conditions so that both sense data from the same area.

In the first prediction output step (S300), the training unit 300 receives the RGB image data from the first data input step (S100) and the point cloud data having the scanline resolution of all channels from the second data input step (S200), and obtains first depth prediction information as the output of the pre-stored depth completion network.

In the second prediction output step (S400), the training unit 300 receives the RGB image data from the first data input step (S100) and, from the second data input step (S200), the point cloud data of random channels consisting of fewer channels than the full set, and obtains second depth prediction information as the output of the pre-stored depth completion network.

Here, in order to predict dense depth information from sparse depth that has few scanlines and is non-uniform, the depth completion network applied in the first prediction output step (S300) and the second prediction output step (S400) is preferably designed with two encoders, two modules, and a decoder, as shown in FIG. 5.

In the third data input step (S500), the training unit 300 receives the GT (Ground Truth) data of the predetermined area, which is input from the outside. The predetermined area is preferably the same area as the one for which the RGB image data is received in the first data input step (S100).

In the first loss calculation step (S600), the training unit 300 uses the GT data from the third data input step (S500), the RGB image data from the first data input step (S100), and the first depth prediction information from the first prediction output step (S300) to calculate a first loss function value and a second loss function value for the first depth prediction information.

Specifically, the first loss calculation step (S600) calculates the loss functions for training the depth completion network by comparing the first depth prediction information, which is output from the point cloud data having the scanline resolution of all channels (for example, 64-scanline sparse depth), against the GT data, which is defined as the correct-answer data.

Here, the first loss function value is the Depth Loss and the second loss function value is the Smoothness Loss, defined as in Equation 1 above.

In the second loss calculation step (S700), the training unit 300 uses the first depth prediction information from the first prediction output step (S300) and the second depth prediction information from the second prediction output step (S400) to calculate a third loss function value for the second depth prediction information.

Here, the third loss function value is the Consistency Loss, defined as in Equation 2 above.

In the training step (S800), the training unit 300 uses the first and second loss function values from the first loss calculation step (S600) and the third loss function value from the second loss calculation step (S700) to repeatedly retrain the depth completion network so that the loss function values are minimized.

The training step (S800) preferably combines the first, second, and third loss function values as in Equation 3 above to train the depth completion network.

Here, as shown in FIG. 6, the depth completion network applies a resolution enhancement network to improve the resolution of the depth prediction information output from the decoder (the initial depth prediction information), and outputs the post-processed result as late depth prediction information.

In view of this, the depth prediction information output through the first prediction output step (S300) or the second prediction output step (S400) preferably includes the initial depth prediction information (initial prediction depth) output from the decoder of the depth completion network and the late depth prediction information (refined prediction depth) output by applying the initial depth prediction information to the resolution enhancement network.

Accordingly, the first depth prediction information preferably includes first initial depth prediction information and first late depth prediction information, and the second depth prediction information likewise includes second initial depth prediction information and second late depth prediction information.

That is, the first loss calculation step (S600) calculates the first loss function values (Depth Loss_init, Depth Loss_refined) and the second loss function values (Smoothness Loss_init, Smoothness Loss_refined) for each of the first initial depth prediction information, which is output from the decoder of the depth completion network, and the first late depth prediction information, which is output by applying the first initial depth prediction information to the resolution enhancement network.

Likewise, the second loss calculation step (S700) preferably calculates the third loss function values (Consistency Loss_init, Consistency Loss_refined) for each of the second initial depth prediction information, which is output from the decoder of the depth completion network, and the second late depth prediction information, which is output by applying the second initial depth prediction information to the resolution enhancement network.

Accordingly, the training step (S800) can define the final loss function value as in Equation 4 above, and preferably trains the depth completion network using Equation 4.

In the precision output step (S900), the precision estimation unit 400 receives the depth completion network as finally trained in the training step (S800), and then estimates precise depth information by applying externally input RGB image data and point cloud data of arbitrary channels to the finally trained network.

As described above, the present invention has been explained with reference to specific matters such as particular components and to the drawings of limited embodiments, but these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiment, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Accordingly, the spirit of the present invention should not be confined to the described embodiments, and not only the claims set out below but also everything that is equal or equivalent to those claims falls within the scope of the spirit of the present invention.

100: RGB input unit
200: LiDAR input unit
300: Training unit
310: First output unit 320: First comparison unit
330: Second output unit 340: Second comparison unit
350: Integrated processing unit
400: Precision estimation unit

Claims

An RGB input unit 100 that receives RGB image data of a predetermined area acquired using at least one camera;
A LiDAR input unit 200 that receives point cloud data of the predetermined area acquired using a 3D LiDAR sensor;
Receives the RGB image data from the RGB input unit 100 and point cloud data at the scanline resolution of all channels from the LiDAR input unit 200, and outputs first depth prediction information using a pre-stored depth completion network,
Receives the RGB image data from the RGB input unit 100 and point cloud data at the scanline resolution of random channels, consisting of fewer channels than all channels, from the LiDAR input unit 200, and outputs second depth prediction information using the depth completion network,
a training unit 300 that calculates a loss function by comparing and analyzing the first depth prediction information and the second depth prediction information, and trains the depth completion network based on the calculated loss function; and
A precise estimation unit 400 that applies input RGB image data and point cloud data of random channels to the depth completion network model generated as the final training result of the training unit 300, and estimates precise depth information regardless of the resolution of the input point cloud data of random channels;
Includes,
The training unit 300
Consists of a two-stage framework, receives two data sets that have the same RGB image data but point cloud data of different channel scanline resolutions, and outputs depth prediction information for each data set using the depth completion network,
Calculates a first loss function value and a second loss function value for the first depth prediction information, calculates a third loss function value for the second depth prediction information, and trains the depth completion network using the first loss function value, the second loss function value, and the third loss function value,
The third loss function value is a consistency loss function. A precise depth information estimation system using a camera and LiDAR.
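
Illustrative note (not part of the claims): the consistency term described above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the patented implementation: the mean-absolute-difference form of the consistency loss, the function names `consistency_loss` and `total_loss`, and the weights `w1`, `w2`, `w3` are all hypothetical choices. The claims only state that the third loss compares the fewer-channel prediction with the all-channel prediction.

```python
import numpy as np

def consistency_loss(pred_fewer, pred_full):
    """Third loss (assumed form): mean absolute difference between the
    second (fewer-channel) depth prediction and the first (all-channel)
    depth prediction, which serves as the reference."""
    pred_fewer = np.asarray(pred_fewer, dtype=float)
    pred_full = np.asarray(pred_full, dtype=float)
    return float(np.mean(np.abs(pred_fewer - pred_full)))

def total_loss(loss1, loss2, loss3, w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three loss values; the weights are hypothetical."""
    return w1 * loss1 + w2 * loss2 + w3 * loss3

# Toy example: two 2x2 depth maps that differ by 1 m everywhere.
full = np.full((2, 2), 10.0)
fewer = np.full((2, 2), 11.0)
l3 = consistency_loss(fewer, full)
```

Under this reading, the fewer-channel branch is pushed toward the output of the all-channel branch, which is how a network trained this way could remain accurate when only low-channel LiDAR data is available at inference time.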

According to claim 1,
The training unit 300 further comprises:
A first output unit 310 that receives the first depth prediction information output by applying the RGB image data and the point cloud data at the scanline resolution of all channels to the depth completion network; and
A first comparison unit 320 that calculates a first loss function value and a second loss function value for the first depth prediction information, using the first depth prediction information, GT (Ground Truth) data of the predetermined area input from the outside, and the RGB image data transmitted from the RGB input unit 100.
A precise depth information estimation system using a camera and LiDAR.
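
Illustrative note (not part of the claims): the claim states only that the first and second loss values are computed from the first depth prediction, the GT data, and the RGB image. The sketch below assumes one common pairing, a masked L1 loss against GT plus an edge-aware smoothness term guided by the RGB image. Both choices, and the names `supervised_l1` and `edge_aware_smoothness`, are assumptions made for illustration and are not confirmed by the source.

```python
import numpy as np

def supervised_l1(pred, gt):
    """Assumed first loss: mean absolute error over pixels where GT depth
    is available (GT > 0 marks a valid pixel in this sketch)."""
    valid = gt > 0
    if not valid.any():
        return 0.0
    return float(np.abs(pred[valid] - gt[valid]).mean())

def edge_aware_smoothness(pred, rgb):
    """Assumed second loss: depth gradients are penalized less where the
    image itself has strong gradients. pred: (H, W); rgb: (H, W, 3)."""
    gray = rgb.mean(axis=2)
    dx_depth = np.abs(np.diff(pred, axis=1))
    dy_depth = np.abs(np.diff(pred, axis=0))
    dx_img = np.abs(np.diff(gray, axis=1))
    dy_img = np.abs(np.diff(gray, axis=0))
    return float((dx_depth * np.exp(-dx_img)).mean()
                 + (dy_depth * np.exp(-dy_img)).mean())

# Toy example: constant prediction, sparse GT with two valid pixels.
pred = np.full((2, 2), 2.0)
gt = np.array([[0.0, 1.0], [3.0, 0.0]])
rgb = np.zeros((2, 2, 3))
loss1 = supervised_l1(pred, gt)
loss2 = edge_aware_smoothness(pred, rgb)
```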

According to claim 2,
The training unit 300 further comprises:
A second output unit 330 that receives the second depth prediction information output by applying the RGB image data and point cloud data at the scanline resolution of random channels, consisting of fewer channels than all channels, to the depth completion network;
a second comparison unit 340 that calculates a third loss function value for the second depth prediction information using the second depth prediction information and the first depth prediction information; and
an integrated processing unit 350 that trains the depth completion network using the first loss function value, the second loss function value, and the third loss function value.
A precise depth information estimation system using a camera and LiDAR.
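
Illustrative note (not part of the claims): one way to obtain point cloud data of "random channels consisting of fewer channels than all channels" is to keep only the points belonging to a randomly chosen subset of scanlines. The sketch below assumes each point carries an integer scanline (ring) index. The function name `subsample_channels` and the per-point `ring` array are hypothetical, and the source does not say how the random channels are selected.

```python
import numpy as np

def subsample_channels(points, ring, num_keep, rng):
    """Keep only points whose scanline index is in a random subset.

    points: (N, 3) array of x, y, z coordinates.
    ring:   (N,) integer scanline index of each point.
    num_keep: number of scanlines to keep (at most the number present).
    rng:    a numpy.random.Generator.
    """
    channels = np.unique(ring)
    if num_keep > channels.size:
        raise ValueError("num_keep exceeds the number of available channels")
    kept = rng.choice(channels, size=num_keep, replace=False)
    mask = np.isin(ring, kept)
    return points[mask], ring[mask]

# Toy example: a 64-channel sweep with 10 points per scanline, reduced to 16.
rng = np.random.default_rng(0)
ring = np.repeat(np.arange(64), 10)
points = np.zeros((ring.size, 3))
sub_points, sub_ring = subsample_channels(points, ring, 16, rng)
```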

According to claim 3,
In the training unit 300, the depth completion network comprises:
a first encoder that extracts features of the input RGB image data;
a second encoder that extracts features of the input point cloud data;
a first module that receives the features from the first encoder and the second encoder and synthesizes them;
a second module that receives the synthesized features from the first module and extracts multi-scale features; and
a decoder that receives the multi-scale features from the second module and outputs depth prediction information.
A precise depth information estimation system using a camera and LiDAR.

According to claim 4,
The depth prediction information includes
Initial depth prediction information output from the decoder, and
Late depth prediction information output by applying the initial depth prediction information to a pre-stored resolution enhancement network,
The first depth prediction information includes first initial depth prediction information and first late depth prediction information,
The second depth prediction information includes second initial depth prediction information and second late depth prediction information. A precise depth information estimation system using a camera and LiDAR.

According to claim 5,
The first comparison unit 320
Calculates a first loss function value and a second loss function value for each of the first initial depth prediction information and the first late depth prediction information,
The second comparison unit 340
Calculates a third loss function value for each of the second initial depth prediction information and the second late depth prediction information. A precise depth information estimation system using a camera and LiDAR.

According to claim 6,
The integrated processing unit 350
Trains the depth completion network using the first loss function value and the second loss function value for the first initial depth prediction information, the first loss function value and the second loss function value for the first late depth prediction information, the third loss function value for the second initial depth prediction information, and the third loss function value for the second late depth prediction information. A precise depth information estimation system using a camera and LiDAR.
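
Illustrative note (not part of the claims): the six loss values listed above could be combined into a single training objective as a weighted sum. The notation here is hypothetical: $D^{\mathrm{full}}_s$ and $D^{\mathrm{rand}}_s$ denote the first (all-channel) and second (random-channel) predictions at stage $s$, $G$ the GT data, $I$ the RGB image, and $\lambda_1, \lambda_2, \lambda_3$ assumed weights. The source states only that all six values are used for training, not how they are weighted or summed.

```latex
\mathcal{L}_{\mathrm{total}}
  = \sum_{s \in \{\mathrm{init},\,\mathrm{late}\}}
    \Big[
      \lambda_1 \, \mathcal{L}_1\!\big(D^{\mathrm{full}}_s, G\big)
      + \lambda_2 \, \mathcal{L}_2\!\big(D^{\mathrm{full}}_s, I\big)
      + \lambda_3 \, \mathcal{L}_3\!\big(D^{\mathrm{rand}}_s, D^{\mathrm{full}}_s\big)
    \Big]
```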

In a precise depth information estimation method using a camera and LiDAR, where each step is performed by a computer-implemented precise depth information estimation system using a camera and LiDAR,
A first data input step (S100) of receiving, at the RGB input unit, RGB image data of a predetermined area acquired using at least one camera;
A second data input step (S200) of receiving, at the LiDAR input unit, point cloud data of the predetermined area acquired using a 3D LiDAR sensor;
A first prediction output step (S300) in which the training unit, which consists of a two-stage framework, as the first stage receives the RGB image data from the first data input step (S100) and point cloud data at the scanline resolution of all channels from the second data input step (S200), and outputs first depth prediction information using a pre-stored depth completion network;
A second prediction output step (S400) in which the training unit, which consists of a two-stage framework, as the second stage receives the RGB image data from the first data input step (S100) and point cloud data at the scanline resolution of random channels, consisting of fewer channels than all channels, from the second data input step (S200), and outputs second depth prediction information using the depth completion network;
A third data input step (S500) in which the training unit receives GT (Ground Truth) data of the predetermined area from the outside;
A first loss calculation step (S600) in which the training unit calculates a first loss function value and a second loss function value for the first depth prediction information, using the GT data from the third data input step (S500), the RGB image data from the first data input step (S100), and the first depth prediction information from the first prediction output step (S300);
A second loss calculation step (S700) in which the training unit calculates a third loss function value for the second depth prediction information, using the first depth prediction information from the first prediction output step (S300) and the second depth prediction information from the second prediction output step (S400);
A training step (S800) in which the training unit trains the depth completion network using the first loss function value and the second loss function value from the first loss calculation step (S600) and the third loss function value from the second loss calculation step (S700); and
A precision output step (S900) in which the precise estimation unit applies input RGB image data and point cloud data of random channels to the depth completion network model generated as the final training result of the training step (S800), and estimates precise depth information regardless of the resolution of the input point cloud data of random channels;
Includes,
The third loss function value is a consistency loss function. A precise depth information estimation method using a camera and LiDAR.

According to claim 8,
The depth completion network comprises:
a first encoder that extracts features of the input RGB image data;
a second encoder that extracts features of the input point cloud data;
a first module that receives the features from the first encoder and the second encoder and synthesizes them;
a second module that receives the synthesized features from the first module and extracts multi-scale features; and
a decoder that receives the multi-scale features from the second module and outputs depth prediction information.
A precise depth information estimation method using a camera and LiDAR.

According to claim 9,
The depth prediction information
Includes initial depth prediction information output from the decoder, and late depth prediction information output by applying the initial depth prediction information to a pre-stored resolution enhancement network,
The first depth prediction information includes first initial depth prediction information and first late depth prediction information,
The second depth prediction information includes second initial depth prediction information and second late depth prediction information. A precise depth information estimation method using a camera and LiDAR.

According to claim 10,
The first loss calculation step (S600)
Calculates a first loss function value and a second loss function value for each of the first initial depth prediction information and the first late depth prediction information,
The second loss calculation step (S700)
Calculates a third loss function value for each of the second initial depth prediction information and the second late depth prediction information. A precise depth information estimation method using a camera and LiDAR.

According to claim 11,
The training step (S800)
Trains the depth completion network using the first loss function value and the second loss function value for the first initial depth prediction information, the first loss function value and the second loss function value for the first late depth prediction information, the third loss function value for the second initial depth prediction information, and the third loss function value for the second late depth prediction information. A precise depth information estimation method using a camera and LiDAR.