KR20240078316A

KR20240078316A - Apparatus and method for estimating depth

Info

Publication number: KR20240078316A
Application number: KR1020230135973A
Authority: KR
Inventors: 리 웨이밍; 권남섭; 허 바오; 정경부; 전명제; 왕 창; 성영훈; 마 린
Original assignee: 삼성전자주식회사
Priority date: 2022-11-25
Filing date: 2023-10-12
Publication date: 2024-06-03
Also published as: CN118096854A

Abstract

The present disclosure relates to an apparatus and method for estimating depth. The method may include receiving a time-of-flight (ToF) image, a left image, and a right image of a scene; determining a first reliability of each ToF pixel of the ToF image; and estimating a depth map of the scene based on the ToF image, the left image, and the right image according to the first reliability.

Description

Apparatus and method for estimating depth {APPARATUS AND METHOD FOR ESTIMATING DEPT}

The following embodiments relate to the field of image processing, and in particular to a depth estimation method, an apparatus, an electronic device, and a storage medium.

Efficient depth estimation plays an important role in augmented reality. Depth information of the surrounding scene is inferred by analyzing the input sensor images, which supports a variety of follow-up tasks.

For example, various 3D virtual objects can be rendered based on the 3D information of scene objects. Obtaining 3D information of the surrounding scene is also important for many other tasks, such as obstacle avoidance and grasping by robots. A more accurate and efficient depth estimation method is therefore needed.

An object of the present invention is to provide an apparatus and method for estimating depth.

A depth estimation method according to an embodiment of the present invention may include: receiving a time-of-flight (ToF) image, a left image, and a right image of a scene; determining a first reliability of each ToF pixel of the ToF image; and estimating a depth map of the scene based on the ToF image, the left image, and the right image according to the first reliability.

Here, estimating the depth map of the scene based on the ToF image, the left image, and the right image according to the first reliability may include: determining the number of ToF pixels whose first reliability satisfies a predetermined requirement; when that number satisfies a first threshold requirement, estimating the depth map of the scene based on the ToF image, the left image, and the right image; and when that number does not satisfy the first threshold requirement, estimating the depth map of the scene based on the left image and the right image without using the ToF image.
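The selection between the two estimation paths can be sketched as follows. This is a minimal illustration only: the function names, the `min_reliability` cutoff standing in for the "predetermined requirement", and the use of `>=` for both requirements are assumptions made for this example and are not specified in the disclosure.

```python
def count_reliable(reliabilities, min_reliability=0.5):
    """Count ToF pixels whose first reliability meets the assumed requirement."""
    return sum(1 for r in reliabilities if r >= min_reliability)


def select_inputs(reliabilities, first_threshold, min_reliability=0.5):
    """Return the images to use for depth estimation.

    Assumption: the first threshold requirement is "count >= first_threshold".
    """
    if count_reliable(reliabilities, min_reliability) >= first_threshold:
        return ("tof", "left", "right")  # ToF-assisted estimation
    return ("left", "right")             # stereo-only fallback


# Three of the four pixels are reliable under the assumed cutoff of 0.5.
print(select_inputs([0.9, 0.8, 0.1, 0.6], first_threshold=3))
```

Counting reliable pixels before choosing a path means a ToF image with too few trustworthy pixels is ignored rather than allowed to mislead the stereo matching.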

Here, determining the first reliability of each ToF pixel of the ToF image may include: projecting each ToF pixel of the ToF image onto the left image and the right image, respectively; for each ToF projection point after projection, calculating, along a first direction, a second reliability of the ToF pixel corresponding to each ToF projection point on the scan line of that ToF projection point; and calculating, along a second direction opposite to the first direction, the first reliability of the ToF pixel corresponding to each ToF projection point based on the reliability calculation result of the first direction.

Here, calculating the second reliability and calculating the first reliability may compute the second reliability and the first reliability of the ToF pixel corresponding to each ToF projection point based on (i) the image feature difference of that ToF projection point between the left image and the right image, and (ii) a third reliability of the ToF pixels corresponding to ToF projection points that have similar image features and whose distance to that ToF projection point on the scan line is within a preset range.

Here, estimating the depth map of the scene based on the ToF image, the left image, and the right image may include: performing first stereo matching of the left image and the right image; predicting, using a first neural network, a fourth reliability of each ToF pixel based on the ToF image and the result of the first stereo matching; and performing second stereo matching of the left image and the right image based on the ToF image and the fourth reliability to estimate a first depth map of the scene.

Here, performing the first stereo matching of the left image and the right image may include: when the number satisfies both the first threshold requirement and a second threshold requirement, performing the first stereo matching of the left image and the right image based on the first reliability and the ToF image; or, when the number satisfies the first threshold requirement but not the second threshold requirement, performing the first stereo matching of the left image and the right image without using the ToF image.

Here, predicting, using the first neural network, the fourth reliability of each ToF pixel based on the ToF image and the result of the first stereo matching may include predicting the fourth reliability of each ToF pixel by using at least one of first information, second information, and third information as an input to the first neural network. The first information is the difference between the disparity value corresponding to each ToF pixel as determined from the ToF image and the disparity value of that ToF pixel as determined by the first stereo matching. The second information is the image feature difference of each ToF pixel between its projection points in the left image and the right image. The third information may be the difference in depth value between the projection point of each ToF pixel in the left image and the right image and at least one ToF projection point with similar image features within the region around that projection point.
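As an illustration of the first information, the sketch below converts a ToF depth into a disparity using the standard rectified-stereo relation d = f·B/Z and compares it with a disparity from stereo matching. The function names, the focal length and baseline parameters, and the choice of an absolute (unsigned) difference are assumptions made for this example; the disclosure does not state how the difference is formed.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a rectified stereo pair: d = f * B / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def disparity_difference(tof_depth_m, stereo_disparity_px, focal_px, baseline_m):
    """Assumed form of the 'first information' for one ToF pixel."""
    tof_disparity = depth_to_disparity(tof_depth_m, focal_px, baseline_m)
    return abs(tof_disparity - stereo_disparity_px)


# Hypothetical camera: 500 px focal length, 0.1 m baseline.
diff = disparity_difference(2.0, 24.0, focal_px=500.0, baseline_m=0.1)
```

A large difference suggests that the ToF measurement and the stereo match disagree at that pixel, which is the kind of cue a reliability predictor could use.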

Here, performing the second stereo matching of the left image and the right image based on the ToF image and the fourth reliability to estimate the first depth map of the scene may include: calculating, during the second stereo matching, a matching cost of each candidate disparity corresponding to each ToF pixel based on the value of that ToF pixel in the ToF image and its predicted fourth reliability; and determining a disparity value corresponding to each ToF pixel based on the matching cost, and estimating the first depth map using the determined disparity values.
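One way to picture a reliability-weighted matching cost is sketched below. The additive penalty, the `weight` parameter, and the winner-takes-all selection are assumptions chosen for illustration; the disclosure states only that the cost depends on the ToF pixel value and its fourth reliability.

```python
def guided_costs(photo_costs, tof_disparity, reliability, weight=1.0):
    """Add a ToF prior to per-candidate image matching costs.

    photo_costs[d] is the image-based cost of candidate disparity d.
    Assumed penalty: weight * reliability * |d - tof_disparity|.
    """
    return [
        cost + weight * reliability * abs(d - tof_disparity)
        for d, cost in enumerate(photo_costs)
    ]


def best_disparity(costs):
    """Winner-takes-all: index of the lowest cost."""
    return min(range(len(costs)), key=costs.__getitem__)


photo = [5.0, 1.0, 4.0, 1.2, 6.0]  # ambiguous: candidates 1 and 3 are both cheap
unguided = best_disparity(guided_costs(photo, tof_disparity=3, reliability=0.0))
guided = best_disparity(guided_costs(photo, tof_disparity=3, reliability=1.0))
```

With zero reliability the ToF value has no effect and the image cost alone decides; with full reliability the ambiguity is resolved toward the ToF disparity.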

Here, estimating the depth map of the scene based on the ToF image, the left image, and the right image may further include: projecting the ToF image onto the left image and the right image and, for regions of ToF projection points that satisfy a preset density, performing interpolation based on image features of the left image and the right image to estimate a second depth map; performing interpolation on the first depth map based on image features of the left image and the right image to estimate a third depth map; and estimating a fourth depth map of the scene based on the second depth map and the third depth map.

Here, performing interpolation based on image features of the left image and the right image for regions of ToF projection points that satisfy the preset density to estimate the second depth map may include: performing interpolation based on image features of the left image and the right image between adjacent ToF projection points that lie within a preset distance of each other on the scan line of each ToF projection point; sampling the interpolated ToF projection points to construct a regular grid of ToF projection points; and, for each ToF projection point on the constructed grid, performing interpolation based on image features of the left image and the right image to estimate the second depth map.

Here, performing interpolation based on image features of the left image and the right image may determine the depth value of the point to be interpolated based on the spatial distance and the image feature difference between that point and adjacent reference points.
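A minimal sketch of such feature-guided interpolation is given below, using Gaussian weights in the style of a joint bilateral filter. That weighting scheme is a substitute chosen for illustration, not the formula of the disclosure; the parameters `sigma_s` and `sigma_f`, the scalar feature, and the function name are likewise assumptions.

```python
import math


def guided_interpolate(target_xy, target_feat, refs, sigma_s=5.0, sigma_f=0.1):
    """Interpolate a depth from reference points.

    refs: iterable of ((x, y), feature, depth) for nearby reference points.
    Each weight falls off with spatial distance and with feature difference.
    """
    num = 0.0
    den = 0.0
    for (x, y), feat, depth in refs:
        dist2 = (x - target_xy[0]) ** 2 + (y - target_xy[1]) ** 2
        diff2 = (feat - target_feat) ** 2
        w = math.exp(-dist2 / (2 * sigma_s ** 2)) * math.exp(-diff2 / (2 * sigma_f ** 2))
        num += w * depth
        den += w
    if den == 0.0:
        raise ValueError("no usable reference points")
    return num / den


# Two equidistant references; the second has a very different image feature.
refs = [((0, 0), 0.5, 1.0), ((2, 0), 0.9, 3.0)]
depth = guided_interpolate((1, 0), 0.5, refs)
```

Because the second reference differs strongly in image feature, it receives a small weight and the result stays close to the depth of the first reference, which helps keep depth edges aligned with image edges.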

Here, estimating the depth map of the scene based on the left image and the right image without using the ToF image may include estimating a fifth depth map of the scene through stereo matching of the left image and the right image.

Here, the depth estimation method may further include updating the depth values of unreliable depth value points in the estimated depth map, where the estimated depth map may include the first depth map, the fourth depth map, or the fifth depth map.

Here, updating the depth values of the unreliable depth value points in the estimated depth map may include: determining reliable depth value points and unreliable depth value points in the estimated depth map; predicting, using a second neural network, the depth values of the unreliable depth value points based on features of the reliable depth value points and the unreliable depth value points; and, for the regions around the unreliable depth value points, performing interpolation based on image features of the left image and the right image to estimate an updated depth map.

Here, determining the reliable depth value points and the unreliable depth value points in the estimated depth map may include constructing a regular grid of depth value points based on the estimated depth map, and determining the reliable depth value points and the unreliable depth value points on the regular grid.

An electronic device according to an embodiment of the present invention includes at least one processor and a memory storing at least one computer-executable instruction. The processor may receive a time-of-flight (ToF) image, a left image, and a right image of a scene, determine a first reliability of each ToF pixel of the ToF image, and estimate a depth map of the scene based on the ToF image, the left image, and the right image according to the first reliability.

Here, when estimating the depth map of the scene, the processor may determine the number of ToF pixels whose first reliability satisfies a predetermined requirement; when that number satisfies a first threshold requirement, estimate the depth map of the scene based on the ToF image, the left image, and the right image; and when that number does not satisfy the first threshold requirement, estimate the depth map of the scene based on the left image and the right image without using the ToF image.

Here, when determining the first reliability of each ToF pixel of the ToF image, the processor may project each ToF pixel of the ToF image onto the left image and the right image, respectively; for each ToF projection point after projection, calculate, along a first direction, a second reliability of the ToF pixel corresponding to each ToF projection point on the scan line of that ToF projection point; and calculate, along a second direction opposite to the first direction, the first reliability of the ToF pixel corresponding to each ToF projection point based on the reliability calculation result of the first direction.

Here, when estimating the depth map of the scene based on the ToF image, the left image, and the right image, the processor may perform first stereo matching of the left image and the right image; predict, using a first neural network, a fourth reliability of each ToF pixel based on the ToF image and the result of the first stereo matching; and perform second stereo matching of the left image and the right image based on the ToF image and the fourth reliability to estimate a first depth map of the scene.

Here, when estimating the depth map of the scene based on the left image and the right image without using the ToF image, the processor may estimate a fifth depth map of the scene through stereo matching of the left image and the right image.

FIG. 1 is a flowchart illustrating a process of estimating depth according to an embodiment.
FIG. 2 is an exemplary diagram illustrating the configuration of a ToF image sensor and color image sensors on a head-wearable device according to an embodiment.
FIG. 3 is an exemplary diagram illustrating ToF pixel reliability inference based on bidirectional propagation according to an embodiment.
FIG. 4 is a diagram illustrating ToF-guided stereo matching according to an embodiment.
FIG. 5 is an exemplary diagram illustrating transformer-based prediction of ToF pixel reliability according to an embodiment.
FIG. 6 is an exemplary diagram illustrating the projection of ToF points onto an RGB image according to an embodiment.
FIG. 7 is an exemplary diagram illustrating grid points after downsampling according to an embodiment.
FIG. 8 is a diagram illustrating an example of interpolation on a regular grid guided by image features of the left and right images according to an embodiment.
FIG. 9 illustrates an example of interpolation guided by image features of the left and right images after irregular projection according to an embodiment.
FIG. 10 is a diagram illustrating a transformer-based process of updating unreliable depth value points according to an embodiment.
FIG. 11 is a diagram illustrating an example of a process of estimating depth according to an embodiment.
FIG. 12 is a block diagram illustrating a depth estimation apparatus according to an embodiment.
FIG. 13 is a block diagram of an electronic device according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Various changes may be made to the embodiments, however, so the scope of rights of the patent application is not restricted or limited by these embodiments. All changes, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for description only and should not be construed as limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in the present application, are not to be interpreted in an idealized or excessively formal sense.

In addition, in the description with reference to the accompanying drawings, identical components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. In describing the embodiments, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the embodiments.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the embodiments. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, it may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

A component that has a function in common with a component included in one embodiment is described using the same name in other embodiments. Unless stated otherwise, the description given for one embodiment may also apply to other embodiments, and detailed descriptions are omitted to the extent that they overlap.

Hereinafter, an apparatus and method for estimating depth according to an embodiment of the present invention are described in detail with reference to the accompanying FIGS. 1 to 13.

FIG. 1 is a flowchart illustrating a process of estimating depth according to an embodiment.

Referring to FIG. 1, the depth estimation method may receive a time-of-flight (ToF) image, a left image, and a right image of a scene (110). For example, the ToF image may be captured with a ToF image sensor and the left and right images with color image sensors, or a pre-stored ToF image and pre-stored left and right images may be read directly from memory. The present disclosure does not limit how the ToF image and the left and right images are acquired. The left and right images may be RGB images, other types of color images, infrared images, or the like. For example, the ToF image sensor and the color image sensors of a head-wearable device may be arranged as shown in FIG. 2.

FIG. 2 is an exemplary diagram illustrating the configuration of a ToF image sensor and color image sensors on a head-wearable device according to an embodiment.

Referring to FIG. 2, a head-wearable device 210 that estimates depth may include a ToF image sensor 212 and color image sensors 214 and 216 located on the same line.

As shown in FIG. 2, the captured images may include, for example, a ToF image 222 and a right image 224 and a left image 226 that form a stereo image pair.

Returning to the description of FIG. 1, the depth estimation method may determine a first reliability of each ToF pixel in the ToF image (120). Here, the first reliability may be a value that measures the accuracy or reliability of the depth value corresponding to the ToF pixel. The second reliability, third reliability, and fourth reliability mentioned later in the present disclosure are likewise values that measure the accuracy or reliability of the depth value corresponding to a ToF pixel.

In step 120, the depth estimation method may first project each ToF pixel of the ToF image onto the left image and the right image. Then, for each scan line of ToF projection points after projection, the method may calculate, along a first direction, a second reliability of the ToF pixel corresponding to each ToF projection point on that scan line. The method may then calculate, along a second direction opposite to the first direction, the first reliability of the ToF pixel corresponding to each ToF projection point based on the reliability calculation result of the first direction.

In the present disclosure, the above method of determining the first reliability is referred to as "ToF reliability inference based on bidirectional propagation". In this inference, when the second reliability along the first direction and the first reliability along the second direction are calculated, the second reliability and the first reliability of the ToF pixel corresponding to each ToF projection point may be computed based on (i) the image feature difference of that ToF projection point between the left and right images, and (ii) a third reliability of the ToF pixels corresponding to ToF projection points that have similar image features and lie within a predetermined distance of that ToF projection point on its scan line. For example, the first direction may be left to right along the scan line of the projection points and the second direction right to left. Alternatively, the first direction may be top to bottom along the scan line of the projection points and the second direction bottom to top.

FIG. 3 is an exemplary diagram illustrating ToF pixel reliability inference based on bidirectional propagation according to an embodiment.

Referring to FIG. 3, when the ToF image sensor and the color image sensors are located on the same line, the ToF pixels projected onto the left and right images also lie on the same line. Taking the ToF pixels projected onto the left image as an example, for each scan line of ToF projection points, the reliability of the ToF pixel corresponding to each ToF projection point is calculated along two directions (e.g., first left to right, then right to left).

In a specific implementation, the reliability calculation is performed by projecting each ToF pixel of the ToF image. Taking the left-to-right direction as an example, given a ToF projection point P, the difference between the image features of its corresponding points in the left and right images may be defined as e0, and the points within a predetermined horizontal distance of P that have similar image features (e.g., color features) are M1, ..., Mn. A threshold for judging whether image features are similar may be preset for each type of image feature. For these n points, the horizontal distances to point P may be defined as di, i=1, ..., n, the depth differences between them and point P as bi, i=1, ..., n, and their calculated reliabilities as ci, i=1, ..., n. The reliability of point P may then be defined as in Equation 1 below.

[Equation 1]

Here, Cp is the reliability of P, a ToF pixel of the ToF image, and a0, a1, and a2 may be preset constants, all greater than 0. For example, a0, a1, and a2 may be empirically determined values greater than 0.

In the left-to-right pass, the reliabilities of the ToF projection points to the right of point P are not yet known (the reliability of a ToF projection point is the reliability of the ToF pixel corresponding to it). Any point whose reliability has not yet been calculated may therefore be temporarily assigned a default value (e.g., 1). After the second reliability of the ToF pixel corresponding to each ToF projection point has been calculated from left to right, the first reliability of the ToF pixel corresponding to each ToF projection point is calculated from right to left. For example, as shown in FIG. 3, when the second reliability of point P is calculated from left to right, the second reliabilities of points M1 and M2 have already been calculated, but that of point M3, located to the right of P, has not, so it may be temporarily set to the default value (e.g., 1). When the first reliability of point P is calculated from right to left, the first reliability of point M3 has already been calculated. The first reliability of point P can then be calculated using the first reliability of point M3 and the second reliabilities of points M1 and M2.
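The two-pass sweep described above can be sketched as follows. This is only an illustration under stated assumptions: Equation 1 is not reproduced in this text, so `point_reliability` is a hypothetical placeholder (an exponential penalty on e0 combined with a distance-weighted average of neighbor reliabilities), not the formula of the disclosure. The data layout (dicts with keys `x`, `depth`, `feat`, `e0`) and the parameters `radius` and `feat_tol` are likewise assumed for the example.

```python
import math


def point_reliability(e0, neighbors, a0=1.0, a1=0.1, a2=1.0):
    """Hypothetical stand-in for Equation 1 (NOT the formula of the disclosure).

    e0: left/right image feature difference at the projection point.
    neighbors: list of (d_i, b_i, c_i) = horizontal distance, depth difference,
               and currently known reliability of each feature-similar point.
    """
    own = math.exp(-a0 * e0)
    if not neighbors:
        return own
    weights = [math.exp(-a1 * d) for d, _, _ in neighbors]
    support = sum(w * c * math.exp(-a2 * abs(b))
                  for w, (_, b, c) in zip(weights, neighbors)) / sum(weights)
    return own * support


def bidirectional_reliability(points, radius=10.0, feat_tol=0.1, default=1.0):
    """points: ToF projection points of one scan line, sorted by x."""
    n = len(points)
    # Feature-similar points within the predetermined horizontal range.
    nbrs = [[j for j in range(n)
             if j != i
             and abs(points[j]["x"] - points[i]["x"]) <= radius
             and abs(points[j]["feat"] - points[i]["feat"]) <= feat_tol]
            for i in range(n)]

    def gather(i, reliability_of):
        p = points[i]
        return [(abs(points[j]["x"] - p["x"]),
                 points[j]["depth"] - p["depth"],
                 reliability_of(j)) for j in nbrs[i]]

    # Pass 1, left to right: points to the right are not computed yet,
    # so they temporarily take the default value.
    second = [default] * n
    for i in range(n):
        second[i] = point_reliability(
            points[i]["e0"], gather(i, lambda j: second[j] if j < i else default))

    # Pass 2, right to left: right neighbors use this pass's values,
    # left neighbors use the values from pass 1.
    first = [default] * n
    for i in reversed(range(n)):
        first[i] = point_reliability(
            points[i]["e0"], gather(i, lambda j: first[j] if j > i else second[j]))
    return first
```

With this placeholder, a point whose depth disagrees with its feature-similar neighbors and whose left/right features differ ends up with a lower first reliability than its neighbors.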

ToF pixel reliability inference based on bidirectional propagation is faster than methods such as neural networks, which helps improve the efficiency of depth estimation. In addition, the estimated first reliability can be used to decide whether to estimate the depth information of the scene based on the left, right, and ToF images.

Returning to the description of FIG. 1, the depth estimation method may estimate a depth map of the scene based on the left image, the right image, and the ToF image according to the first reliability (130).

According to an embodiment, in step 130, the depth estimation method may determine the number of ToF pixels whose first reliability satisfies a preset requirement and, if that number satisfies a first threshold requirement, estimate the depth map of the scene based on the left image, the right image, and the ToF image.

Conversely, in step 130, if the number of ToF pixels whose first reliability satisfies the preset requirement does not satisfy the first threshold requirement, the depth estimation method may estimate the depth map of the scene based on the left and right images without using the ToF image.

For example, the depth estimation method may decide, based on the first reliability, whether to estimate the depth map of the scene from the left, right, and ToF images.

The depth estimation method estimates the depth map of the scene from the left, right, and ToF images only when the number of ToF pixels whose first reliability satisfies the preset requirement satisfies the first threshold requirement. Otherwise, it estimates the depth map from the left and right images without using the ToF image. This avoids incurring high computational overhead when the ToF image provides only limited information, and can therefore improve depth estimation efficiency.

Hereinafter, estimating the depth map of the scene based on the left, right, and ToF images is described in more detail.

Estimating the depth map of the scene based on the left, right, and ToF images may include estimating the depth map based on the ToF image together with stereo matching of the left and right images.

Studies show that ToF cameras and stereo matching each have their own advantages and disadvantages. A ToF camera measures range accurately and handles textureless regions very well. However, ToF images obtained from a ToF camera may be noisy, and no result is produced when the light is too strong or too dark or when the distance is long. Stereo matching handles textureless regions poorly but is not affected by lighting, distance, and the like. Accordingly, this disclosure proposes a method that better fuses the ToF image with stereo matching to estimate depth information.

In one embodiment, estimating the depth map of the scene based on the left, right, and ToF images may include: first, performing a first stereo matching of the left and right images; second, predicting a fourth reliability of each ToF pixel using a first neural network, based on the ToF image and the result of the first stereo matching; and finally, performing a second stereo matching of the left and right images based on the ToF image and the predicted fourth reliability to obtain a first depth map of the scene. For example, the first neural network may be, but is not limited to, a Transformer.

As mentioned above, when the number of ToF pixels whose first reliability satisfies the preset requirement satisfies the first threshold requirement, the depth estimation method may estimate the depth map of the scene based on the left and right images and the ToF image, that is, based on the ToF image together with stereo matching of the left and right images. In addition, when that number satisfies the first threshold requirement, this disclosure may adopt different fusion methods depending on whether it also satisfies a second threshold requirement. The first and second thresholds used in these requirements may be chosen according to the number of ToF pixels in the ToF image and/or the accuracy of the ToF camera used to acquire the ToF image. For example, the more ToF pixels the ToF image has and/or the more accurate the ToF camera is, the higher the first and second thresholds may be.

For example, the different fusion methods may differ in how the first stereo matching of the left and right images is performed before the first neural network is used to predict the fourth reliability of each ToF pixel. If the number of ToF pixels whose first reliability satisfies the preset requirement satisfies both the first threshold requirement and the second threshold requirement, the first stereo matching of the left and right images may be performed based on the first reliability and the ToF image. If that number satisfies the first threshold requirement but not the second threshold requirement, the first stereo matching of the left and right images may be performed without using the ToF image, using any method known to those skilled in the art.
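The threshold logic described so far can be summarized in a small sketch. The text leaves open what "satisfying" a requirement means, so this example assumes that a pixel satisfies the preset requirement when its first reliability is at least `min_reliability`, that a threshold requirement is satisfied when the count is at least the threshold, and that `second_threshold` is not smaller than `first_threshold`. The function name and the returned mode labels are hypothetical.

```python
def choose_fusion_mode(first_reliability, min_reliability,
                       first_threshold, second_threshold):
    """Pick a processing path from the per-pixel first reliabilities.

    Assumed semantics (not stated in the text): a pixel is "reliable" when its
    first reliability >= min_reliability, and a threshold requirement is met
    when the number of reliable pixels >= that threshold.
    """
    reliable = sum(1 for c in first_reliability if c >= min_reliability)
    if reliable < first_threshold:
        # Too little usable ToF information: left/right stereo only.
        return "stereo_only"
    if reliable >= second_threshold:
        # First stereo matching is guided by the ToF image and first reliability.
        return "tof_guided_first_matching"
    # First stereo matching ignores the ToF image; the ToF image is still used
    # later for the fourth-reliability prediction and second stereo matching.
    return "plain_first_matching"
```

For instance, with `first_threshold=3` and `second_threshold=4`, three reliable pixels select `"plain_first_matching"` and five select `"tof_guided_first_matching"`.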

Performing the first stereo matching of the left and right images based on the first reliability and the ToF image is now described. For example, this may include: first, calculating the matching cost of the candidate disparities corresponding to each ToF pixel in the first stereo matching, based on the value of each ToF pixel in the ToF image and the first reliability of each ToF pixel; and second, determining the disparity value corresponding to each ToF pixel based on the calculated matching cost.

For example, when the first stereo matching of the left and right images is performed based on the first reliability and the ToF image, semi-global matching (SGM) may be used as the baseline stereo matching method. SGM is fast, performs well, and is not affected by domain shift. ToF pixels, however, are often noisy. This disclosure therefore proposes that, when stereo matching is performed with SGM, the matching cost of the candidate disparities corresponding to each ToF pixel be calculated based on the value of each ToF pixel in the ToF image and the reliability of each ToF pixel. For example, the weight used in the matching cost calculation may be defined based on the first reliability. For convenience, the stereo matching method of this disclosure that takes ToF reliability into account is referred to as "ToF-guided stereo matching" (see FIG. 4).

FIG. 4 is a diagram illustrating ToF-guided stereo matching according to an embodiment.

Referring to FIG. 4, image features are first extracted (422, 432) from the left image (420) and the right image (430), respectively. For example, a CNN may be used to extract the image features of the left image (420) and the right image (430). The depth estimation method may then run the SGM stereo matching algorithm separately for each scan line. In this disclosure, however, when the SGM stereo matching algorithm is run, the matching cost of a candidate disparity corresponding to each ToF pixel may be determined as a weighted sum of matching cost 1, determined from image block matching between the left and right images, and matching cost 2, determined from the ToF pixel. The weight of matching cost 2 in the weighted sum may be defined based on the first reliability: the higher the first reliability, the larger the weight of matching cost 2. The closer the candidate disparity is to the disparity corresponding to the ToF pixel, the closer matching cost 2 is to 0. This can make the disparity values obtained from stereo matching more accurate.
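The weighted-sum cost described above can be illustrated with a minimal sketch. The exact form of matching cost 2 and of its weight is not given in the text, so this example assumes a truncated absolute difference between the candidate disparity and the ToF disparity, weighted linearly by the reliability. It also selects the disparity by a simple per-pixel minimum rather than by SGM path aggregation. The names `fused_cost`, `best_disparity`, `scale`, and `trunc` are hypothetical.

```python
def fused_cost(cost1, candidate_disparity, tof_disparity, reliability,
               scale=1.0, trunc=5.0):
    """Weighted sum of image-based cost 1 and ToF-based cost 2.

    Assumption: cost 2 is a truncated absolute disparity difference, so it is 0
    when the candidate equals the ToF disparity, and its weight grows linearly
    with the reliability of the ToF pixel.
    """
    cost2 = min(abs(candidate_disparity - tof_disparity), trunc)
    return cost1 + scale * reliability * cost2


def best_disparity(image_costs, tof_disparity, reliability):
    """image_costs: dict mapping each candidate disparity to matching cost 1.

    Per-pixel minimum only; a full SGM would aggregate costs along paths first.
    """
    return min(image_costs,
               key=lambda d: fused_cost(image_costs[d], d, tof_disparity, reliability))
```

With `image_costs = {10: 1.0, 11: 0.9, 12: 1.2}` and a ToF disparity of 12, a reliability of 0 leaves the image-only winner (11) in place, while a reliability of 1 pulls the result to the ToF disparity (12).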

The depth estimation method may perform the first stereo matching of the left and right images based on the first reliability and the ToF image (442). The depth estimation method may then predict the fourth reliability of each ToF pixel using the first neural network (e.g., a Transformer), based on the ToF image and the result of the first stereo matching (444). The fourth reliability of each ToF pixel may be determined scan line by scan line after the ToF image is projected onto the left and right images.

FIG. 5 is a diagram illustrating an example of Transformer-based prediction of ToF pixel reliability according to an embodiment.

Referring to FIG. 5, the depth estimation method may use at least one of the following types of information as input to the first neural network to predict the fourth reliability of each ToF pixel: first information (512), second information (514), and third information (516).

The first information (512) may be the difference between the disparity value corresponding to each ToF pixel as determined from the ToF image and the disparity value of that ToF pixel as determined by the first stereo matching. The second information (514) may be the difference between the image features at the projection points of each ToF pixel in the left image and the right image. The third information (516) may be the difference in depth value between the projection point of each ToF pixel on the left and right images and at least one ToF projection point in the neighborhood of that projection point that has similar image features.

Here, the disparity value corresponding to each ToF pixel as determined from the ToF image is obtained by converting the depth value of that ToF pixel. The image feature difference at the projection points of each ToF pixel in the left and right images (second information (514)) may be obtained by projecting the ToF image onto the left and right images to obtain the corresponding projection points and then comparing the image features at those points. The depth value difference between the projection point of each ToF pixel and at least one ToF projection point in its neighborhood with similar image features (third information (516)) may be determined by projecting the ToF image onto the left and right images to obtain the corresponding projection points, searching the projection scan line for ToF projection points within a predetermined distance that have similar image features, and comparing their depth values.

The first information (512), the second information (514), and the third information (516) may be concatenated (520, 530) and sent to the Transformer encoder (540) for encoding. The encoded result is input to a multilayer perceptron (MLP) (550) for decoding to obtain the reliability of each ToF point. The Transformer encoder (540) of FIG. 5 may optionally be a Swin Transformer (Liu Z, Lin Y, Cao Y, et al. Swin Transformer: Hierarchical vision transformer using shifted windows. ICCV, 2021), which processes faster than a standard Transformer.
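To make the three input types concrete, the sketch below assembles them for a single ToF pixel. It is an illustration under assumptions: the depth-to-disparity conversion uses the standard rectified pinhole stereo relation (disparity = focal length × baseline / depth), which the text does not spell out. Image features are reduced to scalars, and the third information is padded to a fixed length `k` so that vectors from different pixels can be concatenated. The names `depth_to_disparity`, `reliability_features`, `focal_px`, `baseline`, and `k` are hypothetical, and the Transformer encoder and MLP themselves are not shown.

```python
def depth_to_disparity(depth, focal_px, baseline):
    """Rectified pinhole stereo: disparity [px] = focal length [px] * baseline / depth."""
    return focal_px * baseline / depth


def reliability_features(tof_depth, stereo_disparity, left_feat, right_feat,
                         neighbor_depths, focal_px, baseline, k=2):
    """Build the per-pixel input vector [info1, info2, info3...].

    info1: ToF-derived disparity minus the first-stereo-matching disparity.
    info2: left/right image feature difference at the projection points
           (scalar features assumed for simplicity).
    info3: depth differences to up to k feature-similar neighboring ToF
           projection points, zero-padded to length k (an assumption).
    """
    info1 = depth_to_disparity(tof_depth, focal_px, baseline) - stereo_disparity
    info2 = abs(left_feat - right_feat)
    info3 = [tof_depth - z for z in neighbor_depths[:k]]
    info3 += [0.0] * (k - len(info3))
    return [info1, info2] + info3
```

A sequence of such vectors along one scan line would then form the token sequence passed to the encoder.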

Returning to the description of FIG. 4, after predicting the fourth reliability of the ToF pixels using the first neural network (444), the depth estimation method may perform the second stereo matching of the left and right images based on the ToF image and the predicted fourth reliability to estimate the first depth map of the scene (446).

Specifically, the depth estimation method may calculate, for the second stereo matching, the matching cost of the candidate disparities corresponding to each ToF pixel based on the value of each ToF pixel in the ToF image and the predicted fourth reliability of that pixel. It may then determine the disparity value corresponding to each ToF pixel based on the matching cost and estimate the first depth map from the determined disparity values. Performing the second stereo matching of the left and right images based on the ToF pixel values and the predicted fourth reliability is the same as performing the first stereo matching based on the ToF pixel values and the first reliability, described above, except that the fourth reliability predicted by the Transformer is used in place of the first reliability. A detailed description is therefore omitted here; see the description above.

In the ToF-guided stereo matching described above, the matching cost is calculated based on the ToF pixel values and the reliability of the ToF pixels, which can improve the accuracy of depth estimation and yield a more accurate depth map.

Optionally, to obtain a denser and more accurate depth map, the depth estimation method may further include: projecting the ToF image onto the left and right images and, for regions of ToF projection points that satisfy a preset density, performing interpolation guided by the image features of the left and right images to estimate a second depth map; performing interpolation guided by the image features of the left and right images on the first depth map to estimate a third depth map; and estimating a fourth depth map of the scene based on the second depth map and the third depth map. Here, "performing interpolation guided by the image features of the left and right images" means that the image features of the left and right images are taken into account when interpolating.

Interpolation guided by the image features of the left and right images (also called "RGB-guided interpolation" in this disclosure) refers to using the image features of the left and right images for interpolation. According to an embodiment, in this interpolation, the depth value of a point to be interpolated may be determined based on the spatial distance and the image feature difference between that point and its adjacent reference points. For example, the image feature difference may be, but is not limited to, a color feature difference (also called "color distance"). Because RGB-guided interpolation takes the image features of the left and right images into account, it can improve interpolation accuracy.

In an embodiment of this disclosure, the depth estimation method determines the points to be interpolated and the adjacent reference points according to the object to be interpolated, so these points differ from object to object. For example, if the object to be interpolated is a region of ToF projection points, the reference points adjacent to a point to be interpolated may be ToF projection points. If the object to be interpolated is a depth map, the adjacent reference points may be points that have depth values. For example, a point to be interpolated may be any point in a region where ToF projection points are dense, other than the existing ToF projection points, and its adjacent reference points may be the ToF projection points at the four corners of the rectangular grid cell that contains it.

Because calculating spatial distances (e.g., Euclidean distances) is computationally expensive, to reduce the amount of computation this disclosure may first project the ToF image onto the left and right images and then perform RGB-guided interpolation on a regular grid for regions of ToF projection points that satisfy a predetermined density.

FIG. 6 is a diagram illustrating the projection of ToF points onto an RGB image according to an embodiment.

Referring to FIG. 6, when the ToF pixels of the ToF image are projected onto the left and right images, the initial result is an irregular grid of points. That is, ToF points on the same line may still lie on the same projection scan line after projection, but the distance between adjacent ToF points may no longer be regular. This disclosure therefore first performs interpolation guided by the image features of the left and right images between adjacent ToF projection points that are within a certain distance of each other on each projection scan line. For example, point F is interpolated between points C and D. Regions with large gaps after projection are generally highly discontinuous and may contain serious errors in areas not visible to the ToF camera. Interpolation may therefore be performed only in regions of ToF projection points that satisfy a predetermined density: if the distance between two adjacent ToF projection points on a scan line is greater than a predetermined interval, no interpolation is performed between them. The predetermined interval may be related to the spacing between two ToF pixels in the ToF image. For example, it may be a predetermined multiple (e.g., 3 times) of that spacing.
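The scan-line step above can be sketched as follows: positions between two adjacent ToF projection points are filled only when the gap does not exceed the predetermined interval. The weighting is an assumption made for this example (inverse spatial distance multiplied by an exponential color-distance penalty); the text only requires that spatial distance and image feature difference both be taken into account. The names `interpolate_scanline`, `max_gap`, and `a` are hypothetical, and x coordinates are taken as integers for simplicity.

```python
import math


def interpolate_scanline(points, colors, max_gap, a=1.0):
    """Fill depths between adjacent ToF projection points on one scan line.

    points: list of (x, depth) sorted by integer x.
    colors: per-pixel color value of the guiding image row, indexed by x.
    max_gap: predetermined interval; wider gaps are left unfilled.
    Returns a dict x -> depth containing original and interpolated values.
    """
    out = {x: z for x, z in points}
    for (x0, z0), (x1, z1) in zip(points, points[1:]):
        if x1 - x0 > max_gap:
            continue  # sparse region: likely discontinuous, do not interpolate
        for x in range(x0 + 1, x1):
            # Assumed weights: closer in space and closer in color => larger.
            w0 = math.exp(-a * abs(colors[x] - colors[x0])) / (x - x0)
            w1 = math.exp(-a * abs(colors[x] - colors[x1])) / (x1 - x)
            out[x] = (w0 * z0 + w1 * z1) / (w0 + w1)
    return out
```

With uniform color the result reduces to inverse-distance interpolation, while a color edge pulls the interpolated depth toward the endpoint on the same side of the edge.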

Second, the depth estimation method samples the interpolated ToF projection points to construct a regular grid of ToF projection points. For example, a schematic of the grid points after downsampling is shown in FIG. 7. Finally, interpolation guided by the image features of the left and right images is performed on each cell of the constructed grid.

FIG. 7 is a diagram illustrating an example of grid points after downsampling according to an embodiment.

Referring to FIG. 7, given four grid points A, B, C, and D, the depth estimation method may infer the depth value of a point F to be interpolated inside the rectangle they enclose. For example, given that the width of a grid cell, that is, the length of AB, is N, the depth value d_F of point F may be calculated, for example, as in Equation 2 below.

[Equation 2]

Here, the spatial distance terms are the distances between point A and point F along the x-axis and the y-axis, respectively, the color terms are the color values at point A and point F, respectively, and a is a constant greater than 0.

In one embodiment, to speed up processing, the coefficients in the above formula that relate to the spatial distance and the color distance may be calculated in advance and stored in a table, and their values may then be determined by table lookup.

An example of calculating the depth value of point F is given above, but the calculation is not limited to that example. Any method may be used that determines the depth value of point F based on the spatial distance and the image feature difference between point F and the adjacent reference points A, B, C, and D.
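As one possible instance of such a method, the sketch below interpolates the depth at F from the four corners of a grid cell, using precomputed spatial weights looked up from a table. Equation 2 is not reproduced in this text, so the weight used here (a Gaussian in spatial distance multiplied by an exponential in color distance, in the style of a joint bilateral filter) is an assumption and not the formula of the disclosure. The corner ordering, the parameter `sigma`, and the function names are also hypothetical.

```python
import math


def build_spatial_table(n, sigma):
    """Precompute the spatial weight for every integer offset (dx, dy), 0..n."""
    return [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
             for dy in range(n + 1)]
            for dx in range(n + 1)]


def interpolate_in_cell(fx, fy, n, corner_depths, corner_colors, color_f,
                        table, a=1.0):
    """Depth at F = (fx, fy), in integer cell coordinates, from four corners.

    Corners are assumed to be ordered A=(0,0), B=(n,0), C=(0,n), D=(n,n).
    Each corner's weight is the looked-up spatial weight times an assumed
    color-distance penalty exp(-a * |color_F - color_corner|).
    """
    corners = [(0, 0), (n, 0), (0, n), (n, n)]
    num = 0.0
    den = 0.0
    for (cx, cy), z, col in zip(corners, corner_depths, corner_colors):
        w = table[abs(fx - cx)][abs(fy - cy)] * math.exp(-a * abs(color_f - col))
        num += w * z
        den += w
    return num / den
```

Because the spatial offsets inside a cell of width N take only a small set of integer values, the table has (N+1)×(N+1) entries and is built once per grid width. The color term could be tabulated in the same way for quantized color differences.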

FIG. 8 is a diagram illustrating an example of interpolation guided by image features of the left and right images on a regular grid, according to an embodiment.

Referring to FIG. 8, (a) of FIG. 8 is an RGB image, and (b) of FIG. 8 shows interpolation results at different pixel intervals (grid widths). From left to right, the panels show the sparse depth map (that is, the depth map before interpolation), the predicted depth map (that is, the depth map after interpolation), the actual depth map, and the difference between the actual depth map and the predicted depth map. FIG. 8 shows that the narrower the grid width, the smaller the difference between the predicted depth map and the actual depth map.

FIG. 9 illustrates an example of interpolation guided by image features of the left and right images after irregular projection, according to an embodiment.

Referring to FIG. 9, (a) of FIG. 9 shows a left RGB image and a ToF image. Here, projecting the ToF image onto the left RGB image is taken as an example. (b) of FIG. 9 shows, in order, the result of interpolation along sparse scan lines, the result of RGB-guided interpolation on a regular grid to fill the region of dense ToF projection points, the predicted depth map, and the actual depth map. Here, the predicted depth map is the second depth map mentioned above.

In one embodiment, the operation described above of "projecting the ToF image onto the left image and the right image and, for ToF projection point regions that satisfy a preset density, performing interpolation based on the image features of the left image and the right image to estimate a second depth map" may be performed before or after the operation of "performing the first stereo matching of the left image and the right image without using the ToF image", or before or after the operation of "performing the first stereo matching of the left image and the right image based on the first reliability and the ToF image". The present disclosure is not limited in this respect. In addition, when the second depth map is estimated, the depth estimation method may also estimate a third depth map by performing, on the first depth map, interpolation guided by the image features of the left and right images. Finally, a fourth depth map of the scene may be further estimated based on the second depth map and the third depth map. This additional fusion of depth maps yields a more accurate depth map.

For example, the depth estimation method may obtain the fourth depth map by taking a weighted average of the depth values of the second depth map and the third depth map in the regions around reliable ToF pixels. The weights used in the weighted average depend on the fourth reliability of the ToF pixels. If a depth value to be weighted has a corresponding ToF pixel for which the fourth reliability was previously calculated, the weight of that depth value may be determined based on the fourth reliability of that ToF pixel. If it has no such corresponding ToF pixel, the depth estimation method may determine the weight of the depth value based on the fourth reliability of the nearest ToF pixel.
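The weighted fusion may be sketched as follows. Several points are assumptions made only for illustration: the fourth reliability is taken to lie in the range 0 to 1, it is used directly as the weight of the second (ToF-derived) depth map with the remainder given to the third (stereo-derived) depth map, and the fusion is applied at every pixel rather than only around reliable ToF pixels. The names `reliability_at` and `fuse_depth_maps` are hypothetical.

```python
def reliability_at(reliability, x, y):
    """Return the fourth reliability for pixel (x, y).

    reliability: {(x, y): value} for ToF pixels that have one.
    Falls back to the nearest ToF pixel when (x, y) has no entry.
    """
    if (x, y) in reliability:
        return reliability[(x, y)]
    nearest = min(reliability, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
    return reliability[nearest]


def fuse_depth_maps(second, third, reliability):
    """Per-pixel weighted average of two depth maps (assumed weighting)."""
    fused = []
    for y, (row2, row3) in enumerate(zip(second, third)):
        fused_row = []
        for x, (d2, d3) in enumerate(zip(row2, row3)):
            r = reliability_at(reliability, x, y)
            fused_row.append(r * d2 + (1.0 - r) * d3)
        fused.append(fused_row)
    return fused


second = [[1.0, 1.0, 1.0, 1.0]]          # ToF-guided interpolation result
third = [[3.0, 3.0, 3.0, 3.0]]           # stereo-based result
reliability = {(0, 0): 1.0, (3, 0): 0.5}  # only two ToF pixels carry a value
fourth = fuse_depth_maps(second, third, reliability)
```

Pixels at x = 1 and x = 2 have no ToF pixel of their own, so they inherit the reliability of the nearest ToF pixel, as described in the paragraph above.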

The depth estimation method described above, which fuses the ToF image with stereo matching, effectively combines the advantages of both and can therefore estimate an accurate depth map.

However, in some cases an occluded region of the scene cannot be seen by both the ToF camera and the RGB camera, so the depth information of the occluded region cannot be estimated accurately. This problem of inaccurate depth estimation caused by occlusion can arise not only when the depth map is estimated from the left image, the right image, and the ToF image, but also when depth information is estimated from the left and right images without using a ToF image. For example, a method of estimating the depth information of a scene from the left and right images without using a ToF image may include obtaining a fifth depth map of the scene through stereo matching of the left and right images. However, the depth values of occluded regions in the estimated depth map may not be sufficiently accurate. The present disclosure therefore additionally proposes updating the depth values corresponding to such occluded regions to obtain a more accurate depth map. Accordingly, according to an embodiment, the depth estimation method may further include updating the depth values of unreliable depth points in the estimated depth map, where the estimated depth map includes the first depth map, the fourth depth map, or the fifth depth map.

Specifically, in the update operation, the depth estimation method may first determine the reliable depth value points and the unreliable depth value points of the estimated depth map. Second, the depth estimation method may predict the depth values of the unreliable depth value points using a second neural network, based on the features of the reliable depth value points and the unreliable depth value points. Finally, the depth estimation method may estimate an updated depth map by performing, on the regions around the unreliable depth value points, interpolation guided by the image features of the left image and the right image. This update method can effectively use global semantic information for depth inference and can therefore estimate a more accurate depth map. According to an embodiment, the second neural network may be, but is not limited to, a transformer.

FIG. 10 is a diagram illustrating a process of updating unreliable depth value points using a transformer, according to an embodiment.

Referring to FIG. 10, to improve the update speed, the present disclosure may construct a regular grid of depth value points based on the estimated depth map and determine the reliable depth value points and the unreliable depth value points on that regular grid. According to an embodiment, when the estimated depth map is the first depth map mentioned above, the depth estimation method may first perform RGB-guided interpolation on the first depth map. The depth estimation method may then sample depth value points from the first depth map to construct a regular grid, and determine the reliable depth value points and the unreliable depth value points on the grid.

When the estimated depth map is the fourth depth map mentioned above, the depth estimation method may sample the fourth depth map directly to construct a regular grid and determine the reliable depth value points and the unreliable depth value points on the grid. The method of constructing a regular grid has been described above and is not repeated here.

After constructing the regular grid, the depth estimation method may determine the reliable depth value points (P1 to PN in FIG. 10) and the unreliable depth value points (Q1 to QM in FIG. 10) on the regular grid. In general, for an unreliable depth value point, either the distance between the neighboring ToF projection points corresponding to it is relatively large, or the difference between the corresponding features of the left and right images is large. Unreliable depth value points can therefore be identified on this basis, and all remaining depth value points are reliable depth value points.
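The two criteria above may be sketched as a simple threshold test. The dictionary keys `tof_gap` and `lr_feature_diff`, the threshold parameters, and the function name `split_grid_points` are hypothetical; the disclosure does not specify how the distance or feature difference is measured or what threshold values are used.

```python
def split_grid_points(points, max_gap, max_feature_diff):
    """Partition grid points into reliable (P) and unreliable (Q) sets.

    A point is treated as unreliable when its neighboring ToF projection
    points are far apart OR the left/right image features disagree.
    """
    reliable, unreliable = [], []
    for p in points:
        if p["tof_gap"] > max_gap or p["lr_feature_diff"] > max_feature_diff:
            unreliable.append(p)
        else:
            reliable.append(p)
    return reliable, unreliable


points = [
    {"id": "g0", "tof_gap": 1.0, "lr_feature_diff": 0.05},  # dense ToF, consistent views
    {"id": "g1", "tof_gap": 9.0, "lr_feature_diff": 0.05},  # sparse ToF coverage
    {"id": "g2", "tof_gap": 1.0, "lr_feature_diff": 0.80},  # views disagree (e.g. occlusion)
]
P, Q = split_grid_points(points, max_gap=4.0, max_feature_diff=0.3)
```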

After determining the reliable and unreliable depth value points on the regular grid, the depth estimation method concatenates (1010) the features of the reliable depth value points on the regular grid (for example, depth features or the image features corresponding to the depth value points) with their position information, applies a linear embedding (1020), and finally inputs the result to a transformer encoder (1030) for encoding.

In addition, the depth estimation method concatenates (1040) the features and position information of the unreliable depth value points on the regular grid, applies a linear embedding (1050), and inputs the result, together with the output of the transformer encoder (1030), to a transformer decoder (1060) to predict, or decode, the depth values of the unreliable depth value points (D1 to DM in FIG. 10).
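The data flow of FIG. 10 may be illustrated with a heavily simplified sketch. It is not the encoder and decoder of the disclosure: it performs a single scaled dot-product cross-attention step (the core operation inside a transformer decoder) with no trained parameters, in which each unreliable point queries the reliable points and receives an attention-weighted mix of their depths. The scalar feature per point, the identity embedding matrix, and the function names are assumptions made for illustration.

```python
import math


def embed(token, weights):
    """Linear embedding: multiply a token vector by a weight matrix."""
    return [sum(w * t for w, t in zip(row, token)) for row in weights]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_depth(reliable, unreliable, weights):
    """reliable: (feature, x, y, depth) tuples; unreliable: (feature, x, y)."""
    # Concatenate feature and position, then embed (cf. 1010/1020 and 1040/1050).
    keys = [embed([f, x, y], weights) for f, x, y, _ in reliable]
    depths = [d for _, _, _, d in reliable]
    scale = math.sqrt(len(weights))
    predicted = []
    for f, x, y in unreliable:
        query = embed([f, x, y], weights)
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        attention = softmax(scores)
        predicted.append(sum(a * d for a, d in zip(attention, depths)))
    return predicted


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reliable = [(0.2, 0.0, 0.0, 1.0), (0.8, 1.0, 0.0, 3.0)]
unreliable = [(0.5, 0.5, 0.0)]
predicted = cross_attention_depth(reliable, unreliable, identity)
```

Because the attention weights are non-negative and sum to one, each predicted depth is a convex combination of the reliable depths. A trained transformer would instead learn the embedding and projection matrices and decode the depth from the attended features.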

Finally, the depth estimation method may obtain an updated depth map by performing, on the regions around the unreliable depth value points, interpolation guided by the image features of the left and right images. The interpolation guided by the image features of the left image and the right image has been described above and is not repeated here. Because the depth values are updated on regular grid points, the processing speed can be effectively improved.

The depth estimation method according to an embodiment of the present disclosure has been described above with reference to FIGS. 1 to 10. Below, for ease of understanding, an example of the depth estimation method is briefly described with reference to FIG. 11.

FIG. 11 is a diagram illustrating an example of a depth estimation process according to an embodiment.

Referring to FIG. 11, after acquiring (1110) one ToF image and two RGB images (left and right RGB images) at the current time t, the depth estimation method may quickly determine (1120) the first reliability of each ToF pixel using the bidirectional-propagation-based ToF reliability inference method mentioned above.

The depth estimation method may then check (1130) whether the number of ToF pixels whose first reliability satisfies a preset requirement (the number of reliable points) is greater than s0.

If the check in step 1130 shows that the number of ToF pixels whose first reliability satisfies the preset requirement (the number of reliable points) is greater than s0, the depth estimation method may estimate the depth map of the scene by fusing the ToF image with stereo matching (1150). For example, in this fusion approach, the depth estimation method may first project the ToF image onto the left RGB image and the right RGB image, respectively, and perform RGB-guided interpolation in the regions where ToF projection points are dense (1152). Next, the depth estimation method may perform stereo matching of the left and right RGB images based on the first reliability and the ToF image (1154). Then, based on the ToF image and the stereo matching result, it may predict the fourth reliability of each ToF pixel using a transformer (1156). Finally, the depth estimation method may perform stereo matching of the left and right RGB images based on the ToF image and the predicted fourth reliability (1158).

If the check in step 1130 shows that the number of reliable points is too small (s0 or less), the depth estimation method performs stereo matching directly (1142).

The depth estimation method may then check (1144) whether the number of reliable points, while not greater than s0, is greater than s1.

If the check in step 1144 shows that the number of reliable points is not greater than s0 but is greater than s1, the depth estimation method may estimate the depth map of the scene by fusing the ToF image with stereo matching. In this fusion approach, after stereo matching of the left and right images has been performed without using the ToF image, the ToF image may be projected onto the RGB images, and RGB-guided interpolation may be performed in the regions where ToF projection points are dense (1146). Next, the depth estimation method may predict the fourth reliability of each ToF pixel using a transformer, based on the ToF image and the stereo matching result (1156). Finally, the depth estimation method may perform stereo matching of the left and right images based on the ToF image and the predicted fourth reliability (1158). Here, the interpolation operation and the stereo matching are relatively independent. In either of the two processing paths, interpolation may be performed before stereo matching, or stereo matching may be performed before interpolation.
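The branching of FIG. 11 up to this point may be summarized in a short sketch that returns the reference numerals in the order the description mentions them. The function name `plan_steps` is hypothetical, and the sketch assumes s1 is smaller than s0, which the description implies but does not state.

```python
def plan_steps(num_reliable, s0, s1):
    """Return the FIG. 11 reference numerals visited for a given count
    of reliable ToF points (assumes s1 < s0)."""
    steps = [1110, 1120, 1130]            # acquire, first reliability, check s0
    if num_reliable > s0:
        # Enough reliable points: ToF-guided fusion from the start.
        steps += [1150, 1152, 1154, 1156, 1158]
    else:
        steps += [1142, 1144]             # stereo matching without ToF, check s1
        if num_reliable > s1:
            # Some reliable points: add ToF projection and interpolation afterwards.
            steps += [1146, 1156, 1158]
    return steps


many = plan_steps(num_reliable=100, s0=50, s1=10)
some = plan_steps(num_reliable=30, s0=50, s1=10)
few = plan_steps(num_reliable=5, s0=50, s1=10)
```

With very few reliable points, the flow ends after plain stereo matching, so an unreliable ToF image never influences the result.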

The depth map obtained in the above process may be further updated at its unreliable depth value points (1160) to obtain an updated depth map. Specifically, this update may be performed on grid points. For example, as described with reference to FIG. 10, the depth estimation method may first determine the unreliable depth value points (unreliable grid points) on a regular grid (1162).

The depth estimation method may then infer the depth values of the unreliable grid points using a transformer (1164). Finally, the depth estimation method may estimate the updated depth map by performing RGB-guided interpolation on the regions around the unreliable grid points (1166).

The depth estimation method according to an embodiment of the present disclosure can be applied to augmented reality. For example, it can be applied to rendering virtual objects in augmented reality, to interaction between virtual objects and real space, or to image projection correction in video see-through (VST).

For example, the depth estimation method according to an embodiment of the present disclosure can be applied to head-worn smart devices such as smart glasses. The smart glasses may include a ToF image sensor and a pair of color image sensors. After a ToF image of the scene is acquired through the ToF image sensor and left and right images of the scene are acquired through the color image sensors, the depth map of the scene can be estimated using the depth estimation method according to an embodiment of the present disclosure. Using the estimated depth map, the smart glasses can determine the depth value of each object in the scene and, according to the determined depth values, display a virtual object on or around an object with which interaction is expected; the wearer can then further interact with the displayed virtual object, for example to play a game. By combining virtual objects with objects in real space, the wearer can have an immersive experience.

The depth estimation method according to an embodiment of the present disclosure has been described above with reference to FIGS. 1 to 11. With the depth estimation method described above, the depth information of a scene can be estimated more accurately and efficiently.

FIG. 12 is a block diagram illustrating a depth estimation device according to an embodiment.

Referring to FIG. 12, the depth estimation device 1200 may include an image acquisition unit 1210, a reliability determination unit 1220, and a depth estimation unit 1230. Specifically, the image acquisition unit 1210 is configured to obtain a time-of-flight (ToF) image, a left image, and a right image of a scene. The reliability determination unit 1220 is configured to determine the first reliability of each ToF pixel in the ToF image. The depth estimation unit 1230 is configured to estimate the depth map of the scene based on the left image, the right image, and the ToF image according to the first reliability. Optionally, the depth estimation device 1200 may further include an update unit (not shown) for updating the depth values of unreliable depth value points in the estimated depth map.

The depth estimation method shown in FIG. 1 may be performed by the depth estimation device 1200 shown in FIG. 12, with the image acquisition unit 1210, the reliability determination unit 1220, and the depth estimation unit 1230 performing steps 110, 120, and 130, respectively. For all details of the operations performed by each component in FIG. 12, reference may therefore be made to the corresponding descriptions of FIGS. 1 to 11, which are not repeated here.
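The division of the device into three units may be sketched as a small composition of callables. The class name `DepthEstimationDevice`, its field names, and the stub functions are hypothetical and only show how the outputs of units 1210, 1220, and 1230 feed one another; the stubs stand in for the actual processing described with reference to FIGS. 1 to 11.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class DepthEstimationDevice:
    acquire_images: Callable[[], Tuple[Any, Any, Any]]      # unit 1210 / step 110
    determine_reliability: Callable[[Any], Any]             # unit 1220 / step 120
    estimate_depth: Callable[[Any, Any, Any, Any], Any]     # unit 1230 / step 130

    def run(self):
        tof, left, right = self.acquire_images()
        reliability = self.determine_reliability(tof)
        return self.estimate_depth(tof, left, right, reliability)


# Stub units that only record what they receive.
device = DepthEstimationDevice(
    acquire_images=lambda: ("tof", "left", "right"),
    determine_reliability=lambda tof: f"reliability({tof})",
    estimate_depth=lambda tof, left, right, rel: (tof, left, right, rel),
)
result = device.run()
```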

In addition, although the depth estimation device 1200 has been described as divided into the image acquisition unit 1210, the reliability determination unit 1220, and the depth estimation unit 1230, each performing its own processing, it is apparent to those skilled in the art that the processing performed by these units may also be performed in the depth estimation device 1200 without any such division or without clear boundaries between the components. The depth estimation device 1200 may also include other components, such as an image preprocessing unit and a memory.

FIG. 13 is a block diagram of an electronic device according to an embodiment.

Referring to FIG. 13, the electronic device 1300 includes at least one memory 1310 and at least one processor 1320. The at least one memory 1310 stores computer-executable instructions which, when executed by the at least one processor 1320, cause the at least one processor 1320 to perform the depth estimation method according to an embodiment of the present disclosure.

At least one of the plurality of components described above may be implemented through an AI model. Functions related to AI may be performed by a non-volatile memory, a volatile memory, and a processor.

The processor may include one or more processors. The one or more processors may be a general-purpose processor (for example, a central processing unit (CPU) or an application processor (AP)), a graphics-only processing unit (for example, a graphics processing unit (GPU) or a vision processing unit (VPU)), and/or an AI-dedicated processor (for example, a neural processing unit (NPU)).

The one or more processors control the processing of input data according to predefined operating rules or an artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rules or the AI model are provided through training or learning. Here, being provided through learning means that a learning algorithm is applied to a plurality of training data to obtain predefined operating rules or an AI model with the desired characteristics. This learning may be performed in the device itself in which the AI according to an embodiment runs, and/or may be implemented by a separate server, device, or system.

A learning algorithm is a method of training a given target device (for example, a robot) using a plurality of training data so as to cause, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

According to the present invention, in an image processing method executed in an electronic device, an output image in which a target region has been processed can be obtained by using an input image as input data for an AI model.

An AI model can be obtained through training. Here, "obtained through training" means that a basic AI model is trained with a plurality of training data by a training algorithm to obtain an AI model configured to perform predefined operating rules or the required features (or purposes).

For example, an AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and the computation of a layer is performed using the computation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q-network.

For example, the electronic device may be a personal computer (PC), a tablet device, a PDA, a smartphone, or any other device capable of executing the instruction set mentioned above. The electronic device need not be a single electronic device; it may also be any collection of devices or circuits capable of executing the instructions (or instruction set) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).

In the electronic device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.

The processor can execute instructions or code stored in the memory, and the memory can also store data. Instructions and data may also be transmitted and received over a network via a network interface, which may use any known transmission protocol.

The memory may be integrated with the processor, for example by arranging RAM or flash memory within an integrated circuit microprocessor or the like. The memory may also include a standalone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled, or may communicate with each other through, for example, an I/O port or a network connection, so that the processor can read files stored in the memory.

The electronic device may also include a video display (for example, a liquid crystal display) and a user interaction interface (for example, a keyboard, a mouse, or a touch input device). All components of the electronic device may be connected to one another via a bus and/or a network.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be stored in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may also be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

A depth estimation method, comprising:
receiving a time-of-flight (TOF) image, a left image, and a right image of a scene;
determining a first reliability of each TOF pixel of the TOF image; and
estimating a depth map of the scene based on the TOF image, the left image, and the right image according to the first reliability.

The depth estimation method of claim 1, wherein the estimating of the depth map of the scene based on the TOF image, the left image, and the right image according to the first reliability comprises:
determining a quantity of TOF pixels whose first reliability satisfies a predetermined requirement;
when the quantity satisfies a first threshold requirement, estimating the depth map of the scene based on the TOF image, the left image, and the right image; and
when the quantity does not satisfy the first threshold requirement, estimating the depth map of the scene based on the left image and the right image without using the TOF image.
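
The branching recited in this claim can be illustrated with a minimal Python sketch. The function name `select_depth_mode`, the per-pixel cutoff `pixel_threshold`, and the count cutoff `min_reliable_count` are hypothetical; the claims do not fix how the predetermined requirement or the first threshold requirement is expressed, so a simple "at least N pixels at or above a cutoff" rule is assumed here.

```python
import numpy as np

def select_depth_mode(reliability, pixel_threshold=0.5, min_reliable_count=100):
    """Choose between TOF-assisted and stereo-only depth estimation.

    reliability: array of per-TOF-pixel first reliability values.
    pixel_threshold: assumed form of the "predetermined requirement".
    min_reliable_count: assumed form of the "first threshold requirement".
    """
    reliability = np.asarray(reliability, dtype=np.float64)
    # Count the TOF pixels whose reliability meets the per-pixel requirement.
    reliable_count = int(np.count_nonzero(reliability >= pixel_threshold))
    # Use the TOF image only when enough reliable pixels are available.
    mode = "tof+stereo" if reliable_count >= min_reliable_count else "stereo-only"
    return mode, reliable_count
```

For example, a TOF frame captured under strong sunlight might yield few reliable pixels, in which case the sketch falls back to the stereo-only path.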

The depth estimation method of claim 1, wherein the determining of the first reliability of each TOF pixel of the TOF image comprises:
projecting each TOF pixel of the TOF image onto the left image and the right image, respectively;
for each TOF projection point obtained by the projection, calculating, along a first direction on the scan line of the TOF projection point, a second reliability of the TOF pixel corresponding to the TOF projection point; and
calculating, along a second direction opposite to the first direction and based on the reliability calculation result in the first direction, the first reliability of the TOF pixel corresponding to each TOF projection point.
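
The two-pass structure of this claim (a pass along one direction of the scan line, then a pass in the opposite direction that reuses the first result) can be sketched as follows. The aggregation rule is an assumption made only for illustration: the local score `exp(-|fL - fR| / sigma)`, the recursive blending with the previous neighbor, and the parameters `sigma` and `neighbor_weight` are not taken from the claims.

```python
import numpy as np

def scanline_reliability(feat_left, feat_right, sigma=1.0, neighbor_weight=0.5):
    """Two-pass reliability along one scan line (illustrative rule only).

    feat_left, feat_right: image features sampled at the TOF projection
    points of one scan line in the left and right images (non-empty, same length).
    Returns (second_reliability, first_reliability).
    """
    feat_left = np.asarray(feat_left, dtype=np.float64)
    feat_right = np.asarray(feat_right, dtype=np.float64)
    # Local score: high when the left and right features agree (assumed form).
    local = np.exp(-np.abs(feat_left - feat_right) / sigma)
    n = local.size
    w = neighbor_weight

    # Pass 1 (first direction): blend each point with its previous neighbor.
    second = np.empty(n)
    second[0] = local[0]
    for i in range(1, n):
        second[i] = (local[i] + w * second[i - 1]) / (1.0 + w)

    # Pass 2 (opposite direction): refine using the pass-1 result.
    first = np.empty(n)
    first[-1] = second[-1]
    for i in range(n - 2, -1, -1):
        first[i] = (second[i] + w * first[i + 1]) / (1.0 + w)
    return second, first
```

Running both directions lets every projection point draw on neighbors from both sides of the scan line while keeping the cost linear in the number of points.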

The depth estimation method of claim 3, wherein the calculating of the second reliability and the calculating of the first reliability comprise:
calculating the second reliability and the first reliability of the TOF pixel corresponding to each TOF projection point based on a difference in image features of the TOF projection point between the left image and the right image, and on a third reliability of a TOF pixel corresponding to a similar TOF projection point whose distance from the TOF projection point on the corresponding scan line and whose image features are within a preset range.

The depth estimation method of claim 2, wherein the estimating of the depth map of the scene based on the TOF image, the left image, and the right image comprises:
performing first stereo matching of the left image and the right image;
predicting, using a first neural network, a fourth reliability of each TOF pixel based on the TOF image and a result of the first stereo matching; and
estimating a first depth map of the scene by performing second stereo matching of the left image and the right image based on the TOF image and the fourth reliability.

The depth estimation method of claim 5, wherein the performing of the first stereo matching of the left image and the right image comprises:
when the quantity satisfies the first threshold requirement and a second threshold requirement, performing the first stereo matching of the left image and the right image based on the first reliability and the TOF image; or
when the quantity satisfies the first threshold requirement and does not satisfy the second threshold requirement, performing the first stereo matching of the left image and the right image without using the TOF image.

The depth estimation method of claim 5, wherein the predicting of the fourth reliability of each TOF pixel based on the TOF image and the result of the first stereo matching using the first neural network comprises:
predicting the fourth reliability of each TOF pixel by using at least one of first information, second information, and third information as an input to the first neural network,
wherein the first information is a difference between a disparity value corresponding to each TOF pixel determined from the TOF image and a disparity value of the TOF pixel determined through the first stereo matching,
the second information is a difference in image features between the projection points of each TOF pixel in the left image and the right image, and
the third information is a difference in depth value between the projection point of each TOF pixel in the left image and the right image and at least one TOF projection point having similar image features in the area of the projection point.
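
As a hedged illustration of how the first and second information might be assembled as network inputs, the sketch below converts TOF depth to disparity with the standard rectified-stereo relation d = f·B / Z and stacks the two differences per pixel. The function name `tof_confidence_features`, the argument names, and the assumptions that the stereo pair is rectified and that TOF depth is already expressed in the left-camera frame are all illustrative; the third information and the network itself are omitted.

```python
import numpy as np

def tof_confidence_features(tof_depth, stereo_disp, left_feat, right_feat,
                            focal_px, baseline_m):
    """Build per-TOF-pixel input features (first and second information).

    tof_depth: depth measured by TOF at each TOF pixel, in meters (> 0).
    stereo_disp: disparity from the first stereo matching at the same pixels.
    left_feat, right_feat: image features at the left/right projection points.
    focal_px, baseline_m: assumed rectified-stereo calibration values.
    """
    tof_depth = np.asarray(tof_depth, dtype=np.float64)
    # Disparity implied by the TOF depth under rectified geometry: d = f * B / Z.
    tof_disp = focal_px * baseline_m / tof_depth
    # First information: disagreement between TOF and stereo disparity.
    disparity_gap = np.abs(tof_disp - np.asarray(stereo_disp, dtype=np.float64))
    # Second information: left/right feature difference at the projections.
    feature_gap = np.abs(np.asarray(left_feat, dtype=np.float64)
                         - np.asarray(right_feat, dtype=np.float64))
    return np.stack([disparity_gap, feature_gap], axis=-1)
```

A large value in either channel suggests that the TOF measurement at that pixel is inconsistent with the stereo evidence.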

The depth estimation method of claim 5, wherein the estimating of the first depth map of the scene by performing the second stereo matching of the left image and the right image based on the TOF image and the fourth reliability comprises:
calculating, during the second stereo matching, a matching cost of each candidate disparity corresponding to each TOF pixel based on the value of the TOF pixel in the TOF image and the predicted fourth reliability of the TOF pixel; and
determining a disparity value corresponding to each TOF pixel based on the matching cost, and estimating the first depth map using the determined disparity value.
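
One possible way to let TOF values and their predicted reliability influence the matching cost is sketched below. The additive penalty `weight * confidence * |d - d_tof|`, the winner-take-all disparity selection, and all function and parameter names are assumptions for illustration; the claims state only that the cost depends on the TOF pixel value and the fourth reliability.

```python
import numpy as np

def fuse_tof_into_cost(cost_volume, tof_disparity, tof_confidence, weight=1.0):
    """Add a confidence-weighted TOF prior to a stereo cost volume.

    cost_volume: (D, H, W) matching costs for candidate disparities 0..D-1.
    tof_disparity: (H, W) disparity implied by TOF, NaN where no TOF sample.
    tof_confidence: (H, W) predicted reliability in [0, 1].
    """
    num_disp = cost_volume.shape[0]
    candidates = np.arange(num_disp, dtype=np.float64)[:, None, None]
    valid = ~np.isnan(tof_disparity)
    # Penalize candidates far from the TOF disparity; no penalty without TOF.
    prior = np.abs(candidates - np.where(valid, tof_disparity, 0.0))
    conf = np.where(valid, tof_confidence, 0.0)
    return cost_volume + weight * conf * prior

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert disparity to depth (Z = f * B / d); zero disparity maps to inf."""
    disparity = np.asarray(disparity, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

# Usage: pick the lowest-cost candidate per pixel, then convert to depth.
costs = np.array([[[1.0, 0.0]], [[1.0, 5.0]], [[1.0, 5.0]], [[1.0, 5.0]]])
tof_d = np.array([[2.0, 3.0]])
tof_c = np.array([[1.0, 0.0]])  # second pixel: TOF present but untrusted
best = np.argmin(fuse_tof_into_cost(costs, tof_d, tof_c), axis=0)
depth = disparity_to_depth(best, focal_px=100.0, baseline_m=0.1)
```

In the usage lines, the first pixel has an ambiguous stereo cost and follows the trusted TOF value, while the second pixel ignores its untrusted TOF value and keeps the stereo minimum.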

The depth estimation method of claim 5, wherein the estimating of the depth map of the scene based on the TOF image, the left image, and the right image further comprises:
estimating a second depth map by projecting the TOF image onto the left image and the right image and performing interpolation, based on image features of the left image and the right image, on a TOF projection point area that satisfies a preset density;
estimating a third depth map by performing interpolation on the first depth map based on image features of the left image and the right image; and
estimating a fourth depth map of the scene based on the second depth map and the third depth map.

The depth estimation method of claim 9, wherein the estimating of the second depth map by performing interpolation, based on image features of the left image and the right image, on the TOF projection point area that satisfies the preset density comprises:
performing interpolation, based on image features of the left image and the right image, between adjacent TOF projection points that are spaced apart within a preset distance on the scan line of each TOF projection point;
constructing a regular grid of TOF projection points by sampling the interpolated TOF projection points; and
estimating the second depth map by performing interpolation, based on image features of the left image and the right image, for each TOF projection point on the constructed grid.

The depth estimation method of claim 9, wherein the performing of interpolation based on image features of the left image and the right image comprises:
determining a depth value of a point to be interpolated based on the spatial distance between the point to be interpolated and adjacent reference points and on the difference in image features between them.
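
The interpolation described in this claim resembles a joint-bilateral weighting, where a reference point contributes more when it is both spatially close and similar in appearance. The sketch below assumes Gaussian weights and scalar image features; the function name `guided_interpolate` and the parameters `sigma_s` and `sigma_f` are hypothetical and not taken from the claims.

```python
import numpy as np

def guided_interpolate(target_xy, target_feat, ref_xy, ref_feat, ref_depth,
                       sigma_s=5.0, sigma_f=0.1):
    """Interpolate depth at one point from nearby reference points.

    target_xy: (2,) pixel position of the point to be interpolated.
    target_feat: scalar image feature (e.g., intensity) at that point.
    ref_xy: (N, 2) positions, ref_feat: (N,) features, ref_depth: (N,) depths.
    """
    ref_xy = np.asarray(ref_xy, dtype=np.float64)
    ref_feat = np.asarray(ref_feat, dtype=np.float64)
    ref_depth = np.asarray(ref_depth, dtype=np.float64)
    # Squared spatial distance and squared feature difference to each reference.
    dist2 = np.sum((ref_xy - np.asarray(target_xy, dtype=np.float64)) ** 2, axis=1)
    feat2 = (ref_feat - float(target_feat)) ** 2
    # Gaussian weights (assumed form): near and similar references dominate.
    weights = np.exp(-dist2 / (2.0 * sigma_s ** 2) - feat2 / (2.0 * sigma_f ** 2))
    total = weights.sum()
    if total == 0.0:
        # All weights underflowed: fall back to the nearest reference point.
        return float(ref_depth[np.argmin(dist2)])
    return float(np.sum(weights * ref_depth) / total)
```

Because the feature term suppresses references across an appearance edge, the interpolated depth tends to stay on the same surface as the target point instead of blending foreground and background.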

The depth estimation method of claim 2, wherein the estimating of the depth map of the scene based on the left image and the right image without using the TOF image comprises:
estimating a fifth depth map of the scene through stereo matching of the left image and the right image.

The depth estimation method of any one of claims 8, 9, and 12, further comprising:
updating a depth value of an unreliable depth value point in the estimated depth map,
wherein the estimated depth map comprises the first depth map, the fourth depth map, or the fifth depth map.

The depth estimation method of claim 13, wherein the updating of the depth value of the unreliable depth value point in the estimated depth map comprises:
determining reliable depth value points and unreliable depth value points in the estimated depth map;
predicting, using a second neural network, a depth value of the unreliable depth value point based on features of the reliable depth value points and the unreliable depth value point; and
estimating an updated depth map by performing interpolation, based on image features of the left image and the right image, on an area surrounding the unreliable depth value point.

The depth estimation method of claim 14, wherein the determining of the reliable depth value points and the unreliable depth value points in the estimated depth map comprises:
constructing a regular grid of depth value points based on the estimated depth map, and determining the reliable depth value points and the unreliable depth value points on the regular grid.

An electronic device, comprising:
at least one processor; and
a memory storing at least one computer-executable instruction,
wherein the processor is configured to receive a time-of-flight (TOF) image, a left image, and a right image of a scene, determine a first reliability of each TOF pixel of the TOF image, and estimate a depth map of the scene based on the TOF image, the left image, and the right image according to the first reliability.

The electronic device of claim 16, wherein, when estimating the depth map of the scene, the processor is configured to:
determine a quantity of TOF pixels whose first reliability satisfies a predetermined requirement;
when the quantity satisfies a first threshold requirement, estimate the depth map of the scene based on the TOF image, the left image, and the right image; and
when the quantity does not satisfy the first threshold requirement, estimate the depth map of the scene based on the left image and the right image without using the TOF image.

The electronic device of claim 16, wherein, when determining the first reliability of each TOF pixel of the TOF image, the processor is configured to:
project each TOF pixel of the TOF image onto the left image and the right image, respectively;
for each TOF projection point obtained by the projection, calculate, along a first direction on the scan line of the TOF projection point, a second reliability of the TOF pixel corresponding to the TOF projection point; and
calculate, along a second direction opposite to the first direction and based on the reliability calculation result in the first direction, the first reliability of the TOF pixel corresponding to each TOF projection point.

The electronic device of claim 17, wherein, when estimating the depth map of the scene based on the TOF image, the left image, and the right image, the processor is configured to:
perform first stereo matching of the left image and the right image;
predict, using a first neural network, a fourth reliability of each TOF pixel based on the TOF image and a result of the first stereo matching; and
estimate a first depth map of the scene by performing second stereo matching of the left image and the right image based on the TOF image and the fourth reliability.

The electronic device of claim 17, wherein, when estimating the depth map of the scene based on the left image and the right image without using the TOF image, the processor is configured to:
estimate a fifth depth map of the scene through stereo matching of the left image and the right image.