KR20220043830A - Method and device for performing plane detection

Info

Publication number: KR20220043830A
Application number: KR1020210049086A
Authority: KR
Inventors: 준 시; 레이 카오
Original assignee: 삼성전자주식회사
Priority date: 2020-09-29
Filing date: 2021-04-15
Publication date: 2022-04-05

Abstract

A method and a device for performing plane detection are provided. A method of extracting a plane includes the steps of: acquiring an input image; extracting features of the input image using a deep neural network and estimating a depth map of the input image based on the extracted features; and performing region segmentation using the depth map to detect plane regions in the input image. Thus, planes can be detected in a texture-free region.

Description

METHOD AND DEVICE FOR PERFORMING PLANE DETECTION

The present disclosure relates to the field of human-computer interaction and, more particularly, to a method and apparatus for performing plane detection.

In the field of human-computer interaction, augmented reality (AR) is one of the important branches. Augmented reality is an interactive experience of a real-world environment in which objects present in the real world are enhanced by computer-generated perceptual information, sometimes across multiple sensory modalities including visual, auditory, haptic, somatosensory, and/or olfactory. An augmented reality system can be defined as a system that satisfies three basic characteristics: a combination of the real and virtual worlds, real-time interaction, and accurate three-dimensional registration of virtual and real objects. The superimposed sensory information may add to the real environment or conceal part of it.

With the popularization of mobile smart devices and the tremendous increase in computing power, augmented reality technology has made significant progress over the past few years. As a new human-computer interaction technology, augmented reality can display physical objects and data information in real scenes more intuitively. Much research is being conducted on combining virtual objects with the real environment more closely in order to provide a better immersive sensory experience.

Plane detection is one of the core technologies of augmented reality. It detects the position and size of various planes in a real environment (such as the ground, desktops, and walls). In augmented reality, virtual items can be placed on the detected planes. Accurate plane detection results are therefore one of the key factors in allowing augmented reality applications to provide a good user experience.

Existing plane detection methods not only fail to detect planes in texture-less regions of an image scene (such as solid-color walls and desktops), but the planes they obtain may also not be aligned with the boundaries of real objects. In view of this, there is a need for a method and apparatus capable of detecting planes accurately.

According to one aspect of the present disclosure, a method of performing plane detection is provided, the method comprising: acquiring an input image; extracting features of the input image and estimating, using a deep neural network, a depth map of the input image based on the extracted features; and performing region segmentation using the depth map to detect planar regions in the input image.

According to various embodiments, the deep neural network may include a feature extractor for extracting features of the input image, a depth estimation branch for estimating depth information of the input image, and a normal estimation branch for estimating normal information of the input image. When the depth map of the input image is estimated, the depth information estimated by the depth estimation branch is optimized using the normal information estimated by the normal estimation branch.

According to various embodiments, when the depth map of the input image is estimated, a feature map of a predetermined resolution, obtained by extracting features of the input image with the feature extractor, may be fused with a depth feature map of the same resolution generated during depth estimation by the depth estimation branch and with a normal feature map of the same resolution generated during normal estimation by the normal estimation branch, respectively. The depth map may then be obtained using the fused depth feature map and the fused normal feature map.

According to various embodiments, optimizing the depth information estimated by the depth estimation branch using the normal information estimated by the normal estimation branch may include: extracting, from the normal feature map, information related to a region in which the change in normal features exceeds a predetermined degree; and optimizing the depth feature map using that information to obtain an optimized depth feature map.

According to various embodiments, extracting the information related to the region in which the change in normal features exceeds the predetermined degree and optimizing the depth feature map using that information includes: performing a horizontal depth convolution and a vertical depth convolution on the normal feature map, respectively, and using an activation function to obtain a horizontal attention map and a vertical attention map for the information; and obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map.

According to various embodiments, obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map may include: assigning weights to the horizontal attention map and the vertical attention map; and fusing the weighted horizontal attention map and the weighted vertical attention map with the depth feature map to obtain the optimized depth feature map.
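The normal-guided optimization described above can be illustrated with a minimal single-channel sketch. Everything specific in it is an assumption made for illustration and is not stated in the disclosure: a fixed central-difference kernel stands in for the learned horizontal and vertical convolutions, `tanh` of the absolute response is used as the activation function, the weights `w_h` and `w_v` are constants, and fusion is a residual multiplication. The function names are hypothetical.

```python
import math

def directional_response(feature, dv, du):
    """Central difference along one axis (dv, du), with edge replication.

    Assumption: a fixed difference kernel replaces the learned convolution.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            v0, v1 = max(v - dv, 0), min(v + dv, h - 1)
            u0, u1 = max(u - du, 0), min(u + du, w - 1)
            out[v][u] = feature[v1][u1] - feature[v0][u0]
    return out

def attention_map(feature, dv, du):
    """Attention in [0, 1): near 0 where the feature is flat, near 1 at sharp changes."""
    return [[math.tanh(abs(r)) for r in row]
            for row in directional_response(feature, dv, du)]

def refine_depth_feature(depth_feat, normal_feat, w_h=0.5, w_v=0.5):
    """Fuse weighted horizontal and vertical attention with the depth feature map.

    Assumption: fusion is d * (1 + w_h * a_h + w_v * a_v).
    """
    att_h = attention_map(normal_feat, 0, 1)  # horizontal direction
    att_v = attention_map(normal_feat, 1, 0)  # vertical direction
    return [[d * (1.0 + w_h * ah + w_v * av)
             for d, ah, av in zip(drow, hrow, vrow)]
            for drow, hrow, vrow in zip(depth_feat, att_h, att_v)]

# The normal feature changes sharply between columns 1 and 2.
normal_feat = [[0.0, 0.0, 5.0, 5.0],
               [0.0, 0.0, 5.0, 5.0]]
depth_feat = [[1.0] * 4 for _ in range(2)]
refined = refine_depth_feature(depth_feat, normal_feat)
```

In this toy input only the two columns next to the normal change are amplified, which mirrors the stated aim of emphasizing depth features where the normal varies strongly.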

According to various embodiments, performing region segmentation using the depth map to detect planar regions in the input image may include: calculating, using the depth map, three-dimensional points and depth-continuous regions in the input image for plane estimation; and performing the region segmentation using the calculated three-dimensional points and information on the depth-continuous regions to detect the planar regions in the input image.
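As a rough sketch of this step, the following code back-projects a depth map into three-dimensional points and labels depth-continuous regions. The pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the 4-connected flood fill, and the `max_jump` threshold are assumptions chosen for illustration; the disclosure does not specify how either quantity is computed.

```python
def backproject(depth, fx, fy, cx, cy):
    """Map each pixel (u, v) with depth z to a 3D camera-space point (pinhole model)."""
    points = []
    for v, row in enumerate(depth):
        points.append([((u - cx) * z / fx, (v - cy) * z / fy, z)
                       for u, z in enumerate(row)])
    return points

def depth_continuous_regions(depth, max_jump=0.1):
    """Label 4-connected regions in which neighbouring depths differ by less than max_jump."""
    h, w = len(depth), len(depth[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sv in range(h):
        for su in range(w):
            if labels[sv][su] != -1:
                continue
            labels[sv][su] = next_label
            stack = [(sv, su)]
            while stack:  # iterative flood fill
                v, u = stack.pop()
                for nv, nu in ((v + 1, u), (v - 1, u), (v, u + 1), (v, u - 1)):
                    if (0 <= nv < h and 0 <= nu < w and labels[nv][nu] == -1
                            and abs(depth[nv][nu] - depth[v][u]) < max_jump):
                        labels[nv][nu] = next_label
                        stack.append((nv, nu))
            next_label += 1
    return labels

# Two surfaces at depths 1.0 and 2.0 separated by a depth discontinuity.
depth = [[1.0, 1.0, 2.0],
         [1.0, 1.0, 2.0]]
points = backproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
labels = depth_continuous_regions(depth)
```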

According to various embodiments, performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions to detect the planar regions in the input image may include: calculating a normal map of the input image using the calculated three-dimensional points, and fusing the calculated normal map with a normal map estimated by the deep neural network; and clustering, using the fused normal map and the information on the depth-continuous regions, to segment the planar regions.

According to various embodiments, performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions to detect the planar regions in the input image may include: calculating a normal map of the input image using the calculated three-dimensional points; and clustering, using the calculated normal map and the information on the depth-continuous regions, to segment the planar regions.
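One common way to compute a normal map from a grid of three-dimensional points is to take the cross product of the offsets to the right-hand and lower neighbours. The sketch below assumes that approach and a blend-then-renormalize fusion with a network-estimated normal; neither is stated in the disclosure, and the function names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n) if n > 1e-12 else (0.0, 0.0, 0.0)

def normals_from_points(points):
    """Per-pixel unit normal from right and down neighbour offsets.

    The last row and column have no such neighbours and keep a zero vector.
    The sign of the normal depends on the neighbour order and is a convention.
    """
    h, w = len(points), len(points[0])
    normals = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for v in range(h - 1):
        for u in range(w - 1):
            p = points[v][u]
            right = tuple(a - b for a, b in zip(points[v][u + 1], p))
            down = tuple(a - b for a, b in zip(points[v + 1][u], p))
            normals[v][u] = normalize(cross(right, down))
    return normals

def fuse_normals(computed, estimated, weight=0.5):
    """Assumed fusion rule: weighted blend of two unit normals, renormalized."""
    blended = tuple(weight * c + (1.0 - weight) * e
                    for c, e in zip(computed, estimated))
    return normalize(blended)

# A fronto-parallel plane at z = 2: every computed normal is along the z axis.
plane_points = [[(float(u), float(v), 2.0) for u in range(3)] for v in range(3)]
normal_map = normals_from_points(plane_points)
fused = fuse_normals(normal_map[0][0], (0.0, 0.0, 1.0))
```

Pixels with similar normals inside one depth-continuous region could then be grouped by any clustering method; that step is left out of this sketch.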

According to various embodiments, the deep neural network may include a feature extractor for extracting features of the input image and a depth estimation branch for estimating depth information of the input image. When the depth map of the input image is estimated, a feature map of a predetermined resolution, obtained by extracting features of the input image with the feature extractor, may be fused with a depth feature map of the same resolution generated during depth estimation by the depth estimation branch, and the depth map may be generated using the fused depth feature map.

According to various embodiments, the method may further include refining the boundaries of the detected planar regions so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

According to various embodiments, refining the boundaries of the detected planar regions may include: obtaining a discrete label value corresponding to each detected planar region; converting each detected planar region into a three-dimensional volume based on the discrete label value; and refining the planar regions based on the converted three-dimensional volume and the input image, so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

According to various embodiments, refining the boundaries of the detected planar regions may include: obtaining region information corresponding to each pixel in the input image based on each detected planar region; obtaining plane weight information for each pixel in a four-channel image, composed of the input image and a two-dimensional single-channel image formed from the region information, based on the shortest distance on the two-dimensional single-channel image between each pixel and the boundaries of each detected planar region; and determining the similarity between pixels based on the pixel value, the region information, and the plane weight information corresponding to each pixel, and performing image segmentation based on the similarity between pixels to obtain the refined planar region boundaries.
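The plane weight step can be sketched as a distance computation on the single-channel region map. The code below is illustrative only: the disclosure says the weight is based on the shortest distance to a region boundary but gives no formula, so the mapping `1 - exp(-d / sigma)`, the parameter `sigma`, and the 4-neighbour boundary definition are all assumptions. A brute-force search is used for clarity; a distance transform would be the usual choice for real images.

```python
import math

def boundary_pixels(labels):
    """Pixels that have at least one 4-neighbour with a different region label."""
    h, w = len(labels), len(labels[0])
    result = []
    for v in range(h):
        for u in range(w):
            for nv, nu in ((v + 1, u), (v - 1, u), (v, u + 1), (v, u - 1)):
                if 0 <= nv < h and 0 <= nu < w and labels[nv][nu] != labels[v][u]:
                    result.append((v, u))
                    break
    return result

def plane_weights(labels, sigma=2.0):
    """Assumed weight: 0 on a boundary, approaching 1 far away from every boundary."""
    border = boundary_pixels(labels)
    weights = []
    for v, row in enumerate(labels):
        out = []
        for u in range(len(row)):
            if not border:  # a single region has no boundary
                out.append(1.0)
                continue
            d = min(math.hypot(v - bv, u - bu) for bv, bu in border)
            out.append(1.0 - math.exp(-d / sigma))
        weights.append(out)
    return weights

# One row with two regions; the boundary lies between columns 2 and 3.
region_map = [[0, 0, 0, 1, 1, 1]]
weights = plane_weights(region_map)
```

Under this assumed mapping, pixels close to a detected boundary carry little plane weight, so the pixel values of the input image would dominate the similarity there and allow the boundary to move toward the real object edge.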

According to another aspect of the present disclosure, an apparatus for performing plane detection is provided, the apparatus comprising: an image acquisition unit configured to acquire an input image; an estimation unit configured to extract features of the input image and estimate, using a deep neural network, a depth map of the input image based on the extracted features; and a region segmentation unit configured to perform region segmentation using the depth map to detect planar regions in the input image.

According to various embodiments, the deep neural network may include a feature extractor for extracting features of the input image, a depth estimation branch for estimating depth information of the input image, and a normal estimation branch for estimating normal information of the input image. When estimating the depth map of the input image, the estimation unit optimizes the depth information estimated by the depth estimation branch using the normal information estimated by the normal estimation branch.

According to various embodiments, when estimating the depth map of the input image, the estimation unit may fuse a feature map of a predetermined resolution, obtained by extracting features of the input image with the feature extractor, with a depth feature map of the same resolution generated during depth estimation by the depth estimation branch and with a normal feature map of the same resolution generated during normal estimation by the normal estimation branch, respectively, and may obtain the depth map using the fused depth feature map and the fused normal feature map.

According to various embodiments, in optimizing the depth information estimated by the depth estimation branch using the normal information estimated by the normal estimation branch, the estimation unit may extract, from the normal feature map, information related to a region in which the change in normal features exceeds a predetermined degree, and may optimize the depth feature map using that information to obtain an optimized depth feature map.

According to various embodiments, extracting the information related to the region in which the change in normal features exceeds the predetermined degree and optimizing the depth feature map using that information may include: performing a horizontal depth convolution and a vertical depth convolution on the normal feature map, respectively, and using an activation function to obtain a horizontal attention map and a vertical attention map for the information; and obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map.

According to various embodiments, obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map may include: assigning weights to the horizontal attention map and the vertical attention map; and fusing the weighted horizontal attention map and the weighted vertical attention map with the depth feature map to obtain the optimized depth feature map.

According to various embodiments, performing region segmentation using the depth map to detect planar regions in the input image may include: calculating, using the depth map, three-dimensional points and depth-continuous regions in the input image for plane estimation; and performing the region segmentation using the calculated three-dimensional points and information on the depth-continuous regions to detect the planar regions in the input image.

According to various embodiments, performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions to detect the planar regions in the input image may include: calculating a normal map of the input image using the calculated three-dimensional points, and fusing the calculated normal map with a normal map estimated by the deep neural network; and clustering, using the fused normal map and the information on the depth-continuous regions, to segment the planar regions.

According to various embodiments, performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions to detect the planar regions in the input image may include: calculating a normal map of the input image using the calculated three-dimensional points; and clustering, using the calculated normal map and the information on the depth-continuous regions, to segment the planar regions.

According to various embodiments, the deep neural network may include a feature extractor for extracting features of the input image and a depth estimation branch for estimating depth information of the input image. When estimating the depth map of the input image, the estimation unit may fuse a feature map of a predetermined resolution, obtained by extracting features of the input image with the feature extractor, with a depth feature map of the same resolution generated during depth estimation by the depth estimation branch, and may generate the depth map using the fused depth feature map.

According to various embodiments, the apparatus may further include a plane boundary refinement unit configured to refine the boundaries of the detected planar regions so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

According to various embodiments, refining the boundaries of the detected planar regions may include: obtaining a discrete label value corresponding to each detected planar region; converting each detected planar region into a three-dimensional volume based on the discrete label value; and refining the planar regions based on the converted three-dimensional volume and the input image, so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

According to various embodiments, refining the boundaries of the detected planar regions may include: obtaining region information corresponding to each pixel in the input image based on each detected planar region; obtaining plane weight information for each pixel in a four-channel image, composed of the input image and a two-dimensional single-channel image formed from the region information, based on the shortest distance on the two-dimensional single-channel image between each pixel and the boundaries of each detected planar region; and determining the similarity between pixels based on the pixel value, the region information, and the plane weight information corresponding to each pixel, and performing image segmentation based on the similarity between pixels to obtain the refined planar region boundaries.

According to various embodiments, an apparatus including a processor and a memory storing program instructions is provided, wherein the instructions, when executed by the processor, cause the processor to perform the method described above.

According to various embodiments, a computer-readable recording medium storing program instructions is provided, wherein the instructions, when executed by a processor, cause the processor to perform the method described above.

According to the plane detection method and the plane detection apparatus of the present disclosure, planes are detected based on a depth map of the entire input image, so planes can be detected even in texture-free regions.

In addition, on this basis, the accuracy of plane detection can be effectively improved through feature map fusion and/or by optimizing the depth information using normal information, and the boundaries of the detected planar regions can be aligned with the boundaries of real objects in the input image by further performing a plane refinement operation.

These and other aspects and/or advantages of the present application can be clearly understood from the following detailed description of embodiments of the present application taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of plane detection in an existing augmented reality framework.
FIG. 2 is a schematic diagram of plane detection in an augmented reality framework according to the present disclosure.
FIG. 3 is a flowchart of a method of performing plane detection according to the present disclosure.
FIG. 4 is a schematic diagram of a plane detection method according to an exemplary embodiment of the present disclosure.
FIG. 5 is a schematic diagram of operations of a deep neural network according to an exemplary embodiment of the present disclosure.
FIG. 6 is a schematic diagram illustrating operations performed by a feature fusion module according to an exemplary embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a region in which normal information changes markedly but depth information changes only slightly.
FIG. 8 is a schematic diagram of optimizing depth information using normal information according to an exemplary embodiment of the present disclosure.
FIG. 9 is a schematic diagram illustrating operations performed by a normal-guided attention module according to an exemplary embodiment of the present disclosure.
FIG. 10 is a schematic diagram illustrating an example of a planar region segmentation operation according to an exemplary embodiment of the present disclosure.
FIG. 11 is a schematic diagram illustrating an example of a plane refinement operation according to an exemplary embodiment of the present disclosure.
FIG. 12 is a schematic diagram illustrating a plane detection method according to another exemplary embodiment of the present disclosure.
FIG. 13 is a schematic diagram illustrating operations of the deep neural network according to another exemplary embodiment of the present disclosure.
FIG. 14 is a schematic diagram illustrating a planar region segmentation operation according to another exemplary embodiment of the present disclosure.
FIG. 15 is a block diagram illustrating an apparatus for performing plane detection according to the present disclosure.
FIG. 16 is a schematic diagram illustrating a semantic-aware plane detection system suitable for augmented reality according to the present disclosure.

Before the concept and exemplary embodiments of the present disclosure are described, a plane detection method in an existing augmented reality framework is first briefly explained to aid understanding of the present disclosure.

FIG. 1 is a schematic diagram of plane detection in an existing augmented reality framework.

As shown in FIG. 1, the plane detection method in an existing augmented reality framework relies heavily on the three-dimensional point cloud from a Simultaneous Localization and Mapping (SLAM) system. Specifically, the existing AR framework takes the three-dimensional point cloud from the SLAM system as input and performs plane parameter fitting using Random Sample Consensus (RANSAC) to complete plane detection. The AR content is then displayed according to the camera pose. The existing plane detection method thus detects planes based on the sparse three-dimensional point cloud generated by the SLAM system.
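For reference, RANSAC plane fitting on a sparse point cloud can be sketched as follows. This is a generic textbook formulation, not the implementation of any particular AR framework; the inlier `threshold`, the iteration count, and the function names are assumptions.

```python
import random

def fit_plane(p, q, r):
    """Return (unit normal n, offset d) with n . x + d = 0, or None if the points are collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Keep the candidate plane supported by the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [i for i, pt in enumerate(points)
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# 25 points on the plane z = 0 plus two outliers.
cloud = [(x * 0.1, y * 0.1, 0.0) for x in range(5) for y in range(5)]
cloud += [(0.2, 0.3, 1.0), (0.4, 0.1, -2.0)]
model, inliers = ransac_plane(cloud)
```

Because the fit only sees points that the SLAM system managed to triangulate, a surface that yields no feature points contributes nothing to `cloud`, which is the limitation discussed next.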

The SLAM system first extracts feature points from an input image, matches the extracted two-dimensional feature points, and calculates three-dimensional points from the matched two-dimensional feature points. Because two-dimensional feature points cannot be detected in a texture-less region, the SLAM system fails to obtain three-dimensional points for that region, and as a result the existing method cannot perform plane detection in texture-less regions.

Second, as described above, the existing plane detection method performs plane detection based on the sparse three-dimensional point cloud output by the SLAM system. Such a point cloud does not provide enough information to estimate plane boundaries accurately, so the boundaries of the detected planes may fail to align with the boundaries of real objects.

The present disclosure proposes a new plane detection method and device capable of solving the above problems. Hereinafter, the concept and exemplary embodiments of performing plane detection according to the present disclosure will be described in detail with reference to FIGS. 2 to 15.

FIG. 2 is a schematic diagram of plane detection in an augmented reality framework according to the present disclosure. As shown in FIG. 2, instead of performing plane detection on the sparse three-dimensional point cloud output by a SLAM system, the present disclosure may detect planes directly from an input image (which may be a color image). Because information from the entire image is used for plane detection, the present disclosure can perform plane detection even in texture-less regions.

To solve the problem that plane detection cannot be performed in texture-less regions, the present disclosure may design a deep neural network for acquiring scene information. The deep neural network may provide information about the entire scene, including texture-less regions, for estimating the planes in the scene.

To solve the problem that detected planes cannot be aligned with the boundaries of real objects, the present disclosure adopts a depth region segmentation technique to obtain initial planes, and a plane boundary refinement technique to align the boundaries of the detected initial planes with those of real objects.

Hereinafter, a method of performing plane detection according to an exemplary embodiment of the present disclosure will be described in detail with reference to FIGS. 3 to 14.

FIG. 3 is a flowchart of a method of performing plane detection according to the present disclosure (hereinafter referred to as the "plane detection method" for convenience of description).

Referring to FIG. 3, in step S310, an input image may be acquired. For example, the input image may be acquired through a camera in response to a user request, and may also be acquired in real time. Embodiments of the present disclosure place no limitation on the manner in which the input image is acquired.

In step S320, a deep neural network may extract features of the input image and estimate a depth map of the input image based on the extracted features. Here, the deep neural network may be used to acquire scene information. For example, the features of the input image may be used to generate feature maps at 1/2, 1/4, 1/8, 1/16, and 1/32 of the resolution of the input image. The depth map of the input image may then be estimated from the generated feature maps through the deconvolution and convolution operations of the layers used for information estimation in the deep neural network (these operations enlarge the feature maps in order to predict larger depth maps and normal maps).

In step S330, region segmentation may be performed using the depth map to detect planar regions in the input image.

According to various embodiments, in step S330, the depth map may be used to calculate three-dimensional points and depth-continuous regions in the input image for plane estimation. The calculated three-dimensional points and the information on the depth-continuous regions may then be used to perform the region segmentation that detects the planar regions in the input image.

For example, the depth map may be used to calculate the three-dimensional points as follows. According to a camera imaging model (for example, the pinhole camera model), Equation 1 below may be used to calculate the three-dimensional points:

[Equation 1]

$$X = \frac{(u - c_x)\,Z}{f_x}, \qquad Y = \frac{(v - c_y)\,Z}{f_y}$$

Here, $u$ and $v$ are the image pixel coordinates, $f_x$ and $f_y$ are the focal lengths, and $c_x$ and $c_y$ are the coordinates of the image principal point. When traversing each pixel in the image, $Z$ is the depth of that pixel in the depth map; substituting $u$, $v$, and $Z$ into the above equation yields the $X$ and $Y$ coordinates of the corresponding spatial point, so that all three-dimensional points corresponding to the entire image can be obtained.
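The per-pixel traversal described above can be sketched as follows, assuming the standard pinhole relations X = (u - cx)·Z / fx and Y = (v - cy)·Z / fy. The function name `backproject`, the nested-list depth map layout, and the toy intrinsics are assumptions chosen for illustration; a practical implementation would more likely use an array library.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (rows of Z values) into per-pixel (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth):      # v: pixel row index
        out_row = []
        for u, z in enumerate(row):      # u: pixel column index
            x = (u - cx) * z / fx        # pinhole model, horizontal axis
            y = (v - cy) * z / fy        # pinhole model, vertical axis
            out_row.append((x, y, z))
        points.append(out_row)
    return points


# Hypothetical 3x3 depth map of a wall 2 units in front of the camera.
depth = [[2.0] * 3 for _ in range(3)]
points = backproject(depth, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Because every pixel of the depth map yields a point, the resulting cloud is dense, including in regions where no feature points could be matched.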

Although the present disclosure describes the above method of calculating three-dimensional points from the depth map as an example, it will be apparent to those skilled in the art that other methods of calculating the three-dimensional points may also be used depending on the design, and the present disclosure does not limit the specific method of calculating the three-dimensional points.

According to various embodiments, the depth-continuous regions may be calculated from the depth map as follows. For example, using a relatively simple region growing method, the normal vector of each point is first calculated from the three-dimensional points in a neighborhood of that point (for example, 9x9). A threshold (for example, 10 degrees) is set, and the normal vectors of adjacent points are compared one by one; if the difference lies within the threshold, the two points are considered continuous, and the average of their normal vectors is used as the normal vector of the merged continuous region. The comparison then continues with further adjacent points or other continuous regions.
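The region growing step can be illustrated with the following minimal sketch. It takes per-pixel unit normals as given (rather than estimating them from a 9x9 neighborhood), grows regions over 4-connected neighbors, and compares each candidate against the running mean normal of the region; these simplifications, along with the names `grow_regions`, `unit`, and `angle_deg`, are assumptions for illustration only.

```python
import math
from collections import deque


def unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angle_deg(a, b):
    """Angle in degrees between two unit vectors."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))


def grow_regions(normals, threshold_deg=10.0):
    """Label 4-connected pixels whose normals stay within threshold_deg of the region mean."""
    h, w = len(normals), len(normals[0])
    labels = [[-1] * w for _ in range(h)]
    region_count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = region_count
            total = list(normals[sy][sx])  # running sum of member normals
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1:
                        # Compare the neighbor with the averaged normal of the region.
                        if angle_deg(unit(total), normals[ny][nx]) <= threshold_deg:
                            labels[ny][nx] = region_count
                            total = [t + c for t, c in zip(total, normals[ny][nx])]
                            queue.append((ny, nx))
            region_count += 1
    return labels, region_count


# Hypothetical 4x6 normal map: a wall on the left, a floor on the right.
wall, floor = (0.0, 0.0, -1.0), (0.0, -1.0, 0.0)
normals = [[wall] * 3 + [floor] * 3 for _ in range(4)]
labels, count = grow_regions(normals)
```

In this toy input the two halves differ by 90 degrees, well above the 10-degree threshold, so they end up in separate regions.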

Although the present disclosure describes the above method of calculating the depth-continuous regions from the depth map as an example, it will be apparent to those skilled in the art that other methods may also be used to calculate the depth-continuous regions according to design specifications.

According to various embodiments, after the three-dimensional points and the depth-continuous regions have been calculated, region segmentation may be performed to detect the planar regions in the input image using the calculated three-dimensional points and the information on the depth-continuous regions, as follows. A normal map of the input image may be calculated from the three-dimensional points, and the calculated normal map may be fused with the normal map estimated by the deep neural network. The planar regions may then be segmented by clustering using the fused normal map and the information on the depth-continuous regions.
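As a rough illustration of the first two operations above, the sketch below derives a normal map from the three-dimensional points using the cross product of neighbor differences, and then blends it with a network-predicted normal map. The disclosure does not specify the fusion rule, so the weighted average followed by renormalization is purely an assumption, as are the names `normals_from_points` and `fuse_normals`, the sign convention of the normals, and the sample data.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n) if n > 0 else (0.0, 0.0, 0.0)


def normals_from_points(points):
    """Per-pixel normal from the cross product of horizontal and vertical point differences."""
    h, w = len(points), len(points[0])
    normals = [[(0.0, 0.0, 0.0)] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            # Forward differences, falling back to backward differences at the last row/column.
            u0, u1 = (u, u + 1) if u + 1 < w else (u - 1, u)
            v0, v1 = (v, v + 1) if v + 1 < h else (v - 1, v)
            du = [b - a for a, b in zip(points[v][u0], points[v][u1])]
            dv = [b - a for a, b in zip(points[v0][u], points[v1][u])]
            cross = (du[1] * dv[2] - du[2] * dv[1],
                     du[2] * dv[0] - du[0] * dv[2],
                     du[0] * dv[1] - du[1] * dv[0])
            normals[v][u] = normalize(cross)
    return normals


def fuse_normals(computed, predicted, weight=0.5):
    """Blend two unit-normal maps and renormalize (the blend rule is an assumption)."""
    return [[normalize(tuple(weight * c + (1.0 - weight) * p for c, p in zip(cn, pn)))
             for cn, pn in zip(crow, prow)]
            for crow, prow in zip(computed, predicted)]


# Hypothetical 3x3 patch of points on the plane Z = 2, and a slightly tilted prediction.
points = [[(float(u - 1), float(v - 1), 2.0) for u in range(3)] for v in range(3)]
computed = normals_from_points(points)
predicted = [[(0.6, 0.0, 0.8)] * 3 for _ in range(3)]
fused = fuse_normals(computed, predicted)
```

The fused normals could then be passed, together with the depth-continuous regions, to whatever clustering step is chosen for segmenting the planar regions.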

Alternatively, according to various embodiments, the region segmentation may be performed without fusion: a normal map of the input image is calculated from the calculated three-dimensional points, and the planar regions are segmented by clustering using the calculated normal map and the information on the depth-continuous regions. It will be apparent to those skilled in the art that the method of performing region segmentation to detect the planar regions in the input image using the calculated three-dimensional points and the information on the depth-continuous regions is not limited to the above examples.

As described above, instead of performing plane detection using the sparse feature points generated by a SLAM system, the plane detection method of the present disclosure uses the information of the entire input image to estimate a depth map of the input image and uses the depth map to detect the planar regions in the input image, which makes plane detection possible even in texture-less regions.

According to various embodiments, the plane detection method may further include the following step (not shown in FIG. 3): the boundaries of the detected planar regions may be refined so that they align with the boundaries of real objects in the input image.

Hereinafter, details of the plane detection method according to an exemplary embodiment of the present disclosure will be described more specifically with reference to FIGS. 4 to 14.

According to various embodiments, the deep neural network described in step S320 may include a feature extractor for extracting features of the input image, a depth estimation branch for estimating depth information of the input image, and a normal estimation branch for estimating normal information of the input image.

However, the structure of the deep neural network of the present disclosure is not limited to the above example. According to various embodiments, the deep neural network may omit the normal estimation branch and include only a feature extractor for extracting features of the input image and a depth estimation branch for estimating depth information of the input image. The present disclosure does not limit the specific structure of the deep neural network, as long as it can extract features of the input image and estimate at least a depth map of the input image. According to various embodiments, the deep neural network may estimate a normal map of the input image in addition to the depth map of the input image.

FIG. 4 is a schematic diagram of a plane detection method according to an exemplary embodiment of the present disclosure.

As shown in FIG. 4, the input image is input to the deep neural network, and the detected planes may be output after planar region segmentation and planar region refinement. According to various embodiments, considering that the depth information and the surface normal information of an image are closely related, the deep neural network may be configured to include a feature extractor that extracts features of the input image, a depth estimation branch that estimates depth information of the input image, and a normal estimation branch that estimates normal information of the input image.

According to various embodiments, the deep neural network may estimate depth information and normal information simultaneously from a single input image in a multi-task manner, and a dense three-dimensional point cloud may thereby be obtained for plane estimation. The input image may be input to the feature extractor of the deep neural network for feature extraction, and the extracted features may be input to the depth estimation branch and the normal estimation branch, respectively.

According to various embodiments, the depth estimation branch may perform depth estimation and output the depth map, the normal estimation branch may perform normal estimation and output the normal map, and the depth map and the normal map may be used in the planar region segmentation. In the exemplary embodiment of FIG. 4, when the depth map of the input image is estimated, the depth information estimated by the depth estimation branch may be optimized using the normal information estimated by the normal estimation branch. In FIG. 4, this approach is referred to as the "normal-guided attention mechanism". Optimizing the depth information in this way can make the estimated depth map more accurate, which in turn improves the accuracy of the plane detection.

Hereinafter, the deep neural network used in the plane detection method shown in FIG. 4 will be described with reference to FIG. 5.

FIG. 5 is a schematic diagram of the operations of a deep neural network according to an exemplary embodiment of the present disclosure. In the example deep neural network of FIG. 5, a feature fusion module and a normal-guided attention module may be included in the depth estimation branch and the normal estimation branch.

According to various embodiments, when the depth map of the input image is estimated, a feature map of a predetermined resolution obtained by feature extraction from the input image using the feature extractor may be fused with the depth feature map of the same resolution generated during depth estimation in the depth estimation branch, and with the normal feature map of the same resolution generated during normal estimation in the normal estimation branch, respectively, and the depth map may be obtained using the fused depth feature map and the fused normal feature map. These operations are performed by the feature fusion module.

Here, the predetermined resolution may be, for example, 1/8 resolution and 1/16 resolution, but is not limited thereto. Selecting feature maps at only some resolutions (such as the 1/8 and 1/16 resolution feature maps) for fusion can improve the accuracy and detail of the final depth map and normal map while accelerating the training of the neural network.

According to various embodiments, the feature fusion module in the depth estimation branch may fuse the feature map of a predetermined resolution, obtained by feature extraction from the input image using the feature extractor, with the depth feature map of the same resolution generated during depth estimation. Likewise, the feature fusion module in the normal estimation branch may fuse the feature map of a predetermined resolution, obtained by feature extraction from the input image using the feature extractor, with the normal feature map of the same resolution generated during normal estimation (see the normal-guided attention module in the drawings).

Fusing the feature maps can improve the accuracy and detail of the final depth map and normal map. For example, the spatial details of the final depth estimation and normal estimation results can be restored, which makes it possible to provide accurate plane detection results. Fusing the feature maps may also help accelerate the training of the deep neural network.

Hereinafter, the operations performed by the feature fusion module will be described with reference to FIG. 6. First, the feature map of a predetermined resolution obtained using the feature extractor (for example, a DenseNet encoder) (hereinafter abbreviated as the "feature extractor feature map" for convenience of description) and the depth feature map (or the normal feature map) of the same resolution are each input to the feature fusion module.

Second, the input feature extractor feature map is processed by three groups of processing units (the number is not limited; three groups is merely an example), each consisting of a two-dimensional convolution, batch normalization (BN), and a ReLU activation function. The first group of processing units reduces the number of channels of the input feature map. The second group extracts, from the feature extractor feature map, the features relevant to the feature map to be fused, and transfers them from the feature domain of the feature extractor to the feature domain of the features to be fused. The last group adjusts the number of output channels so that it equals the number of channels of the input depth feature map (or normal feature map).

Next, the processed feature map output by the previous step is added element-wise to the input depth feature map (or the feature map of the normal branch). Finally, the result of the addition is processed by a further processing unit consisting of a 3x3 convolution, batch normalization, and a ReLU activation function to obtain the fused depth feature map (or the fused normal feature map).

Note that the operations performed on the feature maps are not limited to the processing units described above (for example, the activation function is not limited to ReLU), and the specific method of fusing the feature maps is not limited to the example shown in FIG. 6; other published feature map fusion methods may also be used.

Next, the operations performed by the normal-guided attention module in FIG. 5 will be described. As described above, when the depth map of the input image is estimated, the depth information estimated by the depth estimation branch may be optimized using the normal information estimated by the normal estimation branch. For example, as shown in FIG. 8, at each stage of the depth estimation branch and the normal estimation branch, the output of the depth estimation branch may be optimized using the output of the normal estimation branch. For example, the high-frequency information in the normal feature map that reflects the normal information in the normal estimation branch (for example, information on regions of the normal feature map in which the change in the normal features exceeds a predefined degree) may be used to optimize the depth feature map in the depth estimation branch. This keeps the depth feature map in the depth estimation branch sensitive in regions where the normal information changes markedly but the depth information changes only slightly, so that the final depth map has sharp edges in these regions and provides more accurate three-dimensional points for the plane detection.

FIG. 7 is a schematic diagram of a region in which the normal information changes markedly but the depth information changes only slightly. As shown in FIG. 7, the region marked with a circle is a region where the normal information changes markedly while the depth information changes little. Using the high-frequency information in the normal feature map to optimize the depth feature map in the depth estimation branch keeps the depth feature map sensitive in such regions, so that the finally obtained depth map can have sharp edges there.

Returning to FIG. 5, optimizing the depth information estimated by the depth estimation branch using the normal information estimated by the normal estimation branch may include extracting information on regions of the normal feature map in which the change in the normal features exceeds a predetermined degree, and optimizing the depth feature map using that information to obtain an optimized depth feature map.

According to various embodiments, extracting information on regions of the normal feature map in which the change in the normal features exceeds a predetermined degree, and optimizing the depth feature map using that information, may include the following. First, a horizontal depth convolution and a vertical depth convolution are each performed on the normal feature map, and an activation function is applied to obtain a horizontal attention map and a vertical attention map for that information. Second, the optimized depth feature map is obtained based on the horizontal attention map, the vertical attention map, and the depth feature map. For example, obtaining the optimized depth feature map may include weighting the horizontal attention map and the vertical attention map, and fusing the weighted horizontal and vertical attention maps with the depth feature map.

According to various embodiments, the above optimization operation may be performed by the normal-guided attention module shown in FIG. 5.

FIG. 9 is a schematic diagram illustrating the operations performed by the normal-guided attention module according to an exemplary embodiment of the present disclosure. The above optimization operation is illustrated below with reference to FIG. 9.

Referring to FIG. 9, first, the fused depth feature map and the fused normal feature map are each input to the normal-guided attention module.

The normal feature map is processed with a horizontal depth convolution having the convolution kernel (-1, 2, -1) and with a vertical depth convolution having the convolution kernel (-1, 2, -1)^T, and the tanh activation function is applied to obtain a horizontal attention map and a vertical attention map, respectively. Here, the horizontal and vertical attention maps are attention maps for the high-frequency information in the normal feature map; in FIG. 9 they are also referred to as the "horizontal high-frequency attention map" and the "vertical high-frequency attention map", respectively. The horizontal and vertical attention maps are then each multiplied with the input depth feature map to obtain a horizontal attention result and a vertical attention result (multiplication is one fusion method; other fusion methods may also be used).

Finally, the horizontal attention result and the vertical attention result are weighted by the weight coefficients α and β, respectively, and added element-wise to the input depth-branch feature map to obtain the optimized depth feature map. In summary, the attention-guided depth feature map output by this module can be described by Equation 2 below.

[Equation 2]

$$F_d^{\mathrm{out}} = F_d + \alpha \,\bigl(\tanh(C_h(F_n)) \odot F_d\bigr) + \beta \,\bigl(\tanh(C_v(F_n)) \odot F_d\bigr)$$

Here, $F_d$ and $F_n$ are the input depth feature map and the input normal feature map, respectively; $C_h$ and $C_v$ are the horizontal depthwise convolution and the vertical depthwise convolution, respectively; tanh is the activation function; $\odot$ denotes element-wise multiplication; $F_d^{\mathrm{out}}$ is the attention-guided depth feature map; and α and β are the weight coefficients.

According to various embodiments, although in the example of FIG. 9 the high-frequency information in the normal feature map (that is, information related to regions of the normal feature map in which the normal feature exceeds the predetermined degree) is extracted and a residual sum is used to optimize the depth feature map, the specific way of using the high-frequency information in the normal feature map to optimize the depth feature map is not limited to the example of FIG. 9. Other methods may be used as long as the high-frequency information in the normal feature map is used effectively.

The kernels (-1, 2, -1) and (-1, 2, -1)^T of the horizontal and vertical depthwise convolutions in FIG. 9 are only examples; the sizes of the horizontal and vertical depthwise convolution kernels may be implemented in various ways according to design specifications.
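The operations of FIG. 9 can be sketched in NumPy as follows. This is a minimal illustration rather than the network described in the disclosure: the function names, the (C, H, W) array layout, the zero padding at the borders, and the default values of `alpha` and `beta` are all assumptions made for the example.

```python
import numpy as np

def depthwise_conv1d(x, kernel, axis):
    """Apply the same 1-D kernel to every channel of x (C, H, W) along `axis`.

    Zero padding keeps the output the same size as the input (an assumption).
    """
    pad = len(kernel) // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for i, k in enumerate(kernel):
        # Shifted view of the padded input, weighted by the i-th kernel tap.
        out += k * np.take(xp, np.arange(i, i + n), axis=axis)
    return out

def normal_guided_attention(depth_feat, normal_feat, alpha=0.5, beta=0.5):
    """Residual sum of the depth features and their attention-weighted copies."""
    kernel = (-1.0, 2.0, -1.0)
    # Horizontal kernel (-1, 2, -1) runs along the width axis.
    att_h = np.tanh(depthwise_conv1d(normal_feat, kernel, axis=2))
    # Vertical kernel (-1, 2, -1)^T runs along the height axis.
    att_v = np.tanh(depthwise_conv1d(normal_feat, kernel, axis=1))
    return depth_feat + alpha * depth_feat * att_h + beta * depth_feat * att_v
```

Where the normal features are locally constant, both attention maps are zero away from the image border and the depth features pass through unchanged; only places where the normals change quickly modify the depth features.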

The deep neural network shown in the example of FIG. 5 may include both the feature map fusion module and the normal-guided attention module. However, according to various embodiments of the present disclosure, the deep neural network may include only one of the feature map fusion module and the normal-guided attention module.

According to various embodiments, in the case where the deep neural network includes both the feature map fusion module and the normal-guided attention module (as shown in FIG. 5), the normal feature map and the depth feature map that are used, when estimating the depth map of the input image, to optimize the depth information estimated by the depth estimation branch with the normal information estimated by the normal estimation branch may be the fused normal feature map and the fused depth feature map obtained by fusing the feature maps.

According to various embodiments, in the case where the deep neural network does not include the feature map fusion module, the normal feature map and the depth feature map that are used, when estimating the depth map of the input image, to optimize the depth information estimated by the depth estimation branch with the normal information estimated by the normal estimation branch are the normal feature map and the depth feature map without feature map fusion.

Referring back to FIG. 4, in the exemplary embodiment of FIG. 4, after the depth map and the normal map are obtained, they may be used to perform plane region segmentation to detect initial planes. As described above with reference to step S330 of FIG. 3, after the depth map is estimated, it may be used to calculate three-dimensional points and depth-continuous regions in the input image for plane estimation, and the calculated three-dimensional points and the information of the depth-continuous regions may be used to perform region segmentation and detect the plane regions in the input image.
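One possible way to obtain three-dimensional points and a depth-continuity mask from a depth map is sketched below. The disclosure does not specify either computation, so everything here is an assumption: a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and a continuity test that compares the depth jump to the right and lower neighbours against a relative tolerance `rel_tol`.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Turn an (H, W) depth map into (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def depth_continuous_mask(depth, rel_tol=0.05):
    """True where the depth jump to the right and lower neighbours is small."""
    mask = np.ones(depth.shape, dtype=bool)
    dx = np.abs(np.diff(depth, axis=1))  # jump to the right neighbour
    dy = np.abs(np.diff(depth, axis=0))  # jump to the lower neighbour
    mask[:, :-1] &= dx <= rel_tol * depth[:, :-1]
    mask[:-1, :] &= dy <= rel_tol * depth[:-1, :]
    return mask
```

Pixels where the mask is false mark depth discontinuities, which a later clustering step can treat as barriers between candidate plane regions.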

According to various embodiments, in the case where the deep neural network of FIG. 4 includes the normal estimation branch, detecting plane regions in the input image by performing region segmentation using the calculated three-dimensional points and the information of the depth-continuous regions may include: calculating a normal map of the input image using the calculated three-dimensional points; fusing the calculated normal map with the normal map obtained using the normal estimation branch; and clustering, using the fused normal map and the information of the depth-continuous regions, to segment the plane regions. This operation may be referred to as a "plane region segmentation operation".

FIG. 10 is a schematic diagram illustrating an example of a plane region segmentation operation according to an exemplary embodiment of the present disclosure. An example of the plane region segmentation operation is described below with reference to FIG. 10. Referring to FIG. 10, the plane region segmentation result may be obtained by, first, using the depth map output by the deep neural network to calculate the three-dimensional points and depth-continuous regions in the input image for plane estimation, and, second, calculating a normal map of the input image from the calculated three-dimensional points and fusing the calculated normal map with the normal map estimated by the deep neural network (for example, the normal map estimated by the normal estimation branch).

For example, as shown in FIG. 10, the following fusion method may be adopted. The fusion method may include: obtaining an average normal map from the calculated normal map and the normal map estimated by the depth estimation network, and calculating information of normal-consistent regions; and clustering, using the fused normal map, the information of the normal-consistent regions, and the information of the depth-continuous regions, to segment the plane regions.

The fusion of the calculated normal map and the normal map estimated by the deep neural network is not limited to the example method above; other fusion methods, such as calculating a weighted average normal map, may be used.
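The normal computation and the (weighted) average fusion can be sketched as follows. The details are assumptions made for illustration: normals are taken as the cross product of image-space gradients of the point map, both normal maps are assumed to use the same orientation convention, and a "normal-consistent" pixel is interpreted here as one where the two maps agree within `max_angle_deg`, a hypothetical threshold.

```python
import numpy as np

def normals_from_points(points):
    """Per-pixel unit normals of an (H, W, 3) point map."""
    du = np.gradient(points, axis=1)  # change along image columns
    dv = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(du, dv)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, 1e-12)

def fuse_normals(computed, predicted, weight=0.5):
    """Weighted average of two unit-normal maps, renormalized to unit length."""
    fused = weight * computed + (1.0 - weight) * predicted
    norm = np.linalg.norm(fused, axis=-1, keepdims=True)
    return fused / np.maximum(norm, 1e-12)

def normal_consistent_mask(a, b, max_angle_deg=10.0):
    """True where the angle between the two normal maps is below the threshold."""
    cos = np.sum(a * b, axis=-1)
    return cos >= np.cos(np.deg2rad(max_angle_deg))
```

With `weight=0.5` this is the plain average normal map; other weights give the weighted average mentioned as an alternative.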

Referring back to FIG. 4, after the plane regions are detected by the plane region segmentation, the boundaries of the plane regions may optionally be refined further so that they align with the boundaries of real objects in the input image. According to various embodiments, refining the boundaries of the detected plane regions may include: obtaining a discrete label value corresponding to each detected plane region; converting the detected plane regions into a three-dimensional volume based on the discrete label values; and refining the plane regions based on the converted three-dimensional volume and the input image. According to various embodiments, the boundaries of the plane regions can thereby be aligned with the boundaries of real objects in the input image.

FIG. 11 is a schematic diagram illustrating an example of a plane refinement operation according to an exemplary embodiment of the present disclosure. As shown in FIG. 11, first, each plane region obtained by the plane region segmentation may be numbered with a discrete label value, and the labeled plane regions may then be converted into a three-dimensional volume using one-hot encoding. Finally, the boundaries of the plane regions are refined by applying an edge-preserving optimization algorithm (such as a bilateral solver algorithm) to the converted three-dimensional volume and the input color image, so that they align with the boundaries of the real objects in the scene.

For example, the edge-preserving optimization of the three-dimensional volume may be performed layer by layer. After optimization, a weight is obtained for each pixel with respect to each of the different planes, and for each pixel the label with the largest weight is selected as that pixel's refined plane label, thereby determining the plane to which the pixel belongs.
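The one-hot conversion, the layer-by-layer optimization, and the final label selection can be sketched as below. Note the loud assumption: `box_smooth` is only a stand-in 3x3 mean filter that ignores the color image. It is not a bilateral solver and is not edge-preserving; a real implementation would pass an image-guided, edge-preserving solver as `smooth`. All function names are hypothetical.

```python
import numpy as np

def one_hot_volume(labels, num_planes):
    """(H, W) integer labels -> (num_planes, H, W) volume, one layer per plane."""
    return (labels[None, :, :] == np.arange(num_planes)[:, None, None]).astype(float)

def box_smooth(layer):
    """Stand-in smoother (NOT edge-preserving): 3x3 mean filter with edge padding."""
    p = np.pad(layer, 1, mode="edge")
    h, w = layer.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def refine_labels(labels, num_planes, smooth=box_smooth):
    """Smooth each layer separately, then keep the label with the largest weight."""
    volume = one_hot_volume(labels, num_planes)
    weights = np.stack([smooth(layer) for layer in volume])
    return np.argmax(weights, axis=0)
```

Working on one layer per plane keeps the labels categorical: the smoother never averages label numbers, only per-plane membership weights, and the argmax turns those weights back into a single label per pixel.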

The plane detection method according to an exemplary embodiment of the present disclosure has been described above with reference to FIGS. 4 to 11. However, the plane detection method of the present disclosure is not limited to the exemplary embodiment described above.

FIG. 12 is a schematic diagram illustrating a plane detection method according to another exemplary embodiment of the present disclosure.

Referring to FIG. 12, the deep neural network (which may also be referred to as a depth estimation network) may include a feature extractor for extracting features of the input image and a depth estimation branch (also referred to as a depth branch) for estimating depth information of the input image. According to various embodiments, the deep neural network may omit the normal estimation branch (also referred to as a normal branch) for estimating normal information of the input image, such as the one shown in FIG. 5.

Because the deep neural network used in FIG. 12 includes only the feature extractor and the depth estimation branch and does not include the normal estimation branch, the amount of computation is smaller and the depth information can be obtained in real time, which is advantageous for applications on mobile devices.

FIG. 13 is a schematic diagram illustrating operations of the deep neural network according to an exemplary embodiment of the present disclosure.

As shown in FIG. 13, a feature fusion module may be included in the depth estimation branch. According to various embodiments, when estimating the depth map of the input image, a feature map of a predetermined resolution (for example, a feature map at 1/16 resolution) obtained by extracting features from the input image with the feature extractor may be fused with a depth feature map of the same resolution generated during depth estimation by the depth estimation branch, and the depth map may be generated using the fused depth feature map. Fusing the feature maps can improve the accuracy and detail of the final depth map and provide a more accurate plane detection result.

For example, the feature map generated by the feature extractor is fused with the depth feature map using the method shown in FIG. 7 to obtain the fused depth feature map, and the final depth map is generated using the fused depth feature map.

The depth map may then be used to detect planes through plane region segmentation.

FIG. 14 is a schematic diagram illustrating a plane region segmentation operation according to another exemplary embodiment of the present disclosure.

Referring to FIG. 14, the depth map may be used to calculate three-dimensional points and depth-continuous regions in the input image for plane estimation, and the calculated three-dimensional points and the information of the depth-continuous regions may be used to perform region segmentation and detect the plane regions of the input image.

For example, as shown in FIG. 14, detecting the plane regions of the input image by performing region segmentation using the calculated three-dimensional points and the information of the depth-continuous regions may include: calculating a normal map of the input image using the calculated three-dimensional points; and clustering, using the calculated normal map and the information of the depth-continuous regions, to segment the plane regions.
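The disclosure does not name a specific clustering algorithm. One simple possibility, shown here purely as an assumed illustration, is seeded region growing over 4-connected pixels: a pixel joins a region if it lies in a depth-continuous area and its normal is within `max_angle_deg` of the seed normal. The function name, both thresholds, and the use of -1 for unassigned pixels are hypothetical.

```python
import numpy as np
from collections import deque

def segment_planes(normals, continuous, max_angle_deg=10.0, min_pixels=4):
    """Grow regions of similar normals; returns (H, W) labels, -1 = unassigned."""
    h, w, _ = normals.shape
    cos_thr = np.cos(np.deg2rad(max_angle_deg))
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1 or not continuous[sy, sx]:
                continue
            seed = normals[sy, sx]
            labels[sy, sx] = next_label
            region = [(sy, sx)]
            queue = deque(region)
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and continuous[ny, nx]
                            and normals[ny, nx] @ seed >= cos_thr):
                        labels[ny, nx] = next_label
                        region.append((ny, nx))
                        queue.append((ny, nx))
            if len(region) < min_pixels:
                for y, x in region:
                    labels[y, x] = -2  # too small: mark as visited, drop later
            else:
                next_label += 1
    labels[labels == -2] = -1
    return labels
```

This sketch compares normals only. Two parallel surfaces at different distances have the same normal, so a fuller version would also compare the plane offset computed from the three-dimensional points; here the depth-continuity mask is what keeps such surfaces apart.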

Referring back to FIG. 13, the plane detection method according to another exemplary embodiment of the present disclosure may, after the plane regions are detected through plane region segmentation, further refine the boundaries of the detected plane regions so that they align with the boundaries of real objects in the input image. According to various embodiments, a plane refinement technique based on image segmentation may be used, but the disclosure is not limited thereto.

For example, the plane refinement method described above with reference to FIG. 11 may also be used. Likewise, the image-segmentation-based refinement method described below may be applied to the plane detection method shown in FIG. 4 to refine the boundaries of the plane regions. For example, refining the boundaries of the detected plane regions may include: obtaining region information corresponding to each pixel of the input image based on the detected plane regions; obtaining plane weight information for each pixel of a 4-channel image, composed of the input image and a two-dimensional single-channel image made up of the region information, based on the shortest distance on the two-dimensional single-channel image between that pixel and the boundaries of the detected plane regions; determining the similarity between pixels based on the pixel values, the region information, and the plane weight information corresponding to each pixel; and performing image segmentation based on the similarity between pixels to obtain the refined plane region boundaries.

According to various embodiments, a numeric label may be assigned to each plane region, and the image made up of the plane region numbers serves as a fourth channel of the color image. The 4-channel image is segmented, and the segmentation result is used to refine the boundaries of the plane regions. Through this plane refinement operation, the detected planes can be aligned with the boundaries of the real objects.

The plane refinement based on image segmentation is described in detail below.

First, discrete label values may be used to number each plane region output by the plane segmentation module, and the two-dimensional single-channel image made up of the plane region numbers is combined with the scene color image to form the 4-channel image. The value of each pixel of the 4-channel image is then (R, G, B, P), where R is the red channel, G is the green channel, B is the blue channel, and P is the channel made up of the plane region label numbers. The P value reflects the region information corresponding to each pixel. A plane weight map is calculated from the two-dimensional single-channel image made up of the plane region numbers.

The weight value of each pixel in the weight map is proportional to the shortest distance, in the two-dimensional single-channel image made up of the plane region numbers, between that pixel and the boundaries of the plane regions. In this way, the plane weight information of each pixel of the 4-channel image can be obtained from the shortest distance, in the two-dimensional single-channel image made up of the plane region numbers, between that pixel and the boundaries of the detected plane regions.
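Building the 4-channel image and the plane weight map can be sketched as follows. The helper names, the proportionality constant `scale`, and the choice of a 4-connected (city-block) distance computed by breadth-first search are assumptions; the disclosure only states that the weight is proportional to the shortest distance to the plane region boundaries.

```python
import numpy as np
from collections import deque

def four_channel_image(rgb, labels):
    """Stack an (H, W, 3) color image and (H, W) plane labels into (H, W, 4)."""
    return np.concatenate(
        [rgb.astype(float), labels[..., None].astype(float)], axis=-1)

def boundary_mask(labels):
    """True at pixels that have a 4-neighbour with a different plane label."""
    b = np.zeros(labels.shape, dtype=bool)
    diff_h = labels[:, 1:] != labels[:, :-1]
    diff_v = labels[1:, :] != labels[:-1, :]
    b[:, 1:] |= diff_h
    b[:, :-1] |= diff_h
    b[1:, :] |= diff_v
    b[:-1, :] |= diff_v
    return b

def plane_weight_map(labels, scale=1.0):
    """Weight = scale * city-block distance to the nearest boundary pixel."""
    h, w = labels.shape
    dist = np.full((h, w), -1, dtype=int)
    queue = deque()
    for y, x in zip(*np.nonzero(boundary_mask(labels))):
        dist[y, x] = 0
        queue.append((y, x))
    while queue:  # multi-source breadth-first search from all boundary pixels
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny, nx] == -1:
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    dist[dist == -1] = h + w  # no boundary anywhere: use a large distance
    return scale * dist.astype(float)
```

Pixels deep inside a plane region get large weights and pixels next to a boundary get small ones, which lets a later segmentation step trust the initial labels most where they are least likely to be wrong.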

Finally, image segmentation is performed based on a pixel similarity function using an image segmentation algorithm (such as the Efficient Graph-Based Image Segmentation algorithm) to obtain the refined plane region boundaries. The pixel similarity function is defined by Equation 3 below:

[Equation 3]

Here, p1 and p2 are the two pixels whose similarity is to be calculated; the remaining terms are the pixel values of p1 and p2 in the 4-channel image, the weight values of p1 and p2 in the plane weight map, and f(p1, p2), which is a plane distance function (or plane distance metric function).

Some exemplary embodiments of the plane detection method of the present disclosure have been described above. Because the plane detection method proposed in the present disclosure detects planes based on a depth map of the entire input image, planes can be detected in texture-less regions. Building on this, the accuracy of the plane regions can be effectively improved by the feature map fusion module and/or the normal-guided attention module described above, and, by additionally performing the plane refinement operation, the boundaries of the detected plane regions can be aligned with the boundaries of real objects in the input image.

Specifically, AR applications often need to place virtual objects in a real scene, but real scenes contain many texture-less regions (such as solid-color walls and desktops). The plane detection method according to the present disclosure supports detecting planes in such texture-less regions, so that a user can place virtual objects there, which satisfies the user's needs and improves the user experience.

In addition, in AR games, for example, virtual objects often need to interact with real objects. The plane detection method according to the present disclosure can provide a plane detection result aligned with the boundaries of the real objects, which improves the accuracy of the interaction between virtual objects and real objects and improves the game experience.

The plane detection method of the present disclosure may be applied to AR glasses, smartphones, or other AR terminals. It may also be applied to applications such as navigation, exhibitions, training, and games.

FIG. 15 is a block diagram illustrating an apparatus for performing plane detection according to the present disclosure (hereinafter referred to as a "plane detection apparatus" for convenience of description).

Referring to FIG. 15, the plane detection apparatus 1500 may include an image acquisition unit 1510, an estimation unit 1520, and a region segmentation unit 1530. Specifically, the image acquisition unit 1510 may be configured to acquire an input image. The estimation unit 1520 may be configured to extract features of the input image using a deep neural network and to estimate a depth map of the input image based on the extracted features. The region segmentation unit 1530 may be configured to perform region segmentation using the depth map to detect plane regions in the input image. For example, the region segmentation unit 1530 uses the depth map to calculate three-dimensional points and depth-continuous regions in the input image for plane estimation, and uses the calculated three-dimensional points and the information of the depth-continuous regions to perform the region segmentation and detect the plane regions in the input image.

According to various embodiments, the plane detection apparatus 1500 may further include a plane boundary refinement unit (not shown), which may refine the boundaries of the detected plane regions so that they align with the boundaries of real objects in the input image.
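The division into units can be pictured as a small pipeline object. The sketch below is only an assumed illustration of how the units hand data to one another; the class name `PlaneDetector`, its field names, and the callable signatures are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class PlaneDetector:
    """Mirrors units 1510, 1520, 1530 and the optional boundary refinement unit."""
    acquire: Callable[[], Any]                       # image acquisition unit
    estimate_depth: Callable[[Any], Any]             # estimation unit (deep neural network)
    segment: Callable[[Any, Any], Any]               # region segmentation unit
    refine: Optional[Callable[[Any, Any], Any]] = None  # plane boundary refinement unit

    def detect(self) -> Any:
        image = self.acquire()
        depth = self.estimate_depth(image)
        regions = self.segment(image, depth)
        if self.refine is not None:
            # Optional step: align region boundaries with object boundaries.
            regions = self.refine(image, regions)
        return regions
```

Passing the units in as callables keeps the sketch independent of how each one is realized, whether in software, hardware, or a combination.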

Because the content and details of the plane detection operations according to the present disclosure have been described above, a detailed description is omitted here. For the corresponding content and details, refer to the description of FIGS. 3 to 14.

The plane detection method and the plane detection apparatus according to embodiments of the present disclosure have been described above with reference to FIGS. 1 to 15. Each unit of the apparatus shown in FIG. 15 may be implemented as software, hardware, firmware, or any combination thereof that performs the specified function. For example, these units may correspond to dedicated integrated circuits, to pure software code, or to modules combining software and hardware. For example, the apparatus described with reference to FIG. 15 may be, but is not limited to, a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or another device capable of executing program instructions.

In addition, although the plane detection apparatus 1500 is described above as being divided into units that perform the corresponding processing, it will be apparent to those skilled in the art that the processing performed by each unit may also be performed when the plane detection apparatus is not divided into specific units or when there is no clear boundary between the units.

The apparatus described with reference to FIG. 15 is not limited to including the units described above; other units (for example, a storage unit, a data processing unit, etc.) may be added as needed, or the units described above may be combined.

도 16에 도시되어 있는 바와 같이, 본 개시에 의해 제안되는 증강 현실에 적합한 시맨틱-인식(semantic-aware) 평면 검출 시스템은 3개의 파트들을 포함할 수 있다. As shown in FIG. 16 , a semantic-aware plane detection system suitable for augmented reality proposed by the present disclosure may include three parts.

첫 번째 파트는 장명(또는 신) 정보 획득을 위한 심층 신경 네트워크이고, 두 번째 파트는 평면 영역 세그먼테이션 모듈이고, 세 번째 파트는 평면 경계 미세 조정 모듈이다.The first part is a deep neural network for acquiring long life (or new) information, the second part is a planar region segmentation module, and the third part is a plane boundary fine-tuning module.

제 1 구현 방식에서는, 상기 심층 신경 네트워크가 상기 깊이 정보와 노말 정보를 추정하기 위해 사용된다. 노말-가이드 어텐션 모듈이 이 모듈의 네트워크 구조에서 설계되고, 이는 상기 노말 특징 지도에서의 고주파 정보를 사용하여 상기 경계들에서 상기 추정된 깊이 지도를 보다 샤프하게(sharp) 할 수 있고, 따라서 이는 상기 평면 검출을 위한 더 정확한 밀집한 3차원 포인트 클라우드를 제공할 수 있고, 상기 평면 영역 세그먼테이션 모듈은 상기 노말 정보를 포함하는 상기 밀집한 3차원 포인트 클라우드에 대해 클러스터링 및 세그먼테이션을 수행하고, 따라서 보다 정확하고 강인한 평면 영역을 획득할 수 있다. 상기 평면 경계 미세 조정 모듈은 상기 에지-보존 최적화 알고리즘을 사용하여 상기 현실 오브젝트들의 경계들과 정렬하기 위해 상기 획득된 평면 영역들을 미세조정하고, 따라서 시맨틱-인식 평면을 획득할 수 있다.In a first implementation manner, the deep neural network is used to estimate the depth information and the normal information. A normal-guided attention module is designed in the network structure of this module, which can use the high-frequency information in the normal feature map to sharpen the estimated depth map at the boundaries more, thus it It is possible to provide a more accurate dense three-dimensional point cloud for plane detection, and the plane region segmentation module performs clustering and segmentation on the dense three-dimensional point cloud including the normal information, and thus a more accurate and robust plane area can be obtained. The plane boundary fine-tuning module may use the edge-preserving optimization algorithm to fine-tune the obtained plane regions to align with the boundaries of the real objects, and thus obtain a semantic-aware plane.

In a second implementation, the input image is first input to the depth estimation network to obtain depth information for the entire image; the obtained depth map is then input to the plane region segmentation module to obtain a plane region segmentation result; finally, the boundaries of the plane regions are better aligned with the boundaries of real objects using the plane fine-tuning method based on image segmentation. Here, the depth estimation network provides sufficient information for the subsequent plane region segmentation module, so plane regions can be computed over the entire image, including texture-free regions. The plane region segmentation module operates together with the subsequent plane fine-tuning module based on image segmentation so that the boundaries of the detected plane regions are better aligned with the boundaries of real objects in the scene, and the plane fine-tuning algorithm is more efficient.
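As a non-limiting illustration of how an estimated depth map could feed the plane region segmentation module, the following minimal sketch back-projects a depth map into three-dimensional points and derives a per-pixel normal from neighbouring points. It is a sketch under stated assumptions rather than the method of the disclosure: a pinhole camera model is assumed, the intrinsics `fx`, `fy`, `cx`, `cy` and all function names are hypothetical, and normals are taken from the cross product of the right and down neighbour differences.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (rows of depths) into a grid of 3D points using an
    assumed pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normal_map(points):
    """Unit normal per pixel from the right and down neighbours.
    The last row and column have no such neighbours and are left as None."""
    h, w = len(points), len(points[0])
    normals = [[None] * w for _ in range(h)]
    for v in range(h - 1):
        for u in range(w - 1):
            p = points[v][u]
            n = _cross(_sub(points[v][u + 1], p), _sub(points[v + 1][u], p))
            length = math.sqrt(sum(c * c for c in n))
            if length > 0.0:
                normals[v][u] = tuple(c / length for c in n)
    return normals


# Usage: a fronto-parallel surface at constant depth.
depth = [[2.0] * 3 for _ in range(3)]
points = backproject(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
normals = normal_map(points)
print(points[0][0], normals[0][0])
```

Pixels whose normals are similar and which lie in the same depth-continuous region could then be grouped by a clustering step to form candidate plane regions; the choice of clustering algorithm is left open here.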

In addition, the plane detection method according to the present disclosure may be recorded on a computer-readable recording medium. Specifically, according to the present disclosure, a computer-readable recording medium may be provided that records program instructions which, when executed by a processor, cause the processor to execute the plane detection method described above. Examples of the computer-readable recording medium include magnetic media (for example, a hard disk, a floppy disk, and magnetic tape), optical media (for example, a CD-ROM and a DVD), magneto-optical media (for example, an optical disk), and hardware devices specially configured to store and execute the program instructions (for example, read-only memory (ROM), random access memory (RAM), and flash memory). In addition, according to the present disclosure, an electronic device including a processor and a memory storing program instructions may be provided, wherein the program instructions, when executed by the processor, cause the processor to execute the plane detection method described above. Examples of the program instructions include machine code generated by a compiler and files containing high-level code that can be executed by a computer using an interpreter.

In addition, some operations of the plane detection method according to an exemplary embodiment of the present application may be implemented in software and some in hardware, and these operations may also be implemented by a combination of software and hardware.

The present disclosure also provides an electronic device including a processor and a memory storing program instructions, wherein the program instructions, when executed by the processor, cause the processor to execute the plane detection method of the present disclosure. For example, the electronic device may be a personal computer (PC), a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. Here, the electronic device need not be a single electronic device; it may be an aggregation of devices or circuits capable of executing the above instructions (or set of instructions) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device interconnected locally or remotely (for example, via wireless transmission) through an interface.

While the present disclosure has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the present disclosure as defined by the appended claims.

Claims

A method for performing plane detection, comprising:
acquiring an input image;
extracting features of the input image using a deep neural network, and estimating a depth map of the input image based on the extracted features; and
detecting planar regions in the input image by performing region segmentation using the depth map.

The method of claim 1,
wherein the deep neural network includes a feature extractor for extracting the features of the input image, a depth estimation branch for estimating depth information of the input image, and a normal estimation branch for estimating normal information of the input image, and
wherein, when estimating the depth map of the input image, the depth information estimated by the depth estimation branch is optimized using the normal information estimated by the normal estimation branch.

The method of claim 2,
wherein, when estimating the depth map of the input image, a feature map of a predetermined resolution obtained by feature extraction of the input image using the feature extractor is fused with a depth feature map of the same resolution generated during depth estimation using the depth estimation branch and with a normal feature map of the same resolution generated during normal estimation using the normal estimation branch, and the depth map is obtained using the fused depth feature map and the fused normal feature map.

The method of claim 2,
wherein optimizing the depth information estimated by the depth estimation branch using the normal information estimated by the normal estimation branch includes:
extracting, from the normal feature map, information related to a region in which the normal feature change exceeds a predetermined degree, and optimizing the depth feature map using the extracted information to obtain an optimized depth feature map.

The method of claim 4,
wherein extracting, from the normal feature map, the information related to the region in which the normal feature change exceeds the predetermined degree, and optimizing the depth feature map using the extracted information, includes:
performing horizontal depth convolution and vertical depth convolution on the normal feature map, respectively, and obtaining a horizontal attention map and a vertical attention map for the information using an activation function; and
obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map.

The method of claim 5,
wherein obtaining the optimized depth feature map based on the horizontal attention map, the vertical attention map, and the depth feature map includes:
assigning weights to the horizontal attention map and the vertical attention map; and
fusing the weighted horizontal attention map and the weighted vertical attention map with the depth feature map to obtain the optimized depth feature map.

The method of claim 1,
wherein detecting the planar regions in the input image by performing the region segmentation using the depth map includes:
calculating, using the depth map, three-dimensional points and depth-continuous regions of the input image for plane estimation, and detecting the planar regions in the input image by performing the region segmentation using the calculated three-dimensional points and information on the depth-continuous regions.

The method of claim 7,
wherein detecting the planar regions in the input image by performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions includes:
calculating a normal map of the input image using the calculated three-dimensional points, and fusing the calculated normal map with a normal map estimated by the deep neural network; and
performing clustering to segment the planar regions using the fused normal map and the information on the depth-continuous regions.

The method of claim 7,
wherein detecting the planar regions in the input image by performing the region segmentation using the calculated three-dimensional points and the information on the depth-continuous regions includes:
calculating a normal map of the input image using the calculated three-dimensional points; and
performing clustering to segment the planar regions using the calculated normal map and the information on the depth-continuous regions.

The method of claim 9,
wherein the deep neural network includes a feature extractor for extracting the features of the input image and a depth estimation branch for estimating depth information of the input image, and
wherein, when estimating the depth map of the input image, a feature map of a predetermined resolution obtained by feature extraction of the input image using the feature extractor is fused with a depth feature map of the same resolution generated during depth estimation using the depth estimation branch, and the depth map is generated using the fused depth feature map.

The method of claim 1,
further comprising fine-tuning the boundaries of the detected planar regions so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

The method of claim 11,
wherein fine-tuning the boundaries of the detected planar regions includes:
acquiring a discrete label value corresponding to each of the detected planar regions;
converting each detected planar region into a three-dimensional volume based on the corresponding discrete label value; and
fine-tuning the planar regions based on the converted three-dimensional volume and the input image so that the boundaries of the planar regions are aligned with the boundaries of real objects in the input image.

The method of claim 11,
wherein fine-tuning the boundaries of the detected planar regions includes:
obtaining region information corresponding to each pixel in the input image based on each detected planar region;
obtaining plane weight information of each pixel based on the shortest distance, on a two-dimensional single-channel image, between the pixel and the boundaries of each detected planar region; and
determining similarity between pixels based on the pixel value, the region information, and the plane weight information corresponding to each pixel, and performing image segmentation based on the similarity between the pixels to obtain fine-tuned planar region boundaries.

An apparatus for performing plane detection, comprising:
an image acquiring unit, configured to acquire an input image;
an estimation unit configured to extract features of the input image using a deep neural network and to estimate a depth map of the input image based on the extracted features; and
a region segmentation unit configured to detect planar regions in the input image by performing region segmentation using the depth map.

An electronic device comprising a processor and a memory storing program instructions,
wherein the program instructions, when executed by the processor, cause the processor to perform the method of claim 1.

A computer-readable recording medium storing program instructions,
wherein the program instructions, when executed by a processor, cause the processor to perform the method of claim 1.