KR102494552B1

KR102494552B1 - a Method for Indoor Reconstruction

Info

Publication number: KR102494552B1
Application number: KR1020170183212A
Authority: KR
Inventors: 윤국진; 이정균
Original assignee: 광주과학기술원
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2023-02-03
Also published as: KR20190080641A

Abstract

Disclosed is a method for 3D indoor reconstruction using video captured by a depth camera. A 3D indoor reconstruction method according to an embodiment of the present invention includes: acquiring a video of an indoor space captured with a depth camera; creating initial data for 3D reconstruction from the acquired video; estimating a layout from the initial data; and performing global 3D data registration using the estimated layout.

Description

Indoor reconstruction method {a Method for Indoor Reconstruction}

The present invention relates to a technique for reconstructing an indoor space in three dimensions. Specifically, it relates to an indoor 3D reconstruction technique that uses joint layout estimation and global multi-view registration.

The background art of the present invention is technology for reconstructing an indoor space in 3D using a low-cost depth camera, which includes simultaneous localization and mapping (SLAM) and 3D reconstruction.

Much research has been done on SLAM and 3D reconstruction using RGB cameras. With RGB images, however, algorithms can fail in scenes that lack texture, and distance measurements are inaccurate.

As low-cost depth cameras have recently become commercially available, 3D reconstruction research that uses them to overcome these drawbacks has been actively pursued.

Prior Art 1 is the most representative work on 3D reconstruction with a depth camera. It continuously estimates the camera pose using the iterative closest point (ICP) algorithm and uses a truncated signed distance function (TSDF) to project the depth of each pixel in every frame into a 3D volume, obtaining the 3D reconstruction result in real time.

However, Prior Art 1 is vulnerable to the accumulated error (drift error) that arises in camera pose estimation, and GPU memory limits restrict it to reconstructing small spaces. Various studies have therefore been conducted to solve these problems.

Nevertheless, there are still cases in which existing algorithms do not work well, for example when a loop closure is missed or falsely detected. (Loop closure is an algorithm that detects when the camera returns to a previously captured place and uses this to minimize the accumulated error in camera pose estimation.) The following therefore describes a method that solves these problems of the existing technology and improves the performance of indoor 3D reconstruction.

(Prior Art 1) R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
(Prior Art 2) S. Choi, Q.-Y. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In CVPR, 2015.
(Prior Art 3) R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature histograms (FPFH) for 3D registration. In ICRA, 2009.
(Prior Art 4) Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. TPAMI, 23(11):1222-1239, 2001.
(Prior Art 5) R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In ICRA, 2011.
(Prior Art 6) M. Iwanowski. Morphological boundary pixel classification. In International Conference on "Computer as a Tool", 2007.
(Prior Art 7) Y. Chen and G. Medioni. Object modeling by registration of multiple range images. Image and Vision Computing, 10(3):145-155, 1992.
(Prior Art 8) T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. In RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
(Prior Art 9) T. Whelan, S. Leutenegger, R. S. Moreno, B. Glocker, and A. Davison. ElasticFusion: Dense SLAM without a pose graph. In RSS, 2015.
(Prior Art 10) J. Xiao, A. Owens, and A. Torralba. SUN3D: A database of big spaces reconstructed using SfM and object labels. In ICCV, 2013.

The present invention aims to improve 3D reconstruction accuracy by estimating the layout of the indoor structure to be reconstructed and exploiting it.

A 3D indoor reconstruction method according to an embodiment of the present invention includes: acquiring a video of an indoor space captured with a depth camera; creating initial data for 3D reconstruction from the acquired video; estimating a layout from the initial data; and performing global 3D data registration using the estimated layout.

The indoor reconstruction method according to an embodiment of the present invention solves the problems of vulnerability to the accumulated error in camera pose estimation and of being limited to small spaces by GPU memory, and can thereby improve the performance of indoor 3D reconstruction.

FIG. 1 shows the overall flow of an indoor 3D reconstruction algorithm according to an embodiment of the present invention.
FIG. 2 shows plane clustering results obtained by hierarchical agglomerative clustering and energy-based multi-model fitting.
FIG. 3 shows the layout estimation process.
FIG. 4 is a joint optimization algorithm for layout estimation and global registration according to an embodiment of the present invention.
FIG. 5 shows the improvement in reconstruction quality resulting from iterative optimization according to the algorithm of FIG. 4.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the drawings. The spirit of the present invention is not, however, limited to the embodiments below. A person skilled in the art who understands that spirit could readily propose other embodiments within the same scope by adding, changing, or deleting components, and such embodiments also fall within the scope of the present invention.

To make the spirit of the invention easy to understand, the accompanying drawings may omit fine details when illustrating the overall structure, and may not reflect the overall structure when illustrating fine details. Elements that act in the same way are given the same name even if specifics such as installation position differ, for ease of understanding. When there are several identical components, only one is described and the same description applies to the others.

As described above, to solve the existing problems, the present invention estimates the layout of the indoor structure to be reconstructed and uses it to improve 3D reconstruction accuracy. That is, the interior is reconstructed under the constraint that the floor and ceiling are parallel to each other and perpendicular to the walls.

FIG. 1 shows the overall flow of an indoor 3D reconstruction algorithm according to an embodiment of the present invention.

Each step of the algorithm shown in FIG. 1 may be implemented by a module, named respectively the initial registration unit, the layout estimation unit, and the layout-based global registration unit. Each step may also be performed by an application in which the step is programmed, or by a system on which the application is installed. Here, a system means a hardware device on which an application can be installed and run; the modules above may all be provided in one hardware device or distributed across several different hardware devices.

In brief, the indoor 3D reconstruction algorithm according to an embodiment of the present invention first uses an existing algorithm (Prior Art 2) to build 3D volume data for each part of the interior and performs global 3D data registration (Global ICP), which yields 3D reconstruction data that is incomplete because of accumulated error. Layout estimation and layout-constrained Global ICP are then applied repeatedly to this incomplete data to obtain refined data. The global 3D data registration method is described step by step below.

1. Initial global 3D data registration (Initial registration)

Given a video of an indoor space captured with a depth camera, the algorithm of Prior Art 2 is used to create the initial data for 3D reconstruction. This algorithm consists of the three stages described below: 1) partial 3D reconstruction, 2) loop closure detection, and 3) pose graph optimization.

(1) Partial 3D reconstruction

The interior of the building is first reconstructed in parts. Specifically, the video captured with the depth camera is divided into segments of 50 frames, 3D reconstruction is performed on each segment, and the reconstructed scene fragments are registered to a 3D global coordinate system. The reason is that reconstruction from a small number of frames is reasonably accurate, so the large-scale indoor 3D reconstruction problem reduces to the problem of registering scene fragments to the global coordinate system (global registration).

To build the scene fragments ($\{F_i\}$), the KinectFusion algorithm introduced in Prior Art 1 is used. Because KinectFusion estimates the change in camera pose between consecutive frames, the coordinate-system (pose) transformation matrix $T_{i,i+1}$ between consecutive scene fragments is known, and each scene fragment can be registered to the global coordinate system through the sequential product of these matrices. Through this process, for the set of scene fragments ($\{F_i\}$), the transformation matrices ($\{T_i\}$) that register the fragments to the global coordinate system are obtained.
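The sequential product described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: it assumes each relative matrix maps coordinates of fragment i+1 into the frame of fragment i and that the first fragment defines the global frame. The helper names (`chain_to_global`, `translation`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def identity() -> Matrix:
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Hypothetical helper: a pure-translation transform used for the demo."""
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def chain_to_global(relative: List[Matrix]) -> List[Matrix]:
    """relative[i] is assumed to map fragment i+1 coordinates into fragment i.

    Returns one matrix per fragment that maps it into the frame of fragment 0.
    """
    poses = [identity()]
    for rel in relative:
        # T_{i+1} = T_i * T_{i,i+1}: errors in each factor accumulate (drift).
        poses.append(matmul(poses[-1], rel))
    return poses
```

Because every global pose is a product of all earlier relative estimates, a small error in one factor propagates to every later fragment, which is the drift that the following stages try to correct.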

(2) Loop closure detection

When global registration is performed using only the transformation matrices $T_{i,i+1}$ between consecutive scene fragments, pose estimation error accumulates and the indoor reconstruction result can be inaccurate. To compensate, loop closures are detected and the transformation matrices are optimized based on the detection result.

To detect loop closures, scene fragments are registered against each other using the FPFH descriptor (Prior Art 3). If, when a pair of scene fragments $F_i$, $F_j$ is registered, the ratio of the overlapping region is 30% or more, the pair is regarded as a successful loop closure, and the transformation matrix $T_{i,j}$ is defined from the registration result.
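The 30% overlap criterion can be illustrated with a brute-force check. The text does not say how the overlap ratio is measured, so this sketch assumes it is the fraction of points in one (already aligned) fragment that have a neighbor in the other fragment within a distance `max_dist`; that definition, the default values, and the function names are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def overlap_ratio(src: List[Point], dst: List[Point], max_dist: float) -> float:
    """Fraction of src points with at least one dst point within max_dist."""
    if not src:
        return 0.0
    hits = sum(1 for p in src if any(math.dist(p, q) <= max_dist for q in dst))
    return hits / len(src)


def is_loop_closure(src: List[Point], dst: List[Point],
                    max_dist: float = 0.05, min_overlap: float = 0.3) -> bool:
    """Accept the pair when the overlap ratio reaches the 30% threshold."""
    return overlap_ratio(src, dst, max_dist) >= min_overlap
```

A practical system would replace the quadratic search with a spatial index such as a k-d tree; the brute-force loop is kept here only for readability.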

(3) Pose graph optimization

When loop closures are detected, the pose graph, that is, the set of transformation matrices ($\{T_i\}$), is optimized. $T_i$ is defined as the matrix that transforms scene fragment $F_i$ into the global coordinate system, and $T_{i,j}$ is defined as the matrix that transforms the coordinates of the j-th scene fragment $F_j$ into the coordinate system of the i-th scene fragment. Then, by minimizing the energy function of Equation 1 below, the optimal transformation matrices ($\{T_i\}$) and the line process ($\{l_{ij}\}$) can be estimated.

Here, one function in Equation 1 measures the difference between the transformation matrix computed by partial 3D reconstruction and the transformation matrix computed from $T_a$ and $T_b$. $l_{ij}$ is the line process parameter, which takes a value between 0 and 1. A second function maximizes the number of correct (inlier) loop closures; if $l_{ij}$ is at or above a certain threshold, the i-th and j-th fragments form a correct loop closure.

2. Layout estimation

Layout estimation is the problem of finding the planes that represent the interior. To estimate the layout of a room composed of a set of planes such as the floor, ceiling, and walls, the dominant planes ($P_{dominant}$) are estimated first, and the set of layout planes ($P_{layout}$) is then selected from the dominant planes.

(1) Dominant plane extraction

FIG. 2 shows plane clustering results obtained by hierarchical agglomerative clustering and energy-based multi-model fitting.

First, each scene fragment ($F_i$) is divided into a set of supervoxels ($\{S_k\}$). Three adjacent supervoxels are then selected at random, and a plane parameter hypothesis is computed from their center points. This is faster than computing the plane parameters from all points of the selected supervoxels. A region with a large planar surface, such as a wall, yields many similar plane parameters, which need to be clustered and represented by a single plane parameter. To this end, the present invention proposes extracting the dominant planes through two stages of clustering.
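Computing a plane hypothesis from three supervoxel centers amounts to a cross product. The sketch below assumes a plane is stored as `(a, b, c, d)` with unit normal and `a*x + b*y + c*z + d = 0`; that representation and the function name are assumptions, since the plane parameterization used in the original equations is not reproduced in this text.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (a, b, c, d): a*x + b*y + c*z + d = 0


def plane_from_centroids(p0: Vec3, p1: Vec3, p2: Vec3) -> Optional[Plane]:
    """Plane through three supervoxel centers, or None if they are collinear."""
    u = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
    v = (p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2])
    # The normal is the cross product of two in-plane edge vectors.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length < 1e-12:
        return None  # degenerate sample: no unique plane
    a, b, c = n[0] / length, n[1] / length, n[2] / length
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return (a, b, c, d)
```

Only three points are touched per hypothesis, which is why sampling centers is cheaper than fitting a plane to every point of the selected supervoxels.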

As the first clustering stage, plane parameters are merged by hierarchical agglomerative clustering (HAC). For this purpose, a function that measures the distance between an arbitrary supervoxel $S_k$ and an arbitrary plane parameter hypothesis is defined as in Equation 2 below.

Here, $|S_k|$ denotes the number of 3D points in the supervoxel, and the distance function d() is defined as in Equation 3.

Here, the plane parameter is defined together with the homogeneous coordinate of a 3D point $p$ belonging to supervoxel $S_k$, that is, $(x, y, z, 1)^\top$ for $p = (x, y, z)$. The clustered supervoxels are used to compute a single merged plane parameter.
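One plausible reading of the supervoxel-to-plane distance is the mean point-to-plane distance over the supervoxel's points. Equations 2 and 3 are not reproduced in this text, so the averaging, the `(a, b, c, d)` unit-normal plane representation, and the function names below are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # unit normal (a, b, c) and offset d


def point_plane_distance(plane: Plane, p: Vec3) -> float:
    """|plane . (x, y, z, 1)|: distance of p to the plane (unit normal assumed)."""
    a, b, c, d = plane
    return abs(a * p[0] + b * p[1] + c * p[2] + d)


def supervoxel_plane_distance(plane: Plane, points: Sequence[Vec3]) -> float:
    """Assumed form: sum of point distances divided by the number of points."""
    if not points:
        raise ValueError("supervoxel has no points")
    return sum(point_plane_distance(plane, p) for p in points) / len(points)
```

Dividing by the point count keeps large and small supervoxels comparable, so a clustering threshold on this value does not depend on supervoxel size.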

However, the reconstruction is often locally distorted near scene fragment boundaries, so parts that should be clustered into the same plane may still remain unclustered. To cluster these parts as well, the present invention proposes energy-based multi-model fitting as the second clustering stage.

Given the 3D points in a scene fragment and the plane parameter hypotheses, the problem to be solved is to find a function $h$ that maps each 3D point $p$ to a plane parameter. The energy function $E_P$ for this problem is defined as in Equation 4 below.

Here, the data term ($D_p$) measures the distance between the 3D point $p$ and a plane parameter and is defined as in Equation 5.

Here, Equation 5 contains a constant, and a special label denotes the state in which no plane parameter is assigned; this label exists so that points lying on non-planar surfaces or very noisy points can be given it. For the smoothness term $V_{p,q}$, the Potts model (Prior Art 4) is used. The Potts model applies a penalty weight whenever neighboring points $p$ and $q$ are mapped to different labels, using an indicator function T() that outputs 1 if its argument is true and 0 otherwise.

$N_p$ is the set of points neighboring point $p$ and is found using the k-nearest neighbor (k-NN) algorithm (Prior Art 5). The energy function $E_P$ is minimized using graph cuts to estimate the optimal mapping $h$.
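The structure of such a labeling energy, a data term plus a Potts smoothness term over k-NN neighbors, can be made concrete by evaluating it for a given assignment. This sketch only evaluates the energy; it does not perform the graph-cut minimization. The data cost used here (point-to-plane distance, or a constant `outlier_cost` for the unassigned label `None`) and all parameter names are assumptions, since Equations 4 and 5 are not reproduced in this text.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # unit normal (a, b, c) and offset d


def labeling_energy(points: Sequence[Vec3],
                    planes: Sequence[Plane],
                    labels: Sequence[Optional[int]],
                    neighbors: Sequence[List[int]],
                    outlier_cost: float,
                    penalty: float) -> float:
    """Data term plus Potts smoothness term for one labeling."""
    data = 0.0
    for p, lab in zip(points, labels):
        if lab is None:
            data += outlier_cost  # point left without a plane
        else:
            a, b, c, d = planes[lab]
            data += abs(a * p[0] + b * p[1] + c * p[2] + d)
    # Potts model: constant penalty when neighbors disagree.
    # Symmetric neighbor lists count each unordered pair twice.
    smooth = sum(penalty
                 for i, nbrs in enumerate(neighbors)
                 for j in nbrs
                 if labels[i] != labels[j])
    return data + smooth
```

An optimizer such as alpha-expansion would search over `labels` to minimize this value; the unassigned label lets noisy points pay a fixed cost instead of dragging a plane toward them.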

As a result, a small number of merged plane parameters is obtained. Transforming the plane parameters estimated in each fragment $F_i$ into the global coordinate system and projecting them gives the result shown in (a) of FIG. 2.

Next, to merge the planes projected into the global coordinate system as in (a) and represent them as dominant planes as in (b), the HAC algorithm is applied again. In one embodiment, the merged planes of (a) may be obtained using the energy-based multi-model fitting of Equation 4, and the dominant planes of (b) may be obtained using the HAC algorithm of Equation 2.

(2) Layout plane estimation

FIG. 3 shows the layout estimation process.

Given the dominant planes ($P_{dominant}$) estimated above and the clustered point cloud, the layout planes, including the floor, ceiling, and walls, can be estimated. In one embodiment, the weak Manhattan world assumption may be used for layout plane estimation. Under this assumption, planes that are nearly perpendicular are forced to be exactly perpendicular during layout plane estimation, which improves the accuracy of the layout estimate.

As shown in FIG. 3, the present invention finds the set of layout planes ($P_{layout}$) in two steps. The first step finds the base plane: among the dominant planes assumed to be the ceiling or the floor, the one that covers the largest area. As illustrated under occupancy grid generation in FIG. 3, an occupancy grid map parallel to each plane is created, and the points belonging to that plane are projected onto the map. The occupied cells are then counted, and the plane that occupies the most cells is chosen as the base plane.
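The base-plane selection can be sketched by rasterizing each candidate plane's points and counting occupied cells. The sketch assumes the points have already been expressed as 2D coordinates within their own plane; the cell size and the names `occupied_cells` and `pick_base_plane` are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point2 = Tuple[float, float]


def occupied_cells(points_2d: List[Point2], cell: float) -> Set[Tuple[int, int]]:
    """Grid cells that contain at least one projected point."""
    return {(math.floor(x / cell), math.floor(y / cell)) for x, y in points_2d}


def pick_base_plane(plane_points: Dict[str, List[Point2]], cell: float = 0.1) -> str:
    """Name of the candidate plane whose points cover the most grid cells."""
    return max(plane_points,
               key=lambda name: len(occupied_cells(plane_points[name], cell)))
```

Counting cells rather than points measures covered area, so a small but densely scanned surface does not outrank a large, sparsely scanned floor.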

The second step finds the planes perpendicular to the base plane, that is, the walls. First, an occupancy grid map parallel to the base plane is formed, and all points of all scene fragments, transformed into the global coordinate system, are projected onto it. If the number of points projected into a cell is at or above a certain threshold, the cell is considered occupied. Applying a morphological boundary detection method (Prior Art 6) to this occupancy grid map gives the boundary of the occupied cells, that is, the positions corresponding to the bounding surfaces. Finally, a plane for which the value computed by Equation 6 is 1 is determined to be a layout plane.
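The thresholded occupancy grid and its boundary can be sketched as below. The boundary rule used here, an occupied cell with at least one unoccupied 4-neighbor (equivalent to the grid minus its erosion by a cross-shaped element), is a common morphological definition and stands in for the method of Prior Art 6, whose details are not given in this text. Function names and parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def occupancy(points_2d: Iterable[Tuple[float, float]],
              cell: float, min_count: int) -> Set[Cell]:
    """Cells that receive at least min_count projected points."""
    counts = Counter((math.floor(x / cell), math.floor(y / cell))
                     for x, y in points_2d)
    return {c for c, n in counts.items() if n >= min_count}


def boundary(occupied: Set[Cell]) -> Set[Cell]:
    """Occupied cells that touch an unoccupied cell in the 4-neighborhood."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {(i, j) for (i, j) in occupied
            if any((i + di, j + dj) not in occupied for di, dj in steps)}
```

For a room scanned from the inside, the boundary cells trace the outline of the floor plan, which is where wall planes are expected to lie.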

Here, the two normal vectors in Equation 6 are those of the selected dominant plane and of the base plane. The first criterion checks that the two planes are perpendicular, and the second measures the distance between the dominant plane and the boundary of the occupancy grid. The two constants in Equation 6 are parameters set by the user. The distance function g() is defined as in Equation 7 below.

Here, $p_{proj}$ denotes the projection of point $p$ onto the base plane, and the remaining symbols denote the set of points belonging to the plane and the number of points in that set.

3. Layout-constrained global registration

Global registration means reconstructing the interior in 3D by registering the set of scene fragments to the global coordinate system using the layout estimated above. Because the layout estimation problem and the global registration problem depend on each other, an embodiment of the present invention proposes a joint approach that solves the two problems alternately. This is described in detail below.

FIG. 4 is a joint optimization algorithm for layout estimation and global registration according to an embodiment of the present invention.

FIG. 5 shows the improvement in reconstruction quality resulting from iterative optimization according to the algorithm of FIG. 4.

Here, a set of scene fragment pairs is defined, and for each pair of fragments $(F_i, F_j)$ that share an overlapping region, a set of corresponding points in the overlap is defined. Similarly, a set of corresponding points between a fragment $F_i$ and the layout is defined. Because the layout exists only in the form of plane parameters, 3D points for it must be created virtually. To do so, each point of $F_i$ is projected onto the layout plane and the projected point is taken as a layout point; if the distance between the two points is small, the two points are defined as a correspondence. For an arbitrary 3D point $p$ belonging to a fragment and a matrix $T$ that transforms from an arbitrary scene fragment into the global coordinate system, the transformed coordinate is $Rp + t$, where $R$ is the rotation matrix and $t$ is the translation vector of $T$.

For the global registration of all fragments, an energy function is defined as in Equation 8.

Here, two weighting parameters are used: one is determined by the number of points involved in the computation and the other by the number of scene fragment pairs involved. The first term of Equation 8 minimizes the distance between corresponding points of the layout and a scene fragment. The point-to-plane measure (Prior Art 7) is used as the distance, defined as in Equation 9.

Here, $p$ denotes a 3D point belonging to a scene fragment, $n_p$ the normal vector of $p$, and $q$ a point belonging to the layout. The second term of Equation 8 minimizes the distance between corresponding points of two scene fragments and, as above, is defined with the point-to-plane measure as in Equation 10.
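A point-to-plane residual between a transformed fragment point and its layout correspondence can be sketched as follows. Equation 9 is not reproduced in this text, so the exact form is an assumption: here the fragment point and its normal are both moved by the pose `(R, t)`, and the offset to `q` is measured along the transformed normal. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def rotate(R: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(R[r][k] * v[k] for k in range(3)) for r in range(3))


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform of a point: R p + t."""
    rp = rotate(R, p)
    return (rp[0] + t[0], rp[1] + t[1], rp[2] + t[2])


def point_to_plane_sq(R: Mat3, t: Vec3, p: Vec3, n_p: Vec3, q: Vec3) -> float:
    """Squared offset between R p + t and q along the rotated normal of p."""
    tp = transform(R, t, p)
    n = rotate(R, n_p)
    diff = (tp[0] - q[0], tp[1] - q[1], tp[2] - q[2])
    return (diff[0] * n[0] + diff[1] * n[1] + diff[2] * n[2]) ** 2
```

Because only the component along the normal is penalized, a point may slide within a wall or floor plane at no cost, which suits correspondences with layout planes that have no texture of their own.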

The last term of Equation 8 keeps the estimated transformation matrices from deviating greatly from the previously computed ones and is defined as in Equation 11.

Here, the reference transformation is the transformation matrix between two scene fragments estimated by the ICP algorithm (Prior Art 7), and the norm used is the sum of the absolute values (L1 norm) of the matrix elements. The Gauss-Newton method may be used to optimize Equation 8.

Because it is difficult to solve the layout estimation problem and the global registration problem at once, the present invention optimizes the two problems alternately, as shown in FIG. 4. The scene fragments and initial values for the transformation matrices are assumed to be given; the coordinate system of the first scene fragment is set as the global coordinate system, and T₀ is fixed to the identity matrix.
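The alternation can be sketched as a simple loop. This is a minimal sketch, not the algorithm of FIG. 4 itself: `estimate_layout` and `register_globally` are hypothetical callables standing in for the layout estimation and the Equation 8 optimization, and running layout estimation first in each iteration is an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def identity4() -> Matrix:
    """Return a 4x4 identity matrix."""
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def alternate_layout_and_registration(
    fragments: Sequence[Any],
    initial_transforms: Sequence[Matrix],
    estimate_layout: Callable[[Sequence[Any], List[Matrix]], Any],
    register_globally: Callable[[Sequence[Any], List[Matrix], Any], Sequence[Matrix]],
    num_iterations: int,
) -> Tuple[Any, List[Matrix]]:
    """Alternate layout estimation and global registration a fixed number of times."""
    # Copy the initial values so the caller's matrices are not modified.
    transforms = [[row[:] for row in t] for t in initial_transforms]
    # The first fragment defines the global frame, so T_0 stays the identity.
    transforms[0] = identity4()
    layout = None
    for _ in range(num_iterations):
        # Step 1: estimate the layout with the transformations held fixed.
        layout = estimate_layout(fragments, transforms)
        # Step 2: refine the transformations with the layout held fixed.
        transforms = list(register_globally(fragments, transforms, layout))
        transforms[0] = identity4()
    return layout, transforms
```

Each sub-problem is easier than the joint one: with the transformations fixed, the layout reduces to plane estimation, and with the layout fixed, registration reduces to minimizing a single energy.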

When the algorithm of FIG. 4 is iterated, as shown in FIG. 5, an interior reconstruction that is initially bent is progressively straightened into its correct shape as the number of iterations increases.

The table below compares the reconstruction performance of the algorithm proposed in the present invention with that of existing algorithms.

As shown in Table 1, the proposed method reconstructs the interior more accurately than other recent methods. In addition, quantitative evaluation on synthetic data confirms that the reconstruction accuracy is also improved numerically.

The present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable media include a Hard Disk Drive (HDD), Solid State Disk (SSD), Silicon Disk Drive (SDD), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet).

Accordingly, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

Claims

A 3D interior restoration method comprising:
obtaining a video of a room captured using a depth measuring camera;
creating initial data for 3D reconstruction using the acquired video;
estimating a layout from the initial data; and
performing global 3D data matching using the estimated layout,
wherein the step of estimating the layout from the initial data comprises:
extracting representative planes representing the interior using hierarchical agglomerative clustering (HAC) and an energy-based multi-model matching method; and
estimating a layout plane from the extracted representative planes.

The 3D interior restoration method according to claim 1,
wherein the step of creating initial data for the 3D reconstruction comprises:
creating 3D reconstructed scene fragments by dividing the acquired video into groups of a certain number of frames; acquiring transformation matrices by registering the reconstructed scene fragments to a 3D global coordinate system; and detecting loop closures and optimizing the transformation matrices based on them in order to refine the acquired transformation matrices.

The 3D interior restoration method according to claim 1,
wherein the step of estimating a layout plane from the estimated representative planes comprises
finding, among the representative planes, a base plane and planes perpendicular to the base plane.

The 3D interior restoration method according to claim 1,
wherein the step of performing global 3D data matching using the estimated layout comprises
alternately iterating the layout estimation and the global matching a predetermined number of times.