KR20210075619A

KR20210075619A - Autonomous robot, location estimation server of autonomous robot and location estimation or autonomous robot using the same

Info

Publication number: KR20210075619A
Application number: KR1020190166926A
Authority: KR
Inventors: 김기현; 김현숙; 한상훈
Original assignee: 주식회사 케이티
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2021-06-23
Also published as: KR102457588B1

Abstract

When a location estimation server receives a plurality of key frames generated from video captured by an autonomous driving robot, it extracts a plurality of first image feature points, and a first feature descriptor for those feature points, from each received key frame. The location estimation server calculates a vector for each of a plurality of previously stored key frames based on its visual vocabulary, arranges the stored key frames whose second image feature points coincide with the first image feature points, and creates a key frame index from them. The location estimation server then calculates weights for the second feature descriptors of the stored key frames included in the key frame index and selects comparison candidate key frames from among the plurality of stored key frames. The present invention can thus estimate the location of the place where a robot is driving without prior information.

Description

Autonomous robot, location estimation server of autonomous driving robot, and method for estimating location of autonomous driving robot using same {Autonomous robot, location estimation server of autonomous robot and location estimation or autonomous robot using the same}

The present invention relates to an autonomous driving robot, a location estimation server for the autonomous driving robot, and a method for estimating the location of the autonomous driving robot using the same.

In an autonomous driving robot (hereinafter, 'robot'), visual place recognition is a technique that uses images collected through a camera mounted on the robot to determine whether the current place has been visited before.

Visual place recognition provides important information for loop closure detection, which reduces the robot position error that accumulates in SLAM (Simultaneous Localization and Mapping). Visual place recognition is applied in various fields such as global location estimation and augmented reality. Global location estimation is a technique for estimating the robot's location by perceiving its surroundings, without any initial location information.

When the robot moves through a space, a server linked with the robot uses visual place recognition to create a map of the space the robot has visited. Because the map generated by the server is provided to the robot, the robot can check whether its current location is a previously visited space. However, when the robot travels through a space it has not visited often, it has difficulty matching the space it currently sees with previously visited spaces.

Robust place recognition is also required in spaces such as hotels, where the layout and lighting change frequently. The robot should be able to determine its location without prior information about it. In a space used by an unspecified number of outside visitors (for example, a hotel), the robot may be lifted and moved to an arbitrary place by a physical impact or external force; even then, the robot needs a way to recognize the space in which it is located.

When a conventional robot performs global location estimation on the current frame in order to recognize a space, it estimates its location by referring to the keyframe most similar to the scene it is currently viewing. To find the most similar keyframe, the robot compares the current frame with every keyframe, which requires a large amount of computation. Moreover, the more the robot travels, the larger the map becomes and the more keyframes accumulate, so a way to address this is needed.

Accordingly, the present invention provides a technique that, through an autonomous driving robot traveling on various floors of a building such as a hotel and a server that estimates the robot's location, can estimate the location of the place where the robot is traveling without prior information.

One feature of the present invention for achieving the above technical object is a server that works in conjunction with an autonomous driving robot and provides comparison target frames so that the robot can estimate its location.

The server includes: a keyframe processing module that receives, from the autonomous driving robot, a plurality of keyframes selected from a specific video and extracts from each of them a plurality of first image feature points and first feature descriptors for those feature points; a vector calculation module that calculates a keyframe vector for each of the plurality of keyframes and generates a keyframe index vector containing second image feature points of a plurality of previously stored keyframes; a keyframe index vector generation module that arranges the keyframes containing first feature descriptors corresponding to the second image feature points according to the keyframe index vector to generate a keyframe index; and a comparison candidate keyframe selection module that calculates weights of the keyframe index and selects, from among the previously stored keyframes, a plurality of comparison candidate keyframes that the autonomous driving robot will compare with the plurality of keyframes to recognize its location.

The server may further include a keyframe storage module that stores the plurality of keyframes processed by the keyframe processing module together with the first image feature points and first feature descriptors, as well as the second image feature points and second feature descriptors of the stored keyframes.

The keyframe processing module may designate a mask region for each of a plurality of image feature points extracted from a first keyframe among the plurality of keyframes, and may extract a plurality of additional image feature points from regions other than the mask regions in a second keyframe consecutive to the first keyframe.

The vector calculation module may obtain a visual vocabulary from each of the plurality of keyframes and calculate vectors of the received keyframes by the BoW (Bag of Words) method based on the obtained visual vocabulary.

The keyframe index vector generation module may generate the keyframe index by applying tf-idf (term frequency-inverse document frequency) to the first image feature points.
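As an illustration of tf-idf applied to visual words, the following is a minimal sketch and not the claimed implementation. It assumes each keyframe has already been reduced to a list of visual word IDs; the function name `tfidf_weights` and the use of the natural logarithm for the idf term are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_weights(keyframe_words, all_keyframes_words):
    """Return {word_id: tf-idf weight} for one keyframe (hypothetical helper).

    keyframe_words: list of visual word IDs observed in the keyframe.
    all_keyframes_words: list of such lists, one per stored keyframe.
    """
    n_docs = len(all_keyframes_words)

    # Document frequency: in how many stored keyframes each word appears.
    df = Counter()
    for words in all_keyframes_words:
        df.update(set(words))

    tf = Counter(keyframe_words)
    total = len(keyframe_words)

    weights = {}
    for word, count in tf.items():
        # Words never seen in the stored keyframes get weight 0 in this sketch.
        idf = math.log(n_docs / df[word]) if df[word] else 0.0
        weights[word] = (count / total) * idf
    return weights
```

Under this weighting, a visual word that occurs in every stored keyframe receives a weight of zero, so it contributes nothing when keyframes are ranked, while rare words dominate the comparison.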

Another feature of the present invention for achieving the above technical object is an autonomous driving robot that works in conjunction with a location estimation server.

The robot includes: a camera that captures video; a keyframe generation module that selects, from among a plurality of frames generated from the captured video, a plurality of keyframes for estimating the robot's location; an interface that transmits the plurality of keyframes to the location estimation server and receives from it a plurality of comparison candidate keyframes selected based on those keyframes; and a keyframe location recognition module that compares the plurality of keyframes with the plurality of comparison candidate keyframes to recognize the location of the autonomous driving robot.

The robot may further include an image processing unit that generates a plurality of images from the video captured by the camera and generates the plurality of frames, one for each image.

The robot may further include a keyframe map generator that, when the keyframe location recognition module has recognized the location of the autonomous driving robot, generates a keyframe map using the plurality of keyframes used for the location recognition.

Another feature of the present invention for achieving the above technical object is a method by which a location estimation server estimates the location of an autonomous driving robot.

The method includes: receiving a plurality of received keyframes generated by the autonomous driving robot based on video; extracting, from each received keyframe, a plurality of first image feature points and a first feature descriptor for the first image feature points; calculating vectors of a plurality of previously stored keyframes based on the visual vocabulary for each stored keyframe; arranging the stored keyframes whose second image feature points, which constitute the calculated vectors, match the first image feature points, to generate a keyframe index; and calculating weights for the second feature descriptors of the stored keyframes included in the keyframe index to select comparison candidate keyframes from among the plurality of stored keyframes.

The extracting of the first feature descriptor may include generating a feature descriptor space based on the first image feature points and the first feature descriptor, and clustering the generated feature descriptor space to generate a first visual vocabulary from the first image feature points.
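The clustering step can be pictured with the sketch below. The text does not name a clustering algorithm, so plain k-means with squared Euclidean distance is an assumption here, as are the function names `nearest` and `build_vocabulary`, the deterministic initialization from the first `k` descriptors, and the use of small numeric tuples as descriptors.

```python
def nearest(descriptor, centers):
    """Index of the center closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, centers[i])),
    )


def build_vocabulary(descriptors, k, iterations=10):
    """Cluster descriptors into k visual words with plain k-means (illustrative only)."""
    # Deterministic initialization: the first k descriptors become the initial centers.
    centers = [list(d) for d in descriptors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

Each returned center plays the role of one visual word; a new descriptor is mapped to a word by calling `nearest(descriptor, vocabulary)`.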

The extracting of the first feature descriptor may include: extracting first-keyframe image feature points from a first keyframe among the received keyframes; extracting second-keyframe image feature points from a second keyframe adjacent to the first keyframe; if the number of extracted second-keyframe image feature points is less than a preset threshold, designating mask regions that use the second-keyframe image feature points as reference points; and extracting additional second-keyframe image feature points from regions of the second keyframe other than the mask regions.
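The mask-based extraction can be sketched as follows. This is only an illustration under stated assumptions: the mask is taken to be a circle of radius `mask_radius` around each already extracted point, feature points are plain `(x, y)` tuples, and `candidates` stands in for whatever a feature detector would propose; none of these details are given in the text.

```python
def extract_with_mask(points, candidates, min_count, mask_radius):
    """Add feature points outside the masked regions when too few were found.

    points: (x, y) feature points already extracted from the keyframe.
    candidates: (x, y) points proposed by a detector (hypothetical input).
    min_count: preset threshold on the number of feature points.
    mask_radius: radius of the circular mask around each existing point (assumed shape).
    """
    if len(points) >= min_count:
        return list(points)  # enough feature points, no extra extraction needed

    result = list(points)
    r2 = mask_radius ** 2
    for cx, cy in candidates:
        # Accept a candidate only if it lies outside every mask region.
        if all((cx - px) ** 2 + (cy - py) ** 2 > r2 for px, py in points):
            result.append((cx, cy))
    return result
```

Masking the neighborhood of existing points steers the additional extraction toward parts of the image that are not yet covered, rather than re-detecting the same corners.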

The calculating of the vectors of the stored keyframes may calculate BoW vectors based on a second visual vocabulary corresponding to the stored keyframes.

Another feature of the present invention for achieving the above technical object is a method by which an autonomous driving robot working in conjunction with a location estimation server estimates its location.

The method includes: generating a plurality of images from captured video and generating frames for the generated images; selecting, from among the generated frames, a plurality of keyframes for estimating the robot's location; transmitting the selected keyframes to the location estimation server; receiving from the location estimation server a comparison candidate keyframe selected based on the plurality of keyframes; and estimating the location using the received comparison candidate keyframe and the selected plurality of keyframes.

After the estimating of the location, the method may include generating a keyframe map using the plurality of keyframes.

According to the present invention, classifying keyframes with the BoW (Bag of Words) technique can improve the speed and precision with which a robot visually re-recognizes a space using keyframes.

FIG. 1 is an exemplary diagram of an environment to which a location estimation server according to an embodiment of the present invention is applied.
FIG. 2 is a structural diagram of a robot according to an embodiment of the present invention.
FIG. 3 is a structural diagram of a location estimation server according to an embodiment of the present invention.
FIG. 4 is a flowchart of a location estimation method according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram of visual vocabulary generation according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram of a BoW vector calculated according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram of a keyframe index vector according to an embodiment of the present invention.
FIG. 8 is a flowchart of a method for extracting image feature points according to an embodiment of the present invention.
FIG. 9 is an exemplary diagram in which image feature points are extracted from a keyframe according to an embodiment of the present invention.
FIG. 10 is an exemplary diagram of a visual vocabulary according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily implement them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts irrelevant to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

Hereinafter, an autonomous driving robot, a location estimation server for the autonomous driving robot, and a method for estimating the location of the autonomous driving robot using the same according to an embodiment of the present invention are described in detail with reference to the drawings.

FIG. 1 is an exemplary diagram of an environment to which a location estimation server according to an embodiment of the present invention is applied.

As shown in FIG. 1, an autonomous driving robot (hereinafter, 'robot') 100 installed in a space used by an unspecified number of outside visitors (for example, a hotel) works in conjunction with a location estimation server 200. The robot 100 and the location estimation server 200 also work in conjunction with an elevator control server 300.

Using a camera mounted internally or externally, the robot 100 films the place in the direction in which it is traveling and collects the captured video. The robot 100 processes the captured video to generate a plurality of images.

The robot 100 generates keyframes from the plurality of images. Here, a keyframe is a frame selected, from among the frames of the video captured by the robot 100, as important for recognizing the location of the robot 100.

For example, when the robot 100 collects images while moving around a hotel lobby as shown in FIG. 1, the door frames of guest rooms, the passages leading to elevators, room number signs, and the like can be important elements for recognizing the location of the robot 100. Therefore, if a guest room door frame, a passage, or a room number sign appears in a frame collected by the robot 100, that frame is generated as a keyframe.

The robot 100 may generate one or more keyframes from one captured video. Keyframes can be generated from images in various ways, so the embodiment of the present invention is not limited to any one method. Nor is the number of keyframes generated from a single image limited.

In the embodiment of the present invention, for convenience of description, the robot 100 is described as generating the keyframes, but the location estimation server 200 may generate them instead. In that case, the robot 100 only needs to transmit the video captured by the camera to the location estimation server 200.

The robot 100 also generates a keyframe map using the plurality of keyframes. The robot 100 generates a keyframe map for each floor; the method of generating a keyframe map from keyframes is described later. In the embodiment of the present invention, the robot 100 is described as generating the keyframe map, but the location estimation server 200 may generate it instead.

The robot 100 receives a plurality of comparison candidate keyframes from the location estimation server 200 and uses them to recognize its own location.

The location estimation server 200 receives, from the elevator control server 300, information on the floor of the building at which the robot 100 got off. On receiving the floor information, the location estimation server 200 loads the previously stored keyframe map of that floor. Here, the keyframe map is divided into a global map, which contains only components that rarely change, such as building pillars, corridors, and room doors, and a local map, which shows obstacles such as flowerpots and movable structures.

The robot 100 updates the map in real time and recognizes its own location entirely on the local map. The local map information is built in the robot 100 from image features, and the robot 100 recognizes its own location by comparing keyframes.

Changed obstacle positions are written to the local map. When an obstacle is repeatedly recognized at the same position, the robot 100 updates the global map with that obstacle's position. In other words, highly variable obstacles and precise location recognition are handled on the local map, and when the position of an obstacle in the local map has been recognized repeatedly at least a preset number of times, that position is reflected in the global map and the global map is constructed accordingly.
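The promotion of a repeatedly observed obstacle from the local map to the global map could be modeled as in the following sketch. The class name `TwoLevelMap`, the representation of an obstacle position as a grid cell tuple, and the per-cell counter are assumptions for illustration; the text specifies only that promotion happens after a preset number of repeated recognitions.

```python
from collections import defaultdict


class TwoLevelMap:
    """Toy model of a local map whose stable obstacles are promoted to a global map."""

    def __init__(self, promote_after):
        self.promote_after = promote_after   # preset number of repeated recognitions
        self.local_obstacles = set()         # frequently changing obstacles
        self.global_obstacles = set()        # obstacles treated as stable
        self._seen = defaultdict(int)        # recognition count per position

    def observe(self, cell):
        """Record one recognition of an obstacle at the given position."""
        self.local_obstacles.add(cell)
        self._seen[cell] += 1
        if self._seen[cell] >= self.promote_after:
            self.global_obstacles.add(cell)
```

With `promote_after=3`, an obstacle seen twice at `(2, 5)` stays on the local map only, and the third recognition at the same cell adds it to the global map.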

For example, when a command to drive from the hotel lobby to the front door is received, the robot 100 recognizes its current location within the local map and then drives. When a command to move from the hotel lobby to a guest room on another floor (hereinafter, 'destination room') is received, the destination room does not exist on the local map.

The robot 100 therefore searches the global map for the destination room and drives there. That is, the global map holds spatial information for the entire hotel, while the local map holds detailed area information for a given space.

The location estimation server 200 receives a plurality of keyframes from the robot 100. The location estimation server 200 may also receive a keyframe map from the robot 100. If it has received a plurality of keyframes from the robot 100, the location estimation server 200 generates a keyframe map using them.

For each keyframe used to generate the keyframe map, the location estimation server 200 calculates a BoW vector using the BoW (Bag of Words) technique. BoW is a method for automatically classifying documents: it determines what kind of document a text is from the distribution of the words it contains.

In image processing and computer vision, the BoW technique has mainly been used to classify or retrieve images, and more recently it has been used to recognize objects and scenes. The BoW technique for image classification is described later.

The location estimation server 200 uses a visual vocabulary to calculate a BoW vector from a keyframe. The visual vocabulary is generated using ORB (Oriented FAST and Rotated BRIEF) feature points, as described in detail later.
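A BoW vector over binary descriptors can be sketched as below. ORB descriptors are binary strings, so this example represents each descriptor and each visual word as a Python integer and compares them by Hamming distance; the 8-bit toy values, the function names, and the normalization of the histogram are assumptions for illustration rather than details taken from the text.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def bow_vector(descriptors, vocabulary):
    """Normalized histogram of visual words for one keyframe (illustrative only).

    descriptors: binary descriptors of the keyframe, as ints.
    vocabulary: binary visual words, as ints.
    """
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # Assign the descriptor to its nearest visual word.
        word = min(range(len(vocabulary)), key=lambda i: hamming(d, vocabulary[i]))
        hist[word] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

For instance, with the vocabulary `[0b00000000, 0b11111111, 0b00001111]`, the descriptors `[0b00000001, 0b11111110, 0b00000111, 0b00001111]` fall into words 0, 1, 2, and 2 respectively, giving the vector `[0.25, 0.25, 0.5]`.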

The location estimation server 200 generates a keyframe index vector using the calculated BoW vectors. Using the keyframe index, the location estimation server 200 selects a plurality of keyframe candidates and provides them to the robot 100.
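One way to picture a keyframe index and the candidate selection is an inverted index from visual words to stored keyframes, as in the sketch below. The data layout (`{keyframe_id: {word_id: weight}}`), the function names, and the dot-product score are assumptions made for this example; the text does not specify how the weights are combined.

```python
from collections import defaultdict


def build_index(stored_bows):
    """Inverted index: visual word -> [(keyframe_id, weight), ...].

    stored_bows: {keyframe_id: {word_id: weight}} for the stored keyframes.
    """
    index = defaultdict(list)
    for kf_id, bow in stored_bows.items():
        for word, weight in bow.items():
            index[word].append((kf_id, weight))
    return index


def select_candidates(query_bow, index, top_n):
    """Rank stored keyframes that share visual words with the query keyframe."""
    scores = defaultdict(float)
    for word, q_weight in query_bow.items():
        # Only keyframes listed under a shared word are ever touched.
        for kf_id, weight in index.get(word, []):
            scores[kf_id] += q_weight * weight
    ranked = sorted(scores, key=lambda kf: (-scores[kf], kf))
    return ranked[:top_n]
```

Because the lookup goes through the shared words, stored keyframes with no word in common with the query are never scored, which is what keeps the candidate search cheaper than comparing against every stored keyframe.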

When the robot 100 boards or gets off an elevator, the elevator control server 300 determines, based on the open/closed state of the elevator door, the floor at which the robot got off and transmits that floor information to both the location estimation server 200 and the robot 100.

The structures of the robot 100 and the location estimation server 200 described above are described with reference to FIGS. 2 and 3.

FIG. 2 is a structural diagram of a robot according to an embodiment of the present invention, and FIG. 3 is a structural diagram of a location estimation server according to an embodiment of the present invention.

As shown in FIG. 2, the robot 100 includes a camera 110, an image processing module 120, a keyframe generation module 130, an interface 140, a keyframe location recognition module 150, a keyframe map generation module 160, and a robot driving module 170. The robot 100 may include components other than those shown in FIG. 2.

The camera 110 is built into the robot 100 or attached to its exterior, and films the place (background, objects, and so on) located in the robot's direction of travel.

The image processing module 120 generates a plurality of images from the video captured by the camera 110 and generates a frame for each image. The image processing module 120 can generate images from video, and frames from images, in various ways, so the embodiment of the present invention is not limited to any one method.

From among the plurality of frames generated by the image processing module 120, the keyframe generation module 130 selects a plurality of frames as keyframes. A keyframe is a frame, among those generated from the video, that is important for estimating the location of the robot 100.

Various methods by which the keyframe generation module 130 can select a plurality of keyframes from among a plurality of frames are already known. A detailed description of the keyframe selection method is therefore omitted in the embodiment of the present invention.

The interface 140 transmits the plurality of keyframes generated by the keyframe generation module 130 to the location estimation server 200. The interface 140 may also transmit the keyframe map generated by the keyframe map generation module 160 to the location estimation server 200. In addition, the interface 140 receives from the location estimation server 200 at least one comparison candidate keyframe selected from a plurality of keyframes.

The keyframe location recognition module 150 compares the at least one comparison candidate keyframe received from the location estimation server 200 with the keyframes generated by the keyframe generation module 130 to recognize the current location of the robot 100.

Conventionally, the location of the robot 100 was recognized by comparing a newly generated keyframe against all the keyframes the robot 100 had generated, so the location recognition computation took a long time. In the embodiment of the present invention, however, the keyframe location recognition module 150 compares the generated keyframes only with the small number of comparison candidate keyframes received from the location estimation server 200, which shortens the location recognition computation time.
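The comparison against a short candidate list could look like the sketch below. The text does not state how keyframes are compared, so counting descriptor matches under a Hamming distance threshold is an assumption, as are the function names, the integer descriptor representation, and the default `max_distance`.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def count_matches(query_desc, candidate_desc, max_distance):
    """Count query descriptors that have a close enough descriptor in the candidate."""
    matches = 0
    for q in query_desc:
        if candidate_desc and min(hamming(q, c) for c in candidate_desc) <= max_distance:
            matches += 1
    return matches


def best_candidate(query_desc, candidates, max_distance=2):
    """Return the ID of the candidate keyframe with the most matches, or None.

    candidates: {keyframe_id: [descriptor, ...]} as received from the server.
    """
    best_id, best_score = None, 0
    for kf_id in sorted(candidates):  # sorted for a deterministic tie-break
        score = count_matches(query_desc, candidates[kf_id], max_distance)
        if score > best_score:
            best_id, best_score = kf_id, score
    return best_id
```

The cost of this loop grows with the number of candidates rather than with the total number of stored keyframes, which is the source of the time saving described above.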

키프레임 지도 생성 모듈(160)은, 키프레임 위치 인식 모듈(150)이 로봇(100)의 위치를 인식하면, 위치 인식을 위해 사용한 키프레임들을 이용하여 로봇(100)이 위치한 건물의 키프레임 지도를 생성한다. 키프레임 지도 생성 모듈(160)은 생성한 키프레임 지도를 저장하고, 인터페이스(140)를 통해 위치 추정 서버(200)로 전달한다.The keyframe map generation module 160, when the keyframe location recognition module 150 recognizes the location of the robot 100, uses keyframes used for location recognition to provide a keyframe map of the building in which the robot 100 is located. create The keyframe map generation module 160 stores the generated keyframe map and transmits it to the location estimation server 200 through the interface 140 .

The description takes as an example the case in which, immediately after the robot 100 gets off the elevator, the keyframe map generation module 160 loads the local map stored in the robot 100 that corresponds to the floor on which the robot is located. The information on the floor at which the robot 100 got off is received from the elevator control server 300.

That is, for room service, the robot 100 identifies the floor of the destination room from the location (room number) of the destination room. When the robot 100 moves to the front of the elevator in order to board it, the location estimation server 200 recognizes that the robot 100 is waiting to board and instructs the elevator control server 300 to call the elevator.

The elevator control server 300 is connected to the location estimation server 200 of the robot 100 and conveys elevator call, arrival, door opening and closing, and floor information to the robot 100. When the elevator called by the robot 100 arrives and its door opens, the robot 100 proceeds to board.

At this time, to prevent a collision between the robot 100 and a person, the robot 100 boards only after recognizing people in the image captured by the camera 110. When the elevator arrives at the floor of the destination room, the robot 100 receives the corresponding floor information from the location estimation server 200 connected to the elevator control server 300.

The robot 100 switches to the map information of the corresponding floor based on the floor information received from the location estimation server 200. Because there are various ways for the robot 100 to switch map information using floor information, the embodiment of the present invention is not limited to any one method.

The keyframe map generation module 160 may generate the keyframe map by various methods (for example, SLAM (Simultaneous Localization and Mapping)), so a detailed description is omitted in the embodiment of the present invention. Also, although the embodiment is described using the example in which the robot 100 generates the keyframe map, the location estimation server 200 may generate the keyframe map instead.

The robot driving module 170 drives the robot 100 so that it can travel. The robot driving module 170 may include various components, such as a control module that controls the travel of the robot 100 and drive components that enable the robot 100 to drive autonomously, for example wheels installed on the lower part of the robot 100 and an engine. Because the functions of the robot driving module 170 can be configured in various ways, a detailed description is omitted in the embodiment of the present invention.

Meanwhile, as shown in FIG. 3, the location estimation server 200 includes a keyframe processing module 210, a keyframe storage module 220, a vector calculation module 230, a keyframe index vector generation module 240, and a comparison candidate keyframe selection module 250.

When the keyframe processing module 210 receives a plurality of keyframes from the robot 100, it processes the received keyframes (hereinafter referred to as 'received keyframes' for convenience of description) and extracts image feature points and feature descriptors from each of them.

Here, an image feature point is a point that can be easily identified in a frame even when the shape, size, or position of an object changes, and that can be easily located in the frame even when the camera viewpoint or the lighting changes. A feature descriptor describes the local characteristics of an image feature point and makes it possible to compare image feature points with one another.

The keyframe processing module 210 extracts image feature points from a received keyframe using the FAST (Features from Accelerated Segment Test) method. If fewer than a preset number of feature points are extracted, the keyframe processing module 210 designates mask regions centered on the image feature points already extracted, and then extracts additional image feature points from the area outside the mask regions.
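The mask-based top-up described above can be illustrated with a minimal sketch. It assumes that a second detection pass has already produced a list of candidate points, and it treats each mask region as a circle of radius `mask_radius` around an already extracted point. The function name `top_up_features`, the circular mask shape, and all parameters are hypothetical and are not specified in this description.

```python
from math import hypot


def top_up_features(existing, candidates, min_count, mask_radius):
    """Add candidate points lying outside the mask regions until min_count is reached.

    existing    -- (x, y) feature points already extracted
    candidates  -- (x, y) points from an additional detection pass (assumed input)
    min_count   -- preset minimum number of feature points
    mask_radius -- radius of the circular mask around each existing point (assumption)
    """
    selected = list(existing)
    if len(selected) >= min_count:
        return selected  # enough points already, no top-up needed
    for cx, cy in candidates:
        # A candidate is masked if it falls inside the mask of any existing point.
        masked = any(hypot(cx - px, cy - py) <= mask_radius for px, py in existing)
        if not masked:
            selected.append((cx, cy))
            if len(selected) >= min_count:
                break
    return selected


points = top_up_features(
    existing=[(10, 10)],
    candidates=[(11, 10), (50, 50), (12, 12), (80, 20)],
    min_count=3,
    mask_radius=5,
)
print(points)
```

Masking around the existing points keeps the additional feature points from piling up in regions that are already covered, which spreads the features more evenly over the keyframe.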

When extracting image feature points, the keyframe processing module 210 uses the optical flow between the previous keyframe and the next keyframe to extract the image feature points of the next keyframe. Optical flow is an algorithm that estimates the motion of each pixel: it compares the previous frame with the current frame to obtain the direction and magnitude of each pixel's movement. Methods for extracting image feature points using optical flow are already known, so a detailed description is omitted in the embodiment of the present invention.

The keyframe processing module 210 extracts feature descriptors from the received keyframes. Available feature descriptors include SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and binary descriptors (BRIEF, ORB, BRISK). The embodiment of the present invention is described using the example of the BRIEF descriptor, which represents a descriptor as a binary string to allow fast comparison.
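The reason a binary string allows fast comparison is that two such descriptors can be compared with a bitwise XOR followed by a bit count, that is, the Hamming distance. The following minimal sketch shows this for descriptors held as Python `bytes`; the function name and the sample values are illustrative only.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length binary descriptors differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; count those 1s byte by byte.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


d1 = bytes([0b10110000, 0b00001111])
d2 = bytes([0b10011000, 0b00001111])
print(hamming_distance(d1, d2))  # the first bytes differ in two bit positions
```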

Here, the FAST method that the keyframe processing module 210 uses to extract image feature points and the BRIEF descriptor that it uses to extract feature descriptors are together referred to as ORB (Oriented FAST and Rotated BRIEF). The FAST method and the BRIEF descriptor are already known, so a detailed description is omitted in the embodiment of the present invention.

The keyframe processing module 210 receives the keyframe maps that are managed floor by floor in the keyframe storage module 220. The keyframe map is used when the comparison candidate keyframe selection module 250 selects the comparison candidate keyframes against which the robot 100 will compare in order to recognize its location.

The keyframe storage module 220 maps each received keyframe received by the keyframe processing module 210 to the image feature points and feature descriptors extracted by the keyframe processing module 210, and stores them. The embodiment of the present invention is described using the example of storing all keyframes in binary file form, but the invention is not necessarily limited to this.

The keyframe storage module 220 also uses the keyframes received from the robot 100 to generate, floor by floor, a keyframe map of the building in which the robot 100 is located. The keyframe storage module 220 stores the generated keyframe map and provides it to the keyframe processing module 210.

The description takes as an example the case in which the keyframe storage module 220 generates the keyframe map immediately after the robot 100 gets off the elevator. At this time, a floor information signal is received from the elevator control server 300 so that the floor at which the robot 100 got off can be stored in association with the keyframe map.

In the embodiment of the present invention, the robot 100 may also generate the keyframe map. In that case, the keyframe storage module 220 only receives the keyframe map generated by the robot 100 together with the floor information, and stores and manages the maps floor by floor.

The vector calculation module 230 calculates a keyframe vector for each of the received keyframes received by the keyframe processing module 210. The embodiment of the present invention is described using the example of the BoW (Bag of Words) method for calculating keyframe vectors from the received keyframes.

The BoW method is one technique for automatically classifying documents: it determines what kind of document a given document is from the distribution of the words it contains. In the embodiment of the present invention, the vector calculation module 230 uses the BoW method to classify the received keyframes.

For the vector calculation module 230 to calculate the vectors of the received keyframes, a visual vocabulary for each of the received keyframes is required. To obtain the visual vocabulary, the vector calculation module 230 checks the image feature points and feature descriptors that the keyframe processing module 210 extracted from the received keyframes.

Here, the visual vocabulary is the result of the keyframe processing module 210 extracting feature points and feature descriptors from keyframes and learning the feature points with the extracted feature descriptors. Examples of the visual vocabulary are described in detail later.

Based on the image feature points and feature descriptors of the stored keyframes and those of the received keyframes, the vector calculation module 230 generates feature descriptor groups for the received keyframes. Here, a feature descriptor group is produced by the vector calculation module 230 clustering feature descriptors that have similar orientations.

The vector calculation module 230 clusters the generated feature descriptor groups to generate a plurality of visual vocabulary entries. The description takes K-means clustering as the example clustering method, but the invention is not necessarily limited to this.
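As an illustration of the clustering step, the following is a minimal K-means sketch in plain Python. It assumes that descriptors are small real-valued tuples compared by squared Euclidean distance and that the first `k` points serve as initial centers; both choices are simplifications made for the example. Binary descriptors such as BRIEF would instead call for a Hamming-distance variant, which this sketch does not implement.

```python
def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center as it was
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


descriptors = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(descriptors, k=2)
print(centers, labels)
```

Each resulting center plays the role of one visual vocabulary entry, and the label of a descriptor identifies the entry to which it belongs.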

The keyframe index vector generation module 240 sorts the received keyframes that have the image feature points extracted by the keyframe processing module 210 and generates a keyframe index vector from them. The keyframe index vector serves as a weight for assigning a score to each keyframe, and is obtained by applying tf-idf (term frequency-inverse document frequency), the same scheme used in document classification, to the visual vocabulary.

tf-idf is a weight used in information retrieval and search engines. Given a collection of several documents, it is a numerical value indicating how important a word is within a particular document. Taking English documents as an example, words such as a, an, and, of, and but (articles, conjunctions, and similar function words) appear frequently in nearly every document, so they receive low weights and are not selected as important words. Applying this to the visual vocabulary of the embodiment of the present invention, a distinctive image feature point plays the role of a word in a document.
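The weighting can be sketched as follows, treating each keyframe as a document and each visual word as a term. The sketch uses the common form tf × log(N / df); the exact tf-idf variant used in the embodiment is not stated in this description, so this form, the function name, and the sample word labels are assumptions.

```python
import math
from collections import Counter


def tf_idf(keyframe_words):
    """keyframe_words maps a keyframe id to the list of visual words it contains.

    Returns a dict mapping each keyframe id to {word: tf-idf weight}.
    """
    n_keyframes = len(keyframe_words)
    # Document frequency: in how many keyframes does each word occur?
    doc_freq = Counter()
    for words in keyframe_words.values():
        doc_freq.update(set(words))
    weights = {}
    for kf_id, words in keyframe_words.items():
        counts = Counter(words)
        total = len(words)
        weights[kf_id] = {
            word: (count / total) * math.log(n_keyframes / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


weights = tf_idf({
    1: ["wall", "wall", "door"],
    2: ["wall", "lamp"],
    3: ["wall", "door", "door"],
})
print(weights)
```

In this example the word "wall" occurs in every keyframe and therefore receives a weight of zero, just as an article does in a text collection, whereas "lamp" occurs in only one keyframe and receives the highest weight.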

The keyframe index vector generation module 240 arranges the image feature points identified in the received keyframes into the individual elements of a vector to generate the keyframe index vector. Each element of the keyframe index vector is an image feature point that corresponds to a feature point included in the visual vocabulary.

For each feature point that is an element of the BoW vector of any keyframe calculated by the vector calculation module 230, the keyframe index vector generation module 240 assigns that keyframe to the element of the keyframe index vector corresponding to the feature point. Performing this operation for all keyframes completes a list in which each element corresponding to a word holds the keyframes whose BoW vectors contain that word.

The comparison candidate keyframe selection module 250 compares the words that make up the current frame with the words of the comparison candidate keyframes. If the comparison finds a keyframe whose overlapping words meet or exceed a preset threshold (for example, 80%), the comparison candidate keyframe selection module 250 increases the tf-idf weight score of the current frame.

From among the keyframes with high increased tf-idf weight scores, the comparison candidate keyframe selection module 250 selects as comparison candidate keyframes either a preset number of keyframes or the keyframes that reach a preset weight score. The comparison candidate keyframe selection module 250 transmits the selected comparison candidate keyframes to the robot 100. The robot 100 then compares the keyframe of the image it is currently capturing with the comparison candidate keyframes to determine its current location.
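The two selection criteria above (a preset number of keyframes, or a preset score) can be sketched as follows. The function names and the way the word overlap is normalized (shared words divided by the number of words in the current frame) are assumptions made for illustration; the description does not fix these details.

```python
def overlap_ratio(current_words, stored_words):
    """Fraction of the current frame's words that also occur in a stored keyframe (assumed definition)."""
    current = set(current_words)
    if not current:
        return 0.0
    return len(current & set(stored_words)) / len(current)


def select_candidates(scores, top_n=None, min_score=None):
    """Pick comparison candidates from {keyframe id: score}, best first.

    top_n     -- keep at most this many keyframes (preset number)
    min_score -- keep only keyframes scoring at least this much (preset score)
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(kf, s) for kf, s in ranked if s >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [kf for kf, _ in ranked]


scores = {"kf1": 0.9, "kf2": 0.4, "kf3": 0.7}
print(overlap_ratio({1, 2, 3, 4, 5}, {1, 2, 3, 4, 9}))
print(select_candidates(scores, top_n=2))
print(select_candidates(scores, min_score=0.5))
```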

A method of estimating the location of the robot 100 using the robot 100 and the location estimation server 200 described above is now described with reference to FIG. 4.

FIG. 4 is a flowchart of a location estimation method according to an embodiment of the present invention.

As shown in FIG. 4, while moving, the robot 100 photographs its surroundings using the camera 110 and processes the captured video into images to generate a plurality of frames (S100). From among the generated frames, the robot 100 generates as keyframes at least one frame that it judges to be important for location estimation, and transmits the keyframes to the location estimation server 200 (S101, S102).

Here, the robot 100 may select keyframes from among the plurality of frames by various methods. Alternatively, the robot 100 may transmit the images generated in step S100 to the location estimation server 200, and the location estimation server 200 may select the keyframes.

Having received the plurality of keyframes, the location estimation server 200 extracts image feature points and feature descriptors from the received keyframes (S103). It then clusters the feature descriptor space built from the extracted image feature points and feature descriptors to generate a visual vocabulary from the plurality of image feature points (S104).

The location estimation server 200 calculates the BoW vectors of all previously stored keyframes (S105). A visual vocabulary is needed to calculate a BoW vector. Because a visual vocabulary has already been extracted for all stored keyframes, the BoW vectors of the stored keyframes are calculated using that extracted visual vocabulary.

The location estimation server 200 needs a visual vocabulary in order to calculate a BoW vector for a keyframe. Because the embodiment of the present invention uses ORB features, it uses a visual vocabulary trained on ORB features. The visual vocabulary is used to convert an image into a BoW vector, which is a sparse numerical vector. It is obtained by extracting feature points and feature descriptors from a large number of training images and representing the feature descriptor space as W visual words through K-median clustering.

The location estimation server 200 compares the image feature points that are the elements of the BoW vectors calculated in step S105 with the image feature points of the received keyframes extracted in step S103. It then sorts the stored keyframes that have matching image feature points and generates a keyframe index vector from them (S106).

The location estimation server 200 calculates the weights of the keyframe index vector generated in step S106 and selects, as comparison candidate keyframes, the stored keyframes whose scores meet or exceed a preset score (S107). The location estimation server 200 transmits the comparison candidate keyframes selected in step S107 to the robot 100 (S108).

Here, when the location estimation server 200 selects comparison candidate keyframes from among the plurality of stored keyframes in step S107, it does so by calculating a similarity score against the BoW vector of the current keyframe. The similarity score is calculated using Equation 1.

Here, v denotes the BoW vector of the received current keyframe, and w denotes the BoW vector of a stored keyframe. n is the number of words and i is an index. v_i is the weight of the image feature point corresponding to index i in the BoW vector of the current keyframe, and w_i is the weight of the image feature point corresponding to index i in the BoW vector of the stored keyframe.

After using Equation 1 to calculate similarity scores for all image feature points that the keyframes have in common, the location estimation server 200 calculates the final similarity score for the current keyframe using Equation 2.

In the embodiment of the present invention, to give greater priority to the weight of the current keyframe when selecting comparison candidate keyframes, the weights of the current keyframe and the stored keyframe are multiplied together in the similarity score calculation as well. This minimizes the number of cases in which the weight of the current keyframe would be disregarded because a keyframe weight is low, and improves accuracy.
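Equations 1 and 2 are not reproduced in this text, so the following is only a sketch of one reading that is consistent with the surrounding description: a per-word score formed by multiplying the two weights v_i and w_i for each word shared by both keyframes, followed by a final score that aggregates the per-word scores. Taking the plain sum as the aggregate, as well as the function name and the sample weights, is an assumption and should not be read as the equations of the embodiment.

```python
def similarity(v, w):
    """v and w map word index -> weight for the current and a stored keyframe.

    Returns (final_score, per_word_scores). The per-word product follows the
    description above; summing the products is an assumed aggregation.
    """
    shared = v.keys() & w.keys()  # words present in both BoW vectors
    per_word = {i: v[i] * w[i] for i in shared}
    return sum(per_word.values()), per_word


current = {1: 0.5, 2: 0.2, 3: 0.4}
stored = {2: 0.5, 3: 0.25, 4: 0.9}
final_score, per_word = similarity(current, stored)
print(final_score, per_word)
```

Because only shared words contribute, a stored keyframe that has no word in common with the current keyframe scores zero under this reading.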

The robot 100 compares the comparison candidate keyframes received in step S108 with the keyframes generated in step S101 to recognize its own current location (S110). The robot then generates a keyframe map from the keyframes it used to recognize its location (S111).

When the robot 100 recognizes its location through this procedure, it compares the keyframes it generated only with the selected comparison candidate keyframes rather than with all keyframes, which reduces the amount of computation for keyframe comparison and allows the location to be determined quickly.

Of the procedure described above, an example of generating a plurality of visual vocabulary entries in step S104 is described with reference to FIG. 5.

FIG. 5 is an exemplary diagram of visual vocabulary generation according to an embodiment of the present invention.

As shown in FIG. 5, to improve visual place re-recognition by the robot 100 using the BoW method, the location estimation server 200 generates a keyframe index vector. First, to estimate the global location of the robot 100, the location estimation server 200 reads the binary file stored in it in advance and loads the stored keyframes. The location estimation server 200 loads the image feature points and feature descriptors for all stored keyframes.

To calculate the BoW vectors for the received keyframes, the location estimation server 200 uses a visual vocabulary trained on ORB features with ORB SLAM. Here, ORB SLAM is one SLAM technique, and the features extracted when keyframe features are extracted in ORB SLAM are ORB features. FIG. 5 shows an example in which a visual vocabulary with six groups in total is generated, which is referred to as having six visual vocabulary entries.

As an example, suppose that the location estimation server 200 has clustered images containing the feature points and feature descriptors of a guest room door within keyframes into keyframes having the same characteristics. The feature points and feature descriptors of the guest room door clustered by the location estimation server 200 are used as comparison keyframes when the robot 100 attempts image-based location recognition in front of the guest room door.

FIG. 5 shows an example in which six representative visual vocabulary entries are selected, but the number of representative visual vocabulary entries is not necessarily limited to this.

Next, an example of the vector calculated in step S105 of FIG. 4 is described with reference to FIG. 6.

FIG. 6 is an exemplary diagram of a BoW vector calculated according to an embodiment of the present invention.

As shown in FIG. 6, to calculate the BoW vector of each keyframe, the location estimation server 200 selects, for every feature descriptor of that keyframe, the node that minimizes the Hamming distance at each level of the visual vocabulary. It searches the tree-structured visual vocabulary from the root node, the starting point, down to a leaf node corresponding to a word, and selects the word that best represents the descriptor.

Because this approach does not search the nodes corresponding to every word, but instead selects the single node with the smallest Hamming distance at each level and does not search the other nodes, words can be selected efficiently.
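The level-by-level descent can be illustrated with a small sketch. It assumes that each tree node stores a representative binary descriptor (held here as a Python integer) and that leaves carry a word id; the `Node` class, its field names, and the two-level example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


@dataclass
class Node:
    descriptor: int                      # representative descriptor of this node
    word_id: Optional[int] = None        # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def lookup_word(root: Node, descriptor: int) -> Optional[int]:
    """Descend from the root, following the closest child at each level only."""
    node = root
    while node.children:
        node = min(node.children, key=lambda child: hamming(child.descriptor, descriptor))
    return node.word_id


tree = Node(0, children=[
    Node(0b00001111, children=[Node(0b00000111, word_id=0), Node(0b00011111, word_id=1)]),
    Node(0b11110000, children=[Node(0b11100000, word_id=2), Node(0b11111000, word_id=3)]),
])
print(lookup_word(tree, 0b00000110), lookup_word(tree, 0b11111001))
```

With a branching factor k and depth d, this descent compares a descriptor with roughly k × d nodes instead of all k^d leaf words, which is the source of the efficiency noted above.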

The location estimation server 200 then uses tf-idf to calculate the weight of each word selected for the keyframe. Each selected word and its weight form a pair, and all such pairs are collected to generate the BoW vector.

That is, to compare or learn the image feature points and feature descriptors within a plurality of keyframes, the location estimation server 200 quantifies the oriented feature descriptors at the image feature points of the keyframes. For example, at a corner of a guest room door, a horizontal feature pointing to the right is quantified as 1 and a vertical feature as 2. These values are managed by the location estimation server 200 and serve as the criterion for grouping keyframes.

Next, an example of the keyframe index vector generated in step S106 of FIG. 4 is described with reference to FIG. 7.

FIG. 7 is an exemplary diagram of a keyframe index vector according to an embodiment of the present invention.

As shown in FIG. 7, each element of the keyframe index vector corresponds to a word in the visual vocabulary trained on ORB features. After calculating the BoW vectors of all stored keyframes, the location estimation server 200 assigns each keyframe to the elements of the keyframe index vector that correspond to the words in that keyframe's BoW vector.

Taking FIG. 7 as an example, accessing the inverted-triangle word element of the keyframe index vector shows that the keyframes whose BoW vectors contain the inverted-triangle word are keyframes 1, 2, and 3. When looking for the keyframes that contain a particular word, only the element for that word needs to be accessed and there is no need to compare all keyframes, so the execution time is reduced.
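The keyframe index vector behaves like an inverted index from words to keyframes. A minimal sketch follows, assuming BoW vectors are held as dictionaries from word to weight; the word labels other than the inverted triangle, the weights, and keyframe 4 are made up for the example.

```python
from collections import defaultdict


def build_keyframe_index(bow_vectors):
    """Map every word to the list of keyframes whose BoW vector contains it."""
    index = defaultdict(list)
    for keyframe_id, bow in bow_vectors.items():
        for word in bow:
            index[word].append(keyframe_id)
    return dict(index)


bow_vectors = {
    1: {"inverted_triangle": 0.3, "circle": 0.1},
    2: {"inverted_triangle": 0.2},
    3: {"inverted_triangle": 0.5, "square": 0.4},
    4: {"square": 0.2},
}
index = build_keyframe_index(bow_vectors)
print(index["inverted_triangle"])
```

Looking up a word is then a single dictionary access, which returns only the keyframes that contain that word without scanning the BoW vectors of the other keyframes.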

즉, 위치 추정 서버(200)는 선정된 키프레임들을 그룹화하는 기준을 통해, 키프레임들을 다시 정렬한다. 한 실시예로 위치 추정 서버(200)는 객실 복도의 조명 엣지의 특징을 지닌 키프레임 인덱스 벡터를 가진 키프레임을, 유사한 인덱스 벡터를 가진 키프레임 이미지들끼리 그룹화한다. That is, the location estimation server 200 rearranges the keyframes based on the criteria for grouping the selected keyframes. In one embodiment, the location estimation server 200 groups keyframes having a keyframe index vector having a characteristic of a lighting edge of a hallway of a guest room, and keyframe images having a similar index vector.

The keyframe images grouped in advance by the location estimation server 200 in this way are selected as comparison images when the robot 100 performs camera-based position re-recognition near the lighting in the guest-room hallway, so that the robot can quickly recognize its position from the image. When this operation is performed for all keyframes, a list of the keyframes whose BoW vectors contain the corresponding image feature point is completed for each element, as shown in FIG. 7.

Meanwhile, the method by which the location estimation server 200 extracts image feature points from keyframes in step S103 of FIG. 4 described above is described with reference to FIG. 8.

FIG. 8 is a flowchart of a method for extracting image feature points according to an embodiment of the present invention.

As shown in FIG. 8, when the location estimation server 200 receives a plurality of keyframes from the robot 100, it extracts a plurality of image feature points from a first keyframe (S200). After extracting the image feature points from the first keyframe, the location estimation server 200 extracts a plurality of image feature points from a second keyframe by using the optical flow from the first keyframe (S201). Here, optical flow refers to the apparent motion pattern of image objects between two consecutive video frames. Because optical flow is already well known, a detailed description is omitted in this embodiment.

The location estimation server 200 checks whether the number of image feature points extracted from the second keyframe in step S201 is greater than a preset threshold (S202).

If the number of image feature points extracted from the second keyframe is insufficient, the location estimation server 200 creates a circle centered on each image feature point extracted in step S201, with a radius of predefined length, and designates it as a mask area (S203). Because a mask area around an image feature point can be designated in various ways, a detailed description is omitted in this embodiment.

The location estimation server 200 additionally extracts a plurality of image feature points from the area of the second keyframe outside the mask area designated in step S203 (S204). It then checks whether the combined number of the additionally extracted image feature points and the image feature points extracted in step S201 is greater than the preset threshold (S205).

If the total number of image feature points extracted from the second keyframe is greater than the threshold, the location estimation server 200 extracts image feature points from a third keyframe, which follows the second keyframe, by using the optical flow from the second keyframe (S206). If the total number of image feature points extracted from the second keyframe is less than the threshold, the procedure from step S203 onward is repeated.
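
The loop of steps S202 to S205 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the detector of the embodiment: the per-pixel `score_map` stands in for whatever corner response the server actually uses, the optical-flow tracking of S201 is replaced by a fixed list of already tracked points, and the names `circular_mask`, `extract_with_mask`, `batch`, and all numeric values are hypothetical. The case where the count equals the threshold is not specified in the text and is treated here as insufficient.

```python
import numpy as np

def circular_mask(shape, points, radius):
    """True inside a circle of `radius` pixels around each (row, col) point."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for r, c in points:
        mask |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return mask

def extract_with_mask(score_map, tracked_points, threshold, radius, batch=8):
    """Add points outside the mask until the count exceeds `threshold`."""
    points = [(int(r), int(c)) for r, c in tracked_points]
    while len(points) <= threshold:                              # S202 / S205
        mask = circular_mask(score_map.shape, points, radius)    # S203
        scores = np.where(mask, -np.inf, score_map.astype(float))
        added = []
        for _ in range(batch):                                   # S204
            r, c = np.unravel_index(np.argmax(scores), scores.shape)
            if not np.isfinite(scores[r, c]):
                break  # every remaining pixel is masked
            added.append((int(r), int(c)))
            # Mask the new point as well so later picks stay spread out.
            scores[circular_mask(scores.shape, [(r, c)], radius)] = -np.inf
        if not added:
            break  # nothing left to extract; stop instead of looping forever
        points.extend(added)
    return points

# Hypothetical inputs: a random score map and 7 points tracked from the
# previous keyframe.
rng = np.random.default_rng(0)
score_map = rng.random((64, 64))
tracked = [(5, 5), (5, 30), (5, 55), (30, 5), (30, 30), (30, 55), (55, 30)]

points = extract_with_mask(score_map, tracked, threshold=14, radius=4)
print(len(points))  # 15: the 7 tracked points plus 8 added outside the masks
```

Because each added point immediately masks its own neighborhood, no two added points lie within `radius` pixels of each other or of a tracked point, which is what spreads the features across the frame.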

An example of the image feature point extraction described above is described with reference to FIG. 9.

FIG. 9 is an exemplary diagram in which image feature points are extracted from a keyframe according to an embodiment of the present invention.

FIG. 9(a) is an exemplary diagram showing 7 image feature points extracted from an arbitrary keyframe at time t. FIG. 9(b) is an exemplary diagram showing 15 image feature points extracted from an arbitrary keyframe at time t+1.

As shown in FIG. 9(a), the location estimation server 200 extracts 7 image feature points from an arbitrary keyframe at time t (①), then creates a circle around each image feature point and designates it as a mask area (②, ③). Here, the 7 image feature points extracted from the keyframe are assumed to be fewer than the threshold.

Then, as shown in FIG. 9(b), the location estimation server 200 additionally extracts 8 image feature points from the area of the keyframe outside the mask areas formed around the 7 image feature points (④). It then designates mask areas that also cover the 8 additionally extracted image feature points (⑤, ⑥).

As described above, because the embodiment of the present invention extracts image feature points using a mask, the image feature points are not confined to a specific area of the keyframe but are extracted evenly across the entire keyframe. Accordingly, more image feature points can be extracted. In addition, because image feature points that are favorable for optical flow are extracted, optical flow performance between the previous keyframe and the next keyframe can be improved.

Next, the visual vocabulary used by the vector calculation module 230 to calculate the vectors of the received keyframes is described with reference to FIG. 10.

FIG. 10 is an exemplary diagram of a visual vocabulary according to an embodiment of the present invention.

When an image captured by the robot 100 is received as shown in FIG. 10(a), the keyframe processing module 210 extracts the characteristic pixels from the image. The parts marked in red and blue are the characteristic pixels. Each such pixel value can be expressed as a vector having a direction, and the directional value can be converted into a feature descriptor through a known technique.

As shown in FIG. 10(b), for a feature descriptor, a direction histogram is obtained for the image feature point and the strongest direction is determined as the feature point direction. When similar feature descriptors are clustered, the keyframe processing module 210 represents them as a small number of representative visual words for performing tf-idf. This small set of representative visual words corresponds to FIG. 10(c).
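
How clustered descriptors feed a tf-idf weighted BoW vector can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: the visual words are given as already computed cluster centers (the clustering step itself is not shown), descriptors are toy 8-bit integers compared by Hamming distance in the manner of binary ORB descriptors, and the weighting uses one common tf-idf variant (term count divided by descriptor count, times log(N/df)). The names `hamming`, `quantize`, and `tfidf_bow` are hypothetical.

```python
import math
from collections import Counter

def hamming(a, b):
    """Number of differing bits between two integer-coded binary descriptors."""
    return bin(a ^ b).count("1")

def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: hamming(descriptor, vocabulary[i]))

def tfidf_bow(keyframe_descriptors, vocabulary):
    """Return {keyframe id: {word index: tf-idf weight}}."""
    words = {k: [quantize(d, vocabulary) for d in ds]
             for k, ds in keyframe_descriptors.items()}
    n_keyframes = len(words)
    # Document frequency: in how many keyframes each word appears.
    df = Counter(w for ws in words.values() for w in set(ws))
    vectors = {}
    for k, ws in words.items():
        tf = Counter(ws)
        vectors[k] = {w: (count / len(ws)) * math.log(n_keyframes / df[w])
                      for w, count in tf.items()}
    return vectors

# Assumed cluster centers standing in for the visual vocabulary.
vocabulary = [0b00000000, 0b11110000, 0b00001111]
# Toy descriptors for two keyframes.
keyframes = {
    "kf1": [0b00000001, 0b11110001],
    "kf2": [0b00000010, 0b00001110],
}

vectors = tfidf_bow(keyframes, vocabulary)
print(vectors["kf1"])
```

In this toy data word 0 occurs in both keyframes and therefore receives weight 0, while the words unique to one keyframe keep a positive weight, so distinctive words dominate the comparison.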

For example, when the robot 100 collects keyframe images while moving around a hotel lobby, the feature descriptors of the frame of a guest-room door, the passage leading to the elevator, a room number sign, and the like can be the feature descriptors with the strongest direction. Accordingly, a keyframe having such features can be regarded as the visual word to which the feature descriptors of the distinctive feature points belong.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

Claims

1. A location estimation server that operates in conjunction with an autonomous driving robot and provides comparison target frames so that the robot can estimate its location, the server comprising:
a keyframe processing module that receives, from the autonomous driving robot, a plurality of keyframes selected from a specific image and extracts, from each of the plurality of keyframes, a plurality of first image feature points and first feature descriptors for the plurality of first image feature points;
a vector calculation module that calculates a keyframe vector of each of the plurality of keyframes and generates a keyframe index vector including second image feature points of a plurality of stored keyframes;
a keyframe index vector generation module that generates a keyframe index by aligning, to the keyframe index vector, keyframes including first feature descriptors corresponding to the second image feature points; and
a comparison candidate keyframe selection module that calculates a weight of the keyframe index and selects, from among the plurality of stored keyframes stored in advance, a plurality of comparison candidate keyframes that the autonomous driving robot compares with the plurality of keyframes to recognize its position.

2. The location estimation server of claim 1, further comprising:
a keyframe storage module that stores the plurality of keyframes, first image feature points, and first feature descriptors processed by the keyframe processing module, and the second image feature points and second feature descriptors for the stored keyframes.

3. The location estimation server of claim 1,
wherein the keyframe processing module
designates a mask area for each of a plurality of image feature points extracted from a first keyframe among the plurality of keyframes, and
extracts a plurality of additional image feature points from a region other than the mask area in a second keyframe following the first keyframe.

4. The location estimation server of claim 1,
wherein the vector calculation module
acquires a visual vocabulary from each of the plurality of keyframes and calculates the vectors of the received keyframes by a Bag of Words (BoW) method based on the acquired visual vocabulary.

5. The location estimation server of claim 4,
wherein the keyframe index vector generation module
generates the keyframe index by applying term frequency-inverse document frequency (tf-idf) to the first image feature points.

6. An autonomous driving robot that interworks with a location estimation server, the robot comprising:
a camera that captures video;
a keyframe generation module that selects a plurality of keyframes for estimating the position of the robot from among a plurality of frames generated from the captured video;
an interface that transmits the plurality of keyframes to the location estimation server and receives, from the location estimation server, a plurality of comparison candidate keyframes selected based on the plurality of keyframes; and
a keyframe position recognition module that recognizes the position of the autonomous driving robot by comparing the plurality of keyframes with the plurality of comparison candidate keyframes.

7. The autonomous driving robot of claim 6, further comprising:
an image processing unit that generates a plurality of images from the video captured by the camera and generates the plurality of frames for each of the plurality of images.

8. The autonomous driving robot of claim 7, further comprising:
a keyframe map generator that, when the position of the autonomous driving robot is recognized by the keyframe position recognition module, generates a keyframe map using the plurality of keyframes used for position recognition.

9. A method by which a location estimation server estimates the position of an autonomous driving robot, the method comprising:
receiving a plurality of received keyframes generated based on an image in the autonomous driving robot;
extracting a plurality of first image feature points and a first feature descriptor for the first image feature points from each of the received keyframes;
calculating, for each of a plurality of previously stored keyframes, a vector of the stored keyframe based on a visual vocabulary;
generating a keyframe index by arranging the stored keyframes in which second image feature points constituting the calculated vector coincide with the first image feature points; and
selecting a comparison candidate keyframe from among the plurality of stored keyframes by calculating weights for second feature descriptors of the stored keyframes included in the keyframe index.

10. The method of claim 9,
wherein extracting the first feature descriptor comprises:
generating a feature descriptor space based on the first image feature points and the first feature descriptor; and
generating a first visual vocabulary from the first image feature points by clustering the generated feature descriptor space.

11. The method of claim 10,
wherein extracting the first feature descriptor comprises:
extracting a first keyframe image feature point from a first keyframe among the received keyframes;
extracting a second keyframe image feature point from a second keyframe adjacent to the first keyframe;
designating a mask area using the second keyframe image feature point as a reference point when the number of extracted second keyframe image feature points is less than a preset threshold; and
extracting an additional second keyframe image feature point from an area other than the mask area in the second keyframe.

12. The method of claim 11,
wherein calculating the vector of the stored keyframes comprises
calculating a BoW vector based on a second visual vocabulary corresponding to the stored keyframes.

13. A method by which an autonomous driving robot linked with a location estimation server estimates a location, the method comprising:
generating a plurality of images from a captured image and generating frames for the generated plurality of images;
selecting a plurality of keyframes for estimating the position of the robot from among the generated frames;
transmitting the selected plurality of keyframes to the location estimation server;
receiving, from the location estimation server, a comparison candidate keyframe selected based on the plurality of keyframes; and
estimating a position using the received comparison candidate keyframe and the selected plurality of keyframes.

14. The method of claim 13, further comprising,
after estimating the location,
generating a keyframe map using the plurality of keyframes.