KR102321998B1

KR102321998B1 - Method and system for estimating position and direction of image

Info

Publication number: KR102321998B1
Application number: KR1020190114009A
Authority: KR
Inventors: 김덕화; 이동환
Original assignee: 네이버랩스 주식회사
Priority date: 2019-09-17
Filing date: 2019-09-17
Publication date: 2021-11-04
Also published as: KR20210032678A

Abstract

Disclosed are a method and system for estimating the position and orientation of an image in a manner that is robust to environmental changes. The image-based positioning method includes: synthesizing a query image received with a positioning request through an image synthesis model; selecting a reference image for positioning using the query image and the synthesized image; and estimating a pose for the query image using feature points extracted from the query image and the synthesized image and feature points extracted from the reference image.

Description

Method and system for estimating the position and orientation of an image that is robust to environmental changes

The description below relates to image-based positioning technology.

Various location-based services are provided to wireless terminals. That is, services such as displaying a map, providing navigation, or tracking a location are provided based on where the wireless terminal is located.

Accurate positioning technology is required to provide location-based services indoors and outdoors. For example, Korean Registered Patent Publication No. 10-0723680 (registered May 23, 2007) discloses a technique for measuring the location of a mobile communication terminal using GPS in indoor and outdoor environments.

GPS-based positioning computes the current position by analyzing GPS signals received from GPS satellites, but its accuracy is not high.

Accordingly, as demand for high-accuracy positioning grows, research and development on measuring position from images is being actively pursued. However, image-based positioning is sensitive to the environmental changes that can appear in an image, such as moving objects or people, weather, illuminance, and time.

A method and system for image-based positioning that is robust to various environmental changes are provided.

A method and system capable of estimating an accurate 6-DOF (six degrees of freedom) pose based on an image are provided.

Provided is an image-based positioning method executed on a computer system, the computer system comprising at least one processor configured to execute computer-readable instructions contained in a memory, the image-based positioning method comprising: synthesizing, by the at least one processor, a query image according to a positioning request through an image synthesis model; selecting, by the at least one processor, a reference image for positioning using the query image and the synthesized image; and estimating, by the at least one processor, a pose for the query image using feature points extracted from the query image and the synthesized image and feature points extracted from the reference image.

According to one aspect, the synthesizing may generate a synthesized image through image translation using the image synthesis model.

According to another aspect, the synthesizing may synthesize the query image using a generative adversarial network (GAN) model that converts a night image into a day image.

According to another aspect, the GAN model may be configured as a network that transforms at least one feature among the color, texture, and gradient of an image.

According to another aspect, the selecting may include: searching for at least one first candidate image matching the query image using a global feature extracted from the query image; searching for at least one second candidate image matching the synthesized image using a global feature extracted from the synthesized image; and selecting at least one of the first candidate image and the second candidate image as the reference image.

According to another aspect, the selecting of at least one of the first candidate image and the second candidate image as the reference image may select some of the first and second candidate images as the reference image based on their matching scores with the query image or the synthesized image.

According to another aspect, the selecting may further include excluding, from among the first candidate image and the second candidate image, any image whose associated time information indicates that a predetermined time or more has elapsed, so that it is not selected as the reference image.
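The time-based exclusion in the last aspect can be pictured as a simple filter over candidates. The following is a minimal sketch under assumptions that are not in the source: candidates are represented as `(image_id, captured_at)` tuples, the helper name `exclude_stale` is hypothetical, and the cutoff `max_age` stands in for the unspecified predetermined time.

```python
from datetime import datetime, timedelta

def exclude_stale(candidates, now, max_age):
    """Keep only candidates captured less than `max_age` before `now`.

    `candidates` is a list of (image_id, captured_at) tuples. The tuple
    layout and the cutoff rule are assumptions made for illustration.
    """
    return [(image_id, captured_at)
            for image_id, captured_at in candidates
            if now - captured_at < max_age]

now = datetime(2019, 9, 17)
candidates = [
    ("ref_001", datetime(2019, 9, 1)),   # recent capture, kept
    ("ref_002", datetime(2017, 1, 5)),   # older than the cutoff, excluded
]
fresh = exclude_stale(candidates, now, max_age=timedelta(days=365))
```

Only the candidates that survive this filter would remain eligible for selection as reference images.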

According to another aspect, the estimating may include: obtaining visual correspondence information between feature points of the query image or the synthesized image and feature points of the reference image; and calculating a 6-DOF pose corresponding to the query image based on the visual correspondence information.

According to another aspect, the calculating of the 6-DOF pose may calculate the 6-DOF pose using a re-projection error measurement method, which computes the error between feature points based on the correspondence between the feature points of the query image or the synthesized image and the feature points of the reference image.

Provided is a computer program stored in a non-transitory computer-readable recording medium to cause the computer system to execute the image-based positioning method.

Provided is a non-transitory computer-readable recording medium on which a program for causing a computer to execute the image-based positioning method is recorded.

Provided is a computer system comprising at least one processor configured to execute computer-readable instructions contained in a memory, the at least one processor comprising: an image synthesis unit that synthesizes a query image according to a positioning request through an image synthesis model; a candidate selection unit that selects a reference image for positioning using the query image and the synthesized image; and a pose estimation unit that estimates a pose for the query image using feature points extracted from the query image and the synthesized image and feature points extracted from the reference image.

According to embodiments of the present invention, an accurate positioning result can be provided by synthesizing an image that is robust to environmental changes and using the synthesized image for positioning.

According to embodiments of the present invention, an accurate 6-DOF pose can be estimated by synthesizing an image that is robust to environmental changes and then performing feature-point-based pose estimation using the synthesized image.

FIG. 1 is a block diagram illustrating an example of the internal configuration of a computer system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of components that a processor of a computer system according to an embodiment of the present invention may include.
FIG. 3 is a flowchart illustrating an example of an image-based positioning method that a computer system according to an embodiment of the present invention can perform.
FIGS. 4 and 5 illustrate examples of GAN models for image synthesis according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram for explaining a process of synthesizing a night image into a day image according to an embodiment of the present invention.
FIG. 7 illustrates an example of a process of selecting candidate images as reference images for positioning according to an embodiment of the present invention.
FIGS. 8 to 10 illustrate an example of a process of estimating a 6-DOF pose according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to a technique for estimating the position and orientation of an image in a manner that is robust to environmental changes.

Embodiments, including those specifically disclosed herein, can estimate an accurate 6-DOF pose by using an image that is robust to environmental changes for positioning, and thereby achieve significant advantages for image-based positioning in terms of accuracy, precision, optimization, and other aspects.

FIG. 1 is a block diagram illustrating an example of a computer system according to an embodiment of the present invention. For example, an image-based positioning system according to embodiments of the present invention may be implemented by the computer system 100 shown in FIG. 1.

As shown in FIG. 1, the computer system 100 may include, as components for executing the image-based positioning method according to embodiments of the present invention, a memory 110, a processor 120, a communication interface 130, and an input/output interface 140.

The memory 110 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as ROM or a disk drive may also be included in the computer system 100 as a separate permanent storage device distinct from the memory 110. An operating system and at least one program code may be stored in the memory 110. These software components may be loaded into the memory 110 from a computer-readable recording medium separate from the memory 110. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In another embodiment, the software components may be loaded into the memory 110 through the communication interface 130 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 110 of the computer system 100 based on a computer program installed from files received over the network 160.

The processor 120 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 120 by the memory 110 or the communication interface 130. For example, the processor 120 may be configured to execute received instructions according to program code stored in a recording device such as the memory 110.

The communication interface 130 may provide a function for the computer system 100 to communicate with other devices over the network 160. For example, requests, commands, data, files, and the like that the processor 120 of the computer system 100 generates according to program code stored in a recording device such as the memory 110 may be transmitted to other devices over the network 160 under the control of the communication interface 130. Conversely, signals, commands, data, files, and the like from other devices may be received by the computer system 100 over the network 160 through the communication interface 130. Signals, commands, and data received through the communication interface 130 may be passed to the processor 120 or the memory 110, and files may be stored in a storage medium (the permanent storage device described above) that the computer system 100 may further include.

The communication method is not limited and may include not only communication over a network that the network 160 may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcasting network) but also short-range wired or wireless communication between devices. For example, the network 160 may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), the Internet, and the like. The network 160 may also include any one or more network topologies, including but not limited to a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network.

The input/output interface 140 may be a means for interfacing with the input/output device 150. For example, an input device may include a microphone, keyboard, camera, or mouse, and an output device may include a display or speaker. As another example, the input/output interface 140 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 150 may be configured as a single device together with the computer system 100.

In other embodiments, the computer system 100 may include fewer or more components than those shown in FIG. 1. However, most conventional components need not be explicitly illustrated. For example, the computer system 100 may be implemented to include at least some of the input/output devices 150 described above, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, a camera, various sensors, and a database. As a more specific example, the computer system 100 may be implemented as a server system that estimates the position and orientation of an image based on an image received from a wireless terminal or a mobile robot, and may further include the various components that a server system generally includes.

The present invention relates to image-based positioning (visual localization) technology and is applicable to both indoor and outdoor positioning.

FIG. 2 is a diagram illustrating an example of components that a processor of a computer system according to an embodiment of the present invention may include, and FIG. 3 is a flowchart illustrating an example of an image-based positioning method that a computer system according to an embodiment of the present invention can perform.

As shown in FIG. 2, the processor 120 may include an image synthesis unit 201, a candidate selection unit 202, and a pose estimation unit 203. These components of the processor 120 may be representations of different functions performed by the processor 120 according to control instructions provided by at least one program code. For example, the image synthesis unit 201 may be used as a functional representation of the processor 120 operating to control the computer system 100 to synthesize an image that is robust to environmental changes.
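Since the three units are described as functional representations rather than hardware, one way to picture them is as three callables chained in order. The sketch below is illustrative only: the class name `Localizer`, its field names, and the stub lambdas are hypothetical and stand in for real models that the source does not specify at this level.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Localizer:
    synthesize: Callable[[Any], Any]                      # role of image synthesis unit 201
    select_references: Callable[[Any, Any], List[Any]]    # role of candidate selection unit 202
    estimate_pose: Callable[[Any, Any, List[Any]], Any]   # role of pose estimation unit 203

    def localize(self, query):
        # Synthesize first, then retrieve references with both images,
        # then estimate the pose from query, synthesized, and reference images.
        synthetic = self.synthesize(query)
        references = self.select_references(query, synthetic)
        return self.estimate_pose(query, synthetic, references)

# Stub components standing in for the real models (hypothetical).
localizer = Localizer(
    synthesize=lambda q: f"day({q})",
    select_references=lambda q, s: [f"ref_for_{q}", f"ref_for_{s}"],
    estimate_pose=lambda q, s, refs: {"query": q, "num_refs": len(refs)},
)
result = localizer.localize("night.jpg")
```

Keeping the units as injected callables mirrors the description of them as separable functions of one processor; any of the stubs could be swapped for a real model without changing the control flow.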

The processor 120 and its components may perform steps S310 to S340 of the image-based positioning method of FIG. 3. For example, the processor 120 and its components may be implemented to execute instructions according to the code of the operating system included in the memory 110 and the at least one program code described above. Here, the at least one program code may correspond to the code of a program implemented to carry out the image-based positioning method.

The steps of the image-based positioning method need not occur in the order shown; some steps may be omitted, and additional steps may be included.

The processor 120 may load program code stored in a program file for the image-based positioning method into the memory 110. For example, the program file for the image-based positioning method may be stored in a permanent storage device distinct from the memory 110, and the processor 120 may control the computer system 100 so that the program code is loaded over a bus from the program file stored in the permanent storage device into the memory 110. The processor 120 and the image synthesis unit 201, candidate selection unit 202, and pose estimation unit 203 included in the processor 120 may each be a different functional representation of the processor 120 that executes the corresponding portion of the program code loaded into the memory 110 in order to carry out the subsequent steps S310 to S340. To execute steps S310 to S340, the processor 120 and its components may directly process operations according to control instructions or may control the computer system 100.

In step S310, when a query image is received with a positioning request, the image synthesis unit 201 may synthesize the query image into an image that is robust to environmental changes. For example, the image synthesis unit 201 may generate, from the query image, an image robust to environmental changes (hereinafter, a 'synthesized image') through an image synthesis model (for example, a generative adversarial network, GAN). The image synthesis unit 201 may input an image taken at night (hereinafter, a 'night image') as the query image into the image synthesis model and synthesize it into an image taken during the day (hereinafter, a 'day image').

In step S320, the candidate selection unit 202 may select, as reference images for the query image and the image synthesized in step S310, candidate images that are highly related to the query image and the synthesized image from a map DB for visual localization (VL) (the permanent storage device described above). For example, the candidate selection unit 202 may extract a global feature from each of the query image and the synthesized image, compare it with the global feature of each image in the VL map DB, and select images with high matching scores as candidate images.
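The retrieval in step S320 can be sketched as two top-k searches whose results are merged by matching score. This is a minimal sketch under stated assumptions: global features are plain vectors, the matching score is cosine similarity, and the function names `top_k` and `select_candidates` as well as the `k` and `keep` parameters are hypothetical. The source does not specify the descriptor or the scoring function.

```python
import numpy as np

def top_k(descriptor, db_descriptors, k):
    """Return [(db_index, cosine_score)] for the k best database matches."""
    d = descriptor / np.linalg.norm(descriptor)
    db = db_descriptors / np.linalg.norm(db_descriptors, axis=1, keepdims=True)
    scores = db @ d
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

def select_candidates(query_desc, synth_desc, db_descriptors, k=2, keep=2):
    """Merge candidates retrieved with both descriptors and keep the best scoring."""
    best = {}
    hits = top_k(query_desc, db_descriptors, k) + top_k(synth_desc, db_descriptors, k)
    for idx, score in hits:
        # A database image found by both searches keeps its higher score.
        best[idx] = max(score, best.get(idx, -1.0))
    ranked = sorted(best.items(), key=lambda item: -item[1])
    return ranked[:keep]

# Toy map DB of four global descriptors (made-up values).
db = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.7, 0.7, 0.0]])
query_desc = np.array([0.9, 0.1, 0.0])   # descriptor of the raw query image
synth_desc = np.array([0.1, 0.9, 0.1])   # descriptor of the synthesized image
selected = select_candidates(query_desc, synth_desc, db)
```

In this toy data the raw query retrieves image 0 and the synthesized image retrieves image 1, so the merged list contains a candidate that the query alone would have ranked poorly, which is the motivation for searching with both images.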

In step S330, the pose estimation unit 203 may extract visual correspondence information between the query image and the synthesized image on the one hand and the candidate images selected in step S320 on the other. For example, the pose estimation unit 203 may extract feature points, which are local features, from each of the query image, the synthesized image, and the candidate images, and then extract visual correspondence information between the feature points of those images.
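The source does not state how correspondences between feature points are established. One common choice, used here purely as an assumption, is mutual nearest-neighbor matching of local descriptors; the function name and the toy descriptors below are hypothetical.

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return index pairs (i, j) where a[i] and b[j] are each other's nearest descriptor."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    dists = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = dists.argmin(axis=1)   # best match in b for every descriptor in a
    nn_ba = dists.argmin(axis=0)   # best match in a for every descriptor in b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy local descriptors: three from the query side, two from a reference image.
query_desc = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.6]])
ref_desc = np.array([[0.1, 0.9], [0.9, 0.1]])
matches = mutual_nearest_neighbors(query_desc, ref_desc)
```

The third query descriptor is closest to reference 0, but reference 0 prefers query 1, so that one-sided match is dropped. The same routine could be run once with the query image's descriptors and once with the synthesized image's descriptors.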

In step S340, the pose estimation unit 203 may estimate, as the positioning result for the query image, the 6-DOF pose corresponding to the query image based on the visual correspondence information extracted in step S330. For example, the pose estimation unit 203 may calculate the 6-DOF pose using a re-projection error measurement method, which computes the error between feature points based on feature points that are local features of the images.
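A re-projection error can be illustrated with a pinhole camera model. The sketch below assumes, beyond what the source states, that matched reference feature points come with 3D coordinates from the map, that the camera intrinsics `K` are known, and that the error is the mean pixel distance; all numeric values are made up.

```python
import numpy as np

def reprojection_error(points_3d, points_2d, K, R, t):
    """Mean pixel distance between observed 2D points and projected 3D points."""
    cam = points_3d @ R.T + t            # world -> camera coordinates
    proj = cam @ K.T                     # camera -> homogeneous pixel coordinates
    pixels = proj[:, :2] / proj[:, 2:3]  # perspective division
    return float(np.linalg.norm(pixels - points_2d, axis=1).mean())

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points_3d = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 4.0]])
# Pixel observations generated with the pose (R, t) above.
observed = np.array([[320.0, 240.0], [570.0, 240.0], [320.0, 365.0]])

err_true = reprojection_error(points_3d, observed, K, R, t)
err_shifted = reprojection_error(points_3d, observed, K, R, np.array([0.1, 0.0, 0.0]))
```

A pose solver would search over the rotation `R` and translation `t` for the pose that minimizes this error over the matched points; the wrong translation in `err_shifted` yields a visibly larger error than the generating pose.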

A specific embodiment of the process of synthesizing the query image into an image that is robust to environmental changes is described below.

Image-to-image translation

A GAN, as an image-to-image translation approach, generates an output conditioned on a given input.

The CycleGAN model, a GAN model that extends image-to-image translation to an unsupervised framework, does not require aligned image pairs. An adversarial training model includes a generator network and a discriminator network, where the discriminator D is trained to distinguish one or more pairs of real and generated samples, and the generator G is trained to fool the discriminator D with its generated samples.

FIG. 4 is an exemplary diagram for explaining the CycleGAN model according to an embodiment of the present invention.

Referring to FIG. 4, the CycleGAN model 40 may consist of two pairs of a generator 41 and a discriminator 42, namely (G_A, D_A) and (G_B, D_B). Here, the translators between domains A and B are G_A: A→B and G_B: B→A. D_A is trained to distinguish a real image a from a translated image G_B(b), and D_B is trained to distinguish an image b from G_A(a). The CycleGAN model 40 is trained using both an adversarial loss and a cycle-consistency loss (cycle loss). The cycle-consistency loss regularizes the problem of translating an image in one direction by encouraging G_A and G_B to be inverses of each other, such that G_B(G_A(a)) ≈ a and G_A(G_B(b)) ≈ b.

The objective of the CycleGAN model 40 may be expressed as Equations 1 to 3.

[Equation 1]

[Equation 2]

[Equation 3]
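The bodies of Equations 1 to 3 are not reproduced in this text. For orientation only, the standard CycleGAN objective, written in the notation used above (G_A: A→B judged by D_B, and G_B: B→A judged by D_A), takes the following form. This is the commonly published formulation and is an assumption here; the patent's own equations may differ in form or weighting, and the weight λ is not given in the source.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G_A, D_B) &= \mathbb{E}_{b}\big[\log D_B(b)\big]
  + \mathbb{E}_{a}\big[\log\big(1 - D_B(G_A(a))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G_A, G_B) &= \mathbb{E}_{a}\big[\lVert G_B(G_A(a)) - a \rVert_1\big]
  + \mathbb{E}_{b}\big[\lVert G_A(G_B(b)) - b \rVert_1\big] \\
\mathcal{L}(G_A, G_B, D_A, D_B) &= \mathcal{L}_{\mathrm{GAN}}(G_A, D_B)
  + \mathcal{L}_{\mathrm{GAN}}(G_B, D_A)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G_A, G_B)
\end{aligned}
```

The term for the opposite direction is obtained by swapping the roles of the two domains in the first line. The generators minimize the total while the discriminators maximize it.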

Place recognition and localization

Place recognition is a location classification task: identifying the actual location shown in an image of a particular place.

Image-based positioning (VL) is the process of identifying the camera position (and orientation) by comparison against a local or global feature map. One approach to image-based positioning (VL) is to use image retrieval to find, among pose-tagged images, the image most similar to the query. In this case, changes in camera pose are undesirable because the camera pose plays a very important role in the pose computation.

Image translation for image-based positioning (VL)

In embodiments of the present invention, a GAN model is used that addresses the localization problem between a night image, which is the query image, and a set of pose-tagged day images. The GAN model is trained to translate between the day image domain and the night image domain, and is applied in the night-to-day direction, converting a night image into a day image. A per-image feature vector can then be obtained for both the translated image and the reference images. A nearest neighbor search returns the reference image that best matches the query image, and the pose of the query image is approximated by the pose of that reference image.
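The nearest-neighbor pose approximation described above can be sketched in a few lines. The feature vectors, the Euclidean distance metric, the pose tuple layout `(x, y, z, roll, pitch, yaw)`, and the function name `approximate_pose` are all assumptions for illustration; the source does not specify them.

```python
import numpy as np

def approximate_pose(query_vec, ref_vecs, ref_poses):
    """Return (index, pose) of the reference whose feature vector is closest to the query."""
    dists = np.linalg.norm(ref_vecs - query_vec, axis=1)
    idx = int(dists.argmin())
    return idx, ref_poses[idx]

# Per-image feature vectors of three pose-tagged day images (made-up values).
ref_vecs = np.array([[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]])
# Hypothetical pose tags: (x, y, z, roll, pitch, yaw).
ref_poses = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (5.0, 0.0, 0.0, 0.0, 0.0, 1.57),
    (2.5, 2.5, 0.0, 0.0, 0.0, 0.78),
]
# Feature vector of the night query after night-to-day translation.
translated_night_vec = np.array([0.9, 0.2])
idx, pose = approximate_pose(translated_night_vec, ref_vecs, ref_poses)
```

Because the returned pose is simply copied from the best-matching reference, its accuracy is bounded by how densely the pose-tagged images sample the scene.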

In the present invention, an image synthesis model obtained by modifying the CycleGAN model (40) may be applied to synthesize the query image into an image that is robust to environmental changes.

The generator network of the image synthesis model is the same as the generator (41) used in the CycleGAN model (40), except that it is split in half, with the first half serving as an encoder and the second half as a decoder. For two domains, the structure and training procedure of the generator network are identical to those of the CycleGAN model (40).

The discriminator network may be implemented by modifying the discriminator (42) used in the CycleGAN model (40).

FIG. 5 illustrates an example of a discriminator network structure for image translation according to an embodiment of the present invention.

Referring to FIG. 5, in the discriminator network of the image synthesis model according to the present invention, the discriminator (52) of each domain may consist of three network clones, each operating on a different representation of the input image (blurred RGB, grayscale, and xy-gradient). Each discriminator (52) outputs decisions at its convolution layers that cover different receptive-field sizes.

One of the three network clones takes an RGB image (502) obtained by blurring the input image (501) with a 5×5, 3σ Gaussian kernel; another takes the luminance data (503) of the input image (501); and the third takes the horizontal/vertical gradient data (504) of the input image (501). With this configuration, the clones of the discriminator (52) can focus on color, texture, and gradient, respectively. The network clones are identical in architecture and hyperparameters, and their losses are averaged with equal weight.

The network clone that takes the horizontal/vertical gradient data (504) mimics the process of extracting a scale-invariant feature transform (SIFT) descriptor. In that process, the input image (501) is converted to grayscale and downsampled by a factor of two by skipping every other pixel. Gradients are then computed using a [-1 0 1] kernel for the x direction and its transpose for the y direction, after which a histogram of magnitude-weighted gradients is generated.

Accordingly, the image synthesis model according to the present invention may obtain the downsampled image using a 1×1 stride-2 convolution and combine it with the two filters, so that the discriminator receives the same xy-gradients computed in a different way. Using the discriminator (52) structured as above, relevant features that do not exist in the original can be created in the translated version so that they can be matched, while the relevant features of the original image are preserved in the cyclic reconstruction.
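As a rough illustration of the xy-gradient input described above, the sketch below converts an image to grayscale, keeps every other pixel, and applies the [-1 0 1] kernel and its transpose. The function name `xy_gradient_input`, the BT.601 luminance weights, and the zero padding at the borders are assumptions made for this example; the embodiment realizes the same operations with convolution layers.

```python
import numpy as np

def xy_gradient_input(rgb):
    """rgb: H x W x 3 float array. Returns a 2 x H/2 x W/2 array holding (gx, gy)."""
    # Grayscale conversion (BT.601 weights are an assumption for this sketch).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Downsample by a factor of two by skipping every other pixel,
    # which is what a 1x1 stride-2 convolution does.
    small = gray[::2, ::2]
    # Zero-pad by one pixel so the gradient maps keep the downsampled size.
    padded = np.pad(small, 1)
    # [-1 0 1] along x: right neighbor minus left neighbor.
    gx = padded[1:-1, 2:] - padded[1:-1, :-2]
    # Transposed kernel along y: lower neighbor minus upper neighbor.
    gy = padded[2:, 1:-1] - padded[:-2, 1:-1]
    return np.stack([gx, gy])

# A horizontal intensity ramp: the value of each pixel equals its column index.
ramp = np.tile(np.arange(8.0)[None, :, None], (8, 1, 3))
grads = xy_gradient_input(ramp)
print(grads.shape)
```

For the ramp image, the interior x-gradients are constant and the interior y-gradients are zero, as expected for an image that varies only horizontally.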

In addition, the gradient magnitude and orientation may be included in the discriminator (52), and a label/decision is output after each downsampling layer. Discriminating the image at multiple scales, rather than at a single arbitrary receptive-field size, keeps the result consistent at both low and high image levels.

The image synthesis model according to the present invention outputs multiple decisions for each discriminator (52). Because the complexity and performance of the network's predictions increase with depth, the final loss is weighted linearly in ascending order toward the last layer. This can be viewed as n outputs with weights [1, 2, …, n] in ascending layer order, whose weighted sum is divided by the sum of the weights, 1 + 2 + … + n = n(n+1)/2, to yield the average.
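The depth-weighted averaging of the per-layer decisions can be sketched as below, assuming the divisor is the sum of the weights 1 through n. The function name `depth_weighted_loss` and the sample loss values are hypothetical.

```python
import numpy as np

def depth_weighted_loss(layer_losses):
    """Average per-layer losses with weights 1..n in ascending layer order."""
    n = len(layer_losses)
    weights = np.arange(1, n + 1)
    # Weighted sum divided by the sum of the weights, n(n+1)/2.
    return float(np.dot(weights, layer_losses) / weights.sum())

# Three decisions, from the shallowest layer to the deepest.
loss = depth_weighted_loss([0.9, 0.6, 0.3])
print(loss)
```

With these sample values the deepest layer contributes three times the weight of the shallowest one.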

The image synthesis model according to the present invention applies a modified loss formula for the discriminator (52): instead of judging in absolute terms whether an input is realistic, the discriminator is trained to judge how realistic a real input is relative to a fake one. This is intended to stabilize training overall by preventing the discriminator (52) from becoming too powerful relative to the generator (41).

Equation 4 defines the least-squares GAN loss obtained by applying Equation 1 to the relativistic loss formula.

[Equation 4]
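Equation 4 is not reproduced in this text. Purely as an illustration of the idea, the sketch below shows one commonly used relativistic least-squares formulation, in which the discriminator is rewarded when a real input scores one unit higher than a fake input and the generator is rewarded for the opposite. This form, the function name `relativistic_ls_losses`, and the sample scores are assumptions and may differ from the patent's Equation 4.

```python
import numpy as np

def relativistic_ls_losses(d_real, d_fake):
    """Return (discriminator_loss, generator_loss) for paired real/fake scores."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    # Discriminator: push the real score to exceed the fake score by 1.
    loss_d = np.mean((d_real - d_fake - 1.0) ** 2)
    # Generator: push the fake score to exceed the real score by 1.
    loss_g = np.mean((d_fake - d_real - 1.0) ** 2)
    return float(loss_d), float(loss_g)

loss_d, loss_g = relativistic_ls_losses([1.0], [0.0])
print(loss_d, loss_g)
```

Because only the difference between the two scores matters, the discriminator never judges an input in isolation, which is the property the paragraph above relies on to keep it from overpowering the generator.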

The image synthesis unit (201) may generate an image that is robust to environmental changes using the image synthesis model described with reference to FIGS. 4 and 5. As shown in FIG. 6, the image synthesis unit (201) may synthesize a nighttime image (601), input as the query image, into a daytime image (602) through the GAN model (60).

Although the above describes using a GAN model, which is one type of image synthesis model, to synthesize the query image into an image robust to environmental changes, the invention is not limited thereto, and any technique capable of image translation may be applied. For example, an image robust to environmental changes may also be synthesized by adjusting the brightness of the query image.
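The brightness-adjustment alternative is not specified further. One simple possibility, shown here only as an assumption, is gamma correction on intensities normalized to [0, 1]; the function name `brighten` and the default gamma are hypothetical.

```python
import numpy as np

def brighten(image, gamma=0.5):
    """image: float array with values in [0, 1]. A gamma below 1 brightens dark regions."""
    return np.clip(image, 0.0, 1.0) ** gamma

dark = np.array([0.25, 1.0])
bright = brighten(dark)
print(bright)
```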

FIG. 7 illustrates an example of a process of selecting candidate images as reference images for positioning according to an embodiment of the present invention.

Referring to FIG. 7, the candidate selection unit (202) may select at least one reference image matching the query image (601), which is the original image, as a first candidate image group, and may select at least one reference image matching the image (602) synthesized through the GAN model (60) as a second candidate image group. Here, the candidate selection unit (202) may extract global features from the query image (601) and the synthesized image (602) using a deep learning model and then use the extracted features to search for matching images in a map DB for VL. For each of the first and second candidate image groups, either a predetermined number of images or all images with at least a predetermined score may be selected, based on the matching score with the query image (601) or the synthesized image (602), respectively.

The candidate selection unit (202) may determine the final reference images (703) using the first candidate image group and the second candidate image group. For example, all images in both groups may be used as the final reference images (703). As another example, the images in both groups may be sorted by matching score, and only a predetermined number of top-scoring candidate images may be used as the final reference images (703).

In doing so, the candidate selection unit (202) may exclude from selection as a final reference image (703) any image in the first or second candidate image group whose associated time information (e.g., image creation time or image update time) is older than a predetermined period.

Accordingly, the candidate selection unit (202) may select the reference images (703) using the synthesized image (602) together with the query image (601).
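The merging of the two candidate groups, the score-based ranking, and the age-based exclusion can be sketched as follows. The tuple layout `(image_id, score, timestamp)`, the function name `select_references`, and the rule of keeping the higher score when an image appears in both groups are assumptions made for this example.

```python
from datetime import datetime, timedelta

def select_references(first_group, second_group, now, max_age, top_k):
    """Each candidate is (image_id, matching_score, timestamp). Returns ranked image ids."""
    best = {}
    for image_id, score, stamp in first_group + second_group:
        # Exclude images whose time information is older than the allowed period.
        if now - stamp > max_age:
            continue
        # If an image appears in both groups, keep its higher score (an assumption).
        if image_id not in best or score > best[image_id]:
            best[image_id] = score
    # Sort by matching score and keep only the top-scoring candidates.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]

now = datetime(2019, 9, 17)
first = [("ref_1", 0.62, datetime(2019, 9, 1)), ("ref_2", 0.55, datetime(2017, 1, 1))]
second = [("ref_3", 0.91, datetime(2019, 8, 20)), ("ref_1", 0.80, datetime(2019, 9, 1))]
selected = select_references(first, second, now, timedelta(days=365), top_k=2)
print(selected)
```

Here `ref_2` is dropped because it is older than one year, and the remaining candidates are ranked by score.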

FIG. 8 illustrates an example of a process of estimating a 6-DOF pose according to an embodiment of the present invention.

Referring to FIG. 8, the pose estimation unit (203) may extract local features (2D feature points) from the query image (601) and the synthesized image (602), and local features (3D feature points) from the final reference image (703). For example, the pose estimation unit (203) may extract feature points, which are visual features, from an image using a feature extraction algorithm such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), FAST (Features from Accelerated Segment Test), BRIEF (Binary Robust Independent Elementary Features), or ORB (Oriented FAST and Rotated BRIEF).

The pose estimation unit (203) may then obtain correspondences between feature points in different images using the descriptor information of the feature points extracted from the query image (601) and the synthesized image (602) and of those extracted from the final reference image (703). As shown in FIG. 9, the pose estimation unit (203) may obtain the correspondences between the two images (601, 703) from the result of matching feature points between the query image (601) (or the synthesized image (602)) and the final reference image (703).
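Descriptor-based matching of feature points can be sketched as below. The text does not state how descriptors are compared; the Euclidean distance and the nearest/second-nearest ratio test used here are a common choice and are assumptions, as are the function name `match_descriptors` and the toy two-dimensional descriptors.

```python
import numpy as np

def match_descriptors(desc_q, desc_r, ratio=0.8):
    """Return (query_index, reference_index) pairs that pass a ratio test.

    desc_q: N x D query descriptors, desc_r: M x D reference descriptors (M >= 2).
    """
    # Pairwise Euclidean distances between every query and reference descriptor.
    dists = np.linalg.norm(desc_q[:, None, :] - desc_r[None, :, :], axis=2)
    matches = []
    for qi, row in enumerate(dists):
        order = np.argsort(row)
        nearest, second = order[0], order[1]
        # Keep the match only if the nearest neighbor is clearly better than the runner-up.
        if row[nearest] < ratio * row[second]:
            matches.append((qi, int(nearest)))
    return matches

desc_r = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_q = np.array([[0.1, 0.0], [9.9, 0.2], [5.0, 5.0]])
matches = match_descriptors(desc_q, desc_r)
print(matches)
```

The third query descriptor is equidistant from all reference descriptors, so it is rejected as ambiguous.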

The pose estimation unit (203) may then calculate the 6-DOF pose using a re-projection error measure, which computes the error between corresponding feature points that are local features of the two images (601, 703).

Referring to FIG. 10, the re-projection error can be defined as in Equation 5 using the correspondences obtained between the different images.

[Equation 5]

Equation 5 is written in terms of a projection function; a 4×4 transformation matrix representing the 6-DOF pose; the 3D point of a feature point of the final reference image (703); and the 2D pixel coordinates of the corresponding feature point of the query image (601) or the synthesized image (602).
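A re-projection error of this kind can be sketched as follows, assuming a pinhole projection with an intrinsic matrix `K` and taking the error as the observed pixel minus the projected point. The sign convention, the names `project` and `reprojection_error`, and the sample camera parameters are assumptions; the patent's Equation 5 may use a different notation.

```python
import numpy as np

def project(K, point_cam):
    """Pinhole projection of a 3D point in camera coordinates to pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]

def reprojection_error(K, T, point_3d, pixel_2d):
    """Observed 2D pixel minus the projection of the transformed 3D point."""
    # Apply the 4x4 transformation matrix to the homogeneous 3D point.
    point_cam = (T @ np.append(point_3d, 1.0))[:3]
    return pixel_2d - project(K, point_cam)

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4)  # identity pose for the example
err = reprojection_error(K, T, np.array([0.2, -0.1, 2.0]), np.array([372.0, 214.0]))
print(err)
```

With the identity pose, the 3D point projects to pixel (370, 215), so the error against the observed pixel (372, 214) is (2, -1).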

Summing the error over all correspondences yields an energy function of the form given in Equation 6.

[Equation 6]

The transformation matrix can be calculated so as to minimize this energy function. A screw displacement technique, as in Equation 7, may be used to calculate the transformation matrix.

[Equation 7]

Equation 7 is written in terms of a rotation matrix, a translation vector, and a twist vector whose components are the rotation velocity about each axis and the translation velocity along each axis.
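The relationship between a twist vector and a 4×4 transformation matrix can be illustrated with the standard closed-form exponential map on SE(3). This is a sketch under assumptions: the ordering of the twist as (rotation velocities, translation velocities) and the names `hat` and `twist_to_transform` are choices made for this example and are not taken from Equation 7.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def twist_to_transform(xi):
    """Map a twist (wx, wy, wz, vx, vy, vz) to a 4x4 rigid transformation matrix."""
    w = np.asarray(xi[:3], dtype=float)
    v = np.asarray(xi[3:], dtype=float)
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # First-order approximation near zero rotation.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta ** 2
        c = (theta - np.sin(theta)) / theta ** 3
        R = np.eye(3) + a * W + b * (W @ W)   # Rodrigues' rotation formula
        V = np.eye(3) + b * W + c * (W @ W)   # couples rotation into translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

T_shift = twist_to_transform([0.0, 0.0, 0.0, 1.0, 2.0, 3.0])     # pure translation
T_turn = twist_to_transform([0.0, 0.0, np.pi / 2, 0.0, 0.0, 0.0])  # 90 degrees about z
print(T_shift[:3, 3])
print(np.round(T_turn[:3, :3], 6))
```

Parameterizing the pose by six twist components lets an optimizer update the pose without having to enforce the orthonormality of the rotation matrix explicitly.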

The pose estimation unit (203) may calculate the 6-DOF pose, that is, the transformation matrix, so as to minimize the energy function, and may do so incrementally using a Gauss-Newton-based non-linear optimization technique.
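An incremental Gauss-Newton pose refinement of this kind can be sketched as below. Several details are assumptions made for the example: the energy is taken as the sum of squared re-projection errors, the projection is a pinhole model with intrinsics `K`, the Jacobian is obtained by finite differences rather than analytically, and each update is applied as a left-multiplied twist increment. The names `exp_se3`, `residuals`, and `gauss_newton_pose` are hypothetical.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(xi):
    """Map a twist (wx, wy, wz, vx, vy, vz) to a 4x4 transform (assumed ordering)."""
    w, v = np.asarray(xi[:3], dtype=float), np.asarray(xi[3:], dtype=float)
    theta, W = np.linalg.norm(w), hat(w)
    if theta < 1e-8:  # first-order approximation near zero rotation
        R, V = np.eye(3) + W, np.eye(3) + 0.5 * W
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta ** 2
        c = (theta - np.sin(theta)) / theta ** 3
        R = np.eye(3) + a * W + b * (W @ W)
        V = np.eye(3) + b * W + c * (W @ W)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ v
    return T

def residuals(T, K, pts3d, pix2d):
    """Stacked re-projection errors (observed pixel minus projected point)."""
    cam = pts3d @ T[:3, :3].T + T[:3, 3]
    uvw = cam @ K.T
    return (pix2d - uvw[:, :2] / uvw[:, 2:3]).ravel()

def gauss_newton_pose(K, pts3d, pix2d, iters=15, eps=1e-6):
    """Refine the pose from the identity by repeated Gauss-Newton steps."""
    T = np.eye(4)
    for _ in range(iters):
        r = residuals(T, K, pts3d, pix2d)
        # Finite-difference Jacobian with respect to a left-multiplied twist increment.
        J = np.zeros((r.size, 6))
        for k in range(6):
            d = np.zeros(6)
            d[k] = eps
            J[:, k] = (residuals(exp_se3(d) @ T, K, pts3d, pix2d) - r) / eps
        # Normal equations: (J^T J) delta = -J^T r.
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        T = exp_se3(delta) @ T
    return T

# Synthetic check: eight non-coplanar 3D points observed under a known pose.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts3d = np.array([[x, y, z] for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (4.0, 6.0)])
T_true = exp_se3([0.05, -0.03, 0.02, 0.1, -0.05, 0.2])
uvw = (pts3d @ T_true[:3, :3].T + T_true[:3, 3]) @ K.T
pix2d = uvw[:, :2] / uvw[:, 2:3]
T_est = gauss_newton_pose(K, pts3d, pix2d)
print(np.round(T_est, 4))
```

A production implementation would typically use an analytic Jacobian, a robust cost to suppress wrong correspondences, and a convergence test instead of a fixed iteration count.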

Through the above process, the pose estimation unit (203) may thus calculate the 6-DOF pose for the query image (601) based on the correspondence information of feature points obtained between different images.

As described above, according to embodiments of the present invention, synthesizing an image that is robust to environmental changes and using the synthesized image makes it possible to retrieve accurate reference images and to provide accurate positioning results. Furthermore, according to embodiments of the present invention, an accurate 6-DOF pose can be estimated by synthesizing an image robust to environmental changes and then performing feature-point-based pose estimation using the synthesized image together with the query image.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

1. An image-based positioning method executed on a computer system,
the computer system comprising at least one processor configured to execute computer-readable instructions contained in a memory,
the image-based positioning method comprising:
synthesizing, by the at least one processor, a query image according to a positioning request through an image synthesis model;
selecting, by the at least one processor, a reference image for positioning using the query image and the synthesized image; and
estimating, by the at least one processor, a pose for the query image using feature points extracted from the query image and the synthesized image and feature points extracted from the reference image,
wherein the selecting comprises:
searching for at least one first candidate image matching the query image using a global feature extracted from the query image;
searching for at least one second candidate image matching the synthesized image using a global feature extracted from the synthesized image; and
selecting at least one of the first candidate image and the second candidate image as the reference image.

2. The image-based positioning method of claim 1,
wherein the synthesizing comprises generating the synthesized image through image translation using the image synthesis model.

3. The image-based positioning method of claim 1,
wherein the synthesizing comprises synthesizing the query image through a generative adversarial network (GAN) model that converts a nighttime image into a daytime image.

4. The image-based positioning method of claim 3,
wherein the GAN model comprises a network that transforms at least one feature among color, texture, and gradient of an image.

5. (Deleted)

6. The image-based positioning method of claim 1,
wherein selecting at least one of the first candidate image and the second candidate image as the reference image comprises
selecting some candidate images from among the first candidate image and the second candidate image as the reference image based on a matching score with the query image or the synthesized image.

7. The image-based positioning method of claim 1,
wherein the selecting further comprises
excluding, from selection as the reference image, any image among the first candidate image and the second candidate image whose associated time information is older than a predetermined period.

8. The image-based positioning method of claim 1,
wherein the estimating comprises:
obtaining visual correspondence information between the feature points of the query image or the synthesized image and the feature points of the reference image; and
calculating a 6-degrees-of-freedom (6-DOF) pose corresponding to the query image based on the visual correspondence information.

9. The image-based positioning method of claim 8,
wherein calculating the 6-DOF pose comprises
calculating the 6-DOF pose using a re-projection error measure that computes an error between feature points based on the correspondences between the feature points of the query image or the synthesized image and the feature points of the reference image.

10. A computer program stored in a non-transitory computer-readable recording medium for causing the computer system to execute the image-based positioning method of any one of claims 1 to 4 and 6 to 9.

11. A non-transitory computer-readable recording medium having recorded thereon a program for causing a computer to execute the image-based positioning method of any one of claims 1 to 4 and 6 to 9.

12. A computer system comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor comprises:
an image synthesis unit that synthesizes a query image according to a positioning request through an image synthesis model;
a candidate selection unit that selects a reference image for positioning using the query image and the synthesized image; and
a pose estimation unit that estimates a pose for the query image using feature points extracted from the query image and the synthesized image and feature points extracted from the reference image,
wherein the candidate selection unit:
searches for at least one first candidate image matching the query image using a global feature extracted from the query image;
searches for at least one second candidate image matching the synthesized image using a global feature extracted from the synthesized image; and
selects at least one of the first candidate image and the second candidate image as the reference image.

13. The computer system of claim 12,
wherein the image synthesis unit generates the synthesized image through image translation using the image synthesis model.

14. The computer system of claim 12,
wherein the image synthesis unit synthesizes the query image through a GAN model that converts a nighttime image into a daytime image.

15. The computer system of claim 14,
wherein the GAN model comprises a network that transforms at least one feature among color, texture, and gradient of an image.

16. (Deleted)

17. The computer system of claim 12,
wherein the candidate selection unit selects some candidate images from among the first candidate image and the second candidate image as the reference image based on a matching score with the query image or the synthesized image.

18. The computer system of claim 12,
wherein the candidate selection unit excludes, from selection as the reference image, any image among the first candidate image and the second candidate image whose associated time information is older than a predetermined period.

19. The computer system of claim 12,
wherein the pose estimation unit
obtains visual correspondence information between the feature points of the query image or the synthesized image and the feature points of the reference image, and
calculates a 6-DOF pose corresponding to the query image based on the visual correspondence information.

20. The computer system of claim 19,
wherein the pose estimation unit calculates the 6-DOF pose using a re-projection error measure that computes an error between feature points based on the correspondences between the feature points of the query image or the synthesized image and the feature points of the reference image.