JP2016526313A

JP2016526313A - Monocular visual SLAM with general and panorama camera movements

Info

Publication number: JP2016526313A
Application number: JP2016511811A
Authority: JP
Inventors: Christian Pirchheim; Dieter Schmalstieg; Gerhard Reitmayr
Original assignee: Qualcomm Incorporated
Priority date: 2013-04-30
Filing date: 2014-04-29
Publication date: 2016-09-01
Anticipated expiration: 2034-04-29
Also published as: CN105210113B; US20140320593A1; US9674507B2; CN105210113A; WO2014179349A1; KR20160003066A; JP6348574B2; EP2992505A1

Abstract

Disclosed are a system, apparatus, and method for monocular visual simultaneous localization and mapping (SLAM) that handles general 6DOF and panorama camera movements. A 3D map of an environment is received that contains features with finite or infinite depth observed in regular or panorama keyframes. The camera is tracked in 6DOF from finite, infinite, or mixed feature sets. When panorama camera movement towards unmapped scene regions is detected, a reference panorama keyframe with infinite features is created and inserted into the 3D map. When the panorama camera movement extends further towards unmapped scene regions, the reference keyframe is extended with additional dependent panorama keyframes. Panorama keyframes are robustly localized in 6DOF with respect to finite 3D map features. Localized panorama keyframes contain 2D observations of infinite map features that are matched with 2D observations in other localized keyframes. The 2D-2D correspondences are triangulated, resulting in new finite 3D map features.

Description

Cross-reference to related applications
This application claims the benefit of U.S. Provisional Application No. 61/817,808, filed April 30, 2013, which is expressly incorporated herein by reference.

The subject matter disclosed herein relates generally to simultaneous localization and mapping.

A visual simultaneous localization and mapping (SLAM) system can process the input of a single camera and continuously build a three-dimensional (3D) model of an environment (e.g., a SLAM map) as the camera moves in six degrees of freedom (6DOF). A visual SLAM system can simultaneously track the position and orientation (pose) of the camera with respect to the 3D model. A keyframe-based visual SLAM system can process discretely selected frames from the incoming camera image stream or feed. Keyframe-based visual SLAM systems assume general camera motion and apply structure-from-motion techniques to create 3D feature maps.

Visual SLAM systems may require sufficient parallax, induced by translational or general camera motion between keyframe pairs, to triangulate 3D map features. Therefore, a selection algorithm may reject candidate frames whose motion relative to previously selected keyframes degenerates to rotation only. Rotation-only camera motion towards unmapped regions can cause a visual SLAM system to stall because no new keyframes are selected. Camera tracking may ultimately fail because the map becomes unavailable. As a result, the visual SLAM system may be forced into a relocalization mode in order to resume tracking. Improved tracking and mapping techniques are therefore desirable.

Embodiments disclosed herein may relate to a method for monocular visual simultaneous localization and mapping. In one embodiment, a machine-implemented method for image processing receives a 3D map of an environment. In one embodiment, the 3D map includes features with finite depth observed in two or more keyframes, where each keyframe is a panorama keyframe or a regular keyframe. The 3D map also includes features with infinite depth observed in one or more panorama keyframes. In one embodiment, the method tracks the camera in six degrees of freedom (6DOF) from finite or infinite depth features of the 3D map observed within an image frame from an input image feed.

Embodiments disclosed herein may relate to an apparatus for monocular visual simultaneous localization and mapping. The apparatus can include means for receiving a 3D map of an environment. In one embodiment, the 3D map includes features with finite depth observed in two or more keyframes, where each keyframe is a panorama keyframe or a regular keyframe. The 3D map also includes features with infinite depth observed in one or more panorama keyframes. In one embodiment, the apparatus can include means for tracking the camera in six degrees of freedom (6DOF) from finite or infinite depth features of the 3D map observed within an image frame from an input image feed.

Embodiments disclosed herein may relate to a device for monocular visual simultaneous localization and mapping, the device comprising hardware and software to receive a 3D map of an environment. The device can process instructions to receive a three-dimensional (3D) map of the environment. In one embodiment, the 3D map includes features with finite depth observed in two or more keyframes, where each keyframe is a panorama keyframe or a regular keyframe. The 3D map also includes features with infinite depth observed in one or more panorama keyframes. In one embodiment, the device can process instructions to track the camera in six degrees of freedom (6DOF) from finite or infinite depth features of the 3D map observed within an image frame from an input image feed.

Embodiments disclosed herein may relate to a non-transitory storage medium storing instructions that, in response to being executed by a processor in a device, receive a 3D map of an environment. The medium can store instructions to receive a three-dimensional (3D) map of the environment. In one embodiment, the 3D map includes features with finite depth observed in two or more keyframes, where each keyframe is a panorama keyframe or a regular keyframe. The 3D map also includes features with infinite depth observed in one or more panorama keyframes. In one embodiment, the medium can store instructions to track the camera in six degrees of freedom (6DOF) from finite or infinite depth features of the 3D map observed within an image frame from an input image feed.

Other features and advantages will be apparent from the accompanying drawings and from the detailed description.

FIG. 1 is a block diagram of a system in which aspects of the invention may be practiced, in one embodiment.
FIG. 2 is a flow diagram of Hybrid SLAM, in one embodiment.
FIG. 3 illustrates a first stage of the Hybrid SLAM map representation between keyframes and features, in one embodiment.
FIG. 4 illustrates a second stage of the Hybrid SLAM map representation between keyframes and features, in one embodiment.
FIG. 5 is a flow diagram of Hybrid SLAM initialization, in one embodiment.
FIG. 6 illustrates 6DOF and panorama mapping and tracking phases with alternating general and pure-rotation camera motion, in one embodiment.
FIG. 7 is a state diagram for the different states of keyframe selection during mapping, in one embodiment.
FIG. 8 is a block diagram of a Hybrid SLAM system including tracking and mapping components, in one embodiment.

The word "exemplary" or "example" is used herein to mean "serving as an example, instance, or illustration." Any aspect or embodiment described herein as "exemplary" or as an "example" is not necessarily to be construed as preferred or advantageous over other aspects or embodiments.

In one embodiment, the functionality of 6DOF SLAM and panoramic SLAM may be combined into a robust, motion-hybrid, keyframe-based SLAM system that can accept both fully triangulated keyframes for normal 6DOF operation and keyframes with only rotational constraints. In one embodiment, Hybrid SLAM (HSLAM) can cope with pure rotation and provide a seamless tracking experience to the user. In one embodiment, HSLAM mapping uses 6DOF and panorama keyframes to estimate new parts of a three-dimensional (3D) map (e.g., a global SLAM map). HSLAM can continuously track the 3D map throughout rotations away from the mapped part of a scene, and can use information from camera images observed during rotation-only motion (e.g., panorama tracking) to update the 3D map. In one embodiment, HSLAM can be implemented with a single camera sensor as a type of monocular visual SLAM. As described below, HSLAM operations may be performed by device 100 under the control of a processor to implement the functionality described herein.

FIG. 1 is a block diagram illustrating a system in which embodiments of the invention may be practiced. The system may be a device 100, which may include a general purpose processor 161, an image processing module 171, a 6DOF SLAM module 173, a panorama module 175, and a memory 164. The device 100 may also include a number of device sensors coupled to one or more buses 177 or signal lines further coupled to at least the image processing module 171, the 6DOF SLAM module 173, and the panorama SLAM module 175. Modules 171, 173, and 175 are illustrated separately from processor 161 and/or hardware 162 for clarity, but they may be combined and/or implemented in the processor 161 and/or hardware 162 based on instructions in the software 165 and the firmware 163. The control unit 160 can be configured to implement the methods of performing Hybrid SLAM described below. For example, the control unit 160 can be configured to implement the functions of the mobile device 100 described in FIG. 2.

The device 100 may be a mobile device, wireless device, cell phone, augmented reality (AR) device, personal digital assistant, wearable device (e.g., eyeglasses, watch, headwear, or similar body-worn device), mobile computer, tablet, personal computer, laptop computer, data processing device/system, or any type of device that has processing capabilities.

In one embodiment, the device 100 is a mobile/portable platform. The device 100 can include a means for capturing an image, such as a camera 114, and may optionally include motion sensors 111, such as accelerometers, gyroscopes, an electronic compass, or other similar motion sensing elements. The device 100 may also capture images with a front-facing or rear-facing camera (e.g., camera 114). The device 100 may further include a user interface 150 that includes a means for displaying an augmented reality image, such as a display 112. The user interface 150 may also include a keyboard, keypad 152, or other input device through which the user can input information into the device 100. If desired, the keyboard or keypad 152 may be omitted by integrating a virtual keypad into the display 112 with a touch screen/sensor. The user interface 150 may also include a microphone 154 and a speaker 156, for example, when the device 100 is a mobile platform such as a cellular telephone. The device 100 may include other elements unrelated to the present disclosure, such as a satellite position system receiver and a power device (e.g., a battery), as well as other components typically associated with portable and non-portable electronic devices.

The device 100 may function as a mobile or wireless device and may communicate via one or more wireless communication links through a wireless network that is based on or otherwise supports any suitable wireless communication technology. For example, in some aspects, the device 100 may be a client or a server and may associate with a wireless network. In some aspects, the network may comprise a body area network or a personal area network (e.g., an ultra-wideband network). In some aspects, the network may comprise a local area network or a wide area network. A wireless device may support or otherwise use one or more of a variety of wireless communication technologies, protocols, or standards such as, for example, 3G, LTE, Advanced LTE, 4G, CDMA, TDMA, OFDM, OFDMA, WiMAX, and Wi-Fi. Similarly, a wireless device may support or otherwise use one or more of a variety of corresponding modulation or multiplexing schemes. A mobile wireless device may communicate wirelessly with other mobile devices, cell phones, other wired and wireless computers, Internet web sites, and so on.

As described above, the device 100 can be a portable electronic device (e.g., a smartphone, a dedicated augmented reality (AR) device, a game device, or another device with AR processing and display capabilities). A device implementing the AR system described herein may be used in a variety of environments (e.g., shopping malls, streets, offices, homes, or anywhere a user may use their device). Users can interface with multiple features of their device 100 in a wide variety of situations. In an AR context, a user may use their device to view a representation of the real world through the display of their device. A user may interact with their AR-capable device by using the device's camera to receive real-world images/video and processing the images in a way that superimposes additional or alternate information onto the real-world images/video displayed on the device. As a user views an AR implementation on their device, real-world objects or scenes may be replaced or altered in real time on the device display. Virtual objects (e.g., text, images, video) may be inserted into the representation of a scene depicted on the device display.

In one embodiment, HSLAM can perform 6DOF SLAM, which includes tracking and mapping of the global SLAM map as described above. HSLAM can maintain a single SLAM map (i.e., the global SLAM map), and both 6DOF SLAM and panoramic SLAM can access and update the global SLAM map.

In some embodiments, HSLAM, through 6DOF SLAM (e.g., as a dedicated 6DOF module 173), can produce keyframes from captured images. HSLAM may produce a keyframe upon determining that a captured image meets a threshold translation from the previous keyframes already associated with the global SLAM map.

In one embodiment, 6DOF SLAM (e.g., 6DOF tracking) can associate features observed from keyframes with the global SLAM map. 6DOF SLAM (e.g., 6DOF tracking) can use the feature associations to determine the camera position and orientation (i.e., pose) related to a respective camera image. 6DOF mapping can also update/maintain the global SLAM map. As discussed above, the global SLAM map maintained by 6DOF SLAM may contain 3D feature points triangulated from two or more keyframes (e.g., a keyframe pair, or more than one pair of keyframes). For example, keyframes may be selected from an image or video stream or feed to represent an observed scene. For every keyframe, HSLAM can compute a respective 6DOF camera pose associated with the image. The computed pose may be referred to herein as a keyframe pose (consisting of a 3DOF keyframe position and a 3DOF keyframe orientation).

Panoramic SLAM, as used herein, refers to stitching together multiple captured images into a cohesive collection of images taken with rotation-only camera motion. HSLAM using panoramic SLAM (e.g., panorama tracking of the panorama module 175) may calculate three rotational degrees of freedom (3DOF), compared to the 6DOF of 6DOF SLAM (i.e., as calculated by the 6DOF module 173). HSLAM may relate panorama keyframes to each other using relative rotations. HSLAM may bypass or skip feature point triangulation when a minimum threshold parallax or translation is not met. For example, when the position of the camera has not changed and only pure rotation has occurred since the previous keyframes, the minimum threshold parallax or translation will not be met.

HSLAM may compare a current keyframe with a previously captured keyframe to determine the parallax or translation level. Because features observed without sufficient parallax cannot be triangulated, panorama feature points may be considered rays (i.e., infinite features, infinite depth features, features without an estimated depth, or features with infinite depth). In one embodiment, 3D points generated from 6DOF SLAM are referred to as finite depth features (e.g., the features can have a specified or estimated depth).

Conventional 6DOF SLAM may not be able to process pure-rotation camera movements. Tracking may be lost, and in some situations erroneously measured finite features may corrupt the map (e.g., the global SLAM map). In contrast, conventional panoramic SLAM handles rotational motion, but translational motion may be encoded as additional rotation, which also degrades map quality.

In one embodiment, HSLAM combines the advantages of 6DOF SLAM and panoramic SLAM into a hybrid system that can dynamically switch between 6DOF SLAM and panoramic SLAM, depending on the nature of the motion. For example, a user may make a motion that is either a general motion or a pure rotation. HSLAM can handle temporary rotations away from the mapped part of the scene, which users often make in practice. HSLAM can also incorporate the observations of the scene made during rotational motion into a later 3D mapping step, if sufficient additional information becomes available.

In one embodiment, HSLAM can use 6DOF tracking to determine the camera pose for one or more image or video frames. HSLAM can determine the camera pose by projecting features from the 3D map into an image or video frame and updating the camera pose from verified 2D-3D correspondences. HSLAM can also select new keyframes for insertion into the map. HSLAM can insert a new keyframe into the 3D map if the current camera position (i.e., translation) is sufficiently far away from every existing keyframe position. HSLAM can also insert a new keyframe into the map if the coverage of the current frame with known features is below a threshold (e.g., a new or previously unmapped region of the 3D map is represented in the current frame). Additionally, HSLAM can insert a keyframe if the current camera orientation is sufficiently far from the existing keyframe orientations and the current camera position is translated a minimum distance from the existing keyframe positions.
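By way of illustration only, the three keyframe insertion conditions described above can be sketched as a single predicate. This is a minimal sketch and not the disclosed implementation: the `Pose` type, the use of a unit viewing direction as a simplified stand-in for the full camera orientation, the reading of "existing keyframes" as "every existing keyframe", and all threshold values (`far_dist`, `min_dist`, `min_angle`, `min_coverage`) are assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    position: tuple  # camera center (x, y, z) in map coordinates
    forward: tuple   # unit viewing direction; simplified stand-in for orientation


def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def should_insert_keyframe(current, keyframes, coverage,
                           far_dist=0.5, min_dist=0.1,
                           min_angle=math.radians(20.0), min_coverage=0.5):
    """Return True if any of the three insertion conditions holds.

    coverage is the fraction (0..1) of the current frame covered by known
    map features. keyframes must be non-empty. All thresholds are
    hypothetical values chosen for illustration.
    """
    nearest = min(math.dist(current.position, k.position) for k in keyframes)
    # 1. Position is sufficiently far from every existing keyframe position.
    if nearest > far_dist:
        return True
    # 2. Coverage of the current frame by known features is below threshold.
    if coverage < min_coverage:
        return True
    # 3. Orientation differs enough from every keyframe AND the camera has
    #    translated at least a minimum distance from every keyframe.
    rotated = all(angle_between(current.forward, k.forward) > min_angle
                  for k in keyframes)
    return rotated and nearest > min_dist
```

For instance, with one keyframe at the origin looking along +z, a frame that has rotated by 90 degrees without translating satisfies none of the three conditions as long as its coverage stays above the threshold.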

Alternatively, 6DOF SLAM may bypass or skip keyframe insertion if the orientation has changed but the position has not changed sufficiently. When the orientation changes but the position does not, 6DOF SLAM may consider the movement a pure rotation. During pure rotation, 6DOF SLAM may not triangulate or insert new finite features into the map.

In one embodiment, HSLAM can trigger a real-time switch from 6DOF SLAM to panoramic SLAM when threshold tracking conditions are met. For example, the threshold tracking conditions may include: sufficient orientation change (e.g., rotating the camera view), maintained camera position (e.g., the camera location is fixed or approximately the same as for previously captured images), and low existing coverage (e.g., the captured image region is a new or previously unmapped region in the 3D map). For example, low existing coverage may be determined based on HSLAM detecting that fifty percent or less of the current image is covered by known feature points, which indicates that tracking will soon be lost if the camera view continues to turn towards new regions. In other embodiments, HSLAM may use a generalized Geometric Robust Information Criteria (i.e., GRIC) score to trigger the switch from 6DOF SLAM to panoramic SLAM.
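As a non-limiting sketch, the example threshold tracking conditions above can be combined into one check. Only the fifty percent coverage figure comes from the description; the function name, the parameter names, and the rotation and translation thresholds are hypothetical. The GRIC-based alternative is not shown.

```python
def should_switch_to_panorama(rotation_deg, translation, coverage,
                              min_rotation_deg=10.0,
                              max_translation=0.05,
                              max_coverage=0.5):
    """Decide whether to switch from 6DOF SLAM to panoramic SLAM.

    rotation_deg: orientation change relative to the last keyframe (degrees)
    translation:  distance moved since the last keyframe (map units)
    coverage:     fraction (0..1) of the image covered by known feature points
    """
    rotated = rotation_deg >= min_rotation_deg    # sufficient orientation change
    stationary = translation <= max_translation   # camera position maintained
    low_coverage = coverage <= max_coverage       # 50 percent or less is mapped
    return rotated and stationary and low_coverage
```

All three conditions must hold together: a rotating camera that is also translating stays in 6DOF SLAM, where ordinary keyframes can still be triangulated.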

Upon switching to panoramic SLAM, HSLAM can create a new keyframe containing infinite features. HSLAM may insert the infinite-feature keyframe into a database of keyframes. For example, the database of keyframes may be associated with the global SLAM map. In one embodiment, infinite-feature keyframes may be marked or identified as "panorama" keyframes. A panorama keyframe, as used herein, is a keyframe that contains infinite features, that is, features without any computed depth.

HSLAM may continue tracking infinite features and inserting additional panorama keyframes while the threshold tracking conditions or the threshold GRIC score are met. In one embodiment, HSLAM may assume that the keyframe position of every panorama keyframe is the same as that of the last 6DOF keyframe considered before the switch from 6DOF tracking to panoramic SLAM.

In one embodiment, HSLAM can use a pose refinement algorithm that processes a hybrid set of finite and infinite features together. Given a pose prior, the pose refinement algorithm may compute an updated 6DOF/3DOF pose from a set of finite and infinite map features and their corresponding two-dimensional (2D) image measurements. Incremental pose updates are computed by iteratively optimizing the reprojection error for both finite and infinite map features.
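To make the joint treatment of finite and infinite features concrete, the following sketch evaluates the reprojection cost that such a refinement would minimize. It assumes a standard pinhole camera with hypothetical intrinsics (`f`, `cx`, `cy`) and the common convention that a finite point X maps to camera coordinates as R·X + t, while an infinite feature with direction d maps as R·d, so translation does not affect it. The iterative optimizer itself is not shown, and the function names are illustrative only.

```python
def rotate(R, v):
    """Multiply a 3x3 rotation matrix (nested tuples) by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def project(R, t, feature, f=500.0, cx=320.0, cy=240.0):
    """Project a map feature into the image with a pinhole model.

    feature is ("finite", (X, Y, Z)) or ("infinite", (dx, dy, dz)).
    Features behind the camera (non-positive depth) are not handled here.
    """
    kind, xyz = feature
    p = rotate(R, xyz)
    if kind == "finite":
        # Only finite features depend on the camera translation.
        p = tuple(a + b for a, b in zip(p, t))
    return (f * p[0] / p[2] + cx, f * p[1] / p[2] + cy)


def reprojection_cost(R, t, features, measurements):
    """Sum of squared pixel errors over a mixed finite/infinite feature set."""
    cost = 0.0
    for feature, (u, v) in zip(features, measurements):
        pu, pv = project(R, t, feature)
        cost += (pu - u) ** 2 + (pv - v) ** 2
    return cost
```

Because infinite features contribute no translation-dependent term, a feature set made up only of infinite features constrains rotation alone, whereas finite features constrain the full pose.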

In one embodiment, a 6DOF pose is computed when a threshold number of finite features is available. In some embodiments, a feature set consisting purely of infinite features can result in a 3DOF pose instead of a 6DOF pose. In one embodiment, the HSLAM pose refinement algorithm allows seamless switching between panoramic SLAM and 6DOF SLAM (e.g., 6DOF tracking). HSLAM can temporarily track using infinite points and switch to finite points when they become available (e.g., when finite feature points can be determined from a captured image of the scene). In one embodiment, if tracking is lost, HSLAM can use the global SLAM map to perform relocalization. If tracking is lost, HSLAM may perform full relocalization using Small Blurry Images (SBI) for all available keyframes. Alternatively, HSLAM may perform relocalization using descriptor matching. HSLAM may attempt relocalization against the global SLAM map using 6DOF keyframes as well as panorama keyframes.
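The rule for choosing between a 6DOF and a 3DOF pose can be sketched as follows. The threshold value `min_finite` is hypothetical, since the description does not state a number. Returning `None` when no pose can be estimated, as a signal to fall back to relocalization, is also an assumption of this sketch.

```python
def estimable_pose_dof(num_finite, num_infinite, min_finite=4):
    """Return 6, 3, or None depending on the features visible in a frame.

    6:    enough finite features for a full position + orientation estimate
    3:    too few finite features, but infinite features constrain rotation
    None: no usable features; tracking is lost (relocalization needed)
    """
    if num_finite >= min_finite:
        return 6
    if num_infinite > 0:
        return 3
    return None
```

Evaluating this per frame gives the seamless behavior described above: the tracker reports 3DOF while only infinite points are visible and returns to 6DOF as soon as enough finite points reappear.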

FIG. 2 illustrates a flow diagram of hybrid SLAM in one embodiment. At block 205, an embodiment (e.g., HSLAM) receives a 3D map of an environment. For example, HSLAM can process a global SLAM map. The 3D map can have features with finite depth observed in two or more keyframes. Each keyframe may be a panoramic keyframe or a regular keyframe. The 3D map can have features with infinite depth observed in one or more panoramic keyframes.

At block 210, the embodiment tracks the camera in 6DOF from finite or infinite features of the 3D map observed in the current frame. The camera movement may be general camera movement or pure-rotation camera movement. In one embodiment, HSLAM can estimate a 6DOF pose from finite features and a 3DOF pose from infinite features. Tracking of panoramic camera movement can continue beyond the pre-existing boundaries of the received 3D map (e.g., the global SLAM map). For example, an embodiment can use panoramic SLAM to track and map new regions and add them to the received 3D map. The global SLAM map can include one or more of keyframes, triangulated feature points, and associations (observations) between keyframes and feature points.

A keyframe can consist of a captured image (e.g., an image frame captured by the device camera 114) and the camera parameters used to produce the captured image. As used herein, camera parameters include camera position and orientation (pose). The global SLAM map can include finite and infinite features. In one embodiment, HSLAM can incorporate camera images resulting from rotation-only motion into an existing 3D map when the camera images do not meet a sufficient parallax threshold or translation threshold.

In one embodiment, upon detecting pure-rotation camera motion, HSLAM selects a first panoramic keyframe as a reference keyframe (i.e., a reference panoramic keyframe). The first panoramic keyframe can be localized with respect to the 3D map. For example, HSLAM can select the first keyframe received upon detecting a transition from 6DOF camera movement to panoramic camera movement. HSLAM can select additional panoramic keyframes (e.g., dependent keyframes), which may not yet be localized. The additional panoramic keyframes can be localized with respect to the 3D map later, as part of the mapping process. HSLAM can localize the additional keyframes by establishing correspondences with existing map features (e.g., using active search and descriptor matching techniques). Once a panoramic keyframe is localized, HSLAM can convert its infinite features (i.e., infinite-depth features) by (a) matching them with features of other localized keyframes and (b) triangulating the resulting 2D-2D correspondences (e.g., the matched infinite features), yielding additional 3D map features. In turn, the new 3D map features can be used to localize other panoramic keyframes that are not yet localized.

FIG. 3 illustrates a first stage of a hybrid SLAM map representation between keyframes and features in one embodiment. The first stage shows 6DOF keyframes 320 observing finite map features 305. A local panorama map 350 can be registered in the 3D map (e.g., the global SLAM map) via a reference panoramic keyframe 330 that has observations of both finite features 305 and infinite features 310, while the remaining dependent panoramic keyframes 315 can observe infinite features 310.

FIG. 4 illustrates a second stage of a hybrid SLAM map representation between keyframes and features in one embodiment. In the second stage, infinite features 310 can be triangulated from corresponding observations matched (a) between an additional 6DOF keyframe 410 and a localized panoramic keyframe (e.g., reference panoramic keyframe 430), or (b) between localized panoramic keyframes (e.g., reference panoramic keyframes 430) from different local panorama maps (e.g., panorama map "A" 440 and panorama map "B" 450). The additional features can enable localization of other panoramic keyframes (e.g., dependent panoramic keyframes 415).

Robust localization of panoramic keyframes can be an iterative process of finding new 2D observations of finite 3D map features in the panoramic keyframes. Once sufficient 2D observations of finite 3D map features are established, a panoramic frame can be localized with a full 6DOF pose and converted to a regular (i.e., non-panoramic) keyframe. After conversion to a regular keyframe, HSLAM can triangulate additional infinite feature points (e.g., 2D features), which in turn can allow other panoramic keyframes to be localized.

FIG. 5 illustrates a flow diagram of hybrid SLAM initialization in one embodiment. At block 505, an embodiment (e.g., HSLAM) can receive a captured image. For example, the captured image may be a camera image or an image from a video feed.

At block 510, the embodiment can initialize HSLAM by generating an initial 3D map or by adding information to an existing 3D map 515, and can output a camera position and orientation (pose) 520. Initializing HSLAM can include processing one or more captured images to build a 3D map (e.g., a global SLAM map) with a consistent scale. In some embodiments, when an application starts loading on device 100, HSLAM can load a model-based detector and tracker to generate the initial map. Upon detecting a known planar image target, HSLAM can create a first 6DOF keyframe. HSLAM can continue tracking the image target and perform frame-to-frame matching of 2D-2D correspondences. A second 6DOF keyframe is selected when sufficient correspondences can be robustly triangulated. Thus, the two regular 6DOF keyframes and the resulting finite map features can constitute the initial 3D map.

The 3D map can be composed of finite and infinite point features that have 2D image observations in regular 6DOF keyframes and panoramic keyframes. Each captured image can have an associated camera pose at the time the respective image was captured by the camera. In one embodiment, HSLAM can extend the capabilities of 6DOF tracking to track the global SLAM map during pure camera rotation. In one embodiment, HSLAM can also incorporate keyframes generated during pure camera rotation into the global SLAM map.

The current camera pose can be predicted by a simple constant decaying motion model. HSLAM can select a set of features to match from all global SLAM map features by filtering features for visibility from the predicted camera pose, considering the infinite features of the current panorama map, and, where feature reprojections overlap, preferring finite features over infinite features. The embodiment can then actively search for each selected feature in the current frame using normalized cross-correlation (NCC) as the score function. Matches with a sufficiently high NCC score can be added to the correspondence set that is processed by the unified relative pose refiner. The pose refiner can output either an updated 6DOF pose or an updated 3DOF pose. If incremental pose estimation fails, the system enters relocalization, which can output a 6DOF pose.
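
The NCC score used during active search can be illustrated with a short sketch. It computes the zero-mean normalized cross-correlation between two equally sized grayscale patches given as nested lists; the function name and the patch representation are assumptions made for the example. The score is close to 1 for patches that differ only by brightness gain and offset, which is what makes it a useful match score.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Returns a value in [-1, 1]; returns 0.0 if either patch has no contrast.
    """
    a = [p for row in patch_a for p in row]
    b = [p for row in patch_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0
```

A match would then be accepted when the score exceeds a chosen threshold; the description does not give a specific value.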

FIG. 6 illustrates 6DOF and panoramic mapping and tracking phases, according to one embodiment, in which general camera motion and pure-rotation camera motion alternate. Camera motion (e.g., general camera motion) can be tracked in 6DOF from a 3D map 605 (e.g., a global SLAM map). Dropped keyframes can be used to refine and extend the 3D map. Upon switching to rotation-only camera motion 625, dropped keyframes are used to build a local panorama map 610. Camera tracking can be performed using panorama map features and 3D map features. General camera motion may break tracking, leading to 6DOF camera pose relocalization 635. General camera motion can lead back onto the 3D map, and tracking finite and infinite features allows a smooth transition 640.

FIG. 7 illustrates a state diagram for the different states of keyframe selection during mapping. After HSLAM initialization 510, the system starts operating in full 6DOF mapping mode 755. When pure rotational motion is detected 760, a new panorama map is created (e.g., 3DOF mapping 765). Pure rotational motion can be detected by HSLAM based on a history of tracked 6DOF poses. The tracked 6DOF poses can be stored in memory in chronological order. HSLAM can compute the parallax angle between the current pose and the stored poses and discard all poses with high parallax (e.g., greater than 5 degrees). A 6DOF measurement 770 can return the system to the full 6DOF mapping mode 755. If tracking fails, relocalization 775 can recover a full 6DOF pose.
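
The state diagram of FIG. 7 can be sketched as a small transition table. The state and event names are hypothetical labels chosen for the example, and the return from relocalization to full 6DOF mapping is an assumption based on the statement that relocalization recovers a full 6DOF pose. Events that do not apply in a given state leave the state unchanged.

```python
# Hypothetical state names; the numerals in comments refer to FIG. 7.
FULL_6DOF = "full_6dof_mapping"      # 755
PANORAMA_3DOF = "3dof_mapping"       # 765
RELOCALIZATION = "relocalization"    # 775

TRANSITIONS = {
    (FULL_6DOF, "pure_rotation_detected"): PANORAMA_3DOF,  # 760
    (PANORAMA_3DOF, "6dof_measurement"): FULL_6DOF,        # 770
    (FULL_6DOF, "tracking_failed"): RELOCALIZATION,
    (PANORAMA_3DOF, "tracking_failed"): RELOCALIZATION,
    (RELOCALIZATION, "pose_recovered"): FULL_6DOF,         # assumed target
}


def next_state(state, event):
    """Return the next mapping state; unknown events keep the state."""
    return TRANSITIONS.get((state, event), state)
```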

FIG. 8 is a block diagram of a hybrid SLAM system including tracking and mapping components in one embodiment. The components may be threads, engines, or modules implemented as hardware or software. In one embodiment, HSLAM can estimate a 6DOF pose from sufficient finite and infinite features, enabling tracking of general and pure-rotation camera motion. Upon determining that pure-rotation camera motion is toward unmapped scene regions, HSLAM can continue hybrid 3D and panorama map tracking 815 and can assign hybrid keyframes 845 to the 3D map 865. Upon determining that pure-rotation camera motion is toward unmapped scene regions, HSLAM can switch to pure panorama tracking 820 and can assign panoramic keyframes 850 to a local panorama map 870. Upon determining that general camera motion is toward mapped scene regions, HSLAM can transition back onto the global SLAM map (e.g., 3D map 865). Upon determining that general camera motion is toward unmapped scene regions, either tracking fails and relocalization is required, or a regular 6DOF keyframe can be selected based on sufficient parallax and low coverage. In either case, HSLAM can transition back onto the 3D map, track 810, and assign 6DOF keyframes 840 to the 3D map 865.

In one embodiment, the HSLAM pose tracking and keyframe selection component 825 can process the captured images (e.g., a video stream or feed) of a single calibrated camera and track general and rotation-only camera motion with respect to the 3D map 865 (e.g., the global SLAM map).

The tracking component can switch dynamically and seamlessly between a full 6DOF tracking mode and a panoramic tracking mode, depending on the current motion performed by the user. The tracking component can handle the temporary rotations away from the mapped part of the scene that users often make in practice. The tracking component can detect these rotations and select special "panoramic" keyframes that are used to build local panorama maps. The local panorama maps are registered in a single consistent 3D map. General and rotation-only camera motion can be tracked with respect to the global SLAM map, which can include finite and infinite features. In one embodiment, HSLAM enables robust frame-rate camera pose tracking and relocalization. Pose estimation can combine measurements of both finite features (with known 3D locations) and infinite features, and HSLAM can automatically compute either a 6DOF or a 3DOF pose update 830. In one embodiment, if incremental pose tracking fails, HSLAM can relocalize based on small blurry images.

HSLAM can extract features from keyframe images. As used herein, a feature (e.g., a feature point or interest point) is an interesting or notable part of an image. The features extracted from a captured image can represent distinct points in three-dimensional space (e.g., coordinates on the X, Y, and Z axes), and every feature point can have an associated feature location. The features in a keyframe either match (i.e., are the same as or correspond to) or fail to match the features of previously captured keyframes. Feature detection may be an image processing operation that examines every pixel to determine whether a feature exists at that pixel. Feature detection can process an entire captured image or, alternatively, certain portions or parts of the captured image.

For each captured image or video frame, once features have been detected, a local image patch around each feature can be extracted. Features can be extracted using a well-known technique such as Scale Invariant Feature Transform (SIFT), which localizes features and generates their descriptions. If desired, other techniques can be used, such as Speeded Up Robust Features (SURF), Gradient Location-Orientation Histogram (GLOH), Normalized Cross Correlation (NCC), or other comparable techniques. When the number of features extracted for an image is determined to exceed a threshold (e.g., 100 point features or another number of points), the image and features can be saved as a keyframe.

The mapping component 875 can improve map quality through data association 855 refinement and bundle adjustment optimization 860. HSLAM can perform keyframe selection to choose 6DOF and panoramic keyframes 840-850 for inclusion in the 3D map 865. The mapping component 875 can send 3D map data 835 to the tracking component 825 to assist with relocalization. In addition, HSLAM can localize panoramic keyframes and triangulate infinite features to extend the 3D map.

In one embodiment, HSLAM can run a separate mapping component (e.g., a thread, engine, or module such as the mapping component 875 described above) to improve the quality of the global SLAM map (i.e., the 3D map). For example, the mapping component 875 can perform one or more types of optimization 860 (e.g., 3D bundle adjustment). HSLAM can also estimate full 6DOF poses for panoramic keyframes and triangulate infinite features to extend the 3D map.

As part of data association refinement 855, HSLAM searches for new keyframe-feature observations to further constrain existing feature locations and keyframe poses. HSLAM can apply active search and descriptor matching techniques to establish 2D-2D correspondences. HSLAM can also detect and discard outlier observations and features.

HSLAM can robustly localize panoramic keyframes with respect to finite map features. Panoramic keyframes can be initialized with poses from panorama tracking that are considered unreliable, because a pose cannot be accurately estimated in full 6DOF from infinite features. However, by establishing correspondences to existing finite map features, HSLAM can estimate a full 6DOF pose. HSLAM thereby effectively converts panoramic keyframes into regular 6DOF keyframes.

HSLAM can leverage the information stored in local panorama maps for 3D mapping by triangulating infinite feature observations. HSLAM can use descriptor matching to find 2D-2D correspondences between robustly localized keyframes, for example keyframes in separate local panorama maps that view the same scene regions. Correspondences that pass a verification test constitute additional finite map features. HSLAM can thereby effectively convert infinite features into finite features.
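
Converting a matched pair of infinite-feature observations into a finite feature can be sketched with midpoint triangulation of two viewing rays. This is one common technique swapped in for illustration; the description does not say which triangulation method or verification test is used. The ray angle check, the `min_angle_deg` threshold, and all names are assumptions. Each ray is given by the camera center of a localized keyframe and the viewing direction of the observation in world coordinates.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2, min_angle_deg=1.0):
    """Midpoint of the closest points on two rays, or None if rejected.

    Rejects near-parallel rays (too little parallax, so the feature stays
    infinite) and intersections that lie behind either camera.
    """
    cos_angle = dot(d1, d2) / math.sqrt(dot(d1, d1) * dot(d2, d2))
    cos_angle = max(-1.0, min(1.0, cos_angle))
    if math.degrees(math.acos(cos_angle)) < min_angle_deg:
        return None
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b            # nonzero because the rays are not parallel
    s = (b * e - c * d) / denom      # distance parameter along ray 1
    t = (a * e - b * d) / denom      # distance parameter along ray 2
    if s <= 0 or t <= 0:
        return None
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```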

HSLAM can perform map optimization 860 using bundle adjustment. Bundle adjustment updates the 6DOF poses of localized keyframes and the 3D positions of finite map features by minimizing a cost function based on keyframe-feature observations. Non-localized panoramic keyframes and infinite features may not be part of the optimization. However, HSLAM can maintain map consistency by adjusting the registration of the panorama maps within the optimized 3D map.

In one embodiment, upon determining that the camera pose can be fully constrained in 6DOF, HSLAM can mark or tag the respective keyframe as a 6DOF keyframe. This is the case, for example, when enough finite feature points are part of the pose estimation, as described below with respect to pose tracking. Furthermore, HSLAM can select a regular 6DOF keyframe when the keyframe generates sufficient parallax with respect to existing keyframes while imaging a new part of the scene. Parallax can be used to ensure robust feature triangulation.

Parallax is the scale-dependent triangulation angle of a 3D point location (e.g., a finite 3D map feature) observed from two camera views (e.g., the current camera view and a keyframe camera view). HSLAM can approximate the parallax angle of the current camera view as a function of the average scene depth (e.g., the mean depth of the finite map features observed in the current frame) and the distance between the current camera location and an existing keyframe camera location. Coverage is the ratio of the image frame area that is covered by finite map features projected into a camera view (e.g., the current camera view or a keyframe camera view). HSLAM can divide the image frame into a regular grid of cells and project the finite map features using the camera pose. Grid cells that contain at least a minimum number of features are considered covered. The coverage is the ratio of the number of covered grid cells to the total number of grid cells.
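
Both measures can be sketched in a few lines. The parallax formula below, the angle subtended by the camera baseline at the mean scene depth, is an assumed approximation; the description only says that parallax is approximated as a function of mean depth and camera distance. The grid size, `min_per_cell`, and the function names are likewise hypothetical.

```python
import math


def approx_parallax_deg(baseline, mean_depth):
    """Approximate parallax angle in degrees (assumed formula)."""
    return math.degrees(2.0 * math.atan2(baseline, 2.0 * mean_depth))


def coverage(points_px, width, height, cols=4, rows=3, min_per_cell=1):
    """Fraction of grid cells holding at least min_per_cell projected features.

    points_px are (u, v) pixel positions of finite map features projected
    with the current camera pose; points outside the image are ignored.
    """
    counts = [[0] * cols for _ in range(rows)]
    for u, v in points_px:
        if 0 <= u < width and 0 <= v < height:
            counts[int(v * rows / height)][int(u * cols / width)] += 1
    covered = sum(1 for row in counts for n in row if n >= min_per_cell)
    return covered / (cols * rows)
```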

HSLAM can select regular 6DOF keyframes based on sufficient parallax and low coverage. Parallax is required for robust feature triangulation. Coverage indicates whether the current frame pose is robustly constrained by the projected map features. Low coverage indicates that the camera is observing unmapped scene regions.

When HSLAM detects low coverage and insufficient parallax between the current frame and existing keyframes, tracking may fail once 3D map features can no longer be observed in the current frame. If the camera motion is close to a pure rotation, HSLAM can trigger the selection of a localized panoramic keyframe. Low coverage can indicate that the camera is pointing toward unmapped scene regions. However, HSLAM cannot create a regular 6DOF keyframe because the parallax of pure-rotation camera motion is low. HSLAM can therefore create a panoramic keyframe that is localized with respect to the 3D map.
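
The keyframe selection criteria described above can be condensed into one decision function. The threshold values `min_parallax_deg` and `max_coverage` and the returned labels are illustrative assumptions; only the ordering of the criteria follows the text.

```python
def select_keyframe_type(parallax_deg, coverage_ratio, is_pure_rotation,
                         min_parallax_deg=5.0, max_coverage=0.5):
    """Return "regular_6dof", "panorama", or None (no keyframe selected)."""
    if coverage_ratio >= max_coverage:
        return None               # view is already well covered by the map
    if parallax_deg >= min_parallax_deg:
        return "regular_6dof"     # enough parallax for robust triangulation
    if is_pure_rotation:
        return "panorama"         # low parallax: keep the view as a panoramic keyframe
    return None
```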

HSLAM can detect pure-rotation camera motion based on a history of tracked 6DOF poses. The tracked 6DOF poses are stored in the history in chronological order. HSLAM can compute the parallax angle between the current pose and each pose in the history and discard all poses with sufficiently high parallax. The remaining history poses can have 3D locations similar to that of the current frame. Finally, if HSLAM finds a pose in the history that has low parallax and a large angle with respect to the current frame, where the angle is computed between the viewing directions, HSLAM can detect a pure rotation.
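
The history-based test can be sketched as follows, assuming each pose is stored as a camera position and a viewing direction. The parallax approximation, both thresholds, and the function names are assumptions for illustration; the description gives 5 degrees only as an example of high parallax.

```python
import math


def angle_between_deg(a, b):
    """Angle in degrees between two non-zero 3-vectors."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot_ab / norm))))


def detect_pure_rotation(history, current, mean_depth,
                         max_parallax_deg=5.0, min_rotation_deg=10.0):
    """history: list of (position, view_dir); current: (position, view_dir)."""
    cur_pos, cur_dir = current
    for pos, view_dir in history:
        baseline = math.dist(pos, cur_pos)
        parallax = math.degrees(2.0 * math.atan2(baseline, 2.0 * mean_depth))
        if parallax > max_parallax_deg:
            continue  # camera has translated noticeably since this pose
        if angle_between_deg(view_dir, cur_dir) >= min_rotation_deg:
            return True  # nearly the same location, clearly different direction
    return False
```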

HSLAM can continue to select panoramic keyframes based on low coverage and sufficient rotation. Low coverage can indicate that the camera continues to explore unmapped scene regions. HSLAM can compute the rotation as the difference angle between the viewing direction of the current frame and the keyframe poses of the current panorama map. When part of the 3D map is observed again, HSLAM can implicitly move back to more general operation. In general operation, HSLAM can apply the same criteria to create new 6DOF keyframes.

As described above, device 100 may be a portable electronic device (e.g., a smartphone, a dedicated augmented reality (AR) device, a game device, a wearable device such as eyeglasses, or another device with AR processing and display capabilities). A device implementing the AR system described herein can be used in a variety of environments, such as shopping malls, streets, rooms, or anywhere a user may take a portable device. In an AR context, a user can use device 100 to view a representation of the real world through the display of the device. The user can interact with the AR-capable device by using the device's camera to receive real-world images/video and by superimposing or overlaying additional or alternate information onto the real-world images/video displayed on the device. As the user views an AR implementation on the device, real-world objects or scenes can be replaced or altered in real time on the device display. Virtual objects (e.g., text, images, video) can be inserted into the representation of the scene depicted on the device display.

As device 100 and camera 114 move, the display can update in real time an augmentation of a target (e.g., one or more objects or scenes) in the global SLAM map. As the device moves away from the initial reference image position, the device can capture additional images from alternate views. After features are extracted and triangulated from additional keyframes, increased accuracy of the augmentation can be achieved (e.g., borders around an object can fit more precisely, the representation of the object in the scene can appear more realistic, and the target can be placed more accurately relative to the camera 114 pose).

In one embodiment, an object or graphic can be inserted or integrated into the video stream (or image) captured by camera 114 and displayed on display 112. HSLAM can optionally prompt the user for additional information to augment the target. For example, the user can add user content to augment the representation of the target. User content may be an image, 3D object, video, text, or other content type that can be integrated with, overlaid on, or substituted for the representation of the target.

The display can update in real time with seamless tracking from the original scene. For example, text on a sign can be replaced with alternate text, or a 3D object can be strategically placed in the scene and displayed on device 100. When the user changes the position and orientation of camera 114, the graphic or object can be adjusted or augmented to match the relative movement of camera 114. For example, if a virtual object is inserted into an augmented reality display, camera movement away from the virtual object can reduce the size of the virtual object in proportion to the distance traveled by camera 114. For example, taking four steps back from a virtual object causes a greater reduction in the size of the virtual object than taking a half step back, all other variables being equal. Motion graphics or animation can be animated within the scene represented by HSLAM. For example, an animated object can "move" within the scene depicted in the augmented reality display.

Those skilled in the art will recognize that the embodiments described herein can be implemented in ways other than AR (e.g., robot positioning).

HSLAM can be implemented as software, firmware, hardware, a module, or an engine. In one embodiment, the HSLAM described above can be implemented by the general purpose processor 161 in the device 100 to achieve the desired functions. In one embodiment, HSLAM can be implemented as an engine or module that may include an image processing module 171, a 6DOF module 173, and a panorama module 175 as subcomponents. In other embodiments, features of one or more of the described subcomponents can be combined or partitioned into different individual components, modules, or engines.

The teachings herein can be incorporated into (e.g., implemented within or performed by) a variety of apparatuses (e.g., devices). In one embodiment, ITC can be implemented as an engine or module executed by a processor to receive images or video as input. One or more aspects taught herein may be incorporated into a phone (e.g., a cellular phone), a personal data assistant ("PDA"), a tablet, a mobile computer, a laptop computer, an entertainment device (e.g., a music or video device), a headset (e.g., headphones, an earpiece, etc.), a media device (e.g., a biometric sensor, a heart rate monitor, a pedometer, an EKG device, etc.), a user I/O device, a computer, a server, a point-of-sale device, a set-top box, or any other suitable device. These devices may have different power and data requirements and may result in different power profiles generated for each feature or set of features.

In some aspects, a wireless device may comprise an access device (e.g., a Wi-Fi access point) for a communication system. Such an access device may provide, for example, connectivity to another network (e.g., a wide area network such as the Internet or a cellular network) through the transceiver 140 via a wired or wireless communication link. Accordingly, the access device may enable another device (e.g., a Wi-Fi station) to access the other network or some other functionality. In addition, it should be appreciated that one or both of the devices may be portable or, in some cases, relatively non-portable.

Those of skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, engines, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, engines, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary embodiments, the functions or modules described may be implemented in hardware (e.g., hardware 162), software (e.g., software 165), firmware (e.g., firmware 163), or any combination thereof. If implemented in software as a computer program product, the functions or modules may be stored on or transmitted over a non-transitory computer-readable medium as one or more instructions or code. Computer-readable media can include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer or a data processing device/system. By way of example, and not limitation, such non-transitory computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc (registered trademark), optical disc, digital versatile disc (DVD), floppy (registered trademark) disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of non-transitory computer-readable media.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

100 Device
111 Motion sensor
112 Display
114 Camera
140 Transceiver
150 User interface
152 Keypad
154 Microphone
156 Speaker
160 Control unit
161 Processor
162 Hardware
163 Firmware
164 Memory
165 Software
170 Hybrid SLAM
171 Image processing module
173 6DOF SLAM
175 Panorama SLAM
177 Bus
305 Finite map feature, finite feature
310 Infinite feature
315, 415 Dependent panoramic keyframe
320, 410 6DOF keyframe
330, 430 Reference panoramic keyframe
350, 610 Local panorama map
440 Panorama map "A"
450 Panorama map "B"
510 HSLAM initialization
605, 865 3D map
625 Rotation-only camera motion
635 6DOF camera pose relocalization
640 Smooth transition
755 Full 6DOF mapping mode
760 Pure rotational motion detection
765 3DOF mapping
770 6DOF measurement
775 Relocalization
810 Tracking
815 Hybrid 3D and panorama map tracking
820 Pure panorama tracking
825 HSLAM pose tracking and keyframe selection component
830 Pose update
835 3D map data
840 Panoramic keyframe, 6DOF keyframe
845 Hybrid keyframe
850 Panoramic keyframe
855 Data association
860 Bundle adjustment optimization
870 Local panorama map
875 Mapping component

Claims

A machine-implemented method for monocular visual simultaneous localization and mapping, the method comprising:
receiving a three-dimensional (3D) map of an environment, wherein the 3D map comprises:
features having finite depth observed in two or more keyframes, wherein each keyframe is a panoramic keyframe or a regular keyframe, and
features having infinite depth observed in one or more panoramic keyframes; and
tracking a camera in six degrees of freedom (6DOF) from finite depth features or infinite depth features of the 3D map observed within an image frame from an input image feed.
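By way of illustration only, and not as part of the claims, the 3D map recited above might be held in a structure like the following. This is a minimal sketch under stated assumptions: the class names, field names, and the `is_consistent` check are hypothetical and do not come from the specification. Finite features are assumed to carry a 3D position and infinite features only a viewing direction.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple

Vec3 = Tuple[float, float, float]


class KeyframeKind(Enum):
    REGULAR = "regular"
    PANORAMIC = "panoramic"


@dataclass
class Keyframe:
    kf_id: int
    kind: KeyframeKind


@dataclass
class MapFeature:
    feature_id: int
    position: Optional[Vec3] = None   # set for finite depth features
    bearing: Optional[Vec3] = None    # set for infinite depth features
    observed_in: Set[int] = field(default_factory=set)  # keyframe ids

    @property
    def is_finite(self) -> bool:
        return self.position is not None


@dataclass
class Map3D:
    keyframes: Dict[int, Keyframe] = field(default_factory=dict)
    features: Dict[int, MapFeature] = field(default_factory=dict)

    def is_consistent(self) -> bool:
        """Check the observation counts recited for each feature type."""
        for feat in self.features.values():
            if not all(k in self.keyframes for k in feat.observed_in):
                return False
            if feat.is_finite:
                # Finite depth: observed in two or more keyframes of any kind.
                if len(feat.observed_in) < 2:
                    return False
            else:
                # Infinite depth: observed in at least one panoramic keyframe.
                if not any(self.keyframes[k].kind is KeyframeKind.PANORAMIC
                           for k in feat.observed_in):
                    return False
        return True


demo_map = Map3D(
    keyframes={0: Keyframe(0, KeyframeKind.REGULAR),
               1: Keyframe(1, KeyframeKind.PANORAMIC)},
    features={
        10: MapFeature(10, position=(0.0, 0.0, 4.0), observed_in={0, 1}),
        11: MapFeature(11, bearing=(0.0, 0.0, 1.0), observed_in={1}),
    },
)
```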

The machine-implemented method of claim 1, further comprising:
selecting a reference panoramic keyframe upon transitioning from 6DOF camera movement to panoramic camera movement toward an unmapped scene area;
incorporating the reference panoramic keyframe into the 3D map by adding finite depth feature observations and infinite depth feature observations to the 3D map; and
initializing a local panorama map positioned within the 3D map, wherein initializing the local panorama map comprises:
assigning the reference panoramic keyframe to the local panorama map, and
positioning the local panorama map within the 3D map using the 6DOF pose of the reference panoramic keyframe.

The machine-implemented method of claim 1, further comprising:
selecting one or more dependent panoramic keyframes upon continued panoramic camera movement toward an unmapped scene area, wherein the one or more dependent panoramic keyframes depend on a reference panoramic keyframe;
incorporating the one or more dependent panoramic keyframes into the 3D map by adding infinite depth feature observations to the 3D map; and
extending the local panorama map by adding the one or more dependent panoramic keyframes to the local panorama map.

The machine-implemented method of claim 1, further comprising localizing the one or more panoramic keyframes with respect to the 3D map, wherein the localizing comprises:
finding two-dimensional (2D) observations of the finite depth features within the one or more panoramic keyframes;
determining 3D-2D correspondences between the 3D map and the 2D observations of the finite depth features; and
estimating a 6DOF camera position and orientation of the one or more panoramic keyframes using the 3D-2D correspondences.
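By way of illustration only, and not as part of the claims, one way to estimate a 6DOF pose from 3D-2D correspondences is the textbook direct linear transform (DLT). The claims do not name an estimator, so this is a swapped-in technique rather than the method of the disclosure; a practical system would more likely use a robust estimator with outlier rejection. The sketch assumes noise-free observations in normalized image coordinates (camera intrinsics already removed), at least six non-degenerate points, and a hypothetical function name.

```python
import numpy as np


def estimate_pose_dlt(points_3d, points_2d):
    """Estimate rotation R and translation t with x ~ R @ X + t (DLT sketch).

    points_3d: (n, 3) map points; points_2d: (n, 2) normalized observations.
    """
    X = np.asarray(points_3d, dtype=float)
    x = np.asarray(points_2d, dtype=float)
    if len(X) < 6 or len(X) != len(x):
        raise ValueError("need at least six matched 3D-2D correspondences")

    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous 3D points
    rows = []
    for Xi, (u, v) in zip(Xh, x):
        # Two linear constraints per correspondence on the 12 entries of P.
        rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))

    # The projection matrix is the null vector of the stacked system.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)

    # Fix the sign so the left 3x3 block has positive determinant.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P

    # Project the 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t


# Synthetic check with a known pose (all values are made up for the example).
rng = np.random.default_rng(0)
map_points = rng.uniform(-1.0, 1.0, size=(10, 3)) + np.array([0.0, 0.0, 5.0])
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.2, -0.1, 0.5])
cam_points = map_points @ R_true.T + t_true
observations = cam_points[:, :2] / cam_points[:, 2:3]
R_est, t_est = estimate_pose_dlt(map_points, observations)
```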

The machine-implemented method of claim 1, further comprising converting infinite depth features from a first localized panoramic keyframe into new finite depth features for the 3D map, wherein the converting comprises:
finding 2D observations of the infinite depth features within a second localized keyframe, wherein the second localized keyframe is a localized panoramic keyframe or a localized regular keyframe;
determining 2D-2D correspondences from the 2D observations of the second localized keyframe; and
triangulating the new finite depth features based on the 2D-2D correspondences and the 6DOF camera position and orientation of the keyframe pair.
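By way of illustration only, and not as part of the claims, the triangulation step can be sketched with standard linear two-view triangulation. The claims do not specify a triangulation method, so this is an assumed technique. The sketch takes the two keyframe poses as (R, t) with x ~ R @ X + t, uses normalized image coordinates, and refuses to triangulate when the keyframes share (almost) the same camera center, since a rotation-only pair carries no depth information. The function name and the `min_baseline` threshold are hypothetical.

```python
import numpy as np


def triangulate_feature(R1, t1, x1, R2, t2, x2, min_baseline=1e-3):
    """Return the 3D point seen at x1 and x2, or None without a baseline."""
    R1, R2 = np.asarray(R1, float), np.asarray(R2, float)
    t1, t2 = np.asarray(t1, float), np.asarray(t2, float)

    # Camera centers in map coordinates: c = -R^T t.
    c1, c2 = -R1.T @ t1, -R2.T @ t2
    if np.linalg.norm(c1 - c2) < min_baseline:
        return None  # rotation-only pair: the feature stays at infinite depth

    P1 = np.hstack([R1, t1.reshape(3, 1)])
    P2 = np.hstack([R2, t2.reshape(3, 1)])
    # Each observation contributes two linear constraints on the point.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    Xh = vt[-1]
    return Xh[:3] / Xh[3]


# Synthetic check: two keyframes one unit apart observing the same point.
point_true = np.array([0.5, 0.2, 4.0])
R_a, t_a = np.eye(3), np.zeros(3)
R_b, t_b = np.eye(3), np.array([-1.0, 0.0, 0.0])  # camera center at x = +1
obs_a = (point_true + t_a)[:2] / (point_true + t_a)[2]
obs_b = (point_true + t_b)[:2] / (point_true + t_b)[2]
point_est = triangulate_feature(R_a, t_a, obs_a, R_b, t_b, obs_b)
no_baseline = triangulate_feature(R_a, t_a, obs_a, R_a, t_a, obs_a)
```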

The machine-implemented method of claim 1, wherein the tracking further comprises:
establishing correspondences between the finite depth features and infinite depth features of the 3D map and an image frame from the input image feed; and
estimating a 6DOF camera position and orientation based on the established correspondences.

The machine-implemented method of claim 1, wherein the tracking further comprises:
switching from 6DOF camera movement tracking to panoramic camera movement tracking upon observing only infinite depth features within the image frame from the input image feed; and
switching from panoramic camera movement tracking to 6DOF camera movement tracking upon observing finite depth features within the image frame from the input image feed.
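By way of illustration only, and not as part of the claims, the mode switching recited above can be sketched as a small decision function. The names are hypothetical, and the behavior when no map features are observed at all (keep the current mode) is an assumption, since that case is not addressed above.

```python
from enum import Enum


class TrackingMode(Enum):
    SIX_DOF = "6dof"
    PANORAMIC = "panoramic"


def next_tracking_mode(current: TrackingMode,
                       n_finite: int,
                       n_infinite: int) -> TrackingMode:
    """Pick the tracking mode from the feature types seen in the frame."""
    if n_finite > 0:
        # Finite depth features observed: track full 6DOF movement.
        return TrackingMode.SIX_DOF
    if n_infinite > 0:
        # Only infinite depth features observed: track panoramic movement.
        return TrackingMode.PANORAMIC
    # Assumption: with no observed map features, keep the current mode.
    return current


mode = TrackingMode.SIX_DOF
mode = next_tracking_mode(mode, n_finite=0, n_infinite=25)  # panoramic
mode_after_finite = next_tracking_mode(mode, n_finite=3, n_infinite=25)
```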

A machine-readable non-transitory storage medium containing executable program instructions that cause a data processing device to perform a method for monocular visual simultaneous localization and mapping, the method comprising:
receiving a three-dimensional (3D) map of an environment, wherein the 3D map comprises:
features having finite depth observed in two or more keyframes, wherein each keyframe is a panoramic keyframe or a regular keyframe, and
features having infinite depth observed in one or more panoramic keyframes; and
tracking a camera in six degrees of freedom (6DOF) from finite depth features or infinite depth features of the 3D map observed within an image frame from an input image feed.

The medium of claim 8, wherein the method further comprises:
selecting a reference panoramic keyframe upon transitioning from 6DOF camera movement to panoramic camera movement toward an unmapped scene area;
incorporating the reference panoramic keyframe into the 3D map by adding finite depth feature observations and infinite depth feature observations to the 3D map; and
initializing a local panorama map positioned within the 3D map, wherein initializing the local panorama map comprises:
assigning the reference panoramic keyframe to the local panorama map, and
positioning the local panorama map within the 3D map using the 6DOF pose of the reference panoramic keyframe.

The medium of claim 8, wherein the method further comprises:
selecting one or more dependent panoramic keyframes upon continued panoramic camera movement toward an unmapped scene area, wherein the one or more dependent panoramic keyframes depend on a reference panoramic keyframe;
incorporating the one or more dependent panoramic keyframes into the 3D map by adding infinite depth feature observations to the 3D map; and
extending the local panorama map by adding the one or more dependent panoramic keyframes to the local panorama map.

The medium of claim 8, wherein the method further comprises localizing the one or more panoramic keyframes with respect to the 3D map, wherein the localizing comprises:
finding two-dimensional (2D) observations of the finite depth features within the one or more panoramic keyframes;
determining 3D-2D correspondences between the 3D map and the 2D observations of the finite depth features; and
estimating a 6DOF camera position and orientation of the one or more panoramic keyframes using the 3D-2D correspondences.

The medium of claim 8, wherein the method further comprises converting infinite depth features from a first localized panoramic keyframe into new finite depth features for the 3D map, wherein the converting comprises:
finding 2D observations of the infinite depth features within a second localized keyframe, wherein the second localized keyframe is a localized panoramic keyframe or a localized regular keyframe;
determining 2D-2D correspondences from the 2D observations of the second localized keyframe; and
triangulating the new finite depth features based on the 2D-2D correspondences and the 6DOF camera position and orientation of the keyframe pair.

The medium of claim 8, wherein the tracking further comprises:
establishing correspondences between the finite depth features and infinite depth features of the 3D map and an image frame from the input image feed; and
estimating a 6DOF camera position and orientation based on the established correspondences.

The medium of claim 8, wherein the tracking further comprises:
switching from 6DOF camera movement tracking to panoramic camera movement tracking upon observing only infinite depth features within the image frame from the input image feed; and
switching from panoramic camera movement tracking to 6DOF camera movement tracking upon observing finite depth features within the image frame from the input image feed.

A data processing device for monocular visual simultaneous localization and mapping, the device comprising:
a processor; and
a storage device coupled to the processor and configurable to store instructions that, when executed by the processor, cause the processor to:
receive a three-dimensional (3D) map of an environment, wherein the 3D map comprises:
features having finite depth observed in two or more keyframes, wherein each keyframe is a panoramic keyframe or a regular keyframe, and
features having infinite depth observed in one or more panoramic keyframes; and
track a camera in six degrees of freedom (6DOF) from finite depth features or infinite depth features of the 3D map observed within an image frame from an input image feed.

The device of claim 15, further comprising instructions that cause the processor to:
select a reference panoramic keyframe upon transitioning from 6DOF camera movement to panoramic camera movement toward an unmapped scene area;
incorporate the reference panoramic keyframe into the 3D map by adding finite depth feature observations and infinite depth feature observations to the 3D map; and
initialize a local panorama map positioned within the 3D map, wherein initializing the local panorama map comprises:
assigning the reference panoramic keyframe to the local panorama map, and
positioning the local panorama map within the 3D map using the 6DOF pose of the reference panoramic keyframe.

The device of claim 15, further comprising instructions that cause the processor to:
select one or more dependent panoramic keyframes upon continued panoramic camera movement toward an unmapped scene area, wherein the one or more dependent panoramic keyframes depend on a reference panoramic keyframe;
incorporate the one or more dependent panoramic keyframes into the 3D map by adding infinite depth feature observations to the 3D map; and
extend the local panorama map by adding the one or more dependent panoramic keyframes to the local panorama map.

The device of claim 15, further comprising instructions that cause the processor to localize the one or more panoramic keyframes with respect to the 3D map, wherein the instructions to localize cause the processor to:
find two-dimensional (2D) observations of the finite depth features within the one or more panoramic keyframes;
determine 3D-2D correspondences between the 3D map and the 2D observations of the finite depth features; and
estimate a 6DOF camera position and orientation of the one or more panoramic keyframes using the 3D-2D correspondences.

The device of claim 15, further comprising instructions that cause the processor to convert infinite depth features from a first localized panoramic keyframe into new finite depth features for the 3D map, wherein the instructions to convert cause the processor to:
find 2D observations of the infinite depth features within a second localized keyframe, wherein the second localized keyframe is a localized panoramic keyframe or a localized regular keyframe;
determine 2D-2D correspondences from the 2D observations of the second localized keyframe; and
triangulate the new finite depth features based on the 2D-2D correspondences and the 6DOF camera position and orientation of the keyframe pair.

The device of claim 15, wherein the instructions to track further cause the processor to:
establish correspondences between the finite depth features and infinite depth features of the 3D map and an image frame from the input image feed; and
estimate a 6DOF camera position and orientation based on the established correspondences.

The device of claim 15, wherein the instructions to track further cause the processor to:
switch from 6DOF camera movement tracking to panoramic camera movement tracking upon observing only infinite depth features within the image frame from the input image feed; and
switch from panoramic camera movement tracking to 6DOF camera movement tracking upon observing finite depth features within the image frame from the input image feed.

An apparatus for monocular visual simultaneous localization and mapping, the apparatus comprising:
means for receiving a three-dimensional (3D) map of an environment, wherein the 3D map comprises:
features having finite depth observed in two or more keyframes, wherein each keyframe is a panoramic keyframe or a regular keyframe, and
features having infinite depth observed in one or more panoramic keyframes; and
means for tracking a camera in six degrees of freedom (6DOF) from finite depth features or infinite depth features of the 3D map observed within an image frame from an input image feed.

The apparatus of claim 22, further comprising:
means for selecting a reference panoramic keyframe upon transitioning from 6DOF camera movement to panoramic camera movement toward an unmapped scene area;
means for incorporating the reference panoramic keyframe into the 3D map by adding finite depth feature observations and infinite depth feature observations to the 3D map; and
means for initializing a local panorama map positioned within the 3D map, wherein the means for initializing the local panorama map comprises:
means for assigning the reference panoramic keyframe to the local panorama map, and
means for positioning the local panorama map within the 3D map using the 6DOF pose of the reference panoramic keyframe.

The apparatus of claim 22, further comprising:
means for selecting one or more dependent panoramic keyframes upon continued panoramic camera movement toward an unmapped scene area, wherein the one or more dependent panoramic keyframes depend on a reference panoramic keyframe;
means for incorporating the one or more dependent panoramic keyframes into the 3D map by adding infinite depth feature observations to the 3D map; and
means for extending the local panorama map by adding the one or more dependent panoramic keyframes to the local panorama map.

The apparatus of claim 22, further comprising means for localizing the one or more panoramic keyframes with respect to the 3D map, wherein the means for localizing comprises:
means for finding two-dimensional (2D) observations of the finite depth features within the one or more panoramic keyframes;
means for determining 3D-2D correspondences between the 3D map and the 2D observations of the finite depth features; and
means for estimating a 6DOF camera position and orientation of the one or more panoramic keyframes using the 3D-2D correspondences.

The apparatus of claim 22, further comprising means for converting infinite depth features from a first localized panoramic keyframe into new finite depth features for the 3D map, wherein the means for converting comprises:
means for finding 2D observations of the infinite depth features within a second localized keyframe, wherein the second localized keyframe is a localized panoramic keyframe or a localized regular keyframe;
means for determining 2D-2D correspondences from the 2D observations of the second localized keyframe; and
means for triangulating the new finite depth features based on the 2D-2D correspondences and the 6DOF camera position and orientation of the keyframe pair.

The apparatus of claim 22, wherein the means for tracking further comprises:
means for establishing correspondences between the finite depth features and infinite depth features of the 3D map and an image frame from the input image feed; and
means for estimating a 6DOF camera position and orientation based on the established correspondences.

The apparatus of claim 22, wherein the means for tracking further comprises:
means for switching from 6DOF camera movement tracking to panoramic camera movement tracking upon observing only infinite depth features within the image frame from the input image feed; and
means for switching from panoramic camera movement tracking to 6DOF camera movement tracking upon observing finite depth features within the image frame from the input image feed.