JP6228239B2

JP6228239B2 - A method for registering data using a set of primitives

Info

Publication number: JP6228239B2
Application number: JP2015561464A
Authority: JP
Inventors: Yuichi Taguchi; Esra Ataer-Cansizoglu; Srikumar Ramalingam; Tyler W. Garaas
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2013-06-19
Filing date: 2014-05-30
Publication date: 2017-11-08
Anticipated expiration: 2034-05-30
Also published as: CN105339981B; WO2014203743A1; CN105339981A; DE112014002943T5; JP2016527574A

Description

The present invention relates generally to computer vision, and more particularly to estimating the pose of a camera.

Systems and methods that track the pose of a camera while simultaneously reconstructing the 3D structure of a scene are widely used in augmented reality (AR) visualization, robot navigation, scene modeling, and computer vision applications. Such a process is commonly referred to as Simultaneous Localization and Mapping (SLAM). A real-time SLAM system can use a conventional camera that acquires two-dimensional (2D) images, a depth camera that acquires a three-dimensional (3D) point cloud (a set of 3D points), or a red, green, blue and depth (RGB-D) camera, such as Kinect (registered trademark), that acquires both 2D images and 3D point clouds. Tracking refers to the process of continuously estimating the camera pose using the predicted motion of the camera, and relocalization refers to the process of using some feature-based global registration to recover from tracking failures.

SLAM systems using 2D cameras are generally successful for textured scenes, but are likely to fail in textureless regions. Systems using depth cameras use the Iterative Closest Point (ICP) method and rely on geometric variation in the scene, such as curved surfaces and depth boundaries. However, ICP-based systems often fail when the geometric variation is small, as in planar scenes. Systems using RGB-D cameras can exploit both texture and geometric features, but they still require distinctive textures.

Many methods do not explicitly address the difficulties in reconstructing 3D models larger than a single room. Extending these methods to larger scenes requires better memory management techniques. However, the memory limitation is not the only challenge. Typically, a room-scale scene contains many objects that have both texture and geometric features. To extend to larger scenes, the camera pose must be tracked in regions, such as corridors, that have limited texture and insufficient geometric variation.

Camera Tracking
A system that uses a 3D sensor to acquire 3D point clouds reduces the tracking problem to a registration problem, given some 3D correspondences. The ICP method starts from an initial pose estimate given by camera motion prediction and iteratively determines point-to-point or point-to-plane correspondences. ICP, also known as scan matching, is widely used for line-scan 3D sensors in mobile robotics, and also for depth cameras and 3D sensors that produce full 3D point clouds. Patent Document 1 uses point-to-plane correspondences with the ICP method for tracking the pose of a Kinect (registered trademark) camera. The map is represented as a set of voxels, each of which stores a truncated signed distance function for the distance to the nearest surface point. That method does not extract planes from the 3D point cloud; instead, point-to-plane correspondences are established by determining the normal of each 3D point from its local neighborhood. Such ICP-based methods require the scene to have sufficient geometric variation for accurate registration.

Another method extracts features from the RGB images and performs descriptor-based point matching to determine point-to-point correspondences and estimate the camera pose. The camera pose is then refined using the ICP method. That method uses both texture (RGB) and geometric (depth) features in the scene. However, handling textureless regions and regions with repetitive textures using point features alone remains problematic.

SLAM Using Planes
Plane features are used in several SLAM systems. To determine the camera pose, at least three planes whose normals span $\mathbb{R}^3$ are required. Consequently, using planes alone leads to many degeneracy problems, particularly when the field of view (FOV) or the sensor range is small, as with Kinect (registered trademark). Combining a large-FOV line-scan 3D sensor with a small-FOV depth camera can avoid the degeneracy, at additional system cost.
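The requirement that the plane normals span three-dimensional space can be checked with a rank test. The following is a minimal sketch, not part of the described system; the function name `planes_constrain_pose` and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def planes_constrain_pose(normals, tol=1e-6):
    """Return True if the plane normals span R^3 (rank 3).

    `normals` is an (N, 3) array-like of plane normals. The name and the
    tolerance are hypothetical; they do not come from the source text.
    """
    normals = np.asarray(normals, dtype=float)
    return np.linalg.matrix_rank(normals, tol=tol) == 3

# A floor and two perpendicular walls fully constrain the pose.
corner = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
# A floor, a ceiling and one wall do not: two normals are parallel.
corridor = [[0, 0, 1], [0, 0, -1], [1, 0, 0]]
```

The corridor case illustrates the degeneracy mentioned above: motion along the corridor axis is not observable from these planes alone, which is why points are used in addition to planes.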

Patent Document 1: US Patent Application Publication No. 2012/0194516

The method described in the related application uses a point-plane SLAM, which uses both points and planes to avoid the failure modes that are common in methods using only one of these primitives. That system does not use any camera motion prediction. Instead, it performs relocalization for every frame by determining point and plane correspondences globally. As a result, the system can process only about three frames per second, and it fails in scenes with some repetitive textures because of the descriptor-based point matching.

The related patent application also describes registering 3D data in different coordinate systems using both point-to-point and plane-to-plane correspondences.

Planes are dominant in indoor and outdoor scenes that include man-made structures. The embodiments of the invention provide a system and method for tracking an RGB-D camera using points and planes as primitive features. By fitting planes, the method implicitly handles the noise in depth data that is typical of 3D sensors. The tracking method is supported by relocalization and bundle adjustment processes to demonstrate a real-time Simultaneous Localization and Mapping (SLAM) system using a handheld or robot-mounted RGB-D camera.

It is an object of the invention to enable fast and accurate registration while minimizing the degeneracy problems that cause registration failures. The method determines point and plane correspondences using camera motion prediction and provides a tracker based on a prediction-and-correction framework. The method incorporates relocalization and bundle adjustment processes that use both points and planes, so as to recover from tracking failures and to continuously refine the camera pose estimates.

Specifically, the method registers data using a set of primitives that includes points and planes in three-dimensional data. First, the method selects a first set of primitives from the data in a first coordinate system. The first set of primitives includes at least three primitives and includes at least one plane.

A transformation from the first coordinate system to a second coordinate system is predicted. The transformation is predicted using a camera motion model. The first set of primitives is transformed into the second coordinate system using the predicted transformation. A second set of primitives is determined according to the first set of primitives transformed into the second coordinate system.

Then, the second coordinate system is registered with the first coordinate system using the first set of primitives in the first coordinate system and the second set of primitives in the second coordinate system. The registration is used for Simultaneous Localization and Mapping (SLAM). The registration can be used to track the pose of the camera that acquires the data.

FIG. 1 is a flow diagram of a method for tracking the pose of a camera according to embodiments of the invention.
FIG. 2 is a schematic of a procedure for establishing point-to-point and plane-to-plane correspondences between a current frame and a map using the predicted pose of the camera, according to embodiments of the invention.

The embodiments of the invention provide a system and method for tracking the pose of a camera. The method extends the embodiments described in the related U.S. patent application Ser. No. 13/539,060 by using camera motion prediction for faster correspondence search and registration. The invention uses point-to-point and plane-to-plane correspondences established between the current frame and a map. The map includes points and planes from frames previously registered in a global coordinate system. Here, the focus is on establishing plane-to-plane correspondences using camera motion prediction, and on establishing both point-to-point and plane-to-plane correspondences in the mixed case.

System Overview
In a preferred system, the RGB-D camera 102 is a Kinect (registered trademark) or an ASUS (registered trademark) Xtion PRO LIVE, which acquires a sequence of frames 101. The invention uses a key-frame-based SLAM system, in which several representative frames are selected as key frames and the key frames, registered in a single global coordinate system, are stored in the map. In contrast to prior-art SLAM systems that use only points, the invention uses both points and planes as primitives in all processes of the system. The points and planes in each frame are called measurements, and the measurements from the key frames are stored in the map as landmarks.

Given the map, the pose of the current frame is estimated using a prediction-and-correction framework. The pose of the camera is predicted, the predicted pose is used to determine correspondences between the point and plane measurements and the point and plane landmarks, and these correspondences are then used to determine the camera pose.

Tracking can fail because of incorrect or insufficient correspondences. In the invention, relocalization is performed after a predetermined number of consecutive tracking failures. Relocalization uses a global correspondence search for points and planes between the current frame and the map. Bundle adjustment using points and planes is also applied to refine the landmarks in the map asynchronously.

Method Overview
As shown in FIG. 1, a current frame 101 of a scene 103 is acquired 110 by a red, green, blue and depth (RGB-D) camera 102. The pose of the camera when acquiring the frame is predicted 120, and the predicted pose is used to determine 130 point and plane correspondences between the frame and a map 194. The point and plane correspondences are used in a RANdom SAmple Consensus (RANSAC) framework 140 to register the frame to the map. If the registration fails 150, the number of consecutive failures is counted 154. If the outcome of this check is false (F), the method continues with the next frame; otherwise, if true (T), the camera is relocalized 158 using a global registration method without camera motion prediction.

If the RANSAC registration succeeds, the pose 160 estimated in the RANSAC framework is used as the pose of the frame. Next, it is determined 170 whether the current frame is a key frame. If false, the method proceeds to the next frame at step 110. Otherwise, additional points and planes are extracted 180 from the current frame, the map 194 is updated 190, and the method proceeds to the next frame. The map is refined 198 asynchronously using bundle adjustment.
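The per-frame control flow around registration failure can be sketched as a small state update. This is an illustrative sketch only: the function name `tracking_step`, the action labels, the default `max_failures=3`, and the reset of the counter after relocalization are assumptions, since the text only speaks of a predetermined number of consecutive failures.

```python
def tracking_step(registration_ok, failures, max_failures=3):
    """Decide what to do after trying to register one frame.

    Returns (action, updated consecutive-failure count).
    max_failures is a hypothetical value for the "predetermined number".
    """
    if registration_ok:
        # Step 160: use the RANSAC pose; a success ends the failure streak.
        return "use_pose", 0
    failures += 1  # Step 154: count consecutive failures.
    if failures >= max_failures:
        # Step 158: relocalize with global registration (no motion prediction).
        return "relocalize", 0
    return "next_frame", failures

# Two failures in a row, then a third that triggers relocalization.
state = 0
log = []
for ok in [False, False, False]:
    action, state = tracking_step(ok, state)
    log.append(action)
```

Keeping the counter outside the function makes the decision easy to test in isolation from the registration itself.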

The steps can be performed in a processor connected to memory and input/output interfaces as known in the art.

Camera Pose Tracking
As stated above, the tracking according to the invention uses features that include both points and planes. The tracking is based on a prediction-and-correction scheme, which can be summarized as follows. For every frame, the pose is predicted using a camera motion model. Based on the predicted pose, the point and plane measurements in the frame that correspond to the point and plane landmarks in the map are determined. RANSAC-based registration is then performed using the point and plane correspondences. If the pose is different from the poses of all key frames currently stored in the map, additional point and plane measurements are extracted and the frame is added to the map as a new key frame.

Camera Motion Prediction
The pose of the k-th frame is represented as

$$T_k = \begin{pmatrix} R_k & t_k \\ 0^T & 1 \end{pmatrix},$$

where $R_k$ and $t_k$ denote a rotation matrix and a translation vector, respectively. The first frame is used to define the coordinate system of the map. Thus, $T_1$ is the identity matrix, and $T_k$ represents the pose of the k-th frame with respect to the map.

The pose of the k-th frame, $\hat{T}_k$, is predicted by using a constant velocity assumption. Let $\Delta T$ denote the previously estimated motion between the (k−1)-th frame and the (k−2)-th frame, that is, $\Delta T = T_{k-1} T_{k-2}^{-1}$. Then the pose of the k-th frame is predicted as

$$\hat{T}_k = \Delta T \, T_{k-1}.$$
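The constant-velocity prediction amounts to applying the most recent inter-frame motion once more. A minimal NumPy sketch follows; the helper names `make_pose` and `predict_pose` are hypothetical, and poses are assumed to be 4x4 homogeneous matrices with translations in millimeters.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous pose from a 3x3 rotation R and a 3-vector t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def predict_pose(T_prev, T_prev2):
    """Constant-velocity prediction of the pose of frame k.

    T_prev is T_{k-1} and T_prev2 is T_{k-2}.
    """
    delta = T_prev @ np.linalg.inv(T_prev2)  # Delta T = T_{k-1} T_{k-2}^{-1}
    return delta @ T_prev                    # predicted T_k = Delta T T_{k-1}

# Example: pure translation of 10 mm per frame along x, no rotation.
T1 = make_pose(np.eye(3), [0.0, 0.0, 0.0])
T2 = make_pose(np.eye(3), [10.0, 0.0, 0.0])
T3_pred = predict_pose(T2, T1)
```

For the first two frames no earlier motion is available, so an implementation would have to fall back to some other initial guess, for example the previous pose; the text does not specify this case.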

Determining Point and Plane Correspondences
As shown in FIG. 2, the predicted pose $\hat{T}_k$ is used to determine the point and plane measurements in the k-th frame that correspond to landmarks in the map. Given the predicted pose 201 of the current frame, the correspondences between the point and plane landmarks in the map 202 and the point and plane measurements in the current frame 203 are determined. First, the landmarks in the map are transformed to the current frame using the predicted pose. Then, for every point, a local search is performed using an optical flow procedure starting from the predicted pixel location in the current frame. For every plane, the parameters of the predicted plane are determined first. Next, a set of reference points on the predicted plane is considered, and the pixels that are connected to each reference point and lie on the predicted plane are determined. The reference point with the largest number of connected pixels is selected, and the plane parameters are refined using all of its connected pixels.

Point correspondences: Let $p_i = (x_i, y_i, z_i, 1)^T$ denote the i-th point landmark 210 in the map, represented as a homogeneous vector. The 2D image projection 220 of $p_i$ in the current frame is predicted as

$$\hat{p}_i^k = \hat{T}_k \, p_i, \qquad \hat{u}_i^k = \mathrm{FP}(\hat{p}_i^k),$$

where $\hat{p}_i^k$ is the 3D point transformed into the coordinate system of the k-th frame, and the function FP(·) computes the forward projection of a 3D point onto the image plane using the intrinsic camera calibration parameters. Starting from the initial location $\hat{u}_i^k$, the corresponding point measurement is determined using the Lucas-Kanade optical flow method. Let $\Delta u_i^k$ be the computed optical flow vector 230. Then the corresponding point measurement $p_i^k$ is

$$p_i^k = \mathrm{BP}(\hat{u}_i^k + \Delta u_i^k) \, D(\hat{u}_i^k + \Delta u_i^k),$$

where the function BP(·) back-projects a 2D image pixel to a 3D ray, and D(·) denotes the depth value of the pixel. If the optical flow vector cannot be computed, or if the pixel location $\hat{u}_i^k + \Delta u_i^k$ has an invalid depth value, the feature is considered lost.
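The projection functions used for point correspondences can be illustrated with a pinhole camera model. This is a sketch under stated assumptions: the intrinsic matrix `K`, the convention that invalid depth is stored as 0 or NaN, and the names `forward_project`, `back_project` and `point_measurement` are not from the source, and the optical flow vector is taken as an input rather than computed with the Lucas-Kanade method.

```python
import numpy as np

def forward_project(K, p_cam):
    """FP: project a 3D point in camera coordinates to a 2D pixel (u, v)."""
    uvw = K @ np.asarray(p_cam, dtype=float)[:3]
    return uvw[:2] / uvw[2]

def back_project(K, u):
    """BP: pixel -> 3D ray with unit z, so that ray * depth is the 3D point."""
    return np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])

def point_measurement(K, depth, u_pred, flow):
    """Return the 3D point measurement, or None if the feature is lost."""
    u = np.asarray(u_pred, dtype=float) + np.asarray(flow, dtype=float)
    col, row = int(round(u[0])), int(round(u[1]))
    h, w = depth.shape
    if not (0 <= row < h and 0 <= col < w):
        return None                      # tracked outside the image
    d = depth[row, col]
    if not np.isfinite(d) or d <= 0:
        return None                      # invalid depth (assumed 0 or NaN)
    return back_project(K, u) * d

# Hypothetical intrinsics and a flat depth map at 1000 mm.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 1000.0)
u_hat = forward_project(K, [100.0, 50.0, 1000.0, 1.0])   # predicted pixel
p_meas = point_measurement(K, depth, u_hat, flow=[0.0, 0.0])
lost = point_measurement(K, np.zeros((480, 640)), u_hat, flow=[0.0, 0.0])
```

With zero optical flow and a consistent depth map, the measurement reproduces the landmark that was projected, which is a quick sanity check for the FP/BP pair.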

Plane correspondences: Instead of performing a time-consuming plane extraction procedure on each frame independently of other frames, as in the prior art, the invention extracts planes by making use of the predicted pose. This makes the extraction of plane measurements faster and also provides the plane correspondences.

Let $\pi_j = (a_j, b_j, c_j, d_j)^T$ denote the plane equation of the j-th plane landmark 240 in the map. The plane landmark and the corresponding measurement are assumed to have some overlapping region in the image. To determine such a corresponding plane measurement, several reference points 250, $q_{j,r}$ ($r = 1, \ldots, N$), are selected at random from the inliers of the j-th plane landmark, and the reference points are transformed to the k-th frame as 255:

$$\hat{q}_{j,r}^k = \hat{T}_k \, q_{j,r}, \quad r = 1, \ldots, N.$$

The plane $\pi_j$ is also transformed to the k-th frame as 245:

$$\hat{\pi}_j^k = \hat{T}_k^{-T} \, \pi_j.$$

Starting from each transformed reference point $\hat{q}_{j,r}^k$, the connected pixels 260 that lie on the plane $\hat{\pi}_j^k$ are determined, and the reference point with the largest number of inliers is selected. The inliers are used to refine the plane equation, resulting in the corresponding plane measurement $\pi_j^k$. If the number of inliers is less than a threshold, the plane landmark is declared lost. For example, N = 5 reference points can be used, with a threshold of 50 mm on the point-to-plane distance for determining the inliers of a plane and a threshold of 9000 on the largest number of inliers.
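The plane prediction and refinement can be sketched as follows. The sketch assumes that points transform as p' = T p, so that plane coefficients transform with the inverse transpose; it replaces the connected-pixel search by a plain distance test on a given set of 3D points, and the function names are hypothetical.

```python
import numpy as np

def transform_plane(T, plane):
    """Transform plane (a, b, c, d) when points transform as p' = T p."""
    return np.linalg.inv(T).T @ np.asarray(plane, dtype=float)

def plane_inliers(points, plane, max_dist=50.0):
    """Boolean mask of 3D points within max_dist (mm) of the plane."""
    n, d = plane[:3], plane[3]
    dist = np.abs(points @ n + d) / np.linalg.norm(n)
    return dist < max_dist

def refit_plane(points):
    """Least-squares plane through the points (SVD); unit-length normal."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                      # direction of least variance
    return np.append(normal, -normal @ centroid)

# Map plane z = 1000 mm; predicted transform shifts points by +100 mm in z.
T_pred = np.eye(4)
T_pred[2, 3] = 100.0
pi_pred = transform_plane(T_pred, [0.0, 0.0, 1.0, -1000.0])

pts = np.array([[0, 0, 1100], [100, 0, 1100], [0, 100, 1100],
                [100, 100, 1100], [50, 50, 1300]], dtype=float)
mask = plane_inliers(pts, pi_pred)       # last point is 200 mm off the plane
pi_meas = refit_plane(pts[mask])
```

The sign of the refitted normal is arbitrary, so a real implementation would normally flip it to agree with the predicted plane before comparing the two.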

Landmark Selection
Performing the above process with all landmarks in the map can be inefficient. Therefore, only the landmarks that appear in the single key frame closest to the current frame are used. The closest key frame is selected before the tracking process by using the pose $T_{k-1}$ of the previous frame.
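One possible way to pick the closest key frame is to compare camera centers. This is only a sketch: the text does not say which distance is used, so the Euclidean distance between camera centers, the assumption that each pose maps map coordinates to frame coordinates, and the names `camera_center` and `closest_keyframe` are all illustrative choices.

```python
import numpy as np

def camera_center(T):
    """Camera center in map coordinates, assuming T maps map -> frame."""
    R, t = T[:3, :3], T[:3, 3]
    return -R.T @ t

def closest_keyframe(T_prev, keyframe_poses):
    """Index of the key frame whose camera center is nearest to T_prev's."""
    c = camera_center(T_prev)
    dists = [np.linalg.norm(camera_center(T) - c) for T in keyframe_poses]
    return int(np.argmin(dists))

def translation_pose(t):
    """Helper for the example: identity rotation with translation t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# Key frames with camera centers at x = 0 mm and x = 500 mm.
keyframes = [translation_pose([0.0, 0.0, 0.0]),
             translation_pose([-500.0, 0.0, 0.0])]
T_prev = translation_pose([-450.0, 0.0, 0.0])    # camera center at x = 450 mm
best = closest_keyframe(T_prev, keyframes)
```

A distance that also accounts for viewing direction may be preferable when the camera rotates in place, since two nearby cameras can see disjoint parts of the scene.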

RANSAC Registration
The prediction-based correspondence search provides candidates for point-to-point and plane-to-plane correspondences. These candidates can include outliers. Therefore, RANSAC-based registration is performed to determine the inliers and the camera pose. At least three correspondences are required to determine the pose unambiguously. Thus, if there are fewer than three candidate correspondences, a tracking failure is declared immediately. In addition, to ensure accurate camera tracking, a tracking failure is also declared when only a small number of candidate correspondences is available.

If a sufficient number of candidates exists, the registration problem is solved in closed form using the mixed correspondences. The procedure prioritizes plane correspondences over point correspondences, because the number of planes is typically much smaller than the number of points, and planes are less noisy because they are supported by many points. The tracking is considered successful if RANSAC determines a sufficient number of inliers, for example 40% of the number of all point and plane measurements. The method yields the corrected pose $T_k$ of the k-th frame.
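The success and failure criteria for RANSAC registration can be summarized in a small decision function. This is a sketch only: the function name and argument names are hypothetical, and `min_candidates=3` encodes just the hard minimum stated above, whereas the text also allows a larger, unspecified threshold for "a small number" of candidates.

```python
def tracking_succeeded(num_candidates, num_inliers, num_measurements,
                       min_candidates=3, min_inlier_ratio=0.4):
    """Apply the failure and success tests around RANSAC registration.

    num_candidates:   candidate correspondences from the predicted pose
    num_inliers:      correspondences accepted by RANSAC
    num_measurements: all point and plane measurements in the frame
    """
    if num_candidates < min_candidates:
        return False          # pose cannot be determined unambiguously
    if num_measurements == 0:
        return False          # nothing to compare against
    return num_inliers / num_measurements >= min_inlier_ratio

few = tracking_succeeded(2, 2, 10)       # too few candidates
good = tracking_succeeded(50, 45, 100)   # 45% inliers
weak = tracking_succeeded(50, 30, 100)   # 30% inliers
```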

Map Update
If the estimated pose $T_k$ is sufficiently different from the poses of all existing key frames in the map, the k-th frame is determined to be a key frame. To check this condition, thresholds of, for example, 100 mm in translation and 5 degrees in rotation can be used. For a new key frame, the point and plane measurements determined as inliers in the RANSAC-based registration are associated with the corresponding landmarks, while those determined as outliers are discarded. Next, additional point and plane measurements that newly appear in the frame are extracted. The additional point measurements are extracted using a keypoint detector, such as Scale-Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF), on pixels that are not close to any existing point measurement. The additional plane measurements are extracted by using RANSAC-based plane fitting on pixels that are not inliers of any existing plane measurement. The additional point and plane measurements are added to the map as new landmarks. In addition, feature descriptors, such as SIFT and SURF, are extracted for all point measurements in the frame and are used for relocalization.
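The key frame test can be sketched with the example thresholds of 100 mm and 5 degrees. How the two thresholds are combined is not stated in the text; the sketch assumes that a frame is "sufficiently different" from a key frame if either the relative translation or the relative rotation exceeds its threshold. The function names are hypothetical.

```python
import numpy as np

def pose_difference(T_a, T_b):
    """Relative translation (mm) and rotation angle (degrees) between poses."""
    rel = T_a @ np.linalg.inv(T_b)
    trans = np.linalg.norm(rel[:3, 3])
    cos_angle = np.clip((np.trace(rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return trans, np.degrees(np.arccos(cos_angle))

def is_new_keyframe(T_k, keyframe_poses, min_trans=100.0, min_rot_deg=5.0):
    """True if T_k differs enough from every existing key frame pose."""
    for T_kf in keyframe_poses:
        trans, rot = pose_difference(T_k, T_kf)
        if trans < min_trans and rot < min_rot_deg:
            return False      # too similar to an existing key frame
    return True

def rot_z(deg):
    """Helper for the example: 4x4 pose rotating by deg about the z-axis."""
    a = np.radians(deg)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    return T

keyframes = [np.eye(4)]
near = np.eye(4); near[0, 3] = 50.0     # 50 mm away, no rotation
far = np.eye(4); far[0, 3] = 150.0      # 150 mm away
turned = rot_z(10.0)                    # same place, rotated by 10 degrees
```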

Claims

A method of registering data using a set of primitives, wherein the data are three-dimensional (3D) and the primitives include points and planes in the 3D data, the method comprising:
Selecting a first set of primitives from the data in a first coordinate system, the first set of primitives including at least three primitives and including at least one plane;
Predicting a transformation from the first coordinate system to a second coordinate system, wherein the transformation is predicted using a camera motion model;
Transforming the first set of primitives into the second coordinate system using the predicted transformation;
Determining a second set of primitives according to the first set of primitives transformed into the second coordinate system; and
Registering the second coordinate system with the first coordinate system using the first set of primitives in the first coordinate system and the second set of primitives in the second coordinate system, which correspond to each other,
wherein the registering is used for Simultaneous Localization and Mapping (SLAM), and the steps are performed in a processor, and
wherein determining the second set of primitives comprises using the pose of each primitive of the first coordinate system, transformed into the second coordinate system, as the predicted pose of that primitive in the second coordinate system.

The method of claim 1, wherein the first set of primitives includes at least one point and at least one plane in the first coordinate system, and the second set of primitives includes at least one point and at least one plane in the second coordinate system.

The method of claim 1, wherein the data is acquired by a movable camera.

The method of claim 1, wherein the data includes texture and depth.

The method of claim 1, wherein the registration uses RANdom SAmple Consensus (RANSAC).

The method of claim 1, wherein the data takes the form of a frame sequence acquired by a camera.

The method of claim 6, further comprising:
Selecting a set of frames as key frames from the frame sequence; and
Storing the key frames in a map, wherein the key frames include the points and the planes, and the points and the planes are stored as landmarks in the map.

The method of claim 7, further comprising:
Predicting the pose of the camera for each frame; and
Determining the pose of the camera for each frame according to the registration, so as to track the camera.

The method of claim 1, wherein the registration is performed in real time.

The method of claim 7, further comprising applying bundle adjustment using the points and the planes to refine the landmarks in the map.

The method of claim 8, wherein the pose of the k-th frame is

$$T_k = \begin{pmatrix} R_k & t_k \\ 0^T & 1 \end{pmatrix},$$

where $R_k$ and $t_k$ represent a rotation matrix and a translation vector, respectively.

The method of claim 8, wherein the predicting uses a constant velocity estimate.

The method of claim 6, wherein the points in the frames are determined using an optical flow procedure.

The method of claim 1, wherein the plane correspondences take precedence over the point correspondences.