JP2009237845A

JP2009237845A - Information processor, information processing method, and computer program

Info

Publication number: JP2009237845A
Application number: JP2008082448A
Authority: JP
Inventors: Steven Goodman; ステフェングットマン; Kenichiro Oi; 堅一郎多井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-03-27
Filing date: 2008-03-27
Publication date: 2009-10-15
Anticipated expiration: 2028-03-27
Also published as: JP4985516B2

Abstract

PROBLEM TO BE SOLVED: To provide a configuration that efficiently calculates the three-dimensional positions of feature points included in an image captured by a camera.

SOLUTION: In a configuration that acquires the three-dimensional positions of feature points from camera images, three-dimensional position information of the feature points is first acquired as initial information by corresponding-feature-point analysis using only a plurality of preceding image frames among the frames of the captured image. This initial information is set as the state information corresponding to the initial image frame, and an extended Kalman filter (EKF) is then applied to the subsequent image frames to acquire the three-dimensional position information of the feature points in those frames. Feature point extraction involving frame matching or the like therefore needs to be executed only on the preceding image frames, so the three-dimensional position information of the feature points can be acquired, and three-dimensional image data generated, efficiently.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a computer program. More specifically, it relates to an information processing apparatus, an information processing method, and a computer program that calculate the three-dimensional positions of feature points used to generate a three-dimensional map (3D map) from images captured by a camera.

Processing that analyzes images captured by a camera to obtain the three-dimensional positions of objects in those images is used in various fields. For example, it is used when an agent (moving body) such as a camera-equipped robot analyzes captured images to observe its environment and moves while grasping its surroundings according to what it observes, and in environment map construction, which creates a map of the surrounding environment (environment map) from captured images. Non-Patent Document 1 discloses a method that tracks feature point positions across all frames and, once data for all frames has been obtained, calculates the feature point positions and camera positions by batch processing.

An example of a three-dimensional map (3D map) generation sequence is described with reference to FIG. 1. First, in step S11, images are captured by a camera. For example, a user or robot holding the camera continuously captures images of the surroundings while moving.

In step S12, the acquired images are analyzed to construct sparse three-dimensional information that includes, for example, the position information of feature points in the images. Processing such as SLAM (simultaneous localization and mapping) or SFM (Structure from Motion) is applied here. SLAM detects the positions of feature points in images input from the camera together with the position and orientation of the camera. SFM, for example, analyzes the correspondence of feature points (landmarks) across images captured from a plurality of different positions.

In step S13, a three-dimensional map, that is, dense and detailed three-dimensional information, is generated using the camera trajectory information and the feature point information obtained in step S12.

Accurate analysis of the feature points in the images during the SFM or SLAM processing of step S12 raises the accuracy of the final three-dimensional information. Step S12 acquires the feature point information contained in the image frames and the camera trajectory information. A detailed example of this processing is shown in FIG. 2.

First, in step S21, a plurality of captured images are input from the camera, and frame matching is performed, which calculates the camera position using feature points that match between frames. The input images are, for example, image frames of a moving image captured by a moving camera, that is, images captured from different positions in which the same object appears in several frames. In step S21, corresponding feature points are detected across the image frames, and the camera position is calculated from this information.

In step S22, frame stitching (Frame Stitch) is executed. This processing uses the information detected in step S21 to estimate the three-dimensional positions of the feature points (landmarks) in each image, joins the image frames based on the feature point positions in the plural image data, and generates data reflecting the three-dimensional position of each feature point.

In step S23, bundle adjustment (Bundle Adjustment) is executed. Bundle adjustment makes the three-dimensional positions of corresponding feature points, observed in images captured from different positions, converge to a single position. It calculates the most probable three-dimensional position of each feature point using the camera position information of each captured image and the information on the corresponding feature points in the image captured at each camera position. This processing basically requires image frames captured from two or more different positions.

For example, as shown in FIG. 3, captured images 11 to 13 taken from three different positions and feature points 21 to 23 detected in those images are used to obtain the three-dimensional positions of the feature points. For corresponding feature points across the images, such as feature point 21 in FIG. 3, the lines (bundles) connecting the shooting point of each image to feature point 21 should intersect at the single position of feature point 21.

However, the camera positions and the positions of feature points within the images are not necessarily calculated accurately and contain errors from various causes, so these errors must be removed. Specifically, correction is performed so that the lines connecting the camera positions to the three-dimensional position of a corresponding feature point intersect at that one feature point. In other words, the already calculated camera positions and feature point positions must be corrected. This correction is executed as bundle adjustment (Bundle Adjustment), and the corrected feature point and camera position information makes it possible to generate more accurate three-dimensional information.
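The idea that the lines (bundles) from each shooting point should meet at one feature point can be illustrated with a small sketch. It is not the bundle adjustment of the patent: it assumes the camera positions are fixed and known, and only finds the single 3D point that lies closest, in the least-squares sense, to all rays. The function names `closest_point_to_rays` and `solve3` and the sample coordinates are hypothetical.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

def closest_point_to_rays(origins, directions):
    """Return the point minimizing the summed squared distance to all rays.

    Each ray is given by a camera position (origin) and a viewing direction.
    The normal equations are sum(I - u u^T) p = sum(I - u u^T) o, where u is
    the unit direction of a ray and o its origin.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                m = (1.0 if i == j else 0.0) - u[i] * u[j]
                A[i][j] += m
                b[i] += m * o[j]
    return solve3(A, b)

# Three hypothetical shooting points observing one feature point.
cameras = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
landmark = (1.0, 2.0, 5.0)
rays = [tuple(l - c for l, c in zip(landmark, cam)) for cam in cameras]
point = closest_point_to_rays(cameras, rays)
```

With error-free rays the recovered point coincides with the feature point. With noisy rays the lines no longer meet, and the least-squares point is one simple choice of "most probable" position; a full bundle adjustment would also correct the camera positions.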

With such conventional processing, however, if a sufficient number of corresponding feature points cannot be obtained, the frame matching of step S21 cannot be performed and calculating the camera position may become difficult. Frame matching cannot be omitted, and if it fails the processing must be redone. In addition, the camera must always be moving, and when the camera circles around an object, feature point association errors tend to occur, so that what is actually a single object is recognized as several objects.
Non-Patent Document 1: "Shape and motion from image streams under orthography: a factorization method", C. Tomasi and T. Kanade, International Journal of Computer Vision, Volume 9, Number 2, pp. 137-154, (1992).

An object of the present invention is to provide an information processing apparatus, an information processing method, and a computer program that, in a configuration for calculating the three-dimensional positions of feature points used to generate a three-dimensional map (3D map) from camera images, perform frame matching on only some of the captured images and thereafter generate three-dimensional position information of feature points efficiently without executing frame matching.

A first aspect of the present invention is an information processing apparatus that calculates the three-dimensional positions of feature points included in an image, the apparatus comprising:
an initial information generation unit that receives a plurality of preceding image frames from among the image frames of a camera-captured image and acquires three-dimensional position information of feature points by corresponding-feature-point analysis of those image frames; and
a feature point position information generation unit that sets the feature point position information acquired by the initial information generation unit from the preceding image frames as state information for an initial image frame, and acquires three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

Furthermore, in one embodiment of the information processing apparatus of the present invention, the apparatus further includes a bundle adjustment processing unit that receives the feature point position information generated by the feature point position information generation unit and corrects the three-dimensional position information of the feature points.

Furthermore, in one embodiment of the information processing apparatus of the present invention, the initial information generation unit is configured to execute SFM (Structure from Motion) processing, which analyzes the correspondence of feature points included in each image frame using images captured from a plurality of different positions.

Furthermore, in one embodiment of the information processing apparatus of the present invention, the initial information generation unit is configured to calculate, by analyzing the preceding image frames, the position and orientation information of the camera that captured those frames. The feature point position information generation unit sets the feature point position information and the camera position and orientation information calculated by the initial information generation unit from the preceding image frames as state information corresponding to the initial image frame and, by processing that applies an extended Kalman filter (EKF) to the subsequent image frames, acquires the three-dimensional position information of feature points in the subsequent frames and the position and orientation information of the camera that captured them.

Furthermore, in one embodiment of the information processing apparatus of the present invention, the feature point position information generation unit sets multidimensional normal distribution data containing the feature point position information and the camera position and orientation information as the state information and, by processing that applies an extended Kalman filter (EKF), acquires the three-dimensional position information of feature points in the subsequent frames and the position and orientation information of the camera that captured them.
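A state held as a multidimensional normal distribution is, in practice, a mean vector plus a covariance matrix, and the EKF refines both with each new frame. The sketch below is a deliberately reduced illustration, not the filter of the patent: it assumes a single static landmark in a 2D plane with state (x, z), known camera positions on the x axis, and a bearing-only measurement. The function name `ekf_bearing_update`, the noise variance, and all numeric values are assumptions chosen for illustration; the starting mean and covariance stand in for the initial information that SFM would supply.

```python
import math

def ekf_bearing_update(mean, cov, cam_x, bearing, meas_var):
    """One EKF measurement update for a static 2D landmark (x, z).

    The camera sits at (cam_x, 0) and measures the bearing atan2(x - cam_x, z).
    The landmark is static, so no separate prediction step is needed here.
    """
    x, z = mean
    dx = x - cam_x
    r2 = dx * dx + z * z
    predicted = math.atan2(dx, z)
    # Jacobian of the measurement function with respect to (x, z).
    H = (z / r2, -dx / r2)
    # P H^T and the scalar innovation covariance S = H P H^T + R.
    PHt = (cov[0][0] * H[0] + cov[0][1] * H[1],
           cov[1][0] * H[0] + cov[1][1] * H[1])
    S = H[0] * PHt[0] + H[1] * PHt[1] + meas_var
    # Kalman gain K = P H^T / S.
    K = (PHt[0] / S, PHt[1] / S)
    innovation = bearing - predicted
    new_mean = (x + K[0] * innovation, z + K[1] * innovation)
    # Covariance update P <- (I - K H) P.
    new_cov = [[cov[i][j] - K[i] * (H[0] * cov[0][j] + H[1] * cov[1][j])
                for j in range(2)] for i in range(2)]
    return new_mean, new_cov

true_landmark = (1.0, 5.0)
# Rough initial state, standing in for the SFM result on the preceding frames.
mean = (1.2, 4.5)
cov = [[0.25, 0.0], [0.0, 0.25]]
initial_error = math.dist(mean, true_landmark)

# Ten subsequent frames from a camera moving along the x axis (noise-free bearings).
for k in range(10):
    cam_x = -2.0 + 0.5 * k
    bearing = math.atan2(true_landmark[0] - cam_x, true_landmark[1])
    mean, cov = ekf_bearing_update(mean, cov, cam_x, bearing, meas_var=1e-3)

final_error = math.dist(mean, true_landmark)
```

No frame-to-frame matching is performed inside the loop: each frame contributes one measurement that pulls the mean toward the true position and shrinks the covariance. In the configuration described above, the state would additionally contain the camera position and orientation and many landmarks.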

Furthermore, in one embodiment of the information processing apparatus of the present invention, the apparatus further includes a feature point extraction unit that extracts, as additional feature points, feature points not extracted by the initial information generation unit or the feature point position information generation unit, and calculates the three-dimensional positions of the extracted additional feature points. The bundle adjustment processing unit is configured to receive the feature point position information generated by the feature point position information generation unit together with the feature point position information of the additional feature points extracted by the feature point extraction unit, and to correct the three-dimensional position information of the feature points.

Furthermore, in one embodiment of the information processing apparatus of the present invention, the feature point extraction unit checks the feature points extracted from an image frame for duplication against the feature points extracted by the initial information generation unit and the feature point position information generation unit, and selects only new, non-duplicate feature points as additional feature points.
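The duplicate check can be sketched as a simple filter. The text does not state how duplication is determined, so the criterion used here, a Euclidean distance threshold between 3D positions, is an assumption, as are the function name `select_additional_points`, the parameter `min_distance`, and the sample coordinates.

```python
import math

def select_additional_points(candidates, known_points, min_distance=0.05):
    """Keep only candidates that are not duplicates of already known points.

    A candidate counts as a duplicate if it lies within min_distance of a known
    point or of a candidate that was already accepted (assumed criterion).
    """
    accepted = []
    for p in candidates:
        reference = list(known_points) + accepted
        if all(math.dist(p, q) > min_distance for q in reference):
            accepted.append(p)
    return accepted

# Points already obtained by the SFM and EKF stages (hypothetical values).
known = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)]
# Newly extracted candidates: the first repeats a known point, the third
# repeats the second candidate.
candidates = [(0.0, 0.01, 1.0), (2.0, 1.0, 3.0), (2.0, 1.0, 3.02), (0.5, 0.5, 1.5)]
additional = select_additional_points(candidates, known)
```

Only the surviving points would be passed on, together with the existing feature point positions, to the bundle adjustment processing unit.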

The information processing apparatus further includes a 3D map generation unit that generates three-dimensional image data using the feature point position information generated by the feature point position information generation unit.

A second aspect of the present invention is an information processing method, executed in an information processing apparatus, for calculating the three-dimensional positions of feature points included in an image, the method comprising:
an initial information generation step in which an initial information generation unit receives a plurality of preceding image frames from among the image frames of a camera-captured image and acquires three-dimensional position information of feature points by corresponding-feature-point analysis of those image frames; and
a feature point position information generation step in which a feature point position information generation unit sets the feature point position information acquired by the initial information generation unit from the preceding image frames as state information for an initial image frame, and acquires three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

Furthermore, in one embodiment of the information processing method of the present invention, the method further includes a bundle adjustment processing step in which a bundle adjustment processing unit receives the feature point position information generated in the feature point position information generation step and corrects the three-dimensional position information of the feature points.

Furthermore, in one embodiment of the information processing method of the present invention, the initial information generation step executes SFM (Structure from Motion) processing, which analyzes the correspondence of feature points included in each image frame using images captured from a plurality of different positions.

Furthermore, in one embodiment of the information processing method of the present invention, the initial information generation step calculates, by analyzing the preceding image frames, the position and orientation information of the camera that captured those frames. The feature point position information generation step sets the feature point position information and the camera position and orientation information calculated by the initial information generation unit from the preceding image frames as state information corresponding to the initial image frame and, by processing that applies an extended Kalman filter (EKF) to the subsequent image frames, acquires the three-dimensional position information of feature points in the subsequent frames and the position and orientation information of the camera that captured them.

Furthermore, in one embodiment of the information processing method of the present invention, the feature point position information generation step sets multidimensional normal distribution data containing the feature point position information and the camera position and orientation information as the state information and, by processing that applies an extended Kalman filter (EKF), acquires the three-dimensional position information of feature points in the subsequent frames and the position and orientation information of the camera that captured them.

Furthermore, in one embodiment of the information processing method of the present invention, the method further includes a feature point extraction step in which a feature point extraction unit extracts, as additional feature points, feature points not extracted by the initial information generation unit or the feature point position information generation unit, and calculates the three-dimensional positions of the extracted additional feature points. The bundle adjustment processing step receives the feature point position information generated by the feature point position information generation unit together with the feature point position information of the additional feature points extracted by the feature point extraction unit, and corrects the three-dimensional position information of the feature points.

Furthermore, in one embodiment of the information processing method of the present invention, the feature point extraction step checks the feature points extracted from an image frame for duplication against the feature points extracted by the initial information generation unit and the feature point position information generation unit, and selects only new, non-duplicate feature points as additional feature points.

Furthermore, in one embodiment of the information processing method of the present invention, the method further includes a 3D map generation step in which a 3D map generation unit generates three-dimensional image data using the feature point position information generated by the feature point position information generation unit.

A third aspect of the present invention is a computer program that causes an information processing apparatus to calculate the three-dimensional positions of feature points included in an image, the program comprising:
an initial information generation step of causing an initial information generation unit to receive a plurality of preceding image frames from among the image frames of a camera-captured image and to acquire three-dimensional position information of feature points by corresponding-feature-point analysis of those image frames; and
a feature point position information generation step of causing a feature point position information generation unit to set the feature point position information acquired by the initial information generation unit from the preceding image frames as state information for an initial image frame, and to acquire three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

The computer program of the present invention can be provided, for example, to a general-purpose computer system capable of executing various program codes, by a storage medium or communication medium that supplies the program in a computer-readable format. Providing the program in a computer-readable format allows processing corresponding to the program to be realized on the computer system.

Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described later and the accompanying drawings. In this specification, a system is a logical set of a plurality of devices, and the constituent devices are not necessarily in the same housing.

According to the configuration of one embodiment of the present invention, in a configuration that acquires the three-dimensional positions of feature points from images acquired by a camera, three-dimensional position information of the feature points is first acquired as initial information by corresponding-feature-point analysis using only a plurality of preceding image frames among the frames of the captured image. This initial information is then set as the state information for the initial image frame, and three-dimensional position information of the feature points in subsequent frames is acquired by processing that applies an extended Kalman filter (EKF) to the subsequent image frames. Feature point extraction involving frame matching or the like therefore needs to be executed only on the preceding image frames, which makes acquisition of the three-dimensional feature point positions and generation of three-dimensional image data efficient.

The information processing apparatus, information processing method, and computer program according to embodiments of the present invention are described in detail below with reference to the drawings.

An overview of the present invention is given with reference to FIG. 4. The information processing apparatus 120 of the present invention receives images captured by a camera 102 held by, for example, a moving user 101, such as the frames of a moving image, analyzes the input images, and generates three-dimensional image information 103 made up of the various objects in the captured images.

As in the processing described with reference to FIG. 1, the information processing apparatus 120 analyzes the acquired images and generates a sparse three-dimensional map 131, which includes information such as the positions of feature points in the images, by applying processing such as SLAM (simultaneous localization and mapping) or SFM (Structure from Motion). It then generates a dense three-dimensional map 132, which is detailed three-dimensional information, using the camera trajectory information and the feature point information in the images.

As described above, SLAM detects the positions of feature points in images input from the camera together with the position and orientation of the camera. SFM, for example, analyzes the correspondence of feature points (landmarks) across images captured from a plurality of different positions.

The sparse three-dimensional map 131 holds the three-dimensional position information of the feature points. A conventional sequence for generating this information was described with reference to the flowchart of FIG. 2, but the information processing apparatus 120 of the present invention acquires the three-dimensional positions of feature points by processing that differs from the flow shown in FIG. 2.

The sequence for acquiring the three-dimensional positions of feature points and the position and orientation information of the camera in one embodiment of the present invention is described with reference to the flowchart shown in FIG. 5.

The three-dimensional position information of the feature points and the position and orientation information of the camera are acquired in the following processing sequence.
Step S101: Initial information acquisition (SFM)
Step S102: Acquisition of camera position and orientation and three-dimensional feature point positions using an extended Kalman filter (EKF SLAM)
Step S103: Bundle adjustment (Bundle Adjustment)

The basic flow for acquiring the three-dimensional feature point positions and the camera position and orientation in one embodiment of the present invention is thus:
"Initial information acquisition (SFM)"
→ "Acquisition of camera position and orientation and three-dimensional feature point positions using an extended Kalman filter (EKF SLAM)"
→ "Bundle adjustment (Bundle Adjustment)"

The initial information acquisition process (SFM) in step S101 is performed by, for example, SFM (Structure from Motion) processing, which analyzes the correspondence of feature points (landmarks) contained in images captured from a plurality of different positions. This processing is applied only to, for example, the first few frames input from the camera. The information obtained is used as initialization information for the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM), which is executed in step S102.

Once the initialization information for EKF SLAM has been obtained, SFM is not executed on subsequent input images. Instead, EKF SLAM is executed to acquire the feature point position information and the position and orientation information of the camera that captured each image.

A specific processing example will be described with reference to FIG. 6. Assume that the position information of the feature points of the subject contained in the image frames acquired by the camera shown in FIG. 6(a), and the trajectory of the camera, are to be obtained. FIG. 6(a) shows the image frames acquired by the camera at regular intervals.

The information processing apparatus of the present invention does not apply the initial information acquisition process (SFM) of step S101 shown in FIG. 5 to all of the image frames whose feature point three-dimensional positions are to be analyzed, but only to the first few frames (the input frames up to frame T1 shown in FIG. 6). The three-dimensional map and the camera frame trajectory information produced by this SFM processing of the plurality of preceding image frames are used as initialization information (Initialize Data) for the "EKF SLAM" executed in the next step S102.

The three-dimensional map and the camera frame trajectory information obtained from the plurality of preceding image frames up to frame T1 serve as the initial information. "EKF SLAM" is initialized with this initial information and then executed. That is, the feature point position information acquired from the preceding image frames is set as the state information for the initial image frame, and the three-dimensional position information of the feature points in the subsequent frames is acquired by applying the extended Kalman filter (EKF) to the subsequent image frames.

The flowchart shown in FIG. 7 shows the detailed sequence of the flow shown in FIG. 5. The steps correspond as follows.
Step S101 [initial information acquisition process (SFM)] shown in FIG. 5 corresponds to step S204 of FIG. 7.
Step S102 [camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM)] shown in FIG. 5 corresponds to step S207 of FIG. 7.
Step S103 [bundle adjustment process (Bundle Adjustment)] shown in FIG. 5 corresponds to step S209 of FIG. 7.

The processing sequence of the present invention will be described with reference to the flowchart of FIG. 7. First, in step S201, a flag is set. This flag is a status flag indicating whether the initial information acquisition process (SFM) has been completed: it is set to [0] while the process is not completed and to [1] once it has been completed.

Initially, the status flag is set to [0] in step S201, and an image is input in step S202. The image is, for example, one image frame of a moving image captured by the camera 102 shown in FIG. 4.

Next, the value of the status flag is checked in step S203. If the flag is not 1, the initial information acquisition process (SFM) has not been completed, so the process proceeds to step S204. In step S204, the initial information acquisition process (SFM) is executed. This is SFM (Structure from Motion) processing, which analyzes the correspondence of feature points (landmarks) contained in images captured from a plurality of different positions. As in conventionally known SFM processing, the three-dimensional positions of the feature points and the position and orientation information of the image frames containing the feature points (the position and orientation information of the camera) are calculated by associating feature points through frame matching and by bundle adjustment.

The initial information acquisition process (SFM) in step S204 is executed only for a preset number of frames. Its input information and output information are as follows.
Input: image data of the preset number of frames
Output: position and orientation of the input image frames, and feature point position information
The position and orientation of an input image frame is information corresponding to the position and orientation of the camera; it is identical to the camera position and orientation information or corresponds to it one-to-one.
The output information [position and orientation of the input image frames and feature point position information] generated by the initial information acquisition process (SFM) in step S204 is used in the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207. This output information is stored in the storage unit 152 together with the processed image frames.

After the initial information acquisition process (SFM) in step S204, step S205 determines whether the preset number of frames has been reached and SFM is complete. If it is not complete, the process returns to step S202, the next image is input, and the initial information acquisition process (SFM) of step S204 continues.

If step S205 determines that the preset number of frames has been reached and the initial information acquisition process (SFM) is complete, the process proceeds to step S206, where the status flag is set to [1], the value indicating completion of the initial information acquisition process (SFM). The process then returns to step S202 and the next image is input.

Next, step S203 confirms that the value of the status flag is [1], and the process proceeds to step S207. Step S207 executes the processing of step S102 in the flow of FIG. 5, that is, the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM).

Processing that applies the extended Kalman filter (EKF) holds information such as the positions of the camera and the feature points, acquired for example from preceding image frames, as a state variable. When the corresponding feature points are observed in a subsequent frame, the camera and feature point information held in the state variable is updated on the basis of the observation, yielding the camera trajectory and the three-dimensional positions of the feature points. The state variable is held as multidimensional normal distribution data made up of a plurality of state values, that is, as data in which the probability of the state of each variable is expressed as a normal distribution.

To start processing that applies the extended Kalman filter (EKF), a state variable indicating the positions of the camera and the feature points must already be held as an initial value. In the processing of the present invention, the data that becomes this initial value is acquired by the initial information acquisition process (SFM) in step S204 of FIG. 7 (the processing of step S101 shown in FIG. 5).

The state variable used in the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) will be described with reference to FIG. 8. FIG. 8 shows the following:
(a) an example of image capture by the camera,
(b) the state variable,
(c) update processing of the state variable.

As shown in FIG. 8(b), for example, the state variable [x] consists of the position, orientation, velocity, and angular velocity of the camera, together with the position information P1 to Pn of a plurality of feature points detected in the images captured by the camera. It is thus configured as multidimensional normal distribution data made up of a plurality of state values. The state variable shown here is only an example: other detailed information may be included, or the data may consist of only some of these items.

The state variable is updated sequentially on the basis of each new input image. As shown in FIG. 8(c), updated information [x_{t+1}] is generated from the feature point information detected in the new frame to be processed, as follows.
$x_{t+1} \leftarrow f(x_t, a_t, s_t) + u$
where
$a_t$: motion model,
$s_t$: observation model,
$u$: noise.
In this embodiment, processing uses the "constant velocity motion model" as the motion model and the "pinhole camera model" as the observation model.
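The two models named in this embodiment can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not the implementation of the embodiment: the class `CameraState`, the functions `predict_constant_velocity` and `project_pinhole`, and the parameters `dt`, `focal_length`, `cx`, and `cy` are hypothetical names; orientation, angular velocity, the noise term, and covariance propagation are omitted.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraState:
    """Reduced camera part of the state variable (position and velocity only)."""
    position: Vec3
    velocity: Vec3


def predict_constant_velocity(cam: CameraState, dt: float) -> CameraState:
    """Motion model sketch: the position advances by velocity * dt and the
    velocity itself is left unchanged (constant velocity assumption)."""
    new_position = tuple(p + v * dt for p, v in zip(cam.position, cam.velocity))
    return CameraState(new_position, cam.velocity)


def project_pinhole(point_cam: Vec3, focal_length: float,
                    cx: float, cy: float) -> Tuple[float, float]:
    """Observation model sketch: project a 3D point given in camera
    coordinates onto the image plane of an ideal pinhole camera."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / z + cx, focal_length * y / z + cy)
```

In a full EKF, the predicted observation from `project_pinhole` would be compared with the feature point actually detected in the new frame, and the difference would drive the update of the state variable.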

When the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) of step S207 starts, the data conversion processing unit A151 converts the information obtained by the initial information acquisition process (SFM) in step S204 and generates the state variable to be applied to EKF SLAM. As described above, the input information and output information of the initial information acquisition process (SFM) in step S204 are as follows.
Input: image data of the preset number of frames
Output: position and orientation of the input image frames, and feature point position information

The data conversion processing unit A151 receives the output information [position and orientation of the input image frames and feature point position information] of the initial information acquisition process (SFM) and generates the data used by the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207, that is, the state of the initial frame (= state variable) applied in EKF SLAM.

The input values and output values of the data conversion processing unit A151 are as follows.
Input values: position and orientation of the input image frames (camera position and orientation) and feature point position information obtained by the initial information acquisition process (SFM) in step S204
Output value: state of the initial frame (= state variable) applied in EKF SLAM

The data conversion processing unit A151 sets, for example, the camera position and orientation and the feature point positions corresponding to the initial frame used in EKF SLAM as the mean of the state (multidimensional normal distribution) of the initial frame. When the camera velocity and angular velocity are required, as in the state variable shown in FIG. 8, the velocity and acceleration of the camera are predicted from, for example, the camera positions and orientations calculated by SFM for the past image frames together with the hypothesis that the camera moves at constant velocity, and the prediction is reflected in the mean of the state of the initial frame.
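As a rough illustration of this conversion, the following Python sketch assembles the mean of the initial state from SFM output. It is a sketch under stated assumptions only: the function name `initial_state_mean`, the argument `frame_interval`, and the state layout (camera position, camera velocity, then feature point coordinates) are hypothetical, orientation and angular velocity are omitted, and the velocity is estimated from the last two SFM camera positions under the constant velocity hypothesis.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def initial_state_mean(camera_positions: Sequence[Vec3],
                       feature_points: Sequence[Vec3],
                       frame_interval: float) -> List[float]:
    """Build the mean vector of the initial EKF state.

    camera_positions: camera positions from SFM, oldest first (at least two).
    feature_points:   3D feature point positions from SFM.
    frame_interval:   time between consecutive frames (assumed constant).
    """
    if len(camera_positions) < 2:
        raise ValueError("need at least two camera positions to estimate velocity")
    prev, last = camera_positions[-2], camera_positions[-1]
    # Constant velocity hypothesis: velocity = displacement / elapsed time.
    velocity = [(b - a) / frame_interval for a, b in zip(prev, last)]
    mean = list(last) + velocity
    for point in feature_points:
        mean.extend(point)
    return mean
```

For example, camera positions `(1, 0, 0)` and `(2, 0, 1)` in the last two SFM frames with `frame_interval=0.5` give an estimated velocity of `(2.0, 0.0, 2.0)`.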

As described above, the state variable is multidimensional normal distribution data, and there are various ways of setting its covariance matrix. Obtaining the covariance matrix directly is difficult because of the various error factors involved, so it is generally determined by empirical rules. In this embodiment, values preset by the user are used. For example, the covariance values (between different variables) are set to 0, the auto-covariance values for the camera are set to 0, and the auto-covariance values for the feature point positions are set to an empirical value σ chosen by the user.
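The covariance setting given as an example here can be written down directly. The following Python sketch is illustrative only: the function name `initial_covariance` and the parameters `camera_dim` and `num_features` are hypothetical, and the state is assumed to hold the camera entries first, followed by three coordinates per feature point.

```python
from typing import List


def initial_covariance(camera_dim: int, num_features: int,
                       sigma: float) -> List[List[float]]:
    """Return the initial covariance matrix as a list of rows.

    All off-diagonal entries (covariances) are 0, the diagonal entries for
    the camera are 0, and the diagonal entries for the feature point
    coordinates are the user-chosen empirical value sigma.
    """
    n = camera_dim + 3 * num_features
    cov = [[0.0] * n for _ in range(n)]
    for i in range(camera_dim, n):
        cov[i][i] = sigma
    return cov
```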

In this way, the data conversion unit A151 calculates the state variable corresponding to the initial frame used in EKF SLAM from the position and orientation (camera position and orientation) and feature point position information of the preset image frames 0 to T processed by the initial information acquisition process (SFM) in step S204. In step S207, this state variable is used to perform the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM), and the state variable is updated as each subsequent image frame is input.

The input information and output information of the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207 are as follows.
Input: the feature points and the state of the initial frame, that is, the state variable consisting of multidimensional normal distribution data of the camera position and orientation information and the feature point position information (see FIG. 8)
Output: the state variable corresponding to each image frame, and feature point tracking data
The feature point tracking data can be calculated from the feature point position information contained in the state variable corresponding to each image frame.
The output information [the state variable corresponding to each image frame and the feature point tracking data] generated by EKF SLAM in step S207 is stored in the storage unit 152 together with the processed image frames and is used in the bundle adjustment process (Bundle Adjustment) in step S209.

In step S207, the state variable corresponding to each processed image frame (for example, the state variable x (multidimensional normal distribution) shown in FIG. 8(b)) and the feature point tracking data are acquired and recorded in the storage unit 152 in association with each frame, as shown in FIG. 7. That is, the storage unit 152 records the state variable (multidimensional normal distribution) and the feature point tracking data corresponding to each image frame processed by EKF SLAM in step S207.

Step S208 determines whether any unprocessed input image remains. If one does, the process returns to step S202, the next image is input, and the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) of step S207 continues.

If step S208 determines that no unprocessed input image remains, the process proceeds to step S209 and the bundle adjustment process (Bundle Adjustment) is executed.
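The control flow of steps S201 to S209 described so far can be summarized in a short Python sketch. It shows only the flag-driven branching. The function `run_pipeline` and its callback parameters `sfm_step`, `ekf_step`, and `bundle_adjust` are hypothetical placeholders for the SFM, EKF SLAM, and bundle adjustment processing, and the conversion performed by the data conversion units is assumed to happen inside those callbacks.

```python
from typing import Any, Callable, Iterable, List


def run_pipeline(frames: Iterable[Any],
                 num_sfm_frames: int,
                 sfm_step: Callable[[List[Any]], Any],
                 ekf_step: Callable[[Any, Any], Any],
                 bundle_adjust: Callable[[List[Any]], Any]) -> Any:
    sfm_done = False              # S201: status flag [0] = SFM not completed
    sfm_frames: List[Any] = []
    state: Any = None
    history: List[Any] = []       # per-frame EKF SLAM results (storage unit)
    for frame in frames:          # S202: input the next image frame
        if not sfm_done:          # S203: flag is not 1, so run SFM
            sfm_frames.append(frame)
            initial = sfm_step(sfm_frames)           # S204: SFM on frames so far
            if len(sfm_frames) >= num_sfm_frames:    # S205: preset count reached?
                sfm_done = True                      # S206: status flag [1]
                state = initial
        else:
            state = ekf_step(state, frame)           # S207: EKF SLAM update
            history.append(state)
    # S208: the loop ends when no unprocessed image remains. S209 follows.
    return bundle_adjust(history)
```

Because SFM runs only while the flag is unset, the comparatively expensive frame matching is confined to the first `num_sfm_frames` frames, which is the efficiency gain this flow is built around.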

The bundle adjustment process (Bundle Adjustment) in step S209 is executed using the output information [the state variable (multidimensional normal distribution) corresponding to each image frame and the feature point tracking data] generated by the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207.

The data conversion unit B153 shown in FIG. 7 takes the information generated by EKF SLAM in step S207 and stored in the storage unit 152 and converts it into the data used by the bundle adjustment process (Bundle Adjustment) in step S209.

The input and output of the data conversion unit B153 are as follows.
Input: the state variable (multidimensional normal distribution) corresponding to each image frame and the feature point tracking data (data stored in the storage unit 152)
Output: the camera position and orientation corresponding to each image frame, and the feature point positions

The data conversion unit B153 obtains the camera position and orientation information of each image frame from the state variable (multidimensional normal distribution) corresponding to that frame, which is stored in the storage unit 152, and obtains the feature point positions from the state (multidimensional normal distribution) of the latest frame. As described above, the state variable is multidimensional normal distribution data expressing the probability of the state of each variable as a normal distribution. Since the value corresponding to the mean of a normal distribution has the highest probability, the mean of each normal distribution in the state variable is adopted as the camera position and orientation and as the feature point positions. For the feature point tracking data, the data stored in the storage unit 152 is used.
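This conversion can be sketched in a few lines of Python. The sketch assumes, hypothetically, that only the mean vector of each per-frame state is needed, that the first `camera_dim` entries of each mean describe the camera, and that the remaining entries are feature point coordinates in groups of three. The function name `convert_for_bundle_adjustment` is likewise an illustrative choice.

```python
from typing import List, Sequence, Tuple


def convert_for_bundle_adjustment(
        state_means: Sequence[Sequence[float]],
        camera_dim: int) -> Tuple[List[List[float]], List[Tuple[float, ...]]]:
    """Extract bundle adjustment inputs from per-frame state means.

    Camera entries are taken from the mean of every frame; feature point
    positions are taken from the mean of the latest frame only.
    """
    camera_poses = [list(mean[:camera_dim]) for mean in state_means]
    latest_features = list(state_means[-1][camera_dim:])
    feature_points = [tuple(latest_features[i:i + 3])
                      for i in range(0, len(latest_features), 3)]
    return camera_poses, feature_points
```

Taking the feature point positions from the latest frame reflects the fact that each EKF update refines them, so the latest state carries the estimate informed by the most observations.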

The bundle adjustment process (Bundle Adjustment) in step S209 takes as input the data generated by the data conversion unit B153, that is, the camera position and orientation corresponding to each image frame and the feature point positions.
The input and output data of the bundle adjustment process (Bundle Adjustment) in step S209 are as follows.
Input: the camera position and orientation of each image frame, the feature point positions, and the feature point tracking data
Output: the camera position and orientation of each image frame, and the feature point positions

As described earlier with reference to FIG. 3, bundle adjustment obtains the camera position and orientation of each image frame and the feature point positions by making the three-dimensional positions of corresponding feature points, contained in a plurality of images captured from different positions, converge to a single position. For a corresponding feature point such as the feature point 21 shown in FIG. 3, the lines (bundles) connecting the capture point of each image to the feature point 21 should intersect at the single position of the feature point 21. However, the camera positions and the positions of the feature points within the images are not necessarily calculated accurately and contain errors from various causes, and these errors must be removed. Specifically, the calculated camera positions and feature point positions are corrected so that the lines connecting the three-dimensional position of one corresponding feature point to the camera positions intersect at that one feature point. This processing is executed as the bundle adjustment process (Bundle Adjustment).
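A common way to quantify how far the bundles are from intersecting is the sum of squared reprojection errors, which a bundle adjustment solver then minimizes over the camera and feature point parameters. The Python sketch below computes only that cost. It is an assumption-laden illustration rather than the method of this embodiment: the function name `reprojection_error` and the observation tuple layout are hypothetical, cameras are assumed to have no rotation, and the principal point is taken as the image origin.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
# (frame index, feature point index, observed u, observed v)
Observation = Tuple[int, int, float, float]


def reprojection_error(camera_positions: Sequence[Vec3],
                       points: Sequence[Vec3],
                       observations: Sequence[Observation],
                       focal_length: float) -> float:
    """Sum of squared differences between observed image positions and the
    pinhole projections of the current 3D feature point estimates."""
    total = 0.0
    for frame, idx, u, v in observations:
        cx, cy, cz = camera_positions[frame]
        px, py, pz = points[idx]
        # Point expressed relative to the camera (no rotation assumed).
        x, y, z = px - cx, py - cy, pz - cz
        u_hat = focal_length * x / z
        v_hat = focal_length * y / z
        total += (u - u_hat) ** 2 + (v - v_hat) ** 2
    return total
```

When the camera positions and feature point positions are consistent with every observation, the cost is zero, which corresponds to the bundles meeting at a single point.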

A three-dimensional map is generated with the camera position and orientation information and the feature point position information obtained by this correction as the final output. This processing makes it possible to generate a highly accurate three-dimensional map.

[Processing example with feature point addition]
In the processing described with reference to the flowchart of FIG. 7, the bundle adjustment process (Bundle Adjustment) in step S209 uses only the feature point information obtained as a result of the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207.

A three-dimensional map can be generated from the three-dimensional position information of the feature points obtained by such processing. However, if the number of feature points is small relative to the number of camera frames being processed, the quality of the resulting three-dimensional map may suffer. A larger number of usable feature points is advantageous for generating a more accurate three-dimensional map. Accordingly, if feature points can be added before the bundle adjustment process (Bundle Adjustment), it is preferable to add them and to perform bundle adjustment using the information of the added feature points as well, so that the three-dimensional position information of many feature points is output. A processing example that executes this feature point addition is described below.

The processing sequence of this embodiment follows the flowchart shown in FIG. 9. The flow shown in FIG. 9 differs from the flow of FIG. 7 described above in that the feature point addition process of step S300 is inserted before step S209. The other processing is the same as in the flow shown in FIG. 7.

The feature point addition process of step S300 is executed after the camera position and orientation and feature point three-dimensional position information acquisition process applying the extended Kalman filter (EKF SLAM) in step S207 has been completed.

At this point, the storage unit holds the state variable corresponding to each image frame and the feature point tracking data generated by the EKF SLAM processing in step S207. The feature point addition process of step S300 takes this stored data and the processed image frame data as input and adds feature points.

The input and output data of the feature point addition process of step S300 are as follows.
Input: the output of the data conversion unit B153 [the camera position and orientation of each image frame, the feature point positions, and the feature point tracking data] and all of the processed image frame data
Output: the three-dimensional positions of the feature points, and the feature point positions within each image

The bundle adjustment process (Bundle Adjustment) in step S209 is then performed using both the feature point information already extracted by the EKF SLAM processing in step S207 and the feature point information added by the feature point addition process of step S300. As a result, three-dimensional position information can be output for more feature points than in the flow shown in FIG. 7 described above.

The detailed sequence of the feature point addition process of step S300 will be described with reference to the flowchart of FIG. 10. In step S301, the image frames to be processed are input one at a time in order. The image frames to be processed are the images that were processed by the initial information acquisition process (SFM) in step S204 and by the EKF SLAM processing in step S207. The processing from step S302 onward is executed in turn for each of these image frames.

In step S302, feature points are extracted from the image frame being processed. Existing methods can be used for feature point extraction; for example, feature point extraction using a Harris corner detector (Harris Corner Detector) is applied.

Feature point extraction using the Harris corner detector will be described with reference to FIG. 11. As shown in FIG. 11, when extracting feature points, the data processing unit of the information processing apparatus generates a plurality of Harris corner images 180 to 182 and Laplacian images 190 to 192 from the acquired image 170 captured by the camera.

A Harris corner image is image data generated by applying a Harris corner detector to the acquired image. From these Harris corner images 180 to 182, pixel points (maxima points) 185 whose values are higher than those of, for example, the eight surrounding pixels are extracted as detection points. In addition, a LoG (Laplacian of Gaussian) filter is applied to the acquired image 170 to generate Laplacian images 190 to 192 at a plurality of levels (resolutions). The LoG filter is a kind of second-derivative filter used for edge enhancement of images, and is also used as an approximate model of the processing performed in the human visual system before information from the retina is relayed at the lateral geniculate body.
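The maxima-point selection described above can be sketched as follows. This is a minimal illustration only, assuming the Harris response is already available as a 2D grid of numbers; the function name `local_maxima` and the `min_value` cutoff are hypothetical and do not come from the specification.

```python
def local_maxima(response, min_value=0.0):
    """Return (row, col) of pixels strictly greater than all 8 neighbours."""
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            v = response[r][c]
            if v <= min_value:
                continue  # ignore weak responses (hypothetical cutoff)
            neighbours = [response[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy Harris response with a single peak in the middle.
harris = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 9, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
peaks = local_maxima(harris)
print(peaks)
```

Border pixels are skipped here because they do not have eight neighbours; a practical detector would also have to choose how to treat them.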

In the feature point extraction process, the positions in the Laplacian images 190 to 192 (the LoG filter outputs) that correspond to the detection points obtained from the Harris corner images 180 to 182 are examined to determine whether the position changes as the resolution changes within a predetermined range of levels, and points whose position does not change are taken as feature points. This enables matching between feature points that is robust to enlargement and reduction of the image. Details of this feature point extraction process are described in, for example, Japanese Unexamined Patent Application Publication No. 2004-326693 (Japanese Patent Application No. 2003-124225).

Next, in step S303, duplicate feature points are removed. That is, feature points that duplicate those already extracted in the initial information acquisition process (SFM) of step S204 and the EKF-SLAM process of step S207 are removed from the feature points extracted in step S302. The feature point information extracted in steps S204 and S207 is stored in the storage unit 152. These already acquired feature points are compared with the feature points acquired in step S302, and only the feature points that are not duplicates are additionally registered in the storage unit 152.

Specifically, for the image frame being processed, the position within the image frame of each feature point registered in the storage unit 152 is calculated by the following expression (Equation 1).

The meaning of the above equation will be described with reference to FIGS. 12 and 13. The equation expresses the correspondence between the pixel position 212 on the camera image plane of the point (m) of the object 211 contained in the image 210 captured by the camera, that is, a position expressed in the camera coordinate system, and the three-dimensional position (M) 201 of the object 200 in the world coordinate system.

The pixel position 212 on the camera image plane is expressed in the camera coordinate system. The camera coordinate system takes the focal point of the camera as the origin C, the image plane as the two-dimensional plane of Xc and Yc, and the depth as Zc; the origin C moves as the camera moves.

On the other hand, the three-dimensional position (M) 201 of the object 200 is expressed in a world coordinate system with three axes X, Y, and Z whose origin O does not move with the camera. The expression relating the positions of an object in these two different coordinate systems is defined as the pinhole camera model described above.

As shown in FIG. 13, λ, A, Cw, and Rw in this equation denote the following:
λ: normalization parameter
A: camera internal parameters
Cw: camera position
Rw: camera rotation matrix
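(Equation 1) is not reproduced in this text. As a reading aid only, the standard pinhole camera relation that is consistent with the symbols listed above can be written as follows. The homogeneous pixel vector with components (u, v, 1) and the direction convention of the rotation (Rw rather than its transpose) are assumptions, not taken from the original equation.

```latex
\lambda \tilde{m} = A \, R_w \, (M - C_w),
\qquad
\tilde{m} = \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
```

Under this form, and when the last row of A is (0, 0, 1), λ equals the depth of M along the optical axis of the camera.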

The camera internal parameters A include the following values:
f: focal length
θ: orthogonality of the image axes (ideal value is 90°)
ku: scale of the vertical axis (conversion from the scale of the three-dimensional position to the scale of the two-dimensional image)
kv: scale of the horizontal axis (conversion from the scale of the three-dimensional position to the scale of the two-dimensional image)
(u0, v0): image center position
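To make the roles of A, Rw, and Cw concrete, the following sketch projects a world point into pixel coordinates. It assumes the common pinhole form in which the homogeneous pixel vector is proportional to A Rw (M - Cw); the function names and all numeric values are hypothetical, and A is written directly as a 3x3 matrix for the ideal case θ = 90° rather than being built from f, ku, and kv.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(A, R_w, C_w, M):
    """Project world point M to pixel coordinates (u, v), or None if behind the camera."""
    # Point expressed in camera coordinates (assumed convention: X = R_w (M - C_w)).
    X = mat_vec(R_w, [M[i] - C_w[i] for i in range(3)])
    if X[2] <= 0.0:
        return None  # the point lies behind the camera
    h = mat_vec(A, X)  # homogeneous pixel vector; h[2] plays the role of lambda
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical intrinsics for a 360x240 image, identity rotation, camera at the origin.
A = [[500.0, 0.0, 180.0],
     [0.0, 500.0, 120.0],
     [0.0, 0.0, 1.0]]
R_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C_w = [0.0, 0.0, 0.0]

pixel = project(A, R_w, C_w, [0.2, -0.1, 2.0])
behind = project(A, R_w, C_w, [0.2, -0.1, -2.0])
print(pixel, behind)
```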

If a feature point extracted in the feature point extraction process of step S302 lies close to the in-image position of a feature point registered in the storage unit 152, it is deleted as a duplicate feature point. Closeness is judged by comparing the difference in position with a preset threshold; if the difference is smaller than the threshold, the point is deleted. Specifically, the determination is made using, for example, the following expressions (Equation 2) and (Equation 3).

Using dist and Xz from (Equation 2) and (Equation 3) above, a point for which
"Xz is positive and dist is equal to or greater than a certain threshold"
is determined to be a duplicate feature point. Here, m_{database} is a feature point registered in the storage unit 152 and is the same as the left-hand side of (Equation 1) described above, and m_{harris_corner} corresponds to a point extracted in the feature point extraction process of step S302. The components on the left-hand side of (Equation 3) are the same as in (Equation 1). (Equation 3) is set up so that feature points behind the camera are ignored.
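The duplicate test can be sketched as below. (Equation 2) and (Equation 3) are not reproduced in this text, so the sketch rests on stated assumptions: dist is taken to be the Euclidean pixel distance between m_{harris_corner} and the projected m_{database}, Xz is the depth of the registered point in camera coordinates, and, following the closeness criterion of step S303 (a difference smaller than the threshold), a candidate counts as a duplicate when that distance falls below the threshold. All names and values are hypothetical.

```python
import math


def is_duplicate(m_harris, registered, threshold):
    """registered: list of (u, v, x_z) for points already stored, projected into this frame."""
    for u, v, x_z in registered:
        if x_z <= 0.0:
            continue  # registered point is behind the camera, so it is ignored
        if math.hypot(m_harris[0] - u, m_harris[1] - v) < threshold:
            return True
    return False


def remove_duplicates(candidates, registered, threshold):
    """Keep only the candidates that are not close to any registered feature point."""
    return [m for m in candidates if not is_duplicate(m, registered, threshold)]


registered = [(100.0, 100.0, 2.0),   # visible registered point
              (50.0, 50.0, -1.0)]    # registered point behind the camera
candidates = [(101.0, 100.0), (50.0, 51.0), (200.0, 150.0)]
kept = remove_duplicates(candidates, registered, threshold=3.0)
print(kept)
```

In this toy run the first candidate is dropped because it lies one pixel from a visible registered point, while the second survives because the nearby registered point is behind the camera.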

Next, in step S304, the first feature point tracking process, "feature point tracking #1 (feature point position unknown)", is executed. This is optical flow detection for feature points by ordinary template window matching. Details of optical flow detection for feature points by template window matching are described in, for example, [Digital Image Processing Editorial Committee, "Digital Image Processing", Computer Graphic Arts Society (CG-ARTS) (2004), p. 243].

From the image frame being processed, a feature point newly extracted in the feature point extraction process of step S302 is selected, and the area around the feature point is cut out of the frame as a window to obtain a template image. The window size is set by the user (for example, a 21x21 template for a 360x240 image). Template matching is then performed against the subsequent frames (in the forward time direction) to find the corresponding points. The search region in each frame is set around the corresponding point found in the immediately preceding frame.

A normalized correlation value, for example, is used as the matching score that indicates whether a match has been found. The normalized correlation value is described in the above reference [Digital Image Processing Editorial Committee, "Digital Image Processing", Computer Graphic Arts Society (CG-ARTS) (2004), p. 204]. In this process, the search is stopped if the matching score is small (that is, if the correlation value is small). The corresponding points found in this way constitute the tracking result. In this manner, tracking is performed for each feature point newly extracted from the image frame being processed in the feature point extraction process of step S302.
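The template matching of step S304 can be illustrated with a small sketch. It uses the zero-mean form of normalized cross-correlation, which is one common definition and may differ from the one in the cited reference; the function names, the `min_score` cutoff, and the toy image are hypothetical.

```python
import math


def ncc(template, patch):
    """Zero-mean normalized cross-correlation of two equally sized 2D patches."""
    a = [p for row in template for p in row]
    b = [p for row in patch for p in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def best_match(image, template, center, radius, min_score=0.8):
    """Search window top-left positions within `radius` of `center` (row, col)."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = min_score, None
    r0, c0 = center
    for r in range(max(0, r0 - radius), min(len(image) - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(len(image[0]) - tw, c0 + radius) + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(template, patch)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos  # best_pos is None when no score exceeds min_score


template = [[1, 5], [7, 2]]
image = [[0] * 6 for _ in range(6)]
image[3][2], image[3][3], image[4][2], image[4][3] = 1, 5, 7, 2  # template pasted at (3, 2)

score, pos = best_match(image, template, center=(2, 2), radius=2)
print(score, pos)
```

Returning no position when every score stays below the cutoff mirrors the rule that the search stops when the correlation value is small.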

Next, in step S305, the feature point three-dimensional position estimation process is executed. This process obtains the three-dimensional position of a feature point from the camera position of each image frame and the feature point tracking result obtained by the processing described above.

The position M of the feature point is obtained using (Equation 1) described earlier, that is, the pinhole camera model. Here, the camera position is c, the camera orientation (rotation matrix) is R, the camera internal parameters are A, and the position of the feature point in the image is m. (Equation 1) is then rearranged to give the following expression (Equation 4) for the position M of the feature point.

The variables in (Equation 4) above are the same as in (Equation 1). In addition, d denotes the distance between the camera position c and the feature point M, and the subscript i denotes the frame number. The lowercase vector x is defined by the following expression (Equation 5).

Note that (Equation 4) above holds only for true (error-free) values, so for the measured values obtained from the data conversion unit B 153 shown in FIG. 9, the following expression (Equation 6) holds instead.

Here,
the symbol [^] denotes a measured value,
[ε] is the discrepancy between the left-hand and right-hand sides caused by measurement error, and
[x^] is the value defined by the following expression (Equation 7).

The input is not a single frame but the n frames tracked in step S304, "feature point tracking #1 (feature point position unknown)", where the data for one frame consists of the camera position and orientation c, R and the position m of the feature point in the image. To handle multiple frames, (Equation 6) above is extended to give the following expression (Equation 8).

Here, M is defined by the following expression.

Preparing (Equation 8) above for each of the n frames and combining them into a single expression gives the following expression (Equation 10).

The matrices and vectors in (Equation 10) above are defined by the following expression (Equation 11).

The desired vector Y (the three-dimensional position M of the feature point and the distance di between the camera position and the feature point for each frame) is obtained by finding the vector that minimizes the norm (vector length) of ε on the left-hand side of (Equation 11) above, that is, the vector that minimizes the error. The square of the norm of ε is given by the following expression (Equation 12).

If the vector Y in (Equation 12) above is treated as the variable and everything else as constants, (Equation 12) is a quadratic function of Y. The Y that minimizes its left-hand side is therefore the Y at which the partial derivative of (Equation 12) with respect to Y, given by the following expression (Equation 13), is zero. That vector Y is given by the following expression (Equation 14).
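Equations 12 to 14 are not reproduced in this text. The step they describe matches the usual linear least-squares argument, sketched here under assumed notation: the stacked system for the n frames is written as ε = HY - b, where H and b are hypothetical names for the stacked matrix and vector and are not the symbols of the original equations.

```latex
\begin{aligned}
\|\varepsilon\|^{2} &= (HY - b)^{\top}(HY - b) \\
\frac{\partial \|\varepsilon\|^{2}}{\partial Y} &= 2\,H^{\top}(HY - b) = 0 \\
Y &= (H^{\top}H)^{-1}H^{\top}b
\end{aligned}
```

The last line requires H^T H to be invertible. If each frame contributes one viewing ray toward the feature point, this holds when at least two of the rays are not parallel.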

From the vector [Y] obtained by (Equation 14) above, the feature point position [M] is extracted and taken as the result.
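The estimation of step S305 can be sketched numerically as follows. The sketch assumes each frame contributes a ray M = c_i + d_i v_i, where v_i is the unit viewing direction in world coordinates (hypothetically the rotated, normalized back-projection of the pixel). Instead of stacking M and every d_i into one vector as the text does, it first eliminates each d_i, which leaves a 3x3 system for M and gives the same minimizer under this residual. All function names are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def triangulate(rays):
    """rays: list of (camera_centre, unit_direction). Returns (M, per-frame distances d_i)."""
    lhs = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for c, v in rays:
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - v[i] * v[j]  # projector orthogonal to the ray
                lhs[i][j] += p
                rhs[i] += p * c[j]
    M = solve3(lhs, rhs)
    depths = [sum(v[i] * (M[i] - c[i]) for i in range(3)) for c, v in rays]
    return M, depths


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


true_M = [1.0, 2.0, 5.0]
centres = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
rays = [(c, unit([true_M[i] - c[i] for i in range(3)])) for c in centres]
M, depths = triangulate(rays)
print(M, depths)
```

The solver does not guard against a degenerate configuration such as parallel rays; a practical implementation would need to check for that before dividing by the pivot.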

In the next step S306, the second feature point tracking process, "feature point tracking #2 (feature point position known)", is executed. Here the feature point position information is added to the optical flow detection by ordinary template window matching described above, so that tracking is performed more robustly.

The area around the feature point in the image frame is cut out to obtain a template image of a preset window size, for example a 21x21 template for a 360x240 image. Up to this point the processing is the same as the first tracking process of step S304. In step S306, template matching is performed against the subsequent frames to find the corresponding points. The search center for this matching is calculated using the feature point position M obtained in the feature point three-dimensional position estimation process of step S305. The calculation uses (Equation 1) described earlier. The search center is determined by this calculation, and the search is carried out within a window centered on it.

As in "feature point tracking #1" of step S304, a normalized correlation value can be used as the matching score that indicates whether a match has been found. If the matching score is small (the correlation value is small), it is determined that the feature point is not present in that frame. This processing is applied to all subsequent frames of the image frame data, so that the feature point is tracked and the corresponding feature point in each image frame is detected. With this process, a feature point that has once left the camera frame can be tracked again when it re-enters the camera frame.
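The way a known three-dimensional position lets tracking resume after a feature leaves the view can be sketched as below. The sketch assumes the projected position of M in each frame has already been computed with the pinhole model; the function name, window half-size, and sample coordinates are hypothetical.

```python
def search_window(center, half_size, width, height):
    """Return (left, top, right, bottom) clamped to the image, or None if the
    predicted position falls outside the image so that the frame is skipped."""
    u, v = center
    if not (0 <= u < width and 0 <= v < height):
        return None
    cu, cv = int(round(u)), int(round(v))
    left = max(0, cu - half_size)
    top = max(0, cv - half_size)
    right = min(width - 1, cu + half_size)
    bottom = min(height - 1, cv + half_size)
    return left, top, right, bottom


# Hypothetical projections of the same feature point M into three later frames.
predicted = [(100.0, 80.0), (370.0, 90.0), (355.0, 5.0)]
windows = [search_window(p, 10, 360, 240) for p in predicted]
print(windows)
```

In the second frame the feature is predicted outside the 360x240 image, so no search is attempted there, and the search simply resumes in the third frame, where the prediction is inside the image again.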

In step S307, the three-dimensional positions of the corresponding feature points detected in each image frame by the processing of step S306 are calculated, and the calculated three-dimensional position information of each feature point is recorded in the storage unit 152 as additional feature point information, in association with each frame.

In step S308, it is determined whether processing has been completed for all feature points in the frame being processed. If unprocessed feature points remain, the processing from step S304 onward is executed for them. If it is determined in step S308 that all feature points in the frame have been processed, the process proceeds to step S309, where it is determined whether all image frames have been processed. If unprocessed image frames remain, the processing from step S301 onward is executed for them. If it is determined in step S309 that all image frames have been processed, the process ends.
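The control flow of steps S301 to S309 can be summarized in a short skeleton. Every callable passed in is a hypothetical stand-in for the corresponding step, and the dictionary layout of a registered point is likewise an assumption made only for this illustration.

```python
def add_feature_points(frames, registry, extract, is_duplicate,
                       track_unknown, estimate_position, track_known):
    """Skeleton of steps S301 to S309; each callable is a hypothetical stand-in."""
    for index, frame in enumerate(frames):                  # S301, looping until S309 is satisfied
        candidates = extract(frame)                         # S302: feature point extraction
        fresh = [p for p in candidates
                 if not is_duplicate(p, index, registry)]   # S303: duplicate removal
        for point in fresh:                                 # S304 to S307, looping until S308 is satisfied
            first_track = track_unknown(point, index)       # S304: tracking #1 (position unknown)
            position = estimate_position(first_track)       # S305: 3D position estimation
            second_track = track_known(position, index)     # S306: tracking #2 (position known)
            registry.append({"position": position,          # S307: register the added point
                             "observations": second_track})
    return registry


# Stub run: one point is already registered, one new point appears in both frames.
registry = [{"position": (0.0, 0.0, 1.0), "observations": {0: (10, 10), 1: (10, 10)}}]
result = add_feature_points(
    frames=["frame0", "frame1"],
    registry=registry,
    extract=lambda frame: [(10, 10), (50, 60)],
    is_duplicate=lambda p, i, reg: any(r["observations"].get(i) == p for r in reg),
    track_unknown=lambda p, i: {i: p, i + 1: p},
    estimate_position=lambda track: (0.0, 0.0, 2.0),
    track_known=lambda pos, i: {0: (50, 60), 1: (50, 60)},
)
print(len(result))
```

Because the point added while processing the first frame is registered with its observation in the second frame, it is recognized as a duplicate there and is not added twice.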

In this way, the feature point addition process of step S300 shown in FIG. 9 is executed, and feature points that were not extracted in the initial information acquisition process (SFM) of step S204 or in the camera position and orientation and feature point three-dimensional position information acquisition process using the extended Kalman filter (EKF-SLAM) of step S207 are additionally registered in the storage unit 152.

A larger number of usable feature points is advantageous for generating a more accurate three-dimensional map, so a three-dimensional map that uses the feature points added by this embodiment is more accurate data.

FIG. 14 shows example feature point data for two cases: "bundle adjustment performed without adding feature points", described earlier with reference to the flow of FIG. 7, and "bundle adjustment performed after adding feature points", described with reference to the flow of FIG. 9.

Based on captured images of the wall with posters shown in FIG. 14(a), FIG. 14 shows the following data:
(b1) feature point information and camera trajectory without the feature point addition process
(b2) feature point information and camera trajectory with the feature point addition process
As shown in (b2), the number of feature points increases, and using this larger number of feature points makes it possible to generate a more detailed 3D map. FIG. 15 shows an example of a 3D map generated using the three-dimensional position information of feature points acquired with the feature point addition process.

FIG. 16 shows a configuration example of the information processing apparatus of the present invention. As described earlier with reference to FIG. 4, the information processing apparatus 120 receives images captured by the camera 102 held by, for example, a moving user 101 (for example, the images making up a moving image), analyzes the input images, and generates three-dimensional image information 103 for the objects contained in the captured images.

As shown in FIG. 16, the information processing apparatus 120 includes an image input unit 501, an initial information generation unit (SFM) 502, a data conversion unit A 503, a feature point position information generation unit (EKF-SLAM) 504, a data conversion unit B 505, a bundle adjustment processing unit 506, a storage unit 507, a 3D map generation unit 508, and a feature point extraction unit 511.

As described with reference to FIG. 4, the image input unit 501 receives images captured by the camera 102 held by, for example, a moving user 101, and outputs them to the initial information generation unit (SFM) 502 and the feature point position information generation unit (EKF-SLAM) 504. The image frames input through the image input unit 501 are stored in the storage unit 507, either directly or via another processing unit.

Of the images input through the image input unit 501, the predetermined initial frames set in advance are output to the initial information generation unit (SFM) 502, and the subsequent frames are output to the feature point position information generation unit (EKF-SLAM) 504.

The initial information generation unit (SFM) 502 executes the processing of step S204 in the flowcharts of FIGS. 7 and 9. That is, it executes SFM (Structure from Motion) processing, which uses images captured from a plurality of different positions to analyze the correspondence of the feature points (landmarks) contained in the images. As in conventionally known SFM processing, the three-dimensional positions of the feature points and the position and orientation information of the image frames containing them (the camera position and orientation information) are calculated by feature point association through frame matching and by bundle adjustment. The input and output information of the initial information generation unit (SFM) 502 is as follows.
Input: image data for a preset number of frames
Output: position and orientation of the input image frames, and feature point position information
This output information is output to the data conversion unit A 503 and is also stored in the storage unit 507 in association with the processed image frames.

The data conversion unit A 503 receives the position and orientation of the input image frames (the camera position and orientation) and the feature point position information obtained in the initial information acquisition process (SFM) of the initial information generation unit (SFM) 502, and generates the following:
Output value: state of the initial frame (= state variable) used in EKF-SLAM
As described earlier with reference to FIG. 8(b), the state variable consists of the position, orientation, velocity, and angular velocity of the camera, together with the position information P1 to Pn of a plurality of feature points detected from the images captured by the camera.
The state information (= state variable) of the initial frame generated by the data conversion unit A 503 is input to the feature point position information generation unit (EKF-SLAM) 504.

The feature point position information generation unit (EKF-SLAM) 504 executes the processing of step S207 in the flowcharts of FIGS. 7 and 9. That is, it executes the camera position and orientation and feature point three-dimensional position information acquisition process using the extended Kalman filter (EKF-SLAM).

As described above, in processing that uses the extended Kalman filter (EKF), the state variable, which consists of the position, orientation, velocity, and angular velocity of the camera together with the position information P1 to Pn of a plurality of feature points detected from the images captured by the camera (see FIG. 8(b)), is updated successively to obtain the state variable corresponding to each processed image frame. To start this EKF processing, a state variable indicating the positions of the camera and the feature points must be held in advance as an initial value; the data serving as this initial value is received from the data conversion unit A 503, and processing then starts.

The input and output information of the feature point position information generation unit (EKF-SLAM) 504 is as follows.
Input: the feature points and the state of the initial frame, that is, a state variable consisting of multidimensional normal distribution data for the camera position and orientation information and the feature point position information (see FIG. 8)
Output: the state variable corresponding to each image frame, and feature point tracking data
The feature point tracking data can be calculated from the feature point position information contained in the state variable corresponding to each image frame.
The output information generated by the processing (EKF-SLAM) of the feature point position information generation unit (EKF-SLAM) 504, namely the state variable corresponding to each image frame and the feature point tracking data, is stored in the storage unit 507 together with the processed image frames.

The data conversion unit B 505 retrieves the data generated by the feature point position information generation unit (EKF-SLAM) 504 from the storage unit 507 and converts it into the data used by the bundle adjustment processing unit 506. The input and output data of the data conversion unit B 505 are as follows.
Input: the state variable (multidimensional normal distribution) corresponding to each image frame, and feature point tracking data (data stored in the storage unit 507)
Output: the camera position and orientation corresponding to each image frame, and the feature point positions

The data conversion unit B 505 obtains the camera position and orientation information of each image frame from the state variable (multidimensional normal distribution) corresponding to that frame, which is stored in the storage unit 507, and obtains the feature point positions from the state (multidimensional normal distribution) of the latest frame. As described above, the state variable is multidimensional normal distribution data that expresses the probability of the state of each variable as a normal distribution. Because the value corresponding to the mean of the normal distribution has the highest probability, the mean of each normal distribution in the state variable is adopted as the camera position and orientation and as the feature point positions.
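The conversion performed by the data conversion unit B can be sketched as follows. The layout of the state mean vector is an assumption made only for this illustration: 3 values for camera position, 4 for orientation as a quaternion, 3 for velocity, 3 for angular velocity, followed by 3 values per feature point. The actual layout of FIG. 8(b) is not reproduced here, and all names are hypothetical.

```python
CAMERA_DIM = 13  # assumed: position (3) + quaternion (4) + velocity (3) + angular velocity (3)


def split_state_mean(mean):
    """Split the mean vector of one frame's state variable into camera and feature parts."""
    camera = {
        "position": mean[0:3],
        "orientation": mean[3:7],
        "velocity": mean[7:10],
        "angular_velocity": mean[10:13],
    }
    rest = mean[CAMERA_DIM:]
    if len(rest) % 3 != 0:
        raise ValueError("feature part of the state must hold 3 values per point")
    points = [rest[i:i + 3] for i in range(0, len(rest), 3)]
    return camera, points


def convert_for_bundle_adjustment(frame_means):
    """Camera pose from every frame; feature point positions from the latest frame only."""
    poses = []
    for mean in frame_means:
        camera, _ = split_state_mean(mean)
        poses.append((camera["position"], camera["orientation"]))
    _, points = split_state_mean(frame_means[-1])
    return poses, points


# Two frames, two feature points; the feature estimates differ slightly between frames.
frame0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] + [0.0] * 6 + [1.0, 2.0, 5.0, -1.0, 0.5, 4.0]
frame1 = [0.1, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] + [0.0] * 6 + [1.1, 2.0, 5.0, -1.0, 0.5, 4.2]
poses, points = convert_for_bundle_adjustment([frame0, frame1])
print(poses, points)
```

Only the mean is used here; the covariance of the multidimensional normal distribution is discarded in this conversion, in line with the description above.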

The data generated by the data conversion unit B 505 is provided to the bundle adjustment processing unit 506, which uses this input information to perform bundle adjustment.
The input and output data of the bundle adjustment processing unit 506 are as follows.
Input: the camera position and orientation and the feature point positions for each image frame, and feature point tracking data
Output: the camera position and orientation and the feature point positions for each image frame

As described earlier with reference to FIG. 3, bundle adjustment obtains the camera position and orientation of each image frame and the feature point positions by making the three-dimensional positions of corresponding feature points contained in a plurality of images captured from different positions converge to a single position. The camera position and orientation information and the feature point position information obtained by this process are stored in the storage unit 507 and are also provided to the 3D map generation unit 508. The 3D map generation unit 508 generates a 3D map using this feature point information and camera trajectory information. For example, a 3D map such as that shown in FIG. 15 is generated.

The feature point extraction unit 511 is the component used when the feature point addition process described with reference to FIGS. 9 and 10 is executed. It executes processing according to the flow shown in FIG. 10, detects feature points that were not extracted by the initial information generation unit (SFM) 502 or the feature point position information generation unit (EKF-SLAM) 504, calculates their three-dimensional position information and their position information within each image frame, and additionally registers them in the storage unit 507.

When this process is executed, the bundle adjustment processing unit 506 performs bundle adjustment using both the feature points extracted by the initial information generation unit (SFM) 502 and the feature point position information generation unit (EKF-SLAM) 504 and the feature points newly extracted by the feature point extraction unit 511, and outputs three-dimensional position information for a larger number of feature points, as described earlier with reference to FIG. 14. This process makes it possible to generate a more accurate three-dimensional map.

The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention. In other words, the present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, the claims should be taken into consideration.

The series of processes described in the specification can be executed by hardware, by software, or by a combined configuration of both. When the processes are executed by software, a program recording the processing sequence can be installed in memory in a computer incorporated in dedicated hardware and executed there, or the program can be installed and executed on a general-purpose computer capable of executing various processes. For example, the program can be recorded on a recording medium in advance. Besides being installed on a computer from a recording medium, the program can be received via a network such as a LAN (Local Area Network) or the Internet and installed on a recording medium such as a built-in hard disk.

The various processes described in the specification are not necessarily executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus that executes them or as necessary. Further, in this specification, a system is a logical collection of a plurality of apparatuses, and the constituent apparatuses are not limited to being in the same housing.

As described above, according to the configuration of one embodiment of the present invention, in a configuration that acquires the three-dimensional positions of feature points from images captured by a camera, three-dimensional position information of feature points is first acquired as initial information by corresponding feature point analysis that uses only a plurality of preceding image frames among the image frames of the camera-captured image. This initial information is then set as the state information for the initial image frame, and three-dimensional position information of the feature points in the subsequent frames is acquired by processing that applies an extended Kalman filter (EKF) to the subsequent image frames. Consequently, feature point extraction processing that involves, for example, frame matching needs to be executed only on the preceding image frames, so that three-dimensional position information of feature points can be acquired, and three-dimensional image data generated, efficiently.
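The flow summarized above, in which an initial estimate is set as state information and then updated frame by frame with an extended Kalman filter, can be sketched with a deliberately reduced example. The assumptions here are not from the specification: the state is a single scalar (the depth of one feature point) rather than the camera pose and all feature point positions, the measurement model is the nonlinear relation measurement = focal * baseline / depth, and no motion model is used, so only the EKF measurement update appears. The names ekf_update and track_depth and all numeric values are hypothetical.

```python
def ekf_update(depth, var, measurement, focal, baseline, meas_var):
    """One scalar EKF measurement update for h(depth) = focal * baseline / depth."""
    predicted = focal * baseline / depth            # h(x): predicted measurement
    jac = -focal * baseline / (depth * depth)       # H = dh/dx at the current state
    innovation_var = jac * var * jac + meas_var     # S = H P H + R
    gain = var * jac / innovation_var               # K = P H / S
    new_depth = depth + gain * (measurement - predicted)
    new_var = (1.0 - gain * jac) * var              # P = (1 - K H) P
    return new_depth, new_var


def track_depth(initial_depth, initial_var, measurements,
                focal=500.0, baseline=0.1, meas_var=0.25):
    """Start from an initial estimate (standing in for the SFM result) and
    apply one EKF update per subsequent frame."""
    depth, var = initial_depth, initial_var
    for m in measurements:
        depth, var = ekf_update(depth, var, m, focal, baseline, meas_var)
    return depth, var
```

Starting from an initial depth of 3.0 with variance 1.0 and feeding measurements generated from a true depth of 4.0, the estimate moves toward 4.0 while the variance shrinks, without any frame-to-frame feature matching in the loop.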

FIG. 1 is a diagram showing a flowchart illustrating an example of a processing sequence for generating a three-dimensional map (3D map).
FIG. 2 is a diagram showing a flowchart illustrating an example of a processing sequence for acquiring feature point information and camera trajectory information contained in image frames.
FIG. 3 is a diagram illustrating an example of the bundle adjustment process (Bundle Adjustment).
FIG. 4 is a diagram illustrating an outline of the processing of the present invention.
FIG. 5 is a diagram showing a flowchart illustrating a processing sequence according to one embodiment of the present invention.
FIG. 6 is a diagram illustrating a specific processing sequence according to one embodiment of the present invention.
FIG. 7 is a diagram showing a flowchart illustrating a detailed processing sequence according to one embodiment of the present invention.
FIG. 8 is a diagram illustrating the state variables used in processing that applies an extended Kalman filter (EKF).
FIG. 9 is a diagram showing a flowchart illustrating a detailed processing sequence according to one embodiment of the present invention.
FIG. 10 is a diagram showing a flowchart illustrating a detailed sequence of feature point extraction processing according to one embodiment of the present invention.
FIG. 11 is a diagram illustrating feature point extraction processing using a Harris corner detector (Harris Corner Detector).
FIG. 12 is a diagram illustrating the pinhole camera model, that is, the meaning of the equation expressing the correspondence between a position expressed in the camera coordinate system and the three-dimensional position of an object in the world coordinate system.
FIG. 13 is a diagram illustrating the pinhole camera model, that is, the meaning of the equation expressing the correspondence between a position expressed in the camera coordinate system and the three-dimensional position of an object in the world coordinate system.
FIG. 14 is a diagram illustrating processing examples without and with the feature point addition processing.
FIG. 15 is a diagram illustrating a processing example with the feature point addition processing.
FIG. 16 is a diagram illustrating a configuration example of an information processing apparatus according to one embodiment of the present invention.

Explanation of symbols

11-13 Photographed image
21-23 Feature point
101 User
102 Camera
103 Three-dimensional image information
120 Information processing apparatus
131 Sparse three-dimensional map
132 Dense three-dimensional map
170 Acquired image
180-182 Harris corner image
190-192 Laplacian image
501 Image input unit
502 Initial information generation unit (SFM)
503 Data conversion unit A
504 Feature point position information generation unit (EKF-SLAM)
505 Data conversion unit B
506 Bundle adjustment processing unit
507 Storage unit
508 3D map generation unit
511 Feature point extraction unit

Claims

1. An information processing apparatus that calculates three-dimensional positions of feature points included in an image, comprising:
an initial information generation unit that receives a plurality of preceding image frames among the image frames of a camera-captured image and acquires three-dimensional position information of feature points by corresponding feature point analysis of the image frames; and
a feature point position information generation unit that sets the feature point position information acquired from the preceding image frames by the initial information generation unit as state information for an initial image frame, and acquires three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

2. The information processing apparatus according to claim 1, further comprising a bundle adjustment processing unit that receives the feature point position information generated by the feature point position information generation unit and executes correction processing of the three-dimensional position information of the feature points.

3. The information processing apparatus according to claim 1, wherein the initial information generation unit executes SFM (Structure from Motion) processing that analyzes the correspondence between feature points included in the image frames using images captured from a plurality of different positions.

4. The information processing apparatus according to claim 1, wherein
the initial information generation unit is configured to calculate, by analyzing the preceding image frames, position and orientation information of the camera that captured those image frames, and
the feature point position information generation unit sets the feature point position information and the camera position and orientation information that the initial information generation unit calculated from the preceding image frames as state information corresponding to the initial image frame, and acquires, by processing that applies an extended Kalman filter (EKF) to the subsequent image frames, three-dimensional position information of feature points in the subsequent frames and position and orientation information of the camera that captured the subsequent frames.

5. The information processing apparatus according to claim 4, wherein the feature point position information generation unit sets multidimensional normal distribution data including the feature point position information and the camera position and orientation information as the state information, and acquires, by processing that applies an extended Kalman filter (EKF), three-dimensional position information of feature points in the subsequent frames and position and orientation information of the camera that captured the subsequent frames.

6. The information processing apparatus according to claim 2, further comprising a feature point extraction unit that extracts, as additional feature points, feature points not extracted by the initial information generation unit and the feature point position information generation unit, and executes processing to calculate the three-dimensional positions of the extracted additional feature points, wherein
the bundle adjustment processing unit receives the feature point position information generated by the feature point position information generation unit and the feature point position information of the additional feature points extracted by the feature point extraction unit, and executes correction processing of the three-dimensional position information of the feature points.

7. The information processing apparatus according to claim 6, wherein the feature point extraction unit determines whether feature points extracted from an image frame duplicate the feature points extracted by the initial information generation unit and the feature point position information generation unit, and executes processing to select only new, non-duplicate feature points as additional feature points.

8. The information processing apparatus according to claim 1, further comprising a 3D map generation unit that generates three-dimensional image data using the feature point position information generated by the feature point position information generation unit.

9. An information processing method for calculating three-dimensional positions of feature points included in an image in an information processing apparatus, the method comprising:
an initial information generation step of receiving a plurality of preceding image frames among the image frames of a camera-captured image and acquiring three-dimensional position information of feature points by corresponding feature point analysis of the image frames; and
a feature point position information generation step in which a feature point position information generation unit sets the feature point position information acquired from the preceding image frames by the initial information generation unit as state information for an initial image frame, and acquires three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

10. The information processing method according to claim 9, further comprising a bundle adjustment processing step in which a bundle adjustment processing unit receives the feature point position information generated in the feature point position information generation step and executes correction processing of the three-dimensional position information of the feature points.

11. The information processing method according to claim 9, wherein the initial information generation step is a step of executing SFM (Structure from Motion) processing that analyzes the correspondence between feature points included in the image frames using images captured from a plurality of different positions.

12. The information processing method according to claim 9, wherein
the initial information generation step includes a step of calculating, by analyzing the preceding image frames, position and orientation information of the camera that captured those image frames, and
the feature point position information generation step includes a step of setting the feature point position information and the camera position and orientation information that the initial information generation unit calculated from the preceding image frames as state information corresponding to the initial image frame, and acquiring, by processing that applies an extended Kalman filter (EKF) to the subsequent image frames, three-dimensional position information of feature points in the subsequent frames and position and orientation information of the camera that captured the subsequent frames.

13. The information processing method according to claim 12, wherein the feature point position information generation step is a step of setting multidimensional normal distribution data including the feature point position information and the camera position and orientation information as the state information, and acquiring, by processing that applies an extended Kalman filter (EKF), three-dimensional position information of feature points in the subsequent frames and position and orientation information of the camera that captured the subsequent frames.

14. The information processing method according to claim 10, further comprising a feature point extraction step in which a feature point extraction unit extracts, as additional feature points, feature points not extracted by the initial information generation unit and the feature point position information generation unit, and executes processing to calculate the three-dimensional positions of the extracted additional feature points, wherein
the bundle adjustment processing step is a step of receiving the feature point position information generated by the feature point position information generation unit and the feature point position information of the additional feature points extracted by the feature point extraction unit, and executing correction processing of the three-dimensional position information of the feature points.

15. The information processing method according to claim 14, wherein the feature point extraction step is a step of determining whether feature points extracted from an image frame duplicate the feature points extracted by the initial information generation unit and the feature point position information generation unit, and executing processing to select only new, non-duplicate feature points as additional feature points.

16. The information processing method according to claim 9, further comprising a 3D map generation step in which a 3D map generation unit generates three-dimensional image data using the feature point position information generated by the feature point position information generation unit.

17. A computer program that causes an information processing apparatus to calculate three-dimensional positions of feature points included in an image, the program comprising:
an initial information generation step of causing an initial information generation unit to receive a plurality of preceding image frames among the image frames of a camera-captured image and to acquire three-dimensional position information of feature points by corresponding feature point analysis of the image frames; and
a feature point position information generation step of causing a feature point position information generation unit to set the feature point position information acquired from the preceding image frames by the initial information generation unit as state information for an initial image frame, and to acquire three-dimensional position information of feature points in subsequent frames by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.