JP5012615B2 - Information processing apparatus, image processing method, and computer program

Info

Publication number: JP5012615B2
Application number: JP2008082451A
Authority: JP
Inventors: 堅一郎多井; 雷張; 隆之芦ヶ原; ステフェングットマン
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-03-27
Filing date: 2008-03-27
Publication date: 2012-08-29
Anticipated expiration: 2028-03-27
Also published as: JP2009237848A

Description

The present invention relates to an information processing apparatus, an image processing method, and a computer program. More specifically, it relates to an information processing apparatus, an image processing method, and a computer program that, when generating a three-dimensional map (3D map) from camera-captured images, select optimal image frames so that feature point positions can be calculated accurately and a highly accurate three-dimensional map can be generated.

Processing that analyzes images captured by a camera to obtain the three-dimensional positions of objects in those images is used in various fields. For example, it is used when an agent (moving body) such as a camera-equipped robot analyzes the captured images to observe its environment and moves while grasping its surroundings according to the observations, and in environment map construction, which creates a map of the surrounding environment (an environment map) from the captured images. Non-Patent Document 1 discloses a method that tracks feature point positions across all frames and, once data for all frames has been obtained, calculates the feature point positions and camera positions by batch processing.

An example of a three-dimensional map (3D map) generation sequence is described with reference to FIG. 1. First, in step S11, images are captured by a camera. For example, a user or robot holding a camera continuously captures images of the surroundings while moving.

In step S12, the acquired images are analyzed to construct sparse three-dimensional information, including the position information of feature points contained in the images. Processing such as SLAM (simultaneous localization and mapping) or SFM (Structure from Motion) is applied here. SLAM detects the positions of feature points in images input from the camera together with the position and orientation of the camera. SFM, for example, analyzes the correspondence of feature points (landmarks) contained in images captured from a plurality of different positions.

Further, in step S13, a three-dimensional map, which is dense (detailed) three-dimensional information, is generated using the camera trajectory information and the feature point information obtained in step S12.

Accurately analyzing the feature points in the images during the SFM or SLAM processing of step S12 raises the accuracy of the final three-dimensional information. In step S12, the feature point information contained in the image frames and the camera trajectory information are acquired. A detailed example of this processing is shown in FIG. 2.

First, in step S21, a plurality of captured images are input from the camera, and frame matching is performed, which calculates the camera positions using feature points that match between frames. The input images are, for example, a plurality of image frames of a moving image captured by a moving camera, that is, a plurality of images captured from different positions, in which the same object appears in multiple frames. In step S21, corresponding feature points are detected across the image frames, and the camera positions are calculated from this information.

In step S22, frame stitching (Frame Stitch) is executed. This processing uses the information detected in step S21 to estimate the three-dimensional position of each feature point (landmark) in the images, and joins the image frames based on the positions of the feature points in the plural image data to generate data reflecting the three-dimensional position of each feature point.

In step S23, bundle adjustment (Bundle Adjustment) is executed. Bundle adjustment converges the three-dimensional positions of corresponding feature points, contained in a plurality of images captured from different positions, to a single position. Using the camera position information of each captured image and the information on the corresponding feature points contained in the image captured at each camera position, it calculates the most probable three-dimensional position of each feature point. This processing basically requires image frames captured from two or more different positions.

For example, as shown in FIG. 3, captured images 11 to 13 taken from three different positions are used, and the three-dimensional positions of feature points are obtained from feature points 21 to 23 detected in those images. For corresponding feature points across multiple images, as with feature point 21 in FIG. 3, the lines (bundles) connecting each image's shooting point to feature point 21 should intersect at the single position of feature point 21.

However, the camera positions and the positions of feature points within the images are not necessarily calculated accurately and contain errors from various causes. These errors must therefore be removed. Specifically, correction is performed so that the lines connecting the three-dimensional position of one corresponding feature point to the camera positions intersect at that one feature point. In other words, the already calculated camera positions and feature point positions must be corrected. This is performed as bundle adjustment, and the feature point position information and camera position information corrected by this adjustment make it possible to generate more accurate three-dimensional information.
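The idea that the bundles from several shooting points should meet at one feature point can be illustrated with a small least-squares sketch. It is not the bundle adjustment of this document: it holds the camera positions fixed and only finds the single 3D point closest to all rays, whereas bundle adjustment also corrects the camera positions. The function name `triangulate_from_rays` and the input layout are assumptions made for illustration.

```python
import numpy as np


def triangulate_from_rays(centers, directions):
    """Return the 3D point with the smallest summed squared distance to all rays.

    centers:    (N, 3) camera positions (hypothetical input layout)
    directions: (N, 3) ray directions from each camera toward the feature point
    """
    centers = np.asarray(centers, dtype=float)
    directions = np.asarray(directions, dtype=float)
    # Normalize so that each direction is a unit vector.
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        # Projector onto the plane orthogonal to the ray direction.
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ c
    # Solvable when at least two rays are not parallel.
    return np.linalg.solve(A, b)
```

With noise-free rays the result coincides with the true feature point; with noisy rays it is the compromise position that a full bundle adjustment would refine further together with the camera positions.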

In such processing, the selection of the image frames to be analyzed is an important factor in generating accurate three-dimensional information. In particular:
a sufficient number of corresponding feature points must be obtained from the captured images;
the corresponding feature points must appear in multiple images with different shooting directions; and
accurate feature point position information with little error must be calculable from the corresponding feature point information.
An expert in three-dimensional image analysis can intuitively judge which images satisfy these conditions and select appropriate ones, but it is difficult for an ordinary user to select appropriate images from the many image frames he or she has captured.
Non-Patent Document 1: "Shape and motion from image streams under orthography: a factorization method", C. Tomasi and T. Kanade, International Journal of Computer Vision, Volume 9, Number 2, pp. 137-154, (1992).

The present invention has been made in view of, for example, the problems described above. An object of the invention is to provide an information processing apparatus, an image processing method, and a computer program that generate highly accurate three-dimensional information by selecting, according to a preset algorithm, image frames suitable for generating three-dimensional information from the many image frames constituting, for example, a moving image captured by a video camera, and by performing processing such as feature point position calculation using the selected images.

Another object of the present invention is to provide an information processing apparatus, an image processing method, and a computer program that perform image evaluation processing applicable to image selection for generating accurate three-dimensional information.

A first aspect of the present invention is
an information processing apparatus that calculates three-dimensional positions of pixels included in images, comprising:
an image selection unit that selects a plurality of images from input images as selected images;
an evaluation value calculation unit that calculates evaluation values for determining whether the selected images are suitable for generating three-dimensional data; and
an evaluation value determination unit that determines, based on the evaluation values calculated by the evaluation value calculation unit, whether the selected images are suitable for generating three-dimensional data,
wherein the evaluation value calculation unit calculates, as an evaluation value, at least one of the following (a1) to (c1):
(a1) the number of feature points included in the selected images whose three-dimensional positions have been estimated,
(b1) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames,
(c1) the camera model fitting error of a reference image,
and wherein the evaluation value determination unit compares at least one of the values (a1) to (c1) with a preset threshold to determine whether the selected images are suitable for generating three-dimensional data, and determines that the selected images are suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected images whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).
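The threshold comparison of conditions (a2) to (c2) can be sketched as follows. This is a minimal illustration, not the implementation of the apparatus: the function name `is_suitable`, its parameter names, and the `require_all` flag are hypothetical, and this section gives no concrete values for Tha, Thb, or Thc, so the caller must supply them.

```python
def is_suitable(num_points, angle, fitting_error, tha, thb, thc, require_all=False):
    """Judge whether selected images are suitable for 3D data generation.

    num_points:    (a1) number of feature points with estimated 3D positions
    angle:         (b1) intersection angle between rays (same unit as thb)
    fitting_error: (c1) camera model fitting error of the reference image
    tha, thb, thc: thresholds Tha, Thb, Thc (values are the caller's choice)
    require_all:   False -> at least one condition must hold,
                   True  -> all three conditions must hold
    """
    checks = [
        num_points >= tha,      # (a2) enough feature points
        angle >= thb,           # (b2) rays intersect at a wide enough angle
        fitting_error <= thc,   # (c2) fitting error is small enough
    ]
    return all(checks) if require_all else any(checks)
```

The `require_all` flag covers the stricter embodiment in which all of (a2) to (c2) must be satisfied; the default corresponds to the "at least one" wording of this aspect.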

Further, in one embodiment of the information processing apparatus of the present invention, the evaluation value calculation unit calculates all of (a1) to (c1) as evaluation values, and the evaluation value determination unit determines that the selected images are suitable for generating three-dimensional data when all of the conditions (a2) to (c2) are satisfied.

Further, in one embodiment of the information processing apparatus of the present invention, for the intersection angle between rays among the evaluation values, the evaluation value calculation unit calculates the median of a plurality of intersection angles computed over a plurality of corresponding feature points and image frames, and the evaluation value determination unit determines that the selected images are suitable for generating three-dimensional data when the median is equal to or greater than the specified threshold (Thb).
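A median intersection angle of this kind could be computed as in the following sketch. It assumes that each ray runs from a camera center to the estimated 3D position of a feature point and that every feature point is visible from every camera; both are simplifications introduced here, and the function names are hypothetical.

```python
import numpy as np


def ray_angle_deg(point, center_a, center_b):
    """Angle in degrees between the rays from two camera centers to one 3D point."""
    ra = np.asarray(center_a, dtype=float) - np.asarray(point, dtype=float)
    rb = np.asarray(center_b, dtype=float) - np.asarray(point, dtype=float)
    cos = np.dot(ra, rb) / (np.linalg.norm(ra) * np.linalg.norm(rb))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def median_intersection_angle(points, centers):
    """Median of the ray angles over all feature points and all camera pairs."""
    angles = []
    for p in points:
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                angles.append(ray_angle_deg(p, centers[i], centers[j]))
    return float(np.median(angles))
```

Using the median rather than the mean keeps a few badly triangulated points from dominating the value that is compared with Thb.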

Further, in one embodiment of the information processing apparatus of the present invention, the information processing apparatus includes an environment recognition unit that analyzes the selected images to generate an environment recognition result containing the three-dimensional positions of feature points included in the images and the position and orientation information of the camera, and the evaluation value calculation unit calculates the evaluation values using information contained in the environment recognition result.
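As one illustration of how an evaluation value might be derived from such an environment recognition result, the sketch below computes a mean reprojection error. This section does not define the camera model fitting error, so treating it as a reprojection error under a pinhole camera model is an assumption, as are the function name and the parameters `K` (intrinsic matrix), `R` (rotation), and `t` (translation).

```python
import numpy as np


def mean_reprojection_error(points_3d, observed_px, K, R, t):
    """Mean pixel distance between projected 3D points and observed feature points.

    points_3d:   (N, 3) estimated feature point positions in world coordinates
    observed_px: (N, 2) feature point positions measured in the image
    K, R, t:     assumed pinhole intrinsics, rotation and translation (world -> camera)
    """
    X = np.asarray(points_3d, dtype=float)
    cam = X @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)
    proj = cam @ np.asarray(K, dtype=float).T
    # Perspective division to obtain pixel coordinates.
    px = proj[:, :2] / proj[:, 2:3]
    diff = px - np.asarray(observed_px, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))
```

A small value indicates that the estimated feature point positions and camera pose explain the observations in the image well.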

Further, in one embodiment of the information processing apparatus of the present invention, the information processing apparatus includes a reference image selection unit that selects a reference image for verifying the environment recognition result. The reference image selection unit selects, as the reference image, an image satisfying the condition that the distance, within a single image coordinate system, between corresponding feature points common to a reference image candidate and the selected images is equal to or greater than a specified threshold, and the evaluation value calculation unit calculates the evaluation values using the reference image selected by the reference image selection unit.
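The distance condition for a reference image candidate could be checked as in the sketch below. The text does not say how distances of several corresponding feature points are combined, so the use of the median is an assumption, as are the function name and the dictionary layout mapping a feature point ID to its pixel position.

```python
import math
import statistics


def is_reference_candidate_ok(selected_pts, candidate_pts, min_dist):
    """Check the feature point displacement between a selected image and a candidate.

    selected_pts, candidate_pts: dict mapping feature ID -> (x, y) pixel position,
                                 both expressed in one image coordinate system
    min_dist:                    specified distance threshold in pixels
    """
    common = sorted(set(selected_pts) & set(candidate_pts))
    if not common:
        # No corresponding feature points, so the candidate cannot be verified.
        return False
    distances = [
        math.hypot(candidate_pts[k][0] - selected_pts[k][0],
                   candidate_pts[k][1] - selected_pts[k][1])
        for k in common
    ]
    return statistics.median(distances) >= min_dist
```

Requiring a minimum displacement keeps the reference image from being nearly identical to a selected image, in which case it would add little independent evidence for verification.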

Further, in one embodiment of the information processing apparatus of the present invention, the information processing apparatus includes a feature point tracking unit that analyzes the selected images to obtain the trajectories of feature points included in the images, and the evaluation value calculation unit calculates the evaluation values using the feature point trajectories.

Further, in one embodiment of the information processing apparatus of the present invention, the input image selection unit sets and updates a variable [T] corresponding to the interval between selected frames and a variable [F] corresponding to the first selected frame in order to select different images, and, when the evaluation value determination unit determines that the selected images are not suitable for generating three-dimensional data, updates at least one of the variables [T] and [F] and performs a new image selection.
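The selection loop driven by [T] and [F] can be sketched as follows. The update order used here (try increasing T for a fixed F, then advance F) is only one possible choice and is an assumption, as are the function names, the `count` of frames per selection, and the `evaluate` callback standing in for the evaluation value determination.

```python
def select_frames(num_frames, first, interval, count):
    """Return frame indices F, F+T, F+2T, ... that lie inside the sequence."""
    indices = [first + k * interval for k in range(count)]
    return [i for i in indices if i < num_frames]


def search_suitable_selection(num_frames, count, evaluate, t_init=1, t_max=30):
    """Update T and F until evaluate(indices) accepts a selection.

    evaluate: callable taking a list of frame indices and returning True when
              the selected images are judged suitable (hypothetical callback)
    Returns (F, T, indices), or None when no selection is accepted.
    """
    for first in range(num_frames):                  # update F
        for interval in range(t_init, t_max + 1):    # update T
            indices = select_frames(num_frames, first, interval, count)
            if len(indices) < count:
                break  # a larger T would run past the end of the sequence
            if evaluate(indices):
                return first, interval, indices
    return None
```

For example, with `evaluate = lambda idx: idx[-1] - idx[0] >= 10` as a stand-in for a sufficient baseline, a 100-frame sequence and three frames per selection, the search settles on F = 0 and T = 5.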

Further, in one embodiment of the information processing apparatus of the present invention, the input image selection unit updates at least one of the variables [T] and [F] according to a predefined variable update algorithm, depending on how the evaluation value determination unit evaluated the selected images, and performs a new image selection.

Further, in one embodiment of the information processing apparatus of the present invention, the information processing apparatus is further configured to generate three-dimensional data by SFM (Structure from Motion) processing applied to the selected images that the evaluation value determination unit has determined to be suitable for generating three-dimensional data, followed by processing that applies an extended Kalman filter (EKF) using the information generated by the SFM processing as initial information.

Further, a second aspect of the present invention is
an image processing method executed in an information processing apparatus to select images used for calculating three-dimensional positions of pixels included in images, comprising:
an image selection step in which an image selection unit selects a plurality of images from input images as selected images;
an evaluation value calculation step in which an evaluation value calculation unit calculates evaluation values for determining whether the selected images are suitable for generating three-dimensional data; and
an evaluation value determination step in which an evaluation value determination unit determines, based on the evaluation values calculated in the evaluation value calculation step, whether the selected images are suitable for generating three-dimensional data,
wherein the evaluation value calculation step calculates, as an evaluation value, at least one of the following (a1) to (c1):
(a1) the number of feature points included in the selected images whose three-dimensional positions have been estimated,
(b1) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames,
(c1) the camera model fitting error of a reference image,
and wherein the evaluation value determination step compares at least one of the values (a1) to (c1) with a preset threshold to determine whether the selected images are suitable for generating three-dimensional data, and determines that the selected images are suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected images whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).

Further, in one embodiment of the image processing method of the present invention, the evaluation value calculation step calculates all of (a1) to (c1) as evaluation values, and the evaluation value determination step determines that the selected images are suitable for generating three-dimensional data when all of the conditions (a2) to (c2) are satisfied.

Further, in one embodiment of the image processing method of the present invention, for the intersection angle between rays among the evaluation values, the evaluation value calculation step calculates the median of a plurality of intersection angles computed over a plurality of corresponding feature points and image frames, and the evaluation value determination step determines that the selected images are suitable for generating three-dimensional data when the median is equal to or greater than the specified threshold (Thb).

Further, in one embodiment of the image processing method of the present invention, the image processing method further includes an environment recognition step in which an environment recognition unit analyzes the selected images to generate an environment recognition result containing the three-dimensional positions of feature points included in the images and the position and orientation information of the camera, and the evaluation value calculation step calculates the evaluation values using information contained in the environment recognition result.

Further, in one embodiment of the image processing method of the present invention, the image processing method further includes a reference image selection step in which a reference image selection unit selects a reference image for verifying the environment recognition result. The reference image selection step selects, as the reference image, an image satisfying the condition that the distance, within a single image coordinate system, between corresponding feature points common to a reference image candidate and the selected images is equal to or greater than a specified threshold, and the evaluation value calculation step calculates the evaluation values using the reference image selected in the reference image selection step.

Further, in one embodiment of the image processing method of the present invention, the image processing method further includes a feature point tracking step in which a feature point tracking unit analyzes the selected images to obtain the trajectories of feature points included in the images, and the evaluation value calculation step calculates the evaluation values using the feature point trajectories.

Further, in one embodiment of the image processing method of the present invention, the input image selection step sets and updates a variable [T] corresponding to the interval between selected frames and a variable [F] corresponding to the first selected frame in order to select different images, and, when the evaluation value determination step determines that the selected images are not suitable for generating three-dimensional data, updates at least one of the variables [T] and [F] and performs a new image selection.

Further, in one embodiment of the image processing method of the present invention, the input image selection step updates at least one of the variables [T] and [F] according to a predefined variable update algorithm, depending on how the selected images were evaluated in the evaluation value determination step, and performs a new image selection.

Further, in one embodiment of the image processing method of the present invention, the image processing method further includes a step in which a three-dimensional data generation unit generates three-dimensional data by SFM (Structure from Motion) processing applied to the selected images determined in the evaluation value determination step to be suitable for generating three-dimensional data, followed by processing that applies an extended Kalman filter (EKF) using the information generated by the SFM processing as initial information.

Further, a third aspect of the present invention is
a computer program that causes an information processing apparatus to execute selection of images used for calculating three-dimensional positions of pixels included in images, comprising:
an image selection step of causing an image selection unit to select a plurality of images from input images as selected images;
an evaluation value calculation step of causing an evaluation value calculation unit to calculate evaluation values for determining whether the selected images are suitable for generating three-dimensional data; and
an evaluation value determination step of causing an evaluation value determination unit to determine, based on the evaluation values calculated in the evaluation value calculation step, whether the selected images are suitable for generating three-dimensional data,
wherein the evaluation value calculation step causes at least one of the following (a1) to (c1) to be calculated as an evaluation value:
(a1) the number of feature points included in the selected images whose three-dimensional positions have been estimated,
(b1) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames,
(c1) the camera model fitting error of a reference image,
and wherein the evaluation value determination step compares at least one of the values (a1) to (c1) with a preset threshold to determine whether the selected images are suitable for generating three-dimensional data, and causes the selected images to be determined suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected images whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected images and the feature point positions in the respective image frames is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).

The computer program of the present invention is, for example, a computer program that can be provided, via a storage medium or a communication medium that supplies it in a computer-readable format, to a general-purpose computer system capable of executing various program codes. Providing such a program in a computer-readable format realizes processing corresponding to the program on the computer system.

Other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described later and the accompanying drawings. In this specification, a system is a logical set of a plurality of devices, and the devices of each configuration need not be in the same casing.

According to the configuration of one embodiment of the present invention, in order to determine whether a selected image chosen from the input images is suitable for generating three-dimensional data, at least one of the following is calculated as an evaluation value: (a) the number of feature points included in the selected image whose three-dimensional positions have been estimated, (b) the intersection angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame, and (c) the camera model fitting error of the reference image. When these evaluation values satisfy the specified conditions, the selected image is determined to be suitable for generating three-dimensional data, and three-dimensional data generation processing such as SFM or EKF SLAM is executed. This makes it possible to generate highly accurate three-dimensional data.
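The determination summarised above can be sketched as a small predicate. This is only an illustration, assuming the "at least one condition" reading given in the claims with thresholds Tha, Thb, and Thc; the function name, parameter names, and the threshold values in the usage note are hypothetical and do not come from the patent.

```python
def is_suitable(n_points=None, angle=None, fit_error=None, *, th_a, th_b, th_c):
    """Return True if at least one computed evaluation value meets its threshold.

    n_points:  (a) number of feature points with an estimated 3-D position
    angle:     (b) intersection angle between rays (same unit as th_b)
    fit_error: (c) camera model fitting error of the reference image
    A value left as None was not computed and is ignored.
    """
    checks = []
    if n_points is not None:
        checks.append(n_points >= th_a)   # (a2): count is at least Tha
    if angle is not None:
        checks.append(angle >= th_b)      # (b2): angle is at least Thb
    if fit_error is not None:
        checks.append(fit_error <= th_c)  # (c2): error is at most Thc
    return any(checks)
```

For instance, `is_suitable(n_points=30, th_a=20, th_b=0.1, th_c=2.0)` accepts a frame set on the feature count alone. A stricter variant could require all computed values to pass by replacing `any` with `all`.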

Details of an information processing apparatus, an image processing method, and a computer program according to embodiments of the present invention are described below with reference to the drawings.

An overview of the present invention is given with reference to FIG. 4. The information processing apparatus 120 of the present invention receives images captured by a camera 102 held by, for example, a moving user 101, such as the images constituting a moving image, analyzes the input images, and generates three-dimensional image information 103 composed of the various objects included in the captured images.

As in the processing described earlier with reference to FIG. 1, the information processing apparatus 120 analyzes the acquired images and generates a sparse three-dimensional map 131, which contains information such as the positions of feature points in the images, by applying processing such as SLAM (simultaneous localization and mapping) or SFM (Structure from Motion). It then uses the camera trajectory information, the feature point information in the images, and so on to generate a dense three-dimensional map 132, which is detailed three-dimensional information.

As described earlier, SLAM is processing that detects the positions of feature points in the images input from the camera together with the position and orientation of the camera. SFM is, for example, processing that analyzes the correspondence of feature points (landmarks) included in images captured from a plurality of different positions.

The sparse three-dimensional map 131 holds the three-dimensional position information of feature points. A conventional processing sequence for generating this information was described with reference to the flowchart of FIG. 2. In the information processing apparatus 120 of the present invention, however, the three-dimensional positions of the feature points are acquired by processing different from the flow shown in FIG. 2.

The sequence for acquiring the three-dimensional positions of feature points and the position and orientation information of the camera in one embodiment of the present invention is described with reference to the flowchart shown in FIG. 5.

The three-dimensional position information of the feature points and the position and orientation information of the camera are acquired in the following processing sequence.
Step S101: initial information acquisition processing (SFM)
Step S102: camera position/orientation and feature point three-dimensional position acquisition processing applying an extended Kalman filter (EKF SLAM)
Step S103: bundle adjustment processing (Bundle Adjustment)

The basic flow of this acquisition processing in one embodiment of the present invention is thus the following sequence:
"initial information acquisition processing (SFM)"
→ "camera position/orientation and feature point three-dimensional position acquisition processing applying an extended Kalman filter (EKF SLAM)"
→ "bundle adjustment processing (Bundle Adjustment)"

The initial information acquisition processing (SFM) of step S101 is performed, for example, by SFM (Structure from Motion) processing, which analyzes the correspondence of feature points (landmarks) included in images captured from a plurality of different positions. This processing is executed using, for example, only the first few frames input from the camera. The information obtained is used as the initialization information for the camera position/orientation and feature point three-dimensional position acquisition processing applying an extended Kalman filter (EKF SLAM), which is executed in step S102.

Once the initialization information for EKF SLAM has been obtained, SFM is not executed on subsequent input images. Instead, EKF SLAM is executed to acquire the feature point position information and the position and orientation of the camera that captured each image.

A specific processing example is described with reference to FIG. 6. Suppose that the position information of the feature points of the subject included in the image frames acquired by the camera shown in FIG. 6(a), and the trajectory of the camera, are to be obtained. FIG. 6(a) shows the image frames acquired by the camera at regular intervals.

The information processing apparatus of the present invention does not use all of the image frames to be analyzed for the three-dimensional position information of feature points. Instead, image frames selected from the first few frames (the input frames up to frame T1 shown in FIG. 6) by the image selection algorithm described later are used as the frames processed by the initial information acquisition processing (SFM) of step S101 shown in FIG. 5. The three-dimensional map and camera trajectory information obtained by this initial information acquisition processing (SFM) on the plurality of selected image frames are used as the initialization information (Initialize Data) of the "EKF SLAM" executed in the next step S102.

The three-dimensional map and camera trajectory information obtained from the plurality of preceding selected image frames up to frame T1 serve as the initial information. This initial information is used to initialize "EKF SLAM", and the "EKF SLAM" processing is then performed. That is, the feature point position information acquired from the preceding image frames is set as the state information for the initial image frame, and the three-dimensional position information of the feature points in the subsequent frames is acquired by processing that applies an extended Kalman filter (EKF) to the subsequent image frames.

As described above, the initial information acquisition processing (SFM) of step S101 is performed by SFM (Structure from Motion) processing, which analyzes the correspondence of feature points (landmarks) included in images captured from a plurality of different positions, and its result is used as the initialization information (Initialize Data) of the "EKF SLAM" executed in step S102. The accuracy of the SFM processing therefore strongly affects the accuracy of the three-dimensional data finally generated.

In SFM, the selection of the image frames to be analyzed is an important factor, for example:
a sufficient number of corresponding feature points must be acquired from the plurality of captured images,
the corresponding feature points must be included in a plurality of images with different shooting directions, and
accurate feature point position information with little error must be calculable from the corresponding feature point information.
An expert in three-dimensional image analysis can judge intuitively which images satisfy such conditions and select appropriate ones, but it is difficult for an ordinary user to select appropriate images from the many image frames he or she has captured.

The information processing apparatus of the present invention automatically performs appropriate image selection for this SFM and evaluates the SFM processing result, generating information suitable as the initialization information of "EKF SLAM". Executing EKF SLAM with this information makes it possible to generate highly accurate three-dimensional data. The image selection processing and the evaluation processing are described in detail with reference to FIG. 7 and subsequent figures.

FIG. 7 shows some of the components of the information processing apparatus of the present invention, namely the configuration that selects the image frames best suited to generating three-dimensional information, for example by SFM, and that evaluates images for this selection. As shown in FIG. 7, the information processing apparatus has an input image selection unit 301 that receives video data 313 captured by a camera 312, an environment recognition unit 302, a reference image selection unit 304, a feature point tracking unit 303, an evaluation value calculation unit 305, and an evaluation value determination unit 306.

The configuration of the present invention that executes the processing shown in FIG. 7 is subject to the following constraints.
・The camera 312 that captures the video data (image frames) 313 can be approximated by a pinhole camera model.
・The internal parameters of the camera 312 are known.
・The internal parameters of the camera do not change while the video is being captured.
・Video data 313 of the static environment 311 recorded by the camera 312 is prepared in advance (offline processing is performed on already-captured video data 313, rather than real-time processing during capture).
・The time interval between frames of the video data 313 is constant.
・The video data 313 is temporally continuous (it does not cut abruptly to a different scene).
In this example, the number of video frames that can be handled in one calculation is N, owing to memory and computation constraints (for example N = 5; N need only satisfy N > 1).

The static environment 311 means that the images captured by the camera contain no moving objects. However, even if moving objects are included, the three-dimensional positions of the other, stationary objects can still be calculated, so it is not essential that everything captured by the camera 312 be stationary. It suffices that the targets whose three-dimensional positions are calculated are stationary. The camera 312 moves while shooting and captures images from various angles.

The embodiment described below assumes offline processing, in which the camera's captured data is accumulated in a storage unit and the accumulated images are then processed. However, an apparatus with sufficient processing capability, such as one with a fast processor, could also use an online configuration in which image selection and three-dimensional data generation run in parallel with camera capture.

The image selection processing that enables the generation of highly accurate three-dimensional data is described in detail below with reference to FIG. 7 and subsequent figures. The configuration shown in FIG. 7 executes the image selection processing and image evaluation processing applied to the generation of three-dimensional data, for example by SFM. This processing sequence is described with reference to the flowchart shown in FIG. 8. First, in step S301, the input image selection unit 301 selects input frames from the video data 313. As one example of a selection method, this example uses the variables [T] and [F], defined as follows.
T: variable corresponding to the interval between selected frames
F: variable corresponding to the first selected frame

As shown in FIG. 9, N frames are selected at intervals of T frames, starting from the F-th frame counted from the first frame and proceeding forward in time. As described above, N may be determined by memory and computation constraints and need only satisfy N > 1; for example, N = 5.

As shown in FIG. 9, the selected frames are [F, F+T, F+2T, F+3T, ..., F+(N−1)×T]. The variable [T] corresponding to the selected frame interval is initially 1, and the variable [F] corresponding to the first selected frame is initially 0 (the number of the first frame; 1 if frame numbering starts at 1). With these settings, image evaluation is executed and an evaluation value is calculated to determine whether the frames are suitable for generating three-dimensional data. If the evaluation value does not meet the specified criterion, the variables are changed, a new set of image frames is selected, and the evaluation is repeated. The variables T and F may be set and updated by the user each time, or updated automatically in a manner determined by the evaluation result. This processing is described in detail later.
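The frame selection rule can be written out directly. The following is a minimal sketch; the function and parameter names are hypothetical, and only the index pattern [F, F+T, ..., F+(N−1)×T] and the constraint N > 1 are taken from the text.

```python
def select_frames(first, interval, count):
    """Return the frame indices [F, F+T, ..., F+(N-1)*T].

    first    -> F, index of the first selected frame
    interval -> T, spacing between selected frames
    count    -> N, number of frames to select (N > 1)
    """
    if count <= 1:
        raise ValueError("N must satisfy N > 1")
    if interval < 1:
        raise ValueError("T must be a positive frame interval")
    return [first + k * interval for k in range(count)]


# Initial settings from the text: F = 0, T = 1, with the example N = 5.
frames = select_frames(first=0, interval=1, count=5)
```

When the evaluation value falls short, calling the function again with a different `first` or `interval` yields the next candidate set of frames.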

In step S302 of the flow shown in FIG. 8, the environment recognition unit 302 takes as input the images selected by the input image selection unit 301, for example the frames [F, F+T, F+2T, F+3T, ..., F+(N−1)×T] shown in FIG. 9, and calculates the structure of the static environment 311 and the position and orientation of the camera 312 for each video frame. The processing executed by the environment recognition unit 302 can be carried out with conventionally known techniques. For example, the method described in Chapter 11 of reference [1] (Yi Ma, Stefano Soatto, Jana Kosecka, and S. Shankar Sastry, "An Invitation to 3-D Vision", Springer (2006)) can be applied.

The result generated by the environment recognition unit 302 in step S302 is temporarily held, as the environment recognition result 314, in a storage unit in the information processing apparatus. The environment recognition result 314 holds the following information. The feature descriptor is not restricted to any particular type as long as it allows feature points to be compared (tracked); a representation robust to scale and rotation is desirable.
・Camera position and orientation for the N frames
・Three-dimensional position of each feature point
・Feature descriptor of each feature point (used for feature point tracking)
・In-image position (trajectory) of each feature point over the N frames

Next, in step S303, the feature point tracking unit 303 executes feature point tracking. It takes as input the images selected by the input image selection unit 301 in step S301, for example the frames [F, F+T, F+2T, F+3T, ..., F+(N−1)×T] shown in FIG. 9, tracks the feature points common to them, and obtains the trajectory of each feature point. This processing tracks the corresponding feature points in the environment recognition result and can use a known technique, specifically the method described in Chapter 11 of reference [1] above.

Next, in step S304, the reference image selection unit 304 selects a reference image, separate from the images used for environment recognition, that is used to verify the environment recognition result. One image frame constituting the video data 313 is selected as the reference image. A simple selection method would be, for example, to select the image [N×T] frames before the final processed frame [frame F+(N−1)×T]. With this method, however, the selected image is not suitable as a reference image if the camera has not moved for a long time, because an image in which the direction from which the feature points are captured has not changed may be selected.

For this reason, in this processing example, the feature point tracking result obtained by the feature point tracking unit 303 in step S303 is used to verify how far the in-image positions of the feature points in the reference image are from the feature point trajectories used by the environment recognition unit 302, and a frame that is sufficiently far away is selected as the reference image.

The detailed sequence of the reference image selection processing executed in step S304 is described with reference to the flowchart shown in FIG. 10. First, in step S351, an arbitrary frame is selected from the video data 313 to be processed as a reference image candidate. Next, in step S352, the feature point tracking unit 303 tracks, in the frame selected as the reference image candidate in step S351, the corresponding feature points in the environment recognition result. As described above, a known technique is applied for feature point tracking; specifically, the method described in Chapter 11 of reference [1] above can be used.

Next, in step S353, the distance between the group of feature point positions tracked in step S352 and the group of feature point positions calculated in the environment recognition result 314 is computed, using the following formula (Formula 1).

In the above formula:
dist is the distance between the feature point position group tracked by the feature point tracking unit 303 and the feature point position group calculated by the environment recognition unit 302,
i is the feature point number,
t is the frame number,
median_i is a function that outputs the median over i,
min_t is a function that outputs the minimum over t,
m_it is the in-image position, in frame t, of the i-th feature point used by the environment recognition unit 302, and
m_ref_i is the position in the reference image corresponding to that i-th feature point.
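Formula 1 itself is not reproduced in this text. One reading consistent with the definitions above and with FIG. 11 is `dist = median_i( min_t ‖m_it − m_ref_i‖ )`: for each feature point, take the smallest distance between its reference-image position and its trajectory, then take the median over feature points. The nesting order is an assumption, as are the function name and the dictionary-based data layout in the sketch below.

```python
import math
from statistics import median


def trajectory_distance(tracks, ref_points):
    """Compute median over i of (min over t of ||m_it - m_ref_i||).

    tracks:     {feature id i: [(x, y) position in each selected frame t]}
    ref_points: {feature id i: (x, y) position in the reference image candidate}
    Only feature points present in both inputs are used.
    """
    per_feature = []
    for i, ref in ref_points.items():
        positions = tracks.get(i)
        if not positions:
            continue  # feature i has no trajectory in the selected frames
        # Distance from the reference position to the nearest trajectory point.
        per_feature.append(min(math.dist(m, ref) for m in positions))
    if not per_feature:
        raise ValueError("no feature points common to both inputs")
    return median(per_feature)
```

Using the median rather than the mean keeps a few badly tracked feature points from dominating the result.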

FIG. 11 shows an example of calculating the distance by applying the above formula (Formula 1) to one reference image candidate 321 and the image frames 322 selected by the input image selection unit 301. The reference image candidate 321 and the image frames 322 contain corresponding feature points 331, 341, and 342. The trajectory of the feature point across the image frames 322 is obtained as the feature point trajectory 351 shown in the lower part of FIG. 11, and this trajectory is used to calculate the inter-feature-point distance 353 to the position of the feature point 352 on the reference image.

Next, in step S354, it is determined whether the inter-feature-point distance calculated with Formula 1 in step S353, between the feature points of the selected frames and those of the reference image candidate, is equal to or greater than a preset threshold, and whether the number of feature points is equal to or greater than a predetermined threshold. If both are, the candidate is judged suitable as a reference image and the reference image selection processing ends. Otherwise, the processing returns to step S351, another image is selected as the reference image candidate, and the processing from step S352 onward is executed. In this way, the reference image is selected in step S304 of the flow shown in FIG. 8.
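The loop over steps S351 to S354 can be outlined as follows. This is a sketch only: the `evaluate` callback stands in for the tracking and distance steps (S352 and S353), and its name, the candidate ordering, and the threshold values are assumptions rather than details given in the text.

```python
def choose_reference(candidates, evaluate, dist_threshold, count_threshold):
    """Return the first candidate frame that passes both checks, else None.

    candidates: iterable of frame identifiers to try in turn (step S351)
    evaluate:   callable returning (dist, n_points) for a frame, i.e. the
                Formula 1 distance and the number of tracked feature points
                (steps S352 and S353)
    """
    for frame in candidates:
        dist, n_points = evaluate(frame)
        # Step S354: both the distance and the feature count must reach
        # their thresholds for the candidate to be accepted.
        if dist >= dist_threshold and n_points >= count_threshold:
            return frame
    return None  # no candidate qualified


# Hypothetical per-frame results: frame id -> (dist, n_points).
scores = {10: (1.0, 50), 20: (8.0, 5), 30: (9.0, 40)}
chosen = choose_reference([10, 20, 30], scores.get, dist_threshold=5.0, count_threshold=20)
```

Here frame 10 is rejected because it is too close to the trajectories, frame 20 because too few feature points were tracked, and frame 30 is accepted.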

Next, in step S305 of the flow shown in FIG. 8, the evaluation value calculation unit 305 evaluates the image frames selected in step S301, that is, it evaluates whether the selected frames are suitable for generating a three-dimensional image. Specifically, it calculates evaluation values for determining whether the frames are suitable for the SFM (Structure from Motion) processing executed as step S101 of the flow in FIG. 5, and whether the SFM result obtained from those frames is suitable as initial information for the camera position/orientation and feature point three-dimensional position acquisition processing applying an extended Kalman filter (EKF SLAM), executed as step S102 of FIG. 5.

In this example, the following three values are used as evaluation values:
(a) the number of feature points used by the environment recognition unit 302,
(b) the median of the intersection angles between rays (see FIG. 12), and
(c) the camera model fitting error of the reference image.
Each of the values (a) to (c) is described below.

(a) Number of feature points used by the environment recognition unit
The number of feature points used by the environment recognition unit 302 is the number of feature points whose three-dimensional positions were estimated by the environment recognition unit 302.
The more feature points there are, the more accurate the generated three-dimensional data can be made. Accordingly, the larger the number of feature points, the higher the evaluation value.

(b) Median of the intersection angles between rays
First, the "intersection angle between rays" is defined with reference to FIG. 12. Suppose there are the camera positions and orientations of two image frames 371 and 372, as shown in FIG. 12, and one feature point 381 visible from both frames. As shown in FIG. 12, the "intersection angle between rays" is the angle φ at which the rays connecting the two image frames 371 and 372, captured at the two camera positions, to the feature point 381 intersect.

The larger this intersection angle [φ], the larger the parallax between the frames. To calculate the three-dimensional position of a feature point with high accuracy, it is important that the feature point appears in frames with larger parallax.
The intersection angle [φ] is obtained with the following formula (Formula 2).

In the above formula:
w is the position of the feature point,
c_p is the camera position (focal position) of the p-th frame,
‖ ‖ denotes the norm,
⟨ , ⟩ denotes the inner product of two vectors, and
the function acos is the inverse cosine function.
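Formula 2 is not reproduced in this text. The definitions above are consistent with the usual angle between two vectors, `φ = acos( ⟨c_p − w, c_q − w⟩ / (‖c_p − w‖ ‖c_q − w‖) )`, where c_q is the camera position of the second frame. That form, the function name, and the tuple representation of positions are assumptions in the sketch below.

```python
import math


def ray_angle(w, c_p, c_q):
    """Angle in radians at feature point w between the rays to c_p and c_q.

    w, c_p, c_q: 3-D positions given as sequences of three numbers.
    """
    u = [a - b for a, b in zip(c_p, w)]  # ray from w towards camera p
    v = [a - b for a, b in zip(c_q, w)]  # ray from w towards camera q
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("camera position coincides with the feature point")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))
```

Two cameras viewing the point from perpendicular directions give π/2, while two cameras at the same position give an angle of zero, which corresponds to no parallax.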

Next, the "median of the intersection angles between rays" is obtained using the intersection angle [φ] calculated by the above formula (Formula 2). First, all [_N C_2] pairs of two frames are formed from the N frames. For example, suppose a pair consists of frame p and frame q, and the i-th feature point appears in both frames. For each pair, the intersection angle between rays is obtained for the i-th feature point, and its median is taken. This processing is performed for all feature points, and the median of the resulting "intersection angles between rays" is taken as the "median of the intersection angles between rays".

Expressed as a formula, the median [angle] of the intersection angles between rays is calculated by the following formula (Formula 3).

In the above formula, the intersection angle for a frame pair is excluded if the i-th feature point is absent from at least one frame of that pair.
As described above, to calculate the three-dimensional position of a feature point with high accuracy, it is important that the feature point is observed in frames having larger parallax. Accordingly, the larger the median [angle] of the intersection angles between rays, the higher the evaluation value.
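The pairwise procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary and set data layout, and the assumed form of Formula 2 (angle at the feature point between the rays to two camera positions) are all hypothetical.

```python
import itertools
import math
import statistics


def intersection_angle(w, c_p, c_q):
    """Angle in radians at feature point w between the rays to c_p and c_q."""
    u = [a - b for a, b in zip(c_p, w)]
    v = [a - b for a, b in zip(c_q, w)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def median_intersection_angle(features, cameras, visible):
    """Median angle over all frame pairs and all features seen in both frames.

    features: {i: w_i}          3D positions of the feature points (assumed layout)
    cameras:  [c_0, c_1, ...]   camera positions of the N selected frames
    visible:  {(i, p), ...}     feature i was tracked in frame p
    """
    angles = []
    for p, q in itertools.combinations(range(len(cameras)), 2):
        for i, w in features.items():
            # Skip the pair if the feature is missing from either frame.
            if (i, p) in visible and (i, q) in visible:
                angles.append(intersection_angle(w, cameras[p], cameras[q]))
    return statistics.median(angles) if angles else None
```

Returning None when no pair shares a feature is a choice made for this sketch; the text does not say how that case is handled.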

(c) Camera model fitting error of the reference image
Next, the camera model fitting error of the reference image will be described.
If both the three-dimensional positions of the feature points obtained by the processing of the environment recognition unit 302 and the in-image feature point positions obtained from the reference image are correct, then, according to the pinhole camera model, all the rays passing through the reference image frame 390 (the lines connecting the feature point positions on the reference image frame 390 with the feature points 391a to 391f) intersect at a single point, as shown in FIG. 13. This intersection point is the focal point 392 of the reference image frame 390.

The pinhole camera model will be described with reference to FIGS. 14 and 15. In the pinhole camera model, the position of a feature point in the image frame can be calculated by the following formula (Formula 4).

The meaning of the above formula will be described with reference to FIGS. 14 and 15. The formula expresses the correspondence between the pixel position 412 on the camera image plane of a point (m) of the object 411 contained in the image 410 captured by the camera, that is, a position expressed in the camera coordinate system, and the three-dimensional position (M) 401 of the object 400 in the world coordinate system.

The pixel position 412 on the camera image plane is expressed in the camera coordinate system. The camera coordinate system has the focal point of the camera as its origin C, the image plane as the two-dimensional plane Xc, Yc, and the depth as Zc. The origin C moves as the camera moves.

In contrast, the three-dimensional position (M) 401 of the object 400 is expressed in a world coordinate system consisting of three axes X, Y, and Z, whose origin O does not move when the camera moves. The formula expressing the correspondence between the positions of an object in these two different coordinate systems is defined as the pinhole camera model described above.

As shown in FIG. 15, the values in this formula have the following meanings:
λ: normalization parameter,
A: camera internal parameters,
Cw: camera position,
Rw: camera rotation matrix.
Furthermore, the image-plane position in the formula is the position on the image plane of the camera expressed in a homogeneous coordinate system, and λ is the normalization parameter, a value chosen so that the third component of that homogeneous position vector is satisfied.

The camera internal parameters A include the following values:
f: focal length
θ: orthogonality of the image axes (ideal value is 90°)
ku: scale of the vertical axis (conversion from the scale of the three-dimensional position to the scale of the two-dimensional image)
kv: scale of the horizontal axis (conversion from the scale of the three-dimensional position to the scale of the two-dimensional image)
(u0, v0): image center position

Thus, a feature point in the world coordinate system is represented by the position [M]. The camera is represented by the position [Cw] and the orientation (rotation matrix) Rw. The focal position of the camera, the image center, and so on are represented by the camera internal parameters [A]. Using these parameters, the relation that projects a feature point in the world coordinate system onto the image plane of the camera can be expressed by the above formula (Formula 4).
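The projection described above can be sketched in a few lines. The sketch assumes that Formula 4 has the common pinhole form λ·m = A·Rw·(M − Cw), with m in homogeneous coordinates; that form, the function name `project_point`, and the sample intrinsic matrix are assumptions made for illustration and are not taken from the published equation.

```python
import numpy as np


def project_point(A, R_w, C_w, M):
    """Project world point M to pixel coordinates (assumed pinhole form).

    A:   3x3 camera internal parameter matrix
    R_w: 3x3 camera rotation matrix
    C_w: camera position in world coordinates
    M:   feature point position in world coordinates
    """
    m_h = A @ R_w @ (np.asarray(M, dtype=float) - np.asarray(C_w, dtype=float))
    lam = m_h[2]            # normalization parameter: makes the third component 1
    return m_h[:2] / lam


# Hypothetical intrinsics: focal scale 500 px, image centre (320, 240).
A_example = np.array([[500.0, 0.0, 320.0],
                      [0.0, 500.0, 240.0],
                      [0.0, 0.0, 1.0]])
pixel = project_point(A_example, np.eye(3), [0.0, 0.0, 0.0], [0.2, -0.1, 2.0])
```

Here λ is simply the third component of the unnormalized result, which matches its description as the value that normalizes the homogeneous image position.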

When the position and orientation of the camera are instead expressed by the rotation matrix Rw and a translation vector, the relation takes the form of the following formula (Formula 5). The relationship between the translation vector and the camera position is given by Formula 6.
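The relationship between the two parameterizations can be written out as a short derivation. The forms below are assumptions based on the standard pinhole model and the parameter definitions in the text (the symbols for the homogeneous image position and the translation vector t_w are chosen here for illustration); they are not copied from the published Formulas 4 to 6.

```latex
\lambda \tilde{m} = A R_w (M - C_w)
                  = A \left( R_w M - R_w C_w \right)
                  = A \left( R_w M + t_w \right),
\qquad t_w = -R_w C_w
```

Under these assumptions, the camera position can be recovered from the translation vector as C_w = -R_w^T t_w, because the inverse of a rotation matrix is its transpose.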

A specific example of the calculation of the "camera model fitting error of the reference image" in one embodiment of the present invention will be described with reference to FIG. 16.
The "camera model fitting error of the reference image" is calculated by the following two processes:
(Process 1) obtaining the camera position and orientation of the reference image based on the pinhole camera model, and
(Process 2) reprojecting each feature point onto the image plane based on the obtained camera position and orientation, and calculating the error between each reprojected point and the corresponding tracking point (observation point).

First, (Process 1), obtaining the camera position and orientation of the reference image based on the pinhole camera model, will be described with reference to FIG. 16(a). FIG. 16(a) is the same as FIG. 13 described above. The position and orientation of the camera that captured the reference image frame 390 are obtained.
The position and orientation of the camera are obtained from the following formula (Formula 7).

Here,
w_i is the three-dimensional position of the i-th feature point, and
x_i is the value calculated by the following formula (Formula 8), where m_i is the position of the feature point in the reference image and the matrix A is the internal parameter matrix that appears in Formula 4 described above.

In addition,
I_3 is the 3 × 3 identity matrix,
0 is the 3 × 1 zero vector,
the n-dimensional zero vector is an n × 1 zero vector (n depends on the size of the matrix on the left-hand side),
r_1, r_2, r_3 are the column vectors of the rotation matrix R, shown in the following formula, that represents the orientation of the reference image frame,

and t is the translation vector of the reference image frame.
A, R, and t correspond to the values A, R, and t of the pinhole camera model described above with reference to FIGS. 14 and 15.

The above formula (Formula 7) for obtaining the position and orientation of the camera can be solved in various ways. As one example of a solution, Formula 7 is rewritten in the following form.

When the formula is expressed in this form, the solution is obtained by taking q to be the eigenvector corresponding to the smallest eigenvalue of the matrix B^T·B.
However, because r_1, r_2, r_3 are the column vectors of a rotation matrix, the scale of the components of q must be corrected after the calculation so that the norm of each column vector becomes 1. Through the above processing, the camera position and orientation of the reference image can be obtained by applying Formula 7.
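The eigenvector-based solution described above can be illustrated with a small direct linear transformation (DLT) sketch. The construction of the matrix B below is a standard textbook formulation and is only assumed to be comparable to Formula 7, whose exact layout is not reproduced in this text. The function name `estimate_pose_dlt` and the input layout are hypothetical.

```python
import numpy as np


def estimate_pose_dlt(W, X):
    """Estimate R and t from 3D points and normalized image coordinates.

    W: (n, 3) feature point positions w_i in world coordinates
    X: (n, 2) first two components of x_i = A^-1 m_i, scaled so the third is 1
    At least 6 points in general (non-coplanar) position are required.
    """
    W = np.asarray(W, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(W)
    Wh = np.hstack([W, np.ones((n, 1))])      # homogeneous world points
    B = np.zeros((2 * n, 12))
    # Each point gives two linear constraints on the 12 entries of [R | t].
    B[0::2, 0:4] = Wh
    B[0::2, 8:12] = -X[:, [0]] * Wh
    B[1::2, 4:8] = Wh
    B[1::2, 8:12] = -X[:, [1]] * Wh
    # q is the eigenvector of B^T B with the smallest eigenvalue
    # (numpy.linalg.eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(B.T @ B)
    P = vecs[:, 0].reshape(3, 4)
    # q is only defined up to scale and sign: choose the sign that gives a
    # proper rotation, then rescale so the rotation columns have unit norm.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    P /= np.mean(np.linalg.norm(P[:, :3], axis=0))
    return P[:, :3], P[:, 3]
```

With noisy observations the recovered 3 × 3 block is only approximately a rotation matrix, so a practical implementation would typically project it onto the nearest rotation afterwards; that step is omitted here.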

Next, (Process 2), reprojecting each feature point onto the image plane based on the obtained camera position and orientation and calculating the error between each reprojected point and the corresponding tracking point (observation point), will be described with reference to FIG. 16(b) and FIG. 17.

First, each feature point is reprojected onto the image plane 500 based on the camera position and orientation estimated in (Process 1) above. As shown in FIG. 16(b) and FIG. 17, a reprojection point 501 is set. In addition, the tracking point (observation point) 502 of the feature point, acquired by the processing of the feature point tracking unit 303, is set on the image plane 500. The distance between these two points is calculated as the reprojection error [ε_i]. Here i is the identification number of a feature point, and the reprojection error [ε_i] is calculated for each feature point.

The reprojection error [ε_i] is calculated in order to measure the error of the camera position and orientation calculated by applying Formula 7. The reprojection error [ε_i] is calculated by applying the following formulas (Formula 9) and (Formula 10).

Here,
m_i is the in-image position of the feature point obtained by tracking.

The other image position appearing in Formula 9 is the reprojection point, which is obtained using Formula 10.

Formula 10 is based on the pinhole camera model described above with reference to FIGS. 14 and 15, and the parameters in the formula correspond to the parameters described with reference to FIGS. 14 and 15.

The median of the reprojection errors [ε_i] obtained by Formula 9 for the feature points (i = 1, 2, ...) is defined as the "camera model fitting error of the reference image".
Denoting the camera model fitting error of the reference image by [reproj_error], it is expressed by the following formula (Formula 11).

The median is used for the camera model fitting error of the reference image in order to exclude outliers, that is, erroneous measurements caused by, for example, feature point tracking mistakes.
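The reprojection step and the median-based error can be sketched as follows. The sketch assumes that Formula 10 has the rotation-and-translation pinhole form (reprojected point proportional to A·(R·w_i + t)) and that Formula 9 is the Euclidean distance in the image; both forms and all function names are assumptions made for illustration.

```python
import numpy as np


def reprojection_errors(A, R, t, W, m_obs):
    """Distance between each tracked point m_i and its reprojected point.

    A: 3x3 internal parameters, R: 3x3 rotation, t: translation vector
    W: (n, 3) feature point positions w_i, m_obs: (n, 2) tracked positions m_i
    """
    W = np.asarray(W, dtype=float)
    m_obs = np.asarray(m_obs, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    proj = (A @ (R @ W.T + t)).T               # rows are unnormalized image points
    m_hat = proj[:, :2] / proj[:, [2]]         # divide by the normalization parameter
    return np.linalg.norm(m_obs - m_hat, axis=1)


def camera_model_fitting_error(A, R, t, W, m_obs):
    """Median of the per-feature reprojection errors."""
    return float(np.median(reprojection_errors(A, R, t, W, m_obs)))
```

Because the median is used, a single badly tracked point (for example, one that is 100 pixels off) does not dominate the result the way it would with a mean.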

The smaller the "camera model fitting error of the reference image" given by Formula 11, the higher the evaluation value.

As described above, in step S305 of the flowchart shown in FIG. 8, the evaluation value calculation unit 305 calculates the following three values as evaluation values:
(a) the number of feature points used by the environment recognition unit 302,
(b) the median of the intersection angles between rays (see FIG. 12), and
(c) the camera model fitting error of the reference image.

In other words, the three evaluation values (a) to (c) are calculated in order to determine whether the image frames selected in step S301 are suitable for generating three-dimensional data. Specifically, they are used to determine, for example, whether the image frames are suitable for the SFM (Structure from Motion) processing executed in step S101 of FIG. 5, and whether the result of the SFM processing using those image frames is suitable as initial information for the processing executed in step S102 of FIG. 5, namely the acquisition of the camera position and orientation and the three-dimensional positions of feature points using an extended Kalman filter (EKF SLAM).

The evaluation values (a) to (c) are interpreted as follows:
(a) the larger the number of feature points used by the environment recognition unit 302, the higher the evaluation value,
(b) the larger the median of the intersection angles between rays (see FIG. 12), the higher the evaluation value, and
(c) the smaller the camera model fitting error of the reference image, the higher the evaluation value.

Next, in step S306 of the flowchart shown in FIG. 8, the evaluation value determination unit 306 determines, based on the above three evaluation values, whether the image frames selected in step S301 are suitable as image frames for three-dimensional data generation processing such as SFM (Structure from Motion) processing or EKF SLAM processing.

The evaluation value determination in step S306 is performed, for example, as follows:
(a) the number of feature points used by the environment recognition unit 302 is judged suitable if it is equal to or greater than a preset threshold [Tha], and unsuitable if it is less than [Tha],
(b) the median of the intersection angles between rays (see FIG. 12) is judged suitable if it is equal to or greater than a preset threshold [Thb], and unsuitable if it is less than [Thb], and
(c) the camera model fitting error of the reference image is judged suitable if it is equal to or less than a preset threshold [Thc], and unsuitable if it is greater than [Thc].
The thresholds depend on the angle of view of the camera, the number of feature points in the environment, the required accuracy, and so on, and may be changed as appropriate, for example by user setting.

If all of (a) to (c) are judged suitable, the selected image frames are determined to have adequate evaluation values, and the image frames selected in step S301 are used for generating three-dimensional data. Specifically, the image frames are determined to be suitable for, for example, the SFM (Structure from Motion) processing executed in step S101 of FIG. 5 and the subsequent EKF SLAM processing, and the process ends.

If any of (a) to (c) is judged unsuitable, the image frames selected in step S301 are determined to be unsuitable for generating three-dimensional data, specifically for SFM (Structure from Motion) processing or EKF SLAM processing. The process then returns to step S301, a different set of image frames is selected, and the processing from step S302 onward is executed again.

In the above processing example, the determination in step S306 of the flowchart shown in FIG. 8 is Yes only when all of the evaluation values (a) to (c) are judged suitable against the specified thresholds, in which case the image frames selected in step S301 are used for the three-dimensional data generation processing. Alternatively, the determination in step S306 may be set to Yes when, for example, at least two, or at least one, of (a) to (c) are judged suitable, and the image frames selected in step S301 may then be used for generating three-dimensional data. In other words, the thresholds and the determination criteria based on the evaluation values can be changed according to the situation.
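The threshold determination of step S306, including the relaxed variant in which only some of the criteria need to pass, can be sketched as below. The function name and the `min_passed` parameter are hypothetical; the comparison directions follow the text (at least [Tha], at least [Thb], at most [Thc]).

```python
def judge_frame_set(num_features, median_angle, reproj_error,
                    th_a, th_b, th_c, min_passed=3):
    """Return True if the selected frames are judged suitable.

    min_passed=3 requires all of (a) to (c) to pass; 2 or 1 gives the
    relaxed criteria mentioned as alternatives.
    """
    checks = (
        num_features >= th_a,    # (a) enough feature points
        median_angle >= th_b,    # (b) large enough median intersection angle
        reproj_error <= th_c,    # (c) small enough camera model fitting error
    )
    return sum(checks) >= min_passed
```

For example, with hypothetical thresholds th_a=50, th_b=0.05, th_c=2.0, a frame set with 80 feature points, a median angle of 0.03, and an error of 1.2 fails the strict criterion but passes with min_passed=2.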

The information processing apparatus of the present invention has the configuration shown in FIG. 7 and generates three-dimensional data using the image frames selected according to the processing described with reference to the flowchart shown in FIG. 8.
Specifically, for example, the SFM (Structure from Motion) processing of step S101 in the flowchart of FIG. 5 is executed, and its result is used as the initialization information (Initialize Data) for the "EKF SLAM" processing of step S102.

The images selected according to the flow shown in FIG. 8 consist of image frames that satisfy the following conditions:
(a) the number of feature points is equal to or greater than the preset threshold [Tha],
(b) the median of the intersection angles between rays (see FIG. 12) is equal to or greater than the preset threshold [Thb], and
(c) the camera model fitting error of the reference image is equal to or less than the preset threshold [Thc].
SFM processing using these image frames makes it possible to acquire highly accurate three-dimensional position information of the feature points, and the subsequent "EKF SLAM" processing can likewise generate highly accurate three-dimensional data.

That is, the three-dimensional data generation unit of the information processing apparatus executes SFM (Structure from Motion) processing on the selected images that the evaluation value determination unit has judged suitable for generating three-dimensional data, and then generates three-dimensional data by processing that applies an extended Kalman filter (EKF), using the information generated by the SFM processing as initial information.

In the flow shown in FIG. 8, if the determination in step S306 is No, the process returns to step S301 and a new set of image frames is selected. As described above with reference to FIG. 9, this image selection takes N frames at an interval of T frames in the future direction, starting from the F-th frame counted from the first frame, that is, the frames [F, F+T, F+2T, F+3T, ..., F+(N-1)×T], with
an initial value of 1 for the variable [T] corresponding to the selected frame interval, and
an initial value of 0 for the variable [F] corresponding to the first selected frame.
These initial values are used for the first selection. If satisfactory evaluation values are not obtained and the process returns to step S301, the variables are changed and a new set of image frames is selected.
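The frame selection rule [F, F+T, F+2T, ..., F+(N-1)×T] can be written directly as a short function. The function name `select_frames` is hypothetical; the variables F, T, and N are those defined in the text.

```python
def select_frames(F, T, N):
    """Indices of N frames starting at frame F and spaced T frames apart."""
    return [F + k * T for k in range(N)]


# Initial setting from the text: F = 0, T = 1.
initial_selection = select_frames(0, 1, 4)
```

With the initial values F = 0 and T = 1, the selection is simply the first N consecutive frames; increasing T widens the baseline between the selected frames, and increasing F shifts the whole set forward.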

In this case, the variable [T] may simply be changed sequentially as 1, 2, ..., and the variable [F] as 0, 1, 2, .... However, it is also possible to change the variables [T] and [F] in the most appropriate way according to the state of the evaluation values. This variable change sequence will be described with reference to the flowchart shown in FIG. 18.

The flowchart shown in FIG. 18 consists of two parts:
(1) evaluation value determination processing, and
(2) image selection variable setting processing.
Part (1), the evaluation value determination processing, corresponds to the evaluation value determination of step S306 in the flowchart of FIG. 8, drawn as separate determinations for each of the following evaluation values:
(a) the number of feature points used by the environment recognition unit 302,
(b) the median of the intersection angles between rays (see FIG. 12), and
(c) the camera model fitting error of the reference image.
Part (2), the image selection variable setting processing, decides, according to the result of part (1), how to change
the variable [T] corresponding to the selected frame interval, and
the variable [F] corresponding to the first selected frame.
The image selection variable setting processing (2) shown in FIG. 18 is executed by the input image selection unit 301 of the configuration shown in FIG. 7.

Each step of the flow shown in FIG. 18 will now be described. Steps S501 to S503 correspond to the evaluation value determination of step S306 in the flowchart of FIG. 8 described above, and determine, respectively, whether
(a) the number of feature points is equal to or greater than the preset threshold [Tha],
(b) the median of the intersection angles between rays (see FIG. 12) is equal to or greater than the preset threshold [Thb], and
(c) the camera model fitting error of the reference image is equal to or less than the preset threshold [Thc].
If all of these determinations are Yes, the process proceeds to step S504, and the image frames selected at that point, that is, the set of image frames selected in step S301 of FIG. 8, are adopted as the frames for the SFM (Structure from Motion) processing of step S101 in the flowchart of FIG. 5.

If any of the determinations in steps S501 to S503 is No, the process proceeds to the image selection variable setting processing (2) of FIG. 18, and new values are set for the variables described above with reference to FIG. 9, namely
the variable [T] corresponding to the selected frame interval, and
the variable [F] corresponding to the first selected frame.

If it is determined in step S501 that the number of feature points is not equal to or greater than the preset threshold [Tha], the process proceeds to step S511, where it is determined whether the variable [T] corresponding to the selected frame interval is greater than 1. If the selected frame interval [T] = 1, the process follows route (1) as shown in FIG. 18, and in step S514 the variable [F] corresponding to the first selected frame is updated to [F+1].

The variable update processing for each of the routes (1) to (5) shown in FIG. 18 is described below.
Route (1)
In this case the number of feature points is small even though the frame interval [T] is already at its minimum value of [1].
The likely causes are:
"an environment without features is being captured", or
"the camera is moving so fast that there is no overlapping region between frames".
In this case, in step S514 the variable [F] corresponding to the first selected frame is updated to [F+1], the process proceeds to step S517, and image frames are selected starting from the frame [F+1] that follows the currently selected frame [F]. The frame interval [T] is not updated.

If it is determined in step S511 that the variable [T] corresponding to the selected frame interval is greater than 1, that is, if the selected frame interval [T] > 1, the process proceeds to step S512 as shown in FIG. 18, where the selected frame interval [T] is changed to [T-1]. Then, in step S513, the same determination as in step S502 is made, namely whether (b) the median of the intersection angles between rays (see FIG. 12) is equal to or greater than the preset threshold [Thb]. This determination is executed by the evaluation value determination unit 306.

If it is determined in step S513 that (b) the median of the intersection angles between rays (see FIG. 12) is not equal to or greater than the preset threshold [Thb], the process follows route (2) as shown in FIG. 18. In step S514 the variable [F] corresponding to the first selected frame is updated to [F+1], the process proceeds to step S517, and a new image selection is performed according to the updated variables [F] and [T].

Route (2) is explained as follows.
In route (2), the number of feature points is small and the intersection angle between rays is also small.
The likely cause is:
"an environment without features is being captured".
In this case, the frame interval [T] is decremented by one to [T-1], and image frames are selected starting from the frame [F+1] that follows the currently selected frame [F].

On the other hand, if it is determined in step S513 that (b) the median of the intersection angles between rays (see FIG. 12) is equal to or greater than the preset threshold [Thb], the process follows route (3) as shown in FIG. 18 and proceeds directly to step S517, where a new image selection is performed according to the updated variable [T].

Route (3) is explained as follows.
In route (3), the number of feature points is small but the intersection angle between rays is large.
The likely cause is:
"the camera is moving too fast relative to the sampling frame interval [T], so the overlapping region between frames is small".
In this case, the frame interval [T] is decremented by one to [T-1] and image frames are selected. The variable [F] corresponding to the first selected frame is not changed.

Route (4) shown in FIG. 18 is the case where the determination in step S501 is Yes and the determination in step S502 is No. In this case, the process proceeds to step S515, the frame interval [T] is incremented by one to [T+1], and the process proceeds to step S517, where a new image selection is performed. The variable [F] is not changed.

Route (4) is explained as follows.
In route (4), the number of feature points is sufficient but the intersection angle between rays is small.
The likely cause is:
"the camera is moving slowly relative to the sampling frame interval [T]".
In this case, the frame interval [T] is incremented by one to [T+1], and a new set of image frames is selected without changing the variable [F].

Route (5) shown in FIG. 18 is the case where the determination in step S501 is Yes, the determination in step S502 is Yes, and the determination in step S503 is No. In this case, the process proceeds to step S516, the variable [F] corresponding to the first selected frame is incremented by one to [F+1], and the process proceeds to step S517, where a new image selection is performed. The variable [T] is not changed.

The processing of route (5) will now be described.
In route (5), the number of feature points is sufficient and the crossing angle of the rays is sufficiently large, but the pinhole camera error is large.
This state may arise because the three-dimensional position estimation has fallen into a singular solution, or because tracking has failed.
In this case, the frame interval [T] is not changed, and image frames are selected starting from the frame [F+1] that follows the currently selected frame [F].

Route (6) shown in FIG. 18 is the case where all of the following conditions are satisfied:
(a) the number of feature points is equal to or greater than a preset threshold [Tha],
(b) the median of the crossing angles between rays (see FIG. 12) is equal to or greater than a preset threshold [Thb], and
(c) the camera model fitting error of the reference image is equal to or less than a preset threshold [Thc].
Since all the conditions are met, the image frame selected at this point may be determined as the processing frame.
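The three conditions (a) to (c) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names `crossing_angle` and `is_processing_frame` are hypothetical, the crossing angle is approximated as the angle at the estimated three-dimensional point between the directions to two camera centers, and the camera model fitting error is assumed to be supplied as a precomputed number.

```python
import math
from statistics import median


def crossing_angle(point, cam_a, cam_b):
    """Angle in radians between the two rays that join an estimated 3D
    feature point to two camera centers (assumed ray model)."""
    va = [c - p for c, p in zip(cam_a, point)]
    vb = [c - p for c, p in zip(cam_b, point)]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_angle)


def is_processing_frame(num_points, angles, fitting_error, th_a, th_b, th_c):
    """True only when conditions (a), (b) and (c) all hold (route (6))."""
    enough_points = num_points >= th_a      # (a) feature point count >= Tha
    wide_angle = median(angles) >= th_b     # (b) median crossing angle >= Thb
    small_error = fitting_error <= th_c     # (c) fitting error <= Thc
    return enough_points and wide_angle and small_error
```

The threshold values passed as `th_a`, `th_b` and `th_c` stand for [Tha], [Thb] and [Thc]; suitable values are not given in this passage and would have to be chosen for the application.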

According to the flow shown in FIG. 18, the variable [T] corresponding to the selected frame interval and the variable [F] corresponding to the first selected frame are updated and image selection is then executed, which makes it possible to efficiently select image frames having high evaluation values.
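The update rules for routes (3) to (6) can be sketched as follows. This is a hedged illustration rather than the flow of FIG. 18 itself: the function names `select_frames` and `update_variables` are hypothetical, frames are assumed to be indexed from 0 and selected as F, F+T, F+2T, and so on, the lower bound of 1 on [T] is an added assumption, and the routes not described in this passage are deliberately left unimplemented.

```python
def select_frames(num_frames, T, F):
    """Indices of the frames selected with first frame F and interval T
    (assumed scheme: F, F+T, F+2T, ...)."""
    if T < 1:
        raise ValueError("frame interval T must be at least 1")
    return list(range(F, num_frames, T))


def update_variables(T, F, enough_points, wide_angle, small_error):
    """Return (T, F, accepted) following routes (3) to (6).

    enough_points: feature point count >= Tha
    wide_angle:    median ray crossing angle >= Thb
    small_error:   camera model fitting error <= Thc
    """
    if not enough_points:
        if wide_angle:
            # Route (3): camera moves too fast for interval T, so shorten it.
            # Keeping T >= 1 is an assumption added for this sketch.
            return max(1, T - 1), F, False
        # The remaining cases belong to routes outside this passage.
        raise NotImplementedError("route not covered by this sketch")
    if not wide_angle:
        # Route (4): camera moves slowly relative to T, so lengthen it.
        return T + 1, F, False
    if not small_error:
        # Route (5): singular solution or tracking failure; shift the start.
        return T, F + 1, False
    # Route (6): all conditions met; keep the current selection.
    return T, F, True
```

A caller would repeat selection, evaluation and `update_variables` until `accepted` is true, for example starting from `select_frames(num_frames, T, F)` with initial values of its own choosing.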

As described above with reference to FIG. 7, the information processing apparatus according to the present invention has been described as preparing in advance video data 313 in which a static environment 311 is recorded by a camera 312 and, as offline processing, selecting the image frames to be applied to SFM or EKF SLAM and then executing SFM or EKF SLAM. However, if the information processing apparatus has sufficiently high processing capability and can execute the processing at high speed, it may instead be configured to perform online processing, in which image selection, SFM, and EKF SLAM are executed so as to keep pace with image capture.

For example, of the image frames captured by the camera, only the first several frames are stored in a storage unit as video data, the image selection process shown in FIG. 8 is executed using the image data stored in the storage unit, and SFM and EKF SLAM are then executed at high speed (for example, faster than 1/30 second) using the selected images having high evaluation values. With such a processing configuration, processing can keep pace with the capture rate of the camera.

The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiments without departing from the gist of the present invention. In other words, the present invention has been disclosed by way of example and should not be interpreted restrictively. To determine the gist of the present invention, the claims should be taken into consideration.

The series of processes described in the specification can be executed by hardware, by software, or by a combination of both. When the processes are executed by software, a program recording the processing sequence can be installed in memory in a computer built into dedicated hardware and executed, or the program can be installed and executed on a general-purpose computer capable of executing various processes. For example, the program can be recorded in advance on a recording medium. Besides being installed on a computer from a recording medium, the program can be received via a network such as a LAN (Local Area Network) or the Internet and installed on a recording medium such as a built-in hard disk.

Note that the various processes described in the specification need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the apparatus that executes them, or as necessary. Further, in this specification, a system is a logical collection of a plurality of devices, and the constituent devices are not limited to being in the same casing.

As described above, according to the configuration of one embodiment of the present invention, in order to determine whether a selected image selected from input images is suitable for generating three-dimensional data, at least one of the following (a) to (c) is calculated as an evaluation value: (a) the number of feature points included in the selected image whose three-dimensional positions have been estimated, (b) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame, and (c) the camera model fitting error of the reference image. When these evaluation values satisfy prescribed conditions, the selected image is determined to be suitable for generating three-dimensional data, and three-dimensional data generation processing such as SFM or EKF SLAM is executed. This makes it possible to generate highly accurate three-dimensional data.

FIG. 1 is a diagram showing a flowchart illustrating an example of a processing sequence for generating a three-dimensional map (3D map).
FIG. 2 is a diagram showing a flowchart illustrating an example of a processing sequence for acquiring the feature point information and camera trajectory information included in image frames.
FIG. 3 is a diagram illustrating an example of bundle adjustment processing (Bundle Adjustment).
FIG. 4 is a diagram illustrating an outline of the processing of the present invention.
FIG. 5 is a diagram showing a flowchart illustrating a processing sequence according to one embodiment of the present invention.
FIG. 6 is a diagram illustrating the sequence of three-dimensional information generation processing according to one embodiment of the present invention.
FIG. 7 is a diagram illustrating the configuration for image selection and the image selection processing of an information processing apparatus according to one embodiment of the present invention.
FIG. 8 is a diagram showing a flowchart illustrating the sequence of image selection processing according to one embodiment of the present invention.
FIG. 9 is a diagram illustrating the mode of image selection processing according to one embodiment of the present invention and the variables used in the image selection processing.
FIG. 10 is a diagram showing a flowchart illustrating the detailed sequence of the reference image selection processing executed in step S304 of the flow shown in FIG. 8.
FIG. 11 is a diagram illustrating the distance between a group of tracked feature point positions, obtained using one reference image candidate and the image frames selected by the input image selection unit, and the group of feature point positions calculated as the environment recognition result.
FIG. 12 is a diagram illustrating the crossing angle between rays.
FIG. 13 is a diagram illustrating that all rays passing through the reference image frame (lines connecting the feature point positions on the reference image frame and the feature points) intersect at one point.
FIG. 14 is a diagram illustrating the meaning of the pinhole camera model, that is, the formula expressing the correspondence between a position expressed in the camera coordinate system and the three-dimensional position of an object in the world coordinate system.
FIG. 15 is a diagram illustrating the meaning of the pinhole camera model, that is, the formula expressing the correspondence between a position expressed in the camera coordinate system and the three-dimensional position of an object in the world coordinate system.
FIG. 16 is a diagram illustrating a specific example of the calculation of the "camera model fitting error of the reference image" in one embodiment of the present invention.
FIG. 17 is a diagram illustrating a specific example of the calculation of the "camera model fitting error of the reference image" in one embodiment of the present invention.
FIG. 18 is a diagram showing a flowchart illustrating the variable update processing sequence in the image selection executed by an information processing apparatus according to one embodiment of the present invention.

Explanation of symbols

11-13 Captured images
21-23 Feature points
101 User
102 Camera
103 Three-dimensional image information
120 Information processing apparatus
131 Sparse three-dimensional map
132 Dense three-dimensional map
301 Input image selection unit
302 Environment recognition unit
303 Feature point tracking unit
304 Reference image selection unit
305 Evaluation value calculation unit
306 Evaluation value determination unit
311 Static environment
312 Camera
313 Video data
314 Environment recognition result
321 Reference image candidate
322 Image frame
331, 341, 342 Feature points
351 Feature point trajectory
352 Feature point
353 Distance
371, 372 Image frames
381 Feature point
390 Reference image frame
391 Feature point
392 Focal point
500 Image plane
501 Reprojection point
502 Three-dimensional position tracking point (observation point)

Claims

An information processing apparatus that calculates a three-dimensional position of a pixel included in an image, comprising:
an image selection unit that selects a plurality of images as selected images from input images;
an evaluation value calculation unit that calculates an evaluation value for determining whether or not the selected image is an image suitable for generating three-dimensional data;
an evaluation value determination unit that determines, based on the evaluation value calculated by the evaluation value calculation unit, whether or not the selected image is an image suitable for generating three-dimensional data;
an environment recognition unit that generates, by analyzing the selected image, an environment recognition result including the three-dimensional positions of feature points included in the image and the position and orientation information of the camera; and
a reference image selection unit that selects a reference image for verifying the environment recognition result, the reference image selection unit selecting, as the reference image, an image satisfying the condition that the distance, within a single image coordinate system, between corresponding feature points commonly included in a reference image candidate and the selected image is equal to or greater than a predetermined threshold,
wherein the evaluation value calculation unit,
by applying the information included in the environment recognition result and the reference image selected by the reference image selection unit,
calculates at least one of the following (a1) to (c1) as the evaluation value:
(a1) the number of feature points included in the selected image whose three-dimensional positions have been estimated,
(b1) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame, and
(c1) the camera model fitting error of the reference image,
and wherein the evaluation value determination unit
is configured to compare at least one of the values (a1) to (c1) with a preset threshold to determine whether or not the selected image is an image suitable for generating three-dimensional data,
and determines that the selected image is an image suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected image whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).

The information processing apparatus according to claim 1, wherein
the evaluation value calculation unit calculates all of (a1) to (c1) as evaluation values, and
the evaluation value determination unit determines that the selected image is an image suitable for generating three-dimensional data when all of the conditions (a2) to (c2) are satisfied.

The information processing apparatus according to claim 1, wherein
the evaluation value calculation unit is configured to calculate, as the evaluation value for the crossing angle between rays, the median of a plurality of crossing angles calculated by applying a plurality of corresponding feature points and image frames, and
the evaluation value determination unit determines that the selected image is an image suitable for generating three-dimensional data when the median is equal to or greater than the specified threshold (Thb).

The information processing apparatus according to claim 1, further comprising
a feature point tracking unit that obtains, by analyzing the selected image, the trajectories of feature points included in the image,
wherein the evaluation value calculation unit calculates the evaluation value by applying the feature point trajectories.

The information processing apparatus according to claim 1, wherein
the input image selection unit is configured to set and update a variable [T] corresponding to the selection frame interval and a variable [F] corresponding to the first selected frame so as to perform different image selection processes, and,
when the evaluation value determination unit determines that the selected image is not an image suitable for generating three-dimensional data, updates at least one of the variables [T] and [F] and performs a new image selection.

The information processing apparatus according to claim 5, wherein
the input image selection unit updates at least one of the variables [T] and [F] according to a predetermined variable update algorithm, in accordance with the evaluation mode of the selected image in the evaluation value determination unit, and performs a new image selection.

The information processing apparatus according to claim 1, further configured to execute three-dimensional data generation by an SFM (Structure from Motion) process using the selected image determined by the evaluation value determination unit to be an image suitable for generating three-dimensional data, and by a process applying an extended Kalman filter (EKF) that uses the information generated by the SFM process as initial information.

An image processing method, performed in an information processing apparatus, for executing an image selection process applied to calculation of a three-dimensional position of a pixel included in an image, the method comprising:
an image selection step in which an image selection unit selects a plurality of images as selected images from input images;
an evaluation value calculation step in which an evaluation value calculation unit calculates an evaluation value for determining whether or not the selected image is an image suitable for generating three-dimensional data;
an evaluation value determination step in which an evaluation value determination unit determines, based on the evaluation value calculated in the evaluation value calculation step, whether or not the selected image is an image suitable for generating three-dimensional data;
an environment recognition step in which an environment recognition unit generates, by analyzing the selected image, an environment recognition result including the three-dimensional positions of feature points included in the image and the position and orientation information of the camera; and
a reference image selection step in which a reference image selection unit selects a reference image for verifying the environment recognition result, selecting, as the reference image, an image satisfying the condition that the distance, within a single image coordinate system, between corresponding feature points commonly included in a reference image candidate and the selected image is equal to or greater than a specified threshold,
wherein the evaluation value calculation step is a step of,
by applying the information included in the environment recognition result and the reference image selected by the reference image selection unit,
calculating at least one of the following (a1) to (c1) as the evaluation value:
(a1) the number of feature points included in the selected image whose three-dimensional positions have been estimated,
(b1) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame, and
(c1) the camera model fitting error of the reference image,
and wherein the evaluation value determination step is a step of
comparing at least one of the values (a1) to (c1) with a preset threshold to determine whether or not the selected image is an image suitable for generating three-dimensional data,
and of determining that the selected image is an image suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected image whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).

The image processing method according to claim 8, wherein
the evaluation value calculation step is a step of calculating all of (a1) to (c1) as evaluation values, and
the evaluation value determination step is a step of determining that the selected image is an image suitable for generating three-dimensional data when all of the conditions (a2) to (c2) are satisfied.

The image processing method according to claim 8 or 9, wherein
the evaluation value calculation step is a step of calculating, as the evaluation value for the crossing angle between rays, the median of a plurality of crossing angles calculated by applying a plurality of corresponding feature points and image frames, and
the evaluation value determination step is a step of determining that the selected image is an image suitable for generating three-dimensional data when the median is equal to or greater than the specified threshold (Thb).

The image processing method according to claim 8, further comprising
a feature point tracking step in which a feature point tracking unit obtains, by analyzing the selected image, the trajectories of feature points included in the image,
wherein the evaluation value calculation step is a step of calculating the evaluation value by applying the feature point trajectories.

The image processing method according to claim 8, wherein
the input image selection step is a step of setting and updating a variable [T] corresponding to the selection frame interval and a variable [F] corresponding to the first selected frame so as to perform different image selection processes, and,
when it is determined in the evaluation value determination step that the selected image is not an image suitable for generating three-dimensional data, of updating at least one of the variables [T] and [F] and executing a new image selection.

The image processing method according to claim 12, wherein
the input image selection step is a step of updating at least one of the variables [T] and [F] according to a predetermined variable update algorithm, in accordance with the evaluation mode of the selected image in the evaluation value determination step, and performing a new image selection.

The image processing method according to claim 8, further comprising
a step in which a three-dimensional data generation unit executes three-dimensional data generation by an SFM (Structure from Motion) process using the selected image determined in the evaluation value determination step to be an image suitable for generating three-dimensional data, and by a process applying an extended Kalman filter (EKF) that uses the information generated by the SFM process as initial information.

A computer program that causes an information processing apparatus to execute an image selection process applied to calculation of a three-dimensional position of a pixel included in an image, the program comprising:
an image selection step of causing an image selection unit to select a plurality of images as selected images from input images;
an evaluation value calculation step of causing an evaluation value calculation unit to calculate an evaluation value for determining whether or not the selected image is an image suitable for generating three-dimensional data;
an evaluation value determination step of causing an evaluation value determination unit to determine, based on the evaluation value calculated in the evaluation value calculation step, whether or not the selected image is an image suitable for generating three-dimensional data;
an environment recognition step of causing an environment recognition unit to generate, by analyzing the selected image, an environment recognition result including the three-dimensional positions of feature points included in the image and the position and orientation information of the camera; and
a reference image selection step of causing a reference image selection unit to select a reference image for verifying the environment recognition result, selecting, as the reference image, an image satisfying the condition that the distance, within a single image coordinate system, between corresponding feature points commonly included in a reference image candidate and the selected image is equal to or greater than a predetermined threshold,
wherein the evaluation value calculation step is a step of,
by applying the information included in the environment recognition result and the reference image selected by the reference image selection unit,
calculating at least one of the following (a1) to (c1) as the evaluation value:
(a1) the number of feature points included in the selected image whose three-dimensional positions have been estimated,
(b1) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame, and
(c1) the camera model fitting error of the reference image,
and wherein the evaluation value determination step is a step of
comparing at least one of the values (a1) to (c1) with a preset threshold to determine whether or not the selected image is an image suitable for generating three-dimensional data,
and of determining that the selected image is an image suitable for generating three-dimensional data when at least one of the following conditions (a2) to (c2) is satisfied:
(a2) the number of feature points included in the selected image whose three-dimensional positions have been estimated is equal to or greater than a specified threshold (Tha),
(b2) the crossing angle between rays connecting the estimated three-dimensional position of a corresponding feature point included in the selected image and the feature point position in each image frame is equal to or greater than a specified threshold (Thb),
(c2) the camera model fitting error of the reference image is equal to or less than a specified threshold (Thc).