JP2006252275A

JP2006252275A - Restoration system of camera motion and object shape

Info

Publication number: JP2006252275A
Application number: JP2005069118A
Authority: JP
Inventors: Masahiro Tomono; 正裕友納
Original assignee: Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency
Priority date: 2005-03-11
Filing date: 2005-03-11
Publication date: 2006-09-21

Abstract

PROBLEM TO BE SOLVED: To restore camera motion and the three-dimensional shape of an object accurately and efficiently, using only an image sequence captured while moving a camera.

SOLUTION: In the initial estimation, a degeneracy reference value is calculated (S13) for the sequence of feature points (S12) that appear in common from image 1 to image k. When the degeneracy reference value exceeds a threshold (Yes in S14), k is taken as the appropriate number of images for the initial estimation, and the camera motion and the object shape are obtained. To improve accuracy further, robust estimation is applied to the feature point sequence (S12) to obtain subsets g_i that are solution candidates for the camera motion and object shape, together with their restoration reference values (S15). The degeneracy reference value is calculated for each subset g_i whose restoration reference value exceeds a threshold (S16). When the degeneracy reference value exceeds the threshold (Yes in S17), k is taken as the appropriate number of images for the initial estimation, and the subset g_i with the best score is adopted as the feature point sequence from which the camera motion and the object shape are obtained.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a system that simultaneously restores camera motion and object shape from a sequence of images captured while moving a camera.

Methods for restoring the three-dimensional shape of an object from a camera image sequence include methods based on epipolar geometry and the factorization method. Both match feature points between images and estimate, from the positions of those feature points in the images, the three-dimensional motion of the camera (translation and rotation) and the three-dimensional shape of the object. Here, the three-dimensional shape of the object is represented as a set of feature points restored in three-dimensional space. Feature points are extracted and matched, for example as proposed in Non-Patent Document 1, by extracting features that correspond to corner points and intersections of image edges and then searching the next image, within a predetermined range, for the region that best matches the neighborhood of each feature point in the current image.
In methods based on epipolar geometry, it is common to take the fundamental equation that holds, between two images, among the camera position and orientation, the positions of the image feature points, and the positions of the three-dimensional feature points, stack it for eight or more feature points, solve the resulting simultaneous equations for the fundamental matrix, and restore the camera motion and the object shape from that matrix (Non-Patent Document 2, pp. 262-265).
In the factorization method, it is common to assume a linear camera model such as weak perspective projection, decompose the measurement matrix of stacked image feature points into a camera motion matrix and an object shape matrix by singular value decomposition, and perform Euclidean reconstruction using the orthogonality constraint on the coordinate axes (Non-Patent Document 3).
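The epipolar approach described above can be sketched in code. This is a minimal illustration of the normalized eight-point method from the general literature, not the implementation of this specification; the function name `eight_point` and the use of NumPy are assumptions made here.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate a fundamental matrix F such that x2_h^T F x1_h = 0.

    x1, x2: (n, 2) arrays of matched image points, n >= 8.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.shape != x2.shape or x1.shape[0] < 8:
        raise ValueError("need at least 8 matched points")

    def normalize(pts):
        # Shift to zero mean and scale so the mean distance is sqrt(2).
        mean = pts.mean(axis=0)
        scale = np.sqrt(2.0) / np.mean(np.linalg.norm(pts - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        homog = np.column_stack([pts, np.ones(len(pts))])
        return (T @ homog.T).T, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # One row per correspondence: the epipolar constraint, linear in F.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    # Undo the normalization and fix the overall scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)
```

The singular values of the stacked matrix `A` are the kind of quantity from which a degeneracy measure can be derived: when the camera barely translates, the constraint rows carry little information about the epipolar geometry.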

However, the conventional methods described above, whether based on epipolar geometry or on factorization, share the problem that the restoration result becomes unstable when the translational component of the camera motion is small relative to the distance to the target object. For example, when the translational component is zero, degeneracy occurs and in principle no solution can be obtained. With real images, complete degeneracy does not occur because of effects such as errors in image feature point extraction, and the result is a restored shape in which the object appears crushed onto a plane. When the translational component is nonzero but very small, the same phenomenon occurs, though to a different degree. In this specification, this case is also called degeneracy.
FIG. 5 shows an example of degeneracy. FIG. 5(a) is one image from the image sequence used for restoration. The target object in FIG. 5(a) consists of magazines arranged to meet at a right angle, and the camera captured the image sequence while moving sideways, facing the magazines. The white dots in the image are the extracted feature points. FIGS. 5(b) and 5(c) show restoration results displayed from a viewpoint looking at the target object (the magazines) from the side. FIG. 5(b) shows the result when the translational component of the camera motion is small: arrow 510 indicates the direction of the camera's line of sight, and points 520 are the restored object shape. FIG. 5(c) shows the result when the translational component is large: arrow 530 indicates the direction of the camera's line of sight, and points 540 are the restored object shape. The restored points should lie on planes meeting at a right angle, as in FIG. 5(c), but in FIG. 5(b) the translational component is small and all feature points are restored on a single plane.
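Why a small translation destabilizes the result can be illustrated with a simplified model that is not part of this specification: two rectified views separated by a baseline b, a focal length f, and a point at depth Z observed with disparity d. All of these symbols are assumptions introduced only for this illustration.

```latex
Z = \frac{f\,b}{d}
\quad\Longrightarrow\quad
|\delta Z| \approx \left|\frac{\partial Z}{\partial d}\right| |\delta d|
= \frac{Z^{2}}{f\,b}\,|\delta d|
```

For a fixed feature localization error \delta d, the depth error grows as the baseline b shrinks relative to the depth Z, which is consistent with the flattened shape seen when the translational component is small.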

To cope with such degeneracy, the image sequence must be captured so that the translational component of the camera motion is sufficiently large, or a group of images with a large translational component must be selected from the image sequence. In a system with a human in the loop, a person can select the images by eye, but choosing an appropriate number of images takes effort. When shape restoration is performed by a system with a camera mounted on a mobile robot, the system needs a mechanism that selects appropriate images automatically.
One way to address this problem is to measure the translational component of the camera motion and the distance to the target object with other sensors. For example, Patent Document 1 measures the translational component of the camera with an acceleration sensor or the like, measures the distance to the object with a distance sensor, and uses these measurements to perform highly accurate shape restoration that is little affected by noise. However, because this method uses sensors other than the camera, such as an acceleration sensor and a distance sensor, the apparatus becomes large. In addition, the applicable distance range is limited by the specifications of the distance sensor.
On the other hand, as a method of detecting degeneracy from images alone, Non-Patent Document 4 proposes evaluating a selection criterion for whether a set of image feature points follows epipolar geometry or a planar homography, and judging that degeneracy has occurred when the set follows a planar homography. Non-Patent Document 5 proposes a method that uses this criterion to select an image interval giving a sufficient amount of translation.
However, this method requires computing the planar homography in addition to the epipolar geometry. In particular, when robust estimation is used to cope with errors in the image feature points, it must be run separately for the epipolar geometry and for the planar homography, which takes a long processing time.

Patent Document 1: Japanese Patent Laid-Open No. 11-063949, “3-D Shape Restoration Apparatus and Method”
Non-Patent Document 1: J. Shi and C. Tomasi: “Good Features to Track,” Proceedings of International Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
Non-Patent Document 2: R. Hartley and A. Zisserman: “Multiple View Geometry in Computer Vision,” Cambridge University Press, 2000.
Non-Patent Document 3: C. Tomasi and T. Kanade: “Shape and Motion from Image Streams under Orthography: A Factorization Approach,” International Journal of Computer Vision, Vol. 9, No. 2, pp. 137-154, 1992.
Non-Patent Document 4: P. H. S. Torr, A. W. Fitzgibbon, and A. Zisserman: “The Problem of Degeneracy in Structure and Motion Recovery from Uncalibrated Image Sequences,” International Journal of Computer Vision, Vol. 32, No. 1, pp. 27-44, 1999.
Non-Patent Document 5: D. Nister: “Reconstruction from Uncalibrated Sequences with a Hierarchy of Trifocal Tensors,” Proceedings of European Conference on Computer Vision, pp. 649-663, 2000.

An object of the present invention is to construct a restoration system that, when simultaneously restoring camera motion and the three-dimensional shape of an object from an image sequence captured while moving a camera (hereinafter, image input devices are collectively called cameras), efficiently achieves accurate restoration of the camera motion and the object shape using only the image sequence from the camera.

To achieve the above object, the present invention is a camera motion and object shape restoration system that restores camera motion and the three-dimensional shape of an object from an image sequence captured while moving a camera, comprising: feature point extraction and tracking means that extracts feature points from the input images, matches them between images, and stores the matched feature points in feature point storage means; initial estimation means that, given an initial value k, reads the feature points of the first through k-th input images from the feature point storage means, generates a fundamental equation matrix from those feature points, obtains the singular values of that matrix, calculates a degeneracy reference value from the singular values, repeats the calculation while increasing k as long as the degeneracy reference value does not exceed a predetermined threshold, and, when the degeneracy reference value exceeds the threshold, obtains the initial camera poses and the three-dimensional positions of the object's feature points from the feature points of the first through k-th images and stores them in camera motion storage means and three-dimensional shape storage means, respectively; and camera motion and object shape reading means that reads the camera poses stored in the camera motion storage means and the object shape stored in the three-dimensional shape storage means.
The camera motion and object shape restoration system may further comprise sequential estimation means that, after the processing of the initial estimation means, alternately repeats, for each image in order from the (k+1)-th image to the last image, camera pose estimation, which obtains the camera pose of the current image and stores it in the camera motion storage means, and object shape restoration, which obtains the three-dimensional positions of feature points and merges them into the three-dimensional shape storage means.
The initial estimation means may obtain, by robust estimation from the feature points of the first through k-th images, a subset of feature points to be used for restoring the camera motion and the object shape, generate the fundamental equation matrix from that subset, obtain the singular values of that matrix, calculate the degeneracy reference value from the singular values, repeat the calculation while increasing k as long as the degeneracy reference value does not exceed a predetermined threshold, and, when the degeneracy reference value exceeds the threshold, obtain the initial camera poses and the three-dimensional positions of the object's feature points from the subset and store them in the camera motion storage means and the three-dimensional shape storage means, respectively.
When a plurality of subsets of the feature points are obtained, the initial estimation means may obtain a restoration reference value for each subset, obtain the initial camera poses and the three-dimensional positions of the object's feature points from the subset with the best score calculated from the restoration reference value and the degeneracy reference value, and store them in the camera motion storage means and the three-dimensional shape storage means, respectively.
Before obtaining the subset of feature points, the initial estimation means may obtain a degeneracy reference value from the feature points of the first through k-th images, repeat the calculation while increasing k as long as the degeneracy reference value does not exceed a predetermined threshold, and obtain the subset of feature points once the degeneracy reference value exceeds the threshold.
The object shape restoration in the sequential estimation means may perform three-dimensional restoration between each feature point of the current image and the corresponding feature point of the image at each past time stored in the feature point storage means, obtain the covariance matrix of each result, and merge into the three-dimensional shape storage means, as the three-dimensional position of that feature point, the restoration result whose covariance matrix value is smallest and below a predetermined threshold.
A camera motion and object shape restoration program that causes a computer system to implement the functions of the camera motion and object shape restoration system described above, and a recording medium on which that program is recorded, are also part of the present invention.
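The stopping rule of the initial estimation means (start from an initial k and increase it until the reference value exceeds the threshold) can be sketched as follows. The function name `select_image_count` and the callable `degeneracy_score` are hypothetical. How the reference value is derived from the singular values is not given in this passage, so it is left as a caller-supplied function, and the scores in the example are made-up numbers.

```python
def select_image_count(num_images, k_init, threshold, degeneracy_score):
    """Return the smallest k >= k_init whose score exceeds threshold.

    degeneracy_score(k) is a caller-supplied (hypothetical) function that
    evaluates the feature points common to images 1..k. Returns None if no
    k up to num_images passes the test.
    """
    for k in range(k_init, num_images + 1):
        if degeneracy_score(k) > threshold:
            return k
    return None


# Toy usage: made-up scores that grow as more images widen the baseline.
toy_scores = {2: 0.01, 3: 0.04, 4: 0.12, 5: 0.30}
chosen_k = select_image_count(5, 2, 0.1, toy_scores.get)
```

Passing the score as a function keeps the loop independent of the particular criterion, so the same loop would serve both the plain variant and the variant that scores subsets found by robust estimation.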

According to the camera motion and object shape restoration system of the present invention, the number of images that secures a sufficiently large translational component of the camera motion is selected automatically using only the image sequence from the camera, without instructions from a human or sensors other than the camera as in the prior art, so that accurate restoration of the camera motion and the three-dimensional shape of the object is achieved efficiently.
This restoration system is very effective when, for example, camera motion and object shape are restored automatically from an image sequence obtained by a camera mounted on a mobile robot or on a moving platform such as an automobile or airplane, and the restored camera motion and three-dimensional object shape are read out and used. By treating the entire environment as the three-dimensional object, it is also useful for restoring a three-dimensional map of the environment and reading out and using that environment map and the camera motion.

An example of an embodiment of the camera motion and object shape restoration system of the present invention is described in detail below with reference to the drawings. Here, image input devices in general are referred to as “cameras”.
First, (1) an overview of the processing of the restoration system of this embodiment is given. Next, (2) the baseline length selection method, a new technique used in the restoration system of this embodiment that determines the number of images appropriate for restoration, is described in detail. Finally, (3) the results of experiments using the restoration system of this embodiment are shown.

(1. Overview of the processing of the camera motion and object shape restoration system)
First, an overview of the processing of the camera motion and object shape restoration system of this embodiment is given. As an example, the description assumes that the restoration system of the present invention is built on a computer such as a personal computer, that an image sequence captured by a camera is input to the computer, and that the camera motion and the three-dimensional shape of the target object are restored automatically from the input image sequence. For example, the restoration system can be built inside a camera-equipped mobile robot such as the one mentioned above, and the stored three-dimensional object shape and camera motion (robot motion) can be read out and used.
In this embodiment, the camera pose is estimated from an image sequence captured by a monocular camera with a standard lens, without using internal sensors such as odometry, and the three-dimensional object shape is restored at the same time. The three-dimensional object shape is the set of points obtained by restoring in three dimensions the feature points extracted from the images. The camera pose has six degrees of freedom in total, for position and orientation in three-dimensional space. When the present invention is used in a mobile robot carrying a camera, the camera pose and the robot pose are treated as identical. Camera motion means the time series of camera poses, although camera pose and camera motion are sometimes used without particular distinction.

FIG. 1 shows the processing flow of the camera motion and object shape restoration system of this embodiment. The basic flow of the system outlined here follows the system of Yamazaki, Tomono, et al. (see K. Yamazaki, M. Tomono, T. Tsubouchi, and S. Yuta: “3-D Object Modeling by a Camera Equipped on a Mobile Robot,” Proceedings of the IEEE International Conference on Robotics and Automation, 2004). In this embodiment, however, the baseline length selection method, which is the new technique of this embodiment, is used in the initial estimation process (S150 in FIG. 1) to determine an appropriate number of images m with which the camera poses and the three-dimensional object shape can be restored without degeneracy. The baseline length selection method is described in the next section (2. Baseline length selection method). In addition, in the object shape restoration step of the sequential estimation process, an appropriate number of images is determined so that the restoration error is as small as possible.
The internal parameters of the camera are assumed to be known. The time t used here is a virtual time, a discrete quantity corresponding to the image count.

(1) Feature point extraction and tracking process (S140)
As shown in FIG. 1, the restoration system of this embodiment first inputs the captured image sequence 110 to the computer or the like on which the restoration system is built, and extracts and tracks feature points, that is, matches feature points between images (S140).
Shape restoration is performed on the principle of stereo vision, based on features matched between images. In this embodiment, the conventional KLT tracker is used to extract and match the feature points, although other methods may be used. The KLT tracker extracts features corresponding to corner points and intersections of image edges and tracks each feature point by searching the next image, within a predetermined range, for the region that best matches the neighborhood of the feature point in the current image (see Non-Patent Document 1 for details).
The KLT tracker matches feature points quite well, but matching errors occur when similar features lie nearby. In addition, the position error of a matched feature point can become large where deformation due to perspective projection is strong or near occlusions. These problems are handled with the conventional RANSAC method, as described later.
The feature points obtained here and their correspondences are stored in a storage area such as an array, indexed, for example as shown in FIG. 4, by a number that uniquely identifies the feature point and a number indicating the image from which it was obtained. The stored feature points are used in the initial estimation process (S150) and the sequential estimation process (S160 to S170) described later.
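The storage scheme described above, indexed by a feature point number and an image number, might look like the following sketch. The class name `FeatureTrackStore` and its methods are hypothetical. The layout of FIG. 4 is not reproduced here, so a dictionary of dictionaries stands in for the array.

```python
from collections import defaultdict

class FeatureTrackStore:
    """Stores 2-D feature positions keyed by (feature id, image index)."""

    def __init__(self):
        # feature_id -> {image_index: (u, v)}
        self._tracks = defaultdict(dict)

    def add(self, feature_id, image_index, u, v):
        """Record where feature feature_id was observed in image image_index."""
        self._tracks[feature_id][image_index] = (u, v)

    def get(self, feature_id, image_index):
        """Return (u, v), or None if the feature was not seen in that image."""
        return self._tracks.get(feature_id, {}).get(image_index)

    def common_features(self, first, last):
        """Return sorted ids of features observed in every image first..last."""
        wanted = range(first, last + 1)
        return sorted(fid for fid, obs in self._tracks.items()
                      if all(k in obs for k in wanted))
```

A query such as `common_features(1, k)` returns the features that appear in every image from the first to the k-th, which is the input the initial estimation needs.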

(2) Initial estimation process (S150)
Next, initial estimation is performed from the feature points obtained in S140 (S150).
The feature points of the object shape restored in three dimensions by the restoration system of this embodiment are merged into storage area 130 to form the three-dimensional object shape. However, at the start of three-dimensional object shape restoration (t = 1 to m), no three-dimensional object shape exists yet, and the restoration system must estimate the camera poses and the three-dimensional object shape simultaneously from the input image sequence alone. For this purpose, the conventional factorization method is used to obtain, from the feature points found in S140, the initial camera poses r_1, ..., r_m and the initial shape of the object (a set of three-dimensional feature points of the object shape) at the same time (see Non-Patent Document 3 for details). Because the factorization method contains errors due to the approximation by weak perspective projection, nonlinear minimization is then used to obtain an accurate solution. Instead of the factorization method, a method based on epipolar geometry, which is also a conventional technique, may be used.
In the factorization method, the matrix W of stacked feature point coordinates is decomposed by singular value decomposition into the product W = MS of a matrix M representing the camera motion and a matrix S representing the object shape. This decomposition is not unique, but M is determined uniquely by using the metric constraints on the orthogonality of the coordinate axes of the camera pose.
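The decomposition W = MS can be sketched with NumPy as below. This shows only the rank-3 singular value decomposition step after centering. The metric (orthogonality) constraint that makes M unique is omitted, and the function name `affine_factorization` is an assumption made for this illustration.

```python
import numpy as np

def affine_factorization(W):
    """Factor a (2m, n) measurement matrix as W ~= M @ S + centroid.

    Rows of W hold the image coordinates of n tracked points over m images.
    M (2m x 3) and S (3 x n) are determined only up to an invertible 3x3
    matrix; resolving that ambiguity needs the metric constraints, which
    this sketch leaves out.
    """
    W = np.asarray(W, dtype=float)
    # Subtract the per-row centroid, which removes the image-plane translation.
    centroid = W.mean(axis=1, keepdims=True)
    Wc = W - centroid
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    # Keep the three dominant components and split the singular values evenly.
    root = np.sqrt(s[:3])
    M = U[:, :3] * root
    S = root[:, None] * Vt[:3]
    return M, S, centroid
```

With noise-free data from a linear camera model the centered matrix has rank 3, so `M @ S + centroid` reproduces W. With real tracks the discarded singular values indicate how well the linear model fits.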

In the factorization method, perspective projection is linearly approximated by parallel projection or weak perspective projection. This has the advantage that the problem can be solved very stably, but the restoration result contains the error of the linear approximation. Nonlinear minimization is therefore used to remove the linear approximation error and obtain the camera motion and the object shape under perspective projection. This is done based on equation (1).
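Equation (1) itself is not reproduced in this text. As a sketch only, a reprojection error consistent with the symbol definitions that follow could be written as below. The focal length f (taken from the known internal parameters), a principal point at the image origin, the component names X_{t,i}, Y_{t,i}, Z_{t,i}, and the convention that R_t and T_t give the camera's orientation and position in the world frame are all assumptions introduced here.

```latex
E = \sum_{t=1}^{m} \sum_{i=1}^{n}
\left[
\left( u_{t,i} - f\,\frac{X_{t,i}}{Z_{t,i}} \right)^{2}
+ \left( v_{t,i} - f\,\frac{Y_{t,i}}{Z_{t,i}} \right)^{2}
\right],
\qquad
P_{t,i} = (X_{t,i}, Y_{t,i}, Z_{t,i})^{T} = R_t^{T}\,(P_i - T_t)
```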

Here, (u_{t,i}, v_{t,i}) are the coordinates of the i-th feature point in the image at time t, R_t and T_t are respectively the rotation matrix and the translation of the camera coordinate system at time t, and P_i = (X_i, Y_i, Z_i)^T is the three-dimensional position of the i-th feature point. P_{t,i} is the three-dimensional position of the i-th feature point as seen from the camera coordinate system at time t. Further, m is the number of images and n is the number of feature points. The equation represents the error of reprojecting the feature points P_i in three-dimensional space onto the images according to the camera poses. This nonlinear minimization problem is solved by an iterative method, using the solution obtained by the factorization method as the initial value.
This process cannot determine the scale of the target object, so the real scale must be supplied in some form. Possible methods include having a human specify the scale explicitly, having the robot recognize a known object and set the scale automatically, and using another sensor such as a distance sensor.

To cope with mismatched feature points, robust estimation is performed using the conventional RANSAC method (for details, see M. Fischler and R. Bolles: “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography,” Communications of the ACM, 24:381-395, 1981). First, n_1 points are chosen at random from the feature points and the camera poses are obtained by the factorization and nonlinear minimization described above; the remaining n - n_1 feature points are then restored in three dimensions based on those camera poses. A mismatched feature point has a large error in its three-dimensional restoration. Each restored three-dimensional point is therefore reprojected onto the images, and points whose error from the corresponding image feature point is at or above a predetermined threshold are removed as mismatches. This is repeated, and the camera poses that maximize the number of correct feature point correspondences are adopted as the solution.
The initial camera poses obtained here are stored in storage area 120, and the initial shape of the object (three-dimensional feature points) is stored in storage area 130.

(3) Sequential estimation (S160 to S170)
After the initial estimation (S150), camera motion and object shape are restored by sequential estimation (S160 to S170), which alternately estimates the camera pose and the three-dimensional feature points.
(i) Camera pose estimation (S160)
The camera pose at time t is calculated so that the three-dimensional feature points of the object shape obtained up to time t-1 (stored in the storage area 130) match the feature points of the image at time t. First, feature points are tracked between the images at times t-1 and t. Some of the feature points obtained here have existed since before time t-1, while others are newly extracted to replace feature points lost to occlusion or the like. Only feature points from time t-1 or earlier are already registered in the storage area 130, so only these are used to estimate the camera pose. Letting P_i = (X_i, Y_i, Z_i)^T be the three-dimensional feature points registered in the storage area 130 up to time t-1, the camera pose R_t, T_t that minimizes the following expression is obtained.
This expression represents the error of back-projecting the three-dimensional points of the object shape onto the image at time t. Since this is also a nonlinear minimization problem, it is solved by an iterative method, using the camera pose at time t-1 (stored in the storage area 120) as the initial value. Here too, RANSAC is used to cope with mismatched feature points.
The camera pose obtained here is stored in the storage area 120.
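The expression minimized in S160 is not reproduced in this text. As a minimal sketch of a back-projection error of this kind, the following assumes that a point is moved into the camera frame by P_cam = R_t P_i + T_t and that image coordinates are normalized (u = X/Z, v = Y/Z). Both conventions and the function names are assumptions for illustration, not the patent's definitions.

```python
import numpy as np

def reprojection_residuals(R, T, points_3d, observations):
    """Differences between observed and back-projected feature points.

    Assumes P_cam = R @ P + T and normalized image coordinates
    (u, v) = (X/Z, Y/Z); both are illustrative assumptions.
    points_3d: (n, 3) array, observations: (n, 2) array.
    """
    P_cam = points_3d @ R.T + T               # points in the camera frame
    projected = P_cam[:, :2] / P_cam[:, 2:3]  # perspective division
    return (observations - projected).ravel()

def reprojection_cost(R, T, points_3d, observations):
    """Sum of squared back-projection errors for one camera pose."""
    r = reprojection_residuals(R, T, points_3d, observations)
    return float(r @ r)
```

A cost of this shape could be handed to any iterative nonlinear least-squares routine, started from the pose at time t-1 as the text describes.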

(ii) Restoration of object shape (S170)
For each feature point, its three-dimensional position is obtained by triangulation from a camera pose at or before time t-1 and the camera pose at time t.
The position of a three-dimensional point is obtained from the images and camera poses (stored in the storage area 120) at times t_1 and t_2. Letting R, T be the pose of the camera at time t_2 as seen from the camera at time t_1, the following relationship holds between the feature points q_{t1,i} and q_{t2,i} at the two times, where s_1 and s_2 are parameters representing depth.
Because of errors in feature point extraction and camera pose estimation, this equation cannot be solved exactly in most cases, so the solution that minimizes the following squared error is obtained.
The restored three-dimensional feature points are fused with the three-dimensional feature points already stored in the storage area 130. In this way, the three-dimensional object shape is restored. The method of fusing restored feature points into the three-dimensional object shape is described in detail later.
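The relationship and the squared error referred to in S170 are not reproduced in this text. A minimal sketch of two-view triangulation by least squares, assuming the relationship has the form s_1 q_{t1,i} = s_2 R q_{t2,i} + T (one common reading of "the pose of camera t_2 seen from camera t_1"), might look like this. The function name and the exact form of the relation are assumptions.

```python
import numpy as np

def triangulate(q1, q2, R, T):
    """Least-squares depths for the assumed relation s1*q1 = s2*(R @ q2) + T.

    q1, q2: homogeneous image points (u, v, 1) at times t_1 and t_2.
    R, T:   pose of the camera at t_2 as seen from the camera at t_1.
    Returns (P, s1, s2), where P = s1 * q1 is the point in the t_1 frame.
    """
    A = np.column_stack([q1, -(R @ q2)])            # 3x2 system in (s1, s2)
    (s1, s2), *_ = np.linalg.lstsq(A, T, rcond=None)
    return s1 * q1, s1, s2
```

When the baseline T is short relative to the depth, the two columns of the 3x2 system become nearly parallel and the depths are poorly determined, which is the accuracy problem the later section on minute translation addresses.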

(2. Baseline length selection method)
As described in the background art, conventional restoration of camera motion and object shape has the problem that, when the translational component of the camera motion is very small relative to the distance to the target object, degeneration occurs and the restoration result becomes unstable.
This embodiment solves the problem by proposing a method that detects degeneration related to the translational component from the input image sequence alone and, further, finds an appropriate number of images that secures a sufficient translational component for the initial estimation process.
The baseline length selection method proposed for the restoration system of this embodiment is described in detail below.
The baseline length selection method is a process that determines the number of images (the baseline length) used in the initial estimation process (S150 in FIG. 1). It can be realized, for example, by incorporating each step of the method into the initial estimation process (S150).

To address this problem, this embodiment examines the singular values of the fundamental equation matrix composed from a sequence of image feature points, and thereby checks whether camera motion and object shape can be restored from that feature point sequence without degeneration.
Degeneration is checked using two images: the first image I_1 of the image sequence and the k-th image I_k. Letting q_{k,i} be the feature point on image I_k that corresponds to the feature point q_{1,i} on image I_1, and F_{1,k} be the fundamental matrix between the two images, the following fundamental equation holds (see Non-Patent Document 2).

Expanding the above equation and collecting terms for the image feature points expressed in homogeneous coordinates, q_{1,i} = (u_{1i}, v_{1i}, 1)^T and q_{k,i} = (u_{ki}, v_{ki}, 1)^T (1 ≤ i ≤ n), gives the following equation.
Here, f is the vector formed by arranging the elements of the fundamental matrix F_{1,k} in a single line. In this specification, the matrix A is called the fundamental equation matrix.
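The two equations referred to in this passage are not reproduced in this text. Under the standard epipolar constraint, and assuming that the point from I_1 appears on the left and that f lists the elements of F_{1,k} row by row (the surviving text fixes neither choice), they would take roughly the following form:

```latex
q_{1,i}^{\top} F_{1,k}\, q_{k,i} = 0 \qquad (1 \le i \le n)

A f = 0, \qquad
A = \begin{pmatrix}
u_{11}u_{k1} & u_{11}v_{k1} & u_{11} & v_{11}u_{k1} & v_{11}v_{k1} & v_{11} & u_{k1} & v_{k1} & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_{1n}u_{kn} & u_{1n}v_{kn} & u_{1n} & v_{1n}u_{kn} & v_{1n}v_{kn} & v_{1n} & u_{kn} & v_{kn} & 1
\end{pmatrix},
\qquad
f = (F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33})^{\top}
```

Each correspondence contributes one row of A, so A has n rows and 9 columns. Swapping the roles of the two images only permutes the columns and does not change the singular values of A.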

According to pages 279 to 280 of Non-Patent Document 2, when degeneration occurs due to minute translation, the rank of the matrix A becomes 6. Degeneration can therefore be detected by examining the rank of A. In practice, however, degeneration in the broad sense described above occurs even when the translational component is not exactly zero. Moreover, because of noise, the rank of A never becomes exactly 6.
In the present invention, therefore, the matrix A is subjected to singular value decomposition, the singular values are arranged in descending order, and degeneration is judged by whether the seventh and eighth singular values are close to 0. For example, letting these singular values be s_7 and s_8, their geometric mean D = √(s_7 s_8) is used as the measure of degeneration. This is called the degeneration criterion. The magnitude of D indicates not only whether degeneration is present but also its degree. Whether accurate restoration is possible without degeneration can thus be judged by whether D exceeds an appropriate threshold. That is, the number of images k is increased until the degeneration reference value exceeds the threshold, and that k is adopted as the appropriate number of images for securing a sufficient translational component (baseline length).
If image feature points are mistracked, large noise enters the fundamental equation matrix, so the degeneration reference value can become large even though degeneration is actually occurring. The degeneration reference value is therefore calculated after the mistracked image feature points have been removed by robust estimation.
As described above, the present invention is characterized in that the appropriate number of images is found by taking the degree of degeneration into account through the degeneration reference value, and in that a reliable degeneration reference value is calculated after noise has been removed.
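The degeneration criterion D = √(s_7 s_8) can be sketched with NumPy as follows. The row layout of A assumes the constraint is written with the point from I_1 on the left, and the synthetic scene, rotation angle, and translation are made-up values for illustration; none of the function names come from the patent. At least 8 correspondences are needed for the eighth singular value to exist.

```python
import numpy as np

def fundamental_equation_matrix(q1, qk):
    """Stack one row per correspondence so that A @ f = 0.

    q1, qk: (n, 2) arrays of matched image coordinates in I_1 and I_k.
    Row layout assumes q1^T F qk = 0 with F flattened row by row.
    """
    h1 = np.column_stack([q1, np.ones(len(q1))])
    hk = np.column_stack([qk, np.ones(len(qk))])
    return np.einsum("ia,ib->iab", h1, hk).reshape(len(q1), 9)

def degeneration_criterion(A):
    """Geometric mean of the 7th and 8th largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)   # returned in descending order
    return float(np.sqrt(s[6] * s[7]))

def project(points):
    """Perspective projection to normalized image coordinates."""
    return points[:, :2] / points[:, 2:3]

# Synthetic check: one scene viewed after a pure rotation and after
# the same rotation plus a sideways translation.
rng = np.random.default_rng(0)
P = np.column_stack([rng.uniform(-2, 2, 30), rng.uniform(-2, 2, 30),
                     rng.uniform(4, 8, 30)])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
D_rotation = degeneration_criterion(
    fundamental_equation_matrix(project(P), project(P @ R.T)))
D_translation = degeneration_criterion(
    fundamental_equation_matrix(project(P), project(P @ R.T + [1.0, 0.0, 0.0])))
```

With noise-free data, D_rotation comes out at rounding-error level because A has rank 6 or less under pure rotation, while D_translation is clearly larger. With real, noisy tracks the gap is less sharp, which is why the text compares D with an empirical threshold.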

FIG. 2 is a flowchart of the baseline length selection method described above as processed in the restoration system of this embodiment. As noted above, the method can be realized by incorporating its steps into the initial estimation process (S150) shown in FIG. 1.
First, in step S11, an initial value k_0 is given for the number of images k. Next, in step S12, the feature point series F that appears in common in images 1 through k is extracted. The feature points have been obtained and stored by the feature point extraction and tracking process described above (S140 in FIG. 1).
Next, in step S13, the degeneration reference value D is calculated for the entire feature point series F. In step S14, it is determined whether the degeneration reference value of F exceeds a predetermined threshold. If it does, the process proceeds to step S15; if not, k is increased in step S19 and the process returns to step S12. When degeneration occurs in the entire feature point series F, it is expected to occur in any subset of F as well. Proceeding to step S15 and beyond would be wasted effort in that case, so the degeneration check on the whole of F in steps S13 to S14 is performed for efficiency.

Alternatively, the processing from step S15 onward may be omitted, and the k at which the degeneration reference value of F exceeds the threshold in step S14 may be adopted as the appropriate number of images. (In this case, the process ends when step S14 yields Yes.)
It is also possible to omit steps S13 and S14 and proceed directly from step S12 to step S15, that is, to skip the degeneration check on the entire feature point series F before robust estimation.

In step S15, camera motion and object shape are obtained using robust estimation such as RANSAC. This robust estimation removes mistracked feature points, both to achieve accurate restoration and to make the degeneration reference value in step S16 more reliable. The mistracked feature points are removed from F to obtain a subset of the feature point series. Because more than one subset giving an accurate restoration may be obtained, the family of subsets {g_i} is registered as solution candidates. To select the one subset that serves as the appropriate solution, a restoration reference value representing the quality of restoration is calculated for each subset g_i. When only one subset is obtained, that subset at the optimum number of images k can be adopted as the solution as it is.
Next, in step S16, the degeneration reference value is calculated for each g_i. In step S17, it is determined whether any g_i has a degeneration reference value exceeding the threshold. If there is such a g_i, the process proceeds to step S18; if there is none, k is increased in step S19 and the process returns to step S12. In step S18, k is adopted as the appropriate number of images for securing a sufficient translational component (baseline length), and the g_i with the best score calculated from the degeneration reference value and the restoration reference value is selected as the feature point series.
Each step of the processing flow in FIG. 2 is described in detail below.

Step S11 specifies the number of images k to be taken from the image sequence, and the first and k-th images are taken from the sequence. Any suitable value may be given as the initial value k_0 of k. However, k_0 must be 2 or more for methods based on epipolar geometry, and 3 or more when restoring by the factorization method.
In step S12, the set of feature points that appear in common in images 1 through k is obtained. The feature points of each image can be obtained by extracting corner points of image edges and points of complex texture and tracking them between images, using the method described in the prior art and in the feature point extraction and tracking process (S140) above. As shown in FIG. 4, the obtained feature points are stored in a storage area such as an array, indexed by a number that uniquely identifies the feature point and a number that indicates which image it was obtained from.

In step S13, the degeneration reference value is calculated for the feature point series F by the procedure shown in FIG. 3. In FIG. 3, first, in step S21, the fundamental equation matrix A is generated from the feature point series according to equation (7). To reduce the influence of noise and to standardize the range of the singular values, the coordinates of the feature points are normalized beforehand, following a method commonly used in restoration based on epipolar geometry. That is, the centroid of all the image feature points q_{1,i}, q_{k,i} is calculated and, with that centroid as the origin, each q_{1,i}, q_{k,i} is normalized so that its u and v coordinate values fall within the range -√2 to √2 (for details, see R. Hartley: "In defense of the eight-point algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 6, pp. 580-593, 1997).
Next, in step S22, the matrix A is subjected to singular value decomposition, and the singular values are sorted in descending order. In step S23, the degeneration criterion is calculated from the seventh and eighth singular values. The geometric mean may be used as described above, or another measure such as the arithmetic mean or the mean square may be used.
Step S14 judges the degeneration reference value against a threshold, which is given empirically.
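The coordinate normalization in step S21 can be sketched as below. This follows the description given here literally (joint centroid as origin, coordinates scaled into the range -√2 to √2); the variant in Hartley's paper instead scales each image so that the mean distance from the origin is √2. The function name and the choice of a single isotropic scale factor are assumptions.

```python
import numpy as np

def normalize_points(q1, qk):
    """Center on the joint centroid and scale into [-sqrt(2), sqrt(2)].

    q1, qk: (n, 2) arrays of image coordinates from I_1 and I_k.
    A single isotropic scale is used for both images (an assumption).
    The points must not all coincide, or the scale is undefined.
    """
    q1 = np.asarray(q1, dtype=float)
    qk = np.asarray(qk, dtype=float)
    pts = np.vstack([q1, qk])
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.abs(pts - centroid).max()
    return (q1 - centroid) * scale, (qk - centroid) * scale
```

Normalizing before building A keeps the entries of A at comparable magnitudes, so a single empirical threshold on D stays meaningful across image sizes.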

In step S15, robust estimation is performed, for example with RANSAC, to cope with mismatched feature points. First, n_1 feature points are selected at random from the feature point series F, and the camera pose that restores these n_1 points is obtained using the epipolar-geometry-based method or the factorization method described above. The remaining n - n_1 feature points are then restored in three dimensions based on that camera pose. A mismatched feature point yields a three-dimensional restoration with a large error. Therefore, any feature point for which the error between the back-projection of the restored three-dimensional point onto the original image and the corresponding image feature point is equal to or greater than a predetermined threshold is deleted as mistracked. The subset of the feature point series obtained by removing the mistracked points from F is denoted g_i. The number of feature points contained in g_i is used as the restoration reference value representing the quality of restoration of g_i. This reflects the idea that a camera pose supported by more feature points with small back-projection error is better. The above process is repeated, and each g_i whose restoration reference value exceeds a threshold is taken as a solution candidate. Details of RANSAC are given in the paper by M. Fischler and R. Bolles cited above.
Other robust estimation methods, such as LMedS, may be used instead of RANSAC.
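The candidate-collecting loop of step S15 can be sketched generically as follows. The callables `fit` and `residuals` are hypothetical stand-ins for the pose estimation and back-projection error of the patent, and all parameter names are assumptions. The sketch only shows how inlier subsets g_i and their restoration reference values (inlier counts) are gathered.

```python
import numpy as np

def ransac_candidates(data, fit, residuals, n_sample, err_threshold,
                      min_inliers, n_iter=100, seed=0):
    """Collect inlier subsets g_i whose size exceeds min_inliers.

    fit(sample) -> model and residuals(model, data) -> per-item error
    are caller-supplied (hypothetical stand-ins for pose estimation and
    back-projection error). Returns (inlier_indices, restoration_value)
    pairs, where restoration_value is the inlier count.
    """
    rng = np.random.default_rng(seed)
    candidates = []
    for _ in range(n_iter):
        # Draw n_sample distinct items and fit a model to them.
        sample = rng.choice(len(data), size=n_sample, replace=False)
        model = fit(data[sample])
        # Items with error at or above the threshold count as mistracked.
        inliers = np.flatnonzero(residuals(model, data) < err_threshold)
        if len(inliers) > min_inliers:
            candidates.append((inliers, len(inliers)))
    return candidates
```

Keeping every qualifying subset rather than only the largest mirrors the text, which registers a family {g_i} and defers the final choice to step S18.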

In step S16, the degeneration reference value is calculated for each subset g_i of the feature point series by the procedure shown in FIG. 3, as in step S13.
Step S17, like step S14, judges the degeneration reference value against a threshold, which is given empirically.
In step S18, the current k is taken as the appropriate number of images for securing a sufficient translational component, and the candidate g_i with the best score is chosen as the feature point series used for restoration. The score may be, for example, the restoration reference value or the degeneration reference value itself, or it may be defined as, for example, the average of the two.
In step S19, an increment dk is added to the current number of images k, and the first image and the (k + dk)-th image are taken from the image sequence. The standard value of dk is 1, but dk may be larger than 1 for closely spaced images such as video.

Through the above processing, the number of images k at which the degeneration reference value exceeds the threshold is found automatically, together with a subset g_i of the feature point series that has a high restoration reference value. Using the selected number of images and the selected subset of the feature point series, camera motion and object shape are restored by the method described for the initial estimation process (S150 in FIG. 1). The initial camera pose is stored in the storage area 120, and the three-dimensional points of the object shape (the initial shape of the three-dimensional object) are stored in the storage area 130.
In general, the degeneration reference value increases as the translational component of the camera motion becomes longer and as the number of feature points becomes larger. However, as the translational component grows, fewer feature points can be extracted in common across the images, so the degeneration reference value begins to decrease at some point. Depending on how the camera moves and on the shape of the target object, the degeneration reference value may start to decrease without ever exceeding the threshold, however many images are added. To handle this, an upper limit on the number of images may be set in advance, and if the degeneration reference value still does not exceed the threshold when the number of images passes that limit, the process may switch to another image sequence and retry. Alternatively, the process may switch to another image sequence and retry as soon as the degeneration reference value starts to decrease.
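The overall flow of steps S11 to S19, including the upper limit on the number of images, can be sketched as a loop over k. The three callables are hypothetical placeholders for feature extraction (S12), the degeneration reference value (S13, S16), and robust estimation (S15). Ranking candidates by restoration reference value with D as a tie-breaker is one possible score among those the text allows, chosen here only for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def select_baseline(
    common_features: Callable[[int], Sequence],
    degeneration_value: Callable[[Sequence], float],
    robust_subsets: Callable[[Sequence], List[Tuple[Sequence, float]]],
    d_threshold: float,
    k0: int = 3,
    dk: int = 1,
    k_max: int = 50,
) -> Optional[Tuple[int, Sequence]]:
    """Return (k, best subset g_i), or None if k_max is passed first."""
    k = k0                                            # S11
    while k <= k_max:
        F = common_features(k)                        # S12
        if degeneration_value(F) > d_threshold:       # S13, S14
            scored = []
            for g, restoration in robust_subsets(F):  # S15
                D = degeneration_value(g)             # S16
                if D > d_threshold:                   # S17
                    scored.append((restoration, D, g))
            if scored:                                # S18
                best = max(scored, key=lambda c: (c[0], c[1]))
                return k, best[2]
        k += dk                                       # S19
    return None  # caller may retry with another image sequence
```

Returning None when the limit is passed leaves the retry policy (switching to another image sequence) to the caller, as the text suggests.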

(Restoration of three-dimensional points under minute translation)
The baseline length selection method described above improves the restoration of the camera pose and three-dimensional feature points in the initial estimation process by selecting the number of images that best satisfies the criterion for the set of feature points.
However, a translational component close to 0 causes another problem: the restoration accuracy of three-dimensional points becomes extremely poor. Feature points are restored in three dimensions by triangulation, so a short baseline degrades accuracy. If the three-dimensional positions of the feature points are inaccurate, the error propagates to the next camera pose estimation and keeps growing. This problem arises in both initial estimation and sequential estimation. To deal with it, a method is proposed here in which, during sequential estimation, a three-dimensional feature point is not restored until the baseline has become long enough for the restoration accuracy to fall within an allowable range. The three-dimensional feature points restored in the initial estimation process can also be subject to the processing described here.
Specifically, this can be realized by incorporating the processing described here into the object shape restoration process (S170 in FIG. 1) of the sequential estimation.

Here, the covariance matrix V[P] = E[ΔP ΔP^T] of the three-dimensional restoration of a feature point is used to evaluate restoration accuracy. It is calculated as follows.
Here, s_1 is the depth parameter in equation (3) above, V[s_1] = E[(Δs_1)^2], and V[q, s_1] = E[Δq Δs_1]. For the derivation, see K. Kanatani: "Statistical Optimization for Geometric Computation: Theory and Practice," Elsevier, Amsterdam, 1996.

In general, the same feature point can be tracked across multiple images, so multiple restoration results are obtained. These are fused using their covariance matrices. Letting P^{t-1} be the three-dimensional position of a feature point registered as part of the three-dimensional object shape in the storage area 130 up to time t-1, and V[P^{t-1}] be its covariance matrix, the restoration result P_t obtained at time t and its covariance matrix V[P_t] are fused with them to update the three-dimensional position stored in the storage area 130.
In restoring a three-dimensional point, the accuracy of the result depends on which image pair is chosen for the calculation. If the same error distribution is assumed for the feature point position q in every image, the restoration accuracy depends on the camera poses. Intuitively, the longer the baseline, the better the accuracy. Equation (8) is therefore used to choose the image pair with the smallest possible error. The procedure is as follows.

(1) The feature points corresponding to the image feature point q_{t,i} at time t are found in past images, and their set is denoted Q_i.
(2) For the feature point q_{tk,i} at each time t_k in Q_i, three-dimensional restoration is performed in combination with q_{t,i} by the sequential estimation method described above, and V[P_{t,i}] is calculated from the result using equation (8).
(3) If there is a q_{tk,i} for which the determinant of the covariance matrix V[P_{t,i}] is smallest and below a predetermined threshold, the result of three-dimensional restoration using it and q_{t,i} is adopted as the solution and fused into the three-dimensional object shape stored in the storage area 130 according to equation (9).
In this way, restoration accuracy can be improved in the object shape restoration process (S170 in FIG. 1) of the sequential estimation of this embodiment.
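Step (3) of the procedure above and the subsequent fusion can be sketched as follows. Equations (8) and (9) are not reproduced in this text, so the covariance matrices are taken as given inputs, and the fusion shown is ordinary inverse-covariance weighting, a common choice that is assumed here and may differ from the patent's equation (9). Function names are hypothetical.

```python
import numpy as np

def pick_best_restoration(candidates, det_threshold):
    """Choose the (P, V) pair with the smallest det(V), if below threshold.

    candidates: list of (P, V) with P a 3-vector and V its 3x3 covariance.
    Returns None when no candidate is accurate enough, in which case
    restoration of this point is deferred to a later time step.
    """
    if not candidates:
        return None
    P, V = min(candidates, key=lambda c: np.linalg.det(c[1]))
    return (P, V) if np.linalg.det(V) < det_threshold else None

def fuse_estimates(P_prev, V_prev, P_new, V_new):
    """Inverse-covariance weighted fusion (assumed form, not equation (9))."""
    W_prev, W_new = np.linalg.inv(V_prev), np.linalg.inv(V_new)
    V = np.linalg.inv(W_prev + W_new)          # fused covariance
    P = V @ (W_prev @ P_prev + W_new @ P_new)  # fused position
    return P, V
```

Deferring the point when `pick_best_restoration` returns None corresponds to waiting until the baseline is long enough, as described in the section on minute translation.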

(3. Experiments using the restoration system of this embodiment)
Finally, the results of experiments using the camera motion and object shape restoration system of this embodiment are presented.
In the experiments, a person carried a digital camera and shot still images in continuous shooting mode while walking. The original still images were 640 × 480 and were reduced to 320 × 240 for use. Restoration was performed offline from the captured images.

(1) Selection of the optimum baseline length (number of images)
FIG. 5 shows an experimental example in which the number of images (baseline length) best suited to restoration is selected automatically by the baseline length selection method of this embodiment. FIG. 5(a) shows example images from the image sequence used, with the feature points extracted in each image. Fifty feature points were tracked. In this example, the camera was moved slowly to keep the translational component small. (b) shows a restoration obtained by initial estimation using the first three images. Arrow 510 indicates the direction of the camera's line of sight, and the points indicated by 520 are the restoration result. The part that should form a right angle is flattened, and the shape is close to a plane. The degeneration reference value D in this case was 0.02. (c) shows the restoration result obtained by automatically finding an appropriate baseline length with the method of this embodiment. Arrow 530 indicates the direction of the camera's line of sight, and the points indicated by 540 are the restoration result. The right-angled part is restored well. The number of images required was 24, and the degeneration reference value D was 0.22.
FIG. 6 shows the error variance of the three-dimensional restoration described above. The target object is the same as in FIG. 5. In FIG. 6, the standard deviation in the line-of-sight direction is represented by the length of a line segment, and each restored point is the midpoint of its segment. (a) is the case where the baseline length is 10 images. Arrow 610 indicates the direction of the camera's line of sight, and the line segments indicated by 620 are the restoration result (segments showing the standard deviation). (b) is the case where the baseline length is 20 images. Arrow 630 indicates the direction of the camera's line of sight, and the line segments indicated by 640 are the restoration result (segments showing the standard deviation). The error is smaller with the longer baseline, and larger for points farther from the camera. Because the real scale is undetermined, the standard deviation is expressed as a ratio to the distance from the camera: 0.055 to 0.07 in (a) and 0.017 to 0.02 in (b).

(2) Examples of restoring camera posture and object shape
(i) Stairs
FIG. 7 shows an example in which the staircase at the entrance of a university building was restored from 90 images. The baseline length in the initial estimation was 3 images. (a) to (c) are examples of the input images, and (d) is the restoration result viewed from the side. 710 is the restored camera posture, and 720 is the restored three-dimensional points of the object. (e) is the restoration result viewed from the front, where 730 is the restored camera posture and 740 is the restored three-dimensional points of the object. The vertical camera motion and the steps of the staircase are both restored. This suggests that the method can be used for robot movement over rough terrain.
(ii) Building
FIG. 8 shows an example in which a university building was restored from 120 images. The baseline length in the initial estimation was 9 images. (a) to (c) are examples of the input images, and (d) is the restoration result viewed from the side. 810 is the restored camera posture, and 820 is the restored three-dimensional points of the object. (e) is the restoration result viewed from the front, where 830 is the restored camera posture and 840 is the restored three-dimensional points of the object. Because this method can adjust the baseline length, it can also restore distant objects. The descent of a staircase partway through the shooting (850) is also restored.
(iii) Indoor
FIG. 9 shows an example of restoration from images of a desk photographed in a laboratory. The baseline length in the initial estimation was 4 images. (a) shows examples of the input images (eight images), and (b) is the restoration result viewed from the side. 910 is the restored camera posture, and 920 is the restored three-dimensional points of the object. (c) is the restoration result viewed from the front, where 930 is the restored camera posture and 940 is the restored three-dimensional points of the object. Although the camera motion is irregular, it is restored well.

FIG. 1 is a diagram showing the overall processing flow of the restoration system of this embodiment.
FIG. 2 is a flowchart showing an example of the procedure of the camera motion and object shape restoration processing of this embodiment.
FIG. 3 is a flowchart showing an example of the procedure for calculating the degeneration reference value.
FIG. 4 is a diagram showing an example of the structure of an image feature point.
FIG. 5 is a diagram showing experimental results obtained with the restoration system of this embodiment.
FIG. 6 is a diagram showing experimental results obtained with the restoration system of this embodiment.
FIG. 7 is a diagram showing experimental results obtained with the restoration system of this embodiment.
FIG. 8 is a diagram showing experimental results obtained with the restoration system of this embodiment.
FIG. 9 is a diagram showing experimental results obtained with the restoration system of this embodiment.

Claims

A camera motion / object shape restoration system that restores camera motion and the three-dimensional shape of an object from an image sequence taken while moving a camera, the system comprising:
feature point extraction / tracking means that extracts feature points from input images, associates them between images, and stores the associated feature points in feature point storage means;
initial estimation means that, given an initial value k, reads the feature points from the first input image to the k-th image from the feature point storage means, generates a basic equation matrix from the feature points, obtains the singular values of the basic equation matrix, and calculates a degeneration reference value from the singular values; that, when the degeneration reference value does not exceed a predetermined threshold, repeats the calculation of the degeneration reference value while increasing k; and that, when the degeneration reference value exceeds the threshold, obtains the initial posture of the camera and the three-dimensional positions of the feature points of the object from the feature points from the first image to the k-th image and stores them in camera motion storage means and three-dimensional shape storage means, respectively; and
camera motion / object shape reading means that reads out the camera posture stored in the camera motion storage means and the object shape stored in the three-dimensional shape storage means.

The camera motion / object shape restoration system according to claim 1,
further comprising successive estimation means that, after the processing of the initial estimation means, alternately repeats, in order from the (k + 1)-th image to the last image, camera posture estimation, which obtains the camera posture for the current image and stores it in the camera motion storage means, and object shape restoration, which obtains the three-dimensional positions of the feature points and fuses them into the three-dimensional shape storage means.

The camera motion / object shape restoration system according to claim 1 or 2,
wherein the initial estimation means obtains, by robust estimation from the feature points from the first image to the k-th image, a subset of feature points to be used for restoring the camera motion and the object shape, generates a basic equation matrix from the subset of feature points, obtains the singular values of the basic equation matrix, and calculates a degeneration reference value from the singular values; when the degeneration reference value does not exceed a predetermined threshold, repeats the calculation of the degeneration reference value while increasing k; and when the degeneration reference value exceeds the threshold, obtains the initial posture of the camera and the three-dimensional positions of the feature points of the object from the subset of feature points and stores them in the camera motion storage means and the three-dimensional shape storage means, respectively.

The camera motion / object shape restoration system according to claim 3,
wherein, when a plurality of subsets of the feature points are obtained, the initial estimation means obtains a restoration reference value for each subset, obtains the initial posture of the camera and the three-dimensional positions of the feature points of the object from the subset whose score, calculated from the restoration reference value and the degeneration reference value, is the best, and stores them in the camera motion storage means and the three-dimensional shape storage means, respectively.

The camera motion / object shape restoration system according to claim 3 or 4,
wherein, before obtaining the subset of the feature points, the initial estimation means obtains a degeneration reference value from the feature points from the first image to the k-th image; when that degeneration reference value does not exceed a predetermined threshold, repeats the calculation of the degeneration reference value while increasing k; and when that degeneration reference value exceeds the threshold, obtains the subset of the feature points.

The camera motion / object shape restoration system according to any one of claims 2 to 5,
wherein the object shape restoration in the successive estimation means performs three-dimensional restoration between each feature point of the current image and the corresponding feature point in the image at each past time point stored in the feature point storage means, obtains a covariance matrix for each restoration result, and fuses into the three-dimensional shape storage means, as the three-dimensional position of the feature point, the three-dimensional restoration result whose covariance matrix value is the smallest and is below a predetermined threshold.

A camera motion / object shape restoration program for causing a computer system to implement the functions of the camera motion / object shape restoration system according to claim 1.

A recording medium on which is recorded a camera motion / object shape restoration program for causing a computer system to implement the functions of the camera motion / object shape restoration system according to claim 1.