JP4283816B2

JP4283816B2 - Three-dimensional environment information acquisition apparatus, three-dimensional environment information acquisition method, and recording medium storing a program realizing the method

Info

Publication number: JP4283816B2
Application number: JP2006096442A
Authority: JP
Inventors: 達哉大澤; 小軍ウ; 佳織若林; 貴之安野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-03-31
Filing date: 2006-03-31
Publication date: 2009-06-24
Anticipated expiration: 2026-03-31
Also published as: JP2007271408A

Description

The present invention relates to an apparatus and method for acquiring the three-dimensional structure of a space from moving observation image sequences acquired using a plurality of imaging devices, and to a recording medium on which the method is recorded.

Three-dimensional environment information, which represents the three-dimensional structure of the real world, is known to have a wide range of applications. Examples include representing virtual worlds in virtual reality, serving as a map in navigation applications, and video surveillance that makes use of environment information.

Here, three-dimensional environment information means absolute three-dimensional position and orientation information about objects (for example, stationary subjects or objects, or objects indoors). It includes position and orientation information for parts that cannot be observed directly.

Conventionally, such three-dimensional environment information has been acquired in one of two ways: with special equipment, typified by range finders, or by creating CG (computer graphics) models by hand.

For example, a method has been proposed for robustly tracking the position of a person's head using three-dimensional environment information acquired with a range finder (hereinafter, the person tracking method) (see, for example, Non-Patent Document 1).

Known techniques for acquiring three-dimensional environment information with an image input device include the following:
a video camera calibration method (see, for example, Non-Patent Document 2);
a robust camera motion estimation method based on a perspective projection model (see, for example, Non-Patent Document 3);
a method concerning the coordinate transformation matrix between three-dimensional point clouds (see, for example, Non-Patent Document 4);
a method for computing three-dimensional coordinates from multi-view images (see, for example, Non-Patent Document 5).

Non-Patent Document 1: T. Suzuki, S. Iwasaki, Y. Kobayashi, Y. Sato, A. Sugimoto, "Stabilizing person tracking by introducing an environment model" (in Japanese), IEICE Transactions, D-II, vol. J88, no. 8, pp. 1592-1600, 2005.
Non-Patent Document 2: Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000.
Non-Patent Document 3: S. Christy, R. Horaud, "Euclidean shape and motion from multiple perspective views by affine iterations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 11, pp. 1098-1104, 1996.
Non-Patent Document 4: B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," Journal of the Optical Society of America A, vol. 4, pp. 629-642, 1987.
Non-Patent Document 5: S. Tsuji, G. Xu, "3D Vision" (in Japanese), Kyoritsu Shuppan, 1998, pp. 95-96.

As described above, acquiring three-dimensional environment information has required one of two approaches: very expensive, specialized equipment such as a range finder, or manual creation of CG models. Users therefore could not acquire three-dimensional environment information easily.

Moreover, acquiring three-dimensional environment information by these methods is not enough on its own. The coordinate system of the camera information must still be unified with that of the three-dimensional environment model. For example, the person tracking method described above requires this when it tracks a person using information from video cameras (hereinafter simply cameras) placed in the space.

Furthermore, the person tracking method described above requires a great deal of labor, such as placing many markers in the space and determining correspondences between the cameras.

The present invention was made in view of the above problems. Its object is to provide a three-dimensional environment information acquisition apparatus, a three-dimensional environment information acquisition method, and a recording medium storing a program that implements the method. These acquire three-dimensional environment information from video cameras placed in a space, so that users can obtain such information easily. They also integrate all camera information into a single world coordinate system with only very simple operations (for example, by transforming between each camera's coordinate system and the world coordinate system).

To solve the above problems, the invention of claim 1 is a three-dimensional environment information acquisition apparatus. It acquires images of a subject from a plurality of imaging devices and acquires three-dimensional environment information about the subject from the captured images. The apparatus comprises:
moving observation image sequence acquisition means for acquiring, for each imaging device, a moving observation image sequence in which the subject is imaged while that imaging device is moved;
imaging device motion estimation means for estimating the motion of each imaging device at the time each image of its acquired moving observation image sequence was captured;
three-dimensional point cloud acquisition means for acquiring, based on each estimated imaging device motion, a three-dimensional point cloud of the subject expressed in the coordinate system of the corresponding imaging device;
three-dimensional point cloud integration means for integrating the three-dimensional point clouds obtained from the moving observation image sequences of the imaging devices into a world coordinate system, taking the coordinate system of any one imaging device as the world coordinate system;
reference plane detection means for detecting, in the integrated three-dimensional point cloud, a reference plane region, namely the plane of largest area represented by the point cloud; and
noise removal means which, based on information about the detected reference plane region, takes the reference plane as the XY plane and the height direction as the Z axis, and removes from the integrated point cloud the error points other than those at the Z coordinate that received the largest number of votes in Z-coordinate voting.

The invention of claim 2 is the invention of claim 1, wherein the three-dimensional point cloud acquisition means does the following: it selects images for stereo processing from the moving observation image sequence, performs stereo processing between the selected images, and restores three-dimensional points using the stereo result and the imaging device motion obtained by the imaging device motion estimation means.

The invention of claim 3 is the invention of claim 1 or 2, wherein the reference plane detection means detects, by a three-dimensional Hough transform, the planar region of maximum area in the integrated three-dimensional point cloud. It regards the detected planar region as the reference plane region.

The invention of claim 4 is a three-dimensional environment information acquisition method. It acquires images of a subject from a plurality of imaging devices and acquires three-dimensional environment information about the subject from the captured images. The method comprises:
a moving observation image sequence acquisition step of acquiring, for each imaging device, a moving observation image sequence in which the subject is imaged while that imaging device is moved;
an imaging device motion estimation step of estimating the motion of each imaging device at the time each image of its acquired moving observation image sequence was captured;
a three-dimensional point cloud acquisition step of acquiring, based on each estimated imaging device motion, a three-dimensional point cloud of the subject expressed in the coordinate system of the corresponding imaging device;
a three-dimensional point cloud integration step of integrating the three-dimensional point clouds obtained from the moving observation image sequences of the imaging devices into a world coordinate system, taking the coordinate system of any one imaging device as the world coordinate system;
a reference plane detection step of detecting, in the integrated three-dimensional point cloud, a reference plane region, namely the plane of largest area represented by the point cloud; and
a noise removal step which, based on information about the detected reference plane region, takes the reference plane as the XY plane and the height direction as the Z axis, and removes from the integrated point cloud the error points other than those at the Z coordinate that received the largest number of votes in Z-coordinate voting.

The invention of claim 5 is the invention of claim 4, wherein the imaging device motion estimation step includes either of the following:
a step of tracking feature points generated in an arbitrary image of the moving observation image sequence through all other images, creating a measurement matrix from the tracking results, and obtaining the imaging device motion by a factorization method; or
a step of tracking feature points generated in an arbitrary image of the moving observation image sequence through all other images, and obtaining the imaging device motion by sequential projective reconstruction using the tracking results.

The invention of claim 6 is the invention of claim 4 or 5, wherein the three-dimensional point cloud acquisition step does the following: it selects images for stereo processing from the moving observation image sequence, performs stereo processing between the selected images, and restores three-dimensional points using the stereo result and the imaging device motion obtained in the imaging device motion estimation step.

The invention of claim 7 is the invention of any one of claims 4 to 6, wherein the reference plane detection step detects, by a three-dimensional Hough transform, the planar region of maximum area in the integrated three-dimensional point cloud. It regards the detected planar region as the reference plane region.

The invention of claim 8 is a computer-readable recording medium on which a program is recorded. The program causes a computer to execute the following steps according to any one of claims 4 to 7: the moving observation image sequence acquisition step, the imaging device motion estimation step, the three-dimensional point cloud acquisition step, the three-dimensional point cloud integration step, the reference plane detection step, and the noise removal step.

According to the inventions of claims 1 and 4, an integrated three-dimensional point cloud can be acquired from the moving observation image sequences. In addition, error points can be removed by voting based on the reference plane region.

According to the inventions of claims 2 and 6, corresponding points for stereo processing can be acquired from a moving observation image sequence.

According to the inventions of claims 3 and 7, the planar region of maximum area can be obtained from the integrated three-dimensional point cloud.

According to the invention of claim 5, the imaging device motion can be obtained in either of two ways: by creating a measurement matrix from the moving observation image sequence and applying the factorization method, or by sequential projective reconstruction from the moving observation image sequence.

According to the invention of claim 8, the three-dimensional environment information acquisition method of any one of claims 4 to 7 can be written as a computer program.

As described above, according to the inventions of claims 1 and 4, three-dimensional environment information can be acquired through simple operation of the imaging devices. In addition, a dense, highly accurate three-dimensional point cloud can be acquired.

According to the inventions of claims 2 and 6, three-dimensional points of the imaged object can be restored.

According to the inventions of claims 3 and 7, a reference plane can be obtained from the integrated three-dimensional point cloud.

According to the invention of claim 5, the imaging device motion can be obtained through simple operation of the imaging device.

According to the invention of claim 8, a recording medium can be obtained on which a computer program implementing the three-dimensional environment information acquisition method is recorded.

These results contribute to the field of computer vision.

The following describes, in detail and with reference to the drawings, embodiments of the invention: a three-dimensional environment information acquisition apparatus, a three-dimensional environment information acquisition method, and a recording medium storing a program that implements the method.

The basic approach of the present invention has three parts. First, it acquires three-dimensional environment information representing the three-dimensional structure of a space observed by a plurality of imaging devices (for example, cameras such as video cameras). Second, it integrates the three-dimensional point clouds acquired by the imaging devices into a unified world coordinate system. Third, it obtains the transformation between this world coordinate system and the coordinate system of each imaging device.

An embodiment of the present invention is a three-dimensional environment information acquisition apparatus, a three-dimensional environment information acquisition method, and a recording medium storing a program that implements the method. These acquire images from a plurality of imaging devices and acquire three-dimensional environment information (that is, the three-dimensional structure of the real world) from them.

That is, the apparatus, method, and recording medium first acquire an image sequence observed while an imaging device is moved (a moving observation image sequence). They then estimate the imaging device motion (the three-dimensional position and orientation of the imaging device) at the time each frame of that sequence (each image in the sequence) was captured.

Next, they acquire a three-dimensional point cloud of the subject using the estimated imaging device motion and integrate the point clouds obtained from the moving observation image sequences into a unified world coordinate system. Finally, they detect a reference plane region in the integrated point cloud and remove error points from it based on information about that region.

An embodiment of the present invention is described below with reference to FIGS. 1 to 6.

FIG. 1 shows an example configuration of the three-dimensional environment information acquisition apparatus according to the present embodiment. The apparatus comprises six units:
moving observation image sequence acquisition means 11;
camera motion estimation means (that is, imaging device motion estimation means) 12;
three-dimensional point cloud acquisition means 13;
three-dimensional point cloud integration means 14;
floor detection means 15;
noise removal means 16.

The moving observation image sequence acquisition means 11 acquires time-series image data (a moving observation image sequence) while the image input device is moved. One example is a video camera mounted on a slidable camera pan head. The acquisition means 11 may also store and manage the acquired sequence in an external storage device (for example, a hard disk).

The camera motion estimation means 12 estimates the camera's three-dimensional position (x, y, z) and orientation (φ, θ, γ) at the time each frame of the moving observation image sequence was captured. It may use, for example, a factorization method or an estimation method based on epipolar geometry.

The three-dimensional point cloud acquisition means 13 restores three-dimensional information about the subject from the moving observation image sequence and the camera motion. It may use, for example, a stereo method or a volume intersection (shape-from-silhouette) method.

The three-dimensional point cloud integration means 14 integrates the multiple sets of three-dimensional point cloud data into a unified world coordinate system. These sets come from the multiple moving observation image sequences acquired by the acquisition means 11.
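The excerpt does not say how the transformation into the world frame is estimated. One plausible route, shown here only as a sketch, is Horn's closed-form absolute orientation with unit quaternions (Non-Patent Document 4). It assumes some corresponding 3D points between camera C2's cloud and camera C1's cloud are already known; finding those correspondences is outside this sketch. A scale factor is included because factorization recovers structure only up to scale. The helper names are hypothetical. The top eigenvector is found by shifted power iteration, so the block needs no linear-algebra library.

```python
import math


def _centroid(pts):
    n = len(pts)
    return [sum(p[k] for p in pts) / n for k in range(3)]


def _quat_to_rot(q):
    """Rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [w*w + x*x - y*y - z*z, 2 * (x*y - w*z), 2 * (x*z + w*y)],
        [2 * (y*x + w*z), w*w - x*x + y*y - z*z, 2 * (y*z - w*x)],
        [2 * (z*x - w*y), 2 * (z*y + w*x), w*w - x*x - y*y + z*z],
    ]


def _top_eigvec(n_mat, iters=500):
    """Unit eigenvector for the largest eigenvalue of a symmetric 4x4 matrix.

    Shifted power iteration: the Gershgorin shift makes every eigenvalue >= 0,
    and restarting from each basis vector ensures one start is not orthogonal
    to the wanted eigenvector; the start with the best Rayleigh quotient wins."""
    shift = max(sum(abs(v) for v in row) for row in n_mat)
    a = [[n_mat[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    best, best_val = None, -math.inf
    for start in range(4):
        v = [1.0 if i == start else 0.0 for i in range(4)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(4)) for i in range(4)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        val = sum(v[i] * n_mat[i][j] * v[j] for i in range(4) for j in range(4))
        if val > best_val:
            best, best_val = v, val
    return best


def absolute_orientation(src, dst):
    """Estimate (scale, R, t) with dst ~= scale * R @ src + t (Horn's quaternion method)."""
    cs, cd = _centroid(src), _centroid(dst)
    p = [[a[k] - cs[k] for k in range(3)] for a in src]
    q = [[b[k] - cd[k] for k in range(3)] for b in dst]
    # S[a][b] = sum_i p_i[a] * q_i[b]  (cross-covariance of the centred clouds)
    s = [[sum(pi[a] * qi[b] for pi, qi in zip(p, q)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    rot = _quat_to_rot(_top_eigvec(n_mat))
    # Symmetric scale estimate: ratio of the spreads about the two centroids.
    scale = math.sqrt(sum(c * c for v in q for c in v) / sum(c * c for v in p for c in v))
    rc = [sum(rot[i][k] * cs[k] for k in range(3)) for i in range(3)]
    t = [cd[i] - scale * rc[i] for i in range(3)]
    return scale, rot, t


def apply_similarity(scale, rot, t, pts):
    """Map points, e.g. camera C2's cloud, into the world (camera C1) frame."""
    return [tuple(scale * sum(rot[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
            for p in pts]
```

Once `(scale, rot, t)` is known, every point of the second cloud can be passed through `apply_similarity`. Taking camera C1's frame as the world frame matches the claim's choice of "any one imaging device" as the world coordinate system.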

The floor detection means 15 detects a floor region (that is, the reference plane region) in the three-dimensional point cloud unified in the world coordinate system.
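Claims 3 and 7 name a three-dimensional Hough transform for finding the largest plane. A minimal sketch follows. It assumes planes are parameterized as normal · X = ρ with the normal sampled on the upper hemisphere, and it treats "largest area" as "most supporting points", which holds only if point density is roughly uniform. The function name, bin counts, and step sizes are illustrative assumptions, not values from the source.

```python
import math
from collections import Counter


def hough_plane(points, n_theta=19, n_phi=36, rho_step=0.05):
    """Return (normal, rho, votes) for the most-voted plane normal . X = rho.

    normal = (sin t cos p, sin t sin p, cos t), with t sampled on [0, pi/2] and
    p on [0, 2*pi); every point casts one vote per sampled normal."""
    acc = Counter()
    normals = {}
    for it in range(n_theta):
        t = (math.pi / 2) * it / (n_theta - 1)
        for ip in range(n_phi):
            p = 2 * math.pi * ip / n_phi
            n = (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))
            normals[(it, ip)] = n
            for x, y, z in points:
                rho = n[0] * x + n[1] * y + n[2] * z
                acc[(it, ip, round(rho / rho_step))] += 1  # quantize the offset
    (it, ip, ir), votes = acc.most_common(1)[0]
    return normals[(it, ip)], ir * rho_step, votes
```

For example, with 100 floor points at z = 0 plus a small wall, the winner is the normal (0, 0, 1) at ρ = 0. A real implementation would refine the winning bin, for instance by a least-squares fit to its inliers.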

The noise removal means 16 uses information about the detected floor region to transform the coordinates of the three-dimensional point cloud so that the floor coincides with the XY plane of the world coordinate system. It then removes error points contained in the point cloud by voting in the Z direction.
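The two steps of noise removal can be sketched as follows. First, the cloud is rotated and shifted so that the detected floor (a unit normal and offset, as a Hough transform would give) becomes z = 0. Second, Z values are voted on. The excerpt does not define how the votes are grouped, so this sketch assumes one ballot per XY grid cell: the most-voted Z bin in a cell is kept and other points in that cell are treated as error points. The cell size, bin width, and helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def align_floor(points, normal, rho):
    """Rigidly move points so the plane normal . X = rho (unit normal) becomes z = 0.

    Rotation taking the normal to +Z in Rodrigues form:
    R = I + [v]x + [v]x^2 / (1 + c), with v = normal x Z and c = normal . Z."""
    ax, ay, az = normal
    vx, vy, vz = ay, -ax, 0.0
    c = az
    if c <= -1.0 + 1e-12:
        raise ValueError("normal points along -Z; pass the flipped plane (-normal, -rho)")
    k = 1.0 / (1.0 + c)
    r = [
        [1 - k * (vy * vy + vz * vz), -vz + k * vx * vy, vy + k * vx * vz],
        [vz + k * vx * vy, 1 - k * (vx * vx + vz * vz), -vx + k * vy * vz],
        [-vy + k * vx * vz, vx + k * vy * vz, 1 - k * (vx * vx + vy * vy)],
    ]
    out = []
    for p in points:
        x, y, z = (sum(r[i][j] * p[j] for j in range(3)) for i in range(3))
        out.append((x, y, z - rho))  # after rotation the plane sits at height rho
    return out


def remove_noise(points, cell=0.1, z_step=0.05):
    """Vote on quantized Z within each XY cell; keep only points in the winning Z bin."""
    votes = defaultdict(Counter)
    keys = []
    for x, y, z in points:
        cxy = (math.floor(x / cell), math.floor(y / cell))
        zb = round(z / z_step)
        votes[cxy][zb] += 1
        keys.append((cxy, zb))
    # Ties go to the Z bin first encountered in that cell (Counter.most_common ordering).
    best = {cxy: cnt.most_common(1)[0][0] for cxy, cnt in votes.items()}
    return [p for p, (cxy, zb) in zip(points, keys) if best[cxy] == zb]
```

Keeping one height bin per cell yields a height-map-like surface. Whether the patent's voting is per cell or global is not stated in this excerpt.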

Next, the three-dimensional environment information acquisition method according to the present embodiment is described with reference to FIGS. 2 and 3. FIG. 3 is a flowchart of the method.

In the method of the present embodiment, as shown in FIG. 2, the goal is information about the three-dimensional structure BG (that is, the subject) of the space observed by two video cameras C1 and C2. This information is the three-dimensional structure information, or three-dimensional environment information.

The intrinsic parameters of video cameras C1 and C2 (hereinafter simply cameras) are assumed to have been calibrated in advance, for example by the video camera calibration method described above (see Non-Patent Document 2).

In the present embodiment, the camera motion estimation means 12 uses a factorization method, and three-dimensional point cloud acquisition uses a stereo method. These are only examples; the invention is clearly not limited to them.

The processing of the method according to the present embodiment is described next.

First, when processing starts, the moving observation image sequence acquisition means 11 acquires two sequences (S201): Im₁, observed while moving camera C1, and Im₂, observed while moving camera C2.

Next, the camera motion estimation means 12 estimates the three-dimensional position and orientation of camera C1 for each frame of Im₁, and of camera C2 for each frame of Im₂. The position and orientation are obtained for every frame. The i-th frame of a moving observation image sequence is denoted FR_i, the camera's three-dimensional position in that frame (x_i, y_i, z_i), and its orientation (φ_i, θ_i, γ_i).
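The per-frame pose (x_i, y_i, z_i, φ_i, θ_i, γ_i) can be turned into a rigid transform, as sketched below. The excerpt does not define the angle convention. This sketch assumes φ, θ, γ are rotations about the X, Y, and Z axes composed as R = Rz(γ) Ry(θ) Rx(φ), and that the pose maps camera coordinates to world coordinates. Both are assumptions, and the function names are hypothetical.

```python
import math


def _matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_angles(phi, theta, gamma):
    """R = Rz(gamma) @ Ry(theta) @ Rx(phi); this angle convention is an assumption."""
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    cz, sz = math.cos(gamma), math.sin(gamma)
    rx = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul3(rz, _matmul3(ry, rx))


def camera_to_world(point, pose):
    """Map a camera-frame point to the world frame for the frame pose
    (x_i, y_i, z_i, phi_i, theta_i, gamma_i): X_w = R X_c + (x_i, y_i, z_i)."""
    x, y, z, phi, theta, gamma = pose
    r = rotation_from_angles(phi, theta, gamma)
    origin = (x, y, z)
    return tuple(sum(r[i][k] * point[k] for k in range(3)) + origin[i] for i in range(3))
```

For example, with the camera at (1, 2, 3) and a 90-degree rotation about Z, the camera-frame point (1, 0, 0) maps to approximately (1, 3, 3) in the world frame.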

Next, feature point tracking is performed on the acquired sequences Im₁ and Im₂ (S202). Feature point tracking means taking feature points generated in an arbitrary frame of a moving observation image sequence and finding their image coordinates in that frame and the corresponding image coordinates in every other frame. This processing is performed separately for Im₁ and for Im₂.

Next, the factorization method is applied to the feature point tracking results (S203). To do so, a measurement matrix is created from the tracking results. With F frames in the moving observation image sequence and N feature points, this measurement matrix is the F-row, N-column matrix expressed by Equation 1.

Here, (u_ij, v_ij) denotes the image coordinates of the j-th feature point in the i-th frame.
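Equation 1 itself is not reproduced in this text. The sketch below therefore uses the common Tomasi-Kanade layout, which is an assumption: the u coordinates of all frames are stacked above the v coordinates, giving a 2F×N matrix, which is one way to lay out the F×N array of (u_ij, v_ij) pairs described above. Each row is then centred on its mean ("registered"), after which factorization methods typically apply a rank-3 singular value decomposition; that step is not shown. The helper names are hypothetical.

```python
def measurement_matrix(tracks):
    """tracks[i][j] = (u_ij, v_ij): frame i, feature j. Returns a 2F x N list of rows."""
    f = len(tracks)
    n = len(tracks[0])
    if any(len(row) != n for row in tracks):
        raise ValueError("every feature must be tracked in every frame")
    u_rows = [[tracks[i][j][0] for j in range(n)] for i in range(f)]
    v_rows = [[tracks[i][j][1] for j in range(n)] for i in range(f)]
    return u_rows + v_rows


def register(w):
    """Subtract each row's mean, i.e. the image centroid of the features in that frame."""
    out = []
    for row in w:
        mean = sum(row) / len(row)
        out.append([x - mean for x in row])
    return out
```

The tracking requirement shows up as the length check: a feature lost in any frame cannot fill its column, which is why the text tracks each feature through all other frames.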

The factorization method obtains, from the measurement matrix of Equation 1 (for example by singular value decomposition), both the three-dimensional position of each feature point and the camera motion. The camera motion is the three-dimensional position (x_i, y_i, z_i) and orientation (φ_i, θ_i, γ_i) for each frame, shown in camera motion table T1 in FIG. 4. Camera motion can be estimated robustly under a perspective projection model by using, for example, the perspective-projection-based method described above (see Non-Patent Document 3). Factorization is performed on each of Im₁ and Im₂ to estimate the camera motions M1 and M2 of cameras C1 and C2.

Next, the three-dimensional point cloud acquisition means 13 acquires the three-dimensional point cloud of the subject using the estimated camera motion and the moving observation image sequence.

Because a moving observation image sequence is a set of images taken from multiple viewpoints, three-dimensional measurement can be performed by triangulation that exploits the differences between viewpoints. In other words, stereo processing can be applied to the three-dimensional measurement.

Stereo processing begins by selecting the images it will use from the moving observation image sequence (S204). The selection proceeds as follows.

First, a reference frame for stereo processing is selected. Any frame may be chosen as the reference frame, and multiple reference frames may also be chosen.

After the reference frame has been selected, comparison frames are selected to pair with it.

Two temporally adjacent images of the moving observation image sequence should not be chosen as a stereo pair. The camera moves only slightly between them, so the stereo result becomes unstable.

To solve this problem, a frame interval d is set, and comparison frames are taken only from frames at least d frames away from the reference frame along the time axis.

Multiple comparison frames satisfying this condition are selected for each reference frame.

For example, consider reference frame 501 in moving observation image sequence 503 in FIG. 5. Frames an integer multiple of the frame interval d away from it, both forward and backward in time, can serve as comparison frames 502. Because stereo processing for one reference frame 501 then uses several comparison frames 502, it is more robust than processing with a single comparison frame.
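The selection rule above (comparison frames at integer multiples of d on both sides of the reference frame) can be sketched as follows. The function names are hypothetical, and frames are assumed to be indexed 0 to num_frames - 1.

```python
def comparison_frames(ref, num_frames, d):
    """Indices ref +/- k*d (k = 1, 2, ...) that fall inside a sequence of num_frames frames."""
    if d < 1:
        raise ValueError("frame interval d must be at least 1")
    frames = []
    k = 1
    # Keep stepping outward while at least one side is still inside the sequence.
    while ref - k * d >= 0 or ref + k * d < num_frames:
        for c in (ref - k * d, ref + k * d):
            if 0 <= c < num_frames:
                frames.append(c)
        k += 1
    return sorted(frames)


def stereo_pairs(ref, num_frames, d):
    """The K stereo image pairs (reference, comparison) used for one reference frame."""
    return [(ref, c) for c in comparison_frames(ref, num_frames, d)]
```

For instance, reference frame 10 in a 25-frame sequence with d = 4 gives the comparison frames 2, 6, 14, 18, and 22, so K = 5. Because every pair is at least d frames apart, the adjacent-frame pairs with a very short baseline are excluded.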

This selection is performed for each of the moving observation image sequences Im₁ and Im₂.

After the reference and comparison frames have been selected, stereo processing is performed (S205).

Stereo processing establishes, for each pixel of the reference frame, the corresponding pixel in the comparison frame.

When a reference frame has K comparison frames, stereo processing is performed for all K pairs formed by the reference frame and each comparison frame.

Stereo processing may, for example, apply a standard area-based matching method over the entire reference frame to obtain corresponding point data.

Any evaluation function may be used for the matching, for example the SAD (Sum of Absolute Differences) function or the normalized cross-correlation function.
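A minimal SAD block-matching sketch follows. It assumes, purely for illustration, that matches lie along the same image column (roughly what a purely vertical camera translation, as in the pan-head example below, would produce); a general implementation would instead search along the epipolar line implied by P and P′. All names are hypothetical, and images are plain lists of grey values.

```python
def sad(ref_img, cmp_img, r, c, dr, half):
    """SAD between the (2*half+1)^2 window centred at (r, c) in ref_img
    and the window centred at (r + dr, c) in cmp_img."""
    total = 0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            total += abs(ref_img[r + i][c + j] - cmp_img[r + dr + i][c + j])
    return total


def match_vertical(ref_img, cmp_img, r, c, max_disp, half=1):
    """Return the vertical offset in [-max_disp, max_disp] with the lowest SAD.

    Offsets whose window would fall outside cmp_img are skipped; the window
    around (r, c) is assumed to lie inside ref_img.
    """
    best_dr, best_cost = None, float("inf")
    for dr in range(-max_disp, max_disp + 1):
        if r + dr - half < 0 or r + dr + half >= len(cmp_img):
            continue
        cost = sad(ref_img, cmp_img, r, c, dr, half)
        if cost < best_cost:
            best_dr, best_cost = dr, cost
    return best_dr


# Synthetic check: the comparison image is the reference shifted down 2 rows.
ref_img = [[i * i + 3 * j for j in range(5)] for i in range(10)]
cmp_img = [[0] * 5 for _ in range(2)] + ref_img[:8]
print(match_vertical(ref_img, cmp_img, 4, 2, max_disp=3))  # 2
```

Swapping `sad` for a normalized cross-correlation score (and taking the maximum instead of the minimum) would give the other evaluation function mentioned above.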

Stereo processing thus yields K sets of corresponding point data for each reference frame. Hereafter, a pair of images (frames) subjected to stereo processing is called a stereo image pair.

Next, a three-dimensional point group is reconstructed from the corresponding point data (S206).

Consider a reference frame with K sets of corresponding point data; a three-dimensional point is computed for each pixel of the reference frame.

Left as is after the processing up to step S206, however, this would yield K three-dimensional points for every pixel of the reference frame, so the corresponding point data are integrated. The integration can be realized, for example, by computing or selecting the three-dimensional point of each pixel from just one of its K sets of corresponding point data.

Stereo processing has a known trade-off, governed by the baseline (the length of the straight line joining the camera's optical center when the reference frame was captured and its optical center when the comparison frame was captured), between the accuracy of the reconstructed three-dimensional points and the stability of the stereo matching itself.

A short baseline makes stereo matching stable, but even correctly matched data yield inaccurate three-dimensional points. A long baseline makes stereo matching unstable, but correctly matched data yield accurate three-dimensional points. The corresponding point data are selected by exploiting this trade-off.

First, K three-dimensional points are reconstructed for each pixel of the reference frame from the K disparity images. The reconstruction uses the projection matrix P of the reference frame, the projection matrix P′ of the comparison frame, and the corresponding point data. The projection matrix P is a 3×4 matrix computed from the estimated camera motion (three-dimensional position (x, y, z) and orientation (φ, θ, γ)) and the camera intrinsic matrix A.

where

The elements of the matrix A are known from camera calibration.

The projection matrix P of the reference frame and the projection matrix P′ of the comparison frame are computed with Equation 2.
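Equation 2 is not reproduced in this text. As a hedged sketch, the code below assumes the common pinhole form P = A[R | t], with the orientation built as R = Rz(γ)Ry(θ)Rx(φ), R taken as the camera-to-world rotation, and (x, y, z) taken as the camera's optical center C, so that t = -RᵀC. The angle order and the direction of R are assumptions; the patent's own convention may differ. The intrinsic values in the example are invented.

```python
import math


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rotation_zyx(phi, theta, gamma):
    """Camera orientation R = Rz(gamma) @ Ry(theta) @ Rx(phi) (assumed convention)."""
    cx, sx = math.cos(phi), math.sin(phi)
    cy, sy = math.cos(theta), math.sin(theta)
    cz, sz = math.cos(gamma), math.sin(gamma)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def projection_matrix(A, position, angles):
    """3x4 matrix P = A [R^T | -R^T C] for a camera centred at C = position."""
    r = rotation_zyx(*angles)
    rt = [[r[j][i] for j in range(3)] for i in range(3)]   # world -> camera rotation
    t = [-sum(rt[i][k] * position[k] for k in range(3)) for i in range(3)]
    return matmul(A, [rt[i] + [t[i]] for i in range(3)])


def project(P, X):
    """Pixel coordinates (u, v) of the 3-D point X under P."""
    xh = list(X) + [1.0]
    h = [sum(P[i][k] * xh[k] for k in range(4)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]


A = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]  # example intrinsics
P = projection_matrix(A, (0.0, 0.0, -1.0), (0.0, 0.0, 0.0))
print(project(P, (0.2, 0.0, 1.0)))  # approximately (400.0, 240.0)
```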

Next, three-dimensional points are reconstructed from the corresponding point data and the projection matrices P and P′. If the image coordinates (u, v) in the reference frame correspond to the image coordinates (u′, v′) in the comparison frame, the three-dimensional point M = (X, Y, Z) is recovered by Equation 3 below, following the method described earlier for computing three-dimensional coordinates from multi-viewpoint images.

where

Here p_ij and p′_ij denote the elements in row i, column j of P and P′, respectively, and B⁺ denotes the pseudo-inverse of the matrix B.

Equation 3 is applied to every pixel of the reference frame, once for each of its K sets of corresponding point data, to reconstruct the three-dimensional points.
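Equation 3 is likewise not reproduced here. The sketch below assumes the standard linear (DLT-style) arrangement: each of u, v, u′ and v′ contributes one row of a 4×3 matrix B and one entry of a vector b, and M = B⁺b is computed through the normal equations, which give the same result as the pseudo-inverse when B has full column rank. This row layout is an assumption consistent with the definitions of p_ij, p′_ij and B⁺ above, and the example matrices are invented.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(a[i]) + [b[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def triangulate(P1, P2, uv1, uv2):
    """Least-squares 3-D point M = B+ b from one correspondence seen by P1 and P2.

    Each image coordinate gives one linear equation in (X, Y, Z), e.g.
    (u p31 - p11) X + (u p32 - p12) Y + (u p33 - p13) Z = p14 - u p34.
    """
    B, b = [], []
    for P, (u, v) in ((P1, uv1), (P2, uv2)):
        for coord, r in ((u, 0), (v, 1)):
            B.append([coord * P[2][j] - P[r][j] for j in range(3)])
            b.append(P[r][3] - coord * P[2][3])
    # For a full-column-rank 4x3 B, B+ b equals the solution of (B^T B) M = B^T b.
    btb = [[sum(B[k][i] * B[k][j] for k in range(4)) for j in range(3)] for i in range(3)]
    bt_b = [sum(B[k][i] * b[k] for k in range(4)) for i in range(3)]
    return solve3(btb, bt_b)


P1 = [[800, 0, 320, 0], [0, 800, 240, 0], [0, 0, 1, 0]]
P2 = [[800, 0, 320, 0], [0, 800, 240, -160], [0, 0, 1, 0]]  # camera translated 0.2 along Y
print(triangulate(P1, P2, (360, 360), (360, 280)))  # close to [0.1, 0.3, 2.0]
```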

Next, the three-dimensional points are integrated pixel by pixel.

As noted above, among the K sets of corresponding point data, those obtained from a stereo image pair with a short baseline contain reliable correspondences, but the three-dimensional points reconstructed from them are inaccurate. Conversely, those obtained from a stereo image pair with a long baseline contain some incorrect correspondences, but the three-dimensional points reconstructed from the correct ones are highly accurate.

Therefore, let (Xb, Yb, Zb) be the three-dimensional point reconstructed from the corresponding point data with the shortest baseline among the K sets. Among the sets whose reconstructed three-dimensional point satisfies Equation 4 below, the three-dimensional point reconstructed from the set with the longest baseline is adopted as the three-dimensional point of that pixel.

Here E is a threshold that can be set freely by experiment. A means for entering the value of E (for example, an input device such as a keyboard) may therefore be provided.
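Equation 4 is not reproduced in this text either. Assuming it requires a candidate point to lie within Euclidean distance E of (Xb, Yb, Zb), the per-pixel selection could look like the sketch below; the distance test, the function name and the (baseline, point) input format are all assumptions, and the sample values are invented.

```python
import math


def select_point(candidates, E):
    """Choose one 3-D point for a pixel from K (baseline, point) candidates.

    The point from the shortest baseline is the stable reference (Xb, Yb, Zb).
    Among candidates within distance E of it (assumed form of Equation 4),
    the one from the longest baseline is returned. E must be positive.
    """
    if not candidates:
        return None
    _, ref_point = min(candidates, key=lambda c: c[0])
    accepted = [c for c in candidates if math.dist(c[1], ref_point) < E]
    return max(accepted, key=lambda c: c[0])[1]


cands = [(0.05, (0.0, 0.0, 2.10)),   # shortest baseline: stable but coarse
         (0.10, (0.0, 0.0, 2.02)),   # agrees with the reference within E
         (0.20, (0.0, 0.0, 3.50))]   # longest baseline, but likely a mismatch
print(select_point(cands, E=0.2))    # (0.0, 0.0, 2.02)
```

The short-baseline point acts only as a sanity check; the accepted long-baseline point supplies the accuracy.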

Applying Equation 4 to every pixel of every reference frame reconstructs a dense three-dimensional point group. The reconstructed point group can be expressed in the camera coordinate system of the first frame of the moving observation image sequence.

Performing this reconstruction on each of the moving observation image sequences Im₁ and Im₂ yields the three-dimensional structure of the environment as observed by camera C1 and by camera C2, respectively.

Because the three-dimensional point groups obtained from the two moving observation image sequences are expressed in different coordinate systems, the three-dimensional point group integration means 14 integrates the point groups obtained from the cameras into a common world coordinate system (S207).

This integration is achieved by transforming the point group expressed in one coordinate system into the other coordinate system.

In this embodiment, the three-dimensional point group 3D₁ reconstructed from the moving observation image sequence Im₁ is transformed into the coordinate system of the three-dimensional point group 3D₂ reconstructed from Im₂.

The coordinate transformation matrix T₁ can be computed from a correspondence table of identical three-dimensional points expressed in the two coordinate systems, such as the corresponding point table T2 in FIG. 6, for example with the point-group coordinate transformation method described earlier; at least three three-dimensional points are required.

3D₁ is expressed in the camera coordinate system of the first frame of Im₁, and 3D₂ in the camera coordinate system of the first frame of Im₂. Therefore, during image selection, the first frames of Im₁ and Im₂ are used as reference frames for stereo processing.

Because each pixel of a reference frame corresponds one-to-one to a reconstructed three-dimensional point, corresponding points are obtained between the first-frame images of Im₁ and Im₂, and a correspondence table of three-dimensional points is built from them. This table is used to compute the 4×4 coordinate transformation matrix T₁, and every point of 3D₁ is transformed with Equation 5.
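The text refers back to an earlier transformation method and, in its list of modifications, notes that the integration uses quaternions. One well-known quaternion approach is Horn's closed-form absolute-orientation method; the sketch below uses it to estimate a 4×4 matrix like T₁ from corresponding 3D points and applies it in homogeneous form, which is presumably what Equation 5 expresses. It is an illustration, not necessarily the exact procedure the patent cites; to stay dependency-free, the dominant eigenvector is found by a simple shifted power iteration. The correspondences in the example are synthetic.

```python
import math


def _centroid(pts):
    return [sum(p[k] for p in pts) / len(pts) for k in range(3)]


def rigid_transform(src, dst, iters=2000):
    """4x4 matrix T (rotation + translation) that best maps src onto dst.

    Horn's closed-form unit-quaternion method: the optimal rotation is the
    eigenvector of the 4x4 matrix N with the largest eigenvalue. Needs at
    least three non-collinear point correspondences.
    """
    cs, cd = _centroid(src), _centroid(dst)
    S = [[0.0] * 3 for _ in range(3)]               # cross-covariance of centred points
    for p, q in zip(src, dst):
        a = [p[k] - cs[k] for k in range(3)]
        b = [q[k] - cd[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                S[i][j] += a[i] * b[j]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Shifting by an upper bound on |eigenvalue| makes N + shift*I positive
    # definite, so power iteration finds the largest (signed) eigenvalue.
    shift = sum(abs(e) for row in N for e in row) + 1.0
    q = [1.0, 0.1, 0.01, 0.001]
    for _ in range(iters):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(e * e for e in q))
        q = [e / norm for e in q]
    w, x, y, z = q                                  # unit quaternion -> rotation matrix
    R = [[w*w + x*x - y*y - z*z, 2 * (x*y - w*z), 2 * (x*z + w*y)],
         [2 * (x*y + w*z), w*w - x*x + y*y - z*z, 2 * (y*z - w*x)],
         [2 * (x*z - w*y), 2 * (y*z + w*x), w*w - x*x - y*y + z*z]]
    t = [cd[i] - sum(R[i][k] * cs[k] for k in range(3)) for i in range(3)]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def apply_transform(T, p):
    """(X, Y, Z) -> (X_n, Y_n, Z_n) using homogeneous coordinates."""
    ph = list(p) + [1.0]
    return [sum(T[i][k] * ph[k] for k in range(4)) for i in range(3)]


# Synthetic correspondences standing in for a table like T2:
src = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
c30, s30 = math.cos(math.radians(30)), math.sin(math.radians(30))
dst = [(c30 * x - s30 * y + 0.5, s30 * x + c30 * y - 1.0, z + 2.0) for x, y, z in src]
T1 = rigid_transform(src, dst)
print(apply_transform(T1, (1, 1, 1)))  # close to dst[-1]
```

An SVD-based solver (Kabsch/Umeyama) yields the same least-squares rotation and is the usual choice when a linear-algebra library is available.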

Here (X, Y, Z) is a three-dimensional point of 3D₁ before transformation, and (X_n, Y_n, Z_n) is the same point after transformation, expressed in the coordinate system of 3D₂.

After this transformation, all three-dimensional points are expressed in the camera coordinate system of the first frame of Im₂.

Next, the floor detection means 15 detects the three-dimensional points that represent the floor in the reconstructed three-dimensional environment.

The floor (that is, the reference plane) is detected, for example, as the plane of largest area represented by the reconstructed three-dimensional point group.

This plane can be found with a three-dimensional Hough transform (S208), which yields its plane equation ax + by + cz + d = 0.
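A coarse three-dimensional Hough transform for the dominant plane might look like the following. The sampling of normals over a hemisphere (θ, φ), the offset bin width and the sample points are illustrative assumptions, not values from the source; the cell with the most votes gives (a, b, c, d).

```python
import math
from collections import Counter


def hough_plane(points, n_theta=10, n_phi=36, rho_step=0.05):
    """Dominant plane a*x + b*y + c*z + d = 0 by a coarse 3-D Hough transform.

    Every point votes, for each sampled unit normal n(theta, phi) on the upper
    hemisphere, for the quantised offset rho = n . p. The (normal, rho) cell
    with the most votes is returned as (a, b, c, d) with d = -rho.
    """
    votes = Counter()
    normals = {}
    for i in range(n_theta):
        theta = (math.pi / 2) * i / (n_theta - 1)
        for j in range(n_phi):
            phi = 2 * math.pi * j / n_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            normals[(i, j)] = n
            for p in points:
                rho = n[0] * p[0] + n[1] * p[1] + n[2] * p[2]
                votes[(i, j, round(rho / rho_step))] += 1
    (i, j, k), _ = votes.most_common(1)[0]
    a, b, c = normals[(i, j)]
    return a, b, c, -k * rho_step


floor = [(0.1 * i, 0.1 * j, 0.5) for i in range(10) for j in range(10)]
clutter = [(0.2, 0.3, 1.5), (0.7, 0.1, 2.0), (0.4, 0.8, -1.0)]
print(hough_plane(floor + clutter))  # normal close to (0, 0, 1), d close to -0.5
```

Here "largest area" is approximated by the plane supported by the most points; finer angular sampling trades speed for precision.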

In step S209, the three-dimensional point group is transformed into a coordinate system in which the floor is the XY plane and the height direction is the Z axis. Since the floor's plane equation ax + by + cz + d = 0 was obtained in step S208, the floor normal before the transformation is the vector (a, b, c).

To obtain a coordinate system in which the floor is the XY plane and the height direction is the Z axis, the transformation must map the floor normal onto the vector (0, 0, 1). The 3×3 transformation matrix T_p that does this can be computed with Equation 6 below.

Once T_p has been computed, each three-dimensional point is transformed with Equation 7 below.

Here (Xc, Yc, Zc) is the point (X, Y, Z) transformed by the matrix T_p.
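Equation 6 is not reproduced here. One standard way to build a rotation taking the normal (a, b, c) onto (0, 0, 1) is Rodrigues' formula, sketched below as an assumption-labelled illustration rather than as the patent's Equation 6. The second function applies the rotation in the manner of Equation 7.

```python
import math


def floor_alignment(a, b, c):
    """3x3 rotation T_p that maps the floor normal (a, b, c) onto (0, 0, 1).

    Rodrigues' formula: R = I + [v]x + [v]x^2 (1 - cos) / sin^2 with
    v = n x (0, 0, 1).
    """
    norm = math.sqrt(a * a + b * b + c * c)
    n = (a / norm, b / norm, c / norm)
    v = (n[1], -n[0], 0.0)              # n x (0, 0, 1), the rotation axis
    sin2 = v[0] * v[0] + v[1] * v[1]    # squared sine of the rotation angle
    cos_a = n[2]                        # n . (0, 0, 1)
    if sin2 < 1e-12:                    # normal already parallel to +/-Z
        if cos_a > 0:
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]  # 180 deg about X
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    f = (1.0 - cos_a) / sin2
    return [[(1.0 if i == j else 0.0) + K[i][j] + f * K2[i][j] for j in range(3)]
            for i in range(3)]


def transform(Tp, p):
    """(X, Y, Z) -> (Xc, Yc, Zc) = T_p (X, Y, Z)."""
    return [sum(Tp[i][k] * p[k] for k in range(3)) for i in range(3)]


Tp = floor_alignment(0.0, 1.0, 0.0)     # floor normal along +Y
print(transform(Tp, (0.0, 1.0, 0.0)))   # close to [0, 0, 1]
```

Because T_p maps the unit normal onto the Z axis, every floor point ends up at the same height Zc = -d/‖(a, b, c)‖. The rotation alone makes the floor parallel to the XY plane; subtracting that constant would place it at Z = 0.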

In this way, the reconstructed three-dimensional point group is transformed into a coordinate system that uses the floor as its reference plane.

Next, the noise removal means 16 removes erroneous points from the reconstructed three-dimensional point group.

Stereo processing generally yields a dense three-dimensional point group, but the reconstructed points inevitably include errors. A voting process is performed to remove them (S210).

Because objects in the environment can be regarded as sets of points at particular heights above the floor, erroneous points can be removed by voting along the height direction. The voting proceeds as follows.

The space is divided into unit cubic cells (a voxel representation); for each reconstructed three-dimensional point, the cell it falls into is determined and the point is registered in that cell.

After all points have been registered, the points that share the same XY coordinates in the voxel space vote on their Z coordinate.

The Z coordinate with the most votes is adopted as the Z coordinate for that XY position, and the remaining points are deleted.

Repeating this vote for every XY position removes the erroneous points from the three-dimensional point group.
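The voxel voting can be sketched as below. The voxel size, the sample points, and the choice to keep all points in the winning Z cell (rather than snapping them to a single Z value) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vote_remove_noise(points, voxel=0.05):
    """Height-direction voting over a voxel grid.

    Points are binned into cubic voxels of edge `voxel`. Within each XY
    column, the Z cell holding the most points wins; points in every other
    Z cell of that column are discarded as errors.
    """
    columns = defaultdict(list)
    for p in points:
        ix, iy, iz = (math.floor(v / voxel) for v in p)
        columns[(ix, iy)].append((iz, p))
    kept = []
    for entries in columns.values():
        best_iz, _ = Counter(iz for iz, _ in entries).most_common(1)[0]
        kept.extend(p for iz, p in entries if iz == best_iz)
    return kept


pts = [(0.01, 0.01, 1.01), (0.02, 0.03, 1.02), (0.03, 0.02, 1.03),
       (0.02, 0.02, 3.01),   # stray point in the same XY column
       (0.51, 0.51, 0.21)]
print(vote_remove_noise(pts))  # the 3.01 point is gone
```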

To link the reconstructed three-dimensional environment information with the camera coordinate systems, the relationship between the camera coordinate system of camera C1 and the coordinate system of the environment information is expressed by Equation 8 below.

The matrix T_p′ is the matrix T_p with one row appended, as given by Equation 9 below.

The relationship with the camera coordinate system of camera C2 is expressed by Equation 9 below.

Here (X_CAM1, Y_CAM1, Z_CAM1) is the camera coordinate system of camera C1, (X_CAM2, Y_CAM2, Z_CAM2) is the camera coordinate system of camera C2, and (X_3D, Y_3D, Z_3D) is the coordinate system of the three-dimensional environment information.

Coordinates can thus be converted between the camera coordinate system of camera C1 and the coordinate system of the three-dimensional environment information (the world coordinate system), and likewise between the camera coordinate system of camera C2 and that coordinate system.

This embodiment can also be realized by implementing some or all of the functions of the means of the three-dimensional environment information acquisition apparatus in FIG. 1 as a computer program and executing that program on a computer.

Likewise, the processing procedure of the three-dimensional environment information acquisition method in FIG. 3 can of course be implemented as a computer program and executed by a computer.

A program that realizes these functions on a computer can be recorded on a computer-readable recording medium (for example, an FD (Floppy (registered trademark) Disk), MO (Magneto-Optical disk), ROM (Read Only Memory), memory card, CD (Compact Disk), DVD (Digital Versatile Disk), or removable disk) for storage or distribution.

The program can also be provided over a network (for example, via the Internet or electronic mail).

As described above, this embodiment acquires a three-dimensional environment automatically, simply by capturing images while moving, without the expensive equipment that conventional approaches require.

Moreover, after noise removal, the acquired three-dimensional information is dense and accurate enough to be converted into polygons directly.

Furthermore, when the cameras used to acquire the three-dimensional environment information are reused for processing such as person tracking, the coordinate systems of the cameras and of the environment information are already calibrated to each other. Unlike conventional methods, there is no need to place and recognize markers in the environment, which saves considerable labor.

Next, an example of this embodiment is described with reference to FIGS. 7 to 12.

Specifically, the example applies the processing flow of FIG. 3, using the three-dimensional environment information acquisition apparatus described above, to moving observation images acquired from two video cameras (camera C1 and camera C2).

When processing starts, moving observation image sequences are acquired (S201). FIG. 7 shows the vertically movable camera pan head 70 used to acquire them (images 701 and 702 show the head moving the camera; it is hereafter called the vertical-movement camera pan head), together with part of an acquired moving observation image sequence (images 703 to 705).

The vertical-movement camera pan head 70 can move a camera vertically by about 20 cm. Images were captured while the head moved from its highest position (image 701) to its lowest position (image 702).

Camera C1 and camera C2 each captured images in this way, producing the moving observation image sequences Im₁ and Im₂ (for example, images 703 to 705).

In step S202, feature points are tracked.

FIG. 8 shows part of the tracking results (images 801 to 803), arranged in capture order starting from image 801.

The middle image is the one at the center of the chronologically ordered moving observation image sequence; feature points based on the Harris feature were generated inside the rectangular frame in that image.

The generated feature points are then tracked both backward and forward in time, establishing correspondences across all frames of the moving observation image sequence.

This feature point tracking is performed for both moving observation image sequences Im₁ and Im₂.

In step S203, a measurement matrix is built from the feature point tracking results of step S202, and the camera motion is estimated by the factorization method.

In step S204, stereo images are selected. In this example, one frame captured with the vertical-movement camera pan head 70 at its lowest position is used as the reference frame, the frame captured with the head at its highest position is used as a comparison frame, and further comparison frames are selected from the frames between them at the interval d.

In step S205, stereo processing is performed to obtain corresponding point data, and in step S206 the three-dimensional point group is reconstructed.

Steps S203 to S206 are performed for both moving observation image sequences Im₁ and Im₂.

In step S207, the three-dimensional point groups reconstructed from Im₁ and Im₂ are integrated.

FIG. 9 shows an example of two three-dimensional point groups integrated into the world coordinate system: point group 901 was reconstructed with camera C1, point group 902 with camera C2, and point group 903 is the result of integrating 901 and 902 in the world coordinate system.

In step S208, the points representing the floor are detected in the point group integrated in step S207, as the plane of largest area.

In step S209, the point group is transformed into a coordinate system in which the detected floor is the XY plane and the height direction is the Z axis. Point group 1001 in FIG. 10 is an example of the three-dimensional point group after this transformation.

In step S210, voting along the height direction removes noise. Point group 1101 in FIG. 11 is an example of the noise-free three-dimensional point group (that is, the three-dimensional environment).

From the noise-free three-dimensional environment information, polygon data (a polygon model) can be generated, for example by connecting neighboring points with triangular patches. Reference numerals 1201 and 1202 in FIG. 12 show examples of polygon models generated from the acquired three-dimensional environment information.
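After the voting step, each occupied XY column holds points in a single Z cell, so the result can be treated as a height map over the XY grid. Under that assumption, the sketch below connects neighboring occupied cells with triangular patches; the function name and the grid indexing are hypothetical, and the source does not specify its meshing procedure.

```python
def grid_triangles(occupied):
    """Triangles over a 2-D grid of occupied cells, as index triples.

    occupied[r][c] is True where a surface point survives; cell (r, c) has
    index r * cols + c. Each 2x2 block of occupied cells gives two triangles.
    """
    rows, cols = len(occupied), len(occupied[0])

    def idx(r, c):
        return r * cols + c

    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            if (occupied[r][c] and occupied[r][c + 1]
                    and occupied[r + 1][c] and occupied[r + 1][c + 1]):
                tris.append((idx(r, c), idx(r + 1, c), idx(r, c + 1)))
                tris.append((idx(r, c + 1), idx(r + 1, c), idx(r + 1, c + 1)))
    return tris


grid = [[True, True, True],
        [True, True, False]]
print(grid_triangles(grid))  # [(0, 3, 1), (1, 3, 4)]
```

Each vertex index maps back to the surviving 3-D point of that XY cell, so the triangles can be exported directly as polygon data.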

As described above, this example shows that three-dimensional environment information can be constructed simply and automatically.

Although the present invention has been described in detail only with respect to the specific examples above, it will be apparent to those skilled in the art that various changes and modifications are possible within the scope of its technical idea, and such changes and modifications naturally fall within the scope of the claims.

For example, the following modifications of this embodiment are conceivable.

This embodiment uses the factorization method in the camera motion estimation means 12, but any means that estimates camera motion from moving observation images can be used, for example sequential projective reconstruction.

In this embodiment, the point groups were integrated with a quaternion-based method applied to the reconstructed three-dimensional point groups; alternatively, the inter-camera motion parameters (the camera extrinsic parameters) can simply be calibrated in advance and used for the integration.

This embodiment was described with two cameras, but more cameras can be handled in the same way by successively computing each camera's transformation matrix relative to a reference camera.

FIG. 1 is a block diagram of the three-dimensional environment information acquisition apparatus in this embodiment.
FIG. 2 is a model diagram of the target of three-dimensional environment information acquisition in this embodiment.
FIG. 3 is a flowchart of the processing procedure of the three-dimensional environment information acquisition method in this embodiment.
FIG. 4 shows an example of the camera motion information in this embodiment.
FIG. 5 shows an example of how stereo images are selected from a moving observation image sequence in this embodiment.
FIG. 6 shows an example of a corresponding point table of three-dimensional points expressed in different coordinate systems in this embodiment.
FIG. 7 shows (A) the vertically movable camera pan head and (B) part of an acquired moving observation image sequence in an example of this embodiment.
FIG. 8 shows the results of tracking feature points through a moving observation image sequence in this example.
FIG. 9 shows the three-dimensional point group reconstructed from camera C1, the point group reconstructed from camera C2, and the point group obtained by integrating the two into the world coordinate system in this example.
FIG. 10 shows the integrated three-dimensional point group of this example after transformation into a coordinate system whose XY plane is the floor.
FIG. 11 shows the three-dimensional point group after noise removal by the voting process in an example of this embodiment.
FIG. 12 shows the three-dimensional environment information of an example of this embodiment expressed as polygon data.

Explanation of symbols

11 … Moving observation image sequence acquisition means
12 … Camera motion estimation means
13 … Three-dimensional point group acquisition means
14 … Three-dimensional point group integration means
15 … Floor detection means
16 … Noise removal means
70 … Vertical-movement camera pan head
501 … Reference frame
502 … Comparison frame
503 … Moving observation image sequence
701, 702 … Images of the vertical-movement camera pan head
703, 704, 705 … Moving observation images
801, 802, 803 … Images showing feature point tracking results
901 … Three-dimensional point group reconstructed using camera C1
902 … Three-dimensional point group reconstructed using camera C2
903 … Three-dimensional point group obtained by integrating point groups 901 and 902 in the world coordinate system
1001 … Three-dimensional point group after coordinate transformation
1101 … Noise-removed three-dimensional point group
1201, 1202 … Polygon models generated from three-dimensional environment information
BG … Three-dimensional structure
d … Frame interval
T1 … Camera motion table
T2 … Corresponding point table

Claims

A three-dimensional environment information acquisition device that acquires an image of a subject from a plurality of imaging devices and acquires three-dimensional environment information about the subject from the captured images.
moving observation image sequence acquisition means for acquiring, for each of the plurality of imaging devices, a moving observation image sequence obtained by imaging the subject while moving that imaging device;
An imaging device motion estimation means for estimating the time taken for each image of the moving observation image sequence of each imaging apparatus that acquires an imaging device motion of each of the imaging device,
Based on the respective imaging device motion so acquired, the three-dimensional point group obtaining means for obtaining a three-dimensional point group expressed in the coordinate system of the imaging device relating to the subject,
3D point group integration means for integrating the 3D point group in each imaging device acquired using the moving observation image sequence into the world coordinate system using the coordinate system of any one imaging device as the world coordinate system;
From the integrated three-dimensional point group , a reference surface detecting means for detecting a reference surface region that is a plane having the maximum area represented by the three-dimensional point group , and
Based on information on the detected reference plane area, the Z coordinate obtained from the integrated three-dimensional point group by the voting process of the Z coordinate with the reference plane as the XY plane and the height direction as the Z axis Noise removal means for removing error points other than
A three-dimensional environment information acquisition apparatus comprising:
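The Z-coordinate voting performed by the noise removal means can be illustrated with a minimal Python sketch. It assumes the integrated point group has already been transformed so that the detected reference plane is the XY plane (z = 0) and Z is the height direction. The histogram bin width `bin_width`, the vote threshold `min_votes`, and the rule of keeping every point that falls in a sufficiently voted bin are hypothetical choices for illustration, not parameters stated in the claims.

```python
import numpy as np


def remove_noise_by_z_voting(points, bin_width=0.05, min_votes=10):
    """Keep points whose height falls in a well-supported Z bin.

    Assumes `points` is an (N, 3) array in a frame whose XY plane is the
    detected reference plane and whose Z axis is the height direction.
    `bin_width` and `min_votes` are hypothetical tuning parameters.
    """
    points = np.asarray(points, dtype=float)
    # Each point casts one vote for the Z bin its height falls into.
    bins = np.floor(points[:, 2] / bin_width).astype(int)
    _, inverse, counts = np.unique(bins, return_inverse=True, return_counts=True)
    # Points in bins with enough votes are treated as real structure;
    # points in sparsely voted bins are treated as error points and dropped.
    keep = counts[inverse] >= min_votes
    return points[keep]
```

Voting on height alone is cheap and exploits the fact that, once the floor is the XY plane, horizontal surfaces such as the floor and tabletops concentrate many points at a few heights, while isolated mismatches do not.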

The three-dimensional environment information acquisition apparatus according to claim 1, wherein the three-dimensional point group acquisition means:
selects images to be used for stereo processing from the moving observation image sequence;
performs stereo processing between the selected images; and
restores three-dimensional points using the result of the stereo processing and the imaging device motion of each imaging device estimated by the imaging device motion estimation means.
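The image selection for stereo processing can be sketched in Python using the reference frame (501), comparison frame (502), and frame interval d listed among the reference signs. This sketch assumes the simplest possible rule, pairing each reference frame with the frame exactly d frames later; the function name `select_stereo_pairs` and the `step` parameter (spacing between successive reference frames) are hypothetical.

```python
def select_stereo_pairs(num_frames, d, step=1):
    """Return (reference_index, comparison_index) pairs for stereo processing.

    Hypothetical rule: each reference frame is paired with the comparison
    frame `d` frames later; reference frames are taken every `step` frames.
    """
    if d < 1 or step < 1:
        raise ValueError("d and step must be positive integers")
    # The last usable reference frame is num_frames - d - 1, so that the
    # comparison frame index stays inside the sequence.
    return [(ref, ref + d) for ref in range(0, num_frames - d, step)]
```

For example, `select_stereo_pairs(6, 2)` yields `[(0, 2), (1, 3), (2, 4), (3, 5)]`. A larger d widens the baseline between the two views, which generally improves depth precision but makes correspondence search harder.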

The three-dimensional environment information acquisition apparatus according to claim 1 or 2, wherein the reference plane detection means detects, by a three-dimensional Hough transform, the plane region having the maximum area in the integrated three-dimensional point group and regards the detected plane region as the reference plane region.
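A three-dimensional Hough transform for planes can be sketched in Python as follows. The sketch parameterizes a plane as n · p = rho, samples unit normals n on the upper hemisphere by polar angle theta and azimuth phi, and quantizes rho. It assumes that the number of votes a plane receives is an adequate proxy for its area, which holds roughly when points are sampled at similar density over the scene; the function name `hough_plane` and the resolution parameters `n_theta`, `n_phi`, and `rho_step` are hypothetical.

```python
import numpy as np


def hough_plane(points, n_theta=30, n_phi=60, rho_step=0.05):
    """Find the best-supported plane n . p = rho with a 3D Hough transform.

    Returns (normal, rho, votes) for the accumulator cell with the most
    votes. Resolution parameters are hypothetical choices.
    """
    points = np.asarray(points, dtype=float)
    thetas = np.linspace(0.0, np.pi / 2, n_theta)
    phis = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    T, F = np.meshgrid(thetas, phis, indexing="ij")
    normals = np.stack(
        [np.sin(T) * np.cos(F), np.sin(T) * np.sin(F), np.cos(T)], axis=-1
    ).reshape(-1, 3)  # (M, 3) candidate unit normals
    # For every candidate normal, each point votes for the quantized
    # distance rho = n . p of the plane passing through it.
    rho_bins = np.round(points @ normals.T / rho_step).astype(int)  # (N, M)
    offset = rho_bins.min()
    acc = np.zeros((normals.shape[0], rho_bins.max() - offset + 1), dtype=int)
    normal_idx = np.broadcast_to(np.arange(normals.shape[0]), rho_bins.shape)
    # np.add.at accumulates correctly even when indices repeat.
    np.add.at(acc, (normal_idx, rho_bins - offset), 1)
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return normals[i], (j + offset) * rho_step, int(acc[i, j])
```

The accumulator holds one cell per (normal, rho) pair, so memory grows with N × M; a practical implementation would likely subsample points or refine the peak with a coarse-to-fine search.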

A three-dimensional environment information acquisition method for acquiring images of a subject captured by a plurality of imaging devices and acquiring three-dimensional environment information about the subject from the captured images, the method comprising:
a moving observation image sequence acquisition step of acquiring, for each of the plurality of imaging devices, a moving observation image sequence obtained by imaging the subject while that imaging device is moved;
an imaging device motion estimation step of estimating, for each imaging device, the imaging device motion at the time each image of its moving observation image sequence was taken;
a three-dimensional point group acquisition step of acquiring, on the basis of each imaging device motion thus obtained, a three-dimensional point group of the subject expressed in the coordinate system of the corresponding imaging device;
a three-dimensional point group integration step of integrating the three-dimensional point groups acquired for the respective imaging devices from their moving observation image sequences into a world coordinate system, the coordinate system of any one of the imaging devices being used as the world coordinate system;
a reference plane detection step of detecting, from the integrated three-dimensional point group, a reference plane region, which is the plane of maximum area represented by the three-dimensional point group; and
a noise removal step of removing error points from the integrated three-dimensional point group on the basis of information on the detected reference plane region, by taking the reference plane as the XY plane and the height direction as the Z axis, voting on the Z coordinates, and removing points other than those at the Z coordinates obtained by the voting.
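The integration step can be sketched in Python as a rigid transformation of each camera's point group into the frame of the camera chosen as the world coordinate system, followed by concatenation. The claims do not say how the relative pose between imaging devices is obtained; this sketch simply assumes a known rotation R and translation t for each camera (for example from calibration of the camera rig), with X_w = R X_c + t. The function names are hypothetical.

```python
import numpy as np


def to_world(points_cam, R, t):
    """Map (N, 3) points from a camera frame into the world frame: X_w = R X_c + t."""
    points_cam = np.asarray(points_cam, dtype=float)
    # Row-vector form of R @ X for every point, then add the translation.
    return points_cam @ np.asarray(R, dtype=float).T + np.asarray(t, dtype=float)


def integrate_point_groups(groups):
    """Merge per-camera point groups into one world-frame point group.

    `groups` is a list of (points, R, t) tuples with assumed-known poses.
    The camera whose frame serves as the world frame uses R = I, t = 0.
    """
    return np.vstack([to_world(p, R, t) for p, R, t in groups])
```

With camera C1 as the world frame, point group 901 passes through unchanged and point group 902 is transformed, giving an integrated point group in the manner of 903.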

The three-dimensional environment information acquisition method according to claim 4, wherein the imaging device motion estimation step comprises:
a step of tracking feature points extracted in an arbitrary image of the moving observation image sequence through all the other images, creating a measurement matrix from the feature point tracking results, and obtaining the imaging device motion by a factorization method; and
a step of tracking feature points extracted in an arbitrary image of the moving observation image sequence through all the other images, and obtaining the imaging device motion by sequential projective reconstruction using the feature point tracking results.
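The factorization alternative can be illustrated with a minimal Python sketch of the well-known Tomasi-Kanade style affine factorization; the claims do not specify which factorization variant is used. The measurement matrix W is 2F × P, with rows 2f and 2f + 1 holding the u and v coordinates of the P tracked feature points in frame f. After subtracting each row's mean, W has rank at most 3 under an affine camera model, and a truncated SVD splits it into motion and shape. The metric upgrade that resolves the remaining affine ambiguity, and the sequential projective reconstruction alternative, are not shown. The function name `factorize` is hypothetical.

```python
import numpy as np


def factorize(W):
    """Affine factorization of a 2F x P measurement matrix.

    Returns motion M (2F x 3), shape S (3 x P), and per-row translations t
    with W ~= M @ S + t[:, None], up to an affine ambiguity (no metric
    upgrade is performed in this sketch).
    """
    W = np.asarray(W, dtype=float)
    t = W.mean(axis=1)            # image of the feature-point centroid per row
    W0 = W - t[:, None]           # registered matrix, rank <= 3 without noise
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root           # camera (motion) factor
    S = root[:, None] * Vt[:3]    # 3D structure factor
    return M, S, t
```

With noisy tracks, truncating to the three largest singular values gives the best rank-3 approximation of the registered matrix in the least-squares sense, which is why the SVD is the natural tool here.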

The three-dimensional environment information acquisition method according to claim 4 or 5, wherein the three-dimensional point group acquisition step comprises:
selecting images to be used for stereo processing from the moving observation image sequence;
performing stereo processing between the selected images; and
restoring three-dimensional points using the result of the stereo processing and the imaging device motion obtained in the imaging device motion estimation step.
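Restoring a three-dimensional point from a stereo correspondence and known imaging device motion can be sketched in Python with standard linear (DLT) triangulation; the claims do not state which reconstruction method is used, so this is a substitute technique for illustration. The sketch assumes the estimated motion for each selected image has been turned into a 3 × 4 projection matrix P = K[R | t] with known intrinsics K, and that x1 and x2 are matching pixel coordinates from the stereo processing. The function name `triangulate_point` is hypothetical.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 are assumed 3x4 projection matrices; x1, x2 are matching
    pixel coordinates (u, v) in the two images.
    """
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    u1, v1 = x1
    u2, v2 = x2
    # Each view gives two linear equations in the homogeneous point X:
    # u * (P[2] @ X) - P[0] @ X = 0 and v * (P[2] @ X) - P[1] @ X = 0.
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector of the smallest
    # singular value of A, then dehomogenize.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

Applying this to every matched pixel pair between a reference frame and a comparison frame yields a point group in the camera's coordinate system, such as 901 or 902.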

The three-dimensional environment information acquisition method according to any one of claims 4 to 6, wherein the reference plane detection step detects, by a three-dimensional Hough transform, the plane region having the maximum area in the integrated three-dimensional point group and regards the detected plane region as the reference plane region.

A computer-readable recording medium recording a program for executing the moving observation image sequence acquisition step, the imaging device motion estimation step, the three-dimensional point group acquisition step, the three-dimensional point group integration step, the reference plane detection step, and the noise removal step of the three-dimensional environment information acquisition method according to any one of claims 4 to 7.