JP3759712B2

JP3759712B2 - Camera parameter estimation method, apparatus, program, and recording medium

Info

Publication number: JP3759712B2
Application number: JP2001357890A
Authority: JP
Inventors: 勲宮川; 裕治石川; 佳織若林; 知彦有川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-11-22
Filing date: 2001-11-22
Publication date: 2006-03-29
Anticipated expiration: 2021-11-22
Also published as: JP2003156317A

Description

[0001]
[Technical field of the invention]
The present invention relates to estimating the posture (pitch, roll, yaw), or rotation parameters, of the perspective-projection viewpoint of an image input device when the three-dimensional shape or structure of an object is measured, acquired, or restored from time-series image data acquired by the image input device or the like, and more particularly to a camera posture estimation method in computer vision.
[0002]
[Prior art]
Advances in remote sensing technology have produced highly accurate sensor devices. Recently, a method of measuring posture with a plurality of high-precision differential GPS (Global Positioning System) devices has been put into practical use, and posture can now be sensed to within 0.1 degrees. However, even though sensor accuracy has improved, the sensed values do not correspond one-to-one with the video: because of delays between devices and because the sensor is not synchronized with the image rate, the measurements do not coincide with the time codes that manage the images. In an environment such as aerial photography in particular, when abrupt changes occur due to gusts, building winds, weather conditions, and the like, a discrepancy appears between the images and the sensed posture. FIG. 6 illustrates the mismatch between the values measured by a sensor device and the image rate. The sensor device measures the position at fixed intervals, and at that rate the images and measured values are recorded in one-to-one correspondence by time code or the like. In general, however, the image rate and the sensor rate often differ, so the posture at which each time-series image was acquired is not necessarily sensed. In surveying, therefore, the sensor values are matched to the image rate by an interpolation method such as linear interpolation. The points shown as white circles in FIG. 6 are linearly interpolated, so they may be less accurate postures than the points shown as black circles, and a point in the image that shows a piece of spatial information may differ from that information's projection onto the image. This mismatch between posture and image degrades the accuracy of methods that acquire three-dimensional coordinates from images and sensor data.
Here, "rate" refers to the number of data items per second. That is, at the image rate, data is acquired at 30 frames per second, whereas at the sensor rate, data can be acquired only once per second (in the case of differential GPS). In this specification, 30 still images per second and one GPS measurement per second are each expressed as a rate.
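
The rate mismatch described above can be illustrated with a minimal sketch. It assumes a single posture angle sampled once per second and a camera running at 30 frames per second; the function name, the sample values, and the simulated gust are hypothetical and are not taken from the patent.

```python
import numpy as np

def interpolate_sensor_to_frames(sensor_times, sensor_angles, frame_times):
    """Linearly interpolate one posture angle (degrees) at each frame time."""
    return np.interp(frame_times, sensor_times, sensor_angles)

# Hypothetical data: sensor at 1 sample per second, camera at 30 frames per second.
sensor_times = np.array([0.0, 1.0, 2.0])
sensor_pitch = np.array([0.0, 3.0, 0.0])      # measured values (black circles in FIG. 6)
frame_times = np.arange(0, 61) / 30.0         # 61 frames covering 2 seconds
frame_pitch = interpolate_sensor_to_frames(sensor_times, sensor_pitch, frame_times)

# A hypothetical gust between two sensor samples is invisible to the interpolation.
true_pitch = frame_pitch.copy()
true_pitch[40:45] += 2.0
max_error = np.max(np.abs(true_pitch - frame_pitch))
```

In this sketch the interpolated values agree with the sensor at the sampled instants, but the short disturbance between samples produces an error that no interpolation of the sensor data can remove. Angles that wrap around (for example near 360 degrees) would need extra handling that is omitted here.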
[0003]
On the other hand, in the field of computer vision, three-dimensional analysis methods based on stereo measurement and epipolar analysis are used to measure or acquire the shape of an object from time-series image data. More recently, the factorization method has become a representative method for simultaneously measuring or acquiring the camera motion and three-dimensional information about the shape of an object. With these methods, information about the three-dimensional shape and the posture parameters (rotation parameters) of the camera viewpoint can be acquired and restored from a plurality of time-series images in which the object is captured. However, when the camera vibrates under the influence of the shooting environment, the resulting aerial images form a noisy time series, and the noise in the aerial video can make it difficult to estimate the camera posture and restore the camera motion.
[0004]
Japanese Patent Application No. 10-255338, "Camera posture estimation method, recording medium recording its processing procedure, and camera posture estimation device", describes a camera posture estimation method that uses object elevation data, a two-dimensional digital map, and camera position information. However, there is no guarantee that multiple kinds of data are always available, as that invention requires, and the amount of data becomes enormous.
[0005]
[Problems to be solved by the invention]
In general, the image input rate of the image input device is higher than the sensor rate of the sensor device, so camera parameters such as the camera posture or rotation parameters, which describe the camera motion, cannot be obtained for every time-series image captured by the image input device. Consequently, images associated with accurate camera parameters measured by the sensor device (hereinafter referred to as parameter known images) normally make up only a fixed proportion of the captured time-series images. One approach is to interpolate the camera parameters of an image that has no associated camera parameters (hereinafter referred to as a parameter undetermined image) from the camera parameters associated with the parameter known images, and to associate the result with the parameter undetermined image (an image associated with camera parameters obtained by interpolating known camera parameters is hereinafter referred to as a parameter interpolated image). However, in aerial images in which the scene shakes under the influence of gusts, building winds, and the like, the camera's shooting posture fluctuates subtly or in bursts, so the interpolated camera parameters are inaccurate.
[0006]
An object of the present invention is to provide a camera parameter estimation method and apparatus that estimate accurate camera parameters of a parameter undetermined image when a plurality of reference points whose accurate three-dimensional coordinate values are known (hereinafter referred to as a reference point group) exist in the captured time-series images.
[0007]
Another object of the present invention is to provide a camera parameter estimation method and apparatus that estimate accurate camera parameters of a parameter undetermined image when no parameter known image exists and the camera parameters of all the time-series images are unknown.
[0008]
[Means for Solving the Problems]
The camera parameter estimation method of the present invention comprises:
a reference point group placement step of placing the reference point group in a predetermined initial image;
a relative three-dimensional coordinate matrix calculation step of obtaining, by image processing, the coordinate positions of the reference point group in each image of a series of images that capture the reference point group as it moves, and obtaining a relative three-dimensional coordinate matrix of the reference point group by the factorization method from a measurement matrix that describes how those coordinate positions move;
a singular value matrix extraction step of performing singular value decomposition on the relative three-dimensional coordinate matrix and extracting the singular value matrix from the resulting matrices;
a left-right matrix extraction step of performing singular value decomposition on the accurate three-dimensional coordinate matrix and extracting, from the resulting matrices, the left matrix and right matrix located on either side of the singular value matrix;
a three-dimensional coordinate matrix generation step of generating a scale-converted three-dimensional coordinate matrix by combining the singular value matrix extracted in the singular value matrix extraction step with the left matrix and right matrix extracted in the left-right matrix extraction step;
a camera motion matrix calculation step of obtaining a camera motion matrix by the factorization method from the matrix obtained by removing noise from the measurement matrix through singular value decomposition and from the three-dimensional coordinate matrix generated in the three-dimensional coordinate matrix generation step; and
a camera parameter calculation step of calculating camera parameters from the camera motion matrix.
[0009]
The present invention addresses aerial video in which the scene shakes under the influence of gusts, building winds, and the like. From the two-dimensional displacements of the reference point group across the time-series images of the aerial video, together with the corresponding three-dimensional coordinate values extracted as known information from a spatial information database that stores the three-dimensional positions of the reference point group, it acquires and restores the motion, posture, and rotation parameters of the camera viewpoint for each time-series image.
[0010]
By using the method of the present invention, the posture parameters of the camera, that is, the rotation parameters, can be acquired and restored with high accuracy from time-series images captured from the air. The camera posture parameters (rotation parameters) restored by the present invention can also be applied as the camera motion in Japanese Patent Application No. 10-232979, "Three-dimensional structure acquisition method and apparatus, and recording medium for the method"; Japanese Patent Application No. 10-374062, "Three-dimensional structure acquisition and restoration method, apparatus, and recording medium recording a three-dimensional structure acquisition and restoration program"; Japanese Patent Application No. 11-40299, "Structure information restoration, recording medium, and structure information restoration apparatus"; and Japanese Patent Application No. 11-271383, "Three-dimensional structure acquisition method, recording medium, and apparatus". Used together with those methods, it enables building shapes to be acquired and restored more accurately and more stably.
[0011]
[Embodiments of the invention]
Next, embodiments of the present invention will be described with reference to the drawings.
[0012]
FIG. 1 is a configuration diagram of a camera parameter estimation apparatus according to an embodiment of the present invention.
[0013]
The camera parameter estimation apparatus of this embodiment comprises a data input unit 1, such as a keyboard, for inputting data; a storage unit 2 that temporarily stores data; a spatial information database 3 that stores digital information obtained by digitizing the shape of the ground; a time-series image database 4 that stores captured images in time series; an output unit 5 to which the estimated camera parameters are output (displayed or printed); and a processing unit 6 that performs the camera parameter estimation processing and outputs the estimated camera parameters to the output unit 5.
[0014]
The processing unit 6 includes: a reference point group placement unit 6A that places the reference point group in a predetermined initial image; a relative three-dimensional coordinate matrix calculation unit 6B that obtains, by image processing, the coordinate positions of the reference point group in each image of a series of images that capture the reference point group as it moves, and obtains a relative three-dimensional coordinate matrix of the reference point group by the factorization method from a measurement matrix that describes how those coordinate positions move; a singular value matrix extraction unit 6C that performs singular value decomposition on the relative three-dimensional coordinate matrix and extracts the singular value matrix from the resulting matrices; a left-right matrix extraction unit 6D that performs singular value decomposition on the accurate three-dimensional coordinate matrix and extracts, from the resulting matrices, the left matrix and right matrix located on either side of the singular value matrix; a three-dimensional coordinate matrix generation unit 6E that generates a scale-converted three-dimensional coordinate matrix by combining the singular value matrix extracted by the singular value matrix extraction unit 6C with the left matrix and right matrix extracted by the left-right matrix extraction unit 6D; a camera motion matrix calculation unit 6F that obtains a camera motion matrix by the factorization method from the matrix obtained by removing noise from the measurement matrix through singular value decomposition and from the three-dimensional coordinate matrix generated by the three-dimensional coordinate matrix generation unit 6E; and a camera parameter calculation unit 6G that calculates camera posture parameters (camera rotation parameters) from the camera motion matrix.
[0015]
FIG. 2 is a flowchart of camera parameter estimation processing performed by the processing unit 6.
[0016]
In the following, sample points on the ground (GCP: Ground Control Points / Positions) are taken as an example of a point group on a reference plane, and an example is described that uses the factorization method to acquire and restore spatial information and a spatial model.
[0017]
First, for the video scene whose camera motion is to be restored, the reference point group placement unit 6A places the reference point group, recognized as lying on the ground, in the initial image of that scene (step 11).
[0018]
The flow of processing for the initial placement of the reference point group is described with reference to FIG. 3. First, only the entries for which spatial information exists are extracted from the spatial information database 3 (step 22). Alternatively, sample points on the ground that can be identified even in aerial images, that is, GCPs, are surveyed in advance, the spatial information of these points is stored in the spatial information database 3, and this spatial information is extracted. Here, spatial information means the three-dimensional coordinate values (X_i, Y_i, Z_i) of a reference point P_i. Next, the image to be used as the initial image is taken from the time-series image database 4 (step 23), and the spatial information of the reference point group is placed on this image (step 24). FIG. 4 shows an example of reference point group placement in the initial image. As shown in the figure, a plurality of points are placed so that they are not unevenly distributed. The reference point group consists of the points placed on the image that correspond to the spatial information read from the spatial information database 3. For example, the camera parameters of one image, either a parameter known image or a parameter interpolated image, are assumed to be correct, that image is taken as the initial image, and the reference point group is placed on the initial image by perspective projection based on the accurate three-dimensional coordinate values of the reference point group and those camera parameters. Alternatively, when the spatial information and corresponding image samples exist in the time-series image database 4, the operator is shown the image samples (step 21), recognizes where in the initial image each point should be marked, places the reference points on the initial image one by one, and checks the displayed image to judge whether the point placement is correct (step 25), whereupon the reference point group is fixed. In addition, because the time-series images of aerial video flow in a fixed direction, the reference point group is not spread over the whole initial image but is placed so that it does not frame out (that is, so that the subject does not leave the screen) for a certain number of frames. FIG. 5 shows the positions of the reference point group on the image after tracking over a certain number of images. The time-series image tracking in FIG. 2 (step 13) is performed over F images, until the reference point group frames out (processing stops as soon as even one point of the group frames out). The trajectory data [A_gcp] of the measured two-dimensional coordinates p_i (x_i1, y_i1); i = 1, 2, ..., P of the reference point group in the time-series images (hereinafter referred to as the measurement matrix) is then stored in the storage unit 2. The data format is as follows.
[0019]
[Expression 1]
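
The layout of Expression 1 is not reproduced in this text. As a rough sketch only, the tracking output could be arranged as in Reference 1, with the x coordinates of all P points in the first F rows and the y coordinates in the next F rows; that layout, the function name, and the image-size parameters are assumptions. The sketch also applies the stopping rule described above: tracking ends at the first frame in which any point leaves the image.

```python
import numpy as np

def build_measurement_matrix(tracks, width, height):
    """Stack tracked points into a 2F x P matrix (x rows first, then y rows).

    tracks: array of shape (frames, P, 2) holding (x, y) per frame and point.
    Only the leading frames in which every point is inside the image are kept.
    """
    tracks = np.asarray(tracks, dtype=float)
    x, y = tracks[..., 0], tracks[..., 1]
    inside = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    all_inside = inside.all(axis=1)
    # Number of leading frames before any point frames out.
    F = len(all_inside) if all_inside.all() else int(np.argmin(all_inside))
    return np.vstack([x[:F], y[:F]])
```

Calling `build_measurement_matrix(tracks, 640, 480)` on tracks where one point leaves the image in the third frame would return a matrix built from the first two frames only.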

[0020]
Next, the relative three-dimensional coordinate matrix calculation unit 6B obtains the relative three-dimensional coordinate matrix [s] of the reference point group from the matrix [A_gcp] by the technique known as the factorization method (step 14). The methods described in the following references are used for this:
C. Tomasi and T. Kanade, "Shape and Motion from Image Streams under Orthography: a Factorization Method", IJCV, Vol.9, No.2, pp.137-154, 1992 (Reference 1)
Conrad J. Poelman and T. Kanade, "A Paraperspective Factorization Method for Shape and Motion Recovery", IEEE Trans. PAMI, Vol.19, No.3, March, 1997 (Reference 2)
The factorization method measures, over the time series, the image coordinates of feature points in the video (corner points of the subject and other distinctive points) and, from the resulting image coordinate data, simultaneously obtains the camera motion over the time series (that is, how the camera moved from frame to frame) and the three-dimensional coordinate values of the feature points.
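
The core of the factorization method can be sketched as follows. This is a simplified illustration in the spirit of Reference 1 (orthographic case), not the procedure of the patent: it subtracts the per-row centroid, keeps the three largest singular values, and splits the result into a motion factor and a shape factor. The metric constraints that remove the remaining 3 x 3 ambiguity are omitted, and all names are hypothetical.

```python
import numpy as np

def factorize_rank3(A):
    """Rank-3 factorization of a 2F x P measurement matrix.

    Returns (M_hat, s_hat, centroids) with M_hat @ s_hat + centroids
    equal to the best rank-3 approximation of A after registration.
    """
    A = np.asarray(A, dtype=float)
    centroids = A.mean(axis=1, keepdims=True)   # per-row centroid of the points
    A_reg = A - centroids                       # registered measurement matrix
    U, w, Vt = np.linalg.svd(A_reg, full_matrices=False)
    root = np.sqrt(w[:3])
    M_hat = U[:, :3] * root                     # 2F x 3 motion factor
    s_hat = root[:, None] * Vt[:3]              # 3 x P relative shape factor
    return M_hat, s_hat, centroids
```

The shape factor obtained this way is determined only up to an invertible 3 x 3 matrix, which is why the recovered coordinates are relative rather than absolute.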
[0021]
The relative three-dimensional coordinate matrix [s] of the reference point group obtained by the factorization method has the following format.
[0022]
[Expression 2]

[0023]
On the other hand, in the processing flow of FIG. 2, the accurate three-dimensional coordinate matrix [S] of the reference point group is generated in the following format from the spatial information (X_i, Y_i, Z_i) of each reference point P_i.
[0024]
[Equation 3]

[0025]
The singular value matrix extraction unit 6C applies singular value decomposition (SVD: Singular Value Decomposition) to the matrices of equations (2) and (3) (step 15).
[0026]
[Expression 4]

[0027]
The left-right matrix extraction unit 6D extracts the left and right matrices in equation (5). In step 16 of FIG. 2, the three-dimensional coordinate matrix generation unit 6E combines the singular value matrix in equation (4) with the left and right matrices in equation (5) to generate the following matrix, which is output as [X].
[0028]
[Equation 5]
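
The combination described in step 16 can be sketched as below. The sketch assumes that both [s] and [S] are stored as 3 x P arrays with the points in the same column order; the exact formats of equations (2) to (5) are not reproduced in this text, so the shapes and names here are assumptions.

```python
import numpy as np

def rescale_shape(s_rel, S_true):
    """Combine the singular values of the relative shape [s] with the
    left and right singular matrices of the accurate shape [S]."""
    _, w_rel, _ = np.linalg.svd(s_rel, full_matrices=False)
    U_true, _, Vt_true = np.linalg.svd(S_true, full_matrices=False)
    return U_true @ np.diag(w_rel) @ Vt_true   # scale-converted matrix [X]
```

For example, if the relative shape is a rotated copy of the accurate shape at twice the scale, its singular values are twice those of [S], and the function returns twice [S].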

[0029]
Next, the camera motion matrix calculation unit 6F performs singular value decomposition on equation (1), regards the components of rank 4 and above as noise, and removes them to obtain [A_gcp] ≈ [U₁][W₁][V₁]; it then calculates a matrix [Q] that satisfies the following (step 17).
The decomposed matrices are indeterminate up to the following matrix [Q].
[0030]
[Formula 6]

[0031]
Here, the matrix [M] is the camera motion matrix. The matrix [Q] is obtained under the following constraint condition.
[0032]
[Expression 7]

[0033]
Since [M] is an unknown matrix, the following must hold:
[0034]
[Equation 8]

[0035]
Therefore, the matrix [Q] is
[0036]
[Equation 9]

[0037]
In step 18, the camera motion matrix is then restored by the following formula.
[0038]
[Expression 10]
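
Expressions 6 to 10 are not reproduced in this text, so the following is only one possible reading of steps 17 and 18, not the patent's formulas. It keeps the three largest singular values of the measurement matrix, chooses [Q] so that Q⁻¹[W₁][V₁] matches the known matrix [X] in the least-squares sense, and then takes [M] = [U₁][Q]. The function name is hypothetical, and the measurement matrix is assumed to have been registered to the same origin as [X].

```python
import numpy as np

def recover_camera_motion(A, X):
    """Estimate the camera motion matrix M (2F x 3) from the measurement
    matrix A (2F x P) and the known shape matrix X (3 x P)."""
    U, w, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    U1, W1, V1 = U[:, :3], np.diag(w[:3]), Vt[:3]   # components beyond rank 3 dropped as noise
    # A ~ U1 W1 V1 = (U1 Q)(Q^-1 W1 V1); requiring Q^-1 W1 V1 = X gives Q = W1 V1 X^+.
    Q = W1 @ V1 @ np.linalg.pinv(X)
    M = U1 @ Q
    return M, Q
```

When the measurement matrix is exactly the product of a motion matrix and [X], and [X] has full row rank, this returns that motion matrix. With noisy data it returns a least-squares estimate.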

[0039]
In step 19, the posture or rotation parameters of the camera viewpoint are obtained by the methods described in References 1 and 2. Here, the rotation parameters are the roll, pitch, and yaw angles for each time-series image.
[0040]
In step 19, the camera motion matrix [M] has the components shown in equation (9); first, the camera motion components m_f and n_f (f = 1, 2, ..., F) for the time-series images are extracted from this matrix.
[0041]
[Expression 11]

[0042]
Next, the following vectors are obtained using m_f, n_f, and the centroid coordinates (x_f, y_f) calculated from the reference point group in each frame.
[0043]
[Expression 12]

[0044]
From the above vectors and the centroid coordinates, the camera posture vector k_f in the optical-axis direction can be calculated by the following equation.
[0045]
[Formula 13]

[0046]
Next, the remaining camera posture vectors i_f and j_f are calculated as follows.
[0047]
[Expression 14]

[0048]
Further, the rotation parameters α_f, β_f, and γ_f are calculated by the following equations.
[0049]
[Expression 15]
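
Expressions 11 to 15 are not reproduced in this text. The sketch below is a simplified stand-in under an orthographic assumption, without the centroid terms used above: it normalizes m_f and n_f into i_f and j_f, takes k_f as their cross product, and reads three angles from the resulting rotation matrix. The Z-Y-X angle convention and the assignment of α, β, γ to rotations about the x, y, and z axes are assumptions, as is the function name.

```python
import numpy as np

def rotation_angles(m_f, n_f):
    """Return (alpha, beta, gamma) in radians for one frame, assuming
    R = Rz(gamma) @ Ry(beta) @ Rx(alpha) with rows i_f, j_f, k_f."""
    m_f = np.asarray(m_f, dtype=float)
    n_f = np.asarray(n_f, dtype=float)
    i = m_f / np.linalg.norm(m_f)
    j = n_f - np.dot(n_f, i) * i        # remove any component along i
    j = j / np.linalg.norm(j)
    k = np.cross(i, j)                  # optical-axis direction
    R = np.vstack([i, j, k])
    beta = -np.arcsin(R[2, 0])
    alpha = np.arctan2(R[2, 1], R[2, 2])
    gamma = np.arctan2(R[1, 0], R[0, 0])
    return alpha, beta, gamma
```

Running this for every frame f gives one triple of angles per time-series image. Near β = ±90 degrees the other two angles are not uniquely defined, which this sketch does not handle.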

[0050]
The above rotation parameter calculation is performed for each frame f to obtain a rotation parameter for the time-series image.
[0051]
The posture of the camera viewpoint (camera parameters) restored by this method is consistent with the time-series images and represents the accurate posture of the viewpoint, or line of sight, even under the influence of gusts, building winds, and weather conditions.
[0052]
Furthermore, when the reference point group has been placed automatically on the initial image by perspective projection on the assumption that the camera parameters of the initial image are correct, the reference point group can be placed again by perspective projection using the updated camera parameters calculated by the above method, and the above method can then be executed again to obtain still more accurate camera parameters.
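
The perspective-projection placement used in this refinement can be sketched with a basic pinhole model. The patent text does not specify the camera model, so the focal length, the camera position argument, and the function name are all assumptions made for illustration.

```python
import numpy as np

def project_points(S, R, camera_position, focal_length):
    """Project 3 x P world points into the image with a pinhole camera.

    R is the 3 x 3 rotation from world to camera axes, and the third
    camera axis is taken as the optical axis.
    """
    S = np.asarray(S, dtype=float)
    centre = np.asarray(camera_position, dtype=float).reshape(3, 1)
    cam = R @ (S - centre)                      # points in camera coordinates
    return focal_length * cam[:2] / cam[2]      # 2 x P image coordinates
```

In an iterative scheme of the kind described above, the rotation estimated for the initial image would be passed back in as `R`, the reference points re-projected, and the tracking and estimation repeated.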
[0053]
The processing unit 6 may be implemented by recording a program that realizes its functions on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. Computer-readable recording media include recording media such as floppy (registered trademark) disks, magneto-optical disks, and CD-ROMs, and storage devices such as a hard disk drive built into a computer system. Computer-readable recording media further include media that hold a program dynamically for a short time (a transmission medium or transmission wave), as when a program is transmitted over the Internet, and media that hold a program for a certain time, such as volatile memory inside the computer system that acts as the server in that case.
[0054]
[Effects of the invention]
As described above, the present invention has the following effects.
1) When a plurality of reference points whose accurate three-dimensional coordinate values are known (a reference point group) exist in the captured time-series images, accurate camera parameters of a parameter undetermined image can be estimated from the camera parameters of a parameter known image and the movement of the reference point group in the time-series images.
2) When no parameter known image exists and the camera parameters of all the time-series images are unknown, accurate camera parameters of a parameter undetermined image can be estimated from the movement of the reference point group in the time-series images after the operator initially places the reference point group on a predetermined image.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a camera parameter estimation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of camera parameter estimation processing.
FIG. 3 is a flowchart showing a reference point group arrangement method.
FIG. 4 is a diagram showing a point group arrangement in an initial image.
FIG. 5 is a diagram showing a reference point group during tracking.
FIG. 6 is a diagram illustrating a difference between a sensor measurement value and an image rate.
[Explanation of symbols]
1 Input unit
2 Storage unit
3 Spatial information database
4 Time-series image database
5 Output unit
6 Processing unit
6A Reference point group placement unit
6B Relative three-dimensional coordinate matrix calculation unit
6C Singular value matrix extraction unit
6D Left-right matrix extraction unit
6E Three-dimensional coordinate matrix generation unit
6F Camera motion matrix calculation unit
6G Camera parameter calculation unit
11-19, 21-25 Steps

Claims

A camera parameter estimation method for estimating camera parameters of a parameter undetermined image in time-series images when an accurate three-dimensional coordinate matrix representing accurate three-dimensional coordinate values of a reference point group is known, the method comprising:
a reference point group placement step of placing the reference point group in a predetermined initial image;
a relative three-dimensional coordinate matrix calculation step of obtaining, by image processing, the coordinate positions of the reference point group in each image of the time-series images in which the movement of the reference point group is captured, and obtaining a relative three-dimensional coordinate matrix of the reference point group by the factorization method from a measurement matrix representing the movement of those coordinate positions;
a singular value matrix extraction step of performing singular value decomposition on the relative three-dimensional coordinate matrix and extracting the singular value matrix from the decomposed matrices;
a left-right matrix extraction step of performing singular value decomposition on the accurate three-dimensional coordinate matrix and extracting, from the decomposed matrices, the left matrix and the right matrix located on either side of the singular value matrix;
a three-dimensional coordinate matrix generation step of generating a scale-converted three-dimensional coordinate matrix by combining the singular value matrix extracted in the singular value matrix extraction step with the left matrix and the right matrix extracted in the left-right matrix extraction step;
a camera motion matrix calculation step of obtaining a camera motion matrix by the factorization method from a noise-removed matrix, obtained by singular value decomposition of the measurement matrix, and the three-dimensional coordinate matrix generated in the three-dimensional coordinate matrix generation step; and
a camera parameter calculation step of calculating camera parameters from the camera motion matrix.
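The matrix steps recited above can be illustrated with a short NumPy sketch. It follows the claim wording literally: the singular values of the relative three-dimensional coordinate matrix are combined with the left and right singular vectors of the accurate one. It further assumes that the noise-removed measurement matrix is a rank-3 truncation and that the camera motion matrix is obtained by a least-squares (pseudo-inverse) solve. All function names, the 3 x P shape layout, and the rank-3 and pseudo-inverse choices are assumptions made for illustration and are not taken from this specification.

```python
import numpy as np

def rank3_denoise(W):
    """Keep the three largest singular components of the measurement matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :3] * s[:3]) @ Vt[:3, :]

def rescale_shape(S_rel, S_exact):
    """Combine the singular values of S_rel (3 x P) with the left and right
    singular vectors of S_exact (3 x P) to form a scale-converted matrix."""
    sigma_rel = np.linalg.svd(S_rel, compute_uv=False)
    U_e, _, Vt_e = np.linalg.svd(S_exact, full_matrices=False)
    # (U_e * sigma_rel) scales the columns of U_e, i.e. U_e @ diag(sigma_rel).
    return (U_e * sigma_rel) @ Vt_e

def camera_motion(W, S_new):
    """Least-squares motion matrix M such that M @ S_new approximates the
    noise-removed measurement matrix (assumed pseudo-inverse solve)."""
    return rank3_denoise(W) @ np.linalg.pinv(S_new)
```

In this sketch `W` has two rows per frame and one column per reference point, so `camera_motion` returns a matrix with two rows per frame and three columns.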

The camera parameter estimation method according to claim 1, wherein, in the reference point group placement step, the camera parameter of either a parameter known image or a parameter interpolated image is assumed to be correct, the image whose camera parameter is assumed to be correct is used as the initial image, and the reference point group is arranged on the initial image by perspective projection based on the accurate three-dimensional coordinate values of the reference point group and the camera parameter.

A camera parameter estimation method wherein, in the reference point group placement step, the camera parameter of either a parameter known image or a parameter interpolated image is assumed to be correct, the image whose camera parameter is assumed to be correct is used as the initial image, and the reference point group is arranged on the initial image by perspective projection based on the accurate three-dimensional coordinate values of the reference point group and the camera parameter assumed to be correct; the camera parameter initially assumed to be correct is then updated using the camera parameter obtained from the camera motion, and the reference point group is arranged on the initial image again by perspective projection using the updated camera parameter.

A camera parameter estimation device for estimating camera parameters of a parameter undetermined image in time-series images when an accurate three-dimensional coordinate matrix representing accurate three-dimensional coordinate values of a reference point group is known, the device comprising:
reference point group placement means for placing the reference point group in a predetermined initial image;
relative three-dimensional coordinate matrix calculation means for obtaining, by image processing, the coordinate positions of the reference point group in each image of the time-series images in which the movement of the reference point group is captured, and obtaining a relative three-dimensional coordinate matrix of the reference point group by the factorization method from a measurement matrix representing the movement of those coordinate positions;
singular value matrix extraction means for performing singular value decomposition on the relative three-dimensional coordinate matrix and extracting the singular value matrix from the decomposed matrices;
left-right matrix extraction means for performing singular value decomposition on the accurate three-dimensional coordinate matrix and extracting, from the decomposed matrices, the left matrix and the right matrix located on either side of the singular value matrix;
three-dimensional coordinate matrix generation means for generating a scale-converted three-dimensional coordinate matrix by combining the singular value matrix extracted by the singular value matrix extraction means with the left matrix and the right matrix extracted by the left-right matrix extraction means;
camera motion matrix calculation means for obtaining a camera motion matrix by the factorization method from a noise-removed matrix, obtained by singular value decomposition of the measurement matrix, and the three-dimensional coordinate matrix generated by the three-dimensional coordinate matrix generation means; and
camera parameter calculation means for calculating camera parameters from the camera motion matrix.
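One way the camera parameter calculation from the camera motion matrix could look is sketched below. The sketch assumes that each frame contributes two rows of the camera motion matrix (the camera's horizontal and vertical image axes), that a rotation matrix is completed from them by orthonormalization and a cross product, and that yaw, pitch, and roll follow a Z-Y-X Euler convention. The function names and the angle convention are assumptions made for illustration and are not specified in this document.

```python
import numpy as np

def rotation_from_motion_rows(m_i, m_j):
    """Build a 3 x 3 rotation from the two motion-matrix rows of one frame."""
    m_i = np.asarray(m_i, dtype=float)
    m_j = np.asarray(m_j, dtype=float)
    i = m_i / np.linalg.norm(m_i)      # unit horizontal axis
    j = m_j - (m_j @ i) * i            # remove any component along i
    j = j / np.linalg.norm(j)          # unit vertical axis
    k = np.cross(i, j)                 # optical axis completes the frame
    return np.vstack([i, j, k])

def yaw_pitch_roll(R):
    """Angles (radians) assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll
```

Normalizing the rows first means that a common scale factor in the motion matrix does not affect the recovered angles.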

The camera parameter estimation device according to claim 4, wherein the reference point group placement means assumes that the camera parameter of either a parameter known image or a parameter interpolated image is correct, uses the image whose camera parameter is assumed to be correct as the initial image, and arranges the reference point group on the initial image by perspective projection based on the accurate three-dimensional coordinate values of the reference point group and the camera parameter.

The camera parameter estimation device according to claim 5, wherein the reference point group placement means assumes that the camera parameter of either a parameter known image or a parameter interpolated image is correct, uses the image whose camera parameter is assumed to be correct as the initial image, and arranges the reference point group on the initial image by perspective projection based on the accurate three-dimensional coordinate values of the reference point group and the camera parameter assumed to be correct; the camera parameter initially assumed to be correct is then updated using the camera parameter obtained from the camera motion, and the reference point group is arranged on the initial image again by perspective projection using the updated camera parameter.

A camera parameter estimation program for causing a computer to execute the method according to any one of claims 1 to 3.

A recording medium on which is recorded a camera parameter estimation program for causing a computer to execute the method according to any one of claims 1 to 3.