JP2009139995A

JP2009139995A - Apparatus and program for real-time pixel matching in stereo image pairs

Info

Publication number: JP2009139995A
Application number: JP2007312428A
Authority: JP
Inventors: Gurbuz Sabri; サブリ・グルブズ; Naoki Inoue; 直己井ノ上; Sumio Yano; 澄男矢野
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2007-12-03
Filing date: 2007-12-03
Publication date: 2009-06-25

Abstract

[Problem] To provide an apparatus that accurately matches corresponding pixels of a stereo image pair in real time.
[Solution] An apparatus for matching pixel pairs of a rectified stereo image pair includes: a foreground/background mapping module 146 for mapping pixels in a range camera image to pixels in the left and right images of a stereo camera system and comparing the pixel values of the pixels in the range camera image with a threshold; and a disparity search module 148, responsive to the foreground/background mapping module 146, for selectively performing a first disparity search or a second disparity search, depending on the comparison result, for the pixel in the second image that corresponds to a pixel in the first image.
[Selected drawing] FIG. 3

Description

The present invention relates to an apparatus and a computer program for real-time disparity estimation of stereo images, and more particularly to an apparatus and a program for real-time, frame-by-frame pixel matching of stereo image pairs for 2D/3D head pose detection, recognition, 3D game development, animation, broadcasting, and communication.

In media broadcast content, transmission and audio-visual multimedia technologies are advancing rapidly in pursuit of a sense of realism. To give viewers that sense of realism, ultra-realistic audio and video content must be delivered to them. The present invention focuses on the acquisition of three-dimensional (3D) visual content.

Over the years, various types of 3D display technology have been developed, such as displays based on the well-known red/green stereoscopic glasses, volumetric displays (Non-Patent Document 1), and autostereoscopic displays (Non-Patent Documents 2 and 3). Displays based on red/green stereoscopic glasses require input from two cameras but almost no processing; however, the viewer must wear special glasses. Autostereoscopic 3D displays require no special glasses, but the content must be captured in 3D and delivered in 3D. Acquiring 3D content of natural scenes at the quality of existing television broadcasts remains considerably difficult.

In contrast to conventional 2D television, autostereoscopic 3D television requires not only the visual appearance of a scene but also dense depth map information for that scene. This raises two main scientific challenges: acquiring the visual appearance and the depth map of the scene in real time, and ensuring that the resulting 3D appearance on the 3D television matches the original visual appearance of the scene (realism).

Many techniques have been proposed for 3D scene capture. At one extreme, the depth map is estimated by an image-based approach using conventional calibrated stereo cameras. At the other extreme, depth is acquired using an optics-based camera that measures the time of flight (TOF) of infrared light.

A TOF 3D camera emits a sinusoidally modulated infrared light signal (Non-Patent Document 4). The time required for the light to travel from the measurement system to the object and back to the system is measured, and the depth of each pixel in the image is calculated. The best achievable distance accuracy is on the order of a few centimeters, depending on the depth; the operating range is limited, and measurements outside that range are unreliable. For this reason, acquisition with a TOF camera is not suitable for 3D television broadcasting.

Conventional approaches based on calibrated stereo vision (Non-Patent Documents 5 and 6) compute a depth map of the scene from the left and right camera images using an image processing algorithm, namely template matching. This requires calibrating the cameras to estimate the intrinsic and extrinsic camera parameters. The depth map estimation process includes synchronized capture, rectification using the calibration parameters, and template matching for disparity estimation between pixels of the left and right camera images. Template matching algorithms typically use a similarity measure based on normalized cross-correlation.

Non-patent documents:
B. Blundell and A.J. Schwarz, "The classification of volumetric display systems: Characteristics and predictability of the image space," IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 1, pp. 66-76, 2002.
Philips Research, "Multi-view autostereoscopic displays," [online], retrieved December 3, 2007, http://www.research.philips.com/technologies/display/ov_3ddisp.html
R. Borner, B. Duckstein, O. Machui, R. Rder, T. Sinning, and T. Sikora, "A family of single-user autostereoscopic displays with head-tracking capabilities," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 2, pp. 234-243, 2000.
B. Buttgen, T. Oggier, and M. Lehmann, "CCD/CMOS lock-in pixel for range imaging: challenges, limitations and state-of-the-art," in Proceedings of 1st Range Imaging Research Day, pp. 21-32, Ingensand/Kahlmann (eds.), Zurich, 2005.
K. Konolige, "Small vision systems: Hardware and implementation," in Eighth International Symposium on Robotics Research, Hayama, Japan, 1997.
C. Sun, "Fast stereo matching using rectangular sub-regioning and 3d maximum-surface techniques," International Journal of Computer Vision, vol. 47, no. 1/2/3, pp. 99-117, 2002.
G.R. Grimmett and D.R. Stirzaker, Probability and Random Processes, 2nd Edition, Clarendon Press, Oxford, ISBN 0-19-853665-8, 1992.
P. Viola and M. Jones, "Robust real-time object detection," in 2nd International Workshop on Statistical and Computational Theories of Vision: Modeling, Learning, Computing, and Sampling, Vancouver, Canada, 2001.
R.J. Schalkoff, Digital Image Processing and Computer Vision, John Wiley and Sons, Inc., 1989.
T. Oggier, R. Kaufmann, M. Lehmann, P. Metzler, G. Lang, M. Schweizer, M. Richter, B. Buttgen, N. Blanc, K. Griesbach, B. Uhlmann, K.-H. Stegemann, and C. Ellmers, "3d-imaging in real-time with miniaturized optical range camera," in Opto Conference Nurnberg, DE, 2004.

However, conventional stereo-based 3D reconstruction approaches often cannot compute 3D information accurately. Depending on its location in the stereo image, a pixel may not provide unique information for correspondence matching, particularly when the search range for the stereo disparity search algorithm is unknown. In addition, template matching algorithms often fail to compute the disparity accurately when the images are noisy.

Accordingly, one object of the present invention is to provide an apparatus that accurately matches corresponding pixels of a stereo image pair in real time.

Another object of the present invention is to provide an apparatus that accurately and robustly matches corresponding pixels of a stereo image pair in real time.

Yet another object of the present invention is to provide an apparatus that improves pixel matching of a stereo image pair by accurately matching a template in the left image with the corresponding block in the right image.

A first aspect of the present invention relates to an apparatus for matching corresponding pixel pairs in a rectified stereo image pair. The apparatus is connectable to a stereo camera and a range camera so as to receive a rectified stereo image pair and a range camera image, respectively. The stereo image pair includes a first image and a second image. The apparatus includes: first mapping means for mapping a pixel in the range camera image to a pixel in the first image and a pixel in the second image; comparison means for comparing the intensity of the pixel in the range camera image with a threshold; and search means, responsive to the comparison means, for selectively performing a first disparity search or a second disparity search, depending on the comparison result of the comparison means, to search for the pixel in the second image that matches the pixel in the first image.

Preferably, the search means includes first search means that operates in response to the comparison means determining that the intensity of the pixel is higher than the threshold, and second search means that operates in response to the comparison means determining that the intensity of the pixel is equal to or lower than the threshold. The search range of the first search means is shorter than that of the second search means.

More preferably, the first search means includes: second mapping means for mapping the pixel of the range camera image to the second image in response to the comparison means determining that the intensity of the pixel is higher than the threshold; first search range defining means for defining a first search range that extends on both sides of the pixel mapped by the second mapping means, along the epipolar line of the second image; first similarity calculation means for calculating a predetermined similarity measure between a predefined block of a predetermined size that contains the pixel in the first image and each block of the predetermined size in the second image that contains a respective pixel within the first search range; first block selection means for selecting, from among the blocks in the second image, the block with the highest similarity measure; and first pixel selection means for selecting the center pixel of the block selected by the first block selection means as the pixel that matches the pixel of the first image. Here, "contains" means that the block includes the pixel at a predetermined position, for example, at the center of the block if the block is rectangular.

More preferably, the second search means includes second search range defining means for defining a second search range that is longer than the first search range, the second search range extending on only one side of the pixel mapped by the second mapping means, along the epipolar line of the second image. The second search means further includes: second similarity calculation means for calculating a predetermined similarity measure between a predefined block of a predetermined size that contains the pixel in the first image and each block of the predetermined size in the second image that contains a respective pixel within the second search range; second block selection means for selecting, from among the blocks in the second image, the block with the highest similarity measure calculated by the second similarity calculation means; and second pixel selection means for selecting the center pixel of the block selected by the second block selection means as the pixel that matches the pixel of the first image.

The first similarity calculation means may include: means for dividing each of the block in the first image and the blocks in the second image into a plurality of sub-blocks of the same shape; means for calculating the average pixel value of each of the blocks in the second image and of the block in the first image; means for subtracting the average pixel value from the pixel value of each pixel of each sub-block; means for calculating the average pixel value of the pixels of each sub-block; and means for calculating the sum of squared errors between the average pixel values of the sub-blocks of the first image and the respective average pixel values of the sub-blocks of each block of the second image. The sum of squared errors is the similarity measure.
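The sub-block computation described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the 2 × 2 sub-block size, and the convention that a smaller error indicates a closer match are assumptions made for the example.

```python
def subblock_means(block, sub_h, sub_w):
    """Subtract the block mean from every pixel, then average each sub-block."""
    h, w = len(block), len(block[0])
    if h % sub_h or w % sub_w:
        raise ValueError("block size must be a multiple of the sub-block size")
    block_mean = sum(map(sum, block)) / (h * w)
    means = []
    for top in range(0, h, sub_h):
        for left in range(0, w, sub_w):
            total = sum(
                block[y][x] - block_mean
                for y in range(top, top + sub_h)
                for x in range(left, left + sub_w)
            )
            means.append(total / (sub_h * sub_w))
    return means


def subblock_error(block_a, block_b, sub_h=2, sub_w=2):
    """Sum of squared differences between corresponding sub-block means.

    Assumption for this sketch: the smaller the value, the better the match.
    """
    means_a = subblock_means(block_a, sub_h, sub_w)
    means_b = subblock_means(block_b, sub_h, sub_w)
    return sum((a - b) ** 2 for a, b in zip(means_a, means_b))
```

Because the block mean is removed first, two blocks that differ only by a constant brightness offset give an error of zero, and averaging over each sub-block tends to suppress zero-mean pixel noise.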

A second aspect of the present invention relates to a computer program executable on a computer that is connectable to a stereo camera and a range camera so as to receive a rectified stereo image pair and a range camera image, respectively. The stereo image pair includes a first image and a second image. When executed on the computer, the computer program causes the computer to operate as: first mapping means for mapping a pixel in the range camera image to a pixel in the first image; comparison means for comparing the pixel value of the pixel in the range camera image with a threshold; and search means, responsive to the comparison means, for selectively performing a first disparity search or a second disparity search, depending on the comparison result of the comparison means, to search for the pixel in the second image that matches the pixel in the first image.

Depth map estimation algorithm
Stereo-camera-based depth map estimation algorithms often rely on rectification and search mechanisms. FIG. 7 illustrates the template search process on a rectified stereo image pair for disparity estimation. A rectangular template window 252 containing a point of interest 250 (x, y) in the left camera image 240L is compared with same-size blocks 262 on the epipolar line 242R in the right camera image 240R, which lies at the same height as the epipolar line 242L in the left camera image 240L, using one of various similarity measures such as squared error or zero-mean normalized correlation. The block with the highest similarity defines the pixel 260 (x+u, y) in the right camera image 240R. This pixel 260 and the pixel 250 in the left image are regarded as images of the same object point.
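The template search along the epipolar line can be sketched as below. The sketch assumes rectified images stored as lists of rows, uses the sum of squared differences as the measure (so the best block is the one with the smallest error), and requires the caller to keep the left window inside the image; the function names and the `half` window parameter are illustrative only.

```python
def ssd(image_l, image_r, x, y, u, half):
    """Sum of squared differences between the window centred on (x, y) in the
    left image and the window centred on (x + u, y) in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = image_l[y + dy][x + dx] - image_r[y + dy][x + u + dx]
            total += diff * diff
    return total


def best_disparity(image_l, image_r, x, y, candidates, half=1):
    """Return the displacement u from `candidates` with the smallest SSD.

    Candidates whose window would leave the right image are skipped.
    """
    width = len(image_r[0])
    valid = [u for u in candidates if half <= x + u < width - half]
    if not valid:
        raise ValueError("no candidate keeps the window inside the right image")
    return min(valid, key=lambda u: ssd(image_l, image_r, x, y, u, half))
```

Calling `best_disparity(left, right, x, y, range(-4, 5))` searches nine candidate positions on the same row; restricting `candidates` is what the range-camera mapping later makes possible on a per-pixel basis.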

The matching algorithm must compute a similarity measure between the 2D template window A 252 in the left camera image 240L and a 2D block B 262 of size w × h in the right camera image 240R. In general, in the presence of rectification error and Gaussian noise, A and B can be expressed as follows.

Here, θ is the relative rotation of window data B with respect to window data A caused by rectification error. γ and Γ denote arbitrary scale values (gain factors) of the left and right cameras, respectively. $\bar{X}$ denotes the mean value, and $\tilde{X}$ denotes the intrinsic shape or texture characteristics of the window data. $N(\mu, \sigma^2)$ denotes arbitrary Gaussian noise with mean $\mu$ and variance $\sigma^2 \geq 0$, where the image and the noise data are independent. In general, the following holds only under ideal conditions, that is, when both sets of window data belong to the same object patch, θ is zero, and the stereo images are perfectly rectified.

In general, however, we have access only to the observations A and B of Equations 1 and 2, and the images cannot be matched perfectly because of calibration error, video noise, or the absence of distinctive intrinsic shape characteristics when $\tilde{A}$ and $\tilde{B}$ carry no unique information. Video noise arises from differences in lens focus, differences in viewing angle, uneven illumination across the left and right cameras, and similar causes.

TOF camera
In this embodiment, to reduce the number of failures and mismatches, the 3D locations of image pixels are estimated by back-projecting the 3D measurements of the TOF camera onto both the left and right camera images 240L and 240R on the fly. The search range of the disparity algorithm is thereby restricted for each pixel. The restriction algorithm is described later.

The TOF camera emits amplitude-modulated, invisible infrared light, which is reflected by objects in the scene and scattered back onto the image sensor. Each pixel of the image sensor demodulates the incoming optical signal and recovers the sinusoid to estimate the phase delay. The phase delay is directly proportional to the distance from the object to the camera (Non-Patent Document 7).

From a practical standpoint, TOF cameras have various problems and limitations. For example, the standard deviation of the distance measurement is inversely proportional to the modulation frequency (Non-Patent Document 4), so a high modulation frequency is preferable when accurate distance measurement is the goal. On the other hand, the unambiguous measurement range is also inversely proportional to the modulation frequency. For example, at a modulation frequency of 20 MHz the unambiguous measurement range is 7.5 meters, so measurements that fall in the ambiguous range (beyond 7.5 meters) are estimated incorrectly.
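The 7.5 meter figure can be checked with a short calculation. The sketch assumes the standard continuous-wave relations, which the text does not spell out: the light travels to the object and back, so the unambiguous range is c / (2f) and a phase delay φ corresponds to a distance of cφ / (4πf). The function names are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def unambiguous_range(mod_freq_hz):
    """Largest distance measurable before the phase wraps around: c / (2 f)."""
    return SPEED_OF_LIGHT / (2.0 * mod_freq_hz)


def distance_from_phase(phase_rad, mod_freq_hz):
    """Distance for a phase delay in [0, 2*pi): c * phi / (4 * pi * f)."""
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * mod_freq_hz)


if __name__ == "__main__":
    # At 20 MHz the range is roughly 7.5 m; doubling the frequency halves it.
    print(round(unambiguous_range(20e6), 2))
    print(round(unambiguous_range(40e6), 2))
```

The trade-off in the paragraph above follows directly: raising the modulation frequency improves precision but shortens the distance at which the phase wraps.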

Because the emitted infrared light obeys the inverse-square law, noisy pixels can be filtered out by setting an amplitude threshold on the incoming pixel values. In effect, range measurements are available for foreground objects but not for background objects. In this embodiment, this information is therefore used to mask the foreground and background regions in the conventional stereo reconstruction process.
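The amplitude threshold and the resulting choice between the two disparity searches can be sketched as follows. The threshold value, the length of the long search (`long_range`), and the direction of the one-sided search are not given in the text and are hypothetical here; only the short range of ±8 pixels comes from the description of this implementation.

```python
def foreground_mask(amplitude_image, threshold):
    """True where the TOF amplitude exceeds the threshold (treated as foreground)."""
    return [[value > threshold for value in row] for row in amplitude_image]


def search_offsets(is_foreground, delta=8, long_range=64):
    """Offsets along the epipolar line, relative to the search start pixel.

    Foreground: short range on both sides of the mapped pixel.
    Background: longer range on one side only (direction assumed positive).
    """
    if is_foreground:
        return list(range(-delta, delta + 1))
    return list(range(0, long_range + 1))
```

With these defaults a foreground pixel is compared against 17 candidate blocks and a background pixel against 65, which is where the saving in matching work comes from.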

Range camera to stereo camera mapping
A set of 3D measurements from the TOF camera and their corresponding image points in the stereo images yields the range-camera-to-stereo-camera mapping parameters $A = [a_{ij}]$. For each camera, the mapping from the range camera to the stereo camera can be written in homogeneous coordinates as the following well-known camera calibration relation (Non-Patent Document 8).

Here, $[x_0\ y_0\ z_0]^T$ is a 3D measurement from the TOF camera, $[x_i\ y_i]^T$ is its corresponding image point, and $w_i$ is a non-zero factor. Let $a$ denote the set of mapping parameters in vector form:

$a = [a_{11}\ a_{12}\ a_{13}\ a_{14}\ a_{21}\ a_{22}\ a_{23}\ a_{24}\ a_{31}\ a_{32}\ a_{33}]^T$.
Setting $a_{34} = 1.0$ fixes the scale of Equation 3, and after some algebraic manipulation each corresponding 3D-2D point pair yields two equations that are linear in the $a_{ij}$ parameters. That is,

The above equations can be formulated as follows.

Thus, one 3D-2D point pair gives two equations in 11 unknowns, where Q is a 2 × 11 matrix whose rows have the form of Equation 3 and $a$ is the vector of unknown mapping parameters in Equation 4. Because the two equations in Equation 3 must be linearly independent, the minimum number of non-coplanar mapping points is n = 6. In practice, n >> 6 points are selected and Equation 4 is solved by QR decomposition. As a result of the above processing, two range-camera-to-stereo-camera mapping matrices $A_L$ and $A_R$ are obtained offline, one for each of the left and right stereo cameras. FIG. 15 is a screen capture of the mapping from the TOF camera to the stereo images, with the mapped pixels superimposed on the left and right camera images.
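The linear system described above can be written out explicitly under the standard projective camera model. The following is a reconstruction from the surrounding description (a 3 × 4 matrix with $a_{34} = 1$, 11 unknowns, two linear equations per point pair), not a copy of the equations as published, and the right-hand-side layout of the stacked system is an assumption.

```latex
% Projective mapping from a TOF measurement to an image point
w_i \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
  = \begin{bmatrix}
      a_{11} & a_{12} & a_{13} & a_{14} \\
      a_{21} & a_{22} & a_{23} & a_{24} \\
      a_{31} & a_{32} & a_{33} & a_{34}
    \end{bmatrix}
    \begin{bmatrix} x_0 \\ y_0 \\ z_0 \\ 1 \end{bmatrix}

% With a_{34} = 1, eliminating w_i = a_{31} x_0 + a_{32} y_0 + a_{33} z_0 + 1:
x_i = a_{11} x_0 + a_{12} y_0 + a_{13} z_0 + a_{14}
      - a_{31} x_0 x_i - a_{32} y_0 x_i - a_{33} z_0 x_i
y_i = a_{21} x_0 + a_{22} y_0 + a_{23} z_0 + a_{24}
      - a_{31} x_0 y_i - a_{32} y_0 y_i - a_{33} z_0 y_i

% Stacked form for one point pair, with a as the 11-vector of unknowns
\underbrace{\begin{bmatrix}
  x_0 & y_0 & z_0 & 1 & 0 & 0 & 0 & 0 & -x_0 x_i & -y_0 x_i & -z_0 x_i \\
  0 & 0 & 0 & 0 & x_0 & y_0 & z_0 & 1 & -x_0 y_i & -y_0 y_i & -z_0 y_i
\end{bmatrix}}_{Q} \, a
  = \begin{bmatrix} x_i \\ y_i \end{bmatrix}
```

Each point pair contributes two rows, so at least six pairs are needed for 11 unknowns; stacking the rows for all n pairs gives an overdetermined 2n × 11 system that can be solved in the least-squares sense.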

We want to map the 3D range data captured by the TOF camera onto the left and right stereo images on the fly so that the disparity search range can be restricted for each pixel pair. Since $w_i \neq 0$ for every 3D-2D pair, Equation 3 can be rewritten in the following form.

Here, $[x_0\ y_0\ z_0]^T$ is the 3D measurement for a pixel location from the TOF camera, and the following are the corresponding image coordinates in the left and right stereo camera images, respectively.

Because the TOF camera itself has measurement error, the location of the actual corresponding pixel in the right camera image is as follows, where the stereo camera images are rectified and δ is a constant representing the new search range for foreground objects (8 in this implementation).
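A minimal sketch of this step is shown below, assuming a 3 × 4 mapping matrix stored as a list of rows and a search window of ±δ pixels around the mapped column, clipped to the image width. The function names and the example matrix values are hypothetical; only δ = 8 is taken from the text.

```python
def project(mapping, point):
    """Map a 3D TOF point (x0, y0, z0) to image coordinates with a 3x4 matrix."""
    x0, y0, z0 = point
    homogeneous = (x0, y0, z0, 1.0)
    u, v, w = (sum(m * p for m, p in zip(row, homogeneous)) for row in mapping)
    if w == 0:
        raise ValueError("point maps to infinity (w = 0)")
    return u / w, v / w


def foreground_window(mapping_right, point, width, delta=8):
    """Row index and candidate columns in the right image for one TOF point."""
    x_r, y_r = project(mapping_right, point)
    col, row = round(x_r), round(y_r)
    first = max(0, col - delta)
    last = min(width - 1, col + delta)
    return row, range(first, last + 1)


if __name__ == "__main__":
    # Hypothetical mapping matrix with a34 = 1, for illustration only.
    a_right = [[500.0, 0.0, 320.0, 320.0],
               [0.0, 500.0, 240.0, 240.0],
               [0.0, 0.0, 1.0, 1.0]]
    row, cols = foreground_window(a_right, (0.5, 0.5, 1.0), width=640)
    print(row, cols[0], cols[-1])
```

Because the images are rectified, only the column varies within the window; the row comes directly from the projected point.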

Normalized cross-correlation similarity measure for template matching
Cross-correlation values are used to measure the similarity between two image blocks in various pattern matching algorithms, such as stereo disparity search algorithms. They are commonly used to find the matching block among a set of blocks in the second camera image by comparing the blocks, one at a time, with a known block from the first camera image. Because the D.C. value of a block carries no shape information, the mean of each block is removed from Equations 1 and 2. This gives the following.

The individual variances of Equations 6 and 7 are equal to the sum of the variances of the window data and the noise data, because the image data (the visual appearance of a scene) and the image noise are independent. The variances can therefore be expressed as follows.

Otherwise, Ã and B̃ carry no unique shape information. The normalized cross-correlation of Equations 6 and 7 can be written as follows.

Here, u is the hypothesized displacement of window A, corresponding to the location (x + u) on the epipolar line of the second image, and θ is generally assumed to be 0 when the images have been corrected. The parameter θ can therefore be ignored in the equations that follow.

To analyze Equation 10 more closely, Equations 6-7 and Equations 8-9 are substituted into Equation 10. Expanding the result yields the following.

Here, N(0, σ^2)(x, y) denotes the noise data at pixel location (x, y), drawn from the N(0, σ^2) distribution.

Clearly, the similarity measure given by Equation 11 becomes less reliable when the image contains noise. To reduce false matches, the effect of noise must therefore be removed from the similarity measure in pattern matching tasks.
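The mean-removed normalized cross-correlation discussed above can be sketched as follows. This is a minimal illustration, not the patent's Equation 10: the function names `zero_mean` and `ncc` are hypothetical, blocks are assumed to be equally sized lists of rows, and the displacement u is assumed to have been applied already when the second block was extracted.

```python
import math

def zero_mean(block):
    """Flatten a block and subtract its mean (the D.C. value) from every pixel."""
    flat = [v for row in block for v in row]
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]

def ncc(block_a, block_b):
    """Normalized cross-correlation of two mean-removed blocks, in [-1, 1]."""
    a = zero_mean(block_a)
    b = zero_mean(block_b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if den == 0.0:
        return 0.0  # a flat block carries no shape information
    return num / den
```

Because the mean is removed and the result is normalized, a block compared with a brightened, contrast-scaled copy of itself still scores close to 1. Additive noise, however, changes both the numerator and the denominator, which is the weakness the text goes on to address.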

A novel noise-robust similarity measure for template matching
The following describes a new noise-robust similarity measure (NRSM) algorithm for template matching tasks. The block data Â and B̂ of Equations 6 and 7 are divided into sub-blocks, as shown in FIG. 13 and expressed below.

This holds because the intensity values of these blocks are deviations from the mean. Each sub-block can be decomposed into a scene term and a noise term according to the following equation.

The noise term above is removed by defining a new set of coefficients as follows.

This gives the following.

a = [a_1, a_2, ..., a_N],   (15)
b = [b_1, b_2, ..., b_N],   (16)
In Equations 6 and 7, the noise term N(0, σ^2) comes from independent and identically distributed (i.i.d.) random variables. By the law of large numbers, avg(Â_k-noise) → 0. The scene term avg(Â_k-scene), on the other hand, comes from the scene pattern and, depending on its shape characteristics, need not be close to zero. For a unique match with the template, however, at least one sub-block must produce a nonzero mean; otherwise no shape information is considered to be available.

For the processing described above, the algorithm computes an intermediate representation of the input image called an "integral image" (Non-Patent Document 9), which allows sums to be computed very quickly regardless of block size. Briefly, the integral image is obtained by summing, for each pixel of the image, the pixel values within a rectangle of arbitrary size that contains the pixel, and assigning the resulting sum to that pixel.
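A minimal sketch of how such a representation makes block sums independent of block size is given below. It assumes the usual summed-area-table convention (each entry holds the sum of all pixels above and to the left), which may differ in detail from the cited document; the names `integral_image` and `block_sum` are hypothetical.

```python
def integral_image(img):
    """Return S where S[y][x] is the sum of img rows 0..y-1 and columns 0..x-1.

    S has one extra row and column of zeros so that lookups need no bounds checks.
    """
    h, w = len(img), len(img[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def block_sum(S, x0, y0, x1, y1):
    """Sum of pixels in columns x0..x1-1 and rows y0..y1-1, using four lookups."""
    return S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
```

Once `S` has been built in a single pass over the image, the mean of any sub-block costs four lookups and one division, whatever its size.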

At this point a_k and b_k are free of the influence of noise, but not of the left and right camera scale factors γ and Γ, respectively. To remove the scale factors, a new set of normalized descriptors is defined as follows.

Here a_k > 0, b_k > 0, and 1 ≤ k ≤ N. Selecting the coefficients a_k and b_k with the largest absolute value for normalization further reduces any residual noise effect on the normalized descriptors. The sets of normalized descriptors are then compared using the sum of squared differences method, as follows.

It has been shown mathematically that the results produced by the normalized cross-correlation based similarity measure become less reliable in the presence of noise. The proposed NRSM algorithm, in contrast, reduces or removes the effect of image noise. Preliminary experimental results showed that the theory agrees with practice.
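The NRSM steps described in this section can be sketched roughly as follows. The defining equations are not reproduced in this text, so the sketch rests on assumptions: sub-blocks are taken as a regular grid (FIG. 13 may show a different layout), each coefficient is the mean of a mean-removed sub-block, normalization divides by the largest absolute coefficient, and all function names are hypothetical.

```python
def subblock_means(block, rows, cols):
    """Mean of each sub-block after removing the mean of the whole block."""
    h, w = len(block), len(block[0])
    block_mean = sum(sum(r) for r in block) / (h * w)
    sh, sw = h // rows, w // cols  # sub-block height and width
    coeffs = []
    for r in range(rows):
        for c in range(cols):
            total = sum(block[y][x] - block_mean
                        for y in range(r * sh, (r + 1) * sh)
                        for x in range(c * sw, (c + 1) * sw))
            coeffs.append(total / (sh * sw))  # averaging suppresses zero-mean noise
    return coeffs

def normalize(coeffs):
    """Divide by the largest absolute coefficient to cancel a camera scale factor."""
    peak = max(abs(c) for c in coeffs)
    if peak == 0.0:
        return None  # every sub-block mean is zero: no shape information
    return [c / peak for c in coeffs]

def nrsm_distance(block_a, block_b, rows=2, cols=2):
    """Sum of squared differences between normalized descriptors (lower is better)."""
    a = normalize(subblock_means(block_a, rows, cols))
    b = normalize(subblock_means(block_b, rows, cols))
    if a is None or b is None:
        return float("inf")
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

In this sketch a block and a gain-and-offset copy of it have a distance of practically zero, while blocks with different sub-block structure score clearly higher.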

System configuration
[Structure]
As described above, this embodiment uses a stereo-vision based disparity search algorithm, enhanced by a TOF camera, to acquire disparity data in real time. The embodiment uses the 3D location information of each pixel in the observed TOF camera image to restrict the search domain of the disparity algorithm and to select the disparity search range.

FIG. 1 shows the overall structure of the disparity estimation system 40 of this embodiment. Referring to FIG. 1, the disparity estimation system 40 includes: a camera assembly 52 that includes a stereo camera 60 and a TOF camera 62; a 3D monitor 54; a conventional monitor 74; and a real-time disparity calculation apparatus 50 that calculates the disparity of each pixel pair of the left and right camera images from the stereo image stream from the stereo camera 60 and the TOF image stream from the TOF camera 62. The 3D monitor 54 is capable of 2D-to-3D conversion: given a stream of 2D images and a stream of corresponding disparity images, it computes the right image from the images it is given. A slanted lenticular lens is placed over the display screen of the 3D monitor 54. The 3D monitor 54 displays the left and right images alternately, so that the lenticular lens focuses light from different pixels in predefined directions and shows the viewer different sides of a picture.

FIG. 2 is a hardware block diagram of the real-time disparity calculation apparatus 50 of this embodiment, as implemented on a computer. Referring to FIG. 2, the disparity calculation apparatus 50 includes, in addition to the 3D monitor 54 and the monitor 74, a computer 70, a mouse 82, and a keyboard 80. The mouse 82, the keyboard 80, and the monitors 54 and 74 are all connected to the computer 70.

Still referring to FIG. 2, the computer 70 includes: a central processing unit (CPU) 90; a bidirectional data and address bus 92 connected to the CPU 90; a read-only memory (ROM) 94 connected to the bus 92; a random access memory (RAM) 96 connected to the bus 92; a hard disk drive 98 connected to the bus 92; a digital versatile disc (DVD) drive 128 connected to the bus 92 for driving a DVD medium 108; a video capture board 102 connected to the bus 92 for receiving the stereo image stream from the stereo camera 60 and the TOF image stream from the TOF camera 62; a semiconductor memory drive 106 connected to the bus 92 for driving a semiconductor memory 110; and a graphics processing unit (GPU) 104 connected to the bus 92, the 3D monitor 54, and the monitor 74. All of these components of the computer 70 are connected to the bus 92 and can access one another.

In another aspect, the functions of the real-time disparity calculation apparatus 50 are implemented by software executed on the computer 70. FIG. 3 is a functional block diagram of the real-time disparity calculation apparatus 50.

Referring to FIG. 3, the real-time disparity calculation apparatus 50 functionally includes: a calibration frame memory 138 for storing images from the TOF camera 62 during pre-calibration of the TOF camera 62; calibration software 130 for calibrating the stereo camera 60 and outputting its calibration parameters; a calibration parameter memory 132 for storing the calibration parameters output by the calibration software 130; correction software 134 for correcting the stereo image pairs from the stereo camera 60 using the calibration parameters stored in the calibration parameter memory 132; and a frame memory 136 for storing the left and right images output by the correction software 134. During calibration, the calibration software 130 reads images that are stored in the frame memory 136 without having been corrected by the correction software 134, and calculates the calibration parameters of the stereo camera 60.

The real-time disparity calculation apparatus 50 further includes: pre-calibration software 140 for calculating the mapping matrices that map the TOF camera measurements of foreground objects, stored in the calibration frame memory 138, onto the left and right images corrected by the correction software 134 and stored in the frame memory 136; a TOF mapping parameter memory 142 for storing the parameters of the mapping matrices calculated by the pre-calibration software 140; a foreground/background mapping module 146 that, for each pixel of the TOF image in the calibration frame memory 138, calculates the location of the corresponding 2D pixel in the left and right images using the mapping matrices stored in the TOF mapping parameter memory 142, and outputs a foreground/background (F/G) signal 176 that is at a high level if the pixel value of the selected TOF pixel is above a predetermined pixel value threshold and at a low level otherwise; and a threshold storage unit 144 for storing the pixel value threshold used by the foreground/background mapping module 146.

As described above, the infrared light emitted by the TOF camera 62 obeys the inverse-square law. Noisy background pixels (those far from the TOF camera 62) can therefore be filtered out by setting a threshold on the incoming light.

The real-time disparity calculation apparatus 50 further includes: a disparity search module 148, connected to receive the F/G signal 176 and the images stored in the frame memory 136, for searching the right camera image for the pixel corresponding to each pixel of the left camera image and calculating the disparity between the two pixels; a disparity memory 150 for storing the disparity calculated by the disparity search module 148 at the address corresponding to the pixel of the left camera image; and a graphic output unit 152 for selectively outputting, to the 3D monitor 54 and the monitor 74, the images stored in the calibration frame memory 138 or the frame memory 136, the disparity image stored in the disparity memory 150, or any combination of these images. Given a stream of 2D images from the left camera and the corresponding disparity images, the 3D monitor 54 converts the 2D images and disparity images into a stream of 3D images and displays it.

The real-time disparity calculation apparatus 50 further includes a controller 122 that controls the overall operation of the modules within the apparatus. The functions of the pre-calibration software 140, the calibration software 130, the correction software 134, the foreground/background mapping module 146, the disparity search module 148, and the controller 122 are all implemented by software executed on the computer 70. Although not shown in FIG. 3, these software components and modules communicate with the controller 122 and, in operation, use a suitable GUI on the monitor 74. The functions of the graphic output unit 152 are implemented by a combination of software executed on the CPU 90 and the GPU 104.

In this embodiment, the calibration software 130 is used offline to calculate the calibration parameters A_1 and A_2 of the stereo camera 60. Calibration is performed to correct for radial distortion, lens decentering, focal length, pixel aspect ratio, baseline, and the orientation of each of the cameras 60L and 60R. The calibration parameters are stored in the calibration parameter memory 132. In the calibration process of this embodiment, the user presents a predefined pattern to the stereo camera 60, and the calibration software 130 calculates the parameters from the output of the stereo camera 60. Calibration software is commercially available; for example, the Small Vision System™ (SVS™) software distributed by SRI International can be used.

The correction software 134 is used to correct (rectify) the stereo images output by the stereo camera 60. Here, correction means aligning the corresponding epipolar lines of the left and right images from the stereo camera 60. This process is shown in FIG. 6.

Referring to FIG. 6, assume that the left and right images 230L and 230R of the stereo camera 60 contain a line 232L and the corresponding line 232R, respectively. Because of lens distortion and differences in lens orientation, the image of the same line appears at different positions and with different shapes in the left and right images, even apart from parallax.

Correcting the images aligns the corresponding lines 232L and 232R with the image rows, so that they become the epipolar lines 242L and 242R of the corrected left and right camera images 240L and 240R. Without correction, real-time disparity search is practically impossible. The correction is carried out by a predetermined computation that uses the calibration parameters stored in the calibration parameter memory 132. Correction software is also commercially available.

In normal operation, images from the stereo camera 60 are corrected by the correction software 134 and stored in the frame memory 136.

The pre-calibration software 140 is used offline to pre-calibrate the images from the TOF camera 62 against the corrected images of the stereo camera 60. This process is described with reference to FIGS. 11 and 12.

A graphical user interface is implemented to estimate the mapping matrices from the TOF camera 62 to the stereo images. The user presents the predefined pattern 340 shown in FIG. 11 to the TOF camera 62 and the stereo camera 60. The surface of the pattern 340 carries a plurality of markers 344A, 344B, 344C, 344D, and 344E. Referring to FIG. 12(A), the image 360 of the pattern 340 from the TOF camera 62 is first displayed on the monitor 74. The user places the pointer 370 on one of the markers and clicks the mouse button, and the x-y coordinates of the clicked position are stored in memory. Next, the image 362L of the pattern 340 in the left camera image is displayed on the monitor 74. As shown in FIG. 12(B), the user places the pointer 370 on the same marker and clicks the mouse button, and the x-y coordinates of the clicked position are stored in memory. Similarly, as shown in FIG. 12(C), the same point-and-click operation is performed on the image 362R of the pattern 340 in the right camera image, and the x-y coordinates are stored in memory.

In this way, the graphical user interface lets the user select, offline, n triples of corresponding image points (n >> 6), and Equation 4 is solved by QR decomposition. This yields two mapping matrices, A_L and A_R, for the left and right stereo cameras, respectively. The matrix A_L is used to map a pixel in the TOF camera image to the corresponding pixel in the left camera image, and the matrix A_R is used to map it to the corresponding pixel in the right camera image. The parameters that define A_L and A_R are stored in the TOF mapping parameter memory 142.
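One way such a mapping matrix could be estimated from the selected points is sketched below. Equation 4 is not reproduced in this text, so the formulation is an assumption: a 3x4 projective matrix with its last entry fixed to 1, giving 11 unknowns and two linear equations per correspondence (hence at least six points). The least-squares system is solved with a small modified Gram-Schmidt QR factorization written out for illustration; `qr_least_squares` and `estimate_mapping` are hypothetical names.

```python
import math

def qr_least_squares(M, rhs):
    """Solve min ||M p - rhs|| by modified Gram-Schmidt QR (M needs full column rank)."""
    m, n = len(M), len(M[0])
    Q = [[float(M[i][j]) for i in range(m)] for j in range(n)]  # columns of M
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):  # remove components along earlier orthonormal columns
            R[k][j] = sum(Q[k][i] * Q[j][i] for i in range(m))
            Q[j] = [Q[j][i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
    y = [sum(Q[j][i] * rhs[i] for i in range(m)) for j in range(n)]  # Q^T rhs
    p = [0.0] * n
    for j in reversed(range(n)):  # back substitution on R p = y
        p[j] = (y[j] - sum(R[j][k] * p[k] for k in range(j + 1, n))) / R[j][j]
    return p

def estimate_mapping(points3d, points2d):
    """Estimate a 3x4 matrix A (A[2][3] fixed to 1) from 3D-to-2D correspondences."""
    M, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        M.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        M.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    p = qr_least_squares(M, rhs)
    return [p[0:4], p[4:8], p[8:11] + [1.0]]
```

Calling `estimate_mapping` once with the left-image clicks and once with the right-image clicks would give the two matrices. Using many more than six points averages out clicking error and TOF measurement noise.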

During video capture, the TOF camera measurements of foreground objects are mapped onto the stereo images, restricting the search range of the stereo disparity algorithm on the fly.

FIG. 4 shows the overall structure of the foreground/background mapping module 146. Referring to FIG. 4, the foreground/background mapping module 146 includes: a pixel selection unit 170 that selects pixels of the TOF camera image in order from the upper left to the lower right; a pixel reading unit 172, connected to receive the output of the pixel selection unit 170, for reading the pixel value (intensity) of the pixel selected by the pixel selection unit 170 from the image of the TOF camera 62 stored in the calibration frame memory 138; a left mapping unit 174 for mapping the x-y coordinates of the selected pixel to the coordinates x_L and y_L in the left camera image by applying the mapping matrix A_L, using the parameters stored in the TOF mapping parameter memory 142; a right mapping unit 178 for mapping the x-y coordinates of the selected pixel to the coordinates x_R and y_R in the right camera image by applying the mapping matrix A_R; and a comparator 180, connected to receive the pixel value of the selected pixel from the pixel reading unit 172 and the threshold intensity from the threshold storage unit 144, for outputting the F/G signal 176. As described above, the F/G signal 176 is at a high level if the intensity of the pixel selected by the pixel selection unit 170 is above the threshold, and at a low level otherwise.
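The behavior of this module for a single TOF pixel can be sketched as follows. The sketch assumes that each mapping matrix is a 3x4 projective matrix applied to the homogeneous 3D measurement and divided by the third component w; the function names and the example values are hypothetical.

```python
def project(A, point3d):
    """Apply a 3x4 mapping matrix to a 3D point and return 2D image coordinates."""
    x, y, z = point3d
    h = (x, y, z, 1.0)  # homogeneous coordinates
    u, v, w = (sum(a * b for a, b in zip(row, h)) for row in A)
    return (u / w, v / w)  # the text states that w is nonzero for every pair

def map_tof_pixel(A_L, A_R, point3d, intensity, threshold):
    """Return the left and right image coordinates and the F/G flag for one TOF pixel."""
    left = project(A_L, point3d)        # role of the left mapping unit 174
    right = project(A_R, point3d)       # role of the right mapping unit 178
    is_foreground = intensity > threshold  # role of the comparator 180
    return left, right, is_foreground
```

For example, `map_tof_pixel(A_L, A_R, (0.1, 0.2, 1.5), 120, 100)` would report the pixel as foreground because its intensity exceeds the threshold.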

FIG. 5 shows the overall structure of the disparity search module 148 shown in FIG. 3. Referring to FIG. 5, the disparity search module 148 includes: a left block reading module 200, connected to receive x_L and y_L from the foreground/background mapping module 146, for reading the pixel at (x_L, y_L) of the left image stored in the frame memory 136; a selection unit 202, connected to receive the F/G signal 176 and having an input connected to the output of the left block reading module 200 and two outputs 202a and 202b, for selectively routing the pixel values received from the left block reading module 200 to either output 202a or output 202b depending on the level of the F/G signal 176; a foreground disparity search module 208, whose input is connected to the output 202a of the selection unit 202, for searching the right camera image, using a foreground disparity search algorithm, for the pixel corresponding to the left pixel read by the left block reading module 200 and outputting the disparity between the left and right pixels; a background disparity search module 210 for searching the right camera image, using a background disparity search algorithm, for the pixel corresponding to the left pixel read by the left block reading module 200 and outputting the disparity between the left and right pixels; and a selection unit 212 that selects one of the outputs of the foreground disparity search module 208 and the background disparity search module 210 depending on the level of the F/G signal. The output of the selection unit 212 is connected to the data input port of the disparity memory 150.

The disparity search module further includes: a foreground interval memory 204 that stores the length of the interval searched by the foreground disparity search algorithm; a background interval memory 206 that stores the length of the interval searched by the background disparity search algorithm; and a disparity memory control unit 214 for controlling the disparity memory 150 shown in FIG. 3 so that the disparity value selected by the selection unit 212 is stored at the address of the corresponding left pixel in the disparity memory 150. When the disparity calculation for a pixel is complete, either the foreground disparity search module 208 or the background disparity search module 210 outputs a signal indicating the end of the search. This signal is selected by the selection unit 212 and supplied to the pixel selection unit 170 of the foreground/background mapping module 146, which then selects the next pixel.

Referring again to FIG. 7, a disparity search algorithm generally includes the following steps: selecting a pixel 250 at (x, y) in the left camera image 240L; defining a rectangular template window 252 in the left camera image 240L that contains the pixel of interest 250; selecting a pixel 260 at (x + u, y) in the right camera image 240R; comparing the rectangular template window 252, using the similarity measure described above, with a block 262 of the same size that is centered on the pixel 260 on the epipolar line 242R in the right camera image 240R; varying the value u within a predetermined interval and calculating the similarity measure between the rectangular template window 252 and the block 262 of the same size; repeating the varying and calculating steps until the pixel 260 has moved from the leftmost position to the rightmost position within the predetermined interval; selecting the block in the right camera image 240R with the highest similarity measure; and selecting the pixel 260 in the selected block as the pixel corresponding to the pixel 250.
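The generic search loop described above can be sketched as follows. For brevity the sketch scores candidates with a negated sum of squared differences rather than the similarity measures discussed earlier; any function that returns a higher value for a better match can be passed in instead. Images are assumed to be corrected and stored as lists of rows, and all names are hypothetical.

```python
def block_at(img, cx, cy, half):
    """Square block of side 2*half+1 centered on (cx, cy)."""
    return [row[cx - half: cx + half + 1] for row in img[cy - half: cy + half + 1]]

def neg_ssd(block_a, block_b):
    """Stand-in similarity: negated sum of squared differences (higher is better)."""
    return -sum((p - q) ** 2
                for ra, rb in zip(block_a, block_b)
                for p, q in zip(ra, rb))

def search_disparity(left, right, x, y, u_min, u_max, half, similarity=neg_ssd):
    """Return the displacement u in [u_min, u_max] whose block best matches the template."""
    template = block_at(left, x, y, half)
    best_u, best_score = None, float("-inf")
    for u in range(u_min, u_max + 1):  # move along the same row (epipolar line)
        cx = x + u
        if cx - half < 0 or cx + half >= len(right[0]):
            continue  # candidate block would leave the image
        score = similarity(template, block_at(right, cx, y, half))
        if score > best_score:
            best_u, best_score = u, score
    return best_u
```

The cost of the loop grows linearly with the length of the interval `[u_min, u_max]`, which is why narrowing that interval matters for real-time operation.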

Referring to FIG. 8, the distance between the pixel 250L in the left camera image 240L and the corresponding pixel 250R in the right camera image 240R defines the disparity D.

If an object is in the foreground, the disparity between its left and right camera images should be larger than that of an object in the background, so the search interval must be made longer. A longer search interval increases the time needed for the search and makes real-time disparity search difficult.

In the disparity estimation system 40 of this embodiment, however, the time needed to calculate disparity is quite short, because the system uses the TOF camera 62. Referring to FIGS. 9(A) and 9(B), a pixel in the TOF camera image is selected and mapped by the left mapping matrix A_L to the pixel 250L in the left camera image 240L. Whether the pixel belongs to a foreground object is determined from the pixel value of the selected pixel in the TOF camera image.

If the pixel belongs to a foreground object, the coordinates of the corresponding pixel 260R in the right camera image 240R are estimated using the mapping matrix A_R. A search interval 270 is defined so as to contain the estimated pixel 260R. In general, the x coordinates of the pixels 250L and 260R differ; they are shown in FIG. 9(A) as the distances D_L and D_R, respectively. In this embodiment, the search interval 270 is chosen so that the estimated pixel 260R lies at its center. In other words, if the x coordinate of the pixel 250L is x_L, the corresponding pixel in the right camera image 240R is searched for within the interval [x_R - δ, x_R + δ], where δ is a constant (8 in this implementation) and x_R is the x coordinate of the pixel mapped by the matrix A_R.

If the pixel belongs to a background object, the pixel 260R at the same position as the pixel of interest 250L is selected in the right camera image 240R. In this case, the search interval 272 is defined as an interval to the right of the pixel 260R. In other words, if x_L is the x coordinate in the left camera image, the actual corresponding pixel in the right camera image only needs to be searched for within the interval [x_L, x_L + L], where the stereo camera images have been corrected and L is a constant (40 in this implementation) that represents the search range for background objects.
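The two interval rules can be summarized in a short sketch. The constants 8 and 40 are the values given in the text; the function and parameter names are hypothetical, and integer pixel coordinates are assumed.

```python
DELTA = 8           # foreground half-width (δ in the text)
L_BACKGROUND = 40   # background search range (L in the text)

def search_interval(is_foreground, x_left, x_right_est=None):
    """Return the inclusive (start, end) x range to search in the right image.

    x_left is the x coordinate in the left image; x_right_est is the x coordinate
    predicted by the right mapping matrix and is only needed for foreground pixels.
    """
    if is_foreground:
        return (x_right_est - DELTA, x_right_est + DELTA)
    return (x_left, x_left + L_BACKGROUND)
```

For instance, a foreground pixel whose mapped right-image coordinate is 130 would be searched over columns 122 to 138, whereas a background pixel at column 100 would be searched over columns 100 to 140.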

Because the mapping matrices that map pixels of the TOF camera image onto the stereo images give the approximate locations of the corresponding pixels in the left and right images, the search range is restricted. In addition, because it is known whether a pixel belongs to a foreground object, a search algorithm suited to foreground or background pixels can be selected. A preferred search range is predefined for each algorithm. In particular, the search interval for background pixels is considerably restricted compared with that for foreground pixels, so the computational cost is low and the likelihood of false matches is quite small. Errors (if any) in the disparity calculation can therefore be corrected.

Note that the X and Y coordinate values in 3D have no effect on the disparity search range. Only the Z coordinate has an effect. The Z coordinate value is roughly measured by the TOF camera 62 and is used to define the search range.
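The dependence on Z alone is consistent with the standard relation for an ideal rectified pinhole stereo pair. The focal length f (in pixels) and the baseline B are symbols introduced here for illustration; they are not named in the text above.

```latex
\[
  |d| \;=\; |x_L - x_R| \;=\; \frac{f\,B}{Z}
\]
```

Neither X nor Y appears in this relation, so a rough Z measurement is enough to bound the disparity d, and hence the search interval.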

Pre-calibration Algorithm
FIG. 10 is a flow diagram of a computer program that implements the pre-calibration software 140 shown in FIG. 3. Referring to FIG. 10, the program starts at step 300, where the variable i is set to zero. The variable i indicates the number of points selected by the user for pre-calibration of the TOF camera 62.

Following step 300, the program includes step 302, in which it is determined whether the variable i is greater than the constant MAX; the control flow branches in one of two directions depending on the result. The constant MAX is the predetermined number of points (pixels) used for pre-calibration. As noted elsewhere in the specification, MAX must be much greater than 6 (MAX >> 6).

The program further includes the following steps, executed when the determination in step 302 is NO: step 304 of detecting the coordinates (x_oi, y_oi, z_oi) of the pixel selected by the user in the TOF camera image; step 306 of storing the coordinates (x_oi, y_oi, z_oi) in the storage area Range[i] in the RAM 96; step 308 of detecting the coordinates (x_Li, y_Li) of the pixel selected by the user in the left camera image; step 310 of storing the coordinates (x_Li, y_Li) in the storage area Left[i] in the RAM 96; step 312 of detecting the coordinates (x_Ri, y_Ri) of the pixel selected by the user in the right camera image; step 314 of storing the coordinates (x_Ri, y_Ri) in the storage area Right[i] in the RAM 96; and step 316 of incrementing the variable i by 1. After step 316, the control flow returns to step 302.

The program further includes step 318, executed when the determination in step 302 is YES, of calculating the mapping matrices A_L and A_R that map pixels of the TOF camera image to the left and right camera images, respectively, and step 320, following step 318, of storing the parameters of the mapping matrices A_L and A_R in the TOF mapping parameter memory 142 in the RAM 96 shown in FIG. 2. In step 318, the matrices A_L and A_R are calculated by solving Equation 4 using QR decomposition.
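A least-squares fit via QR decomposition can be sketched as follows. Equation 4 is not reproduced in this section, so the sketch assumes, purely for illustration, a 2×4 linear mapping from homogeneous TOF coordinates (x_o, y_o, z_o, 1) to image coordinates (x, y); the actual form of Equation 4 may differ. The function name `fit_mapping` is hypothetical.

```python
import numpy as np

def fit_mapping(range_pts, image_pts):
    """Least-squares fit of a 2x4 matrix A such that
    image_pt ~= A @ [x_o, y_o, z_o, 1] (assumed model, see lead-in)."""
    range_pts = np.asarray(range_pts, dtype=float)   # shape (n, 3)
    image_pts = np.asarray(image_pts, dtype=float)   # shape (n, 2)
    n = range_pts.shape[0]
    M = np.hstack([range_pts, np.ones((n, 1))])      # (n, 4) design matrix
    Q, R = np.linalg.qr(M)                           # reduced QR: M = Q R
    # Solve R X = Q^T b for both image coordinates at once.
    X = np.linalg.solve(R, Q.T @ image_pts)          # shape (4, 2)
    return X.T                                       # shape (2, 4)
```

Under this assumed model, the stored arrays Range[i] and Left[i] would yield A_L, and Range[i] and Right[i] would yield A_R. Using many more points than unknowns averages out errors in the user's manual point selection.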

Disparity Search Algorithm
FIG. 14 is a flow diagram of the software that implements the disparity search module 148 shown in FIG. 3. Referring to FIG. 14, the program repeats the following steps for every TOF pixel: step 400 of mapping the TOF pixel to a pixel of the left camera image; step 401 of determining whether the TOF pixel is in the foreground and branching the control flow in one of two directions accordingly; step 402, executed when the determination in step 401 is NO, of performing a normal background search; step 404, executed when the determination in step 401 is YES, of estimating the location in the right camera image that corresponds to the TOF pixel; and step 406, following step 404, of performing a limited foreground search in the right camera image for the pixel corresponding to the pixel of the left camera image.

Step 401 is a software implementation of the comparator 180 (see FIG. 4). If the intensity is greater than the threshold, the pixel is determined to belong to the foreground; otherwise, it is determined to belong to the background.

In the normal background search, the search method described with reference to FIG. 9(B) is performed. In the limited foreground search, the search method described with reference to FIG. 9(A) is performed. Because the limited search greatly reduces the computational cost, the algorithm can find the disparity between the left and right camera images in real time.
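Once an interval has been chosen, either search reduces to comparing blocks along one image row. The sketch below uses a plain sum of squared differences (SSD) over square blocks as the cost; this is a simplification substituted here for illustration, not the sub-block measure recited in the claims. The names `ssd` and `best_match`, the `half` block radius, and the list-of-rows image layout are assumptions.

```python
def ssd(left, right, y, x_left, x_right, half):
    """Sum of squared differences between two (2*half+1)^2 blocks
    centered at (y, x_left) in `left` and (y, x_right) in `right`."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[y + dy][x_left + dx] - right[y + dy][x_right + dx]
            total += diff * diff
    return total

def best_match(left, right, y, x_left, lo, hi, half=1):
    """Return the x in [lo, hi] on row y of `right` whose block has the
    lowest SSD against the block at (y, x_left) in `left`, or None if
    the interval is empty. The caller must keep (y, x_left) at least
    `half` pixels away from the image border."""
    width = len(right[0])
    lo = max(lo, half)                  # keep candidate blocks inside the image
    hi = min(hi, width - 1 - half)
    best_x, best_cost = None, None
    for x in range(lo, hi + 1):
        cost = ssd(left, right, y, x_left, x, half)
        if best_cost is None or cost < best_cost:
            best_x, best_cost = x, cost
    return best_x
```

The disparity is then the difference between the returned x and x_left. The cost of the loop grows linearly with the interval length, which is why restricting the interval reduces computation.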

[Operation]
Referring to FIGS. 1 to 14, the disparity estimation system 40 described above operates as follows. Referring to FIG. 1, calibration of the stereo camera 60 is first performed offline. In the calibration process, a predetermined pattern plate is presented to the stereo camera 60. The calibration software 130 calculates the calibration parameters, which are stored in the calibration parameter memory 132.

Next, the TOF camera 62 is calibrated offline by the pre-calibration software 140. In the pre-calibration process, a predetermined pattern 340 is presented to the stereo camera 60 and the TOF camera 62. The stereo images are corrected by the correction software 134 and stored in the frame memory 136. As shown in FIG. 10, the pre-calibration software 140 reads the TOF camera image and the corrected stereo images from the calibration frame memory 138 and the frame memory 136, respectively. The triplet of images (the TOF camera image and the left and right images) is shown in turn on the 3D monitor 54, and the user repeatedly selects corresponding pixels in these images through the GUI. By moving the pattern 340 and repeating the selection process, a sufficient number of pixels (>> 6) is collected to solve Equation 4. The mapping matrices A_L and A_R are obtained by solving Equation 4 using QR decomposition, and are stored in the TOF mapping parameter memory 142.

Once the calibration of the stereo camera 60 and the pre-calibration of the TOF camera 62 are complete, the disparity estimation system 40 is ready to capture a 3D scene and generate a stream of disparity images in the disparity memory 150.

In operation, the disparity estimation system 40 acquires a stream of stereo images from the stereo camera 60. The captured images are corrected by the correction software 134 using the calibration parameters stored in the calibration parameter memory 132, so that the epipolar lines coincide with the image rows. The corrected images are stored in the frame memory 136. At the same time, the disparity estimation system 40 acquires a stream of TOF images from the TOF camera 62.

For each pixel in each frame of the TOF camera image, the foreground/background mapping module 146 maps the pixel to the left camera image. The module 146 then examines the intensity of the pixel in the TOF camera image and determines whether the pixel belongs to a foreground object or to the background. If the pixel is in the foreground, the module 146 sends a high-level F/G signal 176 to the disparity search module 148. Otherwise, the module 146 sends a low-level F/G signal 176 to the disparity search module 148.

In response to a high-level F/G signal 176, the disparity search module 148 estimates the pixel in the right camera image by mapping the pixel of interest in the TOF camera image to the right camera image. The disparity search module 148 then determines the disparity between the left and right camera images using the foreground search algorithm over the limited search range.

In response to a low-level F/G signal 176, the disparity search module 148 locates the pixel in the right camera image that has the same x-y coordinates as the left pixel, and determines the disparity using the background disparity search algorithm over the longer search range. When the search for the corresponding pixel is finished, the disparity is calculated, and the disparity search module 148 signals the foreground/background mapping module 146 to select the next pixel of the TOF image. For each pixel, the disparity search module 148 writes the disparity to the address in the disparity memory 150 that corresponds to that pixel of the left camera image.

When the disparity image is complete, the graphic output unit 152 selects the left camera image stored in the frame memory 136 and the disparity image stored in the disparity memory 150, and supplies them to the 3D monitor 54. At the same time, the graphic output unit 152 selects any combination of the TOF image, the stereo images, and the disparity image as instructed by the controller 122, and supplies it to the monitor 74.

Because the 3D monitor 54 is capable of generating a stereo display from the left camera image and the corresponding disparity image, the user can view a 3D image on the 3D monitor 54.

[Experimental setup]
Videre™ stereo vision hardware and SVS software are used in this implementation. Camera calibration and correction are performed automatically using the SVS library. The SVS software can capture stereo video sequences and reconstruct 3D data from the stereo pairs at 30 Hz at the full image resolution of 320 × 240. However, because the region of interest (ROI) for 3D reconstruction in this experiment is the user's face area, the disparity search area was limited to the vicinity of the horopter of the face, as described above. Consequently, for reconstructed 3D data outside the face and at different depths, the 3D estimate is inaccurate, as shown in FIG. 23(C).

As shown in FIG. 1, the system consists of a Swissranger™ time-of-flight (TOF) range camera (TOF camera for short) and a conventional CCD stereo camera made by Videre™, providing substantially parallel image channels that improve the disparity computation of the dense stereo reconstruction algorithm. SVS™ software from SRI International™ is used for stereo image capture, camera calibration, and correction. The system can capture stereo video sequences and reconstruct 3D data from the stereo pairs at 30 frames per second at an image resolution of 320 × 240.

The reconstructed 3D coordinate values are expressed relative to a predefined world coordinate system. In this implementation, the origin of the world coordinate system is defined to be the focal point of the left camera, and the coordinate system is right-handed.

FIG. 15 is a screen capture illustrating the mapping from the TOF range camera to the stereo images; the pixels mapped from the TOF camera image 430 are overlaid on the left and right camera images 432 and 434.

FIG. 16 shows an image frame 450 of the left camera, a disparity image 452, and an OpenGL plot 454 of the reconstructed 3D data. The range information from the TOF camera improves the 3D reconstruction of background objects.

The embodiment disclosed herein is merely illustrative, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each of the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and range of equivalency of the wording of the claims.

FIG. 1 is a diagram showing the system components of the disparity estimation system 40 according to one embodiment of the present invention.
FIG. 2 is a hardware block diagram of the computer-implemented disparity calculation apparatus 50 according to the embodiment of the present invention.
FIG. 3 is an overall functional block diagram of the disparity calculation apparatus 50.
FIG. 4 is a diagram showing details of the foreground/background mapping module 146.
FIG. 5 is a diagram showing details of the disparity search module 148.
FIG. 6 is a diagram illustrating correction of stereo images.
FIG. 7 is a diagram illustrating the template search process over a corrected stereo image pair for disparity estimation.
FIG. 8 is a diagram showing how the disparity of the pixel 250L of the left camera image 240L is calculated.
FIG. 9 shows (A) the foreground disparity search and (B) the background disparity search.
FIG. 10 is a flow diagram showing the overall process steps of the pre-calibration of the TOF camera 62.
FIG. 11 is a diagram showing the pattern 340 used in the pre-calibration process.
FIG. 12 is a diagram showing how the user selects pixels of the TOF camera and the stereo camera in the pre-calibration process.
FIG. 13 is a diagram showing how blocks in the "left" camera image 240L and the "right" camera image 240R are divided into sub-blocks.
FIG. 14 is a flow diagram of the disparity search algorithm.
FIG. 15 is a diagram showing a screen capture of the mapping from the TOF camera to the stereo images.
FIG. 16 is a diagram showing the image frame 450 of the left camera, the disparity image 452, and the OpenGL plot 454 of the 3D reconstruction data.

Explanation of symbols

40 disparity estimation system
50 disparity calculation apparatus
54 3D monitor
60 stereo camera
62 TOF camera
70 computer
74 monitor
80 keyboard
82 mouse
90 CPU
94 ROM
96 RAM
98 hard disk drive
130 calibration software
132 calibration parameter memory
134 correction software
136 frame memory
138 calibration frame memory
140 pre-calibration software

Claims

An apparatus for matching corresponding pixel pairs in a corrected stereo image pair, the apparatus being connected to a stereo camera and a range camera so as to receive the corrected stereo image pair and a range camera image, respectively, the stereo image pair including a first image and a second image, the apparatus comprising:
first mapping means for mapping a pixel in the range camera image to a pixel in the first image;
comparison means for comparing a pixel value of the pixel in the range camera image with a threshold value; and
search means, responsive to the comparison means, for searching the second image for a match to the pixel in the first image by selectively performing a first disparity search or a second disparity search depending on the comparison result of the comparison means.

The apparatus according to claim 1, wherein the search means includes:
first search means responsive to the comparison means determining that the pixel value of the pixel is higher than the threshold value; and
second search means responsive to the comparison means determining that the pixel value of the pixel is less than or equal to the threshold value,
wherein a search range of the first search means is shorter than a search range of the second search means.

The apparatus as described, wherein the first search means includes:
second mapping means for mapping the pixel of the range camera image to the second image in response to the comparison means determining that the pixel value of the pixel is higher than the threshold value;
first search range defining means for defining a first search range extending on both sides, along the epipolar line of the second image, of the pixel mapped by the second mapping means;
first similarity calculating means for calculating a predetermined similarity measure between a block of a predetermined size that includes the pixel in the first image and each block of the predetermined size in the second image that includes a pixel within the first search range;
first block selecting means for selecting the block in the second image having the highest similarity measure; and
first pixel selection means for selecting the pixel at the center of the block selected by the first block selecting means as the match for the pixel of the first image.

The apparatus as described, wherein the second search means includes:
second search range defining means for defining a second search range that is longer than the first search range, the second search range extending to only one side, along the epipolar line of the second image, of the pixel mapped by the second mapping means;
second similarity calculating means for calculating a predetermined similarity measure between a block of a predetermined size that includes the pixel in the first image and each block of the predetermined size in the second image that includes a pixel within the second search range;
second block selection means for selecting the block in the second image having the highest similarity measure calculated by the second similarity calculating means; and
second pixel selection means for selecting the pixel at the center of the block selected by the second block selection means as the match for the pixel of the first image.

The apparatus of claim 3, wherein the first similarity calculation means includes:
means for dividing each of the block in the first image and the block in the second image into a plurality of sub-blocks of the same shape;
means for calculating an average pixel value of each of the block in the first image and the block in the second image;
means for subtracting the average pixel value from the pixel value of each pixel of the sub-blocks;
means for calculating an average pixel value of the pixels of each of the sub-blocks; and
means for calculating a sum of squared errors between the average pixel values of the sub-blocks of the block of the first image and the average pixel values of the respective sub-blocks of the block of the second image,
wherein the sum of squared errors is the similarity measure.

A computer program executable by a computer connected to a stereo camera and a range camera so as to receive a corrected stereo image pair and a range camera image, respectively, the stereo image pair including a first image and a second image, wherein the computer program, when executed on the computer, causes the computer to operate as:
first mapping means for mapping a pixel in the range camera image to a pixel in the first image;
comparison means for comparing a pixel value of the pixel in the range camera image with a threshold value; and
search means, responsive to the comparison means, for searching the second image for a match to the pixel in the first image by selectively performing a first disparity search or a second disparity search depending on the comparison result of the comparison means.