JP3456029B2

JP3456029B2 - 3D object recognition device based on image data

Info

Publication number: JP3456029B2
Application number: JP24944094A
Authority: JP
Inventors: 美樹男笹木; 章人豊田
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 1994-10-14
Filing date: 1994-10-14
Publication date: 2003-10-14
Anticipated expiration: 2018-10-14
Also published as: JPH08114416A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a three-dimensional object recognition device based on image data. The device can be used, for example, as a visual recognition device with which an autonomous vehicle recognizes the environment outside the vehicle by image processing and finds target objects of known shape (automobiles, obstacles, buildings, and the like), or with which an autonomous mobile robot detects by image processing the three-dimensional positional relationship between a known target object and itself. It can also be used as a model-based coding device for building three-dimensional video databases or for ultra-low-bit-rate image transmission.

[0002]

[Prior Art] In recent years, with growing demand for automated FA (factory automation) inspection equipment, unmanned monitoring devices, unmanned vehicles, and robot vision as typified by vision devices for autonomous mobile robots, there has been a need for devices that recognize the motion, shape, color, and other properties of objects and environments and can instantly and freely create three-dimensional video as seen from different viewpoints. Next-generation video coding technology aimed at ultra-low-bit-rate image transmission, such as videophones, and at immersive communication of three-dimensional video is also being studied, and the coding process in this case likewise requires automatic estimation of the three-dimensional motion and deformation of the subject. Furthermore, in video synthesis and editing using computer graphics, demand for automated object modeling is very strong, and this demand is growing further with the advent of artificial reality technology.

[0003] For example, an autonomous mobile robot for FA that handles workpieces of known shape in a factory is required to recognize three-dimensional position and orientation from two-dimensional visual information (image information) obtained by a monocular camera (for example, a CCD camera). In this case, the goal is to compute the current three-dimensional position and orientation of the monocular camera based on feature extraction, shape parameter calculation, and the like from an image-processing target that is fixed in the absolute coordinate system.

[0004] As position and orientation recognition methods of this kind for FA, methods have been proposed, as seen for example in JP-A-6-262568 and JP-A-6-258028, that extract image feature points representing the shape features of the target object from image information, quantize the parameter space representing the camera position and orientation at a grid spacing equal to the target accuracy, and search for the optimal position and orientation by minimizing an error evaluation function of the image features corresponding to each quantized space point. Because these methods assume that an object of known shape is handled in a known indoor environment, namely a factory, the appearance of the object can be restricted. In addition, because the lighting conditions are nearly constant, feature points can be extracted by binarization, and the search region can be limited based on the stopping accuracy of the machine. The optimal position and orientation can therefore be estimated by repeating a cyclic one-dimensional search over each parameter a few times.
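
The cyclic one-dimensional search mentioned above can be illustrated with a minimal sketch. It is not taken from the cited publications: the function name `cyclic_search`, the grid bounds, and the toy error function are assumptions chosen only to show how each parameter is scanned in turn on a quantized grid while the others are held fixed.

```python
def cyclic_search(error, start, lo, hi, step, sweeps=3):
    """Cyclic one-dimensional search on a quantized parameter grid.

    error: callable taking a list of parameters and returning a float
    start: initial parameter list
    lo, hi, step: per-parameter grid bounds and grid spacing (assumed inputs)
    """
    pose = list(start)
    for _ in range(sweeps):                      # repeat the cycle a few times
        for i in range(len(pose)):               # search one parameter at a time
            best_val, best_err = pose[i], error(pose)
            n = int(round((hi[i] - lo[i]) / step[i]))
            for k in range(n + 1):               # scan every grid point on this axis
                cand = pose[:]
                cand[i] = lo[i] + k * step[i]
                e = error(cand)
                if e < best_err:
                    best_val, best_err = cand[i], e
            pose[i] = best_val
    return pose


# Toy error with its minimum at (2.0, -1.0); it stands in for an
# image-feature error evaluation function, which is not specified here.
def toy_error(p):
    u, v = p[0] - 2.0, p[1] + 1.0
    return u * u + v * v + 0.2 * u * v


print(cyclic_search(toy_error, [0.0, 0.0], [-5.0, -5.0], [5.0, 5.0], [0.5, 0.5]))
```

Such a search converges quickly only when the error surface is well behaved near the starting point, which is consistent with the restricted factory conditions described above.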

[0005]

[Problems to Be Solved by the Invention] However, position and orientation recognition methods of this kind are difficult to apply where feature points cannot be extracted stably, for example when they are used in places with high levels of noise such as brightness variation and light reflection, as in outdoor environments, or when the appearance of the object cannot be restricted because its image overlaps with non-target objects. They are also difficult to apply when feature points cannot be defined, as when the object itself consists mainly of curved surfaces. In short, the conventional configuration imposes constraints on the shape of the object, its appearance and range of position and orientation, the lighting conditions, and so on, and this has been a practical obstacle. In addition, because the conventional configuration is based on still-image processing, it cannot estimate the motion information of a moving object, which has been another problem.

[0006] Meanwhile, devices that recognize moving objects from video information and that presuppose extraction of feature points of the object are conventionally known, such as those described in JP-A-5-73681 and JP-A-5-73682. For the same reasons as above, these too are difficult to apply to objects with curved surfaces.

[0007] The present invention was made in view of the above circumstances. Its object is to provide a three-dimensional object recognition device based on image data that can freely generate three-dimensional video of a moving object as seen from different viewpoints, even in conditions with many noise components such as brightness variation and light reflection or where feature points of the object cannot be extracted stably, and that can generate such video without difficulty even when the object consists mainly of curved surfaces or when the position and orientation of the imaging means are unknown.

[0008]

[Means for Solving the Problems] To achieve the above object, the three-dimensional object recognition device according to the present invention comprises imaging means for photographing an object whose three-dimensional shape is known, and recognizes the three-dimensional position and orientation of the object by moving-image processing based on the image data of a plurality of consecutive frames in an image sequence captured by the imaging means. The device comprises: moving-body region extraction means for extracting a moving-body region based on two-dimensional motion vectors appearing between the plurality of frames; storage means storing shape data that models the three-dimensional structure of the object; spatial quantization means for estimating the position and orientation of the object by searching for the imaging-means position and orientation at which the two-dimensional shape of the moving-body region extracted by the moving-body region extraction means would be obtained; and image estimation means for generating a virtual real image of the object by three-dimensional structure modeling based on the position and orientation estimated by the spatial quantization means and the shape data stored in the storage means (claim 1).

[0009] In this case, the spatial quantization means may be configured to compute the error between the virtual real image generated by the image estimation means and the actually observed image using a predefined error evaluation function, and to output as the final estimate the three-dimensional position and orientation at which the error evaluation function takes its minimum value (claim 2).
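
The selection rule of claim 2 can be sketched as follows. This excerpt does not define the error evaluation function, so a sum of squared pixel differences is assumed here purely for illustration; `squared_error`, `best_pose`, and the toy renderer are hypothetical names, and the renderer stands in for the image estimation means.

```python
def squared_error(a, b):
    """Sum of squared pixel differences between two equally sized images.

    This particular error measure is an assumption; the text only says
    that the function is predefined.
    """
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_pose(candidates, render, observed, error=squared_error):
    """Return the candidate pose whose rendered image has the smallest error."""
    return min(candidates, key=lambda pose: error(render(pose), observed))


# Toy renderer: a 4x4 image with one bright column whose index is the "pose".
def toy_render(column):
    return [[255 if x == column else 0 for x in range(4)] for _ in range(4)]


observed = toy_render(2)
print(best_pose(range(4), toy_render, observed))
```

In the device itself the candidates would be quantized camera positions and orientations rather than column indices.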

[0010] The spatial quantization means may also be configured to set a plurality of search-region centers from the major-axis and minor-axis lengths of the two-dimensional shape of the moving-body region extracted by the moving-body region extraction means, and to finally estimate a single position and orientation based on data obtained by one-dimensional searches of these search regions performed in parallel (claim 3).

[0011] Furthermore, a portion of the object whose features are visually easy to grasp may be set as a feature region, and the image estimation means may be configured so that, when generating the virtual real image, it interpolates in such a way that the pixel density outside the feature region is coarser than inside it (claim 4).

[0012] The moving-body region extraction means may be configured to extract and label moving-body regions in block units using the block matching method (claim 5).

[0013] In this case, the moving-body region extraction means may have a function of removing blocks that contain noise components from the moving-body region extracted in block units (claim 6).

[0014] The moving-body region extraction means may also have a function whereby, when the range enclosed by the moving-body region extracted in block units includes a region in which no motion vector exists, the blocks of that region are given the same label as the surrounding blocks (claim 7).
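
One way to realize the relabeling of claim 7 is sketched below. The flood-fill approach and the name `fill_enclosed` are assumptions, not the method disclosed in the patent: blocks without a motion vector that cannot be reached from the edge of the block grid are treated as enclosed and are merged into the moving-body region.

```python
from collections import deque


def fill_enclosed(mask):
    """Set to 1 every 0-block that cannot reach the grid border through 0-blocks.

    mask: 2-D list of 0/1 values, 1 = block with a detected motion vector.
    """
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every empty block on the border of the grid.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # Breadth-first flood fill through 4-connected empty blocks.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Empty blocks never reached from the border are enclosed: relabel them.
    return [[1 if mask[y][x] == 1 or not outside[y][x] else 0 for x in range(w)]
            for y in range(h)]


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in fill_enclosed(ring):
    print(row)
```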

[0015] Furthermore, the moving-body region extraction means may have a function of removing, by color-space segmentation, the portion corresponding to the shadow of the object from the moving-body region extracted in block units (claim 8).

[0016] The storage means may be configured to store the shape data of the object as wireframe data, and the image estimation means may be configured to generate the virtual real image by texture mapping, in which real images of the object photographed from a plurality of directions are pasted onto the wireframe (claim 9).

[0017] In addition, reproduced video containing a plurality of virtual real images generated from the final estimated position and orientation information may be smoothed by a moving average over the time series of estimated positions (claim 10).
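
The moving-average smoothing of claim 10 might look like the following minimal sketch. The window length, the centered window, and the shortening of the window at the ends of the sequence are assumptions; the text only states that a moving average over the time series of estimated positions is used.

```python
def smooth_track(positions, window=3):
    """Centered moving average over a time series of position tuples.

    positions: list of equal-length tuples, one per time step
    window: odd window length (assumed); it is shortened at both ends
    """
    half = window // 2
    smoothed = []
    for t in range(len(positions)):
        lo, hi = max(0, t - half), min(len(positions), t + half + 1)
        span = positions[lo:hi]
        dims = len(positions[t])
        smoothed.append(tuple(sum(p[k] for p in span) / len(span) for k in range(dims)))
    return smoothed


track = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
print(smooth_track(track))
```

The virtual real images would then be regenerated from the smoothed positions so that the reproduced object moves without frame-to-frame jitter.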

[0018]

[Operation and Effects of the Invention] In the device of claim 1, the moving-body region extraction means extracts a moving-body region based on the two-dimensional motion vectors (optical flow) that appear between consecutive frames of the image sequence captured by the imaging means. The two-dimensional shape of the moving-body region extracted in this way depends on the position and orientation of the imaging means, and the spatial quantization means estimates the three-dimensional position and orientation of the object by searching for the imaging-means position and orientation at which that two-dimensional shape would be obtained. The image estimation means then generates a virtual real image of the object by three-dimensional structure modeling based on the three-dimensional position and orientation estimated by the spatial quantization means and the shape data stored in the storage means, that is, the shape data modeling the three-dimensional structure of the object.

[0019] As a result, three-dimensional video of the object as seen from different viewpoints can be generated freely even when the position and orientation of the imaging means are unknown. In particular, because the feature-point extraction by binarization required by conventional position and orientation recognition methods is unnecessary, such three-dimensional video can be generated without difficulty even when the device is used where noise such as brightness variation and light reflection is high, as in outdoor environments, or when the appearance of the object cannot be restricted because its image overlaps with non-target objects. Moreover, even when the object itself consists mainly of curved surfaces, the same three-dimensional video can be generated without difficulty as long as the three-dimensional structure of the object can be modeled, that is, as long as shape data modeling that structure is stored in the storage means.

[0020] In the device of claim 2, the spatial quantization means computes the error between the virtual real image generated by the image estimation means and the actually observed image using a predefined error evaluation function, and outputs as the final estimate the three-dimensional position and orientation at which the error evaluation function takes its minimum value. This improves the accuracy of the virtual real image finally generated by the image estimation means.

[0021] In the device of claim 3, the spatial quantization means sets a plurality of search-region centers from the major-axis and minor-axis lengths of the two-dimensional shape of the moving-body region extracted by the moving-body region extraction means, and finally estimates a single position and orientation based on data obtained by one-dimensional searches of these search regions performed in parallel. This prevents the search for the object from falling into a local optimum. Moreover, because extracting the major and minor axes from a two-dimensional shape is itself relatively simple, the search-region centers can be set easily, which reduces the amount of computation in the spatial quantization means.

[0022] In the device of claim 4, a portion of the object whose features are visually easy to grasp is set as a feature region, and the image estimation means, when generating the virtual real image, interpolates so that the pixel density outside the feature region is coarser than inside it. This reduces the amount of computation in the image estimation means.

[0023] In the device of claim 5, the moving-body region extraction means extracts and labels moving-body regions in block units using the block matching method. Because block matching is already established as a practical technique, the moving-body region can be extracted reliably.

[0024] In the device of claim 6, the moving-body region extraction means removes blocks that contain noise components from the moving-body region extracted in block units by the block matching method. As a result, the extraction is no longer adversely affected by noise caused by luminance unevenness or by noise in the image itself, and the accuracy of moving-body region extraction is improved.

[0025] In the device of claim 7, when the range enclosed by the moving-body region extracted in block units includes a region in which no motion vector exists, the moving-body region extraction means gives the blocks of that region the same label as the surrounding blocks. As a result, the accuracy of moving-body region extraction does not degrade even when there are blocks in which no motion vector could be detected, for example because of light reflected from the object.

[0026] In the device of claim 8, the moving-body region extraction means removes, by color-space segmentation, the portion corresponding to the shadow of the object from the moving-body region extracted in block units. The moving-body region can therefore be extracted accurately even when the object casts a shadow.

[0027] In the device of claim 9, the shape data of the object is stored in the storage means as wireframe data, and the image estimation means generates the virtual real image by texture mapping, in which real images of the object photographed from a plurality of directions are pasted onto the wireframe. The virtual real image can therefore be generated relatively simply.

[0028] In the device of claim 10, reproduced video containing a plurality of virtual real images generated from the final estimated position and orientation information is smoothed by a moving average over the time series of estimated positions. The virtual real images therefore move smoothly from one reproduced image to the next, and a high-quality object recognition result is obtained.

[0029]

[Embodiment] An embodiment in which the present invention is applied to a three-dimensional position recognition device that estimates the three-dimensional motion of an automobile from original images of a driving scene is described below with reference to the drawings.

[0030] FIG. 1 shows the basic configuration of the three-dimensional position recognition device of this embodiment as a combination of functional blocks; this configuration is outlined first. In FIG. 1, the A/D conversion unit 1 digitizes the image sequence captured by a CCD camera 2 (corresponding to the imaging means of the present invention) that generates, for example, RGB image signals, that is, the video source that supplies scenes containing the object. The digitized video source undergoes noise removal and filtering in the preprocessing unit 3. The moving-body extraction unit 4, which corresponds to the moving-body region extraction means of the present invention, extracts a moving-body region image from the output data (video source) of the preprocessing unit 3. The feature extraction unit 5 extracts features other than motion vectors (feature point positions, region segmentation, edge information, and so on) from the output data of the preprocessing unit 3.

[0031] The structure and feature modeling unit 6 has the function of modeling the three-dimensional structure of the object as a wireframe. The modeling is performed, for example, with human assistance, and the resulting wireframe three-dimensional structure data is registered (stored) in the model database 7 (corresponding to the storage means of the present invention). The wireframe information registered in the model database 7 is selectively accessed according to the object information extracted by the request interpretation section in the structure and feature modeling unit 6, and is stored in the object memory 8.

[0032] The texture mapping unit 9 performs texture mapping onto the wireframe of the object, taking the wireframe data stored in the object memory 8 as a reference and using the position and orientation information output from the spatial quantization unit 10. The image estimation unit 11, which together with the texture mapping unit 9 constitutes the image estimation means of the present invention, generates a virtual real image of the object based on this texture mapping. The spatial quantization unit 10 constitutes the spatial quantization means of the present invention. As described later, it estimates the position and orientation of the moving object using the moving-body region image extracted by the moving-body extraction unit 4, and it performs a position and orientation search by matching the estimated image, which is based on the virtual real image from the image estimation unit 11, against the moving-body region image extracted by the moving-body extraction unit 4. It outputs the result as the optimal position and orientation information. The attribute calculation unit 12 outputs motion, color, shape, and environment information based on the position and orientation information from the spatial quantization unit 10 together with a priori information about the object, structure information, color information, surrounding environment information, and so on.

[0033] The structure and feature modeling unit 6 has the function of accepting requests concerning the recognition of target objects and the environment through externally supplied instructions such as speech, language, or numerical data, and converting them into the operating modes and data formats used inside the device. When an unknown object is registered in the model database 7, the unit also classifies it according to its category, requests numerical input, searches for already registered objects, and transfers structure and image attribute data to the object memory 8. Furthermore, the structure and feature modeling unit 6 manages input data by having the structure data of unknown objects taught interactively.

[0034] The image compression and reproduction system 13 compresses image data such as the video source, virtually generated environment video, or composite images of the two obtained by the image synthesis unit 14, and stores them sequentially in the image storage unit 15. The communication interface 16 takes into the device compressed image information, model database information, or processing request commands obtained, for example, over an information communication network, and also transmits data to other devices. The D/A conversion unit 17 outputs the video data generated in the device as analog video, which can be reproduced on the monitor 18.

[0035] FIG. 2 shows the operation of the three-dimensional position recognition device of this embodiment, which is described below together with the configuration and operation of the related parts.

[0036] An outline of the operation shown in FIG. 2 is as follows. One frame each at time n (where n is a natural number) and at time (n+1) is read from the CCD camera 2 (steps S1, S2), and a moving-body region is extracted in block units based on per-block motion vectors computed from the image displacement between those frames (moving-body region extraction routine S3). Next, initial search values for the orientation and position parameters of the CCD camera 2 are set from the two-dimensional shape of the extracted moving-body region, a priori information obtained in advance, and the like; alternatively, a plurality of search candidate regions are set and a hierarchical parallel search of quantized space points is performed over those candidate regions (search-region limiting routine S4). Then, to reproduce the image of the object (here, an automobile) by model-based coding, the three-dimensional structure modeling routine S5 estimates a virtual real image by texture mapping and estimates the position and orientation at which the error evaluation function between this estimated image and the observed image takes its minimum value. The position and orientation information obtained in this way is determined for each time n, and a trajectory vector is computed from it (step S6). These operations are repeated.
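
As a small illustration of step S6, the sketch below derives trajectory vectors from the sequence of estimated positions. This excerpt does not define how the trajectory vector is computed, so the frame-to-frame displacement used here is an assumption, and `trajectory_vectors` is a hypothetical name.

```python
def trajectory_vectors(positions):
    """Frame-to-frame displacement of the estimated position.

    positions: list of equal-length tuples, one per time n.
    Treating the trajectory vector as this difference is an assumption.
    """
    return [tuple(b[k] - a[k] for k in range(len(a)))
            for a, b in zip(positions, positions[1:])]


estimated = [(0, 0, 10), (1, 0, 9), (3, 1, 9)]   # made-up (x, y, z) values
print(trajectory_vectors(estimated))
```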

[0037] Specifically, the moving-body region is extracted in the moving-body region extraction routine S3 as follows. The processing in routine S3 corresponds to the function of the moving-body region extraction means of the present invention.

[0038] First, assuming that the object (an automobile in this embodiment) is within the field of view of the CCD camera 2, the two-dimensional distribution of motion vectors (optical flow) for each block of pixels is detected from the image data of two consecutive frames taken from the image sequence captured by the camera 2. Here, the distribution of two-dimensional motion vectors for each pixel to be coded that makes up a block is detected by inter-frame processing based on the block matching method adopted in, for example, the MPEG video compression standard. This yields an optical flow such as that shown in FIG. 3(a).
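
Block matching between two frames can be sketched as below. This is a generic full-search version using the sum of absolute differences, not the implementation of the embodiment: the function names, the search radius, the cost measure, and the toy frames are all assumptions.

```python
def sad(prev, cur, y, x, dy, dx, bh, bw):
    """Sum of absolute differences between a block in prev and a shifted block in cur."""
    return sum(abs(prev[y + i][x + j] - cur[y + dy + i][x + dx + j])
               for i in range(bh) for j in range(bw))


def block_motion(prev, cur, bh, bw, radius):
    """Full-search block matching; returns {(block_row, block_col): (dy, dx)}."""
    h, w = len(prev), len(prev[0])
    field = {}
    for y in range(0, h - bh + 1, bh):
        for x in range(0, w - bw + 1, bw):
            # Start from zero motion so that ties keep the block stationary.
            best = (sad(prev, cur, y, x, 0, 0, bh, bw), 0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Skip displacements that would leave the frame.
                    if not (0 <= y + dy <= h - bh and 0 <= x + dx <= w - bw):
                        continue
                    cost = sad(prev, cur, y, x, dy, dx, bh, bw)
                    if cost < best[0]:
                        best = (cost, dy, dx)
            field[(y // bh, x // bw)] = (best[1], best[2])
    return field


# Toy frames: a textured 4x4 patch moves down 1 pixel and right 2 pixels.
prev = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
for i in range(4):
    for j in range(4):
        prev[4 + i][4 + j] = 1 + 4 * i + j
        cur[5 + i][6 + j] = 1 + 4 * i + j

flow = block_motion(prev, cur, bh=4, bw=4, radius=2)
print(flow[(1, 1)], flow[(0, 0)])
```

Blocks whose best displacement is nonzero are the candidates for the moving-body region; in real footage some of them are noise.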

【００３９】FIG. 3(a) shows the image of the second frame; A denotes the automobile that is the object, and the arrows indicate the motion vector of each block (in FIG. 3 the number of blocks is reduced to keep the drawing simple). Motion vectors that appear in FIG. 3(a) outside the automobile A are noise components. In ordinary block matching the block size is generally set to about 8 × 8 pixels surrounding the pixel to be encoded, but it can be changed as appropriate for the processing target. In this embodiment, where the object is an automobile, the block size is set to 24 (vertical) × 12 (horizontal) pixels.

【００４０】Next, each block is labeled according to where the optical flow of FIG. 3(a), obtained as described above, occurs. In FIG. 3(b), the blocks in which a motion vector was detected are shown grouped by hatched bands.

【００４１】With the block matching method, motion vectors do not necessarily arise only in the region where the moving object exists; uneven brightness in the image, noise in the image itself, reflections of light, and similar effects make some noise components unavoidable. The moving body region is therefore extracted after applying the following noise removal processes: threshold processing, isolated point removal, and reflection removal.

【００４２】(1) Threshold processing. Two threshold processing methods are provided and are used selectively according to the preconditions. Specifically, the first method is used when it is known that the automobile that is the object will not change direction abruptly, and when the approximate position of the object is to be detected; the second method is used in all other cases (a configuration that uses only the second method is also possible, for example when the computation speed is sufficiently high).

【００４３】In the first method, whether a vector counts as a motion vector is decided by thresholding. A threshold Vth is set in advance, and a block is regarded as a motion vector generation region and labeled only when the magnitude V of its motion vector satisfies V ≥ Vth. This method is computationally fast.
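The first method can be sketched in a few lines. This is only an illustrative sketch: the function name `label_moving_blocks`, the grid-of-tuples layout, and the use of the Euclidean norm for the magnitude V are assumptions, since the text does not say how V is computed.

```python
import math


def label_moving_blocks(flow, v_th):
    """Return a grid of booleans: True where the block counts as moving.

    flow is a 2D list of (vx, vy) motion vectors, one per block.
    A block is labeled only when its magnitude V satisfies V >= v_th.
    The Euclidean norm used for V is an assumption.
    """
    return [[math.hypot(vx, vy) >= v_th for (vx, vy) in row] for row in flow]
```

A small threshold keeps slow-moving blocks at the cost of admitting more noise vectors, which the later isolated-point and reflection steps are meant to clean up.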

【００４４】In the second method, when the error evaluation function between block pixels is computed during the block-matching calculation of the image shift, the motion vector is taken to be (0, 0) whenever the condition of expression (1) below holds. Expression (1) presupposes that the block range and the search range in the n-th and (n−1)-th frames are defined in the XY coordinates shown in FIGS. 4(a) and 4(b). Here (rij, gij, bij) are the RGB components of the pixel at position (i, j) in the n-th frame, (r'ij, g'ij, b'ij) are the RGB components of the pixel at position (i, j) in the (n−1)-th frame, (xs, ys) and (xe, ye) are the coordinates of the diagonal vertices of the pixel block, and ΔOF is a preset threshold.

【００４５】[0045]

【数１】 [Equation 1]

【００４６】Specifically, in this embodiment the block size is 24 × 12 pixels, the width of the region over which matching is calculated is dx = dy = 15 pixels, and the threshold is ΔOF = 2 × 24 × 12.
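Expression (1) itself is not reproduced in this text, so the following is only a sketch of one plausible reading: the block is treated as static when the sum of absolute RGB differences between the co-located blocks of the two frames falls below ΔOF. The function name, the strict inequality, and the choice of absolute differences are assumptions.

```python
def is_static_block(curr, prev, xs, ys, xe, ye, delta_of):
    """Return True when the co-located block barely changes between frames.

    curr and prev are 2D lists indexed [y][x] of (r, g, b) tuples for the
    n-th and (n-1)-th frames. (xs, ys) and (xe, ye) are the inclusive
    diagonal corners of the block. The sum-of-absolute-differences form
    and the '<' comparison are assumptions about expression (1).
    """
    total = 0
    for y in range(ys, ye + 1):
        for x in range(xs, xe + 1):
            total += sum(abs(a - b) for a, b in zip(curr[y][x], prev[y][x]))
    return total < delta_of
```

Under this reading, ΔOF = 2 × 24 × 12 corresponds to an average difference of 2 gray levels per pixel summed over the three channels of a 24 × 12 block.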

【００４７】(2) Isolated point removal. In the labeling information of the motion vector generation blocks obtained as described above, a block that stands in isolation is judged to result from erroneous shift detection, and its motion vector is set to (0, 0). For example, as shown in FIG. 5, the motion vectors of the blocks adjacent in the eight directions to the block under examination are checked, and if all of them are stationary blocks (motion vector (0, 0)), the block is regarded as an isolated point block and removed from the image processing target.
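The eight-neighbour check described above can be sketched as follows. The function name and data layout are illustrative, and treating blocks beyond the image border as stationary is an assumption the text does not address.

```python
def remove_isolated_blocks(flow):
    """Zero the motion vector of any moving block whose neighbours are all static.

    flow is a 2D list of (vx, vy) tuples, one per block. Blocks beyond the
    image border are treated as static (an assumption). The input grid is
    not modified; a cleaned copy is returned.
    """
    rows, cols = len(flow), len(flow[0])
    cleaned = [list(row) for row in flow]
    for y in range(rows):
        for x in range(cols):
            if flow[y][x] == (0, 0):
                continue
            neighbours = [
                flow[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(v == (0, 0) for v in neighbours):
                cleaned[y][x] = (0, 0)
    return cleaned
```

Because the neighbours are read from the original grid rather than the cleaned copy, two adjacent moving blocks protect each other and both survive.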

【００４８】(3) Reflection removal. Optical flow caused by reflection or gloss on a metal surface often differs from the flow of the surface as a whole in direction, temporal behavior, and so on. Exploiting this property, a block h that forms a hole in the moving body region, as shown in FIG. 3(b), is given the same label as its surroundings. The magnitude of the motion vector of block h is obtained by interpolation from the data of the surrounding blocks.

【００４９】After the above processing, the block region is extracted by labeling. For each row of blocks, the leftmost and rightmost blocks in which optical flow occurs, together with all blocks between them, are labeled. In this way, on the assumption that the moving object is an object distributed continuously in space (an automobile in this embodiment), a block-unit moving body region (the part enclosed by the thick line) containing the automobile A and its shadow is obtained, as shown in FIG. 3(c).
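The row-wise labeling can be sketched as below; the function name and the boolean grid representation are illustrative assumptions.

```python
def fill_rows(moving):
    """Label, in each block row, everything from the leftmost to the rightmost moving block.

    moving is a 2D list of booleans (True where optical flow was detected).
    Rows without any moving block stay entirely unlabeled.
    """
    mask = []
    for row in moving:
        hits = [i for i, is_moving in enumerate(row) if is_moving]
        if hits:
            left, right = hits[0], hits[-1]
            mask.append([left <= i <= right for i in range(len(row))])
        else:
            mask.append([False] * len(row))
    return mask
```

Filling between the two ends is what encodes the assumption that the object is spatially continuous: interior blocks with no detected flow are still counted as part of the object.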

【００５０】Shadows are then removed by color space partitioning. With block matching, depending on the lighting conditions, the shadow of the object is also extracted as part of the moving body region, as noted above. The shadow portion is therefore separated by a well-known color space clustering technique, and the moving body region shown in FIG. 3(d) is finally extracted. K-means clustering, the LBG method, and the like can be used for this clustering; here, exploiting the fact that shadows are achromatic in most cases, the seed vectors and thresholds for the RGB components are set empirically.

【００５１】Specifically, if the body color of the object is red, for example, the RGB values of the initial cluster centers for the above clustering are set, on a 256-level (8-bit) scale, to (R, G, B) = (255, 0, 0) for the body and (R, G, B) = (85, 85, 85) for the shadow.
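As a rough illustration of how these seeds might be used, the sketch below performs only the assignment step of a K-means style clustering: each pixel is assigned to the nearer of the two seed colors. The squared Euclidean distance in RGB space and all names are assumptions; the text also mentions empirically set thresholds, which are omitted here.

```python
# Initial cluster centers from the text (red car body, achromatic shadow).
SEEDS = {"body": (255, 0, 0), "shadow": (85, 85, 85)}


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def classify_pixel(rgb, seeds=SEEDS):
    """Return the name of the seed color nearest to rgb."""
    return min(seeds, key=lambda name: squared_distance(rgb, seeds[name]))
```

A full K-means or LBG run would alternate this assignment with recomputing each cluster center from its assigned pixels.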

【００５２】Because part of the object's image may be missing when the optical flow detection region is extracted in this way, the moving body region is in practice extracted one size larger. An average motion vector is also calculated for the labeled region obtained after the noise removal processing described above.

【００５３】In the search region limiting routine S4, it is determined whether the video source captured from the CCD camera 2 is the initial frame (that is, whether n = 1). If it is not the initial frame, the initial search values of the parameters (γ, θ, φ, r) of the CCD camera 2, described later, are set to the parameters (γ, θ, φ, r)n-1 obtained for the previous frame.

【００５４】This embodiment presupposes that the object is observed by the CCD camera 2 from all directions. Accordingly, as shown in FIG. 6, a dome-shaped space surrounding the object (automobile A) is assumed, and the appearance of the object as observed from every direction is estimated. To this end, the camera coordinate system relative to the object is expressed by six parameters, the position r, θ, φ and the attitude α, β, γ, and this parameter space is quantized as described later to estimate the virtual real image.

【００５５】When the video source captured from the CCD camera 2 is the initial frame, it is difficult to limit the search region, so the following processing is executed. FIG. 7 shows the flow of this processing. First, the size of the two-dimensional shape of the moving body region extracted in the moving body region extraction routine S3 is determined by calculation: as shown in FIG. 9, the major axis a, the minor axis b, and the inclination angle Ψ of the major axis a are calculated from the two-dimensional shape, and the search region for the distance r and the angles θ and φ is limited using these values. Estimating virtual real images for every appearance of the object observed from all directions would greatly increase the amount of computation, so the shape information of the moving body region is used to narrow down the candidate appearances. On this basis a plurality of search candidate regions Q1 to Qi are set as shown in FIG. 8, and one-dimensional searches of these candidate regions are performed in parallel.

【００５６】In narrowing down the candidate appearances for the search region as described above, the major axis a, the minor axis b, and the inclination angle Ψ of the major axis a of the moving body region can be calculated by either the first or the second shape determination method described below.

【００５７】(1) First shape determination method. As shown in FIG. 10, for the moving body region extracted in block units, the center of the vertical coordinates of the block group BL on the leftmost side and that of the block group BR on the rightmost side are calculated, and the two centers are joined by a straight line. Next, a line perpendicular to this line is drawn through its midpoint, and its intersections with the outer boundary of the moving body region are found. Of the two line segments obtained, the longer is taken as the major axis Lmax and the shorter as the minor axis Lmin, and the angle between the major axis and the horizontal axis is taken as Ψ0. The center of gravity of the moving body region is then the intersection of the major axis Lmax and the minor axis Lmin, but it does not in general coincide with the image center.

【００５８】(2) Second shape determination method. As shown in FIG. 11, a large number of straight lines at arbitrary angles to the horizontal axis are assumed to be drawn through the center of gravity of the moving body region extracted in block units. For each line, the distances from the centers of the blocks on the outer periphery of the moving body region are calculated, and the optimum angle Ψo is found by the least squares method with the sum of these distances as the evaluation function. The line segment corresponding to Ψo is taken as the major axis Lmax. A line perpendicular to the major axis is then drawn through the center of gravity, and the part of it contained in the moving body region is taken as the minor axis Lmin.
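One way to realize the least squares step is the closed-form orientation of the best-fit line through a fixed point, sketched below. This swaps the search over many candidate lines for an analytic minimum of the sum of squared perpendicular distances; whether the embodiment uses this closed form is not stated, and the function name is hypothetical.

```python
import math


def best_axis_angle(points, centroid):
    """Angle (radians) of the line through centroid that best fits points.

    points are (x, y) centers of the outer-periphery blocks. The angle
    minimizes the sum of squared perpendicular distances to the line and
    is measured from the x axis in the coordinate system of the inputs
    (with image coordinates, where y grows downward, the sign flips).
    """
    cx, cy = centroid
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Minimizing sxx*sin^2 + syy*cos^2 - 2*sxy*sin*cos gives this angle.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)
```

The result lies in the range (−π/2, π/2], which is sufficient because a line at angle Ψ and one at Ψ + π are the same axis.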

【００５９】The candidate regions for the camera position (r, θ, φ) are narrowed down using the major axis Lmax, the minor axis Lmin, and the angle Ψ obtained by either the first or the second shape determination method; that is, the search region for the distance r and the angles θ and φ is limited on the basis of this information. Several tens of candidate search centers are then set in the dome-shaped space shown in FIG. 6, and a one-dimensional search is run from each search center in parallel, finally yielding a single position and orientation.

【００６０】Specifically, the search region is limited, for example, by the following method. In the camera coordinate system relative to the object (automobile A), expressed by the six parameters of position r, θ, φ and attitude α, β, γ as in FIG. 6,

【００６１】with α = β = γ = 0 [deg] held fixed, and
r = rmin + iΔr (where i is a non-negative integer and r ≤ rmax)
θ = θmin + jΔθ (where j is a non-negative integer and θ ≤ θmax)
φ = φmin + kΔφ (where k is a non-negative integer and φ ≤ φmax)

【００６２】a virtual real image of the model of the automobile A is generated for each combination of the parameters r, θ, φ, and the features of its two-dimensional shape, namely the major axis Lmax(r, θ, φ), the minor axis Lmin(r, θ, φ), and the inclination angle Ψ(r, θ, φ) as shown in FIG. 10 or FIG. 11, are calculated by either the first or the second shape determination method described above. Here rmin, θmin, φmin and rmax, θmax, φmax are the minimum and maximum search values of the respective parameters. A correspondence table (r, θ, φ) → (Lmax, Lmin, Ψ)[r, θ, φ] is thus prepared in advance, and by consulting this table (r, θ, φ) can be obtained from (Lmax, Lmin, Ψ). In this embodiment, θmin, θmax, φmin, and φmax are chosen so that the correspondence table covers all directions. The values rmin and rmax are either determined from a coarse-scale correspondence table against (Lmax, Lmin, Ψ0) or set as appropriate for the application.

【００６３】To search the correspondence table, the sets (r0i, θ0i, φ0i) of (r, θ, φ) that satisfy expression (2) below are found, where i is a natural number running over the number of candidates N (i = 1, 2, …, N). In expression (2), L0max and L0min are the major and minor axis lengths calculated by the first or second shape determination method described above, and ΔLmax, ΔLmin, and ΔΨ are thresholds.

【００６４】[0064]

【数２】 [Equation 2]
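Expression (2) is not reproduced in this text. The sketch below assumes it is a set of three tolerance tests, one per shape feature, comparing each table entry with the observed (L0max, L0min, Ψ0); the function names, the dictionary layout of the table, and the strict inequalities are all assumptions. Angle wrap-around is ignored for brevity.

```python
def quantized_values(vmin, vmax, step):
    """Values vmin + i*step for non-negative integers i with value <= vmax."""
    count = int((vmax - vmin) / step + 1e-9)
    return [vmin + i * step for i in range(count + 1)]


def find_candidates(table, observed, tolerances):
    """Return the poses whose stored shape features match the observation.

    table maps (r, theta, phi) to (Lmax, Lmin, Psi) computed in advance.
    observed is (L0max, L0min, Psi0) and tolerances is
    (dLmax, dLmin, dPsi). A pose is kept when every feature differs from
    the observation by less than its tolerance (assumed form of (2)).
    """
    hits = []
    for pose, features in table.items():
        if all(abs(f - o) < t for f, o, t in zip(features, observed, tolerances)):
            hits.append(pose)
    return hits
```

Looser tolerances return more candidate poses, which is the dependence of N on ΔLmax, ΔLmin, and ΔΨ that the description mentions.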

【００６５】Next, as shown in FIG. 12, the attitude parameters α and β are obtained from the center of gravity (xG, yG) of the moving body region by the following expressions, where F is the focal length of the CCD camera 2, xG is the horizontal coordinate of the center of gravity of the moving body region with the image center as the origin, and yG is the corresponding vertical coordinate.

【００６６】
α = arctan(yG / F)
β = arctan{(−xG / F) × cos α}
As for the attitude parameter γ, in this embodiment it is assumed that there is almost no deviation in the γ direction between the attitudes of the automobile A and the CCD camera 2, and γ is therefore set to 0.
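The two expressions translate directly into code. In this sketch the function name is hypothetical, and it is assumed that xG, yG, and F are given in the same unit (for example pixels); angles are returned in radians.

```python
import math


def attitude_from_centroid(x_g, y_g, focal_length):
    """Return (alpha, beta, gamma) in radians from the region centroid.

    x_g and y_g are the centroid coordinates relative to the image
    center, and focal_length is F in the same unit (an assumption).
    gamma is fixed at 0 as in the embodiment.
    """
    alpha = math.atan(y_g / focal_length)
    beta = math.atan((-x_g / focal_length) * math.cos(alpha))
    gamma = 0.0
    return alpha, beta, gamma
```

A centroid at the image center gives α = β = 0, which is the condition under which the correspondence table was built.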

【００６７】The values (r0i, θ0i, φ0i) and (α, β, γ) obtained in this way are taken as candidate values for the position and orientation of the moving body relative to the CCD camera 2. The number of candidates N depends on the values of ΔLmax, ΔLmin, and ΔΨ; candidates are selected preferentially in ascending order of error according to the shape-feature error evaluation function described later, and N is determined according to the computing power of the processor that performs the arithmetic functions of the apparatus and the required real-time performance.

【００６８】A remaining issue is that the attitude parameters α and β at observation time cannot be restricted to the values α = β = 0 used when the correspondence table was created. However, within the angle of view of the CCD camera 2, Lmax, Lmin, and Ψ0 generally do not change much, and since only the accuracy of a first estimate for limiting the search region is required, this causes no particular problem. If computing power permits, the correspondence table can of course be calculated with the above values of α and β incorporated. In this embodiment, the search widths of the one-dimensional search for the parameters r, θ, φ, α, β, γ were set, for example, as shown in Table 1 below.

【００６９】[0069]

【表１】 [Table 1]

【００７０】The three-dimensional structure modeling routine S5 performs the following processing. The perspective transformation matrix is calculated with the viewpoint placed at a quantized spatial point in the dome-shaped parameter space expressed by the position and attitude parameters r, θ, φ, α, β, γ of the camera coordinate system.

【００７１】To estimate a virtual real image using this perspective transformation matrix when the object is an automobile, a wire frame modeling the three-dimensional structure of the object is prepared in advance, as shown for example in FIG. 13(a), and real images P1 to P4 photographed beforehand from the four directions front, rear, left, and right are pasted onto it by texture mapping. Any number of virtual real images in any arrangement viewed from any viewpoint can then be generated, as shown in FIG. 13(b). If the number of these virtual real images is too small, the visible surfaces cannot be generated for all viewing directions; if it is too large, problems arise in data volume and computation speed. The number is therefore chosen with both conditions in mind. The roof of the object is regarded as a flat plate, and a flat panel in a color sampled from the real images, for example, is pasted there.

【００７２】The texture of the object image needed to generate virtual real images in this way can be obtained either from object images photographed in advance or in real time from the observed images. The two methods are described separately below.

【００７３】(a) Obtaining the texture from object images photographed in advance. In this case the real images of the object can be photographed under good lighting conditions and in a good position and orientation, and the actual dimensions needed to create the wire frame can be measured at the same time, so the quality of the generated virtual real images can be improved. Ample time can also be taken to fit the wire frame to the images obtained; this fitting can be done at the following four levels.

【００７４】
(1) The perspective transformation matrix is changed while operating the computer interactively, and the fitting is done on screen by manual trial and error.
(2) While operating the computer interactively, only the extraction and correspondence of feature points are specified, and the perspective transformation is obtained by solving the perspective n-point problem on the computer.
(3) The perspective transformation matrix is obtained automatically by computer, from feature point extraction and correspondence determination through to the calculation of the perspective n-point problem, and the result is inspected and corrected manually.
(4) The procedure of (3) is carried out without human intervention (the calculated perspective transformation matrix is inspected and corrected solely by the computer's own autonomous evaluation).

【００７５】(b) Obtaining the texture in real time from the observed images. In this case the wire frame must be fitted to the observed image automatically. The wire frame structure is therefore first fitted to the extracted moving body region, and the fitting is done so that the following conditions are satisfied.

【００７６】
(1) As shown in FIG. 14(a), the outline of the wire frame WF must not extend beyond the moving body region MA.
(2) As shown in FIG. 14(b), the outline of the wire frame WF and the outline of the moving body region MA must not be extremely far apart.
(3) As shown in FIG. 14(c), the major axis, minor axis, and inclination angle of the wire frame WF and of the moving body region MA must not differ greatly from each other.

【００７７】Alternatively, the initial position and orientation may be obtained from a texture model of virtual real images based on images photographed in advance. An estimated image satisfying these conditions is as shown in FIG. 14(d). In this case, the wire frame fitted at the initial position and orientation is corrected by the computer's own autonomous evaluation.

【００７８】Although not shown in FIG. 2, this embodiment also applies variable-density interpolation (sampling) and color adaptation when generating the estimated image, in order to improve the quality of the generated virtual real image. These are described below.

【００７９】(a) Variable-density interpolation in estimated image generation. Parts of the object whose features are visually easy to grasp, judged by human perception, are set as feature regions. For the automobile A of FIG. 15, for example, the headlights, the license plate, and the wheels are set as feature regions, and the interpolation density is varied so that it is fine in these feature regions and coarse on the other patch surfaces. This reduces the amount of computation and achieves statistical smoothing. In this embodiment, an ordinary patch surface and a feature region are each sampled at about 16 × 16 points (or 20 × 20 points) when generating the virtual real image; because a feature region has a smaller area than an ordinary patch surface, its sampling density is in effect higher.

【００８０】(b) Color adaptation. In the following cases, the pixels within a patch surface are filled, according to the nature of the object, with a color sampled from another pixel region or with the average color of the object region:
(1) regions of a different color that appear as isolated points in the real images used to create the model;
(2) surfaces that were not observed in the images used to create the model.

【００８１】When the object moves through an unknown outdoor environment, as in this embodiment, its overlap with non-object parts of the image and its brightness vary, and its possible appearances cannot be restricted. Stable feature point extraction therefore becomes difficult, and the risk of falling into a so-called local optimum when searching the quantized spatial points of the position and attitude parameters increases. To avoid this risk, this embodiment improves the error evaluation function used in the three-dimensional structure modeling routine S5, when estimating the position and orientation at which the error evaluation function between the estimated image and the observed image takes its minimum value, as follows.

【００８２】Specifically, the evaluation combines the area error, the pixel value error, the two-dimensional position error of the feature regions, and an evaluation of the shape features. Each of the factors underlying this overall evaluation is first described individually.

【００８３】(a) Area error. As shown in FIG. 16, the estimated image given by the virtual real image is superimposed on the moving body region, and the error is evaluated with separate weights for the following cases.
(1) If the number of pixels of the estimated image that fall outside the moving body region is at or above a fixed value, the estimated image is rejected.
(2) For the part inside the moving body region but outside the estimated image region, the error area Ea (the area where the extracted moving body region and the virtual real image do not overlap) is calculated in pixels with an increased error weight wa. That is, Ea = wa Σ (number of pixels), with wa about 5.
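The two cases of the area error can be sketched as follows. The function name, the mask representation, and the use of `None` to signal rejection are illustrative assumptions; the rejection count is left as a parameter because the text only calls it a fixed value.

```python
def area_error(region_mask, estimate_mask, reject_threshold, wa=5):
    """Return the weighted area error Ea, or None if the estimate is rejected.

    Both masks are 2D lists of truthy/falsy values with the same shape:
    region_mask marks the extracted moving body region and estimate_mask
    marks the pixels covered by the virtual real image.
    """
    outside = 0    # estimate pixels lying outside the moving body region
    uncovered = 0  # region pixels not covered by the estimate
    for region_row, estimate_row in zip(region_mask, estimate_mask):
        for in_region, in_estimate in zip(region_row, estimate_row):
            if in_estimate and not in_region:
                outside += 1
            elif in_region and not in_estimate:
                uncovered += 1
    if outside >= reject_threshold:
        return None
    return wa * uncovered
```

The asymmetry is deliberate: spilling outside the region disqualifies a candidate outright, while leaving part of the region uncovered only adds a weighted penalty.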

【００８４】(b) Pixel value error. For the region where the real image (observed image) and the estimated image (virtual real image) overlap, the sum of absolute differences (or, for example, the sum of squared distances) of the pixel values is taken over pixels at the same position in the two images: Ev = wv Σ (pixel value error). Specifically, in this embodiment the pixel value error evaluation function of expression (3) below was used.

[0085]

[Equation 3]

[0086] The symbols in expression (3) have the following meanings.
P: the set of sample pixels on each patch surface projected onto the screen of the virtual real image.
N: the number of pixels included in P.
r1xy, g1xy, b1xy: the RGB component values of pixel (x, y) in the moving body region extracted from the observed image.
r2xy, g2xy, b2xy: the RGB component values of pixel (x, y) sampled in the virtual real image.
Wxy: a weighting coefficient that depends on the coordinates (x, y).
The weighting coefficient Wxy takes one of the following three values.

[0087] First case: the pixel at (x, y) lies on a feature region in the virtual real image → w1
Second case: the pixel at (x, y) does not lie on a feature region in the virtual real image and is included in the moving body region extracted from the observed image → w2
Third case: the pixel at (x, y) does not lie on a feature region in the virtual real image and is not included in the moving body region extracted from the observed image → w3

[0088] Setting the weighting coefficients in this way has the following effects. With the first case, the error is minimized when the feature regions of the model overlap the corresponding parts of the moving body region extracted from the observed image. With the second and third cases, the pixel value error Ev becomes large when the virtual real image extends outside the moving body region and is minimized when the virtual real image fits inside the moving body region. In the present embodiment, the weighting coefficients w1 to w3 of the pixel value error evaluation function were set, for example, as shown in Table 2 below.

[0089]

[Table 2]
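The three-case weighting can be illustrated with a short Python sketch. Because expression (3) and the values of Table 2 are not reproduced in this text, the form used here (a weighted sum of absolute RGB differences divided by N) and the weight values passed in by the caller are assumptions, as are the function names and the sample tuple layout.

```python
def pixel_weight(on_feature, in_region, w1, w2, w3):
    """Pick Wxy according to the three cases described above."""
    if on_feature:
        return w1          # first case: pixel lies on a feature region
    return w2 if in_region else w3  # second / third case


def pixel_value_error(samples, w1, w2, w3):
    """Assumed form of Ev: weighted mean of absolute RGB differences.

    samples is a list of (observed_rgb, virtual_rgb, on_feature, in_region)
    tuples, one per sample pixel in the set P.
    """
    total = 0.0
    for observed_rgb, virtual_rgb, on_feature, in_region in samples:
        diff = sum(abs(o - v) for o, v in zip(observed_rgb, virtual_rgb))
        total += pixel_weight(on_feature, in_region, w1, w2, w3) * diff
    return total / len(samples)  # normalization by N is an assumption
```

Choosing w3 larger than w2 penalizes virtual-image pixels that fall outside the moving body region, which matches the effect described for the second and third cases.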

[0090] (c) Error in the two-dimensional position of the feature regions
If the two-dimensional positions of the feature regions can be detected by pixel processing, the position error Ep between corresponding feature regions is calculated as one of the evaluation functions, as shown in FIG. 16. For example, if the sum of squared distances is used as the error measure, Ep is given by the following expression (4).

[0091]

[Equation 4]

[0092] Here, Nf is the number of feature regions considered; uvi and vvi are the positions, in (u, v) coordinates, of the feature point in the i-th feature region of the virtual real image; and umi and vmi are the positions, in (u, v) coordinates, of the feature point in the i-th feature region of the moving body region.
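A minimal Python sketch of the sum-of-squared-distances measure follows. Expression (4) itself is not reproduced in this text, so any weighting factor it may contain is omitted; the function name and the point-list representation are assumptions.

```python
def feature_position_error(virtual_points, observed_points):
    """Sum of squared distances between corresponding feature points.

    virtual_points[i] is (uvi, vvi) and observed_points[i] is (umi, vmi);
    both lists must have Nf entries in the same order.
    """
    if len(virtual_points) != len(observed_points):
        raise ValueError("feature point lists must have the same length")
    return sum(
        (uv - um) ** 2 + (vv - vm) ** 2
        for (uv, vv), (um, vm) in zip(virtual_points, observed_points)
    )
```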

[0093] (d) Evaluation of shape features
The shape features are evaluated with attention to the major axis and minor axis of the moving body region, and the evaluation function Es is given by the following expression (5).

[0094]

[Equation 5]

[0095] where
L1max: length of the major axis calculated from the moving body region extracted from the observed image
L1min: length of the minor axis calculated from the moving body region extracted from the observed image
Ψ1: inclination angle of the major axis calculated from the moving body region extracted from the observed image
L2max: length of the major axis calculated from the virtual real image
L2min: length of the minor axis calculated from the virtual real image
Ψ2: inclination angle of the major axis calculated from the virtual real image
wsi: weighting coefficients (i = 1, 2, 3)
The weighting coefficients are set, for example, as follows, where Δmax, Δmin, and ΔΨ are thresholds.
ws1 = 0 (if 0 ≤ L1max − L2max < Δmax); ws1 > 0 (otherwise)
ws2 = 0 (if 0 ≤ L1min − L2min < Δmin); ws2 > 0 (otherwise)
ws3 = 0 (if the absolute value of (Ψ1 − Ψ2) is smaller than ΔΨ); ws3 > 0 (otherwise)
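The threshold rule for the weights can be sketched in Python as below. Since expression (5) is not reproduced in this text, the combination used here (each weight multiplied by the absolute difference of the corresponding quantity) is only an assumed form, and the `penalties` argument stands in for the positive weight values, which are likewise not given.

```python
def shape_feature_error(observed, virtual, thresholds, penalties):
    """Assumed form of Es built from the weight rule described above.

    observed   = (L1max, L1min, psi1) from the moving body region
    virtual    = (L2max, L2min, psi2) from the virtual real image
    thresholds = (delta_max, delta_min, delta_psi)
    penalties  = positive values used for ws1..ws3 when a threshold is exceeded
    """
    l1max, l1min, psi1 = observed
    l2max, l2min, psi2 = virtual
    d_max, d_min, d_psi = thresholds
    p1, p2, p3 = penalties
    # A weight is zero only while the virtual image is slightly smaller than
    # (or equal to) the observed region, or the angles nearly agree.
    ws1 = 0.0 if 0 <= l1max - l2max < d_max else p1
    ws2 = 0.0 if 0 <= l1min - l2min < d_min else p2
    ws3 = 0.0 if abs(psi1 - psi2) < d_psi else p3
    return (ws1 * abs(l1max - l2max)
            + ws2 * abs(l1min - l2min)
            + ws3 * abs(psi1 - psi2))
```

Note that the rule is asymmetric: a virtual image whose axis is longer than the observed one always receives a positive weight, while one that is shorter is penalized only beyond the threshold.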

[0096] Incorporating this shape feature error evaluation function makes it possible to avoid the state in which the virtual real image becomes far smaller than the moving body region (the state shown in FIG. 14(b)). In the present embodiment, the weighting coefficients ws1 to ws3 and the thresholds Δmax, Δmin, and ΔΨ of the shape feature error evaluation function were set, for example, as shown in Table 3 below.

[0097]

[Table 3]

[0098] To obtain an error evaluation function E that integrates the four evaluation functions Ea, Ev, Ep, and Es described above, a linear combination such as the following is used, for example.

[0099]

[Equation 6] E = αa·Ea + αv·Ev + αp·Ep + αs·Es … (6)

[0100] Here αa, αv, αp, and αs are arbitrary coefficients. Setting a coefficient to 0 makes it easy to include or exclude the corresponding evaluation function Ea, Ev, Ep, or Es, but it is desirable to set at least αv and αs to values other than 0.
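Expression (6) translates directly into code. In the Python sketch below the function name and the default coefficient values of 1.0 are assumptions; the description leaves the coefficients arbitrary.

```python
def total_error(ea, ev, ep, es,
                alpha_a=1.0, alpha_v=1.0, alpha_p=1.0, alpha_s=1.0):
    """Linear combination E of the four evaluation functions, expression (6).

    Setting a coefficient to 0 drops the corresponding term; the text
    recommends keeping alpha_v and alpha_s non-zero.
    """
    return alpha_a * ea + alpha_v * ev + alpha_p * ep + alpha_s * es
```

For example, when feature regions cannot be detected in the observed image, the position term can be disabled by passing `alpha_p=0.0`.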

[0101] Using the error evaluation function E obtained as above, a one-dimensional search of the quantized spatial points is executed, with each of the multiple candidate values obtained in the search region limiting routine S4 as a search center. The order in which the parameters are searched is changed according to the actual application; when the object is an automobile, as in the present embodiment, the search is performed in the order shown in FIG. 17. As described earlier, this one-dimensional search is performed in parallel for the several tens of search center candidates set in the dome-shaped space surrounding the object (the automobile), and the position and orientation with the smallest evaluation function value is output as the final estimate. In the present embodiment, the interpolation density per patch surface in virtual real image generation is, for example, about 16 × 16 points, and about 10 quantized points per parameter need to be searched.
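The search described above can be sketched as a coordinate-wise search over quantized points, repeated for each candidate center. The Python below is a sketch under stated assumptions: the parameter order, the step sizes, the dictionary representation of a pose, and the function names are hypothetical (the actual order is the one in FIG. 17), and the candidates are processed sequentially here rather than in parallel.

```python
def one_dimensional_search(error_fn, center, steps, order, points_per_param=10):
    """Search each parameter in turn over quantized points around the center."""
    best = dict(center)
    best_err = error_fn(best)
    half = points_per_param // 2
    for name in order:
        base = best[name]  # search around the value held when this axis starts
        for k in range(-half, half + 1):  # about 10 quantized points
            trial = dict(best)
            trial[name] = base + k * steps[name]
            err = error_fn(trial)
            if err < best_err:
                best, best_err = trial, err
    return best, best_err


def search_candidates(error_fn, centers, steps, order):
    """Run the search from every candidate center and keep the smallest error."""
    results = (one_dimensional_search(error_fn, c, steps, order) for c in centers)
    return min(results, key=lambda result: result[1])
```

Starting from several centers is what guards against a local optimum: a search started near the wrong pose converges to a poor minimum, and the final `min` discards it.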

[0102] The search region for the third and subsequent frames, on the other hand, can be set based on an estimate of the movement of the center of gravity. That is, when the video source of the k-th frame (k ≥ 3) is obtained, the moving body region is extracted from the images of the (k-1)-th and k-th frames. The average (xOF, yOF) of the block-wise motion vectors is calculated from the image displacement between these frames, and the search centers of the orientation parameters α and β are set as follows.

[0103]
αc[k] = arctan(tan αE[k-1] − yOF / F[k-1])
βc[k] = arctan{(tan βE[k-1] − xOF / F[k-1]) × cos α}
where
αE[k-1]: estimate of the orientation parameter α at time k-1
βE[k-1]: estimate of the orientation parameter β at time k-1
F[k-1]: focal length of the camera at time k-1

[0104] This means that the amount by which the center of gravity of the motion vectors (optical flow) moves within one frame period is treated as the temporal change (Δα, Δβ) of the optical axis deviation and is used for the initial estimate of the three-dimensional position in the k-th frame. For the position parameters (r, θ, φ), the estimate (r[k-1], θ[k-1], φ[k-1]) obtained from the position and orientation search at time k-1 is used as the search center for the k-th frame.
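The update of the search centers for α and β can be written out as a small Python function. This is a sketch under stated assumptions: angles are taken to be in radians, the flow average and the focal length are taken to share the same units, and the α inside cos α, which the formula does not index, is assumed to be the newly computed αc[k]. The function name is hypothetical.

```python
import math


def next_search_center(alpha_prev, beta_prev, x_of, y_of, focal_length):
    """Search centers (alpha_c[k], beta_c[k]) from the previous estimates.

    alpha_prev, beta_prev: estimates at time k-1, in radians
    x_of, y_of: average block motion vector between frames k-1 and k
    focal_length: camera focal length at time k-1, same units as the flow
    """
    alpha_c = math.atan(math.tan(alpha_prev) - y_of / focal_length)
    # Assumption: the un-indexed alpha in the formula is alpha_c[k].
    beta_c = math.atan(
        (math.tan(beta_prev) - x_of / focal_length) * math.cos(alpha_c)
    )
    return alpha_c, beta_c
```

The position parameters need no such update, since the previous estimate (r[k-1], θ[k-1], φ[k-1]) is reused directly as the search center.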

[0105] In the initial frames, the one-dimensional search was performed in parallel over multiple search candidate regions. From the k-th frame onward, however, the search region can be limited to one by the center-of-gravity movement estimate described above. The evaluation function is therefore calculated in the same way as expression (6), and, for example in the same manner as the means described in Japanese Patent Application No. 5-247229, a one-dimensional search is performed independently for each parameter to find the search point that minimizes the evaluated error. The optimum position and orientation is thereby output for each frame.

[0106] In the image reproduction method described so far, the position and orientation search, apart from the center-of-gravity movement estimate, is performed independently for each still frame. The continuously reproduced images can therefore look jerky, as if the automobile were traveling on a bumpy road. To obtain a smooth motion estimate close to the actual travel of the automobile, the time series of estimated positions and orientations can be smoothed, for example by a moving average such as the following expression.

[0107]

[Equation 7]
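Moving-average smoothing of the pose time series can be illustrated as follows. Expression (7) is not reproduced in this text, so the window shape used here (a centered window truncated at the ends of the sequence, with equal weights) is an assumption, as are the function name and the tuple representation of a pose.

```python
def moving_average(series, half_window=1):
    """Smooth a time series of pose tuples with a centered moving average.

    series is a list of equal-length tuples such as (r, theta, phi, alpha,
    beta, gamma). Near the ends the window is truncated to available frames.
    """
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        # Average each pose component separately across the window.
        smoothed.append(tuple(sum(col) / len(window) for col in zip(*window)))
    return smoothed
```

Plain averaging of angles is only safe while the values stay away from the wrap-around point (for example ±180°); a practical implementation would need to handle that case.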

[0108] By matching the estimated images obtained through the above processing against the moving body region obtained from the real images, and using the evaluation function and the quantized spatial point search method described above, estimates of the three-dimensional motion information (r, θ, φ, α, β, γ) concerning the position and orientation of, for example, an automobile in a continuous driving scene can be obtained as a sequence of estimated images. If moving-average smoothing is then applied to these position and orientation estimates and the motion of the automobile is reproduced with virtual real images by model-based reproduction, a smooth motion estimate corresponding to the continuous driving scene is obtained.

[0109] In short, with the configuration of the present embodiment, the appearance of the object is estimated as a virtual real image from the two-dimensional shape of the moving body region, which is extracted based on the two-dimensional motion vectors that appear between two temporally consecutive frames. Even when the position and orientation of the camera are unknown, three-dimensional images of the moving object seen from different viewpoints can therefore be generated freely from the same video source. In particular, unlike conventional position and orientation recognition methods such as those in JP-A-6-262568 and JP-A-6-258028, no feature point extraction by binarization is required. Consequently, such three-dimensional images can be generated without difficulty even in places where noise such as brightness variation and light reflection is large, as in outdoor environments, and even when the appearance of the moving object cannot be constrained because non-target objects overlap it in the image. The technique can therefore be applied to vehicle-following driving, remote operation of robots, and the like.

[0110] Even when the object itself consists mainly of curved surfaces, three-dimensional images of the object can be generated in the same way as long as its three-dimensional structure can be modeled, that is, as long as shape data modeling that structure is stored in the model database 7.

[0111] The configuration of the present embodiment is not limited to estimating the three-dimensional motion of a moving object; it also makes it possible to recognize what orientation the object is in. Once the position and orientation of the moving object have been estimated by matching against the three-dimensional model, the position and orientation of fine parts of the object (for an automobile, the mirror positions and the like) can also be estimated. Furthermore, if the shape, position, and orientation of an obstacle are known, the three-dimensional motion of the moving object can be estimated even when it is partly hidden by that obstacle.

[0112] If the object region can be delimited by color information or the like, the position and orientation of a stationary object can also be detected. In addition, an image can be encoded as objects after recognizing the presence, position and orientation, motion, color, and so on of each object. This allows application to the construction of three-dimensional video databases of objects and to model-based encoding and decoding devices for ultra-low bit rate image transmission, such as videophones over public telephone lines.

[0113] The error between the virtual real image estimated as described above and the actual observed image is calculated with a predefined error evaluation function, specifically one that integrates the evaluation functions for the area error, the pixel value error, the error in the two-dimensional position of the feature regions, and the shape features, and the position and orientation at which this error evaluation function takes its minimum value is output as the final estimate. The accuracy of the estimated virtual real image can therefore be increased.

[0114] Furthermore, multiple search region centers are set from the lengths of the major and minor axes of the two-dimensional shape of the extracted moving body region, and a single position and orientation is finally estimated from the data obtained by searching these search regions one-dimensionally in parallel. This prevents the search for the object from falling into a local optimum. Moreover, because extracting the major and minor axes from a two-dimensional shape is itself a relatively simple process, the search region centers can be set easily and the amount of computation can be reduced.

[0115] In addition, parts of the object whose features are easy to grasp visually (for an automobile, the headlights, license plate, wheels, and the like) are set as feature regions, and when the virtual real image is generated, interpolation is performed so that the pixel density outside the feature regions is coarser than inside them. This also reduces the amount of computation.

[0116] When the moving body region is extracted, the block matching method, which is already established as a practical technique, is used to extract and label the moving body region in block units, so the region can be extracted reliably. Blocks containing noise components, such as uneven luminance in the image or noise in the image itself, are removed from the moving body region extracted in block units, so the extraction is no longer adversely affected by noise and its accuracy is improved.

[0117] When the area enclosed by the moving body region extracted in block units contains blocks in which no motion vector exists, those blocks are given the same label as the surrounding blocks. The extraction accuracy of the moving body region therefore does not degrade even when there are blocks in which a motion vector could not be detected, for example because of light reflected from the object. Also, the part corresponding to the shadow of the object is removed from the moving body region extracted in block units by color space segmentation, so the moving body region can be extracted accurately even when the object casts a shadow.
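Relabeling enclosed blocks amounts to filling holes in the block label map. One way to sketch this in Python, not necessarily the method used in the embodiment, is to flood-fill the unlabeled blocks reachable from the image border and treat every unlabeled block that is not reached as enclosed. The 4-connectivity and the function name are assumptions.

```python
from collections import deque


def fill_enclosed_blocks(labels):
    """Label blocks without a motion vector that are enclosed by moving blocks.

    labels is a 2D list of 0/1 values, one per block (1 = moving body block).
    Returns a new map in which enclosed 0-blocks are set to 1.
    """
    rows, cols = len(labels), len(labels[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every unlabeled block on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not labels[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected unlabeled blocks (connectivity is assumed).
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not labels[nr][nc] and not outside[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Anything not reachable from the border belongs to the moving body.
    return [[1 if labels[r][c] or not outside[r][c] else 0
             for c in range(cols)] for r in range(rows)]
```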

[0118] Furthermore, the shape data of the object is stored as wireframe data, and the virtual real image is generated by texture mapping, in which real images of the object photographed from multiple directions are pasted onto the wireframe. The virtual real image can therefore be generated relatively easily.

[0119] Also, moving-average smoothing according to the time series of estimated positions is applied to the reproduced video, which contains the multiple virtual real images generated from the final estimated position and orientation information. When, for example, a driving scene of an automobile is reproduced, the virtual real images therefore move smoothly from one reproduced frame to the next, and a high-quality object recognition result is obtained.

[0120] In the embodiment above, the search region was limited by referring to a correspondence table (r, θ, φ) → (Lmax, Lmin, Ψ)[r, θ, φ]. Instead, a method that combines the following means (1) to (3) can also be adopted.

[0121] (1) Limiting θ and φ
With the distance from the object to the viewpoint set to a fixed reference value r0 (for example, 10 m), the perspective transformation of the wireframe model of the object is computed while scanning (θ, φ) over all directions, and the major axis a0 and minor axis b0 are obtained using, for example, the first shape determination method (see FIG. 10) or the second shape determination method (see FIG. 11). From the result, a correspondence table is computed between (θ, φ) and the triple (c0, a0, Ψ0), which combines the ratio c0 = b0 / a0 with the major axis a0 and its inclination angle Ψ0.

[0122] Next, using this correspondence table, a shape feature evaluation function such as the following is calculated from the ratio c = b / a of the major axis a and minor axis b obtained from the observed image and the inclination angle Ψ of the major axis, likewise obtained from the observed image.

[Equation 8]

[0123] If the result satisfies a predetermined criterion, for example Esc ≤ ΔEsc (where ΔEsc is a threshold), the θ and φ of that entry in the correspondence table are regarded as search candidate values.

[0124] (2) Limiting r
Using the major axis a0 for the candidate values of θ and φ obtained in (1) and the length a of the major axis obtained from the observed image, a rough estimate rE of r is obtained from the following expression.

[0125] rE = r0 · (a0[r0, θ, φ] / a)
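The table lookup for (θ, φ) and the distance estimate rE can be combined in one short Python sketch. Expression (8) is not reproduced in this text, so the weighted squared-difference form of Esc used here is an assumption, as are the weights `w_c` and `w_psi`, the dictionary layout of the correspondence table, and the function name. Only rE = r0 · (a0 / a) and the example r0 of 10 m come from the description.

```python
def select_candidates(table, a, b, psi, threshold, r0=10.0, w_c=1.0, w_psi=1.0):
    """Return (theta, phi, r_estimate) for table entries that match the image.

    table maps (theta, phi) to (c0, a0, psi0), precomputed at distance r0.
    a, b, psi are the major axis, minor axis, and major-axis inclination
    measured in the observed image.
    """
    c = b / a
    candidates = []
    for (theta, phi), (c0, a0, psi0) in table.items():
        # Assumed form of Esc: weighted squared differences of ratio and angle.
        esc = w_c * (c - c0) ** 2 + w_psi * (psi - psi0) ** 2
        if esc <= threshold:
            # Apparent size scales inversely with distance: rE = r0 * a0 / a.
            candidates.append((theta, phi, r0 * a0 / a))
    return candidates
```

Each returned triple can then serve as one search center for the one-dimensional search, replacing a lookup in a full (r, θ, φ) table.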

[0126] (3) Limiting α and β
The center of gravity of the extracted moving body region does not, in general, coincide with the image center. The values of the center deviations α and β are therefore obtained, for example, by a method such as those in JP-A-6-262568 and JP-A-6-258028. In practice, however, as long as the object is within the screen, computing the correspondence table with α = β = 0 causes no problem. It is therefore sufficient to compute a correspondence table that incorporates the values of α and β calculated dynamically from the observed image only when there is computing capacity to spare.

[Brief description of drawings]

FIG. 1 is an overall functional block diagram showing an embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation.

FIG. 3 is a schematic diagram for explaining the moving body region extraction operation.

FIG. 4 is an XY coordinate diagram for explaining the threshold processing used when optical flow is detected by the block matching method.

FIG. 5 is a schematic diagram for explaining the process of removing isolated blocks from a moving body region extracted in block units.

FIG. 6 is a schematic diagram for explaining the parameters required for the quantized spatial point search operation.

FIG. 7 is a flowchart showing the process for limiting the search region in the initial frames.

FIG. 8 is a schematic diagram for explaining the search region in the initial frames.

FIG. 9 is a diagram for explaining a method of determining the two-dimensional shape of a moving body region.

FIG. 10 is a diagram (part 1) for explaining a method of extracting the major axis and minor axis of a moving body region.

FIG. 11 is a diagram (part 2) for explaining a method of extracting the major axis and minor axis of a moving body region.

FIG. 12 is an XY coordinate diagram for explaining an example of obtaining the orientation parameters from the center of gravity of a moving body region.

FIG. 13 is a diagram schematically showing a method of generating a virtual real image by texture mapping.

FIG. 14 is a schematic diagram showing examples of the relationship between the wireframe and the moving body region.

FIG. 15 is a schematic diagram for explaining the positions of the feature regions.

FIG. 16 is a schematic diagram representing the content of the error evaluation function.

FIG. 17 is a flowchart showing the one-dimensional search method.

[Explanation of symbols]

2: CCD camera (imaging means); 4: moving body extraction unit (moving body region extracting means); 7: model database (storage means); 9: texture mapping unit; 10: spatial quantization unit (spatial quantization means); 11: image estimation unit (image estimating means).

Continuation of front page
(56) References: JP-A-6-259560 (JP, A); JP-A-6-247246 (JP, A); JP-A-5-143737 (JP, A); JP-A-6-262568 (JP, A); JP-A-6-258028 (JP, A); JP-A-5-73681 (JP, A); JP-A-5-73682 (JP, A)
(58) Fields investigated (Int. Cl.⁷, DB name): G01B 11/00 - 11/30; G06T 1/00; G06T 7/00; G06T 7/60

Claims

(57) [Claims]

1. A three-dimensional object recognition device based on image data, which has imaging means for imaging an object of known three-dimensional shape and recognizes the three-dimensional position and orientation of the object by moving image processing based on image data of a plurality of consecutive frames in a series of images captured by the imaging means, the device comprising: moving body region extracting means for extracting a moving body region based on two-dimensional motion vectors appearing between screens of the plurality of frames; storage means for storing shape data modeling the three-dimensional structure of the object; spatial quantization means for estimating the position and orientation of the object by searching for the position and orientation of the imaging means at which the two-dimensional shape of the moving body region extracted by the moving body region extracting means is obtained; and image estimating means for generating a three-dimensionally modeled virtual real image of the object based on the position and orientation of the object estimated by the spatial quantization means and the shape data stored in the storage means.

2. The spatial quantizing means is provided so as to calculate an error between a virtual real image generated by the image estimating means and an actual observed image by a predefined error evaluation function. The three-dimensional object recognizing device based on image data according to claim 1, wherein the three-dimensional position and orientation in a state where the error evaluation function takes the minimum value is output as a final estimation result.

3. The three-dimensional object recognition device based on image data according to claim 1, wherein the spatial quantization means sets a plurality of search region centers from the lengths of the major axis and minor axis of the two-dimensional shape of the moving body region extracted by the moving body region extracting means, and finally estimates one position and orientation based on data obtained by searching these search regions one-dimensionally in parallel.

4. The three-dimensional object recognition device based on image data according to claim 1, wherein parts of the object whose features are easy to grasp visually are set as feature regions, and the image estimating means, when generating the virtual real image, performs interpolation so that the pixel density outside the feature regions is coarser than that of the feature regions.

5. The three-dimensional object recognition device based on image data according to claim 1, wherein the moving body region extracting means is configured to extract and label the moving body region in block units using a block matching method.

6. The moving body region extracting means has a function of removing a block containing a noise component from the moving body region extracted in block units.
A three-dimensional object recognition device based on the described image data.

7. The moving body region extracting means, when a region surrounded by moving body regions extracted on a block-by-block basis includes a region in which no motion vector exists, labels the block in the same manner as surrounding blocks. The three-dimensional object recognition device based on image data according to claim 5, which is provided with a function.

8. The moving body region extracting means has a function of removing a portion corresponding to the shadow of the object from the moving body region extracted in block units by color space division processing. A three-dimensional object recognition device based on the image data according to claim 5.

9. The storage means stores shape data of the target object as wire frame data, and the image estimating means captures real images of the target object taken from a plurality of directions with respect to the wire frame. The three-dimensional object recognition device based on image data according to claim 1, wherein the three-dimensional object recognition device is configured to generate a virtual real image by performing texture mapping of pasting.

10. The three-dimensional object recognition device based on image data according to claim 1, wherein smoothing by a moving average according to the time series of estimated positions is applied to a reproduced video including a plurality of virtual real images generated from the final estimated position and orientation information.