JPH11250252A - Three-dimensional object recognizing device and method therefor

Info

Publication number: JPH11250252A
Application number: JP10049520A
Authority: JP
Inventors: Shigeru Kimura; 茂木村; Katsuyuki Nakano; 勝之中野; Hiroyoshi Yamaguchi; 博義山口; Tetsuya Shinpo; 哲也新保; Eiji Kawamura; 英二川村; Masato Ogata; 正人緒方
Original assignee: SAIBUAASU KK; Japan Steel Works Ltd; Komatsu Ltd; Technical Research and Development Institute of Japan Defence Agency; Mitsubishi Precision Co Ltd
Current assignee: SAIBUAASU KK; Japan Steel Works Ltd; Komatsu Ltd; Technical Research and Development Institute of Japan Defence Agency; Mitsubishi Precision Co Ltd
Priority date: 1998-03-02
Filing date: 1998-03-02
Publication date: 1999-09-17
Anticipated expiration: 2018-03-02
Also published as: JP2881193B1

Abstract

PROBLEM TO BE SOLVED: To obtain a working field of view suited to the work situation, without being limited to the working field of view uniquely determined by the position or attitude of an image sensor. SOLUTION: A virtual image #21, which a virtual image sensor 21 would obtain when imaging an object 50 from a desired position with a desired field of view, is set in place of the image #1 of a first imaging means 1. For a selected pixel P21 specified by a position (i, j) in the virtual image #21, the position coordinates (Xk, Yk) (k = 1, ..., N) of the corresponding candidate points Pk in the images #1 to #N of the plurality of imaging means 1 to N are generated for each assumed distance z_n from the virtual image sensor 21 to the point 50a on the object 50 corresponding to the selected pixel P21. The similarity of the image information at the position coordinates (Xk, Yk) (k = 1, ..., N) of the generated corresponding candidate points Pk is then calculated. The assumed distance z_nx at which the calculated similarity is highest is taken as the distance from the virtual image sensor 21 to the point 50a on the object 50 corresponding to the selected pixel P21, and this distance z_nx is obtained for each selected pixel.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an apparatus and a method for recognizing an object. In particular, it relates to an apparatus and a method suited to cases in which three-dimensional information about a recognition target object, such as the distance to it, is calculated by the principle of triangulation from image information obtained by a plurality of imaging means arranged at different positions, and the object is recognized using that three-dimensional information.

[0002]

[Description of the Related Art] Measurement by stereo vision has long been widely known as a method of measuring the distance to a recognition target object from the images captured by image sensors serving as imaging means.

[0003] This measurement method is useful for obtaining three-dimensional information such as distance and depth from two-dimensional images.

[0004] Specifically, two image sensors are arranged, for example side by side on the left and right, and the distance to the object is measured by the principle of triangulation from the parallax that arises when the two sensors image the same recognition target. The pair of left and right image sensors is called a stereo pair, and because the measurement uses two sensors the method is called binocular stereo vision.

[0005] FIG. 20 shows the principle of binocular stereo vision.

[0006] As shown in FIG. 20, binocular stereo vision requires measuring the parallax (disparity) d, which is the difference between the positions of the corresponding points P₁ and P₂ in image #1 (obtained on imaging surface 1a) and image #2 (obtained on imaging surface 2a) of the left and right image sensors 1 and 2. In general, the parallax d and the distance z to a point 50a in three-dimensional space (a point on the recognition target object 50) satisfy the following relation.

[0007]

z = F·B / d … (1)

Here, B is the distance between the left and right image sensors 1 and 2 (the baseline length), and F is the focal length of lens 31 of image sensor 1 and of lens 32 of image sensor 2. Since the baseline length B and the focal length F are normally known, the distance z is uniquely determined once the parallax d is known. The parallax d can be calculated by searching, point by point, for which point in image #1 corresponds to which point in image #2. Hereinafter, the point P₂ in image #2 that corresponds to a point P₁ in image #1 is called the "corresponding point", and searching for it is called "corresponding point search". When a distance to the object 50 is assumed, the point in image #2 that corresponds, under that assumed distance, to the point P₁ in image #1 is called a "corresponding candidate point". In measurement by binocular stereo vision, if the corresponding point search detects the true corresponding point P₂ for the true distance z, the true parallax d has been obtained, and the true distance z to the point 50a on the object 50 has thereby been measured.
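As an illustration of equation (1), the following minimal Python sketch computes the distance from a measured parallax. The function name and the choice of units (focal length and parallax in pixels, baseline length in metres) are assumptions made for this example and do not appear in the specification.

```python
def distance_from_disparity(focal_length_px: float,
                            baseline_m: float,
                            disparity_px: float) -> float:
    """Equation (1): z = F * B / d.

    Units are an assumption of this sketch: F and d in pixels, B in metres,
    so the returned distance z is in metres.
    """
    if disparity_px <= 0:
        # Zero parallax corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: F = 800 px, B = 0.125 m, d = 16 px.
print(distance_from_disparity(800.0, 0.125, 16.0))  # 6.25
```

Because z is inversely proportional to d, a fixed error in the measured parallax produces a larger distance error for far points than for near ones.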

[0008] Executing this processing for every pixel of image #1 produces an image (a distance image) in which distance information is assigned to every selected pixel of image #1. The processing of searching for the corresponding point and obtaining the true distance is described in detail below with reference to FIGS. 21, 22, and 23.

[0009] FIG. 23 is a block diagram showing the configuration of a conventional distance measuring device (object recognition device) based on binocular stereo vision.

[0010] The reference image input unit 101 receives the reference image #1 captured by image sensor 1, which serves as the reference when calculating the parallax d (distance z). The image input unit 102 receives image #2 of image sensor 2, which is the image containing the corresponding points for points in the reference image #1.

[0011] Next, the processing in the corresponding candidate point coordinate generation unit 103, the local information extraction unit 104, the similarity calculation unit 105, and the distance estimation unit 106 is described with reference to FIG. 21. First, the corresponding candidate point coordinate generation unit 103 stores, for each pixel of the reference image #1 and for each assumed distance z_n, the position coordinates of the corresponding candidate point in image #2 of image sensor 2, and generates the position coordinates of the corresponding candidate point by reading them out.

[0012] That is, a pixel P₁ specified by a position (i, j) is selected from the reference image #1 of the reference image sensor 1, and a distance z_n to the recognition target object 50 is assumed. The position coordinates (X₂, Y₂) of the corresponding candidate point P₂ in image #2 of the other image sensor 2 that correspond to this assumed distance z_n are then read out.
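The lookup performed by the corresponding candidate point coordinate generation unit 103 can be sketched as a precomputed table. The specification does not state how the stored coordinates are derived. The sketch below assumes an idealized, rectified camera pair with parallel optical axes, in which the candidate in image #2 lies on the same row as the selected pixel and is shifted horizontally by the parallax F·B/z_n of equation (1). The function name, the dictionary layout, and the sign of the shift are likewise hypothetical.

```python
def build_candidate_table(width, height, assumed_distances,
                          focal_length_px, baseline):
    """Map (i, j, n) -> (X2, Y2) for every pixel and every assumed distance z_n.

    Assumption of this sketch: rectified, parallel cameras, so the candidate
    keeps the row j and moves left by the parallax d = F * B / z_n.
    """
    table = {}
    for j in range(height):
        for i in range(width):
            for n, z_n in enumerate(assumed_distances):
                d = focal_length_px * baseline / z_n
                table[(i, j, n)] = (i - d, float(j))
    return table


# Hypothetical 4x2 image, F = 100 px, B = 0.5, assumed distances 25 and 50.
table = build_candidate_table(4, 2, [25.0, 50.0], 100.0, 0.5)
print(table[(3, 1, 0)])  # (1.0, 1.0)
print(table[(3, 1, 1)])  # (2.0, 1.0)
```

Storing the coordinates in advance, as the text describes, trades memory (one entry per pixel per assumed distance) for not having to recompute the geometry at measurement time.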

[0013] Next, the local information extraction unit 104 extracts local information based on the position coordinates of the corresponding candidate point generated in this way by the corresponding candidate point coordinate generation unit 103. Here, local information means the image information of the corresponding candidate point obtained by taking the pixels in its neighborhood into account.

[0014] The similarity calculation unit 105 then calculates the similarity between the local information F₂ of the corresponding candidate point P₂ obtained by the local information extraction unit 104 and the image information of the selected pixel P₁ of the reference image.

[0015] Specifically, the region around the selected pixel of the reference image #1 and the region around the corresponding candidate point of image #2 of image sensor 2 are compared by pattern matching, and the similarity is calculated. In other words, processing that stabilizes the similarity is performed.

[0016] That is, as shown in FIG. 21, a window WD₁ centered on the position coordinates of the selected pixel P₁ of the reference image #1 is cut out, a window WD₂ centered on the position coordinates of the corresponding candidate point P₂ of image #2 of image sensor 2 is cut out, and the similarity between them is calculated by pattern matching the windows WD₁ and WD₂ against each other. This pattern matching is performed for each assumed distance z_n. The same pattern matching is carried out for every pixel of the reference image #1, taking each in turn as the selected pixel.

[0017] FIG. 22 is a graph showing the relationship between the assumed distance z_n and the reciprocal of the similarity Qs. When the window WD₁ of FIG. 21 is matched against the window WD'₂ centered on the position of the corresponding candidate point for the assumed distance z'_n, a large value of the reciprocal of the similarity Qs is obtained, as shown in FIG. 22 (the similarity is low). When the window WD₁ of FIG. 21 is matched against the window WD₂ centered on the position of the corresponding candidate point for the assumed distance z_nx, the reciprocal of the similarity Qs is small, as shown in FIG. 22 (the similarity is high).

[0018] The similarity is generally obtained as the absolute value of the difference, or the sum of squared differences, between the image information of the selected pixel and that of the corresponding candidate point being compared. From the relationship thus obtained between the assumed distance z_n and the reciprocal of the similarity Qs, the point at which the similarity is highest (the point at which Qs takes its minimum value) is identified, and the assumed distance z_nx corresponding to that point is finally estimated to be the true distance (the most probable distance) to the point 50a on the recognition target object 50.

[0019] That is, the corresponding candidate point P₂ corresponding to the assumed distance z_nx in FIG. 21 is taken to be the corresponding point for the selected pixel P₁. In this way, the distance estimation unit 106 identifies, from among the similarities obtained by successively varying the assumed distance z_n for the selected pixel of the reference image #1, the one that is highest, and the assumed distance z_nx giving that highest similarity is estimated to be the true distance and is output.
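The search over assumed distances described in paragraphs [0016] to [0019] can be sketched as follows. This is an illustration under stated assumptions, not the device's actual implementation: images are plain nested lists of grey levels, the sum of squared differences plays the role of the reciprocal of the similarity Qs, the candidate position is derived from an idealized rectified geometry (a horizontal shift of F·B/z_n rounded to whole pixels) instead of a stored table, and all function and parameter names are hypothetical. Bounds checking of the reference window is omitted for brevity.

```python
def window(image, cx, cy, half):
    """Flatten the (2*half+1)-square window centred on (cx, cy)."""
    return [image[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def ssd(a, b):
    """Sum of squared differences; stands in for Qs (small means similar)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def estimate_distance(ref, other, i, j, assumed_distances, fb, half=1):
    """Return the assumed distance z_nx whose candidate window best matches WD1.

    fb is the product F * B (assumed known); the candidate column follows the
    assumed rectified geometry x2 = i - F * B / z_n.
    """
    wd1 = window(ref, i, j, half)
    best_z, best_q = None, float("inf")
    for z_n in assumed_distances:
        x2 = i - round(fb / z_n)
        if x2 - half < 0 or x2 + half >= len(other[0]):
            continue  # candidate window would leave image #2
        q = ssd(wd1, window(other, x2, j, half))
        if q < best_q:
            best_z, best_q = z_n, q
    return best_z


# Synthetic pair: image #2 shows the same pattern shifted left by 2 pixels.
ref = [[x * x + y for x in range(12)] for y in range(5)]
other = [[(x + 2) * (x + 2) + y for x in range(12)] for y in range(5)]
print(estimate_distance(ref, other, i=6, j=2,
                        assumed_distances=[50.0, 25.0, 12.5], fb=50.0))  # 25.0
```

In the synthetic example the three assumed distances correspond to shifts of 1, 2, and 4 pixels; only the 2-pixel shift gives a zero difference, so 25.0 is selected, mirroring the minimum of the curve in FIG. 22.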

[0020] The description above covers binocular stereo vision, but three or more image sensors may also be used. Distance measurement (object recognition) using three or more image sensors is referred to as distance measurement (object recognition) by multi-view stereo vision.

[0021] Multi-view stereo vision has come into frequent use recently because it reduces the ambiguity of corresponding points and so improves reliability markedly. A distance measuring device (object recognition device) based on multi-view stereo vision divides the plurality of image sensors into stereo pairs of two image sensors each, and applies the principle of binocular stereo vision described above to each stereo pair in turn.

[0022] That is, one image sensor is selected from the plurality of image sensors as the reference, and stereo pairs are formed between this reference image sensor and each of the other image sensors. The binocular stereo processing is then applied to each stereo pair. As a result, the distance from the reference image sensor to a recognition target within the field of view of the reference image sensor is measured.

[0023] The relationship among stereo pairs in conventional multi-view stereo is explained with reference to FIG. 24. In the binocular stereo shown in FIG. 21, only one corresponding image, #2, is paired with the reference image #1. In multi-view stereo there are multiple stereo pairs: the reference image #1 and image #2 of image sensor 2, the reference image #1 and image #3 of image sensor 3, ..., and the reference image #1 and image #N of image sensor N.

[0024] Before processing based on each stereo pair of a corresponding image and the reference image, factors such as mounting distortion of the cameras serving as image sensors must be taken into account, and correction by calibration is normally performed beforehand. The processing of searching for corresponding points by multi-view stereo and obtaining the true distance is described in detail below with reference to FIGS. 24 and 25, which correspond to FIGS. 21 and 23 of the binocular stereo case described above.

[0025] FIG. 24 illustrates the configuration of a conventional distance measuring device (object recognition device) based on multi-view stereo vision. The image sensors 1, 2, 3, ..., N are assumed to be arranged at predetermined intervals in the horizontal, vertical, or diagonal direction (for convenience of explanation, FIG. 24 shows them arranged from left to right at a constant interval).

[0026] The reference image input unit 201 in FIG. 26 receives the reference image #1 captured by image sensor 1, which serves as the reference when calculating the parallax d (distance z). The image input unit 202 receives image #2 of image sensor 2, which is the image containing the corresponding points for points in the reference image #1. Likewise, the other image input units 203 and 204 receive image #3 of image sensor 3 and image #N of image sensor N, respectively, each corresponding to the reference image #1.

[0027] The corresponding candidate point coordinate generation unit 205 stores, for each selected pixel P₁ of the reference image #1 and for each assumed distance z_n, the position coordinates of the corresponding candidate point P₂ in image #2 of image sensor 2, of the corresponding candidate point P₃ in image #3 of image sensor 3, and of the corresponding candidate point P_N in image #N of image sensor N, and generates the position coordinates of each corresponding candidate point by reading them out.

[0028] That is, a pixel P₁ specified by a position (i, j) is selected from the reference image #1 of the reference image sensor 1, and a distance z_n to the recognition target object 50 is assumed. The position coordinates (X₂, Y₂) of the corresponding candidate point P₂ in image #2 of image sensor 2 that correspond to this assumed distance z_n are then read out.

[0029] Similarly, the position coordinates (X₃, Y₃) of the corresponding candidate point P₃ in image #3 of image sensor 3 that correspond to the selected pixel P₁ of the reference image #1 and the assumed distance z_n are read out, as are the position coordinates (X_N, Y_N) of the corresponding candidate point P_N in image #N of image sensor N. The same readout is performed while the assumed distance z_n is varied in turn, and again while the selected pixel is varied in turn. In this way the corresponding candidate point coordinate generation unit 205 outputs the position coordinates (X₂, Y₂) of the corresponding candidate point P₂, (X₃, Y₃) of the corresponding candidate point P₃, and (X_N, Y_N) of the corresponding candidate point P_N.

[0030] Next, the local information extraction unit 206 extracts local information based on the position coordinates (X₂, Y₂) of the corresponding candidate point P₂ generated in this way by the corresponding candidate point coordinate generation unit 205. Similarly, the local information extraction unit 207 obtains the local information of the corresponding candidate point P₃ based on the position coordinates (X₃, Y₃) of P₃ in image #3 of image sensor 3, and the local information extraction unit 208 obtains the local information of the corresponding candidate point P_N based on the position coordinates (X_N, Y_N) of P_N in image #N of image sensor N, both sets of coordinates having been generated by the corresponding candidate point coordinate generation unit 205.

[0031] The similarity calculation unit 209 then calculates the similarity between the local information F₂ of the corresponding candidate point P₂ obtained by the local information extraction unit 206 and the image information of the selected pixel P₁ of the reference image #1. Specifically, the region around the selected pixel of the reference image #1 and the region around the corresponding candidate point of image #2 of image sensor 2 are compared by pattern matching, and the similarity is calculated.

[0032] That is, as shown in FIG. 24, a window WD₁ centered on the position coordinates of the selected pixel P₁ of the reference image #1 is cut out, a window WD₂ centered on the position coordinates of the corresponding candidate point P₂ of image #2 of image sensor 2 is cut out, and the similarity between them is calculated by pattern matching the windows WD₁ and WD₂ against each other. This pattern matching is performed for each assumed distance z_n.

[0033] FIG. 2(1) is a graph showing the relationship between the assumed distance z_n and the reciprocal of the similarity Qs₁ for the stereo pair consisting of the reference image sensor 1 and image sensor 2.

[0034] When the window WD₁ of FIG. 24 is matched against the window WD'₂ centered on the position coordinates of the corresponding candidate point for the assumed distance z'_n, a large value of the reciprocal of the similarity Qs is obtained, as shown in FIG. 2(1) (the similarity is low). When the window WD₁ of FIG. 24 is matched against the window WD₂ centered on the position coordinates of the corresponding candidate point for the assumed distance z_nx, the reciprocal of the similarity Qs is small, as shown in FIG. 2(1) (the similarity is high).

[0035] Similarly, the similarity calculation unit 210 performs pattern matching between the window WD₁ centered on the position coordinates of the selected pixel P₁ of the reference image #1 and a window WD₃ centered on the position coordinates of the corresponding candidate point P₃ of image #3 of image sensor 3, and calculates their similarity. Because the pattern matching is performed for each assumed distance z_n, the relationship between the assumed distance z_n and the reciprocal of the similarity Qs₂ shown in FIG. 2(2) is obtained for this stereo pair (image sensor 1 and image sensor 3) as well.

[0036] Similarly, the similarity calculation unit 211 performs pattern matching between the window WD₁ centered on the position coordinates of the selected pixel P₁ of the reference image #1 and a window WD_N centered on the position coordinates of the corresponding candidate point P_N of image #N of image sensor N, and calculates their similarity. Because the pattern matching is performed for each assumed distance z_n, the relationship between the assumed distance z_n and the reciprocal of the similarity Qs_N shown in FIG. 2(N) is obtained for this stereo pair (image sensor 1 and image sensor N) as well.

[0037] Finally, the relationships between the assumed distance z_n and the reciprocal of the similarity obtained for the individual stereo pairs are summed at each assumed distance. Then, as shown in FIG. 2 (fusion result), the point at which the similarity is highest (the point at which the summed reciprocal of the similarity is at its minimum) is identified from the relationship between the assumed distance z_n and the summed reciprocals, and the assumed distance z_nx corresponding to that point is finally estimated to be the true distance (the most probable distance) to the point 50a on the recognition target object 50. This processing is performed for every pixel of the reference image #1, taking each in turn as the selected pixel.

[0038] In this way, the distance estimation unit 212 identifies, from among the summed similarity values obtained by successively varying the assumed distance z_n, the one with the highest summed similarity, and the assumed distance z_nx giving that value is estimated to be the true distance and is output. Because this distance estimation is performed for every pixel of the reference image #1, an image (a distance image) in which distance information is assigned to every selected pixel of the reference image #1 is generated.
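The fusion step of paragraphs [0037] and [0038] can be sketched as below: the reciprocal-of-similarity curves of the individual stereo pairs are summed at each assumed distance, and the assumed distance with the smallest sum is taken as z_nx. The function name, the list-of-lists data layout, and the numerical values are hypothetical and chosen only for illustration.

```python
def fuse_and_estimate(assumed_distances, qs_per_pair):
    """Sum the per-pair Qs curves and return (z_nx, fused curve).

    qs_per_pair[k][n] is the reciprocal of the similarity of stereo pair k
    at assumed distance assumed_distances[n] (smaller means more similar).
    """
    fused = [sum(column) for column in zip(*qs_per_pair)]
    best = min(range(len(fused)), key=fused.__getitem__)
    return assumed_distances[best], fused


# Hypothetical curves for three stereo pairs at three assumed distances.
distances = [10.0, 20.0, 30.0]
qs = [
    [5.0, 1.0, 4.0],   # pair (sensor 1, sensor 2): minimum at 20.0
    [3.0, 2.0, 0.5],   # pair (sensor 1, sensor 3): ambiguous, minimum at 30.0
    [6.0, 1.5, 5.0],   # pair (sensor 1, sensor N): minimum at 20.0
]
print(fuse_and_estimate(distances, qs))  # (20.0, [14.0, 4.5, 9.5])
```

In this made-up data the second pair on its own would select 30.0, but the summed curve selects 20.0, which illustrates how combining pairs can reduce the ambiguity of corresponding points mentioned in paragraph [0021].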

[0039]

[Problems to Be Solved by the Invention] The present invention seeks to solve the problems listed below.

[0040] (1) The working field of view is fixed and cannot be set to a desired working field of view.

[0041] As described above, in the conventional measurement methods, whether binocular or multi-view stereo, a reference image sensor is provided to obtain the image that serves as the reference, and distance is measured on the basis of that reference image. Consequently, a distance image can be generated only as seen from the observation field of view (the field of view obtained from an actual image sensor) determined by the position at which the reference image sensor is installed and its inclination when mounted.

[0042] That is, once the image sensors are fixed, the observation field of view is uniquely determined, and a distance image seen from a desired observation field of view could not be generated. In practice, however, there is a demand to change the observation field of view according to the work situation when recognizing an object, and to work with a desired working field of view (the field of view needed for the work).

[0043] For example, as shown in FIG. 26, when work is performed with an image sensor group 11 mounted on a vehicle 60, there is a demand to work not only with the working field of view ahead of the vehicle 60 as seen from the image sensors fixed to it but also, depending on the situation, while observing an obstacle ahead of the vehicle 60 with a working field of view that looks down on it from above.

[0044] The first invention of the present invention was made in view of these circumstances. Its first problem to be solved is to make it possible to obtain a working field of view suited to the work situation, without being limited to the working field of view determined by the position and attitude of the image sensors.

[0045] (2) The type of optical system of the image sensors is restricted.

[0046] Conventional measurement by multi-view stereo vision is premised on the principle of binocular stereo, and similarity must be compared on the same basis across all stereo pairs. The image sensors making up a multi-view stereo system must therefore have exactly identical optical characteristics.

[0047] Consequently, image sensors with different lens characteristics cannot be used together, nor can image sensors with different sensitivities.

[0048] When distance is measured outdoors, for example, there is generally a demand to improve measurement capability by using a wide-angle lens and a telephoto lens together according to the distance being measured, or to use two kinds of image sensor with different sensitivities together according to the ambient light, namely a monochrome image sensor of ordinary sensitivity and an infrared image sensor.

[0049] However, because a conventional multi-view stereo system requires the optical characteristics of its image sensors to match exactly, two types of image sensors with different optical characteristics could not be used at the same time.

[0050] In this case it would be possible to prepare two image sensor groups, one of monochrome image sensors of ordinary sensitivity and one of infrared image sensors, set a reference image sensor for each group, and measure with each. Because the reference image sensors are mounted at different positions, however, the measurement results of the two groups could not be compared with each other to reach an overall judgment. Nor is it physically possible to mount the reference image sensors at exactly the same position.

[0051] The second invention has been made in view of these circumstances. Its second object is to improve measurement accuracy in multi-view stereo measurement by setting a common virtual field of view, so that the measurement results of two or more image sensor groups with different imaging conditions can be judged together.

[0052] (3) The combination of image sensors is restricted.

[0053] Conventional multi-view stereo measurement presupposes the principle of binocular stereo, and similarity must be compared for every stereo pair with respect to the same reference image sensor. Once the reference image sensor is chosen, the set of stereo pairs is therefore fixed. Similarity must also be compared on the same basis for every stereo pair. It is consequently not possible, depending on the situation, to select stereo pairs with different references in turn, to modify the similarity information obtained from a particular stereo pair, or to adopt a different way of calculating similarity.

[0054] There is a demand to select particular pairs of image sensors (or combinations of three or more) in turn according to the constantly changing work field of view, and thereby to perform fine-grained measurement that follows changes in the situation, but the conventional approach cannot meet this demand.

[0055] The third invention has been made in view of these circumstances. Its third object is, in multi-view stereo measurement, to perform fine-grained measurement that follows changes in the situation by selecting the image sensors of a particular stereo pair (or a combination of three or more image sensors) in turn according to the constantly changing work field of view, rather than measuring with fixed stereo pairs.

[0056] (4) Recognizing an object from distance measurement results is inefficient.

[0057] The usual conventional method of object recognition first measures the distance to objects over the entire space and then analyzes this distance information to recognize features such as the distance to an obstacle, its shape, and its size.

[0058] However, when the scene in the field of view is uniform, as when the vehicle 60 is traveling on the road 61 as shown in FIG. 26, measuring distance over the entire space is inefficient.

[0059] That is, when the location of the road 61 is roughly known in advance and obstacles to the vehicle 60 can be expected to lie directly on the road 61, measuring distance in the air and in the region below the road as well is wasteful and makes the amount of computation enormous. Furthermore, object recognition processing is required after the distance measurement, and this processing is complicated even when merely recognizing an obstacle on a simple plane. Thus the conventional measurement method not only fails to identify objects efficiently but may also produce recognition errors.

[0060] The fourth invention has been made in view of these circumstances. Its fourth object is to enable measurement to be performed more efficiently and without misrecognition when the shape or the like of the object to be recognized is specified in advance.

[0061]

[Means and Effects for Solving the Problems] To achieve the first object, the main invention of the first invention is an object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the images captured by the other imaging means, corresponding to a selected pixel in the image captured by one of the imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized on the basis of the distance obtained for each selected pixel. The device comprises: virtual visual field information setting means for setting, in place of the image captured by the one imaging means, a virtual image captured by a virtual imaging means when the object is imaged at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the images captured by the plurality of imaging means, corresponding to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating the similarity between the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for taking the assumed distance at which the similarity calculated by the similarity calculating means is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

[0062] With this configuration, as shown in FIGS. 1 and 4, the virtual visual field information setting unit 305 sets a virtual image #21, captured by a virtual imaging means 21 when the object 50 is imaged at a desired position with a desired field of view, in place of the image #1 of the one imaging means 1.

[0063] The corresponding candidate point coordinate generating unit 306 then generates, for each assumed distance z_n, the position coordinates (X_k, Y_k) (k = 1, ..., N) of the corresponding candidate points P_k in the images #1 to #N of the plurality of imaging means 1 to N that correspond to the selected pixel P₂₁ specified by the position (i, j) in the set virtual image #21.

[0064] The similarity calculating means 311 to 313 then calculate the similarity between the image information at the position coordinates (X_k, Y_k) (k = 1, ..., N) of the generated corresponding candidate points P_k.

[0065] The distance estimating means 314 then obtains the assumed distance z_nx at which the sum of the calculated similarities is largest as the distance from the virtual imaging means 21 to the point 50a on the object 50 corresponding to the selected pixel P₂₁.
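
The procedure in paragraphs [0062] to [0065] can be sketched in code. The following is a minimal illustration only, not the patented implementation. It assumes pinhole cameras described by hypothetical parameters (intrinsic matrix `K`, rotation `R`, translation `t`), grayscale images, nearest-pixel sampling, and the negative variance of the sampled intensities as a stand-in for the similarity measure, which this passage does not specify. All function and variable names are illustrative.

```python
import numpy as np

def backproject(K, R, t, i, j, distance):
    """World point at `distance` along the ray through pixel (column i, row j)
    of a camera with intrinsics K and world-to-camera pose (R, t)."""
    ray = np.linalg.solve(K, np.array([i, j, 1.0]))
    ray /= np.linalg.norm(ray)
    return R.T @ (ray * distance - t)

def project(K, R, t, X):
    """Pixel coordinates (x, y) of world point X, or None if X is behind the camera."""
    p = K @ (R @ X + t)
    if p[2] <= 0:
        return None
    return p[0] / p[2], p[1] / p[2]

def estimate_distance(virtual_cam, real_cams, images, i, j, distances):
    """For one selected pixel of the virtual image, try each assumed distance,
    sample the candidate points in every real image, and keep the distance
    whose samples agree best (assumed score: negative variance)."""
    best_z, best_score = None, -np.inf
    for z in distances:
        X = backproject(*virtual_cam, i, j, z)
        samples = []
        for (K, R, t), img in zip(real_cams, images):
            xy = project(K, R, t, X)
            if xy is None:
                continue
            col, row = int(round(xy[0])), int(round(xy[1]))
            if 0 <= row < img.shape[0] and 0 <= col < img.shape[1]:
                samples.append(float(img[row, col]))
        if len(samples) < 2:
            continue  # not enough views to compare at this distance
        score = -np.var(samples)
        if score > best_score:
            best_z, best_score = z, score
    return best_z
```

Because the virtual camera contributes only geometry and no pixel data, the score compares the real images with one another, which matches the statement that the similarity is computed between the candidate points themselves. Changing `virtual_cam` changes the viewpoint without touching the real cameras.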

[0066] Thus, according to the present invention, a virtual imaging means 21 with a desired observation field of view can be set arbitrarily in place of the one imaging means 1 that serves as the reference image sensor. A work field of view suited to the work situation can therefore be obtained without being limited to the work field of view determined by the mounting positions and attitudes of the image sensors 1 to N. In particular, even at a location where it is physically difficult to mount an image sensor, or where a harsh environment makes an image sensor difficult to use, a range image can be obtained with the same field of view as if an actual reference image sensor were placed there.

[0067] The following effects are also obtained.

[0068] That is, according to the present invention, the observation field of view that serves as the reference for spatial recognition and object recognition can be changed freely, so subsequent processing can be performed efficiently.

[0069] With the conventional measurement method, by contrast, the work field of view actually obtained is limited to the observation field of view of one of the actual image sensors 1 to N. To recognize or judge an object on the basis of the acquired observation field of view, it was therefore necessary to obtain a work field of view suited to that purpose, which required a coordinate transformation or a physical change in the orientation of the image sensor. The recognition algorithm could also become complicated.

[0070] According to the present invention, the work field of view can be changed freely to suit subsequent processing such as recognition and judgment. Such subsequent processing can therefore be performed simply and efficiently, without complicated steps such as physically changing the orientation of the image sensor.

[0071] Also, according to the present invention, when local information is extracted from each image sensor on the basis of the virtual image #21, that is, the virtual field of view, each image sensor is in effect calibrated at the same time. No separate processing need be added solely for calibration, and calibration can be carried out efficiently.

[0072] Also, according to the present invention, the virtual image #21, that is, the virtual field of view, can be changed arbitrarily in computation without actually changing the mounting position of the image sensors. This amounts to "electronic panning", so no mechanism for mechanically changing the position and attitude of the image sensors is needed, and a highly reliable system free of mechanical failure can be built. This electronic panning can also be applied to change the virtual field of view in step with the movement of a person's line of sight. With such a system, an operator at a remote location can generate the images that would be seen by looking around freely at the observation point.

[0073] To achieve the second object, the main invention of the second invention is an object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the images captured by the other imaging means, corresponding to a selected pixel in the image captured by one of the imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized on the basis of the distance obtained for each selected pixel. In this device, the plurality of imaging means are classified into at least two types of imaging means groups with different conditions for imaging the object, and the device comprises: virtual visual field information setting means for setting, in place of the image captured by the one imaging means, a virtual image captured by a virtual imaging means when the object is imaged at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the images captured by each imaging means group, corresponding to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating, for each imaging means group, the similarity between the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for obtaining a fused similarity from the similarities calculated for the respective imaging means groups by the similarity calculating means, taking the assumed distance at which the fused similarity is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

[0074] With this configuration, taking two imaging means groups as an example, the plurality of imaging means 11″ are classified, as shown in FIG. 7(b), into two types with different conditions for imaging the object 50: a first imaging means group 141, 142, ... and a second imaging means group 151, 152, ....

[0075] Then, as shown in FIGS. 1 and 9, the virtual visual field information setting means 505 sets a virtual image #21, captured by the virtual imaging means 21 when the object 50 is imaged at a desired position with a desired field of view, in place of the images #141 and #151 of the one imaging means 141 and 151 of the respective groups.

[0076] The corresponding candidate point coordinate generating unit 506 then generates, for each magnitude of the assumed distance z_n, the position coordinates (X_k, Y_k) of the corresponding candidate points P_k in the images #141, #142, ... of the first imaging means group 141, 142, ... and in the images #151, #152, ... of the second imaging means group 151, 152, ... that correspond to the selected pixel P₂₁ specified by the position (i, j) in the set virtual image #21.

[0077] The similarity calculating means 511 and 512 then calculate the similarity between the image information at the position coordinates (X_k, Y_k) of the generated corresponding candidate points P_k separately for the first imaging means group 141, 142, ... and for the second imaging means group 151, 152, .... Then, as shown in FIGS. 10(a), (b), and (c), the distance estimating means 515 fuses the similarity calculated for the first imaging means group 141, 142, ... (the reciprocal of the similarity, Qs₁) with the similarity calculated for the second imaging means group 151, 152, ... (the reciprocal of the similarity, Qs₂), and obtains for each selected pixel the assumed distance z_nx at which the fused similarity (the sum Qs₁₂ of the reciprocals of the similarities) is largest.
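
The fusion step of paragraph [0077] can be illustrated with a short sketch. It is an assumption-laden example rather than the method of the patent: it supposes that each sensor group supplies one score per assumed distance, that the scores of both groups follow the same convention with larger values meaning a better match, and that fusion is a plain (optionally weighted) sum. If the curves are stored as dissimilarities, `argmax` would become `argmin`. The names `fuse_and_select`, `w1`, and `w2` are hypothetical.

```python
import numpy as np

def fuse_and_select(distances, q_group1, q_group2, w1=1.0, w2=1.0):
    """Fuse the per-distance scores of two sensor groups by weighted sum
    and return the assumed distance with the best fused score.

    distances : sequence of assumed distances z_n
    q_group1  : scores of the first group, one per assumed distance
    q_group2  : scores of the second group, one per assumed distance
    w1, w2    : hypothetical weights (equal by default)
    """
    q1 = np.asarray(q_group1, dtype=float)
    q2 = np.asarray(q_group2, dtype=float)
    q_fused = w1 * q1 + w2 * q2          # corresponds to Qs12 = Qs1 + Qs2
    best = int(np.argmax(q_fused))       # larger is assumed to be better
    return distances[best], q_fused
```

In this sketch the two curves can be fused directly only because both groups are evaluated against the same virtual viewpoint, so index n refers to the same assumed distance z_n in each curve.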

[0078] Thus, in multi-view stereo measurement, the virtual imaging means (virtual image sensor) 21 is set and used as the reference image sensor common to the first imaging means group (first image sensor group) and the second imaging means group (second image sensor group), so a separate reference image sensor need not be set for each group. Even for image sensor groups with different optical characteristics, such as a bright-field image sensor group and a dark-field image sensor group, or a group of monochrome image sensors of ordinary sensitivity and a group of infrared image sensors, the measurement results of the two groups can be fused to reach an overall judgment. Because the measurement results of two image sensor groups with different imaging conditions can be judged together, measurement accuracy can be improved dramatically.

[0079] The above description covers two types of imaging means groups, but the same applies to three or more types.

[0080] To achieve the third object, the main invention of the third invention is an object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the images captured by the other imaging means, corresponding to a selected pixel in the image captured by one of the imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized on the basis of the distance obtained for each selected pixel. In this device, the plurality of imaging means are classified into at least two types of imaging means groups with different conditions for imaging the object, and the device comprises: imaging means group selecting means for selecting, according to the imaging conditions of the object, at least one imaging means group from among the imaging means groups as the group actually to be used; virtual visual field information setting means for setting, in place of the image captured by the one imaging means, a virtual image captured by a virtual imaging means when the object is imaged at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the images captured by the selected imaging means group, corresponding to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating, for the selected imaging means group, the similarity between the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for taking the assumed distance at which the similarity calculated for the selected imaging means group by the similarity calculating means is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

[0081] With this configuration, as shown in FIG. 8, at least two imaging means actually to be used, namely the image sensors 2 and 3, are selected from the image sensor group 11, which constitutes the plurality of imaging means groups, according to the imaging conditions of the work field of view Ar (the speed V and the traveling direction φ of the vehicle 60).

[0082] Then a virtual image #21, captured by the virtual image sensor 21 shown in FIG. 1 when the road 61 is imaged with a desired field of view at a desired position suited to the imaging conditions of the road 61, is set in place of the image #1 of the one imaging means 1. The position coordinates (X_k, Y_k) of the corresponding candidate points P_k in the images of the at least two selected imaging means 2 and 3, corresponding to the selected pixel P₂₁ specified by the position (i, j) in the set virtual image #21, are then generated for each magnitude of the assumed distance z_n.

[0083] The similarity between the image information at the position coordinates (X_k, Y_k) of the generated corresponding candidate points P_k is then calculated.

[0084] The assumed distance z_nx at which the calculated similarity is largest is taken as the distance to the point on the object in the work field of view Ar, and is obtained for each selected pixel.

[0085] Thus, according to the present invention, in multi-view stereo measurement the image sensors 2 and 3 of a particular stereo pair can be selected in turn according to the constantly changing work field of view Ar, rather than measuring with fixed stereo pairs. Fine-grained measurement that follows changes in the situation (such as changes in the speed V and the traveling direction φ of the vehicle 60) can therefore be performed.
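
As a rough illustration of selecting sensors according to the traveling direction, the sketch below picks the sensors whose optical axes point closest to the current heading. The selection rule itself is purely hypothetical: this passage states only that sensors are chosen according to the speed V and the traveling direction φ and refers to FIG. 8 for the details. The speed V is not used in this sketch, and the names `select_sensors` and `sensor_yaws_deg` are illustrative.

```python
def select_sensors(sensor_yaws_deg, heading_deg, count=2):
    """Return the indices of the `count` sensors whose optical-axis yaw
    is closest to the heading (hypothetical rule, angles in degrees)."""
    def angular_gap(yaw):
        # Wrap the difference into [-180, 180) before taking its magnitude.
        return abs((yaw - heading_deg + 180.0) % 360.0 - 180.0)

    order = sorted(range(len(sensor_yaws_deg)),
                   key=lambda k: angular_gap(sensor_yaws_deg[k]))
    return sorted(order[:count])
```

The indices returned here would identify the real images fed to the distance estimation for the current work field of view; re-running the selection as φ changes gives the sequential selection described above.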

[0086] To achieve the fourth object, the fourth invention comprises: a plurality of imaging means arranged at predetermined intervals to image an object to be recognized; hypothetical model information setting means for assuming a group of models corresponding to the object to be recognized and setting the position coordinates of each point on each model of the group; corresponding candidate point information extracting means for extracting, for each hypothetical model selected from the model group and using the position coordinates of that hypothetical model, information on the corresponding candidate points in the images captured by the other imaging means that correspond to a selected pixel in the image captured by one of the imaging means when that imaging means images the object to be recognized; similarity calculating means for calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and model estimating means for taking the hypothetical model at which the similarity calculated by the similarity calculating means is largest as the model of the point corresponding to the selected pixel, and obtaining this model for each selected pixel.

[0087] With this configuration, a problem such as recognizing the road 61 using the image sensor group 11 on the vehicle 60, as shown in FIG. 13, can be processed efficiently as follows.

[0088] As shown in FIG. 27, a reference image and a plurality of corresponding images of the road 61 to be recognized are acquired from the reference image input unit 801 and the other image input units 802 to 804, and the hypothetical model information setting unit 805 generates models M₁, M₂, M₃, and M₄ of the road 61 for the respective values θ₁, θ₂, θ₃, and θ₄ of the parameter θ.

[0089] As shown in FIG. 28, the corresponding candidate point coordinate generating unit 806 then generates, for each selected hypothetical model M_i (i = 1 to 4), the position coordinates (X_k, Y_k) of the corresponding candidate point P_k in the image #k of another image sensor k that corresponds to the selected pixel P₁ specified by the position (i, j) in the image #1 of the reference image sensor 1.

[0090] The local information extracting units 807 to 810 then extract the respective local information, and the similarity calculating units 811 to 813 calculate each similarity by comparison with the local information of the reference image.

[0091] Then, in the model estimation unit 814, as shown by way of example in FIG. 14, the model for which the sum of the reciprocals of the calculated similarities is largest is estimated as the model corresponding to the selected pixel P₁. (Here, a value finer than the measurement step can also be obtained, such as model M_2.6 [θ = 3 degrees] between M₃ and M₄.) This model is obtained for each selected pixel, and the most frequent model (M₃) among the models obtained for the selected pixels is recognized as the model representing the road 61.
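The per-pixel selection followed by a frequency vote can be sketched as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the score is assumed to be a plain similarity where a larger value means a better match, as in the model estimation means described earlier.

```python
from collections import Counter


def best_model_for_pixel(similarity_by_model):
    """Return the model whose similarity score is highest for one pixel.

    Assumption: larger score = more similar (hypothetical scoring).
    """
    return max(similarity_by_model, key=similarity_by_model.get)


def recognize_model(per_pixel_similarities):
    """Vote across selected pixels; the most frequently chosen model wins."""
    votes = Counter(best_model_for_pixel(s) for s in per_pixel_similarities)
    model, _count = votes.most_common(1)[0]
    return model


# Three selected pixels, four hypothetical road models.
pixels = [
    {"M1": 0.2, "M2": 0.5, "M3": 0.9, "M4": 0.6},
    {"M1": 0.1, "M2": 0.4, "M3": 0.8, "M4": 0.7},
    {"M1": 0.3, "M2": 0.9, "M3": 0.6, "M4": 0.2},
]
print(recognize_model(pixels))  # M3 (two of three pixels select it)
```

Voting over pixels makes the result tolerant of individual pixels that pick a wrong model because of noise or occlusion.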

[0092] In this way, when the shape or the like of the object to be recognized is specified in advance, it is modeled and the object is identified using the model. This removes the need for the complicated conventional processing of measuring the distance to every point in the whole space and then identifying and extracting the object from the distance measurement results, so measurement can be performed more efficiently and without erroneous recognition.

[0093]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings.

[0094] ・First embodiment
This embodiment is one in which a desired working field of view can be obtained.

[0095] The description below refers to the relevant drawings.

[0096] FIG. 4 is a block diagram showing the configuration of the object recognition device using multi-view stereo vision according to this embodiment, and corresponds to the conventional device shown in FIG. 23. FIG. 3 is a configuration diagram of the image sensors of this embodiment, including the virtual image sensor, and corresponds to the configuration diagram of the conventional image sensors shown in FIG. 24. Although this embodiment assumes application to multi-view stereo, it may also be applied to two-view stereo.

[0097] The reference image sensor 1 and the image sensors 2, 3, ..., N that constitute the multi-view stereo are assumed to be arranged at predetermined intervals in the horizontal, vertical, or oblique direction. For convenience of explanation, FIG. 3 shows the case where they are arranged side by side at constant intervals.

[0098] The reference image #1 and the images #2, #3, ..., #N are captured into the reference image input unit 301 and the image input units 302, 303, ..., 304, respectively.

[0099] In the virtual visual field information setting unit 305, as shown in FIG. 1, information is set on the virtual image #21 that a virtual image sensor 21 would produce if it imaged the object 50 from a desired position with a desired field of view. In other words, the virtual field of view that is to serve as the working field of view is set. As with an actual image sensor, parameters such as the placement position, orientation (attitude), focal length, and resolution are defined for the virtual image sensor 21. The parameters of the virtual image sensor 21 may also be changed freely according to the purpose. In FIG. 1, reference numeral 41 denotes the virtual lens of the virtual image sensor 21.

[0100] In the corresponding candidate point coordinate generation unit 306, the position coordinates of the corresponding candidate point in the reference image #1, in image #2 of image sensor 2, in image #3 of image sensor 3, and in image #N of image sensor N are stored for each selected pixel of the virtual image #21 and for each assumed distance z_n. Reading them out generates the position coordinates of each corresponding candidate point. For simplicity, the description here uses the example of stored position coordinates, but the position coordinates may instead be obtained by calculation.

[0101] FIG. 1 shows the relationship between the selected pixel P₂₁ of the virtual image #21 and the position coordinates (X_k, Y_k) of the corresponding candidate point P_k of each image sensor k (k = 1, 2, ..., N; k has the same meaning hereinafter) at the assumed distance z_n.

[0102] The position coordinates (x, y, z) on the object 50 in FIG. 1 are expressed in the global coordinate system x-y-z, and the position coordinates (X, Y) on an image are expressed in the local coordinate system X-Y set for each image. As shown in the figure, the selected pixel P₂₁ specified by (i, j) in the virtual image #21 of the virtual image sensor 21 is selected, and a distance z_n to the point 50a (x, y, z) on the object 50 to be recognized is assumed. The position coordinates (X_k, Y_k) of the corresponding candidate point P_k in image #k of each actual image sensor k corresponding to this assumed distance z_n are then read out.
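When the candidate coordinates are computed rather than read from a stored table, one possible calculation is the following pinhole-camera sketch. Everything here is an assumption made for illustration: the virtual sensor sits at the origin of the global frame looking along z, all real sensors share its orientation and intrinsics (focal length `f` in pixels, principal point `cx`, `cy`) and differ only in position, and the function names are hypothetical. The text itself does not specify the camera model.

```python
def backproject(i, j, z, f, cx, cy):
    """3-D point seen at pixel (i, j) of the virtual sensor at assumed depth z."""
    return ((i - cx) * z / f, (j - cy) * z / f, z)


def project(point, cam_pos, f, cx, cy):
    """Image coordinates (X_k, Y_k) of a 3-D point in a sensor located at cam_pos."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (f * x / z + cx, f * y / z + cy)


def candidate_table(i, j, z_values, cam_positions, f, cx, cy):
    """Map each assumed-distance index n to the candidate points in every sensor."""
    table = {}
    for n, z in enumerate(z_values):
        point = backproject(i, j, z, f, cx, cy)
        table[n] = [project(point, pos, f, cx, cy) for pos in cam_positions]
    return table


# Two real sensors 0.5 units left and right of the virtual sensor.
table = candidate_table(320, 240, [2.0, 4.0],
                        [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],
                        f=500.0, cx=320.0, cy=240.0)
print(table[0])  # candidates for z = 2.0
print(table[1])  # candidates for z = 4.0 (closer to the image center)
```

Precomputing such a table once per virtual sensor pose corresponds to the stored-coordinate variant; recomputing it on demand corresponds to the calculated variant.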

[0103] Next, the local information extraction units 307 to 310 each extract local information based on the position coordinates of the corresponding candidate points P_k generated in this way by the corresponding candidate point coordinate generation unit 306.

[0104] Once the image information F₁, F₂, F₃, ..., F_N of the corresponding candidate points P₁, P₂, P₃, ..., P_N of images #1, #2, #3, ..., #N has been obtained, the similarity calculation units 311 to 313 calculate similarities with the image information F₁ of the corresponding candidate point P₁ of the reference image #1 as the reference.

[0105] That is, the similarity calculation unit 311 compares the image information F₂ of the corresponding candidate point P₂ of image #2 of image sensor 2 with the image information F₁ of the corresponding candidate point P₁ of the reference image #1 and calculates their similarity. Likewise, the similarity calculation unit 312 compares the image information F₃ of the corresponding candidate point P₃ of image #3 of image sensor 3 with F₁ and calculates their similarity, and the similarity calculation unit 313 compares the image information F_N of the corresponding candidate point P_N of image #N of image sensor N with F₁ and calculates their similarity.

[0106] In other words, this embodiment has in common with the conventional similarity calculation units 209 to 211 of FIG. 25 that the reference image #1 serves as the reference for similarity calculation. For example, the square of the difference between the image information of the reference image #1 and that of a corresponding image (for example, #2) is obtained as the similarity. Specifically, the similarity calculation unit 311 compares the region around the corresponding candidate point P₁ of the reference image #1 with the region around the corresponding candidate point of image #2 of image sensor 2 by pattern matching, and calculates the similarity.

[0107] That is, as shown in FIG. 3, a window WD₁ centered on the position coordinates of the corresponding candidate point P₁ of the reference image #1 is cut out, a window WD₂ centered on the position coordinates of the corresponding candidate point P₂ of image #2 of image sensor 2 is cut out, and pattern matching between the windows WD₁ and WD₂ yields their similarity. This pattern matching is performed for each assumed distance z_n.
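The window comparison can be illustrated with a sum of squared differences (SSD), one common form of pattern matching. This is a sketch under assumptions: images are plain lists of rows, the helper names are hypothetical, no bounds checking is done, and the SSD is treated as a dissimilarity, so it plays the role of the reciprocal of the similarity Qs (small value = good match).

```python
def cut_window(image, cx, cy, half):
    """Square window of side 2*half+1 centered on column cx, row cy."""
    return [row[cx - half:cx + half + 1]
            for row in image[cy - half:cy + half + 1]]


def ssd(window_a, window_b):
    """Sum of squared differences between two equally sized windows."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(window_a, window_b)
               for a, b in zip(row_a, row_b))


# Reference image and a second image shifted one column to the right.
ref = [[r * 5 + c for c in range(5)] for r in range(5)]
img2 = [[0] + row[:-1] for row in ref]

wd1 = cut_window(ref, 2, 2, 1)
print(ssd(wd1, cut_window(img2, 3, 2, 1)))  # 0: correct candidate point
print(ssd(wd1, cut_window(img2, 2, 2, 1)))  # 9: wrong candidate point
```

Evaluating this cost at the candidate point for every assumed distance z_n gives one curve of Qs against z_n per stereo pair.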

[0108] FIG. 2(1) is a graph showing the relationship between the assumed distance z_n and the reciprocal of the similarity Qs obtained for this stereo pair (reference image sensor 1 and image sensor 2). As the figure shows, matching the window WD'₁ of the reference image sensor against the window WD'₂ of image sensor 2, each centered on the corresponding candidate point for the assumed distance z'_n, gives a large reciprocal of the similarity Qs (the similarity is low), whereas matching the window WD₁ against the window WD₂ centered on the corresponding candidate point for the assumed distance z_nx gives a small Qs (the similarity is high). Similarly, the similarity calculation unit 312 performs pattern matching between the window WD₁ centered on the corresponding candidate point P₁ of the reference image #1 and the window WD₃ centered on the corresponding candidate point P₃ of image #3 of image sensor 3, and calculates their similarity. By performing this pattern matching for each assumed distance z_n, the relationship between the assumed distance z_n and the reciprocal of the similarity Qs shown in FIG. 2(2) is obtained for this stereo pair (image sensor 1 and image sensor 3) as well.

[0109] Similarly, the similarity calculation unit 313 performs pattern matching between the window WD₁ centered on the corresponding candidate point P₁ of the reference image #1 and the window WD_N centered on the corresponding candidate point P_N of image #N of image sensor N, and calculates their similarity. By performing this pattern matching for each assumed distance z_n, the relationship between the assumed distance z_n and the reciprocal of the similarity Qs shown in FIG. 2(N) is obtained for this stereo pair (image sensor 1 and image sensor N) as well.

[0110] Finally, the relationships between the assumed distance z_n and the reciprocal of the similarity obtained for the individual stereo pairs are added together, giving the relationship between the assumed distance z_n and the summed reciprocal of the similarity shown in FIG. 2 (fused result).

[0111] From the relationship obtained in this way between the assumed distance z_n and the summed reciprocal of the similarity, the point of highest similarity (the point where the summed reciprocal of the similarity is at its minimum) is identified, and the assumed distance z_nx corresponding to that point is finally estimated as the true distance (the most probable distance) to the point 50a on the object 50 to be recognized. This processing is performed for every selected pixel of the virtual image #21, covering all pixels.
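The fusion and distance estimation for one selected pixel can be sketched as follows. The function name and the sample numbers are hypothetical; the only assumption about the inputs is that each stereo pair supplies one reciprocal-of-similarity value per assumed distance z_n, in the same order as `z_values`.

```python
def fuse_and_estimate(z_values, qs_per_pair):
    """Sum the reciprocal-of-similarity curves and pick the minimizing distance.

    z_values:    assumed distances z_n
    qs_per_pair: one list of Qs values per stereo pair, aligned with z_values
    Returns (z_nx, minimum summed Qs, full summed curve).
    """
    total = [sum(column) for column in zip(*qs_per_pair)]
    n_best = min(range(len(total)), key=total.__getitem__)
    return z_values[n_best], total[n_best], total


z_values = [1.0, 2.0, 3.0, 4.0]
qs_per_pair = [
    [5, 2, 4, 6],  # pair (sensor 1, sensor 2)
    [6, 3, 1, 5],  # pair (sensor 1, sensor 3): noisy, minimum at the wrong z
    [7, 1, 3, 6],  # pair (sensor 1, sensor N)
]
z_nx, q_min, total = fuse_and_estimate(z_values, qs_per_pair)
print(z_nx, q_min, total)  # 2.0 6 [18, 6, 8, 17]
```

In this made-up example one pair alone would pick z = 3.0, but the summed curve still has its minimum at z = 2.0, which is the benefit of fusing the pairs before taking the minimum.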

[0112] As described above, the distance estimation unit 314 identifies, from among the summed similarity values obtained by sequentially varying the assumed distance z_n, the one for which the summed similarity is highest, and the assumed distance z_nx that gives this highest summed similarity is estimated as the true distance and output.

[0113] Note that the image information of the corresponding points in images #1 to #N is obtained for each selected pixel of the virtual image #21. Assigning this corresponding-point image information to each pixel of the virtual image #21 therefore makes it possible to generate the image that would be captured if the virtual image sensor 21 imaged the object 50. In this case, the mean, the median, or a similar statistic of the brightness of the corresponding points in images #1 to #N can be used as the image information assigned to each pixel of the virtual image #21. In this way, an image can be generated as if it had been captured by the virtual image sensor 21.
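Assigning a brightness to one virtual pixel from its corresponding points could look like the sketch below. The function name and `mode` parameter are hypothetical; the input is assumed to be the brightness values sampled at the corresponding points in images #1 to #N for the estimated distance.

```python
from statistics import mean, median


def virtual_pixel_value(brightness_at_corresponding_points, mode="mean"):
    """Brightness for one pixel of the virtual image from N real-image samples."""
    if mode == "mean":
        return mean(brightness_at_corresponding_points)
    if mode == "median":
        return median(brightness_at_corresponding_points)
    raise ValueError(f"unknown mode: {mode}")


samples = [100, 104, 96, 180]  # the last sensor sees an outlier (e.g. glare)
print(virtual_pixel_value(samples, "mean"))    # 120
print(virtual_pixel_value(samples, "median"))  # 102.0
```

The median is less affected by a single outlying sensor, while the mean averages out small noise when all sensors agree.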

[0114] Also, as shown in the summed graph of FIG. 2 (fused result), the estimated distance z_nx at which the reciprocal of the similarity Qs is at its minimum is obtained for each selected pixel of the virtual image #21. The minimum value of the summed reciprocal of the similarity at that time may be associated with each estimated distance z_nx as data indicating the reliability of that estimate (the smaller the minimum value, the higher the reliability). Alternatively, a reliability derived from the rate of change of the similarity, the rate of change of the original image, or the like may be used.

[0115] As described above, according to this embodiment, a virtual image sensor 21 with a desired observation field of view can be set arbitrarily, and the distance image and the captured image of this virtual image sensor 21 can be obtained. Processing such as recognition and judgment of the object 50 can therefore be carried out with a working field of view suited to the working situation, without being limited to the working field of view uniquely determined by the placement positions and attitudes of the image sensors 1 to N. In particular, even at a location where it is difficult to place an image sensor physically, or where the environment is too harsh for an image sensor to be used, setting the virtual image sensor 21 at that location yields an image equivalent to one from an actual image sensor placed there.

[0116] Furthermore, the position and attitude of the virtual image #21 obtained by the virtual image sensor 21 can be changed arbitrarily by computation alone, without a mechanical panning mechanism ("electronic panning"), so images from freely chosen viewpoints can be obtained while the image sensors 1 to N remain fixed. This makes it possible to build a system that tracks an object, or a system that automatically changes its field of view according to the purpose of the work, without requiring a mechanical panning mechanism.

[0117] Next, a modification of the above embodiment will be described.

[0118] FIG. 5 is a configuration diagram of the image sensors corresponding to FIG. 3 described above, and FIG. 6 is a block diagram corresponding to FIG. 4 described above.

[0119] The difference from the embodiment of FIGS. 3 and 4 lies in how the similarity is calculated. Instead of obtaining a similarity for each stereo pair with the image information F₁ of the corresponding candidate point P₁ of the reference image #1 of the reference image sensor 1 as the reference, the similarity of the multi-view stereo as a whole is calculated from the degree of variation of the local information F₁ to F_N of the corresponding candidate points P₁ to P_N of all the image sensors 1 to N, including image sensor 1.

[0120] In FIG. 6, the virtual visual field information setting unit 404, the corresponding candidate point coordinate generation unit 405, and the local information extraction units 406, 407, ..., 408 correspond to the virtual visual field information setting unit 305, the corresponding candidate point coordinate generation unit 306, and the local information extraction units 307, 308, ..., 310 of FIG. 4.

[0121] Based on the local information F₁, F₂, ..., F_N of the corresponding candidate points P₁, P₂, ..., P_N of images #1, #2, ..., #N extracted by the local information extraction units 406, 407, ..., 408, the similarity calculation unit 409 calculates, as the similarity, an evaluation value indicating the "cohesion" of the local information F₁, F₂, ..., F_N. This evaluation value is, for example, the sum of the absolute differences from the mean of the image information, or the variance of the image information. For example, when the variance of the image information is small, the cohesion is rated high and the similarity is taken to be high.
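The two "cohesion" measures named above can be sketched as below. This is an illustration under assumptions: each F_k is reduced to a single brightness value (the text allows richer local information), the function names are hypothetical, and both measures are dissimilarity-style costs, so a smaller value means the candidate points agree better.

```python
from statistics import mean, pvariance


def sum_abs_deviation(values):
    """Sum of absolute differences from the mean of the image information."""
    m = mean(values)
    return sum(abs(v - m) for v in values)


def cohesion_cost(values, measure="variance"):
    """Spread of F_1..F_N across all sensors; small spread = high similarity."""
    if measure == "variance":
        return pvariance(values)
    if measure == "abs_deviation":
        return sum_abs_deviation(values)
    raise ValueError(f"unknown measure: {measure}")


agree = [10, 10, 10, 10]   # all sensors see the same brightness
differ = [10, 12, 8, 10]   # sensors disagree slightly
print(cohesion_cost(agree), cohesion_cost(differ))       # 0 2
print(cohesion_cost(differ, measure="abs_deviation"))    # 4
```

Because no single sensor serves as the reference, an error in one image affects this cost no more than an error in any other image.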

[0122] Thus, rather than calculating a similarity for each stereo pair, for example by taking differences relative to the image information of the reference image #1 as in FIG. 4, and then summing the per-pair similarities to obtain the overall similarity, the overall similarity is obtained in a single calculation based on all the image information of all the images #1 to #N. The similarity can therefore be obtained more flexibly and more stably. The similarity over all the images #1 to #N is obtained for each assumed distance z_n.

[0123] The distance estimation unit 410 identifies, from among the similarities obtained by sequentially varying the assumed distance z_n, the one with the highest similarity, and the assumed distance z_nx that gives this highest similarity is estimated as the true distance and output. Because the image information of the corresponding points in images #1 to #N is obtained for each selected pixel of the virtual image #21, assigning this corresponding-point image information to each pixel of the virtual image #21 makes it possible to generate the image that would be captured if the virtual image sensor 21 imaged the object 50.

[0124] ・Second embodiment
This embodiment allows multiple types of image sensor groups that differ in placement position, direction, optical characteristics, resolution, and so on to be used simultaneously, and performs highly reliable distance measurement and object recognition using the information from all of these types of image sensor groups.

[0125] FIG. 7(b) shows the image sensor group 11″ assumed in this embodiment, which consists of a bright-field image sensor group 141 to 144 and a dark-field image sensor group 151 to 154. An illuminometer 16 is placed at the center of the image sensor group 11″. The bright-field image sensors 141 to 144 are more sensitive than the dark-field image sensors 151 to 154 in a high-illuminance environment, and the dark-field image sensors 151 to 154 are more sensitive than the bright-field image sensors 141 to 144 in a low-illuminance environment.

[0126] FIG. 9 is a block diagram of a device that performs distance measurement and object recognition based on the images captured by the image sensor group 11″, and corresponds to FIG. 6. As in FIG. 6, this embodiment does not set a reference image sensor to serve as the reference for similarity calculation.

[0127] The difference from the embodiment of FIG. 6 is that the similarity is calculated separately for each type of image sensor, that is, for the bright-field image sensor group 141 to 144 and for the dark-field image sensor group 151 to 154.

[0128] Images #141 to #144 of the bright-field image sensors 141 to 144 are captured into the image input units 501 to 502. Similarly, images #151 to #154 of the dark-field image sensors 151 to 154 are captured into the image input units 503 to 504.

[0129] In FIG. 9, the virtual visual field information setting unit 505 and the corresponding candidate point coordinate generation unit 506 correspond to the virtual visual field information setting unit 305 and the corresponding candidate point coordinate generation unit 306 of FIG. 4, or to the virtual visual field information setting unit 404 and the corresponding candidate point coordinate generation unit 405 of FIG. 6.

[0130] In the virtual visual field information setting unit 505, as shown in FIG. 1, a virtual image #21 common to the bright-field image sensors 141 to 144 and the dark-field image sensors 151 to 154 is set, and the selected pixel P₂₁ specified by position (i, j) in this virtual image #21 is selected.

[0131] The corresponding candidate point coordinate generation unit 506 generates, for each value of the assumed distance z_n, the position coordinates (X_k, Y_k) (k = 141, 142, 143, 144, 151, 152, 153, 154; k has the same meaning hereinafter) of the corresponding candidate points P_k in images #141 to #144 of the bright-field image sensor group 141 to 144 and in images #151 to #154 of the dark-field image sensor group 151 to 154 that correspond to the selected pixel P₂₁ in the virtual image #21 set above.

[0132] The local information extraction units 507 to 508 correspond to the local information extraction units 307, 308, 310 of FIG. 4 or 406, 407, ..., 408 of FIG. 6, and extract the local information F₁₄₁ to F₁₄₄ of the corresponding candidate points P₁₄₁ to P₁₄₄ on images #141 to #144 of the bright-field image sensors 141 to 144. Similarly, the local information extraction units 509 to 510 extract the local information F₁₅₁ to F₁₅₄ of the corresponding candidate points P₁₅₁ to P₁₅₄ on images #151 to #154 of the dark-field image sensors 151 to 154.

[0133] Based on the image information F₁₄₁, F₁₄₂, F₁₄₃, F₁₄₄ of the corresponding candidate points P₁₄₁, P₁₄₂, P₁₄₃, P₁₄₄ of the bright-field image sensor images #141, #142, #143, #144 extracted by the local information extraction units 507 to 508, the similarity calculation unit 511 calculates the variance of F₁₄₁, F₁₄₂, F₁₄₃, F₁₄₄ (an evaluation value indicating the "cohesion" of the image information) as the similarity. This similarity is obtained for each assumed distance z_n.

[0134] As a result, the similarity calculation unit 511 outputs to the distance estimation unit 515 the reciprocal of the similarity Qs₁ corresponding to each assumed distance z_n, as shown in FIG. 10(a). Meanwhile, based on the image information F₁₅₁, F₁₅₂, F₁₅₃, F₁₅₄ of the corresponding candidate points P₁₅₁, P₁₅₂, P₁₅₃, P₁₅₄ of the dark-field image sensor images #151, #152, #153, #154 extracted by the local information extraction units 509 to 510, the similarity calculation unit 512 calculates the variance of F₁₅₁, F₁₅₂, F₁₅₃, F₁₅₄ (an evaluation value indicating the "cohesion" of the image information) as the similarity.

[0135] This similarity is obtained for each assumed distance z_n. As a result, the similarity calculation unit 512 outputs to the distance estimation unit 515 the reciprocal of the similarity Qs₂ corresponding to each assumed distance z_n, as shown in FIG. 10(b).

[0136] A signal indicating the illuminance, which is the detection output of the illuminometer 16, is input to the external information input unit 513 and is output from there to the sensor group weight coefficient generation unit 514.

[0137] Based on the magnitude of the illuminance detected by the illuminometer 16, the weight coefficient generation unit 514 computes the weight coefficients ω1 and ω2 by which the reciprocals of the similarity Qs₁ and Qs₂, calculated by the similarity calculation units 511 and 512 respectively, are to be multiplied, and outputs them to the distance estimation unit 515.

[0138] The weight coefficients ω1 and ω2 are computed so that the larger the illuminance detected by the illuminometer 16, the larger ω1 and the smaller ω2 become. That is, the similarity computed from the images of the bright-field image sensors 141 to 144 contributes more, and the similarity computed from the images of the dark-field image sensors 151 to 154 contributes less. Conversely, the smaller the illuminance detected by the illuminometer 16, the smaller ω1 and the larger ω2 become. That is, the similarity computed from the images of the bright-field image sensors 141 to 144 contributes less, and the similarity computed from the images of the dark-field image sensors 151 to 154 contributes more.

[0139] In the distance estimation unit 515, the reciprocal of the similarity Qs₁ calculated from the images of the bright-field image sensors 141 to 144, shown in FIG. 10(a), is multiplied by the weight coefficient ω1, and the reciprocal of the similarity Qs₂ calculated from the images of the dark-field image sensors 151 to 154, shown in FIG. 10(b), is multiplied by the weight coefficient ω2. In other words, the relationship between the assumed distance z_n and Qs₁ and the relationship between the assumed distance z_n and Qs₂ are corrected by the weight coefficients ω1 and ω2, respectively. The two corrected relationships are then added together, producing the relationship between the assumed distance z_n and the summed reciprocal of the similarity Qs₁₂, as shown in FIG. 10(c).

[0140] From this correspondence between the assumed distance z_n and the summed reciprocal similarity Qs₁₂, the point of highest similarity (the point at which Qs₁₂ is smallest) is identified, and the assumed distance z_nx corresponding to that point is finally estimated to be the true (most probable) distance to the point 50a on the recognition target object 50. This processing is performed for every selected pixel of the virtual image #21.

[0141] As described above, the distance estimation unit 515 examines the summed values Qs₁₂ obtained by successively varying the assumed distance z_n, identifies the one at which the combined similarity is highest (that is, at which Qs₁₂ is smallest), and outputs the corresponding assumed distance z_nx as the estimate of the true distance.
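
The weighting, summation, and minimum search performed for one selected pixel can be sketched as follows. This is an illustrative reading of the procedure, not the patented implementation; the function name and the list-based data layout are assumptions.

```python
def fuse_and_estimate(z_candidates, qs1, qs2, w1, w2):
    """Fuse two reciprocal-similarity curves and pick the best assumed distance.

    z_candidates: assumed distances z_n
    qs1, qs2:     reciprocal similarities Qs1(z_n), Qs2(z_n), same length
    w1, w2:       weight coefficients for the two sensor groups
    Returns (z_nx, qs12), where qs12 is the weighted sum for every z_n.
    """
    if not (len(z_candidates) == len(qs1) == len(qs2)) or not z_candidates:
        raise ValueError("inputs must be non-empty and of equal length")
    qs12 = [w1 * a + w2 * b for a, b in zip(qs1, qs2)]
    # Highest similarity corresponds to the smallest reciprocal value.
    best = min(range(len(qs12)), key=qs12.__getitem__)
    return z_candidates[best], qs12
```

With the same two curves, changing the weights can move the estimate from the minimum of one curve to the minimum of the other, which is the intended effect of the illuminance-dependent weighting.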

[0142] This embodiment additionally provides the following effects.

[0143] In multi-view stereo measurement, the virtual image sensor 21 is set up and used as a virtual field of view common to the bright-field image sensor group and the dark-field image sensor group. Even though the two groups differ in optical characteristics, their measurement results can therefore be fused to make an overall determination. Because the results from two image sensor groups with different imaging conditions can be judged together, measurement accuracy can be improved greatly.

[0144] Although this embodiment assumes two different types of image sensor group, it can also be applied to three or more different types.

[0145] In this embodiment, the weights of the similarity from the bright-field sensor group 141 to 144 and the similarity from the dark-field sensor group 151 to 154 are varied according to the detection output of the illuminometer 16, and the distance is estimated by fusing the similarities obtained from the two types of image sensor. Alternatively, the type of image sensor used may be switched outright according to the detection output of the illuminometer 16.

[0146] For example, a threshold may be set for a binary decision on the illuminance detected by the illuminometer 16. When the detected illuminance is at or above the threshold, the weight coefficient ω1 is set to 1 and ω2 to 0, so that the distance is estimated using only the similarity from the bright-field sensor group 141 to 144. Conversely, when the detected illuminance falls below the threshold, ω1 is set to 0 and ω2 to 1, so that the distance is estimated using only the similarity from the dark-field image sensors 151 to 154.
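
The hard switching described above amounts to a step function on the illuminance. A minimal sketch (the function name is hypothetical):

```python
def switched_weights(lux, threshold):
    """Return (w1, w2) = (1, 0) at or above the threshold, else (0, 1)."""
    return (1.0, 0.0) if lux >= threshold else (0.0, 1.0)
```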

[0147] Any type of image sensor may be used.

[0148] As shown in FIG. 7(a), this embodiment may be applied to an image sensor group 11' made up of a group of normal-sensitivity monochrome image sensors 12₁ to 12₄ and a group of infrared image sensors 13₁ to 13₄.

[0149] Parameters such as focal length, aperture, and shutter speed may differ for each type of image sensor, and the orientation of the image sensors may be adjusted according to conditions.

[0150] Although this embodiment uses the illuminometer 16, any sensor capable of detecting a change in the imaging conditions of the image sensors (the surrounding environmental conditions) may be used.

[0151] An embodiment in which the weight coefficients are changed according to the time of day is also conceivable.

[0152] Third Embodiment
In this embodiment, where a plurality of image sensors forming a multi-view stereo system are arranged, not all of them are used for measurement. Instead, at least two image sensors to be used for measurement are selected from among them, enabling fine-grained measurement suited to the situation. FIG. 8 assumes a case in which an image sensor group 11 consisting of image sensors 1, 2, 3, ... is mounted on a vehicle 60, and the vehicle travels while recognizing and judging, from the images captured by the image sensor group 11, a working field of view Ar on the road 61 ahead in the direction of travel.

[0153] The vehicle 60 carries a speed sensor 62 that detects the vehicle speed V and an attitude angle sensor 63, such as a gyro, that detects the traveling direction φ of the vehicle 60. A sensor that detects the steering angle of the vehicle 60 may be used instead of the attitude angle sensor 63. Once the speed V and traveling direction φ are determined, the distance ahead and the direction that the vehicle 60 should recognize are determined accordingly, which specifies the working field of view Ar. At least two image sensors best suited to imaging this working field of view Ar are then selected from the image sensor group 11.

[0154] Accordingly, at least two image sensors to be actually used, for example image sensors 2 and 3, are selected from the image sensor group 11 according to the vehicle speed V and traveling direction φ detected by the speed sensor 62 and the attitude angle sensor 63.

[0155] As a result, as in FIG. 6, the image #2 of the image sensor 2 is input to the image input unit 401, and the image #3 of the image sensor 3 is input to the image input unit 402. Then, as in FIG. 6, the corresponding candidate point coordinate generation unit 405 generates, for each assumed distance z_n, the position coordinates (X_k, Y_k) of the corresponding candidate point P_k of the selected pixel of the virtual image #21 in the images of the two image sensors 2 and 3 (k = 1, 2; k is used in the same sense below).

[0156] The local information extraction units 406 and 407 extract the image information F_k at the corresponding candidate points P_k of the image sensors 2 and 3, and the similarity calculation unit 409 calculates the similarity between the image information F_k at the generated position coordinates (X_k, Y_k) of the corresponding candidate points P_k of the image sensors 2 and 3.

[0157] The distance estimation unit 410 then takes the assumed distance z_nx at which the calculated similarity is highest as the distance from the virtual image sensor 21 to the point on the object in the working field of view Ar corresponding to the selected pixel P₂₁, and this distance z_nx is obtained for each selected pixel. As a result, a distance image and a captured image representing the working field of view Ar as seen from the virtual image sensor 21 are generated accurately at all times, however the traveling conditions of the vehicle 60 change.

[0158] Thus, according to this embodiment, multi-view stereo measurement is not performed with a fixed set of image sensors. Instead, at least two image sensors best suited to imaging the continually changing working field of view Ar are selected successively. Fine-grained measurement can therefore be performed even when the traveling conditions of the vehicle 60 (its speed V and traveling direction φ) change.

[0159] Although this embodiment assumes that the image sensor group is mounted on a vehicle, it is not limited to this and may, for example, be mounted on a mobile robot. It is also possible to provide an attitude angle adjustment mechanism that can adjust the attitude angle of the image sensor group 11 in the directions of arrows A and B, and to change the attitude angle of the image sensor group 11 according to the detection outputs of the speed sensor 62 and the attitude angle sensor 63 so that the image sensors face a direction in which the working field of view Ar can be imaged.

[0160] The zoom of the image sensors may also be adjusted according to the distance to the working field of view Ar.

[0161] Fourth Embodiment
In this embodiment, rather than searching over all assumed distances for each selected pixel, the concept of a model representing the recognition target object is introduced, and matching is performed only for points on the model, which improves the efficiency of model estimation.

[0162] First, the concept and definition of a model are explained.

[0163] In this specification, a model is a recognition target with a three-dimensional structure that is defined in space. Its three-dimensional structure is specified by model information such as coordinate positions in absolute space (the global coordinate system x-y-z) and shape data. The position of the model as seen in the observation field of view of each image sensor 1 to N, or in the virtual field of view of the virtual image sensor 21, can be calculated from this model information. Models are not limited to simple planes and curved surfaces; models with complex three-dimensional shapes can also be used. Because a model ultimately serves to recognize the three-dimensional structure of an object, the representation and content of the model information can be chosen to suit that purpose.

[0164] To represent related models efficiently, the model information may also be parameterized to suit the purpose.

[0165] For example, a family of planar models arranged in parallel at a fixed spacing is represented by a parameter called the parallel position. By varying this parameter, it is estimated which model in the family the surface of the recognition target corresponds to.

[0166] This embodiment is described below with reference to FIGS. 11 to 18.

[0167] As shown in FIG. 13, this embodiment assumes a case in which the image sensor group 11 is mounted on the vehicle 60, and the vehicle travels while recognizing and judging, from the images captured by the image sensor group 11, the inclination θ of the road 61 ahead in the direction of travel.

[0168] FIG. 12 is a block diagram showing the configuration of the object recognition device using multi-view stereo vision according to this embodiment, and corresponds to the block diagram of FIG. 6.

[0169] Although this embodiment assumes application to multi-view stereo, it may also be applied to two-view (binocular) stereo.

[0170] As in FIG. 6, this embodiment assumes that a virtual image #21 from the virtual image sensor 21 is set. However, this embodiment can also be implemented with a configuration equivalent to that of FIG. 27, without setting the virtual image #21.

[0171] That is, this embodiment may be applied not only to a configuration that searches for the corresponding points in the images #1 to #N for a selected pixel of the virtual image #21, but also to a configuration that searches for the corresponding points in the images #2 to #N for a selected pixel of the reference image #1.

[0172] The image sensors 1, 2, 3, ..., N making up the multi-view stereo image sensor group 11 are assumed to be arranged at predetermined intervals in the horizontal, vertical, or diagonal direction. The image #1 captured by the image sensor 1 is input to the image input unit 601 shown in FIG. 12, the image #2 of the image sensor 2 is input to the image input unit 602, and the image #N of the image sensor N is input to the image input unit 603.

[0173] As shown in FIG. 11, the virtual field-of-view information setting unit 604 sets information on the virtual field of view of the virtual image sensor 21 that would image the recognition target object (here, the road 61) from a desired position with a desired field of view. In other words, the virtual field of view that is to serve as the working field of view is set. For the virtual image sensor 21, parameters such as mounting position, orientation (attitude), focal length, and resolution are defined in the same way as for the real image sensors 1 to N.

[0174] The parameters of the virtual image sensor 21 may also be changed freely and dynamically to suit the purpose.

[0175] The lens 41 in FIG. 11 represents the virtual lens of the virtual image sensor 21.

[0176] Model information on the shape of the model of the object to be recognized in three-dimensional space is set in the assumed model information setting unit 605. In this embodiment, the inclination θ of the road 61 must be recognized and judged, as shown in FIG. 13, so models M₁, M₂, M₃, and M₄ corresponding to the recognition target object are set as model information, one for each of the values θ₁, θ₂, θ₃, and θ₄ of the parameter θ.

[0177] Many methods are conceivable for actually setting this model information.

[0178] For example, when the assumed model is a road consisting of a simple plane as above, the plane may be specified by giving the spatial coordinates of any one point on the plane together with the plane's normal vector, or by giving the spatial coordinates of any three points on the plane. When the assumed model is complex, with undulations, it may be set by giving the coordinates of enough points to reproduce the undulations adequately and interpolating between them, or by combining arbitrary geometric models.
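
The two ways of specifying a planar model can be reduced to one internal form. The sketch below assumes the plane is stored as a unit normal n and an offset d with n · x = d; this representation and the function names are illustrative choices, not taken from the specification.

```python
import math

def plane_from_point_normal(point, normal):
    """Return (n, d) with unit normal n such that n . x = d on the plane."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal vector must be non-zero")
    n = tuple(c / length for c in normal)
    d = sum(a * b for a, b in zip(n, point))
    return n, d

def plane_from_three_points(p0, p1, p2):
    """Build the same (n, d) form from three non-collinear points."""
    u = tuple(b - a for a, b in zip(p0, p1))
    v = tuple(b - a for a, b in zip(p0, p2))
    # The cross product u x v is perpendicular to the plane.
    normal = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return plane_from_point_normal(p0, normal)  # raises if points are collinear
```

A road model M_i for a given inclination θ_i could then be stored as one such (n, d) pair.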

[0179] As shown in FIG. 11, the corresponding candidate point coordinate generation unit 606 generates, for each assumed model M_i (i = 1 to 4) selected from the models M₁, M₂, M₃, and M₄, the position coordinates (X_k, Y_k) of the corresponding candidate point P_k in the image #k of each image sensor k that corresponds to the selected pixel P₂₁ specified by the position (i, j) in the virtual image #21 obtained when the virtual image sensor 21 images the road 61. The position coordinates (X_k, Y_k) of the corresponding candidate point P_k may be computed from the point (x, y, z) on each model, as shown in FIG. 11, or a conversion table may be prepared in advance so that no calculation is needed at run time.
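
For a planar model, generating a candidate point means casting the viewing ray of the selected pixel from the virtual sensor, intersecting it with the model plane, and projecting the intersection into the real sensor k. The sketch below illustrates that geometry under strong simplifying assumptions that are not in the specification: every sensor is an ideal pinhole camera looking along +z with no rotation, all share one focal length in pixels, and the principal point is pixel (0, 0). All names are hypothetical.

```python
def candidate_point(pixel, virtual_center, camera_center, plane, focal_px):
    """Map a virtual-image pixel (i, j) to (X_k, Y_k) in real camera k.

    plane is (n, d) with n . x = d. Returns None when the ray misses the
    plane or the point lies behind either camera.
    """
    i, j = pixel
    n, d = plane
    ray = (i / focal_px, j / focal_px, 1.0)  # viewing ray of the selected pixel
    denom = sum(a * b for a, b in zip(n, ray))
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the model plane
    t = (d - sum(a * b for a, b in zip(n, virtual_center))) / denom
    if t <= 0:
        return None  # intersection is behind the virtual sensor
    x, y, z = (c + t * r for c, r in zip(virtual_center, ray))
    depth = z - camera_center[2]
    if depth <= 0:
        return None  # point is behind the real camera
    X = focal_px * (x - camera_center[0]) / depth
    Y = focal_px * (y - camera_center[1]) / depth
    return X, Y
```

Because the result depends only on the pixel, the model, and the fixed sensor geometry, it could equally be precomputed into the conversion table the text mentions.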

[0180] Next, the local information extraction units 607 to 609 each extract local information based on the position coordinates of the corresponding candidate points P_k generated by the corresponding candidate point coordinate generation unit 606.

[0181] Specifically, the local information extraction unit 607 extracts the image information F₁ of the corresponding candidate point P₁ from the position coordinates of P₁ in the image #1 of the image sensor 1, generated by the corresponding candidate point coordinate generation unit 606, by interpolating from the image information of the surrounding pixels. Similarly, the local information extraction unit 608 obtains the image information F₂ of the corresponding candidate point P₂ from the position coordinates of P₂ in the image #2 of the image sensor 2, and the local information extraction unit 609 obtains the image information F_N of the corresponding candidate point P_N from the position coordinates of P_N in the image #N of the image sensor N. Here, image information means brightness, the second derivative of brightness, or the like.
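
The specification says only that the image information is interpolated from the surrounding pixels. Bilinear interpolation is one common way to do this and is used below purely as an assumed example; the function name and the list-of-rows image layout are also assumptions.

```python
def bilinear_sample(image, x, y):
    """Sample a 2D brightness image at a sub-pixel position.

    image: list of rows (at least 2 x 2), indexed image[row][column]
    x, y:  column and row coordinates, possibly fractional
    """
    height, width = len(image), len(image[0])
    if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
        raise ValueError("sample position lies outside the image")
    x0 = min(int(x), width - 2)   # left column of the 2 x 2 neighbourhood
    y0 = min(int(y), height - 2)  # top row of the 2 x 2 neighbourhood
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom
```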

[0182] Once the image information F₁, F₂, ..., F_N of the corresponding candidate points P₁, P₂, ..., P_N in the images #1, #2, ..., #N has been obtained, the similarity calculation unit 610 calculates, as the similarity measure, the variance of the image information F₁ to F_N over all image sensors 1 to N (an evaluation value indicating how tightly the image information clusters). The higher the similarity (the smaller the variance of the image information), the closer the object is to the assumed model M. The similarity is obtained for each of the assumed models M₁, M₂, M₃, and M₄.
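
A small sketch of the variance-based evaluation: for each assumed model, the image information sampled in the N sensors is collected and its variance computed, and a smaller variance is read as a better match. Using the population variance directly as the value to minimize is an assumption, as are the function names and the dictionary layout.

```python
def variance(values):
    """Population variance of the image information F_1 .. F_N."""
    if not values:
        raise ValueError("at least one value is required")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def score_models(samples_per_model):
    """Map each assumed model name to the variance of its sampled values."""
    return {model: variance(samples) for model, samples in samples_per_model.items()}

def best_model(samples_per_model):
    """Return the model whose samples agree best (smallest variance)."""
    scores = score_models(samples_per_model)
    return min(scores, key=scores.get)
```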

[0183] In this way, the correspondence between the assumed model M and the reciprocal Qs of the similarity is obtained, as shown in FIG. 14.

[0184] Next, the model estimation unit 611 determines the model with the highest similarity from the correspondence between the assumed model M and the reciprocal Qs of the similarity. This is done for the correspondence obtained for each selected pixel, so that a model is estimated for every pixel.

[0185] If model estimation is performed over part or all of the image, the most frequently estimated model can be adopted as the final model representing the recognition target object. Alternatively, a model may be estimated from the correspondence between the assumed model M and the reciprocal Qs of the similarity with a resolution finer than the search step, and this model determined to be the one closest to the recognition target object.
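
Choosing the most frequent per-pixel estimate is a simple vote. A minimal sketch, with a hypothetical function name and with model labels represented as strings:

```python
from collections import Counter

def most_frequent_model(per_pixel_models):
    """Return the model label estimated for the largest number of pixels."""
    if not per_pixel_models:
        raise ValueError("no per-pixel estimates were given")
    model, _count = Counter(per_pixel_models).most_common(1)[0]
    return model
```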

[0186] Suppose that FIG. 14 shows the correspondence between the assumed model M and the summed value of the reciprocal Qs of the similarity. Based on this correspondence, the assumed model at which the similarity is highest (the reciprocal Qs is smallest) is taken as the model representing the recognition target object. Here, by fitting a curve to the polyline correspondence of FIG. 14, the model M_2.6 at which the reciprocal Qs takes its minimum can be determined precisely. If the inclination angle θ₂ = 0 degrees is associated with the model M₂ and the inclination angle θ₃ = 5 degrees with the model M₃, then M_2.6 can be estimated to correspond to an inclination angle θ = 3 degrees. The inclination angle θ of the road 61 ahead of the vehicle 60 can thus be recognized and judged to be 3 degrees.
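
The specification does not say which curve is fitted. One common choice, assumed here, is a parabola through the discrete minimum and its two neighbours. The fractional model index is then mapped to an angle by linear interpolation, which reproduces the example above: index 2.6 between θ₂ = 0 degrees and θ₃ = 5 degrees gives 0 + 0.6 × 5 = 3 degrees. Function names, the sample Qs values, and the angles used for M₁ and M₄ are hypothetical.

```python
def refine_minimum(qs):
    """Return the fractional 1-based model index where Qs is smallest.

    Fits a parabola through the discrete minimum and its two neighbours;
    falls back to the discrete index at the ends or on a flat curve.
    """
    k = min(range(len(qs)), key=qs.__getitem__)
    if k == 0 or k == len(qs) - 1:
        return float(k + 1)
    ym, y0, yp = qs[k - 1], qs[k], qs[k + 1]
    curvature = ym - 2.0 * y0 + yp
    if curvature <= 0.0:
        return float(k + 1)
    return (k + 1) + 0.5 * (ym - yp) / curvature

def interpolate_angle(model_index, angles):
    """Linearly interpolate the angle for a fractional 1-based model index."""
    lo = min(int(model_index), len(angles) - 1)
    frac = model_index - lo
    return angles[lo - 1] + frac * (angles[lo] - angles[lo - 1])

# Hypothetical Qs values for M1..M4 and hypothetical angles for M1 and M4.
qs_values = [3.56, 1.36, 1.16, 2.96]
angles_deg = [-5.0, 0.0, 5.0, 10.0]
index = refine_minimum(qs_values)
angle = interpolate_angle(index, angles_deg)
```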

[0187] The model group selection unit 612 selects, from the model group, the models to be used as assumed models, based on information output from the auxiliary information input unit 613 on how likely each model in the group is to be present and on the model estimation result of the model estimation unit 611. Specifically, the vehicle 60 carries a pitch angle sensor that detects its current pitch angle. The degree of inclination of the road surface ahead of the vehicle 60 can therefore be predicted from the detection output of this pitch angle sensor.

[0188] The auxiliary information input unit 613 outputs the detection output of the pitch angle sensor to the model group selection unit 612.

[0189] The model group selection unit 612 binarizes the detected value of the pitch angle sensor using a predetermined threshold. When the detected value is at or above the threshold, the gently inclined models M₁ and M₂ are selected from the models M₁, M₂, M₃, and M₄ as the models to be assumed.

[0190] When the detected value of the pitch angle sensor is below the threshold, on the other hand, the steeply inclined models M₃ and M₄ are selected from the models M₁, M₂, M₃, and M₄ as the models to be assumed.
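
The selection rule of paragraphs [0189] and [0190] can be written as a single comparison. The sketch follows the text as stated (at or above the threshold selects M₁ and M₂, below it selects M₃ and M₄); the function name and the string labels are hypothetical.

```python
def select_model_group(pitch_value, threshold):
    """Return the subset of models to be assumed for this pitch reading."""
    if pitch_value >= threshold:
        return ["M1", "M2"]  # gently inclined models
    return ["M3", "M4"]      # steeply inclined models
```

Only the returned subset then needs to be evaluated, which halves the number of candidate-point and similarity computations per pixel in this four-model example.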

[0191] Only the specific models selected in this way by the model group selection unit 612 are generated by the assumed model information setting unit 605 and assumed by the corresponding candidate point coordinate generation unit 606 as the models to be actually used.

[0192] When the model group representing the road 61 consists of a group of models with a positive inclination angle θ (uphill) and a group with a negative inclination angle θ (downhill), the model group selection unit 612 may instead select and output one of these two groups.

[0193] As described above, because this embodiment includes the model group selection unit 612, computation need not be performed on every prepared model. The prepared model group is narrowed down to a specific subset and computation is performed on that subset only, so objects can be recognized and judged very efficiently.

[0194] According to the embodiment shown in FIG. 12, when the shape of the road 61 (a plane) or the like is specified in advance, it is modeled and used to identify the recognition target. This removes the need for the conventional complex processing of measuring distance over the entire space for each selected pixel and then identifying and extracting the object from the distance measurements, so measurement can be performed more efficiently and without misrecognition.

[0195] Next, various modifications applicable to this embodiment are described.

[0196] FIG. 15 assumes a case in which not only the inclination θ of the road 61 ahead of the vehicle 60 but also an obstacle 62 ahead of the vehicle 60 is recognized.

[0197] In this case, suppose that the model estimation unit 611 of FIG. 12 has estimated that, of the models M₁, M₂, M₃, and M₄, the model M₂ represents the road 61. The model estimation unit 611 then performs the following computation.

[0198] When the model M₂ is assumed, the similarity calculation unit 610 has already calculated the reciprocal Qs of the similarity for each selected pixel P₂₁ of the virtual image #21. Each selected pixel P₂₁ of the virtual image #21 can therefore be associated with its value of Qs, as shown in FIG. 16.

[0199] Based on this distribution of the reciprocal of the similarity, it is determined whether an obstacle 62 is present on the road 61 and, if so, where.

[0200] As shown in FIG. 16, the distribution of the reciprocal Qs of the similarity over the virtual image #21 divides into a region 26 of low values (where Qs is 2) and a region 25 of high values (where Qs is 3 or 4). These regions can be distinguished by setting a threshold for a binary decision on the magnitude of Qs and testing whether each value is at or below it.

【０２０１】そこで、上記類似度の逆数Ｑsの数値が低
い領域２６については道路６１を示すものと判断し、上
記類似度の逆数Ｑsの数値が高い領域２５については道
路６１上に存在する障害物６２であると判断するもので
ある。Therefore, the region 26 where the value of the reciprocal Qs of the similarity is low is determined to represent the road 61, and the region 25 where the value of the reciprocal Qs of the similarity is high is determined to be the obstacle 62 existing on the road 61.
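The two-level classification described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the label strings, the sample Qs values, and the threshold of 2.5 are all assumptions chosen to mirror the values "2", "3", and "4" mentioned for FIG. 16.

```python
from typing import List, Tuple

ROAD = "road"          # region 26: low reciprocal similarity, matches the road model
OBSTACLE = "obstacle"  # region 25: high reciprocal similarity, does not match


def classify_pixels(qs_map: List[List[float]], threshold: float) -> List[List[str]]:
    """Label each selected pixel by comparing its Qs value with the threshold."""
    return [
        [ROAD if qs <= threshold else OBSTACLE for qs in row]
        for row in qs_map
    ]


def obstacle_pixels(labels: List[List[str]]) -> List[Tuple[int, int]]:
    """Return the (i, j) positions labelled as obstacle."""
    return [
        (i, j)
        for i, row in enumerate(labels)
        for j, label in enumerate(row)
        if label == OBSTACLE
    ]


# Hypothetical Qs values (2 = road, 3 or 4 = obstacle); not taken from the figure.
qs_map = [
    [2, 2, 2, 2],
    [2, 3, 4, 2],
    [2, 4, 3, 2],
    [2, 2, 2, 2],
]
labels = classify_pixels(qs_map, threshold=2.5)
```

The positions returned by `obstacle_pixels(labels)` answer both questions posed in the text: whether an obstacle exists (the list is non-empty) and where it is.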

【０２０２】ここで、仮想画像＃２１の各選択画素ごと
に、各画像＃１〜＃Ｎの対応点の画像情報が求められる
ので、仮想画像＃２１の各画素に、対応点の画像情報に
応じた明度を付与することで、仮想画像センサ２１で認
識対象物体を撮像したときの画像を、図１５に示すよう
に、生成することができる。この場合、上述した道路６
１と障害物６２とを識別した結果に基づいて、画像中の
特定の領域６４が障害物６２を示すものと判断すること
ができる。Here, for each selected pixel of the virtual image #21, the image information of the corresponding points in the images #1 to #N is obtained. By giving each pixel of the virtual image #21 a brightness according to the image information of its corresponding points, the image that would be obtained if the virtual image sensor 21 captured the recognition target object can be generated, as shown in FIG. 15. In this case, based on the result of distinguishing the road 61 from the obstacle 62 described above, a specific area 64 in the image can be determined to represent the obstacle 62.
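As a rough sketch of how a virtual image could be filled in from the corresponding points, the snippet below assigns each virtual pixel the mean brightness of its corresponding points. The text only says the brightness is given "according to" the image information of the corresponding points, so the use of the mean, the function name, and the sample values are assumptions.

```python
from typing import List


def render_virtual_image(samples: List[List[List[float]]]) -> List[List[float]]:
    """samples[i][j] holds the brightness values read at the corresponding
    points of virtual pixel (i, j) in images #1 to #N.
    Each virtual pixel receives the mean of those values (an assumed rule)."""
    return [[sum(values) / len(values) for values in row] for row in samples]


# One row, two virtual pixels, three image sensors (hypothetical values).
samples = [[[100.0, 104.0, 102.0], [50.0, 50.0, 50.0]]]
virtual_image = render_virtual_image(samples)
```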

【０２０３】このように走行路上に障害物が存在するか
否かを判断するには、必ずしも走行路を示すモデルを複
数用意する必要はない。In order to determine whether or not an obstacle is present on the traveling path, it is not always necessary to prepare a plurality of models indicating the traveling path.

【０２０４】図１７は、道路６１が平坦であることが既
知であり、モデルとしては平坦（傾斜角度θ＝０度）な
モデルＭ₂のみを用意して、道路６１上の障害物６２が
存在するか否かを認識、判断する場合を想定している。FIG. 17 assumes a case where the road 61 is known to be flat, only the flat model M₂ (inclination angle θ = 0 degrees) is prepared as the model, and whether or not an obstacle 62 exists on the road 61 is recognized and determined.

【０２０５】この場合、図１２のモデル推定部６１１で
は、以下の演算が実行される。In this case, the following calculation is performed in the model estimating unit 611 of FIG.

【０２０６】すなわち、モデルＭ₂を仮定したときに、
仮想画像＃２１の各選択画素Ｐ₂₁毎に、類似度の逆数Ｑ
sが類似度算出部６１０で算出されているので、図１８
に示すように仮想画像＃２１の各選択画素Ｐ₂₁に、対応
する類似度の逆数Ｑsの値が対応づけられる。That is, with the model M₂ assumed, the similarity calculation unit 610 has calculated the reciprocal Qs of the similarity for each selected pixel P₂₁ of the virtual image #21, so that, as shown in FIG. 18, each selected pixel P₂₁ of the virtual image #21 is associated with the corresponding value of the reciprocal Qs of the similarity.

【０２０７】そこで、この類似度の逆数の分布に基づい
て、道路６１上に障害物６２が存在するか否か、そして
その場所はいずれの場所であるかが判断される。Therefore, based on the distribution of the reciprocal of the degree of similarity, it is determined whether or not the obstacle 62 exists on the road 61 and the location of the obstacle 62.

【０２０８】同図に示すように、仮想画像＃２１上の類
似度の逆数Ｑsの分布は、類似度の逆数Ｑsの大きさを２
値的に判別するしきい値によって、数値が低い領域２
６’と、数値が高い領域２５’とに分類される。As shown in the figure, the distribution of the reciprocal Qs of the similarity on the virtual image #21 is classified, by a threshold that classifies the magnitude of Qs into two levels, into a region 26' where the value is low and a region 25' where the value is high.

【０２０９】そこで、上記類似度の逆数Ｑsの数値が低
い領域２６’については道路６１（モデルＭ₂）を示す
ものと判断し、上記類似度の逆数Ｑsの数値が高い領域
２５’については道路６１上に存在する障害物６２であ
ると判断するものである。Therefore, the region 26' where the value of the reciprocal Qs of the similarity is low is determined to represent the road 61 (model M₂), and the region 25' where the value of the reciprocal Qs of the similarity is high is determined to be the obstacle 62 existing on the road 61.

【０２１０】さて、以上説明した実施形態では、モデル
が単一の形状（平面）である場合を想定しているが、も
ちろん、多種多様な形状を有する複数の物体を認識する
場合にも本発明を適用することができる。The embodiment described above assumes that the model has a single shape (a plane), but the present invention can of course also be applied to recognizing a plurality of objects having a wide variety of shapes.

【０２１１】たとえば、地面上に設置されている建物を
認識、判別するにあたって、地面を示すモデル、建物の
壁面を示すモデル、建物の屋根を示すモデルなどのモデ
ル群を、図１２の仮定モデル情報設定部６０５で生成し
て、以下、これらモデル群の各仮定モデルに対して同様
な処理を行うようにすればよい。For example, to recognize and discriminate a building standing on the ground, a model group consisting of a model representing the ground, a model representing a wall of the building, a model representing the roof of the building, and so on is generated by the assumed model information setting unit 605 in FIG. 12, and the same processing is then performed for each hypothetical model in the model group.

【０２１２】ここで、処理の繁雑さを避けるため、モデ
ル群の各モデル間で階層構造をもたせ、処理を効率的に
実行することができる。Here, in order to avoid complexity of processing, a hierarchical structure is provided between the models in the model group, and the processing can be executed efficiently.

【０２１３】すなわち、建物と地面全体のモデル群を、
まず、大きく、複数の地面のモデルからなるモデル群
と、複数の壁面のモデルからなるモデル群と、複数の屋
根のモデルからなるモデル群とに分類する。さらに、地
面のモデル群を、平坦な地面のモデル群と、傾いた地面
のモデル群とに分類する。そして、地面の高さ、水平面
に対する傾き角度などのパラメータを導入して、地面の
モデルそれぞれに対して、このパラメータをモデル情報
として付与する。建物の壁面のモデルについても、同様
にして、垂直面に対する傾斜角などのパラメータを導入
して、このパラメータをモデル情報として付与する。That is, the model group for the building and the ground as a whole is first broadly classified into a model group consisting of a plurality of ground models, a model group consisting of a plurality of wall models, and a model group consisting of a plurality of roof models. The ground model group is further classified into a flat-ground model group and an inclined-ground model group. Parameters such as the height of the ground and its inclination angle with respect to the horizontal plane are then introduced and given to each ground model as model information. Likewise, for the models of the building's walls, parameters such as the inclination angle with respect to the vertical plane are introduced and given as model information.
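One way to picture the hierarchical model group is as a small tree whose leaves carry the parameters given as model information. The sketch below is illustrative only: the class names, group names, parameter keys, and numeric values are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    name: str
    # Model information, e.g. ground height or inclination angle (assumed keys).
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class ModelGroup:
    name: str
    subgroups: List["ModelGroup"] = field(default_factory=list)
    models: List[Model] = field(default_factory=list)

    def leaf_models(self) -> List[Model]:
        """Collect every model under this group, depth first."""
        found = list(self.models)
        for group in self.subgroups:
            found.extend(group.leaf_models())
        return found


scene = ModelGroup("building_and_ground", subgroups=[
    ModelGroup("ground", subgroups=[
        ModelGroup("flat_ground", models=[
            Model("flat_h0", {"height_m": 0.0, "tilt_deg": 0.0}),
        ]),
        ModelGroup("inclined_ground", models=[
            Model("slope_5", {"height_m": 0.0, "tilt_deg": 5.0}),
            Model("slope_10", {"height_m": 0.0, "tilt_deg": 10.0}),
        ]),
    ]),
    ModelGroup("wall", models=[
        Model("vertical_wall", {"tilt_from_vertical_deg": 0.0}),
    ]),
    ModelGroup("roof", models=[
        Model("sloped_roof", {"tilt_deg": 30.0}),
    ]),
])

names = [m.name for m in scene.leaf_models()]
```

With such a tree, a coarse decision (ground, wall, or roof) can be made first and only the models under the chosen branch need to be tried afterwards, which is one plausible reading of how the hierarchy reduces the processing load.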

【０２１４】そして、図１２のモデル推定部６１１で
は、仮想画像＃２１を、いくつかの領域に区分して、各
領域毎にモデルを推定することができる。Then, the model estimating unit 611 in FIG. 12 can divide the virtual image # 21 into several regions and estimate a model for each region.

【０２１５】いま、仮想画像＃２１で、ある建物が捕ら
えられており、仮想画像＃２１の中央の正面に、垂直な
壁面が存在し、その上に、傾斜のある屋根が存在し、正
面壁面の左に、奥行き方向に傾いているた壁面が存在し
ていたとする。Suppose now that a building is captured in the virtual image #21: a vertical wall faces the viewer at the center of the virtual image #21, a sloping roof sits above it, and a wall inclined in the depth direction lies to the left of the front wall.

【０２１６】そこで、仮想画像＃２１の中央領域、中央
上部領域、中央より左の領域それぞれ毎に、モデル推定
処理を実行することにより、各領域に存在するであろう
物体を、確実かつ効率よく認識、判断することが可能と
なる。Therefore, by executing the model estimation process separately for the central region, the upper central region, and the region to the left of center of the virtual image #21, the objects likely to be present in each region can be recognized and determined reliably and efficiently.
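Region-by-region model estimation could look like the sketch below. It assumes that the estimated model for a region is the hypothetical model with the smallest mean reciprocal similarity Qs over that region's pixels; that selection rule, the function name, the region names, and the sample Qs values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def estimate_model_per_region(
    qs_by_model: Dict[str, Dict[Pixel, float]],
    regions: Dict[str, List[Pixel]],
) -> Dict[str, str]:
    """For each region, pick the model whose mean Qs over the region is smallest."""
    result = {}
    for region_name, pixels in regions.items():
        def mean_qs(model_name: str) -> float:
            values = qs_by_model[model_name]
            return sum(values[p] for p in pixels) / len(pixels)

        result[region_name] = min(qs_by_model, key=mean_qs)
    return result


# Hypothetical regions of the virtual image and hypothetical Qs values.
regions = {"center": [(1, 1)], "center_top": [(0, 1)], "left": [(1, 0)]}
qs_by_model = {
    "front_wall": {(1, 1): 1.0, (0, 1): 4.0, (1, 0): 3.0},
    "roof":       {(1, 1): 3.5, (0, 1): 1.2, (1, 0): 4.0},
    "side_wall":  {(1, 1): 3.0, (0, 1): 4.5, (1, 0): 1.1},
}
estimated = estimate_model_per_region(qs_by_model, regions)
```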

【０２１７】さらに、一の建物だけではなく、複数の建
物を認識する実施も可能である。この場合、複数の建物
（たとえば、一戸建てとビル）それぞれを示すモデル群
が上位に存在し、建物を構成する壁面、屋根などのモデ
ルがその下位に存在する階層構造をもつモデル群とな
る。ここで、上位の構造としては、個々の建物のモデル
を個別に用意してもよく、複数の建物が存在する全体構
成（建物群）をモデルとすることも可能である。Furthermore, it is possible to recognize not just one building but a plurality of buildings. In this case the model group has a hierarchical structure in which models representing each of the buildings (for example, a single-family house and a building) are at the upper level and the models of the walls, roofs, and other parts that make up a building are at the lower level. For the upper level, a model may be prepared for each building individually, or the overall arrangement in which the buildings exist (a building group) may itself be used as a model.

【０２１８】また、上述した仮想画像＃２１の領域毎に
モデルを推定する場合に、各領域毎に、各モデルの存在
確率を予め情報として与えることによって、より効率よ
くモデル推定処理を行うようにしてもよい。各モデルに
対して、いずれの領域にどの程度の確率で存在している
のかという情報がモデル情報として付与されることにな
る。In addition, when a model is estimated for each region of the virtual image #21 as described above, the model estimation can be made more efficient by supplying in advance, for each region, the existence probability of each model. Each model is then given, as model information, information on which regions it is likely to be found in and with what probability.

【０２１９】そこで、このモデルに付与されたモデル情
報に基づいて、存在確率がきわめて低い領域について
は、このモデルを仮定モデルとしないで、先に処理を進
めることが可能となる。Therefore, based on the model information given to this model, it is possible to proceed with the processing for an area having a very low existence probability without using this model as a hypothetical model.

【０２２０】たとえば、仮想画像＃２１の下側の領域
に、屋根を示すモデルが存在する確率はきわめて低いの
で、この下側の領域については、屋根を示すモデルを仮
定モデルから排除することができる。このため、処理効
率をより向上させることができる。For example, the probability that a model representing a roof exists in the lower region of the virtual image #21 is extremely low, so the roof model can be excluded from the hypothetical models for this lower region. This further improves the processing efficiency.
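The pruning of hypothetical models by existence probability can be sketched as a simple filter. The probability table, the region names, the cut-off of 0.05, and the function name below are hypothetical; the text states only that models with a very low existence probability in a region need not be assumed there.

```python
from typing import Dict, List


def candidate_models(
    existence_prob: Dict[str, Dict[str, float]],
    region: str,
    min_prob: float = 0.05,
) -> List[str]:
    """Keep only the models whose existence probability in the region
    is at least min_prob; the rest are not assumed for that region."""
    return [
        model
        for model, probs in existence_prob.items()
        if probs.get(region, 0.0) >= min_prob
    ]


# Hypothetical existence probabilities attached to each model as model information.
existence_prob = {
    "roof":   {"upper": 0.60, "middle": 0.20, "lower": 0.01},
    "wall":   {"upper": 0.30, "middle": 0.70, "lower": 0.20},
    "ground": {"upper": 0.00, "middle": 0.10, "lower": 0.80},
}
lower_candidates = candidate_models(existence_prob, "lower")
upper_candidates = candidate_models(existence_prob, "upper")
```

Here the roof model drops out of the lower region, so the similarity calculation for that region runs over two hypothetical models instead of three.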

【０２２１】たとえば、特定の局所情報や、局所情報か
ら算出した特殊なパラメータ（エッジの強度や方向、位
置などの情報や、テクスチャ情報など）、仮想視野に関
連する情報、仮定したモデルに関連する情報、あるいは
他のセンサの検出結果に基づく情報に基づいて、仮想画
像＃２１の特定の領域では、仮定すべきモデルの種類を
変更したり、仮定すべきモデルの種類に制限を加えた
り、使用すべき画像を各画像＃１〜＃Ｎの中から選択す
るなどの特殊な操作を加えることもできる。For example, based on specific local information, special parameters calculated from local information (such as edge strength, direction, and position, or texture information), information related to the virtual field of view, information related to the assumed model, or information based on the detection results of other sensors, special operations can be applied in a specific region of the virtual image #21, such as changing the types of model to be assumed, restricting the types of model to be assumed, or selecting which of the images #1 to #N to use.

【０２２２】また、本実施形態では、前述したように、
仮想画像＃２１を設定することを必ずしも要しないが、
仮想画像＃２１を使用してモデルを推定する場合には、
仮想画像＃２１の視野自在性を利用して、モデルをあら
ゆる方向から観測した結果に基づきモデル推定を行うよ
うにしてもよい。In this embodiment, as described above, it is not always necessary to set the virtual image #21; however, when the model is estimated using the virtual image #21, the freedom of its field of view may be exploited so that model estimation is based on the results of observing the model from all directions.

【０２２３】すなわち、モデル（たとえば球形のモデ
ル）の３次元構造が、通常観測される表側だけではな
く、裏側についても定義されている場合には、仮想画像
＃２１を、モデルの表側だけではなく、モデルの裏側か
ら観測する視野に設定することにより、モデルの推定の
柔軟性を高めることもできる。ただし、これは各画像セ
ンサ１〜Ｎのいずれかで、モデルの裏側が撮像され得る
ことが必要である。That is, when the three-dimensional structure of a model (for example, a spherical model) is defined not only for the normally observed front side but also for the back side, the flexibility of model estimation can be increased by setting the virtual image #21 to a field of view that observes the model from the back side as well as from the front side. This requires, however, that the back side of the model can be imaged by at least one of the image sensors 1 to N.

【０２２４】なお、本実施形態では、画像（仮想画像）
中に、予め設定したモデル群の中のうちのいずれかのモ
デル、あるいは予め設定した一のモデルが存在すること
を前提として説明したが、モデルに関して得られた類似
度に基づいて、画像（仮想画像）中に、当該モデルが存
在しているか否かの判断を行うことも可能である。In the present embodiment, the description has assumed that one of the models in a preset model group, or a single preset model, exists in the image (virtual image). However, it is also possible to determine, based on the similarity obtained for a model, whether or not that model exists in the image (virtual image).

【０２２５】たとえば、図１６、図１８に示すように、
推定したモデルに関する類似度の逆数Ｑsの分布を取得
し、この類似度の逆数Ｑsが所定のしきい値より高い場
合には、この推定モデルは画像中に存在しないと判断す
るようにしてもよい。For example, as shown in FIGS. 16 and 18, the distribution of the reciprocal Qs of the similarity for the estimated model may be obtained, and if the reciprocal Qs of the similarity is higher than a predetermined threshold, it may be determined that the estimated model does not exist in the image.

【０２２６】・第５の実施形態本実施形態は、上記第４の実施形態で述べた実在の物体
の認識の考え方を利用することにより、画像上において
実際の背景と背景モデルとが一致している領域について
は擬似的な画像を表示し、不一致の領域についてはそこ
にある実在の物体を表示するという具合に、実際のもの
とは異なる内容に画像表示内容を変更するというもので
ある。Fifth Embodiment: The present embodiment uses the concept of recognizing a real object described in the fourth embodiment to change the displayed image content to something different from the actual scene: in regions of the image where the actual background matches the background model, a pseudo image is displayed, and in regions where they do not match, the real object present there is displayed.

【０２２７】すなわち、クロマキー等の映像放送の技術
分野に関するものであり、以下のような画像処理を行う
場合に好適なものである。モデルとして、例えば模様の
あるカーテンを背景モデルとして用意する。そして、画
像センサによって、通常、模様のあるカーテンが背景と
して撮像されるものとする。この状態で、背景の前に背
景以外のもの、例えば人などの背景以外の物体が出現す
ると、その背景以外の物体が画像中で占める領域の部分
において、背景モデルとの不一致が生じる。そこで、こ
の不一致が生じた領域には、画像センサで撮像した実際
の人などの背景以外の物体の画像を表示するようにし、
一方、背景モデルと一致した領域については、予め用意
しておいた擬似背景画像、たとえば東京タワーを写した
別の画像を表示するようにするものである。That is, the present embodiment relates to video broadcasting techniques such as chroma key and is suited to image processing of the following kind. A patterned curtain, for example, is prepared as the background model, and the image sensor normally captures the patterned curtain as the background. When something other than the background, such as a person, appears in front of the background in this state, a mismatch with the background model arises in the part of the image occupied by that object. In the region where the mismatch arises, the actual image of the person or other non-background object captured by the image sensor is displayed, while in the region that matches the background model, a pseudo background image prepared in advance, for example a separate image showing Tokyo Tower, is displayed.

【０２２８】本実施形態によれば、実在の物体の複雑な
輪郭形状に合わせて背景を物体から分離するという複雑
な処理を要せずとも、きわめて簡易かつ迅速に処理を進
めることができるようになる。According to the present embodiment, processing can proceed very simply and quickly, without the complex operation of separating the background from the object along the intricate contour of the real object.

【０２２９】また、背景モデルは、背景の特徴を形状の
みを取り出し抽出したものであり、背景の模様、明るさ
が変化したとしても、これに影響されることなく、常に
精度よく処理を進めることが可能である。In addition, the background model captures only the shape among the features of the background, so processing can always proceed accurately, unaffected by changes in the pattern or brightness of the background.

【０２３０】以下、図１２に対応する図１９に示すブロ
ック図を参照して、本実施形態の画像処理について説明
する。Hereinafter, the image processing of this embodiment will be described with reference to the block diagram shown in FIG. 19 corresponding to FIG.

【０２３１】まず、画像センサ１〜Ｎの中から、一の基
準画像センサ、たとえば画像センサ１が選択される。こ
の基準画像センサ１の基準画像＃１に基づいて背景分離
処理が実行される。この基準画像センサ１により、背景
（たとえば模様のあるカーテン）と背景の前に出現する
物体（たとえば人）とが撮像される。他の画像センサ２
〜Ｎにおいても、背景とこの背景の前に出現する物体が
撮像される。First, one reference image sensor, for example the image sensor 1, is selected from the image sensors 1 to N. Background separation is performed based on the reference image #1 of this reference image sensor 1. The reference image sensor 1 captures the background (for example, a patterned curtain) and an object (for example, a person) appearing in front of the background. The other image sensors 2 to N also capture the background and the object appearing in front of it.

【０２３２】基準画像入力部７０１には、基準画像セン
サ１で撮像された基準画像＃１が取り込まれる。他の画
像入力部７０２、７０３には、他の画像センサ２〜Ｎで
撮像された画像＃２〜＃Ｎが取り込まれる。The reference image input unit 701 receives a reference image # 1 captured by the reference image sensor 1. The other image input units 702 and 703 capture images # 2 to #N captured by the other image sensors 2 to N, respectively.

【０２３３】背景モデル情報設定部７０４では、図１２
の仮定モデル情報設定部６０５と同様にして、背景の３
次元構造を示すモデル情報が設定される。The background model information setting unit 704 sets model information indicating the three-dimensional structure of the background, in the same manner as the assumed model information setting unit 605 of FIG. 12.

【０２３４】対応候補点座標発生部７０５では、上記背
景モデルを仮定モデルＭとして、図１２の対応候補点座
標発生部６０６と同様にして、基準画像センサ１の背景
画像＃１中の選択画素Ｐ₁に対応する他の画像センサｋ
の画像＃ｋ中の対応候補点Ｐ_kの位置座標（Ｘ_k，Ｙ_k）
（ｋ＝２、…、Ｎ）を、仮定するモデルＭのモデル情報
から発生する。The corresponding candidate point coordinate generator 705 takes the background model as the assumed model M and, in the same manner as the corresponding candidate point coordinate generator 606 in FIG. 12, generates from the model information of the assumed model M the position coordinates (X_k, Y_k) (k = 2, ..., N) of the corresponding candidate point P_k in the image #k of each other image sensor k that corresponds to the selected pixel P₁ in the background image #1 of the reference image sensor 1.
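To make the generation of corresponding candidate points from a model more concrete, the sketch below works through the simplest case. It assumes a planar background model (normal . X = offset in the coordinate frame of sensor 1), ideal pinhole sensors with identical focal length and principal point, and sensors that differ from sensor 1 only by a translation. None of these assumptions, nor the function names and numbers, come from the patent text; they are chosen only to keep the geometry short.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def point_on_model(pixel: Vec2, normal: Vec3, offset: float,
                   focal_px: float, principal: Vec2) -> Vec3:
    """Intersect the viewing ray of a pixel in image #1 with the model plane."""
    ray = ((pixel[0] - principal[0]) / focal_px,
           (pixel[1] - principal[1]) / focal_px,
           1.0)
    denom = dot(normal, ray)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the model plane")
    scale = offset / denom
    if scale <= 0:
        raise ValueError("model plane is behind the reference sensor")
    return (ray[0] * scale, ray[1] * scale, ray[2] * scale)


def candidate_point(point: Vec3, camera_center: Vec3,
                    focal_px: float, principal: Vec2) -> Vec2:
    """Project a 3D point into sensor k, which sits at camera_center and
    has the same orientation as sensor 1 (assumed)."""
    x = point[0] - camera_center[0]
    y = point[1] - camera_center[1]
    z = point[2] - camera_center[2]
    if z <= 0:
        raise ValueError("point is behind sensor k")
    return (focal_px * x / z + principal[0], focal_px * y / z + principal[1])


# Hypothetical setup: a wall 4 units in front of sensor 1, sensor 2 shifted 0.5 units right.
focal_px = 512.0
principal = (320.0, 240.0)
point = point_on_model((448.0, 240.0), (0.0, 0.0, 1.0), 4.0, focal_px, principal)
x_k, y_k = candidate_point(point, (0.5, 0.0, 0.0), focal_px, principal)
```

Because the model fixes the surface, each selected pixel yields a single candidate point per sensor rather than one candidate per assumed distance.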

【０２３５】局所情報抽出部７０６では、基準画像＃１
の選択画素Ｐ₁の画像情報Ｆ₁が局所情報として抽出され
る。局所情報抽出部７０７、７０８では、対応候補点座
標発生部７０５によって発生された対応候補点Ｐ_kの位
置座標に基づき局所情報Ｆ_k（ｋ＝２、…、Ｎ）を抽出
する処理がそれぞれ実行される。The local information extraction unit 706 extracts the image information F₁ of the selected pixel P₁ of the reference image #1 as local information. The local information extraction units 707 and 708 each extract local information F_k (k = 2, ..., N) based on the position coordinates of the corresponding candidate point P_k generated by the corresponding candidate point coordinate generator 705.

【０２３６】類似度算出部７０９では、図１２に示す類
似度算出部６１０と同様にして、全画像センサ１〜Ｎの
選択画素Ｐ₁、対応候補点Ｐ₂〜Ｐ_Nの画像情報Ｆ₁、Ｆ
₂〜Ｆ_Nの分散値（画像情報の「まとまり具合」を示す評
価値）が類似度として算出される。類似度が大きい値を
示す程（画像情報の分散値が小さい程）、画像中のその
領域は仮定モデルＭと一致している可能性が高い、すな
わち背景を表している可能性が高いということになる。The similarity calculation unit 709, in the same manner as the similarity calculation unit 610 shown in FIG. 12, calculates the similarity from the variance (an evaluation value indicating how tightly the image information is clustered) of the image information F₁, F₂ to F_N of the selected pixel P₁ and the corresponding candidate points P₂ to P_N of all the image sensors 1 to N. The larger the similarity (the smaller the variance of the image information), the more likely it is that the region of the image matches the assumed model M, that is, that it represents the background.
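The relation between variance and similarity can be illustrated with a few lines of code. The text states that a small variance corresponds to a large similarity but does not give a formula in this passage, so the mapping `1 / (variance + eps)` used below is an assumption, as are the function names and sample brightness values.

```python
from typing import Sequence


def variance(values: Sequence[float]) -> float:
    """Population variance of the image information F_1 ... F_N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def similarity(values: Sequence[float], eps: float = 1e-6) -> float:
    """Assumed mapping: the smaller the variance, the larger the similarity.
    eps avoids division by zero when all values agree exactly."""
    return 1.0 / (variance(values) + eps)


# Hypothetical brightness values read by three sensors.
on_background = [100.0, 100.0, 100.0]   # all sensors see the same background point
on_person = [100.0, 40.0, 180.0]        # candidate points land on different surfaces

sim_background = similarity(on_background)
sim_person = similarity(on_person)
```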

【０２３７】背景モデル判定部７１０では、基準画像＃
１の各選択画素毎に、各領域毎に、その選択画素、その
領域に背景が存在するか否かが判断される。具体的に
は、類似度算出部７０９で算出された類似度が所定のし
きい値以上になっている選択画素あるいは領域について
は、背景であると判定され、上記算出された類似度が所
定のしきい値よりも小さくなっている選択画素あるいは
領域については、背景以外の物体であると判定される。The background model determination unit 710 determines, for each selected pixel or each region of the reference image #1, whether or not the background is present at that pixel or in that region. Specifically, a selected pixel or region for which the similarity calculated by the similarity calculation unit 709 is equal to or greater than a predetermined threshold is determined to be background, and a selected pixel or region for which the calculated similarity is smaller than the threshold is determined to be an object other than the background.

【０２３８】背景が存在しているか否かの判定を行う場
合に、背景モデルに関して予め付与された補助情報を利
用してもよい。補助情報とは、背景モデルの精度、明度
などの情報である。When determining whether or not a background exists, auxiliary information given in advance for the background model may be used. The auxiliary information is information such as accuracy and brightness of the background model.

【０２３９】例えば、背景モデルの精度に応じて、判定
を緩やかにしたり、照明などの影響で、局所情報の精度
が劣化する領域については、この点を考慮するなどし
て、判定の精度を高めることができる。For example, the accuracy of the determination can be improved by relaxing the determination according to the accuracy of the background model, or by making allowance for regions where the accuracy of the local information is degraded by lighting or similar effects.

【０２４０】また、対応候補点の探索の範囲を、補助情
報に基づき領域毎に設定してもよい。擬似背景発生部
７１１では、所望の擬似背景画像、たとえば「東京タワ
ーと青空」が生成される。この擬似背景は、画像センサ
１〜Ｎでは、現在、撮像されていない画像であり、予め
他の画像センサで撮像しておくようにすればよい。ま
た、擬似背景は、基準画像＃１で撮像された実際の背景
の代わりとなるものであるので、基準画像＃１の視野の
動きにに合わせて撮像されていればよりリアルである。
画像選択部７１２は、基準画像＃１の背景として、擬
似背景発生部７１１で発生した擬似背景（「東京タワー
と青空」）か実際の画像（「模様のあるカーテンの前の
人」）のいずれかを表示させるかを画素毎に選択するも
のである。The search range for the corresponding candidate points may also be set for each region based on the auxiliary information. The pseudo background generation unit 711 generates a desired pseudo background image, for example “Tokyo Tower and blue sky”. This pseudo background is an image that the image sensors 1 to N are not currently capturing; it may be captured in advance by another image sensor. Since the pseudo background takes the place of the actual background captured in the reference image #1, the result is more realistic if the pseudo background was captured following the movement of the field of view of the reference image #1. The image selection unit 712 selects, for each pixel, whether to display as the background of the reference image #1 the pseudo background generated by the pseudo background generation unit 711 (“Tokyo Tower and blue sky”) or the actual image (“a person in front of a patterned curtain”).

【０２４１】そして、画像表示部７１３では、背景モデ
ル判定部７１０の判定結果に基づいて、基準画像＃１の
領域のうち、背景であると判定された領域については、
実際の画像の代わりに、画像選択部７１２で選択された
背景（たとえば擬似背景）が表示されるとともに、基準
画像＃１の領域のうち、背景以外の物体（人など）であ
ると判定された領域については、基準画像＃１の画像情
報をそのままとし、画像センサ１で撮像された背景以外
の物体がそのまま表示される。つまり、画像表示部７１
３では、画像センサ１の視野でみたときの物体（人）が
擬似背景とともに表示されることになる。Based on the determination result of the background model determination unit 710, the image display unit 713 displays, for regions of the reference image #1 determined to be background, the background selected by the image selection unit 712 (for example, the pseudo background) instead of the actual image, and, for regions of the reference image #1 determined to be an object other than the background (such as a person), leaves the image information of the reference image #1 unchanged so that the object captured by the image sensor 1 is displayed as is. In other words, the image display unit 713 displays the object (person) as seen in the field of view of the image sensor 1 together with the pseudo background.
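The background determination and the per-pixel display choice can be combined in a short sketch. Pixels whose similarity is at or above the threshold are treated as background and replaced by the pseudo background; all other pixels keep the reference image content. The function name, the use of strings in place of pixel values, and the numbers are assumptions for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def composite(
    reference: Sequence[Sequence[T]],
    pseudo_background: Sequence[Sequence[T]],
    similarity_map: Sequence[Sequence[float]],
    threshold: float,
) -> List[List[T]]:
    """Per pixel: similarity >= threshold means the background model matches,
    so the pseudo background is shown; otherwise the reference pixel is kept."""
    output = []
    for ref_row, bg_row, sim_row in zip(reference, pseudo_background, similarity_map):
        output.append([
            bg if sim >= threshold else ref
            for ref, bg, sim in zip(ref_row, bg_row, sim_row)
        ])
    return output


# Strings stand in for pixel values to keep the example readable (hypothetical data).
reference = [["curtain", "person"], ["curtain", "person"]]
pseudo_background = [["sky", "sky"], ["tower", "tower"]]
similarity_map = [[9.0, 0.5], [8.0, 0.2]]

displayed = composite(reference, pseudo_background, similarity_map, threshold=1.0)
```

No contour of the person is traced at any point; the separation falls out of the per-pixel comparison with the background model, which is the simplification this embodiment claims.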

【０２４２】なお、本実施形態では、画像表示部７１３
で、画像センサ１の視野でみた画像を表示しているが、
他の画像センサ２、…、Ｎの視野でみた画像を表示する
実施も可能である。In this embodiment, the image display unit 713 displays the image as seen in the field of view of the image sensor 1, but it is also possible to display the image as seen in the field of view of any of the other image sensors 2, ..., N.

【０２４３】また、本実施形態では、図１２と異なり、
仮想画像センサ２１による仮想画像＃２１を設定しない
で処理を行う場合を例にとり説明したが、図１２と同様
に仮想画像センサ２１による仮想画像＃２１を設定し
て、この仮想画像＃２１を、基準画像１の代わりとする
実施も可能である。この場合、画像表示部７１３では、
仮想画像＃２１でみた画像が表示されることになる。In this embodiment, unlike FIG. 12, the description has taken as an example the case where processing is performed without setting the virtual image #21 of the virtual image sensor 21. However, it is also possible, as in FIG. 12, to set the virtual image #21 of the virtual image sensor 21 and use this virtual image #21 in place of the reference image #1. In this case, the image display unit 713 displays the image as seen in the virtual image #21.

【０２４４】なお、背景モデル情報設定部７０４で設定
される背景モデルは、実際の背景を捕らえた距離画像に
基づき生成してもよいし、また、カメラの動きを検出す
るセンサを用意し、カメラの動きに応じてモデルを修正
してもよい。この場合、仮想画像＃２１からみた距離画
像を生成し、この距離画像そのものを背景モデル、モデ
ル情報として登録することができる。これにより距離画
像の信頼度などの補助情報を利用して、背景か否かの判
定の精度を高めることが可能である。The background model set by the background model information setting unit 704 may be generated from a distance image that captures the actual background; alternatively, a sensor that detects the movement of the camera may be provided and the model corrected according to that movement. In this case, a distance image as seen from the virtual image #21 can be generated and registered as the background model and model information as is. This makes it possible to use auxiliary information such as the reliability of the distance image to improve the accuracy of determining whether or not a region is background.

[Brief description of the drawings]

【図１】図１は第１の実施形態を説明するための仮想画
像センサの概念を示す図である。FIG. 1 is a diagram showing a concept of a virtual image sensor for explaining a first embodiment.

【図２】図２は多眼ステレオの類似度の融合の仕方を説
明する図である。FIG. 2 is a diagram illustrating a method of merging similarities of multi-view stereo.

【図３】図３は第１の実施形態を説明する図であり、仮
想視野を用いた多眼ステレオの距離計測の処理内容（基
準画像センサを設定する場合）を説明する図である。FIG. 3 is a diagram for explaining the first embodiment, and is a diagram for explaining processing content of distance measurement of a multi-view stereo using a virtual visual field (when a reference image sensor is set).

【図４】図４は第１の実施形態を説明する図であり、仮
想視野を用いた多眼ステレオ距離計測システム（基準画
像センサを設定する場合）のブロック図である。FIG. 4 is a diagram for explaining the first embodiment, and is a block diagram of a multi-view stereo distance measurement system using a virtual visual field (when a reference image sensor is set).

【図５】図５は第１の実施形態を説明する図であり、仮
想視野を用いた多眼ステレオの距離計測の処理内容（基
準画像センサを設定しない場合）を説明する図である。FIG. 5 is a diagram for explaining the first embodiment, and is a diagram for explaining processing content of distance measurement of a multi-view stereo using a virtual visual field (when a reference image sensor is not set).

【図６】図６は第１の実施形態を説明する図であり、仮
想視野を用いた多眼ステレオ距離計測システム（基準画
像センサを設定しない場合）のブロック図である。FIG. 6 is a diagram for explaining the first embodiment, and is a block diagram of a multi-view stereo distance measuring system using a virtual visual field (when a reference image sensor is not set).

【図７】図７（ａ）、（ｂ）は第２の実施形態を説明す
る図であり、異なる種類の画像センサが配置された画像
センサ群を示す斜視図である。FIGS. 7A and 7B are views for explaining a second embodiment, and are perspective views showing image sensor groups in which different types of image sensors are arranged.

【図８】図８は第３の実施形態で想定している車両搭載
のシステムを概念的に示す図である。FIG. 8 is a diagram conceptually showing a vehicle-mounted system assumed in a third embodiment.

【図９】図９は第２の実施形態を説明する図であり、仮
想視野を用いて異なる種類の画像センサ群を統合する多
眼ステレオ距離計測システムのブロック図である。FIG. 9 is a diagram illustrating a second embodiment, and is a block diagram of a multi-view stereo distance measuring system that integrates different types of image sensors using a virtual field of view.

【図１０】図１０（ａ）、（ｂ）、（ｃ）は図９に示す
距離推定部で実行される処理の内容を説明する図であ
る。FIGS. 10A, 10B, and 10C are diagrams for explaining the contents of processing executed by a distance estimating unit shown in FIG. 9;

【図１１】図１１は第４の実施形態を説明する図であ
り、仮想視野を用いる場合のモデル上の点が画像上のど
こに写るかを示す図である。FIG. 11 is a diagram for explaining the fourth embodiment, and is a diagram showing where points on a model appear in an image when a virtual visual field is used.

【図１２】図１２は第４の実施形態を説明する図であ
り、仮想視野を用いてモデルを推定する多眼ステレオ装
置の構成ブロック図である。FIG. 12 is a diagram illustrating a fourth embodiment, and is a configuration block diagram of a multi-view stereo apparatus that estimates a model using a virtual visual field.

【図１３】図１３は第４の実施形態で想定しているモデ
ルの一例を示す図である。FIG. 13 is a diagram illustrating an example of a model assumed in a fourth embodiment.

【図１４】図１４は図１２に示す類似度算出部で得られ
る仮定モデルと類似度の逆数との対応関係を示すグラフ
である。FIG. 14 is a graph showing a correspondence relationship between a hypothetical model obtained by the similarity calculation unit shown in FIG. 12 and a reciprocal of the similarity;

【図１５】図１５は第４の実施形態の変形例を説明する
図であり、走行路の上に存在する障害物を認識する場合
を説明する図である。FIG. 15 is a diagram illustrating a modified example of the fourth embodiment, and is a diagram illustrating a case where an obstacle existing on a traveling path is recognized.

【図１６】図１６は図１５に示す実施形態において走行
路と障害物とを識別する処理を説明する図である。FIG. 16 is a diagram illustrating a process of identifying a traveling path and an obstacle in the embodiment shown in FIG. 15.

【図１７】図１７は第４の実施形態の変形例を説明する
図であり、走行路が平坦であることが既知であり、この
平面の上に存在する障害物を認識する場合を説明する図
である。FIG. 17 is a diagram illustrating a modified example of the fourth embodiment, for the case where the traveling path is known to be flat and an obstacle existing on this plane is recognized.

【図１８】図１８は図１７に示す実施形態において走行
路と障害物とを識別する処理を説明する図である。FIG. 18 is a diagram illustrating a process of identifying a traveling path and an obstacle in the embodiment shown in FIG. 17;

【図１９】図１９は第５の実施形態を説明する図であ
り、擬似背景を背景とする画像を表示するシステムを示
すブロック図である。FIG. 19 is a diagram illustrating a fifth embodiment, and is a block diagram illustrating a system that displays an image with a pseudo background as a background.

【図２０】図２０は従来の２眼ステレオの原理を示した
図である。FIG. 20 is a diagram showing the principle of a conventional twin-lens stereo.

【図２１】図２１は従来の２眼ステレオの距離計測の処
理を説明する図である。FIG. 21 is a diagram illustrating a conventional distance measurement process of a twin-lens stereo.

【図２２】図２２は仮定距離と類似度の逆数との対応関
係を示すグラフである。FIG. 22 is a graph showing a correspondence relationship between an assumed distance and a reciprocal of a similarity.

【図２３】図２３は従来の２眼ステレオ装置の構成を示
したブロック図である。FIG. 23 is a block diagram showing a configuration of a conventional twin-lens stereo apparatus.

【図２４】図２４は従来の多眼ステレオの距離計測の処
理内容を説明する図である。FIG. 24 is a diagram for explaining processing content of a conventional multi-lens stereo distance measurement.

【図２５】図２５は従来の多眼ステレオ装置の構成を示
したブロック図である。FIG. 25 is a block diagram showing a configuration of a conventional multi-view stereo apparatus.

【図２６】図２６は車両上に搭載した画像センサ群の撮
像結果に基づき距離を計測する従来技術を説明する図で
ある。FIG. 26 is a diagram for explaining a conventional technique for measuring a distance based on an imaging result of an image sensor group mounted on a vehicle.

【図２７】図２７は第４の実施形態を説明する図であ
り、仮想視野を用いないでモデルを推定する多眼ステレ
オ装置のブロック図である。FIG. 27 is a diagram illustrating a fourth embodiment, and is a block diagram of a multi-view stereo apparatus that estimates a model without using a virtual visual field.

【図２８】図２８は第４の実施形態を説明する図であ
り、仮想視野を用いない場合のモデル上の点が画像上の
どこに写るかを示す図である。FIG. 28 is a diagram illustrating the fourth embodiment, and is a diagram illustrating where points on a model are projected on an image when a virtual visual field is not used.

[Explanation of symbols]

２１仮想画像センサ１〜Ｎ画像センサ１１画像センサ群５０認識対象物体６０車両６１道路６２障害物３０５仮想視野情報設定部３１４距離情報推定部５１３外部情報入力部６０５仮定モデル情報設定部６１１モデル推定部７１１擬似背景発生部
Reference Signs List
21: virtual image sensor
1 to N: image sensors
11: image sensor group
50: recognition target object
60: vehicle
61: road
62: obstacle
305: virtual visual field information setting unit
314: distance information estimation unit
513: external information input unit
605: assumed model information setting unit
611: model estimation unit
711: pseudo background generation unit

フロントページの続き (72)発明者木村茂神奈川県川崎市宮前区菅生ヶ丘９−１− 403 (72)発明者中野勝之東京都目黒区中目黒２−２−30 (72)発明者山口博義神奈川県平塚市四之宮2597 株式会社小松製作所特機事業本部研究部内 (72)発明者新保哲也神奈川県平塚市四之宮2597 株式会社小松製作所特機事業本部研究部内 (72)発明者川村英二神奈川県川崎市宮前区有馬２丁目８番24号株式会社サイヴァース内 (72)発明者緒方正人神奈川県鎌倉市上町屋345番地三菱プレシジョン株式会社内
Continuation of the front page
(72) Inventor: Shigeru Kimura, 9-1-403 Sugogaoka, Miyamae-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Katsuyuki Nakano, 2-2-30 Nakameguro, Meguro-ku, Tokyo
(72) Inventor: Hiroyoshi Yamaguchi, c/o Komatsu Ltd. (特機事業本部研究部), 2597 Shinomiya, Hiratsuka-shi, Kanagawa
(72) Inventor: Tetsuya Shinpo, c/o Komatsu Ltd. (特機事業本部研究部), 2597 Shinomiya, Hiratsuka-shi, Kanagawa
(72) Inventor: Eiji Kawamura, c/o Cyvers Corporation, 2-8-24 Arima, Miyamae-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Masato Ogata, c/o Mitsubishi Precision Co., Ltd., 345 Kamimachiya, Kamakura-shi, Kanagawa

Claims

[Claims]

1. An object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, the device comprising: virtual visual field information setting means for setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the captured images of the plurality of imaging means, which correspond to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating the similarity among the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for taking the assumed distance at which the similarity calculated by the similarity calculating means is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and for obtaining this distance for each selected pixel.
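
The search over assumed distances described in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes ideal pinhole cameras that all look along +Z with a common focal length `f`, represents each captured image as a hypothetical callable `image(X, Y)`, measures distance as depth along the optical axis, and uses the negated sum of squared deviations as the similarity measure. All function names and the synthetic test scene are illustrative assumptions.

```python
import math


def project(point, cam_pos, f):
    """Pinhole projection of a 3-D point into a camera located at cam_pos.

    Assumption: every camera looks along +Z with axes parallel to the world axes.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    return f * dx / dz, f * dy / dz


def estimate_distance(pixel, virtual_pos, cam_positions, images, f, candidates):
    """Return (distance, similarity) for one selected pixel (i, j) of the virtual image."""
    i, j = pixel
    best_z, best_sim = None, -math.inf
    for z in candidates:
        # Point on the viewing ray of the virtual pixel at the assumed distance z.
        p = (virtual_pos[0] + z * i / f, virtual_pos[1] + z * j / f, virtual_pos[2] + z)
        # Image information at the corresponding candidate point (Xk, Yk) of each real camera.
        samples = [img(*project(p, c, f)) for img, c in zip(images, cam_positions)]
        mean = sum(samples) / len(samples)
        # Hypothetical similarity measure: negated sum of squared deviations.
        sim = -sum((s - mean) ** 2 for s in samples)
        if sim > best_sim:
            best_z, best_sim = z, sim
    return best_z, best_sim


def plane_camera(cam_pos, f, plane_z, texture):
    """Synthetic camera viewing a textured plane Z = plane_z; returns image(X, Y)."""
    def image(X, Y):
        t = plane_z - cam_pos[2]
        return texture(cam_pos[0] + t * X / f, cam_pos[1] + t * Y / f)
    return image


def texture(x, y):
    """Arbitrary non-periodic test pattern painted on the plane."""
    return math.sin(3 * x) + 0.3 * x + 0.5 * math.cos(5 * y)


# Three real cameras on a baseline, a virtual viewpoint elsewhere, plane at depth 3.
cams = [(-0.2, 0.0, 0.0), (0.0, 0.0, 0.0), (0.2, 0.0, 0.0)]
images = [plane_camera(c, 1.0, 3.0, texture) for c in cams]
best_z, best_sim = estimate_distance((0.05, -0.02), (0.1, 0.3, 0.0), cams, images,
                                     1.0, [1.0, 2.0, 3.0, 4.0, 5.0])
```

In this synthetic scene the three cameras sample the same surface point only when the assumed distance equals the true depth of the plane, so the similarity peaks there even though the virtual viewpoint coincides with none of the real cameras.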

2. The object recognition device according to claim 1, further comprising distance reliability calculating means for calculating the reliability of the distance obtained for each selected pixel in the virtual image, based on the magnitude of the similarity calculated by the similarity calculating means for each assumed distance, the rate of change of that similarity, or the rate of change of the original image.
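
One possible reading of a reliability based on the magnitude and rate of change of the similarity is sketched below. The claim does not fix a formula, so the peak-sharpness measure and the function name `distance_reliability` are assumptions made for illustration only.

```python
def distance_reliability(similarities):
    """Return (index of best assumed distance, peak similarity, peak sharpness).

    similarities: one value per assumed distance, larger meaning a better match.
    Sharpness (an assumed measure) is how far the peak rises above its neighbours.
    """
    k = max(range(len(similarities)), key=similarities.__getitem__)
    peak = similarities[k]
    neighbours = [similarities[n] for n in (k - 1, k + 1) if 0 <= n < len(similarities)]
    sharpness = (peak - sum(neighbours) / len(neighbours)) if neighbours else 0.0
    return k, peak, sharpness


sharp = distance_reliability([-5.0, -4.0, 0.0, -4.0, -5.0])
flat = distance_reliability([-1.0, -1.0, -0.9, -1.0, -1.0])
```

A sharply peaked similarity curve yields a larger sharpness value than a nearly flat one, which is one way to flag distances estimated in textureless regions as less trustworthy.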

3. The object recognition device according to claim 1, further comprising image generating means for generating, based on the distance obtained for each selected pixel in the virtual image, an image indicating the distance from the viewpoint set by the virtual visual field information setting means to each point of the object, or the reliability of that distance.
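
A distance image of this kind could, for example, be rendered as 8-bit gray levels. The sketch below assumes a linear mapping in which nearer points appear brighter. The mapping, the clamping range `z_near` to `z_far`, and the function name are illustrative choices, not taken from the patent.

```python
def distance_to_gray(distance_map, z_near, z_far):
    """Map per-pixel distances to gray levels 0..255 (near = bright, far = dark)."""
    span = z_far - z_near
    gray = []
    for row in distance_map:
        out_row = []
        for z in row:
            z = min(max(z, z_near), z_far)  # clamp distances outside the display range
            out_row.append(round(255 * (z_far - z) / span))
        gray.append(out_row)
    return gray


gray = distance_to_gray([[1.0, 2.0], [4.0, 5.0]], 1.0, 5.0)
```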

4. The object recognition device according to claim 1, further comprising image generating means for generating, based on the distance obtained for each selected pixel in the virtual image and the image information of the corresponding points for that distance, the image that would be obtained if the object were captured from the viewpoint set by the virtual visual field information setting means.

5. An object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, wherein the plurality of imaging means comprise at least two types of imaging means groups that differ in the conditions under which they image the object, the device comprising: virtual visual field information setting means for setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the captured images of each imaging means group, which correspond to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating, for each imaging means group, the similarity among the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for obtaining a fused similarity from the similarities of the respective imaging means groups calculated by the similarity calculating means, taking the assumed distance at which the fused similarity is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.
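
The fusion of per-group similarities can be sketched as follows. The claim does not state how the fused similarity is formed, so the weighted sum used here, the default equal weights, and the names `fuse_and_estimate`, `group_a`, and `group_b` are assumptions for illustration.

```python
def fuse_and_estimate(candidates, group_similarities, weights=None):
    """Fuse similarity curves from several imaging means groups and pick the best distance.

    candidates: list of assumed distances.
    group_similarities: one list per group, aligned with candidates (larger = better).
    weights: optional per-group weights; equal weighting is an assumed default.
    """
    if weights is None:
        weights = [1.0] * len(group_similarities)
    fused = [
        sum(w * sims[n] for w, sims in zip(weights, group_similarities))
        for n in range(len(candidates))
    ]
    best = max(range(len(candidates)), key=fused.__getitem__)
    return candidates[best], fused


candidates = [1.0, 2.0, 3.0]
group_a = [-3.0, -1.0, -2.0]   # e.g. one sensor type, clear peak at 2.0
group_b = [-4.0, -0.5, -0.4]   # e.g. another sensor type, ambiguous between 2.0 and 3.0
fused_distance, fused_curve = fuse_and_estimate(candidates, [group_a, group_b])
```

Here the second group alone would select 3.0 by a narrow margin, while the fused curve selects 2.0, showing how a group with a clearer peak can resolve an ambiguity in another group.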

6. An object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, wherein the plurality of imaging means comprise at least two types of imaging means groups that differ in the conditions under which they image the object, the device comprising: imaging means group selecting means for selecting, according to the imaging conditions of the object, at least one imaging means group from among the imaging means groups as the imaging means group to be actually used; virtual visual field information setting means for setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the captured images of the selected imaging means group, which correspond to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating, for the selected imaging means group, the similarity among the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for taking the assumed distance at which the similarity calculated by the similarity calculating means for the selected imaging means group is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

7. An object recognition device in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, the device comprising: imaging means selecting means for selecting, according to the imaging conditions of the object, at least two imaging means to be actually used from among the plurality of imaging means; virtual visual field information setting means for setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position and with a desired field of view according to the imaging conditions of the object; corresponding candidate point information extracting means for extracting information on the corresponding candidate points in the captured images of the selected at least two imaging means, which correspond to a selected pixel in the virtual image set by the virtual visual field information setting means, for each magnitude of an assumed distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel; similarity calculating means for calculating the similarity among the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and distance estimating means for taking the assumed distance at which the similarity calculated by the similarity calculating means is largest as the distance from the viewpoint set by the virtual visual field information setting means to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

8. An object recognition device comprising: a plurality of imaging means arranged at predetermined intervals for imaging a recognition target object; hypothetical model information setting means for assuming a group of models corresponding to the recognition target object and setting the position coordinates of each point on each model of the model group; corresponding candidate point information extracting means for extracting, for each hypothetical model selected from the model group and using the position coordinates of that hypothetical model, information on the corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images the recognition target object; similarity calculating means for calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and model estimating means for taking the hypothetical model for which the similarity calculated by the similarity calculating means is largest as the model of the point corresponding to the selected pixel, and obtaining this model for each selected pixel.

9. The object recognition device according to the above, further comprising model reliability calculating means for calculating the reliability of the model obtained for each selected pixel in the captured image of the one imaging means when one of the plurality of imaging means images the recognition target object, based on the magnitude of the similarity calculated by the similarity calculating means for each hypothetical model, the rate of change of that similarity, or the rate of change of the original image.

10. The object recognition device according to claim 8, further comprising image generating means for generating, based on the model obtained for each selected pixel in the captured image of the one imaging means when one of the plurality of imaging means images the recognition target object, an image indicating the value of the model or its reliability.

11. An object recognition device comprising: a plurality of imaging means arranged at predetermined intervals for imaging a recognition target object in a space; hypothetical model information setting means for generating a model corresponding to the recognition target object in the space and setting the position coordinates of each point on the model; corresponding candidate point information extracting means for extracting, using the position coordinates of the model, information on the corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel indicating the model in the captured image of one of the plurality of imaging means when that imaging means images the recognition target object; similarity calculating means for calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; and determining means for obtaining, for each selected pixel, the similarity calculated by the similarity calculating means and, using the similarity obtained for each selected pixel, determining that a region of selected pixels whose similarity is equal to or greater than a predetermined threshold is the model, and that a region of selected pixels whose similarity is smaller than the predetermined threshold is an object other than the model.
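
The threshold test performed by the determining means amounts to a per-pixel comparison. A minimal sketch follows, assuming the similarities are already available as a two-dimensional list; the function name and the sample values are illustrative only.

```python
def classify_by_similarity(similarity_map, threshold):
    """Return a mask: True where the pixel is judged to be the model, False otherwise."""
    return [[s >= threshold for s in row] for row in similarity_map]


mask = classify_by_similarity([[0.9, 0.2], [0.5, 0.7]], 0.5)
```

A pixel whose similarity equals the threshold is counted as the model, matching the "equal to or greater than" wording of the claim.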

12. A display device comprising: a plurality of imaging means arranged at predetermined intervals for imaging a recognition target object and a background in a space; hypothetical model information setting means for generating a model corresponding to the background in the space and setting the position coordinates of each point on the model; corresponding candidate point information extracting means for extracting, using the position coordinates of the background model, information on the corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel indicating the background in the captured image of one of the plurality of imaging means when that imaging means images the recognition target object and the background; similarity calculating means for calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted by the corresponding candidate point information extracting means; determining means for obtaining, for each selected pixel, the similarity calculated by the similarity calculating means and, using the similarity obtained for each selected pixel, determining that a region of selected pixels whose similarity is equal to or greater than a predetermined threshold is the background, and that a region of selected pixels whose similarity is smaller than the predetermined threshold is an object other than the background; pseudo background image generating means for generating a desired pseudo background image; and image display means for displaying, for the region of selected pixels determined by the determining means to be the background, the pseudo background image generated by the pseudo background image generating means, and for displaying, for the region of selected pixels determined by the determining means to be an object other than the background, that object as it is based on the image information of the selected pixels.
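
The display step can be sketched as a per-pixel composite: background pixels are replaced by the pseudo background, and all other pixels are shown unchanged. The list-of-lists image representation and the function name `replace_background` are assumptions made for this illustration.

```python
def replace_background(image, similarity_map, threshold, pseudo_background):
    """Composite an output image from the captured image and a pseudo background.

    Pixels whose similarity to the background model is >= threshold are treated as
    background and take the pseudo background value; other pixels keep their own value.
    """
    out = []
    for img_row, sim_row, bg_row in zip(image, similarity_map, pseudo_background):
        out.append([bg if sim >= threshold else px
                    for px, sim, bg in zip(img_row, sim_row, bg_row)])
    return out


shown = replace_background([[10, 20], [30, 40]],
                           [[0.9, 0.1], [0.2, 0.8]],
                           0.5,
                           [[0, 0], [0, 0]])
```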

13. The object recognition device according to claim 8, further comprising model group selecting means for sequentially selecting, from the model group, an appropriate group of models to be assumed as hypothetical models, based on the model estimation information obtained by the model estimating means or the model reliability calculating means and on other sensor information.

14. The object recognition device according to claim 8, wherein a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view is set in place of the captured image of the one imaging means; information on the corresponding candidate points in the captured images of the plurality of imaging means, which correspond to a selected pixel in the virtual image, is extracted for each hypothetical model selected from the model group using the position coordinates of that hypothetical model; and the similarity among the image information of the extracted corresponding candidate points is calculated.

15. The object recognition device according to claim 11, wherein a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view is set in place of the captured image of the one imaging means; information on the corresponding candidate points in the captured images of the plurality of imaging means, which correspond to a selected pixel in the virtual image, is extracted using the position coordinates of the model; and the similarity among the image information of the extracted corresponding candidate points is calculated.

16. An object recognition method in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, the method comprising: a virtual visual field information setting step of setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; a corresponding candidate point information extraction step of extracting information on the corresponding candidate points in the captured images of the plurality of imaging means, which correspond to a selected pixel in the virtual image set in the virtual visual field information setting step, for each magnitude of an assumed distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel; a similarity calculation step of calculating the similarity among the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a distance estimation step of taking the assumed distance at which the similarity calculated in the similarity calculation step is largest as the distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

17. The object recognition method according to claim 16, further comprising a distance reliability calculation step of calculating the reliability of the distance obtained for each selected pixel in the virtual image, based on the magnitude of the similarity calculated in the similarity calculation step for each assumed distance, the rate of change of that similarity, or the rate of change of the original image.

18. The object recognition method according to claim 16, further comprising an image generation step of generating, based on the distance obtained for each selected pixel in the virtual image, an image indicating the distance from the viewpoint set in the virtual visual field information setting step to each point of the object, or the reliability of that distance.

19. The object recognition method according to claim 16, further comprising an image generation step of generating, based on the distance obtained for each selected pixel in the virtual image and the image information of the corresponding points for that distance, the image that would be obtained if the object were captured from the viewpoint set in the virtual visual field information setting step.

20. An object recognition method in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, wherein the plurality of imaging means comprise at least two types of imaging means groups that differ in the conditions under which they image the object, the method comprising: a virtual visual field information setting step of setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; a corresponding candidate point information extraction step of extracting information on the corresponding candidate points in the captured images of each imaging means group, which correspond to a selected pixel in the virtual image set in the virtual visual field information setting step, for each magnitude of an assumed distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel; a similarity calculation step of calculating, for each imaging means group, the similarity among the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a distance estimation step of obtaining a fused similarity from the similarities of the respective imaging means groups calculated in the similarity calculation step, taking the assumed distance at which the fused similarity is largest as the distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

21. An object recognition method in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, wherein the plurality of imaging means comprise at least two types of imaging means groups that differ in the conditions under which they image the object, the method comprising: an imaging means group selection step of selecting, according to the imaging conditions of the object, at least one imaging means group from among the imaging means groups as the imaging means group to be actually used; a virtual visual field information setting step of setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position with a desired field of view; a corresponding candidate point information extraction step of extracting information on the corresponding candidate points in the captured images of the selected imaging means group, which correspond to a selected pixel in the virtual image set in the virtual visual field information setting step, for each magnitude of an assumed distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel; a similarity calculation step of calculating, for the selected imaging means group, the similarity among the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a distance estimation step of taking the assumed distance at which the similarity calculated in the similarity calculation step for the selected imaging means group is largest as the distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

22. An object recognition method in which a plurality of imaging means are arranged at predetermined intervals; information on corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images a target object, is extracted for each magnitude of an assumed distance from the one imaging means to the point on the object corresponding to the selected pixel; the similarity between the image information of the selected pixel and the image information of the corresponding candidate points is calculated; the assumed distance at which the calculated similarity is largest is taken as the distance from the one imaging means to the point on the object corresponding to the selected pixel; and the object is recognized based on the distance obtained for each selected pixel, the method comprising: an imaging means selection step of selecting, according to the imaging conditions of the object, at least two imaging means to be actually used from among the plurality of imaging means; a virtual visual field information setting step of setting, in place of the captured image of the one imaging means, a virtual captured image that a virtual imaging means would obtain when imaging the object at a desired position and with a desired field of view according to the imaging conditions of the object; a corresponding candidate point information extraction step of extracting information on the corresponding candidate points in the captured images of the selected at least two imaging means, which correspond to a selected pixel in the virtual image set in the virtual visual field information setting step, for each magnitude of an assumed distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel; a similarity calculation step of calculating the similarity among the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a distance estimation step of taking the assumed distance at which the similarity calculated in the similarity calculation step is largest as the distance from the viewpoint set in the virtual visual field information setting step to the point on the object corresponding to the selected pixel, and obtaining this distance for each selected pixel.

23. An object recognition method comprising: an imaging step of imaging a recognition target object with a plurality of imaging means arranged at predetermined intervals; a hypothetical model information setting step of assuming a group of models corresponding to the recognition target object and setting the position coordinates of each point on each model of the model group; a corresponding candidate point information extraction step of extracting, for each hypothetical model selected from the model group and using the position coordinates of that hypothetical model, information on the corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel in the captured image of one of the plurality of imaging means when that imaging means images the recognition target object; a similarity calculation step of calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a model estimation step of taking the hypothetical model for which the similarity calculated in the similarity calculation step is largest as the model of the point corresponding to the selected pixel, and obtaining this model for each selected pixel.

24. The object recognition method according to claim 23, further comprising a model reliability calculation step of calculating the reliability of the model obtained for each selected pixel in the captured image of the one imaging means when one of the plurality of imaging means images the recognition target object, based on the magnitude of the similarity calculated in the similarity calculation step for each hypothetical model, the rate of change of that similarity, the rate of change of the original image, or the like.

25. The object recognition method according to claim 24 or claim 25, further comprising an image generation step of generating, based on the model obtained for each selected pixel in the captured image of the one imaging means when one of the plurality of imaging means images the recognition target object, an image indicating the value of the model or its reliability.

26. An object recognition method comprising: an imaging step of imaging one recognition target object in a space with a plurality of imaging means arranged at predetermined intervals; a hypothetical model information setting step of generating a model corresponding to the one recognition target object in the space and setting the position coordinates of each point on the model; a corresponding candidate point information extraction step of extracting, using the position coordinates of the model, information on the corresponding candidate points in the captured images of the other imaging means, which correspond to a selected pixel indicating the model in the captured image of one of the plurality of imaging means when that imaging means images the recognition target object; a similarity calculation step of calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted in the corresponding candidate point information extraction step; and a determination step of obtaining, for each selected pixel, the similarity calculated in the similarity calculation step and, using the similarity obtained for each selected pixel, determining that a region of selected pixels whose similarity is equal to or greater than a predetermined threshold is the recognition target object, and that a region of selected pixels whose similarity is smaller than the predetermined threshold is an object other than the recognition target object.

27. A display method comprising: an imaging step of imaging a recognition target object and a background in a space by a plurality of imaging means arranged at predetermined intervals; a hypothetical model information setting step of generating a model corresponding to the background in the space and setting the position coordinates of each point on the model; a corresponding candidate point information extracting step of extracting, using the position coordinates of the background model, information on the corresponding candidate points in the captured images of the other imaging means that correspond to a selected pixel indicating the background in the captured image of one imaging means, that captured image being obtained when the one of the plurality of imaging means images the recognition target object and the background; a similarity calculation step of calculating the similarity between the image information of the selected pixel and the image information of the corresponding candidate points extracted in the corresponding candidate point information extracting step; a determination step of obtaining, for each selected pixel, the similarity calculated in the similarity calculation step and, using the similarity obtained for each selected pixel, determining that a region of selected pixels whose similarity is equal to or greater than a predetermined threshold is the background and that a region of selected pixels whose similarity is smaller than the predetermined threshold is an object other than the background; a pseudo background image generation step of generating a desired pseudo background image; and an image display step of displaying, for a region of selected pixels determined to be the background in the determination step, the pseudo background image generated in the pseudo background image generation step and, for a region of selected pixels determined to be an object other than the background in the determination step, displaying the object other than the background as it is, based on the image information of the selected pixels.
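
The image display step of claim 27 amounts to a per-pixel composite. The following is a minimal sketch under the assumption that the determination result is already available as a boolean mask and that images are nested lists of equal size; the function name `composite_display` and its parameters are hypothetical.

```python
def composite_display(captured, is_background, pseudo_background):
    """Build the displayed image from the determination result.

    Pixels judged to be background are replaced by the pseudo background
    image; all other pixels keep the captured image information as it is.
    """
    return [
        [
            pseudo_background[y][x] if is_background[y][x] else captured[y][x]
            for x in range(len(row))
        ]
        for y, row in enumerate(captured)
    ]
```

Because the replacement is decided pixel by pixel, objects in front of the background keep their captured appearance while the scene behind them is swapped out.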

28. The object recognition method according to claim 23, further comprising a model group selection step of sequentially selecting, from the model group, an appropriate group of models to serve as hypothetical models, based on the model estimation information obtained in the model estimation step or the model reliability calculation step and on other sensor information.

29. The object recognition method according to claim 23, wherein a virtual image, which would be captured by a virtual imaging means if the object were imaged at a desired position with a desired field of view, is set in place of the captured image of the one imaging means; information on the corresponding candidate points in the captured images of the plurality of imaging means that correspond to a selected pixel in the virtual image is extracted, for each hypothetical model selected from the model group, using the position coordinates of that hypothetical model; and the similarity between the image information of the extracted corresponding candidate points is calculated.

30. The object recognition method according to claim 26, wherein a virtual image, which would be captured by a virtual imaging means if the object were imaged at a desired position with a desired field of view, is set in place of the captured image of the one imaging means; information on the corresponding candidate points in the captured images of the plurality of imaging means that correspond to a selected pixel in the virtual image is extracted using the position coordinates of the model; and the similarity between the image information of the extracted corresponding candidate points is calculated.
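
The virtual-image variant of claims 29 and 30 can be pictured as a back-projection followed by a projection into each real camera. The sketch below is an illustration only. It assumes ideal pinhole cameras that share one focal length and principal point and differ from the virtual imaging means by a translation alone (no rotation, no lens distortion). The names `backproject` and `project` and all parameters are hypothetical.

```python
def backproject(i, j, depth, focal_px, cx, cy):
    """3D point seen at pixel (i, j) of the virtual image.

    Assumption: the virtual camera sits at the origin and looks along +Z;
    `depth` is the Z coordinate that the model assigns to the pixel.
    """
    return ((i - cx) * depth / focal_px, (j - cy) * depth / focal_px, depth)


def project(point, cam_pos, focal_px, cx, cy):
    """Pixel position of `point` in a real camera located at `cam_pos`.

    Assumption: the real camera has the same orientation as the virtual
    camera, so only the translation has to be removed.
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def candidate_points(i, j, depth, cam_positions, focal_px, cx, cy):
    """Corresponding candidate points (X_k, Y_k), one per real camera."""
    point = backproject(i, j, depth, focal_px, cx, cy)
    return [project(point, pos, focal_px, cx, cy) for pos in cam_positions]
```

Since the virtual image itself carries no image information, the similarity in this variant is computed among the candidate points of the real cameras, as the claims state.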