JP2003022442A

JP2003022442A - Method and device for object detection and position measurement, execution program for the same method, and its recording medium

Info

Publication number: JP2003022442A
Application number: JP2001208920A
Authority: JP
Inventors: Takahito Kawanishi; 隆仁川西; Hiroshi Murase; 洋村瀬; Shigeru Takagi; 茂高木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2001-07-10
Filing date: 2001-07-10
Publication date: 2003-01-24

Abstract

PROBLEM TO BE SOLVED: To detect an object and measure its position robustly and at high speed with improved computational efficiency. SOLUTION: An object feature quantity learning means 1 learns object features and feature quantity distributions from a set of reference images captured at different distances, orientations, and so on. A pan/tilt/zoom parameter computing means 2 computes the camera parameters needed for the search from the size of the feature quantity distribution. A camera control and image input means 3 captures an image with camera parameters that have not yet been searched and generates a feature quantity image. An input image feature quantity distribution computing means 4 computes the feature quantity distribution in a window of interest set in the feature quantity image. A feature quantity collating means 5 computes the similarity value between the object feature quantity distribution and the feature quantity distribution in the window of interest to detect the object. A collation-omitted area computing means 6 computes the areas for which collation can be omitted, across the many other object feature quantities similar to the object feature and the many windows of interest near the current window of interest. An object position computing means 7 computes the object position from the direction in which the object was detected.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a method and an apparatus for object detection, in which a region similar to a previously registered object is searched for in a three-dimensional real environment, and for measuring the position of the detected object. It can be used, for example, as the eyes of a pet robot or as a non-contact sensor for information appliances. More specifically, it relates to sign and target recognition by mobile robots, indoor monitoring systems, product management systems, and the like.

[0002]

[Prior Art] Conventionally, for high-speed object detection, a known approach combines one of two search methods with the basic camera control framework of "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336; prior application 2). The two search methods are a method that searches at high speed using colors quantized at equal intervals, as in "Object detection device" (Japanese Patent Application No. 8-146857; prior application 1), and a method that searches robustly against illumination variation by template matching based on normalized cross-correlation.

[0003]

[Problems to Be Solved by the Invention] However, these conventional methods have drawbacks. Prior application 1 cannot search with sufficient accuracy in a three-dimensional real environment with large variations. Template matching based on normalized cross-correlation has an enormous matching cost and is also sensitive to changes in orientation, so many reference images are needed to cope with such changes and variations, and the matching cost grows in proportion to the number of reference images. Even when combined with prior application 2, "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336), the search therefore takes an enormous amount of time.

[0004] An object of the present invention is to provide a method and an apparatus that perform object detection and position measurement with better computational efficiency than known methods, and that do so more robustly and in far less time.

[0005]

[Means for Solving the Problems] To solve the above problems, one aspect of the present invention is an object detection / position measurement method that uses an active camera with pan, tilt, and zoom functions to detect an object similar to a previously registered object and to obtain the three-dimensional position of that object, the method comprising:

an object feature amount learning process of learning object feature amounts from a set of many reference images of the object captured under varying shooting conditions such as distance, orientation, and illumination, thereby learning the object features and the feature amount distributions;

a pan/tilt/zoom parameter calculation process of calculating, from the on-image size of the feature amount distributions obtained in the object feature amount learning process, the pan, tilt, and zoom camera parameters needed to search the entire field of view of the camera;

a camera control / image input process of setting the camera to camera parameters that have not yet been searched, among the camera parameter information obtained in the pan/tilt/zoom parameter calculation process, capturing an image, and generating a feature amount image in which each pixel of the captured input image is converted into the feature amount obtained in the object feature amount learning process;

an input feature amount distribution calculation process of setting, in the feature amount image obtained in the camera control / image input process, a window of interest corresponding to the on-image size of the object feature amount distribution to be matched, and calculating the feature amount distribution within the window of interest;

a feature amount distribution matching process of calculating a similarity value or a distance value between the object feature amount distribution obtained in the object feature amount learning process and the feature amount distribution within the window of interest set in the input feature amount distribution calculation process, and determining the presence or absence of the object to detect it;

a matching omission area calculation process of calculating, based on the similarity value or distance value calculated in the feature amount distribution matching process, the areas for which matching can be omitted between the many other object feature amounts similar to the object feature and the many windows of interest around the window of interest; and

an object position calculation process of calculating the object position from the detection direction when an object is detected in the feature amount distribution matching process.

[0006] Alternatively, in the above object detection / position measurement method, the object feature amount learning process uses object feature amounts based on color codes obtained by vector quantization of color information.

[0007] Alternatively, in the above object detection / position measurement method, the object feature amount learning process uses, as the object feature amount, color codes obtained by vector quantization restricted to the colors contained in the object.

[0008] Alternatively, in the above object detection / position measurement method, a histogram of the object features is used as the object feature amount distribution in the object feature amount learning process.

[0009] Alternatively, in the feature amount distribution matching process, those object feature amount distributions learned in the object feature amount learning process that produce false detections or missed detections, and whose matches therefore cannot be treated as a detection of the object, are used as "candidate selection" object feature amount distributions, and in the camera control / image input process the camera is pointed preferentially at regions that the "candidate selection" object feature amounts evaluate as containing the object.

[0010] Alternatively, in the above object detection / position measurement method, the "candidate selection" object feature amounts are evaluated using a similarity value or a distance value computed with a composite distribution.

[0011] Alternatively, in the feature amount distribution matching process, an upper limit of the similarity value, or a lower limit of the distance value, is calculated for the local partial regions surrounding the local partial region used in a similarity or distance calculation, from the result of that calculation, and the similarity calculation is omitted for regions whose bound does not reach the threshold.
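The bound in [0011] can be illustrated with a minimal sketch. It assumes that the similarity is the histogram overlap ratio (the measure named later in the embodiment) and that all windows contain the same number of pixels. Under those assumptions, shifting a window so that `new_pixels` of its pixels lie outside the already evaluated window can raise the overlapping pixel count by at most `new_pixels`. The function names and this particular bound are illustrative assumptions, not text from the patent.

```python
def neighbor_upper_bound(similarity, window_pixels, new_pixels):
    """Upper bound on the overlap ratio of a shifted window.

    similarity    : overlap ratio already computed for the current window
    window_pixels : number of pixels in each window (assumed equal)
    new_pixels    : pixels of the shifted window lying outside the current one
    """
    matched = similarity * window_pixels      # overlapping pixel count so far
    return min(1.0, (matched + new_pixels) / window_pixels)


def can_skip(similarity, window_pixels, new_pixels, threshold):
    """True when the shifted window cannot reach the detection threshold."""
    bound = neighbor_upper_bound(similarity, window_pixels, new_pixels)
    return bound < threshold


# A 10x10 window with similarity 0.30, shifted sideways by 2 columns, gains
# 20 new pixels, so the shifted window's similarity is at most about 0.50.
bound = neighbor_upper_bound(0.30, 100, 20)
skip = can_skip(0.30, 100, 20, threshold=0.8)
```

With a detection threshold of 0.8, every window within that shift can be skipped without computing its histogram.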

[0012] Alternatively, in the object feature amount learning process, the similarity values or distance values between the object feature amount distributions are calculated in advance, and in the feature amount distribution matching process, an upper limit of the similarity value, or a lower limit of the distance value, is calculated for object feature amounts similar to the object feature amount used in a similarity or distance calculation, from the result of that calculation and the precomputed values between object feature amount distributions, and the similarity or distance calculation is omitted for object feature amounts whose bound does not reach the threshold.
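One way to realise the idea in [0012] is sketched below. It assumes a normalized L1 histogram distance, which equals one minus the overlap ratio when both histograms hold the same number of pixels and which satisfies the triangle inequality. The distance choice, the function names, and the loop structure are assumptions for illustration only.

```python
def hist_distance(hist_a, hist_b):
    """Normalized L1 distance between two histograms of equal pixel count."""
    total = sum(hist_a)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / (2 * total)


def precompute_reference_distances(references):
    """Learning stage: distances between every pair of reference histograms."""
    n = len(references)
    return [[hist_distance(references[i], references[j]) for j in range(n)]
            for i in range(n)]


def match_with_skipping(window_hist, references, ref_dist, max_distance):
    """Return (indices of matching references, number of distance evaluations)."""
    matches, evaluated = [], 0
    lower = [0.0] * len(references)       # best known lower bound per reference
    for i, ref in enumerate(references):
        if lower[i] > max_distance:
            continue                      # cannot match, so skip the calculation
        d = hist_distance(window_hist, ref)
        evaluated += 1
        if d <= max_distance:
            matches.append(i)
        # Triangle inequality: d(window, ref_j) >= |d - d(ref_i, ref_j)|.
        for j in range(i + 1, len(references)):
            lower[j] = max(lower[j], abs(d - ref_dist[i][j]))
    return matches, evaluated
```

When a window is far from one reference, every reference close to that one inherits a large lower bound and is skipped, which is why adding many similar reference images costs little extra matching time.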

[0013] Alternatively, in the object feature amount learning process, a composite distribution that combines a plurality of object feature amount distributions into one is calculated in advance, and in the feature amount distribution matching process, an upper limit of the similarity value, or a lower limit of the distance value, of the individual object feature amount distributions is calculated from the similarity or distance value between the composite distribution and a local partial region of the input image, and the similarity calculation is omitted for object feature amounts whose bound does not reach the threshold.
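The following sketch shows one possible composite distribution for [0013]. It assumes the composite is the bin-wise maximum of the member histograms and that similarity is the overlap count divided by the window pixel count. With that choice the composite dominates every member bin by bin, so its similarity to a window is an upper bound on each member's similarity. How the patent actually builds the composite is described elsewhere in the specification, so treat this construction as an assumption.

```python
def overlap_count(hist_a, hist_b):
    """Number of overlapping pixels: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def composite_histogram(references):
    """Bin-wise maximum of the reference histograms (an assumed construction)."""
    return [max(bins) for bins in zip(*references)]


def match_group(window_hist, references, composite, threshold):
    """Return indices of references whose overlap ratio reaches the threshold."""
    total = sum(window_hist)
    # One comparison against the composite bounds every member from above.
    if overlap_count(window_hist, composite) / total < threshold:
        return []                     # the whole group is rejected at once
    return [i for i, ref in enumerate(references)
            if overlap_count(window_hist, ref) / total >= threshold]


references = [[8, 2, 0], [6, 4, 0]]
composite = composite_histogram(references)
rejected = match_group([0, 1, 9], references, composite, threshold=0.7)
accepted = match_group([7, 3, 0], references, composite, threshold=0.7)
```

A window that fails against the composite costs one histogram comparison instead of one per reference image.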

[0014] Alternatively, in the object position calculation process, the object position is calculated from the direction of the object detected by a single camera and the known size of the object.
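A minimal sketch of the single-camera case in [0014], assuming a pinhole camera model, a focal length expressed in pixels (which changes with zoom), pan measured about the vertical z axis, and tilt measured as elevation. These conventions and all function names are assumptions made for the example.

```python
import math


def direction_from_pan_tilt(pan_deg, tilt_deg):
    """Unit viewing direction; pan about the z axis, tilt as elevation."""
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(tilt) * math.cos(pan),
            math.cos(tilt) * math.sin(pan),
            math.sin(tilt))


def distance_from_known_size(real_size_m, image_size_px, focal_length_px):
    """Pinhole model: image_size / focal_length = real_size / distance."""
    return focal_length_px * real_size_m / image_size_px


def object_position(camera_pos, pan_deg, tilt_deg,
                    real_size_m, image_size_px, focal_length_px):
    """Camera position plus the estimated distance along the detection direction."""
    dist = distance_from_known_size(real_size_m, image_size_px, focal_length_px)
    direction = direction_from_pan_tilt(pan_deg, tilt_deg)
    return tuple(c + dist * u for c, u in zip(camera_pos, direction))


# A 0.2 m wide object spanning 50 px at a focal length of 500 px lies
# about 2 m from the camera along the viewing direction.
position = object_position((0.0, 0.0, 1.5), 90.0, 0.0, 0.2, 50.0, 500.0)
```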

[0015] Alternatively, in the object position calculation process, the object position is calculated from the directions of the object detected by a plurality of cameras.
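For the multi-camera case in [0015], a common approach is to intersect the viewing rays from two cameras. Because measured rays rarely meet exactly, the sketch below returns the midpoint of the shortest segment between them. This is a standard geometric construction offered as an assumption; the patent text here does not specify the triangulation formula.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate(origin_a, dir_a, origin_b, dir_b, eps=1e-12):
    """Midpoint of the shortest segment between two viewing rays.

    Returns None when the rays are (nearly) parallel.
    """
    w = tuple(p - q for p, q in zip(origin_a, origin_b))
    a, b, c = dot(dir_a, dir_a), dot(dir_a, dir_b), dot(dir_b, dir_b)
    d, e = dot(dir_a, w), dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    t_a = (b * e - c * d) / denom     # parameter along ray A
    t_b = (a * e - b * d) / denom     # parameter along ray B
    point_a = tuple(p + t_a * u for p, u in zip(origin_a, dir_a))
    point_b = tuple(p + t_b * u for p, u in zip(origin_b, dir_b))
    return tuple((x + y) / 2 for x, y in zip(point_a, point_b))


# Two cameras 4 m apart, each seeing the object at 45 degrees inward.
position = triangulate((0, 0, 0), (1, 1, 0), (4, 0, 0), (-1, 1, 0))
```

A negative ray parameter would mean the closest point lies behind a camera; a fuller implementation would reject such results.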

[0016] Alternatively, the invention is an object detection / position measurement apparatus that uses an active camera with pan, tilt, and zoom functions to detect an object similar to a previously registered object and to obtain the three-dimensional position of that object, the apparatus comprising:

object feature amount learning means for learning object feature amounts from a set of many reference images of the object captured under varying shooting conditions such as distance, orientation, and illumination, thereby learning the object features and the feature amount distributions;

pan/tilt/zoom parameter calculation means for calculating, from the on-image size of the feature amount distributions obtained by the object feature amount learning means, the pan, tilt, and zoom camera parameters needed to search the entire field of view of the camera;

camera control / image input means for setting the camera to camera parameters that have not yet been searched, among the camera parameter information obtained by the pan/tilt/zoom parameter calculation means, capturing an image, and generating a feature amount image in which each pixel of the captured input image is converted into the feature amount obtained by the object feature amount learning means;

input feature amount distribution calculation means for setting, in the feature amount image obtained by the camera control / image input means, a window of interest corresponding to the on-image size of the object feature amount distribution to be matched, and calculating the feature amount distribution within the window of interest;

feature amount distribution matching means for calculating a similarity value or a distance value between the object feature amount distribution obtained by the object feature amount learning means and the feature amount distribution within the window of interest set by the input feature amount distribution calculation means, and determining the presence or absence of the object to detect it;

matching omission area calculation means for calculating, based on the similarity value or distance value calculated by the feature amount distribution matching means, the areas for which matching can be omitted between the many other object feature amounts similar to the object feature and the many windows of interest around the window of interest; and

object position calculation means for calculating the object position from the detection direction when the feature amount distribution matching means detects an object.

[0017] Alternatively, the invention is an execution program for the object detection / position measurement method, in which the procedures of the above object detection / position measurement method are implemented as a program to be executed by a computer.

[0018] Alternatively, the invention is a recording medium storing the execution program for the object detection / position measurement method, in which the procedures of the above object detection / position measurement method are implemented as a program to be executed by a computer and the execution program is recorded on a recording medium readable by the computer.

[0019] Compared with methods that combine "Object detection device" (Japanese Patent Application No. 8-146857) or template matching with "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336), the present invention uses more accurate features (fewer missed detections and fewer false detections) in the object feature amount learning process / means, and is robust to the variations in object feature amounts caused by the changes in illumination and camera parameters expected in a three-dimensional real environment. Furthermore, feature amount distributions such as histograms can be computed for these feature amounts, so the matching search against many reference images of the object can be sped up greatly in the feature amount distribution matching process / means while accuracy is preserved (claims 5 to 9). In particular, in the three-dimensional environment search addressed by the present invention, the object feature distribution varies widely with differences in orientation, focus, and the like, so the composite feature used in claim 9 is effective. In addition, claim 5 reduces the number of camera control operations by doing more image searching. The time for that image search can in turn be reduced substantially by the method of claim 6 with almost no loss of accuracy, so the overall search time is greatly reduced. These are the main points of the present invention.

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0021] FIG. 1 is a block diagram showing an embodiment of an object detection and position measurement apparatus to which the present invention is applied. This embodiment uses the options expected to be comparatively effective: color codes obtained by vector quantization restricted to the colors contained in the object serve as the object feature amount, and a histogram of those color codes serves as the object feature amount distribution. The pan/tilt/zoom parameters that control the camera are obtained by the method shown in the embodiment of "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336). The overlap ratio of the feature amount distributions being matched is used as the similarity measure for feature amount distribution matching. For camera control, the search with "detection" object feature amount distributions, for which a match is treated as the object being present, combines the matching omission method based on similarity values between object feature distributions with the matching omission method for the regions surrounding a local partial region of the input image shown in the embodiment of "Object detection device" (Japanese Patent Application No. 8-146857). The search with "candidate selection" object feature amount distributions combines matching omission based on the composite distribution with the method shown in the embodiment of "Object detection device" (Japanese Patent Application No. 8-146857). Object position calculation uses the method of calculating the object position from the detection results of a plurality of cameras.
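The overlap ratio used as the similarity measure can be sketched as follows, assuming it is the usual histogram intersection: the sum of bin-wise minima divided by the number of pixels in the reference histogram. The function names and the example values are illustrative assumptions.

```python
def overlap_ratio(reference_hist, window_hist):
    """Sum of bin-wise minima divided by the reference pixel count."""
    total = sum(reference_hist)
    if total == 0:
        return 0.0
    overlap = sum(min(r, w) for r, w in zip(reference_hist, window_hist))
    return overlap / total


def is_detected(reference_hist, window_hist, theta):
    """The object is declared present when the similarity reaches theta."""
    return overlap_ratio(reference_hist, window_hist) >= theta


reference = [40, 30, 30]      # color-code counts in a reference image
window = [35, 45, 20]         # color-code counts in the window of interest
similarity = overlap_ratio(reference, window)   # (35 + 30 + 20) / 100
```

Because each bin contributes at most its smaller count, the value lies between 0 and 1 and reaches 1 only when the window contains at least as many pixels of every code as the reference.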

[0022] The object detection and position measurement apparatus shown in FIG. 1 comprises object feature amount learning means 1, pan/tilt/zoom parameter calculation means 2, camera control / image input means 3, input image feature amount distribution calculation means 4, feature amount distribution matching means 5, matching omission area calculation means 6, and object position calculation means 7. Its inputs are the reference images, that is, images of the sample object captured under many illumination conditions, orientations, and scales, together with the initial position, direction, viewing angle, and so on of the camera. Its output is the direction from the camera, and the three-dimensional position, of any object captured by the camera whose similarity to the object feature amount distribution of a reference image is at or above a preset threshold θ.

[0023] The object feature amount learning means 1 learns object feature amounts from a set of reference images of the object captured under varying shooting conditions such as distance, orientation, and illumination, and thereby learns the object features and the feature amount distributions.

[0024] The pan/tilt/zoom parameter calculation means 2 calculates, from the on-image size of the feature amount distributions obtained from the object feature amount learning means 1, the pan, tilt, and zoom positions needed to search the entire field of view of the camera.

[0025] The camera control / image input means 3 sets the camera to camera parameters that have not yet been searched, among the camera parameter information obtained by the pan/tilt/zoom parameter calculation means 2, captures an image, and generates a feature amount image in which each pixel of the captured input image is converted into the feature amount obtained by the object feature amount learning means 1.

[0026] The input image feature amount distribution calculation means 4 sets, in the feature amount image obtained by the camera control / image input means 3, a window of interest corresponding to the on-image size of the object feature amount distribution to be matched, and calculates the feature amount distribution inside the window of interest.

[0027] The feature amount distribution matching means 5 calculates the similarity value between the object feature amount distribution obtained by the object feature amount learning means 1 and the feature amount distribution within the window of interest set by the input image feature amount distribution calculation means 4, and determines the presence or absence of the object.

[0028] The matching omission area calculation means 6 calculates, based on the similarity value calculated by the feature amount distribution matching means 5, the areas for which matching can be omitted between the many other object feature amounts similar to the object feature and the many windows of interest around the window of interest.

[0029] The object position calculation means 7 calculates the object position from the detection direction when the feature amount distribution matching means 5 detects an object.

[0030] Next, the processing in the object feature amount learning means 1 through the object position calculation means 7 of the configuration in FIG. 1 described above is explained concretely with reference to the flowchart of FIG. 2. S11 to S20 in FIG. 2 are the steps of that procedure.

[0031] In the object feature amount learning process of step S11, the object feature amount learning means 1 first photographs the object placed at various positions and in various orientations while changing the zoom value. For a room of about 30 m², photographing at intervals of about 2 m is sufficient, and for the objects shown in (1) to (3) of FIG. 3, photographing from three directions (front, left, and right) was sufficient. Even if more images are captured, the processing time hardly changes, thanks to the method of omitting matching between object feature amounts described later.

[0032] Next, the object region is cut out from each captured image. Each cut-out image is a reference image. The region is cut from the interior of the object so that background features are not included. The technique described in this invention applies readily to regions of arbitrary shape, but rectangular regions are used here to keep the explanation simple. For vector quantization, it is also possible to find the exact representative vector by computing the distance to every representative vector each time, but here speed is prioritized and the encoding-table method described below is adopted.

[0033] The encoding table records, for every small cell obtained by dividing each RGB axis into 32 (32³ cells), the VQ code to which that cell belongs, so that a color is mapped to its VQ code in constant time. When the goal is object search, VQ only needs to cover the colors contained in the object, so introducing a special code 0 for colors not contained in the object lets codes be assigned efficiently to the colors that are. This reduces the number of codes, which speeds up the histogram matching operation. It also allows regions where the object does not appear to be pruned quickly. The procedure is detailed below.

[0034] Step 1. Compute the mean color and the variance of all pixels. Take this mean color as representative color C_1 and assign all pixels to C_1.

[0035] Step 2. Select the representative color C_i with the largest variance. If its variance α_i < σ, go to step 4.

[0036] Step 3. Split C_i into two and determine the optimal representative colors with the well-known LBG algorithm. Compute the distances between all pixels and all representative colors and assign each pixel to the nearest representative color. For each representative color, compute the mean color and the variance α of the pixels assigned to it. Return to step 2.

[0037] Step 4. For every color P_i in the color space, find the nearest representative color C_j. If |P_i − C_j| < n α_j, assign the code of C_j to P_i; otherwise, assign code 0 to P_i.
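Steps 1 to 4 can be sketched as below. Several choices are assumptions made only for this illustration: α is taken as the RMS distance of a cluster's pixels from its representative color (the text calls it a variance), a cluster is split by perturbing its center along the color channel with the largest spread, LBG refinement runs for a fixed number of iterations, and each table cell is represented by its center color.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_color(pixels):
    n = len(pixels)
    return tuple(sum(p[k] for p in pixels) / n for k in range(3))


def spread(pixels, center):
    """RMS distance of the pixels from their representative color (alpha)."""
    return math.sqrt(sum(sq_dist(p, center) for p in pixels) / len(pixels))


def assign(pixels, centers):
    """Assign every pixel to its nearest representative color."""
    groups = [[] for _ in centers]
    for p in pixels:
        j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        groups[j].append(p)
    return groups


def split_center(group, center, eps=1.0):
    """Perturb the center along the channel with the largest spread."""
    ch = max(range(3), key=lambda k: sum((p[k] - center[k]) ** 2 for p in group))
    lo, hi = list(center), list(center)
    lo[ch] -= eps
    hi[ch] += eps
    return tuple(lo), tuple(hi)


def learn_representative_colors(pixels, sigma, lbg_iters=10):
    groups = [list(pixels)]                        # step 1
    centers = [mean_color(pixels)]
    while True:
        spreads = [spread(g, c) for g, c in zip(groups, centers)]
        i = max(range(len(centers)), key=lambda k: spreads[k])    # step 2
        if spreads[i] < sigma:
            return centers, spreads
        before = len(centers)
        centers[i:i + 1] = split_center(groups[i], centers[i])    # step 3
        for _ in range(lbg_iters):                 # LBG refinement
            groups = [g for g in assign(pixels, centers) if g]
            centers = [mean_color(g) for g in groups]
        if len(centers) <= before:                 # split failed, so stop
            return centers, [spread(g, c) for g, c in zip(groups, centers)]


def build_encoding_table(centers, spreads, n=2.5, bins=32):
    """Step 4: map every color cell to a code; 0 means 'not an object color'."""
    step = 256 // bins
    table = [0] * bins ** 3
    for r in range(bins):
        for g in range(bins):
            for b in range(bins):
                cell = tuple(v * step + step / 2 for v in (r, g, b))
                j = min(range(len(centers)), key=lambda k: sq_dist(cell, centers[k]))
                if math.sqrt(sq_dist(cell, centers[j])) < n * spreads[j]:
                    table[(r * bins + g) * bins + b] = j + 1
    return table
```

Codes start at 1 so that 0 stays reserved for colors outside the object, matching the special code described in [0033].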

[0038] The value n is set empirically to about 2 to 3 so that the proportion of the training pixels that fall under code 0 stays below a fixed level. If n is too small, the proportion of code 0 increases; if it is too large, the nonzero codes include more colors that do not resemble the object's colors. Either case increases erroneous detections. As for σ, a small value makes the codes sensitive to variations in illumination and orientation, while a large value increases erroneous detections. It is therefore chosen according to the color distribution of the object.

[0039] For each reference image, the object feature amount learning means 1 color-codes every pixel of the image, generates a histogram counting the occurrences of each code, and sends the histogram to the feature amount distribution matching means 5. In addition, the size of the reference image region that is sufficient for detection is determined empirically for each object (100 to 1000 pixels) and sent to the pan/tilt/zoom parameter calculation means 2. The resulting encoding table is sent to the camera control / image input means 3.

【００４０】When the matching omission area is calculated using the similarity between reference image histograms or a histogram that combines the individual histograms, the similarity values and the composite histogram are computed in advance by the object feature amount learning means 1 and sent to the matching omission area calculation means 6 (dotted-line flow in FIG. 1 (claims 8 and 9)). The calculation of the similarity value is described under the feature amount distribution matching means 5, and the calculation of the composite histogram under the matching omission area calculation means 6.

【００４１】In the pan/tilt/zoom parameter calculation process of step S12, the pan/tilt/zoom parameter calculation means 2 sets the pan, tilt, and zoom intervals of the camera according to the object size stored by the object feature amount learning means 1. Specifically, it uses parameters that guarantee accuracy while covering the entire field of view in as few shots as possible. For this purpose it uses a known method, "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336). In that method, the larger the ratio of the maximum object area size to the image size, the fewer zoom steps are needed; on the other hand, larger overlap regions are then required, which increases the number of pan/tilt/zoom operations. The method therefore derives, from the minimum object area size, the maximum object area size that minimizes the number of pan/tilt/zoom operations.

【００４２】First, the minimum size of the object feature amount distribution at which accuracy sufficient for detection can be guaranteed is obtained as the output of the object feature amount learning means 1. This size is supplied to the method of "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336) to obtain all pan/tilt/zoom parameters needed to survey the entire field of view of the camera (from near to far). The pan/tilt/zoom parameter calculation means 2 stores all parameters calculated in this way as a list and sends the list to the camera control/image input means 3.

【００４３】In the camera control/image input process of step S13, the camera control/image input means 3 moves the camera to the parameters at the head of the camera parameter list obtained by the pan/tilt/zoom parameter calculation means 2. Once the camera has finished moving, it captures an image to obtain the input image. It then converts each pixel of the input image with the encoding table obtained by the object feature amount learning means 1 to generate a feature amount image, and sends it to the input image feature amount distribution calculation means 4.
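The conversion of an input image into a feature amount image is a per-pixel table lookup. A minimal sketch follows, assuming the encoding table is stored as a dense array indexed by the three (possibly quantized) color components; the function name `encode_image` and this array layout are illustrative assumptions.

```python
import numpy as np

def encode_image(image, table):
    """Map each pixel of an H x W x 3 integer image to its color code.

    table[c0, c1, c2] holds the code of color (c0, c1, c2); code 0 means
    "not an object color". The dense layout is an assumption of this sketch.
    """
    image = np.asarray(image)
    return table[image[..., 0], image[..., 1], image[..., 2]]
```

Because the table is built once in the learning stage, the cost per captured image is a single indexing operation.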

【００４４】In addition, as preparation for the input image feature amount distribution calculation means 4, a three-dimensional array of bool values indexed by object feature amount distribution and by matching position on the input image is created (referred to below as the matching array). Its size is the number of object feature amount distributions × the total number of windows of interest to be matched on the input image. Every element is initialized to true. Object feature amounts are searched in descending order of pixel count, because larger ones can be expected to overlap more and thus yield a greater matching omission effect. As for the search order of the windows of interest, shifting one pixel at a time is faster when the similarity between the object and the input image is high, and stepping at large intervals is faster when it is low. The former holds because, when the window of interest is shifted, the histogram of the shifted window can be generated from the original histogram by adding the codes of the pixels that newly entered and subtracting the codes of the pixels that left, which is faster than recomputing everything. When the similarity to the input image is low, on the other hand, a single match allows matching to be omitted for local regions above, below, left, and right. Shifting one pixel at a time allows omission only in the direction of travel and is therefore inefficient. Since most of an input image can generally be expected not to resemble the object, relatively good results are obtained by first matching windows of interest at coarse intervals (about 0.5 times the size of the reference image) and then matching the remaining regions with finer shifts.
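The incremental histogram update described above can be sketched as follows for a square window moving one pixel to the right. The function names and the square-window assumption are illustrative only.

```python
import numpy as np

def window_histogram(codes, top, left, size, n_codes):
    """Histogram of color codes inside a size x size window, computed from scratch."""
    patch = codes[top:top + size, left:left + size]
    return np.bincount(patch.ravel(), minlength=n_codes)

def slide_right(hist, codes, top, left, size):
    """Update hist in place when the window at (top, left) moves one pixel right."""
    leaving = codes[top:top + size, left]            # column that drops out
    entering = codes[top:top + size, left + size]    # column that comes in
    np.subtract.at(hist, leaving, 1)
    np.add.at(hist, entering, 1)
    return hist
```

The update touches only 2 × size pixels instead of size × size, which is why dense one-pixel shifts are cheap once a window histogram is available.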

【００４５】In the input image feature amount distribution calculation process of step 14, the input image feature amount distribution calculation means 4 calculates the feature amount distribution inside one of the remaining windows of interest on the feature amount image obtained by the camera control/image input means 3. If a region to be matched exists, the resulting distribution is sent to the feature amount distribution matching means 5; if none remains, the process returns to step 13 (step 15).

【００４６】In the feature amount distribution matching process of step 16, the feature amount distribution matching means 5 first receives the histogram of the reference image output from the object feature amount learning means 1 and the histogram of a local region on the input image output from the input image feature amount distribution calculation means 4. It then computes, for example, the similarity between these histograms. The overlap rate S_AM of the histogram features H_M and H_A is defined as follows.

【００４７】[0047]

【数１】 [Equation 1]

【００４８】Here, H_M and H_A are the histograms of the reference image M and of the local region A in the input image, respectively, and H_Mi and H_Ai are the numbers of pixels having the i-th code in each. |M| is the number of pixels in the reference image (the total number of features of the reference image), and I is the number of code types.
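Equation 1 is not reproduced in this text. Since S_AM is called an overlap rate and |M|·S_AM is later described as the number of same-colored pixel pairs between A and M, the sketch below assumes the normalized histogram intersection, S_AM = (1/|M|) Σ_i min(H_Ai, H_Mi). That form, and the function name `similarity`, are assumptions rather than a quotation of the patent.

```python
import numpy as np

def similarity(hist_a, hist_m):
    """Assumed overlap rate: histogram intersection normalized by |M|."""
    hist_a = np.asarray(hist_a)
    hist_m = np.asarray(hist_m)
    return np.minimum(hist_a, hist_m).sum() / hist_m.sum()
```

For example, with H_A = [3, 1, 0] and H_M = [2, 2, 0] the intersection is 2 + 1 + 0 = 3 and |M| = 4, so the similarity is 0.75.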

【００４９】If S_AM is greater than or equal to the search threshold θ, the region is detected as the object, and the direction and size of the detected object are output to the object position calculation means 7 (step 17).

【００５０】When the camera is controlled dynamically using a histogram for candidate selection (step 18; flow shown by the dash-dotted line in FIG. 1 (claim 5)), a similarity value exceeding a preset prediction threshold indicates that an object is likely to be present at the current window of interest. The process therefore returns to step 12 and reorders the camera parameter list so that the zoomed-in camera parameters covering that region come first. Matching of the candidate selection histogram against a region that is scheduled to be zoomed in on is terminated. When several candidates are detected, the camera parameter list is reordered in descending order of overlap.

【００５１】If S_AM is less than or equal to the search threshold θ, this similarity value is output to the matching omission area calculation means 6 (step 18).

【００５２】The following describes how, in the matching omission area calculation process of step 19, the matching omission area calculation means 6 uses the matching result from the feature amount distribution matching means 5 to omit matching of neighboring local regions and of similar object feature distributions. After the processing of step 19, the process returns to step 14.

【００５３】First, the pruning of local regions is explained with reference to FIG. 4. Let M and N be two reference images of different sizes (|M| > |N|), with no change in color between M and N. That is, the histogram of N is a constant multiple of the histogram of M, and every color in N is also contained in M (the generalization to the case where different colors are included is described later). Now consider local regions A and B that satisfy |A| = |M| and |B| = |N| and that overlap each other. The similarity S_AM between reference image M and local region A and the similarity S_BN between reference image N and local region B satisfy inequality (2).

【００５４】|N|·S_BN < |M|·S_AM + n … (2)
Here, |M|·S_AM is the number of pairs of same-colored pixels between A and M. This is called the same-color pixel count. Inequality (2) states that the same-color pixel count between the object and B (|N|·S_BN) can never exceed the value reached when the same-color pixels between the object and A (|M|·S_AM) all lie in the region common to A and B and, in addition, every pixel in the part of B not contained in A (the n pixels in FIG. 4) is a same-color pixel between the object and B. In other words, once S_AM is known, the upper limit of |N|·S_BN is the right-hand side of (2). If this upper limit is smaller than |N|·θ, the matching of N against B can be omitted, which speeds up the search. The range of n for which matching can be omitted is given by the following expression.

【００５５】|M|·S_AM + n < |N|·θ ∴ n < |N|·θ − |M|·S_AM … (3)
Next, (2) is extended to handle the case where reference image N contains pixels whose colors differ from those of reference image M. The number of pixels in N whose colors differ from M (called the different-color pixels of N with respect to M) is |N|(1 − S_MN) (FIG. 5). Because S_AM gives no information on whether the different-color pixels of N with respect to M are present in A, it is necessary to assume that they do appear in A, estimate the upper limit of |N|·S_BN accordingly, and find the regions that cannot reach |N|·θ. This leads to inequality (4) below. This scheme is called the OR search.

【００５６】|N|·S_BN < |M|·S_AM + |N|(1 − S_MN) + n
n < |N|·(θ − 1 + S_MN) − |M|·S_AM … (4)
On the other hand, when the histograms of the reference images being searched differ greatly, the additive term |N|(1 − S_MN) in (4) becomes large and little matching can be omitted. Instead of reflecting the histogram differences in the similarity value as the OR search does, unnecessary matching is therefore omitted by reflecting the differences in the histogram itself (FIG. 6). Let the histograms of all the reference images be

【００５７】[0057]

【数２】 [Equation 2]

【００５８】Then the composite histogram H_U is defined as

【００５９】[0059]

【数３】 [Equation 3]

【００６０】Here, H_Ui is the count of each code in U. Because each element of H_Ui is the maximum over the reference images, the similarity between any search window A and U is always greater than the similarity between A and each of M⁰, M¹, …, M^k belonging to U. Hence |U|·S_AU is an upper limit of |M|·S^j_AM (∀j ∈ 1, …, k). By searching with U, the matching of M⁰, M¹, …, M^k can be omitted. If the similarity values themselves are needed, however, the regions detected with U must be searched again using M⁰, M¹, …, M^k. Of course, this repeated search is unnecessary at the candidate selection stage, where candidates are later zoomed in on and evaluated again.

【００６１】This scheme is called the UNION search.
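The UNION search can be sketched as follows. The composite histogram is the element-wise maximum over the reference histograms, as the text states; the same-color pixel count is assumed to be the histogram intersection, and the function names are hypothetical.

```python
import numpy as np

def composite_histogram(reference_hists):
    """H_U: element-wise maximum over the histograms of the reference images."""
    return np.max(np.asarray(reference_hists), axis=0)

def same_color_count(hist_a, hist_m):
    """|M| * S_AM, assuming S_AM is the normalized histogram intersection."""
    return int(np.minimum(hist_a, hist_m).sum())

def union_rejects(hist_a, reference_hists, theta):
    """For each reference M^j, True if window A cannot reach similarity theta.

    Uses |U| * S_AU as an upper limit of |M^j| * S_AM^j, so one comparison
    against U stands in for one comparison per reference image.
    """
    refs = np.asarray(reference_hists)
    sizes = refs.sum(axis=1)                       # |M^j|
    bound = same_color_count(hist_a, composite_histogram(refs))
    return bound < theta * sizes
```

A window that U fails to reject still has to be matched against the individual references whenever the exact similarity value is needed.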

【００６２】All matching-array entries corresponding to windows of interest whose overlapping region falls within the above range of n are set to false.
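The omission test from inequalities (3) and (4) reduces to one comparison per neighboring window. The sketch below assumes n is the number of pixels of window B that lie outside window A; the function names are illustrative.

```python
def omission_bound(size_m, s_am, size_n, theta, s_mn=1.0):
    """Right-hand side of (4): |N| * (theta - 1 + S_MN) - |M| * S_AM.

    With s_mn = 1.0 (N contains no colors foreign to M) this equals the
    right-hand side of (3).
    """
    return size_n * (theta - 1.0 + s_mn) - size_m * s_am

def can_omit(n_outside, size_m, s_am, size_n, theta, s_mn=1.0):
    """True if window B, with n_outside pixels not covered by A, cannot reach theta."""
    return n_outside < omission_bound(size_m, s_am, size_n, theta, s_mn)
```

For example, with |M| = |N| = 400, S_AM = 0.25, and θ = 0.75, the bound is 200, so every window sharing more than half of its pixels with A can be marked false in the matching array. Lowering S_MN to 0.875 tightens the bound to 150, which illustrates why the OR search prunes less when the reference histograms differ.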

【００６３】When an object has been detected, in the object position calculation process of step 20 the object position calculation means 7 proceeds as follows. Once the feature amount distribution matching means 5 has detected the object direction, the coordinates of the object on the image

【００６４】[0064]

【数４】 [Equation 4]

【００６５】are obtained.

【００６６】From these, the direction θ (θ_x, θ_y [rad]) of the object is obtained in the camera coordinate system (origin at the center of the camera lens; in the image plane, the pan axis runs left-right [positive to the right] and the tilt axis runs up-down [positive upward]; the Z axis lies along the optical axis [positive forward]).

【００６７】Let the three-dimensional external parameters of the camera be (position

【００６８】[0068]

【数５】 [Equation 5]

【００６９】, direction

【００７０】[0070]

【数６】 [Equation 6]

【００７１】), where i is the camera number, and let the three-dimensional external parameters of the object be (position

【００７２】[0072]

【数７】 [Equation 7]

【００７３】）とする。)).

【００７４】[0074]

【数８】 [Equation 8]

【００７５】where the field of view is

【００７６】[0076]

【数９】 [Equation 9]

【００７７】and the input image size is

【００７８】[0078]

【数１０】 [Equation 10]

【００７９】である。It is

【００８０】When the size of the object

【００８１】[0081]

【数１１】 [Equation 11]

【００８２】is known, the distance L (l_x, l_y) to the object is

【００８３】[0083]

【数１２】 [Equation 12]

【００８４】The coordinates of the object can then be obtained as

【００８５】[0085]

【数１３】 [Equation 13]

【００８６】Here, Y denotes rotation about the Y axis and Z denotes rotation about the Z axis.

【００８７】With histogram matching, however, the size of the detected object is not measured very accurately. The object position can therefore also be measured by stereo vision using multiple cameras. When one camera detects an object and its position is computed, each of the other cameras has two entries placed at the head of its camera parameter list: the camera parameters with which an object at that position can be detected and, to allow for error, the camera parameters one step wider in angle than those. When two or more cameras have detected the object, the search ends. If no other camera detects the object, the first detection could be accepted as correct, but here it is treated as incorrect and the search continues until two or more cameras detect the object.

【００８８】The position measurement method used when two or more cameras detect the object is described below.

【００８９】The coordinates of the object can be obtained as

【００９０】[0090]

【数１４】 [Equation 14]

【００９１】(where k is an unknown representing distance, α = dⁱ_x + θⁱ_x, and β = dⁱ_y + θⁱ_y).

【００９２】Eliminating k gives
q_x cos(α) + q_y sin(α) = p_x cos(α) + p_y sin(α) … (10)
q_y sin(β) − q_z cos(α) sin(β) = pⁱ_y sin(β) + pⁱ_z cos(α) sin(β) … (11)
Initially the three coordinates of Q are unknown, and each additional camera result adds one unknown and three equations. The object position can therefore be determined with two or more cameras. When the object is detected by two or more cameras, a known technique such as singular value decomposition is used to find the inverse matrix that minimizes the residual, yielding an accurate object position.
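A least-squares intersection of sight lines can be sketched as follows. Because Equation 14 is not reproduced in this text, the sketch does not use the patent's angle parameterization; it assumes each camera i supplies a position p_i and a unit sight-line vector u_i, so that Q = p_i + k_i·u_i, and it keeps the distances k_i as unknowns instead of eliminating them. `numpy.linalg.lstsq` solves the system through an SVD-based routine. The function name `locate` is hypothetical.

```python
import numpy as np

def locate(positions, directions):
    """Least-squares object position Q from m >= 2 sight lines.

    Unknowns are Q (3 values) and one distance k_i per camera, so each
    camera contributes three equations: Q - k_i * u_i = p_i.
    """
    positions = np.asarray(positions, dtype=float)
    directions = np.asarray(directions, dtype=float)
    m = len(positions)
    a = np.zeros((3 * m, 3 + m))
    b = positions.reshape(-1)
    for i in range(m):
        a[3 * i:3 * i + 3, :3] = np.eye(3)            # coefficients of Q
        a[3 * i:3 * i + 3, 3 + i] = -directions[i]    # coefficient of k_i
    solution, *_ = np.linalg.lstsq(a, b, rcond=None)
    return solution[:3], solution[3:]
```

With two cameras the system has six equations and five unknowns, which matches the count given in the text; additional cameras only add redundancy and so reduce the effect of noise in the individual sight lines.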

【００９３】Next, the results of experiments carried out to demonstrate the feasibility and concreteness of the invention are described. In these experiments, object detection was performed with the specifications given in FIG. 7 in the environment that FIG. 8 shows at the widest angle. The objects used are those in FIG. 3. The objects are roughly 5 cm to 10 cm in height.

【００９４】First, in the learning stage, about 100 images of each target object were captured as reference images. Specifically, to vary the illumination conditions, the object was placed at the five locations A to E in FIG. 8 and photographed from three directions in total (front, left, and right) while changing the camera zoom in steps of 1.2×.

【００９５】In the search stage, the object was placed at the five locations a to e in FIG. 8, ranging from near to far within 5 m of the camera. The orientation of the object was also varied between two settings, left and right. The contour shape of the object may differ in these cases. The active search method is still applicable, but to make the search faster, a rectangle inside the object, for which the contour shape remains similar, was used as the reference image. The value of σ used to generate the histogram bins was chosen per object: a large σ for objects whose colors vary strongly with orientation and a small σ for objects with little color variation (σ = 1.5 to 3). σ had little effect on computation speed. The presumed reason is that the speed-up from the reduced number of codes is offset by the loss of pruning efficiency caused by the higher similarity to the background. The threshold was set for each object in preliminary experiments.

【００９６】To evaluate the search accuracy of the code features generated by vector quantization restricted to the colors contained in the object (the VQ method), its search accuracy was compared with that of the conventional method, which uses equally divided histogram bins. Fifteen images captured at the wide-angle stage were used as input images.

【００９７】Accuracy was evaluated as the mean of the precision rate and the recall rate. The precision rate is the proportion of correct items among those output as search results, and the recall rate is the proportion of items output as search results among those that should have been found. If both are 100%, there were no missed detections and no spurious detections.
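The accuracy measure can be written out as a short sketch. Detections and ground-truth items are represented here as sets of hypothetical identifiers, and returning 1.0 for an empty set is a convention chosen for this sketch, not something the text specifies.

```python
def precision_recall(detected, relevant):
    """Precision and recall of a set of detections against the ground truth."""
    detected, relevant = set(detected), set(relevant)
    hits = len(detected & relevant)
    precision = hits / len(detected) if detected else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

def accuracy(detected, relevant):
    """Mean of precision and recall, the measure used in the experiments."""
    precision, recall = precision_recall(detected, relevant)
    return (precision + recall) / 2
```

For instance, four detections of which two are correct, against three objects that should have been found, give a precision of 0.5 and a recall of 2/3.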

【００９８】The experimental results are shown in FIG. 9. In this experiment, accuracy was evaluated with the size of the search window as the parameter.

【００９９】With the conventional method, even though many reference images had been captured, only about 60% accuracy was achieved when the side length was in the range of 10 to 30 pixels. The VQ method, by contrast, achieved nearly 90% accuracy even for objects 10 to 30 pixels on a side.

【０１００】Next, to evaluate the speed-up provided by the OR method and the UNION method, search speed was measured under the same conditions as in the previous experiment (FIG. 10).

【０１０１】Four methods were compared: the "object detection device" (Japanese Patent Application No. 8-146857); the VQ method combined with the "object detection device" (Japanese Patent Application No. 8-146857) (VQ method); the OR method, which adds the OR search to the VQ method; and the UNION method, which likewise adds the UNION search.

【０１０２】First, the VQ method improves the search time by a factor of about two. Using the OR method improves the speed by about a further 10% while still guaranteeing accuracy. With the UNION method, the speed improves by about a further 80%. The UNION method is faster because the matching of 5 to 10 reference images is combined into a single search. The merged search does not miss objects, but it carries a risk of falsely detecting other regions. In this case, however, the histograms of the multiple reference images are similar, so although accuracy is not fully guaranteed, it is sufficient for use in camera control.

【０１０３】The above methods were further evaluated on an object search that requires camera control (FIG. 11). In this experiment, the search targets were the objects at locations a to c, which require at least one zoom operation to be detected.

【０１０４】Four methods were compared: a method without prediction (conventional method); the VQ prediction method, which introduces VQ into the conventional active search for both object detection and dynamic camera control; the VQ+OR method, which uses the VQ+OR search for both object detection and dynamic camera control; and the VQ+UNION method, which uses the VQ+OR search for object detection and the VQ+UNION search for dynamic camera control.

【０１０５】All searches were run at window sizes for which the search accuracy was 100%. Dynamic control reduced the time to object detection by about 50%. The parallel search reduced the search time by a further 10%, and the merged search by 50%. With the merged search, however, when the search window was a small region of 10 pixels or less on a side, false detections of search candidates increased for some objects, causing more wasted camera control and, in some cases, a slight increase in search time.

【０１０６】Finally, the results of object detection with cameras installed at the four corners of a 7.5 m × 3.5 m room are described. Using the same learning data, experiments were run on objects placed at positions α to f shown in FIG. 12. FIG. 13 shows how detection proceeds. FIGS. 14(1) to 14(3) show the object positions computed from the obtained sight-line directions. The straight lines represent the sight lines, and the squares mark the intersection coordinates obtained by singular value decomposition. The results show that using many sight lines makes detection robust. FIGS. 15(1) and 15(2) show the measurement results over all points. The circles in FIG. 15(1) indicate the number of cameras that detected the object. In almost all cases the object was detected by all four cameras. The number can drop to three or fewer when the object is occluded or too far away to be seen, but in this experiment it never fell below two. FIG. 15(2) shows the error between the true and measured values. Accuracy improves as more cameras are involved and as the object is closer to the center. The object position was obtained with a mean error of 28 cm and a standard deviation of 6.7 cm, a good result for locating objects about 30 cm in size.

【０１０７】The above embodiment was described using similarity values, but distance values can be used instead. In that case, the upper limit of the similarity value becomes the lower limit of the distance value.

【０１０８】Needless to say, some or all of the functions of the units of the apparatus shown in FIG. 1 can be implemented as a computer program that is executed on a computer to realize the invention, and the processing procedure shown in FIG. 2 can likewise be implemented as a computer program that a computer is made to execute. A program for realizing these functions on a computer, or for making a computer execute the processing procedure, can be recorded on a computer-readable recording medium, such as an FD (floppy disk (registered trademark)), MO, ROM, memory card, CD, DVD, or removable disk, and stored or distributed in that form. The program can also be provided over a network, such as the Internet or electronic mail.

【０１０９】[0109]

[Effect of the invention] As is clear from the above, compared in particular with methods that combine the "Object detection device" (Japanese Patent Application No. Hei 8-146857) or the template matching method with the "Object detection method and apparatus and recording medium recording this method" (Japanese Patent Application No. 2000-35336), the present invention uses, in the object feature amount learning process/means, features with high accuracy (few search omissions and few false detections), and is robust to the variations in object feature amounts caused by the changes in illumination and camera parameters expected in a three-dimensional real environment. Because feature amount distributions such as histograms can be computed for these feature amounts, the feature amount distribution matching process/means can greatly speed up the matching search against reference images of many objects while still guaranteeing accuracy. In the three-dimensional environment search addressed by the present invention in particular, the object feature distribution varies widely with differences in orientation, focus, and the like, so the use of composite features is effective. In addition, using the "candidate selection" object feature amount distributions increases the amount of image search but reduces the number of camera control operations. The image search time can in turn be cut substantially, with almost no loss of accuracy, by evaluating the "candidate selection" object feature amounts with the similarity value or distance value of the composite distribution, so that the overall search time is greatly reduced.
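
To illustrate why a composite distribution can prune many reference comparisons at once, here is a hedged sketch. It assumes histogram intersection as the similarity value and builds the composite as the bin-wise maximum of the reference histograms; both choices, and the names `composite` and `match_with_composite`, are assumptions made for this example rather than definitions taken from the text.

```python
def intersection(h_in, h_ref):
    """Histogram intersection between an input histogram and a reference."""
    return sum(min(a, b) for a, b in zip(h_in, h_ref))


def composite(refs):
    """Bin-wise maximum of the reference histograms (assumed combination rule).

    The result generally does not sum to 1; it only serves as a bound.
    """
    return [max(bins) for bins in zip(*refs)]


def match_with_composite(h_in, refs, comp, threshold):
    """Return indices of references whose similarity reaches the threshold.

    comp is the composite of refs, computed once in advance. Since
    min(x, max_k h_k) >= min(x, h_k) in every bin, the similarity to the
    composite is an upper bound on the similarity to each reference, so a
    failed composite test rules out all references in a single comparison.
    """
    if intersection(h_in, comp) < threshold:
        return []
    return [k for k, ref in enumerate(refs) if intersection(h_in, ref) >= threshold]
```

Most windows in a scene contain no registered object, so in this sketch they are rejected after one comparison instead of one comparison per reference image.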

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of processing in an embodiment of the present invention.

FIG. 3 (1), (2), and (3) are diagrams showing examples of objects used in an embodiment of the present invention.

FIG. 4 is a diagram explaining the omission of matching in surrounding areas according to the present invention.

FIG. 5 is a diagram explaining the omission of matching between similar reference images according to the present invention.

FIG. 6 is a diagram explaining the omission of matching for multiple reference images according to the present invention.

FIG. 7 is a diagram showing the experimental specifications of an embodiment of the present invention.

FIG. 8 is a diagram showing an input image at the widest-angle stage in an embodiment of the present invention.

FIG. 9 is a diagram showing the accuracy results of an embodiment of the present invention.

FIG. 10 is a diagram showing the image search speed results of an embodiment of the present invention.

FIG. 11 is a diagram showing the object search speed results, including camera control time, of an embodiment of the present invention.

FIG. 12 (1) and (2) are diagrams showing the multi-camera object position measurement experiment of an embodiment of the present invention.

FIG. 13 is a diagram showing how an object is detected by multiple cameras in an embodiment of the present invention.

FIG. 14 (1), (2), and (3) are diagrams showing an example of the line-of-sight information used for object position measurement by multiple cameras in an embodiment of the present invention.

FIG. 15 (1) and (2) are diagrams showing the results of object position measurement by multiple cameras in an embodiment of the present invention.

[Explanation of symbols]

1 ... Object feature amount learning means
2 ... Pan / tilt / zoom parameter calculation means
3 ... Camera control / image input means
4 ... Input image feature amount distribution calculation means
5 ... Feature amount distribution matching means
6 ... Collation-omitted area calculation means
7 ... Object position calculation means
S11 ... Object feature amount learning process
S12 ... Pan / tilt / zoom parameter calculation process
S13 ... Camera control / image input process
S14 ... Input image feature amount distribution calculation process
S16 ... Feature amount distribution matching process
S19 ... Collation-omitted area calculation process
S20 ... Object position calculation process

Continuation of front page
(51) Int.Cl.⁷, identification code, FI, theme code (reference): G06T 7/20 100, G06T 7/20 100, H04N 7/18, H04N 7/18 C K
(72) Inventor: Shigeru Takagi, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
F-terms (reference):
2F065 AA04 AA32 AA34 AA37 AA51 DD06 FF04 FF05 FF61 FF65 FF67 HH14 JJ03 JJ26 LL06 QQ21 QQ29 QQ36 QQ37 QQ39 QQ42 QQ43 RR05
5C054 AA01 AA04 FC15 FC16 FE00 HA05
5L096 AA02 AA06 AA09 BA05 CA04 CA05 DA02 EA35 FA18 FA37 GA40 GA41 GA51 HA08 JA03 JA11 JA18 JA25

Claims

[Claims]

1. An object detection / position measurement method that uses an active camera having pan, tilt, and zoom functions to detect an object similar to a previously registered object and to measure the three-dimensional position of that object, the method comprising:
an object feature amount learning process of learning object features and their feature amount distributions from a set of reference images of many objects photographed under different shooting conditions;
a pan / tilt / zoom parameter calculation process of calculating, from the on-image size of the feature amount distribution obtained in the object feature amount learning process, the pan, tilt, and zoom camera parameters needed to search the entire field of view of the camera;
a camera control / image input process of directing the camera to a camera parameter that has not yet been searched, among the camera parameter information obtained in the pan / tilt / zoom parameter calculation process, capturing an image, and generating a feature amount image in which each pixel of the captured input image is converted into the feature amount obtained in the object feature amount learning process;
an input feature amount distribution calculation process of setting, on the feature amount image obtained in the camera control / image input process, a window of interest corresponding to the size of the object feature amount distribution to be matched, and calculating the feature amount distribution within the window of interest;
a feature amount distribution matching process of detecting an object by calculating a similarity value or distance value between the object feature amount distribution obtained in the object feature amount learning process and the feature amount distribution in the window of interest set in the input feature amount distribution calculation process, and determining the presence or absence of the object;
a collation-omitted area calculation process of calculating, based on the similarity value or distance value calculated in the feature amount distribution matching process, the areas for which matching can be omitted among the many other object feature amounts similar to the object feature and the many windows of interest around the window of interest; and
an object position calculation process of calculating the object position from the detection direction when an object is detected in the feature amount distribution matching process.

2. The object detection / position measurement method according to claim 1, wherein the object feature amount learning process uses, as the object feature amount, a color code obtained by vector quantization of color information.

3. The object detection / position measurement method according to claim 1, wherein the object feature amount learning process uses, as the object feature, a color code obtained by vector quantization limited to the colors contained in the object.

4. The object detection / position measurement method according to claim 1, wherein the object feature amount learning process uses a histogram of the object features as the object feature amount distribution.

5. The object detection / position measurement method according to any one of claims 1 to 4, wherein, in the feature amount distribution matching process, those object feature amount distributions learned in the object feature amount learning process that cannot by themselves be treated as having detected an object, because of false detections or missed detections, are used as "candidate selection" object feature amount distributions, and, in the camera control / image input process, the camera is preferentially directed toward areas evaluated by the "candidate selection" object feature amounts as containing an object.

6. The object detection / position measurement method according to claim 5, wherein the "candidate selection" object feature amounts are evaluated using a similarity value or distance value based on a composite distribution.

7. The object detection / position measurement method according to claim 1, wherein, from the similarity value or distance value calculated in the feature amount distribution matching process, an upper limit of the similarity value or a lower limit of the distance value is calculated for the local partial regions surrounding the local partial region used in that calculation, and the calculation of the similarity value or distance value is omitted for regions that do not reach a threshold.

8. The object detection / position measurement method according to claim 1, wherein similarity values or distance values between the object feature amount distributions are calculated in advance in the object feature amount learning process; in the feature amount distribution matching process, an upper limit of the similarity value or a lower limit of the distance value is calculated, from the calculated similarity value or distance value and from the similarity values or distance values between the object feature amount distributions, for the object feature amounts similar to the object feature amount used in that calculation; and the calculation of the similarity value or distance value is omitted for object feature amounts that do not reach a threshold.

9. The object detection / position measurement method according to any one of claims 1 to 8, wherein a composite distribution that combines a plurality of object feature amount distributions into one is calculated in advance in the object feature amount learning process; in the feature amount distribution matching process, an upper limit of the similarity value or a lower limit of the distance value of the object feature amount distributions is calculated from the similarity value or distance value between the composite distribution and a local partial region of the input image; and the similarity calculation is omitted for object feature amounts that do not reach a threshold.

10. The object detection / position measurement method according to claim 1, wherein, in the object position calculation process, the object position is calculated from the direction of the object detected by a single camera and the known size of the object.

11. The object detection / position measurement method according to claim 1, wherein, in the object position calculation process, the object position is calculated from the directions of the object detected by a plurality of cameras.

12. An object detection / position measurement device that uses an active camera having pan, tilt, and zoom functions to detect an object similar to a previously registered object and to measure the three-dimensional position of that object, the device comprising:
object feature amount learning means for learning object features and their feature amount distributions from a set of reference images of many objects photographed under different shooting conditions;
pan / tilt / zoom parameter calculation means for calculating, from the on-image size of the feature amount distribution obtained by the object feature amount learning means, the pan, tilt, and zoom camera parameters needed to search the entire field of view of the camera;
camera control / image input means for directing the camera to a camera parameter that has not yet been searched, among the camera parameter information obtained by the pan / tilt / zoom parameter calculation means, capturing an image, and generating a feature amount image in which each pixel of the captured input image is converted into the feature amount obtained by the object feature amount learning means;
input feature amount distribution calculation means for setting, on the feature amount image obtained by the camera control / image input means, a window of interest corresponding to the size of the object feature amount distribution to be matched, and calculating the feature amount distribution within the window of interest;
feature amount distribution matching means for detecting an object by calculating a similarity value or distance value between the object feature amount distribution obtained by the object feature amount learning means and the feature amount distribution in the window of interest set by the input feature amount distribution calculation means, and determining the presence or absence of the object;
collation-omitted area calculation means for calculating, based on the similarity value or distance value calculated by the feature amount distribution matching means, the areas for which matching can be omitted among the many other object feature amounts similar to the object feature and the many windows of interest around the window of interest; and
object position calculation means for calculating the object position from the detection direction when an object is detected by the feature amount distribution matching means.

13. An execution program for an object detection / position measurement method, the program causing a computer to execute the procedure of the object detection / position measurement method according to any one of claims 1 to 11.

14. A recording medium on which an execution program for an object detection / position measurement method is recorded, wherein a program for causing a computer to execute the procedure of the object detection / position measurement method according to claim 1 is recorded on a computer-readable recording medium.
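
The omission of similarity calculations for neighboring regions recited in claim 7 can be illustrated with a minimal sketch. For brevity it works on a 1-D sequence of feature codes instead of a 2-D image, and it assumes an unnormalized histogram intersection as the similarity value; the function names and the integer threshold `min_count` are hypothetical. A window shifted by k positions differs from the current one in at most k elements, so its intersection count can exceed the current count by at most k.

```python
from collections import Counter


def intersection_count(window, ref_hist):
    """Unnormalised histogram intersection of a window of codes with a reference."""
    win_hist = Counter(window)
    return sum(min(n, ref_hist.get(code, 0)) for code, n in win_hist.items())


def search_with_skipping(codes, ref_hist, width, min_count):
    """Return (hits, evaluations) for a 1-D sliding-window search.

    A window starting at pos is a hit when its intersection count with
    ref_hist is at least min_count. Windows whose upper bound stays below
    min_count are skipped without being evaluated.
    """
    hits, evaluations, pos = [], 0, 0
    while pos + width <= len(codes):
        score = intersection_count(codes[pos:pos + width], ref_hist)
        evaluations += 1
        if score >= min_count:
            hits.append(pos)
            pos += 1
        else:
            # Upper bound for a shift of k is score + k, so every shift with
            # score + k < min_count cannot be a hit and is skipped.
            pos += min_count - score
    return hits, evaluations
```

Because the bound is exact rather than heuristic, this sketch returns the same hits as an exhaustive scan while evaluating far fewer windows when most of the sequence is dissimilar to the reference.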