JP4675368B2

JP4675368B2 - Object position estimation apparatus, object position estimation method, object position estimation program, and recording medium recording the program

Info

Publication number: JP4675368B2
Application number: JP2007301322A
Authority: JP
Inventors: Isao Miyagawa; Tatsuya Osawa; Hiroyuki Arai; Hideki Koike
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-11-21
Filing date: 2007-11-21
Publication date: 2011-04-20
Anticipated expiration: 2027-11-21
Also published as: JP2009129049A

Description

The present invention relates to an object position estimation apparatus, an object position estimation method, an object position estimation program, and a recording medium on which the program is recorded, for estimating the position of a target object (subject) from time-series images acquired by a camera. The invention is applicable to images of walking people captured by a fixed camera such as a surveillance camera, and to time-series images capturing the motion of moving objects such as vehicles or pedestrians.

One conventional technique for measuring crowding in video captured by a surveillance camera extracts person regions by image processing such as background subtraction and then measures the degree of congestion in the monitored area by applying a correction based on the number of pixels that make up one person and on the depth of the scene. Another approach improves the accuracy of the congestion estimate by using a stereo camera to measure the approximate three-dimensional position of each person. A stereo camera simplifies the correspondence search between images by setting the optical axes nearly parallel, and if pixels are matched correctly between the images, the depth of each point follows from the principle of triangulation. Moreover, once the depth from the camera to each object is known, the occlusion problem can also be expected to be resolved. Alternatively, methods have been disclosed that capture the monitored area with a plurality of cameras, associate target images or target regions with three-dimensional space, and thereby detect object positions (see Patent Documents 1 and 2).

Meanwhile, the particle filter (see Non-Patent Document 1) has been reported to be useful as a new approach to tracking people in surveillance video. A particle filter represents the tracked target as a discrete probability density made up of a large number of hypotheses, each carrying a state and a likelihood, and propagates them with a state transition model; this makes tracking robust to fluctuations in motion and to observation noise.

As a technique related to the present invention, extraction of an object region from an image using a Gaussian mixture distribution is described in, for example, Non-Patent Document 2.
Patent Document 1: JP 2003-331263 A
Patent Document 2: JP 2004-46464 A
Non-Patent Document 1: Tomoyuki Higuchi, "Particle Filter," Journal of IEICE, Vol. 88, No. 12, pp. 989-994, 2005.
Non-Patent Document 2: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," Proceedings of International Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252, 1999.

With a monocular camera the depth direction is ambiguous, so a congestion measure based on pixel counts is not guaranteed to be accurate. Multi-camera capture, for example with a stereo camera, can be used to remove this depth ambiguity. However, surveillance cameras in public plazas, stations, department stores, and similar places usually capture a wide area in order to monitor many people. When a somewhat distant area is monitored at a low angle, people appear small in the image, so it is uncertain whether feature points accurate enough for stereo vision can be observed at all; and because the targets are far away, any inaccuracy in the correspondences between the two views makes the estimated depth values unreliable. To detect object positions by associating camera images with three-dimensional space, back-projection of the images (2D) into a voxel space (3D) has been proposed (Patent Document 2), but its computational cost is high, and with a plurality of cameras the object positions may be scattered across the voxel space.

On the other hand, three-dimensional tracking with a particle filter is regarded as a promising technique for wide-area surveillance video, but much of its processing depends on computing power, so the load is heavy when the computer that processes the surveillance video has limited resources.

The present invention solves the above problems. Its object is to provide an object position estimation apparatus, an object position estimation method, an object position estimation program, and a recording medium recording the program that can estimate object positions from surveillance video quickly and efficiently even on a computer with little memory and a modest processing speed.

To solve the above problems, the object position estimation apparatus according to claim 1 extracts target object regions from video acquired with one or more image input devices such as cameras and estimates the positions of the objects in space from those regions. The apparatus comprises: object region extraction means for extracting, by image processing, the target object region from each time-series image acquired by each image input device; spatial map generation means for checking, for each pixel extracted by the object region extraction means for each image input device, which of the prepared three-dimensional models project onto that pixel, using a reference table that indicates, for each pixel of the image obtained from the image input device, from which three-dimensional models placed at which spatial positions the pixel is projected, and, when such projecting three-dimensional models exist, casting a vote at the spatial position of each of those models, thereby generating a map in which places with more votes indicate a higher likelihood that an object is present; and object position estimation means for detecting the position at which the vote value of the map generated by the spatial map generation means is a local maximum and taking it as the position of an object, setting a circular region of radius R centered on that object position as an exclusive region, setting a circular region of radius R as another exclusive region in a part of the map where no exclusive region has been set, totaling the vote values within each such other exclusive region, and, when the total is equal to or greater than a predetermined value, taking the center of that other exclusive region as the position of an object.
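As a rough illustration of the voting performed by the spatial map generation means, the following Python sketch assumes the reference table is stored as a dictionary from a pixel to the list of grid indices (m, n) whose models project onto that pixel; this is one possible layout, and the names `build_space_map`, `cameras`, and `table` are hypothetical. Votes from several cameras are simply added into one map, which is one plausible reading of multi-camera voting.

```python
from collections import defaultdict


def build_space_map(cameras):
    """Accumulate votes on the floor grid from one or more cameras.

    cameras: iterable of (foreground_pixels, table) pairs, one per camera, where
      foreground_pixels are the (x, y) pixels returned by object region extraction and
      table maps (x, y) -> list of grid indices (m, n) of models projecting onto it.
    Returns {(m, n): vote count}; grid positions with no votes are omitted.
    """
    votes = defaultdict(int)
    for foreground_pixels, table in cameras:
        for pixel in foreground_pixels:
            # One vote for every model whose projection covers this foreground pixel
            for mn in table.get(pixel, ()):
                votes[mn] += 1
    return dict(votes)


# Usage with a toy single-camera table (hypothetical values)
table = {(10, 20): [(1, 1), (2, 1)], (11, 20): [(2, 1)]}
print(build_space_map([([(10, 20), (11, 20), (0, 0)], table)]))  # {(1, 1): 1, (2, 1): 2}
```

No geometry is computed at run time: each foreground pixel costs only a table lookup, which is the source of the speed claimed for the invention.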

The object position estimation apparatus according to claim 2 is the apparatus of claim 1 in which the reference table is constructed as a table that places a plurality of three-dimensional models at spatial positions in a reference coordinate system and records the relationship between each image coordinate of the perspective projection images of those three-dimensional shapes, generated from the position and orientation of the image input device and its device-specific internal parameters, and the placement position of the corresponding three-dimensional model.

The object position estimation method according to claim 6 extracts target object regions from video acquired with one or more image input devices such as cameras and estimates the positions of the objects in space from those regions. The method comprises: an object region extraction step in which object region extraction means extracts, by image processing, the target object region from each time-series image acquired by each image input device; a reference table construction step in which reference table construction means constructs a reference table indicating, for each pixel of the image obtained from each image input device, from which three-dimensional models placed at which spatial positions the pixel is projected; a spatial map generation step in which spatial map generation means checks, using the reference table, which of the prepared three-dimensional models project onto each pixel extracted in the object region extraction step for each image input device and, when such projecting three-dimensional models exist, casts a vote at the spatial position of each of those models, thereby generating a map in which places with more votes indicate a higher likelihood that an object is present; and an object position estimation step in which object position estimation means estimates object positions on the basis of the vote values of the map generated in the spatial map generation step. The object position estimation step has a first substep of detecting the position at which the vote value of the map generated in the spatial map generation step is a local maximum and taking it as the position of an object; a second substep of setting a circular region of radius R centered on that object position as an exclusive region; a third substep of setting a circular region of radius R as another exclusive region in a part of the map where no exclusive region has been set and totaling the vote values within that other exclusive region; and a fourth substep of taking the center of that other exclusive region as the position of an object when the total is equal to or greater than a predetermined value. The second to fourth substeps are repeated over the entire map.
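The claims do not say where each "other exclusive region" is centred. The following Python sketch of the four substeps assumes, purely for illustration, that each new region is centred on the highest-voted grid position not yet covered by an exclusive region; the function and parameter names are hypothetical.

```python
import math


def estimate_positions(space_map, cell_size, radius, min_votes):
    """Greedy sketch of the exclusive-region procedure (first to fourth substeps).

    space_map: {(m, n): votes} on a square floor grid with spacing cell_size.
    radius:    exclusive-region radius R, in the same length unit as cell_size.
    min_votes: threshold on the vote total inside a candidate region.
    Assumption: each candidate region is centred on the highest-voted grid
    position that no exclusive region covers yet.
    """
    remaining = dict(space_map)
    positions = []
    while remaining:
        cm, cn = max(remaining, key=remaining.get)
        # Grid positions within distance R of the candidate centre (includes the centre)
        circle = [(m, n) for (m, n) in remaining
                  if math.hypot((m - cm) * cell_size, (n - cn) * cell_size) <= radius]
        total = sum(remaining[mn] for mn in circle)
        # The first maximum is accepted unconditionally; later regions need the threshold
        if not positions or total >= min_votes:
            positions.append((cm, cn))
        # The circle becomes exclusive, so its votes are never counted again
        for mn in circle:
            del remaining[mn]
    return positions


votes = {(0, 0): 10, (1, 0): 5, (10, 10): 4, (20, 20): 1}
print(estimate_positions(votes, cell_size=1.0, radius=2.0, min_votes=3))  # [(0, 0), (10, 10)]
```

The radius R acts as a minimum separation between detected objects, so a single person's vote cluster is not reported twice, while isolated low-vote regions (such as the one at (20, 20)) are rejected.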

The object position estimation method according to claim 7 is the method of claim 6 in which, in the reference table construction step, a plurality of three-dimensional models are placed at spatial positions in a reference coordinate system, perspective projection images of those three-dimensional shapes are generated from the position and orientation of the image input device and its device-specific internal parameters, and each image coordinate of the perspective projection images is associated with the placement position of the corresponding three-dimensional model.

With the above configuration, the positions of moving objects such as vehicles, or of people, can be estimated from time-series images in general acquired with an image input device such as a camera, without the conventional depth correction based on pixel counts.

When a plurality of cameras are used, the spatial map distribution indicating object positions becomes more reliable and the influence of noise is reduced, which improves the accuracy of object position estimation.

Furthermore, because the processing consists only of extracting object regions from each image and voting by looking up the reference table, real-time processing is possible on a commercially available personal computer and the processing cost is reduced.

The object position estimation apparatus according to claim 3 is the apparatus of claim 1 or 2 in which the object position estimation means quantizes the generated spatial map into a lattice and, when the vote value within a quantized cell is equal to or greater than a predetermined value, treats that quantized cell as a place where an object exists.

The object position estimation method according to claim 8 is the method of claim 6 or 7 in which the object position estimation step quantizes the generated spatial map into a lattice and, when the vote value within a quantized cell is equal to or greater than a predetermined value, treats that quantized cell as a place where an object exists.
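A minimal sketch of the lattice quantization of claims 3 and 8, assuming the "vote value within a quantized cell" is the sum of the votes of the grid positions that fall into it, and that each lattice cell groups kx × ky grid positions (both assumptions; `occupied_cells` is a hypothetical name):

```python
from collections import defaultdict


def occupied_cells(space_map, kx, ky, min_votes):
    """Quantize {(m, n): votes} into lattice cells of kx x ky grid positions.

    A cell is reported as occupied when the votes falling in it sum to at
    least min_votes (summing is an assumed way of forming the cell's vote value).
    """
    cells = defaultdict(int)
    for (m, n), v in space_map.items():
        cells[(m // kx, n // ky)] += v
    return sorted(c for c, total in cells.items() if total >= min_votes)


# Two neighbouring votes merge into one cell; the isolated single vote is treated as noise
print(occupied_cells({(1, 1): 2, (2, 1): 2, (5, 5): 1}, 3, 3, 3))  # [(0, 0)]
```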

With the above configuration, it is possible to reduce the influence of errors (noise) arising, for example, in the background-subtraction step during object region extraction.

The object position estimation apparatus according to claim 4 is the apparatus of claim 3 in which, for the lattice quantization of the spatial map, when the angle between the direction in which an image input device faces and the surface on which the device is installed is smaller than a predetermined angle, the quantization along the direction in which that image input device faces is set coarsely.

The object position estimation method according to claim 9 is the method of claim 8 in which, for the lattice quantization of the spatial map, when the angle between the direction in which an image input device faces and the surface on which the device is installed is smaller than a predetermined angle, the quantization along the direction in which that image input device faces is set coarsely.
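The claims do not fix how the coarser quantization is laid out. The Python sketch below assumes an axis-aligned lattice on the XY floor plane, measures the angle between the camera's optical-axis vector and the floor, and, below a threshold, coarsens whichever floor axis lies closer to the camera's horizontal viewing direction. The names `lattice_steps`, `coarse_factor`, and `min_angle_deg` are hypothetical.

```python
import math


def lattice_steps(view_dir, base_step, coarse_factor, min_angle_deg):
    """Return (step_x, step_y) lattice sizes on the XY floor plane.

    view_dir: optical-axis direction (dx, dy, dz) in the reference frame, Z up.
    Assumption: the coarsened axis is the floor axis nearest the camera's
    horizontal viewing direction.
    """
    dx, dy, dz = view_dir
    # Angle between the viewing direction and the floor (installation) plane
    angle = math.degrees(math.atan2(abs(dz), math.hypot(dx, dy)))
    step_x = step_y = base_step
    if angle < min_angle_deg:
        # At shallow angles votes smear along the depth direction, so use bigger cells there
        if abs(dx) >= abs(dy):
            step_x *= coarse_factor
        else:
            step_y *= coarse_factor
    return step_x, step_y


print(lattice_steps((0.0, 1.0, -0.1), 1.0, 3, 20))  # shallow view along +Y: (1.0, 3.0)
print(lattice_steps((0.0, 1.0, -1.0), 1.0, 3, 20))  # 45 degrees down: (1.0, 1.0)
```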

With the above configuration, when the camera's angle to the installation surface is small, the spatial map distribution may spread out along the depth direction; coarse quantization in that case prevents the number of estimated positions from increasing unnecessarily and reduces the processing cost.

The object position estimation apparatus according to claim 5 is the apparatus of any one of claims 1 to 4, further comprising effective pixel setting means that, from the projection results of the prepared three-dimensional models onto the image, sets the image regions onto which the models are projected as projection-valid pixels. The reference table is constructed using the projection-valid pixels set by the effective pixel setting means, and the object region extraction means extracts only regions of the projection-valid pixels set by the effective pixel setting means.

The object position estimation method according to claim 10 is the method of any one of claims 6 to 9, further comprising an effective pixel setting step that, from the projection results of the prepared three-dimensional models onto the image, sets the image regions onto which the models are projected as projection-valid pixels. The reference table construction step builds the table using the projection-valid pixels set in the effective pixel setting step, and the object region extraction step extracts only regions of the projection-valid pixels set in the effective pixel setting step.
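A small sketch of the projection-valid pixel idea of claims 5 and 10, assuming the same pixel-to-model dictionary layout used informally elsewhere (hypothetical names): the valid set is every pixel that at least one model reaches, and foreground extraction is restricted to it.

```python
def projection_valid_pixels(table):
    """Pixels onto which at least one prepared 3-D model projects."""
    return {pixel for pixel, models in table.items() if models}


def restrict_to_valid(foreground_pixels, valid):
    """Keep only foreground pixels that some model can explain; the rest are skipped."""
    return [p for p in foreground_pixels if p in valid]


table = {(1, 1): [(2, 1)], (1, 2): [], (3, 3): [(5, 7)]}
valid = projection_valid_pixels(table)
print(restrict_to_valid([(1, 1), (1, 2), (9, 9)], valid))  # [(1, 1)]
```

In practice the valid set could also be turned into a binary image mask once, so that background subtraction is never run on pixels (for example sky or walls) that no model on the floor grid can reach.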

With the above configuration, pixel regions onto which no model is projected are excluded from processing, so the whole series of processes can be executed efficiently.

The object position estimation program according to claim 11 is a program that causes a computer to execute each step of the method according to any one of claims 6 to 10.

The recording medium according to claim 12 is a computer-readable recording medium on which the object position estimation program according to claim 11 is recorded.

(1) According to the inventions of claims 1 to 12, the positions of moving objects such as vehicles, or of people, can be estimated from time-series images in general acquired with an image input device such as a camera. With a monocular camera the depth direction is ambiguous, so the prior art had to correct the depth according to pixel counts; the present invention requires no such correction and estimates object positions directly as a spatial map.

For multiple cameras, conventional systems mostly used stereo rigs with aligned optical axes or multi-lens cameras with a known baseline, whereas the present invention can readily be applied to a plurality of freely placed cameras.

Conventionally, back-projection into a voxel space has been used with multiple cameras, but that method is computationally expensive, and the object positions in the voxel space diverge as more cameras are handled. In contrast, the multi-view camera voting of the present invention makes the spatial map distribution indicating object positions more reliable and reduces the influence of noise, improving the accuracy of object position estimation.

Furthermore, conventional three-dimensional tracking with a particle filter is computationally expensive and relies on computing power, whereas in the present invention the model table is generated and stored in advance, so once object regions have been extracted from each image, generating the spatial map requires only voting by table lookup. Real-time processing at video rate is therefore possible even on a commercially available personal computer, and even when computing resources are scarce, object position estimation can be carried out with a reduced matching cost.
(2) According to the inventions of claims 3 and 8, the influence of errors (noise) arising, for example, in the background-subtraction step during object region extraction can be reduced.
(3) According to the inventions of claims 4 and 9, when the camera's angle to the installation surface is small, the spatial map distribution may spread out along the depth direction; in that case the number of estimated positions is prevented from increasing unnecessarily, and the processing cost is reduced.
(4) According to the inventions of claims 5 and 10, pixel regions onto which no model is projected are excluded from processing, so the series of processes can be executed efficiently.

Embodiments of the present invention are described below with reference to the drawings; the present invention is not limited to the following embodiments.
Example 1
First, the coordinate system in which the camera (image input device) and the three-dimensional models are placed, and the equations and notation used, are described. FIG. 11 shows the positional relationship between the camera and the three-dimensional models. The camera is assumed to be calibrated in advance, so that its position (X_i, Y_i, Z_i) and orientation (rotation Rx about the X axis, Ry about the Y axis, and Rz about the Z axis) relative to a reference coordinate system XYZ whose origin is a point O in space, and its internal parameters (effective focal length f, image center (Cx, Cy), aspect ratio α, skew γ, etc.), are known. The XY plane is taken as the base of the monitored area (floor, ground, etc.), and people and objects are assumed to move on this plane. The monitored area measures Wx [m] × Wy [m] and is centered on the origin O. This Wx × Wy area is divided into M equal parts at interval ΔX along the X axis and into N equal parts at interval ΔY along the Y axis. On this plane, three-dimensional models are placed at positions (X_m, Y_n), m = 1, 2, ..., M, n = 1, 2, ..., N, according to equation (1) (M and N are taken to be odd for convenience).
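Equation (1) is not reproduced in this text. A minimal Python sketch, assuming each position (X_m, Y_n) is the centre of one ΔX × ΔY cell, so that ΔX = Wx/M, ΔY = Wy/N and, with M and N odd, the middle index falls exactly on the origin O (an assumed form, not necessarily the patent's equation; `grid_positions` is a hypothetical name):

```python
def grid_positions(Wx, Wy, M, N):
    """Cell-centre positions X_m (m = 1..M) and Y_n (n = 1..N) on the floor plane.

    Assumed form: X_m = (m - (M + 1) / 2) * dX with dX = Wx / M, likewise for Y_n.
    """
    if M % 2 == 0 or N % 2 == 0:
        raise ValueError("M and N are taken to be odd")
    dX, dY = Wx / M, Wy / N
    xs = [(m - (M + 1) / 2) * dX for m in range(1, M + 1)]
    ys = [(n - (N + 1) / 2) * dY for n in range(1, N + 1)]
    return xs, ys


xs, ys = grid_positions(10.0, 6.0, 5, 3)
print(xs)  # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(ys)  # [-2.0, 0.0, 2.0]
```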

Following equation (2), the three-dimensional model is a three-dimensional ellipsoid with center (X_m, Y_n, C) and radii A, B, and C.
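Equation (2) is not reproduced in this text. For an ellipsoid with the stated center and radii it presumably has the standard form below (a reconstruction, not the original figure). Because the center lies at height C and the vertical semi-axis is also C, the ellipsoid touches the floor plane Z = 0 and reaches Z = 2C.

$$
\frac{(X - X_m)^2}{A^2} + \frac{(Y - Y_n)^2}{B^2} + \frac{(Z - C)^2}{C^2} = 1
$$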

Here, C is set, for example, to half the average height of a person. The radii A and B, on the other hand, are set equal to or smaller than ΔX and ΔY and need not correspond to the size of a person. In the present invention, for example, the quantization numbers M and N are made sufficiently large and A and B are set small, so that fine three-dimensional models are handled at finely quantized spatial positions (X_m, Y_n).

FIG. 1 is a basic configuration diagram of the inventions of claims 1 and 2 for the case of a single image input device. In FIG. 1, reference numeral 1 denotes a time-series image database storing time-series images, and 2 denotes an object region extraction unit (object region extraction means) that extracts object or person regions in each image in the database 1. Reference numeral 3 denotes a camera information data input unit that supplies the camera's internal and external parameters, and 4 denotes a model table construction unit that, from the camera information data, places three-dimensional models at spatial positions in the XYZ coordinate system according to equation (1), computes their perspective projection images, and builds a model table (reference table) for lookup. Reference numeral 5 denotes a spatial map generation unit (spatial map generation means) that uses the table to perform matching from each pixel of the object regions and obtain the spatial positions of objects, and 6 denotes an object position estimation unit (object position estimation means) that estimates object positions from the spatial map. Each unit in FIG. 1 is implemented, for example, by a computer.

In this configuration, the time-series image database 1 may use a recording medium such as a hard disk, a RAID device, or a CD-ROM, or it may use remote data resources over a network.

FIG. 2 is a processing configuration diagram for real-time processing; the present invention does not necessarily require a storage device such as a database unit (the time-series image database 1). FIG. 2 differs from FIG. 1 in that an image input unit 7 that captures images from the camera is provided in place of the time-series image database 1; the rest of the configuration is the same as in FIG. 1.

FIG. 3 is the processing flow of this embodiment, and the operation of the apparatuses of FIGS. 1 and 2 is described with reference to it. First, generating the model table in advance from the camera information data is described. Since the camera is calibrated in advance, the camera information data input unit 3 of FIGS. 1 and 2 holds the external parameters (position and orientation) relative to the reference coordinate system and the camera-specific internal parameters (focal length, lens distortion, image center, etc.); the parameters A, B, and C are also supplied as the three-dimensional model input (steps S1 and S2).

Next, in step S3, the indices m and n are each incremented in turn starting from 1, and for the three-dimensional model at the position (X_m, Y_n) given by equation (1), sample points (X_j, Y_j, Z_j), j = 1, 2, ..., Q, on the surface of the ellipsoid of equation (2) are computed (steps S2 and S3). The number of sample points Q is set sufficiently large (for example, dense enough that no gaps appear inside the ellipse when the points are converted into a perspective projection image).
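One simple way to obtain the Q surface samples is a latitude-longitude parameterization of the ellipsoid; this is an illustrative choice (the text does not specify the sampling scheme), and `sample_ellipsoid`, `n_theta`, and `n_phi` are hypothetical names. Q grows as (n_phi + 1) × n_theta and would be increased until the projected samples leave no holes inside the ellipse.

```python
import math


def sample_ellipsoid(Xm, Yn, A, B, C, n_theta=32, n_phi=16):
    """Sample points on the ellipsoid centred at (Xm, Yn, C) with radii A, B, C.

    Uses a latitude-longitude grid: phi is the polar angle from +Z (0..pi),
    theta the azimuth (0..2*pi). Pole points are repeated n_theta times,
    which is harmless for table construction.
    """
    pts = []
    for i in range(n_phi + 1):
        phi = math.pi * i / n_phi
        for k in range(n_theta):
            theta = 2 * math.pi * k / n_theta
            x = Xm + A * math.sin(phi) * math.cos(theta)
            y = Yn + B * math.sin(phi) * math.sin(theta)
            z = C + C * math.cos(phi)  # spans 0 (floor) to 2C (top of head)
            pts.append((x, y, z))
    return pts


pts = sample_ellipsoid(1.0, -2.0, 0.2, 0.15, 0.85)
print(len(pts))  # 544
```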

Next, in the perspective projection image generation of step S4, image coordinates (x_j, y_j), j = 1, 2, ..., Q, are generated from each sample point (X_j, Y_j, Z_j) using the camera information data and equations (3) to (5). The image coordinates may also be computed by a perspective projection that additionally accounts for lens distortion.
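Equations (3) to (5) are not reproduced here. The following Python sketch uses a standard pinhole model as a stand-in, under assumed conventions: the world-to-camera rotation is R = Rz·Ry·Rx built from the orientation angles, a world point P maps to camera coordinates R(P − T) with T = (X_i, Y_i, Z_i), and the intrinsics apply f, the aspect ratio α, the skew γ, and the image center (Cx, Cy). Lens distortion is omitted. The function names are hypothetical, and the patent's own equations may use a different rotation order or intrinsic layout.

```python
import math


def rotation_matrix(rx, ry, rz):
    """World-to-camera rotation R = Rz @ Ry @ Rx (assumed convention), angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    Rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    Ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    Rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def matmul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(Rz, matmul(Ry, Rx))


def project(point, cam_pos, R, f, Cx, Cy, alpha=1.0, gamma=0.0):
    """Pinhole projection of a world point; returns None if it is behind the camera."""
    d = [point[i] - cam_pos[i] for i in range(3)]
    # Camera-frame coordinates Pc = R (Pw - T)
    Xc, Yc, Zc = (sum(R[r][k] * d[k] for k in range(3)) for r in range(3))
    if Zc <= 0:
        return None
    x = f * Xc / Zc + gamma * Yc / Zc + Cx
    y = alpha * f * Yc / Zc + Cy
    return x, y


R = rotation_matrix(0.0, 0.0, 0.0)
print(project((1.0, 2.0, 10.0), (0.0, 0.0, 0.0), R, 100.0, 320.0, 240.0))  # (330.0, 260.0)
```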

Next, in step S5, the model table shown in FIG. 4, which links the image coordinates to the models, is constructed. FIG. 4 is an example of the model table for an image size of 640 × 480 pixels. The rows correspond, in order, to pixels (1, 1) through (640, 480), and the columns correspond, in order, to the X-direction quantization index m from 1 to M, so the size of the model table is (640 × 480) × M.

Before the table is built, all entries of the array are initialized to 0. The image coordinates (x_j, y_j) obtained from equations (3) to (5) are then written into the table in the order j = 1, 2, ..., Q. That is, for each perspective projection point (x_j, y_j) of a point on the three-dimensional model placed at position (X_m, Y_n), the value "n" is written into the entry specified by row (x_j, y_j) and column m. This is done for all pixels (x_j, y_j), j = 1, 2, ..., Q; the three-dimensional model is then placed at the next position (X_m, Y_n), its image coordinates (x_j, y_j), j = 1, 2, ..., Q, are obtained by the same perspective projection calculation, and they are written into the same table. The resulting table is a model table indicating which models each pixel is associated with.
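The table-filling loop of step S5 can be sketched in Python as follows, assuming the projected coordinates of every model are already available as a dictionary keyed by (m, n), that real-valued coordinates are rounded to the nearest pixel, and that rows are ordered row-major from pixel (1, 1); these layout details and the name `build_dense_table` are assumptions. Because FIG. 4 keeps one entry per (pixel, m), a later n for the same m overwrites an earlier one in this format.

```python
def build_dense_table(projections, width, height, M):
    """FIG. 4 style table: one row per pixel, one column per X index m (1..M).

    projections: {(m, n): iterable of (x, y) image coordinates} for every model.
    Entry [row][m - 1] holds n, or 0 when no model in column m reaches the pixel.
    Pixels are 1-based; row = (y - 1) * width + (x - 1) (assumed row-major order).
    """
    table = [[0] * M for _ in range(width * height)]  # initialize every entry to 0
    for (m, n), points in projections.items():
        for x, y in points:
            px, py = round(x), round(y)  # nearest pixel (assumed rounding)
            if 1 <= px <= width and 1 <= py <= height:
                # One cell per (pixel, m): a later n for the same m overwrites an earlier one
                table[(py - 1) * width + (px - 1)][m - 1] = n
    return table


proj = {(2, 1): [(1.0, 1.0)], (3, 10): [(1.2, 2.1)], (5, 25): [(0.9, 2.0), (9.0, 9.0)]}
t = build_dense_table(proj, width=4, height=3, M=5)
print(t[0], t[4])  # [0, 1, 0, 0, 0] [0, 0, 10, 0, 25]
```

The toy input mirrors the FIG. 4 example: pixel (1, 1) comes from the model at (X₂, Y₁), and pixel (1, 2) from the models at (X₃, Y₁₀) and (X₅, Y₂₅).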

In the example of FIG. 4, pixel (1, 1) is the projection of a point on the three-dimensional model placed at (X_2, Y_1), and pixel (1, 2) is the projection of points on the models placed at (X_3, Y_10) and (X_m, Y_25). A pixel is not necessarily tied to a single model; several models may project onto the same pixel. An entry of 0 means that no point on a three-dimensional model placed at (X_m, Y_n) for that column projects onto the pixel, and a row of all zeros means that no point on any of the three-dimensional models prepared with the quantization indices m and n projects onto that pixel. The table construction above is performed for all positions (X_m, Y_n) to complete the model table.

The present invention does not require the array form of FIG. 4. As shown in FIG. 5, each row may instead record the number of models projected onto the pixel (that is, the number of candidate object positions) followed by the indices (m, n) giving each model's position on the XY plane. In this format a row holds only as many entries as there are projected models, and nothing needs to be written for pixels onto which nothing projects, so it makes efficient use of memory. Furthermore, if these model tables are built in advance on a separate processing device and then held by another device, that device can perform object position estimation in the same way using only the subsequent steps.
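The FIG. 5 layout can be sketched as a dictionary keyed only by pixels that actually receive a projection, with the projection count given by the length of each list. The function name and the `project_model(m, n)` callback are hypothetical, and counting a model once per pixel even if several of its sample points land there is an assumption. Unlike the dense layout, this form keeps every (m, n) pair, so nothing is overwritten.

```python
from collections import defaultdict


def build_sparse_model_table(width, height, M, N, project_model):
    """Sparse model table in the spirit of FIG. 5 (illustrative sketch).

    Returns {(x, y): [(m, n), ...]}; pixels with no projection are absent,
    and len(table[(x, y)]) is the number of models projected onto (x, y).
    """
    table = defaultdict(list)
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            hit = set()  # pixels covered by this model, each recorded once
            for x, y in project_model(m, n):
                px, py = int(round(x)), int(round(y))
                if 1 <= px <= width and 1 <= py <= height:
                    hit.add((px, py))
            for pixel in hit:
                table[pixel].append((m, n))
    return dict(table)
```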

Meanwhile, in FIG. 1, the motion of the target over time is captured by the camera, and the resulting time-series images are stored in the time-series image database 1 (step S6). The object region extraction unit 2 in FIG. 1 retrieves images from the time-series image database 1 one after another and extracts from each a moving-object region or a person region (collectively called the object region below) (step S7).

In FIG. 2, images captured by the camera in the same way are taken in through the image input unit 7 (step S6) and passed to the object region extraction unit 2.

As shown in FIG. 6, if a background image is available in advance, the object region can be extracted from each image by taking its difference with the background image (background subtraction). Alternatively, the method using a Gaussian mixture distribution described in Non-Patent Document 2 can extract moving-object regions from inter-frame differences alone, without requiring a background image. This processing yields the pixels (x_j, y_j), j = 1, 2, ..., P that make up the object region, where P is the total number of pixels in the extracted region.
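Background subtraction on grayscale frames can be sketched as a per-pixel absolute difference followed by a threshold. The threshold value, the function name, and the row-major layout with 1-indexed output coordinates are assumptions; the Gaussian-mixture method of Non-Patent Document 2 is not shown.

```python
def extract_object_pixels(frame, background, threshold):
    """Return the 1-indexed pixels (x, y) where |frame - background| > threshold.

    frame and background are lists of rows of gray levels, indexed [y - 1][x - 1].
    """
    pixels = []
    for y, (row_f, row_b) in enumerate(zip(frame, background), start=1):
        for x, (f, b) in enumerate(zip(row_f, row_b), start=1):
            if abs(f - b) > threshold:  # pixel differs enough from the background
                pixels.append((x, y))
    return pixels
```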

Next, in step S8, each pixel (x_j, y_j), j = 1, 2, ..., P obtained by the object region extraction unit 2 is collated with the model table. For the collation, an M × N two-dimensional buffer, sized by the quantization numbers M and N of the XY space, is prepared first. For each pixel (x_j, y_j), the entries of its row are scanned across, and for every nonzero entry one vote is added to the buffer cell (m, n) given by that entry's column m and value n (step S9). This is done for all extracted pixels (x_j, y_j), j = 1, 2, ..., P (step S11). The result amounts to pattern matching between the extracted object region and the perspective projections of the three-dimensional model, and it yields a spatial map in which positions with more votes are more likely to be object positions (step S10).
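The voting of steps S8 to S11 against a FIG. 4 style table (pixel mapped to a list of n values indexed by column m, with 0 meaning no projection) might look like the sketch below. The function name `vote` is hypothetical, and pixels absent from the table are simply skipped.

```python
def vote(pixels, table, M, N):
    """Accumulate votes into an M x N buffer; votes[m - 1][n - 1] counts model (m, n)."""
    votes = [[0] * N for _ in range(M)]
    for p in pixels:
        row = table.get(p)
        if row is None:
            continue  # pixel not covered by the table
        for m, n in enumerate(row, start=1):
            if n != 0:  # a model in column m projects onto this pixel
                votes[m - 1][n - 1] += 1
    return votes
```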

This map is an M × N buffer, but because it is tied to the real-world scale by equation (1), the array index (m, n) corresponds to the spatial position (X_m, Y_n). The spatial map generated by this voting uses (X_m, Y_n) as coordinates, and the value at each coordinate is its vote count. Viewed in three dimensions, the distribution forms a peak centered on each place where an object is located; FIG. 7 shows the map as a contour plot.

In the example of FIG. 7, the camera lies in the Y_n-axis direction, and the distributions are characteristically elongated toward it. The nearer distributions (A and B) are relatively compact, whereas the farther ones (C and D) tend to be somewhat elongated. This shows that the accuracy of the voting degrades with distance from the camera; the shallower the camera's tilt relative to the floor, the further the distributions stretch in depth.

Next, in step S12, the object position estimation unit 6 in FIGS. 1 and 2 estimates the positions of the objects from the spatial map of FIG. 7; the processing flow is shown in FIG. 8. Because noise in the object region extraction can leave isolated points, the vote values at all lattice points (m = 1, 2, ..., M, n = 1, 2, ..., N) are first smoothed with a two-dimensional Gaussian of suitable window size and standard deviation σ (steps S21 and S22). Then, in the spatial map whose vote values are these smoothed values, the local maxima are detected, as marked by × in FIG. 9, and each is taken as an object position (steps S23 and S24).
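Steps S21 to S24 can be sketched as a separable Gaussian blur followed by an 8-neighbour local-maximum test. Truncating the kernel at 3σ, zero padding at the map border, and the neighbourhood definition are assumptions, since the text only specifies "a suitable window"; the function names are hypothetical.

```python
import math


def gaussian_smooth(grid, sigma):
    """Separable 2D Gaussian blur of an M x N vote grid with zero padding."""
    radius = max(1, math.ceil(3 * sigma))  # assumed truncation at 3 sigma
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]  # normalize so the weights sum to 1
    M, N = len(grid), len(grid[0])
    offsets = range(-radius, radius + 1)
    # Blur along the n (Y) axis first, then along the m (X) axis.
    rows = [[sum(kernel[k + radius] * grid[m][n + k] for k in offsets if 0 <= n + k < N)
             for n in range(N)] for m in range(M)]
    return [[sum(kernel[k + radius] * rows[m + k][n] for k in offsets if 0 <= m + k < M)
             for n in range(N)] for m in range(M)]


def local_maxima(grid, min_value=0.0):
    """Return 1-indexed (m, n) cells not smaller than any of their 8 neighbours.

    Cells at or below min_value are ignored; a flat plateau may yield
    several adjacent maxima.
    """
    M, N = len(grid), len(grid[0])
    peaks = []
    for m in range(M):
        for n in range(N):
            v = grid[m][n]
            if v <= min_value:
                continue
            neighbours = (grid[i][j]
                          for i in range(max(0, m - 1), min(M, m + 2))
                          for j in range(max(0, n - 1), min(N, n + 2))
                          if (i, j) != (m, n))
            if all(v >= u for u in neighbours):
                peaks.append((m + 1, n + 1))
    return peaks
```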

However, depending on the camera arrangement and how densely the objects are packed, a local maximum does not necessarily correspond to a single object. In particular, when a far distribution (C or D in FIG. 9) is elongated in the depth direction, it may consist of several overlapping objects merged into one distribution. The processing flow of FIG. 8 accounts for such overlapping objects and computes multiple object positions.

Specifically, after the local maxima of the distributions are detected in step S23, each one is enclosed by a circle of radius R centered on it and large enough to contain one object (thick solid circles in FIG. 10), and the interior is set as an exclusive region (the first exclusive region) (step S25); that is, each exclusive region contains exactly one object. Next, circular regions of radius R that do not overlap any exclusive region already set are defined as further exclusive regions (the second to n-th exclusive regions; broken circles in FIG. 10), and the vote values within each are summed (step S26). If a sum exceeds a predetermined value (step S27), an object is judged to be present there, and the center of that region (one of the second to n-th exclusive regions) is taken as an object position (step S24). The first and the second to n-th exclusive regions may also be shapes other than circles of radius R, such as rectangles or polygons. Applying this processing to the entire spatial map yields the object positions, which are then output (step S13).
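One way to realize the exclusive-region step is sketched below. The text does not say how the additional circles are placed, so this sketch assumes a greedy search: among all centers whose radius-R disk shares no cell with an existing region, it takes the one with the largest vote sum, accepts it if the sum exceeds the threshold, and repeats. R is measured in grid cells, which assumes equal spacing in X and Y; all names are hypothetical.

```python
def disk(center, R, M, N):
    """Grid cells (1-indexed) within distance R of center, clipped to the M x N map."""
    cm, cn = center
    return {(m, n)
            for m in range(max(1, cm - R), min(M, cm + R) + 1)
            for n in range(max(1, cn - R), min(N, cn + R) + 1)
            if (m - cm) ** 2 + (n - cn) ** 2 <= R * R}


def add_exclusive_regions(votes, peaks, R, threshold):
    """Return the peaks plus centers of further non-overlapping disks whose vote sum > threshold."""
    M, N = len(votes), len(votes[0])
    positions = list(peaks)
    occupied = set()
    for p in peaks:
        occupied |= disk(p, R, M, N)  # first exclusive regions: one object each
    while True:
        best, best_sum = None, None
        for m in range(1, M + 1):
            for n in range(1, N + 1):
                region = disk((m, n), R, M, N)
                if region & occupied:
                    continue  # a new region must not share cells with existing ones
                s = sum(votes[i - 1][j - 1] for i, j in region)
                if best_sum is None or s > best_sum:
                    best, best_sum = (m, n), s
        if best is None or best_sum <= threshold:
            return positions
        positions.append(best)
        occupied |= disk(best, R, M, N)
```

The greedy choice keeps the sketch short; a fixed scan order over the map would also fit the description.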

The above series of processes is repeated for every target image, so that this embodiment of the present invention can estimate the positions of the objects in each image. Lengthening the image sequence also allows object positions to be estimated over long periods.

(Example 2)
FIG. 12 is a basic configuration diagram for the inventions of claims 1 and 2 when a plurality of image input devices is used. Compared with the configuration of FIG. 1, it adds camera information data input units 3a to 3n, one per camera, and the corresponding model table construction units 4a to 4n. FIG. 13 is a configuration diagram for real-time processing; the present invention does not necessarily require storage devices such as the database units (the time-series image database 1).

FIG. 13 differs from FIG. 12 in that an image input unit 7 that takes in images from the plurality of cameras replaces the time-series image database 1; the rest of the configuration is identical to FIG. 12.

In FIGS. 12 and 13, descriptions of the parts shared with Example 1 are omitted, and the following focuses on the differences from Example 1.

Example 2 describes the case of C (C ≥ 2) cameras. The cameras need not be of the same type; it suffices that the internal and external parameters of each camera have been obtained in advance by camera calibration and held in the corresponding camera information data input unit 3a to 3n in FIG. 12. The following description assumes that all cameras are synchronized in time, that an image is acquired from each camera in every frame, and, in the configuration of FIG. 12, that these images are stored in the time-series image database 1.

In the perspective projection image generation (step S4 in FIG. 3) performed by the model table construction units 4a to 4n, the indices m and n are incremented in turn. For each spatial position (X_m, Y_n) given by equation (1), the points (X_j, Y_j, Z_j) on the three-dimensional model of equation (2) placed there are projected with equations (3) to (5), using the camera parameters of camera i supplied by the camera information data input units 3a to 3n, to obtain the image coordinates (x_ij, y_ij), i = 1, 2, ..., C, j = 1, 2, ..., Q.

The only difference from Example 1 is that the image coordinates (x_ij, y_ij) are obtained for each camera; the spatial positions (X_m, Y_n) and the placement of the three-dimensional model at each position are the same as in Example 1. The model table construction units 4a to 4n generate C model tables, one per camera, by the same procedure as in Example 1 (steps S1 to S5 in FIG. 3).

Next, the object region extraction unit 2 retrieves images from the time-series image database 1 and extracts the object regions by background subtraction or inter-frame differencing, as in Example 1 (steps S6 and S7 in FIG. 3). Once the object regions are obtained, each pixel (x_ij, y_ij), i = 1, 2, ..., C, j = 1, 2, ..., P is voted into the M × N two-dimensional map according to the C model tables. Because prior camera calibration relates every camera geometrically to the XY plane, that is, to the positions (X_m, Y_n) indexed by m and n, the collation against each table can be run in turn and all votes cast into the same M × N map. The result is pattern matching between the extracted object regions and the perspective projections of the three-dimensional model in all cameras (steps S8 to S11 in FIG. 3).
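Because all cameras vote into the same M × N map, multi-camera voting reduces to accumulating every camera's votes in one buffer. The sketch assumes a FIG. 5 style table per camera (pixel mapped to a list of (m, n)); the function name is hypothetical. The test shows two elongated single-camera distributions reinforcing each other only where they cross.

```python
def vote_multi_camera(observations, M, N):
    """observations: list of (pixels, sparse_table) pairs, one per camera.

    Each sparse_table maps a pixel to the list of (m, n) models projecting onto it.
    Returns an M x N buffer with votes from all cameras summed.
    """
    votes = [[0] * N for _ in range(M)]
    for pixels, table in observations:  # run each camera's collation in turn
        for p in pixels:
            for m, n in table.get(p, ()):
                votes[m - 1][n - 1] += 1
    return votes
```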

Because Example 2 uses several cameras instead of the single camera of Example 1, it produces a more reliable spatial map in which the ambiguity inherent in a single camera is greatly reduced. In Example 1, depending on the camera arrangement or on clusters of objects, a distribution may stretch far in the depth direction, as with distributions C and D in FIG. 7. With the multi-camera voting of Example 2, by contrast, even if each camera's distribution is elongated, the distributions from the different cameras intersect in the same spatial map, so the map reliably reflects that places with many votes are object positions. From here on, object positions are estimated from the spatial map as in Example 1 (steps S12 and S13 in FIG. 3).

(Example 3)
Next, an embodiment relating to claims 3 and 4 is described. Example 3 has the same processing configuration as FIGS. 1 and 2, and descriptions of the shared parts are omitted. Only the processing flow in the object position estimation unit 6 differs from that of FIG. 8, so only that part is described below.

FIG. 14 shows the processing flow of the object position estimation unit 6 in Example 3. The spatial map obtained from object region extraction is affected by noise from the background processing, and depending on the camera tilt its distributions may smear. The spatial map therefore does not always yield local maxima that faithfully correspond to the number of objects. First, in step S31 of FIG. 14, the (X_m, Y_n) space is quantized into lattice cells (broken lines in FIG. 15). The cell size may be chosen according to how the camera is installed. For example, when the angle between the camera and the floor is large, each cell is made about the size occupied by one person, as in FIG. 15; when the angle is small, for example below a predetermined angle, the distributions may smear in the depth direction, so cells that are coarser in the Y_n direction are used, as in FIG. 16.

Next, in steps S32 and S33, the vote values are normalized so that the maximum vote value in the spatial map is 1, and any value (vote rate) at (X_m, Y_n) below 0.5 is deleted (steps S34 and S35). Because the spatial map also contains votes caused by errors in the background subtraction, this removes as many noise votes as possible and keeps as object position candidates only the positions whose vote rate is at least half the maximum.
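Steps S32 to S35 amount to scaling by the map's maximum and zeroing everything below half of it. The 0.5 ratio comes from the text; the function name and returning an all-zero map unchanged are assumptions.

```python
def normalize_and_prune(votes, keep_ratio=0.5):
    """Scale votes so the maximum is 1, then zero every vote rate below keep_ratio."""
    peak = max(max(row) for row in votes)
    if peak <= 0:
        return [[0.0] * len(row) for row in votes]  # nothing was voted
    return [[v / peak if v / peak >= keep_ratio else 0.0 for v in row] for row in votes]
```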

Next, the remaining vote rates within each of the cells set above are summed to produce a vote-rate map (step S36). If the vote rate of a cell in this cell coordinate space exceeds a predetermined value, that cell position is taken as an object position (steps S37 to S39). The black cells in FIG. 17 show an example of cells determined to be object positions. If the cell quantization in the Y_n direction is the same as in the X_m direction, as in FIG. 15, many cells may be detected at far object positions, as in FIG. 17. In that case, detecting object positions in the cell space of FIG. 16, where the Y_n quantization is somewhat coarser, keeps the number of detected cells as small as possible, as shown in FIG. 18.
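Steps S36 to S39 can be sketched as summing the pruned vote rates over rectangular blocks of `cell_m` × `cell_n` grid points and reporting blocks whose sum exceeds a threshold. The block sizes, the threshold, the function name, and the 0-indexed cell indices returned are assumptions; the test shows a coarser Y_n quantization merging two detections into one, in the spirit of FIGS. 17 and 18.

```python
def detect_cells(rates, cell_m, cell_n, threshold):
    """Return 0-indexed (cell_row, cell_col) of cells whose summed vote rate exceeds threshold.

    rates is an M x N grid indexed [m][n]; a larger cell_n means coarser
    quantization along Y_n (the depth direction in the text).
    """
    M, N = len(rates), len(rates[0])
    detected = []
    for cm in range(0, M, cell_m):
        for cn in range(0, N, cell_n):
            total = sum(rates[m][n]
                        for m in range(cm, min(cm + cell_m, M))
                        for n in range(cn, min(cn + cell_n, N)))
            if total > threshold:
                detected.append((cm // cell_m, cn // cell_n))
    return detected
```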

(Example 4)
Next, an embodiment relating to claim 5 is described. The basic configuration of Example 4 is shown in FIG. 19. Because it extends easily to multiple cameras, that case is neither described nor illustrated.

FIG. 19 differs from FIG. 1 in that it additionally includes an effective pixel setting unit 8, which, based on the projections of the prepared three-dimensional models onto the image, sets the image area onto which they project as projection-effective pixels; the rest of the configuration is identical to FIG. 1.

FIG. 20 is a configuration diagram for real-time processing; the present invention does not necessarily require storage devices such as the database units (the time-series image database 1).

FIG. 20 differs from FIG. 19 in that an image input unit 7 that takes in images from a plurality of cameras replaces the time-series image database 1; the rest of the configuration is identical to FIG. 19.

A typical surveillance camera, as shown for example in FIG. 21, is usually tilted toward the floor (or the ground) to view the monitored area. In the model table construction of Examples 1 to 3 described above, the three-dimensional model was placed at varying positions (X_m, Y_n), and the table was built from the resulting projected coordinates. However, with the camera's angle of view denoted Φ in FIG. 21, the three-dimensional models are placed only within the finite area spanned by the positions (X_m, Y_n), so clearly nothing projects above the broken line in FIG. 21; the projection onto the image looks like FIG. 22 (white pixels receive projections, black pixels do not).

In other words, even if a model table is built for all pixels, the angle of view and the camera arrangement may make it a sparse table (one with many empty entries in FIGS. 4 and 5). In this example, therefore, the pixels onto which the placed three-dimensional models project are determined in advance, and processing is applied only to those projected pixels. Because the processing of the effective pixel setting unit 8 in FIG. 19 is what differs from Examples 1 to 3, the following description focuses on it.

As described for step S4 in FIG. 3 of Example 1 (the perspective projection image generation step), image coordinates (x_j, y_j), j = 1, 2, ..., Q are obtained from the points (X_j, Y_j, Z_j) on the three-dimensional model at position (X_m, Y_n). Next, because some pixels receive no projection, as shown in FIG. 22, the entries for the pixel area that receives no projection are deleted during model table construction. The pixels that remain after this deletion are the projection-effective pixels, and they are used by the object region extraction unit 2 and the spatial map generation unit 5. There are broadly two ways to delete pixels. The first deletes the rows of image coordinates found to receive no projection: in the example of FIG. 4, if every entry in the row of a pixel (x_j, y_j) is 0, the entire row for (x_j, y_j) is deleted.

Likewise, in the format of FIG. 5, a row is deleted when its projection count is 0. The second method filters the pixel area found to receive no projection with a rectangle or similar shape, as shown in FIG. 23, and deletes the rows of image coordinates falling in the filtered area from the model table. The following describes the case of the latter deletion method.
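Both deletion methods can be sketched over a FIG. 4 style table: drop rows whose entries are all 0, and optionally drop rows inside rectangles known to receive no projection. The inclusive `(x0, y0, x1, y1)` rectangle format and the function name are assumptions; the keys that remain are the projection-effective pixels.

```python
def prune_model_table(table, masked_rects=()):
    """Remove rows of a dense model table that cannot receive any projection.

    table maps (x, y) -> list of n values per column m (0 = no projection).
    masked_rects lists inclusive rectangles (x0, y0, x1, y1) to drop outright.
    """
    kept = {}
    for (x, y), row in table.items():
        if not any(row):
            continue  # first method: all-zero row, nothing projects here
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in masked_rects):
            continue  # second method: pixel lies in a filtered-out rectangle
        kept[(x, y)] = row
    return kept
```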

When the model table is constructed, the image area receiving projections is registered in the effective pixel setting unit 8 as projection-effective pixels. Then, when the object region extraction unit 2 extracts the object region from the time-series images by background subtraction or inter-frame differencing as in Example 1, it extracts object-region pixels (x_j, y_j) only within the pixel area set as projection-effective pixels. Votes are then cast for each extracted pixel by referring to the model table, generating the spatial map. The subsequent processing is the same as in Example 1. Because this example excludes pixel areas that receive no projection, the series of steps comprising object region extraction, model table construction, and spatial map generation can be executed efficiently.
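Restricting extraction to projection-effective pixels can be sketched by iterating only over that set instead of the whole frame, which is where the efficiency gain described above comes from. The function name, the threshold, and the 1-indexed pixel layout are assumptions.

```python
def extract_effective_object_pixels(frame, background, threshold, effective):
    """Background subtraction evaluated only at projection-effective pixels.

    frame and background are indexed [y - 1][x - 1]; effective is a collection
    of 1-indexed (x, y) pixels. Results are returned in row-major order.
    """
    pixels = []
    for x, y in sorted(effective, key=lambda p: (p[1], p[0])):
        if abs(frame[y - 1][x - 1] - background[y - 1][x - 1]) > threshold:
            pixels.append((x, y))
    return pixels
```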

The object position estimation method of the present invention executes each of the processes described in Examples 1 to 4.

The present invention also provides a program that causes a computer to execute the object position estimation method.

Some or all of the functions of the means in the object position estimation device of this embodiment can be implemented as a computer program executed on a computer to realize the present invention, and the steps of the object position estimation method of this embodiment can likewise be implemented as a computer program executed by a computer. Such a program can be recorded on a computer-readable recording medium, such as an FD (Floppy (registered trademark) Disk), MO (magneto-optical disk), ROM (read-only memory), memory card, CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, CD-R, CD-RW, HDD, or removable disk, and stored or distributed in that form. The program can also be provided over a network, for example via the Internet or e-mail.

FIG. 1 is a basic configuration diagram of an image-storage-type object position estimation apparatus according to an embodiment of the present invention.
FIG. 2 is a basic configuration diagram of a real-time-processing-type object position estimation apparatus according to the embodiment of the present invention.
FIG. 3 is a processing flow diagram of Example 1 of the present invention.
FIG. 4 is an explanatory diagram showing an example of the model table of the present invention.
FIG. 5 is an explanatory diagram showing another example of the model table of the present invention.
FIG. 6 is an explanatory diagram showing an example of extracting an object region by background subtraction between a background image and each frame image.
FIG. 7 is an explanatory diagram showing an example of the spatial map of the present invention.
FIG. 8 is a processing flow diagram showing an example of object position estimation in the present invention.
FIG. 9 is an explanatory diagram showing an example result of object position estimation by the processing flow of FIG. 8.
FIG. 10 is an explanatory diagram showing another example result of object position estimation by the processing flow of FIG. 8.
FIG. 11 is an explanatory diagram of the positional relationship between the camera and objects, and of the lattice map on which the three-dimensional models are placed.
FIG. 12 is a basic configuration diagram of an image-storage-type object position estimation apparatus according to another embodiment of the present invention.
FIG. 13 is a basic configuration diagram of a real-time-processing-type object position estimation apparatus according to the other embodiment of the present invention.
FIG. 14 is a processing flow diagram showing another example of object position estimation in the present invention.
FIG. 15 is an explanatory diagram illustrating the object position estimation process of Example 3 of the present invention.
FIG. 16 is an explanatory diagram illustrating the object position estimation process of Example 3 of the present invention.
FIG. 17 is an explanatory diagram showing the processing result for FIG. 15.
FIG. 18 is an explanatory diagram showing the processing result for FIG. 16.
FIG. 19 is a basic configuration diagram of an image-storage-type object position estimation apparatus according to Example 4 of the present invention.
FIG. 20 is a basic configuration diagram of a real-time-processing-type object position estimation apparatus according to Example 4 of the present invention.
FIG. 21 is an explanatory diagram showing an example arrangement of the camera and the three-dimensional models.
FIG. 22 is an explanatory diagram showing an example projection result of the three-dimensional models.
FIG. 23 is an explanatory diagram showing, for Example 4 of the present invention, an example of extracting the object region by background subtraction from the projection-effective pixels.

Explanation of symbols

1: time-series image database; 2: object region extraction unit; 3, 3a to 3n: camera information data input units; 4, 4a to 4n: model table construction units; 5: spatial map generation unit; 6: object position estimation unit; 7: image input unit; 8: effective pixel setting unit.

Claims

An object position estimation device that extracts a target area from a video acquired by using one or a plurality of image input devices such as a camera, and estimates a position in the space from the target area,
From each time-series image acquired by each of the image input devices, an object region extraction unit that extracts a target region in an image processing manner,
Each pixel of the image obtained from the image input device indicates which projection of the three-dimensional model prepared by each pixel corresponding to each image input device extracted by the object region extraction means corresponds. A reference table indicating which spatial position is projected from which 3D model is projected is used for collation. If there is a projected 3D model, the vote value is obtained by voting for each model spatial position. A spatial map generating means for generating a map indicating that there is a high possibility of the presence of an object in a place where there are many
object position estimation means that:
- detects the position at which the vote value of the map generated by the spatial map generation means is maximal as the position of an object;
- sets a circular area of radius R centered on that position as an exclusive area;
- sets a circular area of radius R as another exclusive area in a part of the map where no exclusive area has yet been set;
- for each such other exclusive area, sums the vote values obtained at the positions within it; and
- when the sum is equal to or greater than a predetermined value, sets the center of that other exclusive area as the position of an object.
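The voting and exclusive-area search in the claim above can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions:
- The reference table is a dictionary. It maps a `(camera_id, u, v)` pixel key to the map positions of the models that cover that pixel.
- Candidate centers are visited in descending vote order, since the claim leaves the visiting order unspecified.
- Only positions not yet claimed by an earlier exclusive area are summed.
- `build_vote_map`, `estimate_positions`, `radius` and `min_sum` are hypothetical names.

```python
import math
from collections import defaultdict


def build_vote_map(foreground_pixels, reference_table):
    """Cast one vote per (foreground pixel, covering model) pair.

    foreground_pixels: iterable of (camera_id, u, v) keys marked as object region.
    reference_table: hypothetical dict (camera_id, u, v) -> list of model positions.
    """
    votes = defaultdict(int)
    for key in foreground_pixels:
        for position in reference_table.get(key, ()):
            votes[position] += 1
    return dict(votes)


def estimate_positions(votes, radius, min_sum):
    """Exclusive-area search over a vote map {position: vote count}.

    The global maximum is always accepted as an object. Each later candidate
    that is not yet inside an exclusive area opens a new circular area of
    `radius`; the votes of the still-unclaimed positions inside it are summed,
    and its center is accepted as an object if the sum reaches `min_sum`.
    """
    claimed = set()
    objects = []
    # Assumed candidate order: highest vote first, ties broken by position.
    ordered = sorted(votes, key=lambda p: (-votes[p], p))
    for rank, center in enumerate(ordered):
        if center in claimed:
            continue  # already covered by an earlier exclusive area
        members = [p for p in votes
                   if p not in claimed and math.dist(p, center) <= radius]
        claimed.update(members)
        total = sum(votes[p] for p in members)
        if rank == 0 or total >= min_sum:
            objects.append(center)
    return objects
```

The quadratic scan is kept for readability. For large maps, a grid index over the positions would be the obvious optimization.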

The object position estimation apparatus according to claim 1, wherein the reference table is constructed as a table indicating the relationship between each image coordinate of a three-dimensional perspective projection image and the arrangement position of the three-dimensional model. The perspective projection image is generated by arranging a plurality of three-dimensional models at spatial positions on a given reference coordinate system and using the position and orientation of the image input device and the internal parameters specific to that image input device.
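The table construction in the claim above can be sketched in Python as follows. This is one possible construction under stated assumptions, not the patent's procedure. The assumptions are:
- A pinhole model x ~ K(RX + t).
- A hypothetical three-dimensional model: an upright box of half-width 0.25 and height 1.7 (map units) standing on the ground plane z = 0.
- The projected silhouette is approximated by the bounding rectangle of the projected box corners.
- One table per camera, keyed by `(u, v)`.

The names `project`, `box_corners` and `build_reference_table` are illustrative.

```python
import math
from collections import defaultdict


def project(point, K, R, t):
    """Pinhole projection x ~ K (R X + t); returns (u, v) or None behind the camera."""
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if zc <= 0:
        return None
    (fx, _, cx), (_, fy, cy), _ = K
    return fx * xc / zc + cx, fy * yc / zc + cy


def box_corners(x, y, half_width, height):
    """Hypothetical 3D model: an upright box standing on the ground plane z = 0."""
    return [(x + dx, y + dy, z)
            for dx in (-half_width, half_width)
            for dy in (-half_width, half_width)
            for z in (0.0, height)]


def build_reference_table(positions, K, R, t, image_size,
                          half_width=0.25, height=1.7):
    """Map each pixel (u, v) to the model positions whose projection covers it.

    The projected silhouette is approximated by the bounding rectangle of the
    projected box corners (a simplification, not an exact silhouette).
    """
    width, height_px = image_size
    table = defaultdict(list)
    for pos in positions:
        uv = [project(c, K, R, t) for c in box_corners(*pos, half_width, height)]
        if any(p is None for p in uv):
            continue  # model partly behind the camera; skipped in this sketch
        us, vs = [p[0] for p in uv], [p[1] for p in uv]
        for v in range(max(0, math.ceil(min(vs))), min(height_px - 1, math.floor(max(vs))) + 1):
            for u in range(max(0, math.ceil(min(us))), min(width - 1, math.floor(max(us))) + 1):
                table[(u, v)].append(pos)
    return dict(table)
```

In the accompanying check, the camera stands 2 units above the ground and looks horizontally along the map's y axis. With several cameras, one such table per camera (or keys prefixed with a camera id) would be needed.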

The object position estimation apparatus according to claim 1 or 2, wherein the object position estimation means quantizes the generated spatial map into a lattice. When the vote value in a quantized cell is equal to or greater than a predetermined value, the object position estimation means sets that cell as a place where an object exists.
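A minimal Python sketch of the lattice quantization in the claim above follows. It assumes that "the vote value in a quantized cell" means the sum of the votes of all map positions falling in that square cell. The claim does not define it further. The function name `occupied_cells` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def occupied_cells(votes, cell_size, threshold):
    """Quantize a vote map {(x, y): votes} into square cells of `cell_size`.

    Returns the set of cell indices whose summed vote is at least `threshold`.
    """
    cell_votes = defaultdict(int)
    for (x, y), v in votes.items():
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cell_votes[cell] += v
    return {cell for cell, total in cell_votes.items() if total >= threshold}
```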

The object position estimation apparatus according to claim 3, wherein the lattice quantization of the spatial map is set coarser along the direction in which an image input device faces. This applies when the angle between that direction and the installation surface of the device is smaller than a predetermined angle.
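One way to realize the direction-dependent quantization in the claim above is sketched below in Python. The map is quantized in a grid aligned with the camera's viewing direction projected onto the map plane. The step along that direction becomes coarse when the camera's tilt against the installation surface falls below a threshold.

This sketch reflects our reading of the claim. The step sizes, the 30-degree threshold and the names `anisotropic_cells`, `heading_rad` and `tilt_deg` are hypothetical. The rationale is also our assumption, not stated in the claim: at shallow angles, the image constrains position along the line of sight poorly.

```python
import math
from collections import defaultdict


def anisotropic_cells(votes, camera_xy, heading_rad, tilt_deg,
                      fine=0.5, coarse=1.5, min_tilt_deg=30.0):
    """Sum votes {(x, y): votes} into a camera-aligned grid.

    heading_rad: viewing direction of the camera projected onto the map plane.
    tilt_deg: angle between the optical axis and the installation surface.
    The step along the viewing direction is coarse when tilt < min_tilt_deg.
    """
    step_along = coarse if tilt_deg < min_tilt_deg else fine
    ca, sa = math.cos(heading_rad), math.sin(heading_rad)
    cx, cy = camera_xy
    cell_votes = defaultdict(int)
    for (x, y), v in votes.items():
        dx, dy = x - cx, y - cy
        along = dx * ca + dy * sa      # distance along the viewing direction
        across = -dx * sa + dy * ca    # offset perpendicular to it
        cell = (math.floor(along / step_along), math.floor(across / fine))
        cell_votes[cell] += v
    return dict(cell_votes)
```

The resulting cell totals can then be thresholded as in the uniform lattice case.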

The object position estimation apparatus according to any one of claims 1 to 4, further comprising effective pixel setting means that sets, as projection effective pixels, the region of the image onto which the prepared three-dimensional models are projected. The reference table is constructed using the projection effective pixels set by the effective pixel setting means. The object region extraction means extracts only the region of the projection effective pixels set by the effective pixel setting means.
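The effective-pixel restriction in the claim above can be sketched in Python as follows. Background difference, one of the extraction methods the document mentions, is evaluated only at the projection effective pixels. The sketch makes these assumptions:
- The reference table is a per-camera dict from `(u, v)` to covering model positions.
- Images are 2D lists of grey levels indexed `[v][u]`.
- The threshold of 30 is arbitrary.
- `effective_pixels` and `extract_foreground` are hypothetical names.

```python
def effective_pixels(reference_table):
    """Projection effective pixels: every (u, v) covered by at least one model."""
    return {pixel for pixel, models in reference_table.items() if models}


def extract_foreground(frame, background, effective, threshold=30):
    """Background difference evaluated only at projection effective pixels.

    frame, background: 2D lists of grey levels indexed as [v][u].
    Returns the set of (u, v) pixels whose difference reaches `threshold`.
    """
    return {(u, v) for (u, v) in effective
            if abs(frame[v][u] - background[v][u]) >= threshold}
```

Restricting extraction this way saves computation. It also drops changed pixels that no model position can explain, so they cannot contribute stray votes to the spatial map.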

An object position estimation method for extracting a target region from video acquired with one or more image input devices such as cameras and estimating the position of the target in space from that region, the method comprising:
an object region extraction step in which object region extraction means extracts the region of an object by image processing from each time-series image acquired by each of the image input devices;
a reference table construction step in which reference table construction means constructs a reference table indicating, for each pixel of the image obtained from each image input device, the spatial position from which that pixel is projected;
a spatial map generation step in which spatial map generation means uses the reference table to collate each pixel extracted in the object region extraction step for each image input device. For each such pixel, it determines which of the prepared three-dimensional models projects onto it. If a projected three-dimensional model exists, it casts a vote for the spatial position of that model. It thereby generates a map in which places with large vote values indicate a high likelihood that an object is present; and
an object position estimation step in which object position estimation means estimates the position of an object based on the vote values of the map generated in the spatial map generation step;
wherein the object position estimation step includes:
- a first sub-step of detecting the position at which the vote value of the map generated in the spatial map generation step is maximal and setting it as the position of an object;
- a second sub-step of setting a circular area of radius R centered on the position of the object as an exclusive area;
- a third sub-step of setting a circular area of radius R as another exclusive area in a part of the map where no exclusive area has been set, and summing the vote values obtained within that other exclusive area; and
- a fourth sub-step of setting the center of the other exclusive area as the position of an object when the summed value is equal to or greater than a predetermined value;

and wherein the second to fourth sub-steps are repeated over the whole map.

The object position estimation method according to claim 6, wherein the reference table construction step:
- arranges a plurality of three-dimensional models at spatial positions on a reference coordinate system;
- generates perspective projection images of the three-dimensional shapes using the position and orientation of the image input device and the internal parameters specific to that image input device; and
- associates each image coordinate of the perspective projection image with the arrangement position of the three-dimensional model.

The object position estimation method according to claim 6 or 7, wherein the object position estimation step quantizes the generated spatial map into a lattice. When the vote value in a quantized cell is equal to or greater than a predetermined value, the step sets that cell as a place where an object exists.

The object position estimation method according to claim 8, wherein the lattice quantization of the spatial map is set coarser along the direction in which an image input device faces. This applies when the angle between that direction and the installation surface of the device is smaller than a predetermined angle.

The object position estimation method according to any one of claims 6 to 9, further comprising an effective pixel setting step of setting, as projection effective pixels, the region of the image onto which the prepared three-dimensional models are projected. The reference table construction step constructs the reference table using the projection effective pixels set in the effective pixel setting step. The object region extraction step extracts only the region of the projection effective pixels set in the effective pixel setting step.

An object position estimation program for causing a computer to execute the steps according to any one of claims 6 to 10.

A computer-readable recording medium on which the object position estimation program according to claim 11 is recorded.