JP2006209334A

JP2006209334A - Position detection apparatus, method and program

Info

Publication number: JP2006209334A
Application number: JP2005018438A
Authority: JP
Inventors: Itaru Kitahara; 格北原; Kiyoshi Kogure; 潔小暮; Norihiro Hagita; 紀博萩田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-01-26
Filing date: 2005-01-26
Publication date: 2006-08-10
Anticipated expiration: 2025-01-26
Also published as: JP4674316B2

Abstract

PROBLEM TO BE SOLVED: To provide a position detection apparatus capable of detecting the two-dimensional position of an object of varying height, such as a human, through a simple detection process using one photographing means.

SOLUTION: A video camera 11 photographs a person to obtain a two-dimensional image. A projection unit 12 extracts a person region from the obtained two-dimensional image and projects the person region onto a plurality of horizontal planes within a three-dimensional space. An integration unit 13 calculates an integral value by integrating the mappings of the person projected onto the horizontal planes. A detection unit 14 detects, as the position of the person, the horizontal position within the three-dimensional space at which the peak of the calculated integral value is located, and detects, as the height of the person, the height of the highest horizontal plane on which a mapping of the person exists at that horizontal position.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a position detection apparatus and a position detection method for detecting the three-dimensional position of an object, for example a human, using a single photographing means.

In recent years, advances in image acquisition technology and networking have led to image monitoring systems being installed throughout cities. In the near future, such systems are expected to be used for a variety of important purposes, such as watching over people who are at risk in terms of safety and providing useful information to pedestrians, especially the elderly and children. Analyzing the data provided by these services can improve quality of life.

An image monitoring system of this kind needs to detect and track the positions of people. For example, tracking the three-dimensional positions of people with stereo techniques using a plurality of cameras has been reported (see Non-Patent Document 1). When people are tracked with such stereo techniques, the three-dimensional position of each person can be detected accurately even when people overlap one another.

Non-Patent Document 1: A. Mittal et al., "M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene", International Journal of Computer Vision, 2003, Vol. 51(3), pp. 189-203

However, when a plurality of cameras are used as described above, a large amount of image data must be processed, which increases the computational cost. In addition, a plurality of cameras are required for each shooting range, so covering a wide area requires a large number of cameras, which increases the cost of the apparatus. For this reason, it is conceivable to detect the position of a human by projecting an image captured by one camera onto a virtual plane placed at a predetermined height in the three-dimensional space. In this case, however, the height of the horizontal plane is fixed, so it is difficult to detect an accurate two-dimensional position when the height of the measured object does not match the assumed height of the plane. Because the height of humans, which are the measurement target of the present invention, generally differs from individual to individual, the problem of a fixed measurable height has a serious impact.

An object of the present invention is to provide a position detection apparatus, a position detection method, and a position detection program capable of detecting the two-dimensional position of objects of various heights, such as humans, by a simple detection process using a single photographing means.

The position detection apparatus according to the present invention comprises: a single photographing means for photographing an object and acquiring a two-dimensional image including the object; projecting means for extracting an object region from the two-dimensional image acquired by the photographing means and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space; integrating means for integrating the mappings of the object projected onto the horizontal planes by the projecting means to calculate an integral value of the mappings; and detecting means for detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the integral value calculated by the integrating means is located.

In the position detection apparatus according to the present invention, the object is photographed by the single photographing means to acquire a two-dimensional image including the object; the object region extracted from the acquired two-dimensional image is projected onto a plurality of horizontal planes set in advance at predetermined intervals in the three-dimensional space; the mappings of the object projected onto the horizontal planes are integrated to calculate an integral value of the mappings; and the horizontal position in the three-dimensional space at which the peak of the calculated integral value is located is detected as the position of the object.

In this way, on the assumption that the object is standing upright, the position of the peak of the integral value is detected as the horizontal position of the object, and the height of the highest horizontal plane on which a mapping of the object exists at that peak position is detected as the height of the object. The two-dimensional position of objects of various heights, such as humans, can therefore be detected using only the image captured by the single photographing means. Moreover, because only the image captured by the single photographing means is used, the detection process can be simplified.

Preferably, the detecting means detects, as the position of the object, the horizontal position in the three-dimensional space at which the convolution of the integral value with an approximate filter approximating the three-dimensional shape of the object is maximized.

In this case, the approximate filter allows only the object to be extracted, so detection is robust against disturbances caused by other objects and the like.

Preferably, the detecting means detects, as the height of the object, the height of the highest horizontal plane on which a mapping of the object exists at the horizontal position in the three-dimensional space at which the peak is located.

In this case, the height of the highest horizontal plane on which a mapping of the object exists at the horizontal position of the peak can be detected as the height of the object, so the position of the top of the three-dimensional object (the two-dimensional coordinates at which the object is located on the map, together with its height) can be detected as the three-dimensional position of the object.

Preferably, when the integral value calculated by the integrating means has a plurality of peaks, the detecting means detects, for each peak whose height differs from the height of the valley between adjacent peaks by a predetermined interval or more, the horizontal position in the three-dimensional space at which that peak is located as the position of an object, and detects the height of the highest horizontal plane on which a mapping of the object exists at that horizontal position as the height of the object.

In this case, each peak whose height differs from the height of the valley between adjacent peaks by the predetermined interval or more can be judged to be caused by an object, so the three-dimensional position of each object can be detected with high accuracy even when a plurality of objects are close to one another.

Preferably, when the difference between the height of the valley between adjacent peaks and the height of the peak is less than the predetermined interval, the detecting means determines that a first object overlaps a second object that is farther from the photographing means than the first object, detects the horizontal position in the three-dimensional space at which a peak equal to or higher than a reference height is located as the position of the first object, and uses the height detected a predetermined time earlier as the height of the first object.

In this case, even if the first and second objects are so close that they overlap in the captured image, the three-dimensional position of the first object, which is on the photographing means side, can be detected with high accuracy.

Preferably, the detecting means detects, as the position of the second object, the horizontal position in the three-dimensional space of the point at which a straight line connecting the photographing center position of the photographing means in the three-dimensional space and the three-dimensional position of the peak equal to or higher than the reference height intersects the horizontal plane located at the height detected a predetermined time earlier as the height of the second object, and uses that previously detected height as the height of the second object.

In this case, even if the first and second objects are so close that they overlap in the captured image, the three-dimensional position of the second object, which is farther from the photographing means than the first object, can be detected with high accuracy.

The position detection method according to the present invention includes: a first step of photographing an object using a single photographing means to acquire a two-dimensional image including the object; a second step of extracting an object region from the acquired two-dimensional image and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space; a third step of integrating the mappings of the object projected onto the horizontal planes to calculate an integral value of the mappings; and a step of detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the calculated integral value is located, and detecting, as the height of the object, the height of the highest horizontal plane on which a mapping of the object exists at that horizontal position.

The position detection program according to the present invention causes a computer to function as: projecting means for extracting an object region from a two-dimensional image acquired by a single photographing means that photographs an object and acquires a two-dimensional image including the object, and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space; integrating means for integrating the mappings of the object projected onto the horizontal planes by the projecting means to calculate an integral value of the mappings; and detecting means for detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the integral value calculated by the integrating means is located.

According to the present invention, on the assumption that the object is standing upright, the position of the peak of the integral value is detected as the horizontal position of the object. The two-dimensional position of objects of various heights, such as humans, can therefore be detected using only the image captured by a single photographing means, and the detection process can be simplified.

Hereinafter, a position detection apparatus according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the position detection apparatus according to the embodiment.

The position detection apparatus shown in FIG. 1 includes a single video camera 11, a projection unit 12, an integration unit 13, a detection unit 14, and a tracking unit 15. The projection unit 12, the integration unit 13, the detection unit 14, and the tracking unit 15 are realized by using a computer that includes an input device, a ROM (read-only memory), a CPU (central processing unit), a RAM (random access memory), an image I/F (interface) unit, an external storage device, and the like, and by having the CPU or the like execute a position detection program that performs the processes described later. The configuration of the projection unit 12, the integration unit 13, the detection unit 14, and the tracking unit 15 is not limited to this example. Various modifications are possible, such as implementing each block in dedicated hardware, or implementing only some blocks, or only part of the processing within a block, in dedicated hardware.

The video camera 11 is an ordinary surveillance camera or the like and is mounted at a predetermined position in the shooting space, for example in a corner of the ceiling. The video camera 11 is calibrated by the calibration method described later. It photographs a human, which is the object, acquires a two-dimensional image including the human, and outputs the image to the projection unit 12. An existing surveillance camera may be used as the video camera 11.

The projection unit 12 extracts a person region, which serves as the object region, from the acquired two-dimensional image and projects the extracted person region onto a plurality of horizontal planes set in advance at predetermined intervals in the three-dimensional space (the height of each horizontal plane is known). A known image processing technique such as background subtraction can be used to separate the background region (everything other than the human) from the person region (the human silhouette) in the two-dimensional image.
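The background-subtraction step mentioned above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a fixed reference background. The function name `person_mask` and the threshold value are hypothetical and are not taken from the patent, which only says that a known technique such as background subtraction may be used.

```python
def person_mask(frame, background, threshold=30):
    """Return a binary silhouette: 1 where the frame differs from the background.

    frame, background: equally sized nested lists of grayscale values.
    threshold: assumed intensity difference above which a pixel is foreground.
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [[10, 10], [10, 10]]
frame = [[10, 200], [12, 11]]
mask = person_mask(frame, background)
```

Only the pixel whose intensity changed markedly is marked as part of the person region. A practical system would likely use a more robust background model.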

The calibration method for the video camera 11 is as follows. To project the person region image onto the plurality of horizontal planes set in the three-dimensional space, two-dimensional projective transformation matrices are needed between the image captured by the video camera 11 and two reference planes (horizontal planes) set in the three-dimensional space. One of the two reference planes is located at ground level (height 0) and the other at height Yh. Four calibration bars, each taller than an ordinary human, are set up in the three-dimensional shooting space perpendicular to the ground and at predetermined intervals, so that the four bars form a rectangular box. Colored markers are attached to the top and bottom of each calibration bar, and the two-dimensional projective transformation matrices H0 and H1 are calculated by observing the four markers on each reference plane. If Yn is the height of an intermediate horizontal plane located between the two reference planes, the two-dimensional projective transformation matrix Hn of that plane can be estimated by interpolating between H0 and H1, as shown in the following equation (1).

Hn = ((Yh - Yn) H0 + Yn H1) / Yh   (1)

By accurately measuring the three-dimensional positions of the calibration bars at each installation position as described above, conversion between the plurality of horizontal planes and the three-dimensional world coordinate system can be performed easily, and the video camera 11 can be calibrated simply.
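The interpolation in equation (1) can be sketched in a few lines. This is only an illustration: the function name `interpolate_homography` and the example matrices are hypothetical, and the 3x3 matrices are represented as plain nested lists.

```python
def interpolate_homography(h0, h1, yn, yh):
    """Blend two 3x3 matrices as in equation (1): ((Yh - Yn) H0 + Yn H1) / Yh."""
    if yh <= 0:
        raise ValueError("yh must be positive")
    w0 = (yh - yn) / yh  # weight of the ground-plane matrix H0
    w1 = yn / yh         # weight of the upper reference-plane matrix H1
    return [
        [w0 * a + w1 * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(h0, h1)
    ]


# Made-up example matrices, not calibration results from the patent.
H0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H1 = [[2.0, 0.0, 4.0], [0.0, 2.0, 8.0], [0.0, 0.0, 1.0]]
H_mid = interpolate_homography(H0, H1, yn=1.0, yh=2.0)
```

At Yn = 0 the result equals H0 and at Yn = Yh it equals H1, so the two calibrated planes are reproduced exactly and the intermediate planes are approximated.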

The integration unit 13 integrates the mappings of the human projected onto the horizontal planes, calculates the integral value of the mappings, and outputs it to the detection unit 14. The detection unit 14 detects, as the position of the human, the horizontal position in the three-dimensional space at which the peak of the calculated integral value is located, and detects, as the height of the person, the height of the highest horizontal plane on which a mapping of the person region exists at that horizontal position. The tracking unit 15 tracks the human using the detected three-dimensional position and stores the three-dimensional position (horizontal position and height) of the human at each observation time.

When the video camera 11 is a movable camera that can move in the pan and tilt directions, the tracking unit 15 may control the shooting position of the video camera 11 according to the tracking result. In this case, by calibrating the camera at that shooting position, the object can always be photographed from the optimum observation position, and the three-dimensional position can be detected with high accuracy.

Specifically, the detection unit 14 detects the three-dimensional position of the human by detecting, as the position of the object, the horizontal position in the three-dimensional space at which the convolution of the integral value with an approximate filter approximating the three-dimensional shape of the object is maximized, and by detecting, as the height of the human, the height of the highest horizontal plane on which a mapping of the human exists at that position.

When a plurality of people close to one another are photographed and the integral value has a plurality of peaks, the detection unit 14 detects, for each peak whose height differs from the height of the valley between adjacent peaks by a predetermined interval or more, for example 1 m or more, the horizontal position in the three-dimensional space at which that peak is located as the position of a human, and detects the height of the highest horizontal plane on which a mapping of the human exists at that horizontal position as the height of the object.

Furthermore, when a plurality of people are so close that they overlap in the captured image and the difference between the height of the valley between adjacent peaks and the height of the peak is less than the predetermined interval, the detection unit 14 determines that two or more people overlap. It detects the horizontal position in the three-dimensional space at which a peak equal to or higher than a reference height, for example 2.5 m, is located as the position of the person nearer to the video camera 11. As the height of this person, it reads from the tracking unit 15 the height detected a predetermined time earlier, for example one or several frames earlier, and uses the read height as the current height.

At this time, as the height of the person farther from the video camera 11, the detection unit 14 reads from the tracking unit 15 the height detected a predetermined time earlier, for example one or several frames earlier, and uses the read height as the current height. It then detects, as the position of this person, the horizontal position in the three-dimensional space of the point at which a straight line connecting the camera center position of the video camera 11 in the three-dimensional space and the three-dimensional position of the peak equal to or higher than the reference height intersects the horizontal plane located at the height read from the tracking unit 15.
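The line-plane intersection used for the farther, partly hidden person can be written out as a small geometric sketch. It assumes points are given as (X, Y, Z) tuples with Y as the height axis. The function name `occluded_position` and the example coordinates are hypothetical.

```python
def occluded_position(camera, peak, height):
    """Intersect the line through the camera center and the peak with the plane Y = height.

    camera, peak: (X, Y, Z) tuples, with Y the height axis.
    height: previously tracked height of the farther person.
    Returns the horizontal position (X, Z) of the intersection.
    """
    cx, cy, cz = camera
    px, py, pz = peak
    if py == cy:
        raise ValueError("line is parallel to the horizontal planes")
    t = (height - cy) / (py - cy)  # line parameter at which Y equals height
    return (cx + t * (px - cx), cz + t * (pz - cz))


# Made-up values: camera 3 m high, peak found at 2.5 m, stored height 1.5 m.
position = occluded_position((0.0, 3.0, 0.0), (2.0, 2.5, 1.0), 1.5)
```

Because the stored height lies below the peak, the intersection falls beyond the peak as seen from the camera, which is consistent with the second person standing behind the first.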

As described above, in the present embodiment a plurality of horizontal planes, rather than a single horizontal plane, are set in the three-dimensional space, and only a monocular image captured by the calibrated video camera 11 is used. When a three-dimensional position is estimated using only this monocular image, that is, only two-dimensional information, one dimension is missing. To compensate for this, the human is assumed to be standing upright. The person region, which is the silhouette of the human, is projected onto the plurality of horizontal planes, and the peak of the integral value obtained by integrating the projected regions along the vertical axis is used to detect the three-dimensional position (horizontal position and height) of the human, so that people of various heights can be tracked in the three-dimensional space.

In the present embodiment, the video camera 11 corresponds to an example of the photographing means, the projection unit 12 to an example of the projecting means, the integration unit 13 to an example of the integrating means, and the detection unit 14 to an example of the detecting means.

Next, the normal three-dimensional position detection process, which detects the three-dimensional position of a person using an image of that single person, will be described. FIG. 2 is a diagram illustrating the principle of the normal three-dimensional position detection process performed by the position detection apparatus shown in FIG. 1.

As shown in FIG. 2, a plurality of horizontal planes HP are set in the three-dimensional space. The two-dimensional coordinates on each horizontal plane HP are defined by the X axis and the Z axis, and the height direction is the Y axis. For ease of explanation, the human OB, which is the object, is drawn as a bar in the figure. First, the projection unit 12 projects the person region S segmented from the two-dimensional image captured by the video camera 11 onto each horizontal plane n (each horizontal plane HP) while varying the height Yn (0 ≤ Yn ≤ Yh). The projection result P(Yn) can be expressed in terms of the height Yn by the following equation (2).

P(Yn) = Hn S   (2)

Here, assuming that the human OB is standing upright, the projected person region PR (the hatched portion in the figure) always contains the point (X, Z) on the horizontal plane n at which the actual human is located, so the three-dimensional position of the person can be estimated by merging all of the horizontal planes n.
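The projection in equation (2) can be illustrated by applying a 3x3 projective transformation to silhouette pixels. This sketch assumes that Hn maps homogeneous image coordinates (u, v, 1) to plane coordinates (X, Z), which is one reading of equation (2). The function name `project_points` and the example matrix are hypothetical.

```python
def project_points(h, pixels):
    """Map image pixels (u, v) to plane coordinates (X, Z) using a 3x3 matrix h."""
    projected = []
    for u, v in pixels:
        x = h[0][0] * u + h[0][1] * v + h[0][2]
        z = h[1][0] * u + h[1][1] * v + h[1][2]
        w = h[2][0] * u + h[2][1] * v + h[2][2]
        if w == 0:
            continue  # point maps to infinity; skip it
        projected.append((x / w, z / w))  # divide out the homogeneous scale
    return projected


# Made-up matrix and pixel, only to show the homogeneous division.
Hn = [[2.0, 0.0, 1.0], [0.0, 2.0, 3.0], [0.0, 0.0, 2.0]]
points = project_points(Hn, [(4.0, 6.0)])
```

Repeating this for every silhouette pixel and every plane height Yn gives the set of projected regions P(Yn) that are merged in the next step.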

The integration unit 13 uses equation (3) to calculate the integral value Itg(X, Z) of the person region P(Y), which is the mapping of the object, along the vertical axis Y at the point (X, Z).
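Equation (3) itself is not reproduced in this text, so the following is only a discrete sketch of the integration it describes. It assumes that each projected plane has been rasterized into a binary occupancy grid over a common (X, Z) grid and that the integral along Y is approximated by a sum over planes multiplied by the plane spacing. The function name `integrate_planes` and the sample grids are hypothetical.

```python
def integrate_planes(planes, spacing=1.0):
    """Approximate Itg(X, Z) by summing occupancy grids over all horizontal planes.

    planes: list of equally sized binary grids, one per plane height.
    spacing: assumed vertical distance between neighbouring planes.
    """
    rows, cols = len(planes[0]), len(planes[0][0])
    itg = [[0.0] * cols for _ in range(rows)]
    for grid in planes:
        for i in range(rows):
            for j in range(cols):
                itg[i][j] += grid[i][j] * spacing
    return itg


planes = [
    [[0, 1], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
]
itg = integrate_planes(planes)
```

The cell that is occupied on every plane accumulates the largest value, which is the behaviour the upright-person assumption relies on.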

To detect the peak of the integral value, a three-dimensional convex filter, which is an approximate filter approximating the shape of a human, is used. For example, the cylindrical filter Cvx(X, Z) given by equation (4) can be used, where r is the radius of the base of the cylinder. The approximate filter is not limited to this example, and various filters can be used depending on the three-dimensional shape of the object and the like. The method of detecting the peak of the integral value is likewise not limited to the method described below, and various methods can be used. For example, the peak may be detected by differentiating the integral value given by equation (3), that is, by using a differential filter.

The detection unit 14 calculates the convolution of the convex filter Cvx(X, Z) with the integral value Itg(X, Z) and detects the coordinates (Xm, Zm) at which the convolution is maximized as the horizontal position of the human. It then detects the maximum height of the person region at the horizontal position (Xm, Zm), that is, the height of the highest horizontal plane on which a mapping of the human exists, as the height Ym of the human. In the example shown in FIG. 2, the convolution CV is calculated, the coordinates (Xm, Zm) at which its peak PI is located are detected as the horizontal position of the human, and the maximum height of the person region at those coordinates is detected as the height Ym of the human.
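The detection step can be sketched on a small grid. Equation (4) is not reproduced in this text, so the sketch assumes a flat disc of radius r cells as the footprint of the cylindrical filter. The function names, the grids, and the heights are all hypothetical illustration values.

```python
def disc_offsets(r):
    """Grid offsets inside a disc of radius r cells (assumed cylinder footprint)."""
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if di * di + dj * dj <= r * r]


def detect_person(planes, heights, r=1):
    """Return the peak cell of the filtered integral and the height found there."""
    rows, cols = len(planes[0]), len(planes[0][0])
    # Discrete integral: number of planes on which each cell is occupied.
    itg = [[sum(g[i][j] for g in planes) for j in range(cols)] for i in range(rows)]
    offsets = disc_offsets(r)
    best_score, best_cell = -1, None
    for i in range(rows):
        for j in range(cols):
            # The disc is symmetric, so this sum equals the convolution value.
            score = sum(itg[i + di][j + dj] for di, dj in offsets
                        if 0 <= i + di < rows and 0 <= j + dj < cols)
            if score > best_score:
                best_score, best_cell = score, (i, j)
    i, j = best_cell
    occupied = [h for g, h in zip(planes, heights) if g[i][j]]
    return best_cell, (max(occupied) if occupied else None)


planes = [
    [[0, 0, 0, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 0, 0]],  # plane at 0.0 m
    [[0, 0, 0, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 0]],  # plane at 0.5 m
    [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]],  # plane at 1.0 m
    [[0, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 0]],  # plane at 1.5 m
]
heights = [0.0, 0.5, 1.0, 1.5]
cell, height = detect_person(planes, heights)
```

In this toy data the projected silhouette slides sideways from plane to plane, but the true position at cell (1, 2) stays occupied up to 1.0 m, so it collects the highest filtered score and yields the detected height.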

During the integration described above, segmentation errors such as small holes and cracks in the image are smoothed out, so the regions need not be segmented perfectly. In addition, the convex filter Cvx(X, Z) allows only humans to be extracted, which makes detection robust against disturbances caused by other objects and the like.

Next, the multiple three-dimensional position detection process, which detects the three-dimensional position of each person using an image of a plurality of people, will be described. When a plurality of people appear in the input image and the person regions can be separated from one another, the first multiple three-dimensional position detection process is used: the projection unit 12 divides the two-dimensional image into divided regions that each contain only one person region, and the integration unit 13 and the detection unit 14 repeat the normal three-dimensional position detection process for the one person in each divided region, thereby detecting the three-dimensional positions of all of the people.

However, when, for example, the photographed people are holding hands, parts of the two person regions are joined and the two regions are weakly connected, so the regions cannot be divided as described above. In this case, the second multiple three-dimensional position detection process is used: the normal three-dimensional position detection process described above is executed with the connected region, in which parts of several person regions are joined, treated as a single region.

That is, as described above, the projection unit 12 projects the connected region, in which parts of several person regions are joined, onto the plurality of horizontal planes as a single region, the integration unit 13 calculates the integral value Itg(Xn, Zn) of the projected region, and the detection unit 14 calculates the convolution of the convex filter Cvx(Xn, Zn) with the integral value Itg(Xn, Zn).

Next, the detection unit 14 compares the difference between the height of each convolution peak and the height of the valley located between adjacent convolution peaks with a predetermined interval stored in advance, and detects the peaks for which this difference is at least the predetermined interval. The detection unit 14 then compares each detected convolution peak with a predetermined value stored in advance, for example 1.5 m, and extracts the peaks that are at least the predetermined value. For each extracted peak, the detection unit 14 detects the coordinate point (Xn, Zn) at which the peak is located as the horizontal position of one person, and detects the maximum height of the person region at the horizontal position (Xn, Zn) as that person's height Yn. This processing is applied to the entire three-dimensional space, and the three-dimensional positions of the people are detected one after another. The peak heights and valley heights are detected in the same way as a person's three-dimensional position: the coordinate points at which a peak and a valley are located are detected as their horizontal positions, and the maximum height of the person region at each detected horizontal position is taken as the corresponding height.

With the above processing, each peak whose height exceeds the height of the valley between adjacent convolution peaks by at least the predetermined interval can be judged to be caused by a person, so the three-dimensional position of each person can be detected with high accuracy even when several people are close to one another.
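The two-stage peak selection described above can be sketched as follows. For brevity the sketch works on a one-dimensional slice of height values rather than on the full (X, Z) plane, and the 0.3 m value used for the predetermined interval is a hypothetical choice, since the specification does not state one. Only the 1.5 m minimum height comes from the text. The function name `select_peaks` is an assumption.

```python
def select_peaks(profile, min_gap, min_height=1.5):
    """Return indices of peaks that pass both tests from the text.

    profile    -- height values along a 1-D slice (assumed layout)
    min_gap    -- predetermined interval between peak and valley height
    min_height -- predetermined value for the peak itself (1.5 m in the text)
    """
    last = len(profile) - 1
    # Local maxima; a plateau is reported once, at its left edge.
    peaks = [
        i for i in range(len(profile))
        if (i == 0 or profile[i] > profile[i - 1])
        and (i == last or profile[i] >= profile[i + 1])
    ]
    kept = []
    for k, p in enumerate(peaks):
        gaps = []
        if k > 0:  # valley between this peak and the previous one
            gaps.append(profile[p] - min(profile[peaks[k - 1]:p + 1]))
        if k + 1 < len(peaks):  # valley between this peak and the next one
            gaps.append(profile[p] - min(profile[p:peaks[k + 1] + 1]))
        # A lone peak has no neighbouring valley, so only the height test applies.
        if all(g >= min_gap for g in gaps) and profile[p] >= min_height:
            kept.append(p)
    return kept


# Two people side by side, weakly connected: both peaks are kept.
separated = select_peaks([0.0, 0.9, 1.7, 0.9, 0.5, 1.0, 1.6, 0.8, 0.1], min_gap=0.3)

# Strongly overlapping people: the valley is too shallow, so no peak is kept.
merged = select_peaks([0.0, 1.6, 1.5, 1.65, 0.0], min_gap=0.3)
```

In this sketch an empty result for the second profile corresponds to the strongly connected case, where the valley-based separation does not apply and a different process is needed.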

Next, when another person stands on the straight line connecting the camera center of the video camera 11 and a person, the person regions of the two people overlap heavily and the two regions are strongly connected. The difference between the height of the convolution peak and the height of the valley located between adjacent convolution peaks then falls below the predetermined interval, and the filter processing described above has difficulty dividing the connected region, so the second multiple three-dimensional position detection process cannot be applied. In this case, the third multiple three-dimensional position detection process is executed, which detects the positions of the overlapping people using the positions of each person already detected several frames earlier. When two or more people are in the same projection region, the following processing is repeated as many times as there are people.

First, as described above, the projection unit 12 projects the connected region containing the two people onto the plurality of horizontal planes, and the integration unit 13 calculates the integral value Itg(Xn, Zn) of the projected region. The detection unit 14 calculates the convolution of the convex filter Cvx(Xn, Zn) with the integral value Itg(Xn, Zn) and detects the maximum height Y0(t)' of the person region at the coordinates (X0(t), Z0(t)) at which the convolution value is maximum, where t is a time parameter.

Next, the detection unit 14 compares the detected height Y0(t)' with a reference value Yr stored in advance (for example, 2.5 m). When the height Y0(t)' is at least the reference value Yr, the detection unit 14 determines that the person closer to the video camera 11 is at the horizontal position (X0(t), Z0(t)), reads from the tracking unit 15 the height Y0(t-1) of the same person (the person closer to the video camera 11) detected from the previous frame, and uses the read height Y0(t-1) as the current height Y0(t).

With the above processing, the detection unit 14 obtains the horizontal position (X0(t), Z0(t)) and the height Y0(t-1) as the three-dimensional position of the person closer to the video camera 11. The three-dimensional position of that person can therefore be detected with high accuracy even when the two people are so close that they overlap in the captured image.

Next, the detection unit 14 reads from the tracking unit 15 the height Y1(t-1) of the same person detected from the previous frame as the person farther from the video camera 11, and uses the read height Y1(t-1) as the current height Y1(t) of that person. It also sets a straight line connecting the coordinates (X0(t), Y0(t)', Z0(t)) detected as a peak at or above the reference value Yr and the camera center (Xc, Yc, Zc) of the video camera 11, and detects the coordinates (X1(t), Z1(t)) of the intersection of this line with the horizontal plane located at the read height Y1(t-1) as the position of the person farther from the video camera 11.

With the above processing, the detection unit 14 obtains the horizontal position (X1(t), Z1(t)) and the height Y1(t-1) as the three-dimensional position of the person farther from the video camera 11. The three-dimensional position of that person can therefore be detected with high accuracy even when the two people are so close that they overlap in the captured image.
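The geometry of the third process can be sketched as a line and plane intersection. The sketch below assumes Y is the vertical axis, that positions are `(x, y, z)` tuples in metres, and that the previous-frame heights have already been read from the tracking unit. The function name `resolve_overlap` and the sample coordinates are hypothetical, and only the 2.5 m reference value comes from the text.

```python
def resolve_overlap(peak, camera, y_near_prev, y_far_prev, y_ref=2.5):
    """Estimate positions of two overlapping people from a single peak.

    peak        -- (X0, Y0', Z0): peak position and detected height
    camera      -- (Xc, Yc, Zc): camera center
    y_near_prev -- Y0(t-1), previous height of the nearer person
    y_far_prev  -- Y1(t-1), previous height of the farther person
    Returns (near, far) as (x, y, z) tuples, or None when the detected
    height is below the reference value and no overlap is assumed.
    """
    x0, y0, z0 = peak
    xc, yc, zc = camera
    if y0 < y_ref:
        return None
    if y0 == yc:
        raise ValueError("line through camera center is horizontal")
    # Nearer person: keep the detected horizontal position, reuse the old height.
    near = (x0, y_near_prev, z0)
    # Farther person: intersect the line camera -> peak with the plane Y = Y1(t-1).
    s = (y_far_prev - yc) / (y0 - yc)
    far = (xc + s * (x0 - xc), y_far_prev, zc + s * (z0 - zc))
    return near, far


# Hypothetical numbers: camera 3.0 m high, apparent peak height 2.5 m.
near, far = resolve_overlap(
    peak=(1.0, 2.5, 2.0),
    camera=(0.0, 3.0, 0.0),
    y_near_prev=1.7,
    y_far_prev=1.5,
)
```

With these sample values the line parameter is s = (1.5 - 3.0) / (2.5 - 3.0) = 3, so the farther person is placed three times as far from the camera along the line as the detected peak.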

The same-person prediction processing mentioned above can be performed, for example, as follows. The tracking unit 15 stores the three-dimensional position of each person detected in each frame and uses these positions to predict the current three-dimensional position by velocity prediction and/or acceleration prediction of the person's movement (for example, constant-velocity movement). The person whose predicted three-dimensional position is closest to the detected current horizontal position is then judged to be the same person as the person at the current horizontal position. Because the prediction of a person's three-dimensional position by velocity prediction and/or acceleration prediction is carried out directly in the three-dimensional space, it is not affected by the geometric nonlinearity caused by projection, so the velocity prediction and/or acceleration prediction can be performed with high accuracy and the same person can be identified with high accuracy.
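A minimal sketch of this matching step, assuming constant-velocity movement between consecutive frames and a track store keyed by an arbitrary person identifier. The data layout and the names `predict_constant_velocity` and `match_person` are illustrative assumptions, and the comparison uses only the horizontal (X, Z) components of the prediction, which is one reading of the text.

```python
import math


def predict_constant_velocity(history):
    """Predict the next (x, y, z) from the last two stored positions.

    history -- positions from consecutive frames, newest last.
    With a single stored position the prediction is that position.
    """
    if len(history) < 2:
        return history[-1]
    prev, last = history[-2], history[-1]
    # next = last + (last - prev), i.e. the same displacement again
    return tuple(2 * l - p for l, p in zip(last, prev))


def match_person(tracks, detected_xz):
    """Return the id whose predicted horizontal position is nearest.

    tracks      -- dict mapping person id -> position history
    detected_xz -- (X, Z) horizontal position detected in the current frame
    """
    def distance(person_id):
        x, _, z = predict_constant_velocity(tracks[person_id])
        return math.hypot(x - detected_xz[0], z - detected_xz[1])

    return min(tracks, key=distance)


# Hypothetical tracks: A walks in +X, B walks in -X.
tracks = {
    "A": [(0.0, 1.7, 0.0), (1.0, 1.7, 0.0)],
    "B": [(5.0, 1.6, 5.0), (4.0, 1.6, 5.0)],
}
who = match_person(tracks, (2.2, 0.1))
```

An acceleration term could be added by using the last three positions instead of two; the sketch omits it to stay close to the constant-velocity example given in the text.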

Next, the results of actually detecting the three-dimensional positions of people with the position detection device described above are explained. FIG. 3 is a schematic diagram for explaining the imaging space, and FIG. 4 shows an example of detection results obtained by the position detection device shown in FIG. 1.

As shown in FIG. 3, the video camera 11 was mounted so that its camera center was at a height of 2.6 m, rotated 68 degrees in the horizontal plane relative to a 2.5 × 11 m imaging space (a corridor), and oriented to look down on the imaging space. People moving through the imaging space were photographed, and 100 images of 320 × 240 pixels were acquired at 1-second intervals. Twenty-two landmarks were placed in the imaging space, the three-dimensional positions of all the points were measured with a three-dimensional laser measuring device, and the video camera 11 was calibrated using this three-dimensional position information together with the two-dimensional position information from the captured images. The spatial resolution of the captured images depends on distance: it was about 56 mm for an object located at the far end of the corridor and about 17 mm for an object standing directly in front of the video camera 11.

Detecting the three-dimensional positions of people under these measurement conditions gave the results shown in FIG. 4. In FIG. 4, (a) shows a scene in which one person is walking, (b) a scene in which one person is bending and stretching, (c) a scene in which two people pass each other (the person regions are weakly connected), and (d) a scene in which two people overlap in the captured image (the person regions are strongly connected). The top row of FIG. 4 shows the captured images, the middle row shows the person regions extracted from the captured images, and the bottom row shows the result of integrating the mappings onto the horizontal planes of the imaging space. The squares and circles in the top and bottom rows indicate the positions of the detected peaks, that is, the positions of the people. FIG. 4 shows that one or two people were detected accurately in all four types of scene and that people were tracked accurately even when they overlapped.

As described above, the present invention requires only one video camera as an input sensor, so the three-dimensional positions of people can be detected easily using surveillance cameras that are already widely installed. In addition, because the detection results are expressed in a three-dimensional coordinate system, the invention is well suited to cases such as sharing the same content with other sensors (for example, GPS or RFID) or navigating a robot over a network.

Although the above description uses a human as an example of the object to be detected, the invention can be applied in the same way to other animals, other moving bodies, and the like, with the same effects.

FIG. 1 is a block diagram showing the configuration of a position detection device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the principle of the normal three-dimensional position detection process performed by the position detection device shown in FIG. 1.
FIG. 3 is a schematic diagram for explaining the imaging space.
FIG. 4 is a diagram showing an example of detection results obtained by the position detection device shown in FIG. 1.

Explanation of symbols

11 Video camera
12 Projection unit
13 Integration unit
14 Detection unit
15 Tracking unit

Claims

A position detection device comprising:
photographing means for photographing an object and acquiring a two-dimensional image including the object;
projection means for extracting an object region from the two-dimensional image acquired by the photographing means and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space;
integration means for integrating the mappings of the object projected onto the horizontal planes by the projection means to calculate an integral value of the mappings; and
detection means for detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the integral value calculated by the integration means is located.

The position detection device according to claim 1, wherein the detection means detects, as the position of the object, the horizontal position in the three-dimensional space at which the convolution value of the integral value and an approximate filter approximating the three-dimensional shape of the object is maximum.

The position detection device according to claim 2, wherein the detection means detects, as the height of the object, the height of the highest horizontal plane on which a mapping of the object exists at the horizontal position in the three-dimensional space at which the peak is located.

The position detection device according to claim 3, wherein, when the integral value calculated by the integration means has a plurality of peaks, the detection means, for each peak whose height differs from the height of the valley between adjacent peaks by at least a predetermined interval, detects the horizontal position in the three-dimensional space at which the peak is located as the position of an object, and detects the height of the highest horizontal plane on which a mapping of the object exists at that horizontal position as the height of the object.

The position detection device according to claim 3, wherein, when the difference between the height of the valley between adjacent peaks and the height of the peak is less than the predetermined interval, the detection means treats a first object and a second object farther from the photographing means than the first object as being present, detects the horizontal position in the three-dimensional space at which a peak higher than a reference height is located as the position of the first object, and uses a height detected a predetermined time earlier as the height of the first object.

The position detection device according to claim 5, wherein the detection means detects, as the position of the second object, the horizontal position in the three-dimensional space of the point at which a straight line connecting the photographing center position of the photographing means in the three-dimensional space and the three-dimensional position of the peak higher than the reference height intersects the horizontal plane located at the height detected a predetermined time earlier as the height of the second object, and uses the height detected the predetermined time earlier as the height of the second object.

A position detection method comprising:
a first step of photographing an object using a single photographing means to acquire a two-dimensional image including the object;
a second step of extracting an object region from the acquired two-dimensional image and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space;
a third step of integrating the mappings of the object projected onto the horizontal planes to calculate an integral value of the mappings; and
a step of detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the calculated integral value is located.

A position detection program for causing a computer to function as:
projection means for extracting an object region from a two-dimensional image acquired by a single photographing means that photographs an object and acquires a two-dimensional image including the object, and projecting the object region onto a plurality of horizontal planes set in advance at predetermined intervals in a three-dimensional space;
integration means for integrating the mappings of the object projected onto the horizontal planes by the projection means to calculate an integral value of the mappings; and
detection means for detecting, as the position of the object, the horizontal position in the three-dimensional space at which the peak of the integral value calculated by the integration means is located.