JP2011198244A

JP2011198244A - Object recognition system, monitoring system using the same, and watching system

Info

Publication number: JP2011198244A
Application number: JP2010066328A
Authority: JP
Inventors: Hiromitsu Hama; 裕光濱; Tin Pyke; ティンパイ; Kiichiro Shibuya; 喜一郎渋谷
Original assignee: Asahi Engineering Co Ltd Fukuoka
Current assignee: Asahi Engineering Co Ltd Fukuoka
Priority date: 2010-03-23
Filing date: 2010-03-23
Publication date: 2011-10-06

Abstract

PROBLEM TO BE SOLVED: To provide a recognition system that can stably and precisely obtain the background in any situation and does not fail in detection even under the influence of illumination fluctuation, occlusion, etc., and to provide a monitoring system and a watching system. SOLUTION: The recognition system is fundamentally based on majority logic. It obtains, through observation means, a background parameter that reflects the background and a presence position parameter that reflects an object's presence position. It determines whether the background parameter exists within a predetermined range at or above a predetermined threshold and, if so, sets the background parameter as a parameter indicating a background candidate area. Likewise, it determines whether the presence position parameter exists within a predetermined range at or above a predetermined threshold and, if so, sets the presence position parameter as a parameter indicating the object candidate area.

Description

The present invention relates to monitoring systems and watching systems that use sensor information processing to detect and recognize human posture, actions, and movements, as well as unattended objects, in order to detect abnormalities. Such technology is essential for surveillance systems that ensure security in public places such as subways, stations, and airports, for example by finding and tracking suspicious persons (people illegally carrying handguns or swords, terrorists, stalkers, pickpockets, people fighting) and by finding suspicious objects, and for watching systems that look after the elderly, children, and the sick in buildings such as homes and schools.

Most conventional surveillance systems rely on fixed surveillance cameras. As for watching systems for the elderly and children that use portable sensors, little exists beyond having children carry GPS-equipped mobile phones. Most conventional surveillance technologies use only a single media type (mainly video cameras); although they can cover a wide area, finding suspicious persons and objects and detecting abnormal situations has had to rely on human visual inspection. The present invention addresses these shortcomings and aims to realize a highly reliable automatic monitoring system and watching system.

Apart from surveillance cameras, typical means of finding suspicious persons and objects include X-ray inspection and visual checks by officers at airports and similar facilities, thermographic inspection that senses fever, and matching of facial photographs or fingerprints. In past research, automatic detection of stationary objects and of people making specific movements using image processing algorithms has been limited to action recognition (walking, running, falling, etc.) in relatively stable, not very complex environments, and has not reached a level at which a practical 24-hour automatic monitoring system and watching system can be realized.

Teddy Ko, “A survey on behavior analysis in video surveillance for homeland security applications,” 37th IEEE Applied Imagery Pattern Recognition Workshop, Washington, DC, USA, pp. 1-8, Oct. 15-17, 2008.

However, conventional monitoring and watching systems have the following problems. Background subtraction is often used to detect a target stationary object, but it requires an accurate background. The commonly used methods of treating the data at the start of observation as a fixed background, or of simply using a moving average over a certain time, suffer from large errors under the influence of occlusion by moving objects, shadows, reflections, or changes in illumination conditions.

In addition, detection often fails (missed detections and false detections) under the influence of illumination changes or occlusion by moving objects around the target. Because of such missing observations and noise, an object that is present may go undetected, or an object that is absent may be falsely detected. Moreover, a target region that should be a single region may be extracted in separate pieces, or multiple targets may be detected as a single region, which makes post-processing difficult.

Furthermore, the accuracy of the target position obtained from background subtraction is also degraded by noise and similar effects.

The present invention is an object recognition system proposed to solve these problems, comprising: observation means for observing a monitoring area and/or a watching area for a certain period of time; background determination means for determining whether background parameters reflecting the background, obtained through observation by the observation means, exist within a predetermined range at or above a predetermined threshold; background extraction means for extracting the background, when the background determination means determines that they exist at or above the predetermined threshold, by setting the background parameters within the predetermined range as parameters indicating a background candidate region; object determination means for determining whether object presence position parameters reflecting the presence position of an object, obtained through observation by the observation means, exist within a predetermined range at or above a predetermined threshold; and object extraction means for extracting the object, when the object determination means determines that they exist at or above the predetermined threshold, by setting the object parameters corresponding to the object presence position parameters within the predetermined range as parameters indicating an object candidate region.

The determination of "at or above a predetermined threshold" may use any criterion that indicates that many of the parameters are present: "at or above a predetermined ratio", "at or above a predetermined number", "at or above a predetermined cumulative value", or "within a certain range in which many of the parameters exist". Representative parameters that reflect the content or attributes of the background and/or object include: background parameters (e.g., pixel values) and object parameters (e.g., pixel values), which reflect the background or the object itself; object presence position parameters (e.g., the region centroid), which reflect the presence of the object; and the movement of the object's centroid, which reflects the object's movement.

The system may also include background estimation means that excludes from the background candidate region any portion for which the background determination means has determined that the parameters do not exist at or above the predetermined threshold, sets the background parameters within the predetermined range set by the background extraction means as parameters indicating the background candidate region, and determines the background parameters from the background candidate region.

Further, the system may include object extraction means that, when the object determination means determines that the parameters exist at or above the predetermined threshold, sets the parameters corresponding to the object presence positions within the predetermined range set by the object extraction means as parameters indicating the object candidate region of the object determination means, and obtains the object parameters corresponding to the resulting region.

According to the present invention, adaptively extracting and updating the background makes it possible to stably obtain a highly accurate background in any situation. In addition, even under the influence of illumination changes or occlusion by moving objects around the target, detection does not fail, and the problems in which an object that is present goes undetected, or an object is falsely detected, because of missing observations or noise can be eliminated.

The system may also include overlap parameter determination means for determining, when object presence position parameters at different points in time are superimposed, whether the superimposed parameter is at or above a predetermined threshold, and object region extraction means for extracting, when the overlap parameter determination means determines that it is at or above the predetermined threshold, the region corresponding to the parameters at or above the threshold as the object region.
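The overlap criterion can be illustrated with a minimal sketch. It assumes that the presence position parameter at each point in time is a binary silhouette mask and that "superimposing" means summing the masks per pixel; the function name `stationary_region` and the threshold value are hypothetical and do not come from the patent text.

```python
import numpy as np

def stationary_region(masks, threshold):
    """Superimpose binary presence masks from different time points.

    masks: array-like of shape (T, H, W) with values 0 or 1 (assumed format).
    Returns a boolean (H, W) mask that is True where the per-pixel overlap
    count is at or above `threshold`.
    """
    stack = np.asarray(masks, dtype=np.int32)
    overlap = stack.sum(axis=0)      # how many frames mark each pixel
    return overlap >= threshold

# Five frames: a 2x2 object that is occluded in frame 2, plus one noisy pixel.
masks = np.zeros((5, 4, 4), dtype=np.uint8)
masks[:, 1:3, 1:3] = 1               # object present in every frame
masks[2, 1:3, 1:3] = 0               # temporary occlusion in frame 2
masks[0, 0, 0] = 1                   # isolated false detection in frame 0

region = stationary_region(masks, threshold=4)
```

With a threshold of 4 out of 5 frames, the occluded frame does not prevent the object from being extracted, while the single-frame false detection is rejected.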

The system may also include determination means for determining, for an object region that the object extraction means has extracted as a stationary object, whether the degree of correlation of the region parameters of the frames within a certain period from the appearance of the object to the present is at or above a first threshold, at or below the first threshold and at or above a second threshold, or at or below the second threshold; and object judgment means that judges that something is temporarily disturbing the object data when the determination means determines that the correlation is at or above the first threshold, judges that the object is a stationary object when it is at or below the first threshold and at or above the second threshold, and judges that the object is a moving body with small movements when it is at or below the second threshold.

Knowing whether a detected object is completely stationary or moving slightly helps to distinguish a person from a thing. The degree of correlation here has been described as taking a value between 0 and 1, assuming a normalized correlation coefficient; when a distance is used instead, the sense of the comparisons must be reversed.

The object detection system of the present invention can be used in surveillance systems for ensuring security, such as finding and tracking suspicious persons and finding suspicious objects, and in watching systems for the elderly, children, the sick, and others in buildings such as homes and schools.

According to the present invention, a highly accurate background can be obtained stably in any situation. In addition, even under the influence of illumination changes or occlusion by moving objects around the target, detection does not fail, and the problems in which an object that is present goes undetected, or an object is falsely detected, because of missing observations or noise can be eliminated.

Configuration diagram of the entire system, including the processing means, in this embodiment
Processing flow diagram of the entire system in this embodiment
Diagram showing how the background is obtained in this embodiment
Diagram showing the parameters related to the bounding box
Diagram showing a method for detecting a target stationary object
Diagram showing a method for detecting a stationary object using bounding boxes
Diagram showing a method for detecting an object making a certain movement
Processing block diagram showing the process of detecting an unattended suspicious object
Diagram showing another method for detecting a target stationary object
Diagram showing a method for identifying the state of a detected stationary object
Diagram showing a region division method for detecting minute movements of a stationary object

Embodiments of the present invention are described below with reference to the drawings and formulas. FIG. 1 is a configuration diagram of the entire system, including the processing means, in this embodiment. The system 100 comprises one or more sensors 101, 102 that observe the monitoring area and/or watching area for a certain period of time, and control means 103 that analyzes and integrates the data received from the sensors 101, 102. The sensors 101, 102 function as observation means that observe the background and the presence of objects within a certain period of time, and consist of fixed sensors 101, portable sensors 102, and the like.

The following sensors may be used as observation means for observing, for a certain period of time, the scene to be monitored or watched. The sensor types are selected from, for example: "0-dimensional: GPS position measurement, temperature, humidity"; "1-dimensional: accelerometer (3-axis), angular accelerometer (3-axis), inclinometer (3 directions; may be included in the angular accelerometer), microphone (sound)"; "2-dimensional: ordinary visible-light camera, near-infrared camera, far-infrared camera, thermography"; and "3-dimensional: 3D data (range sensor, 3D position sensor, 3D motion sensor, 3D ranging sensor)". The description here mainly uses images, in keeping with the primary intended use and for intuitive clarity, but it applies equally to recognizing objects from observation data of other media (3D data, etc.). To handle various kinds of variation, not only zeroth-order approximation but also linear or higher-order approximation can be used, and, to minimize error, methods such as least mean square error (LMS: Least Mean Square) or obtaining the optimum by minimizing the maximum error may be chosen as needed.

The control means 103 comprises background determination means, background extraction means, and background estimation means for extracting the background from images and other data acquired from the sensors 101, 102, as well as object determination means, object extraction means, and object presence position estimation means for extracting objects. Specifically, these means are implemented by, for example, a personal computer equipped with a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), HDD (Hard Disk Drive), and so on; the CPU performs the function of each means by executing programs stored in the ROM, HDD, or the like.

Before describing each means, the overall flow of the object recognition system (or monitoring system or watching system) of this embodiment is briefly described. FIG. 2 is a processing flow diagram of the entire system in this embodiment. First, before observation begins, a background is obtained as the initial value. Here, the background means anything, whether person or thing, that has been stationary since before the start of observation and remains stationary for a certain period after observation begins. The initial background is the background at the start of observation, while short-term and long-term backgrounds are distinguished by the length of time they remain stationary. That is, the background may also include something that appears partway through but then remains in the scene, stationary, for at least a certain time; short-term and long-term backgrounds are used according to how long an object must remain stationary in the scene before it is to be detected as a stationary object.

As shown in FIG. 2, data from the sensor 200 is received (S201), and the background is extracted and updated (S202). The object region is extracted by background subtraction, inter-frame differencing, or the like (S203); its position and size are then fine-tuned (S204), segmentation is performed (S205), and region features are extracted (S206). The position in three-dimensional real space is estimated (S207), the posture and actions of target stationary objects and of moving objects making certain movements are recognized, and suspicious objects are detected (S208 to S213). Specifically, the system determines whether the target is a thing or a person (S208). If it is a thing, it is judged to be a suspicious object (S209), and tracking of the person who brought it begins (S210). If it is a person, abnormal situations are detected from the posture and action recognition results (S211, S212), and if an abnormality is found, a response such as a notification is made (S213). These steps complete one pass of processing, which is repeated throughout the monitoring or watching period.

Next, the way the background is obtained in this embodiment is described with reference to FIG. 3, which illustrates the method. In FIG. 3, pixel values are chosen as the background parameters reflecting the background, but 3D data or the like may be chosen instead. A region whose pixel values fall outside a predetermined range within a certain time is detected as a temporary change caused by occlusion, shadows, reflections, changes in illumination conditions, and so on. That portion is treated as data that is temporarily or partially missing, and the background is updated using the values (pixel values, etc.) of the remaining portion; for illumination conditions and the like that vary at a constant rate, the background is updated by approximating the variation. Specifically, the background determination means first determines whether pixel values reflecting the background exist within a predetermined range at or above a predetermined threshold. For example, the temporal change of each pixel is obtained, and it is determined whether the pixel values lie within a certain range. If so, the background extraction means uses those pixel values to calculate the background value of the pixel at that position.

In contrast, the background determination means may determine that, because of something that temporarily disturbs the data (e.g., occlusion by a moving object, shadows, reflections, or changes in illumination conditions), there is a time span in which pixel values do not exist at or above the predetermined threshold. In such a case, the background estimation means can select an optimal value from among the pixel values that do exist within the predetermined range at or above the predetermined threshold, using one of the following methods, and use it as the background value of the pixel at that position and in that time span. To determine the optimal value, a weighted moving average (a simple average is also acceptable), the median, the histogram peak, a frequently occurring value, or the like may be used.
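The per-pixel background estimation described above might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the "predetermined range" is assumed to be a band of half-width `delta` centred on a temporal median sample, the threshold is assumed to be the fraction `k` of the T frames, and the median is chosen as the "optimal value". The names `estimate_background`, `delta`, and `k` are hypothetical.

```python
import numpy as np

def estimate_background(frames, delta=10.0, k=0.6):
    """Estimate a background image from a stack of grayscale frames.

    frames: array-like of shape (T, H, W).
    Returns (background, valid): `background` is the median of the samples
    inside the band, `valid` is True where at least k*T samples fall inside
    the band (the majority-style test).
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    # Use an actual sample (the middle order statistic) as the band centre,
    # so every pixel has at least one in-range sample.
    center = np.sort(frames, axis=0)[T // 2]
    in_range = np.abs(frames - center) <= delta
    valid = in_range.sum(axis=0) >= k * T
    # Treat out-of-range samples as missing and take the median of the rest.
    masked = np.where(in_range, frames, np.nan)
    background = np.nanmedian(masked, axis=0)
    return background, valid

# Nine frames of a flat scene; pixel (0, 0) is occluded in frames 3 and 4.
frames = np.full((9, 2, 2), 100.0)
frames[3:5, 0, 0] = 200.0
background, valid = estimate_background(frames)
```

Replacing `np.nanmedian` with a weighted average or a histogram peak would give the other estimators mentioned in the text.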

In this way, by detecting whatever temporarily disturbs the data, removing it, and using, for example, a linear approximation of the pixel values over a certain time, the system can follow variations in the background. The method is also applicable to media other than images, such as sensor data and 3D data. How long an object must remain stationary in the scene to be regarded as a stationary object can be set as needed by controlling the time interval. Background subtraction presupposes that an accurate background is available; for background updating to work effectively, it is desirable that moving-object regions be extracted accurately, that illumination changes not be too severe, and that the influence of shadows and the like be small. The times at which an object appeared in the scene and became stationary, and at which it ceased to be stationary, can also be determined.
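As one possible reading of "remove the disturbance, then use a linear approximation", the sketch below fits a straight line to one pixel's values over time, drops samples whose residual exceeds a tolerance, and refits. The single refit pass, the function name `fit_background_trend`, and the tolerance `delta` are assumptions made for illustration; the tolerance must be larger than the bias that the disturbance introduces into the first fit.

```python
import numpy as np

def fit_background_trend(values, delta=10.0):
    """Fit value = slope * t + intercept to one pixel's time series.

    Samples whose residual from an initial least-squares line exceeds
    `delta` are treated as temporary disturbances and excluded from a
    second fit. Returns (slope, intercept, keep_mask).
    """
    values = np.asarray(values, dtype=np.float64)
    t = np.arange(values.size, dtype=np.float64)
    slope, intercept = np.polyfit(t, values, 1)        # initial fit
    keep = np.abs(values - (slope * t + intercept)) <= delta
    slope, intercept = np.polyfit(t[keep], values[keep], 1)  # refit on inliers
    return slope, intercept, keep

# Slowly brightening pixel (0.5 grey levels per frame) with a brief occlusion.
t = np.arange(20)
values = 100.0 + 0.5 * t
values[8:10] = 150.0
slope, intercept, keep = fit_background_trend(values)
```

The refit recovers the underlying drift, so the background at frame `t` can be predicted as `slope * t + intercept` even during the occluded frames.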

Alternatively, eigenvalues of the correlation matrix of the region parameters of the frames may be selected in descending order until the cumulative contribution ratio reaches a certain value, and the regions extracted by projection and back-projection with the corresponding eigenvectors may be regarded as occlusion, moving objects, or illumination changes and excluded, with the remaining regions taken as the background candidate region. When computing the correlation matrix, if each frame is two- or three-dimensional rather than one-dimensional data, it may be made one-dimensional by rearranging the array.

Next, the procedure for recognizing an object is described. An object is a stationary object, or an object making a certain movement, that is the target of detection and recognition; it includes both persons and things. The object presence position parameter is a parameter that reflects the presence position of the object; it includes the object's silhouette, the centroid of its bounding box, the maximum and minimum values of the region, the absolute value of the background difference, and the motion measures described later. Motion measures are used in association with the object's position. Taking a target image as an example, finding the region of the image that corresponds to the presence position parameter gives the object's position and size. The background obtained by the method described above may be used to detect the parameters. The object parameter is a parameter that reflects the object, and the background parameter is a parameter that reflects the background; specific examples are pixel values and 3D position data. The two are collectively called region parameters.

As shown in FIG. 4, a bounding box is ideally the smallest rectangle or rectangular-cuboid region that encloses an object, although the smallest one cannot always be extracted. The centroid used as the presence position parameter may be replaced by any other value that represents the position of the object candidate region, such as the position of the top-left corner of the bounding box, its topmost point, or its bottommost point. A silhouette is a black-and-white representation of an object (person or thing) region extracted by background subtraction or the like. 3D data is a general term for range image data and similar data in three-dimensional space, and can be acquired from a range sensor or from a 3D ranging sensor that obtains 3D data by measuring the distance to the object.
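For a 2D silhouette, the bounding box and centroid can be computed directly from the foreground pixel coordinates. The sketch below assumes a binary image in which nonzero pixels belong to the object; the function name and the `(x_min, y_min, x_max, y_max)` box convention are illustrative choices, not taken from the patent.

```python
import numpy as np

def bounding_box_and_centroid(silhouette):
    """Return ((x_min, y_min, x_max, y_max), (cx, cy)) for a binary silhouette.

    Returns None when the silhouette contains no foreground pixels.
    """
    ys, xs = np.nonzero(silhouette)
    if ys.size == 0:
        return None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    centroid = (float(xs.mean()), float(ys.mean()))
    return box, centroid

# A 3x3 object inside a 6x8 image.
silhouette = np.zeros((6, 8), dtype=np.uint8)
silhouette[2:5, 3:6] = 1
box, centroid = bounding_box_and_centroid(silhouette)
```

The top-left corner `(box[0], box[1])` could serve as the alternative position parameter mentioned in the text.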

FIG. 5 shows a method for detecting a target stationary object. In this figure, the object presence position estimation means estimates the object's presence position using the x-coordinate of the centroid of the bounding box as the object presence position parameter. The object region extraction means then extracts, as the object region, the common part of the bounding boxes having the estimated parameter, or the part where they overlap at or above a certain threshold. FIG. 6 shows a method for detecting a stationary object using bounding boxes; it also shows that the time required to judge an object stationary can be controlled. To detect a moving object making a certain movement, anything that reflects its movement and position may be used, for example the periodicity of the object's movement, aspect ratio, moment ratio, velocity, acceleration, angular acceleration, tilt angle, or movement efficiency (described in detail later). FIG. 7 shows the method of this embodiment for detecting a moving object making a certain movement; it shows that a sudden change in a subject's movement can also be detected from the difference in motion parameters. The object detection procedure of this embodiment (Step 1 to Step 5) is described in detail below.

Processing of (Step 1)
Through observation over a certain period of time (T), the object extraction means calculates the values of the parameters that reflect the presence (position, movement) of the object.

Processing of (Step 2)
Because observations are assumed to be possibly missing, the obtained parameters do not necessarily exist continuously, and their values may fluctuate. A tolerance of ±δ is therefore set, and variation within this range is allowed. Specifically, the object determination means determines whether parameter values are concentrated within the ±δ range at or above a certain threshold. If they are (≥ kT, 0 < k < 1), the region is taken as a candidate region for the target stationary object. The real coefficient k is determined by how many missing observations are to be tolerated. The threshold δ is determined with reference to the average variation of the object's observed parameter. It can be set somewhat larger, with a margin of about 10% to 30%, to raise the recall rate, but this generally lowers the precision rate, so it is chosen to suit the requirements at hand. The median may be used instead of the average.
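The concentration test of Step 2 can be sketched for a single scalar parameter, such as the x-coordinate of the bounding-box centroid in each of T frames. Missing observations are represented as NaN, and the band is centred on the median of the observed values; both choices, and the name `is_stationary_candidate`, are assumptions for illustration.

```python
import numpy as np

def is_stationary_candidate(values, delta, k):
    """Check whether at least k*T of T observations lie within ±delta.

    values: per-frame parameter values, NaN where the observation is missing.
    Returns (is_candidate, inliers), where `inliers` is a boolean mask over
    the T frames marking the observations inside the band.
    """
    values = np.asarray(values, dtype=np.float64)
    T = values.size
    observed = ~np.isnan(values)
    inliers = np.zeros(T, dtype=bool)
    if not observed.any():
        return False, inliers
    center = np.median(values[observed])       # assumed band centre
    inliers[observed] = np.abs(values[observed] - center) <= delta
    return bool(inliers.sum() >= k * T), inliers

# Ten frames: two missed detections (NaN) and one outlier caused by occlusion.
x = [50, 51, 49, np.nan, 50, 80, 50, 52, np.nan, 50]
is_candidate, inliers = is_stationary_candidate(x, delta=3.0, k=0.6)
```

Here 7 of the 10 frames fall inside the band, which meets the kT = 6 requirement, so the region is kept as a stationary-object candidate despite the gaps.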

Processing of (Step 3)
In Step 3, for locations judged to be occluded (that is, where the object determination means has determined that the values are not concentrated), the object update means removes the parameters judged not to be concentrated from the calculation and recalculates based on the parameters obtained in Step 2, which reduces the error. This improves the accuracy of the position estimate, allows a more accurate candidate region to be detected, and improves the accuracy of the parameter that reflects the position. Step 3 is optional and may be skipped if the parameters were obtained stably and accurately in the first pass.
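A minimal sketch of this re-estimation, assuming the Step 2 result is available as a band center and tolerance δ: samples outside the band are dropped and the position is recomputed from the remaining ones. The function name `refine_estimate` and the use of a plain mean (a zeroth-order approximation) are illustrative choices.

```python
from statistics import mean


def refine_estimate(values, center, delta):
    """Drop missing samples and samples outside center +/- delta,
    then re-estimate the parameter from the inliers only."""
    inliers = [v for v in values if v is not None and abs(v - center) <= delta]
    return mean(inliers), inliers


xs = [100, 101, None, 99, 250, 100, None, 102, 100, 98]
estimate, used = refine_estimate(xs, center=100, delta=3)
print(estimate, len(used))  # 100 7
```

The outlier at 250 no longer pulls the estimate away from the true position.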

Processing of (Step 4)
After repeating the steps above as needed, once an object is detected, shape measures and motion measures can be used to distinguish whether it is a person or a thing; at the same time, if it is a person, posture and action recognition and the like make it possible to detect suspicious objects and suspicious persons. Here, a shape measure is a yardstick for the shape of a thing, for example the shape itself, size, circularity, elongation, height and width, moments, aspect ratio, moment ratio, contour, Fourier descriptors, histograms, and HOG. HOG (Histograms of Oriented Gradients) features are gradient-based features, and a Fourier descriptor is a feature that represents the shape of an object's boundary curve. A motion measure is a yardstick for differences in motion, such as velocity, acceleration, angular acceleration, inclination angle, and movement efficiency (for example, SR and DR). Here, a stationary object that is not a person and has no owner present is regarded as a suspicious object.

Processing of (Step 5)
When a suspicious object is detected as above and someone carried it in, then at some point within the observation period the position parameters of that person and of the suspicious object must have shared a common part or come close to each other, and from that point the person who brought the object can be tracked. Going back in time also yields information on the direction from which the person came. Likewise, when suspicious behavior is detected, tracking of that person can begin.
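The backward search of Step 5 can be sketched as below. The data layout (axis-aligned boxes as `(x_min, y_min, x_max, y_max)`, tracks as lists of `(time, box)` pairs), the names `boxes_overlap` and `find_carrier`, and the rule of taking the most recent overlap at or before the object's first solo appearance are assumptions for illustration; the text only requires that the positions share a common part or come close.

```python
def boxes_overlap(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_carrier(object_box, t_appeared, person_tracks):
    """Search back in time for the person whose box overlapped the object's
    position most recently at or before the object first appeared alone.

    person_tracks: {person_id: [(t, box), ...]}
    Returns (person_id, t) or None.
    """
    best = None
    for pid, track in person_tracks.items():
        for t, box in track:
            if t > t_appeared:
                continue  # later contacts are passers-by, not the carrier
            if boxes_overlap(object_box, box) and (best is None or t > best[1]):
                best = (pid, t)
    return best


tracks = {
    "A": [(1, (8, 5, 22, 25)), (2, (25, 5, 35, 25)), (3, (40, 5, 50, 25))],
    "B": [(4, (12, 5, 24, 25))],  # occludes the object later on
}
print(find_carrier((10, 10, 20, 20), t_appeared=2, person_tracks=tracks))  # ('A', 1)
```

Tracking of the returned person can then start from the reported time, in either direction.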

With the steps above, even when a target object is in fact stationary but the continuous time needed to judge it stationary is insufficient, detection becomes possible by adding up the intermittent data (> threshold). A target stationary object is anything, person or thing, that was not present in the observed scene at a certain time, for example when observation first began, and that subsequently appeared in the scene and has remained stationary. Examples include an item someone forgot, an item deliberately left behind, and a person who walked in and is sitting on a chair. Things present from the start of observation are not targets. The term may also refer to something that, having appeared partway through, has remained in the scene and stationary for at least a certain period of time.

FIG. 5 shows, by way of example, the processing when the center of gravity (x coordinate) of the bounding box is used as the parameter. Images are used here for intuitive clarity, but the same approach applies when observation data for other parameters (three-dimensional data, acceleration sensors, and so on) have gaps. To minimize error, methods that apply a linear approximation and obtain the optimum by minimizing the mean squared error or the maximum error are commonly used; to cope with various kinds of fluctuation, not only a zeroth-order approximation but also higher-order approximations may be used. BB₂₁, BB₂₂, and BB₃₁ are contained in BB₁ (including its motion prediction), or share with it at least a certain amount of common area, and BB₂₁ and BB₃₁ remain stationary for at least a certain time, so they are found to be the target stationary object. After being occluded by another person at time t₄, the object reappears alone as BB₅ at time t₅. It must then be judged whether it is a person or a thing, and if it is a thing it is judged to be a suspicious object. Going back through times t₁ to t₃, the person BB₃₂ who carried in the suspicious object and walked away is identified as the suspicious person, and tracking may begin. Here, BB denotes a bounding box and (G_x, G_y) its center of gravity. Taking images as the example, the position and size of the object can be found by determining the region on the image that corresponds to the parameter.

The procedure for detecting a stationary object using bounding boxes is described below with reference to FIG. 6. First, the common set is obtained. Specifically, within a set time window, the ID number of the bounding box BB is voted into each pixel. Next, for each ID number, the votes in each pixel are totaled over the long-term frame count Fl leading up to time t0, giving V1(ID, x, y; t0, Fl). Then, irrespective of ID number, the votes in each pixel are totaled over the short-term frame count Fs leading up to time t0, giving V2(x, y; t0, Fs). V1(ID, x, y; t0, Fl) can be used when ID numbers can be identified correctly, and V2(x, y; t0, Fs) when they cannot.

Next, pixels with many votes in the totals are taken as candidate regions for stationary objects. For example, the region where V1(ID, x, y; t0, Fl) > k1 × Fl (k1 = 0.1) is taken as the candidate region for a long-term stationary object, and the region where V2(x, y; t0, Fs) > k2 × Fs (k2 = 0.7) as the candidate region for a short-term stationary object. Applying this processing to the common set detects the target stationary object.
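The per-pixel voting and thresholding can be sketched in plain Python as follows. The frame layout (a list of `(box_id, box)` pairs per frame, boxes given with inclusive pixel bounds) and the helper names `vote` and `candidates` are assumptions for illustration; `by_id=True` corresponds to V1 and `by_id=False` to V2.

```python
from collections import Counter


def vote(frames, n_frames, by_id):
    """Count, over the last n_frames frames, in how many frames each pixel
    was covered by a bounding box (per box ID if by_id is True).

    frames: list of frames, oldest first; each frame is a list of
            (box_id, (x_min, y_min, x_max, y_max)) with inclusive bounds.
    """
    votes = Counter()
    for boxes in frames[-n_frames:]:
        seen = set()
        for box_id, (x0, y0, x1, y1) in boxes:
            for x in range(x0, x1 + 1):
                for y in range(y0, y1 + 1):
                    seen.add((box_id, x, y) if by_id else (x, y))
        votes.update(seen)  # at most one vote per key per frame
    return votes


def candidates(votes, n_frames, k):
    """Keys whose vote count exceeds k * n_frames."""
    return {key for key, v in votes.items() if v > k * n_frames}


# Ten frames: box 7 sits still at (2,2)-(3,3) but is missed in frames 4 and 5;
# box 1 is a one-pixel walker moving along the top row.
frames = []
for t in range(10):
    boxes = [(1, (t, 0, t, 0))]
    if t not in (4, 5):
        boxes.append((7, (2, 2, 3, 3)))
    frames.append(boxes)

v2 = vote(frames, n_frames=10, by_id=False)
print(sorted(candidates(v2, n_frames=10, k=0.7)))
# [(2, 2), (2, 3), (3, 2), (3, 3)]
```

The stationary box survives the two missed frames (8 votes out of 10), while the walker never accumulates more than one vote per pixel.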

The following equations give the definition of moments, which are widely used as shape features; however, any parameter that can express the dispersion about a point or an axis may be used in place of the moments in this description. For example, the variance of a histogram along some axis may be used.

The moments about the center of gravity are expressed by the following equations.

The moments about the vertical and horizontal axes passing through the center of gravity are expressed by the following equations.

All of the moment-related parameters can be used on their own, but using the ratios ((4), (8)) is more effective. For both kinds of moment parameter above, the moment calculations in (2), (3), (6), and (7) split the region into upper and lower parts. For person detection, however, height ratios are set and the region is divided accordingly into three parts, head, torso, and legs (it may also be divided in two at the waist, or into any other meaningful parts), and the calculation is performed over each range (height of each section: k₁*H to k₂*H, 0 ≤ k₁ < k₂ ≤ 1). In this way, even under partial occlusion the visible portion can still be used and a reasonable judgment made, so a robust system that tolerates occlusion can be built.

Quantities used as motion measures are listed next.

The above is described in two dimensions, but the same definitions can be made in three-dimensional space. Movement efficiency is primarily a parameter that indicates how efficiently the object has moved, and it takes a value from 0 to 1. Specifically, the ratio between the displacement distance and the distance traveled (hereinafter SR), or the ratio between the difference of the current position from the position T frames earlier and the total distance moved in between (hereinafter DR), can be used. Alternatively, meaningless motions can be extracted, such as unnecessary or irregular limb movements, wasted actions like bending at the waist or shaking the head or neck, and upper- and lower-body movements, and their proportion used. SR and DR are sometimes defined as the reciprocals of equations (8) and (9), but here they are defined as in the equations above so that they take values from 0 to 1 (0 ≤ SR ≤ 1, 0 ≤ DR ≤ 1). A frame generally means an image, but the term applies equally to three-dimensional data and other multimedia content. Content is a collective term for data obtained from sensors, and when content changes over time, the data at each instant is called a frame. In video, for example, multiple frames (still images) together make up one moving image. Although the word frame is mostly used for images, here it is used for other content as well.
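As a rough illustration of a movement-efficiency measure in the 0 to 1 range, the sketch below computes the straight-line displacement over the last T frames divided by the total path length covered in between, which is one reading of DR. The patent's own equations are not reproduced in this text, so the exact form, the function name `displacement_ratio`, and the convention of returning 0 for a track with no movement are assumptions.

```python
import math


def displacement_ratio(track):
    """track: list of (x, y) positions over the last T frames, oldest first.
    Returns net displacement / total path length, a value in [0, 1]."""
    path = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    if path == 0:
        return 0.0  # assumption: no movement at all is scored as 0
    return math.dist(track[0], track[-1]) / path


print(displacement_ratio([(0, 0), (1, 0), (2, 0)]))  # 1.0 (walks straight)
print(displacement_ratio([(0, 0), (3, 4), (0, 0)]))  # 0.0 (returns to start)
```

A purposeful walker scores near 1, while someone loitering or wandering scores low because the path is long relative to the net displacement.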

In this embodiment, the stationary-object detection method is characterized in that it needs neither calculation of a parameter reflecting the object's position nor extraction of bounding boxes, and can detect candidate regions of the target stationary object directly. The procedure is as follows. First, the result of background subtraction, such as its absolute value, is used as is, or is binarized to obtain the object's candidate region as a silhouette, and the portion where the overlap is at least a certain threshold is taken as the candidate region of the target stationary object. The overlap determination means determines, at each pixel, the degree of overlap from the accumulated value of the absolute background difference, of its binarized image, or of the silhouette value, and takes the pixel as part of the candidate region if that value is at least a predetermined threshold. Normalization, such as dividing by the number of frames, may be applied at this point.
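The binarize-and-accumulate variant can be sketched as follows, using nested lists of gray levels so that the example needs no external library. The function name `stationary_mask`, the parameter `diff_threshold` for binarization, and the normalized threshold `k` (fraction of frames) are assumptions for illustration.

```python
def stationary_mask(frames, background, diff_threshold, k):
    """Mark pixels that differ from the background in at least
    k * len(frames) frames.

    frames, background: 2-D lists of gray levels with identical shape.
    """
    h, w = len(background), len(background[0])
    counts = [[0] * w for _ in range(h)]
    for frame in frames:
        for y in range(h):
            for x in range(w):
                # Binarized silhouette value (0 or 1), accumulated per pixel.
                if abs(frame[y][x] - background[y][x]) >= diff_threshold:
                    counts[y][x] += 1
    need = k * len(frames)  # threshold normalized by the number of frames
    return [[counts[y][x] >= need for x in range(w)] for y in range(h)]


background = [[0, 0, 0], [0, 0, 0]]
left_bag = [[0, 200, 0], [0, 0, 0]]   # object sitting at pixel (row 0, col 1)
passer = [[0, 0, 0], [0, 0, 200]]     # someone crossing pixel (row 1, col 2)
frames = [left_bag, left_bag, passer, left_bag, left_bag]

mask = stationary_mask(frames, background, diff_threshold=50, k=0.6)
print(mask)  # [[False, True, False], [False, False, False]]
```

Raising k lengthens the time an object must stay put before it is reported.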

FIG. 8 shows the results of detecting an unattended suspicious object on a station platform, using the absolute value of the background difference as the object's position parameter. In this figure, the overlap parameter determination means judges the degree of overlap using either (1) the accumulated gray-level value of the silhouette region or (2) the number of overlapping frames after the silhouette is binarized (the number of frames with value 1). The object region means then extracts the region as the object region if that value is at least a predetermined threshold. FIG. 8(b) shows the result of applying silhouettes to an enlarged view of an intermediate frame from FIG. 8(a). There is actually no data in the background portion, but it is displayed semi-transparently for clarity. FIG. 8(c) shows that the stationary duration of the objects to be detected can be controlled through the threshold setting. In general, luggage is smaller than a person, and is often rectangular with a large aspect ratio. Anything whose size, converted to three-dimensional space, is below a certain size can be judged not to be a person, but golf bags and suitcases are sometimes mistaken for people and can be hard to tell apart by simple methods alone, so when finer discrimination is needed, shape measures and other techniques may be used in combination. FIG. 8(d) illustrates this. FIG. 9 shows a general method for detecting stationary objects that does not require extraction of bounding boxes or the like. In FIG. 9, when the absolute value of the background difference is used as the object's position parameter, the value may be used as is or after binarization (silhouette); in either case, after accumulation a threshold is set, the region is extracted, and the region parameters are then obtained.

Processing after the target stationary object has been extracted is the same as Steps 4 and 5 above. Background subtraction or the like is used to extract candidate regions. Because the sensors are assumed to be fixed in place, large changes in the environment are not expected; nevertheless, to cope with changes in image brightness and the like caused by varying lighting conditions, it is better to update the background following the procedure described earlier. The same technique applies to three-dimensional data, in which case voxels may be used instead of pixels. A voxel is the unit element of a three-dimensional digital image, obtained by adding thickness information to the pixel that is the unit element of a two-dimensional digital image, and is represented as a particle with thickness.

FIG. 10 illustrates the principle by which this embodiment further distinguishes whether a detected stationary object is completely still, moves slightly, or is a transient effect caused by occlusion by a moving object, shadow, reflection, or illumination change. Here, the temporal cross-correlation coefficient of the object parameter within the object candidate region is used as the degree of correlation, and suitable thresholds are set: if the cross-correlation coefficient lies between Th₁ and 1, the region is judged to be a thing; if portions between Th₂ and Th₁ exist, it is judged to be a person; and otherwise, that is, between 0 and Th₂, it is judged to be something that temporarily disturbs the data, such as occlusion, shadow, reflection, or illumination change. Once an object has been detected, a person and a thing can thus be told apart by exploiting the fact that things are generally completely still whereas people move subtly (slightly and locally). In this figure, it is determined which threshold range the temporal correlation of the object parameter falls into within the object region extracted by the object extraction means, and the object judging means then distinguishes a completely stationary thing from a person with small movements (people are generally never completely still).
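The three-band decision can be sketched as below. The Pearson coefficient stands in for the cross-correlation coefficient; the function names, the rule of checking the lowest band first, and treating a flat (zero-variance) region as uncorrelated are assumptions for illustration, and the threshold values in the usage lines are arbitrary.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length sequences of pixel values."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: a flat region carries no usable correlation
    return cov / math.sqrt(va * vb)


def classify(correlations, th1, th2):
    """correlations: per-frame correlation of the candidate region against
    the reference image, with th2 < th1 <= 1."""
    if any(c < th2 for c in correlations):
        return "disturbance"  # occlusion, shadow, reflection, lighting change
    if any(c < th1 for c in correlations):
        return "person"       # small, local movements
    return "thing"            # stays within Th1..1 throughout


print(classify([1.0, 0.99, 1.0], th1=0.95, th2=0.5))  # thing
print(classify([1.0, 0.90, 1.0], th1=0.95, th2=0.5))  # person
print(classify([1.0, 0.20, 1.0], th1=0.95, th2=0.5))  # disturbance
```

In practice `correlation` would be applied to the pixel values of the candidate region in each frame and in the reference image, and the resulting series passed to `classify`.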

To capture a wider range of motion, the extracted object candidate region may be enlarged slightly before use. The temporal cross-correlation coefficient is calculated within that region: if there is almost no change, that is, the cross-correlation coefficient stays close to 1, the object is a completely stationary thing; if it changes occasionally, or moves no more than a few times per second, it is likely a person. If, however, there is a very large fluctuation that later returns to the original state, the cause is considered temporary or partial, for example occlusion, shadow, reflection, or illumination change; if it does not return, the object is considered to have moved. As the region over which the cross-correlation coefficient is calculated, a bounding box or the like may be used instead of the silhouette, at some cost in accuracy, and an edge image obtained by differentiation may be used as the target data instead of the original image itself.

When the object candidate region is large and only a small part of it moves, the cross-correlation coefficient stays close to 1 and detection can become difficult. In that case, dividing the candidate region into smaller subregions and calculating the cross-correlation coefficient for each makes detection easier. The division should make the areas as equal as possible. For example, the split position is chosen with a vertical histogram so that the upper and lower regions have equal area, and then, within each of the upper and lower regions, a horizontal histogram is used to choose the split position so that the left and right regions have equal area; this yields four regions of equal area. The order of the vertical and horizontal splits may be swapped. When the boundary between subregions moves, overlapping regions may be used. This is illustrated in FIG. 11.
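The equal-area split can be sketched as follows, reading "vertical histogram" as the number of silhouette pixels per row and "horizontal histogram" as the number per column. That reading, the 0/1 mask layout, and the names `split_index` and `split_four` are assumptions for illustration.

```python
def split_index(hist):
    """Return i such that sum(hist[:i]) is as close as possible to half
    of the total (first such i on ties)."""
    total = sum(hist)
    running, best_i, best_gap = 0, 0, total
    for i, h in enumerate(hist, start=1):
        running += h
        gap = abs(2 * running - total)
        if gap < best_gap:
            best_i, best_gap = i, gap
    return best_i


def column_histogram(part):
    """Silhouette pixels per column of a 2-D 0/1 mask (empty list if no rows)."""
    return [sum(col) for col in zip(*part)]


def split_four(mask):
    """mask: 2-D list of 0/1 silhouette values.
    Returns (row_split, col_split_upper, col_split_lower): rows [0, row_split)
    form the upper half, and each half is split at its own column index."""
    row_hist = [sum(row) for row in mask]  # silhouette pixels per row
    r = split_index(row_hist)
    upper, lower = mask[:r], mask[r:]
    return r, split_index(column_histogram(upper)), split_index(column_histogram(lower))


# A wide head-and-shoulders block on top of a narrower body.
mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(split_four(mask))  # (2, 2, 2)
```

Each of the four resulting subregions here contains four silhouette pixels, so a per-region correlation is computed over equal areas.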

Anything other than the cross-correlation coefficient may also be used, provided it reflects the difference between frames, that is, change (motion); the mean squared error is one example. This method is characterized by simple and stable discrimination, but by its very principle it cannot distinguish a person who stays motionless, or who faces away so that the motion is not visible, from a thing, nor a thing that moves subtly from a person, so it may be combined with other techniques. The reference image can be obtained by the previously described method of selecting the optimum pixel value within the silhouette region, that is, by using a weighted moving average (a plain average is also acceptable), the median, the histogram peak, a frequently occurring value, and so on. The degree of motion can also be estimated from the variance within the silhouette region over a fixed period of time, and normalization may be applied to absorb differences in brightness and contrast.

To obtain the reference image for calculating the degree of correlation, the means described above for obtaining the background or the object from a changing environment may be used. When using, for example, the standard deviation of the error within the silhouette or bounding-box region, the value divided by the mean luminance within the region may be used to reduce the influence of lighting and the like. Above, the degree of correlation was obtained for each frame, but a quantity that reflects, for each pixel, the difference over the time series, that is, change (motion), such as the variance, may be used instead. Of the three states, completely still, moving slightly, and changing greatly, the greatly changing state can also be detected using, for example, the brightness of the object region. When the target region is small, the correlation calculation may be inaccurate; however, if the size on the image, converted to real space, is extremely small compared with the typical size of a person, the object cannot be a person and may be judged to be a thing.

Regarding parameter selection, detection and recognition accuracy is generally higher when motion is converted to motion in three-dimensional space than when motion on the image is used directly. If parameters reflecting a constant motion can be acquired at short time intervals, detection remains possible even when occlusion occurs frequently. When occlusion is less frequent, a longer time interval can be used for parameter acquisition.

As described above, in crowded places and the like, occlusion occurs frequently and observations may be missing for various reasons such as illumination changes; even in such cases, the present invention makes it possible to find suspicious objects and to track the suspicious person who left one behind. At station ticket gates, airports, and similar places, it can be used in surveillance systems for security, for detecting and tracking people who loiter, people who are drunk, people who look around furtively, people who crouch through or jump over a ticket gate to evade the fare, people who move suspiciously away from a designated line, and unattended suspicious objects. It is also useful for pedestrian detection using vehicle-mounted far-infrared camera images, and thus helps prevent serious accidents.

In homes, public buildings, on school routes, and the like, it can be used to build a watching system that detects and reports abnormalities, such as a sick person, child, or elderly person suddenly falling or crouching down, or the presence of a suspicious person nearby.

100 Network
101 Fixed sensor
102 Portable sensor
103 Control means

Claims

An object recognition system comprising:
observation means for observing a monitoring area and/or a watching area within a certain period of time;
parameter determination means for determining whether a parameter reflecting the content or attributes of the background and/or an object, obtained by observation by the observation means, is present within a predetermined range at or above a predetermined threshold; and
feature extraction means for, when the parameter determination means determines that the parameter is present at or above the predetermined threshold, setting the parameter within the predetermined range as the parameter to be obtained and thereby extracting the content and attributes of the background and/or the object.

A background determination unit that determines whether a background parameter that reflects the background obtained by observation of the observation unit exists within a predetermined range at a predetermined threshold value or more;
A background extraction unit for extracting the background by setting the background parameter within the predetermined range as a parameter indicating a background candidate area when the background determination unit determines that the parameter is present at or above the predetermined threshold;
The object recognition system according to claim 1, comprising:

Object existence position determination for determining whether or not the object existence position parameter reflecting the object existence position obtained by observation of the observation means is present within a predetermined range at a predetermined threshold or more. Means,
When the object presence position determination unit determines that the object exists at a predetermined threshold value or more, the object candidate region is extracted from the existence position parameter within the predetermined range, and the object is extracted from the object parameter. Object extraction means;
The object recognition system according to claim 1, comprising:

An object determination means for determining whether or not an object parameter reflecting the object obtained by observation of the observation means exists within a predetermined range at a predetermined threshold value or more;
When the object determination means determines that the object exists at a predetermined threshold value or more, the object is extracted by setting the object parameter within the predetermined range as a parameter indicating the object candidate area. Object extraction means;
The object recognition system according to claim 1, comprising:

Minute motion determination means for determining whether a minute motion parameter reflecting minute motion, obtained by observation by the observation means, is present within a predetermined range at or above a predetermined threshold and/or at or below a predetermined threshold;
Minute motion detection means capable of discriminating, according to the range in which the parameter determined by the minute motion determination means lies, whether the object is completely stationary or exhibits minute motion;
5. The object recognition system according to claim 1, comprising:

A portion indicating that the background determination means does not exist above a predetermined threshold is excluded from the background candidate area, and a background parameter within a predetermined range set by the background extraction means is used as a parameter indicating a background candidate area. Background estimation means for determining the background parameter from the background candidate region,
The object recognition system according to claim 1 or 2, further comprising:

A portion determined by the object determining unit not to exist at a predetermined threshold or more is removed from the object candidate region, and an object presence position parameter within a predetermined range set by the object extracting unit An object existence position estimating means for setting as a parameter indicating an object candidate area;
4. The object recognition system according to claim 1 or 3, wherein the object extraction means extracts an object region based on the parameter set by the object existence position estimation means and determines an existence position parameter.

The part determined that the object determining means does not exist at a predetermined threshold or more is excluded from the object candidate area, and the object parameter within the predetermined range set by the object extracting means is The object recognition system according to claim 1, further comprising object estimation means that is set as a parameter indicating a candidate area and determines an object parameter from the object candidate area.

Determination means for determining, from the temporal correlation degree of the object parameter in the object region extracted as a stationary object by the object extraction means, whether the correlation is high, medium, or low according to the range in which it lies;
Object judging means for judging that the object is a stationary thing if the determination means determines high, that the object is a moving object with small movement if it determines medium, and that the data of the object has been temporarily disturbed if it determines low;
An object recognition system according to any one of claims 1 to 8.

A monitoring system using the object recognition system according to claim 1.