JP2018136832A

JP2018136832A - Object image estimation device and object image determination device

Info

Publication number: JP2018136832A
Application number: JP2017032054A
Authority: JP
Inventors: Masahiro Maeda (前田 昌宏)
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2017-02-23
Filing date: 2017-02-23
Publication date: 2018-08-30
Anticipated expiration: 2037-02-23
Also published as: JP6548683B2

Abstract

PROBLEM TO BE SOLVED: In an object image estimation device that uses a three-dimensional model to estimate the visible region of an object in an image of a three-dimensional space, estimation accuracy deteriorates when the near side and the far side of the object are concealed differently, as with a person in a lying position. SOLUTION: Two types of projection distance of the line-of-sight vector of the imaging means are defined: a horizontal distance and a vertical distance, which are the lengths of the images obtained by orthogonally projecting the line-of-sight vector onto a horizontal plane and a vertical axis, respectively. Installed-object distance storage means 30 stores, in association with each pixel of the image, the two types of projection distance to an installed object as installed-object distances. Model distance calculation means 42 calculates, as a model distance, the projection distance to a candidate position of the object for the designated type corresponding to the candidate posture. Visible region estimation means 43 estimates, among the pixels constituting the model image of the three-dimensional model at the candidate position, each pixel whose installed-object distance of the designated type is greater than or equal to the model distance of the designated type of the three-dimensional model to be an object visible pixel. SELECTED DRAWING: Figure 2

Description

The present invention relates to an object image estimation device and an object image determination device that estimate an object visible region, that is, the region of an image in which the image of an object appears, and in particular to a device that estimates the visible region of an object that may be concealed in the image by installed objects such as fixtures.

It is common practice to extract the image of a monitored object such as a person from a monitoring image capturing a monitored space, and to determine the presence of the monitored object or track it based on the image features of that image. The whole of the monitored object cannot always be observed in the monitoring image; part of it is often concealed by fixtures and pillars in the monitored space, or by other persons. Such concealment changes image features such as the observed size and color components of the monitored object, and therefore causes detection and tracking failures.

To address this, concealment is simulated with a three-dimensional model to estimate the region of the monitored object that is not concealed in the monitoring image (the object visible region), and the presence of the object's image is determined from the image features of the object visible region. This prevents detection and tracking failures caused by variation in image features.

In addition, because projecting the three-dimensional model in this simulation imposes a heavy processing load and makes real-time detection and tracking difficult, the load is reduced by performing the projection in advance.

For example, in the object image determination devices described in Patent Documents 1 and 2, three-dimensional models of the object (such as a person) and of each installed object in the monitored space are projected in advance. The correspondence between each object model image or installed-object model image and the floor coordinates, together with a map of the floor positions at which an installed object causes concealment, is generated and stored in a storage unit. If the person position is a concealment position, the object visible region is obtained by removing, from the object model image corresponding to that person position, the part that overlaps the installed-object model image.

Patent Document 1: JP 2012-108574 A
Patent Document 2: JP 2012-155595 A

However, the conventional techniques assume a monitored object whose extent in depth is negligible, such as a standing person. Their estimation accuracy drops for a monitored object whose concealment state can differ between its near side and its far side, such as a person who has fallen (a person in a lying position).

Furthermore, because the conventional techniques store a model projection image for each installed object and a correspondence for each floor coordinate, the storage capacity has to be increased with the complexity and size of the monitored space, and the preprocessing effort becomes large whenever the layout of the installed objects changes.

Specifically, because the conventional techniques associate a concealment position on the floor with a two-dimensional installed-object model image standing at that position, a complex installed object with irregularities in the depth direction as seen from the camera requires setting multiple concealment positions and storing a different installed-object model image for each of them.

Also, because the conventional techniques store an installed-object model image for each installed object, the capacity for storing these model images has to grow as the number of installed objects increases.

Also, because the conventional techniques store, for each position in the monitored space, the presence or absence of concealment and the correspondence with an installed-object model image, the capacity for storing this information has to grow with the size of the monitored space.

The present invention has been made in view of the above problems. One object is to provide an object image estimation device that can estimate with high accuracy the non-concealed region of a monitored object that can take multiple postures, including a posture with a long extent in depth such as that of a lying person, and an object image determination device that can determine with high accuracy the presence of the image of the monitored object. Another object is to provide an object image estimation device and an object image determination device for which the storage capacity required for installed objects does not change even when the monitored space becomes more complex or larger.

(1) An object image estimation device according to the present invention estimates, using a three-dimensional model of an object in a predetermined three-dimensional space, an object visible region in which the image of the object appears in an image of the three-dimensional space captured by imaging means. Two types of projection distance of a line-of-sight vector of the imaging means are defined: a horizontal distance and a vertical distance, which are the lengths of the images obtained by orthogonally projecting the line-of-sight vector onto a horizontal plane and a vertical axis, respectively. The device comprises: installed-object distance storage means that stores, in association with each pixel of the image, the two types of projection distance of the line-of-sight vector corresponding to that pixel up to an installed object in the three-dimensional space, as installed-object distances; candidate setting means that sets a candidate posture of the object and a candidate position in the three-dimensional space at which the object may exist; model image output means that outputs a model image obtained by projecting the three-dimensional model in the candidate posture at the candidate position onto the coordinate system of the image; model distance calculation means that calculates, for at least one designated type predetermined for the candidate posture among the types of projection distance, the projection distance of the line-of-sight vector up to the candidate position as a model distance; and visible region estimation means that estimates, among the pixels constituting the model image of the three-dimensional model, each pixel whose installed-object distance of the designated type is greater than or equal to the model distance of the designated type of the three-dimensional model to be an object visible pixel.
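The per-pixel comparison performed by the visible region estimation means in (1) can be illustrated with a minimal Python sketch. The function name, the dictionary layout of the installed-object distances, and the string labels for the designated type are assumptions made for this illustration and do not appear in the specification.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (x, y) image coordinates


def estimate_visible_pixels(
    model_pixels: List[Pixel],
    installed_distance: Dict[Pixel, Tuple[float, float]],
    model_distance: float,
    designated_type: str,
) -> List[Pixel]:
    """Return the model-image pixels that are not concealed by an installed object.

    installed_distance maps each pixel to its (horizontal, vertical) projection
    distances up to the installed object seen along that pixel's line of sight
    (hypothetical layout chosen for this sketch).
    """
    index = {"horizontal": 0, "vertical": 1}[designated_type]
    visible = []
    for pixel in model_pixels:
        # Visible when the installed object is at or beyond the model distance.
        if installed_distance[pixel][index] >= model_distance:
            visible.append(pixel)
    return visible
```

For a standing candidate the call would pass `"horizontal"` and the horizontal model distance; for a lying candidate it would pass `"vertical"` and the vertical model distance.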

(2) In the object image estimation device according to (1) above, the object may be a person, and the horizontal distance may be defined as the designated type of projection distance for the case where the candidate posture is set to the standing position.

(3) In the object image estimation device according to (1) and (2) above, the object may be a person, and the vertical distance may be defined as the designated type of projection distance for the case where the candidate posture is set to the lying position.

(4) An object image determination device according to the present invention comprises the object image estimation device according to any one of (1) to (3) above, and object determination means that determines the presence of the image of the object from the image features of the object visible pixels in the image.

According to the present invention, the presence of the image of a monitored object that can take multiple postures can be determined with high accuracy. Further, according to the present invention, the presence of the image of the monitored object can be determined quickly and with high accuracy without increasing the storage capacity required for installed objects, even when the monitored space becomes more complex or larger.

Block diagram of the moving object tracking device according to an embodiment of the present invention.
Schematic functional block diagram of the moving object tracking device according to the embodiment of the present invention.
Functional block diagram of the model image output means.
Schematic diagram of the monitored space for explaining the specific definition of the model distance for each candidate posture.
Schematic cross-sectional view of the monitored space for explaining why, when judging concealment by an installed object, the position of the three-dimensional model is expressed as a distance along a plane perpendicular to the principal axis of the model.
Schematic diagram for explaining a specific example of the installed-object distances.
Schematic diagram explaining an example of processing by the visible region estimation means when the candidate posture is the standing position.
Schematic diagram explaining an example of processing by the visible region estimation means when the candidate posture is the lying position.
Schematic flowchart of the tracking process of the moving object tracking device according to the embodiment of the present invention.
Schematic flowchart of the tracking process of the moving object tracking device according to the embodiment of the present invention.
Schematic flowchart of the installed-object concealment estimation process.

A moving object tracking device 1, which is an embodiment of the present invention (hereinafter, "the embodiment"), is described below with reference to the drawings. The moving object tracking device 1 can take as its monitored space an indoor or outdoor three-dimensional space in which installed objects exist, such as a room furnished with fixtures, and treats a person moving in the monitored space as the tracking target (hereinafter, "the object"). The moving object tracking device 1 processes monitoring images capturing the monitored space to detect and track objects. Installed objects in the monitored space, such as fixtures, do not move as objects do, and their positions are known in advance. Other examples of installed objects include pillars and water heaters. From the viewpoint of image processing, an installed object is an occluder that can conceal the image of an object. Note that objects other than the object of interest can also act as occluders.

[Configuration of Moving Object Tracking Device 1]
FIG. 1 is a block diagram of the moving object tracking device 1 according to the embodiment. The moving object tracking device 1 comprises an imaging unit 2, a storage unit 3, an image processing unit 4, and an output unit 5. The imaging unit 2, the storage unit 3, and the output unit 5 are connected to the image processing unit 4.

The imaging unit 2 is a surveillance camera installed so as to view the monitored space, and captures the monitored space at predetermined time intervals. The captured monitoring images are output sequentially to the image processing unit 4. To grasp the position and movement of persons, who move almost exclusively along a reference plane such as a floor or the ground, the imaging unit 2 is basically installed at a height from which it can capture persons from above. In this embodiment, for example, the moving object tracking device 1 is used for indoor monitoring and the imaging unit 2 is installed on the ceiling. The interval at which monitoring images are captured is, for example, 1/5 second. Hereinafter, the unit of time marked by this imaging interval is referred to as a time step.

The storage unit 3 is a storage device such as a ROM (Read Only Memory) or RAM (Random Access Memory). It stores various programs and data, and exchanges this information with the image processing unit 4. The data include data on the models of the object and the installed objects, and camera parameters.

The image processing unit 4 is built from a computing device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), or MCU (Micro Control Unit), and is connected to the imaging unit 2, the storage unit 3, and the output unit 5. By reading programs from the storage unit 3 and executing them, the image processing unit 4 functions as each of the means described below.

The output unit 5 includes audio output means such as a speaker or buzzer that emits a warning sound, and display means such as a liquid crystal display or CRT that displays the monitoring image in which an abnormality was determined. When an alarm signal is input from the image processing unit 4, the output unit 5 reports the occurrence of the abnormality externally. The output unit 5 may also include communication means that transmits the alarm signal over a communication line to a center device installed in the monitoring center of a security company.

FIG. 2 is a schematic functional block diagram of the moving object tracking device 1. In the configuration shown in FIG. 2, the storage unit 3 functions as installed-object distance storage means 30. The image processing unit 4 functions as hypothesis setting means 40, model image output means 41, model distance calculation means 42, visible region estimation means 43, object determination means 44, and abnormality determination means 45.

The hypothesis setting means 40 (candidate setting means) performs motion prediction from the object position of each object determined in the past, or from the candidate positions of each object set in the past, to predict where the object is located in the newly input monitoring image. It outputs the predicted positions (candidate positions) to the model image output means 41 and the model distance calculation means 42.

For example, the positions can be predicted with a method known as a particle filter. This method sequentially sets a large number of candidate positions for each object (the number is denoted P; for example, 200 per object) and determines the position of the object (the object position) probabilistically. Each candidate position that is set is called a hypothesis. Candidate positions could be set in the xy coordinate system of the monitoring image, but in this embodiment they are set in the orthogonal coordinate system (XYZ coordinate system) of the monitored space. Motion prediction is performed by applying a predetermined motion model either to the past object positions ((1) below) or to the past candidate positions ((2) below).

(1) Prediction from object positions
An average velocity vector is calculated from the object positions over the T time steps preceding the time of interest (for example, T = 5). This average velocity vector is added to the object position one time step earlier to predict the object position at the time of interest. P candidate positions are then set at random within a predetermined range centered on the predicted object position. With this method, the object positions of the past T time steps are kept in a circular buffer in the storage unit 3.
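The prediction in (1) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: the history is assumed to hold at least two positions, the "predetermined range" is assumed to be a square of half-width `radius` on the floor plane with height unchanged, and all function and parameter names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) in the monitored space


def predict_object_position(history: List[Point]) -> Point:
    """Add the mean per-step velocity over the history to the latest position."""
    steps = len(history) - 1
    # The mean of consecutive differences equals (last - first) / steps.
    velocity = tuple((history[-1][i] - history[0][i]) / steps for i in range(3))
    return tuple(history[-1][i] + velocity[i] for i in range(3))


def scatter_candidates(
    center: Point, count: int, radius: float, rng: random.Random
) -> List[Point]:
    """Draw `count` candidate positions uniformly in a square around `center`.

    The square range on the XY plane is an assumption of this sketch.
    """
    return [
        (
            center[0] + rng.uniform(-radius, radius),
            center[1] + rng.uniform(-radius, radius),
            center[2],
        )
        for _ in range(count)
    ]
```

With T = 5 stored positions and P = 200, `scatter_candidates(predict_object_position(history), 200, radius, rng)` yields the candidate positions for the time of interest.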

(2) Prediction from candidate positions
An average velocity vector is calculated from the candidate positions over the T time steps preceding the time of interest (for example, T = 5). This average velocity vector is added to the candidate position one time step earlier to predict a new candidate position at the time of interest. The prediction is performed for each of the P candidate positions, and each new candidate position is given the same identifier as the past candidate positions it derives from and stored in the circular buffer. Candidate positions from one time step earlier whose likelihood is below a preset likelihood threshold are deleted. To make up for the deletions, the same number of new candidate positions for one time step earlier are set at random within a predetermined range centered on the object position predicted for one time step earlier, and the corresponding candidate positions for two or more time steps earlier are obtained by extrapolation following the motion of the past object positions. For this purpose, the object positions of the past T time steps are stored in the circular buffer in the storage unit 3 in addition to the past candidate positions.
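One possible reading of procedure (2) is sketched below. Several details are assumptions made for illustration: positions are reduced to floor-plane (X, Y) pairs, the last entry of `object_history` stands in for the object position predicted for one time step earlier, and the back-extrapolated history of a replacement candidate is obtained by shifting the whole object history by a single random offset. The function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # floor-plane (X, Y); height is omitted for brevity


def step_candidates(
    tracks: Dict[int, List[Point]],
    likelihoods: Dict[int, float],
    threshold: float,
    object_history: List[Point],
    radius: float,
    rng: random.Random,
) -> Dict[int, List[Point]]:
    """Advance every candidate track by one time step."""
    # 1. Delete candidates whose likelihood one step earlier is below the threshold.
    kept = {cid: hist for cid, hist in tracks.items() if likelihoods[cid] >= threshold}

    # 2. Replenish the deleted ones: shift the object history by one random
    #    offset so the new track follows the past object motion (assumption).
    next_id = max(tracks, default=-1) + 1
    for _ in range(len(tracks) - len(kept)):
        dx, dy = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        kept[next_id] = [(x + dx, y + dy) for x, y in object_history]
        next_id += 1

    # 3. Add each track's mean velocity to its latest position, keeping the
    #    identifier and a fixed-length history (circular-buffer behaviour).
    advanced = {}
    for cid, hist in kept.items():
        steps = len(hist) - 1
        vx = (hist[-1][0] - hist[0][0]) / steps
        vy = (hist[-1][1] - hist[0][1]) / steps
        advanced[cid] = hist[1:] + [(hist[-1][0] + vx, hist[-1][1] + vy)]
    return advanced
```

The number of candidates stays at P across steps because every deleted track is replaced by exactly one new track.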

When a new object is detected, the hypothesis setting means 40 sets P candidate positions at random within a predetermined range centered on the detected position.

The hypothesis setting means 40 also includes a candidate posture in each hypothesis. For example, for each object, a standing posture (standing position) is set as the candidate posture of half of the hypotheses, and a fallen posture (lying position) is set as the candidate posture of the other half.

Alternatively, for an object whose current posture can be predicted from, for example, the shape of a cluster of changed pixels (a change region) or the magnitude of its movement speed, the hypothesis setting means 40 can set more hypotheses with the predicted posture than with other postures. For example, for a tracked person around whose candidate positions a vertically elongated change region has been extracted, the standing position is set as the candidate posture of 90% of the hypotheses and the lying position as that of the remaining 10%.
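The allocation of candidate postures across the P hypotheses can be sketched in a few lines. The function name and the posture labels are assumptions for this illustration; the fractions 0.5 and 0.9 are the examples given in the text.

```python
from typing import List


def assign_postures(num_hypotheses: int, standing_fraction: float = 0.5) -> List[str]:
    """Split the hypotheses of one object between the two candidate postures."""
    n_standing = round(num_hypotheses * standing_fraction)
    return ["standing"] * n_standing + ["lying"] * (num_hypotheses - n_standing)
```

Calling `assign_postures(200)` gives the even split, while `assign_postures(200, 0.9)` corresponds to the case in which a vertically elongated change region suggests a standing person.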

The model image output means 41 outputs to the visible region estimation means 43 a model image obtained by projecting the three-dimensional model of the object at the candidate position onto the coordinate system of the monitoring image. FIG. 3 is a functional block diagram of the model image output means 41 and shows two example configurations. In the configurations shown in FIG. 3, three-dimensional model storage means 300, camera parameter storage means 301, and model image storage means 302 are realized by the storage unit 3, and model image generation means 420 and model image reading means 421 are realized by the image processing unit 4.

The model image output means 41 shown in FIG. 3(A) projects the three-dimensional model to generate a model image each time the visible region of the object is estimated, and consists of the three-dimensional model storage means 300, the camera parameter storage means 301, and the model image generation means 420. The three-dimensional model storage means 300 stores the three-dimensional model of the object, and the camera parameter storage means 301 stores the camera parameters of the imaging means. The model image generation means 420 reads the three-dimensional model and the camera parameters from the three-dimensional model storage means 300 and the camera parameter storage means 301. Using the camera parameters, it then projects the three-dimensional model of the object at the candidate position onto the coordinate system of the monitoring image to generate and output a model image.

The model image output means 41 shown in FIG. 3(B) reads out model images obtained by projecting the three-dimensional model in advance, and consists of the model image storage means 302 and the model image reading means 421. The model image storage means 302 stores a model image for each position in the three-dimensional space, and the model image reading means 421 reads the model image stored for the candidate position from the model image storage means 302 and outputs it.

Alternatively, as a form intermediate between these two, model images at representative positions in the three-dimensional space can be stored, and the model image for a candidate position can be generated and output by applying a transformation such as enlargement or reduction to the model image at a representative position according to the relationship between the candidate position and that representative position.

The model distance calculation means 42 receives a candidate position and a candidate posture as a hypothesis from the hypothesis setting means 40, calculates the projection distance from the imaging means to the candidate position, and outputs it to the visible region estimation means 43. Here, for each candidate posture, let the principal-axis vector space be the linear subspace of the monitored three-dimensional space in which the longitudinal principal-axis vector of the three-dimensional model is defined. The projection distance is then defined as the magnitude of the component of the line-of-sight vector of the imaging means that is orthogonal to the principal-axis vector space. The projection distance calculated by the model distance calculation means 42 is the distance for the line-of-sight vector that starts at the viewpoint of the camera serving as the imaging means and ends at the candidate position. To distinguish it from the projection distance for the line-of-sight vector from the imaging means to an installed object, described later, this projection distance from the imaging means to the candidate position is called the model distance.
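The general definition above (the magnitude of the component of the line-of-sight vector orthogonal to the principal-axis vector space) can be illustrated with a short sketch. It assumes that the principal-axis vector space is supplied as an orthonormal basis; the function name and argument layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def projection_distance(sight: Vector, axis_basis: Sequence[Vector]) -> float:
    """Length of the part of `sight` orthogonal to the span of `axis_basis`.

    `axis_basis` must be an orthonormal basis of the principal-axis vector
    space (an assumption of this sketch).
    """
    residual = list(sight)
    for basis in axis_basis:
        dot = sum(s * b for s, b in zip(sight, basis))
        # Remove the component lying along this basis vector.
        residual = [r - dot * b for r, b in zip(residual, basis)]
    return math.sqrt(sum(r * r for r in residual))
```

For a camera at (0, 0, 3) and a candidate position at (3, 4, 0), the line-of-sight vector is (3, 4, -3). With the vertical axis as the basis the function returns the horizontal distance 5.0, and with the two horizontal axes as the basis it returns the vertical distance 3.0.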

Accordingly, the model distance calculation means 42 switches the type of model distance according to the candidate posture of the object input from the hypothesis setting means 40. FIG. 4 is a schematic diagram of the monitored space for explaining the specific definition of the model distance for each candidate posture; FIG. 4(A) is a perspective view and FIG. 4(B) is a cross-sectional view along the YZ plane. The case where the candidate posture is the standing position and the case where it is the lying position are described in turn below.

(1) When the candidate posture is the standing position
The principal axis 601 of the three-dimensional model 600 of a standing person is parallel to the Z axis, that is, vertical, so the principal-axis vector is confined to the one-dimensional space formed by the vertical line, and this is the principal-axis vector space. In this case the XY plane, which is perpendicular to the principal axis 601, is the orthogonal complement of the principal-axis vector space, and the magnitude of the line-of-sight vector within this XY plane is the projection distance. Here the projection distance is defined as the length of the image obtained by orthogonally projecting the line-of-sight vector onto the horizontal plane 610 (in the example of FIG. 4, the floor of the room that is the monitored space). The model distance calculation means 42 therefore calculates, for the three-dimensional model 600 with the vertical principal axis 601, the horizontal distance d_HM1 from the imaging means to the candidate position P_1 as the model distance.

Let the coordinates of the position C of the imaging means in the three-dimensional space expressed in the orthogonal coordinate system XYZ be (X_C, Y_C, Z_C), and let the coordinates of the candidate position P_1 of the standing person be (X_1, Y_1, Z_1). The horizontal distance d_HM1, which is the model distance, is then given by the following equation.
d_HM1 = {(X_1 - X_C)^2 + (Y_1 - Y_C)^2}^{1/2}   ... (1)

The model image reading means 421 may also take the thickness 2W_1 of the three-dimensional model 600 into account and calculate the model distance with the candidate position P_1 defined as the point 602, which is closer to the imaging means than the main axis 601 of the three-dimensional model 600 by W_1. This definition is useful, for example, when walls are included among the installation objects. Including this example, the positional relationship between the candidate position P_1 and the three-dimensional model 600 may be defined by shifting the candidate position forward or backward from the main axis 601 within the thickness of the three-dimensional model 600. This is because, within that range, the magnitude relationship between the model distance and the installation object distance described later is preserved, so the visible region estimation means 43 described later can correctly determine whether concealment occurs.

(2) When the candidate posture is lying
The main axis 621 of the three-dimensional model 620 of a lying person is parallel to the XY plane, that is, horizontal, and its azimuth can vary, so the main axis vector is confined to a two-dimensional space that is a horizontal plane, and this space is the main axis vector space. In this case the vertical axis is the orthogonal complement of the main axis vector space, and the projection distance is defined as the vertical distance, which is the length of the image obtained by orthogonally projecting the line-of-sight vector onto the vertical axis.

Accordingly, for the three-dimensional model 620 having the horizontal main axis 621, the model distance calculation means 42 calculates the vertical distance d_VM2 of the line-of-sight vector from the imaging means (point C) to the candidate position P_2 as the model distance. Letting the coordinates of the candidate position P_2 be (X_2, Y_2, Z_2), the vertical distance d_VM2 is given by the following equation.
d_VM2 = |Z_2 - Z_C|   ... (2)

The model distance calculation means 42 may also take the thickness 2W_2 of the three-dimensional model 620 into account and calculate the model distance with the candidate position P_2 defined as the point 622, which is closer to the imaging means than the main axis 621 of the three-dimensional model 620 by W_2. This definition is useful when the floor is included among the installation objects. Including this example, and for the same reason as in the case of the horizontal distance, the positional relationship between the candidate position P_2 and the three-dimensional model 620 may be defined by shifting the candidate position forward or backward (up or down) from the main axis 621 within the thickness of the three-dimensional model 620.

Taking the horizontal distance and the vertical distance, which are the lengths of the images obtained by orthogonally projecting the line-of-sight vector of the imaging means onto the horizontal plane and the vertical axis respectively, as the two types of projection distance of the line-of-sight vector, the model distance calculation means 42, as described above, calculates as the model distance the projection distance of the line-of-sight vector from the imaging means to the candidate position for at least one designated type that is predetermined for the candidate posture among the types of projection distance. Specifically, in this embodiment the designated type of projection distance is the horizontal distance when the candidate posture is standing and the vertical distance when the candidate posture is lying.
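The switching of the model distance by candidate posture can be sketched as follows. This is a minimal illustration of equations (1) and (2), not the implementation of the embodiment; the function names, the posture labels "standing" and "lying", and the coordinate values are assumptions introduced here.

```python
import math

def horizontal_distance(camera, point):
    """Equation (1): length of the line-of-sight vector projected onto the XY plane."""
    return math.hypot(point[0] - camera[0], point[1] - camera[1])

def vertical_distance(camera, point):
    """Equation (2): length of the line-of-sight vector projected onto the Z axis."""
    return abs(point[2] - camera[2])

def model_distance(camera, candidate_position, candidate_posture):
    """Return the designated type of projection distance for the given posture."""
    if candidate_posture == "standing":
        return horizontal_distance(camera, candidate_position)
    if candidate_posture == "lying":
        return vertical_distance(camera, candidate_position)
    raise ValueError(f"unknown candidate posture: {candidate_posture!r}")

# Hypothetical values: imaging means C at a height of 3, candidate position on the floor.
camera = (0.0, 0.0, 3.0)
candidate = (3.0, 4.0, 0.0)
d_standing = model_distance(camera, candidate, "standing")  # horizontal distance
d_lying = model_distance(camera, candidate, "lying")        # vertical distance
```

The same candidate position thus yields a different model distance depending on which posture hypothesis is being tested.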

The method of calculating the model distance according to the candidate posture has been described above. In this method, the horizontal distance and the vertical distance are both distances along a plane perpendicular to the main axis of the three-dimensional model. That is, for either candidate posture, the position of the three-dimensional model used when determining concealment by an installation object is captured as a distance along a plane perpendicular to the main axis of the three-dimensional model. The significance of this method is explained below with reference to the schematic diagram of FIG. 5.

FIG. 5 is a schematic cross-sectional view, along the YZ plane, of the monitoring space in the arrangement example of the three-dimensional models 600 and 620 shown in FIG. 4(A). The candidate positions P_1 and P_2 corresponding to the three-dimensional models 600 and 620 are located at the center of the respective models on the horizontal plane 610. For the standing three-dimensional model 600, the points P_1a and P_1b, whose positions in the direction along the main axis 601 (that is, their Z coordinates) differ, are shown as points for comparison with the candidate position P_1. Whereas the candidate position P_1 is the point where the three-dimensional model 600 touches the floor, the point P_1a is the intersection of the surface of the three-dimensional model 600 with the line-of-sight vector directed toward the middle of its height, and the point P_1b is located at the head of the three-dimensional model 600. For the lying three-dimensional model 620, the points P_2a and P_2b, whose positions in the direction along the main axis 621 (that is, their Y coordinates) differ, are shown as points for comparison with the candidate position P_2. Whereas the candidate position P_2 is the point where the three-dimensional model 620 touches the floor at the middle of its longitudinal direction, the point P_2a is the intersection of the three-dimensional model 620 with the line-of-sight vector directed toward its head, a part closer to the imaging means than the candidate position P_2, and the point P_2b is the intersection of the three-dimensional model 620 with the line-of-sight vector directed toward its legs, a part farther from the imaging means than the candidate position P_2.

As can be seen in FIG. 5, the distance from the camera to the object varies with the direction of the line-of-sight vector. In other words, the distance can differ from pixel to pixel in the monitoring image even for the same object. Calculating the distance for every pixel of the same object makes the processing load excessive, whereas using the distance for the candidate position in common for all pixels of the same object reduces the processing load. On the other hand, replacing distances that can inherently differ from pixel to pixel with a single common distance naturally introduces errors and increases the likelihood that an incorrect concealment state is estimated.

In this regard, the present invention expresses the magnitude of the line-of-sight vector from the camera to the candidate position as a distance in a direction orthogonal to the main axis of the three-dimensional model, and uses that distance in common for all pixels of the same object, thereby reducing the error described above.

That is, suppose the difference in length among the line-of-sight vectors to the points of the three-dimensional model of the same object is divided into the contribution from differences in the vertical component of the line-of-sight vector (the vertical difference contribution) and the contribution from differences in the horizontal component (the horizontal difference contribution). For the standing three-dimensional model 600, the horizontal size (the width of the person) is smaller than the vertical size (the height of the person who is the object), so the horizontal difference contribution is smaller than the vertical difference contribution. For example, for the three line-of-sight vectors CP_1, CP_1a, and CP_1b to the three-dimensional model 600, the difference in the component orthogonal to the main axis 601 (the Y-axis direction in FIG. 5) is about the thickness of the three-dimensional model 600, which is smaller than the difference of about the height of the three-dimensional model 600 that arises in the component parallel to the main axis 601 (the Z-axis direction in FIG. 5). Because a distance in a direction orthogonal to the main axis of the three-dimensional model, such as the horizontal distance, excludes the Z-axis component, it varies less over the whole three-dimensional model 600 than distances defined to include the Z-axis component, such as the length of the line-of-sight vector itself or the length of the image obtained by orthogonally projecting the line-of-sight vector onto the Z axis. Therefore, when the candidate posture is standing, the error described above can be reduced by using, for example, the horizontal distance of the candidate position P_1 given by equation (1) in common for all pixels of the same object.

For the lying three-dimensional model 620, on the other hand, the vertical size is smaller than the horizontal size, so the vertical difference contribution is smaller than the horizontal difference contribution. For example, for the three line-of-sight vectors CP_2, CP_2a, and CP_2b to the three-dimensional model 620, the difference in the component orthogonal to the main axis 621 (the Z-axis direction in FIG. 5) is about the height of the three-dimensional model 620 (the thickness of the person), which is smaller than the difference of about the length of the three-dimensional model 620 (the height of the person) that arises in the component parallel to the main axis 621 (the Y-axis direction in FIG. 5). Because a distance in a direction orthogonal to the main axis of the three-dimensional model, such as the vertical distance, excludes the Y-axis component, it varies less over the whole three-dimensional model 620 than distances defined to include the Y-axis component, such as the length of the line-of-sight vector itself or the length of the image obtained by orthogonally projecting the line-of-sight vector onto the Y axis. Therefore, when the candidate posture is lying, the error described above can be reduced by using, for example, the vertical distance of the candidate position P_2 given by equation (2) in common for all pixels of the same object.
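The error-reduction argument can be checked numerically with a small sketch. The coordinates below are hypothetical stand-ins for the points P_1, P_1a, P_1b and P_2, P_2a, P_2b of FIG. 5 (the figure gives no numerical values), and the helper names are assumptions made for this illustration. The sketch compares how much each kind of distance varies across the points of one model.

```python
import math

def horizontal_distance(camera, point):
    """Length of the line-of-sight vector projected onto the XY plane."""
    return math.hypot(point[0] - camera[0], point[1] - camera[1])

def vertical_distance(camera, point):
    """Length of the line-of-sight vector projected onto the Z axis."""
    return abs(point[2] - camera[2])

def euclidean_distance(camera, point):
    """Length of the line-of-sight vector itself."""
    return math.dist(camera, point)

def spread(distance, camera, points):
    """Largest minus smallest distance over the points of one model."""
    values = [distance(camera, p) for p in points]
    return max(values) - min(values)

# Hypothetical layout in the YZ plane (X = 0), camera at a height of 2.5.
camera = (0.0, 0.0, 2.5)
standing_points = [(0.0, 4.0, 0.0), (0.0, 3.8, 0.85), (0.0, 4.0, 1.7)]  # foot, mid-height surface, head
lying_points = [(0.0, 4.0, 0.0), (0.0, 3.2, 0.2), (0.0, 4.8, 0.2)]      # center, head side, leg side

standing_horizontal = spread(horizontal_distance, camera, standing_points)
standing_euclidean = spread(euclidean_distance, camera, standing_points)
lying_vertical = spread(vertical_distance, camera, lying_points)
lying_horizontal = spread(horizontal_distance, camera, lying_points)
```

With these assumed values, the distance orthogonal to the main axis varies by roughly the model thickness in both postures, while the alternatives vary by considerably more, which is the behavior the text describes.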

The installation object distance storage means 30 stores, in association with each pixel of the monitoring image, the projection distance to the installation object in the three-dimensional space along the line-of-sight vector corresponding to that pixel. With this configuration, the storage capacity required for installation objects is determined by the number of pixels of the monitoring image and is unaffected by increases in the number of installation objects present in the monitoring space or in the complexity or size of the monitoring space.

In this embodiment, different types of projection distance are used according to the candidate posture. Accordingly, the installation object distance storage means 30 stores a horizontal distance for use when the candidate posture is standing (where the model distance is the horizontal distance between the imaging means and the candidate position) and a vertical distance for use when the candidate posture is lying. That is, the installation object distance storage means 30 stores, in association with each pixel of the monitoring image, the two types of projection distance to the installation object in the three-dimensional space along the line-of-sight vector corresponding to that pixel, as the installation object distance.

To distinguish it from the model distance, which is the horizontal or vertical distance for the line-of-sight vector from the imaging means to the candidate position as described above, the horizontal or vertical distance stored in the installation object distance storage means 30 for the line-of-sight vector from the imaging means to an installation object is referred to as the installation object distance.

The installation object distance stored in the installation object distance storage means 30 may be calculated in advance by simulation using a three-dimensional model of the installation objects, or may be calculated in advance from measured values obtained with a three-dimensional measuring instrument or the like.

FIG. 6 is a schematic diagram for explaining a specific example of the installation object distance. FIG. 6(A) is a perspective view of the monitoring space in which the installation object 700 is present, and shows the imaging plane 710 onto which the points P_3 to P_5 on the surface of the installation object 700 are projected. The horizontal distances stored in the installation object distance storage means 30 form a horizontal distance group 720 with as many entries as the monitoring image has pixels, and FIG. 6(B) schematically represents the horizontal distance group 720 as an image in the same xy coordinate system as the monitoring image. Similarly, the vertical distances stored in the installation object distance storage means 30 form a vertical distance group 730 with as many entries as the monitoring image has pixels, and FIG. 6(C) schematically represents the vertical distance group 730 as an image in the same xy coordinate system as the monitoring image.

As shown in FIG. 6(A), the projection points on the imaging plane 710 corresponding to the points P_3 to P_5 on the surface of the installation object 700 are Q_3 to Q_5, and these projection points are shown as the pixels Q_3 to Q_5 in the horizontal distance group 720 and the vertical distance group 730. In the horizontal distance group 720 shown in FIG. 6(B), the horizontal distances d_HF3 to d_HF5 shown in FIG. 6(A) are stored for the pixels Q_3 to Q_5, respectively. FIG. 6(B) also shows, as d_HF6, the horizontal distance of a pixel Q_6 in which no installation object appears. In the vertical distance group 730 shown in FIG. 6(C), the vertical distances d_VF3 to d_VF5 shown in FIG. 6(A) are stored for the pixels Q_3 to Q_5, respectively. FIG. 6(C) also shows, as d_VF7, the vertical distance of a pixel Q_7 in which no installation object appears.

Specifically, let the coordinates in the three-dimensional space of the point P_3 on the surface of the installation object 700 be (X_3, Y_3, Z_3), and let (x_3, y_3) be the coordinates (that is, the pixel position) on the imaging plane 710 of the intersection of the imaging plane 710 with the straight line connecting the position C of the imaging means and the point P_3, that is, of the projection point Q_3 of the point P_3 onto the xy plane. The installation object distance storage means 30 then stores, in association with the coordinates (x_3, y_3) of the pixel Q_3, the following horizontal distance d_HF3 and vertical distance d_VF3.
d_HF3 = {(X_3 - X_C)^2 + (Y_3 - Y_C)^2}^{1/2}
d_VF3 = |Z_3 - Z_C|

The same applies to the other two points P_4 and P_5 on the surface of the installation object 700. In association with the coordinates (x_4, y_4) of the pixel Q_4, the installation object distance storage means 30 stores the following horizontal distance d_HF4 and vertical distance d_VF4.
d_HF4 = {(X_4 - X_C)^2 + (Y_4 - Y_C)^2}^{1/2}
d_VF4 = |Z_4 - Z_C|
In association with the coordinates (x_5, y_5) of the pixel Q_5, the installation object distance storage means 30 stores the following horizontal distance d_HF5 and vertical distance d_VF5.
d_HF5 = {(X_5 - X_C)^2 + (Y_5 - Y_C)^2}^{1/2}
d_VF5 = |Z_5 - Z_C|

For a pixel whose line-of-sight direction contains no installation object, such as the horizontal distance d_HF6 illustrated for the pixel Q_6 in FIG. 6(B), the horizontal distance is set to a value that is always greater than the model distance. For example, the value of d_HF6 can be set to a value equal to or greater than the maximum length of a line segment that fits in the monitoring space (imaging range) or of a line-of-sight vector that can be set in the monitoring space. Alternatively, if the monitoring space is a room, the walls can be treated as equivalent to installation objects, in which case d_HF6 can be the horizontal distance from the imaging means to the wall.

Similarly, for a pixel whose line-of-sight direction contains no installation object, such as the vertical distance d_VF7 illustrated for the pixel Q_7 in FIG. 6(C), the vertical distance is also set to a value that is always greater than the model distance. For example, the value of d_VF7 can be set to a value equal to or greater than the maximum length of a line segment that fits in the monitoring space (imaging range) or of a line-of-sight vector that can be set in the monitoring space. Alternatively, the floor or the ground can be treated as equivalent to an installation object, in which case d_VF7 can be the vertical distance from the imaging means to the floor.
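The construction of the horizontal distance group 720 and the vertical distance group 730 can be sketched as below. The sketch assumes that the surface point seen at each pixel has already been obtained (by simulation or measurement, as the text allows) and is supplied as a grid; the function name, the grid layout, and the value used for pixels without an installation object are assumptions introduced for illustration.

```python
import math

def build_installation_distance_maps(camera, surface_points, no_object_value):
    """Build per-pixel horizontal and vertical installation object distances.

    surface_points[y][x] is the (X, Y, Z) installation object surface point seen
    at pixel (x, y), or None when no installation object lies in that direction.
    no_object_value must exceed every possible model distance.
    """
    horizontal_map, vertical_map = [], []
    for row in surface_points:
        h_row, v_row = [], []
        for point in row:
            if point is None:
                # No installation object: store a value larger than any model distance.
                h_row.append(no_object_value)
                v_row.append(no_object_value)
            else:
                h_row.append(math.hypot(point[0] - camera[0], point[1] - camera[1]))
                v_row.append(abs(point[2] - camera[2]))
        horizontal_map.append(h_row)
        vertical_map.append(v_row)
    return horizontal_map, vertical_map

# Hypothetical 1 x 2 image: one pixel sees a surface point, the other sees none.
camera = (0.0, 0.0, 3.0)
points = [[(3.0, 4.0, 1.0), None]]
horizontal_map, vertical_map = build_installation_distance_maps(camera, points, 100.0)
```

Both maps have one entry per pixel, so their size depends only on the image resolution, in line with the storage-capacity remark in the text.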

The visible region estimation means 43 estimates, as object visible pixels, those pixels among the pixels constituting the model image of the three-dimensional model for which the installation object distance stored in the installation object distance storage means 30 for the pixel is equal to or greater than the model distance of the three-dimensional model. Specifically, the visible region estimation means 43 receives the model image corresponding to a hypothesis (that is, a candidate position and a candidate posture) from the model image output means 41 and the model distance corresponding to that hypothesis from the model distance calculation means 42, and estimates the object visible region, which is the part of the model image not concealed by installation objects. In this estimation process, among the pixels constituting the model image, a pixel whose installation object distance is equal to or greater than the model distance is taken as an object visible pixel, and a pixel whose installation object distance is less than the model distance is taken as a concealed pixel hidden by an installation object. The visible region estimation means 43 outputs the object visible region consisting of the estimated object visible pixels to the object determination means 44.

As described above, the type of model distance (the designated type) switches according to the candidate posture. The visible region estimation means 43 reads from the installation object distance storage means 30, and uses, the installation object distance of the same designated type as the model distance. When the candidate posture is standing, that is, when the model distance is a horizontal distance, the visible region estimation means 43 estimates, for each pixel constituting the model image, a pixel whose horizontal installation object distance is equal to or greater than the model distance to be a pixel of the object visible region, and a pixel whose horizontal installation object distance is less than the model distance to be a concealed pixel.
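The comparison rule can be sketched as follows. This is an illustrative sketch under assumed data structures: the model image is given as a set of (x, y) pixel coordinates, the installation object distances as two grids keyed by "horizontal" and "vertical", and the posture labels "standing" and "lying" are names chosen here, not taken from the embodiment.

```python
def estimate_visible_pixels(model_pixels, installation_maps, model_distance, candidate_posture):
    """Split model image pixels into object visible pixels and concealed pixels."""
    # Select the installation object distance of the same type as the model distance.
    distance_type = {"standing": "horizontal", "lying": "vertical"}[candidate_posture]
    distance_map = installation_maps[distance_type]

    visible, concealed = set(), set()
    for (x, y) in model_pixels:
        if distance_map[y][x] >= model_distance:
            visible.add((x, y))    # installation object is not in front of the model
        else:
            concealed.add((x, y))  # installation object hides the model at this pixel
    return visible, concealed

# Hypothetical 1 x 2 maps: pixel (0, 0) sees an installation object, pixel (1, 0) does not.
installation_maps = {
    "horizontal": [[2.0, 100.0]],
    "vertical": [[1.5, 100.0]],
}
model_pixels = {(0, 0), (1, 0)}
visible, concealed = estimate_visible_pixels(model_pixels, installation_maps, 5.0, "standing")
```

Because a single model distance is compared against a precomputed map, the per-pixel work is one lookup and one comparison.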

FIG. 7 is a schematic diagram illustrating an example of processing by the visible region estimation means 43 when the candidate posture is standing. For example, in response to the candidate position and candidate posture output by the hypothesis setting means 40, the model image output means 41 inputs to the visible region estimation means 43 a model image 800 obtained by projecting the standing three-dimensional model 600, placed at the candidate position P_1 shown in FIG. 4, onto the imaging plane xy (upper part of FIG. 7). The pixels constituting the model image 800 include the pixel Q_3, whose line-of-sight direction is toward the point P_3 on the surface of the installation object 700 shown in FIG. 6, and the pixel Q_6, whose line-of-sight direction is toward a point with no installation object. In addition, in response to the candidate position and candidate posture output by the hypothesis setting means 40, the model distance calculation means 42 inputs the model distance d_HM1 to the visible region estimation means 43.

Because the candidate posture is standing, the visible region estimation means 43 reads, from the horizontal distance group 720 among the installation object distances stored in the installation object distance storage means 30, the horizontal distances (..., d_HF6, ..., d_HF3, ...) corresponding to the pixels constituting the model image 800 (middle part of FIG. 7).

For each pixel constituting the model image 800, the visible region estimation means 43 compares the model distance with the installation object distance of the pixel and estimates the object visible region 801, which consists of the pixels whose model distance is less than or equal to the installation object distance (lower part of FIG. 7). Specifically, the pixel Q_6 is determined to be a visible pixel constituting the object visible region 801 because the model distance d_HM1 is less than the installation object distance d_HF6. The pixel Q_3, on the other hand, is not determined to be a visible pixel constituting the object visible region 801 because the model distance d_HM1 is greater than the installation object distance d_HF3.

When the candidate posture is lying, that is, when the model distance is a vertical distance, the visible region estimation means 43 estimates, for each pixel constituting the model image, a pixel whose vertical installation object distance is equal to or greater than the model distance to be a pixel of the object visible region, and a pixel whose vertical installation object distance is less than the model distance to be a concealed pixel.

FIG. 8 is a schematic diagram illustrating an example of processing by the visible region estimation means 43 when the candidate posture is lying. For example, in response to the candidate position and candidate posture output by the hypothesis setting means 40, the model image output means 41 inputs to the visible region estimation means 43 a model image 810 obtained by projecting the lying three-dimensional model 620, placed at the candidate position P_2 shown in FIG. 4, onto the imaging plane xy (upper part of FIG. 8). The pixels constituting the model image 810 include the pixel Q_5, whose line-of-sight direction is toward the point P_5 on the surface of the installation object 700 shown in FIG. 6, and the pixel Q_7, whose line-of-sight direction is toward a point with no installation object. In addition, in response to the candidate position and candidate posture output by the hypothesis setting means 40, the model distance calculation means 42 inputs the model distance d_VM2 to the visible region estimation means 43.

Because the candidate posture is lying, the visible region estimation means 43 reads, from the vertical distance group 730 among the installation object distances stored in the installation object distance storage means 30, the vertical distances (..., d_VF5, ..., d_VF7, ...) corresponding to the pixels constituting the model image 810 (middle part of FIG. 8).

For each pixel constituting the model image 810, the visible region estimation means 43 compares the model distance with the installation object distance of the pixel and estimates the object visible region 811, which consists of the pixels whose model distance is less than or equal to the installation object distance (lower part of FIG. 8). Specifically, the pixel Q_7 is determined to be a visible pixel constituting the object visible region 811 because the model distance d_VM2 is less than the installation object distance d_VF7. The pixel Q_5, on the other hand, is not determined to be a visible pixel constituting the object visible region 811 because the model distance d_VM2 is greater than the installation object distance d_VF5.

The object determination means 44 determines the presence of an image of the object from the image features of the object visible pixels in the monitoring image. Specifically, the object determination means 44 receives information on the object visible region from the visible region estimation means 43, determines at least the presence of an image of the object from the image features of the object visible region in the monitoring image, and outputs the determination result to the abnormality determination means 45. In this embodiment, the abnormality determination means 45 further determines the position and posture of the object, which constitute the state of the object. These processes are performed by the changed pixel extraction means 440, the likelihood calculation means 441, and the position/posture determination means 442 included in the object determination means 44.

The change pixel extraction means 440 extracts change pixels from the monitoring image newly input from the imaging unit 2 and outputs information on the extracted change pixels to the likelihood calculation means 441. The change pixel information is also output to the hypothesis setting means 40 as necessary. Change pixels are extracted by known background subtraction. That is, the change pixel extraction means 440 computes the difference between the newly input monitoring image and a background image and extracts, as change pixels, the pixels whose difference is greater than or equal to a preset difference threshold. Change pixels can therefore be extracted in correspondence with the region where an object exists. As the background image, the change pixel extraction means 440 stores in the storage unit 3 a monitoring image captured while no object is present in the monitored area. For example, because the administrator normally starts the moving object tracking device 1 while no object is present in the monitored area, the background image can be generated from the monitoring image captured immediately after startup. Instead of difference processing, change pixels may be extracted by correlation processing between the newly input monitoring image and the background image; alternatively, a background model may be learned in place of the background image, and change pixels may be extracted by difference processing against that background model.
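The background difference described above can be sketched as follows. This is a minimal illustration only: it assumes grayscale images held as nested lists, and the function name `extract_change_pixels` and its parameters are hypothetical, not taken from the patent.

```python
def extract_change_pixels(current, background, diff_threshold):
    """Return the set of (x, y) pixels whose absolute difference from the
    background image is greater than or equal to diff_threshold.

    current, background: grayscale images as lists of rows (assumed layout).
    """
    changed = set()
    for y, (cur_row, bg_row) in enumerate(zip(current, background)):
        for x, (cur, bg) in enumerate(zip(cur_row, bg_row)):
            if abs(cur - bg) >= diff_threshold:
                changed.add((x, y))
    return changed
```

The comparison uses "greater than or equal to", matching the wording of the text; the choice of threshold value is left to the implementer.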

For each hypothesis, the likelihood calculation means 441 extracts from the monitoring image the feature quantities of the object in the object visible region estimated for that hypothesis, calculates the likelihood that the candidate position of the hypothesis is the object position according to the degree to which the features are extracted, and outputs the likelihood to the position/posture determination means 442. Items (1) to (4) below are examples of how the likelihood may be calculated.

(1) The object visible region is superimposed on the change pixels extracted by the change pixel extraction means 440, and the proportion in which change pixels are contained in the object visible region (the degree of inclusion) is obtained. The degree of inclusion tends to be higher the closer the candidate position is to the position where the object actually exists, and lower the farther away it is. The likelihood is therefore calculated based on the degree of inclusion.

(2) Edges are extracted from the portion of the monitoring image corresponding to the contour of the object visible region. The closer the candidate position is to the position where the object actually exists, the better the contour of the object visible region coincides with the edge positions, so the degree of edge extraction (for example, the sum of the extracted edge strengths) increases; the farther away it is, the more the degree of extraction tends to decrease. The likelihood is therefore calculated based on the degree of edge extraction.

(3) The feature quantities extracted from the monitoring image at the past object positions of each object are stored in the storage unit 3 as reference information for that object. The closer the candidate position is to the position where the object actually exists, the less the feature quantities of the background and of other objects are mixed in, so the similarity between the feature quantities extracted from the object visible region and the reference information becomes higher; the farther away it is, the lower the similarity tends to be. Accordingly, the feature quantities within the object visible region are extracted from the monitoring image, and the similarity between the extracted feature quantities and the reference information is calculated as the likelihood. Various image feature quantities can be used here, such as an edge distribution, a color histogram, or both.

(4) The likelihood may also be calculated according to a weighted sum of two or more of the degree of inclusion, the degree of edge extraction, and the similarity described above.

Using the object visible region estimated by the visible region estimation means 43 in this way makes it possible to calculate a likelihood suited to the occlusion state of each object, which improves tracking reliability.
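As a rough sketch of likelihood examples (1), (3), and (4), the functions below compute a degree of inclusion, a histogram similarity, and their weighted sum. Several points are assumptions made for illustration: the degree of inclusion is taken as the fraction of the visible region covered by change pixels (one possible reading of the text), histogram intersection is used as the similarity measure (the patent does not name one), and all function names and the weight values are hypothetical.

```python
def inclusion_degree(change_pixels, visible_region):
    """Fraction of the object visible region covered by change pixels.

    Both arguments are sets of (x, y) pixels. Normalising by the size of
    the visible region is an assumption about how the ratio is defined.
    """
    if not visible_region:
        return 0.0
    return len(change_pixels & visible_region) / len(visible_region)


def histogram_similarity(hist, reference):
    """Histogram intersection of two normalised histograms (an assumed
    choice of similarity; the text only asks for 'a similarity')."""
    return sum(min(a, b) for a, b in zip(hist, reference))


def combined_likelihood(scores, weights):
    """Weighted sum of whichever scores are supplied, as in example (4).

    scores:  dict such as {"inclusion": 0.5, "similarity": 1.0}
    weights: dict giving a weight for every score name that may appear
    """
    return sum(weights[name] * value for name, value in scores.items())
```

Because every score is computed only over the estimated visible region, pixels hidden behind an installation object do not lower the likelihood of an otherwise correct hypothesis.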

The position/posture determination means 442 determines the position of each object (the object position) from the hypotheses for that object and the likelihood calculated for each hypothesis, and accumulates the determination results in the storage unit 3 in time series for each object. If all the likelihoods are below a predetermined lower limit (the likelihood lower limit), it determines that there is no object position, that is, that the object has disappeared. Items (1) to (3) below are examples of how the object position may be calculated.
(1) For each object, the likelihood-weighted average of the candidate positions is calculated and used as the object position of that object.
(2) For each object, the candidate position for which the maximum likelihood was calculated is found and used as the object position.
(3) For each object, the average of the candidate positions whose likelihood is greater than or equal to a preset likelihood threshold is calculated and used as the object position. Here, the likelihood threshold is greater than the likelihood lower limit.
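The three position estimates and the disappearance check can be sketched in one function. This is a hedged illustration: the name `determine_position`, the `method` argument, and the behaviour when no hypothesis reaches the threshold in method (3) are assumptions, and the likelihood lower limit is assumed to be positive.

```python
def determine_position(hypotheses, lower_limit, method="weighted", threshold=None):
    """Estimate the object position from (position, likelihood) pairs.

    Returns None when every likelihood is below lower_limit (object lost).
    method: "weighted" (1), "max" (2) or "threshold" (3).
    """
    if all(lik < lower_limit for _, lik in hypotheses):
        return None  # no object position: the object is judged to have disappeared
    if method == "weighted":
        total = sum(lik for _, lik in hypotheses)
        dims = len(hypotheses[0][0])
        return tuple(
            sum(pos[i] * lik for pos, lik in hypotheses) / total for i in range(dims)
        )
    if method == "max":
        return max(hypotheses, key=lambda h: h[1])[0]
    if method == "threshold":
        selected = [pos for pos, lik in hypotheses if lik >= threshold]
        if not selected:
            return None  # assumption: the source does not cover this case
        dims = len(selected[0])
        return tuple(sum(p[i] for p in selected) / len(selected) for i in range(dims))
    raise ValueError(f"unknown method: {method}")
```

Method (1) smooths over all hypotheses, method (2) is cheapest, and method (3) averages only the well-supported hypotheses, which is why its threshold must exceed the lower limit.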

As described above, the likelihood calculation means 441 and the position/posture determination means 442 in the object determination means 44 have the function of determining the presence of an image of the object from the image features of the object visible region in the image. In addition, when disappearance has not been determined, the object determination means 44 determines the candidate posture set in the hypothesis with the highest likelihood to be the posture of the object.

The abnormality determination means 45 refers to the time-series object positions accumulated in the storage unit 3, determines suspicious movements such as staying in place for a long time or deviating from the normal flow line to be abnormal, and outputs an alarm signal to the output unit 5 when an abnormality is determined.

[Operation of Moving Object Tracking Device 1]
Next, the tracking operation of the moving object tracking device 1 is described. FIGS. 9 and 10 are schematic flowcharts of the tracking process of the moving object tracking device 1.

Each time the imaging unit 2 captures an image of the monitored space, the image processing unit 4 acquires a monitoring image from the imaging unit 2 (step S1). Hereinafter, the time at which the latest monitoring image was input is called the current time, and the latest monitoring image is called the current image.

The change pixel extraction means 440 compares the current image with the background image and extracts change pixels (step S2). Isolated change pixels are regarded as noise and excluded from the extraction result. Immediately after the start of operation, when no background image exists yet, the current image is stored in the storage unit 3 as the background image and, for convenience, no change pixels are reported.
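The exclusion of isolated change pixels in step S2 might look like the sketch below. The text does not define "isolated"; here a pixel is assumed to be isolated when none of its eight neighbours is a change pixel, and the function name `remove_isolated` is hypothetical.

```python
# Offsets of the eight neighbours of a pixel (8-connectivity is an assumption).
NEIGHBOURS_8 = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def remove_isolated(change_pixels):
    """Keep only change pixels that touch at least one other change pixel."""
    return {
        (x, y)
        for (x, y) in change_pixels
        if any((x + dx, y + dy) in change_pixels for dx, dy in NEIGHBOURS_8)
    }
```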

The hypothesis setting means 40 sets P hypotheses (each consisting of a candidate posture and a candidate position) for each object being tracked, based on motion prediction (step S3). For an object determined to be newly appearing in step S16, described later, motion prediction is not possible, so the P hypotheses are set over a wider range centered on the appearance position. The hypotheses of an object determined to have disappeared in step S16 are deleted.

If no change pixels were extracted in step S2 and no hypotheses were set in step S3 (that is, no object is being tracked) ("YES" in step S4), the image processing unit 4 returns to step S1 and waits for the next monitoring image.

If the result of step S4 is "NO", the processing of steps S5 to S16 is performed. First, the hypothesis setting means 40 determines the front-to-back order of the objects (step S5). Specifically, the hypothesis setting means 40 calculates, for each object, the distance between the centroid (mean) of its candidate positions and the camera position, and creates a front-to-back order list in which the object identifiers are arranged in ascending order of that distance.
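Step S5 amounts to sorting object identifiers by the distance from the camera to the centroid of each object's candidate positions. A minimal sketch, assuming candidate positions are 3D tuples keyed by object identifier (the function name and data layout are hypothetical):

```python
import math


def front_to_back_order(candidates_by_object, camera_position):
    """Return object identifiers sorted nearest-first.

    candidates_by_object: dict mapping identifier -> list of (X, Y, Z) positions
    camera_position:      (X, Y, Z) of the camera
    """
    def centroid_distance(object_id):
        positions = candidates_by_object[object_id]
        n = len(positions)
        centroid = tuple(sum(p[i] for p in positions) / n for i in range(3))
        return math.dist(centroid, camera_position)

    return sorted(candidates_by_object, key=centroid_distance)
```

Processing objects in this order means that, when an object is handled, the positions of every object in front of it have already been determined.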

The image processing unit 4 sets each object in turn as the object of interest, starting from the head of the front-to-back order list (step S6). The image processing unit 4 then sets each hypothesis of the object of interest in turn as the hypothesis of interest (step S7). However, a hypothesis whose candidate position lies outside the field of view of the monitoring image is excluded from being set as the hypothesis of interest; the object visible region of that hypothesis is not estimated, and its likelihood is set to 0.

The model image output means 41 generates an object model image corresponding to the candidate posture and candidate position input from the hypothesis setting means 40, for example by projecting the three-dimensional model read from the three-dimensional model storage means 300 onto the coordinate system of the monitoring image using the camera parameters read from the camera parameter storage means 301, and outputs it to the visible region estimation means 43 (step S8). As described above, the model image output means 41 may instead be configured to read the object model image corresponding to the input candidate posture and candidate position from the model image storage means 302 and output it to the visible region estimation means 43.

The visible region estimation means 43 estimates the occlusion by installation objects in the object model image input from the model image output means 41 (step S9). FIG. 11 is a schematic flowchart of the installation object occlusion estimation process of step S9.

If the hypothesis input from the hypothesis setting means 40 is a standing hypothesis, that is, one whose candidate posture is standing ("YES" in step S100), the model distance calculation means 42 calculates the horizontal distance d_HM for the candidate position of the hypothesis and outputs it to the visible region estimation means 43 as the model distance (step S101).

The visible region estimation means 43 sets the pixels in the object model image input from the model image output means 41 in turn as the pixel of interest (step S102) and, of the horizontal distance and vertical distance stored as installation object distances in the installation object distance storage means 30 in association with each pixel of the monitoring image, reads the horizontal distance d_HF(x, y) corresponding to the xy coordinates of the pixel of interest (step S103).

The visible region estimation means 43 compares the installation object distance d_HF(x, y) with the model distance d_HM. If d_HF(x, y) is greater than or equal to d_HM ("YES" in step S104), it estimates that the pixel of interest is an object visible pixel, that is, a non-occluded pixel not hidden by an installation object (step S105), and proceeds to step S106. If d_HF(x, y) is less than d_HM ("NO" in step S104), the pixel of interest is treated as an occluded pixel hidden by an installation object; it is therefore not estimated to be an object visible pixel, and processing proceeds to step S106.

The visible region estimation means 43 repeats steps S102 to S105 until all pixels of the object model image have been processed ("NO" in step S106); when all pixels have been processed ("YES" in step S106), processing advances to step S10 in FIG. 10.

If the hypothesis input from the hypothesis setting means 40 is not a standing hypothesis, that is, if it is a recumbent hypothesis ("NO" in step S100), the model distance calculation means 42 calculates the vertical distance d_VM for the candidate position of the hypothesis and outputs it to the visible region estimation means 43 as the model distance (step S107).

The visible region estimation means 43 sets the pixels in the object model image input from the model image output means 41 in turn as the pixel of interest (step S108) and, of the horizontal distance and vertical distance stored as installation object distances in the installation object distance storage means 30 in association with each pixel of the monitoring image, reads the vertical distance d_VF(x, y) corresponding to the xy coordinates of the pixel of interest (step S109).

The visible region estimation means 43 compares the installation object distance d_VF(x, y) with the model distance d_VM. If d_VF(x, y) is greater than or equal to d_VM ("YES" in step S110), it estimates that the pixel of interest is an object visible pixel, that is, a non-occluded pixel not hidden by an installation object (step S111), and proceeds to step S112. If d_VF(x, y) is less than d_VM ("NO" in step S110), the pixel of interest is treated as an occluded pixel hidden by an installation object; it is therefore not estimated to be an object visible pixel, and processing proceeds to step S112.

The visible region estimation means 43 repeats steps S108 to S111 until all pixels of the object model image have been processed ("NO" in step S112); when all pixels have been processed ("YES" in step S112), processing advances to step S10 in FIG. 10.
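Steps S100 to S112 can be condensed into the sketch below: the posture selects which stored distance type to compare, and a model pixel is visible when the installation object distance is greater than or equal to the model distance. The two distance helpers follow the definitions in the abstract (lengths of the orthogonal projections of the line-of-sight vector onto the horizontal plane and the vertical axis). The function names, the posture strings, and the `[y][x]` table layout are assumptions made for illustration.

```python
import math


def horizontal_distance(camera, point):
    """Length of the line-of-sight vector projected onto the horizontal plane."""
    return math.hypot(point[0] - camera[0], point[1] - camera[1])


def vertical_distance(camera, point):
    """Length of the line-of-sight vector projected onto the vertical axis."""
    return abs(point[2] - camera[2])


def estimate_visible_pixels(model_pixels, posture, model_distance, installation_distance):
    """Return the model pixels not hidden by an installation object.

    model_pixels:          iterable of (x, y) pixels of the model image
    posture:               "standing" selects the horizontal distance,
                           anything else (recumbent) the vertical distance
    model_distance:        d_HM or d_VM for the candidate position
    installation_distance: {"horizontal": table, "vertical": table}, where
                           each table is indexed as table[y][x] (assumed layout)
    """
    kind = "horizontal" if posture == "standing" else "vertical"
    table = installation_distance[kind]
    return {(x, y) for (x, y) in model_pixels if table[y][x] >= model_distance}
```

Because the installation object distances are stored per pixel in advance, the per-hypothesis cost is one distance calculation plus one comparison per model pixel.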

Referring to the front-to-back order list, if there is an object in front of the object of interest ("YES" in step S10), the visible region estimation means 43 updates the object visible region by performing mask processing that removes, from the object visible region, the region of the object model image corresponding to the object position of the object in front (step S11). If there are multiple objects in front, mask processing may be attempted for all of them, or the exclusion processing may be limited to those objects for which the angle between the projection line from the camera position to their object position and the projection line to the object of interest is less than the angle corresponding to the width 2W. If there is no object in front ("NO" in step S10), mask processing is not performed.
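With regions represented as sets of pixels, the mask processing of step S11 reduces to set subtraction. The sketch below attempts the mask for every object in front; the optional angle-based narrowing is omitted because the width 2W is defined elsewhere in the description. The function name is hypothetical.

```python
def mask_front_objects(visible_region, front_model_regions):
    """Remove the model image regions of all nearer objects.

    visible_region:      set of (x, y) pixels of the object of interest
    front_model_regions: iterable of pixel sets, one per object in front
    """
    result = set(visible_region)
    for region in front_model_regions:
        result -= set(region)
    return result
```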

Once the regions occluded by installation objects and by objects in front have been removed and the object visible region at the candidate position of the hypothesis of interest has been obtained, the likelihood calculation means 441 calculates the likelihood for the hypothesis of interest (step S12). The likelihood calculation means 441 receives the portion of the captured image corresponding to the change pixels extracted by the change pixel extraction means 440, extracts from that portion the feature quantities of the object in the object visible region estimated for the hypothesis of interest, calculates the likelihood of the hypothesis as the object position according to the degree to which the features are extracted, and outputs the likelihood to the position/posture determination means 442.

If any hypothesis remains for which the likelihood has not been calculated ("NO" in step S13), the image processing unit 4 repeats steps S7 to S12. When the likelihood has been calculated for all P hypotheses ("YES" in step S13), the position/posture determination means 442 calculates the object position of the object of interest and determines its posture, using the hypotheses of the object of interest and the likelihood calculated for each of them (step S14). The object position calculated for the current time is appended in association with the object positions of the object of interest stored in the storage unit 3 up to one time step earlier. A newly appearing object is registered with a new identifier. If the likelihoods at all predicted positions are below the likelihood lower limit, it is determined that there is no object position.

If any unprocessed object remains ("NO" in step S15), the image processing unit 4 repeats steps S6 to S14 to determine the object position for that object. When the object positions have been determined for all objects ("YES" in step S15), the new appearance and disappearance of objects are determined (step S16). Specifically, the image processing unit 4 combines the object visible regions estimated for the respective object positions, detects those change pixels extracted by the change pixel extraction means 440 that lie outside the combined region, and labels groups of mutually adjacent detected change pixels. If a label is large enough to be regarded as an object, the new appearance is recorded in the storage unit 3 together with the position of the label (the appearance position). If there is an object with no object position, the disappearance of that object is recorded in the storage unit 3. When this processing is complete, processing returns to step S1 to process the monitoring image at the next time.
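The new-appearance check of step S16 can be sketched as connected-component labeling of the change pixels left over after the tracked objects' visible regions are removed. The assumptions here are that adjacency means 8-connectivity, that "large enough" is measured as a pixel count (`min_size`), and that the appearance position is the centroid of the label; none of these are fixed by the text, and the function name is hypothetical.

```python
from collections import deque


def detect_new_objects(change_pixels, visible_regions, min_size):
    """Return centroids of unexplained change-pixel groups of at least min_size pixels.

    change_pixels:   set of (x, y) change pixels
    visible_regions: list of pixel sets, one per tracked object
    """
    covered = set().union(*visible_regions)
    remaining = set(change_pixels) - covered
    appearances = []
    while remaining:
        seed = remaining.pop()
        component = [seed]
        queue = deque([seed])
        while queue:  # breadth-first flood fill over 8-connected neighbours
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        component.append(neighbour)
                        queue.append(neighbour)
        if len(component) >= min_size:
            cx = sum(p[0] for p in component) / len(component)
            cy = sum(p[1] for p in component) / len(component)
            appearances.append((cx, cy))
    return appearances
```

Each returned centroid would serve as the appearance position around which the next frame's hypotheses are spread.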

[Modification]
In the embodiment above, the horizontal distance, which is the length of the vector obtained by orthogonally projecting the line-of-sight vector of the imaging means onto the horizontal plane, was calculated as the projection distance for a standing candidate posture, and the presence or absence of occlusion was determined by comparing the model distance with the installation object distance, each defined by that horizontal distance. Alternatively, the X component or the Y component of the orthogonal projection vector corresponding to the horizontal distance may be defined as the projection distance, and occlusion can also be determined by comparing the model distance and the installation object distance under this definition.

This modification is described concretely with the coordinates of the position C of the imaging means denoted (X_C, Y_C, Z_C), taking as an example the determination of whether a point P_3 (X_3, Y_3, Z_3) on the surface of an installation object occludes a standing person at candidate position P_1 (X_1, Y_1, Z_1). For example, if any of expressions (3a) to (3d) below holds, the visible region estimation means 43 determines that, among the pixels of the model image of the three-dimensional model at candidate position P_1, the pixel corresponding to P_3 is not occluded by the installation object, that is, it is a visible pixel.
When |X_1 - X_C| > |Y_1 - Y_C| and (X_1 - X_C) > 0:
(X_1 - X_C) < (X_3 - X_C), that is, X_1 < X_3 ... (3a)
When |X_1 - X_C| > |Y_1 - Y_C| and (X_1 - X_C) < 0:
(X_1 - X_C) > (X_3 - X_C), that is, X_1 > X_3 ... (3b)
When |X_1 - X_C| < |Y_1 - Y_C| and (Y_1 - Y_C) > 0:
(Y_1 - Y_C) < (Y_3 - Y_C), that is, Y_1 < Y_3 ... (3c)
When |X_1 - X_C| < |Y_1 - Y_C| and (Y_1 - Y_C) < 0:
(Y_1 - Y_C) > (Y_3 - Y_C), that is, Y_1 > Y_3 ... (3d)

In this case, for example, the model distance calculation means 42 outputs to the visible region estimation means 43 the X component (X_1 - X_C) and the Y component (Y_1 - Y_C) of the orthogonal projection vector corresponding to the horizontal distance from the imaging means to the candidate position P_1 of the standing three-dimensional model 600, and the installation object distance storage means 30 stores the X component (X_3 - X_C) and the Y component (Y_3 - Y_C) for the point P_3 on the surface of the installation object 700 and outputs them to the visible region estimation means 43. With this configuration, the amount of installation object distance data stored in the installation object distance storage means 30 for the standing posture is twice that of the embodiment above, which stores the horizontal distance, but the exponentiation and square root calculations contained in expression (1) are no longer needed when calculating the model distance, so the processing load can be reduced.
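Expressions (3a) to (3d) translate directly into a component-wise test that needs no square root. In the sketch below the function name is hypothetical, and the return value for the tie case |X_1 - X_C| = |Y_1 - Y_C|, which the expressions do not cover, is an assumption.

```python
def is_visible_by_component(camera, candidate, surface_point):
    """Apply expressions (3a)-(3d): compare the dominant horizontal component.

    camera, candidate, surface_point: (X, Y, Z) tuples for C, P_1 and P_3.
    Returns True when the pixel corresponding to P_3 is judged visible.
    """
    dx1 = candidate[0] - camera[0]
    dy1 = candidate[1] - camera[1]
    dx3 = surface_point[0] - camera[0]
    dy3 = surface_point[1] - camera[1]
    if abs(dx1) > abs(dy1):
        # dx1 is non-zero here, so exactly one of (3a) and (3b) applies
        return dx1 < dx3 if dx1 > 0 else dx1 > dx3
    if abs(dx1) < abs(dy1):
        # dy1 is non-zero here, so exactly one of (3c) and (3d) applies
        return dy1 < dy3 if dy1 > 0 else dy1 > dy3
    return False  # tie case is not covered by the source; treated as hidden (assumption)
```

The trade-off matches the text: two signed components are stored per pixel instead of one distance, in exchange for a comparison that uses only subtraction.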

Incidentally, the vertical distance used in the embodiment above when the candidate posture is recumbent is the magnitude of the Z component of the vector obtained by orthogonally projecting the line-of-sight vector of the imaging means onto the vertical plane orthogonal to the main axis 621 of the three-dimensional model 620, and in this respect it has technical points in common with this modification.

Description of symbols: 1 moving object tracking device, 2 imaging unit, 3 storage unit, 4 image processing unit, 5 output unit, 30 installation object distance storage means, 40 hypothesis setting means, 41 model image output means, 42 model distance calculation means, 43 visible region estimation means, 44 object determination means, 45 abnormality determination means, 300 three-dimensional model storage means, 301 camera parameter storage means, 302 model image storage means, 420 model image generation means, 421 model image reading means, 440 change pixel extraction means, 441 likelihood calculation means, 442 position/posture determination means, 600, 620 three-dimensional model, 601, 621 main axis, 610 horizontal plane, 700 installation object, 710 imaging plane.

Claims

An object image estimation device that estimates, using a three-dimensional model of an object, an object visible region in which an image of the object appears in an image of a predetermined three-dimensional space captured by imaging means, wherein
a horizontal distance and a vertical distance, which are the lengths of the images obtained by orthogonally projecting a line-of-sight vector of the imaging means onto a horizontal plane and a vertical axis, respectively, are defined as two types of projection distance of the line-of-sight vector, the device comprising:
installation object distance storage means for storing, in association with each pixel of the image, the two types of projection distance to an installation object in the three-dimensional space of the line-of-sight vector corresponding to that pixel, as installation object distances;
candidate setting means for setting a candidate posture of the object and a candidate position in the three-dimensional space at which the object can exist;
model image output means for outputting a model image obtained by projecting the three-dimensional model in the candidate posture at the candidate position onto the coordinate system of the image;
model distance calculation means for calculating, as a model distance, the projection distance of the line-of-sight vector to the candidate position for at least one designated type, among the types of projection distance, predetermined for the candidate posture; and
visible region estimation means for estimating, among the pixels constituting the model image of the three-dimensional model, a pixel whose installation object distance of the designated type is greater than or equal to the model distance of the designated type of the three-dimensional model to be an object visible pixel.

The object image estimation device according to claim 1, wherein
the object is a person, and
the horizontal distance is defined as the projection distance of the designated type when the candidate posture is set to a standing position.

The object image estimation device according to claim 1, wherein
the object is a person, and
the vertical distance is defined as the projection distance of the designated type when the candidate posture is set to a recumbent position.

An object image determination device comprising:
the object image estimation device according to any one of claims 1 to 3; and
object determination means for determining the presence of an image of the object from image features of the object visible pixels in the image.