JP2005165688A

JP2005165688A - Multiple objects tracking method and system

Info

Publication number: JP2005165688A
Application number: JP2003403820A
Authority: JP
Inventors: Jenru Shue; ジェンルシュエ; Yasuji Seko; 保次瀬古
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-12-02
Filing date: 2003-12-02
Publication date: 2005-06-23

Abstract

PROBLEM TO BE SOLVED: To provide a multiple-object tracking method based on Bayesian theory by extending the particle filter.

SOLUTION: Occlusion between objects is represented explicitly in the particles. An observation likelihood function is designed that not only incorporates the concept of observation but also flexibly handles objects ranging from entirely non-overlapping to multiply overlapping. The state representation is augmented with a hidden variable, which yields a generative observation model that is invariant to view and geometric transformation. The observation model adapts when the appearance of an object (how the object looks) changes.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a method and system for tracking objects in video, and more particularly to a method and system for simultaneously tracking a plurality of objects.

Visual tracking has been studied extensively for video surveillance, human-machine interaction, and video conferencing, and is an important research theme in computer vision. In recent years, various successes have been reported for single-object tracking, but multiple-object tracking remains an open research topic, particularly when the objects are identical or move in complex ways.

Designing a robust tracking algorithm is difficult, because such an algorithm needs mechanisms for handling poorly discriminative image features, background clutter, unstable and discontinuous motion, multiple occluding objects, and other problems. Estimation techniques play an important role in tracking. In real-world environments, however, their performance is often hampered by visual phenomena such as agile motion, distraction, and occlusion, which make it difficult to identify the image projection of an object accurately. Agile motion means sustained object movement that exceeds the dynamic prediction capability of the tracker. It impairs estimation because it makes the predicted position of the image projection uncertain and complicates efficient segmentation. Clean segmentation is further hindered by distraction, that is, clutter and other scene elements whose image appearance resembles that of the tracked object. Finally, occlusion occurs when another scene element comes between the camera and the tracked object and blocks part of the object's image projection. Occlusion leaves the estimation algorithm with incomplete data, or with no data at all.

Recent multiple-object tracking systems fall into two basic types. The first type uses several single-object trackers combined with an injected bootstrap mechanism. Most systems of this type assume either that the motion correspondence problem has been solved, or that it is trivial and a nearest-neighbor strategy suffices. In some cases a nearest-neighbor strategy is indeed appropriate; for example, it has been proposed for tracking corner features over a very large number of frames (see, for example, Non-Patent Document 9). A nearest-neighbor strategy usually presupposes that image motion between frames is very small. When significant motion occurs between frames, ambiguity can increase sharply. These ambiguities can be minimized by using a weighted correlation window to locate the tracked feature in the next frame (see Non-Patent Document 10). Correlation techniques can greatly reduce motion correspondence ambiguity, but partial occlusion and significant background changes can cause them problems. Moreover, correlation techniques are suitable for finding measurements for existing tracks but not for detecting new tracks.
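The nearest-neighbor strategy described above can be illustrated with a minimal sketch. It is not taken from the cited documents: the function name, the 2-D point representation, the greedy matching order, and the `gate` distance threshold are all assumptions made for illustration.

```python
import math

def nearest_neighbor_associate(tracks, detections, gate=20.0):
    """Greedily match each track to its closest unused detection.

    tracks, detections: lists of (x, y) positions in pixels.
    gate: hypothetical maximum distance for a valid match.
    Returns a dict {track_index: detection_index}.
    """
    # Enumerate every candidate pair inside the gate, closest first.
    pairs = sorted(
        (math.dist(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
        if math.dist(t, d) <= gate
    )
    assignment, used = {}, set()
    for _, ti, di in pairs:
        # Each track and each detection is used at most once.
        if ti not in assignment and di not in used:
            assignment[ti] = di
            used.add(di)
    return assignment

# Small inter-frame motion: the closest detection is the right one.
small_motion = nearest_neighbor_associate([(0, 0), (100, 0)], [(102, 1), (1, 1)])

# Both objects move 10 px to the right: track 1 now sits exactly where
# track 0's object was expected, so the greedy match swaps identities.
large_motion = nearest_neighbor_associate([(0, 0), (10, 0)], [(10, 0), (20, 0)])
```

In the second call the true correspondence is track 0 to detection 0 and track 1 to detection 1, but the sketch returns the swapped assignment, which is the kind of ambiguity that larger inter-frame motion introduces.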

Another class of approaches to motion correspondence consists of statistical data association techniques such as the joint probabilistic data association (JPDA) filter (Non-Patent Document 11) and the track-splitting filter (Non-Patent Document 12). These algorithms are currently attracting wide attention, particularly in computer vision. However, JPDA is appropriate only if the number of tracks is known in advance and remains constant throughout the motion sequence. The track-splitting filter resembles multiple hypothesis tracking (MHT) in that it uses a track tree to postpone correspondence decisions until more evidence becomes available. However, the track-splitting filter allows tracks to share measurements. This is physically unrealistic and can cause the trackers to converge on the single most likely object (see, for example, Non-Patent Document 2). It may also be assumed that a geometric feature produces only one measurement vector within a given time frame. This non-simultaneity is a general constraint in human vision; in stereo correspondence it is called uniqueness (Non-Patent Document 13), and in motion correspondence it is called the element integrity principle (Non-Patent Document 14). The track-splitting algorithm cannot handle these constraints, so the MHT approach must be used (see Non-Patent Document 15).

MHT is the only statistical data association algorithm that integrates all of the following capabilities 1) to 5).
1) Track initiation: when a new geometric feature enters the field of view, a new track is created automatically.
2) Track termination: when a geometric feature has been out of view for a long period, its track is terminated automatically.
3) Track continuation: when no measurement is available, the track is continued for several frames. The algorithm can therefore support temporary occlusion to some degree.
4) Explicit modeling of false measurements.
5) Explicit modeling of the uniqueness constraint: a measurement is assigned to only a single track, and a track is the source of only one measurement per frame.
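Capabilities 2) and 3) amount to simple per-track bookkeeping, which the following sketch illustrates. It is not an implementation of MHT, which also maintains competing association hypotheses; the `Track` class, the `update_track` function, and the `max_misses` threshold are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class Track:
    position: tuple   # last known (x, y) position
    misses: int = 0   # consecutive frames without a measurement
    alive: bool = True

def update_track(track, measurement, max_misses=3):
    """Continue a track through missing measurements, then terminate it.

    measurement: an (x, y) tuple, or None when nothing was assigned
    to this track in the current frame.
    max_misses: hypothetical number of frames a track may coast.
    """
    if measurement is not None:
        # A measurement arrived: update the state and reset the counter.
        track.position = measurement
        track.misses = 0
    else:
        # Track continuation: keep the track alive for a few frames.
        track.misses += 1
        if track.misses > max_misses:
            # Track termination: the feature has been unseen for too long.
            track.alive = False
    return track
```

With `max_misses=3`, a track survives three consecutive frames without a measurement and is terminated on the fourth, which is how short occlusions can be bridged.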

The drawback of this type of multiple-object tracking system is that, in most cases, each single-object tracker has only a limited, local view of the overall multiple-object tracking problem. As a result, only a suboptimal solution is obtained.

The second type of multiple-object tracking system uses a full Bayesian formulation of the multiple-object tracking problem. Because it is inherently able to predict uncertainty, Bayesian theory is a powerful tool for object tracking in cluttered environments. A multiple-object tracker based on a global, full Bayesian model has a complete view of the domain. Unfortunately, such a global model requires a high-dimensional state space, which makes Bayesian inference nearly intractable. Recent research on particle filtering offers a promising solution to this problem.
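A full Bayesian formulation of this kind is commonly written as a two-step recursion over a joint state. The notation below is an assumption introduced for illustration and does not appear in this document: \mathbf{x}_t stacks the states of all K objects at time t, and z_{1:t} denotes all observations up to time t.

```latex
\begin{align}
  \mathbf{x}_t &= \bigl(x_t^{1}, \dots, x_t^{K}\bigr) \\
  p(\mathbf{x}_t \mid z_{1:t-1}) &= \int p(\mathbf{x}_t \mid \mathbf{x}_{t-1})\,
      p(\mathbf{x}_{t-1} \mid z_{1:t-1})\, d\mathbf{x}_{t-1} \\
  p(\mathbf{x}_t \mid z_{1:t}) &\propto p(z_t \mid \mathbf{x}_t)\,
      p(\mathbf{x}_t \mid z_{1:t-1})
\end{align}
```

The dimension of \mathbf{x}_t grows with K, so the integral in the prediction step becomes expensive to evaluate directly, which is the difficulty that sampling-based methods such as particle filtering try to sidestep.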

Particle filtering has been applied successfully to object tracking in cluttered environments (see Non-Patent Documents 14 and 15). The original Condensation (Conditional Density Propagation) algorithm and its variants use a single object state as the basic state representation. The presence of multiple objects is indicated implicitly by multiple peaks in the posterior distribution. When the Condensation algorithm is applied to such a representation, one peak comes to dominate all the others as estimation proceeds over a long period. Because the posterior is propagated with a fixed number of samples, all samples eventually gather around the dominant peak. Besides the dominant-peak problem, events such as the addition, deletion, and occlusion of objects cannot be handled naturally. The sampling framework and its ability to maintain multiple modes suggest that particle filtering should provide a means of addressing these issues, but this has not been fully exploited so far. Broadly, multiple modes arise from two causes. The first is that the object state is ambiguous because measurements are insufficient or because of clutter. The second is that the measurements originate from multiple objects. In the first case, it is desirable to track all modes until the ambiguity resolves naturally. In the second case, it is often necessary to track every object that appears. These limitations arise not from the particle filtering process itself but from both the state representation and the observation model used by the tracker.

Non-Patent Documents:
1. H. Tao, H. Sawhney, and R. Kumar, "A sampling algorithm for tracking multiple objects," in Proc. ICCV Workshop on Vision Algorithms, 1999.
2. J. MacCormick and A. Blake, "A probabilistic exclusion principle for tracking multiple objects," in Proc. 7th Int. Conf. on Computer Vision, 1999, pp. 572-578.
3. D. Comaniciu, V. Ramesh, and P. Meer, "Real time tracking of non-rigid objects using mean shift," in Proc. Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 142-149, 2000.
4. M. Isard, J. MacCormick, and A. Blake, "BraMBLe: A Bayesian multiple-blob tracker," in Proc. Int. Conf. on Computer Vision, vol. II, pp. 34-41, Vancouver, Canada, July 2001.
5. M. Spengler and B. Schiele, "Multi-object tracking: Explicit knowledge representation and implementation for complexity reduction," ECCV 2002.
6. C. Hue and J.-P. Le Cadre, "Sequential Monte Carlo methods for multiple target tracking and data fusion," IEEE Trans. on Signal Processing, vol. 50, no. 2, Feb. 2002.
7. M. Irani and Anandan, "A unified approach to moving object detection in 2D and 3D scenes," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 20, no. 6, pp. 557-589.
8. A. Senior, "Tracking people with probabilistic appearance models," in Proc. ECCV Workshop on Performance Evaluation of Tracking and Surveillance Systems.
9. C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method," Int'l J. of Computer Vision, vol. 9, no. 2, pp. 137-154.
10. Q. Zheng and R. Chellappa, "Automatic feature point extraction and tracking in image sequences from unknown camera motion," in Proc. Fourth Int'l Conf. on Computer Vision (ICCV 93), pp. 335-339.
11. T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, "Sonar tracking of multiple targets using joint probabilistic data association," IEEE J. of Oceanic Engineering, vol. 8, no. 3, pp. 173-184, 1983.
12. Z. Zhang and O. D. Faugeras, "Three-dimensional motion computation and object segmentation in a long sequence of stereo frames," Int'l J. of Computer Vision, vol. 7, no. 3, pp. 211-241, 1992.
13. J. E. W. Mayhew and J. P. Frisby, "Psychophysical and computational studies towards a theory of human stereopsis," Artificial Intelligence, vol. 17, 1981.
14. M. R. W. Dawson, "The how and why of what went where in apparent motion: Modeling solutions to the motion correspondence problem," Psychological Review, vol. 98, no. 4, pp. 569-603, 1991.
15. A. Doucet, N. de Freitas, and N. Gordon, eds., Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2002.
16. M. A. Isard and A. Blake, "Condensation-conditional density propagation for visual tracking," Int. J. of Computer Vision, vol. 28, no. 1, pp. 5-28, 1998.
17. R. Fablet and M. J. Black, "Automatic detection and tracking of human motion with a view-based representation," in European Conf. on Computer Vision, pp. 476-491, 2002.
18. M. Isard and A. Blake, "ICondensation: Unifying low-level and high-level tracking in a stochastic framework," Lecture Notes in Computer Science, vol. 1406, 1998.
19. J. Vermaak, P. Perez, M. Gangnet, and A. Blake, "Towards improved observation models for visual tracking: Selective adaptation," in European Conf. on Computer Vision, pp. 645-660, 2002.
20. E. Koller-Meier and F. Ade, "Tracking multiple objects using the Condensation algorithm," Journal of Robotics and Autonomous Systems, vol. 34, no. 2-3, pp. 93-105, 2001.
21. S. Maskell, M. Rollason, D. Salmond, and N. Gordon, "Efficient particle filtering for multiple target tracking with application to tracking in structured images," in SPIE vol. 4728, Signal and Data Processing of Small Targets, 2002.
22. P. Perez, C. Hue, J. Vermaak, and M. Gangnet, "Color-based probabilistic tracking," in European Conf. on Computer Vision, pp. 661-675, 2002.
23. Xue Jianru, "Hand-drawing aided discussion in meeting room using smart assigner," TR, Fuji Xerox, 2003.
24. J. Vermaak, A. Doucet, and P. Perez, "Maintaining multi-modality through mixture tracking," in Int. Conf. on Computer Vision, Nice, France, 2003.
25. D. Tweed and A. Calway, "Tracking many objects using subordinated Condensation," in British Machine Vision Conf., 2002, pp. 283-292.
26. Y. Wu, T. Yu, and G. Hua, "Tracking appearance with occlusion," in Proc. IEEE Conf. on CVPR'03, Madison, Wisconsin, 2003.
27. D. Mumford, "Pattern theory: A unifying perspective," in Perception as Bayesian Inference, D. Knill and W. Richards, eds., pp. 25-62, Cambridge Univ. Press, 1996.
28. B. Ripley, Pattern Recognition and Neural Networks, Cambridge Univ. Press, 1996.
29. Particle filtering tutorial.
30. Jun Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001. ISBN 0-387-95230-6.
31. A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputation method and Bayesian missing data problems," J. Amer. Statist. Assoc., vol. 29, no. 1, pp. 5-28.
32. J. S. Liu, "Metropolized independent sampling with comparison to rejection sampling and importance sampling," Statist. Comput., vol. 6, pp. 113-119, 1996.
33. A. Doucet, "On sequential simulation-based methods for Bayesian filtering," Signal Process. Group, Dept. Eng., Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.310, 1998.
34. N. Gordon, "A hybrid bootstrap filter for target tracking in clutter," IEEE Trans. Aerosp. Electron. Syst., vol. 33, pp. 353-359, Jan. 1997.
35. D. Schulz, W. Burgard, D. Fox, and A. B. Cremers, "Tracking multiple moving targets with a mobile robot using particle filters and statistical data association," in Proc. IEEE Int. Conf. Robotics Automat., Seoul, Korea, May 21-26, 2002, pp. 1665-1670.
36. A. Milstein, J. N. Sanchez, and E. T. Williamson, "Robust global localization using clustered particle filtering," in Proc. AAAI/IAAI, pp. 581-586, 2002.
37. Y. Bar-Shalom and T. Fortmann, Tracking and Data Association, Academic Press, 1988.
38. D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, May 2003.
39. S. Z. Li, X. G. Lv, and H. J. Zhang, "View-based clustering of object appearances based on independent subspace analysis," in Proc. IEEE Int'l Conf. on Computer Vision, Vancouver, Canada, July 2001.
40. R. Fablet and M. J. Black, "Automatic detection and tracking of human motion with a view based representation," ECCV 2002, LNCS 2350, pp. 476-491, 2002.
41. K. Toyama and A. Blake, "Probabilistic tracking with exemplars in a metric space," Int. J. of Computer Vision, 2002, in press.
42. B. J. Frey and N. Jojic, "Learning graphical models for images, video and their spatial transformations," in Proc. Sixteenth Conf. on Uncertainty in Artificial Intelligence, 2001, Morgan Kaufmann, San Francisco, CA.
43. A. Blake and M. Isard, Active Contours, Springer-Verlag, London, 1998.
44. T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," in Proc. European Conf. on Computer Vision, pp. 484-498, 1998.
45. A. Gelb, ed., Applied Optimal Estimation, MIT Press, Cambridge, MA, 1974.
46. C. Hue and J.-P. Le Cadre, "Sequential Monte Carlo methods for multiple target tracking and data fusion," IEEE Trans. on Signal Processing, vol. 50, no. 2, Feb. 2002.

The particle filter, a conventional standard object tracking method, shows that object tracking can be formulated as a Bayesian inference problem, and the tracking of multiple objects can be formulated in the same way. However, to use a particle filter for multiple-object tracking, one must deal with nonlinear stochastic dynamic systems, imperfect observations, and other difficult problems caused by the growth in the order and assignments of the observed objects.
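For orientation, a conventional single-object particle filter of the kind referred to above can be sketched in a few lines. This is a generic bootstrap filter on a 1-D position, not the method of the invention; the random-walk motion model, the Gaussian likelihood, and the parameter names `motion_std` and `obs_std` are assumptions made for illustration.

```python
import math
import random

def particle_filter_step(particles, observation, rng,
                         motion_std=1.0, obs_std=2.0):
    """One predict / weight / resample cycle for a 1-D position."""
    # Predict: diffuse each particle with a random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: unnormalized Gaussian observation likelihood p(z | x).
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw N particles with probability proportional to weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))

# Usage: start from a uniform prior and observe the object near x = 40.
rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, observation=40.0, rng=rng)
estimate = sum(particles) / len(particles)
```

Because the state holds a single position and the sample count is fixed, repeated resampling concentrates the particles around one mode. Tracking several objects therefore needs a richer state representation and observation model rather than a different sampling scheme.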

To solve this problem, the present invention proposes a multiple-object tracking method and tracking system that better maintain multimodality and handle an arbitrary number of similar objects naturally and efficiently. Another object of the present invention is to provide a program that causes a computer to perform the processing carried out in such a tracking system, and a recording medium on which the program is recorded.

To address these difficult problems, the present invention extends the typical particle filter by introducing three novel techniques. The first explicitly represents occlusion between objects in the particles. The second is a specially designed observation likelihood function that not only incorporates the concept of observation but also flexibly handles cases ranging from objects with no overlap at all to objects with many overlaps. The third concerns the observation model. According to the present invention, a generative observation model that is invariant to view and geometric transformation is developed by augmenting the state representation with hidden variables. This observation model adapts when the appearance of an object (how it looks) deforms.

The invention according to claim 1 is a multiple-object tracking method based on particle filtering for individually and simultaneously tracking multiple objects in an image sequence, characterized by:
explicitly describing occlusion between objects in the particles;
setting an observation likelihood function that handles objects ranging from non-overlapping to overlapping; and
incorporating into the state representation a hidden representation that models the dynamics of object appearance deformation.

The present invention tracks objects while individually identifying multiple objects by means of probabilistic inference. Each object can be identified from its motion characteristics, so even objects with the same shape can be distinguished. The present invention can also recognize the occurrence of occlusion. Furthermore, the present invention uses unseen variables (hidden variables) to realize a model that is unaffected by geometric transformation. The present invention therefore achieves object tracking that is robust to shape transformation and changes in tilt.

According to the invention of claim 2, in the state representation, the basic representation is augmented by representing all objects in the image as an object configuration, and occlusion is handled explicitly by augmenting the object configuration with an occlusion list. Augmenting the representation makes it possible to decompose the depth ordering required for computing the observation likelihood into pairs of objects involved in occlusion relations.
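One way to picture a state representation of this kind is sketched below. The data layout is an assumption made for illustration and is not specified in this document: the class names `ObjectState` and `Particle`, the axis-aligned bounding boxes, and the encoding of an occlusion as a `(front, back)` pair of object indices are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations, product

@dataclass(frozen=True)
class ObjectState:
    x: float        # box center, pixels
    y: float
    width: float
    height: float

@dataclass
class Particle:
    objects: list                                    # object configuration
    occlusions: list = field(default_factory=list)   # (front, back) index pairs
    weight: float = 1.0

def overlaps(a, b):
    """True if two center-based, axis-aligned boxes intersect on both axes."""
    return (abs(a.x - b.x) * 2 < a.width + b.width and
            abs(a.y - b.y) * 2 < a.height + b.height)

def occlusion_hypotheses(objects):
    """Enumerate candidate occlusion lists for one object configuration.

    Only overlapping pairs need a depth decision, and each pair is
    resolved on its own as (i in front of j) or (j in front of i),
    so no global depth ordering of all objects is required.
    """
    pairs = [(i, j) for i, j in combinations(range(len(objects)), 2)
             if overlaps(objects[i], objects[j])]
    hypotheses = []
    for flips in product((False, True), repeat=len(pairs)):
        hypotheses.append([(j, i) if flip else (i, j)
                           for (i, j), flip in zip(pairs, flips)])
    return hypotheses

# Objects 0 and 1 overlap; object 2 is far away from both.
config = [ObjectState(0, 0, 10, 10), ObjectState(5, 0, 10, 10),
          ObjectState(100, 100, 10, 10)]
particles = [Particle(objects=config, occlusions=h)
             for h in occlusion_hypotheses(config)]
```

For this configuration the sketch produces two particles, one per depth order of the overlapping pair, while the isolated third object adds no hypotheses. The number of candidates therefore grows with the number of overlapping pairs rather than with the number of objects.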

According to the invention of claim 3, the observation model reflects the likelihood of objects that appear in varying numbers. Once such a model is available, standard particle filtering can be applied to generate a posterior distribution over the number of objects and their configuration.

According to the invention of claim 4, the hidden representation may be a stochastic random variable, and it yields an observation model that is invariant to view and geometric transformation. The observation model thereby adapts to various kinds of deformation. To cope with sample depletion and with the probability-collapse problem, both of which are particularly severe when tracking multiple objects, reweighted sampling can be implemented within the multiple-object tracking framework.

The invention according to claim 5 is a multiple-object tracking method based on particle filtering for individually and simultaneously tracking multiple objects in an image sequence, comprising the steps of:
extending the exclusion principle to multiple objects by introducing, into the state representation of the composite of individual object configurations, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model from changing under object appearance deformation;
formulating multiple-object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a specially designed observation model based on the exclusion principle; and
considering, within the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and reinitialization for tracking newly arrived objects.

According to the invention of claim 5, a hidden process that models occlusion between objects and a hidden process that models the transformation of each object are integrated into the basic dynamic Bayesian network of the particle filter. As a result, both occlusion between objects and adaptability to the deformations that geometric transformations may cause are handled within the Bayesian network framework, and the multiple object tracking method can be formulated, for example, as a sequential Monte Carlo solution.

The multiple object tracking system according to claim 6 comprises:
a multiple object tracking processing unit, based on particle filtering, that receives an image sequence and tracks multiple objects individually and simultaneously;
an extension processing unit that extends the exclusion principle to multiple objects by introducing, into the state representation of the composite of the individual object configurations, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model unchanged under appearance deformation of the objects;
a Bayesian inference unit that formulates multiple object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a specially designed observation model based on the exclusion principle; and
an integration processing unit that considers, within the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and a re-initialization for tracking newly arrived objects.

According to the invention of claim 6, a hidden process that models occlusion between objects and a hidden process that models the transformation of each object are integrated into the basic dynamic Bayesian network of the particle filter. As a result, both occlusion between objects and adaptability to the deformations that geometric transformations may cause are handled within the Bayesian network framework, and a system can be constructed in which the multiple object tracking method is formulated, for example, as a sequential Monte Carlo solution.

The invention according to claim 7 is a program for causing a computer to implement:
a multiple object tracking function, based on particle filtering, that receives an image sequence and tracks multiple objects individually and simultaneously;
a function of extending the exclusion principle to multiple objects by introducing, into the state representation of the composite of the individual object configurations, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model unchanged under appearance deformation of the objects;
a Bayesian inference function that formulates multiple object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a specially designed observation model based on the exclusion principle; and
an integration function that considers, within the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and a re-initialization for tracking newly arrived objects.

The invention according to claim 7 provides a program for causing a computer to implement the various functions of the system. The program can be supplied to the system over a communication line or on a recording medium.

The invention according to claim 8 provides a computer-readable recording medium on which the program according to claim 7 is recorded.

According to the present invention, a generative observation model that is invariant to view and geometric transformations is obtained by augmenting the state representation with hidden variables. This observation model adapts when the appearance of an object deforms. The present invention addresses nonlinear stochastic dynamic systems, incomplete observations, and the other difficulties caused by the growth in dimensionality and in assignments as more objects are observed, so a particle filter can be used to formulate multiple object tracking.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

In the best mode for carrying out the invention, particle filtering is used to track many objects of the same type in a cluttered image sequence. One application area is the robust tracking of circles generated by a pen-type input device used in a conference room, as a precursor to automatic meeting summarization (see Non-Patent Document 23). Such image sequences typically contain image features that are hard to tell apart. Tracking multiple objects in these sequences is difficult because the appearance model uses the same type of image feature to support every object, so the measurement ambiguity becomes very large when objects begin to occlude one another or pass close to each other.

One embodiment of the present invention achieves multiple object tracking by introducing the following three novel techniques, which differ from conventional methods. The first is the use of an explicit representation of occlusion between objects within each particle of the state representation. In this state representation, the basic representation is augmented by representing all objects in the image as an object configuration, and occlusion is handled explicitly by appending an occlusion list to that configuration. Augmenting the representation makes it possible to decompose the depth ordering required for the observation likelihood computation into pairs of objects involved in an occlusion relation.
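
The augmented state can be pictured with a small data-structure sketch. This is an illustration under assumptions, not the embodiment's actual layout: the names `CircleState`, `Particle`, and `overlapping_pairs`, the circular object shape, and the `(front, back)` encoding of an occlusion hypothesis are all hypothetical. It shows one particle holding the whole object configuration plus an occlusion list, and that depth order only needs to be hypothesised for pairs that actually overlap.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CircleState:
    """Hypothetical per-object state: a circle in image coordinates."""
    x: float
    y: float
    radius: float


@dataclass
class Particle:
    """One joint hypothesis: every object in the image plus an occlusion list."""
    objects: List[CircleState]
    # Assumed encoding: (front, back) means objects[front] hides objects[back].
    occl_list: List[Tuple[int, int]] = field(default_factory=list)
    weight: float = 1.0


def overlaps(a: CircleState, b: CircleState) -> bool:
    """True when the two circles intersect in the image plane."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 < (a.radius + b.radius) ** 2


def overlapping_pairs(p: Particle) -> List[Tuple[int, int]]:
    """Pairs of object indices whose depth order must be hypothesised."""
    n = len(p.objects)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if overlaps(p.objects[i], p.objects[j])]


particle = Particle(objects=[CircleState(0, 0, 1),
                             CircleState(1.5, 0, 1),
                             CircleState(10, 10, 1)])
pairs = overlapping_pairs(particle)   # only objects 0 and 1 overlap
particle.occl_list = [pairs[0]]       # hypothesis: object 0 is in front of object 1
```

With three objects there are three possible pairs, but only one needs a depth hypothesis here, which is the pairwise decomposition the paragraph above describes.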

The second novel technique is the design of an observation likelihood function that not only incorporates the notion of the observation but also flexibly handles the whole spectrum from non-overlapping objects to heavily overlapping objects. The resulting observation model accurately reflects the likelihood of objects appearing in varying numbers. Once such a model is available, standard particle filtering can be applied to generate the posterior probability distribution over the number and configuration of objects.

The third novel technique concerns the observation model. In addition to the occlusion list, the state representation is augmented with hidden stochastic random variables that model the dynamics of the object appearance deformation caused by geometric transformations. The observation model thereby adapts to various kinds of deformation. To cope with sample depletion and the probability collapse problem, both of which become particularly severe when tracking multiple objects, reweighted sampling is implemented within the multiple object tracking framework.

As the experimental results show, these novel techniques not only make tracking robust but also make it possible to track the class of transformations that the objects undergo in the video. The performance of the best mode of the present invention has been demonstrated on two examples: tracking many identical circles, and tracking people in a conference room.

The present invention is described in more detail below. Related work is outlined first. The task of multiple object tracking is then formulated as a Bayesian inference problem, after which the state representation and the observation likelihood function are described. The handling of object addition, object deletion, and occlusion is then described in detail. Finally, experimental results and conclusions are presented.

[1] Related Art
Multiple object tracking methods generally fall into two classes. The first, a common approach used in many successful tracking systems, uses background subtraction to obtain foreground objects and then assigns a Kalman filter to each object through multiple hypothesis tracking in order to solve the data association problem. Because the object representation contains only position or velocity, it lacks the information needed for further processing. The second class consists of tracking methods based on Bayesian inference. These methods embed detection and tracking in a unified Bayesian inference framework and are a promising approach. The present invention focuses on this second class.

Recent work on object tracking within the Bayesian inference framework falls into two categories. The first addresses single object tracking (Non-Patent Documents 17, 18, 19 and 22), and the second addresses the simultaneous tracking of multiple objects (Non-Patent Documents 1, 4, 8, 20, 21, 24 and 25). Like other systems, these use sequential Monte Carlo methods for Bayesian inference over non-Gaussian probability distributions. The two classes of algorithm are briefly described first.

Work on single object tracking usually concentrates on elaborate object models and uses the single object state as the basic state representation; the presence of multiple objects is indicated implicitly by multiple peaks in the posterior probability distribution. When the object of interest is a person, the person is modeled by fusing a three-dimensional pose model with optical flow information from various viewpoints (see, for example, Non-Patent Document 17). Importance sampling is used to bridge the gap between low-level and high-level tracking approaches (Non-Patent Document 18). It has also been pointed out that adaptability of the observation model is important for robust tracking in cluttered environments (Non-Patent Document 19).

Most multiple object tracking algorithms fall into two categories. Algorithms in the first category generally solve the problem by extending the state space to include the components of all objects of interest. This reduces the multiple object problem to a simpler single object problem and overcomes the limitation of particle filters built on a single object state representation, which cannot cope with multi-modality when several objects are present. Changes in the number of objects are accommodated either by dynamically changing the dimension of the state space or by attaching a set of indicator variables that signal the presence or absence of each object.

Algorithms in the second category, in contrast, avoid concatenating the objects and use particle filtering to track multiple objects individually but simultaneously (Non-Patent Documents 20, 21, 24 and 25). The multiple object tracker is built from multiple instantiations of a single object tracking algorithm. Strategies of varying sophistication have been developed to interpret the output of the resulting trackers in the presence of occlusion and overlapping objects. The Condensation algorithm has been extended to multiple object tracking by borrowing ideas from the ICondensation algorithm (Non-Patent Document 18), using a detect-track-detect scheme to cope with sudden appearances and disappearances. More recently, a non-parametric mixture model has been proposed to remedy the particle filter's inability to maintain multi-modality (Non-Patent Document 25).

An unavoidable and interesting problem in multiple object tracking is occlusion among the objects and the background. Conventional extensions to multiple object tracking are theoretically rigorous but were designed to handle two or three objects at most, and extending them to larger numbers of objects has not been easy (Non-Patent Documents 1, 2, 4, 8 and 26). A probabilistic exclusion principle for tracking multiple occluding objects was introduced by having each particle hold two objects: a foreground object that may occlude and a background object that may be occluded (Non-Patent Document 2). Under this principle, an image "feature" can be used to support at most one of the two objects. By factoring the state density, partitioned sampling can be used, with the foreground object as the first level and the background object as the second level. Bindings between particles have been introduced into the Condensation algorithm (Non-Patent Document 25), which allows multiple occlusions to be handled naturally within the standard Condensation framework, but computing occlusion detection between every pair of particles becomes expensive as the number of required particles grows.

In contrast to the prior art described above, the present invention provides a novel extension of the classical particle filtering method for multiple object tracking. FIG. 1 is a flowchart of a multiple object tracking method according to one embodiment of the present invention. This embodiment adopts the exclusion principle (Non-Patent Document 2) and extends it to multiple objects by introducing, into the state representation of the composite of the individual object configurations, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model unchanged under appearance deformation of the objects (step 101). Combining these with a specially designed observation model based on the exclusion principle, and using a dynamic Bayesian network, multiple object tracking is formulated within the Bayesian inference framework (step 102). Because the embodiment deals with tracking multiple objects, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and a re-initialization for tracking newly arrived objects are also considered within this framework (step 103).

[2] Bayesian Formulation of Tracking and the Particle Filter
Bayes' theorem (Non-Patent Documents 27 and 28) provides a tool for probabilistically inferring the world W from an image scene I, as shown in equation (1):

$$p(W \mid I) = \frac{p(I \mid W)\,p(W)}{p(I)} \qquad (1)$$

Because p(I) can be inferred from the other terms, it is typically treated as a normalization constant 1/k. The MAP estimate of the world state is the state that is most probable given the observed image.
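
As a toy illustration of the MAP estimate described above, the sketch below applies Bayes' theorem to a world with three discrete candidate states. The state names and all probability values are invented for the example, and `posterior` is a hypothetical helper; the point is only that p(I) acts as a normalizer and does not affect which state wins.

```python
def posterior(prior, likelihood):
    """Return p(W | I) for each world state W from p(W) and p(I | W)."""
    unnormalized = {w: likelihood[w] * prior[w] for w in prior}
    p_image = sum(unnormalized.values())  # p(I), the normalization term
    return {w: v / p_image for w, v in unnormalized.items()}


# Invented numbers: prior belief about the scene and likelihood of the observed image.
prior = {"empty": 0.7, "one_object": 0.2, "two_objects": 0.1}
likelihood = {"empty": 0.05, "one_object": 0.6, "two_objects": 0.3}

post = posterior(prior, likelihood)
map_state = max(post, key=post.get)  # state with the highest posterior probability
```

Here the prior favors an empty scene, yet the image evidence shifts the MAP estimate to `one_object`.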

For tracking, the observer focuses on a small part of the world (hereinafter referred to as an object or target) and takes past images into account. This can be described by a dynamic Bayesian network similar to a hidden Markov model. FIG. 2 illustrates an example in which the tracking problem is represented by such a dynamic Bayesian network.

At time t, the state $X_t$ represents the current estimate of the feature parameters of the object. Using the sequence of image features $Z_t, Z_{t-1}, \ldots$ observed so far, the MAP tracking task is to estimate the state that maximizes $p(X_t \mid Z_t, Z_{t-1}, \ldots)$. The features $Z_t, Z_{t-1}, \ldots$ define a measurement space that is related to the state X by a continuous measurement equation. Applying Bayes' theorem and rearranging yields equation (2) (Non-Patent Document 16):

$$p(X_t \mid Z_t, Z_{t-1}, \ldots) = k\,p(Z_t \mid X_t)\,p(X_t \mid Z_{t-1}, \ldots) \qquad (2)$$

Here the term $p(X_t \mid Z_{t-1}, \ldots)$, which summarizes the prior knowledge about $X_t$, is a prediction based on past state estimates and on knowledge of how the state evolves. As is common in filtering theory, a model of the expected motion between time steps is adopted, expressed as a conditional probability distribution $p(X_t \mid X_{t-1})$ called the dynamics. Using the dynamics, equation (2) is rewritten as equation (3):

$$p(X_t \mid Z_t, Z_{t-1}, \ldots) = k\,p(Z_t \mid X_t)\int p(X_t \mid X_{t-1})\,p(X_{t-1} \mid Z_{t-1}, \ldots)\,dX_{t-1} \qquad (3)$$

Omitting the time index for simplicity, the observation likelihood $p(Z \mid X)$ describes the probability of observing a particular image at time t given the current state. It depends on image formation and on the physical process of the noise that corrupts the expected image. In the well-known and easily implemented case of the Kalman filter, the observation density $p(Z \mid X)$ is assumed to be Gaussian and the dynamics are assumed to be linear with additive Gaussian noise. Unfortunately, it is known empirically that the observation densities arising in visual tracking problems are not necessarily Gaussian. Possible causes of and remedies for this non-normality have been proposed (Non-Patent Document 16). When the dynamics are stochastic and the state or measurement equations are nonlinear or the noise is non-Gaussian, the particle filter (Non-Patent Document 29) is particularly appropriate.

For completeness, the basic particle filter is briefly described. The basic idea behind the particle filter is to simulate equation (3) with a Monte Carlo strategy (Non-Patent Document 30). The simulation uses the notion of a weighted particle set $\{(x_i, \pi_i)\}$. A particle set represents a probability distribution p(x) in the sense that selecting one $x_i$ with probability $\pi_i$ is approximately the same as drawing a random sample from p(x). To simulate equation (3), a "convolution with the dynamics" operation and a "multiplication by the observation density" operation must be carried out on the particle set. Both turn out to be very simple. To convolve with the dynamics $p(X_t \mid X_{t-1})$, each particle $x_i$ is simply replaced by a random draw from $p(X_t \mid X_{t-1} = x_i)$. To multiply by the observation density $p(Z_t \mid X_t)$, each weight $\pi_i$ is multiplied by $p(Z_t \mid X_t = x_i)$ and the new weights are normalized so that they sum to 1. Bringing the simulation closer to the posterior probability distribution requires further operations such as resampling, reweighting, and rejection (Non-Patent Document 30). An effective sample size has been defined to assess the degeneracy of the particle set (Non-Patent Documents 31 and 32). FIG. 3 illustrates the algorithm of the basic particle filter with adaptive resampling; it has been proposed that the resampling step be carried out as in the algorithm shown in FIG. 3 (Non-Patent Document 33). FIG. 4 is a flowchart of the processing of the basic particle filter.
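
The two operations and the adaptive resampling step can be sketched as follows. This is a generic bootstrap filter, not a reproduction of the algorithm in FIG. 3, which is not available in the text. Everything specific is an assumption made for illustration: a one-dimensional state, random-walk dynamics, a Gaussian observation density, the usual effective sample size estimate 1 / Σ π_i², and a resampling threshold of half the particle count. The function and parameter names are hypothetical.

```python
import math
import random


def effective_sample_size(weights):
    """Common estimate of effective sample size for normalized weights."""
    return 1.0 / sum(w * w for w in weights)


def particle_filter_step(particles, weights, z, rng,
                         dyn_std=1.0, obs_std=1.0, ess_ratio=0.5):
    n = len(particles)
    # "Convolution with the dynamics": replace each particle by a random draw
    # from the (assumed random-walk) dynamics conditioned on that particle.
    particles = [x + rng.gauss(0.0, dyn_std) for x in particles]
    # "Multiplication by the observation density": reweight, then normalize.
    weights = [w * math.exp(-0.5 * ((z - x) / obs_std) ** 2)
               for w, x in zip(weights, particles)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Adaptive resampling: only resample when the particle set has degenerated.
    if effective_sample_size(weights) < ess_ratio * n:
        particles = rng.choices(particles, weights=weights, k=n)
        weights = [1.0 / n] * n
    return particles, weights


rng = random.Random(0)
particles = [rng.gauss(0.0, 5.0) for _ in range(500)]
weights = [1.0 / 500] * 500
for z in [1.0, 1.5, 2.0, 2.5]:  # invented measurements of a slowly moving target
    particles, weights = particle_filter_step(particles, weights, z, rng)
estimate = sum(w * x for w, x in zip(weights, particles))  # posterior mean
```

Resampling only below a threshold avoids discarding particle diversity on steps where the weights are still well spread. The sketch omits a guard for the case in which every likelihood underflows to zero.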

Next, the multiple object tracking problem and conventional particle filtering methods for it are briefly described. Tracking multiple objects in parallel is very difficult because multiple objects give rise to a posterior probability distribution with several maxima. The MAP solution becomes ambiguous and therefore impractical. Fortunately, there is a way to use single object tracking in the multiple object case: simultaneous tracking of multiple objects can be described as tracking the configuration of one "composite" object made up of those objects. However, this simplification of the modeling comes at a price, because the state space grows exponentially with the number of targets. Even for a small number of objects the problem becomes almost computationally intractable, a phenomenon known as the "curse of dimensionality".
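
The growth can be made concrete with a rough count. Assuming, purely for illustration, that each object's state is discretized on a regular grid, the number of joint configurations is the per-axis resolution raised to the total number of state dimensions. `grid_cells` and its parameters are hypothetical.

```python
def grid_cells(bins_per_axis, dims_per_object, num_objects):
    """Number of cells in a regular grid over the concatenated state space."""
    return bins_per_axis ** (dims_per_object * num_objects)


# A 2-D position per object, 10 bins per axis (invented resolution).
cells_by_count = {m: grid_cells(10, 2, m) for m in (1, 2, 3, 4)}
```

Each added object multiplies the count by a further factor of 100 in this setting, which is why a particle set that covers one object well becomes sparse for a composite of several.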

The complexity increases further when the number of objects present in the scene is unknown. Limiting the number of tracked objects makes the complexity of the problem predictable but restricts the generality of the system. Conversely, it is difficult to obtain an efficient strategy for solving the tracking problem without anticipating the number of objects that will appear. One embodiment of the present invention uses means that address these problems.

[3] Multiple Object Particle Filter
Let M denote the number of objects to be tracked, and assume at first that M is constant; the case of a variable number of objects is described afterwards. In the following, the index i designates one of the M targets and is always used as the first superscript. Multiple object tracking estimates the state vector formed by concatenating the state vectors of all objects. The objects are generally assumed to move according to independent Markov dynamics. As with single object tracking, multiple object tracking can be represented by a graph such as the one shown in FIG. 5.
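
A minimal sketch of the concatenated state and its independent per-object dynamics follows. It assumes each object's state is a 2-D position and that each object follows its own Gaussian random walk. The names `propagate_joint_state` and `flatten` and the noise level are hypothetical choices for illustration.

```python
import random


def propagate_joint_state(joint_state, rng, noise_std=1.0):
    """Advance every object with its own independent random-walk dynamics.

    joint_state is a list of per-object states, here (x, y) tuples.
    """
    return [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
            for (x, y) in joint_state]


def flatten(joint_state):
    """Concatenate the per-object states into one state vector."""
    return [coord for state in joint_state for coord in state]


rng = random.Random(1)
joint = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]   # M = 3 objects
next_joint = propagate_joint_state(joint, rng)
state_vector = flatten(next_joint)             # length 2 * M
```

Because the noise draws for different objects are independent, the joint transition factors into one transition per object, which is the independence assumption stated above.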

At time t, the state evolution can be decomposed into M partial equations, one for each object. The noise terms are white both temporally and spatially and are considered independent for i ≠ i′.

The observation vector collected at time t is denoted $z_t = (z_t^1, \ldots, z_t^{m_t})$, where the index j is used as the first superscript to refer to one of the $m_t$ measurements. The vector $z_t$ consists of detection measurements and clutter measurements. Because the source of each measurement is unknown, a vector $K_t$ must be introduced to describe the association between measurements and objects. Each component $K_t^j$ is a random variable taking values in {0, ..., M}, so $K_t^j = i$ indicates that $z_t^j$ is associated with the i-th object. In that case the measurement $z_t^j$ is a realization of a stochastic process associated with the i-th object, with noise $W_t^j$. Again, the noises $W_t^j$ and $W_t^{j'}$ are assumed to be white and independent for j ≠ j′. Up to indexing, every measurement has the same prior probability of being associated with a given object model i. At time t, these association probabilities define the vector $q_t = (q_t^0, q_t^1, \ldots, q_t^M)$. Thus, for i = 1, ..., M, $q_t^i = P(K_t^j = i)$ for all j = 1, ..., $m_t$ is the discrete probability that an arbitrary measurement is associated with the i-th object. To resolve the data association, the general assumptions described in [2] are adopted. From the probabilistic exclusion principle (hypothesis 5), that is, the principle that a single image measurement (for example, an edge or a color blob) does not simultaneously support mutually exclusive hypotheses, it follows that

$$\sum_{i=0}^{M} q_t^i = 1.$$

Under another assumption, one object can produce zero or one measurement at a time, so $m_t$ may differ from M and, in particular, the association variables $K_t^j$ for j = 1, ..., $m_t$ are independent.

The multiple hypothesis tracking algorithm builds association hypotheses recursively (Non-Patent Document 15). One advantage of this algorithm is that the appearance of a new target is hypothesized at every step. Its complexity, however, grows exponentially with time, so some pruning scheme is needed to discard part of the associations. The JPDAF (Non-Patent Document 11) gates the measurements, letting through only those that fall inside an ellipse around the prediction. This gating assumes that the measurements are distributed according to a Gaussian law centered on the predicted state. The probability of each association $K_t^j = i$ is then estimated.
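
The gating step alone can be sketched as below. This is not an implementation of the JPDAF; it only shows the elliptical test, under the assumptions of a 2-D measurement, a known 2x2 innovation covariance, and a squared Mahalanobis threshold of 9.21 (the 99% quantile of a chi-square distribution with two degrees of freedom). The function names and the example numbers are hypothetical.

```python
def mahalanobis_sq_2d(z, pred, cov):
    """Squared Mahalanobis distance of measurement z from prediction pred."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = z[0] - pred[0], z[1] - pred[1]
    # Uses the closed-form inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def gate(measurements, pred, cov, threshold=9.21):
    """Keep only the measurements inside the ellipse around the prediction."""
    return [z for z in measurements
            if mahalanobis_sq_2d(z, pred, cov) <= threshold]


pred = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 4.0))   # more uncertainty along y than along x
measurements = [(1.0, 1.0), (4.0, 0.0), (0.0, 5.0), (2.0, -3.0)]
gated = gate(measurements, pred, cov)
```

The measurement at (4, 0) is rejected even though it is closer in Euclidean terms than (0, 5), because the ellipse is narrow along x. Only the surviving measurements would then enter the association probability estimate.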

パーティクルフィルタリング方法は、パーティクル集合を用いて多重仮説を表現する本質的な能力を備えている。連想は所定の時間的反復のときだけ考慮すればよいので、データ連想の複雑さが軽減される。パーティクルが単一対象物状態空間に展開されるハイブリッド・ブートストラップ・フィルタが提案されている（非特許文献３５）。かくして、測定量が与えられた場合の目標物の事後確率則は、ガウス混合によって表現される。この法則の各モードは対象物のうちの一つに対応する。しかし、混合によって事後確率を表現する方法は、オクルージョン中に対象物のうちの一つを失う可能性がある（非特許文献３６）。隠蔽された対象物を追跡するパーティクルは非常に小さい重みが与えられるので、再サンプリングステップ中に放棄される。この問題にクラッターパーティクルの考え方を導入することが提案されている（非特許文献３７）。このアルゴリズムは、本質的に、パーティクルを、別々に追跡されるクラスタに分類する。各クラスタには、上位レベルで動作する別のモンテカルロフィルタによって追跡される確率が割り当てられる。 The particle filtering method has an essential ability to express multiple hypotheses using particle sets. Since associations need only be considered for a given temporal iteration, the complexity of data associations is reduced. A hybrid bootstrap filter in which particles are developed in a single object state space has been proposed (Non-patent Document 35). Thus, the posterior probability rule of the target when the measured quantity is given is expressed by Gaussian mixture. Each mode of this law corresponds to one of the objects. However, the method of expressing the posterior probability by mixing may lose one of the objects during occlusion (Non-patent Document 36). Particles that track the concealed object are given a very low weight and are discarded during the resampling step. It has been proposed to introduce the clutter particle concept into this problem (Non-patent Document 37). This algorithm essentially classifies particles into separately tracked clusters. Each cluster is assigned a probability that is tracked by another Monte Carlo filter operating at a higher level.

本発明の一実施例では、複数対象物追跡のためパーティクルフィルタを採用し、パーティクルの次数が各対象物に対応した個別の状態空間の和であるパーティクルを使用する（非特許文献１、２、４及び２６）。そして、従来技術とは異なり、各パーティクルサンプルに付加的な変数Occl_listを導入する。これは、対象物間のオクルージョンを明示し、対象物が互いに隠蔽することを許容し、観測尤度をどのように計算すべきであるかについて重要な手がかりを与える。Occl_listはコンフィギュレーションが与えられた場合に決定される。コンフィギュレーション空間が０個、１個、２個、．．．の対象物の考え得るすべてのコンフィギュレーションの和集合であるとすると、典型的な要素は、次式（５’）： In one embodiment of the present invention, a particle filter is employed for multiple-object tracking, using particles whose dimension is the sum of the dimensions of the individual state spaces corresponding to each object (Non-Patent Documents 1, 2, 4 and 26). Unlike the prior art, an additional variable Occl_list is introduced into each particle sample. It makes the occlusion between objects explicit, allows objects to occlude one another, and provides an important clue as to how the observation likelihood should be computed. Occl_list is determined once a configuration is given. If the configuration space is the union of all possible configurations of 0, 1, 2, ... objects, a typical element is given by the following equation (5′):

のように記述される。式中、Ｍ^ｔは現在時間ステップにおける対象物の個数を表し、Ｘ^ｔはｉ番目の対象物の状態をエンコーディングするベクトルである。Ｘ^ｔは、更に、位置を表現するサブベクトルｐ^ｉと、形状を表現するサブベクトルＱ^ｉと、当該オブジェクトがオクルージョンに関係しているかどうか、及び、オクルージョンに関係している場合にカメラに近い方はどちらのオブジェクトであるかを示す３値変数ｏｃｃｌ（０、１、２）と、に分解できる。多数の円を追跡するような特定のアプリケーションでは、ｐ^ｉは円の２次元位置であり、Ｑ^ｉは半径を表現するパラメータである。Occl_listは、隠蔽する対象物と隠蔽される対象物のペアの間のオクルージョン関係を記録し、必要な奥行き順序情報（レイヤ化情報とも呼ばれる（例えば、非特許文献７を参照。））を後述の観測モデルに提供し、M行M列の２値行列であり、オクルージョン関係はその要素Ｏ_ijの一つによって表現される。

Is written in this form. Here, M^t denotes the number of objects at the current time step, and X^t is the vector encoding the state of the i-th object. X^t can be further decomposed into a sub-vector p^i representing position, a sub-vector Q^i representing shape, and a ternary variable occl (0, 1, 2) that indicates whether the object is involved in an occlusion and, if so, which object is closer to the camera. In a specific application such as tracking many circles, p^i is the two-dimensional position of a circle and Q^i is a parameter representing its radius. Occl_list records the occlusion relations between pairs of occluding and occluded objects and supplies the necessary depth-order information (also called layering information; see, for example, Non-Patent Document 7) to the observation model described later. It is a binary matrix with M rows and M columns, and each occlusion relation is expressed by one of its elements O_ij.

O_ij=1は、i番目の対象物がj番目の対象物を隠蔽することを意味する。説明をより簡潔にするため、O_ijとO_jiの両方に値１が与えられる状況は除外する。オクルージョンは、二つの対象物の各々と所定の最低閾値とのユークリッド距離を閾値化することによって検出される。図６は、隠れ過程{o_t}によって動的ベイジアンネットワークにOccl_listが導入される例の説明図である。図７は、Occl_listが構築される例の説明図である。図６では、隠れ状態過程{o_t}が動的ベイジアンネットワークに追加され、対象物間にオクルージョンが発生した状況が表現されている。図７では、両矢印付きの破線の両端の対象物がオクルージョン関係の候補を形成し、オクルージョンは、二つの対象物の各々と所定の最低閾値とのユークリッド距離を閾値化することにより容易に検出することができる。

O_ij = 1 means that the i-th object occludes the j-th object. To keep the explanation concise, the situation in which both O_ij and O_ji take the value 1 is excluded. Occlusion is detected by comparing the Euclidean distance between two objects against a predetermined minimum threshold. FIG. 6 illustrates an example in which Occl_list is introduced into the dynamic Bayesian network through the hidden process {o_t}. FIG. 7 illustrates an example of how Occl_list is constructed. In FIG. 6, the hidden state process {o_t} is added to the dynamic Bayesian network to represent the situation in which occlusion occurs between objects. In FIG. 7, the objects at the two ends of each dashed double-headed arrow form a candidate occlusion pair, and such pairs are easily found with this distance test.
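The distance test and the Occl_list matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the `min_dist` threshold, and the `depth_rank` list (a smaller rank meaning closer to the camera) are hypothetical and are introduced here only for the example.

```python
import math
from itertools import combinations


def occlusion_candidates(positions, min_dist):
    """Return index pairs (i, j), i < j, whose centres are closer than min_dist."""
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < min_dist:
            pairs.append((i, j))
    return pairs


def build_occl_list(num_objects, pairs, depth_rank):
    """Build the M x M binary matrix O, where O[i][j] = 1 means object i hides object j.

    depth_rank is a hypothetical per-object rank (smaller = closer to the camera).
    Exactly one of O[i][j] and O[j][i] is set for each candidate pair.
    """
    occl = [[0] * num_objects for _ in range(num_objects)]
    for i, j in pairs:
        if depth_rank[i] < depth_rank[j]:
            occl[i][j] = 1
        else:
            occl[j][i] = 1
    return occl
```

For three objects of which only the first two are close together, the matrix has a single non-zero entry, which matches the exclusion of the case O_ij = O_ji = 1.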

式（２）から類推すると、複数対象物状態に対して（過去の画像に基づく条件を仮定すると）、 By analogy with equation (2), for multiple object states (assuming conditions based on past images)

が得られる。状態事前確率と呼ばれる右辺の最後の項は、連則確率のための式の同時実現可能性ロジックによってＪＰＤＡフィルタに具現化されている。観測尤度

Is obtained. The last term on the right-hand side, called the state prior, is embodied in the JPDA filter by the joint feasibility logic of its formula for the joint association probabilities. The hypothesis that the observation likelihood

が次式（８）：

Can be decomposed as in the following equation (8):

として分解できるという仮説は、対象物が非常に接近しているとき、又は、重なり合っているときには破綻する。対象物をより正確に追跡するため、

Breaks down when the objects are very close to each other or overlap. To track the objects more accurately,

は、カメラに対する追跡されている対象物の奥行きの順序を考慮しなければならない。対象物が重なり合うとき、どちらの対象物が前方にあるかを特定することは、一体的な対象物から画像のアピアランスを適切に予測するために重要である。対象物の３次元表現からの差分により、それらの奥行き順序情報が得られる（非特許文献４）。本発明の一実施例は、この問題を解決するためランダム変数occlを使用する。同様の考え方が提案されている（非特許文献２）。occlの値はoccl_listに依存する。

Must take into account the depth order of the tracked objects relative to the camera. When objects overlap, identifying which object is in front is important for correctly predicting the image appearance of the combined objects. The depth order can be obtained from differences in a three-dimensional representation of the objects (Non-Patent Document 4). One embodiment of the present invention uses a random variable occl to solve this problem. A similar idea has been proposed (Non-Patent Document 2). The value of occl depends on Occl_list.

再度式（３）を参照すると、対象物の動きは独立しているので、 Referring again to equation (3), since the motions of the objects are independent,

は、次式（９）：

Is the following formula (9):

として書き換えられる。

Can be rewritten as

上述の通り、オクルージョンがなければ、観測尤度は式（８）を用いて一意に決定することができる。しかし、オクルージョンが発生したとき、オクルージョン関係は、尤度を一意に計算する前に分かっていなければならない。即ち、尤度は隠れ過程{o_i}に基づいて決まる（非特許文献２７）。本発明の一実施例では、オクルージョンは、確率過程ではなく、決定論的過程として取り扱われる（非特許文献２７）。なぜならば、本実施例の状態表現の場合、オクルージョンは簡単に検出することができ、これにより、後段のベイズ推論タスクが簡単化されるからである。 As described above, when there is no occlusion, the observation likelihood can be determined uniquely using equation (8). When occlusion occurs, however, the occlusion relations must be known before the likelihood can be computed uniquely; that is, the likelihood depends on the hidden process {o_i} (Non-Patent Document 27). In one embodiment of the present invention, occlusion is treated as a deterministic process rather than a stochastic one (Non-Patent Document 27), because with the state representation of this embodiment occlusion can be detected easily, which simplifies the subsequent Bayesian inference task.

図７に示された動的ベイジアンネットワークの場合、M個の対象物に対応したM個の隠れマルコフ過程が存在する。図７における動的ベイジアンネットワークの厳密な確率論的推論は、観測尤度が複雑である場合には、非常に困難であろう。これに対して、確率論的逐次モンテカルロ戦略は、この問題に対して計算的なアプローチを提供し、確率は重み付きパーティクルの集合によって近似される。動的ベイジアンネットワークによるパーティクルの集合の展開は、動的システムの挙動を特徴付け、隠れ過程はパーティクルの集合から復元できる。 In the case of the dynamic Bayesian network shown in FIG. 7, there are M hidden Markov processes corresponding to M objects. Strict probabilistic reasoning of the dynamic Bayesian network in FIG. 7 would be very difficult when the observation likelihood is complex. In contrast, the stochastic sequential Monte Carlo strategy provides a computational approach to this problem, with probabilities approximated by a set of weighted particles. The expansion of a set of particles by a dynamic Bayesian network characterizes the behavior of the dynamic system, and the hidden process can be restored from the set of particles.

N個のパーティクルからなる初期パーティクル集合： Initial particle set consisting of N particles:

は、i=1,...,Mに対する各成分S₀ ^n,iが他の成分とは独立に

Each component S ₀ ^{n, i} for i = 1, ..., M is independent of the other components

からサンプリングされている。ここで、

Is the distribution from which it is sampled. Here,

が得られていると仮定する。各パーティクルは、次数：

Is assumed to be available. Each particle is a vector of dimension:

のベクトルであり、ここで、

Where

は、

Is

のi番目の成分を表し、

Represents the i-th component of

は、目標物iの状態表現の次数を表す。

Represents the dimension of the state representation of target i.

図３及び４に示したパーティクルフィルタリングアルゴリズムと同様に、各反復は、予測と重み付けのような二つの主要な演算を含む。予測は、プロポーザル密度fからのサンプリングによって実行され、これは、以下のダイナミクス（１０）と一致する。 As in the particle filtering algorithm shown in FIGS. 3 and 4, each iteration consists of two main operations, prediction and weighting. Prediction is performed by sampling from the proposal density f, which is consistent with the following dynamics (10):

重み付けステップは、後述の観測尤度による乗算によって実行される。

The weighting step is executed by multiplication by observation likelihood described later.
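One iteration of prediction and weighting can be sketched as follows. This is only an illustration under stated assumptions: dynamics (10) is not reproduced in this text, so a Gaussian random walk with standard deviation `noise_std` stands in for the proposal density f, and `likelihood` is a hypothetical callable that returns the observation likelihood of a particle.

```python
import random


def predict_and_weight(particles, weights, likelihood, noise_std, rng):
    """Run one prediction and weighting step on a weighted particle set.

    particles: list of state vectors (lists of floats)
    weights:   list of non-negative weights, one per particle
    rng:       a random.Random instance
    """
    # Prediction: sample each particle from the (assumed) random-walk proposal.
    predicted = [[x + rng.gauss(0.0, noise_std) for x in p] for p in particles]
    # Weighting: multiply the previous weight by the observation likelihood.
    unnormalized = [w * likelihood(p) for w, p in zip(weights, predicted)]
    total = sum(unnormalized)
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights.
        n = len(predicted)
        return predicted, [1.0 / n] * n
    return predicted, [u / total for u in unnormalized]
```

A resampling step, as mentioned earlier in connection with occluded objects, would normally follow; it is omitted here to keep the sketch short.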

［４］適応的観測尤度 [4] Adaptive observation likelihood
n番目のパーティクル The n-th particle

に条件付けされた観測の尤度の計算について説明する。測定量の独立性だけを仮定すると、すべてのn=1,...,Nに対して、

The calculation of the likelihood of the observations conditioned on the n-th particle is described here. Assuming only that the measurements are independent, for all n = 1, ..., N, the likelihood

として表すことができる。

Can be expressed as

ここで、 here,

及び

as well as

は、j番目の測定量の確率はクラッターと関連付けられていることを表す。この状況に対処するため、最初に、パーティクル内の対象物を、オクルージョンに関係していない対象物と、オクルージョンに関係している対象物の二つのクラスにラベル付けする。オクルージョンに関係していない対象物に対しては、式（１１）をそのまま使用することができる。しかし、オクルージョンに関係している対象物に関しては、オクルージョン検出ステップの後にoccl_listが与えられ、それらの同時観測尤度を考慮し、奥行き順序を考慮しなければならない。この演算と、ＰＤＡＦ及びＪＰＡＤＡＦの独立したアプローチとの重大な差は、本発明の一実施例による方法が対象物間のオクルージョンを判定する能力を備えている点である。一方の対象物が他方の対象物の前方に存在するという仮説が立てられたとき、隠蔽された対象物のアピアランスに関する期待値が変化する。

Represents the probability that the j-th measurement is associated with clutter. To handle this situation, the objects in a particle are first labeled into two classes: objects that are not involved in any occlusion and objects that are. For objects not involved in occlusion, equation (11) can be used as it is. For objects involved in occlusion, however, Occl_list is available after the occlusion detection step, and their joint observation likelihood must be evaluated with the depth order taken into account. An important difference between this procedure and the independent approaches of the PDAF and JPDAF is that the method according to one embodiment of the present invention can reason about occlusion between objects. When it is hypothesized that one object lies in front of another, the expectation for the appearance of the occluded object changes.

第２のステップは、オクルージョンに関係している対象物に対する最も確からしい奥行き順序を選ぶことである。このため、オクルージョンに関係している対象物の奥行き順序の考えられるすべての並べ替えが列挙される。対象物間のオクルージョン関係をあらゆる２個の対象物へ分解するので、奥行き順序に関して残されている不確実性は、２状態過程であり、遷移行列 The second step is to select the most probable depth order for the objects involved in occlusion. To this end, all possible permutations of the depth order of those objects are enumerated. Because the occlusion relations are decomposed into pairs of objects, the remaining uncertainty about the depth order of each pair is a two-state process, and the transition matrix

はオクルージョン関係の遷移をモデル化するため定義される。ここで、0<δ<1である。

Is defined to model the transitions of the occlusion relation. Here, 0 < δ < 1.
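The enumeration of depth orders can be sketched as follows. This is a minimal illustration, assuming a hypothetical callable `joint_likelihood` that scores one front-to-back ordering of the occluding objects; the transition matrix with parameter δ is not reproduced in this text and is therefore not modeled here.

```python
from itertools import permutations


def best_depth_order(object_ids, joint_likelihood):
    """Enumerate every front-to-back ordering and keep the most likely one.

    joint_likelihood(order) is assumed to return the joint observation
    likelihood of the occluding objects under the given ordering (a tuple).
    """
    best_order, best_score = None, float("-inf")
    for order in permutations(object_ids):
        score = joint_likelihood(order)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score
```

Since the number of permutations grows factorially, this exhaustive search is practical only because occlusion groups typically involve few objects at once.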

得られた奥行き順序を用いて、画像Iのサイズの２値マスクの集合 Using the obtained depth order, a set of binary masks of the same size as the image I

が誘導され、ここで、Ma_jは対象物jに対応する。Ma(x,y)=1は、位置(x,y)での画素（画像ピクセル）が対象物jに由来することを表す。Ma(x,y)=0は、画素が別の対象物又は背景のいずれかに属することを表す。このようにして、遮られていると予測された画素は棄却され、見えていると予測された画素は正常にマッチングされる。更に、各画素は１個の対象物による証拠としてのみ使用されるべきであるという排他原理は、これらのマスクによって同時に保証され、オクルージョンに関係している対象物の同時観測尤度は、その固有のサポートマスクを用いて独立に処理される。したがって、後述の輪郭に基づく尤度及びアピアランスに基づく尤度では、単一対象物の状況だけを考慮すればよい。これらの二つの形態の尤度は、多数の追跡システムで採用されている（非特許文献２、３、４及び８）。

Is derived, where Ma_j corresponds to object j. Ma_j(x, y) = 1 indicates that the pixel at position (x, y) originates from object j, and Ma_j(x, y) = 0 indicates that the pixel belongs either to another object or to the background. In this way, pixels predicted to be occluded are rejected and pixels predicted to be visible are matched normally. Furthermore, the exclusion principle that each pixel should be used as evidence for only one object is enforced at the same time by these masks, and the joint observation likelihood of the objects involved in occlusion can be evaluated independently for each object using its own support mask. Therefore, the contour-based and appearance-based likelihoods described below need only consider the single-object case. These two forms of likelihood are employed in many tracking systems (Non-Patent Documents 2, 3, 4, and 8).
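For circular objects, the support masks could be built along the following lines. This is a sketch under stated assumptions: the function name, the dictionary of circles, and the front-to-back `depth_order` list are hypothetical; each pixel is assigned to the front-most object covering it, so that no pixel supports more than one object.

```python
def support_masks(width, height, circles, depth_order):
    """Build one binary mask per object from a front-to-back depth order.

    circles:     dict mapping object id -> (cx, cy, radius)
    depth_order: object ids sorted from closest to farthest from the camera
    Returns a dict mapping object id -> mask, with mask[y][x] in {0, 1}.
    """
    masks = {j: [[0] * width for _ in range(height)] for j in circles}
    for y in range(height):
        for x in range(width):
            for j in depth_order:  # the front-most object claims the pixel
                cx, cy, r = circles[j]
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    masks[j][y][x] = 1
                    break
    return masks
```

Pixels of an occluded object that fall under the occluder are left at 0 in its mask, which corresponds to rejecting the pixels predicted to be hidden.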

［４．１］輪郭に基づく観測モデル [4.1] Contour-based observation model
この形式の対象物は輪郭を用いて記述され、輪郭はＢ−スプライン曲線としてモデル化される。形状空間は、低次元ベクトル空間Q_sとしてパラメータ化され（非特許文献１６）、アフィン変換的な変形に制約される。このスプライン方式アプローチは、追跡される対象物の形状の任意の詳細記述を可能にするが、対象物の輪郭が剛性的、平面的な曲線であり、平行移動、拡大縮小、平面内の回転に限定されているならば、アフィン変換による制約は、モデルの自由度を効率的に固定する。図８は、画像上の輪郭観測と測定ラインに沿ったエッジ点の説明図である。式（１０）のようなプロポーザル密度fによって予測されたコンフィギュレーションQ∈Q_sは、図８に示された方法によって測定され、画像座標のリストZ=(Z⁽¹⁾,...,Z^(B))が得られる。Zの成分は、それ自体が、コンフィギュレーションQの固定測定ラインに沿って測定された測定量により構成されたベクトルz^(b) b∈{1,...,B}である（即ち、コンフィギュレーションQが与えられた場合、測定ラインの配置も固定されることを意味する（非特許文献１６を参照。）。）。図６において、点線の直線は測定ラインを表し、小さい丸点は測定ライン上のエッジ点を表す。番号が付されていない丸点は真の輪郭点を表している。円状の曲線（一部は隠れている）は仮定された輪郭である。 An object of this type is described by its contour, and the contour is modeled as a B-spline curve. The shape space is parameterized as a low-dimensional vector space Q_s (Non-Patent Document 16) and is constrained to affine deformations. The spline-based approach allows an arbitrarily detailed description of the shape of the tracked object, but if the object contour is a rigid planar curve limited to translation, scaling, and in-plane rotation, the affine constraint effectively fixes the degrees of freedom of the model. FIG. 8 illustrates the contour observation on an image and the edge points along the measurement lines. A configuration Q ∈ Q_s predicted by the proposal density f as in equation (10) is measured by the method shown in FIG. 8, yielding a list of image coordinates Z = (Z^(1), ..., Z^(B)). Each component of Z is itself a vector z^(b), b ∈ {1, ..., B}, consisting of the measurements taken along a fixed measurement line of the configuration Q (that is, once the configuration Q is given, the placement of the measurement lines is also fixed; see Non-Patent Document 16). In FIG. 6, the dotted straight lines represent the measurement lines, and the small dots represent the edge points on the measurement lines. The unnumbered dots represent the true contour points. The circular curve (partially hidden) is the hypothesized contour.

ここで、２個の隠蔽する対象物を含むことが分かっている画像に配置された長さＬの１本の固定測定ラインについて考える。１次元のエッジ検出器がこのラインに適用され、ある特徴が画像座標z=(z₁,...,z_n)で検出される。一部のy_iは目標対象物の境界に対応しているかもしれないが、その他は画像のクラッターによって生じたものである。 Now consider a single fixed measurement line of length L placed on an image known to contain two occluding objects. A one-dimensional edge detector is applied along this line, and features are detected at image coordinates z = (z_1, ..., z_n). Some of the z_i may correspond to the boundary of the target object, while the others are caused by image clutter.

異なる測定ラインには、異なる個数のこのような特徴点が存在するので、異なる輪郭仮説に対する観測尤度は同等ではないであろう。そのため、本発明の一実施例で使用される観測尤度は、{z₁,...,z_n}の位置の同時確率（非特許文献４３）とは異なる。{z_i} i=1,...,nのうちの高々１個が目標輪郭点x_iによって生成され、その位置だけを考慮する必要があり、他の特徴点はポアソン過程によりモデル化されたクラッターと関連しているので、他の特徴点については、実際の位置ではなく、その個数だけを考慮すればよい。したがって、尤度を計算する観測は、（１）１個のエッジ点の位置（未知である）と、（２）検出された特徴点の個数と、により構成される。 Because different measurement lines contain different numbers of such feature points, the observation likelihoods of different contour hypotheses would not be comparable. For this reason, the observation likelihood used in one embodiment of the present invention differs from the joint probability of the positions {z_1, ..., z_n} (Non-Patent Document 43). At most one of the {z_i}, i = 1, ..., n, is generated by the target contour point x_i, and only its position needs to be considered; the other feature points are associated with clutter modeled by a Poisson process, so only their number, not their actual positions, needs to be considered. The observation used to compute the likelihood therefore consists of (1) the position of one edge point (which is unknown) and (2) the number of detected feature points.

検出された特徴点が測定ライン方向のクラッター分布と関連付けられ、ポアソン過程β（m：λ）に従う場合を考える。ここで、mは特徴の個数である。この仮説は従来技術でも使用されている（非特許文献２）。m個の特徴がクラッターと関連付けられている場合、 Consider a case where the detected feature points are associated with a clutter distribution in the direction of the measurement line and follow a Poisson process β (m: λ). Here, m is the number of features. This hypothesis is also used in the prior art (Non-Patent Document 2). If m features are associated with a clutter,

であり、m(L)は測定ラインに沿った特徴点の個数を表し、Cはクラッターを表す。

Holds, where m(L) denotes the number of feature points along the measurement line and C denotes clutter.
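The Poisson clutter model can be illustrated with the standard Poisson probability mass function. The sketch below assumes that λ is a clutter density per unit length, so that the expected number of clutter features on a line of length L is λL; this interpretation and the function name are assumptions, since the corresponding equation is not reproduced in this text.

```python
import math


def clutter_count_prob(m, lam, length):
    """Probability of observing m clutter features on a line of the given length.

    Assumes a Poisson process with density lam per unit length,
    i.e. an expected count of lam * length.
    """
    mean = lam * length
    return math.exp(-mean) * mean ** m / math.factorial(m)
```

Only the number of clutter features enters this term, which is why the positions of features not attributed to the target can be ignored.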

さらに、目標輪郭点x_iと関連した特徴点は、 Furthermore, the feature points associated with the target contour point x _i are

によるウィンドウW内のガウス分布

Gaussian distribution in window W

によって生成される。ここで、m(W)はウィンドウ内の特徴点の個数を表す。しかし、これらの特徴のうちのせいぜい１個しか目標と関連付けられない。その特徴を

Is generated. Here, m(W) denotes the number of feature points within the window. However, at most one of these features can be associated with the target. That feature is

によって表す。したがって、i番目の測定ラインに対して、観測量Z_iは、検出された特徴の個数m(L)と、ウィンドウW内の特徴z_i ^*の位置と、により構成される。図９は測定ライン上のエッジ点を表す図である。

Denoted in this way. Therefore, for the i-th measurement line, the observation Z_i consists of the number m(L) of detected features and the position of the feature z_i^* within the window W. FIG. 9 shows the edge points on a measurement line.

W内部のどの特徴を目標に関連付けるべきであるかを決定できないので、すべての可能性を統合する。その上、目標に関連付けられた特徴の検出ミスを許容することは追跡をより頑健にすることができるので、事象φ₀及びφ₁を Since it cannot be determined which feature within W should be associated with the target, all possibilities are integrated. Moreover, because allowing for a missed detection of the feature associated with the target makes tracking more robust, the events φ₀ and φ₁ are

として表す。φ₀を条件として、尤度は、

Defined in this way. Conditioned on φ₀, the likelihood is

として設定される。

Set as

同様に、φ₁を条件とする尤度は、 Similarly, the likelihood with φ ₁ as the condition is

である。

It is.

したがって、 Therefore,

として表される。

Represented as:

測定ラインは独立であると仮定しているので、目標観測モデルは、 Since the measurement line is assumed to be independent, the target observation model is

である。尚、オクルージョンに関係した対象物の同時観測尤度を計算するため式（１７）を使用するとき、固有のサポートマスク内の特徴だけが尤度に影響を与えることに注意すべきである。

Note that when equation (17) is used to compute the joint observation likelihood of objects involved in occlusion, only the features inside each object's own support mask affect the likelihood.

輪郭に基づく尤度の順応性に関しては、形状空間は既にアフィン変換によって生じた変化をモデル化しているので、変形が形状空間を獲得するため使用された学習集合に含まれているならば、形状空間は変形に対して多少頑健である。 For contour-like likelihood adaptability, the shape space already models the changes caused by the affine transformation, so if the deformation is included in the learning set used to acquire the shape space, the shape The space is somewhat robust against deformation.

［４．２］アピアランス観測モデル [4.2] Appearance observation model
近年、大域的なカラー基準モデルを使用する追跡装置は、頑健であり、計算コストが妥当であるため多目的に使えることが分かってきた（非特許文献３、８、２２及び３８を参照。）。これらの追跡装置は、特に、注目対象物が様々な種類の対象物であり、さらに、シーケンスの間に空間構造が姿勢変化や部分的オクルージョンなどによって急激に変化するタスクを追跡するために非常に有用である。 In recent years, trackers that use a global color reference model have proved to be versatile because they are robust and have a reasonable computational cost (see Non-Patent Documents 3, 8, 22 and 38). They are particularly useful for tracking tasks in which the objects of interest can be of many kinds and their spatial structure changes drastically over the sequence because of pose changes, partial occlusions, and the like.

アピアランスに基づく追跡用モデルの基本的な考え方は、現在フレームから、領域、即ち、カラー内容が基準カラーモデルと最もよく一致する固定形状可変サイズのウィンドウを探索することである。多種多様なパラメータ及び非パラメトリック確率論的技術が一様な着色領域のカラー分布をモデル化するため使用されている。その中で、カーネル密度推定技術、ある種の特定の非パラメトリック技術がコンピュータビジョン分野で近年注目されている。その理由は、この技術は、特定の分布を前提とするのではなく、推定が十分なサンプルをもつ任意の密度形状に収束し、特に、領域のカラー分布をパターンやカラーの混合によってモデル化するために適しているからである。したがって、この技術は、追跡の際の標準的な対象物表現方法になり始めている（非特許文献３、８、２２及び３８）。本発明の一実施例では、従来の対照表現方法（非特許文献８）を採用し、ビューの変化に適応するように僅かな変更を加える。 The basic idea of an appearance-based tracking model is to search the current frame for the region, that is, a fixed-shape, variable-size window, whose color content best matches a reference color model. A wide variety of parametric and non-parametric probabilistic techniques have been used to model the color distribution of homogeneously colored regions. Among them, kernel density estimation, a particular non-parametric technique, has recently attracted attention in computer vision. The reason is that it does not assume a specific distribution; the estimate converges to a density of any shape given enough samples, which makes it particularly suitable for modeling the color distribution of regions with patterns or mixtures of colors. It is therefore becoming a standard object representation for tracking (Non-Patent Documents 3, 8, 22, and 38). One embodiment of the present invention adopts a conventional object representation (Non-Patent Document 8) with slight modifications to adapt to view changes.

領域Rは目標の画像投影として定義され、カラーモデルは、シェーディング効果から色情報を分離するため、色相−彩度−明度（HSV）色空間でヒストグラム化技術を用いて獲得される。画像位置p及び形状Qによってパラメータ化された幾何学的表現Rの内部の画素を収集した後、Rのカラー確率分布関数は、基準モデルになるように選択され、位置yで定められ、確率分布関数p(y)によって特徴付けられる予測領域と比較される。両方の確率密度関数pdfはデータから推定される。目標モデルのためのカラー確率密度関数と、領域候補は、次式（１８）及び（１９）のように予測される。 Region R is defined as the target image projection, and the color model is acquired using a histogramming technique in the Hue-Saturation-Brightness (HSV) color space to separate the color information from the shading effect. After collecting the pixels inside the geometric representation R parameterized by the image position p and shape Q, the color probability distribution function of R is selected to be the reference model, defined by the position y, and the probability distribution Compared with the prediction region characterized by the function p (y). Both probability density functions pdf are estimated from the data. The color probability density function for the target model and the region candidate are predicted as in the following equations (18) and (19).

［４．２．１］目標モデル [4.2.1] Target model
目標モデルとして定義された領域内での正規化画素位置を

The normalized pixel positions in the region defined as the target model are

で表す。領域の中心は０である。凸状の単調減少カーネルプロファイルk(x)（勾配に基づく探索アルゴリズムのためのプロファイル）を備えた等方性カーネルは、画素が中心から離れるのに応じて小さい重みを割り当てる。このような重みを使用すると、密度推定の頑健性（ロバスト性）が増大する。なぜならば、周辺画素の方が信頼性が低く、屡々、オクルージョン又は背景からの妨害による影響を受けるからである。

Denoted in this way. The center of the region is at 0. An isotropic kernel with a convex, monotonically decreasing kernel profile k(x) (a profile suited to gradient-based search algorithms) assigns smaller weights to pixels farther from the center. Using such weights increases the robustness of the density estimate, because peripheral pixels are the least reliable and are often affected by occlusion or background interference.

関数 function

は、場所ｘ_i ^*における画素に、量子化特徴空間内の対応したビンのインデックスb(x_i ^*)を関連付ける。目標モデル内の特徴u=1,...,mの確率は、

Associates with the pixel at location x_i^* the index b(x_i^*) of its bin in the quantized feature space. The probability of the feature u = 1, ..., m in the target model is

として計算される。

Is calculated as

正規化係数Cは、制約条件 Normalization factor C is a constraint

を課すことによって導かれる。ここから、

Guided by imposing. from here,

が得られる。

Is obtained.
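The kernel-weighted histogram of the target model can be sketched as follows. This is an illustration under stated assumptions: the Epanechnikov profile (with its constant factor dropped) is chosen here as one example of a convex, monotonically decreasing profile, and the function names and argument layout are hypothetical. The normalization constant C is computed so that the bin probabilities sum to one.

```python
def epanechnikov_profile(r2):
    """Kernel profile k(x) = 1 - x on [0, 1] and 0 elsewhere (assumed choice)."""
    return 1.0 - r2 if r2 <= 1.0 else 0.0


def target_model(norm_positions, bin_indices, num_bins):
    """Compute the kernel-weighted feature histogram of the target region.

    norm_positions: normalized pixel positions (x, y), region center at (0, 0)
    bin_indices:    bin index b(x_i) of each pixel in the quantized feature space
    """
    weights = [epanechnikov_profile(x * x + y * y) for x, y in norm_positions]
    total = sum(weights)  # equals 1 / C, so that the histogram sums to one
    if total <= 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    q = [0.0] * num_bins
    for w, b in zip(weights, bin_indices):
        q[b] += w / total
    return q
```

Pixels near the region boundary receive weights close to zero, so a partially occluded border has little influence on the histogram.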

［４．２．２］目標候補 [4.2.2] Target candidate
現在フレーム内でyに中心が置かれた目標候補の正規化画素位置を Let the normalized pixel positions of the target candidate centered at y in the current frame be

とする。帯域幅hだけが異なる同じカーネル関数を使用する。目標候補における特徴u=1...mの確率は、

Denoted in this way. The same kernel profile is used, differing only in the bandwidth h. The probability of the feature u = 1, ..., m in the target candidate is

によって与えられる。

Given by.

C_hがyに依存しない点に注意すべきである。画素位置x_iは規則的な格子に編成され、yは格子グリッドの一つである。したがって、C_hは、所定のカーネルとhの種々の値に対して予め計算することができる。帯域幅hは、目標候補のスケール、即ち、局在化過程で考慮される画素数を定義する。 Note that C_h does not depend on y. The pixel positions x_i are organized on a regular lattice, and y is one of the lattice nodes. C_h can therefore be precomputed for a given kernel and for different values of h. The bandwidth h defines the scale of the target candidate, that is, the number of pixels considered in the localization process.

［４．２．３］観測尤度関数 [4.2.3] Observation likelihood function
類似度関数は、目標モデルと候補の間の距離を定義する。様々な目標間での比較が行えるようにするため、距離はメトリック構造を備えるべきである。Battachary係数に基づくメトリックは次式（２４）で定義される。 The similarity function defines a distance between the target model and a candidate. To allow comparisons among different targets, this distance should have a metric structure. The metric based on the Bhattacharyya coefficient is defined by the following equation (24).

類似度関数は次式（２５）で定義される。

The similarity function is defined by the following equation (25).

KL情報量(Kullback-Leibler divergence)とは対照的に、この距離は適切な距離であり、[0,1]の範囲に限定され、空ビンは重要ではない。従来の観測尤度（非特許文献２２）と同様に、本発明の一実施例における観測尤度は次式（２８）のように表される。

In contrast to the Kullback-Leibler (KL) divergence, this distance is a proper metric, is bounded within [0, 1], and is not affected by empty bins. As with the conventional observation likelihood (Non-Patent Document 22), the observation likelihood in one embodiment of the present invention is expressed by the following equation (28).
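The Bhattacharyya-based distance and a likelihood derived from it can be sketched as follows. The equations themselves are not reproduced in this text, so the forms used here are assumptions: the coefficient ρ = Σ_u sqrt(p_u q_u), the distance d = sqrt(1 − ρ), and a likelihood proportional to exp(−λ d²), where the scale `lam` is a hypothetical tuning parameter.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sum over bins of sqrt(p_u * q_u) for two normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q):
    """Distance d = sqrt(1 - rho), bounded within [0, 1]."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))  # clamp against rounding error


def color_likelihood(p, q, lam=20.0):
    """Unnormalized likelihood exp(-lam * d^2); lam is an assumed parameter."""
    d = bhattacharyya_distance(p, q)
    return math.exp(-lam * d * d)
```

Identical histograms give a distance of 0 and histograms with disjoint support give a distance of 1; empty bins simply contribute zero to the sum.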

［４．３］適応的観測モデル [4.3] Adaptive observation model
アピアランスモデルを一例として考えると、上述のアピアランスモデルは、多数のアプリケーションを実現しているが、アピアランスモデルは、平行移動、回転、シヤリング及びワープのような入力画像の空間的変形又はトポロジカル変形に対し非常に敏感である。サブ空間に基づく技術が、ビューや大きいアピアランス変化に対して頑健なアピアランスに基づく表現を学習するため利用されている（非特許文献３９及び４０）。これらの表現は目標検出及び認識のために適しているが、追跡タスクの場合にサブ空間の次数は高くなる。

Taking the appearance model as an example: although the appearance model described above has been used in many applications, it is very sensitive to spatial or topological deformations of the input image such as translation, rotation, shearing, and warping. Subspace-based techniques have been used to learn appearance-based representations that are robust to changes of view and to large appearance changes (Non-Patent Documents 39 and 40). These representations are suitable for target detection and recognition, but for tracking tasks the dimension of the subspace becomes high.

コンピュータビジョン用のアピアランスに基づくモデルを構築するための別の方法は、グラフィカルモデルとそれらのダイナミックな変形方式であり、普及し始めている（非特許文献２６、４１及び４２）。離散的変換変数を使用して、これらのモデルに対する入力画像における地理的変換を考慮し、この変数を生成的グラフィカルモデル内で潜在的変数として構築し、次に、変換の集合についての例外を計算するためＥＭアルゴリズムを使用し、得られたモデルが、入力における変換に対して不変性のある形でクラスタリングと、次数削減と、時系列解析と、を実行することが提案されている（非特許文献４２）。生の学習データの選択された代表値として標本を獲得し、対象物コンフィギュレーションの確率混合分布を表現するためこの標本を使用し、生成的グラフィカルモデル内でのトラッキングを解決するため逐次モンテカルロ法を使用することも提案されている（非特許文献４１）。このアイデアをアピアランスに基づく追跡に利用することも提案されている（非特許文献２６）。もちろん、このアイデアは輪郭に基づく観測モデルにも拡張することができる。 Another method for building appearance-based models for computer vision is graphical models and their dynamic variants, which are beginning to become popular (Non-Patent Documents 26, 41, and 42). Use discrete transformation variables to account for geographic transformations in the input image for these models, build this variable as a latent variable in the generative graphical model, and then compute exceptions for the set of transformations In order to achieve this, it has been proposed that the obtained model perform clustering, order reduction, and time series analysis in a form that is invariant to the transformation at the input (non-patent) Reference 42). Take a sample as a selected representative value of the raw training data, use this sample to represent the probability mixture distribution of the object configuration, and use a sequential Monte Carlo method to solve tracking in the generative graphical model It has also been proposed to use (Non-Patent Document 41). It has also been proposed to use this idea for tracking based on appearance (Non-patent Document 26). Of course, this idea can be extended to observation models based on contours.

本発明の一実施例では、観測モデルを適合的にするため、変換された隠れマルコフモデル(THMM)を導入する。変換t（既知変換の集合）を、画素強度のベクトルに作用するスパース変換生成行列G_tを用いて表現する。例えば、画像の整数画素の平行移動は順列行列によって表現することができる。他のタイプの変換行列は、順列行列によって正確に表現できない場合もあるが、殆どの有用なタイプの変換はスパース変換行列によって表現できる。例えば、回転及びブラーは、１行あたりに小数の非零要素を有する行列によって表現できる。 In one embodiment of the present invention, a transformed hidden Markov model (THMM) is introduced to make the observation model adaptive. A transformation t (a set of known transformations) is expressed using a sparse transformation generation matrix G _t that operates on a pixel intensity vector. For example, the translation of integer pixels in the image can be represented by a permutation matrix. Although other types of transformation matrices may not be accurately represented by permutation matrices, most useful types of transformations can be represented by sparse transformation matrices. For example, rotation and blur can be represented by a matrix having a small number of non-zero elements per row.
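An integer-pixel translation expressed as a permutation can be sketched as follows. This is an illustration under stated assumptions: the image is stored as a flat list in row-major order, and the shift wraps around at the borders so that the mapping is an exact permutation; the function names are hypothetical. Storing only the source index of each output pixel is equivalent to a transformation matrix with a single non-zero entry per row.

```python
def translation_permutation(width, height, dx, dy):
    """Return perm such that out[k] = pixels[perm[k]] shifts the image by (dx, dy).

    The shift is cyclic (assumed), which makes the mapping a true permutation.
    """
    perm = []
    for y in range(height):
        for x in range(width):
            src_x = (x - dx) % width
            src_y = (y - dy) % height
            perm.append(src_y * width + src_x)
    return perm


def apply_permutation(perm, pixels):
    """Apply the sparse transformation to a flattened pixel-intensity vector."""
    return [pixels[k] for k in perm]
```

For a 3 x 1 image with intensities 10, 20, 30, a shift of one pixel to the right moves each value one position along the row, with the last value wrapping to the front.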

観測画像zは、無変換潜在的画像xと、変換インデックスt∈{1,...,L}に次式（２７）のようにリンクされる。 The observed image z is linked to the non-transformed latent image x and the transformation index tε {1,..., L} as in the following equation (27).

ここで、

here,

は、G_ixに中心があり、分散行列がΨであり、画素ノイズ分散の直交行列であるガウス分布である。変換の確率は潜在的画像に依存するので、潜在的画像xと、変換インデックスTと、観測画像zとに関する同時分布は、

Is a Gaussian distribution centered at G_i x whose covariance matrix Ψ is a diagonal matrix of pixel noise variances. Since the probability of the transformation depends on the latent image, the joint distribution over the latent image x, the transformation index T, and the observed image z is

である。図１０は対応したグラフィカルモデルの説明図である。同図には、観測画像zをモデル化するために離散的変換変数tを潜在的画像xに対する密度モデルp(x)に加える様子が示されている。

It is. FIG. 10 is an explanatory diagram of a corresponding graphical model. The figure shows how the discrete transformation variable t is added to the density model p (x) for the latent image x in order to model the observed image z.

ビデオシーケンスを解析する場合、対象物のアピアランスを決定する隠れ変数と、対象物の空間変換を表現する隠れ変数の両方に、時間的コヒーレンスを使用することは優れた考え方である。ＴＨＭＭは、時間的コヒーレンスをクラス変換及び空間変換に効率的に取り入れる。空間変換が非常にコヒーレントである場合、動的モデルにおける推論及び学習に要する時間は、同じ画像の集合に適用された等価的な静的モデルの場合と同じであることが分かる。図１１は、ビデオ処理のために動的ベイジアンネットワークの一実施例の説明図である。本例では、時点tにおけるクラスは、時点t-1におけるクラスに依存するが、時点t-1での変換には依存しない。 When analyzing video sequences, it is a good idea to use temporal coherence for both the hidden variables that determine the appearance of the object and the hidden variables that represent the spatial transformation of the object. THMM efficiently incorporates temporal coherence into class and spatial transformations. If the spatial transformation is very coherent, it can be seen that the time required for inference and learning in the dynamic model is the same as in the equivalent static model applied to the same set of images. FIG. 11 is an illustration of an embodiment of a dynamic Bayesian network for video processing. In this example, the class at time t depends on the class at time t-1, but does not depend on the conversion at time t-1.

ＴＨＭＭに組み込むことができる独立性には数種類のタイプがある。殆どのケースでは、動き（変換遷移）は画像クラスに依存する場合も、依存しない場合もあるが、クラス遷移は現在の変換インデックスから独立していることを仮定するのが適当である。 There are several types of independence that can be incorporated into a THMM. In most cases, the motion (transformation transition) may or may not depend on the image class, but it is appropriate to assume that the class transition is independent of the current transform index.

ここで、s_t=(c_t,t_t)は合成状態であり、c_tは現在クラスであり、t_tは現在変換である。本発明の一実施例では、現在変換は現在クラスに依存しないという仮定をする。即ち、

Here, s_t = (c_t, t_t) is the composite state, c_t is the current class, and t_t is the current transformation. In one embodiment of the present invention, the current transformation is assumed to be independent of the current class. That is,

である。

holds.
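The equation images for this passage do not survive in the text. As a hedged sketch only, one factorization consistent with the independence assumptions stated above (the class transition does not depend on the transformation, and in the embodiment the transformation transition does not depend on the class) could be written as follows. The exact form and numbering used in the original publication are not confirmed here.

```latex
\begin{align*}
% General case: the motion may depend on the current class (assumed form)
p(s_t \mid s_{t-1}) &= p(c_t \mid c_{t-1})\, p(t_t \mid t_{t-1}, c_t) \\
% Embodiment: the transformation is assumed independent of the class
p(s_t \mid s_{t-1}) &= p(c_t \mid c_{t-1})\, p(t_t \mid t_{t-1})
\end{align*}
```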

変換に時間的コヒーレンスをモデル化するため、相対運動のマップm(t_t,t_t-1)を使用して To model temporal coherence in the transformations, the relative-motion map m(t_t, t_{t-1}) is used to specify

が指定される。例えば、画像平行移動の場合、このマッピングは、単純に、二つの大域的平行移動の間の相対シフトに対応する。全部でM個の垂直シフトとM個の平行シフトが考えられる場合、変換の総数はL=M²個であり、変換は、t=iM+jとなるようにソーティングされる。ここで、i及びjは、それぞれ、適切な垂直シフト及びｂ水平シフトのインデックスを表す。

For example, in the case of image translation, this mapping simply corresponds to the relative shift between two global translations. If M vertical shifts and M horizontal shifts are considered in total, there are L = M² transformations, sorted so that t = iM + j, where i and j are the indices of the corresponding vertical shift and horizontal shift, respectively.

次に、相対運動の大きさだけに関心がある場合には、マッピングを Next, if only the magnitude of the relative motion is of interest, the mapping is defined as

として定義し、運動の方向も重要である場合には、マッピングをベクトル

and, if the direction of motion also matters, the mapping is defined as the vector

として定義する。同様に、他のタイプの変換の距離測定量も定義することができる。

Distance measures for other types of transformations can be defined similarly.
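The index convention t = iM + j and the two kinds of relative-motion map can be illustrated with a minimal sketch. The function names are hypothetical, and the Euclidean norm used for the magnitude is an assumption: the text states only that the map may be a magnitude or a vector, and the defining equations are not reproduced.

```python
import math


def index_to_shift(t: int, M: int) -> tuple[int, int]:
    """Split a transformation index t = i*M + j into (vertical i, horizontal j)."""
    if not 0 <= t < M * M:
        raise ValueError("t must lie in [0, M*M)")
    return divmod(t, M)


def relative_motion_vector(t_now: int, t_prev: int, M: int) -> tuple[int, int]:
    """Vector-valued map: signed difference of the vertical and horizontal shifts."""
    i1, j1 = index_to_shift(t_now, M)
    i0, j0 = index_to_shift(t_prev, M)
    return (i1 - i0, j1 - j0)


def relative_motion_magnitude(t_now: int, t_prev: int, M: int) -> float:
    """Scalar map: length of the relative shift (Euclidean norm is an assumption)."""
    di, dj = relative_motion_vector(t_now, t_prev, M)
    return math.hypot(di, dj)
```

With M = 3, index 7 corresponds to the shift (2, 1) and index 3 to (1, 0), so the relative motion between them is the vector (1, 1).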

そこで、異なる画像クラスに対して異なる動き特性を許容するかどうかに依存して、 Then, depending on whether different motion characteristics are allowed for different image classes,

を仮定する。連続的なフレームの間に小さい動きを仮定し、m>閾値(threshold)に対してp(m)=0を設定することにより、パラメータの個数と推論過程の計算負荷を大幅に削減することができる。

is assumed. By assuming small motion between consecutive frames and setting p(m) = 0 for m > threshold, the number of parameters and the computational load of the inference process can be reduced substantially.
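The saving from setting p(m) = 0 above a threshold can be seen by listing which transformations remain reachable from a given previous transformation. This is a sketch under assumptions: the function name is hypothetical, and the Euclidean magnitude of the shift difference stands in for the map m, whose definition is not reproduced in the text.

```python
import math


def reachable_transforms(t_prev: int, M: int, threshold: float) -> list[int]:
    """Indices t whose relative motion from t_prev does not exceed the threshold.

    Transformations are indexed as t = i*M + j. All other transitions have
    probability zero and can be skipped during inference.
    """
    i0, j0 = divmod(t_prev, M)
    reachable = []
    for t in range(M * M):
        i, j = divmod(t, M)
        if math.hypot(i - i0, j - j0) <= threshold:
            reachable.append(t)
    return reachable
```

For M = 5 and a threshold of 1, the central transformation (index 12) keeps 5 of the 25 possible successors, so only a fraction of the transition table has to be stored and evaluated.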

ＴＨＭＭのパラメータは、c=1,...Cに対するu_cと、C個のクラスに対する平均画像と、各クラスの異なる画素に対する不確実性のレベルを定義するφ_cと、検出器ノイズを記述する対角共分散行列ψ_cと、s=(c,t)の場合の異なる状態の事前確率π_sと、上述のように分解できる最終的な遷移確率a_s,s'=p(s_t=s'｜s_t-1=s)と、である。モデルの隠れ変数は、状態s_tと潜在的画像x_tである。 The parameters of the THMM are u_c for c = 1, ..., C, the mean images of the C classes; φ_c, which defines the level of uncertainty for the different pixels in each class; the diagonal covariance matrix ψ_c, which describes the detector noise; the prior probabilities π_s of the different states s = (c, t); and finally the transition probabilities a_{s,s'} = p(s_t = s' | s_{t-1} = s), which can be factorized as described above. The hidden variables of the model are the state s_t and the latent image x_t.

生成的モデルと、過去の状態が与えられると、クラスタインデックスc_tと変換インデックスt_tは、 Given the generative model and the past state, the cluster index c_t and the transformation index t_t are drawn at random from

からランダムに導かれる。次に、潜在的画像が

Next, the latent image is obtained from

から得られ、最終フレームは、

and the final frame is obtained from

から得られる。この処理はシーケンスの最後まで繰り返される。所与のシーケンスに対して最良のＴＨＭＭパラメータが得られた場合、動的クラス分け、方向検出、動作認識のようないくつかの興味深いコンピュータビジョンタスクが、ＴＨＭＭ内の推論によって実行される。

This process is repeated until the end of the sequence. Once the best THMM parameters have been obtained for a given sequence, several interesting computer vision tasks, such as dynamic classification, direction detection, and motion recognition, can be performed by inference within the THMM.
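The generative process described above (draw the class and transformation indices, then the latent image, then the observed frame) can be sketched as follows. Everything specific here is an assumption made for illustration: images are one-dimensional lists, transformations are cyclic shifts, the latent image and the detector noise are independent Gaussians per pixel, and the chain starts from class 0 and shift 0 rather than from a prior π_s. The distributions in the original equations are not reproduced in the text.

```python
import random


def sample_index(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p
        if r < acc:
            return k
    return len(probs) - 1


def generate_sequence(T, class_trans, shift_trans, means, phi, psi, seed=0):
    """Sample T frames; returns a list of (class, shift, observed image) tuples.

    class_trans[c] and shift_trans[t] are transition rows, means[c] is the mean
    image of class c, phi[c] its per-pixel variances, psi the noise variances.
    """
    rng = random.Random(seed)
    c, t = 0, 0  # assumed initial state
    frames = []
    for _ in range(T):
        c = sample_index(class_trans[c], rng)  # class transition
        t = sample_index(shift_trans[t], rng)  # transformation transition
        # Latent image: class mean plus per-pixel Gaussian variation.
        x = [rng.gauss(mu, phi[c][k] ** 0.5) for k, mu in enumerate(means[c])]
        n = len(x)
        shifted = [x[(k - t) % n] for k in range(n)]  # cyclic shift by t pixels
        # Observed frame: transformed latent image plus detector noise.
        z = [v + rng.gauss(0.0, psi[k] ** 0.5) for k, v in enumerate(shifted)]
        frames.append((c, t, z))
    return frames
```

Setting all variances to zero and using deterministic transition rows makes the output fully predictable, which is a convenient way to check the shift convention.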

基礎になる画像空間の変換集合{t}及び正規化画像のクラスタ{c}（又はクラス）への分割は、様々な形式で実現される。例えば、変換集合{t}は、幾何学的歪みをモデル化する形状空間でもよく、クラスタ{c}はテクスチャーの空間でもよい（非特許文献４４）。或いは、{t}は平面類似度変換の空間であり、{c}に歪みと、テクスチャー／シェーディングの分布を吸収させてもよい。{t}及び{c}の解析的な形式を得るため、ガウス混合、因子分析（確率的PCA）、及び、隠れマルコフモデルが屡々使用される。この考え方は、標本の集合から学習させることができるメトリックモデルを用いることにより更に拡張されている（非特許文献４２）。より詳しく説明すると、本発明の一実施例では、{t}と{c}の両方は既に解析的形式で分かっていることを前提とする。実際上、輪郭観測モデルの場合、{t}及び{c}は、既にＢ−スプライン曲線形状空間内に暗黙的に表現されている。しかし、{t}及び{c}が採用された場合、形状空間の次数は低下するであろう。 The underlying image space can be partitioned into a transformation set {t} and clusters {c} (or classes) of normalized images in various ways. For example, the transformation set {t} may be a shape space that models geometric distortion, and the clusters {c} may be a space of textures (Non-Patent Document 44). Alternatively, {t} may be the space of planar similarity transformations, with {c} absorbing the distortion and the texture/shading distribution. Gaussian mixtures, factor analysis (probabilistic PCA), and hidden Markov models are often used to obtain analytic forms of {t} and {c}. This idea has been extended further by using a metric model that can be learned from a set of samples (Non-Patent Document 42). More specifically, one embodiment of the present invention assumes that both {t} and {c} are already known in analytic form. In practice, for the contour observation model, {t} and {c} are already represented implicitly in the B-spline curve shape space. However, when {t} and {c} are adopted, the dimensionality of the shape space is reduced.

当然、図１０に示された変換による影響を受けないモデルと、図６に示されたオクルージョンモデルを組み合わせることにより、新しい動的ベイジアンネットワークが得られる。図１２は、新しい動的ベイジアンネットワークの説明図である。隠れ変数ｔ_t ^mはパラメータc_t ^mによってインデックスが付けられた幾何学的変換であり、ここで、m∈[1,M]である。隠れ変数O_tは異なる対象物の間のオクルージョン関係を制御する。 Naturally, a new dynamic Bayesian network is obtained by combining the transformation-invariant model shown in FIG. 10 with the occlusion model shown in FIG. 6. FIG. 12 is an explanatory diagram of the new dynamic Bayesian network. The hidden variable t_t^m is a geometric transformation indexed by the parameter c_t^m, where m ∈ [1, M]. The hidden variable O_t controls the occlusion relationships between different objects.

図６に示された動的ベイジアンネットワークと比較すると、図１２の例では、三つ以上の隠れマルコフ処理が各対象物状態に付加されている。ここで、M*3+1個の隠れ処理 Compared with the dynamic Bayesian network shown in FIG. 6, the example of FIG. 12 adds three or more hidden Markov processes to each object state. Here, the M*3+1 hidden processes

を、同図に矢印で示されるようなすべての条件付き確率に基づいて、観測データZ_tから推論する。より明瞭に説明するため、s_t=(c_t、t_t)を使用すると、以下の導出において式（３０）をそのまま使用し、図１２に示されたモデルよりも簡略化されたモデルを得ることができる。図１３は、図１２のグラフィカルモデルの簡略化されたモデルの説明図である。

are inferred from the observation data Z_t on the basis of all the conditional probabilities indicated by arrows in the figure. For clarity, writing s_t = (c_t, t_t) allows equation (30) to be used unchanged in the following derivation and yields a model simpler than the one shown in FIG. 12. FIG. 13 is an explanatory diagram of this simplified version of the graphical model of FIG. 12.

同時尤度 Joint likelihood

に基づいて、

On this basis,

が得られる。

is obtained.

モデルを特徴付けるためには、各対象物のダイナミクス To characterize the model, the dynamics of each object

と、遷移モデル

the transition model

と、

and

と、観測尤度

and the observation likelihood

と、をモデル化することが必要である。

must all be modeled.

ここまでの説明では、対象物の個数が既知であり、追跡中にその個数が一定に保たれる場合を仮定していた。しかし、実際の対象物追跡アプリケーションでは、対象物がシーンに出入りするので、対象物の個数が絶えず変化することがある。予測モデル In the description so far, it has been assumed that the number of objects is known and the number is kept constant during tracking. However, in an actual object tracking application, the number of objects may constantly change because the objects enter and exit the scene. Prediction model

は、各対象物が各時間ステップに確率λ_rでシーンに留まること、更に、新しい対象物が各時間ステップにシーンへ入ってくる確率はλ_iであることを表している。

indicates that each object remains in the scene at each time step with probability λ_r, and that a new object enters the scene at each time step with probability λ_i.
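The survival and arrival probabilities can be turned into a prediction step for the set of tracked object identifiers. The sketch below is illustrative only: the function name and the identifier bookkeeping are hypothetical, at most one new object is allowed per time step, and the optional cap on the number of objects is an added assumption.

```python
import random


def predict_object_set(object_ids, next_id, lam_r, lam_i, max_objects, rng):
    """One prediction step for the set of object identifiers in a particle.

    Each existing object stays with probability lam_r. A new object enters
    with probability lam_i, provided the previous count is below max_objects.
    Returns the new identifier list and the next unused identifier.
    """
    survivors = [obj for obj in object_ids if rng.random() < lam_r]
    if len(object_ids) < max_objects and rng.random() < lam_i:
        survivors.append(next_id)
        next_id += 1
    return survivors, next_id
```

With lam_r = 1 and lam_i = 0 the object set is unchanged, which corresponds to the fixed-count setting assumed earlier in the description.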

［５］実施例の詳細な説明 [5] Detailed Description of the Embodiments
以下では、二つの主要なステップ：予測ステップと重み付けステップの実施例を詳細に説明する。複数対象物用の完全なパーティクルフィルタは、図３及び４に示した生成的パーティクルフィルタに類似しているように思われるが、実際には相違点がある。これらの相違点について説明する。 In the following, embodiments of the two main steps, the prediction step and the weighting step, are described in detail. The complete particle filter for multiple objects appears similar to the generative particle filter shown in FIGS. 3 and 4, but in practice there are differences. These differences are described below.

［５．１］予測 [5.1] Prediction
本発明の一実施例では、 In one embodiment of the present invention,

の平行移動的ダイナミクスを表すためにダンピングされた定数速度とガウスノイズとを利用し、形状係数（Ｂ−スプライン曲線）はARP(1)（１次の自己回帰過程）に従う。即ち、

a damped constant velocity with Gaussian noise is used to represent its translational dynamics, and the shape coefficients (B-spline curve) follow an ARP(1), that is, a first-order autoregressive process. That is,

である。ここで、

holds. Here,

は、標準的なガウス分布ランダム変数のベクトルである。

is a vector of standard Gaussian random variables.

は、Yule-Walker法（非特許文献４５）を使用して学習することができる。η_v及びζは、実験中に調整可能な固定パラメータであり、

can be learned using the Yule-Walker method (Non-Patent Document 45). η_v and ζ are fixed parameters that can be tuned during experiments, and

は、対象物の水平速度及び垂直速度を定める速度ベクトルである。

is a velocity vector that defines the horizontal and vertical velocities of the object.
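The dynamics equation itself is not reproduced in the text, so the following is only a sketch of what a damped constant-velocity model with Gaussian noise and a first-order autoregressive shape process might look like. The roles given to η_v (velocity damping factor) and ζ (velocity noise scale) are assumptions, as are the scalar AR coefficients A and B and the function name.

```python
import random


def predict_state(pos, vel, shape, shape_mean, eta_v, zeta, A, B, rng):
    """One prediction step for a single object (illustrative form only).

    Velocity is damped by eta_v and perturbed by Gaussian noise scaled by zeta.
    Position advances by the new velocity. Each shape coefficient follows an
    AR(1) process around its mean with coefficient A and noise scale B.
    """
    new_vel = [eta_v * v + zeta * rng.gauss(0.0, 1.0) for v in vel]
    new_pos = [p + v for p, v in zip(pos, new_vel)]
    new_shape = [
        m + A * (s - m) + B * rng.gauss(0.0, 1.0)
        for s, m in zip(shape, shape_mean)
    ]
    return new_pos, new_vel, new_shape
```

In a particle filter this function would be applied to every object in every particle, with A and B estimated from training data (for example by the Yule-Walker method mentioned above) rather than fixed by hand.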

幾何学的変換に対し不変性のある観測モデルを導入しているので、sのダイナミクスも考慮しなければならない。もう一度式（３１）に戻り、相対運動が画像クラスcとは独立であることを仮定する。これにより、sのダイナミクスはcとvの二つのダイナミクスに分解できる。vのダイナミクスは次式（３６） Because an observation model that is invariant to geometric transformations has been introduced, the dynamics of s must also be considered. Returning to equation (31), the relative motion is assumed to be independent of the image class c. The dynamics of s can therefore be decomposed into the dynamics of c and the dynamics of v. The dynamics of v are given by the following equation (36):

に示される。


に対するマルコフ行列T_cは遷移をヒストグラム化することにより学習される。

The corresponding Markov matrix T_c is learned by histogramming the transitions.

本発明の一実施例では、対象物間のオクルージョンは、occl_list内の対象物のあらゆるペアの考えられるオクルージョンに分解されるので、簡略化された状況では、あらゆるオクルージョン遷移をモデル化するため固定マルコフ行列T_ijを使用する。もちろん、T_ijは遷移をヒストグラム化することによって学習することが可能である。 In one embodiment of the present invention, occlusion between objects is decomposed into the possible occlusions of every pair of objects in occl_list, so in a simplified setting a fixed Markov matrix T_ij is used to model every occlusion transition. Of course, T_ij can also be learned by histogramming the transitions.
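Learning a Markov matrix by histogramming transitions amounts to counting how often each state follows each other state in a labeled training sequence and normalizing every row. A minimal sketch follows; the function name and the optional pseudocount are illustrative additions, not part of the described method.

```python
def learn_markov_matrix(sequence, n_states, pseudocount=0.0):
    """Estimate a row-stochastic transition matrix from a state sequence.

    sequence is a list of integer state labels in [0, n_states). Rows with no
    observed transitions fall back to a uniform distribution.
    """
    counts = [[pseudocount] * n_states for _ in range(n_states)]
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / n_states] * n_states)
    return matrix
```

For the sequence 0, 0, 1, 0, 1, 1, state 0 is followed once by 0 and twice by 1, and state 1 is followed once by each state, giving the rows [1/3, 2/3] and [1/2, 1/2].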

［５．２］観測尤度による重み付け [5.2] Weighting by observation likelihood
観測尤度 The observation likelihood

は、新規技術に基づいて、即ち、予測輪郭と、検出された実際の輪郭との間の不一致、又は、予測画像アピアランスと実際の画像観測量との間の不一致に基づいてモデル化される。

is modeled on the basis of a new technique, namely the mismatch between the predicted contour and the detected actual contour, or the mismatch between the predicted image appearance and the actual image observation.

２値マスク After the binary mask

が得られた後、観測尤度が直ちに計算される。例えば、アピアランスモデルが適用されている場合、

is obtained, the observation likelihood is computed immediately. For example, when an appearance model is applied,

である内部画素だけが観測モデルに寄与する（非特許文献２６）。これは、輪郭に基づく観測モデルの場合と同じである。図１４は適応的再サンプリングを用いて個数が変化する対象物を追跡するパーティクルフィルタの説明図である。図１４には完全なアルゴリズムが示されている。

only the interior pixels that satisfy this condition contribute to the observation model (Non-Patent Document 26). The same holds for the contour-based observation model. FIG. 14 is an explanatory diagram of a particle filter that uses adaptive resampling to track a varying number of objects; it shows the complete algorithm.
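The role of the binary mask can be illustrated with a small appearance-likelihood sketch in which only pixels whose mask value is 1 contribute. The independent Gaussian pixel model with a single standard deviation sigma is an assumption made for illustration, and the function name is hypothetical; the likelihood used in the original is not reproduced in the text.

```python
import math


def masked_log_likelihood(observed, predicted, mask, sigma):
    """Log-likelihood of an observed image under a predicted appearance.

    observed, predicted and mask are equally sized 2-D lists. Pixels with
    mask value 0 (for example, occluded pixels) are ignored entirely.
    """
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for z_row, p_row, m_row in zip(observed, predicted, mask):
        for z, p, m in zip(z_row, p_row, m_row):
            if m == 1:
                total += -0.5 * ((z - p) / sigma) ** 2 - log_norm
    return total
```

Because masked-out pixels are skipped, a large mismatch in an occluded region does not penalize an otherwise correct hypothesis.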

初期化関数g(t)は、一意の識別子tを割り当て、対象物事前分布に従って位置、速度及び形状を生成する。位置は可視的シーンに対応した四角形から一意に得られ、形状は形状ＡＲＰの定常状態分布から得られる。例示であるため、パーティクル集合に存在し得る個別の対象物の総数をM_maxに制限し、本実施例では、M_t個の対象物を追跡する。｜M_t-1｜<M_maxのときに限り、初期サンプルが生成される。時点t=0での初期化は簡単であり、各パーティクルには等しい重みが割り当てられ、対象物の個数は０個であると仮定する。 The initialization function g(t) assigns a unique identifier t and generates a position, velocity, and shape according to the object prior distribution. The position is uniquely obtained from a rectangle corresponding to the visible scene, and the shape is obtained from the steady-state distribution of the shape ARP. For illustration, the total number of distinct objects that can exist in the particle set is limited to M_max, and in this embodiment M_t objects are tracked. An initial sample is generated only when |M_{t-1}| < M_max. Initialization at time t = 0 is simple: each particle is assigned an equal weight, and the number of objects is assumed to be zero.

図１５は不特定の数の対象物を追跡するパーティクルフィルタの処理のフローチャートである。 FIG. 15 is a flowchart of a particle filter process for tracking an unspecified number of objects.
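The adaptive resampling step of FIG. 14 is not spelled out in this text. A common criterion, used here purely as an assumption, is to resample only when the effective sample size of the normalized weights falls below a fraction of the particle count. The function names and the multinomial resampling scheme are illustrative choices.

```python
import random


def effective_sample_size(weights):
    """Effective sample size of a weight vector (weights need not be normalized)."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def adaptive_resample(particles, weights, threshold_ratio, rng):
    """Resample only when the weights have degenerated.

    Returns (particles, normalized weights). If the effective sample size is
    at least threshold_ratio * N, the particle set is returned unchanged.
    Otherwise N particles are drawn in proportion to their weights and the
    weights are reset to uniform.
    """
    n = len(particles)
    total = sum(weights)
    normalized = [w / total for w in weights]
    if effective_sample_size(normalized) >= threshold_ratio * n:
        return list(particles), normalized
    chosen = rng.choices(particles, weights=normalized, k=n)
    return chosen, [1.0 / n] * n
```

Skipping the resampling step while the weights are still well spread avoids discarding hypotheses unnecessarily, which matters when several occlusion hypotheses must be kept alive.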

［６］実験 [6] Experiments
本発明の方法を２種類のアプリケーションに適用した。実験１は多数の円形領域を追跡するアプリケーションであり、実験２は会議室内の人物の行動を追跡するアプリケーションである。実験１は、複数の同一対象物を追跡する能力を調べるために使用された。なぜならば、円形領域は同じアピアランスを有するからである。円形の直径は追跡中に変化し、輪郭に基づく観測モデルが使用された。実験２では、本発明の方法は、複数の人物の動きと形状を追跡するために使用された。本実験で示されたもう一つの興味深い性質は、本発明の方法が、追跡中に副生成物として、対象物の動きと姿勢のクラスを出力し得ることである。 The method of the present invention was applied to two applications. Experiment 1 tracks a number of circular regions, and Experiment 2 tracks the behavior of people in a conference room. Experiment 1 was used to examine the ability to track multiple identical objects, because the circular regions all have the same appearance. The diameters of the circles changed during tracking, and a contour-based observation model was used. In Experiment 2, the method was used to track the motion and shape of multiple people. Another interesting property shown in this experiment is that the method can output the motion and posture classes of the objects as a by-product of tracking.

［６．１］実験１：円形領域追跡 [6.1] Experiment 1: Circular region tracking
本実験では、対象物間のオクルージョンを考慮し、３通りの状況で結果を得た。図１６は、５８７フレーム長のテスト用ビデオシーケンスからの３フレームを表す図である。各フレームのサイズは３２０×２４０であり、３０フレーム／秒である。 In this experiment, occlusion between objects was taken into account and results were obtained in three situations. FIG. 16 shows three frames from a 587-frame test video sequence. Each frame is 320 × 240 in size, and the frame rate is 30 frames per second.

本実験１では、円形領域の外形、即ち、円が追跡されるので、各対象物の状態Xは円の３個のパラメータである。本実験１で使用した動的パラメータの数値は表１に示されている。 In Experiment 1, since the outer shape of the circular area, that is, the circle is tracked, the state X of each object is the three parameters of the circle. The numerical values of the dynamic parameters used in Experiment 1 are shown in Table 1.

第１の状況では、３個の円の中から指定された１個の円を追跡する。追跡の結果は図１７に示されている。追跡結果の円には番号１が付されている。図１７は、３個の円の中から指定された１個の円を追跡する処理の説明図であり、（ａ）から（ｃ）は、観測モデルにおいてオクルージョンと動きクラスを考慮しない場合の結果を表し、（ｄ）から（ｆ）はオクルージョンと動きクラスを考慮した観測モデルを用いた結果を表している。（ａ）から（ｃ）では、オクルージョンを考慮に入れていないので、追跡中の円が別の円に接近するフレーム１４１で追跡結果が離れ始め、最終的には、フレーム１４７において間違った円を追跡している。これに対して、適応的観測が適用された（ｄ）から（ｆ）では、追跡結果はすべてのシーケンスを通じて指定された円に合っている。

In the first situation, one designated circle out of three is tracked. The tracking results are shown in FIG. 17, where the tracked circle is labeled 1. FIG. 17 illustrates the process of tracking one designated circle out of three: (a) to (c) show the results when the observation model does not take occlusion and motion class into account, and (d) to (f) show the results with an observation model that does. In (a) to (c), because occlusion is not taken into account, the tracking result begins to drift at frame 141, where the tracked circle approaches another circle, and by frame 147 the wrong circle is being tracked. In contrast, in (d) to (f), where adaptive observation is applied, the tracking result stays on the designated circle throughout the sequence.

第２の状況では、一定数の複数対象物を追跡する。その追跡結果は図１８に番号１〜３が付された円として示されている。本実験例では、各円のダイナミクスを別々に取り扱うので、それぞれの動的モデルは異なる。（ａ）から（ｃ）では、円１は２状態動的モデルで記述され、円２は３状態動的モデルで記述され、円３は１状態動的モデルで記述されている。追跡装置は、フレーム５０以降では円３を見失っているが、ビデオシーケンスの全体を通じてオクルージョンと交差が発生しても円１と円２を追跡し続けた。円３を見失う問題は、（ｄ）から（ｆ）において、円３の動的モデルを１状態動的モデルで置き換えることによって解決された。この実験は、時間的コヒーレンスが追跡の際に、特に、多数の同一対象物を追跡する際に、重要な役割を果たしていることを示している。 In the second situation, a certain number of multiple objects are tracked. The tracking results are shown as circles numbered 1 to 3 in FIG. In this experimental example, the dynamics of each circle are handled separately, so that each dynamic model is different. In (a) to (c), circle 1 is described by a two-state dynamic model, circle 2 is described by a three-state dynamic model, and circle 3 is described by a one-state dynamic model. The tracker lost track of circle 3 after frame 50, but continued to track circle 1 and circle 2 even if occlusions and intersections occurred throughout the video sequence. The problem of missing the circle 3 was solved by replacing the dynamic model of the circle 3 with a one-state dynamic model in (d) to (f). This experiment shows that temporal coherence plays an important role in tracking, especially when tracking a large number of identical objects.

最後に、図１９は、不特定の数の複数対象物を追跡する処理の説明図である。使用したテスト用ビデオは７００フレームの長さがあり、フレームサイズは上記の実験例と同じであり、輪郭に基づく適応的観測モデルを使用した。このビデオに使用された動的モデルは、円１に対しては２状態、円２に対しては３状態、円３に対しては３状態である。これらの動的モデルは３００フレームの学習用ビデオから学習した。図２０は、動的モデルを学習するために使用されたこれらの３個の移動円の軌跡を表す図である。 Finally, FIG. 19 is an explanatory diagram of processing for tracking an unspecified number of multiple objects. The test video used was 700 frames long, the frame size was the same as in the experimental example above, and an adaptive observation model based on contours was used. The dynamic model used for this video has two states for circle 1, three states for circle 2, and three states for circle 3. These dynamic models were learned from 300 frames of learning videos. FIG. 20 is a diagram showing the trajectories of these three moving circles used for learning the dynamic model.

［６．２］実験２：人物追跡 [6.2] Experiment 2: Person tracking
本実験では、例えば、会議室内の３乃至５人の複数の人物の形状と動きを追跡するため本発明の方法を使用する。追跡装置は、姿勢と動きタイプ（例えば、歩行、会釈、席を見つけるための移動など）を出力する。 In this experiment, the method of the present invention is used to track the shape and motion of multiple people, for example three to five people in a conference room. The tracker outputs the posture and the motion type (for example, walking, bowing, or moving to find a seat).

［７］複数対象物追跡システムの構成 [7] Configuration of the multi-object tracking system
本発明の複数対象物追跡方法は、この追跡方法の各ステップをコンピュータに実現させるためのプログラムを、コンピュータに接続されるハードディスク装置や、ＣＤ−ＲＯＭ、ＤＶＤ又はフレキシブルディスクなどの可搬型記憶媒体に格納し、本発明を実施する際にコンピュータにインストールし、又は、通信回線からコンピュータにダウンロードし、インストールして、コンピュータのＣＰＵ等でこのプログラムを実行することによって容易に実現される。 The multi-object tracking method of the present invention is easily realized by storing a program that causes a computer to carry out each step of the tracking method on a hard disk device connected to the computer or on a portable storage medium such as a CD-ROM, DVD, or flexible disk, installing the program on the computer when the invention is practiced, or downloading it to the computer over a communication line and installing it, and then executing the program on the computer's CPU or the like.

また、図２１は、本発明の複数対象物追跡方法を実現するシステムの機能ブロック図であり、複数対象物追跡システムは、
画像シーケンスを入力して複数対象物を個別に同時に追跡するパーティクルフィルタリングに基づく複数対象物追跡処理部２１０１と、
対象物間に発生したオクルージョンに関するすべての仮説を維持するオクルージョンリストと、観測モデルが対象物のアピアランス変形によって変化しないようにさせる変数と、を個別の対象物のコンフィギュレーションの複合体の状態表現に導入することによって排他原理を複数対象物へ拡張する拡張処理部２１０２と、
動的ベイジアンネットワークを、排他原理に基づく専用設計された観測モデルと組み合わせ使用することにより、ベイズ推論フレームワークの範囲内で複数対象物追跡を定式化するベイズ推論部２１０３と、
全体的なオクルージョンが発生したときの曖昧さを軽減するための混合状態動的プロセスと、新たに到着した対象物を追跡するための再初期化を、同じ枠組みで考慮する統合処理部２１０４と、
を有する。 FIG. 21 is a functional block diagram of a system that implements the multi-object tracking method of the present invention. The multi-object tracking system has:
a multi-object tracking processing unit 2101, based on particle filtering, that receives an image sequence and tracks multiple objects individually and simultaneously;
an extension processing unit 2102 that extends the exclusion principle to multiple objects by introducing, into a state representation that is a composite of the configurations of the individual objects, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model from changing with appearance deformation of the objects;
a Bayesian inference unit 2103 that formulates multi-object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a purpose-designed observation model based on the exclusion principle; and
an integrated processing unit 2104 that handles, in the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and re-initialization for tracking newly arrived objects.

以上で説明した実施例３は本発明を実施するための最良の形態の一つにすぎず、本発明はその趣旨を逸脱しない限り種々変形して実施可能である。 The third embodiment described above is only one of the best modes for carrying out the present invention, and the present invention can be implemented with various modifications without departing from the gist thereof.

［８］まとめ [8] Summary
以上の通り、本発明では、典型的なパーティクルフィルタを拡張することにより複数対象物追跡方法を実現した。本発明は、対象物間のオクルージョンをモデル化する隠れ過程と、各対象物の変換をモデル化する隠れ過程を、パーティクルフィルタ形式の基本動的ベイジアンネットワークに統合することによって、複数対象物追跡における、対象物間のオクルージョンと、幾何学的変換によって生じる可能性のある変形に対する適応性とを、ベイジアンネットワークの枠組みで取り扱うことを可能にさせ、複数対象物追跡に対する逐次モンテカルロ解決法を見出した。 As described above, the present invention realizes a multi-object tracking method by extending a typical particle filter. By integrating a hidden process that models occlusion between objects and a hidden process that models the transformation of each object into a basic dynamic Bayesian network in particle-filter form, the present invention makes it possible to handle, within a Bayesian network framework, both occlusion between objects and adaptability to the deformations that geometric transformations can cause in multi-object tracking, and thereby provides a sequential Monte Carlo solution for multi-object tracking.

FIG. 1: 本発明の一実施例による複数対象物追跡方法のフローチャートである。 A flowchart of a multi-object tracking method according to an embodiment of the present invention.
FIG. 2: 追跡問題を隠れマルコフモデルと類似した動的ベイジアンネットワークを用いて表現した例の説明図である。 An explanatory diagram of an example in which the tracking problem is expressed with a dynamic Bayesian network similar to a hidden Markov model.
FIG. 3: 適応的再サンプリングを含む生成的パーティクルフィルタリングアルゴリズムの説明図である。 An explanatory diagram of a generative particle filtering algorithm that includes adaptive resampling.
FIG. 4: 図３のパーティクルフィルタリングの処理のフローチャートである。 A flowchart of the particle filtering process of FIG. 3.
FIG. 5: 複数対象物追跡方法を表現する動的ベイジアンネットワークのグラフである。 A graph of a dynamic Bayesian network expressing the multi-object tracking method.
FIG. 6: 対象物間にオクルージョンが発生した状況を示す隠れ状態過程が動的ベイジアンネットワークに追加された様子の説明図である。 An explanatory diagram showing a hidden state process, which indicates that occlusion has occurred between objects, added to the dynamic Bayesian network.
FIG. 7: Occl_listを構築する例の説明図である。 An explanatory diagram of an example of constructing Occl_list.
FIG. 8: 画像上の輪郭観測と測定ラインに沿ったエッジ点の説明図である。 An explanatory diagram of contour observation on an image and edge points along measurement lines.
FIG. 9: 測定ライン上のエッジ点の説明図である。 An explanatory diagram of edge points on a measurement line.
FIG. 10: 観測画像をモデル化するために離散的変換変数を潜在的画像に対する密度モデルに加える様子を示すグラフィカルモデルを表す図である。 A graphical model showing how a discrete transformation variable is added to the density model for the latent image in order to model the observed image.
FIG. 11: 変換の影響を受けないビデオ処理のための動的ベイジアンネットワークの一例の説明図である。 An explanatory diagram of an example of a dynamic Bayesian network for transformation-invariant video processing.
FIG. 12: 新しい動的ベイジアンネットワークの説明図である。 An explanatory diagram of the new dynamic Bayesian network.
FIG. 13: 図１２のグラフィカルモデルの簡略化されたモデルの説明図である。 An explanatory diagram of a simplified version of the graphical model of FIG. 12.
FIG. 14: 適応的再サンプリングを用いて個数が変化する対象物を追跡するパーティクルフィルタの説明図である。 An explanatory diagram of a particle filter that uses adaptive resampling to track a varying number of objects.
FIG. 15: 適応的再サンプリングを用いて個数が変化する対象物を追跡するパーティクルフィルタの処理のフローチャートである。 A flowchart of the process of a particle filter that uses adaptive resampling to track a varying number of objects.
FIG. 16: テスト用ビデオからの３フレームを示す図であり、（ａ）はフレーム２０、（ｂ）はフレーム１５０、（ｃ）はフレーム１６０である。 Three frames from the test video: (a) frame 20, (b) frame 150, and (c) frame 160.
FIG. 17: ３個の円の中から指定された１個の円を追跡する処理の説明図であり、（ａ）から（ｃ）は、観測モデルにおいてオクルージョンと動きクラスを考慮しない場合の結果を表し、（ｄ）から（ｆ）はオクルージョンと動きクラスを考慮した観測モデルを用いた結果を表している。 An explanatory diagram of the process of tracking one designated circle out of three: (a) to (c) show the results when the observation model does not take occlusion and motion class into account, and (d) to (f) show the results with an observation model that does.
FIG. 18: 一定数の円を追跡する処理の説明図であり、時間的コヒーレンスが追跡に重要な役割を果たし、（ａ）から（ｃ）では、円３はその動的モデルが原因で追跡結果が外れ、（ｄ）から（ｆ）では、円３の動的モデルを置換した結果が示され、どちらの場合でも観測モデルはオクルージョン下でも十分に機能することがわかる。 An explanatory diagram of the process of tracking a fixed number of circles, in which temporal coherence plays an important role: in (a) to (c), the track of circle 3 is lost because of its dynamic model, and (d) to (f) show the results after the dynamic model of circle 3 is replaced. In both cases the observation model works well even under occlusion.
FIG. 19: 不特定数の円を追跡する実験結果の説明図である。 An explanatory diagram of the experimental results of tracking an unspecified number of circles.
FIG. 20: 図１７の実験における動的モデルを学習するため使用された３個の円の軌跡を表す図である。 The trajectories of the three circles used to learn the dynamic models in the experiment of FIG. 17.
FIG. 21: 本発明の複数対象物追跡方法を実現するシステムの機能ブロック図である。 A functional block diagram of a system that implements the multi-object tracking method of the present invention.

Explanation of symbols

２１０１複数対象物追跡処理部
２１０２拡張処理部
２１０３ベイズ推論部
２１０４統合処理部
2101 Multi-object tracking processing unit
2102 Extension processing unit
2103 Bayesian inference unit
2104 Integrated processing unit

Claims

A multi-object tracking method based on particle filtering for individually and simultaneously tracking multiple objects in an image sequence, the method comprising:
explicitly describing occlusions between objects within each particle;
setting an observation likelihood function that handles objects ranging from non-overlapping to overlapping; and
incorporating a hidden representation into the state representation to model the dynamics of object appearance deformation.

The multi-object tracking method according to claim 1, wherein, in the state representation, the basic representation is augmented by representing all objects in the image as an object configuration, and occlusion is handled explicitly by augmenting the object configuration with an occlusion list.

The multi-object tracking method according to claim 1, wherein the observation likelihood reflects a likelihood of an object that appears while the number of objects changes.

The multi-object tracking method according to claim 1, wherein the hidden representation is a stochastic random variable and generates an observation model that is invariant to a view or a geometric transformation.

A multi-object tracking method based on particle filtering for individually and simultaneously tracking multiple objects in an image sequence, the method comprising:
extending the exclusion principle to multiple objects by introducing, into a state representation that is a composite of the configurations of the individual objects, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model from changing with appearance deformation of the objects;
formulating multi-object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a purpose-designed observation model based on the exclusion principle; and
handling, in the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and re-initialization for tracking newly arrived objects.

A multi-object tracking system comprising:
a multi-object tracking processing unit, based on particle filtering, that receives an image sequence and tracks multiple objects individually and simultaneously;
an extension processing unit that extends the exclusion principle to multiple objects by introducing, into a state representation that is a composite of the configurations of the individual objects, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model from changing with appearance deformation of the objects;
a Bayesian inference unit that formulates multi-object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a purpose-designed observation model based on the exclusion principle; and
an integrated processing unit that handles, in the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and re-initialization for tracking newly arrived objects.

A program for causing a computer to realize:
a multi-object tracking function, based on particle filtering, that receives an image sequence and tracks multiple objects individually and simultaneously;
an extension function that extends the exclusion principle to multiple objects by introducing, into a state representation that is a composite of the configurations of the individual objects, an occlusion list that maintains all hypotheses about occlusions occurring between objects and a variable that keeps the observation model from changing with appearance deformation of the objects;
a Bayesian inference function that formulates multi-object tracking within a Bayesian inference framework by using a dynamic Bayesian network in combination with a purpose-designed observation model based on the exclusion principle; and
an integrated function that handles, in the same framework, a mixed-state dynamic process for reducing ambiguity when total occlusion occurs and re-initialization for tracking newly arrived objects.

A computer-readable recording medium on which the program according to claim 7 is recorded.