JP2009217523A

JP2009217523A - Dynamic image processing method, dynamic image processing device and dynamic image processing program

Info

Publication number: JP2009217523A
Application number: JP2008060339A
Authority: JP
Inventors: Katsuhiko Ishiguro (石黒勝彦); Takeshi Yamada (山田武士); Naonori Ueda (上田修功)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-03-10
Filing date: 2008-03-10
Publication date: 2009-09-24
Anticipated expiration: 2028-03-10
Also published as: JP4972016B2

Abstract

PROBLEM TO BE SOLVED: To track multiple moving objects more accurately.
SOLUTION: A dynamic image processing device 1 tracks multiple moving objects more accurately by repeatedly executing the following processes: estimation of a plurality of hidden variables and the weight of each hidden variable, estimation of hidden state quantities, resampling based on the weight distribution of particles, and updating of the hyperparameters of the dynamics.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a technique that, when time-series data composed of information output from the hidden state quantities (estimates of the true values) of multiple targets is given as input, as with a moving image containing multiple moving objects, estimates the number of dynamics patterns that govern the temporal evolution of the targets (the moment-to-moment change of physical quantities) and the characteristics of each pattern, and thereby accurately estimates the hidden state quantities of an unknown number of tracking targets.

Tracking multiple moving objects is one of the important problems in moving image processing research. Tracking means continuously estimating the correct position and state quantities of a target detected in a moving image by modeling how they change over time. As a result, a spatiotemporal trajectory of the position (state) of each object can be obtained from the moving image. If multiple objects in a moving image can be tracked simultaneously, the results can be used for various purposes such as recognizing behavior patterns, automatically monitoring situations, and understanding the content of moving images.

Conventional multi-object tracking algorithms (see, for example, Non-Patent Documents 1 and 2) can estimate the state trajectory of each target even when the number of targets to be tracked is unknown or varies over time. In particular, the technique of Non-Patent Document 2 also models changes in the number of tracking targets stochastically, so tracking can be performed not only on moving images but also on many other kinds of time-series data.

Most existing multi-object tracking methods (Non-Patent Documents 1, 2, etc.) model the time evolution of the tracking targets with a single dynamics model. That is, they assume that every object in the moving image moves according to the same dynamics pattern (motion pattern). However, when many moving objects are observed in a scene, assuming that several different dynamics exist, depending on the characteristics of each target, is expected to be a more natural and correct model. In this specification, "dynamics" refers to the “time evolution function of the hidden state quantity” and the “observation function for calculating the observed quantity”; details are described later.

Designing such a tracking model requires knowing in advance i) the number of dynamics patterns and ii) the characteristics of each pattern. Since this information is difficult to estimate manually, a machine learning solution is needed. Problem i) can be solved with a time-series clustering method (for example, Non-Patent Documents 3 and 4). Problem ii) is solved as a model parameter estimation problem for time-series data (for example, Non-Patent Documents 5 and 6).
Non-Patent Document 1: B. Leibe, N. Cornelis, K. Cornelis and L. Van Gool, “Dynamic 3D Scene Analysis from a Moving Vehicle”, in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2007.
Non-Patent Document 2: S. Särkkä, A. Vehtari and J. Lampinen, “Rao-Blackwellized Particle Filter for Multiple Target Tracking”, Information Fusion, Vol. 8, No. 1, pp. 2-15, 2007.
Non-Patent Document 3: M. Ramoni, P. Sebastiani and P. Cohen, “Bayesian Clustering by Dynamics”, Machine Learning, Vol. 47, No. 1, pp. 91-121, 2002.
Non-Patent Document 4: P. Smyth, “Clustering Sequences with Hidden Markov Models”, in Advances in Neural Information Processing Systems 9, pp. 648-654, 1997.
Non-Patent Document 5: F. Caron, M. Davy, A. Doucet, E. Duflos and P. Vanheeghe, “Bayesian Inference for Linear Dynamic Models with Dirichlet Process Mixtures”, IEEE Transactions on Signal Processing, to appear, 2007.
Non-Patent Document 6: M. J. Cassidy and W. D. Penny, “Bayesian Nonstationary Autoregressive Models for Biomedical Signal Analysis”, IEEE Transactions on Biomedical Engineering, Vol. 49, No. 10, pp. 1142-1152, 2002.

However, existing clustering and parameter estimation methods can only be applied to observation sequence (time-series) data that has already been separated by object. That is, these conventional methods cannot learn patterns from observation sequence data in which multiple tracking targets are mixed.

Accordingly, the present invention has been made to solve the above problem, and its object is to track multiple moving objects with higher accuracy.

To solve the above problem, the present invention is a moving image processing method performed by a moving image processing device that, when observation data relating to a moving image containing multiple moving objects is given as input, estimates the hidden state quantities of the multiple moving objects by using a probabilistic generative model to estimate the number of patterns of the dynamics and the characteristics of each pattern, the dynamics consisting of a time evolution function of the hidden state quantities (which represent estimates of the true values for the moving objects) and an observation function for calculating the observation data. The moving image processing device comprises computing means and storage means, the storage means storing the observation data, the hidden state quantity of each moving object, a plurality of hidden variables representing the relationship between the observation data and each hidden state quantity for each moving object, the weight of each hidden variable, and the hyperparameters of the dynamics.
The computing means repeatedly executes: a step in which a hidden variable set estimation unit estimates, based on the observation data, the plurality of hidden variables and the weight of each hidden variable; a step in which a hidden state estimation unit estimates the hidden state quantities based on the observation data, the hidden state quantities at the time immediately preceding the time currently being computed, and the plurality of hidden variables estimated by the hidden variable set estimation unit; a step in which a resampling unit, based on the hidden variables estimated by the hidden variable set estimation unit and the hidden state quantities estimated by the hidden state estimation unit, performs resampling by inferring the high-posterior regions from the distribution of the weights of particles, each particle representing one estimate of the hidden variables together with the accompanying estimate of the hidden state quantities; and a step in which a hyperparameter update unit updates the hyperparameters of the dynamics based on the resampling result. Through this repetition, the hidden state estimation unit estimates the hidden state quantities of the multiple moving objects.

According to this invention, repeatedly executing the processes of estimating the plurality of hidden variables and the weight of each hidden variable, estimating the hidden state quantities, resampling based on the distribution of particle weights, and updating the hyperparameters of the dynamics makes it possible to track multiple moving objects with higher accuracy.

Further, in the present invention, the step in which the hidden variable set estimation unit estimates the plurality of hidden variables and the weight of each hidden variable based on the observation data preferably comprises: a step in which a hidden variable determination unit estimates, for each particle, the plurality of hidden variables based on the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the weight of each hidden variable at the previous time; and a step in which a hidden variable weight calculation unit estimates, for each particle, the weight of each hidden variable based on the hidden variables estimated by the hidden variable determination unit.

According to this invention, because the hidden variable determination unit estimates a plurality of hidden variables for each particle and the hidden variable weight calculation unit estimates the weight of each hidden variable for each particle, the device can hold as many different estimation hypotheses (each with its own weights) as there are particles. Therefore, even when a moving object changes its direction of motion abruptly (instantaneously), the corresponding hypothesis can be used to track the multiple moving objects more accurately.

Further, in the present invention, the step in which the hidden variable determination unit estimates, for each particle, the plurality of hidden variables based on the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the weight of each hidden variable at the previous time preferably receives as input the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the observation data at the time currently being computed, and comprises: a step in which a moving object number determination unit determines, among the plurality of hidden variables, the hidden variables indicating whether each moving object is present in the moving image at a given time; a step in which a motion pattern number determination unit determines, among the plurality of hidden variables, the hidden variables representing the identifier of each moving object's motion pattern; a step in which a motion pattern parameter determination unit determines the parameters of the motion patterns; and a step in which a moving object correspondence determination unit determines, among the plurality of hidden variables, the hidden variables representing the correspondence between the observation data and the moving objects.

According to this invention, because the moving object number determination unit determines the hidden variables representing the presence of each moving object, the device can cope with moving objects appearing and disappearing. Because the motion pattern number determination unit determines the hidden variables representing the identifier of each moving object's motion pattern, it can handle situations in which a moving object switches between motion patterns. Because the motion pattern parameter determination unit determines the parameters of the motion patterns, a drop in tracking accuracy caused by inappropriate initial motion pattern parameters can be avoided. Finally, because the moving object correspondence determination unit determines the hidden variables representing the correspondence between the observation data and the moving objects, that correspondence can be established.

A moving image processing program according to the present invention causes a computer to function as the moving image processing device. With this configuration, a computer on which the program is installed can realize each function based on the program.

According to the present invention, multiple moving objects can be tracked with higher accuracy.

The best mode for carrying out the present invention (hereinafter, the “embodiment”) is described in detail below with reference to the drawings (drawings other than those explicitly mentioned are also referred to as appropriate). This embodiment describes a method for simultaneously tracking multiple objects (also called moving objects or tracking targets) present in a moving image. As stated above, “dynamics” refers to the time evolution function (Equation (20), described later) of the hidden state quantity (x(t), described later) and the observation function (Equation (21), described later) for calculating the observed quantity (y(t), described later). Learning the dynamics means estimating the parameters (ξ and ψ, described later) required by Equations (20) and (21).

To restate the difference between the conventional methods and the method of this embodiment (hereinafter, “the present method”): conventional methods must fix a dynamics model representing the targets' state changes before tracking multiple targets, but in general the number of target dynamics and the parameter values of each dynamics are unknown, and estimating and setting them automatically has been difficult. Methods for estimating target dynamics do exist, but they cannot be applied to data containing multiple targets. This embodiment describes a method that, for data containing multiple targets, uses a probabilistic generative model to estimate the number of dynamics patterns and their parameters while simultaneously tracking an unknown number of targets.

(Configuration of moving image processing apparatus)
First, the configuration of the moving image processing apparatus of this embodiment is described. FIG. 1 is a functional block diagram schematically showing the configuration of the moving image processing apparatus according to this embodiment. The moving image processing apparatus 1 is a computer comprising, for example, a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), and an input/output interface. As shown in FIG. 1, the moving image processing apparatus 1 includes computing means 2 (e.g., a CPU and RAM), storage means 3 (e.g., ROM and an HDD), input means 4 such as a keyboard and mouse, and output means 5 such as a liquid crystal display, all connected by a bus line 6. Only an outline of the configuration is given here; details are described with reference to FIG. 2A onward.

The storage means 3 stores observed quantities (also called observation data or observed values) 31 for the moving image, a plurality of hidden state quantities 32 representing estimates of the true values for each moving object in the moving image, a plurality of hidden variables 33 representing the relationship between the observation data and the hidden state quantities (also called hidden states) for each moving object, the weights 34 of the hidden variables, and various parameters 35 including the hyperparameters of the dynamics (details are described later). Although not shown, the storage means 3 also stores the operating program of the computing means 2 and the like. Hereinafter, when the computing means 2 reads or writes data from or to the storage means 3, the phrase “from/to the storage means 3” is omitted.

The computing means 2 includes a hidden variable set estimation unit 21, a hidden state estimation unit 22, a resampling unit 23, and a hyperparameter update unit 24.

The hidden variable set estimation unit 21 estimates the plurality of hidden variables and the weight of each hidden variable based on the observation data, and includes a hidden variable determination unit 211 and a hidden variable weight calculation unit 212.

The hidden variable determination unit 211 estimates the hidden variables for each particle based on the hidden variables at the previous time, the plurality of hidden state quantities, and the weight of each hidden variable. It includes a moving object number determination unit 2111, a motion pattern number determination unit 2112, a motion pattern parameter determination unit 2113, and a moving object correspondence determination unit 2114.

The moving object number determination unit 2111 determines the hidden variables indicating whether each moving object is present in the moving image at a given time. The motion pattern number determination unit 2112 determines the hidden variables representing the identifier (e.g., an identification number) of each moving object's motion pattern.

The motion pattern parameter determination unit 2113 determines the parameters of the motion patterns. The moving object correspondence determination unit 2114 determines the hidden variables representing the correspondence between the observation data and the moving objects.

The hidden variable weight calculation unit 212 estimates the weight of each hidden variable for each particle (details are described later) based on the hidden variables estimated by the hidden variable determination unit 211.

The hidden state estimation unit 22 estimates the hidden state quantities based on the observation data, the hidden state quantities at the previous time, and the plurality of hidden variables estimated by the hidden variable set estimation unit 21.

Based on the plurality of hidden variables estimated by the hidden variable set estimation unit 21 and the hidden state quantities estimated by the hidden state estimation unit 22, the resampling unit 23 performs resampling: it infers the high-posterior regions from the distribution of the weights of the particles, each particle representing one estimate of the hidden variables together with the accompanying estimate of the hidden state quantities, and draws samples multiple times accordingly.
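The resampling step can be illustrated with a short Python sketch. The source does not state which resampling scheme the resampling unit 23 uses; systematic resampling, a common low-variance choice in particle filters, is assumed here, and the function name `systematic_resample` is hypothetical. It returns the indices of the particles to keep, so high-weight particles tend to be duplicated and low-weight ones dropped.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return indices of the particles to keep, drawn in proportion to their weights.

    Systematic resampling (an assumed scheme): one uniform offset, then N evenly
    spaced points laid over the cumulative distribution of the normalized weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize so the weights sum to 1
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n   # N points in [0, 1)
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0                     # guard against floating-point round-off
    # side="right" ensures a zero-weight particle is never selected
    return np.searchsorted(cumulative, positions, side="right")
```

For example, with weights `[0, 0, 1, 0]` every returned index is 2, so all particles are replaced by copies of the only plausible hypothesis.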

The hyperparameter update unit 24 updates the parameters of the dynamics based on the resampling result of the resampling unit 23.
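How the four units of the computing means 2 are chained in each time step can be sketched as a plain Python loop. The four callables stand in for units 21 to 24; their names and signatures are hypothetical placeholders, not the patent's interfaces, and the sketch only fixes the order of the steps as stated in the claims.

```python
from typing import Callable, List, Sequence

def track(observations: Sequence, particles: List, hyper,
          estimate_hidden_vars: Callable,    # stands in for unit 21
          estimate_hidden_states: Callable,  # stands in for unit 22
          resample: Callable,                # stands in for unit 23
          update_hyper: Callable):           # stands in for unit 24
    """Run the four-step loop once per time step; return per-step particles and final hyperparameters."""
    history = []
    for y_t in observations:
        # Unit 21: propose hidden variables for each particle and compute their weights
        particles, weights = estimate_hidden_vars(particles, y_t, hyper)
        # Unit 22: estimate x(t) per particle from y(t), x(t-1) and the hidden variables
        particles = estimate_hidden_states(particles, y_t, hyper)
        # Unit 23: keep particles in proportion to their weights
        particles = resample(particles, weights)
        # Unit 24: update the dynamics hyperparameters from the resampled particles
        hyper = update_hyper(particles, hyper)
        history.append(particles)
    return history, hyper
```

Keeping the units as injected callables makes the step order explicit while leaving each unit's internals (for example, the resampling and Kalman-filter sketches) interchangeable.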

(Outline and purpose of calculation process)
Next, the outline and purpose of the calculation performed by this method are described. FIG. 2A is a conceptual diagram of multi-object tracking. Some of the observation data may be noise unrelated to the objects to be tracked. The goal of this method is to estimate the state vectors x(t) (e.g., two-dimensional position vectors) of multiple tracking targets from the observed time-series data (observation data) Y(t) = {y(t)}, t = 1, 2, ....

x(t) is a hidden state quantity (i.e., it cannot be observed directly), and to simplify computation, x(t) is assumed to have the Markov property (the probability of a future event depends only on the current state, not on past states). Saying that x(t) cannot be observed directly means that even when data is observed, it may deviate from the true value because of noise and is therefore not necessarily accurate. In other words, the hidden state quantity refers to this true value (or its estimate). The true value cannot be observed, but the present method can estimate it with high accuracy.

In addition, a hidden variable φ(t) is defined to express the relationship between the observation data and the hidden state quantities. In the actual computation, to estimate the hidden state quantity x(t) at each time t, candidate values of the hidden variable φ(t) are generated stochastically from the observation data Y(t) and from the hidden variable φ(t−1) and hidden state quantity x(t−1) at time t−1, and x(t) is estimated under each candidate value. Weighting the resulting combinations of hidden variables and hidden state quantities by their plausibility yields an estimate of the probability distribution of the hidden state quantities at each time, and this is repeated at every time step.
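The weighting described above can be sketched as follows, under the simplifying assumption of a single target whose candidate state means the same thing in every particle (with multiple targets, per-target estimates would require matching target indices across particles). The function name is hypothetical; log-weights are normalized with the usual max-subtraction trick for numerical stability.

```python
import numpy as np

def weighted_state_estimate(states, log_weights):
    """Combine per-particle candidates of x(t) into normalized weights and a weighted mean.

    states: (P, D) array, one candidate x(t) per particle (single-target assumption).
    log_weights: (P,) unnormalized log-plausibility of each candidate.
    """
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - lw.max())        # subtract the max so exp() cannot overflow
    w /= w.sum()                     # normalized weights: an approximate posterior over candidates
    return w, w @ np.asarray(states, dtype=float)
```

With two equally plausible candidates `[0, 0]` and `[2, 2]`, the weighted estimate is `[1, 1]`.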

As an example, suppose the goal is to track people in a video. The hidden state quantity is then each person's position or velocity vector at each time. The observed quantity is the moving image data, or the time series of feature vectors extracted from it. The hidden variables carry information such as the number of people in the scene (moving image) at each time, each person's movement pattern, and the correspondence between observed quantities and people (which person a given pixel or feature vector belongs to). From the observations at each time t and information at time t−1 such as the number of people and their positions, multiple candidate values of the hidden variables and of each person's state at time t are estimated, and the distribution of the hidden state quantities is estimated by weighting these candidates by their plausibility; this is repeated at every time step.

(Generation model)
Next, the generative model used in the computation of this method is described. FIG. 2B is a graphical model of this generative model. It represents the dependencies between random variables and the process by which time-series data are generated from them. The generated data are x(t), representing the state (position) of a tracking target, and y(t), the observed quantity obtained by adding noise to the state quantity. This generative model can represent the behavior data of multiple targets governed by a mixture model of dynamics parameters.

Because there are multiple tracking targets, the hidden state quantity x(t) carries the index i and the observed quantity y(t) carries the index m. The i-th target (hidden state quantity) is denoted x_i(t) and the m-th observed quantity y_m(t). N_t denotes the number of tracking targets at time t, and M_t the number of observed quantities. When there are multiple tracking targets (hidden state quantities) and observed quantities, estimating each target's hidden state quantity requires resolving which target produced (corresponds to) which observed quantity. This problem is called data association and is important for accurate estimation of the hidden state quantities. To solve the data association problem, this generative model introduces the hidden variables c and j.

Each hidden variable is described below. First, c_i(t) is a variable representing the birth (addition) and death (deletion) of tracking target i; that is, it describes how many targets have currently appeared or disappeared. c_i(t) = 1 means that the i-th target is alive, i.e., present in the scene; conversely, c_i(t) = 0 means that the target is outside the scene. When a new tracking target is generated at time t, a new index i^ is introduced (in this specification, the symbol “^” denotes a mark placed over the immediately preceding character) and c_{i^}(t) = 1 is set. j_m(t) is a data association variable that associates the observed quantity y_m(t) with the target x_i(t) that produced it: j_m(t) = i indicates that the i-th tracking target x_i(t) produced the m-th observed quantity y_m(t).
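The hidden variables c_i(t) and j_m(t) of one particle can be held in a small container such as the hypothetical `AssociationState` below. Two details are assumptions not specified in this passage: targets are numbered by list position, and an observation attributed to noise (clutter) is marked with `None`. The sketch shows a birth (a new index i^ with c_{i^}(t) = 1), a death, and grouping of observations by the target that produced them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AssociationState:
    """Hypothetical per-particle container for the hidden variables c_i(t) and j_m(t)."""
    c: List[int] = field(default_factory=list)            # c[i] = 1 if target i is in the scene, else 0
    j: List[Optional[int]] = field(default_factory=list)  # j[m] = i if target i produced y_m; None = noise (assumption)

    def birth(self) -> int:
        """Introduce a new target index i^ and mark it alive (c_{i^}(t) = 1)."""
        self.c.append(1)
        return len(self.c) - 1

    def death(self, i: int) -> None:
        """Mark target i as having left the scene (c_i(t) = 0)."""
        self.c[i] = 0

    def observations_by_target(self) -> Dict[int, List[int]]:
        """Group observation indices m by the target i that j_m(t) assigns them to."""
        groups: Dict[int, List[int]] = {}
        for m, i in enumerate(self.j):
            if i is None:
                continue                                  # observation attributed to noise
            if self.c[i] != 1:
                raise ValueError(f"observation {m} assigned to absent target {i}")
            groups.setdefault(i, []).append(m)
        return groups
```

Rejecting assignments to targets with c_i(t) = 0 keeps the two hidden variables mutually consistent: only targets present in the scene can produce observations.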

The parameters are the sets {ξ_k} and {ψ_k}, indexed by k. x(t) (respectively y(t)) is generated from a distribution characterized by the parameter ξ_k (respectively ψ_k). This model uses the Kalman filter as its state-space model (see, e.g., G. Kitagawa, "Introduction to Time Series Analysis", Iwanami Shoten, pp. 125-141, 209-222, 2005; hereinafter "Non-Patent Document 7"), so ξ_k and ψ_k are the parameters of its normal distributions. The Kalman filter is used here only as an example. In general, any model characterized by normal distributions can be used.

G_0 and H_0 are Normal Inverse Wishart (NIW) distributions (see, e.g., H. Watanabe, "Introduction to Bayesian Statistics", Fukumura Shuppan, 1999), with parameters (hyperparameters) θ^ξ and θ^ψ, respectively. The NIW distribution is chosen because it is conjugate to the normal distribution, which simplifies computation. Other distributions can also be used.

At each time t, the index z_i(t) of the dynamics pattern governing the i-th tracking target is first generated following the method of Non-Patent Document 5. This process is modeled by a Chinese Restaurant Process (CRP) with parameter γ, combined with a Bernoulli trial parameterized by π. For the CRP, see, e.g., D. Blackwell and J. MacQueen, "Ferguson Distributions via Pólya Urn Schemes", The Annals of Statistics, Vol. 1, pp. 353-355, 1973 (hereinafter "Non-Patent Document 8"). See also C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada and N. Ueda, "Learning Systems of Concepts with an Infinite Relational Model", in Proceedings of the 21st National Conference on Artificial Intelligence, 2006.

The CRP is one realization of the Dirichlet Process Mixture (DPM) and provides a prior distribution over clusterings. With the CRP, the grouping (clustering) of dynamics into patterns can be modeled without fixing the number of dynamics patterns (clusters) in advance. Once z_i(t) = k has been generated, the hidden state and the observation are generated using the corresponding parameters ξ_k and ψ_k. A key feature of this model is its use of a DPM to represent the parameterized distribution. The DPM is a nonparametric Bayesian model and can be understood as a mixture model with infinitely many components. Conventional mixture-model estimation only estimates the properties (parameters) of each mixture component. The advantage of the DPM is that it also estimates the number of components at the same time, within a Bayesian framework.

This completes the set of hidden variables needed to generate x_i(t) and y_m(t). The hidden state x_i(t) is generated only when target i is present in the scene (c_i(t) = 1). Its generation is characterized by ξ_k and ψ_k, where k is determined by the index z_i(t) = k. Next, to generate the observations y_m(t), the data association variable j_m(t) is generated. When j_m(t) = i, the m-th observation y_m(t) is generated from x_i(t) and the observation-distribution parameter ψ_{z_i(t)}. Repeating this process for all t generates general multi-target behavior time-series data. Such data is a superposition of the behavior trajectories of several targets, each governed by one of several different dynamics.
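The generative process above can be simulated for a single time step. The sketch below rests on several assumptions and does not reproduce the patent's equations:

- the state is a scalar;
- the transition and observation maps are identities, so f and h reduce to adding the noise means q and r;
- each living target emits exactly one observation;
- a newly born target starts from 0.0.

The function and argument names are hypothetical.

```python
import random

def generate_step(x_prev, c_t, z_t, xi, psi, rng):
    """Simulate x_i(t), y_m(t) and j_m(t) for one time step (scalar sketch).

    x_prev: dict i -> x_i(t-1); c_t: dict i -> c_i(t); z_t: dict i -> z_i(t)
    xi: dict k -> (q, Q) system-noise mean and variance of pattern k
    psi: dict k -> (r, R) observation-noise mean and variance of pattern k
    """
    x_t = {}
    for i, alive in c_t.items():
        if alive == 1:  # a state is generated only for targets in the scene
            q, Q = xi[z_t[i]]
            x_t[i] = x_prev.get(i, 0.0) + q + rng.gauss(0.0, Q ** 0.5)
    pairs = []
    for i, x_i in x_t.items():  # assumption: each living target emits one observation
        r, R = psi[z_t[i]]
        pairs.append((x_i + r + rng.gauss(0.0, R ** 0.5), i))
    rng.shuffle(pairs)           # observation order carries no identity information
    y_t = [y for y, _ in pairs]
    j_t = [i for _, i in pairs]  # j_m(t) = i: target i produced y_m(t)
    return x_t, y_t, j_t
```

The tracker never sees `j_t`. Recovering it from `y_t` alone is the data association problem described earlier.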

(Calculation algorithm)
Next, the computational algorithm of this method is described. The ultimate goal is to compute the posterior distribution p(x(t) | Y(t)) at each time t. This can be computed, for example, by the following equation (1).

FIG. 3 is a flowchart of the overall processing of this method, i.e. an outline of the algorithm that estimates the hidden state quantities and the hidden variables. The computing means 2 of the moving image processing apparatus 1 in FIG. 1 performs steps S2 to S7 at each time step. Time runs from t = 1 (step S1) to t = T (No in step S8), and t is incremented by 1 at each step (step S9).

At each time t, the computing means 2 first acquires (inputs) the observations for that time (step S2). The hidden variable set estimation unit 21 then estimates each hidden variable (set) (step S3; details are shown in FIG. 4, described later). The estimated distribution of the hidden variables is represented by holding several estimation results together with their weights. Next, the hidden state estimation unit 22 computes the best state quantity expected under those hidden variables, i.e. it estimates the hidden state (quantity) at time t (step S4; details are shown in FIG. 11, described later). These estimated state quantities depend on the hidden-variable estimates. As described above, one estimate of the hidden variables together with its accompanying hidden-state estimate is called a "particle".

Subsequently, the resampling unit 23 performs a process called resampling according to the particle weights (step S5; details are shown in FIG. 12, described later). Resampling uses the particles and their weight distribution to identify regions of high posterior probability, then draws samples repeatedly from those regions. On receiving the resampling result, the hyperparameter update unit 24 updates the hyperparameters of the dynamics (step S6; details are shown in FIG. 13, described later). The update is based on the observation data at this time and on the hidden state quantities estimated in step S4. The computing means 2 then outputs the hidden state quantities, hidden variables, and hyperparameters at time t (step S7), for example by displaying the tracking result on the output means 5.
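The resampling step (S5) can be illustrated with systematic resampling, which is one common scheme. The patent's own procedure is given in FIG. 12, which is not part of this section, so both the scheme and the function name below are assumptions.

```python
import random

def systematic_resample(weights, rng):
    """Return S particle indices drawn by systematic resampling.

    weights: non-negative particle weights (need not be normalized).
    """
    n = len(weights)
    total = sum(weights)
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0                 # guard against floating-point round-off
    u0 = rng.random() / n                # one random offset shared by all strata
    indices, k = [], 0
    for s in range(n):
        u = u0 + s / n                   # evenly spaced points in [0, 1)
        while cumulative[k] <= u:        # first k whose cumulative weight exceeds u
            k += 1
        indices.append(k)
    return indices
```

High-weight particles are duplicated and zero-weight particles are dropped. After resampling, each surviving particle is usually given the weight 1/S.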

FIG. 4 is a flowchart showing the details of step S3 in FIG. 3, i.e. the method of estimating the hidden variables (set). Here, predicted values and weights are computed for S hidden-variable samples, s = 1, ..., S. First, the hidden variables at time t are collected as φ(t) = {{c_i(t)}, {j_m(t)}, {z_i(t)}, {ξ_k(t)}, {ψ_k(t)}}.

Each hidden variable is briefly described here. c_i(t) indicates whether the i-th tracking target is present in the scene at time t. In person tracking in video, it expresses whether a person is present in the scene at time t. The current number of tracking targets is determined by the values of c(t) = {c_i(t)}, i = 1, ..., N_t.

j_m(t) is the index of the tracking target that generated the m-th observation at time t. In person tracking in video, it is information such as which person the m-th pixel corresponds to. The values of j(t) = {j_m(t)}, m = 1, ..., M_t strongly affect the estimated hidden state of each tracking target.

z_i(t) is the index of the dynamics pattern that determines how the state of the i-th tracking target changes at time t. In person tracking in video, it is each person's motion pattern at time t, such as running or walking to the right. The values of z(t) = {z_i(t)}, i = 1, ..., N_t also affect the estimated hidden state of each tracking target and the determination of j(t).

ξ_k(t) and ψ_k(t) represent the k-th dynamics pattern at time t; more precisely, they are the parameters that characterize each pattern. In person tracking in video, they correspond to the change over time of a person's position vector, or to the mean and variance of the observation noise.
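One particle's hidden variables φ(t) can be held in a small container like the one below. The class name, the field names, and the use of scalar (mean, variance) pairs for ξ_k and ψ_k are illustrative assumptions. In the general model these parameters are vector means and covariance matrices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class HiddenVariables:
    """phi(t) for one particle (hypothetical layout)."""
    c: Dict[int, int] = field(default_factory=dict)    # c_i(t): 1 if target i is in the scene
    j: List[int] = field(default_factory=list)         # j_m(t): target that produced y_m(t)
    z: Dict[int, int] = field(default_factory=dict)    # z_i(t): dynamics pattern of target i
    xi: Dict[int, Tuple[float, float]] = field(default_factory=dict)   # xi_k(t) = (q, Q)
    psi: Dict[int, Tuple[float, float]] = field(default_factory=dict)  # psi_k(t) = (r, R)

    def num_targets(self) -> int:
        """Current number of tracking targets, read off c(t)."""
        return sum(self.c.values())
```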

The hidden variable set estimation unit 21 inputs the observation y(t) at time t (step S31). It then performs steps S33 to S35 for s = 1 (step S32) through s = S (No in step S36), incrementing s by 1 (step S37).

In step S33, the hidden variable set estimation unit 21 inputs the hidden state quantity x^{(s)}(t−1), the hidden variables φ^{(s)}(t−1), and the weight w^{(s)}(t−1) at time t−1.

In step S34, the hidden variable determination unit 211 determines the s-th hidden variables φ^{(s)}(t) at time t (details are shown in FIG. 5, described later). Specifically, φ^{(s)}(t) for the s-th particle is determined by sampling, as shown in the following equation (2).

In step S35, the hidden variable weight calculation unit 212 calculates the weight w^{(s)}(t) of the s-th hidden variables (details are shown in FIG. 10, described later). Specifically, each particle is weighted by w^{(s)}(t), as shown in equation (3).

As shown in the following equations (4) to (8), the conditional distribution p(φ(t) | Φ(t−1)) can be defined.

If No in step S36, the hidden variable weight calculation unit 212 outputs the S sets of hidden variables φ(t) and the weights w(t) at time t (step S38).

FIG. 5 is a flowchart showing the details of step S34 in FIG. 4, i.e. the method of computing the predicted values of the hidden variables. FIG. 5 is the process of sampling φ^{(s)}(t) from the proposal distribution mentioned above, and can be written as the following equations (9) to (13). In other words, FIG. 5 represents sampling from q(·).

The hidden variable determination unit 211 inputs the hidden state quantity x(t−1) and the hidden variables φ(t−1) at time t−1 (step S341), and the observation y(t) at time t (step S342).

In step S343, the moving object number determination unit 2111 determines the hidden variables c(t), which represent the birth and death of tracking targets (details are shown in FIG. 6, described later).

In step S344, the motion pattern number determination unit 2112 determines the hidden variables z(t), which represent the motion pattern index of each tracking target (details are shown in FIG. 7, described later).

In step S345, the motion pattern parameter determination unit 2113 determines ξ(t) and ψ(t), which represent the motion pattern parameters (details are shown in FIG. 8, described later).

In step S346, the movement target correspondence determination unit 2114 determines the hidden variables j(t), which represent the correspondence between observations and tracking targets (details are shown in FIG. 9, described later).

In step S347, the hidden variable determination unit 211 outputs the hidden variables φ(t) at time t.

FIG. 6 is a flowchart showing the details of step S343 in FIG. 5. It samples, using equation (9), the hidden variables c(t) that express the birth (addition) and death (deletion) of tracking targets.

First, an overview of the processing in FIG. 6 is given. In Non-Patent Document 2, the time evolution of c(t) is modeled as a two-stage Bernoulli trial, and this model follows that approach:

(i) A tracking target present in the scene at the previous time (c_i(t−1) = 1) leaves the scene with probability P_d (c_i(t) = 0); otherwise c_i(t) = 1.

(ii) A new tracking target is generated with probability P_b.

If the number of new tracking targets generated at each time is limited to one (see Non-Patent Document 2), the probability of a given value of c(t), taken over all tracking targets, is given by equation (14).

Here, n_s is the number of tracking targets currently present in the scene, n_d is the number of targets that left the scene at time t, and n_b ∈ {0, 1} is the number of newly generated tracking targets.

Specifically, the moving object number determination unit 2111 first inputs c(t−1) (step S601). It then performs steps S603 to S606 for i = 1 (step S602) through i = N_{t−1} (No in step S607), incrementing i by 1 (step S608).

The moving object number determination unit 2111 determines whether c_i(t−1) = 1 (step S603). If Yes, it determines whether tracking target i is still present at time t (step S604). If Yes in step S604, c_i(t) is set to 1 (step S606). If No in either step S603 or step S604, c_i(t) is set to 0 (step S605).

In step S609, the moving object number determination unit 2111 determines whether a new tracking target has appeared. If Yes, N_t is set to N_{t−1} + 1 (step S610) and c_{N_t}(t) is set to 1 (step S611). If No, N_t is set to N_{t−1} (step S612). Finally, the unit outputs c(t) (step S613).
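FIG. 6 and the prior in equation (14) can be sketched as below. Equation (14) itself is not reproduced in this text. The product form P_d^{n_d} (1 − P_d)^{n_s} P_b^{n_b} (1 − P_b)^{1 − n_b} used in `prior_prob` is an assumption based on the two-stage Bernoulli description, with n_s read as the number of surviving targets. The test in step S604 is realized as a random draw that succeeds with probability 1 − P_d. The function names are hypothetical.

```python
import random

def sample_birth_death(c_prev, p_death, p_birth, rng):
    """Two-stage Bernoulli update of c(t) from c(t-1), with at most one birth per step.

    c_prev: list of c_i(t-1) for i = 0 .. N_{t-1}-1.
    Returns (c_t, n_s, n_d, n_b).
    """
    c_t = []
    for c in c_prev:
        if c == 1 and rng.random() >= p_death:
            c_t.append(1)      # S603 Yes and S604 Yes -> S606: target survives
        else:
            c_t.append(0)      # absent before, or leaves now -> S605
    n_d = sum(1 for a, b in zip(c_prev, c_t) if a == 1 and b == 0)
    n_s = sum(c_t)             # survivors (assumed meaning of n_s)
    n_b = 1 if rng.random() < p_birth else 0
    if n_b:
        c_t.append(1)          # S610, S611: new index N_t = N_{t-1} + 1
    return c_t, n_s, n_d, n_b

def prior_prob(n_s, n_d, n_b, p_death, p_birth):
    """Assumed form of eq. (14): product of the Bernoulli factors."""
    return ((1 - p_death) ** n_s) * (p_death ** n_d) * (p_birth ** n_b) * ((1 - p_birth) ** (1 - n_b))
```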

FIG. 7 is a flowchart showing the details of step S344 in FIG. 5. It samples z(t) using equation (11). First, an overview of the processing in FIG. 7 is given. z(t) is generated using the Chinese Restaurant Process (CRP) (see Non-Patent Document 8). The CRP is a realization of the Dirichlet Process Mixture (DPM), a kind of nonparametric Bayesian model, and gives a prior distribution over partitions (clusterings) of samples. Suppose there is a set of mixture components (clusters) indexed by k, and write z_i for the cluster index of the i-th sample. Given z_1 to z_{i−1}, the distribution of z_i is given by the following equation (15).

Here, m_k is the size of the k-th cluster. This sampling process is written z_i ~ CRP(γ). As equation (15) readily shows, clusterings generated by the CRP tend to contain a small number of large clusters, which is a desirable property for a clustering prior. Combining the CRP-generated clustering with a suitable likelihood function also yields the posterior distribution of z(t).

In this model, equation (15) is updated online to obtain the best clustering estimate at each time. Using Z(t−1) as prior information, z_i(t) at time t is sampled as in the following equation (16). Here "online" means "incremental": in practice, equation (15) is evaluated in the form of equation (16), using the estimates of Z up to time t−1.

m_k(t−1) is the size of the k-th cluster obtained up to time t−1. |Z(t−1)| is the total size of all clusters at time t−1, i.e. the sum of m_k(t−1) over k.
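Equations (15) and (16) are not reproduced in this text. This sketch assumes they take the standard CRP form. Under that form, a new sample joins an existing cluster k with probability m_k(t−1) / (|Z(t−1)| + γ). It opens a new cluster with probability γ / (|Z(t−1)| + γ). The code draws from that rule and updates the counts incrementally. The function names are hypothetical.

```python
import random

def crp_sample(counts, gamma, rng):
    """Draw a cluster index from CRP(gamma) given sizes m_k(t-1) in `counts` (gamma > 0)."""
    total = sum(counts.values())            # |Z(t-1)|
    u = rng.random() * (total + gamma)
    acc = 0
    for k, m_k in counts.items():
        acc += m_k
        if u < acc:
            return k                        # existing cluster, prob. m_k / (|Z| + gamma)
    return max(counts, default=-1) + 1      # new cluster, prob. gamma / (|Z| + gamma)

def crp_update(counts, k):
    """Online (incremental) update: add one member to cluster k."""
    counts[k] = counts.get(k, 0) + 1
```

Large clusters attract new members in proportion to their size. This "rich get richer" behavior is what produces a few large clusters.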

In this model the CRP allows the dynamics of each target to change, but in practice such changes may occur only at a certain frequency. To model this frequency, a new parameter π is introduced, and the dynamics change with probability π. Equation (11) is then modeled as follows. First, as shown in equation (17), equation (11) is factorized over the targets i.

Each z_i(t) is then sampled as in the following equations (18) and (19).

Specifically, the motion pattern number determination unit 2112 first inputs z(t−1) and c(t) (step S701). It then performs steps S703 to S708 for i = 1 (step S702) through i = N_t (No in step S709), incrementing i by 1 (step S704).

The motion pattern number determination unit 2112 determines whether c_i(t) = 1 (step S703). If No, the process proceeds to step S704. If Yes, the unit determines whether the motion pattern of tracking target i has changed (step S705). If Yes in step S705, the motion pattern index k is determined according to the CRP (step S706) and z_i(t) is set to k (step S707). If No in step S705, z_i(t) is set to z_i(t−1) (step S708). Finally, the unit outputs z(t) (step S710).
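The FIG. 7 loop combines the Bernoulli switch with probability π and a CRP draw, as in the sketch below. Three choices in it are assumptions:

- a target with no previous pattern (a newly born target) always draws from the CRP;
- the cluster counts are incremented only when a new draw is made;
- the CRP draw uses the standard predictive rule.

The names are hypothetical.

```python
import random

def crp_draw(counts, gamma, rng):
    """One CRP(gamma) draw: existing k with weight m_k, a new index with weight gamma (gamma > 0)."""
    keys = list(counts) + [max(counts, default=-1) + 1]
    weights = [counts[k] for k in counts] + [gamma]
    return rng.choices(keys, weights=weights)[0]

def sample_z(z_prev, c_t, counts, gamma, pi, rng):
    """FIG. 7 sketch: keep z_i(t-1) unless the dynamics change (probability pi)."""
    z_t = {}
    for i, alive in enumerate(c_t):
        if alive != 1:
            continue                              # S703 No: target not in the scene
        if i not in z_prev or rng.random() < pi:  # S705 Yes (or newly born target)
            k = crp_draw(counts, gamma, rng)      # S706
            counts[k] = counts.get(k, 0) + 1      # online count update (assumption)
            z_t[i] = k                            # S707
        else:
            z_t[i] = z_prev[i]                    # S708
    return z_t
```

With π close to 0, targets keep their dynamics pattern for long stretches. With π = 1, every living target is reassigned by the CRP at every step.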

FIG. 8 is a flowchart showing the details of step S345 in FIG. 5. It is the process of sampling ξ(t) and ψ(t), the normal-distribution parameters of the Kalman filter, from equations (12) and (13), respectively.

First, an overview of the processing in FIG. 8 is given. As noted earlier, the same modeling applies to any model characterized by normal distributions, not only to the Kalman filter. Let K_t be the number of clusters estimated by particle s at time t. Suppose the hidden state x_i(t) of a tracking target and the observation y_m(t) are generated from the k-th dynamics cluster, i.e. j_m(t) = i and z_i(t) = k. Their generation process under the Kalman filter is then given by the following equations (20) and (21).

Here, f and h are linear models with normally distributed noise. The mean and covariance matrix of the system noise are ξ_k(t) = {q, Q}, and those of the observation noise are ψ_k(t) = {r, R}. To estimate these parameters, a Normal Inverse Wishart (NIW) prior (see Non-Patent Document 5) is placed on them. One NIW distribution is set for the system noise and one for the observation noise of each cluster, and their hyperparameters are written θ_k = {θ^ξ_k, θ^ψ_k}.

The estimates ξ_k(t) and ψ_k(t) of the true parameters ξ_k and ψ_k at time t are sampled from the posterior distribution based on the data up to time t−1. Sampling at time t uses the posterior estimate of θ at time t−1, θ_k(t−1) = {θ^ξ_k(t−1), θ^ψ_k(t−1)}. Equations (12) and (13) are factorized as p(ξ(t) | θ^ξ(t−1)) = Π_k p(ξ_k(t) | θ^ξ_k(t−1)) and p(ψ(t) | θ^ψ(t−1)) = Π_k p(ψ_k(t) | θ^ψ_k(t−1)). Each factor is expressed as in the following equations (22) and (23).

Specifically, the motion pattern parameter determination unit 2113 first inputs z(t), θ(t−1), and θ(0) (step S801). It then performs steps S803 to S807 for k = 1 (step S802) through k = K_t (No in step S808), incrementing k by 1 (step S809).

The motion pattern parameter determination unit 2113 determines whether k is a new index generated at time t (step S803). If Yes, it samples ξ_k(t) from NIW(θ^ξ_k(0)) (step S804) and ψ_k(t) from NIW(θ^ψ_k(0)) (step S805). If No, it samples ξ_k(t) from NIW(θ^ξ_k(t−1)) (step S806) and ψ_k(t) from NIW(θ^ψ_k(t−1)) (step S807). Finally, the unit outputs ξ(t) and ψ(t) (step S810).
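FIG. 8 can be sketched with NumPy as below. The sketch assumes the common textbook parameterization of each NIW hyperparameter set, θ = (μ0, κ, ν, Ψ). The inverse-Wishart draw is obtained by inverting a Wishart(ν, Ψ⁻¹) draw built from ν Gaussian vectors, which is valid for integer ν ≥ the dimension. All names are hypothetical.

```python
import numpy as np

def sample_niw(mu0, kappa, nu, scale, rng):
    """Draw (mean, covariance) from NIW(mu0, kappa, nu, scale); nu must be an integer >= dim."""
    dim = len(mu0)
    g = rng.multivariate_normal(np.zeros(dim), np.linalg.inv(scale), size=nu)  # nu x dim
    wishart = g.T @ g                       # sum of outer products ~ Wishart(nu, scale^-1)
    sigma = np.linalg.inv(wishart)          # ~ InverseWishart(nu, scale)
    sigma = 0.5 * (sigma + sigma.T)         # remove round-off asymmetry
    mean = rng.multivariate_normal(mu0, sigma / kappa)
    return mean, sigma

def sample_pattern_params(pattern_ids, theta_prev, theta0, rng):
    """FIG. 8 sketch: theta_prev[k] = (theta_xi, theta_psi); new k falls back to theta0."""
    xi, psi = {}, {}
    for k in pattern_ids:
        th_xi, th_psi = theta_prev.get(k, theta0)   # S803: is k new at time t?
        xi[k] = sample_niw(*th_xi, rng)             # S804 / S806
        psi[k] = sample_niw(*th_psi, rng)           # S805 / S807
    return xi, psi
```

Because the NIW prior is conjugate to the Gaussian, θ(t−1) can be updated in closed form from the filtered states. That update (step S6, FIG. 13) is outside this section.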

FIG. 9 is a flowchart showing the details of step S346 in FIG. 5. It samples the data association variables J according to equation (25); equation (24) is also explained here.

First, an overview of the processing in FIG. 9 is given. Assuming the observations are mutually independent, the distribution is factorized as in the following equation (24).

In this model, p(j_m(t)) is taken to be uniform. This corresponds to the general situation in which no prior knowledge (information) about data association is available. The experiments described later, however, include an example in which prior knowledge is introduced.

When the search space of J is large, outliers can also be excluded by introducing a pseudo-likelihood (see, e.g., M. K. Pitt and N. Shephard, "Filtering via Simulation: Auxiliary Particle Filters", Journal of the American Statistical Association, Vol. 94, No. 446, pp. 590-599, 1999). In the experiments described later, a distribution of the form of the following equation (25) was used as the proposal distribution for j (see Non-Patent Document 2).

This equation approximates the posterior distribution using the likelihood evaluated at a "representative" state vector x̂_i(t). In this model, x̂_i(t) is the mean of the predictive distribution of x_i(t) (before filtering).

Specifically, the movement target correspondence determination unit 2114 first inputs the hidden state quantities x(t−1) at time t−1 (step S901). It then inputs the observations y(t) at time t (step S902), followed by J(t−1), c(t), z(t), ξ(t), and ψ(t) (step S903).

The movement target correspondence determination unit 2114 then performs steps S905 to S911 for m = 1 (step S904) through m = M_t (No in step S912), incrementing m by 1 (step S913).

Within this loop, it performs steps S906 to S909 for i = 1 (step S905) through i = N_t (No in step S910), incrementing i by 1 (step S907).

The movement target correspondence determination unit 2114 determines whether c_i(t) = 1 (step S906). If No, the process proceeds to step S907. If Yes, it computes x̂_i from z_i(t), ξ(t), and ψ(t) (step S908), then computes q(j_m(t) = i | J(t−1), y_m(t)) (step S909).

In step S911, the movement target correspondence determination unit 2114 determines j_m(t) according to q(j_m(t) | J(t−1), y_m(t)). Finally, it outputs j(t) (step S914).
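Equation (25) is not reproduced here. The sketch below assumes it is proportional to the observation likelihood at the representative state x̂_i(t) times the uniform prior p(j_m(t)). It uses a scalar Gaussian observation model y = x + r + w with w ~ N(0, R). Computing in log space avoids underflow when an observation lies far from every target. The function name and argument layout are hypothetical.

```python
import math
import random

def sample_association(y, x_hat, obs_params, c_t, rng):
    """FIG. 9 sketch: draw j_m(t) for each observation from a likelihood-based proposal.

    x_hat: dict i -> predicted mean of x_i(t); obs_params: dict i -> (r, R) of psi_{z_i(t)}.
    Returns the sampled j(t) and, for each m, the proposal probabilities over living targets.
    """
    alive = [i for i, c in enumerate(c_t) if c == 1]
    j_t, proposals = [], []
    for y_m in y:                                      # loop over m (S904-S913)
        log_lik = []
        for i in alive:                                # living targets only (S906-S909)
            r, R = obs_params[i]
            resid = y_m - (x_hat[i] + r)
            log_lik.append(-0.5 * resid * resid / R - 0.5 * math.log(2 * math.pi * R))
        top = max(log_lik)
        w = [math.exp(v - top) for v in log_lik]       # shift by max before exponentiating
        total = sum(w)
        q = [v / total for v in w]                     # q(j_m(t) = i | ...), uniform prior
        j_t.append(rng.choices(alive, weights=q)[0])   # S911
        proposals.append(dict(zip(alive, q)))
    return j_t, proposals
```

The returned probabilities are the q(·) terms that the weight computation in step S1003 needs.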

Next, the processing of step S35 in FIG. 4 is described, i.e. the computation of the hidden-variable weights (hereinafter also simply "weights"). FIG. 10 is a flowchart showing the details of step S35 in FIG. 4. The weights are computed by equation (3), given earlier.

Specifically, the hidden variable weight calculation unit 212 first inputs the hidden state quantities x(t−1) and the weights w(t−1) at time t−1 (step S1001). It then inputs the hidden variables φ(t) and the observations y(t) at time t (step S1002).

The hidden variable weight calculation unit 212 then computes q(φ(t) | Φ(t−1), Y(t)) (step S1003) and p(φ(t) | Φ(t−1)) (step S1004).

It then computes p(y(t) | φ(t)) (step S1005) and w(t) (step S1006). Finally, it outputs w(t) (step S1007).
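Equation (3) is not reproduced in this text. Steps S1003 to S1006 compute the ingredients of the standard sequential importance sampling update, so the sketch below assumes that form:

w^{(s)}(t) ∝ w^{(s)}(t−1) · p(y(t) | φ^{(s)}(t)) · p(φ^{(s)}(t) | Φ^{(s)}(t−1)) / q(φ^{(s)}(t) | Φ^{(s)}(t−1), Y(t))

The code works in log space and normalizes the result. The previous weights must be positive. The function name is hypothetical.

```python
import math

def update_weights(prev_w, log_lik, log_prior, log_proposal):
    """Importance-weight update for S particles, normalized to sum to 1.

    log_lik[s]      = log p(y(t) | phi^(s)(t))                  (S1005)
    log_prior[s]    = log p(phi^(s)(t) | Phi^(s)(t-1))           (S1004)
    log_proposal[s] = log q(phi^(s)(t) | Phi^(s)(t-1), Y(t))     (S1003)
    """
    log_w = [math.log(w) + ll + lp - lq
             for w, ll, lp, lq in zip(prev_w, log_lik, log_prior, log_proposal)]
    top = max(log_w)                             # shift by max before exponentiating
    unnorm = [math.exp(v - top) for v in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]           # S1006
```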

Next, the processing of step S4 in FIG. 3 is described, i.e. estimating the hidden state quantities from the hidden variables obtained in step S3. FIG. 11 is a flowchart showing the details of step S4 in FIG. 3.

First, an overview of the processing in FIG. 11 is given. Using the particle filter, the desired posterior distribution is approximated as in the following equation (26).

Since the hidden variables Φ^{(s)}(t) = {φ^{(s)}(t)} have now been generated, this amounts to computing p(x(t), Φ^{(s)}(t) | Y(t)). The computation itself is the same as standard Kalman-filter state estimation (Non-Patent Document 7).

Specifically, the hidden state estimation unit 22 first receives the hidden state quantity x(t−1) at time t−1 (step S1101). It then receives the hidden variable φ(t) and the observation y(t) at time t (step S1102).

The hidden state estimation unit 22 then loops over the particles. Starting from s = 1 (step S1103) and incrementing s by 1 (step S1116), it performs steps S1104 to S1114 for each s up to s = S. The loop ends when step S1115 returns No.

Within each particle, it likewise loops over the tracking targets. Starting from i = 1 (step S1104) and incrementing i by 1 (step S1106), it performs steps S1105 to S1113 for each i up to $i = N_t$. This loop ends when step S1114 returns No.

For each target, the hidden state estimation unit 22 checks whether $c_i^{(s)}(t) = 1$ (step S1105). If not, it proceeds to step S1106. If so, it sets $\xi_k^{(s)}(t)$ and $\psi_k^{(s)}(t)$ in the Kalman filter, where k is the cluster selected by $z_i^{(s)}(t) = k$ (step S1107).

It then loops over the observations. Starting from m = 1 (step S1108) and incrementing m by 1 (step S1110), it performs steps S1109 and S1111 for each m up to $m = M_t$. This loop ends when step S1112 returns No.

For each observation, the hidden state estimation unit 22 checks whether $j_m^{(s)}(t) = i$ (step S1109). If not, it proceeds to step S1110. If so, it filters $x_i^{(s)}(t)$ using the observation $y_m^{(s)}(t)$ (step S1111).

When step S1112 returns No, the hidden state estimation unit 22 sets $x_i^{(s)}(t)$ to the average of the filtering results obtained for the individual observations m.

When step S1115 returns No, the hidden state estimation unit 22 outputs the hidden state quantity x(t) at time t (step S1117).
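
The loop structure of FIG. 11 can be made concrete with a sketch of one particle's pass through steps S1104 to S1114. Equations (34) and (35) are not shown in this section, so the sketch assumes the random-walk model described in the experiments: $x(t) = x(t-1) + v(t)$ with $v \sim N(q, Q)$, and $y(t) = x(t) + w(t)$ with $w \sim N(r, R)$. It further assumes $\xi_k = (q, Q)$, $\psi_k = (r, R)$, and diagonal covariances, so each coordinate can be filtered with a scalar Kalman filter. The data layout, the treatment of targets with no assigned observation, and the averaging of variances along with means are also assumptions.

```python
from dataclasses import dataclass

@dataclass
class Gaussian1D:
    mean: float
    var: float

def kalman_predict(g, q_mean, q_var):
    # ASSUMED random-walk transition: x(t) = x(t-1) + v, v ~ N(q_mean, q_var)
    return Gaussian1D(g.mean + q_mean, g.var + q_var)

def kalman_update(g, y, r_mean, r_var):
    # ASSUMED observation model: y = x + w, w ~ N(r_mean, r_var)
    s = g.var + r_var        # innovation variance
    k = g.var / s            # Kalman gain
    return Gaussian1D(g.mean + k * (y - r_mean - g.mean), (1.0 - k) * g.var)

def estimate_states(prev_states, exists, z, xi, psi, obs, assoc):
    """Steps S1104-S1114 for one particle s, applied per coordinate.

    prev_states[i]: list of Gaussian1D (one per coordinate) for target i
    exists[i] = c_i(t), z[i] = z_i(t), assoc[m] = j_m(t), obs[m] = y_m(t)
    xi[k] = (q_means, q_vars), psi[k] = (r_means, r_vars)  (hypothetical layout)
    """
    new_states = []
    for i, prev in enumerate(prev_states):
        if not exists[i]:                      # step S1105: c_i(t) = 1 ?
            new_states.append(prev)            # inactive target left unchanged (assumption)
            continue
        q_means, q_vars = xi[z[i]]             # step S1107: parameters of cluster z_i(t)
        r_means, r_vars = psi[z[i]]
        pred = [kalman_predict(g, qm, qv) for g, qm, qv in zip(prev, q_means, q_vars)]
        results = [
            [kalman_update(g, yd, rm, rv) for g, yd, rm, rv in zip(pred, obs[m], r_means, r_vars)]
            for m in range(len(obs)) if assoc[m] == i   # steps S1109 and S1111
        ]
        if not results:                        # no observation assigned: keep prediction (assumption)
            new_states.append(pred)
            continue
        n = len(results)                       # average over m, as after step S1112
        new_states.append([
            Gaussian1D(sum(r[d].mean for r in results) / n,
                       sum(r[d].var for r in results) / n)
            for d in range(len(pred))
        ])
    return new_states
```

Running `estimate_states` once per particle s corresponds to the outer loop of steps S1103 to S1116.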

Next, the process of step S5 in FIG. 3 will be described. This process keeps the particle set concentrated on plausible hypotheses. It selectively copies particles (hidden variables and hidden state quantities) with large weights, that is, with high posterior probability, and uses those copies to replace particles with small weights. FIG. 12 is a flowchart showing the details of step S5 in FIG. 3. The processing (calculation) in FIG. 12 is the same as resampling in a conventional particle filter (see Non-Patent Document 7).

The resampling unit 23 receives the weights w(t), the hidden variables Φ(t), and the hidden state quantities X(t) (step S1201), and normalizes the particle weights w(t) (step S1202).

The resampling unit 23 then loops from s = 1 (step S1203) to s = S, incrementing s by 1 (step S1209) and performing steps S1204 to S1207 for each s. The loop ends when step S1208 returns No.

For each s, the resampling unit 23 draws a random number u ∈ [0, 1] (step S1204) and finds the index ŝ satisfying
$$\sum_{l=1}^{\hat{s}-1} w(t)^{(l)} < u \le \sum_{l=1}^{\hat{s}} w(t)^{(l)}$$
(step S1205), where each sum runs over the normalized weights from l = 1.

It then sets $\hat{\Phi}^{(s)}(t) \leftarrow \Phi^{(\hat{s})}(t)$ and $\hat{X}^{(s)}(t) \leftarrow X^{(\hat{s})}(t)$ (step S1206), and resets the weight to $\hat{w}^{(s)}(t) \leftarrow 1/S$ (step S1207).

When step S1208 returns No, the resampling unit 23 outputs the weights $\hat{w}(t)$, the hidden variables $\hat{\Phi}(t)$, and the hidden state quantities $\hat{X}(t)$ (step S1210).
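
The resampling loop of FIG. 12 is ordinary multinomial resampling that inverts the cumulative weight distribution. Here is a minimal sketch using Python's standard `bisect` module. Particles are deep-copied so that duplicates do not share mutable state. The function name and signature are illustrative.

```python
import bisect
import copy
import random
from itertools import accumulate

def resample(weights, particles, rng=random):
    """Multinomial resampling following steps S1201-S1210.

    Returns S equally weighted deep copies of particles drawn in proportion to weight.
    """
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))  # step S1202 + running sums
    S = len(particles)
    resampled = []
    for _ in range(S):
        u = rng.random()                              # step S1204: u in [0, 1)
        s_hat = bisect.bisect_left(cdf, u)            # step S1205: cdf[s_hat-1] < u <= cdf[s_hat]
        s_hat = min(s_hat, S - 1)                     # guard against round-off in cdf[-1]
        resampled.append(copy.deepcopy(particles[s_hat]))  # step S1206
    return [1.0 / S] * S, resampled                   # step S1207: equal weights 1/S
```

`bisect_left` returns the smallest index whose cumulative weight is at least u, which is exactly the condition in step S1205.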

Next, the details of step S6 in FIG. 3 will be described. FIG. 13 is a flowchart showing the details of step S6 in FIG. 3.

First, an overview of the processing in FIG. 13. This processing tracks the objects using the sampled $\xi_k(t)$ and $\psi_k(t)$ and estimates the hyperparameters θ online. The estimation of $\theta^{\xi}_k(t)$ follows the recurrence in equations (27) to (29).

Here, $\theta^{\xi}_k(0)$ is the initial value of $\theta^{\xi}_k(t)$. By conjugacy, $\theta^{\xi}_k(t)$ can be defined so that $P(\xi_k \mid X(t), \theta^{\xi}_k(0)) = \mathrm{NIW}(\xi_k; \theta^{\xi}_k(t))$. As shown in equation (30), the estimate of $\theta^{\xi}_k(t)$ can then be computed from $\theta^{\xi}_k(t-1)$ and the likelihood of the estimated hidden state quantity x(t).

Similarly, $\theta^{\psi}_k(t)$ can be estimated by equation (31).

These updated hyperparameters θ(t) are used in equations (22) and (23) at the next time step. In practice, x(t) and y(t) consist of multiple tracking targets i and observations m, so the set of data associated with cluster k is used, as shown in equations (32) and (33).

In equation (32), "s.t." abbreviates "such that": the expression applies only to elements that satisfy the condition following it. Equation (32) therefore means that x(t) in equation (30) is replaced by x[k](t) in the calculation. Here x[k](t) is the set of state quantities $x_i(t)$, taken over all tracking targets i, whose cluster label satisfies $z_i(t) = k$.
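
Equations (27) to (31) are not reproduced here. For reference, the following is the textbook conjugate update of a normal-inverse-Wishart prior, written for the one-dimensional case (d = 1) so that the sketch needs no dependencies. The choice of which quantities serve as the data for $\xi_k$ is an assumption; one example would be state increments $x_i(t) - x_i(t-1)$ under a random-walk model. The patent's own recurrence may also be parameterized differently.

```python
from dataclasses import dataclass

@dataclass
class NIW1D:
    mu: float     # prior mean of the Gaussian mean
    kappa: float  # pseudo-count attached to mu
    nu: float     # degrees of freedom of the variance prior
    lam: float    # scale of the variance prior (the 1-D "Lambda")

def niw_update(prior, data):
    """Standard conjugate NIW update for 1-D Gaussian data: theta(t-1) -> theta(t)."""
    n = len(data)
    if n == 0:
        return prior  # no data assigned to this cluster: hyperparameters unchanged
    xbar = sum(data) / n
    scatter = sum((x - xbar) ** 2 for x in data)
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * xbar) / kappa_n
    nu_n = prior.nu + n
    lam_n = prior.lam + scatter + prior.kappa * n / kappa_n * (xbar - prior.mu) ** 2
    return NIW1D(mu_n, kappa_n, nu_n, lam_n)
```

Because the update is conjugate, applying it once per time step to that step's data gives the same result as a single batch update over all the data. For example, updating `NIW1D(0, 1, 3, 1)` with `[2.0]` and then with `[4.0]` gives the same hyperparameters as updating with `[2.0, 4.0]` at once. This property is what allows a recursion from $\theta(t-1)$ to $\theta(t)$ to run online.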

Specifically, the hyperparameter update unit 24 first receives the hidden variables Φ(t), the hidden state quantities x(t), and the observations y(t) at time t (step S1301).

The hyperparameter update unit 24 then loops from s = 1 (step S1302) to s = S, incrementing s by 1 (step S1321) and performing steps S1303 to S1319 for each s. The loop ends when step S1320 returns No.

Within each particle, it loops from k = 1 (step S1303) to $k = K_t$, incrementing k by 1 (step S1319) and performing steps S1304 to S1317 for each k. This loop ends when step S1318 returns No.

For each k, the hyperparameter update unit 24 initializes x[k] and y[k] to the empty set {null} (step S1304). It then loops from i = 1 (step S1305) to $i = N_t$, incrementing i by 1 (step S1307) and performing steps S1306 to S1314 for each i. This loop ends when step S1315 returns No.

The hyperparameter update unit 24 checks whether $c_i^{(s)}(t) = 1$ (step S1306). If not, it proceeds to step S1307. If so, it checks whether $z_i^{(s)}(t) = k$ (step S1308).

If step S1308 returns No, the hyperparameter update unit 24 proceeds to step S1307. If it returns Yes, the unit sets x[k] ← {x[k], $x_i^{(s)}(t)$} (step S1309). It then loops from m = 1 (step S1310) to $m = M_t$, incrementing m by 1 (step S1312) and performing steps S1311 and S1313 for each m. This loop ends when step S1314 returns No.

The hyperparameter update unit 24 checks whether $j_m^{(s)}(t) = i$ (step S1311). If not, it proceeds to step S1312. If so, it sets y[k] ← {y[k], $y_m^{(s)}(t)$} (step S1313).

When step S1315 returns No, the hyperparameter update unit 24 uses x[k] to update $\theta^{\xi,(s)}_k(t-1)$ to $\theta^{\xi,(s)}_k(t)$ and outputs the result (step S1316). It then uses y[k] to update $\theta^{\psi,(s)}_k(t-1)$ to $\theta^{\psi,(s)}_k(t)$ and outputs the result (step S1317).
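
Before any update happens, the nested loops of FIG. 13 gather, for each cluster k, the states and observations that belong to it. The following is a sketch of that gathering for one particle, using a hypothetical list-based layout. A single pass over the targets i gives the same result as the flowchart's outer loop over k, because each active target belongs to exactly one cluster.

```python
def gather_cluster_data(K, exists, z, states, obs, assoc):
    """Collect x[k] and y[k] for every cluster k (steps S1304-S1313, one particle).

    exists[i] = c_i(t), z[i] = z_i(t), states[i] = x_i(t), obs[m] = y_m(t), assoc[m] = j_m(t)
    """
    x_by_k = {k: [] for k in range(K)}   # step S1304: x[k] = {}
    y_by_k = {k: [] for k in range(K)}   # step S1304: y[k] = {}
    for i, state in enumerate(states):
        if not exists[i]:                # step S1306: c_i(t) = 1 ?
            continue
        k = z[i]                         # step S1308: cluster of target i
        x_by_k[k].append(state)          # step S1309
        for m, y in enumerate(obs):
            if assoc[m] == i:            # step S1311: j_m(t) = i ?
                y_by_k[k].append(y)      # step S1313
    return x_by_k, y_by_k
```

Each list `x_by_k[k]` (or `y_by_k[k]`) would then feed the per-cluster hyperparameter update of steps S1316 and S1317.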

Thus, the moving image processing apparatus 1 of the present embodiment uses a probabilistic generative model to estimate the number and parameters of the dynamics patterns while simultaneously tracking an unknown number of moving objects.

The key feature of the moving image processing apparatus 1 is that it tracks multiple objects and estimates the dynamics of each object at the same time. Specifically, as shown in FIG. 2B, FIG. 5, and equations (4) to (13), estimating the five hidden variables c, z, ξ, ψ, and j lets multiple-object tracking and dynamics estimation be computed simultaneously.
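
One way to picture the five hidden variables is as the fields of a single particle. The layout below is an illustrative assumption, with field names that mirror the symbols in the text. It is not a structure defined in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Particle:
    """One particle's hypothesis at time t (illustrative layout, not from the patent)."""
    c: List[bool]                  # c_i(t): is target i present in the scene?
    z: List[int]                   # z_i(t): dynamics cluster (motion pattern) of target i
    xi: Dict[int, Tuple]           # xi_k(t): state-transition parameters of cluster k
    psi: Dict[int, Tuple]          # psi_k(t): observation parameters of cluster k
    j: List[int]                   # j_m(t): target associated with observation m
    x: List[List[float]] = field(default_factory=list)  # hidden states x_i(t)
    weight: float = 1.0            # w(t)

    def active_targets(self) -> List[int]:
        """Indices i with c_i(t) = 1."""
        return [i for i, present in enumerate(self.c) if present]

    def observations_of(self, i: int) -> List[int]:
        """Indices m with j_m(t) = i."""
        return [m for m, target in enumerate(self.j) if target == i]
```

Because each particle carries its own copy of all five variables, different particles can hold different hypotheses about the same scene.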

By contrast, conventional techniques estimate either only {c, j} (the information that there are multiple tracking targets) or only {z, x, y} (the information that there are multiple dynamics). Compared with the present embodiment, they therefore lose the information carried by the variables they do not estimate.

In addition, the moving image processing apparatus 1 of the present embodiment can estimate changes in the number of tracking targets over time and continue to track correctly. This is achieved by estimating c(t) in equation (14), which represents the appearance and disappearance of objects at time t.

Furthermore, the moving image processing apparatus 1 of the present embodiment can determine the number of dynamics patterns automatically. This is achieved by increasing or decreasing the number of clusters with equation (16), which adapts the calculation of equation (15) for online computation.
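
Equations (15) and (16) are not reproduced in this section. For orientation, the sketch below shows the standard Chinese restaurant process (CRP) prior that this kind of clustering builds on. An existing cluster k is chosen with probability proportional to its size $n_k$, and a new cluster with probability proportional to the concentration parameter γ (set to 2 and 0.1 in the experiments). The sketch covers only the prior. In the full model, the choice of $z_i(t)$ would also depend on how well each cluster's dynamics explain the data, and the online form in equation (16) may differ.

```python
import random

def crp_probabilities(counts, gamma):
    """CRP prior: P(existing cluster k) ∝ counts[k], P(new cluster) ∝ gamma.

    The last entry of the returned list is the probability of opening a new cluster.
    """
    total = sum(counts) + gamma
    return [n / total for n in counts] + [gamma / total]

def sample_cluster(counts, gamma, rng=random):
    """Draw a cluster index from the CRP prior; len(counts) means 'new cluster'."""
    probs = crp_probabilities(counts, gamma)
    return rng.choices(range(len(probs)), weights=probs)[0]
```

In a CRP, a cluster whose count drops to zero gets no further mass, so the number of clusters can shrink as well as grow. A larger γ makes new clusters more likely.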

Conventional methods that do not use particles (for example, methods using only a Kalman filter) obtain a single estimate of the hidden variables at each time step, the one judged best. The present method instead combines a Kalman filter with a particle filter and estimates the hidden variables separately for each particle. It therefore maintains as many distinct estimation hypotheses as there are particles.

This difference becomes visible, for example, when a tracking target suddenly (instantaneously) changes its direction of motion. A conventional method without particles estimates only along the direction that seems best given the preceding movement, so it copes poorly with sudden changes and is more likely to lose track. The present method generates many hypotheses instead. If even one of them anticipates the abrupt change in direction, tracking is likely to continue. Many hypotheses, however, require a way to decide how far to trust each one. The present method expresses the reliability of each hypothesis by the weight of the hidden variables (approximately, the particle weight) w(t). Estimates of unlikely events, such as sudden turns, receive small weights. As a result, such hypotheses carry little influence during normal (as-predicted) tracking, yet they remain available when such an event actually occurs.

The role and effect of each hidden variable is summarized again here. First, estimating the hidden variable c(t), which indicates whether each moving object is present in the moving image, makes it possible to represent an increase or decrease in the number of moving objects (objects leaving or entering the frame). Without this variable, the number of tracking targets is implicitly assumed to be constant over all the data (time), so the model cannot be applied to data for which that assumption does not hold.

Estimating the hidden variable z(t), the identifier of the motion pattern, makes it possible to represent targets that switch between motion patterns as they move. Without this variable, each target is implicitly assumed to follow a single motion pattern at all times. When that assumption is wrong, tracking itself becomes difficult, and measures of tracking accuracy, such as the log likelihood in Table 3, are likely to deteriorate.

Estimating the motion pattern parameters (ξ, ψ) improves tracking accuracy regardless of their initial settings. If these parameters are not estimated, a model that allows multiple motion patterns, as described above, can be no more accurate than the initial parameter values of those patterns. A poor initial setting then makes it hard to improve tracking accuracy.

Estimating the hidden variable j(t), which represents the correspondence between observations and moving objects, makes it possible to track multiple objects. This estimation is essential for multiple-object tracking. Without it, correct estimation is impossible when several targets are present, so only a single target can be tracked.

The present invention is not limited to the above embodiment and can be implemented in other forms without departing from its spirit. In particular, the specific hardware and software configurations can be modified as appropriate without departing from the gist of the present invention.

(Experimental example)
Next, experiments that evaluate the performance of the present method on artificial data and real video data will be described.
<Comparison methods>
To evaluate the effect of introducing the hidden variables, two models with fewer hidden variables were prepared as baselines. The characteristics of the three models are shown in Table 1.

The first comparison method resembles the model of Särkkä et al. (see Non-Patent Document 2). It does not update the dynamics hyperparameters θ(t) (equations (30) and (31); no) and always uses the default initial value θ(0). It also does not cluster the dynamics with the CRP (equations (16), (18), and (19); no). As a result, ξ(t) and ψ(t) for every tracking target are always sampled from a single initial distribution.

The second comparison method estimates θ(t) online (equations (30) and (31); yes) but does not cluster $z_i(t)$ (equations (16), (18), and (19); no). The hyperparameters are therefore learned independently for each tracking target. Unlike the first model, this model can represent tracking targets with different dynamics, so it can handle more complex scenes (with many moving objects). It does not, however, group the parameters into patterns (clustering).

Finally, as described above, the present method both clusters the dynamics (equations (16), (18), and (19); yes) and estimates the hyperparameters online (equations (30) and (31); yes). It differs from the second comparison method in that each trajectory is modeled as switching over time among motion patterns shared by all tracking targets.

[Experiment using artificial data]
The task in this experiment is to track and cluster mass points moving in a virtual two-dimensional space [0:200] × [0:200]. FIG. 14 shows an example of the artificial data: (a) is the observation data (marked "○") and (b) is the ground truth. In FIG. 14(b), "●" marks the tracking targets and "○" marks noise. The hidden state quantity of each tracking target (mass point) is a two-dimensional real vector representing its position. Each observation is also a two-dimensional real vector and represents target position information degraded by noise. These vectors may also be integer vectors. Each object follows the random walk model of equations (34) and (35).

Here, N(·) denotes a normal distribution, and v(t) and w(t) are each sampled from a normal distribution. In this experiment, each object generates its trajectory and outputs while switching at random among the four dynamics patterns shown in Table 2. The notation $\{\}^{T}$ denotes a transposed matrix or vector, and diag{} denotes a diagonal matrix.

In all patterns, r is the zero vector. The NIW initial value θ(0) is set so that the means of $\xi_k(t)$ and $\psi_k(t)$ take the values in equation (36).

The other parameters needed to estimate the target trajectories are as follows. The time series is 300 steps long. The number of particles is S = 300, and the CRP concentration parameter is γ = 2. At each time step, a new tracking target appears with probability $P_b = 0.1$. A tracking target present in the scene ($c_i(t-1) = 1$) disappears from the scene with the probability $P_d$ (the disappearance probability) computed by equation (37). Here $t_n$ is the last time at which at least one observation $y_m$ with $j_m(\cdot) = i$ existed, and λ = 0.1.

Next, the experimental results are examined. Table 3 shows the average data log likelihood per time step.

The values in the artificial-data column (and likewise in the real-video column) express how likely the observation data are to be generated from the estimated hidden variables, that is, from the model itself. Larger values indicate a better model fit. Table 3 confirms that estimating the Kalman filter parameters and clustering them improves the log likelihood. This indicates that the present method obtained better models and parameters through online tracking and clustering.

FIG. 15 shows the distribution of the normal-noise parameter q at the final time step for the present method (π = 0.2), that is, the distribution of mean velocity-noise values in the artificial-data experiment. q represents the average velocity bias of a tracking target, and the generated artificial data has four patterns of q, as shown in Table 2. Each plotted point in FIG. 15 is the mean of q for one dynamics cluster. The size label gives the cluster's data size, that is, how many objects belonged to the cluster. Four major clusters (size > 100) were obtained, and their values were close to the designed ground truth. In other words, the present method succeeded in clustering the dynamics patterns and estimating their parameters.

[Experiment using real video data]
Next, an experiment using real video data will be described. Feature points extracted from pedestrians in a scene are tracked and clustered in a 320 × 240 pixel video captured with a digital camera.

The target feature points were extracted as follows. First, background subtraction is performed, that is, a background image of a specific subject captured under specific conditions is compared with an observation image captured separately under the same conditions. Foreground pixels are then extracted by binarization. Next, only black pixels are selected from the foreground, and mean shift clustering quantizes them to a small number of cluster centers. The coordinates of these cluster centers are the feature points, and their two-dimensional position vectors are the observations. These position vectors are generally real vectors but may also be integer vectors. Roughly one to three feature points were extracted per pedestrian.
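
The quantization step can be illustrated with a small flat-kernel mean shift over pixel coordinates. The text does not specify the kernel, bandwidth, convergence tolerance, or mode-merging rule, so all of these are assumptions. Background subtraction and black-pixel selection are omitted.

```python
import math

def mean_shift(points, bandwidth, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift on 2-D points; returns the distinct modes (cluster centres)."""
    centres = []
    for p in points:
        c = p
        for _ in range(max_iter):
            # Points inside the window around c (never empty: see note below)
            window = [q for q in points if math.dist(q, c) <= bandwidth]
            new_c = (sum(q[0] for q in window) / len(window),
                     sum(q[1] for q in window) / len(window))
            converged = math.dist(new_c, c) < tol
            c = new_c
            if converged:
                break
        # Merge modes that converged to (nearly) the same place (assumed rule)
        if all(math.dist(c, m) > bandwidth / 2 for m in centres):
            centres.append(c)
    return centres
```

The window is never empty. It contains p itself on the first step. Afterwards, the window mean has a mean squared distance to its members of at most bandwidth², so at least one point always lies within the bandwidth.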

In this experiment, a color model was used for the data association distribution (equation (24)) (see Non-Patent Document 1). The model uses 8-bit RGB (Red Green Blue) histograms. The Bhattacharyya coefficient is computed between the histogram of the pixels surrounding each feature point and the histogram of each tracking target obtained at the previous time step. The histogram of target i is the average of the histograms of the observation points with $j_m(t-1) = i$. The probability that $j_m(t) = i$ is computed by equation (38).

Here σ is a preset constant parameter (σ = 0.15), and d(m, i) is the Bhattacharyya coefficient between the histogram of the m-th observation and that of the i-th tracking target.
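
The color-based association can be sketched as follows. The Bhattacharyya coefficient and the averaging of target histograms follow the description above. Equation (38) itself is not shown, however, so the kernel $p(j_m(t) = i) \propto \exp(-(1 - d(m,i))/\sigma^2)$ is an assumption modeled on common color-based particle filters and is not the patent's formula.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def average_histogram(hists):
    """Target histogram: mean of the histograms of observations with j_m(t-1) = i."""
    n = len(hists)
    return [sum(h[b] for h in hists) / n for b in range(len(hists[0]))]

def association_probs(obs_hist, target_hists, sigma=0.15):
    """p(j_m(t) = i) for one observation m over all targets i.

    ASSUMPTION: p proportional to exp(-(1 - BC) / sigma**2); equation (38) may differ.
    """
    scores = [math.exp(-(1.0 - bhattacharyya(obs_hist, h)) / sigma ** 2)
              for h in target_hists]
    total = sum(scores)
    return [s / total for s in scores]
```

With σ = 0.15, even a modest drop in the coefficient sharply reduces the association probability, so the color cue strongly separates targets with distinct clothing colors.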

The state-space model is the same as in the artificial-data experiment. The NIW initial value θ(0) was set so that the means of $\xi_k(t)$ and $\psi_k(t)$ take the values in equation (39).

The other parameter settings are as follows. The video has 200 frames, obtained by thinning out 1000 frames captured at 30 FPS (frames per second). The total number of particles is S = 500, and the CRP concentration parameter is γ = 0.1. A new tracking target appears with probability $P_b = 0.1$. An object in the scene disappears with the probability $P_d$ computed by equation (37), with λ = 0.1.

Next, the experimental results are examined. As noted above, the real-video column of Table 3 shows the average log likelihood. As in the artificial-data experiment, the present method improves the log likelihood. FIGS. 16 and 17 show the distribution of q, the average velocity bias, at the 10th and the 199th (final) frames, respectively, with π = 0.1.

Because these are real data, no ground truth exists. Inspection of the video, however, shows that most pedestrians move only vertically in the frame, and they occasionally turn left or right to enter a store. Two basic vertical motion patterns (up and down) are therefore expected. With few input frames (FIG. 16), this trend cannot yet be inferred, so the method assigns an individual dynamics pattern to each target to raise the likelihood. As information accumulates (FIG. 17), however, only two vertical dynamics clusters eventually remain.

Finally, FIG. 18 shows eight consecutive snapshots of the tracking results. Labels such as "#0" are frame numbers in temporal order. The images in FIG. 18 were shot from above the atrium of a shopping mall. The left and right edges of each image are the mall's second-floor corridors, and the center is the first-floor walkway. The first-floor stores lie beneath the second-floor corridors, and light spills out at their entrances. In the first-floor walkway, most people on the right side of the image walk upward, and most people on the left side walk downward.

The estimated position of each object is shown as a rectangle. Some people carry several rectangular tracking marks. This happens because the method does not track people directly. Instead, it extracts the black regions that appear to belong to a person (hair, trousers, and so on) individually and tracks their center points, so these multiple marks are correct tracking results.

The number above each rectangle is the index (z) of its dynamics cluster. The cluster that acquired an upward bias is "1", and the downward dynamics is "0". The two people on the right side of the image walk upward until frame 100 and have turned around by frame 125 to enter a store. The present method detected this turn and switched the dynamics pattern of the corresponding feature points from upward "1" to downward "0". This confirms that the present method tracks multiple moving objects with higher accuracy.

FIG. 1 is a functional block diagram schematically showing the configuration of the moving image processing apparatus according to the present embodiment.
FIG. 2A is a conceptual diagram of multiple-object tracking.
FIG. 2B is a graphical model of the generative model used in the computations of the present method.
FIG. 3 is a flowchart showing the overall processing of the present method.
FIG. 4 is a flowchart showing details of step S3 in FIG. 3.
FIG. 5 is a flowchart showing details of step S34 in FIG. 4.
FIG. 6 is a flowchart showing details of step S343 in FIG. 5.
FIG. 7 is a flowchart showing details of step S344 in FIG. 5.
FIG. 8 is a flowchart showing details of step S345 in FIG. 5.
FIG. 9 is a flowchart showing details of step S346 in FIG. 5.
FIG. 10 is a flowchart showing details of step S35 in FIG. 4.
FIG. 11 is a flowchart showing details of step S4 in FIG. 3.
FIG. 12 is a flowchart showing details of step S5 in FIG. 3.
FIG. 13 is a flowchart showing details of step S6 in FIG. 3.
FIG. 14 shows the artificial data: (a) observation data and (b) ground truth.
FIG. 15 is a distribution plot of the normal-noise parameter q at the final time step for the present method (π = 0.2).
FIG. 16 is a distribution plot of q, the average velocity bias, in the experiment using real video (10th frame).
FIG. 17 is a distribution plot of q, the average velocity bias, in the experiment using real video (199th frame).
FIG. 18 shows eight consecutive snapshots of the tracking results in the experiment using real video.

Explanation of symbols

1 Moving image processing apparatus
2 Computing means
3 Storage means
4 Input means
5 Output means
6 Bus line
21 Hidden variable set estimation unit
22 Hidden state estimation unit
23 Resampling unit
24 Hyperparameter unit
211 Hidden variable determination unit
212 Hidden variable weight calculation unit
2111 Moving object count determination unit
2112 Motion pattern number determination unit
2113 Motion pattern parameter determination unit
2114 Moving object correspondence determination unit

Claims

A moving image processing method performed by a moving image processing apparatus which, when observation data on a moving image containing a plurality of moving objects is given as input, uses a probabilistic generative model to estimate the number of patterns of the dynamics and the features of each pattern, the dynamics representing a time evolution function of hidden state quantities indicating estimated true values for the plurality of moving objects and an observation function for computing the observation data, and thereby estimates the hidden state quantities of the plurality of moving objects, wherein
the moving image processing apparatus comprises storage means that stores the observation data, the hidden state quantity of each moving object, a plurality of hidden variables indicating the relationship between the observation data and the hidden state quantities of the moving objects, the weight of each hidden variable, and the hyperparameters of the dynamics, and computing means, and
the computing means repeatedly executes:
a step in which a hidden variable set estimation unit estimates the plurality of hidden variables and the weight of each hidden variable based on the observation data;
a step in which a hidden state estimation unit estimates the hidden state quantities based on the observation data, the hidden state quantities at the time immediately before the current time, and the plurality of hidden variables estimated by the hidden variable set estimation unit;
a step in which a resampling unit, based on the plurality of hidden variables estimated by the hidden variable set estimation unit and the hidden state quantities estimated by the hidden state estimation unit, performs resampling according to the weight distribution of the particles, each particle representing an estimate of the hidden variables and of the associated hidden state quantities, so as to estimate the posterior distribution with high accuracy; and
a step in which a hyperparameter update unit updates the hyperparameters of the dynamics based on the resampling result of the resampling unit,
whereby the hidden state estimation unit estimates the hidden state quantities of the plurality of moving objects.
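The method of claim 1 is a particle-filter-style loop. The toy sketch below shows the order of the four repeated steps on a single object moving in one dimension. Everything concrete in it is an assumption for illustration: the hypothetical function `track`, the sticky prior on the pattern index z (`stay`), the linear-Gaussian dynamics and observation model, and the moving-average update of the velocity biases q (`lr`). It is not the generative model of this patent, which also handles an unknown number of objects and observation-to-object correspondences.

```python
import math
import random


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def track(observations: list[float], biases: list[float], n_particles: int = 200,
          stay: float = 0.9, dyn_var: float = 1.0, obs_var: float = 0.25,
          lr: float = 0.1, seed: int = 0) -> tuple[list[float], list[float]]:
    """Toy 1-D particle filter that repeats the four steps of claim 1.

    Hypothetical model: one object, one hidden variable per particle (the
    dynamics pattern index z), x_t = x_{t-1} + q[z] + N(0, dyn_var),
    y_t = x_t + N(0, obs_var). Returns per-frame state estimates and final q.
    """
    rng = random.Random(seed)
    q = list(biases)  # dynamics hyperparameters: one velocity bias per pattern
    k = len(q)
    post_var = 1.0 / (1.0 / dyn_var + 1.0 / obs_var)
    particles = [(observations[0] + rng.gauss(0.0, math.sqrt(obs_var)), rng.randrange(k))
                 for _ in range(n_particles)]
    estimates = [sum(x for x, _ in particles) / n_particles]
    for y in observations[1:]:
        proposals, weights = [], []
        for x_prev, z_prev in particles:
            # (1) hidden variable and its weight: keep z with prob `stay`, else redraw;
            #     weight by the predictive likelihood of the observation y
            z = z_prev if rng.random() < stay else rng.randrange(k)
            x_pred = x_prev + q[z]
            weights.append(normal_pdf(y, x_pred, dyn_var + obs_var))
            # (2) hidden state: sample x_t from its Gaussian posterior given x_pred and y
            post_mean = post_var * (x_pred / dyn_var + y / obs_var)
            x = rng.gauss(post_mean, math.sqrt(post_var))
            proposals.append((x, z, x - x_prev))
        # (3) resampling: draw n_particles particles in proportion to their weights
        chosen = rng.choices(proposals, weights=weights, k=n_particles)
        # (4) hyperparameter update: nudge q[j] toward the mean velocity of pattern j
        for j in range(k):
            velocities = [v for _, zz, v in chosen if zz == j]
            if velocities:
                q[j] += lr * (sum(velocities) / len(velocities) - q[j])
        particles = [(x, z) for x, z, _ in chosen]
        estimates.append(sum(x for x, _ in particles) / n_particles)
    return estimates, q


# Usage: an object moving +1 per frame, observed with N(0, 0.5^2) noise.
noise = random.Random(1)
truth = [float(t) for t in range(30)]
obs = [x + noise.gauss(0.0, 0.5) for x in truth]
est, q = track(obs, biases=[-1.0, 0.5])
mean_abs_err = sum(abs(e - t) for e, t in zip(est, truth)) / len(truth)
print(round(mean_abs_err, 3), [round(b, 2) for b in q])
```

Weighting by the predictive likelihood and then sampling the state from its exact Gaussian posterior is one standard pairing for linear-Gaussian steps; a real tracker would replace both with the model's own density terms.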

The moving image processing method according to claim 1, wherein the step in which the hidden variable set estimation unit estimates the plurality of hidden variables and the weight of each hidden variable based on the observation data comprises:
a step in which a hidden variable determination unit estimates, for each particle, the plurality of hidden variables based on the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the weights of the hidden variables at the previous time; and
a step in which a hidden variable weight calculation unit estimates, for each particle, the weight of each hidden variable based on the plurality of hidden variables estimated by the hidden variable determination unit.
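The per-particle weight calculation of claim 2 is commonly carried out in the log domain so that products of small likelihoods do not underflow. This is a minimal sketch, assuming that each particle's unnormalized log-weight (for example, the log-likelihood of the observations given its sampled hidden variables) has already been computed. `normalize_log_weights` and `effective_sample_size` are hypothetical helper names, and the effective sample size is shown only as a common, assumed diagnostic for how concentrated the weights are.

```python
import math


def normalize_log_weights(log_w: list[float]) -> list[float]:
    """Turn unnormalized log-weights into weights that sum to 1 (log-sum-exp trick)."""
    m = max(log_w)  # subtract the maximum so the largest term becomes exp(0) = 1
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights: list[float]) -> float:
    """ESS = 1 / sum(w_i^2): N for uniform weights, 1 when one particle dominates."""
    return 1.0 / sum(w * w for w in weights)


# Log-likelihoods this small would underflow to 0.0 if exponentiated directly.
log_w = [-1000.0, -1001.0, -1005.0]
w = normalize_log_weights(log_w)
print([round(x, 4) for x in w], round(effective_sample_size(w), 3))
```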

The moving image processing method according to claim 2, wherein the step in which the hidden variable determination unit estimates, for each particle, the plurality of hidden variables based on the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the weights of the hidden variables at the previous time comprises:
a step of accepting as input the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the observation data at the current time;
a step in which a moving object count determination unit determines, among the plurality of hidden variables, the hidden variable representing whether each moving object exists in the moving image at a given time;
a step in which a motion pattern number determination unit determines, among the plurality of hidden variables, the hidden variable representing the identifier of the motion pattern of each moving object;
a step in which a motion pattern parameter determination unit determines the parameters of the motion pattern; and
a step in which a moving object correspondence determination unit determines, among the plurality of hidden variables, the hidden variable representing the correspondence between the observation data and the moving objects.
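In claim 3 the motion pattern number determination unit assigns each moving object a pattern identifier while the number of patterns is itself unknown. One common way to let the number of patterns grow with the data is a Chinese restaurant process prior. The sketch below assumes that choice (this section does not state it), with a hypothetical concentration parameter `alpha`: an existing pattern is chosen with probability proportional to the number of objects already using it, and a new pattern with probability proportional to `alpha`.

```python
import random
from collections import Counter


def sample_pattern_id(assignments: list[int], alpha: float, rng: random.Random) -> int:
    """Draw a motion pattern id from a Chinese restaurant process prior.

    `assignments` holds the pattern ids currently used by the other objects.
    Returns an existing id, or max(id) + 1 (0 if none) to open a new pattern.
    """
    counts = Counter(assignments)
    ids = sorted(counts)
    new_id = ids[-1] + 1 if ids else 0
    options = ids + [new_id]
    weights = [counts[k] for k in ids] + [alpha]
    return rng.choices(options, weights=weights, k=1)[0]


def crp_probabilities(assignments: list[int], alpha: float) -> dict[int, float]:
    """Exact prior probabilities used by sample_pattern_id."""
    counts = Counter(assignments)
    total = len(assignments) + alpha
    probs = {k: c / total for k, c in sorted(counts.items())}
    probs[max(counts) + 1 if counts else 0] = alpha / total
    return probs


rng = random.Random(0)
current = [0, 0, 0, 1]  # three objects in pattern 0, one in pattern 1
print(crp_probabilities(current, alpha=1.0))  # {0: 0.6, 1: 0.2, 2: 0.2}
print(sample_pattern_id(current, alpha=1.0, rng=rng))
```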

A moving image processing apparatus which, when observation data on a moving image containing a plurality of moving objects is given as input, uses a probabilistic generative model to estimate the number of patterns of the dynamics and the features of each pattern, the dynamics representing a time evolution function of hidden state quantities indicating estimated true values for the plurality of moving objects and an observation function for computing the observation data, and thereby estimates the hidden state quantities of the plurality of moving objects, the apparatus comprising:
storage means that stores the observation data, the hidden state quantity of each moving object, a plurality of hidden variables indicating the relationship between the observation data and the hidden state quantities of the moving objects, the weight of each hidden variable, and the hyperparameters of the dynamics; and
computing means comprising:
a hidden variable set estimation unit that estimates the plurality of hidden variables and the weight of each hidden variable based on the observation data;
a hidden state estimation unit that estimates the hidden state quantities based on the observation data, the hidden state quantities at the time immediately before the current time, and the plurality of hidden variables estimated by the hidden variable set estimation unit;
a resampling unit that, based on the plurality of hidden variables estimated by the hidden variable set estimation unit and the hidden state quantities estimated by the hidden state estimation unit, performs resampling according to the weight distribution of the particles, each particle representing an estimate of the hidden variables and of the associated hidden state quantities, so as to estimate the posterior distribution with high accuracy; and
a hyperparameter update unit that updates the hyperparameters of the dynamics based on the resampling result of the resampling unit,
the computing means estimating the hidden state quantities of the plurality of moving objects by means of the hidden state estimation unit by repeating the processing of the hidden variable set estimation unit, the hidden state estimation unit, the resampling unit, and the hyperparameter update unit.
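The resampling unit of claim 4 draws a new particle set in proportion to the particle weights. This section does not specify the resampling scheme. Systematic resampling, sketched below as an assumed choice, uses a single uniform offset and N evenly spaced points, which typically gives lower variance than independent multinomial draws. `systematic_resample` is a hypothetical helper that returns ancestor indices, so each particle's hidden variables and hidden state quantities can be copied together.

```python
import random


def systematic_resample(weights: list[float], rng: random.Random) -> list[int]:
    """Return N ancestor indices drawn by systematic resampling.

    `weights` must be non-negative and sum to 1 (up to rounding).
    """
    n = len(weights)
    u0 = rng.random() / n  # one uniform offset in [0, 1/n)
    indices, cumulative, i = [], weights[0], 0
    for j in range(n):
        u = u0 + j / n  # evenly spaced points u0, u0 + 1/n, u0 + 2/n, ...
        # Advance to the first particle whose cumulative weight reaches u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


w = [0.1, 0.6, 0.3, 0.0]
print(systematic_resample(w, random.Random(42)))
```

A particle with weight w_i receives either floor(N * w_i) or ceil(N * w_i) copies, and a particle with zero weight is never selected.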

The moving image processing apparatus according to claim 4, wherein the hidden variable set estimation unit comprises:
a hidden variable determination unit that estimates, for each particle, the plurality of hidden variables based on the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the weights of the hidden variables at the previous time; and
a hidden variable weight calculation unit that estimates, for each particle, the weight of each hidden variable based on the plurality of hidden variables estimated by the hidden variable determination unit.

The moving image processing apparatus according to claim 5, wherein the hidden variable determination unit accepts as input the hidden state quantities at the previous time, the plurality of hidden variables at the previous time, and the observation data at the current time, and comprises:
a moving object count determination unit that determines, among the plurality of hidden variables, the hidden variable representing whether each moving object exists in the moving image at a given time;
a motion pattern number determination unit that determines, among the plurality of hidden variables, the hidden variable representing the identifier of the motion pattern of each moving object;
a motion pattern parameter determination unit that determines the parameters of the motion pattern; and
a moving object correspondence determination unit that determines, among the plurality of hidden variables, the hidden variable representing the correspondence between the observation data and the moving objects.
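The moving object correspondence determination unit of claim 6 decides which observation belongs to which moving object, but this section does not say how. One simple assumed scheme, sketched below, samples each observation's object label in proportion to a Gaussian likelihood around each object's predicted position. It adds an extra hypothetical "clutter" option with constant density `clutter_density` for observations that belong to no object. Positions are 2-D tuples, and all function and parameter names are illustrative.

```python
import math
import random

Point = tuple[float, float]


def gaussian_2d(obs: Point, mean: Point, var: float) -> float:
    """Isotropic 2-D Gaussian density N(obs; mean, var * I)."""
    d2 = (obs[0] - mean[0]) ** 2 + (obs[1] - mean[1]) ** 2
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var)


def association_probs(obs: Point, predicted: list[Point], var: float,
                      clutter_density: float) -> list[float]:
    """Probability that `obs` came from each object, plus a final entry for clutter."""
    scores = [gaussian_2d(obs, p, var) for p in predicted] + [clutter_density]
    total = sum(scores)
    return [s / total for s in scores]


def sample_correspondence(observations: list[Point], predicted: list[Point], var: float,
                          clutter_density: float, rng: random.Random) -> list[int]:
    """Sample an object index for each observation; len(predicted) means clutter."""
    labels = []
    for obs in observations:
        probs = association_probs(obs, predicted, var, clutter_density)
        labels.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return labels


predicted = [(0.0, 0.0), (10.0, 0.0)]  # predicted positions of two objects
observations = [(0.3, -0.2), (9.8, 0.4), (5.0, 20.0)]
print(sample_correspondence(observations, predicted, var=1.0,
                            clutter_density=1e-4, rng=random.Random(0)))
```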

A moving image processing program for causing a computer to function as the moving image processing apparatus according to any one of claims 4 to 6.