JP2011243229A

JP2011243229A - Object tracking device and object tracking method

Info

Publication number: JP2011243229A
Application number: JP2011192418A
Authority: JP
Inventors: Hidetomo Sakaino; 英朋境野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-09-05
Filing date: 2011-09-05
Publication date: 2011-12-01
Anticipated expiration: 2028-11-17
Also published as: JP5216902B2

Abstract

PROBLEM TO BE SOLVED: To provide an object tracking device and an object tracking method that can track, at high speed, an object recorded in a moving image using a particle method. SOLUTION: A high-speed object tracking device introduces an average value shift method between the prediction model and the observation model of the particle method, uses the average value shift method to search for the position whose image feature amount is most similar to that of the object to be tracked, and moves the samples to the position found. This reduces the amount of weighting calculation in the observation model.

Description

The present invention relates to an object tracking device and an object tracking method that track an object captured in a moving image using a particle method.

Tracking of cars, people, fish, and the like has a wide range of applications. One example is a method that detects the center of gravity of the object to be tracked and follows that center of gravity using a linear model. In real environments, however, the tracked object is known to be lost frequently because it overlaps with buildings or other objects.

For this reason, tracking methods that apply a Kalman filter or a particle method to the time-series image frames constituting a video (moving image) are used (see Non-Patent Documents 1 and 2). In particular, because the particle method models state variables of the object that change according to a non-Gaussian distribution, it can cope with abrupt changes in the object's position. The conventional particle method is described below.

The particle method is a statistically efficient way of estimating the state (position, size) of the object to be tracked when that state cannot be observed directly. Ordinarily, the state of the object can be estimated only from the changes and distribution of the observed video, and the true state cannot be grasped directly because of noise and similar effects.

Specifically, as shown in FIG. 7, the state of the object is recursively estimated by maximum likelihood from the video through two processing stages, a prediction model and an observation model. The particle method uses hundreds to tens of thousands of samples (tracking indices) that approximate the state (position, size) of the object. In general, in a given recursion step, a predetermined number of samples S is selected by the resampling process from the image frame at the previous time t−1.

In the prediction model, the position of each sample S at time t is then predicted dynamically according to a predetermined system model. The samples S′ placed at the predicted positions are uniform in size, and one sample may be spread into several samples according to the size (weight) of the sample S before prediction. This changes the distribution density of the samples, so that the samples appear to diffuse and move.

Thereafter, in the observation model, a weight corresponding to each predicted position is obtained from the image frame at time t, and each sample S′ placed at a predicted position is weighted accordingly. Displaying the samples at sizes corresponding to their weights makes it possible to track the object.

Non-Patent Document 1: Greg Welch and one other, "An Introduction to the Kalman Filter", TR 95-041, Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599-3175, July 24, 2006
Non-Patent Document 2: Genshiro Kitagawa, "Introduction to Time Series Analysis", Iwanami Shoten, February 24, 2005, pp. 216-223
Non-Patent Document 3: "Brownian Motion", [online], [retrieved October 27, 2008], Internet <URL: http://www1.parkcity.ne.jp/yone/math/mathB03_11.htm>

However, the particle method described above needs a very large number of samples to track an object with high accuracy. As the number of samples grows, the processing time in the prediction model and the observation model increases, and real-time performance is lost. Specifically, as the number of samples S increases, the number of predicted samples S′ also increases, so that the weighting in the observation model takes a long time.

The present invention has been made in view of the above, and its object is to provide an object tracking device and an object tracking method that can track, at high speed, an object captured in a moving image using a particle method.

The invention according to claim 1 is an object tracking device that tracks an object captured in a moving image using a particle method, the device comprising:
storage means for storing a moving image composed of a plurality of time-series image frames;
prediction means for predicting, in accordance with a predetermined system model used in the particle method, the position to which a tracking index representing the object captured in the image frame at time t−1 moves at the next time t;
search means for specifying an arbitrary search region in the image frame at time t, calculating the color distribution of the object at time t−1, $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ (where $\sum_{u=1}^{m}\hat{q}_u = 1$ and $m$ is the quantization level of the pixel gray values in the color distribution), calculating the color distribution of the search region at time t, $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ (where $\sum_{u=1}^{m}\hat{p}_u = 1$ and $y$ is the position of the search region), and searching for the position of the search region at which the similarity between the color distribution $\hat{q}$ of the object and the color distribution $\hat{p}(y)$ of the search region, $\rho(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u}$, is maximized, by shifting the search region in the direction in which the similarity increases, the search being performed by an average value shift method that
calculates the color distribution at position $\hat{y}_0$ using $\hat{p}(y) = \{\hat{p}_u(\hat{y}_0)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}$,
calculates the shift-destination position $\hat{y}_1$ using
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}$$
(where $w_i$ is a value calculated using at least $\sum_{u=1}^{m}\sqrt{\hat{q}_u / \hat{p}_u(\hat{y}_0)}$; $g(x)$ is the first derivative of $k(x)$, with $g(x) = -\frac{d+2}{n h^{3}}\sum_{i=1}^{n_h}(x_i - x)$; $k(x) = 0.5\,(d+2)(1 - x^{T}x)$ when $x^{T}x < 1$ and $k(x) = 0$ when $x^{T}x > 1$; $d$ is 2; $h$ is the radius of the search region, selected as appropriate in the range of 2 to 6; and $i$ $(= 1, \dots, n_h)$ is the index of a pixel contained within the radius $h$),
calculates the color distribution at position $\hat{y}_1$ using $\hat{p}(y) = \{\hat{p}_u(\hat{y}_1)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u}$, and
shifts the search region and repeats the iterative calculation until the calculated value of the similarity $\rho[\hat{p}(\hat{y}_1), \hat{q}]$ becomes smaller than the value of the similarity $\rho[\hat{p}(\hat{y}_0), \hat{q}]$ (provided that, when the luminance values of the pixels contained within the radius $h$ are at or below a fixed value, the search region is shifted to the next search region without calculating the similarity); and
estimation means for correcting the predicted position of the tracking index to the searched position, calculating an image feature amount corresponding to the corrected predicted position using the image frame at time t, weighting the tracking index moved to the corrected predicted position using that image feature amount, and estimating the result as the tracking index at time t.

The invention according to claim 2 is an object tracking method performed by an object tracking device that tracks an object captured in a moving image using a particle method, the method comprising, by the object tracking device:
a step of storing a moving image composed of a plurality of time-series image frames;
a step of predicting, in accordance with a predetermined system model used in the particle method, the position to which a tracking index representing the object captured in the image frame at time t−1 moves at the next time t;
a step of specifying an arbitrary search region in the image frame at time t, calculating the color distribution of the object at time t−1, $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ (where $\sum_{u=1}^{m}\hat{q}_u = 1$ and $m$ is the quantization level of the pixel gray values in the color distribution), calculating the color distribution of the search region at time t, $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ (where $\sum_{u=1}^{m}\hat{p}_u = 1$ and $y$ is the position of the search region), and searching for the position of the search region at which the similarity between the color distribution $\hat{q}$ of the object and the color distribution $\hat{p}(y)$ of the search region, $\rho(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u}$, is maximized, by shifting the search region in the direction in which the similarity increases, the search being performed by an average value shift method that
calculates the color distribution at position $\hat{y}_0$ using $\hat{p}(y) = \{\hat{p}_u(\hat{y}_0)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}$,
calculates the shift-destination position $\hat{y}_1$ using
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}$$
(where $w_i$ is a value calculated using at least $\sum_{u=1}^{m}\sqrt{\hat{q}_u / \hat{p}_u(\hat{y}_0)}$; $g(x)$ is the first derivative of $k(x)$, with $g(x) = -\frac{d+2}{n h^{3}}\sum_{i=1}^{n_h}(x_i - x)$; $k(x) = 0.5\,(d+2)(1 - x^{T}x)$ when $x^{T}x < 1$ and $k(x) = 0$ when $x^{T}x > 1$; $d$ is 2; $h$ is the radius of the search region, selected as appropriate in the range of 2 to 6; and $i$ $(= 1, \dots, n_h)$ is the index of a pixel contained within the radius $h$),
calculates the color distribution at position $\hat{y}_1$ using $\hat{p}(y) = \{\hat{p}_u(\hat{y}_1)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u}$, and
shifts the search region and repeats the iterative calculation until the calculated value of the similarity $\rho[\hat{p}(\hat{y}_1), \hat{q}]$ becomes smaller than the value of the similarity $\rho[\hat{p}(\hat{y}_0), \hat{q}]$ (provided that, when the luminance values of the pixels contained within the radius $h$ are at or below a fixed value, the search region is shifted to the next search region without calculating the similarity); and
a step of correcting the predicted position of the tracking index to the searched position, calculating an image feature amount corresponding to the corrected predicted position using the image frame at time t, weighting the tracking index moved to the corrected predicted position using that image feature amount, and estimating the result as the tracking index at time t.

According to the present invention, it is possible to provide an object tracking device and an object tracking method that can track, at high speed, an object captured in a moving image using a particle method.

FIG. 1 is a functional block diagram showing the functional configuration of the object tracking device according to the present embodiment.
FIG. 2 is an explanatory diagram illustrating the state of the samples in the present embodiment.
FIG. 3 is a diagram showing the similarity distribution of the Bhattacharyya coefficient in the average value shift method.
FIG. 4 is a flowchart showing the processing flow of the object estimation device according to the present embodiment.
FIG. 5 is a diagram showing the positions of the samples relative to the similarity given by the Bhattacharyya coefficient.
FIG. 6 is a diagram showing the result of tracking two fish swimming with abrupt movements in a water tank.
FIG. 7 is an explanatory diagram illustrating the state of the samples in the conventional method.

FIG. 1 is a functional block diagram showing the functional configuration of the object tracking device according to the present embodiment. The object tracking device 100 comprises an input unit 11, a prediction unit 12, a search unit 13, an estimation unit 14, a display unit 15, and a storage unit 31.

The input unit 11 has a function of accepting the input of a video in which a moving object is captured, such as a fish swimming in a water tank or a person crossing an intersection.

The storage unit 31 has a function of storing the video accepted by the input unit 11 as a plurality of time-series image frames. A storage device such as a memory or a hard disk is generally used as the storage unit 31. It may be located inside the object tracking device 100, or it may be an external storage device that can be connected through a communication network.

The prediction unit 12 performs the same processing as the prediction model of the particle method. Specifically, it has a function of predicting, in accordance with a predetermined system model used in the particle method, the position to which a sample (tracking index) representing the object to be tracked, as captured in the image frame at time t−1, moves at the next time t. The specific processing is described below.

As described above, the samples approximate the state (position, size) of the object, and there are usually hundreds to tens of thousands of them. In a given recursion step, the prediction unit 12 selects N samples $S_{t-1}^{(n)}$ by the resampling process from the image frame $Z_{t-1}$ at the previous time t−1. Let the weighting amount for each sample be $\pi_{t-1}^{(n)}$, and let $p(X_{t-1} \mid Z_{t-1})$ be the conditional probability that the state (position, size) of the samples is $X_{t-1}$ given the image frame $Z_{t-1}$. The state of the samples after selection can then be represented schematically as in the top row of FIG. 2, where each sample is drawn at a size corresponding to its weighting amount. The weighting amount is an image feature amount of the object obtained from the image frame. Specifically, image intensity, edge strength, color, texture strength, and the like can be used. The weighting amounts are normalized so that their sum in each recursion step is 1.
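The selection of N samples according to their weighting amounts can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation: the function names `normalize` and `resample`, the representation of a sample as a position tuple, and the use of multinomial resampling are all assumptions made for the example.

```python
import random

def normalize(weights):
    """Scale the weighting amounts so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def resample(samples, weights, n, rng=None):
    """Draw n samples with probability proportional to their weights.

    Heavily weighted samples tend to be drawn several times, and
    lightly weighted samples tend to disappear (multinomial resampling,
    chosen here only for illustration).
    """
    rng = rng or random.Random()
    return rng.choices(samples, weights=normalize(weights), k=n)
```

For instance, `resample([(0, 0), (5, 5)], [0.0, 2.0], 4)` can only return copies of `(5, 5)`, because the first sample carries zero weight.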

Thereafter, the prediction unit 12 calculates, in accordance with the predetermined system model described above, the predicted positions to which the N selected samples $S_{t-1}^{(n)}$ move at the next time t. Let $p(X_t \mid Z_{t-1})$ be the conditional probability that the predicted state (position, size) of the samples at time t is $X_t$ given the image frame $Z_{t-1}$. The state after prediction can then be represented schematically as in the second row from the top of FIG. 2. The samples placed at the predicted positions are uniform in size, and one sample is spread into several samples according to the size (weight) of the sample before prediction. As the predetermined system model, a random walk model (see Non-Patent Document 3) or a motion model can be used, for example. This changes the distribution density of the samples, so that the samples appear to diffuse and move.
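A prediction step based on a random walk model might look like the sketch below. The Gaussian step distribution, the parameter `sigma`, and the function name `predict_random_walk` are assumptions for illustration. The text names the random walk model only as one possible system model and does not specify its parameters.

```python
import random

def predict_random_walk(samples, sigma, rng=None):
    """Move each (x, y) sample by an independent Gaussian step.

    sigma is a hypothetical step size in pixels; a larger value spreads
    the predicted samples more widely around their previous positions.
    """
    rng = rng or random.Random()
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in samples]
```

Because resampling duplicates heavily weighted samples, applying this step to the duplicates spreads one sample into several nearby ones, which corresponds to the diffusion described above.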

The search unit 13 has a function of specifying an arbitrary search region in the image frame at time t, using the average value shift method (MeanShift method) to calculate the similarity between the image feature amount of the object at time t−1 and the image feature amount of the search region at time t, shifting the search region in the direction in which the similarity increases, and searching for the position at which the similarity is maximized. The specific processing is described below.

The average value shift method is usually known as a fast noise-removal method, but it is also applied to tracking, in which an object in an image is replaced by a histogram-based image feature amount and then followed. As the histogram-based image feature amount, a gradient value relating to color (a first derivative, hereinafter referred to as the "color distribution") or the like can be used. In other words, the average value shift method uses the color distribution of the object to be tracked as a model feature, searches the image frame at the next time for a similar region centered on the tracking result of the image frame at the previous time, and thereby follows the object continuously. If similar regions were searched for naively, the search range would become enormous, which could increase the search time and cause false detections. In the average value shift method, the color distribution can therefore be smoothed by convolution with an isotropic kernel (for example, a Gaussian distribution), which suppresses excess noise and stabilizes the calculation. A more specific processing flow is described below.

First, the search unit 13 specifies an arbitrary search region in the image frame at time t, calculates the color distribution of the object at time t−1, and calculates the color distribution of the search region at time t (step S101). The color distribution of the object is given by equation (1), and the color distribution of the search region by equation (2). Here $m$ corresponds to the quantization level of the pixel gray values in the color distribution, and $y$ in equation (2) denotes the position of the search region.

$$\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m}\hat{q}_u = 1 \tag{1}$$

$$\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m}\hat{p}_u = 1 \tag{2}$$

Next, the search unit 13 calculates the similarity between the color distribution of the object and the color distribution of the search region, and moves the search region by hill climbing in the direction in which the similarity becomes maximal (step S102). Specifically, the Bhattacharyya coefficient shown in equation (3) is used as the index of similarity between the color distributions.

$$\rho(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{3}$$
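As a small illustration of this similarity index, the Bhattacharyya coefficient of two normalized histograms can be computed as below. The function name `bhattacharyya` and the list-based histogram representation are assumptions made for the example.

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two histograms of equal length.

    Both p and q are assumed to be normalized (each sums to 1). The
    result lies between 0 (no overlapping bins) and 1 (identical
    distributions).
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))
```

Two histograms with no bin in common give 0, and the value grows toward 1 as the histograms become more alike.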

The coefficient becomes larger as the similarity between the two color distributions increases, and it returns 1 when they match completely. Tracking is achieved by searching for the position $y$ that maximizes this coefficient. The tracking process is described concretely below. First, the color distribution at position $\hat{y}_0$ is calculated using equation (4), and equation (5) is evaluated (step S102a).

$$\hat{p}(y) = \{\hat{p}_u(\hat{y}_0)\}_{u=1,\dots,m} \tag{4}$$

$$\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u} \tag{5}$$

Next, the destination position $\hat{y}_1$ is calculated using equation (6) (step S102b). Here $g(x)$ is the first derivative of $k(x)$. The value of $d$ is 2, and $h$ is selected as appropriate in the range of 2 to 6. Note that $h$ is the radius of the search region, and $i$ is the index of a pixel contained within the radius $h$ ($i = 1, \dots, n_h$).

$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)} \tag{6}$$

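Thereafter, the color distribution at position $\hat{y}_1$ is evaluated using equation (7), and equation (8) is calculated (step S102c).

$$\hat{p}(y) = \{\hat{p}_u(\hat{y}_1)\}_{u=1,\dots,m} \tag{7}$$

$$\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u} \tag{8}$$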

The iterative calculation is repeated, substituting $\hat{y}_1$ for $\hat{y}_0$ each time, until the result of equation (8) becomes smaller than the result of equation (5) (step S102d).
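The iteration of steps S102a to S102d can be sketched as follows, under several simplifying assumptions that are not taken from the text: the frame is given as a 2-D list of already quantized bin indices, the per-pixel weight is taken as $w_i = \sqrt{\hat{q}_b / \hat{p}_b(\hat{y}_0)}$ for the bin $b$ of pixel $i$ (a commonly used form), and the kernel term $g$ is treated as constant inside the radius so that it cancels in the ratio. The helper names, `max_iter`, and `eps` are hypothetical. The latter two are added only so that the loop also ends when the position stops moving.

```python
import math

def region_pixels(image, center, h):
    """Return (row, col, bin) for every pixel within radius h of center."""
    cy, cx = center
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(max(0, int(cy - h)), min(rows, int(cy + h) + 1)):
        for c in range(max(0, int(cx - h)), min(cols, int(cx + h) + 1)):
            if (r - cy) ** 2 + (c - cx) ** 2 <= h * h:
                out.append((r, c, image[r][c]))
    return out

def histogram(pixels, m):
    """Normalized m-bin histogram of the bin indices in pixels."""
    hist = [0.0] * m
    for _, _, b in pixels:
        hist[b] += 1.0
    n = len(pixels)
    return [v / n for v in hist] if n else hist

def bhattacharyya(p, q):
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def mean_shift(image, q, y0, h, m, max_iter=20, eps=1e-3):
    """Shift y0 toward the region whose histogram best matches q."""
    for _ in range(max_iter):
        pixels = region_pixels(image, y0, h)
        p0 = histogram(pixels, m)
        rho0 = bhattacharyya(p0, q)            # similarity at y0
        wsum = wy = wx = 0.0
        for r, c, b in pixels:
            w = math.sqrt(q[b] / p0[b])        # p0[b] > 0: pixel is in region
            wsum += w
            wy += w * r
            wx += w * c
        if wsum == 0.0:                        # no target colour in the region
            break
        y1 = (wy / wsum, wx / wsum)            # weighted mean of pixel positions
        rho1 = bhattacharyya(histogram(region_pixels(image, y1, h), m), q)
        # Stop when the similarity drops, or when the shift is negligible.
        if rho1 < rho0 or math.dist(y0, y1) < eps:
            break
        y0 = y1
    return y0
```

With a target histogram concentrated in one bin, each iteration pulls the region center toward the pixels of that bin, and the similarity rises until the region is centered on them.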

Finally, the search unit 13 takes the position of the search region at which the similarity was maximized in step S102 as the search position (step S103).

As described above, the Bhattacharyya coefficient is used to calculate the similarity in the average value shift method, so the similarity distribution is smooth. As shown in FIG. 3, the most similar convergence position (▽) can therefore be found quickly from the initial position (○) without being trapped in a local extremum. That is, a simple similarity calculation yields multiple local maxima, whereas calculating the similarity with the Bhattacharyya coefficient yields a unimodal maximum (the global maximum), so the search is accelerated by the smoothing of the similarity.

In the processing of step S102, the search unit 13 can also shift to the next search region without calculating the similarity when the luminance values of the pixels contained in the region of radius $h$ are at or below a fixed value. For example, with 8-bit gradation (256 levels), it is possible to skip the similarity calculation for any search region that contains pixels with a luminance value below 10, or to treat the pixels in the search region whose luminance value is below 10 as absent. Because fewer search regions are subject to the similarity calculation, the position of the search region with the maximum similarity can be found quickly.
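The two variants mentioned in this paragraph might be sketched as follows. The function names are hypothetical, and the default threshold of 10 is taken from the 8-bit example above.

```python
def has_dark_pixel(luma_values, threshold=10):
    """True if the region contains a pixel darker than the threshold.

    A caller could use this to skip the similarity calculation for the
    region and move on to the next search region.
    """
    return any(v < threshold for v in luma_values)

def drop_dark_pixels(luma_values, threshold=10):
    """Alternative: treat pixels darker than the threshold as absent."""
    return [v for v in luma_values if v >= threshold]
```

Either check is a single pass over the pixels of the region, so it costs less than building a histogram and evaluating the similarity for a region that would be discarded anyway.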

The estimation unit 14 performs the same processing as the observation model of the particle method. Specifically, it has a function of correcting the predicted position predicted by the prediction unit 12 using the search position found by the search unit 13, calculating an image feature amount corresponding to the corrected predicted position using the image frame at time t, weighting the samples moved to the corrected predicted position using that image feature amount, and estimating the result as the tracking index at time t.

As described above, the observation model of the particle method weights the samples at the positions predicted by the prediction model. The estimation unit 14 according to the present embodiment, by contrast, corrects the predicted position so that it coincides with the search position found by the search unit 13, as shown in the lower half of FIG. 2. It then calculates the weight corresponding to the predicted position from the image frame $Z_t$ at time t and performs the weighting. Let $p(X_t \mid Z_t)$ be the conditional probability that the state (position, size) of the samples is $X_t$ given the image frame $Z_t$. The state after weighting can then be represented schematically as in the bottom row of FIG. 2.
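The correction and weighting performed by the estimation unit might be sketched as below. Both callables are placeholders assumed for the example: `refine` stands for whatever moves a predicted position to the search position (for instance, an average value shift search), and `feature` stands for the image feature amount evaluated on the frame at time t. The uniform fallback for an all-zero feature is also an assumption.

```python
def estimate(predicted, refine, feature):
    """Correct predicted positions, then weight them with normalized features.

    Returns a list of (position, weight) pairs whose weights sum to 1.
    """
    corrected = [refine(pos) for pos in predicted]
    raw = [feature(pos) for pos in corrected]
    total = sum(raw)
    if total <= 0:
        # No sample has any support in the frame: fall back to uniform weights.
        weights = [1.0 / len(corrected)] * len(corrected)
    else:
        weights = [r / total for r in raw]
    return list(zip(corrected, weights))
```

When many predicted samples are corrected to the same search position, their feature values coincide, which is consistent with the abstract's statement that the amount of weighting calculation in the observation model is reduced.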

The display unit 15 has a function of displaying the state of the samples after weighting.

Next, the processing flow of the object estimation device according to the present embodiment is described. FIG. 4 is a flowchart showing that processing flow. First, the input unit 11 accepts the input of a video in which a moving object is captured and stores it in the storage unit 31 as a plurality of time-series image frames (step S201).

Next, the prediction unit 12 predicts, in accordance with a predetermined system model used in the particle method, the position to which a sample representing the object to be tracked, as captured in the image frame at time t−1, moves at the next time t (step S202).

The search unit 13 then specifies an arbitrary search region in the image frame at time t, uses the average value shift method to calculate the similarity between the color distribution of the object at time t−1 and the color distribution of the search region at time t, shifts the search region in the direction in which the similarity increases, and searches for the position at which the similarity is maximized (step S203).

Thereafter, the estimation unit 14 corrects the predicted position using the search position, calculates the image feature amount corresponding to the corrected predicted position using the image frame at time t, weights the samples moved to the corrected predicted position using that image feature amount, and estimates the result as the samples at time t (step S204).

Finally, the display unit 15 displays the state of the weighted samples (step S205).

By performing steps S201 to S205 on all the image frames stored in the storage unit 31, the object shown in the video can be tracked sequentially.
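The per-frame flow can be summarized as a loop. The sketch below only shows the order of the steps; `predict`, `search`, `estimate`, and `display` are hypothetical callables standing in for units 12 to 15, and their signatures are assumptions made for illustration.

```python
def track(frames, samples, predict, search, estimate, display):
    """Run steps S202 to S205 over every consecutive pair of stored frames."""
    for t in range(1, len(frames)):
        predicted = predict(samples)                      # step S202
        found = [search(frames[t - 1], frames[t], pos)    # step S203
                 for pos in predicted]
        samples, weights = estimate(frames[t], found)     # step S204
        display(t, samples, weights)                      # step S205
    return samples


# Usage with trivial stand-ins: each sample drifts one pixel per frame.
shown = []
final = track(
    frames=[0, 1, 2, 3],
    samples=[(0, 0)],
    predict=lambda s: [(x + 1, y) for x, y in s],
    search=lambda prev, cur, pos: pos,
    estimate=lambda frame, found: (found, [1.0 / len(found)] * len(found)),
    display=lambda t, s, w: shown.append(t),
)
```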

FIG. 5 shows the positions of the samples relative to the similarity given by the Bhattacharyya coefficient. The three-dimensionally displayed distribution is the similarity, measured by the Bhattacharyya coefficient, between the color distribution of the object at time t-1 and the color distribution of the image frame at time t. Circles (○) mark the samples, and the star (☆) marks the position of maximum similarity. The figure shows the state in which the processing of steps S203 and S204 has moved the plurality of samples to the position of maximum similarity.

FIG. 6 shows the results of tracking two fish swimming with abrupt movements in a water tank. FIG. 6(a) shows the tracking result of the conventional method, and FIG. 6(b) shows the tracking result of the present embodiment. With the conventional method, the circles (○) indicating the tracking result appear on only one of the two fish, whereas in the present embodiment they appear on both fish, which shows that even abruptly moving objects can be tracked quickly and stably. In this tracking, the search processing in the search unit 13 calculates the similarity using color histogram distributions. Specifically, to improve robustness, the HSV color conversion model was applied to the RGB color information, and the similarity was calculated using the H (hue) information. The hue values of the object's pixels are histogrammed over the range 0 to 360 degrees and divided into 12 bins of 30 degrees each, and the search between image frames uses these 12 bins. The contour of the object could be tracked instead of its color, but when the object is a fish its shape changes greatly with the viewing direction, so the object may be lost. Using color as the image feature amount therefore allows the object to be tracked more stably.
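The 12-bin hue histogram described above can be sketched with the Python standard library. The function name `hue_histogram` and the pixel representation (a list of 8-bit RGB tuples) are assumptions made for illustration.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized hue histogram of (r, g, b) pixels with 8-bit channels.

    With bins=12, each bin covers 30 degrees of the 0 to 360 degree hue range.
    """
    counts = [0] * bins
    bin_width = 360.0 / bins
    n = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns hue in [0, 1).
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        index = min(int((h * 360.0) // bin_width), bins - 1)
        counts[index] += 1
        n += 1
    if n == 0:
        return [0.0] * bins
    return [c / n for c in counts]


# Usage: two pure-red pixels and two blue-green pixels.
hist = hue_histogram([(255, 0, 0), (255, 0, 0), (0, 255, 128), (0, 255, 128)])
```

Using hue alone discards brightness and saturation, which is one plausible reason the choice improves robustness to lighting changes in the tank.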

According to the present embodiment, the mean shift method (average value shift method) is introduced between the prediction model processing and the observation model processing of the particle method. The mean shift method searches for the position whose image feature amount is most similar to that of the tracked object, and the samples are moved to the searched position. As shown in the bottom row of FIG. 2, this reduces the amount of weighting calculation in the observation model, so a high-speed object tracking device can be provided.

Finally, the object tracking device described in each embodiment is implemented on a computer, and the processing of each functional block is executed by a program. The processing operations of the object tracking device described in each embodiment can of course also be recorded as a program on a recording medium such as a compact disc or a floppy (registered trademark) disk. The recording medium can then be loaded into a computer, or the program recorded on it can be downloaded to a computer over an arbitrary communication line or installed from the medium, and running the computer with the program causes it to function as the object tracking device performing the processing operations described above.

Note that the object tracking device described in the present embodiment is applicable in particular to the technical fields of multimedia, coding, communications, and video surveillance.

Description of reference signs
11 … input unit
12 … prediction unit
13 … search unit
14 … estimation unit
15 … display unit
31 … storage unit
100 … object tracking device
S101 to S103, S102a to S102d … steps
S201 to S205 … steps

Claims

An object tracking device that tracks an object captured in a moving image using a particle method, the object tracking device comprising:
storage means for storing a moving image composed of a plurality of time-series image frames;
prediction means for predicting, according to a predetermined system model used in the particle method, the position to which a tracking index representing the object captured in the image frame at time t-1 moves at the next time t;
search means for specifying an arbitrary search region in the image frame at time t and searching, by a mean shift method, for the position of the search region at which the similarity between the color distribution of the search region and the color distribution of the object is maximized, wherein the search means
calculates the color distribution of the object at time t-1, $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m} \hat{q}_u = 1$, where m is the number of quantization levels of the pixel gray values in the color distribution, and the color distribution of the search region at time t, $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m} \hat{p}_u = 1$, where y is the position of the search region,
shifts the search region in the direction in which the similarity
$$\rho(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}$$
increases,
calculates the color distribution at position $\hat{y}_0$ using $\hat{p}(\hat{y}_0) = \{\hat{p}_u(\hat{y}_0)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}$,
calculates the shift destination position $\hat{y}_1$ using
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\| (\hat{y}_0 - x_i)/h \right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\| (\hat{y}_0 - x_i)/h \right\|^2\right)}$$
where $w_i$ is a value calculated using at least $\sum_{u=1}^{m} \sqrt{\hat{q}_u / \hat{p}_u(\hat{y}_0)}$, $g(x)$ is the first derivative of $k(x)$ with $g(x) = -\frac{d+2}{n h^3} \sum_{i=1}^{n_h} (x_i - x)$, $k(x) = 0.5\,(d+2)\,(1 - x^T x)$ when $x^T x < 1$ and $k(x) = 0$ when $x^T x > 1$, d is 2, h is the radius of the search region, appropriately selected within a range of 2 to 6, and i (= 1 to $n_h$) is the index of a pixel included within the radius h,
calculates the color distribution at position $\hat{y}_1$ using $\hat{p}(\hat{y}_1) = \{\hat{p}_u(\hat{y}_1)\}_{u=1,\dots,m}$ and calculates $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u}$, and
shifts the search region and repeats the iterative calculation until the similarity $\rho[\hat{p}(\hat{y}_1), \hat{q}]$ becomes smaller than the similarity $\rho[\hat{p}(\hat{y}_0), \hat{q}]$, provided that, when the luminance values of the pixels included within the radius h are equal to or less than a certain value, the search means shifts to the next search region without calculating the similarity; and
estimation means for correcting the predicted position of the tracking index to the searched position, calculating, using the image frame at time t, an image feature amount corresponding to the corrected predicted position, weighting, using the image feature amount, the tracking index moved to the corrected predicted position, and estimating it as the tracking index at time t.

An object tracking method performed by an object tracking device that tracks an object captured in a moving image using a particle method, the method comprising, by the object tracking device:
a step of storing a moving image composed of a plurality of time-series image frames;
a step of predicting, according to a predetermined system model used in the particle method, the position to which a tracking index representing the object captured in the image frame at time t-1 moves at the next time t;
a step of specifying an arbitrary search region in the image frame at time t and searching, by a mean shift method, for the position of the search region at which the similarity between the color distribution of the search region and the color distribution of the object is maximized, the step including
calculating the color distribution of the object at time t-1, $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m} \hat{q}_u = 1$, where m is the number of quantization levels of the pixel gray values in the color distribution, and the color distribution of the search region at time t, $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m} \hat{p}_u = 1$, where y is the position of the search region,
shifting the search region in the direction in which the similarity
$$\rho(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}$$
increases,
calculating the color distribution at position $\hat{y}_0$ using $\hat{p}(\hat{y}_0) = \{\hat{p}_u(\hat{y}_0)\}_{u=1,\dots,m}$ and calculating $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}$,
calculating the shift destination position $\hat{y}_1$ using
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\| (\hat{y}_0 - x_i)/h \right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\| (\hat{y}_0 - x_i)/h \right\|^2\right)}$$
where $w_i$ is a value calculated using at least $\sum_{u=1}^{m} \sqrt{\hat{q}_u / \hat{p}_u(\hat{y}_0)}$, $g(x)$ is the first derivative of $k(x)$ with $g(x) = -\frac{d+2}{n h^3} \sum_{i=1}^{n_h} (x_i - x)$, $k(x) = 0.5\,(d+2)\,(1 - x^T x)$ when $x^T x < 1$ and $k(x) = 0$ when $x^T x > 1$, d is 2, h is the radius of the search region, appropriately selected within a range of 2 to 6, and i (= 1 to $n_h$) is the index of a pixel included within the radius h,
calculating the color distribution at position $\hat{y}_1$ using $\hat{p}(\hat{y}_1) = \{\hat{p}_u(\hat{y}_1)\}_{u=1,\dots,m}$ and calculating $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_1)\,\hat{q}_u}$, and
shifting the search region and repeating the iterative calculation until the similarity $\rho[\hat{p}(\hat{y}_1), \hat{q}]$ becomes smaller than the similarity $\rho[\hat{p}(\hat{y}_0), \hat{q}]$, provided that, when the luminance values of the pixels included within the radius h are equal to or less than a certain value, the search shifts to the next search region without calculating the similarity; and
a step of correcting the predicted position of the tracking index to the searched position, calculating, using the image frame at time t, an image feature amount corresponding to the corrected predicted position, weighting, using the image feature amount, the tracking index moved to the corrected predicted position, and estimating it as the tracking index at time t.