JP5116605B2

JP5116605B2 - Automatic tracking device

Info

Publication number: JP5116605B2
Application number: JP2008203659A
Authority: JP
Inventors: へい東全; 基継室井; 将司高見; 聡一須藤
Original assignee: Koito Electric Industries Ltd
Current assignee: Koito Electric Industries Ltd
Priority date: 2008-08-07
Filing date: 2008-08-07
Publication date: 2013-01-09
Anticipated expiration: 2028-08-07
Also published as: JP2010041526A

Description

The present invention relates to an automatic tracking device that automatically tracks and images a tracking target using a camera whose pan, tilt, and zoom can be controlled.

Conventionally, as an automatic tracking device of this kind, the device disclosed in Patent Document 1 below has been proposed. That device comprises an image input/output unit that receives a video signal from a television camera and outputs the video signal to a display unit; an image processing unit that preprocesses the video signal from the image input/output unit; a correlation calculation unit that obtains correlation values between reference image data and search image data input via the image processing unit and detects the position with the highest correlation in the search image; and a central processing unit that, based on the signal from the correlation calculation unit, drives and controls the swivel device and zoom lens of the television camera and also controls each of the above units.
Patent Document 1: JP-A-11-187387

However, because the conventional automatic tracking device obtains the position of the tracking target by correlation calculation, it is vulnerable to changes in the size and shape of the tracking target and is easily affected by occlusion and by image fluctuations caused by camera control. As a result, it could not track the tracking target accurately.

The present invention has been made in view of these circumstances, and its object is to provide an automatic tracking device that can track and image a tracking target with higher accuracy.

The following aspects are presented as means for solving the problem. An automatic tracking device according to a first aspect comprises: a camera whose pan, tilt, and zoom can be controlled; tracking processing means that performs tracking processing to track a tracking target based on images captured by the camera; and control means that controls the pan, tilt, and zoom of the camera so that the camera follows and images the tracking target in accordance with the result of the tracking processing. The tracking processing means includes position estimating means that, based on the images captured by the camera, estimates the position of the tracking target as part of the tracking result by a particle filter using a plurality of particles whose state is a pixel position. For each particle, the particle filter uses a likelihood calculated from a response value of an AdaBoost classifier. The classifier is constructed to identify, from one or more feature quantities of a pixel, whether that pixel is a tracking target pixel or a background pixel, and the response value used is the one obtained from the one or more feature quantities of the pixel at the position of that particle.

According to the first aspect, the position of the tracking target is estimated by the particle filter. Unlike correlation calculation, the particle filter holds a plurality of solution candidates (a plurality of particles), so the likelihood of recovering from a tracking failure is higher, the tracking is robust against occlusion, complex backgrounds, and the like, and the tracking processing can be performed more accurately. Consequently, the tracking target can be tracked and imaged more accurately. A deterministic method (for example, template matching) determines a single solution and therefore cannot recover from a tracking failure. Furthermore, according to the first aspect, the AdaBoost classifier distinguishes tracking target pixels from background pixels, and the particle filter uses a likelihood calculated from the response value of the AdaBoost classifier. The tracking is therefore less affected by changes in the background other than the tracking target, such as changes in its shape, size, color, or brightness. In this respect too, the tracking processing can be performed even more accurately, and the tracking target can be tracked and imaged even more accurately.

In an automatic tracking device according to a second aspect, in the first aspect, the position estimating means estimates the position of the tracking target as the likelihood-weighted average of the pixel positions that are the states of the plurality of particles. The second aspect gives an example of a specific method for estimating the position of the tracking target.

In an automatic tracking device according to a third aspect, in the first or second aspect, the tracking processing means includes size estimating means that estimates the size of the tracking target, as another part of the tracking result, based on those particles among the plurality of particles that correspond to pixels identified as tracking target pixels by the AdaBoost classifier.

According to the third aspect, the size of the tracking target is estimated based on the particles corresponding to pixels identified as tracking target pixels by the AdaBoost classifier. The size estimate is therefore also robust against occlusion, complex backgrounds, and the like, so the tracking processing can be performed more accurately and the tracking target can be tracked and imaged more accurately.

In an automatic tracking device according to a fourth aspect, in any of the first to third aspects, the tracking processing means includes size estimating means that estimates the size of the tracking target based on the distribution of certain particles. These are the particles that correspond to pixels identified as tracking target pixels by the AdaBoost classifier and whose Mahalanobis distance from the position of the tracking target estimated by the position estimating means is equal to or less than a predetermined value, the Mahalanobis distance being computed with a covariance matrix weighted by the likelihoods of the particles. The fourth aspect gives an example of a specific method for estimating the size of the tracking target.
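The Mahalanobis gating of the fourth aspect might be sketched as follows. This is only an illustration under stated assumptions: the function names, the small regularization term, and the use of a bounding box as the size measure are all hypothetical, since this passage does not say how the size is derived from the particle distribution.

```python
import numpy as np

def gate_particles(positions, likelihoods, is_target, center, max_dist):
    """Keep target-classified particles within a Mahalanobis distance of center."""
    pos = np.asarray(positions, dtype=float)
    w = np.asarray(likelihoods, dtype=float)
    mask = np.asarray(is_target, dtype=bool)
    pos, w = pos[mask], w[mask]          # only particles classified as target
    w = w / w.sum()                      # normalize likelihoods to weights
    diff = pos - np.asarray(center, dtype=float)
    # Likelihood-weighted 2x2 covariance matrix about the estimated position.
    cov = (w[:, None] * diff).T @ diff
    cov += 1e-9 * np.eye(2)              # assumed regularization against singularity
    inv = np.linalg.inv(cov)
    dist = np.sqrt(np.einsum("ni,ij,nj->n", diff, inv, diff))
    return pos[dist <= max_dist]

def bounding_size(points):
    """Assumed size measure: width and height of the bounding box of the points."""
    width = points[:, 0].max() - points[:, 0].min()
    height = points[:, 1].max() - points[:, 1].min()
    return float(width), float(height)
```

With this gating, an isolated low-likelihood particle far from the estimated position is excluded before the size is measured, so it does not inflate the estimate.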

In an automatic tracking device according to a fifth aspect, in any of the first to fourth aspects, the one or more feature quantities include at least one of: (i) the average of the first value, among the first to third values of a predetermined color space, over the pixels of a local region containing the pixel; (ii) the variance of the first value over the pixels of the local region containing the pixel; (iii) the average of the second value over the pixels of the local region containing the pixel; (iv) the variance of the second value over the pixels of the local region containing the pixel; (v) the average of the third value over the pixels of the local region containing the pixel; (vi) the variance of the third value over the pixels of the local region containing the pixel; (vii) an edge orientation histogram in the local region containing the pixel; and (viii) a histogram of local binary patterns in the local region containing the pixel. The predetermined color space may be, for example, the CIE 1976 L*u*v* color space, but is not limited to it.

The fifth aspect gives specific examples of the feature quantities. For example, only (i) to (vi) may be adopted as the one or more feature quantities. However, adopting (i) to (vii), adopting (i) to (vi) and (viii), or adopting all of (i) to (viii) is preferable because the tracking performance is higher than in that case. In particular, it was confirmed experimentally that the tracking performance is highest when all of (i) to (viii) are adopted.

An automatic tracking device according to a sixth aspect, in any of the first to fifth aspects, comprises updating means that updates the AdaBoost classifier using predetermined particles among the plurality of particles as learning samples for tracking target pixels and other predetermined particles among the plurality of particles as learning samples for background pixels.

According to the sixth aspect, because the AdaBoost classifier is updated, the device can adapt to changes in the appearance of the tracking target and to changes in the environment (such as lighting and sunlight conditions). This improves the accuracy with which the AdaBoost classifier identifies whether a pixel is a tracking target pixel or a background pixel. The tracking processing can therefore be performed even more accurately, and the tracking target can be tracked and imaged even more accurately.

An automatic tracking device according to a seventh aspect, in any of the first to sixth aspects, comprises calculating means that calculates a spatially weighted average of the likelihoods of the plurality of particles, and determining means that determines whether the spatially weighted average is equal to or greater than a predetermined value.

The spatially weighted average can serve as an indicator of the reliability of the tracking result: if it is large, the reliability of the tracking result is considered high, whereas if it is small, the reliability is considered low. According to the seventh aspect, the spatially weighted average is calculated and compared against the predetermined value, so it can be determined whether tracking is succeeding or has failed. Using this determination makes it possible to avoid a situation in which camera control continues on the basis of an erroneous tracking result even though tracking has failed.

In an automatic tracking device according to an eighth aspect, in any of the first to seventh aspects, the control means includes predicting means that predicts, based on the result of the tracking processing by the tracking processing means, the position and size of the tracking target after a predetermined time has elapsed from the present. The control means controls the pan, tilt, and zoom of the camera by correcting the current pan, tilt, and zoom control state of the camera in accordance with the prediction result of the predicting means.

According to the eighth aspect, predictive control is introduced. Therefore, even when, for example, the time the camera takes to respond to a control command and reach the commanded state is longer than the image processing time, the device can cope with sudden changes in the movement of the tracking target and can track and image it more accurately. Note that if the pan, tilt, and zoom control speed of the camera is too fast, the pan, tilt, and zoom change too abruptly for an observer following the tracking target by eye, which causes discomfort and makes the image unsuitable for monitoring. In this aspect, a camera with a relatively slow control speed can be used, so the pan, tilt, and zoom change smoothly and tracking better suited to monitoring can be achieved.

In an automatic tracking device according to a ninth aspect, in the eighth aspect, the predicting means uses a Kalman filter to predict the position and size of the tracking target after the predetermined time has elapsed from the present. Because a Kalman filter is used in the ninth aspect, the position and size of the tracking target can be predicted accurately, and the tracking target can therefore be tracked and imaged more accurately. In the eighth aspect, however, the predicting means is not limited to one that uses a Kalman filter.
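As an illustration of Kalman-filter prediction of position and size, the sketch below uses a constant-velocity model with the state `[x, y, s, vx, vy, vs]`, where `s` is the size. The state layout, the constant-velocity assumption, and all function names are hypothetical, because this passage does not specify the model the device actually uses.

```python
import numpy as np

def constant_velocity_model(dt):
    """Transition matrix F and observation matrix H for state [x, y, s, vx, vy, vs]."""
    F = np.eye(6)
    F[0, 3] = F[1, 4] = F[2, 5] = dt      # position and size advance by velocity * dt
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only x, y, s are observed
    return F, H

def predict(x, P, F, Q):
    """Kalman prediction step: propagate the state and its covariance."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, H, R):
    """Kalman update step: correct the prediction with measurement z."""
    innovation = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```

Calling `predict` with `dt` set to the camera's response delay gives the position and size expected at the moment the camera reaches the commanded state, which is the quantity the eighth aspect uses to correct the control state.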

According to the present invention, it is possible to provide an automatic tracking device that can track and image a tracking target with higher accuracy.

Hereinafter, an automatic tracking device according to the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram schematically showing an automatic tracking device according to an embodiment of the present invention. As shown in FIG. 1, the automatic tracking device according to the present embodiment includes a camera 1 whose pan, tilt, and zoom can be controlled, a processing unit 2, a distributor 3, a display/recording control unit 4, a display unit 5 such as a liquid crystal panel, and a recording unit 6.

The camera 1 has a camera body 1a; a zoom lens 1b that is mounted on the camera body 1a and sets the magnification in accordance with a zoom control signal from the processing unit 2; and a turntable 1c on which the camera body 1a is mounted and which sets the pan and tilt of the camera body 1a in accordance with pan and tilt control signals from the processing unit 2.

The distributor 3 distributes the image signal from the camera 1 to the processing unit 2 and the display/recording control unit 4. Based on the image signal supplied from the camera 1 via the distributor 3, the processing unit 2 processes the images captured by the camera 1 and controls the pan, tilt, and zoom of the camera 1 so that the camera 1 automatically tracks and images a tracking target 10 (see FIG. 2, described later) such as an intruder or an intruding object. The display/recording control unit 4 causes the display unit 5 to display, and the recording unit 6 to record, the image represented by the image signal supplied from the camera 1 via the distributor 3. An observer can monitor the image displayed on the display unit 5. If no observer monitors the image, the distributor 3 may be omitted and the image signal from the camera 1 may be input directly to the processing unit 2.

FIG. 2 is a diagram schematically showing an example of how the camera 1 tracks the tracking target 10. FIG. 2 shows the field of view of the camera 1 changing as it follows the tracking target 10, such as an intruder. In practice, as the tracking target 10 moves, the pan and tilt of the turntable 1c change so that the direction of the field of view of the camera 1 changes, and the zoom lens 1b operates so that the field of view of the camera 1 is enlarged or reduced. In FIG. 2, however, the individual parts of the camera 1 are omitted and only its field of view is shown schematically.

Next, an example of the operation of the processing unit 2 of the automatic tracking device according to the present embodiment will be described with reference to FIGS. 3 to 15. FIG. 3 is a schematic flowchart showing an example of the operation of the processing unit 2. FIG. 4 is a flowchart showing in detail the tracking target detection process (step S2) in FIG. 3. FIG. 8 is a flowchart showing in detail the tracking process (step S5) in FIG. 3. FIG. 11 is a flowchart showing in detail the classifier update process (step S218) in FIG. 8. FIGS. 12 to 15 are flowcharts showing in detail the camera control process (step S7) in FIG. 3. FIGS. 5 to 7, 9, and 10 will be described later.

As shown in FIG. 3, when the processing unit 2 starts operating, it first puts the camera 1 into a preset state (step S1). That is, the processing unit 2 sets the pan, tilt, and zoom of the camera 1 to predetermined values.

Next, in the preset state, that is, with the pan, tilt, and zoom of the camera 1 fixed, the processing unit 2 performs a tracking target detection process that detects a tracking target (moving object detection) (step S2). Besides applying a commonly used method (S101 to S112 in FIG. 4), this detection may be performed by another sensing device such as a laser radar, or by having an observer designate a person displayed on the screen with a pointing device, such as a mouse, serving as designating means.

An example of the tracking target detection process (step S2) will now be described with reference to FIG. 4. The tracking target detection process (step S2) in FIG. 3 is not limited to the example shown in FIG. 4.

When the tracking target detection process (step S2) starts, as shown in FIG. 4, the processing unit 2 first samples two consecutive images captured by the camera 1 (steps S101 and S102) and generates their difference image (inter-frame difference image) (step S103).

Next, the processing unit 2 binarizes the difference image generated in step S103 (step S104). The threshold used for this binarization may be a fixed threshold or a variable threshold such as one obtained by the discriminant analysis method.
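Steps S101 to S104 can be illustrated with a short sketch for 8-bit grayscale frames. The discriminant analysis method is commonly known as Otsu's method, and the version below is a generic implementation of it, not code taken from the patent. The function names and the assumption of `uint8` grayscale input are hypothetical.

```python
import numpy as np

def frame_difference(prev, curr):
    """Absolute inter-frame difference of two uint8 grayscale images (step S103)."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff.astype(np.uint8)

def otsu_threshold(image):
    """Variable threshold by discriminant analysis (Otsu's method)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    denom = omega * (1.0 - omega)
    between = np.zeros(256)                    # between-class variance per t
    valid = denom > 1e-12                      # skip thresholds with an empty class
    between[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))

def binarize(diff, threshold):
    """Binary image: 1 where the difference exceeds the threshold (step S104)."""
    return (diff > threshold).astype(np.uint8)
```

A fixed threshold can be passed to `binarize` directly; `otsu_threshold` is only needed when the threshold should adapt to each difference image.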

Subsequently, the processing unit 2 labels the image binarized in step S104 (step S105). The processing unit 2 then determines whether any labeled region exists (step S106). If none exists, the process proceeds to step S112; if one or more exist, the process proceeds to step S107.

In step S107, the processing unit 2 acquires feature quantities for every labeled region (steps S107 and S108). The feature quantities here, such as area and circularity, are those needed to detect the tracking target 10 accurately.

Thereafter, the processing unit 2 determines, from the feature quantities of all the labels acquired in step S107, whether any candidate for the tracking target 10 exists (step S109). If none exists, the process proceeds to step S112; if one exists, the process proceeds to step S110.

In step S110, the processing unit 2 determines the tracking target 10 from among the candidates. If there is only one candidate, it is determined to be the tracking target 10. If there are a plurality of candidates, they are narrowed down to one according to a predetermined criterion, and that one is determined to be the tracking target 10.

After step S110, the processing unit 2 sets a tracking target detection flag, which indicates whether the tracking target 10 has been detected, to 1 (1 indicates that the tracking target 10 has been detected) (step S111). It then ends the tracking target detection process (step S2) and proceeds to step S3 in FIG. 3.

In step S112, the processing unit 2 sets the tracking target detection flag to 0 (0 indicates that the tracking target 10 has not been detected). It then ends the tracking target detection process (step S2) and proceeds to step S3 in FIG. 3.

Referring again to FIG. 3, in step S3 the processing unit 2 determines whether the tracking target 10 was detected in step S2. This determination is made according to whether the tracking target detection flag is 1 or 0. If the tracking target 10 was detected (the flag is 1), the process proceeds to step S4. If it was not detected (the flag is 0), the process returns to step S2 and the tracking target detection process is repeated.

FIG. 5 is a diagram showing an example of an image captured by the camera 1, the tracking target region (the region of the tracking target 10) detected from that image by the tracking target detection process (step S2) (or designated by an observer with a pointing device), and the background region set in accordance with that tracking target region. In the example shown in FIG. 5, the inner rectangle that just encloses the whole body of the person is the tracking target region. The area between the inner rectangle and the outer rectangle is the background region used in step S4, described later. The outline of the background region (the outer rectangle) is set, for example, to be concentric with the outline of the tracking target region (the inner rectangle) and to be that outline enlarged by predetermined magnifications in the horizontal direction (x direction) and the vertical direction (y direction).
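The construction of the background region can be sketched as follows. Rectangles are represented as `(x_min, y_min, x_max, y_max)` tuples; that representation, the function names, and the use of +1 and -1 as class labels are assumptions for illustration. Clipping the outer rectangle to the image bounds is omitted.

```python
def background_rect(target, scale_x, scale_y):
    """Outer rectangle: concentric with target and enlarged by the given magnifications."""
    x0, y0, x1, y1 = target
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) / 2 * scale_x
    half_h = (y1 - y0) / 2 * scale_y
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def in_rect(point, rect):
    """True if the point lies inside the rectangle (borders included)."""
    x, y = point
    x0, y0, x1, y1 = rect
    return x0 <= x <= x1 and y0 <= y <= y1

def label_pixel(point, target, outer):
    """+1 inside the tracking target region, -1 in the surrounding background
    region, None outside both (labels are an assumed convention)."""
    if in_rect(point, target):
        return 1
    if in_rect(point, outer):
        return -1
    return None
```

Pixels labeled +1 and -1 in this way would correspond to the positive and negative samples used for the initial learning in step S4.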

In step S4, the processing unit 2 performs, as initialization, a process of constructing an AdaBoost classifier (initial learning). In this initialization, based on the tracking target region and background region shown in FIG. 5 and detected by the tracking target detection process (step S2), the processing unit 2 constructs the AdaBoost classifier so that it identifies, from a feature vector v serving as the one or more feature quantities of a pixel, whether that pixel is a tracking target pixel or a background pixel. In the following description, the tracking target region is denoted by R_target and the background region by R_back.

In this initialization process (step S4), positive samples representing the tracking target and negative samples representing the background are acquired from the tracking target region R_target and the background region R_back as follows.

In Equation 1, x_1^(i) is a position (x, y) and l_1^(i) is a class label. The initial AdaBoost classifier is constructed using initial learning samples (v(x_1^(i)), l_1^(i)), each consisting of the class label l_1^(i) and the feature vector v. Here, the feature vector v is the feature vector that uses all of the features described later: the average and variance of the Luv pixel values, the EOH, and the LBP histogram. In the present embodiment, the Gentle AdaBoost algorithm is used as the learning algorithm of the AdaBoost classifier. In the present invention, however, the learning algorithm of the AdaBoost classifier is not limited to the Gentle AdaBoost algorithm.

The Gentle AdaBoost algorithm is publicly known, as described in J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: a Statistical View of Boosting", technical report, Department of Statistics, Stanford University, 1998, and proceeds as follows. Here, N is the number of samples and M is the number of weak classifiers.

(1) Initialize the learning sample weights w_i according to Equation 2 below, and initialize the strong classifier F(x).
(2) Repeat for m = 1, 2, ..., M:
(a) Train the weak classifier f_m(x) using the weighted learning samples (weighted least squares).
(b) Set the value of the weak classifier f_m(x) according to Equation 3 below. In Equation 3, P_w with a hat (^) denotes the weighted classification rate computed using the learning sample weights.
(c) Update the classifier according to Equation 4 below.
(d) Update the learning sample weights according to Equation 5 below.
(3) Construct the classifier according to Equation 6 below.
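Equations 2 to 6 are not reproduced in this text, so the steps above are sketched here using the standard Gentle AdaBoost formulation from the cited Friedman et al. report rather than the patent's exact equations. The choice of regression stumps as weak classifiers and all function names are assumptions. Labels `y` take the values +1 and -1.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: f(x) = a if x[j] > t else b."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            right = X[:, j] > t
            wr, wl = w[right].sum(), w[~right].sum()
            # The weighted mean of y on each side minimizes weighted squared error.
            a = (w[right] * y[right]).sum() / wr if wr > 0 else 0.0
            b = (w[~right] * y[~right]).sum() / wl if wl > 0 else 0.0
            err = (w * (y - np.where(right, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]

def stump_predict(stump, X):
    j, t, a, b = stump
    return np.where(X[:, j] > t, a, b)

def gentle_adaboost(X, y, M):
    """Train M weak classifiers; returns the list of stumps."""
    N = len(y)
    w = np.full(N, 1.0 / N)               # (1) uniform initial sample weights
    stumps = []
    for _ in range(M):                    # (2) boosting rounds
        stump = fit_stump(X, y, w)        # (a), (b) fit weak classifier f_m
        stumps.append(stump)              # (c) F(x) <- F(x) + f_m(x)
        w = w * np.exp(-y * stump_predict(stump, X))  # (d) reweight samples
        w /= w.sum()
    return stumps

def strong_response(stumps, X):
    """(3) Response F(x) of the strong classifier; its sign gives the class."""
    return sum(stump_predict(s, X) for s in stumps)
```

In the tracking context, `X` would hold one feature vector v per sample pixel, and the real-valued response F(x) is the kind of quantity from which a particle likelihood could be computed.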

The feature vector v used for the AdaBoost classifier in the present embodiment will now be described. Various kinds of feature quantities can be input to an AdaBoost classifier. In the present embodiment, the average and variance of Luv, an edge orientation histogram (EOH), and a histogram of local binary patterns (LBP) are used as the feature quantities of a pixel (the elements of the feature vector v).

First, the mean and variance of Luv are described. Here, the three values L, u, and v of the pixel in the CIE 1976 L*u*v* color space are used. For each of L, u, and v, the mean of that value over a local region W (for example, 5 × 5 pixels) centered on the pixel of interest is one feature of the pixel of interest, and the variance of that value over the same local region W is another, giving six features in total. Expressed mathematically, these are given by Equations 7 and 8 below. In Equations 7 and 8, the Luv pixel value in the local region W is I = (L, u, v), its mean is ￣I = (￣L, ￣u, ￣v), its variance is σ_f = (σ_l, σ_u, σ_v), and A is the number of pixels in the local region W. Here, ￣I denotes I with an overbar, and ￣L, ￣u, and ￣v likewise denote L, u, and v with an overbar.
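
The local mean and variance can be sketched as follows for one channel (L, u, or v). Equations 7 and 8 are not reproduced in this text, so this sketch assumes the variance is the mean squared deviation divided by A, and it assumes that the window is clipped at the image border. The function name `local_mean_var` and the list-of-rows image layout are hypothetical.

```python
def local_mean_var(channel, cx, cy, half=2):
    """Mean and variance of one channel over the window centered at (cx, cy).

    `channel` is a list of rows; half=2 gives the 5 x 5 window of the example.
    """
    h, w = len(channel), len(channel[0])
    values = [channel[y][x]
              for y in range(max(0, cy - half), min(h, cy + half + 1))
              for x in range(max(0, cx - half), min(w, cx + half + 1))]
    a = len(values)                               # A: number of pixels in W
    mean = sum(values) / a
    var = sum((v - mean) ** 2 for v in values) / a
    return mean, var
```

Calling the function once per channel yields the six Luv features of the pixel of interest.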

Next, the edge orientation histogram (EOH) is described. The EOH, one of the features of the pixel of interest, is a histogram of the edge directions within a local region (for example, 5 × 5 pixels) centered on the pixel of interest, weighted by edge strength. Letting I be the grayscale image in the local region, the gradient (edge) images G_x and G_y at coordinates (x, y) can be expressed by Equations 9 and 10 below.

In Equations 9 and 10, Sobel_x and Sobel_y are the operators for computing edges in the x and y directions, respectively. From Equations 9 and 10, the gradient (edge) strength G(x, y) and the edge direction angle θ(x, y) at the coordinates (x, y) of interest are given by Equations 11 and 12 below.

Using this edge strength G and direction angle θ, a histogram with K bins is created. This histogram is the edge orientation histogram (EOH). FIG. 6 is a schematic diagram showing the edge orientation histogram. FIG. 6(a) shows the distribution of edge vectors (edge strength and direction angle) in a local region (5 × 5 pixels). FIG. 6(b) shows the 8-bin edge orientation histogram created from the distribution in FIG. 6(a).
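
The EOH computation can be sketched as follows. Equations 9 to 12 are not reproduced in this text, so the sketch assumes the usual 3 × 3 Sobel kernels, a strength of sqrt(G_x² + G_y²), and an angle of atan2(G_y, G_x) mapped to [0, 2π). Border pixels are replicated and the window is clipped at the image border; both are assumptions, as are the names `sobel_at` and `edge_orientation_histogram`.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_at(gray, x, y, kernel):
    """Apply a 3 x 3 kernel at (x, y), replicating border pixels."""
    h, w = len(gray), len(gray[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            total += kernel[dy + 1][dx + 1] * gray[yy][xx]
    return total

def edge_orientation_histogram(gray, cx, cy, half=2, bins=8):
    """K-bin histogram of edge directions, weighted by edge strength."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            gx = sobel_at(gray, x, y, SOBEL_X)
            gy = sobel_at(gray, x, y, SOBEL_Y)
            strength = math.hypot(gx, gy)
            theta = math.atan2(gy, gx) % (2 * math.pi)
            k = min(int(theta / (2 * math.pi) * bins), bins - 1)
            hist[k] += strength                  # vote weighted by strength
    return hist
```

For an image containing only a vertical step edge, all of the weight falls into the first bin, since every non-zero gradient points along the x axis.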

Next, the local binary pattern (LBP) histogram is described. A local binary pattern encodes, as a binary pattern, the magnitude relationships among pixel values in the neighborhood of the pixel of interest. For a grayscale image I, the 4-neighbor LBP of the pixel of interest (x, y) shown in FIG. 7 is given by Equation 13 below. FIG. 7 is an explanatory diagram of the local binary pattern.

This LBP is computed within the local region and accumulated into a histogram. Because the 4-neighbor LBP takes one of 16 values, from 0 to 15, a 16-bin histogram is created. In the present embodiment, the LBP histogram that serves as one of the features of the pixel of interest is the LBP histogram computed within a local region centered on the pixel of interest.
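
A 4-neighbor LBP and its 16-bin histogram can be sketched as follows. Equation 13 and FIG. 7 are not reproduced in this text, so the bit order (up, right, down, left), the use of a greater-than-or-equal comparison, and the replication of border pixels are all assumptions. The names `lbp4` and `lbp_histogram` are hypothetical.

```python
def lbp4(gray, x, y):
    """4-neighbor LBP code (0..15) of the pixel at (x, y)."""
    h, w = len(gray), len(gray[0])
    center = gray[y][x]
    code = 0
    # Assumed neighbor order: up, right, down, left (bit 0 to bit 3).
    for bit, (dx, dy) in enumerate(((0, -1), (1, 0), (0, 1), (-1, 0))):
        yy = min(max(y + dy, 0), h - 1)
        xx = min(max(x + dx, 0), w - 1)
        if gray[yy][xx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(gray, cx, cy, half=2):
    """16-bin histogram of LBP codes over the window centered at (cx, cy)."""
    h, w = len(gray), len(gray[0])
    hist = [0] * 16
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            hist[lbp4(gray, x, y)] += 1
    return hist
```

With this convention a flat image region produces only the code 15, because every neighbor is equal to the center.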

The per-pixel features (elements of the feature vector v) used by the AdaBoost classifier in the present embodiment have been described above. However, the features used in the present invention are not limited to the example feature vector v described above.

Referring again to FIG. 3, when the AdaBoost classifier has been constructed and the initialization process (step S4) ends, the process proceeds to step S5. In step S5, the processing unit 2 performs a tracking process that tracks the tracking target based on the image captured by the camera 1. In the present embodiment, in the tracking process (step S5), the processing unit 2 estimates the position of the tracking target, as one part of the tracking result, by means of a particle filter using a plurality of particles whose state is a pixel position, based on the image captured by the camera 1. For each particle, the particle filter uses a likelihood calculated from a response value of the AdaBoost classifier. The AdaBoost classifier is constructed to identify whether a pixel is a tracking target pixel or a background pixel based on one or more features (feature vector v) of that pixel, and the response value used is the one obtained from the one or more features (feature vector v) of the pixel at the position of that particle. The processing unit 2 estimates the size of the tracking target, as another part of the tracking result, based on those particles among the plurality of particles whose pixels are identified as tracking target pixels by the AdaBoost classifier. Note that the Japanese text uses two synonymous terms for a particle (パーティクル and 粒子); both are rendered here as “particle”.

Here, an example of the tracking process (step S5) is described with reference to FIG. 8. The tracking process (step S5) in FIG. 3 is not limited to the example shown in FIG. 8.

When the tracking process (step S5) starts, as shown in FIG. 8, the processing unit 2 first samples the current image (step S201).

Next, the processing unit 2 performs the selection and prediction steps of the particle filter (selection and prediction of the particle set) (step S202). Here, let {s_t^(i)}, i = 1, ..., N, denote the set of N particles having state x at a given time t, where the state x of each particle is a position (x, y) (that is, a pixel position). Also, let {π_t^(i)}, i = 1, ..., N, denote the likelihoods of the particles.

In step S202, as the selection step of the particle filter, the processing unit 2 uses a resampling technique to select a new particle set {s'_{t−1}^(i)}, i = 1, ..., N, using the likelihoods {π_{t−1}^(i)}, i = 1, ..., N, of the particle set {s_{t−1}^(i)}, i = 1, ..., N, at the previous time t−1 (in the previous frame). By multiplying particles with high likelihood (splitting) and removing particles with low likelihood (elimination), the probability distribution of the position of the tracking target can be approximated with a limited number of particles. If the number of particles falls below the preset number N because particles have been eliminated, new particles are added within the target region and the background region. Immediately after the tracking process starts (on the transition from step S4 to step S5), no particle likelihoods are available yet, so this selection step is skipped.
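
The selection step can be sketched as follows. The text does not name a specific resampling scheme, so this sketch assumes plain multinomial resampling, which keeps the particle count at N and therefore does not need the separate step of adding new particles. The function name `resample` is hypothetical.

```python
import random

def resample(particles, likelihoods, rng=random):
    """Draw len(particles) particles with probability proportional to likelihood.

    High-likelihood particles tend to be duplicated (split) and
    low-likelihood particles tend to disappear (eliminated).
    """
    if sum(likelihoods) <= 0:
        # No usable likelihoods (for example, right after initialization):
        # keep the particle set unchanged.
        return list(particles)
    return rng.choices(particles, weights=likelihoods, k=len(particles))
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible.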

In step S202, as the prediction step of the particle filter, the processing unit 2 applies a state transition to each particle s'_{t−1}^(i) at time t−1 according to a motion model, thereby predicting the particle s_t^(i) at time t. In the present embodiment, a random walk model is used as the motion model. This is because a wide range of environments is assumed here and, particularly when the tracking target is tracked with the camera 1, whose pan, tilt, and zoom are controllable, it is difficult to approximate the motion of the tracking target by uniform linear motion or the like. Equation 14 below shows the random walk model.

Here, w_{t−1} is Gaussian noise. w_{t−1} may be a random number that follows a Gaussian distribution. In other words, the position at time t is determined by adding a random number to the position at time t−1.

Immediately after the tracking process starts (on the transition from step S4 to step S5), the data used for learning in step S4 is treated as the data at time t−1 in Equation 14, and the state transition is applied to it.
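
The random walk prediction can be sketched as follows, assuming independent zero-mean Gaussian noise with the same standard deviation `sigma` on x and y. The value of `sigma` and the function name `predict_random_walk` are hypothetical; rounding to integer pixel positions and clamping to the image bounds are left out.

```python
import random

def predict_random_walk(particles, sigma, rng=random):
    """Move each particle (x, y) by zero-mean Gaussian noise."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in particles]
```

A larger `sigma` lets the particle set follow faster apparent motion, including motion caused by pan, tilt, and zoom, at the cost of a more diffuse position estimate.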

After step S202, the processing unit 2 sets the particle index n to 1 (step S203).

Next, the processing unit 2 takes the pixel at the position of the n-th particle as the pixel of interest and calculates the mean and variance of the Luv pixel values for that pixel of interest (step S204). The calculation method is as described above.

Subsequently, the processing unit 2 takes the pixel at the position of the n-th particle as the pixel of interest and obtains the edge orientation histogram (EOH) for that pixel of interest (step S205). The method is as described above.

Thereafter, the processing unit 2 takes the pixel at the position of the n-th particle as the pixel of interest and obtains the local binary pattern (LBP) histogram for that pixel of interest (step S206). The method is as described above.

Next, for the n-th particle, the processing unit 2 calculates the response value F of the AdaBoost classifier for the feature vector v whose elements are the features obtained in steps S204 to S206, according to Equation 15 below (step S207). A positive response value F means that the pixel at the position of the n-th particle has been identified as a tracking target pixel; a negative response value F means that it has been identified as a background pixel.

Next, from the response value F calculated in step S207, the processing unit 2 calculates the likelihood π_t^(i) of the n-th particle according to Equation 16 below (step S208). In Equation 16, a is a constant. This step corresponds to the observation step of the particle filter.

The formula for the likelihood π_t^(i) is not limited to Equation 16. Because the response value F takes values from −1 to +1, any function that is monotonically increasing on this interval may be used.

Subsequently, the processing unit 2 determines whether steps S204 to S208 have been performed and the likelihood π_t^(i) calculated for all particles (step S209). If the calculation has finished for all particles, the process proceeds to step S211. Otherwise, the particle index n is incremented by 1 (step S210), and the process returns to step S204.

In step S211, the processing unit 2 calculates, according to Equation 17 below, the weighted mean E[S_t] of the pixel positions that are the states of the particles, weighted by the likelihoods π_t^(i) calculated in step S208. This weighted mean is the state estimate, namely the estimated position of the tracking target (here, its centroid), and forms one part of the tracking result.
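
The observation and position estimate can be sketched as follows. Equations 16 and 17 are not reproduced in this text. The likelihood below, exp(a·F), is only one example of a function that increases monotonically with F and is not claimed to be Equation 16; dividing by the sum of the likelihoods is also an assumption, made so that unnormalized likelihoods can be passed in. The names `likelihood` and `weighted_mean_position` are hypothetical.

```python
import math

def likelihood(response, a=1.0):
    """Illustrative likelihood: increasing in the response F for a > 0."""
    return math.exp(a * response)

def weighted_mean_position(particles, likelihoods):
    """Likelihood-weighted mean of the particle positions (the centroid)."""
    total = sum(likelihoods)
    x = sum(p[0] * w for p, w in zip(particles, likelihoods)) / total
    y = sum(p[1] * w for p, w in zip(particles, likelihoods)) / total
    return x, y
```

For example, two particles at (0, 0) and (10, 20) with likelihoods 1 and 3 give an estimated centroid of (7.5, 15.0).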

Next, for each particle s_t^(i), the processing unit 2 calculates, according to Equation 22 below, the Mahalanobis distance D_t^(i) of that particle from the centroid (￣x, ￣y) of the tracking target obtained as the estimate in step S211, using a covariance matrix weighted by the likelihoods π_t^(i) (step S212). Here, ￣x denotes x with an overbar and ￣y denotes y with an overbar.

Equations 18 to 21 below give the weighted covariance matrix Σ_t' of the particles s_t^(i) = x_t^(i) = (x_t^(i), y_t^(i)).

The Mahalanobis distance D_t^(i) of each particle s_t^(i) from the centroid (￣x, ￣y) of the tracking target is given by Equation 22 below. In Equation 22, ￣s_t = ￣x_t = (￣x_t, ￣y_t), and Σ_t'^(−1) is the inverse of Σ_t'.
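
The weighted covariance and the Mahalanobis distance can be sketched as follows. Equations 18 to 22 are not reproduced in this text, so the sketch assumes the usual definitions: covariance terms weighted by likelihood and normalized by the sum of the weights, and D = sqrt(dᵀ Σ⁻¹ d) with d the offset from the centroid. The names `weighted_covariance` and `mahalanobis` are hypothetical.

```python
import math

def weighted_covariance(particles, weights, mean):
    """Return (sxx, sxy, syy) of the weighted 2 x 2 covariance matrix."""
    total = sum(weights)
    sxx = sxy = syy = 0.0
    for (x, y), w in zip(particles, weights):
        dx, dy = x - mean[0], y - mean[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
    return sxx / total, sxy / total, syy / total

def mahalanobis(point, mean, cov):
    """Mahalanobis distance of `point` from `mean` under covariance `cov`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^T * inverse(cov) * d, written out for a symmetric 2 x 2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)
```

Points with the same return value of `mahalanobis` lie on one ellipse around the centroid.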

Connecting points of equal Mahalanobis distance traces an ellipse. If a sample is assumed to follow a normal distribution, the Mahalanobis distance can be regarded as representing a confidence interval for that sample. In the present embodiment, for example, the Mahalanobis distance is set empirically to D ≈ 1.40 so that about 80% of the samples, that is, of the weighted particle set, are included. The ellipse drawn with this Mahalanobis distance (the inner ellipse) is shown in FIG. 9 as the target ellipse. In FIG. 9, the outer ellipse is drawn with Mahalanobis distance D = 10 so as to include all of the particles, and is the background ellipse. In FIG. 9, white points represent particles for which the response value F of the AdaBoost classifier is positive (that is, identified as the tracking target), and black points represent particles for which the response value F is negative (that is, identified as background).

After calculating the Mahalanobis distance D_t^(i) for each particle s_t^(i) in step S212, the processing unit 2 estimates the size of the tracking target based on those particles, among all particles s_t^(i), whose pixels are identified as tracking target pixels by the AdaBoost classifier (that is, particles with a positive response value F) (step S213). In the present embodiment, the processing unit 2 estimates the size of the tracking target based on the distribution of those particles with a positive response value F whose Mahalanobis distance D_t^(i) calculated in step S212 is no greater than a predetermined value (for example, 1.40).

Specifically, in the present embodiment, in step S213 the processing unit 2 narrows the particles down to those whose response value F is positive and whose Mahalanobis distance D_t^(i) is no greater than the predetermined value (for example, 1.40). The processing unit 2 then finds the maximum and minimum of the x coordinates and of the y coordinates of the positions of the remaining particles (the pixel positions that are the states of those particles). Let x_s be the minimum x coordinate, x_e the maximum x coordinate, y_s the minimum y coordinate, and y_e the maximum y coordinate. From these, the processing unit 2 takes the rectangle with start point (x_s, y_s) and end point (x_e, y_e) as the region of the tracking target and estimates the size of the tracking target region as the size of this rectangle. When the zoom of the camera 1 is controlled in the camera control process (step S7) in FIG. 3, the data indicating this rectangular region is used.
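
The size estimate of step S213 can be sketched as follows. The function name `estimate_target_box` is hypothetical, and returning `None` when no particle passes the filter is an assumption; the text does not say what happens in that case.

```python
def estimate_target_box(particles, responses, distances, max_distance=1.40):
    """Bounding rectangle of particles with F > 0 and D <= max_distance.

    Returns ((x_s, y_s), (x_e, y_e)), or None if no particle qualifies.
    """
    kept = [p for p, f, d in zip(particles, responses, distances)
            if f > 0 and d <= max_distance]
    if not kept:
        return None
    xs = [p[0] for p in kept]
    ys = [p[1] for p in kept]
    return (min(xs), min(ys)), (max(xs), max(ys))
```

The width x_e − x_s and height y_e − y_s of the returned rectangle then serve as the size of the tracking target.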

Next, the processing unit 2 calculates, as the reliability c_t of the tracking result (the position of the tracking target estimated in step S211 and the size of the tracking target estimated in step S213), the spatially weighted mean of the likelihoods of all particles according to Equations 23 and 24 below (step S214), where Equation 25 holds. In Equations 23 to 25, N is the number of particles, w_t^(i) is the spatial weight of each particle, and D_t^(i) is the Mahalanobis distance of each particle given by Equation 22.

In the particle filter described above, the position (x, y) on the image is treated as the state, and the likelihood indicates the probability that each particle belongs to the tracking target or to the background. Because the tracking target is a set of points, the mean of the likelihoods at the individual positions could be used as the reliability of the tracking target; however, to obtain a more dependable value, the reliability c_t given by Equation 23 is calculated.

Here, the particles on the tracking target are assumed to be concentrated, and rather than taking a simple mean of the likelihoods, a spatially weighted mean is calculated as in Equation 23 and used as the reliability c_t. FIG. 10 illustrates this. FIG. 10(a) shows the distribution of the particles, with the upper left of the image as the origin. FIG. 10(b) shows the likelihood of each particle. FIG. 10(c) shows the spatial weighting described above, which corresponds to w_t^(i) in Equation 24. Multiplying the likelihoods in FIG. 10(b) by the spatial weight of each particle in FIG. 10(c) gives FIG. 10(d), and the result of this sum of products is the reliability c_t used here.

After step S214, the processing unit 2 determines whether the reliability c_t calculated in step S214 is higher than a preset threshold (step S215). If it is higher, the process proceeds to step S216; otherwise, the process proceeds to step S217.

In step S216, since the reliability c_t is higher than the threshold, the processing unit 2 determines that tracking has succeeded and sets the tracking result flag to 1. The process then proceeds to step S218.

In step S217, since the reliability c_t is not higher than the threshold, the processing unit 2 determines that tracking has failed and sets the tracking result flag to 0. The process then proceeds to step S218.
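
Steps S214 to S217 can be sketched as follows. Equations 23 to 25 are not reproduced in this text, so the spatial weight is an assumption: here it is taken to be exp(−D²/2), normalized so that the weights sum to 1, which gives particles near the centroid more influence. The threshold value and the names `tracking_reliability` and `tracking_result_flag` are hypothetical.

```python
import math

def tracking_reliability(likelihoods, distances):
    """Spatially weighted mean of the particle likelihoods.

    The weight shape exp(-D^2 / 2) is an assumed stand-in for Equation 24.
    """
    raw = [math.exp(-0.5 * d * d) for d in distances]
    total = sum(raw)
    weights = [r / total for r in raw]           # weights sum to 1
    return sum(w * p for w, p in zip(weights, likelihoods))

def tracking_result_flag(reliability, threshold):
    """1 if tracking is judged successful (S216), otherwise 0 (S217)."""
    return 1 if reliability > threshold else 0
```

Because the weights sum to 1, the reliability stays within the range of the likelihoods, so a fixed threshold can be compared against it from frame to frame.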

In step S218, the processing unit 2 performs an update process that updates the AdaBoost classifier. In this update process, the processing unit 2 updates the AdaBoost classifier using certain particles among the particles s_t^(i) at time t, whose states were transitioned and predicted in step S202, as learning samples for tracking target pixels, and certain other particles among the particles s_t^(i) at time t as learning samples for background pixels.

Here, an example of the AdaBoost classifier update process (step S218) is described with reference to FIG. 11. The AdaBoost classifier update process (step S218) in FIG. 8 is not limited to the example shown in FIG. 11.

When the AdaBoost classifier update process (step S218) starts, as shown in FIG. 11, the processing unit 2 first classifies the particles into learning samples for the tracking target and learning samples for the background (step S301). Specifically, the processing unit 2 takes the region inside the target ellipse shown in FIG. 9 (the inner ellipse, with Mahalanobis distance D ≈ 1.40) as the tracking target region R_target, and the region between the background ellipse shown in FIG. 9 (the outer ellipse, with Mahalanobis distance D = 10) and the target ellipse as the background region R_back. The processing unit 2 then uses the particles in the tracking target region R_target for which the response value F of the AdaBoost classifier is positive as positive samples for learning the target, and all particles in the background region R_back as negative samples for learning the background. Expressed mathematically, the upper row of Equation 26 below gives the positive samples and the lower row gives the negative samples.
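
The sample classification of step S301 can be sketched as follows. Whether a particle exactly on an ellipse boundary belongs inside it is not stated, so the inclusive comparisons are an assumption. The function name `split_samples` is hypothetical.

```python
def split_samples(particles, responses, distances,
                  d_target=1.40, d_background=10.0):
    """Split particles into positive (target) and negative (background) samples."""
    positives, negatives = [], []
    for p, f, d in zip(particles, responses, distances):
        if d <= d_target:
            if f > 0:
                positives.append(p)      # in R_target with a positive response
        elif d <= d_background:
            negatives.append(p)          # in R_back, regardless of response
    return positives, negatives
```

Particles inside the target ellipse with a non-positive response fall into neither list, so ambiguous pixels near the target are not used for learning.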

Next, the processing unit 2 calculates the learning sample weight for each sample classified in step S301 (step S302). The weight of a positive sample is calculated by the upper row of Equation 27 below, and the weight of a negative sample by the lower row of Equation 27.

In Equation 27, F(v(s_t^(i))) is the response value of the AdaBoost classifier for the particle s_t^(i), and D_t^(i) is the Mahalanobis distance of each particle given by Equation 22.

Subsequently, the processing unit 2 determines whether weights have been calculated for all samples (step S303). If so, the process proceeds to step S304; if not, the process returns to step S302.

In step S304, the processing unit 2 performs learning (relearning) in the same manner as in step S4, using the weights newly calculated in step S302, and creates a classifier.

Next, the processing unit 2 updates the classifier by integrating the newly learned classifier with the existing classifier (step S305). To prevent over-updating, in the present embodiment the strong classifier F_1(v) learned at initialization (t = 1) is integrated with the classifier F_t(v) learned by sequential updating at t > 1. Equations 28 and 29 below give F_1(v) and F_t(v).

Here, f_{1,m}(v) is a weak classifier at t = 1, M is the number of weak classifiers, f_t(v) is a weak classifier at t > 1, T is the number of updated weak classifiers kept in the history, K is the number of weak classifiers to be updated, and c_t is the reliability of the tracking target. The classifier F_1(v) from initialization and the classifier F_t(v) obtained by additional learning are integrated as in Equation 30 below to update the classifier F(v).

Here, β_1 and β_t are weights based on the misclassification rates on the learning samples of the initial classifier F_1(v) and the additionally learned classifier F_t(v), respectively, and are defined by Equations 31 to 33 below.

Here, e_1 and e_t are the weighted misclassification rates on the learning samples of the classifier F_1(v) from initialization (t = 1) and the classifier F_t(v) additionally learned at t > 1, respectively. That is, e_t is the value calculated in step S305.

When step S305 ends, the classifier update process (step S218) in FIG. 8 ends, and the process proceeds to step S6 in FIG. 3.

Referring again to FIG. 3, in step S6 the processing unit 2 determines whether tracking by the tracking process of step S5 has succeeded by checking whether the current tracking result flag is 1. If tracking succeeded (the tracking result flag is 1), the process proceeds to step S7. If tracking failed (the tracking result flag is 0), the process proceeds to step S8. In step S8, the processing unit 2 determines whether the tracking failure state has continued for a certain period of time. If it has not, the process returns to the tracking process (step S5), which is repeated. If it has, the tracking process is regarded as having no prospect of succeeding, and the process returns to step S1 (the preset state).
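
The branching of steps S6 and S8 can be sketched as a small decision function. The timeout value, the use of seconds as the unit, and the inclusive comparison are assumptions, and the function name `next_step` is hypothetical.

```python
def next_step(tracking_flag, failed_seconds, timeout_seconds):
    """Return the label of the step that follows steps S6 and S8."""
    if tracking_flag == 1:
        return "S7"                    # tracking succeeded: camera control
    if failed_seconds >= timeout_seconds:
        return "S1"                    # failure persisted: back to preset state
    return "S5"                        # failure is recent: retry tracking
```

Here `failed_seconds` is the time for which the tracking result flag has stayed at 0, measured by the caller.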

When tracking is determined in step S6 to have succeeded (the tracking result flag is 1), the processing unit 2 performs a camera control process that controls the pan, tilt, and zoom of the camera 1 so that the camera 1 tracks the tracking target in accordance with the tracking result obtained in the tracking process (step S5), that is, the position of the tracking target estimated in step S211 in FIG. 8 and the size of the tracking target estimated in step S213 in FIG. 8. In this camera control process, the processing unit 2 predicts, based on the tracking result, the position and size of the tracking target after a predetermined time from now (n_f frames later) and, in accordance with the prediction, corrects the current pan, tilt, and zoom control state of the camera 1 to control its pan, tilt, and zoom. In the present embodiment, the position and size of the tracking target are predicted by a Kalman filter.

Here, an example of the camera control process (step S7) is described with reference to FIGS. 12 to 15. The camera control process (step S7) in FIG. 3 is not limited to the example shown in FIGS. 12 to 15.

カメラ制御処理（ステップＳ７）を開始すると、図１２に示すように、処理部２は、まず、処理部２は、カメラ制御処理において用いる情報として、追跡結果（図８中のステップＳ２１１で推定された追尾対象の位置及び図８中のステップＳ２１３で推定された追尾対象の大きさ）を取得する（ステップＳ５０１）。なお、この追跡結果は処理部２がそもそも有しているので、本来はその取得動作は不要であるが、ここでは理解を容易にするため、このステップＳ５０１を挿入している。 When the camera control process (step S7) is started, as shown in FIG. 12, the processing unit 2 first determines that the processing unit 2 uses the tracking result (step S211 in FIG. 8 as information used in the camera control process). The position of the tracking target and the size of the tracking target estimated in step S213 in FIG. 8 are acquired (step S501). Since the tracking result is originally possessed by the processing unit 2, the acquisition operation is originally unnecessary, but step S 501 is inserted here for easy understanding.

Next, the processing unit 2 acquires a pan control flag, a tilt control flag, and a zoom control flag as information indicating the current pan, tilt, and zoom control states of the camera 1 (step S502). In the present embodiment, for each of pan, tilt, and zoom, the camera 1 performs the control operation after receiving a control command from the processing unit 2 and returns a control completion signal to the processing unit 2 when that operation is complete. The processing unit 2 sets the pan control flag to 1 when it gives a pan control command to the camera 1, and resets the flag to 0 in interrupt processing when it receives the pan control completion signal from the camera 1. Likewise, the processing unit 2 sets the tilt control flag to 1 when it gives a tilt control command to the camera 1, and resets the flag to 0 in interrupt processing when it receives the tilt control completion signal. The processing unit 2 also sets the zoom control flag to 1 when it gives a zoom control command to the camera 1, and resets the flag to 0 in interrupt processing when it receives the zoom control completion signal. Thus, each of the pan, tilt, and zoom control flags indicates that the corresponding operation is under control when it is 1 and that the operation is stopped when it is 0. As this explanation shows, the processing unit 2 already holds the pan, tilt, and zoom control flags, so no acquisition operation is strictly necessary; step S502 is included here only to make the explanation easier to follow.
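The flag handling described above can be sketched as follows. This is a minimal illustration, assuming the completion signal arrives through a callback; the class and method names (`ControlFlags`, `on_command_sent`, `on_completion_signal`, `all_idle`) are hypothetical and do not appear in the source.

```python
class ControlFlags:
    """Pan, tilt, and zoom control flags: 1 = under control, 0 = stopped."""

    AXES = ("pan", "tilt", "zoom")

    def __init__(self) -> None:
        self.flags = {axis: 0 for axis in self.AXES}

    def on_command_sent(self, axis: str) -> None:
        # Set when the processing unit gives a control command to the camera.
        self.flags[axis] = 1

    def on_completion_signal(self, axis: str) -> None:
        # Reset when the camera returns its control completion signal
        # (handled in interrupt processing in the text).
        self.flags[axis] = 0

    def all_idle(self) -> bool:
        # The check later made in step S503: are all three flags 0?
        return all(value == 0 for value in self.flags.values())
```

Keeping the three flags in one structure makes the later "all flags are 0" checks a single call.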

Next, the processing unit 2 determines whether all of the control flags (the pan control flag, the tilt control flag, and the zoom control flag) are 0 (step S503). If all of them are 0, the process proceeds to step S519; if one or more of them is 1, the process proceeds to step S504.

In step S504, the processing unit 2 determines whether the tracking result acquired in step S501 (in particular, the position of the tracking target) has moved farther from the image center than the tracking result acquired in the previous frame. If it has moved away, the process proceeds to step S506; if it has moved closer, the process proceeds to step S505.
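One way to make the comparison in step S504 concrete is sketched below. The source does not say how distance from the image center is measured; Euclidean distance between the rectangle center and the image center is an assumption, as are the function and parameter names.

```python
import math


def moved_away_from_center(prev_pos, cur_pos, image_size):
    """Return True if cur_pos is farther from the image center than prev_pos.

    prev_pos, cur_pos: (x, y) centers of the target rectangle in pixels.
    image_size: (width, height) of the input image in pixels.
    """
    cx, cy = image_size[0] / 2, image_size[1] / 2
    prev_dist = math.hypot(prev_pos[0] - cx, prev_pos[1] - cy)
    cur_dist = math.hypot(cur_pos[0] - cx, cur_pos[1] - cy)
    return cur_dist > prev_dist
```

With this definition, a target whose distance is unchanged counts as not moving away, so it would take the step S505 branch.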

If the target is moving away from the image center, the control carried over from the previous frame is judged to be unsuitable. The processing unit 2 resets the pan, tilt, and zoom control flags to 0 in step S506 and then stops camera control (step S507).

In step S505, the processing unit 2 determines, from the tracking result acquired in step S501, whether the tracking target's direction of travel or the direction of its size change (steadily growing or steadily shrinking) has changed from what it had been until then. If neither has changed, the process proceeds to step S508; if either has changed, the process proceeds to step S506.

In step S508, the processing unit 2 determines whether zoom control is currently in progress by checking whether the zoom control flag is 1. If zoom control is in progress, the process proceeds to step S509; otherwise, it proceeds to step S511.

In step S509, the processing unit 2 determines whether the tracking result acquired in step S501 has already reached the preset target size range. If it has, the process proceeds to step S510; if not, it proceeds to step S511.

In step S510, the processing unit 2 sets the zoom control flag to 0. Because the preset target size range has been reached, it is preferable to stop zoom control at that point.

In step S511, the processing unit 2 determines whether pan control is currently in progress by checking whether the pan control flag is 1. If pan control is in progress, the process proceeds to step S512; otherwise, it proceeds to step S514.

In step S512, the processing unit 2 determines whether the tracking result acquired in step S501 has already reached the preset target range of horizontal position. If it has, the process proceeds to step S513; if not, it proceeds to step S514.

In step S513, the processing unit 2 sets the pan control flag to 0. Because the preset target range of horizontal position has been reached, it is preferable to stop pan control at that point.

In step S514, the processing unit 2 determines whether tilt control is currently in progress by checking whether the tilt control flag is 1. If tilt control is in progress, the process proceeds to step S515; otherwise, it proceeds to step S517.

In step S515, the processing unit 2 determines whether the tracking result acquired in step S501 has already reached the preset target range of vertical position. If it has, the process proceeds to step S516; if not, it proceeds to step S517.

In step S516, the processing unit 2 sets the tilt control flag to 0. Because the preset target range of vertical position has been reached, it is preferable to stop tilt control at that point.

In step S517, the processing unit 2 determines whether any of the pan, tilt, and zoom control flags has been changed. If any of steps S510, S513, and S516 was performed, the process proceeds to step S518; if none was performed, the process proceeds to step S519 so that the current control continues unchanged.
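Steps S508 to S517 apply the same test to each axis in turn, which can be sketched as one loop. This is an illustration only: the dictionary layout and the function name `clear_reached_axes` are hypothetical, and the per-axis "target range reached" results are assumed to be computed elsewhere.

```python
def clear_reached_axes(flags, reached):
    """Clear the flag of every axis that is under control and has reached its target range.

    flags:   dict with keys 'zoom', 'pan', 'tilt' and values 0 or 1.
    reached: dict with the same keys and boolean values.
    Returns True if any flag was changed (step S517 -> S518), else False (-> S519).
    """
    changed = False
    for axis in ("zoom", "pan", "tilt"):      # order of S508, S511, S514
        if flags[axis] == 1 and reached[axis]:
            flags[axis] = 0                   # S510, S513, or S516
            changed = True
    return changed
```

An axis that is not under control is skipped even if its target range is reached, mirroring the branches at S508, S511, and S514.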

In step S518, the processing unit 2 changes the control. Reaching step S518 means that the pan, tilt, and zoom control of the camera 1 based on the earlier prediction no longer matches the actual movement of the tracking target.

In step S519, the processing unit 2 acquires from the camera 1 the current attitude of the camera 1 (its pan, tilt, and zoom positions).

Next, the processing unit 2 predicts the position and size of the tracking target n_f frames ahead from the tracking result acquired in step S501 and the results of the preceding tracking processes (step S520). For an NTSC signal, for example, n_f frames ahead corresponds to (n_f/30) seconds later.

Here, a Kalman filter is used to predict the position and size of the tracking target n_f frames ahead.

The Kalman filter is constructed on the assumption that the position and size of the tracking target change at a constant rate and that the change is smooth. Even when the change in the target's state does not strictly fit the chosen model, the Kalman filter can often be applied approximately because the model includes an error term.

The state variable vector x_k at time k is defined as in Expression 34 below.

Here, x_k and x_k with a dot above it are the horizontal coordinate of the center of the target rectangle in the image and its velocity; y_k and y_k with a dot above it are the vertical coordinate and its velocity; and s_k and s_k with a dot above it are the size (the product of the rectangle's width and height) and its rate of change.

The system equation, which takes into account this state vector, the control of the camera 1, and the error, is defined by Expression 35 below.

In Expression 35, A is the constant matrix shown in Expression 36 below.

Let W_src be the width of the input image, H_src its height, θ_k the horizontal angle of view at time k, φ_k the vertical angle of view, P_k with a dot above it the pan angular velocity of the camera, and T_k with a dot above it the tilt angular velocity. The horizontal and vertical pixel displacements caused by pan and tilt control at each time are then given by Expressions 37 and 38 below, respectively.

Letting λ be the change in the angle of view caused by the zoom operation, λ can be expressed by Expression 39 below.

From these elements, the control vector u_k is given by Expression 40 below. u_k is the control vector representing the image change caused by camera control.

w_k is the system error and, as shown in Expression 41 below, follows a zero-mean white Gaussian process with covariance matrix Q_k.

Here, the position and size of the rectangle surrounding the tracking target, obtained from the tracking process described earlier, are taken as the observed values, and the observation vector at time k is defined as in Expression 42 below.

The observation equation is given by Expression 43 below.

Here, H is the constant matrix shown in Expression 44 below. As shown in Expression 45, the observation error v_k follows a zero-mean white Gaussian process with covariance matrix R_k.
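The expressions for the state vector, system equation, and observation equation are not reproduced in this text. For orientation only, a constant-velocity model consistent with the definitions above could be written as follows. The ordering of the state components, the unit time step of one frame, the bold symbols, and the way the control vector enters the system equation are assumptions, not the patent's confirmed notation.

```latex
\begin{aligned}
\mathbf{x}_k &= \begin{bmatrix} x_k & \dot{x}_k & y_k & \dot{y}_k & s_k & \dot{s}_k \end{bmatrix}^{\mathsf T}, \\
\mathbf{x}_{k+1} &= A\,\mathbf{x}_k + \mathbf{u}_k + \mathbf{w}_k,
  \qquad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0},\, Q_k), \\
A &= \begin{bmatrix}
  1 & 1 & 0 & 0 & 0 & 0 \\
  0 & 1 & 0 & 0 & 0 & 0 \\
  0 & 0 & 1 & 1 & 0 & 0 \\
  0 & 0 & 0 & 1 & 0 & 0 \\
  0 & 0 & 0 & 0 & 1 & 1 \\
  0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \\
\mathbf{z}_k &= H\,\mathbf{x}_k + \mathbf{v}_k,
  \qquad \mathbf{v}_k \sim \mathcal{N}(\mathbf{0},\, R_k), \\
H &= \begin{bmatrix}
  1 & 0 & 0 & 0 & 0 & 0 \\
  0 & 0 & 1 & 0 & 0 & 0 \\
  0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}.
\end{aligned}
```

In this form, the observation vector z_k has three components (the observed horizontal position, vertical position, and size of the rectangle), and H simply selects the corresponding components of the state.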

The Kalman filter estimates the state at the current time from the observation at the current time and the state one step earlier. The state of the system at the current time k is represented by the two variables shown in Expression 46 below. In this specification, a caret (^) placed above a symbol denotes an estimated value.

To advance one time step, the Kalman filter performs two procedures: prediction and update. The prediction procedure computes the estimated state at the current time from the estimated state at the previous time. The update procedure uses the observation at the current time to correct that estimate and thereby obtain a more accurate state.

In the prediction, the estimate at the current time is given by Expression 47 below, and the error covariance matrix at the current time is given by Expression 48 below.

In the update, the Kalman filter calculates, by Expressions 49 to 53 below, the Kalman gain that minimizes the estimated error after the update, and updates the state.
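The prediction and update expressions are not reproduced in this text. The standard Kalman recursion for a linear model with transition matrix A, observation matrix H, control vector u, and error covariances Q and R is shown here for reference. Its two prediction expressions and five update expressions match the numbering used in the text, but the exact notation (the k|k-1 subscripts and the symbols e_k, S_k, and K_k) is an assumption.

```latex
\begin{aligned}
\text{Prediction:}\quad
\hat{\mathbf{x}}_{k|k-1} &= A\,\hat{\mathbf{x}}_{k-1|k-1} + \mathbf{u}_{k-1}, \\
P_{k|k-1} &= A\,P_{k-1|k-1}\,A^{\mathsf T} + Q_{k-1}, \\[4pt]
\text{Update:}\quad
\mathbf{e}_k &= \mathbf{z}_k - H\,\hat{\mathbf{x}}_{k|k-1}, \\
S_k &= H\,P_{k|k-1}\,H^{\mathsf T} + R_k, \\
K_k &= P_{k|k-1}\,H^{\mathsf T}\,S_k^{-1}, \\
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + K_k\,\mathbf{e}_k, \\
P_{k|k} &= (I - K_k H)\,P_{k|k-1}.
\end{aligned}
```

Here P denotes the error covariance matrix of the estimate, e_k the difference between the observation and the predicted observation, and K_k the Kalman gain.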

The above calculation makes it possible to estimate the state at the current time with the error taken into account.

The setting of the initial conditions of the Kalman filter is described next. Letting (x_0, y_0) be the center coordinates of the tracking target rectangle at the start of tracking and s_0 its size, the initial value of the state is as shown in Expression 54 below. The velocities are set to 0.

If the initial conditions contain an error, the error covariance matrix is given as in Expression 55 below.

The characteristics of the filter are governed by the ratio of the system-error variance to the observation-error variance. The larger this ratio, the more faithfully the filtered estimate follows the original observations, but the more sensitive it becomes to errors; the smaller the ratio, the smoother the estimate, but the worse it follows the data. In the present embodiment, a variance ratio is chosen empirically so that the filter responds quickly to changes in the target person's direction of movement, stops, and the like, yet is not easily affected by errors contained in the tracking result.
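The filter and the effect of the variance ratio can be illustrated with a single-axis sketch in plain Python. It is not the patent's six-state filter: it assumes that x, y, and s can be filtered independently, that the system-error covariance is q times the identity, and that the initial covariance is 100 on the diagonal. The class name `AxisKalman` and all numeric defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one quantity (x, y, or s)."""
    pos: float              # estimate, e.g. x_0 at the start of tracking
    vel: float = 0.0        # initial rate of change is 0, as in the text
    p00: float = 100.0      # initial error covariance (assumed values)
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 100.0
    q: float = 0.01         # system-error variance (assumed Q = q * I)
    r: float = 1.0          # observation-error variance
    last_gain: float = 0.0  # position component of the latest Kalman gain

    def step(self, z: float, u: float = 0.0) -> None:
        """Advance one frame: predict with A = [[1, 1], [0, 1]], then update with z."""
        # Prediction; u is the pixel shift caused by camera control.
        pos = self.pos + self.vel + u
        vel = self.vel
        p00 = self.p00 + self.p01 + self.p10 + self.p11 + self.q
        p01 = self.p01 + self.p11
        p10 = self.p10 + self.p11
        p11 = self.p11 + self.q
        # Update with H = [1, 0], so the innovation covariance is a scalar.
        innovation = z - pos
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        self.pos = pos + k0 * innovation
        self.vel = vel + k1 * innovation
        self.p00, self.p01 = (1 - k0) * p00, (1 - k0) * p01
        self.p10, self.p11 = p10 - k1 * p00, p11 - k1 * p01
        self.last_gain = k0

    def predict_ahead(self, n_f: int) -> float:
        """Extrapolate n_f frames ahead at the estimated rate (no camera control)."""
        return self.pos + n_f * self.vel
```

Running two instances on the same observations, one with a large `q / r` and one with a small `q / r`, shows the trade-off described above: the instance with the larger ratio ends up with the larger gain and therefore follows each new observation more closely.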

This concludes the description of the Kalman filter used in the processing of step S520 in FIG. 13.

After step S520, the processing unit 2 calculates the horizontal or vertical angle of view n_f frames ahead from the size of the tracking target n_f frames ahead, obtained by the prediction in step S520, and the desired size s_i of the tracking target (step S521). Expression 56 below calculates the horizontal angle of view.

Next, the processing unit 2 determines whether the horizontal angle of view calculated in step S521 reaches the zoom limit (step S522). If the zoom limit is reached, the process proceeds to step S549, except that when the limit is reached on the zoom-out side the process proceeds to step S523. If neither the zoom-in limit nor the zoom-out limit is reached, the process also proceeds to step S523.

In step S523, the processing unit 2 compares the result obtained in step S521 with the current horizontal angle of view and determines whether the zoom control amount is smaller than a predetermined value. If it is smaller, the process proceeds to step S524; if it is larger, the process proceeds to step S526.

In step S524, the processing unit 2 sets the zoom control speed to 0. The processing unit 2 then sets the zoom control flag to 0 (step S525) and proceeds to step S530. As a result, zoom control is not performed. Zoom control is skipped when the zoom control amount is small because performing it would produce small, jittery movements that may be unpleasant for an observer monitoring the display unit 5.

In step S526, the zoom control speed Z_speed is calculated by Expression 57 below. Here, f_r is the frame rate (f_r = 30 for NTSC), and Z_pt is the zoom position corresponding to the horizontal angle of view at time t.

The processing unit 2 then determines whether the difference between the zoom control speed Z_speed calculated in step S526 and the speed currently being applied is greater than a predetermined value (step S527). If the difference is large, the process proceeds to step S528; if it is small, the process proceeds to step S529. The control speed is left unchanged unless the difference exceeds a certain value, again because frequent changes in the zoom control speed may be unpleasant for an observer visually monitoring the display unit 5. Moreover, the tracking target only needs to be near the middle of the image; maintaining a state with no offset at all from the image center is not the goal.

In step S528, the processing unit 2 changes the zoom control speed to the value calculated in step S526. The process then proceeds to step S529.

In step S529, the processing unit 2 sets the zoom control flag to 1. The process then proceeds to step S530.
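The zoom steps S523 to S529 combine a dead band on the control amount with a threshold on speed changes, and the pan and tilt steps described later follow the same pattern. A minimal sketch is given below, assuming the control amount and candidate speed have already been computed; the function name `plan_axis` and the threshold parameters are hypothetical, and the speed formula itself is not reproduced here.

```python
def plan_axis(control_amount, new_speed, current_speed,
              min_amount, min_speed_change):
    """Return (control_flag, control_speed) for one axis.

    control_amount:   required change for this axis (S523 compares it to a threshold).
    new_speed:        speed computed for this frame (S526).
    current_speed:    speed currently being applied to the camera.
    min_amount:       dead-band threshold (the "predetermined value" in S523).
    min_speed_change: threshold on the speed difference (S527).
    """
    if abs(control_amount) < min_amount:
        return 0, 0.0                        # S524, S525: too small, do not control
    if abs(new_speed - current_speed) > min_speed_change:
        current_speed = new_speed            # S528: adopt the newly computed speed
    return 1, current_speed                  # S529: control this axis
```

Both thresholds serve the purpose stated in the text: small corrections and frequent speed changes are suppressed so that the displayed image does not move in a way that is unpleasant to watch.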

In step S530, the processing unit 2 determines, from the result obtained by the prediction in step S520, whether the pan control amount is smaller than a predetermined value. If it is smaller, the process proceeds to step S531; if it is larger, the process proceeds to step S533.

In step S531, the processing unit 2 sets the pan control speed to 0. The processing unit 2 then sets the pan control flag to 0 (step S532) and proceeds to step S538. As a result, pan control is not performed. Pan control is skipped when the pan control amount is small because performing it would produce small, jittery movements that may be unpleasant for an observer monitoring the display unit 5.

In step S533, the processing unit 2 calculates the pan control speed P_speed by Expression 58 below. Here, W_src is the width of the input image.

The processing unit 2 then calculates the pan position that would be reached after a time corresponding to n_f frames (n_f/f_r seconds later) if pan were controlled at the pan control speed P_speed calculated in step S533, and determines whether that value reaches or exceeds the pan limit (step S534). If it does, the process proceeds to step S549; otherwise, the process proceeds to step S535.
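The limit check in step S534 amounts to projecting the pan position n_f/f_r seconds ahead at the computed speed. The sketch below assumes positions in degrees, speeds in degrees per second, and a symmetric pair of limits passed in by the caller; the function name `will_hit_limit` and these units are assumptions, not taken from the source.

```python
def will_hit_limit(position, speed, n_f, f_r, lower, upper):
    """Return True if the axis would reach or pass a limit after n_f frames.

    position: current axis position (e.g. pan angle in degrees).
    speed:    planned control speed (e.g. degrees per second).
    n_f:      prediction horizon in frames.
    f_r:      frame rate in frames per second.
    lower, upper: mechanical limits of the axis.
    """
    projected = position + speed * n_f / f_r
    return projected <= lower or projected >= upper
```

The same check applies unchanged to the tilt axis with its own limits.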

In step S535, the processing unit 2 determines whether the difference between the pan control speed P_speed calculated in step S533 and the speed currently being applied is greater than a predetermined value. If the difference is large, the process proceeds to step S536; if it is small, the process proceeds to step S537.

In step S536, the processing unit 2 changes the pan control speed to the value calculated in step S533. The process then proceeds to step S537.

In step S537, the processing unit 2 sets the pan control flag to 1. The process then proceeds to step S538.

In step S538, the processing unit 2 determines, from the result obtained by the prediction in step S520, whether the tilt control amount is smaller than a predetermined value. If it is smaller, the process proceeds to step S539; if it is larger, the process proceeds to step S541.

In step S539, the processing unit 2 sets the tilt control speed to 0. The processing unit 2 then sets the tilt control flag to 0 (step S540) and proceeds to step S546. As a result, tilt control is not performed. Tilt control is skipped when the tilt control amount is small because performing it would produce small, jittery movements that may be unpleasant for an observer monitoring the display unit 5.

In step S541, the processing unit 2 calculates the tilt control speed T_speed by Expression 59 below. Here, H_src is the height (vertical length) of the input image.

The processing unit 2 then calculates the tilt position that would be reached after a time corresponding to n_f frames (n_f/f_r seconds later) if tilt were controlled at the tilt control speed T_speed calculated in step S541, and determines whether that value reaches or exceeds the tilt limit (step S542). If it does, the process proceeds to step S549; otherwise, the process proceeds to step S543.

In step S543, the processing unit 2 determines whether the difference between the tilt control speed T_speed calculated in step S541 and the speed currently being applied is greater than a predetermined value. If the difference is large, the process proceeds to step S544; if it is small, the process proceeds to step S545.

In step S544, the processing unit 2 changes the tilt control speed to the value calculated in step S541. The process then proceeds to step S545.

In step S545, the processing unit 2 sets the tilt control flag to 1. The process then proceeds to step S546.

In step S546, the processing unit 2 determines whether all of the control flags (the pan control flag, the tilt control flag, and the zoom control flag) are 0. If all of them are 0, no control is to be performed, so the process proceeds to step S548 without passing through step S547. If one or more of them is 1, the process proceeds to step S547.

In step S547, the processing unit 2 gives the camera 1 a control command so that control is performed according to each control flag that is 1 and its corresponding control speed. The control is carried out at that speed for n_f/f_r seconds, but the process may leave this loop and proceed to step S548 without waiting for the control to finish. In the present embodiment, the control flag itself does not issue a control command to the camera 1; rather, an operation such as step S547 gives the camera 1 a control command corresponding to the control flag and control speed.
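Steps S546 and S547 can be sketched as building a list of commands for the axes whose flags are 1. The tuple layout, the function name `build_commands`, and the idea of returning the commands instead of sending them are assumptions made for illustration; the actual camera interface is not described in the source.

```python
def build_commands(flags, speeds, n_f, f_r):
    """Return one (axis, speed, duration_seconds) tuple per axis whose flag is 1.

    flags:  dict with keys 'pan', 'tilt', 'zoom' and values 0 or 1.
    speeds: dict with the same keys and the control speed for each axis.
    An empty list corresponds to skipping step S547 (all flags are 0).
    """
    duration = n_f / f_r   # each command runs for n_f / f_r seconds
    return [(axis, speeds[axis], duration)
            for axis in ("pan", "tilt", "zoom")
            if flags[axis] == 1]
```

Because the text allows the process to continue without waiting for the control to finish, a caller would send these commands asynchronously and rely on the completion signals to reset the flags.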

In step S548, the processing unit 2 sets the tracking limit flag to 0 (0 indicates that none of pan, tilt, and zoom is likely to reach its limit). Step S548 is reached only when none of pan, tilt, and zoom is likely to reach its limit. After step S548, the camera control process (step S7) ends, and the process proceeds to step S9 in FIG. 3.

In step S549, the processing unit 2 sets the tracking limit flag to 1 (1 indicates that pan, tilt, or zoom may reach its limit). Step S549 is reached when at least one of pan, tilt, and zoom may reach its limit. After step S549, in step S550, the processing unit 2 stops the attitude control of the camera 1 if such control is in progress. After step S550, the camera control process (step S7) ends, and the process proceeds to step S9 in FIG. 3.

Referring again to FIG. 3, in step S9 the processing unit 2 determines whether the tracking limit flag is 1. If the flag is 1, it is judged difficult to continue following the tracking target, and the process returns to step S1 (the preset state). If the flag is 0, the tracking target can still be followed, so the process returns to step S5 and continues tracking.

本実施の形態によれば、前記パーティクルフィルタにより追尾対象の位置を推定する（図８中のステップＳ２１１）ので、相関演算とは異なり、複数の解の候補（複数のパーティクル）を持つので追跡失敗から回復する可能性が高くなり、オクルージョンや複雑な背景などに対して強く、より精度良く追跡処理を行うことができ、ひいては、追尾対象をより精度良く追尾して撮像することができる。また、本実施の形態によれば、AdaBoost識別器により追尾対象画素と背景画素とが識別され、前記パーティクルフィルタがAdaBoost識別器の応答値から算出した尤度を用いるものであるため、追尾対象以外の背景の形状や大きさや色や明暗変化などの背景の変化の影響を受け難くなり、この点からも、より一層精度良く追跡処理を行うことができ、ひいては、追尾対象をより一層精度良く追尾して撮像することができる。 According to the present embodiment, the tracking target position is estimated by the particle filter (step S211 in FIG. 8). Therefore, unlike the correlation calculation, a plurality of solution candidates (a plurality of particles) are included, so tracking failure occurs. Therefore, the tracking process can be performed with higher accuracy and more accurately with respect to occlusion and complicated backgrounds. As a result, the tracking target can be tracked with higher accuracy and imaged. According to the present embodiment, the tracking target pixel and the background pixel are identified by the AdaBoost classifier, and the particle filter uses the likelihood calculated from the response value of the AdaBoost classifier. This makes it less susceptible to changes in the background, such as the shape, size, color, and light / dark changes in the background. From this point, it is possible to perform tracking processing with even higher accuracy, and, in turn, to track the tracking target with higher accuracy. Image.

また、本実施の形態によれば、前記AdaBoost識別器により追尾対象画素であると識別された画素に応じたパーティクルに基づいて追尾対象の大きさを推定する（図８中のステップＳ２１３）ので、追尾対象の大きさについても、オクルージョンや複雑な背景などに対して強く、より精度良く追跡処理を行うことができ、ひいては、追尾対象をより精度良く追尾して撮像することができる。 Further, according to the present embodiment, the size of the tracking target is estimated based on the particles corresponding to the pixel identified as the tracking target pixel by the AdaBoost discriminator (step S213 in FIG. 8). The size of the tracking target is also strong against occlusion and a complicated background, so that tracking processing can be performed with higher accuracy. As a result, the tracking target can be tracked with higher accuracy and imaged.

さらに、本実施の形態によれば、AdaBoost識別器を更新させるので、追尾対象の見え方の変化や環境（照明及び日照条件など）の変化に対応することができ、これにより、AdaBoost識別器による追尾対象画素であるか背景画素であるかの識別の精度が高まる。したがって、本実施の形態によれば、より一層精度良く追跡処理を行うことができ、ひいては、追尾対象をより一層精度良く追尾して撮像することができる。 Furthermore, according to the present embodiment, since the AdaBoost discriminator is updated, it is possible to cope with changes in the appearance of the tracking target and changes in the environment (such as lighting and sunshine conditions). The accuracy of identifying whether the pixel is a tracking target pixel or a background pixel is increased. Therefore, according to the present embodiment, it is possible to perform the tracking process with higher accuracy, and as a result, it is possible to track and image the tracking target with higher accuracy.

According to this embodiment, the spatially weighted average of the likelihoods of all particles is calculated as the reliability c_t of the tracking target (step S214), and it is determined whether that value is equal to or greater than a predetermined value (steps S111, S112, and S6). It can thus be determined whether tracking has succeeded or failed. In this embodiment, when the result of step S6 is NO and the result of step S8 is YES, the process returns to step S1. This avoids a situation in which camera control continues on the basis of an erroneous tracking result even though tracking has failed.
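The reliability check can be sketched in Python as follows. This passage states only that c_t is a spatially weighted average of the particle likelihoods compared against a predetermined value. The Gaussian spatial weight centered on the estimated position, the parameter `sigma`, the threshold value, and the function names are assumptions for illustration.

```python
import math


def tracking_reliability(particles, likelihoods, center, sigma=20.0):
    """Spatially weighted average of particle likelihoods (c_t).

    Particles near `center` count more; the Gaussian weighting and
    `sigma` (in pixels) are illustrative assumptions.
    """
    cx, cy = center
    num = 0.0
    den = 0.0
    for (x, y), lik in zip(particles, likelihoods):
        dist2 = (x - cx) ** 2 + (y - cy) ** 2
        spatial = math.exp(-dist2 / (2.0 * sigma ** 2))
        num += spatial * lik
        den += spatial
    return num / den if den > 0.0 else 0.0


def tracking_succeeded(reliability, threshold=0.5):
    """Tracking counts as successful when c_t is at or above the threshold."""
    return reliability >= threshold
```

When the check fails, a caller would skip camera control and fall back to target detection, corresponding to the return to step S1 described above.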

Further, according to this embodiment, predictive control is introduced into the camera control process. Therefore, even when, for example, the time the camera 1 takes to respond to a control command and reach the commanded state is long compared with the image processing time, the device can cope with sudden changes in the movement of the tracking target and can track and image the tracking target more accurately. If the pan, tilt, and zoom control speeds of a camera are too high, the pan, tilt, and zoom change too abruptly for an observer following the tracking target by eye, which causes discomfort and makes the image unsuitable for monitoring. In this embodiment, however, a camera with a relatively slow control speed can be used as the camera 1, so changes in the pan, tilt, and zoom of the camera 1 can be made smooth and tracking better suited to monitoring can be realized.

Furthermore, according to this embodiment, a Kalman filter is used for this prediction, so the position and size of the tracking target region can be predicted accurately and, in turn, the tracking target can be tracked and imaged more accurately.
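As a rough illustration of Kalman-filter prediction for one tracked quantity (for example, the x coordinate, the y coordinate, the width, or the height of the tracking target region), the following Python sketch uses a constant-velocity model. The state model, the noise parameters `q` and `r`, and the class and method names are assumptions; this passage does not give the filter design used in the embodiment.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [value, rate] and a scalar measurement."""

    def __init__(self, initial_value, q=1e-2, r=1.0):
        self.p = float(initial_value)  # estimated value
        self.v = 0.0                   # estimated rate of change
        # Symmetric state covariance [[a, b], [b, d]].
        self.a, self.b, self.d = 1.0, 0.0, 1.0
        self.q = q  # process noise added to both diagonal terms (assumed)
        self.r = r  # measurement noise variance (assumed)

    def predict(self, dt=1.0):
        """Advance the state by dt: x <- F x, P <- F P F^T + Q."""
        self.p += dt * self.v
        a = self.a + 2.0 * dt * self.b + dt * dt * self.d + self.q
        b = self.b + dt * self.d
        d = self.d + self.q
        self.a, self.b, self.d = a, b, d
        return self.p

    def update(self, measurement):
        """Correct the state with a new measurement of the value."""
        innovation = measurement - self.p
        s = self.a + self.r
        k0, k1 = self.a / s, self.b / s  # Kalman gain
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P with H = [1, 0].
        a = (1.0 - k0) * self.a
        b = (1.0 - k0) * self.b
        d = self.d - k1 * self.b
        self.a, self.b, self.d = a, b, d

    def lookahead(self, horizon):
        """Predict the value `horizon` time units ahead without changing state."""
        return self.p + horizon * self.v
```

In use, one filter per quantity would be stepped with `predict` and `update` once per processed frame, and `lookahead` would supply the value expected after the camera's response delay, which the control stage could then use to correct the current pan, tilt, and zoom commands.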

An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment.

FIG. 1 is a block diagram schematically showing an automatic tracking device according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing an example of how a tracking target is tracked by the camera.
FIG. 3 is a schematic flowchart showing an example of the operation of the processing unit in FIG. 1.
FIG. 4 is a flowchart showing in detail the tracking target detection process (step S2) in FIG. 3.
FIG. 5 is a diagram showing an example of an image captured by the camera, a tracking target region, and a background region.
FIG. 6 is a schematic diagram showing an edge direction histogram.
FIG. 7 is an explanatory diagram of a local binary pattern.
FIG. 8 is a flowchart showing in detail the tracking process (step S5) in FIG. 3.
FIG. 9 is a diagram showing an example of a target ellipse and a background ellipse.
FIG. 10 is an explanatory diagram of the reliability.
FIG. 11 is a flowchart showing in detail the classifier update process (step S218) in FIG. 8.
FIG. 12 is a flowchart showing in detail the camera control process (step S7) in FIG. 3.
FIG. 13 is a flowchart continuing from FIG. 12.
FIG. 14 is a flowchart continuing from FIG. 13.
FIG. 15 is a flowchart continuing from FIG. 14.

Explanation of symbols

1 Camera
1a Camera body
1b Zoom lens
1c Turntable
2 Processing unit

Claims

An automatic tracking device comprising:
a camera capable of pan, tilt, and zoom control;
tracking processing means for performing tracking processing that tracks a tracking target based on an image captured by the camera; and
control means for controlling the pan, tilt, and zoom of the camera, according to the result of the tracking processing by the tracking processing means, so that the camera tracks and images the tracking target,
wherein the tracking processing means includes position estimating means for estimating the position of the tracking target, as part of the tracking result, by a particle filter that uses a plurality of particles whose states are pixel positions, based on the image captured by the camera,
wherein the particle filter uses, for each particle, a likelihood calculated from a response value of an AdaBoost classifier constructed to identify, based on one or more feature amounts relating to a pixel, whether that pixel is a tracking target pixel or a background pixel, the response value being obtained from the one or more feature amounts relating to the pixel at the position that is the state of that particle, and
wherein the tracking processing means includes size estimating means for estimating the size of the tracking target based on the distribution of those particles, among the plurality of particles, that correspond to pixels identified as tracking target pixels by the AdaBoost classifier and for which the Mahalanobis distance from the position of the tracking target estimated by the position estimating means, computed using a covariance matrix weighted by the likelihoods of the particles, is equal to or less than a predetermined value.

The automatic tracking device according to claim 1, wherein the one or more feature amounts include at least one of: the average of the first value, among the first to third values of a predetermined color space, of the pixels in a local region including the pixel; the variance of the first value of the pixels in the local region; the average of the second value of the pixels in the local region; the variance of the second value of the pixels in the local region; the average of the third value of the pixels in the local region; the variance of the third value of the pixels in the local region; an edge direction histogram in the local region; and a histogram of local binary patterns in the local region.

The automatic tracking device according to claim 1, further comprising updating means for updating the AdaBoost classifier using predetermined particles among the plurality of particles as learning samples for tracking target pixels and other predetermined particles among the plurality of particles as learning samples for background pixels.

The automatic tracking device according to any one of claims 1 to 3, further comprising: calculation means for calculating a spatially weighted average of the likelihoods of the plurality of particles; and determination means for determining whether the spatially weighted average is equal to or greater than a predetermined value.

An automatic tracking device comprising:
a camera capable of pan, tilt, and zoom control;
tracking processing means for performing tracking processing that tracks a tracking target based on an image captured by the camera; and
control means for controlling the pan, tilt, and zoom of the camera, according to the result of the tracking processing by the tracking processing means, so that the camera tracks and images the tracking target,
wherein the tracking processing means includes position estimating means for estimating the position of the tracking target, as part of the tracking result, by a particle filter that uses a plurality of particles whose states are pixel positions, based on the image captured by the camera,
wherein the particle filter uses, for each particle, a likelihood calculated from a response value of an AdaBoost classifier constructed to identify, based on one or more feature amounts relating to a pixel, whether that pixel is a tracking target pixel or a background pixel, the response value being obtained from the one or more feature amounts relating to the pixel at the position that is the state of that particle, and
wherein the one or more feature amounts include at least one of: the average of the first value, among the first to third values of a predetermined color space, of the pixels in a local region including the pixel; the variance of the first value of the pixels in the local region; the average of the second value of the pixels in the local region; the variance of the second value of the pixels in the local region; the average of the third value of the pixels in the local region; the variance of the third value of the pixels in the local region; an edge direction histogram in the local region; and a histogram of local binary patterns in the local region.

The automatic tracking device according to claim 5, further comprising updating means for updating the AdaBoost classifier using predetermined particles among the plurality of particles as learning samples for tracking target pixels and other predetermined particles among the plurality of particles as learning samples for background pixels.

The automatic tracking device according to claim 5 or 6, further comprising: calculation means for calculating a spatially weighted average of the likelihoods of the plurality of particles; and determination means for determining whether the spatially weighted average is equal to or greater than a predetermined value.

An automatic tracking device comprising:
a camera capable of pan, tilt, and zoom control;
tracking processing means for performing tracking processing that tracks a tracking target based on an image captured by the camera;
control means for controlling the pan, tilt, and zoom of the camera, according to the result of the tracking processing by the tracking processing means, so that the camera tracks and images the tracking target; and
updating means for updating an AdaBoost classifier using predetermined particles among a plurality of particles as learning samples for tracking target pixels and other predetermined particles among the plurality of particles as learning samples for background pixels,
wherein the tracking processing means includes position estimating means for estimating the position of the tracking target, as part of the tracking result, by a particle filter that uses the plurality of particles, whose states are pixel positions, based on the image captured by the camera, and
wherein the particle filter uses, for each particle, a likelihood calculated from a response value of the AdaBoost classifier, which is constructed to identify, based on one or more feature amounts relating to a pixel, whether that pixel is a tracking target pixel or a background pixel, the response value being obtained from the one or more feature amounts relating to the pixel at the position that is the state of that particle.

The automatic tracking device according to claim 8, further comprising: calculation means for calculating a spatially weighted average of the likelihoods of the plurality of particles; and determination means for determining whether the spatially weighted average is equal to or greater than a predetermined value.

An automatic tracking device comprising:
a camera capable of pan, tilt, and zoom control;
tracking processing means for performing tracking processing that tracks a tracking target based on an image captured by the camera;
control means for controlling the pan, tilt, and zoom of the camera, according to the result of the tracking processing by the tracking processing means, so that the camera tracks and images the tracking target;
calculation means for calculating a spatially weighted average of the likelihoods of a plurality of particles; and
determination means for determining whether the spatially weighted average is equal to or greater than a predetermined value,
wherein the tracking processing means includes position estimating means for estimating the position of the tracking target, as part of the tracking result, by a particle filter that uses the plurality of particles, whose states are pixel positions, based on the image captured by the camera, and
wherein the particle filter uses, for each particle, a likelihood calculated from a response value of an AdaBoost classifier constructed to identify, based on one or more feature amounts relating to a pixel, whether that pixel is a tracking target pixel or a background pixel, the response value being obtained from the one or more feature amounts relating to the pixel at the position that is the state of that particle.

The position estimating means estimates the position of the tracking target as a likelihood-weighted average of the pixel positions that are the states of the plurality of particles. The automatic tracking device according to.

The automatic tracking device according to any one of claims 1 to 11, wherein the tracking processing means includes size estimating means for estimating the size of the tracking target, as another part of the tracking result, based on those particles, among the plurality of particles, that correspond to pixels identified as tracking target pixels by the AdaBoost classifier.

The automatic tracking device according to any one of claims 1 to 12,
wherein the control means includes prediction means for predicting, based on the result of the tracking processing by the tracking processing means, the position and size of the tracking target after a predetermined time has elapsed from the present time, and
wherein the control means controls the pan, tilt, and zoom of the camera by correcting the current pan, tilt, and zoom control state of the camera according to the prediction result of the prediction means.

The automatic tracking device according to claim 13, wherein the prediction means predicts, by using a Kalman filter, the position and size of the tracking target after the predetermined time has elapsed from the present time.