JP2017033175A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2017033175A
Application number: JP2015150992A
Authority: JP
Inventors: 崇之原; Takayuki Hara
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-07-30
Filing date: 2015-07-30
Publication date: 2017-02-09
Anticipated expiration: 2035-07-30
Also published as: JP6613687B2

Abstract

PROBLEM TO BE SOLVED: To reduce false recognition of objects that resemble the object being tracked, while keeping processing time down.

SOLUTION: An image processing apparatus includes: an acquisition unit that acquires a frame from a video; an extraction unit that extracts a plurality of first partial regions from the frame; a determination unit that uses a classifier to determine whether each of the first partial regions is a tracking target region containing the tracking target object; an estimation unit that estimates the position of the tracking target object from the tracking target regions; a positive example addition unit that adds a second partial region located within a first threshold distance of the estimated position to the training data as a positive example when the classifier determines it to be a positive example; a negative example addition unit that adds a third partial region located at or beyond a second threshold distance from the position of the second partial region added as the positive example to the training data as a negative example when the classifier determines it to be a positive example; and an update unit that updates the classifier using the updated training data.

SELECTED DRAWING: Figure 4

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

Techniques for tracking an arbitrary object in a video have long been known. In recent years, Tracking by Detection, which incorporates object detection into object tracking, has been successful. In Tracking by Detection, a two-class classifier is trained with the tracking target object as the positive example and its background region as the negative example, and the position of the tracking target object is detected using this classifier.

Non-Patent Document 1 discloses a representative example of a method for tracking an arbitrary object in a video. Patent Document 1, Non-Patent Document 3, and Non-Patent Document 4 disclose methods that learn the tracking target object by boosting and then detect it. Non-Patent Document 2 discloses a method that learns the tracking target object with a support vector machine (SVM) and then detects it.

Because these methods learn the background region together with the tracking target object, they reduce confusion with the background and achieve robust tracking. In addition, because learning continues sequentially during tracking, changes in the tracking target object over time are reflected in the classifier. Furthermore, Patent Document 2 discloses a method of switching the recognition model (classifier) when a tracking target object that was lost is tracked again, so that tracking is not affected by the already-learned background.

As described above, conventional object tracking methods based on Tracking by Detection can follow changes in the tracking target object over time by relearning during tracking. Two methods are known for acquiring positive and negative training data at relearning time: (1) treating samples from the region where the tracking target object exists as positive examples and all other samples as negative examples; and (2) adding samples that the classifier judges to be positive as new positive examples, and samples that it judges to be negative as new negative examples.

However, in both methods (1) and (2), the training data grows by the number of samples taken, and processing time increases accordingly. Specifically, in method (1), if the position of the tracking target object is estimated inaccurately, samples are assigned to the wrong class, which degrades accuracy. In method (2), when an object appears that resembles the tracking target object and that the classifier trained so far cannot distinguish from it, that object may be wrongly added to the training data as a positive example.

The present invention has been made in view of the above, and its object is to provide an image processing apparatus, an image processing method, and a program that can keep processing time down while reducing false recognition of objects that resemble the tracking target object.

To solve the problems described above and achieve this object, the present invention includes: an acquisition unit that acquires a frame from a video; an extraction unit that extracts a plurality of first partial regions, each representing part of the frame; a determination unit that determines whether each of the first partial regions is a tracking target region containing the tracking target object, using a classifier trained on training data in which tracking target regions are positive examples and regions that are not tracking target regions are negative examples; an estimation unit that estimates the position of the tracking target object from the plurality of tracking target regions; a positive example addition unit that, when a second partial region within a first threshold distance of the estimated position is determined by the classifier to be a positive example, adds the second partial region to the training data as a positive example; a negative example addition unit that, when a third partial region at or beyond a second threshold distance from the position of the second partial region added as a positive example is determined by the classifier to be a positive example, adds the third partial region to the training data as a negative example; and an update unit that updates the classifier using the training data updated by the positive example addition unit and the negative example addition unit.

According to the present invention, processing time can be kept down and false recognition of objects that resemble the tracking target object can be reduced.

FIG. 1 is a diagram illustrating an example of the functional configuration of the learning device according to the first embodiment.
FIG. 2 is a diagram illustrating an example of positive example regions according to the first embodiment.
FIG. 3 is a diagram illustrating an example of negative example regions according to the first embodiment.
FIG. 4 is a diagram illustrating an example of the functional configuration of the image processing apparatus according to the first embodiment.
FIG. 5 is a flowchart illustrating an example of the operation of the learning device according to the first embodiment.
FIG. 6 is a flowchart illustrating an example of the operation of the image processing apparatus according to the first embodiment.
FIG. 7 is a diagram illustrating an example of the hardware configuration of the learning device and the image processing apparatus according to the first and second embodiments.

A first embodiment of an image processing apparatus, an image processing method, and a program is described in detail below with reference to the accompanying drawings.

(First embodiment)
First, an example of the functional configuration of the learning device according to the first embodiment is described. The learning device according to the first embodiment trains the classifier used by the image processing apparatus according to the first embodiment.

FIG. 1 is a diagram illustrating an example of the functional configuration of the learning device 100 according to the first embodiment. The learning device 100 includes a storage unit 10, a learning control unit 20, an input unit 30, and a display unit 40. The learning control unit 20 includes an acquisition unit 21, a designation unit 22, a positive example acquisition unit 23, a negative example acquisition unit 24, and a learning unit 25.

The storage unit 10 stores the video to be processed. The acquisition unit 21 acquires one frame of that video from the storage unit 10. Any frame may be acquired, for example the first frame of the video or a frame specified by the user. The acquisition unit 21 inputs the frame to the designation unit 22.

On receiving the frame from the acquisition unit 21, the designation unit 22 displays it on the display unit 40. The input unit 30 receives input that designates, within the displayed frame, the tracking target region to be tracked. Any designation method may be used; for example, a rectangle containing the tracking target object may be designated as the tracking target region. On receiving tracking target region information indicating the tracking target region from the input unit 30, the designation unit 22 inputs that information to the positive example acquisition unit 23 and the negative example acquisition unit 24.

On receiving the tracking target region information from the designation unit 22, the positive example acquisition unit 23 acquires positive example regions from the area around the tracking target region indicated by that information. A positive example region is a region that represents a positive example in the training data.

FIG. 2 is a diagram illustrating an example of positive example regions 210a to 210c according to the first embodiment. In the example of FIG. 2, the positive example acquisition unit 23 acquires, as the positive example regions 210a to 210c, rectangles obtained by shifting the tracking target region 200 horizontally and vertically. Hereinafter, the positive example regions 210a to 210c are simply referred to as positive example regions 210 when they need not be distinguished. The amount of shift used to acquire a positive example region 210 may be arbitrary. The horizontal shift is, for example, 10% of the width of the tracking target region 200, and the vertical shift is, for example, 10% of its height.
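The shifting scheme of FIG. 2 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `(x, y, width, height)` rectangle layout, the function name `shifted_regions`, and the 3 × 3 pattern of shifts are all assumptions made for the example; only the 10% shift ratio comes from the text.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (x, y, width, height), origin at the top-left.
Rect = Tuple[float, float, float, float]


def shifted_regions(target: Rect, ratio: float = 0.10) -> List[Rect]:
    """Return copies of `target` shifted horizontally and vertically.

    Each shift is -ratio, 0, or +ratio times the target's width (horizontal)
    or height (vertical), so the unshifted target is included as well.
    """
    x, y, w, h = target
    dx, dy = w * ratio, h * ratio
    regions = []
    for sx in (-1, 0, 1):
        for sy in (-1, 0, 1):
            regions.append((x + sx * dx, y + sy * dy, w, h))
    return regions


# Example: a 40 x 80 tracking target region whose top-left corner is (100, 50).
positives = shifted_regions((100.0, 50.0, 40.0, 80.0))
```

A smaller or larger set of shifts works the same way; the text leaves the amount and pattern of variation open.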

The method of acquiring the positive example regions 210 is not limited to that of FIG. 2 and may be arbitrary. For example, the positive example acquisition unit 23 may divide the tracking target region 200 finely into a plurality of sub-regions and acquire those sub-regions as positive example regions 210. The shape of a positive example region 210 need not be the same as that of the tracking target region 200.

Returning to FIG. 1, the positive example acquisition unit 23 calculates first feature information that represents the features of each positive example region 210. The first feature information may be arbitrary. It may be, for example, a feature vector consisting of the pixel values of the positive example region; if the region is 16 × 16 pixels, this feature vector has 256 dimensions. The first feature information may also be a color histogram, a histogram of edge strength, SIFT features, HOG features, the output values of a neural network, or any combination of these. The positive example acquisition unit 23 may normalize the positive example region to a specific size and calculate the first feature information from the normalized region. The positive example acquisition unit 23 inputs the first feature information to the learning unit 25.
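As a minimal sketch of the pixel-value feature vector, assuming the region is a grayscale patch stored as a list of rows (the function name `pixel_feature_vector` and the synthetic patch are illustrative only):

```python
from typing import List, Sequence


def pixel_feature_vector(patch: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 2-D grayscale patch, row by row, into a 1-D feature vector."""
    return [float(value) for row in patch for value in row]


# A synthetic 16 x 16 patch; a real patch would hold the region's pixel values.
patch = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
features = pixel_feature_vector(patch)  # 16 * 16 = 256 dimensions
```

Resizing every region to the same size before flattening keeps the vector length fixed, which is why the text mentions normalizing the region size.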

Meanwhile, on receiving the tracking target region information from the designation unit 22, the negative example acquisition unit 24 acquires negative example regions from the area outside the tracking target region indicated by that information. A negative example region is a region that represents a negative example in the training data.

FIG. 3 is a diagram illustrating an example of negative example regions 220a to 220n according to the first embodiment. In the example of FIG. 3, the negative example acquisition unit 24 acquires the negative example regions 220a to 220n by dividing the area surrounding the tracking target region 200 into a grid. Hereinafter, the negative example regions 220a to 220n are simply referred to as negative example regions 220 when they need not be distinguished.

The method of acquiring the negative example regions 220 is not limited to that of FIG. 3 and may be arbitrary. For example, the negative example acquisition unit 24 may divide the area surrounding the tracking target region 200 into a grid and then shift the position of each grid cell, either randomly or by a predetermined ratio, to obtain the negative example regions 220a to 220n.
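The grid division can be sketched as follows. The cell size (equal to the tracking target region), the `rings` parameter, and the rule of discarding cells that leave the frame are assumptions of this sketch; the text only states that the surrounding area is divided into a grid.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical layout: (x, y, width, height)


def grid_negative_regions(target: Rect, frame_w: int, frame_h: int,
                          rings: int = 1) -> List[Rect]:
    """Tile the area around `target` with cells of the target's own size.

    `rings` is how many cells to extend in each direction. The cell that
    coincides with the target is skipped, and so are cells that would fall
    partly outside the frame.
    """
    x, y, w, h = target
    regions = []
    for i in range(-rings, rings + 1):
        for j in range(-rings, rings + 1):
            if i == 0 and j == 0:
                continue  # this cell is the tracking target region itself
            cx, cy = x + i * w, y + j * h
            if cx >= 0 and cy >= 0 and cx + w <= frame_w and cy + h <= frame_h:
                regions.append((cx, cy, w, h))
    return regions


# Example: a 40 x 40 target well inside a 640 x 480 frame.
negatives = grid_negative_regions((100, 100, 40, 40), 640, 480)
```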

Returning to FIG. 1, the negative example acquisition unit 24 calculates second feature information that represents the features of each negative example region 220. The second feature information is described in the same way as the feature information of the positive example regions 210, so the description is omitted. The negative example acquisition unit 24 inputs the second feature information of the negative example regions 220 to the learning unit 25.

On receiving the first feature information from the positive example acquisition unit 23 and the second feature information from the negative example acquisition unit 24, the learning unit 25 trains the classifier based on the first and second feature information. The first embodiment describes the case where the learning unit 25 trains the classifier by linear discriminant analysis and where the first and second feature information are the feature vectors described above.

Let N_p be the number of feature vectors of the positive example regions 210, and let N_n be the number of feature vectors of the negative example regions 220. The learning unit 25 first calculates the mean vector m_p and scatter matrix S_p of the feature vectors of the positive example regions 210, and the mean vector m_n and scatter matrix S_n of the feature vectors of the negative example regions 220. The learning unit 25 then calculates the discriminant coefficient vector a according to equation (1).

The learning unit 25 stores (N_p, m_p, S_p, N_n, m_n, S_n, a) in the storage unit 10 as learning result information. The learning unit 25 then constructs, according to equation (2), a discriminant function that identifies whether the region characterized by a feature vector x is a tracking target region.

That is, the discriminant function h(x) of equation (2) is the classifier learned by the learning unit 25. A positive value of h(x) indicates that the region characterized by the feature vector x is a tracking target region. A negative value of h(x) indicates that the region characterized by the feature vector x is a background region.
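The training step can be illustrated with the textbook Fisher linear discriminant. The sketch below assumes a = (S_p + S_n)^-1 (m_p - m_n) and places the decision threshold at the midpoint of the two class means; this standard form is an assumption and may differ from the patent's equations (1) and (2). The small `ridge` term, the helper names, and the toy 2-D data are likewise illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(samples: Sequence[Vector]) -> Vector:
    """Component-wise mean of the sample vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def scatter_matrix(samples: Sequence[Vector], mean: Vector) -> Matrix:
    """Sum of outer products (x - mean)(x - mean)^T over all samples."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [xi - mi for xi, mi in zip(x, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def train_lda(pos: Sequence[Vector], neg: Sequence[Vector],
              ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Return (a, bias) for the assumed form h(x) = a . x + bias."""
    m_p, m_n = mean_vector(pos), mean_vector(neg)
    s_p, s_n = scatter_matrix(pos, m_p), scatter_matrix(neg, m_n)
    d = len(m_p)
    # Within-class scatter; the ridge keeps it invertible (an assumption).
    s_w = [[s_p[i][j] + s_n[i][j] + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    a = solve(s_w, [p - q for p, q in zip(m_p, m_n)])
    midpoint = [(p + q) / 2 for p, q in zip(m_p, m_n)]
    bias = -sum(ai * mi for ai, mi in zip(a, midpoint))
    return a, bias


def h(x: Vector, a: Vector, bias: float) -> float:
    """Discriminant value: positive means tracking target, negative means background."""
    return sum(ai * xi for ai, xi in zip(a, x)) + bias


# Toy 2-D data: positives cluster around (2.5, 2.5), negatives around (-2.5, -2.5).
pos = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0], [3.0, 3.0]]
neg = [[-2.0, -2.0], [-3.0, -2.0], [-2.0, -3.0], [-3.0, -3.0]]
a, bias = train_lda(pos, neg)
```

Storing the counts, means, and scatter matrices alongside a, as the text describes, is what makes later incremental updates possible without keeping every past sample.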

Next, an example of the functional configuration of the image processing apparatus according to the first embodiment is described.

FIG. 4 is a diagram illustrating an example of the functional configuration of the image processing apparatus 300 according to the first embodiment. The image processing apparatus 300 includes a storage unit 310 and an image processing unit 320. The image processing unit 320 includes an acquisition unit 321, an extraction unit 322, a determination unit 323, an estimation unit 324, a positive example addition unit 325, a negative example addition unit 326, and an update unit 327.

The storage unit 310 stores the video to be processed. The storage unit 310 also stores the learning result information and the classifier. The learning result information is (N_p, m_p, S_p, N_n, m_n, S_n, a) described above. The classifier is the discriminant function h(x) given by equation (2) above.

The acquisition unit 321 acquires one frame of the video to be processed from the storage unit 310 as the current frame. In the first iteration, the acquisition unit 321 acquires the frame at which tracking of the tracking target object starts. In the second and subsequent iterations, the acquisition unit 321 acquires a frame that is later in time-series order than the previously selected frame (typically the next frame). The acquisition unit 321 inputs the current frame to the extraction unit 322.

On receiving the current frame from the acquisition unit 321, the extraction unit 322 extracts a plurality of first partial regions from it. Any extraction method may be used. For example, taking the position of the tracking target object estimated in the previous frame as a reference, the extraction unit 322 applies a number of horizontal and vertical shifts as in FIG. 2 and extracts the resulting rectangular regions as the first partial regions.

The shifts that the extraction unit 322 uses to extract the first partial regions are larger than the shifts that the learning device 100 uses when training the classifier. Specifically, the extraction unit 322 sets the size of the shifts according to the expected speed of the tracking target object.
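One way to tie the search range to the expected speed is sketched below. It assumes the speed is given in pixels per frame and that candidates are laid out on a square lattice around the previous estimate; the names `candidate_regions`, `max_speed_px`, and `step_px` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical layout: (x, y, width, height)


def candidate_regions(prev: Rect, max_speed_px: int, step_px: int) -> List[Rect]:
    """Rectangles shifted from the previous estimate by up to `max_speed_px`.

    Shifts are multiples of `step_px` in both directions, so the search window
    grows with the speed the tracking target object is expected to reach
    between frames.
    """
    x, y, w, h = prev
    n = max_speed_px // step_px  # lattice steps on each side of the estimate
    return [(x + i * step_px, y + j * step_px, w, h)
            for i in range(-n, n + 1)
            for j in range(-n, n + 1)]


# Example: the object may move up to 16 pixels per frame; sample every 8 pixels.
candidates = candidate_regions((100, 100, 40, 40), max_speed_px=16, step_px=8)
```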

In the first iteration there is no information from a previous frame, so the extraction unit 322 may extract the first partial regions by dividing the entire current frame into a plurality of regions.

Hereinafter, information indicating a first partial region is referred to as first partial region information. The extraction unit 322 inputs the plurality of pieces of first partial region information to the determination unit 323.

On receiving the first partial region information from the extraction unit 322, the determination unit 323 determines whether the first partial region indicated by each piece of information is a tracking target region containing the tracking target object. Specifically, the determination unit 323 first calculates the feature information of each first partial region. Next, using the above-described classifier trained by the learning device 100 (see equation (2)), the determination unit 323 determines whether the first partial region characterized by that feature information is a tracking target region. When a first partial region is determined to be a tracking target region, the determination unit 323 inputs tracking target region information indicating that region to the estimation unit 324.

The estimation unit 324 receives the tracking target region information from the determination unit 323. When it receives multiple pieces of tracking target region information, the estimation unit 324 estimates the position of the tracking target object based on the tracking target regions they indicate.

For example, the estimation unit 324 estimates the position of the tracking target object as the average of the positions of the tracking target regions. Any specific method of estimating the position may be used.

The estimation unit 324 may instead estimate the position of the tracking target object by taking a weighted average based on the values of the discriminant function h(x) of equation (2) above. The weight w_i of this weighted average is calculated, for example, according to equation (3).

Here, α (> 0) is a parameter. The estimation unit 324 then calculates the estimated position of the tracking target object according to equation (4), taking the center position of the i-th first partial region to be (x_i, y_i).

As a simpler alternative, the estimation unit 324 may use the center position (x_i, y_i) of the i-th first partial region whose discriminant function value h_i is largest directly as the estimated position of the tracking target object.
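Both estimation strategies can be sketched briefly. The weighting used here, w_i proportional to exp(α h_i) and normalized to sum to 1, is only an assumed form chosen for illustration and may not match the patent's equation (3); the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def estimate_position(centers: Sequence[Point], scores: Sequence[float],
                      alpha: float = 1.0) -> Point:
    """Weighted average of region centers, weighted by discriminant value.

    Assumed weighting: w_i = exp(alpha * h_i) / sum_j exp(alpha * h_j).
    """
    top = max(scores)
    raw = [math.exp(alpha * (s - top)) for s in scores]  # shifted for stability
    total = sum(raw)
    weights: List[float] = [r / total for r in raw]
    x = sum(w * c[0] for w, c in zip(weights, centers))
    y = sum(w * c[1] for w, c in zip(weights, centers))
    return (x, y)


def best_position(centers: Sequence[Point], scores: Sequence[float]) -> Point:
    """Simpler alternative: the center of the region with the largest h_i."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return centers[best]


centers = [(0.0, 0.0), (10.0, 0.0)]
averaged = estimate_position(centers, [1.0, 1.0])  # equal scores, equal weights
picked = best_position(centers, [0.2, 0.9])
```

Under this assumed weighting, a larger α pulls the weighted average toward the highest-scoring region, so the second function is the limiting case of the first.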

Using the above-described classifier trained by the learning device 100 (see equation (2)), the estimation unit 324 determines whether an estimated region containing the estimated position of the tracking target object is a tracking target region. The feature x used for this determination is, for example, the feature vector described above. The estimated region may have any shape; for example, the estimation unit 324 may use a rectangle centered on the estimated position as the estimated region.

The estimation unit 324 may also acquire a plurality of estimated regions by shifting the position of the rectangle centered on the estimated position. For example, the estimation unit 324 may acquire a predetermined number of second partial regions that lie within a first threshold distance of the estimated position as estimated regions. The distance between the estimated position and a second partial region is, for example, the distance between the coordinates of the estimated position and the coordinates of the center of the second partial region obtained by shifting the rectangle.

When an estimated region is a tracking target region, the estimation unit 324 inputs estimated region information indicating that region to the positive example addition unit 325 and the negative example addition unit 326.

To reduce processing time, the estimation unit 324 may reuse the results of the determination unit 323. Specifically, from among the tracking target regions obtained by the determination unit 323, the estimation unit 324 may input to the positive example addition unit 325 and the negative example addition unit 326 the tracking target region information of those regions whose distance from the estimated position is no more than the first threshold. The distance between the estimated position and a tracking target region is, for example, the distance between the coordinates of the estimated position and the coordinates of the center of the tracking target region.

正例追加部３２５は、推定部３２４から推定領域情報を受け付けると、当該推定領域情報が示す推定領域（第２の部分領域）の第１の特徴量情報を、正例として訓練データに追加する。第１の特徴量情報は、例えば推定領域を特徴付ける上述の特徴ベクトルである。 When the positive example addition unit 325 receives the estimation region information from the estimation unit 324, the positive example addition unit 325 adds the first feature amount information of the estimation region (second partial region) indicated by the estimation region information to the training data as a positive example. . The first feature amount information is, for example, the above-described feature vector that characterizes the estimation region.

負例追加部３２６は、推定部３２４から推定領域情報を受け付けると、当該推定領域情報が示す推定領域に基づいて、負例を示す負例領域を決定する。負例追加部３２６は、推定部３２４から受け付けた推定領域情報が示す推定領域から第２の閾値以上の距離にある第３の部分領域を、所定の数だけ取得する。そして負例追加部３２６は、複数の第３の部分領域のうち、分類器により正例領域であると判定された第３の部分領域の第２の特徴量情報を、負例として訓練データに追加する。 When the negative example addition unit 326 receives the estimation region information from the estimation unit 324, the negative example addition unit 326 determines a negative example region indicating a negative example based on the estimation region indicated by the estimation region information. The negative example adding unit 326 acquires a predetermined number of third partial regions that are at a distance equal to or larger than the second threshold from the estimated region indicated by the estimated region information received from the estimating unit 324. Then, the negative example addition unit 326 converts the second feature amount information of the third partial region determined as the positive example region by the classifier from the plurality of third partial regions into the training data as a negative example. to add.

正例領域であると判定された部分領域を、負例領域として訓練データに追加する理由は、“追跡対象物体の推定位置から第２の閾値以上の距離にある領域に追跡対象物体は存在しない”という前提条件による。すなわち追跡対象物体の推定位置から十分に離れた場所で追跡対象物体が検出された場合は、分類器の学習が不十分で類似物体を識別できていないとみなす。そこで、負例追加部３２６が、追跡対象物体と誤認識されるサンプルを負例として訓練データに追加する。具体的には、負例追加部３２６は、訓練データに正例として追加された第２の部分領域の位置から第２の閾値以上の距離にある第３の部分領域が、分類器により正例領域であると判定された場合、当該第３の部分領域を、負例領域として訓練データに追加する。これにより、後述の更新部３２７が、当該訓練データを使用して分類器を更新（再学習）することにより、追跡対象物体の識別精度を上げることができる。 A partial region determined to be a positive example region is added to the training data as a negative example region because of the precondition that "the tracking target object does not exist in any region at a distance equal to or greater than the second threshold from the estimated position of the tracking target object." That is, if the tracking target object is detected at a location sufficiently far from its estimated position, the classifier is considered to be insufficiently trained and unable to distinguish similar objects. The negative example addition unit 326 therefore adds samples that are misrecognized as the tracking target object to the training data as negative examples. Specifically, when a third partial region located at a distance equal to or greater than the second threshold from the position of the second partial region added to the training data as a positive example is determined by the classifier to be a positive example region, the negative example addition unit 326 adds that third partial region to the training data as a negative example region. The update unit 327, described later, then updates (retrains) the classifier using this training data, which improves the accuracy with which the tracking target object is identified.

なお処理時間を低減するため、負例追加部３２６は判定部３２３の処理結果を利用してもよい。具体的には、負例追加部３２６は、上述の判定部３２３により得られた追跡対象領域のうち、推定部３２４により推定された推定位置との距離が第２の閾値以上である追跡対象領域の特徴量情報を、負例として訓練データに追加してもよい。 To reduce processing time, the negative example addition unit 326 may reuse the processing result of the determination unit 323. Specifically, among the tracking target regions obtained by the determination unit 323, the negative example addition unit 326 may add to the training data, as negative examples, the feature amount information of those tracking target regions whose distance from the estimated position estimated by the estimation unit 324 is equal to or greater than the second threshold.
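The selection rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `center` and `select_examples`, the `(x, y, width, height)` box format, and the convention that a score greater than 0 means "classified as positive" are all assumptions made for the example. It follows the time-saving variant in which the regions already judged by the determination unit are reused and distances are measured from the estimated position.

```python
import math


def center(region):
    """Return the center (cx, cy) of a region given as (x, y, width, height)."""
    x, y, w, h = region
    return (x + w / 2.0, y + h / 2.0)


def select_examples(regions, scores, estimated_pos, t1, t2):
    """Split classifier-positive regions into new positives and hard negatives.

    regions:       list of (x, y, width, height) boxes (assumed format)
    scores:        classifier outputs; score > 0 is treated as "positive"
    estimated_pos: (x, y) estimated position of the tracking target object
    t1, t2:        first and second distance thresholds (assumed t1 < t2)
    """
    positives, negatives = [], []
    for region, score in zip(regions, scores):
        if score <= 0:
            continue  # classified as background: not added to the training data
        d = math.dist(center(region), estimated_pos)
        if d <= t1:
            positives.append(region)   # close to the estimate: positive example
        elif d >= t2:
            negatives.append(region)   # far away but judged positive: negative example
    return positives, negatives
```

Regions whose distance falls strictly between the two thresholds are added to neither set in this sketch, since the text assigns no label to them.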

更新部３２７は、正例追加部３２５と負例追加部３２６により更新された訓練データを使用して分類器を更新する。更新部３２７による分類器の更新方法を、分類器に線形判別分析法（線形判別分析に基づく２値分類）を用いる場合を例にして説明する。具体的には、更新部３２７は、更新された訓練データの正例及び負例を用いて、正例の特徴ベクトルのサンプル数Ｎ_ｐ、平均ベクトルｍ_ｐ及び変動行列Ｓ_ｐ、負例の特徴ベクトルのサンプル数Ｎ_ｎ、平均ベクトルｍ_ｎ及び変動行列Ｓ_ｎ、並びに、判別係数ベクトルａを更新する。更新部３２７は、（ｍ_ｐ，ｍ_ｎ，ａ）から、上述の式（２）の識別関数（分類器）を更新する。 The update unit 327 updates the classifier using the training data updated by the positive example addition unit 325 and the negative example addition unit 326. The update method is described below for the case where the classifier uses linear discriminant analysis (binary classification based on linear discriminant analysis). Specifically, using the positive and negative examples of the updated training data, the update unit 327 updates the number of samples N_p, the mean vector m_p, and the scatter matrix S_p of the positive example feature vectors, the number of samples N_n, the mean vector m_n, and the scatter matrix S_n of the negative example feature vectors, and the discriminant coefficient vector a. The update unit 327 then updates the discriminant function (classifier) of equation (2) above from (m_p, m_n, a).

なお更新部３２７は、（Ｎ_ｐ，ｍ_ｐ，Ｓ_ｐ，Ｎ_ｎ，ｍ_ｎ，Ｓ_ｎ）を記憶部３１０に記憶しておけば、過去の訓練データを破棄しても、新しい訓練データのみから（Ｎ_ｐ，ｍ_ｐ，Ｓ_ｐ，Ｎ_ｎ，ｍ_ｎ，Ｓ_ｎ）の値を更新して判別係数ベクトルａを求めることができる。また更新部３２７は、記憶部３１０のデータ量の削減のため、Ｓ_ｐ及びＳ_ｎを記憶する代わりにＳ_ｐ及びＳ_ｎの固有値が大きい方から特定個の固有ベクトルと対応する固有値を、記憶部３１０に記憶してもよい。 If the update unit 327 stores (N_p, m_p, S_p, N_n, m_n, S_n) in the storage unit 310, it can update these values from the new training data alone and obtain the discriminant coefficient vector a, even if the past training data has been discarded. In addition, to reduce the amount of data in the storage unit 310, the update unit 327 may store, instead of S_p and S_n themselves, a specific number of eigenvectors of S_p and S_n, taken in descending order of eigenvalue, together with the corresponding eigenvalues.
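One way to update the stored statistics from new samples alone is sketched below. It relies on the standard identity for merging two sample sets, S = S_1 + S_2 + (N_1 N_2 / (N_1 + N_2)) (m_1 − m_2)(m_1 − m_2)^T. Equation (2) is not reproduced in this section, so the sketch assumes the usual Fisher form a = (S_p + S_n)^{-1} (m_p − m_n); the function names and the small `ridge` term added for numerical stability are assumptions of this example, not part of the source.

```python
def merge_stats(n_old, m_old, s_old, samples):
    """Fold new feature vectors into a stored (count, mean, scatter matrix)."""
    d = len(m_old)
    k = len(samples)
    if k == 0:
        return n_old, m_old, s_old
    # Mean and scatter matrix of the new batch on its own.
    m_new = [sum(x[j] for x in samples) / k for j in range(d)]
    s_new = [[sum((x[a] - m_new[a]) * (x[b] - m_new[b]) for x in samples)
              for b in range(d)] for a in range(d)]
    # Merge the batch with the stored statistics; old samples are not needed.
    n = n_old + k
    m = [(n_old * m_old[j] + k * m_new[j]) / n for j in range(d)]
    diff = [m_old[j] - m_new[j] for j in range(d)]
    c = n_old * k / n
    s = [[s_old[a][b] + s_new[a][b] + c * diff[a] * diff[b]
          for b in range(d)] for a in range(d)]
    return n, m, s


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def discriminant_vector(m_p, s_p, m_n, s_n, ridge=1e-6):
    """Assumed Fisher form: a = (S_p + S_n + ridge * I)^-1 (m_p - m_n)."""
    d = len(m_p)
    s_w = [[s_p[a][b] + s_n[a][b] + (ridge if a == b else 0.0)
            for b in range(d)] for a in range(d)]
    return solve(s_w, [m_p[j] - m_n[j] for j in range(d)])
```

Starting from a count of 0 with a zero mean and zero scatter matrix, calling `merge_stats` once per frame for each class keeps the stored tuple current, after which `discriminant_vector` recomputes a.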

次に第１実施形態の画像処理方法について説明する。 Next, an image processing method according to the first embodiment will be described.

図５は第１実施形態の学習装置１００の動作例を示すフローチャートである。はじめに、取得部２１が、記憶部１０から処理対象の動画に含まれるフレームを１つ取得する（ステップＳ１）。次に、指定部２２が、フレーム内で追跡の対象とする追跡対象領域を示す追跡対象領域情報を受け付ける（ステップＳ２）。次に、正例取得部２３が、追跡対象領域情報が示す追跡対象領域の周辺の領域（図２参照）から正例領域２１０を取得する（ステップＳ３）。次に、正例取得部２３が、正例領域２１０の特徴を示す第１の特徴量情報を算出する（ステップＳ４）。次に、負例取得部２４が、追跡対象領域情報が示す追跡対象領域の外の領域（図３参照）から負例領域２２０を取得する（ステップＳ５）。次に、負例取得部２４が、負例領域２２０の特徴を示す第２の特徴量情報を算出する（ステップＳ６）。次に、学習部２５が、第１及び第２の特徴量情報に基づいて分類器（式（２）参照）を学習する（ステップＳ７）。 FIG. 5 is a flowchart illustrating an operation example of the learning device 100 according to the first embodiment. First, the acquisition unit 21 acquires one frame included in the moving image to be processed from the storage unit 10 (step S1). Next, the designation unit 22 receives tracking target region information indicating the tracking target region to be tracked in the frame (step S2). Next, the positive example acquisition unit 23 acquires positive example regions 210 from the area around the tracking target region indicated by the tracking target region information (see FIG. 2) (step S3). Next, the positive example acquisition unit 23 calculates first feature amount information indicating the features of the positive example regions 210 (step S4). Next, the negative example acquisition unit 24 acquires negative example regions 220 from the area outside the tracking target region indicated by the tracking target region information (see FIG. 3) (step S5). Next, the negative example acquisition unit 24 calculates second feature amount information indicating the features of the negative example regions 220 (step S6). Next, the learning unit 25 trains the classifier (see equation (2)) based on the first and second feature amount information (step S7).

図６は第１実施形態の画像処理装置３００の動作例を示すフローチャートである。はじめに、取得部３２１が、記憶部３１０から処理対象の動画に含まれるフレームを、カレントフレームとして１つ取得する（ステップＳ２１）。次に、抽出部３２２が、カレントフレームから複数の第１の部分領域を抽出する（ステップＳ２２）。次に、判定部３２３が、分類器（式（２）参照）により追跡対象領域であると判定された第１の部分領域を、追跡対象物体を含む追跡対象領域として検出する（ステップＳ２３）。次に、推定部３２４が、複数の追跡対象領域がある場合、各追跡対象領域に基づいて追跡対象物体の位置を推定する（ステップＳ２４）。 FIG. 6 is a flowchart illustrating an operation example of the image processing apparatus 300 according to the first embodiment. First, the acquiring unit 321 acquires one frame included in the processing target moving image from the storage unit 310 as a current frame (step S21). Next, the extraction unit 322 extracts a plurality of first partial areas from the current frame (step S22). Next, the determination unit 323 detects the first partial region determined as the tracking target region by the classifier (see Expression (2)) as the tracking target region including the tracking target object (step S23). Next, when there are a plurality of tracking target areas, the estimation unit 324 estimates the position of the tracking target object based on each tracking target area (step S24).

次に、正例追加部３２５が、追跡対象物体の推定位置から第１の閾値以内の距離にある第２の部分領域が、分類器により正例であると判定された場合、第２の部分領域を正例として訓練データに追加する（ステップＳ２５）。次に、負例追加部３２６が、正例として追加された第２の部分領域の位置から第２の閾値以上の距離にある第３の部分領域が、分類器により正例であると判定された場合、第３の部分領域を、負例として訓練データに追加する（ステップＳ２６）。 Next, when a second partial region located within the first threshold distance of the estimated position of the tracking target object is determined by the classifier to be a positive example, the positive example addition unit 325 adds the second partial region to the training data as a positive example (step S25). Next, when a third partial region located at a distance equal to or greater than the second threshold from the position of the second partial region added as a positive example is determined by the classifier to be a positive example, the negative example addition unit 326 adds the third partial region to the training data as a negative example (step S26).

次に、更新部３２７が、ステップＳ２５で正例追加部３２５により更新された訓練データと、ステップＳ２６で負例追加部３２６により更新された訓練データを使用して分類器を更新する（ステップＳ２７）。 Next, the updating unit 327 updates the classifier using the training data updated by the positive example adding unit 325 in step S25 and the training data updated by the negative example adding unit 326 in step S26 (step S27). ).

次に、更新部３２７が、カレントフレームが動画の最終フレームであるか否かを判定する。最終フレームでない場合（ステップＳ２８、Ｎｏ）、処理はステップＳ２１に戻る。最終フレームである場合（ステップＳ２８、Ｙｅｓ）、処理は終了する。 Next, the update unit 327 determines whether the current frame is the final frame of the moving image. If it is not the final frame (step S28, No), the process returns to step S21. If it is the final frame (step S28, Yes), the process ends.

以上説明したように、第１実施形態の画像処理装置３００によれば、追跡対象物体と背景とを学習した分類器を用いることで、追跡対象と背景の混同を低減し、ロバストな物体追跡を実現することができる。特に、各フレームで分類器の更新を行う際、追跡対象物体と誤判定されるサンプルのみを負例として訓練データに追加するので、訓練データに冗長なサンプルを増加させることがなく、かつ追跡対象物体に類似する物体との誤認識を低減させることができる。 As described above, the image processing apparatus 300 of the first embodiment uses a classifier that has learned both the tracking target object and the background, which reduces confusion between the tracking target and the background and enables robust object tracking. In particular, when the classifier is updated in each frame, only samples erroneously determined to be the tracking target object are added to the training data as negative examples. This avoids adding redundant samples to the training data while reducing misrecognition of objects that resemble the tracking target object.

（第１実施形態の変形例１） (Modification 1 of the first embodiment)
次に第１実施形態の変形例１について説明する。変形例１では、分類器として２クラス分類のサポートベクターマシン（ＳＶＭ）が使用される場合について説明する。変形例１の説明では、第１実施形態と同様の説明については省略し、第１実施形態と異なる箇所について説明する。 Next, Modification 1 of the first embodiment will be described. Modification 1 covers the case where a two-class support vector machine (SVM) is used as the classifier. Descriptions that are the same as in the first embodiment are omitted, and only the differences from the first embodiment are described.

ＳＶＭによる学習では、Ｍ個のＤ次元サポートベクトルｓ^（ｉ）（ｉ＝１，２，・・・，Ｍ）、対応するＭ個の実数の重みω_ｉ（ｉ＝１，２，・・・，Ｍ）、及び、オフセットｃ（ｃは実数）が求まり、次式（５）により識別関数ｈ（ｘ）が得られる。 Training the SVM yields M D-dimensional support vectors s^(i) (i = 1, 2, ..., M), the corresponding M real-valued weights ω_i (i = 1, 2, ..., M), and an offset c (a real number), and the discriminant function h(x) is obtained from equation (5).

領域の特徴を示すＤ次元特徴ベクトルｘに対して、ｈ（ｘ）が正であれば、当該Ｄ次元特徴ベクトルｘにより特徴付けられる領域に追跡対象物体が含まれると予測され、ｈ（ｘ）が負であれば、当該Ｄ次元特徴ベクトルｘにより特徴付けられる領域が背景である（追跡対象物体でない）と予測される。ここでＫはカーネル関数Ｒ^Ｄ×Ｒ^Ｄ→Ｒである。なおＲ^ＤはＤ次元実ベクトル空間の集合を示す。Ｒは１次元実ベクトル空間の集合を示す。 For a D-dimensional feature vector x that describes a region, if h(x) is positive, the region characterized by x is predicted to contain the tracking target object; if h(x) is negative, the region is predicted to be background (not the tracking target object). Here, K is a kernel function R^D × R^D → R, where R^D denotes the D-dimensional real vector space and R denotes the one-dimensional real vector space (the real numbers).

カーネル関数は、例えば次式（６）のＲＢＦカーネルである。ここでγはパラメータである。なおＳＶＭの学習法の詳細は非特許文献２を参照されたい。 The kernel function is, for example, an RBF kernel represented by the following formula (6). Here, γ is a parameter. Refer to Non-Patent Document 2 for details of the SVM learning method.
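Equations (5) and (6) are not reproduced in this text. As a rough illustration only, the sketch below assumes the standard forms h(x) = Σ_i ω_i K(s^(i), x) + c and K(u, v) = exp(−γ‖u − v‖²); the sign convention of the offset and the function names are assumptions of this example.

```python
import math


def rbf_kernel(u, v, gamma):
    """Assumed RBF kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, weights, offset, gamma):
    """Assumed discriminant function: weighted kernel sum over support vectors plus offset.

    A positive value predicts "tracking target object", a negative value "background".
    """
    total = sum(w * rbf_kernel(s, x, gamma)
                for s, w in zip(support_vectors, weights))
    return total + offset
```

With one positively weighted and one negatively weighted support vector, a feature vector near the first yields a positive value and one near the second a negative value.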

上述の式（５）の分類器の初期学習は、第１実施形態の学習装置１００と同様の順序で行われるが、分類器の具体的な学習処理がＳＶＭに置き換わる。 The initial learning of the classifier of the above formula (5) is performed in the same order as the learning device 100 of the first embodiment, but the specific learning process of the classifier is replaced with SVM.

また変形例１の画像処理装置３００の動作は、第１実施形態の画像処理装置３００と同様であるが、第１実施形態の式（２）の分類器が、上述の式（５）の分類器に置き換わる。 The operation of the image processing apparatus 300 of Modification 1 is the same as that of the image processing apparatus 300 of the first embodiment, except that the classifier of equation (2) of the first embodiment is replaced with the classifier of equation (5) above.

しかしながらＳＶＭを使用する変形例１では、訓練データが増えるにつれて学習にかかる処理時間が増大する。そこで、変形例１の正例追加部３２５は、正例として新しく追加されたサンプルの数だけ、過去に追加されたサンプルを訓練データから削除する。 However, in Modification 1 using SVM, the processing time required for learning increases as the training data increases. Therefore, the positive example adding unit 325 according to the first modification deletes samples added in the past from the training data by the number of samples newly added as positive examples.

削減の具体的な方法は任意でよい。正例追加部３２５は、例えば過去に訓練データに追加されたサンプルのうち、古いサンプルから順に削減してもよい。また例えば、正例追加部３２５は、訓練データからサポートベクトル以外のサンプルを削除してもよい。また例えば、正例追加部３２５は、訓練データから新しく追加されたサンプルに類似するサンプルを削減してもよい。 A specific method of reduction may be arbitrary. For example, the positive example adding unit 325 may sequentially reduce the oldest samples among the samples added to the training data in the past. Further, for example, the positive example adding unit 325 may delete samples other than the support vector from the training data. Further, for example, the positive example adding unit 325 may reduce samples similar to the newly added sample from the training data.

同様に、変形例１の負例追加部３２６は、負例として新しく追加されたサンプルの数だけ、過去に追加されたサンプルを訓練データから削除する。 Similarly, the negative example addition unit 326 of Modification 1 deletes from the training data as many previously added samples as the number of samples newly added as negative examples.
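The "oldest first" reduction strategy can be sketched with a fixed-capacity buffer per class. The class name `TrainingSet` and the `capacity` parameter are hypothetical; the sketch also differs slightly from the text in that nothing is deleted until a buffer is full, after which each added sample evicts exactly one old sample.

```python
from collections import deque


class TrainingSet:
    """Bounded training data: once full, each new sample replaces the oldest one."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest item when a new one is appended.
        self.positives = deque(maxlen=capacity)
        self.negatives = deque(maxlen=capacity)

    def add_positive(self, feature):
        self.positives.append(feature)

    def add_negative(self, feature):
        self.negatives.append(feature)

    def samples(self):
        """Return (features, labels) with +1 for positives and -1 for negatives."""
        features = list(self.positives) + list(self.negatives)
        labels = [1] * len(self.positives) + [-1] * len(self.negatives)
        return features, labels
```

Keeping the sample count constant bounds the retraining cost per frame, which is the motivation given for Modification 1. The other strategies mentioned (keeping only support vectors, or dropping samples similar to the new ones) would need a different eviction rule.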

変形例１の更新部３２７は、正例追加部３２５及び負例追加部３２６により更新された訓練データを元に、分類器（ＳＶＭ）を更新（再学習）し、更新されたＳＶＭを記憶部３１０に記憶する。 The update unit 327 of Modification 1 updates (relearns) the classifier (SVM) based on the training data updated by the positive example addition unit 325 and the negative example addition unit 326, and stores the updated SVM. Store in 310.

変形例１によれば、分類器にＳＶＭを使用することにより、特徴空間で追跡対象物体と背景が線形分離不可能の場合でも、ＳＶＭの非線形分離能力により、追跡対象物体を背景から分離して追跡を行うことが可能となる。 According to the first modification, by using SVM as a classifier, even when the tracking target object and the background cannot be linearly separated in the feature space, the tracking target object is separated from the background by the non-linear separation capability of the SVM. Tracking can be performed.

（第１実施形態の変形例２） (Modification 2 of the first embodiment)
次に第１実施形態の変形例２について説明する。変形例２では、分類器としてＡｄａＢｏｏｓｔが使用される場合について説明する。変形例２の説明では、第１実施形態と同様の説明については省略し、第１実施形態と異なる箇所について説明する。 Next, Modification 2 of the first embodiment will be described. Modification 2 covers the case where AdaBoost is used as the classifier. Descriptions that are the same as in the first embodiment are omitted, and only the differences from the first embodiment are described.

ＡｄａＢｏｏｓｔでは、Ｍ個の弱識別器ｈ_ｉ（ｘ）（ｉ＝１，２，…，Ｍ）の重みω_ｉの線形和により、強識別器ｈ（ｘ）を次式（７）により構成する。 In AdaBoost, the strong classifier h(x) is constructed by equation (7) as a linear sum of M weak classifiers h_i(x) (i = 1, 2, ..., M), each weighted by ω_i.
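Equation (7) is not reproduced in this text. Assuming it has the usual form h(x) = Σ_i ω_i h_i(x), the strong classifier can be sketched as below. The decision stump used as the weak classifier is an illustrative choice for this example only; the source defers weak classifier training to Non-Patent Document 3.

```python
def make_stump(dim, threshold, polarity=1):
    """Hypothetical weak classifier: thresholds one feature and returns +1 or -1."""
    def stump(x):
        return polarity if x[dim] > threshold else -polarity
    return stump


def strong_classifier(x, weak_classifiers, weights):
    """Assumed strong classifier: weighted linear sum of the weak classifier outputs."""
    return sum(w * h(x) for h, w in zip(weak_classifiers, weights))
```

As with the other classifiers, a positive value of the sum would be read as "tracking target object" and a negative value as "background".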

弱識別器の学習については、非特許文献３と同様の方法を取ることができる。 For learning of the weak classifier, the same method as in Non-Patent Document 3 can be used.

上述の式（７）の分類器の初期学習は、第１実施形態の学習装置１００と同様の順序で行われるが、分類器の具体的な学習処理がＡｄａＢｏｏｓｔに置き換わる。 The initial learning of the classifier of Equation (7) is performed in the same order as the learning device 100 of the first embodiment, but the specific learning process of the classifier is replaced with AdaBoost.

また変形例２の画像処理装置３００の動作は、第１実施形態の画像処理装置３００と同様であるが、第１実施形態の式（２）の分類器が、上述の式（７）の分類器に置き換わる。 The operation of the image processing apparatus 300 of Modification 2 is the same as that of the image processing apparatus 300 of the first embodiment, except that the classifier of equation (2) of the first embodiment is replaced with the classifier of equation (7) above.

なお変形例２の更新部３２７は、正例追加部３２５及び負例追加部３２６により更新された訓練データを元に、分類器を更新（再学習）し、更新された弱識別器ｈ_ｉ（ｘ）と、更新された重みω_ｉとを記憶部３１０に記憶する。 The update unit 327 of the second modification updates (re-learns) the classifier based on the training data updated by the positive example addition unit 325 and the negative example addition unit 326, and updates the weak classifier h _i ( x) and the updated weight ω _i are stored in the storage unit 310.

ＡｄａＢｏｏｓｔでは、非特許文献３のようなオンライン学習を行うことができるので、古い訓練データは破棄することができる。オンライン学習を行わない場合は、記憶部３１０が訓練データを記憶する必要がある。しかしながら、正例追加部３２５及び負例追加部３２６が、変形例１の場合と同様に、古いサンプルや類似するサンプルから順に記憶部３１０の訓練データを削減してもよい。 Since AdaBoost can perform online learning as in Non-Patent Document 3, old training data can be discarded. When online learning is not performed, the storage unit 310 needs to store training data. However, the positive example adding unit 325 and the negative example adding unit 326 may reduce the training data in the storage unit 310 in order from an old sample or a similar sample, as in the first modification.

変形例２によれば、分類器にＡｄａＢｏｏｓｔを使用することにより、特徴空間で追跡対象物体と背景が線形分離不可能の場合でも、ＡｄａＢｏｏｓｔの非線形分離能力により、追跡対象物体を背景から分離して追跡を行うことが可能となる。 According to the second modification, by using AdaBoost as the classifier, even if the tracking target object and the background cannot be linearly separated in the feature space, the tracking target object is separated from the background by the non-linear separation capability of AdaBoost. Tracking can be performed.

（第２実施形態） (Second Embodiment)
次に第２実施形態について説明する。第２実施形態の説明では、追跡対象物体の動作予測にパーティクルフィルタを用いる場合について説明する。第２実施形態の説明では、第１実施形態と同様の説明については省略し、第１実施形態と異なる箇所について説明する。 Next, a second embodiment will be described. The second embodiment covers the case where a particle filter is used to predict the motion of the tracking target object. Descriptions that are the same as in the first embodiment are omitted, and only the differences from the first embodiment are described.

パーティクルフィルタでは、追跡対象物体のパラメータを有するＮ個のパーティクルの集合により状態表現が行われる。第２実施形態の画像処理装置３００は、パーティクルの更新により追跡対象物体の動作を予測し、当該追跡対象物体の位置を推定する。以下ではパラメータとしてパーティクルの位置及び速度を用いる場合について説明する。ｎ番目のパーティクルの位置を（ｘ^（ｎ），ｙ^（ｎ））とする。またｎ番目のパーティクルの速度を（／ｘ^（ｎ），／ｙ^（ｎ））とする。ただし、”／”は、変数ｘ、ｙ上部の微分記号・を示す。またｎ番目のパーティクルの位置及び速度をまとめてｚ^（ｎ）＝（ｘ^（ｎ），ｙ^（ｎ），／ｘ^（ｎ），／ｙ^（ｎ））と表記する。またｎ番目のパーティクルの重みをｗ^（ｎ）と表記する。 In a particle filter, the state is represented by a set of N particles, each holding parameters of the tracking target object. The image processing apparatus 300 of the second embodiment predicts the motion of the tracking target object by updating the particles and estimates the position of the tracking target object. The following describes the case where the position and velocity of each particle are used as parameters. Let the position of the n-th particle be (x^(n), y^(n)) and its velocity be (/x^(n), /y^(n)), where "/" stands for the dot (time derivative) written above the variables x and y. The position and velocity of the n-th particle are written together as z^(n) = (x^(n), y^(n), /x^(n), /y^(n)), and the weight of the n-th particle is written w^(n).

第２実施形態の画像処理装置３００で使用される分類器の初期学習は、第１実施形態の学習装置１００と同様の順序で行われる。なお第２実施形態で使用される分類器は、線形判別法（第１実施形態参照）、ＳＶＭ（第１実施形態の変形例１参照）、及び、ＡｄａＢｏｏｓｔ（第１実施形態の変形例２参照）のいずれでもよい。また、これら以外の２クラス分類が可能なアルゴリズムを使用することもできる。 The initial learning of the classifier used in the image processing apparatus 300 of the second embodiment is performed in the same order as the learning apparatus 100 of the first embodiment. Note that the classifiers used in the second embodiment are a linear discriminant (see the first embodiment), SVM (see the first modification of the first embodiment), and AdaBoost (see the second modification of the first embodiment). ). In addition, other algorithms that can be classified into two classes can also be used.

また第２実施形態の画像処理装置３００の動作は、第１実施形態の画像処理装置３００と同様であるが、抽出部３２２及び推定部３２４の動作が第１実施形態の画像処理装置３００の動作と異なる。 The operation of the image processing apparatus 300 of the second embodiment is the same as that of the image processing apparatus 300 of the first embodiment, except that the operations of the extraction unit 322 and the estimation unit 324 differ from those of the first embodiment.

記憶部３１０は、パーティクルの初期位置及び初期速度を記憶する。パーティクルの初期位置は、学習装置１００で指定されたユーザ指定の追跡位置である。パーティクルの初期速度は０である。 The storage unit 310 stores the initial position and initial velocity of particles. The initial positions of the particles are user-specified tracking positions specified by the learning apparatus 100. The initial velocity of the particles is zero.

抽出部３２２は、取得部３２１からカレントフレームを受け付けると、当該カレントフレームから複数の第１の部分領域をパーティクルフィルタに基づいて抽出する。抽出部３２２は、例えばＮ個のパーティクル｛ｚ^（ｎ）｝^Ｎ _ｎ＝１から重複を許してＮ個のパーティクルを再サンプリングすることにより、｛ｚ^（ｎ） _ｔｍｐ｝^Ｎ _ｎ＝１を取得する。再サンプリングは各パーティクルの重みｗ^（ｎ）の比率に応じて行われる。つまり、抽出部３２２は、ｎ番目のパーティクルを次式（８）の確率Ｐ_ｎで選択する。 On receiving the current frame from the acquisition unit 321, the extraction unit 322 extracts a plurality of first partial regions from the current frame based on the particle filter. For example, the extraction unit 322 resamples N particles, with replacement, from the N particles {z^(n)}_{n=1}^{N} to obtain {z^(n)_tmp}_{n=1}^{N}. Resampling is performed in proportion to the weight w^(n) of each particle; that is, the extraction unit 322 selects the n-th particle with the probability P_n given by equation (8).

そして、抽出部３２２は、次式（９）〜（１２）により、Ｎ個のパーティクル（ｎ＝１，２，…，Ｎ）を更新する。 Then, the extraction unit 322 updates N particles (n = 1, 2,..., N) according to the following equations (9) to (12).

ここでΔｔはパラメータであり、ε_１，…，ε_４は乱数である。抽出部３２２は、１つのパーティクルの位置を示す座標を中心に含む第１の部分領域を抽出する。 Here, Δt is a parameter, and ε_1, ..., ε_4 are random numbers. For each particle, the extraction unit 322 extracts a first partial region centered on the coordinates indicating the position of that particle.
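Equations (8) through (12) are not reproduced in this text. The sketch below therefore assumes the common choices: selection probability P_n = w^(n) / Σ_m w^(m) for resampling, and a constant-velocity motion model with additive noise for the update. The Gaussian noise, the `sigma_pos` and `sigma_vel` parameters, and the function names are assumptions of this example.

```python
import random


def resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, in proportion to their weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def predict(particle, dt, sigma_pos, sigma_vel, rng):
    """Assumed constant-velocity update of one particle (x, y, vx, vy) with noise."""
    x, y, vx, vy = particle
    return (
        x + vx * dt + rng.gauss(0.0, sigma_pos),   # position advanced by velocity
        y + vy * dt + rng.gauss(0.0, sigma_pos),
        vx + rng.gauss(0.0, sigma_vel),            # velocity perturbed by noise
        vy + rng.gauss(0.0, sigma_vel),
    )


def step(particles, weights, dt, sigma_pos, sigma_vel, rng):
    """One resample-then-predict step over the whole particle set."""
    resampled = resample(particles, weights, rng)
    return [predict(p, dt, sigma_pos, sigma_vel, rng) for p in resampled]
```

A caller would pass a seeded `random.Random` instance as `rng` for reproducible runs, then extract one partial region centered on each returned (x, y).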

推定部３２４は、各パーティクルに対応する第１の部分領域の識別関数値ｈ_ｉを元に重みｗ^（ｉ）を算出する。識別関数値ｈ_ｉから重みｗ^（ｉ）を算出する式は、識別関数値ｈ_ｉに対してｗ^（ｉ）が単調増加する関数を用いることができる。次式（１３）は識別関数値ｈ_ｉから重みｗ^（ｉ）を算出する式の例である。 The estimation unit 324 calculates the weight w^(i) from the discriminant function value h_i of the first partial region corresponding to each particle. Any function in which w^(i) increases monotonically with h_i can be used to calculate w^(i) from h_i. Equation (13) is one example of such a function.

ここでα（＞０）はパラメータである。そして、推定部３２４は、ｉ番目の第１の部分領域の中心位置を（ｘ_ｉ，ｙ_ｉ）として、次式（１４）により、追跡対象物体の推定位置を算出する。 Here, α (> 0) is a parameter. Then, the estimation unit 324 calculates the estimated position of the tracking target object using the following expression (14), where the center position of the i-th first partial region is (x _i , y _i ).
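Equations (13) and (14) are not reproduced in this text. The sketch below assumes w^(i) = exp(α h_i) as one monotonically increasing choice, and a weight-normalized average of the region centers as the estimated position, in line with the weighted-average estimation recited in the claims. Both forms and the function names are assumptions of this example.

```python
import math


def particle_weights(h_values, alpha):
    """Assumed weighting: w_i = exp(alpha * h_i), monotonically increasing for alpha > 0."""
    return [math.exp(alpha * h) for h in h_values]


def estimate_position(centers, weights):
    """Assumed estimate: weighted average of the region centers (x_i, y_i)."""
    total = sum(weights)
    x = sum(w * cx for w, (cx, _) in zip(weights, centers)) / total
    y = sum(w * cy for w, (_, cy) in zip(weights, centers)) / total
    return (x, y)
```

For large discriminant values the exponential can overflow, so a practical version might subtract the maximum h_i before exponentiating; the normalized average is unchanged by that shift.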

以上により、第２実施形態の画像処理装置３００によれば、パーティクルフィルタに基づく動作予測により追跡対象物体の追跡を行うことができる。第２実施形態の説明では、パーティクルのパラメータとして位置及び速度を用いたが、さらに加速度及び躍度、並びに、第１の部分領域のサイズ及び形状等をパラメータとして付加してもよい。 As described above, according to the image processing apparatus 300 of the second embodiment, it is possible to track the tracking target object by motion prediction based on the particle filter. In the description of the second embodiment, the position and the velocity are used as the particle parameters. However, the acceleration and jerk, the size and shape of the first partial region, and the like may be added as parameters.

第２実施形態の画像処理装置３００によれば、パーティクルフィルタを用いることで、複雑な動作を行う追跡対象物体を少ない処理コストでロバストに追跡することができる。 According to the image processing apparatus 300 of the second embodiment, by using a particle filter, it is possible to robustly track a tracking target object that performs a complicated operation with low processing cost.

最後に第１及び第２実施形態の学習装置１００及び画像処理装置３００のハードウェア構成の例について説明する。 Finally, examples of hardware configurations of the learning device 100 and the image processing device 300 according to the first and second embodiments will be described.

図７は第１及び第２実施形態の学習装置１００及び画像処理装置３００のハードウェア構成の例を示す図である。第１及び第２実施形態の学習装置１００及び画像処理装置３００は、制御装置４０１、主記憶装置４０２、補助記憶装置４０３、表示装置４０４、入力装置４０５及び通信装置４０６を備える。制御装置４０１、主記憶装置４０２、補助記憶装置４０３、表示装置４０４、入力装置４０５及び通信装置４０６は、バス４１０を介して接続されている。 FIG. 7 is a diagram illustrating an example of a hardware configuration of the learning device 100 and the image processing device 300 according to the first and second embodiments. The learning device 100 and the image processing device 300 according to the first and second embodiments include a control device 401, a main storage device 402, an auxiliary storage device 403, a display device 404, an input device 405, and a communication device 406. The control device 401, main storage device 402, auxiliary storage device 403, display device 404, input device 405, and communication device 406 are connected via a bus 410.

制御装置４０１は補助記憶装置４０３から主記憶装置４０２に読み出されたプログラムを実行する。主記憶装置４０２はＲＯＭ及びＲＡＭ等のメモリである。補助記憶装置４０３はメモリカード及びＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等である。 The control device 401 executes the program read from the auxiliary storage device 403 to the main storage device 402. The main storage device 402 is a memory such as a ROM and a RAM. The auxiliary storage device 403 is a memory card, an SSD (Solid State Drive), or the like.

表示装置４０４は情報を表示する。表示装置４０４は、例えば液晶ディスプレイである。入力装置４０５は、情報の入力を受け付ける。入力装置４０５は、例えばキーボード及びマウス等である。通信装置４０６は他の装置と通信する。 The display device 404 displays information. The display device 404 is, for example, a liquid crystal display. The input device 405 receives information input. The input device 405 is, for example, a keyboard and a mouse. The communication device 406 communicates with other devices.

第１及び第２実施形態の学習装置１００及び画像処理装置３００で実行されるプログラムは、インストール可能な形式又は実行可能な形式のファイルでＣＤ−ＲＯＭ、メモリカード、ＣＤ−Ｒ及びＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋ）等のコンピュータで読み取り可能な記憶媒体に記憶されてコンピュータ・プログラム・プロダクトとして提供される。 The programs executed by the learning device 100 and the image processing apparatus 300 of the first and second embodiments are stored, as files in an installable or executable format, on a computer-readable storage medium such as a CD-ROM, a memory card, a CD-R, or a DVD (Digital Versatile Disk), and are provided as a computer program product.

また第１及び第２実施形態の学習装置１００及び画像処理装置３００で実行されるプログラムを、インターネット等のネットワークに接続されたコンピュータ上に格納し、ネットワーク経由でダウンロードさせることにより提供するように構成してもよい。また第１及び第２実施形態の学習装置１００及び画像処理装置３００が実行するプログラムを、ダウンロードさせずにインターネット等のネットワーク経由で提供するように構成してもよい。 The programs executed by the learning device 100 and the image processing apparatus 300 of the first and second embodiments may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Alternatively, these programs may be provided via a network such as the Internet without being downloaded.

また第１及び第２実施形態の学習装置１００及び画像処理装置３００で実行されるプログラムを、ＲＯＭ等に予め組み込んで提供するように構成してもよい。 The program executed by the learning apparatus 100 and the image processing apparatus 300 according to the first and second embodiments may be provided by being incorporated in advance in a ROM or the like.

第１及び第２実施形態の学習装置１００及び画像処理装置３００で実行されるプログラムは、上述の第１及び第２実施形態の学習装置１００及び画像処理装置３００の機能構成のうち、プログラムにより実現可能な機能を含むモジュール構成となっている。 A program executed by the learning device 100 and the image processing device 300 according to the first and second embodiments is realized by a program among the functional configurations of the learning device 100 and the image processing device 300 according to the first and second embodiments described above. It has a module configuration that includes possible functions.

プログラムにより実現される機能は、制御装置４０１が補助記憶装置４０３等の記憶媒体からプログラムを読み出して実行することにより、プログラムにより実現される機能が主記憶装置４０２にロードされる。すなわちプログラムにより実現される機能は、主記憶装置４０２上に生成される。 Functions realized by the program are loaded into the main storage device 402 when the control device 401 reads the program from a storage medium such as the auxiliary storage device 403 and executes the program. That is, the function realized by the program is generated on the main storage device 402.

なお第１及び第２実施形態の学習装置１００及び画像処理装置３００の機能の一部又は全部を、ＩＣ（ＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）等のハードウェアにより実現してもよい。 Note that some or all of the functions of the learning device 100 and the image processing device 300 according to the first and second embodiments may be realized by hardware such as an IC (Integrated Circuit).

１０ 記憶部 10 Storage unit
２０ 学習制御部 20 Learning control unit
２１ 取得部 21 Acquisition unit
２２ 指定部 22 Designation unit
２３ 正例取得部 23 Positive example acquisition unit
２４ 負例取得部 24 Negative example acquisition unit
２５ 学習部 25 Learning unit
３０ 入力部 30 Input unit
４０ 表示部 40 Display unit
１００ 学習装置 100 Learning device
３００ 画像処理装置 300 Image processing apparatus
３１０ 記憶部 310 Storage unit
３２０ 画像処理部 320 Image processing unit
３２１ 取得部 321 Acquisition unit
３２２ 抽出部 322 Extraction unit
３２３ 判定部 323 Determination unit
３２４ 推定部 324 Estimation unit
３２５ 正例追加部 325 Positive example addition unit
３２６ 負例追加部 326 Negative example addition unit
３２７ 更新部 327 Update unit

特許第５０５２６７０号公報 Japanese Patent No. 5052670
特許第５７１９２３０号公報 Japanese Patent No. 5719230

Y. Wu, J. Lim, M. H. Yang, "Object Tracking Benchmark," IEEE Transactions on Pattern Analysis and Machine Intelligence, Pre-Prints, 2015.
S. Avidan, "Support Vector Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, pp. 1064-1072, 2004.
H. Grabner, et al., "On-line Boosting and Vision," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
B. Babenko, et al., "Visual Tracking with Online Multiple Instance Learning," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.

Claims

An image processing apparatus comprising:
an acquisition unit that acquires a frame from a moving image;
an extraction unit that extracts a plurality of first partial regions, each indicating a partial region of the frame;
a determination unit that determines whether each of the first partial regions is a tracking target region including a tracking target object, using a classifier learned from training data in which tracking target regions are positive examples and non-tracking target regions are negative examples;
an estimation unit that estimates an estimated position of the tracking target object from a plurality of the tracking target regions;
a positive example addition unit that, when a second partial region located within a first threshold distance of the estimated position is determined to be a positive example by the classifier, adds the second partial region to the training data as a positive example;
a negative example addition unit that, when a third partial region located at a distance equal to or greater than a second threshold from the position of the second partial region added as a positive example is determined to be a positive example by the classifier, adds the third partial region to the training data as a negative example; and
an update unit that updates the classifier using the training data updated by the positive example addition unit and the negative example addition unit.

The classifier is a discriminant function that performs binary classification based on linear discriminant analysis.
The image processing apparatus according to claim 1.

The classifier is an identification function that performs binary classification based on a support vector machine.
The image processing apparatus according to claim 1.

The image processing apparatus according to claim 1, wherein the classifier is a discriminant function that performs binary classification based on a combination of weak classifiers.
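
One common way to combine weak classifiers is a weighted vote, as in boosting. The sketch below assumes threshold "stumps" on single feature components as the weak classifiers and fixed vote weights; the claim does not specify either, and the names `make_stump`, `strong_score`, and `classify` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

WeakClassifier = Callable[[Sequence[float]], int]  # returns +1 or -1


def make_stump(index: int, threshold: float) -> WeakClassifier:
    """Weak classifier: votes +1 if feature `index` exceeds `threshold`."""
    def stump(x: Sequence[float]) -> int:
        return 1 if x[index] > threshold else -1
    return stump


def strong_score(weighted: List[Tuple[float, WeakClassifier]],
                 x: Sequence[float]) -> float:
    """Discriminant value: weighted sum of the weak classifiers' votes."""
    return sum(alpha * h(x) for alpha, h in weighted)


def classify(weighted: List[Tuple[float, WeakClassifier]],
             x: Sequence[float]) -> bool:
    """True when the combined vote favors the positive class."""
    return strong_score(weighted, x) > 0
```

For example, with vote weights 0.5, 0.25, and 0.125 on three stumps, a feature vector accepted only by the first stump still scores 0.125 and is classified as positive.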

The image processing apparatus according to claim 2, wherein the estimation unit estimates, as the estimated position of the tracking target object, the position of the tracking target region having the maximum value of the discriminant function.

The image processing apparatus according to claim 2, wherein the estimation unit estimates the estimated position of the tracking target object based on an average position of the tracking target regions.

The image processing apparatus according to claim 2, wherein the estimation unit estimates the estimated position of the tracking target object by a weighted average based on the discriminant function values of the tracking target regions.
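
The three estimation variants recited in the preceding dependent claims (maximum discriminant value, average position, and weighted average) can be compared with a short sketch. It assumes each tracking target region is represented by its center point and that its discriminant function value is positive, so the values can be used directly as weights. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed: center of a tracking target region


def estimate_by_max(positions: List[Point], scores: List[float]) -> Point:
    """Position of the region with the largest discriminant value."""
    best = max(range(len(positions)), key=lambda i: scores[i])
    return positions[best]


def estimate_by_mean(positions: List[Point]) -> Point:
    """Unweighted average of the region positions."""
    n = len(positions)
    return (sum(p[0] for p in positions) / n, sum(p[1] for p in positions) / n)


def estimate_by_weighted_mean(positions: List[Point], scores: List[float]) -> Point:
    """Average of the positions weighted by discriminant value.

    Assumes every score is positive, as for regions classified as targets.
    """
    total = sum(scores)
    x = sum(s * p[0] for p, s in zip(positions, scores)) / total
    y = sum(s * p[1] for p, s in zip(positions, scores)) / total
    return (x, y)
```

For positions (0, 0), (2, 0), (4, 0) with scores 1, 1, 2, the three variants give (4, 0), (2, 0), and (2.5, 0) respectively.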

The image processing apparatus according to claim 1, wherein the extraction unit extracts the first partial regions based on a particle filter.
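
Extraction based on a particle filter could be sketched as a propagate-and-resample loop over candidate region centers. The sketch assumes a random-walk motion model with Gaussian noise and resampling in proportion to per-particle weights (for example, discriminant function values); the claim specifies neither, and the names `propagate` and `resample` are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # assumed: center of a candidate partial region


def propagate(particles: List[Point], sigma: float,
              rng: random.Random) -> List[Point]:
    """Move each particle by Gaussian noise (assumed random-walk motion model)."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in particles]


def resample(particles: List[Point], weights: List[float],
             rng: random.Random) -> List[Point]:
    """Draw a new particle set with probability proportional to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))
```

Each propagated particle would then define one first partial region to be passed to the classifier in the next frame.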

An image processing method comprising:
a step in which an image processing apparatus acquires a frame from a video;
a step in which the image processing apparatus extracts a plurality of first partial regions, each indicating a partial region of the frame;
a step in which the image processing apparatus determines whether each of the first partial regions is a tracking target region including a tracking target object, using a classifier learned from training data in which a tracking target region is a positive example and a non-tracking target region is a negative example;
a step in which the image processing apparatus estimates an estimated position of the tracking target object from a plurality of the tracking target regions;
a step in which the image processing apparatus, when the classifier determines that a second partial region located at a distance within a first threshold from the estimated position is a positive example, adds the second partial region to the training data as a positive example;
a step in which the image processing apparatus, when the classifier determines that a third partial region located at a distance equal to or greater than a second threshold from the position of the second partial region added as a positive example is a positive example, adds the third partial region to the training data as a negative example; and
a step in which the image processing apparatus updates the classifier using the training data to which the second partial region has been added as a positive example and the third partial region has been added as a negative example.

A program for causing a computer to function as:
an acquisition unit that acquires a frame from a video;
an extraction unit that extracts a plurality of first partial regions, each indicating a partial region of the frame;
a determination unit that determines whether each of the first partial regions is a tracking target region including a tracking target object, using a classifier learned from training data in which a tracking target region is a positive example and a non-tracking target region is a negative example;
an estimation unit that estimates an estimated position of the tracking target object from a plurality of the tracking target regions;
a positive example addition unit that, when the classifier determines that a second partial region located at a distance within a first threshold from the estimated position is a positive example, adds the second partial region to the training data as a positive example;
a negative example addition unit that, when the classifier determines that a third partial region located at a distance equal to or greater than a second threshold from the position of the second partial region added as a positive example is a positive example, adds the third partial region to the training data as a negative example; and
an update unit that updates the classifier using the training data updated by the positive example addition unit and the negative example addition unit.