JP4181473B2

JP4181473B2 - Video object trajectory synthesis apparatus, method and program thereof

Info

Publication number: JP4181473B2
Application number: JP2003355620A
Authority: JP
Inventors: Masaki Takahashi (高橋正樹); Seiichi Gohshi (合志清一); Toshihiko Misu (三須俊彦)
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-10-15
Filing date: 2003-10-15
Publication date: 2008-11-12
Anticipated expiration: 2023-10-15
Also published as: JP2005123824A

Description

The present invention relates to a video object trajectory synthesis apparatus, method, and program that composite, onto an input video, the trajectory along which a video object contained in that video moves.

Various video object extraction techniques exist for extracting and tracking an object contained in a video (a video object) (see, for example, Patent Document 1).
Attempts have also been made to broaden the scope of video production by using the position information of extracted video objects to give video special effects. One example comes from baseball broadcasting: the ball (video object) thrown by the pitcher is tracked, and its trajectory is then composited onto the pitching video using CG (computer graphics). This lets viewers better appreciate the ball's path from the moment the pitcher releases it until the catcher catches it, for example the movement of a breaking ball. Conventionally, such pitch trajectories are drawn by filming the ball with several cameras, computing its trajectory in three-dimensional coordinates, and compositing it onto the pitching video as CG.
Patent Document 1: JP 2003-44860 A (paragraphs 0018 to 0027, FIG. 1)

The video object extraction techniques of Patent Document 1 and similar documents extract a video object from a video and track it; they do not composite the object's trajectory in real time from the extracted position information. Moreover, to composite a video object's trajectory onto the original video from the position information obtained by such a technique, a person must draw the trajectory onto the original video by hand based on the extracted positions, so the trajectory cannot be composited (drawn) in real time.

Furthermore, compositing a ball's trajectory onto the pitcher's pitching video in a baseball broadcast requires several cameras, which makes the equipment large in scale. In addition, the trajectory is composited onto the pitching video as CG, and there is demand to composite it using the live-action image of the ball instead, to make the effect more striking.

The present invention was made in view of these problems. Its object is to provide a video object trajectory synthesis apparatus, method, and program that can extract and track a video object in a video and composite the object's trajectory onto the video in real time, using the live-action image of the object.

To achieve this object, the video object trajectory synthesis apparatus according to claim 1 extracts and tracks a video object contained in an input video and composites the object's trajectory onto the video. It comprises object candidate image generation means, extraction condition storage means, object extraction means, trajectory image generation means, search area estimation means, and image composition means.

With this configuration, the object candidate image generation means generates, from each frame image (the still images, input in time series, that make up the video), an object candidate image in which candidates for the video object have been extracted. For example, the frame image may be binarized by luminance, or regions with motion may be identified from the difference between the previously input frame image and the next one; the candidate regions found in this way form the object candidate image.

The object extraction means then refers to extraction conditions stored in advance in the extraction condition storage means and extracts, from the object candidate image, the position of the video object to be extracted and a feature quantity characterizing it. Storing the extraction conditions for the target object in advance makes it possible to single out one video object from several candidates. This identifies both the position of the target object within the frame image and the feature quantity specific to it. The object's centroid coordinates, for example, can serve as its position, and its luminance, color, texture, and so on can serve as its feature quantity.

The trajectory image generation means then generates, from the position and feature quantity extracted by the object extraction means, a trajectory image in which only the object's trajectory is drawn. This trajectory image can be generated by successively overwriting, in time order, the feature quantity (for example, the texture) of the video object extracted from each frame image.
The image composition means composites the trajectory image generated in this way with each successive frame image, producing a composite video in which the trajectory of the target video object is overlaid on the video.
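A minimal sketch of the trajectory image and the compositing step, assuming the object's pixel coordinates in each frame are already known from extraction. The class `TrajectoryImage` and its methods are hypothetical: the sketch stores the object's live-action pixels (its texture), overwrites them in time order, and lets trajectory pixels replace frame pixels wherever the trajectory has been drawn.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Frame = List[List[Pixel]]      # rows of pixels


class TrajectoryImage:
    """Hypothetical helper that accumulates the object's texture over time."""

    def __init__(self, height: int, width: int) -> None:
        # None means "no trajectory drawn at this pixel yet".
        self.pixels: List[List[Optional[Pixel]]] = [
            [None] * width for _ in range(height)
        ]

    def add(self, frame: Frame, object_pixels: Sequence[Tuple[int, int]]) -> None:
        """Copy the object's live-action pixels, given as (row, col), from a frame.

        Later frames overwrite earlier ones, so the newest position lies on top.
        """
        for row, col in object_pixels:
            self.pixels[row][col] = frame[row][col]

    def composite(self, frame: Frame) -> Frame:
        """Overlay the trajectory: trajectory pixels replace frame pixels."""
        return [
            [t if t is not None else f for t, f in zip(t_row, f_row)]
            for t_row, f_row in zip(self.pixels, frame)
        ]
```

For example, adding a white ball at column 0 of one frame and at column 2 of the next, then compositing onto the second frame, yields white pixels at both columns. A hard replace is the simplest choice; the text does not say whether the trajectory is blended with the frame.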

The search area estimation means then estimates, based on the position of the video object extracted by the object extraction means, a search area for finding the video object in the next input frame image. It applies a Kalman filter to the object's centroid coordinates to predict the object's position in the next frame and estimates the search area from that prediction.
The search area estimation means updates the Kalman gain (the filter's sensitivity) based on the observation noise covariance (the covariance of the observation noise) and the state covariance (the covariance of the state quantities in the frame image). The state covariance is set small when the video object is tracked successfully and large when tracking fails.
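The prediction and gain update can be sketched with a constant-velocity Kalman filter run separately on the x and y centroid coordinates. This is an illustration under assumptions the text does not state: the state model (position and velocity, one frame per step), the noise values `q` and `r`, the miss inflation factor, and the 3-sigma search box are all hypothetical choices. The gain is computed from the state covariance and the observation noise covariance as described; a successful extraction shrinks the state covariance through the update, while a failure skips the update and inflates the covariance so the next search area widens.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one centroid coordinate."""
    pos: float
    vel: float = 0.0
    p_pp: float = 10.0  # state covariance: var(position)
    p_pv: float = 0.0   # state covariance: cov(position, velocity)
    p_vv: float = 10.0  # state covariance: var(velocity)
    q: float = 1.0      # process noise added per frame (assumed)
    r: float = 4.0      # observation noise covariance (assumed)

    def predict(self) -> None:
        # x <- F x and P <- F P F^T + Q, with F = [[1, 1], [0, 1]] and Q = q*I.
        self.pos += self.vel
        a, b, d = self.p_pp, self.p_pv, self.p_vv
        self.p_pp = a + 2 * b + d + self.q
        self.p_pv = b + d
        self.p_vv = d + self.q

    def innovation_std(self) -> float:
        # Predicted spread of the next measurement: sqrt(H P H^T + R).
        return math.sqrt(self.p_pp + self.r)

    def update(self, z: float) -> None:
        # Kalman gain from the state covariance and observation noise covariance.
        s = self.p_pp + self.r
        k_p, k_v = self.p_pp / s, self.p_pv / s
        innov = z - self.pos
        self.pos += k_p * innov
        self.vel += k_v * innov
        # P <- (I - K H) P, which shrinks the state covariance after a hit.
        a, b, d = self.p_pp, self.p_pv, self.p_vv
        self.p_pp = (1 - k_p) * a
        self.p_pv = (1 - k_p) * b
        self.p_vv = d - k_v * b

    def inflate(self, factor: float) -> None:
        # After a miss: enlarge the state covariance so the search area widens.
        self.p_pp *= factor
        self.p_pv *= factor
        self.p_vv *= factor


class CentroidTracker:
    """Hypothetical tracker that predicts a search box for the next frame."""

    def __init__(self, x: float, y: float, gate: float = 3.0,
                 miss_inflation: float = 4.0) -> None:
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)
        self.gate = gate                      # box half-size in std devs
        self.miss_inflation = miss_inflation  # covariance factor on a miss

    def predict_search_area(self) -> Tuple[float, float, float, float]:
        """Advance one frame; return (x_min, y_min, x_max, y_max)."""
        self.kx.predict()
        self.ky.predict()
        hx = self.gate * self.kx.innovation_std()
        hy = self.gate * self.ky.innovation_std()
        return (self.kx.pos - hx, self.ky.pos - hy,
                self.kx.pos + hx, self.ky.pos + hy)

    def correct(self, centroid: Optional[Tuple[float, float]]) -> None:
        """Feed the extracted centroid (x, y), or None if extraction failed."""
        if centroid is None:
            self.kx.inflate(self.miss_inflation)
            self.ky.inflate(self.miss_inflation)
        else:
            self.kx.update(centroid[0])
            self.ky.update(centroid[1])
```

Per frame, one would call `predict_search_area()`, run extraction only inside the returned box, and pass the result (or `None`) to `correct()`. During a miss the predicted position keeps moving with the estimated velocity, which suits an object that is briefly hidden.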

With this configuration, the search area estimation means can place the search area for the video object in the next frame image in the neighborhood of the object, based on the position extracted by the object extraction means. In other words, the Kalman filter estimates the position of the moving video object frame by frame, which limits the area that must be searched.

The video object trajectory synthesis apparatus according to claim 2 is the apparatus according to claim 1 in which the trajectory image generation means includes trajectory interpolation means that, based on the extraction results of the object extraction means, interpolates the positions of the video object where extraction failed from the positions where extraction succeeded.

With this configuration, the trajectory image generation means retains the positions at which the video object was successfully extracted. If extraction fails, then once extraction succeeds again, it fills in the missing positions, for example by interpolation, from the newly extracted position and the previously retained one. A continuous trajectory image of the video object can thus be generated.
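A minimal sketch of the interpolation, assuming straight-line interpolation between the last and next successful extractions (the text says only "interpolation or the like"). For simplicity it works on a whole list of per-frame centroids, with `None` marking frames where extraction failed; an online version would run the same fill each time extraction succeeds after a gap. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def interpolate_track(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Linearly fill gaps between successful extractions.

    track[i] is the centroid in frame i, or None if extraction failed.
    Gaps before the first or after the last success stay None, because
    there is no position on both sides to interpolate between.
    """
    out = list(track)
    last = None  # index of the most recent successful extraction
    for i, point in enumerate(track):
        if point is None:
            continue
        if last is not None and i - last > 1:
            (x0, y0), (x1, y1) = track[last], point
            n = i - last
            for k in range(1, n):
                out[last + k] = (x0 + (x1 - x0) * k / n,
                                 y0 + (y1 - y0) * k / n)
        last = i
    return out
```

Leaving trailing gaps unfilled matches the behavior described above, where missing positions are filled only after a later extraction succeeds.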

The video object trajectory synthesis method according to claim 3 extracts and tracks a video object contained in an input video and composites the object's trajectory onto the video. It includes an object candidate image generation step, an object extraction step, a trajectory image generation step, a search area estimation step, and an image composition step.

In this procedure, the object candidate image generation step generates, from each frame image (the still images, input in time series, that make up the video), an object candidate image in which candidates for the video object have been extracted.
The object candidate image is generated from at least one of the following images derived from the frame images input in time series: a luminance image, a contour image, and a difference image obtained from frame images input at different times.

For example, if the target video object is brighter (or darker) than a particular luminance level, candidates are extracted from the luminance image, which carries only the brightness information of the frame image. If the object's luminance differs greatly from that of its surroundings, candidates are extracted from the contour image, which contains only the extracted contours. If the object moves substantially, candidates are extracted from the difference image between different frame images. Each of these requires less computation than extracting the object from the color image. The luminance, contour, and difference images are then combined and binarized to generate the object candidate image.

In the object extraction step, the method then extracts, from the object candidate image and based on preset extraction conditions, the position of the video object to be extracted and a feature quantity characterizing it.
At least one of the target object's area, luminance, color, aspect ratio, and circularity can be used as an extraction condition. This narrows the candidates down to the target object even when the frame image contains several video objects. The object's centroid coordinates, for example, can serve as its position, and its luminance, color, texture, and so on can serve as its feature quantity.
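The following sketch shows one way to apply shape-based extraction conditions to a binary object candidate image. It is an illustration, not the patent's implementation: the `Conditions` fields, 4-connected component labeling, the pixel-edge perimeter, the circularity measure 4πA/P², and the rule of keeping the most circular surviving candidate are all assumptions. Luminance, color, and texture conditions are omitted. The returned position is the centroid (x, y).

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple

Binary = List[List[int]]
Blob = List[Tuple[int, int]]  # (row, col) pixels of one candidate


@dataclass
class Conditions:
    """Hypothetical extraction conditions for the target object."""
    min_area: int
    max_area: int
    min_aspect: float       # bounding-box width / height
    max_aspect: float
    min_circularity: float  # 4*pi*area / perimeter**2


@dataclass
class Features:
    area: int
    aspect: float
    circularity: float
    centroid: Tuple[float, float]  # (x, y)


def components(binary: Binary) -> List[Blob]:
    """Group 4-connected white pixels into candidate objects (BFS)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    found: List[Blob] = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append(blob)
    return found


def features(blob: Blob) -> Features:
    area = len(blob)
    rows = [p[0] for p in blob]
    cols = [p[1] for p in blob]
    members = set(blob)
    # Perimeter approximated as the number of pixel edges facing background.
    perimeter = sum(
        1 for (y, x) in blob
        for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
        if n not in members
    )
    return Features(
        area=area,
        aspect=(max(cols) - min(cols) + 1) / (max(rows) - min(rows) + 1),
        circularity=4 * math.pi * area / perimeter ** 2,
        centroid=(sum(cols) / area, sum(rows) / area),
    )


def extract_object(binary: Binary, cond: Conditions) -> Optional[Features]:
    """Return the features of the best candidate meeting `cond`, or None."""
    best: Optional[Features] = None
    for blob in components(binary):
        f = features(blob)
        if not cond.min_area <= f.area <= cond.max_area:
            continue
        if not cond.min_aspect <= f.aspect <= cond.max_aspect:
            continue
        if f.circularity < cond.min_circularity:
            continue
        if best is None or f.circularity > best.circularity:
            best = f
    return best
```

With these conditions, a compact square blob passes (its circularity is π/4 under this perimeter measure), while a thin horizontal line of pixels is rejected by the aspect-ratio check.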

In the trajectory image generation step, the method generates, from the position and feature quantity extracted in the object extraction step, a trajectory image in which only the object's trajectory is drawn. The image composition step composites this trajectory image with each successive frame image, producing a composite video in which the trajectory of the target video object is overlaid on the video.
In the search area estimation step, the method estimates, based on the position extracted in the object extraction step, a search area for finding the video object in the next input frame image. It applies a Kalman filter to the object's centroid coordinates to predict the object's position in the next frame and estimates the search area from that prediction.
The search area estimation step updates the Kalman gain (the filter's sensitivity) based on the observation noise covariance (the covariance of the observation noise) and the state covariance (the covariance of the state quantities in the frame image). The state covariance is set small when the video object is tracked successfully and large when tracking fails.

Furthermore, the video object trajectory synthesis program according to claim 4 causes a computer, in order to extract and track a video object contained in an input video and composite the object's trajectory onto the video, to function as object candidate image generation means, object extraction means, trajectory image generation means, search area estimation means, and image composition means.

With this configuration, the program's object candidate image generation means generates, from each frame image (the still images, input in time series, that make up the video), an object candidate image in which candidates for the video object have been extracted.
The object candidate image is generated from at least one of the following images derived from the frame images input in time series: a luminance image, a contour image, and a difference image obtained from frame images input at different times.

The program's object extraction means then extracts, from the object candidate image and based on preset extraction conditions, the position of the video object to be extracted and a feature quantity characterizing it.
At least one of the target object's area, luminance, color, aspect ratio, and circularity can be used as an extraction condition. This narrows the candidates down to the target object even when the frame image contains several video objects. The object's centroid coordinates, for example, can serve as its position, and its luminance, color, texture, and so on can serve as its feature quantity.

The program's trajectory image generation means generates, from the position and feature quantity extracted by the object extraction means, a trajectory image in which only the object's trajectory is drawn. The image composition means composites this trajectory image with each successive frame image, producing a composite video in which the trajectory of the target video object is overlaid on the video.
The program's search area estimation means estimates, based on the position extracted by the object extraction means, a search area for finding the video object in the next input frame image. It applies a Kalman filter to the object's centroid coordinates to predict the object's position in the next frame and estimates the search area from that prediction.
The search area estimation means updates the Kalman gain (the filter's sensitivity) based on the observation noise covariance (the covariance of the observation noise) and the state covariance (the covariance of the state quantities in the frame image). The state covariance is set small when the video object is tracked successfully and large when tracking fails.

According to the inventions of claims 1, 3, and 4, a target video object can be extracted from a video and tracked, and its trajectory composited onto the video without manual intervention. If texture is used as the object's feature quantity, a composite video can be generated in which the motion of the live-action object is drawn continuously, providing video with a new kind of special effect. In addition, the position of the moving video object in each frame image can be estimated and the search area limited accordingly, which reduces the computation required to extract the object. This makes it possible to composite the object's trajectory onto the video in real time.
Furthermore, because the composite video showing the object's continuous motion is generated from video shot by a single camera, the equipment can be kept small.

According to the invention of claim 2, even if the video object moves into a region where it cannot be seen and then reappears in the visible region, its movement through the hidden region can be estimated and the trajectory drawn.

Embodiments of the present invention are described below with reference to the drawings.
[Configuration of the video object trajectory synthesis apparatus]
First, the configuration of the video object trajectory synthesis apparatus according to the present invention is described with reference to FIG. 1, which is a block diagram of the apparatus.
As shown in FIG. 1, the video object trajectory synthesis apparatus 1 extracts a moving object to be tracked (a video object) from an input video, tracks it, and composites its trajectory onto the video. The apparatus 1 comprises object candidate image generation means 10, extraction condition storage means 20, object extraction means 30, search area estimation means 40, trajectory image generation means 50, trajectory image storage means 60, and image composition means 70.

The object candidate image generation means 10 generates, for each frame image of the input video, an object candidate image in which candidates for the video object to be tracked have been extracted. A video consists of, for example, 30 frame images per second. The object candidate image generation means 10 extracts video object candidates from each frame image and binarizes the result, producing an image (the object candidate image) that contains only those candidates. The object candidate image therefore contains several video objects that resemble the object to be tracked; it is a rough extraction of possible targets, such as moving video objects.
The object candidate image generation means 10 comprises a luminance image generation unit 11, a contour image generation unit 12, a difference image generation unit 13, and an object candidate extraction unit 14.

The luminance image generation unit 11 generates a luminance image by converting each frame image of the input (color) video to monochrome (grayscale). For example, it computes the average of the R, G, and B components of each pixel of the frame image and uses that average as the new pixel value. The resulting luminance image is output to the object candidate extraction unit 14.
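As a minimal sketch, the plain RGB average described above can be written as follows, assuming a frame is a list of rows of `(R, G, B)` tuples (a hypothetical representation chosen to keep the example dependency-free). This unweighted average differs from the weighted luma used in broadcast video standards; the text offers the average only as an example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def luminance_image(frame: List[List[Pixel]]) -> List[List[float]]:
    """Grayscale image whose pixels are the mean of the R, G and B components."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in frame]
```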

The contour image generation unit 12 generates, from each frame image of the input video, a contour image in which contours (edges) are extracted based on the frame image's luminance. For example, it detects contours by detecting changes in luminance between adjacent pixels of the frame image. The contour image generation unit 12 may instead extract contours from the luminance image generated by the luminance image generation unit 11. The resulting contour image is output to the object candidate extraction unit 14.
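One simple reading of "detecting changes in luminance between adjacent pixels" is a forward-difference edge magnitude. The sketch below sums the absolute luminance differences to the right and lower neighbors; the specific operator is an assumption (a Sobel filter would be a common alternative), and the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]


def contour_image(lum: Image) -> Image:
    """Edge strength from luminance changes between adjacent pixels.

    Each output pixel is |right neighbor - pixel| + |lower neighbor - pixel|;
    on the last column or row the missing neighbor contributes 0.
    """
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = abs(lum[r][c + 1] - lum[r][c]) if c + 1 < w else 0.0
            dy = abs(lum[r + 1][c] - lum[r][c]) if r + 1 < h else 0.0
            out[r][c] = dx + dy
    return out
```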

The difference image generation unit 13 generates, from the frame images input in time series, a difference image whose pixel values are the luminance differences between frame images input at different times (for example, the current frame image and the frame image input before it). The difference image generation unit 13 may instead generate the difference image from luminance images generated at different times by the luminance image generation unit 11. The resulting difference image is output to the object candidate extraction unit 14.
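The difference image reduces to a per-pixel difference of two luminance images. Taking the absolute value is an assumption here, so that both brightening and darkening count as motion; the function name is hypothetical.

```python
from typing import List

Image = List[List[float]]


def difference_image(current: Image, previous: Image) -> Image:
    """Absolute per-pixel luminance difference between two frames."""
    return [
        [abs(c - p) for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(current, previous)
    ]
```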

The object candidate extraction unit 14 generates the object candidate image from the luminance, contour, and difference images generated by the luminance image generation unit 11, the contour image generation unit 12, and the difference image generation unit 13. It comprises an image integration unit 141, a binarization unit 142, and a noise removal unit 143.

The image integration unit 141 generates an image for extracting video objects (the extraction image) by adding the luminance, contour, and difference images with individual weights. The extraction image contains several video objects from the frame image that resemble the object to be tracked, together with their content (texture). It is output to the binarization unit 142.

For example, let y(x, y, t), e(x, y, t), and d(x, y, t) be the pixel values at coordinates (x, y) at time t of the luminance image y, the contour image e, and the difference image d, and let w_y, w_e, and w_d be their respective weighting coefficients. The image integration unit 141 then computes the pixel value g(x, y, t) of the extraction image g at coordinates (x, y) by equation (1):

$$g(x,y,t) = w_y\,y(x,y,t) + w_e\,e(x,y,t) + w_d\,d(x,y,t) \qquad (1)$$

The weighting coefficients (w_y, w_e, and w_d) are determined in advance according to the characteristics of the video object to be extracted. For example, if the object differs greatly in luminance from the background image, the weighting coefficient of the luminance image or the contour image is made large; if the object moves a lot, the weighting coefficient of the difference image is made large. Weighting the three images in this way gives large values in the extraction image to the pixels that express the characteristics of the object to be extracted.
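Equation (1) is a per-pixel weighted sum. The sketch below, with a hypothetical function name and images stored as lists of rows, computes it directly; setting a weight to zero has the same effect as deleting that term from equation (1) when only two of the images are used.

```python
from typing import List

Image = List[List[float]]


def extraction_image(lum: Image, contour: Image, diff: Image,
                     w_y: float, w_e: float, w_d: float) -> Image:
    """Equation (1): g = w_y*y + w_e*e + w_d*d, evaluated at every pixel."""
    return [
        [w_y * yv + w_e * ev + w_d * dv
         for yv, ev, dv in zip(y_row, e_row, d_row)]
        for y_row, e_row, d_row in zip(lum, contour, diff)
    ]
```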

Here the extraction image is generated from three images (the luminance, contour, and difference images). To generate it from only two, delete the term for the unused image from equation (1). If only one of the three images is used, the image integration unit 141 is unnecessary; that image can be input directly to the binarization unit 142.

The binarization unit 142 extracts a plurality of video objects by binarizing the extraction image generated by the image integration unit 141. For example, it generates a binarized image by setting a pixel to "1 (white)" when the pixel value of the extraction image is equal to or greater than a predetermined threshold, and to "0 (black)" otherwise. Regions whose pixel value is "1 (white)" can then be extracted as video object candidates. The binarized image generated here is output to the noise removal unit 143.
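A minimal sketch of the thresholding rule, assuming the extraction image is a list of rows of numbers; `binarize` and the example threshold are hypothetical.

```python
def binarize(img, threshold):
    """Return 1 (white) where the extraction-image value is >= threshold, else 0 (black)."""
    return [[1 if v >= threshold else 0 for v in row] for row in img]


print(binarize([[13.0, 30.0, 5.0]], threshold=13))  # [[1, 1, 0]]
```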

The noise removal unit 143 removes small video objects as noise from the binarized image generated by the binarization unit 142, for example by applying contraction (erosion) and expansion (dilation) to the binarized image. The binarized image from which noise has been removed is output to the object extraction means 30 as an object candidate image, that is, an image in which the video object candidates have been extracted.

Here, contraction is a process that sets a pixel to "0 (black)" if even one pixel in its neighborhood has the value "0 (black)". Expansion is a process that sets a pixel to "1 (white)" if even one pixel in its neighborhood has the value "1 (white)". Contraction erases small regions of "1 (white)" pixels. At this stage, however, the regions of "1 (white)" pixels that make up the video objects have also shrunk, so expansion is then applied to grow them back. This yields a binarized image from which noise has been removed.
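The contraction and expansion rules above are what image-processing libraries call erosion and dilation, applied in that order (a morphological opening). The sketch below assumes a 3x3 (8-neighbor) neighborhood and simply ignores pixels outside the image; the text does not specify the neighborhood size, and the function names are hypothetical.

```python
def _neighborhood(img, x, y):
    """Yield the values of the 3x3 neighborhood of (x, y), including (x, y) itself.
    Pixels outside the image are skipped (an assumption about border handling)."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                yield img[ny][nx]


def erode(img):
    """Contraction: a pixel becomes 0 if any pixel in its neighborhood is 0."""
    return [[0 if 0 in _neighborhood(img, x, y) else 1
             for x in range(len(img[0]))] for y in range(len(img))]


def dilate(img):
    """Expansion: a pixel becomes 1 if any pixel in its neighborhood is 1."""
    return [[1 if 1 in _neighborhood(img, x, y) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def remove_noise(binary):
    """Contraction followed by expansion (opening)."""
    return dilate(erode(binary))


noisy = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        noisy[y][x] = 1          # a 3x3 object
noisy[0][6] = 1                  # an isolated noise pixel
clean = remove_noise(noisy)
print(clean[0][6], sum(map(sum, clean)))  # 0 9
```

The isolated pixel disappears during contraction and is not restored, while the 3x3 object shrinks to its center pixel and is grown back to its original size.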

In the object candidate extraction unit 14 described above, the binarization unit 142 binarizes the extraction image generated by the image integration unit 141, and the noise removal unit 143 then applies contraction and expansion, producing a binarized image from which noise has been removed. Alternatively, noise may be removed by smoothing the extraction image before binarizing it.

The extraction condition storage means 20 stores the conditions for selecting the video object to be extracted (tracked) and is a general storage medium such as a hard disk. It stores extraction condition information 20a, indicating various extraction conditions, and position information 20b, indicating the position of the video object.

The extraction condition information 20a describes the conditions for extracting the video object, for example at least one of area, luminance, color, aspect ratio and circularity. These serve as the conditions of the filters (area filter, luminance filter, color filter, aspect ratio filter and circularity filter) that the object extraction means 30, described later, uses to select the video object to be extracted from the object candidate image generated by the object candidate image generation means 10.
For each of the area, luminance, color, aspect ratio and circularity filters, the extraction condition information 20a stores a predetermined initial value and a threshold indicating the allowable range. A video object whose value (feature) falls outside that range can thus be excluded from the candidates for extraction.
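The initial value and threshold can be read as an interval test per feature. A minimal sketch, assuming each filter accepts a value within initial ± threshold; the dictionary layout and the function names are hypothetical.

```python
def within_range(value, initial, threshold):
    """True if value lies within initial +/- threshold."""
    return abs(value - initial) <= threshold


def passes_filters(features, conditions):
    """features: {"area": 42, ...}; conditions: {"area": (initial, threshold), ...}.
    A candidate survives only if every configured filter accepts it."""
    return all(
        within_range(features[name], initial, threshold)
        for name, (initial, threshold) in conditions.items()
    )


conditions = {"area": (40, 10), "circularity": (1.0, 0.2)}
print(passes_filters({"area": 45, "circularity": 0.9}, conditions))  # True
print(passes_filters({"area": 60, "circularity": 0.9}, conditions))  # False
```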

Here, the area is, for example, the number of pixels of the video object. The luminance is the average luminance of the pixels in the video object, and the color is the average color (for example, RGB value) of the pixels in the video object. For color, the background color of the background image may instead be given in advance as the initial value, with the degree of change from that background color used as the threshold.

For example, as shown in FIG. 2, let the color (color vector) of the video object in RGB space be I = [I_r, I_g, I_b] and the background color (color vector) be E = [E_r, E_g, E_b]. The degree of change in brightness (luminance) α and the degree of change in hue (saturation) C relative to the background color can then be expressed by equations (2) and (3) below. With thresholds θ₁ and θ₂ set in advance, a portion where C > θ₁ or α < θ₂ is taken as the video object to be extracted.
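Equations (2) and (3) are not reproduced here. Assuming FIG. 2 shows the common decomposition in which α is the scale of the projection of I onto the background color vector E and C is the distance from I to that projection (α = I·E/|E|², C = |I − αE|), the color test could be sketched as follows. This form is an assumption, not the patent's printed formula, and E must be a non-zero vector.

```python
import math


def brightness_chroma_distortion(I, E):
    """Assumed forms of eqs. (2), (3): alpha = (I.E)/|E|^2, C = |I - alpha*E|."""
    dot_ie = sum(i * e for i, e in zip(I, E))
    norm_e2 = sum(e * e for e in E)
    alpha = dot_ie / norm_e2                        # brightness change along E
    c = math.sqrt(sum((i - alpha * e) ** 2 for i, e in zip(I, E)))  # color deviation
    return alpha, c


def is_object_color(I, E, theta1, theta2):
    """Object if the hue deviates strongly (C > theta1) or it is darker (alpha < theta2)."""
    alpha, c = brightness_chroma_distortion(I, E)
    return c > theta1 or alpha < theta2


background = [100, 100, 100]
print(is_object_color([100, 100, 100], background, theta1=10.0, theta2=0.8))  # False
print(is_object_color([200, 50, 50], background, theta1=10.0, theta2=0.8))    # True
```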

The aspect ratio is the ratio of the maximum vertical length of the video object to its maximum horizontal length. The circularity indicates how circular the video object is and takes a larger value the closer the object is to a circle. With S the area of the video object and l its perimeter, the circularity e is given by equation (4) below.
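Equation (4) is not reproduced here. A widely used definition with the stated property (equal to 1 for a perfect circle, smaller otherwise) is e = 4πS/l²; the sketch below assumes that form and measures the aspect ratio from the vertical and horizontal pixel extents of the object. Both function names are hypothetical.

```python
import math


def circularity(area, perimeter):
    """Assumed eq. (4): e = 4*pi*S / l**2 (1 for a circle, smaller for other shapes)."""
    return 4.0 * math.pi * area / perimeter ** 2


def aspect_ratio(pixels):
    """Vertical extent divided by horizontal extent; pixels is a list of (x, y)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return height / width


print(circularity(100, 40))                             # ≈ 0.785 for a 10x10 square
print(aspect_ratio([(0, 0), (0, 1), (0, 2), (1, 0)]))   # 1.5
```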

For example, when the video object to be extracted is round, like a ball, the circularity in the extraction condition may be set to a value close to 1.

The position information 20b indicates the position of the video object currently being tracked, for example its centroid. The centroid coordinates are calculated by the feature amount analysis unit 33 of the object extraction means 30, described later. When several video objects match the extraction condition information 20a, the position information 20b also serves as an extraction condition: the video object closest to the coordinates it indicates is chosen as the video object to be extracted.

The object extraction means 30 selects the video object to be extracted (tracked) from the object candidate image generated by the object candidate image generation means 10, based on the extraction conditions (extraction condition information 20a) stored in the extraction condition storage means 20, and extracts the position of that video object and the feature amounts that characterize it. Here, the object extraction means 30 comprises a labeling unit 31, an object selection unit 32 and a feature amount analysis unit 33.

The labeling unit 31 assigns a number (label) to each region of the object candidate image (binarized image) generated by the object candidate image generation means 10 that is a video object candidate. That is, the labeling unit 31 assigns one number to each connected region of pixels with the value "1 (white)", which marks video object regions. The video object candidates in the object candidate image are thereby numbered.
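This kind of labeling is connected-component labeling. The breadth-first sketch below assumes 8-connectivity (the text does not say whether 4- or 8-connectivity is used) and returns a label image together with the number of labels; `label_regions` is a hypothetical name.

```python
from collections import deque


def label_regions(binary):
    """Give each 8-connected region of 1-pixels its own label 1, 2, ...; background stays 0."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and labels[y][x] == 0:
                count += 1                      # start a new region
                labels[y][x] = count
                queue = deque([(x, y)])
                while queue:                    # flood-fill the region
                    cx, cy = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < w and 0 <= ny < h
                                    and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((nx, ny))
    return labels, count


labels, n = label_regions([[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 1]])
print(n)  # 3
```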

For each video object candidate numbered by the labeling unit 31, the object selection unit 32 determines whether it matches the extraction conditions (extraction condition information 20a) stored in the extraction condition storage means 20, and thereby selects the video object to be extracted (tracked). The number (label) of the selected video object is output to the feature amount analysis unit 33.

That is, the object selection unit 32 filters each video object candidate on the extraction conditions indicated by the extraction condition information 20a stored in the extraction condition storage means 20 (for example, area, luminance, color, aspect ratio and circularity), and selects a video object that satisfies those conditions as the video object to be extracted.

Further, when filtering on an extraction condition such as area, the object selection unit 32 updates the extraction condition information 20a with the area (or other value) of the video object it has just extracted. When selecting the video object in the next frame image, it then evaluates the extraction condition against the updated value. This makes it possible to extract the video object in accordance with the extraction conditions even when its size, brightness and so on change.
When filtering on color, the object selection unit 32 calculates the average color of the region corresponding to the video object in the frame image of the video input to the video object trajectory synthesis apparatus 1, and filters on the extraction condition indicated by the extraction condition information 20a.

The feature amount analysis unit 33 analyzes the video object corresponding to the label selected by the object selection unit 32 and extracts the position of the video object and the feature amounts that characterize it. The extracted position and feature amounts are output to the search area estimation means 40 and the trajectory image generation means 50. The extracted position is also written into the position information 20b of the extraction condition storage means 20 as the current position of the video object. If the object selection unit 32 cannot select any video object that meets the extraction conditions, this fact (extraction failure) is notified to the trajectory image generation means 50.

Here, the position of the video object may be given by its centroid coordinates, the vertex coordinates of a polygon approximation, the control point coordinates of a spline curve, or the like. When the object selection unit 32 selects several video objects, the one closest to the position of the video object extracted from the previous frame image, for example, is identified as the video object being tracked.
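A minimal sketch of the centroid computation and of the nearest-to-previous-position rule, assuming a label image like the one produced by the labeling unit 31 and a previous position taken from the position information 20b. The names are hypothetical, and in practice only candidates that passed the extraction filters would be passed to `pick_nearest`.

```python
def centroids(labels, count):
    """Centroid (mean x, mean y) of each label 1..count in a label image."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}   # [sum_x, sum_y, pixel count]
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                s = sums[k]
                s[0] += x
                s[1] += y
                s[2] += 1
    return {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}


def pick_nearest(candidates, previous):
    """Label whose centroid is closest (squared Euclidean distance) to the previous position."""
    px, py = previous
    return min(candidates,
               key=lambda k: (candidates[k][0] - px) ** 2 + (candidates[k][1] - py) ** 2)


c = centroids([[1, 1, 0], [0, 0, 2], [0, 0, 2]], 2)
print(pick_nearest(c, (2.0, 2.0)))  # 2
```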

The feature amounts of the video object may include its existence region, luminance, color, texture and so on. The existence region is used by the search area estimation means 40 to estimate the search area of the video object in the next frame image. For luminance and color, the values used by the object selection unit 32 when comparing against the extraction conditions can be reused. The texture is the image of the region corresponding to the video object cut out of the frame image input to the video object trajectory synthesis apparatus 1; it is used when the trajectory image generation means 50, described later, draws the trajectory image.

The search area estimation means 40 estimates the search area of the video object in the next input frame image, based on the position (centroid coordinates, etc.) of the video object extracted by the object extraction means 30.
The search area estimation means 40 predicts the position of the video object in the next frame image, for example by applying a Kalman filter to the centroid coordinates, and estimates the search area from it. The method by which the search area estimation means 40 estimates the search area with a Kalman filter is described below. A Kalman filter estimates a state that changes from moment to moment by applying recurrence formulas that perform "filtering", estimating the current state from quantities observed in time series, and "prediction", estimating a future state.

Here, the search area estimation means 40 estimates the state quantity x shown in equation (5) below, which includes the centroid coordinates (p_x, p_y) of the video object. v_x and v_y are the velocities of the video object in the horizontal (x) and vertical (y) directions, respectively, and T denotes matrix transposition.

First, the search area estimation means 40 updates the filter sensitivity (Kalman gain) K by equation (6) below, based on the observation noise covariance R (the covariance of the observation noise) and the state covariance P_{t-1} (the covariance of the state quantity x_{t-1} in the frame image at time t−1).

Here, the matrix H is the observation matrix relating the state quantity x to the observation y, which is the centroid coordinates input from the object extraction means 30, and T denotes matrix transposition.
The observation matrix H is the matrix of equation (7) below, which observes the centroid coordinates (p_x, p_y) of the state quantity x.

The observation noise covariance R is the covariance of equation (8) below, where r_x and r_y are the observation accuracies of the video object in the horizontal and vertical directions.

The search area estimation means 40 then updates the state quantity x and the state covariance P by equations (9) and (10) below.

Here, K is the filter sensitivity (Kalman gain) calculated by equation (6), y is the centroid coordinates (observation) input from the object extraction means 30, and H is the observation matrix of equation (7). In this way, the search area estimation means 40 estimates the current state using equations (6), (9) and (10) (the state update equations).
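Equations (5) to (10) are not reproduced in this text. The sketch below assumes the state ordering x = [p_x, p_y, v_x, v_y]^T, the standard Kalman update K = PH^T(HPH^T + R)^{-1}, x ← x + K(y − Hx), P ← (I − KH)P, and a diagonal R built from r_x and r_y treated as standard deviations. Matrices are plain lists to keep the example dependency-free; the function names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Assumed eq. (7): observe only the centroid (p_x, p_y) of x = [p_x, p_y, v_x, v_y].
H = [[1, 0, 0, 0],
     [0, 1, 0, 0]]


def kalman_update(x, P, y, r_x, r_y):
    """One filtering step. x: 4-vector, P: 4x4, y: observed centroid (2-vector)."""
    R = [[r_x ** 2, 0], [0, r_y ** 2]]                # assumed eq. (8)
    Ht = transpose(H)
    HPHt = matmul(matmul(H, P), Ht)
    S = [[HPHt[i][j] + R[i][j] for j in range(2)] for i in range(2)]
    K = matmul(matmul(P, Ht), inv2(S))                 # eq. (6): 4x2 Kalman gain
    innovation = [[y[i] - sum(H[i][j] * x[j] for j in range(4))] for i in range(2)]
    Ki = matmul(K, innovation)
    x_new = [x[i] + Ki[i][0] for i in range(4)]        # eq. (9)
    KH = matmul(K, H)
    I_minus_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(4)]
                  for i in range(4)]
    P_new = matmul(I_minus_KH, P)                      # eq. (10)
    return x_new, P_new


I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_new, P_new = kalman_update([0, 0, 0, 0], I4, y=[2, 4], r_x=1.0, r_y=1.0)
print(x_new[:2])  # [1.0, 2.0]
```

With equal prior and observation uncertainty, the estimated position moves halfway toward the observation.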
Next, based on the state quantity x_t and the state covariance P_t in the frame image at time t, the search area estimation means 40 predicts the state quantity x_{t+1} and the state covariance P_{t+1} in the frame image at time t+1 by equations (11) and (12) below.

Here, F is a preset state transition rule, expressed as the state transition matrix. The video object is assumed here to move in uniform linear motion, and the state transition matrix is the matrix of equation (13) below. T denotes matrix transposition.

Q is the covariance of the system noise (process covariance), set to account for modeling errors of the uniform linear motion model and for disturbances such as noise. For example, when errors arise independently in the horizontal and vertical velocities v_x and v_y of the video object, the matrix of equation (14) below can be used as the process covariance Q, where q_x and q_y are non-negative values.

In this way, the search area estimation means 40 can estimate the centroid coordinates of the video object in the next frame image using equations (11) and (12) (the state prediction equations).
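Equations (11) to (14) are not reproduced in this text either. Under the constant-velocity assumption with a time step of one frame, a standard prediction step x ← Fx, P ← FPF^T + Q, with Q placing q_x and q_y on the velocity terms only, could be sketched as follows. These matrix forms are assumptions consistent with the description, not the patent's printed equations.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


# Assumed eq. (13): constant velocity, state [p_x, p_y, v_x, v_y], one frame per step.
F = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]


def kalman_predict(x, P, q_x, q_y):
    """Predict state and covariance for the next frame (eqs. (11), (12))."""
    Q = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, q_x, 0],
         [0, 0, 0, q_y]]                                # assumed eq. (14)
    x_pred = [sum(F[i][j] * x[j] for j in range(4)) for i in range(4)]    # eq. (11)
    FPFt = matmul(matmul(F, P), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(4)] for i in range(4)]  # eq. (12)
    return x_pred, P_pred


I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x_pred, P_pred = kalman_predict([10, 20, 2, -1], I4, q_x=0.1, q_y=0.1)
print(x_pred)  # [12, 19, 2, -1]
```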
When applying the Kalman filter, the state covariance may be controlled according to the tracking state. For example, when the tracking of the video object is successful, the state covariance may be reduced, and when the tracking fails, the state covariance may be increased.

The search area estimation means 40 can also vary the search area of the video object within the frame image in accordance with changes in this state covariance. For example, by enlarging the existence region of the video object (one of the feature amounts input from the object extraction means 30) by the state covariance, it can estimate a search area of the minimum necessary size.
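One possible way to turn the predicted covariance into a search area, sketched below, is to widen the object's existence region by k standard deviations of positional uncertainty around the predicted centroid. The rectangular shape and the factor `k` are hypothetical choices; the text only says that the state covariance is added to the existence region.

```python
import math


def search_window(x_pred, P_pred, half_w, half_h, k=3.0):
    """Search rectangle (x_min, y_min, x_max, y_max) around the predicted centroid.

    half_w, half_h: half-size of the object's existence region in the last frame.
    k: hypothetical number of standard deviations of position uncertainty to add.
    """
    sigma_x = math.sqrt(P_pred[0][0])
    sigma_y = math.sqrt(P_pred[1][1])
    cx, cy = x_pred[0], x_pred[1]
    return (cx - half_w - k * sigma_x, cy - half_h - k * sigma_y,
            cx + half_w + k * sigma_x, cy + half_h + k * sigma_y)


P = [[4, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(search_window([12, 19, 2, -1], P, half_w=3, half_h=2))  # (3.0, 14.0, 21.0, 24.0)
```

Because the covariance grows with each prediction that is not followed by an update, repeated extraction failures widen this window automatically, in line with the covariance control described above.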
The description of the configuration of the video object trajectory synthesis apparatus 1 in FIG. 1 now continues.

The trajectory image generation means 50 draws an image representing the feature amounts of the video object, based on the position and feature amounts of the video object extracted by the object extraction means 30. Here, the trajectory image generation means 50 comprises a feature amount image drawing unit 51 and a trajectory interpolation unit 52.

The feature amount image drawing unit 51 draws the feature amount on the trajectory image 60a stored in the trajectory image storage means 60, at the location corresponding to the position of the video object extracted by the object extraction means 30. For example, when color is used as the feature amount, that color is drawn on the trajectory image 60a with a predetermined size (for example, the size of the ball).
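Drawing a color feature of predetermined size can be sketched as stamping a filled disc onto the trajectory image. The sketch assumes the trajectory image is a list of rows in which None marks transparent (not yet drawn) pixels; `stamp_disc` and that convention are illustrative.

```python
def stamp_disc(canvas, cx, cy, radius, color):
    """Paint a filled disc of `color` centered at (cx, cy); pixels off the canvas are skipped."""
    h, w = len(canvas), len(canvas[0])
    for y in range(max(0, int(cy - radius)), min(h, int(cy + radius) + 1)):
        for x in range(max(0, int(cx - radius)), min(w, int(cx + radius) + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                canvas[y][x] = color
    return canvas


trajectory = [[None] * 5 for _ in range(5)]        # empty (transparent) trajectory image
stamp_disc(trajectory, 2, 2, 1, (255, 255, 255))   # draw the ball's color at (2, 2)
print(sum(v is not None for row in trajectory for v in row))  # 5
```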

When texture is used as the feature amount, for example, drawing that texture on the trajectory image 60a reproduces the video object of the frame image as actually shot. The feature amount image drawing unit 51 draws on the trajectory image 60a each time the object extraction means 30 notifies it of the position and feature amounts of the video object, producing an image that shows the trajectory of the tracked video object alone. Each time it draws on the trajectory image 60a, the feature amount image drawing unit 51 also instructs the image synthesis means 70 to perform image synthesis.

The feature amount image drawing unit 51 also stores at least the most recent coordinates at which it drew a feature amount in a storage means (not shown). These coordinates (the latest drawing position) are used by the trajectory interpolation unit 52, described later, to interpolate the trajectory when the object extraction means 30 fails to extract the video object. When extraction succeeds again after a failure, the feature amount image drawing unit 51 outputs the current position of the video object, the latest drawing position and the feature amount of the video object (for example, its texture) to the trajectory interpolation unit 52.

Based on the current position of the video object, the latest drawing position and the feature amount of the video object output from the feature amount image drawing unit 51, the trajectory interpolation unit 52 draws on the trajectory image 60a, by interpolation, the video objects presumed to have existed between the latest drawing position and the current position. As a result, even when so-called occlusion occurs, with a video object in front hiding a video object behind it, and the hidden object is the tracking target, its positions can be interpolated and the trajectory image 60a drawn.
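The text does not specify the interpolation formula. A minimal sketch assuming linear interpolation at evenly spaced frames between the latest drawing position and the current position; `interpolate_positions` is a hypothetical name.

```python
def interpolate_positions(last, current, missed):
    """Positions for `missed` frames, evenly spaced strictly between `last` and `current`."""
    (x0, y0), (x1, y1) = last, current
    steps = missed + 1
    return [(x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps)
            for k in range(1, steps)]


print(interpolate_positions((0, 0), (30, 60), missed=2))  # [(10.0, 20.0), (20.0, 40.0)]
```

With M₁ drawn at time t and M₄ extracted at time t+3, as in the FIG. 4 example below, two frames were missed, so this yields positions for M₂′ and M₃′.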

The trajectory image storage means 60 stores the trajectory image generated by the trajectory image generation means 50 and is a general storage medium such as a hard disk. On the trajectory image 60a stored here, the feature amount image drawing unit 51 of the trajectory image generation means 50 draws the feature amount of the video object (for example, its texture) at the input interval of the frame images, or the trajectory interpolation unit 52 interpolates the positions of the video object over the period in which extraction failed. The trajectory image 60a drawn in this way is read out by the image synthesis means 70.

The image synthesis means 70 combines the trajectory image 60a stored in the trajectory image storage means 60 with each frame image of the video, producing a composite video in which the trajectory of the tracked video object is overlaid on the video. The image synthesis means 70 combines the frame image and the trajectory image 60a in response to the image synthesis instruction from the trajectory image generation means 50.
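A minimal overlay sketch, reusing the assumption that the trajectory image marks empty pixels with None: wherever the trajectory image has content, it replaces the frame pixel. A real system might alpha-blend instead; the text does not specify the blending rule, and `composite` is a hypothetical name.

```python
def composite(frame, trajectory):
    """Overlay the trajectory image on a frame: non-None trajectory pixels win."""
    return [[t if t is not None else f for f, t in zip(frow, trow)]
            for frow, trow in zip(frame, trajectory)]


print(composite([[1, 2], [3, 4]], [[None, 9], [None, None]]))  # [[1, 9], [3, 4]]
```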

The configuration of the video object trajectory synthesis apparatus 1 has been described above based on one embodiment, but the present invention is not limited to it. For example, when the image synthesis means 70 combines a feature amount of the video object such as color or luminance with the frame image, the trajectory image 60a need not be stored: it suffices to store only the coordinates of the positions the video object has moved through, and at synthesis time to draw the trajectory of the video object at those coordinates based on the color, luminance or other feature amount and combine it.

The video object trajectory synthesis apparatus 1 described above can be implemented by having a general-purpose computer execute a program that operates the computer's arithmetic and storage devices. This program (the video object trajectory synthesis program) can be distributed over a communication line or written to a recording medium such as a CD-ROM for distribution.

(Example of video object trajectory synthesis)
An example in which the video object trajectory synthesis apparatus 1 synthesizes the trajectory of a video object is now described with reference to FIGS. 3 and 4. FIG. 3 is an explanatory diagram of an example in which the movement trajectory of a video object is synthesized onto the video. FIG. 4 is an explanatory diagram of an example in which the movement trajectory is interpolated when extraction of the video object fails. The following description assumes that a baseball is tracked and its movement trajectory is synthesized.

First, an example of synthesizing the movement trajectory of the video object onto the video is described with reference to FIG. 3 (and FIG. 1 as appropriate). FIG. 3(a) shows, in time series, the frame images constituting the video input to the video object trajectory synthesis apparatus 1. FIG. 3(b) shows, in time series, the trajectory images of the video object generated by the trajectory image generation means 50 for the corresponding frame images of FIG. 3(a). FIG. 3(c) shows, in time series, the composite frame images produced by the image synthesis means 70 by combining each frame image of FIG. 3(a) with its trajectory image.

As shown in FIG. 3(a), the video object trajectory synthesis apparatus 1 extracts the balls M₁, M₂, …, M_n in turn from the frame images at times t, t+1, …, t+n. Then, as shown in FIG. 3(b), it generates trajectory images on which the balls (here, their textures) extracted from the frame images of FIG. 3(a) are drawn. As the trajectory image at time t+1 in FIG. 3(b) shows, the ball M₂ extracted at the current time (t+1) is drawn onto the trajectory image from the previous time (t), on which the ball M₁ has already been drawn. Only the ball (its texture) is thus drawn on the trajectory image, as a continuous trajectory.

The video object trajectory synthesis apparatus 1 then combines the frame image (FIG. 3(a)) and the trajectory image (FIG. 3(b)) for the same time, generating a composite frame image (FIG. 3(c)) for each of times t, t+1, …, t+n. Outputting these composite frame images in time series yields a composite video in which the trajectory of the ball is drawn.

Next, an example of interpolating the movement trajectory when extraction of the video object fails is described with reference to FIG. 4 (and FIG. 1 as appropriate). FIG. 4(a) shows, in time series, the frame images constituting the video input to the video object trajectory synthesis apparatus 1. FIG. 4(b) shows, in time series, the trajectory images of the video object generated by the trajectory image generation means 50 for the corresponding frame images of FIG. 4(a). FIG. 4(c) shows, in time series, the composite frame images produced by the image synthesis means 70 by combining each frame image of FIG. 4(a) with its trajectory image.

As shown in FIG. 4(a), the video object trajectory synthesis apparatus 1 sequentially extracts balls M₁, …, M₃, M₄ from the frame images at times t, …, t+2, t+3. Then, as shown in FIG. 4(b), it generates a trajectory image in which the balls (textures) extracted from the frame images of FIG. 4(a) are drawn.
Suppose, however, that extraction of the ball fails at time t+1 (not shown in FIG. 4(a)) and at time t+2 (shown), and succeeds at time t+3. In this case, the balls whose extraction failed are not drawn in the trajectory image at time t+2 in FIG. 4(b); only M₁, which was extracted successfully, is drawn. Consequently, only ball M₁ is composited into the composite frame image at time t+2 in FIG. 4(c) (M₃ there is the ball appearing in the frame image itself).

At time t+3, once the ball is successfully extracted, the trajectory interpolation unit 52 of the trajectory image generation means 50 draws ball M₄ in the trajectory image and also draws balls M₂′ and M₃′ at positions computed by interpolation from the positions (for example, the centroids) of balls M₁ and M₄. For M₂′ and M₃′, the same feature (e.g., texture) as that of ball M₄ is used. Thus, even if extraction of the video object fails for some frames, once extraction succeeds again a video in which the ball trajectory is drawn continuously (the composite video) can be generated, as in the composite frame image at time t+3 in FIG. 4(c).
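As a worked example, assume the interpolation is linear in time between the last successfully extracted centroid p_t (ball M₁) and the newly extracted centroid p_{t+3} (ball M₄). The text says only "interpolation", so linearity is an assumption here. The positions of M₂′ and M₃′ would then be:

```latex
\mathbf{p}_{t+k} = \mathbf{p}_t + \frac{k}{3}\left(\mathbf{p}_{t+3} - \mathbf{p}_t\right), \qquad k = 1, 2
```

For instance, with p_t = (10, 40) and p_{t+3} = (40, 10), the difference is (30, -30), giving p_{t+1} = (20, 30) and p_{t+2} = (30, 20).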

[Operation of the video object trajectory synthesis apparatus]
Next, the operation of the video object trajectory synthesis apparatus 1 will be described with reference to FIGS. 5 to 7. FIG. 5 is a flowchart showing the overall operation of the video object trajectory synthesis apparatus 1. FIG. 6 is a flowchart showing the video object selection process performed by the object selection unit 32 of the object extraction means 30. FIG. 7 is a flowchart showing the trajectory image drawing process performed by the trajectory image generation means 50.
First, the operation of the video object trajectory synthesis apparatus 1 will be described with reference to FIG. 5 (and to FIG. 1 as appropriate).

(Object candidate image generation step)
For the first frame image of the input video, the object candidate image generation means 10 of the video object trajectory synthesis apparatus 1 sets the search area for the video object to the entire frame image (step S10).
Next, the luminance image generation unit 11 generates a luminance image by converting the search area of the frame image to monochrome (grayscale); the contour image generation unit 12 generates a contour image by extracting contours (edges) from the luminance within the search area; and the difference image generation unit 13 generates a difference image whose pixel values are the luminance differences, within the search area, between frame images input at different times (step S11).
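The three images of step S11 can be sketched as below. This is an illustration under stated assumptions: the BT.601 luma weights, the forward-difference gradient used as the edge measure, and the function names are choices made for this example, not details given in the text.

```python
from typing import List, Tuple

Gray = List[List[float]]
RGB = List[List[Tuple[int, int, int]]]


def luminance_image(frame: RGB) -> Gray:
    """Grayscale conversion. The BT.601 luma weights are an assumption."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def contour_image(lum: Gray) -> Gray:
    """Edge strength as |dI/dx| + |dI/dy| using forward differences.
    The last row and column have no forward neighbour and are left at 0."""
    h, w = len(lum), len(lum[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            out[y][x] = abs(lum[y][x + 1] - lum[y][x]) + abs(lum[y + 1][x] - lum[y][x])
    return out


def difference_image(lum_now: Gray, lum_prev: Gray) -> Gray:
    """Per-pixel absolute luminance difference between two frames."""
    return [[abs(a - b) for a, b in zip(r_now, r_prev)]
            for r_now, r_prev in zip(lum_now, lum_prev)]


# Usage: a white pixel appears at the centre of an otherwise black 3x3 region.
prev = [[(0, 0, 0)] * 3 for _ in range(3)]
now = [[(0, 0, 0)] * 3 for _ in range(3)]
now[1][1] = (255, 255, 255)
lum_prev, lum_now = luminance_image(prev), luminance_image(now)
edges = contour_image(lum_now)
diff = difference_image(lum_now, lum_prev)
```

In a real implementation these would be computed only inside the current search area, which is what keeps the per-frame cost low.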

The image integration unit 141 of the object candidate extraction unit 14 then multiplies the luminance image, contour image and difference image generated in step S11 by predetermined weighting coefficients and sums them (see equation (1) above), producing an image for extracting the video object (the extraction image) (step S12).
The binarization unit 142 then binarizes the extraction image generated in step S12, and the noise removal unit 143 removes noise by applying erosion (contraction) and dilation (expansion) to the binarized image. This yields an object candidate image in which candidates for the video object are extracted within the search area of the frame image (step S13).
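Steps S12 and S13 can be sketched as a weighted sum, a threshold, and a morphological opening (erosion followed by dilation). Equation (1) itself is not reproduced in this section, so a plain weighted sum is assumed; the threshold, the 3x3 structuring element and the function names are likewise illustrative.

```python
from typing import List

Gray = List[List[float]]
Binary = List[List[int]]


def integrate(images: List[Gray], weights: List[float]) -> Gray:
    """Weighted sum of equally sized images (assumed form of equation (1))."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(wt * img[y][x] for wt, img in zip(weights, images)) for x in range(w)]
            for y in range(h)]


def binarize(img: Gray, threshold: float) -> Binary:
    return [[1 if v >= threshold else 0 for v in row] for row in img]


def _morph(img: Binary, erode: bool) -> Binary:
    """3x3 erosion (erode=True) or dilation (erode=False); out-of-bounds pixels are ignored."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neigh = [img[yy][xx]
                     for yy in range(max(0, y - 1), min(h, y + 2))
                     for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(all(neigh)) if erode else int(any(neigh))
    return out


def remove_noise(img: Binary) -> Binary:
    """Opening: erosion then dilation deletes specks smaller than the 3x3 element."""
    return _morph(_morph(img, erode=True), erode=False)


extraction = integrate([[[1.0, 2.0]], [[3.0, 4.0]]], [0.5, 2.0])   # -> [[6.5, 9.0]]

noisy = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        noisy[y][x] = 1          # 3x3 object candidate
noisy[0][6] = 1                  # isolated noise pixel
cleaned = remove_noise(binarize(noisy, 0.5))
```

The opening keeps the 3x3 candidate intact while the single-pixel speck disappears, which is the kind of noise the contraction/expansion step is meant to remove.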

(Object extraction step)
Next, the labeling unit 31 of the object extraction means 30 assigns a number (label) to each region of the object candidate image that is a video object candidate (step S14). Subsequent operations are performed per video object candidate, based on these labels.
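Labeling in step S14 is commonly done with connected-component labeling. The sketch below uses a breadth-first flood fill with 4-connectivity; the connectivity and the scan order are assumptions, as the text does not specify the labeling algorithm.

```python
from collections import deque
from typing import List


def label_regions(binary: List[List[int]]) -> List[List[int]]:
    """Assign labels 1, 2, ... to 4-connected foreground regions; background stays 0."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                next_label += 1                  # new candidate region found
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:                     # flood-fill the whole region
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


candidates = label_regions([[1, 1, 0, 0],
                            [0, 0, 0, 1],
                            [1, 0, 0, 1]])
```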

The object selection unit 32 then checks each video object candidate labeled in step S14 against the extraction conditions (extraction condition information 20a) stored in the extraction condition storage means 20, and selects from the object candidate image the video object to be extracted (tracked) (step S15). The video object selection process of step S15 is described in detail later with reference to FIG. 6.

Next, the feature analysis unit 33 analyzes (calculates) the features of the video object selected in step S15 and extracts the position and features of the video object (step S16). The centroid coordinates of the video object, for example, are used as its position, and its region, luminance, color, texture and the like are used as its features.
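A minimal sketch of step S16 for one labeled region follows, computing the centroid used as the position plus a few of the listed features (area, bounding box, mean luminance). The dictionary keys and the (x, y) ordering are illustrative choices; color and texture features are omitted for brevity.

```python
from typing import Any, Dict, List


def region_features(labels: List[List[int]], label: int,
                    lum: List[List[float]]) -> Dict[str, Any]:
    """Position and simple features of one labelled region (illustrative set)."""
    pixels = [(y, x) for y, row in enumerate(labels)
              for x, v in enumerate(row) if v == label]
    if not pixels:
        raise ValueError(f"label {label} not present")
    area = len(pixels)
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "area": area,
        "centroid": (sum(xs) / area, sum(ys) / area),   # (x, y) centre of gravity
        "bbox": (min(xs), min(ys), max(xs), max(ys)),    # (x_min, y_min, x_max, y_max)
        "mean_luminance": sum(lum[y][x] for y, x in pixels) / area,
    }


features = region_features([[0, 1, 1], [0, 1, 1], [0, 0, 0]], 1,
                           [[0, 10, 20], [0, 30, 40], [0, 0, 0]])
```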

(Trajectory image generation step)
The feature image drawing unit 51 of the trajectory image generation means 50 then draws an image representing the features of the video object and stores it in the trajectory image storage means 60 as the trajectory image 60a (step S17). In step S17, the trajectory image 60a is generated by drawing the feature (e.g., texture) at the position (e.g., centroid coordinates) of the video object extracted in step S16. The trajectory image drawing process of step S17 is described in detail later with reference to FIG. 7.

(Image synthesis step)
Next, the image synthesis means 70 combines the trajectory image 60a stored in the trajectory image storage means 60 with the frame image of the input video to generate a composite frame image (step S18). By outputting these composite frame images continuously, the image synthesis means 70 produces a composite video in which the movement trajectory of the video object is superimposed on the video.
Further, based on the position (centroid coordinates, etc.) of the video object extracted in step S16, the video object trajectory synthesis apparatus 1 estimates the search area for the video object in the next input frame image (step S19). A Kalman filter can be used for this estimation.
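One way to realize step S19 is sketched below: an independent constant-velocity Kalman filter per centroid coordinate, with a search window sized from the predicted position variance. In line with the claims, the Kalman gain is computed from the state covariance and the observation-noise covariance, the state covariance shrinks on a successful update and is enlarged when tracking fails. The motion model, the noise values, the inflation factor and the window rule are all assumptions for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one centroid coordinate.
    State x = [position, velocity]; P is the 2x2 state covariance.
    Model, q, r and the initial P are illustrative assumptions."""
    pos: float
    vel: float = 0.0
    P: List[List[float]] = field(default_factory=lambda: [[100.0, 0.0], [0.0, 100.0]])
    q: float = 1.0      # process-noise variance added to each state component
    r: float = 4.0      # observation-noise covariance (pixels^2)

    def predict(self) -> float:
        """x <- F x and P <- F P F^T + qI, with F = [[1, 1], [0, 1]] (one frame step)."""
        self.pos += self.vel
        (p00, p01), (p10, p11) = self.P
        self.P = [[p00 + p01 + p10 + p11 + self.q, p01 + p11],
                  [p10 + p11, p11 + self.q]]
        return self.pos

    def update(self, z: float) -> None:
        """Tracking succeeded: gain K = P H^T / (H P H^T + r) with H = [1, 0]; P shrinks."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def miss(self, inflation: float = 4.0) -> None:
        """Tracking failed: no measurement is available, so enlarge P."""
        self.P = [[v * inflation for v in row] for row in self.P]


def search_window(kx: AxisKalman, ky: AxisKalman, n_sigma: float = 3.0,
                  min_half: float = 4.0) -> Tuple[float, float, float, float]:
    """Search area (x_min, y_min, x_max, y_max) around the predicted centroid."""
    hx = max(min_half, n_sigma * math.sqrt(kx.P[0][0] + kx.r))
    hy = max(min_half, n_sigma * math.sqrt(ky.P[0][0] + ky.r))
    return (kx.pos - hx, ky.pos - hy, kx.pos + hx, ky.pos + hy)


# Usage: a centroid moving +5 px per frame in x and stationary in y.
kx, ky = AxisKalman(pos=0.0), AxisKalman(pos=0.0)
for t in range(1, 20):
    kx.predict(); ky.predict()
    kx.update(5.0 * t); ky.update(0.0)
kx.predict(); ky.predict()               # prediction for frame t = 20
hit_window = search_window(kx, ky)
kx.miss(); ky.miss()                     # suppose extraction fails at frame 20
kx.predict(); ky.predict()               # prediction for frame t = 21
miss_window = search_window(kx, ky)
```

After a failure the window widens, which gives the extractor a better chance of reacquiring the object in the next frame; after successes it tightens, keeping computation low.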

The video object trajectory synthesis apparatus 1 then determines whether there is a frame image to be processed next (the next frame image). If there is (step S20; Yes), the process returns to step S11 and the video object is extracted within the search area estimated in step S19. If there is not (step S20; No), the operation ends.

Through the above operation, the video object trajectory synthesis apparatus 1 sequentially extracts and tracks the target video object in the frame images input in time series as video, and outputs a composite video in which the movement trajectory of the video object is superimposed on the video. Moreover, because the apparatus extracts the video object while updating the search area as appropriate, the amount of computation needed to extract and update the video object is kept low.

[Video object selection process]
Next, with reference to FIG. 6 (and to FIG. 1 as appropriate), the video object selection process (step S15 in FIG. 5) performed by the object selection unit 32 of the video object trajectory synthesis apparatus 1 will be described.
First, the object selection unit 32 selects one video object candidate from the object candidate image according to its number (label) (step S30).

The object selection unit 32 then determines whether the “area” of the selected candidate satisfies the extraction conditions (extraction condition information 20a) stored in the extraction condition storage means 20 (step S31). If the area condition is satisfied (step S31; condition met), it determines whether the “color” of the candidate satisfies the extraction conditions (step S32). If the color condition is satisfied (step S32; condition met), it checks the “luminance” (step S33); if the luminance condition is satisfied (step S33; condition met), it checks the “aspect ratio” (step S34); and if the aspect ratio condition is satisfied (step S34; condition met), it checks the “circularity” (step S35).

If the circularity condition is also satisfied (step S35; condition met), that is, if the candidate meets the extraction conditions in all of steps S31 to S35, the candidate selected in step S30 is selected as the video object to be tracked (step S36), and the process proceeds to step S37. If the candidate fails the extraction conditions in any of steps S31 to S35 (condition not met), the process proceeds directly to step S37.

The object selection unit 32 then determines whether the conformity check has been performed on all video object candidates in the object candidate image (step S37). If it has (Yes), the operation ends; if any candidate has not yet been checked (No), the process returns to step S30 and continues.
Through the above operation, the object selection unit 32 of the video object trajectory synthesis apparatus 1 selects, from the object candidate image, the video object that satisfies the extraction conditions.
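The FIG. 6 cascade can be sketched as range checks applied in the order area, color, luminance, aspect ratio, circularity, with short-circuit evaluation reproducing the "go straight to S37 on the first failed condition" behavior. The `Candidate` fields, the range-style conditions, the threshold values and the circularity formula 4πA/P² are assumptions; the text does not state how the conditions in extraction condition information 20a are encoded.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Range = Tuple[float, float]


@dataclass
class Candidate:
    label: int
    area: float                            # pixels
    perimeter: float                       # pixels
    bbox_w: float                          # bounding-box width
    bbox_h: float                          # bounding-box height
    mean_rgb: Tuple[float, float, float]
    mean_luminance: float


def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P**2: about 1 for a disc, smaller for elongated shapes."""
    return 4.0 * math.pi * area / perimeter ** 2 if perimeter > 0 else 0.0


def in_range(value: float, rng: Range) -> bool:
    return rng[0] <= value <= rng[1]


def meets_conditions(c: Candidate, cond: Dict[str, Any]) -> bool:
    """Checks in the order of FIG. 6 (S31 to S35); `and` stops at the first failure."""
    return (in_range(c.area, cond["area"])
            and all(in_range(v, r) for v, r in zip(c.mean_rgb, cond["color"]))
            and in_range(c.mean_luminance, cond["luminance"])
            and in_range(c.bbox_w / c.bbox_h, cond["aspect_ratio"])
            and in_range(circularity(c.area, c.perimeter), cond["circularity"]))


def select_objects(candidates: List[Candidate], cond: Dict[str, Any]) -> List[int]:
    """S30/S37 loop: test every labelled candidate, keep those meeting all conditions (S36)."""
    return [c.label for c in candidates if meets_conditions(c, cond)]


# Hypothetical extraction conditions for a white ball.
BALL = {
    "area": (20.0, 400.0),
    "color": [(150.0, 255.0)] * 3,
    "luminance": (150.0, 255.0),
    "aspect_ratio": (0.7, 1.4),
    "circularity": (0.7, 1.2),
}
candidates = [
    Candidate(1, 78.5, 31.4, 10, 10, (230, 230, 225), 229.0),   # ball-like
    Candidate(2, 60.0, 64.0, 30, 2, (220, 220, 220), 220.0),    # thin line: aspect ratio fails
    Candidate(3, 80.0, 32.0, 10, 10, (40, 40, 40), 40.0),       # dark blob: color fails
]
selected = select_objects(candidates, BALL)
```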

[Trajectory image drawing process]
Next, with reference to FIG. 7 (and to FIG. 1 as appropriate), the trajectory image drawing process (step S17 in FIG. 5) performed by the trajectory image generation means 50 of the video object trajectory synthesis apparatus 1 will be described.
First, the trajectory image generation means 50 determines whether the object extraction means 30 has succeeded in extracting the video object (step S40). If extraction has failed (step S40; No), the operation ends. If extraction has succeeded (step S40; Yes), the trajectory image generation means 50 determines whether the video object was also successfully extracted from the previous frame image (step S41).

If the video object was successfully extracted from the previous frame image (step S41; Yes), the process proceeds to step S43. If extraction failed in the previous frame image, that is, if the features (e.g., texture) of the video object for the previous frame image have not been drawn in the trajectory image 60a, the trajectory interpolation unit 52 of the trajectory image generation means 50 interpolates positions between the current position of the video object and the position of the video object that was last drawn successfully, and draws the texture or other features of the current video object at the interpolated positions (step S42).
The feature image drawing unit 51 of the trajectory image generation means 50 then draws the features (texture, etc.) of the video object at its current position in the trajectory image 60a (step S43).

Through the above operation, the video object trajectory synthesis apparatus 1 generates the trajectory image by interpolation even if extraction of the video object fails partway through, so the composite video combined and output by the image synthesis means 70 contains a continuous movement trajectory of the video object.
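The decision logic of FIG. 7 (steps S40 to S43) can be sketched as a small state machine that remembers the last successfully drawn position and counts consecutive failures. Here "drawing" is modeled as appending (position, feature) pairs to a list rather than writing pixels, and linear interpolation is assumed; both are simplifications for illustration, and the class name is hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class TrajectoryDrawer:
    """Sketch of the FIG. 7 drawing logic; drawing is modelled as a list of
    (position, feature) records instead of pixels (an assumption)."""

    def __init__(self) -> None:
        self.drawn: List[Tuple[Point, str]] = []
        self.last_pos: Optional[Point] = None   # last successfully drawn position
        self.missed = 0                          # consecutive failed extractions

    def process(self, pos: Optional[Point], feature: str = "") -> None:
        if pos is None:                          # S40: No, nothing is drawn
            self.missed += 1
            return
        if self.missed and self.last_pos is not None:   # S41: No, go to S42
            (x0, y0), (x1, y1) = self.last_pos, pos
            n = self.missed + 1
            for k in range(1, n):                # linear interpolation (assumed)
                p = (x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n)
                self.drawn.append((p, feature))  # current feature reused, as in the text
        self.drawn.append((pos, feature))        # S43: draw at the current position
        self.last_pos, self.missed = pos, 0


# Usage mirroring FIG. 4: extraction fails at t+1 and t+2.
drawer = TrajectoryDrawer()
for pos, feat in [((10, 40), "M1"), (None, ""), (None, ""), ((40, 10), "M4")]:
    drawer.process(pos, feat)
```

The two missed frames are filled in only when the object is found again at t+3, which matches the delayed appearance of M₂′ and M₃′ in FIG. 4(c).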

FIG. 1 is a block diagram showing the configuration of the video object trajectory synthesis apparatus according to the present invention.
FIG. 2 is an explanatory diagram illustrating a method of extracting a video object by color.
FIG. 3 is an explanatory diagram illustrating an example of synthesizing the movement trajectory of a video object onto a video.
FIG. 4 is an explanatory diagram illustrating an example of interpolating the movement trajectory when extraction of a video object fails.
FIG. 5 is a flowchart showing the overall operation of the video object trajectory synthesis apparatus according to the present invention.
FIG. 6 is a flowchart showing the video object selection process in the video object trajectory synthesis apparatus according to the present invention.
FIG. 7 is a flowchart showing the trajectory image drawing process in the video object trajectory synthesis apparatus according to the present invention.

Explanation of symbols

1 Video object trajectory synthesis apparatus
10 Object candidate image generation means
20 Extraction condition storage means
30 Object extraction means
40 Search area estimation means
50 Trajectory image generation means
60 Trajectory image storage means
70 Image synthesis means

Claims

A video object trajectory synthesis apparatus that extracts and tracks a video object included in an input video and synthesizes the movement trajectory of the video object onto the video, comprising:
object candidate image generation means for generating, from the time-series input frame images constituting the video, an object candidate image in which candidates for the video object are extracted;
extraction condition storage means for storing extraction conditions for the video object to be extracted;
object extraction means for extracting, from the object candidate image generated by the object candidate image generation means and based on the extraction conditions stored in the extraction condition storage means, the position of the video object and a feature quantity characterizing the video object;
trajectory image generation means for generating a trajectory image representing the movement trajectory of the video object by drawing, based on the position and feature quantity of the video object extracted by the object extraction means, an image indicating the feature quantity of the video object in the corresponding area of the frame image;
image synthesis means for synthesizing the trajectory image generated by the trajectory image generation means with the frame image; and
search area estimation means for estimating, based on the position of the video object extracted by the object extraction means, a search area for searching for the video object in the next input frame image, by applying a Kalman filter to the centroid coordinates of the object to predict the position of the video object in the next input frame image and estimating the search area from that prediction;
wherein the search area estimation means updates the Kalman gain, which is the filter sensitivity of the Kalman filter, based on an observation noise covariance, which is the covariance of the observation noise, and a state covariance, which is the covariance of the state quantities in the frame image, and makes the state covariance small when tracking of the video object succeeds and large when tracking fails.

The video object trajectory synthesis apparatus according to claim 1, wherein the trajectory image generation means includes trajectory interpolation means for interpolating, based on the extraction results of the object extraction means, the position of the video object in frames where extraction failed from the positions of the video object in frames where extraction succeeded.

A video object trajectory synthesis method that extracts and tracks a video object included in an input video and synthesizes the movement trajectory of the video object onto the video, comprising:
an object candidate image generation step of generating, from the time-series input frame images constituting the video, an object candidate image in which candidates for the video object are extracted using at least one of a luminance image, a contour image, and a difference image computed between frame images input at different times;
an object extraction step of extracting, from the object candidate image generated in the object candidate image generation step and based on at least one extraction condition among area, luminance, color, aspect ratio and circularity, the position of the video object and a feature quantity characterizing the video object;
a trajectory image generation step of generating a trajectory image representing the movement trajectory of the video object by drawing, based on the position and feature quantity of the video object extracted in the object extraction step, an image indicating the feature quantity of the video object in the corresponding area of the frame image;
an image synthesis step of synthesizing the trajectory image generated in the trajectory image generation step with the frame image; and
a search area estimation step of estimating, based on the position of the video object extracted in the object extraction step, a search area for searching for the video object in the next input frame image, by applying a Kalman filter to the centroid coordinates of the object to predict the position of the video object in the next input frame image;
wherein the search area estimation step updates the Kalman gain, which is the filter sensitivity of the Kalman filter, based on an observation noise covariance, which is the covariance of the observation noise, and a state covariance, which is the covariance of the state quantities in the frame image, and makes the state covariance small when tracking of the video object succeeds and large when tracking fails.

A video object trajectory synthesis program that, in order to extract and track a video object included in an input video and synthesize the movement trajectory of the video object onto the video, causes a computer to function as:
object candidate image generation means for generating, from the time-series input frame images constituting the video, an object candidate image in which candidates for the video object are extracted using at least one of a luminance image, a contour image, and a difference image computed between frame images input at different times;
object extraction means for extracting, from the object candidate image generated by the object candidate image generation means and based on at least one extraction condition among area, luminance, color, aspect ratio and circularity, the position of the video object and a feature quantity characterizing the video object;
trajectory image generation means for generating a trajectory image representing the movement trajectory of the video object by drawing, based on the position and feature quantity of the video object extracted by the object extraction means, an image indicating the feature quantity of the video object in the corresponding area of the frame image;
image synthesis means for synthesizing the trajectory image generated by the trajectory image generation means with the frame image; and
search area estimation means for estimating, based on the position of the video object extracted by the object extraction means, a search area for searching for the video object in the next input frame image, by applying a Kalman filter to the centroid coordinates of the object to predict the position of the video object in the next input frame image;
wherein the search area estimation means updates the Kalman gain, which is the filter sensitivity of the Kalman filter, based on an observation noise covariance, which is the covariance of the observation noise, and a state covariance, which is the covariance of the state quantities in the frame image, and makes the state covariance small when tracking of the video object succeeds and large when tracking fails.