JP2008262560A

JP2008262560A - Video search method and video search device

Info

Publication number: JP2008262560A
Application number: JP2008099460A
Authority: JP
Inventors: Koji Yamamoto; 晃司山本; Osamu Hori; 修堀; Toshimitsu Kaneko; 敏充金子; Takeshi Mita; 雄志三田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-04-07
Filing date: 2008-04-07
Publication date: 2008-10-30

Abstract

PROBLEM TO BE SOLVED: To provide a video information description method that makes it possible to search for a video containing an object seen against a moving background. SOLUTION: A feature quantity 703 containing information such as the position, shape, and motion of an object, and a feature quantity 704 containing information such as the motion of the background, are described as notation data derived from the original video. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a video information description method that focuses on objects in a video, and to a video search method and a video search device that use the description to search for a specific object or for frames containing that object.

As digital satellite broadcasting and cable television spread and the number of broadcast channels grows, the amount of video information available to users keeps increasing. At the same time, advances in computer technology and the practical use of large-capacity recording media such as DVD are making it easy to store large amounts of video as digital information and to handle it on a computer.

For users to make practical use of this video information, an effective video search technique is needed to reach a target video efficiently among such a large volume of material. One proposed approach attaches some information to the objects in a video, searches for videos containing an object that satisfies the user's requirements, and presents them to the user. Attaching information to an object requires extracting the object from the video. Extracting objects by hand, however, is not realistic for an ever-growing body of video.

Regarding automatic object detection, a method for detecting objects in video with a stationary background has been proposed, for example, in Yoneyama, Nakajima, Yanagihara, and Sugano, "Detection of Moving Objects from MPEG Video Streams," IEICE Transactions (信学論), Vol. J81-D-II, No. 8, pp. 1776-1786 (1998-08). This method, however, presupposes a stationary background, and detecting objects is difficult when the background moves.

That is, even if the shape of an object is given in advance, a search based on the object's motion cannot use the true motion unless the background motion is known, because the observed motion is affected by camera work. For example, when the camera tracks an object moving to the left, the object is almost stationary on the screen while the background moves to the right relative to it. A search for videos containing an object that moves to the left on the screen therefore cannot retrieve this video.

As described above, conventional video search techniques cannot detect an object seen against a moving background and therefore cannot search for videos containing such an object.

SUMMARY OF THE INVENTION An object of the present invention is to provide a video information description method that makes it possible to search even videos containing an object seen against a moving background.

Another object of the present invention is to provide an object detection method that can detect an object seen against a moving background.

A further object of the present invention is to provide a video search method and a video search device that enable various searches over videos containing objects seen against a moving background.

To solve the above problems, the basic feature of the video information description method according to the present invention is that it describes, as video information, a feature quantity of a specific object in a video and a feature quantity of the background of that video.

Another video information description method according to the present invention describes, as video information, the difference between the two feature quantities in addition to the feature quantity of the specific object and the feature quantity of the background. Alternatively, the difference between the object feature quantity and the background feature quantity may be described together with the background feature quantity.

Here, the feature quantity of the specific object desirably describes at least the position, shape, and motion of the object, and the feature quantity of the background desirably describes at least the motion of the background.

The present invention also provides a recording medium that stores the object feature quantity and the background feature quantity described in this way, as well as the difference between them, either together with the video data or separately from it.

The object detection method according to the present invention comprises a motion vector extraction step of extracting motion vectors from an input video, an estimation step of estimating the motion of the background of the video from the extracted motion vectors, and a detection step of removing the estimated background motion to obtain the motion vectors of a specific object in the video and detecting the region of that object.

In the estimation step, the background motion is approximated by a predetermined transformation model (for example, an affine transformation or a perspective transformation), and the background motion is estimated by estimating the coefficients of that model from the motion vectors of the video. The transformation coefficients are estimated by, for example, a robust estimation method.

The estimation step may include dividing the motion vectors within a frame into regions according to their similarity, clustering the resulting regions based on the similarity of their motion vectors, and judging the region of the largest cluster to be the background region.

The estimation step may instead include dividing the motion vectors of a plurality of frames into regions according to their similarity within each frame, associating these regions between frames, clustering the regions within each frame based on the motion vectors so that associated regions belong to the same cluster, and judging the region of the largest cluster to be the background region.

According to the present invention, describing the feature quantity of an object in a video and the feature quantity of the background, and further the difference between the two, makes it possible to perform various searches over videos containing an object seen against a moving background.

First, a feature quantity of a specific object in the video to be searched and a feature quantity of the background of that video are described in advance. The background feature quantity is subtracted from the object feature quantity, and the result is compared with the feature quantity of an externally input object. This makes it possible to retrieve, from the video to be searched, at least one of an object identical to the externally input object and a frame containing such an object.

Second, a feature quantity of a specific object in the video to be searched, a feature quantity of the background of that video, and the difference between the two are described in advance. Comparing this difference with the feature quantity of an externally input object makes it possible to retrieve, from the video to be searched, at least one of an object identical to the externally input object and a frame containing such an object.

Third, a feature quantity of a specific object in the video to be searched and a feature quantity of the background of that video are described in advance. Comparing the background feature quantity with the background feature quantity of an externally input video makes it possible to retrieve, from the video to be searched, frames shot with camera work substantially identical to that used for the externally input video.

Fourth, a feature quantity containing at least motion information for a specific object in the video to be searched and a feature quantity of the background of that video are described in advance. Comparing the motion information of the object over a plurality of consecutive frames with information on a series of motions of an externally input object makes it possible to retrieve, from the video to be searched, at least one of an object identical to the externally input object and a frame containing such an object.

According to the present invention, describing both the object feature quantity and the background feature quantity makes it possible to remove the background motion and search on the object's own motion.

In addition, objects can be detected automatically in large volumes of stored video data, without manual work, and their feature quantities extracted. It then becomes easy to perform video searches suited to each user's purpose, such as searching for an object that matches a feature quantity input from outside, or searching for frames whose background motion matches that of an externally input background moving under camera work.

Furthermore, because the detected feature quantities are described in advance, feature extraction does not have to be repeated for every search, which enables high-speed search and makes the above searches possible even when the user side has no object detection function.

Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
This embodiment provides three broad functions. First, in addition to playing back video data, it automatically detects moving objects in the video and informs the user of their presence by overlaying figures such as ellipses or rectangles on the display.

Second, it separates feature quantities of the detected object, such as position, size, and motion, from feature quantities of the background, and describes them as notation data in an external file or the like.

Third, it searches for objects in the video by comparing the feature quantity data of a detected object, or feature quantity notation data described in advance in an external file or the like, with the feature quantity data of a search target object supplied from outside, and presenting the matching objects to the user.

FIG. 1 is a flowchart showing the configuration and processing procedure of the video search system according to this embodiment.

First, original video data 100 reproduced from a medium such as a DVD is input (step 101), and a specific object in the video is detected from the original video data 100 by a method described in detail later (step 102). As described later, information about the background of the video is detected at the same time. The detected object is combined with a figure, such as an ellipse or rectangle, generated so as to surround it, and the result is output as object detection result display data 104 (step 103).

Meanwhile, feature quantity data generation is performed (step 105) to describe feature quantity data for the object detected in step 102, such as its position, shape (including size), and motion, and feature quantity data for the background, such as the background motion. The generated object and background feature quantity data 107 is then output and described as notation data (step 106).

In steps 105 and 106, the object feature quantity data and the background feature quantity data may be described as they are; data for the difference between the two may additionally be generated and described; or, depending on the case, the difference data may be generated and described together with either the background feature quantity data or the object feature quantity data.

Concretely, the description processing of step 106 means storing (recording) the feature quantity data 107 on a recording medium or in memory, or displaying it. The recording medium that stores the feature quantity data 107 may be the medium, such as a DVD, that stores the original video data 100, or a different recording medium.

Next, to search for an object, the similarity between the object feature quantity data generated in step 105 and the search target feature quantity data 110 input in step 109 is determined (step 108), and the similarity determination result is composited and displayed as object search result display data 112 (step 111). The search target feature quantity data 110 represents feature quantities such as the position, shape (including size), and motion of the object to be searched for.
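
As a rough illustration of the similarity determination in step 108, the following Python sketch ranks described objects by how close their background-compensated motion is to a query motion. The record layout, the function names, and the use of Euclidean distance on a single motion vector are assumptions made for this example; the text does not fix a particular similarity measure.

```python
import math


def compensated_motion(object_motion, background_motion):
    """Object motion with the background (camera) motion subtracted."""
    return (object_motion[0] - background_motion[0],
            object_motion[1] - background_motion[1])


def search_by_motion(records, query_motion, max_dist):
    """Return (frame, object_id) pairs whose compensated motion is near the query.

    records is a list of (frame, object_id, object_motion, background_motion)
    tuples; this layout is hypothetical. Results are ordered by distance.
    """
    hits = []
    for frame, object_id, obj_motion, bg_motion in records:
        dist = math.dist(compensated_motion(obj_motion, bg_motion), query_motion)
        if dist <= max_dist:
            hits.append((dist, frame, object_id))
    return [(frame, object_id) for _, frame, object_id in sorted(hits)]
```

With a record such as `(1000, 1, (0.0, 0.0), (5.0, 0.0))`, an object that is stationary on screen while the background moves right, a query for leftward motion `(-5.0, 0.0)` finds the object, which is the tracking-shot case discussed in the background section.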

This series of processes can be implemented in either software or hardware.

Next, the object detection processing of step 102 in FIG. 1 will be described in detail with reference to FIG. 2.
First, motion vectors are extracted from the input original video data 100 (step 201). When the original video data 100 is MPEG-compressed data, the motion vectors obtained from P pictures are used; in this case one motion vector is given per macroblock. When the original video data 100 is analog data or digital data without motion vectors, it is digitized as necessary, and motion vectors are extracted either by optical flow or after conversion to MPEG-compressed data.

The motion vectors extracted in this way do not always reflect the actual motion of objects; this is especially noticeable at the periphery of the frame and in areas of flat texture. Motion vectors of low reliability are therefore removed (step 202), as follows.

For the periphery of the frame, a region is defined in advance and the motion vectors contained in it are removed. For areas of flat texture, when the original video data 100 is MPEG-compressed data, the DC components of the DCT (discrete cosine transform) coefficients of an I picture are used, as shown in FIG. 3. As shown in FIG. 4, the set of macroblocks in which the variance of the four DC components contained in one macroblock is at or below a threshold is treated as a low-reliability region, and motion vectors whose start points fall in macroblocks of that region are removed as unreliable.
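
The flat-texture test of step 202 can be sketched in Python as follows. The data layout (a dictionary from macroblock index to its four DC values, and motion vectors given as a macroblock center plus a vector), the 16-pixel macroblock size, and the function names are assumptions for illustration only.

```python
from statistics import pvariance


def low_reliability_blocks(dc, threshold):
    """Macroblocks whose four DC components have variance at or below threshold.

    dc maps a macroblock index (row, col) to the four DC coefficients of the
    blocks inside that macroblock, taken from an I picture.
    """
    return {mb for mb, values in dc.items() if pvariance(values) <= threshold}


def drop_unreliable(vectors, flat_blocks, mb_size=16):
    """Discard motion vectors whose start point lies in a flat macroblock.

    vectors is a list of ((cx, cy), (vx, vy)) pairs: macroblock center and
    motion vector. The start point is the center minus the vector.
    """
    kept = []
    for (cx, cy), (vx, vy) in vectors:
        sx, sy = cx - vx, cy - vy
        block = (int(sy // mb_size), int(sx // mb_size))
        if block not in flat_blocks:
            kept.append(((cx, cy), (vx, vy)))
    return kept
```

The threshold is left to the caller because the text only states that one exists, not its value.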

In the motion vector data obtained in this way, the motion of objects includes background motion caused by camera work and the like, so the background motion must be removed to obtain accurate object motion. In this embodiment, an affine transformation model is used as the transformation model approximating background motion due to camera work and the like, and the background motion is estimated by estimating the coefficients of this model from the motion vectors (step 203). Several techniques exist for estimating the affine transformation coefficients of the background motion; they are described later.

Next, the background motion is removed by transforming the start point of each motion vector with the estimated affine transformation coefficients and subtracting the resulting displacement from the original motion vector (step 204).
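
Step 204 can be illustrated with a short Python sketch. It assumes that the affine coefficients are stored as a 3x2 matrix `alpha` applied to the row vector `(x, y, 1)`, and that each motion vector is given as a macroblock center and a vector; both conventions and the function names are choices made for this example.

```python
def apply_affine(alpha, point):
    """Map (x, y) through the affine model: (x, y, 1) times the 3x2 matrix alpha."""
    x, y = point
    return (alpha[0][0] * x + alpha[1][0] * y + alpha[2][0],
            alpha[0][1] * x + alpha[1][1] * y + alpha[2][1])


def remove_background(vectors, alpha):
    """Subtract the background displacement predicted at each start point.

    vectors is a list of ((cx, cy), (vx, vy)) pairs: macroblock center and
    motion vector. The start point is the center minus the vector.
    """
    result = []
    for (cx, cy), (vx, vy) in vectors:
        sx, sy = cx - vx, cy - vy                # start point of the vector
        bx, by = apply_affine(alpha, (sx, sy))   # where the background moves it
        result.append(((cx, cy), (vx - (bx - sx), vy - (by - sy))))
    return result
```

For a background translating 5 pixels to the right, `alpha = [[1, 0], [0, 1], [5, 0]]`, a vector of `(0, 0)` (an object held still on screen by a tracking camera) becomes `(-5, 0)`, and a vector of `(5, 0)` that merely follows the background becomes `(0, 0)`.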

The motion vector data obtained in this way, free of background motion, is divided into regions made up of similar motion vectors (step 205). Specifically, the cosine (direction) and magnitude of two adjacent motion vectors are compared, and if the differences are at or below predetermined thresholds the two vectors are assigned to the same region; this is done for every pair of adjacent motion vectors.
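
One possible reading of the region division in step 205 is sketched below in Python. It treats two neighboring vectors as similar when the cosine of the angle between them is close to 1 and their lengths differ little, and merges similar neighbors with a union-find structure. The threshold values, the 4-neighborhood, and the handling of zero-length vectors are assumptions, not details given in the text.

```python
import math


def similar(u, v, min_cos=0.9, max_len_diff=1.0):
    """True when u and v point in nearly the same direction with similar length."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if abs(nu - nv) > max_len_diff:
        return False
    if nu == 0 or nv == 0:
        return True  # direction is undefined; the lengths already agree
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv) >= min_cos


def segment(field, min_cos=0.9, max_len_diff=1.0):
    """Group grid cells into regions of mutually similar neighboring vectors.

    field maps a macroblock index (row, col) to its motion vector (vx, vy).
    Returns a list of regions, each a list of (row, col) indices.
    """
    parent = {k: k for k in field}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for (r, c), v in field.items():
        for nb in ((r, c + 1), (r + 1, c)):  # right and lower neighbors
            if nb in field and similar(v, field[nb], min_cos, max_len_diff):
                parent[find(nb)] = find((r, c))

    regions = {}
    for k in field:
        regions.setdefault(find(k), []).append(k)
    return list(regions.values())
```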

The regions obtained in this way include small regions unsuitable for treatment as objects, so determination processing 306 removes them by thresholding, and the final object data 307 is output.

The three techniques for the affine transformation coefficient estimation of step 203, which approximates the background motion from the motion vectors, are described below.

<Method 1>
In Method 1, the affine transformation coefficients are estimated from all motion vectors in the frame except those of low reliability. Let $y_i$ be the center of the $i$-th macroblock and $v_i$ the motion vector of that macroblock. The start point of the vector is $x_i = y_i - v_i$. With affine transformation coefficients $\alpha$, the affine model moves this start point to $r_i = x_i \alpha$, and the error relative to the actual destination $y_i$ is $e_i = r_i - y_i$. The sum of the estimation residuals is then $\sum_i \Psi(e_i)$ (equation (1)), and the $\alpha$ that minimizes it is sought.

One technique for solving such a problem is the least squares method, for which $\Psi(z) = z^2$ is used in equation (1). With least squares, however, background motion vectors and object motion vectors are treated equally, so the affine transformation coefficients cannot be estimated from the background motion vectors alone; the resulting coefficients also include the motion of objects.

Therefore, assuming that the background occupies at least 50% of the frame area, the motion vectors of objects are regarded as disturbances (outliers), and the affine transformation coefficients are estimated from the motion vectors of the background alone. As a technique robust to disturbances, a robust estimation method is used, such as that disclosed in Toru Nakagawa and Yoshio Oyanagi, "Experimental Data Analysis by the Least Squares Method" (最小二乗法による実験データ解析), University of Tokyo Press. In particular, M-estimation by the biweight method, one of the robust estimation methods, is used here. The biweight method reduces the influence of disturbances by lowering the weight of elements with large errors. Specifically, equation (2), which uses a weight $w$, is used for $\Psi(z)$ in equation (1). The constant $c$ is said to be best chosen in the range 5 to 9.
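
The Python sketch below shows one common way to carry out biweight M-estimation: iteratively reweighted least squares with the Tukey biweight $w = (1 - (z/(cs))^2)^2$ for $|z| < cs$ and $w = 0$ otherwise. The exact form of equation (2) is not reproduced in this text, so this weight function, the use of the median residual as the scale $s$, the iteration count, and all function names are assumptions. The first pass with unit weights is ordinary least squares.

```python
import math
from statistics import median


def solve(m, b):
    """Solve the 3x3 system m a = b (b has two columns) by Gauss-Jordan elimination."""
    aug = [rm[:] + rb[:] for rm, rb in zip(m, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [[aug[r][3 + k] / aug[r][r] for k in range(2)] for r in range(3)]


def weighted_fit(starts, ends, weights):
    """Weighted least-squares affine fit: (x, y, 1) alpha is approximately the end point."""
    m = [[0.0] * 3 for _ in range(3)]
    b = [[0.0] * 2 for _ in range(3)]
    for (x, y), end, w in zip(starts, ends, weights):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * row[i] * row[j]
            for k in range(2):
                b[i][k] += w * row[i] * end[k]
    return solve(m, b)


def residuals(alpha, starts, ends):
    """Length of e_i = r_i - y_i for every vector."""
    out = []
    for (x, y), (ex, ey) in zip(starts, ends):
        rx = alpha[0][0] * x + alpha[1][0] * y + alpha[2][0]
        ry = alpha[0][1] * x + alpha[1][1] * y + alpha[2][1]
        out.append(math.hypot(rx - ex, ry - ey))
    return out


def biweight_fit(starts, ends, c=6.0, iterations=20):
    """Robust affine fit by iteratively reweighted least squares."""
    alpha = weighted_fit(starts, ends, [1.0] * len(starts))  # plain least squares
    for _ in range(iterations):
        err = residuals(alpha, starts, ends)
        scale = median(err)
        if scale < 1e-12:  # the majority already fits exactly
            break
        limit = c * scale
        weights = [(1 - (e / limit) ** 2) ** 2 if e < limit else 0.0 for e in err]
        alpha = weighted_fit(starts, ends, weights)
    return alpha
```

Here `starts` are the start points $x_i$ and `ends` are the macroblock centers $y_i$. When a minority of vectors belongs to a moving object, the plain least-squares result is pulled toward the object motion, while the reweighted fit drives the weights of those vectors toward zero.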

<Method 2>
The processing procedure when Method 2 is used for the affine transformation coefficient estimation that approximates the background motion is described with reference to the flowchart in FIG. 5.

First, the motion vector data 500, from which low-reliability vectors have already been removed, is divided into regions of similar adjacent motion vectors, using the same region division as in Method 1 (step 501). Unlike in Method 1, however, the background motion has not been removed at this point.

Next, for each region, the affine transformation coefficients that approximate the region's motion with an affine transformation model are estimated from the motion vectors in that region (step 502). The same robust estimation method as in Method 1 is used.

Next, the regions are clustered (step 503). A table of all pairs of regions is prepared, and the distance between each pair of regions is computed from their affine transformation coefficients. The Euclidean distance between the six coefficients of the affine model is used here, but other distances may be used. The two regions with the smallest distance are merged, new affine transformation coefficients are computed for the merged region, and the table is updated by deleting the two merged regions and adding the merged region. This is repeated until the smallest distance between regions exceeds a predetermined threshold or only one region remains.

Among the clustered regions, the region of the largest cluster is judged to be the background region (step 504), and its affine transformation coefficients are output as the affine transformation coefficients 505 of the background motion.
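
Steps 503 and 504 amount to agglomerative clustering on the six affine coefficients. The Python sketch below follows that outline under two simplifying assumptions: each region is summarized by its coefficients and its area, and the coefficients of a merged region are approximated by an area-weighted mean rather than re-estimated from the motion vectors as the text describes. The function names are hypothetical.

```python
import math
from itertools import combinations


def cluster_regions(regions, max_dist):
    """Merge the closest pair of regions until no pair is within max_dist.

    regions is a list of (coeffs, area) with coeffs a list of six affine
    coefficients. Returns clusters as (coeffs, area, member_indices).
    """
    clusters = [(list(c), a, [i]) for i, (c, a) in enumerate(regions)]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: math.dist(clusters[p[0]][0], clusters[p[1]][0]))
        ci, cj = clusters[i], clusters[j]
        if math.dist(ci[0], cj[0]) > max_dist:
            break
        area = ci[1] + cj[1]
        # Assumption: area-weighted mean stands in for re-estimation.
        coeffs = [(a * ci[1] + b * cj[1]) / area for a, b in zip(ci[0], cj[0])]
        merged = (coeffs, area, ci[2] + cj[2])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def background_cluster(clusters):
    """The cluster covering the largest area is taken as the background."""
    return max(clusters, key=lambda c: c[1])
```

Recomputing all pairwise distances on every pass is quadratic in the number of regions, which is acceptable for the small number of regions per frame but is not the table-update scheme described in the text.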

<Method 3>
The processing procedure when Method 3 is used for the affine transformation coefficient estimation that approximates the background motion is described with reference to the flowchart in FIG. 6.

First, a plurality of frames are read at once, and each frame is divided into regions of similar adjacent motion vectors by the same processing as in Method 2 (step 601).

Next, the transformation coefficients that approximate each region's motion with an affine transformation model are estimated (step 602). Corresponding regions in different frames are then found from the region positions, motion vector data, and transformation coefficients (step 603), after which the regions in each frame are clustered by the same clustering as in Method 2 (step 604).

When regions associated by the inter-frame association are assigned to different clusters, the cluster to which they were assigned most often is taken as correct, and the regions assigned to other clusters are moved to it (step 605).

Finally, the region with the largest area over the plurality of frames is judged to be the background (step 606), and the transformation coefficients 507 of the background region in each frame are obtained. Method 3 has the advantage that the background is estimated correctly even when, in a particular frame, the background region temporarily becomes smaller than other regions.
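
The correction of step 605 and the decision of step 606 can be sketched in Python as a majority vote followed by an area comparison. The input layout (one list of per-frame cluster labels and one list of per-frame areas for each tracked region), the summing of areas over frames, and the function names are assumptions for this example.

```python
from collections import Counter


def correct_labels(tracks):
    """Give each tracked region the cluster label it received in most frames.

    tracks maps a region id to its list of per-frame cluster labels.
    """
    return {rid: Counter(labels).most_common(1)[0][0]
            for rid, labels in tracks.items()}


def background_label(corrected, areas):
    """Pick the cluster whose regions cover the largest total area over all frames.

    areas maps a region id to its list of per-frame areas.
    """
    total = Counter()
    for rid, label in corrected.items():
        total[label] += sum(areas[rid])
    return total.most_common(1)[0][0]
```

In the test data used for this sketch, the region labelled `"sky"` is smaller than the other region in the last frame and is mislabelled there, yet it is still selected as background because the vote and the area total span all frames.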

上述の例では、背景の動きを推定する処理に用いる変換モデルにアフィン変換を用いたが、透視変換など、他の変換モデルを使用してもよい。 In the above example, the affine transformation is used as the transformation model used for the process of estimating the background motion, but other transformation models such as perspective transformation may be used.

次に、図７を用いて図１のステップ１０６におけるオブジェクト及び背景に関する特徴量データの記述処理で用いられるデータ表現について説明する。ここでは、図７（ａ）に示されるように例として第１０００番目のフレーム内の映像７０５に含まれる３つのオブジェクトの表記データ７００を表している。この表記データ７００は、元映像データの映像ストリーム７０６内の対応するフレームを示すフレーム情報７０１とオブジェクトに関する特徴量７０３及び背景に関する特徴量７０４の各データより構成され、次表記データへのポインタ７０２を用いたリスト構造により管理されている。 Next, data representation used in the description processing of the feature amount data relating to the object and the background in step 106 in FIG. 1 will be described with reference to FIG. Here, as shown in FIG. 7A, for example, the notation data 700 of three objects included in the video 705 in the 1000th frame is shown. This notation data 700 is composed of frame information 701 indicating the corresponding frame in the video stream 706 of the original video data, the feature amount 703 relating to the object, and the feature amount 704 relating to the background, and a pointer 702 to the next notation data is provided. It is managed by the list structure used.

The feature quantity 703 relating to an object includes at least information on the position, shape (including size), and motion of the object. Specifically, it consists of various feature quantities such as those illustrated in FIG. 7(b). In this example, the feature quantity 703 consists of a “position”, an “outline” describing the shape, “affine transformation coefficients” and the “mean and direction of the motion vectors” describing the motion, and a “color histogram”, among others.

Here, the outline of the object may be approximated by a simple figure such as an ellipse or a rectangle. The affine transformation coefficients are, as described earlier, the coefficients estimated when the motion of the object is approximated by an affine transformation model. The mean of the motion vectors is the mean magnitude of the motion vectors within the object. When color information for the object is available, a color histogram of the object region can also be used as a feature quantity. The motion of the object may be recorded either with or without the background motion removed.

When there are multiple objects, as in this example, it is desirable to assign an individual ID number to each object feature quantity 703 and to manage them in an easily extensible list structure such as the one shown in FIG. 7(a). With such a list structure, object feature quantities are easy to add and delete.

Like the feature quantity 703 relating to an object, the feature quantity 704 relating to the background consists of various feature quantities such as those illustrated in FIG. 7(c), for example “affine transformation coefficients”, the “mean and direction of the motion vectors”, the “camera work type”, and a “color histogram”. The camera work type is the kind of typical camera work used in shooting, such as a pan or a zoom.
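The notation data of FIG. 7 could be modeled roughly as below. This is a sketch, not the patent's storage format: all class and field names are hypothetical, the outline is assumed to be a bounding rectangle, a dictionary keyed by object ID stands in for the per-object list, and the second frame number (1005) is an invented example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectFeature:
    """Feature quantity 703 for one object (field names are illustrative)."""
    position: Tuple[float, float]
    outline: Tuple[float, float]        # assumed: width and height of a rectangle
    affine: Tuple[float, ...]           # affine transformation coefficients
    mean_motion: float                  # mean magnitude of the motion vectors
    direction_deg: float                # motion direction
    color_hist: Optional[List[int]] = None


@dataclass
class BackgroundFeature:
    """Feature quantity 704 for the background."""
    affine: Tuple[float, ...]
    mean_motion: float
    direction_deg: float
    camera_work: str                    # e.g. "pan" or "zoom"
    color_hist: Optional[List[int]] = None


@dataclass
class NotationData:
    """Notation data 700 for one described frame."""
    frame_number: int                                  # frame information 701
    background: BackgroundFeature                      # feature quantity 704
    objects: Dict[int, ObjectFeature] = field(default_factory=dict)  # 703 by ID
    next: Optional["NotationData"] = None              # pointer 702


def frames_in(head: Optional[NotationData]) -> List[int]:
    """Follow the next pointers and collect the frame numbers."""
    out = []
    while head is not None:
        out.append(head.frame_number)
        head = head.next
    return out


identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
bg = BackgroundFeature(identity, 0.0, 0.0, "pan")
second = NotationData(1005, bg)
first = NotationData(1000, bg, next=second)
first.objects[1] = ObjectFeature((120.0, 80.0), (40.0, 60.0), identity, 2.5, 180.0)
first.objects[2] = ObjectFeature((300.0, 90.0), (30.0, 30.0), identity, 0.0, 0.0)
del first.objects[2]    # deleting one object leaves the others untouched
```

Keying the object features by ID keeps additions and deletions local to one entry, which is the property the text asks of the list structure.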

Next, the similarity determination process in step 108 of FIG. 1 will be described using the flowchart shown in FIG. 8.
This similarity determination process is performed by comparing the feature data 800 for each object contained in the original video data, one after another, with the feature data 804 input from outside. The externally input feature data 804 may be given directly as data such as numerical values, or may be given as feature data extracted from a video.

When an object has multiple types of feature quantities, a similarity is obtained for each feature quantity in turn by the similarity determination process (step 803).

The feature data 800 contained in the original video data and the externally input feature data 804 are compared using a method suited to the type of feature quantity. For example, if the feature quantity is a color histogram, the differences between corresponding elements of the histograms may be used. When the objects being compared have different types of feature quantities, only the types they have in common need be compared.

When steps 801 and 802 determine that the search has been completed for all feature data of all objects, a search result display process is performed on the information of the matching objects (step 805), and the processing ends.
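The loop of FIG. 8 might look like the following sketch. It assumes features are stored as dictionaries keyed by feature type, uses a sum of absolute bin differences for histograms as one possible reading of "differences between elements", and adds the per-feature distances without weighting. The function names, the `COMPARATORS` table, and the threshold are all hypothetical.

```python
def histogram_distance(h1, h2):
    """Sum of absolute differences between corresponding histogram bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def scalar_distance(a, b):
    """Absolute difference for single-valued features."""
    return abs(a - b)


# Hypothetical table: which comparison suits which feature type.
COMPARATORS = {"color_hist": histogram_distance, "mean_motion": scalar_distance}


def dissimilarity(stored, query):
    """Compare only the feature types present on both sides."""
    shared = [k for k in stored if k in query and k in COMPARATORS]
    if not shared:
        return None
    return sum(COMPARATORS[k](stored[k], query[k]) for k in shared)


def search(objects, query, threshold):
    """Visit every object, score it, and return matches, best first."""
    hits = []
    for obj_id, feats in objects.items():
        d = dissimilarity(feats, query)        # per-feature comparison (step 803)
        if d is not None and d <= threshold:
            hits.append((obj_id, d))
    return sorted(hits, key=lambda h: h[1])    # handed to display (step 805)


objects = {
    1: {"color_hist": [4, 0, 2], "mean_motion": 3.0},
    2: {"color_hist": [0, 5, 1]},              # no motion feature stored
}
query = {"color_hist": [4, 1, 1], "mean_motion": 2.5}
results = search(objects, query, threshold=3.0)
```

Object 2 is still scored even though it lacks a motion feature, because only the shared feature type (the histogram) enters the comparison.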

When comparing the motion of objects, the feature data relating to the background can also be used to remove the background motion before the comparison. The effect of a search that separates out the background motion is described with reference to FIG. 9.

As shown in FIG. 9, the original video data 901 was shot while moving the camera so as to track an object moving to the left, so in the video the object appears stationary and the background appears to move to the right. If data for an object 905 is input from outside in order to search for an object moving to the left, the object in the video data 901 appears stationary, so the feature quantities do not match and the object cannot be found.

However, when the feature quantity relating to the object and the feature quantity relating to the background are described in accordance with the present invention, a process 902 that uses the background motion to separate out the background 904 moved by the camera work can detect the object 903 with its true leftward motion. That is, process 902 detects only the object 903 by taking the difference between the feature quantity relating to the object and the feature quantity relating to the background.

Accordingly, by comparing the detected object 903 with the externally input object 905, it becomes possible to search the input video data 901 for the same object as the externally input object 905, or to search the original video data 901 for video frames containing the same object as the externally input object 905. If the difference data has also been described in advance, as mentioned earlier, process 902 is unnecessary.
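The FIG. 9 situation can be reproduced numerically. In this sketch the motion features are reduced to a single mean motion vector in pixels per frame, with x increasing to the right; both are simplifying assumptions, as are the function names and the tolerance.

```python
import math


def remove_background_motion(object_motion, background_motion):
    """Difference between the object and background motion vectors."""
    ox, oy = object_motion
    bx, by = background_motion
    return (ox - bx, oy - by)


def motions_match(a, b, tol=1.0):
    """True when two motion vectors differ by at most tol pixels per frame."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tol


apparent_object = (0.0, 0.0)     # tracked object looks stationary in the frame
background = (5.0, 0.0)          # background drifts right under the pan
query = (-5.0, 0.0)              # externally input: object moving left

compensated = remove_background_motion(apparent_object, background)
match_raw = motions_match(apparent_object, query)
match_compensated = motions_match(compensated, query)
```

The raw comparison fails while the compensated one succeeds, which mirrors the contrast the text draws between searching with and without the background feature.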

Furthermore, when the feature quantity relating to the object and the feature quantity relating to the background are described in accordance with the present invention, video whose camera work matches externally input camera work can be retrieved, as shown in FIG. 10. That is, a process 1002 that separates the object 1003 from the original video data 1001 detects only the background 1004 moved by the camera work. The detected background 1004 is then compared with an externally input background 1005 moved by camera work, and video frames that use the same camera work as the externally input camera work are retrieved from the original video data 1001.

In this case too, if the difference data has been described in advance as mentioned earlier, process 1002 is unnecessary.
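A camera-work search along these lines could compare the background affine coefficients directly. The sketch below assumes the coefficients are ordered (a, b, c, d, e, f) with x' = ax + by + c and y' = dx + ey + f, and applies one tolerance to every coefficient; that ordering, the tolerance, and the function names are hypothetical, and a practical system would likely scale translation and scaling terms differently.

```python
def camera_work_distance(bg_a, bg_b):
    """Largest absolute difference between corresponding coefficients."""
    return max(abs(x - y) for x, y in zip(bg_a, bg_b))


def find_camera_work(frames, query_bg, tol=0.05):
    """Return the frame numbers whose background motion matches the query.
    frames is a list of (frame_number, background_coefficients) pairs."""
    return [n for n, bg in frames if camera_work_distance(bg, query_bg) <= tol]


frames = [
    (1000, (1.0, 0.0, 4.0, 0.0, 1.0, 0.0)),     # pan: pure translation in x
    (1005, (1.1, 0.0, 0.0, 0.0, 1.1, 0.0)),     # zoom: uniform scaling
    (1010, (1.0, 0.0, 3.98, 0.0, 1.0, 0.01)),   # nearly the same pan
]
query_pan = (1.0, 0.0, 4.02, 0.0, 1.0, 0.0)
pan_frames = find_camera_work(frames, query_pan)
```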

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to the flowchart shown in FIG. 11.
In this embodiment, instead of the object detection and description of the first embodiment, original video data 1100 to which previously analyzed feature data has been attached is input (step 1101), and the feature data relating to the objects is separated and extracted from this original video data 1100 (step 1102).

Thereafter, as in the first embodiment, a similarity determination process is performed between the feature data extracted from the original video data 1100 in step 1102 and the search-target feature data 1110 input in step 1109 (step 1108), and a composite display process is performed to display the result as object search result display data 1112 (step 1111).

[Third Embodiment]
Next, a third embodiment of the present invention will be described using the flowchart shown in FIG. 12.
In this embodiment, a series of motions input from outside is compared with notation data spanning multiple frames so that objects can be searched for by their motion over time. To this end, a process is performed that associates identical objects with one another among the objects contained in multiple consecutive pieces of notation data 1201 (step 1202). Meanwhile, a sampling process extracts motion data from the externally input motion data 1203 at the same interval as the notation data 1201 (step 1204).

Then the mutually corresponding notation data and the sampled externally input motion data are compared (step 1105), and video containing matching objects is displayed as the search result (step 1106).

The process of associating the objects contained in the consecutive notation data 1201 in step 1202 of FIG. 12 will be described with reference to FIG. 13.
Using the feature quantities (position and motion) of an object 1301 contained in the N-th notation data, the predicted position 1302 of the object in the (N+1)-th notation data is obtained. The object 1303 that is contained in the (N+1)-th notation data and lies closest to this predicted position 1302 is then taken as the object corresponding to object 1301.
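The association step can be sketched as a nearest-neighbor match against the predicted position. The sketch assumes that the stored motion is the displacement expected between two consecutive pieces of notation data and that distance is Euclidean; the dictionary layout and the function names are hypothetical.

```python
import math


def predict(position, motion):
    """Predicted position in the (N+1)-th notation data."""
    return (position[0] + motion[0], position[1] + motion[1])


def associate(obj_n, candidates_n1):
    """Pick the (N+1)-th object closest to the predicted position."""
    px, py = predict(obj_n["position"], obj_n["motion"])
    return min(
        candidates_n1,
        key=lambda c: math.hypot(c["position"][0] - px, c["position"][1] - py),
    )


obj_1301 = {"id": 1, "position": (100.0, 50.0), "motion": (-10.0, 0.0)}
next_objects = [
    {"id": 7, "position": (92.0, 51.0)},
    {"id": 8, "position": (130.0, 50.0)},
]
predicted_1302 = predict(obj_1301["position"], obj_1301["motion"])
obj_1303 = associate(obj_1301, next_objects)
```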

The sampling process applied in step 1204 of FIG. 12 to the externally input motion data 1203 will be described with reference to FIG. 14.
The externally input motion data 1401 (the same as 1203) is continuous motion data, so as it stands it cannot be compared with the notation data, which is discrete data attached every few frames. The motion data 1401 is therefore sampled at the frame interval of the notation data, and the sampled motion data 1402 is compared with the notation data.
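One way to picture the sampling is to treat the continuous motion data as a trajectory function of time and evaluate it at the notation-data interval. That representation, the five-frame interval, and the function names are assumptions made for illustration only.

```python
def sample_trajectory(trajectory, start_frame, end_frame, interval):
    """Evaluate a continuous trajectory at every interval-th frame."""
    return [trajectory(t) for t in range(start_frame, end_frame + 1, interval)]


def displacements(points):
    """Motion between successive samples, comparable with per-entry motion."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def leftward(t):
    """Hypothetical input: an object moving left at 2 pixels per frame."""
    return (100.0 - 2.0 * t, 50.0)


samples_1402 = sample_trajectory(leftward, 0, 15, 5)   # frames 0, 5, 10, 15
moves = displacements(samples_1402)
```

After sampling, each displacement covers the same time span as one piece of notation data, so the two sequences can be compared entry by entry.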

FIG. 1 is a flowchart showing the basic processing procedure of the video search system according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the object detection procedure in the first embodiment.
FIG. 3 is a diagram showing the relationship between the I pictures and P pictures of the MPEG stream used for object detection in the first embodiment.
FIG. 4 is a diagram for explaining the removal of low-reliability vectors in object detection in the first embodiment.
FIG. 5 is a flowchart for explaining a method of obtaining the transformation coefficients of the background region in the first embodiment.
FIG. 6 is a flowchart for explaining another method of obtaining the transformation coefficients of the background region in the first embodiment.
FIG. 7 is a diagram showing the structure of the feature data used in the object description processing in the first embodiment.
FIG. 8 is a flowchart showing the object search procedure in the first embodiment.
FIG. 9 is a diagram showing the removal of camera work in the object search processing in the first embodiment.
FIG. 10 is a diagram showing the search, in the object search processing in the first embodiment, for frames that use the same camera work as the input camera work.
FIG. 11 is a flowchart showing the basic processing procedure of the video search system according to the second embodiment of the present invention.
FIG. 12 is a flowchart showing the basic processing procedure of the video search system according to the third embodiment of the present invention.
FIG. 13 is a diagram for explaining the association of objects between consecutive notation data in the third embodiment.
FIG. 14 is a diagram for explaining the method of sampling externally input motion data in the third embodiment.

Explanation of symbols

700 ... notation data
703 ... feature quantity relating to an object
704 ... feature quantity relating to the background
706 ... video stream

Claims

A video search method comprising: describing a feature quantity relating to a specific object in a video to be searched and a feature quantity relating to the background in the video,
and, after subtracting the feature quantity relating to the background from the feature quantity relating to the object in the video, comparing the result with a feature quantity relating to an object input from outside, thereby searching the video to be searched for at least one of the same object as the object input from outside and a frame in the video that contains the same object as the object input from outside.

A video search method comprising: describing a feature quantity relating to a specific object in a video to be searched, a feature quantity relating to the background in the video, and the difference between these feature quantities,
and, by comparing the difference with a feature quantity relating to an object input from outside, searching the video to be searched for at least one of the same object as the object input from outside and a frame in the video that contains the same object as the object input from outside.

A video search method comprising: describing a feature quantity relating to a specific object in a video to be searched and a feature quantity relating to the background in the video,
and, by comparing the feature quantity relating to the background with a feature quantity relating to the background in a video input from outside, searching the video to be searched for frames that use camera work almost the same as the camera work with which the video input from outside was obtained.

A video search method comprising: describing a feature quantity that includes at least information on the motion of a specific object in a video to be searched, and a feature quantity relating to the background in the video,
and, by comparing the motion information of the object over a plurality of consecutive frames in the video with a series of motion information of an object input from outside, searching the video to be searched for at least one of the same object as the object input from outside and a frame that contains the same object as the object input from outside.

A video search apparatus that describes a feature quantity relating to a specific object in a video to be searched, a feature quantity relating to the background in the video, and the difference between these feature quantities,
and that, by comparing the difference with a feature quantity relating to an object input from outside, searches the video to be searched for at least one of the same object as the object input from outside and a frame in the video that contains the same object as the object input from outside.

A video search apparatus that describes a feature quantity relating to a specific object in a video to be searched and a feature quantity relating to the background in the video,
and that, by comparing the feature quantity relating to the background with a feature quantity relating to the background in a video input from outside, searches the video to be searched for frames that use camera work almost the same as the camera work with which the video input from outside was obtained.

A video search apparatus that describes a feature quantity that includes at least information on the motion of a specific object in a video to be searched, and a feature quantity relating to the background in the video,
and that, by comparing the motion information of the object over a plurality of consecutive frames in the video with a series of motion information of an object input from outside, searches the video to be searched for at least one of the same object as the object input from outside and a frame that contains the same object as the object input from outside.