JPH11259507A

JPH11259507A - Video retrieval method and medium for storing video retrieval program

Info

Publication number: JPH11259507A
Application number: JP10060835A
Authority: JP
Inventors: Yukinobu Taniguchi; 行信谷口; Yoshinobu Tonomura; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-03-12
Filing date: 1998-03-12
Publication date: 1999-09-24

Abstract

PROBLEM TO BE SOLVED: In a system that retrieves video by means of representative images extracted from it, to extract, even for video containing camera operations, a representative image that represents the image content of the entire scene and preserves spatial positional relationships, and to enable time-based retrieval. SOLUTION: A representative image extraction process 10 extracts the representative images used by a similar image retrieval process 11, which retrieves representative images similar to a reference image from the video to be searched. The extraction process extracts an image sequence containing a camera operation and synthesizes a panoramic image from that sequence, so that a representative image representing the image content of the entire scene is extracted automatically. In addition, information on the movement of the camera is stored in association with the representative image, so that a time-based retrieval condition can be replaced with a spatial retrieval condition. This makes time-based retrieval possible on representative images synthesized as panoramic images.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a video search method for video database systems and the like, and to a recording medium storing a video search program.

[0002]

[Prior art] Image search systems based on image features (color histograms, edge features, and so on) rather than on textual information such as keywords have been developed (M. Flicker, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by Image and Video Content: The QBIC System, IEEE Computer Magazine, Vol. 28, No. 9, pp. 23-32, 1997).

[0003] Keyword search based on textual information is an important search method for image databases as well, but it has two problems: assigning keywords requires manual labor, and choosing keywords is difficult because an image carries different meanings depending on the viewer or the situation (polysemy). A search method based on image features has been desired to solve these problems. Search methods for video databases have the same problems, and a search method based on image features has been desired there as well.

[0004] A user can give a search request to the system in several ways. (a) In the first method, the user supplies one or more images and asks the system to present images similar to them.

[0005] (b) In the second method, the user supplies a search request sentence, and the system uses some form of knowledge to convert that sentence into a condition it can handle easily. For example, given the request "I want a scene showing a mountain and a blue sky," the system uses the knowledge that a "mountain" is green, a "blue sky" is blue, and the "mountain" region lies below the "sky" region to convert the request into a form it can handle easily, namely "retrieve images in which a blue region exists above a green region." It then matches the converted request against a previously stored index and presents the matching images.

[0006] Because a video is a temporal sequence of images, it was thought that video search could be achieved by applying the image search techniques described above. A video, however, contains an enormous number of images. One approach to practical search is therefore to extract a small number of important images (called representative images) from the video in advance, compute image features only for those representative images, and use the features as an index (M. Flicker, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by Image and Video Content: The QBIC System, IEEE Computer Magazine, Vol. 28, No. 9, pp. 23-32, 1997).

[0007] One method of extracting representative images was to detect cut points, where the scene in the video changes, and to use the image at each cut point as a representative image.

[0008]

[Problem to be solved by the invention] However, the prior art described above has the problem that it is difficult to extract appropriate representative images from video shot while the camera is moving.

[0009] This problem is described with reference to FIG. 6. Consider a video (images 61, 62, 63, 64, 65) obtained by shooting the scene shown in FIG. 6 (a scene showing a pond, a mountain, and the sky) while moving the camera from bottom to top. The representative image extracted by the conventional cut point method is the first image 61 of the scene. Because the mountain does not appear in this representative image, the system cannot answer the search request "present scenes showing a mountain."

[0010] To solve this problem, suppose instead that all five images 61, 62, 63, 64, and 65 are extracted as representative images. If these representative images are stored in the image database, the request "present scenes showing a mountain" can be answered. However, the positional relationship among the representative images is lost at the moment they are extracted, so a request such as "present scenes showing a mountain above a pond" cannot be answered.

[0011] In short, the conventional methods cannot extract a representative image that represents the image content of the entire scene while preserving its spatial positional relationships. In addition, when handling video, it is desirable to be able to search by time. For example, the user supplies two images A and B, and the system should answer a request such as "find scenes in which B appears after A." In the prior art, temporal information is lost when the representative images are extracted, so temporal search is not possible.

[0012] The present invention was made to solve the above problems. Its object is to make it possible, even for video containing camera operations, to extract a representative image that represents the image content of the entire scene and preserves spatial positional relationships, and also to enable temporal search.

[0013]

[Means for solving the problem] The present invention is a video search method comprising a representative image extraction step of extracting a plurality of representative images from a video and a similar image search step of searching for representative images similar to a reference image, characterized in that the representative image extraction step includes a step of extracting an image sequence containing a camera operation and a step of synthesizing a panoramic image by combining the extracted image sequence. Extracting an image sequence containing a camera operation and synthesizing a panoramic image from it makes it possible to automatically extract a representative image that represents the image content of the entire scene.

[0014] Furthermore, the second invention is characterized in that information on camera motion is stored in association with the representative image in the representative image extraction step, and the search is performed in the similar image search step by replacing a temporal search condition with a spatial search condition. Storing the camera motion information in association with the representative image is what makes it possible to replace a temporal search condition with a spatial one.

[0015] A program that implements the above processing steps on a computer can be stored on a suitable computer-readable recording medium such as portable medium memory, semiconductor memory, or a hard disk.

[0016]

[Embodiments of the invention] Embodiments of the present invention are described with reference to the drawings. <Embodiment 1> FIG. 1 is a processing flow diagram of an embodiment of the present invention. The processing is broadly divided into two steps: (a) a representative image extraction step and (b) a similar image search step.

[0017] (a) Representative image extraction step. The portion enclosed by the dotted line in FIG. 1(A) is the representative image extraction step 10. Step 10 is executed when a new video is registered in the database (hereinafter, DB). In step 100, video data is input. Specifically, this means digitizing a video signal or reading digital video data recorded on another medium. Steps 101 to 110 analyze the input video data from the beginning and extract a plurality of representative images by detecting cut points and camera operations. The specific procedure is described below.

[0018] Step 101 performs the processing for detecting cut points. Various cut point detection methods have been disclosed. A simple one takes the luminance difference between the current image and the immediately preceding image and judges the current image to be a cut point when the sum of the absolute values of the differences is greater than or equal to a threshold. If step 102 determines that there is a cut, step 103 takes the currently input cut point image as a representative image, and step 104 passes it to the representative image registration procedure.
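The simple cut test of step 101 can be sketched as follows. This is a minimal illustration in Python rather than the patented implementation: frames are assumed to be 2-D lists of luminance values, and the names `is_cut` and `detect_cuts` as well as the threshold value are hypothetical.

```python
def is_cut(prev_frame, cur_frame, threshold):
    """Return True when the sum of absolute luminance differences reaches the threshold."""
    diff_sum = sum(
        abs(c - p)
        for prev_row, cur_row in zip(prev_frame, cur_frame)
        for p, c in zip(prev_row, cur_row)
    )
    return diff_sum >= threshold


def detect_cuts(frames, threshold):
    """Return the indices of frames judged to be cut points."""
    return [
        t for t in range(1, len(frames))
        if is_cut(frames[t - 1], frames[t], threshold)
    ]


# Three tiny 2x2 luminance frames: the third differs sharply from the second.
frames = [
    [[10, 10], [10, 10]],
    [[12, 11], [10, 9]],
    [[200, 210], [190, 205]],
]
cuts = detect_cuts(frames, threshold=100)  # frame index 2 starts a new shot
```

In practice the threshold would have to be tuned to the frame size and content, since the sum grows with the number of pixels.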

[0019] The representative image registration procedure of step 104 is described next. In step 111 of FIG. 1(C), image features are extracted from the representative image to be registered. Various image features can be used, for example the mean luminance of the image or a color histogram.

[0020] A color histogram is used as an example here. Let the RGB values of pixel i of the representative image be R_i, G_i, B_i (0 ≤ R_i, G_i, B_i ≤ 255). Each of the RGB values (R_i, G_i, B_i) originally has 8 bits; each is compressed to 2 bits, and the three are combined into a 6-bit value V_i. The histogram H(v), v = 0, 1, ..., 63, counts the number of pixels for which V_i = v, so H(v) takes a non-negative integer value. In step 112, the representative image data, the features, the time from the start of the video, and so on are registered as one record of the database (DB). When the histogram H(v) is registered as the image feature, it is normalized before registration so that it does not depend on the image size.
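The 6-bit color histogram can be sketched as below. The text does not say how the three 2-bit values are packed into V_i, so the bit order used here (R in the high bits, B in the low bits) is an assumption, as are the function names and the choice of dividing by the pixel count for normalization.

```python
def quantize_rgb(r, g, b):
    """Keep the top 2 bits of each 8-bit channel and pack them into a 6-bit value."""
    return ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6)


def color_histogram(pixels):
    """Return a 64-bin histogram normalized by the pixel count."""
    counts = [0] * 64
    for r, g, b in pixels:
        counts[quantize_rgb(r, g, b)] += 1
    total = len(pixels)
    if total == 0:
        return [0.0] * 64
    # Dividing by the pixel count makes the feature independent of image size.
    return [count / total for count in counts]


# Four pixels: two pure red, one pure blue, one pure green.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
hist = color_histogram(pixels)
```

Here red maps to bin 48, green to bin 12, and blue to bin 3, so half of the histogram mass ends up in bin 48.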

[0021] A concrete example of the DB structure is described with reference to FIG. 2. As shown in FIG. 2, one DB record consists of a record ID number field 21, a field 22 for a pointer to the image data, a video ID field 23, a video section field 24, a flag 25 indicating whether or not the image is a panoramic image, and a camera motion information field 26. In this example, field 22 stores the file name of the image file as the pointer to the image data, but the image data itself may of course be stored instead. Field 23 stores, as the video ID, pointer information to the original video. Field 24 stores the video section to which the representative image corresponds (0 to 10000 denotes the section from 0 to 10000 milliseconds). Field 25 stores a flag representing the type of the representative image. Field 26 is described later.
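One way to picture the record layout of FIG. 2 is as a small data class. This is only a sketch: the class name, attribute names, types, and the sample file names are hypothetical, and only the field numbers 21 to 26 come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RepresentativeImageRecord:
    record_id: int                # field 21: record ID number
    image_ref: str                # field 22: pointer to the image data (here a file name)
    video_id: str                 # field 23: pointer to the original video
    section_ms: Tuple[int, int]   # field 24: video section as (start, end) in milliseconds
    is_panorama: bool             # field 25: type flag of the representative image
    # field 26: camera motion information, empty for a cut point image
    camera_motion: List[Tuple[int, int]] = field(default_factory=list)


# A cut point image covering the first 10 seconds of a video.
cut_record = RepresentativeImageRecord(1, "img0001.png", "video01.mpg", (0, 10000), False)

# A panoramic image with one motion entry per source frame.
pano_record = RepresentativeImageRecord(
    2, "img0002.png", "video01.mpg", (10000, 14000), True,
    camera_motion=[(0, 0), (0, -10), (0, -20), (0, -30)],
)
```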

[0022] In step 105, the camera parameters are estimated. Various methods for estimating camera parameters have been disclosed, and any of them may be used. A simple example uses a translation model as the camera model and searches for its optimal parameters with a coarse-to-fine strategy (Taniguchi, Y. et al., Panorama Excerpts: Extracting and Packing Panoramas for Video Browsing, Proceedings of ACM Multimedia 97, pp. 427-436).
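To make the translation model concrete, the sketch below estimates a single (dx, dy) shift between two frames. Note that it uses a plain exhaustive search over small shifts, not the coarse-to-fine strategy of the cited paper; the error measure (mean absolute luminance difference over the overlap), the sign convention, and all names are assumptions made for illustration.

```python
def mean_abs_diff(prev, cur, dx, dy):
    """Mean absolute difference between cur(x, y) and prev(x + dx, y + dy) over the overlap."""
    h, w = len(prev), len(prev[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            px, py = x + dx, y + dy
            if 0 <= px < w and 0 <= py < h:
                total += abs(cur[y][x] - prev[py][px])
                count += 1
    return total / count if count else float("inf")


def estimate_translation(prev, cur, max_shift):
    """Try every shift in [-max_shift, max_shift]^2 and return the best (dx, dy)."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = mean_abs_diff(prev, cur, dx, dy)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2]


def scene(x, y):
    """Synthetic luminance pattern used only for this example."""
    return 7 * x + 13 * y


# The second frame views the same scene one pixel further to the right.
prev = [[scene(x, y) for x in range(6)] for y in range(6)]
cur = [[scene(x + 1, y) for x in range(6)] for y in range(6)]
shift = estimate_translation(prev, cur, max_shift=2)
```

A coarse-to-fine variant would run the same search on downsampled frames first and then refine the result at full resolution, which keeps the search window small.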

[0023] In step 106, the presence or absence of a camera operation is determined from the camera parameters estimated in step 105. With the translation model, it can be determined that there is no camera operation if all parameter values are 0. For a more rigorous determination that takes noise into account, there is also a method that quantifies how well the camera model fits (Taniguchi, Y. et al., Panorama Excerpts: Extracting and Packing Panoramas for Video Browsing, Proceedings of ACM Multimedia 97, pp. 427-436).
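The zero-parameter test of step 106 is simple enough to state in a few lines. The `tolerance` argument below is an assumption added to show one crude way of absorbing noise; it is not the goodness-of-fit measure of the cited paper.

```python
def has_camera_operation(params, tolerance=0):
    """Return True if any translation parameter exceeds the tolerance in magnitude."""
    return any(abs(p) > tolerance for p in params)


static = has_camera_operation((0, 0))                    # all parameters are zero
tilting = has_camera_operation((0, -3))                  # vertical motion of 3 pixels
jitter = has_camera_operation((1, -1), tolerance=1)      # small motion treated as noise
```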

[0024] If step 107 determines that there is a camera operation, a panoramic image is synthesized in step 108. More specifically, by overlaying the image sequence so as to cancel the camera motion, the images can be combined so that no seams are visible in the background, which finally produces a panoramic image with a wide field of view. The extracted panoramic image is registered as a representative image in step 104' (the processing of FIG. 1(C)).
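Overlaying frames so that the camera motion is cancelled can be sketched as pasting each frame onto a shared canvas at its own offset. The sketch assumes a pure translation model with known per-frame offsets, lets later frames simply overwrite earlier ones where they overlap (no blending), and uses hypothetical names throughout.

```python
def synthesize_panorama(frames, offsets):
    """Paste equally sized frames onto one canvas.

    offsets[t] is the (x, y) position of the upper-left corner of frame t
    in a common coordinate system; values may be negative.
    """
    h, w = len(frames[0]), len(frames[0][0])
    min_x = min(x for x, _ in offsets)
    min_y = min(y for _, y in offsets)
    width = max(x for x, _ in offsets) + w - min_x
    height = max(y for _, y in offsets) + h - min_y
    canvas = [[None] * width for _ in range(height)]  # None marks uncovered pixels
    for frame, (ox, oy) in zip(frames, offsets):
        for y in range(h):
            for x in range(w):
                canvas[oy - min_y + y][ox - min_x + x] = frame[y][x]
    return canvas


# A camera tilting upward by one pixel per frame over a 2x2 view.
frames = [
    [[5, 5], [6, 6]],
    [[4, 4], [5, 5]],
    [[3, 3], [4, 4]],
]
offsets = [(0, 2), (0, 1), (0, 0)]
panorama = synthesize_panorama(frames, offsets)
```

The three 2x2 frames combine into a single 4-row image that covers the whole vertical sweep.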

[0025] Step 103 was described using the image immediately after the cut point as the representative image, but the image a fixed time after the cut may be used instead. Also, rather than registering the cut point image as a representative image immediately, it is preferable, in order to avoid redundant processing, not to register the cut point image as a representative image if a panoramic image is extracted before the next cut point is detected.

[0026] (b) Similar image search step. The portion enclosed by the dotted line in FIG. 1(B) is the similar image search step 11. Step 11 is executed each time the user gives a search request to the system.

[0027] In step 120, the search request from the user is input. A similar image search is considered here: the user gives one image to the system and obtains as output the video sections that contain similar images. Accordingly, image data is read in step 120.

[0028] In step 121, image features are extracted from the image supplied by the user. Which kind of feature to compute depends on the search condition. For example, if images similar in color are wanted, a feature that reflects color, such as a color histogram, is used.

[0029] In step 122, the features recorded in one DB record are matched against the features extracted in step 121, and a distance is computed. If the determination in step 123 finds that the distance is smaller than a tolerance, a similar image is considered found and the corresponding video section is output in step 124. In the color histogram example, let H1(v) be the histogram obtained from the image supplied by the user and H2(v) the histogram obtained from a representative image in the DB. The distance D between the two images can then be computed by the following expression.

[0030] D = Σ |H1(v) - H2(v)| (where Σ denotes the sum over v = 0 to 63). Repeating the processing of steps 122 to 124 for every record stored in the DB (step 125) enumerates the similar video sections.
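The distance D and the loop over all records (steps 122 to 125) can be sketched as follows. Records are modeled as plain dictionaries with hypothetical keys `section` and `hist`; the helper `make_hist` and the tolerance value are likewise illustrative assumptions.

```python
def histogram_distance(h1, h2):
    """L1 distance D: the sum over v of |H1(v) - H2(v)|."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_similar_sections(query_hist, records, tolerance):
    """Return the video sections of all records whose distance is below the tolerance."""
    return [
        rec["section"]
        for rec in records
        if histogram_distance(query_hist, rec["hist"]) < tolerance
    ]


def make_hist(masses):
    """Hypothetical helper: build a 64-bin normalized histogram from {bin: mass}."""
    hist = [0.0] * 64
    for v, mass in masses.items():
        hist[v] = mass
    return hist


records = [
    {"section": (0, 10000), "hist": make_hist({48: 0.5, 3: 0.25, 12: 0.25})},
    {"section": (10000, 14000), "hist": make_hist({0: 1.0})},
]
query = make_hist({48: 0.5, 3: 0.5})
hits = find_similar_sections(query, records, tolerance=0.6)
```

With normalized histograms D ranges from 0 (identical color distribution) to 2 (no bins in common), which gives a natural scale for choosing the tolerance.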

[0031] <Embodiment 2> Next, a second embodiment, corresponding to the invention of claim 2, is described. FIG. 3 shows its processing flow. Most of this flow is the same as the corresponding steps in FIG. 1, so only the newly added steps 301 and 302 are described here.

[0032] Step 301 extracts camera operation information (camera motion information). If the representative image was extracted at a cut point, the camera is assumed to have been fixed. If it is a panoramic image, the camera parameters estimated in step 105 of the representative image extraction step 10 in FIG. 1(A) are converted as described next and inserted into the DB in step 112.

[0033] In FIG. 4, 41 to 44 denote the input image sequence, corresponding to the images at times t = 0, 1, 2, 3 respectively. The panoramic image generated from this sequence is image 45. Let P_t (t = 0, 1, 2, 3) be the point in panoramic image 45 that corresponds to input image I_t (the upper-left corner of I_t). The corresponding points P_t can be computed by accumulating the camera parameters obtained in step 105 (in the case of the translation model). This point sequence P_t is stored as the camera motion information in field 26 of the DB shown in FIG. 2.

[0034] Returning to FIG. 3, the similar image search step 11 is described. In this embodiment, the search request consists of two images IMG1 and IMG2 received from the user and means "retrieve video sections in which an image similar to IMG1 is followed by an image similar to IMG2."

[0035] In step 120 of the similar image search step 11 in FIG. 3, the two images IMG1 and IMG2 are input. In step 121, color histograms H1 and H2 are computed from IMG1 and IMG2 respectively as the image features, in the same way as in Embodiment 1. In step 122, the representative image is matched against IMG1 and IMG2 using the color histograms H1 and H2, and when regions similar to IMG1 and IMG2 are both found, processing moves to step 302. At this point it is known that the video section under consideration contains image regions similar to IMG1 and IMG2.

[0036] Step 302 determines whether the temporal condition "IMG2 appears after IMG1" is satisfied. Take the case in FIG. 4 where the search request is IMG1 = "A" and IMG2 = "E". In step 122, regions 46 and 47, similar to IMG1 ("A") and IMG2 ("E") respectively, are found. By referring to the camera motion information stored in field 26 of the DB, it is found that region 46 appears at time t = 0 and region 47 at time t = 3. The temporal condition is therefore satisfied, and the corresponding video section is output in step 124. Video search that includes temporal conditions thus becomes possible.
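The replacement of a temporal condition by a spatial one can be sketched like this: the position of a matched region inside the panorama, together with the stored points P_t, tells at which time the region first came into view. The rule used here (a region counts as visible once a frame fully contains it), the rectangle format, and the function names are assumptions; the text does not specify them.

```python
def first_visible_time(region, points, frame_w, frame_h):
    """Return the first t whose frame fully contains region = (x, y, w, h), else None."""
    rx, ry, rw, rh = region
    for t, (px, py) in enumerate(points):
        inside_x = px <= rx and rx + rw <= px + frame_w
        inside_y = py <= ry and ry + rh <= py + frame_h
        if inside_x and inside_y:
            return t
    return None


def appears_after(first_region, second_region, points, frame_w, frame_h):
    """Check the temporal condition 'second_region appears after first_region'."""
    t1 = first_visible_time(first_region, points, frame_w, frame_h)
    t2 = first_visible_time(second_region, points, frame_w, frame_h)
    return t1 is not None and t2 is not None and t1 < t2


# A 100x100 frame panning right by 50 pixels per time step (t = 0..3).
points = [(0, 0), (50, 0), (100, 0), (150, 0)]
region_a = (10, 10, 20, 20)    # near the left edge of the panorama
region_e = (220, 10, 20, 20)   # near the right edge of the panorama
ok = appears_after(region_a, region_e, points, 100, 100)
```

In this example the left region is first seen at t = 0 and the right region at t = 3, so the condition holds in that order and fails in the reverse order.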

[0037] <System configuration example> FIG. 5 shows a simple configuration example of a video search system to which the present invention is applied. The processing device 1 comprises a CPU, memory, and so on. It includes a representative image extraction unit 2 and a representative image registration processing unit 3, implemented by software programs or the like that realize the representative image extraction step 10 shown in FIG. 1, and a similar image search unit 4, implemented by a software program or the like that realizes the similar image search step 11 shown in FIGS. 1 and 3. The programs implementing these units can be read from a recording medium such as a compact disc or a floppy disk and installed.

[0038] The representative image extraction unit 2 digitizes a video signal input from a camera, or receives video data 5 recorded on another medium, extracts representative images by the processing described above, and registers the extracted representative images, together with the video data 5, in the video database 6 through the representative image registration processing unit 3.

[0039] When a user terminal 7 connected via a communication line or the like issues a video data search request that specifies the image to be searched for, the similar image search unit 4 retrieves the requested video data by the processing described above, using the representative images registered in the video database 6, and sends the search result to the user terminal 7.

[0040]

[Effects of the invention] According to the present invention, camera operation sections are detected automatically, and a panoramic image is synthesized and used as the representative image. As a result, even for video containing camera operations, a representative image that represents the image content of the entire scene and preserves spatial positional relationships can be extracted. In addition, associating information on the camera operation with the representative image makes temporal search possible.

[Brief description of the drawings]

FIG. 1 is a processing flow diagram of an embodiment of the present invention.

FIG. 2 is a diagram for explaining the structure of the database.

FIG. 3 is a processing flow diagram of the second embodiment of the present invention.

FIG. 4 is a diagram for explaining the camera motion information.

FIG. 5 is a diagram showing a configuration example of a video search system to which the present invention is applied.

FIG. 6 is a schematic diagram for explaining the problems of the prior art.

[Explanation of symbols]

10: representative image extraction step
11: similar image search step
100 to 110, 301: steps of the representative image extraction step
120 to 125, 302: steps of the similar image search step
21 to 26: DB fields
41 to 44: input image sequence
45: panoramic image
46, 47: regions

Claims

[Claims]

1. A video search method comprising a representative image extraction step of extracting a plurality of representative images from a video and a similar image search step of searching for a representative image similar to a reference image, wherein the representative image extraction step includes a step of extracting an image sequence containing a camera operation and a step of synthesizing a panoramic image by combining the extracted image sequence.

2. The video search method according to claim 1, wherein information on camera motion is stored in association with the representative image in the representative image extraction step, and the search is performed in the similar image search step by replacing a temporal search condition with a spatial search condition.

3. A computer-readable recording medium storing a video search program for implementing, on a computer, a video search method comprising a representative image extraction step of extracting a plurality of representative images from a video and a similar image search step of searching for a representative image similar to a reference image, wherein the stored program causes the computer to execute, in the representative image extraction step, processing that includes a step of extracting an image sequence containing a camera operation and a step of synthesizing a panoramic image by combining the extracted image sequence.