JP2004127311A

JP2004127311A - Associative retrieval device of image

Info

Publication number: JP2004127311A
Application number: JP2003384488A
Authority: JP
Inventors: Akio Nagasaka; 長坂　晃朗; Takafumi Miyatake; 宮武　孝文; Hirotada Ueda; 上田　博唯; Kazuaki Tanaka; 田中　和明
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-10-25
Filing date: 2003-11-14
Publication date: 2004-04-22
Anticipated expiration: 2020-11-02
Also published as: JP3711993B2

Abstract

PROBLEM TO BE SOLVED: To provide an associative retrieval device of an image and its method allowing a user to freely and easily execute associative retrieval of the image.

SOLUTION: This associative retrieval device is provided with: an image display means having an image display region for viewing the image and an index display region for displaying index information; a point detection means for detecting point positions for the display regions; an object control means for previously registering matter or sound in a moving image appearing in the image; and a control means for defining the state of the image to be subsequently reproduced from the point information obtained from the point detection means and logical structure description of a certain image separately stored. Even if a desired scene cannot be identified by the index information, multilateral image retrieval associatively using the respective displays can be carried out such that the scene is finally obtained by finding a scene where a clue as to the scene appears and by associatively retrieving the desired scene based on it.

COPYRIGHT: (C)2004,JPO

Description

The present invention relates to an apparatus and a method for associatively searching video to find any desired video.

In recent years, as computers have become faster and their storage capacity larger, databases of video information such as movies and videos, which previously could not be handled, have been actively constructed. Along with this, search techniques for efficiently selecting a desired scene from a large amount of accumulated video are being put to practical use. In the field of video databases, the common search method is for the user to specify features or keywords of the desired scene and for the computer to find scenes tagged with matching keywords. However, accurately specifying the features of a scene is very difficult not only for users unfamiliar with searching but also for experts, and the desired search result often cannot be obtained.

Books, a classic information medium, provide a table of contents and an index as aids to searching. The table of contents lists keywords that symbolize each unit of the text in the order in which they appear. The index lists important keywords from the text in an easy-to-find order, such as the order of the Japanese syllabary. The most important feature common to both is that the keywords are displayed as a list. Moreover, they are always located at the beginning or end of the book, so no effort is needed to find them. By using the table of contents and the index, readers can locate a passage in the text without having to think of keywords themselves. In addition, looking at the table of contents gives an overview of the text and quickly shows whether the book is worth reading.

Searching by table of contents or index also has problems: if too many keywords are displayed, it is hard to find the appropriate part, and conversely, if there are too few, an appropriate keyword may not exist in the first place. These problems, however, can be solved by combining the search with hypertext and full-text search. That is, the number of items in the table of contents and index is first limited to some extent and presented to the user. From among them, the user consults the text using a second-best keyword that seems related to the target passage, and looks in that text for a keyword that seems directly related to the passage in mind. If one is found, the goal is reached by using the hypertext mechanism to refer to the intended passage. This is a technique routinely used when searching online manuals. Hypertext requires keywords to be registered in advance, but with full-text search the same can be done for unregistered keywords. In this way, a mechanism for following keywords associatively widens the range over which the table of contents and index can be used, and in many cases the target passage can be found merely by selecting among the keywords that appear before the user's eyes (hereinafter called associative search).

Such a mechanism is also considered effective for video search. In video, the various things appearing in it, such as people and objects, can be used as the counterpart of the keywords above. As an elemental technology for realizing associative search using them, JP-A-3-52070, "Moving Image Related Information Reference Method", for example, describes a method of referring to related scenes and information from things on the video display screen. According to it, by providing means for storing the image section and position in which each thing in the video appears, and means for linking them to the corresponding related information, the user can easily jump to a related scene or call up related information by pointing with a mouse or the like at a point on the screen where the thing is displayed. Further, JP-A-5-204990 by the present inventors, for example, describes means that use image processing to reduce the labor of associating each thing with its related information.

JP-A-5-204990

The prior art cited above consists solely of means for associating each thing with its related information; the configuration of the retrieval system as a whole and its usability for the user cannot be said to have been sufficiently studied. In addition, there is the problem that only things already associated with related information can be followed associatively.

An object of the present invention is to provide an interface with which, when searching video, the user can find a desired scene while following his or her memory associatively, merely by selecting among the limited information presented by the computer.

A second object of the present invention is to provide means by which even things that have not been associated in advance can be followed associatively.

The present invention comprises, on a screen, a video display area for displaying arbitrary video, an operation panel area for controlling the playback state of the video, and an area for displaying index information corresponding to a table of contents or index of the video; means for detecting which of these display areas has been pointed at; and means for determining the state of the video to be played next from this point information and separately stored descriptive information of the video. It further provides means for grasping the things being displayed and their positions and displaying related information superimposed on them, and related information registration and change means. It also provides means for registering and managing the information needed for these processes.

Furthermore, there are provided means for designating a specific thing appearing in the scene being displayed, means for extracting feature quantities of that thing, means for finding other video scenes having matching feature quantities, and means for jumping immediately to the video scene found.

According to the present invention, when searching for a desired scene, even if the index information of the video contains no information on things directly related to the desired scene, the user can reach the target scene by following, one after another, things that are associated with the desired scene. By organically combining index display and associative search in this way, the range over which the index can be used widens greatly, and in many cases the target scene can be found merely by selecting among the information presented by the computer. It is therefore unnecessary to think of or specify an appropriate keyword or image feature quantity that uniquely determines the target scene; a search can be performed even from a vague memory, and the system is easy for beginners to understand. In addition, the related information superimposing means displays some or all of the selected related information of a thing appearing in the video being played, either superimposed at the position of the thing in the played video or in a form that makes explicit that the thing and its related information correspond. Information about a thing that appears during an associative search can therefore be known immediately and accurately, without confusion as to which thing the information belongs to. Further, by providing the related information registration and change means, some or all of the related information of a thing appearing in the video being played can be registered or changed immediately, on the spot where the thing appears.

Also, even when a thing currently displayed on the screen has not yet been given information for jumping to a related scene, the means for extracting feature quantities of the thing from the display screen and collating them makes it possible to search for and display, on the spot, another scene in which the thing appears.

According to the present invention, when searching for a desired scene, even if it cannot be fully identified from the index information, as long as a scene containing some clue related to it is found, the desired scene can finally be obtained by associatively searching for scenes in which that clue appears; a multifaceted video search that uses the respective displays in combination is thus possible. Information about a thing in the video being played can be known immediately and accurately, without confusion as to which thing it belongs to. Some or all of the related information of a thing appearing in the video being played can be changed immediately, on the spot where the thing appears. With the monitor window of the present invention, the position of the scene being played within the whole video can be monitored at all times, and even when the scene jumps as a result of an associative search, this is made explicit, together with special effects such as a wipe, so that it is not confused with an ordinary scene change. Pointing at the display area of superimposed related information has the same effect as pointing at the thing itself, so the more convenient way of pointing can be chosen for each scene, which improves operability. Presenting the related information to be displayed as a list saves the trouble of entering a key directly, and even a forgotten key can be recalled by looking at the menu. As described above, the present invention realizes an easy-to-use associative search.

An embodiment of the present invention is described in detail below.

FIG. 2 is a schematic block diagram of an example system configuration for realizing the present invention. Reference numeral 1 denotes a display device such as a CRT, which displays the output screen of the computer 4. Reference numeral 12 denotes a speaker for reproducing sound. Commands to the computer 4 can be given using an indirect pointing device 5 such as a mouse, a direct pointing device 13 such as a touch panel, or a keyboard 11. The video playback device 10 is a device for playing back video, such as an optical disc player or a video deck. The video signal output from the video playback device 10 is successively converted by the video input device 3 into a format that the computer 4 can handle, and is sent to the computer 4. Inside the computer, the video data enters the memory 9 via the interface 8 and is processed by the CPU 7 according to a program stored in the memory 9. Each frame of the video handled by the video playback device 10 is assigned a number in order from the beginning of the video, for example a frame number. When a frame number is sent from the computer 4 to the video playback device 10 over the control line 2, the video of the scene corresponding to that frame number is played. Video data and various information can also be stored in the external information storage device 6. In addition to the program, the memory 9 stores various data created by the processing described below, which are referred to as needed.

In the following, an overview of the associative search system is given first, followed by the detailed execution procedure of each technique.

FIG. 1 shows an example screen of a system that realizes associative search of video. Reference numeral 1 denotes the display device, 12 a speaker that outputs voice, background music (BGM), and the like, 5 an indirect pointing device such as a mouse or joystick, 11 a keyboard, and 13 a direct pointing device such as a touch panel.

The monitor window 1100 in the display device 1 serves as a monitor screen and has an operation panel 1102 in the same style as a VCR, so that video can be freely played and viewed. The video displayed on the monitor screen corresponds to the "body text" of a "book", and panel (button) operations correspond to "turning pages". The lower-right window 1108 is a scene list display of representative images of the scenes of the target video, and the middle-right window 1112 is a list display of the subjects appearing in the video. These list displays are collectively called the "index". The scene list display in window 1108 is made by selecting a typical frame image from each scene in the video, reducing it, and arranging the results in chronological order as icons 1110. These images can be regarded as the "headings" of the scenes, and the scene list in which they are arranged in time series corresponds to the "table of contents" of a "book". A subject, on the other hand, is one of the important constituents of a scene and in that sense corresponds to a "keyword" in text. The subject list display in window 1112 therefore corresponds to the "index" of a book. When an icon 1110 in the scene list display is clicked with the mouse, the video on the monitor screen switches and the scene indicated by that icon is played. The subject list display consists of icons 1114, each indicating what the subject is, and a time axis display section (bar graph) 1116 to the right of each icon. The time axis display section (bar graph) is a time axis whose left end represents the beginning of the video and whose right end represents its end; the portions displayed as bars indicate the time intervals in which the subject appears. Clicking a bar displays the video of that interval on the monitor screen. Reference numeral 1104 denotes a cursor that moves with the motion of a pointing device such as a mouse, and window 1106 is a general-purpose input/output window that displays various related information of the video.
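The behavior of the time axis display section 1116 can be illustrated with a minimal sketch. It assumes the bar is `bar_width` pixels wide and that each appearance interval is stored as a pair of frame numbers; the names `Interval`, `pixel_frame_range`, and `clicked_interval` are hypothetical and do not come from the patent. Because one pixel usually covers many frames, the sketch tests whether the clicked pixel overlaps an interval rather than mapping the click to a single frame.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interval:
    start: int  # first frame number of the appearance interval
    end: int    # last frame number (inclusive)


def pixel_frame_range(x: int, bar_width: int, total_frames: int) -> Tuple[int, int]:
    """Return the (first, last) frame numbers covered by pixel column x.

    The left end of the bar is the start of the video and the right end
    is its end, so frames are spread linearly over the bar width.
    """
    if not 0 <= x < bar_width:
        raise ValueError("x is outside the bar")
    first = x * total_frames // bar_width
    last = max(first, (x + 1) * total_frames // bar_width - 1)
    return first, last


def clicked_interval(x: int, bar_width: int, total_frames: int,
                     intervals: List[Interval]) -> Optional[Interval]:
    """Return the appearance interval under the clicked pixel, if any."""
    first, last = pixel_frame_range(x, bar_width, total_frames)
    for iv in intervals:
        if iv.start <= last and iv.end >= first:  # the two ranges overlap
            return iv
    return None


# One hour of video at 30 frames per second, shown on a 400-pixel bar.
TOTAL_FRAMES = 60 * 60 * 30
subject_a = [Interval(0, 899), Interval(54000, 54300)]
hit = clicked_interval(200, 400, TOTAL_FRAMES, subject_a)
```

In this sketch a click that lands on a bar would cause playback to start at `hit.start`; a click between bars returns `None` and leaves playback unchanged.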

Next, the basic idea of the associative search according to the present invention is explained with a simple example. Suppose a user wants to find a particular scene in which subject B appears in a series of video. If, by good luck, the target scene (a scene in which subject B appears) or an icon of subject B itself is present in the scene list display of representative images or the subject list display shown in the index, the goal is achieved by clicking it directly and playing it. Usually, however, the amount of video information is enormous, and the target scene often cannot be found easily (for example, it is easy to see that the search is not simple if subject B appears only briefly in the video). This is where the associative search of the present invention becomes important. Even when the target scene (subject B) cannot be found directly, the user often has some knowledge about it, and the present invention uses that knowledge to create a link called association. For example, if the user remembers that subject B and subject A appeared together (that there must have been such a scene), or expects that they are likely to appear together, the user first tries to search for subject A.

FIG. 3 illustrates the associative video search function of the present invention. The three pictures in the figure (scenes 1 to 3) are illustrations, each representing one scene of the video displayed on the monitor screen during an associative search. For example, from the index (the subject icons 1114 in window 1112), the user finds one scene showing subject A, from which the target subject B can be associated, and displays it on the monitor screen. While scene 1, at the far left, is being played on the monitor screen of the monitor window 1100, clicking subject A (of the subjects A and C appearing there) with the mouse cursor switches the screen to scene 2, in the center of the figure, in which subject A appears. Clicking another subject B that appears together with A in scene 2 leads to scene 3, on the right of the figure, in which B appears. If this is the target scene, the associative search ends.

In other words, when looking for a particular scene in which subject B appears, the user can, based on the association that subject B appears together with subject A, follow an associative path through subject A, which is registered in the index, to the target scene of subject B. No troublesome operation such as thinking of keywords is required; the user only looks at the information appearing on the screen and makes selections.

As described later, the search is not limited to associations between subjects; it can use associations based on any multimedia information in the video, such as the scene itself, spoken words, BGM, and subtitles.

The information needed to realize such an associative search function is basically of three kinds: (1) the video interval in which a subject appears (appearance interval), (2) the position of the subject on the screen (appearance position), and (3) the other video interval to switch to when the subject is clicked (link information). These three kinds of information are handled as a set.

Which subject was clicked during video playback is determined from the appearance interval and appearance position information of (1) and (2), and the video to switch to is determined from the link information of (3) stored in the same set. Here, video is realized by continuously displaying still images called frames at a rate of 30 per second. If these frames are assigned consecutive numbers, called frame numbers, in order from the beginning of the video, the appearance interval of (1) can be expressed by the first and last frame numbers of the interval. The appearance position of (2) is coordinate information indicating in which region of each frame in the interval of (1) the subject is shown. As the link information of (3), links are set so that the other scenes in which the same subject appears can be visited one after another. The same subject often appears many times in a single video, and with these links every scene in which the subject appears can be called up easily just by clicking.
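The way the three kinds of information cooperate can be sketched as follows. This is only an illustration under simplifying assumptions: the appearance position is reduced to a single bounding box that stays fixed over the whole interval (the text above allows a region per frame), the link is stored as a target frame number, and the names `Entry`, `find_clicked`, and `frame_after_click` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    subject: str
    first_frame: int                  # (1) appearance interval, first frame
    last_frame: int                   # (1) appearance interval, last frame
    box: Tuple[int, int, int, int]    # (2) left, top, right, bottom (assumed fixed)
    link_frame: Optional[int]         # (3) frame to jump to, or None if no link is set


def find_clicked(entries: List[Entry], frame: int, x: int, y: int) -> Optional[Entry]:
    """Return the entry whose interval contains `frame` and whose box contains (x, y)."""
    for e in entries:
        if e.first_frame <= frame <= e.last_frame:
            left, top, right, bottom = e.box
            if left <= x <= right and top <= y <= bottom:
                return e
    return None


def frame_after_click(entries: List[Entry], frame: int, x: int, y: int) -> int:
    """Return the frame to play next; stay on the current frame if nothing linked was hit."""
    hit = find_clicked(entries, frame, x, y)
    if hit is None or hit.link_frame is None:
        return frame
    return hit.link_frame


# Subject A appears twice and the two appearances link to each other;
# subject C appears once and has no link yet.
entries = [
    Entry("A", 0, 299, (10, 10, 100, 200), 1500),
    Entry("C", 0, 299, (200, 10, 300, 200), None),
    Entry("A", 1500, 1800, (50, 50, 150, 250), 0),
]
```

Clicking subject A at frame 120 moves playback to frame 1500, and clicking A again there returns to frame 0, so repeated clicks cycle through the scenes in which A appears.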

The associative search method with the configuration described above can be used only for subjects for which link information has already been set. However, of the three kinds of information needed for associative search, the appearance interval and appearance position of a subject can be obtained by an automatic subject search technique such as that of Japanese Patent Application No. 4-261033 by the present inventors.

FIG. 4 outlines the automatic subject search algorithm. The basic idea is to find, within a frame, the combination of colors unique to the subject being sought. First, the user selects from the video just one frame in which the subject appears as an example image, and characteristic colors are extracted from that image. The system then divides every frame in the video, one by one, into small blocks and looks for blocks containing the characteristic colors. If a frame contains at least a certain number of blocks containing the characteristic color for each color, the subject is judged to be present in that frame. The appearance position of the subject in the frame is easily obtained by examining, during the subject search processing above, where in the frame the blocks containing the subject's characteristic colors are distributed.
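The block-based test described above can be sketched in a few lines. The sketch assumes that each frame has already been reduced to a grid of quantized color labels (small integers); how colors are quantized and how the characteristic colors are extracted from the example image are not specified here and are left out. The names `block_color_sets`, `find_subject`, `block`, and `min_blocks` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Frame = List[List[int]]                    # rows of quantized color labels (assumption)
BlockSets = Dict[Tuple[int, int], Set[int]]


def block_color_sets(frame: Frame, block: int) -> BlockSets:
    """Divide a frame into block x block squares and collect the colors in each square."""
    sets: BlockSets = {}
    for y, row in enumerate(frame):
        for x, color in enumerate(row):
            sets.setdefault((y // block, x // block), set()).add(color)
    return sets


def find_subject(sets: BlockSets, feature_colors: Set[int],
                 min_blocks: int = 1) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (top, left, bottom, right), in block units, of the
    blocks containing characteristic colors, or None if the subject is judged absent.

    The subject is judged present only if every characteristic color occurs
    in at least `min_blocks` blocks of the frame.
    """
    if not feature_colors or min_blocks < 1:
        raise ValueError("need at least one characteristic color and min_blocks >= 1")
    hits = {c: [pos for pos, colors in sets.items() if c in colors]
            for c in feature_colors}
    if any(len(positions) < min_blocks for positions in hits.values()):
        return None
    rows = [p[0] for ps in hits.values() for p in ps]
    cols = [p[1] for ps in hits.values() for p in ps]
    return min(rows), min(cols), max(rows), max(cols)


# A 4x4 frame split into 2x2 blocks; colors 1 and 2 characterize the subject.
frame = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 0, 2, 2],
    [0, 0, 0, 0],
]
sets = block_color_sets(frame, block=2)
position = find_subject(sets, {1, 2}, min_blocks=1)
```

The bounding box of the matching blocks stands in for the appearance position; a real implementation might instead keep the full distribution of matching blocks.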

However, this subject search method in itself requires, as a rule, that an example image be presented to the system, so at least one interval in which the subject appears must be found manually, which is often troublesome. In an associative search such as that of the present invention, however, the subject on the monitor screen can be used immediately as the example image, so the subject search method can be exploited very effectively.

Furthermore, if every frame in the video is divided into blocks in advance and a list of the kinds of colors contained in each block is stored in a storage device, the per-frame block division becomes unnecessary during the subject search, which becomes very fast. Even with the performance of a current workstation, a speed of 100 times real time is possible, so all appearance intervals of a subject can be found in a one-hour video in about 30 seconds. If only the single appearance interval nearest to the currently displayed video needs to be found, it can be found in a few seconds on average. Naturally, the same stored color lists can be used regardless of the subject being searched for.
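A minimal sketch of searching with such a precomputed store is given below. It assumes the store is a list with one entry per frame, each entry being a list of color sets, one set per block; this layout and the names `frame_has_subject` and `appearance_intervals` are hypothetical. The point illustrated is that the store is built once and then scanned for any subject, and that consecutive matching frames are merged into appearance intervals.

```python
from typing import List, Set, Tuple

FrameIndex = List[Set[int]]   # one set of quantized colors per block (assumed layout)


def frame_has_subject(blocks: FrameIndex, feature_colors: Set[int],
                      min_blocks: int) -> bool:
    """True if every characteristic color occurs in at least `min_blocks` blocks."""
    return all(sum(1 for colors in blocks if c in colors) >= min_blocks
               for c in feature_colors)


def appearance_intervals(index: List[FrameIndex], feature_colors: Set[int],
                         min_blocks: int = 1) -> List[Tuple[int, int]]:
    """Scan the precomputed index and return (first_frame, last_frame) pairs."""
    if not feature_colors:
        raise ValueError("need at least one characteristic color")
    intervals: List[Tuple[int, int]] = []
    start = None
    for n, blocks in enumerate(index):
        present = frame_has_subject(blocks, feature_colors, min_blocks)
        if present and start is None:
            start = n                          # an interval opens at this frame
        elif not present and start is not None:
            intervals.append((start, n - 1))   # the interval closed at the previous frame
            start = None
    if start is not None:
        intervals.append((start, len(index) - 1))
    return intervals


# Four frames, two blocks each. The same index serves two different subjects.
index = [
    [{1}, {2}],
    [{1, 2}, {0}],
    [{0}, {0}],
    [{1}, {2, 3}],
]
subject_one = appearance_intervals(index, {1, 2})
subject_two = appearance_intervals(index, {3})
```

Finding only the interval nearest to the current frame, as mentioned above, could be done by scanning outward from the current frame number and stopping at the first match.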

In the following, the execution procedure of a system realizing the present invention is described using block diagrams of the software modules executed by the CPU 7 according to the program stored in the memory 9. Each module described here can also be implemented in hardware.

FIG. 5 is an example of a processing block diagram for realizing associative scene search according to the present invention. Information on when and where in the video the things that serve as clues for associative search, such as subjects, appear (appearance interval and appearance position), together with their related information and information on the scenes to jump to (link information), is assumed to be stored in advance in the memory 9 or the external information storage device 6 of FIG. 2, in the form of data structures called objects, described later.

Here, information on things in the video, such as subjects, is managed in object-oriented data structures, one of which is created for each appearance interval. FIG. 6 is an explanatory diagram of the concept. Such a structure is hereinafter called a video object, or simply an object. Video is divided into a moving-image part and an audio part. The moving image as a whole can be represented in a three-dimensional space formed by the xy plane of the frame image and the time axis t, and the appearance interval and appearance position of a subject can be regarded as a subspace of it. A video object is defined as a data structure associated one-to-one with this subspace. (That is, even for the same subject, a separate video object is in principle defined for each appearance interval, and links are set between those video objects, i.e., between the occurrences of the subject.) Besides subjects, various kinds of information in the video, such as subtitles and scenes, can be associated with video objects. For audio information such as spoken words and BGM, a video object can likewise be defined as a data structure associated one-to-one with an arbitrary partial interval of an audio information space that has a time axis. Link information is stored as pointers by which video objects refer to one another. By managing information in a common data-structure framework even when the underlying media differ, as between moving image and audio, links can be set freely between any pieces of information in the video.
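One possible shape for such a video object is sketched below. The field names, the `media` tag, and the choice to chain the appearances of one subject into a ring ordered by time are assumptions made for illustration; the description only requires that objects hold pointers to one another and that moving-image and audio objects share one framework.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so cyclic links cannot cause recursion
class VideoObject:
    name: str                     # subject, subtitle, scene, word, BGM, ...
    media: str                    # "video" or "audio" (hypothetical tag)
    first_frame: int              # start of the interval on the time axis
    last_frame: int               # end of the interval (inclusive)
    region: Optional[Tuple[int, int, int, int]] = None  # xy extent; None for audio
    link: Optional["VideoObject"] = field(default=None, repr=False)


def link_in_ring(objects: List[VideoObject]) -> List[VideoObject]:
    """Link the appearances of one subject in time order, the last back to the first.

    A single object ends up linked to itself.
    """
    ordered = sorted(objects, key=lambda o: o.first_frame)
    for current, following in zip(ordered, ordered[1:] + ordered[:1]):
        current.link = following
    return ordered


# Three appearances of subject A, created out of order, plus a BGM passage.
a2 = VideoObject("A", "video", 1500, 1800, (50, 50, 150, 250))
a1 = VideoObject("A", "video", 0, 299, (10, 10, 100, 200))
a3 = VideoObject("A", "video", 9000, 9200, (0, 0, 80, 120))
link_in_ring([a2, a1, a3])

# An audio object uses the same structure and may point at a moving-image object.
bgm = VideoObject("theme", "audio", 100, 400)
bgm.link = a2
```

Because audio and moving-image objects share one type, a link from a BGM passage to a scene is stored in exactly the same way as a link between two appearances of a subject.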

Returning to FIG. 5, the processing block diagram is now described in detail. The object management unit 120 is the module that manages these objects. It registers, modifies, and deletes objects and, when requested by another module, retrieves the information 122 of objects that satisfy the given conditions and presents it to that module. The video playback display unit 100 plays back and displays video in the monitor window 1100, which is the monitor screen on the display device 1 of FIG. 1, and sends the playback position information 218 of the currently displayed video to the point object identification unit 102. The point position detection unit 104 constantly monitors the indirect pointing device 5 of FIG. 1, such as a mouse, and the direct pointing device 13, such as a touch panel, and sends the position information 112 of the on-screen location at which the user performed a pointing action to the point object identification unit 102. The position information 112 is also sent to the index management unit 108 and the operation panel unit 110. The point object identification unit 102 forwards the playback position information 218 received from the video playback display unit 100 to the object management unit 120 and obtains, as objects, the information on all things registered as appearing at that playback position. If there are such objects, it further obtains the position information of each thing from them, compares it with the position information 112 from the point position detection unit 104, and identifies which thing was pointed at. The point object identification unit 102 sends the information 114 on the identified thing to the video control unit 106. Based on the link information contained in the information 114, the video control unit 106 sends control information 208 to the video playback display unit 100 in order to perform processing such as jumping to another scene in which the thing appears. As described later, it also sends control information 210 to the video playback display unit 100 when related information on a thing is to be displayed. The index management unit 108 stores representative frame images of the registered videos as icons 1110 and displays a list of those icons in the window 1108. The index management unit 108 stores the frame number together with each icon 1110; when the point position detection unit 104 detects that an icon has been pointed at, it passes control information 116 to the video control unit 106 so that the scene corresponding to that icon is played back. It also receives from the point object identification unit 102 the information 124 indicating which thing was pointed at, and updates the display so that the index also shows that the thing was pointed at. The index management unit 108 further manages the subject list display in the window 1112 of FIG. 1. That is, it displays an icon 1114 indicating what the subject is together with a time-axis (bar graph) display for that subject, and when part of a bar is clicked it sends control information 116 to the video control unit 106 so that the video of that section is played back. The operation panel unit 110 displays the operation panel 1102 of FIG. 1, which represents playback states such as play, fast-forward, and rewind; when the point position detection unit 104 detects that the operation panel has been pointed at, the operation panel unit sends control information 118 to the video control unit 106 so as to enter the playback state corresponding to the part of the panel that was pointed at.

FIG. 7 is an example of a processing block diagram showing the video playback display unit 100 in more detail. The video playback unit 200 plays back video according to the control information 208 sent from the video control unit 106, which specifies which video to play, from where, and how. The playback position information 212 of the currently displayed video is sent continuously to the thing presence determination unit 202. Unit 202 checks whether any pre-registered thing appears in the video at that playback position and, if so, obtains the on-screen position information 216 of every thing that appears and sends it to the related information display unit 204. This position information 216 is the same as the position information obtained by the point object identification unit 102 described above, so to avoid duplicating the position acquisition processing it can be sent to the point object identification unit 102 as thing information 218. The related information display unit 204 can display the related information of each thing being played back together with the video on the screen. The control information 210 determines whether related information is displayed and, if so, which related information is displayed and in what form. In particular, the position information 216 of a thing makes it possible to show explicitly which position in the displayed picture the information corresponds to. This display method is described later. Depending on the display method, the related information is superimposed on the video 214, and the resulting video is displayed by the video display unit 206.

FIG. 8 shows an example of the data structure of a video object. Reference numeral 500 denotes the data structure as a whole. 502 is the ID number of the object, a unique number that distinguishes it from other objects. 504 is a classification code indicating whether the object represents, for example, a person, a caption, or audio. 506 is a pointer to the video in which the object appears. In this example, as described later, video has a two-layer data structure consisting of a physical video 600 and a logical video 900, and 506 is a pointer to the physical video. 510 is the frame number of the start of the section in which the thing represented by the object appears, and 512 is the frame number of its end. 508 is the frame number of an image representative of the thing; under an interface that handles objects visually, it is used as the picture of the icon representing the thing. 514 is a pointer to a structure 700 that indicates the on-screen position of the thing represented by the object.

FIG. 9 shows an example of the object position structure 700. One such structure is created for each section in which the thing does not move or moves only slightly, and the structures are chained together in a linked list. 702 is the start frame number of such a motionless section, and 704 is its end frame number. 706 to 712 are the origin coordinates and the size of the rectangle that encloses the thing. 516 is a pointer to a higher-level, more abstract object. Every object can have its own related information, but it is sometimes more convenient for several objects to share related information. For example, subjects such as people and objects often appear in more than one scene. Their appearance and behavior naturally differ from scene to scene, so there is related information specific to each scene; however, highly abstract information such as name, sex, age, and occupation is better shared, because this reduces the amount of data and avoids inconsistencies when the information is updated. For this reason, such highly abstract information is held as related information of a higher-level object, and 516 holds a pointer to that object. 518 is a pointer by which a higher-level object refers to its lower-level objects. It is present because higher-level and lower-level objects use the same data structure 500. Of course, a higher-level object does not need information tied directly to the video, such as start and end frames or position information, so a simplified structure that omits those fields can also be used.
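The structures of FIGS. 8 and 9 can be sketched roughly as follows. This is only an illustrative Python rendering and not code from the specification: the class and field names are invented here, Python lists stand in for the linked lists, and object references stand in for the pointers. The pointer 506 to the physical video and the dictionary pointer 520 are left out for brevity. Comments tie each field to its reference numeral.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectPosition:
    """One entry of the position structure 700 (hypothetical field names)."""
    start_frame: int   # 702: first frame of the motionless section
    end_frame: int     # 704: last frame of the motionless section
    x: int             # 706-712: origin and size of the enclosing rectangle
    y: int
    width: int
    height: int


@dataclass
class VideoObject:
    """Video object 500 (hypothetical field names)."""
    object_id: int                                    # 502
    category: str                                     # 504: "person", "caption", "audio", ...
    start_frame: int = 0                              # 510
    end_frame: int = 0                                # 512
    representative_frame: int = 0                     # 508
    positions: List[ObjectPosition] = field(default_factory=list)  # 514
    upper: Optional["VideoObject"] = None             # 516
    lowers: List["VideoObject"] = field(default_factory=list)      # 518

    def rect_at(self, frame: int) -> Optional[Tuple[int, int, int, int]]:
        """Return (x, y, width, height) of the thing at `frame`, or None."""
        if not (self.start_frame <= frame <= self.end_frame):
            return None
        for p in self.positions:
            if p.start_frame <= frame <= p.end_frame:
                return (p.x, p.y, p.width, p.height)
        return None


# An upper object for a subject, and one appearance in frames 100-199
# during which the subject moves once, at frame 150.
taro = VideoObject(object_id=1, category="person")
appearance = VideoObject(
    object_id=2, category="person", start_frame=100, end_frame=199,
    representative_frame=120, upper=taro,
    positions=[ObjectPosition(100, 149, 10, 20, 50, 80),
               ObjectPosition(150, 199, 60, 20, 50, 80)],
)
taro.lowers.append(appearance)
```

Here the upper object carries no frame or position data, which corresponds to the simplified variant mentioned in the text.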

520 is a pointer to a dictionary 800 that stores the related information of the thing. As shown in FIG. 10, a dictionary entry consists of a key 802, which is a pointer to a character string 804 used as the key for retrieving related information; a content 806, which is a pointer to a character string 808 of related information registered under that key string; and a link 810, which holds a pointer to a related object. One entry is created for each item of related information to be registered, and the entries are chained together in a linked list. Related information of an object is read by specifying a key and returning the content of the dictionary structure whose key matches. For example, if the key is "name" and the content is "Taro", specifying the key "name" yields the related information "Taro". In the related information display unit 204, selecting which related information to display therefore reduces to choosing which key's content to display. The link is a pointer to the thing that is the jump destination in an associative search. In that case the content 806 holds a character string or symbol expressing the meaning of the link, for example "the same subject appearing in another scene", and the link destination 810 holds a pointer to the object of that subject. When jumping in an associative search, the video control unit 106 reads from that object's structure the video in which the subject appears and its start frame number, and controls the video playback unit 200 so that playback starts from that position.
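As a rough illustration of the key lookup over the chained dictionary entries, the following Python sketch may help. The names `DictEntry`, `find_entry`, and `lookup`, and the key string "jump" used for the associative-search link, are assumptions made for this example; the specification does not name them.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DictEntry:
    """One entry of the dictionary 800 (hypothetical field names)."""
    key: str                            # 802 -> key string 804
    content: str                        # 806 -> content string 808
    link: Any = None                    # 810: related object, if any
    next: Optional["DictEntry"] = None  # next entry in the chain


def find_entry(head: Optional[DictEntry], key: str) -> Optional[DictEntry]:
    """Walk the chain and return the first entry whose key matches."""
    entry = head
    while entry is not None:
        if entry.key == key:
            return entry
        entry = entry.next
    return None


def lookup(head: Optional[DictEntry], key: str) -> Optional[str]:
    """Return the content registered under `key`, or None if absent."""
    entry = find_entry(head, key)
    return entry.content if entry else None


# "jump" is an assumed key name; the string "object#7" stands in for a
# pointer to the destination object.
jump = DictEntry("jump", "same subject in another scene", link="object#7")
info = DictEntry("name", "Taro", next=jump)
```

With these two entries, `lookup(info, "name")` returns the content for the name key, and `find_entry(info, "jump").link` gives the jump destination.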

FIG. 11 is a more detailed processing block diagram of the video playback unit 200. Video has a two-layer structure of logical video and physical video. A logical video holds only structural information, as a collection of scenes, whereas a physical video holds the actual video data. Based on the playback position setting information 310 sent from the video control unit, the logical video calling unit 300 retrieves the matching logical video from the logical video library 304.

FIG. 12 shows an example of the logical video data structure 900. 902 is an ID number that uniquely identifies the logical video. 904 is the number of the scene that represents the logical video. 906 is a linked list of the constituent scenes, in which the scenes 1000 are chained in the order in which they are to be played. 908 is setting information for special effects between scenes, such as dissolves and wipes. 910 holds various related information.

FIG. 13 shows an example of the scene structure 1000. 1002 is the representative frame number of the scene, 1004 is its start frame number, and 1006 is its end frame number. 1008 holds a pointer to the corresponding physical video. 1010 holds, as a linked list, pointers to the data structures (that is, the objects) of all things that appear in the scene. Scenes can be grouped according to the continuity of their content and managed hierarchically in a pyramid. The upper scene 1012 is a pointer to the scene one level up, and the lower scene 1014 is a pointer to a linked list of all scenes one level down. 1016 is the attribute information of the scene. The physical video calling unit 302 determines which physical video to retrieve from the physical video library 308 and which frame position to play, based on the information 312, which is the frame number with scene information added by the logical video calling unit 300.
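The pyramid of scenes can be pictured with a small Python sketch. It is a simplification under stated assumptions: the names are invented, lists replace the linked lists, and all scenes are assumed to use frame numbers on one common timeline, which the specification does not require. The helper `deepest_scene_at` is hypothetical and simply descends from a top-level scene to the lowest-level scene containing a given frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    """Scene structure 1000 (hypothetical field names)."""
    name: str
    start_frame: int                                     # 1004
    end_frame: int                                       # 1006
    upper: Optional["Scene"] = None                      # 1012
    lowers: List["Scene"] = field(default_factory=list)  # 1014

    def add_lower(self, child: "Scene") -> "Scene":
        """Register `child` one level below this scene and return it."""
        child.upper = self
        self.lowers.append(child)
        return child


def deepest_scene_at(scene: Scene, frame: int) -> Optional[Scene]:
    """Descend the pyramid to the lowest-level scene that contains `frame`."""
    if not (scene.start_frame <= frame <= scene.end_frame):
        return None
    for child in scene.lowers:
        hit = deepest_scene_at(child, frame)
        if hit is not None:
            return hit
    return scene


top = Scene("whole video", 0, 299)
part1 = top.add_lower(Scene("part 1", 0, 149))
part2 = top.add_lower(Scene("part 2", 150, 299))
cut = part1.add_lower(Scene("cut 1-2", 50, 149))
```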

FIG. 14 shows an example of the physical video structure 600. 602 is an ID number that uniquely identifies the physical video. 604 is a classification code that identifies whether the video is on a laser disc, on video tape, or stored as data in an external information storage device. 606 is the representative frame number, 608 is the start frame number, and 610 is the end frame number. 616 holds attribute information. The remaining fields are needed when the video data is held inside the physical video data structure itself. 612 is the frame width of the video and 614 is its height. 618 is a directory that records the address within the physical video at which the frame image data for a given frame number begins. The frame number 620, the frame pixel data 622, and the audio data 624 are then repeated, in that format, once per frame. If the classification code shows that the video requires the video playback device 10, such as a laser disc player, the physical video calling unit sends control commands to the playback device to retrieve the video; if the video data is held in the physical video itself, it retrieves the video from there.

One advantage of using logical video is that, from a single physical video, which tends to be large, many differently edited video works can be created using only a small amount of additional data. The benefit is greatest for material such as news, where archive footage is reused frequently. Another advantage is that, because the objects appearing in each scene are stored in advance, there is no need to check every object to find out which things appear during playback, so faster processing can be expected.

The operation of the associative search interface is now described in detail using the example computer screen of FIG. 1, which was outlined earlier. The video playback display unit 100 described above displays any desired video in the monitor window 1100. Sound is output from the speaker 12 along with the picture. 1104 is a cursor, which moves across the screen in response to an indirect pointing device 5 such as a mouse or joystick and is used to perform pointing operations. The same pointing operations can be performed with a direct pointing device 13 such as a touch panel, in which case the cursor can be dispensed with. The point position detection unit 104 described above constantly monitors these pointing devices. It moves the cursor 1104 as the mouse moves, and when a mouse button is pressed it treats this as a pointing operation and sends the cursor's on-screen position at that moment to each processing module that needs it. With a touch panel, it detects the touched position at the moment of the touch and sends that position. 1102 is an operation panel for controlling the playback state of the video. The operation panel unit 110 displays on it buttons bearing pictures or text for playback states such as play and fast-forward, buttons for changing modes, and a display area for various information from the video playback display unit. When the point position detection unit 104 reports that the display area of the operation panel has been pointed at, the operation panel unit determines from the position information which button was pointed at, and the control code associated with that button is sent to the video playback display unit 100. 1106 is a general-purpose input/output window, through which various information can be exchanged with the computer using the keyboard 11 and the like. The video to be searched associatively can be specified in this window by entering its file name. The entered file name is sent to the video playback display unit 100 as playback position setting information 310, together with the number of the first frame, which indicates the playback start position. The logical video calling unit 300 inside unit 100 retrieves the corresponding video from this information, and the video is displayed in the monitor window 1100 by way of the physical video calling unit. Various related information about the video can also be displayed in the general-purpose input/output window 1106.

When the point position detection unit detects that one of the icons 1110 displayed in the window 1108 has been pointed at, the index management unit 108 passes the start frame number of the scene corresponding to that icon to the video playback display unit 100 as playback position setting information. Unit 100 displays the video of that scene in the monitor window 1100. Playback of the displayed video, including fast-forward and the like, can be controlled from the operation panel 1102. Once playback starts, the playback position information 314 output by the logical video calling unit 300 is passed to the index management unit 108, which emphasizes the scene being played in the window 1108, for example by highlighting or blinking its icon, so that the scene corresponding to the video currently playing in the monitor window 1100 can be seen at a glance.

Scenes can be displayed hierarchically in the window 1108. First, two kinds of pointing action are provided, for example a click and a double-click: a click is used to call up video as described above, and a double-click is used for the hierarchical management of scenes described below. When the point position detection unit detects that one of the icons displayed in 1108 has been pointed at, the index management unit 108 checks whether the action was a double-click. If not, it performs the video call-up processing described above. If it was, it refers to the lower scene 1014 in the scene structure 1000 corresponding to the pointed scene, creates a new window like 1108, and displays a list of the icons of those lower scenes. The newly created window is subject to point detection in the same way as 1108. When an icon in it is pointed at, the index management unit displays the corresponding scene in the monitor window or, if there are still lower scenes, creates another window listing them. This hierarchical management can also be used for selecting a video. If each video is associated with one top-level scene that bundles all of its scenes, then within the same framework a desired video can be selected from the registered videos in a window, and a list of its lower scenes can be displayed.

The window 1112 consists of icons 1114 and time-axis display sections 1116. It is an index that groups several things classified by some criterion, for example that they appear in different scenes but are in fact the same subject, displays one representative icon 1114 for the group, and beside it shows the sections of the whole video in which those things appear as a bar graph whose horizontal axis is time. Each thing in the same class is managed by its own object structure 500 and holds, in its upper object 516, a pointer to a common object structure. Conversely, the upper object holds pointers to the object structures of the individual things in its lower object 518, as a linked list. The index management unit 108 stores and manages the upper objects. What is displayed as the icon is a reduced image of the representative frame stored in the upper object's structure. The bar graph is drawn by examining each lower object and calculating, from its start and end frame numbers, the section it occupies within the whole video. When it is detected that a part of the bar graph corresponding to an appearance section of a thing has been pointed at, the index management unit 108 causes the video of that part to be displayed in the monitor window 1100. If an object is selected by pointing at its icon and related information is added or changed, the information is registered as related information of the upper object, that is, as information common to all things in the same class.
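The bar graph calculation can be sketched as a mapping from frame sections to pixel spans on the time axis. The following Python fragment is only an illustration: the function names, the use of integer pixel coordinates, and the convention that frames are numbered from 0 with inclusive end frames are all assumptions made here.

```python
from typing import List, Optional, Tuple


def bar_segments(sections: List[Tuple[int, int]], total_frames: int,
                 bar_width: int) -> List[Tuple[int, int]]:
    """Map (start_frame, end_frame) sections to (left, right) pixel spans.

    Frames are numbered from 0 and end_frame is inclusive (assumed).
    """
    spans = []
    for start, end in sections:
        left = start * bar_width // total_frames
        right = (end + 1) * bar_width // total_frames
        spans.append((left, right))
    return spans


def section_at(sections: List[Tuple[int, int]], total_frames: int,
               bar_width: int, x: int) -> Optional[int]:
    """Return the start frame of the section drawn under pixel x, or None."""
    spans = bar_segments(sections, total_frames, bar_width)
    for (start, _end), (left, right) in zip(sections, spans):
        if left <= x < right:
            return start
    return None


# Two appearance sections of one subject in a 1000-frame video,
# drawn on a bar 200 pixels wide.
sections = [(100, 199), (500, 749)]
spans = bar_segments(sections, total_frames=1000, bar_width=200)
```

A click at pixel `x` on the bar is then resolved with `section_at`, and playback would start from the returned frame.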

When, on the other hand, it is detected that the monitor window 1100 has been pointed at, the point object identification unit 102 determines from the pointed position which thing in the video was pointed at. This processing receives from the logical video calling unit 300 the playback position information 314, which indicates which scene is currently being played. For each object stored in the corresponding objects 1010 of that scene's structure, it examines the start and end points and compares them with the playback position information 316, which indicates the frame number currently being played, to determine whether the thing represented by the object is currently on screen. For each thing judged to be on screen, it obtains the region the thing currently occupies from the thing's position, that is, the object position 514, and the playback position information 316, and determines whether the pointed position lies inside that region. If more than one thing matches, only the one with the highest priority is selected. Priority can be expressed, for example, by the order of registration in the linked list, in which case no separate data field is needed for it. If a thing is judged to have been pointed at, the object attribute information 520 in its object structure is searched for a dictionary structure 800 whose key means "jump destination of associative search"; the start frame number of the object registered in its link 810 is read, and playback jumps to that frame. If the object attribute information 520 contains no such key, playback jumps to a scene in which another thing with the same upper object appears. To do this, the linked list of lower objects registered in the object one rank above the pointed thing is consulted, the start frame number of the object that follows the pointed thing in the list is read, and playback jumps to that frame.
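The identification and jump procedure described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the `Thing` and `Upper` classes are simplified stand-ins for the object structure, the dictionary lookup is reduced to a single `jump_to` reference, and returning `None` when the pointed thing is the last one in its upper object's list is a choice made here, since the text does not say what happens in that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


@dataclass(eq=False)
class Upper:
    """Stand-in for an upper object holding its lower objects (518)."""
    lowers: List["Thing"] = field(default_factory=list)


@dataclass(eq=False)
class Thing:
    """Simplified video object (hypothetical names)."""
    name: str
    start_frame: int
    end_frame: int
    # (section start, section end, rectangle), as in position structure 700
    rects: List[Tuple[int, int, Rect]] = field(default_factory=list)
    jump_to: Optional["Thing"] = None   # link 810 of the jump entry, if any
    upper: Optional[Upper] = None       # 516


def rect_at(thing: Thing, frame: int) -> Optional[Rect]:
    for start, end, rect in thing.rects:
        if start <= frame <= end:
            return rect
    return None


def pointed_thing(scene_things: List[Thing], frame: int,
                  px: int, py: int) -> Optional[Thing]:
    """Return the first thing under the point; list order acts as priority."""
    for thing in scene_things:
        if not (thing.start_frame <= frame <= thing.end_frame):
            continue  # not on screen at this frame
        rect = rect_at(thing, frame)
        if rect is None:
            continue
        x, y, w, h = rect
        if x <= px < x + w and y <= py < y + h:
            return thing
    return None


def jump_frame(thing: Thing) -> Optional[int]:
    """Explicit link first; otherwise the next thing under the same upper."""
    if thing.jump_to is not None:
        return thing.jump_to.start_frame
    if thing.upper is not None:
        siblings = thing.upper.lowers
        i = siblings.index(thing)
        if i + 1 < len(siblings):
            return siblings[i + 1].start_frame
    return None  # assumed behaviour: no jump target available


upper = Upper()
a = Thing("taro@scene1", 0, 99, [(0, 99, (10, 10, 40, 40))], upper=upper)
b = Thing("taro@scene5", 400, 499, [(400, 499, (0, 0, 20, 20))], upper=upper)
upper.lowers += [a, b]
shop = Thing("shop", 800, 899, [(800, 899, (0, 0, 100, 100))])
sign = Thing("sign", 0, 99, [(0, 99, (30, 30, 50, 50))], jump_to=shop)
scene = [a, sign]  # a is registered first, so it wins where both overlap
```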

In this way, it becomes possible to search the scene hierarchy to find a likely scene, check the video in the monitor window, perform an associative search, and check again in the index window. This is achieved by introducing video management based on logical video, which is composed of scenes.

　図１５に，モニタウインドウ１１００の詳細な画面例を示す。１２００が実際に映像が表示される領域で，１２０２は，映像再生部２００から送られる再生中のフレーム番号を表示する。フレーム番号を表示している部分は，数値入力部を兼ねており，キーボード等によって数字を修正すると，修正された数字を新たなフレーム番号と見做して，その番号に対応するシーンから映像を再生することができる。１２０４は，映像全体中で，現在どの部分を再生しているのかを表示するためのインジケータパネルである。このパネル上のどの位置に指示棒１２０６があるかによって，再生位置を示す。指示棒の位置は，上述のフレーム番号と，再生中の論理映像の構造体データから計算される。１２０８の縦棒は，シーンの変わり目を表す線であり，これによって，どのシーンが再生されているのかも直感的に知ることができる。このパネルによって，連想検索によってジャンプしたことが指示棒１２０６の大きな移動によって明確に知ることができ，映像の中で自然にシーンが変わっただけなのか区別がつかないといった混乱がなくなる。ポイント位置検出部が指示棒１２０６がポイントされ，ドラッグ操作によって強制的に動かされた場合，操作パネル部１１０は，ポイント位置検出部１０４によって得られる移動後確定した位置情報を使って，その位置に対応するシーンとフレーム番号が計算され，その位置に対応する映像部分から再生するように，映像制御部１０６にこの情報を伝えることができる。１２１０は，このモニタウインドウを閉じる場合のボタンである。 FIG. 15 shows a detailed screen example of the monitor window 1100. Reference numeral 1200 denotes the area in which the video is actually displayed, and 1202 displays the number of the frame being reproduced, as sent from the video reproduction unit 200. The frame-number display doubles as a numeric input field: when the number is edited with a keyboard or the like, the edited value is taken as a new frame number and the video is reproduced from the scene corresponding to that number. Reference numeral 1204 denotes an indicator panel that shows which part of the whole video is currently being reproduced. The reproduction position is indicated by where the indicator bar 1206 sits on this panel. The position of the indicator bar is calculated from the frame number described above and the structure data of the logical video being reproduced. The vertical bars 1208 mark scene boundaries, so the user can also tell intuitively which scene is being reproduced. With this panel, a jump caused by an associative search is clearly recognizable from the large movement of the indicator bar 1206, which removes the confusion of not being able to tell such a jump from a scene that simply changed naturally within the video. When the indicator bar 1206 is pointed at and forcibly moved by a drag operation, the operation panel unit 110 uses the position finally settled on after the move, as obtained by the point position detecting unit 104, to calculate the corresponding scene and frame number, and passes this information to the video control unit 106 so that reproduction starts from the corresponding part of the video. Reference numeral 1210 denotes a button for closing the monitor window.
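The mapping between the frame number and the indicator bar position, and the reverse mapping used after a drag, can be illustrated with a minimal Python sketch. It assumes that a logical video is simply an ordered list of scenes, each a (first frame, last frame) pair in the physical video, and that the panel is laid out linearly in frames. The function names and this data layout are hypothetical and are not taken from the specification.

```python
from bisect import bisect_right
from itertools import accumulate

# Assumption: a logical video is an ordered list of scenes, each given as
# (first_frame, last_frame) in the physical video. Names are illustrative.

def scene_lengths(scenes):
    """Number of frames in each scene."""
    return [last - first + 1 for first, last in scenes]

def frame_to_position(scenes, scene_index, frame, panel_width):
    """Indicator bar x-coordinate for a frame of a given scene."""
    lengths = scene_lengths(scenes)
    offset = sum(lengths[:scene_index]) + (frame - scenes[scene_index][0])
    return offset * panel_width / sum(lengths)

def position_to_frame(scenes, x, panel_width):
    """Scene index and frame number for a dragged indicator position."""
    lengths = scene_lengths(scenes)
    total = sum(lengths)
    # Clamp so that positions outside the panel map to the first/last frame.
    offset = max(0, min(int(x * total / panel_width), total - 1))
    ends = list(accumulate(lengths))          # cumulative end offsets
    scene_index = bisect_right(ends, offset)  # first scene ending after offset
    scene_start = ends[scene_index] - lengths[scene_index]
    return scene_index, scenes[scene_index][0] + (offset - scene_start)

def scene_boundaries(scenes, panel_width):
    """x-coordinates of the vertical bars that mark scene changes."""
    lengths = scene_lengths(scenes)
    total = sum(lengths)
    return [end * panel_width / total for end in list(accumulate(lengths))[:-1]]
```

Because consecutive scenes of a logical video need not be adjacent in the physical video, a jump shows up as a large change in the indicator position even when the frame numbers themselves look unrelated.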

　図１６は，音声にマッピングされたオブジェクトがある場合の映像表示画面の例である。音声は目で見えない情報であるので，ボタン１４００及び１４０２の形で可視化している。音声かどうかの判定は，事物有無判定部２０２が，オブジェクト分類コード５０４を調べることで行える。２０２は，現在再生中のシーンとフレームの情報を用い，どのオブジェクトが現れているかをチェックするとき，現れているオブジェクトの分類コードが音声のものであれば，ボタンを表示する。ボタンの表示位置は，オブジェクトの位置５１４に登録される。これにより，ポイント事物識別部の処理に変更を加えることなく，このボタンをポイントすることにより，その音声に関連するシーンにジャンプすることができる。ボタンは現在再生中の音声にマッピングされたオブジェクトの種類だけ表示され，ボタン面のタイトルで区別される。 FIG. 16 is an example of the video display screen when there are objects mapped to audio. Since audio is information that cannot be seen, it is visualized in the form of buttons 1400 and 1402. Whether an object is audio can be determined by the object presence/absence determining unit 202 examining the object classification code 504. When the unit 202 uses the information on the scene and frame currently being reproduced to check which objects are appearing, it displays a button for each appearing object whose classification code is that of audio. The display position of the button is registered in the object position 514. As a result, pointing at this button jumps to a scene related to that audio, without any change to the processing of the point object identification unit. One button is displayed for each kind of object mapped to the audio currently being reproduced, and the buttons are distinguished by the titles on their faces.
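As a rough illustration of the check performed by the object presence/absence determining unit 202, the following Python sketch selects the audio objects that are active at the current frame and returns the buttons to draw. The field names, the classification code values, and the representation of an appearance as a frame interval are assumptions made for the example only.

```python
from dataclasses import dataclass

# Hypothetical classification code values; the specification only says that
# the classification code 504 distinguishes audio objects from others.
AUDIO = "audio"
VISUAL = "visual"

@dataclass
class VideoObject:
    title: str            # shown on the button face
    classification: str   # stands in for classification code 504
    first_frame: int      # assumed appearance interval
    last_frame: int
    position: tuple       # stands in for object position 514: (x, y, w, h)

def audio_buttons(objects, frame):
    """Return (title, position) for every audio object active at `frame`."""
    return [
        (obj.title, obj.position)
        for obj in objects
        if obj.classification == AUDIO
        and obj.first_frame <= frame <= obj.last_frame
    ]
```

Since each button occupies the rectangle stored as the object's position, the same point-in-region test used for visible objects also serves for audio objects, which is consistent with the statement that the point object identification unit needs no change.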

　図１７の（ａ）〜（ｃ）は，連想検索で別のシーンにジャンプするときの表示画面例である。画面上の事物がポイントされると，映像再生表示部１００は，映像中の通常のシーンの変わり目と区別がつきやすいように特殊効果を加えた変化をするようにする。例えば，ポイントされた事物の領域の重心から，飛び先のシーンの縮小された映像がみるみる大きくなるようなシーンの変わり方をさせる。これにより，どの事物がポイントされたのかもすぐにわかる。 (A) to (c) of FIG. 17 are examples of display screens when jumping to another scene by associative search. When an object on the screen is pointed, the video playback / display unit 100 makes a change with a special effect added so that it can be easily distinguished from a normal scene change in the video. For example, from the center of gravity of the area of the pointed object, the scene is changed such that a reduced image of the scene at the jump destination becomes larger. As a result, it is immediately apparent which object was pointed.
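One possible way to realize such a transition is to interpolate a rectangle that starts at the center of the pointed region and grows to fill the display area, drawing the reduced destination image into it at each step. The Python sketch below computes only the rectangles. It uses the center of the bounding rectangle as a stand-in for the center of gravity and linear interpolation for the growth; both are assumptions, and the function name is illustrative.

```python
def zoom_rects(region, screen, steps):
    """Rectangles (x, y, w, h) for a zoom-in transition.

    `region` is the bounding rectangle of the pointed object and `screen`
    is the full video display area. The first rectangle is small and centred
    near the object; the last one coincides with `screen`.
    """
    x, y, w, h = region
    start_cx, start_cy = x + w / 2, y + h / 2
    sx, sy, sw, sh = screen
    end_cx, end_cy = sx + sw / 2, sy + sh / 2

    rects = []
    for i in range(1, steps + 1):
        t = i / steps                       # progress from 0 (start) to 1 (end)
        rw, rh = sw * t, sh * t             # size grows linearly
        cx = start_cx + (end_cx - start_cx) * t
        cy = start_cy + (end_cy - start_cy) * t
        rects.append((cx - rw / 2, cy - rh / 2, rw, rh))
    return rects
```

Starting the growth at the pointed object is what lets the viewer see at a glance which object triggered the jump.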

　ところで，図１５における１２１２は，事物の関連情報を表示するかどうかを決めるためのボタンである。このボタンをポイントすると，例えば，図１８に示す１３００のようなメニューが現れる。このメニューには，関連情報を表示をしなくするＯＦＦのほか，現在表示可能な関連情報の種類が表示される。ユーザは，このメニューの中から見たい関連情報の種類を選ぶことができる。この情報は，映像制御部１０６を通じて，制御信号２１０として映像再生表示部１００の関連情報表示部２０４に伝えられ，関連情報を表示するのか，するならば，どのキーに対応する情報なのかが決定される。このメニューは１本の映像ごとに作られて，その映像について登録されている全てのオブジェクト構造体５００におけるオブジェクト属性情報５２０のディクショナリ全てのキーを調べ，全種類をメニューに載せている。１２１４は，モードを変更するためのボタンで，連想検索のモード，関連情報を変更するモードなどを切り替えることができる。これによって，ポイント事物識別部１０２の内部状態を変化させ，ポイント位置検出部からポイントが伝えられたときの対応処理が各内部状態に応じたものにする。 Incidentally, 1212 in FIG. 15 is a button for deciding whether to display the related information of objects. Pointing at this button brings up a menu such as 1300 shown in FIG. 18. In addition to OFF, which turns the display of related information off, this menu lists the types of related information that can currently be displayed. The user can select the type of related information to view from this menu. The selection is passed through the video control unit 106 as a control signal 210 to the related information display unit 204 of the video reproduction display unit 100, and determines whether related information is displayed and, if so, which key's information is shown. The menu is built for each video by examining every key of the dictionaries in the object attribute information 520 of all object structures 500 registered for that video, and listing every type found. Reference numeral 1214 denotes a button for changing the mode, which switches between, for example, the associative search mode and the related information change mode. This changes the internal state of the point object identification unit 102, so that the processing performed when a point is reported by the point position detecting unit depends on the current internal state.
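Building the menu amounts to taking the union of the dictionary keys over all objects registered for a video. A minimal Python sketch follows, assuming each object's attribute information can be treated as a plain dict; the function name is illustrative and the menu order (first appearance) is an arbitrary choice.

```python
def build_menu(attribute_dicts):
    """Menu entries: OFF followed by every distinct key, in first-seen order.

    `attribute_dicts` holds one dict per registered object, standing in for
    the dictionary in the object attribute information 520.
    """
    keys = []
    for attrs in attribute_dicts:
        for key in attrs:
            if key not in keys:
                keys.append(key)
    return ["OFF"] + keys
```

For example, objects carrying the keys `name`/`price` and `name`/`maker` would yield the menu `OFF`, `name`, `price`, `maker`.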

　図１９は，関連情報を表示する画面の一例である。映像中の事物１５００とその関連情報１５０２との関係が一目でわかるように，事物の上に重畳するように関連情報を表示する。事物有無判定部２０２が，前述した手順で現在現れている事物を確定したとき，それらの事物についてオブジェクトの位置５１４を読みだし，その位置情報から重心を求め，また，関連情報の表示に必要となる領域の重心を求めて，その重心が一致するように関連情報の表示位置を定める。但し，複数の事物が密に接している場合には，相互にオフセットをかけて１５０２の表示が重ならないようにする。関連情報１５０２は図のようなテキストに限定されるものではなく，アイコンなどの画像であっても一向に構わない。また，連想検索時には，ポイント事物識別部１０２が，関連情報１５０２の表示領域をポイントすることでも，対応する事物がポイントされたと識別できるようにし，別のシーンにジャンプできるようにする。これは，一つの事物につき，２つの位置情報を持たせ，そのＯＲで判定することで行う。また，図２０に示すように，関連情報１５０２と事物１５００の間を連結線１５０４で結ぶことでも対応づけのわかりやすい表示を行うことができる。特に，関連情報１５０２の表示位置を固定にしておき，連結線だけを事物の動きに合わせて変化させることで，事物の動きが激しく事物をポイントすることが困難な場合でも，固定している１５０２をポイントすることで容易に連想検索を行うことができる。 FIG. 19 shows an example of a screen that displays related information. The related information is superimposed on the object so that the relationship between an object 1500 in the video and its related information 1502 can be seen at a glance. When the object presence/absence determining unit 202 has determined, by the procedure described above, which objects are currently appearing, it reads the object position 514 for each of them and obtains the center of gravity from that position information. It also obtains the center of gravity of the area needed to display the related information, and sets the display position of the related information so that the two centers of gravity coincide. However, when several objects are close together, the displays are offset from one another so that the displays of 1502 do not overlap. The related information 1502 is not limited to text as in the figure and may equally be an image such as an icon. During an associative search, the point object identification unit 102 also treats pointing at the display area of the related information 1502 as pointing at the corresponding object, so that the user can jump to another scene in this way as well. This is done by holding two pieces of position information for each object and taking the OR of the two hit tests. In addition, as shown in FIG. 20, connecting the related information 1502 and the object 1500 with a connecting line 1504 also gives a display in which the association is easy to see. In particular, if the display position of the related information 1502 is fixed and only the connecting line follows the movement of the object, the user can easily perform an associative search by pointing at the fixed 1502, even when the object moves so quickly that pointing at the object itself is difficult.
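The centroid alignment and the OR-based hit test can be sketched in a few lines of Python. The sketch assumes that both the object position and the label area are axis-aligned rectangles (x, y, w, h), so that the center of gravity is simply the rectangle center; the specification does not fix the shape, and the function names are illustrative.

```python
def centre(rect):
    """Centre of an axis-aligned rectangle (x, y, w, h)."""
    x, y, w, h = rect
    return x + w / 2, y + h / 2

def label_rect(object_rect, label_size):
    """Place a label of size (w, h) so its centre coincides with the object's."""
    cx, cy = centre(object_rect)
    w, h = label_size
    return (cx - w / 2, cy - h / 2, w, h)

def contains(rect, point):
    """True if `point` lies inside `rect` (edges included)."""
    x, y, w, h = rect
    px, py = point
    return x <= px <= x + w and y <= py <= y + h

def is_pointed(object_rect, label, point):
    """An object counts as pointed if either of its two regions is hit."""
    return contains(object_rect, point) or contains(label, point)
```

With a fixed label position, as in FIG. 20, `label_rect` would be replaced by a constant rectangle, while `is_pointed` stays the same.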

　システムの内部状態が関連情報変更モードのときには，図２１に示すように，表示されている関連情報のテキスト１５０２をポイントすると文字修正カーソル１５０６が現れ，キーボード等を使って，その場で直ちに変更することができる。表示された情報が上位のオブジェクトに格納されている関連情報であれば，この変更により，同じ上位オブジェクトを共有する全ての事物について一斉に関連情報が更新されることになる。表示されている以外の関連情報を変更するときには，図２２に示すような関連情報変更ウインドウ１６００が現れる。１６０２は，関連情報のキーのリストである。このリスト中には，その事物の関連情報のほか，その上位オブジェクトの関連情報もある。１６０４のボタンをポイントすると，文字入力ウインドウが現れて，そこに新しいキーを入力すると登録されて１６０２のリストに登録される。１６０２のリストに表示されているキーはポイントによって選択でき，選択されると強調表示される。この状態で，１６０８の文字入力領域に何か入力すると，それが，その選択されたキーに対応する関連情報として登録される。１６０６は，キーを抹消するためのボタンで，キーを選択した状態で１６０６をポイントすると，そのキーに対応する関連情報ごと登録抹消される。１６１０は，このようにして行った変更を受容して完了する場合にポイントするボタンで，１６１２は，変更を全てキャンセルして取りやめる場合にポイントするボタンである。 When the internal state of the system is the related information change mode, pointing at the displayed related information text 1502 brings up a character correction cursor 1506, as shown in FIG. 21, and the text can be changed immediately on the spot using a keyboard or the like. If the displayed information is related information stored in a higher-level object, this change updates the related information at once for all objects that share the same higher-level object. To change related information other than what is displayed, a related information change window 1600 as shown in FIG. 22 appears. Reference numeral 1602 denotes a list of the keys of the related information. This list contains the related information of the object itself as well as that of its higher-level objects. Pointing at the button 1604 brings up a character input window; a new key entered there is registered and added to the list 1602. A key shown in the list 1602 can be selected by pointing at it, and the selected key is highlighted. If something is then entered in the character input area 1608, it is registered as the related information corresponding to the selected key. Reference numeral 1606 denotes a button for deleting a key: pointing at 1606 with a key selected deletes that key together with its related information. Reference numeral 1610 denotes a button to point at to accept the changes made in this way and finish, and 1612 denotes a button to point at to cancel all the changes.
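The effect of editing related information that lives in a higher-level object can be illustrated with a small Python sketch. It assumes that each object holds its own attribute dictionary plus a single reference to a higher-level object, and that a lookup walks up this chain. The class and method names are hypothetical; the specification describes the behavior, not this interface.

```python
class AttrObject:
    """An object with its own attributes and an optional higher-level object."""

    def __init__(self, attrs=None, parent=None):
        self.attrs = dict(attrs or {})
        self.parent = parent

    def owner_of(self, key):
        """Nearest object in the chain (self first) that stores `key`."""
        node = self
        while node is not None:
            if key in node.attrs:
                return node
            node = node.parent
        return None

    def get(self, key):
        owner = self.owner_of(key)
        return owner.attrs[key] if owner is not None else None

    def set(self, key, value):
        """Update the key where it is stored; new keys go on this object."""
        owner = self.owner_of(key)
        (owner if owner is not None else self).attrs[key] = value

    def keys(self):
        """Own keys followed by inherited ones, as listed in window 1600."""
        seen, node = [], self
        while node is not None:
            seen.extend(k for k in node.attrs if k not in seen)
            node = node.parent
        return seen
```

In this sketch, changing an inherited value through one appearance of an object is immediately visible through every other object that shares the same higher-level object, which mirrors the simultaneous update described above.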

　また，システムの内部状態が事物複写モードのときには，再生中の映像に現れた事物を複写して，他の映像に貼り付けるといったことも動画間・音声間のそれぞれで可能である。複写は，ポイントされた事物のオブジェクトの構造体をそっくり複製することによって行う。複写されたオブジェクトは，上位オブジェクトを共有し，また，その上位のオブジェクトの下位オブジェクトとして追加される。貼り付けについては，映像中の事物は映像情報の部分空間と対応づけられているので，貼り付け先の映像情報の同じ形状の部分空間と置換することで行える。そして，この複写・貼り付けは，関連情報も合わせて複写・貼り付けが行えるので，関連情報に関する作業量はほとんどない。 When the internal state of the system is the object copy mode, an object appearing in the video being reproduced can also be copied and pasted into another video, both between moving images and between audio tracks. Copying is performed by duplicating the whole structure of the pointed object. The copied object shares the higher-level object of the original and is added as a lower-level object of that higher-level object. As for pasting, since an object in a video is associated with a subspace of the video information, pasting can be done by replacing the subspace of the same shape in the destination video information. Because the related information is copied and pasted together with the object, almost no additional work is needed for the related information.
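The copy step can be sketched as follows in Python, assuming an object structure with its own attribute dictionary, a reference to its higher-level object, and a list of lower-level objects. These field names are hypothetical, and the replacement of the video subspace on paste is outside the scope of the sketch.

```python
import copy
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity; structures may refer to each other
class ObjectNode:
    attrs: dict = field(default_factory=dict)
    parent: "ObjectNode | None" = None
    children: list = field(default_factory=list)

def copy_object(obj):
    """Duplicate `obj`, sharing its higher-level object.

    The clone gets its own copy of the attributes, keeps the same parent,
    and is registered as a further lower-level object of that parent.
    """
    clone = ObjectNode(attrs=copy.deepcopy(obj.attrs), parent=obj.parent)
    if obj.parent is not None:
        obj.parent.children.append(clone)
    return clone
```

Because the clone keeps the same higher-level object, the related information stored there is available to the pasted object without being re-entered.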

　以上の実施例では，ワークステションレベルのコンピュータを用いて検索を行なう例で説明したが，ＶＴＲやＴＶなどの一機能として実現することも可能である。 In the above embodiment, the search is described as being performed on a workstation-class computer, but it can also be implemented as one function of a device such as a VTR or a TV.

映像の連想検索を実現するシステムの画面の構成例である。 FIG. 1 is an example of the screen configuration of a system that realizes associative video search.
本発明の一実施例に係る映像の連想検索システムの装置構成のブロック図である。 FIG. 2 is a block diagram of the apparatus configuration of a video associative search system according to an embodiment of the present invention.
映像の連想検索機能の説明図である。 FIG. 3 is an explanatory diagram of the associative video search function.
被写体検索方法を説明する図である。 FIG. 4 is a diagram illustrating a subject search method.
映像の連想検索を実現するための処理ブロック図である。 FIG. 5 is a processing block diagram for realizing associative video search.
オブジェクト指向型のデータ構造体の概略図である。 FIG. 6 is a schematic diagram of an object-oriented data structure.
映像再生表示部の詳細処理ブロック図である。 FIG. 7 is a detailed processing block diagram of the video reproduction display unit.
映像オブジェクトを記憶する構造体を示す図である。 FIG. 8 is a diagram showing a structure that stores a video object.
オブジェクトの位置を記憶する構造体を示す図である。 FIG. 9 is a diagram showing a structure that stores the position of an object.
ディクショナリを記憶する構造体を示す図である。 FIG. 10 is a diagram showing a structure that stores a dictionary.
映像再生部の詳細処理ブロック図である。 FIG. 11 is a detailed processing block diagram of the video reproduction unit.
論理映像を記憶する構造体を示す図である。 FIG. 12 is a diagram showing a structure that stores a logical video.
シーンを記憶する構造体を示す図である。 FIG. 13 is a diagram showing a structure that stores a scene.
物理映像を記憶する構造体を示す図である。 FIG. 14 is a diagram showing a structure that stores a physical video.
モニタウインドウを示す画面例である。 FIG. 15 is a screen example showing the monitor window.
モニタウインドウの表示画面例である。 FIG. 16 is an example of a display screen of the monitor window.
モニタウインドウの表示画面例である。 FIG. 17 is an example of a display screen of the monitor window.
メニュー表示の例である。 FIG. 18 is an example of a menu display.
モニタウインドウの表示画面例である。 FIG. 19 is an example of a display screen of the monitor window.
モニタウインドウの表示画面例である。 FIG. 20 is an example of a display screen of the monitor window.
モニタウインドウの表示画面例である。 FIG. 21 is an example of a display screen of the monitor window.
関連情報を変更するためのウインドウを示す図である。 FIG. 22 is a diagram showing the window for changing related information.

Explanation of reference numerals

　１…ディスプレイ，２…制御信号線，３…映像入力装置，４…コンピュータ，５…ポインティングデバイス，６…外部情報記憶装置，７…ＣＰＵ，８…接続インタフェース，９…メモリ，１０…映像再生装置，１１…キーボード，１２…スピーカ，１３…タッチパネル。
1 ... display, 2 ... control signal line, 3 ... video input device, 4 ... computer, 5 ... pointing device, 6 ... external information storage device, 7 ... CPU, 8 ... connection interface, 9 ... memory, 10 ... video playback device, 11 ... keyboard, 12 ... speaker, 13 ... touch panel.

Claims

A video associative search device comprising: video display means having a video display area for viewing video and an index display area for displaying index information of the video; point detection means for detecting which position in these display areas is pointed at; object management means for pre-registering objects or sounds in the moving image appearing in the video; and control means for determining the state of the video to be reproduced next from the point information obtained by the point detection means and a separately stored logical structure description of the video.

The video associative search device according to claim 1, further comprising attribute information superimposing means, wherein at least a part of selected attribute information of an object appearing in the video being reproduced is either superimposed at the position of that object in the reproduced video or displayed in a form that clearly shows that the object and its attribute information correspond to each other.

The video associative search device according to claim 1, further comprising attribute information changing means, wherein at least a part of the attribute information of an object appearing in the video being reproduced can be changed immediately, on the spot, when the object appears.

The video associative search device according to claim 1, wherein the video display means has, as a partial area of its display screen, an operation window comprising a video display area, an area for displaying the video reproduction position, an area for displaying buttons that control the video reproduction state, and an area for displaying buttons that determine whether attribute information is displayed and which type of information is displayed.

The video associative search device according to claim 1, wherein, when the scene changes because an object has been pointed at, the change is executed with a special video effect added so that it is displayed distinguishably from a normal scene change.

The video associative search device according to claim 1, wherein, while the attribute information of an object is displayed, pointing at the display area of that attribute information is also judged to be pointing at the object.

The video associative search device according to claim 1, wherein the type of attribute information to be displayed is specified by displaying a list of the types of attribute information present in the video subject to the associative search.

The video associative search device according to claim 6, wherein the display position of the attribute information of the object is fixed, and the association with the object is displayed as a line segment that changes so as to always connect the position of the object with the display position of the attribute information.

A video associative search system comprising: video display means for displaying at least a moving image and an index of the video; audio output means for reproducing audio; input means for instructing the control state of the video; video reproduction means; video input means for converting the video into a data format that can be handled by a computer; a memory for storing the data obtained by the video input means; and control means for controlling the display state of the video based on the information input by the input means.

A video associative search method comprising: pre-registering objects or sounds in the moving image appearing in a video; detecting that a point on the moving image corresponding to a registered object has been pointed at; determining, from that detection, the state of the video to be reproduced next; and displaying the video on video display means.