JP2000287165A

JP2000287165A - Image information description method, video image retrieval method, video reproduction method, video retrieval device and video reproduction device

Info

Publication number: JP2000287165A
Application number: JP11339465A
Authority: JP
Inventors: Osamu Hori; 修堀; Toshimitsu Kaneko; 敏充金子; Takeshi Mita; 雄志三田; Koji Yamamoto; 晃司山本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-01-28
Filing date: 1999-11-30
Publication date: 2000-10-13
Anticipated expiration: 2019-11-30
Also published as: JP4574771B2

Abstract

PROBLEM TO BE SOLVED: To let a user search easily by checking the contents of video data, through representative frame display or variable speed reproduction of video data compression-coded in compliance with MPEG-2 or the like, even on a retrieval device with low CPU capability. SOLUTION: For a sampled image frame group obtained by sampling the video frame group of original video data 101 at an arbitrary temporal interval and an arbitrary spatial size, information such as the frame number of the corresponding original video frame and the size of each sampled image frame is described as sampled image information, and scene change position information of the original video frame group and inter-frame screen change amount information are described together as attached information. A database is then built by cross-referencing the resulting time-space sampled video metadata 102 with the original video data 101, and representative frame display or variable speed reproduction of the original video data is performed using the time-space sampled video metadata 102.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method of describing image information, in particular sample image information on a sample image frame group obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size, and to methods and apparatuses for video search and video playback using that sample image information.

[0002]

[Prior Art] In recent years, advances in semiconductor technology and digital signal processing technology have made it possible to convert moving picture (video) information from analog data to digital data and compress it in real time. Indeed, in digital satellite broadcasting, digital video data compression-coded with MPEG-2, the international standard for moving picture compression, is distributed, and each household can decompress and decode the compressed video data in real time and watch movies and the like on a television receiver.

[0003] In addition, with the increasing density of optical discs, technology for recording digital video data compressed with MPEG-2 or the like is reaching the practical stage. Typical examples of such optical disc media are DVD-RAM and CD-RW. Digital video data can also be recorded on an HDD, although the recording time is shorter than with DVD-RAM. In the future, digital video data recorded on DVD-RAM and the like is expected to need to be as easily searchable as digitized text and still image data.

[0004] The classic approach to video search is to assign a title and keywords to each video file, such as a movie, and to search by the title, the keywords, or both. With this method the search itself is easy, but fine-grained search according to the video content is not possible, and its drawback is that one cannot tell whether a video is the desired one without actually playing it back.

[0005] When compressed digital video data is recorded, the video can be treated as a sequence of still image frames, so methods have been considered in which image processing is used to select characteristic image frames, called representative frames, from the original video and display them as a list. The points where the picture switches, called scene changes, are often used as representative frames. However, scene changes occur only once every few seconds, or in some cases once every few tens of seconds, so there is a limit to how well representative frames can express the video content. To check the video of a frame between two scene changes, there is no choice but to decode and display the original video data.

[0006] Digital video data compressed according to international standards such as MPEG-1 and MPEG-2 includes mechanisms for a degree of random access, enabling variable-speed playback (trick play) such as random playback and fast-forward playback. However, because such variable-speed playback operates on the digital video data itself, the processing is heavy and places a large load on home receiving equipment with little computing power. Moreover, in environments where digital video data is delivered over a network from a remotely located server and received by a home computer or television receiver, as in video on demand or browsing on the Internet, such variable-speed playback has the difficulty that it increases network traffic.

[0007]

[Problems to Be Solved by the Invention] As described above, conventional video search generally goes no further than searching by the title or keywords attached to a video file, and in practice an environment for searching while checking the video content is not adequately provided.

[0008] In addition, the method of selecting scene change portions of the original video as representative frames and displaying them as a list has the problem that the video of frames between scene changes cannot be checked.

[0009] Furthermore, the variable-speed playback mechanisms built into international moving picture compression standards such as MPEG-1 and MPEG-2 perform variable-speed playback by operating on the digital video data itself. This places a heavy processing load on equipment with little computing power, and attempting variable-speed playback in an environment where digital video data is received over a network increases network traffic.

[0010] A main object of the present invention is to provide an image information description method that enables search and display while checking the content of a video.

[0011] Another object is to enable good video search even when the desired frame lies between two scene changes.

[0012] A further object is to reduce the amount of processing required for variable-speed video playback and the like, so that it can be realized easily even on equipment with little computing power or over a network.

[0013]

[Means for Solving the Problems] To solve the above problems, the image information description method according to the present invention is basically characterized in that attribute information for identifying the video frame corresponding to each frame of a sample image frame group is described as sample image information on that group, the sample image frame group being obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size.

[0014] The method is further characterized in that, besides describing this attribute information identifying the video frame corresponding to each sample image frame as the sample image information, supplementary information on the video frame group is also described.

[0015] Here, the attribute information includes position information indicating the position on the time axis of the video frame corresponding to each sample image frame, information on the size of the sample image frame, or both.

[0016] The supplementary information includes scene change position information of the video frame group, information on the amount of screen change between frames of the video frame group, or both.

[0017] As part of the sample image information, the image data of the sample image frame group, or pointers from the sample image frames into the video frame group, may be described as well.

[0018] The present invention also provides a recording medium that stores the sample image information, or the sample image information and supplementary information, described by the above image information description method, either together with the image data of the video frames or separately from that image data.

[0019] Furthermore, using the sample image information, or the sample image information and supplementary information, described by the above image information description method, the present invention can provide an environment for video search and video playback that operates on the sample image frame group, as follows.

[0020] That is, in a first video search method/apparatus according to the present invention, at least first position information indicating the position on the time axis of the video frame corresponding to each sample image frame is described as sample image information on a sample image frame group obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size. Based on the first position information and on given second position information indicating the position on the time axis of a desired video frame, the sample image frame whose first position information is closest to the second position information is retrieved.
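The first search method can be illustrated with a minimal sketch. It assumes that the first position information is a sorted list of original frame numbers and that the second position information is a single frame number; the function name `nearest_sample` and the tie-breaking rule are illustrative choices, not taken from the specification.

```python
from bisect import bisect_left


def nearest_sample(sample_positions, target):
    """Return the index of the sample frame closest to the desired frame.

    sample_positions: sorted original-frame numbers, one per sample frame
                      (the "first position information").
    target:           frame number of the desired frame
                      (the "second position information").
    """
    if not sample_positions:
        raise ValueError("no sample frames described")
    i = bisect_left(sample_positions, target)
    if i == 0:
        return 0
    if i == len(sample_positions):
        return len(sample_positions) - 1
    before, after = sample_positions[i - 1], sample_positions[i]
    # Ties go to the earlier sample frame (an arbitrary choice for this sketch).
    return i - 1 if target - before <= after - target else i
```

Because only the short list of positions is consulted, the lookup touches none of the original video data, which is the property the paragraph relies on.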

[0021] By using sample image information described according to the present invention in this way, video search for a desired frame can be performed easily without burdening computing power or network traffic.

[0022] In another video search method/apparatus according to the present invention, at least first position information indicating the position on the time axis of the video frame corresponding to each sample image frame is described as sample image information on a sample image frame group obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size, and scene change position information of the video frame group is also described as supplementary information. Based on the first position information, given second position information indicating the position on the time axis of a desired video frame, and the scene change position information, and depending on the temporal order of the second position information and the scene change position nearest to it, a sample image frame that lies temporally before or after that scene change position and has the first position information closest to the second position information is retrieved.

[0023] More specifically, the scene change position closest to the desired frame is detected, and it is determined whether the desired frame lies temporally before or after that scene change position. In the former case, the video frame closest to the desired frame is retrieved from among those before the scene change position; in the latter case, from among those after it.
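The scene-change-aware search can be sketched as follows. The sketch assumes that positions are frame numbers and that a scene change position is the first frame of the new scene, so a frame at or after that position counts as "after"; this convention and the name `nearest_sample_same_scene` are assumptions made for illustration.

```python
def nearest_sample_same_scene(sample_positions, scene_changes, target):
    """Return the position of the sample frame closest to the desired frame
    that lies on the same side of the nearest scene change.

    sample_positions: original-frame numbers of the sample frames.
    scene_changes:    frame numbers at which a new scene starts (assumed).
    target:           frame number of the desired frame.
    Returns None when no sample frame lies on the required side.
    """
    if not scene_changes:
        candidates = list(sample_positions)
    else:
        # Scene change position nearest to the desired frame.
        cut = min(scene_changes, key=lambda c: abs(c - target))
        if target < cut:
            candidates = [p for p in sample_positions if p < cut]
        else:
            candidates = [p for p in sample_positions if p >= cut]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p - target))
```

With sample frames at 0, 30, 60, and 90 and a scene change at frame 50, a request for frame 48 returns 30 rather than the numerically closer 60, because frame 60 belongs to the following scene.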

[0024] Describing scene change position information as supplementary information in this way makes it possible to retrieve a sample image frame that more closely resembles the desired frame.

[0025] In yet another video search method/apparatus according to the present invention, at least position information indicating the position on the time axis of the video frame corresponding to each sample image frame is described as sample image information on a sample image frame group obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size; a search target image is presented; and sample image frames whose difference from the search target image is at or below a predetermined threshold are retrieved from the sample image frame group. In this case, the position information described for the sample image frames whose difference from the search target image is at or below the threshold may be recorded as the search result.

[0026] A desired frame can thus also be found by computing the difference between the search target image and the image of each sample image frame, for example the sum of absolute differences, and retrieving the sample image frames for which this value is small.
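A minimal sketch of this image-based search is given below. It assumes grayscale sample frames stored as nested lists of pixel values, all already converted to the same size as the search target image; the helper names `sad` and `search_by_image` are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def search_by_image(samples, query, threshold):
    """Return the positions of sample frames whose difference from the
    search target image is at or below the threshold.

    samples: list of (position, image) pairs, one per sample frame.
    query:   the search target image.
    """
    return [pos for pos, img in samples if sad(img, query) <= threshold]
```

Returning positions rather than images matches the option described above of recording the position information of matching sample frames as the search result.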

[0027] In a video playback method/apparatus according to the present invention, at least position information indicating the position on the time axis of the video frame corresponding to each sample image frame is described as sample image information on a sample image frame group obtained by sampling a video frame group at an arbitrary temporal interval and an arbitrary spatial size, and information on the amount of screen change between frames of the video frame group is also described as supplementary information. Variable-speed playback of the video is performed with the sample image frame group by varying the positions from which sample image frames are taken according to the screen change amount information.

[0028] That is, by slowing playback where the screen change amount is large and speeding it up where the screen change amount is small, easy-to-view variable-speed playback that keeps the screen change amount constant can be realized on the sample image frames.
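One way to read this is as a frame selection rule: a sample frame is displayed each time a fixed amount of screen change has accumulated. The sketch below follows that reading; the per-frame change values, the `budget` parameter, and the function name `select_frames` are assumptions for illustration, and the specification does not prescribe this particular rule.

```python
def select_frames(change, budget):
    """Pick sample frames so that roughly `budget` of screen change
    accumulates between consecutive displayed frames.

    change: change[i] is the screen change amount between sample frame
            i - 1 and sample frame i (change[0] is ignored).
    budget: accumulated change that triggers display of the next frame.
    Returns the indices of the sample frames to display, in order.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    shown = [0] if change else []
    acc = 0.0
    for i in range(1, len(change)):
        acc += change[i]
        if acc >= budget:
            shown.append(i)
            acc = 0.0
    return shown
```

Displayed at a fixed rate, the selected frames skip quickly through stretches with little change and linger where the picture changes a lot.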

[0029]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0030] FIG. 1 shows the system architecture according to one embodiment of the present invention. The system broadly consists of a database 100, a video display engine 104, a search engine/sample image display engine 105, a controller 106, and a display unit 107. The database 100 contains three components: original video data 101 and spatiotemporal sample video metadata 102, both described in detail later, and a correspondence table 103 (which may instead be a correspondence function table) that associates the two.

[0031] The database 100 may be located centrally in one place or distributed over several places; what matters is that the video display engine 104 and the search engine/sample image display engine 105 can access it. The original video data 101 and the spatiotemporal sample video metadata 102 may be stored on separate media or on the same medium. A DVD or the like is used as the medium, for example. The original video data 101 need not be stored in one place and may instead be data transmitted over a network.

[0032] Under the control of the controller 106, the video display engine 104 performs processing for displaying the original video data 101 on the display unit 107. In addition, when the search engine/sample image display engine 105 has searched the original video data 101 on the basis of the spatiotemporal sample video metadata 102, the video display engine 104 also performs processing for displaying the retrieved portion of the original video data 101 on the display unit 107.

[0033] Under the control of the controller 106, the search engine/sample image display engine 105 retrieves, from the spatiotemporal sample video metadata 102 described in detail later, appropriate sample image frames near a desired frame of the original video data 101 and displays them on the display unit 107 as representative frames. It also searches the original video data 101 via the controller 106 using the spatiotemporal sample video metadata 102.

[0034] The difference between the search engine/sample image display engine 105 and the video display engine 104 is as follows. The former processes the sample image frame group in the small spatiotemporal sample video metadata 102, so it achieves sufficient processing speed even when implemented as software on a low-performance PC built into the receiving equipment.

[0035] The latter processes the original video data 101, which is MPEG-2 video data or analog video data, and therefore often requires special hardware. Specifically, when the original video data 101 is MPEG-2 compressed video data, a dedicated decoder board (MPEG-2 decoder) is used for the video display engine 104; when the original video data 101 is an analog video signal, a video playback device such as a VTR whose fast-forward and rewind can be controlled is used as the video display engine 104.

[0036] When the original video data 101 is MPEG-1 or MPEG-4 compressed video data, the video display engine 104 can also be implemented in software on a PC, and the two engines need not be separated in the system architecture.

[0037] The lines connecting the correspondence table 103 to the elements above and below it are conceptual; the correspondence table 103 need not be physically connected to the original video data 101 or the spatiotemporal sample video metadata 102. Accordingly, the medium storing the original video data 101 may be housed in the same enclosure as the video display engine 104, and the medium storing the spatiotemporal sample video metadata 102 may be housed in the same enclosure as the search engine/sample image display engine 105.

[0038] Even if the medium storing the spatiotemporal sample video metadata 102 and the search engine/sample image display engine 105 are located apart from each other, a line with relatively small transmission capacity, for example a 10 Mbps network, is sufficient to connect them. By contrast, the line connecting the medium storing the original video data 101 to the video display engine 104 may need to be 100 Mbps or faster, depending on the type of media.

[0039] The advantage of the system architecture shown in FIG. 1 is that searches are performed not on the original video data 101 but on the spatiotemporal sample video metadata 102, whose data size is smaller, so that interactive operations are comfortable and overall traffic is kept low.

[0040] FIG. 2 is a conceptual diagram of the original video data 101 and the spatiotemporal sample video metadata 102. The original video data 101 is digital video data compressed with MPEG-1, MPEG-2, MPEG-4, or the like, or analog data, and consists of a set of video frames (a video frame group) making up a moving picture. Each video frame of the original video data 101 is associated with position information indicating its position on the time axis, for example a media time (hereinafter simply "time") or a frame number. The original video data 101 and the spatiotemporal sample video metadata 102 are associated by time or frame number through the correspondence table 103.

[0041] The spatiotemporal sample video metadata 102 consists mainly of sample image information 201_1 to 201_n and, in this embodiment, also includes scene change position information 202 and screen change amount information 203 as supplementary information.

[0042] The sample image information 201_1 to 201_n consists of the sample image frame group obtained by sampling the video frame group of the original video data 101 at an arbitrary temporal interval and an arbitrary spatial size, together with attribute information for identifying each sample image frame: position information (time or frame number) indicating the position on the time axis of the original video frame corresponding to the sample image frame, size information indicating the size of the sample image frame, and so on. Of this attribute information, the former, that is, the position information (time or frame number) of the corresponding original video frame, is described by referring to the correspondence table 103.

[0043] When the original video data 101 is already digitized, as with compressed digital video data, the sample image frames in the sample image information 201_1 to 201_n of the spatiotemporal sample video metadata 102 are created by decoding, or partially decoding, the desired frames of the original video data 101. When the original video data 101 is analog data, it is digitized first and the sample image frame group is then created.

[0044] Next, for the case where the original video data 101 is MPEG-2 compressed video data, the former kind of attribute information, that is, the position information (time or frame number) indicating the position on the time axis of the original video frame corresponding to each sample image frame, is explained. In this case, the original video data 101, which is MPEG-2 compressed video data, is decoded, and the sample image frame group 201_1 to 201_n is created by, for example, taking one frame in every 30 and reducing its size to 1/8 both vertically and horizontally. Instead of creating the sample image frame group with fixed temporal and spatial sampling in this way, the sampling can also be varied as appropriate. It is also effective to sample coarsely in time where the screen change amount is small and finely in time where it is large.
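The fixed sampling case can be made concrete with a short sketch. It assumes frame numbers start at 0 and uses a hypothetical helper `sample_plan`; the 720 × 480 picture size in the usage note is only an example and does not come from the specification.

```python
def sample_plan(total_frames, height, width, step=30, scale=8):
    """Return the original-frame numbers to sample and the sample frame size.

    step:  temporal interval (one frame in every `step` frames).
    scale: spatial reduction factor applied vertically and horizontally.
    """
    if step <= 0 or scale <= 0:
        raise ValueError("step and scale must be positive")
    positions = list(range(0, total_frames, step))
    size = (height // scale, width // scale)  # (height, width)
    return positions, size
```

For a 90-frame clip of 480 × 720 pictures this gives sample frames at 0, 30, and 60, each 60 × 90 pixels. Taking one frame in 30 and 1/64 of the pixels of each reduces the raw pixel count by a factor of 30 × 64 = 1920.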

[0045] MPEG-2 compressed video data intermittently contains frames called I pictures (intra-coded frames), which are compressed using only correlation within the frame. Because I pictures are not compressed using inter-frame correlation, unlike P pictures (forward-predicted inter-coded frames) and B pictures (bidirectionally predicted inter-coded frames), they are easy to decode. Therefore, if only the I pictures of the original video data 101 are decoded when creating the sample image frame group, and moreover only the DC component of the DCT (discrete cosine transform) coefficients of each I picture, a temporally and spatially sampled sample image frame group can be obtained more easily.
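The reason DC-only decoding yields a spatially sampled image is that the DC coefficient of an 8 × 8 DCT block is proportional to the mean pixel value of that block, so keeping one value per block gives a picture reduced to 1/8 in each direction. The sketch below imitates that result on an already decoded grayscale image by averaging each block; it does not parse an MPEG-2 bitstream, and the name `dc_thumbnail` is hypothetical.

```python
def dc_thumbnail(image, block=8):
    """Reduce a grayscale image by replacing each block x block tile with
    its mean value, which is what a DC-only decode would approximate.

    image: nested list of pixel rows; partial tiles at the right and
           bottom edges are dropped in this sketch.
    """
    height, width = len(image), len(image[0])
    out = []
    for by in range(0, height - height % block, block):
        row = []
        for bx in range(0, width - width % block, block):
            total = sum(
                image[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            )
            row.append(total // (block * block))
        out.append(row)
    return out
```

In a real decoder the DC values would be read from the coded I picture directly, which avoids the inverse DCT and the motion compensation needed for P and B pictures.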

[0046] I pictures are not guaranteed to occur at a constant frame interval, but using I pictures is an effective way to create a temporally and spatially sampled sample image frame group from MPEG-2 compressed video data at a speed at or above the video rate.

[0047] Because creating the sample image frame group from I pictures in this way requires little processing, it has the advantage that it can be done with software on a PC alone, without special hardware. Also, when the sample image frame group is created from the original video data 101 over a network, using I pictures makes it easy to avoid the problem of increased traffic.

[0048] The spatial sampling of the original video data 101 when creating the sample image frame group likewise need not be fixed and can be varied as appropriate; in some cases frames may be enlarged rather than reduced, particularly frames showing important scenes. As described above, the sample image information 201 contains the sample image frame group and the attribute information of the sample image frames, and the attribute information includes the size information of the sample image frames, so at search or display time each sample image frame can be converted to the desired size before use.

【００４９】図３に、標本画像情報２０１の具体的な記
述例を示す。標本画像情報は標本画像フレーム群の各フ
レーム毎に記述される情報であり、この例では（１）当
該標本画像フレームに対応する元映像データのフレーム
番号または時間、（２）当該標本画像フレームの大きさ
（高さ×幅）、（３）次の標本画像フレームまでのフレ
ーム数または時間、（４）ＪＰＥＧ，ＲＧＢ，ＹＵＶと
いった標本画像フレームの画像形式、（５）標本画像フ
レームの画像データ（または元映像データ１０１へのポ
インタ）からなっている。ここで、（３）、（４）、
（５）は必須でなく、いずれかが省略されていてもよ
い。また、（１）〜（５）以外の付加的情報がさらに含
まれていてもよい。FIG. 3 shows a specific description example of the sample image information 201. The sample image information is described for each frame of the sample image frame group. In this example it consists of (1) the frame number or time of the original video data corresponding to the sample image frame, (2) the size (height × width) of the sample image frame, (3) the number of frames or time to the next sample image frame, (4) the image format of the sample image frame, such as JPEG, RGB, or YUV, and (5) the image data of the sample image frame (or a pointer to the original video data 101). Items (3), (4), and (5) are not essential, and any of them may be omitted. Additional information other than (1) to (5) may also be included.
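As an illustration only, one record of the sample image information listed above could be modelled as follows. This is a minimal sketch: the class name `SampleImageInfo` and all field names are hypothetical and do not appear in the patent, and items (3) to (5) are made optional because the text says they may be omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SampleImageInfo:
    """One record of sample image information (hypothetical field names)."""
    source_frame: int                      # (1) frame number (or time) in the original video
    size: Tuple[int, int]                  # (2) (height, width) of the sample image frame
    frames_to_next: Optional[int] = None   # (3) optional: distance to the next sample frame
    image_format: Optional[str] = None     # (4) optional: e.g. "JPEG", "RGB", "YUV"
    image_data: Optional[bytes] = None     # (5) optional: pixel data or a pointer to the original


# A reduced 45x60 sample taken from original frame 120; items (3) to (5) are omitted.
record = SampleImageInfo(source_frame=120, size=(45, 60))
```

Keeping the size in every record is what allows frames of different sizes to coexist in one sample image frame group.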

【００５０】標本画像フレーム群を時間的に連続したフ
レーム群からなる映像データ（後述する標本映像）とし
て扱ってもよい。その映像データを例えばＡＶＩファイ
ルやＭＰＥＧ−４ファイルに圧縮することにより、さら
にコンパクトにすることも可能である。その場合、その
映像データは、元映像データ１０１の映像フレームへの
ファイルポインタとフレーム番号となる。従って、その
映像データから任意のフレームの画像を取得するための
インタフェースが必要となる。The sample image frame group may be treated as video data (sample video described later) composed of a temporally continuous frame group. By compressing the video data into, for example, an AVI file or an MPEG-4 file, it is possible to further reduce the size. In that case, the video data is a file pointer to the video frame of the original video data 101 and a frame number. Therefore, an interface for acquiring an image of an arbitrary frame from the video data is required.

【００５１】図４は、メタデータ１０２の管理構造を示
している。この例では、標本画像情報２０１_１，２０１
_２、…を管理するためにリスト構造を利用している。Ｒ
ｏｏｔ４０１からフレーム番号の小さい順に標本画像情
報２０１_１，２０１_２、…へのポインタとなるリスト４
０２、４０３、４０４、４０５をつなげてゆき、Ｅｎｄ
４０６が最終のフラグとなる。リスト４０２，４０３，
４０４，４０５の番号ＩＤ１，ＩＤ２，ＩＤ３，ＩＤ４
は概念的なもので、この順番に各リスト４０２，４０
３，４０４，４０５が並んでいることを意味する。この
例ではリスト４０２，４０３、４０４，４０５から実際
の標本画像情報２０１_１，２０１_２、２０２_３、２０２
_４のある場所を指し示しポインタが張られている。FIG. 4 shows the management structure of the metadata 102. In this example, a list structure is used to manage the sample image information 201_1, 201_2, and so on. Starting from Root 401, the list nodes 402, 403, 404, 405, each of which holds a pointer to sample image information 201_1, 201_2, ..., are linked in ascending order of frame number, and End 406 serves as the terminating flag. The numbers ID1, ID2, ID3, ID4 of list nodes 402, 403, 404, 405 are conceptual and simply indicate that the nodes are arranged in this order. In this example, each of the list nodes 402 to 405 holds a pointer to the location of the corresponding actual sample image information.

【００５２】このようなリスト構造にすると、標本画像
情報の追加・削除が容易である。新しい標本画像フレー
ムを追加するときは、フレーム番号を順に調べてゆき、
フレーム番号の大小が逆転しないように標本画像情報の
追加を行う。標本画像フレームを削除するときは、対応
する標本画像情報をリストから取り外せばよい。With such a list structure, sample image information is easy to add and delete. To add a new sample image frame, the frame numbers are examined in order and the sample image information is inserted at a position that keeps the frame numbers in ascending order. To delete a sample image frame, the corresponding sample image information is simply unlinked from the list.
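The ordered add and delete operations described above can be sketched as follows. Note the substitution: the patent describes a linked list of pointers, whereas this sketch keeps a sorted Python list and uses the standard `bisect` module to find the insertion point. The class name `SampleList` and its methods are hypothetical.

```python
import bisect


class SampleList:
    """Keeps sample records ordered by original frame number."""

    def __init__(self):
        self._frames = []   # sorted original frame numbers
        self._records = []  # records, parallel to _frames

    def add(self, frame, record):
        # Find the position that keeps frame numbers in ascending order.
        pos = bisect.bisect_left(self._frames, frame)
        self._frames.insert(pos, frame)
        self._records.insert(pos, record)

    def remove(self, frame):
        pos = bisect.bisect_left(self._frames, frame)
        if pos == len(self._frames) or self._frames[pos] != frame:
            raise KeyError(frame)
        del self._frames[pos]
        del self._records[pos]

    def frames(self):
        return list(self._frames)


samples = SampleList()
for f in (0, 15, 30):
    samples.add(f, "reduced I-picture image")
samples.add(22, "full-size scene change frame")  # added later, lands between 15 and 30
samples.remove(15)
```

After these calls `samples.frames()` is `[0, 22, 30]`, so a frame registered later still ends up in frame-number order.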

【００５３】このように標本画像情報２０１の管理をリ
スト構造として追加・削除を容易にする理由は、標本画
像フレームとして最初に決められたものだけでなく、後
から追加したい場合が多々あることを考慮している。例
えば、ＭＰＥＧ−２圧縮映像データのＩピクチャを標本
画像フレームとして登録した後に、ＭＰＥＧ−２圧縮映
像のシーンチェンジ位置を検出し、そのシーンチェンジ
位置のフレームを標本画像フレームとして登録したい場
合が生じる。この場合、先に述べたＩピクチャからの標
本画像フレームについてはＤＣ成分のみからなる縮小画
像として登録し、シーンチェンジ位置の標本画像フレー
ムについては重要なフレームなので、フルサイズの画像
フレームとして登録することも可能である。The sample image information 201 is managed as a list structure, which makes addition and deletion easy, because it is often desirable to add sample image frames later in addition to those chosen initially. For example, after the I pictures of MPEG-2 compressed video data have been registered as sample image frames, the scene change positions of the MPEG-2 compressed video may be detected, and the frames at those positions may then need to be registered as sample image frames as well. In this case, the sample image frames taken from I pictures, as described above, can be registered as reduced images consisting only of the DC components, while the sample image frames at scene change positions, being important frames, can be registered as full-size image frames.

【００５４】なお、標本画像情報の他の記述例について
は、後述する。Other examples of description of the sample image information will be described later.

【００５５】次に、図５を用いて標本画像情報２０１の
記述方法の具体的な手順を元映像データ１０１がＭＰＥ
Ｇ−２圧縮映像データである場合を例にとり説明する。
図５は、標本画像情報２０１の記述を含む時・空間標本
映像メタデータ１０２の記録手順を示すフローチャート
である。Next, a specific procedure for describing the sample image information 201 is explained with reference to FIG. 5, taking as an example the case where the original video data 101 is MPEG-2 compressed video data. FIG. 5 is a flowchart showing the procedure for recording the spatiotemporal sample video metadata 102, which includes the description of the sample image information 201.

【００５６】先ず、元映像データ１０１の映像フレーム
群を読み込み（ステップＳ１１）、元映像フレームを時
間的にサンプリングする（ステップＳ１２）。元映像デ
ータのシーンチェンジ位置を検出する（ステップＳ１
３）。シーンチェンジ位置は、例えば読み込んだ元映像
データ１０１の隣接するフレーム間の画面変化量を計算
し、一定値以上の変化があったところをシーンチェンジ
位置として検出する。First, the video frame group of the original video data 101 is read (step S11), and the original video frames are sampled temporally (step S12). The scene change positions of the original video data are detected (step S13). A scene change position is detected, for example, by calculating the amount of screen change between adjacent frames of the read original video data 101 and taking each point where the change is equal to or greater than a fixed value as a scene change position.
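The threshold test used for scene change detection in step S13 can be sketched as follows. This is only an illustration under assumptions: frames are taken to be equally sized grayscale images stored as lists of rows, the screen change amount is taken to be the sum of absolute pixel differences, and the function names are hypothetical.

```python
def frame_change(a, b):
    """Sum of absolute pixel differences between two equally sized grayscale frames."""
    return sum(abs(pa - pb)
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def detect_scene_changes(frames, threshold):
    """Return the indices i where frame i differs from frame i-1 by at least threshold."""
    return [i for i in range(1, len(frames))
            if frame_change(frames[i - 1], frames[i]) >= threshold]
```

A suitable threshold depends on the image size and content, so it would have to be tuned for the material at hand.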

【００５７】ステップＳ１２での元映像データ１０１の
時間的なサンプリングは、例えば動きの大きい領域は細
かく、動きの小さい領域は粗くサンプリングする等の処
理を行うこともできる。この例では元映像データ１０１
がＭＰＥＧ−２圧縮映像データであるため、ステップＳ
１２においては標本画像フレームを作成するためにＩピ
クチャを抽出し、また画面変化量を検出するためにＰピ
クチャを抽出する。The temporal sampling of the original video data 101 in step S12 may, for example, sample portions with large motion finely and portions with small motion coarsely. In this example the original video data 101 is MPEG-2 compressed video data, so in step S12 I pictures are extracted to create the sample image frames and P pictures are extracted to detect the amount of screen change.

【００５８】次に、ステップＳ１２で抽出されたＩピク
チャを空間的にサンプリングすることにより、一枚の標
本画像フレームを作成する（ステップＳ１４）。より具
体的には、ステップＳ１４では主としてＩピクチャの画
素間引きを行って、縮小画像からなる標本画像フレーム
を作成する。但し、Ｉピクチャがシーンチェンジ位置の
ような重要なフレームであれば、縮小せずに元映像デー
タのフレームをそのまま標本画像フレームとするか、場
合によっては画素補間により拡大を行って標本画像フレ
ームを作成してもよい。Next, each I picture extracted in step S12 is sampled spatially to create one sample image frame (step S14). More specifically, step S14 mainly thins out the pixels of the I picture to create a sample image frame consisting of a reduced image. However, if the I picture is an important frame, such as one at a scene change position, the frame of the original video data may be used as the sample image frame without reduction, or in some cases the sample image frame may be created by enlarging the frame through pixel interpolation.
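Pixel thinning as used in step S14 can be illustrated with plain decimation. This is a sketch, assuming a decoded grayscale frame stored as a list of rows; the function name `thin_pixels` is hypothetical, and no low-pass filtering is applied before the pixels are dropped.

```python
def thin_pixels(frame, step):
    """Keep every step-th pixel in both directions (simple decimation, no filtering)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return [row[::step] for row in frame[::step]]
```

With `step=1` the frame is kept at full size, which corresponds to registering an important frame without reduction.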

【００５９】一方、ステップＳ１２で抽出されたＰピク
チャから画面変化量、つまり隣接する画面間の画像の変
化の大きさの情報を取得する（ステップＳ１５）。Ｐピ
クチャには、前フレームからの動きベクトルの情報がサ
イド情報として付加されているので、この動きベクトル
の大きさや分布から画面変化量を求めることができる。On the other hand, from the P picture extracted in step S12, information on the amount of screen change, that is, information on the magnitude of image change between adjacent screens is obtained (step S15). Since the information of the motion vector from the previous frame is added to the P picture as side information, the screen change amount can be obtained from the size and distribution of the motion vector.

【００６０】次に、ステップＳ１４で作成された標本画
像フレームを必要に応じて圧縮加工した後（ステップＳ
１６）、この圧縮標本画像フレームと、ステップＳ１３
で検出されたシーンチェンジ位置、およびステップＳ１
５で取得された画面変化量の情報を用いて、図２、図３
に示すような時・空間標本映像メタデータ１０２を記録
し（ステップＳ１７）、処理は終了する。Next, the sample image frame created in step S14 is compressed as necessary (step S16). Using this compressed sample image frame, the scene change positions detected in step S13, and the screen change amount information acquired in step S15, the spatiotemporal sample video metadata 102 shown in FIGS. 2 and 3 is recorded (step S17), and the process ends.

【００６１】すなわち、ステップＳ１７では時・空間標
本映像メタデータ１０２として、図２に示したように標
本画像情報２０１、シーンチェンジ位置情報２０２、画
面変化量情報２０３の３つの情報を記録する。また、標
本画像情報２０１は、図３に示したように、（１）当該
標本画像フレームに対応する元映像データのフレーム番
号または時間、（２）当該標本画像フレームの大きさ
（高さ×幅）、（３）次の標本画像フレームまでのフレ
ーム数または時間、（４）ＪＰＥＧ，ＲＧＢ，ＹＵＶと
いった標本画像フレームの画像形式、（５）標本画像フ
レームの画像データ（または元映像データ１０１へのポ
インタ）を含んでいる。（５）の標本画像フレームの画
像データとは、この例ではステップＳ１２で抽出され、
ステップＳ１４で空間的なサンプリングが施され、さら
に必要に応じてステップＳ１６で圧縮加工された、ある
いは圧縮加工されないＩピクチャの画像データである。That is, in step S17 three pieces of information are recorded as the spatiotemporal sample video metadata 102, as shown in FIG. 2: the sample image information 201, the scene change position information 202, and the screen change amount information 203. As shown in FIG. 3, the sample image information 201 includes (1) the frame number or time of the original video data corresponding to the sample image frame, (2) the size (height × width) of the sample image frame, (3) the number of frames or time to the next sample image frame, (4) the image format of the sample image frame, such as JPEG, RGB, or YUV, and (5) the image data of the sample image frame (or a pointer to the original video data 101). In this example, the image data in (5) is the image data of an I picture that was extracted in step S12, sampled spatially in step S14, and then either compressed in step S16 as necessary or left uncompressed.

【００６２】次に、このようにして記録された時・空間
標本映像メタデータ１０２の利用形態について説明す
る。Next, the use of the spatiotemporal sample video metadata 102 thus recorded will be described.

【００６３】（１）シーンチェンジ位置情報を用いた標
本画像フレームの検索所望の映像フレームを表示したい場合、その所望映像フ
レームを元映像データ１０１から直接検索しようとする
と、処理に時間がかかってしまうことは前述した通りで
ある。その代わり、元映像データをサンプリングして得
られた時・空間標本映像メタデータ１０２を検索するこ
とにより所望フレームを探せば、処理時間が短くて済
む。しかし、標本画像フレーム群は時間的にサンプリン
グされているので、所望フレームの画像が含まれている
とは限らない。そこで、最も簡単には所望フレームに時
間的に最も近い標本画像フレームを検索して表示すれば
よいことになる。図２の例では、破線で示す所望フレー
ムに、時間的に最も近い標本画像情報２０１_ｎの標本画
像フレームを表示用の画像フレームとする。(1) Retrieval of Sample Image Frames Using Scene Change Position Information
As noted earlier, searching the original video data 101 directly for a desired video frame to display takes a long time. Searching instead in the spatiotemporal sample video metadata 102 obtained by sampling the original video data shortens the processing time. However, because the sample image frame group is sampled temporally, it does not necessarily contain the image of the desired frame. The simplest approach is therefore to retrieve and display the sample image frame temporally closest to the desired frame. In the example of FIG. 2, the sample image frame of the sample image information 201_n, which is temporally closest to the desired frame indicated by the broken line, is used as the display image frame.

【００６４】この場合、標本画像フレーム群がどの程度
の時間サンプリングで作成されているかによって、所望
フレームと表示用の画像フレームとのずれが決まる。こ
のずれは標本画像フレーム群が十分に短い間隔で時間サ
ンプリングされていれば小さいので、ほとんど問題はな
い。しかし、シーンチェンジがある場合は、必ずしも所
望フレームに時間的に最も近い標本画像フレームが表示
用の画像フレームとして適当とは限らない。すなわち、
所望フレームとそれに最も近い標本画像情報２０１_ｎに
含まれる標本画像フレームとの間にシーンチェンジがあ
ると、標本画像情報２０１_ｎよりも一つ前の標本画像情
報２０１_ｎ−１に含まれる標本画像フレームの方が所望
フレームに類似する表示用の画像フレームとして適当で
ある。本実施形態によれば、図２に示したように時・空
間標本映像メタデータ１０２にシーンチェンジ位置情報
２０２を付帯情報として付加することによって、この問
題を解決することができる。In this case, the offset between the desired frame and the displayed image frame depends on the temporal sampling interval used to create the sample image frame group. The offset is small, and causes almost no problem, if the sample image frame group is sampled at sufficiently short intervals. When there is a scene change, however, the sample image frame temporally closest to the desired frame is not necessarily appropriate as the display image frame. That is, if a scene change lies between the desired frame and the sample image frame contained in the closest sample image information 201_n, the sample image frame contained in the preceding sample image information 201_n−1 is more appropriate as a display image frame resembling the desired frame. According to the present embodiment, this problem is solved by adding the scene change position information 202 to the spatiotemporal sample video metadata 102 as additional information, as shown in FIG. 2.

【００６５】図６に示すフローチャートを参照して、上
記のようにしてシーンチェンジ情報２０２を用いて所望
フレームを代表する標本画像フレームを検索する処理手
順を説明する。なお、ここではシーンチェンジ情報２０
２は、シーンチェンジ位置のフレーム番号（シーンチェ
ンジフレーム番号という）で表されるものとする。Referring to the flowchart in FIG. 6, the procedure for using the scene change information 202 as described above to retrieve a sample image frame that represents the desired frame is explained. Here the scene change information 202 is assumed to be expressed as the frame number of the scene change position (called the scene change frame number).

【００６６】先ず、検索したい所望フレームのフレーム
番号が与えられると、当該フレーム番号に一番近いシー
ンチェンジフレーム番号を検索する（ステップＳ２
１）。First, when the frame number of the desired frame to be retrieved is given, the scene change frame number closest to that frame number is searched for (step S21).

【００６７】次に、（元映像データの）開始フレーム番
号からステップＳ２１で検索されたシーンチェンジフレ
ーム番号までの間に所望のフレーム番号が存在するかど
うかを判定する（ステップＳ２２）。Next, it is determined whether or not a desired frame number exists between the start frame number (of the original video data) and the scene change frame number searched in step S21 (step S22).

【００６８】ステップＳ２２の判定の結果、所望のフレ
ーム番号が開始フレーム番号〜シーンチェンジフレーム
番号の間にあることが分かれば、開始フレーム番号〜シ
ーンチェンジフレーム番号の間において、所望のフレー
ム番号に時間的に（あるいは空間的に）最も近い標本画
像フレームを検索する（ステップＳ２３）。If step S22 finds that the desired frame number lies between the start frame number and the scene change frame number, the sample image frame temporally (or spatially) closest to the desired frame number is searched for within the range from the start frame number to the scene change frame number (step S23).

【００６９】また、ステップＳ２２の判定の結果、所望
のフレーム番号が開始フレーム番号〜シーンチェンジフ
レーム番号の間にないことが分かれば、シーンチェンジ
フレーム番号〜（元映像データの）最終フレーム番号の
間において、所望のフレーム番号に時間的に（あるいは
空間的に）最も近い標本画像フレームを検索する（ステ
ップＳ２４）。If step S22 finds that the desired frame number does not lie between the start frame number and the scene change frame number, the sample image frame temporally (or spatially) closest to the desired frame number is searched for within the range from the scene change frame number to the last frame number (of the original video data) (step S24).

【００７０】そして、検索された標本画像フレームを所
望フレームに最も類似する画像として表示し（ステップ
Ｓ２５）、処理は終了する。Then, the retrieved sample image frame is displayed as an image most similar to the desired frame (step S25), and the process ends.
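Steps S21 to S24 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes that a scene change frame number marks the first frame of the new scene, and it adds a fallback that the text does not describe for the case where no sample frame lies on the chosen side of the cut. The function and parameter names are hypothetical.

```python
def find_representative(desired, sample_frames, scene_changes, first_frame, last_frame):
    """Pick the sample frame nearest to `desired` on the same side of the nearest cut.

    sample_frames and scene_changes are lists of original frame numbers.
    Assumption: a scene change frame number is the first frame of the new scene.
    """
    if not scene_changes:
        lo, hi = first_frame, last_frame
    else:
        # S21: scene change frame number closest to the desired frame.
        cut = min(scene_changes, key=lambda c: abs(c - desired))
        # S22: does the desired frame lie before the cut?
        if desired < cut:
            lo, hi = first_frame, cut - 1   # S23: search before the cut
        else:
            lo, hi = cut, last_frame        # S24: search from the cut onward
    candidates = [f for f in sample_frames if lo <= f <= hi]
    if not candidates:
        # Fallback (assumption): no sample on that side, so use all samples.
        candidates = sample_frames
    return min(candidates, key=lambda f: abs(f - desired))
```

For example, with sample frames at 0, 30, 60, 90 and a cut at frame 55, a request for frame 50 yields frame 30 rather than the numerically closer frame 60, which belongs to the next scene.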

【００７１】（２）標本画像フレームの検索次に、図７に示すフローチャートを参照して、時・空間
標本映像メタデータ１０２を対象としてある画像に類似
する画像を検索する手順について説明する。(2) Retrieval of Sample Image Frames
Next, referring to the flowchart in FIG. 7, the procedure for searching the spatiotemporal sample video metadata 102 for images similar to a given image is explained.

【００７２】先ず、検索対象画像Ｒ、つまり検索してほ
しい画像を提示する（ステップＳ３１）。First, a search target image R, that is, an image to be searched is presented (step S31).

【００７３】次に、時・空間標本映像メタデータ１０２
から標本画像フレームを順次１枚ずつ取得する（ステッ
プＳ３３）。Next, sample image frames are acquired one at a time from the spatiotemporal sample video metadata 102 (step S33).

【００７４】検索対象画像ＲをステップＳ３３で取得さ
れた標本画像フレームの大きさに正規化する（ステップ
Ｓ３４）。これは、標本画像フレームは一枚一枚サイズ
が異なるからである。The search target image R is normalized to the size of the sample image frame acquired in step S33 (step S34). This is because the sample image frames have different sizes one by one.

【００７５】ステップＳ３３で取得された標本画像フレ
ームと、ステップＳ３４で正規化された検索対象画像Ｒ
との間の類似度、例えば画素毎の絶対値差分の合計を計
算する（ステップＳ３５）。The similarity between the sample image frame acquired in step S33 and the search target image R normalized in step S34, for example the sum of the per-pixel absolute differences, is calculated (step S35).

【００７６】この絶対値差分の合計がある閾値以下かど
うかを判定する（ステップＳ３６）。ステップＳ３６の
判定の結果、絶対値差分の合計が閾値以下ならば、ステ
ップＳ３３で取得された標本画像フレームが検索対象画
像Ｒとほぼ同じであると判断して、その標本画像フレー
ムのフレーム番号を検索結果として記録する（ステップ
Ｓ３７）。It is determined whether the sum of absolute differences is at or below a threshold (step S36). If it is, the sample image frame acquired in step S33 is judged to be substantially the same as the search target image R, and the frame number of that sample image frame is recorded as a search result (step S37).

【００７７】以上の一連の処理をステップＳ３２で全て
の標本画像フレームが取得されたと判定されるまで繰り
返して、処理は終了する。The above series of processing is repeated until it is determined in step S32 that all the sample image frames have been obtained, and the processing ends.
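The loop of FIG. 7 can be sketched as follows. This is an illustration under assumptions: images are grayscale lists of rows, the search target image is normalized to each sample's size by nearest-neighbour sampling (the patent does not specify the resizing method), and all function names are hypothetical.

```python
def resize_nearest(image, height, width):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def sad(a, b):
    """Sum of absolute pixel differences of two equally sized images."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def search_similar(query, samples, threshold):
    """samples: list of (frame_number, image). Returns the matching frame numbers."""
    hits = []
    for frame_number, image in samples:                        # S32/S33: next sample
        q = resize_nearest(query, len(image), len(image[0]))   # S34: normalize the query
        if sad(image, q) <= threshold:                         # S35/S36: compare
            hits.append(frame_number)                          # S37: record the hit
    return hits
```

Because the sample frames can differ in size, a fixed threshold on the raw sum treats large and small frames differently. Dividing the sum by the pixel count would be one way to compensate, though the text does not say whether this is done.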

【００７８】次に、図７のフローチャートに示した手順
に従った処理の終了後、以下のようにして検索結果を表
示する。Next, after the processing according to the procedure shown in the flowchart of FIG. 7 is completed, the search result is displayed as follows.

【００７９】ステップＳ３７で検索結果として得られた
標本画像フレームのフレーム番号に基づき、検索された
標本画像フレームを図１における検索エンジン／標本画
像表示エンジン１０５によって表示部１０７で表示す
る。Based on the frame number of the sample image frame obtained as the search result in step S37, the searched sample image frame is displayed on the display unit 107 by the search engine / sample image display engine 105 in FIG.

【００８０】あるいは、ステップＳ３７で検索結果とし
て得られた標本画像フレームのフレーム番号に基づき、
元映像データ１０１をそのフレーム番号の位置から再生
したい場合は、図１に示した対応テーブル１０３（また
は対応関数テーブル）を用いて、その標本画像フレーム
のフレーム番号に対応する元映像データ１０１のフレー
ム番号を調べる。そして、コントローラ１０６にそのフ
レーム番号の情報と表示コマンドを送ることにより、映
像表示エンジン１０４を用いて元映像データ１０１の該
当フレームから再生を行い、表示部１０７で表示する。Alternatively, to play the original video data 101 from the position of the sample image frame obtained as a search result in step S37, the frame number of the original video data 101 corresponding to the frame number of that sample image frame is looked up using the correspondence table 103 (or correspondence function table) shown in FIG. 1. That frame number and a display command are then sent to the controller 106, and the video display engine 104 plays the original video data 101 from that frame and displays it on the display unit 107.

【００８１】（３）時・空間標本映像メタデータを用い
た早送り再生図２に示したように、本実施形態では時・空間標本化ビ
デオメタデータ１０２には、標本画像情報２０１以外の
付帯情報として、シーン位置情報２０２のほか、画面変
化量情報２０３も記述されている。(3) Fast-Forward Playback Using Spatiotemporal Sample Video Metadata
As shown in FIG. 2, in this embodiment the spatiotemporal sample video metadata 102 describes, as additional information besides the sample image information 201, the screen change amount information 203 as well as the scene change position information 202.

【００８２】画面変化量情報２０３は、元映像データ１
０１の飛び飛びに存在する映像フレーム間の画面変化量
を示す情報であり、例えばフレーム間の絶対値差分の合
計を用いたり、また元映像データ１０１がＭＰＥＧ圧縮
映像データであれば、フレーム間動き補償のデータから
画面全体の動きベクトルの大きさの平均（平均パワー）
を計算して求めることができる。このような画面変化量
情報２０３を時・空間標本化ビデオメタデータ１０２に
付帯させることより、高度な可変速再生を行うことがで
きる。The screen change amount information 203 indicates the amount of screen change between video frames located at intervals in the original video data 101. It can be obtained, for example, as the sum of absolute differences between frames or, if the original video data 101 is MPEG compressed video data, by calculating the average magnitude (average power) of the motion vectors over the whole screen from the inter-frame motion compensation data. Attaching such screen change amount information 203 to the spatiotemporal sample video metadata 102 enables sophisticated variable-speed playback.
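The motion-vector variant of the screen change amount can be sketched as follows. This is a minimal illustration, assuming the motion vectors of one picture have already been extracted as `(dx, dy)` pairs. The function name is hypothetical, and "average power" is read here as the mean vector length, although it could equally mean the mean squared length.

```python
import math


def mean_motion_magnitude(motion_vectors):
    """Average length of (dx, dy) motion vectors; 0.0 for an empty list."""
    if not motion_vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in motion_vectors) / len(motion_vectors)
```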

【００８３】特開平１０−２４３３５１号（特願平０９
−０４２６３７号）「映像再生装置」で述べられている
ように、映像を画面の変化が大きいところではゆっくり
と再生し、画面の変化が小さいところでは速く再生する
ことにより、画面の変化量を一定にして見やすい早送り
再生を実現する技術は知られている。この特許では、各
フレーム毎に画面変化が存在し、かつ全てのフレームを
用いることが前提となっているが、本発明のように時間
的に離散した標本画像フレームを対象とし、かつ画面変
化量も時間的に離散して得られる場合については言及さ
れていない。そこで、本実施形態では時間的に離散した
標本画像フレームおよび画面変化量に対して同様の効果
が得られる可変速再生を実現する方法を提供する。JP-A-10-243351 (Japanese Patent Application No. 09-042637), "Video Playback Apparatus", describes a known technique that plays video slowly where the screen changes greatly and quickly where it changes little, thereby keeping the amount of screen change constant and achieving fast-forward playback that is easy to watch. That application assumes that a screen change amount exists for every frame and that all frames are used. It does not address the case, as in the present invention, where the sample image frames are temporally discrete and the screen change amounts are also obtained at discrete times. The present embodiment therefore provides a method of variable-speed playback that achieves a similar effect for temporally discrete sample image frames and screen change amounts.

【００８４】最初に、図８に示すフローチャートを参照
して、標本画像フレームを用いて可変速再生を行う場合
の基本的な処理手順を説明する。First, with reference to a flowchart shown in FIG. 8, a basic processing procedure when performing variable speed reproduction using a sample image frame will be described.

【００８５】先ず、可変速再生（この場合は、早送り再
生）を行う範囲の指定を行う（ステップＳ４１）。可変
速再生範囲の開始フレーム番号をＦｓ、終了フレーム番
号をＦｅとする。First, a range for performing variable speed reproduction (in this case, fast forward reproduction) is specified (step S41). Let Fs be the start frame number and Fe be the end frame number of the variable speed playback range.

【００８６】次に、再生速度倍率ｍ、つまり何倍速で早
送り再生を行うかを指定する（ステップＳ４２）。Next, the reproduction speed magnification m, that is, how many times the fast forward reproduction is to be performed is specified (step S42).

【００８７】次に、再生方向の指定、つまり早送り再生
を順方向再生で行うか逆方向再生で行うかの指定を行う
（ステップＳ４３）。Next, the reproduction direction is specified, that is, whether fast-forward reproduction is to be performed by forward reproduction or reverse reproduction (step S43).

【００８８】次に、標本画像フレームの再生フレームレ
ートｒ［フレーム／秒］を指定する（ステップＳ４
４）。再生フレームレートｒは、テレビジョン方式によ
って異なり、例えばＮＴＳＣの場合は３０［フレーム／
秒］、ＰＡＬの場合は２４［フレーム／秒］である。Next, the playback frame rate r [frames/second] of the sample image frames is specified (step S44). The playback frame rate r depends on the television system; for example, it is 30 [frames/second] for NTSC and 24 [frames/second] for PAL.

【００８９】ここで、元映像データ１０１のフレームレ
ートがＲ［フレーム／秒］であったとすると、これを基
に可変速再生時に標本画像フレーム群について読み飛ば
すフレーム数を後述のようにして計算する（ステップＳ
４５）。Here, given that the frame rate of the original video data 101 is R [frames/second], the number of frames to skip in the sample image frame group during variable-speed playback is calculated from it as described below (step S45).

【００９０】そして、再生フレームレートｒ［フレーム
／秒］で標本画像フレームの再生を行うために、１／ｒ
秒のサイクルで標本画像フレームを取得して表示を行う
（ステップＳ４６）。Then, to play the sample image frames at the playback frame rate r [frames/second], a sample image frame is acquired and displayed in cycles of 1/r seconds (step S46).

【００９１】順方向再生の場合は、フレームＦｓに対応
する標本画像フレーム番号から再生を開始し、フレーム
番号を増加させる方向で読み飛ばす。逆方向再生の場合
は、フレームＦｅに対応する標本画像フレームから再生
を開始し、フレ一ム番号を減少させる方向で読み飛ばす
ことになる。In the case of forward reproduction, reproduction is started from the sample image frame number corresponding to the frame Fs, and is skipped in the direction of increasing the frame number. In the case of reverse reproduction, reproduction is started from the sample image frame corresponding to the frame Fe, and is skipped in the direction of decreasing the frame number.

【００９２】ここで、ステップＳ４６の処理についてさ
らに詳しく説明すると、順方向の早送り再生の場合は、
１サイクル当たりフレーム番号を（ｍ×Ｒ／ｒ）フレー
ムずつ増加させながら標本画像フレームを取得する。す
なわち、（ｍ×Ｒ／ｒ）がステップＳ４５で計算された
順方向に読み飛ばすフレーム数であり、ステップＳ４６
ではＦｓ＋（ｍ×Ｒ／ｒ）×ｔのフレーム番号における
最近傍の標本画像フレームを再生して表示することにな
る。ここで、ｔはサイクル数である。Here, to describe step S46 in more detail: in forward fast-forward playback, sample image frames are acquired while the frame number is advanced by (m × R / r) frames per cycle. That is, (m × R / r) is the number of frames skipped in the forward direction, as calculated in step S45, and step S46 plays and displays the sample image frame nearest to frame number Fs + (m × R / r) × t, where t is the cycle count.

【００９３】同様に、逆方向の早送り再生の場合も、１
サイクル当たり（ｍ×Ｒ／ｒ）フレームずつ減少させな
がら標本画像フレームを取得する。すなわち、（ｍ×Ｒ
／ｒ）がステップＳ４５で計算された逆方向に読み飛ば
すフレーム数であり、ステップＳ４６ではＦｅ−（ｍ×
Ｒ／ｒ）×ｔのフレーム番号における最近傍の標本画像
フレームを再生して表示する。Similarly, in reverse fast-forward playback, sample image frames are acquired while the frame number is decreased by (m × R / r) frames per cycle. That is, (m × R / r) is the number of frames skipped in the reverse direction, as calculated in step S45, and step S46 plays and displays the sample image frame nearest to frame number Fe − (m × R / r) × t.
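The constant-speed case of steps S45 and S46 can be sketched as follows. This is an illustration under assumptions: the sample frames are given as a sorted, non-empty list of original frame numbers, ties between two equally distant samples go to the earlier one (the text does not say which is chosen), and the function names are hypothetical.

```python
import bisect


def nearest_sample(sample_frames, target):
    """Nearest entry of a sorted, non-empty list; ties go to the earlier frame."""
    pos = bisect.bisect_left(sample_frames, target)
    if pos == 0:
        return sample_frames[0]
    if pos == len(sample_frames):
        return sample_frames[-1]
    before, after = sample_frames[pos - 1], sample_frames[pos]
    return before if target - before <= after - target else after


def playback_sequence(sample_frames, fs, fe, m, R, r, forward=True):
    """Sample frame shown at each 1/r-second cycle for m-times-speed playback."""
    step = m * R / r                      # frames skipped per cycle (step S45)
    if step <= 0:
        raise ValueError("m, R and r must be positive")
    shown = []
    t = 0                                 # cycle count
    while True:
        target = fs + step * t if forward else fe - step * t
        if target > fe or target < fs:
            break                         # left the requested playback range
        shown.append(nearest_sample(sample_frames, target))   # step S46
        t += 1
    return shown
```

With sample frames at 0, 15, 30, 45, 60 and m = 10, R = r = 30, consecutive cycles sometimes map to the same sample frame, in which case the display can simply be left unchanged.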

【００９４】このようにして、標本画像フレーム群を用
いて任意の再生速度倍率の可変速再生が可能になる。な
お、毎サイクルで取り出す標本画像フレームに違いがな
い場合は、同じフレームを継続して表示するようにして
もよく、それにより処理効率を上げることができる。In this manner, variable speed reproduction at an arbitrary reproduction speed magnification can be performed using the sample image frame group. If there is no difference between the sample image frames taken out in each cycle, the same frame may be displayed continuously, thereby increasing the processing efficiency.

【００９５】上述の説明では、ユーザが変更しない限
り、再生速度倍率ｍは一定であるとしたが、次に、前述
した画面変化量情報２０３を利用して、より円滑な可変
速再生を行う方法について述べる。この可変速再生の基
本は、画面変化量情報２０３に応じて標本画像フレーム
を用いた可変速再生での再生速度を時時刻刻変化させ
る、というものである。説明を簡単にするために、可変
速再生の範囲を特に指定せず、元映像データ１０１全体
を対象に早送り再生を行う場合を考える。In the description above, the playback speed magnification m is constant unless the user changes it. Next, a method of achieving smoother variable-speed playback using the screen change amount information 203 described earlier is explained. The basic idea is to vary the playback speed of variable-speed playback with sample image frames from moment to moment according to the screen change amount information 203. To simplify the explanation, consider fast-forward playback of the entire original video data 101 without specifying a particular playback range.

【００９６】先ず、以下のパラメータを定義する。First, the following parameters are defined.

【００９７】元映像データ１０１の全フレーム数：Ｋ
［フレーム］元映像データ１０１のフレームレート：Ｒ［フレーム／
秒］標本画像フレームの再生フレームレート：ｒ［フレーム
／秒］再生速度倍率：ｍ画面変化量情報：Ｐｉ（ｉ＝０，…，ｎ）画面変化量に対応して再生速度に与える重み：Ｗｉ標本画像フレームに対応する元映像データのフレーム番
号：Ｆｉ（ｉ＝０，…，ｎ−１）元映像データの各フレームに対応して再生速度に与える
重み：Ｗｊ（ｊ＝０，…，Ｋ−１）今、激しい動きに対して与える標本画像フレームの画面
変化量の限界値をＬとし、限界値Ｌを超えないような値
［Ｐｉ］を考える。
Total number of frames of the original video data 101: K [frames]
Frame rate of the original video data 101: R [frames/second]
Playback frame rate of the sample image frames: r [frames/second]
Playback speed magnification: m
Screen change amount information: Pi (i = 0, …, n)
Weight applied to the playback speed according to the screen change amount: Wi
Frame number of the original video data corresponding to a sample image frame: Fi (i = 0, …, n−1)
Weight applied to the playback speed for each frame of the original video data: Wj (j = 0, …, K−1)
Now let L be the limit value of the screen change amount of a sample image frame applied to intense motion, and consider a value [Pi] that does not exceed the limit value L.

【００９８】［Ｐｉ］＝Ｌ，Ｐｉ＞Ｌの場合，［Ｐｉ］＝Ｐｉ，その他の場合 …（１）また、画面変化量に対応して再生速度に与えられる重み
をＷｉ＝［Ｐｉ］とする。
$$[P_i] = \begin{cases} L, & P_i > L \\ P_i, & \text{otherwise} \end{cases} \quad \ldots (1)$$
The weight applied to the playback speed according to the screen change amount is then $W_i = [P_i]$.

【００９９】次に、各フレームの再生速度に対する重み
を考える。離散的な再生速度に対応する重みＷｉを線形
補間して、次式に示すＷｊを求める。Next, the weight for the reproduction speed of each frame will be considered. The weight Wi corresponding to the discrete reproduction speed is linearly interpolated to obtain Wj represented by the following equation.

【０１００】Ｗｊ＝Ｗｉ＋（Ｗ（ｉ＋１）−Ｗｉ）／（Ｆ（ｉ＋１）−Ｆｉ）×ｔここで、ｔ＝０，…，Ｆ（ｉ＋１）−Ｆｉ、ｊ＝Ｆｉ，…，Ｆ（ｉ＋１）−１、ｉ＝０，…，ｎ−１ …（２）Ｗｊを全体の和が１．０になるように正規化したものをＷ’ｊとすると、Ｗ’ｊ＝Ｗｊ／ΣＷｊここで、ｊ＝０，…，ｋ …（３）ここで、再生速度倍率ｍ、再生フレームレートｒ［フレ
ーム／秒］で再生する場合に必要な表示回数Ｎは次式と
なる。
$$W_j = W_i + \frac{W_{i+1} - W_i}{F_{i+1} - F_i} \times t \quad \ldots (2)$$
where $t = 0, \ldots, F_{i+1} - F_i$, $j = F_i, \ldots, F_{i+1} - 1$, and $i = 0, \ldots, n - 1$. Let $W'_j$ be $W_j$ normalized so that the total sum is 1.0:
$$W'_j = \frac{W_j}{\sum_j W_j} \quad \ldots (3)$$
where $j = 0, \ldots, K - 1$. The number of displays N required for playback at playback speed magnification m and playback frame rate r [frames/second] is given by the following equation.

【０１０１】Ｎ＝Ｋ／（ｍ×Ｒ／ｒ） …（４）再生速度に対して与える重みを考慮して、標本画像フレ
ーム群から表示用画像フレームを取得する場合、各標本
画像フレームに割り付けられた重みＷ’ｊを加算してゆ
き、その加算値がＴｈ＝ｐ／Ｎ（ｐ＝０，…，Ｎ−１）
なる閾値を超えたときの標本画像フレームを取得する。
すなわち、加算値が閾値Ｔｈを超えたときのフレーム番
号に対応する最近傍の標本画像フレームが表示用画像フ
レームとなる。
$$N = \frac{K}{m \times R / r} \quad \ldots (4)$$
To obtain display image frames from the sample image frame group while taking the playback speed weights into account, the weights $W'_j$ assigned to the frames are accumulated, and a sample image frame is taken each time the accumulated value exceeds the threshold $Th = p / N$ ($p = 0, \ldots, N - 1$). That is, the sample image frame nearest to the frame number at which the accumulated value exceeds the threshold Th becomes the display image frame.

【０１０２】上記の計算に従って表示用画像フレームを
予め取得しておき、フレームレートｒ［フレーム／秒］
で表示すれば、画面変化量が大きい時には遅めに、また
画面変化量が小さい時には早目に可変速再生することに
なるが、結果としては所望の再生速度倍率ｍで画像を表
示できる。上記の計算を用いれば、ある時間長の映像番
組をそれより短い任意の時間内で再生することが可能と
なる。再生速度に対して与える重みＷ’ｊに対してスム
ージングをかけたり、シーンチェンジや静止画の部分で
特殊な重み付けを行うことで、可変速再生にさらに特殊
効果を加えることも可能である。If the display image frames are obtained in advance according to the above calculation and displayed at the frame rate r [frames/second], playback is slower where the screen change amount is large and faster where it is small, yet the video as a whole is shown at the desired playback speed magnification m. With this calculation, a video program of a given length can be played within any shorter time. Further special effects can be added to variable-speed playback by smoothing the weights W'j or by applying special weighting at scene changes and still-image portions.
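Equations (1) to (4) and the threshold rule can be put together in a short sketch. It makes several assumptions: the sample frames include the first frame 0 and the last frame K − 1 so that every frame can be interpolated, N is truncated to an integer, and a frame whose weight crosses several thresholds at once is returned several times. The function name and argument layout are hypothetical. Each returned frame number would then still have to be mapped to its nearest sample image frame.

```python
def variable_speed_frames(F, P, L, K, m, R, r):
    """Original frame numbers at which a display frame is taken, one per display cycle.

    F: sorted original frame numbers of the sample frames, with F[0] == 0 and
       F[-1] == K - 1 (assumption, so that every frame gets a weight).
    P: screen change amount at each sample frame (same length as F).
    """
    if F[0] != 0 or F[-1] != K - 1:
        raise ValueError("sketch assumes samples at the first and last frame")
    W = [min(p, L) for p in P]                    # eq. (1): clip at the limit L
    Wj = [0.0] * K
    for i in range(len(F) - 1):                   # eq. (2): linear interpolation
        span = F[i + 1] - F[i]
        for t in range(span):
            Wj[F[i] + t] = W[i] + (W[i + 1] - W[i]) / span * t
    Wj[F[-1]] = W[-1]
    total = sum(Wj)
    if total == 0:
        raise ValueError("all weights are zero")
    Wn = [w / total for w in Wj]                  # eq. (3): weights sum to 1.0
    N = int(K / (m * R / r))                      # eq. (4): number of display cycles
    picks, acc, p = [], 0.0, 0
    for j, w in enumerate(Wn):
        acc += w
        while p < N and acc > p / N:              # threshold Th = p / N
            picks.append(j)
            p += 1
    return picks
```

Frames with a large screen change amount carry more weight, so the running sum crosses more thresholds there and more display cycles are spent on that part of the video, which is the slow-down the text describes.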

【０１０３】ここでは、元映像データ１０１の全体に対
して可変速再生を行う場合について述べたが、部分再生
の場合も全く同様の考え方で可変速再生を行うことがで
きる。すなわち、元映像データ１０１全体のＷ’ｊが計
算できれば、部分再生の問題は容易に解決できる。ま
た、元映像データ１０１の全体に対しての可変速再生の
場合の説明では、可変速再生の開始フレームと終了フレ
ームに画面変化量情報が存在すると仮定したが、これら
がない場合は適当に近傍の画面変化量情報を流用する
か、デフォルトの値を与えるかして計算すればよい。Variable-speed playback of the entire original video data 101 has been described here, but partial playback can be handled in exactly the same way. That is, once W'j has been calculated for the entire original video data 101, partial playback follows easily. The description for the entire original video data 101 assumed that screen change amount information exists at the start and end frames of variable-speed playback. If it does not, the calculation can use the screen change amount information of a suitable nearby frame or a default value.

【０１０４】以下、図９に示すフローチャートを参照し
て、上述のように画面変化量情報２０３を利用して、よ
り円滑な可変速再生を行う場合の具体的な処理手順を説
明する。図９において、ステップＳ５１〜Ｓ５４の処理
は図８におけるステップＳ４１〜Ｓ４４の処理と基本的
に同様である。Hereinafter, with reference to the flowchart shown in FIG. 9, a specific processing procedure in the case where smoother variable speed reproduction is performed using the screen change amount information 203 as described above will be described. 9, the processing in steps S51 to S54 is basically the same as the processing in steps S41 to S44 in FIG.

【０１０５】すなわち、先ず画面変化量を一定にして可
変速再生（この場合は、早送り再生）を行う範囲の指定
を行う（ステップＳ５１）。可変速再生範囲の開始フレ
ームをＦｓ、終了フレームをＦｅとする。次に、再生速
度倍率ｍ、つまり何倍速で早送り再生を行うかを指定す
る（ステップＳ５２）。次に、再生方向の指定、つまり
早送り再生を順方向再生で行うか逆方向再生で行うかの
指定を行う（ステップＳ５３）。次に、標本画像フレー
ムの再生フレームレートｒ［フレーム／秒］を指定する
（ステップＳ５４）。That is, first, a range for performing variable speed reproduction (in this case, fast forward reproduction) with the screen change amount being fixed is specified (step S51). The start frame of the variable speed reproduction range is Fs, and the end frame is Fe. Next, the reproduction speed magnification m, that is, how many times the fast forward reproduction is to be performed is specified (step S52). Next, a reproduction direction is specified, that is, whether fast-forward reproduction is performed in forward reproduction or reverse reproduction (step S53). Next, the reproduction frame rate r [frame / second] of the sample image frame is specified (step S54).

[0106] Then the required number of displays N is calculated by equation (4) (step S55). Next, the positions of the sample image frames at which the cumulative sum of the weights W′j given by equation (3) exceeds the thresholds Th = p/N (p = 0, …, N−1) are determined: for each threshold, the sample image frame nearest to the frame number at which the cumulative sum exceeds Th is taken as a display image frame position and recorded in a table (step S56).
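Step S56 can be illustrated with a small sketch. It assumes, purely for illustration, that the weights W′j of equation (3) are available as one normalized value per original frame in the range Fs to Fe, summing to 1. The function name `build_display_table` and all parameter names are hypothetical and are not part of the described embodiment.

```python
from bisect import bisect_right
from itertools import accumulate


def build_display_table(weights, start_frame, sample_frames, n_displays):
    """Sketch of step S56 under the assumptions stated above.

    weights[i]   : assumed normalized weight of original frame start_frame + i
    sample_frames: original frame numbers for which a sample image exists
    n_displays   : required number of displays N (equation (4) in the text)
    """
    cumulative = list(accumulate(weights))
    table = []
    for p in range(n_displays):
        threshold = p / n_displays
        # First index whose cumulative weight exceeds the threshold Th = p/N.
        idx = min(bisect_right(cumulative, threshold), len(weights) - 1)
        frame_no = start_frame + idx
        # Nearest sample image frame to that original frame number.
        nearest = min(sample_frames, key=lambda f: abs(f - frame_no))
        table.append(nearest)
    return table


if __name__ == "__main__":
    # A large weight on the first frame keeps the display there for longer.
    print(build_display_table([0.5, 0.25, 0.125, 0.125], 100, [100, 103], 4))
```

Because frames with large weights consume several thresholds, they occupy several table slots, which corresponds to slower reproduction where the screen changes a lot.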

[0107] To reproduce and display the sample image frames at the reproduction frame rate r [frames/second], the display sample image frame is obtained from the table and displayed in a cycle of 1/r seconds (step S57).

[0108] When variable-speed reproduction is performed using the sample image frame group in this way, the reproduction speed is varied according to the screen change amount: it is reduced where the screen change amount is large and increased where the screen change amount is small. This makes it possible to realize, for sample image frames, easy-to-view fast-forward reproduction that keeps the screen change amount constant, as in the “video reproduction device” of Japanese Patent Application Laid-Open No. 10-243351 (Japanese Patent Application No. 09-042637) cited earlier.

[0109] (4) Other Modes of Use

FIG. 10 shows an example of a list display of sample image frames 501, 502, … near the scene change positions (cut points) selected by the method described above. Because no image frames need to be extracted from the original video data, such a list screen 500 can be created at high speed.

[0110] FIG. 11 shows an example in which the entire original video is displayed as a single bar 601 extending along the time axis, and a designated portion of bar 601 is displayed as an enlarged bar 602. In the enlarged bar 602, images of the cut-point frames contained in that portion of the original video are displayed as headings. When the mouse cursor 603 is placed over the enlarged bar 602, the nearby sample image frame 604 most similar to the image frame at the designated point is selected, taking the cut point positions into account, and can be displayed as an icon. Because this processing is fast, sliding the mouse cursor left and right displays the icon image in real time, like a moving picture.
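The selection of a nearby sample image frame "taking the cut point positions into account" might look like the following sketch. It assumes that a cut point is the first frame of a new shot and that a sample frame in the same shot as the designated frame is preferred over a closer one across a cut. The function name and arguments are hypothetical.

```python
from bisect import bisect_right


def nearest_in_same_shot(desired, sample_frames, cut_points):
    """Pick the sample frame nearest to `desired` without crossing a cut point.

    Assumption: each cut point is the first frame number of a new shot.
    Falls back to the nearest sample frame overall if the shot has none.
    """
    cuts = sorted(cut_points)
    k = bisect_right(cuts, desired)
    shot_start = cuts[k - 1] if k > 0 else float("-inf")
    shot_end = cuts[k] if k < len(cuts) else float("inf")
    in_shot = [f for f in sample_frames if shot_start <= f < shot_end]
    candidates = in_shot or list(sample_frames)
    return min(candidates, key=lambda f: abs(f - desired))
```

With sample frames at 0, 30, 60, 90 and a cut at frame 50, a pointer at frame 48 would resolve to frame 30 rather than the numerically closer frame 60, since frame 60 belongs to the next shot.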

[0111] In monitoring-system applications, there is a demand for efficiently finding events that occur only occasionally. Suppose, for example, that the monitoring screen normally shows only a background image, but at some point an intruder appears. The intruder can easily be detected as a difference image from the background image. In addition, while the video is being recorded, the sample image frames can be sampled coarsely in time where the screen does not change and finely in time where it does, so that the intruder is reliably recorded. Information for managing cut points and the like can be stored as additional information for the screens in which the intruder appears, so that a list can be displayed later. Furthermore, by making the spatial sampling of the sample image frames finer only when an intruder is present, the intruder can be checked in the sample image frames as well.
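The coarse/fine temporal sampling described for the monitoring case can be sketched as below. The per-frame change amounts, the threshold, and the two intervals are illustrative assumptions; the text does not specify concrete values or this particular rule.

```python
def adaptive_sample(change_amounts, coarse=30, fine=5, threshold=0.1):
    """Return indices of frames to keep as sample image frames.

    Frames are kept every `coarse` frames while the screen change amount
    stays at or below `threshold`, and every `fine` frames while it is above.
    All parameter values are hypothetical defaults.
    """
    selected = []
    for i, change in enumerate(change_amounts):
        interval = fine if change > threshold else coarse
        if not selected or i - selected[-1] >= interval:
            selected.append(i)
    return selected
```

For a sequence that is static for 60 frames, changes for 10 frames, and is static again for 30 frames, this rule keeps frames 0 and 30 from the quiet part, frames 60 and 65 from the changing part, and then returns to the coarse interval.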

[0112] It is also effective to acquire a still image of higher definition than the original video when an intruder enters, and to manage it as a sample image frame. When the resolution of the ordinary video is insufficient, the intruder can then be identified using the higher-resolution still image.

[0113] As described above, according to the present embodiment, sample image information is recorded in addition to the original video data. The sample image information consists of the image data and attribute information of a sample image frame group obtained by sampling the original video frame group at an arbitrary temporal interval and at an arbitrary spatial size. By searching the sample image information instead of the original video data, video search for a desired frame can be performed easily without burdening computing power or network traffic. Further, by describing scene change position information as supplementary information in the sample image information, a sample image frame more similar to the desired frame can be found. A desired frame can also be found by computing the difference between the search target image and the image of each sample image frame, for example the sum of absolute differences, and searching for sample image frames for which this value is small. Finally, by reducing the reproduction speed where the screen change amount is large and increasing it where the screen change amount is small, easy-to-view variable-speed reproduction that keeps the screen change amount constant can be realized for sample image frames.
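The difference-based search mentioned above can be sketched with the sum of absolute differences (SAD). The sketch assumes that the query image and every sample image are grayscale images of equal size, given as nested lists of integers; the function names and the threshold are hypothetical.

```python
def sad(image_a, image_b):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )


def search_by_image(query, samples, threshold):
    """Return original frame numbers of sample images whose SAD is small.

    samples: list of (original_frame_number, image) pairs.
    """
    return [
        frame_no
        for frame_no, image in samples
        if sad(query, image) <= threshold
    ]
```

Because only the small sample images are compared, the original video data does not need to be decoded during the search.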

[0114] Other embodiments of the present invention are described below. In the description of the other embodiments, parts identical to those of the first embodiment are given the same reference numerals, and their detailed description is omitted.

[0115] In the first embodiment, the spatiotemporal sample video metadata 102 was described as having a plurality of items of sample image information 201_1 to 201_n, but no detailed description example was given. A second embodiment, which concerns concrete description examples, is described below.

[0116] FIG. 12 shows a description example of sample image information according to the second embodiment. Here, the sample image frame group is treated as one video (a sample video), and the sample video information 701 is configured as that set of frames. The sample video may be prepared separately from the sample video information 701, with its location described in the sample video information 701 by a URL or the like, or the sample video may be described directly as the sample video information 701.

[0117] The sample image information 702 indicates the correspondence between a sample image frame in the sample video indicated by the sample video information 701 and a frame of the original video data; one item is described for each sample image frame contained in the sample video. The sample image information 702 consists of the media time 703 of the original video frame and the media time 704 of the sample video. The media time 703 of the original video frame indicates the frame of the original video that corresponds to the sample image frame. As long as it uniquely identifies the original video frame, it may be a time such as a time stamp, or a frame number or the like. When the corresponding original video frame can be obtained by calculation, for example when the original video frames are sampled at a constant interval, the information needed for the calculation (for example, the sampling interval) may be described and the media time 703 of the original video frame omitted. The media time 704 of the sample video indicates a specific sample image frame in the sample video indicated by the sample video information 701. As long as it uniquely identifies the sample image frame, it may be a frame number or the like, or a time such as a time stamp obtained when the sample video is treated as ordinary video. It may be omitted when the items correspond to the sample video in sequential order.

[0118] FIG. 13 shows another description example of sample image information. The sample image information 801 indicates the correspondence between each sample image frame and a frame of the original video data; one item is described for each sample image frame. The sample image information 801 consists of the media time 802 of the original video frame and the sample image data 803. Like the media time 703 in the description example of FIG. 12, the media time 802 indicates the frame position in the original video data that corresponds to the sample image frame, and like the media time 703 it may be omitted. The sample images may be prepared individually, separately from the sample image information 801, with their locations described in the sample image data 803 by a URL or the like, or each sample image may be described directly as the sample image data 803. Instead of a sample image, another image indicating its content, such as an illustration, may be used as the sample image data.

[0119] FIG. 14 shows yet another description example of sample image information. The description example of FIG. 14 includes both of the description examples of FIG. 12 and FIG. 13. The sample video information 901 is the same as the sample video information 701 in the description example of FIG. 12 and indicates either a URL giving the location of the sample video or the sample video itself. The sample image information 902 indicates the correspondence between each sample image frame and a frame of the original video data; one item is described for each sample image frame. The sample image information 902 consists of the media time 903 of the original video frame and either the media time 904A of the sample video or the sample image data 904B. Like the media time 703 in the description example of FIG. 12, the media time 903 of the original video frame indicates the frame of the original video data that corresponds to the sample image frame, and like the media time 703 it may be omitted. The media time 904A of the sample video is the same as the media time 704 in the description example of FIG. 12 and indicates a specific sample image frame in the sample video indicated by the sample video information 901. The media time 904A may be omitted when the items correspond to the sample video in sequential order. The sample image data 904B is the same as the sample image data 803 in the description example of FIG. 13 and indicates the location of an individual sample image frame or the sample image frame itself.
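As an aid to reading FIGS. 12 to 14, the combined description of FIG. 14 can be modeled as a small data structure. This is only a sketch under assumptions: the class names, field names, the `sampling_interval` field, and the example URL are hypothetical, and the patent does not prescribe any concrete syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class SampleImageInfo:
    """One item corresponding to sample image information 902."""
    original_media_time: Optional[int] = None  # cf. 903; may be omitted
    sample_media_time: Optional[int] = None    # cf. 904A; may be omitted
    sample_image_data: Optional[str] = None    # cf. 904B; URL or inline data


@dataclass
class SampleVideoDescription:
    sample_video: str                          # cf. 901; URL or the video itself
    sampling_interval: Optional[int] = None    # used when 903 is omitted
    entries: List[SampleImageInfo] = field(default_factory=list)

    def original_time(self, index: int) -> int:
        """Original-video media time for entry `index`."""
        entry = self.entries[index]
        if entry.original_media_time is not None:
            return entry.original_media_time
        if self.sampling_interval is None:
            raise ValueError("media time omitted and no sampling interval given")
        # Assumed rule: constant-interval sampling starting at frame 0.
        return index * self.sampling_interval

    def sample_ref(self, index: int) -> Tuple[str, Union[int, str]]:
        """Where the sample image comes from: the sample video or separate data."""
        entry = self.entries[index]
        if entry.sample_image_data is not None:
            return ("image", entry.sample_image_data)
        if entry.sample_media_time is not None:
            return ("video", entry.sample_media_time)
        return ("video", index)  # sequential correspondence when omitted


if __name__ == "__main__":
    desc = SampleVideoDescription(
        sample_video="http://example.com/sample.mpg",  # hypothetical location
        sampling_interval=30,
        entries=[SampleImageInfo(),
                 SampleImageInfo(original_media_time=45,
                                 sample_image_data="extra.jpg")],
    )
    print(desc.original_time(0), desc.sample_ref(1))
```

An entry that carries its own `sample_image_data` plays the role of a replaced or added sample image, while entries without it fall back to frames of the sample video.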

[0120] With the description example of FIG. 14, part of a sample video can be replaced, or other sample images can be added.

[0121] Next, processing for extracting the sample image data for a desired media time using the description examples of FIGS. 12 to 14 is described. FIG. 15 shows the basic flow. In step S61, the media time of the original video frame corresponding to the desired sample image frame is input. A media time, such as a time stamp or a frame number, uniquely indicates a temporal position within the media. In step S62, the first item of sample image information is taken from the group of sample image information described according to FIGS. 12 to 14. In step S63, the desired media time is compared with the media time of the original video frame contained in the sample image information. If the two are the same, or if the desired media time is later, the process proceeds to step S64 and the sample image data indicated by the sample image information is extracted. How the sample image data is extracted depends on the description method: when a sample frame number is described, the corresponding sample image data is extracted from the sample video; when sample image data is described, that data is used. If the media time contained in the sample image information is later than the desired media time, the process proceeds to step S65, the next item of sample image information is taken from the group, and the process returns to step S63 to compare the media times again.
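The comparison loop of FIG. 15 can be sketched as follows. The text does not state the order in which the items are visited; this sketch assumes they are stored in ascending order of original media time and visited from the latest to the earliest, so that the comparison in step S63 returns the latest sample at or before the desired time. The function name and the tuple layout are hypothetical.

```python
def find_sample(entries, desired_time):
    """Return the sample reference selected by the S63 comparison, or None.

    entries: list of (original_media_time, sample_ref) pairs,
             sorted by ascending original_media_time.
    """
    # Assumption: items are visited from the latest to the earliest.
    for original_time, sample_ref in reversed(entries):
        # S63: same time, or the desired media time is later -> S64.
        if desired_time >= original_time:
            return sample_ref
        # Otherwise the item is later than the desired time -> S65 (next item).
    return None
```

Under a different visiting order the same comparison would select a different item, so the ordering assumption should be checked against the actual flowchart.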

[0122] FIG. 16 shows a description example in which attribute information of the sample image frames is added to the description examples of FIGS. 12 to 14. A sample video may use sample images of different sizes, or sample images cut out from only part of the original video data; FIG. 16 is one example of describing such parameters as attribute information.

[0123] The sample image frame group information 1001 represents information according to the description examples of FIGS. 12 to 14 or the like. The sample attribute information 1002 is attribute information of an individual sample image frame; one item is described for each sample image frame contained in the sample video. The sample attribute information 1002 consists of a sample number 1003, resolution information 1004, and region information 1005.

[0124] The sample number 1003 is a number corresponding to a specific sample image frame in the sample image frame group indicated by the sample image frame group information 1001. The sample number 1003 may be omitted when the items correspond sequentially to the sample image frames in the sample image frame group.

[0125] The resolution information 1004 indicates the resolution of the sample image frame identified by the sample number 1003 relative to the corresponding frame of the original video data. For example, it describes the factor by which the image has been reduced.

[0126] The region information 1005 indicates the region, within the corresponding frame of the original video data, that the sample image frame identified by the sample number 1003 covers. When the sample image frame is cut out from part of the corresponding frame of the original video data, that region is described. When the sample image frame corresponds to the entire corresponding frame of the original video data, the region information may be omitted.

[0127] Although not illustrated, this attribute information may instead be described within each item of sample image information in the description examples of FIGS. 12 to 14.

[0128] FIG. 17 shows an actual description example using the description method of FIG. 16. Suppose that an object is present in only part of the original video frame 1401. When creating a sample image frame from the original video frame 1401, extracting and sampling only that part yields a sample image frame that reflects the content of the image better than sampling the entire screen. Therefore, a rectangular region 1402 in the original video frame 1401 is extracted and sampled so that its height and width are each reduced to 1/2, creating the sample image frame 1403. The resolution information and region information in this case are described as shown at 1404.
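The operation in FIG. 17 (cut out a rectangle, then reduce it to 1/2 in each direction) can be sketched as below. The frame is assumed to be a nested list of pixel values, the reduction is done by simple pixel skipping, and the attribute record is a hypothetical stand-in for the description 1404, whose actual syntax is not reproduced here.

```python
def crop_and_subsample(frame, region, step):
    """Cut `region` out of `frame` and keep every `step`-th pixel.

    region: (x, y, width, height) in original-frame pixel coordinates.
    """
    x, y, width, height = region
    return [row[x:x + width:step] for row in frame[y:y + height:step]]


def make_attribute(sample_number, region, step):
    """Hypothetical attribute record: sample number, resolution, region."""
    return {
        "sample_number": sample_number,
        "resolution": (1 / step, 1 / step),  # horizontal, vertical scale
        "region": region,
    }
```

Keeping the region and scale alongside the sample image lets a viewer place the sample back at the right position and size in the original frame.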

[0129] FIG. 18 shows the basic flow for displaying a list of sample image frames in response to a user request. In step S71, the user inputs the level of the list display. The input may use a GUI element such as a slider that changes continuously with the display level, or the level may be entered directly as a number. Alternatively, an input device such as a wheel or a dial connected to a computer or the like may be used.

[0130] In step S72, the number of sample images to be listed is calculated from the level value input in step S71. For example, if the maximum display level is Lmax, the maximum number of displayed sample image frames is Tmax, and the current display level is L, the number of displayed sample image frames is obtained as T = Tmax × L / Lmax.

[0131] In step S73, the sample image frames to be listed are selected according to the number of displayed sample image frames. For example, sample image frames are selected at a constant time interval or a constant frame interval. Alternatively, when additional information such as cut point information is given, frames of higher importance, such as cut points or the first frame of a scene, may be selected preferentially.
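Steps S72 and S73 can be sketched together as follows. Rounding T down to an integer and filling the remaining slots at an even spacing after the cut points are assumptions of this sketch; the function names are hypothetical.

```python
def display_count(level, max_level, max_count):
    """Step S72: T = Tmax * L / Lmax, rounded down (rounding is an assumption)."""
    return max_count * level // max_level


def select_frames(sample_frames, count, cut_points=()):
    """Step S73: choose `count` sample frames, preferring cut points."""
    cut_set = set(cut_points)
    chosen = [f for f in sample_frames if f in cut_set][:count]
    others = [f for f in sample_frames if f not in cut_set]
    n = min(count - len(chosen), len(others))
    if n > 0:
        # Fill the remaining slots at a roughly constant frame interval.
        step = len(others) / n
        chosen += [others[int(i * step)] for i in range(n)]
    return sorted(chosen)
```

For instance, with ten sample frames at 0, 10, …, 90, cut points at 30 and 70, and four slots, the two cut points are taken first and the remaining two slots are spread over the other frames.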

[0132] In step S74, a list of the selected sample images is created and displayed.

[0133] FIG. 19 shows an interface for displaying a list of sample image frames using the basic flow of FIG. 18. The screen 1101 contains a slider 1102 for specifying the display level and a sample image list 1103. When the slider 1102 is moved to a position such as that of the slider 1105 in the screen 1104 to raise the display level, the number of listed sample images increases as shown in the list display 1106. With such an interface, the user can intuitively display as many sample images as needed for the content of the video.

[0134] FIG. 20 shows an example of a screen display using the description example of FIG. 16. The description example of FIG. 16 makes it possible to handle sample image frames of different resolutions and sample image frames cut out from only part of the screen. An image typically contains both regions that are best sampled at high resolution, such as subtitles, and regions for which low-resolution sampling is sufficient, such as the background. Therefore, a sample image frame group 1201 containing sample image frames of different resolutions and regions created from the same frame is prepared, and these are displayed superimposed as shown in the screen display example 1202, so that the subtitles are displayed at high resolution and the background at low resolution.
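The superimposed display of FIG. 20 can be sketched as enlarging the low-resolution background sample and pasting the high-resolution region sample at its position. Images are assumed to be nested lists of pixel values, and nearest-neighbor enlargement is an assumption of this sketch; the function names are hypothetical.

```python
def upscale(image, factor):
    """Enlarge an image by repeating each pixel `factor` times in both axes."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def overlay(background, patch, x, y):
    """Return a copy of `background` with `patch` pasted at column x, row y."""
    result = [row[:] for row in background]
    for dy, patch_row in enumerate(patch):
        result[y + dy][x:x + len(patch_row)] = patch_row
    return result
```

The paste position and enlargement factor would come from the region information and resolution information attached to each sample image frame.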

[0135] FIG. 21 shows another example of a screen display using the description example of FIG. 16. The image 1301 is a sample image frame sampled at low resolution. When the user points with a mouse or the like at a region 1302 for which a more detailed image is desired, such as a subtitle, a sample image frame 1303 in which only the region 1302 is sampled at higher resolution is displayed, for example as a pop-up. Because a low-resolution sample image frame such as the image 1301 is normally displayed, the image size can be kept small, and more images can be shown in a list display or the like.

[0136] The present invention is not limited to the embodiments described above and can be implemented with various modifications.

【０１３７】[0137]

[Effects of the Invention] As described above, according to the image information description method of the present invention, video can be searched and displayed while its contents are checked.

[0138] Further, when a search is performed on the basis of sample images obtained by sampling the original video data, good video search results are obtained even when the desired frame lies between one scene change and the next.

[0139] Furthermore, because variable-speed reproduction can be performed on the basis of the sample images, the processing load is reduced, and variable-speed reproduction can easily be realized even on devices with little computing power or over a network.

[Brief description of the drawings]

FIG. 1 is a diagram showing a system architecture according to an embodiment of the present invention.

FIG. 2 is a conceptual diagram showing the structure of original video data and spatiotemporal sample video metadata.

FIG. 3 is an explanatory diagram of sample image information included in spatiotemporal sample video metadata.

FIG. 4 is a diagram showing a management structure of sample image information.

FIG. 5 is a flowchart showing a procedure for recording spatiotemporal sample video metadata, for explaining the procedure for describing sample image information.

FIG. 6 is a flowchart showing a procedure for searching for a sample image frame using scene change information included in spatiotemporal sample video metadata.

FIG. 7 is a flowchart showing a procedure for searching for a sample image frame in spatiotemporal sample video metadata.

FIG. 8 is a flowchart showing a procedure of variable-speed reproduction using sample image frames.

FIG. 9 is a flowchart showing a procedure of smooth variable-speed reproduction using sample image frames and screen change amount information.

FIG. 10 is a diagram showing an example of a list display of sample image frames using scene change information included in spatiotemporal sample video metadata.

FIG. 11 is a diagram showing a display example of original video data and sample image frames using spatiotemporal sample video metadata.

FIG. 12 is a diagram showing another description example of sample image information.

FIG. 13 is a diagram showing a different description example of sample image information.

FIG. 14 is a diagram showing still another description example of sample image information.

FIG. 15 is a flowchart showing a search for sample image data using sample image information according to the description examples of FIGS. 12 to 14.

FIG. 16 is a diagram showing yet another description example of sample image information.

FIG. 17 is a diagram showing a specific example of sample image information according to the description example of FIG. 16.

FIG. 18 is a flowchart showing the operation of a sample image list display in which the number of displayed images varies with the display level.

FIG. 19 is a diagram showing how the sample image list display changes when the display level is changed.

FIG. 20 is a diagram showing an example in which a plurality of sample images of different resolutions and regions are displayed superimposed, based on sample image information according to the description example of FIG. 16.

FIG. 21 is a diagram showing another example in which a plurality of sample images of different resolutions and regions are displayed superimposed, based on sample image information according to the description example of FIG. 16.

Continuation of front page

(72) Inventor: Takeshi Mita, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

(72) Inventor: Koji Yamamoto, c/o Toshiba Corporation Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

F-terms (reference): 5B075 ND12 NK10 NK31 NK44 NK50 NK54 PP28 PQ02 PQ32 PQ46 PQ60 PR06 QM08 QP05 QS20; 5C052 AA01 AA02 AA04 AA17 AC06 CC11 DD04; 5C053 FA21 FA23 FA25 FA27 GA10 GA18 GB06 GB27 GB28 GB36 GB37 GB38 HA21 HA29 HA33

Claims

[Claims]

1. An image information description method comprising describing, as sample image information, a plurality of sample image frames obtained by sampling video information composed of a plurality of video frames at an arbitrary temporal interval and at an arbitrary spatial size, together with attribute information for specifying the video frame of the video information that corresponds to each of the sample image frames.

2. The image information description method according to claim 1, wherein supplementary information relating to the video information is also described as the sample image information.

3. The image information description method according to claim 1, wherein the attribute information includes position information indicating a position on a time axis of a video frame corresponding to the sample image frame.

4. The method according to claim 1, wherein the attribute information includes information on a size or a resolution of the sample image frame.

5. The image information description method according to claim 2, wherein the supplementary information includes scene change position information of the video information.

6. The image information description method according to claim 2, wherein the supplementary information includes information on a screen change amount of the video information.

7. The image information description method according to claim 1, wherein image data of the sample image frame or a pointer to the sample image frame is described as the sample image information.

8. A recording medium storing the sample image information described by the image information description method according to claim 1.

9. A video search method for searching for a desired video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary temporal interval and at an arbitrary spatial size, wherein attribute information including at least first position information, which indicates a position on a time axis and specifies the video frame corresponding to the sample image frame, is described as the sample image information, and a sample image frame whose first position information is closest to second position information indicating the position of the desired video on the time axis is searched for on the basis of the first position information and the second position information.

10. A video search method for searching for a desired video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, wherein attribute information including at least first position information indicating a position on a time axis for specifying the video frame corresponding to the sample image frame is described as the sample image information, scene change position information of the video information is described as supplementary information in the sample image information, and the sample image frame having the first position information closest to the second position information, temporally before or after the scene change position, is searched for on the basis of the first position information, second position information indicating a position on the time axis of the desired video, and the scene change position information.

11. A video search method for searching for a desired video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, wherein attribute information including at least position information indicating a position on a time axis for specifying the video frame corresponding to the sample image frame is described as the sample image information, and a sample image frame whose difference from a desired image is equal to or less than a predetermined threshold value is searched for.

12. The video search method according to claim 11, wherein the position information described for a sample image frame whose difference from the desired image is equal to or less than the predetermined threshold value is recorded as a search result.

13. A video reproduction method for performing variable speed reproduction of a video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, wherein the sample image frames and attribute information including at least position information indicating a position on a time axis for specifying the video frame corresponding to each sample image frame are described as the sample image information, information on a screen change amount of the video information is described as supplementary information in the sample image information, and variable speed reproduction of the video is performed by changing the reproduction speed of the sample image frames according to the information on the screen change amount.

14. A video search device for searching for a desired video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, the video search device comprising: sample image information comprising attribute information including at least first position information indicating a position on a time axis for specifying the video frame corresponding to the sample image frame; supplementary information, added to the sample image information, indicating scene change position information of the video information; and a search engine that searches, on the basis of the first position information, second position information indicating a position on the time axis of the desired video, and the scene change position information, for the sample image frame having the first position information closest to the second position information temporally before or after the scene change position.

15. A video search device for searching for a desired video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, the video search device comprising: sample image information consisting of attribute information including at least position information indicating a position on a time axis for specifying the video frame corresponding to the sample image frame; and a search engine that searches for a sample image frame whose difference from a desired image is equal to or less than a predetermined threshold value.

16. A video playback device that performs variable-speed playback of a video using sample image information on a plurality of sample image frames obtained by sampling video information consisting of a plurality of video frames at an arbitrary time interval and at an arbitrary spatial size, the video playback device comprising: sample image information including image data of the sample image frames and attribute information including at least position information indicating a position on a time axis for specifying the video frame corresponding to each sample image frame; supplementary information, added to the sample image information, indicating a screen change amount of the video information; and a display engine that performs variable-speed playback of the video by changing the playback speed of the sample image frames according to the information on the screen change amount.

17. The image information description method according to any one of claims 1 to 7, wherein the sample image frames include an image obtained by sampling only an arbitrary portion of one image of the video information at an arbitrary time interval and at an arbitrary spatial size.

18. The video search method according to any one of claims 9 to 12, wherein the sample image frames include an image obtained by sampling only an arbitrary portion of one image of the video information at an arbitrary time interval and at an arbitrary spatial size.

19. The video reproduction method according to claim 13, wherein the sample image frames include an image obtained by sampling only an arbitrary portion of one image of the video information at an arbitrary time interval and at an arbitrary spatial size.

20. The video search device according to claim 14 or claim 15, wherein the sample image frames include an image obtained by sampling only an arbitrary portion of one image of the video information at an arbitrary time interval and at an arbitrary spatial size.

21. The video playback device according to claim 16, wherein the sample image frames include an image obtained by sampling only an arbitrary portion of one image of the video information at an arbitrary time interval and at an arbitrary spatial size.

22. The image information description method according to claim 1, wherein the plurality of sample image frames are stored as one piece of video information.

23. The video search method according to claim 9, wherein the plurality of sample image frames are stored as one piece of video information.

24. The video reproduction method according to claim 13, wherein the plurality of sample image frames are stored as one piece of video information.

25. The video search device according to claim 14 or claim 15, wherein the plurality of sample image frames are stored as one piece of video information.

26. The video playback device according to claim 16, wherein the plurality of sample image frames are stored as one piece of video information.
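
By way of illustration only, and without limiting the claims, the search of claims 9 and 10 and the variable speed reproduction of claim 13 may be sketched as follows. The record type SampleFrame, the function names, and the parameter min_weight are hypothetical names introduced for this sketch and do not appear in the specification. The sketch further assumes that a scene change position is the frame number of the first frame of a new scene, and that a sample image frame is displayed for a time proportional to its screen change amount; the claims state only that the reproduction speed is changed according to the screen change amount, so this proportional rule is one possible choice.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SampleFrame:
    """Hypothetical record for one sample image frame."""
    frame_number: int      # first position information (position on the time axis)
    change_amount: float   # screen change amount (supplementary information)


def nearest_sample(samples: Sequence[SampleFrame], target: int) -> SampleFrame:
    """Claim 9 sketch: sample frame whose position is closest to the target.

    Raises ValueError if `samples` is empty.
    """
    return min(samples, key=lambda s: abs(s.frame_number - target))


def nearest_sample_in_scene(samples: Sequence[SampleFrame],
                            scene_changes: Sequence[int],
                            target: int) -> Optional[SampleFrame]:
    """Claim 10 sketch: closest sample frame on the target's side of the cuts.

    Assumption: each scene change position is the first frame of a new scene,
    so a scene spans [start, end) in frame numbers.
    """
    cuts = sorted(scene_changes)
    i = bisect_right(cuts, target)          # number of cuts at or before target
    start = cuts[i - 1] if i > 0 else None  # cut that opens the target's scene
    end = cuts[i] if i < len(cuts) else None  # cut that closes it
    candidates = [
        s for s in samples
        if (start is None or s.frame_number >= start)
        and (end is None or s.frame_number < end)
    ]
    if not candidates:
        return None  # no sample frame was taken inside this scene
    return nearest_sample(candidates, target)


def display_durations(samples: Sequence[SampleFrame],
                      total_seconds: float,
                      min_weight: float = 0.1) -> List[float]:
    """Claim 13 sketch: per-frame display time for variable speed playback.

    Assumption: display time is proportional to the screen change amount,
    with a floor of `min_weight` so that static frames are still shown.
    """
    weights = [max(s.change_amount, min_weight) for s in samples]
    total = sum(weights)
    return [total_seconds * w / total for w in weights]
```

In this sketch, a target position lying just before a scene change is resolved to a sample image frame of the same scene even when a sample image frame of the following scene is numerically closer, which corresponds to the restriction "temporally before or after the scene change position" recited in claim 10.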