JP2005236891A

JP2005236891A - Image processing apparatus and method thereof

Info

Publication number: JP2005236891A
Application number: JP2004046639A
Authority: JP
Inventors: Satoru Yashiro; 哲八代
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-02-23
Filing date: 2004-02-23
Publication date: 2005-09-02

Abstract

PROBLEM TO BE SOLVED: Shots in a moving image taken with a video camcorder vary widely in length, from several seconds to as long as one hour, and a cut point image is not always suitable for representing a shot. Consequently, merely integrating and listing shots, treating a shot as the counterpart of a cut in a commercial moving image, does not let a user efficiently retrieve the desired part of the moving image.

SOLUTION: A moving image in which shot boundaries are defined is supplied frame by frame (S801). One or more representative frames are detected from each shot on the basis of the correlation between the supplied frames (S803-S806). Indexes associated with link information to the moving image are generated, and groups are generated by grouping the generated indexes according to a prescribed criterion (S806). Information on the moving image, the indexes, and the groups is recorded on a recording medium (S808).

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to an image processing apparatus and a method thereof, for example, to image processing that generates and records an index of a moving image and reproduces the moving image according to that index.

In recent years, with the spread of digital cameras, mobile phones, digital video camcorders (DVC), and similar devices capable of shooting moving images, even individuals have come to shoot large amounts of video. Video data is generally enormous in volume, and to grasp an outline of its content or to find a desired scene, users have conventionally relied on fast-forward or rewind playback.

Fast-forward and rewind playback are not efficient. To improve efficiency, a method was invented, described for example in Japanese Examined Patent Publication No. 5-74273, that detects the image data at cut points as representative screens and creates index images. A cut point generally refers to a point where source materials are joined in an edited moving image. For a moving image shot with a DVC, the unit of continuous video from the start of recording, when the shooting button is pressed, to the end of recording is called a shot, and the beginning of a shot is a cut point.

A video camera that records in analog form on magnetic tape produces a moving image in which the shots run on continuously. In this case, cut points can be detected by exploiting the abrupt change in the frame image at a shot boundary: a point where the difference between the images of adjacent frames is extremely large is detected and taken as a cut point.
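The adjacent-frame comparison described above can be sketched as follows. This is a minimal illustration, not the method of the cited publication: the use of a mean absolute pixel difference, the flat-list frame representation, and the names `frame_difference` and `detect_cut_points` are all assumptions made for the example.

```python
from typing import List, Sequence


def frame_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute pixel difference between two equally sized frames (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cut_points(frames: List[Sequence[int]], threshold: float) -> List[int]:
    """Return the indices of frames that differ from their predecessor by more than threshold."""
    cuts = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)  # frame i starts a new shot
    return cuts
```

The threshold is a free parameter here; the text only requires that the difference be "extremely large".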

In moving images used in television broadcasting, films, and the like (hereinafter "commercial moving images"), a cut (a unit of continuous video) lasts a few seconds at most, and a single program contains several hundred cuts. Because cuts are short, the image of the frame at a cut point (hereinafter "cut point image") can be used as a representative image. When cut point images are displayed as a list, however, their number becomes enormous, so shots are integrated, grouped, and displayed to make scene search more convenient. For example, Japanese Patent Laid-Open No. 2002-27410 discloses a method that, to reduce the number of cut point images listed, thins them out so that the selected frames are not densely spaced in time and similar frames are not selected.

Japanese Patent Laid-Open No. 2000-36966 discloses a method that likewise detects cut points and groups them on the basis of the correlation between cut point images. When such a group structure is shown on a display, the other cut point images are displayed as child images around the first cut point image. Furthermore, Japanese Patent Laid-Open No. 9-293139 discloses a method that clusters the frames of a moving image to build a hierarchy.

However, shots in moving images taken with a video camcorder (hereinafter "personal moving images") range from a few seconds to as long as one hour, so the variance in shot length is larger than the variance in cut length in commercial moving images. In addition, because shots are long, the cut point image is not necessarily a suitable image to represent the shot. For these reasons, integrating and listing shots as the unit corresponding to a cut of a commercial moving image does not allow the user to search efficiently for the desired part of the moving image.

Because the method disclosed in Japanese Patent Laid-Open No. 9-293139 processes every frame, it is more accurate than a method that builds the hierarchy approximately, using representative frames that each stand for a section of the moving image irrespective of cut or shot boundaries. However, its processing cost is high, and implementing it in a device that runs mainly on battery power, such as a video camcorder, would badly upset the performance balance of the product and is not feasible.

Furthermore, the method disclosed in Japanese Patent Laid-Open No. 2000-36966, which displays similar images in even smaller areas within the list of representative frames, is ill-suited to the small displays of digital cameras, mobile phones, and video camcorders: the images are too small to tell apart, and enlarging them impairs the at-a-glance overview of the list.

Japanese Patent Laid-Open No. 2002-27410
Japanese Patent Laid-Open No. 2000-36966
Japanese Patent Laid-Open No. 9-293139

An object of the present invention is to solve the above problems individually or collectively, and to enable a desired scene of a moving image to be searched for efficiently and at low cost.

Another object is to enable efficient search even when the display screen is small.

As one means for achieving the above objects, the present invention has the following configuration.

An image processing apparatus according to the present invention comprises: supply means for supplying, frame by frame, a moving image in which shot boundaries are defined; index generation means for detecting one or more representative frames from a shot on the basis of the correlation between the supplied frames and generating indexes associated with link information to the moving image; grouping means for generating groups by grouping the generated indexes according to a predetermined criterion; and recording means for recording information on the moving image, the indexes, and the groups on a recording medium.

An image processing method according to the present invention comprises: supplying, frame by frame, a moving image in which shot boundaries are defined; detecting one or more representative frames from a shot on the basis of the correlation between the supplied frames and generating indexes associated with link information to the moving image; generating groups by grouping the generated indexes according to a predetermined criterion; and recording information on the moving image, the indexes, and the groups on a recording medium.

According to the present invention, a desired scene of a moving image can be searched for efficiently and at low cost.

In addition, efficient search is possible even when the display screen is small.

Hereinafter, an image processing apparatus according to an embodiment of the present invention will be described in detail with reference to the drawings.

[Configuration]
FIG. 1 is a block diagram showing a configuration example of a digital camcorder according to the first embodiment that uses a removable magneto-optical disk as its recording medium.

The main control unit 201 includes a CPU 202, a ROM 203, a RAM 204, and the like. By executing the control programs stored in the ROM 203, the CPU 202 performs the various kinds of control in the digital camcorder, including the processes described below. The ROM 203 stores the processing units shown in FIG. 2 as control programs. Specifically, these are: a drive control unit 301 that controls the drive unit 205; a moving image recording unit 302 that records moving images on, and reproduces them from, the magneto-optical disk 206 mounted in the drive unit 205; a time code counter 303 that updates the time code used for time management of the moving images stored on the magneto-optical disk 206; a date and time timer 304 for obtaining the shooting date and time and the like; an index generation unit 305 that generates indexes of the moving images stored on the magneto-optical disk 206; a grouping unit 306 that generates index groups; and a navigation unit 307.

The drive unit 205 consists of a mechanical mechanism, a servo mechanism, an optical pickup, and the like, and is controlled by the main control unit 201.

[Initialization processing]
Next, the initialization processing performed when the magneto-optical disk 206 is mounted in the drive unit 205 will be described.

On detecting that the magneto-optical disk 206 has been mounted, the drive control unit 301 reads the information on the magneto-optical disk 206 shown in FIG. 3 (media management information 401, shot management information 402, index management information 403, and group management information 404) into the nonvolatile memory 211.

FIG. 4 shows the storage state of the nonvolatile memory 211 after the information 401-404 has been read from the magneto-optical disk 206. The camcorder ID 501 is an identifier for distinguishing individual camcorders; it is expressed as a nine-digit number and recorded when the camcorder is manufactured. The media counter 502 counts the number of unused media that the camcorder has handled, and holds "0" when the camcorder is manufactured. The media management information 401 read from the magneto-optical disk 206 includes the media ID 503, latest update date and time 504, maximum time code 505, shot count 506, index count 507, and similar information.

When the drive control unit 301 has read the media management information 401 into the nonvolatile memory 211, it determines whether the mounted magneto-optical disk 206 is unused. Because the media ID 503 of a magneto-optical disk 206 is set to "0" at manufacture, the disk is judged to be unused if the value of the media ID 503 is "0". For an unused magneto-optical disk 206, the initial value "0" is assigned to the maximum time code 505, the shot count 506, and the index count 507, and the media counter 502 is then incremented. Next, a new ID for identifying the medium is generated and written to the media ID 503. The generated ID is, for example, a 12-digit code whose upper nine digits are the camcorder ID (nine digits) recorded in the nonvolatile memory 211 and whose lower three digits are the count value (three digits) of the media counter 502. The number of digits of the IDs and the like is not limited to these values. If the magneto-optical disk 206 is judged not to be unused, the media counter 502 and the media ID 503 are not updated.
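The unused-medium check and ID generation can be sketched as below. The dictionary layout, key names, and function names are hypothetical stand-ins for the nonvolatile memory 211 and the media management information 401; only the nine-plus-three-digit concatenation and the order of steps come from the description.

```python
def make_media_id(camcorder_id: str, media_counter: int) -> str:
    """Concatenate a 9-digit camcorder ID and a 3-digit counter into a 12-digit code."""
    if len(camcorder_id) != 9 or not camcorder_id.isdigit():
        raise ValueError("camcorder_id must be nine decimal digits")
    if not 0 <= media_counter <= 999:
        raise ValueError("media_counter must fit in three digits")
    return f"{camcorder_id}{media_counter:03d}"


def initialize_medium(nv_memory: dict, media_info: dict) -> bool:
    """Initialize an unused medium; return True if the medium was unused."""
    if media_info["media_id"] != "0":
        return False  # used medium: counter and media ID are left untouched
    media_info["max_time_code"] = 0
    media_info["shot_count"] = 0
    media_info["index_count"] = 0
    nv_memory["media_counter"] += 1  # count this newly handled unused medium
    media_info["media_id"] = make_media_id(
        nv_memory["camcorder_id"], nv_memory["media_counter"]
    )
    return True
```

For example, a camcorder with ID `123456789` that has handled no media before would label its first disk `123456789001`.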

[Recording of moving images]
Next, the process of recording a moving image will be described.

When the shooting start button of the input unit 212 is pressed, light reflected from the subject and projected onto the image sensor 208 through the optical system 207 is converted into an electrical signal. All pixels of one frame are read out from the image sensor 208 by progressive scanning and input to the imaging system processing unit 209. After noise removal and automatic gain adjustment, the imaging system processing unit 209 performs A/D conversion and generates YPrPb data separated into a luminance signal Y and color difference signals Pr and Pb. The YPrPb data is accumulated in the buffer memory 210 and then compressed by the moving image recording unit 302 using a compression algorithm based on the discrete cosine transform (DCT). Finally, a synchronization code, an error correction code, and the like are added, and the data is recorded via the drive unit 205 in the moving image recording area 405 on the magneto-optical disk 206 (see FIG. 3). Recording of the moving image continues in this way until the shooting end button of the input unit 212 is pressed.

Meanwhile, when the shooting start button is pressed, the servo mechanism stabilizes, and the recording described above begins, the time code counter 303 starts counting up frame by frame, and the index generation unit 305 starts generating indexes and records the generated indexes in the nonvolatile memory 211 as the index management information 403. The grouping unit 306 groups the indexes and records the result in the nonvolatile memory 211 as the group management information 404. The generation and recording of indexes, and the grouping and its recording, are described in detail later.

When recording of a moving image starts, the drive control unit 301 also increments the shot count 506 in the nonvolatile memory 211 and appends newly generated shot information after the last valid shot information in the shot management information 402. As shown in FIG. 5, the shot management information 402 consists of multiple pieces of shot information, each containing a shot number, a time code, and a shooting date and time. The generated shot information takes the value of the shot count 506 as its shot number, the value of the maximum time code 505 as its time code, and the value of the date and time timer 304 at the moment the shooting start button was pressed as its shooting date and time.

When recording of the moving image is ended with the shooting end button, the drive control unit 301 stops the time code counter 303 and stores its value at that moment in the nonvolatile memory 211 as the maximum time code 505. It then writes the value of the date and time timer 304 to the nonvolatile memory 211 as the latest update date and time 504, and afterwards writes the media management information 401, shot management information 402, index management information 403, and group management information 404 held in the nonvolatile memory 211 to the magneto-optical disk 206.
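The shot bookkeeping at the start and end of recording might look like the following sketch. The class and field names (`ShotInfo`, `MediaState`, and so on) are hypothetical; the description specifies only which values are copied into each new piece of shot information and which values are updated when recording stops.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ShotInfo:
    shot_number: int    # value of the shot count when the shot started
    time_code: int      # maximum time code when the shot started
    shot_at: datetime   # timer value when the start button was pressed


@dataclass
class MediaState:
    shot_count: int = 0
    max_time_code: int = 0
    last_updated: Optional[datetime] = None
    shots: List[ShotInfo] = field(default_factory=list)


def start_recording(state: MediaState, now: datetime) -> ShotInfo:
    """Increment the shot count and append new shot information."""
    state.shot_count += 1
    info = ShotInfo(state.shot_count, state.max_time_code, now)
    state.shots.append(info)
    return info


def stop_recording(state: MediaState, time_code_counter: int, now: datetime) -> None:
    """Store the counter value as the new maximum time code and stamp the update time."""
    state.max_time_code = time_code_counter
    state.last_updated = now
```

Because each shot records the maximum time code at the moment it starts, the time codes of successive shots form a single continuous timeline across the medium.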

[Recording of indexes]
FIG. 6 is a flowchart illustrating the index recording process of the index generation unit 305.

When the shooting start button of the input unit 212 is pressed and the moving image recording described above begins, the index generation unit 305 takes one frame of YPrPb data of the moving image from the buffer memory 210 (S801) and determines whether a frame could be taken out (S802). If so, it extracts a first feature amount from the image data of that frame (S803). The feature amount is, for example, the average color of each block obtained by dividing the frame into N×M blocks, associated with the position of that block. Next, the image data, the feature amount, and the time code are stored in a cache buffer allocated in the RAM 204, as shown in FIG. 7 (S804).
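A minimal sketch of the block-average feature of step S803 follows. The frame representation (a list of rows of `(Y, Pr, Pb)` tuples), the integer block boundaries, and the function name are assumptions for illustration; the description only states that the frame is divided into N×M blocks and the average color of each block is kept with its position.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (Y, Pr, Pb)


def block_average_features(
    frame: List[List[Pixel]], n_blocks: int, m_blocks: int
) -> List[List[Pixel]]:
    """Return features[m][n], the average (Y, Pr, Pb) of the block in row m, column n."""
    height, width = len(frame), len(frame[0])
    if height < m_blocks or width < n_blocks:
        raise ValueError("frame is smaller than the block grid")
    features = []
    for m in range(m_blocks):  # vertical block index
        y0, y1 = m * height // m_blocks, (m + 1) * height // m_blocks
        row = []
        for n in range(n_blocks):  # horizontal block index
            x0, x1 = n * width // n_blocks, (n + 1) * width // n_blocks
            pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            count = len(pixels)
            row.append(tuple(sum(p[c] for p in pixels) / count for c in range(3)))
        features.append(row)
    return features
```

The position of each block is implicit in its indices within the returned grid, which keeps the feature compact enough to cache for many frames.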

The cache buffer is a ring buffer and is initialized at the start of index generation. Its size is the larger of the two sizes required by the representative frame extraction process: the size needed for the search range described later, and the size needed to determine the frame against which correlation is computed.

Steps S801 to S804 are repeated until a predetermined amount of data has accumulated in the cache buffer, as determined in step S805.
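The ring-buffer cache can be approximated with a bounded `collections.deque`, as in the sketch below. The class name `FrameCache`, its methods, and the choice to treat "full" as the predetermined amount of step S805 are assumptions; the description does not specify the data layout.

```python
from collections import deque
from typing import Any, Optional, Tuple

Entry = Tuple[int, Any, Any]  # (time code, feature amount, image data)


class FrameCache:
    """Fixed-capacity cache that silently discards the oldest frame when full."""

    def __init__(self, capacity: int) -> None:
        self._entries = deque(maxlen=capacity)

    def push(self, time_code: int, feature: Any, image: Any = None) -> None:
        """Store one frame's data (corresponds to step S804)."""
        self._entries.append((time_code, feature, image))

    def is_full(self) -> bool:
        """True once the assumed 'predetermined amount' has accumulated (step S805)."""
        return len(self._entries) == self._entries.maxlen

    def get(self, time_code: int) -> Optional[Entry]:
        """Look up a cached frame by time code, or return None if it was evicted."""
        for entry in self._entries:
            if entry[0] == time_code:
                return entry
        return None
```

A linear scan in `get` is adequate for a short buffer; a device implementation would more likely index the ring directly by time code modulo the capacity.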

When the predetermined amount of data has accumulated, or when it is determined in S802 that no frame could be taken out because the shooting end button has been pressed, the representative frame extraction process is performed (S806). In outline, this process focuses on one frame held in the cache buffer and evaluates whether that frame of interest can be a representative frame. If it can, the process refers to neighboring frames to judge whether the frame of interest is suitable as a representative frame and, if so, adds it to the index.

Next, it is determined whether evaluation has reached the last frame (S807). For example, if the shooting end button has been pressed, recording has ended, and no unprocessed frame remains in the buffer memory 210, evaluation can be judged to have reached the last frame. Because of the cache buffer, the frame evaluated as a representative frame lags behind the frame being taken out. Evaluation therefore does not begin until a certain time after index generation starts (S805), and to compensate it continues for the same length of time after frames can no longer be taken out. If the last frame has not yet been processed, the process returns to S801; otherwise it proceeds to S808.

Finally, the index information is recorded in the index management information 403 in the nonvolatile memory 211 (S808), and the process ends.

[Representative frame extraction processing]
FIG. 8 is a flowchart illustrating the representative frame extraction process (S806).

First, the frames between which correlation is computed are determined (S901). These are the frame of interest and the frame whose time code tc is given by the following equation:
tc = tp + a + b(tn - tp)
where tc < tn,
tn is the time code of the frame of interest,
tp is the time code of the immediately preceding representative frame, and
a ≥ 0, 0 ≤ b ≤ 1.

The time code tn is initialized to "0" at the start of index generation and advances by one frame each time this process is called.

The parameters a and b adjust how often representative frames are detected. For example, setting a = 3 (seconds) and b = 0 prevents many frames from being detected locally in a section with vigorous motion, compared with a = 0 and b = 0. Setting a = 0 and b = 0.5 makes the comparison frame the midpoint between the immediately preceding representative frame and the frame of interest; compared with a = 0 and b = 0, this suppresses the number of representative frames detected even where there is little motion. The values of a and b may be fixed when index generation starts, or adjusted during index generation according to, for example, the frequency of detected representative frames. Note that with b = 0 the cache buffer only needs to hold the feature amount of one frame, whereas with b = 1, for example, it must be large enough to cache the feature amounts of the entire moving image.
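As a worked illustration of the equation tc = tp + a + b(tn - tp), the sketch below computes the comparison time code in units of frames. Expressing a in frames, truncating tc to an integer frame, and returning `None` when tc ≥ tn are assumptions of this example; the function name is hypothetical.

```python
from typing import Optional


def comparison_time_code(tn: int, tp: int, a: float, b: float) -> Optional[int]:
    """Return tc = tp + a + b * (tn - tp) in frames, or None when tc >= tn."""
    if a < 0 or not 0 <= b <= 1:
        raise ValueError("requires a >= 0 and 0 <= b <= 1")
    tc = int(tp + a + b * (tn - tp))  # truncate to a whole frame (assumption)
    return tc if tc < tn else None
```

With tp = 0 and tn = 300, and taking a = 90 frames to stand for 3 seconds at an assumed 30 frames per second: a = 0, b = 0 compares against the previous representative frame itself (tc = 0); a = 90, b = 0 gives tc = 90, and no comparison frame exists at all until tn exceeds 90, which is what suppresses bursts of detections; a = 0, b = 0.5 gives the midpoint tc = 150.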

If enough memory cannot be secured for the cache buffer, the problem can be resolved easily in either of the following ways. The first method sets a maximum interval between representative frames: if no representative frame is detected for a predetermined time or longer, the frame at that predetermined time after the last representative frame is output as a representative frame. The second method lowers the resolution of the time code tc from one frame to, for example, five frames, so that the frames whose feature amounts are stored are thinned out to one fifth.
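Both memory-saving measures reduce to very small computations, sketched here under stated assumptions: time codes are integer frame counts, and the helper names `must_force_representative` and `quantize_time_code` are hypothetical.

```python
def must_force_representative(tn: int, tp: int, max_interval: int) -> bool:
    """First method: force a representative frame once max_interval frames have passed."""
    return tn - tp >= max_interval


def quantize_time_code(tc: int, step: int = 5) -> int:
    """Second method: snap tc down to a multiple of step so only 1/step of frames are kept."""
    if step <= 0:
        raise ValueError("step must be positive")
    return (tc // step) * step
```

The first measure bounds how far back the cache must reach, and the second reduces how many entries are stored within that reach.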

Next, it is evaluated whether the frame of interest is the first frame of a shot (S902). The first frame of a shot is always adopted as a representative frame, so in that case the process proceeds to step S907. If the frame of interest is not the first frame of a shot and the frame against which correlation is to be computed cannot be identified (for example, when tc ≥ tn), the process ends (S903).

If the comparison frame has been identified, correlation determination is performed (S904). Specifically, the feature amounts of the frames corresponding to time codes tn and tc are obtained from the cache buffer, and the distance d, the sum over the N×M blocks of the differences between their feature amounts, is computed by the following equation:
$$d = \sum_{n=1}^{N} \sum_{m=1}^{M} \left\{ w_1 \left| Y_i(n, m) - Y_q(n, m) \right| + w_2 \left| \mathrm{Pr}_i(n, m) - \mathrm{Pr}_q(n, m) \right| + w_3 \left| \mathrm{Pb}_i(n, m) - \mathrm{Pb}_q(n, m) \right| \right\}$$
where n and m are the horizontal and vertical index numbers of a block,
Y_i(n, m) is the luminance of the average color of the (n, m)-th block,
Pr_i(n, m) and Pb_i(n, m) are the color differences of the average color of the (n, m)-th block,
Y_q(n, m) is the luminance of the average color of the (n, m)-th block in the extracted feature amount,
Pr_q(n, m) and Pb_q(n, m) are the color differences of the average color of that (n, m)-th block, and
w_1, w_2, and w_3 are the weights used in computing the distance d.

The weights w1, w2, and w3 are tuned to improve accuracy. The smaller the distance d, the more similar the feature amounts and the higher the correlation between the frames.

Next, the distance d is compared with a predetermined threshold S (S905). If d ≤ S, the frame of interest is similar to the immediately preceding representative frame, so the process ends. The threshold S may be fixed when index generation starts, or adjusted during index generation according to, for example, the frequency of detected representative frames. If d > S, the search process described later is performed to look for a frame near time code tn that is better suited as a representative frame (S906).
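The distance computation of step S904 and the threshold test of step S905 can be sketched as follows. The feature layout (a grid of `(Y, Pr, Pb)` block averages), the default weights of 1.0, and the function names are assumptions for illustration; the weighted sum of absolute differences follows the equation for d.

```python
from typing import List, Tuple

Feature = List[List[Tuple[float, float, float]]]  # grid of (Y, Pr, Pb) block averages


def feature_distance(
    fi: Feature, fq: Feature, w1: float = 1.0, w2: float = 1.0, w3: float = 1.0
) -> float:
    """Weighted sum of absolute Y, Pr, and Pb differences over all blocks."""
    if len(fi) != len(fq) or any(len(ri) != len(rq) for ri, rq in zip(fi, fq)):
        raise ValueError("feature grids must have the same shape")
    d = 0.0
    for row_i, row_q in zip(fi, fq):
        for (yi, pri, pbi), (yq, prq, pbq) in zip(row_i, row_q):
            d += w1 * abs(yi - yq) + w2 * abs(pri - prq) + w3 * abs(pbi - pbq)
    return d


def is_candidate(d: float, s: float) -> bool:
    """True when d > S, i.e. the frames differ enough to start the search process."""
    return d > s
```

Raising `w1` relative to `w2` and `w3` makes the test more sensitive to brightness changes than to color changes; the description leaves the actual weights to tuning.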

Next, the detected information is set in each item of the index management information 403, a reduced image (thumbnail image) is created from the image data of the representative frame in the cache buffer and added to the index management information as shown in FIG. 9 (S907), and the process ends. The reduced image could instead be created when the image data is registered in the cache buffer, but reduced images would then be created for every frame. The timing is therefore decided by weighing the reduction ratio, the memory that can be allocated, the reusability of data generated while extracting the feature amount, the processing cost of reduction, and similar factors.

[Search processing]
The purpose of the search process is to find, near the frame selected as a representative frame candidate in the preceding step, a frame that is better suited as a representative frame. To that end, the search process of this embodiment computes the correlation between frames separated by a predetermined interval and selects a place with little motion, that is, a place (frame) with little blur.

FIG. 10 is a flowchart illustrating the search process in detail.

First, the variables are initialized (S1001): the time code tn is assigned to the variable t, which indicates the reference frame position, and "∞" is assigned to the variable min_d, which holds the distance at the place with the least motion.

Next, "0" is assigned to the loop control variable i and a value indicating the search range is assigned to the variable k (S1002). The loop end condition is then tested (S1003), and when the loop has ended (i = k) the search process finishes and outputs the search result. Here, k is a predetermined value indicating the upper limit of the search range.

Next, it is determined whether or not to perform a forward search (S1004). This determination takes two points into account. First, no search is needed when the next frame for which the distance d would be obtained is the previous representative frame or is temporally earlier than it. Second, depending on the value of the coefficient a in the process of step S901, frames in the forward direction are highly correlated with the immediately preceding representative frame, so the forward search range may be made narrower than the backward search range. In the latter case, the search may be judged unnecessary once the variable i exceeds a predetermined threshold smaller than the search range k. If the forward search is unnecessary, the process proceeds to step S1008.

When the forward search is necessary, the feature amounts of the frames at time codes (t-i-n) and (t-i) are obtained from the cache buffer and the distance d between them is calculated (S1005). The distance d is obtained by the same method as in step S904. Here, n is a predetermined nonzero integer. Some imaging systems secure brightness when shooting in dark places by lengthening the interval at which images are read from the image sensor; frames with identical content then follow one another, so setting n close to zero is not appropriate. The obtained distance d is then compared with the minimum distance so far, min_d (S1006). If d < min_d, the motion of the moving image is judged to be the smallest within the range searched so far, and the search result is updated (S1007). That is, d is assigned to min_d, and the frame at time code t-i becomes the representative frame candidate.

Next, it is determined whether or not to perform a backward search (S1008). This determination takes into account that no search is needed when the next frame for which the distance would be obtained belongs to the next shot. If the backward search is unnecessary, the process proceeds to step S1012.

When the backward search is necessary, the feature amounts of the frames at time codes (t+i-n) and (t+i) are obtained from the cache buffer and the distance d between them is calculated (S1009). The obtained distance d is compared with the minimum distance so far, min_d (S1010). If d < min_d, the motion of the moving image is judged to be the smallest within the range searched so far, and the search result is updated (S1011). That is, d is assigned to min_d, and the frame at time code t+i becomes the representative frame candidate.

Next, the variable i is incremented, which widens the search range by one frame (S1012), and the process returns to step S1003.
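The loop of steps S1001 to S1012 can be sketched as follows. This is only an illustrative sketch under several assumptions that are not in the text: time codes are treated as integer frame numbers, the cache buffer is a dictionary from frame number to feature vector, the distance of step S904 is stood in for by a sum of absolute differences, n is positive, and the function and parameter names (search_representative, prev_rep, shot_end) are hypothetical.

```python
import math

def distance(f1, f2):
    # Assumption: a sum of absolute differences stands in for the
    # distance of step S904, which is defined elsewhere in the document.
    return sum(abs(a - b) for a, b in zip(f1, f2))

def search_representative(cache, tn, k, n, prev_rep, shot_end):
    """Return (time code of the calmest frame near tn, its distance).

    cache    : dict mapping frame number -> feature vector (hypothetical layout)
    tn       : time code of the candidate chosen by the preceding process
    k        : upper limit of the search range
    n        : predetermined nonzero frame interval (assumed positive here)
    prev_rep : time code of the previous representative frame
    shot_end : last time code of the current shot
    """
    t, min_d, best = tn, math.inf, tn                      # S1001
    i = 0                                                  # S1002
    while i != k:                                          # S1003
        # S1004: skip the forward search once it would reach the previous
        # representative frame or anything earlier.
        if t - i - n > prev_rep:
            d = distance(cache[t - i - n], cache[t - i])   # S1005
            if d < min_d:                                  # S1006
                min_d, best = d, t - i                     # S1007
        # S1008: skip the backward search once it would enter the next shot.
        if t + i <= shot_end:
            d = distance(cache[t + i - n], cache[t + i])   # S1009
            if d < min_d:                                  # S1010
                min_d, best = d, t + i                     # S1011
        i += 1                                             # S1012
    return best, min_d
```

The narrower forward range mentioned for step S1004 could be modeled by an extra threshold on i; it is left out here to keep the sketch short.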

[Grouping]
Next, grouping in the grouping unit 306 will be described.

FIG. 11 is a diagram showing a configuration example of the grouping management information 404. During grouping, in addition to the information shown in FIG. 11, the average of the feature amounts of the frames that are members of each group is held as that group's representative feature amount.

FIG. 12 is a flowchart for explaining the details of grouping.

First, each representative frame is initialized as a provisional group with a member count of "1", and the feature amount of each representative frame is used as the group feature amount for obtaining the correlation (S1501).

Next, the pair of groups with the smallest inter-group distance is found among all groups, and that distance is set as min_d (S1502). The distance d is obtained by the same method as in step S904. Multidimensional indexing techniques, which are being actively studied as a way to speed up nearest-neighbor search of multidimensional vectors, can also be used here.

Next, min_d is compared with a predetermined threshold S2 (S1503). If min_d ≥ S2, further grouping is judged impossible and the process proceeds to step S1507. If min_d < S2, the two most similar groups (those with the smallest inter-group distance) are merged into one (S1504); that is, the indexes that are members of each group are incorporated into a single group. Next, the feature amounts of the members of the merged group are averaged to obtain its representative feature amount (S1505), and it is determined whether the number of groups has fallen to the desired number (S1506). If it has not, the process returns to step S1502.

When grouping is impossible or the number of groups has reached the desired number, groups with only one member, that is, groups left isolated without being merged, are deleted (S1507), and a representative index is determined for each group (S1508). The index with the smallest time code among the members of the group is adopted as the representative index. The representative index number and the like are then recorded in the group management information 404 (S1509), and the process ends.
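The grouping of steps S1501 to S1508 amounts to a threshold-limited agglomerative merge, which can be sketched as below. The sketch rests on assumptions not stated in the text: each index is represented as a (time code, feature vector) pair, the distance of step S904 is replaced by a sum of absolute differences, the closest pair is found by brute force rather than a multidimensional index, and the names group_indexes, s2, and target_groups are hypothetical.

```python
def distance(f1, f2):
    # Assumption: a sum of absolute differences stands in for the
    # distance of step S904.
    return sum(abs(a - b) for a, b in zip(f1, f2))

def group_indexes(indexes, s2, target_groups):
    """indexes: list of (time_code, feature_vector) pairs (hypothetical layout).

    Returns the groups that have two or more members, each as a dict with
    the sorted member time codes and the representative time code.
    """
    # S1501: every representative frame starts as a one-member group.
    groups = [{"members": [i], "feature": list(f)}
              for i, (_, f) in enumerate(indexes)]
    while True:
        # S1502: brute-force search for the closest pair of groups.
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = distance(groups[a]["feature"], groups[b]["feature"])
                if best is None or d < best[0]:
                    best = (d, a, b)
        # S1503: stop when nothing is close enough to merge.
        if best is None or best[0] >= s2:
            break
        _, a, b = best
        merged = groups[a]["members"] + groups[b]["members"]        # S1504
        dim = len(indexes[merged[0]][1])
        feature = [sum(indexes[m][1][j] for m in merged) / len(merged)
                   for j in range(dim)]                             # S1505
        groups = [g for pos, g in enumerate(groups) if pos not in (a, b)]
        groups.append({"members": merged, "feature": feature})
        if len(groups) <= target_groups:                            # S1506
            break
    result = []
    for g in groups:
        if len(g["members"]) > 1:                                   # S1507
            time_codes = sorted(indexes[m][0] for m in g["members"])
            # S1508: the smallest time code represents the group.
            result.append({"members": time_codes,
                           "representative": time_codes[0]})
    return sorted(result, key=lambda g: g["representative"])
```

Indexes that end up in no returned group correspond to the isolated indexes that the list display shows without a group decoration.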

[Example of index output]
Next, an example of index output by the navigation unit 307 will be described. When the moving image navigation button of the input unit 212 is pressed, the index management information 403 and the group management information 404 are read from the nonvolatile memory 211, and a list of indexes is displayed on the display unit 213.

FIG. 13 is a diagram showing an example in which a list of indexes is output to the display unit 213. The indexes and the representative indexes of groups are arranged horizontally from the upper left in order of their time codes into the moving image. For grouped indexes, the time code of the group's representative index is used. The display indicated by reference numeral 1601 in FIG. 13 is a reduced display of the representative index image of a group. Reference numeral 1602 is a decoration indicating a group, designed to look like three stacked images. Reference numeral 1603 is a reduced display of the image of an isolated index that has not been grouped. If the indexes do not all fit on the screen of the display unit 213, all of them can be displayed by scrolling or screen switching according to instructions from the input unit 212.
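The ordering rule for the list can be sketched as follows. This is a minimal illustration, assuming that time codes are integers, that each group is given as a list of its members' time codes, and that the grid has a fixed number of columns; the function name build_index_list and its parameters are hypothetical.

```python
def build_index_list(index_time_codes, groups, columns):
    """Arrange isolated indexes and group representatives in time code order.

    index_time_codes : time codes of all indexes
    groups           : list of groups, each a list of member time codes
    columns          : number of thumbnails per row (assumed fixed)

    Returns rows of (time_code, is_group) pairs, filled from the upper left.
    """
    grouped = {tc for g in groups for tc in g}
    # Isolated indexes keep their own time code (display 1603).
    items = [(tc, False) for tc in index_time_codes if tc not in grouped]
    # A group is placed at the time code of its representative index,
    # which is the smallest time code among its members (display 1601).
    items += [(min(g), True) for g in groups]
    items.sort()
    return [items[r:r + columns] for r in range(0, len(items), columns)]
```

The is_group flag is what a renderer would use to decide whether to draw the stacked-image decoration or a group icon.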

FIG. 14 is a diagram showing another form of the index list display. To indicate a group, an icon such as that indicated by reference numeral 1604 is displayed in place of the stacked images of reference numeral 1602. Many variations of the index layout are possible; the essential point is that the list display shows the user whether or not each index is a group.

[Example of playback]
Next, an example of selection and playback by the navigation unit 307 will be described. The feature of this moving image playback is that different playback is performed depending on how a grouped index is selected, so that the user is interactively navigated to the desired moving image.

Only one of the indexes listed on the display unit 213 can be in the attention state at a time. A group's representative index has two attention states: an image attention state and a group attention state. FIG. 15 is a diagram showing how the display of each part of the index list changes in the attention state. The thickness of the image frame or the size of the icon is emphasized so that the user can easily recognize the image or index of interest. Since the purpose is simply to emphasize the image or index of interest, other highlighting techniques may of course be used, such as changing the color of the frame or icon, or animation such as rotating the icon.

The user can freely move the attention state from one image or group to another by operating the input unit 212. For example, FIG. 16 is a diagram showing the directions in which the image or index of interest moves when the input unit 212 has a cross-shaped direction key with four buttons: up, down, left, and right. Suppose the image in the left column of the center row is the image of interest. Pressing the up, down, or left key moves the attention state to the image (or group) in that direction. Pressing the right key puts the center image (or group) in the attention state; pressing it again makes the image in the right column of the center row the image of interest; and pressing it once more makes the image in the left column of the lower row the image of interest, following time code order. Although not shown, when a key pointing off the screen (up or down) is pressed, the screen is switched or scrolled vertically as necessary, so that any of the indexes or group representative indexes can be put in the attention state.
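One way to model the cross-key movement is to treat the list as a row-major sequence in time code order and move the attention position by a fixed step per key. The sketch below is an assumption-laden illustration: the function name move_focus is hypothetical, the left key at the left column is assumed to step back to the previous item in time code order, and scrolling or screen switching is not modeled.

```python
def move_focus(pos, key, count, columns):
    """Return the new attention position after a cross-key press.

    pos     : current position in the time-code-ordered list (0-based)
    key     : "up", "down", "left", or "right"
    count   : total number of items in the list
    columns : number of items per row
    """
    # Left/right follow time code order, so "right" at the end of a row
    # lands on the left column of the next row. Up/down jump a whole row.
    step = {"left": -1, "right": 1, "up": -columns, "down": columns}[key]
    new_pos = pos + step
    # Assumption: a move that would leave the list keeps the current position.
    return new_pos if 0 <= new_pos < count else pos
```

For a 3-column grid, starting at the left column of the center row (position 3), three presses of the right key visit positions 4, 5, and 6, which matches the order described for FIG. 16.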

When a pointing device that can designate coordinates on the screen, such as a mouse or a trackpad, is available as the input unit 212, designating coordinates inside the area of a group's representative index image, or inside the area of an index image, results in the image attention state. Selecting coordinates inside the area of the decoration 1602 or the icon 1604 results in the group attention state. In addition, if a scroll area (not shown) is defined and displayed on the screen of the display unit 213, designating coordinates in the scroll area makes it possible to put any of the indexes or group representative indexes in the attention state.

In the image attention state, when the user operates the OK button of the input unit 212 to instruct a transition to the next process, the moving image recorded on the magneto-optical disk 206 is accessed, and cue playback of the moving image (display on the display unit 213) is performed based on the time code indicated by the index. When a transition to the next process is similarly instructed in the group attention state, the display shifts to a state in which each member of the group of interest can be checked in detail.

FIG. 17 is a diagram showing an example of the display screen of the display unit 213 after the shift to the display state in which each member of the group can be checked.

The members denoted by reference numerals 1902, 1905, and 1908 in FIG. 17 are the members of the group arranged from top to bottom in ascending order of time stamp. The indexes immediately before and after each member are displayed to the left and right of members 1902, 1905, and 1908. That is, reference numeral 1901 is the index image with the time stamp immediately before member 1902, and reference numeral 1903 is the index image with the time stamp immediately after member 1902. Similarly, reference numerals 1904 and 1906 are the index images before and after member 1905, and reference numerals 1907 and 1909 are the index images before and after member 1908. Although not shown, in this case too the screen can be switched or scrolled, as described with reference to FIG. 16, to display the index images before and after every member of the group. Depending on the size of the display screen, a plurality of index images may be displayed on each side.

Member 1902, whose frame is highlighted, is in the attention state. When a transition to the next process is instructed here, cue playback of the moving image is performed from the time code of the image in the attention state, in the same manner as described with reference to FIG. 16.

FIG. 18 is a diagram showing another example of the display screen of the display unit 213 after the shift to the state in which each member of the group can be checked.

The members denoted by reference numerals 2001 and 2002 in FIG. 18 are the index images of the members of the selected group. Moving images of a predetermined duration spanning each member's time stamp are played back repeatedly and continuously on the multi-screen denoted by reference numeral 2003. As another example, the moving images of a predetermined duration spanning the time stamps of the group members may be played back continuously on a single screen.

FIG. 19 is a diagram showing another example of the display screen of the display unit 213 after the shift to the state in which each member of the group can be checked.

In FIG. 19, images and other elements are laid out on the display unit 213 so that more information can be provided. Reference numeral 2101 is the moving image playback area, and reference numerals 2102-2104 are frame images at the time codes of the members of the selected group; frame image 2103, whose frame is highlighted, is the member in the attention state. Bar 2107 is a timeline representing the entire moving image, and markers 2104-2106 indicate where all the members of the group are located within the entire moving image; in particular, the black marker 2105 indicates the position corresponding to the frame image of interest 2103. Marker 2108 indicates the position currently being played back in the playback area 2101.

The frame of interest can be moved to the preceding or following member with the left and right keys of the cross key of the input unit 212, and members that do not fit on the screen can be displayed by screen switching or scrolling. If the input unit 212 includes a pointing device, designating coordinates inside the areas of the frame images 2102-2104 or the markers 2104-2106 puts the corresponding group member in the attention state. The image of the member that has entered the attention state is then displayed as the center frame image 2103, and the members before and after it are displayed as frame images 2102 and 2104. When the frame of interest moves, the marker 2105 of course moves to the corresponding position as well, and playback of the moving image starts from that position.

Many variations are conceivable in the number, size, and position of the parts of this layout. The essential point is that at least two of the following three are displayed on the screen at the same time: the index or group member in the attention state together with those before and after it, the playback screen, and the global marker display of the positions of the indexes or group members. This layout may also be used as the index output in place of FIG. 13 or FIG. 14. In that case, the index list is output in the area of the frame images 2102-2104, and whether or not an index is a group is shown by the presence or absence of the decoration or icon indicated by reference numeral 1602 or 1604 in FIG. 13 or FIG. 14.

FIG. 20 is a block diagram showing another configuration example of the digital camcorder (or digital camera). It differs from the first embodiment in that a card-type nonvolatile memory (memory card) 2301 is used as the recording medium and in that the drive unit 205 is omitted because there is no mechanical part. The basic operation is the same as in the first embodiment except that there are no moving parts. To handle moving images, it suffices to give the memory card 2301 the same structure as the magneto-optical disk 206 of the first embodiment (see FIG. 3); the same processing as in the first embodiment can then be performed.

The image processing of a second embodiment according to the present invention will be described below. In this embodiment, components substantially the same as those of the first embodiment are denoted by the same reference numerals, and their detailed description is omitted.

FIG. 21 is a block diagram showing a configuration example of a computer according to the second embodiment. It differs from the configuration of the first embodiment in that the optical system 207, the image sensor 208, the imaging system processing unit 209, the buffer memory 210, and the nonvolatile memory 211 are omitted and a hard disk 2502 is connected instead, and in that it can be connected to another computer 2504 via a communication interface (I/F) 214 and a network 2503.

The configuration of the second embodiment (a computer) has no function for directly capturing and recording moving images, but it can store in the hard disk 2502 a moving image held on the magneto-optical disk 206, or a moving image saved on another computer 2504 and obtained via the network 2503. The media management information, shot management information, and index management information that were stored on the magneto-optical disk 206 (or the memory card 2301) in the first embodiment are stored in the hard disk 2502, and the media management information, shot management information, index management information, and group management information that were read into the nonvolatile memory 211 in the first embodiment are read into a predetermined area of the RAM 204.

For a moving image stored in the hard disk 2502, the index generation unit 305 can create and update the media management information, shot management information, index management information, and group management information in the same manner as in the first embodiment. This processing may be performed when the moving image is stored in the hard disk 2502.

Since the second embodiment is a computer, the processing programs of the moving image recording unit 302 and the index generation unit 305 may be stored in advance on the magneto-optical disk 206, read into the RAM 204 when the computer starts up, and executed as processing programs in the RAM 204.

[Modification]
The above describes an example in which a time code is used as the information representing the start position or start time on the recording medium, but other information, such as real time (clock time), may be used. Alternatively, the frame number counted from the beginning may be used.

The above describes an example in which, in the index generation method and the index selection method, the feature amount is obtained from the supplied frame image. However, an image obtained secondarily by applying some processing to the frame image, such as color conversion, edge extraction, trimming, or masking, may be used instead. An image averaged over time with neighboring frames may also be used. Averaging over time makes it possible to detect a representative frame with reduced influence even when a single frame differs from the frames before and after it because of noise, non-volatility, or the like.

The above describes an example that uses grid-like block division for the feature amount from which the correlation is obtained. However, as long as a first correlation calculation method with a low processing cost and a narrow tolerance for correlation is combined with a second correlation calculation method with a still lower processing cost but a wide tolerance, other correlation calculation methods and other feature amounts obtained from the frame image may be used. For example, the image feature descriptors for color, texture, shape, and the like defined in ISO/IEC 15938-3 may be used.

The above describes an example in which a reduced image is recorded as the image data when an index is recorded, but the image data extracted in step S801 may be recorded as it is. Further, in the second embodiment, a desired time code in the moving image data can be accessed in a short time, so image data can be obtained from the moving image itself. The image data therefore need not be stored in the index, which reduces the data amount of the index information.

Attached information about the representative frame may also be recorded together with the index information. Examples of attached information include information indicating whether or not the representative frame is the head of a shot, audio information, text obtained by speech recognition, text obtained by character recognition of captions or other characters appearing in the frame, camera information such as the exposure state, and camera work information such as panning and zooming. Outputting this information alongside the representative frame when it is displayed or printed makes it easier for a person to find the desired representative frame. The text information can also be used for keyword search of representative frames.

In the above description the indexes are only grouped, but the groups may be further grouped into a hierarchy. Briefly, if the grouping unit records the process by which groups were merged, a binary tree is generated. Since a node of the binary tree is a set of groups, the feature amount of a node can be obtained as the average of the feature amounts of the groups it contains. Distances can be calculated by the same method as in step S904. By repeatedly joining nodes that are a short distance apart, the binary tree can be transformed into a shallow n-ary tree. The management information may be provided with a table for managing the parent-child relationships between groups. For display, the earliest time code within each node is taken as the node's time code, and the corresponding frame image is displayed. Whether or not a node has children can be indicated by the presence or absence of the decoration 1602 or the icon 1604.

[Other embodiments]
The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

Needless to say, the object of the present invention is also achieved by supplying a storage medium (or recording medium) on which program code of software realizing the functions of the above-described embodiments is recorded to a system or apparatus, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present invention. Needless to say, this covers not only the case where the functions of the above-described embodiments are realized by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

Needless to say, this also covers the case where the program code read from the storage medium is written into a memory provided on a function expansion card inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion card or in the function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

When the present invention is applied to the above storage medium, the storage medium stores program code corresponding to the flowcharts described above.

FIG. 1 is a block diagram showing a configuration example of the digital camcorder of the first embodiment, which uses a removable magneto-optical disk type recording medium.
FIG. 2 is a diagram showing the control programs for various processes stored in the ROM.
FIG. 3 is a diagram showing the various information stored on the magneto-optical disk.
FIG. 4 is a diagram showing the storage state of the nonvolatile memory after information has been read from the magneto-optical disk.
FIG. 5 is a diagram for explaining the shot management information.
FIG. 6 is a flowchart for explaining the index recording process performed by the index generation unit.
FIG. 7 is a diagram for explaining the storage state of the cache buffer.
FIG. 8 is a flowchart for explaining the representative frame extraction process.
FIG. 9 is a diagram for explaining the index management information.
FIG. 10 is a flowchart for explaining the details of the search process.
FIG. 11 is a diagram for explaining the grouping management information.
FIG. 12 is a flowchart for explaining the details of grouping.
FIG. 13 is a diagram showing an example in which a list of indexes is output to the display unit.
FIG. 14 is a diagram showing another form of the index list display.
FIG. 15 is a diagram showing how the display of each part of the index list changes in the attention state.
FIG. 16 is a diagram showing the directions in which the image or index of interest moves.
FIG. 17 is a diagram showing an example of the display screen of the display unit after the shift to the display state in which each member of the group can be checked.
FIG. 18 is a diagram showing another example of the display screen of the display unit after the shift to the state in which each member of the group can be checked.
FIG. 19 is a diagram showing another example of the display screen of the display unit after the shift to the state in which each member of the group can be checked.
FIG. 20 is a block diagram showing another configuration example of the digital camcorder (or digital camera).
FIG. 21 is a block diagram showing a configuration example of the computer according to the second embodiment.

Claims

1. An image processing apparatus comprising:
supply means for supplying, in units of frames, a moving image in which shot boundaries are defined;
index generating means for detecting one or more representative frames from a shot based on the correlation between the supplied frames, and for generating an index associated with link information to the moving image;
grouping means for generating a group by grouping the generated indexes according to a predetermined criterion; and
recording means for recording information on the moving image, the index, and the group on a recording medium.

2. The image processing apparatus according to claim 1, further comprising navigation means for outputting, for display, a list of images corresponding to frames of the moving image recorded on the recording medium in accordance with the index and group information recorded on the recording medium, and for reproducing the moving image based on the index corresponding to an image selected from the list.

3. The image processing apparatus according to claim 1, wherein the grouping means performs the grouping based on the similarity of the frame images of the indexes.

4. The image processing apparatus according to claim 1, wherein the grouping means determines, from among the indexes in a group, a representative index for that group.

5. The image processing apparatus according to claim 4, wherein the navigation means outputs, for display, a symbol that makes it possible to determine whether an image in the list represents a group of indexes, and outputs, for display, the image of the frame corresponding to the representative index as the image of that group of indexes.

6. The image processing apparatus according to claim 5, wherein the navigation means reproduces the moving image by a different method depending on whether the symbol is selected or an image in the list is selected.

7. An image processing method comprising:
supplying, in units of frames, a moving image in which shot boundaries are defined;
detecting one or more representative frames from a shot based on the correlation between the supplied frames, and generating an index associated with link information to the moving image;
generating a group by grouping the generated indexes according to a predetermined criterion; and
recording information on the moving image, the index, and the group on a recording medium.

8. A program for controlling an information processing apparatus to execute image processing according to claim 7.

9. A recording medium on which the program according to claim 8 is recorded.
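
For illustration only, the flow recited in claims 1, 3, 4 and 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the embodiments: each frame is assumed to be already reduced to a feature vector such as a luminance histogram, histogram intersection stands in for the "correlation" and "similarity" of the claims, and the names `similarity`, `Index`, `Group`, `representative_frames`, `group_indexes` and the threshold value 0.8 are all hypothetical. Recording on the recording medium is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

# Assumption: a frame is represented by a feature vector (e.g. a histogram).
Frame = Sequence[float]


def similarity(a: Frame, b: Frame) -> float:
    """Histogram intersection, normalized by the total of `a` (assumed measure)."""
    total = sum(a) or 1.0
    return sum(min(x, y) for x, y in zip(a, b)) / total


@dataclass
class Index:
    shot_id: int
    frame_no: int   # link information: frame at which reproduction starts
    feature: Frame


@dataclass
class Group:
    representative: Index                       # index representing the group
    members: List[Index] = field(default_factory=list)


def representative_frames(shot_id: int, first_frame_no: int,
                          frames: Sequence[Frame],
                          threshold: float = 0.8) -> List[Index]:
    """Emit an index whenever a frame differs enough from the last one chosen."""
    indexes: List[Index] = []
    for offset, frame in enumerate(frames):
        if not indexes or similarity(indexes[-1].feature, frame) < threshold:
            indexes.append(Index(shot_id, first_frame_no + offset, frame))
    return indexes


def group_indexes(indexes: Sequence[Index],
                  threshold: float = 0.8) -> List[Group]:
    """Greedy grouping: join the first group whose representative is similar."""
    groups: List[Group] = []
    for idx in indexes:
        for group in groups:
            if similarity(group.representative.feature, idx.feature) >= threshold:
                group.members.append(idx)
                break
        else:
            # No similar group found: this index starts and represents a new one.
            groups.append(Group(representative=idx, members=[idx]))
    return groups
```

In this sketch the first index placed in a group serves as its representative index. Other criteria, such as grouping by shot or by recording time, would fit the "predetermined criterion" of claim 1 equally well.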