JP2007151118A

JP2007151118A - Method and apparatus for detecting feature scene of moving image

Info

Publication number: JP2007151118A
Application number: JP2006316461A
Authority: JP
Inventors: Akio Nagasaka; 晃朗長坂; Takafumi Miyatake; 孝文宮武; Takehiro Fujita; 武洋藤田; Katsumi Taniguchi; 勝美谷口
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2006-11-24
Filing date: 2006-11-24
Publication date: 2007-06-14
Anticipated expiration: 2015-08-18
Also published as: JP4007406B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and apparatus for quickly and easily determining whether a scene in a video is important and for specifying its range.

SOLUTION: The apparatus comprises means for inputting a target moving image to a processing device frame by frame in time series; means for buffering a plurality of frames previously input to the processing device; and means for determining whether the feature amounts of the buffered frames approach the feature amount of the latest frame monotonically, in order from the oldest frame. When the determination means returns true, a special video effect is judged to be present, and means are provided for extracting, as an important scene, the video section that runs from that frame for a fixed time or until the next special video effect is detected.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a method and apparatus for rapidly browsing moving images such as videos and movies so that their content can be grasped in a short time. In particular, it relates to a method and apparatus for detecting feature scenes of a moving image, which identify scenes representative of the moving image by detecting dissolves between cuts in a moving image stored on a video tape or video disk. Here a cut is an uninterrupted moving-image section shot by a single camera, and a dissolve is a special video effect in which, at the transition between consecutive cuts A and B, A fades out while B simultaneously fades in.

In recent years, satellite broadcasting and cable television have been spreading alongside conventional television broadcasting, and the number of broadcast channels is growing. Once the broadband communication infrastructure known as the information highway is in place, distributing broadcasts will become easier, more broadcasters than at present will enter the market, and the growth in channels is expected to accelerate. Distinguishing, within this mass of broadcast information, what is useful to an individual viewer from what is not, and selecting it, takes a great deal of time and effort. For this reason, research is under way on techniques for efficiently creating summary information (digests) that let video content be grasped quickly. The most basic and indispensable step in creating a digest is selecting the important scenes from the video. If a computer could automatically judge the importance of each scene in a video, creating a digest would become very simple. For example, Japanese Patent Laid-Open No. 4-294694 describes a method for baseball broadcasts that selects highly important scenes by focusing on the correspondence between the movements of moving objects in the video and specific events (for example, the correspondence between a runner reaching home plate and a run being scored).

Japanese Patent Laid-Open No. 4-294694 (JP-A-4-294694)

However, at the current state of the art in image recognition, motion analysis of moving objects is insufficient in both accuracy and processing speed, and the motion patterns it yields do not necessarily correspond to specific events. Even when an event is detected correctly, it is extremely difficult to determine automatically how much of the video before and after it should be extracted as the important scene. Furthermore, although a digest is far shorter than the full video, it still takes a certain amount of time to watch, so a technique that gives an even more concise overview is needed.

An object of the present invention is to provide a method for determining, simply and at high speed, whether a scene in a video is important and for specifying its range. A further object is to determine and classify the field to which a video belongs (news, sports broadcast, and so on) and to provide the result as information that helps the user select videos.

In many cases, broadcasters apply various video effects to broadcast video to emphasize important scenes. This tendency is especially pronounced in sports broadcasts, where, for example, a replay is shown when a point is scored. Replays often use footage shot by a camera with a different viewpoint, so a replay cannot be identified simply by checking whether the footage is exactly the same. However, when the broadcast switches to a replay, special video effects such as dissolves and wipes are used so that viewers can clearly tell that the program is temporarily departing from the normal broadcast. Similar video effects are used again when returning to the normal broadcast. Important scenes can therefore be selected by detecting these special video effects.

Accordingly, the target moving image is input to a processing device frame by frame in time series. The processing device checks, for each pixel in the frame, whether its color or luminance tends to move monotonically, across a group of consecutive frames, from its value in the first frame of the group toward its value in the last frame. From the number of pixels that satisfy this condition, it calculates an evaluation value representing the change over the whole screen. When the evaluation value falls outside a predetermined allowable range, the device determines that a scene transition produced by a special video effect such as a dissolve occurred in the section spanning those consecutive frames, and determines that the section or its vicinity is a characteristic point in the moving image.

In addition, the target moving image is input to the processing device frame by frame in time series, or the target audio is input to the processing device in time series. The processing device is provided with means for detecting changes in a plurality of types of image feature amounts, including cut changes and color tone, and, where necessary, means for detecting changes in audio feature amounts, including speaker changes. The type of program is discriminated on the basis of feature amounts consisting of the fact that the detection means detected a change, or that a plurality of changes occurred simultaneously or in a specific order.

According to the present invention, an important scene and its range can be obtained at the same time, so a digest video can be created automatically. Replayed scenes are generally important scenes, and by detecting sections containing special video effects, including dissolves, the present invention can accurately detect replay scenes within a broadcast.

Furthermore, the type of video is determined automatically by the means that discriminates the type of program on the basis of feature amounts consisting of changes in a plurality of types of image feature amounts, including cut changes and color tone, occurring simultaneously or in a specific order. A viewer can therefore reject a video of an uninteresting type without even watching its digest, which makes video selection efficient. Moreover, because the type of video is determined from simple changes in the image and audio and their combinations, the processing is fast.

Scenes that are replayed in a broadcast are the parts that experts have judged to be important, so creating a digest becomes extremely easy if such replay scenes can be detected. Because the present invention can detect scene transitions produced by special video effects, including dissolves, it can accurately extract the important scenes shown immediately before or after such effects. The range of each scene is obtained at the same time.

In addition, because the means that discriminates the type of program works from feature amounts consisting of changes in a plurality of types of image feature amounts, including cut changes and color tone, occurring simultaneously or in a specific order, the type of video is determined automatically. A video of a type that does not interest the viewer can thus be rejected without watching its digest, allowing efficient video selection. Because this determination relies only on simple changes in the image and audio and their combinations, the processing is fast.

An embodiment of the present invention is described in detail below.

FIG. 1 is an example of a schematic block diagram of a system configuration for implementing the present invention. Reference numeral 1 denotes a display device such as a CRT, which displays the output screen of the computer 4. Commands are given to the computer 4 through an input device 5 such as a keyboard or pointing device. The moving image playback device 10 is a tuner for receiving broadcast programs such as terrestrial, satellite, and cable television broadcasts, or a device for playing back moving images recorded on an optical disk, video tape, or the like. The video signal output from the moving image playback device is converted successively into digital image data by the A/D converter 3 and sent to the computer. Inside the computer, the digital image data enters the memory 9 through the interface 8 and is processed by the CPU 7 according to a program stored in the memory 9. If each frame of the moving image handled by the playback device 10 is assigned a number (frame number) in order from the start of the moving image, the moving image of a given scene can be called up and played back by sending its frame number to the moving image playback device over the control line 2. Various kinds of information can be stored in the external information storage device 6 as processing requires. The memory 9 holds the various data created by the processing described below, which are referred to as needed.

The following describes in detail a method for detecting dissolves, one type of cut change produced by special video effects, for use in selecting important scenes.

FIG. 2 is an example of a flowchart of the dissolve detection program for moving images that runs on the system shown in FIG. 1. The program is stored in the memory 9. As initialization, the CPU 7 first sets the variables needed for program execution to their initial values (200). Next, it assigns 0 to every element of the m two-dimensional arrays B(x, y) that hold the luminance values of the pixels of past frame images (202). When the frame image size is w × h, x takes values from 0 to w-1 and y from 0 to h-1. Process 204 captures a frame image output by the moving image playback device 10. Process 206 sets the variable eval, which holds the evaluation value, to 0 and assigns the initial value 0 to the loop counters. Processes 208 to 228 below are then performed for every pixel in the frame image.

Processes 208 to 228 detect a property peculiar to dissolves. As shown in FIG. 3, a dissolve is a cut change that has a section, such as B, around the cut transition in which frame images A and C of the preceding and following cuts are mixed. The mixing ratio of A and C in B starts at 100% A and 0% C when the dissolve begins and reverses over time, and the dissolve is complete when the ratio finally reaches 0% A and 100% C. For a grayscale image, let Ba, Bb, and Bc be the luminance values of A, B, and C, and let α (0 ≤ α ≤ 1) be the mixing proportion of C. Then Bb can be approximated by Bb = Ba × (1 - α) + Bc × α. Rearranging gives Bb = (Bc - Ba) × α + Ba, so in a dissolve in which α increases monotonically from 0, Bb also increases or decreases monotonically from Ba to Bc. A dissolve can therefore be detected by always keeping the pixel luminance values of the past m frames in a buffer and checking whether the luminance values increase or decrease monotonically over that m-frame section. Experiments show that setting m to about 8 to 15 gives good results.
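The blending model above can be checked numerically. The following is a minimal Python sketch, not part of the patent: the function names `blend` and `is_monotonic` and the sample luminance values 40 and 200 are illustrative assumptions. It evaluates Bb = Ba × (1 - α) + Bc × α for a linearly rising α and confirms that the result moves in one direction only.

```python
def blend(ba, bc, alpha):
    """Luminance of the mixed frame B: Bb = Ba * (1 - alpha) + Bc * alpha."""
    return ba * (1 - alpha) + bc * alpha


def is_monotonic(values):
    """True if the sequence never reverses direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


# alpha rises from 0 to 1 over a 30-frame dissolve (about 1 s of NTSC video)
alphas = [k / 29 for k in range(30)]
rising = [blend(40, 200, a) for a in alphas]    # Ba < Bc: luminance increases
falling = [blend(200, 40, a) for a in alphas]   # Ba > Bc: luminance decreases
```

Whether the pixel gets brighter or darker depends only on the sign of Bc - Ba, which is why the detection step has to accept both directions.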

First, process 208 assigns the luminance value of the pixel at coordinates (x, y) to the m-th array Bm of the two-dimensional arrays B that store the luminance values of past frames. The loop counter i is then set to 1 and the variable num to 0. Next, the luminance value B1(x, y) stored in the first array is compared with the value Bm(x, y) in the m-th array (212), and then the luminance value Bi(x, y) stored in the i-th array is compared with the value Bi+1(x, y) in the next array (214, 216). When B1(x, y) is larger than Bm(x, y), num is incremented by 1 if Bi(x, y) is larger than Bi+1(x, y). Conversely, when B1(x, y) is smaller than Bm(x, y), num is incremented by 1 if Bi(x, y) is smaller than Bi+1(x, y) (218). The subsequent process 220 assigns the value of Bi+1(x, y) to Bi(x, y), shifting the m arrays B along one at a time so that the buffer always holds the luminance values of the m most recent frames. Process 222 increments the loop counter i by 1, and the processing repeats until i exceeds m, returning to process 214 if B1(x, y) was larger than Bm(x, y) at process 212 and to process 216 otherwise (224). When num is larger than the threshold th1 (226), the pixel at coordinates (x, y) is regarded as increasing or decreasing sufficiently monotonically, and eval is incremented by 1 (228). Natural moving images always contain irregular fluctuations caused by noise and the like, and the speed of a dissolve is uneven rather than constant when a person operates it by hand, so using a threshold in the monotonicity test provides a margin. To apply this processing to every pixel in the frame image, the flow returns to 208 and repeats (230 to 236). As a result, eval holds the number of pixels that satisfy the dissolve property. Finally, eval is checked against the threshold th2 (238); if it exceeds th2, a dissolve is judged to be present and the dissolve detection process (240) is executed. The flow then returns to process 204, and the processing from 204 onward repeats until the end of the video.
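The per-pixel test of FIG. 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: frames are plain nested lists of luminance values, a `collections.deque` stands in for the explicit array shift of process 220, the m - 1 adjacent pairs in the buffer are compared, and the function names and the values chosen for th1 and th2 are hypothetical.

```python
from collections import deque


def dissolve_eval(buffer, th1):
    """Count pixels whose luminance moves monotonically through the buffer.

    buffer holds m frames, oldest first (B1 .. Bm); each frame is a list of
    rows of luminance values. The return value plays the role of eval.
    """
    m = len(buffer)
    height, width = len(buffer[0]), len(buffer[0][0])
    eval_count = 0
    for y in range(height):
        for x in range(width):
            first, last = buffer[0][y][x], buffer[-1][y][x]  # step 212
            num = 0
            for i in range(m - 1):
                cur, nxt = buffer[i][y][x], buffer[i + 1][y][x]
                if first > last and cur > nxt:    # overall decrease (214)
                    num += 1
                elif first < last and cur < nxt:  # overall increase (216)
                    num += 1
            if num > th1:                         # steps 226 and 228
                eval_count += 1
    return eval_count


def detect_dissolve_frames(frames, m, th1, th2):
    """Return the indices of frames at which eval exceeds th2 (step 238)."""
    buffer = deque(maxlen=m)
    hits = []
    for t, frame in enumerate(frames):
        if not buffer:  # step 202: start from m all-zero frames
            zero = [[0] * len(frame[0]) for _ in frame]
            buffer.extend([zero] * m)
        buffer.append(frame)  # newest frame in, oldest out (steps 208, 220)
        if dissolve_eval(buffer, th1) > th2:
            hits.append(t)
    return hits


def flat_frame(value, width=2, height=2):
    """Helper for the example: a frame filled with one luminance value."""
    return [[value] * width for _ in range(height)]


# 10 static frames, a 20-frame linear dissolve from 100 to 200, 10 static frames
levels = [100] * 10 + [100 + 5 * k for k in range(1, 21)] + [200] * 10
hits = detect_dissolve_frames([flat_frame(v) for v in levels], m=8, th1=5, th2=3)
```

In this synthetic example the detector fires only while the 8-frame buffer lies mostly inside the dissolve, so the run of hits is shorter than the dissolve itself, in line with the frame counts discussed for FIG. 4.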

With the above method, eval also comes out high when there is camera movement such as zooming or panning. When the camera moves, the luminance of each pixel in the frame image changes accordingly, and quite a few pixels end up with monotonically increasing or decreasing luminance. In some cases it is therefore hard to tell a dissolve from camera movement. The following describes a dissolve detection method that identifies dissolves more clearly.

In general, most dissolves last 1 second (30 frames for NTSC video) or longer. In a section where a dissolve is applied, eval therefore stays high for 22 frames when m = 8, and for 15 frames or more even when m = 15. With camera movement, by contrast, the value is not as high as during a dissolve, and it does not necessarily stay high continuously. Consequently, when the sum of the eval values over the past n frames is taken, there is a marked difference between the sum during a dissolve and the sum during camera movement. FIG. 4 shows a dissolve detection method that incorporates this idea.

First, as initialization, the variables needed for program execution are set to their initial values (400). Next, 0 is assigned to every element of the m two-dimensional arrays B(x, y) that hold the pixel luminance values of past frame images, and the n variables E1 to En that store the eval values of the past n frames are all set to 0 (402). When the frame image size is w × h, x takes values from 0 to w-1 and y from 0 to h-1. Process 404 captures a frame image output by the moving image playback device 10. Processes 206 to 236 shown in FIG. 2 are then executed to obtain eval (406), and the value of eval is assigned to En. The sum of E1 to En is computed into sum while the values are shifted by successively assigning Ej+1 to Ej, so that E1 to En always hold the most recent eval values (408 to 412). Finally, sum is compared with the threshold th3 (414). If it is larger, the dissolve detection process 240 is performed; otherwise nothing is done. The flow then returns to process 404 and repeats.
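The running sum of FIG. 4 can be sketched as below. This is an illustrative Python fragment under assumptions: the per-frame eval values are taken as already computed, `sustained_hits` is a hypothetical name, and the values of n and th3 are chosen only for the example.

```python
from collections import deque


def sustained_hits(evals, n, th3):
    """Return frame indices where the sum of the last n eval values exceeds th3."""
    window = deque([0] * n, maxlen=n)  # E1 .. En, all zero at the start (step 402)
    hits = []
    for t, e in enumerate(evals):
        window.append(e)               # En <- eval; older values shift out
        if sum(window) > th3:          # step 414
            hits.append(t)
    return hits


spike = [0, 0, 90, 0, 0, 0, 0]         # brief burst, e.g. camera movement
plateau = [0, 60, 60, 60, 60, 0, 0]    # sustained high values, e.g. a dissolve
spike_hits = sustained_hits(spike, n=4, th3=200)
plateau_hits = sustained_hits(plateau, n=4, th3=200)
```

A single large eval value never pushes the sum over th3 here, while a moderate value that persists for n frames does, which is the distinction the text draws between camera movement and a dissolve.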

The dissolve detection process 240 selects the scenes bracketed by dissolves as important scenes. Running the dissolve detection methods of FIG. 2 and FIG. 4 yields a graph of the evaluation value over time such as the one in FIG. 5. In a dissolve section the evaluation value does not show a large value for just an instant; instead it characteristically shows a triangular change, rising rapidly and then falling rapidly. The two vertices at the base of the triangle correspond approximately to the start and end points of the dissolve. When creating a digest, it looks unsightly if portions with a special video effect such as a dissolve remain at the beginning or end, so the section 507 from the point where one dissolve ends to just before the next dissolve begins is cut out. To do this, a second, lower threshold 502 is used in addition to the first threshold 500 that the above dissolve detection methods use to decide whether a dissolve is present. When a dissolve is detected as the start of an important scene, the start point of the scene is taken to be the point 506 at which the evaluation value first falls below the second threshold after the point 504 at which it exceeded the first threshold. The start point may be delayed to leave a margin. When a dissolve is detected as the end of an important scene, the end point of the scene is taken to be the point 508 at which the evaluation value first falls below the second threshold when looking back in time from the point 510 at which it exceeded the first threshold. As with the start point, the end point may be moved earlier to leave a margin. The time between dissolves can be used to decide whether a detected dissolve marks the start or the end of an important scene: while the normal broadcast continues there are no dissolves, so the interval between dissolves is long, whereas for an important scene the interval is relatively short. A digest is obtained by playing back the important scenes obtained in this way in order.
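The two-threshold trimming can be sketched as follows. This Python fragment is an illustration under assumptions: the function names are hypothetical, the evaluation values are a made-up series with two triangular peaks, and the first peak is simply treated as the opening dissolve and the second as the closing one, whereas the text decides this from the time between dissolves.

```python
def upward_crossings(values, th_high):
    """Indices where the value rises above the first threshold (points 504, 510)."""
    return [t for t in range(len(values))
            if values[t] > th_high and (t == 0 or values[t - 1] <= th_high)]


def scene_start(values, t_cross, th_low):
    """Scan forward from t_cross to the first value below th_low (point 506)."""
    for t in range(t_cross, len(values)):
        if values[t] < th_low:
            return t
    return None


def scene_end(values, t_cross, th_low):
    """Scan backward from t_cross to the first value below th_low (point 508)."""
    for t in range(t_cross, -1, -1):
        if values[t] < th_low:
            return t
    return None


# Two triangular peaks: an opening dissolve and a closing dissolve
values = [0, 1, 5, 9, 5, 1, 0, 0, 0, 1, 5, 9, 5, 1, 0]
opening, closing = upward_crossings(values, th_high=8)
start = scene_start(values, opening, th_low=2)
end = scene_end(values, closing, th_low=2)
```

The frames between `start` and `end` then exclude both dissolves; adding a few frames of margin on either side would correspond to the optional delay and advance mentioned in the text.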

The above embodiment examined monotonic changes in luminance, but similar changes in color can also be used. Unlike luminance, which is one-dimensional information, color is three-dimensional information, so a monotonic change cannot be tested simply by checking whether a value increases or decreases. Here, a monotonic change from color A to color B can be understood as a tendency, when the two colors are mapped into a three-dimensional color space, for the distance from color A to increase gradually while the distance to color B gradually decreases. Accordingly, a two-dimensional array B' that stores colors is used in place of the two-dimensional array B that stores the luminance values of past frames in FIG. 2. If it is determined that the colors in B' are ordered so that the color difference from B'1 increases while the color difference from B'm decreases, the rest of the procedure is the same as for luminance.
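For one pixel, the color version of the test might look like the Python sketch below. The choice of RGB coordinates with Euclidean distance as the color difference is an assumption made for illustration (the text only requires a three-dimensional color space), and `moves_monotonically` is a hypothetical name.

```python
import math


def moves_monotonically(colors, th1):
    """colors: the m buffered colors of one pixel, oldest first (B'1 .. B'm).

    Counts the steps that move away from the first color and toward the last
    color at the same time, and compares the count with th1.
    """
    first, last = colors[0], colors[-1]
    num = 0
    for cur, nxt in zip(colors, colors[1:]):
        away = math.dist(nxt, first) > math.dist(cur, first)
        toward = math.dist(nxt, last) < math.dist(cur, last)
        if away and toward:
            num += 1
    return num > th1


# Linear blend from red to blue over 8 frames, and a red/blue flicker
ramp = [(255 - 255 * k / 7, 0, 255 * k / 7) for k in range(8)]
flicker = [(255, 0, 0), (0, 0, 255)] * 4
```

The ramp passes the test because every step increases the distance from red and decreases the distance to blue; the flicker fails because only every other step does.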

In practice, scenes that use special video effects such as dissolves can be regarded as important scenes only in certain programs, such as sports broadcasts. Moreover, even within a sports program, special video effects appear frequently in the commercials inserted at intervals, so the simple criterion of a section bracketed by dissolves often detects too much. Of course, over-detection causes no practical problem as long as the resulting video is sufficiently shorter than the original. Nevertheless, extracting important scenes more accurately would further reduce the time needed to grasp the content. A means is therefore provided for distinguishing what type of video the digest is being created from, and it is used in selecting important scenes.

FIG. 6 and FIG. 7 show, along a time axis, the events that occur in a news program and a sports program, respectively. Here an event is a point at which the characteristics of the image or audio change significantly. The figures use seven items as examples: 1) composition, 2) color tone, 3) speaker, 4) subtitles, 5) dissolve, 6) replay, and 7) slow-motion playback. The way these events appear and combine is characteristic of the type of program, and programs can be classified on that basis. In a news program, for example, cuts in which the newscaster fills the screen appear several times at intervals, so images with the same composition, more specifically images in which skin color (the color of a face) occupies a large area near the center, appear several times. In addition, the speaker at those times is often the same person, and subtitles appear frequently throughout the program. In sports broadcasts, on the other hand, the broadcast often switches among several cameras installed at fixed positions, so images with the same or very similar composition appear frequently. In baseball and soccer in particular, the dominant color tone is green, the color of the turf. Replays and slow-motion playback are also used frequently. Commercials, in turn, have few breaks in the sound, use background music frequently, have vivid color tones, and contain many cuts of short duration. In this way, the type of a video can be inferred to some extent from the pattern of combinations of events in it. Moreover, the events listed here are all ones that can be obtained relatively easily, and with high reliability, among tasks that require image recognition or speech recognition techniques. In other words, no recognition of the semantic content of the video, such as its story, is required.

FIG. 8 is an example of a block diagram of a system that distinguishes the type of video. The image signal and the audio signal of the input video are digitized by the image capture unit 800 and the audio capture unit 802, respectively. The digitized data are sent to the event detection unit 804, where dedicated detection units 806 to 820, one for each type of event, perform event detection. The event counter unit 822 counts the detected events by type. The co-occurrence counter unit 824 increments the counter for a combination of events by 1 only when those events appear simultaneously or in a prescribed order. The comparison unit 828 then compares the appearance frequency distribution of the events obtained from these counters with the distributions of each type of program to find the type whose distribution it most closely matches.

Next, each block in FIG. 8 is described in detail.

Within the event detection unit 804, the cut point detection unit 806 detects transitions between cuts. For this, methods such as those described by the inventors in IPSJ Journal, Vol. 33, No. 4, "Automatic indexing method and object search method in color video images," and in Japanese Patent Laid-Open No. 4-111181 can be used. The event-specific counter unit 822 counts the number of cut points.

The same-composition detection unit 808 detects whether an image with the same or a similar composition has appeared within a predetermined time in the past. An image comparison technique such as template matching can be used for this. Specifically, for each pixel at the same coordinate position in the two frame images being compared, the luminance difference or color difference is computed, and the sum over the entire frame is taken as the degree of difference between the images. If this degree of difference is smaller than a set threshold, the images can be judged identical or highly similar. Checking every frame image in the video for the same composition would take considerable processing time, and it would also be wasteful given that consecutive frames of a moving image are highly similar. The check is therefore linked to cut point detection so that only the images at cut points are examined. The event-specific counter unit counts the number of frames having the same composition.
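The pixel-wise comparison described for same-composition detection can be sketched roughly as follows. This is only an illustration under stated assumptions: frames are plain lists of rows of luminance values, the difference is taken as an absolute value, and the look-back is counted in cut-point frames rather than in time. The function names are hypothetical and do not come from the specification.

```python
def composition_difference(frame_a, frame_b):
    """Sum of absolute luminance differences at matching pixel positions."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )


def has_same_composition(frame_a, frame_b, threshold):
    """Frames count as the same composition when the difference is below the threshold."""
    return composition_difference(frame_a, frame_b) < threshold


def count_recurring_compositions(cut_frames, threshold, lookback):
    """Count cut-point frames that match one of the previous `lookback` cut-point frames."""
    count = 0
    for i, frame in enumerate(cut_frames):
        earlier = cut_frames[max(0, i - lookback):i]
        if any(has_same_composition(frame, old, threshold) for old in earlier):
            count += 1
    return count
```

Only cut-point frames are passed in, which mirrors the text's point that comparing every frame would be wasteful.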

The color tone detection unit 810 detects whether an image with the same or a similar color tone has appeared within a predetermined time in the past. For this, the color frequency distribution (color histogram) of the entire frame can be used, for example. This is a feature quantity that expresses which colors are used and in what amounts, independent of composition. Specifically, for each of the two frame images being compared, the pixel colors are classified into about 64 colors, and the number of pixels of each color in the frame image is counted. The sum of the absolute values of the differences between corresponding frequencies of the two distributions is taken as the degree of difference in color tone. If this degree of difference is smaller than a set threshold, the images can be judged identical or highly similar. As with composition, and for the same reason, it is efficient to examine only the images at cut points. The event-specific counter unit counts the number of frames having the same color tone. The color tone detection unit may also use the frequency distribution obtained along the way to find which color is used most. Specifically, counters for individual colors such as red, blue, and green are provided in the event-specific counter unit; the red counter is incremented when reddish colors predominate, and the green counter when green predominates.
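As a rough illustration of the color tone comparison, the sketch below builds a 64-bin histogram and sums the absolute bin differences. The quantization scheme (4 levels per RGB channel) and the rule used to decide the dominant color are assumptions made for this example; the specification only says "about 64 colors" and does not fix either detail.

```python
def color_histogram(pixels, levels=4):
    """Count pixels per quantized color; levels=4 gives 4**3 = 64 bins."""
    hist = [0] * (levels ** 3)
    step = 256 // levels
    for r, g, b in pixels:
        # Clamp so that channel value 255 stays in the top level for any `levels`.
        rq = min(levels - 1, r // step)
        gq = min(levels - 1, g // step)
        bq = min(levels - 1, b // step)
        hist[rq * levels * levels + gq * levels + bq] += 1
    return hist


def tone_difference(hist_a, hist_b):
    """Sum of absolute differences between corresponding histogram bins."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def dominant_color(hist, levels=4):
    """Return 'red', 'green' or 'blue' for the channel that leads in the most pixels.

    Bins in which no single channel is strictly strongest are ignored.
    Returns None when no bin has a clear leading channel.
    """
    totals = {"red": 0, "green": 0, "blue": 0}
    for index, count in enumerate(hist):
        r, rest = divmod(index, levels * levels)
        g, b = divmod(rest, levels)
        strongest = max(r, g, b)
        leaders = [
            name
            for name, level in (("red", r), ("green", g), ("blue", b))
            if level == strongest
        ]
        if len(leaders) == 1:
            totals[leaders[0]] += count
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None
```

The result of `dominant_color` would decide which per-color counter to increment, for instance the green counter for a frame dominated by turf.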

The subtitle detection unit 812 detects whether subtitles appear in the video. For this, the method described by the inventors in Japanese Patent Application No. 5-330507, for example, can be used. The event-specific counter unit 822 counts the number of subtitle appearances.

The dissolve detection unit 814 detects special effects such as dissolves in the video. The method is as described in the first half of this description. The event-specific counter unit 822 counts the number of dissolve occurrences.

The replay detection unit 816 detects whether exactly the same video has appeared within a predetermined time in the past. As with the same-composition detection unit 808, this can be done by comparing frame images using template matching or the like. However, performing template matching on every frame of the moving images being compared would take too much processing time, so each frame is converted into a code of about a few characters, and matching of the code strings serves as matching of the moving images. The code for a single frame carries very little information on its own, but because a moving image consists of many frames, it contains many codes, and the sequence of codes carries enough information to identify a given piece of moving image. A moving image matching method based on this idea is described by the inventors in Japanese Patent Laid-Open No. 7-114567.
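The idea of matching code strings instead of whole frames can be illustrated with the sketch below. The actual frame coding belongs to the cited publication and is not given here, so the code used (mean luminance quantized to one hexadecimal character) is purely a stand-in, and the function names are hypothetical.

```python
def frame_code(frame):
    """Stand-in frame code: mean luminance (0-255) quantized to one hex character."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return format(min(15, int(mean * 16 / 256)), "x")


def find_replay(codes, window):
    """Find the first run of `window` codes that repeats an earlier, non-overlapping run.

    `codes` is a string with one character per frame.
    Returns (earlier_start, later_start), or None when no run repeats.
    """
    first_seen = {}
    for start in range(len(codes) - window + 1):
        key = codes[start:start + window]
        earlier = first_seen.get(key)
        if earlier is not None and start - earlier >= window:
            return earlier, start
        first_seen.setdefault(key, start)
    return None
```

A longer `window` makes accidental matches less likely, which reflects the point that a single code is weak but a sequence is distinctive. A static scene would also produce repeated runs, so a practical detector would need a further check that this sketch omits.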

The slow playback detection unit 818 detects slow playback video. Slow playback is achieved by displaying each frame image for a longer interval than in normal playback (twice as long at 1/2 speed, four times as long at 1/4 speed). Consequently, in slow playback video the images digitized by the image capturing unit 800 contain runs of exactly identical images (two at 1/2 speed, four at 1/4 speed). To determine whether video is in slow playback, each pair of consecutive frames is compared by template matching to obtain the image difference. The change in the difference over a certain period is then examined, and if it alternates between large and small values with a specific period, the video is judged to be slow playback. For example, at 1/2 speed the same image appears twice in a row, so the difference alternates between a small value and a large value. At 1/4 speed, three small values are followed by one large value, and this pattern repeats. However, consecutive frames of a moving image are similar even without slow playback, so the threshold that separates small from large differences must be set low. The event-specific counter unit 822 counts the number of occurrences of slow playback.
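The periodicity test for slow playback might look like the following sketch. It assumes the consecutive-frame differences have already been computed, and it checks for the pattern "N-1 small values, then one large value" at every possible phase. The requirement of at least two full periods and the parameter names are assumptions of this example.

```python
def slow_motion_factor(differences, low_threshold, max_factor=4):
    """Return N when the differences repeat as (N-1 small, 1 large), else None.

    N = 2 corresponds to 1/2-speed playback and N = 4 to 1/4-speed playback.
    `low_threshold` should be small, because consecutive frames of ordinary
    video are already fairly similar.
    """
    is_small = [d < low_threshold for d in differences]
    for factor in range(2, max_factor + 1):
        if len(is_small) < 2 * factor:
            continue  # require at least two full periods of evidence
        for phase in range(factor):
            if all(
                small == ((i + phase) % factor != factor - 1)
                for i, small in enumerate(is_small)
            ):
                return factor
    return None
```

Ordinary motion (all differences above the threshold) and a still image (all differences below it) both return `None`, since neither shows the alternating pattern.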

The same-speaker detection unit 820 detects whether the same speaker has spoken within a predetermined time in the past. This can be checked, for example, by computing the autocorrelation of the audio and testing whether the frequency at which it takes its largest value matches. The event-specific counter unit 822 counts the number of utterances by the same speaker.
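One way to read the autocorrelation check is sketched below: find the lag with the largest autocorrelation inside a plausible voice-pitch range, convert it to a frequency, and compare two segments within a tolerance. The pitch range, the tolerance, and the synthetic test tone are assumptions for illustration; real speech would need framing and voicing detection that this sketch leaves out.

```python
import math


def dominant_frequency(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Frequency (Hz) of the lag with the largest autocorrelation in the given range."""
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(samples) - 1, int(sample_rate / min_hz))
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        raise ValueError("not enough samples for the requested frequency range")
    return sample_rate / best_lag


def same_speaker(freq_a, freq_b, tolerance_hz=5.0):
    """Treat two segments as the same speaker when their peak frequencies agree."""
    return abs(freq_a - freq_b) <= tolerance_hz


def tone(freq_hz, sample_rate=8000, count=800):
    """Synthetic sine wave standing in for a voiced speech segment."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]
```

For example, `same_speaker(dominant_frequency(tone(200), 8000), dominant_frequency(tone(100), 8000))` compares a 200 Hz and a 100 Hz tone and reports that they differ.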

The co-occurrence counter unit 824 counts cases in which several of the above events appear simultaneously or in a specific order. One counter is provided for each combination of events to be detected. For example, when the same speaker is speaking while the same composition is shown, the counter corresponding to the simultaneous occurrence of the composition event and the speaker event is incremented by 1. Similarly, when a dissolve occurs and slow playback is detected immediately afterward, the counter corresponding to the consecutive occurrence of the dissolve event and the slow playback event is incremented by 1.
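A minimal sketch of such co-occurrence counting is shown below. It assumes events arrive as a time-sorted list of `(time, name)` pairs, that "simultaneous" means an identical timestamp, and that "immediately afterward" means within a `window` of time units. The event names and the data layout are illustrative, not taken from the specification.

```python
from collections import Counter


def count_cooccurrences(events, simultaneous_pairs, sequential_pairs, window):
    """Count registered event combinations in a time-sorted list of (time, name).

    simultaneous_pairs: set of (a, b) counted when both share a timestamp.
    sequential_pairs: set of (a, b) counted when b follows a within `window`.
    """
    counters = Counter()
    for i, (t1, e1) in enumerate(events):
        for t2, e2 in events[i + 1:]:
            if t2 - t1 > window:
                break  # events are sorted, so later ones are even further away
            if t2 == t1:
                if (e1, e2) in simultaneous_pairs or (e2, e1) in simultaneous_pairs:
                    counters[("simultaneous",) + tuple(sorted((e1, e2)))] += 1
            elif (e1, e2) in sequential_pairs:
                counters[("sequential", e1, e2)] += 1
    return counters
```

Only combinations registered in advance are counted, matching the statement that one counter is prepared per combination to be detected.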

The comparison unit 828 refers to the clock 826 and determines which type of program the trend in event appearance frequency over a fixed period from time t1 to t2 most closely resembles. Before the comparison, the typical events for each type of program, such as news programs and sports programs, are identified and ranked by assigning values that are higher the more important the event is in characterizing the program, and a rank table of events is created for each program type. In the comparison, the normalized appearance frequency of each event is multiplied by the value given in the rank table as a weight, and if the sum of the resulting values over all events exceeds a threshold, the video is judged to be a program of the type corresponding to that rank table.
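The weighted comparison can be sketched as follows. Two details are assumptions of this example rather than statements of the specification: frequencies are normalized by dividing each count by the total number of events, and when several program types exceed the threshold the highest-scoring one is returned. The rank values in the usage note are made up for illustration.

```python
def classify_program(event_counts, rank_tables, threshold):
    """Return the program type whose weighted score is highest and above the threshold.

    event_counts: {event_name: count} observed between t1 and t2.
    rank_tables: {program_type: {event_name: rank_value}}.
    Returns None when no program type scores above the threshold.
    """
    total = sum(event_counts.values())
    if total == 0:
        return None
    frequencies = {event: count / total for event, count in event_counts.items()}
    scores = {
        program: sum(ranks.get(event, 0) * freq for event, freq in frequencies.items())
        for program, ranks in rank_tables.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

With a hypothetical table such as `{"news": {"composition": 3, "speaker": 3, "subtitle": 2}, "sports": {"color": 2, "replay": 3, "slow": 3}}`, counts dominated by replay and slow playback score highly for "sports" and zero for "news".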

The events obtained in this way can be listed on the display 1 of FIG. 1 in a table with one axis as the time axis, as in FIG. 6 or FIG. 7. Even when the computer cannot determine the type automatically, this list gives the user a clue that, combined with information from other sources, experience, and knowledge, may make it possible to infer the type of program. In addition, when a program of a type not yet taught to the computer is input, important events or combinations of events may be selected from this list and registered. This becomes very convenient for the user if it is done through direct, visual operations, such as clicking the change points or interval displays of each event in the list with the pointing device 5, such as the mouse shown in FIG. 1.

The present invention can be implemented on a PC or workstation (PC/WS), and can also be applied as a function of a TV, a VTR, or a similar device.

FIG. 1 is a system block diagram for realizing an embodiment of the present invention.
FIG. 2 is a flowchart of a program that detects dissolves.
FIG. 3 is a diagram showing the concept of a dissolve.
FIG. 4 is a flowchart of another program that detects dissolves.
FIG. 5 is a graph showing the change over time of the evaluation value when the dissolve detection program is executed.
FIG. 6 is a typical event chart of a news program.
FIG. 7 is a typical event chart of a sports broadcast.
FIG. 8 is a block diagram of a system that classifies video.

Explanation of symbols

1 ... display, 2 ... control signal line, 3 ... A/D converter, 4 ... computer, 5 ... input device, 6 ... external information storage device, 7 ... CPU, 8 ... connection interface, 9 ... memory, 10 ... moving image playback device, 11 ... keyboard.

Claims

1. A program digest creation method, wherein:
input receiving means receives, in time series and frame by frame, the moving images constituting a program; and
a processing device:
detects, in a detection unit, events, each being a point at which a feature of the moving image changes, from the received moving images, together with the occurrence timing of each detected event;
extracts, based on the occurrence timing of each detected event type, the tendency of the appearance frequency of the events, and compares it with the event characteristics for each program type stored in a storage unit to determine the type of the input program; and
having determined the type of the input program, extracts a feature scene of the moving image and creates a digest of the program based on the feature scene.

2. The program digest creation method according to claim 1, wherein the processing device determines whether the feature amounts of the frames of the received moving image monotonically approach the feature amount of the newest frame in order from the oldest frame, and, when the determination is true, extracts the video section from that frame up to a predetermined time as a feature scene and creates the digest of the program.