JP3194837B2

JP3194837B2 - Representative screen extraction method and apparatus

Info

Publication number: JP3194837B2
Application number: JP16726294A
Authority: JP
Inventors: 行信谷口; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-07-19
Filing date: 1994-07-19
Publication date: 2001-08-06
Anticipated expiration: 2016-08-06
Also published as: JPH0832924A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of industrial application] The present invention relates to a representative screen extraction method and apparatus, and more particularly to a method and apparatus for extracting, from a sequence of image data, a small number of representative screens that represent its content.

[0002]

[Prior art] Video data is generally enormous in volume, yet the only way to learn its content has been to watch the video in chronological order. If representative screens that express the video content efficiently are extracted from the video data in advance, they are useful for grasping an outline of the video. Ideally, representative screens should be selected with the story and similar factors taken into account, but at present that work can only be done by hand, and the amount of work is so large that it is impractical.

[0003] Conventional techniques related to the automation of representative screen extraction are described below, with a few application examples.

[0004] The first application example concerns list display of video. FIG. 7 shows a schematic diagram of a video list display. To learn the content of a videotape, or to cue up a needed portion of it, the only option has conventionally been the fast-forward and rewind functions of a video deck, which takes time and effort. If representative screens are automatically extracted from the videotape and shown on a display or on a medium such as paper, the video content can be viewed as a list and roughly grasped in a short time. Prior art in this area includes Japanese Examined Patent Publication No. 5-74273, "Index image creation device"; Japanese Unexamined Patent Publication No. 64-11483, "Video printer"; and Japanese Patent Application No. 5-195644, "Video image printing method and apparatus". The index image creation device of Publication No. 5-74273 extracts the image data at a cut point, or immediately before it, as a representative screen. A cut point is the joint between shots (video segments captured continuously by a camera); once cut points are detected, one representative screen can be selected per shot. Specifically, a sequence of difference values between images is calculated to determine whether the image has changed. The video printer of Japanese Unexamined Patent Publication No. 64-113483 extracts the image a fixed time after a cut point as a representative screen and prints it on paper. This is based on the rule of thumb that the key picture often appears partway through a shot. In the video image printing method and apparatus of Application No. 5-195644, the reproduced video signal is image-processed, and images that meet specific conditions are picked out as events that represent major changes in the video and are important for grasping its content.

[0005] The second application example concerns quick viewing of video. By cutting out a short piece of each video segment delimited by cut points and joining the pieces, a quick-view video can be generated automatically (Otsuji and Tonomura, "Subjective evaluation of high-speed moving image browsing", IEICE Spring Conference, SD9-3, 1993). This method also detects cut points and uses the image immediately after or immediately before each one as the representative screen.

[0006] The prior art of video cut point detection is described next. A typical method computes, for two temporally adjacent images (the image at time t and the image at time t−1), the difference in luminance at corresponding pixels, takes the sum of the absolute values (the inter-frame difference) as D(t), and regards time t as a cut point when D(t) exceeds a given threshold (Otsuji, Tonomura, and Ohba, "Moving image browsing using luminance information", IEICE Technical Report, IE90-103, 1991). Instead of the inter-frame difference, the changed-pixel area, the luminance histogram difference, block-wise color correlation, and the like are sometimes used as D(t) (Otsuji and Tonomura, "Study of automatic video cut detection methods", Technical Report of the Institute of Television Engineers of Japan, Vol. 16, No. 43, pp. 7-12). There is also a method that thresholds the result of applying various temporal filters to D(t), rather than thresholding D(t) directly (K. Otsuji and Y. Tonomura, "Projection Detecting Filter for Video Cut Detection", Proc. of ACM Multimedia 93, 1993, pp. 251-257). This method is characterized by being unlikely to produce false detections even when the video contains rapidly moving objects or flash light.
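The basic inter-frame difference method described in this paragraph can be illustrated with a minimal Python sketch. It is not taken from the cited reports: the function names, the representation of a frame as a list of rows of luminance values, and the threshold value are all assumptions made for illustration.

```python
def frame_difference(prev, curr):
    """D(t): sum of absolute luminance differences at corresponding pixels."""
    return sum(
        abs(a - b)
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    )


def detect_cuts(frames, threshold):
    """Return every time t at which D(t) exceeds the given threshold."""
    cuts = []
    for t in range(1, len(frames)):
        if frame_difference(frames[t - 1], frames[t]) > threshold:
            cuts.append(t)
    return cuts


# Toy 2x2 luminance frames (hypothetical data): two dark, then two bright.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
frames = [dark, dark, bright, bright]
cuts = detect_cuts(frames, threshold=100)
print(cuts)
```

Because D(t) looks only at two adjacent frames, a gradual transition spread over many frames would keep every D(t) below the threshold, which is the weakness the later paragraphs address.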

[0007]

[Problems to be solved by the invention] In the prior art described above, representative screens are selected with reference to cut points: the image immediately after a cut point, or a fixed time after it, is simply taken as the representative screen. However, the image immediately after a cut point may be out of focus, or blurred because the subject is moving rapidly, and so may be unsuitable as a representative screen in terms of image quality. In list-display applications, it is desirable not to extract representative screens of poor image quality.

[0008] Conventional video cut detection methods also have the problem that they cannot detect scene changes that are slow in time. This is because the quantity representing the rate of scene change is computed from only two temporally adjacent frames and hardly reflects the scene over a longer period. Japanese Patent Application No. 5-317663, "Video cut point detection method and apparatus", for example, addresses this by computing distances between multiple pairs of image data, covering temporally separated images as well as adjacent ones. Even so, a scene change that proceeds very slowly over a long time may still go undetected. That is, even a scene change that a human perceives as a complete change of scene may be missed, so that the representative screen corresponding to that scene is omitted.

[0009] In addition, when the picture changes through a camera operation such as panning (swinging the camera horizontally) or tilting (swinging it vertically), one may want to extract the image after the camera operation as a representative screen, but conventional methods could not do so.

[0010] An object of the present invention is to provide a representative screen extraction method and apparatus that, first, can detect temporally slow scene changes caused by editing effects such as fades and wipes or by camera operations; second, can be applied to all kinds of video, including video that contains temporal noise such as flash light; and third, can extract representative screens that are also appropriate in terms of image quality.

[0011]

[Means for solving the problems] To achieve the above object, the present invention takes the image data at a certain time as reference image data and sequentially calculates the distance between the reference image data and the image data at time t while varying t. The image at time t is extracted as a representative screen when two conditions are both satisfied: a first condition that the distance is greater than a predetermined threshold, and a second condition that the change in the image data in the temporal vicinity of time t is smaller than a predetermined threshold, that is, the scene is stable.

[0012] The present invention is further characterized in that, when determining whether the scene is stable, it examines the increase or decrease of the already calculated distance between image data. That is, the second condition is regarded as satisfied when the change in the distance decreases in the temporal vicinity of time t. Furthermore, the present invention is characterized in that, when the first or second condition is not satisfied, image cut point detection is performed, and if a cut is determined to be present, the image data at time t is extracted as a representative screen.

[0013]

[Operation] In the present invention, the distance between the reference image data and the image data at time t is calculated. This distance evaluates how much the picture content of the two images differs. Because the reference image data is held fixed and the change in the picture relative to it is observed, very slow scene changes that the prior art could not detect become detectable, and as a result fewer representative screens are missed. In addition, providing a procedure that determines whether the scene is stable prevents a representative screen from being extracted erroneously when the picture changes momentarily because of flash light; suppresses excessive extraction of representative screens partway through a scene change caused by a camera operation or a slow transition; and avoids extracting representative screens of poor image quality caused by subject motion, camera defocus, and the like.

[0014] Also, when the stability of the scene is judged by examining the increase or decrease of the already calculated distance between the reference image data and the image data at each time, no extra image processing is needed to check scene stability, and the amount of computation can be reduced. Furthermore, combining the method with a cut detection procedure makes it possible to detect, as representative screens, cut points whose pictures are similar, which reduces omissions in representative screen extraction.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing the configuration of an embodiment of a representative screen extraction apparatus according to the present invention. In FIG. 1, 10 is an input image data sequence; its sampling rate, image format, and image size may be arbitrary. That is, it may be an NTSC standard video signal sampled at 30 frames/sec, or one sampled at a coarser rate. The input image data sequence 10 may be an analog signal such as NTSC, a digital signal, or image files stored on a storage device such as a hard disk or CD-ROM. 11 is a reference image buffer memory, which holds the image data at a certain point in the input image data sequence 10 for reference. The reference image buffer memory 11 may instead store data obtained secondarily by applying some processing to the image. For example, it may store a luminance histogram or color histogram of a single image, or edge information, or a temporal average of these. It may also store a reduced image in order to shorten the processing time in the calculation processing unit 13. All of these are referred to collectively here as image data. 12 is a buffer memory that temporarily stores the image data of the input image data sequence 10 near time t. The buffer memory 12 may be, for example, a frame buffer that temporarily holds the image data sent in sequence from the image source, or it may be built from a shift buffer array so that multiple images can be held. Like the reference image buffer memory 11, it may store luminance histograms or color histograms. 13 is a calculation processing unit, which performs the representative screen extraction processing using the reference image data in the reference image buffer memory 11 and the input image data in the buffer memory 12. The calculation processing unit 13 is implemented as a so-called CPU with built-in memory such as RAM and ROM. 14 is an output line for representative screen frame number information, and 15 is an output line for representative screen information.

[0017] FIG. 2 is a processing flowchart of an embodiment of the representative screen extraction method according to the present invention; this processing is carried out by the calculation processing unit 13 of FIG. 1. First, the time variables s and t are initialized to 0 (step 201). At this point, the image data at time s = 0, that is, time t = 0, in the input image data sequence 10 is stored in the reference image buffer memory 11 as the reference image data. The input image data sequence 10 from time t = 0 onward is stored sequentially in the buffer memory 12, which always holds the image data sequence near time t. Each image data item is assumed to carry a frame number. Next, t is advanced by 1 (step 202), and the distance d_s(t) between the reference image data I_s at time s and the image data I_t at time t is calculated (step 203). The distance d_s(t) takes a positive value, approaching 0 the more similar the pictures of the two images are and growing larger the more they differ. The calculation of the distance is described later. Next, the distance d_s(t) is compared with a threshold T (step 204). If d_s(t) < T, the change in the picture is regarded as small and processing returns to step 202.

[0018] If d_s(t) ≥ T, the picture is regarded as having changed between the two images. In this case, the procedure that determines whether the scene is stable is then called (step 205) and its result is checked (step 206); if the scene is not stable, processing returns to step 202. If the scene is judged stable, the image at time t is taken as a representative screen (step 207), t is assigned to s (step 208), and processing returns to step 202. Assigning t to s means updating the reference image data to the image data at time t. The procedure for determining whether the scene is stable is described later.
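The flow of steps 201 to 208 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `distance` and `is_stable` callables, and the use of a single brightness number per frame are all hypothetical simplifications, and the stability test shown is just one of the options the description mentions (the distance starting to decrease).

```python
def extract_representatives(frames, distance, is_stable, threshold):
    """Return the times t selected as representative screens (steps 201-208).

    distance(ref, img) and is_stable(t, history) are caller-supplied;
    history maps each time t to d_s(t) for the current reference.
    """
    s = 0                                    # step 201: reference time
    picked = []
    history = {}
    for t in range(1, len(frames)):          # step 202: advance t
        d = distance(frames[s], frames[t])   # step 203: d_s(t)
        history[t] = d
        if d < threshold:                    # step 204: picture barely changed
            continue
        if not is_stable(t, history):        # steps 205-206: scene unstable
            continue
        picked.append(t)                     # step 207: representative screen
        s = t                                # step 208: update the reference
        history = {}
    return picked


def decreasing(t, history):
    """Assumed stability test: d_s(t-1) > d_s(t)."""
    return (t - 1) in history and history[t - 1] > history[t]


# Hypothetical data: each "frame" is reduced to one brightness value.
brightness = [0, 0, 1, 10, 12, 11, 11, 11]
picked = extract_representatives(
    brightness, lambda a, b: abs(a - b), decreasing, threshold=5
)
print(picked)
```

In this toy run the distance first reaches the threshold at t = 3 but is still rising, so extraction is deferred until the distance starts to fall at t = 5, after which the reference is replaced.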

[0019] In step 207, specifically, the calculation processing unit 13 sends the frame number information of the image data at time t to line 14 and, at the same time, supplies this frame number information to the buffer memory 12 through control line 16, so that the image data with that frame number is read from the buffer memory 12 and transferred to the reference image buffer memory 11. The reference image data in the reference image buffer memory 11 is thereby updated to the image data at time t. The image data at time t read from the buffer memory 12 is also sent to line 15 as a representative screen. The representative screen on line 15 is stored, with the frame number from line 14 attached, on a secondary storage medium such as a hard disk. Alternatively, the calculation processing unit 13 may take the image data at time t that was determined to be a representative screen, from among the image data it fetched from the buffer memory 12 for processing, and both send it to line 15 and write it into the reference image buffer memory 11.

[0020] Next, the relationship between the temporal change of the distance d_s(t) and the processing flow is explained with reference to FIG. 3. First, the image data at time s is taken as the reference image data (step 201). While t is increased (step 202), the distance d_s(t) between the reference image data and the image data at time t is calculated sequentially (step 203). Because the picture in the image data drifts away from that of the reference image data as time passes, the distance d_s(t) increases little by little, as shown in FIG. 3. The distance d_s(t) is compared with the threshold T (step 204); at time t₁, d_s(t₁) < T, so the picture is regarded as not yet having changed sufficiently. At time t₂, when the scene switches from a building to a car, the distance d_s(t) rises sharply and d_s(t₂) ≥ T is satisfied. In this case, whether the scene is stable is then determined (steps 205 and 206). Near time t₂, however, d_s(t) is still fluctuating widely, so step 206 judges that the scene is not yet stable, and processing returns to step 202 without extracting a representative screen. At time t₃, d_s(t) turns to a decrease, so the scene is regarded as stable and a representative screen is extracted (step 207). The image data at time t₃ is then set as the next reference image data; specifically, s is set to t₃ (step 208) and processing returns to the original flow.

[0021] In the processing flow of FIG. 2, one could omit the scene stability check of steps 205 and 206 and simply regard a representative screen as present whenever d_s(t) > T, but this is not practical because of the following problems.
(1) In a slow scene change such as a wipe or fade, the distance d_s(t) may exceed the threshold T partway through the change, as shown in FIG. 4. As a result, representative screens may be extracted redundantly within a single scene change, or a frame from the middle of a cross-fade (a video editing effect in which two pictures overlap while one scene switches to another), in which two images are superimposed and which is undesirable in terms of image quality, may be extracted as a representative screen.
(2) In video of a scene where flashes are fired (such scenes are common in news footage), the sudden rise in luminance caused by flash light may push the distance d_s(t) above the threshold T momentarily, as shown in FIG. 5. As a result, a representative screen is extracted every time a flash is fired.
The present invention overcomes problems (1) and (2) by checking the stability of the scene.
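The flash problem in item (2) can be reproduced with a small Python sketch that compares thresholding alone against thresholding plus a stability check. Everything here is a hypothetical simplification for illustration: frames are reduced to one brightness value, the distance is an absolute difference, and the stability check requires the last `window` frame-to-frame differences to stay below `theta`.

```python
def naive_extract(values, threshold):
    """Thresholding only: pick t whenever the distance to the reference >= T."""
    s, picked = 0, []
    for t in range(1, len(values)):
        if abs(values[t] - values[s]) >= threshold:
            picked.append(t)
            s = t
    return picked


def checked_extract(values, threshold, theta, window):
    """Thresholding plus an (assumed) stability check on recent differences."""
    s, picked = 0, []
    for t in range(1, len(values)):
        if abs(values[t] - values[s]) < threshold:
            continue
        recent = range(max(1, t - window + 1), t + 1)
        if all(abs(values[k] - values[k - 1]) < theta for k in recent):
            picked.append(t)
            s = t
    return picked


# Hypothetical brightness series: a single flash at t = 3, then a real cut.
flash = [10, 10, 10, 250, 10, 10, 10]
cut = [10, 10, 10, 200, 200, 200]

naive_on_flash = naive_extract(flash, threshold=100)
checked_on_flash = checked_extract(flash, threshold=100, theta=20, window=2)
checked_on_cut = checked_extract(cut, threshold=100, theta=20, window=2)
print(naive_on_flash, checked_on_flash, checked_on_cut)
```

With these toy values, thresholding alone selects both the flash frame and the frame after it, while the checked version ignores the flash and still selects a frame once the real scene change has settled.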

[0022] Next, a few implementations of the procedure for calculating the distance d_s(t) between the reference image data I_s and the image data I_t are described.

[0023] The first implementation uses luminance histograms. Let H_s(n) be the luminance histogram of the reference image I_s at time s and H_t(n) the luminance histogram of the image data I_t at time t, for n = 1, 2, …, N; the distance d_s(t) is then calculated by [Equation 1], where N is the number of histogram bins.

[0024]

[Equation 1]
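The body of Equation 1 is not reproduced in this text. As a hedged illustration only, the Python sketch below assumes one common choice, the sum of absolute differences between corresponding histogram bins; the actual equation in the patent may differ. The function names, the 8-bit luminance range, and the bin count are likewise assumptions.

```python
def luminance_histogram(image, bins, max_level=256):
    """Count pixels per luminance bin; image is a list of rows of 0..255 values."""
    hist = [0] * bins
    for row in image:
        for value in row:
            hist[value * bins // max_level] += 1
    return hist


def histogram_distance(h_s, h_t):
    """Assumed distance: sum over n of |H_s(n) - H_t(n)|."""
    return sum(abs(a - b) for a, b in zip(h_s, h_t))


# Hypothetical 2x2 images: one pixel changes from black to white.
reference = [[0, 0], [255, 255]]
current = [[0, 255], [255, 255]]
h_s = luminance_histogram(reference, bins=4)
h_t = luminance_histogram(current, bins=4)
d = histogram_distance(h_s, h_t)
print(h_s, h_t, d)
```

A histogram-based distance of this kind ignores where pixels are located, so it changes little under small subject motion while still responding to a change of picture content.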

[0025] The second implementation uses color histograms. Let the color histograms of the reference image at time s and the image at time t be H_s′(n_r, n_g, n_b) and H_t′(n_r, n_g, n_b), respectively, with n_r, n_g, n_b = 1, 2, …, N; the distance d_s(t) is then calculated by [Equation 2].

[0026]

[Equation 2]
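The body of Equation 2 is likewise not reproduced in this text. The Python sketch below assumes, purely for illustration, a sum of absolute differences over all (n_r, n_g, n_b) cells of a quantized RGB histogram; the patent's actual formula may differ, and the function names and quantization are hypothetical.

```python
from collections import Counter


def color_histogram(image, bins, max_level=256):
    """Count pixels per quantized (n_r, n_g, n_b) cell; pixels are (r, g, b)."""
    hist = Counter()
    for row in image:
        for r, g, b in row:
            cell = (
                r * bins // max_level,
                g * bins // max_level,
                b * bins // max_level,
            )
            hist[cell] += 1
    return hist


def color_histogram_distance(h_s, h_t):
    """Assumed distance: sum over all cells of |H_s' - H_t'|."""
    cells = set(h_s) | set(h_t)
    return sum(abs(h_s[c] - h_t[c]) for c in cells)


# Hypothetical 1x2 images: one red pixel turns blue.
red, blue = (255, 0, 0), (0, 0, 255)
h_s = color_histogram([[red, red]], bins=2)
h_t = color_histogram([[red, blue]], bins=2)
d = color_histogram_distance(h_s, h_t)
print(d)
```

Storing only the occupied cells in a `Counter` avoids allocating all N³ cells, and a missing cell reads as a count of 0.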

[0027] In the implementations described above, the distance d_s(t) is calculated from histogram-based features, but the invention is not limited to this. The distance d_s(t) may also be calculated from color information averaged over blocks.

[0028] Next, a few implementations of the procedure for evaluating scene stability are described.

[0029] The first implementation uses the inter-frame difference. Let I_t be the image data at time t and I_t(x, y) its luminance value at coordinates (x, y); the inter-frame difference is calculated as D(t) = Σ_{x,y} |I_t(x, y) − I_{t−1}(x, y)|. When the inter-frame differences are all smaller than a threshold θ over a time width W, that is, when D(t−k) < θ for k = 0, 1, …, W−1, the change in the image data near time t is judged to be small, that is, the scene is judged to be stable.
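The stability test based on inter-frame differences, D(t−k) < θ for k = 0, 1, …, W−1, might be sketched in Python as follows. The function names and the frame representation are assumptions, and treating a window that extends before the first frame as "not stable" is a conservative choice made here that the description does not specify.

```python
def frame_difference(prev, curr):
    """D(t) = sum over (x, y) of |I_t(x, y) - I_{t-1}(x, y)|."""
    return sum(
        abs(a - b)
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    )


def is_stable_by_frame_difference(frames, t, theta, window):
    """True when D(t - k) < theta for every k = 0, 1, ..., window - 1."""
    if t - window + 1 < 1:
        return False  # assumption: too little history counts as unstable
    return all(
        frame_difference(frames[t - k - 1], frames[t - k]) < theta
        for k in range(window)
    )


# Hypothetical 1x2 luminance frames with a scene change at t = 2.
a = [[10, 10]]
b = [[200, 200]]
frames = [a, a, b, b, b]
stable_at_3 = is_stable_by_frame_difference(frames, 3, theta=50, window=2)
stable_at_4 = is_stable_by_frame_difference(frames, 4, theta=50, window=2)
print(stable_at_3, stable_at_4)
```

With W = 2, the frame at t = 3 is still rejected because D(2) lies inside the window, and the scene is first judged stable at t = 4.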

[0030] The second implementation also uses the distance d_s(t) to judge scene stability. Observing d_s(t) shows that when the scene is unstable, d_s(t) may increase monotonically, as in FIG. 4, or show a momentary peak, as in FIG. 5. Accordingly, the change in the image data may be regarded as small, that is, the scene as stable, when d_s(t−1) > d_s(t) holds, for example; or the scene may likewise be regarded as stable when both d_s(t−1) > d_s(t) and d_s(t−2) > d_s(t) hold.
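The distance-based stability test can be written as a short Python sketch. The function name and the `lookback` parameter are hypothetical; `lookback=1` corresponds to the condition d_s(t−1) > d_s(t), and `lookback=2` adds d_s(t−2) > d_s(t).

```python
def is_stable_by_distance(d, t, lookback=1):
    """True when d[t - k] > d[t] for every k = 1, ..., lookback.

    d is a sequence of already computed distances d_s(0), d_s(1), ...
    """
    if t - lookback < 0:
        return False  # assumption: too little history counts as unstable
    return all(d[t - k] > d[t] for k in range(1, lookback + 1))


# Hypothetical distance series: rises during a transition, then dips slightly.
distances = [0, 1, 10, 12, 11, 11]
rising = is_stable_by_distance(distances, 3)
dipped = is_stable_by_distance(distances, 4)
dipped_strict = is_stable_by_distance(distances, 4, lookback=2)
print(rising, dipped, dipped_strict)
```

Because the test reuses distances that the main loop has already computed, it needs no further image processing, which is the computational advantage the description points out.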

[0031] FIG. 6 is a processing flow of another embodiment of the representative screen extraction method of the present invention. While the method of FIG. 2 can detect slow scene changes, it may fail to detect cut points between similar pictures. In such cases it is effective to combine the processing flow of FIG. 2 with video cut point detection. In FIG. 6, steps 601 to 606 correspond to steps 201 to 206 of FIG. 2, respectively. If d_s(t) < T in step 604, or if scene stability is not detected in step 606, processing moves to step 610 in FIG. 6. In step 610, the cut detection procedure is called, and whether a cut point exists in the images near time t is checked (step 611). If a cut point is determined to be present, processing proceeds to step 607 and the image at time t is taken as a representative screen; if no cut point is detected, processing returns to step 602. This reduces omissions in representative screen extraction at cut points between similar pictures. Any of the conventional methods mentioned earlier may be used for cut point detection.
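The combined flow of FIG. 6 can be sketched in Python under several assumptions: the names are hypothetical, the stability test is the "distance starts to decrease" option, the cut detector is a caller-supplied `is_cut(prev, curr)` callable, and the reference is assumed to be updated after step 607 in the same way as step 208 of FIG. 2.

```python
def extract_with_cut_fallback(frames, distance, is_cut, threshold):
    """Steps 601-611: fall back to cut detection when the main test fails."""
    s = 0
    picked = []
    prev_d = None
    for t in range(1, len(frames)):                     # step 602
        d = distance(frames[s], frames[t])              # step 603
        changed = d >= threshold                        # step 604
        stable = prev_d is not None and prev_d > d      # steps 605-606
        # Steps 610-611 run only when the main test did not succeed.
        if (changed and stable) or is_cut(frames[t - 1], frames[t]):
            picked.append(t)                            # step 607
            s = t                                       # assumed reference update
            prev_d = None
        else:
            prev_d = d
    return picked


# Hypothetical data: a cut between two similar scenes (brightness 10 -> 40).
brightness = [10, 10, 40, 40, 40]
picked = extract_with_cut_fallback(
    brightness,
    distance=lambda a, b: abs(a - b),
    is_cut=lambda prev, curr: abs(curr - prev) > 20,
    threshold=100,
)
print(picked)
```

In this toy run the distance to the reference never reaches T because the two scenes are similar, yet the cut detector still selects the frame at the cut.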

[0032]

[Effects of the invention] As described above, the present invention can detect temporally slow scene changes caused by editing effects such as fades and wipes or by camera operations, can be applied to all kinds of video including video that contains temporal noise such as flash light, and can extract representative screens that are also appropriate in terms of image quality.

[0033] Also, because scene stability is judged by examining the increase or decrease of the already calculated distance between image data, no extra image processing is needed to check scene stability, which reduces the amount of computation.

[0034] Furthermore, by combining the method with cut point detection and extracting a representative screen when a cut is detected, cut points between similar pictures are not overlooked, which reduces omissions in representative screen extraction.

[0035] The present invention can be applied to, for example, list display and quick viewing of video, and also to the interface of a video database. When a large amount of video is stored in a video database, keywords describing the video content are usually attached, but keywords alone have not made it possible to retrieve the desired scene as intended. In that situation, a video index or a list-display interface of video content is useful for helping the user confirm whether the video returned as a candidate by a keyword search is really the one wanted. The present invention makes it possible to automate the creation of such an index.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the representative screen extracting apparatus of the present invention.

FIG. 2 is a processing flowchart of an embodiment of the representative screen extracting method of the present invention.

FIG. 3 is a diagram for explaining the relationship between the temporal change in distance and the processing flow.

FIG. 4 is a schematic diagram for explaining the temporal change in distance in the case of a slow scene change.

FIG. 5 is a schematic diagram for explaining the temporal change in distance caused by flash light.

FIG. 6 is a processing flowchart of another embodiment of the representative screen extracting method of the present invention.

FIG. 7 is a schematic diagram of a video list display.

[Explanation of symbols]

10 Input image data sequence
11 Reference image buffer memory
12 Buffer memory
13 Calculation processing unit
14 Representative screen frame number information
15 Representative screen information

Continued from the front page
(58) Fields searched (Int. Cl.⁷, DB name): G11B 20/00; H04N 5/783; H04N 5/91 - 5/956

Claims

(57) [Claims]

1. A representative screen extracting method for extracting a representative screen from an image data sequence, wherein image data at a certain time is used as reference image data, a distance between the reference image data and the image data at time t is sequentially calculated while changing the time t, and image data at a time t that satisfies both a first condition that the distance is larger than a predetermined threshold and a second condition that the change in the image data in the temporal vicinity of the time t is smaller than a predetermined threshold is extracted as a representative screen.

2. A representative screen extracting method for extracting a representative screen from an image data sequence, wherein image data at a certain time is used as reference image data, a distance between the reference image data and the image data at time t is sequentially calculated while changing the time t, and image data at a time t that satisfies both a first condition that the distance is larger than a predetermined threshold and a second condition on the change in the distance in the vicinity of the time t is extracted as a representative screen.

3. The representative screen extracting method according to claim 1, wherein cut point detection is performed on the image when the first or second condition is not satisfied, and, when it is determined that there is a cut point, the image data at the time t is extracted as a representative screen.

4. A representative screen extracting apparatus for extracting a representative screen from an image data sequence, comprising: a reference image buffer memory for storing image data at time s as reference image data; a buffer memory for storing image data in the vicinity of time t; and a calculation processing unit that reads image data from the reference image buffer memory and the buffer memory, extracts a representative screen according to the representative screen extracting method according to claim 1, and sets the extracted representative screen in the reference image buffer memory.