JPH0832924A

JPH0832924A - Method for extracting key screen and device therefor

Info

Publication number: JPH0832924A
Application number: JP6167262A
Authority: JP
Inventors: Yukinobu Taniguchi; 行信谷口; Yoshinobu Tonomura; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-07-19
Filing date: 1994-07-19
Publication date: 1996-02-02
Anticipated expiration: 2016-08-06
Also published as: JP3194837B2

Abstract

PURPOSE: To reduce false detection of key screens, excessive extraction of key images, and extraction of key images with degraded picture quality, by calculating the distance from a reference image to evaluate the difference in patterns and thereby detect scene changes. CONSTITUTION: The image data at time s (time t=0) of an input image data string 10 are stored in a reference image buffer memory 11 as the reference image data. Meanwhile, the data of string 10 following time t=0 are successively stored in a buffer memory 12. The distance between the image data at time s and the image data at a time t is then calculated by a prescribed means, and the change and the stability of the pattern are decided. When it is judged that the pattern has changed and is stable, a calculation processing part 13 outputs the frame number information of the image data at time t to a line 14 and also gives this information to the memory 12. The part 13 then transfers the image data of the corresponding frame number to the memory 11, updating the reference image data to the image data at time t, and also outputs that image data via a line 15 to be stored as a key screen.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method and apparatus for extracting representative screens, and more particularly to a method and apparatus for extracting, from a sequence of image data, a small number of representative screens that represent its content.

[0002]

[Prior Art] Video data is generally enormous in volume, yet the only way to learn its content has been to watch the video in chronological order. If representative screens that efficiently express the video content are extracted from the video data in advance, they are useful for grasping an outline of the video. Ideally, representative screens should be selected with even the story taken into account, but at present that work can only be done by hand, and the amount of work is so large as to be impractical.

[0003] Conventional techniques related to automating representative screen extraction are described below, with reference to a few application examples.

[0004] The first application example concerns video list display. FIG. 7 shows a schematic diagram of a video list display. To learn the contents of a videotape, or to cue up a needed part of it, one conventionally had no choice but to use the fast-forward and rewind functions of a video deck, which took time and effort. If representative screens are automatically extracted from the videotape and shown on a display or on a medium such as paper, the video contents can be listed and roughly grasped in a short time. Related prior art includes, for example, Japanese Examined Patent Publication No. 5-74273, "Index image creating device"; Japanese Unexamined Patent Publication No. 64-11483, "Video printer"; and Japanese Patent Application No. 5-195644, "Video image printing method and apparatus". Of these, the index image creating device of Japanese Examined Patent Publication No. 5-74273 extracts the image data at a cut point, or immediately before it, as a representative screen. A cut point is a joint between shots (video sections filmed continuously by a camera); detecting cut points makes it possible to pick one representative screen per shot. Specifically, a sequence of difference values between images is calculated to determine whether the image has changed. The video printer of Japanese Unexamined Patent Publication No. 64-113483 extracts the image a fixed time after a cut point as a representative screen and prints it on paper. This rests on the rule of thumb that the important picture often appears partway through a shot. Further, the video image printing method and apparatus of Japanese Patent Application No. 5-195644 process the reproduced video signal and pick out images that meet specific conditions as events representing major changes in the video that are important for grasping its content.

[0005] The second application example concerns quick viewing of video. By cutting out a small portion of each video section delimited by cut points and joining the portions together, a quick-view video can be generated automatically (Otsuji and Tonomura, "Subjective evaluation of high-speed video browsing", IEICE Spring Conference, SD9-3, 1993). This method also detects cut points and uses the image immediately after or immediately before each one as a representative screen.

[0006] The prior art of video cut point detection is described next. In a typical method, the differences between the luminance values of corresponding pixels in two temporally adjacent images (the image at time t and the image at time t-1) are calculated, the sum of their absolute values (the inter-frame difference) is taken as D(t), and time t is regarded as a cut point when D(t) exceeds a given threshold (Otsuji, Tonomura, and Ohba, "Video browsing using luminance information", IEICE Technical Report, IE90-103, 1991). Instead of the inter-frame difference, quantities such as the changed pixel area, the luminance histogram difference, or the block-wise color correlation are sometimes used as D(t) (Otsuji and Tonomura, "A study of automatic video cut detection methods", Technical Report of the Institute of Television Engineers, Vol. 16, No. 43, pp. 7-12). There is also a method that, rather than thresholding D(t) directly, thresholds the result of applying various temporal filters to D(t) (K. Otsuji and Y. Tonomura, "Projection Detecting Filter for Video Cut Detection", Proc. of ACM Multimedia 93, 1993, pp. 251-257). This method has the property that false detections are unlikely even when the video contains fast-moving objects or flash light.
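The basic inter-frame difference method described above can be sketched as follows. This is a minimal illustration, not code from any of the cited works: the frame representation (a 2-D grid of integer luminance values) and the names `frame_difference` and `detect_cuts` are assumptions made for this example.

```python
from typing import List, Sequence

# Assumed representation: a frame is a 2-D grid of integer luminance values.
Frame = Sequence[Sequence[int]]


def frame_difference(prev: Frame, curr: Frame) -> int:
    """D(t): sum of absolute luminance differences over corresponding pixels."""
    return sum(
        abs(c - p)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    )


def detect_cuts(frames: Sequence[Frame], threshold: int) -> List[int]:
    """Return every time t >= 1 at which D(t) exceeds the threshold."""
    return [
        t
        for t in range(1, len(frames))
        if frame_difference(frames[t - 1], frames[t]) > threshold
    ]
```

Because each D(t) compares only two adjacent frames, a sequence that brightens by a small amount per frame never crosses the threshold, however large the accumulated change becomes. This is the limitation the later paragraphs address.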

[0007]

[Problems to Be Solved by the Invention] In the prior art described above, representative screens are selected with cut points as the reference: the image immediately after a cut point, or a fixed time after it, is simply taken as the representative screen. However, the image immediately after a cut point may be out of focus, or blurred because the subject is moving rapidly, and so may be unsuitable as a representative screen in terms of image quality. In list display applications, it is desirable not to extract representative screens of poor image quality.

[0008] Conventional video cut detection methods also have the problem that they cannot detect temporally slow scene changes. This is because the quantity expressing the rate of scene change is calculated from only two temporally adjacent frames, so that the scene over a longer span is hardly reflected. Japanese Patent Application No. 5-317663, "Video cut point detection method and apparatus", for example, addresses this by calculating the distances between multiple pairs of image data, covering temporally separated images as well as adjacent ones. Even so, scene changes that proceed very slowly over a long time sometimes go undetected. That is, even a scene change that a human perceives as a complete change of scene may not be detected, so the representative screen corresponding to that scene is missed.

[0009] In addition, when the picture changes because of a camera operation such as panning (swinging the camera horizontally) or tilting (swinging it vertically), one may want to extract the image after the camera operation as a representative screen, but conventional methods could not do so.

[0010] The objects of the present invention are to provide a representative screen extraction method and apparatus that can, first, detect temporally slow scene changes caused by editing effects such as fades and wipes or by camera operations; second, adapt to any video, including video containing temporal noise such as flash light; and third, extract representative screens that are also appropriate in terms of image quality.

[0011]

[Means for Solving the Problems] To achieve the above objects, the present invention takes the image data at a certain time as reference image data, sequentially calculates the distance between the reference image data and the image data at time t while varying t, and extracts the image at time t as a representative screen when both of two conditions hold: a first condition that the distance is larger than a predetermined threshold, and a second condition that the scene is stable in the temporal neighborhood of time t.

[0012] The present invention is further characterized in that, when determining whether the scene is stable, it examines the increase or decrease of the distances between image data that have already been calculated. The present invention is further characterized in that, when the first or second condition is not satisfied, video cut point detection is performed, and if a cut is determined to be present, the image data at time t is extracted as a representative screen.

[0013]

[Operation] In the present invention, the distance between the reference image data and the image data at time t is calculated. This distance evaluates the difference between the patterns of the images. Because the reference image data is held fixed and the change in pattern from it is observed, very slow scene changes that the prior art could not detect become detectable, and as a result fewer representative screens are missed. In addition, providing a procedure that determines whether the scene is stable prevents a representative screen from being extracted erroneously when the pattern changes momentarily because of flash light, suppresses excessive extraction of representative screens partway through a scene change caused by a camera operation or a slow scene change, and avoids extracting representative screens of poor image quality caused by subject motion, camera defocus, and the like.

[0014] Furthermore, when the stability of the scene is judged by examining the increase or decrease of the already calculated distances between the reference image data and the image data at each time, no extra image processing is needed to check scene stability, and the amount of computation is reduced. In addition, combining the method with a cut detection procedure makes it possible to detect cut points between similar-looking pictures as representative screens, reducing the number of representative screens that are missed.

[0015]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing the configuration of an embodiment of the representative screen extraction apparatus of the present invention. In FIG. 1, 10 is an input image data string; the sampling rate, image format, and image size may be arbitrary. That is, an NTSC standard video signal may be sampled at 30 frames/sec or at a coarser sampling rate. The input image data string 10 may be an analog signal such as NTSC, a digital signal, or image files stored in a storage device such as a hard disk or CD-ROM. 11 is a reference image buffer memory, which holds the image data at a certain point in the input image data string 10 for reference. The reference image buffer memory 11 may instead store data obtained secondarily by applying some processing to the image. For example, it may store a luminance histogram or color histogram of a single image, or edge information, or a temporal average of these. It may also store a reduced image in order to shorten the processing time in the calculation processing unit 13. Here, all of these are referred to collectively as image data. 12 is a buffer memory that temporarily stores the image data of the input image data string 10 near time t. The buffer memory 12 may be, for example, a frame buffer that temporarily holds the image data sent sequentially from the image source, or it may be built from a shift buffer array so that multiple images can be held. Like the reference image buffer memory 11, it may store luminance histograms or color histograms. 13 is a calculation processing unit, which performs the representative screen extraction processing using the reference image data in the reference image buffer memory 11 and the input image data in the buffer memory 12. The calculation processing unit 13 consists of a so-called CPU with built-in memory such as RAM and ROM. 14 is an output line for representative screen frame number information, and 15 is an output line for representative screen information.

[0017] FIG. 2 is a processing flowchart of an embodiment of the representative screen extraction method of the present invention; the calculation processing unit 13 of FIG. 1 carries out this processing. First, the time variables s and t are initialized to 0 (step 201). At this point, the image data at time s = 0, that is, time t = 0, in the input image data string 10 is stored in the reference image buffer memory 11 as the reference image data. The input image data string 10 from time t = 0 onward is stored sequentially in the buffer memory 12, so that the buffer always holds the image data near time t. Each image data item is assumed to carry a frame number. Next, t is advanced by 1 (step 202), and the distance d_s(t) between the reference image data I_s at time s and the image data I_t at time t is calculated (step 203). The distance d_s(t) takes a positive value that is closer to 0 the more similar the patterns of the two images are and larger the more they differ. The calculation of the distance is described later. Next, the distance d_s(t) is compared with a threshold T (step 204). If d_s(t) < T, the change in pattern is regarded as small and processing returns to step 202.

[0018] If d_s(t) ≥ T, the pattern is regarded as having changed between the two images. In this case, the procedure that determines whether the scene is stable is called (step 205) and its result is checked (step 206); if the scene is not stable, processing returns to step 202. If the scene is judged stable, the image at time t is taken as a representative screen (step 207), t is assigned to s (step 208), and processing returns to step 202. Assigning t to s means updating the reference image data to the image data at time t. The procedure for determining whether the scene is stable is described later.
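The flow of steps 201 to 208 can be sketched as a single loop. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name, the callback signatures, and the choice to clear the distance history whenever the reference is updated are all assumptions made for this example, since the text does not specify them.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # any frame representation (pixels, histogram, ...)


def extract_key_frames(
    frames: Sequence[F],
    distance: Callable[[F, F], float],
    is_stable: Callable[[List[float]], bool],
    threshold: float,
) -> List[int]:
    """Return the frame indices selected as representative screens."""
    key_frames: List[int] = []
    s = 0                        # step 201: reference time
    history: List[float] = []    # d_s(.) values since the reference was set
    for t in range(1, len(frames)):            # step 202: advance t
        d = distance(frames[s], frames[t])     # step 203: d_s(t)
        history.append(d)
        if d < threshold:                      # step 204: pattern barely changed
            continue
        if not is_stable(history):             # steps 205-206: still changing
            continue
        key_frames.append(t)                   # step 207: representative screen
        s = t                                  # step 208: update the reference
        history = []                           # assumption: restart the history
    return key_frames
```

With scalar stand-ins for frames `[0, 0, 1, 50, 60, 55, 55, 56]`, an absolute-difference distance, a threshold of 20, and a stability test that requires the latest distance to be smaller than the previous one, the loop selects frame 5: the distance crosses the threshold at frame 3 but only stops rising at frame 5.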

[0019] In step 207, specifically, the calculation processing unit 13 sends the frame number information of the image data at time t to line 14 and, at the same time, supplies this frame number information to the buffer memory 12 through control line 16, so that the image data with that frame number is read from the buffer memory 12 and transferred to the reference image buffer memory 11. The reference image data in the reference image buffer memory 11 is thereby updated to the image data at time t. The image data at time t read from the buffer memory 12 is also sent to line 15 as a representative screen. The representative screen on line 15, with the frame number from line 14 attached, is stored on a secondary storage medium such as a hard disk. Alternatively, the calculation processing unit 13 may take the image data at time t, found to be a representative screen among the image data it has read from the buffer memory 12 for processing, and both send it to line 15 and write it to the reference image buffer memory 11.

[0020] Next, the relationship between the temporal change of the distance d_s(t) and the processing flow is explained with reference to FIG. 3. First, the image data at time s is taken as the reference image data (step 201). While t is increased (step 202), the distance d_s(t) between the reference image data and the image data at time t is calculated in turn (step 203). Because the pattern of the image data diverges from that of the reference image data as time passes, the distance d_s(t) increases little by little, as shown in FIG. 3. This distance d_s(t) is compared with the threshold T (step 204); at time t₁, d_s(t₁) < T, so the pattern is regarded as not having changed sufficiently. At time t₂, when the scene switches from a building to a car, the distance d_s(t) rises sharply and d_s(t₂) ≥ T holds. In this case, whether the scene is stable is then judged (steps 205 and 206). Near time t₂, however, d_s(t) is still rising or falling sharply, so step 206 judges that the scene is not yet stable, and processing returns to step 202 without extracting a representative screen. At time t₃, d_s(t) turns to a decrease, so the scene is regarded as having stabilized and a representative screen is extracted (step 207). The image data at time t₃ is then set as the next reference image data; specifically, s = t₃ is set (step 208) and processing returns to the main flow.

[0021] In the processing flow of FIG. 2, one could omit the scene stability check of steps 205 and 206 and simply regard a representative screen as present whenever d_s(t) > T, but this is impractical because of the following problems.

(1) In a slow scene change such as a wipe or fade, the distance d_s(t) may exceed the threshold T partway through the scene change, as shown in FIG. 4. As a result, representative screens are extracted redundantly within a single scene change, or a representative screen is extracted in the middle of a cross-fade (a video editing effect in which two pictures overlap while one scene switches to another), where two images are superimposed (undesirable in terms of image quality).

(2) In video of a scene in which flashes are fired (such scenes are common in news footage), the sudden rise in luminance caused by flash light may make the distance d_s(t) abruptly exceed the threshold T, as shown in FIG. 5. As a result, a representative screen is extracted every time a flash fires.

The present invention overcomes problems (1) and (2) by checking the stability of the scene.
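Problems (1) and (2) can be reproduced with a toy version of the threshold-only variant. The sketch below is a hypothetical illustration: each frame is reduced to a single number standing in for its pattern (for example, mean luminance), the distance is the absolute difference, and the function name is invented for this example.

```python
from typing import List, Sequence


def extract_without_stability_check(
    values: Sequence[float], threshold: float
) -> List[int]:
    """Threshold-only extraction: select t whenever d_s(t) > T, then reset s."""
    key_frames: List[int] = []
    s = 0
    for t in range(1, len(values)):
        if abs(values[t] - values[s]) > threshold:  # d_s(t) > T, no stability test
            key_frames.append(t)
            s = t
    return key_frames


# (1) A slow fade from 0 to 100 in steps of 10: frames are picked mid-fade.
fade = list(range(0, 101, 10))
print(extract_without_stability_check(fade, 25))   # [3, 6, 9]

# (2) Two flashes in an otherwise static scene: each flash yields two picks,
# one on the flash frame and one on the return to the normal picture.
flash = [10, 10, 200, 10, 10, 200, 10]
print(extract_without_stability_check(flash, 50))  # [2, 3, 5, 6]
```

In both toy sequences the scene content never settles on a genuinely new picture at the selected frames, which is why a separate stability condition is needed.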

[0022] Next, a few implementation examples of the procedure for calculating the distance d_s(t) between the reference image data I_s and the image data I_t are described.

[0023] The first implementation example uses luminance histograms. Let H_s(n) be the luminance histogram of the reference image I_s at time s and H_t(n), n = 1, 2, ..., N, be the luminance histogram of the image data I_t at time t; the distance d_s(t) is calculated by [Equation 1], where N is the number of histogram bins.

[0024]

[Equation 1]
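The body of Equation 1 is not reproduced in this text, so its exact form is not known here. As a hedged illustration only, the sketch below assumes one common choice, the sum of absolute bin differences between H_s(n) and H_t(n); the bin count, the 8-bit luminance range, and both function names are likewise assumptions.

```python
from typing import List, Sequence

# Assumed representation: a 2-D grid of integer luminance values in [0, levels).
Image = Sequence[Sequence[int]]


def luminance_histogram(image: Image, bins: int = 16, levels: int = 256) -> List[int]:
    """Count pixels per luminance bin; `bins` plays the role of N."""
    hist = [0] * bins
    for row in image:
        for value in row:
            hist[value * bins // levels] += 1
    return hist


def histogram_distance(h_s: Sequence[int], h_t: Sequence[int]) -> int:
    """Assumed form of d_s(t): sum over n of |H_s(n) - H_t(n)|."""
    return sum(abs(a - b) for a, b in zip(h_s, h_t))
```

A histogram discards pixel positions, so this distance stays small when objects merely move within the frame and grows when the overall luminance distribution changes.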

[0025] The second implementation example uses color histograms. When the color histograms of the reference image at time s and the image at time t are written H_s′(n_r, n_g, n_b) and H_t′(n_r, n_g, n_b), n_r, n_g, n_b = 1, 2, ..., N, respectively, the distance d_s(t) is calculated by [Equation 2].

[0026]

[Equation 2]
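The body of Equation 2 is likewise not reproduced in this text. The sketch below is an assumption-laden illustration: it builds a three-dimensional histogram indexed by (n_r, n_g, n_b) and, as an assumed distance, sums the absolute differences over all bins. The pixel format (8-bit RGB tuples), the bin count, and the function names are not from the source.

```python
from collections import Counter
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]           # assumed: (R, G, B), each in [0, levels)
ColorImage = Sequence[Sequence[Pixel]]


def color_histogram(image: ColorImage, bins: int = 4, levels: int = 256) -> Counter:
    """Count pixels per (n_r, n_g, n_b) bin; empty bins are simply absent."""
    hist: Counter = Counter()
    for row in image:
        for r, g, b in row:
            hist[(r * bins // levels, g * bins // levels, b * bins // levels)] += 1
    return hist


def color_histogram_distance(h_s: Counter, h_t: Counter) -> int:
    """Assumed form of d_s(t): sum of |H_s' - H_t'| over all color bins."""
    return sum(abs(h_s[key] - h_t[key]) for key in set(h_s) | set(h_t))
```

A `Counter` is used so that the N×N×N bins need not all be allocated; a missing key reads as a count of zero.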

[0027] In the implementation examples above, the distance d_s(t) is calculated from histogram-based features, but the invention is not limited to this. The distance d_s(t) may instead be calculated from color information averaged over blocks.

[0028] Next, a few implementation examples of the procedure for evaluating scene stability are described.

[0029] The first implementation example uses the inter-frame difference. Let I_t be the image data at time t and I_t(x, y) its luminance value at coordinates (x, y); the inter-frame difference is calculated as D(t) = Σ_{x,y} |I_t(x, y) - I_{t-1}(x, y)|. When every value in the sequence of inter-frame differences over a certain time width W is smaller than a threshold θ, that is, when D(t-k) < θ for k = 0, 1, ..., W-1, the scene is judged to be stable near time t.
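The windowed test D(t-k) < θ for k = 0, 1, ..., W-1 can be sketched as follows. The frame representation and function names are assumptions for this example, as is the decision to report "not stable" when fewer than W inter-frame differences are available, a case the text does not address.

```python
from typing import Sequence

# Assumed representation: a frame is a 2-D grid of integer luminance values.
Frame = Sequence[Sequence[int]]


def frame_difference(prev: Frame, curr: Frame) -> int:
    """D(t) = sum over (x, y) of |I_t(x, y) - I_{t-1}(x, y)|."""
    return sum(
        abs(c - p)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    )


def is_stable_by_frame_difference(
    frames: Sequence[Frame], t: int, theta: int, width: int
) -> bool:
    """True when D(t-k) < theta for every k in 0..width-1."""
    if t < width:  # assumption: too little history counts as not stable
        return False
    return all(
        frame_difference(frames[t - k - 1], frames[t - k]) < theta
        for k in range(width)
    )
```

This variant needs the last W+1 frames to be available, which fits the description of buffer memory 12 as holding the image data near time t.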

[0030] The second implementation example also uses the distance d_s(t) to judge scene stability. Observing d_s(t) shows that, when the scene is unstable, d_s(t) may increase monotonically, as in FIG. 4, or show a momentary peak, as in FIG. 5. Accordingly, the scene may be regarded as stable when, for example, d_s(t-1) > d_s(t) holds, or when both d_s(t-1) > d_s(t) and d_s(t-2) > d_s(t) hold.
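Both variants of the distance-based test can be covered by one small function. This is a sketch: the function name and the `lookback` parameter are introduced here for illustration (`lookback=1` gives the first condition, `lookback=2` the second), and treating a too-short history as "not stable" is an assumption.

```python
from typing import Sequence


def is_stable_by_distance(d_history: Sequence[float], lookback: int = 1) -> bool:
    """True when d_s(t-k) > d_s(t) for every k in 1..lookback.

    `d_history` holds d_s(.) up to and including the current time t.
    """
    if len(d_history) <= lookback:  # assumption: too little history, not stable
        return False
    current = d_history[-1]
    return all(d_history[-1 - k] > current for k in range(1, lookback + 1))
```

Because the distances are already computed for the threshold test, this check costs only a few comparisons per frame and requires no further image processing.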

[0031] FIG. 6 is the processing flow of another embodiment of the representative screen extraction method of the present invention. The method of FIG. 2 can detect slow scene changes, but may fail to detect cut points between similar-looking pictures. In such cases it is effective to combine the processing flow of FIG. 2 with video cut point detection. In FIG. 6, steps 601 to 606 correspond to steps 201 to 206 of FIG. 2, respectively. If d_s(t) < T in step 604, or if scene stability is not detected in step 606, processing moves to step 610 in FIG. 6. In step 610 the cut detection procedure is called, and the images near time t are checked for a cut point (step 611). If a cut point is found, processing proceeds to step 607 and the image at time t is taken as a representative screen; if no cut point is detected, processing returns to step 602. This reduces the number of representative screens missed at cut points between similar-looking pictures. Any of the conventional methods mentioned earlier may be used for cut point detection.
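The combined flow of FIG. 6 can be sketched by adding a cut-detection fallback to the basic loop. As before, this is an illustration under assumptions: the callback signatures are invented for this example, and updating the reference after step 607 is assumed by analogy with step 208, since the text describes only steps 601 to 607, 610, and 611.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")  # any frame representation


def extract_key_frames_with_cuts(
    frames: Sequence[F],
    distance: Callable[[F, F], float],
    is_stable: Callable[[List[float]], bool],
    has_cut: Callable[[Sequence[F], int], bool],
    threshold: float,
) -> List[int]:
    """Select t when the pattern changed and is stable, or when a cut is at t."""
    key_frames: List[int] = []
    s = 0                                       # step 601
    history: List[float] = []
    for t in range(1, len(frames)):             # step 602
        d = distance(frames[s], frames[t])      # step 603
        history.append(d)
        changed_and_stable = d >= threshold and is_stable(history)  # 604-606
        # Steps 610-611 run only when the test above fails (short-circuit `or`).
        if changed_and_stable or has_cut(frames, t):
            key_frames.append(t)                # step 607
            s = t                               # assumed, by analogy with step 208
            history = []
    return key_frames
```

With scalar stand-ins `[0, 0, 20, 20, 20]`, a distance threshold of 50, and a cut detector that fires when adjacent values differ by more than 15, the loop selects frame 2 even though its distance from the reference never reaches the threshold.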

[0032]

[Effects of the Invention] As described above, the present invention can detect temporally slow scene changes caused by editing effects such as fades and wipes or by camera operations, can adapt to any video, including video containing temporal noise such as flash light, and can extract representative screens that are also appropriate in terms of image quality.

[0033] In addition, because scene stability is judged by examining the increase or decrease of the distances between image data that have already been calculated, no extra image processing is needed to check scene stability, which reduces the amount of computation.

[0034] Furthermore, by combining the method with cut point detection processing and extracting a representative screen when a cut is detected, cut points with similar patterns are not overlooked, which reduces omissions in representative screen extraction.

[0035] The present invention can be applied to, for example, video list display and quick viewing, and also to video database interfaces. When a large amount of video is stored in a video database, it is customary to attach keywords describing the video content, but keywords alone have conventionally not allowed users to retrieve the desired scene as intended. In this case, a video index or a list display interface for video content is useful for helping the user confirm whether the videos returned as candidates by a keyword search are really what the user wants. The present invention makes it possible to automate the creation of such an index.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the representative screen extraction device of the present invention.

FIG. 2 is a processing flow chart of an embodiment of the representative screen extracting method of the present invention.

FIG. 3 is a diagram for explaining the relationship between the time change of the distance and the processing flow.

FIG. 4 is a schematic diagram for explaining the temporal change in distance in the case of a slow scene change.

FIG. 5 is a schematic diagram for explaining the temporal change in distance caused by flash light.

FIG. 6 is a processing flow chart of another embodiment of the representative screen extracting method of the present invention.

FIG. 7 is a schematic diagram of a list display of video.

[Explanation of symbols]

10: input image data string
11: reference image buffer memory
12: buffer memory
13: calculation processing unit
14: representative screen frame number information
15: representative screen information

Continuation of the front page (51) Int. Cl.⁶: H04N 5/937

Claims


1. A representative screen extracting method for extracting a representative screen from an image data string, characterized in that image data at a certain time is used as reference image data, the distance between the reference image data and the image data at time t is calculated sequentially while varying time t, and image data at a time t that satisfies both a first condition that the distance is larger than a predetermined threshold value and a second condition that the scene is stable near the time t is extracted as a representative screen.

2. The representative screen extracting method according to claim 1, characterized in that whether or not the scene is stable near time t is determined by examining the increase and decrease of the distance between the reference image data and the image data near time t.

3. The representative screen extracting method according to claim 1, characterized in that video cut point detection is performed when the first or second condition is not satisfied, and, when it is determined that there is a cut point, the image data at the time t is extracted as a representative screen.

4. A representative screen extracting device for extracting a representative screen from an image data string, comprising: a reference image buffer memory for storing image data at time s as reference image data; a buffer memory for storing image data in the vicinity of time t; and a calculation processing unit that reads image data from the reference image buffer memory and the buffer memory, extracts a representative screen according to the representative screen extracting method according to claim 1, and sets the extracted representative screen in the reference image buffer memory.