JPH10112835A - Video image summarizing method and video image display method

Info

Publication number: JPH10112835A
Application number: JP8264287A
Authority: JP
Inventors: Shin Yamada; 伸山田; Yasuhiro Kikuchi; 康弘菊池
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1996-10-04
Filing date: 1996-10-04
Publication date: 1998-04-28
Anticipated expiration: 2016-10-04
Also published as: JP3250467B2

Abstract

PROBLEM TO BE SOLVED: To accommodate the diversity of user preferences and of video content by extracting scenes that continue for at least a prescribed time and by grouping scenes in time series. SOLUTION: When a scene change detection means 105 determines that a scene change has occurred in a frame image captured by an image fetch means 104, it determines the representative image and the head frame for the all-scenes mode. Next, the time length of the scene immediately preceding the scene change is calculated, and a time discrimination processing means 106 stores the representative image of that preceding scene in a file server 113 as a representative image for the time discrimination mode. The user then selects, via a user interface means 116, the mode in which the summarized video is to be displayed; the head frame numbers of the scenes to be displayed and the representative image information are sent from the server 113 to a summarized video image reproduction means 115, which displays the summarized video.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for supporting video search, editing, processing, and quick viewing, and in particular to a video summarizing method and a video display method that summarize video stored on a video tape or video disc and reproduce or display it.

[0002]

[Prior Art] In video editing and quick viewing, a video summarizing means for efficiently finding the desired portion of a video is indispensable. Known video summarizing means characterized by selecting specific representative images from the images include, for example, the method described in Japanese Patent Application Laid-Open No. S64-68084 (hereinafter, scene list display), the shot-by-shot rush reproduction method, the method described in Japanese Patent Application Laid-Open No. H7-236153 (hereinafter, the color difference correlation value method), and the method described in Japanese Patent Application Laid-Open No. H6-149902 (hereinafter, the time length designation summarizing method).

[0003] Scene list display is a method in which scene changes are detected in advance and the images immediately following the scene changes are displayed as a list, as shown in FIG. 14. In FIG. 14, (1) to (9) denote the images immediately after scene changes. In this method, each scene is represented by its leading image.

[0004] A "scene" is a unit of video often used in fields such as video editing, and is in many cases defined as "a portion shot continuously in time by a single video camera." In this application, "scene" is used in a broader sense and means "a unit of video formed by dividing a video according to some criterion." A scene is sometimes called a shot or a cut. A "scene change" in many cases refers to a point where the scene changes, such as a point joined by editing or a point where shooting by the video camera was interrupted. In this application the term is likewise used in a broader sense, and "scene change" means "a point at which a video is divided when it is divided according to some criterion." A scene change is sometimes also called a cut. A "frame" means each of the images that make up a video.

[0005] Methods proposed for automatically dividing a video into scenes include the common color ratio method (Yamada, Fujioka, Kanamori, Matsushima, "A Study of a Scene Change Detection Method Focusing on the Common Colors of Each Partial Region," Technical Report of the Institute of Television Engineers of Japan, Vol. 17, No. 55) and the video change model method (Yamada, Fujioka, Kanamori, Matsushima, Sakauchi, "A Scene Change Detection Method for Video Including Editing Effects," Institute of Television Engineers of Japan, Multimedia and Video Processing Symposium '94).

[0006] The shot-by-shot rush reproduction method is a video summarizing method that plays the leading portion of each scene one after another at standard speed. It is a method for viewing a summary of the video, and is used by playing from beginning to end without interrupting playback partway.

[0007] The time length designation summarizing method plays all or some of the scenes for a fixed time each. It uses the time length of each scene to determine the playback time per scene or the number of scenes played, so that the summary video has the designated duration.

[0008] The color difference correlation value method is a video summarizing method that groups scenes having similar content and differing only in camera angle, thereby enabling more efficient video search than scene list display. In this method, while the leading image of each scene (hereinafter, cut screen) is obtained, a color difference histogram correlation value between cut screens is computed, and cut screens whose color difference histogram correlation value is at or above a threshold are regarded as cut screens of the same group. Cut screens with similar backgrounds thus fall into the same group. When the cut screens are displayed as a list, the cut screen detected first within a group is shown as the parent screen and the remaining cut screens of that group as child screens. For example, when the second and third scenes belong to one group and the sixth to eighth scenes belong to another, the display is as shown in FIG. 15. In FIG. 15, (1) to (9) denote the first to ninth cut screens, respectively.
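
The grouping rule of paragraph [0008] can be illustrated with a minimal Python sketch. It assumes the correlation values between consecutive cut screens have already been computed by some means; the function name `group_cut_screens`, the 1-based numbering, and the threshold value are hypothetical choices for illustration and are not taken from the cited publication.

```python
from typing import List, Sequence


def group_cut_screens(sims: Sequence[float], threshold: float) -> List[List[int]]:
    """Group consecutive cut screens (numbered from 1).

    sims[i] is the similarity between cut screen i+1 and cut screen i+2.
    A cut screen joins the current group when its similarity to the
    previous cut screen is at or above the threshold; otherwise it
    starts a new group. Assumes the video has at least one cut screen.
    """
    groups: List[List[int]] = [[1]]
    for i, sim in enumerate(sims):
        current = i + 2
        if sim >= threshold:
            groups[-1].append(current)   # child screen of the current group
        else:
            groups.append([current])     # parent screen of a new group
    return groups
```

With nine cut screens in which only the pairs (2, 3), (6, 7), and (7, 8) are similar, the sketch yields the grouping described for FIG. 15: the first element of each group plays the role of the parent screen and the rest are child screens.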

[0009] A conventional video summarizing system using the color difference correlation value method is described below. FIG. 16 is a block diagram of a conventional video summarizing system. In FIG. 16, 1501 and 1502 are video input devices: 1501 is a video disc device and 1502 is a VTR. 1503 is a video summarizing device that summarizes the video using scene change detection and the color difference correlation value method while processing the video signal from the video disc device 1501 or the VTR 1502. It comprises image capture means 1504 for capturing frame images, scene change detection means 1505 for detecting scene changes, and group generation means 1506 for grouping the cut screens (the leading images of scenes). 1507 is a control device that controls the video disc device 1501 and the VTR 1502. 1508 is a video compression device that compresses the video. 1509 is a file server that stores the cut screens, scene change data, and group data created by the video summarizing device 1503 together with the video data compressed by the video compression device 1508. 1510 is a video display device that displays the data, images, and video stored in the file server.

[0010] The overall operation of the video summarizing system configured as above is described with reference to the flowchart shown in FIG. 17.

[0011] In step 1601, the control device 1507 in FIG. 16 controls the video disc device 1501 and the VTR 1502 to start playback of the video, and at the same time the video compression device 1508 starts compressing the video. The compressed video is stored in the file server 1509.

[0012] In step 1602, the control device 1507 determines whether the video has ended. If it has, the process proceeds to step 1607; otherwise it proceeds to step 1603.

[0013] In step 1603, the image capture means 1504 captures the frame image currently being played.

[0014] In step 1604, the scene change detection means 1505 processes the frame image captured by the image capture means 1504 and detects whether a scene change has occurred, using the common color ratio method described above or a similar method.

[0015] If step 1604 determines that a scene change has occurred, the process proceeds to step 1605; otherwise it returns to step 1602.

[0016] In step 1605, the frame number at which the scene change detected by the scene change detection means 1505 occurred and the corresponding cut screen are stored in the file server 1509.

[0017] In step 1606, the group generation means 1506 computes the color difference histogram correlation value between time-series cut screens and regards cut screens whose correlation value is at or above the threshold as belonging to the same group. After the grouping result is stored in the file server 1509, the process returns to step 1602.

[0018] In step 1607, the control device 1507 stops playback and compression of the video. In step 1608, the user selects how the video is to be displayed. To view representative portions as moving video, the process proceeds to step 1610. To search for the desired portion using still images, it proceeds to step 1609.

[0019] In step 1609, the system displays a summary of the video on the video display device 1510 so that the user can efficiently find the desired portion. For example, the cut screens are displayed as a list as shown in FIG. 14. Alternatively, cut screens determined to belong to the same group may be displayed as child screens linked to the cut screen detected first in that group. When the second and third scenes are in one group and the sixth to eighth scenes are in another, the display is as shown in FIG. 15.

[0020] In step 1610, the system displays the leading portion of each group of the video on the video display device 1510 for a predetermined time each, for example five seconds each.
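
The frame loop of steps 1602 to 1605 can be sketched as follows. This is an illustration only: the function name `detect_cut_screens` is hypothetical, the scene change test is passed in as a callback because the text leaves the detection method open ("the common color ratio method or a similar method"), and treating frame 0 as the start of the first scene is an assumption.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def detect_cut_screens(
    frames: Iterable[Frame],
    is_scene_change: Callable[[Frame, Frame], bool],
) -> List[Tuple[int, Frame]]:
    """Return (frame number, cut screen) for the first frame of every scene."""
    cuts: List[Tuple[int, Frame]] = []
    prev = None
    for number, frame in enumerate(frames):       # steps 1602-1603: read frames until the video ends
        # step 1604: frame 0 opens the first scene (assumption); later frames use the detector
        if number == 0 or is_scene_change(prev, frame):
            cuts.append((number, frame))          # step 1605: keep frame number and cut screen
        prev = frame
    return cuts
```

In a toy setting where each frame is a single brightness value and a scene change is declared when consecutive values differ by more than 50, the frames `[10, 12, 11, 200, 198, 90, 92]` produce cut screens at frame numbers 0, 3, and 5.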

[0021]

[Problems to Be Solved by the Invention] Conventional video summarizing devices characterized by selecting specific representative images from the images have selected representative images by a single criterion only, even though which video is important within a program differs depending on the content of the video and other factors. For example, the video summarizing device using the color difference correlation value method described under "Prior Art" selects the representative images of a program solely by the summarizing criterion that "scenes with similar content that differ only in camera angle belong to the same group." With that method, however, scenes likely to carry important information cannot be selected as representative images. An example that often occurs in news programs is a scene in which a person appears against an unchanged background, such as a reporter on location explaining an incident, or the scene immediately following it. This is because all such images are merged into one group as long as their backgrounds are similar. Nor can that method group the main part so that the digest images become the representative images in a program such as the satellite broadcast program "High Tech Shower International," in which a digest made by extracting two or three scenes of moving video from each item of the main part is shown first, followed by the main part. This is because the method does not distinguish the digest portion from the main portion and judges similarity only between adjacent scenes.

[0022] There were problems not only with the video summarizing device but also with the video display device that displays video information including the summary information obtained by it. In a video display device based on the shot-by-shot rush reproduction method or the time length designation summarizing method, which play part or all of every scene in some manner, a large number of scenes in the video means a large number of representative images, which burdens the user. Likewise, in a video display device that displays video information including summary information obtained by a video summarizing device that selects representative images according to a predetermined criterion, a large number of scenes usually results in many representative images; the list display then contains many images, and it is often hard to find the desired portion. For example, when a news program is summarized, representative images of several tens of scenes are in many cases displayed for each news item, so the user has to look at several tens of representative images before finishing a single item.

[0023] Furthermore, with the video display device in the system described with reference to FIGS. 16 and 17, the desired portion can be searched for using still images but not using moving video and audio. With the shot-by-shot rush reproduction method and the time length designation summarizing method, not only is it impossible to search for the desired portion using still images, but a further problem arises when a scene shorter than the fixed playback time is played. Because the fixed time then includes the beginning of the next scene, the next scene is played until the fixed time elapses, and afterwards the next scene is played again from its beginning for the fixed time, so that the beginning of the next scene is played twice in succession.

[0024] As described above, a video summarizing device that selects representative images uniformly by a single fixed summarizing criterion regardless of the content of the video or the user's preferences, and a video display device that displays representative images without regard to the length of the video and the like, cannot cope with the diversity that stems from the content of the video or with users' diverse needs for summaries.

[0025] The present invention solves the above problems of the prior art, and its object is to provide a video summarizing device that accommodates the diversity of video content and of user preferences, and a video display device for efficiently displaying the summary information.

[0026]

[Means for Solving the Problems] To solve the above problems, a video summarizing device that implements the video summarizing method according to the present invention is provided with a plurality of video summarizing means for gathering the plurality of scenes, formed by dividing the captured video according to a predetermined criterion, into a plurality of time-series groups, or for selecting predetermined scenes from those scenes. These video summarizing means include: means for detecting only scenes that continue for at least a fixed time and extracting their representative images; means for correcting the result of grouping by the color difference correlation value method or the like so that a scene continuing for at least a fixed time does not fall into the same group as the scenes before and after it, and then extracting the representative image of each group; and means for computing the similarity between the representative image of each of several tens of scenes that include the digest of the video and the representative image of each of the other scenes, detecting the scenes whose similarity is at or above a threshold, and extracting only their representative images. The video summary information extracted by each means is recorded, together with the video information itself, in recording means provided internally or externally. In a video display device that implements the video display method according to the present invention, the representative images, that is, the summary images of the video that constitute the video summary information recorded in the recording means, are displayed as a list; when the image of the desired portion is designated from the list, the video from that portion onward is played. In addition, a priority order among the video summarizing means and a priority order for image selection within each video summarizing means are determined in advance, the representative images are determined using these priorities, and no more than the number of images designated by the user is extracted. Further, when the number of representative images extracted by one video summarizing means is at or above a threshold, the images to be extracted from the video are determined by choosing the representative images of that means ahead of those of the other video summarizing means; otherwise, the images to be extracted are determined by choosing from among the representative images of the video summarizing means other than that one. In this way too, no more than the number of images designated by the user can be extracted.

[0027] Further, instead of displaying the representative images as a list, portions of the video near the position of each representative image can be played one after another (the video so played is hereinafter called the summary video). The original, unsummarized video can then be played starting from any frame of the summary video. Also, when playback of the summary video is interrupted, a plurality of representative images, including the representative image that represents the content of the frame image at the interrupted position, are displayed as a list.

[0028] In the summary video, however, when the portion of video near the position of a representative image is included in the portion of video near the position of the immediately preceding representative image, the end of the preceding portion is used as the start of the portion for the current representative image.
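
The overlap rule of paragraph [0028] can be sketched in Python as follows. The sketch assumes that "the portion of video near a representative image" is a fixed-length run of frames starting at the representative frame; that assumption, the function name `summary_segments`, and the half-open `(start, end)` frame ranges are illustrative choices, not details given in the text.

```python
from typing import Iterable, List, Tuple


def summary_segments(
    rep_frames: Iterable[int], seg_len: int, total_frames: int
) -> List[Tuple[int, int]]:
    """Build non-overlapping playback segments as (start, end) frame ranges.

    Each representative frame nominally yields [frame, frame + seg_len).
    If that range begins inside the previous segment, its start is moved
    to the end of the previous segment so no frame is played twice.
    """
    segments: List[Tuple[int, int]] = []
    prev_end = 0
    for frame in sorted(rep_frames):
        end = min(frame + seg_len, total_frames)  # do not run past the video
        start = max(frame, prev_end)              # trim the overlapping head
        if start < end:                           # skip segments trimmed to nothing
            segments.append((start, end))
            prev_end = end
    return segments
```

For representative frames 0, 100, 130, and 400 with 150-frame segments in a 500-frame video, the segments starting at 100 and 130 are trimmed to begin where their predecessors end, which avoids the repeated playback described as a problem in paragraph [0023].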

[0029]

[Embodiments of the Invention] The invention of claim 1 is a video summarizing method comprising: a time-series group generation step of, for the plurality of scenes formed by dividing the captured video at detected scene changes, computing the similarity between predetermined images (hereinafter, a predetermined image of a scene is called a representative image) of scenes adjacent in time series and thereby gathering the scenes containing those representative images into time-series groups; a time-series group addition step of applying a correction whereby a scene that continues for at least a fixed time is made a time-series group independent of the scenes before and after it, even if it lies within the same time-series group; and a video summary information output step of outputting the video summary information of each time-series group obtained by the two preceding steps. It has the effect of extracting every scene that continues for at least a fixed time while grouping time-series scenes by a predetermined criterion.
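
The two steps of claim 1 can be sketched together in Python. The sketch assumes that grouping compares each scene only with the scene immediately before it, and it folds the grouping step and the correction step into one pass, which gives the same result under that assumption. The function name `group_with_long_scenes`, the 0-based scene indices, and the parameter names are hypothetical.

```python
from typing import List, Sequence


def group_with_long_scenes(
    lengths: Sequence[int],
    sims: Sequence[float],
    sim_threshold: float,
    min_long: int,
) -> List[List[int]]:
    """Group adjacent similar scenes, keeping long scenes as their own group.

    lengths[i] is the length of scene i in frames; sims[i] is the similarity
    between the representative images of scene i and scene i + 1.
    """
    groups: List[List[int]] = []
    for i, length in enumerate(lengths):
        is_long = length >= min_long
        prev_is_long = i > 0 and lengths[i - 1] >= min_long
        similar = i > 0 and sims[i - 1] >= sim_threshold
        if similar and not is_long and not prev_is_long:
            groups[-1].append(i)   # join the group of the preceding scene
        else:
            groups.append([i])     # long scene, or dissimilar: start a new group
    return groups
```

With five mutually similar scenes of lengths 30, 40, 300, 20, and 25 frames and `min_long=150`, the long third scene is split out, giving groups `[0, 1]`, `[2]`, and `[3, 4]`.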

[0030] The invention of claim 2 is the video summarizing method of claim 1, characterized in that the similarity between the representative images of scenes in the time-series group generation step is computed as the proportion of pixels having colors common to the representative images being compared. It has the effect of extracting every scene that continues for at least a fixed time while grouping time-series scenes on the basis of color similarity.
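
One possible reading of "the proportion of pixels having common colors" is a histogram intersection over coarsely quantized colors. The Python sketch below follows that reading; the quantization into `levels` steps per channel, the normalization by the larger image, and the name `common_color_ratio` are assumptions made for illustration, and the text does not specify them.

```python
from collections import Counter
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def common_color_ratio(img_a: Sequence[Pixel], img_b: Sequence[Pixel], levels: int = 4) -> float:
    """Fraction of pixels whose quantized color occurs in both images."""
    if not img_a or not img_b:
        raise ValueError("both images must contain at least one pixel")
    step = 256 // levels

    def histogram(img: Sequence[Pixel]) -> Counter:
        # Count pixels per quantized color bin.
        return Counter((r // step, g // step, b // step) for r, g, b in img)

    hist_a, hist_b = histogram(img_a), histogram(img_b)
    # For each color bin, the pixels "in common" are the smaller of the two counts.
    shared = sum(min(count, hist_b[color]) for color, count in hist_a.items())
    return shared / max(len(img_a), len(img_b))
```

Two identical images score 1.0 and two images with no quantized color in common score 0.0, so the value can be compared directly against a similarity threshold.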

[0031] The invention of claim 3 is a video summarizing method comprising: a similarity calculation step of computing, according to a predetermined criterion, the similarity between predetermined images (hereinafter, a predetermined image of a scene is called a representative image) of a plurality of scenes selected according to a predetermined criterion from among the scenes constituting the captured video (hereinafter, reference scenes) and the representative images of all scenes constituting the video, and selecting the scenes containing a representative image whose similarity to the representative image of a reference scene is at or above a threshold; and a video summary information output step of outputting the video summary information of the scenes obtained by that step. By obtaining the similarity between time-series scenes that are highly likely to be digest scenes of the video and all scenes constituting the video, it has the effect of extracting the main-part scenes that correspond to the digest scenes.

[0032] The invention of claim 4 is the video summarizing method of claim 3, characterized in that the criterion for computing the similarity between the representative image of a reference scene and the representative image of a main-part scene in the similarity calculation step is to divide each representative image into a plurality of image regions and compare the RGB components of the average color of the pixels in each image region of the two representative images. By obtaining the similarity through this comparison between time-series scenes that are highly likely to be digest scenes of the video and all scenes constituting the video, it has the effect of extracting the main-part scenes that correspond to the digest scenes.
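
The region-wise comparison of claims 3 and 4 can be sketched in Python as follows. The text states only that average RGB values per image region are compared; scoring the similarity as the fraction of regions whose average color agrees within a tolerance `tol`, the grid size, the default threshold, and all function names are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = Sequence[Sequence[Pixel]]  # rows of (R, G, B) pixels


def block_means(img: Image, rows: int, cols: int) -> List[Tuple[float, ...]]:
    """Average RGB color of each cell of a rows x cols grid laid over img."""
    h, w = len(img), len(img[0])
    if h < rows or w < cols:
        raise ValueError("image is smaller than the grid")
    means = []
    for br in range(rows):
        for bc in range(cols):
            pixels = [img[y][x]
                      for y in range(br * h // rows, (br + 1) * h // rows)
                      for x in range(bc * w // cols, (bc + 1) * w // cols)]
            means.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
    return means


def block_similarity(a: Image, b: Image, rows: int = 2, cols: int = 2, tol: float = 16) -> float:
    """Fraction of grid cells whose average R, G, and B all differ by at most tol."""
    means_a, means_b = block_means(a, rows, cols), block_means(b, rows, cols)
    close = sum(all(abs(x - y) <= tol for x, y in zip(p, q))
                for p, q in zip(means_a, means_b))
    return close / len(means_a)


def scenes_matching_digest(digest: Sequence[Image], scenes: Sequence[Image],
                           threshold: float = 0.75) -> List[int]:
    """Indices of main-part scenes similar to at least one digest (reference) scene."""
    return [i for i, scene in enumerate(scenes)
            if any(block_similarity(d, scene) >= threshold for d in digest)]
```

Here `digest` holds the representative images of the reference scenes and `scenes` those of the remaining scenes; the returned indices identify the main-part scenes that would be kept in the summary.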

[0033] The invention of claim 5 is a video summarizing method comprising: a plurality of video summarizing steps for gathering the plurality of scenes, formed by dividing the captured video at detected scene changes, into a plurality of time-series groups and/or a plurality of video summarizing steps for selecting predetermined scenes from those scenes; and a video summary output step of outputting the video summary information of each scene selected by each video summarizing step. It has the effect of preparing a plurality of sets of video summary information from which the user can determine representative images to suit the characteristics of the video and the like.

[0034] The invention of claim 6 is the video summarizing method of claim 5, characterized in that the plurality of video summarizing steps for gathering the scenes into time-series groups and/or for selecting predetermined scenes from the scenes are at least two of the following: a video summarizing step that selects all scenes; a video summarizing step that selects, from the scenes of the captured video, only those that continue for at least a fixed time; a video summarizing step that computes, according to a predetermined criterion, the similarity between predetermined images (hereinafter, a predetermined image of a scene is called a representative image) of scenes adjacent in time series and gathers the scenes containing representative images whose similarity is at or above a threshold into time-series groups; the video summarizing step of claim 1 or claim 2; and the video summarizing step of claim 3 or claim 4. It has the effect of preparing a plurality of sets of video summary information from which the user can determine representative images to suit the characteristics of the video and the like.

[0035] The invention of claim 7 is the video summarizing method of claim 5 or claim 6, in which the video summary information is a predetermined image of a scene (hereinafter, a predetermined image of a scene is called a representative image) or a predetermined frame number (hereinafter, representative frame number), and representative images or representative frame numbers are selected in descending order of the number of frames between representative images or between representative frame numbers. It has the effect of selecting a predetermined number of scenes or time-series scenes.
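
The selection order of claim 7 can be sketched in Python. The sketch interprets "the number of frames between representative frame numbers" as the distance from each representative frame to the next one (or to the end of the video for the last one); that interpretation, the tie-break by earlier frame, and the name `select_by_gap` are assumptions.

```python
from typing import Iterable, List


def select_by_gap(rep_frames: Iterable[int], total_frames: int, k: int) -> List[int]:
    """Pick up to k representative frames, largest following gap first.

    The result is returned in chronological order for display.
    """
    reps = sorted(rep_frames)
    next_bounds = reps[1:] + [total_frames]
    gaps = [(nxt - cur, cur) for cur, nxt in zip(reps, next_bounds)]
    # Largest gap first; ties go to the earlier frame.
    ranked = sorted(gaps, key=lambda g: (-g[0], g[1]))
    return sorted(frame for _, frame in ranked[:k])
```

For representative frames 0, 50, 400, 450, and 900 in a 1000-frame video, the gaps are 50, 350, 50, 450, and 100 frames, so asking for two images selects frames 50 and 450.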

[0036] The invention according to claim 8 is the video summarizing method according to claim 7, wherein, when the predetermined number of scenes or time-series scenes cannot be selected by one video summarization process, the remaining scenes or time-series scenes are selected from the scenes or time-series scenes selected by another video summarization means.

[0037] The invention according to claim 9 is the video summarizing method according to claim 7 or claim 8, wherein the user inputs the priority order of the video summarization processes and the number of scenes or time-series scenes to be selected, and video summarization is performed according to that information.
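One way to read claims 8 and 9 together is as a priority-ordered fill: take scenes from the highest-priority summarization process first and top up from the next process when the requested number has not been reached. The sketch below assumes each process has already produced a list of representative frame numbers. The names `fill_by_priority` and `mode_results` and the mode labels are hypothetical.

```python
def fill_by_priority(mode_results, priority, count):
    """Collect up to `count` representative frames, mode by mode.

    mode_results: dict mapping a mode name to its list of frame numbers.
    priority:     mode names in the order the user prefers them.
    """
    chosen = []
    for mode in priority:
        for frame in mode_results.get(mode, []):
            if len(chosen) == count:
                return sorted(chosen)
            if frame not in chosen:  # a scene may be selected by several modes
                chosen.append(frame)
    return sorted(chosen)


results = {"time": [0, 300], "hybrid": [0, 120, 300, 450]}
print(fill_by_priority(results, priority=["time", "hybrid"], count=3))
```

Within a single mode the frames are taken in list order here; ordering them by frame gap as in claim 7 would be a natural refinement.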

[0038] The invention according to claim 10 is a video display method for displaying a video and representative images of the video selected by the method of any one of claims 1 to 9, characterized in that the video is reproduced starting from a frame designated on a video formed by joining parts of the video near the positions of the representative images (hereinafter referred to as a summary video). This has the effect that the user can designate the part to be viewed while taking sound and the movement of subjects into account.

[0039] The invention according to claim 11 is a video display method characterized in that, when reproduction of a video formed by joining parts of the video near the positions of images extracted from the video (hereinafter referred to as a summary video) is interrupted, a plurality of representative images, including a representative image representing the content of the frame image at the interrupted position, are displayed as a list. This has the effect of intensively displaying information on the part the user wants to view.

[0040] The invention according to claim 12 is the video display method according to claim 10 or claim 11, wherein, when a part of the head of a summary video is included in a part of the tail of the immediately preceding summary video, the frame following the last frame of that summary video that is included in the tail of the immediately preceding summary video is taken as the head of that summary video. This method avoids reproducing again, immediately afterwards, a part of the summary video that has just been reproduced.

[0041] The invention according to claim 13 is a video summarizing system comprising: a plurality of video summarizing means for extracting summary information of a video by grouping a plurality of scenes, formed by dividing a captured video according to a predetermined criterion, into a plurality of time-series groups, and/or a plurality of video summarizing means for extracting summary information of the video by selecting predetermined scenes from the plurality of scenes; summary information selecting means for selecting the summary information extracted by one or more of the video summarizing means; and summary information display means. This has the effect that the user can freely choose the criterion by which the displayed information is summarized.

[0042] Hereinafter, embodiments of the present invention will be described with reference to the drawings. (First Embodiment) FIG. 1 is a block diagram of the first embodiment, showing a video summarizing system formed by combining a video summarizing device, which extracts summary information from a video on the basis of a plurality of video summarization criteria, with a video display device, which displays the summary information of the video selected by that device and the video itself.

[0043] In FIG. 1, 101 and 102 are video output devices: 101 is a video disk device and 102 is a VTR. 103 is a video summarizing device that summarizes a video while processing the video signal from the video disk device 101 or the VTR 102. It consists of image capturing means 104 for capturing frame images, scene change detecting means 105 for detecting scene changes, time determination processing means 106 for detecting scenes that continue for a certain time or longer, group generation means 107 for grouping the representative images of scenes, group addition means 108 for correcting the grouping result on the basis of the outputs of the time determination processing means 106 and the group generation means 107, image similarity calculation means 109 for calculating the similarity between representative images of scenes, and image reference processing means 110 for detecting scenes whose similarity, as obtained by the image similarity calculation means 109, is equal to or greater than a threshold. 111 is a control device that controls the video disk device 101 and the VTR 102. 112 is a video compression device that compresses the video. 113 is a file server that stores the video data compressed by the video compression device 112, the data on the scenes and groups detected by the video summarizing device 103, and their representative images. 114 is a video display device that displays video using the data stored in the file server 113. It consists of summary video reproducing means 115, which reproduces, one after another and for a fixed time each, the video near the positions of the images extracted by the video summarizing device; user interface means 116 for controlling reproduction of the video; and video reproducing means 117 for reproducing the video from a designated position onward.

[0044] The video summarizing device 103 of the video summarizing system can be realized by, for example, a combination of computer hardware and software. In the video display device 114, the summary video reproducing means 115 and the video reproducing means 117 can be realized on a computer, and the user interface means 116 can be realized by a combination of a monitor such as a CRT and computer hardware and software.

[0045] The video summarizing device 103 according to the present embodiment has four video summarization criteria. The first criterion takes the first frame of every scene in the video as a representative image; it is referred to below as the all-scene display mode. The second criterion takes the first frame of each scene whose duration is equal to or longer than a fixed value as a representative image; it is referred to below as the time determination mode. The third criterion groups the scenes other than those selected in the time determination mode by the color difference correlation value method, and takes the first frames of the scenes selected in the time determination mode and of the grouped scenes as representative images; it is referred to below as the hybrid mode. The fourth criterion applies to video, such as a particular news program, for which it is known in advance that a digest of the video is located at, for example, the beginning of the program. It calculates the similarity between the representative images of the part containing the digest and the first frames of the other scenes by comparing the RGB components of the pixels in a plurality of image regions, and takes the first frame of each scene whose similarity is equal to or greater than a predetermined threshold as a representative image; it is referred to below as the image reference mode.

[0046] The operation of the video summarizing system configured as described above will be described with reference to the flowchart shown in FIG. 2.

[0047] In step 201, the control device 111 in FIG. 1 controls the video disk device 101 and the VTR 102 to start reproducing the video and, at the same time, starts compression of the video in the video compression device 112.

[0048] In step 202, the control device 111 determines whether the video has ended. If it has ended, the process proceeds to step 212; otherwise, it proceeds to step 203.

[0049] Steps 203 to 211 are processes performed in the video summarizing device 103.

[0050] In step 203, the image capturing means 104 captures the frame image being reproduced.

[0051] In step 204, the scene change detecting means 105 processes the frame image captured by the image capturing means 104 and determines, using the common color ratio method described in the "Prior Art" section, whether a scene change has occurred. With the common color ratio method, however, a scene change is detected only after two frame images of the next scene have been processed, so a determination that "a scene change has occurred" means that "the previously captured frame image is the cut frame".

[0052] FIG. 3 is an explanatory diagram showing how the scene change detecting means 105 detects a scene change, and showing the previous scene. In FIG. 3, a is the frame image captured in the current step 203, b is the frame image captured in step 203 one iteration earlier, c is the frame image captured two iterations earlier, and d is the frame image captured three iterations earlier. In this case, the scene change occurs between b and c, but it is detected only after a has been captured. The scene immediately before the detected scene change is called the previous scene.

[0053] The scene change may instead be detected using a video change model method or the like. Alternatively, an operator may judge the scene changes in advance and enter the frame numbers at which they occur.

[0054] If the scene change detecting means 105 determines in step 204 that a scene change has occurred, the process proceeds to step 205; otherwise, it returns to step 202.

[0055] In step 205, the representative image and first frame of a scene in the all-scene display mode are determined. Specifically, the image captured in step 204 and the frame number at which the scene change occurred are stored in the file server 113 as the representative image and first frame of the scene in the all-scene display mode. The image may be reduced in size before it is stored in the file server 113.

[0056] In step 206, the representative image and first frame of a scene in the time determination mode are determined. Specifically, the duration of the scene immediately before the scene change detected in step 204 (hereinafter, the previous scene) is calculated, and the time determination processing means 106 detects previous scenes whose duration is 8 seconds or longer. The representative image of each detected previous scene is regarded as a representative image of the time determination mode and stored in the file server 113. The first frame number of each previous scene detected by the time determination processing means 106 is also stored in the file server 113 as the first frame number of a scene to be displayed in the time determination mode.

[0057] In the present embodiment, previous scenes of 8 seconds or longer are detected, but the threshold need not be 8 seconds.
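The time determination of step 206 amounts to a duration filter over scene boundaries. The sketch below works offline on a list of scene start frames rather than frame by frame as in the flowchart. The frame rate `fps` is an assumption (the text gives durations only in seconds), and the name `long_scenes` is hypothetical.

```python
def long_scenes(scene_starts, total_frames, fps=30.0, min_seconds=8.0):
    """Return the first frame numbers of scenes lasting at least `min_seconds`.

    scene_starts: first frame number of every scene, in time order.
    total_frames: frame count of the whole video (end of the last scene).
    fps:          assumed frame rate; not specified in the text.
    """
    bounds = list(scene_starts[1:]) + [total_frames]
    return [start for start, end in zip(scene_starts, bounds)
            if (end - start) / fps >= min_seconds]


# Scene lengths: 10 s, 2 s, 18 s, and about 3.3 s at 30 frames per second.
print(long_scenes([0, 300, 360, 900], total_frames=1000))
```

Making `min_seconds` a parameter reflects the remark that the threshold need not be 8 seconds.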

[0058] In steps 207 to 209, the representative images and first frames of scenes in the hybrid mode are determined.

[0059] In step 207, the group generation means 107 retrieves from the file server 113 the representative image of the scene immediately after the scene change detected in step 204 (hereinafter, the next scene) and the representative image of the previous scene, and obtains the color difference histogram correlation value between them. If the color difference histogram correlation value is equal to or greater than a threshold, the representative image of the next scene is regarded as a representative image of the hybrid mode and stored in the file server 113, and the first frame number of the next scene is stored in the file server 113 as the first frame number of a scene to be displayed in the hybrid mode.

[0060] Step 208 is a branch. If the duration of the previous scene is 8 seconds or longer, the process proceeds to step 209; otherwise, it proceeds to step 210.

[0061] In step 209, the group addition means 108 regards the representative image of the previous scene, whose duration is 8 seconds or longer, and the representative image of the next scene as representative images of the hybrid mode and stores them in the file server 113. It also stores the first frame numbers of that previous scene and of the next scene in the file server 113 as first frame numbers of scenes to be displayed in the hybrid mode.
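Steps 207 to 209 can be summarized in a short offline sketch. It follows the conditions exactly as the text states them, and it takes the color difference histogram correlation values as precomputed inputs because their calculation is described elsewhere in the document. The function name and all parameter names are hypothetical.

```python
def hybrid_representatives(starts, lengths_sec, corr_to_prev,
                           corr_threshold, min_seconds=8.0):
    """Return first frame numbers of the scenes kept in the hybrid mode.

    starts:       first frame number of each scene, in time order.
    lengths_sec:  duration of each scene in seconds.
    corr_to_prev: corr_to_prev[i] is the correlation value between scene i
                  and scene i - 1 (entry 0 is unused).
    """
    selected = set()
    for nxt in range(1, len(starts)):
        prev = nxt - 1
        # Step 207: keep the next scene when the correlation value is at or
        # above the threshold, as stated in the text.
        if corr_to_prev[nxt] >= corr_threshold:
            selected.add(starts[nxt])
        # Steps 208-209: a long previous scene keeps itself and the next scene.
        if lengths_sec[prev] >= min_seconds:
            selected.add(starts[prev])
            selected.add(starts[nxt])
    return sorted(selected)


print(hybrid_representatives(
    starts=[0, 300, 360, 450],
    lengths_sec=[10, 2, 3, 18],
    corr_to_prev=[None, 0.1, 0.2, 0.9],
    corr_threshold=0.5,
))
```

A set is used because the same scene can be selected both in step 207 and in step 209.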

[0062] In steps 210 and 211, the representative images and first frames of scenes in the image reference mode are determined.

[0063] In step 210, the image similarity calculation means 109 calculates the similarity S(M, N) between the representative image I_N of the next scene, obtained by the scene change detecting means 105 in step 204, and the representative images I_M of the first 20 scenes of the video (M is a natural number from 1 to 20). Various methods of calculating the similarity are conceivable; FIG. 4 introduces one example. FIG. 4 is an explanatory diagram showing how, in the image reference mode, the frame is divided into blocks when calculating the color similarity between the representative images of the part containing the digest and the other scenes. That is, the representative image I_M is divided into 4 × 4 blocks as shown in FIG. 4, and the similarity is calculated by the following equation using the RGB components R_i(M), G_i(M), and B_i(M) of the average color of the pixels in the i-th block.

[0064]

(Equation 1)

[0065] In Equation 1, |x| denotes the absolute value of x. The color difference histogram correlation value described in the prior art example may also be used as the similarity.

[0066] In the above description, the representative images of the first 20 scenes of the video are used as the images whose similarity to the representative image of the next scene is calculated, but this may be changed according to the structure of the program. Specifically, on an instruction from the user interface means 116 of the video display device 114, the image similarity calculation means 109 may take the images to be compared with the representative image I_N of the next scene from a part of the video other than its beginning. For example, when the digest is at the end of the video, several scenes at the end of the video are used.

[0067] In step 211, the image reference processing means 110 checks whether the following condition holds for any value of M from 1 to 20.

[0068]

(Equation 2)

[0069] When Equation 2 holds, the first frame number of the next scene is stored in the file server 113 as the first frame number of a scene to be displayed in the image reference mode, and the representative image of the next scene is regarded as a representative image of the image reference mode. In Equation 2, θ_SIM is a preset threshold.
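Steps 210 and 211 can be illustrated with the following sketch. Equation 1 and Equation 2 are not reproduced in this text, so both formulas here are assumptions: the similarity is taken as one minus the normalized sum of absolute differences between block-average RGB components, and the test is taken as S(M, N) ≥ θ_SIM for at least one M. Only the 4 × 4 block division, the use of block-average RGB components and absolute values, and the threshold θ_SIM come from the description. Images are plain nested lists of (R, G, B) tuples with components in 0 to 255.

```python
def block_means(image, grid=4):
    """Average (R, G, B) of each block in a grid x grid division.

    image: list of rows, each row a list of (r, g, b) tuples; the image
    must be at least `grid` pixels wide and high.
    """
    height, width = len(image), len(image[0])
    means = []
    for by in range(grid):
        for bx in range(grid):
            y0, y1 = by * height // grid, (by + 1) * height // grid
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(pixels)
            means.append(tuple(sum(p[c] for p in pixels) / n for c in range(3)))
    return means


def similarity(image_m, image_n, grid=4):
    """Hypothetical stand-in for Equation 1: 1.0 means identical block colors."""
    means_m = block_means(image_m, grid)
    means_n = block_means(image_n, grid)
    diff = sum(abs(a - b)
               for pm, pn in zip(means_m, means_n)
               for a, b in zip(pm, pn))
    max_diff = 255 * 3 * grid * grid  # largest possible sum of differences
    return 1.0 - diff / max_diff


def is_digest_scene(digest_images, next_image, theta_sim):
    """Assumed form of Equation 2: some S(M, N) reaches the threshold."""
    return any(similarity(img, next_image) >= theta_sim for img in digest_images)


flat = [[(10, 20, 30)] * 8 for _ in range(8)]
black = [[(0, 0, 0)] * 8 for _ in range(8)]
print(similarity(flat, flat), is_digest_scene([black], flat, theta_sim=0.99))
```

Comparing block averages rather than individual pixels makes the measure tolerant of small movements inside a block, which fits the purpose of matching a digest shot against the full scene it was cut from.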

[0070] After step 211, the process returns to step 202. In step 212, reproduction and compression of the video are stopped.

[0071] Through the steps so far, the file server 113 holds the summary information, that is, the representative images and first frame numbers, extracted under each of the video summarization criteria: the all-scene display mode, the time determination mode, the hybrid mode, and the image reference mode. The following steps 213 to 215 are processes performed in the video display device 114.

[0072] In step 213, the user selects, via the user interface means 116, the mode whose summary video is to be displayed. As described above, the choices are the time determination mode, the hybrid mode, the image reference mode, and the all-scene display mode. The user interface means 116 outputs to the file server 113 a signal instructing it to send the representative images and first-frame information of the scenes of the selected mode to the summary video reproducing means 115.

[0073] In step 214, the first frame numbers and representative image information of the scenes to be displayed in the mode selected in step 213 are sent from the file server 113 to the summary video reproducing means 115, which displays the summary video on the basis of this information. The case where the user has selected the hybrid mode is described here. FIG. 5 is an explanatory diagram showing how the summary video is created in the first embodiment of the present invention. Of the three rows of video information, the top row is the video itself, the middle row shows the 5-second summary segments extracted from the video in the top row, and the bottom row shows that only the summary segments of the middle row are reproduced continuously. That is, as shown in the bottom row of FIG. 5, the 5 seconds of video data following the head of the scene of each hybrid-mode representative image are retrieved from the file server 113 and reproduced one after another at standard speed.

[0074] The video data taken from the head of each representative image's scene need not be 5 seconds long. Instead of the 5 seconds following the head of the scene, the video around the position of each representative image may be reproduced one after another, for example "5 seconds starting 2 seconds before each representative image". The video may be reproduced in fast-forward rather than at standard speed, and the fast-forward speed may be varied according to the video content. The 5 seconds following the head of a representative image's scene may also span more than one scene.
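The construction of the summary video in step 214 can be sketched as a list of frame ranges, one per representative image. The frame rate `fps` and the names `summary_segments` and `lead_seconds` are assumptions introduced for illustration. `lead_seconds` models the variation in which playback starts a little before each representative image.

```python
def summary_segments(rep_starts, total_frames, fps=30.0,
                     seconds=5.0, lead_seconds=0.0):
    """Return half-open frame ranges (first, last) that make up the summary.

    rep_starts:   first frame number of each selected scene, in time order.
    seconds:      length of each summary segment (5 s in the embodiment).
    lead_seconds: how far before the representative frame a segment starts.
    """
    length = round(seconds * fps)
    lead = round(lead_seconds * fps)
    segments = []
    for start in rep_starts:
        first = max(0, start - lead)
        last = min(total_frames, first + length)  # clip at the end of the video
        segments.append((first, last))
    return segments


print(summary_segments([0, 300, 950], total_frames=1000))
print(summary_segments([300], total_frames=1000, lead_seconds=2.0))
```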

[0075] Using the user interface means 116, the user looks for the part to be viewed while the summary video is displayed. Possible control buttons for the summary video include play, reverse play, pause, fast-forward, rewind, step forward one frame, and step back one frame.

[0076] In step 215, the user designates the head of the part to be viewed while watching the summary video. FIG. 6 is an explanatory diagram showing the reproduced part of the summary video. That is, if the user designates a scene through the user interface means 116 after watching 2 seconds from the head of the i-th scene's representative image, the video reproducing means 117 starts reproduction at "the point 2 seconds after the head of the representative image of the i-th scene".
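Designating a frame on the summary video requires mapping a position in the concatenated summary back to a frame of the original video. A minimal sketch, assuming the summary is described by a list of half-open frame ranges over the source video (the name `source_frame` and this representation are hypothetical):

```python
def source_frame(segments, summary_offset):
    """Map a frame offset inside the summary video to a source frame number.

    segments:       half-open (first, last) frame ranges, in playback order.
    summary_offset: number of summary frames played before the designation.
    """
    for first, last in segments:
        length = last - first
        if summary_offset < length:
            return first + summary_offset
        summary_offset -= length
    raise ValueError("offset is beyond the end of the summary video")


# Two 5-second segments at an assumed 30 frames per second; the user presses
# the button 2 seconds (60 frames) into the second segment.
print(source_frame([(0, 150), (300, 450)], summary_offset=150 + 60))
```

Playback of the full video would then start from the returned frame, which corresponds to the point 2 seconds after the head of the designated scene.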

[0077] FIG. 7 shows an example of the screen of the user interface means 116 for designating the part to be viewed. In FIG. 7, a is the area in which the summary video is displayed, b is the set of buttons that control the summary video, and c is the button that designates the part to be viewed. At the moment the button c is pressed, the image displayed in a is taken as the head of the part to be viewed. The user can therefore pause the summary video at the desired part and then press the button c to designate the head of that part reliably.

[0078] In the above description, the representative image of the first scene of each group is used as the hybrid-mode representative image, but the representative image may be chosen in other ways. For example, the representative image of the longest scene in each group may be regarded as the hybrid-mode representative image. Although four selectable modes are provided in the present embodiment, the number need not be four. The order in which representative images are determined is also arbitrary; they need not be determined in the order all-scene display mode, time determination mode, hybrid mode, image reference mode.

[0079] In step 214, when the summary video is paused, the representative images before and after the representative image immediately preceding the paused position may be displayed as a list. In step 215, one of the displayed representative images may then be designated as the head of the part to be viewed. The number of representative images displayed at once may be, for example, 24.

[0080] In step 214, the 5 seconds of video data following the head of each representative image's scene are retrieved, but the first frame of a representative image's scene may fall within the 5 seconds following the head of the immediately preceding representative image's scene. FIG. 8 is an explanatory diagram showing the reproduced part of the summary video in this situation. If the retrieved video data were simply reproduced one after another as described above, part A in FIG. 8 would be reproduced twice. Therefore, as illustrated, when the first frame of a representative image's scene falls within the 5 seconds following the head of the immediately preceding representative image's scene, reproduction may start not from the head of that scene but from immediately after the part reproduced just before (for example, part A in FIG. 8).
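The overlap handling described for FIG. 8 can be sketched as a pass over the summary segments that moves each segment's start past whatever was already played. The segment representation (half-open frame ranges sorted by start) and the name `trim_overlaps` are assumptions. Dropping a segment that lies entirely inside the previous one is also an assumption, since the text does not cover that case.

```python
def trim_overlaps(segments):
    """Remove frames that would be played twice in consecutive segments.

    segments: half-open (first, last) frame ranges sorted by `first`.
    """
    trimmed = []
    prev_end = 0
    for first, last in segments:
        first = max(first, prev_end)  # start right after the part just played
        if first < last:  # skip segments with nothing new left to play
            trimmed.append((first, last))
            prev_end = last
    return trimmed


# The second segment starts 3 seconds into the first one (30 frames/s assumed).
print(trim_overlaps([(0, 150), (90, 240), (600, 750)]))
```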

[0081] (Second Embodiment) FIG. 9 is a block diagram of the second embodiment, showing a video summarizing system formed by combining a video summarizing device, which extracts summary information from a video on the basis of a plurality of video summarization criteria and extracts no more than the number of representative images specified by the user, with a video display device, which displays the summary information of the video selected by that device and the video itself.

[0082] In FIG. 9, 801 and 802 are video output devices: 801 is a video disk device and 802 is a VTR. 803 is a video summarizing device that, while processing the video signal from the video disk device 801 or the VTR 802, summarizes the video by extracting no more than the number of images specified by the user. It consists of image capturing means 804 for capturing frame images, scene change detecting means 805 for detecting scene changes, time determination processing means 806 for detecting scenes that continue for a certain time or longer, group generation means 807 for grouping the representative images of scenes, group addition means 808 for correcting the grouping result on the basis of the outputs of the time determination processing means 806 and the group generation means 807, image similarity calculation means 809 for calculating the similarity between representative images of scenes, image reference processing means 810 for detecting scenes whose similarity, as obtained by the image similarity calculation means 809, is equal to or greater than a threshold, and image determination means 811 for determining the representative images on the basis of the outputs of the scene change detecting means 805, the time determination processing means 806, the group addition means 808, and the image reference processing means 810. 812 is a control device that controls the video disk device 801 and the VTR 802. 813 is a video compression device that compresses the video. 814 is a file server that stores the representative images determined by the video summarizing device 803 and their frame numbers. 815 is a video display device that displays the data, images, and video stored in the file server 814.

【００８３】映像要約システムの映像要約装置８０３
は、例えば、コンピュータ上で実現することができる。
また、映像表示装置８１５は、例えば、コンピュータ
と、ＣＲＴ等とモニターの組み合わせによる実現するこ
とができる。The video summarizing device 803 of the video summarizing system can be realized, for example, on a computer. The video display device 815 can be realized, for example, by a combination of a computer and a monitor such as a CRT.

【００８４】本実施の形態に係る映像要約装置は４つの
映像要約基準を備えている。それらは第１の実施の形態
において説明したものと同じであるのでその説明は省略
する。The video summarizing device according to the present embodiment has four video summarization criteria. They are the same as those described in the first embodiment, so their description is omitted.

【００８５】以上のように構成された映像要約システム
について、図１０に示すフローチャートを用いてその動
作を説明する。The operation of the video summarizing system configured as described above will be described with reference to the flowchart shown in FIG.

【００８６】手順９０１では、手順２０１と同様に、図
９における制御装置８１２がビデオディスク装置８０１
とＶＴＲ８０２を制御して、映像の再生を開始し、同時
に映像圧縮装置８１３での映像の圧縮を開始する。In step 901, as in step 201, the control device 812 in FIG. 9 controls the video disk device 801 and the VTR 802 to start video playback and, at the same time, starts video compression in the video compression device 813.

【００８７】手順９０２では、手順２０２と同様に、映
像が終了したかどうか判定する。映像が終了した場合に
は手順９１２に進み、そうでなければ、手順９０３に進
む。In step 902, as in step 202, it is determined whether or not the video has been completed. If the video has ended, the process proceeds to step 912; otherwise, the process proceeds to step 903.

【００８８】手順９０３では、手順２０３と同様に、画
像取り込み手段８０４が再生中のフレーム画像を取り込
む。In step 903, similarly to step 203, the image capturing means 804 captures the frame image being reproduced.

【００８９】手順９０４では、手順２０４と同様に、シ
ーンチェンジ検出手段８０５が、画像取り込み手段８０
４で取り込まれたフレーム画像を処理して、共通色比率
法等を用いてシーンチェンジが発生したかどうかを判定
する。In step 904, as in step 204, the scene change detection means 805 processes the frame image captured by the image capturing means 804 and determines whether a scene change has occurred, using the common color ratio method or a similar method.

【００９０】手順９０４で「シーンチェンジが発生し
た」と判定された場合には手順９０５に進み、そうでな
ければ手順９０２に戻る。If it is determined in step 904 that "a scene change has occurred", the flow advances to step 905; otherwise, the flow returns to step 902.

【００９１】手順９０５では全シーンモードのシーンの
代表画面および先頭フレームを決定する。すなわち、手
順９０４で取り込まれた画像およびその画像のフレーム
番号をファイルサーバ８１４に保存する。保存した画像
は、シーンの代表画像として用いる。また、シーンの代
表画像を全シーン表示モードの代表画像とみなす。In step 905, the representative image and the first frame of a scene in the all scene mode are determined. That is, the image captured in step 904 and its frame number are stored in the file server 814. The stored image is used as the representative image of the scene, and the representative image of each scene is regarded as a representative image of the all scene display mode.

【００９２】手順９０６では時間判定モードのシーンの
代表画面および先頭フレームを決定する。すなわち、手
順２０６と同様に、手順９０４で検出したシーンチェン
ジの直前のシーン（以下、前シーンという）の時間長を
計算してから、時間判定処理手段８０６で、時間長が８
秒以上になる前シーンを検出し、検出した前シーンの代
表画像を時間判定モードの代表画像とみなす。In step 906, the representative image and the first frame of a scene in the time determination mode are determined. That is, as in step 206, the time length of the scene immediately before the scene change detected in step 904 (hereinafter, the previous scene) is calculated, the time determination processing means 806 then detects a previous scene whose time length is 8 seconds or longer, and the representative image of the detected previous scene is regarded as a representative image of the time determination mode.

【００９３】手順９０７から手順９０９まででは、ハイ
ブリッドモードのシーンの代表画面および先頭フレーム
を決定する。In steps 907 to 909, the representative screen and the first frame of the scene in the hybrid mode are determined.

【００９４】手順９０７では、手順２０７と同様に、グ
ループ生成手段８０７が、手順９０４で検出したシーン
チェンジ直後のシーン（以下、次シーンという。）と前
シーンの代表画像間の色差ヒストグラム相関値を求め、
色差ヒストグラム相関値がしきい値以上になる場合に、
次シーンの代表画像をハイブリッドモードの代表画像と
みなしてファイルサーバー８１４に保存する。なお、色
差ヒストグラムを用いずに、共通画素法を用いて時系列
のシーンのグループ化を実行し、グループの先頭シーン
の代表画像をハイブリッドモードの代表画像とみなして
もよい。共通画素法については、本実施の形態に係る映
像要約装置の動作を説明した後に説明する。In step 907, as in step 207, the group generation means 807 obtains the color difference histogram correlation value between the representative images of the scene immediately after the scene change detected in step 904 (hereinafter, the next scene) and the previous scene. When the color difference histogram correlation value is at or above a threshold, the representative image of the next scene is regarded as a representative image of the hybrid mode and stored in the file server 814. Instead of the color difference histogram, the common pixel method may be used to group the time-series scenes, with the representative image of the first scene of each group regarded as a representative image of the hybrid mode. The common pixel method is described after the operation of the video summarizing device according to the present embodiment.

【００９５】手順９０８では、手順２０８と同様に分岐
処理を実行する。前シーンの時間長が８秒以上になる場
合には、手順９０９に進み、そうでなければ手順９１０
に進む。In step 908, a branch is executed as in step 208. If the time length of the previous scene is 8 seconds or longer, the process proceeds to step 909; otherwise, it proceeds to step 910.

【００９６】手順９０９では、手順２０９と同様に、グ
ループ追加手段８０８が、前シーンの代表画像と次シー
ンの代表画像をハイブリッドモードの代表画像とみなし
てファイルサーバー８１４に保存する。In step 909, as in step 209, the group adding means 808 regards the representative image of the previous scene and the representative image of the next scene as a representative image in the hybrid mode, and stores the representative image in the file server 814.
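The handling of one scene change in steps 906 to 909 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Scene` class, its fields, the `on_scene_change` function, and the `correlation` callable are all hypothetical names, and the file server is replaced by two in-memory sets of frame numbers. Only the 8-second threshold and the order of the tests come from the text.

```python
from dataclasses import dataclass

LONG_SCENE_SECONDS = 8.0  # duration threshold quoted in steps 906 and 908


@dataclass
class Scene:
    first_frame: int   # frame number of the scene's representative image
    duration: float    # scene length in seconds (hypothetical field)


def on_scene_change(prev, nxt, correlation, threshold, time_mode, hybrid_mode):
    """Update the per-mode sets of representative frame numbers at one scene change.

    `correlation` is a hypothetical callable standing in for the color
    difference histogram correlation value between two representative images.
    """
    long_prev = prev.duration >= LONG_SCENE_SECONDS
    if long_prev:
        # Step 906: a long previous scene represents the time determination mode.
        time_mode.add(prev.first_frame)
    if correlation(prev, nxt) >= threshold:
        # Step 907: threshold test exactly as worded in the text.
        hybrid_mode.add(nxt.first_frame)
    if long_prev:
        # Steps 908-909: keep both the long scene and the scene right after it.
        hybrid_mode.update((prev.first_frame, nxt.first_frame))
```

Sets are used here so that a scene added by step 907 and again by step 909 is stored only once; the text does not say how the file server treats such repeats.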

【００９７】手順９１０から手順９１１まででは、画像
基準モードのシーンの代表画面および先頭フレームを決
定する。In steps 910 to 911, the representative image and the first frame of a scene in the image reference mode are determined.

【００９８】手順９１０では、手順２１０と同様に、次
シーンの代表画像Ｉ_Nと映像の先頭の２０シーンの代表
画像Ｉ_M（Ｍは１から２０までの自然数）との間の類似
度S(M,N)、を画像類似度計算手段８０９が計算する。In step 910, as in step 210, the image similarity calculation means 809 calculates the similarity S(M, N) between the representative image I_N of the next scene and the representative image I_M of each of the first 20 scenes of the video (M is a natural number from 1 to 20).

【００９９】手順９１１では、手順２１１と同様に、画
像基準処理手段８１０で、Ｍが１から２０までのいずれ
かの値をとるときに、（２）式が成り立つかどうか調
べ、（２）式が成立するときに、次シーンの代表画像を
画像基準モードの代表画像とみなす。手順９１１の終了
後、手順９０２に戻る。In step 911, as in step 211, the image reference processing means 810 checks whether expression (2) holds for any value of M from 1 to 20. When expression (2) holds, the representative image of the next scene is regarded as a representative image of the image reference mode. After step 911 ends, the process returns to step 902.
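The test in steps 910 and 911 can be sketched as follows. Expression (2) is not reproduced in this part of the description, so the sketch assumes it is a simple threshold test on S(M, N), in line with the description of the image reference processing means 810 as detecting scenes whose similarity is at or above a threshold. The function name, the `similarity` callable, and the `threshold` parameter are hypothetical.

```python
HEAD_SCENES = 20  # the text compares against the first 20 scenes of the video


def is_image_reference_hit(next_image, head_images, similarity, threshold):
    """Return True if `next_image` resembles any of the leading representative images.

    `similarity` is a hypothetical stand-in for S(M, N); `threshold` is an
    assumed stand-in for the bound in expression (2).
    """
    return any(
        similarity(head, next_image) >= threshold
        for head in head_images[:HEAD_SCENES]
    )
```

Only the first 20 entries of `head_images` are consulted, so a longer list of representative images can be passed without slicing it first.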

【０１００】手順９１２では、手順２１２と同様に、映
像の再生と映像の圧縮を中止する。ここまでの手順でフ
ァイルサーバー８１４には、複数の映像要約基準、すな
わち全シーンモード、時間判定モード、ハイブリッドモ
ード、画像基準モードによりそれぞれ抜き出された要約
情報、すなわち代表画像と先頭フレーム番号が保存され
た。以下の手順９１３から手順９１８までは映像表示装
置８１５において行われる処理である。In step 912, as in step 212, video playback and video compression are stopped. At this point, the file server 814 holds the summary information, that is, the representative images and first frame numbers, extracted under each of the video summarization criteria: the all scene mode, the time determination mode, the hybrid mode, and the image reference mode. The following steps 913 to 918 are performed in the video display device 815.

【０１０１】手順９１３から手順９１６まででは、映像
の要約情報である代表画像を所定の枚数にまで取捨選択
する。以下では、映像要約装置で抜き出す代表画像の枚
数が２４枚に設定されている場合について述べる。In steps 913 to 916, the representative images, which are the summary information of the video, are narrowed down to a predetermined number. The following describes the case where the number of representative images extracted by the video summarizing device is set to 24.

【０１０２】手順９１３では、画像判定手段８１１が、
あらかじめ決めてある優先順位に基づいて、どのモード
による代表画像を抜き出すかを決定する。以下では、優
先順位の最も高いモードを画像基準モードとし、２番目
に高いモードを時間判定モードとし、３番目に高いモー
ドをハイブリッドモードとし、４番目に高いモードを全
シーン表示モードとした場合について述べる。なお、こ
の優先順位は、使用者が画像判定手段８１１に命令を送
ることのできるインターフェース手段を設けて自由に設
定する態様をとることができる。In step 913, the image determination means 811 decides, based on a predetermined priority order, which mode's representative images to extract. The following describes the case where the image reference mode has the highest priority, the time determination mode the second highest, the hybrid mode the third highest, and the all scene display mode the fourth highest. This priority order may also be set freely by the user, by providing interface means through which the user can send commands to the image determination means 811.

【０１０３】まずモードを優先順位の最も高い画像基準
モードにより代表画像を決定する。画像基準モードによ
り代表画像が決定された後に手順９１６の分岐処理によ
りこの手順９１３に戻って来た場合には、前回決定した
モードよりも優先順位が一つ低いモードにより代表画像
を決定する。First, a representative image is determined by the image reference mode having the highest priority. When the procedure returns to the procedure 913 by the branching process of the procedure 916 after the representative image is determined in the image reference mode, the representative image is determined in the mode having one lower priority than the previously determined mode.

【０１０４】手順９１４では、画像選択の優先順位にも
とづいて、手順９１３で決定したモードの代表画像に順
番を付ける。例えば、手順９１３で決定した代表画像の
中で、時間長の長いシーンの代表画像から順番に若い番
号をつけていく。同じモードの次の代表画像までの長さ
が長い代表画像から順番に若い番号を付けてもよい。In step 914, the representative images of the mode determined in step 913 are ranked according to the image selection priority. For example, among the representative images determined in step 913, lower numbers are assigned first to the representative images of longer scenes. Alternatively, lower numbers may be assigned first to representative images with a longer interval to the next representative image of the same mode.

【０１０５】手順９１５では、若い番号から順番に、映
像要約装置で抜き出す代表画像とみなしていく。ただ
し、映像要約装置で抜き出す代表画像の枚数が２４枚を
越えたら、作業を中断する。In step 915, the images are regarded as representative images extracted by the video summarizing apparatus in ascending order of numbers. However, when the number of representative images extracted by the video summarizing apparatus exceeds 24, the operation is interrupted.

【０１０６】手順９１６は分岐処理である。すでに映像
要約装置で抜き出すことに決定している代表画像の枚数
が２４枚未満のときは、手順９１３に戻る。そうでなけ
れば、手順９１７に進む。Step 916 is a branching process. If the number of representative images already determined to be extracted by the video summarizing apparatus is less than 24, the procedure returns to step 913. Otherwise, go to step 917.

【０１０７】なお、以上の説明では、映像要約装置で抜
き出す代表画像の枚数を２４枚に設定したが、必ずしも
２４枚でなくてもよい。また、全てのモードによっても
代表画像の枚数が２４枚未満のときは手順９１６で無限
ループに入るおそれがある。したがって、全てのモード
を選択した後もシーンの数が２４枚未満になるときに
は、その時点での代表画像の枚数をシーンの数と同じに
なるようにしてもよい。In the above description, the number of representative images extracted by the video summarizing apparatus is set to 24, but the number is not necessarily limited to 24. If the number of representative images is less than 24 in all modes, the process may enter an infinite loop in step 916. Therefore, when the number of scenes becomes less than 24 even after all the modes are selected, the number of representative images at that time may be made equal to the number of scenes.
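The selection loop of steps 913 to 916 can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the dictionary layout of `candidates`, and the mode names are hypothetical, the count is capped at the limit (read here as an upper bound, since the later text speaks of listing no more than the set number), and images already chosen under a higher-priority mode are skipped, which the text does not state. Iterating over the finite priority list is what avoids the endless loop mentioned above when all modes together yield fewer than 24 images.

```python
def select_representatives(candidates, priority, limit=24):
    """Pick up to `limit` representative frame numbers, mode by mode.

    candidates: mode name -> list of (first_frame, scene_seconds) pairs.
    priority:   mode names, highest priority first.
    """
    chosen = []
    for mode in priority:                        # step 913: next mode by priority
        ranked = sorted(candidates.get(mode, []),
                        key=lambda c: c[1], reverse=True)  # step 914: longest first
        for frame, _ in ranked:                  # step 915: adopt in ranked order
            if len(chosen) >= limit:
                break
            if frame not in chosen:              # assumption: skip repeats
                chosen.append(frame)
        if len(chosen) >= limit:                 # step 916: enough images, stop
            break
    return chosen
```

The result is returned in selection order; a display step would presumably re-sort it by frame number, but that ordering is not specified in the text.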

【０１０８】手順９１７では、使用者が見たい部分を効
率よく探せるように、システムが映像表示装置上に映像
の要約を表示する。例えば、手順９１５で決定した代表
画像を一覧表示する。In step 917, the system displays a summary of the video on the video display device so that the user can efficiently search for a desired portion. For example, a list of the representative images determined in step 915 is displayed.

【０１０９】手順９１８では、使用者が見たい部分を指
定する。例えば、マウスなどのポインティングデバイス
を用いて、見たい部分の代表画像を指定する。映像表示
装置がファイルサーバーから映像データを受け取り、指
定された部分から映像を再生する。In step 918, a part desired by the user is specified. For example, using a pointing device such as a mouse, a representative image of a desired portion is specified. The video display device receives the video data from the file server and reproduces the video from the specified portion.

【０１１０】なお、以上の説明では、手順９１７で代表
画像を一覧表示したが、実施の形態１の手順２１４で述
べたように、要約映像再生手段を用いて要約映像を表示
してもよい。また、代表画像を一覧表示する画像表示手
段を設けて、映像要約装置８０３の中に組み込んでもよ
い。この場合には、手順９１７でシステムが画像表示手
段を用いて映像の要約を表示する。In the above description, a list of representative images is displayed in step 917. However, as described in step 214 of the first embodiment, a summary video may be displayed using the summary video reproducing means. Further, an image display means for displaying a list of representative images may be provided and incorporated in the video summarizing apparatus 803. In this case, in step 917, the system displays the video summary using the image display means.

【０１１１】以上の説明では、設定された枚数以下の代
表画像のみを一覧表示しているが、時間判定モード、ハ
イブリッドモード、画像基準モードの各モードの代表画
像の一覧表示を選択できるようにしてもよい。In the above description, only the representative images up to the set number are listed. However, the user may instead be allowed to select a list display of the representative images of each of the time determination mode, the hybrid mode, and the image reference mode.

【０１１２】以下では、本実施の形態の映像要約システ
ムにおける手順９０７において時系列のシーンをグルー
プ化する際に採用することのできる共通画素法によるグ
ループ化について述べる。共通画素法は出願人が先に特
願平７−４６９７０号において開示したものである。In the following, grouping by the common pixel method which can be employed when grouping time-series scenes in step 907 in the video summarizing system of the present embodiment will be described. The common pixel method has been disclosed by the applicant in Japanese Patent Application No. 7-46970.

【０１１３】共通画素法は、シーンに共通する色に着目
して、砂浜で撮影したシーンが続く場合のような類似背
景のシーンまたは様々な人物のバストショット（人物の
胸から上が映っているシーン）が続く場合のような、類
似被写体のシーンが時系列に連続する場合を検出し、一
つのグループに統合する方法である。The common pixel method focuses on the colors common to scenes. It detects cases where scenes with similar backgrounds continue in time series, such as a run of scenes shot on a sandy beach, or where scenes with similar subjects continue, such as a run of bust shots of various people (scenes showing a person from the chest up), and merges such scenes into one group.

【０１１４】シーンは内容の最小単位である。従って、
１つのシーン内のフレーム画像は「同一人物が登場す
る」などの共通した特徴をもつ。そこで、各シーンの先
頭部分の動画像がシーンを代表するとみなし、この動画
像を代表時空間画像という。A scene is a minimum unit of contents. Therefore,
Frame images in one scene have common features such as "the same person appears". Therefore, the moving image at the head of each scene is regarded as representing a scene, and this moving image is referred to as a representative spatiotemporal image.

【０１１５】「色が共通する画素」を同一物体とみなす
と、「異なる動きをする同一色の物体」のシーンがグル
ープ化される問題がある。この問題点を解決するため
に、同一色の物体が、異なる２つのシーンで共に静止し
ている場合と、異なる２つのシーンで共に動いている場
合に限って、同一色の物体を同一物体とみなす。即ち、
「動きの有無と色が共通する画素」を同一物体とみな
す。また、同一グループ内のシーンが共通色比率条件、
すなわち、各シーンの代表時空間画像において、「グル
ープ内のシーンの代表時空間画像に共通して現れる同一
物体」の画素の総数を全画素数で正規化した値がしきい
値θ_SHOT以上になるという条件を満たすと仮定する。If "pixels having a common color" are simply regarded as the same object, scenes containing "objects of the same color that move differently" end up being grouped together. To solve this problem, objects of the same color are regarded as the same object only when they are stationary in both of two different scenes or moving in both of them. In other words, "pixels that share both color and the presence or absence of motion" are regarded as the same object. In addition, the scenes within one group are assumed to satisfy the common color ratio condition: in the representative spatiotemporal image of each scene, the total number of pixels of "the same objects that appear in common in the representative spatiotemporal images of the scenes in the group", normalized by the total number of pixels, is at or above the threshold θ_SHOT.
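The common color ratio condition can be sketched as follows. This is a simplified illustration, not the method of the cited application: each scene's representative spatiotemporal image is reduced to a non-empty list of per-pixel labels of the form (color, is_moving), and two pixels count as the same object when both parts of the label agree. The function name and this label representation are assumptions made for the example.

```python
def common_ratio_ok(scenes, theta_shot):
    """Check the common color ratio condition for a run of scenes.

    scenes: one non-empty list per scene of (color, is_moving) pixel labels.
    Returns True when, in every scene, the share of pixels whose label occurs
    in all scenes of the run is at least `theta_shot`.
    """
    # Labels (color plus presence of motion) that appear in every scene.
    common = set(scenes[0])
    for pixels in scenes[1:]:
        common &= set(pixels)
    # Normalize the common-pixel count by each scene's total pixel count.
    return all(
        sum(1 for p in pixels if p in common) / len(pixels) >= theta_shot
        for pixels in scenes
    )
```

Applied to four scenes laid out like the FIG. 11 example, the check passes for the first three scenes and for the last two, but fails for all four taken together.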

【０１１６】図１１は共通色比率条件の様子を示す説明
図である。図１１に示すように、シーンＳＣ₁〜ＳＣ₄の
代表時空間画像に共通して現れる物体がＡ（背景）の部
分のみであり、Ａの部分の画素数の占める割合がシーン
ＳＣ₄以外でしきい値θ_SHOT以上になるとき、シーンＳ
Ｃ₁〜ＳＣ₃は共通色比率条件を満足するが、シーンＳＣ
₁〜ＳＣ₄は共通色比率条件を満足しない。ＳＣ₃〜ＳＣ₄
が共通色比率条件を満足するかどうかは不定である。図
１１の例では、ＳＣ₃〜ＳＣ₄のＡ（背景）とＢ（長方形
の物体）の部分が「シーンＳＣ₃〜ＳＣ₄の代表時空間画
像に共通して現れる同一物体」となり、その画素数の占
める割合がシーンＳＣ₃、ＳＣ₄の両方において１００％
になるので、シーンＳＣ₃〜ＳＣ₄は共通色比率条件を満
足する。FIG. 11 is an explanatory diagram illustrating the common color ratio condition. As shown in FIG. 11, suppose that the only object appearing in common in the representative spatiotemporal images of scenes SC₁ to SC₄ is the part A (background), and that the proportion of pixels occupied by part A is at or above the threshold θ_SHOT in every scene except SC₄. Then scenes SC₁ to SC₃ satisfy the common color ratio condition, but scenes SC₁ to SC₄ do not. Whether SC₃ to SC₄ satisfy the condition cannot be decided from this alone. In the example of FIG. 11, parts A (background) and B (rectangular object) of SC₃ to SC₄ are "the same objects appearing in common in the representative spatiotemporal images of scenes SC₃ to SC₄", and the proportion of pixels they occupy is 100% in both SC₃ and SC₄, so scenes SC₃ to SC₄ satisfy the common color ratio condition.

【０１１７】このとき、共通色比率条件を満足するシー
ンを同一グループとみなすと、シーンＳＣ₁〜ＳＣ₃とシ
ーンＳＣ₃〜ＳＣ₄が同一グループとなるので、シーンＳ
Ｃ₁〜ＳＣ₄が同一グループとなるはずであるが、シーン
ＳＣ₁〜ＳＣ₄は共通色比率条件を満足せず矛盾が生じ
る。従って、共通色比率条件を満足しても、同一グルー
プ内のシーンとは限らない。If scenes satisfying the common color ratio condition were regarded as belonging to the same group, scenes SC₁ to SC₃ would form one group and scenes SC₃ to SC₄ would form one group, so scenes SC₁ to SC₄ should all belong to the same group. However, scenes SC₁ to SC₄ do not satisfy the common color ratio condition, which is a contradiction. Therefore, scenes that satisfy the common color ratio condition do not necessarily belong to the same group.

【０１１８】共通画素法では、同一グループ内で隣り合
うシーンの間の類似度が、異なるグループのシーンの間
の類似度に比べて大きい値になると仮定し、以下の手順
１〜２でグループの境界を求める。ただし、Ｍ＝１と
し、Ｌの初期値は１とする。In the common pixel method, it is assumed that the similarity between adjacent scenes in the same group has a larger value than the similarity between scenes in different groups. Find the boundary. However, M = 1 and the initial value of L is 1.

【０１１９】手順１では、シーンＳＣ_M〜ＳＣ_M+Lが共通
色比率条件を満足するかどうか判定する。満足する場合
には、Ｌに１を加えながら、共通色比率条件を満足しな
くなるまで判定を繰り返す。図１１の例では、Ｌ＝４に
なった時点で手順２に進む。In procedure 1, it is determined whether scenes SC_M to SC_{M+L} satisfy the common color ratio condition. If they do, the determination is repeated, adding 1 to L each time, until the condition is no longer satisfied. In the example of FIG. 11, the process proceeds to procedure 2 when L = 4.

【０１２０】手順２では、共通色比率条件を満足する各
シーンＳＣ_M〜ＳＣ_M+L-1に対して、次シーンとの類似度
を求め、類似度が最小になる部分をグループの境界とみ
なす。、図１２はグループの境界の決定方法を示す説明
図である。同図において、ＳＣ₂とＳＣ₃の類似度が他の
類似度、すなわちＳＣ₁とＳＣ₂の類似度およびＳＣ₃と
ＳＣ₄の類似度に比べて小さい値なので、ＳＣ₂とＳＣ₃
の間をグループの境界とみなす。In procedure 2, for each of the scenes SC_M to SC_{M+L-1} that satisfy the common color ratio condition, the similarity to the following scene is obtained, and the position where the similarity is smallest is regarded as the group boundary. FIG. 12 is an explanatory diagram showing how the group boundary is determined. In the figure, the similarity between SC₂ and SC₃ is smaller than the other similarities, namely that between SC₁ and SC₂ and that between SC₃ and SC₄, so the group boundary is placed between SC₂ and SC₃.
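Procedures 1 and 2 can be sketched as follows. This is an illustrative sketch only: indices are 0-based rather than the 1-based M of the text, the function name and the `satisfies` and `similarity` callables are hypothetical, and treating the rest of the video as one group when the condition never fails is an added assumption, since the text does not cover that case.

```python
def find_group_boundary(scenes, start, satisfies, similarity):
    """Return index b such that the group boundary lies between b and b + 1.

    satisfies(run)   -> True if the run of scenes meets the common color
                        ratio condition (hypothetical callable).
    similarity(a, b) -> similarity between two adjacent scenes
                        (hypothetical callable).
    """
    last = len(scenes) - 1
    length = 1
    # Procedure 1: extend the run while scenes[start .. start+length] still
    # satisfy the condition.
    while start + length <= last and satisfies(scenes[start:start + length + 1]):
        length += 1
    if start + length > last:
        # Assumption: the video ended before the condition failed.
        return last
    # Procedure 2: among scenes start .. start+length-1, cut where the
    # similarity to the following scene is smallest.
    return min(range(start, start + length),
               key=lambda i: similarity(scenes[i], scenes[i + 1]))
```

With four scenes where the first three satisfy the condition and the similarity between the second and third is the smallest, the function returns the index of the second scene, which matches the boundary shown for FIG. 12.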

【０１２１】なお、代表時空間画像の代わりに代表画像
を用いてもよい。このときは、「色が共通する画素」を
同一物体とみなせばよい。Note that a representative image may be used instead of the representative spatiotemporal image. In this case, "pixels having a common color" may be regarded as the same object.

【０１２２】（第３の実施の形態）第３の実施の形態
は、第２の実施の形態において説明した映像要約システ
ムにおいて、どのモードの代表画像から抜き出すかの選
択を、第２の実施の形態の手順９１３のようにあらかじ
め決められた優先順位に基づいてモードを決定するので
はなく、映像を解析してモードの優先順位を決定するこ
とにより行うものである。(Third Embodiment) The third embodiment uses the video summarizing system described in the second embodiment, but selects the mode from which representative images are extracted differently: instead of following a predetermined priority order as in step 913 of the second embodiment, it analyzes the video to determine the priority order of the modes.

【０１２３】図９の映像要約システムについて、図１３
に示すフローチャートを用いてその動作を説明する。The operation of the video summarizing system of FIG. 9 will be described with reference to the flowchart shown in FIG. 13.

【０１２４】手順１２０１〜手順１２１２の動作は、図
１０に示した手順９０１〜手順９１２の動作と同じであ
るのでその説明を省略する。The operations in steps 1201 to 1212 are the same as the operations in steps 901 to 912 shown in FIG. 10, and a description thereof will be omitted.

【０１２５】手順１２１３では、画像判定手段８１１が
モードの優先順位を決定する。以下では、画像基準モー
ドの代表画像の枚数が５枚以上になるときには、優先順
位の最も高いモードを画像基準モードとし、２番目に高
いモードを時間判定モードとし、３番目に高いモードを
ハイブリッドモードとし、４番目に高いモードを全シー
ン表示モードとし、画像基準モードの代表画像の枚数が
５枚未満のときには、優先順位の最も高いモードを時間
判定モードとし、２番目に高いモードをハイブリッドモ
ードとし、３番目に高いモードを全シーン表示モードと
した例について述べる。なお、ここでは画像基準モード
の代表画像が５枚以上であるか否かを調べてモードの優
先順位を変えているが、必ずしも５枚に設定する必要は
ない。この例のようにモードの自動決定基準を定めたの
は、画像基準モードの要約映像が５枚以下の場合には要
約する映像にはヘッドラインが含まれていない可能性が
高いという推定に基づいている。なお、このようなモー
ドの自動決定基準は、使用者が画像判定手段８１１に命
令を送ることのできるインターフェース手段を設けて自
由に設定を変更する態様をとることができる。In step 1213, the image determination means 811 determines the priority order of the modes. The following describes an example in which, when the image reference mode yields 5 or more representative images, the image reference mode has the highest priority, the time determination mode the second highest, the hybrid mode the third highest, and the all scene display mode the fourth highest; and when the image reference mode yields fewer than 5 representative images, the time determination mode has the highest priority, the hybrid mode the second highest, and the all scene display mode the third highest. The priority order is switched here according to whether the image reference mode yields 5 or more representative images, but the count need not be set to 5. This automatic criterion rests on the presumption that, when the image reference mode yields 5 or fewer summary images, the video being summarized most likely contains no headline segment. The criterion may also be changed freely by the user, by providing interface means through which the user can send commands to the image determination means 811.
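The automatic priority decision of step 1213 can be sketched as follows. The function name, the mode names, and the `min_count` parameter are hypothetical; only the example value of 5 and the two priority orders come from the text.

```python
def mode_priority(image_reference_count, min_count=5):
    """Return mode names in priority order, highest first (step 1213).

    image_reference_count: number of representative images found by the
    image reference mode. `min_count` is the example value of 5 from the text.
    """
    if image_reference_count >= min_count:
        # Enough digest-like matches: the image reference mode leads.
        return ["image_reference", "time_determination", "hybrid", "all_scenes"]
    # Too few matches: the video probably has no headline segment,
    # so the image reference mode is left out.
    return ["time_determination", "hybrid", "all_scenes"]
```

The returned list can be consumed in order by whatever routine picks representative images mode by mode, falling back to the next mode when the current one does not fill the quota.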

【０１２６】手順１２１４では、手順１２１３で決定し
た優先順位にもとづいて画像判定手段８１１がモードを
決定する。まずモードを優先順位の最も高い画像基準モ
ードにより代表画像を決定する。画像基準モードにより
代表画像が決定された後に手順１２１７の分岐処理によ
りこの手順１２１４に戻って来た場合には、前回決定し
たモードよりも優先順位が一つ低いモードにより代表画
像を決定す。In step 1214, the image determining means 811 determines the mode based on the priority determined in step 1213. First, a representative image is determined based on the image reference mode having the highest priority. When the process returns to the procedure 1214 by the branching process of the procedure 1217 after the representative image is determined in the image reference mode, the representative image is determined in a mode having one priority lower than the mode determined last time.

【０１２７】以下では、映像要約装置で抜き出す代表画
像の枚数が２４枚に設定されている場合について述べ
る。In the following, a case where the number of representative images extracted by the video summarizing apparatus is set to 24 will be described.

【０１２８】手順１２１５では、手順９１４と同様に、
画像選択の優先順位にもとづいて、手順１２１４で決定
したモードの代表画像に順番を付ける。例えば、手順１
２１４で決定した代表画像の中で、時間長の長いシーン
の代表画像から順番に若い番号をつけていく。In step 1215, as in step 914, the representative images of the mode determined in step 1214 are ranked according to the image selection priority. For example, among the representative images determined in step 1214, lower numbers are assigned first to the representative images of longer scenes.

【０１２９】手順１２１６では、手順９１５と同様に、
若い番号から順番に、映像要約装置で抜き出す代表画像
とみなしていく。ただし、映像要約装置で抜き出す代表
画像の枚数が２４枚を越えたら、作業を中断する。In step 1216, as in step 915, the images are adopted, in ascending order of their numbers, as the representative images extracted by the video summarizing device. However, when the number of representative images extracted by the video summarizing device exceeds 24, the process is interrupted.

【０１３０】手順１２１７では、手順９１６と同様の分
岐処理を実行する。すでに映像要約装置で抜き出すこと
に決定している代表画像の枚数が２４枚未満のときは、
手順１２１４に戻る。そうでなければ、手順１２１８に
進む。In step 1217, the same branch as in step 916 is executed. If the number of representative images already selected for extraction by the video summarizing device is less than 24, the process returns to step 1214; otherwise, it proceeds to step 1218.

【０１３１】手順１２１８では、手順９１７と同様に、
使用者が見たい部分を効率よく探せるように、システム
が映像表示装置上に映像の要約を表示する。例えば、手
順１２１６で決定した代表画像を一覧表示する。In step 1218, as in step 917, the system displays a summary of the video on the video display device so that the user can efficiently find the desired portion. For example, the representative images determined in step 1216 are displayed as a list.

【０１３２】手順１２１９では、手順９１８と同様に、
使用者が見たい部分を指定する。例えば、マウスなどの
ポインティングデバイスを用いて、見たい部分の代表画
像を指定する。映像表示装置がファイルサーバーから映
像データを受け取り、指定された部分から映像を再生す
る。In step 1219, as in step 918, the user specifies the portion to be viewed, for example by selecting its representative image with a pointing device such as a mouse. The video display device receives the video data from the file server and plays the video from the specified portion.

【０１３３】[0133]

【発明の効果】以上で説明した本発明は次のような有利
な効果を奏するため、一定の映像要約基準のみで画一的
に代表画像を選択する従来の映像要約装置や代表画像を
映像の長さ等に関係なく選択する従来の映像表示装置と
比べて、映像内容の多様性および使用者の好みの多様性
に対応することができる。（１）請求項１および請求項２に記載した発明に基づく
映像要約装置は、例えばニュース番組においてしばしば
起こるように、現場のアナウンサーが事件を説明するシ
ーン等のように前後のシーンと同一背景であっても重要
な情報を有している可能性の高いシーンが一定時間以上
継続する場合には、そのシーンとその直後のシーンのよ
うに重要な情報を有している可能性が高いシーンを背景
が類似している前後のシーンから取り出すことができ
る。グループ化の判断時間（本明細書の実施の形態では
８秒）より時間長の長いシーンとしては、ニュース番組
のアナウンサーのシーン、インタビューのシーン、登場
人物による説明のシーン、フリップの出るシーンなどが
ある。我々の分析によれば、時間長の長いシーンの直後
のシーンは、時間長の長いシーンと同様に重要であるこ
とが多い。The present invention described above has the following advantageous effects. As a result, compared with conventional video summarizing devices that select representative images uniformly under a single fixed summarization criterion, and conventional video display devices that select representative images regardless of the length of the video, it can accommodate the diversity of video content and of user preferences. (1) The video summarizing device based on the invention described in claims 1 and 2 handles scenes that are likely to carry important information even though they share a background with the surrounding scenes, such as a scene in which an on-site announcer explains an incident, as often occurs in news programs. When such a scene continues for a certain time or longer, the device can extract that scene and the scene immediately after it, both of which are likely to carry important information, from among the surrounding scenes with similar backgrounds. Scenes longer than the grouping decision time (8 seconds in the embodiments of this specification) include announcer scenes in news programs, interview scenes, scenes in which a person gives an explanation, and scenes showing a caption board (flip). According to our analysis, the scene immediately following a long scene is often as important as the long scene itself.

【０１３４】例えば、ニュース番組で現場から事件を報
告する場合には、「現場に派遣された登場人物が事件の
背景を説明するシーン」に続いて、「事件に関連するシ
ーン」の映像が流れる。この場合、「現場に派遣された
登場人物が事件の背景を説明するシーン」の画像より
も、「事件に関連するシーン」の画像の方が、事件の内
容を的確に表しており、重要である。しかし、色差ヒス
トグラム相関を用いた代表画像決定方法では、現場とい
う同一背景のシーンになるため、「事件に関連するシー
ン」と「現場に派遣された登場人物が事件の背景を説明
するシーン」が同一グループになり、「事件に関連する
シーン」の画像が代表画像にならない問題がある。ま
た、時間判定モードでも、「事件に関連するシーン」が
必ずしも８秒以上にならないため、代表画像になるとは
限らない。For example, when a news program reports an incident from the scene, "a scene in which a person dispatched to the site explains the background of the incident" is followed by "scenes related to the incident". In this case, the images of the "scenes related to the incident" represent the content of the incident more accurately than the image of the "scene in which a person dispatched to the site explains the background of the incident", and are therefore more important. However, with the representative image determination method based on color difference histogram correlation, the scenes share the same on-site background, so the "scenes related to the incident" and the "scene in which a person dispatched to the site explains the background of the incident" fall into the same group, and the image of the "scenes related to the incident" does not become a representative image. In the time determination mode as well, the "scenes related to the incident" do not necessarily last 8 seconds or longer, so they do not necessarily become representative images.

【０１３５】一方、ハイブリッドモードでは、８秒以上
のシーンとその直後のシーンが代表画像になるので、
「事件に関連するシーン」の画像が代表画像として選ば
れ、色差ヒストグラム相関用いた代表画像決定方法と時
間判定モードにない効果が得られる。時間長の長いシー
ンの中で、アナウンサーのシーン、インタビューのシー
ン、登場人物の説明のシーンにおいては、直後のシーン
と一緒に表示されないと、内容がわからないことが多
い。色差ヒストグラム相関を用いた代表画像決定方法を
用いたとき、時間長の長いシーンの前後のシーンが表示
されるケースが多いが、しきい値が不適当だったり、背
景が類似する場合には、直後のシーンが表示されない。
しかし、ハイブリッドモードでは、確実に直後のシーン
が表示される効果がある。従って、ハイブリッドモード
は、色差相関値法や時間判定モードと比較すると、かか
る種類の番組の要約においては優位性を有する。（２）請求項３および請求項４に記載した発明に基づく
映像要約装置は、本編の各記事から２〜３シーンの動画
像を抜き出して作成されたダイジェストが最初に流れて
から本編が流れる番組で、これらのダイジェスト画像を
代表画像となるように本編をグループ化することができ
る。ダイジェスト部分と本編部分と比較して代表画像を
決定するからである。すなわち、画像基準モードはかか
る種類の番組で絶大な効果を有する。従って、画像基準
モードでは、映像の内容がダイジェスト部分のシーンで
代表され、見たい部分を簡単に探すことができる。In the hybrid mode, by contrast, a scene of 8 seconds or longer and the scene immediately after it both become representative images, so the image of the "scenes related to the incident" is selected as a representative image. This is an effect that neither the representative image determination method based on color difference histogram correlation nor the time determination mode provides. Among long scenes, the content of announcer scenes, interview scenes, and scenes in which a person gives an explanation often cannot be understood unless they are displayed together with the scene immediately after. With the representative image determination method based on color difference histogram correlation, the scenes before and after a long scene are displayed in many cases, but when the threshold is inappropriate or the backgrounds are similar, the scene immediately after is not displayed. In the hybrid mode, the scene immediately after is always displayed. The hybrid mode therefore has an advantage over the color difference correlation value method and the time determination mode when summarizing programs of this kind. (2) The video summarizing device based on the invention described in claims 3 and 4 handles programs in which a digest, made by extracting moving images of two or three scenes from each article of the main part, is shown first and the main part follows. For such programs it can group the main part so that the digest images become the representative images, because the representative images are determined by comparing the digest part with the main part. The image reference mode is thus highly effective for programs of this kind: the content of the video is represented by the scenes of the digest part, and the portion to be viewed can be found easily.

[0136] Furthermore, when the apparatus is used as a video summarizing system in combination with a video display device, specifying a scene to be viewed causes the video from that scene onward in the main part to be played back, so the desired article can be played back easily. Conventional video summarizing systems do not offer this effect. In addition, if hardly any representative images are selected in the image reference mode, the video contains no portion that introduces its content in summary form (for example, news headlines). This mode therefore also makes it possible to determine whether summary information is attached to a video.

(3) The video summarizing apparatus based on the invention described in claims 5, 6, 7, 8, 9, and 13 has a plurality of video summarization criteria. Compared with a conventional video summarizing apparatus that summarizes video according to a single criterion, it can therefore respond more appropriately to the diversity of video content. The user is free to change or combine the video summarization criteria depending on the program.

(4) In the video summarizing apparatus based on the invention described in claims 7, 8, and 9, the user can specify an upper limit on the number of representative images, so the number of representative images can be kept small and the part to be viewed can be found more easily than with conventional methods. Further, when the priority order of the modes is determined in advance, the user can determine the representative images to suit the characteristics of the video. In addition, when the method of determining representative images is switched according to whether the number of representative images in the image reference mode is equal to or greater than a threshold, images of the digest portion can be extracted preferentially from a video that contains a digest. The efficiency with which the user searches the summary video is therefore improved compared with conventional video summarizing systems.

(5) The video display device based on the invention described in claim 10 uses the information in the file server, which records the video and the summary information of that video, so that the user can specify the part to be viewed while watching, one after another, a fixed duration of video near the position of each representative image. In other words, the part to be viewed can be searched for using moving images and sound, so the user can specify it while taking the sound and the movement of the subject into account. In addition, since the representative images are selected from scenes of high importance, the user does not find the played-back video redundant.

(6) The video summarizing apparatus based on the invention described in claims 11 and 12, when playing back a summary video consisting of the representative images of the video and the video before and after them, plays the summary video so that no portion is played twice, so playback is not redundant. Furthermore, when playback is interrupted at a desired part while the user is watching, one after another, a fixed duration of video near the position of each representative image, a list of a plurality of representative images is displayed, including the representative image that represents the content of the frame image at the interrupted position. The user can therefore easily find "content related to the part to be viewed" near the interrupted position.

[Brief description of the drawings]

FIG. 1 is a block diagram of a video summarizing system according to a first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the video summarizing system according to the first embodiment of the present invention.

FIG. 3 is an explanatory diagram showing how the scene change detecting means detects a scene change, and the previous scene.

FIG. 4 is an explanatory diagram showing the division of a screen into blocks in the image reference mode.

FIG. 5 is an explanatory diagram showing a method of creating a summary video according to the first embodiment of the present invention.

FIG. 6 is an explanatory diagram showing how playback starts from the head of the part to be viewed according to the first embodiment of the present invention.

FIG. 7 is an explanatory diagram showing a screen for designating the part to be viewed via the user interface means according to the first embodiment of the present invention.

FIG. 8 is an explanatory diagram showing the playback portions of a summary video according to the first embodiment of the present invention.

FIG. 9 is a block diagram of a video summarizing system according to a second embodiment of the present invention.

FIG. 10 is a flowchart showing the operation of the video display system according to the second embodiment of the present invention.

FIG. 11 is an explanatory diagram showing the common color ratio condition according to the second embodiment of the present invention.

FIG. 12 is an explanatory diagram showing a method of determining group boundaries according to the second embodiment of the present invention.

FIG. 13 is a flowchart showing the operation of the video display system according to a third embodiment of the present invention.

FIG. 14 is an explanatory diagram showing a scene list display in a conventional video display device.

FIG. 15 is an explanatory diagram showing a list display of video summary results in a conventional video display device.

FIG. 16 is a block diagram of a conventional video summarizing system.

FIG. 17 is a flowchart showing the operation of a conventional video summarizing system.

FIG. 18 is a diagram showing scene changes detected by the conventional common color ratio method.

[Explanation of symbols]

101, 801 Video disk device
102, 802 VTR
103, 803 Video summarizing device
104, 804 Image capturing means
105, 805 Scene change detection means
106, 806 Time determination processing means
107, 807 Group generation means
108, 808 Group addition means
109, 809 Image similarity calculation means
110, 810 Image reference processing means
811 Image determination means
111, 812 Control device
112, 813 Video compression device
113, 814 File server
114, 815 Video display device
115 Summary video reproduction means
116 User interface means
117 Video reproduction means

Claims

1. A video summarizing method comprising: a time-series group generation process of, for a plurality of scenes formed by dividing a captured video at detected scene changes, calculating the similarity between predetermined images (hereinafter, a predetermined image of a scene is referred to as a representative image) of scenes that precede and follow one another in time series, and grouping scenes that include such representative images into time-series groups; a video sequence summarizing process of modifying the video sequence into time-series groups independent of the scenes described above; and a video summary information output process of outputting video summary information for each time-series group obtained by the above two processes.

2. The video summarizing method according to claim 1, wherein the similarity between the representative images of scenes in the time-series group generation process is calculated as the ratio of pixels having a color common to the representative images being compared.
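
As an illustration only, the ratio of pixels having a common color in claim 2 might be computed as sketched below. This is one possible reading, not the patented implementation: the coarse color quantization (`levels`), the flat list-of-pixels image format, and the function names are all assumptions introduced here.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin (assumed scheme)."""
    step = 256 // levels
    return tuple(channel // step for channel in pixel)


def common_color_ratio(image_a, image_b, levels=4):
    """Fraction of pixels whose coarse color occurs in both images.

    Each image is a flat list of (r, g, b) tuples; both must have the same length.
    """
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and the same size")
    hist_a = Counter(quantize(p, levels) for p in image_a)
    hist_b = Counter(quantize(p, levels) for p in image_b)
    # Histogram intersection: pixel count per color bin shared by both images.
    common = sum(min(count, hist_b[color]) for color, count in hist_a.items())
    return common / len(image_a)
```

Two adjacent scenes would then be placed in the same time-series group when this ratio for their representative images reaches a chosen threshold; the threshold value itself is not given in the claim.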

3. A video summarizing method comprising: a similarity calculation process of calculating, according to a predetermined criterion, the similarity between predetermined images (hereinafter, a predetermined image of a scene is referred to as a representative image) of a plurality of scenes selected according to a predetermined criterion from the scenes constituting a captured video (hereinafter referred to as reference scenes) and the representative images of all scenes constituting the video (hereinafter referred to as main part scenes), and selecting each scene that includes a representative image whose similarity to a representative image of a reference scene is equal to or greater than a given value; and a video summary information output process of outputting video summary information of the scenes obtained by the above process.

4. The video summarizing method according to claim 3, wherein, in the similarity calculation process, the criterion for calculating the similarity between the representative image of a reference scene and the representative image of a main part scene is to divide each representative image into a plurality of image areas and compare the RGB components of the pixels in each image area.
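
The block-wise RGB comparison of claim 4 could, for illustration, look like the sketch below. The claim does not say how the RGB components of each area are compared, so the use of per-block mean values, the `tolerance` parameter, the grid size, and all function names are assumptions made for this example.

```python
def block_means(image, rows, cols):
    """Average (r, g, b) of each block in a rows x cols grid.

    `image` is a list of pixel rows, each pixel an (r, g, b) tuple.
    The image must be at least `rows` pixels high and `cols` pixels wide.
    """
    height, width = len(image), len(image[0])
    means = []
    for by in range(rows):
        for bx in range(cols):
            y0, y1 = by * height // rows, (by + 1) * height // rows
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
    return means


def block_similarity(image_a, image_b, rows=2, cols=2, tolerance=16):
    """Fraction of blocks whose mean R, G and B all differ by at most `tolerance`."""
    means_a = block_means(image_a, rows, cols)
    means_b = block_means(image_b, rows, cols)
    matching = sum(
        all(abs(ca - cb) <= tolerance for ca, cb in zip(ma, mb))
        for ma, mb in zip(means_a, means_b)
    )
    return matching / len(means_a)
```

Comparing block by block keeps some spatial layout information, which a whole-image color histogram would discard.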

5. A video summarizing method comprising: a plurality of video summarization processes that group a plurality of scenes, formed by dividing a captured video at detected scene changes, into a plurality of time-series groups, and/or a plurality of video summarization processes that select predetermined scenes from the plurality of scenes; and a video summary output process of outputting video summary information of the scenes selected by each video summarization process.

6. The video summarizing method according to claim 5, wherein the plurality of video summarization processes that group the plurality of scenes into time-series groups and/or the plurality of video summarization processes that select predetermined scenes from the plurality of scenes comprise at least two of: a first video summarization process of selecting all scenes; a second video summarization process of selecting, from the plurality of scenes of the captured video, only scenes that continue for a predetermined time or longer; a third video summarization process of calculating, according to a predetermined criterion, the similarity of predetermined images of scenes (hereinafter, a predetermined image of a scene is referred to as a representative image) and collecting scenes that include representative images whose similarity is equal to or greater than a threshold into a time-series group; a fourth video summarization process according to claim 1 or claim 2; and a fifth video summarization process according to claim 3 or claim 4.
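
The second process in claim 6, which keeps only scenes that continue for a predetermined time or longer, can be sketched as follows. Measuring duration in frames, representing each scene by its head frame number, and the function name are assumptions for this example.

```python
def long_scenes(scene_starts, total_frames, min_frames):
    """Return head frame numbers of scenes lasting at least `min_frames` frames.

    `scene_starts` holds the head frame number of every scene; a scene ends
    where the next one starts, and the last scene ends at `total_frames`.
    """
    starts = sorted(scene_starts)
    ends = starts[1:] + [total_frames]
    return [start for start, end in zip(starts, ends) if end - start >= min_frames]
```

For example, with scenes starting at frames 0, 20, 200 and 230 in a 600-frame video and a 90-frame minimum, only the scenes starting at 20 and 230 are kept.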

7. The video summarizing method according to claim 5, wherein the video summary information includes a predetermined image of a scene (hereinafter, a predetermined image of a scene is referred to as a representative image) or a predetermined frame number (hereinafter referred to as a representative frame number), and representative images or representative frame numbers are selected in descending order of the number of frames between representative images or representative frame numbers, so that a predetermined number of scenes or a predetermined number of time-series scenes is selected.
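
A minimal sketch of the selection in claim 7 follows, assuming that the "number of frames between representative frame numbers" means the distance from one representative frame to the next (and to the end of the video for the last one). That interpretation, the tie-breaking rule, and the function name are assumptions.

```python
def select_by_gap(rep_frames, total_frames, limit):
    """Keep the `limit` representative frames followed by the longest gaps.

    The result is returned in time order.
    """
    frames = sorted(rep_frames)
    ends = frames[1:] + [total_frames]
    gaps = [(end - start, start) for start, end in zip(frames, ends)]
    # Largest gap first; ties go to the earlier frame.
    gaps.sort(key=lambda item: (-item[0], item[1]))
    return sorted(start for _, start in gaps[:limit])
```

Under this reading, a representative frame that stands for a long stretch of video is preferred over one that is closely followed by another representative frame.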

8. The video summarizing method according to claim 7, wherein, when the predetermined number of representative images or frame numbers cannot be selected by one video summarization process, the remaining representative images or frame numbers are selected from those chosen by another video summarization process.
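
The fallback in claim 8 might be sketched as below: the highest-priority process contributes first, and any shortfall is filled from the next process. Passing the per-process results as lists ordered by priority, and the function name, are assumptions for illustration.

```python
def fill_from_modes(mode_results, limit):
    """Collect up to `limit` representative frame numbers across processes.

    `mode_results` is a list of frame-number lists, highest priority first.
    Frames already chosen by an earlier process are not counted twice.
    """
    chosen = []
    for frames in mode_results:
        for frame in frames:
            if len(chosen) == limit:
                return sorted(chosen)
            if frame not in chosen:
                chosen.append(frame)
    return sorted(chosen)
```

Here the first list could hold the frames found by one summarization process and the second list those found by another; with a limit of four, the two frames of the first list are kept and two more are taken from the second.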

9. The video summarizing method according to claim 7, wherein the user inputs the priority of the video summarization processes and the number of scenes or time-series scenes to be selected, and the video is summarized according to that information.

10. A video display method for displaying a video and the representative images of the video selected by the method according to any one of claims 1 to 9, wherein a video formed by connecting parts of the video near the positions of the representative images (hereinafter referred to as a summary video) is displayed, and the video is played back starting from the frame specified as above.

11. A video display method wherein, when playback of a video formed by connecting parts of a video near the positions of images extracted from that video (hereinafter referred to as a summary video) is interrupted, a list of a plurality of representative images is displayed, including the representative image that represents the content of the frame image at the interrupted position.
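
One way to pick the representative images listed on interruption in claim 11 is sketched below. It assumes that the representative image "representing" the interrupted frame is the latest one at or before that frame, and that the list is a window of `count` neighbors around it; both choices and the function name are assumptions.

```python
import bisect


def reps_around(rep_frames, paused_frame, count=5):
    """Return up to `count` representative frame numbers around the paused frame.

    The window always contains the latest representative frame at or before
    `paused_frame` (or the first one if playback stopped before all of them).
    """
    frames = sorted(rep_frames)
    idx = max(0, bisect.bisect_right(frames, paused_frame) - 1)
    half = count // 2
    # Center the window on idx, then clamp it to the ends of the list.
    lo = max(0, min(idx - half, len(frames) - count))
    return frames[lo:lo + count]
```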

12. The video display method according to claim 10, wherein, when the first part of a summary video is included in the last part of the immediately preceding summary video, the frame following the last frame of the included part is set as the head of that summary video.
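
The overlap rule of claim 12 can be illustrated with the sketch below, which builds one clip around each representative frame and moves the head of a clip forward when it would overlap the previous clip. The fixed `before` and `after` window lengths, the inclusive (start, end) clip format, and the function name are assumptions.

```python
def summary_clips(rep_frames, before, after, total_frames):
    """Build non-overlapping (start, end) clips around representative frames.

    Frame ranges are inclusive. If a clip would begin inside the previous
    clip, its head is moved to the frame after the previous clip's last frame.
    """
    clips = []
    for frame in sorted(rep_frames):
        start = max(0, frame - before)
        end = min(total_frames - 1, frame + after)
        if clips and start <= clips[-1][1]:
            start = clips[-1][1] + 1
        if start <= end:  # drop a clip entirely covered by the previous one
            clips.append((start, end))
    return clips
```

With representative frames 100 and 150 and a window of 30 frames before and 60 after, the second clip would start at frame 120, inside the first clip (70 to 160), so its head is moved to frame 161.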

13. A video summarizing system comprising: a plurality of video summarizing means for extracting video summary information by grouping, according to a predetermined criterion, a plurality of scenes formed by dividing a captured video into a plurality of time-series groups, and/or a plurality of video summarizing means for extracting video summary information by selecting predetermined scenes from the plurality of scenes; summary information selection means for selecting the summary information extracted by the video summarizing means; and summary information display means.