JP2008153920A

JP2008153920A - Motion picture list displaying apparatus

Info

Publication number: JP2008153920A
Application number: JP2006339626A
Authority: JP
Inventors: Takeaki Suenaga; 健明末永; Yoshiaki Ogisawa; 義昭荻澤; Shuichi Watabe; 秀一渡部
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2006-12-18
Filing date: 2006-12-18
Publication date: 2008-07-03

Abstract

PROBLEM TO BE SOLVED: To provide a motion picture list displaying apparatus that, when a plurality of motion pictures are arranged and displayed, summarizes and reproduces them so that the summaries do not resemble one another, allowing the user to easily understand the differences between the listed motion pictures.

SOLUTION: In the motion picture list displaying apparatus 100, a motion picture selecting unit 102 selects motion pictures from those accumulated in an accumulating unit 101, based on a condition designated through a condition inputting unit 103. A feature detecting unit 109 detects the features of the selected motion pictures, and a correlation calculating unit 104 uses these features to calculate correlations within the group of motion pictures. A summary producing unit 105 then produces a summary of each motion picture, taking the calculated correlations into account, and a list of the produced summaries is displayed.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a moving image list display device that displays a list of a plurality of moving images.

In recent years, as networks have become faster and channels more diverse, the amount of moving image content available to an individual has increased dramatically. In addition, as the capacity of recording media such as DVDs (Digital Versatile Discs) and hard disk drives (HDDs) has grown, it has become common to record large numbers of moving images on these media and to watch programs without being tied to broadcast times.

When the amount of moving images handled by an individual becomes this large, a technique for quickly and easily checking the content of each moving image is needed in order to find the ones the user wants to view. Patent Documents 1 and 2 describe means of addressing this problem.

Patent Document 1 describes a technique of creating a summary of a moving image and checking that summary. Patent Document 2 goes further: it selects a plurality of thumbnail still images from the moving image data and creates a summary that advances through them frame by frame, and it also plays these summaries back simultaneously as a list, which allows the user to quickly check the content of a plurality of moving images.
Patent Document 1: JP 2003-101939 A
Patent Document 2: JP-A-11-284948

However, when summaries are created for each of a plurality of moving images classified into the same genre, for example, their contents may resemble one another. If such similar summaries are displayed as a list, as in Patent Document 2, the user cannot fully grasp the differences between the moving images and, as a result, cannot tell which one to select.

For example, consider summarizing each of several episodes of a drama with the same title. The summary creation technique of Patent Document 1 selects partial moving images using parameters optimized for each genre and creates a summary with a shortened time length. When the summary conditions prepared per genre are applied, the moving images to be summarized all belong to the same genre, "drama", so the same summary conditions are applied to all of them.

Now suppose that each episode of the drama follows a fixed, formulaic pattern. If a portion containing that pattern is selected for the summary of one episode, a portion containing the same pattern is likely to be selected for the summaries of the other episodes summarized under the same conditions.

Such formulaic patterns are often similar visually or aurally. Therefore, even if the summaries of these moving images are arranged in a list, they all appear to the user to have the same content, and it is difficult to understand the differences between the moving images. As a result, the user has difficulty deciding which moving image to view.

The present invention has been made in view of the above circumstances. Its object is to provide a moving image list display device that takes into account the correlation between the scenes of the listed moving images, creates distinctive summaries that make the moving images easy for the user to compare, and displays them as a list.

To solve the above problems, the moving image list display device of the present invention has the following configuration.
The moving image list display device of the present invention includes: a storage unit that stores moving images each consisting of a plurality of pieces of information including video and audio; a selection unit that selects, from the storage unit and according to a predetermined condition, a plurality of moving images to be displayed as a list; a feature detection unit that detects feature information for an entire moving image or for each scene, based on information obtained from the moving image itself or information attached to the moving image; a correlation calculation unit that uses the feature information detected by the feature detection unit to calculate the correlation between scenes for the scenes of the plurality of moving images selected by the selection unit; a summary creation unit that creates a summary of each moving image selected by the selection unit, based on the correlation between scenes calculated by the correlation calculation unit and the feature information detected by the feature detection unit; and a list display unit that displays a list of the summaries created by the summary creation unit.

Alternatively, the moving image list display device of the present invention includes: a storage unit that stores moving images each consisting of a plurality of pieces of information including video and audio, together with feature information of those moving images; a selection unit that selects, from the storage unit and according to a predetermined condition, a plurality of moving images to be displayed as a list; a correlation calculation unit that uses the feature information stored in the storage unit to calculate the correlation between scenes for the scenes of the plurality of moving images selected by the selection unit; a summary creation unit that creates a summary of each moving image selected by the selection unit, based on the correlation between scenes calculated by the correlation calculation unit and the feature information stored in the storage unit; and a list display unit that displays a list of the summaries created by the summary creation unit.

Here, the predetermined condition is, for example, a genre designation, a keyword designation, or a selection operation performed by the user on the listed group of moving images. The information obtained from the moving image itself or attached to the moving image is a combination of one or more of, for example, video information, audio information, subtitle information, and information attached to the moving image (for example, EPG and tag information).
The correlation calculation unit obtains the correlation between scenes using, for example, an agglomerative clustering method or a partitioning-type clustering method typified by k-means.

The summary creation unit creates a summary of a moving image in one of the following ways.
(1) Based on the correlation between scenes calculated by the correlation calculation unit, the time length for which each scene is played is determined according to the number of scenes similar to it, and a summary of each moving image is created by playing each scene in the moving image for the determined time length.
The time length may be made shorter the more similar scenes a scene has, and it may be further adjusted based on the feature information obtained from the feature detection unit.
For a scene that clearly shows a difference from the other moving images, the playback time of the scene may be left unchanged.
Furthermore, the time length of a scene may be adjusted depending on whether the similar scenes belong to the same moving image or to different moving images.

(2) Based on the correlation between scenes calculated by the correlation calculation unit, a summary of each moving image is created by collecting scenes that have fewer than a predetermined number of similar scenes.
(3) A target playback time is set for each moving image and, based on the correlation between scenes calculated by the correlation calculation unit, a summary of each moving image is created by collecting as many scenes that clearly show a difference from the other moving images as can be played within the target playback time.

(4) The device includes an operation input unit that accepts the designation of a specific moving image from the listed group of moving images. For the moving image designated through the operation input unit (hereinafter, the moving image of interest), the summary creation unit creates a summary independently of the correlation between scenes calculated by the correlation calculation unit. For the other moving images, it creates summaries that exclude scenes similar to those included in the summary of the designated moving image. As a result, even when a plurality of moving images are displayed as a list, scenes judged to have many similar counterparts among the listed moving images are less likely to be selected for a summary, so the user can easily confirm the differences between the moving images and find the desired one.
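Variant (4) can be sketched as a simple filter. This is a minimal illustration, assuming that "similar" means "assigned to the same scene cluster"; the function name `summarize_with_focus` and all scene and cluster labels are hypothetical and do not come from the patent.

```python
def summarize_with_focus(focus_summary, other_scenes):
    """Drop scenes whose cluster already appears in the focus summary.

    Both arguments are lists of (scene_label, cluster_label) pairs.
    Assumption: two scenes are "similar" when they share a cluster label.
    """
    shown_clusters = {cluster for _scene, cluster in focus_summary}
    return [(scene, cluster) for scene, cluster in other_scenes
            if cluster not in shown_clusters]


# Hypothetical data: the moving image of interest and one other moving image.
focus_summary = [("1-a", "A"), ("1-b", "E")]
other_video = [("2-a", "A"), ("2-b", "F"), ("2-c", "B")]
remaining = summarize_with_focus(focus_summary, other_video)
```

Here scene "2-a" is dropped because it shares cluster "A" with a scene already shown in the summary of the moving image of interest.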

Further, in the above moving image list display device, feature information may be detected in advance for all stored moving images and stored in the storage unit, and the correlation calculation unit and the summary creation unit may read the feature information of the moving images selected by the moving image selection unit directly from the storage unit.

The above problems can also be solved by creating a program that causes a computer to function as each unit of the moving image list display device configured as described above, or by recording that program on a computer-readable recording medium, and executing the program on a computer.

In the present invention, a plurality of moving images are displayed as a list, and when at least one of the listed moving images is played back as a summary, the summary that is created is varied according to the combination of moving images in the list.
In addition, the correlation between scenes is used as one of the criteria for creating a summary, so that scenes judged to have many similar counterparts among the listed moving images are less likely to be selected for the summary.
As a result, even when a plurality of moving images are displayed as a list, the user can easily confirm the differences between them and easily find the desired moving image.

Preferred embodiments of the moving image list display device of the present invention are described below with reference to the drawings.
<Embodiment 1>
FIG. 1 is a block diagram showing the functional configuration of the moving image list display device according to Embodiment 1.
In FIG. 1, the moving image list display device 100 includes a storage unit 101, a moving image selection unit 102, a condition input unit 103, a correlation calculation unit 104, a summary creation unit 105, a screen synthesis unit 106, a display unit 107, an operation input unit 108, and a feature detection unit 109.

Each unit of the moving image list display device according to Embodiment 1 is described in detail below.
The storage unit 101 is formed of a storage medium such as a DVD or an HDD and stores the plurality of moving images handled in Embodiment 1. These moving images are, for example, recordings of received broadcasts, content acquired over a network such as the Internet, or content acquired from recording media such as DVDs and memory cards, and they may be compressed with WMV, MPEG-1, MPEG-2, MPEG-4, H.264/AVC, or the like. In addition to the moving images, the storage unit stores additional information about them (for example, subtitle information, EPG (Electronic Program Guide) data, and tag information attached to the moving images) that is used when calculating scene correlations and creating summaries.

The condition input unit 103 accepts conditions for selecting the moving images to be listed from the plurality of moving images stored in the storage unit 101. Examples of such conditions include a genre designation and a keyword search. It is of course also possible to select all moving images without specifying any condition.

The moving image selection unit 102 selects, from the plurality of moving images stored in the storage unit 101, the moving images that match the condition entered by the user through the condition input unit 103.

The feature detection unit 109 acquires various information from each moving image selected by the moving image selection unit 102 and detects features of the entire moving image or of each scene. The detected features are passed to the correlation calculation unit 104 and the summary creation unit 105, where they are used to calculate scene correlations and to create summaries.

In the present invention, a scene is a frame interval that forms a semantic unit within a moving image, which is a time series of frames. The moving images are assumed to have already been divided into scenes on some basis, for example by metadata written manually in advance or by an automatic division method that uses the moving image information.
A known method is used for this automatic division (for example, Shin Yamada, Toshikazu Fujioka, Katsuhiro Kanamori, Koji Matsushima: "Examination of a scene change detection method focusing on a common color for each partial region", Technical Report of the Institute of Television Engineers, Vol. 17, No. 55).

The feature detection unit 109 detects features using information obtained from the moving image itself or information attached to the moving image.
Examples of information obtained from the moving image itself include the video information, audio information, and subtitle information it contains; examples of information attached to the moving image include EPG data and tag information.
Examples of features obtained from this information include: from video information, average luminance, color histograms in various color spaces, edge direction and density, edge information, and motion vectors; from audio information, audio level, frequency distribution, utterance positions, and background music (BGM) information; from text information, telop (on-screen caption) information; and from EPG and tag information, important keywords such as title, genre, performers, and recording date and time.

To obtain the correlation between moving images, the correlation calculation unit 104 calculates the correlation between the moving images to be displayed, obtained from the moving image selection unit 102, either as wholes or between the scenes they contain.
The scene correlation uses the pieces of feature information obtained by the feature detection unit 109 individually or in combination, and the feature information of each scene is converted into a feature vector of arbitrary dimension. Here, a feature vector is the feature information obtained from a scene expressed as a vector of arbitrary dimension.

Next, an example of calculating scene correlations using an agglomerative clustering method is described.
As shown in FIG. 2, suppose there is a moving image 1 divided into scenes A to F and a moving image 2 divided into scenes G to L. Suppose that an m-dimensional feature vector V_i has been obtained from the features of an arbitrary scene i. In the m-dimensional space, the n scenes from scene 1 to scene n are then assigned, based on their vectors V_i, to n clusters (C_1 to C_n) each containing exactly one scene; this is the initial state.

From this initial state, the Euclidean distance d(V_i, V_j) between the feature vectors V_i and V_j of two scenes is obtained. This distance indicates how strongly scene i and scene j are correlated: the smaller the value, the higher the correlation.
The distances d(V_i, V_j) are then used to obtain the distance between clusters. The distance D(C_i, C_j) between clusters C_i and C_j is obtained by the following equation (1).
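The image of equation (1) is not reproduced in this text. Judging from the explanation that accompanies it (the average of the distances between every scene in C_i and every scene in C_j, with n_i and n_j the cluster sizes), it presumably has the following form. This is a reconstruction from the description, not a copy of the original figure.

```latex
D(C_i, C_j) = \frac{1}{n_i \, n_j} \sum_{i' \in C_i} \sum_{j' \in C_j} d(V_{i'}, V_{j'})
```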

Here, n_i and n_j are the numbers of scenes contained in clusters C_i and C_j, respectively. For the purposes of explanation, the distances between every scene in cluster C_i and every scene in cluster C_j are computed and their average is taken as the inter-cluster distance D(C_i, C_j), but other methods of calculating the distance between clusters may be used. For example, the shortest distance method (single linkage), which takes the smallest of the distances d(V_i', V_j') between any scene i' in cluster C_i and any scene j' in cluster C_j as the inter-cluster distance, or the maximum distance method (complete linkage), which takes the largest of those distances, may be used.

Next, the inter-cluster distance D(C_i, C_j) is calculated for all pairs of clusters, and the clusters C_k and C_l with the smallest distance, that is, those satisfying the following equation (2), are merged into a single cluster.
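The image of equation (2) is not reproduced in this text. Since it identifies the pair of clusters with the smallest inter-cluster distance, it presumably has a form such as the following; this is a reconstruction from the surrounding description.

```latex
D_{\min}(C_k, C_l) = \min_{i \neq j} D(C_i, C_j)
```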

This merging process is then repeated on the (n-1) clusters that remain after the merge, so that the clusters are combined step by step. However, if D_min(C_k, C_l) exceeds a preset threshold Th, the merging process is terminated. In other words, when all clusters are separated from one another by more than the threshold Th, each cluster is regarded as a scene cluster that is not similar to any other.
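The merging procedure described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes average linkage for the inter-cluster distance, and the function names (`euclidean`, `average_linkage`, `cluster_scenes`), the scene labels, and the two-dimensional feature vectors are all hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance d(V_i, V_j) between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(ci, cj, vectors):
    """Mean of all pairwise scene distances between clusters ci and cj."""
    total = sum(euclidean(vectors[s], vectors[t]) for s in ci for t in cj)
    return total / (len(ci) * len(cj))


def cluster_scenes(vectors, threshold):
    """Merge the closest pair of clusters until the smallest distance exceeds threshold."""
    clusters = [[name] for name in vectors]  # initial state: one scene per cluster
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest distance.
        (i, j), d_min = min(
            (((i, j), average_linkage(clusters[i], clusters[j], vectors))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d_min > threshold:
            break  # all remaining clusters are farther apart than Th
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D feature vectors for five scenes.
scenes = {"A": (0.0, 0.0), "G": (0.1, 0.0), "B": (5.0, 5.0),
          "H": (5.0, 5.2), "C": (10.0, 0.0)}
result = cluster_scenes(scenes, threshold=1.0)
```

With these sample vectors, scenes A and G end up in one cluster, B and H in another, and C remains alone. The search over all pairs is repeated at every step, which is simple but slow for large numbers of scenes; a production system would more likely use an existing hierarchical clustering library.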

FIG. 3 shows an example of the result of classifying the m-dimensional feature vectors obtained from scenes A to L of FIG. 2 using the clustering method described above. For the purposes of explanation, the vectors are plotted in a simplified two-dimensional form. As a result of the classification, each scene belongs to one of the clusters enclosed by dotted lines.

In the present invention, among the clusters obtained in this way, scenes classified into the same cluster are judged to be highly correlated with one another, and scenes classified into different clusters are judged to have low correlation. Hereinafter, a cluster into which one or more scenes have been classified is called a scene cluster.

In the above description, an agglomerative clustering method is used to calculate scene correlations, but a partitioning-type clustering method typified by k-means may be used instead, or the correlation between scenes may be obtained using other techniques.

The summary creation unit 105 creates a summary of each moving image selected by the moving image selection unit 102, using the scene correlations provided by the correlation calculation unit 104 and the feature information obtained from the feature detection unit 109. An example of a method for creating summaries is described below with reference to FIGS. 4 to 12.

Consider the case where four moving images, moving image 1 to moving image 4, are displayed as a list. Each is divided into scenes as shown in FIG. 4, and all of the scenes have been classified into scene clusters A to O using the method described above.
As a result of the classification, scene clusters A, B, C, and D each consist of a set of several similar scenes, whereas the scenes in the remaining scene clusters E to O have no similar scenes, so each of those clusters contains only one scene.

Here we want to set a coefficient that lowers the importance of scenes the more similar scenes there are, that is, the more scenes are classified into the same scene cluster. Calling this coefficient the importance ω_i, it is calculated by the following equation (3), where a and b are arbitrary constants preset in the device and n_i is the number of scenes classified into scene cluster i.
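The image of equation (3) is not reproduced in this text. One form consistent with the description (the importance falls as n_i grows) and with the later statement that a scene with no similar scene satisfies ω_i = a + b would be the following. The exact expression in the original is not known from this text, so this is an assumption.

```latex
\omega_i = \frac{a}{n_i} + b
```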

This importance ω_i is used to determine the playback time length of each scene, or whether the scene is played at all.
For example, when the importance ω_i of each scene cluster is calculated from the classification of moving images 1 to 4 shown in FIG. 4, the following relationship holds.

ω_A < ω_B < ω_C = ω_D < ω_E = ω_F = … = ω_O

In the above description, ω_i is calculated by treating all scenes as equivalent, but the importance may also be determined by taking into account whether scenes belong to the same moving image.
For example, with the calculation method described above, the importance ω_i decreases even for a cluster i made up solely of scenes from the same moving image.

From the viewpoint of confirming differences between moving images, however, a cluster i made up solely of scenes from a single moving image can be regarded as a cluster of scenes that clearly show how that moving image differs from the others. Therefore, when the scenes classified into a cluster include many scenes from the same moving image, the importance ω_i is set by the following equation (4) so that the importance of the scenes in that cluster is higher. Here, a and b are preset constants, and m_i is the number of scenes in cluster i counted so that scenes belonging to the same moving image are not counted more than once.
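The image of equation (4) is not reproduced in this text. Since m_i counts each moving image at most once, one form consistent with the description replaces the scene count with m_i, as below. This is an assumption based on the surrounding text, not the original expression.

```latex
\omega_i = \frac{a}{m_i} + b
```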

In FIG. 3, under the first importance calculation method, clusters 301 and 302, each containing two scenes, have the same importance, whereas under the method just described, 302 has a higher importance than 301. Cluster 302 contains many scenes from moving image 2 and is considered well suited to showing the difference from moving image 1.
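The difference between the two importance calculations can be illustrated with a short sketch. The formulas used here (a/n + b and a/m + b) are assumed forms, because the equation images are not available in this text. The function names and the cluster memberships are hypothetical and are not taken from FIG. 3.

```python
def importance_by_count(cluster, a=1.0, b=0.0):
    """Assumed form of the first method: falls as the cluster holds more scenes."""
    return a / len(cluster) + b


def importance_by_distinct_videos(cluster, a=1.0, b=0.0):
    """Assumed form of the second method: scenes from one moving image count once."""
    m = len({video for video, _scene in cluster})
    return a / m + b


# Each scene is (moving image id, scene label); memberships are hypothetical.
cluster_301 = [(1, "C"), (2, "I")]  # two scenes from different moving images
cluster_302 = [(2, "J"), (2, "K")]  # two scenes from the same moving image

same = importance_by_count(cluster_301) == importance_by_count(cluster_302)
higher = (importance_by_distinct_videos(cluster_302)
          > importance_by_distinct_videos(cluster_301))
```

Under these assumptions both clusters score equally by scene count, while the second method ranks cluster 302 higher because all of its scenes come from one moving image.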

Next, a summary of the moving image is created using the calculated importance ω_i.
As one example of summary creation, the importance is used here to calculate the playback time of each scene after summarization.
Using the calculated importance ω_i, the playback time length Tx_i after summarization is obtained by the following equation (5), where tx_i is the playback time length of scene x classified into scene cluster i.
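The image of equation (5) is not reproduced in this text. A natural reading, in which the importance simply scales the original length of the scene, would be the following. This is an assumption, and it presupposes that the constants a and b are chosen so that ω_i does not exceed 1.

```latex
\mathit{Tx}_i = \omega_i \cdot \mathit{tx}_i
```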

As shown in FIG. 5, scene x is summarized to fit the post-summarization playback time length Tx_i.
Summarizing a scene to fit a specified time may be achieved by partial playback or high-speed playback using a known summarization technique (for example, Japanese Patent No. 3640615), or other techniques may of course be used.

FIG. 6 shows moving images 1 to 4 of FIG. 4 summarized in this way. Summarized moving images 1 to 4 are obtained by calculating the playback time of each scene of moving images 1 to 4 using the method described above. In the example of FIG. 6, a scene is shortened more the more similar scenes it has across the moving images, so each summary is built around distinctive scenes that do not resemble other scenes.

A summary creation method is also conceivable in which the length of the summary is determined in advance and each summary is created to fit it. In the example of FIG. 6, the importance was applied to the moving images as-is to determine the playback time length of each scene, so the playback time lengths of the summaries are uneven. By normalizing the playback time length of each summary to a target playback time length, all summaries can be given the same playback time.
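The normalization described above can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes the per-scene summary durations have already been computed, and it assumes simple proportional scaling. The function name `normalize_durations` and the sample values are hypothetical.

```python
def normalize_durations(durations, target_total):
    """Scale per-scene summary durations (seconds) so they sum to target_total.

    Assumption: proportional scaling; the text only states that the summary
    length is normalized to a target playback time length.
    """
    total = sum(durations)
    if total <= 0:
        raise ValueError("summary has no playable duration")
    scale = target_total / total
    return [d * scale for d in durations]


# Two summaries of different lengths (hypothetical values) brought to 10 s each.
summary_a = normalize_durations([12.0, 3.0, 5.0], 10.0)
summary_b = normalize_durations([2.0, 2.0], 10.0)
```

With this scaling, the relative weighting between scenes within one summary is preserved while every summary ends up with the same total length.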

FIG. 7 shows an example in which the scenes making up each moving image are sorted in descending order of the importance ω_i obtained earlier, that is, so that unique scenes are played preferentially, and an arbitrary length of time from the head is played as the summary. If time t is set as the target playback time and the scenes that can be played within time t are played in order from the head, the summary shown in FIG. 8(A) is created. As shown in FIG. 8(B), the set of scenes playable within the target playback time t obtained in FIG. 8(A) may also be re-sorted along the time axis, that is, in the order in which the scenes appear in the moving image before summarization.
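The selection of FIG. 7 and FIG. 8 can be sketched as below. This is an illustrative reading under stated assumptions: scenes are taken from the head of the importance-sorted list until the next scene would exceed the target time t, and the optional second step restores the original order of appearance. The `Scene` type, the function name, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    name: str
    start: float       # position in the original moving image (seconds)
    duration: float    # playback length (seconds)
    importance: float  # importance of the scene (omega_i of its cluster)


def select_scenes(scenes, target_time, chronological=False):
    """Pick scenes in descending importance until target_time is used up."""
    ranked = sorted(scenes, key=lambda s: s.importance, reverse=True)
    chosen, used = [], 0.0
    for scene in ranked:
        if used + scene.duration > target_time:
            break  # assumption: stop at the first scene that does not fit
        chosen.append(scene)
        used += scene.duration
    if chronological:
        # FIG. 8(B): re-sort the chosen scenes by order of appearance.
        chosen.sort(key=lambda s: s.start)
    return chosen


scenes = [
    Scene("A", 0.0, 10.0, 1.0),
    Scene("B", 10.0, 8.0, 2.0),
    Scene("C", 18.0, 5.0, 3.0),
    Scene("D", 23.0, 4.0, 0.5),
]
by_importance = [s.name for s in select_scenes(scenes, 15.0)]
by_time = [s.name for s in select_scenes(scenes, 15.0, chronological=True)]
```

Here `by_importance` corresponds to the ordering of FIG. 8(A) and `by_time` to that of FIG. 8(B).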

As another summarization technique using the importance ω_i, playback may be performed only when ω_i satisfies a preset condition. FIG. 9 shows an example in which only scenes that have no other similar scene, that is, only scenes satisfying ω_i = a + b, are played.

Alternatively, as shown in FIG. 10, any one scene may be selected from the scenes with high importance ω_i and played. Several representative still images may also be extracted from each scene and displayed in succession like a flip book, with the number of extracted still images made proportional to the importance ω_i.

Of course, the importance ω_i need not be derived from a single index; multiple indices may be combined. Using the feature information detected by feature detection unit 109 in addition to the correlation information of each scene obtained by correlation calculation unit 104 allows summaries to be created more flexibly.

FIG. 11 shows an example in which a scene that has a high audio level 1101 and is not contained in other moving images is treated as an important scene. That is, the importance ω_i of scene i is obtained by equation (6), where a, b, and c are arbitrary preset constants and s_i is the average audio level in scene i.

If the scene with the highest importance ω_i is selected as the summary of the moving image, scene F is selected.
Other feature information usable for determining the importance includes the scene switching frequency. For example, in sports programs, scenes that users generally consider important, such as hits and pitches in a baseball broadcast or key passes and goals in a soccer broadcast, involve frequent scene switching because of camera changes, replays, close-ups of people, and so on. Accordingly, when scenes are delimited at scene-change positions such as the camera changes mentioned above, the importance ω_i is set so as to include an index that gives a high degree of importance when the scene interval is short and a low degree when it is long. That is, the importance ω_i is obtained by the following equation (7), where l_i is the playback time length of scene i and a, b, and c are preset coefficients.

In addition, whether a scene contains many sections judged to be commercials may be determined from whether the audio is monaural or stereo, and the proportion of each scene occupied by non-commercial content (the main program) may be used as one index, so that commercials are excluded from the summary. The intensity of motion in a scene may also be obtained from motion vector information derived from the video and used as one index for calculating the importance ω_i.

Frames matching preset feature information, such as the face of a specific person, may also be detected by a technique such as pattern matching, and the proportion of such frames in a scene used as one index. Likewise, to raise the importance of scenes containing many regions of high edge strength, such as scenes in which captions (telops) embedded in the video appear, the edge strength of each frame in a scene may be used as one index.

In the above examples of importance calculation, when the moving images handled are compressed in some format, the importance was calculated using information obtainable from the decoded video. Of course, the importance may also be calculated using information available before decoding. For example, when importance is calculated using the motion vectors mentioned above and the video is compressed by an inter-frame predictive coding scheme such as MPEG (Moving Picture Experts Group), the motion vector information obtained from the video information before decoding may be used.

In this way, by creating summaries that take the correlation between moving images into account so that the unique scenes of each moving image are emphasized, the user can more easily grasp the differences in content even when viewing the summaries side by side.

Operation input unit 108 consists of an input device such as a remote control that accepts user operations, and issues instructions to screen composition unit 106, for example through selection of a moving image in the list.

Screen composition unit 106 plays back simultaneously all the summaries that summary creation unit 105 created for the moving images selected by moving image selection unit 102, and displays them as a list on display unit 107.
However, instead of playing all of them simultaneously, only the moving image the user has selected with the cursor may be played as a summary, with representative still images displayed for the other, unselected moving images.

The list may show only the summaries of the moving images, as in FIG. 12(A), or may show the summaries together with accompanying information (for example, title and recording date and time), as in FIG. 12(B).

Furthermore, details may be displayed only for the moving image on which the user has placed the cursor. For example, as in FIG. 12(C), when the user places the cursor on moving image 3 (1201), an enlarged summary of the selected moving image 3 and text information 1203 giving details of the moving image may be displayed. At the position of the enlarged display 1202, moving image 3 before summarization may be played instead, or a summary that does not take the correlation between moving images into account may be played.

Here, a summary that does not take the correlation between moving images into account refers to a summary created using only the feature information obtained from feature detection unit 109, without using the correlation obtained from correlation calculation unit 104. It is created by a known method (for example, JP-A-2003-101939).

As noted above, it is not necessary to display all the moving images selected by moving image selection unit 102 on the screen at once. The list may be scrolled with a scroll bar 1204 as shown in FIG. 12(D), or the moving images may be divided into multiple pages in advance as shown in FIG. 12(E), with the user switching pages by selecting the "Back" button 1205 or the "Next" button 1206.

Display unit 107 consists of a liquid crystal display or the like, and displays the results of user input through condition input unit 103 and operation input unit 108, the list of summarized moving images composed by screen composition unit 106, and so on.

＜Modification of Embodiment 1＞
In Embodiment 1 above (FIG. 1), feature information is calculated on demand from the moving images selected by moving image selection unit 102. However, since the moving images handled by feature detection unit 109 are already recorded, feature information may instead be detected and stored in advance for all moving images.

FIG. 13 is a block diagram showing the functional configuration of moving image list display device 200 according to the modification of Embodiment 1. In the figure, moving image list display device 200 comprises storage unit 201, moving image selection unit 102, condition input unit 103, correlation calculation unit 104, summary creation unit 105, screen composition unit 106, display unit 107, and operation input unit 108. In FIG. 13, functions identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

In this modification, feature information is detected and stored in storage unit 201 in advance for all moving images, and correlation calculation unit 104 and summary creation unit 105 read the feature information of the moving images selected by moving image selection unit 102 directly from storage unit 201.

The feature information recorded in storage unit 201 may also be detected by something other than feature detection unit 109 of this device. Feature information detected in advance may be obtained via broadcast waves, a network, or various recording media and recorded in storage unit 201.

＜Embodiment 2＞
In Embodiment 1, the user may wish to narrow the conditions further and select moving images after the list has been displayed. Embodiment 2 is an example in which the user narrows the group of moving images listed in Embodiment 1 to a subset and checks their content.

FIG. 14 is a block diagram showing the functional configuration of moving image list display device 300 according to Embodiment 2. In the figure, moving image list display device 300 comprises storage unit 101, moving image selection unit 102, condition input unit 103, correlation calculation unit 104, summary creation unit 105, screen composition unit 106, display unit 107, operation input unit 308, and feature detection unit 109. In FIG. 14, functions identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.
Operation input unit 308 accepts an instruction from the user to narrow the listed group of moving images to a subset for checking their content, and passes the instruction to screen composition unit 106.
As in the modification of Embodiment 1 above, feature information may be detected and stored in storage unit 101 in advance for all moving images. In that case, the configuration is based on FIG. 13 instead of FIG. 1.

The operation of Embodiment 2 is described with reference to FIGS. 14 and 15.
Suppose that, as indicated by 1501 in FIG. 15(A), the user requests that only the summaries of moving images 3, 4, 5, and 8 be displayed from among the listed moving images.
FIG. 15(B) shows the display after the user has selected these moving images with the cursor and the selection has been made. At this point, the information that moving images 3, 4, 5, and 8 have been selected and extracted is passed from operation input unit 308 to condition input unit 103 as a moving image selection condition. After these moving images are selected by moving image selection unit 102, the correlation between scenes is calculated again, and new summaries of moving images 3, 4, 5, and 8 are created that take into account only the correlation among these four moving images.

The method of selecting moving images is not limited to direct selection by the user as shown in FIG. 15(A). As shown in FIG. 15(C), the moving images falling within a range 1502, which is sized in advance so that multiple moving images can be selected, may be selected.
Besides selection by range, a subset of the moving images in FIG. 15(A) may be selected by restricting to genre information 1503 determined from, for example, EPG information accompanying the moving images, as shown in FIG. 15(D). The results of a search such as a keyword search may also be used as the selection condition, as shown in FIG. 15(E).

＜Embodiment 3＞
In Embodiment 3, the moving image on which the user is currently focusing is given a summary that does not take correlation into account, and, with that summary as the reference, the summaries of the moving images displayed around it are created so as not to resemble it.
FIG. 16 is a block diagram showing the functional configuration of moving image list display device 400 according to Embodiment 3. In the figure, moving image list display device 400 comprises storage unit 101, moving image selection unit 102, condition input unit 103, correlation calculation unit 104, summary creation unit 405, screen composition unit 106, display unit 107, operation input unit 408, and feature detection unit 109. In FIG. 16, functions identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.
As in the modification of Embodiment 1 above, feature information may be detected and stored in storage unit 101 in advance for all moving images. In that case, the configuration is based on FIG. 13 instead of FIG. 1.

Summary creation unit 405 creates a summary of each moving image selected by moving image selection unit 102, taking into account, in addition to the correlation obtained by correlation calculation unit 104, whether the moving image is the one on which the user is currently focusing.
Operation input unit 408 accepts from the user an indication of which moving image is currently the focus, and passes the indicated moving image to summary creation unit 405.
The operation of Embodiment 3 is described below with reference to FIGS. 16 to 21.

As described for Embodiment 1 (FIG. 1), the user first inputs, through condition input unit 103, the selection conditions for the moving images to be listed. Moving image selection unit 102 selects moving images from storage unit 101 based on those conditions, correlation calculation unit 104 calculates the correlation using the feature information of the selected moving images obtained by feature detection unit 109, and summary creation unit 405 creates summaries that take the correlation into account.

At this point, however, there is no output from operation input unit 408, and summary creation unit 405 behaves in the same way as summary creation unit 105 described in Embodiment 1. The summaries to be listed are then composed by screen composition unit 106 and displayed as a list on display unit 107.

Suppose that, with this list displayed, the user selects through operation input unit 408 the moving image on which the user is currently focusing. Operation input unit 408 then not only reflects the user's operation on display unit 107 via screen composition unit 106 but also conveys it to summary creation unit 405.

When there is a moving image on which the user is focusing, summary creation unit 405 creates for that moving image a summary that does not take into account the correlation with other moving images. That is, when the moving image of interest is designated from operation input unit 408, for example by the user placing the cursor on a listed moving image (FIG. 17), summary creation unit 405 creates summaries by the following procedure.

First, for the moving image on which the user is focusing, a summary is created without using the correlation between moving images calculated by correlation calculation unit 104. Such a summary is created, for example, using audio features as shown in FIG. 18. FIG. 18 illustrates a summary creation method using the audio level, one of the features obtainable from the audio of a moving image. In sports programs in particular, the cheering of the crowd grows louder in important scenes, so scenes with a high audio level can be judged to have high importance.

Accordingly, with respect to the audio level 1801 of the audio in each scene, a summary is created by extracting as important scenes those scenes that contain an audio level exceeding a threshold 1802, where threshold 1802 is an arbitrary preset value. Here, the qualifying scenes B, C, and E are selected, and summary 1 is created.
Of course, techniques other than the above may be used to create a summary from video features without taking the correlation between moving images into account; the technique of a known document (for example, JP-A-11-284948) may be used.
The summary created in this way is referred to here as the summary of the moving image of interest.
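The threshold-based selection of FIG. 18 can be sketched as follows. This is a minimal illustration: the function name and the numeric levels are hypothetical, chosen only so that the result mirrors the scenes B, C, and E named in the text, and it assumes each scene is represented by its peak audio level.

```python
def scenes_above_threshold(peak_levels, threshold):
    """Return the names of scenes whose peak audio level exceeds threshold.

    peak_levels maps scene name -> peak audio level, in order of appearance.
    """
    return [name for name, level in peak_levels.items() if level > threshold]


# Hypothetical peak levels for scenes A to F of moving image 1.
levels = {"A": 0.2, "B": 0.9, "C": 0.8, "D": 0.3, "E": 0.7, "F": 0.4}
summary_1 = scenes_above_threshold(levels, 0.6)
```

Because Python dictionaries preserve insertion order, the selected scenes stay in their original order of appearance.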

Next, the moving images other than the one on which the user is focusing are summarized.
Correlation calculation unit 104 now calculates the correlation between scenes A to F of moving image 1 shown in FIG. 18 and scenes G to L of moving image 2 shown in FIG. 19. Clustering as described above is assumed here, and FIG. 20 plots the result in two dimensions.

Here, the moving images on which the user is not focusing (hereinafter, non-focused moving images) should be summarized so as not to resemble the summary of the moving image of interest. If the moving image of interest is moving image 1 and its summary is summary 1, the importance is lowered for scenes similar to scenes B, C, and E contained in summary 1, that is, for scenes classified into the same cluster as any of scenes B, C, and E.

FIG. 21 shows an example in which scenes not classified into the same cluster as any scene of summary 1, the summary of the moving image of interest, are extracted as the summary. As shown in FIG. 20, scenes G, I, and L, which are classified into the same clusters 2001, 2002, and 2003 as scenes B, C, and E of summary 1, are omitted from summary 2, and summary 2 is created from the remaining scenes H, J, and K.
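The exclusion step of FIG. 21 can be sketched as below. This is an illustration under stated assumptions: the text does not say which of scenes G, I, and L shares a cluster with which of B, C, and E, so the pairing used here, the cluster ids for the remaining scenes, and the function name are all hypothetical.

```python
def summarize_non_focused(scene_clusters, focused_summary, candidate_scenes):
    """Keep candidate scenes whose cluster holds no scene of the focused summary."""
    blocked = {scene_clusters[s] for s in focused_summary}
    return [s for s in candidate_scenes if scene_clusters[s] not in blocked]


# Cluster assignment per scene. Clusters 2001 to 2003 follow FIG. 20;
# the pairing within them and the ids 1 to 6 are made up for illustration.
clusters = {
    "B": 2001, "G": 2001,
    "C": 2002, "I": 2002,
    "E": 2003, "L": 2003,
    "A": 1, "D": 2, "F": 3, "H": 4, "J": 5, "K": 6,
}
summary_2 = summarize_non_focused(clusters, ["B", "C", "E"], list("GHIJKL"))
```

A softer variant would lower the importance of the blocked scenes rather than dropping them outright, which is closer to the wording "the importance is lowered" in the preceding paragraph.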

Of course, as described for summary creation unit 105, summaries need not be created on the basis of the correlation between moving images alone; they may also take into account the various feature information obtained from feature detection unit 109.

As shown by the above configuration, among the listed moving images, the other summaries are created so that they do not resemble the summary created for the moving image on which the user is focusing. The user can thus check the important scenes of the moving image of interest while checking how the surrounding moving images differ from it.

The present invention is not limited to the embodiments described above, and various modifications and corrections are of course possible without departing from the gist of the invention. For example, the functions of the units of the moving image list display device may be implemented as a computer program, which is installed on and executed by the device. Recording this computer program on a removable recording medium, or downloading it via a network or broadcast waves, makes it easy to transfer and to put into practice.

FIG. 1 is a block diagram showing the functional configuration of the moving image list display device according to Embodiment 1.
FIG. 2 is a diagram showing moving images divided into scenes.
FIG. 3 is a two-dimensional plot of the scenes shown in FIG. 2 after clustering.
FIG. 4 is a diagram showing the scenes of FIG. 2 classified into clusters.
FIG. 5 is a diagram showing moving image 1 of FIG. 2 summarized based on a set target playback time.
FIG. 6 is a diagram showing moving images 1 to 4 of FIG. 2 summarized based on a set target playback time.
FIG. 7 is a diagram showing the scenes of each of moving images 1 to 4 of FIG. 2 sorted in order of the obtained importance.
FIG. 8 is a diagram showing the scenes selected for playback from the scenes of moving images 1 to 4 of FIG. 2 based on a target playback time t.
FIG. 9 is a diagram showing moving image 1 of FIG. 2 summarized based on a set threshold.
FIG. 10 is a diagram showing moving image 1 of FIG. 2 summarized by selecting any one scene.
FIG. 11 is a diagram showing summarization based on the similarity and audio level of each scene of FIG. 2.
FIG. 12 is a diagram showing an example of a method of displaying a list of moving images.
FIG. 13 is a block diagram showing the functional configuration of the moving image list display device according to the modification of the embodiment.
FIG. 14 is a block diagram showing the functional configuration of the moving image list display device according to Embodiment 2.
FIG. 15 is a diagram showing selection of arbitrary moving images from the listed moving images.
FIG. 16 is a block diagram showing the functional configuration of the moving image list display device according to Embodiment 3.
FIG. 17 is a diagram showing the user selecting one arbitrary moving image.
FIG. 18 is a diagram showing moving image 1 summarized using the audio level.
FIG. 19 is a diagram showing moving image 2 divided into scenes.
FIG. 20 is a two-dimensional plot of the clustering result for the scenes of moving images 1 and 2 shown in FIGS. 18 and 19.
FIG. 21 is a diagram showing moving image 2 of FIG. 19 summarized.

Explanation of symbols

100, 200, 300, 400 ... moving image list display device; 101, 201 ... storage unit; 102 ... moving image selection unit; 103 ... condition input unit; 104 ... correlation calculation unit; 105, 405 ... summary creation unit; 106 ... screen composition unit; 107 ... display unit; 108, 308, 408 ... operation input unit; 109 ... feature detection unit; 301 ... scene cluster 1; 302 ... scene cluster 2; 1101, 1801 ... audio level graph; 1802 ... threshold; 1201 ... selected moving image; 1202 ... enlarged video thumbnail of the selected moving image; 1203 ... text information giving details of the moving image; 1204 ... scroll bar; 1205 ... "Back" button for returning to the previous page; 1206 ... "Next" button for advancing to the next page; 1501 ... selected moving images; 1502 ... moving images selected by range; 1503 ... selected category information; 2001 ... scene cluster 1; 2002 ... scene cluster 2; 2003 ... scene cluster 3.

Claims

1. A moving image list display device comprising: a storage unit that stores moving images, each composed of a plurality of kinds of information including video and audio; a selection unit that selects from the storage unit, according to a predetermined condition, a plurality of moving images to be displayed in a list; a feature detection unit that detects feature information for each moving image as a whole or for each of its scenes, based on information obtained from the moving image itself or information attached to the moving image; a correlation calculation unit that uses the feature information detected by the feature detection unit to calculate the correlation between scenes, for each scene of the plurality of moving images selected by the selection unit; a summary creation unit that creates a summary of each moving image selected by the selection unit, based on the correlation between scenes calculated by the correlation calculation unit and the feature information detected by the feature detection unit; and a list display unit that displays a list of the summaries created by the summary creation unit.

2. The moving image list display device according to claim 1, wherein the predetermined condition is given by genre designation or keyword designation.

3. The moving image list display device according to claim 1, further comprising an operation input unit that receives the selection of a specific moving image from a list of a moving image group, wherein the predetermined condition is given by a user's selection operation input through the operation input unit.

4. The moving image list display device according to claim 1, wherein the information obtained from the moving image itself or the information attached to the moving image is one of, or a combination of two or more of, video information, audio information, caption information, EPG, and tag information.

5. The moving image list display device according to claim 1, wherein the correlation calculation unit calculates the correlation between the scenes using a clustering method.
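The claim above does not name a particular clustering method. As a minimal sketch only, the following uses greedy leader clustering (a simple threshold-based method swapped in for illustration) to group scenes from several moving images and to count, for each scene, how many similar scenes exist. The `Scene` type, the two-dimensional feature vectors, and the `radius` parameter are all hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    video_id: str    # moving image the scene belongs to
    index: int       # position of the scene within that moving image
    features: tuple  # hypothetical feature vector for the scene


def cluster_scenes(scenes, radius):
    """Greedy leader clustering (illustrative choice, not the patent's method).

    A scene joins the first cluster whose leader (its first member) lies
    within `radius` in Euclidean distance; otherwise it starts a new cluster.
    """
    clusters = []
    for scene in scenes:
        for cluster in clusters:
            if math.dist(scene.features, cluster[0].features) <= radius:
                cluster.append(scene)
                break
        else:
            clusters.append([scene])
    return clusters


scenes = [
    Scene("video1", 0, (0.1, 0.2)),
    Scene("video1", 1, (0.9, 0.8)),
    Scene("video2", 0, (0.15, 0.25)),
    Scene("video2", 1, (0.5, 0.1)),
]
clusters = cluster_scenes(scenes, radius=0.2)

# Number of similar scenes per scene = size of its cluster minus itself.
similar_counts = {
    (s.video_id, s.index): len(cluster) - 1
    for cluster in clusters
    for s in cluster
}
```

Here the opening scenes of the two moving images land in the same cluster, so each has one similar scene, while the remaining scenes have none.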

6. The moving image list display device according to claim 1, wherein the summary creation unit determines the time length for which each scene is reproduced according to the number of similar scenes, based on the correlation between scenes calculated by the correlation calculation unit, and creates the summary of each moving image by reproducing each scene included in the moving image for the determined time length.

7. The moving image list display device according to claim 6, wherein the time length of a scene is made shorter as the number of similar scenes increases.
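The two claims above state only that a scene's playback length depends on its number of similar scenes and becomes shorter as that number grows. One possible rule, given purely as an assumption, divides a base length by one plus the number of similar scenes and clamps the result to a minimum. The function name, the formula, and the `min_seconds` floor are hypothetical.

```python
def scene_play_length(base_seconds, similar_count, min_seconds=1.0):
    """Return a playback length that shrinks as similar scenes increase.

    Hypothetical rule: base / (1 + similar_count), never below min_seconds.
    """
    if similar_count < 0:
        raise ValueError("similar_count must be non-negative")
    return max(min_seconds, base_seconds / (1 + similar_count))


# A scene with no similar scenes keeps its full length; common scenes shrink.
lengths = [scene_play_length(12.0, n) for n in (0, 1, 3, 23)]
```

Any monotonically non-increasing function of the similar-scene count would satisfy the same wording; the reciprocal is used here only because it is simple.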

8. The moving image list display device according to claim 7, wherein the summary creation unit further adjusts the time length of the scene based on the feature information obtained from the feature detection unit.

9. The moving image list display device according to claim 6, wherein, for a scene determined from the feature information obtained from the feature detection unit to clearly show a difference from other moving images, the summary creation unit does not change the playback time of that scene.

10. The moving image list display device according to claim 6, wherein the summary creation unit further adjusts the time length of the scene depending on whether the similar scenes belong to the same moving image or to different moving images.

11. The moving image list display device according to claim 1, wherein the summary creation unit creates the summary of each moving image by collecting scenes whose number of similar scenes, based on the correlation between scenes calculated by the correlation calculation unit, is smaller than a predetermined number.
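The selection rule in the claim above can be sketched as a simple filter. This is an illustration under stated assumptions: the input dictionary, which maps a hypothetical `(video_id, scene_index)` pair to that scene's number of similar scenes, and the `max_similar` threshold are invented for the example.

```python
def select_distinctive(similar_counts, max_similar):
    """Keep, per moving image, the scenes with fewer than max_similar similar scenes.

    similar_counts maps (video_id, scene_index) -> number of similar scenes.
    Returns {video_id: [scene_index, ...]} with scenes in their original order.
    """
    summaries = {}
    for (video_id, index), count in sorted(similar_counts.items()):
        kept = summaries.setdefault(video_id, [])  # keep an entry even if empty
        if count < max_similar:
            kept.append(index)
    return summaries


counts = {
    ("video1", 0): 3,
    ("video1", 1): 0,
    ("video2", 0): 3,
    ("video2", 1): 1,
    ("video2", 2): 0,
}
summaries = select_distinctive(counts, max_similar=2)
```

With a threshold of 2, the scene shared by both moving images is dropped and only the scenes with few counterparts remain in each summary.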

12. The moving image list display device according to claim 1, wherein the summary creation unit determines a target reproduction time for each moving image and, based on the correlation between scenes calculated by the correlation calculation unit, creates the summary of each moving image by collecting scenes that clearly show differences from other moving images, to the extent that they can be reproduced within the target reproduction time.
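The claim above does not say how scenes are chosen to fit the target reproduction time. One plausible reading, offered only as a sketch, is a greedy fill: take scenes in order of increasing number of similar scenes and add each one that still fits. The dictionary keys `index`, `duration`, and `similar` are hypothetical, as is the use of the similar-scene count as the measure of how clearly a scene differs from other moving images.

```python
def summarize_to_target(scenes, target_seconds):
    """Greedily pick the most distinctive scenes that fit in target_seconds.

    Each scene is a dict with 'index', 'duration' (seconds) and 'similar'
    (number of similar scenes). Fewer similar scenes means more distinctive.
    Returns the chosen scene indices in playback order.
    """
    chosen = []
    remaining = target_seconds
    for scene in sorted(scenes, key=lambda s: (s["similar"], s["index"])):
        if scene["duration"] <= remaining:
            chosen.append(scene["index"])
            remaining -= scene["duration"]
    return sorted(chosen)


scenes = [
    {"index": 0, "duration": 10, "similar": 4},
    {"index": 1, "duration": 8, "similar": 0},
    {"index": 2, "duration": 6, "similar": 1},
    {"index": 3, "duration": 5, "similar": 2},
]
picked = summarize_to_target(scenes, target_seconds=15)
```

A greedy fill does not always use the target time optimally; it is shown because it keeps the priority on distinctive scenes explicit.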

13. The moving image list display device according to claim 1, further comprising an operation input unit that receives the designation of a specific moving image from a list of a moving image group, wherein the summary creation unit creates the summary of the moving image designated through the operation input unit independently of the correlation between scenes calculated by the correlation calculation unit, and creates the summaries of the other moving images by excluding scenes similar to the scenes included in the summary of the designated moving image.
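The exclusion step in the claim above can be illustrated as follows. This is a sketch under assumptions: scene similarity is represented by a hypothetical cluster label per scene, and two scenes count as similar when they share a label. The function and parameter names are invented for the example.

```python
def summarize_with_designated(scene_clusters, designated, designated_summary):
    """Build summaries around a user-designated moving image.

    scene_clusters maps video_id -> list of cluster labels, one per scene.
    designated_summary lists the scene indices already chosen for the
    designated moving image (chosen without regard to scene correlation).
    Other moving images drop every scene sharing a cluster with those scenes.
    """
    excluded = {scene_clusters[designated][i] for i in designated_summary}
    result = {designated: list(designated_summary)}
    for video_id, labels in scene_clusters.items():
        if video_id == designated:
            continue
        result[video_id] = [
            i for i, label in enumerate(labels) if label not in excluded
        ]
    return result


scene_clusters = {
    "video1": ["A", "B", "C"],
    "video2": ["A", "D"],
    "video3": ["B", "C", "E"],
}
result = summarize_with_designated(scene_clusters, "video1", [0, 1])
```

Note that the cluster "C" scene of video3 is kept: only scenes similar to those actually included in the designated summary are removed.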

14. The moving image list display device according to claim 1, wherein the storage unit stores feature information detected in advance for each moving image, and the correlation calculation unit and the summary creation unit read from the storage unit the feature information of the moving images selected by the selection unit.

15. A moving image list display device comprising: a storage unit that stores moving images, each composed of a plurality of kinds of information including video and audio, together with feature information of the moving images; a selection unit that selects from the storage unit, according to a predetermined condition, a plurality of moving images to be displayed in a list; a correlation calculation unit that uses the feature information stored in the storage unit to calculate the correlation between scenes, for each scene of the plurality of moving images selected by the selection unit; a summary creation unit that creates a summary of each moving image selected by the selection unit, based on the correlation between scenes calculated by the correlation calculation unit and the feature information stored in the storage unit; and a list display unit that displays a list of the summaries created by the summary creation unit.