JP5898117B2

JP5898117B2 - Video summarization apparatus, video summarization method, and video summarization program

Info

Publication number: JP5898117B2
Application number: JP2013053671A
Authority: JP
Inventors: 周平田良島; 大我吉田; けん筒口; 啓之新井; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-03-15
Filing date: 2013-03-15
Publication date: 2016-04-06
Anticipated expiration: 2033-03-15
Also published as: JP2014179888A

Description

The present invention relates to a video summarization apparatus, a video summarization method, and a video summarization program for generating a summary video from a video.

Video media accessible to individuals via broadcast and the Internet is already enormous in scale and continues to grow. For example, one video sharing site reportedly receives 72 hours of video uploads every minute. Video is a medium with a time axis, and in general its content cannot be understood without watching it. Grasping an overview of, or finding desired information in, a volume of video that is difficult or impossible to watch therefore requires considerable effort. With this problem in mind, many video summarization techniques have been invented that compress video into a short duration so that an overview can be grasped and information found quickly.

Here, video summarization refers to a technique for selecting, from a group of video sections obtained from one or more videos, the video sections to be included in a summary. Representative methods are described in Patent Document 1 and Non-Patent Document 1. Patent Document 1 discloses a technique that lets viewers adjust the weights of various features extracted from video sections and extracts the sections whose features exceed a certain threshold, thereby generating a summary video that matches the viewer's preferences. Non-Patent Document 1, noting that news programs broadcast close together in time are likely to cover the same topics, discloses a technique for clustering news video sections that are similar in both appearance and meaning.

Patent Document 1: JP 2012-44390 A

Non-Patent Document 1: W.-T. Chu, C.-C. Huang and W.-F. Cheng: News Story Clustering from Both What and How Aspects: Using Bag of Word Model and Affinity Propagation, in Proc. AIEM-Pro, pp. 7-12, 2011

In the current situation, where video exists at a scale that is difficult to watch, a video summarization technique should satisfy the following items (1) to (3).
(1) The selected video sections do not duplicate one another.
(2) The data that must be kept permanently in a storage device such as a hard disk for summarization is small.
(3) A summary video can be generated quickly from the data stored in the storage device.

Regarding (1), a summary video must convey as much information as possible in a short time, so the selected video sections should not duplicate one another. Duplication should be avoided not only within each individual video but also among the video sections obtained from all videos being summarized.

Regarding (2), generating a summary video from a very large volume of video generally requires analyzing a very large amount of data. In particular, data that must be kept permanently in a storage device such as a hard disk determines the required storage capacity and therefore the operating cost. To achieve low-cost summarization, the amount of data that must always be kept in the storage device should be small.

Finally, regarding (3), summarization based on the data stored in the storage device is expected to be repeated whenever video data is added, or to be run on demand, for example to reflect individual preferences and interests. The processing that outputs a summary video from the stored data should therefore be especially fast.

In Patent Document 1, however, the selection of video sections for the summary is controlled only by feature thresholds, so the extracted sections may all be similar to one another. Requirement (1) is not considered at all.

Non-Patent Document 1, on the other hand, groups similar video sections by clustering, which solves the problem of duplicated video.

However, the clustering used in Non-Patent Document 1 requires the similarities between all pairs of video sections to be stored in the storage device. One hour of video is generally said to consist of about 600 visually distinct scenes. Summarizing 100 hours of video would therefore involve about 60,000 scenes and a similarity matrix with roughly 4 billion elements, so a very large storage device would be needed to hold the data. In addition, Affinity Propagation, which is used in Non-Patent Document 1, and other clustering methods have a computational cost that grows linearly or faster with the data size. When the number of videos to be summarized is very large, the computational cost of generating a summary video from the stored data becomes enormous.
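The storage estimate above can be checked with a short calculation. The following is a minimal sketch; the figure of 4 bytes per similarity value is an assumption made here for illustration and does not come from the text.

```python
# Back-of-the-envelope size of a full pairwise similarity matrix.
SCENES_PER_HOUR = 600        # figure quoted in the text
BYTES_PER_SIMILARITY = 4     # assumption: one 32-bit float per entry


def similarity_matrix_entries(hours: float, scenes_per_hour: int = SCENES_PER_HOUR) -> int:
    """Number of entries in the full (n x n) similarity matrix."""
    n = int(hours * scenes_per_hour)
    return n * n


def similarity_matrix_bytes(hours: float) -> int:
    """Storage needed under the assumed entry size."""
    return similarity_matrix_entries(hours) * BYTES_PER_SIMILARITY


if __name__ == "__main__":
    print(similarity_matrix_entries(100))        # entries for 100 hours of video
    print(similarity_matrix_bytes(100) / 1e9)    # size in gigabytes
```

For 100 hours this gives 60,000 scenes and 3.6 billion entries, which the text rounds to about 4 billion; under the assumed entry size that is 14.4 GB for the matrix alone.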

The present invention has been made in view of these circumstances, and its object is to provide a video summarization apparatus, a video summarization method, and a video summarization program that can generate a summary video while limiting the growth of hardware resources and computational cost.

The present invention is a video summarization apparatus for generating a summary video from a video, comprising: a graph construction unit that, based on the similarity between each video section constituting the video and the other video sections, represents the video as a graph in which each video section is a node and the similarity between nodes is an edge; a query extraction unit that extracts, from the nodes constituting the graph, query nodes that serve as starting points for graph partitioning; a graph partitioning unit that partitions the graph by generating, with each query node as a starting point, a subgraph composed of nodes in the vicinity of that query node; a cluster generation unit that generates clusters by merging subgraphs that overlap heavily; a representative node extraction unit that extracts a representative node from the nodes constituting each cluster; and a summary video output unit that generates and outputs the summary video using the video sections corresponding to the representative nodes.

In the present invention, the graph partitioning unit adopts, as the subgraph, a candidate subgraph that satisfies a local evaluation index representing cluster-likeness.

In the present invention, the representative node extraction unit extracts, as the representative node, a node that has many similar elements within the cluster.

In the present invention, the summary video output unit ranks the representative nodes by the size of the cluster to which each belongs, and generates the summary video by joining the video sections corresponding to the highest-ranked nodes in order until the specified summary video length is reached.
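The representative-node selection and ranking just described can be sketched as follows. This is a minimal illustration under assumptions: the graph is an adjacency dictionary, "many similar elements within the cluster" is interpreted as the number of neighbors inside the cluster, and the names `representative`, `build_summary`, and `durations` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # node id -> set of adjacent node ids


def representative(cluster: Set[str], graph: Graph) -> str:
    """Pick the node with the most neighbours inside the cluster."""
    # sorted() makes tie-breaking deterministic (smallest id wins).
    return max(sorted(cluster), key=lambda n: len(graph[n] & cluster))


def build_summary(clusters: List[Set[str]], graph: Graph,
                  durations: Dict[str, float], target_length: float) -> List[str]:
    """Rank clusters by size and add their representatives until the
    accumulated duration reaches target_length (the last section may overshoot)."""
    ranked = sorted(clusters, key=len, reverse=True)
    summary: List[str] = []
    total = 0.0
    for cluster in ranked:
        if total >= target_length:
            break
        node = representative(cluster, graph)
        summary.append(node)
        total += durations[node]
    return summary
```

The returned list holds the node identifiers in ranking order; joining the corresponding video sections is left to the caller.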

The present invention is also a video summarization method performed by a video summarization apparatus that generates a summary video from a video, comprising: a graph construction step of representing the video, based on the similarity between each video section constituting the video and the other video sections, as a graph in which each video section is a node and the similarity between nodes is an edge; a query extraction step of extracting, from the nodes constituting the graph, query nodes that serve as starting points for graph partitioning; a graph partitioning step of partitioning the graph by generating, with each query node as a starting point, a subgraph composed of nodes in the vicinity of that query node; a cluster generation step of generating clusters by merging subgraphs that overlap heavily; a representative node extraction step of extracting a representative node from the nodes constituting each cluster; and a summary video output step of generating and outputting the summary video using the video sections corresponding to the representative nodes.

The present invention is also a video summarization program for causing a computer to function as the video summarization apparatus.

According to the present invention, a summary video can be generated quickly while limiting the growth of hardware resources and computational cost.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the video summarization apparatus 1 shown in FIG. 1.
FIG. 3 is a diagram showing an example of graph construction when each video is divided into sections.
FIG. 4 is a diagram showing an example of graph construction when the videos are not divided.
FIG. 5 is a diagram showing an example of graph partitioning.
FIG. 6 is a diagram showing an example of cluster generation.

A video summarization apparatus according to an embodiment of the present invention is described below with reference to the drawings. In the following description, a set of one or more videos is called a video group. FIG. 1 is a block diagram showing the configuration of the embodiment. In the figure, reference numeral 1 denotes a video summarization apparatus implemented on a computer. Reference numeral 11 denotes a video input unit that inputs a video group. Reference numeral 12 denotes a storage unit that stores the video group input by the video input unit. The stored data may include, in addition to the video itself, context information called metadata. Examples of metadata include the title and synopsis of a video, and utterance content with timestamps obtained by speech recognition or from closed captions.

Reference numeral 13 denotes a graph construction unit that constructs a graph based on the video group stored in the storage unit 12. Reference numeral 14 denotes a query extraction unit that extracts, from the undirected neighborhood graph obtained by the graph construction unit 13, the query nodes used in graph partitioning. Reference numeral 15 denotes a graph partitioning unit that performs graph partitioning with each query obtained by the query extraction unit 14 as a starting point. Reference numeral 16 denotes a cluster generation unit that generates clusters based on the graph partitioning results. Reference numeral 17 denotes a representative node extraction unit that extracts a representative node from each cluster obtained by the cluster generation unit 16. Reference numeral 18 denotes a summary video output unit that outputs a summary video by joining the video sections corresponding to the representative nodes obtained by the representative node extraction unit 17.

Next, the processing operation of the video summarization apparatus 1 shown in FIG. 1 is described with reference to FIG. 2, which is a flowchart of that operation. First, the video input unit 11 receives a video group from outside and stores it in the storage unit 12 (step S1). Once the video group has been stored, the graph construction unit 13 constructs, from the videos stored in the storage unit 12, a graph that represents video sections as nodes and the similarity between nodes as edges (step S2). The video section corresponding to each node defined here is a candidate for inclusion in the output summary video. A video section may be an entire video, or, when the individual videos are long, for example, it may be defined by dividing each video. As one example, the following describes the processing in which each video is divided into sections and the graph is constructed from the similarities among the resulting video sections.

First, the graph construction unit 13 divides each video stored in the storage unit 12 into video sections. The videos may be divided at fixed intervals, or at cut points, which are points where the appearance changes discontinuously, using information such as that described in Reference 1, "Y. Tonomura, A. Akutsu, Y. Taniguchi and G. Suzuki: Structured Video Computing, IEEE Multimedia, pp. 34-43, 1994". When timestamped metadata such as closed captions has also been input by the video input unit 11, the video may be divided according to those timestamps.
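The two simplest division strategies, fixed intervals and division at given timestamps, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, times are in seconds, and cut-point detection itself (Reference 1) is not implemented; the cut or caption times are assumed to be supplied.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (start, end) in seconds


def split_fixed(duration: float, interval: float) -> List[Section]:
    """Split [0, duration) into consecutive sections of at most `interval` seconds."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    sections: List[Section] = []
    start = 0.0
    while start < duration:
        end = min(start + interval, duration)
        sections.append((start, end))
        start = end
    return sections


def split_at_times(duration: float, cut_times: List[float]) -> List[Section]:
    """Split at given timestamps (for example cut points or caption times).
    Timestamps outside the open interval (0, duration) are ignored."""
    inner = sorted(t for t in set(cut_times) if 0.0 < t < duration)
    bounds = [0.0] + inner + [duration]
    return list(zip(bounds[:-1], bounds[1:]))
```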

Next, features are extracted from each of the resulting video sections. The features of a video section consist of at least one of (i) moving-image features obtained by analyzing the moving images, (ii) audio features obtained by analyzing the audio, and (iii) metadata features, when metadata has been supplied through the video input unit. These features are represented as histograms or vectors. Any features may be used.

For moving-image features, for example, a color histogram obtained by counting the values along each axis of the L*a*b* color space can be used, as can a vector obtained with the GIST descriptor, which represents scene-level features and is described in Reference 2, "A. Oliva and A. Torralba: Building the Gist of a Scene: The Role of Global Image Features in Recognition, Progress in Brain Research, 155, pp. 23-36, 2006".

For audio features, for example, Mel-Frequency Cepstral Coefficients (MFCC), which represent features related to the prosody of speech, can be used. For metadata features, for example, each video section can be treated as a document and the metadata attached to it as words; a TF-IDF value is computed for each word, and a document vector with those values as its elements is used.
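The metadata feature described above can be illustrated with a small TF-IDF computation. This is a sketch under assumptions: each video section is given as a list of metadata words, and the weighting uses one common TF-IDF variant (relative term frequency times log(N / document frequency)); the text does not specify which variant is intended.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(sections: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (word -> weight) per video section."""
    n = len(sections)
    # Document frequency: number of sections in which each word appears.
    df = Counter(word for words in sections for word in set(words))
    vectors: List[Dict[str, float]] = []
    for words in sections:
        tf = Counter(words)
        total = len(words)
        vectors.append({
            word: (count / total) * math.log(n / df[word])
            for word, count in tf.items()
        })
    return vectors
```

With this variant, a word that appears in every section receives weight 0, so only words that distinguish sections contribute to the similarity.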

The features extracted here are no longer needed once the graph has been obtained, so they do not necessarily have to be stored in the storage unit 12.

Next, using the resulting video sections and their features, an undirected neighborhood graph is constructed that represents each video section as a node and the similarity relationships between nodes as edges. FIG. 3 shows an example of graph construction when each video is divided. As shown in FIG. 3, video A (the first video) is divided into video sections a1, a2, a3, ..., and video B (the second video) is divided into video sections b1, b2, b3, .... The third and fourth videos are handled in the same way. An undirected neighborhood graph is then constructed in which each video section is a node (shown as a circle in FIG. 3) and each similarity relationship between nodes is an edge (shown as a straight line in FIG. 3). In FIG. 3, a1 to a3 and b1 to b3 are candidate video sections for inclusion in the summary.

The graph can be constructed in any form. For example, a k-nearest-neighbor graph, in which each node has edges only to its k nearest nodes, or an ε-graph, in which each node has edges only to nodes within distance ε of it, may be constructed. Here k is a positive integer parameter and ε is a positive real parameter. The similarity or distance between nodes is computed between the features extracted from the video sections. Any similarity or distance measure can be used; known measures such as cosine similarity or the Jaccard coefficient for similarity, and Euclidean distance or chi-square distance for distance, are suitable.
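The four measures named above can be written in a few lines each. This is a minimal sketch: feature vectors are assumed to be equal-length sequences of numbers, sets are used for the Jaccard coefficient, and the chi-square distance is given in one common form (with a factor of 1/2), since several conventions exist.

```python
import math
from typing import Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_square(a: Sequence[float], b: Sequence[float]) -> float:
    """Chi-square distance between two histograms; empty bins are skipped."""
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)
```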

The simplest way to construct a neighborhood graph is to compute the similarity or distance between every pair of nodes and, for each node, select the neighbors that meet the condition. The cost of this approach is high, however, because its computational complexity is proportional to the square of the number of data items. Many known techniques construct neighborhood graphs quickly, and any of them may be used. For example, the techniques of Reference 3, "W. Dong, M. Charikar and K. Li: Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures, in Proc. WWW, pp. 577-586, 2011", Reference 4, "W. Liu, J. He and S.F. Chang: Large Graph Construction for Scalable Semi-Supervised Learning, in Proc. ICML, 2010", and Reference 5, "J. Chen, H. Fang and Y. Saad: Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection, Journal of Machine Learning Research, 10, pp. 1989-2012, 2009", can be applied.
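For reference, the brute-force construction described above looks roughly like this. It is a sketch of the simple quadratic method only, not of the faster techniques in References 3 to 5; Euclidean distance and the symmetrization rule (an edge exists if either endpoint lists the other among its k nearest) are assumptions made here.

```python
import math
from typing import Dict, Sequence, Set


def knn_graph(features: Sequence[Sequence[float]], k: int) -> Dict[int, Set[int]]:
    """Brute-force undirected k-nearest-neighbour graph over feature vectors.
    Nodes are the indices of `features`."""
    n = len(features)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        # n - 1 distance computations per node, so O(n^2) overall.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            graph[i].add(j)
            graph[j].add(i)  # store the edge in both directions (undirected)
    return graph
```

Only the resulting adjacency sets need to be kept; the distances and the features themselves can be discarded once the graph is built.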

The graph construction unit 13 stores the information of the constructed graph in the storage unit 12. Because the graph structure consists only of node and edge information, the size of the data stored in the storage unit 12 can be kept small. Any database can be used to store the graph structure. A graph database whose data structure does not require an index lookup to access adjacent nodes is particularly suitable, because it allows fast access to the data.

The description above covered the operation of the graph construction unit 13 when the videos are divided. When the videos are not divided, the processing may be performed as shown in FIG. 4, which shows an example of graph construction without division. As shown in FIG. 4, an undirected neighborhood graph is constructed in which each of videos 1 to N is a node (shown as a circle in FIG. 4) and each similarity relationship between nodes is an edge (shown as a straight line in FIG. 4). In FIG. 4, videos 1 to N are the candidate video sections for inclusion in the summary. The only difference between the processing of FIG. 3 and that of FIG. 4 is that FIG. 4 has no step that divides videos into sections. In other words, the processing shown in FIG. 4 is the same as that of FIG. 3 in the case where each video contains exactly one video section.

Next, the query extraction unit 14 extracts, from the undirected neighborhood graph obtained by the graph construction unit 13, the query nodes used in graph partitioning (step S3). Any method can be used to extract the query nodes. For example, (i) nodes with a high degree (the number of edges incident to a node) can be extracted as queries, or (ii) an arbitrary number of nodes can be extracted at random as queries.

In addition, when the user's interests and preferences are known, queries may be extracted based on that information. For example, (iii) when video sections that the user has shown interest in are given, those sections can be used as queries, or (iv) when metadata is attached to the nodes, the video sections carrying metadata that the user is interested in can be extracted as queries.
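Strategies (i), (ii), and (iv) can be sketched as follows; strategy (iii) needs no code because the given sections are used directly. The function names, the adjacency-dictionary graph, and the representation of metadata as sets of terms are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # node id -> set of adjacent node ids


def top_degree_queries(graph: Graph, num_queries: int) -> List[int]:
    """(i) Nodes with the highest degree; ties are broken by node id."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:num_queries]


def random_queries(graph: Graph, num_queries: int, seed: Optional[int] = None) -> List[int]:
    """(ii) A random sample of nodes; `seed` makes the choice reproducible."""
    rng = random.Random(seed)
    return rng.sample(sorted(graph), min(num_queries, len(graph)))


def metadata_queries(metadata: Dict[int, Set[str]], interests: Set[str]) -> List[int]:
    """(iv) Nodes whose metadata shares at least one term with the user's interests."""
    return sorted(n for n, terms in metadata.items() if terms & interests)
```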

Next, the graph partitioning unit 15 performs graph partitioning with each query obtained by the query extraction unit 14 as a starting point (step S4). FIG. 5 shows an example of graph partitioning. As shown in FIG. 5, the graph is partitioned by following edges in the vicinity of the starting point, which is the query node. Graph partitioning here means outputting, from among the subgraphs composed of one or more nodes in the vicinity of the query node, the subgraph that is most "cluster-like". Evaluation indices of cluster-likeness fall broadly into two types: global indices, which take the information of the entire graph into account, and local indices, which take only local information of the graph into account. Here the latter, a local evaluation index, is used for graph partitioning.

Because a local evaluation index is used, graph partitioning from each individual query does not depend on the data size of the entire graph, which makes fast processing possible. Graph partitioning is also completely independent for each query and can easily be parallelized, which further contributes to speed. Moreover, because the evaluation index strongly reflects the locality of the graph, the resulting subgraphs strongly reflect locality as well. This yields subgraphs whose elements are clearly similar to one another, which helps to include in the summary video the video sections that are most effective for grasping an overview of the video group.
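Because each query is processed independently, the per-query work can be handed to a pool of workers. The sketch below shows only this structure: `local_subgraph` is a deliberately trivial placeholder (the query node plus its direct neighbors), not the partitioning method of the text, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]


def local_subgraph(graph: Graph, query: int) -> FrozenSet[int]:
    """Placeholder local partition: the query node and its direct neighbours.
    It reads only the neighbourhood of `query`, never the whole graph."""
    return frozenset({query} | graph[query])


def partition_all(graph: Graph, queries: List[int], max_workers: int = 4) -> List[FrozenSet[int]]:
    """Run the local partition for every query; results keep the query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda q: local_subgraph(graph, q), queries))
```

In CPython, threads mainly help when the graph lives in an external store and the work is I/O-bound; for CPU-bound partitioning a process pool or separate machines would be the more natural choice.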

Any local evaluation index of cluster-likeness can be used. For example, the index called density, shown in equation (1), or the index called conductance, shown in equation (2), may be used.

In equations (1) and (2), vol(S) is the sum of the degrees of the nodes in subgraph S, δ(S) is the number of edges connecting subgraph S to the rest of the graph, and |S| is the number of nodes in the subgraph. The larger the density, and the smaller the conductance, the more cluster-like the subgraph.
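Equations (1) and (2) are not reproduced in this text, so the sketch below relies on commonly used definitions that are assumptions here rather than the patent's own formulas: conductance is taken as δ(S) / vol(S), and density as the number of edges inside S divided by |S|, that is (vol(S) - δ(S)) / (2|S|). Both use only the quantities defined above and behave as described (larger density and smaller conductance indicate a more cluster-like subgraph).

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph without self-loops


def volume(graph: Graph, s: Set[int]) -> int:
    """vol(S): sum of the degrees of the nodes in S."""
    return sum(len(graph[n]) for n in s)


def boundary(graph: Graph, s: Set[int]) -> int:
    """delta(S): number of edges with exactly one endpoint in S."""
    return sum(1 for n in s for m in graph[n] if m not in s)


def conductance(graph: Graph, s: Set[int]) -> float:
    """Assumed form: delta(S) / vol(S); smaller is more cluster-like."""
    vol = volume(graph, s)
    return boundary(graph, s) / vol if vol else 1.0


def density(graph: Graph, s: Set[int]) -> float:
    """Assumed form: internal edges per node; larger is more cluster-like."""
    internal_edges = (volume(graph, s) - boundary(graph, s)) / 2
    return internal_edges / len(s) if s else 0.0
```

All four functions read only the adjacency lists of the nodes in S, which is what makes the indices local.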

Of these two indices, conductance yields better results than density when used as the evaluation index, so adopting conductance is preferable.

Several graph partitioning algorithms that use conductance as the index have been proposed, and any of these known techniques may be used here. For example, Reference 6, "D. A. Spielman and S.-H. Teng: A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time Graph Partitioning, CoRR, abs/0809.3232, 2008", discloses a method that computes the transition probabilities from the query node to neighboring nodes based on a transition probability matrix obtained from the degree of each node in the graph, and outputs, from among the subgraphs composed of nodes with high transition probability, the one that minimizes conductance.

Reference 7, "R. Andersen and F. Chung: Detecting Sharp Drops in PageRank and a Simplified Local Partitioning Algorithm, in Proc. TAMC, pp. 1-12, 2008", uses a PageRank matrix in place of the transition probability matrix. It discloses a method that outputs, from among the subgraphs composed of nodes of high importance with respect to the query node, the one that minimizes conductance.
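To make the PageRank-based idea concrete, the sketch below combines a push-style approximate personalized PageRank with a sweep over nodes ordered by score divided by degree. It is a widely used local partitioning scheme in the same family, given here as an illustration under assumptions; it is not claimed to be the exact procedure of Reference 6 or Reference 7. The parameters `alpha`, `eps`, and `total_volume` (twice the number of edges, assumed to be stored alongside the graph) are hypothetical names, and the sweep uses the symmetric conductance δ(S) / min(vol(S), total_volume - vol(S)).

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph without self-loops


def approximate_ppr(graph: Graph, seed: int, alpha: float = 0.15,
                    eps: float = 1e-4) -> Dict[int, float]:
    """Push-style approximate personalised PageRank from `seed`.
    Only nodes near the seed are ever touched."""
    p: Dict[int, float] = {}
    r: Dict[int, float] = {seed: 1.0}   # residual probability mass
    queue = [seed]
    while queue:
        u = queue.pop()
        deg = len(graph[u])
        if deg == 0 or r.get(u, 0.0) < eps * deg:
            continue                     # nothing worth pushing from u
        ru = r[u]
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2      # lazy walk: half stays at u
        share = (1 - alpha) * ru / (2 * deg)
        for v in graph[u]:
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg:
            queue.append(u)
    return p


def sweep_cut(graph: Graph, p: Dict[int, float],
              total_volume: int) -> Tuple[Set[int], float]:
    """Scan prefixes of nodes ordered by p[n] / degree and return the
    prefix with the smallest conductance, together with that conductance."""
    order = sorted(p, key=lambda n: (-p[n] / len(graph[n]), n))
    best: Set[int] = set()
    best_phi = float("inf")
    current: Set[int] = set()
    vol = cut = 0
    for n in order:
        inside = sum(1 for m in graph[n] if m in current)
        current.add(n)
        vol += len(graph[n])
        cut += len(graph[n]) - 2 * inside   # new boundary edges minus absorbed ones
        denom = min(vol, total_volume - vol)
        if denom <= 0:
            break
        phi = cut / denom
        if phi < best_phi:
            best, best_phi = set(current), phi
    return best, best_phi
```

The number of push operations is bounded by 1 / (alpha * eps) regardless of the size of the graph, which reflects the locality property discussed in the text.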

Reference 8 "R. Andersen and Y. Peres: Finding Sparse Cuts Locally Using Evolving Sets, in Proc. STOC, pp. 235-244, 2009" discloses a method that simulates the state transition probabilities near the starting point by sampling and outputs a subgraph that minimizes conductance without using the transition probability matrix of the entire graph.

Among the graph partitioning processes disclosed in References 6 to 8, the process of Reference 8 is the fastest. On the other hand, because the technique disclosed in Reference 8 uses stochastically sampled values, the subgraph obtained does not necessarily match from one trial to the next.

In all of the techniques disclosed in References 6 to 8, the subgraph is updated by iterative processing so that its size (the number of nodes it contains) grows. Any method may be used to terminate the iteration, for example setting a conductance threshold in advance or fixing an upper limit on the number of iterations.
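
The growth loop and its two stopping rules can be sketched as follows. This is not the algorithm of References 6 to 8: it is a simple greedy heuristic swapped in for illustration, which repeatedly adds the neighboring node that yields the lowest conductance. The conductance form δ(S)/vol(S), the parameter names `phi_threshold` and `max_iters`, and the graph representation are all assumptions.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets (assumed representation)


def conductance(graph: Graph, nodes: Set[str]) -> float:
    """Assumed form delta(S) / vol(S)."""
    vol = sum(len(graph[n]) for n in nodes)
    cut = sum(1 for n in nodes for m in graph[n] if m not in nodes)
    return cut / vol if vol else 1.0


def grow_subgraph(graph: Graph, query: str,
                  phi_threshold: float = 0.2, max_iters: int = 50) -> Set[str]:
    """Grow a subgraph around the query node one node per iteration.

    Stops when the conductance falls to the threshold, when the iteration
    cap is reached, or when no neighboring node remains.
    """
    sub = {query}
    for _ in range(max_iters):
        if conductance(graph, sub) <= phi_threshold:
            break
        frontier = {m for n in sub for m in graph[n]} - sub
        if not frontier:
            break
        # Greedy step: add the neighbor giving the lowest conductance
        # (sorted() makes tie-breaking deterministic).
        best = min(sorted(frontier), key=lambda c: conductance(graph, sub | {c}))
        sub.add(best)
    return sub


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
EXAMPLE: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
```

Because each call depends only on the neighborhood of its query node, calls for different query nodes can run independently of one another.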

Because the graph division processing in the graph division processing unit 15 is completely independent for each query, some of the resulting subgraphs may overlap partially or completely. The cluster generation unit 16 combines subgraphs that have a high degree of overlap with each other to generate a new subgraph. Through this process, the cluster generation unit 16 generates a group of clusters from the group of subgraphs obtained by the graph division processing unit 15 (step S5). FIG. 6 is a diagram illustrating an example of cluster generation. As shown in FIG. 6, subgraph 1 and subgraph 2 overlap to a large extent (they share many nodes), so subgraph 1 and subgraph 2 are combined into a single cluster.

The degree of overlap between subgraphs can be obtained by any method. For example, for subgraphs S_1 and S_2, a new subgraph may be generated as the union of the two when the evaluation formula (3) is satisfied. In equation (3), ρ is a real-valued parameter.
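
A minimal sketch of this merging step follows. Equation (3) is not reproduced in this text, so the overlap test used here, a Jaccard ratio compared against ρ, is an assumption chosen for illustration, as are the function names and the default value of `rho`.

```python
from typing import Iterable, List, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Assumed overlap measure: |a & b| / |a | b| (Jaccard ratio)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_subgraphs(subgraphs: Iterable[Set[str]], rho: float = 0.5) -> List[Set[str]]:
    """Repeatedly replace any pair whose overlap reaches rho by its union."""
    clusters = [set(s) for s in subgraphs]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlap(clusters[i], clusters[j]) >= rho:
                    clusters[i] |= clusters[j]  # union becomes the new subgraph
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged set may now overlap others
    return clusters
```

For example, `merge_subgraphs([{"a", "b", "c"}, {"a", "b", "c", "d"}, {"x", "y"}])` combines the first two sets and leaves the third as its own cluster.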

Next, the representative node extraction unit 17 extracts a representative node from each cluster obtained by the cluster generation unit 16 (step S6). The node extracted as the representative is the one that has the most similar elements within the cluster. The elements of a cluster are basically similar to one another, but a node with especially many similar elements can be regarded as the node at the center of the cluster. Extracting such a cluster-center node as the representative helps ensure that the summary video produced by the subsequent processing contains video sections that are more effective for understanding the video group.

Any method may be used to extract the node with the most similar elements within the cluster. For example, the node with the highest degree within the cluster may be extracted as the representative node, or the similarities between nodes calculated by the graph construction unit may be used to extract the node whose total similarity within the cluster is largest.
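
The two example criteria can be sketched as below. The data layout is an assumption made for illustration: the graph is a dictionary of adjacency sets, and pairwise similarities are stored in a dictionary keyed by `frozenset` pairs. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]               # adjacency sets (assumed representation)
Similarity = Dict[FrozenSet[str], float]  # similarity per unordered node pair (assumed)


def representative_by_degree(graph: Graph, cluster: Set[str]) -> str:
    """Node with the most neighbors inside the cluster (in-cluster degree)."""
    return max(sorted(cluster), key=lambda n: len(graph[n] & cluster))


def representative_by_similarity(sim: Similarity, cluster: Set[str]) -> str:
    """Node whose summed similarity to the other cluster members is largest."""
    def total(n: str) -> float:
        return sum(sim.get(frozenset((n, m)), 0.0) for m in cluster if m != n)
    return max(sorted(cluster), key=total)


# Star-shaped toy cluster: "a" is linked to every other node.
STAR: Graph = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
SIMS: Similarity = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.1,
}
```

Sorting the cluster before taking the maximum only makes tie-breaking deterministic; it does not change which nodes are eligible.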

Next, the summary video output unit 18 outputs a summary video by combining the video sections corresponding to the representative nodes obtained by the representative node extraction unit 17 (step S7). In some cases, for example when the length of the summary video to be output is specified in advance, not all of the video sections corresponding to the representative nodes can be included in the summary video. In such cases, video sections extracted from clusters that contain more similar nodes are included in the summary video preferentially. This can be done by counting the elements in each cluster and ranking the clusters accordingly. Giving priority to video sections from clusters with many elements means that the video sections most representative of the video group are included first, which helps the resulting summary video contain sections that are effective for grasping the outline of the video group.

Any method may be used to order the video sections. The simplest is to arrange them in the order of the cluster ranking calculated for video section selection. Alternatively, when metadata indicating the temporal relationship of the videos, such as broadcast date and time, is given in advance, the video sections may be rearranged based on that information.
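
The selection and ordering described above can be sketched as follows. The `Segment` record, its field names, and the rule of stopping at the first section that would exceed the requested length are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Video section of one representative node (hypothetical record)."""
    segment_id: str
    duration: float              # seconds
    cluster_size: int            # number of nodes in the source cluster
    broadcast_time: float = 0.0  # optional metadata used for reordering


def select_segments(candidates: List[Segment], max_length: float) -> List[Segment]:
    """Take sections in descending cluster size until the length budget is reached."""
    ranked = sorted(candidates, key=lambda s: s.cluster_size, reverse=True)
    chosen: List[Segment] = []
    total = 0.0
    for seg in ranked:
        if total + seg.duration > max_length:
            break  # assumed policy: stop at the first section that does not fit
        chosen.append(seg)
        total += seg.duration
    return chosen


def order_by_broadcast_time(segments: List[Segment]) -> List[Segment]:
    """Optional reordering when broadcast-time metadata is available."""
    return sorted(segments, key=lambda s: s.broadcast_time)


CANDIDATES = [
    Segment("A", duration=10.0, cluster_size=5, broadcast_time=1.0),
    Segment("B", duration=20.0, cluster_size=9, broadcast_time=2.0),
    Segment("C", duration=15.0, cluster_size=2, broadcast_time=0.5),
]
```

With a 30-second budget, `select_segments(CANDIDATES, 30.0)` keeps B and A in ranking order, and `order_by_broadcast_time` then places A before B.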

As described above, in order to realize summarization processing that satisfies all of (1) to (3) above, a large-scale video group is represented by a graph in which video sections are nodes and the similarity of features between nodes is an edge, and this graph structure is stored in the storage unit. Whereas existing methods must hold every pairwise combination of inter-element similarities as data, the present invention can represent the whole of the data using only the nodes and the edges connecting neighboring nodes, and can therefore solve problem (2), the data size required for summarization processing.

In addition, the graph stored in the storage unit is clustered, and a summary video is output by extracting a representative node from each of the resulting clusters. This solves problem (1), redundancy. Furthermore, the clustering used in the present invention is based on local graph division processing, which relies on an evaluation index that takes only local information of the graph into account. Compared with the prior art, in which the evaluation index for clustering depended on the data size, the calculation cost of clustering in the present invention does not depend on the number of data items and the clustering is more readily parallelized, so problem (3), processing speed, can be solved.

In particular, representing the video group by a graph in which video sections are nodes and the similarity of features between nodes is an edge reduces the size of the data that must be kept in the storage unit at all times during summary video generation. Moreover, the per-query processing in the graph division processing unit does not depend on the size of the graph, that is, on the number of data items, so it can be executed at high speed. Because the graph division processing is completely independent for each query, it can also be parallelized easily, which speeds up processing further.

Because the clusters obtained from the graph division processing unit and the cluster generation unit strongly reflect the locality of the graph, clusters can be generated from elements whose similarity is more intuitively evident. This improves the accuracy of the generated summary video, so that a high-quality summary video with little duplication among video sections can be output.

Furthermore, by ranking the clusters by the number of elements they contain and, starting from the highest-ranked clusters, including in the summary video the video section corresponding to the node with the highest similarity within each cluster, a high-quality summary video with little duplication among video sections can be output.

The video summarization processing may also be performed by recording a program for realizing the functions of the processing units in FIG. 1 on a computer-readable recording medium and having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" further includes media that hold the program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and the present invention is clearly not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to generate a summary video while suppressing increases in hardware resources and calculation cost.

Description of reference numerals: 1 ... video summarization device, 11 ... video input unit, 12 ... storage unit, 13 ... graph construction unit, 14 ... query extraction unit, 15 ... graph division processing unit, 16 ... cluster generation unit, 17 ... representative node extraction unit, 18 ... summary video output unit

Claims

A video summarization device that generates a summary video from a video, comprising:
a graph construction unit that, based on the similarity between a video section constituting the video and other video sections, represents the video as a graph in which each video section is a node and the similarity between nodes is an edge;
a query extraction unit that extracts, from the group of nodes constituting the graph, a query node serving as a starting point of graph division;
a graph division processing unit that divides the graph by generating, starting from the query node, a subgraph composed of nodes near the query node;
a cluster generation unit that generates a cluster by combining subgraphs having a high degree of overlap;
a representative node extraction unit that extracts a representative node from the nodes constituting the cluster; and
a summary video output unit that generates and outputs the summary video using the video section corresponding to the representative node,
wherein the representative node extraction unit extracts, as the representative node, a node having many similar elements within the cluster.

A video summarization device that generates a summary video from a video, comprising:
a graph construction unit that, based on the similarity between a video section constituting the video and other video sections, represents the video as a graph in which each video section is a node and the similarity between nodes is an edge;
a query extraction unit that extracts, from the group of nodes constituting the graph, a query node serving as a starting point of graph division;
a graph division processing unit that divides the graph by generating, starting from the query node, a subgraph composed of nodes near the query node;
a cluster generation unit that generates a cluster by combining subgraphs having a high degree of overlap;
a representative node extraction unit that extracts a representative node from the nodes constituting the cluster; and
a summary video output unit that generates and outputs the summary video using the video section corresponding to the representative node,
wherein the summary video output unit ranks the representative nodes by the size of the cluster to which each belongs, and generates the summary video by combining the video sections corresponding to the nodes in descending order of rank until a specified summary video length is reached.

The video summarization device according to claim 1 or 2, wherein the graph division processing unit adopts, from among the subgraphs, one that satisfies a local evaluation index representing cluster likelihood as the subgraph.

A video summarization method performed by a video summarization device that generates a summary video from a video, the method comprising:
a graph construction step of representing the video, based on the similarity between a video section constituting the video and other video sections, as a graph in which each video section is a node and the similarity between nodes is an edge;
a query extraction step of extracting, from the group of nodes constituting the graph, a query node serving as a starting point of graph division;
a graph division processing step of dividing the graph by generating, starting from the query node, a subgraph composed of nodes near the query node;
a cluster generation step of generating a cluster by combining subgraphs having a high degree of overlap;
a representative node extraction step of extracting a representative node from the nodes constituting the cluster; and
a summary video output step of generating and outputting the summary video using the video section corresponding to the representative node,
wherein, in the representative node extraction step, a node having many similar elements within the cluster is extracted as the representative node.

A video summarization method performed by a video summarization device that generates a summary video from a video, the method comprising:
a graph construction step of representing the video, based on the similarity between a video section constituting the video and other video sections, as a graph in which each video section is a node and the similarity between nodes is an edge;
a query extraction step of extracting, from the group of nodes constituting the graph, a query node serving as a starting point of graph division;
a graph division processing step of dividing the graph by generating, starting from the query node, a subgraph composed of nodes near the query node;
a cluster generation step of generating a cluster by combining subgraphs having a high degree of overlap;
a representative node extraction step of extracting a representative node from the nodes constituting the cluster; and
a summary video output step of generating and outputting the summary video using the video section corresponding to the representative node,
wherein, in the summary video output step, the representative nodes are ranked by the size of the cluster to which each belongs, and the summary video is generated by combining the video sections corresponding to the nodes in descending order of rank until a specified summary video length is reached.

A video summarization program for causing a computer to function as the video summarization device according to any one of claims 1 to 3.