JP2014112280A

JP2014112280A - Video group reconfiguration/summarizing device, video group reconfiguration/summarizing method, and video group reconfiguration/summarizing program

Info

Publication number: JP2014112280A
Application number: JP2012266063A
Authority: JP
Inventors: Shuhei Tarashima; 周平田良島; Takashi Sato; 隆佐藤; Takeshi Tono; 豪東野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-12-05
Filing date: 2012-12-05
Publication date: 2014-06-19
Anticipated expiration: 2032-12-05
Also published as: JP5886733B2

Abstract

PROBLEM TO BE SOLVED: To reconstruct a video group into groups of video sections for each topic, without using any prior knowledge. SOLUTION: A step of dividing text data with time intervals into one or more segments along the time axis and allocating to each segment a topic corresponding to the keywords in that segment, and a step of combining temporally adjacent segments based on the similarity of their topics, are repeated. After the repetition, the segments are clustered by the topic allocated to each segment.

Description

The present invention relates to a technique for reconstructing a video group and summarizing the reconstructed video.

The amount of video content keeps growing without limit; an enormous amount of new video is produced every day and distributed over broadcast networks and other networks. Video is a medium with a time axis.

For this reason, it takes a great deal of labor and time to grasp trends in an enormous collection of video content, or to find scenes that contain useful information or that are attractive in themselves.

There is therefore a need for a mechanism that supports effective viewing of video content by organizing an enormous collection of video content in an easy-to-understand way or by extracting so-called “highlight” scenes.

Video can be structured in many different ways, but one typical structure is a video composed of multiple topics. News video broadcast on television is one example.

In many cases, a single news program aired on terrestrial or satellite broadcasting covers a variety of topics such as politics, the economy, international affairs, entertainment, and sports. The order in which topics are covered and the length of video given to each are strongly influenced by the latest information and events at the time of broadcast, so there is generally no clear rule.

Lecture video is another example of video composed of multiple topics. For a lecture video on a particular subject, if fields such as “calculus” and “vectors” in mathematics are regarded as topics, the lecture video can be said to be composed of multiple topics.

Furthermore, assembly video recording sessions of the National Diet, a prefectural assembly, or a city council is also an example of video composed of multiple topics. If agenda items such as “this fiscal year's budget” or “revision of an ordinance” are regarded as topics, assembly video is clearly composed of multiple topics.

When there are multiple videos of this kind, each composed of multiple topics, one way to view the video group efficiently is to reconstruct it based on the topics it covers and to generate a video for each topic, or a digest of that video.

This is described concretely using “a group of news videos broadcast by multiple broadcasters during a certain week” as an example. Suppose that only the video sections on politics-related topics covered in the week's news video group have been extracted and can be viewed. By viewing this group of video sections in chronological order, the viewer can easily grasp the course of politics during that week.

In addition, by viewing only the important video sections of the video reconstructed for each topic, the viewer can follow and grasp the course of politics in a shorter time. Furthermore, by adopting a viewing method that takes time and channel into account, such as comparing videos on the same topic broadcast by multiple broadcasters at the same time or on the same day, the viewer can easily and clearly grasp, for example, differences in opinion between channels on the same topic.

Thus, reconstructing a video group by topic, and making it possible to view each topic or a summary video of it, is expected to allow the video group to be viewed more efficiently.

As a technique for generating a summary video for viewing a video group efficiently in a short time, Patent Document 1 discloses a technique that generates a single summary video from a video group by considering (1) suitability to the user's request, (2) semantic coverage, (3) visual coverage, (4) ease of viewing, and (5) duration.

To reconstruct by topic a video group that includes videos composed of multiple topics, and further to summarize it, it is necessary to detect topic changes in each video and to assign to each resulting video section the topic it deals with.

Hereinafter, the detection of topic changes in a video is also referred to as topic segmentation. Techniques related to topic segmentation are disclosed in Patent Document 2 and Non-Patent Document 1.

Patent Document 1: JP 2012-19305 A
Patent Document 2: JP 2005-167452 A

Non-Patent Document 1: Katsuhito Bessho, “Topic Segmentation Based on the Intra-Cluster Variation Minimization Algorithm”, Natural Language Processing Society / Natural Language Processing Study Group, Research Report, Natural Language Processing 154-25, March 7, 2003, pp. 177-183

However, although the technique of Patent Document 1 considers how easy the videos are to watch in sequence, it does not necessarily reflect the topic information contained in each video section. As a result, a summary video in which a video section on a different topic is inserted between sections on the same topic, such as “politics topic → sports topic → politics topic”, may well be generated, so the output video is not always easy to understand.

Also, the techniques of Patent Document 2 and Non-Patent Document 1 require prior knowledge such as a training corpus. For video such as news programs, where immediacy is especially important, many of the words spoken or shown in captions in the program are not included in existing corpora, so the prior knowledge must be updated to the latest state each time, and the time cost of doing so is high.

The present invention has been made in view of the above circumstances, and its object is to reconstruct a video group into groups of video sections for each topic without using any prior knowledge.

The video group reconstruction/summarization apparatus according to claim 1 is an apparatus that reconstructs a video group composed of videos to which text data with time intervals is attached and summarizes the videos, the apparatus comprising: segment initialization means for reading the text data with time intervals from data storage means and dividing it into one or more segments along the time axis; topic assignment/segment update means for repeating a process of assigning to each segment a topic corresponding to the keywords in that segment and a process of combining temporally adjacent segments based on the similarity of the topics; segment group clustering means for clustering the segments obtained after the repetition by the topic assigned to each segment; important video section extraction means for extracting important video sections from the video sections corresponding to the segments in a clustered topic, based on the importance of the keywords to that topic; and video output means for outputting video based on at least one of the video sections of each cluster and the important video sections of each cluster.

According to the present invention, text data with time intervals is divided into one or more segments along the time axis; a process of assigning to each segment a topic corresponding to the keywords in that segment and a process of combining temporally adjacent segments based on the similarity of the topics are repeated; and the segments obtained after the repetition are clustered by the topic assigned to each segment. A video group can therefore be reconstructed into groups of video sections for each topic without using any prior knowledge.

The video group reconstruction/summarization apparatus according to claim 2 is the apparatus according to claim 1, wherein the topic assignment/segment update means repeats, until a predetermined end condition is satisfied, a segment feature calculation process of calculating a feature quantity of each segment based on the frequency of use of the keywords contained in the segment, a topic assignment process of assigning to the segment a topic vector based on the feature quantity, and a segment combining process of combining adjacent segments when the similarity between the segments, calculated using the topic vectors, is equal to or greater than a threshold.

The video group reconstruction/summarization apparatus according to claim 3 is the apparatus according to claim 1 or 2, wherein the segment initialization means divides the text data with time intervals such that the amount of information per segment is larger than a threshold and no single segment spans multiple topics.

The video group reconstruction/summarization apparatus according to claim 4 is the apparatus according to claim 2, wherein the topic assignment/segment update means executes the segment feature calculation process additionally using the feature quantities of temporally preceding and/or following segments.

The video group reconstruction/summarization method according to claim 5 is a method of reconstructing a video group composed of videos to which text data with time intervals is attached and summarizing the videos, the method comprising, by a computer: a segment initialization step of reading the text data with time intervals from data storage means and dividing it into one or more segments along the time axis; a topic assignment/segment update step of repeating a process of assigning to each segment a topic corresponding to the keywords in that segment and a process of combining temporally adjacent segments based on the similarity of the topics; a segment group clustering step of clustering the segments obtained after the repetition by the topic assigned to each segment; an important video section extraction step of extracting important video sections from the video sections corresponding to the segments in a clustered topic, based on the importance of the keywords to that topic; and a video output step of outputting video based on at least one of the video sections of each cluster and the important video sections of each cluster.

The video group reconstruction/summarization method according to claim 6 is the method according to claim 5, wherein the topic assignment/segment update step repeats, until a predetermined end condition is satisfied, a segment feature calculation process of calculating a feature quantity of each segment based on the frequency of use of the keywords contained in the segment, a topic assignment process of assigning to the segment a topic vector based on the feature quantity, and a segment combining process of combining adjacent segments when the similarity between the segments, calculated using the topic vectors, is equal to or greater than a threshold.

The video group reconstruction/summarization method according to claim 7 is the method according to claim 5 or 6, wherein the segment initialization step divides the text data with time intervals such that the amount of information per segment is larger than a threshold and no single segment spans multiple topics.

The video group reconstruction/summarization method according to claim 8 is the method according to claim 6, wherein the topic assignment/segment update step executes the segment feature calculation process additionally using the feature quantities of temporally preceding and/or following segments.

The video group reconstruction/summarization program according to claim 9 causes a computer to execute the video group reconstruction/summarization method according to any one of claims 5 to 8.

According to the present invention, a video group can be reconstructed into groups of video sections for each topic without using any prior knowledge.

FIG. 1 is a diagram explaining an overview of the overall processing.
FIG. 2 is a diagram showing an example of text data with time intervals.
FIG. 3 is a diagram showing the functional block configuration of the video group reconstruction/summarization apparatus.
FIG. 4 is a diagram showing an example of the keyword information management table.
FIG. 5 is a diagram showing an example of segment initialization based on punctuation.
FIG. 6 is a diagram showing an example of segment initialization based on the amount of information.
FIG. 7 is a diagram showing an example of segment initialization based on video transitions.
FIG. 8 is a diagram showing an example of the segment information management table.
FIG. 9 is a diagram showing the processing flow of topic assignment and segment update.
FIG. 10 is a diagram showing an example of calculating segment feature quantities.
FIG. 11 is a diagram showing an example of the topic-word importance table.
FIG. 12 is a diagram showing an example of updating the segment information update table.
FIG. 13 is a diagram showing an example of clustering a segment group.
FIG. 14 is a diagram showing an example of the video viewing interface.
FIG. 15 is a diagram showing an overview of the effects.
FIG. 16 is a diagram showing an example of application to a news video group.
FIG. 17 is a diagram showing an example of application to a teaching-material video group (mathematics problem-practice lectures).
FIG. 18 is a diagram showing an example of application to an assembly video group.

The present invention is characterized in that it improves the accuracy of topic assignment and topic segmentation by repeating a process of assigning topics to the subelements of the text data attached to a video and a process of combining adjacent elements based on topic similarity.

An overview of the processing is described with reference to FIG. 1. In the present invention, the text data with time intervals attached to each video is first divided into appropriate subelements. Next, keywords and feature quantities are extracted from the text data of each element, and topics are assigned to the elements. Adjacent subelements are then combined based on the similarity of the assigned topics.

By repeating the topic assignment process and the subelement combining process multiple times, a topic is assigned to each of the finally obtained subelements. In addition, for each topic, the importance of each keyword to that topic is obtained.

Then, by clustering the elements based on topic, video reconstructed for each topic can be obtained. Also, by determining important video sections using the importance of each keyword to each topic, a summary video for each topic can be obtained.

The text data with time intervals mentioned above refers to data corresponding to one video file and consisting of a group of texts each having a start time and an end time, as shown for example in FIG. 2.

Specifically, subtitles or closed captions attached to the video, or text data obtained by speech recognition, can be used, for example. The input text data and the videos are assumed to correspond one-to-one; neither videos without text data with time intervals nor text data with time intervals that is not associated with any video are assumed to exist.
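As an illustration of this input, the following minimal Python sketch models one entry of text data with time intervals and checks the one-to-one assumption. The names `TimedText` and `check_one_to_one`, the use of seconds for times, and the sample captions are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedText:
    """One entry of text data with time intervals (hypothetical name)."""
    start: float  # start time in seconds (assumed unit)
    end: float    # end time in seconds (assumed unit)
    text: str     # subtitle, closed caption, or speech-recognition text


def check_one_to_one(videos, captions):
    """Return True when every video has text data and every text data set has a video.

    `videos` is a collection of video IDs; `captions` maps video ID -> list[TimedText].
    """
    return set(videos) == set(captions) and all(captions[v] for v in videos)


# Invented sample data for illustration only.
captions = {
    "news_001": [
        TimedText(0.0, 3.5, "おはようございます。"),
        TimedText(3.5, 9.0, "国会で予算案の審議が始まりました。"),
    ]
}
ok = check_one_to_one({"news_001"}, captions)
```

A video ID without captions, or captions without a video ID, makes the check fail, which mirrors the stated assumption that such inputs do not occur.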

Hereinafter, a subelement of text data with time intervals is referred to as a segment. A segment consists of the texts of several adjacent time intervals in the text data with time intervals, or of parts of those texts.

Specifically, as in FIG. 2 for example, the underlined portion spanning the first and second lines may be defined as one segment, or the text on the fifth line from its beginning to the full stop may be defined as a segment. An embodiment of the present invention is described below.

[Functions of the video group reconstruction/summarization apparatus 1]
FIG. 3 is a diagram showing the functional block configuration of the video group reconstruction/summarization apparatus 1 according to the present embodiment. The video group reconstruction/summarization apparatus 1 comprises a data storage unit 11, a segment initialization unit 12, a topic assignment/segment update unit 13, a segment group clustering unit 14, an important video section extraction unit 15, and a video output unit 16.

The data storage unit 11 has a function of storing the video group given as input and managed in a database, and the group of text data with time intervals corresponding to each video in that group.

The segment initialization unit 12 has a function of reading the group of text data with time intervals from the data storage unit 11 and performing a keyword extraction process that extracts keywords from the text data and a segment initialization process that divides the text data into one or more segments along the time axis.

The topic assignment/segment update unit 13 has a function of repeating, a predetermined number of times, a topic assignment process that assigns to each segment a topic corresponding to the keywords in that segment and a segment combining process that combines temporally adjacent segments based on the similarity of the assigned topics.
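The alternation of topic assignment and segment combining can be sketched as below. This is a minimal sketch under stated assumptions, not the method of the specification: the topic model is passed in as a caller-supplied function, `toy_assign_topics` is only a stand-in that counts keywords over a tiny invented vocabulary, cosine similarity is an assumed similarity measure, and the values of `threshold` and `max_iters` are arbitrary.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_adjacent(segments, vectors, threshold):
    """Combine temporally adjacent segments whose topic vectors are similar enough."""
    if not segments:
        return []
    merged = [list(segments[0])]
    last_vec = vectors[0]
    for seg, vec in zip(segments[1:], vectors[1:]):
        if cosine(last_vec, vec) >= threshold:
            merged[-1].extend(seg)      # same topic: grow the current segment
        else:
            merged.append(list(seg))    # topic change: start a new segment
        last_vec = vec
    return merged


def update_segments(segments, assign_topics, threshold=0.8, max_iters=10):
    """Alternate topic assignment and merging until nothing merges or max_iters is hit."""
    for _ in range(max_iters):
        vectors = assign_topics(segments)
        merged = merge_adjacent(segments, vectors, threshold)
        if len(merged) == len(segments):
            break
        segments = merged
    return segments


def toy_assign_topics(segments, vocab=("budget", "tax", "goal", "match")):
    """Stand-in for a topic model: raw keyword counts over a tiny vocabulary."""
    return [[Counter(seg)[w] for w in vocab] for seg in segments]


# Each segment is represented by its keyword list, in temporal order.
segments = [["budget", "tax"], ["tax", "budget"], ["goal"], ["match", "goal"]]
result = update_segments(segments, toy_assign_topics)
```

With the default threshold the two budget-related segments are combined and the two sports-related segments stay separate; lowering the threshold to 0.7 combines those as well, which shows how strongly the result depends on this parameter.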

The segment group clustering unit 14 has a function of performing a clustering process that clusters the segments finally obtained by the processing of the topic assignment/segment update unit 13 by the topic assigned to each segment. Each cluster obtained here corresponds to one of the topics described above.
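One simple way to picture this clustering is to group segments by their dominant topic. The sketch below assumes each segment carries a topic vector and is placed in the cluster of its largest component; this rule, the function name `cluster_by_topic`, and the dictionary keys are illustrative assumptions rather than details given in the text.

```python
from collections import defaultdict


def cluster_by_topic(segments):
    """Group segment IDs by the index of their largest topic weight.

    Each segment is a dict with hypothetical keys "id" and "topic_vector".
    """
    clusters = defaultdict(list)
    for seg in segments:
        vec = seg["topic_vector"]
        dominant = max(range(len(vec)), key=vec.__getitem__)
        clusters[dominant].append(seg["id"])
    return dict(clusters)


# Invented segments from two videos; topic 0 and topic 1 are unnamed topics.
segments = [
    {"id": "v1-s0", "topic_vector": [0.9, 0.1]},
    {"id": "v1-s1", "topic_vector": [0.2, 0.8]},
    {"id": "v2-s0", "topic_vector": [0.7, 0.3]},
]
clusters = cluster_by_topic(segments)
```

Because segments from different videos land in the same cluster, each cluster is a cross-video collection of sections on one topic.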

The important video section extraction unit 15 has a function of extracting, as a summary video, important video sections from among the candidate video sections corresponding to the segments in a clustered topic, based on the importance of each keyword obtained by the processing of the topic assignment/segment update unit 13.
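The following sketch shows one possible reading of this extraction step: score each segment of a topic cluster by summing the importance of its keywords and keep the best few in chronological order. The summation rule, the `top_k` parameter, and the sample weights are assumptions made for illustration; the text here does not fix a scoring formula.

```python
def extract_important(segments, importance, top_k=1):
    """Return the top_k segments of one topic cluster, in chronological order.

    `segments` is a list of (start, end, keywords) tuples; `importance` maps
    keyword -> weight for this topic. Summing weights is an assumed scoring rule.
    """
    def score(seg):
        return sum(importance.get(k, 0.0) for k in seg[2])

    best = sorted(segments, key=score, reverse=True)[:top_k]
    return sorted(best, key=lambda seg: seg[0])


# Invented cluster for a politics topic; times are in seconds.
politics = [
    (0.0, 12.0, ["budget", "diet"]),
    (40.0, 55.0, ["weather"]),
    (90.0, 120.0, ["budget", "tax", "diet"]),
]
importance = {"budget": 0.5, "diet": 0.3, "tax": 0.4}
summary = extract_important(politics, importance, top_k=2)
```

Sorting the selected sections by start time keeps the summary in the order in which the material was broadcast.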

The video output unit 16 has a function of outputting video based on at least one of the video sections corresponding to each cluster obtained by the processing of the segment group clustering unit 14 and the important video sections of each cluster obtained by the processing of the important video section extraction unit 15.

By executing the processing of the functional units 11 to 16 described above, a video group reconstructed for each topic and a group of per-topic summary videos from which important video sections have been extracted can be obtained as output.

The functional units 11 to 16 can be implemented by a computer having a memory and a CPU, and their processing is executed by a program.

[Processing of the segment initialization unit 12]
Next, the processing of the segment initialization unit 12 is described. The segment initialization unit 12 extracts keywords from the text data with time intervals corresponding to each video and determines the initial state of the segments, thereby creating, for each segment, a segment information management table that stores information about the segment.

(Keyword extraction)
First, keyword extraction from the group of text data with time intervals is described. Here, keywords and the frequency of each keyword in the data as a whole (hereinafter also referred to as the total frequency) are extracted from the input text data with time intervals.

As a method of extracting keywords, the text data may be morphologically analyzed and specific phrases such as noun phrases extracted as keywords. Alternatively, character strings that follow a specific rule, for example strings of at least a certain number of consecutive kanji, may be extracted as keywords.
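The rule-based alternative, extracting runs of consecutive kanji, can be sketched with a regular expression. This is a minimal sketch: the function name, the default minimum length of 2, and the use of the CJK Unified Ideographs range U+4E00 to U+9FFF as an approximation of "kanji" are assumptions.

```python
import re


def extract_kanji_keywords(text, min_len=2):
    """Return runs of at least min_len consecutive kanji, in order of appearance."""
    # CJK Unified Ideographs block; an assumed approximation of "kanji".
    pattern = re.compile("[\u4e00-\u9fff]{%d,}" % min_len)
    return pattern.findall(text)


keywords = extract_kanji_keywords("国会で予算案の審議が始まりました。")
# keywords == ["国会", "予算案", "審議"]
```

The single kanji in 始まりました is skipped because it does not reach the minimum run length, which illustrates how the threshold filters out verb stems.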

At this time, it is desirable to remove in advance so-called stop words, that is, words that appear very frequently in every video and do not function as keywords, for example by evaluating the tf-idf value of each keyword when each piece of text data with time intervals is regarded as one document.
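A tf-idf based stop-word filter of this kind might look like the sketch below, which treats the keyword list of each video as one document. The unsmoothed idf formula, the rule of keeping a keyword if its best score over all documents reaches `min_score`, and the threshold value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_filter(docs, min_score=0.01):
    """Keep keywords whose best tf-idf score over all documents reaches min_score.

    `docs` maps a video ID to the keyword list extracted from its text data.
    Uses tf = count / document length and idf = log(N / df), without smoothing.
    """
    n_docs = len(docs)
    df = Counter(k for kws in docs.values() for k in set(kws))
    best = Counter()
    for kws in docs.values():
        for k, count in Counter(kws).items():
            score = (count / len(kws)) * math.log(n_docs / df[k])
            best[k] = max(best[k], score)
    return {k for k, s in best.items() if s >= min_score}


# Invented keyword lists; "今日" appears in every video and acts as a stop word.
docs = {
    "v1": ["今日", "予算", "国会"],
    "v2": ["今日", "試合", "得点"],
    "v3": ["今日", "予算", "税制"],
}
kept = tfidf_filter(docs)
```

A keyword present in every document has an idf of zero and is therefore dropped, while keywords confined to some documents survive.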

The extracted keywords and their frequency information are stored in the data storage unit 11 in a format in which a keyword ID is linked to the keyword itself and its total frequency, as shown for example in FIG. 4. FIG. 4 is only an example; in addition to these elements, the tf-idf value calculated above, for example, may also be included.
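A table of this shape can be built in a few lines. In the sketch below, the row layout `(keyword_id, keyword, total_frequency)` follows the description, while numbering IDs from 1 in descending order of frequency is an assumption; FIG. 4 itself is not reproduced here.

```python
from collections import Counter


def build_keyword_table(keyword_lists):
    """Return rows of (keyword_id, keyword, total_frequency) over all text data."""
    totals = Counter(k for kws in keyword_lists for k in kws)
    # IDs are assigned in descending order of total frequency (assumed scheme).
    return [(i, k, n) for i, (k, n) in enumerate(totals.most_common(), start=1)]


# Invented keyword lists from two pieces of text data.
table = build_keyword_table([["予算", "国会", "予算"], ["国会", "税制", "予算"]])
# table == [(1, "予算", 3), (2, "国会", 2), (3, "税制", 1)]
```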

As the morphological analysis method mentioned above, those described in, for example, Yuji Matsumoto, “Morphological Analysis System ‘ChaSen’”, Information Processing, Vol. 41, No. 11, November 2000, pp. 1208-1214, and “Yet Another Part-of-Speech and Morphological Analyzer”, [online], [retrieved November 15, 2012], <URL: http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html> can be used.

(Segment initialization)
Next, the process of constructing the initial state of the segments is described. In the present embodiment, each segment obtained by the processing of the segment initialization unit 12 is the minimum unit of topic assignment.

In the subsequent processing, adjacent segments are combined based on topic similarity, which improves the accuracy of topic segmentation and of topic assignment to segments.

Various methods of constructing the initial state of the segments are conceivable, but two viewpoints should be considered: (a) increasing the amount of information per segment, and (b) constructing the initial segments so that no single segment spans multiple topics.

In the topic assignment/segment update unit 13 described later, topics are assigned to each segment based on the frequency of the keywords contained in the segment or on information related to that frequency.

In general, the more information a segment contains, the more appropriate the topic assigned by the topic assignment process described later. Viewpoint (a), making the amount of information per segment larger than a threshold, is therefore important.

On the other hand, if an initial segment contains multiple topics, the accuracy of topic segmentation in the processing of the topic assignment/segment update unit 13 described later decreases, so viewpoint (b) is also important.

以上の理由から、上記（ａ）、（ｂ）の両観点から妥当なセグメント初期化方法を適用する必要がある。具体的には、（１）句読点を基準としたセグメント初期化、（２）セグメントあたりの情報量を基準としたセグメント初期化、（３）映像の切り替わりを基準としたセグメント初期化、の３通りの方法が考えられる。 For the above reasons, it is necessary to apply a segment initialization method that is appropriate from both viewpoints (a) and (b). Specifically, three methods are conceivable: (1) segment initialization based on punctuation marks, (2) segment initialization based on the amount of information per segment, and (3) segment initialization based on video switching.

（１）の方法は、句点又は読点を基準として文章を区切り、初期セグメントを構築する方法である。図５は、句点ごとに文章を区切り、初期セグメントを構成している例を示している。 Method (1) constructs the initial segments by dividing the text at periods or commas. FIG. 5 shows an example in which the text is divided at each period to form the initial segments.

一般的に、句読点で区切られる文章又は句が複数のトピックを含む可能性は低い。そのため、（１）の方法により得られた初期セグメントが、複数トピックを有する可能性は低く、その点でセグメントの初期化方法として好適である。その一方で、例えば「おはようございます。」といったセグメントなど、情報量が非常に小さい、又は存在しないセグメントが発生する可能性があり、トピック割り当てが困難となるセグメントが発生しうるというデメリットがある。 In general, a sentence or phrase delimited by punctuation marks is unlikely to contain multiple topics. The initial segments obtained by method (1) are therefore unlikely to have multiple topics, which makes the method suitable for segment initialization. On the other hand, it has the drawback that segments carrying very little or no information, such as "Good morning.", may occur, and topic assignment becomes difficult for such segments.

（２）の方法は、例えば文章の文字数やキーワードの数といった情報量を基準として分割を行う方法である。図６は、文章を一定の長さで文章を区切り、初期セグメントを構成する例を示している。この他にも、キーワードの抽出処理で得られたキーワードの数がセグメント毎に等しくなるように、セグメントを構成するといった方法も考えられる。 The method (2) is a method of performing division based on the amount of information such as the number of characters in a sentence and the number of keywords. FIG. 6 shows an example in which an initial segment is formed by dividing a sentence by a certain length. In addition, a method is also conceivable in which segments are configured so that the number of keywords obtained by keyword extraction processing is equal for each segment.

この方法により得られる各々の初期セグメントは、（１）の方法のように情報量が小さくトピック割り当てが困難であるようなセグメントは発生しないものの、初期セグメントが複数トピックに跨がる可能性があり、その場合、トピックによる映像区間の分割精度が下がるというデメリットがある。 The initial segments obtained by this method, unlike those of method (1), do not include segments whose information amount is so small that topic assignment is difficult. However, an initial segment may span multiple topics, in which case the accuracy of dividing video sections by topic decreases.
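Methods (1) and (2) can be illustrated with a minimal Python sketch. It operates on plain text only and ignores the time intervals attached to the text; the function names `split_by_period` and `split_by_length` are hypothetical and do not come from the specification.

```python
import re


def split_by_period(text):
    """Method (1): cut the text after every period (Japanese or Western)."""
    # The zero-width lookbehind keeps each period attached to its sentence.
    parts = re.split(r"(?<=[。.])", text)
    return [p.strip() for p in parts if p.strip()]


def split_by_length(text, size):
    """Method (2): cut the text every `size` characters, ignoring sentence ends."""
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    sample = "おはようございます。西日本と東日本の都市部では、雨が降りました。"
    print(split_by_period(sample))      # short greeting becomes its own segment
    print(split_by_length(sample, 12))  # fixed size, may cut across sentences
```

The first function tends to produce very short segments such as a greeting, while the second may cut across a topic boundary, which mirrors the trade-off between viewpoints (a) and (b).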

（３）の方法は、映像や音声の切り替わりによって定義される映像区間に含まれるテキストデータをセグメントとして定義する方法である。図７は、映像が切り替わるごとに文章を区切り、初期セグメントを構成する例を示している。 The method (3) is a method of defining text data included in a video section defined by switching between video and audio as a segment. FIG. 7 shows an example in which an initial segment is formed by dividing sentences every time a video is switched.

この方法により得られる各初期セグメントは、（１）の方法に比べて初期セグメントのサイズを大きく設定することができるため、後述するトピック割り当て・セグメント更新部１３における繰り返し処理の回数を少なく抑えることができる。 With this method, the initial segments can be made larger than with method (1), so the number of iterations in the topic assignment / segment update unit 13 described later can be kept small.

なお、映像や音声の切り替わりは、例えば、特開平８−２１４２１０号公報や特開平１１−１８０２８号公報に記載されている方法を用いて検出することができる。実際にセグメントの初期化を行う際には、（ａ）（ｂ）の観点、およびデータの特性を照らし合わせながら、（１）〜（３）の中から適切な方法を選択すればよい。 Note that switching between video and audio can be detected using, for example, the methods described in JP-A-8-214210 and JP-A-11-18028. When the segment is actually initialized, an appropriate method may be selected from (1) to (3) while comparing the viewpoints (a) and (b) and the data characteristics.

（セグメント情報管理テーブルの生成について） (About generation of the segment information management table)
以上の処理結果に基づき、セグメント初期化部１２は、図８に示すようなセグメント情報管理テーブルを構築する。セグメント情報管理テーブルには、セグメントが属する映像のＩＤ、セグメントの開始時間および終了時間、セグメントのテキスト、セグメントに含まれるキーワードとその頻度が格納される。 Based on the above processing results, the segment initialization unit 12 constructs a segment information management table as shown in FIG. 8. The segment information management table stores the ID of the video to which the segment belongs, the start time and end time of the segment, the text of the segment, and the keywords contained in the segment together with their frequencies.

図８は一例であり、他にも映像のチャンネルに関する情報が与えられていた場合、チャンネルＩＤの要素をテーブルに追加するなどしてもよい。映像の放送日時（以下、タイムスタンプと呼ぶ）が付与されている場合、そのデータをテーブルに追加してもよい。 FIG. 8 is an example, and when other information related to video channels is given, an element of channel ID may be added to the table. When a video broadcast date and time (hereinafter referred to as a time stamp) is given, the data may be added to the table.

以降、各セグメントにおけるキーワードの頻度を、セグメント内頻度とも呼称する。セグメント情報管理テーブルに格納されるキーワードの情報は、キーワード情報管理テーブルで管理されているキーワードＩＤであり、キーワードそのものや総頻度と紐付けられていることが望ましい。セグメント内頻度は、各セグメントのテキスト中に、キーワードが何回出現したかを数え上げることで得られる。 Hereinafter, the keyword frequency in each segment is also referred to as intra-segment frequency. The keyword information stored in the segment information management table is a keyword ID managed in the keyword information management table, and is preferably associated with the keyword itself and the total frequency. The intra-segment frequency can be obtained by counting how many times the keyword appears in the text of each segment.

なお、各セグメントの開始時間および終了時間を設定する方法については、様々なものが考えられる。最も単純な方法として、セグメントに含まれる文章の一部分を含む時間区間のうち、最も早い開始時間と、最も遅い終了時間とをセグメントの開始時間および終了時間として定義するといったものがある。 There are various methods for setting the start time and end time of each segment. The simplest method is to define the earliest start time and the latest end time as the start time and end time of a segment in a time interval including a part of a sentence included in a segment.
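One row of the segment information management table, together with the simplest start/end rule described above, might be sketched in Python as follows. The `Segment` class, the `build_segment` function, and the use of keyword strings instead of keyword IDs are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    video_id: str
    start: float                 # seconds from the beginning of the video
    end: float
    text: str
    keyword_freq: Counter = field(default_factory=Counter)  # intra-segment frequency


def build_segment(video_id, timed_texts, keywords):
    """timed_texts: (start, end, text) tuples whose text falls in this segment."""
    start = min(s for s, _, _ in timed_texts)   # earliest start time
    end = max(e for _, e, _ in timed_texts)     # latest end time
    text = "".join(t for _, _, t in timed_texts)
    # Count how many times each keyword occurs in the segment text.
    freq = Counter({k: text.count(k) for k in keywords if k in text})
    return Segment(video_id, start, end, text, freq)


if __name__ == "__main__":
    seg = build_segment("v1", [(0.0, 2.5, "rain and rain "), (2.5, 4.0, "wind")],
                        ["rain", "wind", "snow"])
    print(seg)
```

Counting with `str.count` is a simplification; a real system would more likely count the keywords returned by the keyword extraction step.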

〔トピック割り当て・セグメント更新部１３の処理について〕 [Processing of the topic assignment / segment update unit 13]
続いて、トピック割り当て・セグメント更新部１３の処理について説明する。トピック割り当て・セグメント更新部１３では、各セグメントへのトピック割り当ておよびトピック類似度に基づく隣接セグメントの結合処理を所定の回数繰り返すことにより、トピックセグメンテーションおよびトピック割り当ての精度を向上する。 Next, the processing of the topic assignment / segment update unit 13 will be described. The topic assignment / segment update unit 13 improves the accuracy of topic segmentation and topic assignment by repeating, a predetermined number of times, the assignment of topics to each segment and the combination of adjacent segments based on topic similarity.

具体的には、図９に示すように、ステップＳ１０１のセグメント特徴量算出処理と、ステップＳ１０２のトピック割り当て処理と、ステップＳ１０３のセグメント結合処理とを、ステップＳ１０４のセグメント更新終了判定条件を満たすまで繰り返す。 Specifically, as shown in FIG. 9, the segment feature amount calculation process of step S101, the topic assignment process of step S102, and the segment combination process of step S103 are repeated until the segment update end determination condition of step S104 is satisfied.

この繰り返し処理によって、セグメント初期化部１２により生成されたセグメント情報管理テーブルが更新され、最終的に得られたテーブルの情報を用いて、続くセグメント群クラスタリング部１４の処理が行われる。 By this iterative process, the segment information management table generated by the segment initialization unit 12 is updated, and the subsequent process of the segment group clustering unit 14 is performed using the information of the finally obtained table.

（ステップＳ１０１：セグメント特徴量の算出処理について） (Step S101: Segment feature amount calculation process)
まず、セグメント特徴量算出部では、キーワード情報管理テーブルおよびセグメント情報管理テーブルをもとに、セグメント毎にその特徴量を算出する。 First, the segment feature amount calculation unit calculates the feature amount of each segment based on the keyword information management table and the segment information management table.

特徴量としては、セグメントに含まれるキーワードの頻度に基づくヒストグラムを抽出する。ここで抽出するヒストグラムは、例えばキーワードのセグメント内頻度をそのまま用いてもよいし、セグメントを一つの文書とみなし、各キーワードについてｔｆ−ｉｄｆのような公知の技術を用いて得られたスコアを用いてもよい。 As the feature amount, a histogram based on the frequency of keywords included in the segment is extracted. The histogram extracted here may use, for example, the intra-segment frequency of the keyword as it is, or consider the segment as one document, and use a score obtained by using a known technique such as tf-idf for each keyword. May be.

ここで、各セグメントの特徴量を算出するにあたっては、セグメントそのものの情報量と、セグメント間の距離を考慮したうえで、周辺セグメントが持つ情報も考慮し特徴量を算出する。 Here, in calculating the feature amount of each segment, the feature amount is calculated in consideration of the information amount of the segment itself and the distance between the segments, and also the information of the surrounding segments.

図１０は、周辺セグメントも考慮に入れたセグメント特徴量算出の一例である。この例において、ｎ番目のセグメント（「西日本と東日本の都市部では、昨夜からけさに」）における特徴量は、このセグメントの前後２セグメントが持つ情報も考慮した上で算出する。 FIG. 10 is an example of segment feature amount calculation that also takes the surrounding segments into account. In this example, the feature amount of the nth segment ("In the urban areas of western and eastern Japan, from last night to this morning") is calculated by also considering the information held by the two segments before and the two segments after it.

周辺セグメントの情報は、ｎ番目のセグメントに隣り合うセグメントについては０．６倍、ｎ番目のセグメントの２つ隣りのセグメントについては０．２倍の重み付けがなされたうえで考慮され、その結果、同図の右側に示されるような値がｎ番目のセグメントのヒストグラム特徴量として得られる。 The information of the surrounding segments is taken into account after being weighted by 0.6 for the segments adjacent to the nth segment and by 0.2 for the segments two positions away. As a result, values such as those shown on the right side of the figure are obtained as the histogram feature amount of the nth segment.
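The weighting in the FIG. 10 example can be sketched in Python as below. The weights 0.6 and 0.2 come from the text; the weight 1.0 for the nth segment itself, the function name `smoothed_histogram`, and the dictionary representation of a histogram are assumptions for illustration.

```python
from collections import Counter


def smoothed_histogram(histograms, n, weights=(1.0, 0.6, 0.2)):
    """Return the feature histogram of segment n, including weighted neighbors.

    histograms: list of {keyword: count} dicts, one per segment in time order.
    weights[d]: weight for segments that are d positions away from segment n.
    """
    result = Counter()
    for distance, weight in enumerate(weights):
        # A set avoids counting segment n twice when distance is 0.
        for i in {n - distance, n + distance}:
            if 0 <= i < len(histograms):
                for keyword, count in histograms[i].items():
                    result[keyword] += weight * count
    return dict(result)


if __name__ == "__main__":
    hists = [{"a": 1}, {"b": 1}, {"a": 2}, {"c": 1}, {"b": 5}]
    print(smoothed_histogram(hists, 2))
```

Segments near the beginning or end of the video simply have fewer neighbors in this sketch; no renormalization is applied.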

セグメント初期化部１２において用いる方法によっては、情報量が非常に小さくトピック割り当てが困難なセグメントが発生する可能性がある。そのような場合、周辺セグメントの情報も考慮したうえで特徴量を算出することは、仮想的にセグメントの情報量を増やすことができるという点で有効である。 Depending on the method used in the segment initialization unit 12, there is a possibility that a segment having a very small amount of information and difficult to assign to a topic may occur. In such a case, it is effective to calculate the feature amount in consideration of the information of the surrounding segments because the information amount of the segment can be virtually increased.

一方で、セグメントそのものが十分な情報量を有している場合、周辺トピックの情報を考慮することは、かえってセグメントの特徴量の質を低下させてしまう可能性もある。そのため、考慮の対象とするセグメントの数は、セグメントそのものの情報量が多いほど少なく設定されることが望ましい。 On the other hand, when the segment itself has a sufficient amount of information, considering the information on the surrounding topics may reduce the quality of the segment feature amount. For this reason, it is desirable that the number of segments to be considered is set to be smaller as the information amount of the segment itself is larger.

セグメントそのものの情報量と、セグメント間の距離を考慮したうえで周辺セグメントの情報を考慮する方法については様々な方法が考えられる。例えば、特徴量算出の際に考慮する周辺セグメントの最大値を２Ｎ_ｂ、ｎ番目のセグメントに含まれるキーワードの総数をＮ_ｗとしたとき、σ＝Ｎ_ｂ／（Ｎ_ｗ＋１）と定義して、以下の式（１）および式（２）より計算されるＷを、セグメントＢ_ｉに含まれる情報にかかる重みとするといった方法が考えられる。なお、｜Ｂ_ｎ−Ｂ_ｉ｜は、ｎ番目のセグメントＢ_ｎとｉ番目のＢ_ｉとが離れているセグメント数を示す。

Various methods are conceivable for taking the information of the surrounding segments into account in view of the information amount of the segment itself and the distance between segments. For example, letting 2N_b be the maximum number of surrounding segments considered when calculating the feature amount and N_w be the total number of keywords contained in the nth segment, σ is defined as σ = N_b / (N_w + 1), and W calculated from the following equations (1) and (2) is used as the weight applied to the information contained in the segment B_i. Here, |B_n − B_i| denotes the number of segments separating the nth segment B_n from the ith segment B_i.

（ステップＳ１０２：トピック割り当て処理について） (Step S102: Topic assignment process)
続いて、トピック割り当て部では、セグメント特徴量算出部により得られた各セグメントの特徴量に基づき、各セグメントにトピックの割り当てを行う。 Subsequently, the topic assignment unit assigns topics to each segment based on the feature amount of each segment obtained by the segment feature amount calculation unit.

前述したセグメント特徴量算出部では、各セグメントについて特徴量がヒストグラムとして与えられる。ヒストグラムは、ある符号ｃが何回生起したかを表す情報であるため、多項分布に従うとしてモデル化することができる。 In the segment feature value calculation unit described above, the feature values for each segment are given as a histogram. Since the histogram is information indicating how many times a certain code c has occurred, it can be modeled as following a multinomial distribution.

多項分布に従う変数から、その背後にあるトピックを推定するためのトピックモデルとしては、いくつか公知のものが存在する。代表的なものに、「T. Hoffmann、“Probabilistic Latent Semantic Indexing”、SIGIR'99、1999年、p.50-57」や「D.M. Blei、外２名、“Latent Dirichlet Allocation”、Journal of Machine Learning Research 3、2003年、p.993-1022」に記載されたｐＬＳＡ（Probabilistic Latent Semantic Analysis）やＬＤＡ（Latent Dirichlet Allocation）などがある。 Several well-known topic models exist for estimating the topics behind variables that follow a multinomial distribution. Representative examples are pLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation), described respectively in T. Hoffmann, "Probabilistic Latent Semantic Indexing", SIGIR'99, 1999, pp. 50-57 and in D.M. Blei and two others, "Latent Dirichlet Allocation", Journal of Machine Learning Research 3, 2003, pp. 993-1022.

トピックモデルでは、「ある文書に含まれる各単語は、文書固有のトピック比率θ_Ｂに従ってあるトピックを選択した後、そのトピックに固有の単語出現確率分布Φ_ｚに従って生成される」と仮定する。 The topic model assumes that "each word in a document is generated by first selecting a topic according to the document-specific topic ratio θ_B and then drawing the word from the word appearance probability distribution Φ_z specific to that topic".

いま、セグメントＢ_ｉをトピックモデルにおける文書とみなすと、トピックモデルは、セグメントＢ_ｉと、その背後にあるトピックｚｉの同時確率ｐ（Ｂ_ｉ，ｚ）として表現される。 If the segment B_i is regarded as a document in the topic model, the topic model is expressed as the joint probability p(B_i, z) of the segment B_i and the topic z_i behind it.

同時確率ｐ（Ｂ_ｉ，ｚ）は、一般には厳密に計算することができないため、必要に応じて、Gibbs Samplingや変分ベイズ近似などの近似手法を利用して求める。これらの近似手法から直接的に求められるのは、文書固有のトピック比率θ_ｂおよびトピック固有の単語出現確率分布Φ_ｚである。 Since the joint probability p(B_i, z) generally cannot be computed exactly, it is obtained, as necessary, using an approximation method such as Gibbs sampling or variational Bayesian approximation. What these approximation methods yield directly are the document-specific topic ratio θ_b and the topic-specific word appearance probability distribution Φ_z.

文書固有のトピック比率θ_ｂは、あらかじめ設定したトピックの数Ｎ_Ｚと同じ次元を持つ確率値のベクトルである。以降、この確率値のベクトルをトピックベクトルと呼ぶ。各セグメントＢ_ｉについてθ_ｂｉを計算することが、セグメントへのトピック割り当てに相当する。 The document-specific topic ratio θ_b is a vector of probability values whose dimension equals the preset number of topics N_Z. Hereinafter, this vector of probability values is called a topic vector. Computing θ_bi for each segment B_i corresponds to assigning topics to the segment.

一方、トピック固有の単語出現確率分布Φ_ｚは、全セグメントに含まれる全単語数Ｎ_ｗと同じ次元を持つ確率値のヒストグラムであり、Ｎ_Ｚ個のヒストグラムが生成される。単語出現確率分布Φ_ｚの各要素の値は、トピックｚにおける単語ｗの出現確率であり、これは、トピックｚ内での単語ｗの重要度を表していると解釈できる。 The topic-specific word appearance probability distribution Φ_z, on the other hand, is a histogram of probability values whose dimension equals the total number of words N_w contained in all segments, and N_Z such histograms are generated. The value of each element of Φ_z is the appearance probability of the word w in the topic z, which can be interpreted as the importance of the word w within the topic z.

単語出現確率分布Φ_ｚの結果を用いて、図１１のような、トピックＩＤおよびキーワードＩＤをキーとするトピック−単語重要度テーブルを構築する。トピック−単語重要度テーブルの構築について、全てのトピックとキーワードの組み合わせに対する重要度を格納したテーブルを構築してもよい。その要素数は、（トピックの数）×（キーワードの数）となる。その他、ある一定値以上の重要度をもつ要素のみを格納したり、各トピックについて上位数件のみを格納したりしてもよい。後述する重要映像区間抽出部１５では、このトピック−単語重要度テーブルを用いることにより、重要映像区間の抽出を行う。 Using the word appearance probability distribution Φ_z, a topic-word importance table keyed by topic ID and keyword ID, as shown in FIG. 11, is constructed. The table may store the importance for every combination of topic and keyword, in which case the number of elements is (number of topics) × (number of keywords). Alternatively, only elements whose importance is at or above a certain value may be stored, or only the top several entries for each topic. The important video section extraction unit 15 described later extracts important video sections by using this topic-word importance table.
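The three storage options for the topic-word importance table can be sketched in Python as follows. The function name `build_importance_table` and the nested-dictionary input format for Φ_z are hypothetical, and the probabilities in the example are made-up illustration values.

```python
def build_importance_table(phi, top_k=None, min_importance=0.0):
    """Build a {(topic_id, keyword_id): importance} table from phi.

    phi: {topic_id: {keyword_id: probability}}, one distribution per topic.
    top_k: if given, keep only the top_k keywords of each topic.
    min_importance: keep only entries at or above this value.
    """
    table = {}
    for topic_id, distribution in phi.items():
        ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
        if top_k is not None:
            ranked = ranked[:top_k]
        for keyword_id, importance in ranked:
            if importance >= min_importance:
                table[(topic_id, keyword_id)] = importance
    return table


if __name__ == "__main__":
    phi = {0: {"k1": 0.7, "k2": 0.2, "k3": 0.1},
           1: {"k1": 0.1, "k2": 0.1, "k3": 0.8}}
    print(build_importance_table(phi))                      # full table, 2 x 3 entries
    print(build_importance_table(phi, top_k=1))             # top entry per topic
    print(build_importance_table(phi, min_importance=0.5))  # thresholded
```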

（ステップＳ１０３：セグメント結合処理について） (Step S103: Segment combination process)
続いて、セグメント結合部では、トピック割り当て部により得られた各セグメントのトピックベクトルに基づく、隣接セグメントの類似度を評価することによって、セグメントの結合を行う。 Subsequently, the segment combination unit combines segments by evaluating the similarity of adjacent segments based on the topic vector of each segment obtained by the topic assignment unit.

トピック割り当て処理で述べたように、各セグメントに割り当てられるトピックベクトルは、Ｎ_Ｚ次元の確率ベクトルである。トピックベクトルに基づくセグメント間類似度を算出する方法として、例えば、トピックベクトル間の内積を計算したり、相互相関を計算したりするなどすればよい。 As described for the topic assignment process, the topic vector assigned to each segment is an N_Z-dimensional probability vector. The similarity between segments based on the topic vectors can be calculated, for example, as the inner product of the topic vectors or as their cross-correlation.

この計算によって得られた類似度に対し、閾値を設定することで隣接セグメントを結合するか否かを決定する。閾値は、例えば、あらかじめ一定の値を決めておいたり、トピック類似度の平均値から閾値を動的に設定したりするなど、様々な方法を用いることができる。 Whether or not adjacent segments are to be combined is determined by setting a threshold value for the similarity obtained by this calculation. As the threshold value, various methods can be used, for example, a predetermined value is determined in advance, or the threshold value is dynamically set based on the average value of topic similarities.

隣り合う２つのセグメントが結合すると判定された場合、セグメント情報管理テーブルが更新される。図１２は、セグメント情報更新テーブルの更新の例を示している。２つのセグメントどちらにも出現するキーワードが存在した場合、そのセグメント内頻度は合算され、一つの要素としてテーブルに格納される。 When it is determined that two adjacent segments are to be combined, the segment information management table is updated. FIG. 12 shows an example of updating the segment information management table. If a keyword appears in both of the two segments, its intra-segment frequencies are summed and stored in the table as a single element.
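A minimal Python sketch of one pass of the combination step is given below, using the inner product as the similarity and a fixed threshold. The dictionary layout of a segment and the function names are hypothetical. The sketch compares the topic vectors of the original adjacent pairs, so a run of mutually similar neighbors collapses into one segment in a single pass; this is one possible reading, not necessarily the behavior intended in the specification.

```python
from collections import Counter


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def merge_adjacent(segments, topic_vectors, threshold):
    """Combine time-ordered neighbors whose topic similarity reaches threshold.

    segments: list of dicts with 'start', 'end', 'text', 'freq' (a Counter).
    topic_vectors: one probability vector per segment, in the same order.
    """
    if not segments:
        return []
    # Copy each record so that the input table is left untouched.
    merged = [dict(segments[0], freq=Counter(segments[0]["freq"]))]
    for prev_vec, vec, seg in zip(topic_vectors, topic_vectors[1:], segments[1:]):
        if inner_product(prev_vec, vec) >= threshold:
            last = merged[-1]
            last["end"] = seg["end"]          # extend the time interval
            last["text"] += seg["text"]       # concatenate the text
            last["freq"] += seg["freq"]       # sum frequencies of shared keywords
        else:
            merged.append(dict(seg, freq=Counter(seg["freq"])))
    return merged


if __name__ == "__main__":
    segs = [
        {"start": 0, "end": 5, "text": "a", "freq": Counter({"rain": 1})},
        {"start": 5, "end": 9, "text": "b", "freq": Counter({"rain": 2, "wind": 1})},
        {"start": 9, "end": 12, "text": "c", "freq": Counter({"goal": 3})},
    ]
    vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    for row in merge_adjacent(segs, vecs, threshold=0.5):
        print(row)
```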

（ステップＳ１０４：セグメント更新終了判定について） (Step S104: Segment update end determination)
最後に、セグメント更新終了判定部では、トピック割り当て・セグメント更新部１３の処理終了判定を行う。終了判定を行う方法は様々なものが考えられる。 Finally, the segment update end determination unit determines whether the processing of the topic assignment / segment update unit 13 should end. Various methods of making this determination are conceivable.

例えば。あらかじめトピック割り当て・セグメント更新部１３の繰り返し回数が一定数を超えた時点で終了と判定してもよい。新たに結合されたセグメントの数を判定条件として用いて、結合されるセグメントの数が一定数以下となった時点で終了と判定してもよい。 For example, the processing may be judged to have ended when the number of iterations of the topic assignment / segment update unit 13 exceeds a predetermined number. Alternatively, the number of newly combined segments may be used as the condition, and the processing may be judged to have ended when the number of segments combined in an iteration falls to or below a certain number.
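The two end conditions can be combined in a small Python driver loop, sketched below. The callback `update_once` stands for one pass of steps S101 to S103 and is a hypothetical placeholder, as are the parameter names `max_iterations` and `min_merges`.

```python
def run_update_loop(segments, update_once, max_iterations=10, min_merges=0):
    """Repeat update_once until an end condition of step S104 holds.

    update_once(segments) -> new list of segments after one S101-S103 pass.
    Stops when max_iterations passes have run, or when a pass combined
    min_merges segments or fewer.
    Returns the final segments and the number of passes executed.
    """
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        updated = update_once(segments)
        merges = len(segments) - len(updated)  # segments removed by combining
        segments = updated
        if merges <= min_merges:
            break
    return segments, iteration


if __name__ == "__main__":
    # Toy update: drop one segment per pass until only two remain.
    toy = lambda s: s[:-1] if len(s) > 2 else s
    print(run_update_loop([1, 2, 3, 4, 5], toy))
```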

〔セグメント群クラスタリング部１４の処理について〕 [Processing of the segment group clustering unit 14]
続いて、セグメント群クラスタリング部１４の処理について説明する。セグメント群クラスタリング部１４では、トピック割り当て・セグメント更新部１３の処理の結果、得られた各セグメントのトピックを用いて、セグメントをまとめる処理を行う。 Next, the processing of the segment group clustering unit 14 will be described. The segment group clustering unit 14 groups the segments using the topic of each segment obtained as a result of the processing of the topic assignment / segment update unit 13.

前述した通り、トピックは確率値のベクトルであるため、図１３に示すように、例えばＬ２ノルムなどの適当な距離尺度を用いたクラスタリング処理を適用することによって、セグメントをクラスタにまとめることができる。 As described above, since a topic is a vector of probability values, segments can be grouped into clusters by applying a clustering process using an appropriate distance measure such as an L2 norm as shown in FIG.

クラスタリングの方法としては、k-meansや、「M.M. Yeung、外１名、“Time-Constrained Clustering for Segmentation of Video into Story Unites”、International Conference on Pattern Recognition、vol.3、1996年、p.375-380」に記載されたTime-Constrained Clusteringといった公知の方法を用いることができる。 As the clustering method, known methods can be used, such as k-means or the Time-Constrained Clustering described in M.M. Yeung and one other, "Time-Constrained Clustering for Segmentation of Video into Story Unites", International Conference on Pattern Recognition, vol. 3, 1996, pp. 375-380.
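As an illustration of clustering topic vectors with the L2 norm, the following Python sketch implements a bare-bones k-means. It is not Time-Constrained Clustering, and the naive initialization from the first k vectors is an assumption made to keep the example deterministic; a practical system would more likely use a library implementation.

```python
import math


def l2(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, k, iterations=20):
    """Assign each topic vector to one of k clusters; returns cluster labels."""
    centroids = [list(v) for v in vectors[:k]]  # naive initialization (assumption)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by L2 distance.
        labels = [min(range(k), key=lambda c: l2(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    topic_vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
    print(kmeans(topic_vectors, k=2))
```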

セグメント群クラスタリング処理によって各セグメントに割り当てられたクラスタは、セグメント情報管理テーブルの要素に追加するなどしてデータ記憶部１１に格納すればよい。 A cluster assigned to each segment by the segment group clustering process may be stored in the data storage unit 11 by adding it to an element of the segment information management table.

〔重要映像区間抽出部１５の処理について〕 [Processing of the important video section extraction unit 15]
続いて、重要映像区間抽出部１５の処理について説明する。重要映像区間抽出部１５では、トピック割り当て・セグメント更新部１３の処理により得られた単語重要度を用いて、映像区間候補の中から、重要な区間の抽出を行う。 Next, the processing of the important video section extraction unit 15 will be described. The important video section extraction unit 15 extracts important sections from among the video section candidates using the word importance obtained by the processing of the topic assignment / segment update unit 13.

映像区間候補の与え方は様々考えられるが、各候補は、トピック割り当て・セグメント更新部１３の処理により最終的に得られたセグメントの部分要素として定義されるものとする。 There are various ways of giving the video section candidates, but each candidate is defined as a partial element of the segment finally obtained by the processing of the topic assignment / segment update unit 13.

映像区間候補の与え方として、例えば、入力された時間区間付きテキストの全要素を重要映像区間と定義したり、セグメント初期化部１２の処理により得られた初期セグメントを重要映像区間の候補としたりする方法が考えられる。 As examples of how to give video segment candidates, for example, all elements of the input text with time segment are defined as important video segments, or initial segments obtained by the processing of the segment initialization unit 12 are selected as important video segment candidates. A way to do this is conceivable.

これら映像区間候補の中から、重要な区間を抽出する方法としては、例えば、以下の式（３）に基づいて各トピックについて各候補のスコアＳｃｏｒｅを算出し、スコアＳｃｏｒｅの高い候補を重要映像区間とするといった方法が考えられる。

One conceivable method of extracting important sections from these video section candidates is, for example, to calculate a score Score for each candidate with respect to each topic based on the following equation (3) and to take the candidates with high scores as the important video sections.

ここで、Ｓｃｏｒｅ（Ｂ^０ _ｊ）は初期セグメントＢ^０ _ｊのスコアを表す。Ｎ_ｂｊｗｉはセグメントＢ^０ _ｊに含まれるキーワードｗ_ｉの頻度、Φ_{ｚｋ,ｗｉ}はトピックｚ_ｋにおけるキーワードｗ_ｉの重要度、θ_{Ｂｊ，ｚｋ}はセグメントＢ^０ _ｊに割り当てられた確率ベクトルのうちトピックｚ_ｋに対応する要素の値を表す。 Here, Score(B^0_j) represents the score of the initial segment B^0_j. N_bjwi is the frequency of the keyword w_i contained in the segment B^0_j, Φ_{zk,wi} is the importance of the keyword w_i in the topic z_k, and θ_{Bj,zk} is the value of the element corresponding to the topic z_k in the probability vector assigned to the segment B^0_j.

式（３）は、各初期セグメントに関するスコアを与える式であるが、セグメントＢ^０ _ｊに対応する部分を適宜変更することで、異なる区間が定義された場合にも定義可能である。 Equation (3) gives a score for each initial segment, but by appropriately changing the part corresponding to the segment B^0_j, a score can also be defined when the candidate sections are defined differently.
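Equation (3) itself is not reproduced in this text, so its exact form is not known here. The following Python sketch only illustrates how the three quantities defined above could be combined. It assumes, purely for illustration and not as the formula of the specification, that the score of a candidate for topic z_k is θ_{Bj,zk} multiplied by the sum over keywords of N_bjwi × Φ_{zk,wi}. All function names are hypothetical.

```python
def candidate_score(keyword_freq, phi_k, theta_jk):
    """Assumed score: theta_jk * sum_i(N_bjwi * Phi_{zk,wi}).

    keyword_freq: {keyword: frequency in the candidate section}  (N_bjwi)
    phi_k: {keyword: importance of the keyword in topic z_k}     (Phi_{zk,wi})
    theta_jk: weight of topic z_k in the candidate's topic vector
    """
    weighted = sum(n * phi_k.get(w, 0.0) for w, n in keyword_freq.items())
    return theta_jk * weighted


def top_candidates(candidates, phi_k, top_n=1):
    """candidates: list of (name, keyword_freq, theta_jk); highest scores first."""
    ranked = sorted(candidates,
                    key=lambda c: candidate_score(c[1], phi_k, c[2]),
                    reverse=True)
    return [name for name, _, _ in ranked[:top_n]]


if __name__ == "__main__":
    phi_weather = {"rain": 0.5, "wind": 0.25}
    cands = [("greeting", {"morning": 1}, 0.5),
             ("forecast", {"rain": 2, "wind": 1}, 0.5)]
    print(top_candidates(cands, phi_weather))
```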

〔映像出力部１６の処理について〕 [Processing of the video output unit 16]
最後に、映像出力部１６の処理について説明する。映像出力部１６では、セグメント群クラスタリング部１４の処理により得られた各クラスタの映像区間および重要映像区間抽出部１５の処理により得られた各クラスタの重要映像区間の少なくとも一方を用いて、映像を出力する。 Finally, the processing of the video output unit 16 will be described. The video output unit 16 outputs video using at least one of the video sections of each cluster obtained by the processing of the segment group clustering unit 14 and the important video sections of each cluster obtained by the processing of the important video section extraction unit 15.

映像出力の方法は様々なものが考えられる。最も単純な方法の一つとして、クラスタ毎に再構成された各映像区間群を、各映像のタイムスタンプ順で出力するといった方法が考えられる。他にも、入力された映像群にチャンネルの情報が付与されている場合、チャンネルごとに映像区間をソートしたり、チャンネルごとにビデオプレイヤーを設置し、それらを同時に出力するなどしてもよい。 There are various video output methods. As one of the simplest methods, a method of outputting each video segment group reconstructed for each cluster in the order of the time stamp of each video is conceivable. In addition, when channel information is assigned to the input video group, the video sections may be sorted for each channel, or a video player may be installed for each channel and output them at the same time.
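The simplest output method, playing the sections of each cluster in timestamp order, can be sketched in Python as below. The dictionary keys `cluster`, `timestamp`, and `start` and the function name `playlist_by_cluster` are hypothetical; ISO-formatted timestamp strings are assumed so that string order equals chronological order.

```python
from collections import defaultdict


def playlist_by_cluster(sections):
    """Group video sections by cluster, ordered by broadcast time then start.

    sections: list of dicts with 'cluster', 'timestamp', 'start', 'video_id'.
    Returns {cluster: [section, ...]} with each list in playback order.
    """
    playlists = defaultdict(list)
    for section in sorted(sections, key=lambda s: (s["timestamp"], s["start"])):
        playlists[section["cluster"]].append(section)
    return dict(playlists)


if __name__ == "__main__":
    sections = [
        {"cluster": 0, "timestamp": "2012-12-05T19:00", "start": 30.0, "video_id": "b"},
        {"cluster": 1, "timestamp": "2012-12-05T07:00", "start": 90.0, "video_id": "a"},
        {"cluster": 0, "timestamp": "2012-12-05T07:00", "start": 10.0, "video_id": "a"},
    ]
    for cluster, items in playlist_by_cluster(sections).items():
        print(cluster, [(s["video_id"], s["start"]) for s in items])
```

Sorting per channel, as mentioned above, would only require adding a channel field to the sort key.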

セグメント群クラスタリング部１４の処理により得られた各クラスタおよび重要映像区間抽出部１５の処理により得られた各クラスタの重要映像区間を併用する方法として、図１４に示すような映像視聴インタフェースを用意してもよい。 As a method of using each cluster obtained by the processing of the segment group clustering unit 14 and the important video section of each cluster obtained by the processing of the important video section extracting unit 15, a video viewing interface as shown in FIG. 14 is prepared. May be.

図１４において、映像再生部１７の下には、映像群に含まれる各映像に対応するバーが表示されている。バーの長さは、各映像の映像長に比例しており、ドット部は、本実施の形態により得られたあるクラスタに含まれる映像区間を表している。その中における重要映像区間が、バー内の斜線部で表示されている。 In FIG. 14, bars corresponding to the videos included in the video group are displayed below the video playback unit 17. The length of the bar is proportional to the video length of each video, and the dot portion represents a video section included in a certain cluster obtained by the present embodiment. The important video section is displayed in the shaded area in the bar.

バーの任意の場所に対しクリック等の操作を行うことによって、該当シーンへのシークが行われる。図１４のようなインタフェースによって、映像群の全体像を把握しつつ、任意のクラスタに含まれるシーンの映像視聴が可能となる。 By performing an operation such as clicking on an arbitrary place on the bar, seeking to the corresponding scene is performed. The interface as shown in FIG. 14 makes it possible to view a video of a scene included in an arbitrary cluster while grasping the entire image of the video group.

なお、ここで示したものはあくまでも映像出力方法の例であり、本発明の技術が適用可能な範囲において、いかなる映像出力形態をとっても構わない。 Note that what is shown here is merely an example of a video output method, and any video output form may be used as long as the technology of the present invention is applicable.

以上より、本実施の形態によれば、時間区間付きテキストデータを時間軸に沿って１つ以上のセグメントに分割し、各セグメント内のキーワードに応じたトピックを各セグメントにそれぞれ割り当てる処理とトピックの類似度に基づいて時間的に隣接するセグメントを結合する処理とを繰り返し、繰り返された後の各セグメントを当該セグメントに割り当てられたトピック毎にクラスタリングするので、図１５〜図１８に示すように、何らの事前知識を用いることなく映像群を自動的に話題毎の映像区間群に再構成できる。再構成された映像は話題毎にまとめられたものであるため、視聴者はこれを視聴することにより、容易に話題の内容を把握することができる。 As described above, according to the present embodiment, the text data with time intervals is divided into one or more segments along the time axis, the process of assigning to each segment a topic corresponding to the keywords in that segment and the process of combining temporally adjacent segments based on topic similarity are repeated, and the segments obtained after the repetition are clustered by the topic assigned to each segment. Therefore, as shown in FIGS. 15 to 18, a video group can be automatically reconstructed into groups of video sections for each topic without using any prior knowledge. Since the reconstructed video is organized by topic, the viewer can easily grasp the content of each topic by viewing it.

また、本実施の形態によれば、上記繰り返された後のトピックに対するキーワードの重要度に基づいて、クラスタリングされたトピック内のセグメントに対応する映像区間から重要映像区間を抽出するため、話題毎の要約映像を出力することができる。この要約映像を視聴することにより、視聴者はより短時間で話題の内容を把握することができる。 In addition, according to the present embodiment, since the important video section is extracted from the video sections corresponding to the segments in the clustered topic based on the importance of the keyword with respect to the topic after the above repetition, A summary video can be output. By viewing this summary video, the viewer can grasp the content of the topic in a shorter time.

DESCRIPTION OF SYMBOLS
１…映像群再構成・要約装置 1 ... Video group reconstruction / summarization device
１１…データ記憶装置 11 ... Data storage device
１２…セグメント初期化部 12 ... Segment initialization unit
１３…トピック割り当て・セグメント更新部 13 ... Topic assignment / segment update unit
１４…セグメント群クラスタリング部 14 ... Segment group clustering unit
１５…重要映像区間抽出部 15 ... Important video section extraction unit
１６…映像出力部 16 ... Video output unit
１７…映像再生部 17 ... Video playback unit
Ｓ１０１〜Ｓ１０４…ステップ S101 to S104 ... Steps

Claims

1. A video group reconstruction / summarization device that reconstructs a video group composed of videos to which text data with time intervals is attached, and summarizes the videos, the device comprising:
segment initialization means for reading the text data with time intervals from data storage means and dividing it into one or more segments along the time axis;
topic assignment / segment update means for repeating a process of assigning to each segment a topic corresponding to the keywords in that segment and a process of combining temporally adjacent segments based on the similarity of the topics;
segment group clustering means for clustering the segments obtained after the repetition by the topic assigned to each segment;
important video section extraction means for extracting important video sections from the video sections corresponding to the segments in each clustered topic, based on the importance of the keywords for the topic; and
video output means for outputting video based on at least one of the video sections of each cluster and the important video sections of each cluster.

2. The video group reconstruction / summarization device according to claim 1, wherein the topic assignment / segment update means repeats, until a predetermined end condition is satisfied:
segment feature amount calculation processing for calculating the feature amount of each segment based on the frequency of use of the keywords included in the segment;
topic assignment processing for assigning to the segment a topic vector based on the feature amount; and
segment combination processing for combining adjacent segments when the similarity between the segments, calculated using the topic vectors, is equal to or greater than a threshold.

3. The video group reconstruction / summarization device according to claim 1, wherein the segment initialization means divides the text data with time intervals so that the information amount per segment is larger than a threshold value and one segment does not span a plurality of topics.

4. The video group reconstruction / summarization device according to claim 2, wherein the topic assignment / segment update means performs the segment feature amount calculation processing by further using the feature amounts of the temporally preceding and/or following segments.

5. A video group reconstruction / summarization method that reconstructs a video group composed of videos to which text data with time intervals is attached, and summarizes the videos, the method comprising, by a computer:
a segment initialization step of reading the text data with time intervals from data storage means and dividing it into one or more segments along the time axis;
a topic assignment / segment update step of repeating a process of assigning to each segment a topic corresponding to the keywords in that segment and a process of combining temporally adjacent segments based on the similarity of the topics;
a segment group clustering step of clustering the segments obtained after the repetition by the topic assigned to each segment;
an important video section extraction step of extracting important video sections from the video sections corresponding to the segments in each clustered topic, based on the importance of the keywords for the topic; and
a video output step of outputting video based on at least one of the video sections of each cluster and the important video sections of each cluster.

6. The video group reconstruction / summarization method according to claim 5, wherein the topic assignment / segment update step repeats, until a predetermined end condition is satisfied:
segment feature amount calculation processing for calculating the feature amount of each segment based on the frequency of use of the keywords included in the segment;
topic assignment processing for assigning to the segment a topic vector based on the feature amount; and
segment combination processing for combining adjacent segments when the similarity between the segments, calculated using the topic vectors, is equal to or greater than a threshold.

7. The video group reconstruction / summarization method according to claim 5 or 6, wherein the segment initialization step divides the text data with time intervals so that the information amount per segment is larger than a threshold value and one segment does not span a plurality of topics.

8. The video group reconstruction / summarization method according to claim 6, wherein the topic assignment / segment update step performs the segment feature amount calculation processing by further using the feature amounts of the temporally preceding and/or following segments.

9. A video group reconstruction / summarization program that causes a computer to execute the video group reconstruction / summarization method according to claim 5.