JP2013250653A

JP2013250653A - Content summarization support device, method and program

Info

Publication number: JP2013250653A
Application number: JP2012123393A
Authority: JP
Inventors: Tsutomu Hirao; 努平尾; Shogo Kimura; 昭悟木村; Katsuhiko Ishiguro; 勝彦石黒; Tomoharu Iwata; 具治岩田; Do Kevin; ドゥケヴィン; Seimin Ooyo; セイミンオウヨウ
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-05-30
Filing date: 2012-05-30
Publication date: 2013-12-12
Anticipated expiration: 2032-05-30
Also published as: JP5791568B2

Abstract

PROBLEM TO BE SOLVED: To support the generation of a summary of content. SOLUTION: A partial summary input part 1 accepts a partial summary input by an editor, a content search part 2 searches for content candidates to be included in the summary, and a content feature extraction part 3 extracts, for each retrieved content candidate, a feature vector relating to its similarity to the partial summary. A ranking part 4 ranks the content candidates in order of usefulness for reinforcing the partial summary, on the basis of a ranking function learned in advance and the extracted feature vector of each content candidate.

Description

The present invention relates to a content summarization support device, method, and program, and more particularly to a content summarization support device, method, and program that support the generation of a summary composed of a plurality of pieces of content.

The era of social media, in which anyone can publish on the Internet, has arrived. At the same time, so much information now circulates that it has become difficult to read and use all of it, and there is strong demand for mechanisms that extract and present only the useful information. Against this background, editors (curators), who organize and summarize the large volume of content produced by content authors (creators) and present it to consumers, have recently attracted attention on social media. For example, "summary sites" such as Togetter (registered trademark) and NAVER Matome (registered trademark) are widely used as sites where editors compile information from Twitter (registered trademark), one of the representative social media services. On these sites, editors collect existing content, organize it from their own point of view, and deliver the resulting summary back to consumers. Consumers can gather information more efficiently and understand it more easily by reading content summarized by an editor than by reading the authors' content directly. FIG. 1 schematically shows this mechanism.

To create these summaries, an editor must select only the necessary content from the vast amount of social media content. On current summary sites, however, this selection must be done almost entirely by hand, which imposes a heavy burden on the editor.

Several methods have been devised for extracting and presenting only useful information from social media. Non-Patent Documents 1 and 2 propose methods for recommending content authors whose content resembles a consumer's preferences. Non-Patent Document 3 proposes a method for excluding content considered unnecessary or harmful from social media content. Non-Patent Document 4 proposes a system that automatically collects and summarizes social media content related to an event.

Non-Patent Document 1: Greene, D.; Reid, F.; Cunningham, P.; and Sheridan, G. “Supporting the curation of twitter user lists”, In NIPS Workshop on Computational Social Science and the Wisdom of Crowds, 2011.
Non-Patent Document 2: Hannon, J.; Bennett, M.; and Smyth, B. “Recommending twitter users to follow using content and collaborative filtering approaches”, In ACM Conference on Recommender Systems (RecSys10), 2010.
Non-Patent Document 3: Dan, O.; Feng, J.; and Davison, B. “Filtering microblogging messages for social TV”, In Proceedings of the World Wide Web Conference (WWW), 2011.
Non-Patent Document 4: Chakrabarti, D., and Punera, K. “Event summarization using tweets”, In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

However, it is difficult to repurpose the techniques described in Non-Patent Documents 1 to 4 as methods for generating, or supporting the generation of, a summary that reflects the editor's intent.

The present invention has been made in view of the above circumstances, and its object is to provide a content summarization support device, method, and program capable of supporting the generation of a summary of content.

To achieve the above object, a content summarization support device according to the present invention supports selecting a plurality of pieces of content from a set of content composed of electronic data and generating a summary made up of the selected pieces of content. The device includes: partial summary input means for accepting a partial summary, input by a user, that is a subset of the content constituting the summary; content candidate search means for searching the set of content, based on the content included in the partial summary, for content candidates, that is, content that is a candidate for inclusion in the summary; content feature extraction means for extracting, for each content candidate retrieved by the content candidate search means, a feature amount relating to the similarity between the partial summary and the content candidate; and ranking means for ranking the content candidates retrieved by the content candidate search means in order of usefulness for reinforcing the partial summary, based on a function learned in advance that outputs, from the feature amount, a value indicating the degree of usefulness for reinforcing the partial summary, and on the feature amount of each content candidate extracted by the content feature extraction means.

A content summarization support method according to the present invention is a method used in a content summarization support device that supports selecting a plurality of pieces of content from a set of content composed of electronic data and generating a summary made up of the selected pieces of content. In the method, partial summary input means accepts a partial summary, input by a user, that is a subset of the content constituting the summary; content candidate search means searches the set of content, based on the content included in the partial summary, for content candidates, that is, content that is a candidate for inclusion in the summary; content feature extraction means extracts, for each content candidate retrieved by the content candidate search means, a feature amount relating to the similarity between the partial summary and the content candidate; and ranking means ranks the content candidates retrieved by the content candidate search means in order of usefulness for reinforcing the partial summary, based on a function learned in advance that outputs, from the feature amount, a value indicating the degree of usefulness for reinforcing the partial summary, and on the feature amount of each content candidate extracted by the content feature extraction means.

A program according to the present invention is a program for causing a computer to function as each of the means of the content summarization support device described above.

As described above, according to the content summarization support device, method, and program of the present invention, a feature amount relating to the similarity between the input partial summary and each content candidate is extracted, and the content candidates are ranked in order of usefulness for reinforcing the partial summary, based on a function learned in advance that outputs a value indicating the degree of usefulness for reinforcing the partial summary and on the extracted feature amount of each content candidate. This provides the effect of supporting the generation of a summary of content.

FIG. 1 is a diagram showing the relationship among authors, editors, and consumers of social media.
FIG. 2 is a diagram showing an outline of the present invention.
FIG. 3 is a schematic diagram showing the configuration of the content summarization support device according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the ranking function learning processing routine in the content summarization support device according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the summarization support processing routine in the content summarization support device according to the embodiment of the present invention.
FIG. 6 is a diagram showing experimental results.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. The embodiment is described using, as an example, a case in which the present invention is applied to a content summarization support device that supports the summarization of content, where the content consists of electronic documents provided by a social service such as Twitter (registered trademark).

<Outline of the invention>
FIG. 2 outlines the concept of the method for supporting content summarization according to the present invention.

First, assume that the system holds summary content that the editor has compiled so far and is still editing (a partially-curated list). For simplicity, this summary content under editing is hereinafter called the partial summary $S$. That is, the partial summary $S$ is a set of content $S = \{s_1, s_2, \ldots, s_{N_s}\}$ selected by the editor, and is a summary that is not yet complete. As an example, when the content consists of Twitter (registered trademark) posts (tweets), a partial summary may be a set of tweets on a specific theme that the editor has selected or has in mind.

Next, for each content item $s_i$ ($i = 1, 2, \ldots, N_s$) in the partial summary $S$, with author $a(s_i) = a_i$, the system collects the other content authored by $a_i$, or all content that $a_i$ can view (the timeline). To these it adds the content that the editor, denoted $a_0$, can view, forming the content candidate set $T = \{t_1, t_2, \ldots, t_{N_T}\}$ of candidates for inclusion in the summary. The elements $t_j$ ($j = 1, 2, \ldots, N_T$) of $T$ are ranked and presented to the editor.

Based on this ranking, the editor selects the content $\tilde{T} \subseteq T$ to include in the summary and adds the selected $\tilde{T}$ to the partial summary $S$ to form a new partial summary, as shown in equation (1):

$$S \leftarrow S \cup \tilde{T} \qquad (1)$$

The summary is completed by repeating the above steps.
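The iterative procedure described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the names `curate`, `rank`, and `choose` are hypothetical and do not appear in the source, and the stopping rule (stop when the editor selects nothing, or after a fixed number of rounds) is assumed rather than specified.

```python
from typing import Callable, List, Set


def curate(
    partial_summary: Set[str],
    candidates: Set[str],
    rank: Callable[[Set[str], Set[str]], List[str]],
    choose: Callable[[List[str]], Set[str]],
    max_rounds: int = 10,
) -> Set[str]:
    """Repeat: rank the candidates, let the editor choose, then grow the summary."""
    summary = set(partial_summary)
    for _ in range(max_rounds):
        remaining = candidates - summary
        if not remaining:
            break
        ranked = rank(summary, remaining)  # candidates, most useful first
        selected = choose(ranked)          # editor's selection, a subset of T
        if not selected:                   # assumed stopping rule
            break
        summary |= selected                # the update in equation (1)
    return summary
```

Here `rank` stands in for the learned ranking step and `choose` for the editor's manual selection, so the automated part only reorders candidates and never adds content on its own.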

<Configuration of the content summarization support device>
As shown in FIG. 3, the content summarization support device 100 according to the present embodiment is implemented on a computer that includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory) storing a program for executing the content summarization support processing routine described later. Functionally, it is configured as follows. The content summarization support device 100 includes a partial summary input unit 1, a content search unit 2, a content feature extraction unit 3, a ranking unit 4, an output unit 5, a learning data input unit 6, a ranking learning data creation unit 7, a learning content feature extraction unit 8, and a ranking function learning unit 9. Social media is an example of a content provision and browsing system.

The partial summary input unit 1 accepts the partial summary under editing (a plurality of content items) input by the editor.

The content search unit 2 searches the content database 20, which stores the set of content provided by all users of the social media service, for content candidates that may be useful in composing the summary. That is, it searches for content using each content item $s_i$ ($i = 1, 2, \ldots, N_s$) in the input partial summary $S$ as a query. The concrete method depends on the characteristics of the social media service from which content is retrieved and collected. One generally applicable approach is to extract important words from each content item $s_i$ in the partial summary and search for related content using those words as queries. In the present embodiment, the author $a(s_i) = a_i$ of each content item $s_i$ is uniquely determined, so the content search unit 2 retrieves from the content database 20 the content that the editor who input the partial summary has provided as an author and the content that each $a_i$ can view as a consumer, and treats all of the retrieved content as content candidates.

Here, "content viewable as a consumer" is explained using Twitter (registered trademark) as an example. On Twitter (registered trademark), as with e-mail, each account is given a name that uniquely identifies it.

For simplicity, these names are represented by symbols such as A, B, C, and so on. Twitter (registered trademark) also has a feature called the timeline: each account designates other accounts, such as friends or celebrities, and the content (tweets) created by the designated accounts is arranged in chronological order.

In this case, the timeline of account A corresponds to the content that account A can view as a consumer.

It should also be noted that on Twitter (registered trademark), account A can be both an author of content (tweets) and, through its timeline, a consumer of content (tweets).

Accordingly, if a partial summary is represented as a set of content items whose distinct authors are A, B, and C, then the timelines of A, B, and C constitute the "content that the authors of the content in the partial summary can view as consumers".

The content search unit 2 passes the content candidate set $T = \{t_1, t_2, \ldots, t_{N_T}\}$, obtained by retrieving the content that the editor who input the partial summary has provided as an author and the content that each $a_i$ can view as a consumer, to the content feature extraction unit 3.
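The candidate search described above can be sketched with in-memory dictionaries standing in for the content database 20. All names here (`collect_candidates`, `author_of`, `posts_by`, `timeline_of`) are hypothetical, and removing items that are already in the partial summary is an assumption made for this sketch, not something the source states.

```python
from typing import Dict, List, Set


def collect_candidates(
    partial_summary: List[str],
    author_of: Dict[str, str],
    posts_by: Dict[str, Set[str]],
    timeline_of: Dict[str, Set[str]],
    editor: str,
) -> Set[str]:
    """Build the candidate set T from the editor's own posts and the authors' timelines."""
    # Distinct authors a_i of the content items in the partial summary.
    authors = {author_of[s] for s in partial_summary}
    # Content the editor provided as an author.
    candidates: Set[str] = set(posts_by.get(editor, set()))
    # Content each author a_i can view as a consumer (its timeline).
    for a in authors:
        candidates |= timeline_of.get(a, set())
    # Assumption: items already in the partial summary are not offered again.
    return candidates - set(partial_summary)
```

A real service would replace the dictionary lookups with queries against its own storage or API.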

For each content candidate in the retrieved content candidate set, the content feature extraction unit 3 uses the input partial summary to extract the feature vector described later.

The ranking unit 4 reorders the content candidate set $T = \{t_1, t_2, \ldots, t_{N_T}\}$ output from the content search unit 2. That is, it ranks the candidates so that useful content $t_j$ that should be added to the partial summary $S$ is placed higher. The machine learning framework for this kind of ranking is generally called learning to rank. The ranking function is learned by the learning data input unit 6, the ranking learning data creation unit 7, the learning content feature extraction unit 8, and the ranking function learning unit 9, as described below.

Separately from the partial summary $S$ and the content candidate set $T$, the learning data input unit 6 accepts a plurality ($N_L$) of learning partial summaries $S_k = \{s_{k,i}\}_{i=1}^{N_{s,k}}$ and learning content candidate sets $T_k = \{t_{k,j}\}_{j=1}^{N_{T,k}}$ ($k = 1, 2, \ldots, N_L$), input through a known input device such as a keyboard, mouse, or storage device.

The ranking learning data creation unit 7 creates the learning data needed to learn, in advance, the function used for ranking. For each learning content candidate $t_{k,j}$ ($j = 1, 2, \ldots, N_{T,k}$), it accepts the input of a label $l_{k,j}$ that indicates whether the candidate is useful for reinforcing the partial summary $S_k$, and assigns the label to $t_{k,j}$. For example, label 5 is given when the candidate is useful, label 1 when it is not useful, and labels 2, 3, and 4 in ambiguous cases, ordered from least to most apparently useful. This labeling is performed for the learning content candidate sets $T_L = \{T_1, T_2, \ldots, T_{N_L}\}$ corresponding to all learning partial summaries $S_L = \{S_1, S_2, \ldots, S_{N_L}\}$, and the learning data $(S_L, T_L, L_L)$ is created by including the assigned labels $L_L = \{L_1, L_2, \ldots, L_{N_L}\}$, $L_k = \{l_{k,1}, l_{k,2}, \ldots, l_{k,N_{T,k}}\}$ ($k = 1, 2, \ldots, N_L$).

The learning content feature extraction unit 8 extracts, from the learning content, the feature vectors that are input to the ranking function. Specifically, for each learning content candidate $t_{k,j}$ ($j = 1, 2, \ldots, N_{T,k}$) with respect to the learning partial summary $S_k$ ($k = 1, 2, \ldots, N_L$), it extracts at least one of the word similarity features, attention word features, and network features described below as the feature vector $\mathbf{f}_{k,j} = (f_{k,j}^{(1)}, f_{k,j}^{(2)}, \ldots, f_{k,j}^{(N_F)})^{\top}$.

The word similarity feature is a feature amount that represents the surface-level textual similarity of content, that is, similarity in what the content says. For example, it can be obtained as follows.

First, the learning content feature extraction unit 8 obtains a vector representation $\mathbf{t}_{k,j}$ from the content candidate $t_{k,j}$. When there are $N_W$ distinct words that can appear, this vector representation is an $N_W$-dimensional vector. Each element of the vector is computed from the frequency $c(j, w)$ of word $w$ in the content $t_{k,j}$, using at least one of the nonlinear transformation frequency, TF-IDF, and binary weight described below.

The nonlinear transformation frequency is calculated according to equation (2) below.

TF-IDF is calculated according to equation (3) below.

Here, $d(w)$ is the frequency with which word $w$ appears in content (the document frequency).

The binary weight is calculated according to equation (4) below.
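The bodies of equations (2) to (4) are not reproduced in this text, so the following sketch uses commonly seen forms of the three weighting schemes as stand-ins. These are assumptions, not the patent's own formulas: $\log(1 + c)$ for the nonlinear transformation, $c \cdot \log(N / d(w))$ for TF-IDF with $N$ the number of content items, and 1 for any word that occurs for the binary weight. The function name `term_weights` and its parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def term_weights(
    tokens: List[str],
    doc_freq: Dict[str, int],
    n_docs: int,
    scheme: str,
) -> Dict[str, float]:
    """Return a sparse weight vector for one content item (absent words weigh 0)."""
    counts = Counter(tokens)  # c(j, w): frequency of word w in this content item
    weights: Dict[str, float] = {}
    for w, c in counts.items():
        if scheme == "log":       # assumed form of the nonlinear transformation
            weights[w] = math.log(1.0 + c)
        elif scheme == "tfidf":   # assumed form: c * log(N / d(w))
            weights[w] = c * math.log(n_docs / doc_freq.get(w, 1))
        elif scheme == "binary":  # 1 if the word occurs at all
            weights[w] = 1.0
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return weights
```

Storing the vector as a dictionary keeps it sparse, which suits short content such as tweets where most of the $N_W$ dimensions are zero.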

In the same way that it obtains the vector representation $\mathbf{t}_{k,j}$ from the content candidate $t_{k,j}$, the learning content feature extraction unit 8 obtains a vector representation $\mathbf{s}_{k,i}$ from each content item $s_{k,i}$ in the partial summary.

Then, as the word similarity $f_{k,j}^{(m)}$, the learning content feature extraction unit 8 computes, according to equation (5) below, the cosine similarity between the vector representation $\mathbf{t}_{k,j}^{(m)}$ of the learning content candidate and the vector representation $\mathbf{s}_{k,i}$ of each content item in the corresponding learning partial summary $S_k$.

The computed word similarity $f_{k,j}^{(m)}$ is adopted as a feature amount of the learning content candidate $t_{k,j}$ with respect to the learning partial summary $S_k$. Besides cosine similarity, BM25 similarity, which is frequently used in the field of information retrieval, can also be used as the similarity between content items. BM25 similarity is described in the non-patent literature (Robertson, S., Zaragoza, H., and Taylor, M. 2004. Simple BM25 Extension to multiple weighted fields. Proc. of CIKM.).
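The cosine similarity step can be sketched as follows for sparse vectors stored as dictionaries. How equation (5) combines the similarities to the individual items of the partial summary is not visible in this text, so taking the maximum over the items is an assumption of this sketch. The names `cosine` and `word_similarity_feature` are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity_feature(candidate: Vector, summary_items: List[Vector]) -> float:
    """Similarity of one candidate to a partial summary (max over items is assumed)."""
    return max((cosine(candidate, s) for s in summary_items), default=0.0)
```

Other aggregations, such as the mean over the items, would fit the same interface and could be supplied as additional feature dimensions.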

The attention word feature is a feature amount that represents similarity in the information the author regards as important (headwords, emphasized words, and the like). For example, it can be obtained as follows. The basic calculation is almost the same as for word similarity, except that the vector is computed over only the $M$ words of interest rather than all words that can appear ($M < N_W$). The definition of an attention word depends on the social media service from which content is retrieved and collected, but basically words that the content's author has designated in some way are used. In the present embodiment, content is collected from Twitter (registered trademark), so the words contained in hashtags, which are phrases prefixed with the # (hash) symbol, are used as attention words. If no attention word is present in the content, the feature is zero.
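Extracting hashtag-based attention words can be sketched with a regular expression. This is a simplification under stated assumptions: the pattern treats any run of word characters after `#` as a hashtag, lowercasing the tags is an added normalization, and the name `attention_words` is hypothetical. Real hashtag rules on a given service may differ.

```python
import re
from typing import List

# Simplified pattern: '#' followed by one or more word characters.
# In Python 3, \w also matches non-ASCII letters, so Japanese tags are covered.
HASHTAG = re.compile(r"#(\w+)")


def attention_words(text: str) -> List[str]:
    """Return the hashtag words in a post, without the leading '#'."""
    # Lowercasing is an assumption made here so that #Tokyo and #tokyo match.
    return [tag.lower() for tag in HASHTAG.findall(text)]
```

The resulting word list can then be weighted and compared in the same way as the full word vectors, restricted to these attention words.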

For the vector representation $\mathbf{t}_{k,j}$ of the content candidate $t_{k,j}$ and the vector representation $\mathbf{s}_{k,i}$ of each content item $s_{k,i}$ in the partial summary, the learning content feature extraction unit 8 computes each element as at least one of the nonlinear transformation frequency, TF-IDF, and binary weight, based on the frequency $c(j, w)$ of attention word $w$ in the content $t_{k,j}$. Then, as with the word similarity feature, it computes the cosine similarity between the vector representation $\mathbf{t}_{k,j}^{(m)}$ of the learning content candidate and the vector representation $\mathbf{s}_{k,i}$ of each content item in the corresponding learning partial summary $S_k$, takes it as the attention word similarity, and adopts it as a feature amount of the learning content candidate $t_{k,j}$ with respect to $S_k$.

The network feature is a feature amount that represents similarity in terms of the relationships between authors and consumers. An advantage of social media is that various network features can be extracted in addition to linguistic information. For example, features such as (a) to (d) below, which express the similarity between content items through the social network as binary values, can be extracted.

(a) Whether the content candidate $t_{k,j}$ and the content $s_{k,i}$ have the same author.
(b) Whether the content candidate $t_{k,j}$ and the content $s_{k,i}$ designate the same recipient.
(c) Whether the content candidate $t_{k,j}$ and the content $s_{k,i}$ contain the same HTTP link.
(d) Whether the author of the content $s_{k,i}$ is designated as a recipient of the content candidate $t_{k,j}$.

Here, a recipient of content is a content consumer whom the author explicitly indicates within the content.
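The four binary network features can be sketched as follows. The `Post` record and the function `network_features` are hypothetical, and reading "the same recipient" and "the same HTTP link" as "at least one in common" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Post:
    """Minimal, hypothetical record of one content item."""
    author: str
    recipients: FrozenSet[str] = frozenset()  # accounts explicitly addressed in the post
    links: FrozenSet[str] = frozenset()       # HTTP links contained in the post


def network_features(t: Post, s: Post) -> List[int]:
    """Binary features (a) to (d) for a candidate t and a summary item s."""
    return [
        int(t.author == s.author),               # (a) same author
        int(bool(t.recipients & s.recipients)),  # (b) a recipient in common (assumed reading)
        int(bool(t.links & s.links)),            # (c) an HTTP link in common (assumed reading)
        int(s.author in t.recipients),           # (d) t addresses the author of s
    ]
```

For example, a reply from A addressed to B, compared with a post written by B, yields 1 for feature (d) even though the authors differ.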

Based on the above definitions, a feature vector is extracted from each pair $(S_L, T_L)$ of learning partial summary and content candidate set in the learning data $(S_L, T_L, L_L)$. The pair $(F_L, L_L)$, consisting of the set of feature vectors $F_L = \{F_1, F_2, \ldots, F_{N_L}\}$, $F_k = \{\mathbf{f}_{k,1}, \mathbf{f}_{k,2}, \ldots, \mathbf{f}_{k,N_{T,k}}\}$, and the label set $L_L$, is passed to the ranking function learning unit 9.

Using the pair $(F_L, L_L)$ of feature vector set and label set output from the learning content feature extraction unit 8, the ranking function learning unit 9 learns a ranking function. Given a partial summary $S$, this function outputs, for each element of the content candidate set $T$, a value used to rank it, namely a value indicating how useful the element is for reinforcing the partial summary. Many existing learning techniques can be used to learn this function; the present embodiment uses the method called SVMrank described in the non-patent literature (Joachims, T. 2006. Training linear SVMs in linear time. In Proceedings of Knowledge Discovery and Data Mining (KDD).). The resulting ranking function is passed to the ranking unit 4.
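To illustrate the kind of learning involved, the sketch below trains a linear scoring function with a pairwise hinge loss using plain stochastic subgradient steps. It is a simplified stand-in for the idea behind ranking SVMs, not the SVMrank algorithm or tool itself. The function names and the hyperparameters (`epochs`, `lr`, `reg`) are hypothetical choices for this example.

```python
import random
from typing import List, Sequence, Tuple

Group = Tuple[Sequence[Sequence[float]], Sequence[int]]  # (feature vectors, labels) for one S_k


def train_pairwise_ranker(
    groups: Sequence[Group],
    epochs: int = 50,
    lr: float = 0.1,
    reg: float = 0.01,
    seed: int = 0,
) -> List[float]:
    """Learn weights w so that higher-labelled candidates score higher within each group."""
    rng = random.Random(seed)
    dim = len(groups[0][0][0])
    w = [0.0] * dim
    # Preference pairs are formed only inside one (S_k, T_k) group:
    # each pair is stored as the difference f_a - f_b where label_a > label_b.
    pairs: List[List[float]] = []
    for feats, labels in groups:
        for a in range(len(feats)):
            for b in range(len(feats)):
                if labels[a] > labels[b]:
                    pairs.append([x - y for x, y in zip(feats[a], feats[b])])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for i in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - w.d)
                grad = reg * w[i] - (d[i] if margin < 1.0 else 0.0)
                w[i] -= lr * grad
    return w


def score(w: Sequence[float], f: Sequence[float]) -> float:
    """Linear ranking function: larger means more useful for the partial summary."""
    return sum(wi * fi for wi, fi in zip(w, f))
```

Forming pairs only within a group reflects that labels express usefulness relative to one particular partial summary, so candidates from different summaries are not compared.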

Based on the partial summary $S$ and the content candidate set $T$, the content feature extraction unit 3 extracts, for each content candidate, a feature vector relating to its similarity to the partial summary, in the same way as the learning content feature extraction unit 8, and obtains the set of feature vectors $F$. Note that each feature vector corresponds to one content candidate.

The ranking unit 4 uses the ranking function output from the ranking function learning unit 9 to rank each element of the content candidate set $T$ output from the content search unit 2, with the input partial summary $S$ held fixed. Specifically, each feature vector obtained by the content feature extraction unit 3 is given to the ranking function, whose output is a value indicating how useful the content is for reinforcing the partial summary (a value corresponding to the content's rank). The content candidates are ranked in order of these values.
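The ranking step can be sketched as scoring each candidate's feature vector and sorting by the result. The sketch assumes a linear ranking function with weight vector `w`, which is consistent with a linear SVM but is not spelled out in the text. The name `rank_candidates` is hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_candidates(
    candidates: Sequence[str],
    features: Sequence[Sequence[float]],
    w: Sequence[float],
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, most useful first."""
    scored = [
        (c, sum(wi * fi for wi, fi in zip(w, f)))  # assumed linear score w . f
        for c, f in zip(candidates, features)
    ]
    return sorted(scored, key=lambda cs: cs[1], reverse=True)
```

Because the partial summary is fixed during one ranking pass, it enters the computation only through the feature vectors.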

The set of ranked content candidates is output to the user by the output unit 5.

<Operation of the content summary support apparatus>
Next, the operation of the content summary support apparatus 100 according to this embodiment will be described. First, when a plurality of pairs of a learning partial summary and a learning content candidate set, prepared in advance, are input to the content summary support apparatus 100, the apparatus executes the ranking function learning processing routine shown in FIG. 4.

First, in step S101, the input pairs of learning partial summaries and learning content candidate sets are received. In step S102, for each content candidate included in the learning content candidate sets received in step S101, the input of a label indicating how useful the candidate is for reinforcing the partial summary is received.

Then, in step S103, learning data is created from the pairs of learning partial summaries and learning content candidate sets received in step S101 and the labels of the content candidates received in step S102.

In the next step S104, a feature vector is extracted for each of the learning content candidates included in the learning data created in step S103. Then, in step S105, the ranking function is learned based on the feature vectors extracted in step S104 for the learning content candidates and on the labels of those learning content candidates included in the learning data created in step S103, and the ranking function learning processing routine ends.

Then, when the user (editor) inputs the partial summary being edited, the content summary support apparatus 100 executes the summary support processing routine shown in FIG. 5.

First, in step S111, the input partial summary is received. In step S112, the partial summary received in step S111 is used to search the content database 20 for contents related to the contents included in the partial summary, and the set of contents found is taken as the content candidate set.

Then, in step S113, a feature vector is extracted for each content candidate included in the content candidate set found in step S112, using the partial summary received in step S111.

In the next step S114, the learned ranking function is applied to the feature vector of each content candidate extracted in step S113 to obtain a value indicating its degree of usefulness, and the content candidate set is ranked by these values.

Then, in step S115, the content candidate set ranked in step S114 is sorted into rank order and output by the output unit 5.

Referring to the ranked content candidate set output by the output unit 5, the user adds contents to the partial summary to produce a summary of the contents. The partial summary with the added contents may also be input to the content summary support apparatus 100 again, so that the summary support processing routine described above is executed repeatedly.
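The summary support processing routine (steps S111 to S115) and its repeated use can be sketched as below. This is only an illustration under strong simplifying assumptions: the content database is an in-memory list of strings, the search is a plain word-overlap filter, the two features are Jaccard word overlaps, and the weights are fixed by hand rather than learned. All function names are hypothetical.

```python
def tokenize(text):
    """Split a content string into a set of lower-cased words."""
    return set(text.lower().split())

def search_candidates(partial_summary, database):
    """S112 (toy version): contents sharing at least one word with the partial summary."""
    summary_words = set().union(*(tokenize(c) for c in partial_summary))
    return [c for c in database
            if c not in partial_summary and tokenize(c) & summary_words]

def extract_features(partial_summary, candidate):
    """S113 (toy version): maximum and mean Jaccard overlap with the partial summary."""
    cand = tokenize(candidate)
    sims = [len(cand & tokenize(c)) / len(cand | tokenize(c)) for c in partial_summary]
    return [max(sims), sum(sims) / len(sims)]

def support_routine(partial_summary, database, weights):
    """S111 to S115: search, extract features, score, and return candidates in rank order."""
    scored = []
    for cand in search_candidates(partial_summary, database):
        features = extract_features(partial_summary, cand)
        scored.append((sum(w * v for w, v in zip(weights, features)), cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored]

# Hypothetical data: the editor starts with one content and adds the top suggestion.
database = [
    "train delay on the yamanote line",
    "yamanote line resumes service",
    "best ramen in tokyo",
    "delay caused by signal failure",
]
partial = ["yamanote line delay this morning"]
suggestions = support_routine(partial, database, [1.0, 1.0])
partial.append(suggestions[0])            # the editor accepts the top candidate
suggestions = support_routine(partial, database, [1.0, 1.0])
```

Because the routine takes the current partial summary as its only per-call input, feeding the extended partial summary back in is enough to realise the repeated execution described above.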

<Example>
An example is described below. As source material for the learning data, 5,000 completed summaries were collected from Togetter (registered trademark) and 1.3 million articles (contents) from Twitter (registered trademark). Contents were selected at random from each summary to create a learning partial summary containing 12 contents on average. A learning content candidate set was then created from all of the contents that were not selected, mixed with contents chosen at random from the large collection gathered from Twitter (registered trademark). Each learning content candidate set contains 192 contents on average. About 30% of them are contents included in the actual summary, and each of these was given a label indicating that it is useful (l = 5). The rest, namely the contents chosen at random from Twitter (registered trademark), were given a label indicating that they are not useful (l = 1).
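The construction of one learning example from a completed summary can be sketched as follows. The label values 5 and 1 come from the text; the function name, the parameters, and the decision to exclude contents that already appear in the summary from the random pool are assumptions made for this illustration.

```python
import random

USEFUL, NOT_USEFUL = 5, 1  # label values l = 5 and l = 1 from the example

def build_learning_example(summary, background, partial_size, noise_size, rng):
    """Split a completed summary into a partial summary and a labelled candidate set.

    summary:    contents of one completed summary
    background: large pool of unrelated contents to sample from
    Returns (partial_summary, [(candidate, label), ...]).
    """
    partial = rng.sample(summary, partial_size)
    held_out = [c for c in summary if c not in partial]
    # Assumption: contents already in the summary are not drawn again as noise.
    pool = [c for c in background if c not in summary]
    noise = rng.sample(pool, noise_size)
    candidates = ([(c, USEFUL) for c in held_out]
                  + [(c, NOT_USEFUL) for c in noise])
    rng.shuffle(candidates)
    return partial, candidates

# Small hypothetical instance: 6 useful candidates out of 20 (30%).
rng = random.Random(0)
summary = [f"summary-item-{i}" for i in range(10)]
background = [f"random-item-{i}" for i in range(100)]
partial, candidates = build_learning_example(summary, background, 4, 14, rng)
```

The sizes here are far smaller than the averages reported in the example and are chosen only so that the proportion of useful candidates is about 30%.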

As test data, 2,000 completed summaries were collected from Togetter (registered trademark) and 580,000 contents from Twitter (registered trademark), and partial summaries and content candidate sets were constructed in the same way as for the learning data. In the experiment, feature vectors were extracted for each content candidate set by the method described in the above embodiment, the candidates were ranked with the ranking function, and the top-ranked content candidates were selected. The evaluation measured how accurately the contents actually included in the summary were selected.

Mean Average Precision (MAP) is used as the evaluation measure in this experiment. FIG. 6 shows the results. The method described in the above embodiment gave the highest score, MAP = 0.857 (see SVMrank in FIG. 6). By comparison, using only word similarity without learning gave MAP = 0.825 (see Word-TFIDF in FIG. 6), and selecting at random without the ranking unit gave MAP = 0.53 (see Random in FIG. 6). These results experimentally demonstrate the effectiveness of the method described in the above embodiment.
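For reference, Mean Average Precision can be computed as in the sketch below, which follows the standard definition: the precision at each rank where a relevant item appears is averaged per ranking, and the result is averaged over all rankings. The text does not state details such as a rank cutoff, so this is an assumed formulation, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranking.

    ranked_relevance: list of booleans in rank order; True means the candidate
    at that rank is actually included in the completed summary.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the average precision over all test partial summaries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Hypothetical rankings for two test partial summaries.
example_map = mean_average_precision([[True, False, True], [False, True]])
```

A ranking that places every relevant candidate ahead of every irrelevant one scores 1.0, so values closer to 1 indicate that useful contents tend to appear near the top.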

As described above, the content summary support apparatus according to this embodiment extracts, for each content candidate, a feature vector relating to its similarity to the input partial summary. It then ranks the content candidates in the order in which they are useful for reinforcing the partial summary, based on a pre-learned ranking function that outputs a value indicating the degree of usefulness for reinforcing the partial summary and on the feature vector extracted for each content candidate. In this way the apparatus can support the generation of a summary of contents.

In addition, when the editor inputs a partial summary that arises naturally in the course of creating a summary (the partial summary being edited), the content summary support apparatus recommends and presents contents suited to improving that summary further (contents likely to be included in the summary), and can thereby support the editing operation.

Ranking also allows contents useful for the summary to be found quickly, which makes the editing work more efficient.

Furthermore, because a moderately wide range of content candidates is collected, the effort of consulting every content is reduced while a broad perspective that makes the partial summary being edited more valuable is maintained.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the content may be electronic data other than electronic documents, such as video data or image data.

The present invention can also be realized by installing a program on a well-known computer via a medium or a communication line.

Although the content summary support apparatus described above contains a computer system, the term "computer system" also includes a homepage providing environment (or display environment) when a WWW system is used.

Although this specification describes an embodiment in which the program is installed in advance, the program can also be provided stored on a computer-readable recording medium.

Description of symbols
1 Partial summary input unit
2 Content search unit
3 Content feature extraction unit
4 Ranking unit
6 Learning data input unit
7 Learning data creation unit
8 Learning content feature extraction unit
9 Ranking function learning unit
20 Content database
100 Content summary support apparatus

Claims

1. A content summary support apparatus that supports generating a summary composed of a plurality of contents selected from a set of contents composed of electronic data, the apparatus comprising:
partial summary input means for accepting a partial summary, input by a user, that is a subset of the contents constituting the summary;
content candidate search means for searching the set of contents, based on the contents included in the partial summary, for content candidates that are candidates for contents to be included in the summary;
content feature extraction means for extracting, for each of the content candidates found by the content candidate search means, a feature quantity relating to similarity in the combination of the partial summary and the content candidate; and
ranking means for ranking the content candidates found by the content candidate search means in an order useful for reinforcing the partial summary, based on a pre-learned function that outputs, from the feature quantity, a value indicating a degree of usefulness for reinforcing a partial summary, and on the feature quantity of each of the content candidates extracted by the content feature extraction means.

2. The content summary support apparatus according to claim 1, wherein
the set of contents is a set of contents provided by all users of a content providing and browsing system in which each user provides contents as an author and browses those contents, among the contents provided by other users as authors, that the user is able to browse, and
the content candidate search means searches, as the content candidates, for contents that can be browsed by the authors of the contents included in the partial summary and contents that can be browsed by the user who input the partial summary.

3. The content summary support apparatus according to claim 1 or 2, further comprising:
learning content feature extraction means for extracting, for each of the learning content candidates included in learning data prepared in advance, the feature quantity for the combination of a learning partial summary and the learning content candidate, the learning data including the learning partial summary, a set of learning content candidates, and, for each of the learning content candidates, a label whose value indicates the degree of usefulness; and
ranking function learning means for learning the function based on the feature quantity of each of the learning content candidates extracted by the learning content feature extraction means and on the label of each of the learning content candidates included in the learning data.

4. A content summary support method in a content summary support apparatus that supports generating a summary composed of a plurality of contents selected from a set of contents composed of electronic data, the method comprising:
accepting, by partial summary input means, a partial summary, input by a user, that is a subset of the contents constituting the summary;
searching, by content candidate search means, the set of contents, based on the contents included in the partial summary, for content candidates that are candidates for contents to be included in the summary;
extracting, by content feature extraction means, for each of the content candidates found by the content candidate search means, a feature quantity relating to similarity in the combination of the partial summary and the content candidate; and
ranking, by ranking means, the content candidates found by the content candidate search means in an order useful for reinforcing the partial summary, based on a pre-learned function that outputs, from the feature quantity, a value indicating a degree of usefulness for reinforcing a partial summary, and on the feature quantity of each of the content candidates extracted by the content feature extraction means.

5. The content summary support method according to claim 4, wherein
the set of contents is a set of contents provided by all users of a content providing and browsing system in which each user provides contents as an author and browses those contents, among the contents provided by other users as authors, that the user is able to browse, and
the content candidate search means searches, as the content candidates, for contents that can be browsed by the authors of the contents included in the partial summary and contents that can be browsed by the user who input the partial summary.

6. The content summary support method according to claim 4, further comprising:
extracting, by learning content feature extraction means, for each of the learning content candidates included in learning data prepared in advance, the feature quantity for the combination of a learning partial summary and the learning content candidate, the learning data including the learning partial summary, a set of learning content candidates, and, for each of the learning content candidates, a label whose value indicates the degree of usefulness; and
learning, by ranking function learning means, the function based on the feature quantity of each of the learning content candidates extracted by the learning content feature extraction means and on the label of each of the learning content candidates included in the learning data.

7. A program for causing a computer to function as each of the means constituting the content summary support apparatus according to any one of claims 1 to 3.