JP2018005633A

JP2018005633A - Related content extraction device, related content extraction method, and related content extraction program

Info

Publication number: JP2018005633A
Application number: JP2016132833A
Authority: JP
Inventors: 真浦川; Makoto Urakawa
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-07-04
Filing date: 2016-07-04
Publication date: 2018-01-11

Abstract

PROBLEM TO BE SOLVED: To provide a device, a method, and a program capable of presenting content dealing with the same event in chronological order. SOLUTION: A related content extraction device 1 comprises a content acquisition unit 11 that acquires content, an information extraction unit 12 that extracts specific information about an event contained in the content, including the event date and time, a similar content search unit 13 that searches for similar content on the basis of the specific information, and a content recording unit 14 that records the similar content in chronological association according to issue date and time. SELECTED DRAWING: Figure 2

Description

The present invention relates to a device, a method, and a program for extracting related content.

In recent years, curation services have been provided that display news articles grouped by category or topic. Examples include websites that group news articles into categories such as “politics” or “gourmet”, and websites that collect news articles on a more specific topic such as “terrorist incident in Country A”.
Providing news articles to general users not only individually, as events that occurred at a given time, but also as coherent groups in this way is a service that satisfies users' intellectual curiosity.

As a technique for grouping news articles, Patent Document 1, for example, proposes generating a summary of related articles by assigning linked article IDs to each article. The technique makes it easy to grasp the overall picture of a news story from a summary of the group of related articles to which a given article belongs.

Patent Document 1: JP-A-2005-250648

However, conventional curation services group articles with similar content, and chronological order is often not taken into account. This makes it difficult for users to correctly understand a series of news articles, which is information linked in time series.
For example, to generate an overall picture of related articles with the technique of Patent Document 1, an article ID identifying whether an article is an original article or a follow-up article must be manually assigned to each article in advance. Moreover, even when the period is limited, the summary is generated on the basis of the issue date rather than the news content, so only a partial picture is obtained.

An object of the present invention is to provide a device, a method, and a program capable of presenting content dealing with the same event in chronological order.

A related content extraction device according to the present invention includes: a content acquisition unit that acquires content; an information extraction unit that extracts specific information, including an event date and time, about an event contained in the content; a similar content search unit that searches for similar content on the basis of the specific information; and a content recording unit that records the similar content in chronological association according to issue date and time.

The information extraction unit may extract, as the specific information, the location of the event or a person related to the event.

The information extraction unit may derive the event date and time from an expression relative to the issue date and time included in the content.

The related content extraction device may include a content presentation unit that selects, on the basis of an input search key, a content group associated by the content recording unit and outputs the selected content group in chronological order.

In a related content extraction method according to the present invention, a computer executes: a content acquisition step of acquiring content; an information extraction step of extracting specific information, including an event date and time, about an event contained in the content; a similar content search step of searching for similar content on the basis of the specific information; and a content recording step of recording the similar content in chronological association according to issue date and time.

A related content extraction program according to the present invention causes a computer to execute: a content acquisition step of acquiring content; an information extraction step of extracting specific information, including an event date and time, about an event contained in the content; a similar content search step of searching for similar content on the basis of the specific information; and a content recording step of recording the similar content in chronological association according to issue date and time.

According to the present invention, content dealing with the same event can be presented in chronological order.

FIG. 1 is a diagram showing a functional outline of the related content extraction device according to the embodiment.
FIG. 2 is a diagram showing the configuration of the related content extraction device according to the embodiment.
FIG. 3 is a diagram illustrating the information conversion process according to the embodiment.
FIG. 4 is a diagram illustrating a first usage pattern according to the embodiment.
FIG. 5 is a diagram illustrating a second usage pattern according to the embodiment.

An example of an embodiment of the present invention is described below.
The related content extraction device 1 according to the present embodiment extracts, for a given news article (content), related past news articles on the basis of the event date and time, location, and person of the event the article covers, together with other keywords, and presents them sorted in chronological order. In doing so, the related content extraction device 1 also presents the keywords of each news article.

FIG. 1 is a diagram showing a functional outline of the related content extraction device 1 according to the present embodiment.
The related content extraction device 1 first extracts, from each piece of news content, the issue date and time of the article, keywords indicating its content, and the event date and time, location, and person of the event the content covers.

Next, on the basis of the location, event date and time, person, and keywords of the events reported by all registered news content, the related content extraction device 1 searches for news content related to each piece of news content and defines it as similar content.
The related content extraction device 1 can arrange the associated similar content in chronological order and provide it to a content holder, a curation service provider, or the like for presentation to users.

For example, assume that a content holder has the following content: item 90, news on “Terrorist attack in City B” issued on November 13, 2015; item 95, news on “Mourning around the world” issued on November 18, 2015; item 123, news on “Treaty J agreed” issued on January 5, 2016; and item 124, news on “Terror ringleader captured” issued on January 5, 2016.

From these pieces of news content, the related content extraction device 1 extracts the “event date and time”, “location”, and “person” of the event each news item covers and attaches them as data.
For example, for item 124, issued on January 5, 2016, the phrase “last November's simultaneous terrorist attacks in Country A by Country E ...” yields November 2015 as the “event date and time” and Country A as the “location”. For item 123, on the other hand, September 2015 is attached as the “event date and time”, City I as the “location”, and Minister D as the “person”.

By linking pieces of news content on the basis of “event date and time”, “location”, “person”, and “keyword”, the related content extraction device 1 calculates the degree of relevance between them and determines their chronological order. As a result, a content holder or curation service provider can present, in chronological order, the news articles from which a given news article originated.
For example, for item 124, items 90 and 95 are presented in chronological order as the news articles from which it originated.

FIG. 2 is a diagram showing the configuration of the related content extraction device 1 according to the present embodiment.
The related content extraction device 1 is an information processing device (computer) such as a server or a PC (personal computer), and may include, in addition to a control unit such as a CPU and a storage unit, interfaces such as input/output devices and a communication unit. The related content extraction device 1 implements the functions of the present embodiment when the control unit executes predetermined software (a related content extraction program) stored in the storage unit.
The control unit of the related content extraction device 1 includes a content acquisition unit 11, an information extraction unit 12, a similar content search unit 13, a content recording unit 14, and an output unit 15 (a content presentation unit and a content information output unit).

The content acquisition unit 11 acquires the content designated for processing.
The information extraction unit 12 extracts, from the acquired content, the “event date and time”, “location”, “person”, and “keyword” as specific information about the event the content covers.

Here, the content taken in by the content acquisition unit 11 is assumed to carry the date and time at which it was issued and the text forming the body of the news article. From this content, the information conversion process of the information extraction unit 12 extracts the “event date and time”, “location”, “person”, and “keyword”.

FIG. 3 is a diagram illustrating the information conversion process according to the present embodiment.
First, the information extraction unit 12 breaks the text of the content into words by morphological analysis and sorts them into “event date and time”, “location”, “person”, and “keyword”.

For the “event date and time”, the information extraction unit 12 compares what is written in the text with the issue date and time of the content itself and expresses the result in an absolute date and time format.
In the example of FIG. 3, the issue date is January 5, 2016, and the text says “last year” (昨年). In this case, the year indicated by “last year” is calculated by subtracting one year from the issue date and is converted to “2015”. The rule for subtracting one year may apply not only to the expression 昨年 but also to the synonymous 去年; the information conversion for such date and time expressions is defined in advance by rules.
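The rule-based conversion described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the table `YEAR_OFFSET_RULES` and the function `resolve_relative_year` are hypothetical names, and only the two expressions for “last year” come from the description.

```python
from datetime import date
from typing import Optional

# Hypothetical rule table: relative expression -> year offset from the issue date.
# Both expressions mean "last year"; a real rule set would contain more entries.
YEAR_OFFSET_RULES = {
    "昨年": -1,
    "去年": -1,
}


def resolve_relative_year(expression: str, issue_date: date) -> Optional[int]:
    """Return the absolute year for a relative expression, or None if no rule matches."""
    offset = YEAR_OFFSET_RULES.get(expression)
    if offset is None:
        return None
    return issue_date.year + offset


# Issue date from the FIG. 3 example: January 5, 2016.
event_year = resolve_relative_year("昨年", date(2016, 1, 5))
```

Keeping the rules in a table rather than in code is one way to reflect the statement that the conversions are defined in advance by rules.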

The precision of the converted date and time can be determined from its notation after conversion and is used in the search for similar content. In the example of FIG. 3, the extracted “event date and time” is November 2015, so the search is narrowed to events from November 1, 2015 to November 30, 2015.
If, on the other hand, the “event date and time” is identified as November 13, 2015, the search for similar content is limited to November 13, 2015.
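One way to turn the precision of an event date into a search range is sketched below. The function name `search_range` and its signature are assumptions made for illustration; the last day of a month is taken from the standard `calendar` module rather than hard-coded.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def search_range(year: int, month: Optional[int] = None,
                 day: Optional[int] = None) -> Tuple[date, date]:
    """Map an event date of year, month, or day precision to an inclusive date range."""
    if month is None:
        # Year precision: the whole year.
        return date(year, 1, 1), date(year, 12, 31)
    if day is None:
        # Month precision: first to last day of that month.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    # Day precision: a single day.
    exact = date(year, month, day)
    return exact, exact


month_range = search_range(2015, 11)     # event date known only as November 2015
day_range = search_range(2015, 11, 13)   # event date known to the day
```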

For “location” and “person”, the information extraction unit 12 may make the identification by referring to a publicly available API for identifying locations, a proprietary dictionary, or the like. For example, the information extraction unit 12 determines whether a word obtained by morphological analysis or the like is of the person type by referring to an up-to-date namespace such as DBpedia, a structured version of Wikipedia (registered trademark). Locations can likewise be determined with a geocoding library or the like. These identification methods may be changed as appropriate for the usage pattern.
Using these external or internal resources, it is determined, for example, that “President C” is a “person”, and that “City B” is a “location” whose country is “Country A”.
Words not classified as “event date and time”, “location”, or “person” are set as “keywords”.
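The sorting of words into the four kinds of specific information can be sketched as below. The in-memory sets and dictionary are hypothetical stand-ins for the external or internal resources mentioned above (for example a DBpedia lookup or a geocoding library); no real API is called, and the tokens are assumed to come from a separate morphological analysis step.

```python
from typing import Dict, List

# Hypothetical stand-ins for external or internal resources.
DATE_EXPRESSIONS = {"last year", "November"}
LOCATION_TO_COUNTRY = {"City B": "Country A", "Country A": "Country A"}
PERSONS = {"President C"}


def classify_tokens(tokens: List[str]) -> Dict[str, List[str]]:
    """Sort tokens into event date, location, person, and keyword buckets."""
    result: Dict[str, List[str]] = {
        "event_date": [], "location": [], "person": [], "keyword": [],
    }
    for token in tokens:
        if token in DATE_EXPRESSIONS:
            result["event_date"].append(token)
        elif token in LOCATION_TO_COUNTRY:
            result["location"].append(token)
        elif token in PERSONS:
            result["person"].append(token)
        else:
            # Anything not recognized as a date, location, or person is a keyword.
            result["keyword"].append(token)
    return result


classified = classify_tokens(["last year", "City B", "President C", "terrorism"])
```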

The similar content search unit 13 searches for similar content on the basis of the extracted “event date and time”, “location”, “person”, and “keyword”.
The similar content search unit 13 first limits the scope of the content search using the “event date and time”, then calculates a similarity for each of the “location”, “person”, and “keyword” of the content in scope, and extracts content with high similarity as similar content.

In the example of FIG. 1, the event date and time indicated by the news article of item 124 is November 2015, so items 90 and 95, which concern November 2015 and contain highly similar words, are extracted as similar content.
The similarity calculated for each of “location”, “person”, and “keyword” may be, for example, cosine similarity or contextual similarity, and the threshold for identifying similar content may be set as appropriate.
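A minimal sketch of this two-stage search follows, using cosine similarity over bag-of-words counts. Several points are assumptions not stated in the description: the per-field scores are combined by a simple average, the default threshold of 0.3 is arbitrary, and the dictionary keys (`id`, `event_range`, and the field names) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

FIELDS = ("location", "person", "keyword")


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine similarity between two word lists treated as count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query: Dict, candidates: List[Dict], date_range: Tuple,
                 threshold: float = 0.3) -> List[Tuple[int, float]]:
    """Filter candidates by event date range, then score them field by field."""
    start, end = date_range
    hits = []
    for cand in candidates:
        cand_start, cand_end = cand["event_range"]
        # Stage 1: skip content whose event range does not overlap the query range.
        if cand_end < start or cand_start > end:
            continue
        # Stage 2: average the per-field cosine similarities (an assumed combination).
        score = sum(cosine_similarity(query[f], cand[f]) for f in FIELDS) / len(FIELDS)
        if score >= threshold:
            hits.append((cand["id"], score))
    return hits
```

Filtering by date first keeps the similarity computation confined to content about the same period, which matches the order of operations described for the similar content search unit 13.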

The content recording unit 14 sorts the retrieved similar content by event date and time and then by issue date and time, attaches to the content the acquired specific information and related information indicating the chronological order of the similar content, and stores the result in the storage 20.
The data structure in the storage 20 is not limited, and a wide range of database systems, file systems, and the like can be used.
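The ordering and linking step can be sketched as follows. Since the data structure in the storage 20 is left open, the record layout here is purely an assumption: `previous_id` and `next_id` are hypothetical field names for the related information indicating chronological order.

```python
from typing import Dict, List


def link_chronologically(contents: List[Dict]) -> List[Dict]:
    """Sort by event date, then issue date, and attach previous/next links."""
    ordered = sorted(contents, key=lambda c: (c["event_date"], c["issue_date"]))
    records = []
    for i, content in enumerate(ordered):
        record = dict(content)  # copy so the input is left untouched
        record["previous_id"] = ordered[i - 1]["id"] if i > 0 else None
        record["next_id"] = ordered[i + 1]["id"] if i < len(ordered) - 1 else None
        records.append(record)
    return records
```

Each returned record could then be written to whatever database or file system backs the storage.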

The output unit 15 (content presentation unit) accepts a selection of, or a search instruction for, content whose chronological context is to be understood, and outputs the similar content group in chronological order on the basis of the related information recorded in the storage 20. Because each piece of content carries its location, person, and keywords, the differences between pieces of content may also be presented.
The output unit 15 (content information output unit) may also output, as metadata associated with designated content, the specific information (“event date and time”, “location”, “person”, “keyword”) and the related information on similar content.

FIG. 4 is a diagram illustrating a first usage pattern of the related content extraction device 1 according to the present embodiment.
In the first usage pattern, service providers use the related content extraction device 1 to bring together their own and other companies' content and provide curated information. Company X runs a service that presents news content on a specific topic together, and Company Y collects the background preceding the content it holds.
The related content extraction device 1 has already taken in content held by content businesses such as content holders A and B, attached chronological related information, and stored it in the storage 20.

Service provider X wants to compile content on the terrorist attack that occurred in Country A in chronological order. By querying the related content extraction device 1 with terms such as “Country A” and “terrorism”, service provider X can obtain the content found by keyword matching and similar content covering the same event, together with the chronological related information.

In this example, item 90 is the first news content, which leads to item 95 and then to item 124.
In addition to the body text of each item, the attached event date and time, location, person, and keywords are output. Service provider X can therefore visualize the flow of information by displaying not only the chronologically sorted curated content but also the specific information, such as keywords, attached to each piece of content.

Service provider Y wants to collect content on the background preceding its own news content “Passenger plane F found” and display it as related content. In this case, the search can be conditioned on the issue date and time as well as on keywords.
Because the “Passenger plane F found” content was issued on January 15, 2016, specifying the condition “on or before January 15, 2016” displays item 101 but excludes item A-01, whose issue date does not satisfy the condition, from the search results.
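The issue date condition can be sketched as a simple filter. The function name `filter_by_issue_date` is hypothetical, and the issue dates given to items 101 and A-01 in the usage lines are invented for illustration, since the description does not state them.

```python
from datetime import date
from typing import Dict, List


def filter_by_issue_date(contents: List[Dict], on_or_before: date) -> List[Dict]:
    """Keep content issued on or before the given date, oldest first."""
    kept = [c for c in contents if c["issue_date"] <= on_or_before]
    return sorted(kept, key=lambda c: c["issue_date"])


# Issue dates below are illustrative assumptions, not values from the description.
candidates = [
    {"id": "A-01", "issue_date": date(2016, 1, 20)},
    {"id": "101", "issue_date": date(2016, 1, 10)},
]
background = filter_by_issue_date(candidates, date(2016, 1, 15))
```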

FIG. 5 is a diagram illustrating a second usage pattern of the related content extraction device 1 according to the present embodiment.
In the second usage pattern, a content holder uses the related content extraction device 1 to attach data on “event date and time”, “location”, “person”, and “keyword” to its own content.

When content holder Z registers content Z-A01 on “counter-terrorism measures in Japan” with the related content extraction device 1, the “event date and time”, “location”, “person”, and “keyword” are attached and the content is stored in the storage. Content holder Z can then obtain the attached specific information and the chronologically associated similar content by specifying information that uniquely identifies its own content, such as an item number.

In the example of FIG. 5, content holder Z obtains “Japan”, “terrorism”, “countermeasures”, and “National Police Agency” as keywords, “Prime Minister K” as the person, and “Japan” as the location. Content holder Z can therefore attach the obtained information to content Z-A01 as metadata.
In this way, a user of the related content extraction device 1 can not only check chronological relationships with other content but also attach necessary information to the content itself.
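As a rough illustration of the metadata in the FIG. 5 example, the values obtained for content Z-A01 could be serialized as JSON. The key names and the choice of JSON are assumptions; the description does not specify an output format, and no event date is shown because none is given for this content.

```python
import json
from typing import Dict, List


def build_metadata(item_id: str, location: str, person: str,
                   keywords: List[str]) -> Dict:
    """Bundle the specific information for one piece of content (hypothetical layout)."""
    return {
        "item_id": item_id,
        "location": location,
        "person": person,
        "keywords": list(keywords),
    }


metadata = build_metadata(
    "Z-A01", "Japan", "Prime Minister K",
    ["Japan", "terrorism", "countermeasures", "National Police Agency"],
)
# ensure_ascii=False keeps any non-ASCII text readable in the output.
metadata_json = json.dumps(metadata, ensure_ascii=False, indent=2)
```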

According to the present embodiment, the related content extraction device 1 extracts specific information including the event date and time from content, searches for similar content on the basis of this specific information, and associates the content in chronological order according to issue date and time.
The related content extraction device 1 can therefore accurately identify the event covered by content such as a news article and automatically present content dealing with the same event in chronological order.

The related content extraction device 1 further extracts the location of the event and the persons involved as specific information, so the event covered by the content can be identified more accurately.
Because users tend to remember this specific information and use it as search keys, attaching it to content also makes searching more convenient.

The related content extraction device 1 extracts the event date and time from content by converting relative date and time expressions in the text on the basis of the issue date and time. An accurate date or date range can thus be identified regardless of how it is expressed in the text.

The related content extraction device 1 not only arranges content in chronological order but also makes explicit the specific information and keywords representing the event each piece of content covers, and can thus present a large amount of content systematically. The related content extraction device 1 can also attach metadata that makes events easy to identify, which enriches the content and improves user services.

Furthermore, because the related content extraction device 1 attaches information to news content separately as chronological information, information that can be proper nouns such as locations and persons, and other keywords, the connections between pieces of content can be laid out on a timeline and also on a map. This enables new forms of content curation and content presentation that supports user understanding.

An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. The effects described in the present embodiment are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the present embodiment.

The functional units of the present embodiment may be configured, as appropriate, as a system distributed across a plurality of information processing devices.

The functions of the related content extraction device 1 may be realized by recording a program for implementing them on a computer-readable recording medium and causing a computer system to read and execute the program recorded on that medium.

The “computer system” here includes an OS and hardware such as peripheral devices. The “computer-readable recording medium” refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems.

The “computer-readable recording medium” may further include media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may implement only some of the functions described above, or may implement those functions in combination with a program already recorded in the computer system.

Description of symbols
1 Related content extraction device
11 Content acquisition unit
12 Information extraction unit
13 Similar content search unit
14 Content recording unit
15 Output unit (content presentation unit, content information output unit)
20 Storage

Claims

A related content extraction device comprising:
a content acquisition unit that acquires content;
an information extraction unit that extracts specific information, including an event date and time, about an event contained in the content;
a similar content search unit that searches for similar content on the basis of the specific information; and
a content recording unit that records the similar content in chronological association according to issue date and time.

The related content extraction device according to claim 1, wherein the information extraction unit extracts a location of the event or a person related to the event as the specific information.

The related content extraction device according to claim 1, wherein the information extraction unit derives the event date and time from an expression relative to the issue date and time included in the content.

The related content extraction device according to claim 1, further comprising a content presentation unit that selects, on the basis of an input search key, a content group associated by the content recording unit and outputs the selected content group in chronological order.

A related content extraction method in which a computer executes:
a content acquisition step of acquiring content;
an information extraction step of extracting specific information, including an event date and time, about an event contained in the content;
a similar content search step of searching for similar content on the basis of the specific information; and
a content recording step of recording the similar content in chronological association according to issue date and time.

A related content extraction program for causing a computer to execute:
a content acquisition step of acquiring content;
an information extraction step of extracting specific information, including an event date and time, about an event contained in the content;
a similar content search step of searching for similar content on the basis of the specific information; and
a content recording step of recording the similar content in chronological association according to issue date and time.