JP2015200962A

JP2015200962A - Inter-document relation extraction device and program

Info

Publication number: JP2015200962A
Application number: JP2014078012A
Authority: JP
Inventors: Ichiro Yamada; Kikuka Mochizuki; Taro Miyazaki; Hideki Tanaka
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2014-04-04
Filing date: 2014-04-04
Publication date: 2015-11-12
Anticipated expiration: 2034-04-04
Also published as: JP6296651B2

Abstract

PROBLEM TO BE SOLVED: To enable a variety of relations to be extracted when extracting inter-document relations. SOLUTION: Based on inter-word relations read from an inter-word relation dictionary storage section, a keyword pair generation section outputs a pair of keywords, one extracted from each of two documents, as a keyword pair when the two keywords have a direct relation to each other or when each has a direct relation to a common other word. For each keyword pair output by the keyword pair generation section, an inter-keyword relation extraction section outputs, based on the inter-word relations read from the inter-word relation dictionary storage section, the direct relation when the keyword pair has one, and outputs the other word together with the relation from each keyword in the pair to that other word when the keywords have a direct relation to a common other word.

Description

The present invention belongs to the technical field of natural language processing and relates to an inter-document relationship extraction device and a program therefor.

Among techniques for extracting relationships between documents, conventional proposals use the words contained in the documents as clues to determine whether an "equivalent" relationship or a "transition" relationship exists between the documents. Non-Patent Document 1 describes a model for identifying "equivalent" relationships and a model for identifying "transition" relationships between sentences across documents.

Yasunari Miyabe, Hiroya Takamura, Manabu Okumura, "Identification of Cross-Document Relations between Sentences", 12th Annual Meeting of the Association for Natural Language Processing, March 2006, pp. 496-499.

For relationships between documents, it is desirable to be able to extract not only "equivalent" and "transition" relationships but also a variety of other relationships. The present invention has been made in view of these circumstances and provides an inter-document relationship extraction device and program capable of extracting relationships between documents other than "equivalent" and "transition" relationships.

As means for solving the above problem, the inter-document relationship extraction device and program extract a plurality of important keywords from each of two documents and estimate the optimal relationship between the keywords using an inter-word relationship dictionary acquired automatically from a large collection of documents. The relationship between the two documents is thereby estimated.

[1] An inter-document relationship extraction device according to one aspect of the present invention comprises: a keyword extraction unit that extracts a plurality of keywords from an input document and outputs a score indicating the importance of each keyword in the document; a keyword pair generation unit that, for a pair of two keywords extracted respectively from two such documents, outputs the pair as a keyword pair when, based on inter-word relationships read from an inter-word relationship dictionary storage unit holding data that represents word pairs and the relationships between the words, the two keywords have a direct relationship or each have a direct relationship with a common other word; and an inter-keyword relationship extraction unit that, for the keyword pair output by the keyword pair generation unit and based on the inter-word relationships read from the inter-word relationship dictionary storage unit, outputs the relationship when the keyword pair has a direct relationship, and outputs the other word and the relationship from each keyword in the keyword pair to the other word when the keywords have a direct relationship with a common other word.

[2] In one aspect of the present invention, in the above inter-document relationship extraction device, the inter-keyword relationship extraction unit outputs, as the optimal relationship, the relationship that appears the fewest times in the inter-word relationship dictionary storage unit among the relationships that the keyword pair has.

[3] In one aspect of the present invention, the above inter-document relationship extraction device further comprises an explanation generation unit that generates explanatory text data including the keyword pair generated by the keyword pair generation unit and the relationship output by the inter-keyword relationship extraction unit for that keyword pair.

[4] One aspect of the present invention is a program for causing a computer to function as: a keyword extraction unit that extracts a plurality of keywords from an input document and outputs a score indicating the importance of each keyword in the document; a keyword pair generation unit that, for a pair of two keywords extracted respectively from two such documents, outputs the pair as a keyword pair when, based on inter-word relationships read from an inter-word relationship dictionary storage unit holding data that represents word pairs and the relationships between the words, the two keywords have a direct relationship or each have a direct relationship with a common other word; and an inter-keyword relationship extraction unit that, for the keyword pair output by the keyword pair generation unit and based on the inter-word relationships read from the inter-word relationship dictionary storage unit, outputs the relationship when the keyword pair has a direct relationship, and outputs the other word and the relationship from each keyword in the keyword pair to the other word when the keywords have a direct relationship with a common other word.

According to the present invention, the kind of relationship that two documents have can be estimated and output.

FIG. 1 is a block diagram showing the schematic functional configuration of an inter-document relationship extraction device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the overall processing flow of the inter-document relationship extraction device according to the embodiment.
FIG. 3 is a flowchart showing the procedure of the keyword pair generation process performed by the keyword pair generation unit according to the embodiment.
FIG. 4 is a flowchart showing the procedure by which the inter-keyword relationship extraction unit according to the embodiment extracts relationships between keywords.
FIG. 5 is a schematic diagram showing an example of a processing result output by the inter-document relationship extraction device according to the embodiment.
FIG. 6 is a schematic diagram showing the structure of the inter-word relationship dictionary stored in the inter-word relationship dictionary storage unit according to the embodiment.
FIG. 7 is a schematic diagram showing the structure of the word co-occurrence dictionary stored in the word co-occurrence dictionary storage unit according to the embodiment.

Next, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the schematic functional configuration of the inter-document relationship extraction device according to the present embodiment. As illustrated, the inter-document relationship extraction device 1 includes a document acquisition unit 10, a keyword extraction unit 11, a keyword pair generation unit 12, an inter-keyword relationship extraction unit 13, an explanation generation unit 14, an inter-word relationship dictionary storage unit 21, and a word co-occurrence dictionary storage unit 22.

The document acquisition unit 10 acquires document data from outside the device. It acquires at least two pieces of document data, and the subsequent processing extracts information representing the relationship between those two documents. The document data is, for example, in plain text file format.
The keyword extraction unit 11 extracts a plurality of keywords from an input document and outputs a score indicating the importance of each keyword in that document. Specifically, the keyword extraction unit 11 extracts a plurality of keywords from each of two pieces of document data among those acquired by the document acquisition unit 10, and outputs keyword set data for each document.

For a pair of two keywords extracted respectively from the two documents, the keyword pair generation unit 12 outputs the pair as a keyword pair when, based on the inter-word relationships read from the inter-word relationship dictionary storage unit 21, the two keywords have a direct relationship or each have a direct relationship with a common other word. In other words, the keyword pair generation unit 12 takes one keyword from each of the two keyword sets extracted by the keyword extraction unit 11 for the two pieces of document data, and generates keyword pair data spanning the two documents.

For each keyword pair output by the keyword pair generation unit 12, and based on the inter-word relationships read from the inter-word relationship dictionary storage unit 21, the inter-keyword relationship extraction unit 13 outputs the relationship when the keyword pair has a direct relationship, and outputs the other word and the relationship from each keyword in the pair to that other word when the keywords have a direct relationship with a common other word. In other words, the inter-keyword relationship extraction unit 13 extracts, for each keyword pair generated by the keyword pair generation unit 12, the relationship between the keywords forming the pair. In this processing, the inter-keyword relationship extraction unit 13 refers to the word co-occurrence dictionary storage unit 22 in addition to the inter-word relationship dictionary storage unit 21.

The explanation generation unit 14 generates explanatory text data including the keyword pair generated by the keyword pair generation unit 12 and the relationship output for that keyword pair by the inter-keyword relationship extraction unit 13. In other words, the explanation generation unit 14 uses the relationship extracted by the inter-keyword relationship extraction unit 13, together with the keywords, to generate a sentence explaining the relationship between the two original pieces of document data.

The inter-word relationship dictionary storage unit 21 holds data representing word pairs and the relationship between the two words of each pair. Relationships between words include, for example, similarity, causal, hypernym-hyponym, and attribute relationships. The dictionary data representing these relationships is acquired in advance from text data on the web. To acquire relationships between words automatically from text data, an existing inter-word relationship acquisition tool or the like can be used.
The data in the inter-word relationship dictionary will be described later with reference to FIG. 6.
Instead of providing the inter-word relationship dictionary storage unit 21 inside the inter-document relationship extraction device 1, a device having the function of the inter-word relationship dictionary storage unit 21 may be provided outside the inter-document relationship extraction device 1. In that case, the inter-document relationship extraction device 1 and the inter-word relationship dictionary storage unit 21 are made able to communicate over a wired or wireless communication medium, so that the inter-document relationship extraction device 1 can acquire the necessary data from the inter-word relationship dictionary storage unit 21.

The word co-occurrence dictionary storage unit 22 holds co-occurrence relationships between words as dictionary data. Here, co-occurrence data is data representing similarity based on how frequently two words co-occur; one example is the Dice coefficient. To obtain such data, the number of times two words co-occur in the same sentence is counted over, for example, text data on the web, and a value such as the Dice coefficient is calculated in advance from the counts. Data from an already published word co-occurrence frequency database can also be used.
The data in the word co-occurrence dictionary will be described later with reference to FIG. 7.
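As a minimal sketch of how such a co-occurrence dictionary might be precomputed, the Python below counts the sentences in which each word and each word pair appears and derives the Dice coefficient 2 * c(x, y) / (c(x) + c(y)). The function name `build_dice_dictionary`, the pre-tokenized input, and counting a word at most once per sentence are assumptions made for illustration; the description only states that same-sentence co-occurrences are counted and a value such as the Dice coefficient is calculated from them.

```python
from collections import Counter
from itertools import combinations


def build_dice_dictionary(sentences):
    """Return {(word_x, word_y): dice} for words co-occurring in a sentence.

    `sentences` is an iterable of token lists (tokenization is assumed to
    have been done elsewhere). Each key is an alphabetically sorted pair.
    """
    word_count = Counter()  # number of sentences containing each word
    pair_count = Counter()  # number of sentences containing both words
    for tokens in sentences:
        unique = sorted(set(tokens))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (x, y): 2 * c / (word_count[x] + word_count[y])
        for (x, y), c in pair_count.items()
    }


# Hypothetical toy corpus, not data from the source.
corpus = [
    ["stroke", "arteriosclerosis", "risk"],
    ["stroke", "arteriosclerosis"],
    ["stroke", "rehabilitation"],
]
dice = build_dice_dictionary(corpus)
```

In this toy corpus "stroke" appears in three sentences and "arteriosclerosis" in two, both of which also contain "stroke", so their coefficient is 2 * 2 / (3 + 2).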

Next, the overall processing procedure of the inter-document relationship extraction device 1 will be described.
FIG. 2 is a flowchart showing the overall processing flow of the inter-document relationship extraction device 1, and the description below follows this flowchart. The flowchart shows processing that takes two pieces of document data (document 1 and document 2) as input and extracts the relationship between them. Three or more pieces of document data may also be input, with the relationship extracted between two of them.

First, in step S1-1, the keyword extraction unit 11 extracts n keywords (n is a positive integer) from each of document 1 and document 2, which have been acquired beforehand by the document acquisition unit 10. The keyword extraction unit 11 extracts the n words most important to each document as keywords. Extracting important keywords from a document can itself be done with conventional techniques. The keyword set extracted by the keyword extraction unit 11 is a set of keywords that characterizes the original document well. To that end, the keyword extraction unit 11 extracts the top n keywords from each document using, for example, the TF-IDF value as the criterion. TF-IDF is the product of a word's term frequency (TF) and its inverse document frequency (IDF). Along with the n keywords for each document, the keyword extraction unit 11 outputs a weight value representing the importance of each keyword. The TF-IDF value itself may be used as this weight, or another weight value whose ordering is consistent with the TF-IDF values may be used.
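Step S1-1 can be sketched as follows, assuming pre-tokenized documents and a small background collection for estimating document frequency. The function name `extract_keywords`, the smoothed IDF formula, and the alphabetical tie-break are illustrative assumptions; the description only says that the top n keywords are chosen by a criterion such as TF-IDF and that a weight is output for each.

```python
import math
from collections import Counter


def extract_keywords(doc_tokens, background_docs, n):
    """Return the top-n (keyword, weight) pairs of one document by TF-IDF.

    `doc_tokens` is the tokenized target document; `background_docs` is a
    list of tokenized documents used only to estimate document frequency.
    """
    tf = Counter(doc_tokens)
    num_docs = len(background_docs) + 1  # background plus the target itself
    scored = []
    for word, freq in tf.items():
        df = 1 + sum(1 for d in background_docs if word in d)
        idf = math.log(num_docs / df) + 1.0  # smoothed IDF variant (assumption)
        scored.append((word, freq * idf))
    # Sort by descending weight; break ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:n]


# Hypothetical toy data, not taken from the source.
background = [["the", "weather", "is", "fine"], ["the", "market", "rose"]]
document_1 = ["stroke", "risk", "stroke", "arteriosclerosis",
              "the", "stroke", "arteriosclerosis"]
top_keywords = extract_keywords(document_1, background, n=2)
```

The returned weights play the role of the per-keyword importance values that later steps convert into ranks.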

Next, in step S1-2, the keyword pair generation unit 12 generates keyword pairs based on the n keywords per document extracted by the keyword extraction unit 11 and the weight value associated with each keyword.
As a result of this processing, the keyword pair generation unit 12 generates N keyword pairs (N is a positive integer). The value of N can be set in advance; as an example, N = 4. The keyword pair generation unit 12 also calculates a rank score for each keyword pair. The detailed procedure of the processing by the keyword pair generation unit 12 will be described later (see FIG. 3).

Next, in step S1-3, the inter-keyword relationship extraction unit 13 extracts the relationship between the keywords for each of the N keyword pairs generated by the keyword pair generation unit 12. In doing so, it uses the rank score of each keyword pair calculated in the processing of the keyword pair generation unit 12.

The output of the inter-keyword relationship extraction unit 13 follows one of two patterns. In the first pattern, the two keywords forming a keyword pair have a direct relationship; in this case, the inter-keyword relationship extraction unit 13 outputs the keyword pair and the name of that relationship. In the second pattern, each of the two keywords forming a keyword pair has a direct relationship with a certain common word; in this case, the inter-keyword relationship extraction unit 13 outputs the keyword pair, the common word, and the name of the relationship connecting the keywords to the common word.

The relationships that the inter-keyword relationship extraction unit 13 extracts and outputs from the important (highly ranked) keyword pairs represent the relationship between the two documents.

Next, in step S1-4, the explanation generation unit 14 generates an explanatory sentence for each keyword pair.

Specifically, for each relationship, the explanation generation unit 14 outputs wording that follows the pattern "XX connection", together with a score. When a relationship name that directly connects the two keywords forming a keyword pair is obtained, the explanation generation unit 14 replaces "XX" in the output pattern with that relationship name. When a common word that has the same relationship with each of the two keywords forming the keyword pair is obtained, the explanation generation unit 14 replaces "XX" in the output pattern with that common word.

In addition, the explanation generation unit 14 generates a supplementary explanatory sentence using template data that has been written and stored in advance.

For example, when the output from the inter-keyword relationship extraction unit 13 is a keyword pair and the name of the relationship directly connecting those keywords, the template used by the explanation generation unit 14 is "<keyword b> is the <relationship name> of <keyword a>". In this template, <keyword b>, <keyword a>, and <relationship name> are placeholders that are replaced with the specific words and relationship name as appropriate.
Specifically, when keyword a is "stroke", keyword b is "arteriosclerosis", and the name of the relationship directly connecting them is "cause", the explanation generation unit 14 generates the output "Cause connection. Arteriosclerosis is the cause of stroke."

When the output from the inter-keyword relationship extraction unit 13 is a keyword pair, a common word, and the name of the relationship between the keywords and the common word, the template used by the explanation generation unit 14 depends on the relationship name.
Specifically, when keyword a is "stroke", keyword b is "heart failure", the common word related to both keywords is "disease", and the name of the relationship between each keyword and the common word is "hypernym", the explanation generation unit 14 outputs "Disease connection. Stroke has the same hypernym (disease) as heart failure."
An example of output by the explanation generation unit 14 will be described later with reference to FIG. 5.
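The two output patterns can be sketched with simple string templates. The function names, the English renderings of the templates (including "hypernym" for the shared broader term), and the use of a single generic template for the common-word case are assumptions for illustration; the description says the common-word template differs by relationship name and gives only the one example reproduced here.

```python
def explain_direct(keyword_a, keyword_b, relation_name):
    """Pattern 1: the two keywords are directly linked by relation_name."""
    return (f"{relation_name.capitalize()} connection. "
            f"{keyword_b.capitalize()} is the {relation_name} of {keyword_a}.")


def explain_common(keyword_a, keyword_b, common_word, relation_name):
    """Pattern 2: both keywords share relation_name with common_word."""
    return (f"{common_word.capitalize()} connection. "
            f"{keyword_a.capitalize()} has the same {relation_name} "
            f"({common_word}) as {keyword_b}.")


direct_text = explain_direct("stroke", "arteriosclerosis", "cause")
common_text = explain_common("stroke", "heart failure", "disease", "hypernym")
```

A fuller version would presumably look up a separate stored template per relationship name, as the description indicates.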

Next, the processing of the main processing units in the device will be described.
FIG. 3 is a flowchart showing the details of the process by which the keyword pair generation unit 12 generates keyword pairs. Here, the pair of documents on which keyword pair generation is based is called document a and document b. The input to the keyword pair generation unit 12 for generating keyword pairs consists of the keyword set {w_a1, w_a2, ..., w_an} extracted from document a with its weight data, and the keyword set {w_b1, w_b2, ..., w_bn} extracted from document b with its weight data. These input data are obtained as the result of processing by the keyword extraction unit 11.

First, in step S2-1, the keyword pair generation unit 12 takes one keyword w_ai from the keyword set extracted from document a and one keyword w_bj from the keyword set extracted from document b, and generates the keyword pair candidate (w_ai, w_bj) from them. The way keywords are selected from each keyword set at this point is arbitrary.

Next, in step S2-2, the keyword pair generation unit 12 determines the rank of each of the keywords w_ai and w_bj within its document from the weight assigned to it, and takes the sum of these ranks as the weight rank_score(i, j) of the keyword pair candidate (w_ai, w_bj). That is, rank_score(i, j) = rank_a(i) + rank_b(j), where rank_a(i) is the rank of keyword w_ai in document a and rank_b(j) is the rank of keyword w_bj in document b. For example, if w_ai is the keyword with the largest weight in document a and w_bj is the keyword with the largest weight in document b, then rank_a(i) = 1 and rank_b(j) = 1, so rank_score(i, j) = rank_a(i) + rank_b(j) = 1 + 1 = 2.
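A minimal sketch of steps S2-1 and S2-2, assuming each keyword set arrives as a list of (keyword, weight) tuples. The helper names and the handling of equal weights (input order is kept, since Python's sort is stable) are assumptions; the description does not say how ties in weight are ranked.

```python
def ranks_by_weight(weighted_keywords):
    """Map each keyword to its 1-based rank; the largest weight gets rank 1."""
    ordered = sorted(weighted_keywords, key=lambda kw: -kw[1])
    return {word: position
            for position, (word, _) in enumerate(ordered, start=1)}


def rank_scores(keywords_a, keywords_b):
    """Return {(w_ai, w_bj): rank_a(i) + rank_b(j)} for every candidate pair."""
    rank_a = ranks_by_weight(keywords_a)
    rank_b = ranks_by_weight(keywords_b)
    return {
        (wa, wb): rank_a[wa] + rank_b[wb]
        for wa in rank_a
        for wb in rank_b
    }


# Hypothetical weights, not data from the source.
keywords_a = [("stroke", 4.2), ("rehabilitation", 2.1)]
keywords_b = [("diet", 1.5), ("arteriosclerosis", 3.8)]
scores = rank_scores(keywords_a, keywords_b)
```

Here "stroke" and "arteriosclerosis" are each the top-weighted keyword of their document, so their candidate pair receives the smallest possible rank score of 2.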

Next, in step S2-3, the keyword pair generation unit 12 sets the variable countA, which counts the keyword pairs being generated, to 1. A predetermined area of semiconductor memory, for example, is allocated for this variable.

Next, in step S2-4, the keyword pair generation unit 12 determines whether all the keyword pair candidates generated in step S2-1 have been processed. If all have been processed (step S2-4: YES), the processing of the entire flowchart ends. If not all have been processed (step S2-4: NO), the process proceeds to step S2-5.

In step S2-5, the keyword pair generation unit 12 selects one keyword pair from the remaining candidates as the processing target. Candidates are selected in ascending order of rank_score(i, j); that is, the keyword pair generation unit 12 gives priority to candidates with a small rank_score(i, j) value (in other words, to highly ranked keyword pairs). When multiple keyword pairs have the same rank score value, the inter-keyword relationship extraction unit 13 determines the keyword pair to be processed according to one of the priority orders (a) to (e) below.
(a) Ascending order of the smaller of the two keywords' ranks in their original documents, that is, the minimum of the ranks of the two keywords forming the keyword pair.
(b) Ascending order of the absolute value of the difference between the two keywords' ranks in their original documents.
(c) An order determined randomly, for example by pseudo-random numbers.
(d) A combination of (a) and (c).
(e) A combination of (b) and (c).
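The selection order of step S2-5 can be expressed as a composite sort key. The sketch below covers tie-break rules (a) and (b) only; the tuple layout of a candidate and the function name are assumptions, and the random rules (c) to (e) are omitted to keep the example deterministic.

```python
def order_candidates(candidates, tie_break="a"):
    """Sort candidate pairs by ascending rank_score, then by a tie-break rule.

    Each candidate is a tuple (keyword_a, rank_a, keyword_b, rank_b).
    tie_break "a": smaller minimum rank first; "b": smaller rank gap first.
    """
    def sort_key(candidate):
        _, rank_a, _, rank_b = candidate
        rank_score = rank_a + rank_b
        if tie_break == "a":
            secondary = min(rank_a, rank_b)
        elif tie_break == "b":
            secondary = abs(rank_a - rank_b)
        else:
            raise ValueError("tie_break must be 'a' or 'b'")
        return (rank_score, secondary)

    return sorted(candidates, key=sort_key)


candidates = [
    ("w_a2", 2, "w_b2", 2),  # rank_score 4, minimum rank 2, gap 0
    ("w_a1", 1, "w_b3", 3),  # rank_score 4, minimum rank 1, gap 2
    ("w_a1", 1, "w_b1", 1),  # rank_score 2
]
by_rule_a = order_candidates(candidates, "a")
by_rule_b = order_candidates(candidates, "b")
```

The two candidates with rank score 4 swap places depending on the rule: rule (a) prefers the pair containing a rank-1 keyword, while rule (b) prefers the pair whose keywords are equally ranked.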

Next, in step S2-6, the keyword pair generation unit 12 determines whether the two keywords of the pair being processed have a direct relationship with each other, or whether both have a relationship with a certain common word. If either condition is met (step S2-6: YES), the process proceeds to step S2-7. If the keyword pair being processed meets neither condition (step S2-6: NO), the process returns to step S2-4 to process the next keyword pair candidate. To determine in this step whether two words are related, the keyword pair generation unit 12 refers to the inter-word relationship dictionary storage unit 21, which stores information representing what kind of relationship one word has with another. Relationships include, for example, similarity, causal, hypernym-hyponym, and attribute relationships.

For example, the inter-word relationship dictionary storage unit 21 includes data representing relationships such as (1) and (2) below.
(1) stroke - [relation: causal relationship] - arteriosclerosis
(2) myocardial infarction - [relation: causal relationship] - arteriosclerosis
That is, "stroke" and "arteriosclerosis" have a "causal relationship", and "myocardial infarction" and "arteriosclerosis" also have a "causal relationship". In this case, "stroke" and "arteriosclerosis", for example, have a direct relationship. In addition, "stroke" and "myocardial infarction" each have a relationship (a causal relationship) with the common word "arteriosclerosis".
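The two conditions tested in step S2-6 can be illustrated with a small in-memory index built from entries (1) and (2). This is a minimal sketch, not the patented implementation: all function names are hypothetical, and treating the existence of a relationship as direction-independent is an assumption made for the example.

```python
# Entries (1) and (2) from the text, as (word 1, relation, word 2) triples.
ENTRIES = [
    ("stroke", "causal", "arteriosclerosis"),
    ("myocardial infarction", "causal", "arteriosclerosis"),
]

def build_index(entries):
    """Map each word to the set of (relation, other_word) it is listed with."""
    index = {}
    for word1, relation, word2 in entries:
        # Assumption: whether a relation exists is checked in both directions.
        index.setdefault(word1, set()).add((relation, word2))
        index.setdefault(word2, set()).add((relation, word1))
    return index

def direct_relations(index, a, b):
    """Relation names that connect a and b directly."""
    return {rel for rel, other in index.get(a, set()) if other == b}

def common_words(index, a, b):
    """Words that both a and b are related to."""
    neighbours_a = {w for _, w in index.get(a, set())}
    neighbours_b = {w for _, w in index.get(b, set())}
    return (neighbours_a & neighbours_b) - {a, b}

def passes_step_s2_6(index, a, b):
    """True if the pair has a direct relation or shares a common word."""
    return bool(direct_relations(index, a, b) or common_words(index, a, b))

INDEX = build_index(ENTRIES)
```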

When the process proceeds to step S2-7, the keyword pair generation unit 12 adds the keyword pair candidate currently being processed to the output as one of the keyword pairs. In other words, when the two keywords of a candidate pair have a direct relationship, or each have a relationship with a common word, the keyword pair generation unit 12 generates that candidate as a keyword pair.

Then, in step S2-8, the keyword pair generation unit 12 updates the variable countA by adding 1 to its value.
As described above, the variable countA counts the keyword pairs generated; adding 1 here corresponds to the one keyword pair generated in step S2-7.
Next, in step S2-9, the keyword pair generation unit 12 determines whether the value of the variable countA is greater than N, that is, whether the inequality countA > N holds. The value of N is as described above. If countA is greater than N (step S2-9: YES), the processing of the entire flowchart ends; at that point, the keyword pair generation unit 12 has output N keyword pairs. If countA is N or less (step S2-9: NO), the process returns to step S2-4 to handle the next keyword pair candidate.

Through the series of steps above, the keyword pair generation unit 12 outputs, from among the keyword pair candidates, N keyword pairs that satisfy the predetermined relationship (one for which the determination in step S2-6 is YES), giving priority to higher-ranked pairs. If fewer than N candidates qualify, the keyword pair generation unit 12 outputs fewer than N keyword pairs (the case where the determination in step S2-4 is YES). The keyword pair generation unit 12 also outputs the rank score rank_score(i, j) corresponding to each output keyword pair.
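The loop of steps S2-4 to S2-9 can be sketched as follows. The function name, the tuple layout of the candidates, and the `is_related` callback (standing in for the dictionary lookup of step S2-6) are hypothetical; the initial value countA = 1 is an assumption, chosen because it makes "countA > N" coincide with N pairs having been output.

```python
def generate_keyword_pairs(candidates, is_related, n):
    """Return up to n related keyword pairs, best rank score first.

    candidates: iterable of (rank_score, kw_a, kw_b) tuples.
    is_related: callable(kw_a, kw_b) -> bool, the step S2-6 check.
    """
    count_a = 1                      # assumed initial value of countA
    output = []
    remaining = sorted(candidates)   # ascending rank_score
    while remaining:                 # S2-4: stop when no candidate is left
        score, kw_a, kw_b = remaining.pop(0)   # S2-5: take the best candidate
        if not is_related(kw_a, kw_b):         # S2-6: NO, try the next one
            continue
        output.append((kw_a, kw_b, score))     # S2-7: emit the pair
        count_a += 1                           # S2-8
        if count_a > n:                        # S2-9: N pairs emitted
            break
    return output

# Toy data for illustration only.
RELATED = {("a1", "b1"), ("a3", "b3"), ("a4", "b4")}
CANDIDATES = [(3, "a3", "b3"), (1, "a1", "b1"), (2, "a2", "b2"), (4, "a4", "b4")]
```

With this toy data and N = 2, the unrelated candidate with rank score 2 is skipped and the pairs with rank scores 1 and 3 are returned.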

FIG. 4 is a flowchart showing the details of the process by which the inter-keyword relationship extraction unit 13 extracts relationships between keywords. As a precondition, the inter-keyword relationship extraction unit 13 acquires the set of keyword pairs output by the keyword pair generation unit 12, together with the rank score of each pair. The process is described below following the flowchart.

First, in step S3-1, the inter-keyword relationship extraction unit 13 sets the variable countB, which counts the number of relationships output, to 1. A predetermined area of a semiconductor memory, for example, is allocated for this variable.

Next, in step S3-2, the inter-keyword relationship extraction unit 13 extracts one keyword pair from the acquired keyword pairs as the processing target. Pairs are selected from the set of unprocessed keyword pairs in ascending order of rank score value; that is, the inter-keyword relationship extraction unit 13 processes keyword pairs starting from the highest-ranked pair.

Next, in step S3-3, the inter-keyword relationship extraction unit 13 refers to the inter-word relationship dictionary storage unit 21 described above to determine whether the two keywords of the pair being processed have a direct relationship. If they do (step S3-3: YES), the process proceeds to step S3-4. If they do not (step S3-3: NO), the process proceeds to step S3-5.

When the process proceeds to step S3-4, the inter-keyword relationship extraction unit 13 ranks the relationship names that apply to the keyword pair being processed and selects the optimal one. If the pair has only one relationship, that relationship name is the optimal one. A keyword pair may, however, have a plurality of relationships. As an example, two relationships exist between the word "Terminator 2" and the word "James Cameron". The first is the relationship between a film and its director. The second is the relationship between a film director and a screenwriter (writer). These relationships are held by the inter-word relationship dictionary storage unit 21 as entries of the inter-word relationship dictionary. When a plurality of relationship names exist for a particular keyword pair, the inter-keyword relationship extraction unit 13 selects, as the optimal relationship name, the one that appears the fewest times in the inter-word relationship dictionary as a whole.

However, a relationship name that appears fewer times than a preset value L may be regarded as an error and left unselected. L is a positive number, for example L = 10. Relationship names with too few occurrences are excluded in this way to allow for the case where such a name has been included in the inter-word relationship dictionary by mistake.
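The rule "fewest occurrences in the whole dictionary, but not fewer than L" can be sketched with a frequency count. The function name and the toy relation names are hypothetical; representing the dictionary as a flat list with one relation name per entry is an assumption for illustration.

```python
from collections import Counter

def best_relation_name(candidate_names, dictionary_names, min_count=10):
    """Pick the rarest relation name that still occurs at least min_count times.

    candidate_names: relation names that hold for the keyword pair.
    dictionary_names: relation name of every entry in the whole dictionary.
    min_count: the threshold L from the text (L = 10 in the example).
    """
    freq = Counter(dictionary_names)
    # Names occurring fewer than L times are treated as likely errors.
    eligible = [name for name in set(candidate_names) if freq[name] >= min_count]
    if not eligible:
        return None
    # Fewest occurrences wins; the name itself breaks ties deterministically.
    return min(eligible, key=lambda name: (freq[name], name))

# Toy dictionary: 12 "director" entries, 10 "writer" entries, 2 suspicious ones.
DICTIONARY_NAMES = ["director"] * 12 + ["writer"] * 10 + ["typo_rel"] * 2
```

Returning `None` when every candidate falls below L is a design choice of this sketch; the text does not say what happens in that case.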

When the process proceeds to step S3-5, the two keywords of the pair have no direct relationship, so each of them has a relationship with some common word. There may also be more than one such common word. In this step, the inter-keyword relationship extraction unit 13 selects the optimal relationship name. For example, the keyword "stroke" and the keyword "myocardial infarction" have the following relationships with other words.
(1) stroke - [relation: causal relationship] - arteriosclerosis
(2) myocardial infarction - [relation: causal relationship] - arteriosclerosis
(3) stroke - [relation: hypernym-hyponym relationship] - disease
(4) myocardial infarction - [relation: hypernym-hyponym relationship] - disease
In this case, the keyword pair "stroke" and "myocardial infarction" shares the common word "arteriosclerosis" through a "causal relationship". The same pair also shares the common word "disease" through a "hypernym-hyponym relationship".

Because the inter-word relationships (1) to (4) above are registered in the inter-word relationship dictionary, the inter-keyword relationship extraction unit 13 extracts "arteriosclerosis" and "disease" as the common words to which both the keyword "stroke" and the keyword "myocardial infarction" are related.
The inter-keyword relationship extraction unit 13 then ranks the relationship names for the keyword pair and selects the most suitable one.

For example, the inter-keyword relationship extraction unit 13 calculates the sum of the similarity between the keyword from document a and the common word and the similarity between the keyword from document b and the common word. That is, it performs the calculations expressed by the following equations and obtains two score values, Score(stroke, myocardial infarction | arteriosclerosis) and Score(stroke, myocardial infarction | disease).

Score(stroke, myocardial infarction | arteriosclerosis) =
dice(stroke, arteriosclerosis) + dice(myocardial infarction, arteriosclerosis) ... (Equation 1)
Score(stroke, myocardial infarction | disease) =
dice(stroke, disease) + dice(myocardial infarction, disease) ... (Equation 2)
Equation 1 gives the score of the relationship name for the word pair "stroke" and "myocardial infarction" when "arteriosclerosis" is the common word. Equation 2 gives the score for the same word pair when "disease" is the common word. Here, dice(word 1, word 2) is the Dice coefficient between word 1 and word 2.

Suppose that the calculations of Equation 1 and Equation 2 give, for example,
Score(stroke, myocardial infarction | arteriosclerosis) > Score(stroke, myocardial infarction | disease)
In that case, the inter-keyword relationship extraction unit 13 ranks "arteriosclerosis", which has the higher score, above "disease" as a common word. When there are three or more common words, the common word with the highest score is likewise ranked first. That is, the inter-keyword relationship extraction unit 13 ranks the common words in descending order of score.
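Equations 1 and 2 and the resulting ranking can be sketched as below. The Dice values in `DICE` are invented purely for illustration (the text gives none), and the function names and the lookup keyed by an unordered word pair are assumptions.

```python
def score(kw_a, kw_b, common, dice):
    """Score(kw_a, kw_b | common) = dice(kw_a, common) + dice(kw_b, common)."""
    return dice[frozenset((kw_a, common))] + dice[frozenset((kw_b, common))]

def rank_common_words(kw_a, kw_b, commons, dice):
    """Rank common words in descending order of their score."""
    return sorted(commons, key=lambda w: score(kw_a, kw_b, w, dice), reverse=True)

# Hypothetical Dice coefficients, keyed by the unordered word pair.
DICE = {
    frozenset(("stroke", "arteriosclerosis")): 0.5,
    frozenset(("myocardial infarction", "arteriosclerosis")): 0.25,
    frozenset(("stroke", "disease")): 0.125,
    frozenset(("myocardial infarction", "disease")): 0.125,
}
```

With these made-up values, "arteriosclerosis" scores 0.75 and "disease" scores 0.25, so "arteriosclerosis" is ranked first, matching the case described in the text.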

Next, the inter-keyword relationship extraction unit 13 identifies the name of the relationship between each keyword and the common word. To identify the optimal relationship name, as in step S3-4, the inter-keyword relationship extraction unit 13 judges that a relationship name is more appropriate the fewer times it appears in the inter-word relationship dictionary. That is, it selects the highest-ranked relationship name for the highest-ranked common word. Here too, as an exception, relationship names that appear fewer than L times in the inter-word relationship dictionary may be excluded from selection.

In the processing of step S3-5, the inter-keyword relationship extraction unit 13 refers to the word co-occurrence dictionary storage unit 22. The word co-occurrence dictionary storage unit 22 holds similarity data based on word co-occurrence frequencies. One example of such similarity data is the Dice coefficient. When calculating the scores given by Equation 1 and Equation 2 above, the inter-keyword relationship extraction unit 13 reads the Dice coefficient data from the word co-occurrence dictionary storage unit 22 and uses it.
The configuration of the word co-occurrence dictionary storage unit 22 will be described later with reference to FIG. 7.

After the processing of step S3-4 or S3-5, the process moves to step S3-7. In step S3-7, the inter-keyword relationship extraction unit 13 determines whether all keyword pairs have been processed. If all keyword pairs have been processed (step S3-7: YES), the processing of the entire flowchart ends. If keyword pairs remain (step S3-7: NO), the process proceeds to step S3-8.

Next, in step S3-8, the inter-keyword relationship extraction unit 13 adds the number of relationships processed in step S3-4 or S3-5 to the variable countB.

Then, in step S3-9, the inter-keyword relationship extraction unit 13 determines whether the value of the variable countB is greater than or equal to a preset value M. If countB ≥ M (step S3-9: YES), the processing of the entire flowchart ends. If countB < M (step S3-9: NO), the process returns to step S3-2 to handle the next keyword pair.

As a result of the processing shown in this flowchart, the inter-keyword relationship extraction unit 13 outputs either a keyword pair and the relationship name that directly connects its keywords (when step S3-4 was executed), or a keyword pair, the related common word, and the relationship names connecting the keywords to the common word (when step S3-5 was executed). The inter-keyword relationship extraction unit 13 outputs these for each keyword pair.
When the inter-keyword relationship extraction unit 13 has finished processing all the given keyword pairs, it has output all the relationship names corresponding to those pairs, and the processing shown in this flowchart ends (step S3-7: YES). When the number of relationships processed reaches (M - 1) or more, the inter-keyword relationship extraction unit 13 ends the processing shown in this flowchart even if given keyword pairs remain (step S3-9: YES).
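The control flow of FIG. 4 (steps S3-1 to S3-9) can be sketched as follows. The function name and the `resolve` callback, which stands in for steps S3-3 to S3-5 and returns the list of relationships found for one pair, are hypothetical.

```python
def extract_relations(pairs, resolve, m):
    """Process keyword pairs in rank-score order until all are done or countB >= m.

    pairs: list of (kw_a, kw_b, rank_score) tuples.
    resolve: callable(kw_a, kw_b) -> list of relation records (S3-3 to S3-5).
    m: the preset limit M.
    """
    count_b = 1                                   # S3-1: countB starts at 1
    results = []
    queue = sorted(pairs, key=lambda p: p[2])     # S3-2: ascending rank score
    for position, (kw_a, kw_b, _score) in enumerate(queue):
        relations = resolve(kw_a, kw_b)           # S3-3, then S3-4 or S3-5
        results.append((kw_a, kw_b, relations))
        if position == len(queue) - 1:            # S3-7: all pairs processed
            break
        count_b += len(relations)                 # S3-8
        if count_b >= m:                          # S3-9
            break
    return results

PAIRS = [("a", "b", 2), ("c", "d", 1), ("e", "f", 3)]
```

Because countB starts at 1, a limit of M = 3 stops this sketch after two relationships, which is the "(M - 1) or more" condition stated in the text.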

FIG. 5 is a schematic diagram showing an example of the processing result output by the inter-document relationship extraction device 1. As illustrated, in this example the top ten extracted relationships (relationship 1 to relationship 10) are output first, in descending order of score. After that, as reference information, the pair of input summary sentences (input summary sentence 1 and input summary sentence 2) and the set of keywords extracted from each input sentence (keyword 1 and keyword 2) are output. For each of relationships 1 to 10, the output contains wording of the pattern "XX connection" generated by the explanation generation unit 14, a score, and an auxiliary explanatory sentence. In the illustrated example, relationship 1 outputs the wording "cause connection" using the relationship name "cause-result". Relationship 2 outputs the wording "disease connection" using the common word "disease" (here, a hypernym).

FIG. 6 is a schematic diagram showing the configuration of the inter-word relationship dictionary stored in the inter-word relationship dictionary storage unit 21. As illustrated, the inter-word relationship dictionary is data representing what relationship a word (word 1) has with another word (word 2). In the illustrated example, the first row of data indicates that word 1 "stroke" and word 2 "arteriosclerosis" have a causal relationship. The second row indicates that word 1 "myocardial infarction" and word 2 "arteriosclerosis" have a causal relationship. In this "causal relationship" example, word 1 and word 2 are not interchangeable: word 1 represents the result and word 2 represents the cause. The third row indicates that word 1 "stroke" and word 2 "disease" have a hypernym-hyponym relationship. The fourth row indicates that word 1 "myocardial infarction" and word 2 "disease" have a hypernym-hyponym relationship. In this "hypernym-hyponym relationship" as well, word 1 and word 2 are not interchangeable: word 1 represents the subordinate concept and word 2 represents the superordinate concept. In one embodiment, the inter-word relationship dictionary storage unit 21 stores this data as a table in the relational model. Besides the "causal relationship" and the "hypernym-hyponym relationship", the types of relationship between words can include a "similarity relationship", an "attribute relationship", and others.
The inter-word relationship dictionary also has, as a data item, a relationship name for generating explanatory text. This data is held so that the explanation generation unit 14 can fit the relationship name into a template when generating an explanatory sentence. In the illustrated example, the inter-word relationship dictionary holds "cause" as the relationship name for explanatory text corresponding to the "causal relationship", and "hypernym" as the one corresponding to the "hypernym-hyponym relationship".

Although the figure shows only four rows of data, the inter-word relationship dictionary in practice contains an enormous number of rows. For a word pair that has no entry in the inter-word relationship dictionary, the two words can be said to have no direct relationship.
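The four rows described for FIG. 6 can be held in a relational table, for instance with SQLite. This is a minimal sketch under assumptions: the table name, column names, and helper functions are hypothetical and are not taken from the figure.

```python
import sqlite3

# The four example rows: (word 1, word 2, relation, name used in explanations).
ROWS = [
    ("stroke", "arteriosclerosis", "causal", "cause"),
    ("myocardial infarction", "arteriosclerosis", "causal", "cause"),
    ("stroke", "disease", "hypernym-hyponym", "hypernym"),
    ("myocardial infarction", "disease", "hypernym-hyponym", "hypernym"),
]

def build_dictionary(rows):
    """Create an in-memory table holding the inter-word relationship entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_relation "
        "(word1 TEXT, word2 TEXT, relation TEXT, label TEXT)"
    )
    conn.executemany("INSERT INTO word_relation VALUES (?, ?, ?, ?)", rows)
    return conn

def relations_between(conn, word1, word2):
    """Return (relation, label) rows for the ordered pair; empty if no entry."""
    cur = conn.execute(
        "SELECT relation, label FROM word_relation WHERE word1 = ? AND word2 = ?",
        (word1, word2),
    )
    return cur.fetchall()

def relation_frequency(conn, relation):
    """Count how often a relation name occurs in the whole dictionary."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM word_relation WHERE relation = ?", (relation,)
    )
    return cur.fetchone()[0]
```

An empty result from `relations_between` corresponds to the statement that a pair without an entry has no direct relationship.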

The data of the inter-word relationship dictionary may be created manually in advance, or may be extracted automatically from a large amount of text using existing technology.
An existing technique for extracting relationships between words is described, for example, in the following reference. This technique uses machine learning to extract word pairs having a specific relationship, based on, for example, encyclopedia data obtainable through the Internet.
[Reference: Hypernym-hyponym relationship extraction tool Version1.0, URL: http://alaginrc.nict.go.jp/hyponymy/, download date: December 27, 2013]

FIG. 7 is a schematic diagram showing the configuration of the word co-occurrence dictionary stored in the word co-occurrence dictionary storage unit 22. As illustrated, the word co-occurrence dictionary is data representing the numerical similarity between a word (word 1) and another word (word 2). Here, the value of the Dice coefficient is used as the similarity. The Dice coefficient can be obtained automatically by statistical processing of a large amount of document data. The word co-occurrence dictionary storage unit 22 stores this data as a table in the relational model, but the data may be represented in another format.

An existing database of word co-occurrence frequencies is described, for example, in the reference below. To build this database, the number of times two words co-occur was counted from the text of web pages, and co-occurrence scores were calculated. The scores are the co-occurrence frequency, the Dice coefficient, and the discounting mutual information. Co-occurrence is counted under several conditions, for example co-occurrence within a document, within four neighboring sentences, and within a single sentence.
[Reference: (A-5) Word co-occurrence frequency database, URL: https://alaginrc.nict.go.jp/resources/nictmastar/li-resource-info/li-resource-outline.html, download date: December 27, 2013]
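For reference, the Dice coefficient of two words can be computed from co-occurrence counts as 2 * n12 / (n1 + n2), where n1 and n2 are the numbers of contexts containing each word and n12 is the number containing both. The sketch below assumes a context is a set of words (a document or a sentence); the function name and the toy contexts are hypothetical, and this is not a description of how the cited database was built.

```python
def dice_coefficient(contexts, word1, word2):
    """Dice coefficient of two words over a collection of word sets."""
    n1 = sum(1 for ctx in contexts if word1 in ctx)
    n2 = sum(1 for ctx in contexts if word2 in ctx)
    n12 = sum(1 for ctx in contexts if word1 in ctx and word2 in ctx)
    if n1 + n2 == 0:
        return 0.0          # neither word occurs; define the similarity as 0
    return 2 * n12 / (n1 + n2)

# Toy contexts for illustration only.
CONTEXTS = [
    {"stroke", "arteriosclerosis"},
    {"stroke"},
    {"arteriosclerosis", "diet"},
    {"stroke", "arteriosclerosis", "diet"},
]
```

In the toy data each word occurs in three contexts and they co-occur in two, giving 2 * 2 / (3 + 3) = 2/3.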

The functions of the inter-document relationship extraction device in the embodiment described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period of time, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize them in combination with a program already recorded in the computer system.

According to the present embodiment, relationships between documents can be extracted, so another document having a specific relationship with a given document can be identified. Other documents related to a given document can therefore be recommended. Because the relationship name is identified, the reason for the recommendation can also be stated explicitly. When documents are associated with programs such as television programs, other programs related to a given program can be recommended. When documents are associated with products on sale, other products related to a given product can be recommended. The reason for each recommendation (the relationship name) can be output explicitly.

(Modification) As a modification of the embodiment above, the inter-document relationship extraction device 1 may be configured as follows. In the embodiment above, the keyword pair generation unit 12 first generates N keyword pairs (the process described with reference to FIG. 3), and the inter-keyword relationship extraction unit 13 then extracts the relationships between keywords for the generated set of keyword pairs (the process described with reference to FIG. 4). In this modification, by contrast, each time the keyword pair generation unit 12 generates one keyword pair in the order of rank_score(i, j), the inter-keyword relationship extraction unit 13 extracts the relationship between keywords for that pair.
Specifically, at the point of step S2-7 in FIG. 3, the inter-keyword relationship extraction unit 13 performs processing corresponding to steps S3-3, S3-4, and S3-5 in FIG. 4. Along with this, the termination condition of the inter-document relationship extraction device 1 as a whole is judged and controlled. That is, the device determines whether the number of keyword pairs processed (the number counted by the variable countA) has exceeded the upper limit N, and whether the number of relationships output (the number counted by the variable countB) has reached the upper limit M or more, and ends the entire process as soon as either limit is reached.
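The modification can be sketched as a single loop that generates a pair and immediately resolves its relationship, stopping when either counter reaches its limit. The function name and the `resolve` callback are hypothetical; here `resolve` returns an empty list when the pair has neither a direct relationship nor a common word, and both counters are assumed to start at 1 as in the separate flowcharts.

```python
def extract_interleaved(candidates, resolve, n, m):
    """Generate and resolve keyword pairs one at a time.

    candidates: iterable of (rank_score, kw_a, kw_b) tuples.
    resolve: callable(kw_a, kw_b) -> list of relation records (may be empty).
    n: upper limit N on keyword pairs; m: upper limit M on relationships.
    """
    count_a, count_b = 1, 1          # assumed initial values
    results = []
    for _score, kw_a, kw_b in sorted(candidates):   # ascending rank_score
        relations = resolve(kw_a, kw_b)             # S2-6 plus S3-3 to S3-5
        if not relations:
            continue                                # pair is not related
        results.append((kw_a, kw_b, relations))
        count_a += 1
        count_b += len(relations)
        if count_a > n or count_b >= m:             # either limit reached
            break
    return results

CANDIDATES = [(1, "a", "b"), (2, "c", "d"), (3, "e", "f")]
```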

An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment and also includes designs and the like within a scope that does not depart from the gist of the invention.

Because the present invention can extract relationships between documents, it can identify another document having a specific relationship with a given document. It can therefore be used to search for other documents having a specific relationship with a given document. Furthermore, when documents are associated with programs such as television broadcasts, radio broadcasts, or Internet distribution, it can be used, for example, to search for other programs having a specific relationship with a given program. The present invention can thus be used for content recommendation in a content providing service. Likewise, when documents are associated with products sold on, for example, a mail order site, it can be used to search for other products having a specific relationship with a given product. The present invention can thus be used for product recommendation on a mail order site or the like.

DESCRIPTION OF SYMBOLS
1 Inter-document relationship extraction device
10 Document acquisition unit
11 Keyword extraction unit
12 Keyword pair generation unit
13 Inter-keyword relationship extraction unit
14 Explanation generation unit
21 Inter-word relationship dictionary storage unit
22 Word co-occurrence dictionary storage unit

Claims

An inter-document relationship extraction device comprising:
a keyword extraction unit that extracts a plurality of keywords from an input document and outputs a score indicating the importance of each of the keywords in the document;
a keyword pair generation unit that, for a pair of keywords extracted respectively from two documents, outputs the pair as a keyword pair when, based on the inter-word relationships read from an inter-word relationship dictionary storage unit that holds data representing word pairs and the relationships between those words, the keywords of the pair have a direct relationship with each other or each have a direct relationship with a common other word; and
an inter-keyword relationship extraction unit that, for the keyword pair output by the keyword pair generation unit and based on the inter-word relationships read from the inter-word relationship dictionary storage unit, outputs the direct relationship when the keywords of the pair have a direct relationship with each other, and outputs the other word and the relationship of each keyword in the pair to the other word when the keywords of the pair each have a direct relationship with the common other word.

The inter-document relationship extracting apparatus according to claim 1, wherein
the inter-keyword relationship extraction unit outputs, as an optimal relationship, the relationship that, among the relationships of the keyword pair, has the smallest number of occurrences in the inter-word relationship dictionary storage unit.

The inter-document relationship extracting apparatus according to claim 1, further comprising:
an explanation generation unit that generates explanatory text data including the keyword pair generated by the keyword pair generation unit and the relationship output by the inter-keyword relationship extraction unit for that keyword pair.

A program for causing a computer to function as:
a keyword extraction unit that extracts a plurality of keywords from an input document and outputs, for each of the keywords, a score indicating its importance in the document;
a keyword pair generation unit that, for a pair of keywords extracted respectively from two documents, and based on inter-word relationships read from an inter-word relationship dictionary storage unit that holds data representing word pairs and the relationships between those words, outputs the pair as a keyword pair when the keywords have a direct relationship with each other or when the keywords both have a direct relationship with a common other word; and
an inter-keyword relationship extraction unit that, for the keyword pair output by the keyword pair generation unit, and based on the inter-word relationships read from the inter-word relationship dictionary storage unit, outputs the direct relationship when the keywords of the keyword pair have a direct relationship with each other, and, when the keywords have a direct relationship with a common other word, outputs the other word and the relationship of the keywords contained in the keyword pair to the other word.