JP2012159983A

JP2012159983A - Analogy device, analogy method and program

Info

Publication number: JP2012159983A
Application number: JP2011018787A
Authority: JP
Inventors: Tomohiro Takagi; 友博高木
Original assignee: Meiji University
Current assignee: Meiji University
Priority date: 2011-01-31
Filing date: 2011-01-31
Publication date: 2012-08-23
Anticipated expiration: 2031-01-31
Also published as: JP5569908B2

Abstract

PROBLEM TO BE SOLVED: To achieve analogy in four-term analogy even when the concept to which the base belongs differs from the concept to which the target belongs and the conditions of the base and the target each consist of multiple words. SOLUTION: A base condition partial combination generation part 21 generates all partial combinations A_l obtained by selecting a predetermined number of words from the words that constitute a base condition A, and a base result partial combination generation part 22 generates all partial combinations B_k obtained by selecting a predetermined number of words from the words that constitute a base result B. A relation set generation part 24 extracts, from article data, a word r_i that associates A_l with B_k, for each of all combinations of A_l and B_k. A target condition partial combination generation part 26 generates all partial combinations obtained by selecting a predetermined number of words from the words that constitute a target condition C, and an analogy result generation part 28 extracts, from the article data, a word x_j that is associated with a partial combination of C by a word r_i, for each of all combinations of a partial combination of C and a word r_i.

Description

The present invention relates to an analogy device, an analogy method, and a program that perform analogy based on cases.

Four-term analogy is known as one of the basic analogy methods. A four-term analogy is generally written as in the following expression (1).

A : B = C : X?   ... (1)

The above means "If A then B; if C, then what?", where X represents the analogy result. In other words, four-term analogy finds X from A, B, and C. Four-term analogy has two basic components: a base domain (base) and a target domain (target). The base is the existing knowledge used for the analogy, and the target is the unknown problem to be solved. In the above example, A and B belong to the base, and C and X belong to the target. Non-Patent Document 1 proposes a system that implements this four-term analogy algorithm. FIG. 13 shows an outline of that algorithm.

In the algorithm shown in FIG. 13, a relation extraction process is performed first to obtain the relation set R between A and B in the base. Next, a relation mapping process transfers the relation set R obtained by the relation extraction process to the target and applies it to C to obtain X. Hereinafter, A in the base is called the situation, B is called the result in that situation, and the pair of A and B is called one case.

In Non-Patent Document 1, the relation set R obtained by the relation extraction process is defined as a set of words r_i that represent similar relations, as in the following expression (2).

R = {r_i} (i is an integer of 1 or more)   ... (2)

In the relation mapping process, each word r_i in the relation set R is used to obtain from C a plurality of words x_j that are candidates for the analogy result, and for each obtained word x_j a value score(x_j) that quantitatively indicates its likelihood is calculated. That is, the analogy result is a set consisting of the words x_j and the score(x_j) assigned to each of them, and is expressed by the following expression (3).

X = {x_j} (j is an integer of 1 or more)   ... (3)

In Non-Patent Document 2, on the other hand, analogy is performed as follows. First, a TF-IDF score is assigned to each word extracted from the description text corresponding to a DVD title, and the score is modified according to the user's selection operations. Recommended DVD data is then selected based on the input words and on DVD data consisting of the words with the highest modified scores.
In Non-Patent Document 3, analogy is performed as follows. First, a word vector consisting of the words with the highest TF-IDF values is generated from the article data of a past one-week period, and candidate words are extracted from the article data of the day following that week based on the generated word vector. A plurality of such pairs of a word vector and candidate words are generated while shifting the period of the article data. A word vector is then generated in the same way from the article data of the week preceding the date to be predicted and matched against the word vectors generated from the past article data, and the candidate words corresponding to the matching word vector are taken as the prediction result.
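Both prior-art methods rank words by TF-IDF before building a word vector. The following is a minimal sketch of that ranking step only, assuming one common TF-IDF variant (raw term count multiplied by log(N / df)); the cited documents may use a different weighting, and the function name `top_tfidf_words` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def top_tfidf_words(doc_tokens, corpus_tokens, k):
    """Return the k words of doc_tokens with the highest TF-IDF score.

    Assumed variant (not taken from the cited documents):
    tf = raw count in the document, idf = log(N / df).
    """
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))          # count each word once per document
    tf = Counter(doc_tokens)
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf if df[w] > 0}
    # Sort by descending score, then alphabetically so ties are stable.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

# Made-up toy corpus; each article is already tokenized.
corpus = [
    ["yen", "rises", "exports", "fall"],
    ["yen", "falls", "exports", "rise"],
    ["stocks", "rise", "yen", "steady"],
]
word_vector = top_tfidf_words(corpus[0], corpus, k=2)
```

Words that appear in every article (here "yen") receive a score of zero under this variant and therefore drop out of the word vector.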

Non-Patent Document 1: Kazuhiro Okada (岡田一宏) and 4 others, "Analogy Method Based on Structure-Mapping Theory", Proceedings of the 35th Fuzzy Workshop, March 2010, pp. 25-30
Non-Patent Document 2: Torahiko Date (伊達寅彦) and 7 others, "Proposal of a DVD Recommendation System Using CFS", Proceedings of the 35th Fuzzy Workshop, Japan Society for Fuzzy Theory and Intelligent Informatics, March 2010, pp. 33-36
Non-Patent Document 3: Shinichiro Ito (伊藤慎一郎) and 1 other, "Predicting Economic Trends by Language", Proceedings of the 35th Fuzzy Workshop, March 2010, pp. 37-40

In four-term analogy in general, both the base situation A and the target situation C are one-dimensional expressions, so to speak, each consisting of a single word, and the same is true of Non-Patent Document 1. In Non-Patent Documents 2 and 3, on the other hand, analogy can be performed using situations expressed by a plurality of words as cases, but the target and the base must belong to the same concept. For example, in Non-Patent Document 1, analogy can be performed even when the target belongs to a concept related to fish and the base belongs to a concept related to birds, as in "If fish then scales; if bird, then what?". In Non-Patent Document 2, however, both the target and the base belong to the concept of DVDs, and in Non-Patent Document 3 both belong to the concept of the economy.

The present invention has been made in view of these circumstances. Its object is to provide an analogy device, an analogy method, and a program that, given a case consisting of a situation and a result in a base and a situation in a target belonging to a concept different from the concept to which the base belongs, can obtain an analogy result in the target even when the situations of the base and the target each consist of a plurality of words.

The present invention has been made to solve the above problem and is an analogy device comprising: a base situation partial combination generation unit that generates pieces of base situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base situation data; a base result partial combination generation unit that generates pieces of base result partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base result data; a relation set generation unit that, for each different combination of one piece of the base situation partial combination data and one piece of the base result partial combination data, extracts from article data stored in an article storage device an association word, which is a word that relates the words indicated by the base situation partial combination data to the words indicated by the base result partial combination data; a target situation partial combination generation unit that generates pieces of target situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by target situation data; and an analogy result generation unit that, for each different combination of one piece of the target situation partial combination data and one of the association words, extracts as an analogy result, from the article data stored in the article storage device, a word that is related by the association word to the words indicated by the target situation partial combination data.

The present invention is also the analogy device described above, further comprising: a relation validity score calculation unit that, for each association word extracted by the relation set generation unit, calculates a relation validity score quantitatively representing the validity of the association word as a word relating a situation to a result, based on the following strengths obtained from the article data stored in the article storage device: the strength of association between the association word and the co-occurrence of the base situation partial combination data and the base result partial combination data, the strength of association between the association word and the base situation partial combination data, and the strength of association between the association word and the base result partial combination data; and an analogy result validity score calculation unit that, for each word extracted as an analogy result by the analogy result generation unit, calculates an analogy result validity score quantitatively representing the validity of the word as an analogy result, based on the following strengths obtained from the article data stored in the article storage device: the strength of association between the word and the co-occurrence of the target situation partial combination data and the association word, the strength of association between the word and the target situation partial combination data, and the strength of association between the word and the association word, as well as on the relation validity score calculated for the association word.

The present invention is also the analogy device described above, further comprising: an analogy processing control unit that, for each of a plurality of pieces of case data each consisting of base situation data and base result data, causes the base situation partial combination generation unit to generate the base situation partial combination data from the base situation data constituting the case data, causes the base result partial combination generation unit to generate the base result partial combination data from the base result data constituting the case data, causes the relation set generation unit to extract association words from the article data for each different combination of one piece of the base situation partial combination data generated from the base situation data and one piece of the base result partial combination data generated from the base result data, causes the relation validity score calculation unit to calculate a relation validity score for each of the association words, causes the analogy result generation unit to extract analogy result words from the article data for each different combination of one piece of the target situation partial combination data and one of the association words, and causes the analogy result validity score calculation unit to calculate an analogy result validity score for each of the words extracted as analogy results; and an analogy result accumulation unit that merges identical words among the analogy result words obtained for each piece of case data and sums the analogy result validity scores calculated for the merged identical words.
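The merging performed by the analogy result accumulation unit can be illustrated with a minimal Python sketch. It assumes that each case has already produced a list of (word, analogy result validity score) pairs; the function name `accumulate_results`, the sample words, and the score values are hypothetical and are not taken from the specification.

```python
from collections import defaultdict

def accumulate_results(per_case_results):
    """Merge identical words across cases and sum their scores.

    per_case_results: one list per case of (word, score) pairs.
    Returns (word, total_score) pairs, highest total first.
    """
    totals = defaultdict(float)
    for case_result in per_case_results:
        for word, score in case_result:
            totals[word] += score
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Made-up scores for two cases (placeholders, not real outputs).
per_case = [
    [("recession", 0.5), ("inflation", 0.125)],
    [("recession", 0.25), ("recovery", 0.25)],
]
merged = accumulate_results(per_case)
```

A word that is supported by several cases, such as "recession" here, ends up with a larger accumulated score than a word supported by only one case.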

The present invention is also the analogy device described above, wherein the relation set generation unit extracts, as the association word, the verb of the predicate from a sentence of the article data in which the words indicated by the base situation partial combination data are nouns of the subject and the words indicated by the base result partial combination data are nouns of the predicate, and the analogy result generation unit extracts, as an analogy result, a noun of the predicate from a sentence of the article data in which the plurality of words indicated by the target situation partial combination data are nouns of the subject and the association word is the verb of the predicate.
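The subject/verb/predicate-noun extraction described above can be sketched as follows. This is an illustration under stated assumptions: sentences are assumed to have been parsed already into subject nouns, a predicate verb, and predicate nouns (no particular parser is implied), and `ParsedSentence`, `extract_relation_verbs`, and `extract_result_nouns` are hypothetical names.

```python
from typing import NamedTuple

class ParsedSentence(NamedTuple):
    """A sentence reduced to the parts used here (assumed pre-parsed)."""
    subject_nouns: frozenset
    verb: str
    predicate_nouns: frozenset

def extract_relation_verbs(sentences, a_part, b_part):
    """Verbs of sentences whose subject contains every word of a_part
    and whose predicate contains every word of b_part."""
    return {s.verb for s in sentences
            if set(a_part) <= s.subject_nouns
            and set(b_part) <= s.predicate_nouns}

def extract_result_nouns(sentences, c_part, relation_verb):
    """Predicate nouns of sentences whose subject contains every word of
    c_part and whose predicate verb is relation_verb."""
    found = set()
    for s in sentences:
        if set(c_part) <= s.subject_nouns and s.verb == relation_verb:
            found |= s.predicate_nouns
    return found

# Made-up sentences built around the fish/bird example in the text.
sentences = [
    ParsedSentence(frozenset({"fish"}), "have", frozenset({"scales"})),
    ParsedSentence(frozenset({"bird"}), "have", frozenset({"feathers"})),
    ParsedSentence(frozenset({"bird"}), "eat", frozenset({"insects"})),
]
verbs = extract_relation_verbs(sentences, ("fish",), ("scales",))
results = {x for r in verbs
           for x in extract_result_nouns(sentences, ("bird",), r)}
```

Because only the verb "have" links "fish" to "scales", the sentence "bird eat insects" does not contribute to the analogy result.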

The present invention is also the analogy device described above, wherein the article storage device further stores dictionary data containing words relating to a predetermined field, the relation set generation unit extracts, as the association word, a word contained in the dictionary data from a sentence of the article data in which the words indicated by the base situation partial combination data and the words indicated by the base result partial combination data co-occur, and the analogy result generation unit extracts, as an analogy result, a word contained in the dictionary data from a sentence of the article data in which the words indicated by the target situation partial combination data and the association word co-occur.
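The dictionary-based variant can be sketched in the same spirit. The sketch assumes tokenized sentences and a dictionary given as a set of words; it also assumes that the query words themselves are excluded from the extracted words, which the text above does not state. The name `extract_dictionary_words` and all sample data are hypothetical.

```python
def extract_dictionary_words(sentences, required_words, dictionary):
    """Dictionary words found in sentences that contain all required_words.

    Assumption: the required words themselves are not returned, even if
    they happen to be in the dictionary.
    """
    required = set(required_words)
    found = set()
    for tokens in sentences:
        token_set = set(tokens)
        if required <= token_set:                      # all words co-occur
            found |= (token_set & dictionary) - required
    return found

# Made-up tokenized sentences and field dictionary.
sentences = [
    ["bank", "raises", "rates", "today"],
    ["bank", "cuts", "staff"],
    ["company", "raises", "prices"],
]
dictionary = {"raises", "cuts", "prices", "rates"}

# Relation extraction: base words "bank" and "rates" must co-occur.
relation_words = extract_dictionary_words(sentences, ["bank", "rates"], dictionary)
# Relation mapping: target word "company" must co-occur with each relation word.
result_words = set()
for r in relation_words:
    result_words |= extract_dictionary_words(sentences, ["company", r], dictionary)
```

The same helper serves both steps because each step is a co-occurrence filter followed by a dictionary lookup; only the required words differ.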

The present invention is also an analogy method executed by an analogy device, comprising: a base situation partial combination generation step in which a base situation partial combination generation unit generates pieces of base situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base situation data; a base result partial combination generation step in which a base result partial combination generation unit generates pieces of base result partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base result data; a relation set generation step in which a relation set generation unit, for each different combination of one piece of the base situation partial combination data and one piece of the base result partial combination data, extracts from article data stored in an article storage device an association word, which is a word that relates the words indicated by the base situation partial combination data to the words indicated by the base result partial combination data; a target situation partial combination generation step in which a target situation partial combination generation unit generates pieces of target situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by target situation data; and an analogy result generation step in which an analogy result generation unit, for each different combination of one piece of the target situation partial combination data and one of the association words, extracts as an analogy result, from the article data stored in the article storage device, a word that is related by the association word to the words indicated by the target situation partial combination data.

The present invention is also a program that causes a computer used as an analogy device to function as: a base situation partial combination generation unit that generates pieces of base situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base situation data; a base result partial combination generation unit that generates pieces of base result partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by base result data; a relation set generation unit that, for each different combination of one piece of the base situation partial combination data and one piece of the base result partial combination data, extracts from article data stored in an article storage device an association word, which is a word that relates the words indicated by the base situation partial combination data to the words indicated by the base result partial combination data; a target situation partial combination generation unit that generates pieces of target situation partial combination data, each consisting of a predetermined number of words selected in a different combination from the plurality of words indicated by target situation data; and an analogy result generation unit that, for each different combination of one piece of the target situation partial combination data and one of the association words, extracts as an analogy result, from the article data stored in the article storage device, a word that is related by the association word to the words indicated by the target situation partial combination data.

According to the present embodiment, given a case consisting of a situation and a result in a base and a situation in a target belonging to a concept different from the concept to which the base belongs, an analogy result in the target can be obtained even when the situations of the base and the target each consist of a plurality of words.

FIG. 1 is a diagram showing an outline of the relation extraction process of the analogy device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an outline of the relation mapping process of the analogy device according to the same embodiment.
FIG. 3 is a block diagram showing the configuration of the analogy device according to the same embodiment.
FIG. 4 is a diagram showing the analogy processing flow of the analogy device according to the same embodiment.
FIG. 5 is a diagram showing the analogy processing flow of the analogy device according to the same embodiment.
FIG. 6 is a diagram for explaining the extraction process for the word r according to the same embodiment.
FIG. 7 is a diagram for explaining the extraction process for the word x according to the same embodiment.
FIG. 8 is a diagram for explaining the extraction process for the word r and the word x according to the same embodiment.
FIG. 9 is a diagram showing an outline of the processing of the analogy device according to the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the analogy device according to the same embodiment.
FIG. 11 is a diagram showing the processing flow of the analogy device according to the same embodiment.
FIG. 12 is a diagram showing the processing flow of the analogy device according to the same embodiment.
FIG. 13 is a diagram showing a prior-art analogy algorithm.

Embodiments of the present invention are described below with reference to the drawings.

[First embodiment]
In general, the situation A of the base domain (base) and the situation C of the target domain (target) in four-term analogy are one-dimensional expressions, so to speak, each consisting of a single word. In the present embodiment, the situation A and the situation C are extended to multidimensional structures consisting of a plurality of words. Written in the same notation as the modus ponens form shown in expression (1), this becomes the following expression (4).

As shown above, the situation A consists of the words t_a1, t_a2, ..., t_am, and the situation C consists of the words t_c1, t_c2, ..., t_cm. That is, the situation A and the situation C are m-dimensional word vectors whose elements are m words. The result B in the base situation A consists of the words w_b1, w_b2, ..., w_bg, and the result X in the target situation C consists of the words w_x1, w_x2, ..., w_xg. That is, the result B and the result X are g-dimensional word vectors whose elements are g words.
The analogy device of the present embodiment obtains the result X in the target situation C from the base situation A, the result B in the base situation A, and the target situation C, each of which is a multidimensional vector as shown in expression (4).
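Expression (4) itself is not reproduced in this text. The following LaTeX is a reconstruction from the word-vector definitions given in the surrounding paragraphs, offered only as a reading aid; the layout of the original expression may differ.

```latex
\begin{aligned}
A &= (t_{a1}, t_{a2}, \ldots, t_{am}), & B &= (w_{b1}, w_{b2}, \ldots, w_{bg}),\\
C &= (t_{c1}, t_{c2}, \ldots, t_{cm}), & X &= (w_{x1}, w_{x2}, \ldots, w_{xg}),
\end{aligned}
\qquad A : B = C : X?
```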

An outline of the processing of the analogy device of the present embodiment is described with reference to FIGS. 1 and 2.
FIG. 1 is a diagram showing an outline of the relation extraction process in the analogy device of the present embodiment. The analogy device generates all combinations obtained by selecting n words (n is an integer from 1 to m) from the m-dimensional word vector, which is a sequence of m words (m is an integer of 2 or more) representing the situation A. The analogy device also generates all combinations obtained by selecting h words (h is an integer from 1 to g) from the g-dimensional word vector, which is a sequence of g words (g is an integer of 2 or more) representing the result B. The analogy device searches for articles in which the n words of a combination generated from the situation A and the h words of a combination generated from the result B co-occur. An article contains a plurality of words and can be regarded as a word vector consisting of those words. The words of this word vector include words that associate other words with each other. The analogy device extracts from the articles a word r that relates the n words of the combination generated from the situation A to the h words of the combination generated from the result B. Denoting the extracted words r as r_1, r_2, ..., the relation set can be written as R = {r_i} (i is an integer of 1 or more).

FIG. 2 is a diagram showing an outline of the relation mapping process. The analogy device generates all combinations obtained by selecting n words from the m-dimensional word vector, which is a sequence of m words constituting the situation C. For each word r_i in the relation set R, the analogy device extracts from the articles the words that r_i associates with a combination generated from the situation C, and takes the extracted group of words as the analogy result X.

FIG. 3 is a block diagram showing the configuration of the analogy device 1 according to the first embodiment of the present invention. The analogy device 1 can be implemented by, for example, one or more computer devices, and is connected via a network to an article storage device 5 that stores article data. The article data is, for example, news text data, magazine text data, or the contents of a knowledge database. The article data consists of a plurality of words, some of which relate other words to each other. For example, in a natural language sentence, the verb (word) of the predicate associates the noun (word) of the subject with a noun (word) in the predicate.

As shown in FIG. 3, the analogy device 1 includes a case storage unit 11, an input unit 12, a relation extraction unit 13, a processing result storage unit 14, a relation mapping unit 15, and an output unit 16.
The case storage unit 11 stores situation A data (base situation data) indicating the word vector of the situation A and result B data (base result data) indicating the word vector of the result B. The situation A data indicates an m-dimensional word vector whose elements are m words, and the result B data indicates a g-dimensional word vector whose elements are g words. The input unit 12 receives, from a keyboard or the like, input of situation C data (target situation data) indicating the word vector of the situation C. The situation C data is an m-dimensional word vector whose elements are m words. The input unit 12 may instead receive the situation C data from another computer device connected via a network, or read the situation C data from a computer-readable recording medium. The processing result storage unit 14 stores the processing results of the units of the relation extraction unit 13 and the relation mapping unit 15.

The relation extraction unit 13 includes a base situation partial combination generation unit 21, a base result partial combination generation unit 22, a base co-occurrence article search unit 23, a relation set generation unit 24, and a relation validity score calculation unit 25.
The base situation partial combination generation unit 21 generates all combinations obtained by selecting n words from the m words indicated by the situation A data read from the case storage unit 11 (for convenience of explanation, the case n = 1 is also called a combination), and generates partial combinations A_l (base situation partial combination data), which are data indicating each of these combinations (1 ≤ l ≤ mCn, where l is an integer and mCn is the number of ways of choosing n words from m). That is, each partial combination A_l indicates an n-dimensional word vector whose elements are n words.

The base result partial combination generation unit 22 generates every combination obtained by selecting h words from the g words indicated by the result B data read from the case storage unit 11 (for convenience, the case h = 1 is also called a combination). It then generates partial combinations B_k (base result partial combination data), each of which is data indicating one of these combinations (1 ≤ k ≤ _gC_h, where k is an integer). That is, each partial combination B_k indicates an h-dimensional word vector whose elements are h words.
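The generation of partial combinations A_l and B_k can be illustrated with a short Python sketch. It is only a minimal illustration: the function name `partial_combinations` and the sample words are hypothetical and do not come from the specification, and the standard-library `itertools.combinations` stands in for whatever enumeration the units 21 and 22 actually perform.

```python
from itertools import combinations

def partial_combinations(words, size):
    """Return every way of choosing `size` words from `words`, as tuples."""
    return list(combinations(words, size))

# Hypothetical situation A words (m = 4) and result B words (g = 3).
situation_a = ["rain", "cloud", "wind", "cold"]
result_b = ["umbrella", "coat", "boots"]

parts_a = partial_combinations(situation_a, 2)  # n = 2
parts_b = partial_combinations(result_b, 1)     # h = 1

print(len(parts_a), len(parts_b))
```

Each tuple plays the role of one n-dimensional (or h-dimensional) word vector, and the list lengths equal _mC_n and _gC_h respectively.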

The base co-occurrence article search unit 23 searches the article data stored in the article storage device 5 for every combination of one partial combination A_l and one partial combination B_k, and identifies the article data of articles in which all the words indicated by A_l and B_k co-occur. The base co-occurrence article search unit 23 then generates an article set D, which is data consisting of the set of identified article data. For every combination of one partial combination A_l and one partial combination B_k, the relation set generation unit 24 extracts, from each piece of article data in the article set D, a word r (association word) that relates the n words indicated by A_l to the h words indicated by B_k, and generates a relation set R, which is data indicating the set of extracted words r. The extracted words r are denoted r_1, r_2, .... The relation validity score calculation unit 25 calculates a relation validity score score(r_i) for each word r_i (i is an integer of 1 or more) in the relation set R generated by the relation set generation unit 24. The relation validity score score(r_i) quantifies how plausible the word r_i is as a word that associates the partial combination A_l with the partial combination B_k.

The relationship mapping unit 15 includes a target situation partial combination generation unit 26, a target co-occurrence article search unit 27, an analogy result generation unit 28, and an analogy result validity score calculation unit 29.

The target situation partial combination generation unit 26 generates every combination obtained by selecting n words from the m words indicated by the situation C data input through the input unit 12, and generates partial combinations C_f (target situation partial combination data), each of which is data indicating one of these combinations (1 ≤ f ≤ _mC_n, where f is an integer). That is, each partial combination C_f indicates an n-dimensional word vector whose elements are n words.

The target co-occurrence article search unit 27 searches the article storage device 5 for every combination of one partial combination C_f and one word r_i in the relation set R, and identifies the article data of articles in which all the words indicated by C_f and the word r_i co-occur. The target co-occurrence article search unit 27 then generates an article set E, which is data consisting of the set of identified article data. The analogy result generation unit 28 extracts, from each piece of article data in the article set E, a word x that is related by the word r_i to the n words indicated by the partial combination C_f, and generates an analogy result set X, which is data indicating the set of extracted words x. The extracted words x are denoted x_1, x_2, .... The analogy result validity score calculation unit 29 calculates an analogy result validity score score(x_j) for each word x_j (j is an integer of 1 or more) in the analogy result set X generated by the analogy result generation unit 28. The analogy result validity score score(x_j) quantifies how valid the word x_j is as an analogy result.

The output unit 16 displays analogy result data on a display. The analogy result data consists of each word x_j indicated by the analogy result set X generated by the analogy result generation unit 28, together with the analogy result validity score score(x_j) that the analogy result validity score calculation unit 29 calculated for that word. Alternatively, the output unit 16 may print the analogy result data with a printer or the like, write it to an information recording medium, or send it to a computer device connected via a network. The analogy result data is a set of words, each inferred to be one of the words w_x1, w_x2, ..., w_xg that are the elements of the g-dimensional word vector representing the result X, together with their analogy result validity scores. That is, the analogy result is expressed as a fuzzy set of words x_j.

FIGS. 4 and 5 show the analogy processing flow of the analogy device 1 shown in FIG. 3.

The language we use every day has about 30,000 words. Key words that characterize each article are extracted from these, and the situation A data and result B data generated from the extracted key words are stored in advance in the case storage unit 11 of the analogy device 1. Typically, the dimension m of the situation A word vector and the dimension g of the result B word vector are 20 to 50, but other dimensions may be used. The processing result storage unit 14 stores the article set D, the article set E, the relation set R, and the analogy result set X, each with an initial value of NULL.

In FIG. 4, the input unit 12 of the analogy device 1 receives the situation C data (step S100). Next, the base situation partial combination generation unit 21 reads the situation A data from the case storage unit 11 and generates every combination obtained by selecting n words (n is an integer from 1 to m) from the m words (m is an integer of 2 or more) that are the elements of the word vector indicated by the situation A data. The number of combinations is _mC_n. The base situation partial combination generation unit 21 generates partial combinations A_l (1 ≤ l ≤ _mC_n, where l is an integer), each indicating an n-dimensional word vector whose elements are one generated combination of words, and writes them to the processing result storage unit 14 (step S105). The number of extracted words n is typically about 2 to 5, but other values may be used.
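To give a sense of scale for step S105, the number of partial combinations _mC_n can be computed directly. The following Python sketch uses the standard-library `math.comb`; the particular values of m and n are illustrative picks from the typical ranges mentioned in the description, not values prescribed by it.

```python
from math import comb

# Illustrative sizes from the typical ranges (m: 20 to 50, n: 2 to 5).
small = comb(20, 2)   # choose n = 2 of m = 20 words
large = comb(50, 5)   # choose n = 5 of m = 50 words

print(small, large)
```

With m = 20 and n = 2 there are 190 partial combinations, while m = 50 and n = 5 already gives 2,118,760, so the choice of n strongly affects how many combinations the later search steps must process.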

Next, the base result partial combination generation unit 22 reads the result B data from the case storage unit 11 and generates every combination obtained by selecting h words (h is an integer from 1 to g) from the g words that are the elements of the word vector indicated by the result B data. The number of combinations is _gC_h. The base result partial combination generation unit 22 generates partial combinations B_k (1 ≤ k ≤ _gC_h, where k is an integer), each indicating an h-dimensional word vector whose elements are one generated combination of words, and writes them to the processing result storage unit 14 (step S110). The number of extracted words h is typically about 2 to 5, but other values may be used.

The base co-occurrence article search unit 23 generates every combination of one of the partial combinations A_l written to the processing result storage unit 14 in step S105 and one of the partial combinations B_k written to the processing result storage unit 14 in step S110 (step S115). That is, the generated combinations are A_1-B_1, A_1-B_2, ..., A_1-B_(gCh), A_2-B_1, A_2-B_2, ..., A_(mCn-1)-B_(gCh), A_(mCn)-B_1, A_(mCn)-B_2, ..., A_(mCn)-B_(gCh). From the combinations generated in step S115, the base co-occurrence article search unit 23 selects a combination A_l-B_k that has not yet been processed in step S125 (step S120).
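The pairing in step S115 is a Cartesian product of the two lists of partial combinations. A minimal Python sketch, using hypothetical placeholder words and the standard-library `itertools.product`, reproduces the enumeration order A_1-B_1, A_1-B_2, ..., A_2-B_1, ...:

```python
from itertools import product

# Hypothetical partial combinations (n = 2, h = 1).
parts_a = [("a1", "a2"), ("a1", "a3"), ("a2", "a3")]
parts_b = [("b1",), ("b2",)]

# Every (A_l, B_k) pair; the A index varies slowest, as in the listing above.
pairs = list(product(parts_a, parts_b))

for part_a, part_b in pairs:
    print(part_a, part_b)
```

The number of pairs is _mC_n multiplied by _gC_h, which is the number of iterations of the loop formed by steps S120 to S130.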

The base co-occurrence article search unit 23 reads, from the processing result storage unit 14, the partial combination A_l and the partial combination B_k that make up the combination A_l-B_k selected in step S120. It searches the article data stored in the article storage device 5 and identifies the article data of articles that contain all of the n words indicated by the read partial combination A_l and the h words indicated by the read partial combination B_k. The base co-occurrence article search unit 23 reads the identified article data from the article storage device 5 and adds it to the article set D stored in the processing result storage unit 14 (step S125). However, article data that is already included in the article set D is not added again. The article data included in the article set D are denoted d_1, d_2, ....
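The search and de-duplication of step S125 can be sketched as follows. This is an illustration under stated assumptions: articles are modelled as sets of words keyed by a hypothetical article id, and `find_cooccurring` is an invented helper name; the specification does not say how the article storage device 5 indexes or matches articles.

```python
def find_cooccurring(articles, words):
    """Return ids of articles that contain every word in `words`."""
    needed = set(words)
    return [aid for aid, article_words in articles.items()
            if needed <= article_words]

# Hypothetical article store: id -> set of words appearing in the article.
articles = {
    "d1": {"rain", "cloud", "umbrella", "carry"},
    "d2": {"rain", "wind", "coat", "wear"},
    "d3": {"sun", "hat"},
}

# Hypothetical (A_l, B_k) pairs with n = 1 and h = 1.
pairs = [(("rain",), ("umbrella",)),
         (("cloud",), ("umbrella",)),
         (("rain",), ("coat",))]

article_set_d = []
for part_a, part_b in pairs:
    for aid in find_cooccurring(articles, part_a + part_b):
        if aid not in article_set_d:   # skip articles already in set D
            article_set_d.append(aid)

print(article_set_d)
```

The second pair matches the same article as the first, so that article is added only once, mirroring the rule that article data already in the article set D is not added again.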

The base co-occurrence article search unit 23 determines whether all the combinations generated in step S115 have been processed in step S125 (step S130). If it determines that some combination has not yet been processed in step S125 (step S130: NO), the base co-occurrence article search unit 23 repeats the processing from step S120. If it determines that all combinations have been processed (step S130: YES), the relation set generation unit 24 executes the process of step S140.

As in step S115, the relation set generation unit 24 generates every combination of one partial combination A_l and one partial combination B_k (step S135). From the combinations generated in step S135, the relation set generation unit 24 selects a combination A_l-B_k that has not yet been processed in step S145 (step S140).

From each of the articles indicated by the article data d_1, d_2, ... in the article set D, which the base co-occurrence article search unit 23 wrote to the processing result storage unit 14, the relation set generation unit 24 extracts a word r that relates the n words indicated by the partial combination A_l of the selected combination A_l-B_k to the h words indicated by the partial combination B_k (step S145). The relation set generation unit 24 adds the extracted word to the relation set R stored in the processing result storage unit 14. However, a word r that is already included in the relation set R is not added again. The words r included in the relation set R are denoted r_1, r_2, .... When the dimensions m and g are 20 to 50 and the numbers of extracted words n and h are 2 to 5, the relation set R contains about 20 to 50 words.

The relation set generation unit 24 determines whether all the combinations generated in step S135 have been processed in step S145 (step S150). If it determines that some combination has not yet been processed in step S145 (step S150: NO), the relation set generation unit 24 repeats the processing from step S140. If it determines that all combinations have been processed in step S145 (step S150: YES), the relation validity score calculation unit 25 executes the process of step S155.

For each word r_i (i is an integer of 1 or more) in the relation set R that the relation set generation unit 24 wrote to the processing result storage unit 14, the relation validity score calculation unit 25 calculates the relation validity score score(r_i) by the following equation (5) and writes it to the processing result storage unit 14 (step S155). Here, MI(A_l, r_i) is the mutual information between the partial combination A_l and the word r_i, MI(B_k, r_i) is the mutual information between the partial combination B_k and the word r_i, and MI(A_l B_k, r_i) is the mutual information between the co-occurrence of the partial combinations A_l and B_k and the word r_i. Mutual information is a measure of how strongly two random variables depend on each other.

$$\mathrm{score}(r_i) = \sum_{l} \sum_{k} \mathrm{MI}(A_l, r_i) \cdot \mathrm{MI}(B_k, r_i) \cdot \mathrm{MI}(A_l B_k, r_i) \tag{5}$$

As shown above, equation (5) multiplies the mutual information MI(A_l, r_i), MI(B_k, r_i), and MI(A_l B_k, r_i) and accumulates the product over all partial combinations A_l and all partial combinations B_k. Therefore, the larger these three quantities are, that is, the stronger the association between the partial combination A_l and the word r_i, between the partial combination B_k and the word r_i, and between the co-occurrence of A_l and B_k and the word r_i, the larger the relation validity score score(r_i) becomes.
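The double summation of equation (5) can be sketched in Python as follows. The mutual information function is passed in as a parameter `mi` because its definition is given separately by equations (6) to (8); the function name `relation_score` and the toy `mi` used in the example are hypothetical and serve only to show the shape of the accumulation.

```python
def relation_score(parts_a, parts_b, r, mi):
    """Accumulate MI(A_l, r) * MI(B_k, r) * MI(A_l B_k, r) over all l and k."""
    total = 0.0
    for part_a in parts_a:
        for part_b in parts_b:
            joint = part_a + part_b   # words of A_l and B_k taken together
            total += mi(part_a, r) * mi(part_b, r) * mi(joint, r)
    return total

# Toy stand-in for mutual information (NOT the formula of the specification):
# it simply returns the number of words in the combination.
def toy_mi(words, r):
    return float(len(words))

parts_a = [("a1",), ("a2",)]
parts_b = [("b1",)]
score_r = relation_score(parts_a, parts_b, "r1", toy_mi)
print(score_r)
```

Because the three factors are multiplied, a relation word scores highly only when it is associated with A_l, with B_k, and with their co-occurrence at the same time.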

The relation validity score calculation unit 25 calculates the mutual information MI(A_l, r_i), MI(B_k, r_i), and MI(A_l B_k, r_i) by equations (6) to (8), respectively.

Here, p(x) is the probability that x appears in the article data stored in the article storage device 5, and p(x, y) is the probability that x and y appear together in the article data stored in the article storage device 5. That is, p(A_l) is the probability of article data in which all the words indicated by the partial combination A_l appear, p(B_k) is the probability of article data in which all the words indicated by the partial combination B_k appear, and p(r_i) is the probability of article data in which the word r_i appears. Likewise, p(A_l B_k) is the probability of article data in which all the words indicated by A_l and all the words indicated by B_k appear, p(A_l, r_i) is the probability of article data in which all the words indicated by A_l and the word r_i appear, p(B_k, r_i) is the probability of article data in which all the words indicated by B_k and the word r_i appear, and p(A_l B_k, r_i) is the probability of article data in which all the words indicated by A_l, all the words indicated by B_k, and the word r_i appear.

The relation validity score calculation unit 25 calculates each probability as follows. First, it counts the total number of articles, which is the number of pieces of article data stored in the article storage device 5. Next, among the article data stored in the article storage device 5, it counts the number of pieces of article data in which all the words indicated by the partial combination A_l appear, the number in which all the words indicated by the partial combination B_k appear, and the number in which the word r_i appears. Dividing each of these counts by the total number of articles gives p(A_l), p(B_k), and p(r_i).

Further, among the article data stored in the article storage device 5, the relation validity score calculation unit 25 counts the number of pieces of article data in which all the words indicated by the partial combination A_l and all the words indicated by the partial combination B_k appear, the number in which all the words indicated by A_l and the word r_i appear, the number in which all the words indicated by B_k and the word r_i appear, and the number in which all the words indicated by A_l, all the words indicated by B_k, and the word r_i appear. Dividing each of these counts by the total number of articles gives p(A_l B_k), p(A_l, r_i), p(B_k, r_i), and p(A_l B_k, r_i).

The probabilities p(A_l), p(B_k), p(r_i), p(A_l B_k), p(A_l, r_i), p(B_k, r_i), and p(A_l B_k, r_i) used in equations (6) to (8) are calculated here from the article data stored in the article storage device 5, but they may instead be the probabilities of appearing in the article data included in the article set D. In that case, the relation validity score calculation unit 25 uses the article data included in the article set D in place of the article data stored in the article storage device 5, and calculates p(A_l), p(B_k), p(r_i), p(A_l B_k), p(A_l, r_i), p(B_k, r_i), and p(A_l B_k, r_i) in the same way as above.
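The count-and-divide procedure for the probabilities can be sketched in Python as follows. Two points are assumptions and not taken from the text: articles are modelled as sets of words, and the function `assumed_mi` uses the common pointwise form log(p(x, y) / (p(x) p(y))), because equations (6) to (8) themselves are not reproduced in this text. The corpus passed in may be either the full article store or the article set D, matching the two options described above.

```python
import math

def probability(articles, words):
    """Fraction of articles in which every word in `words` appears."""
    needed = set(words)
    hits = sum(1 for article_words in articles if needed <= article_words)
    return hits / len(articles)

def assumed_mi(articles, left, right):
    """ASSUMPTION: pointwise form log(p(x, y) / (p(x) * p(y)))."""
    p_joint = probability(articles, tuple(left) + tuple(right))
    if p_joint == 0.0:
        return 0.0  # assumption: pairs that never co-occur contribute nothing
    p_left = probability(articles, left)
    p_right = probability(articles, right)
    return math.log(p_joint / (p_left * p_right))

# Hypothetical corpus of four articles.
corpus = [
    {"rain", "cloud", "umbrella", "carry"},
    {"rain", "umbrella", "carry"},
    {"rain", "coat"},
    {"sun", "hat"},
]

p_a = probability(corpus, ("rain",))             # p(A_l)
p_r = probability(corpus, ("carry",))            # p(r_i)
p_a_r = probability(corpus, ("rain", "carry"))   # p(A_l, r_i)
mi_a_r = assumed_mi(corpus, ("rain",), ("carry",))
print(p_a, p_r, p_a_r, mi_a_r)
```

A function such as `assumed_mi` could then be bound to a corpus and supplied wherever a mutual information value is required, but the actual definition should be taken from equations (6) to (8) of the specification.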

In FIG. 5, the target situation partial combination generation unit 26 generates every combination obtained by selecting n words from the m words that are the elements of the word vector indicated by the situation C data input in step S100 of FIG. 4. The number of combinations is _mC_n. The target situation partial combination generation unit 26 generates partial combinations C_f (1 ≤ f ≤ _mC_n, where f is an integer), each indicating an n-dimensional word vector whose elements are one generated combination of words, and writes them to the processing result storage unit 14 (step S200).

The target co-occurrence article search unit 27 generates every combination of one of the partial combinations C_f written to the processing result storage unit 14 and one of the words r_i included in the relation set R (step S205). That is, the generated combinations are C_1-r_1, C_1-r_2, ..., C_2-r_1, C_2-r_2, ..., C_(mCn)-r_1, C_(mCn)-r_2, .... The target co-occurrence article search unit 27 may restrict the words used to generate combinations to those words r_i whose relation validity score score(r_i) is at or above a threshold, or to a predetermined number of words r_i taken in descending order of score(r_i). From all the combinations generated in step S205, the target co-occurrence article search unit 27 selects a combination C_f-r_i that has not yet been processed in step S215 (step S210).
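The optional restriction of relation words in step S205 (by threshold or by a fixed number of top-scoring words) can be sketched as follows. The function name `select_relation_words`, its parameters, and the sample scores are hypothetical; the specification does not fix a threshold value or a count.

```python
def select_relation_words(scores, threshold=None, top_n=None):
    """Keep relation words by minimum score and/or by rank (highest first)."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(word, s) for word, s in ranked if s >= threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [word for word, _ in ranked]

# Hypothetical relation validity scores score(r_i).
scores = {"carry": 2.4, "buy": 0.9, "forget": 0.1}

by_threshold = select_relation_words(scores, threshold=0.5)
by_rank = select_relation_words(scores, top_n=1)
print(by_threshold, by_rank)
```

Pruning low-scoring relation words reduces the number of C_f-r_i combinations that steps S210 to S220 must search, at the cost of possibly discarding a weak but useful relation.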

The target co-occurrence article search unit 27 reads, from the processing result storage unit 14, the partial combination C_f and the word r_i that make up the combination C_f-r_i selected in step S210. It searches the article data stored in the article storage device 5 and identifies the article data of articles that contain all of the n words indicated by the read partial combination C_f and the word r_i. The target co-occurrence article search unit 27 reads the identified article data from the article storage device 5 and writes it to the article set E stored in the processing result storage unit 14 (step S215). However, article data that is already included in the article set E is not added again. The article data included in the article set E are denoted e_1, e_2, ....

The target co-occurrence article search unit 27 determines whether all the combinations generated in step S205 have been processed in step S215 (step S220). If it determines that some combination has not yet been processed in step S215 (step S220: NO), the target co-occurrence article search unit 27 repeats the processing from step S210. If it determines that all combinations have been processed (step S220: YES), the analogy result generation unit 28 executes the process of step S225.

As in step S205, the analogy result generation unit 28 generates every combination of one partial combination C_f and one word r_i included in the relation set R (step S225). From the combinations generated in step S225, the analogy result generation unit 28 selects a combination C_f-r_i that has not yet been processed in step S235 (step S230).

From each of the articles indicated by the article data e_1, e_2, ... in the article set E, which the target co-occurrence article search unit 27 wrote to the processing result storage unit 14, the analogy result generation unit 28 extracts a word x that is related, by the word r_i of the selected combination, to the n words indicated by the partial combination C_f (step S235). Here, x is chosen so that the structure C_f-r_i-x in the target is the same as the structure A_l-r_i-B_k in the base from which the word r_i was extracted. The analogy result generation unit 28 adds the extracted word x to the analogy result set X stored in the processing result storage unit 14. However, a word x that is already included in the analogy result set X is not added again. The words x included in the analogy result set X are denoted x_1, x_2, ....

The analogy result generation unit 28 determines whether all the combinations generated in step S225 have been processed in step S235 (step S240). If it determines that some combination has not yet been processed (step S240: NO), the analogy result generation unit 28 repeats the processing from step S230. If it determines that all combinations have been processed in step S235 (step S240: YES), the analogy result validity score calculation unit 29 executes the process of step S245.

For each word x_j (j = 1, 2, ...) indicated by the analogy result set X stored in the processing result storage unit 14, the analogy result validity score calculation unit 29 calculates the analogy result validity score score(x_j) by the following equation (9) and writes it to the processing result storage unit 14 (step S245). Here, MI(C_f, x_j) is the mutual information between the partial combination C_f and the word x_j, MI(r_i, x_j) is the mutual information between the word r_i and the word x_j, and MI(C_f r_i, x_j) is the mutual information between the co-occurrence of the partial combination C_f and the word r_i and the word x_j. The analogy result validity score calculation unit 29 also reads the relation validity score score(r_i) from the processing result storage unit 14.

$$\mathrm{score}(x_j) = \sum_{f} \sum_{i} \mathrm{MI}(C_f, x_j) \cdot \mathrm{MI}(r_i, x_j) \cdot \mathrm{MI}(C_f r_i, x_j) \cdot \mathrm{score}(r_i) \tag{9}$$

As shown above, equation (9) multiplies the mutual information MI(C_f, x_j), MI(r_i, x_j), MI(C_f r_i, x_j), and the relation validity score score(r_i), and accumulates the product over all partial combinations C_f and all words r_i. Therefore, the larger the three mutual information values are, that is, the stronger the association between the partial combination C_f and the word x_j, between the word r_i and the word x_j, and between the co-occurrence of C_f and r_i and the word x_j, and the larger the relation validity score score(r_i), the larger the analogy result validity score score(x_j) becomes.
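The accumulation described for equation (9) can be sketched in Python as follows. The sketch follows the prose description, in which each term is the product of three mutual information values and the relation validity score score(r_i). As before, `mi` is a caller-supplied function, and the names `analogy_score` and `toy_mi` as well as the sample values are hypothetical.

```python
def analogy_score(parts_c, relation_scores, x, mi):
    """Sum MI(C_f, x) * MI(r_i, x) * MI(C_f r_i, x) * score(r_i) over f and i."""
    total = 0.0
    for part_c in parts_c:
        for r, r_score in relation_scores.items():
            joint = part_c + (r,)   # words of C_f together with r_i
            total += mi(part_c, x) * mi((r,), x) * mi(joint, x) * r_score
    return total

# Toy stand-in for mutual information (NOT the formula of the specification):
# it simply returns the number of words in the combination.
def toy_mi(words, x):
    return float(len(words))

parts_c = [("c1", "c2")]
relation_scores = {"r1": 2.0, "r2": 0.5}   # hypothetical score(r_i) values
score_x = analogy_score(parts_c, relation_scores, "x1", toy_mi)
print(score_x)
```

Weighting each term by score(r_i) means that candidate words reached through a well-supported relation contribute more to the final ranking than those reached through a weak one.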

The analogy result validity score calculation unit 29 calculates the mutual information MI(C_f, x_j), MI(r_i, x_j), and MI(C_f r_i, x_j) by equations (10) to (12), respectively.

但し、ｐ（Ｃ_ｆ）は、部分組合せＣ_ｆで示される全ての単語が出現する記事データの確率、ｐ（ｘ_ｊ）は、単語ｘ_ｊが出現する記事データの確率である。また、ｐ（Ｃ_ｆ，ｘ_ｊ）は、部分組合せＣ_ｆが示す全ての単語と単語ｘ_ｊとが出現する記事データの確率、ｐ（ｒ_ｉ，ｘ_ｊ）は、単語ｒ_ｉと単語ｘ_ｊが出現する記事データの確率、ｐ（Ｃ_ｆ，ｒ_ｉ）は、部分組合せＣ_ｆが示す全ての単語と単語ｒ_ｉとが出現する記事データの確率、ｐ（Ｃ_ｆｒ_ｉ，ｘ_ｊ）は、部分組合せＣ_ｆが示す全ての単語及び単語ｒ_ｉと単語ｘ_ｊとが出現する記事データの確率である。 However, p (C _f ) is the probability of article data in which all the words indicated by the partial combination C _f appear, and p (x _j ) is the probability of article data in which the word x _j appears. In addition, p (C _f , x _j ) is the probability of article data in which all the words indicated by the partial combination C _f and the word x _j appear, and p (r _i , x _j ) is the word r _i and the word x The probability of article data in which _j appears, p (C _f , r _i ) is the probability of article data in which all the words indicated by the partial combination C _f and the word r _i appear, p (C _f r _i , x _j ) Is the probability of article data in which all the words indicated by the partial combination C _f and the word r _i and the word x _j appear.
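
Equations (10) to (12) themselves are not reproduced in this text. As a sketch only, assuming the usual pointwise mutual information form (the logarithm and its base are assumptions; the form is chosen because it uses exactly the probabilities defined above), they could read:

```latex
% Assumed form; not copied from the original equation images.
\begin{align}
MI(C_f, x_j)     &= \log \frac{p(C_f, x_j)}{p(C_f)\, p(x_j)}          \tag{10} \\
MI(r_i, x_j)     &= \log \frac{p(r_i, x_j)}{p(r_i)\, p(x_j)}          \tag{11} \\
MI(C_f r_i, x_j) &= \log \frac{p(C_f r_i, x_j)}{p(C_f, r_i)\, p(x_j)} \tag{12}
\end{align}
```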

類推結果妥当性スコア算出部２９は、各確率を以下のように算出する。類推結果妥当性スコア算出部２９は、記事記憶装置５に記憶されている記事データの数である合計記事数をカウントする。続いて、類推結果妥当性スコア算出部２９は、記事記憶装置５に記憶されている記事データのうち、部分組合せＣ_ｆが示す全ての単語が出現する記事データの数、単語ｘ_ｊが出現する記事データの数、単語ｒ_ｉが出現する記事データの数をカウントし、これらのカウント数をそれぞれ合計記事数で除算することによりｐ（Ｃ_ｆ）、ｐ（ｘ_ｊ）、ｐ（ｒ_ｉ）を算出する。 The analogy result validity score calculation unit 29 calculates each probability as follows. It first counts the total number of articles, that is, the number of pieces of article data stored in the article storage device 5. It then counts, among the article data stored in the article storage device 5, the number of pieces of article data in which all the words indicated by the partial combination C_f appear, the number in which the word x_j appears, and the number in which the word r_i appears, and divides each of these counts by the total number of articles to obtain p(C_f), p(x_j), and p(r_i).

さらに、類推結果妥当性スコア算出部２９は、記事記憶部５に記憶されている記事データのうち部分組合せＣ_ｆが示す全ての単語と単語ｘ_ｊとが出現する記事データの数、単語ｒ_ｉと単語ｘ_ｊとが出現する記事データの数、部分組合せＣ_ｆが示す全ての単語と単語ｒ_ｉとが出現する記事データの数、部分組合せＣ_ｆが示す全ての単語及び単語ｒ_ｉと単語ｘ_ｊとが出現する記事データの数をカウントし、これらのカウント数をそれぞれ合計記事数で除算することにより、ｐ（Ｃ_ｆ，ｘ_ｊ）、ｐ（ｒ_ｉ，ｘ_ｊ）、ｐ（Ｃ_ｆ，ｒ_ｉ）、ｐ（Ｃ_ｆｒ_ｉ，ｘ_ｊ）を算出する。 Further, the analogy result validity score calculation unit 29 counts, among the article data stored in the article storage device 5, the number of pieces of article data in which all the words indicated by the partial combination C_f and the word x_j appear, the number in which the word r_i and the word x_j appear, the number in which all the words indicated by C_f and the word r_i appear, and the number in which all the words indicated by C_f, the word r_i, and the word x_j appear. It divides each of these counts by the total number of articles to obtain p(C_f, x_j), p(r_i, x_j), p(C_f, r_i), and p(C_f r_i, x_j).
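
The counting procedure above amounts to document-frequency estimates. The following sketch assumes each article has already been reduced to a set of words (tokenization is outside its scope) and assumes a pointwise mutual information form for the MI terms; the function names and the zero-probability handling are choices made here, not details from the source.

```python
import math


def doc_probability(articles, words):
    """Fraction of articles in which every word in `words` appears."""
    if not articles:
        return 0.0
    needed = set(words)
    hits = sum(1 for article in articles if needed <= article)
    return hits / len(articles)


def mutual_information(articles, left, right):
    """Pointwise MI between the co-occurrence of `left` and of `right` (assumed form)."""
    p_left = doc_probability(articles, left)
    p_right = doc_probability(articles, right)
    p_joint = doc_probability(articles, list(left) + list(right))
    if p_joint == 0.0:
        # Assumption: items that never co-occur contribute nothing to the score.
        return 0.0
    return math.log(p_joint / (p_left * p_right))


# Toy corpus; each article is a set of words.
articles = [
    {"lion", "wolf", "move", "foot"},
    {"lion", "wolf", "move", "foot"},
    {"shark", "swim"},
    {"tuna", "swim"},
]
p_c = doc_probability(articles, ["lion", "wolf"])             # p(C_f)
mi_cx = mutual_information(articles, ["lion", "wolf"], ["foot"])  # MI(C_f, x_j)
```

Passing the articles of the article set E instead of the whole store gives the variant described in the next paragraph.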

なお、式（１０）〜式（１２）に用いられるｐ（Ｃ_ｆ）、ｐ（ｘ_ｊ）、ｐ（ｒ_ｉ）、ｐ（Ｃ_ｆ，ｘ_ｊ）、ｐ（ｒ_ｉ，ｘ_ｊ）、ｐ（Ｃ_ｆ，ｒ_ｉ）、ｐ（Ｃ_ｆｒ_ｉ，ｘ_ｊ）を、記事記憶装置５に記憶される記事データに基づいて算出した確率としているが、記事集合Ｅに含まれる記事データに出現する確率としてもよい。この場合、類推結果妥当性スコア算出部２９は、記事記憶部５に記憶されている記事データに代えて、記事集合Ｅに含まれる記事データを用い、上記と同様にｐ（Ｃ_ｆ）、ｐ（ｘ_ｊ）、ｐ（ｒ_ｉ）、ｐ（Ｃ_ｆ，ｘ_ｊ）、ｐ（ｒ_ｉ，ｘ_ｊ）、ｐ（Ｃ_ｆ，ｒ_ｉ）、ｐ（Ｃ_ｆｒ_ｉ，ｘ_ｊ）を算出する。 Note that p(C_f), p(x_j), p(r_i), p(C_f, x_j), p(r_i, x_j), p(C_f, r_i), and p(C_f r_i, x_j) used in equations (10) to (12) are probabilities calculated from the article data stored in the article storage device 5, but they may instead be probabilities of appearance in the article data included in the article set E. In this case, the analogy result validity score calculation unit 29 uses the article data included in the article set E in place of the article data stored in the article storage device 5, and calculates p(C_f), p(x_j), p(r_i), p(C_f, x_j), p(r_i, x_j), p(C_f, r_i), and p(C_f r_i, x_j) in the same way as above.

出力部１６は、類推結果生成部２８が処理結果記憶部１４に書き込んだ類推結果集合Ｘが示す各単語ｘ_ｊと、類推結果妥当性スコア算出部２９が書き込んだ当該単語ｘ_ｊの類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）とからなる類推結果データをディスプレイに表示させるなどして出力する（ステップＳ２４５）。このとき、出力部１６は、類推結果妥当性スコアが閾値以上の単語ｘ_ｊとその類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）のみを出力するようにしてもよく、類推結果妥当性スコアが高い順に所定数の単語ｘ_ｊとその類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）のみを出力してもよい。 The output unit 16 outputs, for example by showing it on a display, analogy result data consisting of each word x_j indicated by the analogy result set X that the analogy result generation unit 28 wrote to the processing result storage unit 14 and the analogy result validity score score(x_j) of that word written by the analogy result validity score calculation unit 29 (step S245). At this time, the output unit 16 may output only the words x_j whose analogy result validity score is equal to or greater than a threshold, together with their scores score(x_j), or may output only a predetermined number of words x_j, in descending order of analogy result validity score, together with their scores.
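
The two output options (threshold or top-ranked words) can be sketched as below. The function name, parameter names, and sample scores are hypothetical and serve only to illustrate the selection logic.

```python
def select_results(scores, threshold=None, top_k=None):
    """Rank words by score, then optionally keep those >= threshold and/or the first top_k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(word, s) for word, s in ranked if s >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


# Illustrative scores only.
scores = {"foot": 4.5, "tail": 1.0, "fang": 2.0}
above_threshold = select_results(scores, threshold=2.0)
best_one = select_results(scores, top_k=1)
```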

続いて、図４に示すステップＳ１４５における単語ｒの抽出処理、図５に示すステップＳ２３５における単語ｘの抽出処理の詳細な処理手順を説明する。以下では、２つの抽出処理について示しているが、事例や類推の対象に応じていずれを用いてもよい。ここでは、簡単のため、状況Ａ、結果Ｂ、状況Ｃとも３次元のワードベクトルであり、抽出単語数ｎ、ｈが２である場合を例にして説明する。 Next, a detailed processing procedure of the word r extraction process in step S145 shown in FIG. 4 and the word x extraction process in step S235 shown in FIG. 5 will be described. In the following, two extraction processes are shown, but any of them may be used according to the case or the target of analogy. Here, for the sake of simplicity, a case will be described in which the situation A, the result B, and the situation C are three-dimensional word vectors, and the number of extracted words n and h is 2.

この場合、図４のステップＳ１０５において、状況Ａのワードベクトル（ｔ_ａ１，ｔ_ａ２，ｔ_ａ３）から部分組合せＡ_１（ｔ_ａ１，ｔ_ａ２）、Ａ_２（ｔ_ａ１，ｔ_ａ３）、Ａ_３（ｔ_ａ２，ｔ_ａ３）が生成され、ステップＳ１１０において、結果Ｂのワードベクトルから部分組合せＢ_１（ｗ_ｂ１，ｗ_ｂ２）、Ｂ_２（ｗ_ｂ１，ｗ_ｂ３）、Ｂ_３（ｗ_ｂ２，ｗ_ｂ３）が生成される。従って、ステップＳ１３５においては、組合せＡ_１−Ｂ_１，Ａ_１−Ｂ_２，Ａ_１−Ｂ_３、Ａ_２−Ｂ_１，Ａ_２−Ｂ_２，Ａ_２−Ｂ_３、Ａ_３−Ｂ_１，Ａ_３−Ｂ_２，Ａ_３−Ｂ_３が生成される。 In this case, in step S105 of FIG. 4, partial combinations A ₁ (t _a1 , t _a2 ), A ₂ (t _a1 , t _a3 ), A ₃ from the word vector (t _a1 , t _a2 , t _a3 ) of the situation A (T _a2 , t _a3 ) is generated, and in step S110, partial combinations B ₁ (w _b1 , w _b2 ), B ₂ (w _b1 , w _b3 ), B ₃ (w _b2 , w) are generated from the word vector of the result B. _b3 ) is generated. Therefore, in step S135, the combinations A ₁ -B ₁ , A ₁ -B ₂ , A ₁ -B ₃ , A ₂ -B ₁ , A ₂ -B ₂ , A ₂ -B ₃ , A ₃ -B ₁ , A ₃ -B ₂ and A ₃ -B ₃ are generated.
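
The enumeration in this example can be reproduced with a short sketch. It assumes that a partial combination is an unordered selection of n (or h) words from a word vector, which matches the three combinations listed above; the identifiers are placeholders, not names from the source.

```python
from itertools import combinations, product


def partial_combinations(word_vector, size):
    """All ways of choosing `size` words from the word vector, in listing order."""
    return list(combinations(word_vector, size))


situation_a = ("t_a1", "t_a2", "t_a3")   # 3-dimensional word vector of situation A
result_b = ("w_b1", "w_b2", "w_b3")      # 3-dimensional word vector of result B

a_parts = partial_combinations(situation_a, 2)   # A_1, A_2, A_3
b_parts = partial_combinations(result_b, 2)      # B_1, B_2, B_3
pairs = list(product(a_parts, b_parts))          # A_1-B_1, A_1-B_2, ..., A_3-B_3
```

With m = 3 and n = 2 there are 3 partial combinations on each side and therefore 9 pairings; the count grows combinatorially with the vector dimension.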

（抽出処理１）：記事データは、自然言語で記述された文書のテキストデータである。ステップＳ１４５において、関係集合生成部２４は、記事集合Ｄに含まれる記事データｄ_１、ｄ_２、…が示す文書の形態素解析を行う。関係集合生成部２４は、形態素解析の結果を参照し、部分組合せＡ_ｌが示すｎ個の単語が主語に含まれ、かつ、部分組合せＢ_ｋが示すｈ個の単語が述部にある名詞として含まれている文から、部分組合せＡ_ｌが示すｎ個の単語と部分組合せＢ_ｋが示すｈ個の単語を関係づける述部の動詞を単語ｒとして抽出する。 (Extraction process 1): The article data is text data of documents written in natural language. In step S145, the relation set generation unit 24 performs morphological analysis of the documents indicated by the article data d_1, d_2, ... included in the article set D. Referring to the result of the morphological analysis, the relation set generation unit 24 looks for sentences in which the n words indicated by the partial combination A_l are included in the subject and the h words indicated by the partial combination B_k are included as nouns in the predicate, and extracts from each such sentence, as the word r, the verb of the predicate that relates the n words of A_l to the h words of B_k.
また、ステップＳ２３５において、類推結果生成部２８は、記事集合Ｅに含まれる記事データｅ_１、ｅ_２、…が示す文書の形態素解析を行う。類推結果生成部２８は、形態素解析の結果を参照し、部分組合せＣ_ｆが示すｎ個の単語が主語に含まれ、かつ、単語ｒ_ｉが述部の動詞として含まれている文から、述部にある名詞を単語ｘとして抽出する。 In step S235, the analogy result generation unit 28 performs morphological analysis of the documents indicated by the article data e_1, e_2, ... included in the article set E. Referring to the result of the morphological analysis, the analogy result generation unit 28 extracts, as the word x, the nouns in the predicate of each sentence in which the n words indicated by the partial combination C_f are included in the subject and the word r_i is included as the verb of the predicate.

図６は、ステップＳ１４０における単語ｒの抽出処理を説明するための図である。同図においては、ステップＳ１３５において部分組合せＡ_１（サメ，マグロ）及び部分組合せＢ_１（ひれ，尾）の組合せが選択されている例を示している。関係集合生成部２４は、記事データｄ_１が示す文「サメやマグロのような魚類は，ひれや尾を使って高速に泳ぐことができる。」の主語は部分組合せＡ_１が示す単語「サメ」及び「マグロ」であり、述部にある名詞は部分組合せＢ_１が示す単語「ひれ」及び「尾」であるため、当該文の述部の動詞「泳ぐ」を単語r_１として抽出する。また、関係集合生成部２４は、記事データｄ_２が示す文「サメやマグロのような魚類は，ひれや尾を使って広範囲を移動することができる。」の主語は部分組合せＡ_１が示す単語「サメ」及び「マグロ」であり、述部にある名詞は部分組合せＢ_１が示す単語「ひれ」及び「尾」であるため、当該文の述部の動詞「移動する」を単語r_２として抽出する。 FIG. 6 is a diagram for explaining the extraction process of the word r in step S140. The figure shows an example in which the combination of the partial combination A_1 (shark, tuna) and the partial combination B_1 (fin, tail) is selected in step S135. In the sentence "Fish such as sharks and tuna can swim at high speed using their fins and tails." indicated by the article data d_1, the subject consists of the words "shark" and "tuna" indicated by the partial combination A_1, and the nouns in the predicate are the words "fin" and "tail" indicated by the partial combination B_1, so the relation set generation unit 24 extracts the verb "swim" in the predicate of the sentence as the word r_1. Likewise, in the sentence "Fish such as sharks and tuna can move over a wide range using their fins and tails." indicated by the article data d_2, the subject consists of the words "shark" and "tuna" and the nouns in the predicate are the words "fin" and "tail", so the relation set generation unit 24 extracts the verb "move" in the predicate of the sentence as the word r_2.

このように、２以上の単語からなる部分組合せを利用することによって、１つの単語を用いる場合よりも、目的とする関係を高精度に抽出することができる。部分組合せに含まれる単語数が多いほどより正確な関係を示す単語を抽出することが可能となるが、関係の抽出対象となる文は減少する。 In this way, by using a partial combination composed of two or more words, it is possible to extract a target relationship with higher accuracy than in the case of using one word. As the number of words included in the partial combination increases, it becomes possible to extract words indicating a more accurate relationship, but the number of sentences from which relationships are extracted decreases.

図７は、ステップＳ２３５における単語ｘの抽出処理を説明するための図である。同図は、ステップＳ２３０において組合せの一方として部分組合せＣ_１（ライオン，オオカミ）が選択された場合について示している。類推結果生成部２８は、記事集合Ｅに含まれるいずれの記事データからも、部分組合せＣ_１が示す単語「ライオン」及び「オオカミ」が主語に含まれ、かつ、単語ｒ_１「泳ぐ」が述部の動詞である文を含む文は検出されなかったとする。一方、類推結果生成部２８は、記事集合Ｅに含まれる記事データｅ_１が示す文「ライオンやオオカミなどは足を使って広範囲を移動する必要があり…」の主語は部分組合せＣ_１が示す単語「ライオン」及び「オオカミ」であり、述部の動詞は単語ｒ_２「移動する」であるため、当該文の述部にある名詞「足」を単語ｘ_１として抽出する。 FIG. 7 is a diagram for explaining the extraction process of the word x in step S235. The figure shows the case where the partial combination C_1 (lion, wolf) is selected in step S230 as one member of the combination. Suppose that the analogy result generation unit 28 finds, in the article data included in the article set E, no sentence in which the words "lion" and "wolf" indicated by the partial combination C_1 are included in the subject and the word r_1 "swim" is the verb of the predicate. On the other hand, in the sentence "Lions, wolves, and the like need to move over a wide range using their feet ..." indicated by the article data e_1 included in the article set E, the subject consists of the words "lion" and "wolf" indicated by the partial combination C_1 and the verb of the predicate is the word r_2 "move", so the analogy result generation unit 28 extracts the noun "foot" in the predicate of the sentence as the word x_1.
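
Extraction process 1 can be sketched on the examples of FIG. 6 and FIG. 7. The sketch assumes that morphological and syntactic analysis has already reduced each sentence to its subject nouns, predicate nouns, and predicate verb; the `ParsedSentence` type and both function names are hypothetical, and the English word forms stand in for the Japanese tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedSentence:
    """Hypothetical output of a morphological analyzer for one sentence."""
    subject_nouns: frozenset
    predicate_nouns: frozenset
    verb: str


def extract_relation_words(sentences, a_part, b_part):
    """Verbs of sentences whose subject holds all of A_l and whose predicate holds all of B_k."""
    return [s.verb for s in sentences
            if set(a_part) <= s.subject_nouns and set(b_part) <= s.predicate_nouns]


def extract_result_words(sentences, c_part, relation_word):
    """Predicate nouns of sentences whose subject holds all of C_f and whose verb is r_i."""
    found = []
    for s in sentences:
        if set(c_part) <= s.subject_nouns and s.verb == relation_word:
            found.extend(sorted(s.predicate_nouns))
    return found


article_set_d = [
    ParsedSentence(frozenset({"shark", "tuna"}), frozenset({"fin", "tail"}), "swim"),
    ParsedSentence(frozenset({"shark", "tuna"}), frozenset({"fin", "tail"}), "move"),
]
article_set_e = [
    ParsedSentence(frozenset({"lion", "wolf"}), frozenset({"foot"}), "move"),
]

relations = extract_relation_words(article_set_d, ("shark", "tuna"), ("fin", "tail"))
results = [x for r in relations
           for x in extract_result_words(article_set_e, ("lion", "wolf"), r)]
```

As in the figures, "swim" yields no match for lions and wolves, while "move" maps to "foot".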

（抽出処理２）：記事データは、自然言語で記述された文書のテキストデータであり、記事記憶装置５は、分野別の辞書データ（コーパス）をさらに記憶している。また、状況Ａ、結果Ｂ、関係集合Ｒは、特定の分野の用語とする。さらに、図４のステップＳ１００において、類推装置１の入力部１２は、さらに、分野を示す情報の入力を受ける。 (Extraction process 2): The article data is text data of a document described in a natural language, and the article storage device 5 further stores field-specific dictionary data (corpus). The situation A, the result B, and the relation set R are terms in a specific field. Furthermore, in step S100 of FIG. 4, the input unit 12 of the analogy device 1 further receives input of information indicating a field.

図４のステップＳ１４５において、関係集合生成部２４は、入力された分野の情報に対応した辞書データを特定し、部分組合せＡ_ｌが示すｎ個の単語と、部分組合せＢ_ｋが示すｈ個の単語が含まれる文から、特定された辞書データに登録されている単語を単語ｒの候補ｒ’として抽出する。抽出された候補ｒ’をそれぞれ、候補ｒ_１’、ｒ_２’…とする。関係集合生成部２４は、抽出した候補ｒ_ｙ’（ｙ＝１、２、…）と、部分組合せＡ_ｌ、部分組合せＢ_ｋそれぞれとの関連の強さに基づいて関係を表す単語としての妥当性を示す値を以下の式（１３）により算出する。但し、ＭＩ（Ａ_ｌ,ｒ_ｙ’）は、部分組合せＡ_ｌと候補ｒ_ｙ’との相互情報量、ＭＩ（Ｂ_ｋ,ｒ_ｙ’）は、部分組合せＢ_ｋと候補ｒ_ｙ’との相互情報量である。なお、相互情報量ＭＩ（Ａ_ｌ,ｒ_ｙ’）、相互情報量ＭＩ（Ｂ_ｋ,ｒ_ｙ’）は、単語ｒ_ｉの代わりに候補ｒ_ｙ’を用いることにより、式（６）、式（７）と同様に算出される。 In step S145 of FIG. 4, the relation set generation unit 24 identifies the dictionary data corresponding to the input field information and, from sentences that contain the n words indicated by the partial combination A_l and the h words indicated by the partial combination B_k, extracts the words registered in the identified dictionary data as candidates r' for the word r. The extracted candidates r' are denoted r_1', r_2', .... For each extracted candidate r_y' (y = 1, 2, ...), the relation set generation unit 24 calculates, by the following equation (13), a value indicating its validity as a word expressing the relation, based on the strength of its association with the partial combination A_l and with the partial combination B_k. Here, MI(A_l, r_y') is the mutual information between the partial combination A_l and the candidate r_y', and MI(B_k, r_y') is the mutual information between the partial combination B_k and the candidate r_y'. The mutual information MI(A_l, r_y') and MI(B_k, r_y') are calculated in the same way as equations (6) and (7), using the candidate r_y' in place of the word r_i.

スコア（ｒ_ｙ’）＝ＭＩ（Ａ_ｌ，ｒ_ｙ’）＋ＭＩ（Ｂ_ｋ，ｒ_ｙ’） …（１３） score(r_y') = MI(A_l, r_y') + MI(B_k, r_y') ... (13)

関係集合生成部２４は、算出した値が、関連が強いと判断する所定の条件以上である候補ｒ’を、単語ｒとして選択する。 The relation set generation unit 24 selects, as the word r, each candidate r' whose calculated value is equal to or greater than a predetermined level at which the association is judged to be strong.
また、ステップＳ２３０において、類推結果生成部２８は、部分組合せＣ_ｆが示すｎ個の単語と、単語ｒ_ｉが含まれる文から、入力された分野の情報に対応した辞書データに登録されている単語を単語ｘとして抽出する。 In step S230, the analogy result generation unit 28 extracts, as the word x, the words registered in the dictionary data corresponding to the input field information from sentences that contain the n words indicated by the partial combination C_f and the word r_i.

図８は、単語ｒ及び単語ｘの抽出処理を説明するための図である。同図においては、分野の情報がコンピュータであり、ステップＳ１４０において部分組合せＡ_１（画像，写真）及び部分組合せＢ_１（ＧＩＦ，ＪＰＥＧ）の組合せが選択されている場合の例について示している。記事データｄ_１が示す文「画像や写真の圧縮には、ＧＩＦ、ＪＰＥＧなどのファイル形式が使えます。」には、部分組合せＡ_１が示す単語「画像」及び「写真」と、部分組合せＢ_１が示す単語「ＧＩＦ」及び単語「ＪＰＥＧ」が含まれている。関係集合生成部２４は、分野の情報からコンピュータ辞書データを特定し、この文に含まれる単語のうち、コンピュータ辞書データに登録されている「圧縮」、「ファイル形式」を候補ｒ_１’、ｒ_２’として抽出する。関係集合生成部２４は、これらの抽出した候補ｒ_１’、ｒ_２’について、上記の式（１３）によりスコアを算出した結果、候補ｒ_１’「圧縮」は単語ｒとして選択せず、候補ｒ_２’「ファイル形式」を単語ｒ_１として選択する。 FIG. 8 is a diagram for explaining the extraction process of the word r and the word x. The figure shows an example in which the field information is "computer" and the combination of the partial combination A_1 (image, photo) and the partial combination B_1 (GIF, JPEG) is selected in step S140. The sentence "File formats such as GIF and JPEG can be used to compress images and photos." indicated by the article data d_1 contains the words "image" and "photo" indicated by the partial combination A_1 and the words "GIF" and "JPEG" indicated by the partial combination B_1. The relation set generation unit 24 identifies the computer dictionary data from the field information and, among the words in this sentence, extracts "compression" and "file format", which are registered in the computer dictionary data, as the candidates r_1' and r_2'. Having calculated a score for each of the candidates r_1' and r_2' by equation (13) above, the relation set generation unit 24 does not select the candidate r_1' "compression" as a word r, and selects the candidate r_2' "file format" as the word r_1.

類推結果生成部２８は、ステップＳ２３０において部分組合せＣ_１（音楽，会話）と単語ｒ_１「ファイル形式」の組合せが選択されている場合、部分組合せＣ_１が示す単語「音楽」及び「会話」、ならびに、単語ｒ_１「ファイル形式」が含まれる文から、記事記憶装置５が記憶しているコンピュータ辞書データに登録されている「ＭＰ３」を単語ｘ_１として抽出する。 When the combination of the partial combination C_1 (music, conversation) and the word r_1 "file format" is selected in step S230, the analogy result generation unit 28 extracts "MP3", which is registered in the computer dictionary data stored in the article storage device 5, as the word x_1 from a sentence that contains the words "music" and "conversation" indicated by the partial combination C_1 and the word r_1 "file format".
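
Extraction process 2 can be sketched on the FIG. 8 example. Several things here are assumptions rather than details from the source: each sentence is represented as a set of already-tokenized words, the words of A_l and B_k themselves are excluded from the candidates, and the mutual information values and the selection level are made-up numbers chosen only to reproduce the outcome described above.

```python
def extract_candidates(sentences, a_part, b_part, dictionary):
    """Dictionary words found in sentences that contain every word of A_l and B_k."""
    required = set(a_part) | set(b_part)
    candidates = []
    for words in sentences:
        if required <= words:
            # Assumption: the words of A_l and B_k are not themselves candidates.
            for word in sorted((words & dictionary) - required):
                if word not in candidates:
                    candidates.append(word)
    return candidates


def select_relation_words(candidates, mi_a, mi_b, min_score):
    """Keep candidates whose score MI(A_l, r') + MI(B_k, r') reaches min_score."""
    return [c for c in candidates if mi_a[c] + mi_b[c] >= min_score]


computer_dictionary = {"compression", "file format", "GIF", "JPEG", "MP3"}
sentence_d1 = {"image", "photo", "compression", "GIF", "JPEG", "file format", "use"}

candidates = extract_candidates([sentence_d1], ("image", "photo"),
                                ("GIF", "JPEG"), computer_dictionary)

# Illustrative mutual information values only.
mi_a = {"compression": 0.2, "file format": 1.1}
mi_b = {"compression": 0.1, "file format": 1.4}
selected = select_relation_words(candidates, mi_a, mi_b, min_score=1.0)
```

Unlike extraction process 1, this variant needs no syntactic roles, only sentence-level co-occurrence and a field dictionary, which is why the score of equation (13) is used to filter the candidates.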

[第２の実施形態] [Second Embodiment]
続いて、本発明の他の実施形態を説明する。 Next, another embodiment of the present invention will be described.
第１の実施形態では、式（４）に示したように、ベースとなる事例である「Ａ：Ｂ」は一対のみ存在し、この事例によりターゲットの状況Ｃから結果Ｘを求めていた。つまり、第１の実施形態において、類推装置１は、多次元一事例における四項類推の処理をおこなっていた。一方、本実施形態では、「Ａ：Ｂ」で示されるベースの事例が複数存在する場合にターゲットの状況Ｃから結果Ｘを求める、多次元多事例の四項類推の処理を考える。これを、式（４）で示したmodus ponensと同じ表現方法で記述すると、以下の式（１４）のようになる。 In the first embodiment, as shown in equation (4), there is only one pair "A : B" serving as the base case, and the result X is obtained from the target situation C using this case. That is, in the first embodiment, the analogy device 1 performs four-term analogy for a multidimensional single case. In the present embodiment, by contrast, we consider multidimensional multi-case four-term analogy, in which the result X is obtained from the target situation C when there are a plurality of base cases of the form "A : B". Written in the same notation as the modus ponens of equation (4), this becomes the following equation (14).

上記のように、本実施形態では、事例がＮ個（Ｎは２以上の整数）あり、各事例を事例（Ｉ）とする（Ｉは２以上Ｎ以下の整数）。事例（Ｉ）は、ベースの状況Ａ（Ｉ）と、ベースの状況Ａ（Ｉ）での結果Ｂ（Ｉ）とからなる。本実施形態の類推装置は、Ａ（Ｉ）：Ｂ（Ｉ）を用いて、状況Ｃに対応する結果Ｘを求める。 As described above, in the present embodiment there are N cases (N is an integer of 2 or more), and each case is referred to as case (I) (I is an integer of 2 or more and N or less). Case (I) consists of a base situation A(I) and the result B(I) in the base situation A(I). The analogy device of the present embodiment uses A(I) : B(I) to obtain the result X corresponding to the situation C.

なお、状況Ａ（Ｉ）は、ｍ個（ｍは２以上の整数）の単語ｔ_ａＩ１，ｔ_ａＩ２，…，ｔ_ａＩｍを要素とするｍ次元ワードベクトルであり、ターゲットの状況Ｃも、第１の実施形態と同様のｍ次元ワードベクトルである。また、結果Ｂ（Ｉ）は、ｇ個（ｇは２以上の整数）の単語ｗ_ｂＩ１，ｗ_ｂＩ２，…，ｗ_ｂＩｇを要素とするｇ次元ワードベクトルであり、ターゲットの結果Ｘも、第１の実施形態と同様のｇ次元ワードベクトルである。 The situation A (I) is an m-dimensional word vector having m words (m is an integer of 2 or more) words t _aI1 , t _aI2 ,..., T _aIm , and the target situation C is also the first This is the same m-dimensional word vector as in the embodiment. Further, the result B (I) is a g-dimensional word vector having g (where g is an integer of 2 or more) words w _bI1 , w _bI2 ,..., W _bIg as elements, and the target result X is also the first This is the same g-dimensional word vector as in the embodiment.

図９は、本実施形態による類推装置の動作概要を示す図である。 FIG. 9 is a diagram showing an outline of the operation of the analogy device according to the present embodiment.
本実施形態では、事例が多事例であるため、以下の式（１５）のように表わすことができる。 In the present embodiment, since there are multiple cases, the analogy can be expressed as the following equation (15).

Ａ（Ｉ）：Ｂ（Ｉ）＝Ｃ：Ｘ（Ｉ）？（Ｉは２以上Ｎ以下の整数）・・・（１５） A (I): B (I) = C: X (I)? (I is an integer not less than 2 and not more than N) (15)

同図に示すように、本実施形態の類推装置は、各事例（Ｉ）について第１の実施形態の類推装置１と同様の関係抽出処理を行なうことによって、状況Ａ（Ｉ）の部分組合せと結果Ｂ（Ｉ）の部分組合せとの関係を示す単語の集合である関係集合Ｒ（Ｉ）を生成する。本実施形態の類推装置は、各関係集合Ｒ（Ｉ）について第１の実施形態の類推装置１と同様に関係マッピング処理を行なうことによって、類推結果集合Ｘ（Ｉ）を生成し、類推結果妥当性スコアを算出する。本実施形態の類推装置は、類推結果集合Ｘ（１）〜Ｘ（Ｎ）を統合し、重複する単語ｘがある場合は、重複を削除する。重複削除後の類推結果妥当性スコアは、重複する単語ｘについての類推結果妥当性スコアを合計した値である。 As shown in the figure, the analogy device of the present embodiment performs a relationship extraction process similar to that of the analogy device 1 of the first embodiment for each case (I), so that the partial combination of the situation A (I) A relation set R (I) that is a set of words indicating a relation with the partial combination of the result B (I) is generated. The analogy device of this embodiment generates an analogy result set X (I) by performing the relationship mapping process for each relation set R (I) in the same manner as the analogy device 1 of the first embodiment, and the analogy result is valid. Calculate the sex score. The analogy device of this embodiment integrates analogy result sets X (1) to X (N), and deletes duplicates when there are duplicate words x. The analogy result validity score after duplicate deletion is a value obtained by summing up the analogy result validity scores for the overlapping word x.

図１０は、本発明の第２の実施形態による類推装置１ａの構成を示すブロック図である。同図において、図３に示す第１の実施形態による類推装置１と同一の部分には同一の符号を付し、その説明を省略する。図１０に示す類推装置１ａが、図３に示す第１の実施形態の類推装置１と異なる点は、事例記憶部１１に代えて事例記憶部１１ａを備える点、類推処理制御部１７及び類推結果積算部１８を備える点である。 FIG. 10 is a block diagram showing the configuration of the analogy device 1a according to the second embodiment of the present invention. In the figure, the same parts as those of the analogy device 1 according to the first embodiment shown in FIG. 3 are denoted by the same reference numerals, and their description is omitted. The analogy device 1a shown in FIG. 10 differs from the analogy device 1 of the first embodiment shown in FIG. 3 in that it includes a case storage unit 11a in place of the case storage unit 11, and in that it additionally includes an analogy processing control unit 17 and an analogy result accumulation unit 18.

事例記憶部１１ａは、状況Ａ（Ｉ）のワードベクトルを示す状況Ａ（Ｉ）データ（ベース状況データ）と、結果Ｂ（Ｉ）のワードベクトルを示す結果Ｂ（Ｉ）データ（ベース結果データ）とからなる事例（Ｉ）を記憶する。類推処理制御部１７は、第１の実施形態と同様の処理を事例記憶部１１ａに記憶されている事例（１）〜事例（Ｎ）について処理を行なうよう関係抽出部１３及び関係マッピング部１５に指示する。類推結果積算部１８は、各事例（Ｉ）について得られた類推結果の単語ｘの集合を示すデータである類推結果Ｘ（Ｉ）を統合する。 The case storage unit 11a includes situation A (I) data (base situation data) indicating a word vector of situation A (I) and result B (I) data (base result data) indicating a word vector of result B (I). Case (I) consisting of The analogy processing control unit 17 causes the relationship extraction unit 13 and the relationship mapping unit 15 to perform the same processing as in the first embodiment on the cases (1) to (N) stored in the case storage unit 11a. Instruct. The analogy result accumulation unit 18 integrates analogy results X (I), which is data indicating a set of analogy result words x obtained for each case (I).

図１１及び図１２は、図１０に示す類推装置１ａの処理フローを示す図である。 FIG. 11 and FIG. 12 are diagrams showing the processing flow of the analogy device 1a shown in FIG. 10.
類推装置１ａの事例記憶部１１ａは、状況Ａ（Ｉ）データと結果Ｂ（Ｉ）データとからなる事例（Ｉ）を記憶している（Ｉは２以上Ｎ以下の整数）。状況Ａ（Ｉ）データは、ｍ個（ｍは２以上の整数）の単語ｔ_ａＩ１，ｔ_ａＩ２，…，ｔ_ａＩｍを要素とするｍ次元ワードベクトルを示す（ｍは２以上の整数）。結果Ｂ（Ｉ）データは、ｇ個（ｇは２以上の整数）の単語ｗ_ｂＩ１，ｗ_ｂＩ２，…，ｗ_ｂＩｇを要素とするｇ次元ワードベクトルを示す。また、処理結果記憶部１４は、初期値ＮＵＬＬの記事集合Ｄ、記事集合Ｅ、関係集合Ｒ（Ｉ）、類推結果集合Ｘ（Ｉ）を記憶する。 The case storage unit 11a of the analogy device 1a stores cases (I), each consisting of situation A(I) data and result B(I) data (I is an integer of 2 or more and N or less). The situation A(I) data indicates an m-dimensional word vector whose elements are m words t_aI1, t_aI2, ..., t_aIm (m is an integer of 2 or more). The result B(I) data indicates a g-dimensional word vector whose elements are g words w_bI1, w_bI2, ..., w_bIg (g is an integer of 2 or more). The processing result storage unit 14 stores the article set D, the article set E, the relation sets R(I), and the analogy result sets X(I), each with the initial value NULL.

図１１において、類推装置１ａの入力部１２は、図４に示す第１の実施形態のステップＳ１００と同様に、状況Ｃデータの入力を受ける（ステップＳ３００）。類推処理制御部１７は、事例記憶部１１ａに記憶されている事例（Ｉ）のうち、まだ処理対象としていない事例（Ｉ）を選択する（ステップＳ３０５）。類推処理制御部１７は、選択した事例（Ｉ）についての処理を実行するよう、関係抽出部１３に指示する。これにより、関係抽出部１３は、類推処理制御部１７により選択された事例（Ｉ）の状況Ａ（Ｉ）データ、結果Ｂ（Ｉ）データを事例記憶部１１から読み出し、第１の実施形態における状況Ａデータ、結果Ｂデータの代わりに用いて、図４に示す第１の実施形態におけるステップＳ１０５〜Ｓ１５５と同様の処理を実行する（ステップＳ３１０〜Ｓ３６０）。これにより、事例（Ｉ）についての関係集合Ｒと、関係集合Ｒに含まれる各単語ｒ_ｉの関係妥当性スコアｓｃｏｒｅ（ｒ_ｉ）が得られる。事例（Ｉ）を用いて得られた関係集合Ｒを関係集合Ｒ（Ｉ）とし、関係集合Ｒに含まれる単語ｒ_１、ｒ_２、…をそれぞれ単語ｒ_Ｉ１、ｒ_Ｉ２、…とし、単語ｒ_ｉについて算出された関係妥当性スコアｓｃｏｒｅ（ｒ_ｉ）を、関係妥当性スコアｓｃｏｒｅ（ｒ_Ｉｉ）とする。これらのデータは、事例（Ｉ）の識別情報と対応づけて処理結果記憶部１４に書き込まれる。 In FIG. 11, the input unit 12 of the analogy device 1a receives input of the situation C data (step S300), as in step S100 of the first embodiment shown in FIG. 4. The analogy processing control unit 17 selects, from the cases (I) stored in the case storage unit 11a, a case (I) that has not yet been processed (step S305). The analogy processing control unit 17 instructs the relationship extraction unit 13 to execute the processing for the selected case (I). The relationship extraction unit 13 then reads the situation A(I) data and the result B(I) data of the selected case (I) from the case storage unit 11, uses them in place of the situation A data and the result B data of the first embodiment, and executes the same processing as steps S105 to S155 of the first embodiment shown in FIG. 4 (steps S310 to S360). This yields the relation set R for the case (I) and the relationship validity score score(r_i) of each word r_i included in the relation set R. The relation set R obtained using the case (I) is denoted as the relation set R(I), the words r_1, r_2, ... included in the relation set R are denoted as the words r_I1, r_I2, ..., and the relationship validity score score(r_i) calculated for the word r_i is denoted as the relationship validity score score(r_Ii). These data are written to the processing result storage unit 14 in association with the identification information of the case (I).

類推処理制御部１７は、事例記憶部１１ａに記憶されている事例（Ｉ）に、まだ処理対象としていない事例（Ｉ）があるかを判断する。まだ処理対象としていない事例（Ｉ）があると判断した場合（ステップＳ３６５：ＮＯ）、類推処理制御部１７は、ステップＳ３０５からの処理を繰り返す。類推処理制御部１７が全ての事例（Ｉ）を処理対象としたと判断した場合（ステップＳ３６５：ＹＥＳ）、関係マッピング部１５を起動する。 The analogy processing control unit 17 determines whether there is a case (I) that is not yet processed in the case (I) stored in the case storage unit 11a. When it is determined that there is a case (I) that has not yet been processed (step S365: NO), the analogy process control unit 17 repeats the process from step S305. If the analogy processing control unit 17 determines that all cases (I) are to be processed (step S365: YES), the relationship mapping unit 15 is activated.

図１２において、ターゲット状況部分組合せ生成部２６は、図５に示す第１の実施形態のステップＳ２００と同様に、状況Ｃデータが示すワードベクトルから部分組合せＣ_ｆ（１≦ｆ≦_ｍＣ_ｎ、ｆは整数）を生成し、処理結果記憶部１４に書き込む（ステップＳ３００）。 In FIG. 12, the target situation partial combination generation unit 26, as in step S200 of the first embodiment shown in FIG. 5, uses a partial combination C _f (1 ≦ f ≦ _m C _n , f is an integer) and is written in the processing result storage unit 14 (step S300).

類推処理制御部１７は、まだ関係マッピング部１５における処理対象としていない事例（Ｉ）を特定する（ステップＳ３０５）。類推処理制御部１７は、特定した事例（Ｉ）についての処理を実行するよう、関係マッピング部１５に指示する。これにより、関係マッピング部１５は、類推処理制御部１７により選択された事例（Ｉ）の識別情報と対応付けて処理結果記憶部１４に記憶されている関係集合Ｒ（Ｉ）及び関係妥当性スコアｓｃｏｒｅ（ｒ_Ｉｉ）を読み出し、第１の実施形態における関係集合Ｒ、関係妥当性スコアｓｃｏｒｅ（ｒ_ｉ）の代わりに用いて、図５に示す第１の実施形態におけるステップＳ２１０〜Ｓ２４５の処理を実行する（ステップＳ４１５〜Ｓ４４０）。これにより類推結果集合Ｘと、類推結果集合Ｘに含まれる各単語ｘ_ｊの類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）が得られる。事例（Ｉ）に対応した関係集合Ｒ（Ｉ）及び関係妥当性スコアｓｃｏｒｅ（ｒ_Ｉｊ）を用いて得られたこの類推結果集合Ｘを類推結果集合Ｘ（Ｉ）とし、類推結果集合Ｘに含まれる単語ｘ_１、ｘ_２、…をそれぞれ単語ｘ_Ｉ１、ｘ_Ｉ２、…とし、単語ｘ_ｊについて算出された類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）を、類推結果妥当性スコアｓｃｏｒｅ_Ｉ（ｘ_Ｉｊ）とする。これらのデータは、事例（Ｉ）の識別情報と対応づけて処理結果記憶部１４に書き込まれる。 The analogy processing control unit 17 identifies a case (I) that has not yet been processed by the relationship mapping unit 15 (step S305). The analogy processing control unit 17 instructs the relationship mapping unit 15 to execute the processing for the identified case (I). The relationship mapping unit 15 then reads the relation set R(I) and the relationship validity scores score(r_Ii) stored in the processing result storage unit 14 in association with the identification information of the selected case (I), uses them in place of the relation set R and the relationship validity scores score(r_i) of the first embodiment, and executes the processing of steps S210 to S245 of the first embodiment shown in FIG. 5 (steps S415 to S440). This yields the analogy result set X and the analogy result validity score score(x_j) of each word x_j included in the analogy result set X. The analogy result set X obtained using the relation set R(I) and the relationship validity scores score(r_Ij) corresponding to the case (I) is denoted as the analogy result set X(I), the words x_1, x_2, ... included in the analogy result set X are denoted as the words x_I1, x_I2, ..., and the analogy result validity score score(x_j) calculated for the word x_j is denoted as the analogy result validity score score_I(x_Ij). These data are written to the processing result storage unit 14 in association with the identification information of the case (I).

類推処理制御部１７は、まだ関係マッピング部１５における処理対象としていない事例（Ｉ）があるかを判断する。まだ処理対象としていない事例（Ｉ）があると判断した場合（ステップＳ４５５：ＮＯ）、類推処理制御部１７は、ステップＳ４０５からの処理を繰り返す。類推処理制御部１７が全ての事例（Ｉ）を処理対象としたと判断した場合（ステップＳ４５５：ＹＥＳ）、類推結果積算部１８を起動する。 The analogy processing control unit 17 determines whether there is a case (I) that is not yet processed in the relationship mapping unit 15. If it is determined that there is a case (I) that has not yet been processed (step S455: NO), the analogy process control unit 17 repeats the process from step S405. When the analogy processing control unit 17 determines that all cases (I) are to be processed (step S455: YES), the analogy result accumulation unit 18 is activated.

類推結果積算部１８は、以下に示す式（１６）のように、処理結果記憶部１４から読み出した類推結果集合Ｘ（１）〜（Ｎ）を統合したデータである統合類推結果集合Ｘを生成する。 The analogy result accumulation unit 18 generates an integrated analogy result set X that is data obtained by integrating the analogy result sets X (1) to (N) read from the processing result storage unit 14 as shown in the following equation (16). To do.

Ｘ＝Σ_ＩＸ（Ｉ）（Ｉ＝１〜Ｎ） …（１６） X = Σ_I X(I) (I = 1 to N) ... (16)

類推結果積算部１８は、類推結果集合Ｘ（１）〜（Ｎ）に含まれる単語ｘ_Ｉｊに重複があれば一つの単語のみを残して重複をなくし、重複をなくした単語ｘ_Ｉｊの集合を示すデータである統合類推結果集合Ｘを生成する。統合類推結果集合Ｘに含まれる単語をｘ_１、ｘ_２、…とする。 If the words x_Ij included in the analogy result sets X(1) to X(N) contain duplicates, the analogy result accumulation unit 18 removes the duplicates, keeping only one copy of each word, and generates the integrated analogy result set X, which is data indicating the set of de-duplicated words x_Ij. The words included in the integrated analogy result set X are denoted x_1, x_2, ....

続いて、類推結果積算部１８は、以下の式（１７）により、統合類推結果集合Ｘに含まれる各単語ｘ_ｊの類推結果妥当性スコアを算出する。ただし、ｓｃｏｒｅ_Ｉ（ｘ_ｊ）は、事例（Ｉ）について得られた単語ｘ_ｊの類推結果妥当性スコアである。 Subsequently, the analogy result accumulation unit 18 calculates, by the following equation (17), the analogy result validity score of each word x_j included in the integrated analogy result set X. Here, score_I(x_j) is the analogy result validity score of the word x_j obtained for the case (I).

ｓｃｏｒｅ（ｘ_ｊ）＝Σ_Ｉｓｃｏｒｅ_Ｉ（ｘ_ｊ） …（１７） score (x _j ) = Σ _I score _I (x _j ) (17)

つまり、統合類推結果集合Ｘに含まれる単語ｘ_ｊの類推結果妥当性スコアは、当該単語ｘ_ｊに統合された単語ｘ_Ｉｊの類推結果妥当性スコアｓｃｏｒｅ_Ｉ（ｘ_Ｉｊ）を積算した値である。類推結果積算部１８は、統合類推結果集合Ｘに含まれる単語ｘ_ｊ及び当該単語ｘ_ｊの類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）を処理結果記憶部１４に書き込む（ステップＳ４６０）。 That is, the analogy result validity score of a word x_j included in the integrated analogy result set X is the sum of the analogy result validity scores score_I(x_Ij) of the words x_Ij that were merged into that word x_j. The analogy result accumulation unit 18 writes each word x_j included in the integrated analogy result set X and its analogy result validity score score(x_j) to the processing result storage unit 14 (step S460).
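
The integration of equations (16) and (17) can be sketched as a merge of per-case score tables. The representation (one dictionary of word-to-score per case) and the sample numbers are assumptions made for illustration.

```python
from collections import defaultdict


def integrate_results(per_case_scores):
    """Merge per-case analogy results; a word found in several cases gets the summed score."""
    merged = defaultdict(float)
    for case_scores in per_case_scores:          # X(1), X(2), ..., X(N)
        for word, score in case_scores.items():
            merged[word] += score                # duplicates collapse into one entry
    return dict(merged)


# Illustrative per-case results only.
x_case_1 = {"foot": 4.5, "fang": 1.0}
x_case_2 = {"foot": 2.0, "claw": 0.5}
integrated = integrate_results([x_case_1, x_case_2])
```

Summing rather than averaging means a word supported by several base cases ranks above a word supported equally strongly by only one.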

出力部１６は、ステップＳ４４５において類推結果積算部１８が処理結果記憶部１４に書き込んだ統合類推結果集合Ｘが示す各単語ｘ_ｊ及び当該単語ｘ_ｊの類推結果妥当性スコアｓｃｏｒｅ（ｘ_ｊ）とからなる類推結果データをディスプレイに表示させるなどして出力する（ステップＳ４６５）。このとき、出力部１６は、類推結果妥当性スコアが閾値以上の単語ｘ_ｊのみを出力するようにしてもよく、類推結果妥当性スコアが高い順に所定数の単語ｘ_ｊのみを出力してもよい。 The output unit 16 outputs, for example by showing it on a display, analogy result data consisting of each word x_j indicated by the integrated analogy result set X that the analogy result accumulation unit 18 wrote to the processing result storage unit 14 in step S445 and the analogy result validity score score(x_j) of that word (step S465). At this time, the output unit 16 may output only the words x_j whose analogy result validity score is equal to or greater than a threshold, or may output only a predetermined number of words x_j in descending order of analogy result validity score.

In the embodiments described above, the article storage device 5 is an external device connected via a network to the analogy device 1 of the first embodiment and the analogy device 1a of the second embodiment. However, the analogy device 1 and the analogy device 1a may instead be configured to include the article storage device 5 internally.
In the embodiments described above, the article sets D and E are sets of article data. However, they may instead be data indicating identification information of the article data, such as the URL (Uniform Resource Locator) at which the article data is stored or the data name of the article data. In this case, the analogy device 1 and the analogy device 1a refer to the article data in the article storage device 5 specified by the identification information included in the article sets D and E, and execute the processing described above.
In the embodiments described above, the case where the word vectors of situation A, situations A(1) to A(N), and situation C have the same number of dimensions has been described. However, the number of dimensions of the word vectors of situation A and situations A(1) to A(N) may differ from the number of dimensions of the word vector of situation C. Further, the word vectors of situations A(1) to A(N) need not all have the same number of dimensions, and the word vectors of results B(1) to B(N) need not all have the same number of dimensions.
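One way to see why differing word counts need not cause difficulty is to look at how partial combinations are formed. The sketch below assumes that a partial combination is simply a subset containing a predetermined number k of the words of a situation. The function name `partial_combinations`, the value k = 2, and the sample words are assumptions made for illustration.

```python
from itertools import combinations

def partial_combinations(words, k):
    """Return every k-word subset of `words` as a tuple.

    If `words` has fewer than k entries, the result is an empty list.
    """
    return [tuple(c) for c in combinations(words, k)]

# Hypothetical situations with different numbers of words.
situation_a = ["typhoon", "coast", "autumn"]  # 3 words
situation_c = ["earthquake", "city"]          # 2 words

combos_a = partial_combinations(situation_a, 2)
combos_c = partial_combinations(situation_c, 2)
```

Under this assumption, every partial combination has the same size k regardless of how many words the originating situation contains, so later steps that consume partial combinations see uniformly sized inputs.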

The analogy device 1 and the analogy device 1a described above each contain a computer system. The operations of the relation extraction unit 13, the relation mapping unit 15, and the output unit 16 of the analogy device 1, and of the relation extraction unit 13, the relation mapping unit 15, the output unit 16, the analogy processing control unit 17, and the analogy result accumulating unit 18 of the analogy device 1a, are stored in the form of a program on a computer-readable recording medium, and the processing described above is performed when the computer system reads and executes this program. The computer system referred to here includes a CPU, various memories, an OS, and hardware such as peripheral devices.

Further, when a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).
The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" also includes media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period of time, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

DESCRIPTION OF SYMBOLS
1, 1a ... analogy device
11, 11a ... case storage unit
12 ... input unit
13 ... relation extraction unit
14 ... processing result storage unit
15 ... relation mapping unit
16 ... output unit
17 ... analogy processing control unit
18 ... analogy result accumulating unit
21 ... base situation partial combination generation unit
22 ... base result partial combination generation unit
23 ... base co-occurrence article search unit
24 ... relation set generation unit
25 ... validity score calculation unit
26 ... target situation partial combination generation unit
27 ... target co-occurrence article search unit
28 ... analogy result generation unit
29 ... analogy result validity score calculation unit
5 ... article storage device

Claims

An analogy device comprising:
a base situation partial combination generation unit that generates base situation partial combination data composed of a predetermined number of words selected from a plurality of words indicated by base situation data;
a base result partial combination generation unit that generates base result partial combination data composed of a predetermined number of words selected from a plurality of words indicated by base result data;
a relation set generation unit that, for each different combination of one item of the base situation partial combination data and one item of the base result partial combination data, extracts, from article data stored in an article storage device, an association word, which is a word that associates the word indicated by the base situation partial combination data with the word indicated by the base result partial combination data;
a target situation partial combination generation unit that generates target situation partial combination data composed of a predetermined number of words selected from a plurality of words indicated by target situation data; and
an analogy result generation unit that, for each different combination of one item of the target situation partial combination data and one of the association words, extracts, as an analogy result, from the article data stored in the article storage device, a word that is associated by the association word with the word indicated by the target situation partial combination data.

The analogy device according to claim 1, further comprising:
a relation validity score calculation unit that calculates, for each of the association words extracted by the relation set generation unit, a relation validity score that quantitatively represents the validity of the association word as a word relating a situation to a result, based on the strength of the co-occurrence relation among the association word, the base situation partial combination data, and the base result partial combination data, the strength of association between the association word and the base situation partial combination data, and the strength of association between the association word and the base result partial combination data, each obtained from the article data stored in the article storage device; and
an analogy result validity score calculation unit that calculates, for each of the words extracted as an analogy result by the analogy result generation unit, an analogy result validity score that quantitatively represents the validity of the word as an analogy result, based on the strength of the co-occurrence relation among the word, the target situation partial combination data, and the association word, the strength of association between the word and the target situation partial combination data, and the strength of association between the word and the association word, each obtained from the article data stored in the article storage device, and on the relation validity score calculated for the association word.

The analogy device according to claim 2, further comprising:
an analogy processing control unit that, for each of a plurality of items of case data each composed of base situation data and base result data, causes the base situation partial combination generation unit to generate base situation partial combination data from the base situation data constituting the case data, causes the base result partial combination generation unit to generate base result partial combination data from the base result data constituting the case data, causes the relation set generation unit to extract the association word from the article data for each different combination of one item of the base situation partial combination data generated from the base situation data and one item of the base result partial combination data generated from the base result data, causes the relation validity score calculation unit to calculate a relation validity score for each of the association words, causes the analogy result generation unit to extract analogy result words from the article data for each different combination of one item of the target situation partial combination data and one of the association words, and causes the analogy result validity score calculation unit to calculate an analogy result validity score for each of the words extracted as an analogy result; and
an analogy result accumulating unit that integrates identical words among the analogy result words obtained for each item of case data and sums the analogy result validity scores calculated for the integrated identical words.

The analogy device according to any one of claims 1 to 3, wherein:
the relation set generation unit extracts, as the association word, a predicate verb from a sentence of the article data in which the word indicated by the base situation partial combination data is a subject noun and the word indicated by the base result partial combination data is a predicate noun; and
the analogy result generation unit extracts, as an analogy result, a predicate noun from a sentence of the article data in which the plurality of words indicated by the target situation partial combination data are subject nouns and the association word is a predicate verb.

The analogy device according to any one of claims 1 to 3, wherein:
the article storage device further stores dictionary data including words related to a predetermined field;
the relation set generation unit extracts, as the association word, a word included in the dictionary data from a sentence of the article data in which the word indicated by the base situation partial combination data and the word indicated by the base result partial combination data co-occur; and
the analogy result generation unit extracts, as an analogy result, a word included in the dictionary data from a sentence of the article data in which the word indicated by the target situation partial combination data and the association word co-occur.

An analogy method performed by an analogy device, the method comprising:
a base situation partial combination generation process in which a base situation partial combination generation unit generates base situation partial combination data composed of a predetermined number of words selected in different combinations from a plurality of words indicated by base situation data;
a base result partial combination generation process in which a base result partial combination generation unit generates base result partial combination data composed of a predetermined number of words selected from a plurality of words indicated by base result data;
a relation set generation process in which a relation set generation unit, for each different combination of one item of the base situation partial combination data and one item of the base result partial combination data, extracts, from article data stored in an article storage device, an association word, which is a word that associates the word indicated by the base situation partial combination data with the word indicated by the base result partial combination data;
a target situation partial combination generation process in which a target situation partial combination generation unit generates target situation partial combination data composed of a predetermined number of words selected in different combinations from a plurality of words indicated by target situation data; and
an analogy result generation process in which an analogy result generation unit, for each different combination of one item of the target situation partial combination data and one of the association words, extracts, as an analogy result, from the article data stored in the article storage device, a word that is associated by the association word with the word indicated by the target situation partial combination data.

A program causing a computer used as an analogy device to function as:
a base situation partial combination generation unit that generates base situation partial combination data composed of a predetermined number of words selected in different combinations from a plurality of words indicated by base situation data;
a base result partial combination generation unit that generates base result partial combination data composed of a predetermined number of words selected from a plurality of words indicated by base result data;
a relation set generation unit that, for each different combination of one item of the base situation partial combination data and one item of the base result partial combination data, extracts, from article data stored in an article storage device, an association word, which is a word that associates the word indicated by the base situation partial combination data with the word indicated by the base result partial combination data;
a target situation partial combination generation unit that generates target situation partial combination data composed of a predetermined number of words selected from a plurality of words indicated by target situation data; and
an analogy result generation unit that, for each different combination of one item of the target situation partial combination data and one of the association words, extracts, as an analogy result, from the article data stored in the article storage device, a word that is associated by the association word with the word indicated by the target situation partial combination data.