JP2007102642A - Information analysis system, information analysis method and information analysis program

Info

Publication number: JP2007102642A
Application number: JP2005294108A
Authority: JP
Inventors: Hiroyuki Onuma; 宏行大沼; Masaki Matsudaira; 正樹松平; Masamutsu Fuchigami; 正睦渕上; Kohaku Morita; 幸伯森田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2005-10-06
Filing date: 2005-10-06
Publication date: 2007-04-19

Abstract

PROBLEM TO BE SOLVED: To create and output useful correlation rules among a plurality of pieces of information.

SOLUTION: The information analysis system is provided with: a morphological analysis means for performing morphological analysis of input text information; a syntax analysis means; an item creation means for creating a morphological analysis result and/or a syntax analysis result as items for correlation rule analysis; an item group creation means for creating one or more item groups using one or more items created by the item creation means; an item group deletion means for comparing the created item groups with one another and deleting item groups whose elements include items in a semantic inclusion relation; an item group computation means for computing the co-occurrence frequency of each item group; a correlation rule creation means for creating one or more correlation rules based on the computed co-occurrence frequency of each item group; and a display means for displaying each correlation rule created by the correlation rule creation means.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an information analysis system, an information analysis method, and an information analysis program, and can be applied, for example, to an information analysis system that discovers and displays useful information in a large amount of data.

Association analysis is one known data mining technique for finding useful information buried in large amounts of data. Association analysis discovers, as knowledge, rules about the connections between events; these rules are called association rules.

The Apriori algorithm disclosed in Non-Patent Document 1 is one efficient processing method for association analysis; it proposes a way to extract frequent item sets whose support is at least the minimum support.

In association analysis, what is treated as the object of analysis (hereinafter called an item) is defined according to the purpose. For example, when analyzing POS data, each product purchased by a customer would be one item.

Association analysis can therefore also be applied to text mining of natural-language sentences, in which case the item can be defined as one of the following.

(Case 1) Words
(Case 2) Dependency relations between words
For example, suppose the data is the sentence "Mother applied for the specific account advertised on TV, but it has not been opened."

In Case 1, the items are [mother, TV, advertise (iru), specific account, apply (da), open (nai)].

In Case 2, the dependency relations shown in FIG. 2 exist, so the items are [mother-apply (da), TV-advertise (iru), advertise (iru)-specific account, specific account-apply (da)].

Here, each item is written as "source word-destination word", and the element in parentheses indicates that the verb is used with a sense such as negation, as in (nai), or continuation, as in (iru) (hereinafter called intention information).

Non-Patent Document 2 discloses a method of discovering important information using the dependency relations of Case 2. It describes a technique that combines these items into item sets, calculates measures such as support and confidence, and outputs the association rules whose support and confidence are at or above given thresholds.

Non-Patent Document 1: Agrawal, R., Imielinski, T., and Swami, A., "Mining Association Rules between Sets of Items in Large Databases", Proceedings of ACM SIGMOD-93, pp. 207-216 (1993).

Non-Patent Document 2: 嶋津恵子, 門馬敦仁, 古川康一, "Discovering Important Information from Call Center Information Using Association Rule Derivation" (in Japanese), 17th Annual Conference of the Japanese Society for Artificial Intelligence, 1F4-03, 2003.

Non-Patent Document 3: 伊藤貴之, 井上恵介, 土井淳, 梶永泰正, 池端裕子, "Improvement of a Screen Layout Method for Graph Data Using a Dynamic Model" (in Japanese), IPSJ SIG Technical Reports, Graphics and CAD 103-2, pp. 7-12, 2001.

When each word appearing in the data is treated as an item (Case 1), each item is a single word, so the resulting association rules are harder to interpret than with dependency relations (Case 2). With dependency relations, each item has the form "what did what", which makes its meaning easy to grasp.

In Case 1, however, dependency relations are not considered, so item sets such as {specific account, apply (da)} can be created simply by combining words, regardless of any omitted subject or object. Case 2, by contrast, relies on dependency relations and therefore cannot handle the omission of subjects and objects that is characteristic of text data.

For example, in the dependency relations shown in FIG. 2, "has not been opened" means "(the specific account) has not been opened", with the nominative (ga-case) argument omitted. In Case 2, therefore, no dependency relation exists between "specific account" and "open (nai)", and no dependency relation containing "open (nai)" is created.
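The gap described above can be checked mechanically. The following minimal sketch uses the word items and dependency items listed earlier for the example sentence (English glosses) and looks for words that no dependency item mentions. Representing a dependency item as a (source, destination) pair is an assumption made for this illustration only.

```python
# Items for the example sentence, as listed in the text (English glosses).
word_items = ["mother", "TV", "advertise (iru)", "specific account",
              "apply (da)", "open (nai)"]

# Dependency items as (source word, destination word) pairs (assumed encoding).
dependency_items = [
    ("mother", "apply (da)"),
    ("TV", "advertise (iru)"),
    ("advertise (iru)", "specific account"),
    ("specific account", "apply (da)"),
]

# Words that take part in at least one dependency relation.
covered = {word for pair in dependency_items for word in pair}

# Words that Case 2 loses because no dependency relation mentions them.
uncovered = [w for w in word_items if w not in covered]
print(uncovered)
```

Only "open (nai)" is left uncovered, which is the word whose subject was omitted in the sentence.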

The present invention therefore provides an information analysis system, an information analysis method, and an information analysis program that create and output association rules (correlation rules) using both dependency relations and words as items, and that can create and/or output the association rules efficiently and in an organized way, without duplication.

To solve this problem, the information analysis system of the first aspect of the present invention creates correlation rules based on the constituent elements of each of a plurality of input pieces of text information and outputs useful correlation rules, and is characterized by comprising:
(1) morphological analysis means for performing morphological analysis on each piece of text information;
(2) syntax analysis means for performing syntax analysis on each piece of text information;
(3) item creation means for creating the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
(4) item set creation means for creating one or more item sets using one or more items created by the item creation means;
(5) item set deletion means for comparing the item sets created by the item set creation means and deleting any item set whose elements include items in a semantic inclusion relation;
(6) item set calculation means for calculating the co-occurrence frequency of each item set;
(7) correlation rule creation means for creating one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
(8) display means for displaying each correlation rule created by the correlation rule creation means.

The information analysis method of the second aspect of the present invention creates correlation rules based on the constituent elements of each of a plurality of input pieces of text information and outputs useful correlation rules, and is characterized by comprising:
(1) a morphological analysis step in which morphological analysis means performs morphological analysis on each piece of text information;
(2) a syntax analysis step in which syntax analysis means performs syntax analysis on each piece of text information;
(3) an item creation step in which item creation means creates the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
(4) an item set creation step in which item set creation means creates one or more item sets using one or more items created by the item creation means;
(5) an item set deletion step in which item set deletion means compares the item sets created by the item set creation means and deletes any item set whose elements include items in a semantic inclusion relation;
(6) an item set calculation step in which item set calculation means calculates the co-occurrence frequency of each item set;
(7) a correlation rule creation step in which correlation rule creation means creates one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
(8) a display step in which display means displays each correlation rule created by the correlation rule creation means.

The information analysis program of the third aspect of the present invention creates correlation rules based on the constituent elements of each of a plurality of input pieces of text information and outputs useful correlation rules, and is characterized by causing a computer to function as:
(1) morphological analysis means for performing morphological analysis on each piece of text information;
(2) syntax analysis means for performing syntax analysis on each piece of text information;
(3) item creation means for creating the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
(4) item set creation means for creating one or more item sets using one or more items created by the item creation means;
(5) item set deletion means for comparing the item sets created by the item set creation means and deleting any item set whose elements include items in a semantic inclusion relation;
(6) item set calculation means for calculating the co-occurrence frequency of each item set;
(7) correlation rule creation means for creating one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
(8) display means for displaying each correlation rule created by the correlation rule creation means.

According to the present invention, association rules (correlation rules) are created and output using dependency relations and words as items, and the association rules can be created and/or output efficiently and in an organized way, without duplication.

(A) First Embodiment
A first embodiment of the information analysis system, information analysis method, and information analysis program of the present invention is described in detail below with reference to the drawings.

This embodiment describes the case in which a plurality of items are generated from natural-language text and association rules are created from sets of these items.

This embodiment also extends the approach that uses dependency relations as items (Case 2 of the prior art described above) so that it can handle the omission of subjects and objects that is characteristic of text data.

The extension treats not only dependency relations but also words such as common nouns, verbs, and adjectives as items. Unlike Case 1 of the prior art, this takes dependency relations into account while also handling the omission of subjects and objects characteristic of text data.

Using both dependency relations and words as items produces association rules that overlap semantically. In that case, the more strongly constrained association rule is kept.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the internal configuration of the data analysis device 7A of the first embodiment.

In FIG. 1, the data analysis device 7A of this embodiment comprises an input unit 1, a morphological analysis unit 2, a syntax analysis unit 3, an item generation unit 4, an association rule extraction unit 5, and a display unit 6.

The association rule extraction unit 5 further includes a control unit 500, a candidate item set generation unit 501, a candidate item set deletion unit 502, a candidate item set calculation unit 503, a rule creation unit 504, and an item set temporary storage unit 505.

The input unit 1 takes in the data to be subjected to association analysis and passes it to the morphological analysis unit 2. The input unit 1 also takes in the "minimum support", "minimum confidence", and "maximum size of item sets to be created" used to compute association rules, and passes them to the association rule extraction unit 5.

The morphological analysis unit 2 receives each piece of data taken in by the input unit 1, performs morphological analysis on it using a predetermined morpheme dictionary, predetermined rules, and the like, and passes the morphological analysis result to the syntax analysis unit 3.

The syntax analysis unit 3 receives the morphological analysis result from the morphological analysis unit 2, performs syntax analysis based on it, and extracts dependency relations.

The item generation unit 4 uses the morphological analysis result from the morphological analysis unit 2 and the syntax analysis result from the syntax analysis unit 3 to create a plurality of items consisting of words and dependency relations. It passes the created items to the association rule extraction unit 5.

The association rule extraction unit 5 receives the items created by the item generation unit 4, extracts the necessary item sets from them by the method described later, and creates association rules. It passes the created association rules to the display unit 6.

On receiving the association rules created by the association rule extraction unit 5, the display unit 6 applies predetermined output processing to them and outputs them.

Next, the internal functions of the association rule extraction unit 5 are described with reference to FIG. 1.

The control unit 500 controls the functions of the association rule extraction unit 5.

The candidate item set generation unit 501 creates item sets of various sizes.

The candidate item set deletion unit 502 uses the semantic relations between words to find and delete item sets whose support does not need to be calculated.

The candidate item set calculation unit 503 calculates the support of each item set.

The rule creation unit 504 generates association rules and derives those whose confidence is at or above a given level.

The item set temporary storage unit 505 stores the support of each item set calculated by the candidate item set calculation unit 503.

(A-2) Operation of the First Embodiment
Next, the operation of the data analysis device 7A of this embodiment is described in detail with reference to the drawings.

FIG. 3 is a flowchart showing the operation of the first embodiment; the operation is described along this flowchart.

The input unit 1 accepts data input and passes the data it takes in to the morphological analysis unit 2. It also accepts the "minimum support", "minimum confidence", and "maximum size of item sets to be created", and passes them to the association rule extraction unit 5 (step 100).

The following description uses the data shown in FIG. 4 as the input data. The "data ID" in FIG. 4 is an identifier for each piece of data.

As an example, let "minimum support" = 2 (that is, items that appear in two or more documents are adopted), "minimum confidence" = 0.6, and "maximum size of item sets to be created" = 3. The minimum support could instead be the proportion of all documents in which an item appears, but here it is expressed as a number of occurrences for convenience of explanation.

When the data input to the input unit 1 is passed to the morphological analysis unit 2, the morphological analysis unit 2 performs morphological analysis on each piece of received data (step 110). Conventional, general-purpose morphological analysis techniques can be used for this processing, so a detailed description is omitted here.

FIG. 5 shows the result of morphological analysis for "data 1" in FIG. 4. In FIG. 5, "data ID" is the data ID of the input data, "word ID" is identification information for each morpheme (word) in the input data, "word" is the analyzed morpheme (word), and "morpheme type" is the part of speech of the word.

After the morphological analysis unit 2 has analyzed each piece of data, the syntax analysis unit 3 performs syntax analysis based on the morphological analysis result of each piece of data (step 120). Conventional, general-purpose syntax analysis techniques can be used for this processing, so a detailed description is omitted here.

FIG. 6 shows the result of syntax analysis for "data 1" to "data 3" in FIG. 4. In FIG. 6, "number" is identification information for each dependency relation within a piece of data.

Based on the syntax analysis result from the syntax analysis unit 3, the item generation unit 4 creates, for each piece of data, one item per dependency relation (called a "dependency item") (step 130). FIG. 7 shows the dependency items created for "data 1" to "data 3" in FIG. 4.

In doing so, the item generation unit 4 removes the particle information and writes each item in the notation "source word + (-) + destination word". When the source word is a verb, the destination word is a noun, and the particle information is NULL, the item generation unit 4 creates the dependency item with the source word and destination word swapped.

For example, as shown by "number 3 of data 2" in FIG. 6, for "advertise (iru) + NULL + specific account" the item generation unit 4 creates an item of the form "specific account-advertise (iru)" (see FIG. 7).
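The dependency item rule above can be sketched as follows. This is an illustrative sketch only: the `Dependency` record, its field names, the part-of-speech labels, and the particle "ga" on the first row are assumptions, since the text does not specify the parser's output format.

```python
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    """One row of a parse result (hypothetical field names)."""
    source: str
    source_pos: str          # part of speech of the source word
    particle: Optional[str]  # None stands for NULL particle information
    dest: str
    dest_pos: str


def dependency_item(dep: Dependency) -> str:
    """Build a 'source-destination' item, dropping the particle."""
    source, dest = dep.source, dep.dest
    # Verb -> noun with no particle: swap so that the noun comes first.
    if dep.source_pos == "verb" and dep.dest_pos == "noun" and dep.particle is None:
        source, dest = dest, source
    return f"{source}-{dest}"


deps = [
    Dependency("mother", "noun", "ga", "apply (da)", "verb"),
    Dependency("advertise (iru)", "verb", None, "specific account", "noun"),
]
items = [dependency_item(d) for d in deps]
print(items)
```

The second row reproduces the example in the text: the verb-to-noun relation with a NULL particle becomes "specific account-advertise (iru)".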

Based on the morphological analysis result from the morphological analysis unit 2, the item generation unit 4 also creates, for each piece of data, items from nouns, verbs, adjectives, and adjectival verbs (called "word items") (step 140). FIG. 8 shows the word items created for "data 1" to "data 3" in FIG. 4. However, the item generation unit 4 does not make items from nouns such as "こと" (koto) and "もの" (mono).

In the syntax analysis result, when the source word and destination word are both nouns and the particle information is NULL, the two words form a compound noun. Because the source word alone may not correctly represent the meaning of the data, the item generation unit 4 does not register the source word.

For example, in "data 1" of FIG. 4, the word "general account" is a compound noun consisting of "general" and "account", so the item generation unit 4 does not register the source word "general". The destination word "account" is registered as a word item.
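The word item rules above (part-of-speech filter, excluded nouns, and compound-noun source words) might be combined as in the sketch below. The token and dependency tuples are hypothetical stand-ins for the analyzer output; only the "general" + "account" compound comes from the text.

```python
CONTENT_POS = {"noun", "verb", "adjective", "adjectival verb"}
EXCLUDED_NOUNS = {"こと", "もの"}  # the nouns named in the text


def word_items(tokens, dependencies):
    """tokens: list of (word, pos).
    dependencies: list of (source, source_pos, particle, dest, dest_pos),
    where particle None stands for NULL (assumed encoding)."""
    # Source words of noun-noun links with no particle are compound-noun parts.
    compound_sources = {
        src for src, src_pos, particle, _dst, dst_pos in dependencies
        if src_pos == "noun" and dst_pos == "noun" and particle is None
    }
    items = []
    for word, pos in tokens:
        if pos not in CONTENT_POS:
            continue
        if word in EXCLUDED_NOUNS or word in compound_sources:
            continue
        if word not in items:
            items.append(word)
    return items


# Hypothetical tokens; "wo" is a romanized particle used for illustration.
tokens = [("general", "noun"), ("account", "noun"), ("wo", "particle"),
          ("open", "verb"), ("こと", "noun")]
dependencies = [("general", "noun", None, "account", "noun"),
                ("account", "noun", "wo", "open", "verb")]
print(word_items(tokens, dependencies))
```

Here "general" is skipped as a compound-noun source word and "こと" is skipped as an excluded noun, leaving "account" and "open".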

Once the item generation unit 4 has created the dependency items and word items of each piece of data, the association rule extraction unit 5 extracts item sets based on them and creates association rules (step 150).

FIG. 9 is a flowchart showing the detailed processing of the association rule extraction unit 5 in step 150.

On receiving the dependency items and word items of each piece of data, the candidate item set generation unit 501 registers every item received from the item generation unit 4 in the item set temporary storage unit 505 as an item set of size 1 (step 1000).

Here, the size of an item set is the number of items (dependency items and word items alike) that make up the set. For example, an item set of size 1 consists of one item, and an item set of size n (n is a positive integer) consists of n items.

The candidate item set calculation unit 503 then calculates the number of occurrences of each size-1 item set registered in the item set temporary storage unit 505 (step 1010).

FIG. 10 shows the number of occurrences of the size-1 item sets. For each item set, it shows the number of occurrences together with the data IDs (document IDs) in which the item set appears.

After the candidate item set calculation unit 503 has calculated the number of occurrences of each item set, the candidate item set deletion unit 502 deletes from the item set temporary storage unit 505 any item set whose number of occurrences is less than the "minimum support" received from the input unit 1 (step 1020).

If every item set is below the "minimum support" and all of them have been deleted by the candidate item set deletion unit 502 (step 1030), no association rule satisfies the conditions, and processing ends.
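Steps 1000 to 1030 amount to counting, for each item, the number of documents it appears in and discarding items below the minimum support. A minimal sketch follows; the three documents are hypothetical (the contents of FIG. 4 are not reproduced in the text), and the minimum support of 2 is the value from the running example.

```python
from collections import Counter

MIN_SUPPORT = 2  # occurrence count, as in the running example

# Hypothetical documents: data ID -> set of items found in that document.
documents = {
    1: {"account", "open (nai)", "account-open (nai)"},
    2: {"specific account", "apply (da)", "specific account-apply (da)", "open (nai)"},
    3: {"specific account", "apply (da)", "specific account-apply (da)"},
}

# Steps 1000-1010: count in how many documents each single item appears.
counts = Counter(item for items in documents.values() for item in items)

# Step 1020: keep only size-1 item sets that reach the minimum support.
frequent_1 = {frozenset([item]): n for item, n in counts.items() if n >= MIN_SUPPORT}

# Step 1030: stop if nothing is left.
if not frequent_1:
    print("no association rule satisfies the conditions")
else:
    for itemset, n in sorted(frequent_1.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), n)
```

Because each document is stored as a set, an item is counted at most once per document, which matches a support measured in documents.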

Otherwise, processing proceeds to step 1040, where the control unit 500 sets the counter n, which indicates the item set size, to 2 so that item sets of size n become the processing target (step 1040).

When the control unit 500 has set the counter n to 2, the candidate item set generation unit 501 generates item sets of size n (here, size 2) based on the item sets registered in the item set temporary storage unit 505 (step 1050). The Apriori algorithm disclosed in Non-Patent Document 1, for example, may be used to generate the item sets.
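One way to generate size-n candidates in the spirit of Apriori is sketched below. It is a simplified variant (pairwise unions followed by a subset check), not the exact join procedure of Non-Patent Document 1, and the function name `generate_candidates` is hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_prev, n):
    """Build size-n candidates from frequent item sets of size n-1."""
    prev = set(frequent_prev)
    candidates = set()
    for a, b in combinations(prev, 2):
        union = a | b
        if len(union) != n:
            continue
        # Apriori-style pruning: every (n-1)-subset must itself be frequent.
        if all(frozenset(sub) in prev for sub in combinations(union, n - 1)):
            candidates.add(union)
    return candidates


frequent_1 = [frozenset(["specific account"]), frozenset(["apply (da)"]),
              frozenset(["open (nai)"])]
candidates_2 = generate_candidates(frequent_1, 2)
print(len(candidates_2))
```

The subset check relies on the fact that a set cannot reach the minimum support unless all of its subsets do, so candidates with an infrequent subset need not be counted.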

Once the size-2 item sets have been generated, the candidate item set deletion unit 502 deletes, from among them, item sets that contain subsets such as the following (step 1060).

For example, it deletes item sets such as {W, *-W} or {W, W-*}, in which one item of the set contains all or part of the other item. Here, W is a word, * is an arbitrary word, and "-" denotes a dependency relation.

For example, consider the size-2 item set {specific account-apply (da), apply (da)}, which combines the items "specific account-apply (da)" and "apply (da)" in rows 8 and 9 of FIG. 10. Because "apply (da)" appears in both items, the candidate item set deletion unit 502 deletes this item set. The item set {specific account-apply (da), specific account}, which combines "specific account-apply (da)" and "specific account" in rows 8 and 15 of FIG. 10, is deleted in the same way.
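The {W, *-W} and {W, W-*} patterns can be detected with a simple membership test. In this sketch, dependency items are encoded as (source, destination) tuples and word items as strings; that encoding and the function names are assumptions made for illustration.

```python
def is_dependency(item):
    """Dependency items are (source, destination) tuples; word items are strings."""
    return isinstance(item, tuple)


def has_semantic_inclusion(itemset):
    """True if some word item W also appears inside a dependency item of the set."""
    words = {item for item in itemset if not is_dependency(item)}
    return any(
        source in words or dest in words
        for source, dest in (item for item in itemset if is_dependency(item))
    )


candidates = [
    frozenset([("specific account", "apply (da)"), "apply (da)"]),
    frozenset([("specific account", "apply (da)"), "specific account"]),
    frozenset(["specific account", "apply (da)"]),
]
kept = [c for c in candidates if not has_semantic_inclusion(c)]
print(kept)
```

The first two candidates correspond to the two examples in the text and are dropped; the pure word pair is kept.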

候補アイテム集合削除部５０２により部分集合を含むアイテム集合が削除されると、候補アイテム集合計算部５０３は、大きさ２の各アイテム集合の支持度（出現数）を計算し、その各アイテム集合の支持度が「最小支持度」未満である場合は、そのアイテム集合をアイテム集合一時記憶部５０５から削除する（ステップ１０７０）。図１１は、大きさ２のアイテム集合を示し、出現数が「最小支持度」未満である斜線部分のアイテム集合を削除する。 When the item set including the subset is deleted by the candidate item set deletion unit 502, the candidate item set calculation unit 503 calculates the support level (number of appearances) of each item set of size 2, and If the support level is less than the “minimum support level”, the item set is deleted from the item set temporary storage unit 505 (step 1070). FIG. 11 shows an item set of size 2, and deletes the item set in the hatched portion where the number of appearances is less than the “minimum support”.

Next, if the counter n equals the "maximum size of item sets to create" entered through the input unit 1, or if every item set fell below the "minimum support", the control unit 500 proceeds to step 1100.

Otherwise, the process proceeds to step 1090, where the control unit 500 increments the counter n by 1, and then returns to step 1050 to repeat the processing.
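The loop over steps 1050 to 1090 can be sketched roughly as below. It is a simplified illustration, not the specification's implementation: each input record is assumed to be a set of items (strings for words, tuples for dependency items), the support is the raw number of records containing the item set, and all function names are hypothetical.

```python
def has_overlap(itemset):
    # Step 1060: a word item together with a dependency item that contains it.
    words = [i for i in itemset if not isinstance(i, tuple)]
    deps = [i for i in itemset if isinstance(i, tuple)]
    return any(w in d for w in words for d in deps)

def support(itemset, records):
    # Number of records that contain every item of the item set.
    return sum(1 for r in records if itemset <= r)

def frequent_itemsets(records, min_support, max_size):
    items = sorted({i for r in records for i in r}, key=repr)
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s, records) >= min_support]
    result, n = {}, 1
    while level:                      # stop when every item set fell below min_support
        for s in level:
            result[s] = support(s, records)
        if n == max_size:             # stop when the counter reaches the maximum size
            break
        n += 1                        # step 1090
        candidates = {a | b for a in level for b in level if len(a | b) == n}
        candidates = {c for c in candidates if not has_overlap(c)}            # step 1060
        level = [c for c in candidates if support(c, records) >= min_support]  # step 1070
    return result

records = [
    {"specific account", "apply (da)", ("specific account", "apply (da)")},
    {"specific account", "apply (da)", "open (not)"},
    {"open (not)", "apply (da)"},
]
result = frequent_itemsets(records, min_support=2, max_size=2)
```

The three records are made up for the example; the dictionary returned maps each surviving item set to its support.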

In step 1100, the rule creation unit 504 refers to the item set temporary storage unit 505 and extracts the association rules whose confidence is at least the "minimum confidence" entered through the input unit 1.

To do so, the rule creation unit 504 first selects one item set stored in the item set temporary storage unit 505 and splits the selected item set into two arbitrary sets, which become the condition part and the conclusion part (step 1110). The two resulting sets are denoted s and 1-s.

For example, the size-3 item set {specific account, apply (da), open (not)} can be split in three ways: (1) {specific account, apply (da)} and {open (not)}; (2) {specific account, open (not)} and {apply (da)}; (3) {apply (da), open (not)} and {specific account}.

The rule creation unit 504 takes one of the splits and, for that split, calculates the confidence of the candidate association rules s⇒1-s and 1-s⇒s, namely (support of 1) / (support of s) and (support of 1) / (support of 1-s) respectively, where 1 denotes the whole item set before splitting (step 1130). The confidence is calculated in the same way as in conventional association rule derivation, so a detailed description is omitted.

If the calculated confidence is below the "minimum confidence" entered through the input unit 1, the rule creation unit 504 does not adopt that candidate as an association rule and returns to step 1120.

The processing of steps 1130 and 1140 is performed for every split obtained, after which the process returns to step 1110 and is carried out for every item set stored in the item set temporary storage unit 505.
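The rule creation of steps 1110 to 1140 can be illustrated with the following sketch. It assumes that the supports of every item set and of all its subsets are already available in a dictionary; the function name and the support values are hypothetical. Enumerating every non-empty proper subset as the condition part covers both directions, s⇒1-s and 1-s⇒s, of each split.

```python
from itertools import combinations

def rules_from(itemset, supports, min_confidence):
    """Return (condition, conclusion, confidence) for every split that reaches min_confidence."""
    rules = []
    items = sorted(itemset)
    for k in range(1, len(items)):
        for cond in combinations(items, k):
            s = frozenset(cond)
            rest = itemset - s
            # confidence of s => rest: support of the whole set / support of s
            conf = supports[itemset] / supports[s]
            if conf >= min_confidence:
                rules.append((s, rest, conf))
    return rules

sa, ap, op = "specific account", "apply (da)", "open (not)"
whole = frozenset({sa, ap, op})
supports = {  # made-up counts for illustration only
    whole: 2,
    frozenset({sa}): 3, frozenset({ap}): 4, frozenset({op}): 2,
    frozenset({sa, ap}): 3, frozenset({sa, op}): 2, frozenset({ap, op}): 2,
}
rules = rules_from(whole, supports, min_confidence=0.6)
```

A size-3 item set yields six candidate rules (three splits, two directions each); with the counts above, only {apply (da)} ⇒ {specific account, open (not)} falls below the threshold.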

For example, FIG. 12 shows the association rules derived from the size-2 item sets shown in FIG. 11. Rules with a confidence below 0.6 are omitted.

Similarly, FIG. 13 shows the size-3 item sets, and FIG. 14 shows the association rules derived from them.

In step 1150, rules whose condition part or conclusion part is only weakly constrained are deleted from the created association rules, and the processing ends.

An association rule is deleted when it satisfies both of the following conditions.

For two association rules Ri: Pi⇒Ci (confidence Confi) and Rj: Pj⇒Cj (confidence Confj):
Condition 1. The two confidences have the same value: Confi = Confj.
Condition 2. One of the cases (i) to (vi) below applies.

(i) Ci = Cj, Pi is {Comij}, and Pj is {Comij, pj1, pj2, ..., pjn}: Ri is deleted. (Comij denotes the items common to Pi and Pj.) In other words, the two association rules differ only in that one of them has additional items in its condition part.

For example, given
Ri: {specific account-response} ⇒ apply, confidence: 0.75
Rj: {specific account-response, bad} ⇒ apply, confidence: 0.75
Ri is deleted.

(ii) Pi = Pj, Ci is {Comij}, and Cj is {Comij, cj1, cj2, ..., cjn}: Ri is deleted. (Comij denotes the items common to Ci and Cj.) In other words, the two association rules differ only in that one of them has additional items in its conclusion part.

For example, given
(R2-1) {open (not)} ⇒ {apply (da)}, confidence: 1.00
(R3-2) {open (not)} ⇒ {specific account, apply (da)}, confidence: 1.00
(R2-1) is deleted.

(iii) Ci = Cj, Pi is {Comij, pi1, pi2}, and Pj is {Comij, pj1, ..., pjn}, with pi1 = word A, pi2 = word B, and pj1 = word A-word B: Ri is deleted. (Comij denotes the items common to Pi and Pj.) In other words, the two association rules differ only in whether the items in the condition part express a dependency relation or a mere co-occurrence of words.

For example, given
(R2-6) {specific account-apply (da)} ⇒ {open (not)}, confidence: 0.67
(R3-1) {specific account, apply (da)} ⇒ {open (not)}, confidence: 0.67
(R3-1) is deleted.

(iv) Pi = Pj, Ci is {Comij, ci1, ci2}, and Cj is {Comij, cj1, ..., cjn}, with ci1 = word A, ci2 = word B, and cj1 = word A-word B: Ri is deleted. (Comij denotes the items common to Ci and Cj.) In other words, the two association rules differ only in whether the items in the conclusion part express a dependency relation or a mere co-occurrence of words.

For example, given
(R2-5) {open (not)} ⇒ {specific account-apply (da)}, confidence: 1.00
(R3-2) {open (not)} ⇒ {specific account, apply (da)}, confidence: 1.00
(R3-2) is deleted.

(v) Ci = Cj, Pi is {Comij, pi1}, and Pj is {Comij, pj1, ..., pjn}, with pi1 = word A and either pj1 = word A-word B or pj1 = word B-word A: Ri is deleted. (Comij denotes the items common to Pi and Pj.) In other words, the two association rules differ only in whether an item in the condition part is a word item or a dependency item containing that word.

For example, given
(R2-14) {bad} ⇒ {apply (da)}, confidence: 0.67
(R2-16) {response-bad} ⇒ {apply (da)}, confidence: 0.67
(R2-14) is deleted. (R2-12) is deleted in the same way.

(vi) Pi = Pj, Ci is {Comij, ci1}, and Cj is {Comij, cj1, ..., cjn}, with ci1 = word A and either cj1 = word A-word B or cj1 = word B-word A: Ri is deleted. (Comij denotes the items common to Ci and Cj.) In other words, the two association rules differ only in whether an item in the conclusion part is a word item or a dependency item containing that word.

For example, given
(R2-4) {open (not)} ⇒ {specific account}, confidence: 1.00
(R2-5) {open (not)} ⇒ {specific account-apply (da)}, confidence: 1.00
(R2-4) is deleted.
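The six cases above share one idea: a rule is dropped when another rule with the same confidence says the same thing more specifically. The sketch below folds them into a single "covered by" test. This unification is an assumption made for illustration and is somewhat broader than the six cases as literally written; the function names are hypothetical, and items are again modelled as strings (words) and tuples (dependency items).

```python
def covered(item, side):
    """An item is covered if it is in the side, or is a word contained in one of its dependency items."""
    if item in side:
        return True
    return not isinstance(item, tuple) and any(
        isinstance(d, tuple) and item in d for d in side)

def less_specific(x, y):
    # x differs from y and adds nothing that y does not already express.
    return x != y and all(covered(i, y) for i in x)

def prune(rules):
    """rules: list of (condition, conclusion, confidence) with frozenset parts."""
    kept = []
    for pi, ci, conf_i in rules:
        redundant = any(
            conf_i == conf_j and (
                (ci == cj and less_specific(pi, pj)) or   # cases (i), (iii), (v)
                (pi == pj and less_specific(ci, cj)))     # cases (ii), (iv), (vi)
            for pj, cj, conf_j in rules)
        if not redundant:
            kept.append((pi, ci, conf_i))
    return kept

fs = frozenset
r2_14 = (fs({"bad"}), fs({"apply (da)"}), 0.67)
r2_16 = (fs({("response", "bad")}), fs({"apply (da)"}), 0.67)
r2_1 = (fs({"open (not)"}), fs({"apply (da)"}), 1.00)
r2_4 = (fs({"open (not)"}), fs({"specific account"}), 1.00)
r2_5 = (fs({"open (not)"}), fs({("specific account", "apply (da)")}), 1.00)
r3_2 = (fs({"open (not)"}), fs({"specific account", "apply (da)"}), 1.00)
kept = prune([r2_14, r2_16, r2_1, r2_4, r2_5, r3_2])
```

Applied to the example rules quoted in cases (ii), (iv), (v), and (vi), the sketch keeps only (R2-16) and (R2-5), which matches the deletions described in the text. Confidences are compared with exact equality here, so values produced by floating-point division may need rounding first.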

Once the association rules have been extracted as described above, the process proceeds to step 160 in FIG. 3, and the display unit 6 displays the association rules that were not deleted in step 1150. As a result, the association rules without strikethrough in FIG. 15 are output.

(A-3) Effects of the First Embodiment
According to the first embodiment, words are used as items in addition to dependency relations. Unlike the case where only dependency relations are items, this makes it possible to extract association rules that cope with the omission of subjects and objects that is characteristic of text data.

For example, if only dependency relations were used as items, no dependency relation would be set for "open (not)" in data ID 2, because its subject is omitted. In this method, both words and dependency relations are items, so association rules such as {specific account-apply (da)} ⇒ {open (not)} can also be extracted.

(B) Second Embodiment
Next, a second embodiment of the information analysis system, information analysis method, and information analysis program of the present invention will be described with reference to the drawings.

The second embodiment describes a method for outputting the association rules.

(B-1) Configuration of the Second Embodiment
FIG. 16 is a block diagram showing an example of the internal configuration of the data analysis device 7B of the second embodiment.

The configuration shown in FIG. 16 differs from the first embodiment in the functional configuration of the display unit 6. That functional configuration is described in detail below, and a detailed description of the components already covered in the first embodiment is omitted.

FIG. 17 shows a display image of the association rules displayed by the display unit 6. As shown in FIG. 17, each node corresponds to one item, and a branch between nodes represents the relation between the condition part and the conclusion part of an association rule. FIG. 17 is drawn as an undirected graph, so it does not show which node is the condition part and which is the conclusion part, but a directed graph may be used to make this explicit.

For example, the branch between "response-bad" and "specific account" in FIG. 17 indicates that either the association rule {response-bad} ⇒ {specific account} or the association rule {specific account} ⇒ {response-bad} exists.

As shown in FIG. 16, the display unit 6 includes a graph element operation unit 601, a graph display unit 602, a graph node temporary storage unit 603, and a graph branch temporary storage unit 604.

The graph element operation unit 601 performs the operations needed to display the association rules in the graph form shown in FIG. 17.

The operations performed by the graph element operation unit 601 are (Operation 1) to (Operation 4) below. Through them, the graph element operation unit 601 deletes branches, adds new branches between nodes, and adds new nodes representing superordinate concepts.

(Operation 1) Conceptually similar nodes are displayed near one another to help the user form ideas. For example, the dependency item {response-bad} and the word items "response" and "bad" are displayed close together.

(Operation 2) In a graph display, what matters is whether a relation exists between items; differences in confidence are less important.

Therefore, when two association rules such as
(R2-11) {response} ⇒ {apply (da)}, confidence: 0.75
(R2-16) {response-bad} ⇒ {apply (da)}, confidence: 0.67
differ only in whether an item in the condition part is a word item or a dependency item containing that word, the association rule with the word item is not displayed. In this example, (R2-11) is not displayed.

The same applies when two association rules differ only in whether an item in the conclusion part is a word item or a dependency item containing that word.

(Operation 3) When the dependency item "response-bad" is displayed, the association rules {response} ⇒ {bad} and {bad} ⇒ {response} are not displayed.

(Operation 4) Verbs, adjectives, and adjectival verbs that differ only in intention information such as "possible" or "negation" are also displayed near one another. Examples are the items "accident-encounter", "encounter (ta)", and "accident-encounter (ta)" in FIG. 13.

In (Operation 1) and (Operation 4), items are brought close together by adding branches between the items related by these operations, or by adding a new node that represents a superordinate concept and setting branches between it and those items.

The graph display unit 602 creates the graph according to the results produced by the graph element operation unit 601.

(B-2) Operation of the Second Embodiment
Next, the operation of the second embodiment will be described with reference to the drawings.

FIGS. 18 and 19 are flowcharts showing the display processing performed by the display unit 6 of the second embodiment. In the following, the association rules output by the display unit 6 are assumed to be the association rules shown in FIG. 15 that are composed of item sets of size 1.

When the association rule extraction unit 5 passes the association rules to the display unit 6, the graph element operation unit 601 of the display unit 6 treats each item in the condition part and the conclusion part of each association rule as a node of the graph and stores each item in the graph node temporary storage unit 603 (step 2000).

FIG. 20 shows an example of the contents of the graph node temporary storage unit 603. As shown in FIG. 20, the graph node temporary storage unit 603 consists of the fields "node", "number of occurrences", and "corresponding document ID".

The "node" field stores every item that appears in the condition part or the conclusion part of an association rule. The "number of occurrences" and "corresponding document ID" fields are obtained from the size-1 entries of the item set temporary storage unit 505 shown in FIG. 10 and stored.

In addition, the graph element operation unit 601 treats the relation between a condition part and a conclusion part as a branch connecting nodes of the graph and stores it in the graph branch temporary storage unit 604 (step 2000).

FIG. 21 shows an example of the contents of the graph branch temporary storage unit 604. As shown in FIG. 21, the graph branch temporary storage unit 604 consists of the fields "node 1", "node 2", "type", "number of occurrences", and "corresponding document ID".

The "node 1" and "node 2" fields hold the two nodes of the branch. The "type" field holds the kind of branch, that is, whether the branch was created from an association rule or from (Operation 1) or (Operation 4). The "number of occurrences" and "corresponding document ID" fields are obtained from the size-2 entries of the item set temporary storage unit 505 shown in FIG. 11 and stored.

The graph element operation unit 601 takes a node out of the graph node temporary storage unit 603 and determines whether that node is a dependency item or a word item (step 2020).

If the selected node is a dependency item, the process proceeds to step 2030; if it is a word item, the process proceeds to step 2070.

When the selected node is a dependency item, the graph element operation unit 601 determines whether the source word or the destination word of that node exists as a word item in the graph node temporary storage unit 603. If it does, a branch is set between the two nodes formed by the dependency item and the source word or destination word (step 2030).

FIG. 22 shows the contents of the graph branch temporary storage unit 604 after the update in step 2030. For each branch added here, "node 1" is the word item, "node 2" is the dependency item, and the "branch type" field is set to "dependency". The "number of occurrences" and "corresponding document ID" fields store the values of the word item, obtained from the size-1 entries of the item set temporary storage unit 505 shown in FIG. 10.

Taking the dependency item {response-bad} in FIG. 22 as an example, once the graph element operation unit 601 has confirmed that the source word {response} and the destination word {bad} exist in the graph node temporary storage unit 603, it stores the source word {response} in "node 1", the dependency item {response-bad} in "node 2", and "dependency" in "branch type". The "number of occurrences" and "corresponding document ID" fields are filled with the values stored in the item set temporary storage unit 505.

If the word exists as a word item in the graph node temporary storage unit 603 and the branch could be set, the process proceeds to step 2050; if it does not exist as a word item, the process proceeds to step 2040.

In step 2040, if the source word or destination word of the node matches the source word or destination word of another dependency item in the graph node temporary storage unit 603, the matching word is added as a node and a branch is set between the two nodes.

For example, if {response-bad} and {response-slow} both exist in the graph node temporary storage unit 603, the word item "response" is newly added to the graph node temporary storage unit 603, and branches are set between "response" and "response-bad" and between "response" and "response-slow".
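Steps 2030 and 2040 can be sketched as follows. The node store is simplified to a set of items and the branch store to a set of (node 1, node 2, branch type) tuples, without the occurrence counts and document IDs; these simplifications and the function name are assumptions made for illustration only.

```python
def link_dependency_nodes(nodes, edges):
    """nodes: set of items; edges: set of (node1, node2, kind). Both are updated in place."""
    deps = sorted(n for n in nodes if isinstance(n, tuple))
    for dep in deps:
        present = [w for w in dep if w in nodes]
        if present:
            # Step 2030: a source or destination word already exists as a word item.
            for w in present:
                edges.add((w, dep, "dependency"))
        else:
            # Step 2040: look for a word shared with another dependency item.
            for other in deps:
                if other == dep:
                    continue
                for w in dep:
                    if w in other:
                        nodes.add(w)                       # add the shared word as a node
                        edges.add((w, dep, "dependency"))

nodes = {
    ("response", "bad"),
    ("response", "slow"),
    "specific account",
    ("specific account", "apply (da)"),
}
edges = set()
link_dependency_nodes(nodes, edges)
```

In this run the word "response" is added as a new node because it is shared by "response-bad" and "response-slow", and it ends up connected to both of them.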

In step 2050, (Operation 4) is performed. That is, if the source word or destination word of the node is a verb, adjective, or adjectival verb and contains intention information such as "possible" or "negation", the word item without the intention information is registered in the graph node temporary storage unit 603 and a branch is set between the node and that word item. If the word item is already registered in the graph node temporary storage unit 603, only the branch is stored in the graph branch temporary storage unit. The process then returns to step 2010 and is repeated.

FIG. 23 shows an example of the contents of the graph node temporary storage unit 603 after the update in step 2050.

An example of a node added here is the word "apply", obtained by removing the intention information from the destination word "apply (da)" of the dependency item "specific account-apply (da)" in the first row of FIG. 20. Its "number of occurrences" and "corresponding document ID" fields are set to the values of those fields for "specific account-apply (da)" in FIG. 20. If the word "apply" without the intention information is already registered, the number of occurrences and the corresponding document IDs are added to the values already registered.

FIG. 22 shows an example of the contents of the graph branch temporary storage unit 604 after the update in step 2050.

For each branch added here, "node 1" is the item without intention information, "node 2" is the node being processed, and the "branch type" field is set to "verb". The "number of occurrences" and "corresponding document ID" fields are set to the values of the node being processed.

In step 2060, (Operation 4) is performed. If the node is a verb, adjective, or adjectival verb and contains intention information such as "possible" or "negation", the word item without the intention information is registered in the graph node temporary storage unit and a branch is set between the node and that word item. If the word item is already registered in the graph node temporary storage unit, only the branch is stored in the graph branch temporary storage unit. The process then returns to step 2010.

FIG. 24 shows an example of the contents of the graph node temporary storage unit 603 after the update in step 2060.

Examples of nodes added here are the words "apply" and "open", obtained by removing the intention information from the word items "apply (da)" and "open (not)" in the second and third rows of FIG. 20. The "number of occurrences" and "corresponding document ID" fields of the word "open" are set to the values of those fields for "open (not)" in FIG. 20. Because the word "apply" is already registered, its "number of occurrences" and "corresponding document ID" values are added to the values already registered.
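The handling of intention information in steps 2050 and 2060 can be sketched as below. Several assumptions are made purely for illustration: intention information is taken to be a trailing parenthesised marker such as "(da)" or "(not)", the presence of that marker stands in for the part-of-speech check, each node stores a set of document IDs (so the number of occurrences is the size of that set), and the function names are hypothetical.

```python
import re

def strip_intention(word):
    # Assumption: intention information is a trailing "(...)" marker.
    return re.sub(r"\s*\([^)]*\)$", "", word)

def add_base_forms(nodes, edges):
    """nodes: dict item -> set of document IDs; edges: set of (node1, node2, kind)."""
    for item, docs in list(nodes.items()):
        words = item if isinstance(item, tuple) else (item,)
        for w in words:
            base = strip_intention(w)
            if base != w:
                # Register the base word, or merge into an already registered one.
                nodes.setdefault(base, set()).update(docs)
                edges.add((base, item, "verb"))

nodes = {
    ("specific account", "apply (da)"): {1, 3},
    "apply (da)": {1, 2, 3},
    "open (not)": {2},
}
edges = set()
add_base_forms(nodes, edges)
```

After the call, "apply" holds the merged document IDs of both "specific account-apply (da)" and "apply (da)", and "open" holds those of "open (not)".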

The graph element operation unit 601 performs the processing from step 2080 onward for each entry in the graph branch temporary storage unit 604, and proceeds to step 2120 once all entries in the graph branch temporary storage unit 604 have been processed (step 2070).

When the graph element operation unit 601 has selected a branch to process from the graph branch temporary storage unit 604, it proceeds to step 2090 if the "type" field of the selected branch is "rule"; otherwise it returns to step 2070 and repeats the processing (step 2080).

When the "type" field of the selected branch is "rule", the graph element operation unit 601 proceeds to step 2100 if "node 1" and "node 2" of that branch are both word items; otherwise it proceeds to step 2110 (step 2090).

When "node 1" and "node 2" of the branch being processed are both word items, the graph element operation unit 601 performs (Operation 3). That is, if the dependency relation "node 1 of the selected branch-node 2 of the selected branch" or "node 2 of the selected branch-node 1 of the selected branch" exists as a node of another branch, the graph element operation unit 601 deletes the branch being processed (step 2100).

For example, FIG. 22 shows the contents of the graph branch temporary storage unit 604 after the update in step 2100. In FIG. 22, the first and fourth rows have been deleted. The first row is deleted because its "node 1" and "node 2" are both word items and the dependency relation "response-bad" exists in "node 1" of the seventh row. The fourth row is likewise deleted because the dependency relation "specific account-apply (da)" exists in "node 2" of the second row.
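The deletion in step 2100 can be sketched as follows, again with the branch store reduced to (node 1, node 2, type) tuples and a hypothetical function name. The example rows are made up and do not reproduce FIG. 22.

```python
def drop_word_pair_edges(edges):
    """Remove rule branches between two words when a dependency item joining them is a node."""
    nodes_in_edges = {n for a, b, _ in edges for n in (a, b)}
    kept = []
    for a, b, kind in edges:
        both_words = not isinstance(a, tuple) and not isinstance(b, tuple)
        joined = (a, b) in nodes_in_edges or (b, a) in nodes_in_edges
        if kind == "rule" and both_words and joined:
            continue  # (Operation 3): the dependency item already shows this relation
        kept.append((a, b, kind))
    return kept

edges = [
    ("response", "bad", "rule"),
    (("response", "bad"), "apply (da)", "rule"),
    ("specific account", "open (not)", "rule"),
]
kept = drop_word_pair_edges(edges)
```

Only the first branch is removed, because "response-bad" appears as a node of the second branch; no dependency item joins "specific account" and "open (not)", so the third branch stays.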

When "node 1" and "node 2" of the branch being processed are not both word items, the graph element operation unit 601 performs (Operation 2). That is, if there is a branch that differs from the branch being processed only in whether the item in "node 1" or "node 2" is a word item or a dependency item containing that word, the graph element operation unit 601 deletes the node on the word-item side. The process then returns to step 2070 and is repeated.

For example, FIG. 22 shows the contents of the graph branch temporary storage unit 604 after the update in step 2110. In FIG. 22, the third and fifth rows have been deleted. The third and sixth rows differ only in node 1, which is "specific account" in the third row and "specific account-apply (da)" in the sixth row; node 2 is "response" in both. The third row, whose node 1 is a word item, is therefore deleted.

The fifth and eighth rows differ only in node 1, which is "response" in the fifth row and "response-bad" in the eighth row; node 2 is "apply (da)" in both. The fifth row, whose node 1 is a word item, is therefore deleted.
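The deletion in step 2110 can be sketched as below. Following the worked example, the sketch removes the whole row (branch) that has the word item; the tuple representation and the function name are assumptions for illustration, and the four rows are modelled on the third, fifth, sixth, and eighth rows described above.

```python
def drop_generalised_edges(edges):
    """Remove a rule branch when another rule branch differs only by using a
    dependency item that contains this branch's word item at the same end."""
    def generalises(word, other):
        return not isinstance(word, tuple) and isinstance(other, tuple) and word in other

    kept = []
    for a, b, kind in edges:
        redundant = kind == "rule" and any(
            k2 == "rule" and (
                (b == b2 and generalises(a, a2)) or
                (a == a2 and generalises(b, b2)))
            for a2, b2, k2 in edges)
        if not redundant:
            kept.append((a, b, kind))
    return kept

row3 = ("specific account", "response", "rule")
row5 = ("response", "apply (da)", "rule")
row6 = (("specific account", "apply (da)"), "response", "rule")
row8 = (("response", "bad"), "apply (da)", "rule")
kept = drop_generalised_edges([row3, row5, row6, row8])
```

The rows with a word item in node 1 (row3 and row5) are removed, and the rows with the corresponding dependency items remain.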

Once the graph element operation unit 601 has processed all branches, the graph display unit 602 displays the result in graph form based on the graph node temporary storage unit 603 and the graph branch temporary storage unit 604 (step 2120).

For the graph display performed by the graph display unit 602, a method such as the one described in Non-Patent Document 3 can be used.

As a result, a graph representation such as that shown in FIG. 25 is output. In FIG. 25, solid branches represent relations derived from association rules, and broken-line branches represent other relations. Compared with FIG. 17, there are fewer branches representing association rules, and each word item is positioned closer to the dependency items that contain it.

(B-3) Effects of the Second Embodiment
According to the second embodiment, semantically similar nodes are displayed near one another when the association rules are shown in graph form, which makes it easier for the user to find the more important rules.

In addition, when two association rules differ only slightly, only the more restrictive one is displayed, which helps prevent the user from overlooking other important association rules.

(C) Third Embodiment
Next, a third embodiment of the information analysis system, information analysis method, and information analysis program of the present invention is described with reference to the drawings.

An item set composed only of word items tends to have higher support and confidence than one containing dependency items, so many association rules composed only of word items are generated. When many such rules are output, the user is more likely to overlook important rules. The third embodiment therefore restricts the association rules so that rules composed only of word items are not output in large numbers.

Specifically, an association rule that includes a dependency item, such as {mother-apply (da)} ⇒ {specific account-apply (da)}, is displayed even when its support or confidence is relatively low, whereas an association rule composed only of word items, such as {mother} ⇒ {advertise}, is displayed only when its support or confidence is higher.

To this end, the third embodiment allows different support and confidence thresholds to be set for dependency items and for word items. This prevents a large number of association rules composed only of word items from being displayed.

(C-1) Configuration and Operation of the Third Embodiment
The configuration of the third embodiment corresponds to that of the first embodiment shown in FIG. 1, so the third embodiment is described below with reference to FIG. 1.

The third embodiment differs from the first in the thresholds it accepts. In the first embodiment, the input unit 1 takes in a "minimum support" and a "minimum confidence". In the third embodiment, it takes in a "minimum support for word items only", a "minimum support for other cases", a "minimum confidence for word items only", and a "minimum confidence for other cases".

In addition, the candidate item set calculation unit 503 of the third embodiment selects the item sets for which processing continues based on the "minimum support for word items only" and the "minimum support for other cases".

Furthermore, the rule creation unit 504 of the third embodiment filters the resulting association rules based on the "minimum confidence for word items only" and the "minimum confidence for other cases".

In FIG. 3, the third embodiment differs from the first in steps 100 and 150. The corresponding processing that is characteristic of the third embodiment is described below as steps 300 and 350, which replace them.

First, the input unit 1 receives data in the same manner as in the first embodiment of FIG. 3. At this time, the input unit 1 takes in the "minimum support for word items only", the "minimum support for other cases", the "minimum confidence for word items only", the "minimum confidence for other cases", and the "maximum size of item sets to be created", and passes them to the association rule extraction unit 5 (step 300).

For example, assume "minimum support for word items only" = 2, "minimum support for other cases" = 2, "minimum confidence for word items only" = 0.8, "minimum confidence for other cases" = 0.6, and "maximum size of item sets to be created" = 3.

In step 350, the association rule extraction unit 5 performs the association rule creation process shown in FIG. 26.

FIG. 26 shows the processing in the association rule extraction unit 5 of the third embodiment; steps that correspond to those in FIG. 3 are given the same reference numerals.

First, the generation of item sets of size 1 by the candidate item set generation unit 501 and the calculation of their occurrence counts by the candidate item set calculation unit 503 are the same as in the first embodiment.

For the word items among the item sets, the candidate item set deletion unit 502 compares each occurrence count with the "minimum support for word items only" received by the input unit 1, and deletes from the item set temporary storage unit 505 any word item whose occurrence count is below that threshold (step 3020).

Likewise, for the dependency items among the item sets, the candidate item set deletion unit 502 compares each occurrence count with the "minimum support for other cases", and deletes from the item set temporary storage unit 505 any dependency item whose occurrence count is below that threshold (step 3020).

The control unit 500 then sets the counter n to 2, the candidate item set generation unit 501 creates item sets of size 2, and, as in the first embodiment, the candidate item set deletion unit 502 deletes the item sets that include a subset (steps 1030 to 1060).

The candidate item set calculation unit 503 then calculates the support of each item set of size 2. For item sets of size 2 whose elements are all word items, it compares the support with the "minimum support for word items only" and deletes from the item set temporary storage unit 505 any item set whose support is below that threshold (step 3070).

For item sets of size 2 that have at least one dependency item as an element, the candidate item set calculation unit 503 compares the support with the "minimum support for other cases" and deletes from the item set temporary storage unit 505 any item set whose support is below that threshold (step 3070).
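The type-dependent pruning of step 3070 can be sketched as follows. This is a minimal illustration and not the patented implementation: the string encoding of items ("source-destination" for dependency items), the function names, and the occurrence counts are all assumptions made for the example.

```python
from typing import Dict, FrozenSet

ItemSet = FrozenSet[str]


def is_word_item(item: str) -> bool:
    # Assumption: a dependency item is written "source-destination",
    # so any item without a hyphen is treated as a word item.
    return "-" not in item


def min_support_for(item_set: ItemSet, word_only_min: int, other_min: int) -> int:
    # Item sets made up only of word items use the word-only threshold;
    # item sets with at least one dependency item use the other threshold.
    if all(is_word_item(item) for item in item_set):
        return word_only_min
    return other_min


def prune_by_support(
    counts: Dict[ItemSet, int], word_only_min: int, other_min: int
) -> Dict[ItemSet, int]:
    # Keep only the item sets whose occurrence count reaches their threshold.
    return {
        item_set: count
        for item_set, count in counts.items()
        if count >= min_support_for(item_set, word_only_min, other_min)
    }


# Hypothetical occurrence counts, not taken from the figures.
counts = {
    frozenset({"mother", "advertise (is)"}): 2,
    frozenset({"mother-apply (da)", "specific account-apply (da)"}): 2,
    frozenset({"bad", "apply (da)"}): 3,
}
kept = prune_by_support(counts, word_only_min=3, other_min=2)
```

With these hypothetical values, the word-only item set that occurs twice is dropped, while the item set containing dependency items survives with the same count because it is checked against the lower threshold.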

If the counter n equals the "maximum size of item sets to be created" received by the input unit 1, or if every item set fell below the minimum support, the control unit 500 proceeds to step 1100. Otherwise, it proceeds to step 1090, adds 1 to the counter n, and returns to step 1050 to repeat the processing (step 3080).

Then, as in the first embodiment, every item set stored in the item set temporary storage unit 505 is divided into a condition part and a conclusion part, and the confidence is calculated for every combination of condition part and conclusion part of each item set (steps 1100 to 1130).

When the condition part and the conclusion part of a combination consist entirely of word items, its confidence is compared with the "minimum confidence for word items only", and the combination is deleted if its confidence is below that threshold (step 3140). That is, it is not adopted as an association rule.

When either the condition part or the conclusion part contains at least one dependency item, the confidence of the combination is compared with the "minimum confidence for other cases", and the combination is deleted if its confidence is below that threshold (step 3140).
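The rule filtering of step 3140 can be sketched in the same spirit. The sketch assumes the standard definition of confidence, support(condition ∪ conclusion) / support(condition), since this section does not restate the formula; the item encoding, function names, and support counts are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

ItemSet = FrozenSet[str]
Rule = Tuple[ItemSet, ItemSet]  # (condition part, conclusion part)


def is_word_item(item: str) -> bool:
    # Assumption: dependency items are written "source-destination".
    return "-" not in item


def filter_rules(
    rules: List[Rule],
    support: Dict[ItemSet, int],
    word_only_min_conf: float,
    other_min_conf: float,
) -> List[Tuple[ItemSet, ItemSet, float]]:
    kept = []
    for condition, conclusion in rules:
        # Assumed standard confidence: support(X ∪ Y) / support(X).
        confidence = support[condition | conclusion] / support[condition]
        word_only = all(is_word_item(i) for i in condition | conclusion)
        threshold = word_only_min_conf if word_only else other_min_conf
        if confidence >= threshold:
            kept.append((condition, conclusion, confidence))
    return kept


# Hypothetical support counts, not taken from the figures.
support = {
    frozenset({"mother"}): 3,
    frozenset({"mother", "advertise (is)"}): 2,
    frozenset({"mother-apply (da)"}): 3,
    frozenset({"mother-apply (da)", "specific account-apply (da)"}): 2,
}
rules = [
    (frozenset({"mother"}), frozenset({"advertise (is)"})),
    (frozenset({"mother-apply (da)"}), frozenset({"specific account-apply (da)"})),
]
kept = filter_rules(rules, support, word_only_min_conf=0.8, other_min_conf=0.6)
```

Both hypothetical rules have a confidence of 2/3. The word-only rule falls below 0.8 and is discarded, while the rule built from dependency items clears 0.6 and is kept.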

The process then returns to step 1120 or step 1100. When all combinations of all item sets have been processed, duplicate association rules are screened out and the process ends (step 3150).

FIG. 27 shows the association rules that can be formed from item sets of size 2. In FIG. 27, the rules shown in strikethrough are those deleted relative to the first embodiment. Because the "minimum confidence for word items only" is set high (0.8), association rules composed of word items have been deleted.

(C-2) Effects of the Third Embodiment
As described above, the third embodiment provides the same effects as the first embodiment.

In addition, by allowing different support and confidence thresholds to be set for dependency items and word items, the third embodiment can suppress the output of association rules composed only of word items, which tend to have high support and to appear in large numbers. This helps prevent other important association rules from being overlooked.

(D) Fourth Embodiment
Next, a fourth embodiment of the information analysis system, information analysis method, and information analysis program of the present invention is described with reference to the drawings.

In the first embodiment, because word items are created in addition to dependency items, item sets with little meaning may be generated.

One example is the item set {mother, advertise (is)}, which combines the word items "mother" and "advertise (is)".

By contrast, the item set {bad, apply (da)}, which combines the word items "bad" and "apply (da)", invites the inference that something bad may have happened at the time of application, so it is not necessarily an erroneous item set.

The fourth embodiment therefore aims to avoid, as far as possible, generating semantically erroneous item sets that arise from combining word items of which one is a noun and the other is a predicate (a verb, adjective, or adjectival verb).

To that end, an item set composed only of words, one a noun and the other a predicate, is created only on the precondition that at least one dependency relationship exists between those words. For example, the item set {mother, advertise (is)}, which combines the word items "mother" and "advertise (is)", is created only if the original input data contains the dependency relationship "mother-advertise (is)".

(D-1) Configuration and Operation of the Fourth Embodiment
The configuration of the fourth embodiment corresponds to that of the first embodiment shown in FIG. 1. The fourth embodiment differs from the first in the association rule extraction processing performed by the association rule extraction unit 5.

The association rule extraction processing of the association rule extraction unit 5 in the fourth embodiment is therefore described below with reference to the flowchart of FIG. 28.

In FIG. 28, the fourth embodiment differs from the first in that step 4055, described below, is performed before step 1060 of the first embodiment; all other processing is the same as in the first embodiment.

In step 4055, the candidate item set deletion unit 502 performs the following check only while item sets of size 2 are being created.

For each item set {A, B} of size 2 whose elements A and B are both words, one being a noun and the other a predicate, the candidate item set deletion unit 502 refers to the dependency relationships produced by the syntax analysis unit 3, such as those shown in FIG. 6. If neither "A-B" nor "B-A" exists as a dependency relationship, it deletes the item set {A, B}.

For example, suppose the "minimum support" is 1. When item sets of size 2 are created from the data in FIG. 10, the item set {mother, advertise (is)} is deleted from the size-2 item sets in the item set temporary storage unit 505, because FIG. 6 contains no dependency relationship "mother-advertise" or "advertise-mother".

This check disregards the intention information of the predicate. For example, if "mother-advertise (not)" or "mother-advertise (ta)" exists, the condition is satisfied.

By contrast, the item set {specific account, open (not)} is not deleted from the size-2 item sets in the item set temporary storage unit 505, because the eighth line of FIG. 6 contains the dependency relationship "specific account-open (not)".
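The check performed in step 4055 can be sketched as below. This is an illustrative sketch under stated assumptions: words carry their intention information as a trailing parenthetical such as "(not)", parts of speech are plain strings, and the dependency list is a hypothetical stand-in for the contents of FIG. 6. The names `strip_intention` and `keep_pair` are invented for the example.

```python
import re
from typing import Iterable, Tuple

# Parts of speech treated as predicates (verbs, adjectives, adjectival verbs).
PREDICATE_POS = {"verb", "adjective", "adjectival verb"}


def strip_intention(word: str) -> str:
    # Assumption: intention information is a trailing "(...)" suffix.
    return re.sub(r"\s*\([^)]*\)$", "", word)


def keep_pair(
    a: str, a_pos: str, b: str, b_pos: str,
    dependencies: Iterable[Tuple[str, str]],
) -> bool:
    """Return True if the size-2 word item set {a, b} should be kept."""
    noun_and_predicate = (
        (a_pos == "noun" and b_pos in PREDICATE_POS)
        or (b_pos == "noun" and a_pos in PREDICATE_POS)
    )
    if not noun_and_predicate:
        return True  # the check applies only to noun/predicate pairs
    # Compare without intention information, in either direction.
    bare = {(strip_intention(s), strip_intention(d)) for s, d in dependencies}
    a0, b0 = strip_intention(a), strip_intention(b)
    return (a0, b0) in bare or (b0, a0) in bare


# Hypothetical dependency relationships (source, destination).
deps = [("specific account", "open (not)"), ("mother", "apply (da)")]

print(keep_pair("mother", "noun", "advertise (is)", "verb", deps))
print(keep_pair("specific account", "noun", "open (not)", "verb", deps))
```

Stripping the intention information before comparing is what lets a pair such as {specific account, open (ta)} pass when only "specific account-open (not)" appears in the dependency data.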

(D-2) Effects of the Fourth Embodiment
As described above, the fourth embodiment provides the same effects as the first embodiment.

In addition, by referring to dependency relationships, the fourth embodiment avoids, as far as possible, generating semantically erroneous item sets that arise from combining word items of which one is a noun and the other is a predicate (a verb, adjective, or adjectival verb).

(E) Other Embodiments
(1) In the first embodiment, intention information indicating negation, continuation, and the like is attached to predicates such as verbs and adjectives in FIGS. 7 and 8, but this information need not be set.

(2) A concept hierarchy may be used when creating the items of the first embodiment. For example, when "account" and "specific account" are in a superordinate/subordinate concept relationship, the following items may be added to data ID 3 in FIG. 8.

"account-apply (da)", "account-open (not)", "account"
(3) In the first embodiment, the minimum support and the minimum confidence are input, but they may instead be fixed values in the system. Besides the minimum support and the minimum confidence, a list value or the like may also be accepted as input. Alternatively, a maximum support and a maximum confidence may be accepted as input to define an upper limit on the association rules that are output.

(4) The word items registered in step 140 of the first embodiment may be limited to words that appear neither as the source word nor as the destination word of any dependency item. They may also be restricted to verbs and adjectives only, or to nouns only.

(5) When the items of each data record are created in FIGS. 7 and 8 of the first embodiment, attribute data of the record may be added. For example, if the input data comes from a call center, the age and gender of the calling customer may be used as items.

(6) In the second embodiment, the branches form an undirected graph, but they may instead form a directed graph from the condition part to the conclusion part.

(7) In the second embodiment, among the data that the graph element operation unit 601 adds to the graph branch temporary storage unit, branches whose type is other than "rule" may be used only to calculate the attractive force between nodes and need not be displayed.

(8) In the second embodiment, a concept hierarchy may be used when the graph element operation unit 601 adds a new branch. For example, when "account" and "specific account" are in a superordinate/subordinate concept relationship, the following data may be added to FIG. 22:
Node 1 item: "account"
Node 2 item: "specific account"
Type: "concept"
Number of occurrences: the number of occurrences of the word item "account" and of the word items for its subordinate concepts
Applicable document IDs: the documents in which the word item "account" and the word items for its subordinate concepts appear

(9) When the minimum support is calculated, a weighted support that uses, for example, the number of occurrences within the same data may be used.

(10) In the first to fourth embodiments described above, the system according to the present invention is described as being implemented by a single device, but it may instead be implemented through distributed processing by a plurality of separate devices that can be connected to one another.

(11) The system described in the first to fourth embodiments is a function implemented by an information processing apparatus such as a personal computer or a workstation, but in substance it is a processing program executable by the information processing apparatus. The processing program may, for example, be stored on a computer-readable storage medium, stored on a hard disk, or transmitted over a network.

(12) The arrangement of the components and the order of the processing flow of the system according to the present invention are not limited to those described in the first to fourth embodiments.
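As an illustration of variation (2) above, the following sketch adds superordinate-concept items to a record's item list. It is only one possible reading: the "source-destination" string encoding, the `hypernym` mapping, the function name, and the assumed contents of data ID 3 are all hypothetical.

```python
from typing import Dict, List


def add_superordinate_items(items: List[str], hypernym: Dict[str, str]) -> List[str]:
    """Append copies of items with each word replaced by its superordinate concept."""
    result = list(items)
    for item in items:
        # Assumption: a dependency item is written "source-destination";
        # a word item has a single part.
        parts = item.split("-")
        lifted = "-".join(hypernym.get(part, part) for part in parts)
        if lifted != item and lifted not in result:
            result.append(lifted)
    return result


# Assumed items for one data record; the mapping encodes
# "specific account" as a subordinate concept of "account".
items = ["specific account-apply (da)", "specific account-open (not)", "specific account"]
hypernym = {"specific account": "account"}

expanded = add_superordinate_items(items, hypernym)
for item in expanded:
    print(item)
```

The original items are kept and the generalized ones are appended, so rules can be mined at either level of the concept hierarchy.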

FIG. 1 is a block diagram showing the internal configuration of the data analysis device of the first embodiment.
FIG. 2 is an explanatory diagram showing an example of a dependency relationship.
FIG. 3 is a flowchart showing the operation of the data analysis device of the first embodiment.
FIG. 4 is a diagram showing an example of input data of the first embodiment.
FIG. 5 is a diagram showing the result of the morphological analysis processing of the first embodiment.
FIG. 6 is a diagram showing the result of the syntax analysis processing of the first embodiment.
FIG. 7 is a diagram showing the result of item creation based on dependency relationships in the first embodiment.
FIG. 8 is a diagram showing the result of item creation based on morphemes in the first embodiment.
FIG. 9 is a flowchart showing the association rule extraction processing of the first embodiment.
FIG. 10 is a diagram showing an example of the contents stored in the item set temporary storage unit of the first embodiment.
FIG. 11 is a diagram showing an example of the contents stored in the item set temporary storage unit of the first embodiment.
FIG. 12 is a diagram showing the association rules formed from the item sets of the first embodiment.
FIG. 13 is a diagram showing an example of the contents stored in the item set temporary storage unit of the first embodiment.
FIG. 14 is a diagram showing the association rules formed from the item sets of the first embodiment.
FIG. 15 is a diagram showing the association rules extracted in the first embodiment.
FIG. 16 is a block diagram showing the internal configuration of the data analysis device of the second embodiment.
FIG. 17 is a diagram showing an example of the graph display of association rules in the second embodiment.
FIG. 18 is a flowchart showing the operation of the display unit 6 of the second embodiment.
FIG. 19 is a flowchart showing the operation of the display unit 6 of the second embodiment.
FIG. 20 is a diagram showing an example of the contents stored in the graph node temporary storage unit of the second embodiment.
FIG. 21 is a diagram showing an example of the contents stored in the graph branch temporary storage unit of the second embodiment.
FIG. 22 is a diagram showing an example of the updated contents of the graph branch temporary storage unit of the second embodiment.
FIG. 23 is a diagram showing an example of the updated contents of the graph node temporary storage unit of the second embodiment.
FIG. 24 is a diagram showing an example of the updated contents of the graph branch temporary storage unit of the second embodiment.
FIG. 25 is a diagram showing an example of the graph display of association rules in the second embodiment.
FIG. 26 is a flowchart showing the association rule extraction processing of the third embodiment.
FIG. 27 is a diagram showing the association rules formed from the item sets of the third embodiment.
FIG. 28 is a flowchart showing the association rule extraction processing of the fourth embodiment.

Explanation of symbols

1 ... input unit, 2 ... morphological analysis unit, 3 ... syntax analysis unit, 4 ... item generation unit, 5 ... association rule extraction unit, 500 ... control unit, 501 ... candidate item set generation unit, 502 ... candidate item set deletion unit, 503 ... candidate item set calculation unit, 504 ... rule creation unit, 505 ... item set temporary storage unit, 6 ... display unit, 601 ... graph element operation unit, 602 ... graph display unit, 603 ... graph node temporary storage unit, 604 ... graph branch temporary storage unit, 7A and 7B ... data analysis device.

Claims

1. An information analysis system that creates correlation rules based on the components of a plurality of pieces of input text information and outputs useful correlation rules, the system comprising:
morphological analysis means for performing morphological analysis on each piece of text information;
syntax analysis means for performing syntax analysis on each piece of text information;
item creation means for creating the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
item set creation means for creating one or more item sets using one or more of the items created by the item creation means;
item set deletion means for comparing the item sets created by the item set creation means with one another and deleting any item set whose elements include items in a semantically inclusive relationship;
item set calculation means for calculating the co-occurrence frequency of each item set;
correlation rule creation means for creating one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
display means for displaying each correlation rule created by the correlation rule creation means.

2. The information analysis system according to claim 1, wherein the item set deletion means deletes a word item that contains the same character string as the dependency source character string or the dependency destination character string of a dependency item.

3. The information analysis system according to claim 1 or 2, wherein the correlation rule creation means divides each created correlation rule into a condition part and a conclusion part, and deletes a correlation rule with fewer restrictions based on the relationship between the divided condition part and conclusion part and on the confidence with which the condition part and the conclusion part co-occur.

4. The information analysis system according to any one of claims 1 to 3, wherein the display means displays each item as a node and displays the relationship between the condition part and the conclusion part of each correlation rule as a graph, with the nodes of items having similar concepts displayed close to one another.

5. The information analysis system described above, wherein the correlation rule creation means selects the useful correlation rules using usefulness determination information for judging the usefulness of each created correlation rule.

6. The information analysis system according to claim 5, wherein the usefulness determination information is defined according to a type of the item.

7. The information analysis system according to claim 1, wherein the item set creation means creates an item set in which a dependency relationship exists between the word items constituting the item set.

8. An information analysis method for creating correlation rules based on the components of a plurality of pieces of input text information and outputting useful correlation rules, the method comprising:
a morphological analysis step in which morphological analysis means performs morphological analysis on each piece of text information;
a syntax analysis step in which syntax analysis means performs syntax analysis on each piece of text information;
an item creation step in which item creation means creates the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
an item set creation step in which item set creation means creates one or more item sets using one or more of the items created by the item creation means;
an item set deletion step in which item set deletion means compares the item sets created by the item set creation means with one another and deletes any item set whose elements include items in a semantically inclusive relationship;
an item set calculation step in which item set calculation means calculates the co-occurrence frequency of each item set;
a correlation rule creation step in which correlation rule creation means creates one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
a display step in which display means displays each correlation rule created by the correlation rule creation means.

9. An information analysis program that creates correlation rules based on the components of a plurality of pieces of input text information and outputs useful correlation rules,
the program causing a computer to function as:
morphological analysis means for performing morphological analysis on each piece of text information;
syntax analysis means for performing syntax analysis on each piece of text information;
item creation means for creating the morphological analysis result and/or the syntax analysis result of each piece of text information as items to be analyzed for correlation rules;
item set creation means for creating one or more item sets using one or more of the items created by the item creation means;
item set deletion means for comparing the item sets created by the item set creation means with one another and deleting any item set whose elements include items in a semantically inclusive relationship;
item set calculation means for calculating the co-occurrence frequency of each item set;
correlation rule creation means for creating one or more correlation rules based on the co-occurrence frequency of each item set calculated by the item set calculation means; and
display means for displaying each correlation rule created by the correlation rule creation means.