JP4815934B2

JP4815934B2 - Text mining device, text mining method, text mining program

Info

Publication number: JP4815934B2
Application number: JP2005223971A
Authority: JP
Inventors: 崇博池田; 聡中澤; 要祐坂尾; 研治佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2005-08-02
Filing date: 2005-08-02
Publication date: 2011-11-16
Anticipated expiration: 2025-08-02
Also published as: JP2007041767A

Description

The present invention relates to text mining, and more particularly to a text mining device, a text mining method, and a text mining program that can perform mining while treating synonymous expressions with different dependency structure trees as identical.

A text mining device is designed to extract patterns of words that appear frequently in text, in order to extract useful knowledge from a large volume of text. An example of a conventional text mining device is described in Patent Document 1.
The text mining device described in Patent Document 1 has a language analysis device that analyzes the syntactic structure of each sentence in the text and builds a syntax tree, and a pattern extraction device that finds frequent patterns in the syntax trees. It extracts syntactic patterns of words that appear frequently in the text.

Meanwhile, Non-Patent Document 1 describes an example of a system that transforms a dependency structure tree into another dependency structure tree corresponding to a synonymous expression. The dependency structure processing system described in Non-Patent Document 1 refers to predefined conversion rules (a matching pattern and a rewriting pattern) and converts a dependency structure tree that fits the matching pattern into another dependency structure tree according to the rewriting pattern.
A conversion rule uses variables to describe how nodes correspond between the dependency structure tree before conversion and the one after conversion. Non-Patent Document 1 shows, for example, a conversion rule that converts the dependency structure tree for the expression 「Ｎ１がＮ２にＶさせられる」 ("N1 is made to V by N2") into the dependency structure tree for the synonymous expression 「Ｎ２がＮ１をＶする」 ("N2 Vs N1"). In this example, N1 and N2 are variables that stand for nouns, and V is a variable that stands for a verb.

Patent Document 1: JP 2001-84250 A (特開２００１−８４２５０号公報)
Non-Patent Document 1: Iwakura (岩倉友哉) and four others, "General-Purpose Dependency Structure Processing Module KURALANG" (汎用依存構造処理モジュールＫＵＲＡＬＡＮＧ), Proceedings of the 9th Annual Meeting of the Association for Natural Language Processing (言語処理学会), March 18, 2003, pp. 687-690

However, conventional text mining devices have the problem that they cannot perform mining while treating the dependency structure trees of synonymous expressions as identical.

A conventional text mining device cannot perform mining while treating as identical those synonymous expressions whose dependency structure trees differ, that is, expressions that convey the same meaning but whose dependency structure trees are different (the words used, or the dependency relations between those words, differ). The reason is that conventional text mining devices give no consideration to synonymous expressions with different dependency structure trees.
As a result, when several expressions correspond to the same meaning, the device must decide separately for each expression whether to extract it, and it may fail to extract characteristic semantic content.

For example, expression EX1 「表示する文字を小さくする」 ("make the displayed characters smaller") and expression EX2 「小さな文字で表示する」 ("display in small characters") both express the same meaning, "reduce the size of the characters used for display", and are therefore synonymous. However, the dependency structure tree DT-EX1 for EX1, shown in FIG. 1(a), differs from the dependency structure tree DT-EX2 for EX2, shown in FIG. 1(b).
As one example, the dependency structure trees shown here take the following form: each bunsetsu (phrase) is a node, the label of a node is the content word of the bunsetsu converted to its dictionary form, and the dependency relations between bunsetsu are the branches. The same applies to the examples that follow.

Because a conventional text mining device cannot treat these expressions as identical, suppose for example that the meaning "reduce the size of the characters used for display" is stated using expression EX1 in 23 texts and using expression EX2 in 34 texts. The device cannot recognize that this meaning is stated in 57 texts in total. As a result, when mining is set up to extract expressions that appear in 50 or more texts, for example, this meaning cannot be extracted as characteristic semantic content even though it is stated in 57 texts.
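The counting problem in this example can be reproduced in a few lines. The following is a minimal sketch, not part of the patent: the names `counts`, `group_of`, `frequent`, and `merge_by_group`, and the group label `G1`, are hypothetical, and it assumes that each text uses only one of the two expressions.

```python
from collections import Counter

# Per-expression text counts from the example (23 texts use EX1, 34 use EX2).
counts = Counter({"EX1": 23, "EX2": 34})
# Hypothetical synonym dictionary: expression -> synonym group label.
group_of = {"EX1": "G1", "EX2": "G1"}
THRESHOLD = 50


def frequent(table, threshold):
    """Return the keys whose count reaches the threshold."""
    return sorted(key for key, n in table.items() if n >= threshold)


def merge_by_group(table, groups):
    """Add up the counts of expressions that share a synonym group."""
    merged = Counter()
    for expression, n in table.items():
        merged[groups.get(expression, expression)] += n
    return merged


separate = frequent(counts, THRESHOLD)             # counted per expression
merged = merge_by_group(counts, group_of)          # counted per synonym group
together = frequent(merged, THRESHOLD)
```

Counted separately, neither expression reaches the threshold; counted as one group, the shared meaning does.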

Even if a conventional text mining device were to convert dependency structure trees in advance so that synonymous expressions are unified into a single expression, and then mine the converted trees, preparing the conversion rules between dependency structure trees in advance takes considerable effort. The reason is that a conversion rule must state explicitly, for every node that was attached to a node of the tree before conversion, which node of the converted tree it should be reattached to.

For example, a rule that converts the dependency structure tree DT-EX1 for expression EX1 (FIG. 1(a)) into the dependency structure tree DT-EX2 for expression EX2 (FIG. 1(b)) must state explicitly, among other things, that the nodes attached to node 「表示する」 ("display") and the nodes attached to node 「する」 ("do") in DT-EX1 are both reattached to node 「表示する」 in DT-EX2, and that the nodes attached to node 「小さい」 ("small") in DT-EX1 are reattached to node 「小さな」 ("small") in DT-EX2. Otherwise, consider the dependency structure tree DT-S1 (FIG. 2) for sentence S1, 「メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた」 ("I looked up on the web how to make the characters used to display mail as small as possible"). When the part of DT-S1 that matches DT-EX1 (PT1 in FIG. 3) is converted into DT-EX2 (FIG. 1(b)), it cannot be decided which nodes of the converted tree the nodes 「メール」 ("mail"), 「できるだけ」 ("as much as possible"), and 「調べる」 ("look up") should be attached to, so the conversion cannot be carried out.

Even if a conventional text mining device were to convert dependency structure trees in advance so that synonymous expressions are unified into a single expression and then mine the converted trees, this approach fails when the expressions to be treated as synonymous do not have exactly the same meaning and their nodes cannot be put into correspondence across the dependency structure trees. In that case, the trees for those expressions cannot be unified in advance for mining. The reason is that conversion between dependency structure trees is impossible unless it is clear which node of the converted tree each node attached to the original tree should be reattached to.

For example, the dependency structure tree DT-EX3 for expression EX3 「表示する行数を増やす」 ("increase the number of displayed lines") is as shown in FIG. 4(a). The nodes of DT-EX1 for expression EX1 「表示する文字を小さくする」 (FIG. 1(a)) and the nodes of DT-EX3 (FIG. 4(a)) cannot be put into complete correspondence, so even when one wants to treat EX1 and EX3 as synonymous, the corresponding dependency structure trees cannot be converted into each other.
Indeed, take the dependency structure tree DT-S2 (FIG. 4(b)) for sentence S2 「メールを表示する行数を２倍に増やす」 ("double the number of lines used to display mail"). If one tries to convert the part that matches DT-EX3 (FIG. 4(a)) into DT-EX1 (FIG. 1(a)), expression EX1 contains nothing that the bunsetsu 「２倍に」 ("twofold") in S2 could depend on, so the node 「２倍」 in DT-S2 cannot be attached to any node of the converted tree.

If a conventional text mining device converts dependency structure trees in advance so that synonymous expressions are unified into a single expression and then mines the converted trees, expressions that are not actually characteristic may be extracted by mistake, and expressions that are actually characteristic may be missed. The reason is that the conversion has side effects: it creates nodes that did not originally exist and deletes nodes that did.

For example, converting the dependency structure tree DT-EX2 for expression EX2 「小さな文字で表示する」 (FIG. 1(b)) into the dependency structure tree DT-EX1 for expression EX1 「表示する文字を小さくする」 (FIG. 1(a)) newly creates the nodes 「小さい」 ("small") and 「する」 ("do"), which did not originally exist.
Suppose that, under this conversion, expression EX1 is used in 21 texts, expression EX2 is used in 18 texts, and a further 16 texts use expressions that differ from both EX1 and EX2 but contain 「小さくする」 ("make smaller"), such as 「表示する画像のサイズを小さくする」 ("reduce the size of the displayed image"). After conversion, the subtree made up of the nodes 「小さい」 and 「する」 appears 55 times in total across all dependency structure trees.
As a result, when mining is set up to extract subtrees that appear 50 or more times, for example, the subtree made up of the nodes 「小さい」 and 「する」 is extracted as a characteristic subtree even though it originally appeared only 37 times.

Conversely, converting the dependency structure tree DT-EX1 for expression EX1 「表示する文字を小さくする」 (FIG. 1(a)) into the dependency structure tree DT-EX2 for expression EX2 「小さな文字で表示する」 (FIG. 1(b)) deletes the nodes 「小さい」 ("small") and 「する」 ("do"), which originally existed.
Suppose that, under this conversion, expression EX1 is used in 34 texts, expression EX2 is used in 13 texts, and a further 19 texts use expressions that differ from both EX1 and EX2 but contain 「小さくする」 ("make smaller"), such as 「表示する画像のサイズを小さくする」 ("reduce the size of the displayed image"). After conversion, the subtree made up of the nodes 「小さい」 and 「する」 appears only 19 times in total across all dependency structure trees.
As a result, when mining is set up to extract subtrees that appear 50 or more times, for example, the subtree made up of the nodes 「小さい」 and 「する」 is no longer extracted as a characteristic subtree even though it originally appeared 53 times.
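The figures in the two side-effect examples can be checked with simple arithmetic. The sketch below is only an illustration: the function name and the `rewrite` labels are hypothetical, and it assumes each text contains exactly one of the three kinds of expression (EX1, EX2, or another expression containing 「小さくする」).

```python
def small_do_subtree_count(ex1, ex2, other, rewrite):
    """Count texts whose tree contains the subtree made of the nodes
    'small' and 'do', given how many texts use EX1, EX2, or another
    expression that contains 'make smaller'.

    rewrite: "none"     -> trees are mined as they are
             "EX2->EX1" -> every EX2 tree is rewritten into the EX1 tree
             "EX1->EX2" -> every EX1 tree is rewritten into the EX2 tree
    """
    if rewrite == "none":
        return ex1 + other          # EX2 trees have no 'do' node
    if rewrite == "EX2->EX1":
        return ex1 + ex2 + other    # rewritten EX2 trees gain the subtree
    if rewrite == "EX1->EX2":
        return other                # rewritten EX1 trees lose the subtree
    raise ValueError(f"unknown rewrite: {rewrite}")


# First example: 21 EX1 texts, 18 EX2 texts, 16 other texts.
inflated_before = small_do_subtree_count(21, 18, 16, "none")
inflated_after = small_do_subtree_count(21, 18, 16, "EX2->EX1")

# Second example: 34 EX1 texts, 13 EX2 texts, 19 other texts.
deflated_before = small_do_subtree_count(34, 13, 19, "none")
deflated_after = small_do_subtree_count(34, 13, 19, "EX1->EX2")
```

With a threshold of 50, the first rewrite pushes the count from below the threshold to above it, and the second pulls it from above to below.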

Accordingly, an object of the present invention is to provide a text mining device, a text mining method, and a text mining program that can perform mining while treating synonymous expressions with different dependency structure trees as identical.

In the text mining device according to the present invention, synonymous expression identification means identifies whether the dependency structure tree of a sentence subject to text mining (target sentence dependency structure tree) contains a subtree (matching subtree) that matches the dependency structure tree of an expression registered in a synonymous expression dictionary (synonymous expression dependency structure tree).
Node replacement means replaces the matching subtree with a special node (synonymous expression node) that indicates the group to which the synonymous expression belongs, and characteristic subtree extraction means extracts characteristic subtrees from the target sentence dependency structure tree after the replacement (claim 1).
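The replacement step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: a tree is represented as a list of (modifier, head) pairs, node labels are English glosses assumed to be unique within a sentence, the matched node sets are given by hand rather than computed, and `replace_subtree` and the label `#G1` are hypothetical names. The sample trees are assumed shapes and do not reproduce the figures.

```python
def replace_subtree(edges, matched, syn_label):
    """Collapse the nodes in `matched` into one node labelled `syn_label`.

    `edges` is a list of (modifier, head) pairs of a dependency tree.
    """
    replaced = []
    for modifier, head in edges:
        m = syn_label if modifier in matched else modifier
        h = syn_label if head in matched else head
        if m != h:  # branches lying wholly inside the matched subtree vanish
            replaced.append((m, h))
    return replaced


# Assumed tree of a sentence that uses EX1 ("make the displayed characters smaller").
tree_ex1 = [("mail", "display"), ("display", "characters"),
            ("characters", "do"), ("small", "do"), ("do", "method")]
# Assumed tree of a sentence that uses EX2 ("display in small characters").
tree_ex2 = [("small", "characters"), ("characters", "display"),
            ("mail", "display"), ("display", "method")]

out1 = replace_subtree(tree_ex1, {"display", "characters", "small", "do"}, "#G1")
out2 = replace_subtree(tree_ex2, {"small", "characters", "display"}, "#G1")
```

After replacement both sentences yield the same tree, so an ordinary frequent-subtree miner counts them together.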

According to the above text mining device, the node replacement means replaces the matching subtree contained in the target sentence dependency structure tree with a synonymous expression node, and the characteristic subtree extraction means extracts characteristic subtrees from the target sentence dependency structure tree after the replacement.
Characteristic subtrees can therefore be extracted while treating synonymous expressions with different dependency structure trees as identical.

In the text mining device according to the present invention, synonymous expression identification means identifies whether the target sentence dependency structure tree contains a matching subtree.
Node addition means adds a synonymous expression node to the target sentence dependency structure tree. The node addition means adds a dependency branch from the synonymous expression node to each node outside the matching subtree that has a dependency branch from a node inside the matching subtree, and adds a dependency branch to the synonymous expression node from each node outside the matching subtree that has a dependency branch to a node inside the matching subtree. Characteristic subtree extraction means extracts characteristic subtrees from the target sentence dependency structure tree after the synonymous expression node and the dependency branches have been added (dependency structure bundle) (claim 2).

According to the above text mining device, the node addition means adds to the target sentence dependency structure tree a synonymous expression node that corresponds to the matching subtree contained in that tree, and generates a dependency structure bundle by giving the synonymous expression node dependency branches equivalent to the dependency branches that relate the matching subtree to nodes outside it. The characteristic subtree extraction means extracts characteristic subtrees from the dependency structure bundle after the node and the dependency branches have been added.
Characteristic subtrees can therefore be extracted while treating synonymous expressions with different dependency structure trees as identical. In addition, because the existing nodes remain unchanged in the dependency structure bundle from which characteristic subtrees are extracted, the adverse effects of unifying synonymous expressions can be kept low.
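The node addition step can be sketched in the same style. This is a hedged illustration rather than the patented implementation: the tree is a list of (modifier, head) pairs with English gloss labels assumed unique within the sentence, the matched node set is supplied by hand, and `add_synonym_node` and `#G1` are hypothetical names.

```python
def add_synonym_node(edges, matched, syn_label):
    """Build a dependency structure bundle: keep every original branch and
    add, for the new node `syn_label`, a copy of each branch that crosses
    the boundary of the matched subtree."""
    bundle = list(edges)  # existing nodes and branches are left untouched
    for modifier, head in edges:
        if modifier in matched and head not in matched:
            extra = (syn_label, head)      # branch leaving the subtree
        elif modifier not in matched and head in matched:
            extra = (modifier, syn_label)  # branch entering the subtree
        else:
            continue
        if extra not in bundle:
            bundle.append(extra)
    return bundle


# Assumed tree of a sentence that uses EX1 (not a reproduction of the figures).
tree = [("mail", "display"), ("display", "characters"),
        ("characters", "do"), ("small", "do"), ("do", "method")]
matched = {"display", "characters", "small", "do"}

bundle = add_synonym_node(tree, matched, "#G1")
```

Unlike replacement, the original branches survive, so subtrees such as the one made of "small" and "do" keep their original counts.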

In the above text mining device, exclusive relation branch addition means may generate a dependency structure bundle in which the synonymous expression node is connected to each node contained in the matching subtree by an exclusive relation branch, and the characteristic subtree extraction means may extract characteristic subtrees from among subtrees that do not contain nodes connected by an exclusive relation branch (claim 3).
In this way, a subtree that contains both a node of the matching subtree and the synonymous expression node, which is meaningless as an extraction result, is no longer extracted as a characteristic subtree.
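One way to read this filter is sketched below. It is an interpretation, not the patent's implementation: an exclusive relation branch is modelled as an unordered pair of nodes, a candidate subtree is reduced to its node set, and a candidate is rejected when it contains both ends of any exclusive relation branch. The names `exclusive_branches`, `is_admissible`, and `#G1` are hypothetical.

```python
def exclusive_branches(matched, syn_label):
    """Connect the synonymous expression node to every matched node."""
    return {frozenset((syn_label, node)) for node in matched}


def is_admissible(candidate_nodes, exclusive):
    """A candidate subtree is admissible if it never contains both ends
    of an exclusive relation branch."""
    nodes = set(candidate_nodes)
    return not any(pair <= nodes for pair in exclusive)


matched = {"display", "characters", "small", "do"}
exclusive = exclusive_branches(matched, "#G1")

ok_synonym_side = is_admissible({"mail", "#G1"}, exclusive)
ok_original_side = is_admissible({"mail", "display"}, exclusive)
mixed = is_admissible({"#G1", "do", "method"}, exclusive)
```

Candidates drawn entirely from the original nodes or entirely from the synonymous-expression view pass; a candidate that mixes the two is discarded.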

The above text mining device may include extraction result output means that replaces the label of each synonymous expression node contained in a characteristic subtree with an output expression representing the expressions that belong to the synonymous expression group indicated by that label, and outputs the shape of the characteristic subtree in a form that a person can view (claim 4).
In this way, the label of a synonymous expression node in the characteristic structure shown on the output device is not a symbol whose meaning is unclear to the user but an expression that represents the synonymous expression group, so the user can understand what the label means.

In the above text mining device, the output expression may be the expression listed first in the synonymous expression group indicated by the label of the synonymous expression node (claim 5).

In the above text mining device, the output expression may be the shortest expression in the synonymous expression group indicated by the label of the synonymous expression node (claim 6).

In the above text mining device, the output expression may be the expression in the synonymous expression group indicated by the label of the synonymous expression node that appears most often in the set of sentences subject to text mining (claim 7).

In the above text mining device, the output expression may be the expression in the synonymous expression group indicated by the label of the synonymous expression node that has been designated in advance for use as the output expression (claim 8).

In the above text mining device, the output expression may be an expression defined in advance for the synonymous expression group indicated by the label of the synonymous expression node, separately from the expressions contained in that group (claim 9).
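The five ways of choosing an output expression can be compared side by side. The sketch below is illustrative only: the dictionary layout of `group`, the strategy names, and the function `output_expression` are assumptions, expression length is measured in characters of the English glosses, and ties are broken by list order.

```python
def output_expression(group, strategy, corpus_counts=None):
    """Pick the expression shown in place of a synonymous expression node label."""
    expressions = group["expressions"]
    if strategy == "first":            # first expression listed in the group
        return expressions[0]
    if strategy == "shortest":         # shortest expression in the group
        return min(expressions, key=len)
    if strategy == "most_frequent":    # most frequent in the mined sentences
        return max(expressions, key=lambda e: corpus_counts.get(e, 0))
    if strategy == "designated":       # member marked in advance for output
        return group["designated"]
    if strategy == "separate":         # label defined apart from the members
        return group["display_name"]
    raise ValueError(f"unknown strategy: {strategy}")


group = {
    "expressions": ["make the displayed characters smaller",
                    "display in small characters"],
    "designated": "make the displayed characters smaller",
    "display_name": "reduce the character size used for display",
}
# Hypothetical occurrence counts in the mined sentence set.
corpus_counts = {"make the displayed characters smaller": 23,
                 "display in small characters": 34}

first = output_expression(group, "first")
shortest = output_expression(group, "shortest")
most_frequent = output_expression(group, "most_frequent", corpus_counts)
designated = output_expression(group, "designated")
separate = output_expression(group, "separate")
```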

In the text mining method according to the present invention, the computer of a text mining device executes processing that: generates target sentence dependency structure trees from the text database to be mined; reads the expressions stored in a synonymous expression dictionary from a storage device and generates synonymous expression dependency structure trees; compares the target sentence dependency structure tree with the synonymous expression dependency structure trees to identify whether the target sentence dependency structure tree contains a matching subtree; generates a synonymous expression node whose label is an identifier distinguishable from the labels of ordinary nodes; replaces all nodes contained in the matching subtree with the synonymous expression node; and extracts characteristic subtrees from the target sentence dependency structure tree after the replacement (claim 10).

According to the above text mining method, the matching subtree contained in the target sentence dependency structure tree is replaced with a synonymous expression node, and characteristic subtrees are extracted from the target sentence dependency structure tree after the replacement.
Characteristic subtrees can therefore be extracted while treating synonymous expressions with different dependency structure trees as identical.
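The comparison step of the method, which checks a target sentence tree against the trees in the synonymous expression dictionary, can be sketched very simply when node labels are unique within a sentence. That uniqueness is an assumption made here for brevity; general subtree matching needs a proper tree-matching algorithm. The names `SYNONYM_DICT`, `find_match`, and `#G1`, and the sample trees, are hypothetical.

```python
# Hypothetical synonymous expression dictionary: each entry pairs a group
# identifier with the (modifier, head) branches of one expression's tree.
SYNONYM_DICT = [
    ("#G1", {("display", "characters"), ("characters", "do"), ("small", "do")}),
    ("#G1", {("small", "characters"), ("characters", "display")}),
]


def find_match(edges, dictionary):
    """Return (group id, matched nodes) for the first dictionary tree whose
    branches all occur in the target tree, or None if nothing matches."""
    edge_set = set(edges)
    for group, pattern in dictionary:
        if pattern <= edge_set:  # every branch of the pattern is present
            nodes = {node for branch in pattern for node in branch}
            return group, nodes
    return None


tree_ex1 = [("mail", "display"), ("display", "characters"),
            ("characters", "do"), ("small", "do"), ("do", "method")]
tree_ex2 = [("small", "characters"), ("characters", "display"),
            ("mail", "display"), ("display", "method")]

match1 = find_match(tree_ex1, SYNONYM_DICT)
match2 = find_match(tree_ex2, SYNONYM_DICT)
no_match = find_match([("mail", "read")], SYNONYM_DICT)
```

The identifier `#G1` is chosen so that it cannot collide with an ordinary node label, in line with the requirement that the label be distinguishable from normal labels.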

In the text mining method according to the present invention, the computer of a text mining device generates target sentence dependency structure trees from the text database to be mined; reads the expressions stored in a synonymous expression dictionary from a storage device and generates synonymous expression dependency structure trees; compares the target sentence dependency structure tree with the synonymous expression dependency structure trees to identify whether the target sentence dependency structure tree contains a matching subtree; and generates a synonymous expression node whose label is an identifier distinguishable from the labels of ordinary nodes and adds it to the target sentence dependency structure tree. At this point, a dependency branch is added from the synonymous expression node to each node outside the matching subtree that has a dependency branch from a node inside the matching subtree, and a dependency branch to the synonymous expression node is added from each node outside the matching subtree that has a dependency branch to a node inside the matching subtree. The computer may then execute processing that extracts characteristic subtrees from the target sentence dependency structure tree after the synonymous expression node and the dependency branches have been added (dependency structure bundle) (claim 11).

According to the above text mining method, a synonymous expression node that corresponds to the matching subtree contained in the target sentence dependency structure tree is added to that tree, and a dependency structure bundle is generated by giving the synonymous expression node dependency branches equivalent to the dependency branches that relate the matching subtree to nodes outside it. Characteristic subtrees are then extracted from the dependency structure bundle.
Characteristic subtrees can therefore be extracted while treating synonymous expressions with different dependency structure trees as identical. In addition, because the existing nodes remain unchanged in the target sentence dependency structure tree from which characteristic subtrees are extracted, the adverse effects of unifying synonymous expressions can be kept low.

The above text mining method may include an exclusive relation branch connection step in which, after the synonymous expression node has been added to the target sentence dependency structure tree, the synonymous expression node is connected to each node contained in the matching subtree by an exclusive relation branch. The computer of the text mining device executes this step, and in the characteristic subtree extraction step the computer of the text mining device may extract the characteristic subtrees from among subtrees of the dependency structure bundle that do not contain nodes connected by an exclusive relation branch (claim 12).
In this way, a subtree that contains both a node of the matching subtree and the synonymous expression node, which is meaningless as an extraction result, is no longer extracted as a characteristic subtree.

The text mining program of the present invention causes a computer to execute: a function of generating target sentence dependency structure trees from the text database to be mined; a function of reading the expressions stored in a synonymous expression dictionary from a storage device and generating synonymous expression dependency structure trees; a function of comparing the target sentence dependency structure tree with the synonymous expression dependency structure trees to identify whether the target sentence dependency structure tree contains a matching subtree; a function of generating a synonymous expression node whose label is an identifier distinguishable from the labels of ordinary nodes; a function of replacing all nodes contained in the matching subtree with the synonymous expression node; and a function of extracting characteristic subtrees from the target sentence dependency structure tree after the replacement (claim 13).

According to the above text mining program, the matching subtree contained in the target sentence dependency structure tree is replaced with a synonymous expression node, and characteristic subtrees are extracted from the target sentence dependency structure tree after the replacement.
The program can therefore make a computer operate as a text mining device and extract characteristic subtrees while treating synonymous expressions with different dependency structure trees as identical.

The text mining program according to the present invention causes a computer to execute: a function of generating target sentence dependency structure trees from the text database to be mined; a function of reading the expressions stored in a synonymous expression dictionary from a storage device and generating synonymous expression dependency structure trees; a function of comparing the target sentence dependency structure tree with the synonymous expression dependency structure trees to identify whether the target sentence dependency structure tree contains a matching subtree; and a function of generating a synonymous expression node whose label is an identifier distinguishable from the labels of ordinary nodes and adding it to the target sentence dependency structure tree. At this point, the program causes a dependency branch to be added from the synonymous expression node to each node outside the matching subtree that has a dependency branch from a node inside the matching subtree, and causes a dependency branch to the synonymous expression node to be added from each node outside the matching subtree that has a dependency branch to a node inside the matching subtree. The program then causes characteristic subtrees to be extracted from the target sentence dependency structure tree after the synonymous expression node and the dependency branches have been added (dependency structure bundle) (claim 14).

上記テキストマイニングプログラムによれば、対象文依存構造木に含まれる一致部分木に相当する同義表現節点を対象文依存構造木に追加し、一致部分木と外部の節点の関係を示す係り受け枝と同等の係り受け枝を同義表現節点に追加して依存構造束を生成する。そして依存構造束から特徴部分木を抽出する。
そのため、コンピュータをテキストマイニング装置として動作させ、依存構造木が異なる同義表現を同一視して特徴的な部分木の抽出を行うことができる。さらに、特徴部分木の抽出の対象となる対象文依存構造木には、既存の節点はそのまま残されているから、同義表現を統一することによる悪影響を低く抑えることができる。 According to the text mining program, a synonymous expression node corresponding to a matching subtree included in the target sentence dependency structure tree is added to the target sentence dependency structure tree, and a dependency branch indicating a relationship between the matching subtree and an external node is obtained. An equivalent dependency branch is added to the synonym expression node to generate a dependency structure bundle. A feature subtree is extracted from the dependency structure bundle.
Therefore, a computer can be operated as a text mining device that extracts characteristic subtrees while equating synonymous expressions whose dependency structure trees differ. Furthermore, since the existing nodes are left as they are in the target sentence dependency structure tree from which feature subtrees are extracted, the adverse effects of unifying synonymous expressions can be kept low.
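The construction of the dependency structure bundle described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function name `add_synonym_node`, the representation of a tree as a label dictionary plus a set of (dependent, head) pairs, the node identifiers, and the branches assumed for sentence S1 are all hypothetical.

```python
def add_synonym_node(labels, edges, matched, new_id, group_label):
    """Add a synonymous expression node beside a matching subtree.

    labels:  dict mapping node id -> label
    edges:   set of (dependent id, head id) pairs
    matched: ids of the nodes that form the matching subtree
    Existing nodes and branches are kept; only new ones are added.
    """
    bundle_labels = dict(labels)
    bundle_labels[new_id] = group_label
    bundle_edges = set(edges)
    for dependent, head in edges:
        if dependent in matched and head not in matched:
            # Branch leaving the matching subtree: copy it from the new node.
            bundle_edges.add((new_id, head))
        elif head in matched and dependent not in matched:
            # Branch entering the matching subtree: copy it to the new node.
            bundle_edges.add((dependent, new_id))
    return bundle_labels, bundle_edges


# Assumed shape of the tree for sentence S1 (node ids are arbitrary).
S1_LABELS = {0: "メール", 1: "表示する", 2: "文字", 3: "できるだけ",
             4: "小さい", 5: "する", 6: "方法", 7: "ＷＥＢ", 8: "調べる"}
S1_EDGES = {(0, 1), (1, 2), (2, 5), (3, 4), (4, 5), (5, 6), (6, 8), (7, 8)}

BUNDLE_LABELS, BUNDLE_EDGES = add_synonym_node(
    S1_LABELS, S1_EDGES, matched={1, 2, 4, 5}, new_id=9, group_label="G1")
```

In this sketch the original nodes and branches remain in the bundle, which reflects the point made above that existing nodes are left as they are.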

上記テキストマイニングプログラムにおいて、対象文依存構造木に同義表現節点が追加された後、同義表現節点と一致部分木に含まれる各節点を排他関係枝で接続した依存構造束を生成する機能をコンピュータに実行させ、依存構造束の排他関係枝で接続された節点を含まない部分木から特徴部分木を抽出するようにしてもよい（請求項１５）。
このようにすれば、一致部分木に含まれる節点と同義表現節点の両方を含むような抽出結果として意味を成さない部分木は特徴部分木として抽出されなくなる。 In the above text mining program, after the synonymous expression node has been added to the target sentence dependency structure tree, the computer may be caused to execute a function of generating a dependency structure bundle in which the synonymous expression node is connected to each node of the matching subtree by an exclusive relation branch, and the feature subtree may be extracted from subtrees that do not include nodes connected by an exclusive relation branch of the dependency structure bundle (claim 15).
In this way, a subtree that includes both a node of the matching subtree and the synonymous expression node, and is therefore meaningless as an extraction result, is no longer extracted as a feature subtree.
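The effect of the exclusive relation branches can be illustrated with a short sketch. The names `exclusive_branches`, `is_meaningful`, and `filter_candidates`, and the representation of a candidate subtree as a set of node ids, are assumptions made for this example only.

```python
def exclusive_branches(synonym_node, matched):
    """Connect the synonymous expression node to every node it stands for."""
    return {(synonym_node, node) for node in matched}


def is_meaningful(candidate_nodes, exclusive):
    """A candidate is rejected if it holds both ends of an exclusive branch."""
    nodes = set(candidate_nodes)
    return not any(a in nodes and b in nodes for a, b in exclusive)


def filter_candidates(candidates, exclusive):
    """Keep only candidate subtrees that respect the exclusive branches."""
    return [c for c in candidates if is_meaningful(c, exclusive)]


# Node 9 is an assumed synonymous expression node for matched nodes 1, 2, 4, 5.
EXCLUSIVE = exclusive_branches(9, {1, 2, 4, 5})
KEPT = filter_candidates([{0, 9}, {0, 1, 2}, {9, 5, 6}], EXCLUSIVE)
```

Here the third candidate is dropped because it contains both the synonymous expression node and one of the nodes it replaces.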

本発明によれば、対象文依存構造木に含まれる一致部分木を、節点置換手段が同義表現節点で置換し、特徴部分木抽出手段は、置換後の対象文依存木から特徴部分木を抽出する。
そのため、依存構造木が異なる同義表現を同一視して特徴的な部分木の抽出を行うことができる。 According to the present invention, the node replacement means replaces the matching subtree included in the target sentence dependency structure tree with a synonymous expression node, and the feature subtree extraction means extracts feature subtrees from the target sentence dependency tree after the replacement.
Therefore, a characteristic subtree can be extracted by equating synonymous expressions with different dependency structure trees.

図を参照しながら本発明の第１の実施形態であるテキストマイニング装置１０の構成と動作について説明する。
（テキストマイニング装置１０の構成）
図５は、テキストマイニング装置１０の概略機能ブロック図である。
テキストマイニング装置１０は、キーボード、マウス等の入力装置１と、プログラム制御により動作するデータ処理装置２０と、情報を記憶するハードディスク等の記憶装置３０と、ディスプレイ装置等の出力装置４とを備える。 The configuration and operation of the text mining device 10 according to the first embodiment of the present invention will be described with reference to the drawings.
(Configuration of text mining device 10)
FIG. 5 is a schematic functional block diagram of the text mining apparatus 10.
The text mining device 10 includes an input device 1 such as a keyboard and a mouse, a data processing device 20 that operates under program control, a storage device 30 such as a hard disk that stores information, and an output device 4 such as a display device.

記憶装置３０は、同義表現辞書記憶部３１と、テキスト集合記憶部３２とを備える。
同義表現辞書記憶部３１は、テキストマイニングを行う際に同義と見なす表現を同義と見なす表現ごとにグループ化して定義した同義表現辞書を予め記憶している。テキスト集合記憶部３２は、テキストマイニングの対象となるテキストを予め記憶している。
同義表現辞書において同一の同義表現グループに属する同義表現どうしが、テキストマイニングを行う際に同一視される。 The storage device 30 includes a synonym expression dictionary storage unit 31 and a text set storage unit 32.
The synonym expression dictionary storage unit 31 stores in advance a synonym expression dictionary in which the expressions to be regarded as synonymous during text mining are defined in groups, one group per set of mutually synonymous expressions. The text set storage unit 32 stores in advance the texts to be mined.
In the synonym expression dictionary, synonym expressions belonging to the same synonym expression group are identified with each other when performing text mining.

図６に、同義表現辞書の一例を示す。この例は、テキストマイニング時に、表現ＥＸ１「表示する文字を小さくする」と表現ＥＸ２「小さな文字で表示する」と表現ＥＸ３「表示する行数を増やす」とを同一視し（同義表現グループＧ１）、さらに、表現ＥＸ４「画像をメールで送る」と表現ＥＸ５「メールに画像を添付する」とを同一視したい（同義表現グループＧ２）場合の同義表現辞書の定義例である。 FIG. 6 shows an example of the synonym expression dictionary. In this example, at the time of text mining, the expression EX1 “reducing displayed characters”, the expression EX2 “displaying with small characters”, and the expression EX3 “increasing the number of lines to be displayed” are identified (synonymous expression group G1). Furthermore, this is a definition example of the synonym expression dictionary when the expression EX4 “send image by e-mail” and the expression EX5 “attach image to e-mail” are to be identified (synonym expression group G2).
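One possible in-memory form of the synonym expression dictionary of FIG. 6 is sketched below. The dictionary layout and the helper names `group_of` and `same_group` are assumptions for illustration; the document does not specify a storage format.

```python
# Group identifier -> expressions regarded as synonymous (EX1..EX5 of FIG. 6).
SYNONYM_DICTIONARY = {
    "G1": ["表示する文字を小さくする",   # EX1
           "小さな文字で表示する",       # EX2
           "表示する行数を増やす"],      # EX3
    "G2": ["画像をメールで送る",         # EX4
           "メールに画像を添付する"],    # EX5
}


def group_of(expression):
    """Return the identifier of the group an expression belongs to, or None."""
    for group_id, expressions in SYNONYM_DICTIONARY.items():
        if expression in expressions:
            return group_id
    return None


def same_group(first, second):
    """True when both expressions are registered in the same group."""
    group = group_of(first)
    return group is not None and group == group_of(second)
```

With this layout, two expressions are identified with each other during mining exactly when `same_group` returns true.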

データ処理装置２０は、言語解析手段２１と、同義表現識別手段２２と、同義表現節点生成手段２３と、節点置換手段２４と、特徴部分木抽出手段２５と、抽出結果出力手段２６とを備える。
言語解析手段２１は、同義表現辞書記憶部３１に記憶されている全ての表現およびテキスト集合記憶部３２に記憶されているテキスト中の全ての文について対応する依存構造木を構築する。
依存構造木とは、文の構成要素を節点とし、文の構成要素間の依存関係（係り受け関係）を枝として、文を木構造として表現したものである。各節点は、節点に付与されたラベルによって区別される。 The data processing device 20 includes language analysis means 21, synonymous expression identification means 22, synonymous expression node generation means 23, node replacement means 24, feature subtree extraction means 25, and extraction result output means 26.
The language analysis means 21 constructs a dependency structure tree corresponding to all expressions stored in the synonym expression dictionary storage unit 31 and all sentences in the text stored in the text set storage unit 32.
A dependency structure tree represents a sentence as a tree whose nodes are the constituents of the sentence and whose branches are the dependency relations between those constituents. Nodes are distinguished by the labels attached to them.

節点に対応する文の構成要素としては、例えば形態素を採用し、各形態素の原型を節点のラベルとし、形態素間の依存関係を枝とする依存構造木を構築するようにしてもよいし、図１（ａ）の例のように文節を採用し、文節に属する自立語を終止形に直したしたものを節点のラベルとし、文節間の係り受け関係を枝とする依存構造木を構築するようにしてもよい。
依存構造木を構築するには、例えば、形態素解析を行って文を形態素の単位に分割し、構文解析を行って各形態素間の関係を求める等、一般に知られている方法を用いることができる。
なお、言語解析手段２１によって構築された依存構造木は、図示しないＤＲＡＭ(Dynamic Random Access Memory)等の一時記憶装置に保持するようにしてもよいし、記憶装置３０に保持するようにしてもよい。 As the sentence constituent corresponding to a node, a morpheme may be adopted, for example, to construct a dependency structure tree in which the base form of each morpheme is the node label and the dependency relations between morphemes are the branches. Alternatively, as in the example of FIG. 1(a), a phrase (bunsetsu) may be adopted, to construct a dependency structure tree in which the independent word of each phrase, converted to its end form, is the node label and the dependency relations between phrases are the branches.
To construct a dependency structure tree, a generally known method can be used, such as performing morphological analysis to divide a sentence into morphemes and then performing syntactic analysis to obtain the relations between the morphemes.
The dependency structure tree constructed by the language analysis means 21 may be held in a temporary storage device (not shown) such as a DRAM (Dynamic Random Access Memory), or may be held in the storage device 30.

同義表現識別手段２２は、テキスト集合記憶部３２に記憶されているテキスト中の各文に対応する依存構造木（対象文依存構造木）と、同義表現辞書記憶部３１に記憶されている表現に対応する依存構造木（同義表現依存構造木）を照合する。これにより、テキスト中で同義表現辞書中の表現が使用されている箇所を特定し、テキスト中のどの箇所で同義表現辞書中のどの表現が使用されているのかを識別する。 The synonymous expression identifying means 22 collates the dependency structure tree corresponding to each sentence in the text stored in the text set storage unit 32 (the target sentence dependency structure tree) with the dependency structure trees corresponding to the expressions stored in the synonym expression dictionary storage unit 31 (the synonymous expression dependency structure trees). In this way it locates the places in the text where expressions in the synonym expression dictionary are used, and identifies which dictionary expression is used at which place in the text.
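The collation step can be sketched as a small backtracking search. This is one straightforward way to do it, not necessarily the method used by the synonymous expression identifying means 22; the function name `find_matches`, the data layout, and the branches assumed for DT-S1, DT-EX1, and DT-EX2 are all hypothetical.

```python
def find_matches(target_labels, target_edges, pattern_labels, pattern_edges):
    """Find every place where the pattern tree occurs in the target tree.

    Labels are dicts of node id -> label; edges are sets of
    (dependent id, head id). Each match maps pattern ids to target ids.
    """
    pattern_ids = list(pattern_labels)
    matches = []

    def extend(mapping):
        if len(mapping) == len(pattern_ids):
            # All pattern nodes are placed: check that every branch is present.
            if all((mapping[d], mapping[h]) in target_edges
                   for d, h in pattern_edges):
                matches.append(dict(mapping))
            return
        pattern_node = pattern_ids[len(mapping)]
        for target_node, label in target_labels.items():
            if (label == pattern_labels[pattern_node]
                    and target_node not in mapping.values()):
                mapping[pattern_node] = target_node
                extend(mapping)
                del mapping[pattern_node]

    extend({})
    return matches


# Assumed trees: S1 as the target, EX1 and EX2 as patterns.
S1_LABELS = {0: "メール", 1: "表示する", 2: "文字", 3: "できるだけ",
             4: "小さい", 5: "する", 6: "方法", 7: "ＷＥＢ", 8: "調べる"}
S1_EDGES = {(0, 1), (1, 2), (2, 5), (3, 4), (4, 5), (5, 6), (6, 8), (7, 8)}
EX1_LABELS = {"a": "表示する", "b": "文字", "c": "小さい", "d": "する"}
EX1_EDGES = {("a", "b"), ("b", "d"), ("c", "d")}
EX2_LABELS = {"a": "小さい", "b": "文字", "c": "表示する"}
EX2_EDGES = {("a", "b"), ("b", "c")}

EX1_MATCHES = find_matches(S1_LABELS, S1_EDGES, EX1_LABELS, EX1_EDGES)
EX2_MATCHES = find_matches(S1_LABELS, S1_EDGES, EX2_LABELS, EX2_EDGES)
```

The search is exhaustive and is meant only for small trees; a production matcher would prune by structure before trying label assignments.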

同義表現節点生成手段２３は、テキスト中の各文において、同義表現辞書中の表現が使用されている箇所、すなわち、テキスト中の各文に対応する依存構造木において同義表現辞書中の表現に対応する依存構造木（一致部分木）が部分木として含まれている箇所のそれぞれに対応づけて、新しい節点（同義表現節点）を生成する。
生成する節点には、その箇所で使用されていた表現が属する同義表現グループを表す識別子をラベルとして付与する。すなわち、同一の同義表現グループに属する表現が使用されている箇所に対して生成された節点には、共通のラベルを付与するようにする。また、ラベルは、言語解析手段２１によって構築された依存構造木にはじめから存在していた節点のラベルとは異なる特別なラベルとする。 For each place in a sentence of the text where an expression in the synonym expression dictionary is used, that is, each place where the dependency structure tree of a sentence contains, as a subtree, the dependency structure tree of a dictionary expression (a matching subtree), the synonymous expression node generation means 23 generates a new node (a synonymous expression node) associated with that place.
An identifier indicating the synonymous expression group to which the expression used at the location belongs is assigned as a label to the node to be generated. That is, a common label is assigned to nodes generated for locations where expressions belonging to the same synonymous expression group are used. Also, the label is a special label different from the node label that originally existed in the dependency structure tree constructed by the language analysis means 21.

節点置換手段２４は、テキスト中の各文に対応する依存構造木において、一致部分木が含まれている箇所に対して、その部分木に含まれる全節点を、同義表現節点生成手段２３がその箇所に対応づけて生成した同義表現節点で置換する。この置換処理により、始点と終点の両方が同一の節点に置換される枝は削除する。始点と終点の一方の節点のみが置換される枝はそのまま残す。 For each place where the dependency structure tree of a sentence in the text contains a matching subtree, the node replacement means 24 replaces all nodes of that subtree with the synonymous expression node that the synonymous expression node generation means 23 generated for that place. A branch whose start point and end point are both replaced by the same node as a result is deleted. A branch for which only one of the start point and end point is replaced is left in place.

節点置換手段２４による処理の例を図７に示す。これは、文Ｓ１「メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた」に対応する依存構造木ＤＴ−Ｓ１（図２）において、表現ＥＸ１「表示する文字を小さくする」に対応する依存構造木ＤＴ−ＥＸ１が部分木として含まれている箇所（図３のＰＴ１）に対して、その箇所の部分木に含まれる全節点を、新しい節点「Ｇ１」によって置換する場合の例である。
この例では、４つの節点「表示する」「文字」「小さい」「する」が節点「Ｇ１」によって置換される。これらの４節点間を接合していた枝ＢＲ１、ＢＲ２、ＢＲ３は削除される。節点「メール」と節点「表示する」とを接合していた枝ＢＲ４は、節点「メール」と節点「Ｇ１」とを接合する形でそのまま残される。また、節点「できるだけ」と節点「小さい」とを接合していた枝ＢＲ５は、節点「できるだけ」と節点「Ｇ１」とを接合する形でそのまま残される。また、節点「する」と節点「方法」とを接合していた枝ＢＲ６は、節点「Ｇ１」と節点「方法」とを接合する形でそのまま残される。 An example of processing by the node replacement means 24 is shown in FIG. 7. In this example, the dependency structure tree DT-S1 (FIG. 2) for the sentence S1 “I looked up on the Web how to make the characters that display mail as small as possible” contains, as a subtree (PT1 in FIG. 3), the dependency structure tree DT-EX1 for the expression EX1 “reducing displayed characters”, and all nodes of that subtree are replaced by a new node “G1”.
In this example, the four nodes “display” (表示する), “character” (文字), “small” (小さい), and “do” (する) are replaced by the node “G1”. The branches BR1, BR2, and BR3 that joined these four nodes are deleted. The branch BR4, which joined the node “mail” and the node “display”, is kept and now joins the node “mail” and the node “G1”. Likewise, the branch BR5, which joined the node “as much as possible” and the node “small”, is kept and now joins the node “as much as possible” and the node “G1”. The branch BR6, which joined the node “do” and the node “method”, is kept and now joins the node “G1” and the node “method”.
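The replacement just described can be sketched in a few lines. The function name `replace_subtree`, the node ids, and the internal branches assumed for DT-S1 are illustrative assumptions; only the handling of BR1 to BR6 follows the text.

```python
def replace_subtree(labels, edges, matched, new_id, group_label):
    """Replace every node in `matched` by one synonymous expression node.

    Branches with both ends inside `matched` are dropped; branches with
    exactly one end inside are redirected to the new node.
    """
    new_labels = {n: l for n, l in labels.items() if n not in matched}
    new_labels[new_id] = group_label
    new_edges = set()
    for dependent, head in edges:
        if dependent in matched and head in matched:
            continue  # corresponds to BR1, BR2, BR3: deleted
        new_edges.add((new_id if dependent in matched else dependent,
                       new_id if head in matched else head))
    return new_labels, new_edges


# Assumed tree for S1; nodes 1, 2, 4, 5 form the matching subtree PT1.
S1_LABELS = {0: "メール", 1: "表示する", 2: "文字", 3: "できるだけ",
             4: "小さい", 5: "する", 6: "方法", 7: "ＷＥＢ", 8: "調べる"}
S1_EDGES = {(0, 1), (1, 2), (2, 5), (3, 4), (4, 5), (5, 6), (6, 8), (7, 8)}

S1R_LABELS, S1R_EDGES = replace_subtree(
    S1_LABELS, S1_EDGES, matched={1, 2, 4, 5}, new_id=9, group_label="G1")
```

In the result, the branches corresponding to BR4, BR5, and BR6 appear as (0, 9), (3, 9), and (9, 6), while the two branches outside the matching subtree are unchanged.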

特徴部分木抽出手段２５は、テキスト中の各文に対応する依存構造木から特徴的な部分木を抽出する。
ある部分木が特徴的かどうかの判定は、一般的なデータマイニングの手法を用いる。例えば、全依存構造木中で予め定める閾値以上の回数出現する部分木を特徴的な部分木として抽出することができる。また、対応する依存構造木中に、ある部分木が出現する文が１つ以上存在するテキストが予め定める閾値以上の個数存在する場合に、その部分木を特徴的な部分木として抽出することも可能である。このほか、テキストが予め定める集合に属するか否かと、そのテキスト中の各文に対応する依存構造木中に部分木が出現するか否かに、予め定める閾値以上の相関性がある部分木を特徴的な部分木として抽出するようにしてもよい。 The characteristic subtree extracting unit 25 extracts a characteristic subtree from the dependency structure tree corresponding to each sentence in the text.
A general data mining technique is used to judge whether a subtree is characteristic. For example, a subtree that appears at least a predetermined threshold number of times across all dependency structure trees can be extracted as a characteristic subtree. Alternatively, a subtree can be extracted as characteristic when the number of texts containing at least one sentence whose dependency structure tree includes that subtree is at least a predetermined threshold. In addition, a subtree may be extracted as characteristic when the correlation between whether a text belongs to a predetermined set and whether the subtree appears in the dependency structure trees of the sentences in that text is at least a predetermined threshold.
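The first two criteria above differ only in what is counted, as the following sketch shows. It assumes that subtree occurrences have already been found and are given as hashable pattern keys; the function names and the nested-list input format are hypothetical, and the correlation-based criterion is not covered here.

```python
from collections import Counter


def by_total_occurrences(texts, threshold):
    """Patterns whose occurrences over all sentences reach the threshold.

    texts: list of texts, each a list of sentences, each a list of the
    pattern keys that occur in that sentence's dependency structure tree.
    """
    counts = Counter(p for text in texts for sentence in text for p in sentence)
    return {p for p, c in counts.items() if c >= threshold}


def by_text_count(texts, threshold):
    """Patterns that occur in at least `threshold` distinct texts."""
    counts = Counter()
    for text in texts:
        # Each text contributes at most once per pattern.
        counts.update({p for sentence in text for p in sentence})
    return {p for p, c in counts.items() if c >= threshold}


# Toy data: three texts; "a" is frequent overall, "b" is spread over texts.
TEXTS = [[["a", "a", "b"], ["a"]], [["b"]], [["c"]]]
```

With the toy data, "a" occurs three times but in only one text, whereas "b" occurs twice in two different texts, so the two criteria select different patterns.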

ある部分木が特徴的かどうかを判定する基準（例えば、出現回数の閾値や、相関性を求める対象となるテキストの集合、相関性の閾値等）は、入力装置１を通して利用者が入力するようにしてもよい。また、特徴的かどうかを判定する方法を複数用意し、利用者が選択できるようにしてもよい。このほか、利用者が、抽出する部分木の条件や、部分木を抽出するテキストの条件を指定できるようにしてもよい。 Criteria for determining whether or not a subtree is characteristic (for example, a threshold for the number of appearances, a set of texts for which correlation is obtained, a correlation threshold, etc.) are input by the user through the input device 1. It may be. In addition, a plurality of methods for determining whether or not it is characteristic may be prepared so that the user can select it. In addition, the user may be able to specify the condition of the subtree to be extracted and the text condition for extracting the subtree.

抽出結果出力手段２６は、抽出された部分木の形状を出力装置４に利用者が視認できる形で出力する。このとき、言語解析手段２１によって構築された依存構造木にはじめから存在していた節点についてはそのまま出力し、同義表現節点生成手段２３によって生成された節点については、同義表現辞書記憶部３１を参照して、そのラベルを対応する同義表現グループに応じた表現に置換して出力する。これにより、同義表現節点生成手段２３によって付与された特殊なラベルを利用者が理解できる状態に直すことができる。 The extraction result output means 26 outputs the extracted partial tree shape to the output device 4 in a form that the user can visually recognize. At this time, the nodes that originally existed in the dependency structure tree constructed by the language analysis unit 21 are output as they are, and the nodes generated by the synonym expression node generation unit 23 are referred to the synonym expression dictionary storage unit 31. Then, the label is replaced with an expression corresponding to the corresponding synonym expression group and output. Thereby, the special label given by the synonymous expression node generation means 23 can be corrected to a state in which the user can understand.

ラベルの置換に使用する表現は、例えば、同義表現グループ中で最初の表現、もっとも長さが短い表現、テキスト中にもっとも多く出現した表現等を同義表現辞書から自動的に選択するようにすることができる。
また、同義表現辞書において、同義表現グループを代表する表現に予め印を付けておき、その表現を使用するようにしてもよい。
また、同義表現辞書において、ラベルの置換に使用する表現を同義と見なす表現とは別に格納しておくようにしてもよい。 The expression used to replace a label can be selected automatically from the synonym expression dictionary, for example the first expression in the synonymous expression group, the shortest expression, or the expression that appears most often in the text.
Alternatively, the expression representing each synonymous expression group may be marked in advance in the synonym expression dictionary, and that expression may be used.
Alternatively, the expression used for label replacement may be stored in the synonym expression dictionary separately from the expressions regarded as synonymous.
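The three automatic selection rules mentioned above can be sketched as follows. The function names and the `usage` dictionary of occurrence counts are assumptions for illustration; ties are resolved in favor of the expression listed first, which is a choice made here and not stated in the document.

```python
def first_expression(expressions):
    """The first expression listed for the group."""
    return expressions[0]


def shortest_expression(expressions):
    """The expression with the fewest characters (earliest wins ties)."""
    return min(expressions, key=len)


def most_frequent_expression(expressions, usage):
    """The expression used most often in the text (earliest wins ties).

    usage: dict mapping expression -> number of occurrences in the text.
    """
    return max(expressions, key=lambda e: usage.get(e, 0))


# Group G1 of FIG. 6, with invented usage counts for the example.
G1 = ["表示する文字を小さくする", "小さな文字で表示する", "表示する行数を増やす"]
USAGE = {"表示する文字を小さくする": 2, "表示する行数を増やす": 5}
```

A marked representative or a separately stored output expression, the other two options described above, would replace these functions with a simple lookup.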

次に、テキストマイニング装置１０の動作について詳細に説明する。
図８は、テキストマイニング装置の動作を示すフローチャートである。
まず、言語解析手段２１が、同義表現辞書記憶部３１に記憶されている全表現を解析し、同義表現依存構造木を構築する（図８のステップＡ１およびＡ２）。
次に、言語解析手段２１は、テキスト集合記憶部３２に記憶されているテキスト中の１つの文を解析し、対象文依存構造木を構築する（ステップＡ３）。
続いて、同義表現識別手段２２が、ステップＡ３で構築された対象文依存構造木を、ステップＡ１において構築された同義表現依存構造木と照合し、対象文中に同義表現辞書に記録されている表現が含まれているかどうかを判別する（ステップＡ４およびステップＡ５）。 Next, the operation of the text mining device 10 will be described in detail.
FIG. 8 is a flowchart showing the operation of the text mining apparatus.
First, the language analysis unit 21 analyzes all expressions stored in the synonym expression dictionary storage unit 31 and constructs a synonym expression dependency structure tree (steps A1 and A2 in FIG. 8).
Next, the language analysis unit 21 analyzes one sentence in the text stored in the text set storage unit 32 and constructs a target sentence dependency structure tree (step A3).
Subsequently, the synonymous expression identifying means 22 collates the target sentence dependency structure tree constructed in step A3 with the synonymous expression dependency structure trees constructed in step A1, and determines whether the target sentence contains an expression recorded in the synonym expression dictionary (steps A4 and A5).

対象文依存構造木の中に、同義表現辞書中の表現に対応する依存構造木（一致部分木）が部分木として含まれている箇所が存在する場合、同義表現節点生成手段２３が、その表現が属する同義表現グループに応じた識別子をラベルとする特別な節点（同義表現節点）を、その箇所に対応づけて生成する（ステップＡ６）。さらに、節点置換手段２４が、一致部分木に含まれる全節点を、同義表現節点によって置換する（ステップＡ７）。 If the target sentence dependency structure tree contains a place where the dependency structure tree of an expression in the synonym expression dictionary (a matching subtree) is included as a subtree, the synonymous expression node generation means 23 generates, in association with that place, a special node (a synonymous expression node) whose label is an identifier corresponding to the synonymous expression group to which the expression belongs (step A6). The node replacement means 24 then replaces all nodes included in the matching subtree with the synonymous expression node (step A7).

同義表現識別手段２２は、ステップＡ１において構築されたすべての同義表現依存構造木との照合が終わったかどうかを判定する（ステップＡ８）。まだ照合していない同義表現依存構造木が残っている場合には、同義表現識別手段２２、同義表現節点生成手段２３および節点置換手段２４が、ステップＡ４からＡ７までの動作を繰り返す。なお、このとき２巡目以降は、１巡目の処理が行われた後の構造を対象として処理を行う。 The synonym expression identification unit 22 determines whether or not the collation with all the synonym expression dependency structure trees constructed in step A1 has been completed (step A8). If a synonymous expression dependent structural tree that has not yet been collated remains, the synonymous expression identifying means 22, the synonymous expression node generating means 23, and the node replacing means 24 repeat the operations from step A4 to A7. At this time, after the second round, processing is performed on the structure after the first round processing is performed.

さらに、言語解析手段２１が、テキスト集合記憶部３２に記憶されているテキスト中のすべての文に対して解析が終了したかどうかを判定する（ステップＡ９）。まだ解析していない文が残っている場合には、言語解析手段２１、同義表現識別手段２２、同義表現節点生成手段２３および節点置換手段２４が、ステップＡ３からＡ８までの動作を繰り返す。 Further, the language analysis means 21 determines whether or not the analysis has been completed for all sentences in the text stored in the text set storage unit 32 (step A9). If a sentence that has not been analyzed still remains, the language analysis means 21, the synonym expression identification means 22, the synonym expression node generation means 23, and the node replacement means 24 repeat the operations from step A3 to A8.

すべての文について、ここまでの処理が終了すると、特徴部分木抽出手段２５が、各文に対応する依存構造木から特徴的な部分木を抽出する（ステップＡ１０）。
最後に、抽出結果出力手段２６が、抽出結果を順に出力装置４に出力する。まず、同義表現辞書記憶部３１を参照し、抽出結果において、ステップＡ６において生成された同義表現節点のラベルを相応な表現（出力用表現）に置換する（ステップＡ１０）。続いて、出力装置４を通して抽出結果を出力する（ステップＡ１１）。すべての抽出結果に対してこの動作を繰り返す(ステップＡ１２）。 When the processing so far is completed for all sentences, the feature subtree extracting unit 25 extracts a characteristic subtree from the dependency structure tree corresponding to each sentence (step A10).
Finally, the extraction result output unit 26 outputs the extraction results to the output device 4 in order. First, the synonym expression dictionary storage unit 31 is referred to, and in the extraction result, the label of the synonym expression node generated in step A6 is replaced with a corresponding expression (expression for output) (step A10). Subsequently, the extraction result is output through the output device 4 (step A11). This operation is repeated for all extraction results (step A12).

なお、本実施の形態では、テキスト集合記憶部３２に記憶されているテキストに対して、言語解析手段２１が一文ごとに依存構造木を構築し、同義表現識別手段２２、同義表現節点生成手段２３、および、節点置換手段２４がこの依存構造木を順に処理するものとして説明したが、テキスト集合記憶部３２に記憶されているテキスト中の全文に対して言語解析手段２１が一括して依存構造木を構築し、同義表現識別手段２２、同義表現節点生成手段２３および節点置換手段２４が、それぞれ、全依存構造木を一括して処理するようにしてもよい。 In the present embodiment, for the text stored in the text set storage unit 32, the language analyzing unit 21 constructs a dependency structure tree for each sentence, and synonymous expression identifying unit 22 and synonymous expression node generating unit 23. In the above description, the node replacement unit 24 sequentially processes the dependency structure tree. However, the language analysis unit 21 collectively processes the dependency structure tree for all sentences in the text stored in the text set storage unit 32. And the synonymous expression identifying means 22, the synonymous expression node generating means 23, and the node replacing means 24 may each process all the dependency structure trees at once.

次に、テキストマイニング装置１０の具体的な動作例について説明する。
本実動作例では、依存構造木として、文節を節点とし、文節に属する自立語を終止形に直したしたものを節点のラベルとし、文節間の係り受け関係を枝とする木構造を採用する。
同義表現辞書記憶部３１には、図６に示す内容が予め記憶されている。
また、テキスト集合記憶部３２には、テキストマイニングの対象となるテキストが予め記憶されている。図１５において長方形３０３で模式的に示したのが一つのテキスト、たとえばコールセンターに寄せられた問い合わせの内容を電子的に記録したものである。一つのテキストには、１または複数の文が含まれている。テキスト集合記憶部３２には、このようなテキストが複数記憶されている。 Next, a specific operation example of the text mining device 10 will be described.
In this operation example, the dependency structure tree adopted is a tree in which each phrase (bunsetsu) is a node, the independent word of the phrase converted to its end form is the node label, and the dependency relations between phrases are the branches.
The synonym expression dictionary storage unit 31 stores in advance the contents shown in FIG.
In addition, the text set storage unit 32 stores in advance text to be subjected to text mining. In FIG. 15, a rectangle 303 schematically shows one text, for example, an electronically recorded content of an inquiry sent to a call center. One text includes one or more sentences. The text set storage unit 32 stores a plurality of such texts.

まず、言語解析手段２１が、同義表現辞書記憶部３１中の各表現を解析し、同義表現依存構造木を構築する。
本実施例では、形態素解析により各表現を形態素に分割し、構文解析により文節間の係り受け関係を求めて、同義表現依存構造木を構築する。
この処理により、図６の表現ＥＸ１から依存構造木ＤＴ−ＥＸ１（図１（ａ））が、表現ＥＸ２から依存構造木ＤＴ−ＥＸ２（図１（ｂ））が、表現ＥＸ３から依存構造木ＤＴ−ＥＸ３（図４（ａ））が、表現ＥＸ４から依存構造木ＤＴ−ＥＸ４（図１６（ａ））が、そして、表現ＥＸ５から依存構造木ＤＴ−ＥＸ５（図１６（ｂ））が構築される。 First, the language analysis means 21 analyzes each expression in the synonym expression dictionary storage unit 31 to construct a synonym expression dependency structure tree.
In the present embodiment, each expression is divided into morphemes by morphological analysis, and the dependency relation between clauses is obtained by syntactic analysis to construct a synonymous expression dependent structure tree.
By this processing, the dependency structure tree DT-EX1 (FIG. 1(a)) is constructed from the expression EX1 in FIG. 6, DT-EX2 (FIG. 1(b)) from the expression EX2, DT-EX3 (FIG. 4(a)) from the expression EX3, DT-EX4 (FIG. 16(a)) from the expression EX4, and DT-EX5 (FIG. 16(b)) from the expression EX5.

同義表現辞書記憶部３１中のすべての表現に対して同義表現依存構造木の構築が終了すると、続いて、言語解析手段２１が、テキスト集合記憶部３２中のテキストに含まれる各文を解析して対象文依存構造木を構築し、同義表現識別手段２２が、一致部分木が対象文依存構造木に含まれているかどうかを識別し、同義表現節点生成手段２３が、一致部分木が含まれる箇所に対応づけて同義表現節点を生成し、節点置換手段２４が、一致部分木に含まれる全節点を同義表現節点により置換する処理を行う。 When the construction of the synonymous expression dependency structure tree is completed for all the expressions in the synonym expression dictionary storage unit 31, the language analysis unit 21 subsequently analyzes each sentence included in the text in the text set storage unit 32. Then, the target sentence dependency structure tree is constructed, the synonym expression identifying unit 22 identifies whether or not the matching subtree is included in the target sentence dependency structure tree, and the synonym expression node generating unit 23 includes the matching subtree. A synonymous expression node is generated in association with the location, and the node replacement means 24 performs a process of replacing all the nodes included in the matching subtree with the synonymous expression node.

ここでは、テキスト中の文Ｓ１「メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた」に対する上記処理の例を説明する。
まず、言語解析手段２１が、この文を解析して対象文依存構造木を構築する。この結果、図２に示す依存構造木ＤＴ−Ｓ１が構築される。
次に、同義表現識別手段２２が、同義表現依存構造木ＤＴ−ＥＸ１（図１（ａ））、ＤＴ−ＥＸ２（図１（ｂ））、ＤＴ−ＥＸ３（図４（ａ））、ＤＴ−ＥＸ４（図１６（ａ））、および、ＤＴ−ＥＸ５（図１６（ｂ））のそれぞれと、文Ｓ１に対応する依存構造木ＤＴ−Ｓ１（図２）とを照合し、依存構造木ＤＴ−Ｓ１中のどの箇所に、どの表現に対応する依存構造木が部分木として含まれているかを識別する。 Here, an example of the above processing is described for the sentence S1 in the text, “I looked up on the Web how to make the characters that display mail as small as possible”.
First, the language analysis means 21 analyzes this sentence and constructs a target sentence dependency structure tree. As a result, the dependency structure tree DT-S1 shown in FIG. 2 is constructed.
Next, the synonymous expression identifying means 22 collates each of the synonymous expression dependency structure trees DT-EX1 (FIG. 1(a)), DT-EX2 (FIG. 1(b)), DT-EX3 (FIG. 4(a)), DT-EX4 (FIG. 16(a)), and DT-EX5 (FIG. 16(b)) with the dependency structure tree DT-S1 (FIG. 2) corresponding to the sentence S1, and identifies at which place in DT-S1 the dependency structure tree of which expression is included as a subtree.

照合の結果、依存構造木ＤＴ−Ｓ１中には、図３に示すように、依存構造木ＤＴ−ＥＸ１が部分木ＰＴ１として含まれているが、依存構造木ＤＴ−ＥＸ２、ＤＴ−ＥＸ３、ＤＴ−ＥＸ４、および、ＤＴ−ＥＸ５は含まれていないことが識別される。
同義表現節点生成手段２３は、図３の部分木ＰＴ１に対応づけて、同義表現節点を生成する。同義表現節点には、適合した依存構造木に対応する表現が属する同義表現グループを表す識別子をラベルとして付与する。 As a result of the collation, the dependency structure tree DT-S1 includes the dependency structure tree DT-EX1 as a subtree PT1 as shown in FIG. 3, but the dependency structure trees DT-EX2, DT-EX3, DT are included. -It is identified that EX4 and DT-EX5 are not included.
The synonymous expression node generation means 23 generates a synonymous expression node in association with the subtree PT1 in FIG. 3. The synonymous expression node is given, as its label, an identifier representing the synonymous expression group to which the expression corresponding to the matched dependency structure tree belongs.

依存構造木ＤＴ−ＥＸ１は、表現ＥＸ１に対応するものであり、表現ＥＸ１は同義表現グループＧ１に属するため、ここでは、「Ｇ１」というラベルを付与する（図１７）。なお、このラベルは、言語解析手段２１によって構築された依存構造木にはじめから存在していた節点のラベルとは異なる特別なラベルである。図中では「Ｇ１」に下線を引くことで、同義表現節点生成手段２３によって生成された同義表現節点であることを示している。 The dependency structure tree DT-EX1 corresponds to the expression EX1, and since the expression EX1 belongs to the synonym expression group G1, a label “G1” is given here (FIG. 17). This label is a special label that is different from the node label that originally existed in the dependency structure tree constructed by the language analysis means 21. In the drawing, “G1” is underlined to indicate that it is a synonym expression node generated by the synonym expression node generation means 23.

次に、節点置換手段２４が、依存構造木ＤＴ−Ｓ１の部分木ＰＴ１に含まれる全節点を、同義表現節点生成手段２３によって生成された節点「Ｇ１」によって置換する（図７）。節点置換手段２４は、始点と終点の両方が同一の節点に置換される枝ＢＲ１ないし３を削除し、始点と終点の一方の節点のみが置換される枝ＢＲ４ないし６はそのまま残す。この結果、依存構造木ＤＴ−Ｓ１は、依存構造木ＤＴ−Ｓ１Ｒ（図１８（ａ））へと変換される。 Next, the node replacement unit 24 replaces all nodes included in the subtree PT1 of the dependency structure tree DT-S1 with the node “G1” generated by the synonymous expression node generation unit 23 (FIG. 7). The node replacement unit 24 deletes the branches BR1 to BR3 in which both the start point and the end point are replaced with the same node, and leaves the branches BR4 to BR6 in which only one of the start point and the end point is replaced. As a result, the dependency structure tree DT-S1 is converted into the dependency structure tree DT-S1R (FIG. 18A).

言語解析手段２１、同義表現識別手段２２、同義表現節点生成手段２３、および、節点置換手段２４がこのように動作することで、文Ｓ１「メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた」から依存構造木ＤＴ−Ｓ１（図２）が構築され、最終的に依存構造木ＤＴ−Ｓ１Ｒ（図１８（ａ））に変換される。このようにして生成された依存構造木ＤＴ−Ｓ１Ｒが、特徴部分木抽出手段２５が特徴的な部分木を抽出する対象となる。 With the language analysis means 21, the synonymous expression identifying means 22, the synonymous expression node generation means 23, and the node replacement means 24 operating in this way, the dependency structure tree DT-S1 (FIG. 2) is constructed from the sentence S1 “I looked up on the Web how to make the characters that display mail as small as possible” and is finally converted into the dependency structure tree DT-S1R (FIG. 18(a)). The dependency structure tree DT-S1R generated in this way is the target from which the feature subtree extraction means 25 extracts characteristic subtrees.

同様の処理により、文Ｓ２「メールを表示する行数を２倍に増やす」からは、依存構造木ＤＴ−Ｓ２（図４（ｂ））が構築され、最終的に依存構造木ＤＴ−Ｓ２Ｒ（図１８（ｂ））が生成される。さらに、文Ｓ３「メールを小さな文字で別な画面に表示する」からは、依存構造木ＤＴ−Ｓ３（図１９（ａ））が構築され、最終的に依存構造木ＤＴ−Ｓ３Ｒ（図１９（ｂ））が生成される。 By the same process, from the sentence S2 “double the number of lines for displaying mail”, the dependency structure tree DT-S2 (FIG. 4B) is constructed, and finally the dependency structure tree DT-S2R ( FIG. 18B) is generated. Furthermore, from the sentence S3 “display mail on another screen with small characters”, the dependency structure tree DT-S3 (FIG. 19A) is constructed, and finally the dependency structure tree DT-S3R (FIG. 19 ( b)) is generated.

依存構造木ＤＴ−Ｓ１Ｒ（図１８（ａ））、依存構造木ＤＴ−Ｓ２Ｒ（図１８（ｂ））、依存構造木ＤＴ−Ｓ３Ｒ（図１９（ｂ））を比較すると、これらの依存構造木は、いずれも節点「Ｇ１」を含む。これは、同義表現識別手段２２、同義表現節点生成手段２３および節点置換手段２４の処理により、文Ｓ１で使われている表現ＥＸ１、文Ｓ２で使われている表現ＥＸ２および文Ｓ３で使われている表現ＥＸ３の差異が吸収され、いずれも、依存構造木中で単一の節点「Ｇ１」として表されるようになったことを示している。 Comparing the dependency structure trees DT-S1R (FIG. 18(a)), DT-S2R (FIG. 18(b)), and DT-S3R (FIG. 19(b)), all of them include the node “G1”. This shows that, through the processing of the synonymous expression identifying means 22, the synonymous expression node generation means 23, and the node replacement means 24, the differences among the expression EX1 used in sentence S1, the expression EX2 used in sentence S2, and the expression EX3 used in sentence S3 have been absorbed, and each is now represented as the single node “G1” in its dependency structure tree.

また、文Ｓ４「撮影した画像をメールで２人に送る」からは、依存構造木ＤＴ−Ｓ４（図２０（ａ））が構築され、最終的に依存構造木ＤＴ−Ｓ４Ｒ（図２０（ｂ））が生成される。文Ｓ５「メールにカメラで撮影した画像を添付する」からは、依存構造木ＤＴ−Ｓ５（図２１（ａ））が構築され、最終的に依存構造木ＤＴ−Ｓ５Ｒ（図２１（ｂ））が生成される。 Also, from the sentence S4 “send the photographed image to two people by mail”, the dependency structure tree DT-S4 (FIG. 20A) is constructed, and finally the dependency structure tree DT-S4R (FIG. 20B). )) Is generated. From the sentence S5 “Attach image taken by camera to mail”, dependency structure tree DT-S5 (FIG. 21A) is constructed, and finally dependency structure tree DT-S5R (FIG. 21B). Is generated.

依存構造木ＤＴ−Ｓ４Ｒ（図２０（ｂ））と依存構造木ＤＴ−Ｓ５Ｒ（図２１（ｂ））を比較すると、これらの依存構造木は、いずれも節点「Ｇ２」を含む。これは、同義表現識別手段２２、同義表現節点生成手段２３および節点置換手段２４の処理により、文Ｓ４で使われている表現ＥＸ４および文Ｓ５で使われている表現ＥＸ５の差異が吸収され、いずれも、依存構造木中で単一の節点「Ｇ２」として表されるようになったことを示している。 Comparing the dependency structure tree DT-S4R (FIG. 20(b)) with the dependency structure tree DT-S5R (FIG. 21(b)), both include the node “G2”. This shows that, through the processing of the synonymous expression identifying means 22, the synonymous expression node generation means 23, and the node replacement means 24, the difference between the expression EX4 used in sentence S4 and the expression EX5 used in sentence S5 has been absorbed, and each is now represented as the single node “G2” in its dependency structure tree.

Once the language analysis unit 21, the synonym expression identification unit 22, the synonym expression node generation unit 23, and the node replacement unit 24 have repeated their processing and generated a dependency structure tree for every sentence of every text in the text set storage unit 32, the feature subtree extraction unit 25 extracts characteristic subtrees from all of the generated dependency structure trees.
Here, a subtree that appears 50 or more times across all dependency structure trees after node replacement is extracted as a characteristic subtree. In this case, for example, the feature subtree extraction unit 25 can enumerate every kind of subtree contained in the dependency structure trees, count the occurrences of each, and extract those that occur 50 or more times.
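The enumerate-and-count procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used by the device: the tree representation (a `children` map from a head node to its dependents plus a `label` map) and the function names are assumptions, and exhaustive enumeration grows exponentially with the number of children per node, so a practical system would use a dedicated frequent-subtree mining algorithm.

```python
from collections import Counter
from itertools import product

def rooted_subtrees(node, children, label):
    """Return canonical forms of every connected subtree whose top node is `node`.

    A canonical form is (label, sorted tuple of child canonical forms), so two
    subtrees with the same shape and labels compare equal.
    """
    per_child = [rooted_subtrees(c, children, label) for c in children.get(node, [])]
    forms = []
    # Each child is either left out (None) or contributes one of its own subtrees.
    for combo in product(*[[None] + opts for opts in per_child]):
        kids = tuple(sorted(k for k in combo if k is not None))
        forms.append((label[node], kids))
    return forms

def frequent_subtrees(trees, threshold=50):
    """Count every subtree occurrence over all trees; keep those seen >= threshold times."""
    counts = Counter()
    for children, label in trees:
        for node in label:
            counts.update(rooted_subtrees(node, children, label))
    return {form: n for form, n in counts.items() if n >= threshold}
```

The threshold defaults to the value 50 used in this operation example; passing a smaller value makes the function easy to try on a handful of toy trees.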

When extraction of the characteristic subtrees is finished, the extraction result output unit 26 outputs the extraction results in order. Nodes that were present from the start in the dependency structure tree constructed by the language analysis unit 21 are output as they are. For nodes generated by the synonym expression node generation unit 23, the label is replaced with an expression corresponding to the associated synonym expression group before output.
In this operation example, the label of a node generated by the synonym expression node generation unit 23 is replaced with the first expression in the corresponding synonym expression group.

Suppose that the extraction results include the dependency structure tree DT-R1 shown in FIG. 22(a). The following describes how the extraction result output unit 26 outputs this tree.
In the dependency structure tree DT-R1, the node "mail" (メール) was present from the start in the dependency structure tree constructed by the language analysis unit 21, whereas the node "G1" was generated by the synonym expression node generation unit 23. The first expression in synonym expression group G1 (see FIG. 6) is expression EX1, "make the displayed characters smaller" (表示する文字を小さくする), so the label of node "G1" is replaced with expression EX1 and the result is output as shown in FIG. 22(b).
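The label substitution at output time can be sketched as below. The node encoding (a `(kind, label)` pair with kind `"word"` or `"synonym"`) and the dictionary layout are assumptions made for illustration. Only the wording of EX1 appears in this passage, so the other group members are shown as placeholders.

```python
def output_labels(nodes, groups):
    """Return display labels for the nodes of an extracted subtree.

    Original nodes keep their label; a synonym expression node is shown as the
    first expression of its synonym expression group.
    """
    labels = []
    for kind, label in nodes:
        if kind == "synonym":
            labels.append(groups[label][0])
        else:
            labels.append(label)
    return labels

# Hypothetical dictionary contents: "<EX2>" and "<EX3>" are placeholders.
groups = {"G1": ["表示する文字を小さくする", "<EX2>", "<EX3>"]}
dt_r1 = [("word", "メール"), ("synonym", "G1")]
print(output_labels(dt_r1, groups))
```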

The part of sentence S1 corresponding to expression EX1, the part of sentence S2 corresponding to expression EX3, and the part of sentence S3 corresponding to expression EX2 are all represented by the same node "G1", and characteristic subtrees are extracted with these parts treated as identical. Likewise, the part of sentence S4 corresponding to expression EX4 and the part of sentence S5 corresponding to expression EX5 are both represented by the same node "G2" and are treated as identical during extraction. Moreover, because the label of each added node is replaced at output time with a suitable expression corresponding to the expressions that the node unifies, the user can easily understand the results even when expressions have been unified.
In addition, the nodes "G1" and "G2" differ from the nodes of the original dependency structure tree, whose labels are the independent words of each phrase converted to their terminal (dictionary) form. They can therefore be distinguished from the original nodes, and subtrees that are not actually characteristic are not extracted by mistake.

As described above, in the text mining device 10, before feature structures are extracted, the node replacement unit 24 replaces each matching subtree of the target sentence dependency structure tree with the synonym expression node generated by the synonym expression node generation unit 23. The feature subtree extraction unit 25 can therefore extract characteristic subtrees while treating synonymous expressions as identical.
The structure after replacement is still a tree, and replacement never increases the number of nodes, so the cost of extracting characteristic subtrees does not increase compared with before replacement.
Furthermore, when the structure of the parts corresponding to synonymous expressions is unified in the target sentence dependency structure tree, those parts are not converted into another dependency structure tree but are replaced with a single node, the synonym expression node. Synonymous expressions can therefore be treated as identical without writing conversion rules between dependency structure trees. It is also possible to unify synonymous expressions whose dependency structure trees cannot be converted into one another.
Finally, the synonym expression nodes added by the node replacement unit are special nodes distinct from the existing nodes, so they do not affect the extraction of subtrees that consist only of existing nodes. As a result, the side effects of unifying synonymous expressions are kept low.
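As a rough sketch of the node replacement summarized above, the following collapses a matched subtree into one synonym expression node. It assumes a tree stored as a list of `(start, end)` branches from a dependent to its head, with node labels unique within the sentence so that labels can serve as node identifiers. The branches inside the matched subtree are a guess at the figure contents, not taken from the text.

```python
def replace_with_synonym_node(edges, matched, synonym):
    """Collapse the nodes in `matched` into the single node `synonym`."""
    new_edges = []
    for start, end in edges:
        s = synonym if start in matched else start
        e = synonym if end in matched else end
        if s != e:  # branches internal to the matched subtree disappear
            new_edges.append((s, e))
    return new_edges

# Sentence S1; the three internal branches of the matched part are assumed.
edges_s1 = [
    ("メール", "表示する"), ("表示する", "文字"), ("文字", "する"),
    ("できるだけ", "小さい"), ("小さい", "する"), ("する", "方法"),
    ("方法", "調べる"), ("ＷＥＢ", "調べる"),
]
matched = {"表示する", "文字", "小さい", "する"}
replaced = replace_with_synonym_node(edges_s1, matched, "G1")
```

In this toy input the tree shrinks from nine nodes to six, which illustrates why replacement does not add to the extraction cost.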

Next, the configuration and operation of the text mining device 11 according to the second embodiment of the present invention are described with reference to the drawings. Because the text mining device 11 shares many elements with the text mining device 10, the shared elements are given the same reference numerals and their description is omitted.

(Configuration of text mining device 11)
FIG. 9 is a schematic functional block diagram of the text mining device 11.
The text mining device 11 differs from the text mining device 10 of FIG. 5 in that its data processing device 29 is configured differently from the data processing device 20. In place of the node replacement unit 24 and the feature subtree extraction unit 25 of FIG. 5, the data processing device 29 has a bundle forming unit (node addition unit and exclusive relation branch addition unit) 27 and a bundle feature subtree extraction unit (feature subtree extraction unit) 28.

When the target sentence dependency structure tree contains a matching subtree, the bundle forming unit 27 adds a synonym expression node. It further adds to the dependency structure tree the branches that would result if all nodes in the matching subtree were replaced with the synonym expression node.
That is, if the original dependency structure tree contains a branch whose start point is a node inside the matching subtree and whose end point is a node outside it, a branch from the synonym expression node to that end point is added. Likewise, if the original dependency structure tree contains a branch whose start point is a node outside the matching subtree and whose end point is a node inside it, a branch from that start point to the synonym expression node is added.

At this time, the bundle forming unit 27 connects each node in the matching subtree to the synonym expression node with an exclusive relation branch, thereby associating them as mutually exclusive nodes.
Through the processing of the bundle forming unit 27, the dependency structure tree is converted from a tree structure into a bundle structure. Hereinafter, the converted structure is called a dependency structure bundle.

FIG. 10 shows an example of processing by the bundle forming unit 27. In the dependency structure tree DT-S1 (FIG. 2) for sentence S1, "I looked up on the web how to make the characters used to display mail as small as possible" (メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた), the node "G1" has been generated for the location (PT1 in FIG. 10) where the dependency structure tree DT-EX1 for expression EX1, "make the displayed characters smaller" (表示する文字を小さくする), is contained as a subtree. The example shows how branches are added in this situation.
In this example, there is a branch BR6 whose start point is the node "do" (する) inside PT1 and whose end point is the node "method" (方法) outside PT1, so a branch BR7 from node "G1" to node "method" is added.
There are also a branch BR4 from the node "mail" (メール) outside PT1 to the node "display" (表示する) inside it, and a branch BR5 from the node "as much as possible" (できるだけ) outside PT1 to the node "small" (小さい) inside it, so branches BR8 and BR9 are added from node "mail" and node "as much as possible", respectively, to node "G1".
Finally, the four nodes in PT1, namely "display" (表示する), "character" (文字), "small" (小さい), and "do" (する), are each connected to node "G1" by the exclusive relation branches BR10 to BR13 and associated as mutually exclusive nodes. In FIG. 10, exclusive relations are drawn as dotted lines.
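The construction walked through above (add boundary-crossing branches to the synonym expression node, then record exclusive relations) can be sketched as follows. The representation is an assumption: branches are `(start, end)` pairs, exclusive relations are unordered pairs stored as `frozenset`s, and node labels double as identifiers. The three branches inside PT1 are guessed, since the text only names BR4, BR5, and BR6.

```python
def build_bundle(edges, matched, synonym):
    """Add a synonym node to a dependency tree without removing any original node.

    Returns (all branches including the added ones, set of exclusive pairs).
    """
    added = []
    for start, end in edges:
        if start in matched and end not in matched:
            added.append((synonym, end))      # e.g. BR6 gives rise to BR7
        elif start not in matched and end in matched:
            added.append((start, synonym))    # e.g. BR4 and BR5 give rise to BR8 and BR9
    exclusive = {frozenset((node, synonym)) for node in matched}
    return edges + added, exclusive

edges_s1 = [
    ("メール", "表示する"),      # BR4
    ("できるだけ", "小さい"),    # BR5
    ("する", "方法"),            # BR6
    ("表示する", "文字"), ("文字", "する"), ("小さい", "する"),  # inside PT1 (assumed)
    ("方法", "調べる"), ("ＷＥＢ", "調べる"),
]
pt1 = {"表示する", "文字", "小さい", "する"}
bundle_edges, exclusive = build_bundle(edges_s1, pt1, "G1")
```

Unlike node replacement, the original nodes stay in place, which is what later allows one node to take part in more than one synonymous expression.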

Here, mutually exclusive nodes are nodes that cannot both be interpreted as appearing at the same time. A synonym expression node generated for a matching subtree by the synonym expression node generation unit 23 corresponds to the interpretation that the whole subtree, taken together, is an occurrence of an expression in the synonym expression dictionary.
Each node in the matching subtree, on the other hand, corresponds to the interpretation that the components of that expression appear individually.
The two cannot be interpreted as appearing at the same time, so the bundle forming unit 27 associates them as mutually exclusive nodes.

If a synonym expression node that is already exclusive with a node in the matching subtree exists, that synonym expression node is also associated with the newly generated synonym expression node as mutually exclusive. This is because the expressions represented by the two share the same node as a component, so both cannot be interpreted as appearing at the same time.

The bundle feature subtree extraction unit 28 extracts characteristic subtrees from the dependency structure trees constructed by the language analysis unit 21 and the dependency structure bundles formed by the bundle forming unit 27. When a dependency structure bundle has been formed from a dependency structure tree, the original dependency structure tree is not used for extracting characteristic subtrees. Subtrees that contain two or more mutually exclusive nodes are not extracted.
For example, in the dependency structure bundle DL-S1 shown in FIG. 11, the nodes "display" (表示する), "character" (文字), "small" (小さい), and "do" (する) are each connected to node G1 by an exclusive relation branch. The dependency structure tree PT2 shown in FIG. 12 contains the mutually exclusive nodes "do" and "G1", so the bundle feature subtree extraction unit 28 does not extract it.
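The exclusion rule can be expressed as a simple filter over candidate subtrees. The sketch below assumes exclusive relations are stored as unordered pairs (`frozenset`s) and that a candidate is given as its set of nodes. The node set used for PT2 is hypothetical, since the text only states that PT2 contains "do" (する) and "G1".

```python
from itertools import combinations

def is_extractable(subtree_nodes, exclusive):
    """Return False if any two nodes of the candidate subtree are mutually exclusive."""
    return not any(
        frozenset(pair) in exclusive for pair in combinations(subtree_nodes, 2)
    )

exclusive = {frozenset((n, "G1")) for n in ("表示する", "文字", "小さい", "する")}
pt2 = {"する", "G1", "方法"}      # hypothetical node set for PT2
ok = {"メール", "G1", "方法"}     # mixes G1 only with nodes outside the match
```

With these inputs `is_extractable(pt2, exclusive)` is false and `is_extractable(ok, exclusive)` is true.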

In this embodiment as well, general data mining techniques can be used to decide whether a subtree is characteristic. For example, a subtree that appears at least a predetermined threshold number of times across all dependency structure trees can be extracted as a characteristic subtree. Alternatively, a subtree can be extracted as characteristic when the number of texts that contain at least one sentence whose dependency structure tree includes the subtree is at or above a predetermined threshold. As a further alternative, a subtree can be extracted as characteristic when the correlation between whether a text belongs to a predetermined set and whether the subtree appears in the dependency structure trees of that text's sentences is at or above a predetermined threshold.
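Two of these criteria, the per-text count and the correlation with membership in a text set, can be sketched as follows. The data layout (each text is a list of sentences, each sentence a set of subtree identifiers) is an assumption, and the phi coefficient is only one possible correlation measure; the description does not say which measure is intended.

```python
def document_frequency(texts, subtree):
    """Number of texts in which at least one sentence contains the subtree."""
    return sum(any(subtree in sentence for sentence in text) for text in texts)

def phi_correlation(texts, in_set, subtree):
    """Phi coefficient between 'text belongs to the set' and 'subtree appears in the text'."""
    n11 = n10 = n01 = n00 = 0
    for text, member in zip(texts, in_set):
        present = any(subtree in sentence for sentence in text)
        if member and present:
            n11 += 1
        elif member:
            n10 += 1
        elif present:
            n01 += 1
        else:
            n00 += 1
    denom = ((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)) ** 0.5
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0
```

A subtree would then be reported when `document_frequency(...)` or `phi_correlation(...)` reaches the chosen threshold.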

In this embodiment as well, as in the first embodiment, the criteria for deciding whether a subtree is characteristic (for example, the occurrence count threshold, the set of texts against which correlation is computed, or the correlation threshold) may be entered by the user through the input device 1. Several decision methods may also be provided so that the user can choose among them. In addition, the user may be allowed to specify conditions on the subtrees to be extracted and on the texts from which subtrees are extracted.

Next, the operation of the text mining device 11 is described in detail.
FIG. 13 is a flowchart showing the operation of the text mining device 11. Steps A1 to A6 are the same as in the text mining device 10.

In step B1, when the target sentence dependency structure tree constructed in step A3 contains a matching subtree corresponding to a synonym expression dependency structure tree, the bundle forming unit 27 adds branches between nodes of the target sentence dependency structure tree and the synonym expression node generated in step A6, forming a dependency structure bundle. Specifically, if the original dependency structure tree contains a branch from a node inside the matching subtree to a node outside it, a branch from the synonym expression node to that end point is added. If the original dependency structure tree contains a branch from a node outside the matching subtree to a node inside it, a branch from that start point to the synonym expression node is added.
In step B2, the bundle forming unit 27 associates each node in the matching subtree and the synonym expression node as mutually exclusive nodes.
In step B3, the bundle feature subtree extraction unit 28 extracts characteristic subtrees from the dependency structure tree or dependency structure bundle corresponding to each sentence. However, subtrees that contain two or more mutually exclusive nodes are not extracted.

Next, a specific operation example of the text mining device 11 is described.
As in the operation example of the text mining device 10, this example uses as the dependency structure tree a tree in which each phrase is a node, the node label is the independent word of the phrase converted to its terminal (dictionary) form, and the dependency relations between phrases are the branches.
The synonym expression dictionary storage unit 31 stores the contents shown in FIG. 23 in advance.
The text set storage unit 32 stores in advance the texts to be mined, with the contents shown in FIG. 15.

In this example as well, as in the text mining device 10, the language analysis unit 21 first analyzes each expression in the synonym expression dictionary storage unit 31 and constructs the synonym expression dependency structure trees.
This processing constructs the dependency structure tree DT-EX1 (FIG. 1(a)) from expression EX1, DT-EX2 (FIG. 1(b)) from expression EX2, DT-EX3 (FIG. 4(a)) from expression EX3, DT-EX6 (FIG. 24(a)) from expression EX6, and DT-EX7 (FIG. 24(b)) from expression EX7.

Next, the language analysis unit 21 analyzes each sentence in the texts in the text set storage unit 32 and constructs the target sentence dependency structure trees. The synonym expression identification unit 22 identifies whether a target sentence dependency structure tree contains a matching subtree corresponding to an expression in the synonym expression dictionary, and the synonym expression node generation unit 23 generates a synonym expression node in association with each matching subtree.

First, the processing of sentence S1 in the text, "I looked up on the web how to make the characters used to display mail as small as possible" (メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた), is described as an example.
The language analysis unit 21 first constructs the dependency structure tree DT-S1 (FIG. 2). The synonym expression identification unit 22 then matches each of the dependency structure trees DT-EX1 (FIG. 1(a)), DT-EX2 (FIG. 1(b)), DT-EX3 (FIG. 4(a)), DT-EX6 (FIG. 24(a)), and DT-EX7 (FIG. 24(b)) against DT-S1 in turn.
As a result, the dependency structure tree DT-EX1 (FIG. 1(a)) is identified as being contained in DT-S1 as a subtree (FIG. 3), and the synonym expression node generation unit 23 generates a new synonym expression node "G1" associated with that part (FIG. 17). Up to this point, the processing is the same as in the first example.
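The matching step, checking whether a dictionary tree such as DT-EX1 occurs as a subtree of a sentence tree such as DT-S1, can be sketched as below. The encoding is assumed for illustration: each tree is a pair of a `children` map (head to list of dependents) and a `label` map, and labels double as node identifiers. The sibling assignment is greedy and does not backtrack, which is adequate when sibling labels are distinct but is a simplification in general. The shapes given for DT-EX1 and DT-S1 are guesses consistent with branches BR4 to BR6, not read from the figures.

```python
def match_at(p, t, pattern, target):
    """Try to embed the pattern subtree rooted at p at target node t.

    Returns a dict mapping pattern nodes to target nodes, or None on failure.
    """
    p_children, p_label = pattern
    t_children, t_label = target
    if p_label[p] != t_label[t]:
        return None
    mapping = {p: t}
    used = set()
    for pc in p_children.get(p, []):
        for tc in t_children.get(t, []):
            if tc in used:
                continue
            sub = match_at(pc, tc, pattern, target)
            if sub is not None:
                mapping.update(sub)
                used.add(tc)
                break
        else:
            return None  # no dependent of t matches this pattern child
    return mapping

def find_matches(pattern, p_root, target):
    """Return the node set of every place in the target where the pattern matches."""
    found = []
    for t in target[1]:
        m = match_at(p_root, t, pattern, target)
        if m is not None:
            found.append(set(m.values()))
    return found

def tree(children):
    """Helper: build (children, label) with labels used as node ids."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    return children, {n: n for n in nodes}

dt_ex1 = tree({"する": ["文字", "小さい"], "文字": ["表示する"]})
dt_s1 = tree({
    "調べる": ["方法", "ＷＥＢ"], "方法": ["する"], "する": ["文字", "小さい"],
    "文字": ["表示する"], "表示する": ["メール"], "小さい": ["できるだけ"],
})
matches = find_matches(dt_ex1, "する", dt_s1)
```

Each returned node set plays the role of a matching subtree (PT1 in the example) for which a synonym expression node is then generated.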

Next, the bundle forming unit 27 adds branches between the synonym expression node "G1" and nodes of the original dependency structure tree DT-S1 to form a dependency structure bundle (FIG. 10). Focusing on the matching subtree PT1 that matches the dependency structure tree DT-EX1, if DT-S1 contains a branch from a node inside PT1 to a node outside PT1, a branch from node "G1" to that end point is added. If DT-S1 contains a branch from a node outside PT1 to a node inside PT1, a branch from that start point to node "G1" is added.
In FIG. 10, there is a branch BR6 whose start point is the node "do" (する) inside the subtree PT1 that matches the dependency structure tree DT-EX1 and whose end point is the node "method" (方法) outside it, so a branch BR7 from node "G1" to node "method" is added. There are also a branch BR4 from the node "mail" (メール) outside PT1 to the node "display" (表示する) inside it, and a branch BR5 from the node "as much as possible" (できるだけ) outside PT1 to the node "small" (小さい) inside it. The bundle forming unit 27 therefore adds branches BR8 and BR9 from node "mail" and node "as much as possible" to node "G1", forming the dependency structure bundle DL-S1 (FIG. 11).

The bundle forming unit 27 further associates each node in the subtree matching the dependency structure tree DT-EX1 with node "G1" as mutually exclusive nodes. As a result, the four nodes that were inside the dotted region in FIG. 17, namely "do" (する), "small" (小さい), "character" (文字), and "display" (表示する), are associated with node "G1" as mutually exclusive nodes. In FIG. 11, exclusive relations are drawn as dotted lines.
The synonym expression identification unit 22 continues matching the dependency structure trees for the expressions in the synonym expression dictionary against the dependency structure bundle DL-S1 (FIG. 11) formed in this way, but no other dependency structure tree is contained in it. DL-S1 therefore becomes the final target from which the bundle feature subtree extraction unit 28 extracts characteristic subtrees.

Next, suppose that the text set storage unit 32 contains a text that includes sentence S3, given here as "メールを表示する文字をできるだけ小さくする方法をＷＥＢで調べた", and consider the processing of this sentence S3.
The language analysis unit 21 first constructs the dependency structure tree DT-S3 (FIG. 19(a)). The synonym expression identification unit 22 then matches each of the dependency structure trees DT-EX1 (FIG. 1(a)), DT-EX2 (FIG. 1(b)), DT-EX3 (FIG. 4(a)), DT-EX6 (FIG. 24(a)), and DT-EX7 (FIG. 24(b)) against DT-S3 in turn.
As a result, the dependency structure tree DT-EX2 (FIG. 1(b)) is identified as being contained in DT-S3 as the subtree PT3, and the synonym expression node generation unit 23 generates a new node "G1" associated with that part (FIG. 25(a)).

Next, the bundle forming unit 27 adds branches between the synonym expression node "G1" and nodes of the original dependency structure tree DT-S3 to form a dependency structure bundle. Focusing on the matching subtree PT3 that matches the dependency structure tree DT-EX2, if DT-S3 contains a branch from a node inside PT3 to a node outside PT3, a branch from node "G1" to that end point is added. If DT-S3 contains a branch from a node outside PT3 to a node inside PT3, a branch from that start point to node "G1" is added.
In FIG. 25(a), there are a branch BR14 from the node "mail" (メール) outside the subtree PT3 to the node "display" (表示する) inside it, and a branch BR15 from the node "screen" (画面) outside PT3 to the node "display" inside it. The bundle forming unit 27 therefore adds branches BR16 and BR17 from node "mail" and node "screen" to node "G1", forming the dependency structure bundle DL-S3A (FIG. 25(b)).

The bundle forming unit 27 further associates each node in the subtree PT3 with node "G1" as mutually exclusive nodes. As a result, the three nodes that were inside the subtree PT3 in FIG. 25(a), namely "display" (表示する), "character" (文字), and "small" (小さな), are connected to node "G1" by the exclusive relation branches BR18, BR19, and BR20 and associated as mutually exclusive nodes.
The synonym expression identification unit 22 continues matching the dependency structure trees for the expressions in the synonym expression dictionary against the dependency structure bundle DL-S3A (FIG. 25(b)) formed in this way, and identifies that the dependency structure tree DT-EX6 (FIG. 24(a)) for expression EX6 is present in DL-S3A as the subtree PT4. Because expression EX6 belongs to synonym expression group G3, the synonym expression node generation unit 23 generates a new node "G3" associated with that part (FIG. 26).

Using the same processing as when it added branches between nodes of the dependency structure tree DT-S3 and node "G1", the bundle forming unit 27 adds branches between nodes of the dependency structure bundle DL-S3A and node "G3". According to FIG. 26, there are a branch BR21 from the node "mail" (メール) outside the subtree PT4 that matches the dependency structure tree DT-EX6 to the node "display" (表示する) inside it, and a branch BR22 from the node "character" (文字) outside PT4 to the node "display" inside it. The bundle forming unit 27 therefore adds branches BR23 and BR24 from node "mail" and node "character" to node "G3", forming the dependency structure bundle DL-S3B (FIG. 27).

The bundle forming unit 27 further associates each node in the subtree matching the dependency structure tree DT-EX6 with node "G3" as mutually exclusive nodes. As a result, the three nodes that were inside the subtree PT4 in FIG. 26, namely "display" (表示する), "screen" (画面), and "other" (別), are associated with node "G3" as mutually exclusive nodes and connected to it by the exclusive relation branches BR25, BR26, and BR27, respectively.
At this point, among the nodes in the subtree PT4, the node "display" already has node "G1" associated with it as a mutually exclusive node. Node "G1" and node "G3" are therefore also associated as mutually exclusive nodes and connected by the exclusive relation branch BR28.
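The way an existing synonym expression node becomes exclusive with a newly generated one can be sketched as follows. The sketch assumes exclusive relations are stored as unordered pairs (`frozenset`s) and that the set of synonym expression nodes created so far is tracked separately. The function name and data layout are illustrative, not taken from the patent.

```python
def add_exclusive(exclusive, matched, new_syn, synonym_nodes):
    """Return the exclusive relations after adding synonym node `new_syn` for `matched`.

    Any earlier synonym node that is already exclusive with a matched node
    becomes exclusive with `new_syn` as well.
    """
    inherited = {
        other
        for pair in exclusive
        if pair & matched            # the pair involves a node of the new match
        for other in pair
        if other in synonym_nodes    # keep only the synonym node of that pair
    }
    updated = set(exclusive)
    updated |= {frozenset((node, new_syn)) for node in matched}
    updated |= {frozenset((syn, new_syn)) for syn in inherited}
    return updated

# State after G1 was added for PT3 (BR18 to BR20).
exclusive = {frozenset((n, "G1")) for n in ("表示する", "文字", "小さな")}
pt4 = {"表示する", "画面", "別"}
exclusive = add_exclusive(exclusive, pt4, "G3", {"G1"})
```

After the call, the relation set contains the three pairs for BR25 to BR27 and the pair of "G1" and "G3" corresponding to BR28, in addition to the original three.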

In this example, in the dependency structure tree DT-S3 (FIG. 19(a)), the subtree consisting of the nodes "small" (小さな), "character" (文字), and "display" (表示する) is identified as matching the dependency structure tree DT-EX2, and at the same time the subtree consisting of the nodes "other" (別), "screen" (画面), and "display" is identified as matching the dependency structure tree DT-EX6. Because the nodes generated by the synonym expression node generation unit 23 never replace nodes of the original dependency structure tree in this example, the single node "display" can be treated both as part of expression EX2 and as part of expression EX6.

こうして構成された依存構造束ＤＬ−Ｓ３Ｂ（図２７）に対して、同義表現識別手段２２による同義表現辞書中の表現に対応する依存構造木との照合が続けられるが、他に含まれている依存構造木は存在しないため、最終的にこの依存構造束ＤＬ−Ｓ３Ｂが、束用特徴部分木抽出手段２８が特徴的な部分木を抽出する対象となる。
このようにして、同義表現識別手段２２、同義表現節点生成手段２３および束構成手段２７の処理により、表現ＥＸ１、表現ＥＸ２および表現ＥＸ３が使われている箇所に対して節点「Ｇ１」が新たに依存構造木中に追加される。表現ＥＸ１、表現ＥＸ２および表現ＥＸ３は、いずれも単一の節点「Ｇ１」として表され、それらの差異が吸収される。同様に、表現ＥＸ６および表現ＥＸ７も単一の節点「Ｇ３」として表され、それらの差異が吸収される。 The synonym expression identification unit 22 continues to collate the dependency structure bundle DL-S3B (FIG. 27) configured in this way against the dependency structure trees corresponding to the expressions in the synonym expression dictionary. Because the bundle contains no other matching dependency structure tree, the dependency structure bundle DL-S3B finally becomes the target from which the bundle feature subtree extraction unit 28 extracts characteristic subtrees.
In this way, through the processing of the synonym expression identification unit 22, the synonym expression node generation unit 23, and the bundle forming unit 27, the node “G1” is newly added to the dependency structure tree at each place where the expression EX1, EX2, or EX3 is used. The expressions EX1, EX2, and EX3 are all represented by the single node “G1”, which absorbs their differences. Similarly, the expressions EX6 and EX7 are both represented by the single node “G3”, which absorbs their differences.

言語解析手段２１、同義表現識別手段２２、同義表現節点生成手段２３および束構成手段２７が処理を繰り返し、テキスト集合記憶部３２中の各テキスト中の文すべてに対して依存構造木または依存構造束を生成すると、束用特徴部分木抽出手段２８が、生成された依存構造木または依存構造束を対象として特徴的な部分木を抽出する。このとき、依存構造木から依存構造束が構成されている場合、元の依存構造木は特徴的な部分木の抽出に使用しない。また、互いに排他関係にある複数の節点を含む部分木は抽出しない。 The language analysis unit 21, the synonym expression identification unit 22, the synonym expression node generation unit 23, and the bundle forming unit 27 repeat this processing. Once a dependency structure tree or a dependency structure bundle has been generated for every sentence of every text in the text set storage unit 32, the bundle feature subtree extraction unit 28 extracts characteristic subtrees from the generated dependency structure trees and dependency structure bundles. When a dependency structure bundle has been constructed from a dependency structure tree, the original dependency structure tree is not used for the extraction. In addition, no subtree that contains a plurality of mutually exclusive nodes is extracted.

ここでは、特徴的な部分木を抽出する対象となる依存構造木および依存構造束中で計５０回以上出現する部分木を特徴的な部分木として抽出するものとする。この場合、例えば、依存構造木および依存構造束中に含まれる部分木を全種類列挙して、それぞれの出現回数をカウントし、出現回数が５０回以上の部分木を抽出することができる。
抽出結果出力手段２６が、このようにして抽出された部分木を順に出力する。このとき、言語解析手段２１によって構築された依存構造木にはじめから存在していた節点についてはそのまま出力し、同義表現節点生成手段２３によって生成された節点については、対応する同義表現グループに応じた表現にラベルを置換して出力する。 Here, a subtree that appears 50 times or more in total in the dependency structure trees and dependency structure bundles subject to extraction is extracted as a characteristic subtree. This can be done, for example, by enumerating every kind of subtree contained in the dependency structure trees and dependency structure bundles, counting the occurrences of each, and extracting the subtrees that occur 50 times or more.
The extraction result output unit 26 outputs the subtrees extracted in this way in order. Nodes that were present from the start in the dependency structure tree constructed by the language analysis unit 21 are output unchanged, whereas the label of each node generated by the synonym expression node generation unit 23 is replaced, before output, with an expression chosen according to the corresponding synonym expression group.
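The enumerate-and-count procedure with a threshold of 50 occurrences can be sketched in Python as below. Several points are assumptions made only for this illustration: the dictionary layout of a bundle, the function names, the representation of a dependency branch as a `(dependent, head)` pair, the cap `max_edges` that keeps the enumeration small, and the identification of a subtree by the labels on its branches (single-node subtrees are ignored). The check for mutually exclusive nodes follows the rule stated in the text.

```python
from collections import Counter
from itertools import combinations


def connected_edge_sets(edges, max_edges):
    """Yield each connected set of 1..max_edges dependency branches once."""
    seen = set()
    frontier = [frozenset([e]) for e in edges]
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        yield current
        if len(current) >= max_edges:
            continue
        touched = {n for e in current for n in e}
        for e in edges:
            if e not in current and (e[0] in touched or e[1] in touched):
                frontier.append(current | {e})


def has_exclusive_pair(node_ids, exclusive):
    """True if two of the nodes are joined by an exclusive relation branch."""
    return any(frozenset(pair) in exclusive for pair in combinations(node_ids, 2))


def frequent_subtrees(bundles, min_count=50, max_edges=3):
    """Count label-level subtrees over all bundles and keep the frequent ones."""
    counts = Counter()
    for bundle in bundles:
        labels = bundle["labels"]
        for edge_set in connected_edge_sets(sorted(bundle["edges"]), max_edges):
            node_ids = {n for e in edge_set for n in e}
            if has_exclusive_pair(node_ids, bundle["exclusive"]):
                continue  # would mix the structures before and after unification
            key = frozenset((labels[d], labels[h]) for d, h in edge_set)
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Toy bundle: "mail" depends on "display"; G1 was added for the matched subtree.
bundle = {
    "labels": {0: "display", 1: "character", 2: "small", 3: "G1", 4: "mail"},
    "edges": {(2, 1), (1, 0), (4, 0), (4, 3)},            # (dependent, head)
    "exclusive": {frozenset((3, n)) for n in (0, 1, 2)},  # G1 vs. matched nodes
}
result = frequent_subtrees([bundle] * 50)
print(len(result))  # number of distinct subtrees reaching the threshold
```

In the toy bundle, any candidate that contains both the branch toward “display” and the branch toward “G1” is skipped, so the counted subtrees never combine the original structure with the unified one.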

テキストマイニング装置１０の場合と同様に、対応する同義表現グループ中の最初の表現を用いてラベルを置換するものとすると、例えば、図２８（ａ）に示す依存構造木ＤＴ−Ｒ２のような抽出結果は、節点「Ｇ３」のラベルが同義表現グループＧ３（図２３参照）の最初の表現である表現ＥＸ６「別の画面に表示する」によって置換され図２８（ｂ）のように結果が出力される。
本実施例においても、テキストマイニング装置１０の場合のように、同義表現辞書中で同一の同義表現グループに属する表現に対応する部分がいずれも同一の節点で表され、これらが同一視された状態で特徴的な部分木の抽出が行われる。また、追加された節点のラベルは、出力時に、それぞれの節点によって同一視された表現に対応する適当な表現に置換されるため、表現の同一視が行われた場合でも、利用者が容易に結果を理解できる。
第１の実施例と同様に、追加される節点「Ｇ１」「Ｇ３」は、元の依存構造木中の節点とは異なるため、誤って特徴的と見なされることはない。 As in the case of the text mining device 10, suppose that each label is replaced with the first expression in the corresponding synonym expression group. Then, for an extraction result such as the dependency structure tree DT-R2 shown in FIG. 28A, the label of the node “G3” is replaced with the expression EX6 “display on another screen”, which is the first expression of the synonym expression group G3 (see FIG. 23), and the result is output as shown in FIG. 28B.
In this embodiment as well, as with the text mining device 10, every part corresponding to an expression that belongs to the same synonym expression group in the synonym expression dictionary is represented by the same node, and characteristic subtrees are extracted with these expressions treated as identical. In addition, because the label of each added node is replaced at output time with a suitable expression corresponding to the expressions that the node unifies, the user can easily understand the results even when expressions have been unified.
Similar to the first embodiment, the added nodes “G1” and “G3” are different from the nodes in the original dependency structure tree, and thus are not mistakenly regarded as characteristic.
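The label replacement performed at output time can be sketched as follows. The function names and the contents of the example group are assumptions for illustration only: the first entry is the expression EX6 quoted in the text, while the second entry is an assumed wording for the expression EX7. The strategy "first" follows the description, and "shortest" and "most_frequent" mirror the alternatives recited in the claims.

```python
def choose_output_expression(group, strategy="first", corpus_counts=None):
    """Pick the expression that represents a synonym expression group."""
    if strategy == "first":
        return group[0]
    if strategy == "shortest":
        return min(group, key=len)
    if strategy == "most_frequent":
        counts = corpus_counts or {}
        return max(group, key=lambda expr: counts.get(expr, 0))
    raise ValueError(f"unknown strategy: {strategy}")


def render_labels(labels, groups, **kwargs):
    """Replace synonym node labels; ordinary labels are output unchanged."""
    return [
        choose_output_expression(groups[label], **kwargs) if label in groups else label
        for label in labels
    ]


# Hypothetical group contents: EX6 comes first, the second wording is assumed.
groups = {"G3": ["display on another screen", "divide the screens to be displayed"]}
print(render_labels(["mail", "G3"], groups))
```

Because the synonym node labels such as "G3" are distinct from ordinary node labels, a plain dictionary lookup is enough to decide which labels need replacing.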

また、表現を同一視する際に、その表現に対応する依存構造木に相当する部分を削除することがないため、その部分からも特徴的な部分木が抽出されうる。例えば、図１１の依存構造束ＤＬ−Ｓ１において、表現ＥＸ１に対応する依存構造木ＤＴ−ＥＸ１（図１（ａ））に含まれる４つの節点「する」「小さい」「文字」「表示する」はそのまま残っており、その部分も特徴的な部分木を抽出する対象となっている。 Further, because equating expressions does not delete the part corresponding to the dependency structure tree of each expression, characteristic subtrees can also be extracted from that part. For example, in the dependency structure bundle DL-S1 of FIG. 11, the four nodes “do”, “small”, “character”, and “display” contained in the dependency structure tree DT-EX1 (FIG. 1A) corresponding to the expression EX1 remain as they are, and that part is also subject to characteristic subtree extraction.

さらに、互いに排他関係にある節点を含む部分木を抽出しないため、統一する前の構造と統一した後の構造の両方を残しておいても、その両方を含むような抽出結果として意味をなさない部分木を抽出することがない。例えば、図１１の依存構造束ＤＬ−Ｓ１において、節点「する」と節点「Ｇ１」とは互いに排他関係にあるため、図１２に示すような抽出結果として意味をなさない部分木は抽出されない。 Furthermore, because subtrees that contain mutually exclusive nodes are not extracted, keeping both the structure before unification and the structure after unification never yields a subtree that contains both and is therefore meaningless as an extraction result. For example, in the dependency structure bundle DL-S1 of FIG. 11, the node “do” and the node “G1” are mutually exclusive, so a meaningless subtree such as the one shown in FIG. 12 is not extracted.

このように、テキストマイニング装置１１では、対象文依存構造木において、同義の表現に対応する部分の構造を統一する際、束構成手段２７は、一致部分木を同義表現節点で置換する代わりに、対象文依存構造木に同義表現節点を追加して束構造を生成し、構造の統一により既存の節点が削除されることを防ぐ。
このため、同義の表現に対応する部分の木構造が失われることがなくなり、束用特徴部分木抽出手段２８は、その部分からも特徴的な部分木を抽出することができる。
また、同義の表現に対応する部分の構造を統一する際、束構成手段２７は、同義表現節点と一致部分木内の各節点とを排他関係枝で結び排他関係にある節点として関連づけておく。そして、束用特徴部分木抽出手段２８は、特徴的な部分木の抽出を行う際に、互いに排他関係にある節点を含む部分木を抽出しない。
このため、統一する前の構造と統一した後の構造の両方を残しておいても、その両方を含むような、抽出結果として意味をなさない部分木を抽出することがない。
このように、本実施の形態によれば、第１の実施の形態の効果に加え、同義表現を統一することによる副作用をさらに低く抑えることができるという効果が得られる。 As described above, in the text mining device 11, when unifying the structure of the part of the target sentence dependency structure tree that corresponds to synonymous expressions, the bundle forming unit 27 does not replace the matching subtree with the synonymous expression node. Instead, it adds the synonymous expression node to the target sentence dependency structure tree to generate a bundle structure, which prevents existing nodes from being deleted by the unification.
For this reason, the tree structure of the part corresponding to the synonymous expression is not lost, and the bundle feature subtree extraction unit 28 can also extract characteristic subtrees from that part.
Further, when unifying the structure of the parts corresponding to synonymous expressions, the bundle forming unit 27 connects the synonymous expression node to each node in the matching subtree with an exclusive relation branch, thereby associating them as mutually exclusive nodes. When extracting characteristic subtrees, the bundle feature subtree extraction unit 28 does not extract any subtree that contains mutually exclusive nodes.
For this reason, even if both the structure before unification and the structure after unification are left, the subtree which does not make sense as an extraction result which includes both is not extracted.
Thus, according to the present embodiment, in addition to the effect of the first embodiment, an effect that the side effects caused by unifying synonymous expressions can be further reduced can be obtained.
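The bundle construction summarized above, in which the synonymous expression node is added rather than substituted, can be sketched in Python as follows. The function name, the `(dependent, head)` representation of a dependency branch, and the shape of the example tree are assumptions for illustration; the branch-copying rule follows the node addition recited in the claims.

```python
def build_bundle(edges, matched, syn_node):
    """Add syn_node next to the matched subtree instead of replacing it.

    edges   : set of (dependent, head) dependency branches of the sentence
    matched : set of nodes that form the matching subtree
    Returns the bundle's branches and its exclusive relation pairs.
    """
    bundle_edges = set(edges)  # every original branch is kept
    for dependent, head in edges:
        if dependent not in matched and head in matched:
            # an outside node pointed into the subtree: also point at syn_node
            bundle_edges.add((dependent, syn_node))
        elif dependent in matched and head not in matched:
            # the subtree pointed at an outside node: syn_node points there too
            bundle_edges.add((syn_node, head))
    exclusive = {frozenset((syn_node, node)) for node in matched}
    return bundle_edges, exclusive


# Assumed tree shape for a sentence like the one in FIG. 2 (illustrative only).
edges = {
    ("mail", "display"), ("display", "character"), ("character", "do"),
    ("small", "do"), ("do", "method"), ("method", "investigate"),
    ("WEB", "investigate"),
}
matched = {"display", "character", "small", "do"}  # four nodes of expression EX1
bundle_edges, exclusive = build_bundle(edges, matched, "G1")
print(sorted(bundle_edges - edges))  # only the branches added for G1
```

Since the original branches are kept unchanged, the exclusive relation pairs are what later prevents a subtree from combining the matched nodes with the added node.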

上記に説明したテキストマイニング装置１０およびテキストマイニング装置１１は、コンピュータとそれを動作させるプログラムによっても実現することができる。
図１４は、このような実施形態を説明する図である。
コンピュータ４０は、記憶装置３０と出力装置４と入力装置１とＣＰＵ(Central Processing Unit)４１と主記憶装置４２を備えている。記憶装置１は、例えばハードディスク装置で、同義表現辞書を記憶する同義表現辞書記憶部３１とマイニングの対象となるテキスト集合を記憶するテキスト集合記憶部３２を備えている。主記憶装置４２は、たとえばＲＡＭ(Random Access Memory)により構成され、テキストマイニング用プログラム４３を記憶している。
主記憶装置４２に格納されたテキストマイニング用プログラム４３は、ＣＰＵ４１に読み込まれ実行される。
ここで、テキストマイニング用プログラム４３は、コンピュータに、上記に説明した各動作を実行させるプログラムである。
このようにすれば、ＣＰＵ４１を言語解析手段２１、同義表現識別手段２２、同義表現節点生成手段２３、節点置換手段２４、特徴部分木抽出手段２５、抽出結果出力手段２６として機能するデータ処理装置２０として動作させ、コンピュータ４０をテキストマイニング装置１０として動作させることができる。
同様に、ＣＰＵ４１を言語解析手段２１、同義表現識別手段２２、同義表現節点生成手段２３、束構成手段２７、束用特徴部分木抽出手段２８、抽出結果出力手段２６として機能するデータ処理装置２９として動作させ、コンピュータ４０をテキストマイニング装置１１として動作させることができる。 The text mining device 10 and the text mining device 11 described above can also be realized by a computer and a program that operates the computer.
FIG. 14 is a diagram for explaining such an embodiment.
The computer 40 includes a storage device 30, an output device 4, an input device 1, a CPU (Central Processing Unit) 41, and a main storage device 42. The storage device 30 is, for example, a hard disk device, and includes a synonym expression dictionary storage unit 31 that stores the synonym expression dictionary and a text set storage unit 32 that stores the text set to be mined. The main storage device 42 is constituted by, for example, a RAM (Random Access Memory) and stores a text mining program 43.
The text mining program 43 stored in the main storage device 42 is read and executed by the CPU 41.
Here, the text mining program 43 is a program that causes a computer to execute the operations described above.
In this way, the CPU 41 can be operated as the data processing device 20, which functions as the language analysis means 21, the synonym expression identification means 22, the synonym expression node generation means 23, the node replacement means 24, the feature subtree extraction means 25, and the extraction result output means 26, so that the computer 40 operates as the text mining device 10.
Similarly, the CPU 41 can be operated as the data processing device 29, which functions as the language analysis means 21, the synonym expression identification means 22, the synonym expression node generation means 23, the bundle forming means 27, the bundle feature subtree extraction means 28, and the extraction result output means 26, so that the computer 40 operates as the text mining device 11.

図１（ａ）は、表現「表示する文字を小さくする」に対応する依存構造木を示す図である。図１（ｂ）は、表現「小さな文字で表示する」に対応する依存構造木を示す図である。 FIG. 1A is a diagram showing the dependency structure tree corresponding to the expression “make the displayed characters smaller”. FIG. 1B is a diagram showing the dependency structure tree corresponding to the expression “display in small characters”.
文「メールを表示する文字を小さくする方法をＷＥＢで調べた」に対応する依存構造木を示す図である。 FIG. 2 is a diagram showing the dependency structure tree corresponding to the sentence “I looked up on the WEB how to make the characters that display mail smaller”.
図２の依存構造木において図１（ａ）の依存構造木に適合する部分を示す図である。 FIG. 3 is a diagram showing the part of the dependency structure tree of FIG. 2 that matches the dependency structure tree of FIG. 1A.
図４（ａ）は、表現「表示する行数を増やす」対応する依存構造木を示す図である。図４（ｂ）は、表現「メールを表示する行数を２倍に増やす」に対応する依存構造木を示す図である。 FIG. 4A is a diagram showing the dependency structure tree corresponding to the expression “increase the number of lines to be displayed”. FIG. 4B is a diagram showing the dependency structure tree corresponding to the expression “double the number of lines that display mail”.
本発明の第１の実施の形態であるテキストマイニング装置の構成を示すブロック図である。 FIG. 5 is a block diagram showing the configuration of the text mining device according to the first embodiment of the present invention.
同義表現辞書の一例を示す図である。 FIG. 6 is a diagram showing an example of the synonym expression dictionary.
節点置換手段が依存構造木に対して節点の置換を行う例を示す図である。 FIG. 7 is a diagram showing an example in which the node replacement means replaces nodes in a dependency structure tree.
テキストマイニング装置の動作を示す流れ図である。 FIG. 8 is a flowchart showing the operation of the text mining device.
本発明の第２の実施の形態であるテキストマイニング装置の構成を示すブロック図である。 FIG. 9 is a block diagram showing the configuration of the text mining device according to the second embodiment of the present invention.
束構成手段が依存構造木に対して枝の追加を行う例を示す図である。 FIG. 10 is a diagram showing an example in which the bundle forming means adds branches to a dependency structure tree.
束構成手段が依存構造木に対して枝の追加を行うことによって生成された依存構造束を示す図である。 FIG. 11 is a diagram showing a dependency structure bundle generated when the bundle forming means adds branches to a dependency structure tree.
図１１の依存構造束からは抽出されない部分木の例を示す図である。 FIG. 12 is a diagram showing an example of a subtree that is not extracted from the dependency structure bundle of FIG. 11.
図９のテキストマイニング装置の動作を示す流れ図である。 FIG. 13 is a flowchart showing the operation of the text mining device of FIG. 9.
コンピュータとコンピュータプログラムによる本発明の実施形態を示す図である。 FIG. 14 is a diagram showing an embodiment of the present invention realized by a computer and a computer program.
テキスト集合の一例を示す図である。 FIG. 15 is a diagram showing an example of a text set.
図１６（ａ）は、表現「画像をメールで送る」対応する依存構造木を示す図である。図１６（ｂ）は、表現「メールに画像を添付する」に対応する依存構造木を示す図である。 FIG. 16A is a diagram showing the dependency structure tree corresponding to the expression “send an image by mail”. FIG. 16B is a diagram showing the dependency structure tree corresponding to the expression “attach an image to a mail”.
同義表現節点生成手段が図２の依存構造木に対して新たに節点を生成する例を示す図である。 FIG. 17 is a diagram showing an example in which the synonym expression node generation means generates a new node for the dependency structure tree of FIG. 2.
図１８（ａ）は、節点置換手段が図２依存構造木に対して節点の置換を行うことによって生成された依存構造木を示す図である。図１８（ｂ）は、節点置換手段が図４（ｂ）の依存構造木に対して節点の置換を行うことによって生成された依存構造木を示す図である。 FIG. 18A is a diagram showing the dependency structure tree generated when the node replacement means replaces nodes in the dependency structure tree of FIG. 2. FIG. 18B is a diagram showing the dependency structure tree generated when the node replacement means replaces nodes in the dependency structure tree of FIG. 4B.
図１９（ａ）は、文「メールを小さな文字で別な画面に表示する」に対応する依存構造木を示す図である。図１９（ｂ）は、図１９（ａ）の依存構造木に対して節点置換を行った後の依存構造木を示す図である。 FIG. 19A is a diagram showing the dependency structure tree corresponding to the sentence “display mail in small characters on another screen”. FIG. 19B is a diagram showing the dependency structure tree after node replacement is performed on the dependency structure tree of FIG. 19A.
図２０（ａ）は、文「撮影した画像をメールで２人に送る」に対応する依存構造木を示す図である。図２０（ｂ）は、図２０（ａ）の依存構造木に対して節点置換を行った後の依存構造木示す図である。 FIG. 20A is a diagram showing the dependency structure tree corresponding to the sentence “send the captured image to two people by mail”. FIG. 20B is a diagram showing the dependency structure tree after node replacement is performed on the dependency structure tree of FIG. 20A.
図２１（ａ）は、文「メールにカメラで撮影した画像を添付する」に対応する依存構造木を示す図である。図２１（ｂ）は、図２１（ａ）の依存構造木に対して節点置換を行った後の依存構造木を示す図である。 FIG. 21A is a diagram showing the dependency structure tree corresponding to the sentence “attach an image taken with a camera to a mail”. FIG. 21B is a diagram showing the dependency structure tree after node replacement is performed on the dependency structure tree of FIG. 21A.
図２２（ａ）は、抽出結果の依存構造木の例を示す図である。図２２（ｂ）は、抽出結果の出力例示す図である。 FIG. 22A is a diagram showing an example of a dependency structure tree obtained as an extraction result. FIG. 22B is a diagram showing an output example of the extraction result.
同義表現辞書の別の一例を示す図である。 FIG. 23 is a diagram showing another example of the synonym expression dictionary.
図２４（ａ）は、表現「別の画面に表示する」に対応する依存構造木を示す図である。図２４（ｂ）は、表現「表示する画面を分ける」に対応する依存構造木を示す図である。 FIG. 24A is a diagram showing the dependency structure tree corresponding to the expression “display on another screen”. FIG. 24B is a diagram showing the dependency structure tree corresponding to the expression “divide the screens to be displayed”.
図２５（ａ）は、図１９（ａ）の依存構造木に対して同義表現節点を生成する例を示す図である。図２５（ｂ）は、図２５（ａ）の依存構造木から依存構造束を生成する例を示す図である。 FIG. 25A is a diagram showing an example of generating a synonym expression node for the dependency structure tree of FIG. 19A. FIG. 25B is a diagram showing an example of generating a dependency structure bundle from the dependency structure tree of FIG. 25A.
束構成手段が図１９の依存構造木に対して枝の追加を行うことによって生成された依存構造束を示す図である。 FIG. 26 is a diagram showing the dependency structure bundle generated when the bundle forming means adds branches to the dependency structure tree of FIG. 19.
束構成手段が図２６の依存構造木に対して枝の追加を行うことによって生成された依存構造束を示す図である。 FIG. 27 is a diagram showing the dependency structure bundle generated when the bundle forming means adds branches to the dependency structure tree of FIG. 26.
図２８（ａ）は、抽出結果の依存構造木の例を示す図である。図２８（ｂ）は、抽出結果の出力例示す図である。 FIG. 28A is a diagram showing an example of a dependency structure tree obtained as an extraction result. FIG. 28B is a diagram showing an output example of the extraction result.

Explanation of symbols

１０、１１：テキストマイニング装置
２１：言語解析手段
２２：同義表現識別手段
２３：同義表現節点生成手段
２４：節点置換手段
２５：特徴部分木抽出手段
２６：抽出結果出力手段
２７：束構成手段
２８：束用特徴部分木抽出手段
３１：同義表現辞書記憶部

10, 11: Text mining device
21: Language analysis means
22: Synonym expression identification means
23: Synonym expression node generation means
24: Node replacement means
25: Feature subtree extraction means
26: Extraction result output means
27: Bundle forming means
28: Feature subtree extraction means for bundles
31: Synonym expression dictionary storage unit

Claims

Synonymous expression dictionary storage means for storing a synonymous expression dictionary that defines different expressions with synonymous contents as synonymous expression groups;
A target sentence dependency structure tree that is a dependency structure tree of each sentence included in a sentence set that is a target of text mining is compared with a synonym expression dependency structure tree that is a dependency structure tree of each expression included in the synonym expression dictionary, Synonym expression identifying means for identifying whether or not a matching subtree that is a subtree matching the synonymous expression dependency structure tree is included in the target sentence dependency structure tree;
Synonymous expression node generating means for generating a synonymous expression node having an identifier that indicates the synonymous expression group to which the expression corresponding to the matching subtree belongs and that is distinguished from a normal node label;
Node replacement means for replacing all nodes included in the matching subtree with the synonymous expression nodes;
A text mining device comprising: a feature subtree extracting unit that extracts a feature subtree from the target sentence dependency structure tree after the replacement.

Synonymous expression dictionary storage means for storing a synonymous expression dictionary that defines different expressions with synonymous contents as synonymous expression groups;
A target sentence dependency structure tree that is a dependency structure tree of each sentence included in a sentence set that is a target of text mining is compared with a synonym expression dependency structure tree that is a dependency structure tree of each expression included in the synonym expression dictionary, Synonym expression identifying means for identifying whether or not a matching subtree that is a subtree matching the synonymous expression dependency structure tree is included in the target sentence dependency structure tree;
Synonymous expression node generating means for generating a synonymous expression node having an identifier that indicates the synonymous expression group to which the expression corresponding to the matching subtree belongs and that is distinguished from a normal node label;
Node adding means for generating a dependency structure bundle by adding the synonymous expression node to the target sentence dependency structure tree, adding a dependency branch directed toward the synonymous expression node from each node that is not included in the matching subtree and has a dependency branch directed toward a node included in the matching subtree, and adding a dependency branch directed from the synonymous expression node toward each node that is not included in the matching subtree and receives a dependency branch from a node included in the matching subtree;
A text mining device comprising: a feature subtree extracting unit that extracts a feature subtree from the dependency structure bundle.

The text mining device according to claim 2, further comprising exclusive relation branch connecting means for connecting the synonymous expression node and each node included in the matching subtree with an exclusive relation branch,
wherein the feature subtree extracting unit extracts the feature subtree from subtrees that do not include nodes connected to each other by the exclusive relation branch of the dependency structure bundle.

The text mining device according to claim 1, further comprising extraction result output means for replacing the label of each synonymous expression node included in the feature subtree with an output expression that represents the expressions belonging to the synonymous expression group indicated by the label, and outputting the shape of the feature subtree in a form that is visible to humans.

The text mining device according to claim 4, wherein the output expression is the expression listed first in the synonym expression group indicated by the label of the synonym expression node.

The text mining device according to claim 4, wherein the output expression is an expression having the shortest length in the synonym expression group indicated by the label of the synonym expression node.

The text mining device according to claim 4, wherein the output expression is the expression that, among the expressions in the synonym expression group indicated by the label of the synonym expression node, appears most frequently in the sentence set that is the target of the text mining.

The text mining device according to claim 4, wherein the output expression is an expression designated in advance as the expression for output in the synonym expression group indicated by the label of the synonym expression node.

The text mining device according to claim 4, wherein the output expression is an expression defined in advance, separately from the expressions included in the synonym expression group, in correspondence with the synonym expression group indicated by the label of the synonym expression node.

In a text mining method executed by a text mining device that analyzes a sentence included in a text database to generate a target sentence dependency structure tree and extracts a feature subtree from the target sentence dependency structure tree,
A language analysis step of reading, from a storage device, the expressions stored in a synonym expression dictionary that defines different expressions with synonymous contents as synonym expression groups, and generating a synonym expression dependency structure tree that is the dependency structure tree of each expression;
A synonym expression identification step of collating the target sentence dependency structure tree with the synonym expression dependency structure tree and identifying whether or not a matching subtree, which is a subtree matching the synonym expression dependency structure tree, is included in the target sentence dependency structure tree;
A synonym expression node generation step of generating a synonym expression node having an identifier that indicates the synonym expression group to which the expression corresponding to the matching subtree belongs and is distinguished from a normal node label;
A node replacement step of replacing all nodes included in the matching subtree with the synonym expression node;
A feature subtree extraction step of extracting a feature subtree from the target sentence dependency structure tree after the replacement,
A text mining method, wherein the operation content of each step is executed by a computer of the text mining device.

In a text mining method executed by a text mining device that analyzes a sentence included in a text database to generate a target sentence dependency structure tree and extracts a feature subtree from the target sentence dependency structure tree,
A language analysis step of reading, from a storage device, the expressions stored in a synonym expression dictionary that defines different expressions with synonymous contents as synonym expression groups, and generating a synonym expression dependency structure tree that is the dependency structure tree of each expression;
A synonym expression identification step of collating the target sentence dependency structure tree with the synonym expression dependency structure tree and identifying whether or not a matching subtree, which is a subtree matching the synonym expression dependency structure tree, is included in the target sentence dependency structure tree;
A synonym expression node generation step of generating a synonym expression node having an identifier that indicates the synonym expression group to which the expression corresponding to the matching subtree belongs and is distinguished from a normal node label;
A node addition step of generating a dependency structure bundle by adding the synonym expression node to the target sentence dependency structure tree, adding a dependency branch directed toward the synonym expression node from each node that is not included in the matching subtree and has a dependency branch directed toward a node included in the matching subtree, and adding a dependency branch directed from the synonym expression node toward each node that is not included in the matching subtree and receives a dependency branch from a node included in the matching subtree;
A feature subtree extraction step of extracting a feature subtree from the dependency structure bundle,
A text mining method, wherein the operation content of each step is executed by a computer of the text mining device.

The text mining method according to claim 11, further comprising an exclusive relation branch connection step of connecting, after the synonym expression node is added to the target sentence dependency structure tree in the node addition step, the synonym expression node and each node included in the matching subtree with an exclusive relation branch,
wherein the computer of the text mining device executes the operation content of the exclusive relation branch connection step, and
in the feature subtree extraction step, the computer of the text mining device extracts the feature subtree from subtrees that do not include nodes connected to each other by the exclusive relation branch of the dependency structure bundle.

In a text mining program that causes a computer to execute a function of analyzing a sentence included in a text database and generating a target sentence dependency structure tree and a function of extracting a feature subtree from the target sentence dependency structure tree,
In the computer,
A function of reading expressions stored in a synonym expression dictionary that defines different expressions as synonymous expression groups with synonymous contents from a storage device and generating a synonymous expression dependency structure tree that is a dependency structure tree of the expression;
A function of collating the target sentence dependency structure tree with the synonymous expression dependency structure tree and identifying whether or not a matching subtree, which is a subtree matching the synonymous expression dependency structure tree, is included in the target sentence dependency structure tree;
A function of generating a synonymous expression node having an identifier that indicates the synonymous expression group to which the expression corresponding to the matching subtree belongs and is distinguished from a normal node label;
A function of replacing all nodes included in the matching subtree with the synonymous expression node;
A text mining program that executes a function of extracting a feature subtree from the target sentence dependency structure tree after the replacement.

In a text mining program that causes a computer to execute a function of analyzing a sentence included in a text database and generating a target sentence dependency structure tree and a function of extracting a feature partial structure from the target sentence dependency structure tree,
In the computer,
A function of reading expressions stored in a synonym expression dictionary that defines different expressions as synonymous expression groups with synonymous contents from a storage device and generating a synonymous expression dependency structure tree that is a dependency structure tree of the expression;
A function of collating the target sentence dependency structure tree with the synonymous expression dependency structure tree and identifying whether or not a matching subtree, which is a subtree matching the synonymous expression dependency structure tree, is included in the target sentence dependency structure tree;
A function of generating a synonymous expression node having an identifier that indicates the synonymous expression group to which the expression corresponding to the matching subtree belongs and is distinguished from a normal node label;
A function of generating a dependency structure bundle by adding the synonymous expression node to the target sentence dependency structure tree, adding a dependency branch directed toward the synonymous expression node from each node that is not included in the matching subtree and has a dependency branch directed toward a node included in the matching subtree, and adding a dependency branch directed from the synonymous expression node toward each node that is not included in the matching subtree and receives a dependency branch from a node included in the matching subtree;
A text mining program that executes a function of extracting a feature subtree from the dependency structure bundle.

The text mining program according to claim 14, which further causes the computer to execute a function of connecting, after the synonymous expression node is added to the target sentence dependency structure tree, the synonymous expression node and each node included in the matching subtree with an exclusive relation branch,
wherein, when extracting the feature subtree, the feature subtree is extracted from subtrees that do not include nodes connected to each other by the exclusive relation branch of the dependency structure bundle.