JP2010102521A

JP2010102521A - Dictionary creation device, dictionary creation method, dictionary creation program and recording medium recorded with the same program

Info

Publication number: JP2010102521A
Application number: JP2008273683A
Authority: JP
Inventors: Hiroyori Taira (平博順); Masaaki Nagata (永田昌明)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-10-24
Filing date: 2008-10-24
Publication date: 2010-05-06
Anticipated expiration: 2028-10-24
Also published as: JP5193798B2

Abstract

PROBLEM TO BE SOLVED: To provide a dictionary creation device that automatically learns argument determination rules with high precision and allows predicate-argument structure analysis, including zero pronouns and compound nouns, to be handled in a unified manner.
SOLUTION: The dictionary creation device includes the following parts:
- a syntactic/semantic analysis unit 101 that analyzes training text annotated with correct argument structures and creates a syntactic/semantic analysis result table 120;
- a training data creation unit 102 that refers to the table 120, extracts attributes for learning to create a training attribute index table 121, and at the same time creates training vectors to build a training vector table 122;
- a learning unit 103 that uses the table 122 to learn weights expressing the importance of the attributes and adds the learned weights to the training attribute index table 121 to create a weight table 123;
- an argument determination rule creation unit 104 that refers to the weight table 123 and outputs, as an argument determination rule, a list of the attributes sorted in descending order of importance.
COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a dictionary creation device, a dictionary creation method, a dictionary creation program, and a recording medium on which the dictionary creation program is recorded. These are for use in the following kinds of system:
- question-answering systems, in which a computer answers questions expressed in natural language;
- information retrieval and information extraction systems;
- automatic summarization, machine translation, and automatic paraphrasing systems;
- speech recognition systems and the like.

One conventional language processing device assumes a probabilistic model over the case frames of predicates (see, for example, Non-Patent Document 1). It estimates the model parameters by machine learning on corpus data in which correct case frames have been annotated manually, and uses the resulting model to output the most likely argument structure. The method takes as given which word in a sentence is the predicate and which words are its arguments. It then examines at what level of generalization of the words' semantic attributes a case frame forms the most expressive rule in the information-theoretic sense. It does not address identifying the arguments in a given text, nor zero-pronoun analysis.

Non-Patent Document 2 discloses a method that learns a probabilistic model from a large text corpus, performs case analysis, and determines predicate-argument structures. However, this method does not handle zero pronouns, whose arguments appear in a sentence other than the one containing the predicate.

A method for identifying zero pronouns is described in Non-Patent Document 3.
Non-Patent Document 1: Hang Li and Naoki Abe, "Learning Dependencies Between Case Slots," IPSJ SIG Technical Report (Natural Language Processing), Vol. 96, No. 114, pp. 93-99.
Non-Patent Document 2: Daisuke Kawahara and Sadao Kurohashi, "An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Large-Scale Case Frames Acquired from the Web," Proceedings of the 12th Annual Meeting of the Association for Natural Language Processing, pp. 1111-1114, 2006.
Non-Patent Document 3: Ryu Iida and Kentaro Inui, "Antecedent Identification of Japanese Zero Pronouns by Machine Learning Considering Contextual Clues," IPSJ Journal, Vol. 45, No. 3, pp. 906-918.
Non-Patent Document 4: Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Oyama, and Yoshihiko Hayashi, Nihongo Goi Taikei (A Japanese Lexicon), Iwanami Shoten (1997).
Non-Patent Document 5: Taku Kudo and Yuji Matsumoto, "Japanese Dependency Analysis by Cascaded Chunking," IPSJ Journal, Vol. 43, No. 6, pp. 1834-1842, 2002.
Non-Patent Document 6: Vladimir Vapnik, The Nature of Statistical Learning Theory, 2nd Edition, Springer (1999).

Conventional language processing devices that output predicate-argument structures share a common weakness. Case frame information may be registered in the dictionary, but a verb or noun can have several usages, and there is no clear criterion for deciding which usage's case information to apply during analysis. Manual tuning is therefore required. This tuning is very labor-intensive, and it is difficult to find a tuning procedure that actually improves analysis accuracy.

To address this, Non-Patent Document 2 proposes a method for automatically constructing a probabilistic model of predicate-argument structure from a large text corpus. However, this method does not handle zero pronouns, whose arguments appear in a sentence other than the one containing the predicate. It is therefore difficult to analyze predicate-argument structure accurately when multiple sentences are given.

Non-Patent Document 3 deals with zero-pronoun identification but not with predicate-argument structure analysis. Nor does it handle predicates that have been nominalized and appear inside a compound noun. The zero-pronoun and compound-noun problems interact in predicate-argument structure analysis. Solving them one after the other may therefore lower the overall analysis accuracy rather than raise it.

No method has previously handled predicate-argument structure analysis, including zero pronouns and compound nouns, in a unified manner.

The present invention solves the above problems. Its object is to provide a dictionary creation device, a dictionary creation method, a dictionary creation program, and a recording medium on which the dictionary creation program is recorded, with two properties. First, they automatically learn, with high accuracy, argument determination rules for identifying arguments. Second, they allow predicate-argument structure analysis, including zero pronouns and compound nouns, to be handled in a unified manner.

To solve the above problems, the present invention takes text in which the correct argument structures of predicates and action nouns have been tagged manually and applies a machine learning method to it. From this text it automatically learns and outputs argument determination rules for identifying the arguments of a predicate or action noun (hereinafter collectively called "predicates"). The rules draw on the predicate or action noun together with the following information:
- the base form, part of speech, and semantic category of each word in the text;
- whether the word is a function word, and whether it is a symbol;
- the dependency relations between bunsetsu (phrases);
- the voice of the predicate.

That is, the dictionary creation device according to claim 1 comprises the following:
- a syntactic/semantic analysis result table that stores text to be analyzed, written in natural language, in which correct argument structures are assigned to predicates or action nouns, together with the results of syntactic and semantic analysis of that text (the base form, part of speech, and semantic category of each word in the text, the dependency relations between bunsetsu, and the voice of each predicate);
- training data creation means that refers to the syntactic/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words, the dependency relations between bunsetsu, and the voices of the predicates, creates a training attribute index table, and at the same time creates training vectors to build a training vector table;
- weight learning means that uses the training vector table to learn weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table;
- argument determination rule creation means that refers to the weight table and outputs, as an argument determination rule, a list of the attributes sorted in descending order of importance.
The output argument determination rule serves as the dictionary.

The dictionary creation device according to claim 2 comprises the following:
- syntactic/semantic analysis means that parses text to be analyzed, written in natural language, in which correct argument structures are assigned to predicates or action nouns, analyzes the base form, part of speech, and semantic category of each word in the text, the dependency relations between bunsetsu, and the voice of each predicate, and creates a syntactic/semantic analysis result table;
- training data creation means that refers to the syntactic/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words, the dependency relations between bunsetsu, and the voices of the predicates, creates a training attribute index table, and at the same time creates training vectors to build a training vector table;
- weight learning means that uses the training vector table to learn weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table;
- argument determination rule creation means that refers to the weight table and outputs, as an argument determination rule, a list of the attributes sorted in descending order of importance.
The output argument determination rule serves as the dictionary.

The dictionary creation device according to claim 3 is the device according to claim 1 or 2 with two further features. The training data creation means stores a teacher variable (label) with each training vector in the training vector table. The weight learning means considers the training vectors and teacher variables in the training vector table and finds the pair of parallel hyperplanes separating the positive examples from the negative examples whose distance apart is maximal. It then learns the weights with a machine learning method based on the hyperplane so obtained.
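The max-margin criterion in claim 3 is the one a linear support vector machine optimizes. The sketch below shows one way such weights could be learned, under stated assumptions; it is not the patent's implementation. It assumes binary training vectors and +1/-1 teacher variables. It also substitutes a Pegasos-style stochastic sub-gradient solver for a soft-margin linear SVM in place of whatever trainer the inventors actually used. The function names `train_linear_svm` and `decision` are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Learn one weight per attribute with a linear soft-margin SVM (sketch).

    vectors: training vectors, one 0/1 entry per attribute
    labels:  teacher variables, +1 for positive and -1 for negative examples
    Uses Pegasos-style sub-gradient steps with step size 1 / (lam * t).
    The bias is learned as the weight of an extra constant feature.
    """
    w = [0.0] * (len(vectors[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            t += 1
            eta = 1.0 / (lam * t)
            xb = list(x) + [1.0]  # append the constant bias feature
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # gradient step on the L2 regularizer: shrink every weight
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # hinge loss is active: move the hyperplane toward this example
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w[:-1], w[-1]


def decision(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score; a positive value puts x on the positive-example side."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias
```

Under this assumption, the sign and size of each learned weight could serve as the attribute importance that the weight learning means adds to the training attribute index table.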

The dictionary creation method according to claim 4 is performed in a device having a syntactic/semantic analysis result table. The table stores text to be analyzed, written in natural language, in which correct argument structures are assigned to predicates or action nouns. It also stores the results of syntactic and semantic analysis of that text: the base form, part of speech, and semantic category of each word in the text, the dependency relations between bunsetsu, and the voice of each predicate. The method comprises the following steps:
- training data creation means refers to the syntactic/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words, the dependency relations between bunsetsu, and the voices of the predicates, creates a training attribute index table, and at the same time creates training vectors to build a training vector table;
- in a weight learning step, weight learning means uses the training vector table to learn weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table;
- argument determination rule creation means refers to the weight table and outputs, as an argument determination rule, a list of the attributes sorted in descending order of importance.
The output argument determination rule serves as the dictionary.

The dictionary creation method according to claim 5 comprises the following steps:
- syntactic/semantic analysis means parses text to be analyzed, written in natural language, in which correct argument structures are assigned to predicates or action nouns, analyzes the base form, part of speech, and semantic category of each word in the text, the dependency relations between bunsetsu, and the voice of each predicate, and creates a syntactic/semantic analysis result table;
- training data creation means refers to the syntactic/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words, the dependency relations between bunsetsu, and the voices of the predicates, creates a training attribute index table, and at the same time creates training vectors to build a training vector table;
- in a weight learning step, weight learning means uses the training vector table to learn weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table;
- argument determination rule creation means refers to the weight table and outputs, as an argument determination rule, a list of the attributes sorted in descending order of importance.
The output argument determination rule serves as the dictionary.

The dictionary creation method according to claim 6 is the method according to claim 4 or 5 with two further features. It includes a step in which the training data creation means stores a teacher variable with each training vector in the training vector table. In the weight learning step, the training vectors and teacher variables in the training vector table are used to find the pair of parallel hyperplanes separating the positive examples from the negative examples whose distance apart is maximal. The weights are then learned with a machine learning method based on the hyperplane so obtained.

The dictionary creation program according to claim 7 causes a computer to function as each of the means according to any one of claims 1 to 3.

The recording medium according to claim 8 is a computer-readable recording medium on which the dictionary creation program according to claim 7 is recorded.

The above configuration has the following advantage. Starting from text manually tagged with correct argument structures, argument determination rules for identifying the arguments of a predicate or action noun can be learned and output automatically. The rules draw on the predicate or action noun together with information such as the base form, part of speech, and semantic category of each word in the text, the dependency relations between bunsetsu, and the voice of the predicate.

According to the present invention, the following excellent effects are obtained.
(1) Argument determination rules that identify arguments with high accuracy can be learned automatically.
(2) Argument determination rules can be learned automatically even for zero pronouns, whose arguments appear in a sentence other than the one containing the predicate, and for compound nouns. By using the dictionary created by the present invention, predicate-argument structure analysis including zero pronouns and compound nouns can therefore be handled in a unified manner.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to these embodiments. FIG. 1 is a block diagram showing the configuration of a dictionary creation device 1 according to one embodiment of the present invention. FIG. 2 is a flowchart showing the operation of the device of FIG. 1.

In FIG. 1, reference numeral 2 denotes a syntactic/semantic analysis result table. It stores training text (the text to be analyzed), written in natural language, in which correct argument structures have been manually assigned to predicates or action nouns. It also stores the results of syntactic and semantic analysis of that text:
- the base form, part of speech, and semantic category of each word;
- whether each word is a function word, and whether it is a symbol;
- the dependency relations between bunsetsu;
- the voice of each predicate.

The information on whether a word is a function word and whether it is a symbol may be stored in the syntactic/semantic analysis result table 2 in advance. Alternatively, it may be left out of the table and computed dynamically, as needed, from the word's base form and part of speech.
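The dynamic alternative could look like the following minimal sketch. It assumes IPADIC-style part-of-speech labels whose top-level category precedes the first "-" (助詞 particle, 助動詞 auxiliary verb, 記号 symbol). The category sets, the punctuation fallback set, and the name `word_flags` are all hypothetical choices for illustration.

```python
from typing import Tuple

# Assumption: IPADIC-style POS labels, top-level category before the first "-".
FUNCTION_WORD_POS = {"助詞", "助動詞"}       # particles, auxiliary verbs
SYMBOL_POS = {"記号"}                        # symbols and punctuation
SYMBOL_BASE_FORMS = {"。", "、", "「", "」"}  # hypothetical fallback on the base form


def word_flags(base_form: str, pos: str) -> Tuple[bool, bool]:
    """Return (is_function_word, is_symbol), computed on demand."""
    top_level = pos.split("-")[0]
    is_symbol = top_level in SYMBOL_POS or base_form in SYMBOL_BASE_FORMS
    is_function_word = not is_symbol and top_level in FUNCTION_WORD_POS
    return is_function_word, is_symbol
```

Computing the flags this way keeps the table smaller, at the cost of tying the flags to one particular tagset.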

Reference numeral 3 denotes a training data creation unit serving as the training data creation means. It refers to the syntactic/semantic analysis result table 2 and extracts attributes for learning from the following: the base form, part of speech, and semantic category of each word in the text; whether each word is a function word or a symbol; the dependency relations between bunsetsu; and the voice of each predicate. From these it creates a training attribute index table 4. At the same time it creates training vectors and stores them in a training vector table 5.

Reference numeral 6 denotes a weight learning unit serving as the weight learning means. It uses the training vector table 5 to learn weights representing the importance of the attributes. It then adds the learned weights to the training attribute index table 4 to create a weight table 7.

Reference numeral 8 denotes an argument determination rule creation unit serving as the argument determination rule creation means. It refers to the weight table 7 and outputs, as an argument determination rule (the dictionary), a list of the attributes sorted in descending order of importance.
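Turning the weight table into the ordered rule list is essentially a sort. The sketch below assumes each weight-table row is a hypothetical `(attribute index, attribute, weight)` tuple and that a larger weight means a more important attribute. Python's `sorted` stays stable with `reverse=True`, so attributes with equal weights keep their table order.

```python
from typing import List, Sequence, Tuple

Attribute = Tuple[str, ...]
# Hypothetical weight-table row: (attribute index, attribute, weight)
WeightRow = Tuple[int, Attribute, float]


def make_rule_list(weight_table: Sequence[WeightRow]) -> List[Tuple[Attribute, float]]:
    """Return (attribute, weight) pairs in descending order of weight."""
    ranked = sorted(weight_table, key=lambda row: row[2], reverse=True)
    return [(attribute, weight) for _, attribute, weight in ranked]
```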

Next, the operation of the dictionary creation device 1 configured as above is described with reference to FIG. 2.

First, the user inputs into the syntactic/semantic analysis result table 2 of the dictionary creation device 1 the training text with correct argument structures, together with the syntactic/semantic analysis results for that text (step S1).

Next, the training data creation unit 3 extracts from the syntactic/semantic analysis result table 2 the attributes that make up the argument determination rules for predicates and action nouns, and creates the training attribute index table 4. It then uses those attributes to create training vectors and stores them in the training vector table 5 (step S2).
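Step S2 can be pictured as follows, under stated assumptions. Each training example is assumed to be one (predicate, candidate word) pair. It carries the set of attributes the pair satisfies and a flag saying whether the candidate is a correct argument. The sketch then builds an attribute index table, 0/1 training vectors, and +1/-1 teacher variables. The example representation and the name `build_training_data` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Attribute = Tuple[str, ...]


def build_training_data(
    examples: Sequence[Tuple[FrozenSet[Attribute], bool]],
) -> Tuple[Dict[Attribute, int], List[List[int]], List[int]]:
    """Build the attribute index table, training vectors, and teacher variables.

    examples: one (attributes, is_argument) pair per predicate/candidate pair.
    """
    index: Dict[Attribute, int] = {}
    for attributes, _ in examples:
        for attribute in sorted(attributes):  # sorted so indices are deterministic
            index.setdefault(attribute, len(index))
    vectors: List[List[int]] = []
    labels: List[int] = []
    for attributes, is_argument in examples:
        vector = [0] * len(index)
        for attribute in attributes:
            vector[index[attribute]] = 1
        vectors.append(vector)
        labels.append(1 if is_argument else -1)  # teacher variable
    return index, vectors, labels
```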

Next, the weight learning unit 6 applies a machine learning technique to the training vector table 5 to learn weights representing the importance of the attributes. It adds the learned weights to the training attribute index table 4 and stores the result in the weight table 7 (step S3).

Finally, the argument determination rule creation unit 8 reorders the attributes in the weight table 7 by weight, creates and outputs the argument determination rule, and the process ends (step S4).

The units of FIG. 1 correspond one-to-one to units of FIG. 3, described in detail below, and their specific configuration and operation are identical:
- syntactic/semantic analysis result table 2 corresponds to table 120;
- training data creation unit 3 corresponds to unit 102;
- training attribute index table 4 corresponds to table 121;
- training vector table 5 corresponds to table 122;
- weight learning unit 6 corresponds to unit 103;
- weight table 7 corresponds to table 123;
- argument determination rule creation unit 8 corresponds to unit 104.
They are therefore not described further here.

FIG. 3 is a block diagram showing the configuration of a dictionary creation device 10 according to another embodiment of the present invention. FIG. 4 is a flowchart showing the operation of the device of FIG. 3.

In FIG. 3, reference numeral 101 denotes a syntactic/semantic analysis unit serving as the syntactic/semantic analysis means. Its input is training text (the text to be analyzed), written in natural language, in which correct argument structures are assigned to predicates or action nouns. The unit parses this text and analyzes the following: the base form, part of speech, and semantic category of each word; whether each word is a function word or a symbol; the dependency relations between bunsetsu; and the voice of each predicate. From the results it creates a syntactic/semantic analysis result table 120.

Reference numeral 102 denotes a training data creation unit serving as the training data creation means. It refers to the syntactic/semantic analysis result table 120 and extracts attributes for learning from the following: the base form, part of speech, and semantic category of each word in the text; whether each word is a function word or a symbol; the dependency relations between bunsetsu; and the voice of each predicate. From these it creates a training attribute index table 121. At the same time it creates training vectors to build a training vector table 122.

Reference numeral 103 denotes a weight learning unit serving as the weight learning means. It uses the training vector table 122 to learn weights representing the importance of the attributes. It then adds the learned weights to the training attribute index table 121 to create a weight table 123.

Reference numeral 104 denotes an argument determination rule creation unit serving as the argument determination rule creation means. It refers to the weight table 123 and outputs, as an argument determination rule (the dictionary), a list of the attributes sorted in descending order of importance.

The functions of the following components, described below, are implemented by, for example, a computer: the syntactic/semantic analysis unit 101, syntactic/semantic analysis result table 120, training data creation unit 102, training attribute index table 121, training vector table 122, weight learning unit 103, weight table 123, and argument determination rule creation unit 104.

Next, the operation of the dictionary creation device 10 configured as above is described with reference to FIG. 4. First, training text is input to the syntactic/semantic analysis unit 101 of the dictionary creation device 10 (step S11). The text is written in natural language, and correct argument structures have been manually assigned to its predicates or action nouns. Suppose, for example, that the text shown in FIG. 5 is input.

In FIG. 5, the annotation tags have the following meanings:
- A span enclosed by the tags <NP ID=number> and </NP> is an argument.
- A span enclosed by <PRED ...> and </PRED> is a predicate.
- Action nouns are enclosed by the tags <EVENT ...> and </EVENT>; none occur in this text.
An attribute string such as NOM="1" ACC="2" written in the "..." part of a <PRED ...> or <EVENT ...> tag refers to the base form of that predicate or action noun. It states that the correct nominative argument is 「私」 ("I"), whose ID number is 1, and that the correct accusative argument is 「ピーマン」 ("green pepper"), whose ID number is 2. For simplicity, only nominative and accusative arguments are handled here, but other argument types can be processed in the same way.
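FIG. 5 itself is not reproduced here, so the exact markup is an assumption. The sketch below reads annotations of the kind just described. It assumes quoted numeric IDs and only the NOM and ACC roles, and returns each predicate's case roles resolved to argument strings. The sample text is a hypothetical tagging of the example sentences used below. In it, the nominative argument 「私」 lies in the sentence before the predicate, which is the zero-pronoun case.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed surface syntax, modeled on the tag description above.
NP_TAG = re.compile(r'<NP\s+ID\s*=\s*"?(\d+)"?\s*>(.*?)</NP>')
PRED_TAG = re.compile(r"<(PRED|EVENT)([^>]*)>(.*?)</\1>")
ROLE_ATTR = re.compile(r'\b(NOM|ACC)\s*=\s*"?(\d+)"?')


def parse_annotation(
    text: str,
) -> Tuple[Dict[int, str], List[Tuple[str, str, Dict[str, Optional[str]]]]]:
    """Return {argument ID: surface} and, for each predicate or action noun,
    (tag name, surface string, {case role: argument surface})."""
    arguments = {int(np_id): surface for np_id, surface in NP_TAG.findall(text)}
    predicates = []
    for tag, attrs, surface in PRED_TAG.findall(text):
        roles = {role: arguments.get(int(np_id)) for role, np_id in ROLE_ATTR.findall(attrs)}
        predicates.append((tag, surface, roles))
    return arguments, predicates


# Hypothetical annotated input, for illustration only.
sample = ('<NP ID="1">私</NP>は<NP ID="2">ピーマン</NP>が嫌いだ。'
          'しかし昨日は母に無理やり<PRED NOM="1" ACC="2">食べ</PRED>させられた。')
```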

Next, the syntactic/semantic analysis unit 101 performs syntactic and semantic analysis on the training text (step S12). It identifies the base form, part of speech, and semantic category of each word in the text, whether each word is a function word or a symbol, the dependency relations between bunsetsu, and the voice of each predicate. The results are stored in the syntactic/semantic analysis result table 120.

Take, for example, the text 「私はピーマンが嫌いだ。しかし昨日は母に無理やり食べさせられた。」 ("I hate green peppers. But yesterday my mother forced me to eat them."). Analyzing it yields the following for each word, in order from the beginning of the text, as shown in FIG. 6:
- the sentence number;
- the bunsetsu number;
- the head bunsetsu number;
- the word number within the bunsetsu;
- the base form;
- the part of speech;
- whether the word is a function word or a symbol;
- the semantic category.

In addition, the correct argument structure given with the text tells the unit which words are the predicates or action nouns to be analyzed, and which words are the arguments making up each one's argument structure. Suppose here that the predicate of interest is 「食べる」 ("eat"). Its arguments are 「私」 ("I") and 「ピーマン」 ("green pepper"); 「私」 is the nominative argument and 「ピーマン」 is the accusative argument.

Here, the sentence numbers in FIG. 6 are integers of 0 or more, assigned as 0, 1, 2, ... in order from the first sentence of the text. Clause numbers are integers of 0 or more, assigned as 0, 1, 2, ... in order from the first clause of each sentence. The dependency-target clause number of a word is the number of the clause on which the clause containing that word was determined, by syntactic analysis, to depend; for a sentence-final clause, which has no dependency target, it is set to −1. Word numbers are integers of 0 or more, assigned as 0, 1, 2, ... in order from the first word of each clause. The base form is the dictionary form of each word, and the part of speech is that word's part of speech. The function word/symbol field indicates whether the word is a symbol, or a function word such as "は" or "が" that attaches to a content word and expresses its function, rather than a content word that carries the semantic content of the clause. The semantic category is the category assigned to each word using a thesaurus such as Nihongo Goi-Taikei (A Japanese Lexicon; see Non-Patent Document 4).
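The numbering conventions of FIG. 6 can be illustrated with a small sketch that flattens a clause-segmented analysis into table rows. The segmentation, part-of-speech labels, and semantic categories below are hypothetical stand-ins, not the contents of FIG. 6; they are chosen only so that "母" lands at sentence 1, clause 2, word 0, as the text states.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# (base form, part of speech, is function word or symbol, semantic category)
Word = Tuple[str, str, bool, Optional[str]]


@dataclass(frozen=True)
class Row:
    sent: int                # sentence number, 0-based from the start of the text
    clause: int              # clause number, 0-based within the sentence
    head: int                # dependency-target clause number; -1 at sentence end
    word: int                # word number, 0-based within the clause
    base: str                # base form
    pos: str                 # part of speech
    func: bool               # True for a function word or a symbol
    category: Optional[str]  # thesaurus semantic category, None if not assigned


def build_rows(sentences: Sequence[Tuple[List[List[Word]], List[int]]]) -> List[Row]:
    """Flatten (clauses, dependency heads) pairs into rows numbered as in FIG. 6."""
    rows = []
    for s, (clauses, heads) in enumerate(sentences):
        for c, (words, head) in enumerate(zip(clauses, heads)):
            for w, (base, pos, func, cat) in enumerate(words):
                rows.append(Row(s, c, head, w, base, pos, func, cat))
    return rows


# Hypothetical analysis of the example text.
SENT0 = ([[("私", "noun", False, "person"), ("は", "particle", True, None)],
          [("ピーマン", "noun", False, "food"), ("が", "particle", True, None)],
          [("嫌い", "adjectival noun", False, None), ("だ", "auxiliary", True, None),
           ("。", "symbol", True, None)]],
         [2, 2, -1])
SENT1 = ([[("しかし", "conjunction", False, None)],
          [("昨日", "noun", False, "time"), ("は", "particle", True, None)],
          [("母", "noun", False, "person"), ("に", "particle", True, None)],
          [("無理やり", "adverb", False, None)],
          [("食べる", "verb", False, None), ("させる", "auxiliary", True, None),
           ("られる", "auxiliary", True, None), ("た", "auxiliary", True, None),
           ("。", "symbol", True, None)]],
         [4, 4, 4, 4, -1])

ROWS = build_rows([SENT0, SENT1])
```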

The syntax/semantic analysis unit 101 can also use a stand-alone parser such as CaboCha (see Non-Patent Document 5).

Next, starting from the predicate in the syntax/semantic analysis result table 120, the training data creation unit 102 assumes that the word appearing nearest to the predicate under some constraint is a term of that predicate, extracts the attributes that constitute such constraints, and creates the training attribute index table 121 (step S13). For example, in FIG. 6, when the words before the predicate "食べる" are searched, its nominative term "私" is the nearest word among those whose semantic category is "person" and whose function word is "は". For each word other than the predicate in the input text, the unit examines under which constraints the word is the nearest to the predicate, and uses those constraints as attributes. For example, a constraint is a combination of (search method for the target word, part of speech, semantic category, voice of the predicate in a dependency relation).

Here, the search methods for the target word are as follows. "ic" searches for words before the predicate whose clause depends on the clause containing the target predicate, taking into account the function word/symbol of the word's clause and the voice of the predicate on which the word depends. "oc" searches for words after the predicate where the clause containing the target predicate depends on the clause containing the word, likewise taking into account the function word/symbol of the word's clause and the voice of the predicate involved. "sc" covers the case in which the predicate and the target word are in the same clause. "nc" applies when there is no dependency relation between the clause containing the predicate and the clause containing the target word and they are not the same clause: taking into account the function word/symbol of the word's clause and the voice of the predicate in a dependency relation with it, it first searches before the predicate and, if no such word is found, searches after the predicate. In addition, "fw" denotes a search before the predicate that ignores the function word/symbol and the voice of the predicate on which the word depends, and "bw" denotes the same search after the predicate. For example, in FIG. 6, the word "母" (mother) is the nearest to the predicate "食べる" under constraints such as (search method = ic, semantic category = person, function word = に, voice = passive). In this way, the training attribute index table 121 shown in FIG. 7 is created. Many attributes are created; only 10 of them are shown here.
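The "nearest word under a constraint" idea can be sketched as grouping candidate words by attribute and keeping, per attribute, the word closest to the predicate. This is a simplified sketch under stated assumptions: each candidate already carries its relation to the predicate (ic, oc, sc or nc), its clause's function word and the relevant voice; part of speech is omitted; and sc is handled with the same "prefer preceding words" rule as nc. The class Cand, the function nearest_attributes and the positions below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Attr = Tuple[str, str, Optional[str], Optional[str]]  # (method, category, function word, voice)


@dataclass(frozen=True)
class Cand:
    pos: int       # linear position of the word in the text
    category: str  # semantic category
    func: str      # function word of the word's clause
    relation: str  # "ic", "oc", "sc" or "nc": relation to the predicate's clause
    voice: str     # voice of the predicate the word's clause relates to


def nearest_attributes(cands: List[Cand], pred_pos: int) -> Dict[int, Set[Attr]]:
    """Map each candidate position to the attributes under which it is nearest."""
    groups: Dict[Attr, List[Cand]] = defaultdict(list)
    for c in cands:
        # fw / bw: search before / after the predicate, ignoring function word and voice.
        direction = "fw" if c.pos < pred_pos else "bw"
        groups[(direction, c.category, None, None)].append(c)
        # ic / oc / sc / nc: also constrained by function word and voice.
        groups[(c.relation, c.category, c.func, c.voice)].append(c)
    result: Dict[int, Set[Attr]] = defaultdict(set)
    for attr, members in groups.items():
        # Prefer words before the predicate, else fall back to words after it
        # (the nc rule; ic/fw groups are all before, oc/bw groups all after).
        pool = [c for c in members if c.pos < pred_pos] or members
        best = min(pool, key=lambda c: abs(c.pos - pred_pos))
        result[best.pos].add(attr)
    return dict(result)


# Hypothetical candidates for the example; the predicate 食べる is at position 13.
PRED_POS = 13
CANDS = [
    Cand(0, "person", "は", "nc", "active"),    # 私 (depends on 嫌い, not on 食べる)
    Cand(2, "food", "が", "nc", "active"),      # ピーマン
    Cand(8, "time", "は", "ic", "passive"),     # 昨日
    Cand(10, "person", "に", "ic", "passive"),  # 母
]
ATTRS = nearest_attributes(CANDS, PRED_POS)
```

Under these assumptions, 私 receives the nc attribute with は and active voice, while the fw "person" attribute goes to 母 because 母 is nearer, mirroring the example in the text.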

Next, based on the training attribute index table 121 and the syntax/semantic analysis result table 120, the training data creation unit 102 creates a training vector for each word other than the predicate in the syntax/semantic analysis result table 120 and stores it in the training vector table 122 (step S14). For example, suppose that training vectors are created in order from the top of FIG. 5 for the words other than the predicate, and denote them x_1, x_2, x_3, .... In this case, the training vectors shown in FIG. 8 are generated. For each word, the unit checks whether each attribute in the training attribute index table 121 of FIG. 7 holds for that word, and sets the attribute value to 1 if it does and 0 otherwise. For example, the fourth element of x_1 is 1 because the word "私" satisfies the condition of attribute number 4 in FIG. 7: with search type nc, that is, when the words before the predicate that have no dependency relation with it are searched, "私" is the nearest word to the predicate among those whose semantic category is "person", whose clause has the function word "は", and for which the predicate in a dependency relation with the word is in the active voice. Conversely, the ninth element of x_1 is 0: when the words before the predicate "食べる" are searched with no function-word or voice condition, the nearest word whose semantic category is "person" is "母" (sentence number 1, clause number 2, word number 0), and "私" is only the second nearest, so the condition is judged not to hold. In this way, the training vectors are generated and stored in the training vector table 122.
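A training vector is a 0/1 indicator over the rows of the training attribute index table. The minimal sketch below assumes the per-word attribute sets have already been computed; the six attributes and the word sets are hypothetical illustrations, not the ten attributes of FIG. 7.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Attr = Tuple[str, str, Optional[str], Optional[str]]

# Hypothetical training attribute index table (attribute number = index + 1).
ATTRIBUTES: List[Attr] = [
    ("ic", "person", "に", "passive"),
    ("ic", "time", "は", "passive"),
    ("nc", "person", "は", "active"),
    ("nc", "food", "が", "active"),
    ("fw", "person", None, None),
    ("fw", "food", None, None),
]


def training_vector(word_attrs: Set[Attr], attributes: Sequence[Attr]) -> List[int]:
    """Element j is 1 if the word is nearest to the predicate under attribute j, else 0."""
    return [1 if a in word_attrs else 0 for a in attributes]


# Hypothetical attribute sets per word, e.g. from a nearest-word search.
WORD_ATTRS: Dict[str, Set[Attr]] = {
    "私": {("nc", "person", "は", "active")},
    "ピーマン": {("nc", "food", "が", "active"), ("fw", "food", None, None)},
    "母": {("ic", "person", "に", "passive"), ("fw", "person", None, None)},
}
X = {w: training_vector(s, ATTRIBUTES) for w, s in WORD_ATTRS.items()}
# X["私"] == [0, 0, 1, 0, 0, 0]; X["母"] == [1, 0, 0, 0, 1, 0]
```

As in the text's example, 私 gets 0 for the fw "person" attribute because 母 is the nearer "person" word.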

Further, for each word other than the predicate, a teacher variable is stored in the training vector table 122 for each term type (step S15). For example, y_{1,NOM} = 1, y_{2,NOM} = −1, y_{3,NOM} = −1, ..., y_{1,ACC} = −1, y_{2,ACC} = −1, y_{3,ACC} = 1, ... are stored. Here, y_{i,NOM} is a scalar variable that holds 1 if the i-th word is the correct nominative term and −1 otherwise. Similarly, y_{i,ACC} is a scalar variable that holds 1 if the i-th word is the correct accusative term and −1 otherwise. NOM stands for "nominative" and ACC for "accusative". In this way, the training vectors and teacher variables shown in FIG. 9 are finally stored in the training vector table 122.
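The teacher variables can be sketched as one ±1 label list per term type. This assumes exactly one correct term per case for the predicate and identifies words by 0-based index; the function name teacher_variables is hypothetical.

```python
from typing import Dict, List


def teacher_variables(n_words: int, gold: Dict[str, int]) -> Dict[str, List[int]]:
    """y[case][i] = 1 if word i (0-based) is the correct term for that case, else -1."""
    return {case: [1 if i == idx else -1 for i in range(n_words)]
            for case, idx in gold.items()}


# Words 私, は, ピーマン, が (indices 0..3): 私 is nominative, ピーマン accusative.
Y = teacher_variables(4, {"NOM": 0, "ACC": 2})
# Y == {"NOM": [1, -1, -1, -1], "ACC": [-1, -1, 1, -1]}
```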

Next, for each term type, the weight learning unit 103 reads the training vectors and teacher variables from the training vector table 122 and computes weights representing the importance of the attributes using a machine learning method (step S16).

As an example, a learning method using the machine learning technique called SVM (Support Vector Machine), specifically the variant called linear SVM in Non-Patent Document 6, is described below.

A linear SVM takes the m training examples

$$(x_1, y_1), \ldots, (x_m, y_m) \qquad (1)$$

and finds two parallel hyperplanes that separate the positive side (where y > 0) from the negative side (where y < 0), choosing them so that the distance between the two hyperplanes is maximized. The hyperplane obtained in this way is expressed by Equation (2).

Here, w is a weight vector and b is a bias; both are obtained from the training data by solving the optimization problem expressed by Equation (3).
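Equations (2) and (3) themselves do not survive in this text. As a reference point only, the textbook hard-margin form of a linear SVM, written with the same symbols, is shown below; it is an assumption that the patent uses exactly this form rather than, for example, a soft-margin variant with slack variables. The separating hyperplane and the two parallel margin hyperplanes are

$$w \cdot x + b = 0, \qquad w \cdot x + b = +1, \qquad w \cdot x + b = -1.$$

The two outer hyperplanes are $2/\lVert w \rVert$ apart, so maximizing their distance is equivalent to

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, m.$$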

For a linear SVM, the larger an element of the weight vector w obtained in this way, the more important the corresponding attribute. For example, suppose the contents of the training vector table 122 shown in FIG. 9 are learned with one linear SVM for the nominative term and another for the accusative term. The linear SVM for the nominative term uses the pairs of vectors and scalar variables (x_1, y_{1,NOM}), (x_2, y_{2,NOM}), (x_3, y_{3,NOM}), ... as training data, and the linear SVM for the accusative term uses (x_1, y_{1,ACC}), (x_2, y_{2,ACC}), (x_3, y_{3,ACC}), ....
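One SVM per term type can be sketched in pure Python. The solver is a swapped-in choice, not necessarily the one in Non-Patent Document 6: full-batch subgradient descent on the soft-margin objective (lam/2)·|w|² + mean hinge loss, with an unregularized bias. The training vectors and the values of lam, lr and epochs are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 1000) -> Tuple[List[float], float]:
    """Soft-margin linear SVM by full-batch subgradient descent on
    lam/2 * |w|^2 + (1/m) * sum_i max(0, 1 - y_i * (w.x_i + b)).
    The bias b is not regularized. Returns (w, b)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:       # hinge loss is active for this example
                for j in range(n):
                    gw[j] -= yi * xi[j] / m
                gb -= yi / m
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


# Hypothetical vectors for 私, は, ピーマン, 母, 昨日 over six attributes.
X = [[0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1],
     [1, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0]]
Y_NOM = [1, -1, -1, -1, -1]  # 私 is the nominative term
Y_ACC = [-1, -1, 1, -1, -1]  # ピーマン is the accusative term
w_nom, b_nom = train_linear_svm(X, Y_NOM)
w_acc, b_acc = train_linear_svm(X, Y_ACC)
```

In this toy data only 私 has attribute 3 (index 2), so that element of w_nom ends up as the largest weight, which is the sense in which larger weights mark more important attributes.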

Suppose that, as a result, the weight vector w_NOM for the nominative term and the weight vector w_ACC for the accusative term shown in FIG. 10 are obtained. The weight learning unit 103 then appends the weight vectors of FIG. 10 to the training attribute index table 121 (the table of FIG. 7), thereby obtaining a weight table for the nominative term (FIG. 11) and a weight table for the accusative term (FIG. 12), and stores them in the weight table 123.

Next, the term determination rule creation unit 104 merges the multiple weight tables in the weight table 123 into one, sorts the result by weight, and outputs it as a decision list (step S17). For example, the tables of FIGS. 11 and 12 are merged to obtain the table of FIG. 13; the top 10 entries after sorting it by weight are shown in FIG. 14. This table is output in decision-list form to obtain the term determination rule table (dictionary).
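Steps S16 and S17 end with table bookkeeping: attach each case's weights to the attribute index table, merge the per-case tables, and sort by weight. The sketch below shows that bookkeeping; the attributes and weights are hypothetical, not the values of FIGS. 10 to 14, and the function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Attr = Tuple[str, str, object, object]
Rule = Tuple[float, str, Attr]  # (weight, case, attribute)


def weight_table(attributes: Sequence[Attr],
                 weights: Sequence[float]) -> List[Tuple[Attr, float]]:
    """Attach one learned weight to each row of the attribute index table."""
    if len(attributes) != len(weights):
        raise ValueError("need exactly one weight per attribute")
    return list(zip(attributes, weights))


def decision_list(tables: Dict[str, List[Tuple[Attr, float]]]) -> List[Rule]:
    """Merge the per-case weight tables and sort all rules by descending weight."""
    merged = [(w, case, attr) for case, rows in tables.items() for attr, w in rows]
    return sorted(merged, key=lambda r: r[0], reverse=True)


# Hypothetical attributes and weights.
ATTRIBUTES = [("ic", "person", "に", "passive"), ("nc", "person", "は", "active"),
              ("nc", "food", "が", "active")]
TABLES = {
    "NOM": weight_table(ATTRIBUTES, [0.2, 1.1, -0.7]),
    "ACC": weight_table(ATTRIBUTES, [-0.5, -0.8, 0.9]),
}
RULES = decision_list(TABLES)[:10]  # FIG. 14 shows the top 10 entries
```

Keeping the case label on each rule lets a single merged list hold both nominative and accusative rules, ordered purely by weight.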

In the embodiment described above, the output term structure concerns the obligatory surface cases of the predicate's base form. However, the same means can also be used when the output term structure concerns arbitrary surface cases or deep cases, provided the syntax/semantic analysis result table is adapted accordingly.

Likewise, a learning method other than the machine learning method described above may be used for weight learning, as long as it yields a weight table.

Furthermore, when the target is text in a foreign language such as English, this learning method can still be applied by using parts of speech other than verbs and nouns in place of the function words of dependency relations, and by using relations in the phrase structure, such as noun phrases and verb phrases formed from words, as the dependency relations.

Needless to say, some or all of the functions of each means in the dictionary creation device of this embodiment can be implemented as a computer program and executed on a computer to realize the present invention, and the procedure of the dictionary creation method of this embodiment can likewise be implemented as a computer program executed by a computer. A program that realizes these functions on a computer can be recorded, stored, or distributed on a computer-readable recording medium such as an FD (Floppy (registered trademark) Disk), MO (Magneto-Optical disk), ROM (Read Only Memory), memory card, CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, CD-R, CD-RW, HDD, or removable disk. The program can also be provided over a network, for example via the Internet or e-mail.

FIG. 1 is a block diagram showing the system configuration of a dictionary creation device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the dictionary creation device according to the embodiment.
FIG. 3 is a block diagram showing the system configuration of a dictionary creation device according to another embodiment of the present invention.
FIG. 4 is a flowchart for explaining the operation of the dictionary creation device according to the other embodiment.
FIG. 5 is an explanatory diagram showing an example of training data with correct term structures input to the dictionary creation device of the present invention.
FIG. 6 is an explanatory diagram showing an example of the syntax/semantic analysis result table in the dictionary creation device.
FIG. 7 is an explanatory diagram showing an example of the training attribute index table in the dictionary creation device.
FIG. 8 is an explanatory diagram showing an example of training vectors created in the dictionary creation device.
FIG. 9 is an explanatory diagram showing an example of the training vector table in the dictionary creation device.
FIG. 10 is an explanatory diagram showing an example of weight vectors obtained by machine learning in the dictionary creation device.
FIG. 11 is an explanatory diagram showing an example of the weight table for the nominative term in the dictionary creation device.
FIG. 12 is an explanatory diagram showing an example of the weight table for the accusative term in the dictionary creation device.
FIG. 13 is an explanatory diagram showing an example of a table combining the weight tables for the nominative and accusative terms in the dictionary creation device.
FIG. 14 is an explanatory diagram showing an example of the term determination rules output from the dictionary creation device.

Explanation of symbols

1, 10: dictionary creation device; 2, 120: syntax/semantic analysis result table; 3, 102: training data creation unit; 4, 121: training attribute index table; 5, 122: training vector table; 6, 103: weight learning unit; 7, 123: weight table; 8, 104: term determination rule creation unit; 101: syntax/semantic analysis unit.

Claims

1. A dictionary creation device comprising:
a syntax/semantic analysis result table storing a text to be analyzed, written in natural language and annotated with correct term structures for predicates or action nouns, together with the results of syntactic and semantic analysis of the text, namely the base form, part of speech, and semantic category of each word in the text, the dependency relations between clauses, and the voice of the predicates;
training data creation means for referring to the syntax/semantic analysis result table, extracting attributes for learning from the base forms, parts of speech, and semantic categories of the words in the text, the dependency relations between clauses, and the voice of the predicates to create a training attribute index table, and at the same time creating training vectors to create a training vector table;
weight learning means for learning, using the training vector table, weights representing the importance of the attributes, and adding the learned weights to the training attribute index table to create a weight table; and
term determination rule creation means for referring to the weight table and outputting, as term determination rules, a list in which the attributes are sorted in descending order of importance,
wherein the output term determination rules are used as a dictionary.

2. A dictionary creation device comprising:
syntax/semantic analysis means for parsing a text to be analyzed, written in natural language and annotated with correct term structures for predicates or action nouns, analyzing the base form, part of speech, and semantic category of each word in the text, the dependency relations between clauses, and the voice of the predicates, and creating a syntax/semantic analysis result table;
training data creation means for referring to the syntax/semantic analysis result table, extracting attributes for learning from the base forms, parts of speech, and semantic categories of the words in the text, the dependency relations between clauses, and the voice of the predicates to create a training attribute index table, and at the same time creating training vectors to create a training vector table;
weight learning means for learning, using the training vector table, weights representing the importance of the attributes, and adding the learned weights to the training attribute index table to create a weight table; and
term determination rule creation means for referring to the weight table and outputting, as term determination rules, a list in which the attributes are sorted in descending order of importance,
wherein the output term determination rules are used as a dictionary.

3. The dictionary creation device according to claim 1, wherein
the training data creation means stores teacher variables together with the training vectors in the training vector table, and the weight learning means, for the training vectors and teacher variables stored in the training vector table, obtains hyperplanes such that the distance between two parallel hyperplanes dividing the positive side from the negative side is maximized, and learns the weights using a machine learning method based on the obtained hyperplanes.

4. A dictionary creation method in a device having a syntax/semantic analysis result table storing a text to be analyzed, written in natural language and annotated with correct term structures for predicates or action nouns, together with the results of syntactic and semantic analysis of the text, namely the base form, part of speech, and semantic category of each word in the text, the dependency relations between clauses, and the voice of the predicates, the method comprising:
a step in which training data creation means refers to the syntax/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words in the text, the dependency relations between clauses, and the voice of the predicates to create a training attribute index table, and at the same time creates training vectors to create a training vector table;
a step in which weight learning means learns, using the training vector table, weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table; and
a step in which term determination rule creation means refers to the weight table and outputs, as term determination rules, a list in which the attributes are sorted in descending order of importance,
wherein the output term determination rules are used as a dictionary.

5. A dictionary creation method comprising:
a step in which syntax/semantic analysis means parses a text to be analyzed, written in natural language and annotated with correct term structures for predicates or action nouns, analyzes the base form, part of speech, and semantic category of each word in the text, the dependency relations between clauses, and the voice of the predicates, and creates a syntax/semantic analysis result table;
a step in which training data creation means refers to the syntax/semantic analysis result table, extracts attributes for learning from the base forms, parts of speech, and semantic categories of the words in the text, the dependency relations between clauses, and the voice of the predicates to create a training attribute index table, and at the same time creates training vectors to create a training vector table;
a step in which weight learning means learns, using the training vector table, weights representing the importance of the attributes, and adds the learned weights to the training attribute index table to create a weight table; and
a step in which term determination rule creation means refers to the weight table and outputs, as term determination rules, a list in which the attributes are sorted in descending order of importance,
wherein the output term determination rules are used as a dictionary.

6. The dictionary creation method according to claim 4, wherein
the step performed by the training data creation means includes storing teacher variables together with the training vectors in the training vector table, and the step performed by the weight learning means, for the training vectors and teacher variables stored in the training vector table, obtains hyperplanes such that the distance between two parallel hyperplanes dividing the positive side from the negative side is maximized, and learns the weights using a machine learning method based on the obtained hyperplanes.

7. A dictionary creation program for causing a computer to function as each means according to any one of claims 1 to 3.

8. A computer-readable recording medium on which the dictionary creation program according to claim 7 is recorded.