JP5193798B2 - Dictionary creating device, dictionary creating method, dictionary creating program, and recording medium recording dictionary creating program

Info

Publication number: JP5193798B2
Application number: JP2008273683A
Authority: JP
Inventors: 平 博順; 永田 昌明
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-10-24
Filing date: 2008-10-24
Publication date: 2013-05-08
Anticipated expiration: 2028-10-24
Also published as: JP2010102521A

Description

The present invention relates to a dictionary creation device, a dictionary creation method, a dictionary creation program, and a recording medium on which the dictionary creation program is recorded, for use in question answering systems in which a computer answers questions expressed in natural language, information retrieval systems, information extraction systems, automatic summarization systems, machine translation systems, automatic paraphrasing systems, speech recognition systems, and the like.

Among conventional language processing devices, one has been proposed that assumes a probabilistic model for the case frames of predicates, performs machine learning on corpus data in which the correct case frames have been assigned by hand in order to estimate the model parameters, and uses the resulting model to output the argument structure with the highest likelihood (see, for example, Non-Patent Document 1). Given which word in a sentence is the predicate and which words are its arguments, this method examines at which level of the words' semantic attributes a case frame becomes the most expressive rule in an information-theoretic sense. It does not address identifying the arguments in a given text, nor does it address zero-pronoun analysis.

In addition, as disclosed in Non-Patent Document 2, a method has been proposed that learns a probabilistic model from a large text corpus, performs case analysis, and determines the predicate-argument structure. However, this method does not handle zero pronouns whose arguments appear in a sentence other than the one containing the predicate.

A method for identifying zero pronouns is described in Non-Patent Document 3.
Non-Patent Document 1: Hang Li, Naoki Abe, "Learning Dependencies Between Case Slots", IPSJ SIG Technical Reports, Natural Language Processing, Vol. 96, No. 114, pp. 93-99
Non-Patent Document 2: Daisuke Kawahara, Sadao Kurohashi, "An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Large-Scale Case Frames Acquired from the Web", Proceedings of the 12th Annual Meeting of the Association for Natural Language Processing, pp. 1111-1114, 2006
Non-Patent Document 3: Ryu Iida, Kentaro Inui, "Identifying Antecedents of Japanese Zero Pronouns by Machine Learning with Contextual Cues", IPSJ Journal, Vol. 45, No. 3, pp. 906-918
Non-Patent Document 4: Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, Yoshihiko Hayashi, Nihongo Goi-Taikei (a Japanese lexicon), Iwanami Shoten (1997)
Non-Patent Document 5: Taku Kudo, Yuji Matsumoto, "Japanese Dependency Analysis Using Cascaded Chunking", IPSJ Journal, Vol. 43, No. 6, pp. 1834-1842, 2002
Non-Patent Document 6: Vladimir Vapnik, "The Nature of Statistical Learning Theory", 2nd Edition, Springer (1999)

In conventional language processing devices that output predicate-argument structures, even when case frame information is registered in the dictionary, there is no clear criterion for choosing which usage's case information to apply when a verb or noun has several usages. Manual tuning is therefore required. That tuning is very labor-intensive, and it is difficult to find a tuning procedure that actually improves analysis accuracy.

Non-Patent Document 2 therefore proposes a method for automatically constructing a probabilistic model of predicate-argument structure from a large text corpus. However, this method does not handle zero pronouns whose arguments appear in a sentence other than the one containing the predicate, so it is difficult to analyze predicate-argument structure with high accuracy when multiple sentences are given.

Non-Patent Document 3 deals with identifying zero pronouns but not with predicate-argument structure analysis. Nor does it handle cases in which the predicate is nominalized and appears inside a compound noun. The zero-pronoun and compound-noun problems affect each other in predicate-argument structure analysis, and handling them one after the other may actually lower the overall accuracy of the analysis.

Until now there has been no method that handles predicate-argument structure analysis in a unified manner that includes zero pronouns and compound nouns.

The present invention solves the above problems. Its object is to provide a dictionary creation device, a dictionary creation method, a dictionary creation program, and a recording medium on which the dictionary creation program is recorded, which can automatically learn, with high accuracy, argument determination rules (rules for deciding which words are the arguments, or "terms", of a predicate) and which allow predicate-argument structure analysis including zero pronouns and compound nouns to be handled in a unified manner.

To solve the above problems, the present invention applies a machine learning technique to text in which the correct argument structure has been manually tagged for predicates and event nouns. It thereby automatically learns and outputs argument determination rules for identifying the arguments of a predicate or event noun (hereinafter collectively called a "predicate") from the predicate or event noun itself and from information such as the base form, part of speech, and semantic category of the words in the text, whether each word is a function word, whether it is a symbol, the dependency relations between bunsetsu (Japanese phrasal units), and the voice of the predicate.

Specifically, the dictionary creation device according to claim 1 comprises the following.

A syntactic/semantic analysis result table that stores, for each word in a text to be analyzed written in natural language, in order from the beginning of the text: the surface form of the word; the base form of the word; the bunsetsu (phrasal unit) on which the bunsetsu containing the word depends; the voice of the predicate with which the word is in a dependency relation; information indicating whether the word is a predicate or an event noun; and, when the word is an argument of such a predicate or event noun, the correct argument structure.

Training data creation means that, for each predicate or event noun in the syntactic/semantic analysis result table, searches the table, starting from that predicate or event noun, for other words satisfying a constraint, extracts as attributes those constraints under which a given word is the word appearing closest to the predicate or event noun, and thereby creates a training attribute index table. The means also creates a training vector table consisting of, for each word other than the predicate or event noun, a training vector composed of values indicating whether the word satisfies each of the constraints, and a teacher variable for distinguishing the correct argument structure of that word from the other argument structures.

Weight learning means that, for the training vectors and teacher variables in the training vector table and for each type of argument structure, treats pairs of a training vector and a teacher variable indicating the correct argument structure as positive examples and pairs of a training vector and a teacher variable indicating any other argument structure as negative examples, finds the hyperplane for which the distance between the two parallel hyperplanes separating the positive and negative examples is maximal, learns weights representing the importance of the attributes by a machine learning technique based on that hyperplane, and adds the learned weights to the training attribute index table to create a weight table.

Argument determination rule creation means that refers to the weight table and outputs a list of the attributes sorted in descending order of importance as argument determination rules for determining the argument structure for the base form of a predicate or event noun.

The information indicating the correct argument structure is information indicating the correct argument structure for the base form of a predicate or event noun contained in the text to be analyzed. The constraints include (a) the search range or search direction for the word relative to the bunsetsu containing the starting predicate or event noun, (b) whether a dependency relation exists between the bunsetsu containing the starting predicate or event noun and the bunsetsu containing the other word, and (c) whether to take into account the voice of the predicate with which the other word is in a dependency relation. The output argument determination rules serve as the dictionary.

The dictionary creation method according to claim 2 is performed in a device provided with a syntactic/semantic analysis result table that stores, for each word in a text to be analyzed written in natural language, in order from the beginning of the text: the surface form of the word; the base form of the word; the bunsetsu (phrasal unit) on which the bunsetsu containing the word depends; the voice of the predicate with which the word is in a dependency relation; information indicating whether the word is a predicate or an event noun; and, when the word is an argument of such a predicate or event noun, the correct argument structure. The method executes the following steps.

A step in which training data creation means, for each predicate or event noun in the syntactic/semantic analysis result table, searches the table, starting from that predicate or event noun, for other words satisfying a constraint, extracts as attributes those constraints under which a given word is the word appearing closest to the predicate or event noun, and thereby creates a training attribute index table, and also creates a training vector table consisting of, for each word other than the predicate or event noun, a training vector composed of values indicating whether the word satisfies each of the constraints, and a teacher variable for distinguishing the correct argument structure of that word from the other argument structures.

A weight learning step in which weight learning means, for the training vectors and teacher variables in the training vector table and for each type of argument structure, treats pairs of a training vector and a teacher variable indicating the correct argument structure as positive examples and pairs of a training vector and a teacher variable indicating any other argument structure as negative examples, finds the hyperplane for which the distance between the two parallel hyperplanes separating the positive and negative examples is maximal, learns weights representing the importance of the attributes by a machine learning technique based on that hyperplane, and adds the learned weights to the training attribute index table to create a weight table.

A step in which argument determination rule creation means refers to the weight table and outputs a list of the attributes sorted in descending order of importance as argument determination rules for determining the argument structure for the base form of a predicate or event noun.

The information indicating the correct argument structure is information indicating the correct argument structure for the base form of a predicate or event noun contained in the text to be analyzed. The constraints include (a) the search range or search direction for the word relative to the bunsetsu containing the starting predicate or event noun, (b) whether a dependency relation exists between the bunsetsu containing the starting predicate or event noun and the bunsetsu containing the other word, and (c) whether to take into account the voice of the predicate with which the other word is in a dependency relation. The output argument determination rules serve as the dictionary.

The dictionary creation program according to claim 3 is a program that causes a computer to function as each of the means recited in claim 1.

The recording medium according to claim 4 is a computer-readable recording medium on which the dictionary creation program according to claim 3 is recorded.

With the above configuration, for text that has been manually tagged with the correct argument structure, argument determination rules for identifying the arguments of a predicate or event noun can be learned and output automatically from the predicate or event noun and from information such as the base form, part of speech, and semantic category of the words in the text, the dependency relations between bunsetsu, and the voice of the predicate.

The present invention provides the following advantages.
(1) Argument determination rules for identifying arguments with high accuracy can be learned automatically.
(2) Argument determination rules can be learned automatically in a way that also covers compound nouns and zero pronouns whose arguments appear in a sentence other than the one containing the predicate. Using the dictionary created by the present invention therefore allows predicate-argument structure analysis including zero pronouns and compound nouns to be handled in a unified manner.

Embodiments of the present invention are described below with reference to the drawings, but the present invention is not limited to these embodiments. FIG. 1 is a block diagram showing the configuration of a dictionary creation device 1 according to one embodiment of the present invention, and FIG. 2 is a flowchart showing the operation of the device of FIG. 1.

In FIG. 1, reference numeral 2 denotes a syntactic/semantic analysis result table. It stores training text (the text to be analyzed) written in natural language, in which the correct argument structure has been manually assigned to each predicate or event noun, together with the results of syntactic and semantic analysis of that text: the base form, part of speech, and semantic category of each word in the text, whether the word is a function word, whether it is a symbol, the dependency relations between bunsetsu (phrasal units), and the voice of the predicate.

The information on whether a word is a function word and whether it is a symbol may be stored in the syntactic/semantic analysis result table 2 in advance. Alternatively, it may be left out of the table and computed dynamically, as needed, from the word's base form and part of speech.
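The dynamic computation mentioned above could look like the following minimal sketch. The part-of-speech labels (助詞 particle, 助動詞 auxiliary verb, 記号 symbol) and the choice to treat particles and auxiliary verbs as function words are assumptions made for illustration; this section does not name a tag set, and all function names are hypothetical.

```python
# Hypothetical POS labels; the source does not specify a tag set.
FUNCTION_WORD_POS = {"助詞", "助動詞"}   # particle, auxiliary verb (assumed)
SYMBOL_POS = {"記号"}                    # symbol (assumed)


def is_function_word(pos: str) -> bool:
    """Return True if the POS label marks a function word."""
    return pos in FUNCTION_WORD_POS


def is_symbol(pos: str) -> bool:
    """Return True if the POS label marks a symbol."""
    return pos in SYMBOL_POS


def function_or_symbol(base_form: str, pos: str) -> bool:
    """Compute the function-word/symbol flag on demand instead of storing it."""
    # base_form is accepted because the text says the flag is derived from the
    # base form and the part of speech; this sketch decides from the POS alone.
    return is_function_word(pos) or is_symbol(pos)
```

Computing the flag on demand keeps the table smaller, at the cost of repeating the lookup each time an attribute is extracted.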

Reference numeral 3 denotes a training data creation unit serving as the training data creation means. Referring to the syntactic/semantic analysis result table 2, it extracts the attributes used for learning from the base form, part of speech, and semantic category of each word in the text, whether the word is a function word, whether it is a symbol, the dependency relations between bunsetsu, and the voice of the predicate, and creates the training attribute index table 4. At the same time it creates training vectors and builds the training vector table 5.

Reference numeral 6 denotes a weight learning unit serving as the weight learning means. It uses the training vector table 5 to learn weights representing the importance of the attributes and adds the learned weights to the training attribute index table 4 to create the weight table 7.

Reference numeral 8 denotes an argument determination rule creation unit serving as the argument determination rule creation means. It refers to the weight table 7 and outputs a list of the attributes sorted in descending order of importance as the argument determination rules (the dictionary).

Next, the operation of the dictionary creation device 1 configured as described above is described with reference to FIG. 2.

First, the user inputs training text annotated with the correct argument structure, together with the syntactic/semantic analysis results for that text, into the syntactic/semantic analysis result table 2 of the dictionary creation device 1 (step S1).

Next, the training data creation unit 3 extracts from the syntactic/semantic analysis result table 2 the attributes that make up the argument determination rules for predicates and event nouns and creates the training attribute index table 4. It also uses those attributes to create training vectors, which it stores in the training vector table 5 (step S2).
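As a rough illustration of step S2, the sketch below turns the set of constraints each candidate word satisfies into a 0/1 training vector and attaches a teacher label for one argument type, following the description in the claims. The attribute strings, the function names, the semantic category labels, and the +1/-1 label encoding are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_attribute_index(attribute_sets: Iterable[Set[str]]) -> Dict[str, int]:
    """Assign a column index to every distinct attribute (constraint)."""
    index: Dict[str, int] = {}
    for attrs in attribute_sets:
        for attr in sorted(attrs):           # sorted for a reproducible order
            index.setdefault(attr, len(index))
    return index


def to_training_row(attrs: Set[str], gold_role: Optional[str],
                    target_role: str,
                    index: Dict[str, int]) -> Tuple[List[int], int]:
    """Return (training vector, teacher variable) for one candidate word."""
    vector = [0] * len(index)
    for attr in attrs:
        vector[index[attr]] = 1              # 1 = the word satisfies the constraint
    label = 1 if gold_role == target_role else -1
    return vector, label


# Hypothetical candidates for the predicate 食べる: (word, attributes, gold role).
CANDIDATES = [
    ("私", {"nc|person|は|active"}, "NOM"),
    ("ピーマン", {"nc|food|が|active"}, "ACC"),
    ("母", {"ic|person|に|passive"}, None),
]
INDEX = build_attribute_index(attrs for _, attrs, _ in CANDIDATES)
NOM_ROWS = [to_training_row(attrs, gold, "NOM", INDEX)
            for _, attrs, gold in CANDIDATES]
```

One such table would be built per argument type (nominative, accusative, and so on), so that each type gets its own positive and negative examples.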

Next, the weight learning unit 6 applies a machine learning technique to the training vector table 5 to learn weights representing the importance of the attributes, adds the learned weights to the training attribute index table 4, and stores the result in the weight table 7 (step S3).
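The claims characterize this learning step as finding the hyperplane that maximizes the distance between two parallel hyperplanes separating the positive examples from the negative examples. In standard support vector machine notation, that idea can be written as follows for the linearly separable case. The symbols are not taken from the source and are introduced only for illustration: training vectors x_i, teacher variables y_i in {+1, -1}, weight vector w, and bias b.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 \qquad (i = 1,\dots,n)
```

The two parallel hyperplanes are w·x + b = +1 and w·x + b = -1, and the distance between them is 2/‖w‖, so minimizing ‖w‖ maximizes that distance. Under this reading, the component w_j would play the role of the weight of attribute j. Whether the source uses exactly this formulation (for example, a soft-margin variant) is not stated in this section.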

Next, the argument determination rule creation unit 8 sorts the attributes in the weight table 7 by weight, creates and outputs the argument determination rules, and the process ends (step S4).
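Step S4 amounts to sorting the weight table. A minimal sketch follows, assuming that "importance" is the signed weight and that larger weights come first; the attribute strings, the weight values, and the tie-breaking rule are hypothetical.

```python
from typing import Dict, List, Tuple


def make_rules(weight_table: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return attributes sorted by descending weight (most important first)."""
    # Ties are broken by attribute name so the output is reproducible.
    return sorted(weight_table.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical weight table for one argument type.
WEIGHTS = {
    "ic|person|に|passive": -0.4,
    "nc|person|は|active": 1.3,
    "nc|food|が|active": 0.2,
}
RULES = make_rules(WEIGHTS)
```

The resulting ordered list can be read as a decision list: when analyzing new text, the first rule whose constraint finds a matching word would decide the argument.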

The specific configuration and operation of the syntactic/semantic analysis result table 2, training data creation unit 3, training attribute index table 4, training vector table 5, weight learning unit 6, weight table 7, and argument determination rule creation unit 8 of FIG. 1 are the same as those of the syntactic/semantic analysis result table 120, training data creation unit 102, training attribute index table 121, training vector table 122, weight learning unit 103, weight table 123, and argument determination rule creation unit 104 of FIG. 3, respectively, which are described in detail below. Their description is therefore omitted here.

FIG. 3 is a block diagram showing the configuration of a dictionary creation device 10 according to another embodiment of the present invention, and FIG. 4 is a flowchart showing the operation of the device of FIG. 3.

In FIG. 3, reference numeral 101 denotes a syntactic/semantic analysis unit serving as the syntactic/semantic analysis means. It parses training text (the text to be analyzed) written in natural language, in which the correct argument structure has been assigned to each predicate or event noun; analyzes the base form, part of speech, and semantic category of each word in the text, whether the word is a function word, whether it is a symbol, the dependency relations between bunsetsu (phrasal units), and the voice of the predicate; and creates the syntactic/semantic analysis result table 120.

Reference numeral 102 denotes a training data creation unit serving as the training data creation means. Referring to the syntactic/semantic analysis result table 120, it extracts the attributes used for learning from the base form, part of speech, and semantic category of each word in the text, whether the word is a function word, whether it is a symbol, the dependency relations between bunsetsu, and the voice of the predicate, and creates the training attribute index table 121. At the same time it creates training vectors and builds the training vector table 122.

Reference numeral 103 denotes a weight learning unit serving as the weight learning means. It uses the training vector table 122 to learn weights representing the importance of the attributes and adds the learned weights to the training attribute index table 121 to create the weight table 123.

Reference numeral 104 denotes an argument determination rule creation unit serving as the argument determination rule creation means. It refers to the weight table 123 and outputs a list of the attributes sorted in descending order of importance as the argument determination rules (the dictionary).

The functions, described below, of the syntactic/semantic analysis unit 101, syntactic/semantic analysis result table 120, training data creation unit 102, training attribute index table 121, training vector table 122, weight learning unit 103, weight table 123, and argument determination rule creation unit 104 are implemented by, for example, a computer.

Next, the operation of the dictionary creation device 10 configured as described above is described with reference to FIG. 4. First, training text written in natural language, in which the correct argument structure has been manually assigned to each predicate or event noun, is input to the syntactic/semantic analysis unit 101 of the dictionary creation device 10 (step S11). Suppose, for example, that text such as that shown in FIG. 5 is input.

In FIG. 5, a span enclosed by the tags <NP ID=number> and </NP> represents an argument, and a span enclosed by <PRED ~> and </PRED> represents a predicate. Event nouns do not occur in this text, but they would be enclosed by the tags <EVENT ~> and </EVENT>. An entry such as NOM="1" ACC="2" written in the "~" part of a <PRED ~> or <EVENT ~> tag means that, for the base form of this predicate or event noun, the correct nominative argument is 「私」 ("I"), whose ID number is 1, and the correct accusative argument is 「ピーマン」 ("green pepper"), whose ID number is 2. For simplicity, only nominative and accusative arguments are handled here, but other argument types can be processed in the same way.
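To make the tag scheme concrete, the following sketch reads such annotations with regular expressions. FIG. 5 is not reproduced in this text, so the sample string is a hypothetical reconstruction based on the example sentence used in the description, and the exact tag syntax (quoted ID values, attribute order) is an assumption.

```python
import re

# Assumed tag syntax: <NP ID="n">...</NP>, <PRED NOM="n" ACC="m">...</PRED>,
# and <EVENT ...>...</EVENT> for event nouns.
NP_RE = re.compile(r'<NP ID="(\d+)">(.*?)</NP>')
PRED_RE = re.compile(r'<(PRED|EVENT)((?:\s+[A-Z]+="\d+")*)\s*>(.*?)</\1>')
ATTR_RE = re.compile(r'([A-Z]+)="(\d+)"')


def parse_annotations(text):
    """Return (arguments by ID, list of predicates with resolved arguments)."""
    nps = {int(np_id): surface for np_id, surface in NP_RE.findall(text)}
    preds = []
    for match in PRED_RE.finditer(text):
        kind, attrs, surface = match.group(1), match.group(2), match.group(3)
        # Map each case label (NOM, ACC, ...) to the surface form of its argument.
        args = {case: nps.get(int(np_id)) for case, np_id in ATTR_RE.findall(attrs)}
        preds.append({"kind": kind, "surface": surface, "args": args})
    return nps, preds


# Hypothetical annotated text, not copied from FIG. 5.
SAMPLE = ('<NP ID="1">私</NP>は<NP ID="2">ピーマン</NP>が嫌いだ。'
          'しかし昨日は<NP ID="3">母</NP>に無理やり'
          '<PRED NOM="1" ACC="2">食べさせられた</PRED>。')
NPS, PREDS = parse_annotations(SAMPLE)
```

Note that the nominative and accusative arguments of the predicate in the second sentence both sit in the first sentence, which is the cross-sentence zero-pronoun case the invention targets.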

Next, the syntactic/semantic analysis unit 101 performs syntactic and semantic analysis of the training text to identify the base form, part of speech, and semantic category of each word in the text, whether the word is a function word, whether it is a symbol, the dependency relations between bunsetsu (phrasal units), and the voice of the predicate, and stores the results in the syntactic/semantic analysis result table 120 (step S12). For example, when the text 「私はピーマンが嫌いだ。しかし昨日は母に無理やり食べさせられた。」 ("I hate green peppers. But yesterday my mother forced me to eat them.") is analyzed, the following are obtained for each word, in order from the beginning of the text, as shown in FIG. 6: the sentence number, the bunsetsu number, the head bunsetsu number, the word number within the bunsetsu, the base form of the word, its part of speech, whether it is a function word or symbol, and its semantic category. In addition, the correct argument structure given in the text identifies which words are the predicates or event nouns to be analyzed and which arguments make up the argument structure of each. Here, suppose that 「食べる」 ("eat") is the predicate of interest. Its arguments are 「私」 and 「ピーマン」; specifically, 「私」 is the nominative argument and 「ピーマン」 is the accusative argument.

In FIG. 6, the sentence number is an integer of 0 or more, assigned as 0, 1, 2, ... in order from the first sentence of the text. The bunsetsu number is an integer of 0 or more, assigned as 0, 1, 2, ... in order from the first bunsetsu in a sentence. The head bunsetsu number is, for each word, the number of the bunsetsu on which the bunsetsu containing that word was judged to depend by syntactic analysis; when the bunsetsu is at the end of the sentence and has no head, the head bunsetsu number is set to -1. The word number is an integer of 0 or more, assigned as 0, 1, 2, ... in order from the first word in a bunsetsu. The base form is the base form of each word, and the part of speech is the part of speech of each word. The function word/symbol field indicates whether the word, rather than being a content word that carries the semantic content of its bunsetsu, is a function word such as 「は」 or 「が」 that attaches to a content word and marks its function, or is a symbol. The semantic category is the category assigned to each word using a thesaurus such as Nihongo Goi-Taikei (see Non-Patent Document 4).
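The table columns described above can be pictured as one record per word. The sketch below is only an illustration: FIG. 6 is not reproduced here, so the rows (including the dependency structure, part-of-speech labels, and the category 野菜) are assumed values for the first example sentence, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    sentence: int            # sentence number, counted from 0
    bunsetsu: int            # bunsetsu number within the sentence, from 0
    head: int                # head bunsetsu number, -1 when there is no head
    word: int                # word number within the bunsetsu, from 0
    base_form: str
    pos: str
    is_func_or_symbol: bool
    category: str            # semantic category; "" when none is assigned


# Assumed analysis of 「私はピーマンが嫌いだ。」 (sentence 0).
ROWS: List[Row] = [
    Row(0, 0, 2, 0, "私", "名詞", False, "人"),
    Row(0, 0, 2, 1, "は", "助詞", True, ""),
    Row(0, 1, 2, 0, "ピーマン", "名詞", False, "野菜"),
    Row(0, 1, 2, 1, "が", "助詞", True, ""),
    Row(0, 2, -1, 0, "嫌い", "名詞", False, ""),
    Row(0, 2, -1, 1, "だ", "助動詞", True, ""),
    Row(0, 2, -1, 2, "。", "記号", True, ""),
]


def content_words(rows: List[Row], sentence: int, bunsetsu: int) -> List[str]:
    """Base forms of the content words in one bunsetsu."""
    return [r.base_form for r in rows
            if r.sentence == sentence and r.bunsetsu == bunsetsu
            and not r.is_func_or_symbol]


def dependents(rows: List[Row], sentence: int, head: int) -> List[int]:
    """Bunsetsu numbers in a sentence whose head is the given bunsetsu."""
    return sorted({r.bunsetsu for r in rows
                   if r.sentence == sentence and r.head == head})
```

Storing the head number on every word of a bunsetsu, as the description does, makes per-word lookups simple at the cost of some redundancy.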

The syntax/semantic analysis unit 101 may also use a stand-alone syntactic parser such as CaboCha (see Non-Patent Document 5).

Next, the training data creation unit 102 processes the syntax/semantic analysis result table 120 on the assumption that, taking the predicate as the starting point, the word that appears closest to the predicate under some constraint is a term of that predicate. It extracts these constraints as attributes and creates the training attribute index table 121 (step S13). For example, in FIG. 6, 私 ("I") is the nominative term of the predicate 食べる ("eat"). When the words before the predicate are searched, 私 is the closest to the predicate among the words whose semantic category is "person" and whose function word is は. For each word in the input text other than the predicate, the unit determines under which constraints that word is the closest to the predicate, and takes those constraints as attributes. A constraint is, for example, a combination of (search method for the target word, part of speech, semantic category, voice of the predicate in the dependency relation).

The search method for the target word is one of the following. Method ic searches for words before the predicate, in the state where the clause containing the word depends on the clause containing the target predicate, and takes into account the function word/symbol of the clause containing the word and the voice of the predicate on which the word depends. Method oc searches for words after the predicate, in the state where the clause containing the target predicate depends on the clause containing the word, and takes the same function word/symbol and voice information into account. Method sc applies when the predicate and the target word are in the same clause. Method nc applies when there is no dependency relation between the clause containing the predicate and the clause containing the target word and the two are not in the same clause; taking into account the function word/symbol of the clause containing the word and the voice of the predicate with which the word has a dependency relation, it searches first before the predicate and, if no such word is found, then after the predicate. Method fw searches before the predicate without considering the function word/symbol or the voice of the predicate on which the word depends, and method bw searches after the predicate in the same way. For example, in FIG. 6, the word 母 ("mother") is the closest to the predicate 食べる under constraints such as (search method = ic, semantic category of the word = person, function word = に, voice = passive). In this way, a training attribute index table 121 such as the one in FIG. 7 is created. Many attributes are created; only ten of them are shown here.
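
The structural part of the search methods (ic, oc, sc, nc) can be sketched as a classification of the relation between a word and the predicate. The function name `relation_type` and the dictionary keys are hypothetical. Only the position of 母 (sentence 1, clause 2) is stated in the text; the other clause and head numbers are assumptions. Methods fw and bw ignore this relation and are therefore not covered here.

```python
def relation_type(word, predicate):
    """Classify the structural relation between a word and the predicate.

    Both arguments are dicts with keys "sent" (sentence number),
    "clause" (clause number) and "head" (head clause number, -1 if none).
    Returns "sc", "ic", "oc" or "nc".
    """
    same_sentence = word["sent"] == predicate["sent"]
    if same_sentence and word["clause"] == predicate["clause"]:
        return "sc"   # word and predicate share a clause
    if same_sentence and word["head"] == predicate["clause"]:
        return "ic"   # the word's clause depends on the predicate's clause
    if same_sentence and predicate["head"] == word["clause"]:
        return "oc"   # the predicate's clause depends on the word's clause
    return "nc"       # no dependency relation and not the same clause


# 母 is sentence 1, clause 2 in the text; all other numbers are assumptions.
taberu = {"sent": 1, "clause": 4, "head": -1}   # predicate 食べる
haha = {"sent": 1, "clause": 2, "head": 4}      # 母
watashi = {"sent": 0, "clause": 0, "head": 2}   # 私, in the previous sentence
```

Under these assumed numbers, 母 falls under ic and 私 under nc, which agrees with the search methods named for those two words in the description.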

Next, based on the training attribute index table 121 and the syntax/semantic analysis result table 120, the training data creation unit 102 creates a training vector for each word other than the predicate in the syntax/semantic analysis result table 120 and stores it in the training vector table 122 (step S14). For example, suppose that in FIG. 5 training vectors are created for the words other than the predicate in order from the top, and let these vectors be x_1, x_2, x_3, .... In this case, training vectors such as those in FIG. 8 are generated. Each attribute in the training attribute index table 121 (FIG. 7) is tested against each word; the attribute value is 1 if the word satisfies the attribute and 0 otherwise. For example, the fourth element of x_1 is 1. This is because the word 私 satisfies the condition of attribute number 4 in FIG. 7: the search type is nc, that is, when the words that have no dependency relation with the predicate are searched before the predicate, 私 is the closest to the predicate among the words whose semantic category is "person", whose clause has the function word は, and whose governing predicate is in the active voice. Conversely, the ninth element of x_1 is 0. When the words before the predicate 食べる are searched with no condition on the function word or the voice, the closest word whose semantic category is "person" is 母 (sentence number 1, clause number 2, word number 0). 私 is only the second closest, so it is judged not to satisfy the condition and the value is 0. The training vectors generated in this way are stored in the training vector table 122.
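
The "closest word under a constraint" test that produces each 0/1 element can be sketched as follows. This is a simplified illustration, not the patented procedure: the helper names, the reduced word list, the categories "food" and "time", and the three attributes are assumptions, and the nc search here looks only before the predicate.

```python
def nearest_match(words, pred_index, direction, constraint):
    """Index of the word nearest to the predicate that satisfies `constraint`, or None."""
    if direction == "before":
        candidates = range(pred_index - 1, -1, -1)
    else:
        candidates = range(pred_index + 1, len(words))
    for i in candidates:
        if constraint(words[i]):
            return i
    return None


def training_vector(words, pred_index, word_index, attributes):
    """Element is 1 where the word is the nearest match for the attribute, else 0."""
    return [
        1 if nearest_match(words, pred_index, direction, constraint) == word_index else 0
        for direction, constraint in attributes
    ]


# Content words in text order (simplified); "rel" is the relation to the predicate.
words = [
    {"base": "私", "cat": "person", "particle": "は", "rel": "nc", "voice": "active"},
    {"base": "ピーマン", "cat": "food", "particle": "が", "rel": "nc", "voice": "active"},
    {"base": "昨日", "cat": "time", "particle": "は", "rel": "ic", "voice": "passive"},
    {"base": "母", "cat": "person", "particle": "に", "rel": "ic", "voice": "passive"},
    {"base": "食べる", "cat": "", "particle": "", "rel": "", "voice": ""},  # the predicate
]
PRED = 4

attributes = [
    # nc, person, は, active: modelled on attribute number 4 in the text
    ("before", lambda w: w["rel"] == "nc" and w["cat"] == "person"
                         and w["particle"] == "は" and w["voice"] == "active"),
    # fw, person, no function-word or voice condition: modelled on attribute number 9
    ("before", lambda w: w["cat"] == "person"),
    # ic, person, に, passive
    ("before", lambda w: w["rel"] == "ic" and w["cat"] == "person"
                         and w["particle"] == "に" and w["voice"] == "passive"),
]

x_watashi = training_vector(words, PRED, 0, attributes)
x_haha = training_vector(words, PRED, 3, attributes)
```

With these inputs, 私 satisfies only the nc attribute, while the unconstrained "person" attribute goes to 母 because 母 is closer to the predicate, mirroring the explanation of the fourth and ninth elements of x_1.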

In addition, for each word other than the predicate, a teacher variable is stored in the training vector table 122 for each type of term (step S15). For example, y_{1,NOM} = 1, y_{2,NOM} = 0, y_{3,NOM} = 0, ..., y_{1,ACC} = 0, y_{2,ACC} = 0, y_{3,ACC} = 1, ... are stored. Here, y_{i,NOM} is a scalar variable that holds 1 if the i-th word is the correct nominative term and −1 otherwise. Similarly, y_{i,ACC} is a scalar variable that holds 1 if the i-th word is the correct accusative term and −1 otherwise. NOM stands for the English word "nominative" and ACC for the English word "accusative". As a result, training vectors and teacher variables such as those in FIG. 9 are stored in the training vector table 122.
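
A minimal sketch of the teacher variables, following the definition stated above (1 for the correct term, −1 otherwise). The function name `teacher_variables` and the index-based encoding of the correct terms are assumptions made for illustration.

```python
def teacher_variables(num_words, gold):
    """Build one label list per term type.

    `gold` maps a term type (e.g. "NOM") to the index of the word that is
    the correct term of that type for the predicate of interest.
    """
    return {
        case: [1 if i == gold_index else -1 for i in range(num_words)]
        for case, gold_index in gold.items()
    }


# Words in order: 私, は, ピーマン, が (indices 0 to 3); terms of 食べる.
y = teacher_variables(4, {"NOM": 0, "ACC": 2})
```

Keeping one label list per term type matches the later step in which a separate classifier is trained for each type.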

Next, for each type of term, the weight learning unit 103 reads the training vectors and teacher variables from the training vector table 122 and computes weights representing the importance of the attributes using a machine learning method (step S16).

As an example, the following describes learning with the machine learning method known as the SVM (Support Vector Machine), and specifically with the method referred to as a linear SVM in Non-Patent Document 6.

Given the m training examples (x_1, y_1), ..., (x_m, y_m) of expression (1), a linear SVM finds two parallel hyperplanes that separate the positive examples (the region where y > 0) from the negative examples (the region where y < 0), chosen so that the distance between the two hyperplanes is maximized. The hyperplane obtained in this way is expressed by equation (2).

Here w is the weight vector and b is the bias; both are obtained from the training data by solving the optimization problem expressed by equation (3).
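
Equations (2) and (3) are not reproduced in this text. For orientation only, the standard hard-margin formulation of a linear SVM, which matches the description of two parallel separating hyperplanes at maximum distance, is sketched below. That the equations of the patent take exactly this form is an assumption; a soft-margin variant would add slack variables to the constraints.

```latex
% Separating hyperplane, cf. equation (2)
\mathbf{w} \cdot \mathbf{x} + b = 0

% Optimization problem, cf. equation (3)
\min_{\mathbf{w},\, b} \; \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, m
```

In this formulation the two parallel hyperplanes are w · x + b = +1 and w · x + b = −1, and the distance between them is 2/‖w‖, so minimizing ‖w‖ maximizes that distance.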

For a linear SVM, the larger the value of an element of the weight vector w obtained in this way, the more important the corresponding attribute. For example, suppose that a linear SVM for the nominative term and a linear SVM for the accusative term are trained on the contents of the training vector table 122 shown in FIG. 9. The linear SVM for the nominative term uses the pairs (x_1, y_{1,NOM}), (x_2, y_{2,NOM}), (x_3, y_{3,NOM}), ... as training data, and the linear SVM for the accusative term uses the pairs (x_1, y_{1,ACC}), (x_2, y_{2,ACC}), (x_3, y_{3,ACC}), .... Each pair consists of a vector and a scalar variable.
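
Training one linear classifier per term type can be sketched in plain Python. The sketch below uses subgradient descent on the regularized hinge loss, which is a soft-margin stand-in chosen for brevity and not necessarily the solver of Non-Patent Document 6. The function names, hyperparameters and toy data are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM trained by subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]   # L2 regularization step
            if margin < 1.0:                          # inside the margin: hinge step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def decision(w, b, x):
    """Signed score of x; positive means the word is predicted to be the term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data: attribute 0 is held only by the positive word,
# attribute 1 only by a negative word, and the third word holds neither.
X = [[1, 0], [0, 1], [0, 0]]
y_nom = [1, -1, -1]
w_nom, b_nom = train_linear_svm(X, y_nom)
```

The same function would be called once more with the accusative labels to obtain a second weight vector. In the trained model, the attribute held by the positive word receives a positive weight and the attribute held only by a negative word receives a negative one, which is the sense in which a larger weight indicates a more important attribute.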

Suppose that, as a result, the weight vector w_NOM for the nominative term and the weight vector w_ACC for the accusative term shown in FIG. 10 are obtained. The weight learning unit 103 then adds these weight vectors to the training attribute index table 121 (the table in FIG. 7) to obtain a weight table for the nominative term (FIG. 11) and a weight table for the accusative term (FIG. 12), and stores them in the weight table 123.

Next, the term determination rule creation unit 104 merges the multiple weight tables in the weight table 123 into one, sorts the entries by weight, and outputs the result as a decision list (step S17). For example, the tables in FIG. 11 and FIG. 12 are merged to obtain the table in FIG. 13. FIG. 14 shows the top ten entries after this table is sorted by weight. The table is output in decision list format to obtain the term determination rule table (the dictionary).
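
Merging the per-type weight tables and sorting them into a decision list can be sketched as follows. The function name `build_decision_list`, the string encoding of attributes and all weights are hypothetical and are not taken from FIG. 11 to FIG. 14.

```python
def build_decision_list(weight_tables, top_n=None):
    """Merge per-type weight tables and sort the rules by descending weight.

    `weight_tables` maps a term type (e.g. "NOM") to a list of
    (attribute, weight) pairs, one pair per attribute.
    """
    merged = [
        (attribute, case, weight)
        for case, table in weight_tables.items()
        for attribute, weight in table
    ]
    merged.sort(key=lambda rule: rule[2], reverse=True)
    return merged if top_n is None else merged[:top_n]


# Hypothetical attributes and weights for illustration only.
tables = {
    "NOM": [("nc/person/は/active", 0.8), ("fw/person", 0.1)],
    "ACC": [("nc/person/は/active", -0.3), ("ic/food/を/active", 0.9)],
}
rules = build_decision_list(tables, top_n=3)
```

Each entry of the resulting list pairs an attribute with the term type it supports, so the same attribute can appear twice with different weights, once per type.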

The embodiment above describes a specific example in which the output term structure concerns the obligatory surface cases for the base form of the predicate. The same means can also be used when the output term structure concerns arbitrary surface cases or deep cases, provided that the syntax/semantic analysis result table is prepared accordingly.

As for the weight learning method, a learning method other than the machine learning method described above may be used, as long as a weight table can be obtained.

This learning method can also be applied when the target is text in another language such as English. In that case, parts of speech other than verbs and nouns are used in place of the function words of the dependency relation, and relations within the phrase structure, such as the noun phrases and verb phrases formed between words, are used as the dependency relations.

It goes without saying that some or all of the functions of the means of the dictionary creation device of this embodiment may be implemented as a computer program and executed on a computer to realize the present invention, and that the procedure of the dictionary creation method of this embodiment may be implemented as a computer program and executed by a computer. The program for realizing these functions on a computer can be recorded on a computer-readable recording medium, such as an FD (floppy (registered trademark) disk), MO (magneto-optical disk), ROM (read-only memory), memory card, CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, CD-R, CD-RW, HDD or removable disk, and stored or distributed in that form. The program can also be provided over a network, for example via the Internet or e-mail.

FIG. 1 is a block diagram showing the system configuration of a dictionary creation device according to one embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the dictionary creation device according to one embodiment of the present invention.
FIG. 3 is a block diagram showing the system configuration of a dictionary creation device according to another embodiment of the present invention.
FIG. 4 is a flowchart for explaining the operation of the dictionary creation device according to the other embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an example of training data with correct term structures that is input to the dictionary creation device of the present invention.
FIG. 6 is an explanatory diagram showing an example of the syntax/semantic analysis result table in the dictionary creation device of the present invention.
FIG. 7 is an explanatory diagram showing an example of the training attribute index table in the dictionary creation device of the present invention.
FIG. 8 is an explanatory diagram showing an example of the training vectors created in the dictionary creation device of the present invention.
FIG. 9 is an explanatory diagram showing an example of the training vector table in the dictionary creation device of the present invention.
FIG. 10 is an explanatory diagram showing an example of the weight vectors obtained by machine learning in the dictionary creation device of the present invention.
FIG. 11 is an explanatory diagram showing an example of the weight table for the nominative term in the dictionary creation device of the present invention.
FIG. 12 is an explanatory diagram showing an example of the weight table for the accusative term in the dictionary creation device of the present invention.
FIG. 13 is an explanatory diagram showing an example of the table that merges the weight table for the nominative term and the weight table for the accusative term in the dictionary creation device of the present invention.
FIG. 14 is an explanatory diagram showing an example of the term determination rules output from the dictionary creation device of the present invention.

Explanation of symbols

1, 10: dictionary creation device; 2, 120: syntax/semantic analysis result table; 3, 102: training data creation unit; 4, 121: training attribute index table; 5, 122: training vector table; 6, 103: weight learning unit; 7, 123: weight table; 8, 104: term determination rule creation unit; 101: syntax/semantic analysis unit.

Claims

A dictionary creation device comprising:
a syntax/semantic analysis result table that stores, for each word included in an analysis target text written in natural language, in order from the beginning of the analysis target text, the surface form of the word, the base form of the word, the clause on which the clause containing the word depends, the voice of the predicate in a dependency relation with the word, information indicating whether the word is a predicate or an action noun, and, when the word is in a term relation with said predicate or action noun, information indicating the correct term structure;
training data creating means for, for each predicate or action noun in the syntax/semantic analysis result table, extracting as attributes the constraint conditions under which, when words other than the predicate or action noun in the syntax/semantic analysis result table are searched for starting from that predicate or action noun, a word satisfying the constraint condition is the word that appears closest to the predicate or action noun, thereby creating a training attribute index table, and for creating a training vector table comprising, for each word other than the predicate or action noun, a training vector composed of values indicating whether the word satisfies each of the constraint conditions and a teacher variable that distinguishes the correct term structure from other term structures;
weight learning means for, with respect to the training vectors and teacher variables in the training vector table and for each type of term structure, taking the data consisting of pairs of a training vector and a teacher variable indicating the correct term structure as positive examples and the data consisting of pairs of a training vector and a teacher variable indicating a term structure other than the correct one as negative examples, finding the hyperplane that maximizes the distance between two parallel hyperplanes separating the positive examples from the negative examples, learning weights representing the importance of the attributes by a machine learning method based on the hyperplane found, and adding the learned weights to the training attribute index table to create a weight table; and
term determination rule creating means for referring to the weight table and outputting a list in which the attributes are sorted in descending order of importance, as term determination rules for determining the term structure for the base form of a predicate or action noun,
wherein the information indicating the correct term structure is information indicating the correct term structure for the base form of a predicate or action noun included in the analysis target text,
the constraint conditions include:
(a) the search range or search direction for the word, relative to the clause containing the predicate or action noun taken as the starting point;
(b) whether there is a dependency relation between the clause containing the predicate or action noun taken as the starting point and the clause containing a word other than said predicate or action noun; and
(c) whether to take into account information on the voice of the predicate in a dependency relation with a word other than said predicate or action noun,
and the output term determination rules constitute a dictionary.

A dictionary creation method in a device having a syntax/semantic analysis result table that stores, for each word included in an analysis target text written in natural language, in order from the beginning of the analysis target text, the surface form of the word, the base form of the word, the clause on which the clause containing the word depends, the voice of the predicate in a dependency relation with the word, information indicating whether the word is a predicate or an action noun, and, when the word is in a term relation with said predicate or action noun, information indicating the correct term structure, wherein:
training data creating means executes a training data creation step of, for each predicate or action noun in the syntax/semantic analysis result table, extracting as attributes the constraint conditions under which, when words other than the predicate or action noun in the syntax/semantic analysis result table are searched for starting from that predicate or action noun, a word satisfying the constraint condition is the word that appears closest to the predicate or action noun, thereby creating a training attribute index table, and of creating a training vector table comprising, for each word other than the predicate or action noun, a training vector composed of values indicating whether the word satisfies each of the constraint conditions and a teacher variable that distinguishes the correct term structure from other term structures;
weight learning means executes a weight learning step of, with respect to the training vectors and teacher variables in the training vector table and for each type of term structure, taking the data consisting of pairs of a training vector and a teacher variable indicating the correct term structure as positive examples and the data consisting of pairs of a training vector and a teacher variable indicating a term structure other than the correct one as negative examples, finding the hyperplane that maximizes the distance between two parallel hyperplanes separating the positive examples from the negative examples, learning weights representing the importance of the attributes by a machine learning method based on the hyperplane found, and adding the learned weights to the training attribute index table to create a weight table; and
term determination rule creating means executes a term determination rule creation step of referring to the weight table and outputting a list in which the attributes are sorted in descending order of importance, as term determination rules for determining the term structure for the base form of a predicate or action noun,
wherein the information indicating the correct term structure is information indicating the correct term structure for the base form of a predicate or action noun included in the analysis target text,
the constraint conditions include:
(a) the search range or search direction for the word, relative to the clause containing the predicate or action noun taken as the starting point;
(b) whether there is a dependency relation between the clause containing the predicate or action noun taken as the starting point and the clause containing a word other than said predicate or action noun; and
(c) whether to take into account information on the voice of the predicate in a dependency relation with a word other than said predicate or action noun,
and the output term determination rules constitute a dictionary.

A dictionary creation program for causing a computer to function as each of the means recited in claim 1.

A computer-readable recording medium on which the dictionary creating program according to claim 3 is recorded.