JP2009176168A

JP2009176168A - Language processor, language processing method, language processing program, and recording medium recording same program

Info

Publication number: JP2009176168A
Application number: JP2008015602A
Authority: JP
Inventors: Hiroyori Taira; 博順平; Masaaki Nagata; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-01-25
Filing date: 2008-01-25
Publication date: 2009-08-06
Anticipated expiration: 2028-01-25
Also published as: JP5150277B2

Abstract

PROBLEM TO BE SOLVED: To obtain essential surface cases by using case conversion rules in which appropriate conversions are described.

SOLUTION: A language processor includes a case conversion rule table 2 that stores rules for converting a dependency state between a predicate or action noun and other words or word attributes into a case relation between the predicate or action noun and those other words. A case conversion part 3 uses the dependency state of an input text and the rules in the case conversion rule table 2 to convert the text into the argument structures of its predicates and action nouns, and outputs the result.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a language analysis apparatus, a language processing method, a language processing program, and a recording medium on which the language processing program is recorded, for use in question answering systems in which a computer answers questions expressed in natural language, information retrieval systems, information extraction systems, automatic summarization systems, automatic translation systems, automatic paraphrasing systems, speech recognition systems, and the like.

Among conventional language processing devices, one proposed device assumes a probabilistic model for the case frames of predicates, performs machine learning on corpus data in which the correct case frames have been annotated by hand, estimates the parameters of the probabilistic model, and uses the resulting model to output the argument structure with the highest likelihood (see, for example, Non-Patent Document 1). Given which word in a sentence is the predicate and which words are its arguments, this method examines at which level of semantic attribute a case frame becomes the most expressive rule in information-theoretic terms. It does not deal with the identification of predicates and arguments, anaphora resolution, or the determination of modifying phrases.

Conventional language processing devices that output predicate-argument structure can perform analysis only with the case information assigned to verbs, adjectives, and nouns registered in advance in a dictionary, and may fail to analyze a text properly when a verb or noun is used in a way that is not registered. Even when case information is registered, if a verb or noun has several usages there is no clear criterion for choosing which usage's case information to apply, so manual tuning is required. That tuning is very labor-intensive, and it is difficult to find a tuning method that actually improves analysis accuracy. Non-Patent Document 2 therefore proposes a method for automatically constructing a probabilistic model of predicate-argument structure from a large-scale text corpus.

Non-Patent Document 1: Hang Li and Naoki Abe, "Learning Dependencies Between Case Slots", IPSJ SIG Technical Report, Natural Language Processing, Vol. 96, No. 114, pp. 93-99.
Non-Patent Document 2: Daisuke Kawahara and Sadao Kurohashi, "An Integrated Probabilistic Model of Syntactic and Case Analysis Based on Large-Scale Case Frames Acquired from the Web", Proceedings of the 13th Annual Meeting of the Association for Natural Language Processing, 2006.

However, in question answering systems, information retrieval systems, and the like, the useful information is the essential surface cases with respect to the base form of a predicate or action noun, that is, "what did what to what". The method proposed in Non-Patent Document 2 handles only surface cases with respect to the surface form of the predicate. It cannot output the essential surface cases for the base form of a predicate or action noun while taking into account the changes in case elements seen in causative sentences, passive sentences, adnominal modification, and so on. When an essential surface case is omitted from the text, it cannot be output at all. Furthermore, the method does not consider how much of a modifying phrase should be included in an argument, so essential surface cases are not always output with appropriate modifiers.

An object of the present invention is to solve the above problems by providing a language processing apparatus, a language processing method, a language processing program, and a recording medium on which the language processing program is recorded, characterized as follows. For predicates and action nouns, essential surface cases are obtained using case conversion rules that describe appropriate conversions to essential surface cases from features such as the predicate or action noun itself, its part of speech, its verbal semantic attribute, the dependency relations of semantic heads, and the part of speech and semantic category of each semantic head, together with function words. When no essential surface case is obtained even after the case conversion rules are applied, the essential surface case is determined from co-occurrence information between the predicate or action noun and its arguments and from topicality. Finally, for each word obtained as a surface case, how much of its modifying phrase to include in the argument is decided on the basis of statistical information.

To achieve the above object, the invention according to claim 1 is a language processing apparatus that outputs the argument structure of a predicate or action noun in an input text, comprising: case conversion rule storage means storing rules that convert a dependency state between a predicate or action noun and other words or word attributes into a case relation between the predicate or action noun and those other words; and case conversion means that applies the dependency state of the text and the conversion rules held in the case conversion rule storage means to convert the input text into the argument structures of its predicates and action nouns, and outputs them.

The invention according to claim 2 is characterized in that, in claim 1, the rules for conversion into case relations are generated by machine learning from text for which the correct case relations and the dependency state are given.

The invention according to claim 3 is characterized in that, in claim 1 or 2, the dependency state is, for the semantic heads of phrases, information on the dependent head and the governing head.

The invention according to claim 4 is characterized in that, in claim 1 or 2, the dependency state is, for the semantic heads of phrases, information on the dependent head and the governing head together with any of: the function words attached to the semantic head, the part of speech of the semantic head, the semantic category of the semantic head, the verbal semantic attribute of the predicate or action noun, modifying phrases, words appearing in the same text, or the genre or category to which the text belongs.

The invention according to claim 5 is characterized in that, in claim 1, the rules for conversion into case relations are generated by: a parsing unit that parses analysis-target text written in natural language, in which argument structures have been assigned to predicates or action nouns, and analyzes the set of phrases and the dependency relations between phrases; a head identification unit that identifies, for each phrase, a semantic head from among the words that the parsing unit has judged to be content words within the phrase, and creates a head table; an inter-head dependency analysis unit that adds, for each phrase in the head table, the phrase it depends on, and also adds to the head table, for each head, the function words that follow it within its phrase; a case conversion rule learning unit that extracts, from the argument structures assigned to the predicates or action nouns in the text, the essential surface cases for the base forms of those predicates and action nouns and the phrase to which each word belongs, creates an argument structure correct-answer table, converts each phrase in that table to its head by referring to the head table, adds to that table the dependency relations in the input text of the phrases corresponding to the essential cases of each predicate and action noun, and learns case conversion rules using that table; and a case conversion rule output unit that outputs the results obtained for the predicates and action nouns as case conversion rules.

The invention according to claim 6 is characterized in that, in claim 5, a thesaurus is further provided, and the case conversion rule learning unit searches the thesaurus to extract superordinate concepts of the words in the argument structure correct-answer table, further creates rules in which the original word is replaced by the extracted word, and adds them to the argument structure correct-answer table.

The invention according to claim 7 is characterized in that, in claim 5, the case conversion rule learning unit learns case conversion rules by machine learning from the correct case conversions recorded in the argument structure correct-answer table.

The invention according to claim 8 is a language processing method for outputting the argument structure of a predicate or action noun in an input text, comprising: a step of inputting a text and the dependency state of the text; a step of applying, to each predicate or action noun in the text, the case conversion rules stored in case conversion rule storage means according to the dependency state of the text, and converting it into an argument structure by case conversion means; and a step of outputting the argument structure obtained for the predicate or action noun in the preceding step.

The invention according to claim 9 is characterized in that, in claim 8, the case conversion rules are generated by machine learning from text for which the correct case relations and the dependency state are given.

The invention according to claim 10 is characterized in that, in claim 8 or 9, the dependency state is, for the semantic heads of phrases, information on the dependent head and the governing head.

The invention according to claim 11 is characterized in that, in claim 8 or 9, the dependency state is, for the semantic heads of phrases, information on the dependent head and the governing head together with any of: the function words attached to the semantic head, the part of speech of the semantic head, the semantic category of the semantic head, the verbal semantic attribute of the predicate or action noun, modifying phrases, words appearing in the same text, or the genre or category to which the text belongs.

The invention according to claim 12 is characterized in that, in claim 8, the case conversion rules are generated by: a step in which a parsing unit parses analysis-target text written in natural language, in which argument structures have been assigned to predicates or action nouns, and analyzes the set of phrases and the dependency relations between phrases; a step in which a head identification unit identifies, for each phrase, a semantic head from among the words that the parsing unit has judged to be content words within the phrase, and creates a head table; a step in which an inter-head dependency analysis unit adds, for each phrase in the head table, the phrase it depends on, and also adds to the head table, for each head, the function words that follow it within its phrase; a step in which a case conversion rule learning unit extracts, from the argument structures assigned to the predicates or action nouns in the text, the essential surface cases for the base forms of those predicates and action nouns and the phrase to which each word belongs, creates an argument structure correct-answer table, converts each phrase in that table to its head by referring to the head table, and adds to that table the dependency relations in the input text of the phrases corresponding to the essential cases of each predicate and action noun; and a step in which a case conversion rule output unit outputs the results obtained for the predicates and action nouns as case conversion rules.

The invention according to claim 13 is characterized in that, in claim 12, a thesaurus is further provided, and the case conversion rule learning unit searches the thesaurus to extract superordinate concepts of the words in the argument structure correct-answer table, then further creates rules in which the original word is replaced by the extracted word, and adds them to the argument structure correct-answer table.

The invention according to claim 14 is characterized in that, in claim 12, the case conversion rule learning unit learns case conversion rules by machine learning from the correct case conversions recorded in the argument structure correct-answer table.

The invention according to claim 15 is a language processing program that causes a computer to function as each of the means of the language processing apparatus according to any one of claims 1 to 7.

The invention according to claim 16 is a computer-readable recording medium on which the language processing program according to claim 15 is recorded.

According to the present invention, the argument structure for the base form of each predicate or action noun can be output with high accuracy for input text expressed in natural language.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the basic system configuration of the language processing apparatus according to this embodiment. The language processing apparatus 1 consists of a case conversion rule table 2, which stores rules for converting a dependency state between a predicate or action noun and other words or word attributes into a case relation between the predicate or action noun and those other words, and a case conversion unit 3, which uses the dependency state of the text and the case conversion rules to convert the input text into the argument structures of its predicates and action nouns and outputs them.

The basic operation of the system shown in FIG. 1 is described below with reference to the flowchart in FIG. 2.

First, the user inputs a text and its dependency state to the language processing apparatus 1 (step 1). The dependency state may be supplied by hand or taken from the output of a program such as a parser.

Next, for each predicate or action noun in the text, the case conversion rules stored in the case conversion rule table 2 are applied according to the dependency state of the text, and the case conversion unit 3 converts the result into an argument structure (step 2). The case conversion rule table 2 may be prepared by hand in advance, or created automatically by a program implementing a technique such as machine learning.

Finally, the argument structure obtained in step 2 for each predicate or action noun is output from the case conversion unit 3, and the operation ends (step 3).
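Steps 1 to 3 can be illustrated with a minimal Python sketch. The rule keys used here (the function word attached to the dependent and a `voice` feature) and the names `Dependency`, `CASE_RULES`, and `convert` are hypothetical choices made for illustration; the text only states that rules map a dependency state to a case relation.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    dependent: str      # head word of the dependent phrase
    function_word: str  # particle attached to the dependent ("" if none)
    predicate: str      # base form of the governing predicate
    voice: str          # "active" or "passive" (hypothetical feature)


# Hypothetical rule table: (function word, voice) -> essential surface case
# with respect to the BASE form of the predicate.
CASE_RULES = {
    ("が", "active"): "ga",
    ("は", "active"): "ga",
    ("を", "active"): "wo",
    ("に", "active"): "ni",
    ("が", "passive"): "wo",  # passive subject is the object of the base form
    ("に", "passive"): "ga",  # passive agent is the subject of the base form
}


def convert(deps):
    """Step 2: apply the rule table to every dependency (step 1 input).

    Returns {predicate: {case: word}}, which step 3 would output.
    Dependencies that match no rule are simply left out.
    """
    structures = {}
    for d in deps:
        case = CASE_RULES.get((d.function_word, d.voice))
        if case is not None:
            structures.setdefault(d.predicate, {})[case] = d.dependent
    return structures
```

Under these assumed rules, a passive input such as カレーが首相に食べられた yields a structure for 食べる in which 首相 fills the ga-case and カレー the wo-case, which is the normalization to base-form cases that the text describes.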

FIG. 3 is a block diagram showing the basic system configuration of the language processing apparatus 10 according to this embodiment.

The language processing apparatus 10 has a parsing unit 101, a proper noun concatenation unit 102, a head identification unit 103, an inter-head dependency analysis unit 104, a non-head predicate table 124, a non-head predicate storage unit 125, a case conversion unit 126, an anaphora resolution unit 127, a modifying phrase determination unit 128, an argument structure output unit 110, a parse result table 120, a head table 121, a thesaurus 122, a case conversion rule table 106, a predicate-argument table 129, and an argument structure table 130.

Next, the operation of a concrete language processing apparatus using the system configuration of FIG. 3 is described with reference to the flowchart in FIG. 4.

First, the analysis-target text, written in natural language, is input to the language processing apparatus 10 (step 1).

Next, the parsing unit 101 parses the text, analyzes the set of phrases and the dependency relations between phrases, and stores the result in the parse result table 120 (step 2).

For example, when the sentence "村山富市首相は妻の作ったおいしいカレーを食べた。" ("Prime Minister Tomiichi Murayama ate the delicious curry his wife made.") is parsed, the following are obtained for each word in order from the beginning of the text, as shown in FIG. 5: the word, its base form, its part of speech, its proper noun tag, and the number of the phrase to which the word belongs.

Phrase numbers are integers greater than or equal to 0, assigned as 0, 1, 2, ... in order from the first phrase of the text. The proper noun tag is assigned when a word is analyzed as a proper noun or as part of one. For example, B-PERSON marks the first word of a personal name, I-PERSON marks a word of a personal name other than the first, and "O" marks a noun that is not a proper noun.

As shown in FIG. 6, the parsing unit 101 also outputs pairs consisting of a phrase number and the number of the phrase that phrase depends on (the governing phrase number). When a phrase has no governor, for example at the end of a sentence, the governing phrase number is set to -1. This output is stored in the parse result table 120. A standalone parser such as CaboCha (see Non-Patent Document 3) can be used as the parsing unit 101.

Non-Patent Document 3: Taku Kudo and Yuji Matsumoto, "Japanese Dependency Analysis Using Cascaded Chunking", IPSJ Journal, Vol. 43, No. 6, pp. 1834-1842, 2002.

Next, the proper noun concatenation unit 102 scans the proper noun tag column of the parse result table 120 from the top and merges the words of each proper noun into a single entry (step 3). For example, when the parse result table 120 is as shown in FIG. 5, the words "村山" (Murayama) and "富市" (Tomiichi) carry the proper noun tags B-PERSON and I-PERSON respectively, so the proper noun concatenation unit 102 joins them: the word column and the base form column both become "村山富市". The part-of-speech column is consolidated to "名詞-固有名詞-人名" (noun, proper noun, personal name), and the proper noun tag column is consolidated to PERSON. Word numbers are then assigned in order from the first word of the text, and the parse result table 120 is updated with the result. FIG. 7 shows an example of the updated parse result table 120.
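The merging in step 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the row layout (dicts with `word`, `base`, `pos`, `ne_tag`, `phrase` keys) and the function name are assumptions, word numbering from 0 is assumed, and the part-of-speech consolidation is simplified to keeping the first word's value.

```python
def merge_proper_nouns(rows):
    """Merge B-X / I-X tagged words into one row tagged X, then number words.

    rows: list of dicts with keys word, base, pos, ne_tag, phrase,
    in text order. The input list is not modified.
    """
    merged = []
    for row in rows:
        tag = row["ne_tag"]
        if tag.startswith("I-") and merged and merged[-1]["ne_tag"] == tag[2:]:
            # Continuation of the proper noun started by the previous row.
            merged[-1]["word"] += row["word"]
            merged[-1]["base"] += row["base"]
        elif tag.startswith("B-"):
            # First word of a proper noun: strip the "B-" prefix.
            merged.append({**row, "ne_tag": tag[2:]})
        else:
            merged.append(dict(row))
    # Assign word numbers from the start of the text (0-based by assumption).
    for number, row in enumerate(merged):
        row["word_no"] = number
    return merged
```

An I- tag that does not follow a matching proper noun is left as it is, which keeps malformed tagger output visible instead of silently merging it.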

Next, the head identification unit 103 identifies the semantic head of each phrase (step 4). In each phrase, among the words that the parser has judged to be content words (independent words), that is, nouns, verbs, and adjectives, the word located last in the phrase is extracted as the semantic head. The head table 121 is created by combining that word with its word number and the number of the phrase to which it belongs. FIG. 8 shows an example of the head table 121.
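Step 4 amounts to picking the last content word of each phrase. The sketch below assumes rows in text order with `word_no`, `word`, `pos`, and `phrase` keys, and approximates "content word" by a part-of-speech prefix test; a real parser's own content-word judgment (for instance, excluding dependent nouns) would replace that test.

```python
# Part-of-speech prefixes treated as content words (a simplifying assumption).
CONTENT_POS = ("名詞", "動詞", "形容詞")


def build_head_table(rows):
    """Return one entry per phrase: the last content word is the semantic head."""
    heads = {}
    for row in rows:  # rows are in text order, so later words overwrite earlier ones
        if row["pos"].startswith(CONTENT_POS):
            heads[row["phrase"]] = {
                "phrase": row["phrase"],
                "word_no": row["word_no"],
                "head": row["word"],
            }
    return [heads[p] for p in sorted(heads)]
```

Phrases that contain no content word produce no entry in this sketch.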

Next, for each phrase number in the head table 121, the inter-head dependency analysis unit 104 looks up the corresponding governing phrase number in the governing phrase number table and adds it to the head table 121 (step 5). FIG. 9 shows an example of the resulting head table 121.

The inter-head dependency analysis unit 104 also adds to the head table 121, for each head, all of the non-independent words that follow the head within its phrase, as its function words (step 6). Any trailing punctuation is excluded from the function words. FIG. 10 shows an example of the resulting head table 121.
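Steps 5 and 6 can be sketched together as one pass over the head table. The function and field names, the `dep_of` mapping (phrase number to governing phrase number, -1 for none), the punctuation set, and the use of an empty string when a head has no function words are all assumptions made for this illustration.

```python
PUNCTUATION = {"。", "、"}  # trailing marks excluded from the function words


def add_governor_and_function_words(head_table, rows, dep_of):
    """Extend each head entry with its governing phrase and function words."""
    words_in = {}
    for row in rows:
        words_in.setdefault(row["phrase"], []).append(row)

    extended = []
    for entry in head_table:
        # Step 6: everything after the head inside the same phrase.
        tail = [r["word"] for r in words_in[entry["phrase"]]
                if r["word_no"] > entry["word_no"]]
        while tail and tail[-1] in PUNCTUATION:
            tail.pop()
        extended.append({
            **entry,
            "governor": dep_of[entry["phrase"]],  # step 5
            "function_words": "".join(tail),
        })
    return extended
```

For the phrase "食べた。" at the end of a sentence, this gives a governor of -1 and the function word "た", with the full stop dropped.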

Next, in step 7, the non-head predicate storage unit 125 stores in the non-head predicate table 124 the sequence of predicates or action nouns in each phrase other than the head. For example, suppose that the sentence "監督夫人の作った優勝記念ケーキを選手達が食べた。" ("The players ate the victory commemoration cake made by the manager's wife.") was input in step 1. Suppose that the parse result of step 2 is as shown in FIG. 11, that the parsing unit 101 has output the pairs of phrase number and governing phrase number shown in FIG. 12, and that the head table 121 created up to step 6 has been updated as shown in FIG. 13.

When a phrase contains predicates or action nouns located before its head, they are stored in the order in which they occur. For example, the head "ケーキ" (cake) of phrase number 2 is preceded by the action nouns "優勝" (victory) and "記念" (commemoration), so these are stored in that order in the non-head predicate table 124, as shown in FIG. 14. When no predicate or action noun precedes the head within the phrase, as with the head "作っ" (made) of phrase number 1, "-" is stored.
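Step 7 can be sketched as below. How action nouns are recognized is not specified at this point in the text; the sketch assumes an IPA-style part-of-speech tag in which action nouns are tagged "名詞-サ変接続", and the function names are hypothetical. An empty list in the result corresponds to the "-" entry.

```python
def is_predicate_like(pos):
    """Verbs, adjectives, and (by assumption) suru-verb nouns count."""
    return pos.startswith(("動詞", "形容詞", "名詞-サ変接続"))


def non_head_predicates(rows, head_table):
    """Map each phrase number to the predicates/action nouns before its head."""
    head_no = {e["phrase"]: e["word_no"] for e in head_table}
    table = {phrase: [] for phrase in head_no}
    for row in rows:  # text order is preserved in each list
        phrase = row["phrase"]
        if (phrase in head_no
                and row["word_no"] < head_no[phrase]
                and is_predicate_like(row["pos"])):
            table[phrase].append(row["word"])
    return table


def render(words):
    """Format one table cell the way the text describes it."""
    return " ".join(words) if words else "-"
```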

Next, the following processing is repeated for each head, in order from the beginning of the sentence (step 8). Using the head table of FIG. 13, it is checked whether the head depends on another phrase or is depended on by another word in the sentence. If step 9 finds such a dependency relation (Y), the case conversion unit 126 applies the case conversion rules (step 10). If step 9 finds no dependency relation (N), the anaphora resolution unit 127 performs anaphora resolution (step 11).

The application of case conversion rules in step 10 is as follows. First, for each head in the head table of FIG. 13 that is a predicate or action noun, the predicate-argument table 129 is created using the head table 121 (FIG. 13), the thesaurus 122, and a verbal semantic attribute system, by attaching: the other heads in a dependency relation with it, together with their semantic categories and word numbers; the function words that follow the head of each phrase containing such a head, together with the phrase number; and the verbal semantic attribute of the predicate or action noun. FIG. 15 shows an example of the predicate-argument table 129 obtained from the table in FIG. 13. Using FIG. 15, the optimal conversion rule is selected for each distinct head.

For example, suppose that, as in FIG. 16, a dictionary gives, for each specific predicate or action noun, the weight of converting a word or semantic category together with its dependency relation into an essential surface case. Then, for the words and dependency states that appear in the predicate-argument table 129, the combination of conversion rules that yields the largest evaluation value of the weights (for example, their linear sum) is applied.
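The selection of the best rule combination can be sketched as an exhaustive search over one candidate case per argument, scored by the linear sum of weights mentioned in the text. The input layout, the function name, and the constraint that each surface case is filled at most once are assumptions for illustration; the weights below are made up.

```python
from itertools import product


def best_rule_combination(candidates):
    """Pick one case per argument so that the summed weight is maximal.

    candidates: {argument: {case: weight}} for a single predicate.
    Returns (assignment, score); assignment is None if no combination
    satisfies the one-filler-per-case constraint (an assumption here).
    """
    arguments = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[a].items() for a in arguments)):
        cases = [case for case, _ in combo]
        if len(set(cases)) != len(cases):
            continue  # two arguments competing for the same case
        score = sum(weight for _, weight in combo)  # linear sum of weights
        if score > best_score:
            best, best_score = dict(zip(arguments, cases)), score
    return best, best_score
```

Exhaustive search is adequate here because a predicate rarely has more than a handful of arguments and three or so candidate cases each.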

If no rule for the specific predicate or action noun, such as those in FIG. 16, applies, conversions defined for the verbal semantic attribute of the predicate or action noun are considered (FIG. 17). In the same way, evaluation values are computed with the given evaluation function, and the combination of conversion rules with the highest evaluation value is applied.

If still no conversion applies, conversion rules for predicates or action nouns in general are considered (FIG. 17). In the same way, evaluation values are computed with the given evaluation function, and the combination of conversion rules with the highest evaluation value is applied. In this reference example, assume that the argument structure table 130 shown in FIG. 18 is obtained as a result. Processing then proceeds to step 11, in which the anaphora resolution unit 127 performs anaphora resolution.
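The three-level back-off described here (rules for the specific predicate, then for its verbal semantic attribute, then for predicates in general) can be sketched as a simple ordered lookup. The table layout, the level names, the "*" key for general rules, and the example attribute label are hypothetical.

```python
def lookup_rules(predicate, attribute, tables):
    """Return (level, rules) from the most specific level that has rules.

    tables: {"lexical": {predicate: rules},
             "attribute": {semantic attribute: rules},
             "general": {"*": rules}}
    """
    for level, key in (("lexical", predicate),
                       ("attribute", attribute),
                       ("general", "*")):
        rules = tables.get(level, {}).get(key)
        if rules:
            return level, rules
    return None, {}
```

The rules returned at whichever level matches would then be scored with the same evaluation function as before.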

In step 11, anaphora resolution is performed for those parts of the predicates and action nouns whose argument structure has not been determined by the processing so far, and the argument structure is determined. The predicates and action nouns that had no dependency relation and were stored in the non-head predicate table, namely 「監督」 (director), 「優勝」 (victory), and 「記念」 (commemoration), are also added to the argument structure table 130 shown in FIG. 18, and the table is assumed to be updated as shown in FIG. 19. For the ga-case, wo-case, and ni-case slots that are not filled with a word and are marked "-", the anaphora analysis unit 127 performs anaphora resolution, and each slot whose antecedent can be determined is filled with that antecedent word.

Various methods of anaphora resolution are conceivable. Here, the antecedent is taken to be the word with the highest "topic score" among the words appearing in the same sentence and in sentences in the same context. The topic score is computed from syntactic and semantic viewpoints, word frequency, and so on: when the whole text is summarized at various summarization rates, a word that appears even in summaries with a high summarization rate receives a high score, and a word that appears only in summaries with a low summarization rate receives a low score.

If a slot is still not filled after the processing so far, and the ga case is assumed to be always obligatory, the ga-case slot is filled with "exog", which denotes exophora (reference to something outside the text). The argument structure table 130 is then assumed to be as shown in FIG. 20.
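The slot-filling behavior of step 11 can be illustrated with a short sketch. The text does not specify how antecedent candidates are restricted per slot or how the topic score is computed, so the per-slot candidate lists and the score dictionary below are hypothetical inputs; only the "highest topic score wins" choice and the "exog" fallback for an empty ga case come from the description.

```python
EMPTY = "-"

def fill_slots(row, candidates, topic_score, ga_obligatory=True):
    """Fill empty case slots of one argument structure row.

    row         : {"ga": word or "-", "wo": ..., "ni": ...}
    candidates  : {slot: [antecedent words]}  (hypothetical per-slot candidates)
    topic_score : {word: score}               (assumed to be precomputed)
    """
    filled = dict(row)
    for slot, value in row.items():
        if value != EMPTY:
            continue
        options = candidates.get(slot, [])
        if options:
            # The antecedent is the candidate with the highest topic score.
            filled[slot] = max(options, key=lambda w: topic_score.get(w, 0.0))
    # An obligatory ga case that is still empty is marked as exophora.
    if ga_obligatory and filled.get("ga", EMPTY) == EMPTY:
        filled["ga"] = "exog"
    return filled
```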

Next, the modifier determination unit 128 identifies the modifiers to output (step 12). For each head in the argument structure table 130 of FIG. 20, it decides whether to output the head as is or with modifiers attached. The frequency with which the head appears alone is compared with the frequency of the word sequence formed by attaching modifiers, and the more frequent form is adopted as the word to output.

For example, if the word 「選手達」 (players) appears more often in a general corpus than the head 「選手」 (player), 「選手」 is replaced with 「選手達」 in the output. Candidate word sequences are generated from the parse result of FIG. 11 by attaching to the head the words immediately before and after it within the same phrase, and their frequencies are computed. The word number and phrase number columns are then deleted, and the argument structure table is updated as shown in FIG. 21.
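The frequency comparison in step 12 amounts to picking the most frequent surface form among the bare head and its expansions. A minimal sketch, assuming the corpus frequencies are already available as a dictionary (the function name and the counts used in the usage note are made up for illustration):

```python
def choose_output_form(head, expansions, corpus_freq):
    """Return the head itself or the expansion with the highest corpus frequency.

    head        : the bare head word
    expansions  : candidate word sequences built by attaching neighboring words
    corpus_freq : {surface form: frequency}  (hypothetical precomputed counts)
    """
    best = head
    for form in expansions:
        # Replace only when the expanded form is strictly more frequent.
        if corpus_freq.get(form, 0) > corpus_freq.get(best, 0):
            best = form
    return best
```

With illustrative counts `{"選手": 10, "選手達": 25}`, `choose_output_form("選手", ["選手達"], ...)` selects 「選手達」; on a tie the bare head is kept.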

Finally, the resulting argument structure table is output from the argument structure output unit 110 as the argument structure analysis result for the predicates and action nouns (step 13).

In this embodiment, the output argument structure concerns obligatory surface cases. However, argument structures for optional surface cases or for deep cases can be obtained by the same means if the case conversion rule table is prepared accordingly.

When the text to be processed is in a foreign language such as English, analysis results can likewise be obtained with this method by using conversion rules in which, instead of the attached words of a dependency relation, parts of speech other than verbs and nouns, or phrase-structure relations such as the noun phrases and verb phrases formed between words, serve as the dependency relation.

FIG. 23 is a block diagram showing the basic system configuration of a dictionary creation device used with the language processing device described above. The dictionary creation device 11 has a case conversion rule learning unit 12.

Next, the basic operation of the system configuration shown in FIG. 23 is described with reference to the flowchart shown in FIG. 24.

First, the user inputs a text and the correct argument structure of that text to the case conversion rule learning unit 12 of the dictionary creation device 11 (step 1).

Next, case conversion rules that convert the dependency state between a predicate or action noun in the text and the head of each phrase in the text into a predicate-argument structure are created using machine learning (step 2). Finally, the case conversion rules obtained in step 2 are output from the case conversion rule learning unit 12, and the operation ends (step 3).

FIG. 25 is a block diagram showing the system configuration of a dictionary creation device 100, a concrete form of the dictionary creation device 11. Parts identical to those of the language processing device shown in FIG. 3 bear the same reference numerals. The dictionary creation device 100 has a syntactic parsing unit 101, a proper noun concatenation unit 102, a head identification unit 103, an inter-head dependency analysis unit 104, a case conversion rule learning unit 105, a parse result table 120, a head table 121, a thesaurus 122, a correct argument structure table 123, and a case conversion rule output unit 107.

The operation of the dictionary creation device 100 shown in FIG. 25 is described below with reference to the flowchart shown in FIG. 26.

First, training text written in natural language is input to the syntactic parsing unit 101 of the dictionary creation device 100 (step 1).

Next, the syntactic parsing unit 101 parses the text to identify the set of phrases and the dependency relations between phrases, and stores the result in the parse result table 120 (step 2). For example, when the sentence 「村山富市首相は妻の作ったおいしいカレーを食べた。」 ("Prime Minister Tomiichi Murayama ate the delicious curry his wife made.") is parsed, the word, its base form, its part of speech, its proper noun tag, and the number of the phrase to which it belongs are obtained for each word in order from the beginning of the text, as in FIG. 5. Phrase numbers are integers starting at 0 and are assigned as 0, 1, 2, ... in order from the first phrase of the text.

The proper noun tag is assigned when a word is analyzed as a proper noun or as part of one. For example, B-PERSON marks the first word of a personal name, I-PERSON marks a non-initial word of a personal name, and "O" marks a noun that is not a proper noun.

As shown in FIG. 6, the syntactic parsing unit 101 also outputs pairs consisting of a phrase number and the number of the phrase on which that phrase depends (the dependency target phrase number). When a phrase has no dependency target, for example at the end of a sentence, the dependency target phrase number is set to -1. This output is stored in the parse result table 120. A stand-alone parser such as CaboCha (Non-Patent Document 3) can be used as the syntactic parsing unit 101.

Next, the proper noun concatenation unit 102 scans the proper noun tag column of the parse result table 120 from the top, merges the words of each proper noun into a single word, and assigns word numbers in order from the beginning of the text (step 3). Word numbers are integers starting at 0 and are assigned as 0, 1, 2, ... in order from the first word of the text. For example, when the parse result table 120 is as shown in FIG. 5, the words 「村山」 (Murayama) and 「富市」 (Tomiichi) carry the proper noun tags "B-PERSON" and "I-PERSON" respectively, so the proper noun concatenation unit 102 concatenates the two words and sets both the word field and the base form field to 「村山富市」. The part-of-speech fields are merged into 「名詞−固有名詞−人名」 (noun, proper noun, personal name), and the proper noun tag fields are merged into "PERSON". The parse result table 120 is updated with the result. FIG. 7 shows an example of the updated parse result table 120.
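The merging in step 3 follows the usual B-/I- tagging convention. The sketch below is an illustration under assumptions: the token tuple layout and the function name are hypothetical, and the base form and part-of-speech fields are omitted for brevity.

```python
def merge_proper_nouns(tokens):
    """Merge B-X / I-X token runs into single words and number the words.

    tokens: list of (surface, proper_noun_tag, phrase_no) in text order
            (hypothetical layout of the relevant parse result columns).
    """
    merged = []
    for surface, tag, phrase_no in tokens:
        continues = (tag.startswith("I-") and merged
                     and merged[-1]["tag"] == tag[2:])
        if continues:
            # Non-initial word of the same proper noun: append to the last word.
            merged[-1]["surface"] += surface
        else:
            label = tag[2:] if tag[:2] in ("B-", "I-") else tag
            merged.append({"surface": surface, "tag": label, "phrase": phrase_no})
    # Word numbers start at 0 and follow text order.
    for word_no, word in enumerate(merged):
        word["word_no"] = word_no
    return merged
```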

Next, the head identification unit 103 identifies the semantic head of each phrase (step 4). In each phrase, among the words that the parser has judged to be independent words, that is, nouns and those verbs and adjectives judged to be independent, the word positioned last in the phrase is extracted as the semantic head. The head table 121 is created from this head together with its word number and the number of the phrase to which it belongs. FIG. 8 shows an example of the head table 121.
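Head identification reduces to keeping the last independent content word of each phrase. A minimal sketch, assuming each word record carries a part-of-speech string with the major category first and a flag set by the parser for independent words (the record layout and the part-of-speech labels are assumptions):

```python
CONTENT_POS = ("名詞", "動詞", "形容詞")  # noun, verb, adjective

def find_heads(words):
    """Build head table rows from word records given in text order.

    words: dicts with keys word_no, surface, pos, phrase, independent.
    """
    last = {}
    for w in words:
        if w["independent"] and w["pos"].startswith(CONTENT_POS):
            last[w["phrase"]] = w  # a later word in the same phrase overwrites
    return [{"head": w["surface"], "word_no": w["word_no"], "phrase": p}
            for p, w in sorted(last.items())]
```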

Next, the inter-head dependency analysis unit 104 looks up, for each phrase number in the phrase number column of the head table 121, the dependency target phrase number recorded for that phrase number in the dependency target phrase number table, and adds it to the head table 121 (step 5). FIG. 9 shows an example of the head table 121 obtained from the dependency target phrase number table of FIG. 6 and the head table 121 of FIG. 8.

For each head, the inter-head dependency analysis unit 104 also adds to the head table 121, as the attached words, the entire sequence of non-independent words that follow the head within its phrase (step 6). If this sequence ends with a punctuation mark, the punctuation mark is removed before the sequence is stored. FIG. 10 shows an example of the resulting head table 121.
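The attached-word extraction of step 6 can be sketched as below. The punctuation set and the function signature are assumptions; the input is the list of surface forms of one phrase and the position of its head.

```python
PUNCTUATION = {"、", "。", "，", "．"}  # assumed set of punctuation marks

def attached_words(phrase_words, head_index):
    """Return the words following the head in its phrase, minus final punctuation."""
    tail = list(phrase_words[head_index + 1:])
    while tail and tail[-1] in PUNCTUATION:
        tail.pop()  # drop trailing punctuation marks
    return "".join(tail)
```

For the final phrase 「食べた。」 split as `["食べ", "た", "。"]` with the head at index 0, this yields 「た」.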

Next, in step 7, the case conversion rule learning unit 105 learns case conversion rules. As training data, suppose that the correct argument structure for the training text 「村山富市首相は、妻の作ったおいしいカレーを食べた。」 ("Prime Minister Tomiichi Murayama ate the delicious curry his wife made."), in which the obligatory surface cases for the base form of each predicate or action noun have been annotated by hand, is given as in FIG. 27.

Here, a span enclosed by the tags <NP ID=number> and </NP> is a noun phrase, and a span enclosed by <PRED ~> and </PRED> is a predicate. Action nouns, none of which appear in this text, are enclosed by <EVENT ~> and </EVENT>. Attributes written in the "~" part of a <PRED ~> or <EVENT ~> tag, such as ga="2" wo="3", mean that, for the base form of this predicate or action noun, the correct ga-case argument is the noun phrase with ID 2, the correct wo-case argument is the noun phrase with ID 3, and so on. From this data, the case conversion rule learning unit 125 uses the training text, the correct argument structure, and the parse result table 120 to create the correct argument structure table 123, which stores, for each predicate enclosed by <PRED ~> and </PRED> and each action noun enclosed by <EVENT ~> and </EVENT>, the obligatory surface cases for its base form and the number of the phrase to which each word belongs. FIG. 28 shows an example of the correct argument structure table 123. For simplicity, only the ga, wo, and ni cases are treated as obligatory surface cases here, but other cases can be processed in the same way.

Next, the case conversion rule learning unit 105 replaces each word in the correct argument structure table 123 with the head of the phrase having the corresponding phrase number, and also stores the word number of that head. If the head is not a substring of the word it replaces, the word field and the phrase number are replaced with "-".

If a predicate or action noun is not itself the head of a phrase, it is deleted from the correct argument structure table 123. FIG. 29 shows an example of the correct argument structure table 123 after this update.

Next, the case conversion rule learning unit 105 uses the information in the head table 121 as shown in FIG. 10 to add the dependency relation information of the original text to the correct argument structure table 123. FIG. 30 shows an example of the correct argument structure table 123 after this update. Here, the dependency relation is the attached word of the argument when the argument depends on the predicate or action noun, and the attached word of the predicate or action noun with "（逆）" (reverse) appended when the predicate or action noun depends on the argument.
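The dependency relation label defined above can be derived from two head table rows. A minimal sketch, assuming each row holds its phrase number, its dependency target phrase number, and its attached words (the key names are hypothetical):

```python
REVERSE_MARK = "（逆）"  # "reverse"

def dependency_relation(arg, pred):
    """Return the relation label between an argument and a predicate, or None.

    arg, pred: head table rows with keys "phrase", "dep", "attached".
    """
    if arg["dep"] == pred["phrase"]:
        # The argument depends on the predicate: use the argument's attached words.
        return arg["attached"]
    if pred["dep"] == arg["phrase"]:
        # The predicate depends on the argument: mark the relation as reversed.
        return pred["attached"] + REVERSE_MARK
    return None
```

In the example sentence, 「妻」 depends on 「作る」 through 「の」, while 「作る」 depends on 「カレー」 through 「た」, which yields the labels 「の」 and 「た（逆）」 respectively.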

Next, the case conversion rule learning unit 105 assigns a semantic category to each word using the thesaurus 122, for example the Nihongo Goi-Taikei (Non-Patent Document 4). The word numbers and phrase numbers are deleted. The word field of each predicate or action noun is converted to the base form of the word using the parse result table 120.

Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, Yoshihiko Hayashi, Nihongo Goi-Taikei (日本語語彙大系). Iwanami Shoten (1997).

FIG. 31 shows an example of the correct argument structure table 123 after this update. A large number of such tables are collected, and generally valid case conversion rules are learned from them with a machine learning method. As a simple example, the operation of the case conversion rule learning unit 105 is described for a correct argument structure table 123 with only five entries, as shown in FIG. 32. Case conversion rules are learned separately for each predicate or action noun.

First, the case conversion rule learning unit 105 generates training data specific to the predicate 「作る」 (make) from rows 1 to 3 of the correct argument structure table 123, in which the "predicate or action noun" field is 「作る」.

The case conversion rule learning unit 105 extracts the structures before case conversion from rows 1 to 3 of the correct argument structure table 123 and uses them as the pre-conversion features for machine learning. For example, pre-conversion feature 1 is "word: 妻 (wife), relation: の", feature 2 is "word: カレー (curry), relation: た（逆）", feature 3 is "word: 私 (I), relation: が", feature 4 is "word: シチュー (stew), relation: た（逆）", feature 5 is "word: 会社 (company), relation: が", and feature 6 is "word: 規則 (rule), relation: た（逆）". Features based on semantic categories, such as "semantic category: female, relation: の", or on a higher-level semantic category, such as "semantic category: entity, relation: の", can also be used as pre-conversion features, but for simplicity only the six features above are used here.

Next, the case conversion rule learning unit 105 creates a pre-conversion structure vector $x_i$ ($i = 1, \dots, 3$) for each of rows 1 to 3 of the correct argument structure table 123. For each of features 1 to 6, the pre-conversion vector has the element 1 if that feature appears in the row of the correct argument structure table and 0 if it does not. For example, only features 1 and 2 appear in row 1 of the correct argument structure table 123, so $x_1 = (1, 1, 0, 0, 0, 0)^t$ is generated ($t$ denotes transposition, here and below). Only features 3 and 4 appear in row 2, so $x_2 = (0, 0, 1, 1, 0, 0)^t$ is generated. $x_3$ is generated in the same way.

Next, the case conversion rule learning unit 105 extracts the structures after case conversion from rows 1 to 3 of the correct argument structure table 123 and uses them as the post-conversion features for machine learning. For example, post-conversion feature 1 is "the first word to appear" and post-conversion feature 2 is "the second word to appear". Each feature takes the value 1 if, after conversion, the word takes the ga case as its surface case with respect to the base form, 2 if it takes the wo case, 3 if it takes the ni case, and 4 if it takes any other case.

Next, the case conversion rule learning unit 105 creates a post-conversion structure vector $y_i$ ($i = 1, \dots, 3$) for each of rows 1 to 3 of the correct argument structure table 123. For example, in row 1 the first word to appear, 「妻」 (wife), is converted to the ga case and the second word to appear, 「カレー」 (curry), is converted to the wo case, so $y_1 = (1, 2)^t$ is generated. In the same way, $y_2 = (1, 2)^t$ and $y_3 = (1, 2)^t$ are generated for rows 2 and 3.

Hereinafter, the conversion from the pre-conversion structure vector $x_i$ to the post-conversion structure vector $y_i$ is called "case conversion rule $i$".
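The construction of the vector pairs for the predicate 「作る」 can be reproduced with a few lines. The six features and the case codes (ga = 1, wo = 2, ni = 3, other = 4) are taken from the description; the function names and the romanized case keys are assumptions made for this sketch.

```python
# The six pre-conversion features (word, relation) listed in the description.
FEATURES = [("妻", "の"), ("カレー", "た（逆）"), ("私", "が"),
            ("シチュー", "た（逆）"), ("会社", "が"), ("規則", "た（逆）")]
CASE_CODE = {"ga": 1, "wo": 2, "ni": 3}  # any other case is coded as 4

def pre_conversion_vector(row, features=FEATURES):
    """1 where the feature occurs in the table row, 0 elsewhere."""
    present = set(row)
    return [1 if f in present else 0 for f in features]

def post_conversion_vector(cases):
    """Case code of each argument, in order of appearance."""
    return [CASE_CODE.get(c, 4) for c in cases]

def make_rule(row, cases):
    """Case conversion rule i as the pair (x_i, y_i)."""
    return pre_conversion_vector(row), post_conversion_vector(cases)
```

For row 1 (features 1 and 2, converted to ga and wo) this gives the pair `([1, 1, 0, 0, 0, 0], [1, 2])`.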

Next, the case conversion rule learning unit 105 performs statistical machine learning on these case conversion rules and assigns appropriate weights to them. As one example, learning with the method called SVMstruct is described below, specifically with the machine learning method called $\mathrm{SVM}^{\Delta s}_2$ in Non-Patent Document 5.

Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, Yasemin Altun, "Support Vector Machine Learning for Interdependent and Structured Output Spaces", Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004).

$\mathrm{SVM}^{\Delta s}_2$ finds the weight vector $w$ and the slack variable vector $\xi$ that minimize the expression of Equation 2 while satisfying the inequality of Equation 1.

Here, $i$ is the same index as in $x_i$ and $y_i$ above, $x_i$ is the pre-conversion structure vector described above, and $y_i$ is the correct post-conversion structure vector described above. $Y$ is the set of all post-conversion structure vectors that can result from converting $x_i$, and $\tilde{y}_{i,k}$ is the $k$-th incorrect structure, that is, the $k$-th possible post-conversion structure vector that differs from the correct structure $y_i$ ($k$ is an index). $\delta\psi(x_i, \tilde{y}_{i,k})$ is the difference between $\psi(x_i, y_i)$ and $\psi(x_i, \tilde{y}_{i,k})$, as defined by Equation 3. $\psi(x_i, y_i)$ is a vector generated from the pre-conversion vector $x_i$ and the post-conversion vector $y_i$, each element of which is the logical AND of a feature represented by an element of $x_i$ and a feature represented by an element of $y_i$. FIG. 20 shows the vector $\psi(x_1, y_1)$ for the above $x_1$ and $y_1$ as an example of $\psi(x_i, y_i)$. $\xi_i$ is the $i$-th element of the slack variable vector $\xi$, the slack variable for the pre-conversion structure vector $x_i$, and takes a real value of 0 or more. $\Delta(y_i, \tilde{y}_{i,k})$ is a loss function that expresses how much the incorrect structure $\tilde{y}_{i,k}$ differs from the correct post-conversion structure $y_i$; in this embodiment it is the number of case elements that do not match between $y_i$ and $\tilde{y}_{i,k}$. $C$ is a parameter that controls the trade-off between classification errors over the whole training data and margin maximization; it is a positive real value supplied by the user. $n$ is the maximum value of the index $i$, that is, the number of training examples, and $\langle A, B \rangle$ denotes the inner product of vectors $A$ and $B$.
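Equations 1 to 3 are referenced but not reproduced in this text. As a hedged reconstruction, the following shows the form they take under the slack-rescaling, quadratic-slack formulation of Non-Patent Document 5, written with the symbols defined above. It is an assumption that the patent's equations match this form exactly.

```latex
% Equation 1 (assumed form): margin constraints with slack rescaling
\forall i,\ \forall \tilde{y}_{i,k} \in Y \setminus \{y_i\}:\quad
  \langle w,\ \delta\psi(x_i, \tilde{y}_{i,k}) \rangle
  \;\ge\; 1 - \frac{\xi_i}{\sqrt{\Delta(y_i, \tilde{y}_{i,k})}}

% Equation 2 (assumed form): objective with quadratic slack penalty
\min_{w,\ \xi}\quad \frac{1}{2}\,\lVert w \rVert^{2} + \frac{C}{2n} \sum_{i=1}^{n} \xi_i^{2}

% Equation 3: difference of joint feature vectors, as stated in the text
\delta\psi(x_i, \tilde{y}_{i,k}) = \psi(x_i, y_i) - \psi(x_i, \tilde{y}_{i,k})
```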

$\mathrm{SVM}^{\Delta s}_2$ solves the optimization problem of minimizing the expression of Equation 2 subject to the inequality of Equation 1. It does not solve this problem directly, but first converts it into its dual problem, Equation 4, and solves that.

Here, $\alpha_{i,k}$ is the Lagrange multiplier corresponding to $\tilde{y}_{i,k}$, and $\delta_{ii'}$ is a function that takes the value 1 when $i = i'$ and 0 when $i \neq i'$. The vector inner products in Equation 4 can be replaced with a kernel function $K$ defined as in Equation 5.

In this embodiment, the second-order polynomial kernel $K(A, B) = (\langle A, B \rangle + 1)^2$ is used as the kernel function, where $A$ and $B$ are vectors of the same dimension.
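The loss function, the joint feature vector, and the kernel can be sketched as follows. The loss (number of mismatched case elements) and the kernel follow the description directly. The layout of $\psi$ is an assumption: one indicator per (input feature, output position, case code) triple, which is one way to realize "the logical AND of a feature of $x_i$ and a feature of $y_i$"; the actual layout in the patent's figure may differ.

```python
def loss(y_true, y_pred):
    """Number of case elements that differ between two structures."""
    return sum(1 for a, b in zip(y_true, y_pred) if a != b)

def joint_features(x, y, n_cases=4):
    """Assumed psi(x, y): AND of each x feature with each (position, case) of y."""
    psi = []
    for x_j in x:
        for y_p in y:
            for code in range(1, n_cases + 1):
                psi.append(1 if x_j == 1 and y_p == code else 0)
    return psi

def poly_kernel(a, b):
    """Second-order polynomial kernel K(A, B) = (<A, B> + 1)^2."""
    return (sum(p * q for p, q in zip(a, b)) + 1) ** 2
```

With $x_1 = (1,1,0,0,0,0)$ and $y_1 = (1,2)$, the assumed $\psi(x_1, y_1)$ has 48 elements, of which 4 are 1.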

Similarly, the case conversion rule learning unit 105 generates training data specific to the predicate 「食べる」 (eat) from rows 4 and 5 of the correct argument structure table 123, in which the "predicate or action noun" field is 「食べる」, learns weights for the generated case conversion rules by machine learning, and assigns appropriate weights to the rules. In this example, case conversion rules were generated and weights were learned only for individual predicates or action nouns, but rules are also generated and weights learned at the level of each predicate semantic attribute and at the level of all predicates and action nouns. A predicate semantic attribute is an attribute such as "physical movement", "bodily action", or "emotional action" that has been assigned by hand to a predicate (a verb, an adjective, and so on) on the basis of its meaning. It can be taken from, for example, the "predicate semantic attribute system" (Non-Patent Document 6).

Hiromi Nakaiwa, Satoru Ikehara, "Classification of Japanese predicate semantic attributes focusing on Japanese-English syntactic correspondences" (日英の構文的対応関係に着目した日本語用言意味属性の分類), Transactions of Information Processing Society of Japan, Vol. 38, No. 2, pp. 215-225, 1997.

FIG. 37 shows an example of the case conversion rules and weights obtained by machine learning.

Finally, the resulting table is output from the case conversion rule output unit 107 as the analysis result, yielding the case conversion rule table (dictionary) (step 8).

In this concrete example, the output argument structure concerns obligatory surface cases. However, argument structures for optional surface cases or for deep cases can be handled by the same means if the correct argument structure supplied as input is prepared accordingly.

As for the learning means, any learning method other than the machine learning method described above can be used as long as it can learn the weights of the case conversion rules.

When the text to be handled is in a foreign language such as English, this learning method can likewise be used by taking, instead of the attached words of a dependency relation, parts of speech other than verbs and nouns, or phrase-structure relations such as the noun phrases and verb phrases formed between words, as the dependency relation.

It goes without saying that some or all of the functions of the units in the system configuration diagrams of FIGS. 1 and 3 can be implemented as a computer program and executed on a computer to realize the present invention, and that the steps shown in FIGS. 2 and 4 can likewise be implemented as a program and executed by a computer. A program for realizing those functions on a computer, or for causing a computer to execute those processing steps, can be recorded on a computer-readable recording medium such as a flexible disk, MO, ROM, memory card, CD, DVD, or removable disk, and can be stored or distributed in that form.

The program can also be provided over a network, for example via the Internet or electronic mail. The present invention can thus be implemented by installing on a computer a program provided on a recording medium or over a network.

Brief description of the drawings

Block diagram showing the basic system configuration of a language processing apparatus according to an embodiment of the present invention.
Flowchart for explaining the basic operation of the embodiment.
Block diagram showing the system configuration of the language processing apparatus according to the embodiment.
Flowchart for explaining the operation of the language processing apparatus.
Diagram of a syntactic analysis result, for explaining the operation of the language processing apparatus.
Diagram of the dependency analysis result within the syntactic analysis result, for explaining the operation of the language processing apparatus.
Diagram of the update of the syntactic analysis result, for explaining the operation of the language processing apparatus.
Diagram of the head table, for explaining the operation of the language processing apparatus.
Diagram of the head table with destination phrase numbers added, for explaining the operation of the language processing apparatus.
Diagram of the head table with function words added, for explaining the operation of the language processing apparatus.
Diagram of the syntactic analysis result for the analysis target text, for explaining the operation of the language processing apparatus.
Diagram of the dependency analysis result within the syntactic analysis result for the analysis target text, for explaining the operation of the language processing apparatus.
Diagram of the head table with destination phrase numbers and function words added, for explaining the operation of the language processing apparatus.
Diagram of the non-head predicate table, for explaining the operation of the language processing apparatus.
Diagram of the predicate-argument table for the analysis target text, for explaining the operation of the language processing apparatus.
Diagram of the case conversion rule weights (dictionary) for the predicates and action nouns of the analysis target text, for explaining the operation of the language processing apparatus.
Diagram of the case conversion rule weights (dictionary) for the predicate semantic attributes of the predicates and action nouns of the analysis target text and for predicates and action nouns in general, for explaining the operation of the language processing apparatus.
Diagram of the argument structure table obtained by applying the case conversion rules to the predicates and action nouns of the analysis target text and performing case conversion, for explaining the operation of the language processing apparatus.
Diagram of the argument structure table to which predicates and action nouns that had no dependency relation have also been added, for explaining the operation of the language processing apparatus.
Diagram of the argument structure table after anaphora resolution processing, for explaining the operation of the language processing apparatus.
Diagram of the argument structure table after modifier determination processing, for explaining the operation of the language processing apparatus.
Diagram of the argument structure table after base form determination processing, for explaining the operation of the language processing apparatus.
Block diagram showing the basic system configuration of a dictionary creation processing apparatus applied to the language processing apparatus.
Flowchart for explaining the basic operation of the dictionary creation apparatus.
Block diagram showing the system configuration of a specific dictionary creation apparatus.
Flowchart for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of correct predicate-argument structures in which case relations have been manually assigned to a training text, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the argument structure correct-answer table, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the argument structure correct-answer table in which words have been replaced by heads, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the argument structure correct-answer table with dependency relation information added, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the argument structure correct-answer table with semantic categories assigned, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the argument structure correct-answer table given to the case conversion rule learning unit, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the thesaurus, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of generated case conversion rules, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of generated case conversion rules, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of a case conversion rule vector, for explaining the operation of the dictionary creation apparatus.
Diagram showing an example of the case conversion rules and weights obtained by machine learning, for explaining the operation of the dictionary creation apparatus.

Explanation of symbols

1 ... Language processing apparatus
2 ... Case conversion rule table
3 ... Case conversion unit
10 ... Language processing apparatus
101 ... Syntactic analysis unit
102 ... Proper noun concatenation unit
103 ... Head identification unit
104 ... Inter-head dependency analysis unit
106 ... Case conversion rule table
110 ... Argument structure output unit
120 ... Syntactic analysis result table
121 ... Head table
122 ... Thesaurus
124 ... Non-head predicate table
125 ... Non-head predicate storage unit
126 ... Case conversion unit
127 ... Anaphora resolution unit
128 ... Modifier determination unit
129 ... Predicate-argument table
130 ... Argument structure table

Claims

A language processing apparatus that outputs an argument structure for a predicate or an action noun in an input text, the apparatus comprising:
case conversion rule storage means for storing rules that convert a dependency state between a predicate or action noun and other words, or word attributes, into a case relation between the predicate or action noun and the other words; and
case conversion means for applying the rules for converting to the case relation, stored in the case conversion rule storage means, to the dependency state of the text, converting the input text into argument structures of predicates and action nouns, and outputting the result.

The language processing apparatus according to claim 1, wherein the rules for converting to the case relation are generated by machine learning from text annotated with correct case relations and dependency states.

The language processing apparatus according to claim 1, wherein the dependency state is, for the semantic heads of phrases, information on the head of the dependent phrase and the head of the phrase it depends on.

The language processing apparatus according to claim 1, wherein the dependency state is, for the semantic heads of phrases, information on: the source and destination heads and the function words attached to the semantic heads; the part of speech of the semantic head or the semantic category of the semantic head; the semantic attribute of the predicate or action noun; modifiers; words appearing in the same text; or the genre or category to which the text belongs.

The language processing apparatus according to claim 1, wherein the rules for converting to the case relation are generated by:
a syntactic analysis unit that parses an analysis target text, written in natural language and annotated with argument structures for its predicates or action nouns, and analyzes the set of phrases and the dependency relations between phrases;
a head identification unit that identifies a semantic head for each phrase from the words that the syntactic analysis unit determined to be independent words in the phrase, and creates a head table;
an inter-head dependency analysis unit that adds, for each phrase in the head table, the phrase on which it depends, and adds to the head table the function words that follow the head within the phrase to which the head belongs;
a case conversion rule learning unit that extracts, from the argument structures of the predicates or action nouns annotated on the text, the essential surface cases for the base forms of the predicates and action nouns together with the phrase to which each word belongs, creates an argument structure correct-answer table, converts each phrase of the argument structure correct-answer table into its head by referring to the head table, adds to the argument structure correct-answer table the dependency relation, in the input text, of the phrase corresponding to each essential case of each predicate and action noun, and learns case conversion rules using the argument structure correct-answer table; and
a case conversion rule output unit that outputs what was obtained for the predicates and action nouns as case conversion rules.

The language processing apparatus according to claim 5, further comprising a thesaurus, wherein the case conversion rule learning unit searches the thesaurus to extract superordinate concepts of the words in the argument structure correct-answer table, creates additional rules in which the original word is replaced with the extracted superordinate concept, and adds those rules to the argument structure correct-answer table.

The language processing apparatus according to claim 5, wherein the case conversion rule learning unit learns the case conversion rules by machine learning on the correct case conversion rules described in the argument structure correct-answer table.

A language processing method for outputting an argument structure for a predicate or an action noun in an input text, performed by an apparatus having case conversion rule storage means for storing rules that convert a dependency state between a predicate or action noun and other words, or word attributes, into a case relation between the predicate or action noun and the other words, and case conversion means for applying the rules stored in the case conversion rule storage means to the dependency state of the text, converting the input text into argument structures of predicates and action nouns, and outputting the result, the method comprising:
a step of inputting a text and the dependency state of the text;
a step in which the case conversion means applies the case conversion rules stored in the case conversion rule storage means to the predicates or action nouns in the text according to the dependency state of the text, and converts them into argument structures; and
a step of outputting the argument structures obtained for the predicates or action nouns in the preceding step.

The language processing method according to claim 8, wherein the case conversion rules are generated by machine learning from text annotated with correct case relations and dependency states.

The language processing method according to claim 8 or 9, wherein the dependency state is, for the semantic heads of phrases, information on the head of the dependent phrase and the head of the phrase it depends on.

The language processing method according to claim 8 or 9, wherein the dependency state is, for the semantic heads of phrases, information on: the source and destination heads and the function words attached to the semantic heads; the part of speech of the semantic head or the semantic category of the semantic head; the semantic attribute of the predicate or action noun; modifiers; words appearing in the same text; or the genre or category to which the text belongs.

The language processing method according to claim 8, wherein the case conversion rules are generated by:
a step in which a syntactic analysis unit parses an analysis target text, written in natural language and annotated with argument structures for its predicates or action nouns, and analyzes the set of phrases and the dependency relations between phrases;
a step in which a head identification unit identifies a semantic head for each phrase from the words that the syntactic analysis unit determined to be independent words in the phrase, and creates a head table;
a step in which an inter-head dependency analysis unit adds, for each phrase in the head table, the phrase on which it depends, and adds to the head table the function words that follow the head within the phrase to which the head belongs;
a step in which a case conversion rule learning unit extracts, from the argument structures of the predicates or action nouns annotated on the text, the essential surface cases for the base forms of the predicates and action nouns together with the phrase to which each word belongs, creates an argument structure correct-answer table, converts each phrase of the argument structure correct-answer table into its head by referring to the head table, and adds to the argument structure correct-answer table the dependency relation, in the input text, of the phrase corresponding to each essential case of each predicate and action noun; and
a step in which a case conversion rule output unit outputs what was obtained for the predicates and action nouns as case conversion rules.

The language processing method according to claim 12, further using a thesaurus, wherein the case conversion rule learning unit searches the thesaurus to extract superordinate concepts of the words in the argument structure correct-answer table, creates additional rules in which the original word is replaced with the extracted superordinate concept, and adds those rules to the argument structure correct-answer table.

The language processing method according to claim 12, wherein the case conversion rule learning unit learns the case conversion rules by machine learning on the correct case conversion rules described in the argument structure correct-answer table.

A language processing program for causing a computer to function as each unit in the language processing apparatus according to claim 1.

A computer-readable recording medium on which the language processing program according to claim 15 is recorded.