JP2011198075A

JP2011198075A - Natural language analysis device, method, and program

Info

Publication number: JP2011198075A
Application number: JP2010064512A
Authority: JP
Inventors: Manabu Sassano (颯々野 学)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2010-03-19
Filing date: 2010-03-19
Publication date: 2011-10-06
Anticipated expiration: 2030-03-19
Also published as: JP5302922B2

Abstract

PROBLEM TO BE SOLVED: To provide a natural language analysis device, method, and program that determine dependency relations for each character without morphologically analyzing the sentence to be analyzed up to its end. SOLUTION: A natural language analysis device 10 acquires the characters constituting the sentence to be analyzed one character at a time and determines a dependency relation for each acquired character. While determining these relations in order from the first character of the sentence, the device 10 pushes characters whose dependency destination is still undetermined onto a pending-destination stack 107. After a dependency judgment has determined a character's destination, the device determines the dependency relations of the characters accumulated on the stack 107, thereby determining the dependencies of the characters.

Description

The present invention relates to a natural language analysis apparatus, method, and program.

Conventionally, the mainstream approach to Japanese sentence analysis has been to group the word (morpheme) sequence produced by morphological analysis into bunsetsu (phrases) and to express the analysis result as dependency relations between those bunsetsu. This approach needs three passes over the sentence: a scan from beginning to end for morphological analysis; a return to the beginning and a second scan to group the morphemes into bunsetsu; and a return to the beginning and a third scan for dependency analysis.

Among techniques for analyzing dependency relations between Japanese bunsetsu, Patent Document 1 (JP 2009-176062 A) is known; it performs bunsetsu grouping and dependency analysis in a single scan.

The technique disclosed in Patent Document 1 takes as input a sentence that has already been decomposed into morphemes and uses a stack to determine the dependency between the words (morphemes) of the morpheme sequence, that is, both the head (destination) of each word and the type of the dependency. The determined type also marks bunsetsu boundaries. By determining dependencies in this way, the technique performs bunsetsu grouping and dependency analysis at the same time, needs no separate modules for the two tasks, and speeds up processing.

Patent Document 1: JP 2009-176062 A

However, the technique disclosed in Patent Document 1 targets sentences whose end is clearly marked: it first decomposes the sentence into morphemes up to the end of the sentence and performs morphological analysis, and only then groups bunsetsu and determines the dependencies between them. Consequently, when the end of a sentence is unclear, morphological analysis is not performed and the dependencies between bunsetsu cannot be determined.

There is therefore a need for an apparatus that can determine dependency relations between bunsetsu without first decomposing the sentence into morphemes up to its end.

An object of the present invention is to provide a natural language analysis apparatus, method, and program that can determine a dependency relation for each character without morphologically analyzing the sentence to be analyzed up to its end.

The present invention provides the following solutions.

(1) A natural language analysis apparatus for analyzing natural language sentences, comprising: character acquisition means for acquiring, one character at a time, the characters constituting a sentence to be analyzed; and character dependency analysis means for determining a dependency relation for each acquired character, wherein, while determining the dependency relation of each character in order from the first character of the sentence, the character dependency analysis means pushes onto a stack the characters whose dependency destination is undetermined and, after a dependency judgment has determined the destination of a character, determines the dependency relations of the characters accumulated on the stack, thereby determining the dependencies of the characters.

According to configuration (1), the natural language analysis apparatus of the present invention acquires the characters constituting the sentence to be analyzed one character at a time and determines a dependency relation for each acquired character. While doing so in order from the first character of the sentence, the apparatus pushes characters whose destination is undetermined onto a stack; after a dependency judgment has determined a character's destination, it determines the dependency relations of the characters accumulated on the stack.

Because the natural language analysis apparatus of the present invention analyzes dependencies character by character while accumulating characters with undetermined dependencies on the stack, it can determine the dependency relation of each character without morphologically analyzing the sentence up to its end.

(2) The natural language analysis apparatus according to (1), wherein the character dependency analysis means judges dependency relations on the basis of a grammar definition table that associates a judgment-result type with the character types of the dependency source and the destination candidate and with the positional relation between the two characters.

According to configuration (2), the natural language analysis apparatus judges the dependency relation of each character on the basis of a grammar definition table that associates judgment-result types with the character types of the dependency source and destination candidate and with their positional relation.

Because the apparatus determines the dependency relation of each character from the grammar definition table, it can do so without morphologically analyzing the sentence up to its end.

(3) The natural language analysis apparatus according to (1) or (2), wherein the character dependency analysis means judges dependency relations on the basis of grammar rules learned by an SVM.

According to configuration (3), the natural language analysis apparatus judges the dependency relation of each character on the basis of grammar rules obtained by machine learning with an SVM (Support Vector Machine). The apparatus can therefore determine per-character dependencies with an SVM.

(4) A natural language analysis method in which a natural language analysis apparatus analyzes a natural language sentence, the method comprising: a step of acquiring, one character at a time, the characters constituting a sentence to be analyzed; and a character dependency analysis step of determining a dependency relation for each acquired character, wherein, while determining the dependency relation of each character in order from the first character of the sentence, the character dependency analysis step pushes onto a stack the characters whose dependency destination is undetermined and, after a dependency judgment has determined the destination of a character, determines the dependency relations of the characters accumulated on the stack, thereby determining the dependencies of the characters.

With the natural language analysis method of the present invention, the natural language analysis apparatus analyzes dependencies character by character while accumulating characters with undetermined dependencies on the stack, so the dependency relation of each character can be determined without morphologically analyzing the sentence up to its end.

(5) A natural language analysis program for analyzing a natural language sentence, the program causing a computer to execute: a step of acquiring, one character at a time, the characters constituting a sentence to be analyzed; and a character dependency analysis step of determining a dependency relation for each acquired character, wherein, while determining the dependency relation of each character in order from the first character of the sentence, the character dependency analysis step pushes onto a stack the characters whose dependency destination is undetermined and, after a dependency judgment has determined the destination of a character, determines the dependency relations of the characters accumulated on the stack, thereby determining the dependencies of the characters.

By installing and executing the natural language analysis program of the present invention on a natural language analysis apparatus, the apparatus analyzes dependencies character by character while accumulating characters with undetermined dependencies on the stack, so the dependency relation of each character can be determined without morphologically analyzing the sentence up to its end.

According to the present invention, natural language analysis can determine the dependency relation of each character without morphologically analyzing the sentence to be analyzed up to its end.

FIG. 1 is a diagram showing a configuration example of a natural language analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration example of the dependency judgment unit when an SVM is used in the natural language analysis apparatus according to the embodiment.
FIG. 3 is a diagram showing an example of the analyzed-character storage unit in the natural language analysis apparatus according to the embodiment.
FIG. 4 is a diagram showing an example of the grammar definition table in the natural language analysis apparatus according to the embodiment.
FIG. 5 is a diagram showing a specific processing example of the dependency analysis processing unit of the natural language analysis apparatus according to the embodiment.
FIG. 6 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 5.
FIG. 7 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 6.
FIG. 8 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 7.
FIG. 9 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 8.
FIG. 10 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 9.
FIG. 11 is a diagram showing a specific processing example of the dependency analysis processing unit, continued from FIG. 10.

Embodiments of the present invention are described below with reference to the drawings.

This embodiment is applied to a computer and its peripheral devices. Each unit in the embodiment is implemented by the hardware of the computer and its peripheral devices together with software that controls that hardware.

In addition to a CPU (Central Processing Unit) serving as the control unit, the hardware includes a storage unit, a communication device, a display device, and an input device. Examples of the storage unit include memory (RAM: Random Access Memory, ROM: Read Only Memory, etc.), a hard disk drive (HDD: Hard Disk Drive), and optical disc drives (CD: Compact Disc, DVD: Digital Versatile Disc, etc.). Examples of the communication device include various wired and wireless interface devices. Examples of the display device include various displays such as liquid crystal and plasma displays. Examples of the input device include a keyboard and pointing devices (mouse, trackball, etc.).

The software includes computer programs and data for controlling the hardware. The programs and data are stored in the storage unit and executed or referenced by the control unit as appropriate. They may also be distributed over a communication line, or recorded and distributed on a computer-readable medium such as a CD-ROM.

FIG. 1 is a diagram showing a configuration example of a natural language analysis apparatus 10 according to an embodiment of the present invention. The following describes an example for Japanese, but the invention applies equally to any language whose sentences can be divided into bunsetsu (phrases) with dependency relations between them.

In FIG. 1, the natural language analysis apparatus 10 includes an analysis-target character input unit 101, which acquires the characters constituting the sentence to be analyzed one character at a time, and a dependency analysis processing unit 102, which determines the dependency relation of each character input through the unit 101. The dependency analysis processing unit 102 determines these relations in order from the first character of the sentence and stores the analysis result in the analyzed-character storage unit 108. The data structure of the analyzed-character storage unit 108 is described with reference to FIG. 3.
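Because unit 101 hands characters over one at a time, analysis can begin before the end of the sentence is known. The following is a minimal sketch of such an input source as a Python generator. It assumes that text arrives in arbitrary chunks and uses `None` as the EOS marker; the patent names EOS but not its representation. The name `char_tokens` is illustrative.

```python
from typing import Iterable, Iterator, Optional

EOS: Optional[str] = None  # hypothetical end-of-sentence marker


def char_tokens(chunks: Iterable[str]) -> Iterator[Optional[str]]:
    """Yield the text one character at a time, followed by a single EOS.

    `chunks` may arrive incrementally (keystrokes, network packets, ...);
    each character is handed on as soon as its chunk is available.
    """
    for chunk in chunks:
        for ch in chunk:
            yield ch
    yield EOS
```

A consumer that calls `next()` on this generator receives the first character as soon as the first chunk exists, without waiting for the rest of the sentence.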

The natural language analysis apparatus 10 further includes a source character ID storage area 105, a destination-candidate character ID storage area 106, and a pending-destination stack 107, which the dependency analysis processing unit 102 uses as working data areas; a dependency judgment unit 103, which judges the dependency between characters during the processing of the unit 102; and an analysis result output unit 104, which outputs the analysis result.

The source character ID storage area 105 holds the ID of the character currently serving as the dependency source. The destination-candidate character ID storage area 106 holds the ID of the character currently serving as the destination candidate. The pending-destination stack 107 holds, in last-in first-out (LIFO) order, the IDs of source characters whose destination has not yet been determined.

The dependency judgment unit 103 has a grammar definition table 110. After the dependency judgment unit 103 has determined a character's destination, the dependency analysis processing unit 102 determines the dependency relations of the characters accumulated on the pending-destination stack 107. The grammar definition table 110 may hold the relations between dependency sources and destinations as a data structure such as IF-THEN rules, or as a model generated by machine learning with an SVM. An example of the grammar definition table 110 is described later with reference to FIG. 4.

FIG. 2 is a diagram showing a configuration example of the dependency judgment unit 103 when an SVM is used in the natural language analysis apparatus 10 according to an embodiment of the present invention.

In FIG. 2, the dependency judgment unit 103 includes a training data input unit 1031, which inputs training data for machine learning; a machine learning unit 1032, which performs SVM learning on the training data input through the unit 1031; and a grammar rule storage unit 1033 (corresponding, for example, to the grammar definition table 110 described later with reference to FIG. 4), which holds the model generated by the learning.

The dependency judgment unit 103 further includes a dependency judgment request reception unit 1034, which receives from the dependency analysis processing unit 102 (FIG. 1) a judgment request whose arguments are a source character ID and a destination-candidate character ID, and a dependency judgment execution unit 1035, which judges the dependency for the received request using the grammar rule storage unit 1033 and returns the result to the dependency analysis processing unit 102.
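The patent does not specify the features or kernel used by the SVM. As a hedged illustration of the request and response handled by units 1034 and 1035, the sketch below turns a (source ID, destination-candidate ID) pair into sparse binary features and scores them with a linear decision function w.x + b. The feature templates, the names `pair_features` and `linear_score`, and the choice of a linear kernel are all assumptions.

```python
from typing import Dict, List


def pair_features(chars: List[str], src: int, cand: int) -> Dict[str, float]:
    """Sparse binary features for one judgment request (source ID, candidate ID).

    Hypothetical feature set: the two characters, one neighbour on each side,
    the character pair, and a bucketed distance between them.
    """
    def at(k: int) -> str:
        return chars[k] if 0 <= k < len(chars) else "<pad>"

    return {
        f"src={at(src)}": 1.0,
        f"cand={at(cand)}": 1.0,
        f"src-1={at(src - 1)}": 1.0,
        f"cand+1={at(cand + 1)}": 1.0,          # one character of look-ahead
        f"pair={at(src)}|{at(cand)}": 1.0,
        f"dist={min(cand - src, 5)}": 1.0,      # distances of 5 or more share one bucket
    }


def linear_score(feats: Dict[str, float], weights: Dict[str, float],
                 bias: float = 0.0) -> float:
    """Decision value w.x + b of a linear SVM; a positive value reads as 'depends'."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in feats.items())
```

The weights would come from the machine learning unit 1032. To predict one of several judgment types rather than a yes/no answer, one common option is to train one such scorer per type and pick the highest-scoring type (one-vs-rest).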

FIG. 3 is a diagram showing an example of the analyzed-character storage unit 108 in the natural language analysis apparatus 10 according to an embodiment of the present invention. For each character of a sentence, the analyzed-character storage unit 108 has a "character ID" field identifying the character; a "character code" field storing the character; a "character type" field giving one of the character's attributes, namely its type (for example, hiragana, katakana, alphanumeric, kanji, or symbol); a "destination" field giving the character ID of the dependency destination; and a "type" field giving the type of the dependency on the destination character. In the initial state of the analysis, only the "character ID" field is filled in; the "character code", "character type", "destination", and "type" fields are blank. The example in FIG. 3 shows the state after the natural language analysis apparatus 10 has finished the dependency analysis: the character ID of each character's destination is stored in the destination field, and word and bunsetsu boundaries have been set. In FIG. 3, arrow 221 indicates a character's destination by character ID, arrow 211 indicates a word boundary, and arrow 212 indicates a bunsetsu boundary.
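The row layout described for FIG. 3 can be sketched as a small record type. This sketch is written under stated assumptions: the field and function names are illustrative. The `segment` helper also assumes that a character's judgment type describes the boundary right after that character. Under that assumption, "W" keeps a word going, "B" ends a word inside a bunsetsu, and "D" or "E" ends a bunsetsu, which corresponds to the boundaries marked by arrows 211 and 212.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class CharRecord:
    """One row of the analyzed-character storage unit (field names are illustrative)."""
    char_id: int
    char_code: Optional[str] = None   # the character itself
    char_type: Optional[str] = None   # e.g. "hiragana", "katakana", "kanji"
    head: Optional[int] = None        # character ID of the dependency destination
    dep_type: Optional[str] = None    # judgment type such as "W", "B", "D" or "E"


def init_table(n: int) -> List[CharRecord]:
    """Initial state: only the character ID column is filled in."""
    return [CharRecord(char_id=k) for k in range(n)]


def segment(records: Iterable[CharRecord], breaking_types: Set[str]) -> List[str]:
    """Split the sentence after every character whose type is in breaking_types."""
    units: List[str] = []
    current = ""
    for rec in records:
        current += rec.char_code or ""
        if rec.dep_type in breaking_types:
            units.append(current)
            current = ""
    if current:                       # trailing characters without a break
        units.append(current)
    return units
```

With this helper, `segment(records, {"B", "D", "E"})` would give word units and `segment(records, {"D", "E"})` would give bunsetsu units.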

FIG. 4 is a diagram showing an example of the grammar definition table 110 in the natural language analysis apparatus 10 according to an embodiment of the present invention. The grammar definition table 110 associates conditions with a judgment-result type. The conditions include the character types of the dependency source and the destination candidate and their positional relation, that is, which of the two characters precedes the other in the sentence.

Judgment-result type "W" indicates that the source character and the destination-candidate character form a continuous character sequence within the same word. Type "B" indicates that the two characters form a continuous word sequence within the same bunsetsu. Type "D" indicates that the bunsetsu ending with the source character and the bunsetsu ending with the destination-candidate character are in an inter-bunsetsu dependency relation. Type "O" indicates no dependency. Type "E" indicates the end of the sentence.

The following are examples of conditions for these types. If the source character and the destination-candidate character have the same character type AND the candidate immediately follows the source, the source character is judged W. If their character types differ AND the candidate immediately follows the source, the source character is judged B. If the source character is 「の」 AND the destination-candidate character is a character used as a case particle, the source character is judged D. In all other cases the judgment is O, and at the end of the sentence the judgment is E. These conditions are only examples, and the invention is not limited to them. Such conditions are created by machine learning and stored in the grammar definition table 110.
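The example conditions above can be written as a small rule function. The following is a minimal sketch that applies them in the order listed. It makes several assumptions: the Unicode-range character typing, the set of "case particle" characters, and the rule that a candidate position past the last character yields E. `Judgment`, `char_type` and `judge` are illustrative names.

```python
from enum import Enum
from typing import List


class Judgment(Enum):
    W = "same word"             # source and candidate continue one word
    B = "same bunsetsu"         # they continue one bunsetsu across a word boundary
    D = "bunsetsu dependency"   # source's bunsetsu depends on candidate's bunsetsu
    O = "no dependency"
    E = "end of sentence"


# Hypothetical set of characters treated as "used as case particles".
CASE_PARTICLE_CHARS = set("がをにへとで")


def char_type(ch: str) -> str:
    """Coarse character type from Unicode block ranges (a simplification)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ch.isalnum():
        return "alnum"
    return "symbol"


def judge(chars: List[str], src: int, cand: int) -> Judgment:
    """Apply the example conditions in the order they are listed."""
    if cand >= len(chars):                    # candidate lies past the last character
        return Judgment.E
    s, c = chars[src], chars[cand]
    adjacent = cand == src + 1
    if adjacent and char_type(s) == char_type(c):
        return Judgment.W
    if adjacent:
        return Judgment.B
    if s == "の" and c in CASE_PARTICLE_CHARS:
        return Judgment.D
    return Judgment.O
```

A Dep function could, under a further assumption, return True for W, B and D and False for O. These toy rules also show why learned conditions are needed: they label 「が」 followed by 「彼」 as B (adjacent, different types), although 「が」 actually ends the bunsetsu 「メグが」.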

The analysis processing of the natural language analysis apparatus 10 according to an embodiment of the present invention is shown below in a high-level language.

procedure analyze(m, h, t)
var s: stack
begin
    Push(-1, s)                  { sentinel at the bottom of the stack }
    m[0] = get_token()
    Push(0, s)
    m[1] = get_token()
    i = 1
    while (m[i] != EOS) do begin
        j = Pop(s)               { nearest pending source for candidate i }
        m[i + 1] = get_token()   { one character of look-ahead }
        while (j != -1 && (Dep(j, i, m, t) || m[i + 1] == EOS)) do begin
            h[j] = i             { character j depends on character i }
            j = Pop(s)
        end
        Push(j, s)               { j is still pending (or is the sentinel) }
        Push(i, s)
        ++i
    end
    j = Pop(s)
    h[j] = i                     { the last character attaches to EOS }
    t[j] = "E"
end
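The procedure above translates almost line for line into Python. The sketch below passes Dep in as a callback and uses a list as the stack. It makes a few assumptions: `None` plays the role of EOS, h and t are dictionaries rather than fixed arrays, and an early return for empty input is added, which the pseudocode does not handle.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

EOS = None  # end-of-sentence marker returned by get_token() (assumption: None)

# Dep(j, i, m, t): True if character j depends on character i; may record a type in t.
DepFn = Callable[[int, int, List[Optional[str]], Dict[int, str]], bool]


def analyze(text: Iterable[str],
            dep: DepFn) -> Tuple[List[Optional[str]], Dict[int, int], Dict[int, str]]:
    """One left-to-right pass; characters whose head is still unknown wait on a stack."""
    chars = iter(text)

    def get_token() -> Optional[str]:
        return next(chars, EOS)              # keeps returning EOS once input is exhausted

    m: List[Optional[str]] = [get_token()]   # m[0]
    h: Dict[int, int] = {}                   # h[j] = index of j's dependency destination
    t: Dict[int, str] = {}                   # t[j] = judgment type, e.g. "E"
    if m[0] is EOS:
        return m, h, t                       # empty input: nothing to attach
    s: List[int] = [-1, 0]                   # Push(-1, s); Push(0, s)
    m.append(get_token())                    # m[1]
    i = 1
    while m[i] is not EOS:
        j = s.pop()
        m.append(get_token())                # m[i + 1]: one character of look-ahead
        while j != -1 and (dep(j, i, m, t) or m[i + 1] is EOS):
            h[j] = i                         # j's destination is i
            j = s.pop()
        s.append(j)                          # j (or the sentinel) stays pending
        s.append(i)
        i += 1
    j = s.pop()
    h[j] = i                                 # the last character attaches to EOS
    t[j] = "E"
    return m, h, t
```

For example, with a Dep that says every character modifies 「x」, `analyze("abx", lambda j, i, m, t: m[i] == "x")` leaves 'a' and 'b' waiting on the stack until 'x' arrives and then attaches both to it.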

In the above processing, m is the array of morphemes (in this invention, single characters), h is the array that stores each character's destination, and t is the array that stores the judgment-result type. Push(value, s) pushes a value onto the stack s, get_token() acquires one character, Pop(s) pops a value from the stack, and Dep(j, i, m, t) judges whether the j-th character depends on the i-th character.

That is, the Dep function corresponds to the dependency judgment unit 103. It accepts a judgment request whose arguments are the j-th character (source character ID) and the i-th character (destination-candidate character ID). Using the grammar definition table 110, which stores the judgment conditions for source and destination-candidate characters, it determines and stores the character types and other attributes, and then judges the dependency relation. Dep returns "True" when it judges that the j-th character depends on (modifies) the i-th character, and "False" when it judges that it does not.

For simplicity, the character types judged by the Dep function were illustrated above with the examples given, but they are not limited to these. Other character types include: characters that can form kanji numerals (the symbol ○, and kanji such as 一, 二, ..., 壱, 弐, 百, 千, 万); characters that are generally regarded as symbols but are better treated like kana (the voiced and semi-voiced sound marks, the long-vowel mark, etc.); and characters that are likely to be used as part of a personal name (子, 優, 宏, 朗, ...). The following are examples of Dep's judgments. When the character being analyzed can be part of a kanji numeral, Dep may judge that it forms a continuous character sequence within the numeral. When the character is likely to be part of a name, Dep may judge that it forms a continuous character sequence within the name. Dep may also judge that a single character has several character types; for example, it may judge that 「○」 is both a symbol and part of a kanji numeral. In this way, the Dep function judges dependencies by referring to various attributes of the characters.
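Letting a character carry several types, as described for 「○」, can be sketched by returning a set instead of a single label. The character lists below are deliberately tiny and hypothetical, as are the names `char_types` and `continues_numeral`. The Unicode block ranges are a simplification.

```python
from typing import Set

# Deliberately tiny, hypothetical lists; a real system would use larger lexicons.
KANJI_NUMERAL_CHARS = set("\u3007\u25cb一二三四五六七八九十百千万壱弐")  # 〇, ○, ...
KANA_LIKE_MARKS = set("\u309b\u309c\u30fc")  # voiced mark, semi-voiced mark, long-vowel mark
NAME_CHARS = set("子優宏朗")


def char_types(ch: str) -> Set[str]:
    """Return every type a character may have, not just one."""
    cp = ord(ch)
    if ch in KANA_LIKE_MARKS:
        base = "symbol"                 # usually classed as a symbol ...
    elif 0x3041 <= cp <= 0x309F:
        base = "hiragana"
    elif 0x30A0 <= cp <= 0x30FF:
        base = "katakana"
    elif 0x4E00 <= cp <= 0x9FFF:
        base = "kanji"
    else:
        base = "symbol"
    types = {base}
    if ch in KANA_LIKE_MARKS:
        types.add("kana_like")          # ... but behaves like kana
    if ch in KANJI_NUMERAL_CHARS:
        types.add("kanji_numeral")
    if ch in NAME_CHARS:
        types.add("name_char")
    return types


def continues_numeral(a: str, b: str) -> bool:
    """True if both characters can belong to a kanji numeral."""
    return "kanji_numeral" in char_types(a) and "kanji_numeral" in char_types(b)
```

With a single type per character, 「二」 (kanji) and 「○」 (symbol) would differ, so a rule such as "same type and adjacent gives W" would split a numeral like 「二○○九」. The shared "kanji_numeral" type lets a rule keep the numeral together.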

The analysis process described above is now explained with reference to FIGS. 5 to 11, using as a concrete example the sentence “メグが彼にあのペンをあげた。” (“Meg gave him that pen.”), whose morphological analysis result is shown in FIG. 3. FIG. 5 shows a concrete processing example of the dependency analysis processing unit 102 of the natural language analysis device 10 according to an embodiment of the present invention, and FIGS. 6 to 11 each continue the processing example from the preceding figure.

In FIG. 5, when the dependency analysis processing unit 102 starts processing, it pushes “−1” onto the dependency destination pending stack 107 and stores the character “メ” (character ID “0”) in the character code field of the analysis character storage unit 108. It then pushes “0” onto the stack 107 and stores the character “グ” (character ID “1”) in the character code field. Next, it sets the candidate character ID in the candidate character ID storage area 106 to the initial value “1”.
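The storage areas involved in this initialization can be pictured as a small Python data structure. This is an illustrative assumption rather than the device's actual layout: the field names are hypothetical, the whole sentence is loaded up front (whereas the device stores characters one at a time), and `None` stands in for EOS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

EOS = None  # hypothetical end-of-sentence marker


@dataclass
class AnalysisState:
    chars: List[Optional[str]]                           # characters followed by EOS
    source_id: int = -1                                  # source character ID (area 105)
    candidate_id: int = 0                                # candidate character ID (area 106)
    pending: List[int] = field(default_factory=list)     # pending stack (107)
    heads: Dict[int, int] = field(default_factory=dict)  # dependency destinations (in 108)
    types: Dict[int, str] = field(default_factory=dict)  # result types (in 108)

    @classmethod
    def start(cls, sentence: str) -> "AnalysisState":
        """Initialize as in FIG. 5: push -1 and 0, then set the candidate to 1."""
        state = cls(chars=list(sentence) + [EOS])
        state.pending.append(-1)  # sentinel meaning "no source character"
        state.pending.append(0)   # the first character (ID 0)
        state.candidate_id = 1    # the second character is the first candidate
        return state
```

`AnalysisState.start("メグが彼にあのペンをあげた。")` yields a pending stack of `[-1, 0]` and a candidate ID of 1, the state at the end of FIG. 5.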

In FIG. 6(1), because the character code indicated by the candidate character ID in the candidate character ID storage area 106 is not EOS, the dependency analysis processing unit 102 pops the value “0” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105. It then stores the character “が” at candidate character ID + 1 (that is, 1 + 1 = “2”) in the character code field of the analysis character storage unit 108.

Next, in FIG. 6(2), the source character ID in the source character ID storage area 105 is not “−1” and the determination result of the dependency relationship determination unit 103 is “True”, so the following processing is performed. Specifically, the determination unit 103 stores “katakana” as the character type of “メ” (character ID “0”) and “katakana” as the character type of “グ” (character ID “1”); because “メグ” is in the word dictionary, it judges (by machine learning) that “メ” depends on “グ” and returns “True”. Because the determination result type from the determination unit 103 is “W”, the dependency analysis processing unit 102 sets “W” as the type of the source character in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “1” in the candidate character ID storage area 106. Next, it pops the value “−1” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.

In FIG. 6(3), because the source character ID in the source character ID storage area 105 is “−1”, the dependency analysis processing unit 102 pushes that source character ID “−1” onto the dependency destination pending stack 107 and then pushes the candidate character ID “1” from the candidate character ID storage area 106. Next, it increments the candidate character ID in the candidate character ID storage area 106 to “2” (1 + 1).

In FIG. 7(1), because the character code indicated by the candidate character ID in the candidate character ID storage area 106 is not EOS, the dependency analysis processing unit 102 pops the value “1” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105. It then stores the character “彼” at candidate character ID + 1 (that is, 2 + 1 = “3”) in the character code field of the analysis character storage unit 108.

In FIG. 7(2), the source character ID in the source character ID storage area 105 is not “−1” and the determination result of the dependency relationship determination unit 103 is “True”, so the following processing is performed. Specifically, the determination unit 103 stores “hiragana” as the character type of “が” (character ID “2”). It then examines the features of the character of interest and of the characters around it: “メグ” is in the word dictionary, “グが” is not, and “グ” and “が” differ in character type. From these features, together with information such as “が” being a particle, it judges (by machine learning) that a word boundary falls after “グ” and that “グ” depends on “が”, and returns “True”. Because the determination result type is “B”, the dependency analysis processing unit 102 sets “B” as the type of the source character in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “2” in the candidate character ID storage area 106. Next, it pops the value “−1” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.
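The cues listed above (dictionary lookups, a change of character type, and particle information) could be turned into classifier features roughly as follows. The feature names, the `PARTICLES` list, and the dictionary contents are hypothetical; the text does not specify the exact feature set.

```python
from typing import Dict, Set

PARTICLES = set("がにをはのもへとで")  # hypothetical particle list


def char_type(ch: str) -> str:
    """Coarse character typing by Unicode block (an assumption for illustration)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "symbol"


def features(chars: str, j: int, i: int, dictionary: Set[str]) -> Dict[str, bool]:
    """Binary features for the pair (source j, candidate i), assuming j < i."""
    return {
        # some dictionary word ends exactly at the source character
        "dict_word_ends_at_source": any(chars[k:j + 1] in dictionary for k in range(j + 1)),
        # the span from the source to the candidate is itself a dictionary word
        "span_in_dict": chars[j:i + 1] in dictionary,
        # the character type changes between the source and the candidate
        "type_changes": char_type(chars[j]) != char_type(chars[i]),
        # the candidate character is in the (hypothetical) particle list
        "candidate_is_particle": chars[i] in PARTICLES,
    }
```

For the pair “グ” → “が” with the dictionary `{"メグ", "彼"}`, every feature except `span_in_dict` is `True`, which corresponds to the reasoning in FIG. 7(2).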

In FIG. 7(3), because the source character ID in the source character ID storage area 105 is “−1”, the dependency analysis processing unit 102 pushes that source character ID “−1” onto the dependency destination pending stack 107 and then pushes the candidate character ID “2” from the candidate character ID storage area 106. Next, it increments the candidate character ID in the candidate character ID storage area 106 to “3” (2 + 1).

In FIG. 8(1), because the character code indicated by the candidate character ID in the candidate character ID storage area 106 is not EOS, the dependency analysis processing unit 102 pops the value “2” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105. It then stores the character “に” at candidate character ID + 1 (that is, 3 + 1 = “4”) in the character code field of the analysis character storage unit 108.

In FIG. 8(2), the source character ID in the source character ID storage area 105 is not “−1” and the determination result of the dependency relationship determination unit 103 is “False”, so the following processing is performed. Specifically, the determination unit 103 stores “kanji” as the character type of “彼” (character ID “3”), judges (by machine learning) that “が” does not depend on “彼”, and returns “False”. Because the determination result type is “O”, the dependency analysis processing unit 102 sets “O” as the type of the source character in the analysis character storage unit 108 and leaves its dependency destination unset (pending).

In FIG. 8(3), the dependency analysis processing unit 102 pushes the source character ID “2” from the source character ID storage area 105 onto the dependency destination pending stack 107 and then pushes the candidate character ID “3” from the candidate character ID storage area 106. Next, it increments the candidate character ID in the candidate character ID storage area 106 to “4” (3 + 1).

In FIG. 9(1), because the character code indicated by the candidate character ID in the candidate character ID storage area 106 is not EOS, the dependency analysis processing unit 102 pops the value “3” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105. It then stores the character “あ” at candidate character ID + 1 (that is, 4 + 1 = “5”) in the character code field of the analysis character storage unit 108.

In FIG. 9(2), the source character ID in the source character ID storage area 105 is not “−1” and the determination result of the dependency relationship determination unit 103 is “True”, so the following processing is performed. Specifically, the determination unit 103 stores “hiragana” as the character type of “に” (character ID “4”), judges (by machine learning) that “彼” depends on “に”, and returns “True”. Because the determination result type is “B”, the dependency analysis processing unit 102 sets “B” as the type of the source character in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “4” in the candidate character ID storage area 106. Next, it pops the value “2” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.
Because this source character ID is not “−1” and the determination result of the determination unit 103 is now “False”, the dependency analysis processing unit 102 performs the following processing. Specifically, the determination unit 103 judges (by machine learning) that “が” does not depend on “に” and returns “False”. Because the determination result type is “O”, the dependency analysis processing unit 102 sets “O” as the type of the source character in the analysis character storage unit 108 and leaves its dependency destination unset (pending).

In FIG. 9(3), the dependency analysis processing unit 102 pushes the source character ID “2” from the source character ID storage area 105 onto the dependency destination pending stack 107 and then pushes the candidate character ID “4” from the candidate character ID storage area 106. Next, it increments the candidate character ID in the candidate character ID storage area 106 to “5” (4 + 1).

Processing continues in the same way. The dependency analysis processing unit 102 stacks the IDs of characters whose dependency destinations are still undetermined; once a character's dependency destination has been fixed by a character-to-character dependency judgment, it determines the dependencies of the character IDs accumulated on the stack, thereby determining the dependency structure character by character.
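The general procedure fits in a short loop over a stack of pending character IDs. The sketch below is one reading of FIGS. 5 to 11, not the patent's own code. It assumes that `dep(j, i)` returns `(True, type)` or `(False, None)`, and that a pending character still unattached when the candidate is the last character is forced to type “D” and attached to it, which is how FIGS. 10(3) to 11(2) appear to behave.

```python
from typing import Callable, Dict, List, Optional, Tuple

DepFn = Callable[[int, int], Tuple[bool, Optional[str]]]


def analyze(sentence: str, dep: DepFn) -> Tuple[Dict[int, int], Dict[int, str]]:
    """Character-level dependency analysis with a stack of pending characters.

    Returns (heads, types): heads[j] is the ID that character j depends on and
    types[j] is its result type. EOS is treated as character ID len(sentence).
    """
    n = len(sentence)
    heads: Dict[int, int] = {}
    types: Dict[int, str] = {}
    if n == 0:
        return heads, types
    pending: List[int] = [-1, 0]  # sentinel -1, then the first character
    i = 1                         # candidate character ID
    while i < n:                  # the candidate is a real character, not EOS
        j = pending.pop()         # source character ID
        at_end = i + 1 == n       # the character after the candidate is EOS
        while j != -1:
            ok, label = dep(j, i)
            if ok:
                types[j], heads[j] = label, i
            elif at_end:
                types[j], heads[j] = "D", i  # assumption: forced attachment at sentence end
            else:
                types[j] = "O"               # destination stays pending
                break
            j = pending.pop()
        pending.append(j)         # push the source back (or the -1 sentinel)
        pending.append(i)         # the candidate becomes a future source
        i += 1
    last = pending.pop()          # the candidate is EOS: attach the last character to it
    heads[last], types[last] = n, "E"
    return heads, types
```

When `dep` replays the judgments of FIGS. 5 to 11 (the judgments for characters 5 to 11, which the text skips, have to be supplied hypothetically), the loop attaches characters 2, 4, and 9 to character 13 with type “D” and attaches character 13 to EOS with type “E”, as in FIGS. 10 and 11.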

Processing near the end of the sentence is described with reference to FIG. 10. In the example shown in FIG. 10, the dependency destination pending stack 107 holds three character IDs whose dependency destinations are undetermined, together with the ID of the next character to be processed, and the candidate character ID in the candidate character ID storage area 106 is “13”.

In FIG. 10(1), because the character code indicated by the candidate character ID in the candidate character ID storage area 106 is not EOS, the dependency analysis processing unit 102 pops the value “12” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105. It then stores “EOS” at candidate character ID + 1 (that is, 13 + 1 = “14”) in the character code field of the analysis character storage unit 108.

In FIG. 10(2), the source character ID in the source character ID storage area 105 is not “−1” and the character code indicated by candidate character ID + 1 is EOS, so the following processing is performed. Specifically, the dependency relationship determination unit 103 stores “symbol” as the character type of “。” (character ID “13”), judges (by machine learning) that “た” depends on “。”, and returns “True”. Because the determination result type is “B”, the dependency analysis processing unit 102 sets “B” as the type of the source character in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “13” in the candidate character ID storage area 106. Next, it pops the value “9” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.

In FIG. 10(3), because the source character ID in the source character ID storage area 105 is not “−1” and the character code indicated by candidate character ID + 1 is EOS, the dependency analysis processing unit 102 sets “D” as the type of source character ID “9” in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “13” in the candidate character ID storage area 106. Next, it pops the value “4” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.

In FIG. 11(1), because the source character ID in the source character ID storage area 105 is not “−1” and the character code indicated by candidate character ID + 1 is EOS, the dependency analysis processing unit 102 sets “D” as the type of source character ID “4” in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “13” in the candidate character ID storage area 106. Next, it pops the value “2” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.

In FIG. 11(2), because the source character ID in the source character ID storage area 105 is not “−1” and the character code indicated by candidate character ID + 1 is EOS, the dependency analysis processing unit 102 sets “D” as the type of source character ID “2” in the analysis character storage unit 108 and sets its dependency destination to the candidate character ID “13” in the candidate character ID storage area 106. Next, it pops the value “−1” from the dependency destination pending stack 107 and sets it as the source character ID in the source character ID storage area 105.

In FIG. 11(3), because the source character ID in the source character ID storage area 105 is “−1”, the dependency analysis processing unit 102 pushes that source character ID “−1” onto the dependency destination pending stack 107 and then pushes the candidate character ID “13” from the candidate character ID storage area 106. Next, it increments the candidate character ID in the candidate character ID storage area 106 to “14” (13 + 1). Because the character code indicated by this candidate character ID is EOS, the dependency analysis processing unit 102 pops the value “13” from the dependency destination pending stack 107, sets the dependency destination of character “13” to the candidate character ID “14”, sets “E” as the type of source character ID “13” in the analysis character storage unit 108, and ends the processing. In this way, the natural language analysis device 10 acquires the characters of a sentence such as “メグが彼にあのペンをあげた。” (“Meg gave him that pen.”) one character at a time and determines the dependency of each acquired character, as shown in FIG. 11(2).
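The per-character result types seem to encode word and phrase (bunsetsu) structure. The text does not define the labels, so the following reading is an assumption: “W” means the next character continues the same word, “B” means a word ends but the phrase continues, “D” means a phrase ends and depends on a later one, and “E” marks the end of the sentence. Under that reading, words and phrases can be rebuilt from the types alone:

```python
from typing import Dict, List, Tuple


def segment(sentence: str, types: Dict[int, str]) -> Tuple[List[str], List[str]]:
    """Rebuild words and bunsetsu from per-character result types.

    Assumed label reading (not stated in the text): W = word continues,
    B = word ends inside a bunsetsu, D = bunsetsu ends, E = sentence ends.
    """
    words: List[str] = []
    phrases: List[str] = []
    word, phrase = "", ""
    for idx, ch in enumerate(sentence):
        word += ch
        phrase += ch
        label = types[idx]
        if label in ("B", "D", "E"):  # a word ends at this character
            words.append(word)
            word = ""
        if label in ("D", "E"):       # a bunsetsu ends at this character
            phrases.append(phrase)
            phrase = ""
    return words, phrases
```

With the types from the walk-through (those for characters 5 to 11 filled in hypothetically, because the text skips those steps), the phrases come out as メグが / 彼に / あの / ペンを / あげた。.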

According to this embodiment, the natural language analysis device 10 acquires the characters that make up the sentence to be analyzed one character at a time and determines the dependency of each acquired character. While determining these per-character dependencies in order from the first character of the sentence, the device pushes characters whose dependency destinations are undetermined onto the dependency destination pending stack 107; after a character's dependency destination has been fixed by a dependency judgment, it determines the dependencies of the characters accumulated on the stack 107, thereby determining the dependency structure of the characters. The dependency judgment is made on the basis of the grammar definition table 110, which associates determination result types with conditions on the character types of the dependency source and the dependency destination candidate and on the characters' positional context. The dependency judgment is also made on the basis of grammar rules learned by an SVM (support vector machine).
Because the natural language analysis device 10 performs dependency analysis character by character while accumulating characters whose dependencies are still undetermined, it can determine per-character dependency relations without first running morphological analysis over the sentence all the way to its end.
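The text states only that the grammar rules are learned by an SVM. As one possible way to train such a classifier, here is a minimal linear SVM trained by Pegasos-style subgradient descent on the L2-regularized hinge loss. It is a stand-in, not the patent's training procedure: the feature encoding is assumed (for example, 0/1 features of a source/candidate pair plus a constant bias feature), with label +1 for “depends” and −1 for “does not depend”.

```python
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 100) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Examples are visited in a fixed cyclic order (Pegasos proper samples them
    at random). ys must contain +1 or -1; append a constant 1.0 feature to
    each x to learn a bias term.
    """
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wk * xk for wk, xk in zip(w, x))
            # regularization step: shrink the weights
            w = [(1.0 - eta * lam) * wk for wk in w]
            if margin < 1.0:
                # hinge-loss step: move toward examples inside the margin
                w = [wk + eta * y * xk for wk, xk in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 ("depends") or -1 ("does not depend")."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) >= 0 else -1
```

A linear model keeps each per-character judgment cheap, which suits the left-to-right, one-character-at-a-time processing described above.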

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described in the embodiment merely list the most favorable effects produced by the present invention, and the effects of the present invention are not limited to those described in the embodiment.

Description of Symbols
10 Natural language analysis device
101 Analysis target character input unit
102 Dependency analysis processing unit
103 Dependency relationship determination unit
104 Analysis result output unit
105 Dependency source character ID storage area
106 Dependency destination candidate character ID storage area
107 Dependency destination pending stack
108 Analysis character storage unit
110 Grammar definition table
1031 Training data input unit
1032 Machine learning unit
1033 Grammar rule storage unit
1034 Dependency determination request receiving unit
1035 Dependency determination execution unit

Claims

A natural language analysis device for analyzing a natural language sentence, comprising:
character acquisition means for acquiring, character by character, the characters that constitute a sentence to be analyzed; and
character dependency analysis means for determining a dependency relationship for each acquired character,
wherein the character dependency analysis means,
in the process of determining the dependency relationship of each character in order from the first character of the sentence to be analyzed, stacks characters whose dependency destinations are undetermined and, after the dependency destination of a character has been determined by a dependency judgment, determines the dependency relationships of the stacked characters, thereby determining the dependencies of the characters.

The natural language analysis device according to claim 1, wherein the character dependency analysis means performs the dependency judgment on the basis of a grammar definition table that associates determination result types with conditions on the character types of the dependency source and the dependency destination candidate and on the positional relationship of the characters.

The natural language analysis device according to claim 1, wherein the character dependency analysis means performs the dependency judgment on the basis of grammar rules learned by machine learning with an SVM.

A natural language analysis method in which a natural language analysis device analyzes a natural language sentence, the method comprising:
a step of acquiring, character by character, the characters that constitute a sentence to be analyzed; and
a character dependency analysis step of determining a dependency relationship for each acquired character,
wherein the character dependency analysis step,
in the process of determining the dependency relationship of each character in order from the first character of the sentence to be analyzed, stacks characters whose dependency destinations are undetermined and, after the dependency destination of a character has been determined by a dependency judgment, determines the dependency relationships of the stacked characters, thereby determining the dependencies of the characters.

A natural language analysis program for analyzing a natural language sentence, comprising:
a step of acquiring, character by character, the characters that constitute a sentence to be analyzed; and
a character dependency analysis step of determining a dependency relationship for each acquired character,
wherein the character dependency analysis step,
in the process of determining the dependency relationship of each character in order from the first character of the sentence to be analyzed, stacks characters whose dependency destinations are undetermined and, after the dependency destination of a character has been determined by a dependency judgment, determines the dependency relationships of the stacked characters, thereby determining the dependencies of the characters.