JPH08278973A

JPH08278973A - Parallel phrase analyzing device and learning data generating device

Info

Publication number: JPH08278973A
Application number: JP7082510A
Authority: JP
Inventors: Hide Fuji; 秀富士
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1995-04-07
Filing date: 1995-04-07
Publication date: 1996-10-22
Anticipated expiration: 2020-09-21
Also published as: JP3698454B2

Abstract

PURPOSE: To provide a learning data generating device that automatically generates knowledge of parallel structures that depends on field and context. CONSTITUTION: An input learning sentence is converted by a morphological analysis part 1 into a morpheme string. The morpheme string is converted by a bunsetsu (phrase) composition part 2 into a bunsetsu string. A parallel key decision part 3 decides whether the bunsetsu string contains a parallel key. When it does, a parallel type classification part 4 determines the parallel type of the learning sentence. A parallel element extraction part 5 extracts parallel elements from the learning sentence when the parallel type is suitable for learning. A dependency type classification part 6 determines the dependency type of the learning sentence. A dependency relation extraction part 7 extracts dependency elements from the learning sentence when the dependency type is suitable for learning.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel phrase analysis device that recognizes parallel structures in an input sentence by referring to a learning dictionary, and to a learning data creation device that creates the learning data. Both devices are components of a natural language analysis system.

[0002] Documents in general circulation contain a considerable number of parallel structures. A natural language analysis system, such as a machine translation system or a document revision support system, cannot analyze a sentence correctly unless it recognizes these parallel structures correctly. When a parallel structure is misrecognized, in most cases the meaning of the sentence can no longer be interpreted correctly, which is fatal for the analysis as a whole.

[0003]

[Prior Art] In natural language analysis systems, the following mechanisms for recognizing parallel structures have been devised and implemented.
1. Interactive specification by the system operator. The system analyzes automatically as far as it can and, when an ambiguity in the parallel structure arises, has the operator specify the parallel range. Some systems have the operator put parentheses around the parallel range; others present the possible parallel candidates and let the operator choose the correct one.
2. Use of syntactic information. Only the syntactically valid parallel candidates are used. However, because there may be several syntactically valid candidates, this method alone cannot find the correct one.
3. Use of the balance of the parallel structure. Among the parallel candidates, the one whose parallel elements on either side of the parallel key are well balanced is preferentially treated as correct.
4. Use of the case information of predicates. When parallel elements have a dependency relation, the validity of that dependency is factored into the calculation.
5. Use of semantic attributes. The validity of a parallel element is calculated from the semantic attributes of the parallel element candidates.
6. Integrated use of several of the above kinds of information.

[0004]

[Problems to Be Solved by the Invention] In conventional parallel structure recognition as described above, the information used is mostly fixed, so it has been difficult to recognize accurately the parallel structures characteristic of a particular field. In the field of document processing, for example, such field-specific parallel structures include 「書き込み」 (write) and 「読み出し」 (read), and 「作成」 (create), 「更新」 (update) and 「保存」 (save). Field-dependent structures of this kind can be recognized to some extent from general information, but what is needed is a device that can be tuned to the field and recognizes them reliably.

[0005] As described above, parallel structures have characteristics specific to each field, but they often have characteristics specific to each document as well. That is, a parallel expression used once in a particular document is often used again in a similar form within the same document. It is desirable to be able to tune the recognition to such context-dependent information as well.

[0006] The present invention was made in view of these points. Its object is to provide a learning data creation device that automatically creates knowledge of field-dependent and context-dependent parallel structures. A further object is to provide a parallel phrase analysis device that recognizes parallel structures by referring to the learning data created by the learning data creation device.

[0007]

[Means for Solving the Problems] The parallel phrase analysis device of claim 1 comprises: a morphological analysis unit that decomposes an input sentence into a morpheme sequence; a bunsetsu synthesis unit that synthesizes a bunsetsu (phrase) sequence from the morpheme sequence; a learning dictionary that stores parallel elements and dependency elements; and a parallel structure recognition unit that recognizes the parallel phrases present in the input sentence on the basis of the bunsetsu sequence and the contents of the learning dictionary.

[0008] The parallel phrase analysis device of claim 2 is the parallel phrase analysis device of claim 1 in which the parallel structure recognition unit searches the learning dictionary for every parallel combination that can be formed from the bunsetsu sequence and preferentially outputs the combinations with the largest number of search hits.

[0009] The automatic learning data creation device of claim 3 comprises: a morphological analysis unit that decomposes a learning sentence into morphemes; a bunsetsu synthesis unit that synthesizes the morpheme sequence into a bunsetsu sequence; a parallel key determination unit that finds parallel keys in the bunsetsu sequence; a parallel type classification unit that determines the parallel type of the learning sentence from the bunsetsu sequence and the parallel keys; a dependency type classification unit that determines the dependency type of the learning sentence from the bunsetsu sequence; a parallel element extraction unit that judges, on the basis of the parallel type of the learning sentence, whether the learning sentence is useful for learning and, if it is, extracts parallel elements from it; and a dependency element extraction unit that judges, on the basis of the dependency type of the learning sentence, whether the learning sentence is useful for learning and, if it is, extracts dependency elements from it.

[0010] The automatic learning data creation device of claim 4 is the device of claim 3 in which the parallel element extraction unit takes as targets for parallel element extraction only those output results of the parallel type classification unit in which the parallel structure of the learning sentence has no parallel ambiguity.

[0011] The automatic learning data creation device of claim 5 is the device of claim 3 in which the dependency element extraction unit takes as targets for dependency element extraction only those output results of the dependency type classification unit in which the parallel structure of the learning sentence has no dependency ambiguity.

[0012] The automatic learning data creation device of claim 6 is the device of claim 3 in which, even when the output of the parallel type classification unit shows that the parallel structure of the learning sentence has a parallel ambiguity, the parallel element extraction unit extracts parallel elements for all possible parallel combinations and registers only the frequently occurring parallel elements as learning data.

[0013] The automatic learning data creation device of claim 7 is the device of claim 3 in which, even when the output of the dependency type classification unit shows that the parallel structure of the learning sentence has a dependency ambiguity, the dependency element extraction unit extracts dependency elements for all possible parallel combinations and registers only the frequently occurring dependency elements as learning data.

[0014] The context-following parallel phrase analysis device of claim 8 comprises: a morphological analysis unit that decomposes an input sentence into a morpheme sequence; a bunsetsu synthesis unit that synthesizes a bunsetsu sequence from the morpheme sequence; a learning dictionary that stores parallel elements and dependency elements; a parallel key determination unit that finds parallel keys in the bunsetsu sequence; a parallel type classification unit that determines the parallel type of the input sentence from the bunsetsu sequence and the parallel keys; a dependency type classification unit that determines the dependency type of the input sentence from the bunsetsu sequence; a parallel element extraction and registration unit that extracts parallel elements from the input sentence by referring to its parallel type and registers them in the learning dictionary; a dependency element extraction and registration unit that extracts dependency elements from the input sentence by referring to its dependency type and registers them in the learning dictionary; and a parallel structure recognition unit that recognizes the parallel phrases present in the input sentence on the basis of the bunsetsu sequence and the contents of the learning dictionary.

[0015] The context-following parallel phrase analysis device of claim 9 is the device of claim 8 in which the parallel structure recognition unit gives the learning data obtained from the context priority over the learning data obtained by prior learning.

[0016]

[Operation] The operation of the parallel phrase analysis device of the present invention is described first. The learning dictionary stores, for example, parallel elements such as 「編集，印刷」 (edit, print), 「作成，更新，削除」 (create, update, delete), 「解析，生成」 (analyze, generate) and 「分割，合成」 (divide, combine), and dependency elements such as 「文書の作成」 (creation of a document), 「文書の編集」 (editing of a document), 「文書の更新」 (updating of a document) and 「文書の印刷」 (printing of a document). When, for example, the sentence 「編集と印刷を行なう」 ("perform editing and printing") is input, it is converted into the morpheme sequence 「編集」「と」「印刷」「を」「行な」「う」「。」. Each morpheme carries grammatical attributes; for example, 「編集」 is a sa-hen noun (a noun that forms a verb with する).

[0017] A bunsetsu sequence is created from this morpheme sequence. The bunsetsu sequence corresponding to the input sentence above is:

  「編集と」    bunsetsu type = nominal (parallel)   dependency type = nominal
  「印刷を」    bunsetsu type = nominal              dependency type = predicate
  「行なう。」  bunsetsu type = predicate            dependency type = × (none)

This bunsetsu sequence shows that 「編集と」 has the nominal attribute, is a parallel key, and depends on a nominal.

[0018] Because 「編集と」 is a parallel key, its parallel partner is a nominal, and the parallel element 「編集，印刷」 is registered in the learning dictionary, the parallel partner of 「編集」 is found to be 「印刷」.
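
The lookup described in [0016] to [0018] can be sketched as follows. This is only an illustration under stated assumptions: the `Bunsetsu` record, the field names, and the function `find_parallel_partner` are hypothetical and do not appear in the specification; the dictionary contents are the examples listed in [0016].

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Bunsetsu:
    head: str                   # content word, e.g. "編集"
    particle: str               # attached word, e.g. "と"
    kind: str                   # "nominal" or "predicate"
    is_parallel_key: bool
    target_kind: Optional[str]  # kind of bunsetsu it can relate to; None = no target

# Parallel elements from the example dictionary in [0016].
PARALLEL_ELEMENTS = [
    {"編集", "印刷"},
    {"作成", "更新", "削除"},
    {"解析", "生成"},
    {"分割", "合成"},
]

def find_parallel_partner(seq: List[Bunsetsu], key_index: int) -> Optional[str]:
    """Return the head of the first later bunsetsu that has the required kind
    and is registered together with the key's head in the dictionary."""
    key = seq[key_index]
    for cand in seq[key_index + 1:]:
        if cand.kind != key.target_kind:
            continue
        if any(key.head in group and cand.head in group
               for group in PARALLEL_ELEMENTS):
            return cand.head
    return None

# Bunsetsu sequence of 「編集と印刷を行なう。」 as given in [0017].
sentence = [
    Bunsetsu("編集", "と", "nominal", True, "nominal"),
    Bunsetsu("印刷", "を", "nominal", False, "predicate"),
    Bunsetsu("行なう", "。", "predicate", False, None),
]
print(find_parallel_partner(sentence, 0))
```

For the example sentence the function returns 「印刷」, which matches the conclusion of [0018].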

[0019] The operation of the learning data creation device of the present invention is described next. When, for example, the learning sentence 「編集と印刷を行なう」 is input, it is converted into the morpheme sequence 「編集」「と」「印刷」「を」「行な」「う」「。」.

[0020] A bunsetsu sequence is created from this morpheme sequence. The bunsetsu sequence corresponding to the learning sentence above is:

  「編集と」    bunsetsu type = nominal (parallel)   dependency type = nominal
  「印刷を」    bunsetsu type = nominal              dependency type = predicate
  「行なう。」  bunsetsu type = predicate            dependency type = × (none)

The parallel key in this bunsetsu sequence is the bunsetsu 「編集と」. The target of 「編集と」 is a nominal, and in this example the only nominal that 「編集と」 can relate to is 「印刷を」, so the parallel element 「編集と印刷」 is extracted from the learning sentence.

[0021] When the learning sentence 「文書の編集を行なう。」 ("perform editing of a document.") is input to the learning data creation device, it is converted into the morpheme sequence 「文書」「の」「編集」「を」「行な」「う」「。」.

[0022] A bunsetsu sequence is created from this morpheme sequence. The bunsetsu sequence corresponding to the learning sentence above is:

  「文書の」    bunsetsu type = nominal     dependency type = nominal
  「印刷を」    bunsetsu type = nominal     dependency type = predicate
  「行なう。」  bunsetsu type = predicate   dependency type = × (none)

In this bunsetsu sequence, the only bunsetsu that can be the target of 「文書の」 is 「印刷を」, so the dependency element 「文書の印刷」 is extracted from the learning sentence.

[0023] The context-following parallel phrase analysis device of the present invention combines the parallel phrase analysis device and the learning data creation device of the present invention. It extracts learning data from the input sentences and uses the extracted learning data in the parallel phrase analysis of the input sentences that follow.

[0024]

[Embodiments] FIG. 1 shows an example of the learning data creation device of the present invention. In the figure, 1 denotes a morphological analysis unit, 2 a bunsetsu synthesis unit, 3 a parallel key determination unit, 4 a parallel type classification unit, 5 a parallel element extraction unit, 6 a dependency type classification unit, and 7 a dependency relation extraction unit.

[0025] The morphological analysis unit 1 receives a learning sentence and decomposes it into a morpheme sequence. The bunsetsu synthesis unit 2 assembles the morpheme sequence obtained by the morphological analysis into bunsetsu. A bunsetsu here is the bunsetsu of standard school grammar; it most often consists of an independent word, such as a noun or a verb, followed by attached words. The parallel key determination unit 3 determines which bunsetsu in the sequence synthesized by the bunsetsu synthesis unit 2 serve as keys. A parallel key is a bunsetsu of a form such as 「編集 (noun) と (coordinating particle)」.
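
As a rough illustration of the bunsetsu synthesis and parallel key determination described in [0025], the sketch below groups morphemes so that each independent word opens a new bunsetsu and attached words join the current one. The part-of-speech labels (`noun`, `verb`, `coord_particle`, and so on) and the function names are assumptions made for this example; the specification does not define a tag set.

```python
# Hypothetical part-of-speech labels, assumed for illustration only.
INDEPENDENT = {"noun", "verb"}       # independent words open a bunsetsu
COORD_PARTICLE = "coord_particle"    # coordinating particle such as と

def compose_bunsetsu(morphemes):
    """Group (surface, pos) pairs into bunsetsu: one independent word
    followed by any attached words."""
    bunsetsu = []
    for surface, pos in morphemes:
        if pos in INDEPENDENT or not bunsetsu:
            bunsetsu.append([(surface, pos)])    # start a new bunsetsu
        else:
            bunsetsu[-1].append((surface, pos))  # attach to the current one
    return bunsetsu

def is_parallel_key(b):
    """True for a bunsetsu of the form noun + coordinating particle."""
    return b[0][1] == "noun" and any(pos == COORD_PARTICLE for _, pos in b[1:])

# Morpheme sequence of 「編集と印刷を行なう。」 from [0019].
morphemes = [
    ("編集", "noun"), ("と", "coord_particle"),
    ("印刷", "noun"), ("を", "case_particle"),
    ("行な", "verb"), ("う", "ending"), ("。", "punct"),
]
phrases = compose_bunsetsu(morphemes)
surfaces = ["".join(s for s, _ in b) for b in phrases]
keys = [is_parallel_key(b) for b in phrases]
print(surfaces, keys)
```

With this input the grouping yields 「編集と」「印刷を」「行なう。」, and only the first bunsetsu is flagged as a parallel key.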

[0026] The parallel type classification unit 4 classifies the parallel type of the input sentence from the bunsetsu and parallel key information obtained so far. The parallel type indicates, for example, whether one bunsetsu or several bunsetsu can stand in parallel with the bunsetsu identified as the parallel key. The parallel element extraction unit 5 judges, from the parallel type determined by the parallel type classification unit 4, whether the learning sentence is useful for learning and, if it is, extracts the parallel elements from the sentence. The extracted parallel elements are written to the parallel element learning dictionary described later.

[0027] The dependency type classification unit 6 determines the dependency type of the learning sentence from the bunsetsu sequence obtained by the bunsetsu synthesis unit 2. The dependency type indicates, for example, whether a dependent bunsetsu has one possible target bunsetsu or several. The dependency relation extraction unit 7 judges, from the dependency type determined by the dependency type classification unit 6, whether the learning sentence is suitable for extracting dependency information and, if it is, extracts the dependency elements. The extracted dependency elements are stored in the dependency element learning dictionary described later.

[0028] FIG. 2 shows the flow of processing for creating learning data. In step S1, a sentence is input. In step S2, the input sentence is decomposed into morphemes. In step S3, the morpheme sequence is assembled into a bunsetsu sequence. In step S4, the parallel type is determined from the bunsetsu sequence. In step S5, it is judged from this result whether the input sentence is suitable for learning. If it is, the parallel elements are registered in the parallel element learning dictionary in step S6.

[0029] In step S7, the dependency type is determined from the bunsetsu sequence obtained in step S3. In step S8, it is judged whether the determined dependency type is suitable for learning. If it is, the dependency elements are registered in the dependency element learning dictionary in step S9. In step S10, it is judged whether the input sentence is the last sentence; if so, processing ends. Otherwise, processing returns to step S1 to accept the next input sentence.
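
The loop of steps S1 to S10 might be sketched as below. Several things are assumed for the sake of a compact example: morphological analysis and bunsetsu synthesis (S2, S3) are taken as already done, each bunsetsu is a hypothetical tuple `(head, kind, is_parallel_key, target_kind)`, and "suitable for learning" is taken to mean "exactly one candidate target", as in the unambiguous case of the text. The sketch also registers every unambiguous dependency, which is broader than the noun-plus-の examples listed in [0016].

```python
from collections import Counter

def candidates_after(seq, i):
    """Heads of later bunsetsu whose kind matches what bunsetsu i relates to."""
    target = seq[i][3]
    return [head for head, kind, _, _ in seq[i + 1:] if kind == target]

def build_dictionaries(sentences):
    parallel_dict, dependency_dict = Counter(), Counter()
    for seq in sentences:                       # S1; loop until last sentence (S10)
        for i, (head, _kind, is_key, target) in enumerate(seq):
            if target is None:
                continue
            cands = candidates_after(seq, i)    # S4 / S7: determine the type
            if len(cands) != 1:                 # S5 / S8: ambiguous, skip
                continue
            if is_key:
                parallel_dict[(head, cands[0])] += 1     # S6
            else:
                dependency_dict[(head, cands[0])] += 1   # S9
    return parallel_dict, dependency_dict

# Pre-analyzed learning sentences from [0019] and [0021].
s1 = [("編集", "nominal", True, "nominal"),
      ("印刷", "nominal", False, "predicate"),
      ("行なう", "predicate", False, None)]
s2 = [("文書", "nominal", False, "nominal"),
      ("編集", "nominal", False, "predicate"),
      ("行なう", "predicate", False, None)]

parallel_dict, dependency_dict = build_dictionaries([s1, s2])
print(dict(parallel_dict))
print(dict(dependency_dict))
```

Counting occurrences rather than storing a plain set is a design choice of this sketch; it keeps the data needed for the frequency-based variant of claims 6 and 7.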

[0030] FIG. 3 illustrates the creation of learning data. FIG. 3(a) is the learning data. FIG. 3(b) is the result of applying morphological analysis to the learning data and decomposing it into a morpheme sequence. FIG. 3(c) is the morpheme sequence assembled into bunsetsu; for each bunsetsu, its type and the type of bunsetsu it can relate to are shown. FIG. 3(c) shows that the bunsetsu 「編集と」 is a nominal, is a parallel key, and has a nominal as its target; that the bunsetsu 「印刷を」 is a nominal and has a predicate as its target; and that the bunsetsu 「行なう。」 is a predicate and has no target.

[0031] From FIG. 3(c), the bunsetsu 「編集と」 is recognized as a parallel key. In this sentence, the only bunsetsu that can be parallel with 「編集と」 is 「印刷を」, so there is no parallel ambiguity and the parallel elements can be identified with certainty. The sentence is therefore judged to be suitable for parallel element extraction. The parallel element learning dictionary of FIG. 3(e) is the accumulation of parallel element data extracted in this way.

[0032] FIG. 3(d) shows the bunsetsu obtained from the learning sentence 「文書の編集を行う」. In this sentence, the only possible target of 「文書の」 is 「編集を」, so there is no dependency ambiguity. The sentence is therefore judged to be suitable for dependency element extraction, and the dependency elements are extracted. The dependency element learning dictionary of FIG. 3(e) is the accumulation of the dependency elements extracted in this way.

[0033] FIG. 4 illustrates the ambiguity of parallel structures. In the present invention, sentences are classified by the type of their parallel structure, and it is judged whether an input sentence is suited to learning. In the following example, it is assumed for illustration that a sentence is suited to learning if it contains a parallel structure and has no parallel ambiguity. The sentences in FIG. 4(a) and (b) both contain the parallel key 「読みだしと」 ("reading and"), so both can be said to have a parallel structure.

[0034] According to FIG. 4(a), the target of 「読みだしと」 (here, its parallel partner) is a nominal, but 「読みだしと」 is followed by two nominals, 「ファイルへの」 ("to the file") and 「書き込み」 ("writing"), and it is ambiguous which of them it relates to. This sentence is therefore judged to have an ambiguous parallel structure. In FIG. 4(b), on the other hand, the only possible target (parallel partner) of 「読み出しと」 is 「書き込み」, so there is no parallel ambiguity. This sentence is therefore judged to have no parallel ambiguity.
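
The classification of [0033] and [0034] can be illustrated with a small function that counts the possible partners of a parallel key. The tuple layout `(surface, kind, is_parallel_key, target_kind)` and the labels returned are assumptions of this sketch. Only the bunsetsu named in the text are used; the full sentences of FIG. 4 are not reproduced here, and the targets of the later bunsetsu, which the text does not give, are left as `None`.

```python
def classify_parallel_type(seq):
    """Return 'no_parallel', 'unambiguous' or 'ambiguous' for a bunsetsu sequence."""
    for i, (_surface, _kind, is_key, target) in enumerate(seq):
        if not is_key:
            continue
        partners = [s for s, kind, _, _ in seq[i + 1:] if kind == target]
        if len(partners) == 1:
            return "unambiguous"
        if len(partners) > 1:
            return "ambiguous"
    return "no_parallel"

# Fragments reconstructed from the description of FIG. 4(a) and FIG. 4(b).
fig4a = [("読みだしと", "nominal", True, "nominal"),
         ("ファイルへの", "nominal", False, None),
         ("書き込み", "nominal", False, None)]
fig4b = [("読み出しと", "nominal", True, "nominal"),
         ("書き込み", "nominal", False, None)]

print(classify_parallel_type(fig4a))
print(classify_parallel_type(fig4b))
```

Under the assumption stated in [0033], only a sentence classified as unambiguous would be passed on to parallel element extraction.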

[0035] FIG. 5 illustrates the ambiguity of dependencies. In the present invention, sentences are classified by dependency type, and it is judged whether an input sentence is suited to learning. In the following example, it is assumed for illustration that a sentence with no dependency ambiguity is suited to learning. In FIG. 5(a), the nominal bunsetsu 「ファイルへ」 ("to the file") depends on a predicate, but it is followed by two predicate bunsetsu, 「書き込まれた」 ("written") and 「読み出す」 ("read out"), and which of them it depends on cannot be determined without a full-scale analysis. This sentence is therefore judged to be a sentence with dependency ambiguity. In FIG. 5(b), the only possible target of 「ファイルへの」 is 「読み出す」, so this sentence is judged to be a sentence with no dependency ambiguity.
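
A comparable check for dependency ambiguity is sketched below. The tuple layout `(surface, kind, target_kind)` and the function names are hypothetical. The ambiguous case uses the three bunsetsu named for FIG. 5(a); because the text does not give the targets of the two predicates, they are left as `None`. The unambiguous case uses the sentence of [0021] rather than FIG. 5(b).

```python
def dependency_candidates(seq):
    """Map each dependent bunsetsu to the later bunsetsu it could depend on."""
    result = {}
    for i, (surface, _kind, target) in enumerate(seq):
        if target is not None:
            result[surface] = [s for s, kind, _ in seq[i + 1:] if kind == target]
    return result

def classify_dependency_type(seq):
    """'ambiguous' if any bunsetsu has more than one candidate target."""
    cands = dependency_candidates(seq)
    return "ambiguous" if any(len(c) > 1 for c in cands.values()) else "unambiguous"

# Fragment reconstructed from the description of FIG. 5(a).
fig5a = [("ファイルへ", "nominal", "predicate"),
         ("書き込まれた", "predicate", None),
         ("読み出す", "predicate", None)]
# Sentence from [0021].
doc = [("文書の", "nominal", "nominal"),
       ("編集を", "nominal", "predicate"),
       ("行なう。", "predicate", None)]

print(dependency_candidates(fig5a))
print(classify_dependency_type(fig5a), classify_dependency_type(doc))
```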

[0036] The ambiguity of dependencies in the parallel structure of a learning sentence is explained with examples.

Example 1 (no ambiguity): 「東京および大阪に行った」 ("went to Tokyo and Osaka")
Example 2 (ambiguous): 「東京および大阪の町に行った」 ("went to Tokyo and the town of Osaka" or "went to the towns of Tokyo and Osaka")

In Example 1, 「東京」 and 「大阪」 are parallel, so it can be recognized uniquely that 「行った」 relates to both 「東京」 and 「大阪」. In Example 2, two readings are possible: one in which 「東京」 and 「大阪」 are parallel, and one in which 「東京」 and 「大阪の町」 are parallel. Accordingly, 「行った」 can be read as relating to 「東京」 and 「大阪」, or as relating to 「東京」 and 「町」. The dependencies in this example sentence can therefore be said to be ambiguous.

[0037] The operations described above are carried out for all learning sentences to accumulate learning data. In the examples above, only unambiguous sentences are used for learning. Alternatively, when there is ambiguity, the sentence can be expanded into all possible combinations, which are then accumulated in the learning dictionary, with the infrequent ones discarded. For Example 1 above, all the possible combinations are 「東京に行った。」 and 「大阪に行った。」; for Example 2, they are 「東京に行った。」, 「東京の町に行った。」 and 「大阪の町に行った。」.
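
The alternative of [0037], expanding every sentence into all possible combinations and keeping only the frequent ones, might look like the sketch below. The expansions are the ones listed in the text; the function name and the threshold `min_count=2` are assumptions, since the specification does not say how "infrequent" is decided.

```python
from collections import Counter

def register_frequent(expanded_sentences, min_count):
    """Count every expanded combination and keep those seen at least min_count times."""
    counts = Counter()
    for combos in expanded_sentences:
        counts.update(combos)
    return {combo: n for combo, n in counts.items() if n >= min_count}

# Expansions of Example 1 and Example 2 as listed in [0037].
expanded = [
    ["東京に行った。", "大阪に行った。"],
    ["東京に行った。", "東京の町に行った。", "大阪の町に行った。"],
]
learned = register_frequent(expanded, min_count=2)
print(learned)
```

With only these two sentences, 「東京に行った。」 is the one combination that reaches the assumed threshold; a real corpus would of course supply many more counts.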

[0038] FIG. 6 shows an example of the parallel phrase analysis device of the present invention. In the figure, 1 denotes a morphological analysis unit, 2 a bunsetsu synthesis unit, 8 a parallel structure recognition unit, 9 a parallel element learning dictionary, and 10 a dependency element learning dictionary.

[0039] The morphological analysis unit 1 decomposes the input sentence to be analyzed into morphemes. The bunsetsu synthesis unit 2 assembles this morpheme sequence into a bunsetsu sequence. The parallel structure recognition unit 8 recognizes the parallel structure from the bunsetsu sequence, referring to the parallel element learning dictionary 9 and the dependency element learning dictionary 10 to do so. These two dictionaries are created before analysis by the learning data creation device (see FIG. 1).

[0040] FIG. 7 shows the flow of processing for parallel phrase analysis using the learning data. In step S1, a sentence is input. In step S2, the input sentence is decomposed into morphemes. In step S3, the morpheme sequence is assembled into bunsetsu. In step S4, parallel phrases are recognized with reference to the learning dictionary created in the learning stage.

[0041] FIG. 8 illustrates the recognition of parallel structures. FIG. 8(a) is the input sentence to be analyzed. FIG. 8(b) is the input sentence decomposed into bunsetsu. FIG. 8(c) shows examples of the parallel combinations that can be formed from the bunsetsu sequence; in the figure, the combinations are indicated with parentheses. For every set of parallel elements and every set of dependency elements obtained from each combination, the learning dictionary is searched. The elements found in the dictionary are listed on the right-hand side of the figure. The third combination from the top agrees best with the contents of the learning dictionary, so it is output as the parallel phrase recognition result. FIG. 8(d) is this output. Each combination may also be scored by a conventional method, and the candidate may then be selected by combining that score with the score from the learning dictionary search.
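
The selection described in [0041] can be sketched as counting dictionary hits per candidate combination and taking the maximum. The contents of FIG. 8 are not reproduced in this text, so the input phrase 「文書の編集と印刷」, its two candidate bracketings, and the simple additive scoring are all hypothetical; the dictionary entries are taken from the examples in [0016].

```python
# Dictionary entries from the examples in [0016].
PARALLEL_DICT = {frozenset(["編集", "印刷"])}
DEPENDENCY_DICT = {("文書", "編集"), ("文書", "印刷")}

def hits(candidate):
    """Number of the candidate's parallel and dependency elements found in the dictionaries."""
    p = sum(frozenset(pair) in PARALLEL_DICT for pair in candidate["parallel"])
    d = sum(dep in DEPENDENCY_DICT for dep in candidate["dependency"])
    return p + d

def best_candidate(candidates, base_scores=None):
    """Pick the candidate with the most hits, optionally adding a conventional score."""
    base_scores = base_scores or [0] * len(candidates)
    return max(zip(candidates, base_scores),
               key=lambda cb: hits(cb[0]) + cb[1])[0]

# Two hypothetical bracketings of 「文書の編集と印刷」.
candidates = [
    {"label": "[文書の編集]と[印刷]",
     "parallel": [("編集", "印刷")],
     "dependency": [("文書", "編集")]},
    {"label": "文書の[編集と印刷]",
     "parallel": [("編集", "印刷")],
     "dependency": [("文書", "編集"), ("文書", "印刷")]},
]
print([hits(c) for c in candidates])
print(best_candidate(candidates)["label"])
```

The optional `base_scores` argument stands in for the conventional scoring mentioned at the end of [0041]; how the two scores should be weighted is not specified, so plain addition is used here.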

[0042] FIG. 9 shows an example of the context-following parallel phrase analysis device of the present invention. In the figure, 1 denotes the morphological analysis unit, 2 the phrase synthesis unit, 3 the parallel key determination unit, 4 the parallel type classification unit, 6 the dependency type classification unit, 8 the parallel structure recognition unit, 9 the parallel element learning dictionary, and 10 the dependency element learning dictionary.

[0043] The context-following parallel phrase analysis device of FIG. 9 performs parallel phrase analysis using a learning dictionary and, at the same time, trains that dictionary. In essence, it combines the configuration for learning with the configuration for recognition.

[0044] In the figure, the components up to and including the parallel key determination unit 3 are the same as in an ordinary parallel phrase analysis device. During parallel structure recognition, the parallel type classification unit 4 and the dependency type classification unit 6 monitor the phrase sequence output by the parallel key determination unit 3 and, when a sentence is suitable for learning, add learning data to the parallel element learning dictionary 9 or the dependency element learning dictionary 10. The parallel structure recognition unit 8 performs recognition while referring to these dynamically changing learning dictionaries 9 and 10. The result of learning is therefore reflected in the parallel structure recognition unit 8 starting with the sentence that follows the one in which learning occurred.
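The interleaving of recognition and learning can be sketched as follows. This sketch makes several assumptions that the specification does not fix: a sentence is treated as "suitable for learning" when it yields exactly one candidate (that is, it is unambiguous), only the parallel element dictionary is modeled, and recognition of a sentence runs before that sentence's learning data is added. The class name `ContextFollowingAnalyzer` and its methods are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]
Candidate = List[Pair]  # the parallel element pairs of one combination


class ContextFollowingAnalyzer:
    """Toy analyzer whose dictionary grows while documents are processed."""

    def __init__(self) -> None:
        self.parallel_dict: Set[Pair] = set()

    def _hits(self, candidate: Candidate) -> int:
        # Number of pairs of the candidate already in the dictionary.
        return sum(pair in self.parallel_dict for pair in candidate)

    def process(self, candidates: List[Candidate]) -> Candidate:
        # Recognition uses the dictionary as it stands before this sentence.
        best = max(candidates, key=self._hits)
        # Assumption: an unambiguous sentence is suitable for learning.
        if len(candidates) == 1:
            self.parallel_dict.update(candidates[0])
        return best
```

In this sketch an ambiguous sentence is resolved differently before and after an unambiguous sentence containing one of its candidate pairs has been seen, which mirrors the statement that learning takes effect from the following sentence onward. The dependency element dictionary could be handled in the same way.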

[0045] In the context-following parallel phrase analysis device of FIG. 9, the learning dictionary contains both learning data that existed before recognition of the current document began and learning data extracted during recognition of the current document. The learning data extracted during recognition of the current document can be given a larger weight than the learning data that existed beforehand.
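One way to realize this weighting is to store a weight with each dictionary entry, as in the sketch below. The specific weights (1.0 for prior data, 2.0 for data from the current document), the rule that an entry keeps the larger of its weights, and the names `WeightedDictionary`, `learn_from_current_document`, and `score` are all assumptions made for illustration; the specification only states that the in-document data can be weighted more heavily.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]

PRIOR_WEIGHT = 1.0    # assumption: weight of data learned before this document
CONTEXT_WEIGHT = 2.0  # assumption: weight of data learned from this document


class WeightedDictionary:
    """Learning dictionary whose entries carry a weight."""

    def __init__(self, prior_entries: Iterable[Pair]) -> None:
        self.weights: Dict[Pair, float] = {e: PRIOR_WEIGHT for e in prior_entries}

    def learn_from_current_document(self, entry: Pair) -> None:
        # An entry seen in the current document gets the larger weight,
        # even if it was already present from prior learning.
        self.weights[entry] = max(self.weights.get(entry, 0.0), CONTEXT_WEIGHT)

    def score(self, pairs: List[Pair]) -> float:
        # Sum of the weights of the pairs found in the dictionary.
        return sum(self.weights.get(p, 0.0) for p in pairs)
```

A candidate whose elements were learned from the current document then outscores a candidate with the same number of hits against prior data only.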

[0046]

[Effects of the Invention] As is apparent from the above description, the present invention makes it possible, when performing parallel phrase recognition, to reliably recognize parallel expressions that are used frequently in the field being analyzed. In addition, the learning data used for this purpose can be created automatically, without manual work.

[Brief description of drawings]

FIG. 1 is a diagram showing an example of the learning data creation device of the present invention.

FIG. 2 is a diagram showing the flow of processing for creating learning data.

FIG. 3 is a diagram illustrating the creation of learning data.

FIG. 4 is a diagram illustrating ambiguity of a parallel structure.

FIG. 5 is a diagram illustrating ambiguity of dependency.

FIG. 6 is a diagram showing an example of the parallel phrase analysis device of the present invention.

FIG. 7 is a diagram showing the flow of processing for performing parallel phrase analysis using learning data.

FIG. 8 is a diagram illustrating recognition of a parallel structure.

FIG. 9 is a diagram showing an example of the context-following parallel phrase analysis device of the present invention.

[Explanation of symbols]

1 Morphological analysis unit
2 Phrase synthesis unit
3 Parallel key determination unit
4 Parallel type classification unit
5 Parallel element extraction unit
6 Dependency classification unit
7 Dependency relationship extraction unit
8 Parallel structure recognition unit
9 Parallel element learning dictionary
10 Dependency element learning dictionary

Claims

1. A parallel phrase analysis device comprising: a morphological analysis unit that decomposes an input sentence into a morpheme sequence; a bunsetsu synthesis unit that synthesizes a bunsetsu sequence from the morpheme sequence; a learning dictionary that stores parallel elements and dependency elements; and a parallel structure recognition unit that recognizes parallel phrases present in the input sentence based on the bunsetsu sequence and the contents of the learning dictionary.

2. The parallel phrase analysis device according to claim 1, wherein the parallel structure recognition unit searches the learning dictionary for every parallel combination that can be formed from the bunsetsu sequence and preferentially outputs the combination with the largest number of search hits.

3. An automatic learning data creation device comprising: a morphological analysis unit that decomposes a learning sentence into morphemes; a bunsetsu synthesis unit that synthesizes a bunsetsu sequence from the morpheme sequence; a parallel key determination unit that finds a parallel key in the bunsetsu sequence; a parallel type classification unit that determines the parallel type of the learning sentence from the bunsetsu sequence and the parallel key; a dependency type classification unit that determines the dependency type of the learning sentence from the bunsetsu sequence; a parallel element extraction unit that determines, based on the parallel type of the learning sentence, whether the learning sentence is useful for learning and, if it is useful, extracts parallel elements from the learning sentence; and a dependency element extraction unit that determines, based on the dependency type of the learning sentence, whether the learning sentence is useful for learning and, if it is useful, extracts dependency elements from the learning sentence.

4. The automatic learning data creation device according to claim 3, wherein the parallel element extraction unit extracts parallel elements only from those output results of the parallel type classification unit for which there is no parallel ambiguity in the parallel structure of the learning sentence.

5. The automatic learning data creation device according to claim 3, wherein the dependency element extraction unit extracts dependency elements only from those output results of the dependency type classification unit for which there is no dependency ambiguity in the parallel structure of the learning sentence.

6. The automatic learning data creation device according to claim 3, wherein, even when an output result of the parallel type classification unit indicates parallel ambiguity in the parallel structure of the learning sentence, the parallel element extraction unit extracts parallel elements for all possible parallel combinations and registers only the parallel elements that occur with high frequency as learning data.

7. The automatic learning data creation device, wherein, even when an output result of the dependency type classification unit indicates ambiguity in the dependency structure of the learning sentence, the dependency element extraction unit extracts dependency elements for all possible parallel combinations and registers only the dependency elements that occur with high frequency as learning data.

8. A context-following parallel phrase analysis device comprising: a morphological analysis unit that decomposes an input sentence into a morpheme sequence; a bunsetsu synthesis unit that synthesizes a bunsetsu sequence from the morpheme sequence; a learning dictionary that stores parallel elements and dependency elements; a parallel key determination unit that finds a parallel key in the bunsetsu sequence; a parallel type classification unit that determines the parallel type of the input sentence from the bunsetsu sequence and the parallel key; a dependency type classification unit that determines the dependency type of the input sentence from the bunsetsu sequence; a unit that, by referring to the parallel type of the input sentence, extracts parallel elements from the input sentence and registers them in the learning dictionary; a dependency element extraction/registration unit that, by referring to the dependency type of the input sentence, extracts dependency elements from the input sentence and registers them in the learning dictionary; and a parallel structure recognition unit that recognizes parallel phrases present in the input sentence based on the bunsetsu sequence and the contents of the learning dictionary.

9. The context-following parallel phrase analysis device according to claim 8, wherein the parallel structure recognition unit gives priority to learning data obtained from the context over learning data obtained by prior learning.