JPS63221475A

JPS63221475A - Analyzing method for syntax

Info

Publication number: JPS63221475A
Application number: JP62055624A
Authority: JP
Inventors: Yuji Sugano; 祐司菅野; Kenji Nagao; 健司長尾; Osamu Iwasaki; 修岩崎; Kenichi Ueda; 謙一上田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1987-03-11
Filing date: 1987-03-11
Publication date: 1988-09-14

Abstract

PURPOSE: To improve the average processing efficiency of syntactic analysis without degrading worst-case efficiency, by first obtaining a part of the syntactic structure of an input sentence using the morphological and semantic information in the input sentence together with the information contained in a group of syntax rules, and then performing the syntactic analysis.

CONSTITUTION: An input sentence 1 is read into a morphological analysis means 3 character by character, starting from the first character. The means 3 searches a part-of-speech dictionary 2 to identify character strings that can be morphemes. A character string is accepted as a morpheme if its part of speech can connect to the morpheme accepted immediately before it. This operation is repeated to obtain a morpheme sequence 4 carrying part-of-speech information together with morphological information, semantic information, and the like. The morpheme sequence 4 is read one morpheme at a time into an incomplete syntax analysis means 6, which analyzes it and produces a partial tree sequence 7 in a buffer area. The sequence 7 is read one partial tree at a time by a syntax analysis means 8, and all possible syntax trees are obtained in the form of an analysis table.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a syntax analysis method for equipment that operates according to the sentence structure and meaning of an input sentence.

Prior Art

The following description takes Japanese as the language of the input sentences. Computer application systems that operate by understanding the structure and meaning of Japanese text, such as machine translation systems and document proofreading systems, need to analyze the syntactic structure of Japanese sentences.

Such sentence analysis methods are studied in the field of computational linguistics. Representative methods developed so far are introduced in books such as "Lectures on Modern Language, Vol. 7: Machine Processing of Language" (edited by Makoto Nagao, published by Sanseido) and "Japanese Information Processing" (supervised by Makoto Nagao, published by the Institute of Electronics and Communication Engineers). Among these, syntax analysis methods based on context-free grammars use syntax rules that are declarative and clear, and the individual rules are independent of one another, so the grammar is easy to develop and maintain. Such methods also allow the research results of transformational generative grammar, one of the theories of linguistics, to be described in a straightforward way.

Furthermore, efficient methods for analyzing sentences on the basis of context-free grammars are known in the fields of formal language theory and the theory of computation, and several analysis systems that make use of them have been built. One example is described in "Extended LINGOL: A Programming System for Natural Language Processing" (Hozumi Tanaka et al., Transactions of the Institute of Electrical Communication Engineers, 1977, No. 12, pp. 1601-1608).

However, because such systems use analysis methods that can handle syntax rules written as general, unrestricted context-free grammars, they pay for this generality with overhead in space and time. With Earley's analysis method, for example, an input sentence of length n requires storage on the order of n² and computation time on the order of n³. By contrast, when the syntax rules are confined to a subset of the context-free grammars, faster methods can be used, such as those used in translation programs (compilers) for programming languages like FORTRAN. For example, when the syntax rules belong to the subset of context-free grammars called LL(k) grammars, more efficient analysis is possible with a method called recursive descent. Details are given in Descent Compiling (A. J. T. Davie et al., Ellis Horwood, 1981). However, the grammars of many natural languages are ambiguous and fall outside the bounds of such restricted context-free grammars, so these methods cannot, as they stand, analyze natural language completely.
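
As an illustration of the general-purpose analysis discussed above, the following is a minimal sketch of an Earley-style recognizer. It is not taken from the patent: the function name `earley_recognize`, the `Item` tuple, and the toy grammar are assumptions introduced for the example, and the sketch only answers whether a token sequence is derivable (it builds no trees and does not support empty rules). It does accept an ambiguous grammar, which is the kind of grammar that restricted methods such as recursive descent cannot handle directly.

```python
from collections import namedtuple

# An Earley item: the rule "lhs -> rhs", a dot position inside rhs, and the
# input position (origin) at which recognition of this rule started.
Item = namedtuple("Item", "lhs rhs dot origin")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    Empty right-hand sides are not supported in this sketch.
    """
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]

    def add(pos, item):
        if item not in chart[pos]:
            chart[pos].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for pos in range(n + 1):
        i = 0
        while i < len(chart[pos]):  # the list may grow while it is scanned
            item = chart[pos][i]
            i += 1
            if item.dot < len(item.rhs):
                sym = item.rhs[item.dot]
                if sym in grammar:  # predictor: expand a nonterminal
                    for rhs in grammar[sym]:
                        add(pos, Item(sym, rhs, 0, pos))
                elif pos < n and tokens[pos] == sym:  # scanner: match a token
                    add(pos + 1, item._replace(dot=item.dot + 1))
            else:  # completer: advance every item that was waiting for lhs
                for parent in chart[item.origin]:
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        add(pos, parent._replace(dot=parent.dot + 1))

    return any(it.lhs == start and it.dot == len(it.rhs) and it.origin == 0
               for it in chart[n])


# Hypothetical ambiguous toy grammar: NP -> NP NP | noun
GRAMMAR = {"NP": [("NP", "NP"), ("noun",)]}
print(earley_recognize(GRAMMAR, "NP", ["noun", "noun", "noun"]))  # True
```

The chart holds one item list per input position, so the amount of stored state grows with the length of the input. This is the space and time overhead that the text attributes to general context-free analysis.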

Problems to Be Solved by the Invention

With the methods described above, however, as long as the syntactic structure of the input sentence is to be obtained completely in the form of a syntax tree, the analysis method must be able to cope with the most complex construct that the syntax rules can describe, which incurs considerable overhead in space and time. Alternatively, in approaches that, before applying a general analysis method, first analyze the input with a restricted set of syntax rules separate from those used by the general method, the syntax rules are split into two groups. This makes it difficult to keep the rule system consistent as a whole, and the special characteristics of individual input sentences cannot be exploited.

The present invention solves these problems. Before an input sentence is analyzed by a syntax analysis means that works according to general syntax rules, a second syntax analysis means, which has less analytical power than the first but is fast and efficient, performs as much analysis as it can, taking into account both the special characteristics of the individual input sentence and the features of the given group of syntax rules. The invention thereby provides a syntax analysis method that improves average processing efficiency without degrading worst-case efficiency, under a single, easy-to-handle group of syntax rules.

Means for Solving the Problems

The present invention achieves the above object by means of a syntax analysis means that analyzes the syntactic structure of an input sentence according to syntax rules, and an incomplete syntax analysis means that determines a part of the syntactic structure of the input sentence before the input sentence is input to the syntax analysis means.

Operation

In the above configuration, the morphological, syntactic, semantic, and other information contained in the input sentence is input to the incomplete syntax analysis means. Of all the syntax rules, only those that the incomplete syntax analysis means judges to be applicable are applied to the input sentence, giving a fast but partial and incomplete syntactic analysis. The result is input to a general syntax analysis means capable of handling all the syntax rules, where the syntactic analysis is completed.

Embodiment

An embodiment of the present invention is described below with reference to the drawing, taking as an example a syntax analysis method that analyzes Japanese sentences written in mixed kanji and kana without spaces between words, on the basis of a context-free grammar, and outputs syntax trees. The figure is a block diagram showing the overall configuration for carrying out the syntax analysis method according to the present invention. In the figure, 1 is the input sentence to be analyzed; 2 is a part-of-speech dictionary storing morphemes and their parts of speech; 3 is a morphological analysis means that refers to the part-of-speech dictionary 2 and converts the input sentence 1 into a morpheme sequence 4, a sequence of morphemes that can be adjacent to one another, with part-of-speech information attached; 5 is a group of syntax rules written in context-free grammar form, containing rules such as "unit sentence → case element · unit sentence", "unit sentence → predicate", and "case element → noun phrase · case particle"; 6 is a fast incomplete syntax analysis means based on the LR(1) method, which partially parses the input sentence 1 using the part-of-speech information of the morpheme sequence 4 and the syntax rule group 5; 7 is the partial tree sequence output by the incomplete syntax analysis means 6; and 8 is a syntax analysis means that analyzes the partial tree sequence 7 by Earley's method and obtains the syntax tree 9.
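
The role of the part-of-speech dictionary 2 and the morphological analysis means 3 can be sketched as follows. This is an illustration only, not the patent's implementation: the dictionary entries, the connection table `CONNECTABLE`, the `BOS` (beginning of sentence) marker, and the choice to try longer strings first with backtracking are all assumptions made for the example. What it keeps from the description is the basic check that a dictionary string is accepted as a morpheme only if its part of speech can follow that of the previously accepted morpheme.

```python
# Hypothetical part-of-speech dictionary: surface string -> possible parts of speech.
DICTIONARY = {
    "太郎": ["noun"],
    "本": ["counter", "noun"],
    "が": ["case_particle"],
    "を": ["case_particle"],
    "読む": ["verb"],
}

# Hypothetical connection table: (previous part of speech, next part of speech).
CONNECTABLE = {
    ("BOS", "noun"),
    ("noun", "case_particle"),
    ("case_particle", "noun"),
    ("case_particle", "verb"),
}


def segment(text, dictionary, connectable, prev_pos="BOS"):
    """Split `text` into (surface, part_of_speech) pairs, or return None.

    Candidates are taken from the head of the text. A candidate is accepted
    only if its part of speech may follow `prev_pos`.
    """
    if not text:
        return []
    # Try longer candidate strings first (an assumption of this sketch).
    for end in range(len(text), 0, -1):
        surface = text[:end]
        for pos in dictionary.get(surface, ()):
            if (prev_pos, pos) not in connectable:
                continue  # this part of speech cannot follow the previous one
            rest = segment(text[end:], dictionary, connectable, pos)
            if rest is not None:
                return [(surface, pos)] + rest
    return None  # no segmentation satisfies the connection constraints


morphemes = segment("太郎が本を読む", DICTIONARY, CONNECTABLE)
for surface, pos in morphemes:
    print(surface, pos)
```

In this toy table the reading of "本" as a counter is rejected after a case particle, so the noun reading is chosen. The resulting list of (surface, part of speech) pairs plays the role of the morpheme sequence 4.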

The input sentence 1 is read into the morphological analysis means 3 character by character from the beginning of the sentence and processed. The morphological analysis means 3 searches the part-of-speech dictionary 2 and identifies character strings that can be morphemes; if the part of speech of such a string can connect to the morpheme accepted immediately before it, the string is accepted as a morpheme. Repeating this operation finally yields the morpheme sequence 4, which carries morphological information, semantic information, and the like with part-of-speech information attached.

The morpheme sequence 4 is read into the incomplete syntax analysis means 6 one morpheme at a time, starting from the first morpheme of the sentence, and processed. The incomplete syntax analysis means 6 consists of a buffer area that holds the intermediate and final results of the analysis and a finite-state automaton constructed from the syntax rule group 5. It decides which action to take from the contents of the buffer area, the morpheme just read, and the current state of the finite-state automaton. However, when several actions are possible and the choice is not unique, that is, when an action beyond the scope of an LR(1) grammar would be required, it applies no syntax rule and simply performs a shift (appending the morpheme just read to the buffer area).

In this way the incomplete syntax analysis means 6 works through the morpheme sequence 4, and a partial tree sequence 7 is finally obtained in the buffer area. Even for the same syntax rule group 5, the partial tree sequence 7 can vary with the content of the input sentence 1, from something identical to the morpheme sequence 4 (no analysis performed at all) to a complete syntax tree (all analysis performed). In general it is a sequence of partial trees (parts of the syntax tree finally obtained), each formed by grouping several morphemes bottom-up, and the number of elements in the sequence is at least 1 and at most the number of morphemes in the morpheme sequence 4. At this incomplete analysis stage it is also possible to extract various kinds of information useful for later syntactic and semantic analysis.

The partial tree sequence 7 produced in this way is read by the syntax analysis means 8 one partial tree at a time, and all possible syntax trees are obtained in the form of an analysis table.
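
The shift-only fallback described above can be sketched as follows. This is a deliberately simplified stand-in, not an LR(1) parser: it has no automaton and no lookahead, and it only detects the case where more than one rule matches the top of the buffer. The function name `partial_parse`, the tuple representation of trees, and the two extra ambiguous rules in `AMBIGUOUS_RULES` are assumptions introduced for illustration; the first three rules follow the examples given for the syntax rule group 5.

```python
# Rules from the embodiment's example, written as (left-hand side, right-hand side).
RULES = [
    ("unit_sentence", ("case_element", "unit_sentence")),
    ("unit_sentence", ("predicate",)),
    ("case_element", ("noun_phrase", "case_particle")),
]

# Hypothetical extra rules that make a bare noun ambiguous.
AMBIGUOUS_RULES = RULES + [
    ("noun_phrase", ("noun",)),
    ("predicate", ("noun",)),
]


def partial_parse(symbols, rules):
    """Return the sequence of partial trees left in the buffer.

    A tree is a tuple (label, child, child, ...); a leaf is (label,).
    A rule is applied only when it is the single rule whose right-hand side
    matches the top of the buffer. Otherwise nothing is reduced and the next
    symbol is simply shifted.
    """
    buffer = []
    for sym in symbols:
        buffer.append((sym,))  # shift: append the symbol just read
        while True:
            matches = [
                (lhs, rhs) for lhs, rhs in rules
                if tuple(node[0] for node in buffer[-len(rhs):]) == rhs
            ]
            if len(matches) != 1:
                break  # no applicable rule, or the choice is not unique
            lhs, rhs = matches[0]
            children = buffer[-len(rhs):]
            del buffer[-len(rhs):]
            buffer.append((lhs, *children))  # reduce: group bottom-up
    return buffer


full = partial_parse(
    ["noun_phrase", "case_particle", "noun_phrase", "case_particle", "predicate"],
    RULES,
)
print([tree[0] for tree in full])

mixed = partial_parse(["noun_phrase", "case_particle", "noun"], AMBIGUOUS_RULES)
print([tree[0] for tree in mixed])
```

With the unambiguous rules the buffer ends as a single complete tree. With the ambiguous rules the bare noun is left unreduced, so the buffer holds a mixture of partial trees and raw symbols, and a general analyzer would still be needed to finish the job. The number of buffer elements stays between 1 and the number of input symbols, as the description states.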

In the above embodiment, the computational cost of the LR(1) method is on the order of n in both time and space, where n is the number of morphemes in the morpheme sequence 4, which is less than that of Earley's analysis method described earlier. Moreover, even in the worst case, the computational cost is equivalent to that without the incomplete syntax analysis means 6. The incomplete syntax analysis means 6 can also extract auxiliary information for the syntax analysis means 8 and pass it on, in which case the processing efficiency of the syntax analysis means 8 itself can be improved.
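
The cost argument can be written out as a rough sketch. It assumes the standard cubic worst-case time bound for Earley's method and assumes that the cost of the general analysis depends on the number of partial trees it receives. The symbols k (the number of partial trees in the sequence 7) and T_total are introduced here for illustration and do not appear in the original text.

```latex
\[
  T_{\text{total}}(n) \;=\; \underbrace{O(n)}_{\text{incomplete analysis}}
  \;+\; \underbrace{O(k^{3})}_{\text{general analysis}},
  \qquad 1 \le k \le n .
\]
\[
  \text{Worst case } (k = n):\quad O(n) + O(n^{3}) = O(n^{3}),
  \qquad
  \text{best case } (k = 1):\quad O(n) + O(1) = O(n).
\]
```

Under these assumptions the linear pre-pass is absorbed by the cubic term in the worst case, which matches the statement that worst-case cost is equivalent to that without the incomplete syntax analysis means 6, while any reduction of k below n lowers the cost of the general analysis.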

Although the above description deals with Japanese sentences, the invention can also be applied to sentences in various other languages, such as English and German.

Effects of the Invention

As described above, compared with conventional syntax analysis methods, the present invention keeps the group of syntax rules single while automatically determining, for each individual input sentence, which syntax rules can be applied efficiently, and applying them. Average processing efficiency can thus be improved without degrading worst-case efficiency, and the benefit is substantial.

Brief Description of the Drawing

The figure is a block diagram showing the overall configuration of one embodiment of the syntax analysis method according to the present invention.

1: input sentence; 2: part-of-speech dictionary; 3: morphological analysis means; 4: morpheme sequence; 5: syntax rule group; 6: incomplete syntax analysis means; 7: partial tree sequence; 8: syntax analysis means; 9: syntax tree.

Claims

(1) A syntactic analysis method characterized in that, when the syntactic structure of each input sentence is analyzed according to a group of syntactic rules that define the syntactic structure of a language, a part of the syntactic structure of the input sentence is first determined using morphological information and semantic information in the input sentence and information in the group of syntactic rules, and syntactic analysis is then performed.

(2) The syntactic analysis method according to claim 1, wherein the group of syntactic rules is written in either a context-free grammar or an extended context-free grammar.

(3) The syntactic analysis method according to claim 1, wherein the input sentence is a sequence of Japanese morphemes to which morphological information, semantic information, etc. are added.