JPS63221475A - Analyzing method for syntax - Google Patents
- Publication number
- JPS63221475A
- Authority
- JP
- Japan
- Prior art keywords
- syntax
- morpheme
- syntactic
- information
- input sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
Description
DETAILED DESCRIPTION OF THE INVENTION
Field of Industrial Application
The present invention relates to a syntax analysis method for devices that operate according to the sentence structure and meaning of an input sentence.
Prior Art
The following description takes Japanese sentences as the sentences to be analyzed. Computer application systems that operate by understanding the structure and meaning of Japanese text, such as machine translation systems and document proofreading systems, need to analyze the syntactic structure of Japanese sentences.
Methods for this kind of sentence analysis have been studied in the field of computational linguistics. Representative methods developed to date are introduced in books such as "Lectures on Modern Language, Vol. 7: Machine Processing of Language" (edited by Makoto Nagao, published by Sanseido) and "Japanese Information Processing" (supervised by Makoto Nagao, published by the Institute of Electronics and Communication Engineers). Among these, parsing methods based on context-free grammars have syntactic rules that are declarative and explicit, and because the individual rules are independent of one another, the grammar is easy to develop and maintain. They also allow research results from transformational generative grammar, one of the theories of linguistics, to be described in a straightforward way.
Furthermore, in formal language theory and the theory of computation, efficient methods for parsing sentences on the basis of context-free grammars are known, and several analysis systems that make use of them have been built. One example is "Extended LINGOL, a programming system for natural language processing" (Hozumi Tanaka et al., Transactions of the Institute of Electrical Communication Engineers, 1977, No. 12, pp. 1601-1608). However, such systems use parsing methods that can handle syntactic rules written as general, unrestricted context-free grammars, and they pay for this generality with overhead in space and time. With Earley's parsing method, for example, an input sentence of length n requires storage on the order of n^2 and computation time on the order of n^3.
In contrast, when the syntactic rules are restricted to a subset of the context-free grammars, faster methods can be used, such as those found in translation programs (compilers) for programming languages like FORTRAN. For example, when the syntactic rules belong to the subset of context-free grammars called LL(k) grammars, the analysis can be performed more efficiently with the method called recursive descent. Details are given in Recursive Descent Compiling (A. J. T. Davie et al., Ellis Horwood, 1981). However, the grammars of many natural languages are ambiguous and fall outside such restricted classes of context-free grammar, so these methods cannot, as they stand, analyze natural language completely.
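As a rough illustration of the recursive descent technique mentioned above, the following Python sketch parses a toy LL(1) grammar. The grammar, the token names (`noun`, `particle`, `pred`), and the function names are assumptions made for this example only; they are not taken from the cited works.

```python
# Toy LL(1) grammar (hypothetical, for illustration only):
#   S    -> CASE S | "pred"
#   CASE -> "noun" "particle"
# One function per nonterminal; a single token of lookahead picks the rule.

def parse_s(tokens, pos):
    """Parse an S starting at tokens[pos]; return (tree, next position)."""
    if pos < len(tokens) and tokens[pos] == "noun":
        case, pos = parse_case(tokens, pos)
        rest, pos = parse_s(tokens, pos)
        return ("S", case, rest), pos
    if pos < len(tokens) and tokens[pos] == "pred":
        return ("S", "pred"), pos + 1
    raise SyntaxError(f"unexpected input at position {pos}")


def parse_case(tokens, pos):
    """Parse a CASE (noun followed by particle) starting at tokens[pos]."""
    if tokens[pos:pos + 2] == ["noun", "particle"]:
        return ("CASE", "noun", "particle"), pos + 2
    raise SyntaxError(f"expected noun + particle at position {pos}")


def parse(tokens):
    """Parse a whole token sequence and reject trailing input."""
    tokens = list(tokens)
    tree, pos = parse_s(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at position {pos}")
    return tree
```

Each token is inspected once and no alternatives are stored, which is why this style is fast; the price is that it only works when one token of lookahead is enough to choose a rule.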
Problems to Be Solved by the Invention
With the methods described above, however, as long as the syntactic structure of the input sentence is to be obtained completely in the form of a syntax tree, an analysis method must be used that can cope with even the most complex structure expressible as syntactic rules, and this incurs considerable overhead in space and time. Alternatively, some approaches first analyze the input with a restricted set of syntactic rules, separate from the rules used by the general method, before the general method is applied. In that case the syntactic rules are split into two groups, so it is difficult to keep the rule system consistent as a whole, and the particular characteristics of each individual input sentence cannot be exploited.
The present invention solves these problems. Before an input sentence is analyzed by a syntax analysis means that operates according to general syntactic rules, a second syntax analysis means, which has less analytical power than the first but is fast and efficient, performs as much of the analysis as it can, taking into account both the particular characteristics of the individual input sentence and the features of the given syntactic rule group. The invention thereby provides a syntax analysis method that, under a single and easily managed syntactic rule group, improves average processing efficiency without degrading worst-case efficiency.
Means for Solving the Problems
The present invention achieves the above object by means of a syntax analysis means that analyzes the syntactic structure of an input sentence according to syntactic rules, and an incomplete syntax analysis means that determines a part of the syntactic structure of the input sentence before the input sentence is passed to the syntax analysis means.
Operation
In the above configuration, the morphological, syntactic, semantic, and other information contained in the input sentence is input to the incomplete syntax analysis means. Of all the syntactic rules, only those that the incomplete syntax analysis means judges to be applicable are applied to the input sentence, giving a fast but partial and incomplete syntactic analysis. The result is passed to a general syntax analysis means capable of handling all the syntactic rules, which completes the analysis.
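The second stage therefore has to accept input in which some stretches have already been grouped into larger units. A minimal sketch of one way to do this, assuming an Earley-style recognizer (the general method named in the embodiment) and rules stored as (left-hand side, right-hand side) tuples. The function name `earley_recognize`, the rule set, and the tag names are hypothetical. The only departure from a textbook recognizer is that the scan step compares grammar symbols directly, so an input element may be a nonterminal produced by the first stage. This sketch only answers yes or no, builds no trees, and does not support rules with an empty right-hand side.

```python
def earley_recognize(rules, start, symbols):
    """Return True if the symbol sequence can be derived from `start`.

    rules   : list of (lhs, rhs) with rhs a non-empty tuple of symbols
    symbols : input sequence; entries may be terminals or nonterminals
    """
    n = len(symbols)
    # chart[i] holds items (lhs, rhs, dot, origin) valid after i symbols
    chart = [set() for _ in range(n + 1)]
    for lhs, rhs in rules:
        if lhs == start:
            chart[0].add((lhs, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                # Predict: expand the symbol after the dot
                for lhs2, rhs2 in rules:
                    if lhs2 == nxt:
                        item = (lhs2, rhs2, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                # Scan: the next input symbol is exactly the expected symbol
                if i < n and symbols[i] == nxt:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item that was waiting for `lhs`
                for lhs2, rhs2, dot2, origin2 in list(chart[origin]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        item = (lhs2, rhs2, dot2 + 1, origin2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[n])


# Hypothetical rule set used only to exercise the sketch
EXAMPLE_RULES = [
    ("S", ("CASE", "S")),
    ("S", ("PRED",)),
    ("CASE", ("N", "K")),
]
```

With `EXAMPLE_RULES`, the recognizer accepts both the raw tag sequence `["N", "K", "PRED"]` and the partly reduced sequence `["CASE", "S"]`, so the amount of work left for it depends on how much the first stage managed to group.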
Embodiment
An embodiment of the present invention is described below with reference to the drawing, taking as an example a syntax analysis method that analyzes Japanese sentences written in mixed kanji and kana without spaces between words, on the basis of a context-free grammar, and outputs a syntax tree. The figure is a block diagram showing the overall configuration for carrying out the syntax analysis method of the present invention. In the figure, 1 is the input sentence to be analyzed; 2 is a part-of-speech dictionary that stores morphemes and their parts of speech; 3 is a morphological analysis means that refers to the part-of-speech dictionary 2 and converts the input sentence 1 into a morpheme sequence 4, a sequence of morphemes that can be adjacent to one another, each with part-of-speech information attached; and 5 is a syntactic rule group that stores syntactic rules written in context-free grammar form, for example:
"unit sentence → case element · unit sentence"
"unit sentence → predicate"
"case element → noun phrase · case particle"
Further, 6 is a fast incomplete syntax analysis means, based on the LR(1) method, that partially parses the input sentence 1 using the part-of-speech information of the morpheme sequence 4 and the syntactic rule group 5; 7 is the partial tree sequence output by the incomplete syntax analysis means 6; and 8 is a syntax analysis means that analyzes the partial tree sequence 7 by Earley's method and obtains the syntax tree 9.
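To make the rule format concrete, the three example rules can be written down as plain data. This is a minimal sketch: the `Rule` type, the helper names, and the English symbol names are assumptions for illustration, and the right-hand side of the second rule is read here as a predicate.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    """One context-free rule: lhs -> rhs[0] rhs[1] ..."""
    lhs: str
    rhs: Tuple[str, ...]


# The three example rules from the description (symbol names are assumed)
RULES = [
    Rule("unit_sentence", ("case_element", "unit_sentence")),
    Rule("unit_sentence", ("predicate",)),
    Rule("case_element", ("noun_phrase", "case_particle")),
]


def nonterminals(rules):
    """Symbols that appear on the left-hand side of some rule."""
    return {rule.lhs for rule in rules}


def undefined_symbols(rules):
    """Right-hand-side symbols that no rule in this group expands."""
    used = {symbol for rule in rules for symbol in rule.rhs}
    return used - nonterminals(rules)
```

Keeping every rule in one list mirrors the single rule group 5 that both analysis stages share. In a fuller grammar, symbols such as `noun_phrase` would have rules of their own; here they are simply left unexpanded.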
The input sentence 1 is read into the morphological analysis means 3 character by character from the beginning of the sentence and processed. The morphological analysis means 3 searches the part-of-speech dictionary 2 to find character strings that can be morphemes. If the part of speech of such a string can connect to the morpheme identified immediately before it, the string is accepted as a morpheme. Repeating this operation finally yields the morpheme sequence 4, which carries part-of-speech information together with morphological information, semantic information, and the like.
The morpheme sequence 4 is read into the incomplete syntax analysis means 6 one morpheme at a time from the beginning of the sentence and processed. The incomplete syntax analysis means 6 consists of a buffer area, which holds the intermediate and final results of the analysis, and a finite-state automaton built from the syntactic rule group 5. It decides which action to take from the contents of the buffer area, the morpheme just read, and the current state of the finite-state automaton. However, when several actions are possible and the choice is not uniquely determined, that is, when an action beyond the scope of an LR(1) grammar would be required, it applies no syntactic rule and simply performs a shift (appending the morpheme just read to the buffer area).
In this way the incomplete syntax analysis means 6 works through the morpheme sequence 4, and the partial tree sequence 7 is finally obtained in the buffer area. Even with the same syntactic rule group 5, the partial tree sequence 7 can range, depending on the content of the input sentence 1, from being identical to the morpheme sequence 4 (no analysis performed at all) to being a complete syntax tree (the entire analysis performed). In general it is a sequence of partial trees, each built bottom-up from several morphemes and each a part of the syntax tree finally obtained. The number of elements in the sequence is at least 1 and at most the number of morphemes in the morpheme sequence 4. At this incomplete analysis stage it is also possible to extract various kinds of information useful for the later syntactic and semantic analysis. The partial tree sequence 7 produced in this way is read by the syntax analysis means 8 one partial tree at a time, and all possible syntax trees are obtained in the form of a parse table.
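The shift-on-conflict behavior of the incomplete stage can be sketched as follows. This is a deliberately simplified stand-in, not an LR(1) automaton: it uses no lookahead and no state table, and instead matches rule right-hand sides against the top of the buffer, treating "more than one rule matches" as the conflict case. The function names, tag names, and rule sets are assumptions for illustration, and rule sets with cyclic single-symbol rules are not supported.

```python
def symbol(node):
    """Grammar symbol of a buffer entry: a bare tag or the label of a subtree."""
    return node if isinstance(node, str) else node[0]


def applicable(rules, buffer):
    """Rules whose right-hand side matches the top of the buffer."""
    tags = [symbol(node) for node in buffer]
    return [(lhs, rhs) for lhs, rhs in rules
            if len(rhs) <= len(tags) and tuple(tags[-len(rhs):]) == tuple(rhs)]


def incomplete_parse(rules, tags):
    """Return a partial tree sequence for a sequence of part-of-speech tags."""
    buffer = []
    for tag in tags:
        buffer.append(tag)  # shift: always possible
        while True:
            candidates = applicable(rules, buffer)
            if len(candidates) != 1:
                # No rule applies, or the choice is not unique:
                # apply nothing and go on to the next shift.
                break
            lhs, rhs = candidates[0]
            children = buffer[-len(rhs):]
            del buffer[-len(rhs):]
            buffer.append((lhs, *children))  # reduce to one subtree
    return buffer


# Hypothetical rule sets used only to exercise the sketch
BASE_RULES = [
    ("S", ("CASE", "S")),
    ("S", ("PRED",)),
    ("CASE", ("N", "K")),
]
AMBIGUOUS_RULES = BASE_RULES + [("ADV", ("N", "K"))]
```

With `BASE_RULES`, the tags `N K N K PRED` are reduced all the way to a single tree. With `AMBIGUOUS_RULES`, the pair `N K` matches two rules, so for `N K PRED` the result is the three-element sequence `N`, `K`, and a subtree for `PRED`, and the undecided part is left for the general parser. In both cases the output has between 1 and len(tags) elements, as the description states.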
In the above embodiment, the computational cost of the LR(1) method is of order n in both time and space, where n is the number of morphemes in the morpheme sequence 4, which is less than that of Earley's parsing method described earlier. Moreover, even in the worst case, the computational cost is equivalent to that without the incomplete syntax analysis means 6. The incomplete syntax analysis means 6 can also extract auxiliary information for the syntax analysis means 8 and pass it on, in which case the processing efficiency of the syntax analysis means 8 itself can be improved.
Although the above description uses Japanese sentences, the invention can also be applied to sentences in various other languages, such as English and German.
Effects of the Invention
As described above, according to the present invention, and in contrast to conventional syntax analysis methods, the syntactic rules that can be applied efficiently are identified and applied automatically for each individual input sentence while the syntactic rule group remains a single group. Average processing efficiency can thus be improved without degrading worst-case efficiency, which is a significant benefit.
The figure is a block diagram showing the overall configuration of an embodiment of the syntax analysis method according to the present invention.
1: input sentence, 2: part-of-speech dictionary, 3: morphological analysis means, 4: morpheme sequence, 5: syntactic rule group, 6: incomplete syntax analysis means, 7: partial tree sequence, 8: syntax analysis means, 9: syntax tree.
Claims (3)
(1) A syntax analysis method characterized in that, when the syntactic structure of each input sentence is analyzed according to a syntactic rule group that defines the syntactic structure of a language, a part of the syntactic structure of the input sentence is first determined using the morphological information and semantic information in the input sentence together with the information in the syntactic rule group, and syntactic analysis is then performed.
(2) The syntax analysis method according to claim 1, wherein the syntactic rule group is written in the form of either a context-free grammar or an extended context-free grammar.
(3) The syntax analysis method according to claim 1, wherein the input sentence is a sequence of Japanese morphemes to which morphological information, semantic information, and the like have been added.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP62055624A JPS63221475A (en) | 1987-03-11 | 1987-03-11 | Analyzing method for syntax |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP62055624A JPS63221475A (en) | 1987-03-11 | 1987-03-11 | Analyzing method for syntax |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS63221475A (en) | 1988-09-14 |
Family
ID=13003935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP62055624A Pending JPS63221475A (en) | 1987-03-11 | 1987-03-11 | Analyzing method for syntax |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS63221475A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5289375A (en) * | 1990-01-22 | 1994-02-22 | Sharp Kabushiki Kaisha | Translation machine |
US5329446A (en) * | 1990-01-19 | 1994-07-12 | Sharp Kabushiki Kaisha | Translation machine |
JP2013025699A (en) * | 2011-07-25 | 2013-02-04 | Nec Corp | Syntactic analysis information creation device, translation device, translation system, syntactic analysis information creation method, and computer program |
US8838440B2 (en) | 2010-09-14 | 2014-09-16 | International Business Machines Corporation | Generating parser combination by combining language processing parsers |
-
1987
- 1987-03-11 JP JP62055624A patent/JPS63221475A/en active Pending