JPH02254565A

JPH02254565A - Syntax analysis system

Info

Publication number: JPH02254565A
Application number: JP1077341A
Authority: JP
Inventors: Norikazu Ito; 則和伊藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-03-29
Filing date: 1989-03-29
Publication date: 1990-10-15

Abstract

PURPOSE: To improve parsing efficiency by restricting the parts of speech of each word to those that form the combination with the highest connection probability, so that the ambiguity of words having several parts of speech is resolved before syntax analysis is performed. CONSTITUTION: In the translation main body part (translation part) 7, a morpheme analyzing part 7a looks up the input text in a dictionary, and a syntax analyzing part 7b obtains information on the individual words, parses them in accordance with the grammatical rules, and generates a tree structure from the analysis results. A converting part 7c transforms the tree structure of the input language into that of the output language, and a generating part 7d translates each node of the resulting tree structure. The parts of speech of the words are restricted to the combination that maximizes the product of the connection probabilities between parts of speech, and syntax analysis is performed after the ambiguity of words having several parts of speech has been resolved. Parsing efficiency is thereby improved.

Description

[Detailed Description of the Invention]

Technical field: The present invention relates to a syntax analysis method for natural language processing.

Prior art: Prior art related to the present invention includes Japanese Patent Application Laid-Open No. 61-40672 and Japanese Patent Application Laid-Open No. 1-74069. The invention described in Japanese Patent Application Laid-Open No. 61-40672 relates to a multi-part-of-speech disambiguation method: disambiguation rules for deciding the part of speech are prepared, and a single part of speech is chosen by taking the frequency of occurrence of each part of speech into account. The invention described in Japanese Patent Application Laid-Open No. 61-40672 relates to an interactive translation method in which a human specifies the part of speech of words in the input sentence and the translation is carried out on that basis.

In natural language parsing, handling words with multiple parts of speech is a very troublesome problem. When a single word has several readings, many different analyses can be derived, and every one except the correct analysis is a misanalysis.

Resolving the ambiguity of multi-part-of-speech words matters for syntax analysis in two ways. One is improved analysis accuracy: disambiguation greatly reduces misanalyses. The other is improved analysis efficiency.

Because the other parts of speech are excluded, almost no analysis is carried out other than the one that leads to the correct answer. For example, suppose that parsing a sentence containing several multi-part-of-speech words as it stands yields eight candidate solutions. Disambiguation sharply reduces the number of candidates, so the parsing rules are applied fewer times and in fewer combinations, and the load on syntax analysis is significantly reduced.

Purpose: The present invention has been made in view of the circumstances described above. Its object is to provide a syntax analysis method that newly introduces connection probabilities and uses word priorities alongside them, thereby improving parsing efficiency, achieving more accurate syntax analysis, and also handling idioms.

Constitution: To achieve the above object, the present invention is characterized by one of the following.

(1) In a natural language analysis system such as machine translation, after the morphological analysis section has looked up the text to be analyzed in a dictionary, the parts of speech of each word are restricted to those that form the combination with the highest connection probability, found by calculating the product of the connection probabilities between them, and syntax analysis is performed after the ambiguity of words having multiple parts of speech has been resolved.

(2) In a natural language analysis system such as machine translation, after the morphological analysis section has looked up the text to be analyzed in a dictionary, when the parts of speech of each word are restricted to those that form the combination with the highest connection probability by calculating the product of the connection probabilities between them, the priority of each part of speech is also included as a factor in the product, and syntax analysis is performed after the ambiguity of multi-part-of-speech words has been resolved.

(3) In a natural language analysis system such as machine translation, after the morphological analysis section has looked up the text to be analyzed in a dictionary, when the parts of speech of each word are restricted to those that form the combination with the highest connection probability by calculating the product of the connection probabilities between them, and an idiom is present among the words looked up in the dictionary, parallel calculation and exclusive calculation are performed according to the priority of the idiom, and syntax analysis is performed after the ambiguity of multi-part-of-speech words has been resolved.

The invention is described below on the basis of an embodiment.

Fig. 1 is a block diagram illustrating an embodiment of a translation apparatus that uses the syntax analysis method of the present invention. In the figure, 1 is a CRT, 2 a keyboard, 3 an OCR, 4 an input document, 5 a spell check section, 6 a pre-editing section, 7 a translation main section, 8 a post-editing section, 9 a dictionary, 10 grammar rules, 11 an output document, and 12 a printer. An input sentence obtained by file input, keyboard input, or OCR input can be preprocessed using the spell check section 5 and the pre-editing section 6.

The output sentence produced by the translation main section 7 can be edited in the post-editing section 8 using the translation information. The input and output sentences can be printed on the printer 12.

Fig. 2 shows the processing flow of the translation main section 7. The translation main section (translation section) 7 broadly consists of four processes: morphological analysis, syntax analysis, conversion, and generation. The morphological analysis section 7a looks up the input text in the dictionary. The syntax analysis section 7b obtains the information for each word, parses according to the grammar rules, and builds a tree structure from the analysis results. The conversion section 7c transforms the tree structure of the input language into that of the output language, and the generation section 7d translates the resulting tree structure node by node.

The present invention belongs to the syntax analysis section described above, and the input text is assumed here to be English. The morphological analysis section 7a looks up the input text in the dictionary, and the results are passed on to the syntax analysis section 7b. The syntax analysis section 7b first performs multi-part-of-speech disambiguation. The disambiguation method used here is the one introduced in the following paper by S. J. DeRose.

S. J. DeRose (Brown Univ.), "Grammatical Category Disambiguation by Statistical Optimization", Computational Linguistics, Vol. 14, No. 1, Winter 1988, pp. 31-39.

The multi-part-of-speech disambiguation method called VOLSUNGA, proposed by the author of the above paper (DeRose), is used as a preliminary step at the front of the analysis section. VOLSUNGA has the following features.

■ It is based on a fully mathematical algorithm, and ad hoc additions are kept to a minimum.

■ The optimal part-of-speech sequence is defined as the one for which the product of the part-of-speech connection probabilities and the relative part-of-speech probabilities that make up the sequence is largest.

■ An efficient search for the optimal part-of-speech sequence (dynamic programming) overcomes the exponential cost of the computation.
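The definition of the optimal sequence in the second feature can be written compactly. The notation below is assumed for illustration and does not appear in the source: w_1, ..., w_n are the words, t_1, ..., t_n a candidate part-of-speech sequence, P_c(t_{k-1}, t_k) the connection probability of two adjacent parts of speech, and P_r(t_k | w_k) the relative part-of-speech probability of t_k for the word w_k.

```latex
\hat{t}_{1..n}
  = \operatorname*{arg\,max}_{t_1,\dots,t_n}
    \left[ \prod_{k=1}^{n} P_r(t_k \mid w_k) \right]
    \left[ \prod_{k=2}^{n} P_c(t_{k-1}, t_k) \right]
```

Enumerating every sequence costs the product of the numbers of candidates per word, which is the exponential computation that the dynamic programming search avoids.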

The parts of speech referred to here are fine-grained categories, about 100 in all.

The data required are a dictionary holding the part-of-speech categories, the relative part-of-speech probabilities, and the idiom priority information (Fig. 4), and a table of part-of-speech connection probabilities (Table 2).

The method finds the optimal combination of parts of speech by dynamic programming, based on Table 1 below. Fig. 3 shows a flowchart of the selection of the optimal part-of-speech sequence.

Table 1

If the optimal part-of-speech combination from T0,1 to Tn,j passes through Tn-1,i, then the portion up to Tn-1,i is itself the optimal part-of-speech combination from T0,1 to Tn-1,i. (Here Tn,j is the j-th candidate part of speech of the n-th word.)

This is because, if the portion up to Tn-1,i were not optimal, replacing it with the optimal combination up to Tn-1,i would give a better result overall.

Therefore, the optimal part-of-speech combination from T0,1 to Tn,j is the best of the candidates obtained, for each i (= 1, ..., i0), by joining the optimal combination from T0,1 to Tn-1,i with the connection from Tn-1,i to Tn,j.

As an example, the optimal part-of-speech combination is computed for the sentence "The man still saw her."

Assume that each word has the following parts of speech.

To keep the explanation simple, the relative part-of-speech probabilities of each word's parts of speech are omitted. In practice, the relative part-of-speech probabilities enter the product as factors in addition to the connection probabilities.

The    man    still    saw    her
AT     NN     NN       NN     PPO
       VB     VB       VBD    PP$
              RB

Here AT = article, NN = noun, PPO = objective pronoun, PP$ = possessive pronoun, RB = adverb, VB = verb, and VBD = past-tense verb.

From the leading "The" to the final "her" there are 1*2*3*2*2 = 24 part-of-speech combinations. The optimal combination is computed using the probabilities in Table 2 on the next page.
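The dynamic programming search over these 24 combinations can be sketched as follows. This is a minimal illustration, not the patent's implementation: the connection probabilities in `CONNECT` and the fallback `DEFAULT` are invented for the example (the values of Table 2 are not available here), and the names `best_sequence` and `brute_force` are hypothetical. Relative part-of-speech probabilities are left out, as in the explanation above.

```python
from itertools import product

# Candidate parts of speech per word, as listed in the example.
CANDIDATES = [
    ("The", ["AT"]),
    ("man", ["NN", "VB"]),
    ("still", ["NN", "VB", "RB"]),
    ("saw", ["NN", "VBD"]),
    ("her", ["PPO", "PP$"]),
]

# ASSUMPTION: invented connection probabilities, not the patent's Table 2.
CONNECT = {
    ("AT", "NN"): 0.6, ("AT", "VB"): 0.01,
    ("NN", "NN"): 0.05, ("NN", "VB"): 0.02, ("NN", "RB"): 0.1,
    ("VB", "NN"): 0.1, ("VB", "RB"): 0.1,
    ("RB", "VBD"): 0.2, ("RB", "NN"): 0.01, ("NN", "VBD"): 0.2,
    ("VBD", "PPO"): 0.1, ("VBD", "PP$"): 0.05,
    ("NN", "PPO"): 0.01, ("NN", "PP$"): 0.01,
}
DEFAULT = 0.001  # ASSUMPTION: value used for pairs missing from CONNECT


def connect(prev, cur):
    """Connection probability of two adjacent parts of speech."""
    return CONNECT.get((prev, cur), DEFAULT)


def best_sequence(candidates):
    """Dynamic programming: keep only the best path ending in each tag."""
    best = {tag: (1.0, [tag]) for tag in candidates[0][1]}
    for _, tags in candidates[1:]:
        new_best = {}
        for cur in tags:
            # Best way to reach `cur` from any tag of the previous word.
            new_best[cur] = max(
                ((score * connect(prev, cur), path + [cur])
                 for prev, (score, path) in best.items()),
                key=lambda sp: sp[0],
            )
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])


def brute_force(candidates):
    """Enumerate every combination; used only to cross-check the DP."""
    count, best_score, best_path = 0, -1.0, None
    for combo in product(*(tags for _, tags in candidates)):
        count += 1
        score = 1.0
        for prev, cur in zip(combo, combo[1:]):
            score *= connect(prev, cur)
        if score > best_score:
            best_score, best_path = score, list(combo)
    return count, best_score, best_path


if __name__ == "__main__":
    print(best_sequence(CANDIDATES))
    print(brute_force(CANDIDATES)[0], "combinations enumerated")
```

The dynamic programming version examines each pair of adjacent candidates once, so its cost grows with the sentence length rather than with the number of combinations. With the invented probabilities above, both functions select the same sequence; with real connection probabilities the selected sequence may of course differ.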

Table 2 (examples of connection probabilities)

Next, an example of correction using relative part-of-speech probabilities is shown.

so: QL (qualifier, 932), CS (subordinating conjunction, 479), UH (interjection, 1)

The numbers are relative part-of-speech probabilities. The parts of speech of the sequence "so that" are estimated in two ways: with connection probabilities alone, and with connection probabilities combined with relative part-of-speech probabilities.

Connection probabilities alone:

P(UH-CS) > P(CS-CS)

Connection probabilities and relative part-of-speech probabilities:

P(UH-CS) * P(so-UH) * P(that-CS) < P(CS-CS) * P(so-CS) * P(that-CS)

P() denotes a probability. P(UH-CS) is the connection probability of UH followed by CS. P(so-CS) is the relative part-of-speech probability of CS for "so". The correct answer, CS-CS, is obtained when the connection probabilities and the relative part-of-speech probabilities are used together.
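The effect of the correction can be reproduced with a few lines of arithmetic. In this sketch the figures for "so" (932, 479, 1) are the ones given in the text, normalized into probabilities; the two connection probabilities and the relative probability of CS for "that" are invented values, chosen only so that P(UH-CS) > P(CS-CS) as stated. Only the two readings compared in the text (UH and CS) are considered, and the function names are hypothetical.

```python
# Relative part-of-speech figures for "so" as given in the text.
SO_COUNTS = {"QL": 932, "CS": 479, "UH": 1}
TOTAL = sum(SO_COUNTS.values())
P_SO = {tag: n / TOTAL for tag, n in SO_COUNTS.items()}

# ASSUMPTION: invented values; only their ordering follows the text.
P_CONNECT = {("UH", "CS"): 0.05, ("CS", "CS"): 0.01}
P_THAT_CS = 0.5  # appears on both sides of the comparison, so it cancels


def score(so_tag, use_relative):
    """Score of the reading so=<so_tag>, that=CS."""
    s = P_CONNECT[(so_tag, "CS")]
    if use_relative:
        s *= P_SO[so_tag] * P_THAT_CS
    return s


def pick(use_relative):
    """Return the part of speech chosen for "so"."""
    return max(["UH", "CS"], key=lambda tag: score(tag, use_relative))


if __name__ == "__main__":
    print("connection only:        ", pick(use_relative=False))
    print("with relative POS prob.:", pick(use_relative=True))
```

Because the interjection reading of "so" is so rare (1 out of 1412), multiplying in the relative part-of-speech probability outweighs the larger connection probability and the CS reading is chosen.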

Next, an example of processing when the sentence contains an idiom is shown.

Assume that the sentence "The man came in order to win." has the following parts of speech.

The    man    came    in     order    to     win
AT     NN     VBD     PR     NN       TO     NN
       VB             RB     VB       PR     VB
                                      RB
                      <------ TO ------>

TO = "to" accompanying an infinitive, PR = preposition, PPN = personal pronoun (nominative).

From the leading "The" to the final "win" there are 1*2*1*2*2*3*2 = 48 part-of-speech combinations when every word is treated individually, and 1*2*1*1*2 = 4 when "in order to" is treated as the single unit TO.

Strictly speaking, an idiom should be regarded as forming one word as a whole rather than being handled word by word. Here, for convenience, each constituent word is kept as a provisional word so that the possibility of combination remains. This part is, of course, not allowed to combine with the parts of speech of other words. The connection probability and the relative part-of-speech probability are calculated at the last word of the idiom. With this approach, the longer the idiom, the fewer factors enter the product, so an additional coefficient must be multiplied in to compensate.

In this example, "in order to" is the idiom. Taken word by word, "in", "order", "to" have 2*2*3 = 12 combinations, whereas the idiom "in order to" has only one. Below, the idiom is written *--*--TO; the "--" part indicates that the combination is fixed.
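The counts in this example can be checked by enumerating the paths directly. The sketch below is illustrative only: the candidate lists in `TAGS` are read off the example (three candidates for "to", two for "in", "order", "man" and "win"), the placeholder `"*"` stands for the fixed, non-final words of the idiom, and the function names are hypothetical.

```python
from itertools import product

# Candidate parts of speech per word, as read from the example sentence.
TAGS = {
    "The": ["AT"], "man": ["NN", "VB"], "came": ["VBD"],
    "in": ["PR", "RB"], "order": ["NN", "VB"],
    "to": ["TO", "PR", "RB"], "win": ["NN", "VB"],
}
SENTENCE = ["The", "man", "came", "in", "order", "to", "win"]
IDIOM = ("in", "order", "to")


def word_by_word_paths(words):
    """Every combination when each word is tagged individually."""
    return list(product(*(TAGS[w] for w in words)))


def idiom_paths(sentence, idiom, unit_tag="TO"):
    """Combinations in which the idiom is one fixed path *--*--<unit_tag>."""
    n = len(idiom)
    start = next(i for i in range(len(sentence) - n + 1)
                 if tuple(sentence[i:i + n]) == idiom)
    before = [TAGS[w] for w in sentence[:start]]
    after = [TAGS[w] for w in sentence[start + n:]]
    fixed = [["*"]] * (n - 1) + [[unit_tag]]  # no alternatives inside the idiom
    return list(product(*before, *fixed, *after))


if __name__ == "__main__":
    print(len(word_by_word_paths(list(IDIOM))), "word-by-word readings of the idiom")
    print(len(word_by_word_paths(SENTENCE)), "word-by-word paths for the sentence")
    print(len(idiom_paths(SENTENCE, IDIOM)), "paths through the idiom reading")
```

Keeping both sets of paths corresponds to the case where the idiom carries no priority information: the idiom reading competes with the word-by-word readings, but its interior cannot be mixed with them.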

[Diagram: the candidate part-of-speech lattice for "The man came in order to win", extended one word at a time; the idiom reading of "in order to" appears as the fixed path *--*--TO beside the word-by-word parts of speech.]

AT    NN    VBD    *--*--TO    VB

Idioms can also be given priority information in the dictionary. The explanation so far has covered the case in which the idiom has no priority information. When an idiom has priority information, every possibility other than the idiom is discarded. That is, when "in order to" has priority information, "in", "order", and "to" are not looked up individually, and the example sentence has only the parts of speech shown below. Priority information is assigned to some idioms and not to others.

The    man    came    in order to    win
AT     NN     VBD     <--- TO --->   NN
       VB                            VB

Syntax analysis is performed after the multi-part-of-speech ambiguity has been resolved in this way. Because most of the ambiguity has already been removed, the main task left to syntax analysis is building the tree structure.
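The effect of priority information on dictionary lookup can be sketched as follows. This is an assumed data layout, not the dictionary format of Fig. 4: `IDIOMS` maps a word tuple to a part of speech and a `priority` flag, and `lookup` is a hypothetical function. For an idiom without priority the sketch simply keeps the individual words; the parallel idiom reading described earlier is not modeled here.

```python
from math import prod

# Candidate parts of speech per word, as read from the example sentence.
TAGS = {
    "The": ["AT"], "man": ["NN", "VB"], "came": ["VBD"],
    "in": ["PR", "RB"], "order": ["NN", "VB"],
    "to": ["TO", "PR", "RB"], "win": ["NN", "VB"],
}
SENTENCE = ["The", "man", "came", "in", "order", "to", "win"]

# ASSUMPTION: illustrative idiom entry with a priority flag.
IDIOMS = {("in", "order", "to"): {"tag": "TO", "priority": True}}


def lookup(sentence, idioms):
    """Return (surface, candidate tags) entries after dictionary lookup."""
    entries, i = [], 0
    while i < len(sentence):
        match = None
        for words, info in idioms.items():
            if tuple(sentence[i:i + len(words)]) == words:
                match = (words, info)
                break
        if match and match[1]["priority"]:
            # Priority idiom: its words are not looked up individually.
            words, info = match
            entries.append((" ".join(words), [info["tag"]]))
            i += len(words)
        else:
            entries.append((sentence[i], TAGS[sentence[i]]))
            i += 1
    return entries


def combinations(entries):
    """Number of part-of-speech combinations left for the parser."""
    return prod(len(tags) for _, tags in entries)


if __name__ == "__main__":
    with_priority = lookup(SENTENCE, IDIOMS)
    print(with_priority)
    print(combinations(with_priority), "combinations with priority")
```

With the priority flag set, the sentence shrinks to five entries and only the ambiguity of "man" and "win" remains to be resolved by the connection probabilities.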

Effect: As is clear from the above description, according to claim 1 of the present invention, newly introducing connection probabilities improves parsing efficiency. According to claim 2, using word priorities as well makes more accurate syntax analysis possible. According to claim 3, idiom processing is supported.

[Brief explanation of drawings]

Fig. 1 is a block diagram illustrating an embodiment of a translation apparatus using the syntax analysis method of the present invention, Fig. 2 shows the processing flow of the translation main section in Fig. 1, Fig. 3 shows the flow of selecting the optimal part-of-speech sequence, and Fig. 4 shows an example of the dictionary.

1: CRT, 2: keyboard, 3: OCR, 4: input document, 5: spell check section, 6: pre-editing section, 7: translation main section, 8: post-editing section, 9: dictionary, 10: grammar rules, 11: output document, 12: printer.

Claims

1. A syntax analysis method for a natural language analysis system such as machine translation, characterized in that, after the morphological analysis section has looked up the text to be analyzed in a dictionary, the parts of speech of each word are restricted to those that form the combination with the highest connection probability, found by calculating the product of the connection probabilities between them, and syntax analysis is performed after the ambiguity of words having multiple parts of speech has been resolved.

2. A syntax analysis method for a natural language analysis system such as machine translation, characterized in that, after the morphological analysis section has looked up the text to be analyzed in a dictionary, when the parts of speech of each word are restricted to those that form the combination with the highest connection probability by calculating the product of the connection probabilities between them, the priority of each part of speech is also included as a factor in the product, and syntax analysis is performed after the ambiguity of multi-part-of-speech words has been resolved.

3. A syntax analysis method for a natural language analysis system such as machine translation, characterized in that, after the morphological analysis section has looked up the text to be analyzed in a dictionary, when the parts of speech of each word are restricted to those that form the combination with the highest connection probability by calculating the product of the connection probabilities between them, and an idiom is present among the words looked up in the dictionary, parallel calculation and exclusive calculation are performed according to the priority of the idiom, and syntax analysis is performed after the ambiguity of multi-part-of-speech words has been resolved.