JPS6391776A

JPS6391776A - Natural language analyzer

Info

Publication number: JPS6391776A
Application number: JP61230237A
Authority: JP
Inventors: 堤　泰治郎; 堤　豊
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1986-09-30
Filing date: 1986-09-30
Publication date: 1988-04-22
Also published as: JPH0444304B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

A. Industrial Field of Application

This invention relates to a natural language analyzer suitable for machine translation systems, natural language question answering systems, and the like. In particular, it makes it easy to select the optimal parse tree for a sentence whose interpretation is ambiguous.

B. Prior Art

B1. Syntactic analysis and semantic analysis

When a computer processes natural language, it is essential to obtain the structure of the natural language sentence. To do so, the system must consult a dictionary to determine the part of speech of each word, and must further determine which word modifies which. The parse tree represents the sentence structure obtained in this way, and sentence analysis is the operation of obtaining a parse tree from a natural language sentence. The difficult parts of sentence analysis are the interpretation of words with multiple parts of speech and the interpretation of the optional cases of verbs. A word with multiple parts of speech is a single word that can behave as more than one part of speech; in English, for example, many verbs are also used as nouns.

An optional case of a verb is a case that may or may not be present for a given verb. The in case, for example, is often optional. It is difficult to handle because an in phrase may instead modify the immediately preceding noun of another case, so that it is not an in case of the verb at all.

Conventionally, parse trees have been obtained by syntactic analysis.

In syntactic analysis, grammar rules are written so that a sentence consists of several phrases (noun phrases, verb phrases, and so on) and each phrase consists of several words, and the analysis proceeds according to these rules ("Language as a Cognitive Process, Vol. I: Syntax", Terry Winograd, Addison-Wesley; "Parsing Natural Language", Margaret King, Academic Press). The advantages of syntactic analysis are that grammar rules are relatively easy to write and the dictionary entries are simple. Its drawback is that it outputs multiple parse trees when there is ambiguity caused by the multiple-part-of-speech words, optional cases, and similar phenomena mentioned above. Syntactic analysis of a natural language sentence rarely yields a single result; even relatively short sentences produce many different analyses.

Even when a sentence is ambiguous, a human can take meaning and context into account and arrive at an almost unique interpretation. Semantic analysis applies this kind of consideration to the analysis of natural language sentences. One example is Montague grammar ("Prototype of an English-Japanese translation system based on Montague grammar", Nishida et al., Transactions of the Information Processing Society of Japan, Vol. 23, No. 2, pp. 107-115, March 1982). However, this method is cumbersome: it requires very complicated computation, and the meaning of every word must be described in advance.

This invention focuses on the case structure of verbs so that ambiguity can be handled statistically.

B2. Other prior art

Sentence analysis that focuses on case structure includes "On the parsing process of the natural language understanding system IMAGES-I", Yoshitake et al., Transactions of the Institute of Electronics and Communication Engineers, Vol. J67-D, No. 10, pp. 1147-1154, October 1984.

These methods only check, for the purpose of syntactic analysis, whether a given case of a verb is obligatory; they use no statistical information at all.

Systems that introduce a limited statistical element by using co-occurrence probabilities include "Syntax analysis method for machine translation", Yoshikawa et al., Information Processing, Vol. 26, No. 10, pp. 1157-1164, October 1985. These use the probabilities only for analyzing compound noun phrases and the like; there is no example of applying them to multiple-part-of-speech words or to the attachment of optional cases.

An attempt at syntactic analysis using statistical information is "An Approach to Probabilistic Language Processing", Fujisaki, Information Processing Society of Japan, Natural Language Processing Study Group Material 41-6, January 1984. Because this method attaches probabilities to the grammar rules themselves, writing the grammar rules becomes very difficult, and because it takes no account of meaning, results suited to a particular field cannot be expected. Moreover, because it is affected by conditions other than semantic elements, such as the form of the sentence (for example, whether it is passive), it produces inconsistencies such as not giving "machine is virtual" and "virtual machine" the same weight.

C. Problems to Be Solved by the Invention

This invention was made in view of the above circumstances. Its object is to provide a device that outputs the most plausible parse tree, taking meaning into account, while retaining the simplicity and high processing speed of syntactic analysis.

D. Means for Solving the Problems

To achieve the above object, this invention first performs syntactic analysis of the input sentence. When multiple parse trees are obtained, then for every parse tree the plausibility (likelihood) of each modification between words in the sentence is obtained from predetermined statistical information and summed to give the likelihood of that parse tree. The parse tree with the highest likelihood is output as the semantically most plausible parse tree. "Modification between words" is used here in a broad sense and includes relations such as noun-noun, adjective-noun, noun-verb, and verb-preposition-noun.
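The selection rule described above can be written compactly as follows. This is a notational sketch only: the symbols T, M(T), and f are introduced here for illustration and do not appear in the original text. M(T) stands for the set of word-to-word modifications found in parse tree T, and f(m) for the likelihood of modification m obtained from the statistical information.

```latex
\[
  L(T) = \sum_{m \in M(T)} f(m),
  \qquad
  T^{*} = \operatorname*{arg\,max}_{T \in \{T_1, \dots, T_n\}} L(T)
\]
```

Here T_1, ..., T_n are the parse trees produced by syntactic analysis and T* is the tree that is output.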

In a more specific example, every modification between words that appears in the training sentences is converted into a verb-case relation, statistics are taken, and the data for computing parse tree likelihoods is obtained. That is, when the modification actually contains a verb, the verb-case relation is counted as it is. When the case structure does not appear on the surface, as in a noun-noun relation, the latent case structure is identified by considering the meaning of the sentence, a verb or the like is supplied to obtain a verb-case relation, and this relation is added to the statistics. The data obtained in this way is called the first likelihood calculation dictionary.

If parse tree likelihoods were computed using only the first likelihood calculation dictionary, lookup would become cumbersome, so this specific example also uses a second likelihood calculation dictionary. To obtain statistics for relations such as noun-noun or adjective-noun from the first likelihood calculation dictionary, the relation would have to be converted into a verb-case relation, and doing this at lookup time is laborious. This complexity is eliminated by using the statistics in the first likelihood calculation dictionary to build, in advance, a second likelihood calculation dictionary that can be searched directly by noun-noun and similar relations.

In this specific example, semantic markers are also used, because nouns are far more numerous than verbs or adjectives and it is not easy to examine the frequency of every noun. A semantic marker is the name of a class used when nouns are sorted into, for example, 24 classes, defined by properties such as whether the noun denotes a human or the name of a piece of software. This makes it possible to build the likelihood calculation dictionaries without collecting an enormous amount of data.

At run time, the syntactic analysis unit first analyzes the input sentence with reference to the grammar rules and the syntactic analysis dictionary. When several parse trees are obtained, the parse tree likelihood calculation unit is started. It looks up the modifications in each parse tree: for a modification between a verb and a case it searches the first likelihood calculation dictionary to obtain the likelihood of the modification, and for a noun-noun modification or the like it searches the second likelihood calculation dictionary. The likelihoods of the modifications are summed to give the likelihood of each parse tree, and the parse tree with the highest likelihood is output.

E. Embodiment

An embodiment in which this invention is applied to an English-to-Japanese machine translation system is described below with reference to the drawings.

Fig. 1 shows a processing system that implements the embodiment. In Fig. 1, a workstation 1 is connected to a host computer 2 via a line 3. An auxiliary storage device 5 is connected to the host computer 2 via a channel 4.

As shown in Fig. 2, the machine translation processing realized by the processing system of Fig. 1 consists of five stages: (S1) English sentence input, (S2) English sentence analysis, (S3) conversion of the English parse tree into a Japanese parse tree, (S4) Japanese sentence generation, and (S5) Japanese sentence output. Stage S2 is the part directly related to this invention and is described in detail later. Stage S3 generates the corresponding Japanese parse tree from the English parse tree obtained in stage S2, using an English-Japanese dictionary and syntactic correspondence rules. Stage S4 reorders words and inserts punctuation to make the Japanese sentence more natural.

The workstation 1 in Fig. 1 is used for entering English sentences, outputting Japanese sentences, and editing sentences.

The host computer 2 executes the programs that perform the processing of stages S2 to S4 described above. The auxiliary storage device 5 stores the various dictionaries, which are explained mainly with reference to Fig. 3.

Fig. 3 shows the processing of the English sentence analysis stage S2. For convenience of explanation, each process is shown as a block.

In Fig. 3, the input English sentence is supplied to the syntactic analysis unit 6, which analyzes it with reference to the syntactic analysis dictionary 7 and the grammar rule table 8. The syntactic analysis dictionary 7 has, for example, the data structure shown in Fig. 4, in which each headword is listed with its parts of speech. The grammar rule table 8 is, for example, as shown in Fig. 5.

Here NOUN denotes a noun, NP a noun phrase, DET a determiner, VERB a verb, VP a verb phrase, PREP a preposition, PP a prepositional phrase, and S a sentence.

Grammar rules are generally called rewrite rules and are written in the form A B → C. This means that when the sequence A B appears, it is rewritten as C.

Consider the processing of the syntactic analysis unit 6 in Fig. 3, taking the following sentence as an example.

Time flies like an arrow.

When this sentence is input, the syntactic analysis dictionary 7 is searched first, giving time = NOUN; flies = NOUN, VERB; like = VERB, PREP; an = DET; arrow = NOUN. Because flies and like each have two possible parts of speech, four sets of part-of-speech assignments are obtained as candidates.
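The way the four candidate sets arise from the dictionary lookup can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `lexicon` mapping and the function name `candidate_taggings` are assumptions introduced here, and only the dictionary entries quoted in the text are used.

```python
from itertools import product

# Dictionary entries as quoted in the text: word -> possible parts of speech.
lexicon = {
    "time": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
}

def candidate_taggings(words, lexicon):
    """Return every combination of part-of-speech choices for the words."""
    choices = [lexicon[w] for w in words]
    return [list(zip(words, tags)) for tags in product(*choices)]

sentence = ["time", "flies", "like", "an", "arrow"]
candidates = candidate_taggings(sentence, lexicon)
for cand in candidates:
    print(" ".join(f"{word}/{tag}" for word, tag in cand))
```

Two ambiguous words with two readings each give 2 x 2 = 4 candidate assignments.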

For two of the sets (flies as a verb with like as a preposition, and flies as a noun with like as a verb), applying the grammar rules yields the parse trees shown in Fig. 6(a) and Fig. 6(b), respectively.

For the other two sets, the application of the grammar rules fails partway and no parse tree is generated, because sequences such as VP-VP and NP-PREP are not allowed to be rewritten.
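The success and failure of the four candidates can be reproduced with a small exhaustive rewriting check. The rule set below is hypothetical: the grammar rule table of Fig. 5 is not reproduced in this text, so `RULES` is an assumed stand-in chosen only to illustrate how rewrite rules of the form A B → C accept two candidates and reject the other two.

```python
# Hypothetical rewrite rules, written as (right-hand side, replacement).
# These are assumptions for illustration; Fig. 5 is not available here.
RULES = [
    (("NOUN",), "NP"),
    (("NOUN", "NOUN"), "NP"),
    (("DET", "NOUN"), "NP"),
    (("PREP", "NP"), "PP"),
    (("VERB",), "VP"),
    (("VERB", "NP"), "VP"),
    (("VP", "PP"), "VP"),
    (("NP", "VP"), "S"),
]

def reduces_to_s(symbols, seen=None):
    """Try every rewrite order; True if the sequence can become a single S."""
    symbols = tuple(symbols)
    if seen is None:
        seen = set()
    if symbols == ("S",):
        return True
    if symbols in seen:  # already explored without success
        return False
    seen.add(symbols)
    for rhs, lhs in RULES:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if symbols[i:i + n] == rhs:
                rewritten = symbols[:i] + (lhs,) + symbols[i + n:]
                if reduces_to_s(rewritten, seen):
                    return True
    return False

candidates = {
    "flies=VERB, like=PREP": ["NOUN", "VERB", "PREP", "DET", "NOUN"],
    "flies=NOUN, like=VERB": ["NOUN", "NOUN", "VERB", "DET", "NOUN"],
    "flies=VERB, like=VERB": ["NOUN", "VERB", "VERB", "DET", "NOUN"],
    "flies=NOUN, like=PREP": ["NOUN", "NOUN", "PREP", "DET", "NOUN"],
}
results = {name: reduces_to_s(tags) for name, tags in candidates.items()}
for name, ok in results.items():
    print(name, "->", "parse tree" if ok else "fails")
```

Under these assumed rules the verb-verb candidate gets stuck at a VP VP sequence and the noun-preposition candidate never produces a VP, which mirrors the failure described in the text.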

Two parse trees are obtained because the sentence is ambiguous. The subsequent processing selects the semantically correct parse tree.

The parse trees obtained by the syntactic analysis unit 6 in Fig. 3 are supplied to the score calculation unit 9 and the optimal parse tree determination unit 10. If there is only one parse tree, it is output unchanged through the optimal parse tree determination unit 10.

If there are two or more parse trees, the score calculation unit 9 computes the likelihood of each parse tree with reference to the first likelihood calculation dictionary 11 and the second likelihood calculation dictionary 12. The optimal parse tree determination unit 10 compares the likelihoods and outputs the parse tree with the highest likelihood as the optimal parse tree.

That is, the first and second likelihood calculation dictionaries 11 and 12 store the frequencies of the modifications between words found in training data consisting of a large number of sentences. The score calculation unit 9 picks out the modifications expressed in each parse tree, retrieves the corresponding frequencies from the first and second likelihood calculation dictionaries 11 and 12, sums them, and uses the sum as the likelihood of that parse tree.

The first likelihood calculation dictionary 11 stores how often two words (or phrases) standing in a verb-case relation appeared in the training data. Fig. 7 shows an example of its data structure. In this example support is the verb, VM/SP and user are its nominative case (SUBJ), and program and machine are its objective case (OBJ). The numbers in the right-hand column are the modification frequencies. For example, the modification in which support is the verb and VM/SP is its nominative has a frequency of 20.

The second likelihood calculation dictionary 12 stores the frequencies of modifications between a noun and a noun and between a noun and a preposition plus noun. Fig. 8 shows an example of its data structure. In this example the noun-noun modification control program occurs 17 times, and the noun-preposition-noun modification program in disk occurs 5 times.
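One way to picture the two dictionaries is as keyed frequency tables. The sketch below is an assumption about representation only: the tuple key layout, the names `first_dict`, `second_dict`, and `lookup`, and the choice to return 0 for unseen modifications are all introduced here for illustration. The three frequencies are the ones quoted in the text for Fig. 7 and Fig. 8.

```python
# First likelihood calculation dictionary: (verb, case, word) -> frequency.
first_dict = {
    ("support", "SUBJ", "VM/SP"): 20,  # frequency quoted in the text
}

# Second likelihood calculation dictionary:
# (noun, preposition or None, noun) -> frequency.
second_dict = {
    ("control", None, "program"): 17,  # noun-noun modification
    ("program", "in", "disk"): 5,      # noun-preposition-noun modification
}

def lookup(dictionary, key):
    """Return the stored frequency; 0 for unseen keys is an assumption."""
    return dictionary.get(key, 0)

print(lookup(first_dict, ("support", "SUBJ", "VM/SP")))
print(lookup(second_dict, ("control", None, "program")))
print(lookup(second_dict, ("program", "in", "disk")))
```

Keeping the two tables separate reflects the text's point that noun-noun relations can be looked up directly, without first converting them into verb-case relations.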

Consider the operation of the score calculation unit 9 in Fig. 3, using the sentence given earlier, "Time flies like an arrow.", as an example.

As shown in Fig. 6(a) and (b), two parse trees are possible for this sentence. For parse tree (a), two modifications are extracted. In this embodiment DET (articles and determiners) is excluded from the modifications because it carries no semantic content. PREP (a preposition) combines with the following noun to form a single modifying element, so verb-preposition-noun is treated as one set of three. Since flies is a verb in this parse tree, the first likelihood calculation dictionary 11 is searched to obtain the frequencies of the two modifications. Suppose they are 20 and 10, respectively. The likelihood of parse tree (a) is then their sum, 30.

For parse tree (b), three modifications are extracted. Unlike in parse tree (a), the modification time-flies in parse tree (b) is a noun-noun modification. The second likelihood calculation dictionary 12 is therefore searched for time-flies, giving, say, 2.

Since like is a verb, the first likelihood calculation dictionary 11 is searched for the remaining two modifications, giving, say, 10 and 10. Summing these values gives a likelihood of 22 for parse tree (b).

The optimal parse tree determination unit 10 in Fig. 3 compares the likelihoods of the possible parse trees received from the score calculation unit 9 and outputs the parse tree with the highest likelihood. In the example above, the parse trees of Fig. 6(a) and (b) have likelihoods of 30 and 22, respectively, so parse tree (a) is output.
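The scoring and selection just described can be sketched as follows. The frequencies 20, 10, 2, 10, and 10 are the illustrative values from the text; which modification carries which value, the case labels, the base-form keys, and all names (`first_dict`, `second_dict`, `trees`, `tree_likelihood`) are assumptions made here so that the example is concrete.

```python
# Illustrative frequencies from the text; the key layout is an assumption.
first_dict = {
    ("fly", "SUBJ", "time"): 20,   # tree (a): time is the subject of flies
    ("fly", "like", "arrow"): 10,  # tree (a): verb-preposition-noun set
    ("like", "SUBJ", "fly"): 10,   # tree (b): flies is the subject of like
    ("like", "OBJ", "arrow"): 10,  # tree (b): arrow is the object of like
}
second_dict = {
    ("time", None, "fly"): 2,      # tree (b): noun-noun modification
}

# Each parse tree is reduced to the list of modifications it contains,
# tagged with the dictionary that has to be searched.
trees = {
    "a": [("first", ("fly", "SUBJ", "time")),
          ("first", ("fly", "like", "arrow"))],
    "b": [("second", ("time", None, "fly")),
          ("first", ("like", "SUBJ", "fly")),
          ("first", ("like", "OBJ", "arrow"))],
}

def tree_likelihood(modifications):
    """Sum the frequencies of all modifications in one parse tree."""
    total = 0
    for which, key in modifications:
        table = first_dict if which == "first" else second_dict
        total += table.get(key, 0)
    return total

scores = {name: tree_likelihood(mods) for name, mods in trees.items()}
best = max(scores, key=scores.get)
print(scores, "selected:", best)
```

With these values tree (a) scores 20 + 10 = 30 and tree (b) scores 2 + 10 + 10 = 22, so tree (a) is selected, matching the worked example.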

Next, the method of creating the likelihood calculation dictionaries 11 and 12 is described.

First, training data is prepared. It can be obtained by randomly extracting a large number of sentences from the field that is the target of sentence analysis (machine translation). These sentences are analyzed by hand to produce complete, unambiguous parse trees, and the modifications between words in the parse trees are then examined. For example, Fig. 9 shows the case of the input sentence

VM/SP runs in virtual machine.

(a) is the training input sentence and (b) is its analysis. The analysis shows that VM/SP is the nominative of run, that in virtual machine depends on run, and that virtual modifies machine, so three modifications are obtained. Of these, VM/SP-runs is used as it is (except that inflections such as the third-person singular present -s and the past tense are reduced to the base form). In run-in virtual machine, the core of the meaning is run-in machine, so machine is regarded as depending on run in the in case. virtual-machine is semantically equivalent to machine is virtual, so the verb equivalent be-virtual is created. This is possible because an adjective can be treated as a predicate together with the verb be, and the verb be itself plays no semantic role. This gives the relation that machine is the nominative of be-virtual. From the above, the modifications shown in (c) are extracted.

Although none appeared in this example, pronouns such as this, it, he, and them, articles such as an and the, and numerals such as one, two, and three have almost no effect on semantic relations, so they are not added to the statistics.

Fig. 10 shows another example. Here the part This is a does not affect the semantic relations and is not added to the statistics. For machine of VM/SP, of VM/SP is regarded as modifying machine. To turn this modification into a relation between a verb and a case, a verb is supplied to form a sentence. Taking the field into account, the sentence "VM/SP runs in machine" is formed. This work is done by a human, but given the surrounding context it is not very difficult, and since it only has to be done once, at training time, the burden is small. The verbs that are supplied are biased by field; selecting them so as to reflect this bias improves accuracy. Extracting the verb-case relations from the resulting sentence "VM/SP runs in machine" gives the modifications in (c).

Next, the occurrences of each modification in the examples of Fig. 9 and Fig. 10 are counted to obtain intermediate frequency data.

The same counting is then carried out for all the other training sentences to obtain the final frequency data, which is stored in the first likelihood calculation dictionary 11. The data structure used was described above with reference to Fig. 7. Other modifications are processed in the same way and stored in the first likelihood calculation dictionary 11.
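The counting step can be sketched with a frequency counter over the hand-extracted modifications. The tuples below restate the modifications that the text derives for the Fig. 9 and Fig. 10 examples; the tuple layout, the SUBJ label, and the variable names are assumptions introduced for illustration.

```python
from collections import Counter

# Modifications extracted by hand, already converted to verb-case relations
# and with verbs reduced to their base form.
fig9_mods = [
    ("run", "SUBJ", "VM/SP"),
    ("run", "in", "machine"),
    ("be-virtual", "SUBJ", "machine"),
]
fig10_mods = [
    ("run", "SUBJ", "VM/SP"),
    ("run", "in", "machine"),
]

# Accumulate frequencies across all training sentences.
counts = Counter()
for sentence_mods in (fig9_mods, fig10_mods):
    counts.update(sentence_mods)

for key, freq in counts.items():
    print(key, freq)
```

After these two sentences, both run relations have a count of 2 and the be-virtual relation has a count of 1; continuing over the whole training set would give the final contents of the first dictionary.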

With this kind of tabulation, words that tend to appear in modifications between a verb and its case and words that tend to appear in modifications between nouns are both handled with the same measure, namely case structure. Both are reliably reflected in the statistics, and whatever tendency a word has, its modifications can be described accurately and uniformly.

Points to note when tabulating are summarized below.

Verbs

Verbs generally do not modify other words, so only the cases in which they are modified are considered, with the following exceptions.

■ When a verb takes -ing and modifies a noun. In this case, the modification relation that the noun has to the verb is examined.

(Example) programming language
→ human programs in language
The two relations human-program and program-in language are obtained.

■ When a verb becomes a past participle and modifies a noun. In this case the noun is usually the object of the verb.

(Example) boiled egg
→ human boiled egg
The two relations human-boil and boil-egg are obtained.

Adjectives

Adjectives are treated as verb equivalents by prefixing be-.

(Example) file is large → file - be-large
large file → file - be-large

Adverbs

Adverbs generally modify verbs and verb equivalents, so they are treated as a case of the verb.

(Example) Human uses the file quickly.
use - quickly is extracted.

Nouns

Nouns are used as the cases of verbs (nominative, objective, prepositional), but they can also modify other nouns.

(Example) VM/SP usage
→ Human uses VM/SP.
human-use and use-VM/SP are extracted.

(Example) machine of VM/SP
→ VM/SP runs in machine.
VM/SP-run and run-in machine are extracted.

Others

Other words such as pronouns, articles, and auxiliary verbs carry almost no semantic content and are not included in the statistics.
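Two of the tabulation rules above, the be- treatment of adjectives and the exclusion of function words, can be sketched as small helper functions. Everything here is hypothetical: the names `FUNCTION_WORDS`, `adjective_relation`, and `counts_toward_statistics` are introduced for illustration, and the word list contains only the examples quoted in the text, not a complete list.

```python
# Function words quoted in the text as examples; the real list would be longer.
FUNCTION_WORDS = {
    "this", "it", "he", "them",   # pronouns
    "a", "an", "the",             # articles
    "one", "two", "three",        # numerals
}

def adjective_relation(adjective, noun):
    """'large file' and 'file is large' both become file - be-large."""
    return (f"be-{adjective}", "SUBJ", noun)

def counts_toward_statistics(word):
    """Return False for words that are left out of the statistics."""
    return word.lower() not in FUNCTION_WORDS

print(adjective_relation("large", "file"))
print([w for w in ["This", "is", "a", "machine"] if counts_toward_statistics(w)])
```

Mapping both the attributive and the predicative use of an adjective to the same relation is what lets "virtual machine" and "machine is virtual" contribute to one count.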

Next, the method of creating the data of the second likelihood calculation dictionary 12 is described. This data is created by reversing the operation that was applied to noun-noun modifications and the like when the data of the first likelihood calculation dictionary 11 was created. When that data was created, noun-noun modifications and the like were converted into verb-case modifications before being tabulated, because the two are semantically identical. For the same reason, the data of the second likelihood calculation dictionary 12 can be obtained by converting verb-case modifications into noun-noun modifications or noun-prepositional-phrase modifications and tabulating them. The statistics for all modifications in the training sentences are already stored in the first likelihood calculation dictionary 11 as verb-case modifications, so this stored information is used.

For example, "machine of VM/SP" is semantically identical to "VM/SP runs in machine", so it is converted in this way. The score of the phrase "machine of VM/SP" should be the same as the score of the sentence "VM/SP runs in machine". The score of this sentence is, say, 2 for VM/SP-run plus 2 for run-in machine, giving 4 in total. As a result, the score of the modification machine-of VM/SP is determined to be 4 and is stored in the second likelihood calculation dictionary 12.
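The derivation of a second-dictionary entry from first-dictionary counts can be sketched as follows. The `paraphrases` table, which records the verb-case relations of the sentence that a human supplied for each noun phrase, is a hypothetical structure introduced here; the counts of 2 are the illustrative values from the text.

```python
# First likelihood calculation dictionary with the illustrative counts.
first_dict = {
    ("run", "SUBJ", "VM/SP"): 2,
    ("run", "in", "machine"): 2,
}

# Hypothetical paraphrase table: noun phrase pattern -> verb-case relations
# of the semantically equivalent sentence ("VM/SP runs in machine").
paraphrases = {
    ("machine", "of", "VM/SP"): [
        ("run", "SUBJ", "VM/SP"),
        ("run", "in", "machine"),
    ],
}

# The score of the noun phrase is the sum of the scores of its relations.
second_dict = {
    phrase: sum(first_dict.get(rel, 0) for rel in relations)
    for phrase, relations in paraphrases.items()
}

print(second_dict)
```

Because this conversion is done once when the dictionaries are built, the run-time lookup for "machine of VM/SP" is a single access to the second dictionary.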

This concludes the description of the embodiment.

In the embodiment described above, statistics were tallied word by word for nouns and then looked up. However, the number of nouns is very large, and it is difficult to collect complete statistics for them from a limited number of training sentences. It is therefore preferable to classify nouns into groups according to their features. Here, the name of such a group is called a semantic marker. The table shows examples of semantic markers. With such semantic markers, "machine of VM/SP", for example, is expressed as "UD of LE". If the tally is taken using semantic markers and the likelihood of each parse tree is then calculated from it, processing can be performed without much trouble even when there are few training sentences or when the input sentence contains words that do not appear in the training sentences. In an example using the semantic markers in the table, highly accurate sentence analysis was achieved after training on several hundred sentences.
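The effect of tallying by semantic marker rather than by word can be illustrated with a short sketch. The word-to-marker pairs are taken from the semantic marker table in this description; the function names, the tuple layout, the training phrases, and the choice to fall back to the word itself when no marker is known are all assumptions made for illustration.

```python
from collections import Counter

# Excerpt of a word -> semantic marker mapping (UD: device, LE: software).
SEMANTIC_MARKER = {
    "machine": "UD",
    "accessor": "UD",
    "VM/SP": "LE",
    "program": "LE",
}


def to_markers(head, preposition, modifier):
    """Replace each noun by its semantic marker.

    Falling back to the word itself when it has no marker is an assumption.
    """
    return (
        SEMANTIC_MARKER.get(head, head),
        preposition,
        SEMANTIC_MARKER.get(modifier, modifier),
    )


def tally(phrases):
    """Count modifications after mapping nouns to semantic markers."""
    return Counter(to_markers(*p) for p in phrases)


# Hypothetical training phrases; both collapse to ("UD", "of", "LE").
training = [("machine", "of", "VM/SP"), ("accessor", "of", "program")]
counts = tally(training)

# A phrase never seen in training still finds statistics via its markers.
unseen = to_markers("accessor", "of", "VM/SP")
```

Because both training phrases map to the same marker pattern, the phrase "accessor of VM/SP", which does not occur in the training data, can still be scored from the shared count.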

Table: Types of semantic markers

Marker  Meaning                            Examples
HM      Human                              human, operator, user, etc.
AM      Animal                             animal, fly, cat
LC      Logical location in the computer   area, storage, file
LE      Computer software related          program, VM/SP, matrix
DM      Document                           abstract, chart, chapter
ST      State                              status, event, mode
TH      Name of a technique                batch, trap
FA      Characteristic                     security, option
AT      Attribute name                     class, count, size
UD      Device                             machine, accessor
WK      Name of a task                     activity, action, creation
TM      Time                               time, past, beginning
PL      Place                              place, bank, company
PN      Person's name                      Shannon
PO      Point                              origin, position, location
OG      Organization                       department, IBM
VA      Attribute value                    degree, column
LP      Logical path                       path, bypass, bus, network
IF      Information                        alarm, answer, comment
PT      Part of a thing                    body, edge, head
ML      Object                             metal, oxide
SL      Computer supplies                  battery, card, pack

F. Effects of the invention
As described above, according to the present invention, when the output of syntactic analysis is ambiguous, the ambiguity can be removed by referring to statistical data. The effort required to collect the statistical data is relatively small, and at run time the method can be implemented with only a simple adder. Determining the part of speech of multi-part-of-speech words and removing ambiguity in modification relations, which were not possible before, can be done easily. Furthermore, by preparing the statistical data as a package for each field, analysis results that depend on the field can also be obtained.

Furthermore, this invention can be applied to any language and to any system that takes natural language sentences as input.
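The run-time disambiguation summarized above, which needs only addition and a comparison, might look like the following sketch. The dictionary contents, the tuple representation of a modification, and the names `accumulate` and `best_parse` are hypothetical; only the idea of summing per-modification scores and keeping the parse with the largest total comes from this description.

```python
def accumulate(parse, dictionary):
    """Add up the stored scores of every modification in one parse (0 if unseen)."""
    return sum(dictionary.get(mod, 0) for mod in parse)


def best_parse(parses, dictionary):
    """Return the candidate parse whose accumulated score is largest."""
    return max(parses, key=lambda p: accumulate(p, dictionary))


# Hypothetical scores, for illustration only.
scores = {
    ("run", "subject", "VM/SP"): 2,
    ("run", "in", "machine"): 2,
    ("VM/SP", "in", "machine"): 1,
}

# Two candidate parses of "VM/SP runs in machine": the prepositional phrase
# modifies either the verb (parse_a) or the noun (parse_b).
parse_a = [("run", "subject", "VM/SP"), ("run", "in", "machine")]
parse_b = [("run", "subject", "VM/SP"), ("VM/SP", "in", "machine")]

chosen = best_parse([parse_a, parse_b], scores)
```

Under these assumed scores, parse_a totals 4 and parse_b totals 3, so parse_a is selected. Swapping in a different per-field dictionary changes the totals without changing the procedure.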

[Brief explanation of the drawing]

FIG. 1 is a diagram showing an example of a computer system that implements an embodiment of the present invention; FIG. 2 is a flowchart showing the overall flow of this embodiment; FIG. 3 is a block diagram showing the main part of this embodiment; and FIGS. 4 to 10 are diagrams for explaining the main part of FIG. 3.

6: syntactic analysis unit, 9: score calculation unit, 11: optimal parse tree discrimination unit.

Applicant: International Business Machines Corporation
Sub-agent: Patent Attorney Sawa 1) Toshio

Claims

[Claims]

(1) A natural language analysis device characterized by comprising: a storage means for storing values representing the likelihood of modifications occurring between constituent elements of a natural language sentence; a syntactic analysis means for parsing a natural language input sentence in one or more possible ways; an accumulating means for accumulating, when two or more syntactic analysis results are obtained, the likelihoods of the modifications appearing in each of the syntactic analysis results; and a discriminating means for determining the syntactic analysis result that maximizes the accumulated value to be the proper sentence analysis result.

(2) The natural language analysis device according to claim (1), wherein the modification between the constituent elements is related to a case structure that actually or potentially appears between the constituent elements.

(3) The natural language analysis device according to claim (2), wherein the storage means includes a first storage means that is accessed for case structures actually appearing in the input sentence, and a second storage means that is accessed for case structures that potentially appear in the input sentence.

(4) The natural language analysis device according to claim (3), wherein the first storage means is configured so that it can be accessed using, as an entry, the content of the case structure, the constituent element of the verb related to the case structure, and the constituent element of the case.

(5) The natural language analysis device according to claim (3) or (4), wherein the second storage means is configured so that it can be accessed using, as an entry, a pair of constituent elements related by a latent case structure.

(6) The natural language analysis device according to any one of claims (1) to (5), wherein some of the constituent elements are classified into one or more groups according to their respective semantic features, and each of the groups is treated as one constituent element.

(7) The natural language analysis device according to claim (6), wherein said some of the constituent elements are nouns.