JPH06332940A

JPH06332940A - Example-driven natural language analysis device

Info

Publication number: JPH06332940A
Application number: JP5122442A
Authority: JP
Inventors: Eiichiro Sumida; 英一郎隅田; Kura Furuse; 蔵古瀬; Hitoshi Iida; 仁飯田
Original assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK; ATR JIDO HONYAKU DENWA
Current assignee: A T R JIDO HONYAKU DENWA KENKYUSHO KK; ATR JIDO HONYAKU DENWA
Priority date: 1993-05-25
Filing date: 1993-05-25
Publication date: 1994-12-02
Anticipated expiration: 2013-09-10
Also published as: JP2796690B2

Abstract

PURPOSE: To select the optimum structure for an ambiguous sentence with high accuracy by a simple method, by selecting the maximum-likelihood input structure according to calculated likelihoods and outputting the selected structure. CONSTITUTION: A natural language sentence is input from an input means 1 and mapped to input structures by an analyzing means 2; when a plurality of input structures are obtained, the likelihood of each structure is calculated. A selecting means 4 selects and outputs the maximum-likelihood input structure according to the calculated likelihoods, so the optimum structure for an ambiguous sentence can be selected with high accuracy. That is, for an ambiguous input sentence, the maximum-likelihood structure is output by referring to examples. Moreover, using a thesaurus enables approximate matching, which reduces the number of examples required. Further, because examples are prepared for each field, the device can easily be tuned per field. The method can also be applied to any language and is easily integrated into many conventional analysis methods (such as CFG and pattern matching).

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an example-driven natural language analysis device, and more particularly to one that is used in machine translation systems, information retrieval systems, question answering systems, and the like, and that can select the optimum structure for an ambiguous sentence with high accuracy by a simple method.

[0002]

[Prior Art] Demand for computer-based language processing systems keeps growing, and research and development is active. Recently, such systems, including machine translation systems, information retrieval systems, and question answering systems, have been commercialized in a variety of fields.

[0003]

[Problems to be Solved by the Invention] Language analysis devices, an essential component of computer-based language processing systems, have long shared one major problem: they cannot select the optimum structure for an ambiguous sentence with high accuracy. The attachment site of a prepositional phrase, for example, is a typical source of structural ambiguity.

[0004] I present a paper at the conference.

The prepositional phrase "at the conference" in this example sentence can modify either the verb "present" or the noun "a paper". In general, it is difficult for a conventional analysis device based on grammar rules to determine the attachment site of a prepositional phrase uniquely. Even when computer processing leaves several possible attachment sites for a preposition, a human can usually judge how plausible each one is. In this example, the former (attachment to the verb) is the more natural reading.

[0005] The approach widely used today specifies attachment preferences by means of semantic markers and similar information in grammar rules and dictionaries. However, apart from the obligatory cases of a verb, this approach cannot handle the optional (free) cases of a verb (for example, the prepositional phrase "at the conference" above) or prepositional phrases that attach to a noun.

[0006] Meanwhile, probabilistic grammars have been studied actively in recent years. Because this approach optimizes the application of grammar rules, resolving prepositional-phrase ambiguity with it requires fine-grained grammar rules close to the word level. Writing such fine-grained rules is not easy. Moreover, even if they could be written, the training time needed for optimization becomes enormous (the computational cost is O(G³), where G is the size of the grammar).

[0007] Another approach resolves structural ambiguity using world knowledge and a discourse model. However, it suffers from problems such as the difficulty of building the knowledge and the heavy processing involved, and no system of this kind can process real-world text.

[0008] The main object of the present invention is therefore to provide an example-driven natural language analysis device that can select the optimum structure for an ambiguous sentence with high accuracy by a simple method.

[0009]

[Means for Solving the Problems] The invention according to claim 1 comprises input means for inputting a natural language sentence, analysis means for mapping the input sentence to an input structure, likelihood calculation means for calculating the likelihood of each structure when a plurality of input structures are obtained, selection means for selecting the maximum-likelihood input structure according to the calculated likelihoods, and output means for outputting the selected maximum-likelihood structure.

[0010] In the invention according to claim 2, the likelihood calculation unit of claim 1 has an example storage unit that stores substructures appearing frequently in sentences of the natural language, and a thesaurus in which words are organized into a tree according to semantic similarity.

[0011] In the invention according to claim 3, the likelihood calculation unit of claim 1 calculates the overall likelihood based on partial likelihoods calculated for each substructure of the input structure.

[0012] In the invention according to claim 4, the partial likelihood of claim 3 is obtained by retrieving from the example storage unit the example substructures whose inter-substructure semantic distance to the input substructure is minimal, and consists of that inter-substructure semantic distance and the number of example substructures retrieved.

[0013] In the invention according to claim 5, the inter-substructure semantic distance of claim 4 is determined based on inter-word semantic distances, which are calculated according to the thesaurus for the words contained in the input substructure and the example substructure.

[0014] In the invention according to claim 6, the example storage unit of claim 2 is configured so that it can be accessed with a substructure as the key.

[0015] In the invention according to claim 7, the calculation of the likelihood and the partial likelihood of claim 1 is either realized as post-processing of the analysis means or incorporated directly into the analysis means.

[0016]

[Operation] The example-driven natural language analysis device according to the present invention receives a natural language sentence from the input means, maps the input sentence to input structures with the analysis means, calculates the likelihood of each structure when a plurality of input structures are obtained, and has the selection means select and output the maximum-likelihood input structure according to the calculated likelihoods. It can thereby select the optimum structure for an ambiguous sentence with high accuracy.

[0017]

[Embodiments] The following description presents two embodiments in which the present invention is applied to the analysis of English. Prepositional-phrase attachment is treated in particular, but provided that examples are collected, the invention is equally effective for other structural ambiguities, such as the attachment of to-infinitives, relative clauses, and subordinate clauses.

[0018] To keep the explanation simple, the running example is the problem of deciding whether the plausible attachment site of the prepositional phrase "at the conference" in the example sentence above is the verb "present" or the noun "a paper".

[0019] FIG. 1 is a schematic block diagram of one embodiment of the present invention, and FIG. 2 shows a concrete example of the likelihood calculation unit shown in FIG. 1.

[0020] The input unit 1 consists of a keyboard, a character recognition device, a speech recognition device, or the like. It accepts a sentence, segments it into words, attaches information such as part of speech to each word according to a dictionary, and passes the result to the analysis unit 2. The analysis unit 2 generates the possible structures (input structures) for the input by a conventional technique such as syntactic parsing or pattern matching. As shown in FIG. 2, the likelihood calculation unit 3 includes an example storage unit 6 and a thesaurus 7, and calculates the likelihood of each input structure. The likelihood calculation unit 3 is the characteristic part of the present invention and is described in detail later. The selection unit 4 selects the maximum-likelihood structure from the likelihood-annotated structures produced by the likelihood calculation unit 3 and outputs it to the output unit 5. The output unit 5 consists of a display device, a printing device, or the like.

[0021] The likelihood calculation unit 3 calculates the overall likelihood based on partial likelihoods calculated for each substructure of the input structure. A partial likelihood is obtained by retrieving from the example storage unit 6 the example substructures whose inter-substructure semantic distance (a real value) to the input substructure is minimal; it consists of that inter-substructure semantic distance and the number (frequency) of example substructures retrieved. The inter-substructure semantic distance is determined from inter-word semantic distances, which are calculated according to the thesaurus 7 for the words contained in the input substructure and the example substructure. The determination of the partial likelihood is now described in detail.

[0022] FIG. 3 shows the two input structures corresponding to the ambiguous attachment in the example sentence; to simplify the explanation, they are drawn in what is called a dependency structure representation. FIG. 3(a) corresponds to attachment to the verb "present" and FIG. 3(b) to attachment to the noun "paper"; the dependency in question is drawn as a bold line. The likelihood calculation unit 3 calculates the likelihood of these two possibilities, namely "present, at, the conference" and "a paper, at, the conference". In doing so it refers to the example storage unit 6 and the thesaurus 7 shown in FIG. 2. The example storage unit 6 is configured so that it can be accessed with a substructure as the key; for the prepositions discussed here, a substructure is a triple consisting of a verb (or noun), a preposition, and a noun.

[0023] FIG. 4 shows an example of the example storage unit: FIG. 4(a) shows attachment to verbs and FIG. 4(b) shows attachment to nouns. FIG. 5 illustrates part of the thesaurus and the calculation of the semantic distance between words.

[0024] The thesaurus 7 is a dictionary in which words are organized systematically according to shared semantic concepts. Part of it is shown in FIG. 5. The inter-word semantic distance is calculated using the thesaurus 7: it is defined as the inter-concept semantic distance on the thesaurus 7. The inter-concept semantic distance takes a value from 0 to 1 according to the position of the most specific common superordinate concept in the thesaurus 7. A value of 0 means that the two concepts are identical, and a value of 1 means that they are unrelated. For a thesaurus with (n + 1) levels, the distances 0, 1/n, 2/n, ..., 1 are assigned from the bottom level upward. The thesaurus 7 of FIG. 5 has (3 + 1) levels, to which 0, 1/3, 2/3, and 1 are assigned. For example, the most specific common superordinate concept of the two concepts "human" and "monkey" is "animal", so their distance is 1/3. Accordingly, the distance between the word "she", which has the concept "human", and the word "chimpanzee", which has the concept "monkey", is 1/3. In FIG. 5, d(x, y) denotes the semantic distance between words (or concepts) x and y.

[0025] The inter-substructure semantic distance is calculated as the sum of the word distances, each multiplied by a weight. If the input substructure I and the example substructure E each consist of a sequence of n words, I_k and E_k respectively, the inter-substructure semantic distance d(I, E) is calculated by the following equation.

[0026]

[Equation 1]

$$ d(I, E) = \sum_{k=1}^{n} w_k \, d(I_k, E_k) \qquad (1) $$

[0027] The weights w_k need to be tuned so that the rate of correct answers is high, but for simplicity w_k = 1 is used here.

[0028] Next, the likelihood of the entire input structure is described. Consider first the simple case in which the input sentence contains only one prepositional phrase; the likelihood of the whole structure then coincides with the partial likelihood. The selection unit 4 compares these likelihoods across the candidate structures.

[0029] Next, the concrete processing is described. Let n be the number of attachment candidates, x_i (1 ≤ i ≤ n) the candidates (verbs or nouns), p the preposition, and y the object (noun) of p. The problem is, given the input "x_1, ..., x_n, p, y", to choose the most likely attachment site x_k for "p, y". In the example sentence, the input is "present, a paper, at, the conference" and the correct answer is "present".

[Processing procedure]

(Step 1) Repeat the following for each i (1 ≤ i ≤ n).

[0030] Calculate the inter-substructure semantic distance to "x_i p y" according to Equation (1) above, and retrieve the minimum-distance examples from the example storage unit 6. Store the minimum distance d_i and the frequency f_i of the examples at that distance.

[0031] (Step 2) If exactly one x_i minimizes d_i, return that x_i as the attachment site and terminate.

[0032] (Step 3) If, among the x_i that minimize d_i, exactly one maximizes f_i, return that x_i as the attachment site and terminate.

[0033] (Step 4) Return all x_i that minimize d_i as attachment sites and terminate.

Concrete example 1: termination at step 2.

[0034] I present a paper at the conference.

The minimum distance for "present, at, the conference" is 0.00, and the minimum distance for "a paper, at, the conference" is 0.33. The former is therefore chosen.

Concrete example 2: termination at step 3.

[0035] We have the next conference at the hotel.

In this case, the minimum distances of the two candidates "have, at, the hotel" and "the next conference, at, the hotel" are both 0.17. The minimum-distance example for the former is "hold, at, hotel", with a frequency of 92, and the minimum-distance example for the latter is "meeting, at, hotel", with a frequency of 15. The attachment site of "at, the hotel" is therefore judged to be "have".

[0036] The procedure above is roughly equivalent to minimizing a composite likelihood that can be calculated from the inter-example semantic distance d_i and the frequency f_i of the examples at that distance, for example d_i − f_i/10^m (m depends on the size of the example storage unit 6, but empirically m = 6 is sufficient).

[0037] When the input contains more than one preposition, the most likely attachment sites are determined according to the sum of the composite likelihoods of all the prepositions. Next, the second embodiment is described. The second embodiment is basically the same as the first. The difference is that the calculation of the likelihood and the partial likelihood is incorporated into the analysis unit 2: attachment candidates are extracted when a grammar rule is applied, and the partial likelihood is calculated at the same time.

[0038] Grammar rules such as the following are provided.

(1) VP PP → VP
(2) NP PP → NP

Suppose the input sentence is the following example.

[0039] I present a paper at the conference.

When grammar rule (1) is applied, the substructure "present, at, the conference" can be extracted, and the partial likelihood of that substructure is calculated. When grammar rule (2) is applied, the substructure "a paper, at, the conference" can be extracted, and the partial likelihood of that substructure is calculated. When the analysis finishes, the likelihood of the whole structure has been obtained. In all other respects the second embodiment is the same as the first.

[0040]

[Effects of the Invention] As described above, according to the present invention a natural language sentence is input and mapped to input structures, the likelihood of each structure is calculated when a plurality of input structures are obtained, and the maximum-likelihood input structure is selected and output according to those likelihoods. The maximum-likelihood structure for an ambiguous input sentence can therefore be output by referring to examples. Moreover, using a thesaurus enables approximate matching, so fewer examples are needed. Further, if examples are prepared for each field, the device can easily be tuned per field. The method can also be applied to any language and can easily be incorporated into many conventional analysis methods (CFG, pattern matching, and so on).

[Brief description of drawings]

[FIG. 1] A block diagram outlining a first embodiment of the example-driven natural language analysis device of the present invention.

[FIG. 2] A block diagram outlining the likelihood calculation unit shown in FIG. 1.

[FIG. 3] A diagram showing the possible input structures for the ambiguous input sentence "I present a paper at the conference."

[FIG. 4] A diagram showing part of the example storage unit; (a) shows attachment to verbs and (b) shows attachment to nouns.

[FIG. 5] A diagram illustrating part of the thesaurus and the calculation of the inter-word semantic distance.

[Explanation of symbols]

1: input unit; 2: analysis unit; 3: likelihood calculation unit; 4: selection unit; 5: output unit; 6: example storage unit; 7: thesaurus

Continuation of front page
(72) Inventor: Kura Furuse, c/o ATR Interpreting Telephony Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto
(72) Inventor: Hitoshi Iida, c/o ATR Interpreting Telephony Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

Claims

[Claims]

1. An example-driven natural language analysis device comprising: input means for inputting a natural language sentence; analysis means for mapping the sentence input by the input means to an input structure; likelihood calculation means for calculating the likelihood of each structure when a plurality of input structures are obtained by the analysis means; selection means for selecting the maximum-likelihood input structure according to the likelihoods calculated by the likelihood calculation means; and output means for outputting the maximum-likelihood structure selected by the selection means.

2. The example-driven natural language analysis device according to claim 1, wherein the likelihood calculation means includes an example storage unit that stores example substructures appearing frequently in sentences of the natural language, and a thesaurus in which words are organized into a tree according to semantic similarity.

3. The example-driven natural language analysis device according to claim 1, wherein the likelihood calculation means calculates the overall likelihood based on partial likelihoods calculated for each substructure of the input structure.

4. The example-driven natural language analysis device according to claim 3, wherein the partial likelihood is obtained by retrieving from the example storage unit the example substructures whose inter-substructure semantic distance to the input substructure is minimal, and consists of that inter-substructure semantic distance and the number of example substructures retrieved.

5. The example-driven natural language analysis device according to claim 4, wherein the inter-substructure semantic distance is determined based on inter-word semantic distances calculated according to the thesaurus for the words contained in the input substructure and the example substructure.

6. The example-driven natural language analysis device according to claim 2, wherein the example storage unit is accessible with a substructure as the key.

7. The example-driven natural language analysis device according to claim 1, wherein the calculation of the likelihood and the partial likelihood is either realized as post-processing of the analysis means or incorporated directly into the analysis means.