JP2796690B2

JP2796690B2 - Example-driven natural language analyzer

Info

Publication number: JP2796690B2
Application number: JP5122442A
Authority: JP
Inventors: 英一郎隅田; 蔵古瀬; 仁飯田
Original assignee: EI TEI AARU JIDO HONYAKU DENWA KENKYUSHO KK
Current assignee: EI TEI AARU JIDO HONYAKU DENWA KENKYUSHO KK
Priority date: 1993-05-25
Filing date: 1993-05-25
Publication date: 1998-09-10
Anticipated expiration: 2013-09-10
Also published as: JPH06332940A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to an example-driven natural language analyzer, and more particularly to one that is used in machine translation systems, information retrieval systems, question answering systems, and the like, and that can select the optimal structure for an ambiguous sentence with high accuracy by a simple method.

【０００２】[0002]

[Prior Art] Demand for computer-based language processing systems keeps growing, and research and development in this area is active. Recently, machine translation systems have also come to be commercialized in a variety of fields, such as information retrieval systems and question answering systems.

【０００３】[0003]

[Problems to Be Solved by the Invention] The language analyzer, an essential component of any computer-based language processing system, has long had one major problem in common: it cannot select the optimal structure for an ambiguous sentence with high accuracy. The attachment site of a prepositional phrase is a typical source of such structural ambiguity.

[0004] I present a paper at the conference.

In this example sentence, the prepositional phrase "at the conference" can modify either the verb "present" or the noun "a paper". In general, a conventional analyzer based on grammar rules has difficulty determining the attachment site of a prepositional phrase uniquely. Even when computer processing leaves several attachment sites open for a preposition, a human can usually judge how plausible each one is. In this example, the former (the verb "present") is the more natural attachment site.

[0005] A widely used approach today specifies attachment preferences by adding semantic markers and the like to the grammar rules and the dictionary. This approach can handle the obligatory cases of a verb, but it cannot handle a verb's optional cases (for example, the prepositional phrase "at the conference" above) or prepositional phrases that attach to a noun.

[0006] Meanwhile, probabilistic grammars have been studied actively in recent years. Because this approach optimizes the application of grammar rules, resolving prepositional-phrase ambiguity with it requires fine-grained grammar rules at a level close to individual words. Writing such fine-grained rules is not easy. Moreover, even if they could be written, the training time needed for optimization would be enormous (the computational cost is O(G³), where G is the size of the grammar).

[0007] Another approach resolves structural ambiguity with world knowledge and a discourse model. However, this approach suffers from problems such as the difficulty of building the knowledge and the heavy processing load, and no system of this kind can process real-world text.

[0008] Accordingly, a main object of the present invention is to provide an example-driven natural language analyzer that can select the optimal structure for an ambiguous sentence with high accuracy by a simple method.

【０００９】[0009]

[Means for Solving the Problems] The invention according to claim 1 comprises:

input means for inputting a natural language sentence;

analysis means for mapping the input sentence onto input substructures;

example storage means for storing example substructures that appear frequently, found by examining the whole of the input natural language sentences;

a thesaurus in which words are organized in advance into a tree based on similarity of meaning, and which is used to obtain inter-word semantic distances according to the semantic concepts of the words;

likelihood calculation means that, based on the mapped input structure, obtains an inter-substructure semantic distance from the semantic distances calculated according to the thesaurus for the words contained in an input substructure and an example substructure, retrieves the example substructure that minimizes the inter-substructure semantic distance, calculates the frequency with which the example substructure minimizing that distance appears, and outputs the calculated inter-substructure semantic distance and frequency as a partial likelihood;

selection means for selecting the maximum-likelihood input structure, that is, the most plausible one, according to the partial likelihood consisting of the calculated inter-substructure distance and frequency; and

output means for outputting the selected maximum-likelihood structure.

【００１０】[0010]

【００１１】[0011]

【００１２】[0012]

【００１３】[0013]

【００１４】[0014]

【００１５】[0015]

【００１６】[0016]

[Operation] In the example-driven natural language analyzer according to the present invention, a natural language sentence is input from the input means and mapped onto input structures by the analysis means. Example substructures that appear frequently, found by examining the whole of the input natural language sentences, are stored in advance. Based on the mapped input structure, inter-word semantic distances are obtained from the thesaurus, the example substructure that minimizes the distance is retrieved, the frequency with which the example substructure minimizing the inter-substructure semantic distance appears is calculated, and a likelihood that determines the plausibility of each structure is obtained. The maximum-likelihood input structure, that is, the most plausible one, is then selected according to that likelihood and output, so that the optimal structure for an ambiguous sentence can be selected with high accuracy.

【００１７】[0017]

[Embodiments] The following description presents two embodiments in which the present invention is applied to the analysis of English. The discussion focuses on prepositional attachment, but once examples have been collected the invention is equally effective for other structural ambiguities, such as the attachment of to-infinitives, relative clauses, and subordinate clauses.

[0018] To keep the explanation simple, the running example is the problem of deciding whether the plausible attachment site of the prepositional phrase "at the conference" in the example sentence above is the verb "present" or the noun "a paper".

[0019] FIG. 1 is a schematic block diagram of one embodiment of the present invention, and FIG. 2 shows a specific example of the likelihood calculation unit shown in FIG. 1.

[0020] First, the input unit 1 consists of a keyboard, a character recognition device, a speech recognition device, or the like. It accepts a sentence, at the same time segments the sentence into words, attaches information such as part of speech to each word according to a dictionary, and passes the result to the analysis unit 2. The analysis unit 2 generates the possible structures for the input (input structures) by a conventional technique such as syntactic parsing or pattern matching. As shown in FIG. 2, the likelihood calculation unit 3 includes an example storage unit 6 and a thesaurus 7 and calculates the likelihood of each input structure. The likelihood calculation unit 3 is the characteristic part of the present invention and is described in detail later. The selection unit 4 selects the maximum-likelihood structure from the likelihoods calculated by the likelihood calculation unit 3 and the tree structures, and outputs the selected structure to the output unit 5. The output unit 5 consists of a display device, a printing device, or the like. The terms likelihood, composite likelihood, and maximum likelihood used in this embodiment are defined for the purpose of determining the plausibility of a structure.

[0021] The likelihood calculation unit 3 calculates the overall likelihood from the partial likelihoods calculated for each substructure of the input structure. To obtain a partial likelihood, the example substructure whose inter-substructure semantic distance (a real value) to the input substructure is smallest is retrieved from the example storage unit 6; the partial likelihood consists of that inter-substructure semantic distance and the number (frequency) of example substructures retrieved. The inter-substructure semantic distance is determined from the inter-word semantic distances calculated according to the thesaurus 7 for the words contained in the input substructure and the example substructure. The determination of the partial likelihood is described in detail below.

[0022] FIG. 3 shows the two input structures corresponding to the ambiguous attachment in the example sentence; to simplify the explanation, they are drawn in what is called a dependency structure representation. FIG. 3(a) corresponds to attachment to the verb "present" and FIG. 3(b) to attachment to the noun "paper", with the dependency in question drawn as a thick line. The likelihood calculation unit 3 calculates the likelihoods of these two possibilities, namely "present, at, the conference" and "a paper, at, the conference". In doing so it refers to both the example storage unit 6 and the thesaurus 7 shown in FIG. 2. The example storage unit 6 is organized so that it can be accessed with a substructure as the key; for the prepositions discussed here, a substructure is a triple consisting of a verb (or noun), a preposition, and a noun.

[0023] FIG. 4 shows an example of the example storage unit: FIG. 4(a) shows the case of attachment to a verb, and FIG. 4(b) the case of attachment to a noun. FIG. 5 illustrates part of the thesaurus and the calculation of the semantic distance between words.

[0024] The thesaurus 7 is a dictionary in which words are organized systematically according to shared semantic concepts. Part of it is shown in FIG. 5. Inter-word semantic distances are calculated using the thesaurus 7. In this method, the semantic distance between two words is defined as the semantic distance between their concepts in the thesaurus 7. The inter-concept semantic distance takes a value from 0 to 1 according to the position of the lowest common superordinate concept in the thesaurus 7. A value of 0 means that the two concepts are identical, and a value of 1 means that they are unrelated. With (n + 1) levels, the distances 0, 1/n, 2/n, ..., 1 are assigned from the bottom level upward. The thesaurus 7 in FIG. 5 has (3 + 1) levels, so the distances 0, 1/3, 2/3, and 1 are assigned. For example, the lowest common superordinate concept of the two concepts "human" and "monkey" is "animal", so their distance is 1/3. Accordingly, the distance between the word "she", which has the concept "human", and the word "chimpanzee", which has the concept "monkey", is 1/3. In FIG. 5, d(x, y) denotes the semantic distance between words (or concepts) x and y.
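The distance rule in paragraph [0024] can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the tree below is a hypothetical four-level fragment in which only "human", "monkey", "animal", "she", and "chimpanzee" come from the text, every other node and word is invented, and all leaf concepts are assumed to sit at the bottom level. The names `PARENT`, `WORD_CONCEPT`, `concept_distance`, and `word_distance` are likewise illustrative and do not appear in the patent.

```python
from fractions import Fraction

# Hypothetical thesaurus fragment with (3 + 1) levels, stored as child -> parent.
# Only "human", "monkey" and "animal" are taken from the text; the rest is invented.
PARENT = {
    "human": "animal", "monkey": "animal", "tree": "plant",
    "animal": "organism", "plant": "organism", "organism": "ROOT",
    "tool": "artifact", "artifact": "object", "object": "ROOT",
}
# Hypothetical word -> leaf-concept assignments ("oak" and "hammer" are invented).
WORD_CONCEPT = {"she": "human", "chimpanzee": "monkey", "oak": "tree", "hammer": "tool"}
N = 3  # the thesaurus has N + 1 levels


def ancestors(concept):
    """Return the chain from a concept up to the root, the concept itself first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_distance(a, b):
    """Level of the lowest common superordinate concept, divided by N.

    Assumes a and b are both bottom-level (leaf) concepts.
    """
    chain_b = set(ancestors(b))
    for level, node in enumerate(ancestors(a)):
        if node in chain_b:
            return Fraction(level, N)
    raise ValueError("concepts do not share a root")


def word_distance(x, y):
    """Inter-word distance, defined as the distance between the words' concepts."""
    return concept_distance(WORD_CONCEPT[x], WORD_CONCEPT[y])
```

With this fragment, `word_distance("she", "chimpanzee")` evaluates to 1/3, matching the example in the text, while the invented pairs give 2/3 ("she", "oak") and 1 ("she", "hammer").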

[0025] The inter-substructure semantic distance is calculated as the sum of the word distances, each multiplied by a weight. If the input substructure I and the example substructure E consist of sequences of n words I_k and E_k respectively, the inter-substructure semantic distance d(I, E) is calculated by the following equation.

【００２６】[0026]

[Equation 1]

$$d(I, E) = \sum_{k=1}^{n} d(I_k, E_k) \times w_k \qquad (1)$$

[0027] The weights w_k need to be tuned so that the accuracy rate becomes high, but here w_k = 1 is assumed for simplicity.
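As a minimal sketch of the weighted sum described above, the following Python function computes d(I, E) for two equally long word sequences. The function name, the `word_distance` callback, and the toy distance table are assumptions made for illustration; the numeric values in the table are invented and are not taken from the patent's thesaurus.

```python
def substructure_distance(input_words, example_words, word_distance, weights=None):
    """Weighted sum of word-by-word distances between two substructures."""
    if len(input_words) != len(example_words):
        raise ValueError("substructures must contain the same number of words")
    if weights is None:
        weights = [1] * len(input_words)  # w_k = 1, as assumed in the text
    return sum(w * word_distance(i, e)
               for i, e, w in zip(input_words, example_words, weights))


# Invented toy distances, used only to exercise the function.
TOY_TABLE = {("present", "give"): 0.25, ("conference", "meeting"): 0.5}


def toy_word_distance(a, b):
    """Look up a toy distance; identical words are at distance 0."""
    if a == b:
        return 0.0
    return TOY_TABLE.get((a, b), TOY_TABLE.get((b, a), 1.0))


d = substructure_distance(("present", "at", "conference"),
                          ("give", "at", "meeting"),
                          toy_word_distance)
```

Passing the distance function as a parameter keeps the sum independent of how the thesaurus is stored, and the optional `weights` argument leaves room for the tuning of w_k that the text mentions.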

[0028] Next, the likelihood of the entire input structure is described. Consider first the simple case in which the input sentence contains only one prepositional phrase; the likelihood of the entire structure then coincides with the partial likelihood. The selection unit 4 performs the comparison of these likelihoods.

[0029] Next, the concrete processing is described. Let n be the number of attachment candidates, x_i (1 <= i <= n) the candidates (verbs or nouns), p the preposition, and y the object (noun) of p. The problem is to select, for the input "x_1, ..., x_n, p, y", the maximum-likelihood attachment site "x_k" of "p, y". In the example sentence, the input is "present, a paper, at, the conference" and the correct answer is "present".

[Processing procedure]

(Step 1) Repeat the following for each i (1 <= i <= n).

[0030] Calculate the inter-substructure semantic distance to "x_i p y" according to equation (1) above, and retrieve the minimum-distance example substructures from the example storage unit 6. Store the minimum distance d_i and the frequency f_i of the example substructures at that distance.

[0031] (Step 2) If exactly one x_i minimizes d_i, return that x_i as the attachment site and terminate.

[0032] (Step 3) If, among the x_i that minimize d_i, exactly one has the largest f_i, return that x_i as the attachment site and terminate.

[0033] (Step 4) Return all x_i that minimize d_i as attachment sites and terminate.

Specific example 1: termination at step 2.

[0034] I present a paper at the conference.

The minimum distance for "present, at, the conference" is 0.00, and the minimum distance for "a paper, at, the conference" is 0.33. The former is therefore chosen.

Specific example 2: termination at step 3.

[0035] We have the next conference at the hotel.

In this case, the minimum distances of the two candidates "have, at, the hotel" and "the next conference, at, the hotel" were both 0.17. The minimum-distance example for the former is "hold, at, hotel" with a frequency of 92, and that for the latter is "meeting, at, hotel" with a frequency of 15, so the attachment site of "at the hotel" is judged to be "have".

[0036] The above procedure is nearly equivalent to minimizing a composite likelihood that can be computed from the inter-substructure semantic distance d_i and the frequency f_i of the examples at that distance, for example d_i - f_i/10^m (m depends on the size of the example storage unit 6, but empirically m = 6 is sufficient).
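A short Python sketch shows how the composite likelihood d_i - f_i/10^m reproduces the outcome of specific example 2. The function name and the dictionary layout are illustrative assumptions; the distances and frequencies are the ones quoted in the text.

```python
def composite_likelihood(d, f, m=6):
    """Composite likelihood d - f / 10**m; smaller values are preferred."""
    return d - f / 10 ** m


# Values quoted in specific example 2: equal distances, different frequencies.
example_2 = {"have": (0.17, 92), "the next conference": (0.17, 15)}
best = min(example_2, key=lambda x: composite_likelihood(*example_2[x]))
```

Here `best` is `"have"`, because the frequency term only breaks the tie between equal distances. The equivalence is approximate: it holds as long as f_i/10^m stays smaller than the smallest gap between distinct distances, and a single minimum cannot express the case in step 4 where several candidates remain tied.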

[0037] The maximum-likelihood attachment sites are determined according to the sum of the composite likelihoods of all prepositions.

Next, the second embodiment is described. The second embodiment is basically the same as the first. The difference is that the calculation of the likelihood and the partial likelihoods is built into the analysis unit 2: attachment candidates are extracted when a grammar rule is applied, and the partial likelihood is calculated at the same time.

[0038] The following grammar rules are provided.

(1) VP PP → VP
(2) NP PP → NP

Suppose the input is the following example sentence.

[0039] I present a paper at the conference.

When grammar rule (1) is applied, the substructure "present, at, the conference" can be extracted, and the partial likelihood of that substructure is calculated. When grammar rule (2) is applied, the substructure "a paper, at, the conference" can be extracted, and its partial likelihood is calculated. When the analysis finishes, the likelihood of the entire structure has been obtained. In all other respects this embodiment is the same as the first embodiment.

【００４０】[0040]

[Effects of the Invention] As described above, according to the present invention a natural language sentence is input and mapped onto input structures, the likelihood of each structure is calculated when several input structures are obtained, and the maximum-likelihood input structure is selected according to those likelihoods and output. The maximum-likelihood structure for an ambiguous input sentence can therefore be output by referring to examples. Moreover, using the thesaurus makes approximate matching possible, so fewer examples are needed. In addition, if examples are prepared for each domain, the analyzer can easily be tuned domain by domain. The technique can be applied to any language and can easily be incorporated into many conventional analysis methods (such as CFG parsing and pattern matching).

[Brief description of the drawings]

FIG. 1 is a block diagram outlining the first embodiment of the example-driven natural language analyzer of the present invention.

FIG. 2 is a block diagram outlining the likelihood calculation unit shown in FIG. 1.

FIG. 3 shows the possible input structures for the ambiguous input sentence "I present a paper at the conference."

FIG. 4 shows part of the example storage unit; (a) shows the case of attachment to a verb, and (b) the case of attachment to a noun.

FIG. 5 illustrates part of the thesaurus and the calculation of the semantic distance between words.

[Explanation of symbols]

1 Input unit
2 Analysis unit
3 Likelihood calculation unit
4 Selection unit
5 Output unit
6 Example storage unit
7 Thesaurus

Continuation of front page

(72) Inventor: Hitoshi Iida (飯田仁), c/o EI TEI AARU JIDO HONYAKU DENWA KENKYUSHO KK, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

(56) References cited: JP-A-4-47364 (JP, A); JP-A-3-276367 (JP, A); JP-A-63-91776 (JP, A)

(58) Fields searched (Int. Cl.⁶, DB name): G06F 17/20 - 17/28; JICST file (JOIS)

Claims

(57) [Claims]

[Claim 1] An example-driven natural language analyzer comprising:

input means for inputting a natural language sentence;

analysis means for mapping the input sentence input by the input means onto input substructures;

example storage means for storing example substructures that appear frequently, found by examining the whole of the natural language sentences input from the input means;

a thesaurus in which words are organized in advance into a tree based on similarity of meaning, for obtaining inter-word semantic distances according to the semantic concepts of the words;

likelihood calculation means for, based on the input structure mapped by the analysis means, obtaining an inter-substructure semantic distance from the semantic distances calculated according to the thesaurus for the words contained in the input substructure and the example substructure, retrieving from the example storage means the example substructure that minimizes the inter-substructure semantic distance, calculating the frequency with which the example substructure minimizing the inter-substructure semantic distance appears, and outputting the calculated inter-substructure semantic distance and frequency as a partial likelihood;

selection means for selecting the maximum-likelihood input structure, that is, the most plausible one, according to the partial likelihood consisting of the inter-substructure distance and the frequency calculated by the likelihood calculation means; and

output means for outputting the maximum-likelihood structure selected by the selection means.