JPH09160915A - Contextual processing system for natural language - Google Patents

Contextual processing system for natural language

Info

Publication number
JPH09160915A
JPH09160915A · JP7318433A · JP31843395A
Authority
JP
Japan
Prior art keywords
dependency
data
sentence
confidence
relative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP7318433A
Other languages
Japanese (ja)
Inventor
Takesuke Hiraoka
丈介 平岡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP7318433A priority Critical patent/JPH09160915A/en
Publication of JPH09160915A publication Critical patent/JPH09160915A/en
Pending legal-status Critical Current


Abstract

PROBLEM TO BE SOLVED: To provide a contextual processing system for natural language that can resolve ambiguity in syntactic structure and select the correct analysis result. SOLUTION: Syntactic and semantic analysis of a sentence is performed (steps S1 and S2), the sentence is decomposed into word dependency elements, the elements are consolidated, and context space data is created in which each element carries a relative confidence weighted by its number of occurrences and by the evaluation score from semantic analysis (S3). A dependency element with a high relative confidence is treated as trusted data: when the same dependency element appears in another sentence, its relative confidence is raised, and when a dependency element identical to one with a very low relative confidence appears in another sentence, confidence adjustment processes (A, B and C) lower its relative confidence (S4 to S6). From the context space data finally obtained by these processes, high-confidence data is selected and the syntax tree is chosen (S7).

Description

[Detailed Description of the Invention]

[0001]

[Technical Field of the Invention] The present invention relates to a processing method for the semantic analysis and context analysis stages that judge the validity of dependencies between words, within the analysis algorithm of an information processing apparatus that takes natural language as input and performs some task by analyzing its syntactic structure and semantic content.

[0002]

[Prior Art] Japanese-language analysis is generally considered in stages: morphological analysis, syntactic analysis, semantic analysis, and context analysis. When grammatically well-formed syntax trees are generated in the syntactic analysis stage, a plurality of syntax trees is generally obtained. Semantic analysis is then performed to select the correct syntax tree from among them. In semantic analysis, the dependencies between words are examined to judge whether each dependency is semantically valid.

[0003] In practice, however, it is difficult for semantic analysis to yield a 100% correct result, and context analysis is performed to address this problem. Context analysis is nevertheless a technically immature field; at present, processing typically stops at semantic analysis, and processing that targets context is seldom performed.

[0004]

[Problems to be Solved by the Invention] Conventional semantic analysis sometimes fails to produce a correct syntactic analysis result. In some cases this is because natural language inherently permits ambiguity, which makes semantic analysis difficult; but another major factor is that processing relies only on what is written in the sentence under consideration and does not use information from the preceding and following sentences. In other words, the major problem is that no framework has been realized that makes good use of contextual information.

[0005] The present invention has been made in view of the above points, and its object is to provide a contextual processing method for natural language that can resolve ambiguity in syntactic structure and select the correct analysis result. That is, the present invention provides a context analysis process that handles a plurality of sentences and uses not only the sentence under consideration but also the content of the sentences before and after it to resolve dependency ambiguity. The cases in which the present invention is effective are limited, but it also proposes a basic framework on which various forms of context processing can be developed in the future.

[0006]

[Means for Solving the Problems] To achieve the above object, the present invention performs syntactic and semantic analysis on a text, decomposes it into word dependency elements, and consolidates these elements to create context space data in which each element carries a relative confidence weighted by its number of occurrences and by the evaluation score from semantic analysis. A dependency element with a high relative confidence is used as trusted data: when the same dependency element appears in another sentence, its relative confidence is raised, and when a dependency element identical to one with a very low relative confidence appears in another sentence, its relative confidence is lowered. From the context space data finally obtained by this confidence adjustment processing, high-confidence data is selected and the syntax tree is chosen.

[0007]

[Embodiments of the Invention] Embodiments of the present invention will be described below with reference to the drawings. In the present invention, a sentence is analyzed syntactically and semantically, and the result is decomposed into and stored as pairs of words in a dependency relation. Each dependency pair is called a dependency element, and the data obtained by consolidating them in the manner described below is called context space data. When a sentence is analyzed syntactically and semantically, a plurality of syntax trees is generally generated. Each syntax tree is compared with the context space data obtained from other sentences: it is checked whether the tree contains the same dependency elements as those present in the context space data, and this is used as a criterion for selecting a syntax tree.
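The data organization just described can be made concrete. The following is a minimal sketch in Python under the assumption that a context space is simply a root subspace of confidence 1.0; the class and field names are illustrative choices, not terminology from the patent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DependencyElement:
    """One dependency pair: the two word notations, the particle,
    a semantic-relation symbol, and the syntax trees containing it."""
    modifier: str                      # dependent word, e.g. "機器クラス"
    head: str                          # governing word, e.g. "発行"
    relation: str                      # semantic-relation symbol, e.g. "用"
    particle: str = ""                 # e.g. "に対して"
    tree_ids: List[int] = field(default_factory=list)

@dataclass
class Subspace:
    """A group of dependency elements belonging to one reading.
    Sibling subspaces carry relative confidences that sum to 1.0;
    a whole context space is a root subspace of confidence 1.0."""
    confidence: float
    elements: List[DependencyElement] = field(default_factory=list)
    children: List["Subspace"] = field(default_factory=list)
```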

[0008] This will next be described in detail with an example. Suppose that multiple syntax trees are obtained from a sentence. Examining their contents in detail, it can be seen that some dependencies are contained in all of the syntax trees, while others appear in only a few of them.

[0009] For example, suppose that analyzing the sentence (Sentence A) "管理クラスは存在する全ての機器クラスに対してリンクを張る命令を発行する" ("The management class issues an instruction to set up a link to all existing device classes") yields the three results shown in Table 1 below.

[0010]

[Table 1]

[0011] Decomposing these dependency relations gives Table 2.

[0012]

[Table 2]

[0013] A dependency element consists of data such as the notations of the two words, a symbol representing the semantic relation between them, and the numbers of the syntax trees that contain the element. Five dependencies, such as "命令を発行する" ("issues an instruction") and "管理クラスは発行する" ("the management class issues"), appear in every syntax tree, so they can be judged to be certain dependencies. The other dependencies are considered to have a relative confidence that depends on their number of occurrences, that is, on how many of the generated syntax trees they appear in.

[0014] Furthermore, since each syntax tree is given an evaluation score by semantic analysis, the differences between the evaluation scores are used to weight the occurrence counts. The scores in Table 2 are the values obtained in this way, and the relative confidence of each dependency element is computed from them. The dependency elements are then consolidated in descending order of relative confidence to build the kind of tree-structured data shown in Table 3 below (this is the context space data).
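The text does not spell out numerically how the occurrence counts and evaluation scores combine, so the following sketch shows one plausible reading: each occurrence is weighted by its tree's evaluation score and the totals are normalized over the competing dependencies for the same modifier. The weighting scheme is an assumption for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def relative_confidence(
    occurrences: Dict[Tuple[str, str], List[int]],
    tree_scores: Dict[int, float],
) -> Dict[Tuple[str, str], float]:
    """occurrences maps (modifier, head) -> ids of trees containing it;
    tree_scores maps tree id -> evaluation score from semantic analysis."""
    # Weight each dependency by the summed evaluation scores of the
    # trees in which it appears.
    weighted = {dep: sum(tree_scores[t] for t in trees)
                for dep, trees in occurrences.items()}
    # Normalize over the competing attachments of the same modifier,
    # so that the alternatives for one word sum to 1.
    totals = defaultdict(float)
    for (modifier, _), w in weighted.items():
        totals[modifier] += w
    return {dep: w / totals[dep[0]] for dep, w in weighted.items()}

# In the spirit of Table 3: if "存在する" attaches to "機器クラス" in trees
# carrying more weight than its attachment to "命令", the ratios come out
# in the vicinity of 0.783 vs. 0.217.
```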

[0015]

[Table 3]

[0016] In Table 3, for example, the relative confidence that the word "存在する" ("existing") modifies "機器クラス" ("device class"), giving "存在する機器クラス" ("existing device class"), is 0.783, while the relative confidence that it modifies "命令" ("instruction"), giving "存在する命令" ("existing instruction"), is 0.217.

[0017] Furthermore, within the data of relative confidence 0.783 (called a subspace), the relative confidence that "機器クラス" depends on "発行する", as in "機器クラスに対して発行する" ("issues ... to the device class"), is 0.500, and the relative confidence that it depends on "張る", as in "機器クラスに対して張る" ("sets up ... to the device class"), is 0.500. The analysis proceeds while creating such context space data for each sentence.
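Continuing the dataclass sketch above, the part of Sentence A's context space quoted in paragraphs [0016] and [0017] could be written out as follows; the relation labels, particle slots, and tree numbers are illustrative assumptions.

```python
# The root subspace (confidence 1.0) would also hold the five certain
# dependencies such as 「命令を発行する」; they are omitted here.
context_space_a = Subspace(confidence=1.0, children=[
    Subspace(confidence=0.783, elements=[
        DependencyElement("存在する", "機器クラス", "連体"),   # 「存在する機器クラス」
    ], children=[
        Subspace(confidence=0.500, elements=[
            DependencyElement("機器クラス", "発行", "用", "に対して", tree_ids=[3]),
        ]),
        Subspace(confidence=0.500, elements=[
            DependencyElement("機器クラス", "張る", "用", "に対して", tree_ids=[1]),
        ]),
    ]),
    Subspace(confidence=0.217, elements=[
        DependencyElement("存在する", "命令", "連体"),          # 「存在する命令」
    ]),
])
```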

[0018] Next, the checking process based on the created context space data will be described. After a sentence is analyzed and its context space data is created, it is checked against the context space data obtained from the preceding sentences. The process consists of the following three stages, A, B, and C. Let CurrentConSp denote the context space data of the sentence under consideration, and ConSps the list of context space data of the preceding sentences.

[0019] Process A: Trusting the dependency elements of CurrentConSp whose relative confidence is 1, adjust the relative confidence values of the dependency elements of ConSps. Process B: Trusting the dependency elements of ConSps whose relative confidence is 1, adjust the relative confidence values of the dependency elements of CurrentConSp. Process C: Perform fine adjustment of relative confidence between the data of CurrentConSp and ConSps whose relative confidence is less than 1.

[0020]

[Embodiment] FIG. 1 shows the overall flow of the processing of the present invention. Process A and Process B use the same algorithm, with only the data interchanged. The flow of this algorithm is shown in FIG. 2 and described in detail below.
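Since Processes A and B are the same algorithm with the data interchanged, the check against earlier sentences reduces to two calls of one adjustment routine plus a fine-adjustment pass for Process C. A driver in that shape is sketched below; `adjust_by_trusted` and `fine_adjust` are assumed names, sketched after the step-by-step description and after Table 4.

```python
def check_against_context(current_con_sp: Subspace, con_sps: List[Subspace]) -> None:
    """Three-stage check of paragraph [0019] for one newly analyzed sentence.

    current_con_sp: context space of the sentence under consideration.
    con_sps: context spaces built from the preceding sentences.
    """
    for previous in con_sps:
        # Process A: trust the confidence-1 elements of the current
        # sentence and adjust the earlier context space.
        adjust_by_trusted(trusted=current_con_sp, target=previous)
        # Process B: the same algorithm with the data interchanged.
        adjust_by_trusted(trusted=previous, target=current_con_sp)
        # Process C: fine adjustment between sub-1 confidence elements,
        # with smaller steps than Processes A and B.
        fine_adjust(current_con_sp, previous)
    con_sps.append(current_con_sp)   # keep it for later sentences
```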

[0021] (Step S1) Take one dependency element (hereafter simply "element") from the context space data to be trusted (CurrentConSp in Process A, ConSps in Process B).

【0022】(ステップS2)その要素の相対確信度
(以下単に確信度とする)が1であればステップS3
進む。そうでないなら処理を終了する。
(Step S 2 ) If the relative certainty factor of the element (hereinafter simply referred to as the certainty factor) is 1, the process proceeds to step S 3 . If not, the process ends.

[0023] (Step S3) Take that element of confidence 1 as the trusted element.

[0024] (Step S4) Take one element from the context space data to be adjusted (ConSps in Process A, CurrentConSp in Process B) as the target element.

[0025] (Step S5) If the confidence of the target element is 1, it is also trusted, so leave it as it is and return to step S4. If it is less than 1, proceed to step S6.

[0026] (Step S6) If the confidence of the target element is less than 1, there are multiple subspaces, and their confidences sum to 1. It is then checked whether each subspace contains the same element as the trusted element. Whether two elements are the same is judged from their notation and the semantic relation between the words. For compound nouns, using the final noun is often effective. If a large thesaurus dictionary is available, the information that the thesaurus codes match can also be used.

[0027] (Step S7) If the same element as the trusted element is not found in any subspace, leave the data as it is and return to step S1. Otherwise, proceed to step S8.

[0028] (Step S8) If every subspace contains the same element as the trusted element, proceed to step S9; otherwise, proceed to step S10.

[0029] (Step S9) Regard each subspace as context space data to be adjusted and apply this process to it from step S1. When this is finished, return to step S1.

[0030] (Step S10) Raise the confidence of the subspaces that contain the same element as the trusted element, and lower the confidence of those that do not. The confidences are constrained so that their sum remains 1 after adjustment.

[0031] (Step S11) Regard the subspaces that contain the same element as the trusted element as spaces to be adjusted and apply this process to them from step S1. When this is finished, return to step S1.
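The loop of steps S1 to S11 can be condensed into code roughly as follows, continuing the earlier sketch. The amount by which confidences are raised and lowered is not fixed in the text, so a constant DELTA is assumed, and a renormalization keeps sibling confidences summing to 1 as step S10 requires.

```python
from typing import List

DELTA = 0.1   # assumed adjustment step; the patent does not specify a value

def adjust_by_trusted(trusted: Subspace, target: Subspace) -> None:
    """Processes A/B: for each confidence-1 element of `trusted`
    (steps S1-S3), adjust the subspace confidences of `target`."""
    for element in trusted.elements:          # root elements have confidence 1
        _adjust_space(element, target)

def _adjust_space(trusted_elem: DependencyElement, space: Subspace) -> None:
    if not space.children:                    # no sub-1 subspaces to adjust
        return
    flags = [_contains(c, trusted_elem) for c in space.children]
    if not any(flags):                        # step S7: leave the data alone
        return
    if all(flags):                            # steps S8-S9: recurse everywhere
        for child in space.children:
            _adjust_space(trusted_elem, child)
        return
    for child, has_it in zip(space.children, flags):   # step S10
        child.confidence += DELTA if has_it else -DELTA
    _renormalize(space.children)
    for child, has_it in zip(space.children, flags):   # step S11
        if has_it:
            _adjust_space(trusted_elem, child)

def _contains(space: Subspace, elem: DependencyElement) -> bool:
    """Step S6: sameness judged from notation and semantic relation
    (thesaurus codes or compound-noun heads could also be consulted)."""
    if any(e.modifier == elem.modifier and e.head == elem.head
           and e.relation == elem.relation for e in space.elements):
        return True
    return any(_contains(c, elem) for c in space.children)

def _renormalize(children: List[Subspace]) -> None:
    """Clip to [0, 1] and rescale so that sibling confidences sum to 1."""
    for c in children:
        c.confidence = min(max(c.confidence, 0.0), 1.0)
    total = sum(c.confidence for c in children)
    if total > 0:
        for c in children:
            c.confidence /= total
```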

[0032] Next, Process C will be described. Like Processes A and B, Process C compares the elements of CurrentConSp and ConSps, but the confidence of a trusted element is obtained by multiplying the confidences along the path from the top of the subspace hierarchy down to the element. Since confidence in the present invention is inherently very uncertain information, a lower limit for judging an element to be correct (called the positive lower limit, e.g., 0.8) and an upper limit for judging an element to be incorrect (called the negative upper limit, e.g., 0.2) are set, and data between these limits is not treated as a trusted element. When the same element is found in both CurrentConSp and ConSps, the processing shown in Table 4 below is performed; however, the amount by which the confidence is adjusted is set smaller than in Processes A and B.

[0033]

[Table 4]
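Process C can be sketched in the same style, continuing the routines above. The positive lower limit of 0.8 and negative upper limit of 0.2 are quoted from paragraph [0032]; the detailed rules of Table 4 are not reproduced in this text, so they are replaced here by a single small step, which is an assumption.

```python
POSITIVE_LOWER = 0.8   # path confidence above this is treated as correct
NEGATIVE_UPPER = 0.2   # path confidence below this is treated as incorrect
SMALL_DELTA = 0.02     # assumed: Process C changes confidences less than A/B

def fine_adjust(current: Subspace, previous: Subspace) -> None:
    """Process C: compare sub-1 confidence elements of the two spaces and
    nudge the current space by a small amount (Table 4 rules simplified)."""
    for prev_elem, prev_conf in _walk(previous, 1.0):
        if prev_conf >= 1.0:
            continue                           # handled by Processes A and B
        if prev_conf >= POSITIVE_LOWER:
            _nudge_siblings(current, prev_elem, +SMALL_DELTA)
        elif prev_conf <= NEGATIVE_UPPER:
            _nudge_siblings(current, prev_elem, -SMALL_DELTA)
        # confidences between the two limits are not trusted either way

def _walk(space: Subspace, conf: float):
    """Yield (element, path confidence), multiplying confidences from the
    top of the subspace hierarchy downward, as in paragraph [0032]."""
    for e in space.elements:
        yield e, conf
    for child in space.children:
        yield from _walk(child, conf * child.confidence)

def _nudge_siblings(space: Subspace, elem: DependencyElement, delta: float) -> None:
    """Move the subspaces containing `elem` by `delta` and their siblings
    the other way, renormalize, then recurse into the matching subspaces."""
    flags = [_contains(c, elem) for c in space.children]
    if any(flags) and not all(flags):
        for child, has_it in zip(space.children, flags):
            child.confidence += delta if has_it else -delta
        _renormalize(space.children)
    for child, has_it in zip(space.children, flags):
        if has_it:
            _nudge_siblings(child, elem, delta)
```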

[0034] When all of the element checking described above is complete, the syntax tree selection process (step S7 in FIG. 1) is finally performed. This process operates on the final list of context space data. Within a context space, the subspace with the higher confidence is followed downward, and when the bottommost element is reached, the number of the syntax tree from which that element was obtained is checked, and the syntax tree with that number is taken as the correct syntax tree.
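The final selection of paragraph [0034] amounts to descending the highest-confidence subspaces. Which of several tree numbers to report when the bottom element appears in more than one tree is left open, so the sketch below simply returns the first one it finds.

```python
from typing import Optional

def select_tree(space: Subspace) -> Optional[int]:
    """Follow the higher-confidence subspace down to the bottom and return
    the number of a syntax tree from which that element was obtained."""
    node = space
    while node.children:
        node = max(node.children, key=lambda c: c.confidence)
    for element in reversed(node.elements):
        if element.tree_ids:
            return element.tree_ids[0]
    return None
```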

[0035] A further explanation using a concrete example follows. For the example sentence above, (Sentence A) "管理クラスは存在する全ての機器クラスに対してリンクを張る命令を発行する", syntax tree No. 2 has a low confidence, as shown in Table 1, but trees No. 1 and No. 3 have equal evaluation scores and confidences, so it cannot be determined which of them is correct. Suppose, however, that this sentence is followed by (Sentence B) "機器クラスに対して発行した命令は条件フラグがONである事を確認し実行される" ("An instruction issued to a device class is executed after it is confirmed that the condition flag is ON"). Table 5 summarizes the analysis result of this sentence.

[0036]

[Table 5]

[0037] At this point it can be seen that the element 〔用,機器クラス,に対して,発行〕 exists with confidence 1. In (Sentence A) above, the same element exists with confidence 0.5 (Table 3). Therefore, by raising the confidence of this element and lowering the confidence of the other element, 〔用,機器クラス,に対して,張る〕, it becomes possible to select the correct syntax tree, No. 3.
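In terms of the sketches above, the worked example would run roughly as follows; `context_space_b` is assumed to have been built from Sentence B in the same way as `context_space_a`, with 〔用,機器クラス,に対して,発行〕 at its root (confidence 1).

```python
# Process A trusts Sentence B's confidence-1 element and raises the 0.5
# subspace of Sentence A that contains 「機器クラスに対して発行する」,
# lowering the competing 「機器クラスに対して張る」 subspace.
check_against_context(context_space_b, [context_space_a])
print(select_tree(context_space_a))   # expected to report syntax tree No. 3
```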

[0038]

[Effects of the Invention] As described above, according to the present invention, syntactic and semantic analysis is performed on a text to decompose it into word dependency elements; these elements are consolidated to create context space data in which each element carries a relative confidence weighted by its number of occurrences and by the evaluation score from semantic analysis; a dependency element with a high relative confidence is used as trusted data; when the same dependency element appears in another sentence, its relative confidence is raised, and when a dependency element identical to one with a very low relative confidence appears in another sentence, its relative confidence is lowered; and from the context space data finally obtained by this confidence adjustment processing, high-confidence data is selected to choose the syntax tree. This makes it possible to resolve ambiguities in syntactic structure that cannot be resolved by looking at a single sentence alone, and to select the correct analysis result.

[Brief Description of the Drawings]

[FIG. 1] A flowchart showing the overall processing of one embodiment of the present invention.

[FIG. 2] A flowchart showing the confidence adjustment processing of one embodiment of the present invention.

Claims (1)

[Claims]

[Claim 1] A contextual processing method for natural language, in a natural language processing method that takes natural language as input and analyzes its syntactic structure, semantic content, and the like, characterized by: performing syntactic and semantic analysis on a text to decompose it into word dependency elements; consolidating the dependency elements to create context space data in which each element carries a relative confidence weighted by the number of occurrences of the dependency and by the evaluation score from semantic analysis; using a dependency element with a high relative confidence as trusted data, raising the relative confidence of a dependency element when the same dependency element appears in another sentence, and performing confidence adjustment processing that lowers the relative confidence of a dependency element when a dependency element identical to one with a very low relative confidence appears in another sentence; and selecting high-confidence data from the context space data finally obtained by said processing to select a syntax tree.
JP7318433A 1995-12-07 1995-12-07 Contextual processing system for natural language Pending JPH09160915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP7318433A JPH09160915A (en) 1995-12-07 1995-12-07 Contextual processing system for natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP7318433A JPH09160915A (en) 1995-12-07 1995-12-07 Contextual processing system for natural language

Publications (1)

Publication Number Publication Date
JPH09160915A true JPH09160915A (en) 1997-06-20

Family

ID=18099104

Family Applications (1)

Application Number Title Priority Date Filing Date
JP7318433A Pending JPH09160915A (en) 1995-12-07 1995-12-07 Contextual processing system for natural language

Country Status (1)

Country Link
JP (1) JPH09160915A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10002122B2 (en) 2012-05-31 2018-06-19 Kabushiki Kaisha Toshiba Forming knowledge information based on a predetermined threshold of a concept and a predetermined threshold of a target word extracted from a document


Similar Documents

Publication Publication Date Title
Ratnaparkhi A linear observed time statistical parser based on maximum entropy models
US9323747B2 (en) Deep model statistics method for machine translation
US4661924A (en) Multiple-parts-of-speech disambiguating method and apparatus for machine translation system
US5890103A (en) Method and apparatus for improved tokenization of natural language text
US6965857B1 (en) Method and apparatus for deriving information from written text
US20030061023A1 (en) Automatic extraction of transfer mappings from bilingual corpora
US20080040095A1 (en) System for Multiligual Machine Translation from English to Hindi and Other Indian Languages Using Pseudo-Interlingua and Hybridized Approach
JPH0242572A (en) Preparation/maintenance method for co-occurrence relation dictionary
CN110096599B (en) Knowledge graph generation method and device
EP0398513B1 (en) Method and apparatus for translating a sentence including a compound word formed by hyphenation
US6968308B1 (en) Method for segmenting non-segmented text using syntactic parse
JPH09160915A (en) Contextual processing system for natural language
JP3326646B2 (en) Dictionary / rule learning device for machine translation system
KR100420474B1 (en) Apparatus and method of long sentence translation using partial sentence frame
JP2812511B2 (en) Keyword extraction device
JPH05298349A (en) Method and system for learning knowledge of cooccurrence relation, cooccurrence relation dictionary and its using method
JP3810809B2 (en) Parser
JP2840258B2 (en) Method of creating bilingual dictionary and co-occurrence dictionary for machine translation system
JP3358096B2 (en) Dictionary and rule learning method for machine translation system and dictionary and rule learning device for machine translation system
JP3388393B2 (en) Translation device for tense, aspect or modality using database
JPS6368972A (en) Unregistered word processing system
KR100253242B1 (en) Fragment combination method
JP2004326584A (en) Parallel translation unique expression extraction device and method, and parallel translation unique expression extraction program
JPS6344276A (en) Automatic generator for generated syntax
JPH0869466A (en) Natural language analyzing device