JPS62206670A

JPS62206670A - Context analysis system for natural language

Info

Publication number: JPS62206670A
Application number: JP61050031A
Authority: JP
Inventors: Tadashi Hoshiai; 忠星合
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-03-07
Filing date: 1986-03-07
Publication date: 1987-09-11

Abstract

PURPOSE: To accurately extract contextual information about an input sentence relating to anaphora by analyzing the cohesion between the input sentence and the preceding sentences. CONSTITUTION: A preceding-sentence context data storing means 3 stores syntax trees, i.e. the output obtained from syntactic analysis. The input sentence currently to be analyzed is supplied to a data input means 1 and analyzed by a cohesion analyzing means 2. A context information generating means 5 then generates information relating the input sentence to the preceding sentences and outputs it. In doing so, the means 2 refers to the means 3 and a word meaning relation data storing means 4 and analyzes the connection, i.e. the presence or absence of a semantic relation, between the input sentence supplied from the means 1 and the context of the preceding sentences. If a connection between them is recognized, the type of context processing is determined so that accurate context information is obtained.

Description

[Detailed Description of the Invention] [Summary] A context analysis method that, when analyzing the context of a natural language sentence after syntactic analysis, uses a means for analyzing the cohesion between the input sentence and the preceding-sentence context to extract contextual information about the input sentence concerning pronoun anaphora, substitute-expression anaphora, and ellipsis anaphora, and thereby obtains good analysis results.

[Industrial Field of Application] The present invention relates to a context analysis method that, when analyzing the context of a natural language sentence, operates with a means for analyzing the cohesion between the input sentence and the preceding-sentence context.

[Prior Art] Attempts to analyze the context of natural language sentences by computer are under study. The analysis is usually organized as shown in Fig. 3. In Fig. 3, a natural language sentence, for example a Japanese sentence written without spaces between words, first undergoes morphological analysis. Morphological analysis segments running text into individual words and yields the candidate word sequences. In every step described below, the computer carries out the processing with the aid of a storage device. Next, syntactic analysis is performed to obtain a syntax tree. A syntax tree is built by determining the part-of-speech sequence of the word string, for example (noun + particle), (noun + particle), (verb + auxiliary verb), (period), and treating the parenthesized groups as a noun phrase, a noun phrase, and a verb phrase. Viewed as a whole, these form a branching tree rooted at "sentence", and that structure is called the syntax tree.

In the following step, context analysis is performed. Conventionally this is done without regard to whether the context has a break, and the processing usually goes no further than pronoun anaphora resolution. Pronoun anaphora resolution means determining which noun in the preceding sentences a pronoun in the current sentence, such as "he" or "it", corresponds to. Its result provides the resolution needed for subsequent linguistically based processing.

[Problems to Be Solved by the Invention] In conventional processing, and in context analysis in particular, breaks in the context were handled ad hoc, with no criterion for determining them. Moreover, only pronoun anaphora was treated in context processing; other kinds, such as substitute-expression anaphora, were not. As a result, context analysis did not give good results. Substitute-expression anaphora is anaphora in a sentence (word string) that uses a term other than a pronoun as the referring expression. For example, where a preceding sentence reads "... Taro ... to Hanako ..." and the current sentence being analyzed (the input sentence) reads "The man ... Hanako's ...", the referent is expressed as "the man".

An object of the present invention is to remedy the above drawbacks by providing a context analysis method that analyzes the cohesion between the input sentence and the preceding sentences and can thereby accurately extract the contextual information of the input sentence relating to anaphora.

[Means for Solving the Problems] Fig. 1 is a block diagram showing the principle configuration of the present invention; it improves the part of a conventional natural language sentence analyzer that performs context analysis. In Fig. 1, 1 is a data input means, 2 is a cohesion analysis means, 3 is a preceding-sentence context data storage means, 4 is a word semantic relation data storage means, 5 is a context information generation means, and 6 is a context information analysis result output means. The preceding-sentence context data storage means stores the syntax trees output by syntactic analysis.

The input sentence currently to be analyzed is input to the data input means 1 and analyzed by the cohesion analysis means 2; the context information generation means 5 then generates information relating it to the preceding sentences and outputs it.

[Operation] The cohesion analysis means 2 in Fig. 1 refers to the preceding-sentence context data storage means 3 and the word semantic relation data storage means 4 and analyzes, for the input sentence received from the data input means 1, whether there is a "connection", i.e. a semantic relation, with the preceding-sentence context. When there is a connection, it further determines the type of context processing and obtains accurate context information.

[Embodiment] As the concrete processing of the present invention, the "analysis of semantic relations" and the "generation of context information" mentioned above are described in turn.

A. Semantic relations with the preceding-sentence context
The semantic relations are (1) the superordinate-subordinate relation, (2) the synonymous relation, (3) the attribute relation, and (4) the deep case relation. Each is illustrated in Fig. 2.

(1) The superordinate-subordinate relation is, for "human" and "male", the relation in which "human" is superordinate to "male".

(2) The synonymous relation is the relation between two different subordinate words (for example, "male" and "patient") that share the same superordinate word (for example, "human").

(3) The attribute relation links a word to a word denoting one of its attributes, for example "patient" and "disease", where someone is a patient because they have a disease. Another example is the relation between "human" and "age".

(4) The deep case relation is, for example, the relation of "be hospitalized" to "patient" or to "hospital". It is the relation between a verb, such as "do ...", and another concept; "be hospitalized" and "patient" are said to be in a deep case relation. The label "A" marks the agent of the verb's action. The label "O" shown between "be hospitalized" and "hospital" marks the object of the action.
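The four relation types above can be pictured as a small lookup structure. The following is only a minimal sketch of what the word semantic relation data storage means 4 might hold, using the Fig. 2 examples quoted in the text; the names `Relation`, `HYPERNYM_OF`, `ATTRIBUTES`, `DEEP_CASES`, and `find_relation` are hypothetical and do not come from the patent.

```python
from enum import Enum


class Relation(Enum):
    HYPERNYM = "superordinate-subordinate"
    SYNONYM = "synonymous"
    ATTRIBUTE = "attribute"
    DEEP_CASE = "deep case"


# Hypothetical store contents, limited to the examples given in the text.
HYPERNYM_OF = {"male": "human", "patient": "human"}      # subordinate -> superordinate
ATTRIBUTES = {"patient": {"disease"}, "human": {"age"}}  # word -> attribute words
DEEP_CASES = {"be hospitalized": {"patient": "A", "hospital": "O"}}  # verb -> case labels


def find_relation(w1, w2):
    """Return (Relation, case_label_or_None) for two words, or None if unrelated."""
    # (1) one word is the direct superordinate of the other
    if HYPERNYM_OF.get(w1) == w2 or HYPERNYM_OF.get(w2) == w1:
        return Relation.HYPERNYM, None
    # (2) two different words sharing the same superordinate
    p1, p2 = HYPERNYM_OF.get(w1), HYPERNYM_OF.get(w2)
    if w1 != w2 and p1 is not None and p1 == p2:
        return Relation.SYNONYM, None
    # (3) one word names an attribute of the other
    if w2 in ATTRIBUTES.get(w1, ()) or w1 in ATTRIBUTES.get(w2, ()):
        return Relation.ATTRIBUTE, None
    # (4) a verb and a word filling one of its deep cases
    for verb, other in ((w1, w2), (w2, w1)):
        if other in DEEP_CASES.get(verb, {}):
            return Relation.DEEP_CASE, DEEP_CASES[verb][other]
    return None
```

The order in which the four relations are tested is an assumption of this sketch; the text does not state a priority among them.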

The semantic relations between the input sentence and the context are analyzed as follows. The words of the input sentence are examined one at a time from the left, and the semantic relation of each to the context is analyzed. To find the semantic relation between the i-th word Wi of the input sentence and the context, the words Wj of the stored preceding sentences are searched in order of proximity to the input sentence, and each pair of words is tested for the semantic relations described above. If some semantic relation R is found, it is judged that "Wi has the semantic relation R with Wj in the context". If no semantic relation is found with any word in the context, it is judged that "Wi has no semantic relation with the context".

For example, the word "it" is provisionally treated in the relation diagram of Fig. 2 as semantically related to "thing"; where necessary, the specific relation is examined further and the word is related to a subordinate word. The word "he" is immediately recognized as "male" and is judged to have a semantic relation.
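The search described above can be sketched as a nested loop: input words left to right, context words nearest first, stopping at the first relation found. This is an illustrative sketch only. It assumes the context is available as a flat list of words in document order (so the nearest word is the last element), and `relate_to_context` and its `find_relation` parameter are hypothetical names.

```python
def relate_to_context(input_words, context_words, find_relation):
    """For each input word Wi, scan the context words Wj nearest-first.

    context_words is in document order, so the word closest to the input
    sentence is the last element. find_relation(wi, wj) returns a relation
    R or None. Returns a list of (Wi, Wj, R); Wj and R are None when Wi
    has no semantic relation with the context.
    """
    results = []
    for wi in input_words:                  # left to right through the input
        match = (wi, None, None)
        for wj in reversed(context_words):  # nearest to the input sentence first
            r = find_relation(wi, wj)
            if r is not None:
                match = (wi, wj, r)
                break                       # the first relation found decides
        results.append(match)
    return results
```

For instance, with a relation function that only relates "he" to "male", calling it on the input `["he", "ran"]` and the context `["hospital", "male"]` yields `("he", "male", R)` for the first word and `("ran", None, None)` for the second.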

Repeating the above yields the semantic relation between each word of the input sentence and the context. If the input sentence contains a word that has a semantic relation with the context, the sentence is said to "have cohesion" with the context; if no word of the input sentence has a semantic relation with the context, the sentence is said to "have no cohesion" with it. As long as there is cohesion, the context is semantically connected; when there is none, the semantic connection of the context is regarded as broken.
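The cohesion test itself reduces to "at least one word has a relation". A minimal sketch follows, assuming each word's result is a `(Wi, Wj, R)` triple with `R` set to `None` when no relation was found. Both function names are hypothetical, and splitting a text into segments at each non-cohesive sentence is one possible reading of "the semantic connection is regarded as broken", not something the text spells out.

```python
def has_cohesion(word_relations):
    """True if at least one input word has a semantic relation with the context."""
    return any(r is not None for _, _, r in word_relations)


def split_at_breaks(cohesion_flags):
    """Group sentence indices into segments, starting a new segment at every
    sentence that has no cohesion with what precedes it."""
    segments, current = [], []
    for idx, cohesive in enumerate(cohesion_flags):
        if not cohesive and current:
            segments.append(current)  # close the segment before the break
            current = []
        current.append(idx)
    if current:
        segments.append(current)
    return segments
```

Here `split_at_breaks([False, True, False, True])` groups sentences 0 and 1 into one segment and sentences 2 and 3 into the next.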

B. Types of context processing
When the means described above determines that there is a semantic relation, the following processing is performed according to its type.

(1) When the semantic relation is a superordinate-subordinate relation
If the word Wi in the input sentence is a nominal pronoun ("it", "these", "he", etc.), pronoun anaphora processing is performed.

If the word Wi is an adnominal pronoun (sono, ano, i.e. "that" placed before a noun, etc.), pronoun anaphora processing is performed.

If the word Wi is not a pronoun, substitute-expression anaphora processing is performed.

(2) When the semantic relation is an attribute relation
If the word Wi is an adnominal pronoun, pronoun anaphora processing is performed.

If the word Wi is not a pronoun, ellipsis anaphora processing is performed.

(3) When the semantic relation is a synonymous relation
If the word Wi is not a pronoun, substitute-expression anaphora processing is performed.

(4) When the semantic relation is a deep case relation
If the word Wi is a verb, ellipsis anaphora processing is performed.
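Rules (1) to (4) amount to a lookup from the relation type and the word class of Wi to a processing type. The sketch below encodes exactly those rules and returns `None` for combinations the text does not mention. The string labels and the function name `processing_type` are hypothetical.

```python
def processing_type(relation, word_class):
    """Map (relation type, word class of Wi) to an anaphora processing type.

    relation:   "hypernym" | "synonym" | "attribute" | "deep_case"
    word_class: "nominal_pronoun" | "adnominal_pronoun" | "verb" | "other"
    Returns "pronoun", "substitute", "ellipsis", or None when the text
    gives no rule for the combination.
    """
    is_pronoun = word_class in ("nominal_pronoun", "adnominal_pronoun")
    if relation == "hypernym":                       # rule (1)
        return "pronoun" if is_pronoun else "substitute"
    if relation == "attribute":                      # rule (2)
        if word_class == "adnominal_pronoun":
            return "pronoun"
        if not is_pronoun:
            return "ellipsis"
    if relation == "synonym" and not is_pronoun:     # rule (3)
        return "substitute"
    if relation == "deep_case" and word_class == "verb":  # rule (4)
        return "ellipsis"
    return None
```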

With the type of processing determined as above, the context information is then generated.

C. Generation of context information
(a) Pronoun anaphora
A pointer is set from the pronoun node in the syntax tree of the input sentence to the node corresponding to the referent in the preceding-sentence context, so that the two are linked as data.

(b) Substitute-expression anaphora
A pointer is set from the node of the substitute-expression word in the syntax tree of the input sentence to the node corresponding to the referent in the preceding-sentence context.

(c) Ellipsis anaphora
In the case of a deep case relation, within the subtree containing the verb Wi, a node labeled as an ellipsis is inserted at the same level as the subtrees corresponding to the words that stand in other deep case relations to Wi, and a pointer is set from that node to the node corresponding to the referent in the context.
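The three cases above all come down to setting a pointer from a node in the input tree to a node in the stored context, with ellipsis additionally creating the node first. The sketch below illustrates this with a hypothetical `Node` class. The field name `antecedent`, the label `"ELLIPSIS"`, and appending the new node as the last sibling are assumptions, since the text specifies only the level at which the node is inserted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by content
class Node:
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    antecedent: Optional["Node"] = None  # the "pointer" into the preceding context


def link(anaphor, referent):
    """Cases (a) and (b): point a pronoun or substitute-expression node at its referent."""
    anaphor.antecedent = referent


def insert_ellipsis(verb_parent, referent):
    """Case (c): add an ELLIPSIS node as a sibling of the verb's other case
    fillers and point it at the referent in the context."""
    gap = Node("ELLIPSIS")
    verb_parent.children.append(gap)
    link(gap, referent)
    return gap
```

As a usage example, given a context node `Node("NP", "patient")` and an input tree whose root holds `Node("NP", "hospital")` and `Node("V", "be hospitalized")`, calling `insert_ellipsis(root, patient_node)` adds a third child whose `antecedent` is the patient node.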

After the input syntax tree has been updated by the above processing, the context information generation means 6 appends it to the end of the context stored in the preceding-sentence context data storage means 3.

[Effects of the Invention] According to the present invention, when an input natural language sentence is analyzed, its cohesion with the preceding context is analyzed first, and the presence or absence of a semantic relation serves as the criterion for determining breaks in the context, so context analysis becomes accurate. Furthermore, because substitute-expression anaphora is handled in addition to pronoun anaphora, the analysis results are better still.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the principle configuration of the present invention, Fig. 2 is a diagram illustrating semantic relations with the context, and Fig. 3 is a diagram explaining a context analysis method.

2: cohesion analysis means; 3: preceding-sentence context data storage means; 4: word semantic relation data storage means; 5: context information generation means.

Patent applicant: Fujitsu Limited. Agent: Patent Attorney Eisuke Suzuki.

Figures: Fig. 1 (principle configuration of the invention), semantic relation diagram, context analysis diagram.

Claims

[Claims] A context analysis method for natural language sentences, which analyzes context after syntactic analysis of a natural language sentence, comprising a data input means (1), a preceding-sentence context data storage means (3), a word semantic relation data storage means (4), a cohesion analysis means (2), a context information generation means (5), and a context information analysis result output means (6), characterized in that the cohesion analysis means (2), while referring to the preceding-sentence context data storage means (3) and the word semantic relation data storage means (4), analyzes the cohesion of an input sentence with respect to pronoun anaphora, substitute-expression anaphora, and ellipsis anaphora.