JP2003281132A

JP2003281132A - System and method of processing natural language, and computer program

Info

Publication number: JP2003281132A
Application number: JP2002079625A
Authority: JP
Inventors: Hiroshi Masuichi (増市 博); Tomoko Okuma (大熊 智子)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-03-20
Filing date: 2002-03-20
Publication date: 2003-10-03
Anticipated expiration: 2022-03-20
Also published as: JP3972697B2

Abstract

PROBLEM TO BE SOLVED: To more accurately perform syntactic and semantic analysis of sentences, such as Japanese sentences, in which a constituent normally considered essential, such as the subject or object, is omitted.

SOLUTION: A part-of-speech category AUX is defined for predicates that should not have a case frame when they follow another predicate. When such a word follows a predicate in a sentence, its case frame is deleted. In addition, when the sentence contains a phrase connector for which the predicate of the immediately preceding phrase tends not to take a sentence-initial noun phrase marked with は (wa) or が (ga), and a noun phrase marked with が is present at the head of the sentence, that noun phrase is inserted into the case frame slot corresponding to the subject of the predicate of the phrase immediately after the phrase connector.

COPYRIGHT: (C)2004, JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a natural language processing system, a natural language processing method, and a computer program for mathematically handling the natural language that humans use in everyday communication, and more particularly to a natural language processing system, natural language processing method, and computer program that perform syntactic and semantic analysis of Japanese sentences.

[0002] More specifically, the present invention relates to a natural language processing system, natural language processing method, and computer program that more accurately perform syntactic and semantic analysis of sentences, such as Japanese sentences, in which constituents normally considered essential, such as the subject or object, are omitted, and in particular to ones that output, with high accuracy, information on subjects and objects omitted from a sentence, that is, zero pronouns.

[0003]

[Prior Art] Languages that humans use in everyday communication, such as Japanese and English, are called "natural languages". Many natural languages arose spontaneously and have evolved along with the history of humankind, peoples, and societies. People can of course also communicate through gestures, but natural language enables the most natural and sophisticated communication.

[0004] Meanwhile, with the development of information technology, computers have become established in human society and have permeated industry and daily life. Today not only computer data but almost all information content, including images and sound, is handled on computers, which makes possible advanced processing such as editing, storage, management, transmission, and sharing of information.

[0005] Natural language is inherently abstract and highly ambiguous, but treating text mathematically makes it amenable to computer processing. As a result, automated processing enables a variety of natural-language applications and services, such as machine translation, dialogue systems, and search systems.

[0006] Natural language processing is generally divided into the phases of morphological analysis, syntactic analysis, semantic analysis, and context analysis.

[0007] In morphological analysis, a sentence is segmented into morphemes, the smallest meaningful units, and each morpheme is assigned a part of speech. In syntactic analysis, the structure of the sentence, such as its phrase structure, is analyzed on the basis of grammar rules. Because grammar rules are tree-structured, the result of syntactic analysis is generally a tree in which the individual morphemes are joined according to dependency relations and the like. In semantic analysis, a semantic structure expressing the meaning conveyed by the sentence is obtained and composed on the basis of the senses (concepts) of the words in the sentence and the semantic relations between words. In context analysis, a text (discourse), which is a sequence of sentences, is taken as the basic unit of analysis, and a discourse structure is built by identifying the semantic cohesion between sentences.

[0008] In syntactic-semantic analysis, after dependency relations have been obtained by syntactic analysis or the like, the semantic relations between a predicate and the words that depend on it are extracted from the resulting structure using a valence dictionary, which describes the relations between a verb and other constituents of the sentence such as the subject (that is, the case frames of the predicate).

[0009]

[Problems to be Solved by the Invention] In Japanese sentences, constituents normally considered essential, such as the subject and object, are frequently omitted. Such omitted subjects and objects are called "zero pronouns".

[0010] Determining from context what a zero pronoun refers to is indispensable for realizing natural language processing applications such as dialogue systems. One algorithm for identifying the referent of a zero pronoun is Centering theory, described in detail in M. A. Walker, A. K. Joshi, and E. F. Prince, "Centering Theory in Discourse", Clarendon Press, Oxford (1994).

[0011] Whichever method is used to identify the referent of such a zero pronoun, however, it is first necessary, as preprocessing, to identify which constituents have been omitted from the sentence.

[0012] For example, the positions where zero pronouns occur can be identified using the case frames of predicates as the basic information. A case frame indicates, for each predicate, which constituents can combine with it. It is also called a valence and is stored in a valence dictionary.

[0013] Fig. 1 shows the case frames, with selectional restrictions, of the verb 合う (au, "to fit, to match") as described in the IPAL verb dictionary developed at the Technology Center of the Information-technology Promotion Agency (IPA). As the figure shows, case frames are described for the verb 合う for the case in which it takes only a subject (the が (ga) case), the case in which it takes both a subject and the に (ni) case, and so on. Labels such as hum (human) and phe (phenomenon), given as selectional restrictions in Fig. 1, indicate the conceptual category of the noun that fills the case and are called semantic features or semantic markers. The basic information used to identify the positions of zero pronouns is thus the case frame of the predicate.
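A case frame of this kind can be pictured as a small data structure. The following is a minimal Python sketch, not the IPAL dictionary format: the class names, the helper functions, and in particular the assignment of `hum` and `phe` to individual slots are illustrative assumptions, since Fig. 1 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseSlot:
    particle: str        # case particle marking the slot, e.g. "が" or "に"
    features: frozenset  # semantic features allowed for the filler noun


@dataclass(frozen=True)
class CaseFrame:
    slots: tuple         # the CaseSlot objects this frame requires


# Hypothetical valence dictionary: one verb with two alternative case frames.
# Which features go with which slot is an assumption made for illustration.
VALENCE_DICT = {
    "合う": (
        CaseFrame((CaseSlot("が", frozenset({"hum", "phe"})),)),
        CaseFrame((CaseSlot("が", frozenset({"hum"})),
                   CaseSlot("に", frozenset({"hum"})))),
    ),
}


def particles(frame):
    """List the case particles a frame requires, in order."""
    return [slot.particle for slot in frame.slots]


def accepts(frame, particle, feature):
    """Return True if the frame has a slot for `particle` that allows `feature`."""
    return any(slot.particle == particle and feature in slot.features
               for slot in frame.slots)
```

Keeping several frames per verb mirrors the text's point that one predicate may take only a subject in one reading and a subject plus a に-marked noun in another.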

[0014] Consider now a method of searching for the positions of zero pronouns using case frame information. By referring to the case frames of predicates, the positions of subjects and objects that may have been omitted from a sentence can be readily identified.

[0015] For example, suppose that in sentence (1) below we search, on the basis of case frames, for places that may have been zero-pronominalized and insert "NULL" at each as the symbol for a zero pronoun. The original sentence then takes the form shown in (2), and it is evident that zero pronouns that are not actually needed end up in the analysis result.

[0016] (1) 考えてみていなかったが、恐らく正しくない。 ("I had not tried thinking about it, but it is probably not correct.")
(2) (NULLが)(NULLを)考えて(NULLが)(NULLを)みて(NULLが)いて(NULLが)なかったが、恐らく(NULLは)正しく(NULLは)ない。
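The over-generation in (2) can be reproduced with a naive procedure that emits one NULL marker for every unfilled slot of every predicate. The sketch below is only an illustration of that naive baseline: the function name, the hand segmentation of sentence (1), and the slot lists (read off the markers in (2)) are assumptions.

```python
def insert_nulls(segments):
    """Prefix every predicate with one NULL marker per unfilled case slot.

    `segments` is a list of (surface, slots, filled) tuples, where `slots` are
    the case particles required by the predicate's case frame (empty for
    non-predicates) and `filled` are the particles already realized overtly.
    """
    out = []
    for surface, slots, filled in segments:
        for particle in slots:
            if particle not in filled:
                out.append(f"(NULL{particle})")
        out.append(surface)
    return "".join(out)


# Sentence (1), segmented by hand; no slot is filled overtly.
SENTENCE_1 = [
    ("考えて", ["が", "を"], set()),
    ("みて", ["が", "を"], set()),
    ("いて", ["が"], set()),
    ("なかったが、", ["が"], set()),
    ("恐らく", [], set()),
    ("正しく", ["は"], set()),
    ("ない。", ["は"], set()),
]

annotated = insert_nulls(SENTENCE_1)
```

Because every predicate, including the auxiliary-like ones, contributes its own slots, the result carries eight NULL markers for a sentence that has far fewer genuinely omitted arguments.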

[0017] The present invention has been made in view of the technical problems described above, and an object thereof is to provide an excellent natural language processing system, natural language processing method, and computer program capable of more accurately performing syntactic and semantic analysis of sentences, such as Japanese sentences, in which constituents normally considered essential, such as the subject or object, are omitted.

[0018] A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting, with high accuracy, information on subjects and objects omitted from a sentence, that is, zero pronouns.

[0019]

[Means for Solving the Problems and Operation] The present invention has been made in consideration of the above problems. A first aspect thereof is a natural language processing system or natural language processing method for syntactic and semantic analysis of sentences in which essential constituents are omitted, comprising: part-of-speech category defining means or step for defining a part-of-speech category AUX corresponding to predicates that should not have a case frame when they follow another predicate; predicate search means or step for searching the input sentence for a predicate belonging to the part-of-speech category AUX that occurs immediately after another predicate, or after it with certain particles in between; and case frame deletion means or step for deleting the case frame of the predicate extracted by the predicate search means or step.

[0020] The natural language processing system or natural language processing method according to the first aspect of the present invention may further comprise: first phrase connector search means or step for searching the sentence for a first phrase connector, namely one showing the tendency that the predicate of the immediately preceding phrase does not take a sentence-initial noun phrase marked with は (wa) or が (ga); and first case frame processing means or step for, in response to the first phrase connector being found in the sentence, inserting a sentence-initial noun phrase marked with が, if there is one, into the case frame slot corresponding to the subject of the predicate of the phrase immediately after the first phrase connector, and/or inserting a sentence-initial noun phrase marked with は, if there is one, into the case frame corresponding to the predicate of the phrase immediately after the first phrase connector.

[0021] The natural language processing system or natural language processing method according to the first aspect of the present invention may further comprise: second phrase connector search means or step for searching the sentence for a second phrase connector, namely one showing the tendency that the predicate of the immediately preceding phrase does not take a sentence-initial noun phrase marked with は; and second case frame processing means or step for, in response to the second phrase connector being found in the sentence, inserting a sentence-initial noun phrase marked with は, if there is one, into the case frame corresponding to the predicate of the phrase immediately after the second phrase connector.

[0022] According to the natural language processing system or natural language processing method of the first aspect of the present invention, by focusing on auxiliary predicates and phrase connectors, unnecessary case frames can be deleted and the ambiguity of the case structure reduced.

[0023] Consequently, the positions of zero pronouns can be identified more accurately by the subsequent ordinary semantic analysis (case structure analysis), which searches, on the basis of case frames, for places that may have been zero-pronominalized and inserts "NULL" as the symbol for a zero pronoun.

[0024] A second aspect of the present invention is a computer program written in computer-readable form so as to execute, on a computer system, natural language processing for syntactic and semantic analysis of sentences in which essential constituents are omitted, the computer program comprising: a part-of-speech category defining step of defining a part-of-speech category AUX corresponding to predicates that should not have a case frame when they follow another predicate; a predicate search step of searching the input sentence for a predicate belonging to the part-of-speech category AUX that occurs immediately after another predicate, or after it with certain particles in between; and a case frame deletion step of deleting the case frame of the predicate extracted by the predicate search step.

[0025] The computer program according to the second aspect of the present invention defines a computer program written in computer-readable form so as to realize predetermined processing on a computer system. In other words, installing the computer program according to the second aspect of the present invention on a computer system produces a cooperative action on the computer system and yields the same effects as the natural language processing system or natural language processing method according to the first aspect of the present invention.

[0026] Further objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments described below and the accompanying drawings.

[0027]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0028] Methods of parsing natural language fall broadly into those based on statistical processing and those based on grammar rule descriptions. The present invention is particularly effective when applied to syntactic and semantic analysis based on grammar rule descriptions.

[0029] The natural language processing system according to the present invention can be implemented, for example, by incorporating it into syntactic and semantic analysis based on the Lexical-Functional Grammar (LFG) theory. In LFG, a native speaker's linguistic knowledge, that is, the grammar, is organized as a component separate from the computation itself and from other non-grammatical processing parameters that affect the computer's processing behavior. An overview of the natural language processing system is given first. Although this embodiment is described in terms of LFG, the present invention can of course be applied in the same way to analysis systems with other grammar rules.

[0030] Fig. 2 schematically shows the configuration of an LFG-based natural language processing system 1.

[0031] The morphological analysis unit 2 has morphological rules 2A and a morpheme dictionary 2B for a specific language such as Japanese. It segments the input sentence into morphemes, the smallest meaningful units, and assigns parts of speech. For example, when the sentence 私の娘は英語を話します。 ("My daughter speaks English.") is input, the morphological analysis result 「私{Noun} の{up} 娘{Noun} は{up} 英語{Noun} を{up} 話す{Verb1}{tr} ます{jp} 。{pt}」 is output.
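The tagged string above is straightforward to turn into a token list for downstream processing. The following is a minimal sketch that assumes only the surface syntax visible in the example (a surface form followed by one or more `{tag}` groups, tokens separated by spaces); the function name is hypothetical, and the tag names are copied as-is because the text does not spell out their meaning.

```python
import re

# A token is a run of non-space, non-brace characters followed by {tag} groups.
_TOKEN = re.compile(r"([^\s{}]+)((?:\{[^{}]+\})+)")
_TAG = re.compile(r"\{([^{}]+)\}")


def parse_morphemes(text):
    """Split a tagged analysis string into (surface, [tags]) pairs."""
    return [(m.group(1), _TAG.findall(m.group(2)))
            for m in _TOKEN.finditer(text)]


ANALYSIS = "私{Noun} の{up} 娘{Noun} は{up} 英語{Noun} を{up} 話す{Verb1}{tr} ます{jp} 。{pt}"

tokens = parse_morphemes(ANALYSIS)
nouns = [surface for surface, tags in tokens if "Noun" in tags]
```

Here `tokens` holds nine pairs, one per morpheme, and `nouns` collects the three noun surfaces 私, 娘, and 英語.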

[0032] This morphological analysis result is then input to the syntactic/semantic analysis unit 3. The syntactic/semantic analysis unit 3 has grammar rules 3A and dictionaries such as a valence dictionary 3B. It analyzes phrase structure on the basis of the grammar rules, and analyzes the semantic structure expressing the meaning conveyed by the sentence on the basis of the senses of the words in the sentence and the semantic relations between words. (The valence dictionary describes the relations between a verb and other constituents of the sentence such as the subject, and makes it possible to extract the semantic relations between a predicate and the words that depend on it.)

[0033] As the result of the analysis, the unit outputs a "c-structure (constituent structure)", which represents the phrase structure of the sentence, made up of words, morphemes, and the like, as a tree, and an "f-structure (functional structure)", which is the result of analyzing the input sentence semantically and functionally (interrogative, past tense, polite form, and so on) on the basis of case structure such as subject and object.

[0034] Figs. 3 and 4 show, respectively, the c-structure and the f-structure obtained when the input sentence 私の娘は英語を話します。 ("My daughter speaks English.") is processed by the syntactic/semantic analysis unit 3.

[0035] The c-structure represents the structure of the words and phrases in a sentence as a tree and is defined in terms of syntactic categories. For example, phonological interpretation for generating a phoneme sequence can be performed on the basis of the c-structure. The f-structure, on the other hand, expresses grammatical functions explicitly and is composed of grammatical function names, semantic forms, and feature symbols. Referring to the f-structure gives an understanding of roles such as subject, object, complement, and adjunct. The f-structure is the set of features associated with the nodes of the c-structure and, as shown in Fig. 4, is expressed as an attribute-value matrix. That is, within each pair of brackets [ ], the left side is the name of a feature (attribute) and the right side is its value (attribute value).
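An attribute-value matrix maps naturally onto a nested dictionary. The sketch below shows one possible encoding for the example sentence 私の娘は英語を話します。; it is not the matrix of Fig. 4, which is not reproduced here. `PRED`, `SUBJ`, and `OBJ` follow common LFG naming, while `MOD` and the helper functions are hypothetical.

```python
# Assumed f-structure: attribute names on the left, values on the right.
F_STRUCTURE = {
    "PRED": "話す",
    "SUBJ": {"PRED": "娘", "MOD": {"PRED": "私"}},
    "OBJ": {"PRED": "英語"},
}


def lookup(fs, *path):
    """Follow a path of attribute names through nested f-structures."""
    for attribute in path:
        fs = fs[attribute]
    return fs


def grammatical_functions(fs):
    """Return the attributes whose values are themselves f-structures."""
    return [attr for attr, value in fs.items() if isinstance(value, dict)]
```

With this encoding, `lookup(F_STRUCTURE, "SUBJ", "PRED")` reads off the head of the subject, which is the kind of query a zero pronoun detector needs when it checks whether a slot is filled.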

[0036] For details of LFG, see, for example, R. M. Kaplan and J. Bresnan, "Lexical-Functional Grammar: A Formal System for Grammatical Representation" (The MIT Press, Cambridge (1982). Reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130. CSLI Publications, Stanford University (1995)).

[0037] In Japanese sentences, constituents normally considered essential, such as the subject and object, are often omitted. Identifying the referents of such zero pronouns is indispensable for more accurate context analysis. As preprocessing for that, the syntactic/semantic analysis unit 3 must therefore identify which constituents have been omitted from the sentence.

[0038] The present invention therefore first defines a part-of-speech category AUX corresponding to "predicates that should not have a case frame when they follow another predicate". For example, みる (miru), いる (iru), and ない (nai), which appear in example sentence (1) above, belong to the AUX category. When these words follow a predicate, they are treated as having no case frame.

[0039] When identifying the positions of zero pronouns, sentences in which two or more phrases are joined are particularly problematic. In such sentences, the difficulty is determining which phrase's predicate a sentence-initial noun phrase marked with the binding particle は (wa) or the case particle が (ga) depends on.

[0040] To address this, the present invention focuses on the phrase connectors that join one phrase to another. Phrase connectors can be classified into conjunctive particles such as が (ga), から (kara), and し (shi); continuative-form connection; conditionals such as なら (nara) and たら (tara); the "monono" class such as ものの (monono) and ところが (tokoroga); special nouns such as 時 (toki, "time") and 頃 (koro, "around the time"); and so on. In the present invention, these phrase connectors (or combinations of them) are reclassified from the standpoint of subordinate clause structure. For example, phrase connectors are classified into the following three types.

[0041] Phrase connector A: a phrase connector showing the tendency that the predicate of the immediately preceding phrase does not take a sentence-initial noun phrase marked with は or が. つつ (tsutsu) and repetition of the verb continuative form, among others, fall into this class.

[0042] Phrase connector B: a phrase connector showing the tendency that the predicate of the immediately preceding phrase does not take a sentence-initial noun phrase marked with は. ずに (zuni), ないで (naide), たら (tara, conditional), and ても (temo), among others, fall into this class.

[0043] Phrase connector C: any other phrase connector, that is, one for which the predicate of the immediately preceding phrase can take a sentence-initial noun phrase marked with は or が.
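The three-way classification can be sketched as a lookup table with class C as the default. The table below contains only the connectors named in the text; the function names are hypothetical, the pattern "repetition of the verb continuative form" is not modeled because it is not a single lexical item, and the classes describe tendencies, so treating them as hard constraints is a simplifying assumption.

```python
# Connectors named in the text; anything else falls into class C.
CONNECTOR_CLASS = {
    "つつ": "A",
    "ずに": "B",
    "ないで": "B",
    "たら": "B",
    "ても": "B",
}

# Sentence-initial particles that the preceding predicate tends not to take.
BLOCKED_PARTICLES = {"A": {"は", "が"}, "B": {"は"}, "C": set()}


def classify(connector):
    """Return "A", "B", or "C" for a phrase connector."""
    return CONNECTOR_CLASS.get(connector, "C")


def preceding_predicate_may_take(connector, particle):
    """True if the predicate before `connector` may take a sentence-initial
    noun phrase marked with `particle` ("は" or "が")."""
    return particle not in BLOCKED_PARTICLES[classify(connector)]
```

For example, `preceding_predicate_may_take("ずに", "が")` holds while the same query with `"は"` does not, which is exactly the asymmetry that separates class B from class A.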

[0044] This classification of phrase connectors is described in detail in, for example, Fujio Minami, 『現代日本語文法の輪郭』 ("An Outline of Modern Japanese Grammar"; Taishukan Shoten, 1993).

[0045] Fig. 5 shows, in flowchart form, the zero pronoun analysis procedure according to one embodiment of the present invention. The zero pronoun analysis is described in detail below with reference to this flowchart.

[0046] First, syntactic analysis is performed on the input sentence (step S1).

[0047] Then, on the basis of the parse tree, described for example in f-structure form, it is determined whether a predicate belonging to the part-of-speech category AUX described above occurs immediately after another predicate (or after it with certain particles in between) (step S2).

[0048] If such a predicate exists, the case frame of that predicate belonging to the part-of-speech category AUX is deleted as unnecessary (step S3).
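Steps S2 and S3 can be sketched over a flat token sequence. This is an illustration under stated assumptions rather than the parse-tree procedure itself: the `Token` class, the hand segmentation of sentence (1), and the set of bridging particles (the text says only "certain particles", so て is assumed here) are all hypothetical.

```python
from dataclasses import dataclass, field

AUX_LEMMAS = {"みる", "いる", "ない"}  # the examples named in the text
BRIDGING_PARTICLES = {"て"}            # assumption: not enumerated in the text


@dataclass
class Token:
    lemma: str
    kind: str                          # "pred", "particle", or "other"
    case_frame: list = field(default_factory=list)


def delete_aux_case_frames(tokens):
    """Empty the case frame of each AUX predicate that follows a predicate."""
    for i, tok in enumerate(tokens):
        if tok.kind != "pred" or tok.lemma not in AUX_LEMMAS:
            continue
        j = i - 1
        if (j >= 0 and tokens[j].kind == "particle"
                and tokens[j].lemma in BRIDGING_PARTICLES):
            j -= 1                     # S2: look past one bridging particle
        if j >= 0 and tokens[j].kind == "pred":
            tok.case_frame = []        # S3: the frame is unnecessary
    return tokens


# Hand-segmented version of sentence (1).
sentence = [
    Token("考える", "pred", ["が", "を"]), Token("て", "particle"),
    Token("みる", "pred", ["が", "を"]), Token("て", "particle"),
    Token("いる", "pred", ["が"]),
    Token("ない", "pred", ["が"]),
    Token("が", "particle"),
    Token("恐らく", "other"),
    Token("正しい", "pred", ["は"]),
    Token("ない", "pred", ["は"]),
]
delete_aux_case_frames(sentence)
remaining = [(t.lemma, t.case_frame) for t in sentence if t.case_frame]
```

After the pass only 考える and 正しい keep a case frame, so a later NULL-insertion step has three open slots to consider instead of eight. An AUX lemma that does not follow a predicate keeps its frame, since the rule is conditioned on position.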

[0049] Next, it is determined whether connector A described above is present in the input sentence (step S4). If connector A is present, processes 1 and 2 below are executed (step S5).

[0050] Process 1: If there is a noun phrase marked with が at the beginning of the sentence, that noun phrase is inserted into the case frame slot corresponding to the subject of the predicate of the phrase immediately after connector A.

[0051] Similarly, it is determined whether connector B described above is present in the input sentence (step S6), and if connector B is present, process 2 below is executed (step S7).

[0052] Process 2: If there is a noun phrase marked with は at the beginning of the sentence, that noun phrase is inserted into the case frame corresponding to the predicate of the phrase immediately after connector A or B. If more than one case frame exists, however, the ambiguity is retained and is resolved in the later semantic analysis.
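Processes 1 and 2 can be sketched as follows. The sketch makes several assumptions that go beyond the text: the `Clause` class and function name are hypothetical, each predicate carries a list of alternative case frames, a は-marked noun phrase is recorded under a `"は"` key in every alternative frame (leaving the choice of slot to later semantic analysis), and the two test sentences used with it are invented examples.

```python
from dataclasses import dataclass

CONNECTOR_CLASS = {"つつ": "A", "ずに": "B", "ないで": "B", "たら": "B", "ても": "B"}


@dataclass
class Clause:
    predicate: str
    frames: list         # alternative case frames: dicts of particle -> filler or None
    connector: str = ""  # connector closing this clause; "" for the final clause


def attach_head_np(head, particle, clauses):
    """Attach the sentence-initial noun phrase to the clause after a connector."""
    for clause, following in zip(clauses, clauses[1:]):
        cls = CONNECTOR_CLASS.get(clause.connector, "C")
        if cls == "A" and particle == "が":
            # Process 1: fill the open subject slot after connector A.
            for frame in following.frames:
                if frame.get("が", "absent") is None:
                    frame["が"] = head
        elif cls in ("A", "B") and particle == "は":
            # Process 2: attach to every alternative frame, keeping the ambiguity.
            for frame in following.frames:
                frame["は"] = head
    return clauses


# Invented example: 太郎が音楽を聞きつつ勉強した。
listen_study = attach_head_np("太郎", "が", [
    Clause("聞く", [{"が": None, "を": "音楽"}], "つつ"),
    Clause("勉強する", [{"が": None}]),
])

# Invented example: 私は朝食を食べずに出かけた。
eat_leave = attach_head_np("私", "は", [
    Clause("食べる", [{"が": None, "を": "朝食"}], "ずに"),
    Clause("出かける", [{"が": None}, {"が": None, "に": None}]),
])
```

In the first example 太郎 ends up as the subject of 勉強する while the subject slot of 聞く stays open, so it remains a zero pronoun candidate. In the second, both alternative frames of 出かける carry 私 and neither is discarded.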

[0053] Through the above processing, unnecessary case frames are deleted from the parse tree and the ambiguity of the case structure is reduced, so that ordinary semantic analysis (case structure analysis) (step S8) can identify the positions of zero pronouns more accurately.

[0054] Fig. 6 shows the result of parsing the example sentence 私は彼の本を読んで発見した。 ("I read his book and discovered something.")

[0055] Fig. 7 shows the result of further applying semantic analysis (case structure analysis) to the parse tree shown in Fig. 6. In the figure, zero pronouns are written as "NULL". As the figure shows, NULL is assigned at the appropriate position in the parse tree, and the correct result is obtained as a zero pronoun analysis of a single sentence. Consequently, analyzing the example sentence 私は彼の本を読んで発見した。 with this method makes it possible to obtain the correct translation "I read his book and discovered something."

[0056] [Supplement] The present invention has been described in detail above with reference to a specific embodiment. It is self-evident, however, that those skilled in the art can modify or substitute the embodiment without departing from the gist of the present invention. That is, the present invention has been disclosed by way of example, and the contents of this specification should not be interpreted restrictively. To determine the gist of the present invention, the claims set out at the beginning should be taken into consideration.

[0057]

[Effects of the Invention] As described above in detail, the present invention can provide an excellent natural language processing system, natural language processing method, and computer program capable of more accurate syntactic and semantic analysis of sentences, such as Japanese sentences, in which constituents normally considered essential, such as the subject or object, are omitted.

[0058] Further, the present invention can provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting, with high accuracy, information on subjects and objects omitted from a sentence, that is, zero pronouns.

[0059] The present invention is characterized in that it focuses on auxiliary predicates and phrase connectors to delete unnecessary case frames and thereby reduce the ambiguity of the case structure, which enables highly accurate analysis of zero pronouns.

[Brief description of drawings]

FIG. 1 is a diagram showing the case frames with selectional restrictions for the verb 「合う」 ("to fit") described in the IPAL verb dictionary developed by the Technology Center of the Information-technology Promotion Agency (IPA).

FIG. 2 is a diagram schematically showing the configuration of a natural language processing system 1 based on LFG.

FIG. 3 is a diagram showing the c-structure obtained by processing the input sentence 「私の娘は英語を話します。」 ("My daughter speaks English.") with the syntactic/semantic analysis unit 1.

FIG. 4 is a diagram showing the f-structure obtained by processing the input sentence 「私の娘は英語を話します。」 ("My daughter speaks English.") with the syntactic/semantic analysis unit 1.

FIG. 5 is a flowchart showing the processing procedure of zero-pronoun analysis according to an embodiment of the present invention.

FIG. 6 is a diagram showing the parsing result for the example sentence 「私は彼の本を読んで発見した。」 ("I read his book and discovered something.").

FIG. 7 is a diagram showing the result of semantic (case structure) analysis for the example sentence 「私は彼の本を読んで発見した。」 ("I read his book and discovered something.").

[Explanation of symbols]

1 ... Natural language processing system
2 ... Morphological analysis unit
2A ... Morphological rules
2B ... Morphological dictionary
3 ... Syntactic/semantic analysis unit
3A ... Grammar rules
3B ... Valency dictionary

Claims

[Claims]

1. A natural language processing system for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: part-of-speech category defining means for defining a part-of-speech category AUX corresponding to predicates that should not have a case frame when they follow a predicate; predicate searching means for searching the input sentence for a predicate belonging to the part-of-speech category AUX that occurs immediately after another predicate or after another predicate with a certain particle in between; and case frame deleting means for deleting the case frame of the predicate extracted by the predicate searching means.

2. The natural language processing system according to claim 1, further comprising: first phrase connector searching means for searching the sentence for a first phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" or "ga" appearing at the beginning of the sentence; and first case frame processing means for, in response to the first phrase connector being found in the sentence, inserting a noun phrase marked with "ga" at the beginning of the sentence, if present, into the case frame corresponding to the subject of the predicate of the phrase immediately after the first phrase connector, and/or inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the first phrase connector.

3. The natural language processing system according to claim 1, further comprising: second phrase connector searching means for searching the sentence for a second phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" appearing at the beginning of the sentence; and second case frame processing means for, in response to the second phrase connector being found in the sentence, inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the second phrase connector.

4. A natural language processing system for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: phrase connector searching means for searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" or "ga" appearing at the beginning of the sentence; and case frame processing means for, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ga" at the beginning of the sentence, if present, into the case frame corresponding to the subject of the predicate of the phrase immediately after the phrase connector, and/or inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.

5. A natural language processing system for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: phrase connector searching means for searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" appearing at the beginning of the sentence; and case frame processing means for, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.

6. A natural language processing method for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: a part-of-speech category defining step of defining a part-of-speech category AUX corresponding to predicates that should not have a case frame when they follow a predicate; a predicate searching step of searching the input sentence for a predicate belonging to the part-of-speech category AUX that occurs immediately after another predicate or after another predicate with a certain particle in between; and a case frame deleting step of deleting the case frame of the predicate extracted in the predicate searching step.

7. The natural language processing method according to claim 6, further comprising: a first phrase connector searching step of searching the sentence for a first phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" or "ga" appearing at the beginning of the sentence; and a first case frame processing step of, in response to the first phrase connector being found in the sentence, inserting a noun phrase marked with "ga" at the beginning of the sentence, if present, into the case frame corresponding to the subject of the predicate of the phrase immediately after the first phrase connector, and/or inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the first phrase connector.

8. The natural language processing method according to claim 6, further comprising: a second phrase connector searching step of searching the sentence for a second phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" appearing at the beginning of the sentence; and a second case frame processing step of, in response to the second phrase connector being found in the sentence, inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the second phrase connector.

9. A natural language processing method for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: a phrase connector searching step of searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" or "ga" appearing at the beginning of the sentence; and a case frame processing step of, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ga" at the beginning of the sentence, if present, into the case frame corresponding to the subject of the predicate of the phrase immediately after the phrase connector, and/or inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.

10. A natural language processing method for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, comprising: a phrase connector searching step of searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" appearing at the beginning of the sentence; and a case frame processing step of, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.

11. A computer program written in a computer-readable format so as to execute natural language processing for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, the computer program comprising: a part-of-speech category defining step of defining a part-of-speech category AUX corresponding to predicates that should not have a case frame when they follow a predicate; a predicate searching step of searching the input sentence for a predicate belonging to the part-of-speech category AUX that occurs immediately after another predicate or after another predicate with a certain particle in between; and a case frame deleting step of deleting the case frame of the predicate extracted in the predicate searching step.

12. A computer program written in a computer-readable format so as to execute natural language processing for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, the computer program comprising: a phrase connector searching step of searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" or "ga" appearing at the beginning of the sentence; and a case frame processing step of, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ga" at the beginning of the sentence, if present, into the case frame corresponding to the subject of the predicate of the phrase immediately after the phrase connector, and/or inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.

13. A computer program written in a computer-readable format so as to execute natural language processing for performing syntactic/semantic analysis of a sentence in which essential constituents are omitted, the computer program comprising: a phrase connector searching step of searching the sentence for a phrase connector showing a tendency for the predicate of the immediately preceding phrase not to take a noun phrase marked with "ha" appearing at the beginning of the sentence; and a case frame processing step of, in response to the phrase connector being found in the sentence, inserting a noun phrase marked with "ha" at the beginning of the sentence, if present, into the case frame corresponding to the predicate of the phrase immediately after the phrase connector.