JPH07334503A

JPH07334503A - Natural language processing system

Info

Publication number: JPH07334503A
Application number: JP6122901A
Authority: JP
Inventors: Takesuke Hiraoka; 丈介平岡
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1994-06-06
Filing date: 1994-06-06
Publication date: 1995-12-22

Abstract

PURPOSE: To pass compound nouns carrying both the word dependency information required for syntactic analysis and the word semantic information required for semantic analysis, using a comparatively simple data structure. CONSTITUTION: In a natural language processing system that morphologically analyzes an input sentence and then analyzes its syntax and meaning from the result, the data obtained by morphological analysis consists of word data, obtained by dividing the input sentence into word units and describing the attribute data attached to each word, and compound noun data, describing the semantic information of each noun that constitutes a compound noun among those words. In the semantic analysis of a compound noun, the semantic information of its constituent nouns is obtained by referring to the compound noun data. Compound nouns with the same notation share common compound noun data as the object of search and reference.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a natural language processing system, and more particularly to data processing for passing the compound noun data obtained by morphological analysis to syntactic and semantic analysis.

[0002]

[Description of the Related Art] Computer-based natural language processing, such as word processors, machine translation, document databases, and hypertext, has been put into practical use.

[0003] Natural language analysis for these applications begins with morphological analysis: the sentence to be analyzed is divided into morphemes (the smallest units of word structure), and the properties of each morpheme are determined. This is followed by syntactic analysis based on the syntactic rules of the natural language, then semantic analysis, which removes ambiguity and vagueness, and finally contextual analysis.

[0004] In syntactic analysis, a grammar is used to determine whether the morphologically analyzed sentence is well formed; if it is, a tree structure (parse tree) is obtained as the result of the analysis.

[0005] Because syntactic analysis considers only grammatical conformity, syntactic ambiguity arises and many parse trees are generated. Semantic analysis is then performed to select the correct parse tree from among them.

[0006] Semantic analysis uses not only the grammatical category of each word (roughly, its part of speech) but also the word's semantic information.

[0007] Here, morphological analysis is performed by a program written in C, and syntactic and semantic analysis are performed by a program written in Prolog.

[0008] Consequently, an interface is needed to pass the data obtained from morphological analysis to syntactic and semantic analysis. The data passed at this point divides the input character string into words and describes the attribute data attached to each word; data in this format is hereinafter called "dict data".

[0009]

[Problems to be Solved by the Invention] When the input character string contains a compound noun, morphological analysis splits it into its individual nouns, but the dict data treats the compound noun as a single word.

[0010] This is because syntactic analysis examines the dependency (modification) relations between words, so it is reasonable to treat a compound noun as a single noun when checking its dependency relations with the preceding and following words.
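The grouping described in [0010] can be illustrated with a short Python sketch. It is a minimal illustration under a stated assumption, not the patent's C implementation: the specification does not say how compound nouns are detected, so here every maximal run of adjacent nouns is assumed to form one compound noun. The names `Morpheme` and `merge_compound_nouns` are hypothetical. The parser-facing word keeps the joined notation, while the constituent morphemes are retained for later semantic use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Morpheme:
    surface: str            # notation of the morpheme
    pos: str                # part of speech, e.g. "noun"
    sem: Optional[str]      # semantic information, None if absent


def merge_compound_nouns(morphemes: List[Morpheme]) -> List[Tuple[str, str, List[Morpheme]]]:
    """Turn a morpheme sequence into words for syntactic analysis.

    Assumption (hypothetical rule): every maximal run of adjacent nouns is
    one (compound) noun. Each word is (part of speech, notation, constituent
    morphemes), so the parser sees one noun while the constituents survive.
    """
    words: List[Tuple[str, str, List[Morpheme]]] = []
    run: List[Morpheme] = []

    def flush() -> None:
        # Close the current noun run, if any, as a single noun word.
        if run:
            words.append(("noun", "".join(m.surface for m in run), list(run)))
            run.clear()

    for m in morphemes:
        if m.pos == "noun":
            run.append(m)
        else:
            flush()
            words.append((m.pos, m.surface, [m]))
    flush()
    return words
```

For simplicity, a usage example would pass かける as a single verb morpheme rather than as かけ + る.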

[0011] At the semantic analysis stage, however, it is necessary to know which nouns make up the compound noun and what dictionary information (including semantic information) each of those nouns had.

[0012] At present, the dict data for a compound noun inherits only a very limited part of the information held by its original nouns, which degrades the accuracy of semantic analysis of compound nouns.

[0013] An object of the present invention is to provide a natural language processing system that can pass compound nouns carrying both the word dependency information needed for syntactic analysis and the word semantic information needed for semantic analysis.

[0014] Another object of the present invention is to provide a natural language processing system that can pass this data with a relatively simple data structure.

[0015]

[Means for Solving the Problems] To solve the above problems, the present invention provides a natural language processing system that performs morphological analysis of an input sentence and then performs syntactic and semantic analysis on the result. The data obtained by the morphological analysis consists of word data, which divides the input sentence into word units and describes the attribute data attached to each word, and compound noun data, which describes the semantic information of each noun that constitutes a compound noun among those words. The syntactic analysis generates a parse tree from the word data, and the semantic analysis, for compound nouns in the word data having the same notation, searches the compound noun data using that notation as a key to obtain the semantic information of each constituent noun.

[0016]

[Operation] Compound noun data is generated as part of the morphological analysis output, in addition to the conventional word data. In the semantic analysis of a compound noun, the semantic information of each noun constituting it is obtained by referring to the compound noun data.

[0017] Compound nouns with the same notation share common compound noun data as the object of search and reference.

[0018]

[Embodiment] FIG. 1 shows an embodiment of the present invention, illustrating the structure of the data that morphological analysis passes to syntactic and semantic analysis.

[0019] As the result of morphological analysis of an input sentence, semdict data is generated as compound noun data in addition to dict data as word data, and both are passed on for syntactic and semantic analysis.

[0020] As in the conventional system, dict data has a data structure consisting of a word's part of speech, notation, grammatical information, and semantic information.

[0021] Semdict data (compound noun data) has a data structure consisting of, for each word in the dict data that is a compound noun, the compound noun notation, the notation of each original noun, and the semantic information of each original noun.
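The two record types in [0020] and [0021] can be sketched in Python as follows. This is a minimal illustration, not the patent's C/Prolog implementation: the class names `DictEntry` and `SemDictEntry` and the helpers `format_dict` and `format_semdict` are hypothetical, and the formatters only reproduce the `dict(...)` / `semdict(...)` notation used in [0022] and [0023], with an absent field shown as `~`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DictEntry:
    """Word data: one entry per word passed to syntactic/semantic analysis."""
    pos: str                    # part of speech
    notation: str               # surface notation of the word
    grammar: Optional[str]      # grammatical information (None = not shown)
    sem: Optional[str]          # semantic information (None = empty)


@dataclass(frozen=True)
class SemDictEntry:
    """Compound noun data: one entry per compound noun notation."""
    notation: str                          # compound noun notation
    parts: Tuple[Tuple[str, str], ...]     # (original noun notation, its semantic info)


def show(value: Optional[str]) -> str:
    # Render an absent field as "~", matching the notation in [0022].
    return "~" if value is None else value


def format_dict(e: DictEntry) -> str:
    return f"dict({e.pos}, {e.notation}, {show(e.grammar)}, {show(e.sem)})"


def format_semdict(e: SemDictEntry) -> str:
    parts = ", ".join(f"[{noun}, {sem}]" for noun, sem in e.parts)
    return f"semdict(compound noun, {e.notation}, [{parts}])"
```

Keeping the two record types separate mirrors the point of [0029]: the word records the parser consumes are unchanged, and the constituent information lives in its own structure.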

[0022] For example, the input sentence 「実験結果を機械処理にかける」 ("subject the experimental results to machine processing") is divided into the morphemes 実験 / 結果 / を / 機械 / 処理 / に / かけ / る (experiment / result / object particle / machine / processing / particle "to" / subject / verb ending). The dict data, i.e. the word data for these morphemes, is:

dict(noun, 実験結果, ~, Sem1)
dict(case particle, を, ~, ~)
dict(noun, 機械処理, ~, Sem2)
dict(case particle, に, ~, ~)
dict(verb, かける, ~, ~)

Here, Sem denotes the semantic information held by each noun, for example a semantic feature or a thesaurus code, and ~ denotes field contents omitted here.

[0023] The semdict data in this case is:

semdict(compound noun, 実験結果, [[実験, Sem11], [結果, Sem12]])
semdict(compound noun, 機械処理, [[機械, Sem21], [処理, Sem22]])
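How the example output of [0022] and [0023] might be produced can be sketched in Python. This is a simplified illustration under assumptions: the input is a hypothetical list of already-grouped words in the form (part of speech, notation, [(constituent notation, semantic info), ...]), the grammatical-information field is dropped, and `build_transfer_data` is a hypothetical name. Following [0025], a compound noun's own semantic field is left empty (`None`) because the compound itself is not a dictionary word.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: words after compound-noun grouping, as
# (part of speech, notation, [(constituent notation, semantic info), ...]).
Word = Tuple[str, str, List[Tuple[str, Optional[str]]]]


def build_transfer_data(words: List[Word]):
    """Split analysis output into dict data and semdict data.

    dict data keeps one (pos, notation, sem) entry per word, so the parser
    sees each compound noun as a single noun. semdict data maps each
    compound noun notation to its constituent nouns and their semantic info.
    """
    dict_data: List[Tuple[str, str, Optional[str]]] = []
    semdict_data: Dict[str, List[Tuple[str, Optional[str]]]] = {}
    for pos, notation, parts in words:
        if pos == "noun" and len(parts) > 1:
            # The compound is not itself in the dictionary, so its own
            # semantic field stays empty (cf. Sem1 in [0025]).
            dict_data.append((pos, notation, None))
            # One shared entry per notation, as described in [0030].
            semdict_data.setdefault(notation, list(parts))
        else:
            sem = parts[0][1] if parts else None
            dict_data.append((pos, notation, sem))
    return dict_data, semdict_data
```

Applied to the five words of the example sentence, this sketch yields five dict entries and two semdict entries, one for 実験結果 and one for 機械処理.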

[0024] Given this dict data and semdict data, the syntactic and semantic analysis first performs syntactic analysis, which generates a parse tree from the dict data using the dependency relations between words.

[0025] In the semantic analysis, which judges whether the generated parse tree is semantically valid, the word 実験結果 ("experimental result") in the dict data is not in the dictionary, so nothing is recorded in its semantic information Sem1.

[0026] The semdict data is therefore searched for the compound noun notation, and the semantic information Sem11 of 実験 ("experiment") and Sem12 of 結果 ("result") is referenced. These references can then be used in the semantic analysis of the compound noun 実験結果.

[0027] Likewise, for the compound noun 機械処理 ("machine processing"), the semdict data can be searched using the notation as a key to obtain the semantic information.
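The lookup in [0025] to [0027] amounts to a keyed search with a fallback, which can be sketched in Python. This is a hedged illustration, not the patent's Prolog code: `constituent_semantics` is a hypothetical name, semdict data is assumed to be a plain mapping from notation to constituent list, and the rule "use the word's own semantic info if present, otherwise search semdict" is one reading of [0025] and [0026].

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: compound notation -> [(noun, semantic info), ...]
SemDict = Dict[str, List[Tuple[str, str]]]


def constituent_semantics(notation: str, own_sem: Optional[str],
                          semdict: SemDict) -> List[Tuple[str, str]]:
    """Return the semantic information usable for one noun in semantic analysis.

    If the dict entry carries its own semantic info, use it. Otherwise treat
    the notation as a key into the semdict data and return the constituent
    nouns' semantic info (an empty list if the notation is not registered).
    """
    if own_sem is not None:
        return [(notation, own_sem)]
    return semdict.get(notation, [])
```

Because the key is the notation string, every occurrence of the same compound noun resolves to the same semdict entry, which is the sharing property described in [0030].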

[0028] Therefore, according to this embodiment, by referring to the semdict data, the semantic analysis stage can determine which nouns a compound noun in the dict data is composed of and what semantic information each of those nouns has, and can make use of that information.

[0029] Moreover, because the conventional word data (dict data) is not changed at all, no change to the syntactic analysis process is required.

[0030] In addition, semdict data can serve as semantic information common to all compound nouns with the same notation. If the semantic information of each constituent word were instead written into the dict data, then whenever several compound nouns with the same notation appeared in one or more input sentences, the same semantic information would be repeated in each of their dict entries, producing redundant data. In this embodiment, a single semdict entry is passed for all compound nouns with the same notation, so whenever such a compound noun appears, its semantic information is obtained by referring to that same semdict entry.
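The redundancy argument in [0030] can be made concrete with a small counting sketch in Python. It is an illustration under assumptions: the input format (each sentence as a list of (compound notation, constituents) pairs) and the function names `collect_semdict`, `inline_record_count`, and `shared_record_count` are hypothetical. The sketch compares how many constituent records would be stored if every occurrence carried its own copy in dict data against one shared semdict entry per notation.

```python
from typing import Dict, List, Tuple

Parts = List[Tuple[str, str]]          # [(constituent noun, semantic info), ...]
Sentence = List[Tuple[str, Parts]]     # [(compound noun notation, constituents), ...]


def collect_semdict(sentences: List[Sentence]) -> Dict[str, Parts]:
    """Build one shared semdict table for a batch of sentences.

    A notation seen again reuses the first entry instead of storing a copy.
    """
    table: Dict[str, Parts] = {}
    for sentence in sentences:
        for notation, parts in sentence:
            table.setdefault(notation, parts)
    return table


def inline_record_count(sentences: List[Sentence]) -> int:
    # Constituent records if every occurrence carried its own copy in dict data.
    return sum(len(parts) for sentence in sentences for _, parts in sentence)


def shared_record_count(table: Dict[str, Parts]) -> int:
    # Constituent records when each notation has a single shared semdict entry.
    return sum(len(parts) for parts in table.values())
```

With 実験結果 appearing in two sentences and 機械処理 in one, the inline layout stores six constituent records while the shared table stores four; the saving grows with every repeated occurrence.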

[0031]

[Effects of the Invention] As described above, according to the present invention, compound noun data describing the semantic information of each noun that constitutes a compound noun is generated as morphological analysis output in addition to the conventional word data and is passed to syntactic and semantic analysis; in semantic analysis, the compound noun data is searched using the compound noun's notation as a key to obtain the semantic information of each noun. This provides the following effects.

[0032] (1) The accuracy of semantic analysis of input sentences containing compound nouns is improved.

[0033] (2) Adding the compound noun data requires no change to the syntactic analysis process.

[0034] (3) The passed data only adds compound noun data to the conventional word data, so the amount of data hardly increases.

[0035] (4) Compound nouns with the same notation can share common compound noun data as the object of search and reference.

[Brief description of drawings]

FIG. 1 shows the structure of the passed data in an embodiment of the present invention.

Claims


1. A natural language processing system that performs morphological analysis of an input sentence and performs syntactic analysis and semantic analysis on the analysis result, wherein the data obtained by the morphological analysis consists of word data, which divides the input sentence into word units and describes the attribute data attached to each word, and compound noun data, which describes the semantic information of each noun constituting a compound noun among the words; the syntactic analysis generates a parse tree from the word data; and the semantic analysis obtains the semantic information of each noun by searching the compound noun data, using the notation as a key, for compound nouns having the same notation among the word data.