JP4033093B2

JP4033093B2 - Natural language processing system, natural language processing method, and computer program

Info

Publication number: JP4033093B2
Application number: JP2003326396A
Authority: JP
Inventors: 智子大熊; 博増市; 宏樹吉村; 大悟杉原
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-09-18
Filing date: 2003-09-18
Publication date: 2008-01-16
Anticipated expiration: 2023-09-18
Also published as: JP2005092615A

Description

The present invention relates to a natural language processing system, a natural language processing method, and a computer program for mathematically handling the natural language that humans use in everyday communication, and in particular to a natural language processing system, method, and computer program that perform syntactic and semantic analysis of natural language sentences.

More specifically, the present invention relates to a natural language processing system, a natural language processing method, and a computer program that output syntactic and semantic analysis results at a higher analysis speed for Japanese sentences containing compound words, each composed of a sequence of words. In particular, it relates to a system, method, and program that output syntactic and semantic analysis results with higher recall when compound words are merged into single units to improve analysis speed.

The languages that humans use in everyday communication, such as Japanese and English, are called “natural languages”. Many natural languages arose spontaneously and have evolved along with the history of humankind, peoples, and societies. People can of course also communicate through gestures, but natural language enables the most natural and sophisticated communication.

Meanwhile, with the development of information technology, computers have become established in human society and have penetrated deeply into industry and daily life. Today not only computer data but almost all information content, including images and audio, is handled on computers, which makes possible advanced processing such as editing, storing, managing, transmitting, and sharing information.

For example, natural language, whether Japanese, English, or any other language, is inherently abstract and highly ambiguous, but treating sentences mathematically makes them amenable to computer processing. As a result, a variety of natural language applications and services, such as machine translation, dialogue systems, search systems, and question answering systems, can be realized through automated processing.

Such natural language processing is generally divided into four phases: morphological analysis, syntactic analysis, semantic analysis, and context analysis.

Morphological analysis segments a sentence into morphemes, the smallest meaningful units, and identifies the part of speech of each. Syntactic analysis analyzes sentence structure, such as phrase structure, based on grammar rules and the like. Because grammar rules are tree-structured, the result of syntactic analysis is generally a tree in which the individual morphemes are joined according to relations such as dependency. Semantic analysis determines a semantic structure expressing the meaning that the sentence conveys, based on the senses (concepts) of the words in the sentence and the semantic relations between words, and composes these structures. Context analysis treats a text (discourse), that is, a sequence of sentences, as the basic unit of analysis and builds a discourse structure by identifying semantic units that span sentences.

In particular, syntactic and semantic analysis are regarded as indispensable technologies in the field of natural language processing for realizing applications such as dialogue systems, machine translation, document proofreading support, and document summarization.

Syntactic analysis receives a natural language sentence and determines the dependency relations between words (bunsetsu phrases) based on grammar rules. The result can be expressed as a tree called a dependency structure (dependency tree). Semantic analysis can then determine the case relations in the sentence based on those dependency relations. Here, a case relation refers to the grammatical role, such as subject (SUBJ) or object (OBJ), played by each element of the sentence. Semantic analysis may also include determining the tense, modality, and mode of narration of the sentence.

The time required for syntactic and semantic analysis in natural language processing is said to increase exponentially with the number of morphemes in a sentence. Merging several morphemes into one to reduce the morpheme count can therefore be expected to improve analysis speed.
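As a rough illustration only, suppose the analysis time for a sentence of $n$ morphemes grows as $T(n) \approx a\,c^{n}$, where the constants $a > 0$ and $c > 1$ are assumptions introduced here and are not values given in this specification. Merging a run of $k$ consecutive morphemes into one reduces the morpheme count by $k - 1$, so the ratio of the times before and after merging would be:

```latex
\frac{T(n)}{T\bigl(n-(k-1)\bigr)} = \frac{a\,c^{\,n}}{a\,c^{\,n-k+1}} = c^{\,k-1}
```

Under this assumed model, merging four consecutive nouns into one ($k = 4$) would shorten the analysis by a factor of $c^{3}$.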

For example, when a compound word composed of a sequence of words appears during syntactic or semantic analysis, the words are treated as a single word, that is, the consecutive morphemes are merged, in order to improve analysis speed.

For example, a machine translation apparatus has been proposed that treats a cohesive expression in the source text, typified by a collocation, as a single morpheme, thereby improving the accuracy of syntactic analysis and shortening the time it requires (see, for example, Patent Document 1). In this apparatus, an English-Japanese collocation dictionary storing such cohesive expressions is provided on an HD device. During syntactic analysis, the English text is searched for expressions made up of words joined by a coordinating conjunction. If such an expression is registered in the collocation dictionary, or if the words making up the expression share the same prefix or suffix, the expression is recognized as a single morpheme and parsed without being split.

A parsing method has also been proposed that allows a computer-based natural language analyzer to efficiently analyze compound words, compound sentences, and complex sentences, which had been difficult to analyze (see, for example, Patent Document 2). In this method, a word sequence whose parts of speech are verb + conjunctive particle + verb is treated as a single verb. When a lookup in a dictionary prepared in advance detects a word having the attribute of a logical operator such as “and” or “or”, processing proceeds with that word and the words before and after it treated as one word. When at least one word selected from adjectives, adverbs, and interjections is detected, processing proceeds with that word treated as modifying the first event-denoting word (a word such as a verb or noun that indicates an event) appearing after it.

Compound words as referred to in this specification include “compound nouns”, composed of a sequence of nouns, and “compound verbs”, composed of a sequence of verbs.

For example, as shown in example sentence (1) below, the four consecutive nouns “青少年” (youth), “総合” (general), “体育” (athletic), and “大会” (meet) are treated as a single compound noun.

(1) 横浜で青少年総合体育大会が行われた。 (A general youth athletic meet was held in Yokohama.)
横浜 で 青少年 総合 体育 大会 が 行う れる た。
→ 横浜 で 青少年総合体育大会 が 行う れる た。

However, merging morphemes that were originally separate can cause problems. Fig. 9 shows an example of the syntactic and semantic analysis result for the example sentence above. If one searches this analysis result with the keyword “大会” (meet), for instance, the keyword does not match the word “青少年総合体育大会”, in which the consecutive nouns have been treated as a single noun.

(2) 横浜で行われた大会は何ですか？ (What is the meet that was held in Yokohama?)

For example, sentence (1) cannot be adopted as the answer to a natural language query such as example sentence (2). Fig. 10 shows the syntactic and semantic analysis result for example sentence (2). Because the analysis result in Fig. 9 merges the nouns into a compound noun, the two results can no longer be associated with each other. In short, a side effect of merging consecutive morphemes is that the recall of the search system is lowered.

If, in order to maintain recall, the search is instead performed by partial matching of character strings rather than exact matching, a different problem arises.

(3) 合体はどこで行われましたか？ (Where did the merger take place?)

For a query such as example sentence (3), the word “合体” (merger) partially matches “青少年総合体育大会”. Fig. 11 shows the syntactic and semantic analysis result for example sentence (3). Because this result is matched against the analysis result shown in Fig. 9, example sentence (1) is wrongly adopted as the answer. In other words, the precision of the search system is lowered.
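The two failure modes described for example sentences (2) and (3) can be reproduced with a minimal sketch. The term lists and the function names `exact_match` and `partial_match` are illustrative assumptions; the specification does not describe the search system at this level of detail.

```python
# Terms taken from example sentence (1). Both index layouts are assumptions
# made for illustration only.
MERGED_TERMS = ["横浜", "青少年総合体育大会"]            # compound noun merged into one term
SPLIT_TERMS = ["横浜", "青少年", "総合", "体育", "大会"]  # constituents kept as separate terms


def exact_match(query: str, terms: list[str]) -> bool:
    """Return True if the query equals one of the indexed terms."""
    return query in terms


def partial_match(query: str, terms: list[str]) -> bool:
    """Return True if the query occurs as a substring of any indexed term."""
    return any(query in term for term in terms)


if __name__ == "__main__":
    # Query (2): exact matching against the merged term misses "大会" (recall loss).
    print(exact_match("大会", MERGED_TERMS))    # False
    # Partial matching recovers "大会" ...
    print(partial_match("大会", MERGED_TERMS))  # True
    # ... but query (3) "合体" also matches inside 総[合体]育 (precision loss).
    print(partial_match("合体", MERGED_TERMS))  # True
    # With the constituents kept separate, exact matching behaves as desired.
    print(exact_match("大会", SPLIT_TERMS))     # True
    print(exact_match("合体", SPLIT_TERMS))     # False
```

The last two lines suggest why keeping the constituent nouns individually addressable avoids both problems, provided the parser can still be prevented from exploring extra candidates for each constituent.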

Patent Document 1: JP 11-329178 A (特開平11-329178号公報)
Patent Document 2: JP 2001-125898 A (特開2001-125898号公報)

An object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting syntactic and semantic analysis results at a higher analysis speed for sentences containing compound words composed of a sequence of words.

A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting highly accurate syntactic and semantic analysis results when compound words are merged into single units to improve analysis speed.

A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting highly accurate syntactic and semantic analysis results, without lowering the recall or precision of a search system, when compound words are merged into single units to improve analysis speed.

The present invention has been made in view of the above problems. A first aspect thereof is a natural language processing system for analyzing a natural language sentence in which a compound word composed of a sequence of words of a specific part of speech appears, the system comprising:
means for acquiring, for an input natural language sentence, a morphological analysis result that includes the part of speech identified for each morpheme;
means for extracting, based on the morphological analysis result, a location in the input natural language sentence where morphemes of the specific part of speech occur consecutively; and
means for assigning a special grammatical category to the extracted consecutive morphemes of the specific part of speech.

The specific part of speech referred to here is, for example, noun or verb, and the compound word is correspondingly a compound noun or a compound verb.

The natural language processing system according to the present invention may further comprise dictionary creation means for creating a dictionary for syntactic and semantic analysis based on the morphological analysis result that includes the part of speech identified for each morpheme, and parsing means for performing syntactic analysis by applying, to each headword of that dictionary, the grammar rules corresponding to its grammatical category.

In that case, the dictionary creation means assigns the special grammatical category to the consecutive morphemes of the specific part of speech. In addition to the grammar rules for the specific part of speech, the parsing means includes a grammar rule stating that a morpheme having the special grammatical category becomes the specific part of speech. This rule is equivalent to a rule stating that a morpheme that follows zero or more morphemes of the specific part of speech becomes the specific part of speech.

In general, a dictionary for syntactic and semantic analysis (LFG) is generated temporarily from the morphological analysis result. As noted above, a compound noun consists of consecutive morphemes each identified as a noun (N). If each constituent morpheme of the compound noun keeps the part of speech noun (N), then in syntactic and semantic analysis a noun (N) can become any of S, NP, or ADV. The number of candidate analyses therefore grows, and the amount of computation grows with it.

In the present invention, therefore, when the grammatical categories for syntactic and semantic analysis are generated from the morphological analysis result, a special grammatical category is assigned to the nouns that constitute a compound noun. This limits the number of candidates during syntactic and semantic analysis and improves analysis speed without affecting the analysis result.

Specifically, in addition to the grammar rules for ordinary nouns, a rule is added stating that a noun that follows zero or more nouns becomes a noun, so that the individual elements of a compound noun no longer match the syntactic and semantic analysis rules that involve ordinary nouns. Because the candidates for the syntactic and semantic analysis result are limited in this way, the computational cost is reduced and the analysis speed is improved.
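The added rule can be pictured as grouping zero or more specially categorized nouns followed by one ordinary noun into a single noun constituent. The following is a minimal sketch under stated assumptions: the special category is written `Nmod` (the name used in the embodiment described later), dictionary entries are `(surface, category)` pairs, and the function name `group_compound_nouns` is hypothetical. It is not the LFG rule notation actually used by the system.

```python
def group_compound_nouns(entries):
    """Group each run 'Nmod* N' into one N constituent.

    entries: list of (surface, category) pairs in sentence order.
    Returns a list of (category, [surfaces]) constituents.
    """
    constituents = []
    pending = []  # Nmod elements waiting for the closing N
    for surface, category in entries:
        if category == "Nmod":
            pending.append(surface)
        elif category == "N":
            # Zero or more Nmod followed by N becomes a single N.
            constituents.append(("N", pending + [surface]))
            pending = []
        else:
            if pending:
                # An Nmod can only be consumed by the compound rule.
                raise ValueError("Nmod not followed by N")
            constituents.append((category, [surface]))
    if pending:
        raise ValueError("Nmod not followed by N")
    return constituents


if __name__ == "__main__":
    sample = [("青少年", "Nmod"), ("総合", "Nmod"), ("体育", "Nmod"), ("大会", "N")]
    print(group_compound_nouns(sample))
```

Because an `Nmod` element matches no rule other than the compound rule, it cannot be considered on its own as a candidate S, NP, or ADV, which is the source of the reduction in candidates described above.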

A second aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer system, processing for analyzing a natural language sentence in which a compound word composed of a sequence of words of a specific part of speech appears, the program comprising:
a step of acquiring, for an input natural language sentence, a morphological analysis result that includes the part of speech identified for each morpheme;
a step of extracting, based on the morphological analysis result, a location in the input natural language sentence where morphemes of the specific part of speech occur consecutively; and
a step of assigning a special grammatical category to the extracted consecutive morphemes of the specific part of speech.

The computer program according to the second aspect of the present invention defines a computer program written in a computer-readable format so as to realize predetermined processing on a computer system. In other words, installing the computer program according to the second aspect of the present invention on a computer system produces a cooperative effect on that computer system and yields the same operation and effects as the natural language processing system according to the first aspect of the present invention.

According to the present invention, it is possible to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting syntactic and semantic analysis results at a higher analysis speed for sentences containing compound words composed of a sequence of words.

Further, according to the present invention, it is possible to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting highly accurate syntactic and semantic analysis results, without lowering the recall or precision of a search system, when compound words are merged into single units to improve analysis speed.

Still other objects, features, and advantages of the present invention will become apparent from the more detailed description based on the embodiments of the present invention described below and the accompanying drawings.

Embodiments of the present invention are described in detail below with reference to the drawings.

To improve analysis speed, the natural language processing system according to the present invention merges compound words made up of consecutive morphemes, such as compound nouns and compound verbs, into single units. In doing so, it can output highly accurate syntactic and semantic analysis results without lowering the recall or precision of a search system.

Lexical Functional Grammar (LFG) is a representative example of a grammar theory for performing syntactic and semantic analysis. The present invention can be implemented, for example, by incorporating it into syntactic and semantic analysis processing based on LFG. In LFG, the linguistic knowledge of native speakers, that is, the grammar, is organized as a component separate from the computational processing and from other non-grammatical processing parameters that affect the computer's processing behavior.

First, an overview of the natural language processing system is given. Fig. 1 schematically shows the configuration of a natural language processing system 1 based on LFG. This natural language processing system can be realized by executing a predetermined natural language processing application program on a general-purpose computer system such as a personal computer (PC).

The morphological analysis unit 2 has morphological rules 2A and a morpheme dictionary 2B for a specific language such as Japanese, segments an input sentence into morphemes, the smallest meaningful units, and identifies the part of speech of each. For example, when the sentence “私の娘は英語を話します。” (My daughter speaks English.) is input, the morphological analysis result “私{Noun}の{up}娘{Noun}は{up}英語{Noun}を{up}話す{Verb1}{tr}ます{jp}。{pt}” is output.
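For readers who want to handle output of this shape programmatically, the following sketch splits the tagged string into morphemes and their tags. It assumes that each morpheme is followed by one or more tags in ASCII braces, as in the example above; the function name `parse_tagged` is hypothetical, and the meaning of individual tags such as `up` or `tr` is not defined in this passage.

```python
import re

# One morpheme: a surface string followed by one or more {tag} groups.
_TOKEN = re.compile(r"([^{}]+)((?:\{[^{}]*\})+)")
_TAG = re.compile(r"\{([^{}]*)\}")


def parse_tagged(text: str) -> list[tuple[str, list[str]]]:
    """Split 'surface{tag}{tag}...' sequences into (surface, [tags]) pairs."""
    return [
        (match.group(1), _TAG.findall(match.group(2)))
        for match in _TOKEN.finditer(text)
    ]


if __name__ == "__main__":
    result = parse_tagged(
        "私{Noun}の{up}娘{Noun}は{up}英語{Noun}を{up}話す{Verb1}{tr}ます{jp}。{pt}"
    )
    for surface, tags in result:
        print(surface, tags)
```

For the example sentence this yields nine morphemes, of which only “話す” carries two tags.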

The morphological analysis result is then input to the syntactic/semantic analysis unit 3. The syntactic/semantic analysis unit 3 has grammar rules 3A and dictionaries such as a valence dictionary 3B. It analyzes phrase structure based on the grammar rules and analyzes the semantic structure that expresses the meaning conveyed by the sentence, based on the senses of the words in the sentence and the semantic relations between words. (The valence dictionary describes the relations between a verb and the other constituents of the sentence, such as the subject, and makes it possible to extract the semantic relations between a predicate and the words that depend on it.) As the result of syntactic analysis, the unit outputs a “c-structure (constituent structure)”, which represents the phrase structure of a sentence made up of words and morphemes as a tree. As the result of analyzing the input sentence semantically and functionally, for example as interrogative, past tense, or polite, based on case structure such as subject and object, it outputs an “f-structure (functional structure)”.

Figs. 2 and 3 respectively show the c-structure and the f-structure obtained by processing the input sentence “私の娘は英語を話します。” (My daughter speaks English.) with the syntactic/semantic analysis unit 3.

The c-structure represents the structure of the words and phrases in a sentence in tree form and is defined by syntactic categories. For example, the phonological interpretation needed to generate a phoneme sequence can be carried out on the basis of the c-structure. The f-structure, on the other hand, expresses grammatical functions explicitly and consists of grammatical function names, semantic forms, and feature symbols. By referring to the f-structure, one can obtain a semantic understanding in terms of subject, object, complement, and adjunct. The f-structure is a set of features attached to each node of the c-structure and is expressed as an attribute-value matrix, as shown in Fig. 3. That is, within each pair of square brackets, the left side is the name of a feature (attribute) and the right side is the value of that feature (attribute value).
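As a hedged illustration of the attribute-value form, an f-structure can be held as a nested mapping. The feature names `PRED`, `SUBJ`, `OBJ`, and `ADJUNCT` below follow common LFG usage and are assumptions for this sketch; the structure is not copied from Fig. 3, and the helper `get_path` is hypothetical.

```python
# Hypothetical f-structure for "私の娘は英語を話します。"
# Each key is a feature (attribute) name; each value is an atomic value
# or another f-structure.
f_structure = {
    "PRED": "話す",
    "SUBJ": {
        "PRED": "娘",
        "ADJUNCT": [{"PRED": "私"}],
    },
    "OBJ": {"PRED": "英語"},
}


def get_path(fs, *path):
    """Follow a sequence of attribute names through nested f-structures."""
    for attribute in path:
        fs = fs[attribute]
    return fs


if __name__ == "__main__":
    print(get_path(f_structure, "SUBJ", "PRED"))  # 娘
    print(get_path(f_structure, "OBJ", "PRED"))   # 英語
```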

For details of LFG, see, for example, R. M. Kaplan and J. Bresnan, “Lexical-Functional Grammar: A Formal System for Grammatical Representation” (The MIT Press, Cambridge (1982). Reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130. CSLI Publications, Stanford University (1995).).

Next, the merging of compound words made up of consecutive morphemes, such as compound nouns, in the natural language processing according to the present invention is described in detail.

As already noted in the Background Art section, merging a compound word into a single word improves the speed of syntactic and semantic analysis but has the side effect of lowering the recall or precision of a search system.

In general, a dictionary for syntactic and semantic analysis (LFG) is generated temporarily from the morphological analysis result. This dictionary takes each morpheme of the input sentence as a headword and records the headword together with its grammatical category. As noted above, a compound noun consists of consecutive morphemes each identified as a noun (N). If each constituent morpheme of the compound noun keeps the part of speech noun (N), then in syntactic and semantic analysis a noun (N) can become any of S, NP, or ADV. The number of candidate analyses therefore grows, and the amount of computation grows with it.

Fig. 4 shows, in flowchart form, the procedure for generating the dictionary for syntactic and semantic analysis (LFG) from the result of morphological analysis, as preprocessing for syntactic and semantic analysis. Compound nouns are used here as the example of compound words.

First, the original Japanese sentence is input, and the morphological analysis result obtained from separately performed morphological analysis is acquired (step S1). In morphological analysis, the input sentence is segmented into morphemes, the smallest meaningful units, and the part of speech of each is identified.

Next, 1 is assigned to the variable i (step S2). Until i reaches the number of morphemes in the input sentence, loop 1 registers each morpheme of the input sentence in the dictionary. For loop 1, a buffer is provided so that the special part-of-speech category can be assigned successively to the morphemes that constitute a compound noun.

If the i-th morpheme is a noun (step S3), it is written to the buffer (step S4) and the variable i is set to i + 1. Then, while morpheme i is a noun, loop 2 repeatedly assigns the special part-of-speech category to the nouns that constitute the compound noun.

That is, the morpheme stored in the buffer, being a noun that forms part of a compound noun, is given Nmod as its special grammatical category and registered in the dictionary (step S6). After registration, the buffer is emptied (step S7), the variable i is incremented by 1, and loop 2 is repeated while the i-th morpheme is a noun.

On the other hand, if morpheme i is not a noun (step S3), it is further determined whether the buffer is empty (step S9).

If the buffer is empty, the ordinary grammatical category corresponding to the i-th morpheme is obtained by referring to a predetermined grammatical category conversion table (see Fig. 6) and is registered in the dictionary together with the i-th morpheme (step S10). The variable i is then incremented by 1 (step S11), and loop 1 is repeated.

If the buffer is not empty (step S9), the morpheme stored in the buffer is registered in the dictionary with the normal grammar category N (step S12), and the buffer is emptied (step S13). As the worked example that follows shows, the i-th morpheme itself is then registered with its normal grammar category (step S10). The variable i is then incremented by 1 (step S11), and the processing of loop 1 is repeated.

Next, the preprocessing for syntactic/semantic analysis according to this embodiment is described concretely, using example sentence (1) above.

When the input sentence is submitted to morphological analysis, character strings are obtained together with part-of-speech information as the output and are held as the morphological analysis result. Here, the morphemes that make up the input sentence, "横浜" (Yokohama), "で" (de), "青少年" (youth), "総合" (general), "体育" (athletics), "大会" (meet), "が" (ga), "行う" (hold), "れる" (reru), and "た。" (ta), are each stored together with their part-of-speech information and their position from the beginning of the sentence, as shown in FIG. 5. In the figure, "surface form" is the character string obtained by segmenting the original text morpheme by morpheme, and "headword" is the base form when the morpheme is a conjugating word.

Next, the part-of-speech information of the i-th morpheme is consulted; if the morpheme is not a noun, it is output together with a blank character. When i = 1, the i-th morpheme "横浜" is a noun (step S3), so "横浜" is stored in the buffer together with its part-of-speech information (step S4). The variable i is then set to i+1 (step S5).

The next morpheme i is "で", whose part of speech is "case particle" (step S3), so loop 2 is not entered and it is determined whether the buffer is empty (step S9). Here the buffer is not empty, so the character string "横浜" in the buffer is registered in the dictionary for syntactic and semantic analysis with the grammar category "N" (step S12) and output together with a blank character, and the buffer is emptied (step S13). Since the part of speech of morpheme i, "で", is "case particle" (step S9), the grammar category conversion table is consulted and "で" is registered in the dictionary for syntactic and semantic analysis with the grammar category "PPobl" (step S10). The variable i is then set to i+1 (step S11).

The next morpheme i is "青少年", whose part of speech is "noun" (step S3), so "青少年" is stored in the buffer together with its part-of-speech information (step S4). The variable i is then set to i+1 (step S5).

The next morpheme i is "総合", whose part of speech is "noun" (step S3). Because the buffer is not empty at this point, the morpheme held in the buffer, "青少年", is assigned the special grammar category "Nmod", which indicates that it is a noun constituting a compound noun, and is registered in the dictionary (step S6). The buffer is then emptied and the current i-th morpheme, "総合", is stored in it (step S7). The variable i is then set to i+1 (step S8).

The next morpheme i is "体育", whose part of speech is "noun" (step S3), so the procedure in loop 2 is repeated. That is, "総合", which is currently stored in the buffer, is assigned the special grammar category "Nmod", which indicates that it is a noun constituting a compound noun, and is registered in the dictionary (step S6). The buffer is then emptied and the current i-th morpheme, "体育", is stored in it (step S7). The variable i is then set to i+1 (step S8).

As long as the part of speech of the subsequent i-th morpheme is noun, the processing in loop 2 is repeated. Here the next morpheme i is "大会", whose part of speech is "noun" (step S3), so the same processing is performed again.

The next morpheme i after that is "が", whose part of speech is "case particle" (step S3), so the repetition of loop 2 is terminated. With reference to the grammar category conversion table, the morpheme "大会" currently stored in the buffer is registered in the dictionary for syntactic and semantic analysis with the grammar category N (step S12). Then, again with reference to the grammar category conversion table, the i-th morpheme "が", whose part of speech is "case particle", is registered in the dictionary for syntactic and semantic analysis with the grammar category "PPobl" (step S10).

The variable i is then set to i+1 (step S11), and processing continues. Thereafter the non-noun morphemes "行う", "れる", and "た" appear in turn (step S3). Because the buffer is empty (step S9), the grammar category conversion table is consulted, "行う" is converted to "V", "れる" to "AUX", and "た" to "AUX", and each is registered in the dictionary (step S10).

In this way, the dictionary creation procedure for syntactic/semantic analysis according to this embodiment can create, for the input of example sentence (1), a dictionary for syntactic and semantic analysis such as that shown in FIG. 7.
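The preprocessing procedure and the walk-through of example sentence (1) can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `build_dictionary`, the English part-of-speech labels, and the contents of `CATEGORY_TABLE` (a stand-in for the conversion table of FIG. 6, which is not reproduced here) are assumptions, as is the treatment of "れる" and "た" as auxiliary verbs and the final flush of a noun left in the buffer at the end of the sentence.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the grammar category conversion table (FIG. 6).
# Only the mappings mentioned in the text are included.
CATEGORY_TABLE = {
    "case particle": "PPobl",
    "verb": "V",
    "auxiliary verb": "AUX",
}

# Morphemes of example sentence (1) as (headword, part of speech).
# The part-of-speech labels for "れる" and "た" are assumptions.
EXAMPLE_1: List[Tuple[str, str]] = [
    ("横浜", "noun"), ("で", "case particle"),
    ("青少年", "noun"), ("総合", "noun"), ("体育", "noun"), ("大会", "noun"),
    ("が", "case particle"),
    ("行う", "verb"), ("れる", "auxiliary verb"), ("た", "auxiliary verb"),
]


def build_dictionary(morphemes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (headword, grammar category) entries in sentence order."""
    entries: List[Tuple[str, str]] = []
    buffer: Optional[str] = None  # holds at most one pending noun

    for headword, pos in morphemes:
        if pos == "noun":
            if buffer is not None:
                # A noun followed by another noun is a compound element (S6).
                entries.append((buffer, "Nmod"))
            buffer = headword  # S4 / S7
        else:
            if buffer is not None:
                # The last noun of a run gets the normal category N (S12, S13).
                entries.append((buffer, "N"))
                buffer = None
            # Non-noun morphemes go through the conversion table (S10).
            entries.append((headword, CATEGORY_TABLE[pos]))

    if buffer is not None:
        # Assumption: a noun still buffered at the end of the sentence is N.
        entries.append((buffer, "N"))
    return entries


if __name__ == "__main__":
    for headword, category in build_dictionary(EXAMPLE_1):
        print(headword, category)
```

Only one noun is ever pending in the buffer, because the category of a noun (Nmod or N) is decided as soon as the part of speech of the following morpheme is known.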

An example of the grammar rules applied in the syntactic analysis according to this embodiment is shown below.

S → NP* VP … (a)
NP → N {PPsubj | PPobl} … (b)
VP → V AUX* … (c)
N → Nmod* N … (d)

Of the above rules, (a) to (c) are existing parsing rules. Rule (a) states that zero or more adverbial modifier constituents (NP) followed by a predicate (VP) form a sentence (S). Rule (b) states that the combination of a noun and a case particle forms an adverbial modifier constituent. Rule (c) states that a verb followed by zero or more particles or auxiliary verbs forms a predicate.

Rule (d) is the parsing rule added in this embodiment. It states that zero or more compound-element nouns (Nmod) followed by a noun (N) form a noun (N).
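As an illustration of how rules (a) to (d) combine, the following Python sketch recognizes a sequence of grammar categories and builds a simple bracketed structure. It is a hand-written recognizer for these four rules only, not the LFG parser of the embodiment; the function names and the tuple-based tree representation are assumptions made for this example.

```python
from typing import List, Optional, Tuple

PARTICLES = {"PPsubj", "PPobl"}


def parse_n(cats: List[str], i: int) -> Optional[Tuple[tuple, int]]:
    """Rule (d): N -> Nmod* N. Returns (tree, next index) or None."""
    j = i
    while j < len(cats) and cats[j] == "Nmod":
        j += 1
    if j < len(cats) and cats[j] == "N":
        return ("N", cats[i:j + 1]), j + 1
    return None


def parse_np(cats: List[str], i: int) -> Optional[Tuple[tuple, int]]:
    """Rule (b): NP -> N {PPsubj | PPobl}."""
    n = parse_n(cats, i)
    if n is None:
        return None
    n_tree, j = n
    if j < len(cats) and cats[j] in PARTICLES:
        return ("NP", [n_tree, cats[j]]), j + 1
    return None


def parse_vp(cats: List[str], i: int) -> Optional[Tuple[tuple, int]]:
    """Rule (c): VP -> V AUX*."""
    if i >= len(cats) or cats[i] != "V":
        return None
    j = i + 1
    while j < len(cats) and cats[j] == "AUX":
        j += 1
    return ("VP", cats[i:j]), j


def parse_s(cats: List[str]) -> Optional[tuple]:
    """Rule (a): S -> NP* VP. The whole sequence must be consumed."""
    i = 0
    children = []
    while True:
        result = parse_np(cats, i)
        if result is None:
            break
        children.append(result[0])
        i = result[1]
    vp = parse_vp(cats, i)
    if vp is None or vp[1] != len(cats):
        return None
    children.append(vp[0])
    return ("S", children)


if __name__ == "__main__":
    # Category sequence described in the text for example sentence (1).
    categories = ["N", "PPobl", "Nmod", "Nmod", "Nmod", "N", "PPobl",
                  "V", "AUX", "AUX"]
    print(parse_s(categories))
```

A greedy left-to-right strategy is sufficient here because NP can only begin with Nmod or N and VP can only begin with V, so the four rules never compete for the same position.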

FIG. 8 shows the result of creating a parsing dictionary such as that of FIG. 7 from example sentence (1) according to the procedure of this embodiment and then parsing with grammar rules (a) to (d) above. In this case, the elements of the compound noun no longer match syntactic/semantic analysis rules that contain the noun N, such as those shown below. That is, because the nouns that are elements of the compound noun have been given the special part-of-speech category Nmod, grammar rule (d) above is applied to them instead. Since the candidates for the syntactic/semantic analysis result are thereby restricted, the computational cost is reduced and the analysis speed is improved.

S → N
NP → N
ADV → N

When only the conventional parsing rules are applied, a noun (N) can become any of S, NP, or ADV, so the amount of computation grows with the number of candidate analyses. In this embodiment, by contrast, the nouns constituting a compound noun are assigned the special grammar category Nmod, so their parse candidates are limited to the noun N; that is, the number of candidates is restricted. Because the elements of the compound noun no longer match syntactic/semantic analysis rules containing the noun N such as those above, the computational cost can be reduced significantly and the analysis speed improved.
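The effect can be made concrete with a rough count. The Python sketch below simply counts, for each morpheme of a four-noun compound, how many of the unary rules S → N, NP → N, and ADV → N could apply to it. This is an illustrative proxy chosen for this example, not a measurement reported in the specification, and the names `UNARY_RULES` and `unary_candidates` are hypothetical.

```python
from typing import Dict, List

# The three rules from the text that rewrite a bare noun N.
# No such rule exists for the special category Nmod.
UNARY_RULES: Dict[str, List[str]] = {"N": ["S", "NP", "ADV"]}


def unary_candidates(categories: List[str]) -> int:
    """Count the unary rule applications a parser would have to consider."""
    return sum(len(UNARY_RULES.get(cat, [])) for cat in categories)


# "青少年 総合 体育 大会" with every noun tagged N (conventional tagging).
conventional = ["N", "N", "N", "N"]
# The same compound after the preprocessing of this embodiment.
with_nmod = ["Nmod", "Nmod", "Nmod", "N"]

if __name__ == "__main__":
    print(unary_candidates(conventional))  # 4 nouns x 3 rules = 12
    print(unary_candidates(with_nmod))     # only the final N qualifies = 3
```

Under this simple count, the compound-internal nouns contribute no candidates at all once they carry Nmod, and only the final noun remains eligible for the rules that rewrite N.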

[Supplement]
The present invention has been described in detail above with reference to a specific embodiment. It is self-evident, however, that those skilled in the art can modify or substitute the embodiment without departing from the gist of the present invention.

Although this embodiment has been described on the basis of LFG grammar theory, the present invention can of course be applied equally to analysis systems that use other grammar rules.

In short, the present invention has been disclosed by way of example, and the contents of this specification should not be construed restrictively. To determine the gist of the present invention, the claims set out at the beginning should be taken into consideration.

FIG. 1 is a diagram schematically showing the configuration of a natural language processing system 1 based on LFG.
FIG. 2 is a diagram showing the c-structure obtained as a result of processing the input sentence "私の娘は英語を話します。" ("My daughter speaks English.") by the syntactic/semantic analysis unit 1.
FIG. 3 is a diagram showing the f-structure obtained as a result of processing the input sentence "私の娘は英語を話します。" ("My daughter speaks English.") by the syntactic/semantic analysis unit 1.
FIG. 4 is a flowchart showing the processing procedure for generating a dictionary for syntactic/semantic analysis (LFG) from the result of morphological analysis, as preprocessing for syntactic/semantic analysis.
FIG. 5 is a diagram showing the morphological analysis result for example sentence (1).
FIG. 6 is a diagram showing a configuration example of the grammar category conversion table.
FIG. 7 is a diagram showing the dictionary for syntactic and semantic analysis created for the input of example sentence (1).
FIG. 8 is a diagram showing the result of parsing with the parsing dictionary shown in FIG. 7.
FIG. 9 is a diagram showing an example of a syntactic/semantic analysis result for example sentence (1).
FIG. 10 is a diagram showing the syntactic/semantic analysis result for example sentence (2).
FIG. 11 is a diagram showing the syntactic/semantic analysis result for example sentence (3).

Explanation of symbols

1 ... Natural language processing system
2 ... Morphological analysis unit
2A ... Morpheme rules, 2B ... Morpheme dictionary
3 ... Syntactic/semantic analysis unit
3A ... Grammar rules, 3B ... Valency dictionary

Claims

A natural language processing system for analyzing a natural language sentence in which a compound word formed by a sequence of a plurality of words of a specific part of speech appears, the system comprising:
means for acquiring, for the input natural language sentence, a morphological analysis result including a part-of-speech recognition result for each morpheme;
means for extracting, based on the morphological analysis result, a portion of the input natural language sentence in which morphemes of the same part of speech as the specific part of speech are connected in succession;
means for creating a dictionary for syntactic and semantic analysis by giving the extracted consecutive morphemes of the specific part of speech a special grammar category indicating that they constitute a compound word, giving the other morphemes a normal grammar category, listing each morpheme contained in the input natural language sentence as a headword, and describing a grammar category for each headword; and
syntax analysis means for parsing the input natural language sentence by applying, to the dictionary for syntactic and semantic analysis, grammar rules describing the relationship between combinations of grammar categories of morphemes and the phrase structure of a sentence.

The natural language processing system according to claim 1, wherein the specific part of speech is a noun or a verb.

The natural language processing system according to claim 1, wherein the grammar rules that the syntax analysis means applies to the dictionary for syntactic and semantic analysis include, in addition to grammar rules relating to the specific part of speech, a grammar rule stating that a morpheme having the special grammar category becomes the specific part of speech.

A natural language processing method, performed in a natural language processing system constructed using a computer, for analyzing a natural language sentence in which a compound word formed by a sequence of a plurality of words of a specific part of speech appears, the method comprising:
a step in which morphological analysis means provided in the computer acquires, for the input natural language sentence, a morphological analysis result including a part-of-speech recognition result for each morpheme;
a step in which extraction means provided in the computer extracts, based on the morphological analysis result, a portion of the input natural language sentence in which morphemes of the same part of speech as the specific part of speech are connected in succession;
a step in which dictionary creation means provided in the computer creates a dictionary for syntactic and semantic analysis by giving the extracted consecutive morphemes of the specific part of speech a special grammar category indicating that they constitute a compound word, giving the other morphemes a normal grammar category, listing each morpheme contained in the input natural language sentence as a headword, and describing a grammar category for each headword; and
a parsing step in which syntax analysis means provided in the computer parses the input natural language sentence by applying, to the dictionary for syntactic and semantic analysis, grammar rules describing the relationship between combinations of grammar categories of consecutive morphemes and the phrase structure of a sentence.

The natural language processing method according to claim 4, wherein the specific part of speech is a noun or a verb.

The natural language processing method according to claim 4, wherein the grammar rules applied to the dictionary for syntactic and semantic analysis in the parsing step include, in addition to grammar rules relating to the specific part of speech, a grammar rule stating that a morpheme having the special grammar category becomes the specific part of speech.

A computer program written in a computer-readable format so as to execute, on a computer, processing for analyzing a natural language sentence in which a compound word formed by a sequence of a plurality of words of a specific part of speech appears, the computer program causing the computer to function as:
means for acquiring, for the input natural language sentence, a morphological analysis result including a part-of-speech recognition result for each morpheme;
means for extracting, based on the morphological analysis result, a portion of the input natural language sentence in which morphemes of the same part of speech as the specific part of speech are connected in succession;
means for creating a dictionary for syntactic and semantic analysis by giving the extracted consecutive morphemes of the specific part of speech a special grammar category indicating that they constitute a compound word, giving the other morphemes a normal grammar category, listing each morpheme contained in the input natural language sentence as a headword, and describing a grammar category for each headword; and
syntax analysis means for parsing the input natural language sentence by applying, to the dictionary for syntactic and semantic analysis, grammar rules describing the relationship between combinations of grammar categories of morphemes and the phrase structure of a sentence.