JP2010134730A

JP2010134730A - Language processor, language processing system and program

Info

Publication number: JP2010134730A
Application number: JP2008310498A
Authority: JP
Inventors: Tomoko Okuma; 智子大熊; Hiroshi Umeki; 宏梅基; Hiroshi Masuichi; 博増市
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2008-12-05
Filing date: 2008-12-05
Publication date: 2010-06-17
Anticipated expiration: 2028-12-05
Also published as: JP5343539B2

Abstract

PROBLEM TO BE SOLVED: To provide a language processor that acquires, on the basis of sentence group data, base information serving as a criterion for determining whether a word used in a sentence in combination with a word representing a number functions as a numerical classifier, as a noun, or as both.

SOLUTION: A word acquisition means (21) acquires a word used in a sentence in combination with a word representing a number. A statistical information acquisition means (22) acquires statistical information on how the word is used, on the basis of sentence group data. A base information generation means (23) generates, on the basis of the statistical information, base information serving as a criterion for determining whether the word functions as a numerical classifier, as a noun, or as both.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a language processing device, a language processing system, and a program.

A word used in a sentence in combination with a word representing a number may function as a classifier, as a noun, or as both. For example, in the phrase 「４０人のメンバー」 ("40 members"), "40" indicates the number of members, and 「人」 ("person") functions as a classifier that helps the numeral "40" modify the noun 「メンバー」 ("members"). In the phrase 「４０人の清掃活動」 ("a cleanup activity by 40 people"), on the other hand, "40" does not indicate the number of cleanup activities; it modifies 「人」. Here 「人」 is itself the object directly modified by the numeral "40" and functions as a noun. Thus 「人」 can function as both a classifier and a noun. By contrast, the word 「個」 (a general counter for objects), when used in a sentence in combination with a word representing a number, functions as a classifier and not as a noun. Conversely, the word 「自衛隊」 ("Self-Defense Forces") is sometimes used in combination with a word representing a number, as in 「３自衛隊の合同訓練」 ("joint training of the three Self-Defense Forces"), but it functions as a noun and not as a classifier.

When language processing such as syntactic parsing is to take into account whether a word used in a sentence in combination with a word representing a number functions as a classifier, as a noun, or as both, basic information is needed as a basis for that determination.

In this connection, Patent Document 1 discloses a technique in which a correspondence table describing the semantic roles of classifiers is stored in advance and sentences are analyzed on the basis of that table. Patent Document 2 discloses a technique in which a classifier selected by a user to match a noun is stored in dictionary data in association with the noun, and subsequent translation processing is performed on the basis of that dictionary data.
Patent Document 1: Japanese Patent No. 4039282
Patent Document 2: Japanese Patent Laid-Open No. 2000-90088

An object of the present invention is to provide a language processing device, a language processing system, and a program that acquire, on the basis of sentence group data, basic information serving as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as a classifier, as a noun, or as both.

The invention according to claim 1 is a language processing device comprising: word acquisition means for acquiring a word used in a sentence in combination with a word representing a number; statistical information acquisition means for acquiring, on the basis of sentence group data, statistical information on how the word is used; and basic information generation means for generating, on the basis of the statistical information, basic information serving as a basis for determining whether the word functions as either one of a classifier and a noun or as both.

The invention according to claim 2 is the language processing device according to claim 1, wherein the statistical information acquisition means includes: means for acquiring the number of times the word is used, in the sentence group data, in a predetermined first mode corresponding to a classifier; and means for acquiring the number of times the word is used, in the sentence group data, in a predetermined second mode corresponding to a noun.

The invention according to claim 3 is the language processing device according to claim 1 or 2, wherein the statistical information acquisition means includes means for acquiring the number of modifiers that modify the word in the sentence group data.

The invention according to claim 4 is the language processing device according to any one of claims 1 to 3, comprising: analysis target character string acquisition means for acquiring an analysis target character string; and analysis means for determining, when the analysis target character string contains the word, whether the word functions as either one of a classifier and a noun or as both on the basis of the basic information of the word, and for analyzing the analysis target character string on the basis of the determination result.

The invention according to claim 5 is the language processing device according to claim 4, wherein the analysis means includes analysis rule determination means for determining, on the basis of the determination result, an analysis rule for the analysis target character string, and analyzes the analysis target character string using the analysis rule determined by the analysis rule determination means.

The invention according to claim 6 is a language processing system comprising: word acquisition means for acquiring a word used in a sentence in combination with a word representing a number; statistical information acquisition means for acquiring, on the basis of sentence group data, statistical information on how the word is used; and basic information generation means for generating, on the basis of the statistical information, basic information serving as a basis for determining whether the word functions as either one of a classifier and a noun or as both.

The invention according to claim 7 is a program for causing a computer to function as: word acquisition means for acquiring a word used in a sentence in combination with a word representing a number; statistical information acquisition means for acquiring, on the basis of sentence group data, statistical information on how the word is used; and basic information generation means for generating, on the basis of the statistical information, basic information serving as a basis for determining whether the word functions as either one of a classifier and a noun or as both.

According to the invention of claim 1, basic information serving as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as either one of a classifier and a noun or as both is acquired on the basis of sentence group data.

According to the invention of claim 2, the basic information of a word used in a sentence in combination with a word representing a number is acquired in consideration of the number of times the word is actually used in the sentence group data in the predetermined first mode corresponding to a classifier and the number of times it is actually used in the sentence group data in the predetermined second mode corresponding to a noun.

According to the invention of claim 3, the basic information of a word used in a sentence in combination with a word representing a number is acquired in consideration of the number of modifiers that modify the word in the sentence group data.

According to the invention of claim 4, an analysis result is obtained that takes into account whether a word used in a sentence in combination with a word representing a number functions as either one of a classifier and a noun or as both.

According to the invention of claim 5, an analysis result is obtained that conforms to an analysis rule taking into account whether a word used in a sentence in combination with a word representing a number functions as either one of a classifier and a noun or as both.

According to the invention of claim 6, basic information serving as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as either one of a classifier and a noun or as both is acquired on the basis of sentence group data.

According to the invention of claim 7, basic information serving as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as either one of a classifier and a noun or as both is acquired on the basis of sentence group data.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a diagram showing a configuration example of a language processing system 1 according to an embodiment of the present invention. The language processing system 1 shown in FIG. 1 includes a language processing device 10. The language processing device 10 is, for example, a personal computer, and includes a control unit 11, a storage unit 12, an operation unit 13, and a display unit 14.

The control unit 11 is, for example, a CPU, and executes various kinds of information processing in accordance with a program stored in the storage unit 12. The storage unit 12 includes, for example, memory elements such as RAM and ROM, and a hard disk. The storage unit 12 holds the program executed by the control unit 11 and various data. In the present embodiment in particular, the storage unit 12 stores dictionary and grammar rule data used for morphological analysis and syntactic analysis. The storage unit 12 also operates as a work memory for the control unit 11.

The operation unit 13 is, for example, a keyboard or a mouse; it receives an instruction operation from a user and outputs the content of the instruction operation to the control unit 11. The display unit 14 is, for example, a liquid crystal display, and displays images in accordance with instructions from the control unit 11.

The functions realized by the language processing system 1 according to the present embodiment are described below. As shown in FIG. 2, the language processing system 1 functionally includes a word acquisition unit 21, a statistical information acquisition unit 22, a basic information generation unit 23, an analysis target character string acquisition unit 31, an analysis unit 32, an analysis rule determination unit 33, an analysis result output unit 34, and a basic information storage unit 35. The basic information storage unit 35 is realized by, for example, the storage unit 12, and the other functional blocks are realized by, for example, the control unit 11 executing a program stored in the storage unit 12. This program may be provided via communication means such as the Internet, or may be provided stored on any of various computer-readable information storage media such as a CD-ROM or DVD-ROM.

First, the word acquisition unit 21, the statistical information acquisition unit 22, and the basic information generation unit 23 will be described. These functional blocks generate the basic information that serves as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as a classifier, as a noun, or as both.

Words used in sentences in combination with words representing numbers are classified into the following three groups:
First group: words that can function as both a classifier and a noun
Second group: words that function as a classifier
Third group: words that function as a noun
In the present embodiment, it is determined to which of the first to third groups a word that can be used in a sentence in combination with a word representing a number belongs, and information indicating the determination result is generated as the basic information.

FIGS. 3 and 4 are flowcharts showing processing executed in the language processing system 1, specifically the processing of the word acquisition unit 21, the statistical information acquisition unit 22, and the basic information generation unit 23. These functional blocks are described below with reference to FIGS. 3 and 4.

The word acquisition unit 21 acquires words that can be used immediately after a word representing a number.

Specifically, as shown in FIG. 3, the word acquisition unit 21 first acquires sentence group data comprising a plurality of sentences and analyzes the sentence group represented by that data (S101). The sentence group data is stored, for example, in the storage unit 12. As a concrete example, the following describes the processing of a sentence group G consisting of the eight sentences a to h below.
a: 「日本人は２人しかいなかった。」 ("There were only two Japanese people.")
b: 「彼はそのうちの１個を買った。」 ("He bought one of them.")
c: 「他の人は知らない。」 ("Other people don't know.")
d: 「昨日、厚木基地の航空自衛隊が災害地に派遣された。」 ("Yesterday, the Air Self-Defense Force at Atsugi Base was dispatched to the disaster area.")
e: 「３自衛隊の合同訓練が行われた。」 ("Joint training of the three Self-Defense Forces was conducted.")
f: 「５個を重ねると丁度良い高さになる。」 ("Stacking five of them gives just the right height.")
g: 「自衛隊に入隊した。」 ("I joined the Self-Defense Forces.")
h: 「クラスで３人が合格した。」 ("Three people in the class passed.")

In S101, morphological analysis and syntactic analysis are performed on the sentence group. Known techniques are used for both.

After the morphological analysis and syntactic analysis, the word acquisition unit 21 extracts, from the morphemes constituting the sentence group, each word (morpheme) that immediately follows a morpheme (word) representing a number, and obtains a list L of those words (S102). For sentence group G, for example, the words extracted as immediately following a word representing a number are 「人」 in sentences a and h, 「個」 in sentences b and f, and 「自衛隊」 in sentence e. The word list L obtained in S102 is therefore {人, 個, 自衛隊}.
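The extraction in S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the sentences of group G have already been segmented by hand into space-separated tokens (a real system would use a morphological analyzer), and it treats any all-digit token as a word representing a number. The names `SENTENCES`, `is_number`, and `words_after_numbers` are hypothetical.

```python
# Hand-segmented version of sentence group G (an assumption made for this
# sketch; the segmentation is not taken from the patent text).
SENTENCES = [
    "日本人 は 2 人 しか いなかった",                    # a
    "彼 は その うち の 1 個 を 買った",                 # b
    "他 の 人 は 知らない",                              # c
    "昨日 厚木基地 の 航空 自衛隊 が 災害地 に 派遣された",  # d
    "3 自衛隊 の 合同訓練 が 行われた",                  # e
    "5 個 を 重ねる と 丁度良い 高さ に なる",           # f
    "自衛隊 に 入隊した",                                # g
    "クラス で 3 人 が 合格した",                        # h
]


def is_number(token: str) -> bool:
    """Treat a token consisting only of digits as a word representing a number."""
    return token.isdigit()


def words_after_numbers(sentences: list[str]) -> list[str]:
    """Return, in order of first appearance, the words that directly follow a number."""
    word_list: list[str] = []
    for sentence in sentences:
        tokens = sentence.split()
        # Walk over adjacent token pairs (previous, current).
        for previous, current in zip(tokens, tokens[1:]):
            if is_number(previous) and not is_number(current) and current not in word_list:
                word_list.append(current)
    return word_list


print(words_after_numbers(SENTENCES))  # ['人', '個', '自衛隊']
```

The result matches the word list L = {人, 個, 自衛隊} given in the text. Keeping first-appearance order makes 「人」 element 0, 「個」 element 1, and 「自衛隊」 element 2, which is the indexing used in the worked example.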

The statistical information acquisition unit 22 acquires statistical information on how each word acquired by the word acquisition unit 21 is used in the sentence group. For example, it acquires the number of times the word is used in the sentence group in a predetermined first mode corresponding to a classifier (see S105). It also acquires, for example, the number of times the word is used in the sentence group in a predetermined second mode corresponding to a noun (see S106).

Specifically, as shown in FIG. 3, the statistical information acquisition unit 22 initializes a variable i to 0 (S103). It then counts the number of times P_i that the word L_i, the i-th element of the word list L, is used in the sentence group (S104); that is, the number of times L_i appears in the sentence group. For example, when i is 0, the count P_0 is taken for 「人」, the 0th (first) element of the word list L. Because 「人」 is used once each in sentences a, c, and h, P_0 = 3.

The statistical information acquisition unit 22 also counts the number of times Q_i that the word L_i is used directly after a word representing a number in the sentence group (S105); that is, the number of times L_i appears immediately after a word representing a number. In sentence group G, for example, 「人」 immediately follows a word representing a number in sentences a and h, so Q_0 = 2.

A word that functions as a classifier is, by virtue of being used as a classifier, placed after a word representing a number more often than a word that does not function as a classifier. A word that functions as a classifier therefore has a larger Q_i than a word that does not, and Q_i serves as a statistic for judging whether a word functions as a classifier.

The statistical information acquisition unit 22 also counts the number of times R_i that the word L_i is used in the sentence group at a position other than immediately after a word representing a number and in such a way that it modifies a predicate (S106). In sentence group G, for example, 「人」 in sentence c is not immediately after a word representing a number and modifies the predicate, so R_0 = 1.

When a word that can be placed after a word representing a number is used at a position other than immediately after such a word, and modifies a predicate, the word is considered to be functioning as a noun. A word that functions as a noun therefore has a larger R_i than a word that does not, and R_i serves as a statistic for judging whether a word functions as a noun.

The statistical information acquisition unit 22 also counts the number S_i of modifiers that modify the word L_i in the sentence group (S107). In sentence group G, for example, there is one modifier of 「人」 in sentence c and no modifier of 「人」 in any other sentence, so S_0 = 1.

A word that is functioning as a noun is often modified by a modifier. A word that functions as a noun therefore has a larger S_i than a word that does not, and S_i serves as a statistic for judging whether a word functions as a noun.
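The four counts P_i, Q_i, R_i, and S_i (S104 to S107) can be sketched as a single pass over annotated occurrences. The sketch below assumes the syntactic analysis has already been done: each occurrence of a target word in sentence group G is recorded by hand with the properties the counts depend on. The `Occurrence` record, its field names, and the annotations themselves are assumptions chosen to agree with the counts stated in the text; they are not output of a real parser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One occurrence of a target word; every field is a hand annotation."""
    word: str
    sentence: str             # label of the sentence in group G (a to h)
    after_number: bool        # directly follows a word representing a number
    modifies_predicate: bool  # the word is used so as to modify a predicate
    modifier_count: int       # number of modifiers attached to the word


# Hypothetical annotations for sentence group G.
OCCURRENCES = [
    Occurrence("人", "a", True, True, 0),
    Occurrence("個", "b", True, True, 0),
    Occurrence("人", "c", False, True, 1),      # one modifier (presumably 「他の」)
    Occurrence("自衛隊", "d", False, True, 1),  # one modifier (presumably 「航空」)
    Occurrence("自衛隊", "e", True, False, 0),
    Occurrence("個", "f", True, True, 0),
    Occurrence("自衛隊", "g", False, True, 0),
    Occurrence("人", "h", True, True, 0),
]


def collect_statistics(occurrences):
    """Return {word: (P, Q, R, S)} following steps S104 to S107."""
    p, q, r, s = Counter(), Counter(), Counter(), Counter()
    for occ in occurrences:
        p[occ.word] += 1                  # S104: every occurrence
        if occ.after_number:
            q[occ.word] += 1              # S105: directly after a number
        elif occ.modifies_predicate:
            r[occ.word] += 1              # S106: elsewhere, modifying a predicate
        s[occ.word] += occ.modifier_count  # S107: modifiers of the word
    return {word: (p[word], q[word], r[word], s[word]) for word in p}


for word, stats in collect_statistics(OCCURRENCES).items():
    print(word, stats)
```

With these annotations the function yields (3, 2, 1, 1) for 「人」, (2, 2, 0, 0) for 「個」, and (3, 1, 2, 1) for 「自衛隊」, the same values the description derives by hand.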

The basic information generation unit 23 generates, on the basis of the statistical information acquired by the statistical information acquisition unit 22, the basic information that serves as a basis for determining whether a word used in a sentence in combination with a word representing a number functions as a classifier, as a noun, or as both.

Specifically, as shown in FIG. 4, the basic information generation unit 23 calculates an evaluation value E_i of the word L_i on the basis of the statistical information (P_i, Q_i, R_i, S_i) of the word L_i (S108). The evaluation value E_i is calculated by the following formula (1).

E_i = (P_i * Q_i) / (1 + R_i + S_i) ... (1)

As described above, a word that functions as a classifier has a larger Q_i than a word that does not function as a classifier, while a word that functions as a noun has larger R_i and S_i than a word that does not function as a noun. Consequently, a word belonging to the second group (words that function as a classifier) has a larger evaluation value E_i than words belonging to the other groups, and a word belonging to the third group (words that function as a noun) has a smaller evaluation value E_i than words belonging to the other groups.

As noted above, the statistics (P_0, Q_0, R_0, S_0) of 「人」 are (3, 2, 1, 1), so the evaluation value E_0 of 「人」 is (3 * 2) / (1 + 1 + 1) = 2.
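Formula (1) is simple enough to check directly. The following minimal sketch evaluates it for the 「人」 statistics; the function name `evaluation_value` is an assumption introduced for illustration.

```python
def evaluation_value(p: int, q: int, r: int, s: int) -> float:
    """Formula (1): E = (P * Q) / (1 + R + S).

    The constant 1 in the denominator keeps the value defined when a word
    has no noun-like uses at all (R = 0 and S = 0).
    """
    return (p * q) / (1 + r + s)


# Statistics of 「人」 in sentence group G: (P, Q, R, S) = (3, 2, 1, 1).
print(evaluation_value(3, 2, 1, 1))  # 2.0
```

The numerator grows with classifier-like use (Q) and the denominator with noun-like use (R and S), which is why a larger value points toward the second group and a smaller value toward the third.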

The basic information generation unit 23 then compares the evaluation value E_i of the word L_i with a reference value J (S109). Because a word belonging to the third group has a smaller evaluation value E_i than words belonging to the other groups, this step judges whether the word L_i should be classified into the third group, that is, classified as a noun, by checking whether E_i is smaller than the reference value J.

The reference value J is determined, for example, as follows. First, a representative word is selected that is known to function as a noun and not as a classifier when used in a sentence in combination with a word representing a number. The processing of S104 to S108 is then executed with the selected word as the target to calculate its evaluation value E. The reference value J is set on the basis of the calculated evaluation value E; for example, a value near the calculated E is set as J. In the following description, the reference value J is assumed to be 1.

If the evaluation value E_i of the word L_i is smaller than the reference value J (S109: Y), the basic information generation unit 23 classifies the word L_i into the third group (S110). That is, the word L_i is classified as a noun.

If, on the other hand, the evaluation value E_i of the word L_i is equal to or greater than the reference value J (S109: N), the basic information generation unit 23 compares E_i with a reference value K (S111). Because a word belonging to the second group has a larger evaluation value E_i than words belonging to the other groups, this step judges whether the word L_i should be classified into the second group, that is, classified as a classifier, by checking whether E_i is larger than the reference value K.

The reference value K is determined, for example, as follows. First, a representative word is selected that is known to function as a classifier and not as a noun when used in a sentence in combination with a word representing a number. The processing of S104 to S108 is then executed with the selected word as the target to calculate its evaluation value E. The reference value K is set on the basis of the calculated evaluation value E; for example, a value near the calculated E is set as K. In the following description, the reference value K is assumed to be 3.

If the evaluation value E_i of the word L_i is larger than the reference value K (S111: Y), the basic information generation unit 23 classifies the word L_i into the second group (S112). That is, the word L_i is classified as a classifier.

If, on the other hand, the evaluation value E_i of the word L_i is equal to or smaller than the reference value K (S111: N), the basic information generation unit 23 classifies the word L_i into the first group (S113). That is, in this case the word L_i is classified as both a classifier and a noun.

As noted above, the evaluation value E_0 of 「人」 is 2, which is larger than the reference value J (1) and not larger than the reference value K (3), so 「人」 is classified into the first group (see S113). That is, 「人」 is classified as both a classifier and a noun.
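The two threshold comparisons in S109 and S111 can be sketched as a small function. The enum `Group` and the function `classify` are names assumed for this illustration; the default thresholds J = 1 and K = 3 are the example values used in the description, not fixed constants of the method.

```python
from enum import Enum


class Group(Enum):
    BOTH = 1        # first group: classifier and noun
    CLASSIFIER = 2  # second group: classifier only
    NOUN = 3        # third group: noun only


def classify(e: float, j: float = 1.0, k: float = 3.0) -> Group:
    """Assign a group from evaluation value e using reference values j and k."""
    if e < j:               # S109: Y -> S110
        return Group.NOUN
    if e > k:               # S111: Y -> S112
        return Group.CLASSIFIER
    return Group.BOTH       # S111: N -> S113 (j <= e <= k)


print(classify(2.0))  # Group.BOTH, the result for 「人」
```

Note the asymmetry at the boundaries: a value exactly equal to J or exactly equal to K falls into the first group, because S109 tests "smaller than J" and S111 tests "larger than K".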

After the classification of the word L_i has been determined, 1 is added to the variable i (S114), and it is determined whether i is less than the number of elements M of the word list L (S115). If i is less than M, the processing of S104 to S115 is executed again.

For example, when the variable i is 1, the processing of S104 to S113 is executed for the word 「個」 in the word list L. In sentence group G, 「個」 is used once each in sentences b and f, so P_1 = 2 (S104). In both sentences 「個」 follows a word representing a number, so Q_1 = 2 (S105). For the same reason, there is no use of 「個」 at any other position, so R_1 = 0 (S106). No modifier modifies 「個」 in sentences b and f, so S_1 = 0 (S107). As a result, the evaluation value E_1 of 「個」 is 4 (S108). Because E_1 is larger than the reference value J (1) and also larger than the reference value K (3), 「個」 is classified into the second group (S112). That is, 「個」 is classified as a classifier.

Similarly, when the value of the variable i is 2, the processing of S104 to S113 is executed for the word "自衛隊" ("Self-Defense Forces") in the word list L. In the sentence group G, "自衛隊" is used once each in sentences d, e, and g, so 3 is acquired as the value of P_2 (S104). In sentence e, "自衛隊" follows a word representing a number, so 1 is acquired as the value of Q_2 (S105). In sentences d and g, "自衛隊" does not follow a word representing a number and modifies a predicate, so 2 is acquired as the value of R_2 (S106). In sentence d, one modifier modifies "自衛隊", so 1 is acquired as the value of S_2 (S107). As a result, 0.75 is acquired as the evaluation value E_2 of "自衛隊" (S108). Since E_2 is no greater than the reference value J (1), "自衛隊" is classified into the third group (S110). That is, "自衛隊" is classified as a noun.

In the processing of S115, the value of the variable i is determined to be equal to or greater than the number of elements M of the word list L when the processing of S104 to S113 has been completed for every word in the word list L, that is, when every word in the list has been classified.

The basic information generation unit 23 stores the classification result obtained in this way for each word in the basic information storage unit 35.
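The threshold test of S109 to S113 and the loop of S114 and S115 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Group`, `classify`, and `build_basic_info` are hypothetical, the evaluation values are simply the three worked values quoted above (2, 4, and 0.75) rather than results computed from formula (1), and J = 1 and K = 3 are the reference values used in the worked example.

```python
from enum import Enum


class Group(Enum):
    BOTH = 1        # first group: classifier and noun
    CLASSIFIER = 2  # second group: classifier only
    NOUN = 3        # third group: noun only


def classify(e, j=1.0, k=3.0):
    """Map an evaluation value E to a group using reference values J and K."""
    if e <= j:
        return Group.NOUN
    if e <= k:
        return Group.BOTH
    return Group.CLASSIFIER


def build_basic_info(word_list, evaluation_values):
    """Classify every word in the list, mirroring the loop over i = 0 .. M-1."""
    basic_info = {}
    i = 0
    m = len(word_list)
    while i < m:
        word = word_list[i]
        basic_info[word] = classify(evaluation_values[word])
        i += 1
    return basic_info


# Worked values from the description; hypothetical container names.
word_list = ["人", "個", "自衛隊"]
evaluation_values = {"人": 2.0, "個": 4.0, "自衛隊": 0.75}
basic_info = build_basic_info(word_list, evaluation_values)
for word, group in basic_info.items():
    print(word, group.name)
```

Note that the boundaries are inclusive on the lower group in this sketch: an evaluation value exactly equal to J falls in the third group and a value exactly equal to K falls in the first group, which matches the "no greater than" wording of the worked example.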

Next, the analysis target character string acquisition unit 31, the analysis unit 32, the analysis rule determination unit 33, the analysis result output unit 34, and the basic information storage unit 35 are described. These functional blocks analyze, among other things, the structure of the analysis target character string.

The analysis target character string acquisition unit 31 acquires the character string to be analyzed. The analysis target character string is a sentence written in a natural language; it may be input by the user through the operation unit 13, or it may be stored in advance in the storage unit 12 or the like. Here, the analysis target character string is assumed to be a Japanese sentence.

As a concrete example, the following describes the analysis of three analysis target character strings X, Y, and Z:
X: 「３人の学生」 ("three students")
Y: 「３個のりんご」 ("three apples")
Z: 「３自衛隊の訓練」 ("training of the three Self-Defense Forces")

The analysis unit 32 analyzes the analysis target character string. First, the analysis unit 32 performs morphological analysis of the string. For the analysis target character string X, for example, the morphemes "３", "人", "の", and "学生" are obtained, and their parts of speech are analyzed as numeral, classifier, adnominal particle, and noun, respectively.

Next, the analysis rule determination unit 33 determines whether the morpheme immediately following a morpheme representing a number in the analysis target character string functions as only one of a classifier and a noun, or as both. As described above, the basic information storage unit 35 stores information indicating which of the first to third groups each word that can follow a word representing a number belongs to. The analysis rule determination unit 33 therefore refers to the contents of the basic information storage unit 35.

For the analysis target character string X, for example, it is determined which of the first to third groups "人", the morpheme immediately after "３", belongs to. As described above, "人" has been classified into the first group, so it is determined to belong to the first group. That is, "人" is determined to function as both a classifier and a noun.

For the analysis target character string Y, it is determined which of the first to third groups "個", the morpheme immediately after "３", belongs to. As described above, "個" has been classified into the second group, so it is determined to belong to the second group. That is, "個" is determined to function as a classifier and not as a noun.

For the analysis target character string Z, it is determined which of the first to third groups "自衛隊", the morpheme immediately after "３", belongs to. As described above, "自衛隊" has been classified into the third group, so it is determined to belong to the third group. That is, "自衛隊" is determined to function as a noun and not as a classifier.
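The lookup described for X, Y, and Z can be sketched as a scan for a number morpheme followed by a word that has an entry in the stored basic information. The dictionary `BASIC_INFO`, the function `group_after_number`, and the hand-segmented morpheme lists are assumptions made for illustration; the description does not specify how the storage unit is organized or which morphological analyzer is used.

```python
# Hypothetical contents of the basic information storage unit (word -> group number).
BASIC_INFO = {"人": 1, "個": 2, "自衛隊": 3}


def group_after_number(morphemes, basic_info=BASIC_INFO):
    """Return (word, group) for the first stored word that directly follows a number."""
    for previous, current in zip(morphemes, morphemes[1:]):
        # str.isdecimal() is True for full-width digits such as "３".
        if previous.isdecimal() and current in basic_info:
            return current, basic_info[current]
    return None


# Morpheme sequences segmented by hand for the three example strings.
print(group_after_number(["３", "人", "の", "学生"]))
print(group_after_number(["３", "個", "の", "りんご"]))
print(group_after_number(["３", "自衛隊", "の", "訓練"]))
```

Returning `None` when no stored word follows a number leaves the caller free to fall back to its ordinary analysis rules.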

Then, based on the result of determining which of the first to third groups the morpheme immediately after the number morpheme belongs to, the analysis rule determination unit 33 decides on one or more analysis rules to be used for syntactic analysis.

For the analysis target character string X, for example, "人" is classified into the first group (words that function as both a classifier and a noun), so two analysis rules are selected: one that assumes "人" is a classifier and one that assumes "人" is a noun.

FIG. 5 shows an example of an analysis rule that assumes "人" is a classifier. FIG. 5(a) shows the grammar rules. Because the analysis target character strings X to Z are all noun phrases, FIG. 5(a) shows the grammar rules for a noun phrase (Nadj), that is, the word-order rules that the morphemes in a noun phrase must satisfy. The symbol "|" means "OR": exactly one of the elements enclosed in curly braces and separated by "|" is chosen. "PP", "N", "NUMBER", and "CL" denote the lexical types to which morphemes belong. FIG. 5(b) shows a dictionary that defines the correspondence between morphemes and lexical types, indicating which lexical type each morpheme belongs to. Because the analysis rule in FIG. 5 assumes "人" is a classifier, "人" belongs to the lexical type "CL", which corresponds to classifiers, in FIG. 5(b).

FIG. 6 shows an example of an analysis rule that assumes "人" is a noun. FIG. 6(a) shows the grammar rules for a noun phrase (Nadj), that is, the word-order rules that the morphemes in a noun phrase must satisfy. FIG. 6(b) shows a dictionary that defines the correspondence between morphemes and lexical types, indicating which lexical type each morpheme belongs to. Because the analysis rule in FIG. 6 assumes "人" is a noun, "人" belongs to the lexical type "N", which corresponds to nouns, in FIG. 6(b). In this respect FIG. 6(b) differs from FIG. 5(b).

For the analysis target character string Y, "個" is classified into the second group (words that function as a classifier), so a single analysis rule that assumes "個" is a classifier is selected. This analysis rule specifies that "個" belongs to the lexical type "CL", which corresponds to classifiers.

For the analysis target character string Z, "自衛隊" is classified into the third group (words that function as a noun), so a single analysis rule that assumes "自衛隊" is a noun is selected. This analysis rule specifies that "自衛隊" belongs to the lexical type "N", which corresponds to nouns.
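The mapping from group to selected analysis rules can be summarized in a few lines. In this sketch each "rule" is reduced to the single dictionary entry that distinguishes it (the lexical type assigned to the word after the number); the function name `select_rules` and this reduced representation are assumptions, and the grammar rules of FIG. 5(a) and FIG. 6(a) are not modeled.

```python
def select_rules(word, group):
    """Return one lexical-type assumption per analysis rule to be tried."""
    if group == 1:
        # First group: try both the classifier reading and the noun reading.
        return [{word: "CL"}, {word: "N"}]
    if group == 2:
        # Second group: classifier reading only.
        return [{word: "CL"}]
    if group == 3:
        # Third group: noun reading only.
        return [{word: "N"}]
    raise ValueError(f"unknown group: {group}")


print(select_rules("人", 1))
print(select_rules("個", 2))
print(select_rules("自衛隊", 3))
```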

The analysis unit 32 performs syntactic analysis in accordance with the analysis rules decided by the analysis rule determination unit 33. When the analysis rule determination unit 33 selects a single analysis rule, syntactic analysis is performed under that rule and one analysis result is obtained. When it selects a plurality of analysis rules, syntactic analysis is performed under each rule and a plurality of analysis results are obtained.

For the analysis target character string X, for example, syntactic analysis is performed under the analysis rule in which "人" belongs to the lexical type "CL" (see FIG. 5); FIG. 7 shows this analysis result. Separately, syntactic analysis is performed under the analysis rule in which "人" belongs to the lexical type "N" (see FIG. 6); FIG. 8 shows this analysis result. When an analysis result such as the one in FIG. 8 is obtained, the string "３人の学生" is interpreted in a sense such as "the students shared by three teachers". Two analysis results are thus obtained for the analysis target character string X.

For the analysis target character string Y, syntactic analysis is performed only under the analysis rule in which "個" belongs to the lexical type "CL", which corresponds to classifiers. FIG. 9 shows this analysis result. Only one analysis result is thus obtained for the analysis target character string Y.

For the analysis target character string Z, syntactic analysis is performed only under the analysis rule in which "自衛隊" belongs to the lexical type "N", which corresponds to nouns. FIG. 10 shows this analysis result. Only one analysis result is thus obtained for the analysis target character string Z.
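The relationship between the number of selected rules and the number of analysis results can be illustrated with a toy stand-in for the parser. This sketch only assigns a lexical type to each morpheme under each rule; it does not build the structures shown in FIGS. 7 to 10. The entries in `BASE_LEXICON` (in particular the assignment of "の" to "PP") and the function `analyze` are assumptions, since the dictionaries of FIG. 5(b) and FIG. 6(b) are not reproduced in the text.

```python
# Assumed lexical types for the morphemes other than the word after the number.
BASE_LEXICON = {"３": "NUMBER", "の": "PP", "学生": "N", "りんご": "N", "訓練": "N"}


def analyze(morphemes, rules):
    """Produce one tagged analysis per selected rule (a stand-in for parsing)."""
    results = []
    for rule in rules:
        # Each rule contributes the lexical type of the word after the number.
        lexicon = {**BASE_LEXICON, **rule}
        results.append([(m, lexicon[m]) for m in morphemes])
    return results


x = analyze(["３", "人", "の", "学生"], [{"人": "CL"}, {"人": "N"}])
y = analyze(["３", "個", "の", "りんご"], [{"個": "CL"}])
z = analyze(["３", "自衛隊", "の", "訓練"], [{"自衛隊": "N"}])
print(len(x), len(y), len(z))  # 2 1 1
```

String X yields two analyses because two rules were selected for "人", while Y and Z yield one each.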

The analysis result output unit 34 outputs the analysis results of the analysis unit 32, for example by displaying them on the display unit 14. When a plurality of analysis results are obtained, the analysis result output unit 34 outputs all of them.

As described above, a plurality of analysis results are obtained when the morpheme immediately after the number morpheme in the analysis target character string is classified into the first group (words that can function as both a classifier and a noun). In such a case, a known disambiguation process such as machine learning may be executed to narrow the analysis results down to the one judged most plausible. Such disambiguation can be realized, for example, by the method described in Hiroki Yoshimura and three others, "Selection of f-structure based on Support Vector Machine" (Natural Language Processing Study Group Report, Information Processing Society of Japan, November 2003, Vol. 2003, No. 108, pp. 75-80).

According to the embodiment described above, basic information that serves as the basis for determining whether a word used in a sentence in combination with a word representing a number functions as only one of a classifier and a noun, or as both, is acquired on the basis of sentence group data.

The present invention is not limited to the embodiment described above.

For example, each of the processes described above is merely an example and may be realized by a method different from the one described.

The statistical information acquisition unit 22 may also acquire statistical information other than the statistical information (S_i, P_i, Q_i, R_i) described above. Likewise, the evaluation value E_i need not be calculated by formula (1) above; it may be calculated by another formula or from other statistical information.

The basic information generation unit 23 may also store the evaluation value E_i of the word L_i in the basic information storage unit 35 as the basic information. In this case, when determining which of the first to third groups a morpheme belongs to, the analysis rule determination unit 33 uses the evaluation value E of that morpheme (word) stored in the basic information storage unit 35. That is, the analysis rule determination unit 33 makes the determination by executing the same processing as S109 to S113. In this case, the reference values J and K may also be stored in the basic information storage unit 35 as part of the basic information.
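One consequence of this variation is that the group is decided at analysis time, so the reference values can be changed without recomputing anything from the sentence group data. The sketch below illustrates that point; the class `BasicInfoStore`, its field names, and the alternative value K = 1.5 are hypothetical, and the evaluation values are the worked values from the description.

```python
from dataclasses import dataclass, field


@dataclass
class BasicInfoStore:
    """Hypothetical storage holding evaluation values and reference values J, K."""
    j: float = 1.0
    k: float = 3.0
    evaluation_values: dict = field(default_factory=dict)

    def group_of(self, word):
        """Apply the same threshold test as S109 to S113 at lookup time."""
        e = self.evaluation_values[word]
        if e <= self.j:
            return 3  # noun only
        if e <= self.k:
            return 1  # both classifier and noun
        return 2      # classifier only


store = BasicInfoStore(evaluation_values={"人": 2.0, "個": 4.0, "自衛隊": 0.75})
print(store.group_of("人"))  # 1
store.k = 1.5  # a stricter reference value K, chosen only for illustration
print(store.group_of("人"))  # 2
```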

The basic information generation unit 23 may also store the statistical information (S_i, P_i, Q_i, R_i) of the word L_i in the basic information storage unit 35 as the basic information. In this case, when determining which of the first to third groups a morpheme belongs to, the analysis rule determination unit 33 uses the statistical information (S_i, P_i, Q_i, R_i) of that morpheme (word) stored in the basic information storage unit 35. That is, the analysis rule determination unit 33 makes the determination by executing the same processing as S108 to S113. In this case, the reference values J and K may also be stored in the basic information storage unit 35 as part of the basic information.

The language processing system 1 may also be configured to include a plurality of language processing devices 10. FIG. 11 shows an example of such a language processing system 1, which includes a first language processing device 10a and a second language processing device 10b. The first language processing device 10a and the second language processing device 10b can exchange data with each other via communication means. In this case, as shown in FIG. 12, the word acquisition unit 21, the statistical information acquisition unit 22, and the basic information generation unit 23 may be realized by the first language processing device 10a, and the analysis target character string acquisition unit 31, the analysis unit 32, the analysis rule determination unit 33, the analysis result output unit 34, and the basic information storage unit 35 may be realized by the second language processing device 10b. The basic information generation unit 23 then stores the basic information in the storage unit 12 of the first language processing device 10a and supplies it to the second language processing device 10b via the communication means, for example in response to a request from the second language processing device 10b.
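The request-and-supply exchange between the two devices can be sketched with two plain objects. The description says only that the devices exchange data "via communication means", so everything concrete here is an assumption: the classes `FirstDevice` and `SecondDevice`, the JSON message format, and the direct method call that stands in for the network.

```python
import json


class FirstDevice:
    """Stands in for device 10a: generates and holds the basic information."""

    def __init__(self, basic_info):
        self._storage = dict(basic_info)  # plays the role of storage unit 12

    def handle_request(self, request):
        """Answer a request for the basic information of the listed words."""
        words = json.loads(request)["words"]
        found = {w: self._storage[w] for w in words if w in self._storage}
        return json.dumps(found, ensure_ascii=False)


class SecondDevice:
    """Stands in for device 10b: requests basic information before analysis."""

    def __init__(self, first_device):
        self._peer = first_device
        self.basic_info = {}  # plays the role of basic information storage unit 35

    def fetch(self, words):
        request = json.dumps({"words": words}, ensure_ascii=False)
        self.basic_info.update(json.loads(self._peer.handle_request(request)))
        return self.basic_info


device_a = FirstDevice({"人": 1, "個": 2, "自衛隊": 3})
device_b = SecondDevice(device_a)
print(device_b.fetch(["人", "個", "学生"]))
```

Words without stored basic information (here "学生") are simply omitted from the reply, so the second device can tell which words have no classification available.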

FIG. 1 is a diagram showing a configuration example of a language processing system according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing an example of the functions realized by the language processing system according to the embodiment of the present invention.
FIG. 3 is a flowchart showing processing executed by the language processing system.
FIG. 4 is a flowchart showing processing executed by the language processing system.
FIG. 5 is a diagram showing an example of an analysis rule.
FIG. 6 is a diagram showing another example of an analysis rule.
FIG. 7 is a diagram showing an example of an analysis result.
FIG. 8 is a diagram showing another example of an analysis result.
FIG. 9 is a diagram showing another example of an analysis result.
FIG. 10 is a diagram showing another example of an analysis result.
FIG. 11 is a diagram showing a configuration example of a language processing system according to another embodiment of the present invention.
FIG. 12 is a functional block diagram showing an example of the functions realized by the language processing system according to the other embodiment of the present invention.

Explanation of symbols

1: language processing system; 10: language processing device; 10a: first language processing device; 10b: second language processing device; 11: control unit; 12: storage unit; 13: operation unit; 14: display unit; 21: word acquisition unit; 22: statistical information acquisition unit; 23: basic information generation unit; 31: analysis target character string acquisition unit; 32: analysis unit; 33: analysis rule determination unit; 34: analysis result output unit; 35: basic information storage unit.

Claims

1. A language processing apparatus comprising:
word acquisition means for acquiring a word used in a sentence in combination with a word representing a number;
statistical information acquisition means for acquiring, based on sentence group data, statistical information on how the word is used; and
basic information generation means for generating, based on the statistical information, basic information that serves as a basis for determining whether the word functions as either one or both of a classifier and a noun.

2. The language processing apparatus according to claim 1, wherein the statistical information acquisition means includes:
means for acquiring the number of times the word is used in the sentence group data in a predetermined first mode corresponding to a classifier; and
means for acquiring the number of times the word is used in the sentence group data in a predetermined second mode corresponding to a noun.

3. The language processing apparatus according to claim 1, wherein the statistical information acquisition means includes means for acquiring the number of modifiers that modify the word in the sentence group data.

4. The language processing apparatus according to claim 1, comprising:
analysis target character string acquisition means for acquiring an analysis target character string; and
analysis means for, when the word is included in the analysis target character string, determining, based on the basic information of the word, whether the word functions as either one or both of a classifier and a noun, and analyzing the analysis target character string based on the determination result.

5. The language processing apparatus according to claim 4, wherein the analysis means includes analysis rule determination means for determining an analysis rule for the analysis target character string based on the determination result, and analyzes the analysis target character string using the analysis rule determined by the analysis rule determination means.

6. A language processing system comprising:
word acquisition means for acquiring a word used in a sentence in combination with a word representing a number;
statistical information acquisition means for acquiring, based on sentence group data, statistical information on how the word is used; and
basic information generation means for generating, based on the statistical information, basic information that serves as a basis for determining whether the word functions as either one or both of a classifier and a noun.

7. A program for causing a computer to function as:
word acquisition means for acquiring a word used in a sentence in combination with a word representing a number;
statistical information acquisition means for acquiring, based on sentence group data, statistical information on how the word is used; and
basic information generation means for generating, based on the statistical information, basic information that serves as a basis for determining whether the word functions as either one or both of a classifier and a noun.