JP5117744B2

JP5117744B2 - Word meaning tag assigning device and method, program, and recording medium

Info

Publication number: JP5117744B2
Application number: JP2007063244A
Authority: JP
Inventors: 貴秋田中; 早苗藤田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-03-13
Filing date: 2007-03-13
Publication date: 2013-01-16
Anticipated expiration: 2027-03-13
Also published as: JP2008225846A

Description

The present invention relates to natural language analysis, and more particularly to a word meaning tagging technique that assigns, to each word in natural language data, a tag indicating the meaning that fits its context.

When natural language processing handles semantic information, it uses techniques that automatically assign to each word in a sentence the meaning of that word as defined in a dictionary or thesaurus, that is, a semantic tag.
Conventionally proposed techniques of this kind include a method that runs the target sentence through a parser and determines the semantic tags of word pairs in a dependency relation from left to right (see, for example, Non-Patent Document 1), and a method that uses a hidden Markov model to determine the semantic tags of words from left to right in the same manner as morphological analysis (see, for example, Non-Patent Document 2).
These techniques are described on the assumption that semantic tags are defined in a dictionary or thesaurus. However, any scheme in which words are classified into groups of meaning is essentially the same; this includes, for example, classes produced by existing automatic classification methods based on the distribution of word occurrences in text.

Non-Patent Document 1: Rada Mihalcea and Ehsanul Faruque, "SenseLearner: Minimally Supervised Word Sense Disambiguation for All Words in Open Text", in Proceedings of ACL/SIGLEX Senseval-3, Barcelona, Spain, July 2004.
Non-Patent Document 2: Antonia Molina et al., "WSD system based on Specialized Hidden Markov Model (upv-shmm-eaw)", in Proceedings of ACL/SIGLEX Senseval-3, Barcelona, Spain, July 2004.
Non-Patent Document 3: Kasahara et al., "Basic Word Semantic Database: Construction of Lexeed", IPSJ SIG Technical Report, NLC-159, 2004.

However, when such conventional techniques assign semantic tags to multiple words in a target sentence, they tag each word in its order of appearance, and this limits the accuracy of semantic tag assignment.

That is, when semantic tags are assigned word by word in order of appearance, the tag chosen for an earlier word strongly influences the tags chosen for the other words, so an incorrect sequence of semantic tags may be selected.
For example, consider assigning semantic tags to the words 「茶」 (tea, or the color brown) and 「グラス」 (a drinking glass, or eyeglasses) in the sentence 「茶のグラスをかけた男」 ("a man wearing brown glasses"). If the tags are decided from left to right, the "tableware" sense of the co-occurring word 「グラス」 may pull the tag for 「茶」 toward "beverage", and the co-occurrence with that tag may in turn cause the following 「グラス」 to be mis-tagged as "tableware".

The present invention has been made to solve this problem, and its object is to provide a word meaning tag assigning device and method, a program, and a recording medium that can improve the accuracy of semantic tag assignment.

To achieve this object, a word meaning tag assigning device according to the present invention assigns, to each target word in a target sentence composed of natural language data, a semantic tag indicating the meaning of that target word. The device comprises: a storage unit that stores a word dictionary consisting of, for each meaning of a word, a pair of a semantic tag indicating the meaning and a semantic class indicating the concept of the meaning; a feature extraction unit that, for a processing target word selected from among the target words, generates a primary feature indicating a word co-occurrence relationship between the processing target word and another word in the target sentence, generates a semantic co-occurrence feature indicating a semantic co-occurrence relationship between the processing target word and the semantic tags and semantic classes obtained by searching the word dictionary in the storage unit for other target words that have a word co-occurrence relationship with the processing target word, and generates an extended feature of the processing target word from its primary feature and semantic co-occurrence feature; and a semantic tag determination unit that, based on the extended feature of the processing target word, determines, from among the semantic tags of the processing target word described in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence.

In this case, the feature extraction unit may include: a primary feature extraction unit that generates, for each word co-occurrence relationship between the processing target word and another word in the target sentence, a primary feature indicating that relationship; and a feature expansion unit that searches the word dictionary in the storage unit for each of the other target words in the target sentence, generates, for each semantic tag and semantic class obtained, a semantic co-occurrence feature indicating its semantic co-occurrence relationship with the processing target word, and generates the extended feature of the processing target word from these semantic co-occurrence features and the primary features of the processing target word.

The storage unit may also store a semantic tag selection model that indicates, for each primary feature and each semantic co-occurrence feature of a target word, the weight of each semantic tag that can be selected for that target word. In that case, the semantic tag determination unit searches the semantic tag selection model for the primary features and semantic co-occurrence features of the processing target word contained in the extended feature, calculates an evaluation value for each combination of the processing target word and a semantic tag based on the weights obtained for each semantic tag, and determines the semantic tag of the processing target word based on these evaluation values.
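The weight lookup and evaluation described above can be illustrated with a minimal Python sketch. It assumes that the evaluation value is the plain sum of the weights found in the model; the text only says the value is "based on" the weights. The feature strings, the weights, and the mapping of tags to senses are all hypothetical.

```python
# Hypothetical semantic tag selection model: (feature, semantic tag) -> weight.
# Feature names and weights are illustrative assumptions, not values from the patent.
SELECTION_MODEL = {
    ("word:かける", "グラス₁"): 0.1,
    ("word:かける", "グラス₂"): 1.2,
    ("sense:茶₁", "グラス₁"): 0.8,
    ("sense:茶₂", "グラス₂"): 0.9,
}


def evaluate_tags(features, candidate_tags, model):
    """Return one evaluation value per candidate tag (assumed: sum of weights)."""
    return {
        tag: sum(model.get((feature, tag), 0.0) for feature in features)
        for tag in candidate_tags
    }


def decide_tag(features, candidate_tags, model):
    """Pick the candidate tag with the highest evaluation value."""
    scores = evaluate_tags(features, candidate_tags, model)
    return max(scores, key=scores.get)


# Extended feature of the word 「グラス」: one primary and two semantic co-occurrence features.
extended_features = ["word:かける", "sense:茶₁", "sense:茶₂"]
chosen = decide_tag(extended_features, ["グラス₁", "グラス₂"], SELECTION_MODEL)
print(chosen)
```

Features that are missing from the model simply contribute a weight of zero in this sketch, which is one possible convention among several.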

The storage unit may also store an assignment order database indicating the order in which semantic tags are assigned to target words. In that case, when determining semantic tags for multiple processing target words, the semantic tag determination unit searches the assignment order database in the storage unit to obtain the assignment order of each processing target word, and determines the semantic tags of the processing target words one after another in that order.

The device may further comprise a target sentence input unit that includes: a morphological analysis unit that performs morphological analysis on the target sentence; a syntactic and semantic analysis unit that performs syntactic and semantic analysis on the target sentence; and an analysis result integration unit that integrates the different types of analysis results from the morphological analysis unit and the syntactic and semantic analysis unit for each common analysis target and outputs them, together with the target sentence, to the primary feature extraction unit.

A word meaning tag assigning method according to the present invention assigns, to each target word in a target sentence composed of natural language data, a semantic tag indicating the meaning of that target word. The method comprises: a storage step of storing in a storage unit a word dictionary consisting of, for each meaning of a word, a pair of a semantic tag indicating the meaning and a semantic class indicating the classification of the meaning; a feature extraction step in which a feature extraction unit, for a processing target word selected from among the target words, generates a primary feature indicating a word co-occurrence relationship between the processing target word and another word in the target sentence, generates a semantic co-occurrence feature indicating a semantic co-occurrence relationship between the processing target word and the semantic tags and semantic classes obtained by searching the word dictionary in the storage unit for the other target words in the target sentence, and generates an extended feature of the processing target word from its primary feature and semantic co-occurrence feature; and a semantic tag determination step in which a semantic tag determination unit, based on the extended feature of the processing target word, determines, from among the semantic tags of the processing target word described in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence.

A program according to the present invention is a program for causing a computer to execute each step of the above word meaning tag assigning method.
A recording medium according to the present invention is a recording medium on which the above program is recorded.

According to the present invention, the feature extraction unit, for a processing target word selected from among the target words, generates a primary feature indicating a word co-occurrence relationship between the processing target word and another word in the target sentence, generates a semantic co-occurrence feature indicating a semantic co-occurrence relationship between the processing target word and the semantic tags and semantic classes obtained by searching the word dictionary in the storage unit for other target words that have a word co-occurrence relationship with the processing target word, and generates an extended feature of the processing target word from its primary feature and semantic co-occurrence feature. The semantic tag determination unit then, based on the extended feature of the processing target word, determines, from among the semantic tags of the processing target word described in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence.

As a result, when a semantic tag is assigned to a target word, the tag can be chosen based not only on the words that have a word co-occurrence relationship with the target word but also on the words that have a semantic co-occurrence relationship with it. Compared with assigning a semantic tag to each word in order of appearance, semantic tags can therefore be assigned on the basis of a wider range of relationships, which improves the accuracy of semantic tag assignment.

Next, embodiments of the present invention will be described with reference to the drawings.
[First embodiment]
First, a word meaning tag assigning device according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the word meaning tag assigning device according to the first embodiment of the present invention.
The word meaning tag assigning device 10 consists of a general information processing device such as a server or a personal computer. For each target word in a target sentence X composed of natural language data, it assigns a semantic tag indicating the meaning of that target word and outputs the result as an output sentence Y.

In the present embodiment, the storage unit stores a word dictionary consisting of, for each meaning of a word, a pair of a semantic tag indicating the meaning and a semantic class indicating the concept of the meaning. The feature extraction unit, for a processing target word selected from among the target words, generates a primary feature indicating a word co-occurrence relationship between the processing target word and another word in the target sentence, generates a semantic co-occurrence feature indicating a semantic co-occurrence relationship between the processing target word and the semantic tags and semantic classes obtained by searching the word dictionary in the storage unit for other target words that have a word co-occurrence relationship with the processing target word, and generates an extended feature of the processing target word from its primary feature and semantic co-occurrence feature. The semantic tag determination unit then, based on the extended feature of the processing target word, determines, from among the semantic tags of the processing target word described in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence.

The configuration of the word meaning tag assigning device according to the first embodiment of the present invention is described in detail below with reference to FIG. 1.
Like a general information processing device, the word meaning tag assigning device 10 has the following main functional units: an arithmetic processing unit 1, a storage unit 2, an input/output interface unit (hereinafter, input/output I/F unit) 3, a communication interface unit (hereinafter, communication I/F unit) 4, an operation input unit 5, and a screen display unit 6.

The arithmetic processing unit 1 consists of a microprocessor such as a CPU and its peripheral circuits. By reading and executing the program 20 stored in the storage unit 2, it makes the hardware and the program 20 work together to realize the various processing units.
The main processing units realized by the arithmetic processing unit 1 are a target sentence analysis unit 11, a feature extraction unit 12, a semantic tag determination unit 13, and a semantic tag output unit 14.

The storage unit 2 consists of a storage device such as a hard disk or memory, and stores the program 20 executed by the arithmetic processing unit 1 and the various processing information used in semantic tag assignment. The program 20 is read, for example, from a recording medium M via the input/output I/F unit 3 or from an external device (not shown) via the communication I/F unit 4, and is stored in the storage unit 2 in advance.
The main processing information stored in the storage unit 2 is a word dictionary 21 and a semantic tag selection model 22.

The input/output I/F unit 3 consists of a dedicated data input/output circuit. In response to instructions from the arithmetic processing unit 1, it inputs and outputs various data and programs, such as the target sentence X, the output sentence Y, dictionaries, and databases, to and from a recording medium M such as a CD, a DVD, or a nonvolatile memory card.
The communication I/F unit 4 consists of a dedicated data communication circuit. In response to instructions from the arithmetic processing unit 1, it sends and receives various data and programs, such as the target sentence X, the output sentence Y, dictionaries, and databases, to and from an external device such as a server connected via a communication line such as a LAN.

The operation input unit 5 consists of an operation input device such as a keyboard or a mouse. It detects operator actions and outputs them to the arithmetic processing unit 1.
The screen display unit 6 consists of a screen display device such as an LCD or a PDP. In response to instructions from the arithmetic processing unit 1, it displays various data, such as the target sentence X and the output sentence Y, and operation screens.

FIG. 2 is a block diagram showing the main part of the word meaning tag assigning device according to the first embodiment of the present invention.
The target sentence analysis unit 11 receives the target sentence X, composed of natural language data, from the storage unit 2, the input/output I/F unit 3, the communication I/F unit 4, the operation input unit 5, or the like, and performs two different kinds of language analysis on it. Here it consists of a morphological analysis unit 11A, a syntactic and semantic analysis unit 11B, and an analysis result integration unit 11C.

The morphological analysis unit 11A performs known morphological analysis on the target sentence X. The syntactic and semantic analysis unit 11B performs known syntactic and semantic analysis on the target sentence X. The analysis result integration unit 11C receives the multiple types of analysis results from the morphological analysis unit 11A and the syntactic and semantic analysis unit 11B. It associates the analysis results for the same sentence with one another and/or associates the analysis results for the same word with one another, and it integrates the different analysis results for the same sentence.

The feature extraction unit 12 extracts features (feature information) for each target word in the target sentence X by referring to the word dictionary 21 in the storage unit 2, based on the language analysis result obtained by the target sentence analysis unit 11. Here it consists of a primary feature extraction unit 12A and a feature expansion unit 12B. The word dictionary 21 is a database consisting of, for each meaning of each word, a pair of a semantic tag indicating the meaning and a semantic class indicating the classification of the meaning. An ontology that makes explicit the concepts assumed by some knowledge base and the relationships among those concepts may also be used as the word dictionary 21.
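As a rough illustration of the word dictionary 21 described above, the following Python sketch stores one (semantic tag, semantic class) pair per meaning of each word. The class name `Sense`, the function `lookup`, and the semantic class labels are assumptions made for this example; in particular, which numbered tag corresponds to which meaning is not stated in this part of the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    tag: str        # semantic tag identifying one meaning of a word
    sem_class: str  # semantic class that the meaning is classified under


# Illustrative entries only; the semantic class names are assumptions.
WORD_DICTIONARY = {
    "茶": [Sense("茶₁", "beverage"), Sense("茶₂", "color")],
    "グラス": [Sense("グラス₁", "tableware"), Sense("グラス₂", "eyewear")],
    "男": [Sense("男₁", "person")],
}


def lookup(word, dictionary=WORD_DICTIONARY):
    """Return every Sense registered for a word, or an empty list if unknown."""
    return dictionary.get(word, [])


for sense in lookup("茶"):
    print(sense.tag, sense.sem_class)
```

Keeping the senses of a word in a list makes it easy to tell ambiguous words (more than one entry) from unambiguous ones, which is the distinction the feature extraction relies on.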

The primary feature extraction unit 12A searches the word dictionary 21 for each word extracted from the target sentence X based on the language analysis result, selects the words that have multiple meanings as the target words to be given semantic tags, and, for a processing target word selected from among the target words, generates primary (word co-occurrence) features (primary feature information) indicating the word co-occurrence relationships between the processing target word and the other words in the target sentence X.

The feature expansion unit 12B searches the word dictionary 21 for the other target words in the target sentence X, generates semantic co-occurrence features (semantic co-occurrence feature information) indicating the semantic co-occurrence relationships between the processing target word and the semantic tags and semantic classes obtained by the search, and generates the extended feature of the processing target word from its primary features and semantic co-occurrence features.
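The expansion step performed by the feature expansion unit 12B might look like the following Python sketch. It assumes that features are represented as strings and that the extended feature is simply the primary features followed by the semantic co-occurrence features. The names `CoSense` and `CoClass`, the dictionary contents, and the function `expand_features` are hypothetical.

```python
# Hypothetical dictionary: word -> list of (semantic tag, semantic class).
DICTIONARY = {
    "茶": [("茶₁", "beverage"), ("茶₂", "color")],
    "グラス": [("グラス₁", "tableware"), ("グラス₂", "eyewear")],
}


def expand_features(primary_features, other_target_words, dictionary):
    """Append semantic co-occurrence features for every sense of the other target words."""
    semantic_features = []
    for word in other_target_words:
        for tag, sem_class in dictionary.get(word, []):
            # One feature for the co-occurring semantic tag, one for its semantic class.
            semantic_features.append(f"CoSense({tag})")
            semantic_features.append(f"CoClass({sem_class})")
    return list(primary_features) + semantic_features


# Extended feature for the processing target word 「茶」, whose other target word is 「グラス」.
extended = expand_features(["CoWord(グラス)"], ["グラス"], DICTIONARY)
print(extended)
```

Because every candidate sense of the other target word contributes features, the processing target word is not tied to a single, possibly wrong, earlier decision about its neighbor.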

The semantic tag determination unit 13 refers to the semantic tag selection model 22 in the storage unit 2, based on the extended feature of each target word extracted by the feature extraction unit 12, and determines, from among the semantic tags of the processing target word described in the word dictionary 21, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence. Here it consists of a semantic tag combination search unit 13A. The semantic tag selection model 22 is a database that indicates, for each primary feature and each semantic co-occurrence feature of a target word, the weight of each semantic tag that can be selected for that target word.

The semantic tag combination search unit 13A searches the semantic tag selection model 22 for the primary features and semantic co-occurrence features of the processing target word contained in the extended feature, calculates an evaluation value for each combination of the processing target word and a semantic tag based on the weights obtained for each semantic tag, and determines the semantic tag of the processing target word by finding the combination with the highest evaluation value using a search algorithm such as beam search.
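The combination search can be sketched in Python as a small beam search. This is only an illustration under stated assumptions: the scoring function, which adds a per-tag weight and a pairwise weight for tags chosen together, is invented for the example, as are all the weights. The description above names beam search only as one possible search algorithm and does not give this scoring formula.

```python
from itertools import combinations

# Hypothetical weights, chosen so that a purely greedy search goes wrong.
UNARY = {"茶₁": 0.6, "茶₂": 0.4, "グラス₁": 0.5, "グラス₂": 0.5}
PAIR = {
    frozenset({"茶₁", "グラス₁"}): 1.0,
    frozenset({"茶₂", "グラス₂"}): 1.5,
}
CANDIDATES = [("茶", ["茶₁", "茶₂"]), ("グラス", ["グラス₁", "グラス₂"])]


def combination_score(assignment):
    """Evaluation value of a (partial) word -> tag assignment (assumed additive)."""
    tags = list(assignment.values())
    total = sum(UNARY.get(tag, 0.0) for tag in tags)
    total += sum(PAIR.get(frozenset(pair), 0.0) for pair in combinations(tags, 2))
    return total


def beam_search(candidates, score, beam_width=2):
    """Keep the beam_width best partial assignments while adding one word at a time."""
    beam = [({}, 0.0)]
    for word, tags in candidates:
        expanded = []
        for assignment, _ in beam:
            for tag in tags:
                extended = {**assignment, word: tag}
                expanded.append((extended, score(extended)))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


best, _ = beam_search(CANDIDATES, combination_score, beam_width=2)
greedy, _ = beam_search(CANDIDATES, combination_score, beam_width=1)
print(best, greedy)
```

With a beam width of 1 the search degenerates into a left-to-right greedy decision and, with these weights, keeps the locally better tag for 「茶」 and ends on a worse overall combination; a wider beam retains the alternative long enough for the pairwise weight to be taken into account.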

The semantic tag output unit 14 generates, based on the semantic tag of each target word determined by the semantic tag determination unit 13, an output sentence Y in which the most suitable semantic tag is assigned to each target word of the target sentence X, and outputs this output sentence Y to the storage unit 2, the input/output I/F unit 3, the communication I/F unit 4, the screen display unit 6, or the like.

[Operation of the first embodiment]
Next, the operation of the word meaning tag assigning device according to the first embodiment of the present invention will be described with reference to FIG. 2.
When the operation input unit 5 detects a processing start operation by the operator, the arithmetic processing unit 1 of the word meaning tag assigning device 10 first has the target sentence analysis unit 11 receive the target sentence X to be tagged and perform language analysis on it. If the target sentence X has already been analyzed, language analysis by the target sentence analysis unit 11 is unnecessary, and the input target sentence X and its language analysis result are passed to the feature extraction unit 12.

When the target sentence X is the natural language data (text data) 「茶のグラスをかけた男」 ("a man wearing brown glasses"), the morphological analysis unit 11A receives the target sentence X, performs morphological analysis, and outputs the resulting words and their parts of speech as the analysis result. FIG. 3 shows an example of a morphological analysis result. In this example, the target sentence X is divided into seven words, and a part of speech is assigned to each word. The character position gives the start and end positions (character offsets) of each word in the target sentence X. For example, 「茶」 spans from the first character to the first character of the target sentence X.
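The character positions described above can be derived mechanically once the sentence has been segmented. The following Python sketch assumes a seven-word segmentation of the example sentence and illustrative part-of-speech labels; FIG. 3 is not reproduced here, so both are assumptions, and `with_positions` is a hypothetical helper rather than part of any morphological analyzer.

```python
# Assumed segmentation of 「茶のグラスをかけた男」 with illustrative part-of-speech labels.
TOKENS = [
    ("茶", "noun"),
    ("の", "particle"),
    ("グラス", "noun"),
    ("を", "particle"),
    ("かけ", "verb"),
    ("た", "auxiliary verb"),
    ("男", "noun"),
]


def with_positions(tokens):
    """Attach 1-based (start, end) character positions to each (surface, pos) token."""
    rows = []
    next_start = 1
    for surface, part_of_speech in tokens:
        end = next_start + len(surface) - 1
        rows.append((surface, part_of_speech, next_start, end))
        next_start = end + 1
    return rows


rows = with_positions(TOKENS)
for row in rows:
    print(row)
```

The (start, end) pair acts as a stable key for a word occurrence, so results from different analyzers can later be matched on it even when the same surface form appears more than once.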

Meanwhile, the syntactic and semantic analysis unit 11B receives the target sentence X, performs syntactic and semantic analysis, and outputs the semantic relationships between the words. FIG. 4 shows an example of a syntactic and semantic analysis result. This example shows two relationships: (1) there is a noun phrase 「茶のグラス」 in which the head 「グラス」 and the modifier 「茶」 are joined by the particle 「の」, and (2) there is a noun phrase 「グラスをかけた男」 in which, for the predicate 「かけた」, 「男」 is the subject (nominative) and 「グラス」 is the object (accusative). The values in parentheses give the character positions in the original sentence, as in FIG. 3.
The analysis result integration unit 11C receives the different language analysis results from the morphological analysis unit 11A and the syntactic and semantic analysis unit 11B, combines them into a single result, and sends it to the feature extraction unit 12.
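One way to picture the integration is to merge the two analysis results on the character positions they share. The Python sketch below does this under several assumptions: the record layout, the relation labels, and the function `integrate` are hypothetical, and the patent does not specify that integration is keyed on character spans in exactly this way.

```python
# Morphological result (content words only), keyed by 1-based character span.
MORPHEMES = [
    {"surface": "茶", "span": (1, 1)},
    {"surface": "グラス", "span": (3, 5)},
    {"surface": "男", "span": (10, 10)},
]

# Syntactic/semantic relations as (span of the word, relation label); labels are illustrative.
RELATIONS = [
    ((1, 1), "modifier of グラス"),
    ((3, 5), "head of noun phrase"),
    ((3, 5), "object of かけた"),
    ((10, 10), "subject of かけた"),
]


def integrate(morphemes, relations):
    """Attach each relation to the morpheme that occupies the same character span."""
    merged = {m["span"]: {**m, "relations": []} for m in morphemes}
    for span, relation in relations:
        if span in merged:
            merged[span]["relations"].append(relation)
    # Preserve the original word order of the morphological result.
    return [merged[m["span"]] for m in morphemes]


integrated = integrate(MORPHEMES, RELATIONS)
for record in integrated:
    print(record["surface"], record["relations"])
```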

Next, the arithmetic processing unit 1 has the feature extraction unit 12 extract the features of each target word based on the language analysis result of the target sentence X obtained by the target sentence analysis unit 11.
First, in the feature extraction unit 12, the primary feature extraction unit 12A identifies the target words to be given semantic tags based on the language analysis result of the target sentence X and the word dictionary 21. FIG. 5 shows an example of the word dictionary.

In this example, the word "茶" has two meanings, corresponding to the semantic tags 茶₁ and 茶₂; the word "グラス" has two meanings, corresponding to the semantic tags グラス₁ and グラス₂; the word "男" has only one meaning, corresponding to the semantic tag 男₁; and the word "かける" has only one meaning, corresponding to the semantic tag かける₁. Accordingly, the two words that have more than one meaning, "茶" and "グラス", are selected as the words to be tagged.
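The selection rule described here (tag only the words that have more than one meaning in the dictionary) can be sketched as follows. This is an illustration under assumptions: `WORD_DICT` is a hypothetical stand-in for word dictionary 21 (FIG. 5 is not reproduced here), `select_target_words` is a hypothetical name, and the list of dictionary forms is an assumed lemmatization of the example sentence.

```python
# Hypothetical stand-in for word dictionary 21; FIG. 5 is not reproduced here.
# Each word maps to the list of semantic tags defined for it.
WORD_DICT = {
    "茶": ["茶₁", "茶₂"],
    "グラス": ["グラス₁", "グラス₂"],
    "男": ["男₁"],
    "かける": ["かける₁"],
}


def select_target_words(words, word_dict):
    """Keep the words that have two or more semantic tags, in sentence order."""
    return [w for w in words if len(word_dict.get(w, [])) >= 2]


# Dictionary forms of the content words in the example sentence (assumed lemmatization).
content_words = ["茶", "グラス", "かける", "男"]
targets = select_target_words(content_words, WORD_DICT)
```

Words with a single tag, such as "男" and "かける", need no disambiguation and are left out of the search.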

Next, the primary feature extraction unit 12A selects the target words one at a time as the processing target word and generates primary features for each of them. FIG. 6 shows an example definition of the primary features, and FIG. 7 shows an example of primary features extracted according to the definition in FIG. 6. For example, CoWord(グラス) in the column for "茶 (1, 1)" indicates that the word "グラス" co-occurs in the sentence in which the target word "茶" appears.
To do this, the primary feature extraction unit 12A first extracts, according to the definition in FIG. 6, the word co-occurrence relations between the processing target word and the other words in the target sentence X, and generates one primary feature for each relation. The list of these features forms the primary feature set of the processing target word.
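As a rough illustration of the word co-occurrence features, the sketch below builds only the CoWord type. It is not the definition in FIG. 6, which is not reproduced here and also covers relation-based features such as ARG2/REL; the function name `coword_features` and the choice of which words count as co-occurring are assumptions.

```python
def coword_features(target, content_words):
    """Build one CoWord(w) feature for every other content word in the sentence."""
    return [f"CoWord({w})" for w in content_words if w != target]


# Dictionary forms of the content words in the example sentence (assumed).
content_words = ["茶", "グラス", "かける", "男"]
features = coword_features("茶", content_words)
```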

Next, in the feature extraction unit 12, the feature expansion unit 12B consults the word dictionary 21 to look up the semantic tags of the other target words that have a word co-occurrence relation with the processing target word, together with the semantic classes that categorize those meanings. It generates new features representing the co-occurrence of the processing target word with each semantic tag and each semantic class obtained, and appends them to the primary features. The result is a set of extended features that generalizes the primary features to broader concepts.

FIG. 8 shows an example of the extended features. For example, if the word "グラス" in the primary feature CoWord(グラス) of the target word "茶" has the meaning "グラス₁: a cup made of glass", the word dictionary 21 is searched to obtain the semantic tag グラス₁ and the semantic class [食器] (tableware) of the word "グラス". Replacing "グラス" in the primary feature with the semantic tag yields the new feature CoWord(グラス₁), and replacing it with the semantic class yields the new feature CoWord([食器]).

A primary feature represents a word-level co-occurrence relation between the target word "茶" and another word. The new features, in contrast, represent a meaning-level co-occurrence relation, that is, a semantic co-occurrence relation, between "茶" and another target word that co-occurs with it. They can therefore be called semantic co-occurrence features.
Consequently, adding the semantic co-occurrence feature CoWord([食器]), in which the word is replaced by the semantic class [食器], gives a feature that marks a sentence containing another word of the same class, such as "皿" (plate) or "ジョッキ" (mug), as having a similar context.
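The expansion step can be sketched as a substitution of the co-occurring word by its semantic tag and by its semantic class. This is a minimal sketch under assumptions: `SEMANTIC_CLASS` is a hypothetical excerpt of word dictionary 21, the function names are hypothetical, and the class [衣服] for グラス₂ is taken from the features the text later lists for that tag.

```python
# Hypothetical excerpt of word dictionary 21: semantic tag -> semantic class.
SEMANTIC_CLASS = {
    "グラス₁": "[食器]",
    "グラス₂": "[衣服]",
}


def semantic_cooccurrence_features(tag, semantic_class):
    """Replace the word in CoWord(word) by its semantic tag and by its semantic class."""
    return [f"CoWord({tag})", f"CoWord({semantic_class[tag]})"]


def extend(primary_features, tag, semantic_class):
    """Extended features = primary features plus the semantic co-occurrence features."""
    return list(primary_features) + semantic_cooccurrence_features(tag, semantic_class)


extended = extend(["CoWord(グラス)"], "グラス₁", SEMANTIC_CLASS)
```

Because CoWord([食器]) is shared by every word in the [食器] class, weights learned for it apply to sentences that contain "皿" or "ジョッキ" even if "グラス" never appeared in training.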

Next, the arithmetic processing unit 1 uses the semantic tag determination unit 13 to determine the optimal semantic tag for each target word in the target sentence X, based on the extended features of each target word extracted by the feature extraction unit 12.
In the semantic tag determination unit 13, the semantic tag combination search unit 13A computes an evaluation value for each combination of semantic tags using the semantic tag selection model 22 in the storage unit 2, and searches for the combination of semantic tags with the highest evaluation value.

FIG. 9 shows an example of a semantic tag selection model based on the maximum entropy model. The numbers in FIG. 9 are the weights with which each semantic tag is selected when each feature is observed for the target word. These weights can be determined from training data annotated with the correct semantic tags, for example by a method based on the maximum entropy model (A. Berger et al., "A maximum entropy approach to natural language processing", Computational Linguistics, 22(1), 39-71).

When the semantic tag selection model is a maximum entropy model, the probability p(t|c) that the semantic tag t is selected in the context c of the target sentence is given by expressions (1) and (2), and this probability is used as the evaluation value. Here, f_i(t, c) is a feature function: a binary function that takes the value 1 when feature i is observed and 0 otherwise. λ_i is a model parameter and corresponds to the "weight" in FIG. 9.
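Expressions (1) and (2) are not reproduced in this text. The worked examples in this description compute each unnormalized score Z(c)p(t|c) as a plain sum of the weights of the observed features and then normalize over the candidate tags of the word. Written out, that reading is as follows. It is inferred from the arithmetic only: the conventional maximum entropy model applies an exponential to the weighted sum, and the original expressions may do so as well.

```latex
\begin{align*}
p(t \mid c) &= \frac{1}{Z(c)} \sum_{i} \lambda_i \, f_i(t, c) \\
Z(c) &= \sum_{t'} \sum_{i} \lambda_i \, f_i(t', c)
\end{align*}
```

Here t' ranges over the semantic tags defined for the word being tagged, so that the probabilities of its candidate tags sum to 1.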

In the initial state, no semantic tag has been decided, and the only feature of the word "茶" with a non-zero weight is CoWord(グラス). Therefore
Z(c) p(茶₁|c) = 0.6 × 1 = 0.6
Z(c) p(茶₂|c) = 0.2
and the evaluation values (probabilities) are
p(茶₁|c) = 0.6 / (0.6 + 0.2) = 0.75
p(茶₂|c) = 0.2 / (0.6 + 0.2) = 0.25

Similarly, for the word "グラス", the features CoWord(茶) and ARG2/REL(述語_かける) give
Z(c) p(グラス₁|c) = 0.2 × 1 + 0.02 × 1 = 0.22
Z(c) p(グラス₂|c) = 0.3 × 1 + 1.3 × 1 = 1.6
and the evaluation values (probabilities) are
p(グラス₁|c) = 0.22 / (0.22 + 1.6) = 0.121
p(グラス₂|c) = 1.6 / (0.22 + 1.6) = 0.879
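The arithmetic of the two worked examples above can be checked with a short sketch. It follows the calculation as printed (sum the weights of the observed features for each tag, then normalize over the tags), which is not necessarily the exact form of expressions (1) and (2). The function name `tag_probabilities` is hypothetical, and pairing each quoted weight with a feature in the order the features are listed is an assumption, since FIG. 9 is not reproduced here.

```python
def tag_probabilities(weights, active_features):
    """Sum the weights of the active features per tag, then normalize over the tags."""
    scores = {
        tag: sum(feature_weights.get(f, 0.0) for f in active_features)
        for tag, feature_weights in weights.items()
    }
    z = sum(scores.values())
    if z == 0:
        raise ValueError("no active feature has a non-zero weight")
    return {tag: score / z for tag, score in scores.items()}


# Weights quoted in the text; pairing them with features in listed order is an assumption.
CHA_WEIGHTS = {
    "茶₁": {"CoWord(グラス)": 0.6},
    "茶₂": {"CoWord(グラス)": 0.2},
}
GLASS_WEIGHTS = {
    "グラス₁": {"CoWord(茶)": 0.2, "ARG2/REL(述語_かける)": 0.02},
    "グラス₂": {"CoWord(茶)": 0.3, "ARG2/REL(述語_かける)": 1.3},
}

p_cha = tag_probabilities(CHA_WEIGHTS, ["CoWord(グラス)"])
p_glass = tag_probabilities(GLASS_WEIGHTS, ["CoWord(茶)", "ARG2/REL(述語_かける)"])
```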

The semantic tag combination search unit 13A then uses a beam search algorithm to find the semantic tags with the highest evaluation value.
FIG. 10 shows an example of the beam search algorithm. Let W = {w₁, ..., w_n} be the set of target words, t_wik the k-th semantic tag of the word w_i, T the list of semantic tags decided so far, and b the beam width. The search uses N = [W, T] as a search node. If the node N is produced from its immediately preceding node N_1 = [W_1, T_1] by deciding the semantic tag of the word w_i to be t_wik, its evaluation value is defined as s(N) = p(t_wik|c) · s(N_1). As described above, p(t_wik|c) can be obtained from the extended features of FIG. 8, the morphological analysis result of FIG. 3, and the semantic tag selection model 22.

In this example, in step S1 of FIG. 10, the initial node N₀ = [W₀ = {茶, グラス}, T₀ = {}] is inserted into the initial queue Q₀. In the following step S2, "茶" and "グラス" are taken from W₀ in turn, and for each of them one node is created per candidate semantic tag. These nodes are inserted into the queue Q', giving Q' = <[{グラス}, {茶₁}], [{グラス}, {茶₂}], [{茶}, {グラス₁}], [{茶}, {グラス₂}]>.

Here, the evaluation value s([{グラス}, {茶₁}]) equals p(茶₁|c), which is 0.75, and the evaluation value s([{グラス}, {茶₂}]) equals p(茶₂|c), which is 0.25. Likewise, s([{茶}, {グラス₁}]) = 0.121 and s([{茶}, {グラス₂}]) = 0.879.
In step S3, sorting the queue Q' in descending order of evaluation value gives Q' = <[{茶}, {グラス₂}], [{グラス}, {茶₁}], [{グラス}, {茶₂}], [{茶}, {グラス₁}]>. With a beam width of b = 2, the new queue is Q = <[{茶}, {グラス₂}], [{グラス}, {茶₁}]>.

Next, for the node at the head of the queue Q, [{茶}, {グラス₂}], FIG. 8 supplies the new features CoWord(グラス₂), CoWord([衣服]), and ARG1/REL-ARG2(名詞句_の, [衣服]), where [衣服] is the semantic class for clothing. Continuing the calculation in the same way, in step S4 the node [{}, {茶₂, グラス₂}] finally has the highest evaluation value, and this combination of semantic tags is passed to the semantic tag output unit 14.
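The search described in steps S1 to S4 can be sketched as follows. This is an illustration, not the algorithm of FIG. 10, which is not reproduced here. The names `expand`, `beam_search`, and `prob` are hypothetical, the initial score s(N₀) = 1 is an assumption consistent with the first-step values in the text, and the second-step probabilities in `GIVEN_GLASS2` are made-up placeholders, because the FIG. 9 weights for the newly activated features are not given.

```python
def expand(queue, candidates, prob, beam_width):
    """Steps S2-S3: expand each node by one decision, sort by score, keep the top b."""
    expanded = []
    for remaining, decided, score in queue:
        for word in remaining:
            rest = tuple(w for w in remaining if w != word)
            for tag in candidates[word]:
                new_score = score * prob(word, tag, dict(decided))
                expanded.append((rest, decided + ((word, tag),), new_score))
    expanded.sort(key=lambda node: node[2], reverse=True)
    return expanded[:beam_width]


def beam_search(words, candidates, prob, beam_width=2):
    """Return the best node (remaining words, decided tags, score) once W is empty."""
    queue = [(tuple(words), (), 1.0)]  # step S1: N0 = [W0, T0 = {}], assumed score 1
    for _ in words:
        queue = expand(queue, candidates, prob, beam_width)
    return queue[0]  # step S4: the highest-scoring complete node


CANDIDATES = {"茶": ["茶₁", "茶₂"], "グラス": ["グラス₁", "グラス₂"]}

# First-step probabilities as computed in the text.
INITIAL = {"茶₁": 0.75, "茶₂": 0.25, "グラス₁": 0.121, "グラス₂": 0.879}

# HYPOTHETICAL second-step probabilities for "茶" once グラス₂ has been decided.
# These placeholders stand in for values that depend on weights not shown in the text.
GIVEN_GLASS2 = {"茶₁": 0.1, "茶₂": 0.9}


def prob(word, tag, decided):
    """p(tag | c), where the context c includes the tags decided so far."""
    if word == "茶" and decided.get("グラス") == "グラス₂":
        return GIVEN_GLASS2[tag]
    return INITIAL[tag]


first_queue = expand([(("茶", "グラス"), (), 1.0)], CANDIDATES, prob, beam_width=2)
best = beam_search(["茶", "グラス"], CANDIDATES, prob, beam_width=2)
```

With b = 2, `first_queue` keeps the グラス₂ node and the 茶₁ node, as in step S3 of the text. The point of passing `decided` to `prob` is that a tag fixed earlier changes the features, and hence the probabilities, of the words still to be tagged.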

FIG. 11 shows an example of the semantic tag assignment result. The target words "茶" and "グラス" are assigned the semantic tags 茶₂ and グラス₂, respectively. Because "男" and "かける" each have only one semantic tag in the word dictionary 21, their sole semantic tags 男₁ and かける₁ are assigned automatically.

As described above, in the present embodiment the storage unit 2 stores the word dictionary 21, which holds, for each meaning of a word, a pair consisting of a semantic tag indicating that meaning and a semantic class indicating the concept of that meaning. For a processing target word selected from the target words, the feature extraction unit 12 generates primary features representing the word co-occurrence relations between the processing target word and the other words in the target sentence. It then generates semantic co-occurrence features representing the semantic co-occurrence relations between the processing target word and the semantic tags and semantic classes obtained by searching the word dictionary in the storage unit for the other target words that co-occur with it, and generates the extended features of the processing target word from its primary features and semantic co-occurrence features. Based on these extended features, the semantic tag determination unit 13 determines, from among the semantic tags of the processing target word described in the word dictionary, the semantic tag indicating the most appropriate meaning of the word as used in the target sentence.

In addition, in the present embodiment the target sentence is analyzed by several different analysis processes, and word co-occurrence relations and semantic co-occurrence relations are extracted from analysis information that integrates their results. Features can therefore be generated accurately, which makes more accurate semantic tag assignment possible.

[Second Embodiment]
Next, a word meaning tag assigning device according to a second embodiment of the present invention will be described with reference to FIG. 12. FIG. 12 is a block diagram showing the main part of the word meaning tag assigning device according to the second embodiment of the present invention; parts that are the same as or equivalent to those in FIG. 2 described above are given the same reference numerals.
The first embodiment described, as an example, the case where the target sentence analysis unit 11 performs two different language analyses, morphological analysis and syntactic and semantic analysis, on the target sentence X. The present embodiment describes the case where the target sentence analysis unit 11 performs only morphological analysis.

Compared with the first embodiment, the word meaning tag assigning device 10 according to the present embodiment differs in that the target sentence analysis unit 11 consists only of the morphological analysis unit 11A. The rest of the configuration is the same as in the first embodiment described above, and a detailed description is omitted here.

[Operation of Second Embodiment]
Next, the operation of the word meaning tag assigning device according to the second embodiment of the present invention will be described with reference to FIG. 12.
When the operation input unit 5 detects a processing start operation by the operator, the arithmetic processing unit 1 of the word meaning tag assigning device 10 first uses the target sentence analysis unit 11 to receive the target sentence X to be tagged and to perform language analysis.
Suppose the target sentence X is the natural language (text) data "男がグラスで茶を出した" ("the man served tea in a glass"). In the target sentence analysis unit 11, the morphological analysis unit 11A receives the target sentence X, performs morphological analysis, and outputs the resulting words and their parts of speech as the analysis result. FIG. 13 shows an example of the analysis result.

Next, the arithmetic processing unit 1 uses the feature extraction unit 12 to extract the features of each target word, based on the language analysis result for the target sentence X obtained by the target sentence analysis unit 11.
First, in the feature extraction unit 12, the primary feature extraction unit 12A identifies the target words to which semantic tags are to be assigned, based on the language analysis result for the target sentence X and the word dictionary 21.
In this example, the words "グラス", "茶", and "出す" each have two meanings, so these three words are selected as the words to be tagged.

Next, the primary feature extraction unit 12A selects the target words one at a time as the processing target word and generates primary features for each of them.
As before, the primary feature extraction unit 12A first extracts, according to the definition in FIG. 6, the word co-occurrence relations between the processing target word and the other words in the target sentence X, and generates one primary feature for each relation. The list of these features forms the primary feature set of the processing target word. FIG. 14 shows an example of the primary features.

Next, in the feature extraction unit 12, the feature expansion unit 12B consults the word dictionary 21 to look up the semantic tags of the other target words that have a word co-occurrence relation with the processing target word, together with the semantic classes that categorize those meanings. It generates new features representing the co-occurrence of the processing target word with each semantic tag and each semantic class obtained, and appends them to the primary features. The result is a set of extended features that generalizes the primary features to broader concepts. FIG. 15 shows an example of the extended features.

FIG. 16 shows an example of a semantic tag selection model based on the maximum entropy model. In the initial state, no semantic tag has been decided, and the features of the word "グラス" with non-zero weights are CoWord(男), CoWord(茶), and CoWord(出す). Therefore
Z(c) p(グラス₁|c) = 0.1 × 1 + 0.1 × 1 + 0.2 × 1 = 0.4
Z(c) p(グラス₂|c) = 0.4 × 1 + 0.1 × 1 = 0.5
and the evaluation values (probabilities) are
p(グラス₁|c) = 0.4 / (0.4 + 0.5) = 0.44
p(グラス₂|c) = 0.5 / (0.4 + 0.5) = 0.56

Similarly, for the word "茶", the features CoWord(グラス) and CoWord(出す) give
Z(c) p(茶₁|c) = 0.8 × 1 + 0.9 × 1 = 1.7
Z(c) p(茶₂|c) = 0.2 × 1 + 0.1 × 1 = 0.3
and the evaluation values (probabilities) are
p(茶₁|c) = 1.7 / (1.7 + 0.3) = 0.85
p(茶₂|c) = 0.3 / (1.7 + 0.3) = 0.15

Similarly, for the word "出す", the features CoWord(グラス) and CoWord(茶) give
Z(c) p(出す₁|c) = 0.2 × 1 + 0 = 0.2
Z(c) p(出す₂|c) = 0.3 × 1 + 0.9 × 1 = 1.2
and the evaluation values (probabilities) are
p(出す₁|c) = 0.2 / (0.2 + 1.2) = 0.14
p(出す₂|c) = 1.2 / (0.2 + 1.2) = 0.86

The semantic tag combination search unit 13A then uses the beam search algorithm to find the semantic tags with the highest evaluation value.
In this example, in step S1 of FIG. 10, the initial node N₀ = [W₀ = {グラス, 茶, 出す}, T₀ = {}] is inserted into the initial queue Q₀. In the following step S2, "グラス", "茶", and "出す" are taken from W₀ in turn, and for each of them one node is created per candidate semantic tag. These nodes are inserted into the queue Q', giving Q' = <[{茶, 出す}, {グラス₁}], [{茶, 出す}, {グラス₂}], [{グラス, 出す}, {茶₁}], [{グラス, 出す}, {茶₂}], [{グラス, 茶}, {出す₁}], [{グラス, 茶}, {出す₂}]>.

In step S3, sorting the queue Q' in descending order of evaluation value gives Q' = <[{グラス, 茶}, {出す₂}], [{グラス, 出す}, {茶₁}], [{茶, 出す}, {グラス₂}], [{茶, 出す}, {グラス₁}], [{グラス, 出す}, {茶₂}], [{グラス, 茶}, {出す₁}]>.
With a beam width of b = 2, the new queue is Q = <[{グラス, 茶}, {出す₂}], [{グラス, 出す}, {茶₁}]>.

Continuing the calculation in the same way, in step S4 the node [{}, {グラス₁, 茶₁, 出す₂}] finally has the highest evaluation value, and this combination of semantic tags is passed to the semantic tag output unit 14. FIG. 17 shows an example of the semantic tag assignment result. The target words "グラス", "茶", and "出す" are assigned the semantic tags グラス₁, 茶₁, and 出す₂, respectively. Because "男" has only one semantic tag in the word dictionary 21, its sole semantic tag 男₁ is assigned automatically.

As described above, in the present embodiment the target sentence analysis unit 11 performs only morphological analysis, which reduces the processing load on the arithmetic processing unit 1. Because less information is available than in the first embodiment, word co-occurrence relations and semantic co-occurrence relations may be extracted less accurately. Even so, because both kinds of relation are used, semantic tags can be assigned on the basis of a wider range of relations, and the accuracy of semantic tag assignment can be improved.

[Third Embodiment]
Next, a word meaning tag assigning device according to a third embodiment of the present invention will be described with reference to FIG. 18. FIG. 18 is a block diagram showing the main part of the word meaning tag assigning device according to the third embodiment of the present invention; parts that are the same as or equivalent to those in FIG. 2 described above are given the same reference numerals.

The first embodiment described the case where the semantic tag determination unit 13 determines the semantic tag indicating the most appropriate meaning of each target word by referring to the semantic tag selection model 22 in the storage unit 2, based on the extended features of each target word extracted by the feature extraction unit 12. The present embodiment describes the case where an assignment priority order database 23 is provided in the storage unit 2 in advance, and the semantic tag determination unit 13 assigns semantic tags to the target words in an order based on the priorities in the assignment priority order database 23.

Compared with the first embodiment, the word meaning tag assigning device 10 according to the present embodiment differs in that a semantic tag assignment order determination unit 13B is added to the semantic tag determination unit 13 and the assignment priority order database 23 is added to the storage unit 2. The rest of the configuration is the same as in the first embodiment described above, and a detailed description is omitted here.

FIG. 19 shows an example of the assignment priority order database. Here, the entropy of the semantic tags of each word, measured on a specific corpus, is used as the priority data indicating the priority of each target word. In general, the lower the entropy, the less the usage of a word's semantic tags varies, and the easier it is to decide the semantic tag. In this example, therefore, semantic tags are assigned to the target words in the order (1) "グラス", (2) "茶".

When the semantic tag combination search unit 13A of the semantic tag determination unit 13 determines the optimal semantic tag for each target word in the target sentence X, based on the extended features of each target word extracted by the feature extraction unit 12, the semantic tag assignment order determination unit 13B first determines the order in which semantic tags are assigned to the target words. The semantic tag assignment order determination unit 13B searches the assignment priority order database 23 in the storage unit 2 for each target word, obtains the priority data of each target word (here, the entropy of its semantic tags), and determines the assignment order from this priority data.
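The ordering step can be sketched as an entropy calculation followed by a sort. This is a minimal sketch under assumptions: the function names are hypothetical, the logarithm base (2) is an assumed choice, and the corpus counts are made-up placeholders, since the values in FIG. 19 are not reproduced here.

```python
import math


def sense_entropy(tag_counts):
    """Shannon entropy (base 2, an assumed choice) of a word's semantic tag distribution."""
    total = sum(tag_counts.values())
    return -sum((n / total) * math.log2(n / total) for n in tag_counts.values() if n > 0)


def assignment_order(target_words, entropy_by_word):
    """Order target words from lowest to highest entropy."""
    return sorted(target_words, key=lambda w: entropy_by_word[w])


# HYPOTHETICAL corpus counts; the values in FIG. 19 are not reproduced here.
COUNTS = {
    "茶": {"茶₁": 60, "茶₂": 40},
    "グラス": {"グラス₁": 90, "グラス₂": 10},
}
ENTROPY = {word: sense_entropy(counts) for word, counts in COUNTS.items()}
order = assignment_order(["茶", "グラス"], ENTROPY)
```

With these placeholder counts, "グラス" has the lower entropy and is tagged first, which is the order the text reports for FIG. 19.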

The semantic tag combination search unit 13A selects the target words one at a time as the processing target word, in the assignment order determined by the semantic tag assignment order determination unit 13B, and for each processing target word searches for the semantic tag with the highest evaluation value using the same beam search algorithm as in the first embodiment.

As described above, in the present embodiment the storage unit 2 stores the assignment priority order database 23, which indicates the order in which semantic tags are assigned to target words. When semantic tags are to be determined for a plurality of processing target words, the semantic tag determination unit 13 searches the assignment priority order database in the storage unit to obtain the assignment order of each processing target word, and determines the semantic tags of the processing target words one after another in that order. This makes it possible to assign semantic tags with higher accuracy.

The present embodiment has been described using entropy as the priority data for the order of semantic tag assignment, but the priority data is not limited to this. The assignment order may be determined by other criteria; for example, the "word familiarity" described in Cited Document 3 may be used, with semantic tags assigned to words in ascending order of word familiarity.

FIG. 1 is a block diagram showing the configuration of the word meaning tag assigning device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the main part of the word meaning tag assigning device according to the first embodiment of the present invention.
FIG. 3 is an example of a morphological analysis result.
FIG. 4 is an example of a syntactic and semantic analysis result.
FIG. 5 is an example of a word dictionary.
FIG. 6 is an example definition of the primary features.
FIG. 7 is an example of primary features extracted according to the definition in FIG. 6.
FIG. 8 is an example of the extended features.
FIG. 9 is an example of a semantic tag selection model based on the maximum entropy model.
FIG. 10 is an example of the beam search algorithm.
FIG. 11 is an example of a semantic tag assignment result.
FIG. 12 is a block diagram showing the main part of the word meaning tag assigning device according to the second embodiment of the present invention.
FIG. 13 is another example of an analysis result.
FIG. 14 is another example of the primary features.
FIG. 15 is another example of the extended features.
FIG. 16 is another example of a semantic tag selection model based on the maximum entropy model.
FIG. 17 is an example of a semantic tag assignment result.
FIG. 18 is a block diagram showing the main part of the word meaning tag assigning device according to the third embodiment of the present invention.
FIG. 19 is an example of the assignment priority order database.

Explanation of symbols

10 ... word meaning tag assigning device, 1 ... arithmetic processing unit, 11 ... target sentence analysis unit, 11A ... morphological analysis unit, 11B ... syntactic and semantic analysis unit, 11C ... analysis result integration unit, 12 ... feature extraction unit, 12A ... primary feature extraction unit, 12B ... feature expansion unit, 13 ... semantic tag determination unit, 13A ... semantic tag combination search unit, 13B ... semantic tag assignment order determination unit, 14 ... semantic tag output unit, 2 ... storage unit, 20 ... program, 21 ... word dictionary, 22 ... semantic tag selection model, 23 ... assignment priority order database, 3 ... input/output I/F unit, 4 ... communication I/F unit, 5 ... operation input unit, 6 ... screen display unit, X ... target sentence, Y ... output sentence, M ... recording medium.

Claims

A word meaning tag assigning device that assigns, to each target word included in a target sentence composed of natural language data, a semantic tag indicating the meaning of that target word, the device comprising:
a storage unit that stores a word dictionary containing, for each meaning of a word, a pair consisting of a semantic tag indicating the meaning and a semantic class indicating the concept of the meaning;
a feature amount extraction unit that, for a processing target word selected from the target words, generates a primary feature amount indicating a word co-occurrence relationship between the processing target word and another word included in the target sentence, generates a semantic co-occurrence feature amount indicating a semantic co-occurrence relationship between the processing target word and the semantic tag and semantic class obtained by searching the word dictionary of the storage unit for the other word included in the target sentence, and generates an extended feature amount of the processing target word by adding the semantic co-occurrence feature amount to the primary feature amount of the processing target word; and
a semantic tag determination unit that, based on the extended feature amount of the processing target word, determines, from among the semantic tags of the processing target word listed in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence,
wherein the storage unit stores weights, each assigned to a pair of an extended feature amount of the processing target word and a semantic tag, that are used when the probability of each semantic tag being selected is obtained by the maximum entropy method, and
the semantic tag determination unit multiplies the extended feature amount of the processing target word by the weight assigned, for each semantic tag, to that extended feature amount, thereby calculates by the maximum entropy method a probability for each combination of the processing target word and a semantic tag, and determines the semantic tag having the highest probability among the semantic tags corresponding to the processing target word as the semantic tag of the processing target word.
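
The determination step recited above can be illustrated with a minimal sketch. It assumes binary extended feature amounts (a feature either fires or it does not), so that multiplying each feature by its weight reduces to summing the weights of the features that fire; the function name `select_semantic_tag`, the feature strings, the tag names, and the weight values are all hypothetical and do not come from the patent.

```python
import math

def select_semantic_tag(features, candidate_tags, weights):
    """Return (best_tag, probabilities) under a maximum entropy model.

    features       -- extended feature amounts that fire for the target word
    candidate_tags -- semantic tags listed for the word in the word dictionary
    weights        -- dict mapping (feature, tag) pairs to learned weights
    """
    # Score each tag: binary feature value (1) times its weight, summed.
    scores = {
        tag: sum(weights.get((f, tag), 0.0) for f in features)
        for tag in candidate_tags
    }
    # Normalise exp(score) over the candidate tags; subtracting the
    # maximum score first keeps the exponentials numerically stable.
    max_score = max(scores.values())
    exp_scores = {t: math.exp(s - max_score) for t, s in scores.items()}
    z = sum(exp_scores.values())
    probs = {t: v / z for t, v in exp_scores.items()}
    # The tag with the highest probability is assigned to the word.
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical example: two senses of "bank" next to the word "river".
example_weights = {
    ("cooc_word=river", "bank_shore"): 1.2,
    ("cooc_class=water", "bank_shore"): 0.8,
    ("cooc_word=river", "bank_finance"): -0.3,
}
best_tag, tag_probs = select_semantic_tag(
    ["cooc_word=river", "cooc_class=water"],
    ["bank_finance", "bank_shore"],
    example_weights,
)
```

Pairs missing from the weight table contribute nothing to the score, which is one simple way to handle feature and tag combinations that were never seen in training.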

The word meaning tag assigning device according to claim 1,
wherein the feature amount extraction unit includes:
a primary feature amount extraction unit that generates, for each word co-occurrence relationship between the processing target word and another word included in the target sentence, a primary feature amount indicating that word co-occurrence relationship; and
a feature amount expansion unit that searches the word dictionary of the storage unit for each of the other words included in the target sentence, generates, for each semantic tag and semantic class thus obtained, a semantic co-occurrence feature amount indicating its semantic co-occurrence relationship with the processing target word, and generates the extended feature amount of the processing target word from these semantic co-occurrence feature amounts and the primary feature amount of the processing target word.
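
As a rough illustration of the two sub-units recited above, the following sketch builds primary feature amounts from word co-occurrence and then expands them with the semantic tags and semantic classes found in a word dictionary. The dictionary layout (word mapped to a list of tag and class pairs), the feature string format, and the function name `extract_extended_features` are assumptions made for this example only.

```python
def extract_extended_features(target_index, sentence_words, word_dictionary):
    """Return the extended feature amounts for the word at target_index.

    word_dictionary maps a word to a list of (semantic_tag, semantic_class)
    pairs, one pair per meaning of the word (hypothetical layout).
    """
    primary = []   # word co-occurrence features
    semantic = []  # semantic co-occurrence features
    for i, word in enumerate(sentence_words):
        if i == target_index:
            continue  # only *other* words in the sentence contribute
        # Primary feature amount: the other word itself co-occurs.
        primary.append(f"cooc_word={word}")
        # Expansion: every tag and class the dictionary lists for that word.
        for tag, sem_class in word_dictionary.get(word, []):
            semantic.append(f"cooc_tag={tag}")
            semantic.append(f"cooc_class={sem_class}")
    # Extended feature amount = primary + semantic, duplicates removed
    # while keeping first-seen order.
    return list(dict.fromkeys(primary + semantic))

# Hypothetical dictionary and sentence.
toy_dictionary = {
    "river": [("river_1", "water")],
    "bank": [("bank_1", "institution"), ("bank_2", "land")],
}
extended = extract_extended_features(1, ["river", "bank"], toy_dictionary)
```

Because the semantic class of a neighbouring word is shared by many different words, features of this kind can still fire when the exact neighbouring word was not observed during training.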

The word meaning tag assigning device according to claim 1,
wherein the storage unit stores an assignment order database indicating the order in which semantic tags are assigned to the target words, and
when determining semantic tags for a plurality of processing target words, the semantic tag determination unit obtains the processing order of the processing target words by searching the assignment order database of the storage unit, and determines a semantic tag for each processing target word sequentially in that processing order.
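
The ordered determination recited above might be sketched as follows. The claim does not state what the assignment order database is keyed on, so this example simply assumes a mapping from a word to a priority rank (lower rank is processed first); `tag_in_priority_order`, `priority_db`, and the `decide_tag` callback are hypothetical names.

```python
def tag_in_priority_order(target_words, priority_db, decide_tag):
    """Assign tags to target_words in the order given by priority_db.

    priority_db -- dict mapping a word to its rank (assumed layout)
    decide_tag  -- callable(word, already_assigned) -> semantic tag
    """
    # Words missing from the database are processed last; sorted() is
    # stable, so such words keep their original sentence order.
    fallback_rank = len(priority_db)
    order = sorted(target_words, key=lambda w: priority_db.get(w, fallback_rank))
    assigned = {}
    for word in order:
        # Tags decided earlier are passed along so that later decisions
        # can take them into account.
        assigned[word] = decide_tag(word, assigned)
    return order, assigned

# Hypothetical usage: the verb is tagged before its arguments.
toy_priority = {"eat": 0, "apple": 1}
order, tags = tag_in_priority_order(
    ["apple", "eat", "fresh"],
    toy_priority,
    lambda word, assigned: f"{word}_1",
)
```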

The word meaning tag assigning device according to claim 1,
further comprising a target sentence input unit that includes: a morphological analysis unit that performs morphological analysis on the target sentence; a syntactic-semantic analysis unit that performs syntactic-semantic analysis on the target sentence; and an analysis result integration unit that integrates the various analysis results of the morphological analysis unit and the syntactic-semantic analysis unit for each identical analysis target and outputs them, together with the target sentence, to the primary feature amount extraction unit.

A word meaning tag assigning method for assigning, to each target word included in a target sentence composed of natural language data, a semantic tag indicating the meaning of that target word, the method comprising:
a storing step of storing, in a storage unit, a word dictionary containing, for each meaning of a word, a pair consisting of a semantic tag indicating the meaning and a semantic class indicating the classification of the meaning;
a feature amount extraction step in which a feature amount extraction unit, for a processing target word selected from the target words, generates a primary feature amount indicating a word co-occurrence relationship between the processing target word and another word included in the target sentence, generates a semantic co-occurrence feature amount indicating a semantic co-occurrence relationship between the processing target word and the semantic tag and semantic class obtained by searching the word dictionary of the storage unit for the other word included in the target sentence, and generates an extended feature amount of the processing target word by adding the semantic co-occurrence feature amount to the primary feature amount of the processing target word; and
a semantic tag determination step in which a semantic tag determination unit, based on the extended feature amount of the processing target word, determines, from among the semantic tags of the processing target word listed in the word dictionary, the semantic tag indicating the most appropriate meaning of the processing target word as used in the target sentence,
wherein the storing step stores weights, each assigned to a pair of an extended feature amount of the processing target word and a semantic tag, that are used when the probability of each semantic tag being selected is obtained by the maximum entropy method, and
the semantic tag determination step multiplies the extended feature amount of the processing target word by the weight assigned, for each semantic tag, to that extended feature amount, thereby calculates by the maximum entropy method a probability for each combination of the processing target word and a semantic tag, and determines the semantic tag having the highest probability among the semantic tags corresponding to the processing target word as the semantic tag of the processing target word.

A program for causing a computer to execute each step of the word meaning tag assigning method according to claim 5.

A recording medium on which the program according to claim 6 is recorded.