JP2010129025A

JP2010129025A - Word relation determining device and program

Info

Publication number: JP2010129025A
Application number: JP2008305972A
Authority: JP
Inventors: Hiroshi Masuichi (増市 博)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2008-12-01
Filing date: 2008-12-01
Publication date: 2010-06-10
Anticipated expiration: 2028-12-01
Also published as: JP5277914B2

Abstract

PROBLEM TO BE SOLVED: To determine hypernym/hyponym (superordinate/subordinate) relations between verbs with high accuracy.

SOLUTION: A word relation determining device 10 sets a first verb and a second verb as determination targets. From a plurality of sentences, it obtains a first noun group of one or more nouns that satisfy a given condition with respect to the first verb, and a second noun group of one or more nouns that satisfy the same condition with respect to the second verb. It then determines whether the first and second verbs are in a hypernym/hyponym relation by comparing the dispersion of the first noun group with the dispersion of the second noun group in a noun thesaurus, which represents a plurality of nouns as a tree structure based on the hypernym/hyponym relations of the concepts to which the nouns belong.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a word relation determining device and a program.

A thesaurus, which describes hierarchical relations between words based on their concepts (meanings), is valuable linguistic information: it plays an important role in supplementing and expanding input keywords for text retrieval and in normalizing keywords for document classification.

Thesauri are generally built by hand, but building one demands a high degree of consistency and coherence, which makes manual construction very laborious. Techniques for constructing thesauri automatically have therefore been sought.

To this end, Patent Document 1 below proposes constructing a thesaurus automatically by calculating the distance between two words from the frequencies of the words that appear near each of them, and then clustering words based on the calculated distances.

Patent Document 1: JP 2000-231572 A

When a thesaurus is built for verbs, however, the words that appear near two verbs in a hypernym/hyponym relation often have very similar frequency distributions. Determining the hypernym/hyponym relation between verbs from the frequency distributions of nearby words may therefore be inaccurate.

One object of the present invention is to provide a word relation determining device and a program that can accurately determine hypernym/hyponym relations between verbs.

To achieve the above object, the word relation determining device according to claim 1 includes: setting means for setting a first verb and a second verb as determination targets; first noun group acquisition means for acquiring, from a plurality of sentences, a first noun group including one or more nouns that satisfy a given condition with respect to the first verb; second noun group acquisition means for acquiring, from the plurality of sentences, a second noun group including one or more nouns that satisfy the given condition with respect to the second verb; and determination means for determining whether the first verb and the second verb are in a hypernym/hyponym relation, based on a comparison between the dispersion of the first noun group and the dispersion of the second noun group in a noun thesaurus that represents a plurality of nouns as a tree structure based on the hypernym/hyponym relations of the concepts of the nouns.

According to claim 2, in the device of claim 1, when the dispersion of the first noun group and the dispersion of the second noun group differ by at least a predetermined amount, the determination means determines that the verb associated with the larger dispersion is the superordinate concept and the other verb is the subordinate concept.

According to claim 3, in the device of claim 2, the given condition includes a plurality of conditions, and when the dispersions of the first and second noun groups acquired under any one of those conditions differ by at least a predetermined amount, the determination means determines that the verb associated with the larger dispersion is superordinate and the other verb is subordinate.

According to claim 4, in the device of any one of claims 1 to 3, the given condition includes at least one of the following: that the noun is in a dependency relation with the verb set as a target by the setting means; that the noun appears in the vicinity of the target verb; and that the noun is contained in a phrase that modifies the target verb via an expression representing a causal relation.

According to claim 5, in the device of any one of claims 1 to 4, the dispersion of a noun group in the noun thesaurus is calculated from the number of hops in the noun thesaurus between the nouns of each pair of nouns contained in the noun group.

According to claim 6, the device of any one of claims 1 to 5 further includes similarity determination means for determining, based on the first and second noun groups, whether the first verb and the second verb are similar; the determination means determines whether the first and second verbs are in a hypernym/hyponym relation when the similarity determination means has determined that they are similar.

According to claim 7, in the device of claim 6, the similarity determination means converts each noun contained in the first and second noun groups into a predetermined reference concept in the noun thesaurus, and determines whether the first and second verbs are similar based on the frequencies of the reference concepts contained in the converted first and second noun groups.

According to claim 8, in the device of claim 6 or 7, a first verb and a second verb that the similarity determination means has determined to be similar, and that the determination means has determined not to be in a hypernym/hyponym relation, are determined to be synonyms.

The word relation determining device according to claim 9 includes: setting means for setting a first verb and a second verb as determination targets; first noun group acquisition means for acquiring, from a plurality of sentences, a first noun group including one or more nouns contained in phrases that modify the first verb via an expression representing a causal relation; second noun group acquisition means for acquiring, from the plurality of sentences, a second noun group including one or more nouns contained in phrases that modify the second verb via an expression representing a causal relation; and determination means for determining whether the first verb and the second verb are in a hypernym/hyponym relation based on a comparison between the dispersion of the first noun group and the dispersion of the second noun group.

The program according to claim 10 causes a computer to execute: a setting step of setting a first verb and a second verb as determination targets; a first noun group acquisition step of acquiring, from a plurality of sentences, a first noun group including one or more nouns that satisfy a given condition with respect to the first verb; a second noun group acquisition step of acquiring, from the plurality of sentences, a second noun group including one or more nouns that satisfy the given condition with respect to the second verb; and a determination step of determining whether the first verb and the second verb are in a hypernym/hyponym relation, based on a comparison between the dispersion of the first noun group and the dispersion of the second noun group in a noun thesaurus that represents a plurality of nouns as a tree structure based on the hypernym/hyponym relations of the concepts of the nouns.

The program according to claim 11 causes a computer to execute: a setting step of setting a first verb and a second verb as determination targets; a first noun group acquisition step of acquiring, from a plurality of sentences, a first noun group including one or more nouns contained in phrases that modify the first verb via an expression representing a causal relation; a second noun group acquisition step of acquiring, from the plurality of sentences, a second noun group including one or more nouns contained in phrases that modify the second verb via an expression representing a causal relation; and a determination step of determining whether the first verb and the second verb are in a hypernym/hyponym relation based on a comparison between the dispersion of the first noun group and the dispersion of the second noun group.

According to the inventions of claims 1 and 10, the hypernym/hyponym relation between the verbs can be determined accurately by using the conceptual distribution of the noun group acquired for each of the first and second verbs.

According to claim 2, the verb whose noun group is more widely distributed over the concepts can be determined to be superordinate, and the other verb subordinate.

According to claim 3, even when the hypernym/hyponym relation cannot be determined under one condition, it can be determined under another condition.

According to claim 4, a noun group closely related to each verb can be acquired.

According to claim 5, the dispersion among nouns can be calculated by converting it into an objective distance in the noun thesaurus.

According to claim 6, verbs that are similar but whose extracted noun groups have different conceptual distributions can be determined to be in a hypernym/hyponym relation.

According to claim 7, the frequency analysis of the noun groups obtained for the verbs can be performed per noun concept.

According to claim 8, synonyms can be identified.

According to claims 9 and 11, the hypernym/hyponym relation between verbs can be determined from noun groups that are causally related to the verbs.

Preferred embodiments for carrying out the present invention (hereinafter, embodiments) are described below with reference to the drawings.

FIG. 1 is a functional block diagram of a word relation determining device 10 according to the present embodiment. As shown in FIG. 1, the word relation determining device 10 includes a text data storage unit 12, a noun thesaurus storage unit 14, a determination target verb setting unit 16, a dependency noun extraction unit 18, a similarity determination unit 20, a first determination unit 22, a causal noun extraction unit 24, and a second determination unit 26. The functions of these units may be realized by a computer that includes control means such as a CPU, storage means such as memory, and input/output means for exchanging data with external devices, by reading and executing a program stored in a computer-readable information storage medium. The program may be supplied to the word relation determining device 10 (the computer) on an information storage medium or via a data communication network such as the Internet.

The text data storage unit 12 includes a storage device such as a magnetic disk and stores one or more pieces of document information, each containing one or more sentences. The document information may take the form of a data file containing character string data.

The noun thesaurus storage unit 14 stores a noun thesaurus built for nouns. A noun thesaurus is information represented as a tree structure in which nouns are arranged hierarchically according to their hypernym/hyponym relations. FIG. 2 shows an example of a noun thesaurus.

As shown in FIG. 2, the noun thesaurus is a tree structure in which, from the root downward, each superordinate concept is linked to its subordinate concepts. In this embodiment, some levels of the noun thesaurus are selected as reference concepts. Reference concepts may be selected individually for each concept, or according to their depth in the hierarchy from the root. Processing that uses the reference concepts is described in detail later.
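The text leaves open how reference concepts are chosen. As a minimal sketch of the depth-based option, assuming the thesaurus is stored as a child-to-parent dictionary (a representation the patent does not specify), the nodes at a fixed depth below the root can be selected as follows. `PARENT`, `depth_of`, and `reference_concepts_at_depth` are hypothetical names, and the toy tree only mirrors the bulldog/dog/cat/animal relations of the FIG. 2 example.

```python
# Hypothetical toy thesaurus (child -> parent). Only the bulldog/dog/cat/animal
# relations come from the FIG. 2 example; the "entity" root is an assumption.
PARENT = {
    "bulldog": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
}


def depth_of(node, parent):
    """Number of links between `node` and the root of the tree."""
    depth = 0
    while node in parent:
        node = parent[node]
        depth += 1
    return depth


def reference_concepts_at_depth(parent, depth):
    """Select every concept lying exactly `depth` links below the root."""
    nodes = set(parent) | set(parent.values())
    return {n for n in nodes if depth_of(n, parent) == depth}
```

Choosing depth 1 in this toy tree makes "animal" the only reference concept; selecting concepts individually would instead mean maintaining the reference set by hand.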

The determination target verb setting unit 16 sets a pair of verbs whose mutual relation is to be determined. Hereinafter, the two verbs set by the determination target verb setting unit 16 are referred to as verb A and verb B. The unit may set the target verbs from user input, or it may select a pair of verbs from a previously prepared list of verbs and set the selected verbs as the targets.

The dependency noun extraction unit 18 extracts, for each verb of the pair set by the determination target verb setting unit 16, the nouns that are in a dependency relation with that verb. Specifically, it searches the text data stored in the text data storage unit 12 for sentences containing the set verb (verb A or verb B), performs morphological and syntactic analysis on the retrieved sentences, and identifies the phrases (bunsetsu) that modify the set verb. When an identified phrase ends with one of the case particles 「が」 (ga), 「を」 (wo), 「で」 (de), or 「に」 (ni), the unit extracts the noun preceding that case particle. It records the extracted nouns, together with their frequencies of occurrence, separately for each case particle. This processing is performed for both verb A and verb B.
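A minimal sketch of the per-particle counting step, assuming morphological and dependency analysis have already been done by an external parser and that each sentence has been reduced to hypothetical `(noun, particle, head_verb)` triples. `extract_dependent_nouns` and this input format are illustrative only; particles are romanized for readability.

```python
from collections import Counter, defaultdict

# Case particles used as the "given condition" in this embodiment (romanized).
CASE_PARTICLES = ("ga", "wo", "de", "ni")


def extract_dependent_nouns(parsed_sentences, verb):
    """Count, per case particle, the nouns whose phrase modifies `verb`.

    Assumption: each sentence is a list of (noun, particle, head_verb)
    triples produced by some external morphological/dependency parser.
    """
    groups = defaultdict(Counter)  # particle -> Counter(noun -> frequency)
    for sentence in parsed_sentences:
        for noun, particle, head in sentence:
            if head == verb and particle in CASE_PARTICLES:
                groups[particle][noun] += 1
    return groups
```

Running it once for verb A and once for verb B yields the per-particle noun groups with frequencies that the later steps consume.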

The similarity determination unit 20 determines whether verb A and verb B are similar; here, "similar" means that verb A and verb B are near-synonyms. In this embodiment, the similarity determination unit 20 converts every noun in the noun group extracted by the dependency noun extraction unit 18 for the target verb (verb A or verb B) into the reference concept located above it in the noun thesaurus stored in the noun thesaurus storage unit 14. For example, in the noun thesaurus of FIG. 2, the word "bulldog" is converted into "animal", because the reference concept "animal" lies above it. The similarity determination unit 20 performs this conversion on the noun groups obtained for both verb A and verb B.
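The conversion to reference concepts amounts to walking up the tree from each noun until a reference concept is reached. The sketch below assumes a child-to-parent dictionary and a hand-picked set of reference concepts. Apart from bulldog/dog/cat/animal, the entries (`entity`, `fruit`, `food`, `apple`) are invented for illustration, and dropping nouns that have no reference concept above them is an assumption the patent does not address.

```python
from collections import Counter

# Hypothetical toy thesaurus (child -> parent); only bulldog/dog/cat/animal
# follow the FIG. 2 example, the other entries are illustrative.
PARENT = {
    "bulldog": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "apple": "fruit",
    "fruit": "food",
    "food": "entity",
}
REFERENCE_CONCEPTS = {"animal", "food"}


def to_reference_concept(noun, parent=PARENT, reference=REFERENCE_CONCEPTS):
    """Return the nearest reference concept at or above `noun`, or None."""
    node = noun
    while node is not None:
        if node in reference:
            return node
        node = parent.get(node)
    return None  # noun lies above every reference concept, or is unknown


def convert_group(noun_counts):
    """Map a {noun: frequency} group to a {reference concept: frequency} group."""
    converted = Counter()
    for noun, freq in noun_counts.items():
        concept = to_reference_concept(noun)
        if concept is not None:
            converted[concept] += freq
    return converted
```

Frequencies are summed when several nouns map to the same reference concept, which is what the later ratio computation needs.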

Based on the noun groups converted into reference concepts by the above processing (hereinafter, reference concept noun groups), the similarity determination unit 20 generates a feature vector for each of verb A and verb B. The feature vector generation processing in this embodiment is described below.

First, let $G_{A1}$, $G_{A2}$, $G_{A3}$, and $G_{A4}$ be the reference concept noun groups extracted for verb A with the case particles 「が」 (ga), 「を」 (wo), 「で」 (de), and 「に」 (ni), respectively. Write $G_{A1} = \{w_i \mid i = 1, \dots, n\}$, where $n$ is the number of distinct words in $G_{A1}$. If $N_{A1}(w_i)$ is the frequency of $w_i$ and $N_{A1}$ is the number of elements of $G_{A1}$, the occurrence ratio of $w_i$ is

$R_{A1}(w_i) = N_{A1}(w_i) / N_{A1}$,

and the feature vector for $G_{A1}$ is $(R_{A1}(w_1), R_{A1}(w_2), \dots, R_{A1}(w_n))$. Feature vectors for $G_{A2}$, $G_{A3}$, and $G_{A4}$ are generated in the same way, and concatenating the four gives the feature vector of verb A:

$R_A = (R_{A1}(w_1), \dots, R_{A1}(w_n), R_{A2}(w_1), \dots, R_{A2}(w_n), R_{A3}(w_1), \dots, R_{A3}(w_n), R_{A4}(w_1), \dots, R_{A4}(w_n))$.

The feature vector $R_B$ of verb B is generated in the same way. For the inner product used below to compare like with like, the components of $R_A$ and $R_B$ must be indexed by the same reference concepts $w_1, \dots, w_n$. $R_A$ and $R_B$ are each normalized to unit length, and hereafter $R_A$ and $R_B$ denote the normalized vectors.
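The ratio-and-concatenate construction can be sketched as follows, assuming each particle's reference concept noun group is given as a `{concept: frequency}` dictionary and that a shared, ordered concept list (the hypothetical `concepts` parameter) fixes the component order for both verbs. Filling an empty particle block with zeros is an added assumption.

```python
import math

PARTICLES = ("ga", "wo", "de", "ni")  # order of the four blocks in R_A


def feature_vector(groups, concepts):
    """Build a unit-length feature vector such as R_A.

    groups:   particle -> {reference concept: frequency}
    concepts: shared, ordered list of reference concepts (w_1 ... w_n)
    """
    vec = []
    for p in PARTICLES:
        counts = groups.get(p, {})
        total = sum(counts.values())  # N_Ak: number of elements of G_Ak
        for c in concepts:
            # occurrence ratio R_Ak(w_i) = N_Ak(w_i) / N_Ak (0 for empty blocks)
            vec.append(counts.get(c, 0) / total if total else 0.0)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec
```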

The similarity determination unit 20 calculates the inner product of the feature vectors $R_A$ and $R_B$ generated for verb A and verb B, and determines from its magnitude whether the two verbs are similar. Specifically, the unit determines that verb A and verb B are similar when the inner product $R_A \cdot R_B$ is larger than a threshold $T_1$ ($0 < T_1 < 1$). The threshold $T_1$ may be set in advance.
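Because $R_A$ and $R_B$ are unit vectors, their inner product equals their cosine similarity. A sketch of the threshold test follows; the function name `judge_similarity` is hypothetical, and the caller supplies $T_1$, whose actual value the text does not give.

```python
def judge_similarity(ra, rb, t1):
    """Return (is_similar, inner_product) for unit-length vectors R_A, R_B."""
    if not 0 < t1 < 1:
        raise ValueError("T1 must satisfy 0 < T1 < 1")
    if len(ra) != len(rb):
        raise ValueError("feature vectors must have the same length")
    score = sum(a * b for a, b in zip(ra, rb))  # R_A . R_B
    return score > t1, score
```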

The first determination unit 22 determines whether verb A and verb B, set as the determination targets, are in a hypernym/hyponym relation. It makes this determination when the similarity determination unit 20 has determined that verb A and verb B are similar. The first determination unit 22 first calculates, for each of verb A and verb B, the distribution distance of the noun groups extracted by the dependency noun extraction unit 18 on the noun thesaurus stored in the noun thesaurus storage unit 14, and then determines the hypernym/hyponym relation between verb A and verb B from the calculated distribution distances. The specific processing of the first determination unit 22 in this embodiment is described below.

First, the first determination unit 22 takes a pair of nouns from the set $G_{A1}$ extracted for the case particle 「が」 (ga) and obtains the distance between them on the noun thesaurus. In this embodiment, this distance is the number of hops needed to go from one noun to the other on the noun thesaurus, that is, the number of links on the path between them: the hop count is 1 when the two nouns are directly linked and 2 when they are connected via one other noun. For example, in the noun thesaurus of FIG. 2, "bulldog" and "dog" are 1 hop apart, and "bulldog" and "cat" are 3 hops apart. The first determination unit 22 obtains the hop counts for all pairs of nouns in $G_{A1}$ and takes their average as the distance $M_{A1}$ of $G_{A1}$ on the noun thesaurus. It likewise calculates the distances $M_{A2}$ to $M_{A4}$ for the other sets $G_{A2}$ to $G_{A4}$, obtaining the vector $M_A = (M_{A1}, M_{A2}, M_{A3}, M_{A4})$ that represents the distribution of verb A on the noun thesaurus. The same processing for verb B yields the vector $M_B$.
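In a tree, the hop count between two nouns is the number of links on the path through their lowest common ancestor. The sketch below computes it that way, averages it over all pairs of distinct nouns in each particle's group, and assembles the four averages into a vector like $M_A$. Treating each group as a set of distinct nouns (ignoring frequencies) and returning 0 for groups with fewer than two nouns are assumptions; the hypothetical toy tree reproduces the FIG. 2 hop counts (bulldog-dog = 1, bulldog-cat = 3).

```python
from itertools import combinations

PARTICLES = ("ga", "wo", "de", "ni")
# Hypothetical toy thesaurus (child -> parent) modelled on FIG. 2.
PARENT = {"bulldog": "dog", "dog": "animal", "cat": "animal"}


def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def hop_count(a, b, parent=PARENT):
    """Links on the path between a and b via their lowest common ancestor."""
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for i, n in enumerate(path_to_root(a, parent)):
        if n in depth_in_b:
            return i + depth_in_b[n]
    raise ValueError(f"{a!r} and {b!r} are not in the same tree")


def mean_pairwise_hops(nouns, parent=PARENT):
    """Average hop count over all pairs of distinct nouns (e.g. M_A1)."""
    pairs = list(combinations(sorted(set(nouns)), 2))
    if not pairs:
        return 0.0
    return sum(hop_count(a, b, parent) for a, b in pairs) / len(pairs)


def distribution_vector(groups, parent=PARENT):
    """(M_A1, ..., M_A4) from particle -> iterable of nouns."""
    return [mean_pairwise_hops(groups.get(p, ()), parent) for p in PARTICLES]
```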

Next, the first determination unit 22 calculates the distance $\lVert M_A - M_B \rVert$ between the vectors $M_A$ and $M_B$. If this value is larger than a predetermined threshold $T_2$ ($T_2 > 0$), it determines that verb A and verb B are in a hypernym/hyponym relation; otherwise, it determines that the relation is unknown. When a hypernym/hyponym relation is found by this criterion, the verb whose vector has the larger norm is determined to be the superordinate concept, and the verb whose vector has the smaller norm the subordinate concept. The rationale is that a superordinate word is used more broadly than a subordinate word, so the nouns associated with it are expected to be more widely distributed.
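A sketch of the first determination, assuming $\lVert\cdot\rVert$ is the Euclidean norm (the text does not say which norm is meant). The hypothetical function returns `"A"` or `"B"` for the verb judged superordinate, or `None` when the relation is unknown; treating equal norms as unknown is an added assumption.

```python
import math


def first_determination(m_a, m_b, t2):
    """Return "A" or "B" (the verb judged superordinate) or None (unknown).

    m_a, m_b: per-particle mean hop distances, e.g. (M_A1, ..., M_A4).
    """
    if t2 <= 0:
        raise ValueError("T2 must be positive")
    if math.dist(m_a, m_b) <= t2:   # ||M_A - M_B|| does not exceed T2
        return None                 # relation unknown: go on to the causal check
    norm_a, norm_b = math.hypot(*m_a), math.hypot(*m_b)
    if norm_a == norm_b:
        return None                 # assumption: ties are left undecided
    return "A" if norm_a > norm_b else "B"  # wider spread => superordinate
```

`math.dist` and multi-argument `math.hypot` require Python 3.8 or later.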

When the first determination unit 22 has determined that the hypernym/hyponym relation between verb A and verb B is unknown, the causal noun extraction unit 24 extracts sentences containing verb A or verb B from the text data storage unit 12 and, among them, searches for sentences in which an expression of causation such as 「〜ので」 (because), 「〜ために」 (because of), or 「〜という理由で」 (for the reason that) appears before verb A or verb B. In each retrieved sentence, it extracts the nouns contained in the phrases that appear before the causal expression, obtaining a noun group (causal noun group). The causal noun extraction unit 24 extracts a causal noun group for each of verb A and verb B; the set of causal nouns extracted for verb A is denoted $H_A$, and that for verb B is denoted $H_B$.
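A simplified sketch of causal noun extraction, assuming each sentence is already tokenized into hypothetical `(surface, pos)` pairs in which each causal expression is a single token. As a further simplification, it collects every noun before the causal expression nearest to the target verb, rather than identifying phrase (bunsetsu) boundaries as the text describes.

```python
from collections import Counter

# Causal expressions named in the text, assumed here to be single tokens.
CAUSAL_MARKERS = ("ので", "ために", "という理由で")


def extract_causal_nouns(tagged_sentences, verb):
    """Count nouns preceding a causal expression that precedes `verb`."""
    nouns = Counter()
    for tokens in tagged_sentences:
        surfaces = [surface for surface, _ in tokens]
        if verb not in surfaces:
            continue
        v = surfaces.index(verb)
        markers = [i for i in range(v) if surfaces[i] in CAUSAL_MARKERS]
        if not markers:
            continue  # no causal expression before the verb
        m = markers[-1]  # causal expression closest to the verb
        nouns.update(s for s, pos in tokens[:m] if pos == "noun")
    return nouns
```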

The second determination unit 26 determines the hypernym/hyponym relation between verb A and verb B again, this time based on the causal noun groups extracted for each verb by the causal noun extraction unit 24. The specific determination processing performed by the second determination unit 26 is described below.

The second determination unit 26 calculates the distribution distance, on the noun thesaurus, of the set $H_A$ of causal nouns extracted for verb A: it obtains the hop counts on the noun thesaurus for all pairs of nouns in $H_A$ and takes their average as the distribution distance $L_A$. It calculates the distribution distance $L_B$ of the set $H_B$ extracted for verb B in the same way. It then computes the difference between $L_A$ and $L_B$. If the difference is larger than a predetermined threshold $T_3$ ($T_3 > 0$), it determines that verb A and verb B are in a hypernym/hyponym relation; otherwise, it determines that verb A and verb B are synonyms. When a hypernym/hyponym relation is found by this criterion, the verb with the larger distribution distance is determined to be the superordinate concept and the verb with the smaller distribution distance the subordinate concept.
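The second determination can be sketched in the same style, assuming $L_A$ and $L_B$ are mean pairwise hop counts over distinct nouns and that "the difference" means the absolute difference $|L_A - L_B|$. The toy thesaurus and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy thesaurus (child -> parent) modelled on FIG. 2.
PARENT = {"bulldog": "dog", "dog": "animal", "cat": "animal", "animal": "entity"}


def _path_to_root(node, parent):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def _hops(a, b, parent):
    """Hop count between a and b via their lowest common ancestor."""
    depth_b = {n: i for i, n in enumerate(_path_to_root(b, parent))}
    for i, n in enumerate(_path_to_root(a, parent)):
        if n in depth_b:
            return i + depth_b[n]
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def distribution_distance(nouns, parent=PARENT):
    """Mean pairwise hop count over the distinct nouns of a causal noun group."""
    pairs = list(combinations(sorted(set(nouns)), 2))
    if not pairs:
        return 0.0
    return sum(_hops(a, b, parent) for a, b in pairs) / len(pairs)


def second_determination(h_a, h_b, t3, parent=PARENT):
    """Return "A" or "B" for the superordinate verb, or "synonyms"."""
    l_a = distribution_distance(h_a, parent)  # L_A
    l_b = distribution_distance(h_b, parent)  # L_B
    if abs(l_a - l_b) > t3:
        return "A" if l_a > l_b else "B"
    return "synonyms"
```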

The word relation determining device 10 determines the hypernym/hyponym relation for the target pair of verbs from the three determination results of the similarity determination unit 20, the first determination unit 22, and the second determination unit 26 described above. By repeating the same determination processing for other pairs of verbs, the device obtains the hypernym/hyponym relations among a plurality of verbs, from which a verb thesaurus is constructed.

Next, the flow of the processing by which the word relation determining device 10 determines the hypernym/hyponym relation between verbs is described with reference to the flowchart of FIG. 3.

As shown in FIG. 3, the word relation determining device 10 receives as input a pair of verbs to be determined (S101) and extracts, from the stored text data, the group of nouns in a dependency relation with each verb of the pair (S102). Next, it converts each noun in the extracted noun groups into a reference concept of the noun thesaurus (S103), generates a feature vector for each verb from the converted reference concepts, and determines whether the target verbs are similar (S104).

When the verbs are determined to be similar (S104: Y), the word relationship determination device 10 calculates, for each verb, the distribution distance of its extracted noun group in the noun thesaurus (S105). The device 10 then determines whether the difference between the calculated distribution distances is equal to or greater than a threshold (S106). If so (S106: Y), the verb whose distribution distance is larger is determined to be superordinate and the other subordinate (S107). If the difference is less than the threshold (S106: N), the device 10 extracts, for each verb under determination, a causal-relation noun group consisting of nouns linked to the verb by an expression of causality (S108), and determines whether the difference between the distribution distances of the extracted causal-relation noun groups in the noun thesaurus is equal to or greater than a threshold (S109). If so (S109: Y), the verb with the larger distribution distance is determined to be superordinate and the other subordinate (S107). Otherwise (S109: N), the verbs are determined to be synonyms (S110). If the verbs are determined not to be similar in S104 (S104: N), the pair is determined not to be in a superordinate-subordinate relationship (S111), and the process ends.
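The branching of FIG. 3 (S104 to S111) can be summarized as a small decision function. This is a sketch under stated assumptions: the similarity score and the two distance pairs are supplied by the caller (for example, from helpers like those sketched elsewhere), the threshold names `t_sim`, `t_dep`, and `t_causal` are hypothetical, and the distance pairs are passed as zero-argument callables so that causal nouns are only computed when S106 fails to decide, as in the flowchart.

```python
def classify_pair(verb_a, verb_b, similarity, t_sim,
                  dependency_distances, t_dep,
                  causal_distances, t_causal):
    """Decision flow of FIG. 3 (S104-S111), sketched.

    dependency_distances and causal_distances are zero-argument callables
    returning (L_A, L_B); they are evaluated lazily, mirroring the flowchart,
    where causal-relation nouns are extracted (S108) only if S106 is undecided.
    """
    if similarity < t_sim:                                   # S104: N
        return ("no superordinate-subordinate relation", verb_a, verb_b)  # S111
    for distances, threshold in ((dependency_distances, t_dep),    # S105-S106
                                 (causal_distances, t_causal)):    # S108-S109
        la, lb = distances()
        if abs(la - lb) >= threshold:                        # S106: Y or S109: Y
            upper, lower = (verb_a, verb_b) if la > lb else (verb_b, verb_a)
            return ("superordinate-subordinate", upper, lower)      # S107
    return ("synonym", verb_a, verb_b)                       # S110
```

For instance, `classify_pair("read", "read carefully", 0.9, 0.5, lambda: (3.0, 2.9), 1.0, lambda: (10 / 3, 2.0), 1.0)` falls through S106 (difference 0.1) and decides at S109, returning `("superordinate-subordinate", "read", "read carefully")`. The numbers are illustrative only.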

Next, as specific examples, determinations by the word relationship determination device 10 are shown for three verb pairs: (1) 「許可する」 ("permit") and 「読む」 ("read"), (2) 「許可する」 ("permit") and 「認可する」 ("authorize"), and (3) 「読む」 ("read") and 「熟読する」 ("read carefully").

FIGS. 4A to 4D show examples of the nouns extracted from the text data as being in a dependency relationship with each verb, together with their frequencies of occurrence. FIGS. 4A, 4B, 4C, and 4D show the nouns and frequencies extracted for "permit", "read", "authorize", and "read carefully", respectively.

FIGS. 5A to 5D show tables giving, for each case particle, the frequencies of the reference-concept words obtained when the nouns extracted for each verb are converted into reference concepts. FIGS. 5A, 5B, 5C, and 5D show the tables for "permit", "read", "authorize", and "read carefully", respectively.

First, (1) the determination of the relationship between "permit" and "read" will be described. As shown in FIGS. 5A and 5B, for "permit", the reference-concept words obtained for the case particle 「が」 (ga, the subject marker) that occur with high frequency include "region", "group", and "role", and those obtained for the case particle 「を」 (wo, the object marker) include "operation" and "production". For "read", by contrast, the reference-concept words obtained for 「が」 are concentrated on "person", and those obtained for 「を」 are concentrated on "stationery". The frequency distributions of reference-concept words for "permit" and "read" therefore differ greatly. As a result, the inner product of the feature vectors based on these frequency distributions is small, and the two verbs are determined to be dissimilar. Consequently, no superordinate-subordinate determination is performed for "permit" and "read".

Next, (2) the determination of the relationship between "permit" and "authorize" will be described. As shown in FIGS. 5A and 5C, both verbs show a common tendency: for the case particle 「が」, reference-concept words such as "region", "group", and "role" occur with high frequency, and for 「を」, words such as "operation" and "production" occur with high frequency. Accordingly, the inner product of the feature vectors based on the frequency distributions of reference-concept words for "permit" and "authorize" is large, and the two verbs are determined to be similar.

The first determination unit 22 therefore calculates, for "permit" and "authorize", the average distance in the noun thesaurus of the nouns shown in FIGS. 4A and 4C, and determines the superordinate-subordinate relationship between the two verbs. Specifically, "authorize" tends to be used in administrative and similar contexts, whereas "permit" is used more broadly, so the average distance in the noun thesaurus is larger for "permit" than for "authorize". It is therefore determined that "permit" and "authorize" are in a superordinate-subordinate relationship, with "permit" superordinate and "authorize" subordinate.

Next, (3) the determination of the relationship between "read" and "read carefully" will be described. As shown in FIGS. 5B and 5D, the two verbs have similar frequency distributions for each case particle, so the similarity determination unit 20 determines that they are similar. In the first determination unit 22, however, the noun groups extracted for the two verbs do not differ greatly in how they are distributed in the noun thesaurus, so the calculated average distances do not differ either, and the superordinate-subordinate relationship between the two verbs is determined to be unknown.

The causal-relation noun extraction unit 24 therefore extracts, from the sentences containing "read" or "read carefully", those that contain a causal expression such as 「〜ので」 ("because ..."), 「〜ために」 ("for ..." or "in order to ..."), or 「〜という理由で」 ("for the reason that ..."), and then extracts the nouns that appear before the causal expression in each extracted sentence. For example, suppose that for "read" the sentences 「試験のために、参考書を読んだ」 ("I read a reference book for the exam"), 「知識を得るために、本を読んだ」 ("I read a book to gain knowledge"), and 「暇という理由で本を読んだ」 ("I read a book because I had free time") are extracted; the causal-relation noun group "exam", "knowledge", and "free time" is then obtained from these sentences. For "read carefully", suppose that the sentences 「試験のために、参考書を熟読した」 ("I read a reference book carefully for the exam"), 「テストのために、教科書を熟読した」 ("I read the textbook carefully for the test"), and 「暗記が必要なので、本を熟読した」 ("I read the book carefully because I needed to memorize it") are extracted; the causal-relation noun group "exam", "test", and "memorization" is then obtained. The second determination unit 26 calculates, for each verb, the average distance in the noun thesaurus of its causal-relation noun group. In this example, the causes of "reading carefully" are concentrated on nouns from a narrow semantic category, such as "exam" and "test", whereas the causes of "reading" are diverse, so the average distance in the noun thesaurus is larger for the causal-relation noun group of "read" than for that of "read carefully". The second determination unit 26 therefore determines that "read" is superordinate and "read carefully" is subordinate.
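The extraction step described above can be sketched as plain string processing. This is a deliberate simplification and an assumption on our part: the sentences are taken to be already selected as containing the target verb, substring matching against a small hypothetical noun lexicon (`LEXICON`) stands in for morphological analysis, and only the three causal markers named in the text are recognized. A production system would need a morphological analyzer and a dependency parse to confirm that the causal clause actually modifies the target verb.

```python
# Causal markers from the text: "because", "for / in order to", "for the reason that".
CAUSAL_MARKERS = ("ので", "ために", "という理由で")


def causal_nouns(sentences, noun_lexicon, markers=CAUSAL_MARKERS):
    """Return the set of lexicon nouns that occur before the first causal
    expression in each sentence; sentences without a marker are ignored."""
    found = set()
    for sentence in sentences:
        hits = [sentence.find(m) for m in markers if m in sentence]
        if not hits:
            continue
        cause_part = sentence[:min(hits)]  # text preceding the causal expression
        found.update(noun for noun in noun_lexicon if noun in cause_part)
    return found


# Hypothetical noun lexicon covering the example sentences.
LEXICON = {"試験", "テスト", "知識", "暇", "暗記", "参考書", "教科書", "本"}
READ_SENTENCES = ["試験のために、参考書を読んだ", "知識を得るために、本を読んだ", "暇という理由で本を読んだ"]
CAREFUL_SENTENCES = ["試験のために、参考書を熟読した", "テストのために、教科書を熟読した", "暗記が必要なので、本を熟読した"]
```

Applied to the example sentences, `causal_nouns(READ_SENTENCES, LEXICON)` yields {"試験", "知識", "暇"} (exam, knowledge, free time) and `causal_nouns(CAREFUL_SENTENCES, LEXICON)` yields {"試験", "テスト", "暗記"} (exam, test, memorization). Nouns such as 参考書 and 本 are excluded because they follow the causal marker.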

The present invention is not limited to the embodiment described above. For example, for a verb pair determined to be similar by the similarity determination unit 20, the superordinate-subordinate determination may be made using only the determination by the second determination unit 26. In addition, although the similarity determination unit 20 and the first determination unit 22 make their determinations using the nouns, extracted by the dependency noun extraction unit 18, that are in a dependency relationship with the verb under determination, the nouns used for these determinations may instead be nouns that co-occur with the verb in the same sentence. Furthermore, the first determination unit 22 and the second determination unit 26 may determine whether a superordinate-subordinate relationship exists on the basis of the degree to which the elements of the noun groups extracted by the dependency noun extraction unit 18 and the causal-relation noun extraction unit 24, respectively, are shared, and may determine which verb is superordinate on the basis of the inclusion relationship between the two noun groups. It is also possible to search the text data for verbs not yet present in the verb thesaurus, perform the determination between each such verb and each verb in the verb thesaurus, and, when a superordinate-subordinate or synonym relationship is determined, add the verb to the verb thesaurus.
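The overlap-and-inclusion variant mentioned above is described only briefly, so the following is one possible reading rather than the method itself. It assumes, hypothetically, that "commonality" is measured as the fraction of one group's nouns that also appear in the other, that a relationship is recognized when that fraction reaches a threshold `t_common` (value arbitrary), and that the verb whose noun group covers more of the other group (approximate inclusion) is superordinate.

```python
def judge_by_inclusion(verb_a, nouns_a, verb_b, nouns_b, t_common=0.8):
    """One possible reading of the variant: enough shared elements signal a
    relationship, and the group that covers more of the other is superordinate."""
    a, b = set(nouns_a), set(nouns_b)
    if not a or not b:
        return ("undetermined", verb_a, verb_b)
    shared = len(a & b)
    a_covers_b = shared / len(b)  # share of B's nouns also found in A
    b_covers_a = shared / len(a)  # share of A's nouns also found in B
    if max(a_covers_b, b_covers_a) < t_common:
        return ("undetermined", verb_a, verb_b)  # too few shared elements
    if a_covers_b > b_covers_a:
        return ("superordinate-subordinate", verb_a, verb_b)
    if b_covers_a > a_covers_b:
        return ("superordinate-subordinate", verb_b, verb_a)
    return ("undetermined", verb_a, verb_b)  # equal coverage gives no direction
```

For example, a group {"exam", "test", "knowledge", "free time"} for "read" fully contains {"exam", "test"} for "read carefully", so `judge_by_inclusion` returns `("superordinate-subordinate", "read", "read carefully")` regardless of argument order.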

FIG. 1 is a functional block diagram of the word relationship determination device according to the present embodiment.
FIG. 2 is a diagram showing an example of a noun thesaurus.
FIG. 3 is a flowchart of the process of determining the superordinate and subordinate concepts of verbs.
FIG. 4A is a diagram showing an example of the nouns in a dependency relationship with "permit" and their frequencies of occurrence.
FIG. 4B is a diagram showing an example of the nouns in a dependency relationship with "read" and their frequencies of occurrence.
FIG. 4C is a diagram showing an example of the nouns in a dependency relationship with "authorize" and their frequencies of occurrence.
FIG. 4D is a diagram showing an example of the nouns in a dependency relationship with "read carefully" and their frequencies of occurrence.
FIG. 5A is a diagram showing a table of the frequencies of the words of each reference concept for "permit", broken down by case particle.
FIG. 5B is a diagram showing a table of the frequencies of the words of each reference concept for "read", broken down by case particle.
FIG. 5C is a diagram showing a table of the frequencies of the words of each reference concept for "authorize", broken down by case particle.
FIG. 5D is a diagram showing a table of the frequencies of the words of each reference concept for "read carefully", broken down by case particle.

Explanation of symbols

10 Word relationship determination device
12 Text data storage unit
14 Noun thesaurus storage unit
16 Determination target verb setting unit
18 Dependency noun extraction unit
20 Similarity determination unit
22 First determination unit
24 Causal-relation noun extraction unit
26 Second determination unit

Claims

A word relationship determination device comprising:
setting means for setting a first verb and a second verb as determination targets;
first noun group acquisition means for acquiring, from a plurality of sentences, a first noun group including one or more nouns that satisfy a given condition with respect to the first verb;
second noun group acquisition means for acquiring, from the plurality of sentences, a second noun group including one or more nouns that satisfy the given condition with respect to the second verb; and
determination means for determining whether the first verb and the second verb are in a superordinate-subordinate relationship, on the basis of a comparison between the variation of the first noun group and the variation of the second noun group in a noun thesaurus that represents a plurality of nouns in a tree structure according to the superordinate-subordinate relations of the concepts to which the nouns belong.

The word relationship determination device according to claim 1, wherein, when there is a difference equal to or greater than a predetermined value between the variation of the first noun group and the variation of the second noun group, the determination means determines that the verb associated with the larger variation is the superordinate concept and the other verb is the subordinate concept.

The word relationship determination device according to claim 2, wherein the given condition comprises a plurality of conditions, and the determination means determines, when there is a difference equal to or greater than a predetermined value between the variation of the first noun group and the variation of the second noun group acquired on the basis of any one of the plurality of conditions, that the verb associated with the larger variation is superordinate and the other verb is subordinate.

The word relationship determination device according to any one of claims 1 to 3, wherein the given condition includes at least one of: a condition that the noun is in a dependency relationship with the verb set as a target by the setting means; a condition that the noun appears in the vicinity of the verb set as the target; and a condition that the noun is included in a phrase linked to the verb set as the target via an expression representing a causal relationship.

The word relationship determination device according to claim 1, wherein the variation of a noun group in the noun thesaurus is calculated on the basis of the number of hops in the noun thesaurus between each pair of nouns included in the noun group.

The word relationship determination device according to any one of claims 1 to 5, further comprising similarity determination means for determining the similarity between the first verb and the second verb on the basis of the first and second noun groups, wherein the determination means determines whether the first verb and the second verb are in a superordinate-subordinate relationship when the similarity determination means determines that the first verb and the second verb are similar.

The word relationship determination device according to claim 6, wherein the similarity determination means converts each noun included in the first and second noun groups into a predetermined reference concept in the noun thesaurus, and determines the similarity between the first verb and the second verb on the basis of the frequencies of the reference concepts included in the converted first and second noun groups.

The word relationship determination device according to claim 6 or 7, wherein a first verb and a second verb that are determined to be similar by the similarity determination means and are not determined to be in a superordinate-subordinate relationship by the determination means are determined to be in a synonym relationship.

A word relationship determination device comprising:
setting means for setting a first verb and a second verb as determination targets;
first noun group acquisition means for acquiring, from a plurality of sentences, a first noun group including one or more nouns contained in a phrase linked to the first verb via an expression representing a causal relationship;
second noun group acquisition means for acquiring, from the plurality of sentences, a second noun group including one or more nouns contained in a phrase linked to the second verb via an expression representing a causal relationship; and
determination means for determining whether the first verb and the second verb are in a superordinate-subordinate relationship on the basis of a comparison between the variation of the first noun group and the variation of the second noun group.

A program for causing a computer to execute:
a setting step of setting a first verb and a second verb as determination targets;
a first noun group acquisition step of acquiring, from a plurality of sentences, a first noun group including one or more nouns that satisfy a given condition with respect to the first verb;
a second noun group acquisition step of acquiring, from the plurality of sentences, a second noun group including one or more nouns that satisfy the given condition with respect to the second verb; and
a determination step of determining whether the first verb and the second verb are in a superordinate-subordinate relationship, on the basis of a comparison between the variation of the first noun group and the variation of the second noun group in a noun thesaurus that represents a plurality of nouns in a tree structure according to the superordinate-subordinate relations of the concepts to which the nouns belong.

A program for causing a computer to execute:
a setting step of setting a first verb and a second verb as determination targets;
a first noun group acquisition step of acquiring, from a plurality of sentences, a first noun group including one or more nouns contained in a phrase linked to the first verb via an expression representing a causal relationship;
a second noun group acquisition step of acquiring, from the plurality of sentences, a second noun group including one or more nouns contained in a phrase linked to the second verb via an expression representing a causal relationship; and
a determination step of determining whether the first verb and the second verb are in a superordinate-subordinate relationship on the basis of a comparison between the variation of the first noun group and the variation of the second noun group.