JPH06208590A

JPH06208590A - Method for calculating degree of similarity between words

Info

Publication number: JPH06208590A
Application number: JP5003333A
Authority: JP
Inventors: Susumu Sai; 進崔; Eiji Komatsu; 英二小松; Hiroshi Yasuhara; 宏安原
Original assignee: NIPPON DENSHIKA JISHO KENKYUSHO KK
Current assignee: NIPPON DENSHIKA JISHO KENKYUSHO KK
Priority date: 1993-01-12
Filing date: 1993-01-12
Publication date: 1994-07-26

Abstract

PURPOSE: To calculate the degree of similarity between polysemous words, or between words in different languages, by using a word dictionary that describes the correspondence between each word and its concepts, together with a language-independent concept system that describes the superordinate-subordinate (master-slave) relations between concepts.
CONSTITUTION: The similarity calculation method uses a word dictionary 1 and a concept system 2. The concept sets corresponding to the input words are extracted by searching the dictionary 1, and the degree of similarity alpha between the extracted concept sets is calculated. The superordinate concepts (and/or subordinate concepts) of these concept sets are then extracted by searching the system 2, and the degree of similarity beta between them is calculated. The degrees alpha and beta are weighted to calculate the degree of similarity delta, which is taken as the similarity between the words.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a similarity calculation method for calculating the similarity between words.

【０００２】[0002]

[Prior Art] Conventionally, the similarity between words has been calculated by measuring the distance between words using a classified vocabulary table, a synonym dictionary, a thesaurus, or the like.

【０００３】[0003]

[Problems to Be Solved by the Invention] All of these conventional methods calculate similarity from a system of words. They do not use the concepts that a word carries, nor the superordinate-subordinate relations between those concepts in a concept system. As a result, it is unclear how polysemous words should be handled, and the similarity between words in different languages cannot be calculated.

[0004] To solve these problems, the present invention aims to make it possible to calculate the similarity between polysemous words and between words in different languages, by using a word dictionary that describes the correspondence between words and concepts and a language-independent concept system that describes the superordinate-subordinate relations between concepts.

【０００５】[0005]

[Means for Solving the Problems] The means for solving the problems are described with reference to FIG. 1. In FIG. 1, the word dictionary 1 holds words and their concept sets, registered in advance.

[0006] The concept system 2 holds, registered in advance, the superordinate and/or subordinate concepts of the concept sets of words. The concept set similarity calculation unit 5 calculates the similarity α between the concept sets of the words, based on the concept sets retrieved by searching the word dictionary 1.

[0007] The concept set similarity calculation unit 6 calculates the similarity β, based on the superordinate concepts (and/or subordinate concepts) of the words' concept sets retrieved by searching the concept system 2.

[0008] The inter-word similarity calculation unit 7 weights the similarity α and the similarity β and calculates the similarity δ between the words.

【０００９】[0009]

[Operation] In the present invention, as shown in FIG. 1, the concept set similarity calculation unit 5 calculates the similarity α between the concept sets retrieved from the word dictionary 1 for the input words. The concept set similarity calculation unit 6 then searches the concept system 2 for these concept sets and calculates the similarity β between the superordinate concepts (and/or subordinate concepts) retrieved. Finally, the inter-word similarity calculation unit 7 weights the calculated similarities α and β to calculate the similarity δ between the words.

[0010] In this arrangement, a word dictionary 1 is provided for each language, and all of these dictionaries link to the single common concept system 2 described above. Thus, by using the word dictionary 1, which describes the correspondence between words and concepts, and the language-independent concept system 2, which describes the superordinate-subordinate relations between concepts, it becomes possible to calculate similarities that conventional methods could not, such as those between polysemous words or between words in different languages.

【００１１】[0011]

[Embodiment] Next, the configuration and operation of an embodiment of the present invention are described in detail, in order, with reference to FIGS. 1 to 9.

[0012] FIG. 1 shows the configuration of one embodiment of the present invention. In FIG. 1, the word dictionary 1 holds the concept sets of words, registered in advance. For example, as described later, the word "apple" is registered together with its concept set (see FIGS. 6 and 7):
- the plant called apple
- the fruit of the plant called apple

[0013] The concept system 2 holds, registered in advance, the superordinate and subordinate concepts of the concept sets of the words registered in the word dictionary 1. For example, as described later, the superordinate concepts of a word's concept set are registered against the i-th stage (i = 1, 2, ..., an integer) as follows.

[0014]
- Concept set of the word: the plant called apple
- Superordinate concept (stage 1): tree
- Superordinate concept (stage 2): plant regarded as a species

The word dictionary search unit 3 searches the word dictionary 1 for the words W1 and W2 whose similarity is to be calculated, and extracts the concept set of each word. For example, as described later, for the word W1 "apple" it extracts the following concept set, as shown in FIG. 6:
- the plant called apple
- the fruit of the plant called apple

[0015] The concept set similarity calculation unit 5 calculates the similarity α from equation (1) below, based on the concept set 51 of the word W1 and the concept set 52 of the word W2 extracted by the word dictionary search unit 3.

[0016]

$$\alpha = |C_1 \cap C_2| \qquad (1)$$

Here, C1 is the concept set of the word W1 and C2 is the concept set of the word W2. The similarity α is therefore the number of concepts that appear in both C1 and C2 (see FIGS. 6 and 7).
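Equation (1) is a plain intersection count, so it can be sketched in a few lines of Python. The function name `alpha` and the representation of concepts as ID strings are illustrative assumptions; the IDs themselves are the ones quoted for the FIG. 6 example.

```python
def alpha(c1: set, c2: set) -> int:
    """Equation (1): number of concepts shared by the two concept sets."""
    return len(c1 & c2)

# Concept IDs quoted for FIG. 6 ("apple" vs. "orange").
C1 = {"3bd8dc", "3bd8db"}            # apple: the plant, the fruit of the plant
C2 = {"0e844e", "3c0e74", "3c0735"}  # orange: the plant, the color, the mikan fruit

print(alpha(C1, C2))  # no shared concept for this pair
```

Because α counts shared concepts rather than shared surface forms, every sense of a polysemous word takes part in the comparison.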

[0017] The concept system search unit 4 searches the concept system 2 for the concept set 51 of W1 and the concept set 52 of W2 and, in this embodiment, extracts the superordinate concept set 61 of W1 and the superordinate concept set 62 of W2. For example, for the concept "the plant called apple" in the concept set 51 of W1, it extracts "tree" as the stage 1 superordinate concept and "plant regarded as a species" as the stage 2 superordinate concept (see FIGS. 6 and 7).

[0018] The concept set similarity calculation unit 6 calculates the similarity β from equations (2) and (3) below, based on the superordinate concept set 61 of the word W1 and the superordinate concept set 62 of the word W2 extracted from the concept system 2. First, the similarity βi between the i-th stage superordinate concept sets of W1 and W2 in the concept system is calculated by equation (2).

[0019]

$$\beta_i = (1 + K_{\beta i1} \times CS_i)\,\bigl(1 + K_{\beta i2}\,(CS_i / N_{i1} + CS_i / N_{i2})\bigr) - 1 \qquad (2)$$

where i = 1, 2, ... (up to the maximum number of stages in the concept system 2), and
- Kβi1, Kβi2: weights
- Ni1: the number of distinct concepts in the i-th stage superordinate concept set of W1
- Ni2: the number of distinct concepts in the i-th stage superordinate concept set of W2
- CSi: the number of concepts common to the i-th stage superordinate concept sets of W1 and W2

The larger CSi is, the larger βi becomes. When CSi = 0, βi = 0. The larger Ni1 or Ni2 is, the smaller βi becomes. CSi/Nik (k = 1, 2) is the ratio of the number of common superordinate concepts to the number of distinct concepts in the superordinate concept set. For example, βi is larger when one of two superordinate concepts is shared (CSi/Nik = 1/2) than when one of five is shared (CSi/Nik = 1/5). Kβi1 adjusts the weight of CSi, and Kβi2 adjusts the weight of CSi/Ni1 + CSi/Ni2. When Kβi2 = 0, Ni1 and Ni2 have no effect on βi. From the similarities βi obtained in this way, the overall similarity β is calculated using equation (3) below.
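A minimal sketch of equation (2) follows. The parenthesization used here, a product of two factors minus 1, is a reading of the printed formula chosen because it gives βi = 0 when CSi = 0, as the text states; it should be treated as an assumption. The function and parameter names are illustrative.

```python
def beta_i(cs: int, n1: int, n2: int, k1: float, k2: float) -> float:
    """Stage-i similarity under the assumed reading of equation (2).

    cs     : number of shared superordinate concepts (CSi)
    n1, n2 : sizes of the two stage-i superordinate sets (Ni1, Ni2)
    k1, k2 : weights (K-beta-i1, K-beta-i2)
    """
    if cs == 0:
        return 0.0  # matches the text (CSi = 0 gives 0) and avoids dividing by an empty set size
    return (1 + k1 * cs) * (1 + k2 * (cs / n1 + cs / n2)) - 1

# One shared concept out of two scores higher than one shared out of five.
print(beta_i(1, 2, 2, 1.0, 1.0) > beta_i(1, 5, 5, 1.0, 1.0))
```

Setting `k2` to 0 removes the ratio terms, so only the raw count of shared concepts matters, which mirrors the remark about Kβi2 = 0.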

[0020]

$$\beta = K_{\beta 1} \times \beta_1 + K_{\beta 2} \times \beta_2 + \cdots + K_{\beta i} \times \beta_i \qquad (3)$$

where i = 1, 2, 3, ... (an integer), and
- Kβi: the weight of βi
- βi: the similarity between the i-th stage superordinate concepts of W1 and W2
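Equation (3) is a weighted sum over stages. The sketch below uses the hypothetical name `beta_total`. The sample weights 1 and 0.05 are the values listed as K1 and K2 in the worked examples; treating them as the stage weights of equation (3) is an inference from the quoted results.

```python
def beta_total(betas: list, weights: list) -> float:
    """Equation (3): weighted sum of the per-stage similarities."""
    if len(betas) != len(weights):
        raise ValueError("one weight is needed per stage")
    return sum(k * b for k, b in zip(weights, betas))

# Stage similarities quoted in the worked examples, with weights 1 and 0.05.
print(round(beta_total([95.07, 95.07], [1.0, 0.05]), 2))
```

A small weight on the second stage means that concepts shared only at a more abstract stage contribute less than those shared at the first stage.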

[0021] The inter-word similarity calculation unit 7 weights the similarity α and the similarity β and calculates the similarity δ between the words using equation (4) below.

$$\delta = 1 - \exp\bigl(-(K_\alpha \times \alpha + K_\beta \times \beta)\bigr) \qquad (4)$$

where
- Kα: the weight of α
- Kβ: the weight of β

Kα adjusts the weight of the similarity α: the larger Kα is, the more α contributes. Kβ adjusts the weight of the similarity β: the larger Kβ is, the more β contributes. By adjusting Kα and Kβ, the values taken by δ can be tuned within the range from 0 to 1.
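Equation (4) squashes the non-negative weighted sum Kα×α + Kβ×β into the interval from 0 up to (but not reaching) 1. A minimal sketch, with the function name `delta` as an illustrative assumption and the coefficients 0.45 and 0.028 taken from the worked examples:

```python
import math

def delta(alpha: float, beta: float, k_alpha: float, k_beta: float) -> float:
    """Equation (4): overall similarity between two words."""
    return 1.0 - math.exp(-(k_alpha * alpha + k_beta * beta))

# alpha = 0 and beta = 99.82, as quoted for the "apple" / "orange" example.
print(round(delta(0, 99.82, 0.45, 0.028), 2))
```

Because the exponential saturates, additional shared concepts raise δ quickly at first and then with diminishing effect, so the weights mainly control how fast δ approaches 1.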

[0022] The operation is now described step by step. FIG. 2 is a flowchart explaining the operation of the present invention. In FIG. 2, at A1 the words W1 and W2 are input. These are the words whose similarity δ is to be calculated, entered from the left side of FIG. 1; for example, as in FIG. 6 described later:
- word W1: "apple"
- word W2: "orange"

[0023] At A2 the word dictionary is searched. At A3 the concept sets are extracted. In A2 and A3, the word dictionary search unit 3 of FIG. 1 searches the word dictionary 1 and extracts the concept sets of the words W1 and W2 input at A1, as shown by the concept sets C1 and C2 in FIG. 6.

[0024] At A4 the similarity α is calculated. For the concept sets C1 and C2 of the words W1 and W2 extracted at A3, the number of concepts that appear in both C1 and C2 is calculated as the similarity α according to equation (1). In the case of FIG. 6, for example, no concept is common to C1 and C2, so the similarity is calculated as α = 0.

[0025] At A5 the concept system is searched. At A6 the superordinate concepts are extracted. In A5 and A6, the concept system 2 is searched for the concept sets C1 and C2 extracted at A3, and their superordinate concepts are extracted. For example, as shown in FIG. 6, for "the plant called apple" in the concept set C1, the following are extracted:
- stage 1 superordinate concept: tree
- stage 2 superordinate concept: plant regarded as a species

[0026] At A7 the similarity β is calculated. The superordinate concepts extracted at A6 are substituted, for stage 1 and stage 2 respectively, into equations (2) and (3) described above, and the similarities β1 and β2 are calculated, for example as shown in FIG. 6.

[0027] At A8 the similarity δ is calculated. The similarity α calculated at A4 and the similarities β1 and β2 calculated at A7 are substituted into equation (4) described above, and the similarity δ is calculated, for example as shown in FIG. 6.

[0028] In summary, when the words W1 and W2 are input, the word dictionary 1 is searched, the concept sets C1 and C2 corresponding to the words are extracted, and the similarity α is calculated. Next, the concept system 2 is searched for these concept sets, the superordinate concepts S1 and S2 are extracted, and the similarities β1 and β2 are calculated. The overall similarity δ is then calculated from these values and taken as the similarity between the words. The similarity between words is thus calculated with the concepts of the words taken into account, which makes it possible to calculate the similarity between polysemous words and between words in different languages. Further details follow.

[0029] FIG. 3 is a flowchart for calculating the similarity α in the present invention. It details steps A3 and A4 of FIG. 2. The description below refers to the concrete example of FIG. 7.

[0030] In FIG. 3, at A11 the concept sets C1 and C2 of the words W1 and W2 are extracted. For example, as shown in FIG. 7, the word dictionary 1 is searched for the words W1 and W2, and the concept sets C1 and C2 shown in the figure are extracted.

[0031] At A12 the concept sets C1 and C2 are compared. At A13 the number of matches is calculated and stored as the similarity α. In A12 and A13, as shown in FIG. 7, the concept sets C1 and C2 of the words W1 and W2 are compared. Here one concept, "the fruit called mikan (mandarin orange)", matches, so the number of matches is stored as the similarity α = 1.

[0032] In this way, the concept set C1 of the word W1 and the concept set C2 of the word W2 are extracted and the similarity α is calculated. FIG. 4 is a flowchart for calculating the similarity β in the present invention. It details steps A5 to A7 of FIG. 2. The description below refers to the concrete example of FIG. 7.

[0033] In FIG. 4, at A21 the concept system is searched on the basis of the concept sets C1 and C2, and the superordinate concepts S are extracted. For example, the concept system 2 is searched for the concept sets C1 and C2 of FIG. 7, and the superordinate concepts S1 (i = 1, stage 1) and S2 (i = 2, stage 2) shown in the figure are extracted.

[0034] At A22 the superordinate concepts Si are compared. At A23 the number of matches is calculated and set as CSi. At A24 the number of elements of Si is set as Ni.

[0035] At A25 these values are substituted into equation (2), and the similarity βi is calculated and stored. As shown in FIG. 7, described later, the similarities are calculated and stored as β1 = 95.07 and β2 = 95.07.

[0036] At S26 it is determined whether i has reached the specified stage. If YES, the calculation of βi has been completed for the specified number of stages, so the process proceeds to A28. If NO, the calculation has not yet been completed up to the specified stage, so i is incremented (i = i + 1) at A27 and the process returns to A22.

[0037] At A28 the stored values βi are substituted into equation (3), and the similarity β is calculated and stored. That is, the similarities βi stored at A25 are substituted into equation (3) to obtain the overall similarity β. In the case of FIG. 7, substituting β1 and β2 into equation (3) gives β = 99.82, which is stored.

[0038] In this way, given the concept set C1 of the word W1 and the concept set C2 of the word W2, the concept system 2 is searched, the superordinate concepts S1 and S2 are extracted, and the similarities β1 and β2 are calculated. These are substituted into equation (3) to calculate the overall similarity β. The similarity β between the superordinate concepts of the words' concept sets can thus be calculated, which makes it possible to calculate the similarity between polysemous words and between words in different languages.

[0039] FIG. 5 is a flowchart for calculating the similarity δ in the present invention. It details step A8 of FIG. 2. The description below refers to the concrete example of FIG. 7. In FIG. 5, at A31 the stored values α and β are retrieved: the similarity α stored at A13 of FIG. 3 and the similarity β stored at A28 of FIG. 4.

[0040] At A32, α and β are substituted into equation (4) and the similarity δ is calculated. That is, the similarity α between the words' concept sets and the similarity β between their superordinate concepts, both retrieved at A31, are substituted into equation (4) to obtain the overall similarity δ. For example, as shown in (4) of FIG. 7, the substitution gives δ = 0.97.

[0041] In this way, the similarity α between the words' concept sets C1 and C2 and the similarity β between the superordinate concepts S1, S2, and so on of these concept sets are obtained, and substituting α and β into equation (4) yields the overall similarity δ. This makes it possible to calculate the similarity between polysemous words and between words in different languages.

[0042] FIG. 6 shows a concrete example (No. 1) of the present invention, for the words W1 "apple" and W2 "orange". In the figure, an arrow indicates what is pointed to, and a label such as <3bd8dc> at the target of an arrow is the index ID of a concept in a concept set (or of a superordinate concept).

[0043] (1) The coefficients are determined experimentally in advance and set to the values shown in the figure:
- Kα = 0.45
- Kβ = 0.028
- Kβ1 = 2.75
- Kβ2 = 8.25
- K1 = 1
- K2 = 0.05

(2) The word dictionary 1 is searched for the words W1 "apple" and W2 "orange", and the figure shows the concept sets extracted.

Concept set C1 of the word W1 "apple":
- <3bd8dc> the plant called apple
- <3bd8db> the fruit of the plant called apple

Concept set C2 of the word W2 "orange":
- <0e844e> the plant called orange
- <3c0e74> the color called orange
- <3c0735> the fruit called mikan (mandarin orange)

[0044] (3) The similarity α between the concept set C1 of the word W1 and the concept set C2 of the word W2 is calculated by equation (1). Here C1 and C2 have no concept in common, so
- similarity α = 0.

[0045] (4) The concept system 2 is searched for the concept sets C1 and C2, and the figure shows the stage 1 superordinate concepts extracted.

Superordinate concepts S₁¹ (stage 1) of the concept set C1:
- <30f6cc> tree
- <30f6ce> fruit

Superordinate concepts S₁² (stage 1) of the concept set C2:
- <30f6cc> tree
- <30f6ce> fruit
- <30f944> color value

[0046] (5) The similarity β1 between the superordinate concepts S₁¹ of the concept set C1 and the superordinate concepts S₁² of the concept set C2 is calculated as shown in equation (2.1) of the figure:
- similarity β1 = 95.07

Here:
- N11 = 2 means that S₁¹ contains two superordinate concepts.

[0047]
- N12 = 3 means that S₁² contains three superordinate concepts.
- CS1 = 2 means that two concepts match between S₁¹ and S₁².

[0048] (6) The concept system 2 is searched for the concept sets C1 and C2, and the figure shows the stage 2 superordinate concepts extracted.

Superordinate concepts S₂¹ (stage 2) of the concept set C1:
- <30f6cb> plant regarded as a species
- <3f9639> food and drink

Superordinate concepts S₂² (stage 2) of the concept set C2:
- <30f6cb> plant regarded as a species
- <3f9639> food and drink
- <3f9892> qualitative attribute of a concrete object

[0049] (7) The similarity β2 between the superordinate concepts S₂¹ of the concept set C1 and the superordinate concepts S₂² of the concept set C2 is calculated as shown in equation (2.2) of the figure:
- similarity β2 = 95.07

Here:
- N21 = 2 means that S₂¹ contains two superordinate concepts.

[0050]
- N22 = 3 means that S₂² contains three superordinate concepts.
- CS2 = 2 means that two concepts match between S₂¹ and S₂².

[0051] (8) Next, the overall similarity δ is calculated from the similarities α, β1, and β2 as shown in equation (4):
- similarity δ = 0.94
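The arithmetic behind this last step can be written out as follows. It uses only values quoted in the example and assumes that K1 = 1 and K2 = 0.05 are the stage weights of equation (3), an inference that is consistent with the quoted β = 99.82.

```latex
\begin{aligned}
\beta  &= K_1 \beta_1 + K_2 \beta_2
        = 1 \times 95.07 + 0.05 \times 95.07 \approx 99.82, \\
\delta &= 1 - \exp\bigl(-(K_\alpha \alpha + K_\beta \beta)\bigr)
        = 1 - \exp\bigl(-(0.45 \times 0 + 0.028 \times 99.82)\bigr) \\
       &\approx 1 - e^{-2.795} \approx 0.94 .
\end{aligned}
```

With α = 0 the result depends entirely on the superordinate concepts, which is the situation this example is meant to illustrate.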

[0052] As described above, when the words W1 "apple" and W2 "orange" are input, the word dictionary 1 and the concept system 2 are searched, the similarities α, β1, and β2 are calculated, and from these the overall similarity is calculated as δ = 0.94. Thus, even though the concept sets of "apple" and "orange" have no concept in common, the similarity δ between the words can be calculated from the similarities β1 and β2 between the superordinate concepts of those concept sets.
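The whole "apple" / "orange" example can be traced end to end in a short script. This is a sketch under several assumptions: the parent links in `PARENTS` are hypothetical (only the resulting stage sets are given in the text), equation (2) is read as a product of two factors minus 1, Kβ1 and Kβ2 are taken as the two weights inside equation (2), and K1 and K2 as the stage weights of equation (3). Under this reading the intermediate βi values come out near 94.9 rather than the 95.07 quoted from the figure, which may reflect rounding in the original, while the final δ still rounds to 0.94.

```python
import math

# Word dictionary 1: word -> concept set (IDs as quoted for FIG. 6).
WORD_DICT = {
    "apple": {"3bd8dc", "3bd8db"},
    "orange": {"0e844e", "3c0e74", "3c0735"},
}
# Concept system 2: concept -> superordinate concepts (links are assumed).
PARENTS = {
    "3bd8dc": {"30f6cc"}, "3bd8db": {"30f6ce"},
    "0e844e": {"30f6cc"}, "3c0735": {"30f6ce"}, "3c0e74": {"30f944"},
    "30f6cc": {"30f6cb"}, "30f6ce": {"3f9639"}, "30f944": {"3f9892"},
}
K_ALPHA, K_BETA = 0.45, 0.028
K_B1, K_B2 = 2.75, 8.25       # assumed to be the weights inside equation (2)
STAGE_WEIGHTS = [1.0, 0.05]   # K1, K2: assumed stage weights of equation (3)

def lift(concepts):
    """Move one stage up the concept system."""
    out = set()
    for concept in concepts:
        out |= PARENTS.get(concept, set())
    return out

def word_similarity(w1, w2):
    c1, c2 = WORD_DICT[w1], WORD_DICT[w2]   # A2, A3: dictionary lookup
    alpha = len(c1 & c2)                    # A4: equation (1)
    beta = 0.0
    s1, s2 = c1, c2
    for k_stage in STAGE_WEIGHTS:           # A5 to A7: one pass per stage
        s1, s2 = lift(s1), lift(s2)
        cs = len(s1 & s2)
        if cs:
            ratio = cs / len(s1) + cs / len(s2)
            beta += k_stage * ((1 + K_B1 * cs) * (1 + K_B2 * ratio) - 1)
    delta = 1 - math.exp(-(K_ALPHA * alpha + K_BETA * beta))  # A8: equation (4)
    return alpha, beta, delta

alpha, beta, delta = word_similarity("apple", "orange")
print(alpha, round(beta, 2), round(delta, 2))
```

Keeping the dictionary and the concept system as separate tables mirrors FIG. 1: swapping in another language's `WORD_DICT` would leave the rest of the calculation unchanged.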

[0053] FIG. 7 shows a concrete example (No. 2) of the present invention, for the words W1 "orange" and W2 "tachibana" (橘). In the figure, an arrow indicates what is pointed to, and a label such as <0e844e> at the target of an arrow is the index ID of a concept in a concept set (or of a superordinate concept).

[0054] (1) The coefficients are determined experimentally in advance and set to the values shown in the figure:
- Kα = 0.45
- Kβ = 0.028
- Kβ1 = 2.75
- Kβ2 = 8.25
- K1 = 1
- K2 = 0.05

(2) Thereafter, in the same way as (2) to (8) of FIG. 6, the following are calculated:
- similarity α = 1
- similarity β1 = 95.07
- similarity β2 = 95.07
- similarity δ = 0.97

[0055] As described above, when the words W1 "orange" and W2 "tachibana" are input, the word dictionary 1 and the concept system 2 are searched, the similarities α, β1, and β2 are calculated, and from these the overall similarity is calculated as δ = 0.97. The similarity δ between the words is thus calculated from both the similarity α between the concept sets of "orange" and "tachibana" and the similarities β1 and β2 between the superordinate concepts of those concept sets. The weighting is set by determining the values of Kα and Kβ experimentally.

[0056] FIG. 8 shows the relationship between the word dictionaries and the concept system of the present invention. FIG. 8(a) shows the relationship among the Japanese word dictionary 11, the English word dictionary 12, and the concept system 2. The Japanese word dictionary 11 and the English word dictionary 12 share common concepts and link to the concept system 2. Because the concept system 2 does not depend on any language, calculating word similarity at the concept level makes it possible to calculate the similarity δ between two Japanese words, between two English words, and between a Japanese word and an English word. Unlike the conventional distance between words, the similarity between words in the present invention is a similarity seen from the viewpoint of how words behave and how they are used, and it is calculated as explained with reference to FIGS. 6 and 7.
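The arrangement of FIG. 8(a) can be sketched as one dictionary per language whose entries all point at the same concept IDs. Everything in the sketch is hypothetical: the table `WORD_DICTS`, the helper names, and in particular the English entries, which are assumed for illustration and are not listed in the source.

```python
# Hypothetical per-language word dictionaries. Both map surface words to the
# same language-independent concept IDs (IDs borrowed from FIG. 6).
WORD_DICTS = {
    "ja": {
        "リンゴ": {"3bd8dc", "3bd8db"},
        "オレンジ": {"0e844e", "3c0e74", "3c0735"},
    },
    "en": {
        "apple": {"3bd8dc", "3bd8db"},
        "orange": {"0e844e", "3c0e74"},  # assumed English senses
    },
}

def concept_set(lang: str, word: str) -> set:
    """Look a word up in the dictionary of its own language."""
    return WORD_DICTS[lang].get(word, set())

def shared_concepts(lang1: str, word1: str, lang2: str, word2: str) -> set:
    """Concepts common to two words, which may come from different languages."""
    return concept_set(lang1, word1) & concept_set(lang2, word2)

print(len(shared_concepts("ja", "オレンジ", "en", "orange")))
```

Once both words are reduced to concept IDs, the language they came from no longer matters, so the same α, β, and δ calculations apply unchanged.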

[0057] FIG. 8(b) shows the case where the concept set C1 of the word W1 and the concept set C2 of the word W2 are identical. This relationship is called an identical relationship. Examples are:
- 英国 (Eikoku) and イギリス (Igirisu), both meaning the United Kingdom
- 辞書 (jisho) and 字典 (jiten), both meaning dictionary
- 外国 (gaikoku) and 海外 (kaigai), both meaning foreign countries

In these cases the concept set C1 of W1 and the concept set C2 of W2 are the same, so there is no need to calculate the similarity.

[0058] FIG. 8(c) shows the case where the concept set C1 of the word W1 and the concept set C2 of the word W2 are not entirely identical but share common concepts. Such a relationship is called a synonymy relationship. Examples are 男 ("man") and 男子 ("boy"), 犬 ("dog", also used in the sense of "informer") and スパイ ("spy"), and 国語 ("national language") and 日本語 ("Japanese"). In a synonymy relationship, the more concepts the two sets have in common, the greater the similarity between the word W1 and the word W2. The similarity α is the similarity in the synonymy relationship.
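As a rough illustration of a similarity that grows with the number of common concepts, the following sketch uses a plain set-overlap ratio. This is an assumption made for illustration only: the weighted calculation described with reference to FIGS. 6 and 7 is not reproduced here, and the function name `alpha_overlap` is hypothetical.

```python
def alpha_overlap(c1, c2):
    """Overlap ratio of two concept sets (an assumed stand-in for alpha).

    Returns 1.0 for identical non-empty sets, 0.0 when nothing is shared,
    and a larger value the more concepts the two sets have in common.
    """
    c1, c2 = set(c1), set(c2)
    union = c1 | c2
    if not union:
        return 0.0  # neither word has any registered concept
    return len(c1 & c2) / len(union)


# Hypothetical concept IDs for two words that share one of three concepts.
print(alpha_overlap({"c_dog", "c_informer"}, {"c_informer", "c_agent"}))
```

Under this assumption, the identity relationship of FIG. 8(b) is simply the case where the ratio reaches 1.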

[0059] FIG. 8(d) shows the case where the concept set C1 of the word W1 and the concept set C2 of the word W2 share no common concept, but their superordinate concepts do. Such a relationship is called a similarity relationship. Examples are 部長 ("department manager") and 社長 ("company president"), 部長 ("department manager") and 私 ("I"), and アメリカ ("America") and イギリス ("the United Kingdom"). In a similarity relationship, no common concept exists between the concept set C1 of the word W1 and the concept set C2 of the word W2, but a common concept exists among the superordinate concepts Si up to a certain level i. As described above with reference to FIGS. 6 and 7, the similarity between two words W1 and W2 in a similarity relationship depends not only on the number of common superordinate concepts but also on the number of superordinate concepts of both words. For example, the similarity is greater when one of two superordinate concepts is shared (1/2) than when one of five superordinate concepts is shared (1/5). The similarity β is the similarity in the similarity relationship.
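The 1/5 versus 1/2 comparison can be sketched as follows, assuming the concept system is stored as a child-to-parents mapping and that β is approximated by the share of common superordinate concepts among all superordinate concepts of both words. Both the mapping and this ratio are illustrative assumptions, not the weighted calculation of FIGS. 6 and 7; all names are hypothetical.

```python
# Hypothetical fragment of a concept system: child concept -> parent concepts.
PARENTS = {
    "c_manager": {"c_employee"},
    "c_president": {"c_executive"},
    "c_employee": {"c_person"},
    "c_executive": {"c_person"},
}


def superordinates(concepts, parents, max_level):
    """Collect superordinate concepts up to max_level levels above the set."""
    found = set()
    frontier = set(concepts)
    for _ in range(max_level):
        frontier = {p for c in frontier for p in parents.get(c, ())}
        found |= frontier
    return found


def beta_ratio(c1, c2, parents, max_level):
    """Share of common superordinate concepts (an assumed stand-in for beta)."""
    s1 = superordinates(c1, parents, max_level)
    s2 = superordinates(c2, parents, max_level)
    total = s1 | s2
    return len(s1 & s2) / len(total) if total else 0.0


# One level up nothing is shared; two levels up "c_person" is common.
print(beta_ratio({"c_manager"}, {"c_president"}, PARENTS, 1))
print(beta_ratio({"c_manager"}, {"c_president"}, PARENTS, 2))
```

With this ratio, one shared concept among two superordinate concepts scores higher than one shared concept among five, which matches the ordering stated in the paragraph above.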

[0060] FIG. 9 shows a specific example of the present invention. FIG. 9(a) shows sample weight values. These are the sample weight values used in FIGS. 6 and 7 described above.

[0061] FIGS. 9(b) and 9(c) show examples of calculating the similarity δ between words using the sample weight values of FIG. 9(a). The similarity δ takes a value from 0 to 1, and a larger value indicates a higher similarity. FIG. 9(b) shows examples of the similarity δ between Japanese words, and FIG. 9(c) shows examples of the similarity δ between English words and between a Japanese word and an English word.
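One way the similarity δ could stay within the range 0 to 1 is sketched below, assuming a linear weighting of α and β with non-negative weights that sum to 1. The linear form, the default weight values, and the function name are assumptions made for illustration; the values of Kα and Kβ are stated in the description to be determined experimentally.

```python
def delta_similarity(alpha, beta, k_alpha=0.7, k_beta=0.3):
    """Weighted combination of alpha and beta (assumed linear form).

    k_alpha and k_beta are hypothetical weights; they must be non-negative
    and sum to 1 so that the result stays between 0 and 1.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    if k_alpha < 0 or k_beta < 0 or abs(k_alpha + k_beta - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return k_alpha * alpha + k_beta * beta


print(delta_similarity(0.5, 0.2))
```

Constraining the weights in this way is a design choice of the sketch: it guarantees that δ is 1 only when both α and β are 1, and 0 only when both are 0.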

【００６２】[0062]

[Effects of the Invention] As described above, the present invention adopts a configuration in which, for a plurality of input words, the word dictionary 1 is searched to extract the corresponding concept sets; the similarity α between the extracted concept sets is calculated; the concept system 2 is searched for the extracted concept sets to extract the corresponding superordinate concepts (and/or subordinate concepts); the similarity β between the extracted superordinate concepts (and/or subordinate concepts) is calculated; and the calculated similarity α and similarity β are each weighted to calculate the similarity δ, which is taken as the similarity between the words. Therefore, by using the word dictionary 1, which describes the grammatical and semantic information (concept sets) of words, and the language-independent concept system 2, which describes the superordinate-subordinate relationships between concepts, it is possible to calculate the similarity between polysemous words and between words in different languages, which could not be calculated conventionally.

[Brief description of drawings]

FIG. 1 is a configuration diagram of one embodiment of the present invention.

FIG. 2 is a flowchart explaining the operation of the present invention.

FIG. 3 is a flowchart for calculating the similarity α according to the present invention.

FIG. 4 is a flowchart for calculating the similarity β according to the present invention.

FIG. 5 is a flowchart for calculating the similarity δ according to the present invention.

FIG. 6 is a specific example (part 1) of the present invention.

FIG. 7 is a specific example (part 2) of the present invention.

FIG. 8 is a diagram showing the relationship between the word dictionary and the concept system of the present invention.

FIG. 9 is a specific example of the present invention.

[Explanation of symbols]

1: Word dictionary
11: Japanese word dictionary
12: English word dictionary
2: Concept system
3: Word dictionary search unit
4: Concept system search unit
5: Concept set similarity calculation unit (similarity α)
51: Concept set of W1
52: Concept set of W2
6: Concept set similarity calculation unit (similarity β)
61: Concept set of W1
62: Concept set of W2
7: Inter-word similarity calculation unit (similarity δ)

Claims

[Claims]

1. A similarity calculation method for calculating the similarity between words, comprising: providing a word dictionary (1) in which words and their concept sets are registered in advance, and a concept system (2) in which the superordinate concepts (and/or subordinate concepts) of the concept sets of the words registered in the word dictionary (1) are registered in advance; searching the word dictionary (1) for each of a plurality of input words and extracting the corresponding concept sets; calculating a similarity α between the extracted concept sets; searching the concept system (2) for the extracted concept sets and extracting the corresponding superordinate concepts (and/or subordinate concepts); calculating a similarity β between the extracted superordinate concepts (and/or subordinate concepts); and weighting the calculated similarity α and similarity β to calculate a similarity δ, the similarity δ being taken as the similarity between the words.

2. The method for calculating the similarity between words according to claim 1, wherein the word dictionary (1) is provided for each different language, and a common concept system (2) linked from these word dictionaries (1) is used.