JP2609196B2

JP2609196B2 - Similarity calculator

Info

Publication number: JP2609196B2
Application number: JP5061641A
Authority: JP
Inventors: Kozo Oi (大井耕三); Eiichiro Sumita (隅田英一郎); Hitoshi Iida (飯田仁)
Original assignee: ATR Interpreting Telephony Research Laboratories (株式会社エイ・ティ・アール自動翻訳電話研究所)
Priority date: 1993-03-22
Filing date: 1993-03-22
Publication date: 1997-05-14
Anticipated expiration: 2012-05-14
Also published as: JPH06274548A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to the field of information processing. It concerns in particular natural language processing such as machine translation, information retrieval, and question answering, as well as image processing and speech processing. More specifically, it relates to a device that calculates the similarity between a given piece of input information (hereinafter "input information") and each of the other pieces of information against which it is compared (hereinafter "comparison information").

[0002]

[Description of the Related Art] Conventional similarity calculation devices compute the similarity between input information and each piece of comparison information. They perform this calculation directly on the input information and the comparison information themselves as data.

[0003] Consider image retrieval as an example. Image retrieval is the process of searching a large collection of images, prepared in advance as comparison information, for images similar to an input image. The similarity between one input image and one comparison image has been computed from the color and gray-level data of every dot of each image, that is, from the image data itself.

[0004] In contrast, there are also cases in which the similarity is calculated using features obtained from external knowledge, features that do not appear in the input information itself. Machine translation is described below as such a case.

[0005] One type of machine translation system works as follows:

- It compares an input sentence against a large number of examples, each a pair of a source sentence and its translation.
- It finds the example most similar to the input sentence.
- It generates a translation based on that example.

In such a system, the input sentence is the input information and each example is comparison information. The similarity between them is computed from the similarity between words of the input sentence (hereinafter "input words") and words of the example (hereinafter "comparison words"). That word similarity uses information obtained from external knowledge such as a thesaurus. This information corresponds to the input word or the comparison word, but it concerns features that do not appear in the word itself.

[0006] Conventional devices that calculate similarity in this way, using feature information obtained from external knowledge, have worked sequentially. They compute the similarity between the input information and each piece of comparison information one piece at a time.

[0007] FIG. 12 is a flowchart of the procedure by which such a conventional device computes the similarity between the input information and each piece of comparison information.

[0008] Suppose, for example, that the similarity calculation is performed on an electronic computer. The comparison features correspond to the comparison information but do not appear in it, and they are stored in advance in the computer's memory. The calculation then proceeds as follows.

[0009] Referring to FIG. 12, step S001 first extracts the input features from the external knowledge. These features correspond to the input information but do not appear in it.

[0010] In steps S002 to S005, the comparison features stored in the computer's memory are fetched one at a time into the CPU (central processing unit). The similarity between each of them and the features of the input information is then calculated in turn.

[0011]

[Problems to Be Solved by the Invention] Calculating similarity in this way takes a very long processing time. The features used correspond to the input information and the comparison information but do not appear in them.

Consider a machine translation system that compares an input sentence with a large number of examples, finds the most similar one, and generates a translation from it. More accurate translation requires more examples. With tens of thousands of examples, however, comparing even a short input sentence against all of them takes several minutes. Real-time processing has therefore been impossible with this type of machine translation system.

[0012] Accordingly, an object of the invention according to claim 1 is to provide a similarity calculation device that greatly increases calculation speed. The similarity in question is computed between the input information and each piece of comparison information. It uses input features corresponding to the input information and comparison features corresponding to the comparison information, both obtained from external knowledge.

[0013]

[Means for Solving the Problems] As shown in FIG. 8, the similarity calculation device according to claim 1 includes:

- means 32 for inputting information (the input information) whose similarity to each of a plurality of predetermined pieces of comparison information is to be calculated;
- means 24 for storing external knowledge used to derive features that correspond to given information but do not appear in that information itself;
- means 60 for deriving the similarity between each piece of comparison information and the input information. It does this by performing a predetermined similarity calculation between the features of the comparison information and the feature of the input information, all derived using the external knowledge; and
- means 38 for outputting the derived similarities.

The means 60 for deriving the similarity includes content-addressable storage means 66. For each piece of comparison information, the storage means 66 has a first area and a second area. The first area stores the corresponding feature. The second area stores the result of comparing that feature with the feature derived for the input information. The storage means 66 can simultaneously detect all pieces of comparison information for which a designatable partial area within the first and second areas matches a given comparison target.

The means 60 for deriving the similarity further includes:

- first match detecting means 62, 64. These simultaneously detect every piece of comparison information whose first area matches the input information's feature in a predetermined portion. They then store, in one step, a value indicating that a match was detected. The value goes into the specific partial area of every corresponding second area that corresponds to that predetermined portion;
- means 62, 64 for repeating the detection and storage operations of the first match detecting means 62, 64 a desired number of times, changing the designated portion each time; and
- second match detecting means 62, 64. These simultaneously detect, in the storage means 66, the second areas holding a value that matches a given value corresponding to a certain similarity. They then derive the similarity for the comparison information of each detected second area by associating that comparison information with the certain similarity.

[0014]

[0015]

[0016]

[0017]

[0018]

[0019]

[0020]

[0021]

[0022]

[Operation] In the similarity calculation device according to claim 1, the similarity between the input information and each piece of comparison information is calculated in parallel. The calculation uses input features corresponding to the input information and comparison features corresponding to the comparison information, both obtained from external knowledge. Similarity can therefore be calculated faster than with conventional sequential processing.

[0023] Parallel operation is achieved by performing the similarity calculation with content-addressable storage means. The calculation proceeds as follows:

1. For each piece of comparison information, the corresponding feature is stored in a first area.
2. All pieces of comparison information whose first area matches the input information's feature in a predetermined portion are detected simultaneously.
3. A value indicating that a match was detected is written, in one step, into the second area of each detected piece. It goes into the specific partial area that corresponds to the predetermined portion.
4. Steps 2 and 3 are repeated a desired number of times, changing the designated portion each time. The second area of each piece then records how closely that piece matches the input in these portions.
5. The storage means simultaneously detects the second areas whose values match a given value corresponding to a certain similarity. The comparison information of those areas is associated with that similarity, which yields its similarity.

The number of comparisons therefore does not depend on the number of pieces of comparison information. It depends only on the number of predetermined portions searched by the first match detecting means, which is far smaller. As a result, the means for deriving the similarity performs far fewer comparisons than it would by comparing each piece of comparison information one at a time.
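The procedure of [0023] can be simulated in software. The sketch below rests on two assumptions:

- The 15-bit word layout is the one the third embodiment describes for FIG. 9: three match-flag bits followed by three 4-bit digit fields.
- The flag pattern for each similarity level is inferred here from the FIG. 6 rules and is not stated in the text.

The names `pack`, `cam_similarities`, and `LEVELS` are hypothetical. Each list comprehension or inner loop stands in for one associative-memory operation that hardware would apply to all words at once. The `operations` counter, not Python's running time, is what illustrates the point.

```python
# Hypothetical software model of the associative-memory procedure of [0023].
# Assumed word layout (15 bits): bits 14..12 = match flags for the hundreds,
# tens and ones digits; bits 11..0 = the comparison code as three 4-bit digits.
FLAG_SHIFT = 12
DIGIT_SHIFTS = (8, 4, 0)  # hundreds, tens, ones

# Assumed (similarity, flag key, flag mask) triples encoding the FIG. 6 rules.
LEVELS = (
    (3, 0b111, 0b111),  # all three digits match
    (2, 0b110, 0b111),  # hundreds and tens match, ones differ
    (1, 0b100, 0b110),  # hundreds match, tens differ (ones ignored)
    (0, 0b000, 0b100),  # hundreds differ
)


def pack(code: int) -> int:
    """Store a 3-digit code as three 4-bit digits with all flag bits cleared."""
    return ((code // 100) << 8) | ((code // 10 % 10) << 4) | (code % 10)


def cam_similarities(input_code: int, comparison_codes: list[int]) -> tuple[list[int], int]:
    words = [pack(c) for c in comparison_codes]
    key_word = pack(input_code)
    operations = 0
    # Phase 1: per digit position, one masked search over all words,
    # then one parallel write of that position's flag bit into every hit.
    for flag_index, shift in enumerate(DIGIT_SHIFTS):
        mask = 0xF << shift
        hits = [i for i, w in enumerate(words) if (w & mask) == (key_word & mask)]
        flag_bit = 1 << (FLAG_SHIFT + 2 - flag_index)  # bit 14, 13, then 12
        for i in hits:
            words[i] |= flag_bit
        operations += 2  # one search + one parallel write
    # Phase 2: one masked search on the flag field per similarity level.
    result = [-1] * len(words)
    for level, key, mask in LEVELS:
        for i, w in enumerate(words):
            if ((w >> FLAG_SHIFT) & mask) == key:
                result[i] = level
        operations += 1
    return result, operations
```

`cam_similarities(457, [426, 149, 458, 732])` returns `([1, 0, 2, 0], 10)`. The count stays at 10 operations whether there are 4 or 900 comparison codes, because only the three digit positions and four similarity levels are iterated.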

[0024]

[Embodiments] Three embodiments of the present invention are described below in order. Each is a device that calculates the similarity between one piece of input information and 1000 pieces of comparison information. Words serve as the information. Each example calculates the similarity between an entered word (hereinafter "input word") and 1000 words to be compared (hereinafter "comparison words").

[0025] The invention is described below using similarity between words as an example, but it is not limited to comparing words. It can also be applied to similarity calculation for image information (as mentioned above), speech information, and the like.

[0026] Before the embodiments are described, the context in which they are used is explained. Consider the type of machine translation system mentioned above. It compares an input sentence against a large number of examples (pairs of a source sentence and its translation), finds the most similar example, and generates a translation from it. The most similar example is found by calculating the similarity of each word in the input sentence and the example. The translation is then generated from that example.

[0027] Word similarity is obtained as follows. First, a thesaurus is prepared as external knowledge. From this thesaurus, a "synonym code" is obtained for the input word or comparison word as its feature information. The similarity between words is then computed from these synonym codes.

[0028] The thesaurus is a dictionary in which each word is assigned a synonym code, based on a hierarchical classification of word meanings. FIG. 2 shows part of the classification system of a four-level thesaurus.

[0029] Referring to FIG. 2, the synonym code assigned to each word is a three-digit decimal number. Its digits map to the classification levels as follows:

- hundreds digit: major category
- tens digit: middle category
- ones digit: minor category

In the example of FIG. 2, the word 取材 ("news gathering") belongs to the class with major category [取引] ("transactions"), middle category [報道] ("reporting"), and minor category [編集] ("editing"). The synonym code of that class is 457.

[0030] This embodiment uses three-digit decimal synonym codes, as shown in FIG. 2. The invention is not limited to such codes. Information in various other forms, such as general n-digit base-m numbers or vectors, can be processed in the same way as synonym codes.

[0031] (1) First Embodiment
Referring to FIG. 3, the similarity calculation device of the first embodiment includes:

- an input unit 32
- an input feature extraction unit 34
- external knowledge 24
- a similarity calculation unit 36
- an output unit 38

[0032] The input unit 32 consists of a keyboard, a character recognition device, a speech recognition device, or the like. It supplies the input word to the input feature extraction unit 34.

[0033] The external knowledge 24 consists of a hard disk, memory, or the like. As shown in FIG. 4, it stores a large number of entries, each pairing a word's headword with its synonym code.

[0034] The input feature extraction unit 34 receives the input word from the input unit 32. It extracts the corresponding synonym code (hereinafter "input code") from the external knowledge 24 and passes it to the similarity calculation unit 36.

[0035] The similarity calculation unit 36 includes 1000 computers 40-1 to 40-1000. Each computer's input is connected to the output of the input feature extraction unit 34, and its output to the input of the output unit 38. Each computer includes a CPU and a memory. Its memory stores a similarity calculation program and the synonym code of one comparison word (hereinafter "comparison code").

The computers receive the input code from the input feature extraction unit 34. Each computes the similarity between that input code and the comparison code in its own memory, by the method described below. It then supplies the result to the output unit 38.

FIG. 5 shows an example of the comparison codes stored in the computers 40-1 to 40-1000. In FIG. 5, for example, computer 40-1 stores the comparison code "426". As shown in FIG. 4, this is the synonym code for the comparison word 販売 ("sales"). Likewise, each of the other comparison codes is the synonym code of some word in the external knowledge 24 shown in FIG. 4.

[0036] The output unit 38 consists of a display device, a printer, or the like. It outputs the similarities supplied by the computers 40-1 to 40-1000 of the similarity calculation unit 36.

[0037] FIG. 6 shows how each of the computers 40-1 to 40-1000 of FIG. 3 determines the similarity between the input code and its comparison code. Its columns are:

- leftmost: the condition that holds between the input code and the comparison code
- middle: the similarity when that condition is met
- rightmost: an example pair of input and comparison codes that satisfies the condition

[0038] In the condition column:

- I1, I2, and I3 denote the hundreds, tens, and ones digits of the input code.
- C1, C2, and C3 denote the hundreds, tens, and ones digits of the comparison code.

[0039] Referring to FIG. 6, the rows give the following rules:

- Second row, condition (a): if the input code and comparison code match in the hundreds, tens, and ones digits, the similarity is 3.
- Third row, condition (b): if the hundreds and tens digits match and only the ones digits differ, the similarity is 2.
- Fourth row, condition (c): if only the hundreds digits match and the tens digits differ, the similarity is 1.
- Fifth row, condition (d): if the hundreds digits differ, the similarity is 0.

[0040] The operation of the device of FIG. 3 is described below, taking the input word 取材 ("news gathering") as an example. When 取材 is entered through the input unit 32, it is passed to the input feature extraction unit 34.

[0041] The input feature extraction unit 34 extracts the synonym code (input code) for the input word 取材 from the external knowledge 24. The fourth row of FIG. 4 gives the synonym code of 取材 as 457, so 457 is obtained as the input code. This input code is supplied to all of the computers 40-1 to 40-1000 of the similarity calculation unit 36.

[0042] On receiving the input code "457", each computer 40-1 to 40-1000 performs the similarity calculation independently. It compares the input code "457" with the comparison code assigned to it and determines the similarity according to the rules of FIG. 6.

[0043] Referring to FIG. 5, the computers compute the following similarities:

- Computer 40-1 stores the comparison code "426". Condition (c) in the fourth row of FIG. 6 gives a similarity of 1.
- Computer 40-2 stores "149". Condition (d) gives a similarity of 0.
- Computer 40-3 stores "458". Condition (b) gives a similarity of 2.
- Computer 40-1000 stores "732". Condition (d) gives a similarity of 0.

The remaining computers 40-4 to 40-999 perform the same calculation. Each computer of the similarity calculation unit 36 then supplies its similarity to the output unit 38.

[0044] Once all the similarities from the computers 40-1 to 40-1000 have been received, the output unit 38 outputs them to a display device, printer, or the like.

[0045] In this first embodiment, the computers 40-1 to 40-1000 perform the 1000 similarity calculations in parallel. By a simple estimate, the calculation is therefore about 1000 times faster than conventional sequential processing.

[0046] (2) Second Embodiment
FIG. 7 is a schematic diagram of the similarity calculation device of the second embodiment. Referring to FIG. 7, this device includes:

- an input/output management unit 50
- a network 52, to which the input/output management unit 50 is connected
- 1000 similarity calculation units 54-1 to 54-1000, connected to the network 52

[0047] The input/output management unit 50 includes the input unit 32, the external knowledge 24, the input feature extraction unit 34, and the output unit 38. Blocks with the same reference numerals and names in FIG. 7 and FIG. 3 are identical and have the same functions, so they are not described again here. The input/output management unit 50 is implemented by a workstation, a personal computer, or the like.

[0048] The external knowledge 24 stores the data shown in FIG. 4. Each similarity calculation unit 54-1 to 54-1000 is a workstation, a personal computer, or the like, and includes a CPU and a memory. Its memory stores a similarity calculation program and the synonym code (comparison code) of one comparison word.

[0049] The second embodiment operates as follows:

1. The input unit 32, such as a keyboard, supplies an input word to the input feature extraction unit 34.
2. The input feature extraction unit 34 extracts the corresponding input code from the external knowledge 24. It sends the code over the network 52 to all of the similarity calculation units 54-1 to 54-1000.
3. Each unit computes the similarity between the received input code and the comparison code in its memory, by the same procedure as in the first embodiment.
4. Each unit sends its result over the network 52 to the input/output management unit 50.
5. The output unit 38 of the input/output management unit 50 outputs the received similarities. As before, it consists of a display device, printer, or the like.

[0050] In the second embodiment as well, the similarity calculation units 54-1 to 54-1000 each calculate similarity independently, so the 1000 calculations run in parallel. By a simple estimate, the calculation is therefore about 1000 times faster than conventional sequential processing.

[0051] (3) Third Embodiment
FIG. 8 is a block diagram showing the schematic configuration of the similarity calculation device of the third embodiment. Referring to FIG. 8, this device includes:

- the input unit 32
- the external knowledge 24
- the input feature extraction unit 34
- a similarity calculation unit 60
- the output unit 38

Blocks with the same reference numerals and names in FIG. 8 and FIG. 3 are identical and have the same functions, so they are not described again here.

[0052] The similarity calculation unit 60 includes a CPU 62, a memory 64, and an associative memory 66 that is addressable by content.

[0053] The memory 64 stores the similarity calculation program. It also provides an area for storing the similarity of each piece of comparison information.

[0054] Each word of the associative memory 66 holds, stored in advance, the synonym code (comparison code) of one comparison word. In total, 1000 comparison codes are stored in 1000 words.

[0055] The associative memory 66 has a masked match search function and a partial parallel write function:

- The masked match search function performs a match search on a specified portion of the bit string.
- The partial parallel write function writes data in parallel into specified bits of every data word for which a match was detected.

[0056] FIG. 9 shows the data storage format of the associative memory 66. Referring to FIG. 9, the associative memory 66 provides 1000 areas, one for each of comparison codes 1 to 1000. The area for each comparison code (for example, the area for comparison code 1) consists of 15 bits in total. The first three bits are used to obtain the similarity, and the remaining 12 bits store the comparison code.

[0057] Of the three bits used to obtain the similarity, the first bit indicates whether the hundreds digits of the input code and the comparison code match, the second bit indicates whether their tens digits match, and the third bit indicates whether their ones digits match.

[0058] The 12 bits holding the comparison code are divided into three 4-bit fields. As shown in FIG. 9, these are, in order, 4 bits representing the hundreds digit of the comparison code, 4 bits representing the tens digit, and 4 bits representing the ones digit.
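The 15-bit word layout described above can be illustrated with a small packing helper. This is a sketch that rests on assumptions. The patent only says that each 4-bit field "represents" a digit, so encoding each decimal digit in binary (BCD) is assumed here, and the function names are hypothetical. The three flag bits occupy the top of the word, with the hundreds-digit flag first.

```python
CODE_BITS = 12  # three 4-bit digit fields: hundreds | tens | ones


def pack_word(code: int, flags: int = 0) -> int:
    """Pack a 3-digit synonym code and 3 flag bits into one 15-bit word (hypothetical helper)."""
    if not 0 <= code <= 999 or not 0 <= flags <= 0b111:
        raise ValueError("code must be 0-999 and flags must fit in 3 bits")
    hundreds, tens, ones = code // 100, (code // 10) % 10, code % 10
    code_field = (hundreds << 8) | (tens << 4) | ones  # one BCD digit per nibble
    return (flags << CODE_BITS) | code_field


def unpack_word(word: int) -> tuple[int, int]:
    """Return (flags, code) from a packed 15-bit word."""
    flags = word >> CODE_BITS
    hundreds, tens, ones = (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF
    return flags, hundreds * 100 + tens * 10 + ones


print(format(pack_word(457), "015b"))  # prints 000010001010111 (flags 000, then digits 4, 5, 7)
```

With BCD fields, the packed code reads the same in hexadecimal as the decimal code: `pack_word(457)` equals `0x457`.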

[0059] Before the similarity calculation, the first three bits of every comparison code area are assumed to be set to an initial value, for example all 0.

[0060] The operation of the similarity calculating apparatus of the third embodiment is described below. When an input word is entered through the input unit 32, which consists of a keyboard, a character recognition device, a speech recognition device, or the like, the input word is given to the input feature extraction unit 34. The input feature extraction unit 34 extracts the synonym code (input code) for the given input word from the external knowledge 24. The data stored in the external knowledge 24 are the same as those shown in FIG. 4.

[0061] The extracted input code is given to the similarity calculation unit 60. The unit calculates the similarity between the input code and each of the 1000 comparison codes as described below, and gives the result to the output unit 38.

[0062] The similarity calculation unit 60 performs the following similarity calculation using the associative memory 66. The similarity is defined in the same way as in the first embodiment, described with reference to FIG. 6.

[0063] The operation of the similarity calculation unit 60 when the word 「取材」 ("interview") is entered as the input word is described below. As shown in FIG. 4, the synonym code (input code) for the input word 「取材」 is "457". When the input code "457" is given to the similarity calculation unit 60, the unit operates as follows.

[0064] First, as shown in FIG. 11, an area is set up in the memory 64 of FIG. 8 with the same data structure as the comparison code storage areas of the associative memory shown in FIG. 10. This area consists of eight rows: areas 70, 72, 74, 76, 78, 80, 82, and 84. The input code "457" is stored in the last 12 bits of area 70 in the first row, and the first three bits of area 70 are all 0. Area 70 in the first row is called the input search code.

[0065] Area 72 in the second row stores the input search code with every bit masked except the 4 bits representing its hundreds digit. Area 74 stores the input search code with every bit masked except the 4 bits representing its tens digit. Area 76 stores the input search code with every bit masked except the 4 bits representing its ones digit. Area 78 stores "111" in its first three bits, with all other bits masked. Area 80 stores "110" in its first three bits, with all other bits masked. Area 82 stores "10" in its first two bits, with all other bits masked. Area 84 stores "0" in its first bit, with all other bits masked. The similarity calculation then proceeds as follows.
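Each of the eight rows of FIG. 11 can be treated as a (value, mask) pair for the masked search. The sketch below builds these pairs for an input code under assumptions made only for this software model: BCD digit fields, the three flag bits placed above the 12 code bits, and a mask bit of 1 meaning "compare this bit" (the patent describes the complementary bits as masked). The dictionary keys are hypothetical labels named after the area numbers.

```python
CODE_BITS = 12
FLAG_FIELD = 0b111 << CODE_BITS  # the three leading flag bits


def flag_bits(bits: int) -> int:
    """Place a 3-bit flag pattern in the flag field of a 15-bit word."""
    return bits << CODE_BITS


def build_search_rows(input_code: int) -> dict[str, tuple[int, int]]:
    """Return {area label: (value, mask)} for the search rows of FIG. 11."""
    hundreds, tens, ones = input_code // 100, (input_code // 10) % 10, input_code % 10
    search_code = (hundreds << 8) | (tens << 4) | ones  # area 70: flags 000 + input code
    return {
        "area70": (search_code, FLAG_FIELD | 0xFFF),     # unmasked input search code
        "area72": (search_code, 0xF00),                  # hundreds digit only
        "area74": (search_code, 0x0F0),                  # tens digit only
        "area76": (search_code, 0x00F),                  # ones digit only
        "area78": (flag_bits(0b111), FLAG_FIELD),        # flags == 111
        "area80": (flag_bits(0b110), FLAG_FIELD),        # flags == 110
        "area82": (flag_bits(0b100), flag_bits(0b110)),  # first two flag bits == 10
        "area84": (flag_bits(0b000), flag_bits(0b100)),  # first flag bit == 0
    }
```

For the input code 457, area 72 becomes `(0x457, 0xF00)`, so only the hundreds nibble `0x4` takes part in the comparison.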

[0066] A match search instruction using the data in area 72 (the search code with every bit masked except the 4 bits of the hundreds digit) is given to the associative memory 66. It is followed by an instruction to write 1 into the hundreds-digit match/mismatch bit of every matching comparison code.

[0067] A match search instruction using the data in area 74 (the search code with every bit masked except the 4 bits of the tens digit) is given to the associative memory 66. It is followed by an instruction to write 1 into the tens-digit match/mismatch bit of every matching comparison code.

[0068] A match search instruction using the data in area 76 (the search code with every bit masked except the 4 bits of the ones digit) is given to the associative memory 66. It is followed by an instruction to write 1 into the ones-digit match/mismatch bit of every matching comparison code.

[0069] After the processing of paragraphs [0066] to [0068], the data in the associative memory 66 are as shown in FIG. 10. Comparison code 1 (426) shares only its hundreds digit with the input code "457". As shown in the area for comparison code 1 in FIG. 10, only its hundreds-digit match/mismatch bit is therefore "1", and the other two bits are "0". Comparison code 2 (149) matches the input code in no digit position, so its first three bits are "000". Likewise, the first three bits of comparison code 3 are "110".
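The flag-setting steps of paragraphs [0066] to [0069] can be traced with a short simulation. It makes the same assumptions as a software model: BCD digit fields, flag bits above the 12 code bits, and Python loops standing in for one parallel search or one parallel write each. The text does not give the value of comparison code 3. The code 451 below is hypothetical and was chosen only because it reproduces the flags "110".

```python
CODE_BITS = 12
DIGIT_MASKS = (0xF00, 0x0F0, 0x00F)  # hundreds, tens, ones fields
FLAG_BITS = tuple(bit << CODE_BITS for bit in (0b100, 0b010, 0b001))  # first, second, third flag


def bcd(code: int) -> int:
    """Encode a 3-digit code with one decimal digit per 4-bit field."""
    return ((code // 100) << 8) | (((code // 10) % 10) << 4) | (code % 10)


def set_match_flags(input_code: int, words: list[int]) -> list[int]:
    """One masked search plus one parallel write per digit ([0066]-[0068])."""
    key = bcd(input_code)
    words = list(words)  # flags are assumed to start at 000, as in [0059]
    for digit_mask, flag in zip(DIGIT_MASKS, FLAG_BITS):
        hits = [(w & digit_mask) == (key & digit_mask) for w in words]  # parallel search
        words = [w | flag if hit else w for w, hit in zip(words, hits)]  # parallel write of 1
    return words


after = set_match_flags(457, [bcd(426), bcd(149), bcd(451)])
print([format(w >> CODE_BITS, "03b") for w in after])  # ['100', '000', '110']
```

Setting a flag with a bitwise OR is equivalent to writing 1 here only because the flags were cleared beforehand.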

[0070] A match search instruction using the data in area 78 of FIG. 11 is given to the associative memory 66, and the matching comparison codes are detected. The memory 64 of FIG. 8 contains areas, prepared in advance, that hold the similarity of each piece of comparison information. For every comparison code in which a match was detected, "3" is stored as the similarity in the corresponding area (condition (a) in FIG. 6).

[0071] A match search instruction using the data in area 80 of FIG. 11 is given to the associative memory 66, and the matching comparison codes are detected. "2" is then stored as the similarity in the area of the memory 64 corresponding to each comparison code in which a match was detected (condition (b) in FIG. 6).

[0072] A match search instruction using the data in area 82 of FIG. 11 is given to the associative memory 66, and the matching comparison codes are detected. "1" is then stored as the similarity in the area of the memory 64 corresponding to each comparison code in which a match was detected (condition (c) in FIG. 6).

[0073] A match search instruction using the data in area 84 of FIG. 11 is given to the associative memory 66, and the matching comparison codes are detected. "0" is then stored as the similarity in the area of the memory 64 corresponding to each comparison code in which a match was detected (condition (d) in FIG. 6).
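Paragraphs [0070] to [0073] convert the flag bits into similarities with four more masked searches. The sketch below keeps the earlier assumptions: flag bits above the 12 code bits, loops standing in for parallel searches, and illustrative names. The digit descriptions in the comments are inferred from the flag patterns. The condition letters refer to FIG. 6, which is not reproduced here. The four patterns are mutually exclusive and together cover all eight flag values, so every comparison code receives exactly one similarity whatever the search order. Apart from 426 and 149, the codes in the example are hypothetical, and their flags are set to the values that the flag-setting steps would produce for the input code 457.

```python
CODE_BITS = 12

# (flag value, flag mask, similarity) for areas 78, 80, 82 and 84 of FIG. 11.
FLAG_QUERIES = (
    (0b111, 0b111, 3),  # condition (a): all three digits match
    (0b110, 0b111, 2),  # condition (b): hundreds and tens match, ones differ
    (0b100, 0b110, 1),  # condition (c): hundreds match, tens differ (ones ignored)
    (0b000, 0b100, 0),  # condition (d): hundreds differ (rest ignored)
)


def similarities_from_flags(words: list[int]) -> list[int]:
    """Assign each word the similarity of the flag query it matches."""
    scores = [-1] * len(words)  # -1 would mean "no query matched"; cannot happen here
    for value, mask, score in FLAG_QUERIES:
        for i, w in enumerate(words):  # one parallel match search in hardware
            if ((w >> CODE_BITS) & mask) == value:
                scores[i] = score  # stored in the similarity area of memory 64
    return scores


words = [
    (0b100 << 12) | 0x426, (0b000 << 12) | 0x149, (0b110 << 12) | 0x451,
    (0b111 << 12) | 0x457, (0b101 << 12) | 0x407,
]
print(similarities_from_flags(words))  # [1, 0, 2, 3, 1]
```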

[0074] The similarities between the input code and all the comparison codes are then given to the output unit 38. The match search and write instructions issued in paragraphs [0066] to [0073] are executed in parallel on all the comparison codes in the associative memory. Therefore, as in the first embodiment, a simple estimate puts the calculation at about 1000 times faster than conventional sequential processing, so the similarity can be computed at high speed.

[0075] As described above, the similarity calculating apparatus according to the present invention obtains the similarity between the input information and each piece of comparison information through a similarity calculation performed in parallel. The calculation uses an input feature corresponding to the input information and comparison features corresponding to the comparison information, both acquired from external knowledge. This allows fast processing whenever similarity must be computed from features that correspond to the input information. One example is a machine translation system that compares an input sentence with a large number of examples (pairs of source and translated sentences), finds the most similar example, and generates a translation from it. Another is information retrieval, such as image or document retrieval, where the similarity between a large amount of stored information and the input information must be found.

【００７６】[0076]

[Effects of the Invention] As described above, the similarity calculating apparatus according to claim 1 performs a predetermined similarity calculation in parallel to derive the similarities. The calculation takes place between the feature derived for the input information and each of the features of the comparison information, which are derived using external knowledge for a plurality of pieces of comparison information. In particular, a storage means that can be referenced by its stored contents allows a predetermined portion of the features of all the comparison information to be compared with the corresponding portion of the input feature in a single operation. Performing this operation far fewer times than there are pieces of comparison information yields information indicating the similarity between the features of all the comparison information and the feature of the input information. The invention therefore provides a similarity calculating apparatus that computes similarity from comparison features much faster than conventional sequential processing.

[Brief description of the drawings]

FIG. 1 is a diagram showing the schematic configuration of a similarity calculating apparatus according to the present invention.

FIG. 2 is a diagram showing part of the classification system of a thesaurus.

FIG. 3 is a block diagram showing the schematic configuration of a similarity calculating apparatus according to the first embodiment of the present invention.

FIG. 4 is a schematic diagram showing the contents of the external knowledge.

FIG. 5 is a diagram showing the comparison codes stored in the computer memory in the similarity calculation unit of the similarity calculating apparatus according to the first embodiment of the present invention.

FIG. 6 is a diagram showing how the similarity between an input code and a comparison code is determined in the present invention.

FIG. 7 is a block diagram showing the schematic configuration of a similarity calculating apparatus according to the second embodiment of the present invention.

FIG. 8 is a block diagram showing the schematic configuration of a similarity calculating apparatus according to the third embodiment of the present invention.

FIG. 9 is a diagram showing the storage format of the comparison codes in the associative memory of the similarity calculating apparatus according to the third embodiment of the present invention.

FIG. 10 is a diagram showing the comparison code data in the associative memory according to the third embodiment of the present invention.

FIG. 11 is a diagram showing the search data held in memory for the similarity calculation performed by the similarity calculating apparatus according to the third embodiment of the present invention.

FIG. 12 is a diagram showing the procedure of similarity calculation in a conventional similarity calculating apparatus.

[Explanation of symbols]

24 External knowledge
32 Input unit
34 Input feature extraction unit
36 Similarity calculation unit
38 Output unit
50 Input/output management unit
52 Network
60 Similarity calculation unit
66 Associative memory

Continuation of front page
(72) Inventor: 飯田仁 (Hitoshi Iida), c/o 株式会社エイ・ティ・アール自動翻訳電話研究所 (ATR Interpreting Telephony Research Laboratories), 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto Prefecture
(56) References cited: JP-A-2-235176; JP-A-2-297670; JP-A-58-205282

Claims

(57) [Claims]

1. A similarity calculating apparatus comprising:
means for inputting information whose similarity to each of a plurality of predetermined pieces of comparison information is to be calculated;
means for storing external knowledge for deriving, for given information, a corresponding feature that does not appear in the given information itself;
means for performing a predetermined similarity calculation between each of the features of the comparison information, derived using the external knowledge for the plurality of pieces of comparison information, and the feature derived using the external knowledge for the input information, thereby deriving the similarity between each piece of comparison information and the input information; and
means for outputting the derived similarities;
wherein the means for deriving the similarity includes storage means that can be referenced by its stored contents and that has, for each of the plurality of pieces of comparison information, a first area storing the corresponding feature and a second area storing the result of comparing that feature with the feature derived for the input information;
the storage means is capable of simultaneously detecting all pieces of comparison information for which the stored contents of a specifiable predetermined partial area match a given comparison target; and
the means for deriving the similarity further includes: first match detecting means for simultaneously detecting all first areas whose stored contents match, in a predetermined portion, the corresponding predetermined portion of the feature derived for the input information, and for simultaneously storing, in a specific partial area of every corresponding second area associated with that predetermined portion, a predetermined value indicating that a match was detected; means for causing the first match detecting means to repeat its detecting and storing operation a desired number of times while changing the designation of the predetermined portion; and second match detecting means for simultaneously detecting, in the storage means, every second area holding a value that corresponds to a given value associated with a certain similarity, and for deriving the similarity of the comparison information by associating the comparison information corresponding to each detected second area with that similarity.