JP3422396B2

JP3422396B2 - Similarity search method based on viewpoint

Info

Publication number: JP3422396B2
Application number: JP23158995A
Authority: JP
Inventors: 要笠原; 和光松澤
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-09-08
Filing date: 1995-09-08
Publication date: 2003-06-30
Anticipated expiration: 2015-09-08
Also published as: JPH0981578A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a similarity search method that retrieves similar words on the basis of word similarity, using a database of semantic attributes of words. More specifically, it relates to a viewpoint-based similarity search method that retrieves similar words on the basis of a viewpoint, that is, a feature concept given particular weight in the search.

[0002]

[Prior Art] In recent years, dramatic improvements in the processing power of computers, typified by workstations (WS), and the advent of large-capacity, inexpensive storage media have led to the creation and use of a great many databases of various kinds. With such databases, elements are retrieved and classified by judging the similarity among the individual data items that make up the database (hereinafter called elements), or between externally supplied data and the elements. For example, in a document database that stores information on various documents, a search is made for documents similar to a designated document. For collections of text data such as patent specifications, items with similar content are also frequently collected and classified. Furthermore, when similarity search is performed on an image database, several feature words describing the image are assigned to each element, and the search is carried out using these feature words.

[0003] Such similarity search requires a judgment of similarity between elements. A representative approach uses feature concepts, and it is described here. In this method, feature concepts are assigned to each element in the database, and similar elements are identified by comparing them. A feature concept, also called a keyword or an attribute, is expressed by a word that represents a feature of the element. In a database about animals, for example, words such as "hoof", "mane", "livestock", and "runs" can serve as feature concepts of the element "horse". In the technique known as full-text search, each word contained in a text is treated as a feature concept of that text, and the similarity between texts is judged accordingly.

[0004] There are various ways to judge the similarity between elements that have each been assigned several such feature concepts. Examples include assigning a similarity of 1 when the compared elements share a feature concept and 0 when they do not, and assigning a similarity proportional to the number of feature concepts that match between the compared elements.
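The two similarity measures mentioned above can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the `scale` parameter, and the sample feature sets are assumptions made for the example.

```python
def binary_similarity(a: set, b: set) -> int:
    """Return 1 if the two elements share at least one feature concept, else 0."""
    return 1 if a & b else 0


def count_similarity(a: set, b: set, scale: float = 1.0) -> float:
    """Return a similarity proportional to the number of shared feature concepts."""
    return scale * len(a & b)


# Hypothetical feature sets, for illustration only.
horse = {"hoof", "mane", "livestock", "runs"}
cow = {"hoof", "livestock", "milk"}
tuna = {"fish", "sea"}

shared_binary = binary_similarity(horse, cow)   # the sets overlap
shared_count = count_similarity(horse, cow)     # "hoof" and "livestock" are shared
no_overlap = binary_similarity(horse, tuna)     # nothing in common
```

Representing each element as a set of words keeps both measures to a single set intersection.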

[0005] Similar elements are retrieved using this feature-concept-based similarity judgment. In the usual method, when the database is searched for elements similar to a given element, or to externally supplied data that has been assigned feature concepts, the similarity between that item and every element in the database is calculated, and the elements with high similarity are output as the search result.

[0006]

[Problems to Be Solved by the Invention] The conventional similarity search methods described above can only return a fixed search result, even though the similarity relationships between elements vary with the situation, and therefore cannot perform a flexible similarity search suited to the circumstances of the search. This problem is explained using, as an example, a database in which the words "apple", "nandina", "sea bream", "monkey", "tangerine", and "chili pepper" each have feature concepts.

[0007] FIG. 6 shows an example of a word database. When this database is searched for elements similar to element number 1, "apple", the usual procedure is to compare feature concepts, obtain the similarity between elements, and output a search result based on that similarity. For example, if the similarity is defined as the number of feature concepts shared by the compared elements, the similarity to "apple" is 2 for "nandina", 1 for "monkey", 3 for "tangerine", and 1 for "chili pepper", so "tangerine", which has the highest similarity, is output as the search result.
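The conventional search just described can be sketched as below. FIG. 6 is not reproduced in this text, so the `DB` dictionary is a hypothetical stand-in whose feature words were chosen only so that the shared-feature counts match the figures quoted above (2, 1, 3, 1). The English tokens stand in for the Japanese words, "nandina" is the plant 南天 (nanten), and the similarity of "sea bream" is not stated in the text, so its features here are likewise an assumption.

```python
# Hypothetical stand-in for the word database of FIG. 6 (not the actual figure).
# "red" stands for the feature concept 赤い and "red color" for 赤色.
DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "berry", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red color", "edible", "spicy", "pod"},
}


def conventional_search(db, query):
    """Score every other element by shared feature concepts and pick the best."""
    target = db[query]
    scores = {
        name: len(target & feats)
        for name, feats in db.items()
        if name != query
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = conventional_search(DB, "apple")
```

With this stand-in data the best match is "tangerine", which does not carry the feature "red" at all; that is the limitation the following paragraphs discuss.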

[0008] However, when the user expects a search result that emphasizes the feature concept "red" of "apple", this search method returns "tangerine", whose feature concepts do not include "red". It cannot return "nandina" or "chili pepper", which do include "red" and which a comparison of feature concepts would mark as similar words.

[0009] A feature concept that should be emphasized in a search in this way is called a viewpoint. Because searches are carried out under a variety of situations and conditions, the viewpoint of a search changes accordingly. Even for similarity searches on the same element, the result must reflect the viewpoint of the search.

[0010] On the other hand, simply treating the viewpoint as a search keyword does not help: a search that ignores the feature concepts of the element being searched for retrieves elements entirely unrelated to it. For example, conventional keyword search, which returns the words that have the viewpoint "red", yields the unexpected "monkey" along with "nandina". Moreover, the feature concept "red color" (赤色) of "chili pepper" is synonymous with "red" (赤い), so "chili pepper" ought to appear in the search result, yet the keyword search does not include it.
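Both shortcomings of plain keyword search can be reproduced with a few lines. As before, this is only a sketch: `DB` is a hypothetical stand-in for FIG. 6, with "red" standing for 赤い and "red color" for 赤色, and `keyword_search` is an illustrative name.

```python
# Hypothetical stand-in for the database of FIG. 6 (not the actual figure).
DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "berry", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red color", "edible", "spicy", "pod"},
}


def keyword_search(db, keyword):
    """Return every element whose feature concepts contain the exact keyword."""
    return sorted(name for name, feats in db.items() if keyword in feats)


hits = keyword_search(DB, "red")
```

Here `hits` contains "monkey", which has nothing else in common with an apple, while "chili pepper" is missed because its feature is spelled "red color" rather than "red".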

[0011] The present invention has been made in view of the above. Its object is to provide a viewpoint-based similarity search method that retrieves similar words on the basis of a viewpoint, so that a flexible similarity search suited to the circumstances of the search can be performed.

[0012]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is, in gist, as follows. In a database in which each of a plurality of elements has feature concepts, which are words representing features of the element, when similar elements resembling a search-target element designated from the database are retrieved from the database: the search-target element and a viewpoint, which is a feature concept to be emphasized in the search, are designated; elements having a feature concept equal to the viewpoint are retrieved from the database as similar-element candidates; the feature concepts of the designated search-target element are compared with those of the similar-element candidates to calculate a similarity that increases with the number of shared feature concepts; and either the several similar-element candidates with the highest similarity are output as similar elements, or all similar-element candidates whose similarity is at or above a predetermined lower limit are output as similar elements.

[0013] According to the invention of claim 1, the search-target element and the viewpoint to be emphasized in the search are designated; elements having a feature concept equal to this viewpoint are retrieved from the database as similar-element candidates; the feature concepts of the designated search-target element are compared with those of the candidates to calculate a similarity that increases with the number of shared feature concepts; and either the several candidates with the highest similarity or all candidates whose similarity is at or above a predetermined lower limit are output as similar elements. This yields a similarity search result that reflects the viewpoint representing the circumstances of the search, and it also shortens the search processing time.

[0014] The invention according to claim 2 is, in gist, the invention of claim 1 in which, when elements that have the viewpoint are searched for, an element list is created that records, for each feature concept in the database, the elements that include that feature concept, and the elements are retrieved using this element list.

[0015] In the invention of claim 2, for each feature concept in the database, a list of the elements that include that feature concept is created, and elements are retrieved using this element list.

[0016] Further, the invention according to claim 3 is, in gist, as follows. In a database in which each of a plurality of elements has feature concepts, which are words representing features of the element, when similar elements resembling a search-target element designated from the database are retrieved from the database: all feature concepts in the database are replaced with semantic classifications, which are classification names based on meaning; the search-target element and a viewpoint, which is a feature concept to be emphasized in the search, are designated; elements having a semantic classification equal to that of the viewpoint are retrieved from the database as similar-element candidates; the semantic classifications of the designated search-target element are compared with those of the candidates to calculate a similarity that increases with the number of shared semantic classifications; and either the several candidates with the highest similarity are output as similar elements, or all candidates whose similarity is at or above a predetermined lower limit are output as similar elements.

[0017] According to the invention of claim 3, all feature concepts in the database are replaced with meaning-based semantic classifications; the search-target element and the viewpoint to be emphasized in the search are designated; elements having a semantic classification equal to this viewpoint are retrieved from the database as similar-element candidates; the semantic classifications of the designated search-target element are compared with those of the candidates to calculate a similarity that increases with the number of shared semantic classifications; and either the several candidates with the highest similarity or all candidates whose similarity is at or above a predetermined lower limit are output as similar elements. This yields a similarity search result that reflects the viewpoint representing the circumstances of the search, and it shortens the search processing time. In addition, feature concepts that are synonymous with, or close in meaning to, the viewpoint can be detected, so that a flexible similarity search close to human intuition can be performed.

[0018] The invention according to claim 4 is, in gist, the invention of claim 3 in which, when elements that have the semantic classification of the viewpoint are searched for, an element list is created that records, for each semantic classification of each element in the database, the elements that include that semantic classification, and the elements are retrieved using this element list.

[0019] In the invention of claim 4, for each semantic classification of each element in the database, a list of the elements that include that semantic classification is created, and elements are retrieved using this element list.

[0020]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0021] FIG. 1 is a flowchart showing the operation of the viewpoint-based similarity search method according to a first embodiment of the present invention. When the database is searched for elements similar to an element A in the database, the method shown in the figure introduces a feature concept a as the viewpoint to be emphasized in the search.

[0022] The operation of the similarity search method is described following the flowchart of FIG. 1. First, the element A to be searched for and the viewpoint a of the search are designated, and a parameter i that counts elements is set to 1 (step S11). The search target may be designated from the database, or it may be any data that has feature concepts in the same way as the elements in the database. The viewpoint a is a feature concept, entered by the searcher, that is important for the search. It may be chosen from among the feature concepts of element A.

[0023] Next, it is checked whether the parameter i is smaller than the number of elements (step S12). If it is, the process proceeds to step S13 so that every element is examined, and it is checked whether the feature concepts of element Ai include the viewpoint a. If they do not, the parameter i is incremented in step S15 and the process returns to step S12 to handle the next element in the same way. If the feature concepts of element Ai do include the viewpoint a, element Ai is added to the search candidates α (step S14), after which the parameter i is incremented (step S15) and the process returns to step S12.

[0024] In this way, every element in the database is tested for a feature concept equal to the viewpoint a, and the search candidates α are formed from the elements that contain one.

[0025] Once this test has been applied to every element in the database and the search candidates have been generated, the parameter i exceeds the number of elements at step S12, and the process proceeds to step S16.

[0026] In step S16, the similarity between element A and each element in the search candidates α is calculated. Any calculation method may be used, provided that the similarity increases with the number of feature concepts shared by the compared elements. Finally, the elements in the search candidates α with high similarity are output as the similarity search result (step S17). Any method may be used to decide which elements count as highly similar, for example outputting the several elements with the highest similarity, or fixing a lower limit in advance and outputting every element whose similarity is at or above it.
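Steps S11 to S17 can be sketched roughly as follows. This is an illustrative reading of the flowchart description, not the patented implementation: the function name, its parameters (`exclude`, `top_k`, `min_similarity`), and the `DB` contents are assumptions, with `DB` being a hypothetical stand-in for the database of FIG. 6. The counter i of the flowchart is replaced by a plain iteration over all elements, and the similarity is the shared-feature count, which is one of the measures the text permits.

```python
# Hypothetical stand-in for the database of FIG. 6 (not the actual figure).
DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "berry", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red color", "edible", "spicy", "pod"},
}


def viewpoint_search(db, target_features, viewpoint, *,
                     exclude=None, top_k=None, min_similarity=None):
    """Filter by viewpoint, then rank the candidates by shared feature concepts."""
    # S12 to S15: keep only elements whose feature concepts contain the viewpoint.
    candidates = [
        name for name, feats in db.items()
        if viewpoint in feats and name != exclude
    ]
    # S16: similarity = number of feature concepts shared with the target.
    scored = [(name, len(target_features & db[name])) for name in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # S17: output either everything above a lower limit, or the top few.
    if min_similarity is not None:
        scored = [pair for pair in scored if pair[1] >= min_similarity]
    if top_k is not None:
        scored = scored[:top_k]
    return scored


ranked = viewpoint_search(DB, DB["apple"], "red", exclude="apple")
best_only = viewpoint_search(DB, DB["apple"], "red", exclude="apple", top_k=1)
```

Because the target is passed as a feature set, the same function also covers the case where the search target is external data rather than an element of the database.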

[0027] Next, the similarity search method of FIG. 1 is described more concretely using FIG. 6. Elements similar to element A, "apple", in the database of FIG. 6 are retrieved with respect to the viewpoint a, "red".

[0028] First, the database is examined for elements containing the viewpoint "red". This yields "nandina" and "monkey", which become the similarity search candidates α. Next, the similarity between element A, "apple", and each element in α is calculated. Here the similarity is the number of feature concepts shared by the compared elements. The similarity is then 2 between "nandina" and "apple" and 1 between "monkey" and "apple". If the element with the highest similarity is output as the search result, "nandina" is output as the element similar to "apple" from the viewpoint "red".

[0029] In the similarity search method of the above embodiment, judging whether each element contains a feature concept equal to the viewpoint a requires examining every element in the database each time a search is performed. To avoid this, an element list is prepared in advance that records, for every kind of feature concept contained in the database, the set of names of the elements that include it. When a feature concept equal to the viewpoint is searched for, the entry for that feature concept in the element list is consulted, the set of element names recorded there is taken as the search candidates, and the similarity search is performed on these candidates. By consulting the element list, whether a feature concept equal to the viewpoint a is present can be determined from the element list alone.

[0030] As an example, consider searching the database of FIG. 6, shown earlier, for elements similar to "apple" from the viewpoint "red". To find the elements containing the viewpoint "red", an element list is created that records, for each feature concept in the database, the elements that include it; the result is shown in FIG. 2. By consulting this element list, the elements containing the feature concept "red" can be identified as "nandina" and "monkey" without examining every element in the database.
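The element list described here corresponds to what is commonly called an inverted index. A minimal sketch follows, assuming the database is held as a dictionary of feature sets; `build_element_list` and the `DB` contents are hypothetical, and FIG. 2 itself is not reproduced.

```python
from collections import defaultdict

# Hypothetical stand-in for the database of FIG. 6 (not the actual figure).
DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "berry", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red color", "edible", "spicy", "pod"},
}


def build_element_list(db):
    """Map each feature concept to the set of element names that contain it."""
    element_list = defaultdict(set)
    for name, feats in db.items():
        for concept in feats:
            element_list[concept].add(name)
    return dict(element_list)


# Built once, ahead of any search.
ELEMENT_LIST = build_element_list(DB)

# At search time a single lookup replaces the scan over all elements.
candidates = ELEMENT_LIST.get("red", set()) - {"apple"}
```

The list has to be rebuilt or updated whenever the database changes, a cost the text does not discuss.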

[0031] Next, a second embodiment is described. In this embodiment, when the viewpoint is compared with feature concepts in order to find the database elements that contain the viewpoint, the viewpoint and the feature concepts are each converted into meaning-based semantic classification names. This finds feature concepts whose character strings differ from the viewpoint but whose meaning is almost the same, giving a similarity search that takes word meaning into account.

[0032] Before a search is performed, all feature concepts of every element in the database are converted into semantic classification names. A semantic classification name is the name given to a cluster obtained by clustering (semantically classifying) feature concepts of similar meaning. Any semantic classification method may be used, for example referring to a classified vocabulary, performing cluster analysis based on the similarity between all kinds of feature concepts and naming the resulting clusters, or using an existing synonym dictionary or thesaurus.
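Of the classification methods listed, the table-lookup variant is the simplest to sketch. The classification table below is invented for illustration (it is not the table of FIG. 3), and keeping an unlisted feature concept as its own class is an assumption, since the text does not say how such words are handled.

```python
# Hypothetical semantic classification table: class name -> member feature concepts.
SEMANTIC_CLASSES = {
    "red": {"red", "red color", "crimson"},
    "fruit": {"fruit", "berry", "pod"},
}


def build_lookup(classes):
    """Invert the table so that each feature concept maps to its class name."""
    return {word: cls for cls, words in classes.items() for word in words}


def to_semantic(features, lookup):
    """Replace each feature concept with its semantic classification name.

    Concepts missing from the table are kept unchanged (an assumption).
    """
    return {lookup.get(feature, feature) for feature in features}


LOOKUP = build_lookup(SEMANTIC_CLASSES)
converted = to_semantic({"red color", "spicy", "pod"}, LOOKUP)
```

Two feature concepts of the same class collapse into a single entry after conversion, because the result is a set.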

[0033] At search time, the element A to be searched for and the semantic classification a serving as the viewpoint are first designated. The search target may be designated from the database, or it may be any data that has feature concepts in the same way as the elements in the database. When data outside the database is the search target, the feature concepts in that data are converted into semantic classifications. The viewpoint a is a feature concept, entered by the searcher, that is important for the search. It may be chosen from the semantic classifications obtained by converting the feature concepts of element A. When the searcher enters the viewpoint a, it is converted into a semantic classification. Next, every element in the database is tested for a semantic classification equal to the viewpoint a, and the search candidates α are formed from the elements that contain one.

[0034] The similarity between element A and each element in the search candidates α is then calculated. Any calculation method may be used, provided that the similarity increases with the number of semantic classifications shared by the compared elements. Finally, the elements in α with high similarity are output as the similarity search result. Any method may be used to decide which elements count as highly similar, for example outputting the several elements with the highest similarity, or fixing a lower limit in advance and outputting every element whose similarity is at or above it.

[0035] This embodiment is now described more concretely using the database of FIG. 6. First, all feature concepts of every element in the database are converted into semantic classifications using the semantic classification table of FIG. 3. For example, to convert the feature concept "red color" (赤色) of the element "chili pepper", the semantic classification containing "red color" is located in the table, and "red color" in the database is replaced with the name of that classification, "red" (赤). In this way the database of FIG. 6 is converted into that of FIG. 4.

[0036] Next, the search viewpoint "red" (赤い) is replaced with the semantic classification "red" (赤) using the semantic classification table of FIG. 3. The database of FIG. 4 is then examined element by element for the semantic classification "red", and the set of elements that contain it, "apple", "nandina", "monkey", and "chili pepper", is taken as the search candidates α.

[0037] The similarity between element A, "apple", and each element in the similarity search candidates α is then calculated. Here the similarity is the number of feature concepts, as converted to semantic classifications, shared by the compared elements. The similarity is then 3 between "nandina" and "apple", 1 between "monkey" and "apple", and 3 between "chili pepper" and "apple". If the elements with the highest similarity are output as the search result, "nandina" and "chili pepper" are output as the elements similar to "apple" from the viewpoint "red". In the first embodiment, "chili pepper" was not included in the search candidates or the search result, even though it contains "red color" (赤色), which is close in meaning to the viewpoint "red" (赤い). By using a semantic classification table such as that of FIG. 3, the second embodiment can output "chili pepper", which resembles "apple" from the viewpoint "red", together with "nandina".
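The second embodiment can be sketched end to end as below. FIGS. 3, 4, and 6 are not reproduced in this text, so both `DB` and `SEMANTIC_CLASSES` are hypothetical stand-ins, chosen only so that the resulting similarities match the values quoted above (3, 1, 3). The function names are likewise illustrative.

```python
# Hypothetical stand-ins for FIG. 6 (database) and FIG. 3 (classification table).
DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "berry", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red color", "edible", "spicy", "pod"},
}
SEMANTIC_CLASSES = {
    "red": {"red", "red color"},
    "fruit": {"fruit", "berry", "pod"},
}
LOOKUP = {word: cls for cls, words in SEMANTIC_CLASSES.items() for word in words}


def to_classes(features):
    """Convert feature concepts to class names; unlisted words stay as they are."""
    return {LOOKUP.get(feature, feature) for feature in features}


def semantic_viewpoint_search(db, target_name, viewpoint):
    """Filter by the viewpoint's class, then rank by shared semantic classes."""
    converted = {name: to_classes(feats) for name, feats in db.items()}
    view_class = LOOKUP.get(viewpoint, viewpoint)
    target = converted[target_name]
    scores = {
        name: len(target & classes)
        for name, classes in converted.items()
        if name != target_name and view_class in classes
    }
    best = max(scores.values(), default=0)
    top = sorted(name for name, score in scores.items() if score == best)
    return scores, top


scores, top = semantic_viewpoint_search(DB, "apple", "red")
```

With this data "chili pepper" now enters the candidates through its "red color" feature and ties with "nandina" for the highest similarity, while "tangerine" is filtered out before any similarity is computed.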

[0038] In the second embodiment, if every element in the database is examined each time a search is performed in order to judge whether it contains a semantic classification equal to the viewpoint a, the judgment takes time on every search. To avoid this, an element list is prepared in advance that records, for every kind of semantic classification contained in the database, the set of names of the elements that include it. When a semantic classification equal to the viewpoint is searched for, the entry for that semantic classification in the element list is consulted, the set of element names recorded there is taken as the search candidates, and the similarity search is performed on these candidates.

[0039] The second embodiment is now described more concretely, taking as an example a search of the database of FIG. 4 for elements similar to "apple" from the viewpoint "red". To find the elements containing the viewpoint "red", an element list is created that records, for each semantic classification in the database, the elements that include it; the result is shown in FIG. 5. By consulting this element list, the elements containing the feature concept "red" can be identified as "nandina", "monkey", and "chili pepper" without examining every element in the database.
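Combining the element list with the ranking step might look like the sketch below. `CONVERTED_DB` is a hypothetical stand-in for the converted database of FIG. 4 (the figure is not reproduced here), and the function names are illustrative. The viewpoint is assumed to have been converted to its class name already.

```python
from collections import defaultdict

# Hypothetical stand-in for the converted database of FIG. 4 (not the actual figure).
CONVERTED_DB = {
    "apple": {"red", "round", "fruit", "sweet", "edible"},
    "nandina": {"red", "round", "fruit", "garden"},
    "sea bream": {"fish", "sea", "festive"},
    "monkey": {"red", "animal", "tail"},
    "tangerine": {"round", "fruit", "sweet", "orange"},
    "chili pepper": {"red", "edible", "spicy", "fruit"},
}


def build_class_index(db):
    """Map each semantic classification to the elements that contain it."""
    index = defaultdict(set)
    for name, classes in db.items():
        for cls in classes:
            index[cls].add(name)
    return dict(index)


def search_with_index(db, index, target_name, view_class):
    """Take candidates from the element list, then rank them by shared classes."""
    candidates = index.get(view_class, set()) - {target_name}
    target = db[target_name]
    scored = [(name, len(target & db[name])) for name in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


CLASS_INDEX = build_class_index(CONVERTED_DB)
ranked = search_with_index(CONVERTED_DB, CLASS_INDEX, "apple", "red")
```

Only the three candidates returned by the lookup are scored; "sea bream" and "tangerine" are never touched during the search.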

[0040]

[Effects of the Invention] As described above, according to the invention of claim 1, a viewpoint to be emphasized in the search is specified, elements holding a feature concept equal to the viewpoint are retrieved from the database as similar-element candidates, and similar elements are then retrieved from among those candidates. Compared with conventional similarity search, which yields only a fixed output, similarity search results that reflect the viewpoint expressing the search situation can be obtained, and the search processing time can be shortened.
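The flow summarized above can be sketched in Python under stated assumptions: similarity is taken to be the number of feature concepts shared with the search target (as recited in claim 1), and the function name `viewpoint_search`, its parameters, and the sample data are all hypothetical, introduced only for illustration.

```python
def viewpoint_search(database, target_features, viewpoint, top_n=None, min_similarity=None):
    """Return (name, similarity) pairs for elements that hold the viewpoint."""
    # Step 1: keep only elements holding a feature concept equal to the viewpoint.
    candidates = {name: feats for name, feats in database.items() if viewpoint in feats}
    # Step 2: similarity = number of feature concepts shared with the target.
    scored = [(name, len(target_features & feats)) for name, feats in candidates.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    # Step 3: output all candidates at or above a lower limit, or the top N.
    if min_similarity is not None:
        return [(name, score) for name, score in scored if score >= min_similarity]
    return scored if top_n is None else scored[:top_n]


# Hypothetical data: feature concepts of the search target and of each element.
apple = {"red", "fruit", "round", "sweet"}
database = {
    "nanten": {"red", "fruit", "round", "plant"},
    "monkey": {"red", "animal"},
    "orange": {"fruit", "round", "sweet"},
}

print(viewpoint_search(database, apple, "red", top_n=2))
# [('nanten', 3), ('monkey', 1)]
print(viewpoint_search(database, apple, "red", min_similarity=2))
# [('nanten', 3)]
```

In this sketch “orange” shares three feature concepts with “apple” but is never scored, because it does not hold the viewpoint “red”. This illustrates both effects named above: the result depends on the viewpoint, and fewer elements are compared.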

[0041] According to the invention of claim 2, for each feature concept in the database an element list of the elements containing that feature concept is created, and elements are retrieved using the element list. Since only the element list needs to be consulted, the search processing time can be shortened.

[0042] Further, according to the invention of claim 3, all feature concepts in the database are replaced with semantic classifications based on meaning, a viewpoint, which is a feature concept to be emphasized in the search, is specified, elements holding a semantic classification equal to that of the viewpoint are retrieved from the database as similar-element candidates, and similar elements are retrieved from among those candidates. Feature concepts that are synonymous with or close in meaning to the viewpoint can therefore be detected, enabling a flexible similarity search close to human intuition.
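The effect of replacing feature concepts with semantic classifications can be illustrated with a short Python sketch. The classification table and database below are hypothetical (they are not the contents of FIG. 3 or FIG. 4), and `to_semantic_classes` is a name assumed for illustration.

```python
def to_semantic_classes(features, classification_table):
    """Replace each feature concept with its semantic classification name.

    Concepts missing from the table are kept unchanged.
    """
    return {classification_table.get(feature, feature) for feature in features}


# Hypothetical table: feature concept -> semantic classification name.
classification_table = {"red": "red", "crimson": "red", "scarlet": "red", "hot": "spicy"}

# Hypothetical database keyed by element name, holding raw feature concepts.
raw_database = {
    "nanten": {"red", "plant"},
    "chili pepper": {"crimson", "hot"},
    "banana": {"yellow", "fruit"},
}

classified = {
    name: to_semantic_classes(features, classification_table)
    for name, features in raw_database.items()
}
viewpoint_class = classification_table.get("red", "red")

# Exact matching on the raw feature concept misses the near-synonym "crimson".
exact = sorted(name for name, feats in raw_database.items() if "red" in feats)
# Matching on the semantic classification finds both elements.
by_class = sorted(name for name, feats in classified.items() if viewpoint_class in feats)

print(exact)     # ['nanten']
print(by_class)  # ['chili pepper', 'nanten']
```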

[0043] According to the invention of claim 4, for each semantic classification of each element in the database an element list of the elements containing that semantic classification is created, and elements are retrieved using the element list. Since only the element list needs to be consulted, the search processing time can be shortened.

[Brief description of drawings]

FIG. 1 is a flowchart showing the operation of the viewpoint-based similarity search method according to the first embodiment of the present invention.

FIG. 2 is a diagram showing an example of an element list of the elements containing each feature concept.

FIG. 3 is a diagram showing an example of a word database in which feature concepts have been converted into semantic classifications.

FIG. 4 is a diagram showing an example of a database represented using semantic classification names.

FIG. 5 is a diagram showing an example of an element list of the elements containing each semantic classification.

FIG. 6 is a diagram showing an example of a word database.

Continuation of the front page

(56) References
JP-A-6-162099 (JP, A)
JP-A-5-233689 (JP, A)
Kaname Kasahara et al., "Self-refinement method for a multi-viewpoint concept base," IEICE Technical Report, Japan, The Institute of Electronics, Information and Communication Engineers, May 13, 1994, Vol. 94, No. 33, pp. 1-8
Kaname Kasahara et al., "Automatic generation of viewpoints in similar-word search," IPSJ Research Report, Japan, Information Processing Society of Japan, July 25, 1996, 96-F1-42, pp. 29-36
Kaname Kasahara et al., "Similarity search for common-sense words using a concept base," IEICE Technical Report, Japan, The Institute of Electronics, Information and Communication Engineers, September 28, 1995, AI95-22 to 23, pp. 23-30
Kaname Kasahara et al., "Viewpoint-based similarity discrimination between concepts," Transactions of the Information Processing Society of Japan, Japan, Information Processing Society of Japan, March 15, 1994, Vol. 35, No. 3, pp. 505-509
Kaname Kasahara et al., "Concept base construction method based on refinement," IEICE Technical Report, Japan, The Institute of Electronics, Information and Communication Engineers, May 26, 1995, DE95-1 to 8, pp. 49-56

(58) Fields searched (Int. Cl.7, DB name)
G06F 17/30
JICST file (JOIS)

Claims

(57) [Claims]

1. A viewpoint-based similarity search method for a database in which each of a plurality of elements holds feature concepts, each feature concept being a word representing a feature of the element, wherein, when similar elements that are similar to a specified search-target element are retrieved from the database: the element to be searched for and a viewpoint, which is a feature concept to be emphasized in the search, are specified; elements holding a feature concept equal to the viewpoint are retrieved from the database as similar-element candidates; the feature concepts of the element specified as the search target are compared with the feature concepts of each similar-element candidate, and a higher similarity is calculated the larger the number of matching feature concepts; and either a plurality of top-ranked similar-element candidates with high similarity are output as similar elements, or all similar-element candidates giving a similarity equal to or greater than a predetermined lower limit are output as similar elements.

2. The viewpoint-based similarity search method according to claim 1, wherein, when elements holding the viewpoint are retrieved, an element list of the elements containing each feature concept is created for each feature concept in the database, and the elements are retrieved using the element list.

3. A viewpoint-based similarity search method for a database in which each of a plurality of elements holds feature concepts, each feature concept being a word representing a feature of the element, wherein, when similar elements that are similar to a specified search-target element are retrieved from the database: all feature concepts in the database are replaced with semantic classifications, each being a classification name based on meaning; the element to be searched for and a viewpoint, which is a feature concept to be emphasized in the search, are specified; elements holding a semantic classification equal to the semantic classification of the viewpoint are retrieved from the database as similar-element candidates; the semantic classifications of the element specified as the search target are compared with the semantic classifications of each similar-element candidate, and a higher similarity is calculated the larger the number of matching semantic classifications; and either a plurality of top-ranked similar-element candidates with high similarity are output as similar elements, or all similar-element candidates giving a similarity equal to or greater than a predetermined lower limit are output as similar elements.

4. The viewpoint-based similarity search method according to claim 3, wherein, when elements holding the semantic classification of the viewpoint are retrieved, an element list of the elements containing each semantic classification is created for each semantic classification of the elements in the database, and the elements are retrieved using the element list.