JPH0823869B2

JPH0823869B2 - Database similarity search method

Info

Publication number: JPH0823869B2
Application number: JP4222577A
Authority: JP
Inventors: Hideo Shimazu
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-08-21
Filing date: 1992-08-21
Publication date: 1996-03-06
Anticipated expiration: 2011-03-06
Also published as: JPH0668153A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a database similarity search method for building case-based retrieval on top of a relational database system.

[0002]

[Prior Art] Case-based reasoning (hereinafter abbreviated as CBR) is a new reasoning paradigm that has recently attracted attention in the field of artificial intelligence (reference: Shigenobu Kobayashi and Kotaro Nakamura, "Case-Based Reasoning, Expected to Be Put to Practical Use as a New Problem-Solving Method," Nikkei AI special issue, Spring 1991).

[0003] The structure of CBR is shown in FIG. 7. The case base stores many cases, each consisting of a problem solved in the past together with its solution. When a new problem is given, the case whose problem is most similar to it is retrieved from the case base (retrieval phase). Retrieval of similar cases is a central process in CBR: when the case base is searched, a case need not match the problem exactly (exact matching); a partial match (partial matching) suffices. Next, to absorb the differences between the problem description in the retrieved case and the problem to be solved, part of the retrieved case is adapted using the system's transformation rules (adaptation phase), and the result is presented as a proposed solution. A solution produced in this way is not always correct, so it is actually applied. If the result is correct, it is stored in the case base as a positive case representing "a new case and its solution"; if it fails, the reason for the failure is analyzed, and "the new problem, the erroneous solution to it, and criteria for not repeating the mistake" are stored in the case base (evaluation and repair phase). Success or failure of the applied result may be judged by the CBR system itself or by the user. Thus, whenever a new problem is given, CBR always adds the outcome to the system's case base, whether the solution succeeded or failed; on failure, the case is indexed so that the same mistake is not made twice. Tools for CBR have been proposed on the basis of this policy (reference: C. Riesbeck, "An Interface for Case-Based Knowledge Acquisition," Proc. of Workshop on Case-Based Reasoning, DARPA, 1988). FIG. 8 shows a schematic of such a system. In FIG. 8, 101 is a case attribute vector, 102 is a case base, 103 is a problem storage attribute vector, 104 is a partial matching definition unit, 105 is a weight definition unit, 107 is an attribute similarity calculation unit, and 109 is a total similarity calculation unit. The system in the above reference is not configured exactly as in FIG. 8, but it works on the same principle. Many cases, each represented as a vector of attributes, are stored in the case base 102. When the problem to be solved is stored in the problem storage attribute vector 103, the cases in the case base 102 are fetched one at a time and loaded into the case attribute vector 101, and their similarity is calculated. Partial matching between corresponding element attributes is defined in the partial matching definition unit 104, and the weight of each element attribute is stored in the weight definition unit 105. After the attribute similarity calculation unit 107 for each attribute calculates that attribute's similarity, the total similarity calculation unit 109 computes their sum.

[0004] As described above, in conventional case-based systems the cases were stored in a data storage unit built into the case-based system itself. The basic method of case retrieval in a case-based shell was to compare a given problem with the individual cases one by one to find similar cases (the so-called Nearest Neighbor method). When a case is represented as a set of attributes, a similarity between attribute values is defined for each attribute. For example, if a case has the attribute "language", its similarity is defined as shown in FIG. 6: the similarity between PROLOG and PROLOG is 1, and the similarity between PROLOG and LISP can be read as 0.8, because both belong to the group AI-LANGUAGE. The conventional Nearest Neighbor algorithm is as follows.

[0005] 1. Take out one case.

[0006] 2. For each attribute, calculate the similarity between the problem and the case using the similarity definition for that attribute, and then calculate the overall similarity between the problem and the case. The overall similarity is calculated by the following formula.

[0007]

[Equation 1]

[0008] 3. Repeat steps 1 and 2 for all cases.

[0009] 4. The case with the highest overall similarity is selected as the case most similar to the problem.
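Steps 1 to 4 can be sketched as a linear scan. This is a minimal illustration, not the method of the cited reference: Equation 1 is not reproduced here, so the sketch assumes the same weighted-average form that [0038] to [0040] use for the invention, and the "language" similarity function is a hypothetical reading of the FIG. 6 hierarchy as described in [0004] and [0034] to [0036]. The attribute names and weights are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, str]
SimFn = Callable[[str, str], float]

# Hypothetical "language" similarity, read off the FIG. 6 hierarchy as described in the text.
AI_LANGUAGE = {"PROLOG", "LISP"}
HL_LANGUAGE = AI_LANGUAGE | {"FORTRAN", "PASCAL", "C", "C++"}

def language_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    if a in AI_LANGUAGE and b in AI_LANGUAGE:
        return 0.8
    if a in HL_LANGUAGE and b in HL_LANGUAGE:
        return 0.5
    return 0.0

def exact_sim(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0

def nearest_neighbor(problem: Case, case_base: List[Case],
                     sim: Dict[str, SimFn], weights: Dict[str, float]) -> Tuple[Optional[Case], float]:
    """Linear scan: compare the problem with every stored case (steps 1-4)."""
    total_weight = sum(weights[a] for a in problem)
    best_case: Optional[Case] = None
    best_score = float("-inf")
    for case in case_base:  # steps 1 and 3: fetch each case in turn
        # step 2: per-attribute similarity combined as a weighted average (assumed form of Equation 1)
        score = sum(weights[a] * sim[a](problem[a], case[a]) for a in problem) / total_weight
        if score > best_score:  # step 4: remember the most similar case
            best_case, best_score = case, score
    return best_case, best_score
```

Every query visits every stored case, which is the cost that [0023] identifies as inefficient when the cases live in an external database.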

[0010]

[Problems to Be Solved by the Invention] However, storing the case base in a data storage unit built into the case-based system itself raises the following practical problems.

[0011] Case-base construction projects are often intended to make better use of an existing database. In that case, creating a case base separate from the existing database is wasteful, makes maintenance difficult, and is unlikely to be feasible.

[0012] Even when a development project creates an entirely new case base, the development, management, and retrieval of a case base overlap heavily with those of a conventional database system, so the two are by no means incompatible. Moreover, the various retrieval-acceleration techniques of existing database management systems should be put to effective use in case-based retrieval as well.

[0013] If a case base is built from scratch as a standalone store, it becomes worthless when the case-based retrieval and reasoning project fails or ends. If the case base is instead built on a commercial database management system, at least the database remains even after the case-based project ends, which hedges this risk.

[0014] Even in a project begun in order to build a case-based retrieval system, users do not always want a similarity search; they often want an exact database search. Ideally, the case base or database supports both fuzzy and exact searches.

[0015] An object of the present invention is to provide a method for efficiently building case-based retrieval on a relational database system.

[0016]

[Means for Solving the Problems] The present invention comprises three inventions.

[0017] The first invention consists of claims 1, 2, and 3.

[0018] Claim 1 is a database similarity search method characterized by a similarity-ordered attribute value search condition list creation means. Given an attribute and an attribute value, this means refers to an attribute similarity definition description in which the similarity between any two values of the attribute is defined in advance, and creates a list of search condition expressions, each annotated with its similarity and arranged in descending order of similarity: a search condition expression stating that the attribute's value must be one of the set of attribute values most similar to the given value, a search condition expression stating that it must be one of the set of values second most similar to the given value, a search condition expression stating that it must be one of the set of values third most similar to the given value, and so on.

[0019] Claim 2 is a database similarity search method characterized by a case similarity condition list output means that operates as follows when one case is represented by a set of attribute/attribute-value pairs. It creates a list of search condition expressions for each attribute using the similarity-ordered attribute value search condition list creation means of claim 1; takes one condition expression from each attribute's list to form every exhaustive combination of the per-attribute search condition expressions; generates, for each combination, a composite condition expression that joins the combination's per-attribute expressions by logical AND; computes, as the total similarity of the combination, the average of the similarities attached to its search condition expressions weighted by the predefined weights of the corresponding attributes; and outputs the pairs of total similarity and composite condition expression in descending order of total similarity.

[0020] Claim 3 is a database similarity search method for a system that provides a similarity search function over a database in which one case corresponds to one database record. When a problem represented by a set of attribute/attribute-value pairs is given, the first pair is taken from the list of pairs of total similarity and composite search condition that the case similarity condition list output means of claim 2 outputs for that problem; the composite search condition of that pair is converted into a corresponding database query and issued to the database; records are retrieved from the database; and the search results are presented together with the pair's total similarity value. The same steps are repeated, taking the pairs from the head of the list one at a time.

[0021] The second invention is a database similarity search method characterized by a case similarity condition list output means which, instead of outputting all pairs of total similarity and composite condition expression as in claim 2, selects only the pairs whose total similarity exceeds a predefined lower bound and outputs them in descending order of total similarity.

[0022] The third invention is a database similarity search method characterized in that, instead of taking out and processing every pair in the list of total similarity and composite search condition pairs as in claim 3, it keeps a running total of the number of records retrieved by each pair's database search, and when that total exceeds a predefined maximum record count, it stops taking pairs from the list and searching the database.

[0023] When a case-based retrieval system is built on an existing database, retrieving the records from the database one at a time and running the Nearest Neighbor algorithm above is highly inefficient, so the present invention adopts an entirely different algorithm.

[0024] Here, each case corresponds to a record in the relational database. The overall flow of the algorithm of the present invention is as follows.

[0025] 1. Generate search condition expressions for retrieving records similar to the given problem.

[0026] 2. From these search condition expressions, take the one with the highest similarity and search the database with it.

[0027] 3. If one or more records are retrieved, present them together with the corresponding similarity. If a sufficient number of records have been presented, stop. Otherwise, return to step 2 and repeat the same operation for the remaining search condition expressions.
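Steps 1 to 3 can be sketched as follows, under the assumption that the database is stood in for by an in-memory list of records and each search condition by a mapping from attribute to its set of allowed values (all names here are hypothetical). The point the sketch makes is the inversion described in [0028]: each condition's similarity is known before any record is fetched, so records arrive already ranked.

```python
from typing import Dict, Iterator, List, Set, Tuple

Record = Dict[str, str]
Condition = Dict[str, Set[str]]  # attribute -> allowed values; attributes are ANDed together

def matches(record: Record, cond: Condition) -> bool:
    return all(record[attr] in allowed for attr, allowed in cond.items())

def ranked_retrieval(ranked: List[Tuple[float, Condition]], table: List[Record],
                     enough: int) -> Iterator[Tuple[float, Record]]:
    """Steps 1-3: similarity is known before any record is fetched."""
    presented = 0
    # step 2: always take the remaining condition with the highest similarity
    for similarity, cond in sorted(ranked, key=lambda p: p[0], reverse=True):
        hits = [r for r in table if matches(r, cond)]  # stands in for one database query
        for record in hits:
            yield similarity, record                   # step 3: present with its similarity
        presented += len(hits)
        if presented >= enough:                        # enough records shown: stop
            return
```

Here `enough` plays the role of "a sufficient number of records"; the text leaves that criterion open.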

[0028] In the conventional method, the similarity between a problem and a case becomes known only when the problem is compared with that individual case. In the method of the present invention the order is quite different: the similarity values are determined first, the conditions that yield each similarity value are derived, and only then is a database search performed to obtain the individual cases themselves.

[0029]

[Embodiments] The embodiment of the first invention covers claims 1, 2, and 3. Claim 3 describes the overall flow of the three claims and uses claim 2 as part of its function; claim 2 in turn uses claim 1 as part of its function. The embodiment is therefore described in the order of claims 1, 2, and 3.

[0030] FIG. 1 shows an embodiment of the claim 1 portion of the first invention. FIG. 6 shows an example of an attribute similarity definition description, for the "language" attribute. Such a definition is assumed to be given in advance for each attribute.

[0031] Given an attribute and an attribute value, the attribute similarity definition description for that attribute is consulted, the node corresponding to the given attribute value is found in it, and that node becomes the parent node. Meanwhile, the current node is set to "empty". If there is no parent node, the process ends; otherwise it proceeds to repeated process A.

[0032] Repeated process A is as follows. Among the terminal nodes that are the parent node itself or its descendants, enumerate all except those that are the current node itself or its descendants. Generate a database search condition expression stating that the attribute's value must be one of the values of the enumerated terminal nodes. Also look up the similarity value attached to the parent node, and output the condition expression and the similarity value as a pair. Then find the parent of the parent node. If there is none, repeated process A ends; otherwise the parent node becomes the current node, the newly found parent becomes the parent node, and repeated process A is performed again.

[0033] This is explained with an example. If PROLOG is the value given by the user, the similarity conditions and similarity values are as follows.

[0034] The first similar set is PROLOG, with similarity 1. The second similar set consists of the elements under AI-LANGUAGE other than PROLOG, namely "LISP", with similarity 0.8.

[0035] The third similar set consists of the elements under HL-LANGUAGE other than PROLOG and LISP, namely "FORTRAN", "PASCAL", "C", and "C++", with similarity 0.5.

[0036] The fourth similar set consists of the elements under ANY-LANGUAGE other than PROLOG, LISP, FORTRAN, PASCAL, C, and C++, namely "ASM", with similarity 0.
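Repeated process A of [0031] and [0032], applied to the PROLOG example, can be sketched as below. FIG. 6 is not reproduced here, so the hierarchy is a hypothetical encoding inferred from [0034] to [0036]: ASM is assumed to sit directly under ANY-LANGUAGE, and terminal nodes are assumed to carry similarity 1. The names `PARENT`, `NODE_SIM`, and the functions are illustrative, not taken from the patent.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding of the FIG. 6 "language" hierarchy: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "ANY-LANGUAGE": None,
    "HL-LANGUAGE": "ANY-LANGUAGE", "ASM": "ANY-LANGUAGE",
    "AI-LANGUAGE": "HL-LANGUAGE",
    "FORTRAN": "HL-LANGUAGE", "PASCAL": "HL-LANGUAGE", "C": "HL-LANGUAGE", "C++": "HL-LANGUAGE",
    "PROLOG": "AI-LANGUAGE", "LISP": "AI-LANGUAGE",
}
# Similarity attached to each internal node; terminal nodes default to 1.0.
NODE_SIM = {"ANY-LANGUAGE": 0.0, "HL-LANGUAGE": 0.5, "AI-LANGUAGE": 0.8}

def leaves_under(node: Optional[str]) -> Set[str]:
    """Terminal nodes that are `node` itself or its descendants (empty for the "empty" node)."""
    if node is None:
        return set()
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))

def similarity_ordered_conditions(value: str) -> List[Tuple[Set[str], float]]:
    """Claim 1 sketch: (allowed values, similarity) pairs, most similar first."""
    result: List[Tuple[Set[str], float]] = []
    current: Optional[str] = None                       # current node starts "empty"
    parent = value if value in PARENT else None         # node of the given value
    while parent is not None:                           # repeated process A
        allowed = leaves_under(parent) - leaves_under(current)
        result.append((allowed, NODE_SIM.get(parent, 1.0)))
        current, parent = parent, PARENT[parent]        # climb one level
    return result
```

Calling `similarity_ordered_conditions("PROLOG")` yields the four sets of [0034] to [0036] with similarities 1.0, 0.8, 0.5, and 0.0; a value absent from the hierarchy yields an empty list, matching "if there is no parent node, the process ends".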

[0037] FIG. 2 shows an embodiment of claim 2 of the first invention. For each attribute, the similarity between two values is defined as in FIG. 6. As an example, suppose that in addition to the language attribute there are a man-hours attribute and a machine attribute, and that the user supplies values for the language, man-hours, and machine attributes. The similarity-ordered attribute value search condition list creation means of claim 1 then generates a list of search condition expressions for each of the language, man-hours, and machine attributes. Next, one expression is taken from each list to form a combination. If the language attribute's list has N elements, the man-hours attribute's list has M elements, and the machine attribute's list has L elements, a total of N × M × L combinations is generated.

[0038] For example, in one combination, "language = LISP (similarity 0.8)" is taken from the language attribute's list, "man-hours = 10 person-months (similarity 0.5)" from the man-hours attribute's list, and "machine = EWS-RISC (similarity 0.9)" from the machine attribute's list, producing "language = LISP (similarity 0.8), man-hours = 10 person-months (similarity 0.5), machine = EWS-RISC (similarity 0.9)". Next, for each combination, a composite condition expression is generated by joining the search condition expressions in the combination with logical AND. For the combination above this gives "language = LISP and man-hours = 10 person-months and machine = EWS-RISC". At the same time, the total similarity of each combination is calculated from the similarities attached to its search condition expressions. One example of a total-similarity formula is a weighted average, as follows.

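[0039]

[Equation 2]

$$S = \frac{\sum_{i=1}^{n} w_i s_i}{\sum_{i=1}^{n} w_i}$$

where $n$ is the number of attributes, $s_i$ is the similarity attached to the search condition expression taken for the $i$-th attribute, and $w_i$ is the predefined weight of that attribute.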

[0040] In the example above, if the weight of the language attribute is 0.5, that of the man-hours attribute is 0.3, and that of the machine attribute is 0.2, then ((0.5 × 0.8) + (0.3 × 0.5) + (0.2 × 0.9)) ÷ (0.5 + 0.3 + 0.2) = 0.73, so the total similarity is 0.73. Once the total similarity has been calculated for each combination, the combinations are output in descending order of total similarity.
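The combination step of [0037] and [0038] and the weighted average of [0040] can be sketched with `itertools.product`. This assumes each attribute's list is in the (allowed values, similarity) form produced by claim 1; the attribute names `language`, `man_hours`, and `machine` and the string value `"10"` are hypothetical stand-ins for the example in the text.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

AttrConditions = List[Tuple[Set[str], float]]  # one attribute's (allowed values, similarity) list

def case_similarity_conditions(per_attr: Dict[str, AttrConditions],
                               weights: Dict[str, float]) -> List[Tuple[float, Dict[str, Set[str]]]]:
    """Claim 2 sketch: every combination, ANDed, scored by weighted average, best first."""
    attrs = list(per_attr)
    total_weight = sum(weights[a] for a in attrs)
    pairs = []
    for combo in product(*(per_attr[a] for a in attrs)):  # N x M x L combinations
        conjunction = {a: allowed for a, (allowed, _) in zip(attrs, combo)}  # ANDed conditions
        total = sum(weights[a] * sim for a, (_, sim) in zip(attrs, combo)) / total_weight
        pairs.append((total, conjunction))
    pairs.sort(key=lambda p: p[0], reverse=True)  # descending total similarity
    return pairs
```

With weights 0.5, 0.3, and 0.2 and the single combination LISP (0.8), 10 person-months (0.5), EWS-RISC (0.9), the sketch reproduces the total similarity 0.73 of [0040]. The list grows multiplicatively with the number of attributes, which is one motivation for the lower bound of the second invention.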

[0041] FIG. 3 shows an embodiment of claim 3 of the first invention. When a problem represented by a set of attribute/attribute-value pairs is given, the case similarity condition list output means of claim 2 is used to generate the list of pairs of total similarity and composite search condition for that problem, ordered by descending total similarity. The first pair is taken from the list, and a database query is generated from its composite search condition. This is easy with SQL, the standard language of relational databases: because the composite condition expression is exactly the WHERE clause of an SQL SELECT-FROM-WHERE statement, it suffices to prepare the string "SELECT * FROM <target table>" in advance and concatenate the extracted composite search condition to it at the string level. For the example above this gives "SELECT * FROM <target table> WHERE language = LISP and man-hours = 10 person-months and machine = EWS-RISC". Besides equalities such as language = LISP, the search condition generated for an attribute may be a disjunction of several values, such as "language = LISP or PROLOG or smalltalk"; in that case the corresponding SQL expression is generated, here "language in (LISP, PROLOG, smalltalk)". A condition may also specify a range, such as "10 < man-hours < 20"; again the corresponding SQL expression is generated, here "man-hours between 10 and 20". These conversions are simple one-to-one mappings and require no particular new technique. Next, the generated SQL statement is issued to the database and records are retrieved from it. The search results are presented together with the total similarity value. If the list of composite search conditions is now empty, the process ends; otherwise the next pair is taken from the head of the list and the above processing is repeated.
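The SQL conversion of [0041] can be sketched as below. Two deliberate departures from the text, both assumptions of this sketch: values are passed as `?` parameters (as Python's `sqlite3` module supports) rather than concatenated into the string, and range conditions use a hypothetical `Range` type. Table and column names are still interpolated and are assumed to be trusted identifiers. Note that SQL `BETWEEN` is inclusive, so it matches "10 < man-hours < 20" only approximately at the endpoints.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class Range:
    """Hypothetical range condition, e.g. man-hours from 10 to 20."""
    lo: int
    hi: int

Cond = Union[str, int, set, frozenset, Range]

def attr_to_sql(column: str, cond: Cond) -> Tuple[str, list]:
    """Translate one per-attribute condition into an SQL fragment plus bound parameters."""
    if isinstance(cond, Range):
        return f"{column} BETWEEN ? AND ?", [cond.lo, cond.hi]  # BETWEEN is inclusive
    if isinstance(cond, (set, frozenset)):
        values = sorted(cond)
        if len(values) == 1:
            return f"{column} = ?", values
        return f"{column} IN ({', '.join('?' * len(values))})", values
    return f"{column} = ?", [cond]

def to_select(table: str, conjunction: Dict[str, Cond]) -> Tuple[str, list]:
    """Prefix "SELECT * FROM <table> WHERE" and join the fragments with AND."""
    fragments, params = [], []
    for column, cond in conjunction.items():
        fragment, values = attr_to_sql(column, cond)
        fragments.append(fragment)
        params.extend(values)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(fragments), params
```

For example, `to_select("cases", {"language": {"LISP", "PROLOG", "smalltalk"}, "man_hours": Range(10, 20)})` returns `"SELECT * FROM cases WHERE language IN (?, ?, ?) AND man_hours BETWEEN ? AND ?"` with parameters `["LISP", "PROLOG", "smalltalk", 10, 20]`, which can be passed directly to `Connection.execute`.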

[0042] An embodiment of the second invention is now described. The second invention extends claim 2 of the first invention. Substituting it for claim 2 does not impair the function of the first invention, and the function added by the second invention is then reflected in the first invention as well.

[0043] The part extended in the embodiment of the second invention is that, among the pairs of composite search condition and total similarity generated in claim 2 of the first invention, only the pairs whose total similarity exceeds a predefined lower bound are selected, and the rest are discarded. Adding this function avoids the wasted computation of issuing SQL statements with low total similarity to retrieve cases that are not very similar.
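The lower-bound selection of [0043] amounts to a filter applied before any SQL is issued. A minimal sketch, assuming the (total similarity, composite condition) pairs come from a claim 2 step like the one above; "exceeds" is read as strictly greater than the bound, and the function name is hypothetical.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def above_lower_bound(pairs: List[Tuple[float, T]], lower_bound: float) -> List[Tuple[float, T]]:
    """Second invention sketch: keep pairs whose total similarity exceeds the bound, best first."""
    kept = [pair for pair in pairs if pair[0] > lower_bound]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept
```

Because the discarded pairs never reach the database, the saving grows with the N × M × L size of the combination list.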

[0044] The third invention extends claim 3 of the first invention. The part extended in the embodiment of the third invention is that the results of the database searches in claim 3 of the first invention are monitored and the total number of retrieved records is counted; if this total exceeds a predefined maximum record count, searching and displaying of results stop at that point.
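The record-count cutoff of [0044] can be sketched with Python's standard `sqlite3` module. This assumes the ranked list already holds (total similarity, SQL text, parameters) triples in descending order of similarity; the function name and table layout are hypothetical. Following the wording "exceeds", the check runs after each query, so the final batch may push the number of presented records past the maximum; truncating that batch would be an equally plausible reading.

```python
import sqlite3
from typing import Iterator, List, Tuple

def search_with_limit(con: sqlite3.Connection, ranked: List[Tuple[float, str, list]],
                      max_records: int) -> Iterator[Tuple[float, tuple]]:
    """Third invention sketch: query best-first; stop once the running total exceeds max_records."""
    retrieved = 0
    for similarity, sql, params in ranked:
        rows = con.execute(sql, params).fetchall()
        for row in rows:
            yield similarity, row          # present each record with its total similarity
        retrieved += len(rows)
        if retrieved > max_records:        # cumulative count exceeded: issue no further queries
            return
```

A usage example: with rows PROLOG, PROLOG, LISP, C and conditions ranked PROLOG (1.0), LISP (0.8), C (0.5), a maximum of 2 yields the two PROLOG rows and the LISP row, and the C query is never issued.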

[0045] Case-base projects are often intended to make better use of an existing database. In that case, the present invention allows the existing database and the case base to be shared, which reduces development and maintenance costs and keeps the data consistent.

[0046] Even when a development project creates an entirely new case base, the present invention lets the development, management, and retrieval of the case base directly exploit the various retrieval-acceleration techniques of conventional database systems.

[Brief description of drawings]

FIG. 1: Embodiment of the claim 1 portion of the first invention

FIG. 2: Embodiment of the claim 2 portion of the first invention

FIG. 3: Embodiment of the claim 3 portion of the first invention

FIG. 4: Embodiment of the second invention

FIG. 5: Embodiment of the third invention

FIG. 6: Definition of similarity between attribute values

FIG. 7: Configuration of case-based reasoning (CBR)

FIG. 8: Schematic diagram of a CBR system

[Explanation of symbols]

101 Case attribute vector
102 Case base
103 Problem storage attribute vector
104 Partial matching definition unit
105 Weight definition unit
107 Attribute similarity calculation unit
109 Total similarity calculation unit

Continuation of the front page. (56) References cited: JSAI SIG technical report SIG-KBS-9102-7 (1991), pp. 57-97; AAAI-92 (1992), pp. 843-849; Proceedings of the 45th National Convention of IPSJ (second half of 1992), vol. 2, pp. 175-178; Proceedings of the 44th National Convention of IPSJ (first half of 1992), vol. 2, pp. 49-52; Proceedings of the 44th National Convention of IPSJ (first half of 1992), vol. 3, pp. 241-242; Proceedings of the 43rd National Convention of IPSJ (second half of 1991), vol. 2, pp. 95-96.

Claims


1. A database similarity search method, characterized by having similarity-ordered attribute value search condition list creating means which, given an attribute and an attribute value, refers to a previously defined attribute similarity definition describing the similarity between any two attribute values of that attribute, and creates a list of search condition expressions, each annotated with its similarity, in descending order of similarity: a search condition expression indicating that the value of the attribute belongs to the set of attribute values having the highest similarity to the given attribute value, a search condition expression indicating that the value of the attribute belongs to the set of attribute values having the second highest similarity to the given attribute value, a search condition expression indicating that the value of the attribute belongs to the set of attribute values having the third highest similarity to the given attribute value, and so on.
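The list built in claim 1 can be illustrated with a short sketch: attribute values are grouped by their predefined similarity to the query value, and each group becomes one search condition expression, listed from most to least similar. The nested-dictionary similarity table, the `material` attribute with its values and scores, and the SQL-style `IN` condition strings are all hypothetical; the patent does not fix a concrete representation.

```python
from collections import defaultdict


def similarity_ordered_conditions(attribute, value, similarity):
    """Return (similarity, condition) pairs in descending order of similarity.

    `similarity` is a hypothetical nested dict in which similarity[a][b] is the
    predefined similarity between attribute values a and b.
    """
    # Group every attribute value by its similarity to the query value.
    groups = defaultdict(list)
    for other, score in similarity[value].items():
        groups[score].append(other)

    # One condition expression per similarity level, highest level first.
    result = []
    for score in sorted(groups, reverse=True):
        values = ", ".join(f"'{v}'" for v in sorted(groups[score]))
        result.append((score, f"{attribute} IN ({values})"))
    return result


# Hypothetical similarity definition for a 'material' attribute.
MATERIAL_SIM = {
    "steel": {"steel": 1.0, "iron": 0.8, "aluminium": 0.5, "copper": 0.5, "wood": 0.0},
}

conds = similarity_ordered_conditions("material", "steel", MATERIAL_SIM)
for score, cond in conds:
    print(score, cond)
```

Values that share a similarity level end up in the same `IN` set, so the number of condition expressions equals the number of distinct similarity levels rather than the number of attribute values.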

2. A database similarity search method having case similarity condition list output means which, when one case is represented by a set of a plurality of attributes and attribute values: creates, for each attribute, a list of search condition expressions by means of the similarity-ordered attribute value search condition list creating means; takes one condition expression per attribute from the lists created for the respective attributes and forms every exhaustive combination of the per-attribute search condition expressions; generates, for each combination, a composite condition expression joining the per-attribute search condition expressions with a logical AND; calculates the total similarity of each combination as the average of the similarities attached to its search condition expressions, weighted by the predefined weights of the corresponding attributes; and outputs the pairs of total similarity value and composite condition expression arranged in descending order of total similarity.
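The combination step of claim 2 can be sketched with `itertools.product`: each combination takes one condition per attribute, the conditions are joined with `AND`, and the total similarity is the weight-averaged similarity, i.e. the sum of weight times similarity divided by the sum of the weights. The attribute names, condition strings, and weights below are hypothetical examples, not values from the patent.

```python
from itertools import product


def case_similarity_conditions(condition_lists, weights):
    """Return (total_similarity, composite_condition) pairs, best first.

    condition_lists: attribute -> list of (similarity, condition) pairs.
    weights: attribute -> predefined weight (hypothetical values).
    """
    attributes = list(condition_lists)
    total_weight = sum(weights[a] for a in attributes)
    pairs = []
    # Exhaustive combination: one condition per attribute.
    for combo in product(*(condition_lists[a] for a in attributes)):
        # Weighted average of the per-attribute similarities.
        total = sum(weights[a] * sim for a, (sim, _) in zip(attributes, combo)) / total_weight
        # Composite condition: logical AND of the per-attribute conditions.
        expr = " AND ".join(cond for _, cond in combo)
        pairs.append((total, expr))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


CONDITION_LISTS = {
    "material": [(1.0, "material = 'steel'"), (0.5, "material IN ('aluminium', 'copper')")],
    "size": [(1.0, "size = 'M'"), (0.6, "size IN ('S', 'L')")],
}
WEIGHTS = {"material": 3, "size": 1}

ranked = case_similarity_conditions(CONDITION_LISTS, WEIGHTS)
for total, expr in ranked:
    print(f"{total:.3f}  {expr}")
```

Because the heavier `material` weight dominates, a combination with an exact material match but only a partial size match outranks one with a partial material match and an exact size match.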

3. In a system having a similarity search function on a database in which one case corresponds to one record of the database, a database similarity search method characterized in that, when a problem represented by a plurality of attributes and attribute values is given, the top pair is taken from the list of pairs of total similarity and composite search condition expression output for that problem by the case similarity condition list output means; the composite search condition expression of the top pair is converted into the corresponding database search expression and issued to the database; the records retrieved from the database are presented together with the total similarity value of that pair; and the same processing is repeated while taking the next top pair from the list one at a time.
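The loop of claim 3 can be sketched against an in-memory SQLite database using Python's standard `sqlite3` module. The `cases` table, its columns and rows, and the ranked condition list are hypothetical; the condition strings are assumed to be trusted fragments generated by the system itself, so they are spliced directly into the `WHERE` clause.

```python
import sqlite3


def similarity_search(conn, ranked_pairs):
    """Yield (total_similarity, record_ids), taking pairs from the top of the
    ranked list one at a time and issuing each composite condition as SQL.
    """
    for total, expr in ranked_pairs:
        # The composite condition becomes the WHERE clause of the query.
        cursor = conn.execute(f"SELECT id FROM cases WHERE {expr} ORDER BY id")
        yield total, [row[0] for row in cursor]


# Hypothetical case base: one case per record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, material TEXT, size TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(1, "steel", "L"), (2, "copper", "M"), (3, "steel", "M"), (4, "wood", "M")],
)

# Hypothetical ranked list of (total similarity, composite condition) pairs.
ranked = [
    (1.0, "material = 'steel' AND size = 'M'"),
    (0.9, "material = 'steel' AND size IN ('S', 'L')"),
    (0.625, "material IN ('aluminium', 'copper') AND size = 'M'"),
]

for total, ids in similarity_search(conn, ranked):
    print(f"similarity {total}: cases {ids}")
```

Writing the search as a generator means each batch of records is presented with its total similarity as soon as it is fetched, and the caller can stop consuming the list at any point.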

4. The database similarity search method according to claim 2, comprising case similarity condition list output means that, instead of outputting all pairs of total similarity value and composite condition expression, selects only the pairs whose total similarity value is greater than a predefined lower bound on total similarity, and outputs them as a case similarity condition list arranged in descending order of total similarity value.
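The lower-bound selection of claim 4 reduces to a strict "greater than" filter followed by a descending sort. In the sketch below, the function name, the example pairs, and the bound of 0.6 are hypothetical.

```python
def above_lower_bound(pairs, lower_bound):
    """Keep only the (total_similarity, expression) pairs whose total
    similarity is strictly greater than lower_bound, best first."""
    kept = [p for p in pairs if p[0] > lower_bound]
    kept.sort(key=lambda p: p[0], reverse=True)
    return kept


# Hypothetical, unsorted pairs of total similarity and composite condition.
pairs = [
    (0.525, "material IN ('aluminium', 'copper') AND size IN ('S', 'L')"),
    (1.0, "material = 'steel' AND size = 'M'"),
    (0.625, "material IN ('aluminium', 'copper') AND size = 'M'"),
    (0.9, "material = 'steel' AND size IN ('S', 'L')"),
]

selected = above_lower_bound(pairs, 0.6)
for total, expr in selected:
    print(total, expr)
```

If the input list is already sorted in descending order, `itertools.takewhile` could stop at the first pair at or below the bound instead of scanning the whole list.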

5. The database similarity search method according to claim 3, wherein, instead of taking out and processing every pair in the list of pairs of total similarity and composite search condition expression, a cumulative total of the number of records retrieved from the database for each pair is kept, and when this cumulative total exceeds a predetermined maximum number of records, the process of taking pairs from the list and searching the database is terminated.
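The record-count cutoff of claim 5 can be sketched with Python's `sqlite3` module: a running total of retrieved records is kept, and the loop ends once it exceeds the maximum. The `cases` table, its rows, the ranked conditions, and the limit are hypothetical. The claim does not say whether the records of the pair that crosses the limit are still presented; this sketch reports them and then stops, which is an assumption.

```python
import sqlite3


def bounded_similarity_search(conn, ranked_pairs, max_records):
    """Query pairs from the top of the ranked list, stopping once the
    cumulative number of retrieved records exceeds max_records.

    Assumption: the pair whose results push the total past the limit is
    still reported before the search terminates.
    """
    results, retrieved = [], 0
    for total, expr in ranked_pairs:
        ids = [row[0] for row in conn.execute(f"SELECT id FROM cases WHERE {expr} ORDER BY id")]
        results.append((total, ids))
        retrieved += len(ids)  # running total of records retrieved so far
        if retrieved > max_records:
            break
    return results


# Hypothetical case base with one attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, material TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [(1, "steel"), (2, "steel"), (3, "iron"), (4, "iron"), (5, "copper"), (6, "aluminium")],
)

ranked = [
    (1.0, "material = 'steel'"),
    (0.8, "material = 'iron'"),
    (0.5, "material IN ('aluminium', 'copper')"),
]

results = bounded_similarity_search(conn, ranked, max_records=3)
for total, ids in results:
    print(total, ids)
```

With a limit of 3, the first pair retrieves 2 records and the second brings the total to 4, so the third pair is never issued to the database.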