JP3258063B2

JP3258063B2 - Database search system and method

Info

Publication number: JP3258063B2
Application number: JP05696592A
Authority: JP
Inventors: 寛高田
Original assignee: NS Solutions Corp
Current assignee: NS Solutions Corp
Priority date: 1992-02-07
Filing date: 1992-02-07
Publication date: 2002-02-18
Anticipated expiration: 2017-02-18
Also published as: JPH05225238A

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a database search system and method, and more particularly to a database search system and method for retrieving required information from a database according to a plurality of conditions.

【０００２】[0002]

[Prior Art] When a condition search is performed in a database search based on an all-property search, a predetermined search key is set and applied to every property (each individual item stored in the database). For example, each property is checked to see whether it contains the search key, and the properties that contain it are listed as the search result.

[0003] In such a search, when a conditional expression (search expression) consisting of a plurality of search keys is used, the conventional practice is to formulate the conditional expression from the search keys and to search with it directly. For example, a conditional expression such as (A or B or C) and D is created from keys A, B, C, and D, and all properties are searched using this expression.

【０００４】[0004]

[Problems to Be Solved by the Invention] However, because such a search uses a conditional expression composed of a plurality of keys, the search time is very long and the cost performance is poor when the condition is not satisfied. In addition, when a search is performed with a similar conditional expression, for example (A or B or C) and E, which resembles the expression above, the partial logical condition (A or B or C) of the search already performed cannot be reused, so efficiency is low.

[0005] An object of the present invention is to eliminate the above drawbacks of the prior art and to provide a database search system that, in a search based on a plurality of conditions, enables a high-speed condition search regardless of the complexity of the conditional expression and allows search results to be reused.

【０００６】[0006]

[Means for Solving the Problems] The database search system of the present invention is characterized by comprising: search means that compares a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of the i-th property to be searched and the neighboring data $C_{i,k}$ of that j-th data within the i-th property, with a combination of a plurality of quantization amounts calculated from a search key, and performs an all-property search according to the degree of matching; search result storage means that stores the search results obtained by the search means; and condition search means that performs a condition search using the search results stored in the search result storage means. The database search method of the present invention is characterized in that a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of the i-th property to be searched and the neighboring data $C_{i,k}$ of that j-th data within the i-th property, is compared with a combination of a plurality of quantization amounts calculated from a search key, a fuzzy search corresponding to the search key is performed over all properties according to the degree of matching, and a logical condition search is performed on the property lists obtained as the search results.

【０００７】[0007]

[Operation] According to the present invention, a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of the i-th property to be searched and the neighboring data $C_{i,k}$ of that j-th data within the i-th property, is compared with a combination of a plurality of quantization amounts calculated from a search key, an all-property search is performed according to the degree of matching, and the search result is stored. A condition search is then performed using the stored search result. Because the results of searches under given conditions are stored and are used to search under more complex conditions, partial search results can be reused and a high-speed condition search becomes possible.

【０００８】[0008]

[Embodiment] FIG. 1 shows an embodiment of a system according to the present invention. As shown in the figure, the system has a search unit 12, a search result list storage unit 14, and a property list condition search unit 16. The search unit 12 performs an all-property search on the search target 10, which is a database, using conditional expressions consisting of predetermined search keys input from the search key input unit 18.

[0009] For example, when the keys input from the search key input unit 18 are A and B, as shown in the figure, the search unit 12 performs a search with each of the conditional expressions A and B, and the results are stored in the search result list storage unit 14. In the example shown in the figure, the property number list found by conditional expression A is 3, 5, 10, 20, and the property number list found by conditional expression B is 5, 10, 30. These property number lists are stored in the search result list storage unit 14 as the search results.

[0010] The property list condition search unit 16 performs searches with more complex conditional expressions using the results stored in the search result list storage unit 14. For example, to search with the conditional expression (A or B) or (A and B) using the above results for conditional expressions A and B, the property list condition search unit 16 reads the results for conditional expression A and conditional expression B from the search result list storage unit 14 and uses them to evaluate (A or B) or (A and B).

[0011] In this embodiment, as described above, the results stored in the search result list storage unit 14 are the property number list 3, 5, 10, 20 for conditional expression A and the property number list 5, 10, 30 for conditional expression B. For a search with the conditional expression (A or B), taking the OR of these property number lists yields the property number list 3, 5, 10, 20, 30. Similarly, for a search with the conditional expression (A and B), taking the AND of these property number lists yields the property number list 5, 10. The resulting property number lists are sent back to the search result list storage unit 14 and stored there.
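The OR and AND of property number lists described in paragraph [0011] can be sketched with ordinary set operations. This is only an illustration: the function names `list_or` and `list_and` are hypothetical and do not appear in the patent, and the patent does not say how the lists are represented internally.

```python
# Property number lists taken from the example in paragraphs [0009] and [0011].
list_a = {3, 5, 10, 20}   # result of the search with conditional expression A
list_b = {5, 10, 30}      # result of the search with conditional expression B


def list_or(left, right):
    """OR of two property number lists (set union), returned in ascending order."""
    return sorted(left | right)


def list_and(left, right):
    """AND of two property number lists (set intersection), in ascending order."""
    return sorted(left & right)


a_or_b = list_or(list_a, list_b)
a_and_b = list_and(list_a, list_b)
print(a_or_b)
print(a_and_b)
```

Neither operation touches the database itself; both work only on the stored lists, which is the point of the arrangement.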

[0012] Accordingly, using these lists, the property list condition search unit 16 can in the same way perform searches with conditional expressions such as (A or B or C) and (A or B or C) and E.
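A minimal sketch of how stored lists might be reused for expressions such as (A or B or C) and E follows. The class `ResultListStore` is a hypothetical stand-in for the search result list storage unit 14, and the lists for keys C and E are invented for illustration; only the lists for A and B come from the text.

```python
class ResultListStore:
    """Hypothetical stand-in for the search result list storage unit 14."""

    def __init__(self):
        self._lists = {}

    def put(self, expression, numbers):
        # Store the property number list under the expression that produced it.
        self._lists[expression] = frozenset(numbers)

    def get(self, expression):
        return self._lists[expression]


store = ResultListStore()
store.put("A", [3, 5, 10, 20])   # from the text
store.put("B", [5, 10, 30])      # from the text
store.put("C", [7, 30])          # assumed for illustration
store.put("E", [5, 7, 40])       # assumed for illustration

# Each intermediate result is stored again so that later expressions can reuse it.
store.put("A or B", store.get("A") | store.get("B"))
store.put("A or B or C", store.get("A or B") | store.get("C"))

# (A or B or C) and E reuses the stored partial result instead of re-searching.
result = sorted(store.get("A or B or C") & store.get("E"))
print(result)
```

With these assumed lists, a later query such as (A or B or C) and D would need only one more intersection against the stored "A or B or C" entry.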

[0013] According to this embodiment, the results of searches with conditional expressions consisting of predetermined search keys are stored in the search result list storage unit 14 as property number lists, as described above, and when a search is performed with a complex conditional expression that combines these keys, the condition search is performed using the stored property number lists.

[0014] Consequently, a search with a complex conditional expression does not require every property to be searched against the complex condition, so the search time can be shortened. In addition, because the results of partial condition searches are reused, search efficiency is high.

[0015] The search system according to the present invention can be applied to searches of various kinds of databases. For example, it can be applied to searches with conditional expressions in the following data search system.

[0016] FIG. 2 is a data flow diagram of a pattern search system based on neighborhood feature amounts, showing an embodiment to which the present invention is applied. In this search system, neighborhood feature amounts that abstract away all positional (phase) information of the events (information) are created in advance from all target properties, and the all-property search is performed on that data group. The search algorithm consists of a learning step and a search step. In the learning step, a neighborhood feature amount matrix is created for each property. In FIG. 2, this corresponds to the steps from creating the neighborhood feature amount matrix 30 from the search target 10 up to saving it in the structure file 40. In the search step, the same processing as in the learning step is applied to the search key to obtain the neighborhood feature amounts of the search key, a matching operation against the neighborhood feature amounts of the properties is performed, and an evaluation result indicating the degree of matching (similarity) is obtained for each property. In FIG. 2, this corresponds to the steps from performing, in search S4, the matching operation between the search key 50 and the property data of the structure file 40 up to outputting the result as the evaluation result list 70 or the sorted list 80. Each step is described below.

[0017] (1) Learning step
In FIG. 2, the search target 10 is, for example, document data in Japanese, English, German, French, Hebrew, Russian, or another language, or quantized waveform numerical data, chemical structural formulas, genetic information, and the like. Such a search target is first normalized by the normalization means S1. In general, a search target is expressed as a sequence of minimum units of information (characters such as letters of the alphabet for a document, or the real value at a given time for a numerical chart). This sequence is converted by some method into a sequence of integers with n levels. This is called data normalization.

[0018] For example, in the case of English document data, using the ASCII code table as is yields the following 256-level numerical representation:
... This is a pen. ...
84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46 |

[0019] In the above codes, T corresponds to 84, h to 104, and so on.
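The ASCII-based normalization of paragraphs [0018] and [0019] can be reproduced in a few lines. This is a sketch under the assumption that the input is pure ASCII text; the function name `normalize` is hypothetical.

```python
def normalize(text):
    """Convert ASCII text into a list of integer codes (256-level representation).

    Encoding with "ascii" raises UnicodeEncodeError for non-ASCII input,
    which keeps every code within the 0 to 255 range.
    """
    return list(text.encode("ascii"))


codes = normalize("This is a pen.")
print(" | ".join(str(code) for code in codes))
```

The printed sequence begins with 84 for T and 104 for h, as stated in the text.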

[0020] Next, the learning means S2 calculates neighborhood feature amounts from the normalized data 20 and folds them into the form of the neighborhood feature amount matrix 30 by the procedure described below. Various arithmetic expressions are conceivable for extracting the neighborhood feature amounts. The choice of expression also affects the sharpness of the search (how few false detections occur).

[0021] As an example of the learning means S2, a procedure is described in which quantization amounts are obtained from the normalized data 20 and used to obtain the neighborhood feature amount matrix 30. For example, as shown in FIG. 3, suppose there are a plurality of target properties (documents) to be searched, and consider the quantization of the i-th property. Let the j-th data (character) of the i-th property (document) be $C_{i,j}$, and let the data in the k-neighborhood of $C_{i,j}$ be $C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}$. If the normalized numerical sequence 135, 64, 37, 71, 101, ... appears in the i-th property as shown in FIG. 3, the quantization amount x for $C_{i,j}$ and the quantization amount y for the k-neighborhood of $C_{i,j}$ are obtained by

$$x = f(C_{i,j}), \qquad y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}) \qquad \text{(1)}$$

[0022] Here, $f(C_{i,j})$ is an n-level quantization function for $C_{i,j}$. That is, it is the value obtained by performing a predetermined operation on the j-th data $C_{i,j}$ of the i-th property, and it is expressed as an integer from 1 to n. The value of the quantization amount x obtained from the n-level quantization function f therefore determines the position along the x axis, in the range 1 to n, in the matrix (coordinates) shown in FIG. 4.

[0023] Further, $g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k})$ is an m-level quantization function for the forward k-neighborhood of $C_{i,j}$. That is, it is the value obtained by performing a predetermined operation on the j-th data $C_{i,j}$ of the i-th property and a predetermined number of data $C_{i,j+1}, C_{i,j+2}, \ldots, C_{i,j+k}$ in its neighborhood, and it is expressed as an integer from 1 to m. For example, as shown in FIG. 3, when the j-th data $C_{i,j}$ is 135 and k is 3, the data 64, 37, and 71 that follow the data 135 are extracted as $C_{i,j+1}, C_{i,j+2}, C_{i,j+3}$, and a predetermined operation is performed on the correlation between these data and the data 135. When the j-th data $C_{i,j}$ is the next value, 64, the data 37, 71, and 101 that follow the data 64 are extracted as $C_{i,j+1}, C_{i,j+2}, C_{i,j+3}$, and a predetermined operation is performed on the correlation between these data and the data 64. The value of the quantization amount y obtained in this way from the m-level quantization function g determines the position along the y axis, in the range 1 to m, in the matrix (coordinates) shown in FIG. 4.

[0024] Accordingly, obtaining the quantization amounts x and y from the normalized data 20 as described above determines a position in the matrix (coordinates) shown in FIG. 4. Various arithmetic expressions f() and g() can be used to obtain the quantization amounts. One example is

$$f: x \to x, \qquad g: (x, y) \to x - y \ \ (\text{or } |x - y|) \qquad \text{(2)}$$

in which f() takes the input value as the quantization amount unchanged, and g() takes the difference between its two inputs, or the absolute value of that difference, as the quantization amount. In this case, for the normalized data 20 of the earlier example, 84 | 104 | 105 | 115 ..., if $C_{i,j}$ is 84, the coordinate positions of the quantization amounts x and y for $C_{i,j}$ and the forward k-neighborhood of $C_{i,j}$ are (84, 20), (84, 21), (84, 31), and so on. These y values are the absolute differences |84 − 104| = 20, |84 − 105| = 21, and |84 − 115| = 31 between $C_{i,j}$ and each of its forward neighbors. Besides expression (2), neighborhood feature amounts may also be extracted by applying the four arithmetic operations to the individual integer character values of several characters. The coordinate positions (51, 71), (32, 103), ... of the quantization amounts x and y shown in FIG. 3 were obtained by a method different from expression (2).
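As a rough illustration of expression (2), the sketch below pairs each value with each of its k forward neighbors and takes the absolute difference as y. The pairwise reading of g() and the helper name `neighborhood_pairs` are assumptions made for this example; the patent leaves the exact form of f() and g() open.

```python
def f(value):
    """Quantization of the data itself: the identity, as in expression (2)."""
    return value


def g(value, neighbor):
    """Quantization of one neighbor relation: the absolute difference."""
    return abs(value - neighbor)


def neighborhood_pairs(data, k):
    """Return (x, y) pairs for every value and each of its k forward neighbors."""
    pairs = []
    for j, value in enumerate(data):
        # Neighbors are the next k values; fewer exist near the end of the data.
        for neighbor in data[j + 1 : j + 1 + k]:
            pairs.append((f(value), g(value, neighbor)))
    return pairs


data = [84, 104, 105, 115, 32]   # start of the normalized "This " sequence
pairs = neighborhood_pairs(data, k=3)
print(pairs[:3])
```

The first three pairs are the ones the text gives for the data value 84.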

[0025] In this system, the information on each property is stored, for the x and y obtained as described above, as a pair consisting of the serial number i of the property and a weight w(x, y, i). The weight w(x, y, i) is obtained from x, y, and i by a predetermined operation, but in ordinary use its value may simply be fixed at 1.

[0026] Based on the values of the quantization amounts x and y obtained in this way for each data $C_{i,j}$ of each property, data are stored as indicated by the bars in FIG. 4. That is, at the coordinate position determined by the values of the quantization amounts x and y of the data $C_{i,j}$, a pair consisting of the serial number i of the property and its weight w(x, y, i) is stored. In the figure, the bar is drawn growing longer each time such a pair is stored. Because the weight w(x, y, i) is normally set to 1, only the serial number i of the property is stored at the coordinate position determined by the values of x and y.

[0027] The identification number of the property is added to the neighborhood feature amount matrix created in this way, and the result is saved as the structure file 40.
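The learning step of paragraphs [0025] to [0027] might be sketched as follows, with the structure file modeled as an in-memory mapping from each (x, y) position to the set of property serial numbers stored there. The weight is assumed fixed at 1, g() is assumed to be the pairwise absolute difference, and the names `learn` and `neighborhood_pairs` as well as the two sample documents are hypothetical.

```python
from collections import defaultdict


def neighborhood_pairs(data, k=3):
    """(x, y) pairs with x = value and y = |value - neighbor| for k forward neighbors."""
    return [
        (value, abs(value - neighbor))
        for j, value in enumerate(data)
        for neighbor in data[j + 1 : j + 1 + k]
    ]


def learn(properties, k=3):
    """Build a structure mapping each (x, y) position to the properties that hit it.

    Because the weight is assumed to be 1, storing the serial number i alone
    is enough; a repeated hit by the same property is recorded only once.
    """
    structure = defaultdict(set)
    for serial, text in properties.items():
        data = list(text.encode("ascii"))   # normalization, ASCII input assumed
        for position in neighborhood_pairs(data, k):
            structure[position].add(serial)
    return structure


properties = {1: "This is a pen.", 2: "That is a cat."}   # sample documents
structure = learn(properties)
print(sorted(structure[(84, 20)]))   # "Th" occurs in both documents
print(sorted(structure[(84, 21)]))   # "T" followed by "i" occurs only in document 1
```

Note that the structure records which pairs occur in a property but not where they occur, which matches the statement that positional information is discarded.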

[0028] (2) Search step
First, the search key 50 is input. For example, "This is a pen." is used as the search key. The normalization means S3, which uses the same normalization method as the normalization means S1 of the learning step, normalizes the key information of the search key 50 into the following integer sequence:
84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46 |

[0029] Next, the search means S4 uses the same neighborhood feature extraction expressions f() and g() as the learning means S2 of the learning step to create a series of pairs of quantization amounts x and y, starting from the head of the numerical sequence of the normalized search key 50. Then, based on this series of pairs for the search key 50, the content frequency $\omega_i$ of the search key 50 for the property i retrieved from the structure file 40 is calculated by summing $V(x_j, y_j, i)$ over j = 1 to m:

$$\omega_i = \sum_{j=1}^{m} V(x_j, y_j, i)$$

[0030] Here, $V(x_j, y_j, i)$ is equal to the weight of the property i stored in the structure file 40, and is defined as 0 when no weight is stored.

[0031] Accordingly, when data exist (a bar is present) at the position in FIG. 4 given by a pair of quantization amounts x and y obtained from the numerical sequence of the search key 50, the value of the weight is stored as the structure evaluation value score (degree of match) in a separately provided storage means, at the storage location for the serial number i of the property indicated by those data.

[0032] Next, the evaluation result output means S5 divides the structure evaluation value score (degree of match) obtained for each property in the structure file 40 by the evaluation value for a perfect match to obtain the content probability of the search key 50, yielding the evaluation result list 70. The sorting means S6 then sorts this list 70 in descending order of content probability to obtain the sorted list 80.
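The search step of paragraphs [0029] to [0032] can be sketched end to end as below. Several points are assumptions made for the example: the weight is fixed at 1, g() is the pairwise absolute difference, the evaluation value for a perfect match is taken to be the number of (x, y) pairs in the key, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def pairs_of(text, k=3):
    """Normalize ASCII text and return its (x, y) neighborhood pairs."""
    data = list(text.encode("ascii"))
    return [
        (value, abs(value - neighbor))
        for j, value in enumerate(data)
        for neighbor in data[j + 1 : j + 1 + k]
    ]


def learn(properties, k=3):
    """Map each (x, y) position to the set of property serial numbers stored there."""
    structure = defaultdict(set)
    for serial, text in properties.items():
        for position in pairs_of(text, k):
            structure[position].add(serial)
    return structure


def search(structure, key, k=3):
    """Return (serial, content probability) pairs sorted in descending order."""
    key_pairs = pairs_of(key, k)
    scores = defaultdict(int)
    for position in key_pairs:
        # V(x_j, y_j, i) is 1 for properties stored at this position, else 0.
        for serial in structure.get(position, ()):
            scores[serial] += 1
    perfect = len(key_pairs)   # assumed evaluation value for a perfect match
    evaluation = {serial: score / perfect for serial, score in scores.items()}
    return sorted(evaluation.items(), key=lambda item: item[1], reverse=True)


properties = {1: "This is a pen.", 2: "That is a cat.", 3: "xyz"}
structure = learn(properties)
ranked = search(structure, "This is a pen.")
print(ranked)
```

In this sketch the cost of scoring depends on the number of key pairs and on how many properties share each position, not on the length of any individual property.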

[0033] This sorted list 80 is the search result. By referring to the properties at the top of the list, one can learn the names of the properties that are most likely to contain the search key. Because the content probability is obtained for every case, whether the match is perfect or imperfect, a fuzzy match search can be performed.

[0034] In addition, because this is a search of all properties using all the information in the search key, the probability that a relevant property is missed is essentially zero.

[0035] Furthermore, the time needed to evaluate the search key against one property depends only on the number of characters in the key, not on the size of the property. The search can therefore be performed at very high speed.

[0036] As a result of the data search described above, the properties in the sorted list 80 whose score (content probability of the key) exceeds a predetermined threshold are extracted and used as the search result for the conditional expression A or B described earlier. The number lists of these properties are stored in the search result list storage unit 14 of FIG. 1, and on that basis the property list condition search unit 16 performs condition searches such as (A and B) or (A or B or C) and E, as described above.
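The thresholding described in paragraph [0036] amounts to a simple filter over the sorted list. In the sketch below the scores, the threshold of 0.8, and the name `to_property_list` are all invented for illustration; the patent specifies only that a predetermined threshold is used.

```python
def to_property_list(sorted_list, threshold):
    """Keep the serial numbers whose content probability exceeds the threshold."""
    return sorted(serial for serial, probability in sorted_list if probability > threshold)


# Hypothetical sorted lists 80 for keys A and B: (serial number, content probability).
sorted_list_a = [(10, 1.0), (3, 0.92), (20, 0.85), (5, 0.81), (7, 0.40)]
sorted_list_b = [(5, 0.97), (30, 0.90), (10, 0.88), (3, 0.35)]

list_a = to_property_list(sorted_list_a, threshold=0.8)
list_b = to_property_list(sorted_list_b, threshold=0.8)
a_and_b = sorted(set(list_a) & set(list_b))
print(list_a, list_b, a_and_b)
```

The scores were chosen so that the resulting lists match the property number lists used in the example of FIG. 1.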

[0037] In the data search described above, the search result is obtained as the sorted list 80. By storing it in the search result list storage unit 14 and performing condition searches based on the stored data, a search can be performed at high speed even with a complex conditional expression.

[0038] The neighborhood feature amounts need not be extracted from all the data of each property. For example, they may be created (extracted) after excluding one or more specific integer values in the property data, integer values in a specific range, or one or more specific bits in each byte of the data sequence. When the data consist of two-byte characters, as in a Japanese document, the neighborhood feature amounts may be extracted from the lower byte only, excluding the upper byte, for example.

[0039] In the above example, the neighborhood feature amount matrix is a bit matrix of order 256, which corresponds to 8 Kbytes (256 × 256 bits = 65,536 bits = 8,192 bytes). For a database in which each property holds only about 1 Kbyte of data, this cannot be called an efficient system. It is therefore preferable to provide data compression means S7 and compress the data to reduce the size of the structure file 40.

[0040] FIG. 5 shows an example of a data compression method. In this example, for each element of the neighborhood feature amount matrix of order 256, the property names 40a (identification codes) whose element value is 1 are accumulated as a data string at 1 byte per property. Property names whose element value is 0 are thus excluded as unnecessary data.

[0041] When there are 255 or more properties, the property name 40a cannot be expressed in one byte, so only its lower byte is accumulated. For example, when there are 10,000 properties, the property name is expressed in 2 bytes, and only the lower byte is used. Each time the property name code exceeds 255, a marker 40b is inserted into the data string.

[0042] At search time, the data string of the structure file corresponding to each neighborhood feature amount of the search key is retrieved, and an occurrence frequency table is created for each property name. In doing so, 255 is added to the property name code each time a marker 40b is passed. The evaluation result list 70 of FIG. 2 is obtained from the occurrence frequency table created in this way.
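One possible reading of the marker scheme in paragraphs [0040] to [0042] is sketched below. The patent does not specify the byte layout, so the details are assumptions: the byte value 255 is reserved as the marker 40b, the stored byte is the property code modulo 255 (values 0 to 254), and the codes are stored in ascending order. The names `encode` and `decode` are hypothetical.

```python
MARKER = 255   # assumed byte value reserved for the marker 40b
BLOCK = 255    # amount added to the property code per marker passed


def encode(property_codes):
    """Encode ascending property codes as one byte each, inserting markers."""
    out = bytearray()
    block = 0
    for code in sorted(property_codes):
        # Emit one marker for every block boundary crossed since the last code.
        while code // BLOCK > block:
            out.append(MARKER)
            block += 1
        out.append(code % BLOCK)
    return bytes(out)


def decode(stream):
    """Recover property codes, adding 255 for each marker passed."""
    codes = []
    block = 0
    for byte in stream:
        if byte == MARKER:
            block += 1
        else:
            codes.append(block * BLOCK + byte)
    return codes


codes = [3, 200, 254, 255, 300, 600]
packed = encode(codes)
print(list(packed))
print(decode(packed))
```

Under these assumptions each property costs one byte plus an occasional marker, instead of two bytes per property code.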

[0043] When the data string of property name codes for a matrix element covers, for example, half or more of all properties, that neighborhood feature amount matrix element may be regarded as common to all properties and deleted.

[0044] In the above embodiment, the normalization means S1, learning means S2, normalization means S3, search means S4, evaluation result output means S5, sorting means S6, and data compression means S7 can be implemented as computer programs, but they may also be implemented as dedicated hardware using logic circuit elements.

【００４５】[0045]

[Effects of the Invention] According to the present invention, a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of the i-th property to be searched and the neighboring data $C_{i,k}$ of that j-th data within the i-th property, is compared with a combination of a plurality of quantization amounts calculated from a search key, and an all-property search is performed according to the degree of matching. A fuzzy search therefore becomes possible, search omissions are substantially eliminated, and the search can be performed at high speed. Moreover, by storing the results of that search and performing condition searches based on the stored results, searches under complex conditions can be performed at high speed, and because partial search results can be reused, there is no waste and search efficiency can be increased.

[Brief description of the drawings]

FIG. 1 is a data flow diagram of an embodiment of a database search system according to the present invention.

FIG. 2 is a data flow diagram of a database search system to which the search system according to the present invention is applied.

FIG. 3 is a diagram showing the quantization of neighborhood information.

FIG. 4 is a diagram showing the stored information structure.

FIG. 5 is a data structure diagram of the compressed neighborhood feature amounts.

[Explanation of symbols]

10: search target
12: search unit
14: search result list storage unit
16: property list condition search unit
18: search key input unit
20: normalized data
30: neighborhood feature amount matrix
40: structure file
50: search key
60: normalized key
70: evaluation result list
80: sorted list
S1: normalization means
S2: learning means
S3: normalization means
S4: search means
S5: evaluation result output means
S6: sorting means
S7: data compression means

Continuation of the front page

(56) References: JP-A-2-130673 (JP, A); JP-A-60-65335 (JP, A); Japanese Patent No. 2993539 (JP, B2); Japanese Patent No. 2993540 (JP, B2); Japanese Patent No. 3151730 (JP, B2); Roy E. Kimbell, "Searching for Text? Send an N-Gram!", BYTE, May 1988, pp. 297-312

(58) Fields searched (Int. Cl.7, DB name): G06F 17/30; JICST file (JOIS)

Claims

(57) [Claims]

1. A database search system comprising: search means for comparing a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of an i-th property to be searched and neighboring data $C_{i,k}$ of that j-th data within the i-th property, with a combination of a plurality of quantization amounts calculated from a search key, and performing an all-property search according to the degree of matching; search result storage means for storing search results obtained by the search means; and condition search means for performing a condition search using the search results stored in the search result storage means.

2. The database search system according to claim 1, wherein the search result storage means stores the search result for each search key obtained by the search means as a property number list, and the condition search means performs a logical condition search based on the property number lists.

3. A database search method characterized in that a combination of a plurality of quantization amounts, calculated from the j-th data $C_{i,j}$ of an i-th property to be searched and neighboring data $C_{i,k}$ of that j-th data within the i-th property, is compared with a combination of a plurality of quantization amounts calculated from a search key, a fuzzy search corresponding to the search key is performed over all properties according to the degree of matching, and a logical condition search is performed on the property lists obtained as the search results.