JPH05225238A

JPH05225238A - Data base retrieval system

Info

Publication number: JPH05225238A
Application number: JP4056965A
Authority: JP
Inventors: Hiroshi Takada; 寛高田
Original assignee: Nippon Steel Corp
Current assignee: Nippon Steel Corp
Priority date: 1992-02-07
Filing date: 1992-02-07
Publication date: 1993-09-03
Anticipated expiration: 2017-02-18
Also published as: JP3258063B2

Abstract

PURPOSE: To increase retrieval speed and to allow retrieval results to be reused in a database retrieval system that uses complicated conditional expressions. CONSTITUTION: A retrieval part 12 performs retrieval on prescribed conditions A, B, etc., and the retrieval results are stored in a retrieval result list storage part 14. For retrieval on a complicated condition such as (A or B), or (A and B), an object list condition retrieval part 16 performs the retrieval using the results for conditions A and B that are stored in the retrieval result list storage part 14. Because the database is not searched directly on the complicated condition, retrieval time is shortened. Because the results of retrieval on simpler conditions are reused, retrieval efficiency is high.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a database search system, and more particularly to a database search system for retrieving necessary information from a database according to a plurality of conditions.

[0002]

[Prior Art] When a conditional search is performed in a database search that examines all properties, a predetermined search key is set and applied to every property. For example, each property is checked for whether it contains the search key, and the properties that contain it are listed as the search result.

[0003] In such a search, when a conditional expression (search expression) consisting of a plurality of search keys is used, the conventional practice is to formulate the conditional expression from the search keys and search with it directly. For example, a conditional expression such as (A or B or C) and D is built from keys A, B, C, and D, and all properties are searched using this expression.
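
The conventional approach described above can be pictured with a minimal Python sketch. The in-memory `DATABASE`, the substring test in `contains`, and the sample key strings are illustrative assumptions, not details from the source. The only point is that the whole expression is re-evaluated against every property on each search.

```python
# Hypothetical in-memory database: property number -> text.
DATABASE = {
    1: "alpha delta",
    2: "beta",
    3: "gamma delta",
    4: "epsilon",
}

def contains(key):
    """Build a predicate that checks whether a text contains `key`."""
    return lambda text: key in text

def full_scan(database, predicate):
    """Evaluate the whole conditional expression against every property."""
    return sorted(num for num, text in database.items() if predicate(text))

A, B, C, D = (contains(k) for k in ("alpha", "beta", "gamma", "delta"))

# (A or B or C) and D, evaluated from scratch on all properties.
result = full_scan(DATABASE, lambda t: (A(t) or B(t) or C(t)) and D(t))
print(result)  # [1, 3]
```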

[0004]

[Problems to be Solved by the Invention] However, because such a search evaluates a conditional expression composed of a plurality of keys against every property, the search time is very long, and cost performance is poor when the condition is not satisfied. In addition, when searching with a similar conditional expression, for example (A or B or C) and E, the partial logical condition (A or B or C) from the search already performed cannot be reused, so efficiency is low.

[0005] An object of the present invention is to eliminate these conventional drawbacks and to provide a database search system that, for searches under a plurality of conditions, performs conditional searches at high speed regardless of the complexity of the conditional expression and allows search results to be reused.

[0006]

[Means for Solving the Problems] The database search system of the present invention comprises search means for searching all properties under a predetermined condition, search result storage means for storing the search results obtained by the search means, and condition search means for performing a conditional search using the search results stored in the search result storage means.

[0007]

[Operation] According to the present invention, when all properties are to be searched under a plurality of conditions, the search means searches under predetermined conditions among them, and the results are stored in the search result storage means. The condition search means then uses these stored results to search under the more complicated condition. Because the results for the predetermined conditions are stored and used for the more complicated search, partial search results can be reused and a high-speed conditional search becomes possible.

[0008]

[Embodiment] FIG. 1 shows an embodiment of the system according to the present invention. As shown in the figure, the system has a search unit 12, a search result list storage unit 14, and a property list condition search unit 16. The search unit 12 searches all properties of a search target 10, which is the database, using conditional expressions consisting of predetermined search keys input from a search key input unit 18.

[0009] For example, when the keys input from the search key input unit 18 are A and B as shown in the figure, the search unit 12 searches with conditional expression A and with conditional expression B, and the results are stored in the search result list storage unit 14. In the example shown, the list of property numbers found by conditional expression A is 3, 5, 10, 20, and the list found by conditional expression B is 5, 10, 30. These property number lists are stored in the search result list storage unit 14 as the search results.

[0010] The property list condition search unit 16 uses the results stored in the search result list storage unit 14 to search with more complicated conditional expressions. For example, to search with a conditional expression such as (A or B), or with (A and B), the property list condition search unit 16 reads the search results for conditional expressions A and B from the search result list storage unit 14 and uses them to perform the search.

[0011] In this embodiment, as described above, the search result list storage unit 14 holds the property number list 3, 5, 10, 20 for conditional expression A and the list 5, 10, 30 for conditional expression B. For a search with (A or B), taking the OR of these property number lists yields the list 3, 5, 10, 20, 30. Similarly, for a search with (A and B), taking the AND of these lists yields the list 5, 10. The resulting property number lists are in turn sent to the search result list storage unit 14 and stored there.
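
The OR and AND of the stored property number lists amount to set union and intersection. The following Python sketch reproduces the numbers of the FIG. 1 example. The dictionary `result_store` and the helper names are hypothetical stand-ins for the search result list storage unit 14 and the property list condition search unit 16.

```python
# Stored results of the primitive searches (numbers from the FIG. 1 example).
result_store = {
    "A": [3, 5, 10, 20],
    "B": [5, 10, 30],
}

def list_or(left, right):
    """OR of two property number lists (set union, returned sorted)."""
    return sorted(set(left) | set(right))

def list_and(left, right):
    """AND of two property number lists (set intersection, returned sorted)."""
    return sorted(set(left) & set(right))

# The combined lists are written back to the store so they can be reused.
result_store["A or B"] = list_or(result_store["A"], result_store["B"])
result_store["A and B"] = list_and(result_store["A"], result_store["B"])

print(result_store["A or B"])   # [3, 5, 10, 20, 30]
print(result_store["A and B"])  # [5, 10]
```

No property in the database is touched while the combined lists are computed; only the two stored lists are read.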

[0012] Using these lists, the property list condition search unit 16 can likewise perform searches with conditional expressions such as (A or B or C), or (A or B or C) and E.
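
One way to picture this reuse is an evaluator that stores every intermediate list under its expression and looks it up on later searches. This is a sketch under assumptions: the tuple encoding of expressions, the `evaluate` function, and the lists for C, D, and E are invented for illustration (only the lists for A and B come from the embodiment).

```python
def evaluate(expr, store):
    """Evaluate an expression tree, storing each intermediate list in `store`."""
    if expr in store:
        return store[expr]                  # reuse a list computed earlier
    if isinstance(expr, str):
        raise KeyError(f"no stored result for key {expr!r}")
    op, *operands = expr
    lists = [evaluate(operand, store) for operand in operands]
    if op == "or":
        result = frozenset().union(*lists)
    elif op == "and":
        result = frozenset.intersection(*lists)
    else:
        raise ValueError(f"unknown operator: {op}")
    store[expr] = result                    # keep it for later searches
    return result

store = {
    "A": frozenset({3, 5, 10, 20}),
    "B": frozenset({5, 10, 30}),
    "C": frozenset({7, 20}),       # hypothetical
    "D": frozenset({5, 7, 30}),    # hypothetical
    "E": frozenset({3, 7, 40}),    # hypothetical
}

a_or_b_or_c = ("or", "A", "B", "C")
first = evaluate(("and", a_or_b_or_c, "D"), store)
second = evaluate(("and", a_or_b_or_c, "E"), store)  # reuses (A or B or C)

print(sorted(first))   # [5, 7, 30]
print(sorted(second))  # [3, 7]
```

The second call finds the list for (A or B or C) already in `store`, so only one intersection is computed.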

[0013] According to this embodiment, the results of searches by conditional expressions consisting of predetermined search keys are stored in the search result list storage unit 14 as property number lists, and when a search with a complicated conditional expression combining these keys is performed, the conditional search is carried out using the stored property number lists.

[0014] Consequently, a search with a complicated conditional expression does not require evaluating the complicated condition against every property, so the search time can be shortened. Search efficiency is also good because the results of partial conditional searches are reused.

[0015] The search system according to the present invention can be applied to searching various databases. For example, it can be applied to conditional-expression searches in the following data search system.

[0016] FIG. 2 is a data flow diagram of a pattern search system based on neighborhood feature amounts, showing an embodiment to which the present invention is applied. In this search system, neighborhood feature amounts, from which all phase information of the events (information) has been discarded, are created in advance from all target properties, and all properties are searched against that data. The search algorithm consists of a learning step and a search step. In the learning step, a neighborhood feature amount matrix is created for each property. In the search step, a matching operation between the search key and the neighborhood feature amount matrices is performed, yielding an evaluation result that indicates the degree of matching (similarity) for each property. Each step is described below.

[0017] (1) Learning step

In FIG. 2, the search target 10 is, for example, document data in Japanese, English, German, French, Hebrew, Russian, or the like, or quantized numerical waveform data, chemical structural formulas, genetic information, and so on. Such a search target is first normalized by normalization means S1. In general, a search target is represented as a sequence of minimum units of information (characters such as letters of the alphabet for a document, or real values at given times for a numerical chart). This sequence is converted by some method into a sequence of integers with n gradations. This is called normalization of the data.

[0018] For example, in the case of English document data, using the ASCII code table as is gives the following 256-gradation numerical representation:

...... This is a pen. ......
84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46

[0019] In the above code sequence, T corresponds to 84, h to 104, and so on.
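
The ASCII normalization in this example can be checked with a few lines of Python. The function name `normalize` is an assumption chosen for illustration; `ord` returns the code point of a character, which for ASCII text is the ASCII code.

```python
def normalize(text):
    """Map each character to its integer code (0..255 for ASCII/Latin-1 text)."""
    return [ord(ch) for ch in text]

codes = normalize("This is a pen.")
print(codes)
# [84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 112, 101, 110, 46]
```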

[0020] The normalized data 20 is then folded by learning means S2 into the form of a neighborhood feature amount matrix 30. Various arithmetic expressions can be used to obtain the neighborhood feature amounts, and the choice of expression also affects the sharpness of the search (how few false detections occur).

[0021] Let C_{i,j} denote the j-th datum (character) of the i-th property (document). The quantization amount x for C_{i,j} and the quantization amount y for the forward k-neighborhood of C_{i,j} are obtained as follows. Suppose there are n properties (documents) to be searched, and consider the quantization of the i-th of them. If the normalized numerical sequence 135, 64, 37, 71, 101, ... appears in the i-th property as shown in FIG. 3, the quantization amount x for C_{i,j} is obtained by

x = f(C_{i,j})

and the quantization amount y for the forward k-neighborhood of C_{i,j} is obtained by

y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k}).

[0022] Here, f(C_{i,j}) is an n-level quantization function of C_{i,j}. That is, it is the value obtained by applying a predetermined operation to the j-th datum C_{i,j} of the i-th property, and it is an integer from 1 to n. The resulting value of x therefore determines the position along the x-axis, in the range 1 to n, in the matrix (coordinates) shown in FIG. 4.

[0023] Likewise, g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k}) is an m-level quantization function of the forward k-neighborhood of C_{i,j}. That is, it is the value obtained by applying a predetermined operation to the j-th datum C_{i,j} of the i-th property together with a predetermined number of data in its vicinity, and it is an integer from 1 to m. For example, as shown in FIG. 3, when the j-th datum C_{i,j} is 135 and k is 3, the data 64, 37, and 71 that follow 135 are extracted as C_{i,j+1}, C_{i,j+2}, C_{i,j+3}, and a predetermined operation is performed on the correlation between these data and 135. When the j-th datum C_{i,j} is the next value, 64, the data 37, 71, and 101 that follow it are extracted as C_{i,j+1}, C_{i,j+2}, C_{i,j+3}, and a predetermined operation is performed on the correlation between these data and 64.

[0024] The value of y obtained in this way determines the position along the y-axis, in the range 1 to m, in the matrix (coordinates) shown in FIG. 4. Obtaining x and y as described above thus fixes a position in the matrix (coordinates) shown in FIG. 4.
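
The sliding-window extraction of (x, y) pairs can be sketched as follows. The source leaves the "predetermined operations" open, so the concrete `f` (a modulo mapping into 1..n) and `g` (the sum of differences between the datum and its k forward neighbors, mapped into 1..m) are purely illustrative assumptions. Only the window structure, with k = 3 and the FIG. 3 sequence, follows the text.

```python
def f(c, n=256):
    """Hypothetical n-level quantizer: maps a datum to an integer in 1..n."""
    return c % n + 1

def g(window, m=256):
    """Hypothetical m-level quantizer over a datum and its k forward neighbors.

    Sums the differences between the datum and each neighbor, then maps
    the sum into 1..m.
    """
    head, *neighbors = window
    return sum(head - c for c in neighbors) % m + 1

def xy_pairs(sequence, k=3):
    """Return (x, y) for every position that has k forward neighbors."""
    pairs = []
    for j in range(len(sequence) - k):
        window = sequence[j : j + k + 1]    # C[j], C[j+1], ..., C[j+k]
        pairs.append((f(window[0]), g(window)))
    return pairs

# The first window is 135 with neighbors 64, 37, 71; the second is 64 with 37, 71, 101.
pairs = xy_pairs([135, 64, 37, 71, 101], k=3)
print(pairs)  # [(136, 234), (65, 240)]
```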

[0025] In this system, the information on each property is stored, for each x and y obtained as described above, as a pair consisting of the property's serial number i and a weight w(x, y, i). The weight w(x, y, i) is obtained from x, y, and i by a predetermined operation, but normally its value is fixed at 1.

[0026] For each datum C_{i,j}, data is stored according to its x and y values, as indicated by the bars in FIG. 4. That is, the pair consisting of the property's serial number i and its weight w(x, y, i) is stored at the coordinate position determined by the x and y values of C_{i,j}. In the figure, a bar grows longer each time such data is stored. Because the weight w(x, y, i) is normally 1, only the serial number i of the property is stored at the coordinate position determined by x and y.

[0027] The neighborhood feature amount matrix created in this way is saved, with the identification numbers of the properties attached, as a structure file 40.
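
The stored structure behaves like an inverted index from (x, y) cells to property numbers. The sketch below is one possible in-memory form, assuming a dictionary keyed by (x, y) and a simple feature extractor (x is the datum itself, y its difference from the next datum). Neither the dictionary layout nor the names `features` and `build_structure` come from the source.

```python
from collections import defaultdict

def features(sequence, k=1):
    """Hypothetical extractor: x is the datum, y its difference from the k-th next datum."""
    return [(sequence[j], sequence[j] - sequence[j + k])
            for j in range(len(sequence) - k)]

def build_structure(properties, k=1):
    """Map each (x, y) cell to {serial number i: weight w}, with w fixed at 1."""
    structure = defaultdict(dict)
    for i, text in properties.items():
        normalized = [ord(ch) for ch in text]
        for x, y in features(normalized, k):
            structure[(x, y)][i] = 1
    return structure

properties = {1: "This is a pen.", 2: "That is a pencil."}
structure = build_structure(properties)

print(structure[(84, -20)])   # "Th" occurs in both: {1: 1, 2: 1}
print(structure[(110, 64)])   # "n." occurs only in property 1: {1: 1}
```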

[0028] (2) Search step

First, a search key 50 is input, for example "This is a pen." The key information is normalized into an integer sequence by normalization means S3, which uses the same normalization method as the learning step:

84 | 104 | 105 | 115 | 32 | 105 | 115 | 32 | 97 | 32 | 112 | 101 | 110 | 46

[0029] Next, in search means S4, a sequence of (x, y) pairs is created from the head of the normalized numerical sequence corresponding to each property, using the same neighborhood feature extraction expressions f() and g() as in the learning step. Based on this sequence of (x, y) pairs, the content frequency ω_k of the search key in property k is then calculated by summing V(x_j, y_j, k) over j = 1 to m, that is, ω_k = Σ_{j=1}^{m} V(x_j, y_j, k).

[0030] Here, V(x_j, y_j, k) is defined to be equal to the weight if the property information list at (x_j, y_j) holds a weight for the property in question, and 0 if it does not.

[0031] Accordingly, when there is data (a bar) at the x, y position in FIG. 4 corresponding to an (x, y) pair of the numerical sequence being searched for, the value of the weight is recorded, in separately provided storage means, at the storage location for the property serial number i indicated by that data.
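
The summation of V over the key's (x, y) pairs can be sketched as a counter that accumulates stored weights per property. The sample `structure` contents and the key pairs below are invented for illustration, and `content_frequency` is a hypothetical name.

```python
from collections import Counter

# Hypothetical structure data: (x, y) cell -> {property number: weight}.
structure = {
    (84, -20): {1: 1, 2: 1},
    (104, -1): {1: 1},
    (105, -10): {1: 1, 3: 1},
}

def content_frequency(key_pairs, structure):
    """Sum V(x_j, y_j, k) over the key's (x, y) pairs for every property k."""
    omega = Counter()
    for pair in key_pairs:
        # V is the stored weight; cells or properties without data add 0.
        for k, weight in structure.get(pair, {}).items():
            omega[k] += weight
    return omega

key_pairs = [(84, -20), (104, -1), (105, -10), (115, 83)]
omega = content_frequency(key_pairs, structure)
print(dict(omega))  # {1: 3, 2: 1, 3: 1}
```

The loop runs once per key pair, not once per character of each property, which is consistent with the later statement that evaluation time depends on the key length.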

[0032] Next, in evaluation result output means S5, the structural evaluation value (score, or degree of match) obtained for each property is divided by the evaluation value for an exact match (here, the number of characters in the search key minus k) to obtain the content probability of the search key, yielding an evaluation result list 70. Sorting means S6 then sorts this list 70 in descending order of content probability to obtain a sorted list 80.

[0033] This sorted list 80 is the search result. By referring to its top-ranked properties, one can learn the names of the properties that are most likely to contain the search key. Because the content probability is obtained for all exact and inexact matches alike, fuzzy-match searching is possible.
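
The normalization and sorting steps can be sketched as below. The function name `rank` and the raw scores are assumptions. The divisor, key length minus k, is the exact-match score stated in the text.

```python
def rank(omega, key_length, k):
    """Return (property, content probability) pairs, best match first."""
    perfect = key_length - k            # score of an exact match, per the text
    results = [(prop, score / perfect) for prop, score in omega.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical raw scores for a 14-character key with k = 1.
ranked = rank({1: 13, 2: 6, 3: 0}, key_length=14, k=1)
for prop, probability in ranked:
    print(prop, round(probability, 3))
```

Property 1 scores 13 out of 13 and is ranked first with probability 1.0; a partial match such as property 2 still appears in the list with a lower probability, which is what makes fuzzy matching possible.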

[0034] Moreover, because every property is examined against all of the information in the search key, the probability of a missed hit is essentially zero.

[0035] In addition, the time needed to evaluate a search key against one property depends only on the number of characters in the key and not on the size of the property. Searches can therefore be performed very quickly.

[0036] As a result of the data search described above, the properties whose score (key content probability) in the sorted list 80 exceeds a predetermined threshold are extracted and taken as the search result for conditional expression A or B described earlier. The number lists of these properties are stored in the search result list storage unit 14 of FIG. 1, and the property list condition search unit 16 performs conditional searches on them as described above.
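
The hand-off from the pattern search to the list-based conditional search can be sketched as a threshold filter. The probabilities and the threshold 0.7 are invented so that the resulting lists match the FIG. 1 example, and `to_number_list` is a hypothetical helper.

```python
def to_number_list(ranked, threshold):
    """Keep the property numbers whose content probability exceeds the threshold."""
    return sorted(prop for prop, probability in ranked if probability > threshold)

# Hypothetical sorted lists for keys A and B: (property number, probability).
ranked_a = [(5, 1.0), (10, 0.9), (3, 0.8), (20, 0.75), (30, 0.2)]
ranked_b = [(30, 1.0), (5, 0.95), (10, 0.85), (3, 0.1)]

list_a = to_number_list(ranked_a, threshold=0.7)
list_b = to_number_list(ranked_b, threshold=0.7)
a_and_b = sorted(set(list_a) & set(list_b))

print(list_a)   # [3, 5, 10, 20]
print(list_b)   # [5, 10, 30]
print(a_and_b)  # [5, 10]
```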

[0037] In a data search of this kind, the search result is obtained as the sorted list 80. Storing it in the search result list storage unit 14 and performing conditional searches on the stored data makes high-speed searching possible even for complicated conditional expressions.

[0038] Many forms of the neighborhood feature extraction expression of formula (1) are conceivable besides the example above. For example, with

f: x → x
g: (x, y) → x − y (or |x − y|)

the differences (or absolute differences) between adjacent characters and between every other character can be used as neighborhood feature amounts to build the neighborhood feature amount matrix. Neighborhood feature amounts may also be extracted by applying the four arithmetic operations to the individual character integer values of several character strings.
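
A minimal sketch of these difference features follows. It assumes that "adjacent" means an offset of 1 and "every other" an offset of 2; the function name and the `offsets` parameter are illustrative.

```python
def difference_features(sequence, offsets=(1, 2), absolute=False):
    """Pair each datum x with its difference from the datum `offset` positions ahead."""
    pairs = []
    for offset in offsets:
        for j in range(len(sequence) - offset):
            x, y = sequence[j], sequence[j + offset]
            diff = x - y
            pairs.append((x, abs(diff) if absolute else diff))
    return pairs

codes = [ord(ch) for ch in "This"]      # [84, 104, 105, 115]
print(difference_features(codes))
# [(84, -20), (104, -1), (105, -10), (84, -21), (104, -11)]
print(difference_features(codes, absolute=True))
# [(84, 20), (104, 1), (105, 10), (84, 21), (104, 11)]
```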

[0039] The neighborhood feature amounts need not be extracted from all of the data of each property. For example, they may be created (extracted) after excluding one or more specific integer values in the property data, integer values in a specific range, or one or more specific bits in each byte of the data sequence. When the data consists of 2-byte characters, as in Japanese documents, the neighborhood feature amounts may be extracted from the lower bytes only, with the upper bytes excluded, for example.
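
Two of these reductions are easy to illustrate: keeping only the lower byte of each 2-byte character, and masking out selected bits of each byte. The sketch uses UTF-16 (big-endian) as a stand-in 2-byte encoding, which is an assumption; the source does not name an encoding, and characters outside the Basic Multilingual Plane would need 4 bytes. The function names and the mask value are likewise illustrative.

```python
def lower_bytes(text):
    """Keep only the low-order byte of each 2-byte character."""
    encoded = text.encode("utf-16-be")   # stand-in 2-byte encoding (assumption)
    return list(encoded[1::2])           # every second byte is the lower byte

def mask_bits(codes, mask=0x7F):
    """Exclude the bits outside `mask` from every byte."""
    return [c & mask for c in codes]

print(lower_bytes("日本"))        # U+65E5, U+672C -> [229, 44]
print(mask_bits([0xC1, 0x41]))   # top bit dropped -> [65, 65]
```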

[0040] In the example above, the neighborhood feature amount matrix is a bit matrix of order 256, which corresponds to 8 Kbytes. A database in which each property holds only about 1 Kbyte of data therefore cannot be called an efficient system. It is therefore advisable to provide data compression means S7 and compress the data to reduce the size of the structure file 40.
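
The 8 Kbyte figure follows directly from the matrix dimensions, assuming one bit per matrix element and 1 Kbyte = 1,024 bytes:

```latex
256 \times 256\ \text{bits} = 65{,}536\ \text{bits}
  = \frac{65{,}536}{8}\ \text{bytes}
  = 8{,}192\ \text{bytes}
  = 8\ \text{Kbytes}
```

An uncompressed matrix is thus about eight times larger than a 1 Kbyte property that it describes.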

[0041] FIG. 5 shows an example of a data compression method. In this example, for each element of the order-256 neighborhood feature amount matrix, the names 40a (identification codes) of the properties whose element value is 1 are accumulated as a data sequence at 1 byte per property. Property names whose element value is 0 are excluded as unnecessary data.

[0042] When there are 255 or more properties, a property name 40a cannot be represented in 1 byte, so only its lower 1 byte is stored. For example, with 10,000 properties a property name is represented in 2 bytes, of which only the lower byte is used. A marker 40b is inserted into the data sequence each time the property name code passes 255.

[0043] At search time, the data sequences of the structure file corresponding to each of the neighborhood feature amounts of the search key are retrieved, and a table of occurrence counts per property name is created. During this process, 255 is added to the property name code each time a marker 40b is passed. The evaluation result list 70 of FIG. 2 is obtained from the occurrence count table created in this way.
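
The marker scheme can be sketched as a small encoder and decoder for one matrix element's data sequence. The source does not give the marker's byte value or the exact arithmetic, so this sketch makes its own assumptions: the marker is the byte 0xFF, property codes are stored relative to blocks of 255 (stored values 0 to 254, so they never collide with the marker), and the list is in ascending order. This matches the "add 255 per marker" rule in the text but is not necessarily the encoding of FIG. 5.

```python
MARKER = 0xFF   # assumed marker byte; stored codes therefore use 0..254
BLOCK = 255     # amount added to the code for every marker passed

def compress(property_numbers):
    """Encode property numbers as 1-byte codes separated by markers."""
    out = bytearray()
    base = 0
    for number in sorted(property_numbers):
        while number - base >= BLOCK:    # number lies in a later block of 255
            out.append(MARKER)
            base += BLOCK
        out.append(number - base)
    return bytes(out)

def decompress(data):
    """Recover the property numbers, adding 255 for every marker passed."""
    numbers = []
    base = 0
    for byte in data:
        if byte == MARKER:
            base += BLOCK
        else:
            numbers.append(base + byte)
    return numbers

encoded = compress([3, 254, 255, 300, 1000])
print(list(encoded))        # [3, 254, 255, 0, 45, 255, 255, 235]
print(decompress(encoded))  # [3, 254, 255, 300, 1000]
```

Five property numbers, one of which needs two bytes uncompressed, occupy eight bytes here; the saving over a full bit row grows as the lists become sparser.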

[0044] If the data sequence of property name codes for a matrix element covers, for example, half or more of all properties, that neighborhood feature amount matrix element may be regarded as common to the properties and deleted.

[0045] In the embodiment described above, the normalization means S1, learning means S2, normalization means S3, search means S4, evaluation result output means S5, sorting means S6, and data compression means S7 can be implemented as computer programs, but they may also be implemented as dedicated hardware built from logic circuit elements.

[0046]

[Effects of the Invention] According to the system of the present invention, the results of searches under predetermined conditions are stored, so searches under complicated conditions can be performed at high speed. In addition, partial search results can be reused, so there is no waste and search efficiency is high.

[Brief description of drawings]

FIG. 1 is a data flow diagram of an embodiment of a database search system according to the present invention.

FIG. 2 is a data flow diagram of a database search system to which the search system according to the present invention is applied.

FIG. 3 is a diagram showing quantization of neighborhood information.

FIG. 4 is a diagram showing the stored information structure.

FIG. 5 is a data configuration diagram of compressed neighborhood feature amounts.

[Explanation of symbols]

10 search target
12 search unit
14 search result list storage unit
16 property list condition search unit
18 search key input unit
20 normalized data
30 neighborhood feature amount matrix
40 structure file
50 search key
60 normalized key
70 evaluation result list
80 sorted list
S1 normalization means
S2 learning means
S3 normalization means
S4 search means
S5 evaluation result output means
S6 sorting means
S7 data compression means

Claims

[Claims]

1. A database search system comprising: search means for searching all properties according to predetermined conditions; search result storage means for storing search results obtained by the search means; and condition search means for performing a conditional search using the search results stored in the search result storage means, wherein, when performing a search based on a combination of the conditions used in the searches by the search means, the condition search means performs the search based on the search results stored in the search result storage means.

2. The database search system according to claim 1, wherein the system further comprises input means for inputting the condition.

3. The database search system according to claim 1, comprising: a storage unit that stores neighborhood feature amounts for each property to be searched; and search means that determines, for each property, the degree of matching between the neighborhood feature amounts of a search key and the neighborhood feature amounts of the search target, and outputs the property numbers in descending order of the degree of matching.

4. The database search system according to claim 3, used for searching a database in which the quantization amount x for the j-th data string C_{i,j} of the i-th property to be searched, and the quantization amount y for the k data strings C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k} in its vicinity, are obtained by

x = f(C_{i,j})
y = g(C_{i,j}, C_{i,j+1}, C_{i,j+2}, ..., C_{i,j+k})

and the serial number i of the property is stored at the position in the storage means determined by the obtained values of x and y.