JP2001209651A

JP2001209651A - Method and device for retrieving multi-dimensional vector and recording medium having multi-dimensional vector retrieval program recorded thereon

Info

Publication number: JP2001209651A
Application number: JP2000017877A
Authority: JP
Inventors: Nobuhiko Kamikawa; 伸彦上川; Kazumasa Iwasaki; 一正岩崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-01-24
Filing date: 2000-01-24
Publication date: 2001-08-03
Anticipated expiration: 2020-01-24
Also published as: JP4029536B2

Abstract

PROBLEM TO BE SOLVED: To solve the problem that the processing time and amount of processing required to retrieve n-dimensional vector data increase with the number of dimensions of the vector data being searched. SOLUTION: This multi-dimensional vector retrieval method retrieves, from a plurality of stored multi-dimensional vector data, the vector data lying inside a multi-dimensional rectangular area of arbitrary position and size in a multi-dimensional space. The rectangular area is input as the retrieval range. Each stored item is a data pair consisting of a multi-dimensional vector datum and an address value calculated from approximate values of the respective dimensions of that vector datum. A first judgment process is performed on each data pair using the address value. When the address value is within the prescribed range, a second judgment process is performed using the multi-dimensional vector data. When the vector data are within the prescribed range, the multi-dimensional vector data of the data pair are output as a retrieval result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a multidimensional vector search method for searching a database or the like, in which a large number of n-dimensional (n ≥ 1) vector data are stored, for the vector data lying inside an n-dimensional rectangular region of arbitrary position and size in the n-dimensional space. More particularly, it relates to a technique that is effective when applied to searches in which n is several tens or more.

[0002]

[Description of the Related Art] A method of searching a plurality of stored n-dimensional vector data for the vector data lying inside an n-dimensional rectangular region of arbitrary position and size in the n-dimensional vector space has been disclosed in PCT/EP97/04520.

[0003] In this method, the vector space is divided into regions that can be ordered one-dimensionally, and each vector datum is managed according to the region to which it belongs. At search time, an n-dimensional rectangular region of arbitrary position and size in the vector space is taken as the search range, and all regions overlapping the search range are found. For the vector data lying in each of those regions, each dimension value of the vector data is compared with the minimum and maximum values of the corresponding dimension of the search range. Vector data judged by this per-dimension comparison to lie within the search range are output as the search result.

[0004]

[Problems to be Solved by the Invention] However, the conventional method described above has the problem that the processing time required for a search increases when the target data are high-dimensional vectors with several tens of dimensions or more. According to the known example above, the search processing can be divided into "processing that finds all regions overlapping the search range" and "processing that compares each dimension value of the vector data with the minimum and maximum values of the corresponding dimension of the search range". The processing order of the former is $O(2^n \times a(n))$, where $a$ is the amount of processing needed to determine whether one region overlaps the search range. The processing order of the latter is $O(r \times b(n))$, where $r$ is the number of data in the search result and $b$ is the amount of processing needed to compare one datum with every dimension of the search range. For example, when the target vector data have 32 dimensions, the processing order of finding all regions overlapping the search range is $O(2^{32} \times a(n))$, roughly $O(4 \times 10^9 \times a(n))$. Even if the processing time of $a$ is assumed to be 1 μs, finding all regions overlapping the search range alone takes about 4000 seconds. Thus, in the conventional method, the search processing time grows as the number of dimensions of the target data increases. Multidimensional vector search methods are applied to search systems such as similar-image retrieval, where the required search time is "a few seconds" for about 100,000 vector data; the conventional method is therefore considered unlikely to meet this requirement.
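The 4000-second figure follows directly from the order estimate. A quick check, assuming (as the text does) a constant 1 μs per region test; the function name is hypothetical:

```python
def region_enumeration_seconds(n_dims: int, seconds_per_region: float = 1e-6) -> float:
    """Estimated time to test all 2**n_dims candidate regions of the prior-art method."""
    return (2 ** n_dims) * seconds_per_region


for n in (8, 16, 32):
    # Each added dimension doubles the number of regions, and hence the time.
    print(f"n = {n:2d}: {2 ** n:>13,d} regions -> {region_enumeration_seconds(n):10.3f} s")
```

For n = 32 this gives 4,294,967,296 regions and about 4,295 s, which the text rounds to "about 4 billion" and "4000 seconds".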

[0005] To address the above problem, an object of the present invention is to provide a technique that reduces the influence of the number of dimensions of the vector data being searched on the processing time required to retrieve n-dimensional vector data lying inside an n-dimensional rectangular region of arbitrary position and size, even when n is several tens or more.

[0006]

[Means for Solving the Problems] To address the above problem, the invention provides a multidimensional vector search method that searches a plurality of stored multidimensional vector data for the multidimensional vector data lying inside a multidimensional rectangular region of arbitrary position and size. The multidimensional rectangular region is input as the search condition. For each data pair, consisting of a multidimensional vector datum and an address value calculated from approximate values of each dimension of that vector datum, a first determination process is performed using the address value. When the address value satisfies the search condition, a second determination process is performed using the multidimensional vector data of the data pair, and when the vector data satisfy the search condition, the multidimensional vector data are output as a search result. In other words, the search processing according to the present invention is divided into a "first determination process" that narrows down the candidate data for the search result and a "second determination process" that, applied to those candidates, finally determines the data forming the search result. The processing order of the first determination process is $O(N \times f(n))$, where $N$ is the number of stored data and $f$ is the amount of pre-determination processing per datum. The processing order of the second determination process is $O(N' \times g(n))$, where $N'$ is the number of data subject to the main determination and $g$ is the amount of main-determination processing per datum. The processing order of the whole search is therefore $O(N \times f(n) + N' \times g(n))$. Because the address value is calculated from approximate values of each dimension of the vector data, the address value is smaller than the vector data, and as a result $f(n) \ll g(n)$. Furthermore, empirical observation indicates that the first determination process can be expected to narrow the data subject to the second determination process down to about three times the number of data in the search result. For example, in a search that outputs 100 results from 100,000 stored data, the processing order of the first determination process is $O(100{,}000 \times f(n))$, while that of the second determination process is $O(300 \times g(n))$. If, again from empirical observation and taking file I/O into account, the processing time of $f$ is assumed to be 10 μs and that of $g$ to be 1 ms, the first determination process takes about 1 second and the second about 0.3 seconds, for a total of 1.3 seconds. This performance can meet the requirement of "a few seconds" for about 100,000 vector data. In addition, because CPU computation takes far less time than file I/O, an increase in the number of dimensions of the vector data being searched has almost no effect on the processing amount of the search as a whole.
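The cost model above can be checked numerically. This sketch plugs in the empirical assumptions quoted in the text (f = 10 μs, g = 1 ms, N' ≈ 3 × the number of results); the function and parameter names are hypothetical:

```python
def two_stage_seconds(n_stored: int, n_results: int,
                      f_sec: float = 10e-6, g_sec: float = 1e-3,
                      candidate_ratio: int = 3) -> tuple:
    """Estimated time of the two determination stages, O(N*f(n) + N'*g(n))."""
    first = n_stored * f_sec                      # address-value check on every stored pair
    second = candidate_ratio * n_results * g_sec  # full vector check on N' candidates
    return first, second, first + second


first, second, total = two_stage_seconds(100_000, 100)
print(f"first about {first:.1f} s, second about {second:.1f} s, total about {total:.1f} s")
```

The first stage dominates only because it touches every stored pair; the dimension count n enters through f and g, which the text argues are small relative to file I/O.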

[0007] As described above, with the multidimensional vector search method of the present invention, even when n is several tens or more, the processing time required to retrieve n-dimensional vector data lying inside an n-dimensional rectangular region of arbitrary position and size does not grow with the number of dimensions of the vector data being searched, and can be reduced to a time that meets the requirement of "a few seconds" for about 100,000 vector data.

[0008]

[Embodiments of the Invention] Next, an embodiment of the present invention will be described with reference to the drawings.

[0009] FIG. 1 illustrates the principle of the multidimensional vector search method of this embodiment. To find the vector data 303 lying inside the search range 404 among ten vector data 303 distributed in the multidimensional space 403, the first determination process is performed first. It narrows the ten vector data 303 down to the three vector data 303 located in the shaded portion of the figure, which are the vector data 303 that may lie inside the search range 404. Next, the second determination process is performed only on these three vector data 303, yielding the two vector data 303 that lie inside the search range 404. Because the first determination process requires very little computation, the time required for the search processing can be reduced.

[0010] FIG. 2 shows the configuration of the multidimensional vector search device of this embodiment. As shown in FIG. 2, the device consists of a CPU 100, an input device 101, an output device 102, a bus 103, a memory 200, and a secondary storage device 300. When the system executes storage processing, the storage processing program 201, the address calculation program 203, and the B-Tree program 204 are stored in the memory 200 and executed by the CPU 100. Vector data 303 entered from the input device 101 are passed to the memory 200; the storage processing program 201 updates the B-Tree data 301 and stores the address value 302 and the vector data 303 in the secondary storage device 300. When the system executes search processing, the search processing program 202, the address calculation program 203, the B-Tree program 204, the axis address calculation program 205, and the in-range determination program 206 are stored in the memory 200 and executed by the CPU 100. The B-Tree program 204 is a program that performs commonly used B-Tree processing. Information on the search range 404 entered from the input device 101 is passed to the memory 200; the search processing program 202 obtains the vector data 303 of the search result by referring to the B-Tree data 301, the address values 302, and the vector data 303, and outputs them to the output device 102. The secondary storage device 300 stores the B-Tree data 301, the address values 302, and the vector data 303. The B-Tree data 301 are created by the B-Tree program 204 and, as shown in FIG. 10, correspond to the node part of a B-Tree that uses the address value 302 as its index key. The address values 302 correspond to the leaf part of the B-Tree data 301, and the vector data 303 correspond to its actual data part. Hereinafter, the combination of an address value 302 and the vector data 303 from which it was calculated is called a data pair.

[0011] The programs that make this system function, namely the storage processing program 201, the search processing program 202, the address calculation program 203, the B-Tree program 204, the axis address calculation program 205, and the in-range determination program 206, are recorded on a recording medium such as a CD-ROM, stored in the secondary storage device 300, and then loaded into the memory 200 and executed. The medium on which the programs are recorded may be a medium other than a CD-ROM.

[0012] The storage processing and the search processing of this embodiment are described below.

[0013] FIG. 3 is a flowchart of the storage processing program 201 of this embodiment, described below together with FIGS. 11 and 12. In step 31, the address calculation program 203 calculates the address value 302 from the vector data 303 entered from the input device 101. As shown in FIG. 11, when vector data 303 (11) are entered, the address value 302 [1.3.2] of vector data 303 (11) is calculated. In step 32, the B-Tree program 204 determines where the calculated address value 302 is to be stored in the secondary storage device 300, the B-Tree data 301 are updated, and the address value 302 and the vector data 303 are stored in the secondary storage device 300. Here, since the address values 302 are ordered as data pair (3) < data pair (11) < data pair (4), the address value 302 [1.3.2] of data pair (11) is inserted between the address value 302 [1.0.3] of data pair (3) and the address value 302 [2.0.3] of data pair (4), as shown in FIG. 12. The B-Tree data 301 are updated accordingly, and the address value 302 and the vector data 303 are stored in the appropriate storage locations.
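Step 32 is an ordered insertion keyed by the address value. In the minimal sketch below, a Python list kept sorted with the standard `bisect` module stands in for the B-Tree program 204 (an assumption made for brevity); the class name and payload labels are hypothetical. Python compares tuples element by element from the first, which matches the digit-by-digit ordering of address values.

```python
import bisect
from typing import List, Tuple

Address = Tuple[int, ...]


class SortedPairStore:
    """Hypothetical stand-in for B-Tree data 301: data pairs ordered by address value."""

    def __init__(self) -> None:
        self.addresses: List[Address] = []
        self.payloads: List[object] = []  # would hold the vector data 303

    def insert(self, address: Address, payload: object) -> int:
        """Step 32: find the storage position for the address, then store the pair there."""
        pos = bisect.bisect_right(self.addresses, address)
        self.addresses.insert(pos, address)
        self.payloads.insert(pos, payload)
        return pos


# The FIG. 12 example: (11) with [1.3.2] lands between (3) [1.0.3] and (4) [2.0.3].
store = SortedPairStore()
store.insert((1, 0, 3), "data pair (3)")
store.insert((2, 0, 3), "data pair (4)")
print(store.insert((1, 3, 2), "data pair (11)"))  # 1
print(store.payloads)
```

A list insertion costs O(N) per item, whereas a B-Tree keeps the same order with O(log N) page accesses; the sketch only illustrates where each pair ends up.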

[0014] FIG. 4 is a flowchart of the address calculation program 203 of this embodiment. In step 41, an address vector is created from the vector data 303 whose address value 302 is to be calculated, by selecting the m dimensions needed for the intended use of the search. In step 42, the cube 401 that constitutes the address vector space is set as the target cube. In step 43, 1 is assigned to the variable i, which is used to check whether all digits of the address value 302 have been processed. In step 44, basic division is applied to the target cube, dividing it into $2^m$ subcubes 402. In step 45, it is determined which of the subcubes 402 produced in step 44 contains the vector data 303. In step 46, the number of the subcube 402 containing the vector data 303 becomes the i-th digit $X_i$ of the address value 302; $X_i$ is an unsigned m-bit integer. In step 47, the subcube 402 containing the vector data 303 becomes the new target cube. In step 48, it is checked whether all digits of the address value 302 have been processed. If $i < k$, the process proceeds to step 49; otherwise all digits have been processed, so the sequence of k numbers $[X_1.X_2.\ldots.X_k]$ is taken as the address value 302 and the program ends. In step 49, the variable i is incremented by 1 and the process returns to step 44. Increasing the number of digits k of the address value 302 improves the accuracy of the address value comparison performed by the in-range determination program 206 and reduces the number of comparisons of vector data 303. On the other hand, it increases both the computation required for the address comparison itself and the data size of the address values 302. The number of digits k must therefore be set to an appropriate value, taking into account factors such as the distribution density of the stored vector data 303.
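The loop of steps 41 to 49 can be sketched as follows. This is a minimal sketch, not the patent's implementation: it assumes each selected dimension ranges over [lo, hi) (the unit interval by default), halves each side into equal parts, and numbers subcubes so that bit j of $X_i$ is 1 when the point lies in the upper half of the j-th selected axis. The text leaves the numbering to FIG. 8; this bit layout is an assumption, chosen because it reproduces the axis address values in the worked example of [0016]. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def address_value(vector: Sequence[float], dims: Sequence[int], k: int,
                  lo: float = 0.0, hi: float = 1.0) -> Tuple[int, ...]:
    """Hypothetical sketch of the address calculation program 203 (FIG. 4).

    Assumes every selected dimension ranges over [lo, hi), each side is halved
    into equal parts, and bit j of a digit X_i is 1 when the point lies in the
    upper half of the j-th selected axis.
    """
    point = [vector[d] for d in dims]  # step 41: address vector of m dimensions
    m = len(point)
    lows = [lo] * m                    # step 42: the whole cube 401 is the target
    highs = [hi] * m
    digits: List[int] = []
    for _ in range(k):                 # steps 43, 48, 49: one pass per digit
        x = 0
        for j in range(m):             # step 44: halve every side -> 2**m subcubes
            mid = (lows[j] + highs[j]) / 2
            if point[j] >= mid:        # step 45: locate the subcube holding the point
                x |= 1 << j
                lows[j] = mid          # step 47: that subcube becomes the new target
            else:
                highs[j] = mid
        digits.append(x)               # step 46: X_i, an unsigned m-bit integer
    return tuple(digits)


# A 3-dimensional vector; dimensions 0 and 2 are selected (m = 2, k = 3).
print(address_value([0.8, 0.9, 0.3], dims=[0, 2], k=3))  # (1, 3, 0)
```

Under this assumed numbering, the digits are equivalent to quantizing each selected coordinate to a k-bit grid index and interleaving the bits, which is commonly called a Z-order or Morton code. Other numberings that FIG. 8 might use would give a different curve.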

[0015] As shown in FIG. 8, basic division is the process of dividing an m-dimensional cube 401 into $2^m$ subcubes 402 by halving each of its sides; each subcube 402 is given a serial number. Here each side is divided into two equal parts, but the division may instead take into account any bias in the distribution of the vector data 303. The one-dimensional ordering rule for the address values 302 is that an address value 302 with a larger value in a higher-order digit is larger; that is, address values are compared digit by digit, starting from the highest-order digit. For example, when three-digit address values 302 are calculated from a two-dimensional address vector, the address values 302 are ordered as in Equation 1, and arranging them in ascending order gives the order indicated by the arrows in FIG. 9.
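The ordering rule of [0015] is plain lexicographic (digit-by-digit) comparison. The sketch below assumes m = 2 and k = 3, as in the example. It enumerates all 64 address values and sorts them. It also shows that packing the k m-bit digits into one integer, using a hypothetical helper `pack`, yields the same order, so an ordinary integer key could stand in for the digit sequence.

```python
from itertools import product
from typing import Sequence


def pack(address: Sequence[int], m: int) -> int:
    """Concatenate the k digits (each an unsigned m-bit integer), X_1 most significant."""
    key = 0
    for digit in address:
        key = (key << m) | digit
    return key


m, k = 2, 3
addresses = list(product(range(2 ** m), repeat=k))  # every 3-digit address for m = 2
ordered = sorted(addresses)                          # tuple comparison: higher digit decides first
print(ordered[:5], "...", ordered[-2:])
print(ordered == sorted(addresses, key=lambda a: pack(a, m)))
```

Packing preserves the order only because every digit occupies exactly m bits; with variable-width digits the integer key would no longer agree with the digit-wise comparison.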

[0016]

Equation 1: $0.0.0 < 0.0.1 < \cdots < 0.0.3 < 0.1.0 < \cdots < 3.3.2 < 3.3.3$

FIG. 5 is a flowchart of the search processing program 202 of this embodiment. The flow of processing performed by the search processing program 202 on the stored data when the search range 404 of FIG. 13 is input is described below together with FIG. 14. In step 50, the points of the search range 404 closest to and farthest from the origin are found, and the address calculation program 203 calculates the address value 302 of each point. Hereinafter, the address value 302 of the point in the search range 404 closest to the origin is called the search minimum address value, and that of the point farthest from the origin is called the search maximum address value. The search minimum address value is calculated as [2.1.1] and the search maximum address value as [3.2.1]. In step 51, the B-Tree program 204 determines the storage positions of the search minimum address value and the search maximum address value in the secondary storage device 300. As shown in FIG. 14, the storage position of the search minimum address value lies between (4) and (5), and that of the search maximum address value lies between (9) and (10). In step 52, the axis address calculation program 205 calculates the axis address values of the search minimum address value and of the search maximum address value. The axis address values of the search minimum address value are [0.1.1] for the horizontal axis and [1.0.0] for the vertical axis; those of the search maximum address value are [1.0.1] for the horizontal axis and [1.1.0] for the vertical axis. In step 53, the data pair stored at the storage position of the search minimum address value found in step 51, namely data pair (5), becomes the data pair to be processed. In step 54, it is determined whether processing is needed for the data pairs stored from the data pair to be processed onward, that is, data pairs whose address values 302 are larger. This determination is made by comparing the storage position of the search maximum address value found in step 51 with the storage position of the data pair to be processed. If "storage position of the search maximum address value < storage position of the data pair to be processed", no further processing is needed and the program ends. If "storage position of the search maximum address value ≥ storage position of the data pair to be processed", the process proceeds to step 55. In FIG. 14, therefore, no determination is needed for data pair (10), which is stored after data pair (9). In step 55, the axis address calculation program 205 calculates the axis addresses of the address value 302 of the data pair to be processed; for address value 302 (5) they are [0.1.1] for the horizontal axis and [1.0.1] for the vertical axis. In step 56, the in-range determination program 206 determines whether the vector data 303 of the data pair to be processed lie inside the search range 404. In step 57, if the vector data 303 of the data pair to be processed were determined to lie inside the search range 404, that is, if they are a search result, the process proceeds to step 58; otherwise, it proceeds to step 59. In step 58, the vector data 303 of the data pair to be processed are output to the output device 102 as a search result. In step 59, the data pair stored at the position following the data pair to be processed becomes the new data pair to be processed, and the process returns to step 54. For example, when the data pair to be processed is (5), the next one is data pair (6).
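Steps 50 to 59, together with the two-stage check of the in-range determination program 206, can be sketched as follows. All of the following are assumptions for illustration: a sorted Python list searched with `bisect` stands in for the B-Tree program 204 and B-Tree data 301; the address vector uses all n dimensions (m = n); every dimension ranges over [0, 1); and the subcube numbering sets bit j of each digit $X_i$ when the point lies in the upper half of axis j (the text defers the numbering to FIG. 8). Function names are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Address = Tuple[int, ...]


def address_value(point: Sequence[float], k: int) -> Address:
    """Digits X_1..X_k by repeated halving of [0, 1)^m (assumed bit layout)."""
    lows, highs = [0.0] * len(point), [1.0] * len(point)
    digits = []
    for _ in range(k):
        x = 0
        for j, v in enumerate(point):
            mid = (lows[j] + highs[j]) / 2
            if v >= mid:
                x |= 1 << j
                lows[j] = mid
            else:
                highs[j] = mid
        digits.append(x)
    return tuple(digits)


def axis_addresses(address: Address, m: int) -> List[int]:
    """a_j: bit j of X_1..X_k read as a k-bit integer, X_1 most significant."""
    k = len(address)
    return [sum(((x >> j) & 1) << (k - 1 - i) for i, x in enumerate(address))
            for j in range(m)]


def search(addresses: List[Address], vectors: List[Vector],
           vmin: Vector, vmax: Vector, k: int) -> List[Vector]:
    """Steps 50-59 plus the two-stage in-range determination (steps 71-78)."""
    m = len(vmin)
    min_addr = address_value(vmin, k)                # step 50: corner nearest the origin
    max_addr = address_value(vmax, k)                #          corner farthest from it
    start = bisect.bisect_left(addresses, min_addr)  # step 51: storage positions
    stop = bisect.bisect_right(addresses, max_addr)
    a_min = axis_addresses(min_addr, m)              # step 52
    a_max = axis_addresses(max_addr, m)
    results = []
    for pos in range(start, stop):                   # steps 53, 54, 59
        a = axis_addresses(addresses[pos], m)        # step 55
        if not all(lo <= ai <= hi for lo, ai, hi in zip(a_min, a, a_max)):
            continue                                 # steps 71-74: cheap address test fails
        v = vectors[pos]
        if all(lo <= vj <= hi for lo, vj, hi in zip(vmin, v, vmax)):
            results.append(v)                        # steps 75-78 passed: step 58 output
    return results


# Hypothetical 2-D data (k = 3), kept sorted by address value as the B-Tree would keep them.
points = [(0.10, 0.20), (0.30, 0.40), (0.45, 0.55), (0.70, 0.35), (0.90, 0.90)]
pairs = sorted((address_value(p, 3), p) for p in points)
addresses = [a for a, _ in pairs]
vectors = [v for _, v in pairs]
print(search(addresses, vectors, vmin=(0.25, 0.30), vmax=(0.60, 0.60), k=3))
# [(0.3, 0.4), (0.45, 0.55)]
```

Here the two bisections leave three candidate pairs. The axis-address test rejects (0.70, 0.35), and the other two pass the per-dimension test. Under this numbering, any point inside the box has an address between the search minimum and maximum address values, and axis addresses between theirs, so the first stage only discards pairs and never a true result.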

[0017] As described above, using the B-Tree program 204 means that no determination by comparing address values 302 is needed for data pairs (1), (2), (3), (4), and (10). This halves the computation required for the determination by comparing address values 302.

[0018] FIG. 6 is a flowchart of the axis address calculation program 205 of this embodiment. The j-axis address value $a_j$ is used to speed up evaluating an address value 302 with respect to the j-th dimension alone. In step 61, 1 is assigned to the variable i, which is used to check whether binary representations have been computed for all digits of the address value 302. In step 62, the binary digits $x_{i1}, x_{i2}, \ldots, x_{im}$ of the i-th digit $X_i$ of the address value 302 are computed, where m is the number of dimensions of the address vector. In step 63, it is checked whether binary representations have been computed for all digits. If $i < k$, the process proceeds to step 64; otherwise binary representations have been computed for all digits, so the process proceeds to step 65. Here k is the number of digits of the address value 302. In step 64, the variable i is incremented by 1 and the process returns to step 62. In step 65, 1 is assigned to the variable j, which is used to check whether the axis address values have been calculated for all dimensions of the address vector. In step 66, the k-bit integer formed from the k binary digits $x_{1j}, x_{2j}, \ldots, x_{kj}$ computed in step 62 is taken as the j-axis address value $a_j$. In step 67, it is checked whether the axis address values have been calculated for all dimensions of the address vector. If $j < m$, the process proceeds to step 68; otherwise the axis address values for all dimensions have been calculated and the program ends. In step 68, the variable j is incremented by 1 and the process returns to step 66.
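Steps 61 to 68 amount to transposing a k × m bit matrix. Row i holds the binary digits of $X_i$, and column j, read from $X_1$ downward, is the j-axis address value $a_j$. Below is a minimal sketch with a hypothetical function name. It assumes that the binary digit $x_{ij}$ is bit j−1 of $X_i$, counted from the least significant bit, and that the horizontal axis is j = 1. Under that assumption, it reproduces the axis address values quoted in [0016] for the search minimum address value [2.1.1] and the search maximum address value [3.2.1].

```python
from typing import List, Sequence


def axis_addresses(address: Sequence[int], m: int) -> List[int]:
    """Hypothetical sketch of the axis address calculation program 205 (FIG. 6)."""
    k = len(address)
    # Steps 61-64: binary digits x_i1..x_im of every digit X_i (bit j-1 = x_ij, assumed).
    bits = [[(x >> j) & 1 for j in range(m)] for x in address]
    result = []
    for j in range(m):          # steps 65, 67, 68: one pass per axis
        a_j = 0
        for i in range(k):      # step 66: x_1j..x_kj as a k-bit integer, x_1j most significant
            a_j = (a_j << 1) | bits[i][j]
        result.append(a_j)
    return result


print([format(a, "03b") for a in axis_addresses((2, 1, 1), m=2)])  # ['011', '100']
print([format(a, "03b") for a in axis_addresses((3, 2, 1), m=2)])  # ['101', '110']
```

If the cube is halved into equal parts, $a_j$ read as an integer is the k-bit grid index of the point along axis j. This is why the comparison in the in-range determination can treat axis address values as plain integers.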

[0019] FIG. 7 is a flowchart of the in-range determination program 206 of this embodiment. The processing that the in-range determination program 206 performs on the stored data of FIG. 1 is described below with reference to FIGS. 15 and 16. This processing consists of a "determination process by comparing the address value 302" (steps 71 to 74) and a "determination process by comparing each dimension value of the vector data 303" (steps 75 to 78). In step 71, "1" is substituted into a variable i, which is used to check whether the determination has been performed for all axis address values of the address value 302. In step 72, it is determined whether the address value 302 of the data pair being processed lies between the minimum search address value and the maximum search address value. Here, amini is the i-axis address value of the minimum search address value, ai is the i-axis address value of the address value 302 of the data pair being processed, and amaxi is the i-axis address value of the maximum search address value. If amini ≤ ai ≤ amaxi is satisfied, the process proceeds to step 73; otherwise the vector data 303 of the data pair is judged not to exist inside the search range 404, and the program ends. In step 73, it is checked whether the determination has been performed for all axis address values, where m is the number of dimensions of the address vector. If i < m, the process proceeds to step 74; otherwise all axis address values have been checked, so the vector data 303 of the data pair is judged as possibly existing inside the search range 404, and the process proceeds to step 75. In step 74, the variable i is incremented by "1". FIG. 15 shows the details of the determination process by comparing the address value 302: of the data (5) to (9), the data judged as possibly existing within the search range 404 are (5), (7), and (9). In step 75, "1" is substituted into a variable j, which is used to check whether the determination has been performed for all dimensions of the vector data 303. In step 76, it is determined whether the j-th dimension value of the vector data 303 of the data pair being processed lies between the minimum and maximum values of the search range 404 in that dimension. Here, vminj is the minimum value of the search range 404 in dimension j, vj is the j-th dimension value of the vector data 303 of the data pair being processed, and vmaxj is the maximum value of the search range 404 in dimension j. If vminj ≤ vj ≤ vmaxj is satisfied, the process proceeds to step 77; otherwise the vector data 303 of the data pair is judged not to exist inside the search range 404, and the program ends. In step 77, it is checked whether the determination has been performed for all dimensions of the vector data 303, where n is the number of dimensions of the vector data 303. If j < n, the process proceeds to step 78; otherwise all dimensions have been checked, so the vector data 303 of the data pair is judged to exist inside the search range 404, and the program ends. In step 78, the variable j is incremented by "1". FIG. 16 shows the details of the determination process by comparing each dimension value of the vector data 303: this determination is applied to the data (5), (7), and (9), and the data judged to exist within the search range 404 are (5) and (9). These two pieces of data are the search result.
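The two-stage check of steps 71 to 78 can be sketched in Python as below. The sketch assumes that each data pair has already been decoded into its axis address values a1 to am (for example as in FIG. 6) and that the axis address values of the minimum and maximum search address values were computed elsewhere. The names `first_determination`, `second_determination`, and `range_search`, and the toy data, are hypothetical. Python's `all()` stops at the first failing axis or dimension, mirroring the flowchart's early exits to "program ends".

```python
from typing import List, Sequence, Tuple

# Hypothetical data pair layout: (axis addresses a_1..a_m, vector v_1..v_n)
DataPair = Tuple[Tuple[int, ...], Tuple[float, ...]]


def first_determination(axes: Sequence[int], amin: Sequence[int], amax: Sequence[int]) -> bool:
    """Steps 71-74: amin_i <= a_i <= amax_i must hold on every address axis i."""
    return all(lo <= a <= hi for a, lo, hi in zip(axes, amin, amax))


def second_determination(vec: Sequence[float], vmin: Sequence[float], vmax: Sequence[float]) -> bool:
    """Steps 75-78: vmin_j <= v_j <= vmax_j must hold in every vector dimension j."""
    return all(lo <= v <= hi for v, lo, hi in zip(vec, vmin, vmax))


def range_search(pairs: List[DataPair], amin, amax, vmin, vmax) -> Tuple[List[Tuple[float, ...]], int]:
    """Return (vectors inside the search range, number of pairs passing the first check)."""
    # First determination: cheap comparison of the short axis addresses only.
    candidates = [p for p in pairs if first_determination(p[0], amin, amax)]
    # Second determination: full per-dimension comparison, only on the survivors.
    hits = [vec for _, vec in candidates if second_determination(vec, vmin, vmax)]
    return hits, len(candidates)


pairs: List[DataPair] = [
    ((1, 1), (0.30, 0.30, 0.9)),
    ((1, 2), (0.40, 0.55, 0.1)),  # passes the address check, fails dimension 2
    ((3, 0), (0.90, 0.10, 0.5)),  # rejected by the address check
]
hits, n_candidates = range_search(pairs, amin=(1, 1), amax=(1, 2),
                                  vmin=(0.25, 0.25, 0.0), vmax=(0.45, 0.50, 1.0))
print(hits, n_candidates)  # [(0.3, 0.3, 0.9)] 2
```

As in FIGS. 15 and 16, the first stage may let through false candidates such as the second pair, but because it only narrows the set, the exact second stage alone decides the final result.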

[0020] In this embodiment, 590-dimensional image feature values, obtained by analyzing the luminance-differential information and color information of a single image, are used as the vector data 303. A 32-dimensional address vector is created from the 590-dimensional vector data 303, and a 6-digit address value 302 is calculated from it. Compared with calculating the address value 302 from all 590 dimensions of the vector data 303, the size of the address value 302 is therefore reduced to 32/590, which reduces the computation required for the search. Moreover, each axis address value is 6 bits, whereas each dimension value of the vector data 303 is 4 bytes = 32 bits (a real number). Consider a set of 100,000 items of vector data 303 and a search in which the B-Tree program 204 narrows the vector data 303 subject to the determination by comparing address values 302 down to 1/2 of the stored items, the determination by comparing address values 302 judges that 300 items may exist inside the search range 404, and the determination by comparing each dimension value of the vector data 303 judges that 100 items exist inside the search range 404. In this case, the determination by comparing each dimension value of the vector data 303 for 100,000 items has been replaced by a determination by comparing address values 302 for 50,000 items plus a determination by comparing each dimension value of the vector data 303 for 300 items. The processing amount of the B-Tree program 204 is assumed to be negligibly small compared with the other processing. The determination by comparing each dimension value of the vector data 303 for one item consists of 32-bit comparisons in each of the 590 dimensions, whereas the determination by comparing the address value 302 for one item consists of 6-bit comparisons in each of the 32 dimensions of the address vector; the number of comparisons per item is therefore 32/590, and the data size per comparison is 6/32. Taking the determination by comparing each dimension value of the vector data 303 for one item as 1u, the 50,000 address comparisons cost 50,000 × (32/590) × (6/32) ≈ 500u, so the computation required for the search is reduced from 100,000u to about 500u + 300u = about 800u.
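The cost estimate above can be reproduced with a few lines of Python. The cost model is an assumption read off the text: the cost of a check is taken as proportional to the number of comparisons times the bits per comparison, and the B-Tree lookup is ignored. The function `relative_cost` and its parameter names are hypothetical.

```python
def relative_cost(n_stored: int, n_dims: int, addr_dims: int, value_bits: int,
                  axis_bits: int, btree_fraction: float, n_candidates: int) -> float:
    """Search cost in units u, where 1u = one full per-dimension check of one vector.

    Assumed model: cost ~ (number of comparisons) x (bits per comparison),
    and the B-Tree narrowing step costs nothing.
    """
    # Cost of one address check relative to one full vector check.
    addr_check = (addr_dims / n_dims) * (axis_bits / value_bits)
    n_addr_checks = n_stored * btree_fraction  # pairs left after B-Tree narrowing
    return n_addr_checks * addr_check + n_candidates * 1.0


baseline = 100_000 * 1.0  # full check of every stored vector
cost = relative_cost(n_stored=100_000, n_dims=590, addr_dims=32, value_bits=32,
                     axis_bits=6, btree_fraction=0.5, n_candidates=300)
print(round(cost), round(baseline / cost))  # 808 124
```

The sketch gives about 508u for the 50,000 address checks, which the text rounds to about 500u, for a total of roughly 808u. That is a reduction of roughly 124 times, consistent with the "about 1/100" figure stated in paragraph [0021].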

[0021] As described above, the multidimensional vector search method of this embodiment can reduce to about 1/100 the processing time required to search an n-dimensional vector space for n-dimensional vector data existing in an n-dimensional rectangular region of arbitrary position and size. Moreover, because the processing amount does not increase with the number of dimensions of the vector data being searched, the method can be applied even when n is several tens or more.

[0022]

[Effect of the Invention] According to the present invention, introducing a "first determination process" and a "second determination process" into the search for n-dimensional vector data existing in an n-dimensional rectangular region of arbitrary position and size reduces the influence of the number of dimensions of the vector data being searched on the required processing time, even when n is several tens or more.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the multidimensional vector search method of the embodiment.

FIG. 2 is a diagram illustrating the configuration of the multidimensional vector search device of the embodiment.

FIG. 3 is a flowchart of the storage processing program 201 of the embodiment.

FIG. 4 is a flowchart of the address calculation program 203 of the embodiment.

FIG. 5 is a flowchart of the search processing program 202 of the embodiment.

FIG. 6 is a flowchart of the axis address calculation program 205 of the embodiment.

FIG. 7 is a flowchart of the in-range determination program 206 of the embodiment.

FIG. 8 is a diagram illustrating the basic division performed by the address calculation program 203 of the embodiment.

FIG. 9 is a diagram illustrating the one-dimensional ordering rule for the address values 302 of the embodiment.

FIG. 10 is a diagram illustrating details of the secondary storage device 300 of the embodiment.

FIG. 11 is a diagram showing the multidimensional vector space 403 for explaining the storage processing of the embodiment.

FIG. 12 is a diagram showing details of the secondary storage device 300 for explaining the storage processing of the embodiment.

FIG. 13 is a diagram showing the multidimensional vector space 403 for explaining the search processing of the embodiment.

FIG. 14 is a diagram showing details of the secondary storage device 300 for explaining the search processing of the embodiment.

FIG. 15 is a diagram showing the details of the determination process by comparing the address values 302 in the embodiment.

FIG. 16 is a diagram showing the details of the determination process by comparing each dimension value of the vector data 303 in the embodiment.

[Explanation of symbols]

Reference Signs List
101 CPU
101 Input device
102 Output device
103 Bus
200 Memory
201 Storage processing program
202 Search processing program
203 Address calculation program
204 B-Tree program
205 Axis address calculation program
206 In-range determination program
300 Secondary storage device
301 B-Tree data
302 Address value
303 Vector data
401 Cube
402 Subcube
403 Multidimensional vector space

Claims

[Claims]

1. A multidimensional vector search method for searching a plurality of stored multidimensional vector data for the multidimensional vector data existing in a multidimensional rectangular region of arbitrary position and size, comprising: inputting the multidimensional rectangular region as a search condition; for a data pair consisting of the multidimensional vector data and an address value calculated on the basis of approximate values of the respective dimensions of the multidimensional vector data, performing a first determination process using the address value; when the address value satisfies the search condition, performing a second determination process using the multidimensional vector data of the data pair; and when the vector data satisfy the search condition, outputting the multidimensional vector data of the data pair as a search result.

2. The multidimensional vector search method according to claim 1, wherein, when the multidimensional vector data are stored, the address value calculated on the basis of the approximate values of the respective dimensions of the multidimensional vector data and the multidimensional vector data are stored separately in association with each other.

3. The multidimensional vector search method according to claim 1, wherein, when the address value is calculated, an address vector is created by selecting arbitrary dimensions of the multidimensional vector data, and a value calculated on the basis of approximate values of the respective dimensions of the address vector is used as the address value.

4. The multidimensional vector search method according to claim 1, wherein a rule for assigning a one-dimensional order to the address values is set, and the data pairs are stored by creating a B-Tree that uses the address value as an index key; and wherein, when the first determination process is performed, the storage positions of the minimum address value and the maximum address value of the search range are determined using the B-Tree, and the first determination process is not performed on data pairs having an address value smaller than the minimum address value of the search range or larger than the maximum address value of the search range.

5. A multidimensional vector search device for searching a plurality of stored multidimensional vector data for the multidimensional vector data existing in a multidimensional rectangular region of arbitrary position and size, comprising: a search condition input unit that inputs the multidimensional rectangular region as a search condition; a first determination processing unit that, for a data pair consisting of the multidimensional vector data and an address value calculated on the basis of approximate values of the respective dimensions of the multidimensional vector data, performs a first determination process using the address value; a second determination processing unit that performs a second determination process using the multidimensional vector data of the data pair when the address value satisfies the search condition; and a search result output unit that outputs the multidimensional vector data as a search result when the vector data satisfy the search condition.

6. A computer-readable recording medium on which is recorded a program for searching a plurality of stored multidimensional vector data for the multidimensional vector data existing in a multidimensional rectangular region of arbitrary position and size, the program causing a computer to function as: a search condition input unit that inputs the multidimensional rectangular region as a search condition; a first determination processing unit that, for a data pair consisting of the multidimensional vector data and an address value calculated on the basis of approximate values of the respective dimensions of the multidimensional vector data, performs a first determination process using the address value; a second determination processing unit that performs a second determination process using the multidimensional vector data of the data pair when the address value satisfies the search condition; and a search result output unit that outputs the multidimensional vector data as a search result when the vector data satisfy the search condition.