JP2015049574A

JP2015049574A - Index generation device and retrieval device

Info

Publication number: JP2015049574A
Application number: JP2013179285A
Authority: JP
Inventors: 健全劉; Jianquan Liu
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2013-08-30
Filing date: 2013-08-30
Publication date: 2015-03-16
Anticipated expiration: 2033-08-30
Also published as: JP6167767B2

Abstract

PROBLEM TO BE SOLVED: To provide an indexing technique that makes it possible to calculate, from index data, an index value indicating the distance between original data. SOLUTION: An index generation device 100 includes: a data acquisition unit 101 that acquires high-dimensional data; a coefficient acquisition unit 102 that acquires as many conversion coefficients as the high-dimensional data has dimensions; a conversion unit 103 that uniquely maps the high-dimensional data into a one-dimensional space using the plurality of conversion coefficients acquired by the coefficient acquisition unit 102; and an index generation unit 104 that generates a hierarchically structured index containing, as index data, the one-dimensional data obtained by the conversion unit 103 sorted in ascending or descending order.

Description

The present invention relates to a data indexing technique.

Various similarity search methods have been proposed to date. Such similarity searches are often performed on multidimensional or high-dimensional data, such as image feature data.

For example, Patent Documents 1, 3, and 6 describe techniques related to similar-image search methods. Patent Document 2 describes a method of searching for similar data using a database in which links for tracing from one piece of data to another are set between the data. Patent Document 5 describes a method of hierarchically classifying an arbitrary set of images. Patent Document 7 describes a method of searching a set of high-dimensional feature vectors for feature vectors similar to a query feature vector. Patent Document 9 describes a method in which each learning pattern is classified, using a hash function, into the bucket corresponding to its hash value, and the learning pattern most similar to an input pattern is searched for among the learning patterns belonging to the bucket corresponding to the hash value of the input pattern. Patent Document 10 describes a data matching method that extracts desired data by specifying conditions on multidimensional data whose plurality of feature quantities can be represented as a vector. Hereinafter, “high-dimensional” and “multidimensional” are used without any particular distinction.

In such a similarity search, the similarity between target data is usually calculated using a similarity function or the like; the higher the similarity, the more similar the target data are judged to be. Alternatively, the distance between target data is calculated using a distance function or the like; the smaller the distance, the more similar the target data are judged to be. For example, image feature data is represented by a multidimensional numerical vector, and the similarity between the feature data being compared is calculated by a similarity function. Patent Document 4 describes a method in which, for every feature quantity in a database, the similarity to the other feature quantities is calculated, ID information for the top f(x) items is stored in descending order of similarity, and similar feature quantities are retrieved by searching the stored contents.

In addition, an index is built for the target data, and performing the similarity search with this index speeds up the search. The R-tree is a known index generation method for multidimensional data (see Non-Patent Document 1). Patent Document 8 describes a method of dividing a feature vector space into a plurality of approximation regions and generating an index tree that is hierarchized according to the sparseness or density of each approximation region.

There are also methods that map multidimensional data from the original data space to a one-dimensional space and build an index from the mapped one-dimensional data. With such methods, the search can be sped up by performing the similarity search on the one-dimensional data. A technique for mapping multidimensional data to a one-dimensional space is called a space-filling curve. Known space-filling curves include the Z curve (Z-curve or Z-order; see Non-Patent Document 2) and the Hilbert curve (see Non-Patent Document 3).

Patent Document 1: Japanese Patent No. 4545641
Patent Document 2: JP 2011-090352 A
Patent Document 3: JP 2012-079186 A
Patent Document 4: JP 2000-035965 A
Patent Document 5: JP 2001-160057 A
Patent Document 6: Japanese Patent No. 4906900
Patent Document 7: JP 2011-257970 A
Patent Document 8: JP 2002-163272 A
Patent Document 9: JP 2009-020769 A
Patent Document 10: JP 2004-046612 A

Non-Patent Document 1: Antonin Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching”, SIGMOD Conference, 1984, pp. 47-57
Non-Patent Document 2: G. M. Morton, “A Computer Oriented Geodetic Data Base and a New Technique in File Sequencing”, Technical report, IBM, Ottawa, Canada, 1966
Non-Patent Document 3: David Hilbert, “Ueber die stetige Abbildung einer Linie auf ein Flächenstück”, Mathematische Annalen, Volume 38, Issue 3, pp. 459-460, 1891

When computing distances (similarities) during search processing, the similarity search methods described above use the index data only to access the original data, which leaves room for further speeding up the search. In other words, if the index data could be used in place of the original data to calculate an index value indicating the distance between data, the search processing could be made even faster.

A method using a spatial index such as an R-tree hierarchically encloses the original data with minimum bounding rectangles (MBRs) to build a tree-structured index. Pruning based on this index at search time reduces the number of accesses to the original data and, as a result, speeds up the similarity search. Because such spatial index techniques are not designed for index data to be used in calculating distances between data, a similarity search using the spatial index cannot calculate the distance between data from the index data.

On the other hand, with a method using a space-filling curve, the generated index data indicates an order in the one-dimensional space, so it is possible to judge how near or far apart two items are in that order and to use the result as the distance between data. With this method, however, the distance between adjacent index data is always the same, so it is difficult to distinguish the magnitudes of distances between data, which lowers the efficiency of the similarity search.
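To make this limitation concrete, the following minimal Python sketch interleaves coordinate bits in the manner of a Z-order (Morton) code for two-dimensional points. The function names and the 16-bit width are assumptions chosen for illustration; they are not taken from the cited documents.

```python
from math import dist

def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def morton_decode(code: int, bits: int = 16) -> tuple:
    """Recover (x, y) from an interleaved code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y

# Codes 0 and 1 are adjacent in the order, and so are codes 7 and 8,
# but the original points are not equally far apart.
near = dist(morton_decode(0), morton_decode(1))  # (0, 0) to (1, 0)
far = dist(morton_decode(7), morton_decode(8))   # (3, 1) to (0, 2)
```

Both pairs differ by exactly 1 in the one-dimensional order, yet the second pair is roughly three times farther apart in the original space, so the order alone does not reflect the magnitude of the distance.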

The present invention has been made in view of the above circumstances, and its object is to provide an indexing technique that makes it possible to calculate, using index data, an index value indicating the distance between original data.

To solve the above problem, each aspect of the present invention adopts the following configuration.

The first aspect relates to an index generation device. The index generation device according to the first aspect includes: a data acquisition unit that acquires high-dimensional data; a coefficient acquisition unit that acquires mutually irreducible conversion coefficients, as many as the number of dimensions of the high-dimensional data; a conversion unit that uniquely maps the high-dimensional data to a one-dimensional space using the plurality of conversion coefficients acquired by the coefficient acquisition unit; and an index generation unit that generates a hierarchically structured index containing, as index data, the one-dimensional data obtained by the conversion unit sorted in ascending or descending order.

The second aspect relates to an index generation method executed by at least one computer. The index generation method according to the second aspect includes: acquiring high-dimensional data; acquiring mutually irreducible conversion coefficients, as many as the number of dimensions of the high-dimensional data; uniquely mapping the high-dimensional data to a one-dimensional space using the acquired plurality of conversion coefficients; and generating a hierarchically structured index containing, as index data, the one-dimensional data obtained by the mapping sorted in ascending or descending order.

The third aspect relates to a search device that uses an index generated by the index generation device according to the first aspect. The search device according to the third aspect includes: a query acquisition unit that acquires search target data having the same number of dimensions as the high-dimensional data; a search target conversion unit that uniquely maps the search target data to the one-dimensional space by the same method as the conversion unit, using the same plurality of conversion coefficients as those acquired by the coefficient acquisition unit; and a distance calculation unit that, when evaluating the similarity between the high-dimensional data and the search target data, calculates the distance between the search target one-dimensional data obtained by the search target conversion unit and the one-dimensional data contained in the index as index data.

The fourth aspect relates to a search method that uses an index generated by the index generation method according to the second aspect. The search method according to the fourth aspect includes: acquiring search target data having the same number of dimensions as the high-dimensional data; uniquely mapping the search target data to the one-dimensional space by the same method as the mapping included in the index generation method according to the second aspect, using the plurality of conversion coefficients; and, when evaluating the similarity between the high-dimensional data and the search target data, calculating the distance between the search target one-dimensional data obtained by mapping the search target data and the one-dimensional data contained in the index.

Another aspect of the present invention may be a program that causes at least one computer to execute the method according to the second or fourth aspect, or a computer-readable recording medium on which such a program is recorded. This recording medium includes a non-transitory tangible medium.

According to each of the above aspects, it is possible to provide an indexing technique that makes it possible to calculate, using index data, an index value indicating the distance between original data.

FIG. 1 is a diagram conceptually showing an example of the processing configuration of the index generation device in the first embodiment.
FIG. 2 is a flowchart showing an operation example of the index generation device in the first embodiment.
FIG. 3 is a diagram conceptually showing an example of the processing configuration of the search device in the first embodiment.
FIG. 4 is a flowchart showing an operation example of the search device in the first embodiment.
FIG. 5 is a diagram conceptually showing an example of the hardware configuration of the high-dimensional data search device in the second embodiment.
FIG. 6 is a diagram conceptually showing an example of the processing configuration of the high-dimensional data search device in the second embodiment.
FIG. 7 is a flowchart showing an operation example of the high-dimensional data search device in the second embodiment with respect to index generation.
FIG. 8 is a flowchart showing an operation example of the high-dimensional data search device in the second embodiment with respect to the search method for a range query.
FIG. 9 is a diagram conceptually showing an example of the processing configuration of the high-dimensional data search device in the third embodiment.
FIG. 10 is a flowchart showing an operation example of the k-nearest-neighbor search of the high-dimensional data search device in the third embodiment.

Embodiments of the present invention are described below. Each embodiment given below is an example, and the present invention is not limited to the configurations of the following embodiments.

[First Embodiment]
First, as a first embodiment, an index generation device, an index generation method, a search device that uses an index generated by the index generation device, and a search method that uses an index generated by the index generation method are described.

FIG. 1 is a diagram conceptually showing an example of the processing configuration of the index generation device 100 in the first embodiment. The index generation device 100 in the first embodiment includes: a data acquisition unit 101 that acquires high-dimensional data; a coefficient acquisition unit 102 that acquires mutually irreducible conversion coefficients, as many as the number of dimensions of the high-dimensional data; a conversion unit 103 that uniquely maps the high-dimensional data to a one-dimensional space using the plurality of conversion coefficients acquired by the coefficient acquisition unit 102; and an index generation unit 104 that generates a hierarchically structured index containing, as index data, the one-dimensional data obtained by the conversion unit 103 sorted in ascending or descending order.

The index generation device 100 has, for example, the same hardware configuration as the high-dimensional data search device 1 in the detailed embodiments described later (the second and subsequent embodiments). As in the high-dimensional data search device 1, each of the processing units described above is realized by executing a program. The hardware configuration of the index generation device 100 is not limited.

Next, the index generation method in the first embodiment is described with reference to FIG. 2. FIG. 2 is a flowchart showing an operation example of the index generation device 100 in the first embodiment. In the following description, the index generation device 100 is the entity that executes the index generation method, but each of the above processing units included in the index generation device 100 may instead be the executing entity.

The index generation method in the first embodiment is executed by at least one computer, such as the index generation device 100. The index generation method includes: acquiring high-dimensional data (S21); acquiring mutually irreducible conversion coefficients, as many as the number of dimensions of the high-dimensional data (S22); uniquely mapping the high-dimensional data to a one-dimensional space using the plurality of conversion coefficients acquired in (S22) (S23); and generating a hierarchically structured index containing, as index data, the one-dimensional data obtained by the mapping in (S23) sorted in ascending or descending order (S24).

In the present embodiment, the high-dimensional data to be indexed is acquired, and conversion coefficients are acquired, as many as the number of dimensions of the high-dimensional data. The data type of the acquired high-dimensional data is not limited. As described above, high-dimensional data means data of a plurality of dimensions and is not distinguished from multidimensional data. The acquired plurality of conversion coefficients have the property of being mutually irreducible. Here, mutually irreducible means that no further simplification is possible. Specifically, when all the conversion coefficients are integers, it means that no pair of conversion coefficients has a common divisor other than 1; when the conversion coefficients include decimal values, it means that no pair of conversion coefficients is evenly divisible by a natural number.
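The integer case of this condition can be checked mechanically. The following minimal sketch, whose function name is an assumption made for illustration, tests that every pair of integer coefficients has no common divisor other than 1; it does not attempt to cover the decimal case.

```python
from itertools import combinations
from math import gcd

def mutually_irreducible(coefficients):
    """Return True if every pair of integer coefficients is coprime."""
    return all(gcd(a, b) == 1 for a, b in combinations(coefficients, 2))

ok = mutually_irreducible([2, 3, 5])       # distinct primes are pairwise coprime
not_ok = mutually_irreducible([2, 4, 7])   # 2 and 4 share the divisor 2
```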

Using such a plurality of mutually irreducible conversion coefficients, the high-dimensional data is uniquely mapped to a one-dimensional space. Uniquely mapped means that no two distinct pieces of high-dimensional data are mapped to the same point (the same one-dimensional data) in the one-dimensional space. This unique mapping can be realized by using a plurality of mutually irreducible conversion coefficients. The present embodiment does not limit the mapping method itself, as long as the mapping uses a plurality of mutually irreducible conversion coefficients.

In the present embodiment, an index is generated that contains, as index data, the one-dimensional data converted from the high-dimensional data by the above mapping. In the generated index, the index data is managed hierarchically while sorted in ascending or descending order. A well-known hierarchical index called a B-tree (B+ tree, B* tree, etc.) can be used to generate this index. The index data may also be used as a pointer to the original high-dimensional data.
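As a rough illustration of keeping one-dimensional index data in ascending order together with references to the original data, the sketch below uses a flat sorted list maintained with Python's `bisect` module. This is a simplified stand-in and not a B-tree: it has no hierarchy, and the class and method names are assumptions made for this example.

```python
import bisect

class SortedIndex:
    """Flat ascending list of (key, reference) pairs; a stand-in for a B-tree."""

    def __init__(self):
        self._keys = []   # one-dimensional index data, ascending
        self._refs = []   # references to the original high-dimensional data

    def insert(self, key, ref):
        # Find the position that keeps the keys in ascending order.
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._refs.insert(pos, ref)

    def range(self, low, high):
        # Return all entries whose key lies in [low, high].
        start = bisect.bisect_left(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[start:stop], self._refs[start:stop]))

index = SortedIndex()
for key, ref in [(288, "a"), (72, "b"), (2592, "c")]:
    index.insert(key, ref)
hits = index.range(70, 300)
```

A real B+ tree would give the same ordered access with logarithmic insertion cost and page-oriented storage; the list version only shows the ordering and lookup behavior.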

Thus, according to the present embodiment, the index data corresponding to each piece of high-dimensional data is a unique real number, so the index data can be used to calculate an index value indicating the distance (similarity) between pieces of high-dimensional data. That is, the distance between index data can serve as an index value indicating the distance between the corresponding high-dimensional data. Here, an index value means a value that can serve as an indicator of that distance even if it does not exactly match the distance between the high-dimensional data. Furthermore, according to the present embodiment, using as index data the one-dimensional data obtained by uniquely mapping the original high-dimensional data onto a one-dimensional space greatly reduces the complexity of the index space without losing the information contained in the original high-dimensional data.
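A minimal sketch of computing such an index value follows, assuming the one-dimensional index data is already available as a non-empty ascending list of numbers. The function names are hypothetical, and the absolute difference is used here only as one plausible one-dimensional distance; the text above does not fix a particular formula.

```python
import bisect

def index_value(u_a, u_b):
    """Distance between two pieces of one-dimensional index data."""
    return abs(u_a - u_b)

def nearest_key(sorted_keys, query_key):
    """Return the index key closest to query_key (sorted_keys must be non-empty)."""
    pos = bisect.bisect_left(sorted_keys, query_key)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_keys[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda k: index_value(k, query_key))

keys = [6, 72, 288]
closest = nearest_key(keys, 100)
```

Because the keys are sorted, only the entries adjacent to the query position need to be compared, without touching the original high-dimensional data.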

Here, the technical idea embodied in the index generation device 100 and the index generation method described above is explained, together with concrete examples of mutually irreducible conversion coefficients and of the processing that maps high-dimensional data to a one-dimensional space.

The present inventor focused on the basic principle of prime factorization in number theory and arrived at the idea of applying the inverse of this principle, that is, the inverse of prime factorization, to the generation of index data for similarity search. Prime factorization expresses an arbitrary positive integer (natural number) as a product of prime numbers. The “basic principle of prime factorization” refers to a special property of the prime factorization process. Prime factorization has the following property.

For any positive integer, the prime factorization is uniquely determined. This property is also referred to as the uniqueness of prime factorization.

For example, the prime factorization of the positive integer 288 is as follows.
288 = 2 × 2 × 2 × 2 × 2 × 3 × 3 = 2^5 × 3^2
That is, the positive integer 288 is uniquely factored into “2 × 2 × 2 × 2 × 2 × 3 × 3”. Such a repeated product can also be expressed as a product of powers of primes; that is, 288 can be expressed as the product of the fifth power of the prime 2 and the square of the prime 3.
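The factorization of 288 can be reproduced with a short trial-division routine. This is a minimal sketch for small integers of 2 or more; the function name is an assumption for illustration.

```python
def prime_factorize(n):
    """Return {prime: exponent} for an integer n >= 2, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a prime factor larger than sqrt of the original n.
        factors[n] = factors.get(n, 0) + 1
    return factors

result = prime_factorize(288)  # exponents of 2 and 3
```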

Because of this property, the inverse operation of prime factorization is also unique. Therefore, by selecting primes in advance and assigning a corresponding exponent to each selected prime, exactly one positive integer is always obtained.
For example, if the primes 2 and 3 are selected, the exponent 5 is assigned to the prime 2 and the exponent 2 to the prime 3, and the inverse operation of prime factorization is performed, the positive integer 288 is uniquely calculated.
2^5 × 3^2 = 2 × 2 × 2 × 2 × 2 × 3 × 3 = 288

With this uniqueness of the inverse operation of prime factorization in mind, the present inventor looked at prime factorization from a different viewpoint and made the following observations.

First, the number of primes selected as described above is defined as the number of dimensions of the data space of the similarity search processing. Accordingly, as many primes as the data space has dimensions are selected in advance. In the above example, the two primes 2 and 3 are selected, so the number of dimensions of the data space is 2.

Furthermore, the exponent assigned to each selected prime is defined as the element value of the corresponding dimension of the high-dimensional data in the data space, or as the value obtained by normalizing that element value to a natural number. Accordingly, when the element value of each dimension of the target high-dimensional data is a natural number, that element value is assigned to the corresponding selected prime as its exponent. When the element value of a dimension is not a natural number, the value obtained by normalizing it to a natural number is assigned to the corresponding selected prime as its exponent. In the above example, the target high-dimensional data can be regarded either as two-dimensional vector data whose element values are the exponents 5 and 2 assigned to the primes 2 and 3, or as two-dimensional vector data whose element values become the exponents 5 and 2 after normalization.

If the inverse operation of prime factorization described above is regarded as a particular conversion function, it can be viewed as a kind of conversion (or mapping). Combining this view with the above definitions, the above example corresponds to converting the point (5, 2) in the two-dimensional space into the one-dimensional point (288). Because prime factorization and its inverse operation are unique, this conversion is also unique.
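The conversion of the point (5, 2) to 288 and its reversal can be sketched as follows. The function names are assumptions made for illustration, and the elements are assumed to be non-negative integers.

```python
from math import prod

def to_one_dim(point, primes):
    """Map a point to one dimension: product of prime ** element."""
    return prod(p ** x for p, x in zip(primes, point))

def from_one_dim(u, primes):
    """Recover the point by counting how often each prime divides u (u >= 1)."""
    point = []
    for p in primes:
        exponent = 0
        while u % p == 0:
            u //= p
            exponent += 1
        point.append(exponent)
    return tuple(point)

u = to_one_dim((5, 2), (2, 3))
v = from_one_dim(u, (2, 3))
```

The round trip returning the original point is what the uniqueness of the conversion amounts to in practice: two different points cannot share the same one-dimensional value.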

From these observations, the present inventor found that “by using the inverse operation of prime factorization, vector data in an arbitrary d-dimensional space can be uniquely converted into a single positive integer in a one-dimensional space” (hereinafter referred to as the first lemma). A concrete example using this first lemma is described below as Example 1.

In the index generation device 100 of Example 1, the coefficient acquisition unit 102 acquires, as the plurality of conversion coefficients, as many primes as the high-dimensional data has dimensions. The conversion unit 103 normalizes the high-dimensional data acquired by the data acquisition unit 101 to natural numbers and, using the element data of each dimension of the normalized high-dimensional data as an exponent, calculates the product of the values obtained by raising each conversion coefficient acquired by the coefficient acquisition unit 102, as a base, to the corresponding power. The index generation unit 104 generates an index that contains, as index data, the one-dimensional data calculated in this way by the conversion unit 103.

The above processing of the index generation device 100 in Example 1 is described below in more detail with reference to FIG. 2.

The index generation device 100 acquires d-dimensional data v to be indexed (S21), where d is an integer of 2 or more. The data v is an arbitrary point in the d-dimensional space and is written as follows.
v(x_1, x_2, ..., x_d)

Here, the element data of each dimension (x_1, x_2, ..., x_d) are natural numbers. The element data of the acquired d-dimensional data v need not be natural numbers, however. In that case, the index generation device 100 normalizes the d-dimensional data v to natural numbers. Specifically, the index generation device 100 converts the element data of each dimension of the d-dimensional data v into a natural number in such a way that the original data can be restored.
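The text does not specify how the normalization is performed. One hypothetical scheme, sketched below, assumes that every element has at most a fixed number of decimal places and is non-negative once an offset is added; it scales each element by a power of ten so that the result is an integer and the original value can be restored. The function names, the `decimals` and `offset` parameters, and the scheme itself are assumptions for illustration only.

```python
def normalize(values, decimals=2, offset=0):
    """Shift and scale each element to a non-negative integer (assumed scheme)."""
    scale = 10 ** decimals
    return [round((v + offset) * scale) for v in values]

def restore(naturals, decimals=2, offset=0):
    """Invert normalize(); exact only within floating-point precision."""
    scale = 10 ** decimals
    return [n / scale - offset for n in naturals]

normalized = normalize([0.25, 1.5, 3.0])
restored = restore(normalized)
```

Since the normalized values are later used as exponents, a large scale factor makes the mapped value grow very quickly, so in practice the number of retained decimal places would need to be kept small.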

インデックス生成装置１００は、ｄ個の素数を適当に選択する（Ｓ２２）。インデックス生成装置１００は、素数表を予め保持していてもよい。選択された素数は、ｐ_１，ｐ_２，・・・，ｐ_ｄと表記される。 The index generation device 100 selects d prime numbers as appropriate (S22). The index generation device 100 may hold a table of primes in advance. The selected primes are denoted p_1, p_2, ..., p_d.

インデックス生成装置１００は、各次元の要素データを冪数として用いて、上記選択された各素数を底としてそれぞれ冪乗して得られる値の積を算出する（Ｓ２３）。この処理は、次の（式１）で表わされる。

The index generation device 100 raises each selected prime, as a base, to the power given by the element data of the corresponding dimension, and calculates the product of the resulting values (S23). This process is expressed by the following (Formula 1).
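The equation itself is not reproduced in this text. Based on the description that follows it (a conversion function f from d-dimensional natural-number data to a single natural number), (Formula 1) can plausibly be reconstructed as:

```latex
f : \mathbb{N}^{d} \rightarrow \mathbb{N} \qquad \text{(Formula 1)}
```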

上記（式１）は、自然数を要素データとしても持つｄ次元空間上のデータＮ^ｄを変換関数ｆを用いて、１次元の自然数Ｎへ変換することを示す。ここで、変換関数ｆは、以下の（式２）で表すことができる。（式２）において、ｐ及びｖは上述のとおりである。ｕは、一次元への変換後（マッピング後）の値を示す。

(Formula 1) above indicates that data in the d-dimensional space N^d, whose element data are natural numbers, is converted into a one-dimensional natural number in N using the conversion function f. The conversion function f can be expressed by the following (Formula 2), in which p and v are as described above and u is the value after conversion (mapping) into one dimension.
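The equation is likewise missing here. From step S23 (each selected prime is raised to the power of the corresponding element and the results are multiplied), (Formula 2) can plausibly be reconstructed as:

```latex
u = f(v) = \prod_{i=1}^{d} p_i^{\,x_i} \qquad \text{(Formula 2)}
```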

しかしながら、実施例１の手法では、次元数ｄが大きくなる程、変換後の一次元データの値（ｕ）が急激に大きくなる。よって、実施例１によれば、実行するコンピュータ（インデックス生成装置１００）の要求性能が高くなってしまう。そこで、本発明者は、実施例１の課題に対して次のような考察を行い、上記第１補題を更に発展させた。 However, in the method of Example 1, the value u of the converted one-dimensional data grows rapidly as the number of dimensions d increases. Example 1 therefore places high performance demands on the computer that executes it (the index generation device 100). The present inventor accordingly considered this problem as follows and further developed the first lemma.
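The product-of-prime-powers mapping of step S23, and the growth problem just described, can be sketched as follows. The function name, the choice of the first eight primes, and the element value 255 are illustrative assumptions.

```python
from math import prod

def prime_power_index(v, primes):
    """Map natural-number vector v to the product of primes[i] ** v[i]."""
    if len(v) != len(primes):
        raise ValueError("need exactly one prime per dimension")
    return prod(p ** x for p, x in zip(primes, v))

# The two-dimensional example from the text: point (5, 2) with primes 2 and 3.
u = prime_power_index((5, 2), (2, 3))  # 2**5 * 3**2 = 288

# Bit length of u when every element is 255, for d = 2, 4 and 8 dimensions.
primes = [2, 3, 5, 7, 11, 13, 17, 19]
bits = [prime_power_index([255] * d, primes[:d]).bit_length() for d in (2, 4, 8)]
```

Python integers are arbitrary precision, so the product never overflows, but the number of bits needed to store and compare each index value keeps growing with d.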

素因数分解の逆演算の上記例において、両辺に対して、素数２及び３の積を底とする対数を取る。この演算は、次のように表わされる。

In the above example of the inverse operation of prime factorization, the logarithm with the product of prime numbers 2 and 3 as the base is taken for both sides. This operation is expressed as follows.
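The equation is not reproduced in this text. Using the example 288 = 2^5 · 3^2 and the operation just described, (Formula 3) can plausibly be reconstructed as:

```latex
\log_{2 \cdot 3} 288
  = \log_{2 \cdot 3}\left(2^{5} \cdot 3^{2}\right)
  = 5 \log_{2 \cdot 3} 2 + 2 \log_{2 \cdot 3} 3
  \qquad \text{(Formula 3)}
```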

ここで、次のようなα_１及びα_２を定義し、α_１及びα_２を用いると、上記（式３）は、次の（式４）のように表わされる。以下の（式４）の右辺は実数である。また、α_１及びα_２は、定数であり、かつ、選択される素数のみに依存しているため、事前に計算しておくことが可能である。

Here, defining α_1 and α_2 as follows, (Formula 3) above can be rewritten as the following (Formula 4). The right-hand side of (Formula 4) is a real number. Because α_1 and α_2 are constants that depend only on the selected primes, they can be calculated in advance.
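The definitions and the equation are missing from this text. Consistent with the logarithm to the base 2 · 3 taken above, they can plausibly be reconstructed as:

```latex
\alpha_1 = \log_{2 \cdot 3} 2, \qquad \alpha_2 = \log_{2 \cdot 3} 3

\log_{2 \cdot 3} 288 = 5\,\alpha_1 + 2\,\alpha_2 \qquad \text{(Formula 4)}
```

With these definitions, α_1 + α_2 = log_{2·3}(2 · 3) = 1.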

このように、上記例で示される、素数２及び３を用いて、２次元空間上の点（５，２）を一次元の値（２８８）に変換することは、上記（式４）に示される変換に置き換えられる。（式４）は、定数α_１及びα_２を係数とする線形変換を示し、定数α_１及びα_２は、素数の積を底とする対数計算で求められる、線形変換の係数である。そして、この線形変換に潜在する本質は、変換係数α_１及びα_２がお互いに可約ではないことにある。 Thus, the conversion shown in the above example, in which the point (5, 2) in the two-dimensional space is converted into the one-dimensional value 288 using the primes 2 and 3, can be replaced by the conversion shown in (Formula 4). (Formula 4) represents a linear transformation with the constants α_1 and α_2 as coefficients; these coefficients are obtained by taking logarithms whose base is the product of the primes. The essential property underlying this linear transformation is that the conversion coefficients α_1 and α_2 are not mutually reducible.

このような考察から、本発明者は、「相互に可約できない変換係数を取る線形変換を利用することにより、任意のｄ次元空間上のベクトルデータを１次元空間上の単一の正の実数へ一意的に変換することができる」こと（以降、第２補題と表記する）を見出した。以下、この第２補題を用いた具体例を実施例２として説明する。 From these considerations, the present inventor found that "by using a linear transformation whose conversion coefficients are not mutually reducible, vector data in an arbitrary d-dimensional space can be uniquely converted into a single positive real number in a one-dimensional space" (hereinafter referred to as the second lemma). A specific example using the second lemma is described below as Example 2.

実施例２におけるインデックス生成装置１００では、変換部１０３は、高次元データを形成する各次元の要素データと、係数取得部１０２で取得される各変換係数との積の和を算出する。言い換えれば、変換部１０３は、各変換係数を用いて、高次元データを線形変換する。例えば、係数取得部１０２は、上記（式４）で定義される定数α１及びα２のような変換係数を取得する。具体的には、係数取得部１０２は、高次元データの次元数分の素数を選択し、この選択された素数の積を底とする各素数の対数を当該変換係数として取得する。 In the index generation device 100 according to Example 2, the conversion unit 103 calculates the sum of the products of the element data of each dimension of the high-dimensional data and the corresponding conversion coefficient acquired by the coefficient acquisition unit 102. In other words, the conversion unit 103 linearly transforms the high-dimensional data using the conversion coefficients. For example, the coefficient acquisition unit 102 acquires conversion coefficients such as the constants α_1 and α_2 defined for (Formula 4) above. Specifically, the coefficient acquisition unit 102 selects as many primes as the high-dimensional data has dimensions and acquires, as the conversion coefficients, the logarithm of each prime taken to the base of the product of the selected primes.

以下、実施例２におけるインデックス生成装置１００の上記処理をより詳細に図２を用いて説明する。 The above processing of the index generation device 100 in Example 2 is described in more detail below with reference to FIG. 2.

インデックス生成装置１００は、インデックス対象となるｄ次元データｖを取得する（Ｓ２１）。ｄは２以上の整数である。データｖは、ｄ次元空間上の任意の点であり、次のように表記される。実施例２では、各次元の要素データ（ｘ_１，ｘ_２，・・・，ｘ_ｄ）は自然数でなくてもよい。 The index generation device 100 acquires d-dimensional data v to be indexed (S21), where d is an integer of 2 or more. The data v is an arbitrary point in the d-dimensional space and is written as follows. In Example 2, the element data of each dimension (x_1, x_2, ..., x_d) need not be natural numbers.
ｖ（ｘ_１，ｘ_２，・・・，ｘ_ｄ） v(x_1, x_2, ..., x_d)

インデックス生成装置１００は、ｄ個の素数を適当に選択する。インデックス生成装置１００は、素数表を予め保持していてもよい。選択された素数は、ｐ_１，ｐ_２，・・・，ｐ_ｄと表記される。更に、インデックス生成装置１００は、その選択された素数の積を底とする各素数の対数を当該変換係数として取得する（Ｓ２２）。この変換係数の算出は、以下の（式５）で表すことができる。 The index generation device 100 selects d prime numbers as appropriate. The index generation device 100 may hold a table of primes in advance. The selected primes are denoted p_1, p_2, ..., p_d. The index generation device 100 then acquires, as the conversion coefficients, the logarithm of each prime taken to the base of the product of the selected primes (S22). The calculation of the conversion coefficients can be expressed by the following (Formula 5).

上記（式５）において、α_ｉは、取得される複数の変換係数を示す。ｄ及びｐは上述のとおりである。

In (Formula 5) above, α_i denotes the plurality of acquired conversion coefficients; d and p are as described above.
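The equation is not reproduced in this text. From step S22 (the logarithm of each prime, taken to the base of the product of all selected primes), (Formula 5) can plausibly be reconstructed as:

```latex
\alpha_i = \log_{\prod_{j=1}^{d} p_j} p_i
         = \frac{\ln p_i}{\sum_{j=1}^{d} \ln p_j},
\qquad i = 1, \dots, d \qquad \text{(Formula 5)}
```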

インデックス生成装置１００は、高次元データを形成する各次元の要素データと、変換係数α_ｉとの積の和を算出する（Ｓ２３）。この処理は、次の（式６）で表わされる。

The index generation device 100 calculates the sum of the products of the element data of each dimension of the high-dimensional data and the corresponding conversion coefficient α_i (S23). This process is expressed by the following (Formula 6).
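The equation is not reproduced in this text. Based on the description that follows it (a conversion function g from d-dimensional data to a single positive real number), (Formula 6) can plausibly be reconstructed as:

```latex
g : \mathbb{N}^{d} \rightarrow \mathbb{R} \qquad \text{(Formula 6)}
```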

上記（式６）は、ｄ次元空間上のデータＮ^ｄを変換関数ｇを用いて、１次元の正の実数Ｒへ変換することを示す。ここで、変換関数ｇは、以下の（式７）で表すことができる。（式７）において、ｄ及びｖ並びにαは上述のとおりである。ｕは、一次元への変換後（マッピング後）の値を示す。

(Formula 6) above indicates that data in the d-dimensional space N^d is converted into a one-dimensional positive real number in R using the conversion function g. The conversion function g can be expressed by the following (Formula 7), in which d, v, and α are as described above and u is the value after conversion (mapping) into one dimension.
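The equation is missing here as well. From step S23 (the sum of the products of each element and its conversion coefficient), (Formula 7) can plausibly be reconstructed as:

```latex
u = g(v) = \sum_{i=1}^{d} \alpha_i \, x_i \qquad \text{(Formula 7)}
```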

第２補題で示されるように、高次元データの一次元空間への唯一のマッピングは、相互に可約できない変換係数を用いた線形変換で実現することができる。従って、変換係数自体は、上記内容、即ち、素数の積を底とする各素数の対数に制限されない。例えば、インデックス生成装置１００（係数取得部１０２）は、選択された各素数の平方根をそれぞれ変換係数として取得することもできる。 As the second lemma states, a unique mapping of high-dimensional data into a one-dimensional space can be realized by a linear transformation whose conversion coefficients are not mutually reducible. The conversion coefficients themselves are therefore not limited to those described above, that is, to the logarithms of the primes taken to the base of their product. For example, the index generation device 100 (coefficient acquisition unit 102) may instead acquire the square root of each selected prime as a conversion coefficient.
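The linear mapping with either choice of coefficients can be sketched as follows. The function names are illustrative assumptions, and the sketch uses ordinary floating-point arithmetic, so two distinct vectors are only distinguishable up to floating-point precision.

```python
import math

def log_coefficients(primes):
    """alpha_i = log of p_i taken to the base of the product of all primes."""
    log_of_product = sum(math.log(p) for p in primes)  # ln(p_1 * ... * p_d)
    return [math.log(p) / log_of_product for p in primes]

def sqrt_coefficients(primes):
    """Alternative coefficients: the square root of each selected prime."""
    return [math.sqrt(p) for p in primes]

def linear_index(v, coeffs):
    """u = sum of coeffs[i] * v[i] over all dimensions."""
    if len(v) != len(coeffs):
        raise ValueError("need exactly one coefficient per dimension")
    return sum(a * x for a, x in zip(coeffs, v))

alphas = log_coefficients([2, 3])
u_log = linear_index((5, 2), alphas)  # equals log base 6 of 288, up to rounding
u_sqrt = linear_index((5, 2), sqrt_coefficients([2, 3]))
```

Unlike the prime-power product, the value grows only linearly with the element values, which is what relieves the performance problem described for the earlier method.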

このように、実施例１及び２によれば、素因数分解の逆演算の一意性、又は、相互に可約でない係数を用いた線形変換の一意性を用いることにより、任意のｄ次元空間上のベクトルデータを１次元空間の単一の正の実数（自然数を含む）へ一意的に変換することができる。そして、得られた一次元データがインデックスデータとして用いられ、このインデックスデータにおける数値が異なる唯一性を持つ実数であるため、このインデックスデータを用いて元のデータ間の距離を示す指標値を計算することができる。 As described above, according to Examples 1 and 2, vector data in an arbitrary d-dimensional space can be uniquely converted into a single positive real number (including a natural number) in a one-dimensional space by using either the uniqueness of the inverse operation of prime factorization or the uniqueness of a linear transformation whose coefficients are not mutually reducible. The one-dimensional data thus obtained is used as index data, and because each value in the index data is a distinct, unique real number, an index value indicating the distance between the original data can be calculated from the index data.

図３は、第１実施形態における検索装置２００の処理構成例を概念的に示す図である。第１実施形態における検索装置２００は、上述のインデックス生成装置１００で生成されたインデックスを用いる。検索装置２００は、必要に応じて、そのインデックスに含まれるインデックスデータ（一次元データ）をインデックス生成装置１００から取得してもよいし、インデックス全体を予めインデックス生成装置１００から取得し保持していてもよい。 FIG. 3 is a diagram conceptually illustrating a processing configuration example of the search device 200 in the first embodiment. The search device 200 in the first embodiment uses the index generated by the index generation device 100 described above. The search device 200 may acquire the index data (one-dimensional data) included in the index from the index generation device 100 as needed, or may acquire the entire index from the index generation device 100 in advance and hold it.

検索装置２００は、上記高次元データと同じ次元数の検索対象データを取得するクエリ取得部２０１と、上記係数取得部１０２で取得される複数の変換係数と同じ複数の変換係数を用いて、上記変換部１０３と同じ手法で、その検索対象データを一次元空間へ唯一にマッピングする検索対象変換部２０２と、上記高次元データとその検索対象データとの間の類似度を評価する際に、検索対象変換部２０２により得られる検索対象一次元データと上記インデックスに含まれる一次元データとの間の距離を算出する距離算出部２０３と、を有する。 The search device 200 includes: a query acquisition unit 201 that acquires search target data having the same number of dimensions as the high-dimensional data; a search target conversion unit 202 that uniquely maps the search target data into the one-dimensional space by the same method as the conversion unit 103, using the same plurality of conversion coefficients as those acquired by the coefficient acquisition unit 102; and a distance calculation unit 203 that, when the similarity between the high-dimensional data and the search target data is evaluated, calculates the distance between the search target one-dimensional data obtained by the search target conversion unit 202 and the one-dimensional data included in the index.

検索装置２００は、例えば、後述する詳細実施形態（第２実施形態以降）における高次元データ検索装置１と同様のハードウェア構成を有する。その高次元データ検索装置１と同様に、プログラムが処理されることで、上述の各処理部が実現される。検索装置２００のハードウェア構成は制限されない。 The search device 200 has, for example, the same hardware configuration as the high-dimensional data search device 1 in a detailed embodiment (second embodiment and later) described later. Similar to the high-dimensional data search device 1, each processing unit described above is realized by processing a program. The hardware configuration of the search device 200 is not limited.

以下、第１実施形態における検索方法について図４を用いて説明する。図４は、第１実施形態における検索装置２００の動作例を示すフローチャートである。以下の説明では、検索装置２００が当該検索方法の実行主体となるが、検索装置２００に含まれる上述の各処理部が実行主体となってもよい。 Hereinafter, the search method in the first embodiment will be described with reference to FIG. FIG. 4 is a flowchart illustrating an operation example of the search device 200 according to the first embodiment. In the following description, the search device 200 is an execution subject of the search method, but each of the above-described processing units included in the search device 200 may be the execution subject.

第１実施形態における検索方法は、検索装置２００のような、少なくとも１つのコンピュータにより実行される方法であって、かつ、上述のインデックス生成方法により生成されるインデックスを用いる方法である。第１実施形態における検索方法は、上記高次元データと同じ次元数の検索対象データを取得し（Ｓ４１）、上述のインデックス生成方法で取得されたものと同じ複数の変換係数を用いて、上述のインデックス生成方法に含まれる上記マッピングと同じ手法で、検索対象データを一次元空間へ唯一にマッピングし（Ｓ４２）、上記高次元データとその検索対象データとの間の類似度を評価する際に、その検索対象データの（Ｓ４２）のマッピングにより得られる検索対象一次元データと上記インデックスにインデックスデータとして含まれる一次元データとの間の距離を算出する（Ｓ４３）、ことを含む。 The search method in the first embodiment is a method that is executed by at least one computer such as the search device 200 and that uses an index generated by the above-described index generation method. The search method in the first embodiment acquires search target data having the same number of dimensions as the high-dimensional data (S41), and uses the same plurality of transform coefficients acquired by the index generation method described above, and In the same method as the mapping included in the index generation method, the search target data is uniquely mapped to a one-dimensional space (S42), and when evaluating the similarity between the high-dimensional data and the search target data, Calculating the distance between the search target one-dimensional data obtained by mapping (S42) of the search target data and the one-dimensional data included in the index as index data (S43).
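Steps S41 to S43 can be sketched as follows, assuming the linear mapping (sum of coefficient-element products) was used to build the index. The function names and the sample coefficients are illustrative assumptions chosen so that the arithmetic is exact.

```python
def linear_index(v, coeffs):
    """Map a vector to one dimension with the same coefficients as the index."""
    return sum(a * x for a, x in zip(coeffs, v))

def one_dim_distances(query, coeffs, index_values):
    """S42: map the query; S43: distance to each one-dimensional index value."""
    u_query = linear_index(query, coeffs)
    return [abs(u_query - u) for u in index_values]

coeffs = [0.5, 0.25]  # must be identical to those used at index time
index_values = [linear_index(p, coeffs) for p in [(1, 1), (4, 2)]]
distances = one_dim_distances((2, 2), coeffs, index_values)
```

Because both sides are scalars, each distance costs one subtraction regardless of the number of dimensions d.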

上述のインデックス生成装置１００及びインデックス生成方法によれば、上述したように、インデックスデータ間の距離が、対応する高次元データ間の距離を示す指標値となり得る。そこで、本実施形態では、当該インデックスデータの生成手法と同様に、検索対象データが一次元空間へ唯一にマッピングされ、検索対象一次元データが取得される。そして、当該インデックスデータの元となる高次元データとその検索対象データとの間の類似度を評価する際に、検索対象一次元データと、当該インデックスデータとしての一次元データとの間の距離が算出される。算出された距離は、対応する高速データ間の距離の指標値となり得るため、その距離を用いて、検索処理を行うことができる。 According to the index generation device 100 and the index generation method described above, the distance between index data can serve as an index value indicating the distance between the corresponding high-dimensional data. In this embodiment, therefore, the search target data is uniquely mapped into the one-dimensional space in the same way as the index data is generated, and search target one-dimensional data is obtained. Then, when the similarity between the high-dimensional data underlying the index data and the search target data is evaluated, the distance between the search target one-dimensional data and the one-dimensional data serving as the index data is calculated. Because the calculated distance can serve as an index value for the distance between the corresponding high-dimensional data, the search process can be performed using that distance.

このように、本実施形態によれば、インデックスデータを用いて、対応する高次元データ間の距離の指標値を算出することができるため、その指標値を用いて検索結果の正解となる高次元データの数を或る程度絞り込むことができる。従って、本実施形態によれば、実距離を計算すべき高次元データの数を減らすことができ、ひては、検索処理の更なる効率化及び高速化を実現することができる。 As described above, according to the present embodiment, an index value for the distance between corresponding high-dimensional data can be calculated from the index data, so the index value can be used to narrow down, to some extent, the high-dimensional data that may be correct answers to the search. The present embodiment can therefore reduce the number of high-dimensional data for which the actual distance must be calculated, which in turn makes the search process more efficient and faster.

以下、上述の第１実施形態について更に詳細を説明する。以下には、詳細実施形態として、第２実施形態及び第３実施形態を例示する。以下の各実施形態は、第１実施形態における、インデックス生成装置１００、検索装置２００、インデックス生成方法及び検索方法を高次元データ検索装置に適用した場合の例である。なお、上述の第１実施形態は、高次元データを扱う検索装置への適用に限定されるものではなく、類似度を算出し得る様々なデータの検索装置に適用可能である。 Hereinafter, the details of the first embodiment will be described. Below, 2nd Embodiment and 3rd Embodiment are illustrated as detailed embodiment. Each of the following embodiments is an example when the index generation device 100, the search device 200, the index generation method, and the search method in the first embodiment are applied to a high-dimensional data search device. The first embodiment described above is not limited to application to a search apparatus that handles high-dimensional data, and can be applied to various data search apparatuses that can calculate similarity.

［第２実施形態］ [Second Embodiment]
〔装置構成〕 [Device configuration]
図５は、第２実施形態における高次元データ検索装置（以降、単に検索装置と表記する）１のハードウェア構成例を概念的に示す図である。第２実施形態における検索装置１は、図５に示されるように、ハードウェア構成として、相互にバスにより接続される、ＣＰＵ（Central Processing Unit）１０、メモリ１１、入出力インタフェース（Ｉ／Ｆ）１２、通信装置１３等を有する。 FIG. 5 is a diagram conceptually illustrating a hardware configuration example of a high-dimensional data search device (hereinafter simply referred to as a search device) 1 in the second embodiment. As shown in FIG. 5, the search device 1 in the second embodiment has, as its hardware configuration, a CPU (Central Processing Unit) 10, a memory 11, an input/output interface (I/F) 12, a communication device 13, and the like, which are connected to one another by a bus.

メモリ１１は、ＲＡＭ（Random Access Memory）、ＲＯＭ（Read Only Memory）、ハードディスク等である。入出力Ｉ／Ｆ１２は、キーボード、マウス等のようなユーザ操作の入力を受け付ける入力装置（図示せず）、表示装置やプリンタ等のようなユーザに情報を提供する出力装置（図示せず）、可搬型記録媒体などとデータをやりとりする装置などと接続可能である。通信装置１３は、他のノードと通信を行う。検索装置１は、入力装置や出力装置を持たなくてもよく、検索装置１のハードウェア構成は制限されない。 The memory 11 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, or the like. The input / output I / F 12 includes an input device (not shown) that receives input of a user operation such as a keyboard and a mouse, an output device (not shown) that provides information to the user such as a display device and a printer, It can be connected to a device that exchanges data with a portable recording medium or the like. The communication device 13 communicates with other nodes. The search device 1 may not have an input device or an output device, and the hardware configuration of the search device 1 is not limited.

〔処理構成〕 [Processing configuration]
図６は、第２実施形態における検索装置１の処理構成例を概念的に示す図である。第２実施形態における検索装置１は、図６に示されるように、データ取得部２０、インデックス生成部２１、データベース（ＤＢ）２７、クエリ取得部３０、検索部３１等を有する。これら各処理部は、例えば、ＣＰＵ１０によりメモリ１１に格納されるプログラムが実行されることにより実現される。また、当該プログラムは、例えば、ＣＤ（Compact Disc）、メモリカード等のような可搬型記録媒体から入出力Ｉ／Ｆ１２を介して、又は、ネットワーク上の他のコンピュータから通信装置１３を介してインストールされ、メモリ１１に格納されてもよい。 FIG. 6 is a diagram conceptually illustrating a processing configuration example of the search device 1 in the second embodiment. As illustrated in FIG. 6, the search device 1 in the second embodiment includes a data acquisition unit 20, an index generation unit 21, a database (DB) 27, a query acquisition unit 30, a search unit 31, and the like. Each of these processing units is realized, for example, by the CPU 10 executing a program stored in the memory 11. The program may be installed from a portable recording medium such as a CD (Compact Disc) or a memory card via the input/output I/F 12, or from another computer on the network via the communication device 13, and then stored in the memory 11.

また、上述の処理部は、複数台のコンピュータにより実現されてもよい。例えば、インデックス生成のためのデータ取得部２０及びインデックス生成部２１は、１つのコンピュータで実現され、クエリ取得部３０及び検索部３１は、他のコンピュータで実現されてもよい。また、ＤＢ２７は、更に異なる他のコンピュータで実現されてもよい。 Further, the processing unit described above may be realized by a plurality of computers. For example, the data acquisition unit 20 and the index generation unit 21 for index generation may be realized by one computer, and the query acquisition unit 30 and the search unit 31 may be realized by another computer. The DB 27 may be realized by another different computer.

データ取得部２０は、上述のデータ取得部１０１に相当する。データ取得部２０は、映像等のような高次元の特徴量データを高次元データとして取得する。特徴量データは、入力画面等を入力装置を用いてユーザが操作することにより入力された情報であってもよいし、可搬型記録媒体、他のコンピュータ等から入出力Ｉ／Ｆ１２又は通信装置１３を経由して取得された情報であってもよい。データ取得部２０により取得される高次元データの内容自体は制限されない。 The data acquisition unit 20 corresponds to the data acquisition unit 101 described above. The data acquisition unit 20 acquires high-dimensional feature amount data, such as that of video, as high-dimensional data. The feature amount data may be information entered by a user operating an input screen or the like with an input device, or information acquired from a portable recording medium, another computer, or the like via the input/output I/F 12 or the communication device 13. The content of the high-dimensional data acquired by the data acquisition unit 20 is not limited.

インデックス生成部２１は、データ取得部２０により取得される特徴量データ（高次元データ）に対しインデックスを付与し、その特徴量データ及びインデックス情報をＤＢ２７に格納する。インデックス生成部２１は、係数取得部２３、変換部２４、並び替え処理部２５等を含む。図６では、説明の便宜のために、係数取得部２３、変換部２４及び並び替え処理部２５が、インデックス生成部２１に内包されるように図示される。これら各処理部の関係を図６に示される関係に限定されない。 The index generation unit 21 assigns an index to the feature amount data (high-dimensional data) acquired by the data acquisition unit 20, and stores the feature amount data and index information in the DB 27. The index generation unit 21 includes a coefficient acquisition unit 23, a conversion unit 24, a rearrangement processing unit 25, and the like. In FIG. 6, for convenience of explanation, the coefficient acquisition unit 23, the conversion unit 24, and the rearrangement processing unit 25 are illustrated as being included in the index generation unit 21. The relationship between these processing units is not limited to the relationship shown in FIG.

ＤＢ２７は、多数の特徴量データ及びそのインデックス情報を格納する。但し、ＤＢ２７は、検索装置１以外の他のコンピュータ上に実現されてもよい。この場合、インデックス生成部２１は、他のコンピュータと通信を行うことにより、ＤＢ２７にアクセスする。 The DB 27 stores a large number of feature amount data and index information thereof. However, the DB 27 may be realized on a computer other than the search device 1. In this case, the index generation unit 21 accesses the DB 27 by communicating with another computer.

係数取得部２３及び変換部２４は、上述の係数取得部１０２及び変換部１０３に相当する。係数取得部２３及び変換部２４は、上記補題１に基づく処理を実行してもよいし、上記補題２に基づく処理を実行してもよい。並び替え処理部２５は、上述のインデックス生成部１０４に相当する。並び替え処理部２５は、変換部２４により特徴量データから変換される一次元データを昇順又は降順に並び替えてインデックスデータとし、Ｂ＋木を用いてインデックスを生成する。 The coefficient acquisition unit 23 and the conversion unit 24 correspond to the coefficient acquisition unit 102 and the conversion unit 103 described above. The coefficient acquisition unit 23 and the conversion unit 24 may execute processing based on the first lemma or processing based on the second lemma. The rearrangement processing unit 25 corresponds to the index generation unit 104 described above. The rearrangement processing unit 25 sorts the one-dimensional data converted from the feature amount data by the conversion unit 24 into ascending or descending order to form the index data, and generates the index using a B+ tree.
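The ordering step can be sketched as follows. The Python standard library has no B+ tree, so this sketch swaps in a plain sorted list maintained with `bisect` as a stand-in for the ordered leaf level of such a tree; the class name and the record identifiers are illustrative assumptions.

```python
import bisect

class SortedIndex:
    """Ascending list of (key, record_id) pairs; a stand-in for B+ tree leaves."""

    def __init__(self):
        self._entries = []

    def insert(self, key, record_id):
        # insort keeps the list ordered by key (ties broken by record_id).
        bisect.insort(self._entries, (key, record_id))

    def keys(self):
        return [key for key, _ in self._entries]

    def record_ids(self):
        return [record_id for _, record_id in self._entries]

index = SortedIndex()
for record_id, key in enumerate([2.5, 0.75, 1.5]):
    index.insert(key, record_id)
```

A real B+ tree adds the hierarchical internal nodes on top of this ordered sequence, which keeps insertion logarithmic instead of linear.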

クエリ取得部３０は、上述のクエリ取得部２０１に相当する。クエリ取得部３０は、検索対象の特徴量データを取得する。検索対象の特徴量データは、データ取得部２０で取得される特徴量データと同じ次元数のデータであり、以降、検索対象データと表記する。クエリ取得部３０は、検索対象データに加えて、その検索対象データからの距離条件を更に取得する。この距離条件は、検索対象データの高次元空間における距離情報であり、その距離条件には、例えば、検索対象データの高次元空間上における、検索対象データを中心とする半径ｒが用いられる。クエリ取得部３０により取得される検索対象データ及び距離条件は、類似検索の範囲問合せで指定される情報であり、類似検索の問い合わせ範囲を示す。 The query acquisition unit 30 corresponds to the query acquisition unit 201 described above. The query acquisition unit 30 acquires feature amount data to be searched. The feature quantity data to be searched is data having the same number of dimensions as the feature quantity data acquired by the data acquisition unit 20, and is hereinafter referred to as search target data. In addition to the search target data, the query acquisition unit 30 further acquires a distance condition from the search target data. This distance condition is distance information in the high-dimensional space of the search target data. For the distance condition, for example, a radius r centering on the search target data in the high-dimensional space of the search target data is used. The search target data and the distance condition acquired by the query acquisition unit 30 are information specified by the similar search range query, and indicate the query range of the similar search.

検索部３１は、ＤＢ２７に格納されるインデックスを用いて、ＤＢ２７に格納される特徴量データの中から、検索対象データ及び距離条件に基づく範囲問合せに適合する特徴量データを検索する。範囲問合せとは、検索対象データとの距離が距離条件に適合する特徴量データを、ＤＢ２７から抽出する類似検索要求を意味する。検索結果のデータは、入出力Ｉ／Ｆ１２を介して表示装置や印刷装置に出力されてもよいし、入出力Ｉ／Ｆ１２を介して可搬型記録媒体に格納されてもよいし、通信装置１３を介して他のコンピュータに送信されてもよい。 Using the index stored in the DB 27, the search unit 31 searches the feature amount data stored in the DB 27 for feature amount data that matches the range query based on the search target data and the distance condition. A range query is a similarity search request that extracts from the DB 27 the feature amount data whose distance from the search target data satisfies the distance condition. The search result data may be output to a display device or a printing device via the input/output I/F 12, stored on a portable recording medium via the input/output I/F 12, or transmitted to another computer via the communication device 13.

検索部３１は、図６に示されるように、検索対象変換部３２、距離算出部３３、範囲検索部３４等を含む。 As shown in FIG. 6, the search unit 31 includes a search target conversion unit 32, a distance calculation unit 33, a range search unit 34, and the like.

検索対象変換部３２は、上述の検索対象変換部２０２に相当する。検索対象変換部３２は、検索対象変換部２０２と同様に、クエリ取得部３０で取得された検索対象データを検索対象一次データに変換する。更に、検索対象変換部３２は、検索対象データの変換と同様の手法で、後述する範囲取得部３５により取得される、上界データ及び下界データを一次元空間へ唯一にマッピングする。これは、検索対象データ及び距離条件により示される、高次元空間上の類似検索範囲を一次元空間に写像することに相当する。この処理により、特徴量データが属する高次元空間上の上界データ及び下界データが上界一次元データ及び下界一次元データに変換される。 The search target conversion unit 32 corresponds to the search target conversion unit 202 described above. Like the search target conversion unit 202, the search target conversion unit 32 converts the search target data acquired by the query acquisition unit 30 into search target one-dimensional data. In addition, using the same method as for the search target data, the search target conversion unit 32 uniquely maps the upper bound data and the lower bound data acquired by the range acquisition unit 35 (described later) into the one-dimensional space. This corresponds to mapping the similarity search range in the high-dimensional space, indicated by the search target data and the distance condition, onto the one-dimensional space. By this processing, the upper bound data and the lower bound data in the high-dimensional space to which the feature amount data belongs are converted into upper bound one-dimensional data and lower bound one-dimensional data.

検索対象変換部３２は、係数取得部２３で取得される複数の変換係数をインデックス生成部２１から取得してもよいし、係数取得部２３と同じ手法で、係数取得部２３で取得される複数の変換係数と同じ複数の変換係数を取得してもよい。また、検索対象変換部３２は、変換部２４と同じ変換ルール（マッピングルール、変換関数）を持つ。 The search target conversion unit 32 may acquire from the index generation unit 21 the plurality of conversion coefficients acquired by the coefficient acquisition unit 23, or may itself acquire the same plurality of conversion coefficients by the same method as the coefficient acquisition unit 23. The search target conversion unit 32 also has the same conversion rule (mapping rule, conversion function) as the conversion unit 24.

距離算出部３３は、上述の距離算出部２０３に相当する。距離算出部３３は、後述の第１対象特定部３６により特定されるインデックスデータと検索対象変換部３２により得られる検索対象一次元データとの間の距離を算出する。そのインデックスデータ及び検索対象一次元データは共に一次元の値であるため、距離算出部３３は、各値の差を当該距離として算出する。 The distance calculation unit 33 corresponds to the distance calculation unit 203 described above. The distance calculation unit 33 calculates the distance between the index data specified by the first target specifying unit 36 described later and the search target one-dimensional data obtained by the search target conversion unit 32. Since the index data and the search target one-dimensional data are both one-dimensional values, the distance calculation unit 33 calculates a difference between the values as the distance.

範囲検索部３４は、ＤＢ２７に格納されるインデックスに含まれるインデックスデータを参照することにより、検索対象データ及び距離条件に基づく範囲問合せの解となる特徴量データを抽出する。 The range search unit 34 refers to the index data included in the index stored in the DB 27 to extract feature amount data that is a solution to the range query based on the search target data and the distance condition.

範囲検索部３４は、図６に示されるように、範囲取得部３５、第１対象特定部３６、候補抽出部３７、第１類似度算出部３８等を含む。範囲検索部３４は、後述の第１類似度算出部３８により算出される実距離と問合せ範囲情報との比較により、当該範囲問合せの解となる特徴量データを抽出する。但し、範囲検索部３４は、後述の候補抽出部３７により解候補として抽出されるインデックスデータに対応する特徴量データを当該範囲問合せの解として抽出することもできる。この場合には、範囲検索部３４は、第１類似度算出部３８を持たなくてもよい。更に、範囲検索部３４は、後述の第１対象特定部３６により特定されるインデックスデータに対応する特徴量データを当該範囲問合せの解として抽出することもできる。この場合には、範囲検索部３４は、候補抽出部３７及び第１類似度算出部３８を持たなくてもよい。 As illustrated in FIG. 6, the range search unit 34 includes a range acquisition unit 35, a first target specifying unit 36, a candidate extraction unit 37, a first similarity calculation unit 38, and the like. The range search unit 34 extracts feature quantity data as a solution to the range query by comparing the actual distance calculated by the first similarity calculation unit 38, which will be described later, with the query range information. However, the range search unit 34 can also extract feature data corresponding to the index data extracted as a solution candidate by the candidate extraction unit 37 described later as a solution for the range query. In this case, the range search unit 34 may not have the first similarity calculation unit 38. Furthermore, the range search unit 34 can also extract feature amount data corresponding to index data specified by a first target specifying unit 36 described later as a solution to the range query. In this case, the range search unit 34 may not have the candidate extraction unit 37 and the first similarity calculation unit 38.

範囲取得部３５は、クエリ取得部３０により取得される検索対象データ及び距離条件により示される、検索対象データの高次元空間における問合せ範囲に関する、上界データ及び下界データを取得する。距離条件が半径ｒを示す場合、上界データ及び下界データは、その高次元空間上で、検索対象データに対応する点から半径ｒ以内に含まれる特徴量データ群の中の上界及び下界を示す。よって、範囲取得部３５により取得される上界データ及び下界データは、検索対象データと同じ次元数を持つ。 The range acquisition unit 35 acquires upper bound data and lower bound data for the query range in the high-dimensional space of the search target data, as indicated by the search target data and the distance condition acquired by the query acquisition unit 30. When the distance condition indicates a radius r, the upper bound data and the lower bound data indicate the upper bound and the lower bound of the group of feature amount data lying within the radius r of the point corresponding to the search target data in the high-dimensional space. The upper bound data and the lower bound data acquired by the range acquisition unit 35 therefore have the same number of dimensions as the search target data.

第１対象特定部３６は、ＤＢ２７に格納されるインデックスの中から、検索対象変換部３２により上界データ及び下界データから得られる上界一次元データ及び下界一次元データの間の範囲内のインデックスデータを特定する。具体的には、第１対象特定部３６は、当該インデックスの中から、下界一次元データより大きく、かつ、上界一次元データよりも小さいインデックスデータを特定する。 The first target specifying unit 36 specifies, from the index stored in the DB 27, the index data that lies within the range between the upper bound one-dimensional data and the lower bound one-dimensional data that the search target conversion unit 32 obtains from the upper bound data and the lower bound data. Specifically, the first target specifying unit 36 specifies the index data in the index that is larger than the lower bound one-dimensional data and smaller than the upper bound one-dimensional data.
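Because the index data is sorted, this range identification reduces to two binary searches. The sketch below assumes the index keys are available as an ascending Python list; the function name is an illustrative assumption.

```python
import bisect

def keys_between(sorted_keys, lower, upper):
    """Return the keys k with lower < k < upper from an ascending list."""
    start = bisect.bisect_right(sorted_keys, lower)  # first key strictly > lower
    stop = bisect.bisect_left(sorted_keys, upper)    # first key >= upper
    return sorted_keys[start:stop]

hits = keys_between([1, 2, 3, 4, 5], lower=2, upper=5)
```

Both bounds are exclusive, matching the "larger than the lower bound and smaller than the upper bound" condition in the text.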

候補抽出部３７は、距離条件から得られる一次元空間上の一次元距離条件と、距離算出部３３により算出される距離との比較により、第１対象特定部３６により特定されたインデックスデータをフィルタリングし、このフィルタリングで得られるインデックスデータを解候補として抽出する。クエリ取得部３０により取得される距離条件は、上述のとおり、特徴量データの高次元空間における距離を示し、一次元距離条件は、その高次元空間の距離に対応する一次元空間上の距離を示す。この一次元距離条件は、例えば、ヘルダーの不等式を用いて算出される。よって、候補抽出部３７は、第１対象特定部３６により特定される、上界一次元データ及び下界一次元データの間の範囲内のインデックスデータの中から、各インデックスデータと検索対象変換部３２により得られる検索対象一次元データとの間の距離がその一次元距離条件に合致しないインデックスデータを除外し、残ったインデックスデータを解候補とする。 The candidate extraction unit 37 filters the index data specified by the first target specifying unit 36 by comparing the distance calculated by the distance calculation unit 33 with a one-dimensional distance condition in the one-dimensional space derived from the distance condition, and extracts the index data that passes this filtering as solution candidates. As described above, the distance condition acquired by the query acquisition unit 30 indicates a distance in the high-dimensional space of the feature amount data, and the one-dimensional distance condition indicates the distance in the one-dimensional space that corresponds to that high-dimensional distance. The one-dimensional distance condition is calculated using, for example, Hölder's inequality. The candidate extraction unit 37 thus takes the index data specified by the first target specifying unit 36 as lying between the upper bound one-dimensional data and the lower bound one-dimensional data, excludes the index data whose distance from the search target one-dimensional data obtained by the search target conversion unit 32 does not satisfy the one-dimensional distance condition, and treats the remaining index data as solution candidates.
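The text names Hölder's inequality but does not spell out the resulting condition here. The following sketch shows one way it could be applied, assuming the linear mapping u = Σ α_i x_i and Euclidean distance in the high-dimensional space: with exponents p = q = 2, the inequality gives |u_x − u_q| ≤ ‖α‖₂ · ‖x − q‖₂, so any point within radius r of the query must have a one-dimensional distance of at most ‖α‖₂ · r. The function names are illustrative assumptions.

```python
import math

def one_dim_radius(coeffs, r):
    """Upper bound on |u_x - u_q| whenever the Euclidean distance is <= r."""
    return math.sqrt(sum(a * a for a in coeffs)) * r

def filter_candidates(index_values, u_query, coeffs, r):
    """Drop index values that cannot belong to any point within radius r."""
    limit = one_dim_radius(coeffs, r)
    return [u for u in index_values if abs(u - u_query) <= limit]

# With coefficients (3, 4) the norm is 5, so radius 1 maps to a limit of 5.
candidates = filter_candidates([6, 10, 15, 16], u_query=10, coeffs=[3, 4], r=1)
```

The filter never discards a true answer under these assumptions, but it can keep false positives, which is why the surviving candidates still need an actual-distance check.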

The first similarity calculation unit 38 calculates the actual distance between the search target data and the feature data corresponding to each solution candidate index data extracted by the candidate extraction unit 37. The calculated actual distance is a distance in the high-dimensional space to which the feature data and the search target data belong.

[Operation example]
Hereinafter, the index generation method and the search method of the second embodiment are described based on the operation of the search device 1 of the second embodiment. In the following description, the search device 1 executes each method, but each of the above-described processing units included in the search device 1 may instead be the executing entity. The executing entity may also be a plurality of devices (computers).

First, the index generation method of the second embodiment is described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an operation example of the search device 1 of the second embodiment relating to index generation. Note that FIG. 7 illustrates the index generation method based on the first lemma.

The search device 1 acquires the number of dimensions d of the feature data to be indexed (S71). The number of dimensions d may be entered by a user operating an input screen or the like with an input device, or may be acquired from a portable recording medium, another computer, or the like.

The search device 1 selects d prime numbers (S72). The search device 1 may select the prime numbers from a table of primes. In this case, the search device 1 may hold in advance a table containing a sufficient number of primes, or may acquire such a table from another computer or the like.

Subsequently, the search device 1 determines the conversion coefficients based on the d prime numbers selected in (S72) (S73). In the example of FIG. 7, the selected d prime numbers are used as the d conversion coefficients as they are. In the case of the index generation method based on the second lemma, the search device 1 calculates, as the d conversion coefficients, the logarithm of each of the d prime numbers to a base equal to the product of the selected d prime numbers. As another method, the search device 1 may calculate the square root of each of the selected d prime numbers as the d conversion coefficients. The specific determination method is not limited as long as the plurality of conversion coefficients are determined such that every pair of conversion coefficients is mutually irreducible.
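The three ways of determining the conversion coefficients described for (S72) and (S73) can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names `first_primes` and `coefficients`, the `method` parameter, and the use of trial division in place of a table of primes are all assumptions introduced here.

```python
import math

def first_primes(d):
    """Return the first d primes by trial division (stand-in for a prime table)."""
    primes = []
    n = 2
    while len(primes) < d:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes

def coefficients(d, method="prime"):
    """Determine d conversion coefficients from d selected primes (S72, S73)."""
    primes = first_primes(d)
    if method == "prime":   # first lemma: the primes are used as they are
        return primes
    if method == "log":     # second lemma: log of each prime, base = product of the primes
        base = math.prod(primes)
        return [math.log(p, base) for p in primes]
    if method == "sqrt":    # alternative mentioned in the text: square roots of the primes
        return [math.sqrt(p) for p in primes]
    raise ValueError(f"unknown method: {method}")
```

With the logarithmic variant, the coefficients sum to 1, because the logarithm of the product of the primes to that same base is 1.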

The search device 1 acquires feature data to be indexed (S74). The feature data may be entered by a user operating an input screen or the like with an input device, or may be acquired from a portable recording medium, another computer, or the like.

The search device 1 normalizes the feature data acquired in (S74) (S75). Specifically, the search device 1 normalizes the element data of each dimension of the feature data into a natural number. Normalization here means converting the element data of each dimension into a natural number in such a way that the original data can be restored. Accordingly, converting a decimal into a natural number simply by discarding everything after the decimal point does not qualify as this normalization.
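The source does not fix a particular normalization. As one hypothetical example of a restorable conversion, the sketch below scales each element by a power of ten; it assumes non-negative elements with at most `digits` fractional digits and treats 0 as a natural number. The names `normalize`, `denormalize`, and `digits` are introduced here for illustration only.

```python
def normalize(vector, digits):
    """Scale each element by 10**digits so it becomes an integer (restorable)."""
    scale = 10 ** digits
    return [round(x * scale) for x in vector]

def denormalize(naturals, digits):
    """Undo normalize(): divide by the same power of ten."""
    scale = 10 ** digits
    return [n / scale for n in naturals]

original = [0.25, 1.5, 3.0]
normalized = normalize(original, 2)        # [25, 150, 300]
restored = denormalize(normalized, 2)      # equal to the original vector
truncated = [int(x) for x in original]     # [0, 1, 3]: information is lost
```

Truncation maps both 0.25 and 0.75 to 0, so the original value cannot be recovered, which is why it does not qualify as normalization in the sense used above.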

Using the conversion coefficients determined in (S73), the search device 1 uniquely converts the feature data normalized in (S75) into one-dimensional data (S76). In the example of FIG. 7, as shown in Example 1, the search device 1 raises each conversion coefficient determined in (S73) to the power given by the element data of the corresponding dimension of the normalized feature data, and calculates the product of the resulting values.
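The conversion of (S76) in the example of FIG. 7 can be sketched as follows, assuming distinct primes as conversion coefficients and natural-number elements. The function names are hypothetical. The inverse function is included only to illustrate why the mapping is unique: by unique prime factorization, the exponents can be read back from the product.

```python
def to_one_dimension(vector, primes):
    """Product of primes[i] ** vector[i] over all dimensions (S76, first lemma)."""
    value = 1
    for p, e in zip(primes, vector):
        value *= p ** e
    return value

def from_one_dimension(value, primes):
    """Recover the exponents by repeated division, showing the mapping is one-to-one."""
    vector = []
    for p in primes:
        e = 0
        while value % p == 0:
            value //= p
            e += 1
        vector.append(e)
    return vector

key = to_one_dimension([3, 1, 2], [2, 3, 5])   # 2**3 * 3**1 * 5**2 = 600
```

Python integers have arbitrary precision, so the product does not overflow, although it grows quickly with the number of dimensions and the size of the elements.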

The search device 1 sorts the one-dimensional data obtained in (S76) in ascending or descending order (S77).

The search device 1 determines whether there is other feature data to be indexed (S78). If there is other feature data (S78; YES), the search device 1 acquires that feature data (S74) and executes (S75) and the subsequent steps on it.

If there is no other feature data (S78; NO), the search device 1 generates a hierarchical index that contains the one-dimensional data sorted in (S77) as the index data of the respective feature data, and stores the index and the feature data in the DB 27 (S79). However, the search device 1 may instead store the index and the feature data on a portable recording medium or transmit them to another computer.
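The loop of (S74) to (S79) can be sketched as follows. This is a simplified illustration under stated assumptions: a sorted Python list maintained with `bisect.insort` stands in for the hierarchical index (a tree structure such as a B+-tree would be a natural choice, but the sketch does not implement one), the feature data is assumed to be already normalized, and `build_index` is a hypothetical name.

```python
import bisect

def build_index(features, primes):
    """Return a list of (one_dimensional_value, feature_id) kept in ascending order."""
    index = []
    for feature_id, vector in enumerate(features):    # S74, S78: iterate over feature data
        key = 1
        for p, e in zip(primes, vector):              # S76: prime-power product
            key *= p ** e
        bisect.insort(index, (key, feature_id))       # S77: keep the keys sorted
    return index                                      # S79: the index to be stored

index = build_index([[2, 1], [0, 1], [1, 0]], [2, 3])
```

Each entry keeps the identifier of the feature data next to its one-dimensional value, so the original feature data can be looked up from an index entry during a search.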

The index generation method of the second embodiment is not limited to the example of FIG. 7. When the number of dimensions of the high-dimensional data to be processed is known in advance, the conversion coefficients determined in (S73) may be held in the search device 1 in advance, in which case (S71), (S72), and (S73) need not be included in the index generation method. In addition, although the example of FIG. 7 uses the first lemma, the index generation method of the second embodiment may be based on the second lemma. In that case, the search device 1 does not execute (S75), and in (S76) it converts the feature data into one-dimensional data by substituting the element data of each dimension of the feature data into a linear conversion function that contains the conversion coefficients determined in (S73).
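A minimal sketch of the variant based on the second lemma follows. It assumes that the linear conversion function is the sum of each element multiplied by its conversion coefficient, which matches the sum of products recited in Supplementary note 2; the function names are hypothetical.

```python
import math

def log_coefficients(primes):
    """Logarithm of each prime to a base equal to the product of all the primes."""
    base = math.prod(primes)
    return [math.log(p, base) for p in primes]

def linear_map(vector, coeffs):
    """Linear conversion function: sum of element * coefficient over all dimensions."""
    return sum(c * x for c, x in zip(coeffs, vector))

coeffs = log_coefficients([2, 3, 5])
key = linear_map([3, 1, 2], coeffs)     # no normalization step is required
```

For natural-number elements, this value equals the logarithm (to the same base) of the prime-power product used in the example of FIG. 7, so both variants order such data identically, while the linear form also accepts non-integer elements directly.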

Next, the search method of the second embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart illustrating an operation example of the search device 1 of the second embodiment relating to a search method for a range query. In the following description, the search target data is denoted as query data q.

The search device 1 acquires the query data q and a distance condition as query parameters (S81). The query data q is the search target data and is high-dimensional feature data. In the example of FIG. 8, a distance radius r is specified as the distance condition.

From the query data q and the distance radius r, the search device 1 obtains the upper bound and lower bound data points, in the original high-dimensional space, of the region that can contain solutions to the range query (S82). The upper bound data point is referred to as upper bound data, and the lower bound data point is referred to as lower bound data.

The search device 1 uniquely converts the upper bound data and the lower bound data obtained in (S82) into one-dimensional data by the same method as (S76) of FIG. 7 (S83). At this time, the search device 1 uses the same conversion coefficients as those determined in (S73) of FIG. 7. As a result, the upper bound data and the lower bound data are converted into upper bound one-dimensional data and lower bound one-dimensional data under the same conversion rule as the one used to convert the feature data to be indexed into index data.

From the index stored in the DB 27, the search device 1 specifies the index data within the range between the upper bound one-dimensional data and the lower bound one-dimensional data obtained in (S83) (S84). If no index data exists within that range, the search device 1 determines that there is no solution.
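Steps (S82) to (S84) can be sketched as follows. The source does not state how the upper bound data and lower bound data are computed; this sketch assumes they are the corners q + r and q - r of the axis-aligned box enclosing the query sphere, and it assumes a linear conversion function with positive coefficients, so that the converted value increases with every element. The function names are hypothetical.

```python
import bisect

def linear_map(vector, coeffs):
    """Linear conversion function (assumed form): sum of element * coefficient."""
    return sum(c * x for c, x in zip(coeffs, vector))

def range_candidates(keys, q, r, coeffs):
    """keys: index data in ascending order. Returns the keys between the 1-D bounds."""
    upper = [x + r for x in q]              # S82: upper bound data (assumed corner q + r)
    lower = [x - r for x in q]              # S82: lower bound data (assumed corner q - r)
    up_key = linear_map(upper, coeffs)      # S83: same conversion rule as for index data
    lo_key = linear_map(lower, coeffs)
    start = bisect.bisect_right(keys, lo_key)   # S84: strictly larger than the lower bound
    stop = bisect.bisect_left(keys, up_key)     #      and strictly smaller than the upper bound
    return keys[start:stop]                 # an empty result means "no solution"
```

For example, with the placeholder coefficients `[0.5, 0.25]`, the query `[2.0, 4.0]`, and radius `1.0`, the one-dimensional bounds are 1.25 and 2.75, so only keys strictly between those two values remain. Because the index data is sorted, both boundaries are located by binary search rather than by scanning the whole index.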

The search device 1 uniquely converts the query data q into one-dimensional data by the same method as (S83) (S85). As a result, the query data q is converted into query one-dimensional data under the same conversion rule as the one used to convert the feature data to be indexed into index data.

The search device 1 calculates the distance between the query one-dimensional data obtained in (S85) and each index data specified in (S84) (S86). Because the index data and the query one-dimensional data are both one-dimensional values, the search device 1 calculates the difference between the two values as this distance.

Using the distances calculated in (S86), the search device 1 filters the index data specified in (S84) and extracts the remaining index data as solution candidates (S87). Specifically, the search device 1 performs the filtering as follows. Using Hölder's inequality or the like, the search device 1 calculates the one-dimensional distance condition, in the one-dimensional space, that corresponds to the radius r obtained in (S81), and excludes from the solution candidates any index data whose distance calculated in (S86) does not satisfy that one-dimensional distance condition.
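The source does not spell out how the one-dimensional distance condition is derived. As one possible reading, assume the linear conversion function $f(x)=\sum_i a_i x_i$ with conversion coefficients $a_i$ and an $L_p$ distance in the high-dimensional space; the symbols $f$, $a_i$, $p$, $p'$, and $r_1$ are introduced here for illustration only. Hölder's inequality then gives:

```latex
\[
|f(x) - f(q)|
  = \Bigl|\sum_{i=1}^{d} a_i \,(x_i - q_i)\Bigr|
  \le \|a\|_{p'} \,\|x - q\|_{p}
  \le \|a\|_{p'} \, r
  =: r_1,
\qquad \frac{1}{p} + \frac{1}{p'} = 1 .
\]
```

Under these assumptions, every feature data within radius r of the query data q satisfies $|f(x) - f(q)| \le r_1$, so index data whose one-dimensional distance exceeds $r_1$ can be excluded without losing a solution. For the Euclidean distance ($p = p' = 2$), the bound reduces to the Cauchy-Schwarz inequality.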

The search device 1 acquires from the DB 27 the feature data corresponding to the solution candidate index data extracted in (S87), and calculates the actual distance between that feature data and the query data q (S88).

The search device 1 extracts, as the solution to the range query, the feature data whose actual distance calculated in (S88) is smaller than the radius r obtained in (S81) (S89).
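The whole flow of FIG. 8 can be sketched end to end as follows. This is an illustration under several assumptions that are not taken from the source: a linear conversion function with positive coefficients, Euclidean distance as the actual distance, box corners q + r and q - r as the bound data, and a one-dimensional distance condition equal to the Euclidean norm of the coefficients multiplied by r. The index is a sorted list of (key, vector) pairs, and all names are hypothetical.

```python
import bisect
import math

def linear_map(vector, coeffs):
    """Linear conversion function (assumed form): sum of element * coefficient."""
    return sum(c * x for c, x in zip(coeffs, vector))

def range_query(index, q, r, coeffs):
    """index: list of (key, vector) sorted by key. Returns vectors within radius r of q."""
    keys = [key for key, _ in index]
    lo_key = linear_map([x - r for x in q], coeffs)        # S82, S83
    up_key = linear_map([x + r for x in q], coeffs)
    start = bisect.bisect_right(keys, lo_key)              # S84
    stop = bisect.bisect_left(keys, up_key)
    q_key = linear_map(q, coeffs)                          # S85
    r_1d = math.sqrt(sum(c * c for c in coeffs)) * r       # assumed 1-D distance condition
    solutions = []
    for key, vector in index[start:stop]:
        if abs(key - q_key) > r_1d:                        # S86, S87: cheap 1-D filter
            continue
        if math.dist(vector, q) < r:                       # S88, S89: actual distance
            solutions.append(vector)
    return solutions
```

Only the candidates that survive both one-dimensional steps reach `math.dist`, which is the costly high-dimensional computation that the method aims to minimize.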

The search method of the second embodiment is not limited to the example of FIG. 8. For example, the search device 1 may set the index data specified in (S84) as the solution candidates. In this case, the search method need not include (S85), (S86), and (S87). The search device 1 may also set the feature data corresponding to the index data specified in (S84) as the solution to the range query. In this case, the search method need not include (S85) and the subsequent steps. The search device 1 may also set the feature data corresponding to the index data extracted as solution candidates in (S87) as the solution to the range query. In this case, the search method need not include (S88) and the subsequent steps. Furthermore, the search method is not limited to the execution order of the steps shown in FIG. 8. For example, (S85) may be executed at any point from (S82) onward, provided that it precedes (S86).

[Operations and effects of the second embodiment]
As described above, in the second embodiment, the high-dimensional feature data to be indexed is uniquely converted into one-dimensional data by a conversion rule that uses mutually irreducible conversion coefficients, and a hierarchical index is generated that contains this one-dimensional data as index data, sorted in ascending or descending order. A range query search process is then executed using this index. In this search process, the index data corresponding to the feature data that solves the range query can be narrowed down to a certain extent solely by calculations in the one-dimensional space to which the index data belongs. In other words, according to the second embodiment, the solution candidates of a range query can be narrowed down by low-load processing such as distance calculation (subtraction) in the one-dimensional space, without high-load processing such as distance calculation (similarity calculation) in the high-dimensional space, so that the range query search process can be accelerated.

Specifically, in the second embodiment, the upper bound and the lower bound in the high-dimensional space, obtained from the distance condition of the range query, are mapped into the one-dimensional space of the index data under the same conversion rule as the one used to generate the index data. This mapping yields an upper bound point (upper bound one-dimensional data) and a lower bound point (lower bound one-dimensional data) in the one-dimensional space, and the index data within the range between the upper bound point and the lower bound point is specified. In this way, by converting the upper and lower bounds of the range query into the one-dimensional space, the second embodiment can narrow down, from all the index data, the index data that may correspond to a solution of the range query.

Furthermore, the query data (search target data) of the range query is also converted into one-dimensional data (query one-dimensional data) under the same conversion rule as the one used to generate the index data, and the index data serving as solution candidates is narrowed down further based on the distance (difference) between the query one-dimensional data and the index data. This further narrowing of solution candidates can likewise be achieved solely by calculations in the one-dimensional space. For the solution candidates narrowed down in this way, the actual distance in the high-dimensional space is calculated, and the final solution to the range query is obtained according to that actual distance.

Thus, the range query search process of the second embodiment narrows down the solution candidates step by step in the one-dimensional space, thereby reducing the number of targets of the actual distance calculation in the high-dimensional space, which has a high processing load, and consequently accelerates the range query search process.

[Third Embodiment]
In the second embodiment, only the range query search function was described. The search device 1 of the third embodiment has a search function for k-nearest neighbor search (k-Nearest Neighbors Query) in addition to the range query search function. Hereinafter, the search device 1 of the third embodiment is described with a focus on the differences from the second embodiment. In the following description, content that is the same as in the second embodiment is omitted as appropriate.

[Processing configuration]
FIG. 9 is a diagram conceptually illustrating a processing configuration example of the search device 1 of the third embodiment. In the search device 1 of the third embodiment, the search unit 31 further includes a nearest neighbor search unit 40 in addition to the configuration of the second embodiment. Like the other processing units, the nearest neighbor search unit 40 is realized by the CPU 10 executing a program stored in the memory 11.

For a k-nearest neighbor search, the query acquisition unit 30 acquires the search target data and data count information indicating the number of data k (k is a natural number).

By referring to the index data contained in the index stored in the DB 27, the nearest neighbor search unit 40 extracts the feature data that solves the k-nearest neighbor search indicated by the search target data and the data count information. The k-nearest neighbor search is a similarity search process that extracts from the DB 27 the top k feature data in ascending order of distance to the search target data. The nearest neighbor search unit 40 operates the range search unit 34 using, together with the search target data, the k-th smallest of the actual distances calculated by the second similarity calculation unit 42 described later as the distance condition, and extracts, from the feature data obtained thereby, the top k feature data in ascending order of actual distance as the solution to the k-nearest neighbor search.

The nearest neighbor search unit 40 includes a second target specifying unit 41, a second similarity calculation unit 42, and the like.
Based on the position of the search target one-dimensional data obtained by the search target conversion unit 32 within the sort order of the index data contained in the index stored in the DB 27, the second target specifying unit 41 specifies, from immediately before and immediately after the search target one-dimensional data, a number of index data equal to a predetermined multiple of the number k indicated by the data count information. For example, the second target specifying unit 41 specifies the k index data immediately before the search target one-dimensional data and the k index data immediately after it, for a total of 2k index data. The specific method of specifying index data numbering a predetermined multiple of k is not limited. Different numbers of index data may be specified before and after the search target one-dimensional data. For example, index data numbering a predetermined multiple of k may be specified in order of closeness to the search target one-dimensional data.

The second similarity calculation unit 42 calculates the actual distance between the search target data and each feature data corresponding to each index data specified by the second target specifying unit 41.

[Operation example]
Hereinafter, the search method of the third embodiment is described with reference to FIG. 10. FIG. 10 is a flowchart illustrating an operation example of the k-nearest neighbor search of the search device 1 of the third embodiment. In the following description, the search device 1 executes each method, but each of the above-described processing units included in the search device 1 may instead be the executing entity. The executing entity may also be a plurality of devices (computers). In the following description, the search target data is denoted as query data q.

The search device 1 acquires the query data q and the data count information (S101). The query data q is the search target data and is high-dimensional feature data. In the example of FIG. 10, the data count information indicates the number of data k.

The search device 1 uniquely converts the query data q into one-dimensional data by the same method as (S85) of FIG. 8 (S102). As a result, the query data q is converted into query one-dimensional data under the same conversion rule as the one used to convert the feature data to be indexed into index data.

The search device 1 acquires the position of the query one-dimensional data obtained in (S102) within the sort order of the index data contained in the index stored in the DB 27 (S103). For example, the search device 1 recognizes that the query one-dimensional data is located between the m-th index data and the n-th index data counted from the front.

The search device 1 specifies, from immediately before and immediately after the query one-dimensional data, a number of index data equal to a predetermined multiple of the number k indicated by the data count information (S104). For example, the search device 1 specifies the k index data immediately before the query one-dimensional data and the k index data immediately after it, for a total of 2k index data.

The search device 1 calculates the actual distance between the query data q and each feature data corresponding to the index data specified in (S104) (S105). In the above example, the search device 1 calculates the actual distance between the query data q and each of the 2k feature data corresponding to the 2k index data, and obtains 2k actual distances.

The search device 1 selects the k-th smallest actual distance s from the actual distances calculated in (S105) (S106).
The search device 1 sets the actual distance s selected in (S106) as the distance condition and performs the operation shown in FIG. 8 (S107). In this operation, the query one-dimensional data has already been obtained in (S102), so (S85) need not be executed.

From the feature data obtained as the solution in (S89) of FIG. 8, the search device 1 extracts the top k feature data in ascending order of actual distance as the solution to the k-nearest neighbor search (S108).
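The flow of FIG. 10 can be sketched as follows, under the same assumptions as a linear conversion function with positive coefficients and Euclidean actual distance; all names, the tolerance `EPS`, and the sorted list standing in for the index are assumptions of this sketch. One deliberate deviation should be noted: the range step here uses inclusive comparisons, because with a radius equal to the k-th smallest distance s, a strictly-smaller-than comparison would drop the feature data lying exactly at distance s.

```python
import bisect
import math

EPS = 1e-9  # tolerance for floating-point comparisons (an assumption of this sketch)

def linear_map(vector, coeffs):
    """Linear conversion function (assumed form): sum of element * coefficient."""
    return sum(c * x for c, x in zip(coeffs, vector))

def knn(index, q, k, coeffs):
    """index: list of (key, vector) sorted by key, with len(index) >= k."""
    keys = [key for key, _ in index]
    q_key = linear_map(q, coeffs)                              # S102
    pos = bisect.bisect_left(keys, q_key)                      # S103
    window = index[max(0, pos - k):pos + k]                    # S104: up to k before, k after
    s = sorted(math.dist(v, q) for _, v in window)[k - 1]      # S105, S106
    # S107: range query with radius s (inclusive bounds, see the note above)
    lo_key = linear_map([x - s for x in q], coeffs)
    up_key = linear_map([x + s for x in q], coeffs)
    start = bisect.bisect_left(keys, lo_key - EPS)
    stop = bisect.bisect_right(keys, up_key + EPS)
    r_1d = math.sqrt(sum(c * c for c in coeffs)) * s           # assumed 1-D distance condition
    hits = []
    for key, vector in index[start:stop]:
        if abs(key - q_key) <= r_1d + EPS:
            dist = math.dist(vector, q)
            if dist <= s:
                hits.append((dist, vector))
    hits.sort()                                                # S108: ascending actual distance
    return [vector for _, vector in hits[:k]]
```

The window of (S104) need not contain the true nearest neighbors, since closeness in the one-dimensional space does not imply closeness in the high-dimensional space. It only supplies a radius s that is guaranteed to enclose at least k feature data, and the subsequent range step then recovers any closer feature data that the window missed.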

[Operations and effects of the third embodiment]
In the third embodiment, a k-nearest neighbor search process using the above-described index is executed. Specifically, based on the position of the query one-dimensional data in the one-dimensional space to which the index data belongs, a number of index data equal to a predetermined multiple of the number of data k is specified around the query one-dimensional data, and the actual distance in the high-dimensional space between the query data and the feature data corresponding to each specified index data is calculated. The k-th smallest of these actual distances is then set as the radius of the distance condition, and the range query search process of the second embodiment is executed. From the feature data extracted as the solution by the range query search process, k feature data are extracted in ascending order of actual distance as the solution to the k-nearest neighbor search.

Thus, according to the third embodiment, the number of targets of the actual distance calculation in the high-dimensional space, which has a high processing load, is reduced in the k-nearest neighbor search process, so that the k-nearest neighbor search process can be accelerated.

In the flowcharts used in the above description, a plurality of steps (processes) are described in order, but the execution order of the steps in each embodiment is not limited to the described order. In each embodiment, the order of the illustrated steps can be changed as long as the change does not interfere with the content. The above-described embodiments and examples can also be combined as long as their contents do not conflict.

Some or all of the above embodiments and examples can also be specified as in the following supplementary notes. However, the embodiments and examples are not limited to the following description.

(Supplementary note 1) An index generation device comprising:
a data acquisition unit that acquires high-dimensional data;
a coefficient acquisition unit that acquires mutually irreducible conversion coefficients, as many as the number of dimensions of the high-dimensional data;
a conversion unit that uniquely maps the high-dimensional data into a one-dimensional space using the plurality of conversion coefficients acquired by the coefficient acquisition unit; and
an index generation unit that generates an index that has a hierarchical structure and contains, as index data, the one-dimensional data obtained by the conversion unit sorted in ascending or descending order.

(Supplementary note 2) The index generation device according to Supplementary note 1,
wherein the conversion unit calculates the sum of the products of the element data of each dimension forming the high-dimensional data and the corresponding conversion coefficient acquired by the coefficient acquisition unit.

(Supplementary note 3) The index generation device according to Supplementary note 2,
wherein the coefficient acquisition unit selects as many prime numbers as the number of dimensions of the high-dimensional data and acquires, as the plurality of conversion coefficients, the logarithm of each prime number to a base equal to the product of the selected prime numbers.

(Supplementary note 4) The index generation device according to Supplementary note 1,
wherein the coefficient acquisition unit acquires, as the plurality of conversion coefficients, as many prime numbers as the number of dimensions of the high-dimensional data, and
the conversion unit normalizes the high-dimensional data acquired by the data acquisition unit into natural numbers, raises each conversion coefficient acquired by the coefficient acquisition unit to the power given by the element data of the corresponding dimension of the normalized high-dimensional data, and calculates the product of the resulting values.

(Supplementary note 5) A search device that uses the index generated by the index generation device according to any one of Supplementary notes 1 to 4, the search device comprising:
a query acquisition unit that acquires search target data having the same number of dimensions as the high-dimensional data;
a search target conversion unit that uniquely maps the search target data into the one-dimensional space by the same method as the conversion unit, using the same plurality of conversion coefficients as those acquired by the coefficient acquisition unit; and
a distance calculation unit that, when the similarity between the high-dimensional data and the search target data is evaluated, calculates the distance between the search target one-dimensional data obtained by the search target conversion unit and the one-dimensional data contained in the index as the index data.

（付記６）前記検索対象データからの距離条件を取得する第１条件取得部と、
前記インデックスに含まれる前記インデックスデータを参照することにより、前記検索対象データ及び前記距離条件に基づく範囲問合せの解となる高次元データを抽出する範囲検索部と、
を更に備え、
前記範囲検索部は、
前記検索対象データ及び前記距離条件により示される、前記検索対象データの高次元空間における問合せ範囲に関する、上界データ及び下界データを取得する範囲取得部、
を含み、
前記検索対象変換部は、前記複数の変換係数を用いて、前記上界データ及び前記下界データを前記一次元空間へ唯一にマッピングし、
前記範囲検索部は、
前記インデックスに含まれる前記インデックスデータの中から、前記検索対象変換部により前記上界データ及び前記下界データから得られる上界一次元データ及び下界一次元データの間の範囲内のインデックスデータを特定する第１対象特定部、
を更に含み、
前記距離算出部は、前記第１対象特定部により特定されるインデックスデータと前記検索対象変換部により得られる前記検索対象一次元データとの間の距離を算出し、
前記範囲検索部は、
前記距離条件から得られる前記一次元空間上の一次元距離条件と、前記距離算出部により算出される距離との比較により、前記第１対象特定部により特定されたインデックスデータをフィルタリングし、該フィルタリングで得られるインデックスデータを解候補として抽出する候補抽出部、
を更に含む、
付記５に記載の検索装置。 (Supplementary Note 6) A first condition acquisition unit that acquires a distance condition relative to the search target data, and
A range search unit that, by referring to the index data included in the index, extracts the high-dimensional data that is the solution to a range query based on the search target data and the distance condition,
Are further provided,
The range search unit includes
A range acquisition unit that acquires upper bound data and lower bound data for the query range, in the high-dimensional space of the search target data, indicated by the search target data and the distance condition,
The search target conversion unit uses the plurality of conversion coefficients to uniquely map the upper bound data and the lower bound data to the one-dimensional space,
The range search unit further includes
A first target specifying unit that specifies, from among the index data included in the index, the index data lying within the range between the upper bound one-dimensional data and the lower bound one-dimensional data that the search target conversion unit obtains from the upper bound data and the lower bound data,
The distance calculation unit calculates the distance between the index data specified by the first target specifying unit and the search target one-dimensional data obtained by the search target conversion unit,
The range search unit further includes
A candidate extraction unit that filters the index data specified by the first target specifying unit by comparing a one-dimensional distance condition in the one-dimensional space, obtained from the distance condition, with the distance calculated by the distance calculation unit, and extracts the index data remaining after the filtering as solution candidates,
The search device according to Supplementary Note 5.

（付記７）前記範囲検索部は、
前記候補抽出部により抽出された前記解候補のインデックスデータに対応する高次元データと前記検索対象データとの間の実距離を算出する第１類似度算出部、
を更に含み、
前記第１類似度算出部により算出される実距離と前記距離条件との比較により、前記範囲問合せの解となる高次元データを抽出する、
付記６に記載の検索装置。 (Supplementary Note 7) The range search unit further includes
A first similarity calculation unit that calculates the actual distance between the high-dimensional data corresponding to the solution-candidate index data extracted by the candidate extraction unit and the search target data,
And extracts the high-dimensional data that is the solution to the range query by comparing the actual distance calculated by the first similarity calculation unit with the distance condition,
The search device according to Supplementary Note 6.
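
The range search of Supplementary Notes 6 and 7 can be sketched in Python as below. Several points are assumptions made for this sketch and are not stated in the notes: the distance is Euclidean, the upper and lower bound data are the corners of the axis-aligned box enclosing the query ball, the one-dimensional distance condition is the Cauchy-Schwarz bound `radius * ||coeffs||`, and a sorted list searched with `bisect` stands in for the hierarchical index. All function and variable names are hypothetical. The box step is valid only because the coefficients are positive, which makes the key grow with every element.

```python
import bisect
import math


def to_key(vector, coeffs):
    """Sum of the products of each element and its conversion coefficient."""
    return sum(x * c for x, c in zip(vector, coeffs))


def range_query(data, index, coeffs, query, radius, eps=1e-9):
    """Return the ids of all points within `radius` (Euclidean) of `query`."""
    q_key = to_key(query, coeffs)
    # Upper/lower bound data: corners of the box enclosing the query ball (assumed).
    lower_key = to_key([x - radius for x in query], coeffs)
    upper_key = to_key([x + radius for x in query], coeffs)
    # Step 1: index entries whose key lies between the two mapped bounds.
    lo = bisect.bisect_left(index, (lower_key - eps, -1))
    hi = bisect.bisect_right(index, (upper_key + eps, len(data)))
    in_range = index[lo:hi]
    # Step 2: filter by the one-dimensional distance condition (assumed bound).
    bound_1d = radius * math.sqrt(sum(c * c for c in coeffs)) + eps
    candidates = [i for key, i in in_range if abs(key - q_key) <= bound_1d]
    # Step 3: refine the candidates with the actual high-dimensional distance.
    return sorted(i for i in candidates if math.dist(data[i], query) <= radius)


primes = [2, 3]
log_product = sum(math.log(p) for p in primes)
coeffs = [math.log(p) / log_product for p in primes]
data = [(0, 0), (1, 1), (3, 4), (10, 10), (2, 2)]
index = sorted((to_key(v, coeffs), i) for i, v in enumerate(data))
result = range_query(data, index, coeffs, (1, 1), 1.5)
```

Neither filtering step can discard a true answer under these assumptions: a point inside the ball lies inside the box, and its key differs from the query key by at most `radius * ||coeffs||`. Only the final step computes high-dimensional distances, and only for the surviving candidates.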

（付記８）データ数ｋ（ｋは自然数）を示すデータ数情報を取得する第２条件取得部と、
前記インデックスに含まれる前記インデックスデータを参照することにより、前記検索対象データ及び前記データ数情報により示されるｋ最近傍探索の解となる高次元データを抽出する最近傍探索部と、
を更に備え、
前記最近傍探索部は、
前記インデックスに含まれる前記インデックスデータの並び順における、前記検索対象変換部により得られる前記検索対象一次元データの位置に基づいて、前記検索対象一次元データの直前及び直後から、前記データ数情報で示される数の所定倍の数のインデックスデータを特定する第２対象特定部と、
前記第２対象特定部により特定された各インデックスデータに対応する各高次元データと前記検索対象データとの間の実距離を算出する第２類似度算出部と、
を含み、
前記第２類似度算出部により算出された実距離の中の前記ｋ番目に小さい実距離を前記距離条件として、前記検索対象データと共に用いて、前記範囲検索部を動作させることにより抽出される高次元データの中から、実距離の小さい順で上位ｋ個の高次元データをｋ最近傍探索の解として抽出する、
付記７に記載の検索装置。 (Supplementary Note 8) A second condition acquisition unit that acquires data count information indicating a number of data items k (k is a natural number), and
A nearest neighbor search unit that, by referring to the index data included in the index, extracts the high-dimensional data that is the solution to the k-nearest-neighbor search indicated by the search target data and the data count information,
Are further provided,
The nearest neighbor search unit includes
A second target specifying unit that, based on the position of the search target one-dimensional data obtained by the search target conversion unit within the sort order of the index data included in the index, specifies, from immediately before and immediately after the search target one-dimensional data, a number of index data items equal to a predetermined multiple of the number indicated by the data count information, and
A second similarity calculation unit that calculates the actual distance between each high-dimensional data item corresponding to each index data item specified by the second target specifying unit and the search target data,
And operates the range search unit using, together with the search target data, the k-th smallest of the actual distances calculated by the second similarity calculation unit as the distance condition, and extracts, from the high-dimensional data thereby extracted, the top k high-dimensional data in ascending order of actual distance as the solution to the k-nearest-neighbor search,
The search device according to Supplementary Note 7.
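
A minimal Python sketch of the k-nearest-neighbor procedure in Supplementary Note 8 follows. It assumes Euclidean distance, a sorted list in place of the hierarchical index, at least k stored points, and a compact range query that applies only an assumed Cauchy-Schwarz one-dimensional distance condition before refining. The names `knn`, `range_query`, `to_key`, and the parameter `multiple` (the "predetermined multiple") are hypothetical.

```python
import bisect
import math


def to_key(vector, coeffs):
    """Sum of the products of each element and its conversion coefficient."""
    return sum(x * c for x, c in zip(vector, coeffs))


def range_query(data, index, coeffs, query, radius):
    """Filter by one-dimensional key distance, then refine by actual distance."""
    q_key = to_key(query, coeffs)
    bound_1d = radius * math.sqrt(sum(c * c for c in coeffs)) + 1e-9
    lo = bisect.bisect_left(index, (q_key - bound_1d, -1))
    hi = bisect.bisect_right(index, (q_key + bound_1d, len(data)))
    return [i for _, i in index[lo:hi] if math.dist(data[i], query) <= radius]


def knn(data, index, coeffs, query, k, multiple=2):
    """Exact k nearest neighbors, seeded from the query's position in the index."""
    q_key = to_key(query, coeffs)
    pos = bisect.bisect_left(index, (q_key, -1))
    span = multiple * k
    # Up to `span` entries immediately before and after the query position.
    seeds = index[max(0, pos - span):pos + span]
    seed_dists = sorted(math.dist(data[i], query) for _, i in seeds)
    # The k-th smallest seed distance becomes the distance condition.
    radius = seed_dists[k - 1]
    hits = range_query(data, index, coeffs, query, radius)
    hits.sort(key=lambda i: (math.dist(data[i], query), i))
    return hits[:k]


primes = [2, 3]
log_product = sum(math.log(p) for p in primes)
coeffs = [math.log(p) / log_product for p in primes]
data = [(0, 0), (1, 1), (3, 4), (10, 10), (2, 3), (1, 2), (5, 5), (4, 1)]
index = sorted((to_key(v, coeffs), i) for i, v in enumerate(data))
nearest = knn(data, index, coeffs, (1, 1), k=3)
```

The seed step only supplies a radius: the k-th smallest seed distance is an upper bound on the true k-th nearest distance, so the range query with that radius is guaranteed to contain the k nearest neighbors. A larger `multiple` costs more distance computations up front but tends to yield a tighter radius.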

（付記９）少なくとも１つのコンピュータにより実行されるインデックス生成方法において、
高次元データを取得し、
相互に可約できない、前記高次元データの次元数分の変換係数を取得し、
前記取得された複数の変換係数を用いて、前記高次元データを一次元空間へ唯一にマッピングし、
前記マッピングにより得られる一次元データが昇順又は降順に整列された状態でインデックスデータとして含まれ、階層構造を持つインデックスを生成する、
ことを含むインデックス生成方法。 (Supplementary Note 9) In an index generation method executed by at least one computer,
Acquire high-dimensional data,
Acquire mutually irreducible conversion coefficients, as many as the high-dimensional data has dimensions,
Using the acquired plurality of conversion coefficients, uniquely map the high-dimensional data to a one-dimensional space,
Generate an index that has a hierarchical structure and contains, as index data, the one-dimensional data obtained by the mapping sorted in ascending or descending order,
An index generation method including the above.
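
The index generation steps of Supplementary Note 9 might be sketched as follows. This is a simplification under stated assumptions: a sorted Python list of `(key, id)` pairs maintained with `bisect.insort` stands in for the hierarchical index, whose concrete structure this note does not name, and the function names `build_index` and `add_entry` are hypothetical.

```python
import bisect
import math


def build_index(data, coeffs):
    """Map every vector to its one-dimensional key and sort in ascending order."""
    return sorted(
        (sum(x * c for x, c in zip(v, coeffs)), i) for i, v in enumerate(data)
    )


def add_entry(index, data, coeffs, vector):
    """Store a new vector and insert its key so the index stays sorted."""
    data.append(vector)
    key = sum(x * c for x, c in zip(vector, coeffs))
    bisect.insort(index, (key, len(data) - 1))


primes = [2, 3, 5]
log_product = sum(math.log(p) for p in primes)
coeffs = [math.log(p) / log_product for p in primes]
data = [(3, 1, 2), (0, 0, 1), (5, 5, 5)]
index = build_index(data, coeffs)
add_entry(index, data, coeffs, (1, 1, 1))
```

Each entry keeps the id of its source vector, so a search can return from a matching key to the original high-dimensional data when an actual distance is needed.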

（付記１０）前記マッピングは、前記高次元データを形成する各次元の要素データと、前記係数取得部で取得される各変換係数との積の和を算出する、
付記９に記載のインデックス生成方法。 (Supplementary Note 10) The mapping calculates the sum of the products of the element data of each dimension forming the high-dimensional data and the corresponding conversion coefficient acquired by the coefficient acquisition unit,
The index generation method according to Supplementary Note 9.

（付記１１）前記高次元データの次元数分の素数を選択する、
ことを更に含み、
前記変換係数の取得は、前記選択された素数の積を底とする各素数の対数を前記複数の変換係数として取得する、
付記１０に記載のインデックス生成方法。 (Supplementary Note 11) Further including selecting as many prime numbers as the high-dimensional data has dimensions,
The acquisition of the conversion coefficients acquires, as the plurality of conversion coefficients, the logarithm of each prime number taken to the base of the product of the selected prime numbers,
The index generation method according to Supplementary Note 10.

（付記１２）前記高次元データを自然数に正規化する、
ことを更に含み、
前記変換係数の取得は、前記高次元データの次元数分の素数を前記複数の変換係数として取得し、
前記マッピングは、前記正規化された高次元データを形成する各次元の要素データを冪数として用いて、前記取得された各変換係数を底としてそれぞれ冪乗して得られる値の積を算出する、
付記９に記載のインデックス生成方法。 (Supplementary Note 12) Further including normalizing the high-dimensional data to natural numbers,
The acquisition of the conversion coefficients acquires, as the plurality of conversion coefficients, as many prime numbers as the high-dimensional data has dimensions,
The mapping uses the element data of each dimension forming the normalized high-dimensional data as exponents and calculates the product of the values obtained by raising each acquired conversion coefficient, as the base, to the corresponding exponent,
The index generation method according to Supplementary Note 9.

（付記１３）付記９から１２のいずれか１つに記載のインデックス生成方法により生成される前記インデックスを用い、かつ、少なくとも１つのコンピュータにより実行される検索方法において、
前記高次元データと同じ次元数の検索対象データを取得し、
前記複数の変換係数を用いて、前記インデックス生成方法に含まれる前記マッピングと同じ手法で、前記検索対象データを前記一次元空間へ唯一にマッピングし、
前記高次元データと前記検索対象データとの間の類似度を評価する際に、前記検索対象データの前記マッピングにより得られる検索対象一次元データと前記インデックスに含まれる前記一次元データとの間の距離を算出する、
ことを含む検索方法。 (Supplementary Note 13) In a search method that uses the index generated by the index generation method according to any one of Supplementary Notes 9 to 12 and is executed by at least one computer,
Acquire search target data having the same number of dimensions as the high-dimensional data,
Using the plurality of conversion coefficients, uniquely map the search target data to the one-dimensional space by the same technique as the mapping included in the index generation method,
When evaluating the similarity between the high-dimensional data and the search target data, calculate the distance between the search target one-dimensional data obtained by the mapping of the search target data and the one-dimensional data included in the index,
A search method including the above.

（付記１４）前記検索対象データからの距離条件を取得し、
前記検索対象データ及び前記距離条件により示される、前記検索対象データの高次元空間における問合せ範囲に関する、上界データ及び下界データを取得し、
前記検索対象データのマッピングと同じ手法及び同じ複数の変換係数を用いて、前記上界データ及び前記下界データを前記一次元空間へ唯一にマッピングし、
前記インデックスに含まれる前記インデックスデータの中から、前記上界データ及び前記下界データの前記マッピングにより得られる上界一次元データ及び下界一次元データの間の範囲内のインデックスデータを特定し、
前記特定されたインデックスデータと前記検索対象一次元データとの間の距離を算出し、
前記距離条件から得られる前記一次元空間上の一次元距離条件と、前記算出された距離との比較により、前記特定されたインデックスデータをフィルタリングし、該フィルタリングで得られるインデックスデータを解候補として抽出する、
ことを更に含む付記１３に記載の検索方法。 (Supplementary Note 14) Acquire a distance condition relative to the search target data,
Acquire upper bound data and lower bound data for the query range, in the high-dimensional space of the search target data, indicated by the search target data and the distance condition,
Using the same technique and the same plurality of conversion coefficients as for the mapping of the search target data, uniquely map the upper bound data and the lower bound data to the one-dimensional space,
Specify, from among the index data included in the index, the index data lying within the range between the upper bound one-dimensional data and the lower bound one-dimensional data obtained by the mapping of the upper bound data and the lower bound data,
Calculate the distance between the specified index data and the search target one-dimensional data,
Filter the specified index data by comparing a one-dimensional distance condition in the one-dimensional space, obtained from the distance condition, with the calculated distance, and extract the index data remaining after the filtering as solution candidates,
The search method according to Supplementary Note 13, further including the above.

（付記１５）前記解候補として抽出されたインデックスデータに対応する高次元データと前記検索対象データとの間の実距離を算出し、
前記算出された実距離と前記距離条件との比較により、前記検索対象データ及び前記距離条件に基づく範囲問合せの解となる高次元データを抽出する、
ことを更に含む付記１４に記載の検索方法。 (Supplementary Note 15) Calculate the actual distance between the high-dimensional data corresponding to the index data extracted as the solution candidates and the search target data,
By comparing the calculated actual distance with the distance condition, extract the high-dimensional data that is the solution to the range query based on the search target data and the distance condition,
The search method according to Supplementary Note 14, further including the above.

（付記１６）データ数ｋ（ｋは自然数）を示すデータ数情報を取得し、
前記インデックスに含まれる前記インデックスデータの並び順における、前記検索対象一次元データの位置に基づいて、前記検索対象一次元データの直前及び直後から、前記データ数情報で示される数の所定倍の数のインデックスデータを特定し、
前記特定された各インデックスデータに対応する各高次元データと前記検索対象データとの間の実距離を算出し、
前記算出された実距離の中の前記ｋ番目に小さい実距離を前記距離条件として特定し、
前記検索対象データ及び前記距離条件に基づく前記範囲問合せの解として抽出される高次元データの中から、実距離の小さい順で上位ｋ個の高次元データをｋ最近傍探索の解として抽出する、
ことを更に含む付記１５に記載の検索方法。 (Supplementary Note 16) Acquire data count information indicating a number of data items k (k is a natural number),
Based on the position of the search target one-dimensional data within the sort order of the index data included in the index, specify, from immediately before and immediately after the search target one-dimensional data, a number of index data items equal to a predetermined multiple of the number indicated by the data count information,
Calculate the actual distance between each high-dimensional data item corresponding to each specified index data item and the search target data,
Specify the k-th smallest of the calculated actual distances as the distance condition,
From the high-dimensional data extracted as the solution to the range query based on the search target data and the distance condition, extract the top k high-dimensional data in ascending order of actual distance as the solution to the k-nearest-neighbor search,
The search method according to Supplementary Note 15, further including the above.

（付記１７）付記９から１２のいずれか１つに記載のインデックス生成方法を少なくとも１つのコンピュータに実行させるプログラム。 (Supplementary Note 17) A program that causes at least one computer to execute the index generation method according to any one of Supplementary Notes 9 to 12.

（付記１８）付記１３から１６のいずれか１つに記載の検索方法を少なくとも１つのコンピュータに実行させるプログラム。 (Supplementary Note 18) A program that causes at least one computer to execute the search method according to any one of Supplementary Notes 13 to 16.

１高次元データ検索装置（検索装置）
１０ＣＰＵ
１１メモリ
２０、１０１データ取得部
２１、１０４インデックス生成部
２３、１０２係数取得部
２４、１０３変換部
２５並び替え処理部
２７データベース（ＤＢ）
３０、２０１クエリ取得部
３１検索部
３２、２０２検索対象変換部
３３、２０３距離算出部
３４範囲検索部
３５範囲取得部
３６第１対象特定部
３７候補抽出部
３８第１類似度算出部
４０最近傍探索部
４１第２対象特定部
４２第２類似度算出部
１００インデックス生成装置
２００検索装置
1 High-dimensional data search device (search device)
10 CPU
11 Memory
20, 101 Data acquisition unit
21, 104 Index generation unit
23, 102 Coefficient acquisition unit
24, 103 Conversion unit
25 Rearrangement processing unit
27 Database (DB)
30, 201 Query acquisition unit
31 Search unit
32, 202 Search target conversion unit
33, 203 Distance calculation unit
34 Range search unit
35 Range acquisition unit
36 First target specifying unit
37 Candidate extraction unit
38 First similarity calculation unit
40 Nearest neighbor search unit
41 Second target specifying unit
42 Second similarity calculation unit
100 Index generation device
200 Search device

Claims

A data acquisition unit for acquiring high-dimensional data;
A coefficient acquisition unit that acquires mutually irreducible conversion coefficients, as many as the high-dimensional data has dimensions;
A conversion unit that uniquely maps the high-dimensional data to a one-dimensional space using the plurality of conversion coefficients acquired by the coefficient acquisition unit;
An index generation unit that generates an index that has a hierarchical structure and contains, as index data, the one-dimensional data obtained by the conversion unit sorted in ascending or descending order;
An index generation device comprising:

The conversion unit calculates a sum of products of element data of each dimension forming the high-dimensional data and each conversion coefficient acquired by the coefficient acquisition unit;
The index generation device according to claim 1.

The coefficient acquisition unit selects as many prime numbers as the high-dimensional data has dimensions, and acquires, as the plurality of conversion coefficients, the logarithm of each prime number taken to the base of the product of the selected prime numbers,
The index generation device according to claim 2.

The coefficient acquisition unit acquires, as the plurality of conversion coefficients, as many prime numbers as the high-dimensional data has dimensions,
The conversion unit normalizes the high-dimensional data acquired by the data acquisition unit to natural numbers and, using the element data of each dimension forming the normalized high-dimensional data as exponents, calculates the product of the values obtained by raising each conversion coefficient acquired by the coefficient acquisition unit, as the base, to the corresponding exponent,
The index generation device according to claim 1.

In a search device that uses the index generated by the index generation device according to any one of claims 1 to 4,
A query acquisition unit that acquires search target data having the same number of dimensions as the high-dimensional data;
A search target conversion unit that uniquely maps the search target data to the one-dimensional space by the same technique as the conversion unit, using the same plurality of conversion coefficients as those acquired by the coefficient acquisition unit; and
A distance calculation unit that, when the similarity between the high-dimensional data and the search target data is evaluated, calculates the distance between the search target one-dimensional data obtained by the search target conversion unit and the one-dimensional data included in the index as the index data;
A search device comprising the above units.

A first condition acquisition unit that acquires a distance condition relative to the search target data, and
A range search unit that, by referring to the index data included in the index, extracts the high-dimensional data that is the solution to a range query based on the search target data and the distance condition,
Are further provided,
The range search unit includes
A range acquisition unit that acquires upper bound data and lower bound data for the query range, in the high-dimensional space of the search target data, indicated by the search target data and the distance condition,
The search target conversion unit uses the plurality of conversion coefficients to uniquely map the upper bound data and the lower bound data to the one-dimensional space,
The range search unit further includes
A first target specifying unit that specifies, from among the index data included in the index, the index data lying within the range between the upper bound one-dimensional data and the lower bound one-dimensional data that the search target conversion unit obtains from the upper bound data and the lower bound data,
The distance calculation unit calculates the distance between the index data specified by the first target specifying unit and the search target one-dimensional data obtained by the search target conversion unit,
The range search unit further includes
A candidate extraction unit that filters the index data specified by the first target specifying unit by comparing a one-dimensional distance condition in the one-dimensional space, obtained from the distance condition, with the distance calculated by the distance calculation unit, and extracts the index data remaining after the filtering as solution candidates,
The search device according to claim 5.

The range search unit further includes
A first similarity calculation unit that calculates the actual distance between the high-dimensional data corresponding to the solution-candidate index data extracted by the candidate extraction unit and the search target data,
And extracts the high-dimensional data that is the solution to the range query by comparing the actual distance calculated by the first similarity calculation unit with the distance condition,
The search device according to claim 6.

A second condition acquisition unit that acquires data count information indicating a number of data items k (k is a natural number), and
A nearest neighbor search unit that, by referring to the index data included in the index, extracts the high-dimensional data that is the solution to the k-nearest-neighbor search indicated by the search target data and the data count information,
Are further provided,
The nearest neighbor search unit includes
A second target specifying unit that, based on the position of the search target one-dimensional data obtained by the search target conversion unit within the sort order of the index data included in the index, specifies, from immediately before and immediately after the search target one-dimensional data, a number of index data items equal to a predetermined multiple of the number indicated by the data count information, and
A second similarity calculation unit that calculates the actual distance between each high-dimensional data item corresponding to each index data item specified by the second target specifying unit and the search target data,
And operates the range search unit using, together with the search target data, the k-th smallest of the actual distances calculated by the second similarity calculation unit as the distance condition, and extracts, from the high-dimensional data thereby extracted, the top k high-dimensional data in ascending order of actual distance as the solution to the k-nearest-neighbor search,
The search device according to claim 7.

In an index generation method executed by at least one computer,
Acquire high-dimensional data,
Acquire mutually irreducible conversion coefficients, as many as the high-dimensional data has dimensions,
Using the acquired plurality of conversion coefficients, uniquely map the high-dimensional data to a one-dimensional space,
Generate an index that has a hierarchical structure and contains, as index data, the one-dimensional data obtained by the mapping sorted in ascending or descending order,
An index generation method including the above.

In a search method that uses the index generated by the index generation method according to claim 9 and is executed by at least one computer,
Acquire search target data having the same number of dimensions as the high-dimensional data,
Using the plurality of conversion coefficients, uniquely map the search target data to the one-dimensional space by the same technique as the mapping included in the index generation method,
When evaluating the similarity between the high-dimensional data and the search target data, calculate the distance between the search target one-dimensional data obtained by the mapping of the search target data and the one-dimensional data included in the index,
A search method including the above.