JP2017211846A

JP2017211846A - Retrieval data management device, retrieval data management method, and retrieval data management program

Info

Publication number: JP2017211846A
Application number: JP2016104827A
Authority: JP
Inventors: Masajiro Iwasaki (岩崎 雅二郎)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-05-26
Filing date: 2016-05-26
Publication date: 2017-11-30
Anticipated expiration: 2036-05-26
Also published as: JP6333306B2

Abstract

PROBLEM TO BE SOLVED: To provide a retrieval data management device, a retrieval data management method, and a retrieval data management program capable of reducing the number of edges while taking search performance into consideration.
SOLUTION: A retrieval data management device includes: a storage unit that stores graph-structure data in which multidimensional vector data to be retrieved are connected by edges; and a deletion processing unit that extracts, from the graph-structure data stored in the storage unit, an extracted edge that satisfies a predetermined condition, determines whether there is an alternative path that connects the same pieces of multidimensional vector data as the extracted edge by a route different from the extracted edge and that has a number of edges equal to or less than a predetermined number, and, when it determines that such an alternative path is present, deletes the extracted edge from the graph-structure data.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a search data management device, a search data management method, and a search data management program.

In the field of data search, such as image search, graph-structure data is constructed in which the data items to be searched are virtually connected to one another by connection elements, and a search is performed by following those connection elements. A connection element is information indicating that two data items are connected, and is called an edge, a link, or the like (hereinafter, an edge). Edges are set according to a rule such as the following: the feature amount of each data item to be searched is first obtained, and data items whose feature amounts are close are connected to each other. A feature amount is typically expressed as multidimensional vector data, and feature amounts being close is defined, for example, as the distance between the multidimensional vector data being short. As the distance between multidimensional vector data, for example, the Lp norm of the differences between vector elements is used (p = 1, 2, ...).
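The Lp-norm distance mentioned above can be sketched as follows. This is a minimal illustration, not code from the publication; the function name `lp_distance` and its parameters are assumptions made for the example.

```python
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: int = 2) -> float:
    """Return the Lp norm of the element-wise difference between a and b.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be 1 or greater")
    # Sum |a_i - b_i|^p over all dimensions, then take the p-th root.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


print(lp_distance([0.0, 0.0], [3.0, 4.0], p=1))  # L1 (Manhattan) distance
print(lp_distance([0.0, 0.0], [3.0, 4.0], p=2))  # L2 (Euclidean) distance
```

With p = 2 this is the ordinary Euclidean distance; a smaller value means the two feature amounts are closer.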

Edges may be directed or undirected. A directed edge can be followed in only one direction, whereas an undirected edge can be followed in both directions. Because setting undirected edges yields more possible search paths, it can in some cases make the search faster and more accurate. However, if, for example, a graph structure built with directed edges is mechanically converted to undirected edges, a large number of edges may end up attached to particular data items. When the number of edges becomes excessive, the amount of processing required for a search increases, and performance may instead deteriorate.

In view of this, a technique has been disclosed in which, among the data items connected to a given data item of interest, a predetermined number of data items are kept in ascending order of distance from the data item of interest, and the connections to the remaining data items are released (see, for example, Patent Document 1).

Patent Document 1: JP 2011-090352 A

However, although the conventional technique can reduce edges effectively, it does not take search performance (search time) into account, and search performance therefore tends to deteriorate.
The present invention has been made in view of these circumstances, and one of its objects is to provide a search data management device, a search data management method, and a search data management program capable of reducing edges while taking search performance into consideration.

One aspect of the present invention is a search data management device including: a storage unit that stores graph-structure data in which multidimensional vector data to be searched are connected by edges; and a deletion processing unit that extracts, from the graph-structure data stored in the storage unit, an extracted edge that is an edge satisfying a predetermined condition, determines whether there exists an alternative path that connects the same pieces of multidimensional vector data as the extracted edge by a route different from the extracted edge and that has a number of edges equal to or less than a predetermined number, and, when it determines that such an alternative path exists, deletes the extracted edge from the graph-structure data.
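The deletion rule in this aspect can be illustrated with a bounded breadth-first search. The sketch below is only one possible reading under stated assumptions: the graph is held as an adjacency map of undirected edges, the edge to test is handed in directly (the "predetermined condition" for choosing it is not modeled), and the names `has_alternative_path`, `delete_if_redundant`, and `max_edges` are hypothetical.

```python
from collections import deque
from typing import Dict, Set

# Assumed representation: vector ID -> IDs of vectors joined to it by an undirected edge.
Graph = Dict[int, Set[int]]


def has_alternative_path(graph: Graph, a: int, b: int, max_edges: int) -> bool:
    """Return True if b can be reached from a within max_edges edges without using edge a-b."""
    frontier = deque([(a, 0)])
    visited = {a}
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_edges:
            continue  # extending this path would exceed the allowed number of edges
        for nxt in graph.get(node, ()):
            if node == a and nxt == b:
                continue  # the extracted edge itself does not count as an alternative
            if nxt == b:
                return True
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def delete_if_redundant(graph: Graph, a: int, b: int, max_edges: int) -> bool:
    """Delete edge a-b when a short enough alternative path exists; report whether it was deleted."""
    if b not in graph.get(a, set()):
        return False
    if has_alternative_path(graph, a, b, max_edges):
        graph[a].discard(b)
        graph[b].discard(a)
        return True
    return False


triangle: Graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(delete_if_redundant(triangle, 1, 3, max_edges=2))  # path 1-2-3 exists, so the edge is removed
print(delete_if_redundant(triangle, 1, 2, max_edges=2))  # no other route remains, so the edge is kept
```

Because the search is breadth-first, the first time it reaches `b` it has found a shortest detour, so the depth limit corresponds directly to the predetermined number of edges.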

According to one aspect of the present invention, edges can be reduced while taking search performance into consideration.

A configuration diagram centered on the search data management device 1 of the embodiment.
A diagram showing an example of the hardware configuration of the control unit 20.
A diagram showing an example of the contents of the existing vector data 42.
A diagram showing an example of the contents of the graph index 44.
A flowchart showing an example of the flow of processing by the new creation/addition processing unit 26A.
A diagram showing how a neighborhood vector is updated.
A flowchart showing another example of the flow of processing by the new creation/addition processing unit 26A.
A flowchart showing another example of the flow of processing by the new creation/addition processing unit 26A.
A diagram conceptually showing how the vector having the shortest distance from the vector y is extracted from the vector set S and excluded from the vector set S.
A diagram conceptually showing the neighborhood vector N(G, s) of the vector s.
A diagram conceptually showing how a vector is added to the vector set S or the vector set R.
A diagram conceptually showing how the vector having the longest distance from the vector y is deleted from the vectors included in the vector set R and r is reset.
A flowchart showing an example of the flow of processing executed by the deletion processing unit 26B.
A diagram conceptually showing how an alternative path having p or fewer edges is deleted.
A diagram illustrating a method of selecting a search start vector.

Hereinafter, embodiments of the search data management device, the search data management method, and the search data management program of the present invention will be described with reference to the drawings. In the following description, "the distance between vectors is short" and "the vectors are close" have the same meaning, as do "the distance between vectors is long" and "the vectors are far apart".

[Overall configuration]
FIG. 1 is a configuration diagram centered on the search data management device 1 of the embodiment. The search data management device 1 is connected to one or more client terminals CL via a network NW. A client terminal is a personal computer, a mobile phone such as a smartphone, a tablet terminal, or another terminal device. The network NW includes wireless base stations, public lines, dedicated lines, provider terminals, the Internet, and the like. On receiving query data from a client terminal CL, the search data management device 1 searches for data similar to the query data and returns the search result to the client terminal CL. The data returned by the search data management device 1 may be the data itself (for example, image data generated in jpg or a similar format) or an identifier for referring to the data (such as a URL). The query data and the data to be searched may be any kind of data, such as images, audio, or text data. The following description assumes that the search data management device 1 searches for images.

The search data management device 1 includes, for example, a network interface 10, a control unit 20, an input/output device 30, and a data server 40. The network interface 10 is, for example, a NIC (Network Interface Card).

The control unit 20 includes, for example, a vector data generation unit 22, an index search unit 24, and a graph index editing unit 26. The graph index editing unit 26 includes a new creation/addition processing unit 26A and a deletion processing unit 26B. These components of the control unit 20 are realized, for example, by a processor such as a CPU (Central Processing Unit) executing a program. Some or all of these functional units may instead be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware working together. The processing performed by these functional units is described later.

FIG. 2 is a diagram showing an example of the hardware configuration of the control unit 20. The control unit 20 has, for example, a configuration in which a CPU 20A, a RAM (Random Access Memory) 20B, and a program memory 20C are connected by an internal bus. Various interfaces, such as the network interface 10, an input/output interface, and a memory interface, are connected to the internal bus. The RAM 20B stores data to be processed by the CPU 20A, data resulting from that processing, and the like. The program memory 20C includes some or all of storage devices such as a ROM (Read Only Memory), an HDD (Hard Disk Drive), and a flash memory. The program memory 20C stores programs such as a vector data generation program 20Ca, an index search program 20Cb, and a graph index editing program 20Cc. These programs need not be clearly separated, and some or all of them may be shared.

Returning to FIG. 1, the input/output device 30 includes input devices such as a mouse, a keyboard, and a touch panel that accept input operations by an administrator of the search data management device 1, and output devices such as an LCD (Liquid Crystal Display) or organic EL (Electroluminescence) display device and a speaker.

The data server 40 is realized by a storage device such as an HDD or a flash memory. The data server 40 may be shared with the program memory 20C of the control unit 20, or may be realized by a storage device separate from the program memory 20C. The data server 40 may also be realized by an external storage device, such as a NAS (Network Attached Storage) device, that the search data management device 1 can access via various networks. In that case, the "storage unit" in the claims may be interpreted as referring to the RAM 20B, in which data acquired from the data server 40 is temporarily stored.

[Control unit]
The processing performed by each functional unit of the control unit 20 is described below. The control unit 20 need not be realized by a single processor, and processing may be distributed by function. For example, the vector data generation unit 22 and the index search unit 24 on the one hand, and the graph index editing unit 26 on the other, may be realized by separate processors.

The vector data generation unit 22 generates multidimensional vector data, which is the feature amount of input data, based on the input data received from a client terminal CL or the like. When the input data is an image, the multidimensional vector data is, for example, a local feature amount, a BoF (Bag of Features), a color histogram, or a combination of these. When the input data is text data, the multidimensional vector data is, for example, vector data of word appearance frequencies or a semantic vector extracted using a neural network.
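As a rough illustration of the word-appearance-frequency case, the following sketch counts how often each vocabulary word occurs in a text. The fixed vocabulary, the whitespace tokenization, and the name `word_frequency_vector` are assumptions made for the example; the publication does not specify how the frequencies are computed.

```python
from collections import Counter
from typing import List


def word_frequency_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Map a text to a vector whose i-th element counts occurrences of vocabulary[i].

    Hypothetical helper: lowercasing and splitting on whitespace are simplifications.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


vocabulary = ["red", "cat", "bird"]
print(word_frequency_vector("red cat red dog", vocabulary))
```

Each text is mapped to a vector with one dimension per vocabulary word, so two texts can then be compared with the same vector distance used for images.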

The index search unit 24 searches the data server 40 using, as a query, the multidimensional vector data generated by the vector data generation unit 22 (hereinafter, new vector data), and extracts existing vector data 42 similar to the new vector data. The data server 40 stores, for example, the existing vector data 42 and a graph index 44.

The existing vector data 42 is multidimensional vector data generated from the data to be searched. Each piece of existing vector data 42 is associated with the data from which it was generated, or with an identifier that refers to that data. The search data management device 1 transmits to the client terminal CL the data, or the identifier referring to the data, associated with the existing vector data 42 obtained by using the new vector data as a query, that is, with the existing vector data 42 similar to the new vector data. FIG. 3 is a diagram showing an example of the contents of the existing vector data 42. As illustrated, the existing vector data 42 is data in which identification information of the multidimensional vector data (the vector ID in the figure) is associated with the multidimensional vector data and with the data from which that multidimensional vector data was generated, or an identifier referring to that data.

The graph index 44 is information about the edges connecting the pieces of existing vector data 42, and is graph-structure data formed by a plurality of edges, each connecting two pieces of the existing vector data 42. The edges included in the graph index 44 in this example are bidirectional. FIG. 4 is a diagram showing an example of the contents of the graph index 44. As illustrated, the graph index 44 is data in which an edge ID, which is the identification information of each edge, is associated with the vector IDs of the existing vector data at both ends of that edge. The data structures shown in FIGS. 3 and 4 are merely examples. For example, data in which each vector ID is associated with the edge IDs of the edges connected to it may be stored in the data server 40 as the graph index 44. In other words, the form of the data structure is not essential in this embodiment, and data of any structure may be stored in the data server 40 as long as it satisfies the required properties.
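The two interchangeable layouts described above (edge ID to endpoint vector IDs, and vector ID to connected edge IDs) can be sketched as follows. The field names, the sample values, the example URLs, and the helper `to_adjacency` are hypothetical and only illustrate the idea; they are not taken from FIG. 3 or FIG. 4.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VectorRecord:
    """One row of the existing vector data: the vector plus a reference to its source data."""
    vector: List[float]
    source: str  # the data itself or an identifier such as a URL (illustrative values below)


# Vector ID -> record, in the spirit of the table described for FIG. 3.
existing_vectors: Dict[int, VectorRecord] = {
    1: VectorRecord([0.1, 0.9], "https://example.com/a.jpg"),
    2: VectorRecord([0.2, 0.8], "https://example.com/b.jpg"),
    3: VectorRecord([0.9, 0.1], "https://example.com/c.jpg"),
}

# Edge ID -> vector IDs at both ends, in the spirit of the table described for FIG. 4.
edge_table: Dict[int, Tuple[int, int]] = {1: (1, 2), 2: (2, 3)}


def to_adjacency(edges: Dict[int, Tuple[int, int]]) -> Dict[int, List[int]]:
    """Convert the edge table into the alternative layout: vector ID -> connected edge IDs."""
    adjacency: Dict[int, List[int]] = {}
    for edge_id, (u, v) in sorted(edges.items()):
        # A bidirectional edge is registered under both of its endpoints.
        adjacency.setdefault(u, []).append(edge_id)
        adjacency.setdefault(v, []).append(edge_id)
    return adjacency


print(to_adjacency(edge_table))
```

The second layout makes it cheap to list the edges leaving a given vector, which is the operation a graph traversal needs most often.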

From among the existing vector data 42 that can be reached by following the edges defined by the graph index 44, the index search unit 24 extracts, as existing vector data similar to the new vector data, the existing vector data 42 lying within a predetermined distance of the new vector data. The distance between vectors is defined, for example, as the Lp norm of the differences between vector elements (p = 1, 2, ...). Alternatively, from among the existing vector data 42 that can be reached by following the edges defined by the graph index 44, the index search unit 24 may extract, as existing vector data similar to the new vector data, a predetermined number of pieces of existing vector data 42 in ascending order of distance from the new vector data. The index search unit 24 then transmits the extracted existing vector data 42 to the client terminal CL via the network interface 10 and the network NW.
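The two extraction criteria (everything within a fixed distance, or a fixed number of nearest vectors) can be compared in a short sketch. It deliberately leaves out the graph traversal: `candidates` stands for vectors already reached by following edges, and the function names, the use of Euclidean distance via `math.dist`, and the sample data are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def within_radius(query: Sequence[float],
                  candidates: Dict[int, Sequence[float]],
                  radius: float) -> List[int]:
    """Return IDs of candidate vectors whose distance to the query is at most radius."""
    return sorted(vid for vid, vec in candidates.items()
                  if math.dist(query, vec) <= radius)


def nearest_k(query: Sequence[float],
              candidates: Dict[int, Sequence[float]],
              k: int) -> List[int]:
    """Return IDs of the k candidate vectors closest to the query, nearest first."""
    # Sorting on (distance, ID) keeps the result deterministic when distances tie.
    ranked = sorted(candidates, key=lambda vid: (math.dist(query, candidates[vid]), vid))
    return ranked[:k]


candidates = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [5.0, 5.0]}
print(within_radius([0.0, 0.0], candidates, radius=1.5))
print(nearest_k([0.0, 0.0], candidates, k=2))
```

The radius criterion returns a variable number of results, while the top-k criterion always returns at most k, which is why the text treats them as alternatives.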

The search processing by the index search unit 24 is realized using, for example, the processing of the flowchart of FIG. 8, described later. In the flowchart of FIG. 8, the vector set R obtained by taking the new vector data as the vector y is an example of a search result of the index search unit 24. In this case, the number of vectors included in the search result can be adjusted by setting the predetermined number ks to a desired value.

Through the functions of the vector data generation unit 22 and the index search unit 24, the client terminal CL can acquire, from the search data management device 1, data similar to the data it transmitted. The role of the search data management device 1 may extend only as far as extracting the existing vector data 42 similar to the new vector data. In that case, a device other than the search data management device 1 may have the function of transmitting to the client terminal CL the data corresponding to the extracted existing vector data 42, or an identifier referring to that data.

The new creation/addition processing unit 26A of the graph index editing unit 26 newly creates the graph index 44 or adds to it. For this processing, for example, either of the two methods described below can be adopted. In the following description, a multidimensional vector may be referred to simply as a vector.

(New creation / addition method 1)
FIG. 5 is a flowchart showing an example of the flow of processing by the new creation/addition processing unit 26A. The processing of this flowchart can be used to create a new graph index 44. It can also be used to add new multidimensional vector data: when new multidimensional vector data is added, the processing is executed on the multidimensional vector data already associated with the graph index 44 together with the new multidimensional vector data. As a premise of this flowchart, the multidimensional vector data included in the existing vector data 42 is assumed to have been assigned vector IDs from 1 to n. The arguments i and j used in the flowchart are each selected in order from 1 to n. That is, the new creation/addition processing unit 26A first sets the argument i to 1 and repeats the processing of S100 to S106 while incrementing the argument j by one from 1 until it reaches n. Next, the new creation/addition processing unit 26A sets the argument i to 2 and again repeats the processing of S100 to S106 while incrementing the argument j by one from 1 until it reaches n. This is repeated until the argument i reaches n.

The new creation/addition processing unit 26A determines whether the vector i (the vector whose vector ID is i; the same applies hereinafter) is an element of the neighborhood vector N(G, j) of the vector j, or whether i = j (S100). The neighborhood vector N(G, j) of the vector j is the set of vectors selected, up to a predetermined number kp, in ascending order of distance from the vector j. "G" is a symbol denoting the graph-structure data. Each vector that is an element of the neighborhood vector N(G, j) of the vector j is connected to the vector j by an edge. Accordingly, at least the vectors that are elements of the neighborhood vector N(G, j) of the vector j can be reached from the vector j through a single edge. Note that the vector j may also be an element of the neighborhood vector of another vector that is not itself an element of the neighborhood vector N(G, j) of the vector j. The following description assumes this. When the vector i is an element of the neighborhood vector N(G, j) of the vector j, or when i = j, the processing of S102 to S106 is skipped.

When the vector i is not an element of the neighborhood vector N(G, j) of the vector j and i ≠ j, the new creation/addition processing unit 26A adds the vector i to the elements of the neighborhood vector N(G, j) of the vector j (S102). Next, the new creation/addition processing unit 26A determines whether the number of vectors that are elements of the neighborhood vector N(G, j) of the vector j exceeds the predetermined number kp (S104). The predetermined number kp is an arbitrarily chosen natural number, for example kp = 3. When the number of vectors that are elements of the neighborhood vector N(G, j) of the vector j exceeds the predetermined number kp, the new creation/addition processing unit 26A excludes from the neighborhood vector N(G, j) of the vector j the vector having the longest distance from the vector j (the vector farthest from the vector j) (S106).

FIG. 6 is a diagram showing how a neighborhood vector is updated. In the illustrated example, the vectors (1) to (3) were originally set in the neighborhood vector N(G, j) of the vector j; the vector i has been added and the vector (3) has been excluded. As a result, the neighborhood vector N(G, j) of the vector j has been updated to N(G, j)*. By repeating this processing, the neighborhood vector N(G, x) is set for every vector x = 1 to n, and the graph index 44 is thereby created.
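Steps S100 to S106 amount to a brute-force double loop that keeps, for each vector j, the kp nearest other vectors. A minimal sketch follows, assuming Euclidean distance via `math.dist`, vectors held in a dictionary keyed by vector ID, and the hypothetical names `build_neighbor_lists` and `undirected_edges`. Turning the neighbor lists into a set of bidirectional edges is one plausible reading of how the graph index is formed, not a step spelled out in the text.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def build_neighbor_lists(vectors: Dict[int, Sequence[float]], kp: int) -> Dict[int, List[int]]:
    """Build N(G, j) for every vector j by exhaustive comparison (steps S100 to S106)."""
    neighbors: Dict[int, List[int]] = {j: [] for j in vectors}
    for i in vectors:
        for j in vectors:
            if i == j or i in neighbors[j]:   # S100: skip self and already-registered vectors
                continue
            neighbors[j].append(i)            # S102: tentatively add vector i to N(G, j)
            if len(neighbors[j]) > kp:        # S104: too many neighbors?
                farthest = max(neighbors[j],
                               key=lambda x: math.dist(vectors[x], vectors[j]))
                neighbors[j].remove(farthest) # S106: drop the vector farthest from j
    return neighbors


def undirected_edges(neighbors: Dict[int, List[int]]) -> Set[Tuple[int, int]]:
    """Collapse the neighbor lists into a set of bidirectional edges (an assumption)."""
    return {tuple(sorted((j, i))) for j, ns in neighbors.items() for i in ns}


points = {1: [0.0], 2: [1.0], 3: [3.0], 4: [7.0]}
lists = build_neighbor_lists(points, kp=2)
print({j: sorted(ns) for j, ns in lists.items()})
print(sorted(undirected_edges(lists)))
```

The double loop compares every pair of vectors, so its cost grows with the square of the number of vectors, which is acceptable for a sketch but not for large data sets.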

(New creation / addition method 2)
FIGS. 7 and 8 are flowcharts showing another example of the flow of processing by the new creation/addition processing unit 26A. As a premise of this flowchart, it is assumed that n pieces of multidimensional vector data, which have been assigned vector IDs from 1 to n in the existing vector data 42 and for which the graph index 44 has been created, already exist, and that m − n pieces of multidimensional vector data have been newly added (m > n). The argument y used in the flowchart is selected in order from (n + 1) to m, and the processing is executed for each value.

The new creation/addition processing unit 26A executes a K-neighborhood search process (S200). The K-neighborhood search process is described with reference to FIG. 8. As shown in FIG. 8, the new creation/addition processing unit 26A sets the radius r of a hypersphere to ∞ (infinity) (S300) and extracts a vector set S from the existing set of vectors (S302). The hypersphere is a virtual sphere indicating the search range (the range within which a vector may be included among the elements of the neighborhood vector N(G, y) of the input vector y). The vectors included in the vector set S extracted in step S302 may at the same time be included in the initial set of the vector set R.

Next, the new creation/addition processing unit 26A extracts, from the vectors included in the vector set S, the vector having the shortest distance from the vector y, and sets it as the vector s (S304). The new creation/addition processing unit 26A then excludes the vector s from the vector set S (S306). FIG. 9 is a diagram conceptually showing how the vector having the shortest distance from the vector y is extracted from the vector set S and excluded from the vector set S.

Next, the new creation/addition processing unit 26A determines whether the distance d(s, y) between the vector s and the vector y exceeds r(1 + ε) (S308). Here, ε is an expansion factor, and r(1 + ε) is the radius of the exploration range (the range such that, for a vector lying within it, its neighborhood vectors are examined to decide whether they are to be included among the elements of the neighborhood vector N(G, y) of the input vector y). When the distance d(s, y) between the vector s and the vector y exceeds r(1 + ε), the new creation/addition processing unit 26A outputs the vector set R as the neighborhood vector N(G, y) of the vector y and ends the processing of the flowchart of FIG. 8.

When the distance d(s, y) between the vector s and the vector y does not exceed r(1 + ε), the new creation/addition processing unit 26A selects, from the vectors that are elements of the neighborhood vector N(G, s) of the vector s, one vector that is not included in a vector set C, and stores the selected vector u in the vector set C (S312). FIG. 10 is a diagram conceptually showing the neighborhood vector N(G, s) of the vector s. The vector set C is provided for convenience in order to avoid examining the same vector twice, and is reset to the empty set when the flowchart of FIG. 8 starts.

Next, the new creation/addition processing unit 26A determines whether the distance d(u, y) between vector u and vector y is equal to or less than r(1 + ε) (S314). If it is, the new creation/addition processing unit 26A adds vector u to the vector set S (S316).

Next, the new creation/addition processing unit 26A determines whether the distance d(u, y) between vector u and vector y is equal to or less than r (S318). If d(u, y) exceeds r, the process proceeds to S330.

If d(u, y) is equal to or less than r, the new creation/addition processing unit 26A adds vector u to the vector set R (S320). FIG. 11 conceptually illustrates how vectors are added to the vector set S or the vector set R. In the illustrated example, vectors (4) and (5), whose distance from vector y is equal to or less than r, are added to the vector set R, and vector (6), whose distance from vector y exceeds r but is equal to or less than r(1 + ε), is added to the vector set S.

The new creation/addition processing unit 26A then determines whether the number of vectors in the vector set R exceeds ks (S322). The predetermined number ks is an arbitrarily chosen natural number, for example ks = 3.

If the number of vectors in the vector set R exceeds ks, the new creation/addition processing unit 26A removes from the vector set R the vector farthest from vector y (S324).

Next, the new creation/addition processing unit 26A determines whether the number of vectors in the vector set R equals ks (S326). If it does, the new creation/addition processing unit 26A sets r to the distance between vector y and the vector in the vector set R that is farthest from vector y (S328).

FIG. 12 conceptually illustrates how the vector farthest from vector y is removed from the vector set R and how r is reset. In the example of FIG. 12, it is assumed that R contains vector (7) as a result of earlier loop iterations, in addition to vectors s, (4), and (5) added to R in FIG. 11. In this case, vector (5), which is farthest from vector y, is removed from R, and r is set to the distance between vector y and vector (7), which is the farthest from vector y among the remaining vectors.

The new creation/addition processing unit 26A then determines whether every vector that is an element of the neighborhood vectors N(G, s) of vector s has been selected and stored in the vector set C (S330). If not, the process returns to S312.

Once every vector that is an element of the neighborhood vectors N(G, s) of vector s has been selected and stored in the vector set C, the new creation/addition processing unit 26A determines whether the vector set S is empty (S330). If the vector set S is not empty, the process returns to S304. If the vector set S is empty, the new creation/addition processing unit 26A outputs the vector set R as the neighborhood vectors N(G, y) of vector y and ends the processing of the flowchart of FIG. 8 (S332).
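As a rough illustration, the loop of S304 to S332 could be sketched in Python as follows. This is only a sketch under assumptions that this passage does not state: the graph is a dict mapping a vector ID to a list of neighbor IDs, distances are Euclidean, r starts at infinity, and the seed vectors pass through the same checks as any other candidate. The names `search_neighbors`, `consider`, `graph`, `vectors`, and `seeds` are hypothetical.

```python
import math

def search_neighbors(graph, vectors, y, seeds, ks, eps):
    """Return up to ks IDs near the query point y, nearest first."""
    d = lambda i: math.dist(vectors[i], y)   # Euclidean distance (assumption)
    r = math.inf                             # initial radius (assumption)
    S, C, R = set(), set(), set()            # candidates, examined, results

    def consider(u):
        nonlocal r
        C.add(u)                             # S312: mark u as examined
        if d(u) <= r * (1 + eps):            # S314
            S.add(u)                         # S316
        if d(u) <= r:                        # S318
            R.add(u)                         # S320
            if len(R) > ks:                  # S322
                R.remove(max(R, key=d))      # S324: drop the farthest
            if len(R) == ks:                 # S326
                r = max(d(v) for v in R)     # S328: shrink the radius

    for s in seeds:                          # seed handling is an assumption
        consider(s)
    while S:                                 # stop when S is empty
        s = min(S, key=d)                    # S304
        S.remove(s)                          # S306
        if d(s) > r * (1 + eps):             # S308
            break
        for u in graph[s]:                   # S312 to S330
            if u not in C:
                consider(u)
    return sorted(R, key=d)                  # S332: output R as N(G, y)

# Six points on a line, linked as a chain; the query lies at 3.2.
vectors = {i: (float(i),) for i in range(6)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
result = search_neighbors(graph, vectors, (3.2,), seeds=[0], ks=2, eps=0.1)
```

In this toy chain the search walks from the seed toward the query and shrinks r each time R is full, so vectors beyond r(1 + ε) are never expanded.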

Performing the processing shown in FIG. 8 makes it possible to achieve high-speed processing while limiting memory consumption, compared with exhaustively processing all vectors. Because the processing of FIG. 8 proceeds mainly by following the edges defined by the graph index 44, it requires fewer processing steps than an exhaustive search of the existing vector data 42. In addition to the memory area needed for the existing vectors, only the data for the seed vector set S, the vector set C used to avoid duplicate searches, and the vector set R, whose size is capped at ks, needs to be managed, so memory consumption is kept low. Furthermore, a vector sufficiently far from the input vector y is unlikely to be added to the vector set S during processing, and even if it was included in the vector set S at S302, its low search priority makes it unlikely to be processed, which further reduces the number of processing steps.

Returning to FIG. 7: once the neighborhood vectors N(G, y) of vector y have been obtained, the new creation/addition processing unit 26A selects one vector that is an element of N(G, y) and designates it as vector z (S202). It then adds vector y as an element of the neighborhood vectors N(G, z) of vector z (S204).

Next, the new creation/addition processing unit 26A determines whether the number of vectors that are elements of the neighborhood vectors N(G, z) of vector z exceeds kp (S206). If that number exceeds the predetermined number kp, the new creation/addition processing unit 26A removes from N(G, z) the vector with the longest distance to vector z, that is, the vector farthest from vector z (S208).

The new creation/addition processing unit 26A then determines whether every vector z that is an element of the neighborhood vectors N(G, y) of vector y has been selected (S210). If not all of them have been selected, the process returns to S202. If all of them have been selected, one iteration of the loop of S200 to S208 ends. The graph index 44 is created in this way.
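The linking step of S202 to S210 might look like the following Python sketch. It assumes a dict-of-lists graph, Euclidean distance, and that N(G, y) itself becomes the neighbor list of y; the names `link_new_vector`, `graph`, and `vectors` are hypothetical and not taken from the source.

```python
import math

def link_new_vector(graph, vectors, y_id, neighbors_of_y, kp):
    """Register y as a neighbor of each z in N(G, y), capping N(G, z) at kp."""
    graph[y_id] = list(neighbors_of_y)       # assumption: N(G, y) is stored as is
    for z in neighbors_of_y:                 # S202: pick each z in N(G, y)
        graph[z].append(y_id)                # S204: add y to N(G, z)
        if len(graph[z]) > kp:               # S206
            farthest = max(
                graph[z],
                key=lambda v: math.dist(vectors[v], vectors[z]),
            )
            graph[z].remove(farthest)        # S208: drop the farthest from z

# Toy data: vector 3 is inserted between vectors 0 and 1.
vectors = {0: (0.0,), 1: (1.0,), 2: (5.0,), 3: (0.5,)}
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
link_new_vector(graph, vectors, 3, [0, 1], kp=2)
```

Note that S208 as sketched removes the edge only from the neighbor list of z, so the resulting structure may be asymmetric; whether the reverse entry is also removed is not specified in this passage.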

The processes of (new creation/addition method 1) and (new creation/addition method 2) may be used in combination. That is, (new creation/addition method 1) may be used when a group of vectors is first given and the graph index 44 is created, and (new creation/addition method 2) may be used afterwards whenever a group of vectors is added. In that case, kp and ks may be the same or different. Alternatively, (new creation/addition method 1) may be performed every time a group of vectors is added.

(Deletion process)
The deletion processing unit 26B extracts an edge that satisfies a predetermined condition from the graph index 44 created by the new creation/addition processing unit 26A. If there is an alternative path that connects the same vectors as the extracted edge via a different route and that has no more than a predetermined number p of edges, the deletion processing unit 26B deletes the extracted edge. The predetermined condition is, for example, the determination condition of S402 in FIG. 13 described below.

FIG. 13 is a flowchart illustrating an example of the flow of processing executed by the deletion processing unit 26B. As a premise of this flowchart, the multidimensional vector data included in the existing vector data 42 is assumed to have been assigned vector IDs from 1 to n (after the flowcharts of FIGS. 7 and 8 have been executed, vector IDs from 1 to m may have been assigned). The argument e used in the flowchart is selected in order from 1 to n, and the processing is executed for each value.

First, the deletion processing unit 26B selects one vector from the vectors that are elements of the neighborhood vectors N(G, e) of vector e and designates it as vector f (S400). Next, the deletion processing unit 26B determines whether Rank{N(G, e), f} exceeds a predetermined number kr (S402). Rank{N(G, e), f} is an index value indicating where vector f ranks, in order of closeness to vector e, among the vectors that are elements of N(G, e). The predetermined number kr is, for example, a value less than the predetermined number kp. The determination of whether Rank{N(G, e), f} exceeds kr therefore extracts, from the elements of N(G, e), those vectors whose distance to vector e is long compared with the others. If Rank{N(G, e), f} does not exceed kr, the processing of S404 to S410 is skipped.
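For concreteness, Rank{N(G, e), f} could be computed as in the Python sketch below. It assumes Euclidean distance and a 1-based rank in which rank 1 is the neighbor closest to e; the function name `rank` and the dict layout are hypothetical.

```python
import math

def rank(vectors, neighbors_of_e, e, f):
    """Position of f (1 = closest) among e's neighbors, ordered by distance to e."""
    ordered = sorted(
        neighbors_of_e,
        key=lambda v: math.dist(vectors[v], vectors[e]),
    )
    return ordered.index(f) + 1

# Vector 0 plays the role of e; its neighbors lie at distances 1, 3, and 2.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (3.0, 0.0), 3: (2.0, 0.0)}
neighbors = [1, 2, 3]
kr = 2
# Edges whose rank exceeds kr are the deletion candidates of S402.
candidates = [f for f in neighbors if rank(vectors, neighbors, 0, f) > kr]
```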

If Rank{N(G, e), f} exceeds the predetermined number kr, the deletion processing unit 26B selects a search start vector for the alternative path (S404) and searches for paths leading onward from the search start vector (S406).

Next, the deletion processing unit 26B determines whether there is an alternative path with p or fewer edges (S408). If there is, the deletion processing unit 26B deletes the edge (e, f) connecting vector e and vector f from the graph index 44 (S410).

FIG. 14 conceptually illustrates how an edge is deleted when an alternative path with p or fewer edges exists. In the illustrated example, both the edge (e, f(1)) connecting vectors e and f(1) and the edge (e, f(2)) connecting vectors e and f(2) have alternative paths. The alternative path for edge (e, f(1)) has 3 edges, whereas the alternative path for edge (e, f(2)) has 4 edges. With p = 3, edge (e, f(1)) is treated as deletable and edge (e, f(2)) as not deletable.

Next, the deletion processing unit 26B determines whether every vector that is an element of the neighborhood vectors N(G, e) of vector e has been selected (S412). If not, the process returns to S400. If every such vector has been selected, the processing of the flowchart of FIG. 13 ends.
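The flow of S400 to S412 for a single vector e can be sketched in Python as follows. This is a minimal sketch with several assumptions: the graph is a dict of neighbor lists, distances are Euclidean, the alternative path is found by a breadth-first search limited to p edges counted from e, all of e's neighbors up to rank kr serve as search start vectors, and the edge is removed only from the neighbor list of e. The names `has_alternative_path` and `prune_edges` are hypothetical.

```python
import math
from collections import deque

def has_alternative_path(graph, e, f, starts, p):
    """True if f is reachable from e within p edges without using edge (e, f)."""
    starts = [s for s in starts if s != f]   # never start via the edge itself
    visited = {e, *starts}                   # e is excluded so (e, f) is not reused
    queue = deque((s, 1) for s in starts)    # e -> start already costs one edge
    while queue:
        node, hops = queue.popleft()
        if node == f:
            return True
        if hops == p:                        # stop extending at p edges
            continue
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops + 1))
    return False

def prune_edges(graph, vectors, e, kr, p):
    dist_to_e = lambda v: math.dist(vectors[v], vectors[e])
    ordered = sorted(graph[e], key=dist_to_e)     # rank 1 = closest to e
    starts = ordered[:kr]                         # only edges up to rank kr
    for f in ordered[kr:]:                        # S400/S402: rank exceeds kr
        if has_alternative_path(graph, e, f, starts, p):  # S404 to S408
            graph[e].remove(f)                    # S410

# Toy graph loosely modeled on FIG. 14: vector 3 has a 3-edge detour,
# vector 7 only a 4-edge detour.
vectors = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0),
           4: (0, 1), 5: (0, 2), 6: (0, 3), 7: (0, 4)}
graph = {0: [1, 4, 3, 7], 1: [0, 2], 2: [1, 3], 3: [2, 0],
         4: [0, 5], 5: [4, 6], 6: [5, 7], 7: [6, 0]}
prune_edges(graph, vectors, e=0, kr=2, p=3)
```

With p = 3, the edge to vector 3 is removed because the path 0, 1, 2, 3 has three edges, while the edge to vector 7 is kept because its shortest detour has four.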

This processing makes it possible to reduce the number of edges while taking search performance into account. In this type of technique, search performance actually degrades when a large number of edges are attached to a vector. Preferentially deleting edges that span a relatively long distance therefore reduces the number of local search operations and makes the search more efficient. However, if long edges are deleted without limit, the detour routes become longer and the search may instead become less efficient. There may also be cases where no detour route remains, leaving vectors that can no longer be found by a search.

In view of these circumstances, the search data management device 1 of the present embodiment deletes an edge only when the same vector can still be reached through a relatively small number of edges after the deletion. This makes it possible to reduce edges while taking search performance into account and to keep the number of edges at an appropriate level.

In the processing of S404 in the flowchart of FIG. 13, the following methods can be used to select the search start vector.

(Search target node selection method 1)
The deletion processing unit 26B may exhaustively select all vectors connected to vector e via edges as search start vectors. This ensures that an alternative path with p or fewer edges is found whenever one exists.

(Search target node selection method 2)
The deletion processing unit 26B may start the search for an alternative path using, as the search start vector, the vector closest to vector f among the vectors connected to vector e via edges. This reduces the load of the alternative path search the most.

(Search target node selection method 3)
The deletion processing unit 26B may start the search for an alternative path using, as search start vectors, those vectors connected to vector e via edges whose distance to vector f is shorter than the distance between vector e and vector f. This reduces the load of the alternative path search while finding an alternative path with p or fewer edges more reliably than (search target node selection method 2).

In every method, the graph search uses only those edges attached to vector e that rank from the shortest up to rank kr (only vectors reached by these edges serve as search start vectors). This is because, even if an alternative path containing an edge ranked beyond kr were found, that edge on the alternative path might itself be deleted by subsequent deletion processing.

FIG. 15 illustrates the methods for selecting the search start vector. In this figure, when (search target node selection method 2) is adopted, only vector (8) is selected as the search start vector. When (search target node selection method 3) is adopted, vectors (8), (9), and (10) are selected as search start vectors.
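The three selection methods can be compared side by side in a small Python sketch. It assumes Euclidean distance, a dict-of-lists graph, and that the rank-kr restriction is applied before each method; the function name `select_start_vectors` and the integer `method` argument are hypothetical.

```python
import math

def select_start_vectors(graph, vectors, e, f, kr, method):
    """Pick search start vectors among e's kr nearest neighbors (excluding f)."""
    d = lambda a, b: math.dist(vectors[a], vectors[b])
    nearest = sorted(graph[e], key=lambda v: d(v, e))[:kr]  # ranks 1..kr only
    eligible = [v for v in nearest if v != f]
    if method == 1:      # method 1: every eligible neighbor of e
        return eligible
    if method == 2:      # method 2: only the neighbor closest to f
        return [min(eligible, key=lambda v: d(v, f))] if eligible else []
    if method == 3:      # method 3: neighbors closer to f than e is
        return [v for v in eligible if d(v, f) < d(e, f)]
    raise ValueError("method must be 1, 2, or 3")

# Vector 0 is e and vector 1 is f; vectors 2 to 5 are other neighbors of e.
vectors = {0: (0, 0), 1: (4, 0), 2: (3, 0), 3: (2, 1), 4: (-1, 0), 5: (1, 1)}
graph = {0: [1, 2, 3, 4, 5]}
m1 = select_start_vectors(graph, vectors, 0, 1, kr=4, method=1)
m2 = select_start_vectors(graph, vectors, 0, 1, kr=4, method=2)
m3 = select_start_vectors(graph, vectors, 0, 1, kr=4, method=3)
```

Method 2 yields a single start vector and method 3 a subset of method 1, which mirrors the trade-off between search load and the chance of finding a short alternative path.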

In every case, terminating the search along a route once its number of edges reaches p prevents the processing load from growing. Starting from the search start vector, the deletion processing unit 26B exhaustively searches the vectors beyond it. If vector f has not been reached by the time the number of edges reaches p, the deletion processing unit 26B determines that no alternative path with p or fewer edges could be found from that search start vector and ends the search from it.

The search data management device 1 of the embodiment described above includes a storage unit (40) that stores graph-structure data (42, 44) in which the multidimensional vector data to be searched is connected by edges, and a deletion processing unit (26B). The deletion processing unit extracts an extracted edge satisfying a predetermined condition from the graph-structure data stored in the storage unit (S402 in FIG. 13; edge (e, f)), determines whether there is an alternative path that connects the same pieces of multidimensional vector data (e, f) as the extracted edge via a route different from the extracted edge and that has no more than a predetermined number (p) of edges, and, when it determines that such a path exists, deletes the extracted edge from the graph-structure data. This makes it possible to reduce edges while taking search performance into account.

Although modes for carrying out the present invention have been described above using an embodiment, the present invention is in no way limited to this embodiment, and various modifications and substitutions can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS: 1 ... search data management device, 10 ... network interface, 20 ... control unit, 22 ... vector data generation unit, 24 ... index search unit, 26 ... graph index editing unit, 26A ... new creation/addition processing unit, 26B ... deletion processing unit, 30 ... input/output device, 40 ... data server, 42 ... existing vector data, 44 ... graph index

Claims

A search data management device comprising:
a storage unit that stores graph-structure data in which multidimensional vector data to be searched is connected by edges; and
a deletion processing unit that extracts, from the graph-structure data stored in the storage unit, an extracted edge that is an edge satisfying a predetermined condition,
determines whether there is an alternative path that connects the same multidimensional vector data as the extracted edge via a route different from the extracted edge and that has no more than a predetermined number of edges,
and deletes the extracted edge from the graph-structure data when it determines that such an alternative path exists.

The predetermined condition is that, among a plurality of edges connected to certain multidimensional vector data, a distance between connected multidimensional vector data is long in comparison with other edges.
The search data management device according to claim 1.

The predetermined condition is that, when a plurality of edges connected to certain multidimensional vector data are ranked in ascending order of distance between the connected multidimensional vector data, the edge ranks lower than a predetermined rank.
The search data management device according to claim 1 or 2.

When searching for the alternative path, the deletion processing unit sets, as a search start vector, the multidimensional vector data that is closest to the other of the two pieces of multidimensional vector data connected by the extracted edge, from among the multidimensional vector data connected to one of those two pieces.
The search data management device according to any one of claims 1 to 3.

When searching for the alternative path, the deletion processing unit sets, as a search start vector, multidimensional vector data that is connected to one of the two pieces of multidimensional vector data connected by the extracted edge and that is closer to the other of the two pieces than the one piece is.
The search data management device according to any one of claims 1 to 3.

Further comprising a creation unit that creates the graph-structure data based on the distances between a plurality of multidimensional vectors while limiting to a predetermined number the other multidimensional vectors connected via edges to a target multidimensional vector.
The search data management device according to any one of claims 1 to 5.

Further comprising a generation unit that generates multidimensional vector data as feature data of query data received from another device, and
a search unit that, based on the multidimensional vector data generated by the generation unit and the graph-structure data, extracts multidimensional vector data similar to the multidimensional vector data generated by the generation unit and returns it to the other device.
The search data management device according to any one of claims 1 to 6.

A search data management method comprising:
extracting, from graph-structure data in which multidimensional vector data to be searched is connected by edges, an extracted edge that is an edge satisfying a predetermined condition;
determining whether there is an alternative path that connects the same multidimensional vector data as the extracted edge via a route different from the extracted edge and that has no more than a predetermined number of edges; and
deleting the extracted edge from the graph-structure data when it is determined that such an alternative path exists.

A search data management program that causes a computer to:
extract, from graph-structure data in which multidimensional vector data to be searched is connected by edges, an extracted edge that is an edge satisfying a predetermined condition;
determine whether there is an alternative path that connects the same multidimensional vector data as the extracted edge via a route different from the extracted edge and that has no more than a predetermined number of edges; and
delete the extracted edge from the graph-structure data when it is determined that such an alternative path exists.