JP7077387B1

JP7077387B1 - Information processing equipment, information processing methods, and information processing programs

Info

Publication number: JP7077387B1
Application number: JP2020195557A
Authority: JP
Inventors: 雅二郎岩崎
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2022-05-30
Anticipated expiration: 2040-11-25
Also published as: JP2022083919A

Abstract

PROBLEM TO BE SOLVED: To enable efficient search processing. SOLUTION: An information processing device according to the present application has an acquisition unit and a search processing unit. The acquisition unit acquires a search query for a plurality of objects subject to data search. In a search process that searches for nodes near the search query using a graph in which nodes corresponding to the respective objects are connected by edges, the search processing unit calculates the distances between the search query and a plurality of nodes selected by a predetermined criterion, using vector-quantized vector information of those nodes. SELECTED DRAWING: FIG. 5

Description

The present invention relates to an information processing apparatus, an information processing method, and an information processing program.

Techniques for searching various kinds of information have conventionally been provided. For example, there is a technique that generates a graph in which nodes corresponding to search targets are connected by edges and performs a search using the generated graph. Such techniques are used, for example, for image retrieval.

Japanese Patent No. 6293335
Japanese Patent No. 6300982
Japanese Patent No. 6311000

Masajiro Iwasaki, "Proximity search using an approximate k-nearest neighbor graph with a tree-structured index", IPSJ Journal, 2011/2, Vol. 52, No. 2, pp. 817-828.

However, the conventional techniques described above leave room for improvement in the efficiency of the search process. For example, they improve efficiency, such as by reducing search time, by adjusting the number of edges in the graph or otherwise changing the graph into one that can be searched efficiently, but there is a limit to how much efficiency can be gained by changing the graph. It is therefore desirable to make the search process itself more efficient.

The present application has been made in view of the above, and its object is to provide an information processing apparatus, an information processing method, and an information processing program that enable efficient search processing.

The information processing apparatus according to the present application includes: an acquisition unit that acquires a search query for a plurality of objects subject to data search; and a search processing unit that, in a search process that searches for nodes near the search query using a graph in which nodes corresponding to the respective objects are connected by edges, calculates the distances between the search query and a plurality of nodes selected by a predetermined criterion, using vector-quantized vector information of those nodes.

According to one aspect of the embodiment, efficient search processing can be enabled.

FIG. 1 is a diagram showing an example of information processing according to the first embodiment.
FIG. 2 is a diagram showing an example of data according to the first embodiment.
FIG. 3 is a diagram showing an example of data according to the first embodiment.
FIG. 4 is a diagram showing a configuration example of the information processing system according to the first embodiment.
FIG. 5 is a diagram showing a configuration example of the information processing apparatus according to the first embodiment.
FIG. 6 is a diagram showing an example of an object information storage unit according to the first embodiment.
FIG. 7 is a diagram showing an example of a graph information storage unit according to the first embodiment.
FIG. 8 is a diagram showing an example of a quantization information storage unit according to the first embodiment.
FIG. 9 is a diagram showing an example of a codebook information storage unit according to the first embodiment.
FIG. 10 is a flowchart showing an example of information processing according to the first embodiment.
FIG. 11 is a flowchart showing an example of the search process according to the first embodiment.
FIG. 12 is a diagram showing an example of information processing according to the second embodiment.
FIG. 13 is a diagram showing a configuration example of the information processing apparatus according to the second embodiment.
FIG. 14 is a diagram showing an example of a quantization information storage unit according to the second embodiment.
FIG. 15 is a diagram showing an example of a blob information storage unit according to the second embodiment.
FIG. 16 is a flowchart showing an example of the search process according to the second embodiment.
FIG. 17 is a diagram showing an example of information processing according to a modified example.
FIG. 18 is a diagram showing an example of a graph information storage unit according to a modified example.
FIG. 19 is a flowchart showing an example of the search process according to the modified example.
FIG. 20 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus.

Hereinafter, modes for implementing the information processing apparatus, the information processing method, and the information processing program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the information processing apparatus, the information processing method, or the information processing program according to the present application. In the following embodiments, the same parts are given the same reference signs, and duplicate descriptions are omitted.

(Embodiment)
[1. First Embodiment]
[1-1. Information processing]
An example of information processing according to the first embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of information processing according to the first embodiment. The information processing apparatus 100 executes a search process using a graph index (also simply referred to as a "graph") in which a plurality of objects subject to data search are structured as a graph. FIG. 1 shows part of the search process in a case where the information processing apparatus 100 performs a neighborhood search using a graph in which nodes, each corresponding to a vector obtained by vectorizing an object subject to data search, are connected by edges. Treating each node of the graph as an object to be searched, the information processing apparatus 100 traverses the graph to find nodes near a given search query (vector). Through the search process, the information processing apparatus 100 extracts, as the nodes near the search query, a predetermined number of nodes specified as the number of neighboring nodes to extract (hereinafter also referred to as the "search number"). The following description takes image information as an example of the target of the data search, but the target may be any of various kinds of data such as video information or audio information.

The information processing apparatus 100 performs the search process on nodes corresponding to an enormous amount of image information, on the order of millions to hundreds of millions of items, but the drawings show only a part of them (a few dozen nodes such as node N1 in FIG. 1). For example, as shown in the spatial information SP1 in FIG. 1, the information processing apparatus 100 acquires information about a plurality of nodes (vectors) such as nodes N1, N7, and N9. The notation "node N* (* is an arbitrary number)" indicates the node identified by the node ID "N*". For example, "node N1" is the node identified by the node ID "N1". Each white circle (○) in the spatial information SP1 represents a node.

In the spatial information SP1 in FIG. 1, reference signs are attached mainly to the nodes relevant to the description, but each unlabeled white circle (○) is also a node, and the graph contains many nodes besides those illustrated. Each node corresponds to an object (search target). For example, each of a plurality of local features extracted from an image may be an object. Any of various kinds of data for which a distance between objects is defined may also be an object.

For example, the spatial information SP1 in FIG. 1 may be a Euclidean space. The spatial information SP1 is assumed to be a multidimensional space, such as a 100-dimensional or 1000-dimensional space, whose dimensionality matches that of the object vectors. As shown in the spatial information SP1, FIG. 1 conceptually illustrates a state in which the vector (space) is divided by product quantization into four subregions (subspaces); product quantization is described later.

The dotted lines connecting the white circles (○), which are the nodes in the spatial information SP1, represent the edges connecting the nodes. For simplicity, the example of FIG. 1 shows a case where nodes are connected by undirected (bidirectional) edges (hereinafter also simply referred to as "edges"). An undirected edge here means an edge that can be traversed in both directions between the connected nodes. For example, the white circle (○) representing node N1 and the white circle (○) representing node N4 are connected by a dotted line, which indicates that node N1 and node N4 are connected by an edge. That is, in the graph shown in the spatial information SP1, the edge between node N1 and node N4 can be traversed in both directions: from node N1 to node N4 and from node N4 to node N1.

For reasons of illustration, the example of FIG. 1 shows only the edges connecting the illustrated nodes, but the graph contains many other edges. Although the spatial information SP1 in FIG. 1 thus shows only some of the edges, the graph is assumed to be, for example, a k-nearest neighbor graph. The spatial information SP1 may be any of various kinds of graphs.

The edges of the graph are not limited to undirected edges and may be directed edges. A directed edge can be traversed only from its source node (the referencing node) to its destination node (the referenced node). For example, when two nodes are connected by two directed edges, a first edge whose source is one node and whose destination is the other, and a second edge in the opposite direction, the two nodes are in the same state as if they were connected by an undirected (bidirectional) edge.
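The relation between directed and undirected edges can be sketched as follows. This is a minimal illustration, not the apparatus's actual data structure: the adjacency-map layout, the function names, and the nodes "NX" and "NY" are assumptions made for the example, while the N1/N4 edge comes from the description of FIG. 1.

```python
from collections import defaultdict


def build_graph(directed_edges):
    """Build an adjacency map: source node ID -> list of destination node IDs."""
    graph = defaultdict(list)
    for src, dst in directed_edges:
        graph[src].append(dst)
    return graph


def add_undirected(directed_edges, a, b):
    """Represent an undirected edge as a pair of opposite directed edges."""
    directed_edges.append((a, b))
    directed_edges.append((b, a))


edges = []
add_undirected(edges, "N1", "N4")  # traversable N1 -> N4 and N4 -> N1
edges.append(("NX", "NY"))         # hypothetical directed edge: NX -> NY only
graph = build_graph(edges)
```

With this representation, a traversal that follows `graph[node]` can move from NX to NY but not back, whereas N1 and N4 reach each other.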

The search process will now be described with reference to FIG. 1. In the example of FIG. 1, the information processing apparatus 100 is assumed to have already acquired the graph GR1 shown in the spatial information SP1. The information processing apparatus 100 may generate the graph GR1 using any of various conventional techniques as appropriate. Before the search process is described, product quantization of the vector (space) is explained.

As shown in the spatial information SP1, FIG. 1 conceptually illustrates a state in which the vector (space) is divided by product quantization into four subspaces AR11 to AR14. The subspaces AR11 to AR14 in FIG. 1 are drawn conceptually in order to explain, for example, the distances between the vectors of the nodes (objects), but each of the subspaces AR11 to AR14 is a multidimensional space. For example, the subspace AR11 in FIG. 1 is drawn in two dimensions so that it can be shown on a plane, but it is assumed to be a multidimensional space of, for example, 100 or 1000 dimensions.

Each of the subspaces AR11 to AR14 is the space corresponding to one part (partial vector) of a vector divided into four. The number of divisions is not limited to four; for example, a 100-dimensional vector may be divided into 100 parts. The subspace AR11 is the space (also referred to as the "first subspace") of the dimensions corresponding to the first of the four partial vectors (also referred to as "subvectors") into which the vector is divided, that is, the "first subvector". In other words, the subspace AR11 is the subspace corresponding to the first division position of the vector. For the search query QE1, the subspace AR11 is the space corresponding to the first partial query QE1-1, which is the first subvector.

Similarly, the subspace AR12 is the space (also referred to as the "second subspace") of the dimensions corresponding to the second of the four partial vectors (also referred to as the "second subvector"). The subspace AR12 is the subspace corresponding to the second division position from the beginning of the vector. For the search query QE1, the subspace AR12 is the space corresponding to the second partial query QE1-2, which is the second subvector.

The subspace AR13 is the space (also referred to as the "third subspace") of the dimensions corresponding to the third of the four partial vectors (also referred to as the "third subvector"). The subspace AR13 is the subspace corresponding to the third division position from the beginning of the vector. For the search query QE1, the subspace AR13 is the space corresponding to the third partial query QE1-3, which is the third subvector.

The subspace AR14 is the space (also referred to as the "fourth subspace") of the dimensions corresponding to the fourth, that is, the last, of the four partial vectors (also referred to as the "fourth subvector"). The subspace AR14 is the subspace corresponding to the fourth division position from the beginning of the vector. For the search query QE1, the subspace AR14 is the space corresponding to the fourth partial query QE1-4, which is the fourth subvector.

In the example of FIG. 1, the subspaces AR11 to AR14 are drawn with similar shapes, but their shapes may differ, and the way each of the subspaces AR11 to AR14 is partitioned into regions may also differ. The connections between nodes by edges are assumed to be common to the subspaces AR11 to AR14. That is, the graph GR1 is the graph corresponding to the spatial information SP1 and is shared by the subspaces AR11 to AR14.

In the example of FIG. 1, a lookup table is generated for each partial vector (subvector). For example, clustering is performed separately for each of the four subvectors (the first, second, third, and fourth subvectors), and a separate lookup table (codebook information) is generated for each.

In FIG. 1, as shown in the subspace AR11, the first subvectors are clustered into nine groups corresponding to the codebooks CD11 to CD19, and the vector corresponding to each of the codebooks CD11 to CD19 is calculated as a representative vector (centroid). For example, the first subvectors of nodes N7, N9, and so on are clustered into the group corresponding to the codebook CD11. In this case, in the distance calculation, the first subvectors of nodes N7, N9, and so on are vector-quantized to the vector of the codebook CD11 using the codebook information corresponding to the first subvector (also referred to as the "first codebook information").

The second subvectors are clustered into a plurality of groups corresponding to the codebooks CD21 to CD24 and so on (see FIGS. 2 and 9), and the vector corresponding to each of the codebooks CD21 to CD24 and so on is calculated as a representative vector (centroid). In this case, in the distance calculation, the second subvector of each node is vector-quantized to the vector of the corresponding codebook using the codebook information corresponding to the second subvector (also referred to as the "second codebook information").

The third subvectors are clustered into a plurality of groups corresponding to the codebooks CD31 to CD34 and so on (see FIGS. 2 and 9), and the vector corresponding to each of the codebooks CD31 to CD34 and so on is calculated as a representative vector (centroid). In this case, in the distance calculation, the third subvector of each node is vector-quantized to the vector of the corresponding codebook using the codebook information corresponding to the third subvector (also referred to as the "third codebook information").

The fourth subvectors are clustered into a plurality of groups corresponding to the codebooks CD41 to CD44 and so on (see FIGS. 2 and 9), and the vector corresponding to each of the codebooks CD41 to CD44 and so on is calculated as a representative vector (centroid). In this case, in the distance calculation, the fourth subvector of each node is vector-quantized to the vector of the corresponding codebook using the codebook information corresponding to the fourth subvector (also referred to as the "fourth codebook information").

The codebook information, including the processing for obtaining the representative vectors, is generated using any of various techniques as appropriate. Because the generation of codebook information is conventional, a detailed description is omitted. The lookup tables are not limited to the above; for example, a single lookup table (codebook information) may be used for all of the partial vectors (subvectors), as described later.
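The vector quantization of a subvector to a codebook can be sketched as a nearest-centroid assignment. The sketch below assumes the centroids have already been obtained (for example by some clustering method, which the description leaves open) and assumes squared Euclidean distance; the function names and the two-dimensional centroid values are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(subvector, codebook):
    """Return the ID of the codebook entry whose centroid is closest."""
    return min(codebook, key=lambda cid: squared_l2(subvector, codebook[cid]))


# Hypothetical first codebook with toy 2-dimensional centroids.
codebook1 = {
    "CD11": [0.0, 0.0],
    "CD12": [10.0, 0.0],
    "CD13": [0.0, 10.0],
}

# The subvector is stored as a codebook ID instead of its raw values.
code = quantize([9.0, 1.0], codebook1)
```

Each node then needs only one codebook ID per subvector, and distances to the node are computed against the centroid rather than the original subvector.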

The search process for the search query QE1 will now be described. First, the information processing apparatus 100 acquires the search query QE1 (step S11). For example, the information processing apparatus 100 acquires the search query QE1 from the terminal device 10 (see FIG. 4) used by the user.

The information processing apparatus 100 then divides the vector that is the search query QE1 into four. That is, the information processing apparatus 100 divides the search query QE1 into four partial queries. In FIG. 1, the information processing apparatus 100 divides the search query QE1 into four subvectors: the first partial query QE1-1 (the first subvector), the second partial query QE1-2 (the second subvector), the third partial query QE1-3 (the third subvector), and the fourth partial query QE1-4 (the fourth subvector). Specifically, the information processing apparatus 100 sets "45, 23, 2 ..." as the first partial query QE1-1, "127, 34, 5 ..." as the second partial query QE1-2, "20, 98, 110 ..." as the third partial query QE1-3, and "12, 45, 4 ..." as the fourth partial query QE1-4.
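The division of the query into partial queries can be sketched as follows. The description elides the remaining components of each partial query, so this toy example assumes a 12-dimensional query built only from the three leading values shown for each part; the function name `split_vector` and the equal-size split are also assumptions.

```python
def split_vector(vector, num_parts):
    """Split a vector into `num_parts` contiguous subvectors of equal size."""
    if len(vector) % num_parts != 0:
        raise ValueError("vector length must be divisible by num_parts")
    size = len(vector) // num_parts
    return [vector[i * size:(i + 1) * size] for i in range(num_parts)]


# Toy query: only the leading values given in the description (assumption).
query = [45, 23, 2, 127, 34, 5, 20, 98, 110, 12, 45, 4]
sub_queries = split_vector(query, 4)
# sub_queries[0] plays the role of QE1-1, sub_queries[3] that of QE1-4.
```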

The information processing apparatus 100 then calculates the distance between each subvector of the search query QE1 and the vectors of the codebooks corresponding to that subvector. As shown in the codebook information TB1 to TB4 in FIG. 2, the information processing apparatus 100 calculates the distance (difference) between each subvector of the search query QE1 and the vector of each codebook. FIG. 2 is a diagram showing an example of data according to the first embodiment.

The codebook information TB1 represents the first codebook information corresponding to the first subvector, and the information processing apparatus 100 calculates the distance between the first partial query QE1-1 and the vector of each of the codebooks CD11 to CD19 corresponding to the first subvector. In FIG. 2, as shown in the codebook information TB1, the information processing apparatus 100 calculates the distance between the codebook CD11 and the first partial query QE1-1 as the distance DS11. Similarly, the information processing apparatus 100 calculates the distances between the first partial query QE1-1 and each of the codebooks CD12 to CD14 and so on as the distances DS12 to DS14 and so on.

Similarly, the codebook information TB2 to TB4 represents the second to fourth codebook information corresponding to the second to fourth subvectors, respectively. As shown in the codebook information TB2, the information processing apparatus 100 calculates the distances between the second partial query QE1-2 and each of the codebooks CD21 to CD24 and so on as the distances DS21 to DS24 and so on. As shown in the codebook information TB3, it calculates the distances between the third partial query QE1-3 and each of the codebooks CD31 to CD34 and so on as the distances DS31 to DS34 and so on. As shown in the codebook information TB4, it calculates the distances between the fourth partial query QE1-4 and each of the codebooks CD41 to CD44 and so on as the distances DS41 to DS44 and so on.
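Building the per-subvector distance tables (the roles played by TB1 to TB4) can be sketched as below. This is an illustration under assumptions: the description does not fix the distance function, so squared Euclidean distance is assumed here because per-subspace values then add up to the squared distance to the quantized vector; the function names and all numeric values are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_distance_tables(sub_queries, codebooks):
    """For each subvector position, map codebook ID -> distance to the sub-query."""
    tables = []
    for sub_query, codebook in zip(sub_queries, codebooks):
        table = {cid: squared_l2(sub_query, centroid)
                 for cid, centroid in codebook.items()}
        tables.append(table)
    return tables


# Toy data with two subvector positions instead of four (hypothetical values).
sub_queries = [[0.0, 0.0], [1.0, 1.0]]
codebooks = [
    {"CD11": [3.0, 4.0], "CD12": [0.0, 1.0]},
    {"CD21": [1.0, 1.0], "CD22": [2.0, 3.0]},
]
tables = build_distance_tables(sub_queries, codebooks)
```

The tables depend only on the query and the codebooks, so they are computed once per query and reused for every node visited during the graph search.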

Although FIG. 2 shows them abstractly for the purpose of explanation, the distances DS11 to DS44 and so on are concrete values. The distances are floating-point values, but to allow the speed-up by SIMD (Single Instruction, Multiple Data) described later, they may be compressed to a 1-byte integer type by scale-offset-compression (see, for example, "https://www.unidata.ucar.edu/blogs/developer/entry/compression_by_scaling_and_offfset"). The information processing apparatus 100 may calculate the distances between each codebook and the query when it acquires the search query QE1, or when the information becomes necessary. In the search process, the information processing apparatus 100 uses the codebook information TB1 to TB4 as lookup tables to calculate the distance between each node and the search query QE1, that is, an approximate distance (also referred to as the "first distance"). In this way, when traversing the graph in the search process, the information processing apparatus 100 calculates and uses the approximate distance (first distance) rather than the true distance between each node and the search query (also referred to as the "second distance"), which enables efficient search processing.
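Compressing floating-point distances to one byte each with a scale and an offset can be sketched as follows. This is a generic scale-and-offset scheme written under assumptions (the minimum is used as the offset and the range is mapped linearly to 0..255); it is not taken from the description, and the function names and sample values are hypothetical.

```python
def compress_distances(distances):
    """Map floats linearly onto 0..255; return (codes, scale, offset)."""
    lo, hi = min(distances), max(distances)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    # min(255, ...) guards against floating-point overshoot at the top end.
    codes = bytes(min(255, round((d - lo) / scale)) for d in distances)
    return codes, scale, lo


def decompress_distances(codes, scale, offset):
    """Recover approximate floats from the 1-byte codes."""
    return [c * scale + offset for c in codes]


distances = [0.5, 2.0, 3.5, 10.7]  # hypothetical table entries
codes, scale, offset = compress_distances(distances)
restored = decompress_distances(codes, scale, offset)
```

The round trip is lossy: each restored value differs from the original by at most about half of `scale`, which is the trade-off accepted in exchange for 1-byte table entries.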

The information processing apparatus 100 executes the search process for the search query QE1 (step S12). The information processing apparatus 100 obtains the search result for the search query QE1 by performing, on the search query QE1, a search process such as that shown in FIG. 11 using the graph GR1. The search process shown in FIG. 11 is described in detail later. By performing the search process on the search query QE1, the information processing apparatus 100 extracts as many nodes as the search number as the nodes near the search query QE1.

In the search process for the search query QE1, the information processing apparatus 100 selects a predetermined node from among the nodes as the node at which the search of the graph GR1 starts (hereinafter also referred to as the "starting node"). For example, the information processing apparatus 100 selects the starting node using a predetermined index such as a tree-structured index. In the example of FIG. 1, the information processing apparatus 100 selects the node N7 as the starting node. To simplify the explanation, FIG. 1 shows a case where only the node N7 is selected as the starting node using the predetermined index, but the information processing apparatus 100 may select a plurality of nodes as starting nodes, and may select starting nodes by various methods such as random selection.

Here, in the search process for the search query QE1, the information processing apparatus 100 calculates in parallel the distances between the search query and the nodes connected by edges from one node (hereinafter also referred to as "connection nodes") (step S13). In FIG. 1, as shown in the batch processing information LT1, the information processing apparatus 100 calculates in parallel the distances between the search query QE1 and the nodes N9, N12, N54, N85, and so on, which are the nodes connected by edges from the node N7 (its connection nodes).

For example, as shown in the node information INF1, the node N9 has its first subvector associated with the codebook CD12, its second subvector with the codebook CD23, its third subvector with the codebook CD35, and its fourth subvector with the codebook CD47. The size of the variable CD is determined by the codebook size; for example, if the codebook size is 16, the variable CD needs only 4 bits, which allows substantial data compression. The information processing apparatus 100 therefore calculates the distance between the node N9 and the search query QE1 using the distance DS12 of the codebook CD12, the distance DS23 of the codebook CD23, the distance DS35 of the codebook CD35, and the distance DS47 of the codebook CD47. For example, the information processing apparatus 100 calculates the sum of the distances DS12, DS23, DS35, and DS47 as the distance between the node N9 and the search query QE1.
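The remark that a codebook of size 16 needs only 4 bits per code can be illustrated with a small packing sketch. The helper names are hypothetical, and reading the codebook CD12 as "index 2 within the first codebook" (and so on for CD23, CD35, CD47) is an assumption made only for this example.

```python
# Sketch: storing codebook indices in 4 bits each (two per byte).
# Assumes every index is in 0..15, i.e. a codebook size of 16.

def pack_codes(codes):
    """Pack 4-bit codes two per byte, low nibble first."""
    if any(not 0 <= c < 16 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    padded = list(codes) + [0] * (len(codes) % 2)  # pad odd lengths
    return bytes(lo | (hi << 4) for lo, hi in zip(padded[0::2], padded[1::2]))

def unpack_codes(packed, count):
    """Inverse of pack_codes; `count` drops any padding nibble."""
    codes = []
    for byte in packed:
        codes.extend((byte & 0x0F, byte >> 4))
    return codes[:count]

n9_codes = [2, 3, 5, 7]          # assumed indices for CD12, CD23, CD35, CD47
n9_packed = pack_codes(n9_codes)  # 2 bytes instead of 4 subvectors of floats
```

With four subvectors, each node is then represented by 2 bytes of codes, regardless of the dimensionality of the original vector.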

Likewise, as shown in the node information INF2, the node N12 has its first subvector associated with the codebook CD14, its second subvector with the codebook CD29, its third subvector with the codebook CD31, and its fourth subvector with the codebook CD45. For example, the information processing apparatus 100 calculates the sum of the distance DS14 of the codebook CD14, the distance DS29 of the codebook CD29, the distance DS31 of the codebook CD31, and the distance DS45 of the codebook CD45 as the distance between the node N12 and the search query QE1. Similarly, the information processing apparatus 100 calculates the distance between the node N54 and the search query QE1 using the node information INF3 of the node N54, and the distance between the node N85 and the search query QE1 using the node information INF4 of the node N85.
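The summation described for the nodes N9 and N12 amounts to one table look-up per subvector followed by an addition. The sketch below uses made-up table values (the patent gives none) and again assumes that a codebook such as CD12 denotes entry 2 of the first table.

```python
# Sketch: approximate (first) distance as a sum of look-up table entries.
# The table values are placeholders chosen only to make the sums easy to check.

def approximate_distance(codes, tables):
    """codes[j] is the codebook index of the node's j-th subvector;
    tables[j][k] is the precomputed query-to-codebook distance."""
    return sum(tables[j][code] for j, code in enumerate(codes))

# Four tables (TB1..TB4 in the text), ten placeholder entries each.
tables = [[float((j + 1) * k) for k in range(10)] for j in range(4)]

n9 = (2, 3, 5, 7)    # assumed indices for CD12, CD23, CD35, CD47
n12 = (4, 9, 1, 5)   # assumed indices for CD14, CD29, CD31, CD45

d_n9 = approximate_distance(n9, tables)    # DS12 + DS23 + DS35 + DS47
d_n12 = approximate_distance(n12, tables)  # DS14 + DS29 + DS31 + DS45
```

No vector arithmetic on the node's original vector is needed at search time; the cost per node is one look-up and one addition per subvector.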

For example, the information processing apparatus 100 uses SIMD parallelization to calculate the distances between the search query QE1 and each of the nodes N9, N12, N54, and N85 in a single batch. By parallelizing the distance calculation in this way, the information processing apparatus 100 can speed up the distance calculation and enable efficient search processing.

To simplify the explanation, FIG. 1 shows a case where distances are calculated in parallel for four nodes, but the degree of parallelism is determined by the specifications of the information processing apparatus 100. For example, if the number of items that the information processing apparatus 100 can process at once by SIMD (also referred to as the "batch processable unit") is "4", the processing is the same as in FIG. 1. If the batch processable unit is "16", the information processing apparatus 100 calculates the distances for 16 nodes in parallel, and if it is "32", for 32 nodes in parallel. By calculating the distances of as many nodes as the batch processable unit at once, the information processing apparatus 100 can enable efficient search processing.
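The role of the batch processable unit can be sketched as follows. Plain Python loops stand in for SIMD instructions here, so the sketch shows only the grouping and the accumulation order, not an actual speedup; the function name, the placeholder tables, and the codes of all nodes other than N9 and N12 are hypothetical.

```python
# Sketch: processing connection nodes in groups of `unit` nodes.
# Each "lane" plays the role of one SIMD lane (one accumulator per node).

def batched_distances(neighbour_codes, tables, unit):
    results = []
    for start in range(0, len(neighbour_codes), unit):
        batch = neighbour_codes[start:start + unit]
        lanes = [0] * len(batch)
        for j, table in enumerate(tables):
            # On SIMD hardware, this inner loop would be carried out for
            # all lanes of the batch at once rather than one by one.
            for lane, codes in enumerate(batch):
                lanes[lane] += table[codes[j]]
        results.extend(lanes)
    return results

tables = [[(j + 1) * k for k in range(10)] for j in range(4)]  # placeholder values
neighbours = [(2, 3, 5, 7), (4, 9, 1, 5), (3, 0, 0, 0), (8, 1, 1, 1), (0, 0, 0, 0)]
distances = batched_distances(neighbours, tables, unit=4)
```

With `unit=4` the five neighbors are handled as one full batch and one partial batch; with `unit=16` or `unit=32` the same code would cover them in a single pass.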

While calculating the distances of a plurality of nodes in parallel as described above, the information processing apparatus 100 performs a search process such as that shown in FIG. 11 on the graph GR1 for the search query QE1, and thereby extracts as many nodes as the search count as the nodes in the vicinity of the search query QE1.

As described above, the information processing apparatus 100 calculates the approximate distance (first distance) between the vector-quantized vector of each node and the search query and performs the search process using the first distance, which enables efficient search processing. The information processing apparatus 100 also calculates the distances of as many nodes as can be parallelized in a single batch, which likewise enables efficient search processing. Because the approximate distances to the plurality of connection nodes of a graph node are calculated together, a limited memory area (the look-up tables) is used repeatedly and therefore stays in the memory cache, so the processing can be sped up.

In the example of FIG. 1, the information processing apparatus 100 performs the search process using vectors divided by product quantization, which enables more efficient search processing than when the vectors are not divided. For example, with product quantization the information processing apparatus 100 calculates distances between short vectors, so the size of the vectors subject to distance calculation can be reduced. The information processing apparatus 100 can also limit the memory space accessed through the look-up tables during distance calculation to the look-up tables of the divided subvectors. Product quantization allows the information processing apparatus 100 to reduce the vector size while suppressing the loss of search accuracy. Although FIG. 1 describes processing with product quantization, this is only an example, and the information processing apparatus 100 may perform the search process using vectors to which product quantization has not been applied.

[1-1-1. Others]
The processing described above is only an example, and the information processing apparatus 100 may perform the search process using various information and techniques for the sake of efficient search processing. Each of these points is described in detail below.

(Storage mode of edges (connection nodes))
For the information on the edges (connection nodes) connected to each node, two storage modes (storage methods) at the node are conceivable, for example the first storage mode and the second storage mode below, and the storage mode may be selected according to the usage.

For example, in the first storage mode, the IDs of the connection nodes (objects) are stored in association with each node, and the product-quantized information of the objects is stored in a separate table. For example, the data storage modes shown in FIGS. 7 and 8 correspond to the first storage mode. The first storage mode reduces memory usage.

In the second storage mode, for example, each node directly holds the quantized objects. That is, the quantized information of the connection nodes (objects) of a node is stored in association with that node. For example, a storage mode in which the quantized information of each of the nodes N9, N12, N54, and N85, which are the connection nodes of the node N7 (the node information INF1 to INF4 in FIG. 2), is stored in association with the node N7 corresponds to the second storage mode. In the second storage mode, the objects are accessed sequentially at search time, so slowdown can be suppressed.
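The difference between the two storage modes can be sketched with ordinary dictionaries. The container layout and names are hypothetical, and the codes of the nodes N54 and N85 beyond their first subvector are made-up placeholders; FIGS. 7 and 8 define the actual tables.

```python
# Sketch: two ways of storing the connection nodes of N7.

# First storage mode: edges hold only IDs; codes live in a separate table.
codes_by_id = {
    "N9": (2, 3, 5, 7),
    "N12": (4, 9, 1, 5),
    "N54": (3, 1, 4, 1),   # placeholder values after the first entry
    "N85": (8, 2, 6, 0),   # placeholder values after the first entry
}
edges_mode1 = {"N7": ["N9", "N12", "N54", "N85"]}

def neighbour_codes_mode1(node):
    # One indirection per neighbour: ID first, then the shared code table.
    return [codes_by_id[n] for n in edges_mode1[node]]

# Second storage mode: each node stores its neighbours' codes inline.
edges_mode2 = {
    node: [(n, codes_by_id[n]) for n in neighbours]
    for node, neighbours in edges_mode1.items()
}

def neighbour_codes_mode2(node):
    # Codes are read sequentially from the node's own record.
    return [codes for _, codes in edges_mode2[node]]
```

The first mode stores each node's codes once, which saves memory; the second duplicates the codes at every node that links to them in exchange for sequential access during the search.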

(Look-up table)
The example above showed a case where each divided subvector is clustered separately and a look-up table (codebook information) is generated for each, but the look-up tables are not limited to this and may take any form.

For example, clustering may be performed over all subvectors together to generate a single look-up table. In the example above, clustering may be performed over all four subvectors (the first, second, third, and fourth subvectors) to generate one look-up table.

Alternatively, individual subvectors may be partially merged into a plurality of classes, and a look-up table may be generated for each class. For example, subvectors with similar variances may be merged to form the classes. In the example above, if the variances of the first and third subvectors are similar and the variances of the second and fourth subvectors are similar, the first and third subvectors may be merged into a first class and the second and fourth subvectors into a second class. In this case, two look-up tables may be generated: the look-up table (codebook information) of the first class and the look-up table (codebook information) of the second class.
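One way to form such classes is sketched below. The text does not specify how "similar" variances are detected, so the greedy grouping rule, the `tolerance` parameter, and the variance values are all assumptions made for illustration.

```python
# Sketch: grouping subvectors whose variances are similar into classes.
# A codebook (look-up table) would then be trained once per class.

def group_by_variance(variances, tolerance):
    """Sort subvectors by variance and start a new class whenever the
    variance exceeds that of the class's first member by more than
    `tolerance`. Returns a list of classes of subvector indices."""
    order = sorted(range(len(variances)), key=lambda j: variances[j])
    classes = []
    for j in order:
        if classes and variances[j] - variances[classes[-1][0]] <= tolerance:
            classes[-1].append(j)
        else:
            classes.append([j])
    return [sorted(members) for members in classes]

# Hypothetical variances of the first to fourth subvectors (indices 0..3).
variances = [1.0, 9.0, 1.2, 8.5]
classes = group_by_variance(variances, tolerance=1.0)
```

With these values the first and third subvectors fall into one class and the second and fourth into another, matching the two-class example in the text.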

(Transposition)
The way of holding data described above is only an example, and the information processing apparatus 100 may hold data in various forms so that the data can be referenced efficiently. For example, the information processing apparatus 100 may hold transposed data. This point is described with reference to FIG. 3. FIG. 3 is a diagram showing an example of data according to the first embodiment.

For example, if the data is held as shown in the node information INF1 to INF4 and the node information INF1 to INF4 is processed in order, the codebook information TB1 to TB4 serving as look-up tables is referenced repeatedly, as shown in step S14 of FIG. 2.

Therefore, as shown in step S15, transposed data TR1 to TR4 is generated by transposing the node information INF1 to INF4. For example, the information processing apparatus 100 generates the transposed data TR1 to TR4.

In FIG. 3, first data TR1 is generated from the node information INF1 to INF4 as a list of the codebooks corresponding to the first subvectors of the nodes N9, N12, N54, and N85. Specifically, the first data TR1 lists the codebook CD12 corresponding to the first subvector of the node N9, the codebook CD14 corresponding to the first subvector of the node N12, the codebook CD13 corresponding to the first subvector of the node N54, and the codebook CD18 corresponding to the first subvector of the node N85.

Similarly, second data TR2 is generated from the node information INF1 to INF4 as a list of the codebooks corresponding to the second subvectors of the nodes N9, N12, N54, and N85; third data TR3 as a list of the codebooks corresponding to their third subvectors; and fourth data TR4 as a list of the codebooks corresponding to their fourth subvectors.

Correspondence information is also generated, indicating which position in each list of the transposed data TR1 to TR4 corresponds to which node. In the example of FIG. 3, the correspondence information indicates that, in each list of the transposed data TR1 to TR4, the first entry corresponds to the node N9, the second to the node N12, the third to the node N54, and the fourth (last) to the node N85. By referring to the correspondence information, the information processing apparatus 100 can identify the node to which each entry in the lists of the transposed data TR1 to TR4 corresponds. For example, the information processing apparatus 100 generates the correspondence information.

The information processing apparatus 100 performs processing using the transposed data TR1 to TR4 obtained by transposing the node information INF1 to INF4. For example, when the information processing apparatus 100 refers to the look-up tables using the transposed data TR1, it refers only to the single codebook information TB1. Similarly, when it uses the transposed data TR2, it refers only to the single codebook information TB2, and the same holds for the transposed data TR3 and TR4. Because the information processing apparatus 100 can thus reference the data efficiently, it can enable efficient search processing.

In this way, the data of a node's neighboring objects (connection nodes) is transposed and held at the node in an order grouped by subclass (subvector or the like), so that the look-up table of each subclass is referenced in turn during batch distance calculation. This further increases the locality of reference and enables a speedup.
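The transposition of the node information INF1 to INF4 into the transposed data TR1 to TR4 can be sketched as follows. The function names and table values are hypothetical, only the first subvector codes of N54 and N85 (CD13, CD18) come from the text, and a codebook such as CD12 is again assumed to mean index 2 of the first table.

```python
# Sketch: transposing per-node codes so that each pass touches one table.

def transpose_codes(node_codes):
    """Rows are nodes, columns are subvectors; the result has one row
    per subvector, listing that subvector's code for every node."""
    return [list(column) for column in zip(*node_codes)]

def distances_from_transposed(transposed, tables):
    totals = [0] * len(transposed[0])
    for table, codes in zip(tables, transposed):
        # Only this one table is referenced while TR_j is processed.
        for lane, code in enumerate(codes):
            totals[lane] += table[code]
    return totals

node_ids = ["N9", "N12", "N54", "N85"]   # plays the role of the correspondence information
node_codes = [(2, 3, 5, 7), (4, 9, 1, 5), (3, 1, 4, 1), (8, 2, 6, 0)]
tables = [[(j + 1) * k for k in range(10)] for j in range(4)]  # placeholder values

transposed = transpose_codes(node_codes)              # TR1..TR4
totals = distances_from_transposed(transposed, tables)
by_node = dict(zip(node_ids, totals))
```

The first row of `transposed` holds the codes for CD12, CD14, CD13, and CD18, which is the content of the first data TR1; the position of each entry is mapped back to a node through `node_ids`.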

To simplify the explanation, FIG. 3 shows a case where transposed data is generated for four nodes, but the unit (number of nodes) for which transposed data is generated is determined by the specifications of the information processing apparatus 100. For example, if the batch processable unit of the information processing apparatus 100 is "4", the processing is the same as in FIG. 3. If the batch processable unit is "16", transposed data is generated with 16 nodes as one unit, and if it is "32", with 32 nodes as one unit. By using transposed data generated according to the number of items that it can process at once by SIMD (the batch processable unit), the information processing apparatus 100 can reference the data efficiently and thus enable efficient search processing.

For example, most of the time in a search process such as that described above is spent on distance calculation, and the distance calculation can be sped up by SIMD parallel computation as described above. Once the distance calculation is parallelized in this way, the time spent fetching object data comes to dominate the search process. If this fetch time can be reduced, a further speedup is achieved. Prefetching is one way to reduce the time spent fetching data, but it has its limits.

The information processing apparatus 100, on the other hand, increases the locality of reference as described above so that data can be referenced efficiently, which reduces data fetches and achieves a further speedup.

(Search results)
When search results based on the second distance (true distance) described above are to be returned, two methods are conceivable: the first method and the second method below.

For example, in the first method, the search is run with a search count larger than the specified one; when the search ends, true distances are calculated, the results are sorted by distance, and the specified number of objects is returned as the search result. In this case, the information processing apparatus 100 sets, as the search count, an extended search count (also referred to as the "second number") that is larger than the specified search count (also referred to as the "first number"), and performs a search process such as that shown in FIG. 11. It thereby extracts as many nodes as the extended search count, that is, more nodes than the specified search count, as neighbor candidate nodes. The information processing apparatus 100 then calculates the second distance (true distance) for the neighbor candidate nodes and extracts, from among them, the first number of nodes in ascending order of the second distance as the nodes in the vicinity of the search query.
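The re-ranking step of the first method can be sketched as follows. The function names, the two-dimensional vectors, and the use of squared Euclidean distance as the "true" distance are assumptions made for the example; the candidate list stands in for the output of the graph search run with the extended search count.

```python
# Sketch: first method, re-ranking neighbour candidates by true distance.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rerank(candidates, query, vectors, first_number):
    """candidates: node IDs found with approximate distances, more of
    them than `first_number`. Returns the `first_number` nodes that are
    closest by true distance."""
    scored = sorted((squared_distance(query, vectors[n]), n) for n in candidates)
    return [n for _, n in scored[:first_number]]

vectors = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [3.0, 0.0], "D": [0.5, 0.0]}
query = [0.0, 0.0]

# Suppose the approximate search with an extended search count of 3
# returned these candidates in approximate-distance order.
candidates = ["A", "B", "D"]
result = rerank(candidates, query, vectors, first_number=2)
```

Here the approximate ordering placed B ahead of D, and the true-distance pass corrects it; the cost is one true distance calculation per candidate, paid once at the end of the search.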

In the second method, for example, the second distance (true distance) is calculated during the search only when a node (object) falls within the exploration range or the search range according to its first distance (approximate distance), and the approximate distance is replaced with the true distance, so that search results based on the true distance are returned.

Here, the search range is the range defined by "r" in FIG. 11, and the exploration range is the range defined by "r(1+ε)" using the search range coefficient "ε" in FIG. 11. For example, when the true distance is to be calculated on entry into the search range, the information processing apparatus 100 calculates the second distance (true distance) of a node when the first distance (approximate distance) of that node is "r" or less. When the true distance is to be calculated on entry into the exploration range, the information processing apparatus 100 calculates the second distance (true distance) of a node when the first distance (approximate distance) of that node is "r(1+ε)" or less.

For example, entry into the exploration range may be used as the condition when higher accuracy is wanted, and entry into the search range may be used as the condition when faster processing is wanted. These are only examples, and which condition to use may be set as appropriate according to the value of the search range coefficient "ε", the purpose of the processing, and so on.
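The condition of the second method can be sketched as a small helper. The function name, the callback for the true distance, and the numeric values are hypothetical; `r` and `epsilon` correspond to "r" and "ε" in FIG. 11.

```python
# Sketch: second method, computing the true distance only for nodes whose
# approximate distance falls within the chosen range.

def refined_distance(approx, r, epsilon, compute_true, use_exploration_range):
    """Return the true distance if `approx` is within the range,
    otherwise keep the approximate distance.
    Exploration range: approx <= r * (1 + epsilon).
    Search range:      approx <= r."""
    threshold = r * (1.0 + epsilon) if use_exploration_range else r
    return compute_true() if approx <= threshold else approx

calls = []

def true_distance():
    calls.append(1)      # count how often the expensive path is taken
    return 1.2

# Same node, same approximate distance, two different conditions.
with_exploration = refined_distance(1.05, 1.0, 0.1, true_distance, True)
with_search = refined_distance(1.05, 1.0, 0.1, true_distance, False)
```

With the exploration range as the condition, the node at approximate distance 1.05 has its true distance calculated; with the narrower search range it does not, which is why the former favors accuracy and the latter favors speed.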

(Vector quantization)
As described in Patent Document 3, the information processing apparatus 100 may perform two-stage vector quantization. The information processing apparatus 100 may then calculate the distance between the search query and each node by the following equation (1).

Here, the value on the left side of equation (1) represents, for example, the squared distance between the search query and a node. In equation (1), "x" corresponds to the query and "y" corresponds to the node. "q_c(y)" on the right side of equation (1) denotes the representative vector (centroid) of "y". If the information processing apparatus 100 does not hold the vector data of a node, it may use, for "y" in equation (1), the value of the centroid of the partial region to which the node belongs. "y - q_c(y)" denotes the residual vector, and "q_p" on the right side of equation (1) denotes a predetermined quantizer (function).

Further, "j" on the right side of equation (1) may be the number of divided spaces; in the example of FIG. 1, "j" may be the number of divided spaces, "4". "u_j()" on the right side of equation (1) denotes the partial residual vector of the vector in parentheses. For example, the information processing apparatus 100 may use equation (1) to calculate the squared distance between the query and the node in each subspace and sum these to obtain the distance between the query and the node.
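Equation (1) itself is not reproduced in this text. A form consistent with the symbol definitions above, offered only as an assumed reconstruction and not as the equation of the original publication, is the usual two-stage (coarse quantizer plus product quantizer on the residual) distance estimate, with "j" read as running over the divided subspaces:

```latex
d(x, y)^2 \approx \sum_{j} d\Bigl( u_j\bigl(x - q_c(y)\bigr),\; q_p\bigl(u_j\bigl(y - q_c(y)\bigr)\bigr) \Bigr)^2
```

Under this reading, each term is the squared distance, in the j-th subspace, between the query's residual and the quantized residual of the node, and the terms are summed over the subspaces (four in the example of FIG. 1).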

[1-2. Configuration of the information processing system]
As shown in FIG. 4, the information processing system 1 includes a terminal device 10, an information providing device 50, and an information processing apparatus 100. The terminal device 10, the information providing device 50, and the information processing apparatus 100 are communicably connected, by wire or wirelessly, via a predetermined network N. FIG. 4 is a diagram showing a configuration example of the information processing system according to the first embodiment. The information processing system 1 shown in FIG. 4 may include a plurality of terminal devices 10, a plurality of information providing devices 50, and a plurality of information processing apparatuses 100.

The terminal device 10 is an information processing device used by a user. The terminal device 10 accepts various operations by the user. In the following, the terminal device 10 may be referred to as the user; that is, "user" may be read as "terminal device 10". The terminal device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant).

The information providing device 50 is an information processing device that stores information for providing various kinds of information to users and others. For example, the information providing device 50 stores object IDs based on character information and the like collected from various external devices such as web servers. For example, the information providing device 50 is an information processing device that provides an image search service to users and others, and stores the information needed to provide that service. For example, the information providing device 50 provides the information processing apparatus 100 with vector information corresponding to the images covered by the image search service. The information providing device 50 also transmits a query to the information processing apparatus 100 and receives from it, in response, object IDs and the like indicating the images corresponding to the query.

The information processing apparatus 100 is a computer that provides a search service. The information processing apparatus 100 provides, as the search result, the specified number of objects corresponding to a search query. Using a graph in which the nodes (objects) to be searched are connected by edges, the information processing apparatus 100 provides the nodes (objects) in the vicinity of the search query as the search result. In the search process that searches for nodes in the vicinity of the search query using a graph in which the nodes corresponding to the respective objects are connected by edges, the information processing apparatus 100 calculates the distances between the search query and a plurality of nodes selected by a predetermined criterion, using the vector-quantized vector information of those nodes.

For example, when the information processing device 100 receives a query (search query) from the terminal device 10, it searches for targets (objects) similar to the search query and provides the search result to the terminal device 10. The data that the information processing device 100 provides to the terminal device may be the data itself, such as image information, or may be information for referring to the corresponding data, such as a URL (Uniform Resource Locator). The search query and the search targets (objects) may be any kind of data, such as image, audio, or text data.

[1-3. Configuration of the information processing device]
Next, the configuration of the information processing device 100 according to the first embodiment will be described with reference to FIG. 5. FIG. 5 is a diagram showing a configuration example of the information processing device 100 according to the first embodiment. As shown in FIG. 5, the information processing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The information processing device 100 may also have an input unit (for example, a keyboard or a mouse) that receives various operations from an administrator or the like of the information processing device 100, and a display unit (for example, a liquid crystal display) for displaying various information.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to a network (for example, the network N in FIG. 4) by wire or wirelessly, and transmits and receives information to and from the terminal device 10 and the information providing device 50.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 5, the storage unit 120 according to the first embodiment includes an object information storage unit 121, a graph information storage unit 122, a quantization information storage unit 123, and a codebook information storage unit 124.

(Object information storage unit 121)
The object information storage unit 121 according to the first embodiment stores various information about objects. For example, the object information storage unit 121 stores object IDs and vector data. FIG. 6 is a diagram showing an example of the object information storage unit according to the first embodiment. The object information storage unit 121 shown in FIG. 6 includes items such as "object ID" and "vector information".

"Object ID" indicates identification information for identifying an object. "Vector information" indicates the vector information corresponding to the object identified by the object ID. That is, in the example of FIG. 6, the vector data (vector information) corresponding to each object is registered in association with the object ID that identifies that object.

For example, FIG. 6 shows that the object (target) identified by the object ID "OB1" is associated with the multidimensional vector information "10, 24, 51, 2, ...".

The object information storage unit 121 is not limited to the above, and may store various other information depending on the purpose.

(Graph information storage unit 122)
The graph information storage unit 122 according to the first embodiment stores various information related to graphs. For example, the graph information storage unit 122 stores a generated graph. FIG. 7 is a diagram showing an example of the graph information storage unit according to the first embodiment. The graph information storage unit 122 shown in FIG. 7 has items such as "node ID", "object ID", and "connection node information".

"Node ID" indicates identification information for identifying each node (target) in the graph. "Object ID" indicates identification information for identifying an object. When the node ID and the object ID are common, for example when the object ID is also used as the node ID, the object ID is stored in "node ID" and the graph information storage unit 122 need not include the "object ID" item.

"Connection node information" indicates information about the nodes that can be reached from the corresponding node (reference-destination nodes). For example, "connection node information" includes information such as "reference destination". "Reference destination" indicates information for identifying a node that is connected by an edge and can be reached from that node. That is, in the example of FIG. 7, the reference destinations (nodes) that can be reached by an edge from a node are registered in association with the node ID (object ID) that identifies that node. The "connection node information" may also include information for identifying the edge connected to the reference destination (an edge ID) and the like.

The example of FIG. 7 shows that the node identified by the node ID "N1" (node N1) corresponds to the object (target) identified by the object ID "OB1". It also shows that an edge connects node N1 to the node identified by the node ID "N4" (node N4), so that node N4 can be reached from node N1.

Similarly, the node identified by the node ID "N2" (node N2) corresponds to the object (target) identified by the object ID "OB2". An edge connects node N2 to the node identified by the node ID "N6" (node N6), so that node N6 can be reached from node N2.

The node identified by the node ID "N7" (node N7) corresponds to the object (target) identified by the object ID "OB7". Edges connect node N7 to nodes N9, N12, N54, and N85, so that each of nodes N9, N12, N54, and N85 can be reached from node N7.

The graph information storage unit 122 is not limited to the above, and may store various other information depending on the purpose. For example, the graph information storage unit 122 may store the length of the edge connecting each pair of nodes (vectors), that is, information indicating the distance between the nodes (vectors). The graph information storage unit 122 may also store the graph information using various other data structures.
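As a rough illustration of the table in FIG. 7, the node-to-reference-destination mapping can be held as an adjacency list. The sketch below is only one possible in-memory form; the dictionary layout and the `neighbors` helper are assumptions made for illustration and are not taken from the specification.

```python
# Hypothetical in-memory form of the FIG. 7 table (layout is illustrative only).
# node ID -> (object ID, list of reference-destination node IDs)
graph = {
    "N1": ("OB1", ["N4"]),
    "N2": ("OB2", ["N6"]),
    "N7": ("OB7", ["N9", "N12", "N54", "N85"]),
}


def neighbors(node_id):
    """Return the nodes that can be reached from node_id by following one edge."""
    _object_id, reference_destinations = graph[node_id]
    return reference_destinations


print(neighbors("N7"))
```

Only the three rows described in the text are included; a real table would hold one row per node.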

The graph may also include a program module that takes a query as input, searches for nodes by following the edges in the graph, and extracts and outputs nodes similar to the query. That is, the graph may be intended for use as a program module that performs search processing using the graph. For example, the graph GR1 may be a program that, when vector data is input as a query, extracts from the graph the nodes corresponding to vector data similar to that vector data and outputs them. For example, the graph GR1 may be data used as a program module for searching for images similar to a query image. For example, the graph GR1 causes a computer to extract and output, based on the input query, the nodes in the graph that are similar to that query.

(Quantization information storage unit 123)
The quantization information storage unit 123 according to the first embodiment stores various information related to the allocation processing. FIG. 8 is a diagram showing an example of the quantization information storage unit according to the first embodiment. In the example of FIG. 8, the quantization information storage unit 123 has items such as "node ID", "object ID", and "quantization information".

"Node ID" indicates identification information for identifying each node (target) in the graph. "Object ID" indicates identification information for identifying an object. When the node ID and the object ID are common, the object ID is stored in "node ID" and the quantization information storage unit 123 need not include the "object ID" item.

"Quantization information" indicates information on the quantized vector of each node (object). For example, "quantization information" includes information such as "element" and "codebook ID". "Element" indicates a position within the vector of the corresponding object. The example of FIG. 8 shows a case where "element" includes "#1", "#2", "#3", and "#4". In this case, the vector of each node (object) is divided into four, and each of the resulting partial vectors is quantized by a codebook. The number of divisions is not limited to four; for example, when the number of divisions is six, "element" includes "#1", "#2", "#3", "#4", "#5", and "#6". "Codebook ID" indicates information for identifying the codebook corresponding to each element (partial vector).

In the example of FIG. 8, the vector of node N9 (object OB9) is divided into four partial vectors. The first partial vector is quantized by the codebook identified by the codebook ID "CD12" (codebook CD12), and the second partial vector is quantized by the codebook identified by the codebook ID "CD23" (codebook CD23).

Likewise, the third partial vector of node N9 (object OB9) is quantized by the codebook identified by the codebook ID "CD35" (codebook CD35), and the fourth (that is, the last) partial vector is quantized by the codebook identified by the codebook ID "CD47" (codebook CD47).

The quantization information storage unit 123 is not limited to the above, and may store various other information depending on the purpose.
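The per-element codebook IDs of FIG. 8 can be pictured as the output of a simple encoder: split the vector, then pick the nearest representative vector for each part. The sketch below assumes squared Euclidean distance and uses two division positions with 2-D toy representative vectors (the figure uses four positions and longer, elided vectors); the function names and all numeric values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector, parts):
    """Split a vector into `parts` equal-length partial vectors."""
    size = len(vector) // parts
    return [vector[i * size:(i + 1) * size] for i in range(parts)]


def encode(vector, tables):
    """For each partial vector, return the ID of the closest representative vector."""
    codes = []
    for sub, table in zip(split(vector, len(tables)), tables):
        codes.append(min(table, key=lambda cid: squared_distance(sub, table[cid])))
    return codes


# Toy tables: one dict per division position, codebook ID -> representative vector.
tables = [
    {"CD11": [5, 13], "CD12": [27, 51]},
    {"CD21": [0, 0], "CD23": [10, 10]},
]

codes = encode([25, 50, 9, 12], tables)
print(codes)
```

With these toy values the first partial vector lands on "CD12" and the second on "CD23", so the stored quantization information for this vector would be the pair of IDs rather than the four original components.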

(Codebook information storage unit 124)
The codebook information storage unit 124 according to the first embodiment stores various information related to codebooks. For example, the codebook information storage unit 124 stores codebook IDs and the vector information of each codebook. FIG. 9 is a diagram showing an example of the codebook information storage unit according to the first embodiment. The example of FIG. 9 shows a case where the codebook information storage unit 124 stores lookup tables indicating the correspondence between each codebook and a vector.

The codebook information storage unit 124 stores codebook information TB1, used to quantize the first of the four partial vectors, and codebook information TB2, used to quantize the second partial vector. It also stores codebook information TB3, used to quantize the third partial vector, codebook information TB4, used to quantize the fourth (that is, the last) partial vector, and the like.

In the example of FIG. 9, the codebook information TB1 stores codebook information such as the codebook identified by the codebook ID "CD11" (codebook CD11) and the codebook identified by the codebook ID "CD12" (codebook CD12). For example, codebook CD11 is associated with the multidimensional vector information "5, 13, ...", and codebook CD12 is associated with the multidimensional vector information "27, 51, ...".

The codebook information storage unit 124 is not limited to the above, and may store various other information depending on the purpose. For example, the codebook information storage unit 124 may store information indicating the difference (distance) between each codebook and the search query.
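Storing the distance between each codebook and the search query means that a node's approximate distance can be obtained by table lookups alone. The following is a minimal sketch of that idea, assuming squared Euclidean distance and the same toy two-position tables as an illustration; `build_distance_table` and `approx_distance` are hypothetical names.

```python
def build_distance_table(query, tables):
    """Per division position, map each codebook ID to its squared distance from the query part."""
    size = len(query) // len(tables)
    lut = []
    for i, table in enumerate(tables):
        q_sub = query[i * size:(i + 1) * size]
        lut.append({
            cid: sum((q - c) ** 2 for q, c in zip(q_sub, rep))
            for cid, rep in table.items()
        })
    return lut


def approx_distance(codes, lut):
    """Sum the precomputed per-position distances for one node's codebook IDs."""
    return sum(lut[i][cid] for i, cid in enumerate(codes))


# Toy tables (values are illustrative, not taken from FIG. 9 beyond the IDs).
tables = [
    {"CD11": [5, 13], "CD12": [27, 51]},
    {"CD21": [0, 0], "CD23": [10, 10]},
]

lut = build_distance_table([27, 51, 0, 0], tables)
print(approx_distance(["CD12", "CD23"], lut))
```

The table is built once per query, so its cost is shared by every node whose distance is evaluated during the search.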

(Control unit 130)
Returning to the description of FIG. 5, the control unit 130 is a controller and is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit) executing various programs (corresponding to an example of the information processing program) stored in a storage device inside the information processing device 100, using the RAM as a work area. Alternatively, the control unit 130 is realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 5, the control unit 130 includes an acquisition unit 131, a generation unit 132, a search processing unit 133, and a providing unit 134, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 5, and may be any other configuration that performs the information processing described later.

(Acquisition unit 131)
The acquisition unit 131 acquires various information. For example, the acquisition unit 131 acquires various information from the storage unit 120, such as from the object information storage unit 121, the graph information storage unit 122, the quantization information storage unit 123, and the codebook information storage unit 124. The acquisition unit 131 also acquires various information from external information processing devices, such as the terminal device 10 and the information providing device 50.

The acquisition unit 131 acquires a graph. For example, the information processing device 100 may acquire the spatial information SP1 in FIG. 1. For example, the information processing device 100 may acquire the graph from an external device such as the information providing device 50.

The acquisition unit 131 acquires a search query for the plurality of objects that are the target of the data search. For example, the acquisition unit 131 acquires information about the search query QE1. For example, the acquisition unit 131 acquires a search query related to an image search. For example, the acquisition unit 131 acquires the query from the terminal device 10 in use, or from the information providing device 50 that has received the query from the terminal device 10.

(Generation unit 132)
The generation unit 132 generates various information. For example, the generation unit 132 generates various information (data) from the information (data) stored in the storage unit 120, such as the information stored in the object information storage unit 121, the graph information storage unit 122, the quantization information storage unit 123, and the codebook information storage unit 124.

The generation unit 132 may generate a graph such as the one shown in the graph information storage unit 122. For example, the generation unit 132 generates the spatial information SP1. The generation unit 132 may also generate information related to vector quantization such as that shown in the quantization information storage unit 123. For example, the generation unit 132 generates information in which each object, such as node N1 (object OB1), is vector-quantized. The generation unit 132 may also generate information about codebooks such as that shown in the codebook information storage unit 124, including the codebook lookup tables. For example, the generation unit 132 generates codebook-related information such as the codebook information TB1 to TB4. When the information processing device 100 acquires the information shown in the graph information storage unit 122, the quantization information storage unit 123, and the codebook information storage unit 124 from an external device such as the information providing device 50, the information processing device 100 need not have the generation unit 132.

(Search processing unit 133)
The search processing unit 133 provides a search service for objects and searches for various information. For example, the search processing unit 133 searches for objects by traversing the graph. For example, when a query is acquired by the acquisition unit 131, the search processing unit 133 traverses the graph to search for and extract objects similar to the query. For example, the search processing unit 133 extracts objects similar to the query by traversing the graph based on a processing procedure such as that shown in FIG. 11. When the information processing device 100 does not provide the search service, it need not have the search processing unit 133.

In the search processing, the search processing unit 133 selects, extracts, judges, determines, changes, and updates various information.

In the search processing that searches for nodes in the vicinity of the search query using a graph in which the nodes corresponding to the respective objects are connected by edges, the search processing unit 133 calculates the distances between the search query and a plurality of nodes selected according to a predetermined criterion, using the vector-quantized vector information of those nodes. The search processing unit 133 parallelizes the calculation of the distances between the plurality of nodes and the search query and performs it in a single batch. The number of nodes processed in parallel in one batch (the batch processing number) is determined based on the specifications of the information processing device 100.

The search processing unit 133 calculates the distances between the plurality of nodes and the search query using codebooks associated with the representative vector to which each of the plurality of nodes corresponds. The search processing unit 133 calculates the distances between the search query and the plurality of nodes connected by edges from one node. To do so, it uses node information in which reference information, indicating the plurality of nodes and the representative vector associated with each of them, is stored in association with that one node.

The search processing unit 133 calculates the distances between the plurality of nodes and the search query using the vector information of the plurality of nodes, each of which has been divided into a plurality of partial vectors by product quantization. It uses a plurality of codebooks associated with the representative vectors that correspond to the partial vectors at each division position of the plurality of nodes. It uses node information in which reference information, indicating the plurality of nodes and the representative vector associated with each of the partial vectors of each node, is stored in association with one node.

The search processing unit 133 calculates the distances between the plurality of nodes and the search query using reference information in which each of the plurality of nodes is associated with a list of the representative vectors associated with that node's partial vectors. Alternatively, it uses reference information that includes transposed information, in which the representative vectors corresponding to the partial vectors of the plurality of nodes are arranged in a list for each division position, and association information indicating the position in the list to which each of the plurality of nodes corresponds. The search processing unit 133 may also calculate the distances using a single codebook associated with the representative vectors to which the partial vectors of the plurality of nodes correspond.
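The transposed layout can be pictured as one row of codes per division position, with one column per neighbouring node, so that a single pass over a row updates the running distance of every neighbour in the batch. The sketch below is plain Python and only mimics that access pattern; the integer codes, the per-query table `lut`, and the function name are hypothetical, and real parallel hardware instructions are not shown.

```python
# Hypothetical reference information stored with node N7 (all codes are toy values).
neighbor_ids = ["N9", "N12", "N54", "N85"]  # association info: column index -> node
transposed_codes = [
    [1, 0, 2, 1],  # division position #1: codes of N9, N12, N54, N85
    [2, 2, 0, 1],  # division position #2
]


def batch_distances(lut, transposed_codes):
    """lut[p][c] is the query-to-representative distance for position p and code c."""
    totals = [0.0] * len(transposed_codes[0])
    for p, row in enumerate(transposed_codes):
        # All neighbours are updated in the same pass over one row.
        totals = [t + lut[p][c] for t, c in zip(totals, row)]
    return totals


# Toy per-query distance table: two positions, three codes each.
lut = [[4.0, 0.0, 9.0], [1.0, 16.0, 0.0]]

distances = dict(zip(neighbor_ids, batch_distances(lut, transposed_codes)))
print(distances)
```

Keeping the codes of one division position contiguous is what lets the inner update run over the whole batch without gathering data from separate per-node records.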

For predetermined nodes among the nodes processed in the search processing, the search processing unit 133 calculates a second distance that, unlike the first distance computed from vector-quantized information, is not vector-quantized. The search processing unit 133 extracts, as neighborhood candidate nodes, a second number of nodes that is larger than the first number of nodes to be extracted as nodes in the vicinity of the search query, and calculates the second distance for those neighborhood candidate nodes. It then extracts, from among the neighborhood candidate nodes, the first number of nodes in ascending order of the second distance as the nodes in the vicinity of the search query.

The search processing unit 133 may calculate the second distance for nodes whose first distance is within a predetermined threshold. It may calculate the second distance for nodes within the search range, which indicates the range from which nodes are extracted as neighboring nodes. It may also calculate the second distance for nodes within the exploration range, which indicates the range covered by the search processing.
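The two-stage selection described above (a larger candidate set chosen by the quantized first distance, then a final cut by the unquantized second distance) can be sketched as follows. The function name, the use of squared Euclidean distance, and all values are assumptions made for illustration.

```python
import heapq


def rerank(query, first_distance, vectors, first_number, second_number):
    """Pick second_number candidates by first distance, then keep the best first_number by second distance."""
    # Stage 1: neighbourhood candidates by the quantized (approximate) distance.
    candidates = heapq.nsmallest(second_number, first_distance, key=first_distance.get)

    # Stage 2: exact distance from the original, unquantized vectors.
    def second_distance(node_id):
        return sum((q - x) ** 2 for q, x in zip(query, vectors[node_id]))

    return sorted(candidates, key=second_distance)[:first_number]


# Toy data: approximate distances and original vectors for four nodes.
first_distance = {"N9": 1.0, "N12": 0.5, "N54": 3.0, "N85": 9.0}
vectors = {"N9": [1, 1], "N12": [2, 2], "N54": [0, 1], "N85": [5, 5]}

result = rerank([0, 0], first_distance, vectors, first_number=2, second_number=3)
print(result)
```

In this toy case N12 looks closest under the approximate distance but drops out once exact distances are computed, which is the kind of quantization error the second pass is meant to correct.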

(Providing unit 134)
The providing unit 134 provides various information. For example, the providing unit 134 transmits various information to the terminal device 10 and the information providing device 50. For example, the providing unit 134 provides the object IDs corresponding to the search query as the search result and transmits the search result to the terminal device 10. That is, the providing unit 134 provides the object IDs found by the search processing unit 133 to the terminal device 10 as the search result corresponding to the search query.

また、提供部１３４は、検索処理部１３３により検索されたオブジェクトＩＤを情報提供装置５０へ提供してもよい。例えば、提供部１３４は、検索処理部１３３が検索により抽出したオブジェクトＩＤを情報提供装置５０へ提供する。提供部１３４は、検索処理部１３３により抽出されたオブジェクトＩＤをクエリに対応するベクトルを示す情報として情報提供装置５０に提供する。 Further, the providing unit 134 may provide the object ID searched by the search processing unit 133 to the information providing device 50. For example, the providing unit 134 provides the object ID extracted by the search processing unit 133 to the information providing device 50. The providing unit 134 provides the information providing device 50 with the object ID extracted by the search processing unit 133 as information indicating a vector corresponding to the query.

〔１－４．情報処理のフロー〕 [1-4. Information processing flow]
次に、図１０を用いて、第１の実施形態に係る情報処理システム１による情報処理の手順について説明する。図１０は、第１の実施形態に係る情報処理の一例を示すフローチャートである。 Next, the procedure of information processing by the information processing system 1 according to the first embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of information processing according to the first embodiment.

図１０に示すように、情報処理装置１００は、データ検索の対象となる複数のオブジェクトに対する検索クエリを取得する（ステップＳ１０１）。図１の例では、情報処理装置１００は、検索クエリＱＥ１を取得する。 As shown in FIG. 10, the information processing apparatus 100 acquires a search query for a plurality of objects to be searched for data (step S101). In the example of FIG. 1, the information processing apparatus 100 acquires the search query QE1.

そして、情報処理装置１００は、複数のオブジェクトの各々に対応するノード群がエッジにより連結されたグラフを用いて、検索クエリの近傍のノードを検索する検索処理において、所定の基準により選択された複数のノードと検索クエリとの距離を、ベクトル量子化がされた複数のノードのベクトル情報を用いて算出する（ステップＳ１０２）。図１の例では、情報処理装置１００は、複数のノードＮ９、Ｎ１２、Ｎ５４、Ｎ８５と前記検索クエリＱＥ１との距離を、ベクトル量子化がされた複数のノードＮ９、Ｎ１２、Ｎ５４、Ｎ８５のベクトル情報を用いて算出する。 Then, in the search processing that searches for nodes in the vicinity of the search query using a graph in which the nodes corresponding to the respective objects are connected by edges, the information processing apparatus 100 calculates the distances between the search query and a plurality of nodes selected by a predetermined criterion, using the vector-quantized vector information of those nodes (step S102). In the example of FIG. 1, the information processing apparatus 100 calculates the distances between the nodes N9, N12, N54, and N85 and the search query QE1 using the vector-quantized vector information of the nodes N9, N12, N54, and N85.

〔１－５．検索処理例〕 [1-5. Search processing example]
ここで、第１の実施形態に係る検索処理の一例について、図１１を一例として説明する。図１１は、第１の実施形態に係る検索処理の一例を示すフローチャートである。以下に説明する検索処理は、情報処理装置１００の検索処理部１３３によって行われる。また、以下でいうオブジェクトは、ノードと読み替えてもよい。なお、以下では、情報処理装置１００（検索処理部１３３）が検索処理を行う。なお、検索サービスを提供しない場合、情報処理装置１００は検索処理部１３３を有しなくてもよい。以下で説明する処理の検索クエリは、追加ノードや対象ノードやユーザが指定したオブジェクト等であってもよい。 Here, an example of the search processing according to the first embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart showing an example of the search processing according to the first embodiment. The search processing described below is performed by the search processing unit 133 of the information processing apparatus 100. The objects referred to below may be read as nodes. In the following, the information processing apparatus 100 (search processing unit 133) performs the search processing. If no search service is provided, the information processing apparatus 100 does not need to have the search processing unit 133. The search query in the processing described below may be an additional node, a target node, an object specified by the user, or the like.

ここでは、近傍オブジェクト集合Ｎ（Ｇ，ｙ）は、ノードｙに付与されているエッジにより関連付けられている近傍のオブジェクトの集合である。「Ｇ」は、所定のグラフデータ（例えば、空間情報ＳＰ１に示すグラフＧＲ１等）であってもよい。例えば、情報処理装置１００は、ｋ近傍検索処理を実行する。 Here, the neighborhood object set N (G, y) is a set of neighborhood objects associated with the edge assigned to the node y. “G” may be predetermined graph data (for example, graph GR1 shown in spatial information SP1). For example, the information processing apparatus 100 executes the k-nearest neighbor search process.

例えば、情報処理装置１００は、超球の半径ｒを∞（無限大）に設定し（ステップＳ３００）、既存のオブジェクト集合から部分集合Ｓを抽出する（ステップＳ３０１）。例えば、情報処理装置１００は、ルートノードとして選択されたオブジェクト（ノード）を部分集合Ｓとして抽出してもよい。また、例えば、超球とは、検索範囲を示す仮想的な球である。なお、ステップＳ３０１において抽出されたオブジェクト集合Ｓに含まれるオブジェクトは、同時に検索結果のオブジェクト集合Ｒの初期集合にも含められる。 For example, the information processing apparatus 100 sets the radius r of the hypersphere to ∞ (infinity) (step S300), and extracts the subset S from the existing object set (step S301). For example, the information processing apparatus 100 may extract an object (node) selected as a root node as a subset S. Further, for example, a hypersphere is a virtual sphere indicating a search range. The objects included in the object set S extracted in step S301 are also included in the initial set of the object set R of the search results at the same time.

次に、情報処理装置１００は、オブジェクト集合Ｓに含まれるオブジェクトの中で、検索クエリオブジェクトをｙとするとオブジェクトｙとの距離が最も短いオブジェクトを抽出し、オブジェクトｓとする（ステップＳ３０２）。例えば、情報処理装置１００は、ルートノードとして選択されたオブジェクト（ノード）のみがＳの要素の場合には、結果的にルートノードがオブジェクトｓとして抽出される。次に、情報処理装置１００は、オブジェクトｓをオブジェクト集合Ｓから除外する（ステップＳ３０３）。 Next, letting y be the search query object, the information processing apparatus 100 extracts, from the objects included in the object set S, the object with the shortest distance to the object y and sets it as the object s (step S302). For example, when the object (node) selected as the root node is the only element of S, the root node is consequently extracted as the object s. Next, the information processing apparatus 100 excludes the object s from the object set S (step S303).

次に、情報処理装置１００は、オブジェクトｓとオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超えるか否かを判定する（ステップＳ３０４）。ここで、εは拡張要素であり、ｒ（１＋ε）は、探索範囲（この範囲内のノードのみを探索する。検索範囲よりも大きくすることで精度を高めることができる）の半径を示す値である。オブジェクトｓとオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超える場合（ステップＳ３０４：Ｙｅｓ）、情報処理装置１００は、オブジェクト集合Ｒをオブジェクトｙの近傍オブジェクト集合として出力し（ステップＳ３０５）、処理を終了する。 Next, the information processing apparatus 100 determines whether or not the distance d (s, y) between the object s and the object y exceeds r (1 + ε) (step S304). Here, ε is an expansion factor, and r (1 + ε) is a value indicating the radius of the exploration range (only nodes within this range are explored; making it larger than the search range can improve accuracy). When the distance d (s, y) between the object s and the object y exceeds r (1 + ε) (step S304: Yes), the information processing apparatus 100 outputs the object set R as the neighborhood object set of the object y (step S305) and ends the processing.

オブジェクトｓと検索クエリオブジェクトｙとの距離ｄ（ｓ，ｙ）がｒ（１＋ε）を超えない場合（ステップＳ３０４：Ｎｏ）、情報処理装置１００は、オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトの中からオブジェクト集合Ｃに含まれないオブジェクトを一つ選択し、選択したオブジェクトｕを、オブジェクト集合Ｃに格納する（ステップＳ３０６）。オブジェクト集合Ｃは、重複検索を回避するために便宜上設けられるものであり、処理開始時には空集合に設定される。 When the distance d (s, y) between the object s and the search query object y does not exceed r (1 + ε) (step S304: No), the information processing apparatus 100 selects, from the objects that are elements of the neighborhood object set N (G, s) of the object s, one object that is not included in the object set C, and stores the selected object u in the object set C (step S306). The object set C is provided for convenience in order to avoid duplicate searches, and is set to the empty set at the start of processing.

次に、情報処理装置１００は、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下であるか否かを判定する（ステップＳ３０７）。オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下である場合（ステップＳ３０７：Ｙｅｓ）、情報処理装置１００は、オブジェクトｕをオブジェクト集合Ｓに追加する（ステップＳ３０８）。また、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下ではない場合（ステップＳ３０７：Ｎｏ）、情報処理装置１００は、ステップＳ３０９の判定（処理）を行う。 Next, the information processing apparatus 100 determines whether or not the distance d (u, y) between the object u and the object y is r (1 + ε) or less (step S307). When the distance d (u, y) between the object u and the object y is r (1 + ε) or less (step S307: Yes), the information processing apparatus 100 adds the object u to the object set S (step S308). Further, when the distance d (u, y) between the object u and the object y is not r (1 + ε) or less (step S307: No), the information processing apparatus 100 determines (processes) step S309.

次に、情報処理装置１００は、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下であるか否かを判定する（ステップＳ３０９）。オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒを超える場合（ステップＳ３０９：Ｎｏ）、情報処理装置１００は、ステップＳ３１５の判定（処理）を行う。すなわち、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下ではない場合、情報処理装置１００は、ステップＳ３１５の判定（処理）を行う。 Next, the information processing apparatus 100 determines whether or not the distance d (u, y) between the object u and the object y is r or less (step S309). When the distance d (u, y) between the object u and the object y exceeds r (step S309: No), the information processing apparatus 100 determines (processes) step S315. That is, when the distance d (u, y) between the object u and the object y is not r or less, the information processing apparatus 100 performs the determination (processing) in step S315.

オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下である場合（ステップＳ３０９：Ｙｅｓ）、情報処理装置１００は、オブジェクトｕをオブジェクト集合Ｒに追加する（ステップＳ３１０）。そして、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超えるか否かを判定する（ステップＳ３１１）。所定数ｋｓは、任意に定められる自然数である。例えば、ｋｓは、検索数や抽出対象数であってもよい。また、例えば、範囲検索等において抽出するオブジェクト数の上限を設けない場合、ｋｓは、無限大に設定されてもよい。例えば、ｋｓ＝４等であってもよい。オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超えない場合（ステップＳ３１１：Ｎｏ）、情報処理装置１００は、ステップＳ３１３の判定（処理）を行う。 When the distance d (u, y) between the object u and the object y is r or less (step S309: Yes), the information processing apparatus 100 adds the object u to the object set R (step S310). Then, the information processing apparatus 100 determines whether or not the number of objects included in the object set R exceeds ks (step S311). The predetermined number ks is an arbitrarily determined natural number. For example, ks may be the number of searches or the number of extraction targets. Further, for example, when the upper limit of the number of objects to be extracted in the range search or the like is not set, ks may be set to infinity. For example, ks = 4 or the like may be used. When the number of objects included in the object set R does not exceed ks (step S311: No), the information processing apparatus 100 determines (processes) step S313.

オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超える場合（ステップＳ３１１：Ｙｅｓ）、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクトの中でオブジェクトｙとの距離が最も長い（遠い）オブジェクトを、オブジェクト集合Ｒから除外する（ステップＳ３１２）。 When the number of objects included in the object set R exceeds ks (step S311: Yes), the information processing apparatus 100 excludes from the object set R the object that is farthest from the object y among the objects included in the object set R (step S312).

次に、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓと一致するか否かを判定する（ステップＳ３１３）。オブジェクト集合Ｒに含まれるオブジェクト数がｋｓと一致しない場合（ステップＳ３１３：Ｎｏ）、情報処理装置１００は、ステップＳ３１５の判定（処理）を行う。また、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓと一致する場合（ステップＳ３１３：Ｙｅｓ）、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクトの中でオブジェクトｙとの距離が最も長い（遠い）オブジェクトと、オブジェクトｙとの距離を、新たなｒに設定する（ステップＳ３１４）。 Next, the information processing apparatus 100 determines whether or not the number of objects included in the object set R matches ks (step S313). When the number of objects included in the object set R does not match ks (step S313: No), the information processing apparatus 100 performs the determination of step S315. When the number of objects included in the object set R matches ks (step S313: Yes), the information processing apparatus 100 sets, as the new r, the distance between the object y and the object that is farthest from the object y among the objects included in the object set R (step S314).

そして、情報処理装置１００は、オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えたか否かを判定する（ステップＳ３１５）。オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えていない場合（ステップＳ３１５：Ｎｏ）、情報処理装置１００は、ステップＳ３０６に戻って処理を繰り返す。 Then, the information processing apparatus 100 determines whether or not all of the objects that are elements of the neighborhood object set N (G, s) of the object s have been selected and stored in the object set C (step S315). When not all of the objects that are elements of the neighborhood object set N (G, s) of the object s have yet been selected and stored in the object set C (step S315: No), the information processing apparatus 100 returns to step S306 and repeats the processing.

オブジェクトｓの近傍オブジェクト集合Ｎ（Ｇ，ｓ）の要素であるオブジェクトから全てのオブジェクトを選択してオブジェクト集合Ｃに格納し終えた場合（ステップＳ３１５：Ｙｅｓ）、情報処理装置１００は、オブジェクト集合Ｓが空集合であるか否かを判定する（ステップＳ３１６）。オブジェクト集合Ｓが空集合でない場合（ステップＳ３１６：Ｎｏ）、情報処理装置１００は、ステップＳ３０２に戻って処理を繰り返す。また、オブジェクト集合Ｓが空集合である場合（ステップＳ３１６：Ｙｅｓ）、情報処理装置１００は、オブジェクト集合Ｒを出力し、処理を終了する（ステップＳ３１７）。例えば、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクト（ノード）を追加ノード（入力オブジェクトｙ）に対応する近傍ノードとして選択してもよい。例えば、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクト（ノード）を対象ノード（入力オブジェクトｙ）に対応する近傍ノードとして抽出（選択）してもよい。また、例えば、情報処理装置１００は、オブジェクト集合Ｒに含まれるオブジェクト（ノード）を検索クエリ（入力オブジェクトｙ）に対応する検索結果として、検索を行った端末装置等へ提供してもよい。 When all of the objects that are elements of the neighborhood object set N (G, s) of the object s have been selected and stored in the object set C (step S315: Yes), the information processing apparatus 100 determines whether or not the object set S is the empty set (step S316). If the object set S is not the empty set (step S316: No), the information processing apparatus 100 returns to step S302 and repeats the processing. If the object set S is the empty set (step S316: Yes), the information processing apparatus 100 outputs the object set R and ends the processing (step S317). For example, the information processing apparatus 100 may select the objects (nodes) included in the object set R as the neighboring nodes corresponding to an additional node (input object y). For example, the information processing apparatus 100 may extract (select) the objects (nodes) included in the object set R as the neighboring nodes corresponding to a target node (input object y). Further, for example, the information processing apparatus 100 may provide the objects (nodes) included in the object set R, as the search result corresponding to the search query (input object y), to the terminal device or the like that issued the search.
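
Steps S300 to S317 above can be condensed into the following sketch. It is an illustrative reading of the flowchart, not the applicant's code: the function name `knn_graph_search`, the adjacency-dictionary representation of the graph G, and the callable `distance_to_query` (standing in for d(·, y)) are assumptions introduced for the example.

```python
import math


def knn_graph_search(neighbors, distance_to_query, seeds, ks, eps):
    """Hypothetical sketch of the search in steps S300 to S317.

    neighbors         : dict node -> list of nodes joined to it by an edge, N(G, s)
    distance_to_query : callable node -> d(node, y) for the query object y
    seeds             : initial subset S (for example, a root node)
    ks                : number of objects to keep in the result set R
    eps               : expansion factor; r * (1 + eps) is the exploration radius
    """
    r = math.inf                                  # S300
    S = set(seeds)                                # S301
    R = set(seeds)                                # initial result set
    C = set()                                     # already-selected objects
    while S:                                      # S316
        s = min(S, key=distance_to_query)         # S302
        S.remove(s)                               # S303
        if distance_to_query(s) > r * (1 + eps):  # S304
            return R                              # S305
        for u in neighbors[s]:                    # S306, S315
            if u in C:
                continue
            C.add(u)
            d = distance_to_query(u)
            if d <= r * (1 + eps):                # S307
                S.add(u)                          # S308
            if d <= r:                            # S309
                R.add(u)                          # S310
                if len(R) > ks:                   # S311
                    R.remove(max(R, key=distance_to_query))    # S312
                if len(R) == ks:                  # S313
                    r = max(distance_to_query(n) for n in R)   # S314
    return R                                      # S317
```

The radius r shrinks only once R holds ks objects, so early in the search every reachable neighbor is explored, and later only those inside the shrinking exploration radius r(1 + ε) are.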

〔２．第２の実施形態〕 [2. Second embodiment]
ここから、第２の実施形態について説明する。第２の実施形態は、クラスタリングなどにより近傍オブジェクトをグルーピングし、各グループ（以下「ブロブ」ともいう）を一括して距離計算を行う対象とする。すなわち、第２の実施形態では、ブロブ単位で一括距離計算を行う。なお、第１の実施形態と同様の点については、適宜説明を省略する。第２の実施形態においては、情報処理システム１は、情報処理装置１００に代えて、情報処理装置１００Ａを有する。 From here, the second embodiment will be described. In the second embodiment, neighboring objects are grouped by clustering or the like, and each group (hereinafter also referred to as a “blob”) is treated as a unit for which distances are calculated collectively. That is, in the second embodiment, batch distance calculation is performed blob by blob. Description of points that are the same as in the first embodiment is omitted as appropriate. In the second embodiment, the information processing system 1 has an information processing apparatus 100A instead of the information processing apparatus 100.

〔２－１．情報処理〕 [2-1. Information processing]
まず、図１２を用いて、第２の実施形態に係る情報処理の概要を説明する。図１２は、第２の実施形態に係る情報処理の一例を示す図である。 First, an outline of the information processing according to the second embodiment will be described with reference to FIG. 12. FIG. 12 is a diagram showing an example of information processing according to the second embodiment.

図１２の例では、情報処理装置１００Ａは、空間情報ＳＰ２１に示すようなグラフＧＲ２１を取得済みであるものとする。例えば図１中の空間情報ＳＰ２１は、ユークリッド空間であってもよい。例えば、空間情報ＳＰ２１は、オブジェクトのベクトルの次元数に対応し、１００次元や１０００次元等の多次元空間であるものとする。なお、図１２の例では、説明を簡単にするために直積量子化が行われていない図を基に説明するが、図１と同様にベクトル（空間）の直積量子化が行われてもよい。 In the example of FIG. 12, it is assumed that the information processing apparatus 100A has already acquired the graph GR21 as shown in the spatial information SP21. For example, the spatial information SP21 in FIG. 1 may be a Euclidean space. For example, the spatial information SP21 corresponds to the number of dimensions of the object vectors and is assumed to be a multidimensional space of 100 dimensions, 1000 dimensions, or the like. For simplicity, the example of FIG. 12 is described on the basis of a diagram in which direct product quantization is not performed, but direct product quantization of the vectors (space) may be performed in the same manner as in FIG. 1.

まず、図１２で示す各情報について説明する。図１２の空間情報ＳＰ２１中のノードである白丸（〇）間を接続する点線がノード間を連結するエッジを示す。図１２では、図１と同様に無向エッジを例として示すが、グラフＧＲ２１のエッジは、無向エッジに限らず、有向エッジであってもよい。なお、ノードやエッジについては図１と同様であるため詳細な説明は省略する。 First, each piece of information shown in FIG. 12 will be described. The dotted lines connecting the white circles (◯), which are the nodes in the spatial information SP21 of FIG. 12, indicate the edges connecting the nodes. FIG. 12 shows undirected edges as an example, as in FIG. 1, but the edges of the graph GR21 are not limited to undirected edges and may be directed edges. Since the nodes and edges are the same as those in FIG. 1, detailed description thereof is omitted.

図１２の空間情報ＳＰ２１において、直線で囲まれた領域はクラスタリングなどで近傍のオブジェクトがまとめ上げられた（グループ化された）ものであり、領域をブロブと称する。以下で示す例ではブロブが一括距離計算の処理単位となる。図１２は、ブロブＢＬ１～ＢＬ１０の１０個の領域（ブロブ）にクラスタリングされた場合を示す。例えば、ブロブＢＬ１は、ノードＮ７、Ｎ９、Ｎ８５、Ｎ１２６等の複数のノードが属するブロブであることを示す。なお、ブロブＢＬ１～ＢＬ１０中の黒い点は、各ブロブのセントロイド（代表ベクトル）を示す。 In the spatial information SP21 of FIG. 12, the area surrounded by a straight line is a group of nearby objects by clustering or the like, and the area is referred to as a blob. In the example shown below, the blob is the processing unit for batch distance calculation. FIG. 12 shows a case where clustering is performed in 10 regions (blobs) of blobs BL1 to BL10. For example, blob BL1 indicates that it is a blob to which a plurality of nodes such as nodes N7, N9, N85, and N126 belong. The black dots in the blobs BL1 to BL10 indicate the centroid (representative vector) of each blob.

なお、ブロブＢＬ１～ＢＬ１０は、クラスタリング等に関する種々の手法を適宜用いて生成される。例えば、ブロブに分類するためのクラスタリングは、ｋ－ｍｅａｎｓクラスタリングでもよい。また、例えば、ブロブに分類するためのクラスタリングは、ｋ－ｍｅａｎｓでの一回のイテレーション（アサイン）で各クラスタの中心座標（平均）で得られたセントロイドを次のイテレーションに使うのではなく、その代わりに、各セントロイドに最も近いオブジェクトにセントロイドを置き換えてから、次のイテレーションを行ってもよい。この場合、ｋ－ｍｅａｎｓで得られる最終的なクラスタのセントロイドは既存のオブジェクト、つまり、ノードとなる。これにより、クラスタ（ブロブ）と各ノードのエッジ（近傍ノード）が一致する傾向が高まり、検索性能をさらに向上させることができる。 The blobs BL1 to BL10 are generated by appropriately using various methods related to clustering and the like. For example, the clustering for classifying into blobs may be k-means clustering. Also, for example, in clustering for classifying into blobs, the centroid obtained at the center coordinates (average) of each cluster in one iteration (assignment) in k-means is not used for the next iteration. Alternatively, you may replace the centroid with the object closest to each centroid before performing the next iteration. In this case, the centroid of the final cluster obtained by k-means is an existing object, that is, a node. As a result, the tendency for the cluster (blob) and the edge (neighboring node) of each node to match increases, and the search performance can be further improved.
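
The k-means variant described above, in which each mean is replaced by the nearest existing object before the next iteration, could look like the following sketch. The function names `squared_distance` and `kmeans_snapped`, the fixed iteration count, and the choice to search for the nearest object among all objects (rather than only among the cluster's members) are assumptions for illustration; the source does not specify these details.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_snapped(points, initial_centroids, iterations=10):
    """Hypothetical k-means whose centroids are always existing objects.

    points            : list of vectors (tuples), one per object (node)
    initial_centroids : list of starting centroids
    Returns (centroids, assignment), where assignment[i] is the blob index
    of points[i].
    """
    centroids = list(initial_centroids)

    def assign():
        return [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]

    for _ in range(iterations):
        assignment = assign()
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            mean = tuple(sum(col) / len(members) for col in zip(*members))
            # Replace the mean with the nearest existing object before the
            # next iteration, so every centroid is itself a node.
            centroids[j] = min(points, key=lambda p: squared_distance(p, mean))
    return centroids, assign()
```

Since every centroid is an actual node, the blob boundaries tend to follow the neighborhoods already encoded by the graph edges, which is the effect the paragraph above attributes to this variant.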

図１２の例では、情報処理装置１００Ａは、空間情報ＳＰ２１に示すように、複数のノード（オブジェクト）をクラスタリングすることにより、分類した複数のブロブＢＬ１～ＢＬ１０の情報を用いて、検索処理を行う。 In the example of FIG. 12, as shown in the spatial information SP21, the information processing apparatus 100A performs the search processing using information on the plurality of blobs BL1 to BL10 into which the plurality of nodes (objects) have been classified by clustering.

ここから、検索クエリＱＥ２を対象とする検索処理を説明する。まず、情報処理装置１００Ａは、検索クエリＱＥ２を取得する（ステップＳ２１）。例えば、情報処理装置１００Ａは、ユーザが利用する端末装置１０（図４参照）から検索クエリＱＥ２を取得する。 From here, the search process targeting the search query QE2 will be described. First, the information processing apparatus 100A acquires the search query QE2 (step S21). For example, the information processing device 100A acquires the search query QE2 from the terminal device 10 (see FIG. 4) used by the user.

そして、情報処理装置１００Ａは、検索クエリＱＥ２のベクトルと、コードブックのベクトルとの間の距離を算出する。図示は省略するが直積量子化を行っていない場合、情報処理装置１００Ａは、検索クエリＱＥ２のベクトルと、ベクトル全体を量子化するためのコードブックのベクトルとの間の距離（差分）を算出する。そして、情報処理装置１００Ａは、検索クエリＱＥ２のベクトルと各コードブックのベクトルとの間の距離（差分）を、各コードブックを識別するための情報に対応付けてコードブック情報ＴＢ２１として保持する。なお、直積量子化を行っている場合、情報処理装置１００Ａは、図１と同様に、図２のコードブック情報ＴＢ１～ＴＢ４に示すように、検索クエリＱＥ２の各サブベクトルと各コードブックのベクトルとの間の距離（差分）を算出する。なお、検索クエリのベクトルとコードブックのベクトルとの距離算出は、図１、図２等と同様であるため詳細な説明は省略する。 Then, the information processing apparatus 100A calculates the distance between the vector of the search query QE2 and the vectors of the codebooks. Although not illustrated, when direct product quantization is not performed, the information processing apparatus 100A calculates the distance (difference) between the vector of the search query QE2 and the vector of each codebook used to quantize the entire vector. The information processing apparatus 100A then holds the distance (difference) between the vector of the search query QE2 and the vector of each codebook as codebook information TB21, in association with the information identifying each codebook. When direct product quantization is performed, the information processing apparatus 100A calculates, as in FIG. 1 and as shown in the codebook information TB1 to TB4 of FIG. 2, the distance (difference) between each subvector of the search query QE2 and the vector of each codebook. Since the calculation of the distance between the search query vector and the codebook vectors is the same as in FIGS. 1, 2, etc., detailed description thereof is omitted.

情報処理装置１００Ａは、検索クエリＱＥ２を対象とする検索処理を実行する（ステップＳ２２）。情報処理装置１００Ａは、検索クエリＱＥ２を対象として、グラフＧＲ２１を用いた図１６に示すような検索処理を行うことにより、検索クエリＱＥ２の検索結果を得る。図１６に示す検索処理についての詳細は後述する。情報処理装置１００Ａは、検索クエリＱＥ２を対象として検索処理を行うことにより、検索数のノードを検索クエリＱＥ２の近傍のノードとして抽出する。 The information processing apparatus 100A executes the search processing for the search query QE2 (step S22). The information processing apparatus 100A obtains the search result for the search query QE2 by performing, for the search query QE2, search processing such as that shown in FIG. 16 using the graph GR21. Details of the search processing shown in FIG. 16 will be described later. By performing the search processing for the search query QE2, the information processing apparatus 100A extracts as many nodes as the search count as the nodes in the vicinity of the search query QE2.

情報処理装置１００Ａは、検索クエリＱＥ２を対象とする検索処理において、各ノードのうち、所定のノードをグラフＧＲ２１の検索の起点となるノード（起点ノード）として選択する。図１２の例では、情報処理装置１００Ａは、起点ノードとして、ノードＮ９を選択するものとする。図１２の例では、情報処理装置１００Ａは、例えば、ノードＮ９を処理対象として検索処理を開始する。 In the search process for the search query QE2, the information processing apparatus 100A selects a predetermined node among the nodes as a node (starting point node) to be the starting point of the search of the graph GR21. In the example of FIG. 12, the information processing apparatus 100A selects the node N9 as the starting node. In the example of FIG. 12, the information processing apparatus 100A starts the search process with the node N9 as the processing target, for example.

ここで、情報処理装置１００Ａは、検索クエリＱＥ２を対象とする検索処理において、処理対象となるノードが属するブロブに属する複数のノードと検索クエリとの距離を並列化して算出する（ステップＳ２３）。図１２では、情報処理装置１００Ａは、一括処理情報ＬＴ２に示すように、処理対象となるノードＮ９が属するブロブＢＬ１に属するノードであるノードＮ７、Ｎ９、Ｎ８５、Ｎ１２６等については、検索クエリＱＥ２との距離を並列化して算出する。 Here, in the search processing for the search query QE2, the information processing apparatus 100A calculates in parallel the distances between the search query and the plurality of nodes belonging to the blob to which the node being processed belongs (step S23). In FIG. 12, as shown in the batch processing information LT2, the information processing apparatus 100A calculates in parallel the distances to the search query QE2 for the nodes N7, N9, N85, N126, etc., which belong to the blob BL1 to which the node N9 being processed belongs.

情報処理装置１００Ａは、コードブック情報ＴＢ２１が示す各コードブックと検索クエリＱＥ２との間の距離を用いて、ノードＮ７、Ｎ９、Ｎ８５、Ｎ１２６の各々と検索クエリＱＥ２との距離を算出する。例えば、情報処理装置１００Ａは、コードブック情報ＴＢ２１を参照して、ノードＮ７のベクトルに対応するコードブックの距離を、ノードＮ７と検索クエリＱＥ２との距離として算出する。なお、直積量子化が行われている場合の距離算出は、図１、図２等と同様であるため詳細な説明は省略する。 The information processing apparatus 100A calculates the distance between each of the nodes N7, N9, N85, and N126 and the search query QE2 using the distances between the search query QE2 and the codebooks indicated by the codebook information TB21. For example, the information processing apparatus 100A refers to the codebook information TB21 and takes the distance held for the codebook corresponding to the vector of the node N7 as the distance between the node N7 and the search query QE2. Since the distance calculation when direct product quantization is performed is the same as in FIGS. 1, 2, etc., detailed description thereof is omitted.
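
The lookup-based distance calculation described above can be sketched as follows: the distances from the query (or from each of its subvectors) to every codebook vector are computed once, and the approximate distance to a node is then obtained by table lookup. The function names `build_lookup_table` and `approximate_distance`, the nested-list layout of the codebooks, and the use of summed squared Euclidean sub-distances are assumptions for illustration; how the sub-distances are combined in FIGS. 1 and 2 is not shown in this section.

```python
import math


def build_lookup_table(query, codebooks):
    """Precompute query-to-codebook distances (hypothetical layout).

    codebooks[m][c] is the vector for code c of the m-th subvector.
    With a single codebook (len(codebooks) == 1) this covers the case
    without direct product quantization.
    Returns table[m][c] = squared distance between the m-th query
    subvector and that codebook vector.
    """
    sub_len = len(query) // len(codebooks)
    table = []
    for m, codebook in enumerate(codebooks):
        q_sub = query[m * sub_len:(m + 1) * sub_len]
        table.append(
            [sum((a - b) ** 2 for a, b in zip(q_sub, vec)) for vec in codebook]
        )
    return table


def approximate_distance(codes, table):
    """Approximate query-to-node distance from the node's code per subvector."""
    return math.sqrt(sum(table[m][c] for m, c in enumerate(codes)))
```

After the table is built, each node costs one lookup per subvector, regardless of the dimensionality of the original vectors.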

なお、一括処理可能単位については、情報処理装置１００の場合と同様に、情報処理装置１００Ａの仕様に基づいて決定される。例えば、情報処理装置１００ＡがＳＩＭＤにより一括して処理できる数（一括処理可能単位）が「４」である場合、図１と同様に、４個のノードを対象として距離を並列化して算出する。例えば、情報処理装置１００Ａは、ＳＩＭＤの演算に関する並列化により、ノードＮ７、Ｎ９、Ｎ８５、Ｎ１２６の各々と検索クエリＱＥ２との距離を一括して算出する。これにより、情報処理装置１００Ａは、距離計算を並列化することにより、距離計算を高速化することでき、効率的な検索処理を可能にすることができる。また、情報処理装置１００Ａは、グラフＧＲ２１を用いて、ノードＮ７からのエッジが接続されたノード（例えばノードＮ１２、Ｎ６４）を辿り、それらのノードが属するブロブ（例えばブロブＢＬ２、ＢＬ３）を対象とした一括距離計算を行って、検索処理を行う。なお、情報処理装置１００Ａは、検索処理において、同じブロブが繰り返し一括距離計算の対象となることを抑制するが詳細は後述する。 The batch processable unit is determined based on the specifications of the information processing apparatus 100A, as in the case of the information processing apparatus 100. For example, when the number (units that can be collectively processed) that the information processing apparatus 100A can process collectively by SIMD is "4", the distances are calculated in parallel for four nodes as in FIG. 1. For example, the information processing apparatus 100A collectively calculates the distance between each of the nodes N7, N9, N85, and N126 and the search query QE2 by parallelizing the SIMD operation. As a result, the information processing apparatus 100A can speed up the distance calculation by parallelizing the distance calculation, and can enable efficient search processing. Further, the information processing apparatus 100A traces the nodes (for example, nodes N12 and N64) to which the edges from the node N7 are connected by using the graph GR21, and targets the blobs (for example, blobs BL2 and BL3) to which those nodes belong. Perform the batch distance calculation and perform the search process. The information processing apparatus 100A suppresses that the same blob is repeatedly subject to batch distance calculation in the search process, but the details will be described later.

なお、図１２では説明を簡単にするために、４つのノードを対象として距離を並列化して算出する場合を示すが、並列化される数は、情報処理装置１００Ａの仕様に基づいて決定される。この点についても図１、図２等と同様であるため、詳細な説明は省略する。また、情報処理装置１００Ａは、図３に示す例と同様に、転置データを用いてもよい。 For simplicity, FIG. 12 shows a case where the distances are calculated in parallel for four nodes, but the number of nodes processed in parallel is determined based on the specifications of the information processing apparatus 100A. Since this point is also the same as in FIGS. 1, 2, etc., detailed description thereof is omitted. The information processing apparatus 100A may also use transposed data, as in the example shown in FIG. 3.

上述のように、情報処理装置１００Ａは、ベクトル量子化された各ノードのベクトルと検索クエリとの間の近似距離（第２距離）を算出して、第２距離を用いて検索処理を行う事により、効率的な検索処理を可能にすることができる。また、情報処理装置１００Ａは、並列化可能な数のノードの距離計算を一括して行うことにより、効率的な検索処理を可能にすることができる。上記のように、情報処理装置１００Ａは、ブロブを一括処理の単位として一括して計算することにより、効率的な検索処理を可能にすることができる。 As described above, the information processing apparatus 100A calculates an approximate distance (second distance) between the vector of each vector quantized node and the search query, and performs the search process using the second distance. Therefore, efficient search processing can be enabled. Further, the information processing apparatus 100A can enable efficient search processing by collectively calculating the distances of a number of nodes that can be parallelized. As described above, the information processing apparatus 100A can enable efficient search processing by collectively calculating blobs as a unit of batch processing.

情報処理装置１００Ａは、検索処理において、グラフＧＲ２１を辿り処理対象となるノードを対象に上述した処理を行うことにより、検索数のノードを検索クエリＱＥ２の近傍のノードとして抽出する。例えば、情報処理装置１００Ａは、情報処理装置１００と同様に、上述した第１の方法及び第２の方法のいずれかにより検索結果を得る。 In the search processing, the information processing apparatus 100A traverses the graph GR21 and performs the above-described processing on each node being processed, thereby extracting as many nodes as the search count as the nodes in the vicinity of the search query QE2. For example, like the information processing apparatus 100, the information processing apparatus 100A obtains the search result by either the first method or the second method described above.

上述した処理は一例に過ぎず、情報処理装置１００Ａは、効率的な検索処理の為に様々な情報や手法を用いて、検索処理を行ってもよい。この点について、各事項について詳述する。例えば、情報処理装置１００Ａは、各ブロブに属するノードの数が、情報処理装置１００がＳＩＭＤにより一括して処理できる数（一括処理可能単位）になるように、クラスタリングを行い、ブロブを生成してもよい。 The processing described above is only an example, and the information processing apparatus 100A may perform the search processing using various information and techniques for efficient search. Each of these points is described in detail below. For example, the information processing apparatus 100A may perform clustering and generate the blobs so that the number of nodes belonging to each blob equals the number that the information processing apparatus 100 can process collectively by SIMD (the batch-processable unit).

また、情報処理装置１００Ａは、ブロブに関する情報を用いて、重複した処理が行われることを抑制してもよい。情報処理装置１００Ａは、ブロブ単位にそのブロブの距離計算を行ったかを示すブロブ距離計算フラグ（以下単に「フラグ」ともいう）、および各オブジェクトがどのブロブに属するかを示すテーブルを有してもよい。この場合、情報処理装置１００Ａは、フラグにより各ブロブが処理済であるか否かを管理してもよい。例えば、情報処理装置１００Ａは、近傍ノードを逐一処理する直前にノードが属するブロブをテーブルにより特定し、そのブロブが一括距離計算済みかをフラグで判断し、未計算の場合には、一括計算を行い、個々のノードの処理を行ってもよい。この点については、図１５や図１６においても説明する。これにより、情報処理装置１００Ａは、重複した処理が行われることを抑制することができる。 Further, the information processing apparatus 100A may use information about the blobs to suppress duplicate processing. The information processing apparatus 100A may have, for each blob, a blob distance calculation flag (hereinafter also simply referred to as a “flag”) indicating whether the distance calculation for that blob has been performed, and a table indicating which blob each object belongs to. In this case, the information processing apparatus 100A may manage, by means of the flags, whether or not each blob has been processed. For example, immediately before processing each neighboring node, the information processing apparatus 100A may identify from the table the blob to which the node belongs, determine from the flag whether the batch distance calculation for that blob has already been performed, and, if it has not, perform the batch calculation and then process the individual node. This point is also described with reference to FIGS. 15 and 16. In this way, the information processing apparatus 100A can suppress duplicate processing.
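
The flag-and-table mechanism described above can be sketched as a small cache: the table maps each node to its blob, the flag records whether the blob has been batch-processed, and the batch calculation runs at most once per blob. The class name `BlobDistanceCache` and the callable `batch_distance` (standing in for the SIMD batch calculation) are assumptions introduced for the example.

```python
class BlobDistanceCache:
    """Hypothetical helper that runs the batch distance calculation once per blob."""

    def __init__(self, blob_of, blob_members, batch_distance):
        self.blob_of = blob_of                # table: node ID -> blob ID
        self.blob_members = blob_members      # blob ID -> list of node IDs
        self.batch_distance = batch_distance  # callable: list of nodes -> list of distances
        # Blob distance calculation flags, all initially "not calculated".
        self.calculated = {blob: False for blob in blob_members}
        self.distances = {}

    def distance(self, node):
        """Return the query distance of `node`, batch-computing its blob if needed."""
        blob = self.blob_of[node]
        if not self.calculated[blob]:
            members = self.blob_members[blob]
            for member, d in zip(members, self.batch_distance(members)):
                self.distances[member] = d
            self.calculated[blob] = True
        return self.distances[node]
```

A graph traversal would call `distance(u)` for each neighbor u; nodes that share a blob with an earlier neighbor are then served from the stored results without a second batch calculation.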

〔２－２．情報処理装置の構成〕 [2-2. Configuration of the information processing apparatus]
次に、図１３を用いて、第２の実施形態に係る情報処理装置１００Ａの構成について説明する。図１３は、第２の実施形態に係る情報処理装置の構成例を示す図である。図１３に示すように、情報処理装置１００Ａは、通信部１１０と、記憶部１２０Ａと、制御部１３０Ａとを有する。なお、情報処理装置１００Ａにおいて、情報処理装置１００と同様の点は適宜説明を省略する。 Next, the configuration of the information processing apparatus 100A according to the second embodiment will be described with reference to FIG. 13. FIG. 13 is a diagram showing a configuration example of the information processing apparatus according to the second embodiment. As shown in FIG. 13, the information processing apparatus 100A includes a communication unit 110, a storage unit 120A, and a control unit 130A. Description of the points in which the information processing apparatus 100A is the same as the information processing apparatus 100 is omitted as appropriate.

（記憶部１２０Ａ） (Storage unit 120A)
記憶部１２０Ａは、例えば、ＲＡＭ、フラッシュメモリ等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。第２の実施形態に係る記憶部１２０Ａは、図１３に示すように、オブジェクト情報記憶部１２１と、グラフ情報記憶部１２２と、量子化情報記憶部１２３Ａと、コードブック情報記憶部１２４と、ブロブ情報記憶部１２５とを有する。 The storage unit 120A is realized by, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 13, the storage unit 120A according to the second embodiment includes an object information storage unit 121, a graph information storage unit 122, a quantization information storage unit 123A, a codebook information storage unit 124, and a blob information storage unit 125.

（量子化情報記憶部１２３Ａ） (Quantization information storage unit 123A)
第２の実施形態に係る量子化情報記憶部１２３Ａは、割当処理に関する各種情報を記憶する。図１４は、第２の実施形態に係る量子化情報記憶部の一例を示す図である。図１４の例では、量子化情報記憶部１２３Ａには、「ノードＩＤ」、「オブジェクトＩＤ」、「ブロブＩＤ」、および「量子化情報」といった項目を有する。 The quantization information storage unit 123A according to the second embodiment stores various information related to the allocation processing. FIG. 14 is a diagram showing an example of the quantization information storage unit according to the second embodiment. In the example of FIG. 14, the quantization information storage unit 123A has items such as "node ID", "object ID", "blob ID", and "quantization information".

The "node ID" is identification information for identifying each node (target) in the graph. The "object ID" is identification information for identifying an object. When the node ID and the object ID are the same, the object ID may be stored under "node ID", and the quantization information storage unit 123A need not include the "object ID" item.

The "blob ID" is information for identifying the blob to which the node (object) belongs.

The "quantization information" is information on the quantized vector of each node (object). For example, the "quantization information" includes information such as "element" and "codebook ID". The "element" indicates a position within the vector of the corresponding object. The example of FIG. 14 shows a case where the "element" includes "#1", "#2", "#3", and "#4". In this case, the vector of each node (object) is divided into four, and each of the resulting sub-vectors is quantized using a codebook.

In the example of FIG. 14, the node N1 (object OB1) belongs to the blob identified by the blob ID "BL9" (blob BL9). For example, when the vector is divided into four by product quantization, the quantization information of the node N1 stores the same kind of information as the four codebooks shown as the quantization information of the node N1 in FIG. 8. When product quantization is not performed, the quantization information of the node N1 stores information indicating the one codebook used to quantize the vector of the node N1.
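The record layout described above can be illustrated with a minimal Python sketch. It assumes a plain product quantization in which each sub-vector is replaced by the index of its nearest codebook entry; the function name `pq_encode`, the field names of `record`, and the toy codebooks are hypothetical and are not taken from FIG. 14 or FIG. 8.

```python
from math import dist

def pq_encode(vector, codebooks):
    """Split `vector` into len(codebooks) equal sub-vectors and return, for
    each sub-vector, the index of the nearest centroid in its codebook."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        nearest = min(range(len(codebook)), key=lambda c: dist(sub, codebook[c]))
        codes.append(nearest)
    return codes

# Hypothetical toy data: an 8-dimensional vector split into 4 sub-vectors
# ("#1" to "#4"), each with a codebook of two 2-dimensional centroids.
codebooks = [[(0.0, 0.0), (10.0, 10.0)] for _ in range(4)]
record = {
    "node_id": "N1",
    "object_id": "OB1",
    "blob_id": "BL9",
    "quantization": pq_encode([1, 0, 9, 11, 0, 2, 10, 8], codebooks),
}
```

Each entry of `record["quantization"]` plays the role of one "codebook ID" per "element"; without product quantization the list would hold a single entry.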

The quantization information storage unit 123A is not limited to the above, and may store various other information depending on the purpose.

(Blob information storage unit 125)
The blob information storage unit 125 according to the second embodiment stores information about blobs. For example, the blob information storage unit 125 stores various information identifying the objects associated with each blob. FIG. 15 is a diagram showing an example of the blob information storage unit according to the second embodiment. In the example of FIG. 15, the blob information storage unit 125 includes items such as "blob ID", "node ID", and "vector information".

The "blob ID" is identification information for identifying a blob. The "node ID" indicates the nodes (objects) associated with the blob identified by the blob ID. The "vector information" is the vector information of the blob. For example, the "vector information" is the vector corresponding to the centroid of the blob.

In the example shown in FIG. 15, the nodes (objects) associated with the blob identified by the blob ID "BL1" (blob BL1) are the nodes N7, N9, N85, N126, and so on. The blob BL1 is associated with the multidimensional vector information "51, 4, 102, 33, ...".

Likewise, the nodes (objects) associated with the blob identified by the blob ID "BL9" (blob BL9) are the nodes N1, N4, N5, and so on. The blob BL9 is associated with the multidimensional vector information "12, 55, 12, 6, ...".

The blob information storage unit 125 is not limited to the above, and may store various other information depending on the purpose. The blob information storage unit 125 may store a flag indicating whether each blob has already been a target of distance calculation. For example, the blob information storage unit 125 sets the flag of each blob to either a value indicating that the blob has not been processed as a target of distance calculation (also referred to as the "unprocessed flag value") or a value indicating that it has been processed as a target of distance calculation (also referred to as the "processed flag value"). For example, at the start of the search process, the blob information storage unit 125 sets the flag of every blob to the value indicating that distance calculation for that blob is unprocessed (for example, 0), and changes the flag of each blob that becomes a target of distance calculation to the value indicating that distance calculation has been processed (for example, 1).

For example, the information processing apparatus 100A refers to the flag of a blob to be processed and determines whether to execute the batch distance calculation for that blob. When the flag of the blob is the unprocessed flag value, the information processing apparatus 100A determines that the batch distance calculation has not yet been performed for the blob, and executes the batch distance calculation for the nodes belonging to the blob. When the flag of the blob is the processed flag value, the information processing apparatus 100A determines that the batch distance calculation has already been performed for the blob, and does not perform it again for the nodes belonging to the blob.
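The flag handling described above could be sketched as follows. This is only an illustration under assumptions: the class name `BlobFlags` and the method `claim` are hypothetical, and the values 0 and 1 follow the example values given in the text.

```python
UNPROCESSED, PROCESSED = 0, 1  # example flag values from the description

class BlobFlags:
    """Per-search record of which blobs already had their batch distance
    calculation performed (hypothetical helper, not part of the source)."""

    def __init__(self, blob_ids):
        # At the start of a search every blob is marked unprocessed.
        self.flags = {blob_id: UNPROCESSED for blob_id in blob_ids}

    def claim(self, blob_id):
        """Return True the first time a blob is reached and mark it
        processed; return False on every later visit."""
        if self.flags[blob_id] == PROCESSED:
            return False
        self.flags[blob_id] = PROCESSED
        return True

flags = BlobFlags(["BL1", "BL9"])
if flags.claim("BL9"):
    pass  # the batch distance calculation for blob BL9 would run here
```

Creating a fresh `BlobFlags` object per query corresponds to resetting all flags at the start of the search process.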

(Control unit 130A)
Returning to the description of FIG. 13, the control unit 130A is a controller, and is realized by, for example, a CPU, an MPU, or a GPU executing various programs (corresponding to an example of the information processing program) stored in a storage device inside the information processing apparatus 100A, using the RAM as a work area. The control unit 130A may also be realized by, for example, an integrated circuit such as an ASIC or an FPGA.

As shown in FIG. 13, the control unit 130A includes an acquisition unit 131, a generation unit 132A, a search processing unit 133A, and a provision unit 134, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130A is not limited to the configuration shown in FIG. 13, and may be any other configuration that performs the information processing described later.

(Generation unit 132A)
The generation unit 132A generates various information in the same manner as the generation unit 132.

The generation unit 132A may generate information on vector quantization such as that shown in the quantization information storage unit 123. For example, the generation unit 132 generates information indicating that the node N1 (object OB1) belongs to the blob BL9. The generation unit 132 may also generate information about blobs such as that shown in the blob information storage unit 125. For example, the generation unit 132 generates information indicating that the nodes N7, N9, N85, N126, and so on belong to the blob BL1. When the information processing apparatus 100 acquires the information shown in the graph information storage unit 122, the quantization information storage unit 123A, the codebook information storage unit 124, and the blob information storage unit 125 from an external device such as the information providing device 50, the information processing apparatus 100 need not include the generation unit 132A.

(Search processing unit 133A)
The search processing unit 133A performs various processes related to the search process in the same manner as the search processing unit 133.

In a search process that searches for nodes in the vicinity of a search query among the group of nodes corresponding to each of a plurality of objects, the search processing unit 133A uses information on a plurality of blobs into which the objects are classified, and calculates the distances between the search query and a plurality of nodes belonging to one blob using the vector-quantized vector information of those nodes. The search processing unit 133A parallelizes the calculation of the distances between the plurality of nodes and the search query and performs it as a batch. The search processing unit 133A processes in parallel the distance calculations between the search query and a number of nodes equal to the batch processing number, which is determined based on the specifications of the information processing apparatus 100A.
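One common way to compute distances from vector-quantized data is a per-query lookup table, sketched below in plain Python. The sketch assumes product-quantized codes and squared Euclidean distance; the names `build_tables`, `batch_distances`, and `batch_size` are hypothetical, and the sequential inner loop merely stands in for the lanes that a SIMD or GPU implementation would process in parallel.

```python
def build_tables(query, codebooks):
    """For each sub-vector position, precompute the squared distance between
    the query's sub-vector and every centroid of that position's codebook."""
    sub_dim = len(query) // len(codebooks)
    tables = []
    for j, codebook in enumerate(codebooks):
        q_sub = query[j * sub_dim:(j + 1) * sub_dim]
        tables.append([sum((a - b) ** 2 for a, b in zip(q_sub, centroid))
                       for centroid in codebook])
    return tables

def batch_distances(node_codes, tables, batch_size=4):
    """Approximate squared distances for all nodes of one blob.

    `node_codes` maps node ID -> list of codebook indices. Nodes are handled
    in chunks of `batch_size` (the assumed batch processing number)."""
    node_ids = list(node_codes)
    distances = {}
    for start in range(0, len(node_ids), batch_size):
        for node_id in node_ids[start:start + batch_size]:
            codes = node_codes[node_id]
            distances[node_id] = sum(tables[j][c] for j, c in enumerate(codes))
    return distances
```

Because the tables depend only on the query, they are built once per search and reused for every blob, so each node costs only one table lookup per sub-vector.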

The search processing unit 133A calculates the distances between the search query and a plurality of nodes belonging to one blob adjacent to the blob to which the search query corresponds. In a search process that searches for nodes in the vicinity of the search query using a graph in which a plurality of nodes are connected by edges, the search processing unit 133A calculates the distances between the plurality of nodes and the search query. The search processing unit 133A calculates the distances between the search query and a plurality of nodes belonging to the one blob to which a connection node belongs, the connection node being a node connected by an edge from the target node being processed in the search process.

The search processing unit 133A calculates the distances between the search query and a plurality of nodes belonging to one blob other than the blob to which the target node belongs. The search processing unit 133A calculates the distances between the plurality of nodes and the search query in a search process that searches for nodes in the vicinity of the search query using a blob graph, in which each of the plurality of nodes is connected to blobs other than the blob to which that node belongs. The blob graph is a converted graph generated from a pre-conversion graph in which the plurality of nodes are connected by edges: for each node, an edge is connected from that node to each blob that contains a node to which the node has an edge in the pre-conversion graph, excluding the blob to which the node itself belongs. The search processing unit 133A uses this blob graph in the search process to calculate the distances between the plurality of nodes and the search query.

The search processing unit 133A calculates the distances between the search query and a plurality of nodes belonging to one blob to which an edge is connected from the target node being processed in the search process. The search processing unit 133A uses information indicating processed blobs, that is, blobs that have already been processed, to determine whether to process the one blob. When the one blob is a processed blob, the search processing unit 133A determines that it is not to be processed.

For predetermined nodes among the nodes processed in the search process, the search processing unit 133A calculates a second distance, which is computed without vector quantization, unlike the first distance, which is computed with vector quantization. The search processing unit 133A extracts, as neighborhood candidate nodes, a second number of nodes that is larger than the first number of nodes to be extracted as nodes in the vicinity of the search query, and calculates the second distance for the neighborhood candidate nodes. The search processing unit 133A extracts, as the nodes in the vicinity of the search query, the first number of neighborhood candidate nodes in ascending order of the second distance.
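The two-stage selection described above can be sketched as follows, assuming Euclidean distance for the second distance. The function name `rerank` and its parameters are hypothetical; `k` corresponds to the first number and `k_candidates` to the second number.

```python
import heapq
from math import dist

def rerank(query, first_distance, vectors, k, k_candidates):
    """Pick `k_candidates` nodes by approximate (first) distance, recompute an
    exact (second) distance for them, and return the `k` closest node IDs.

    `first_distance` maps node ID -> distance from quantized vectors;
    `vectors` maps node ID -> original, unquantized vector."""
    if k_candidates < k:
        raise ValueError("k_candidates must be at least k")
    candidates = heapq.nsmallest(k_candidates, first_distance,
                                 key=first_distance.get)
    second_distance = {n: dist(query, vectors[n]) for n in candidates}
    return heapq.nsmallest(k, second_distance, key=second_distance.get)

# Hypothetical data in which quantization error misorders the nodes.
vectors = {"N1": (0, 0), "N2": (1, 1), "N3": (5, 5), "N4": (0.5, 0)}
first_distance = {"N1": 0.4, "N2": 0.3, "N3": 9.0, "N4": 0.5}
neighbors = rerank((0, 0), first_distance, vectors, k=2, k_candidates=3)
```

In this toy data the first distance alone would rank N2 ahead of N4, while the exact recomputation over the three candidates returns N1 and N4. The exact distance is computed only for the small candidate set, which keeps its extra cost bounded.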

The search processing unit 133A calculates the second distance for nodes whose first distance is within a predetermined threshold. The search processing unit 133A calculates the second distance for nodes within the search range, which indicates the range from which neighboring nodes are extracted. The search processing unit 133A calculates the second distance for nodes within the exploration range, which indicates the range covered by the search process.

[2-3. Search processing example]
An example of the search process according to the second embodiment will now be described with reference to FIG. 16. FIG. 16 is a flowchart showing an example of the search process according to the second embodiment. The search process described below is performed by the search processing unit 133A of the information processing apparatus 100A. Description of points that are the same as in the first embodiment, such as FIG. 11, is omitted as appropriate. For example, in FIG. 16, steps that are the same as in FIG. 11 are given the same step numbers, and their description is omitted as appropriate.

Here, the neighborhood object set N(G, y) is the set of neighboring objects associated with the node y by the edges assigned to it. "G" may be predetermined graph data (for example, the graph GR21 shown in the spatial information SP21). For example, the information processing apparatus 100A executes a k-nearest neighbor search process.

For example, the search process shown in FIG. 16 differs from the search process shown in FIG. 11 in that the process of step S304a is performed after step S304. In other words, the search process shown in FIG. 16 is an ordinary graph search in which, when a blob that has not yet been accessed is reached, the distances between the query and the objects (nodes) in that blob are calculated in a batch, and each node is then processed.

Specifically, in the search process of FIG. 16, when the distance calculation has not yet been performed for the blob to which the object s belongs, the information processing apparatus 100A executes all of the following (the four processes listed after "-" in step S304a of FIG. 16, hereinafter referred to as the "first process" to the "fourth process") (step S304a). As the first process, the information processing apparatus 100A calculates the distances for the objects in the blob in a batch. As the second process, the information processing apparatus 100A sets the distance calculation flag of the blob. As the third process, the information processing apparatus 100A stores in C all the objects in the blob for which the batch distance calculation was performed. As the fourth process, the information processing apparatus 100A performs the processing of steps S307 to S314 for each object in the blob in turn, taking that object as u. After that, the information processing apparatus 100A performs the processing from step S306 onward.

[2-4. Modification example]
A modification will now be described. In the modification of the second embodiment, a graph that incorporates the concept of blobs may be used. For example, a graph in which the edges from a node refer to blobs may be used. Description of points that are the same as in the first or second embodiment is omitted as appropriate. The information processing apparatus 100A according to the modification has a graph information storage unit 122A instead of the graph information storage unit 122.

[2-4-1. Information processing]
First, an outline of the information processing according to the modification will be described with reference to FIG. 17. FIG. 17 is a diagram showing an example of the information processing according to the modification.

The information processing apparatus 100A converts the graph GR21, in which nodes are connected to one another by edges, into the graph GR31, in which the edges from a node refer to blobs (step S31). That is, as shown in FIG. 17, the information processing apparatus 100A generates the graph GR31, a graph into which the concept of blobs is introduced (a blob graph), from the graph GR21, which is an ordinary graph.

In the example of FIG. 17, the information processing apparatus 100A generates the graph GR31, which includes directed edges from nodes to blobs, as shown in the spatial information SP31. The edges between nodes are deleted, so the graph GR31 contains directed edges from nodes to blobs and no edges between nodes. Each arrow in FIG. 17 indicates a directed edge from the node at the tail of the arrow to the blob at its head; that is, the node at the tail is the reference source and the blob at the head is the reference destination. For example, FIG. 17 shows that edges are connected from the node N1 to two blobs, the blob BL8 and the blob BL10.

For example, the information processing apparatus 100A converts each edge of a node from an edge to a connection node (neighboring node) into an edge to the blob to which that connection node belongs. The information processing apparatus 100A does not generate an edge from a node to the blob to which the node itself belongs. In the example of FIG. 17, because the node N9 and the node N126 belong to the same blob BL1, the information processing apparatus 100A generates neither an edge from the node N9 to the blob BL1 nor an edge from the node N126 to the blob BL1. On the other hand, because the node N9 and the node N18 belong to different blobs, BL1 and BL4 respectively, the information processing apparatus 100A generates an edge from the node N9 to the blob BL4 and an edge from the node N18 to the blob BL1.
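The conversion rule can be sketched in a few lines of Python. The dictionary-based representation and the function name `to_blob_graph` are assumptions made for illustration; the toy edges below are chosen only to match the N9, N126, and N18 example and do not reproduce FIG. 17.

```python
def to_blob_graph(node_edges, node_blob):
    """Convert node-to-node edges into node-to-blob edges.

    `node_edges` maps node ID -> list of connected node IDs;
    `node_blob` maps node ID -> ID of the blob the node belongs to.
    Edges into the node's own blob are dropped and duplicates are merged."""
    blob_graph = {}
    for node, neighbors in node_edges.items():
        own_blob = node_blob[node]
        targets = []
        for neighbor in neighbors:
            blob = node_blob[neighbor]
            if blob != own_blob and blob not in targets:
                targets.append(blob)
        blob_graph[node] = targets
    return blob_graph

# Hypothetical toy graph consistent with the example in the text.
node_blob = {"N9": "BL1", "N126": "BL1", "N18": "BL4"}
node_edges = {"N9": ["N126", "N18"], "N126": ["N9"], "N18": ["N9"]}
blob_graph = to_blob_graph(node_edges, node_blob)
```

Here N9 keeps a single edge to BL4, N18 gets an edge to BL1, and N126 ends up with no outgoing edge because its only neighbor shares its blob.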

In the example of FIG. 17, the information processing apparatus 100A performs the search process using the graph GR31, which includes directed edges from nodes (objects) to blobs, as shown in the spatial information SP31. For example, the information processing apparatus 100A obtains the search result for the search query QE2 by performing, for the search query QE2, a search process such as that shown in FIG. 19 using the graph GR21. Details of the search process shown in FIG. 19 will be described later. The search process in FIG. 17 has in common with FIG. 12 that it uses the concept of blobs, and is the same as the search process shown in FIG. 12 except for the processing related to edges.

[2-4-2. Graph]
Next, an outline of the graph according to the modification will be described with reference to FIG. 18. FIG. 18 is a diagram showing an example of the graph information storage unit according to the modification. For example, the graph information storage unit 122A shown in FIG. 18 stores a graph into which the concept of blobs is introduced (a blob graph). The graph information storage unit 122A shown in FIG. 18 has items such as "node ID", "object ID", and "blob information". The graph information storage unit 122A according to the modification thus differs from the graph information storage unit 122 of FIG. 7 in that it has "blob information" instead of "connection node information". Description of points that are the same as the graph information storage unit 122 of FIG. 7 is omitted as appropriate.

The "blob information" is information about the blobs that can be reached from the corresponding node (reference destination blobs). For example, the "blob information" includes information such as "reference destination". The "reference destination" is information for identifying a reference destination (blob) that is connected by an edge and can be reached from the node. That is, in the example of FIG. 18, each node ID (object ID) identifying a node is registered in association with the reference destinations (blobs) that can be reached from that node by an edge. The "blob information" may also include information for identifying the edge connected to the reference destination (an edge ID) and the like.

In the example of FIG. 18, the node identified by the node ID "N1" (node N1) corresponds to the object (target) identified by the object ID "OB1". An edge is connected from the node N1 to the blob identified by the blob ID "BL8" (blob BL8), so the blob BL8 can be reached from the node N1. An edge is also connected from the node N1 to the blob identified by the blob ID "BL10" (blob BL10), so the blob BL10 can be reached from the node N1.

The node identified by the node ID "N2" (node N2) corresponds to the object (target) identified by the object ID "OB2". An edge is connected from the node N2 to the blob identified by the blob ID "BL5" (blob BL5), so the blob BL5 can be reached from the node N2.

The graph information storage unit 122A is not limited to the above, and may store various other information depending on the purpose.

The graph may also include a program module that takes a query as input, searches for nodes by following the edges in the graph, and extracts and outputs nodes similar to the query. That is, the graph may be intended for use as a program module that performs search processing using the graph. For example, the graph GR31 may be a program that, when vector data is input as a query, extracts from the graph and outputs the nodes corresponding to vector data similar to the input. For example, the graph GR31 may be data used as a program module for searching for images similar to a query image. For example, the graph GR31 causes a computer to function so as to extract and output, based on an input query, the nodes in the graph that are similar to the query.

[2-4-3. Search processing example]
An example of the search process according to the modification will now be described with reference to FIG. 19. FIG. 19 is a flowchart showing an example of the search process according to the modification. The search process described below is performed by the search processing unit 133A of the information processing apparatus 100A. Description of points that are the same as in the first or second embodiment, such as FIG. 11 and FIG. 16, is omitted as appropriate. For example, in FIG. 19, steps that are the same as in FIG. 11 are given the same step numbers, and their description is omitted as appropriate.

The search process of FIG. 19 differs from those of FIG. 11 and FIG. 16 in that the set N(G, y) is the set of blobs associated with the node y by the edges assigned to it. In FIG. 19, N(G, s) and C are blob sets. That is, the "neighborhood object set N" of FIG. 11 is read as the "blob set N" in FIG. 19, and the "object set C" of FIG. 11 is read as the "blob set C" in FIG. 19. In the processes affected by this change to blobs, the word "object" is read as "blob" as appropriate. "G" may be data of a graph into which the concept of blobs is introduced (a blob graph), for example the graph GR31 shown in the spatial information SP31. For example, the information processing apparatus 100A executes a k-nearest neighbor search process.

For example, in the search process shown in FIG. 19, the process of step S306 of FIG. 11 is changed as follows. In the search process of FIG. 19, the information processing apparatus 100A selects from N(G, s) one blob that is not included in the blob set C, stores the selected blob in the blob set C, calculates the distances for the objects in the blob in a batch, and takes the set of those objects as B (also referred to as the "object set B in the target blob") (step S306).

The search process shown in FIG. 19 also differs from the search process shown in FIG. 11 in that the process of step S306a is performed after step S306. Specifically, in the search process of FIG. 19, the information processing apparatus 100A selects one object u from the object set B in the target blob (step S306a). After that, the information processing apparatus 100A performs the processing from step S307 onward.

The search process shown in FIG. 19 further differs from that of FIG. 11 in that step S314a is performed before step S315. Specifically, the information processing apparatus 100A determines whether every object in the in-blob object set B has been selected (step S314a).

If every object in the in-blob object set B has been selected (step S314a: Yes), the information processing apparatus 100A performs step S315. If not (step S314a: No), the information processing apparatus 100A returns to step S306a and repeats the processing.
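The blob-specific steps above (S306, S306a, S314a) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name, the dictionary-based stores, the use of squared Euclidean distance, and the outer loop over neighboring blobs are all assumptions made for illustration, and the steps from S307 onward, which FIG. 11 defines, are reduced to a placeholder.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scan_neighbor_blobs(
    query: Vector,
    neighbor_blobs: List[str],           # N(G, s): blobs linked from node s
    blob_members: Dict[str, List[str]],  # blob id -> ids of the objects it holds
    vectors: Dict[str, Vector],          # object id -> vector (hypothetical store)
    visited_blobs: Set[str],             # blob set C
) -> List[Tuple[float, str]]:
    scored: List[Tuple[float, str]] = []
    for blob in neighbor_blobs:
        if blob in visited_blobs:
            continue                     # S306: only blobs not yet in C qualify
        visited_blobs.add(blob)          # S306: store the selected blob in C
        # S306: handle every object of the blob in one pass, giving the set B
        batch = [(squared_distance(query, vectors[o]), o) for o in blob_members[blob]]
        for dist, obj in batch:          # S306a: take one object u from B
            scored.append((dist, obj))   # placeholder for step S307 onward
        # S314a: the inner loop ends once every object in B has been taken
    return sorted(scored)
```

Because C is updated as blobs are selected, a blob that has already been processed is skipped on later visits, so its objects are not scored twice.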

[3. Effects]
As described above, the information processing apparatus 100 according to the first embodiment includes an acquisition unit 131 and a search processing unit 133. The acquisition unit 131 acquires a search query for a plurality of objects subject to data search. In a search process that searches for nodes in the vicinity of the search query using a graph in which a group of nodes corresponding to each of the plurality of objects is connected by edges, the search processing unit 133 calculates the distances between the search query and a plurality of nodes selected according to a predetermined criterion, using vector-quantized vector information of the plurality of nodes.

In this way, the information processing apparatus 100 according to the first embodiment calculates the distances to the search query for a plurality of nodes selected according to a predetermined criterion using vector-quantized information, which reduces the computational load for those nodes. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 parallelizes the calculation of the distances between the plurality of nodes and the search query and performs it as a single batch.

In this way, by calculating the distances to the search query for a plurality of nodes in parallel as a single batch, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 processes in parallel the calculation of the distances between the search query and a number of nodes equal to a batch size determined based on the specifications of the information processing apparatus 100.

In this way, by calculating in parallel, as a single batch, the distances to the search query for as many nodes as the information processing apparatus 100 can process at once, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.
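As an illustration of a batch size tied to the machine's specifications, the sketch below derives a batch size from a register width and splits a node list into batches of that size. The text does not say which specification is used; the register-width rule and both function names are assumptions.

```python
from typing import Iterator, List, Sequence


def batch_size_from_register(register_bits: int, bits_per_value: int) -> int:
    """Hypothetical rule: number of values that fit in one register."""
    return max(1, register_bits // bits_per_value)


def batches(node_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most batch_size node ids."""
    for start in range(0, len(node_ids), batch_size):
        yield list(node_ids[start:start + batch_size])
```

Under this assumed rule, a 128-bit register holding 8-bit values would give batches of 16 nodes, each of which would be handed to one batched distance calculation.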

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using a codebook associated with the representative vector to which each of the plurality of nodes corresponds.

In this way, by using a codebook associated with the representative vector to which each of the plurality of nodes corresponds, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.
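A minimal sketch of how a codebook can reduce the distance work, assuming each node stores only the index of its representative vector and that squared Euclidean distance is used (both are illustrative assumptions, as are the names): the query is compared with each representative vector once, and every node then reuses the result for its own representative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantized_distances(
    query: Vector,
    codebook: List[Vector],     # index -> representative vector
    node_codes: Dict[str, int], # node id -> index of its representative vector
) -> Dict[str, float]:
    # One distance per representative vector, computed once per query.
    table = [squared_distance(query, rep) for rep in codebook]
    # Each node's approximate distance is a table lookup.
    return {node: table[code] for node, code in node_codes.items()}
```

The number of full distance computations is bounded by the codebook size rather than by the number of nodes examined.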

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the search query and a plurality of nodes connected by edges from one node.

In this way, by performing the distance calculation with vector-quantized information for the plurality of nodes that can be reached from a given node, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using node information in which reference information, indicating the plurality of nodes and the representative vector associated with each of them, is stored in association with the one node.

In this way, by using node information in which reference information indicating the plurality of nodes and the representative vector associated with each of them is stored in association with the one node, the information processing apparatus 100 according to the first embodiment can access the information efficiently. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using vector information of the plurality of nodes, each of which is divided into a plurality of partial vectors by product quantization.

In this way, by calculating the distances between the plurality of nodes and the search query using the partial vectors obtained by product quantization, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using a plurality of codebooks, each associated with the representative vectors corresponding to the vectors at one division position of the partial vectors into which each of the plurality of nodes is divided.

In this way, by using codebooks associated with the representative vectors corresponding to the vectors at each division position of the partial vectors, the information processing apparatus 100 according to the first embodiment can enable efficient search processing.
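The per-position codebooks can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes squared Euclidean distance, a vector dimension divisible by the number of division positions, and hypothetical names throughout. The query is split the same way as the node vectors, one lookup table is built per division position, and a node's approximate distance is the sum of one table entry per position.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector: Vector, parts: int) -> List[Vector]:
    """Split a vector into equal-length partial vectors (len must divide evenly)."""
    size = len(vector) // parts
    return [vector[i * size:(i + 1) * size] for i in range(parts)]


def build_tables(query: Vector, codebooks: List[List[Vector]]) -> List[List[float]]:
    """tables[p][c]: distance from the query's p-th part to code c of codebook p."""
    sub_queries = split(query, len(codebooks))
    return [
        [squared_distance(sub_query, rep) for rep in book]
        for sub_query, book in zip(sub_queries, codebooks)
    ]


def pq_distance(tables: List[List[float]], codes: Sequence[int]) -> float:
    """Approximate distance of one node given its code at each division position."""
    return sum(table[code] for table, code in zip(tables, codes))
```

With squared Euclidean distance the per-position terms add up exactly to the distance between the query and the node's quantized vector, which is why the lookup-and-sum is valid.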

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using node information in which reference information, indicating the plurality of nodes and the representative vector associated with each partial vector of each of those nodes, is stored in association with the one node.

In this way, by using node information in which reference information indicating the plurality of nodes and the representative vector associated with each partial vector of each of those nodes is stored in association with the one node, the information processing apparatus 100 according to the first embodiment can access the information efficiently. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using reference information in which each of the plurality of nodes is associated with a list of the representative vectors associated with that node's partial vectors.

In this way, by using reference information in which each of the plurality of nodes is associated with a list of the representative vectors associated with that node's partial vectors, the information processing apparatus 100 according to the first embodiment can access the information efficiently. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using reference information that includes transposed information, in which the representative vectors corresponding to the partial vectors of each of the plurality of nodes are listed by division position, and correspondence information indicating the position in the list to which each of the plurality of nodes corresponds.

In this way, by using reference information that includes the transposed information listed by division position and the correspondence information indicating each node's position in the list, the information processing apparatus 100 according to the first embodiment can access the information efficiently. The information processing apparatus 100 can therefore enable efficient search processing.
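One way to picture the transposed layout is sketched below, under the assumption that the transposed information is a list of codes per division position and the correspondence information is a mapping from node id to list index. The names and data shapes are hypothetical; `tables` is assumed to hold, per division position, the precomputed distance from the query's partial vector to each representative vector.

```python
from typing import List, Sequence


def batch_pq_distances(
    tables: Sequence[Sequence[float]],         # tables[p][code] -> partial distance
    transposed_codes: Sequence[Sequence[int]], # transposed_codes[p][i] -> code of node i
) -> List[float]:
    """Accumulate approximate distances for all listed nodes, one position at a time."""
    totals = [0.0] * len(transposed_codes[0])
    for table, codes_at_position in zip(tables, transposed_codes):
        # All nodes' codes for this division position sit next to each other.
        for i, code in enumerate(codes_at_position):
            totals[i] += table[code]
    return totals
```

For example, with `position_of = {"n7": 0, "n8": 1, "n9": 2}` as the correspondence information, the distance for node `n8` would be read from `totals[position_of["n8"]]`. Keeping the codes for one division position contiguous is the kind of layout that lends itself to batched processing, although the sketch itself runs sequentially.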

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the distances between the plurality of nodes and the search query using a single codebook associated with the representative vectors to which the partial vectors of each of the plurality of nodes correspond.

In this way, by using a single codebook shared across all division positions of the partial vectors, the information processing apparatus 100 according to the first embodiment can suppress an increase in storage.
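The storage saving can be estimated with simple arithmetic. The helper below is a hypothetical illustration; it assumes every codebook holds the same number of representative vectors and that the vector dimension divides evenly by the number of division positions.

```python
def codebook_values(dim: int, parts: int, centroids: int, shared: bool) -> int:
    """Number of stored vector components across all codebooks."""
    sub_dim = dim // parts          # length of one partial vector
    books = 1 if shared else parts  # one shared codebook, or one per position
    return books * centroids * sub_dim
```

Under these assumptions, 128-dimensional vectors split into 8 parts with 256 representative vectors per codebook need 32768 stored components with per-position codebooks and 4096 with a shared one.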

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates, for predetermined nodes among the nodes processed in the search process, a second distance that is not vector-quantized, as distinct from a first distance that is vector-quantized.

In this way, by performing the search process with vector-quantized information while calculating non-quantized distances for the predetermined nodes, the information processing apparatus 100 according to the first embodiment can maintain search accuracy while suppressing an increase in search processing time. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 extracts, as neighborhood candidate nodes, a second number of nodes that is larger than the first number of nodes to be extracted as nodes in the vicinity of the search query, and calculates the second distance for the neighborhood candidate nodes.

In this way, by extracting as candidates more nodes than the number to be returned as search results and calculating the non-quantized distances of those candidates, the information processing apparatus 100 according to the first embodiment can maintain search accuracy while suppressing an increase in search processing time. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 extracts, from the neighborhood candidate nodes, the first number of nodes in ascending order of second distance as the nodes in the vicinity of the search query.

In this way, by taking as the neighboring nodes only as many candidates as are to be returned as search results, in ascending order of non-quantized distance, the information processing apparatus 100 according to the first embodiment can return neighboring nodes whose distances have been calculated exactly.
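The two-stage selection described above can be sketched as follows, assuming the first (quantized) distances are already available per node and squared Euclidean distance serves as the second (non-quantized) distance. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rerank(
    query: Vector,
    first_distance: Dict[str, float],  # node id -> vector-quantized distance
    vectors: Dict[str, Vector],        # node id -> original vector
    first_number: int,                 # nodes to return
    second_number: int,                # candidates to re-score
) -> List[str]:
    if second_number <= first_number:
        raise ValueError("second_number must be larger than first_number")
    # Stage 1: keep the second_number best nodes by quantized distance.
    candidates = sorted(first_distance, key=lambda n: first_distance[n])[:second_number]
    # Stage 2: re-score only those candidates with the non-quantized distance.
    exact = sorted((squared_distance(query, vectors[n]), n) for n in candidates)
    return [node for _, node in exact[:first_number]]
```

Only `second_number` exact distance computations are performed, however many nodes the graph traversal touched, and the final ordering no longer depends on quantization error among the candidates.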

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the second distance for nodes whose first distance is within a predetermined threshold.

In this way, by calculating the non-quantized distance only for nodes whose vector-quantized distance is within the threshold, the information processing apparatus 100 according to the first embodiment can maintain search accuracy while suppressing an increase in search processing time. The information processing apparatus 100 can therefore enable efficient search processing.
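A threshold-based variant might look like the sketch below, again assuming squared Euclidean distance as the second distance and hypothetical names. Only nodes whose first distance is at or below the threshold are re-scored.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_within_threshold(
    query: Vector,
    first_distance: Dict[str, float],  # node id -> vector-quantized distance
    vectors: Dict[str, Vector],        # node id -> original vector
    threshold: float,
) -> Dict[str, float]:
    """Second (non-quantized) distance for nodes whose first distance passes the threshold."""
    return {
        node: squared_distance(query, vectors[node])
        for node, dist in first_distance.items()
        if dist <= threshold
    }
```

Nodes that fail the threshold never have their original vectors read, which is where the saving in processing time would come from.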

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the second distance for nodes within a search range, which indicates the range from which nodes are extracted as neighboring nodes.

In this way, by calculating the non-quantized distance only for nodes whose vector-quantized distance is within the search range, the information processing apparatus 100 according to the first embodiment can maintain search accuracy while suppressing an increase in search processing time. The information processing apparatus 100 can therefore enable efficient search processing.

Further, in the information processing apparatus 100 according to the first embodiment, the search processing unit 133 calculates the second distance for nodes within an exploration range, which indicates the range covered by the search process.

In this way, by calculating the non-quantized distance only for nodes whose vector-quantized distance is within the exploration range, the information processing apparatus 100 according to the first embodiment can maintain search accuracy while suppressing an increase in search processing time. The information processing apparatus 100 can therefore enable efficient search processing.

[4. Hardware Configuration]
The information processing apparatuses 100 and 100A according to the embodiments described above are realized by, for example, a computer 1000 configured as shown in FIG. 20. FIG. 20 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via the network N and passes it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the network N.

The CPU 1100 controls output devices such as a display and a printer and input devices such as a keyboard and a mouse via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600, and outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing apparatus 100 according to the first embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing the programs loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via the network N.

Several embodiments of the present application have been described in detail above with reference to the drawings, but they are examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the Disclosure of the Invention section.

[5. Others]
Of the processes described in the above embodiments, all or part of a process described as being performed automatically can be performed manually, and all or part of a process described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and the drawings can be changed as desired unless otherwise specified. For example, the various kinds of information shown in each figure are not limited to the illustrated information.

Each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The processes described in the above embodiments can be combined as appropriate to the extent that the processing contents do not contradict each other.

The "units" (section, module, unit) described above can be read as "means", "circuits", or the like. For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

1 Information processing system
10 Terminal device
50 Information providing device
100, 100A Information processing apparatus
120, 120A Storage unit
121 Object information storage unit
122, 122A Graph information storage unit
123, 123A Quantization information storage unit
124 Codebook information storage unit
125 Blob information storage unit
130, 130A Control unit
131 Acquisition unit
132, 132A Generation unit
133, 133A Search processing unit
134 Providing unit

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a search query for a plurality of objects subject to data search; and
a search processing unit that, in a search process of searching for nodes in the vicinity of the search query using a graph in which a group of nodes corresponding to each of the plurality of objects is connected by edges, calculates the distances between the search query and a plurality of nodes selected according to a predetermined criterion, using vector-quantized vector information of the plurality of nodes.

The information processing apparatus according to claim 1, wherein the search processing unit performs the calculation of the distances between the plurality of nodes and the search query in parallel.

The information processing apparatus according to claim 2, wherein the search processing unit processes in parallel the calculation of the distances between the search query and the plurality of nodes, the number of which is a batch size determined based on the specifications of the information processing apparatus.

The information processing apparatus according to any one of claims 1 to 3, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using a codebook associated with the representative vector to which each of the plurality of nodes corresponds.

The information processing apparatus according to claim 4, wherein the search processing unit calculates the distances between the search query and the plurality of nodes connected by edges from one node.

The information processing apparatus according to claim 5, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using node information in which reference information, indicating the plurality of nodes and the representative vector associated with each of the plurality of nodes, is stored in association with the one node.

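The information processing apparatus according to claim 6, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using the vector information of the plurality of nodes, each of which is divided into a plurality of partial vectors by product quantization.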

The information processing apparatus according to claim 7, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using a plurality of codebooks, each associated with the representative vectors corresponding to the vectors at one division position of the plurality of partial vectors into which each of the plurality of nodes is divided.

The information processing apparatus according to claim 8, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using node information in which reference information, indicating the plurality of nodes and the representative vector associated with each of the plurality of partial vectors of each of the plurality of nodes, is stored in association with the one node.

The information processing apparatus according to claim 9, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using the reference information in which each of the plurality of nodes is associated with a list of the representative vectors associated with each of the plurality of partial vectors of that node.

The information processing apparatus according to claim 9, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using the reference information including transposed information, in which the representative vectors corresponding to each of the plurality of partial vectors of each of the plurality of nodes are listed by division position, and correspondence information indicating the position in the list to which each of the plurality of nodes corresponds.

The information processing apparatus according to claim 7, wherein the search processing unit calculates the distances between the plurality of nodes and the search query using a single codebook associated with the representative vectors to which each of the plurality of partial vectors of each of the plurality of nodes corresponds.

The information processing apparatus according to any one of claims 1 to 12, wherein the search processing unit calculates, for predetermined nodes among the nodes processed in the search process, a second distance that is not vector-quantized, as distinct from a first distance that is a vector-quantized distance.

The information processing apparatus according to claim 13, wherein the search processing unit extracts, as neighborhood candidate nodes, a second number of nodes that is larger than a first number of nodes to be extracted in the search process as nodes in the vicinity of the search query, and calculates the second distance for the neighborhood candidate nodes.

The information processing apparatus according to claim 14, wherein the search processing unit extracts, from the neighborhood candidate nodes, the first number of nodes in ascending order of the second distance as the nodes in the vicinity of the search query.

The information processing apparatus according to any one of claims 13 to 15, wherein the search processing unit calculates the second distance for nodes whose first distance is within a predetermined threshold.

The information processing apparatus according to any one of claims 13 to 16, wherein the search processing unit calculates the second distance for nodes within a search range indicating the range from which nodes are extracted as the nodes in the vicinity.

The information processing apparatus according to any one of claims 13 to 17, wherein the search processing unit calculates the second distance for nodes within an exploration range indicating the range covered by the search process.

An information processing method executed by a computer, the method comprising:
an acquisition step of acquiring a search query for a plurality of objects subject to data search; and
a search processing step of, in a search process of searching for nodes in the vicinity of the search query using a graph in which a group of nodes corresponding to each of the plurality of objects is connected by edges, calculating the distances between the search query and a plurality of nodes selected according to a predetermined criterion, using vector-quantized vector information of the plurality of nodes.

An information processing program that causes a computer to execute:
an acquisition procedure of acquiring a search query for a plurality of objects subject to data search; and
a search processing procedure of, in a search process of searching for nodes in the vicinity of the search query using a graph in which a group of nodes corresponding to each of the plurality of objects is connected by edges, calculating the distances between the search query and a plurality of nodes selected according to a predetermined criterion, using vector-quantized vector information of the plurality of nodes.