JP6190324B2

JP6190324B2 - SEARCH DEVICE, SEARCH METHOD, AND SEARCH PROGRAM

Info

Publication number: JP6190324B2
Application number: JP2014119115A
Authority: JP
Inventors: 靖宏藤原
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-06-09
Filing date: 2014-06-09
Publication date: 2017-08-30
Anticipated expiration: 2034-06-09
Also published as: JP2015232782A

Description

The present invention relates to a search device and the like.

In recent years, image search has become widespread with the growth of web services such as Flickr (registered trademark), and image search techniques are being actively researched. The purpose of image search is to find images that have the same meaning as a query image.

In image search, therefore, how to rank images according to the user's intent is an important problem. The earliest image search techniques, proposed in the 1970s, are based on keywords. Keyword-based image search is still used in many web services today. However, there is a semantic gap between text and images.

To overcome this semantic gap, a search technique that retrieves the images most visually similar to the query image was proposed in the 1990s. This technique ranks the images stored in a database by a similarity computed from low-level features such as shape and color, and returns the images with the highest similarity as the search result. Compared with keyword-based search, it has the advantage that image features can be extracted automatically. However, low-level features cannot fully reflect what the user intends, so a gap remains between the image features and the user's intent, and appropriate search results are not always obtained.

As a technique for solving this gap problem, the use of Manifold Ranking for image search has been proposed. Manifold Ranking is a technique that computes a ranking of data points based on the clusters formed by the data points (typically called manifolds). A technique has also been proposed that solves the above gap problem in image search on the premise that data points belonging to the same cluster have the same meaning.

J. He, M. Li, H. Zhang, H. Tong, and C. Zhang, "Manifold-Ranking Based Image Retrieval", In ACM Multimedia, 2004.
D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schoelkopf, "Ranking on Data Manifolds", In NIPS, 2003.

However, image search using Manifold Ranking suffers from high computational cost. In theory, the ranking score in Manifold Ranking is defined as the minimizer of a cost function described later. With n data points, the optimal solution that minimizes the cost function can be obtained by computing the inverse of an n × n matrix. Since inverting an n × n matrix costs O(n^3), the computational cost of Manifold Ranking is O(n^3). Furthermore, because the inverse matrix must be held, image search using Manifold Ranking requires O(n^2) memory. These problems apply not only to image search but also to data search in general.

An example of an embodiment disclosed in the present application has been made in view of the above, and its object is to perform data search quickly and accurately.

In an example of an embodiment disclosed in the present application, a matrix for calculating the score of each node, which is based on the adjacency matrix of an input graph including a plurality of nodes and edges connecting the nodes, is decomposed into a matrix product including sparse matrices. Then, in response to input of a query node and a number of results, solution nodes whose scores with respect to the query node, obtained from the decomposed matrices, rank within the requested number from the top are selected from the plurality of nodes and output. In addition, the control unit divides the input graph into a plurality of clusters and estimates, for each cluster, an upper bound on the scores of all nodes in the cluster. When selecting the solution nodes, any cluster whose estimated upper bound is less than the minimum score among the elements already in the set of solution nodes (or 0 if the set of solution nodes is empty) is excluded from the clusters from which solution nodes are selected.

According to an example of an embodiment disclosed in the present application, data search can, for example, be performed quickly and accurately.

FIG. 1 is a diagram showing Lemma 1.
FIG. 2 is a diagram showing Theorem 1.
FIG. 3 is a diagram showing Lemma 2.
FIG. 4 is a diagram showing Lemma 3.
FIG. 5 is a diagram showing Lemma 4.
FIG. 6 is a diagram showing Definition 1.
FIG. 7 is a diagram showing Lemma 5.
FIG. 8 is a diagram showing Definition 2.
FIG. 9 is a diagram showing Lemma 6.
FIG. 10 is a diagram showing Lemma 7.
FIG. 11 is a diagram showing Theorem 2.
FIG. 12 is a diagram showing Theorem 3.
FIG. 13 is a block diagram showing an example of the configuration of the search device.
FIG. 14 is a block diagram showing an example of the configuration of the precomputation unit.
FIG. 15 is a block diagram showing an example of the configuration of the search unit.
FIG. 16 is a diagram showing an example of a flowchart of the precomputation process.
FIG. 17 is a diagram showing an example of a flowchart of the search process.
FIG. 18 is a diagram showing an example of an algorithm for the optimization process by node reordering.
FIG. 19 is a diagram showing an example of an algorithm for the top-k search process.
FIG. 20 is a diagram showing an example of a computer that executes the search program.

Before the embodiment is described, the main symbols used in the following description are defined, and the mathematical background of the conventional method and of the embodiment is explained. The embodiment is described after that.

[Definition of main symbols]
The main symbols used in the embodiment are shown in the table below. The same symbols are used throughout the mathematical background of the conventional method, the mathematical background of the embodiment, and the description of the embodiment.

[Mathematical background of the conventional method]
The mathematical background of the conventional method is described below. In Manifold Ranking, data is generally represented with a k-NN graph. Each node of the graph corresponds to a data point. If two data points are k-nearest neighbors, they are connected by an undirected edge. Ranking with a k-NN graph means the following: given a set of nodes U = {u_1, u_2, ..., u_n} ⊂ R^m (R^m: m-dimensional real space) and a query node u_q ∈ U, the nodes in U are ranked by their score with respect to u_q.

Let A ∈ R^{n×n} (R^{n×n}: the set of n × n matrices) be the adjacency matrix of the k-NN graph. A is a symmetric matrix, and the number of edges is O(n). The weight of the edge connecting two nodes u_i and u_j is calculated as A_ij = exp{-d^2(u_i, u_j) / 2σ^2}. Here, d(u_i, u_j) is a distance function between nodes u_i and u_j, for which the Euclidean distance is generally used. For σ, the standard deviation of the values of the distance function is generally used. q is an n × 1 vector whose element q_q corresponding to the query node u_q is 1 and whose other elements are 0.
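The construction of A and q described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `knn_adjacency` and `query_vector` are hypothetical, the sample points are made up, and taking σ as the standard deviation over all distinct pairwise distances is an assumption about which distances are meant.

```python
import math

def knn_adjacency(points, k):
    """Return a symmetric k-NN adjacency matrix (list of lists) with Gaussian weights."""
    n = len(points)
    # Pairwise Euclidean distances d(u_i, u_j).
    dist = [[math.dist(p, r) for r in points] for p in points]
    # sigma: standard deviation of the distances over all distinct pairs (assumption).
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(pairs) / len(pairs)
    sigma = math.sqrt(sum((d - mean) ** 2 for d in pairs) / len(pairs))
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k nearest neighbours of node i, excluding i itself.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            w = math.exp(-dist[i][j] ** 2 / (2.0 * sigma ** 2))
            A[i][j] = w
            A[j][i] = w  # undirected edge, so A stays symmetric
    return A

def query_vector(n, q_index):
    """n x 1 indicator vector: 1 at the query node, 0 elsewhere."""
    return [1.0 if i == q_index else 0.0 for i in range(n)]

# Made-up sample data: three nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
A = knn_adjacency(points, k=2)
q = query_vector(len(points), q_index=0)
```

Because each node contributes at most k edges, the number of nonzero entries in A grows linearly with n, which is the O(n) edge count stated above. A dense list of lists is used here only for readability.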

In Manifold Ranking, the score of each node is defined as the optimal solution that minimizes the cost function defined by equation (1) below. In equation (1), ||*||^2 denotes the squared L2 norm.

The first term on the right-hand side of the cost function f(x) corresponds to the condition that nodes close to each other in the graph have similar scores, and the second term corresponds to the condition that nodes closer to the query node u_q have higher scores. The optimal ranking score x^* is the minimizer of the cost function f(x). It is obtained as in equation (4) below from equation (3), which sets the derivative of the cost function shown in equation (2) to 0.
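Equations (1) to (4) are not reproduced in this text. As a hedged sketch, the standard Manifold Ranking formulation can be written in matrix form as follows. The symbols S, C (the diagonal matrix with C_ii equal to the sum of row i of A) and μ are assumed notation, and the constant factors in the patent's own equation (1) may differ. The term x^T (I - S) x equals one half of the sum over i, j of A_ij (x_i / sqrt(C_ii) - x_j / sqrt(C_jj))^2, which is the smoothness term described above.

```latex
% Sketch only: S, C and \mu are assumed notation; constant factors in the
% patent's equation (1) may differ.
\begin{align}
S &= C^{-1/2} A\, C^{-1/2}, \qquad C_{ii} = \sum_{j} A_{ij} \\
f(x) &= \tfrac{1}{2}\, x^{\top} (I - S)\, x + \tfrac{\mu}{2}\, \lVert x - q \rVert^{2} \\
\nabla f(x) &= (I - S)\, x + \mu\, (x - q) = 0 \\
x^{*} &= (1 - \alpha)\, (I - \alpha S)^{-1} q, \qquad \alpha = \frac{1}{1 + \mu}
\end{align}
```

The last line follows by rearranging the stationarity condition to ((1 + μ)I - S) x = μ q and dividing by 1 + μ. It matches the form (I - αC^{-1/2} A C^{-1/2})^{-1} and the factor (1 - α) that appear in the surrounding text.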

In equation (4), (I - αC^{-1/2} A C^{-1/2})^{-1} is the inverse of the matrix (I - αC^{-1/2} A C^{-1/2}). Equation (4) shows that computing the scores in Manifold Ranking requires computing an inverse matrix. However, inverting an n × n matrix costs O(n^3), so the computational cost at search time becomes a problem.
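To make the cost of the conventional approach concrete, the following sketch computes exact scores by solving the dense linear system (I - αC^{-1/2} A C^{-1/2}) x = (1 - α) q with Gaussian elimination, which is O(n^3) like the inversion. It assumes C is the diagonal matrix of row sums of A (the symbol table is not reproduced here). The function name `exact_scores`, the path graph and α = 0.5 are illustrative choices only.

```python
import math

def exact_scores(A, q, alpha):
    """Solve (I - alpha * C^{-1/2} A C^{-1/2}) x = (1 - alpha) q densely, O(n^3)."""
    n = len(A)
    c = [sum(row) for row in A]  # C_ii = sum_j A_ij (assumed definition of C)
    W = [[(1.0 if i == j else 0.0) - alpha * A[i][j] / math.sqrt(c[i] * c[j])
          for j in range(n)] for i in range(n)]
    b = [(1.0 - alpha) * v for v in q]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(W[r][col]))
        W[col], W[piv] = W[piv], W[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = W[r][col] / W[col][col]
            for cc in range(col, n):
                W[r][cc] -= f * W[col][cc]
            b[r] -= f * b[col]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(W[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / W[i][i]
    return x

# Path graph 0 - 1 - 2 - 3 with unit weights; node 0 is the query.
path = [[0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0]]
scores = exact_scores(path, [1.0, 0.0, 0.0, 0.0], alpha=0.5)
```

With this α, the scores decrease with graph distance from the query node. Every node's score is computed, including nodes that can never appear in the top results.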

In addition, the conventional method must carry out the calculations of equations (2) to (4) even for nodes with small scores. It also needs O(n^2) storage to hold the inverse matrix. As a result, searching large-scale data requires an enormous amount of computation and storage.

[Mathematical background of the embodiment]
Next, the mathematical background of the embodiment is described. To speed up search with Manifold Ranking, the embodiment (i) computes approximate scores using sparse matrices obtained by incomplete Cholesky decomposition, and (ii) computes upper bounds on scores in order to prune (exclude) unnecessary score computations. Because the number of nonzero elements in the sparse matrices obtained by incomplete Cholesky decomposition is O(n), approaches (i) and (ii) allow much faster search than using the inverse matrix, which costs O(n^3).

Note that a sparse matrix is a matrix in which most elements are 0, such as a triangular matrix or a diagonal matrix.

The following describes how scores with respect to the query node are computed approximately. In the embodiment, scores are computed using incomplete Cholesky decomposition. First, it is shown that the definition of the Manifold Ranking score can be rewritten using incomplete Cholesky decomposition.

In the embodiment, the nodes are reordered to improve the accuracy of the approximation. Let P be the matrix that reorders the nodes; how P is obtained is described later. Let A' be the adjacency matrix after the nodes are reordered. The n × n adjacency matrix A of the k-NN graph is transformed into A' using P as A' = PAP^T, where P^T is the transpose of P.

Similarly, the diagonal matrix C is transformed as C' = PCP^T. Let u'_i be the i-th node after reordering. The n × n matrix P is an orthogonal matrix. Each row and each column of P has exactly one nonzero element, and all other elements are 0. P_ij = 1 means that the j-th row is moved to the i-th position.

Using this matrix P, and since I = PIP^T and P^T = P^{-1}, the score formula defined by equation (4) above can be rewritten as equation (5) below.
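The role of the permutation matrix can be illustrated with a small sketch. It assumes the convention that P_ij = 1 places original node j at position i, so that A' = PAP^T, x' = Px and x = P^T x'. The helper names and the three-node data are hypothetical.

```python
def permutation_matrix(order):
    """P[i][j] = 1 when original node j is moved to position i."""
    n = len(order)
    return [[1.0 if order[i] == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

order = [2, 0, 1]          # hypothetical reordering (0-based node indices)
P = permutation_matrix(order)
Pt = transpose(P)

A = [[0.0, 0.5, 0.2],
     [0.5, 0.0, 0.0],
     [0.2, 0.0, 0.0]]
A_prime = matmul(matmul(P, A), Pt)   # A' = P A P^T

x = [0.7, 0.2, 0.1]                  # scores in the original node order
x_prime = [x[j] for j in order]      # x' = P x
x_back = [0.0] * len(x)
for i, j in enumerate(order):        # x = P^T x'
    x_back[j] = x_prime[i]
```

Since P has a single 1 per row and column, applying it never requires a matrix product in practice. Storing the index list `order` is enough, which keeps the memory for P at O(n).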

Here, since the matrices I, C', and A' are all symmetric, the matrix {I - α(C')^{-1/2} A' (C')^{-1/2}} is also symmetric. The embodiment therefore applies incomplete Cholesky decomposition to the matrix {I - α(C')^{-1/2} A' (C')^{-1/2}} to obtain an approximate matrix. Incomplete Cholesky decomposition is a technique that approximately factorizes a symmetric matrix into the product of a lower triangular matrix L, a diagonal matrix D, and an upper triangular matrix U (because the matrix being factorized is real symmetric, U = L^T).

By incomplete Cholesky decomposition, {I - α(C')^{-1/2} A' (C')^{-1/2}} = LDU + N ≈ LDU = LDL^T, where N is the residual of the approximation. From this expression, approximate scores can be computed using forward substitution and backward substitution. Let L' = LD and q' = (1 - α)Pq. Let x be the vector of approximate scores, and let x' = Px. Then, from equation (5), Ux' = y and L'y = q'. Since L' is a lower triangular matrix, the elements of the n × 1 vector y can be obtained by forward substitution on L'y = q', as shown in equation (6) below.

Similarly, the elements of the n × 1 vector x' can be obtained by backward substitution on Ux' = y, as shown in equation (7) below.

In forward substitution, the first element y_1 is computed first, then y_2 is computed using y_1, and so on: the values of the preceding elements are substituted into equation (6) one after another until the last element y_n is obtained. In backward substitution, the last element x'_n is computed first, then x'_{n-1} is computed using x'_n, and so on: the values of the following elements are substituted into equation (7) one after another until the first element x'_1 is obtained.
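Equations (6) and (7) are not reproduced in this text. The sketch below shows textbook forward and backward substitution for L'y = q' and Ux' = y, which is assumed to correspond to them. The 2 × 2 factors are made-up values chosen so that the result is easy to check by hand.

```python
def forward_substitution(L_prime, b):
    """Solve L' y = b for a lower-triangular L', from y_1 down to y_n."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L_prime[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L_prime[i][i]
    return y

def backward_substitution(U, y):
    """Solve U x' = y for an upper-triangular U, from x'_n up to x'_1."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Made-up factors: L = [[1, 0], [0.5, 1]], D = diag(2, 3), U = L^T.
L_prime = [[2.0, 0.0],
           [1.0, 3.0]]        # L' = L D
U = [[1.0, 0.5],
     [0.0, 1.0]]
q_prime = [2.0, 4.0]

y = forward_substitution(L_prime, q_prime)
x_prime = backward_substitution(U, y)
```

The dense loops here cost O(n^2). The O(n) bound discussed in the text relies on visiting only the nonzero entries of the sparse factors.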

Since Px (= x') is the vector x with its rows reordered, the approximate score x_i of node u_i is obtained from the elements of the vector x' as shown in equation (8) below.

Equation (8) shows that the vector x' corresponds to the approximate scores. That is, the element x'_j corresponds to the score of node u'_j after reordering.

Once the matrix {I - α(C')^{-1/2} A' (C')^{-1/2}} has been factorized by incomplete Cholesky decomposition, approximate scores can be computed from equations (6) to (8). Lemma 1, shown in FIG. 1, gives the computational cost of the approximate scores in the embodiment. According to Lemma 1, computing the approximate scores costs O(n).
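The O(n) cost depends on touching only nonzero entries. As a rough sketch under an assumed storage layout (each row keeps a list of its strictly-lower nonzeros plus a separate diagonal, both hypothetical), forward substitution then costs time proportional to the number of stored entries:

```python
def sparse_forward_substitution(rows, diag, b):
    """Solve L' y = b where rows[i] lists (j, value) for the nonzeros left of the diagonal."""
    y = []
    for i in range(len(b)):
        # Only stored nonzeros are visited, so total work is O(number of nonzeros).
        s = sum(value * y[j] for j, value in rows[i])
        y.append((b[i] - s) / diag[i])
    return y

# L' = [[2, 0, 0], [1, 3, 0], [0, 0, 4]] stored sparsely.
rows = [[], [(0, 1.0)], []]
diag = [2.0, 3.0, 4.0]
y = sparse_forward_substitution(rows, diag, [2.0, 4.0, 8.0])
```

Backward substitution can use the same layout applied to the columns of L, since U = L^T.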

In the embodiment, approximate scores are obtained by computing {I - α(C')^{-1/2} A' (C')^{-1/2}} ≈ LDU with incomplete Cholesky decomposition. The following describes how the nodes of the graph are reordered to improve the accuracy of the approximation. In the embodiment, the nodes are reordered before incomplete Cholesky decomposition is applied. First, Theorem 1 in FIG. 2 shows that the problem of improving the approximation accuracy by reordering nodes is equivalent to an NP-complete problem.

The reason incomplete Cholesky decomposition is called "incomplete" is explained here. Let the matrix W be W = {I - α(C')^{-1/2} A' (C')^{-1/2}}. Then the matrices L (= U^T) and D are given by equations (9) and (10) below, respectively.

Equation (9) shows that if an element of the matrix W is 0, the corresponding element of the matrix L is also 0. This property, that "the matrix L has the same sparsity pattern as the matrix W", is why the decomposition is called "incomplete". A "sparsity pattern" here means the arrangement of elements in a matrix in which most elements are 0, including triangular matrices and diagonal matrices.

Because of the property that "the matrix L has the same sparsity pattern as the matrix W", the scores computed from equations (9) and (10) are approximate rather than exact. It follows from this property that, in incomplete Cholesky decomposition, the fewer elements are forced to 0, the better the approximation accuracy can be expected to be.

Equations (9) and (10) show that the matrices L and D are computed starting from the elements on the left. Therefore, the sparser the left-side elements of W are, the fewer elements are forced to 0, and the better the approximation accuracy can be expected to be. In other words, reordering the nodes so that the left side of the matrix is sparse improves the accuracy of the approximation.
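Equations (9) and (10) are not reproduced in this text. The sketch below uses the standard zero-fill incomplete LDL^T recurrences, which are assumed to correspond to them: columns are processed from left to right, and L_ij is forced to 0 wherever W_ij is 0. The two 3 × 3 matrices are made-up examples of the same graph under two node orders.

```python
def incomplete_ldl(W):
    """Zero-fill incomplete LDL^T: L keeps the sparsity pattern of W."""
    n = len(W)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):                       # columns, left to right
        D[j] = W[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            if W[i][j] == 0.0:
                continue                     # element forced to 0
            L[i][j] = (W[i][j] - sum(L[i][k] * L[j][k] * D[k]
                                     for k in range(j))) / D[j]
    return L, D

def max_error(W):
    """Largest absolute entry of the residual N = W - L D L^T."""
    n = len(W)
    L, D = incomplete_ldl(W)
    return max(abs(W[i][j] - sum(L[i][k] * D[k] * L[j][k] for k in range(n)))
               for i in range(n) for j in range(n))

# Same graph, two node orders: the densely connected node first or last.
dense_first = [[4.0, 1.0, 1.0],
               [1.0, 4.0, 0.0],
               [1.0, 0.0, 4.0]]
dense_last = [[4.0, 0.0, 1.0],
              [0.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]]
err_first = max_error(dense_first)
err_last = max_error(dense_last)
```

With the dense column on the left, one element that a complete decomposition would fill in is forced to 0, so the residual is nonzero. Moving the dense node to the end leaves nothing to force to 0 and the factorization becomes exact, which is the effect the reordering aims for.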

Algorithm 1, which reorders the nodes, uses the property that the data of a k-NN graph forms clusters, that is, that the graph can be clustered. As a graph clustering technique, for example, H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Fast Algorithm for Modularity-based Graph Clustering", AAAI, 2013, can be used, but other techniques may also be used. Algorithm 1 and the corresponding flowchart are described later with reference to FIG. 16 and FIG. 18.

Besides improving the approximation accuracy, reordering the nodes has the advantage of making it possible to prune the score computation for unnecessary nodes. As described above, approximate scores can be computed using forward and backward substitution, but with forward and backward substitution alone it is difficult to compute the scores of only selected nodes. This is because each new score is computed sequentially by substituting the scores computed before it. The following describes an approach for pruning the score computation for unnecessary nodes.

Let u'_q be the query node after reordering, and let C_Q be the cluster that contains u'_q (C_Q = C_N if u'_q is contained in cluster C_N). In the embodiment, pruning is performed using the property of the matrix L shown in Lemma 2 of FIG. 3.

From the property of the matrix L shown in Lemma 2 of FIG. 3, the property shown in Lemma 3 of FIG. 4 holds for the vector y.

Lemma 3 of FIG. 4 shows that the nonzero elements of the vector y are limited to the elements corresponding to cluster C_Q and cluster C_N. This property allows the elements of y to be computed quickly. Similarly, from this property, the property shown in Lemma 4 of FIG. 5 holds for the vector x' corresponding to the approximate scores. Lemma 4 of FIG. 5 shows that once the elements of x' corresponding to cluster C_N are available, the score of any selected node can be computed.

In the embodiment, using Lemmas 2 to 4, the elements of the vector y corresponding to cluster C_Q and cluster C_N are computed first, and then the elements of the vector x' corresponding to cluster C_N are computed. Using the elements corresponding to cluster C_N, the approximate scores in any selected cluster can be computed.

In the embodiment, to search quickly, candidate solution nodes are found by estimating upper bounds on scores. Approximate scores are computed only when the upper bound indicates that a node may be a candidate solution, which makes the search fast. The following describes how the upper bounds on scores are estimated.

As described above, the embodiment first computes the approximate scores of the nodes in cluster C_Q and cluster C_N. It therefore computes estimated upper bounds on scores for the nodes in clusters other than C_Q and C_N. The estimate for node u'_i, denoted (A) below, is defined as in Definition 1 of FIG. 6.

The two upper bounds defined in Definition 1 of FIG. 6, denoted (B) below, can be computed in advance, before a search is performed. Lemmas that state the properties of the estimate are shown next.

Lemma 5 of FIG. 7 shows that the estimate is an upper bound on the approximate score. To search quickly, the embodiment does not estimate an upper bound for each node according to Definition 1, but instead estimates an upper bound for each cluster. For a cluster C_i that is neither cluster C_Q nor cluster C_N, the estimated upper bound on the scores, denoted (C) below, is defined as in Definition 2 of FIG. 8.

The estimated upper bound on the scores in cluster C_i, denoted (C) above, has the property shown in Lemma 6 of FIG. 9. Lemma 6 of FIG. 9 is used to prune nodes that are unnecessary for the search. As a result, the computational cost of estimating the upper bounds during a search is as shown in Lemma 7 of FIG. 10.
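The pruning step can be sketched as a top-k loop over clusters. This is a simplified illustration rather than Algorithm 2 of FIG. 19: the per-cluster upper bounds are supplied directly as made-up numbers instead of being derived from Definition 2, `score_of` stands in for the approximate score computation, and treating the threshold as 0 until k nodes are held is an assumption.

```python
import heapq

def top_k_with_pruning(clusters, upper_bound, score_of, k):
    """Return the k best (score, node) pairs and the number of scores computed."""
    heap = []        # min-heap holding the current answer set
    computed = 0
    for cluster_id, nodes in clusters.items():
        # Threshold: lowest score in a full answer set, 0 while it is not full.
        threshold = heap[0][0] if len(heap) == k else 0.0
        if upper_bound[cluster_id] < threshold:
            continue                 # no node in this cluster can enter the top k
        for node in nodes:
            s = score_of(node)
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (s, node))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, node))
    return sorted(heap, reverse=True), computed

# Made-up scores and bounds; cluster "Q" plays the role of the query's cluster.
scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.05, "e": 0.7}
clusters = {"Q": ["a", "b"], "X": ["c", "d"], "Y": ["e"]}
upper_bound = {"Q": 1.0, "X": 0.2, "Y": 0.8}
result, computed = top_k_with_pruning(clusters, upper_bound, scores.get, k=2)
```

Here cluster "X" is skipped because its bound is below the lowest score already held, so only three of the five scores are computed. The result is unchanged as long as each bound really is at least as large as every score in its cluster.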

From the mathematical considerations above, Theorem 2 of FIG. 11 and Theorem 3 of FIG. 12 hold for the computational cost and memory usage of search in the embodiment. The embodiment uses sparse matrices and prunes unnecessary score computations, but according to Theorem 2 of FIG. 11, the computational cost is reduced and the search is faster even when no unnecessary score computations need to be pruned.

In other words, building on the NP-complete reordering problem, the embodiment obtains more accurate search results than the conventional method while reducing the computational cost and memory usage to O(n), far below those of the conventional method. In addition, the embodiment is parameter-free, requiring no internal parameters to be set in advance, so the user can easily perform a search with Manifold Ranking.

[Embodiment]
Based on the mathematical discussion above, embodiments of the search device and the like disclosed in the present application are described below with reference to the drawings. In the following embodiment, for an image search that uses Manifold Ranking with images as nodes, the inputs are a k-NN graph, a query node u_q, and the number of solution nodes k, and the output is the k nodes with the highest Manifold Ranking scores. The following embodiment is merely an example and does not limit the technology disclosed in the present application.

(Configuration of search device)
FIG. 13 is a block diagram showing an example of the configuration of the search device. The search device 100 of the embodiment computes scores with Manifold Ranking and outputs k nodes as the search result in descending order of score. As shown in FIG. 13, the search device 100 has a precomputation unit 10 and a search unit 20. The precomputation unit 10 takes graph data G = {V, E} (V is the node set, E is the edge set) as external input and outputs triangular matrices. The search unit 20 takes the query node u_q and the number of results k as external input and the triangular matrices from the precomputation unit 10 as internal input, and outputs k nodes as the search result.

(Configuration of precomputation unit)
FIG. 14 is a block diagram showing an example of the configuration of the precomputation unit. The precomputation unit 10 has a node reordering unit 11 and a matrix calculation unit 12. The node reordering unit 11 takes the graph data G = {V, E} as external input, computes the permutation matrix P that reorders the node set V, and outputs P together with the node set V' reordered by P. The matrix calculation unit 12 takes the reordered graph data G = {V', E} as internal input and outputs the lower triangular matrix L, the diagonal matrix D, and the upper triangular matrix U computed from equations (9) and (10) above, where L = U^T.

(Configuration of search unit)
FIG. 15 is a block diagram illustrating an example of the configuration of the search unit. The search unit 20 includes a score calculation unit 21, a score estimation unit 22, and a search result storage unit 23. The score calculation unit 21 takes the triangular matrices from the pre-calculation unit 10 and the search target nodes from the search result storage unit 23 as inputs, calculates the scores of those nodes, and outputs the scores to the score estimation unit 22 and the search result storage unit 23.

The score estimation unit 22 takes the query node u_q and the number of searches k as external inputs, takes the scores calculated by the score calculation unit 21 and the search target nodes from the search result storage unit 23 as inputs, and outputs the estimated node scores to the search result storage unit 23.

The search result storage unit 23 takes the scores estimated by the score estimation unit 22 and the scores calculated by the score calculation unit 21 as inputs, determines the nodes to be searched, outputs the determined nodes to the score calculation unit 21 and the score estimation unit 22, and outputs the search result to the outside.

(Pre-calculation process)
FIG. 16 is a diagram illustrating an example of a flowchart of the pre-calculation process. The pre-calculation process takes the k-NN graph data G = {V, E} as input and outputs the rearrangement matrix P of the nodes V.

First, the pre-calculation unit 10 of the search device 100 accepts the adjacency matrix A of the k-NN graph as an external input (step S11). Next, the pre-calculation unit 10 initializes the rearrangement matrix P to P = 0 (step S12). Next, the pre-calculation unit 10 divides the graph G into N-1 clusters by a predetermined clustering method (step S13). Next, the pre-calculation unit 10 creates an empty N-th cluster C_N (step S14).

Next, the pre-calculation unit 10 executes the loop of steps S15 to S18 for i = 1, 2, ..., N-1. In step S16, the pre-calculation unit 10 removes from the i-th cluster every node that has an edge spanning to another cluster. In step S17, the pre-calculation unit 10 adds the nodes removed in step S16 to the N-th cluster C_N. When the loop of steps S15 to S18 for i = 1, 2, ..., N-1 ends, the pre-calculation unit 10 proceeds to step S19.

Next, the pre-calculation unit 10 initializes the row index k to k = 1 (step S19). Next, the pre-calculation unit 10 executes the loop of steps S20 to S29 for i = 1, 2, ..., N (N is the number of clusters). In step S21, the pre-calculation unit 10 initializes to the empty set the node set V′, which is the reordered version of the node set V of the graph data G = {V, E}.

Next, the pre-calculation unit 10 executes the loop of steps S22 to S27 for j = 1, 2, ..., N_i (N_i is the number of nodes in the i-th cluster C_i). In step S23, the pre-calculation unit 10 finds the node u_l of C_i with the smallest number of edges e(u), using u_l = argmin(e(u) | u ∈ C_i \ V′). In step S24, the pre-calculation unit 10 sets the (k, l) element P_kl of the rearrangement matrix P to P_kl = 1, where l is the index of the node u_l found in step S23.

In step S25, the pre-calculation unit 10 adds the node u_l found in step S23 to the node set V′. In step S26, the pre-calculation unit 10 increments the row index k by 1 (k = k + 1).

When the loop of steps S22 to S27 for j = 1, 2, ..., N_i ends and the loop of steps S20 to S28 for i = 1, 2, ..., N ends, the pre-calculation unit 10 proceeds to step S29. In step S29, the pre-calculation unit 10 outputs the rearrangement matrix P obtained by the loop of steps S20 to S28 and the node set V′, which is the reordered version of the node set V of the graph data G = {V, E}.
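
As a rough illustration of the reordering in steps S11 to S29, the following Python sketch moves every node with a cross-cluster edge into a final border cluster and then orders each cluster by ascending edge count. It is a sketch under assumptions, not the patented implementation: the name `reorder_nodes`, the dict-based graph, the precomputed `labels` (standing in for the unspecified clustering method), the use of total degree for e(u), and tie-breaking by node id are all illustrative choices.

```python
def reorder_nodes(adj, labels):
    """Return (order, P) for an undirected graph.

    adj:    dict mapping node id (0..n-1) to the set of its neighbours.
    labels: dict mapping node id to an initial cluster id in 0..N-2
            (assumed to come from some clustering method).
    order:  order[k] is the original id of the node placed at position k.
    P:      n x n 0/1 matrix with P[k][l] = 1 when node l moves to row k.
    """
    n_inner = max(labels.values()) + 1
    # clusters[0..n_inner-1] are the inner clusters; the last one is C_N.
    clusters = [[] for _ in range(n_inner + 1)]
    for u in sorted(adj):
        # A node goes to the border cluster if any edge leaves its cluster.
        crosses = any(labels[v] != labels[u] for v in adj[u])
        clusters[n_inner if crosses else labels[u]].append(u)

    order = []
    for members in clusters:
        # Assumption: e(u) is the total degree; ties are broken by node id.
        order.extend(sorted(members, key=lambda u: (len(adj[u]), u)))

    n = len(order)
    P = [[0] * n for _ in range(n)]
    for k, l in enumerate(order):
        P[k][l] = 1
    return order, P
```

With this convention, row k of P has its single 1 in the column of the original node placed at position k, so the reordered adjacency matrix would be P A P^T.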

(Search process)
FIG. 17 is a diagram illustrating an example of a flowchart of the search process. First, the search unit 20 accepts the query node u_q as input (step S31). Next, the search unit 20 initializes the parameter θ and the top-k node set K to θ = 0 and K = ∅, respectively (step S32). Next, the search unit 20 adds a dummy node to the top-k node set K (step S33).

Next, the search unit 20 obtains the reordered node u′_q from the query node u_q (step S34). Next, the search unit 20 executes the loop of steps S35 to S37 for all u′_i ∈ C_Q ∪ C_N. In step S36, the search unit 20 calculates the element y′_i corresponding to the node u′_i based on Lemma 3. When the loop of steps S35 to S37 for all u′_i ∈ C_Q ∪ C_N ends, the search unit 20 proceeds to step S38.

Next, the search unit 20 executes the loop of steps S38 to S44 for all u′_i ∈ C_Q ∪ C_N. In step S39, the search unit 20 calculates the element x′_i corresponding to the node u′_i based on Lemma 4. In step S40, the search unit 20 determines whether the element x′_i calculated in step S39 is greater than or equal to the parameter θ. The search unit 20 proceeds to step S41 when x′_i is greater than or equal to θ, and to step S43 when x′_i is less than θ.

In step S41, the search unit 20 calculates v′ based on v′ = argmin(x′_k | u′_k ∈ K). In step S42, the search unit 20 removes v′ from the top-k node set K and adds u′_i to K. In step S43, the search unit 20 updates θ by θ = min(x′_k | x′_k ∈ K). When the loop of steps S38 to S44 for all u′_i ∈ C_Q ∪ C_N ends, the search unit 20 proceeds to step S45.

Next, the search unit 20 executes the loop of steps S45 to S55 for all clusters C_i other than C_Q and C_N. In step S46, the search unit 20 calculates the estimate of the upper bound of the score, denoted (C) above and defined by Definition 2 in FIG. 8. In step S47, the search unit 20 determines whether this estimate of the upper bound is greater than or equal to the parameter θ. The search unit 20 proceeds to step S48 when the estimate is greater than or equal to θ, and to step S53 when the estimate is less than θ.

Next, the search unit 20 executes the loop of steps S48 to S54 for all u_j ∈ C_i. In step S49, the search unit 20 calculates the element x′_j corresponding to the node u′_j based on Lemma 4. In step S50, the search unit 20 determines whether the element x′_j calculated in step S49 is greater than or equal to the parameter θ. The search unit 20 proceeds to step S51 when x′_j is greater than or equal to θ, and to step S53 when x′_j is less than θ.

In step S51, the search unit 20 calculates v′ based on v′ = argmin(x′_k | u′_k ∈ K). In step S52, the search unit 20 removes v′ from the top-k node set K and adds u′_i to K. In step S53, the search unit 20 updates θ by θ = min(x′_k | x′_k ∈ K). When the loop of steps S48 to S54 for all u′_j ∈ C_i ends and the loop of steps S45 to S55 for all clusters C_i other than C_Q and C_N ends, the search unit 20 proceeds to step S56.

Next, the search unit 20 executes the loop of steps S56 to S58 for all u′_i ∈ K. In step S57, the search unit 20 reorders the node u′_i using the rearrangement matrix P calculated by the node rearrangement unit 11 of the pre-calculation unit 10. When the loop of steps S56 to S58 for all u′_i ∈ K ends, the search unit 20 proceeds to step S59. In step S59, the search unit 20 outputs the top-k node set K.
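
The threshold test and replacement in steps S40 to S43 (and likewise S50 to S53) behave like maintaining a bounded min-heap whose smallest score is θ. The Python sketch below illustrates that bookkeeping only. It assumes K is seeded with k dummy entries of score 0 so that it always holds k entries, which is one reading of the dummy node in step S33, and the function name `top_k_nodes` and the negative ids used for dummies are hypothetical.

```python
import heapq


def top_k_nodes(scores, k):
    """Return the ids of the k highest-scoring nodes, best first.

    scores: iterable of (node_id, score) pairs with non-negative integer ids.
    """
    if k <= 0:
        return []
    # Assumption: K starts with k dummy entries of score 0 (negative ids).
    heap = [(0.0, -(i + 1)) for i in range(k)]
    heapq.heapify(heap)
    theta = 0.0  # lowest score currently held in K
    for node, score in scores:
        if score >= theta:                           # test of step S40
            # Steps S41 and S42: drop the lowest entry and add the new node.
            heapq.heapreplace(heap, (score, node))
            theta = heap[0][0]                       # step S43
    # Discard any dummies that were never replaced, then sort best first.
    real = [(s, u) for s, u in heap if u >= 0]
    real.sort(key=lambda t: (-t[0], t[1]))
    return [u for _, u in real]
```

A heap keeps each update at O(log k). The flowchart's explicit argmin and min over K would give the same result with a linear scan per update.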

(Optimization by node rearrangement)
FIG. 18 is a diagram illustrating an example of an algorithm for the optimization by node rearrangement. Algorithm 1 shown in FIG. 18 corresponds to the flowchart of the pre-calculation process shown in FIG. 16. Algorithm 1 reorders the nodes so that the elements on the left side of the matrix become sparse, which improves the accuracy of the approximation. Here, N is the number of clusters, N_i is the number of nodes in the i-th cluster C_i, and e(u) is the number of edges connected to node u in cluster C_i.

In Algorithm 1, the matrix P is first set to the zero matrix (line 1). Next, N-1 clusters of the graph are computed by a graph clustering method (line 2). In this clustering method, nodes having edges that span clusters are removed, and the removed nodes are added to the N-th cluster C_N (lines 3 to 7). As a result, all nodes in cluster C_N have edges that span clusters. In other words, the nodes in clusters C_i with i = 1, 2, ..., N-1 have only edges within their cluster. Finally, in each cluster, nodes are selected in ascending order of edge count and reordered (lines 8 to 17).

(Top-k search process)
FIG. 19 is a diagram illustrating an example of an algorithm for the top-k search process. Algorithm 2 shown in FIG. 19 corresponds to the flowchart of the search process shown in FIG. 17. In Algorithm 2, the parameter θ is the lowest approximate score in the top-k node set K, and K is the top-k node set. Algorithm 2 first sets θ to 0 (line 1) and adds a dummy node with an approximate score of 0 to the top-k node set K (lines 2 to 3). Next, the node u′_q is obtained from the query node u_q (line 4). In the vector y, only the elements corresponding to nodes in clusters C_Q and C_N are nonzero (Lemma 3), so the nonzero elements of y are calculated by forward substitution (lines 5 to 7).
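
To make the role of forward substitution concrete, the Python sketch below solves L D L^T x = q in two triangular passes over sparse rows. It is only a sketch under assumptions: it takes L to be unit lower triangular with U = L^T, stores each row of L as a dict, and uses hypothetical names (`forward_substitution`, `back_substitution`). The exact forms in equations (9) and (10) and in Lemmas 3 and 4 are not reproduced here.

```python
def forward_substitution(L, q):
    """Solve L y = q for unit lower triangular L.

    L[i] is a dict {j: value} holding the nonzero entries of row i with j < i.
    """
    y = [0.0] * len(q)
    for i in range(len(q)):
        # Only the stored nonzeros of row i contribute to y[i].
        y[i] = q[i] - sum(v * y[j] for j, v in L[i].items())
    return y


def back_substitution(L, d, y):
    """Solve D L^T x = y, where d holds the diagonal of D."""
    x = [y[i] / d[i] for i in range(len(y))]
    # Column-oriented pass: once x[i] is final, remove its contribution
    # from every earlier unknown x[j] coupled to it through L[i][j].
    for i in range(len(y) - 1, -1, -1):
        for j, v in L[i].items():
            x[j] -= v * x[i]
    return x
```

For a query vector q with a single nonzero entry, forward substitution leaves every y[i] before that entry at zero and only touches rows reachable through the stored nonzeros of L, which is why a sparse L keeps this step cheap.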

Nodes belonging to cluster C_Q are expected to be solution nodes, and the approximate scores of the nodes in cluster C_N are needed to calculate the approximate scores of the selected nodes (Lemma 4). Therefore, approximate scores are calculated for the nodes belonging to clusters C_Q and C_N, and the set of solution nodes is updated (lines 8 to 16). Then the estimate for each cluster is calculated (line 18).

If the estimate is smaller than θ, then by Lemma 6 the approximate scores of all nodes in that cluster are smaller than θ, so no approximate scores are calculated for the nodes in that cluster. If the estimate is greater than or equal to θ, the cluster may contain solution nodes, so the approximate scores of its nodes are calculated (lines 19 to 29). Finally, the node numbers that the solution nodes had before reordering are obtained using the rearrangement matrix P (lines 31 to 33).
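
The pruning rule can be sketched as follows in Python. The per-cluster upper bounds are taken as a given input (`bounds`), standing in for the estimate of Definition 2, which is not reproduced here. The function name `prune_and_search`, the `seeds` argument for the always-scored clusters C_Q and C_N, and the returned count of scored nodes are hypothetical and serve only to show how many score computations are skipped.

```python
def prune_and_search(clusters, bounds, score_of, k, seeds):
    """Return (top_nodes, scored_count).

    clusters: dict cluster_id -> list of node ids.
    bounds:   dict cluster_id -> assumed upper bound on scores in that cluster
              (needed only for clusters not listed in seeds).
    score_of: callable returning the approximate score of a node.
    seeds:    cluster ids that are always scored first (C_Q and C_N).
    """
    top = []      # (score, node) pairs, best first, at most k entries
    theta = 0.0   # stays 0 until k solution nodes have been collected
    scored = 0

    def offer(node):
        nonlocal theta, scored
        scored += 1
        s = score_of(node)
        if s >= theta:
            top.append((s, node))
            top.sort(reverse=True)
            del top[k:]
            if len(top) == k:
                theta = top[-1][0]

    for cid in seeds:
        for u in clusters[cid]:
            offer(u)
    for cid, members in clusters.items():
        if cid in seeds:
            continue
        if bounds[cid] < theta:
            continue  # no node in this cluster can enter the top k
        for u in members:
            offer(u)
    return [u for _, u in top], scored
```

In a run with seven nodes where one cluster's bound falls below θ, the two nodes in that cluster are never scored. The saving grows with the number of clusters that can be skipped.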

(Modification)
The embodiment applies incomplete Cholesky decomposition to the matrix that calculates the score of each node based on the adjacency matrix of an input graph including a plurality of nodes and edges connecting the nodes. However, the matrix decomposition is not limited to incomplete Cholesky decomposition. Other methods that decompose a matrix into a product of matrices including sparse matrices, such as Cholesky decomposition, LDM decomposition, and LU decomposition, may be used.
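
For readers unfamiliar with the incomplete variant, the Python sketch below shows a zero fill-in incomplete factorization in LDL^T form: entries of L are computed only at positions where the input matrix is already nonzero, so L is as sparse as the input. This is a generic textbook-style sketch, not the procedure of equations (9) and (10). The name `incomplete_ldl` and the dict-of-rows storage are assumptions, and no pivoting or breakdown handling is included.

```python
def incomplete_ldl(W):
    """Zero fill-in incomplete LDL^T factorization of a symmetric matrix.

    W[i] is a dict {j: value} with the nonzeros of row i for j <= i
    (lower triangle including the diagonal).
    Returns (L, d): L[i] = {j: value} for j < i, and d is the diagonal of D.
    """
    n = len(W)
    L = [dict() for _ in range(n)]
    d = [0.0] * n
    for i in range(n):
        # Visit only positions that are nonzero in W (no fill-in is created).
        for j in sorted(c for c in W[i] if c < i):
            s = sum(L[i][c] * L[j][c] * d[c] for c in L[i] if c in L[j])
            L[i][j] = (W[i][j] - s) / d[j]
        d[i] = W[i][i] - sum(v * v * d[c] for c, v in L[i].items())
    return L, d
```

When the exact factor would have no fill-in, as for a tridiagonal matrix, this coincides with the exact LDL^T factorization. Otherwise the dropped entries make L D L^T an approximation of W, which is the accuracy trade-off that the node reordering is meant to reduce.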

(Effects of the embodiment)
The embodiment applies incomplete Cholesky decomposition to the matrix that calculates the score of each node based on the adjacency matrix of an input graph including a plurality of nodes and edges connecting the nodes. Then, in response to input of a query node and a number of searches, the embodiment selects from the plurality of nodes, and outputs, the solution nodes whose scores for the query node, obtained from the decomposed matrix, rank within the number of searches from the top.

The embodiment also divides the input graph into a plurality of clusters and estimates, for each cluster, an upper bound on the scores of all nodes in the cluster. When selecting from the plurality of nodes the solution nodes whose scores for the query node rank within the number of searches from the top, the embodiment excludes from the clusters searched for solution nodes any cluster whose estimated upper bound is less than the minimum score among the elements already in the set of solution nodes (or 0 when the set of solution nodes is empty).

The embodiment also divides the input graph into a plurality of clusters, reorders the nodes within each cluster in ascending order of the number of connected edges, and applies incomplete Cholesky decomposition to the matrix obtained by transforming the score-calculating matrix based on the matrix determined by the reordering.

Therefore, compared with the conventional method, which requires O(n^3) computation, the embodiment reduces the computation to O(n) and can perform the search at high speed. In addition, because the embodiment outputs search results based on scores obtained by performing incomplete Cholesky decomposition after reordering the nodes, it overcomes the trade-off between reduced computation and accuracy and can output the top k nodes as search results with high accuracy. Furthermore, compared with the conventional method, which requires O(n^2) memory, the embodiment computes with O(n) memory and thus saves computing resources. Finally, because the embodiment needs no internal parameters that must be set in advance, the user can easily perform a search based on Manifold Ranking.

(System configuration of the embodiment)
Each component of the search device 100 illustrated in FIGS. 13 to 15 is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of the functions of the search device 100 is not limited to the illustrated one, and all or part of the functions can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.

All or any part of each process performed in the search device 100 may be realized by a CPU (Central Processing Unit) and a program analyzed and executed by the CPU. Each process performed in the search device 100 may also be realized as hardware based on wired logic.

Among the processes described in the embodiment, all or part of the processes described as being performed automatically can also be performed manually. Alternatively, all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters described above and illustrated in the drawings can be changed as appropriate unless otherwise specified.

(Program)
A program can also be created in which the processing executed by a control device such as the CPU of the search device 100 described in the embodiment is written in a computer-executable language. For example, a search program can be created in which the processing executed by the control device is written in a computer-executable language. In this case, the same effects as the embodiment are obtained when a computer executes the search program. Furthermore, the same processing as the embodiment can be realized by recording the search program on a computer-readable recording medium and having a computer read and execute the recorded search program. An example of a computer that executes a program realizing the same functions as the search device 100 illustrated in FIGS. 13 to 15 is described below.

FIG. 20 is a diagram illustrating a computer that executes the search program. The computer 1000 includes a memory 1010 and a CPU 1020. The computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These are connected by a bus 1080.

As illustrated in FIG. 20, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to, for example, a mouse 1051 and a keyboard 1052. The video adapter 1060 is connected to, for example, a display 1061.

Here, as illustrated in FIG. 20, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the search program is stored in, for example, the hard disk drive 1031 as a program module in which the instructions to be executed by the computer 1000 are written.

The various data described in the embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as necessary, and executes each procedure of the search program.

Note that the program module 1093 and the program data 1094 of the search program are not limited to being stored in the hard disk drive 1031. They may be stored in a removable storage medium and read by the CPU 1020 via a disk drive or the like.

The program module 1093 and the program data 1094 of the search program may also be stored in another computer connected via a network such as a LAN (Local Area Network) or a WAN (Wide Area Network). In that case, the program module 1093 and the program data 1094 may be read and executed by the CPU 1020 via the network interface 1070.

The search program may be divided into modules by the unit of processing executed by each functional unit, for example the pre-calculation unit 10, the node rearrangement unit 11, the matrix calculation unit 12, the search unit 20, the score calculation unit 21, the score estimation unit 22, and the search result storage unit 23 illustrated in FIGS. 13 to 15, as well as functional units that execute other processing. However, the division and integration of modules are not limited to this and may be chosen as appropriate in consideration of processing efficiency, maintainability, and the like.

The above embodiment and its modifications are included in the technology disclosed in the present application and, likewise, in the invention described in the claims and its equivalents.

10 Pre-calculation unit
11 Node rearrangement unit
12 Matrix calculation unit
20 Search unit
21 Score calculation unit
22 Score estimation unit
23 Search result storage unit
100 Search device
1000 Computer
1010 Memory
1020 CPU

Claims

A search device having a control unit that executes processing in cooperation with a storage unit, wherein
the control unit
performs matrix decomposition, into a matrix product including a sparse matrix, on a matrix that calculates a score of each node based on an adjacency matrix of an input graph including a plurality of nodes and edges connecting the nodes, and
in response to input of a query node and a number of searches, selects from the plurality of nodes, and outputs, solution nodes whose scores for the query node, obtained from the matrix subjected to the matrix decomposition, are within the number of searches from the top,
and wherein
the control unit further
divides the input graph into a plurality of clusters,
estimates, for each cluster, an upper bound of the score for all nodes in the cluster, and
when selecting from the plurality of nodes the solution nodes whose scores for the query node are within the number of searches from the top, excludes from the target clusters for selecting the solution nodes any cluster whose estimated upper bound of the score is less than the minimum value of the scores corresponding to the elements already included in the set of solution nodes (or 0 when the set of solution nodes is an empty set).

The search device according to claim 1, wherein the matrix decomposition is incomplete Cholesky decomposition.

A search method executed by a search device, wherein
the search device
performs matrix decomposition, into a matrix product including a sparse matrix, on a matrix that calculates a score of each node based on an adjacency matrix of an input graph including a plurality of nodes and edges connecting the nodes, and
in response to input of a query node and a number of searches, selects from the plurality of nodes, and outputs, solution nodes whose scores for the query node, obtained from the matrix subjected to the matrix decomposition, are within the number of searches from the top,
and wherein
the search device further
divides the input graph into a plurality of clusters,
estimates, for each cluster, an upper bound of the score for all nodes in the cluster, and
when selecting from the plurality of nodes the solution nodes whose scores for the query node are within the number of searches from the top, performs a process of excluding from the target clusters for selecting the solution nodes any cluster whose estimated upper bound of the score is less than the minimum value of the scores corresponding to the elements already included in the set of solution nodes (or 0 when the set of solution nodes is an empty set).

A search program for causing a computer to function as the search device according to claim 1 or 2.