JP6789253B2

JP6789253B2 - Search device, search method, and program

Info

Publication number: JP6789253B2
Application number: JP2018007390A
Authority: JP
Inventors: 小萌武; 豪入江; 薫平松; 柏野　邦夫
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2018-01-19
Filing date: 2018-01-19
Publication date: 2020-11-25
Anticipated expiration: 2038-01-19
Also published as: JP2019125316A

Description

The present invention relates to a search device, a search method, and a program, and in particular to a search device, a search method, and a program for improving the accuracy of reranking.

In search, a user inputs a query, and a search algorithm calculates a measure of the relationship between the query and each item of search target data stored in a database. The system then returns to the user, for example, the search target data with the highest relationship measure, the search target data whose relationship measure exceeds a predetermined threshold, or a ranking list in which the search target data are sorted in descending order of the relationship measure.

Search targets include, for example, documents, images, audio, video, and data recorded in various other media or combinations thereof. Hereinafter, search target data is simply referred to as a search target. The measure of the relationship is referred to as the similarity, and the similarity between a query and a search target is also referred to as the rank score of that search target for that query.

A search target that is related to the query from the user's point of view is referred to as a relevant search target, and any other search target is referred to as an irrelevant search target.

The position of a search target in the ranking list is called the rank of that search target. A rank is a positive integer and is a different concept from the rank score. A high rank means a small rank value; for example, the highest-ranked search target has a rank value of 1.
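The distinction between rank score and rank can be illustrated with a minimal Python sketch. The function name `ranks_from_scores` and the sample identifiers are hypothetical and do not come from the specification; breaking ties by input order is likewise an assumption.

```python
def ranks_from_scores(scores):
    """Map each search target to its rank (1 = highest rank score).

    `scores` maps a search target identifier to its rank score.
    Ties keep their input order (an assumption of this sketch).
    """
    ordered = sorted(scores, key=lambda item: scores[item], reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


# Hypothetical rank scores for three search targets.
scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}
ranks = ranks_from_scores(scores)
```

Here the target with the largest rank score receives rank value 1, so a "high rank" corresponds to a small integer.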

However, when the amount of information obtained from the query is small, the similarity between the query and a relevant search target is low, and the relevant search target is likely to be given a low rank.

To address this problem, various search target reranking methods have been proposed that assign new ranks to the search targets ranked by a first search.

Hereinafter, among the search targets ranked by the first search, those with higher ranks are referred to as top results. A top result that is relevant is referred to as a true-positive top result, and one that is not is referred to as a false-positive top result.

For example, the search target reranking device described in Non-Patent Document 1 uses the input first query to perform a first search over all search targets and obtains a plurality of top results from among the search targets.

Next, for each search target, that search target is used as a second query, a second search is performed over the data set that combines the first query with all search targets, and ranks are assigned to the first query and all search targets.

Then, for each search target, the rank score of that search target is calculated based on the ranks assigned to the first query and to all top results. After the rank scores of all search targets have been calculated, all search targets are sorted in descending order of rank score.

As another example, the search target reranking device described in Non-Patent Document 2 uses the input first query to perform a first search over all search targets, assigns a first rank to every search target, and obtains a plurality of top results from among the search targets.

Next, for each top result, that top result is used as a second query, a second search is performed over the data set that combines the first query with all search targets, and second ranks are assigned to the first query and all search targets.

Then, for each search target, the rank score of that search target is calculated based on the matrix of first ranks of all top results, the matrix of second ranks of the first query, and the matrix of second ranks of that search target.

After the rank scores of all search targets have been calculated, all search targets are sorted in descending order of rank score.

As a further example, the search device described in Non-Patent Document 3 takes the data set that combines the query with all search targets and, for each pair of data in that set, calculates a similarity based on the distance between the two data items of the pair. It also sets initial values for the rank scores.

Next, after the similarities of all pairs of data have been calculated, the rank score of each data item is updated based on the similarities between that data item and all data and on the rank scores of all data.

The rank scores of all data are updated in this way, the update process is repeated a plurality of times, and all search targets are sorted in descending order of the rank scores obtained in the final update.
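The iterative update described for Non-Patent Document 3 can be sketched as follows. This is only an illustration under stated assumptions: a Gaussian kernel on Euclidean distance as the distance-based similarity, symmetric normalization of the similarity matrix, and a damping parameter `alpha`. None of these specifics appear in the text above, and all function and parameter names are hypothetical.

```python
import math


def iterate_rank_scores(points, query_index, alpha=0.9, sigma=1.0, iterations=50):
    """Repeatedly update rank scores from pairwise similarities (a sketch)."""
    n = len(points)
    # Distance-based similarity for every pair (assumed Gaussian kernel).
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # Symmetric normalization (an assumption of this sketch).
    s = [[w[i][j] / math.sqrt(degree[i] * degree[j])
          if degree[i] > 0 and degree[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # Initial rank scores: 1 for the query, 0 for every search target.
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = list(y)
    for _ in range(iterations):
        # Each score is refreshed from similarities to all data and their scores.
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Index 0 is the query; indices 1 and 2 lie near it, index 3 lies far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0)]
scores = iterate_rank_scores(points, query_index=0)
```

In this toy layout the distant point ends up with a much smaller score than the two neighbors of the query, which is the behavior the final sort relies on.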

Non-Patent Document 1: Danfeng Qin, Stephan Gammeter, Lukas Bossard, Till Quack, and Luc J. Van Gool. Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors. In CVPR, 2011, pp. 777-784.

Non-Patent Document 2: Xiaohui Shen, Zhe Lin, Jonathan Brandt, and Ying Wu. Spatially-constrained similarity measure for large-scale object retrieval. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 36, No. 6, 2014, pp. 1229-1241.

Non-Patent Document 3: Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Scholkopf. Ranking on data manifolds. In NIPS, 2003, pp. 169-176.

The accuracy of a search target reranking method depends strongly on how accurately the contribution of each top result to the reranking is estimated.

If a false-positive top result is calculated to have a high contribution to the reranking, irrelevant search targets that are highly similar to that top result receive high rank scores, which degrades the accuracy of the reranking.

For example, the method described in Non-Patent Document 1 treats the contributions of all top results to the reranking uniformly. The method described in Non-Patent Document 2 calculates, as the contribution of each top result to the reranking, a weight that is inversely proportional to the first rank of that top result.
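The two weighting schemes just described can be compared with a short sketch. The function names and the sample top results are hypothetical, and the proportionality constant of 1 in the inverse-rank weight is an assumption.

```python
def uniform_weights(first_ranks):
    """Every top result contributes equally to the reranking."""
    return {item: 1.0 for item in first_ranks}


def inverse_rank_weights(first_ranks):
    """Contribution inversely proportional to the first rank (constant 1 assumed)."""
    return {item: 1.0 / rank for item, rank in first_ranks.items()}


# Hypothetical first ranks: a false positive happens to sit at rank 1.
first_ranks = {"false_hit": 1, "true_hit": 2, "other_hit": 4}
uniform = uniform_weights(first_ranks)
weighted = inverse_rank_weights(first_ranks)
```

With these sample ranks the false positive receives the largest weight under the inverse-rank scheme, which illustrates why the quality of the first search matters so much for this kind of weighting.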

However, the accuracy of the first rank depends strongly on the accuracy of the first search. When the accuracy of the first search is low, a false-positive top result is likely to have a high first rank and therefore to be given a large weight.

Such false-positive top results with high first ranks still contribute heavily to the reranking and often degrade its accuracy.

As a result, even when the contribution of a false-positive top result to the reranking and the similarity between a true-positive top result and an irrelevant search target ought to be low, they end up being judged as high, and the accuracy of the reranking deteriorates.

The present invention has been made in view of the above points, and its object is to provide a search device, a search method, and a program that can improve the accuracy of reranking and thereby improve the accuracy of search.

The search device according to the present invention comprises a search unit, a similarity calculation unit, and a rank score calculation unit. For each data item included in a data set whose elements are the input data serving as a first query and a plurality of data items to be searched, the search unit takes that data item as a second query, performs a search over the data set, and assigns to each data item in the data set a rank, that is, its position in the order of relevance to the second query. For each pair of data items included in the data set, the similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. For each of the plurality of data items to be searched, the rank score calculation unit calculates, based on the similarities obtained by the similarity calculation unit, a rank score indicating the degree to which that data item is related to the first query.

The search method according to the present invention includes a search step, a similarity calculation step, and a rank score calculation step. In the search step, for each data item included in a data set whose elements are the input data serving as a first query and a plurality of data items to be searched, a search unit takes that data item as a second query, performs a search over the data set, and assigns to each data item in the data set a rank, that is, its position in the order of relevance to the second query. In the similarity calculation step, for each pair of data items included in the data set, a similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. In the rank score calculation step, for each of the plurality of data items to be searched, a rank score calculation unit calculates, based on the similarities obtained by the similarity calculation unit, a rank score indicating the degree to which that data item is related to the first query.

According to the search device and the search method of the present invention, for each data item included in the data set whose elements are the input data serving as the first query and the plurality of data items to be searched, the search unit takes that data item as the second query, performs a search over the data set, and assigns to each data item in the data set a rank, that is, its position in the order of relevance to the second query. For each pair of data items included in the data set, the similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query.

Then, for each of the plurality of data items to be searched, the rank score calculation unit calculates, based on the similarities obtained by the similarity calculation unit, a rank score indicating the degree to which that data item is related to the first query.

In this way, for each pair of data items included in the data set whose elements are the input data serving as the first query and the plurality of data items to be searched, the similarity between one data item and the other is calculated so that it takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. This keeps the similarity between relevant and irrelevant search targets low, allows the rank scores of irrelevant search targets to be calculated more accurately, and thereby improves the accuracy of reranking and of search.
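A rank-based pairwise similarity of this kind can be sketched as follows. The text above only requires that the similarity grow as both mutual ranks become higher; the product form `1 / (rank_ab * rank_ba)`, the use of Euclidean distance for the underlying search, and all names are assumptions made for illustration.

```python
import math


def ranks_for_each_query(points):
    """ranks[i][j] = rank of item j when item i is the second query (1 = best)."""
    n = len(points)
    ranks = []
    for i in range(n):
        # The underlying search is assumed to order items by distance to item i.
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        row = [0] * n
        for position, j in enumerate(order, start=1):
            row[j] = position
        ranks.append(row)
    return ranks


def mutual_rank_similarity(ranks, a, b):
    """Larger when a ranks b highly AND b ranks a highly (assumed product form)."""
    return 1.0 / (ranks[a][b] * ranks[b][a])


# Index 0 plays the role of the first query; the rest are search targets.
points = [(0.0,), (1.0,), (1.25,), (1.75,)]
ranks = ranks_for_each_query(points)
sim_query_target = mutual_rank_similarity(ranks, 0, 1)  # 1 / (2 * 4)
sim_two_targets = mutual_rank_similarity(ranks, 1, 2)   # 1 / (2 * 2)
```

Item 1 is the nearest neighbor of item 0, but item 0 is the farthest item from item 1, so the one-sided closeness is discounted: the pair (0, 1) scores lower than the mutually close pair (1, 2).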

The similarity calculation unit of the search device according to the present invention may, for each pair of data items included in the data set, calculate the similarity between one data item and the other as follows. When the rank assigned to the other data item with the one data item as the second query and the rank assigned to the one data item with the other data item as the second query are both higher than a predetermined rank, the similarity takes a larger value the higher both ranks are. When at least one of the two ranks is lower than the predetermined rank, the similarity takes a small value.

Likewise, the similarity calculation step of the search method according to the present invention may, for each pair of data items included in the data set, calculate the similarity between one data item and the other as follows. When the rank assigned to the other data item with the one data item as the second query and the rank assigned to the one data item with the other data item as the second query are both higher than a predetermined rank, the similarity takes a larger value the higher both ranks are. When at least one of the two ranks is lower than the predetermined rank, the similarity takes a small value.
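The thresholded variant can be sketched in a few lines. The cutoff parameter `k`, the treatment of a rank exactly equal to `k` as passing, the product form of the value, and the default "small value" of 0 are all assumptions; the text above only fixes the qualitative behavior.

```python
def thresholded_mutual_similarity(rank_ab, rank_ba, k, low_value=0.0):
    """Rank-based similarity with a cutoff at a predetermined rank k.

    rank_ab: rank of b when a is the second query (1 = best).
    rank_ba: rank of a when b is the second query.
    """
    if rank_ab <= k and rank_ba <= k:
        # Both ranks are high enough: larger value for higher ranks.
        return 1.0 / (rank_ab * rank_ba)
    # At least one rank is too low: return a small value.
    return low_value


both_high = thresholded_mutual_similarity(2, 2, k=3)
one_low = thresholded_mutual_similarity(2, 4, k=3)
```

A pair in which only one side ranks the other highly is thus cut off entirely instead of merely being discounted.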

Further, a search device according to the present invention comprises a search unit, a first similarity calculation unit, a first rank score calculation unit, a second similarity calculation unit, and a second rank score calculation unit. For each data item included in a first data set whose elements are the input data serving as a first query and a plurality of data items to be searched, the search unit takes that data item as a second query, performs a search over the first data set, and assigns to each data item in the first data set a rank, that is, its position in the order of relevance to the second query. For each pair of data items included in a second data set, whose elements are the data serving as the first query and the N data items to be searched that have the highest ranks, the first similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. For each data item in the second data set, the first rank score calculation unit calculates, based on the similarities obtained by the first similarity calculation unit, a first rank score indicating the degree to which that data item is related to the first query. For each pair formed by combining a data item from the plurality of data items to be searched with a data item from the second data set, the second similarity calculation unit calculates a second similarity, which is the similarity between the one data item and the other, so that it takes a larger value the higher both of the same two mutual ranks are. For each of the plurality of data items to be searched, the second rank score calculation unit calculates, based on the first rank score of the data and the second similarity, a second rank score indicating the degree to which that data item is related to the first query.

Further, a search method according to the present invention includes a search step, a first similarity calculation step, a first rank score calculation step, a second similarity calculation step, and a second rank score calculation step. In the search step, for each data item included in a first data set whose elements are the input data serving as a first query and a plurality of data items to be searched, a search unit takes that data item as a second query, performs a search over the first data set, and assigns to each data item in the first data set a rank, that is, its position in the order of relevance to the second query. In the first similarity calculation step, for each pair of data items included in a second data set, whose elements are the data serving as the first query and the N data items to be searched that have the highest ranks, a first similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. In the first rank score calculation step, for each data item in the second data set, a first rank score calculation unit calculates, based on the similarities obtained by the first similarity calculation unit, a first rank score indicating the degree to which that data item is related to the first query. In the second similarity calculation step, for each pair formed by combining a data item from the plurality of data items to be searched with a data item from the second data set, a second similarity calculation unit calculates a second similarity, which is the similarity between the one data item and the other, so that it takes a larger value the higher both of the same two mutual ranks are. In the second rank score calculation step, for each of the plurality of data items to be searched, a second rank score calculation unit calculates, based on the first rank score of the data and the second similarity, a second rank score indicating the degree to which that data item is related to the first query.

According to this search device and search method of the present invention, for each data item included in the first data set whose elements are the input data serving as the first query and the plurality of data items to be searched, the search unit takes that data item as the second query, performs a search over the first data set, and assigns to each data item in the first data set a rank, that is, its position in the order of relevance to the second query. For each pair of data items included in the second data set, whose elements are the data serving as the first query and the N data items to be searched that have the highest ranks, the first similarity calculation unit calculates the similarity between one data item and the other so that the similarity takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. For each data item in the second data set, the first rank score calculation unit calculates, based on the similarities obtained by the first similarity calculation unit, a first rank score indicating the degree to which that data item is related to the first query.

Then, for each pair formed by combining a data item from the plurality of data items to be searched with a data item from the second data set, the second similarity calculation unit calculates a second similarity, which is the similarity between the one data item and the other, so that it takes a larger value the higher both of two ranks are: the rank assigned to the other data item when the one data item is the second query, and the rank assigned to the one data item when the other data item is the second query. For each of the plurality of data items to be searched, the second rank score calculation unit calculates, based on the first rank score of the data and the second similarity, a second rank score indicating the degree to which that data item is related to the first query.

In this way, ranks are assigned by using each data item of the first data set, whose elements are the input data serving as the first query and the plurality of data items to be searched, as a second query for a search over the first data set. For each pair of data items in the second data set, whose elements are the data serving as the first query and the N data items to be searched that have the highest ranks, the similarity is calculated so that it takes a larger value the higher both mutual ranks are, that is, the rank each data item of the pair is assigned when the other is the second query. A first rank score indicating the degree of relation to the first query is calculated for each data item in the second data set based on that similarity. The second similarity is calculated in the same rank-based manner for each pair formed by combining a data item to be searched with a data item in the second data set. Finally, a second rank score indicating the degree of relation to the first query is calculated for each data item to be searched, based on the first rank score of the data and the second similarity. This suppresses both the contribution of false-positive top results to the reranking and the similarity between true-positive top results and irrelevant search targets, which improves the accuracy of reranking and hence of search.
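The two-stage flow summarized above can be sketched as follows. The section does not give the formulas, so several choices here are assumptions made only for illustration: the product form of the rank-based similarity, taking the first rank score of a member of the second data set to be its similarity to the first query, and combining the two stages as a weighted sum. All names are hypothetical.

```python
def mutual_similarity(ranks, a, b):
    """Rank-based similarity (assumed product form); an item matches itself fully."""
    if a == b:
        return 1.0
    return 1.0 / (ranks[a][b] * ranks[b][a])


def two_stage_rank_scores(ranks, top_n, query=0):
    """Return {search target index: second rank score} (a sketch)."""
    targets = [i for i in range(len(ranks)) if i != query]
    # Second data set: the first query plus the N highest-ranked search targets.
    top = sorted(targets, key=lambda i: ranks[query][i])[:top_n]
    second_set = [query] + top
    # First rank score for each member of the second data set (assumed form).
    first_score = {m: mutual_similarity(ranks, m, query) for m in second_set}
    # Second rank score: second similarity weighted by the first rank score.
    return {t: sum(mutual_similarity(ranks, t, m) * first_score[m]
                   for m in second_set)
            for t in targets}


# ranks[i][j] = rank of item j when item i is the second query; item 0 is the query.
ranks = [[1, 2, 3, 4], [4, 1, 2, 3], [4, 2, 1, 3], [4, 3, 2, 1]]
second_scores = two_stage_rank_scores(ranks, top_n=2)
reranked = sorted(second_scores, key=second_scores.get, reverse=True)
```

Because each top result's influence is scaled by its own first rank score, a top result that the query ranks highly but that does not rank the query highly in return contributes little to the final ordering.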

In this search device according to the present invention, the first similarity calculation unit may, for each pair of data items included in the second data set, calculate the similarity between one data item and the other as follows. When the rank assigned to the other data item with the one data item as the second query and the rank assigned to the one data item with the other data item as the second query are both higher than a predetermined rank, the similarity takes a larger value the higher both ranks are. When at least one of the two ranks is lower than the predetermined rank, the similarity takes a small value. The second similarity calculation unit may apply the same rule to each pair formed by combining a data item from the plurality of data items to be searched with a data item from the second data set: the second similarity takes a larger value the higher both ranks are when both are higher than the predetermined rank, and takes a small value when at least one of them is lower than the predetermined rank.

The program according to the present invention is a program for causing a computer to function as each unit of the search device described above.

According to the search device, search method, and program of the present invention, the accuracy of reranking can be improved, which in turn improves the accuracy of search.

A schematic diagram showing the configuration of the search device according to the first embodiment of the present invention.

A flowchart showing the search processing routine of the search device according to the first embodiment of the present invention.

A schematic diagram showing the configuration of the search device according to the second embodiment of the present invention.

A flowchart showing the search processing routine of the search device according to the second embodiment of the present invention.

以下、本発明の実施の形態について図面を用いて説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

＜本発明の第１実施の形態に係る検索装置の概要＞ <Overview of the search device according to the first embodiment of the present invention>
まず、本発明の第１実施形態の概要について説明する。 First, an overview of the first embodiment of the present invention will be described.

非特許文献３記載の方法では、検索の精度は、ランクスコアの更新に必要な全データ間の類似度の正確性に強く依存する。当該方法は、データ間の距離に基づく類似度を用いているが、距離計算の正確性が低い場合、関係のある検索対象と無関係な検索対象との類似度が高く計算される可能性が高い。 In the method described in Non-Patent Document 3, the accuracy of the search depends strongly on the accuracy of the similarities between all the data, which are needed to update the rank scores. That method uses a similarity based on the distance between data; if the distance calculation is inaccurate, the similarity between a related search target and an unrelated search target is likely to be calculated as high.

このため、無関係な検索対象は、ランクスコアが過度に高く計算され、検索の精度に悪影響を与える場合が多い、という問題があった。 For this reason, there is a problem that the rank score of an irrelevant search target is calculated to be excessively high, which often adversely affects the accuracy of the search.

本実施形態では、入力された第１クエリであるデータと、検索対象となる複数のデータとを要素とするデータ集合に含まれるデータの各々について、当該データを第２クエリとして、データ集合に対して検索をし、全データにランクを付与する。 In the present embodiment, for each of the data included in the data set containing the input first query data and the plurality of data to be searched as elements, the data is used as the second query for the data set. Search and give a rank to all data.

そして、データのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に高いランクであるほど大きい値を取るように計算する。 Then, for each pair of data, the similarity between one data item and the other is calculated so that it takes a larger value the higher both of two ranks are: the rank given to the other item when the one item is used as the second query, and the rank given to the one item when the other item is used as the second query.

そして、データのペアの全てについて類似度の計算が完了後、計算した全データ間の類似度に基づいて、全データのランクスコアを計算する。 Then, after the calculation of the similarity for all the data pairs is completed, the rank score of all the data is calculated based on the calculated similarity between all the data.

このような構成により、関係のある検索対象と無関係な検索対象との類似度を低く抑制し、無関係な検索対象のランクスコアをより正確に計算し、検索の精度を高めることが可能となる。 With such a configuration, it is possible to suppress the similarity between the related search target and the unrelated search target to a low level, calculate the rank score of the unrelated search target more accurately, and improve the accuracy of the search.

また、当該類似度を、データのペアの各々について、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に所定のランクよりも高いランクである場合に共に高いランクであるほど大きい値を取るように計算し、少なくとも一方が前記所定のランクよりも低いランクである場合に小さい値を取るように計算してもよい。 Alternatively, for each pair of data, the similarity may be calculated from the rank given to the other item when the one item is used as the second query and the rank given to the one item when the other item is used as the second query, such that it takes a larger value the higher both ranks are when both are higher than a predetermined rank, and takes a small value when at least one of them is lower than the predetermined rank.

このような構成により、関係のある検索対象と無関係な検索対象との類似度をより低く抑制することができ、検索の精度をより高めることが可能となる。 With such a configuration, the degree of similarity between the related search target and the unrelated search target can be suppressed to be lower, and the accuracy of the search can be further improved.

＜本発明の第１実施の形態に係る検索装置の構成＞ <Configuration of the search device according to the first embodiment of the present invention>
図１を参照して、本発明の第１実施の形態に係る検索装置の構成について説明する。図１は、本実施形態に係る検索装置の構成を示すブロック図である。 The configuration of the search device according to the first embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the search device according to the present embodiment.

検索装置１０は、ＣＰＵと、ＲＡＭと、後述する検索処理ルーチンを実行するためのプログラムを記憶したＲＯＭとを備えたコンピュータで構成され、機能的には次に示すように構成されている。 The search device 10 is composed of a computer including a CPU, a RAM, and a ROM that stores a program for executing a search processing routine described later, and is functionally configured as shown below.

図１に示すように、本実施形態に係る検索装置１０は、入力部１００と、データベース１１０と、検索部１２０と、類似度計算部１３０と、ランクスコア計算部１４０と、出力部１５０とを備えて構成される。 As shown in FIG. 1, the search device 10 according to the present embodiment includes an input unit 100, a database 110, a search unit 120, a similarity calculation unit 130, a rank score calculation unit 140, and an output unit 150.

入力部１００は、検索クエリである、第１クエリｑの入力を受け付ける。そして、入力部１００は、受け付けた第１クエリを、検索部１２０に渡す。 The input unit 100 receives the input of the first query q, which is a search query. Then, the input unit 100 passes the received first query to the search unit 120.

データベース１１０は、検索対象となる複数のデータを記憶している。 The database 110 stores a plurality of data to be searched.

具体的には、データベース１１０は、検索対象のデータｘ_１，…，ｘ_ｎをｎ個記憶している。データベース１１０は、検索部１２０の要求に応じて、検索対象のデータｘ_１，…，ｘ_ｎを検索部１２０に渡す。 Specifically, the database 110 stores n search-target data items x_1, ..., x_n. In response to a request from the search unit 120, the database 110 passes the search-target data x_1, ..., x_n to the search unit 120.

検索部１２０は、入力された第１クエリであるデータと、検索対象となる複数のデータとを要素とするデータ集合に含まれるデータの各々について、当該データを第２クエリとしてデータ集合に対する検索を行った場合における、データ集合に含まれるデータの各々の、第２クエリに関連する順位であるランクを付与する。 For each data item included in a data set whose elements are the input first query and the plurality of data to be searched, the search unit 120 searches the data set with that item as a second query and gives each data item in the data set a rank, that is, its position in order of relevance to the second query.

具体的には、検索部１２０は、まず、入力部１００から第１クエリｑを受け取ると、データベース１１０から、検索対象となる複数のデータｘ_１，…，ｘ_ｎを取得し、これらを１つの集合としたデータ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝を作成する。ここで、ｘ_０＝ｑとする。 Specifically, on receiving the first query q from the input unit 100, the search unit 120 first acquires the plurality of data to be searched, x_1, ..., x_n, from the database 110 and creates a data set {x_0, x_1, ..., x_n} that gathers them into one set. Here, x_0 = q.

次に、検索部１２０は、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝に含まれるデータの各々について、当該データを第２クエリとして当該データ集合に含まれるデータの各々を対象として検索した場合における、当該第２クエリと当該対象としたデータとの組に対して、当該第２クエリに関連する順位であるランクを付与する。 Next, for each data item included in the data set {x_0, x_1, ..., x_n}, the search unit 120 searches every data item in the data set with that item as the second query, and gives each pair of the second query and a target data item a rank, that is, the target's position in order of relevance to the second query.

より具体的には、まずｘ_０を第２クエリとして、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝に含まれるデータ全てを対象として検索し、全データに対してランクを付与する。ここで、データｘ_ｉを第２クエリとした時の対象としたデータｘ_ｊに付与されたランクをｒ（ｘ_ｉ｜ｘ_ｊ）と表す。 More specifically, first, with x_0 as the second query, a search is performed over all data included in the data set {x_0, x_1, ..., x_n}, and a rank is given to every data item. Here, the rank given to a target data item x_j when data item x_i is used as the second query is denoted r(x_i | x_j).

例えば、検索した第２クエリｘ_０と対象としたデータｘ_２とのランクが３であった場合、ｒ（ｘ_０｜ｘ_２）＝３となる。 For example, if the rank of the target data x_2 for the second query x_0 is 3, then r(x_0 | x_2) = 3.

検索部１２０は、第２クエリｘ_０と対象としたデータとの組｛ｘ_０，ｘ_０｝，｛ｘ_０，ｘ_１｝，…，｛ｘ_０，ｘ_ｎ｝の各々に対して、ｒ（ｘ_０｜ｘ_０）、ｒ（ｘ_０｜ｘ_１）、…、ｒ（ｘ_０｜ｘ_ｎ）等のようにランクを付与する。 The search unit 120 gives the ranks r(x_0 | x_0), r(x_0 | x_1), ..., r(x_0 | x_n) to the respective pairs {x_0, x_0}, {x_0, x_1}, ..., {x_0, x_n} of the second query x_0 and a target data item.

そして、検索部１２０は、第２クエリをｘ_１とした場合、ｘ_２とした場合と、データ集合の全てのデータを第２クエリとした場合も同様にデータ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝に含まれるデータ全てを対象としたランクを付与する。 The search unit 120 then gives ranks in the same way with x_1 as the second query, with x_2 as the second query, and so on for every data item in the data set, each time ranking all data included in the data set {x_0, x_1, ..., x_n}.

ここで、検索方法としては、例えば検索対象が文書の場合に、ｂａｇ−ｏｆ−ｗｏｒｄｓ法等、検索対象が画像の場合に、ｂａｇ−ｏｆ−ｖｉｓｕａｌ−ｗｏｒｄｓ法等、検索対象の性質に応じた種々の検索方法を用いることができる。 Here, as the search method, for example, when the search target is a document, the bag-of-words method, etc., and when the search target is an image, the bag-of-visual-words method, etc., depending on the nature of the search target. Various search methods can be used.

次に、検索部１２０は、全てのデータのペアに対してランクを付与すると、得られた全ランクの値を格納するランク行列を生成し、当該ランク行列と、生成したデータ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝とを類似度計算部１３０に渡す。 Next, once ranks have been given to all pairs of data, the search unit 120 generates a rank matrix that stores all the obtained rank values and passes the rank matrix and the generated data set {x_0, x_1, ..., x_n} to the similarity calculation unit 130.
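
The construction of the rank matrix described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: each data item is assumed to be a numeric feature vector, and the search is assumed to be a Euclidean nearest-neighbour ranking, whereas the description allows any search method suited to the data. The function name `rank_matrix` is hypothetical.

```python
import math


def rank_matrix(features):
    """Return ranks[i][j]: the rank of item j when item i is the second query.

    Index 0 is assumed to hold the first query q (x_0); rank 1 is the best match.
    Euclidean distance is an illustrative stand-in for the actual search method.
    """
    n = len(features)
    ranks = []
    for i in range(n):
        # Distance from the second query i to every item j in the data set.
        dists = [math.dist(features[i], features[j]) for j in range(n)]
        # sorted() is stable, so ties keep their original index order.
        order = sorted(range(n), key=lambda j: dists[j])
        row = [0] * n
        for position, j in enumerate(order):
            row[j] = position + 1
        ranks.append(row)
    return ranks


ranks = rank_matrix([[0.0], [1.0], [3.0]])
print(ranks)
```

Note that the matrix is generally not symmetric: item j may be ranked highly for query i while item i is ranked poorly for query j, which is exactly the asymmetry that the similarity calculation exploits.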

類似度計算部１３０は、データ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に高いランクであるほど大きい値を取るように計算する。 For each pair of data included in the data set, the similarity calculation unit 130 calculates the similarity between one data item and the other so that it takes a larger value the higher both of two ranks are: the rank given to the other item when the one item is used as the second query, and the rank given to the one item when the other item is used as the second query.

具体的には、類似度計算部１３０は、ランク行列に基づいて、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝に含まれるデータのペアの各々について、当該データのペア｛ｘ_ｉ，ｘ_ｊ｝について、類似度ｓ（ｘ_ｉ，ｘ_ｊ）を求める。 Specifically, based on the rank matrix, the similarity calculation unit 130 obtains the similarity s(x_i, x_j) for each pair of data {x_i, x_j} included in the data set {x_0, x_1, ..., x_n}.

ここで、類似度計算部１３０は、類似度ｓ（ｘ_ｉ，ｘ_ｊ）を、一方のデータｘ_ｉを第２クエリとした場合における他方のデータｘ_ｊに付与されたランクｒ（ｘ_ｉ｜ｘ_ｊ）と、他方のデータｘ_ｊを第２クエリとした場合における一方のデータｘ_ｉに付与されたランクｒ（ｘ_ｊ｜ｘ_ｉ）とが共に高いランクであるほど大きい値を取るように計算する。例えば、下記式（１）を用いて計算することができる。 Here, the similarity calculation unit 130 calculates the similarity s(x_i, x_j) so that it takes a larger value the higher both of two ranks are: the rank r(x_i | x_j) given to the other data item x_j when the one data item x_i is used as the second query, and the rank r(x_j | x_i) given to the one data item x_i when the other data item x_j is used as the second query. For example, it can be calculated using the following equation (1).

ここで、

は、パラメータである。 here,

Is a parameter.

また、類似度ｓ（ｘ_ｉ，ｘ_ｊ）を、ｒ（ｘ_ｉ｜ｘ_ｊ）とｒ（ｘ_ｊ｜ｘ_ｉ）とにおける相乗平均の逆数、相加平均の逆数、最大値の逆数等としてもよい。 Alternatively, the similarity s(x_i, x_j) may be the reciprocal of the geometric mean, the reciprocal of the arithmetic mean, the reciprocal of the maximum, or the like, of r(x_i | x_j) and r(x_j | x_i).
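
The three alternatives named above (reciprocal of the geometric mean, of the arithmetic mean, and of the maximum of the two mutual ranks) can be sketched as follows. This is an illustrative sketch only and does not reproduce equation (1); the function name `reciprocal_rank_similarity` and the `mode` argument are hypothetical, and ranks are assumed to start at 1.

```python
import math


def reciprocal_rank_similarity(ranks, mode="geometric"):
    """Similarity from mutual ranks; ranks[i][j] is r(x_i | x_j), starting at 1."""
    combine = {
        "geometric": lambda a, b: math.sqrt(a * b),   # geometric mean
        "arithmetic": lambda a, b: (a + b) / 2.0,     # arithmetic mean
        "max": lambda a, b: float(max(a, b)),         # maximum of the two ranks
    }[mode]
    n = len(ranks)
    # Both r(x_i | x_j) and r(x_j | x_i) enter the value, so the result is symmetric.
    return [
        [1.0 / combine(ranks[i][j], ranks[j][i]) for j in range(n)]
        for i in range(n)
    ]


ranks = [[1, 2, 3], [2, 1, 3], [3, 2, 1]]
sim = reciprocal_rank_similarity(ranks, mode="max")
print(sim[1][2])
```

In every variant the value is large only when both ranks are small numbers (that is, high ranks), so a pair in which only one item ranks the other highly receives a reduced similarity.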

そして、類似度計算部１３０は、全てのデータのペアについて類似度を求めると、得られた全ての類似度を格納する類似度行列を求め、当該類似度行列と、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝とを、ランクスコア計算部１４０に渡す。 Then, once the similarity has been obtained for all pairs of data, the similarity calculation unit 130 obtains a similarity matrix that stores all the obtained similarities and passes the similarity matrix and the data set {x_0, x_1, ..., x_n} to the rank score calculation unit 140.

ランクスコア計算部１４０は、検索対象となる複数のデータの各々について、類似度計算部１３０により求められた類似度に基づいて、当該データが第１クエリに関連する度合いを示すランクスコアを計算する。 The rank score calculation unit 140 calculates a rank score indicating the degree to which the data is related to the first query based on the similarity obtained by the similarity calculation unit 130 for each of the plurality of data to be searched. ..

具体的には、ランクスコア計算部１４０は、類似度計算部１３０より取得した類似度行列に基づいて、第１クエリｑに関して、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝に含まれる全データのランクスコアを計算する。 Specifically, based on the similarity matrix acquired from the similarity calculation unit 130, the rank score calculation unit 140 calculates, with respect to the first query q, the rank scores of all data included in the data set {x_0, x_1, ..., x_n}.

ランクスコアの計算は、例えば非特許文献３記載の方法を用いて計算することができる。ここで、類似度行列をＡとし、Ａの（ｉ，ｊ）要素を類似度ｓ（ｘ_ｉ，ｘ_ｊ）とする。 The rank scores can be calculated using, for example, the method described in Non-Patent Document 3. Here, let the similarity matrix be A, and let the (i, j) element of A be the similarity s(x_i, x_j).

また、後述する式（７）を用いて類似度を求めた場合、類似度行列の（ｉ，ｊ）要素は、ｓ_ｋ（ｘ_ｉ，ｘ_ｊ）とする。 When the similarity is obtained using equation (7) described later, the (i, j) element of the similarity matrix is s_k(x_i, x_j).

まず、ランクスコア計算部１４０は、Ａの対角成分を０にする。 First, the rank score calculation unit 140 sets the diagonal component of A to 0.

次に、ランクスコア計算部１４０は、下記式（２）を用いて、対角行列Ｄを計算する。 Next, the rank score calculation unit 140 calculates the diagonal matrix D using the following equation (2).

ここで、ｄ_ｊは、下記式（３）のように、ｘ_ｊを検索の対象とした場合の、全ての第２クエリとの類似度の総和である。 Here, d _j is the sum of the degree of similarity with all the second queries when x _j is the search target as in the following equation (3).
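
The prose definition of d_j can be written out as follows. This is a reading of the sentence above rather than a reproduction of equations (2) and (3), which do not appear in this text; the element notation a_{ij} for the (i, j) entry of A (with zero diagonal) is an assumption introduced here for illustration.

```latex
% Assumed notation: a_{ij} = s(x_i, x_j) for i \neq j, and a_{ii} = 0.
% d_j sums the similarity of the target x_j to every second query x_i.
d_j = \sum_{i=0}^{n} a_{ij}, \qquad
D = \operatorname{diag}(d_0, d_1, \dots, d_n)
```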

ランクスコア計算部１４０は、下記式（４）にしたがって、

を計算する。 The rank score calculation unit 140 uses the following equation (4) to perform

To calculate.

次に下記式（５）にしたがって、ランクスコアを格納する行列ｗを計算する。ここで、行列ｗを

とする。 Next, the matrix w for storing the rank score is calculated according to the following equation (5). Here, the matrix w

And.

ここで、Ｉは単位行列であり、αはパラメータであり、例えばα＝０．９９を採用することができる。また、

も、パラメータである。 Here, I is the identity matrix and α is a parameter; for example, α = 0.99 can be adopted. In addition,

is also a parameter.

また、ランクスコアは、非特許文献３記載の反復法を用いても計算することができる。例えば、下記式（６）による計算を繰り返すことにより、行列ｗを計算する。 The rank score can also be calculated by using the iterative method described in Non-Patent Document 3. For example, the matrix w is calculated by repeating the calculation according to the following equation (6).

ここで、

である。また、ｔ番目の反復をしたときに計算されるｗをｗ^ｔとし、ｗ^０＝ｙとする。また、例えば反復回数を３０回とし、最終回の反復で計算された行列ｗをランクスコア計算部１４０による計算結果とすることができる。 here,

Is. Also, the w to be calculated when the t-th iteration and w ^{^t,} and ^w 0 = y. Further, for example, the number of iterations can be set to 30, and the matrix w calculated in the final iteration can be used as the calculation result by the rank score calculation unit 140.

また、ランクスコアは、共役勾配法を用いて、上記式（５）を解く方法によっても計算することができる。 The rank score can also be calculated by the method of solving the above equation (5) using the conjugate gradient method.
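
The iterative calculation can be sketched in Python as below. Equations (2) to (6) are not reproduced in this text, so the sketch assumes the commonly used graph-ranking form: the similarity matrix is normalized symmetrically by the degree values d_j, and the scores are updated as w ← α·S·w + y, where y marks the first query at index 0. These choices, the function name `rank_scores`, and the form of y are assumptions and may differ from the equations of the patent.

```python
import math


def rank_scores(similarity, alpha=0.99, iterations=30):
    """Iteratively propagate scores from the first query (index 0) over the graph."""
    n = len(similarity)
    # Set the diagonal of A to 0, as described in the text.
    a = [[0.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    # d[j]: total similarity of item j to all second queries (column sum).
    d = [sum(a[i][j] for i in range(n)) for j in range(n)]
    # Assumed normalization: S = D^(-1/2) A D^(-1/2).
    s = [[a[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # Assumed initial vector y: 1 for the first query, 0 elsewhere; w^0 = y.
    y = [1.0] + [0.0] * (n - 1)
    w = y[:]
    for _ in range(iterations):
        w = [alpha * sum(s[i][j] * w[j] for j in range(n)) + y[i]
             for i in range(n)]
    return w


sim = [[1.0, 0.5, 1.0 / 3.0], [0.5, 1.0, 0.4], [1.0 / 3.0, 0.4, 1.0]]
print(rank_scores(sim, alpha=0.5, iterations=200))
```

Under these assumptions the iteration converges to the solution of (I − αS)w = y, so the same scores could also be obtained with a direct linear solve or a conjugate gradient solver; the fixed iteration count trades exactness for a bounded cost.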

そして、ランクスコア計算部１４０は、計算された行列ｗに基づいて、ランクスコアの高い順に検索対象のデータｘ_１，…，ｘ_ｎを並べ替え、並べ替え結果を出力部１５０に渡す。 Then, based on the calculated matrix w, the rank score calculation unit 140 sorts the search-target data x_1, ..., x_n in descending order of rank score and passes the sorted result to the output unit 150.

出力部１５０は、ランクスコア計算部１４０による並び替え結果を出力する。 The output unit 150 outputs the sorting result by the rank score calculation unit 140.
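
The final sorting step can be illustrated with a minimal sketch. It assumes that the score list holds the score of the first query at index 0 followed by the scores of the search targets in order; the function name `rerank` and the identifier list are hypothetical.

```python
def rerank(scores, ids):
    """Sort search-target ids by descending rank score.

    scores[0] is assumed to belong to the first query and is skipped;
    scores[1:] correspond one-to-one to ids.
    """
    target_scores = scores[1:]
    # sorted() is stable, so targets with equal scores keep their input order.
    order = sorted(range(len(ids)), key=lambda i: -target_scores[i])
    return [ids[i] for i in order]


print(rerank([1.2, 0.1, 0.5, 0.3], ["a", "b", "c"]))
```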

＜本発明の第１実施の形態に係る検索装置の作用＞ <Operation of the search device according to the first embodiment of the present invention>
図２は、本発明の第１実施の形態に係る検索処理ルーチンを示すフローチャートである。 FIG. 2 is a flowchart showing the search processing routine according to the first embodiment of the present invention.

入力部１００に第１クエリが入力されると、検索装置１０において、図２に示す検索処理ルーチンが実行される。 When the first query is input to the input unit 100, the search device 10 executes the search processing routine shown in FIG.

まず、ステップＳ１００において、入力部１００が、第１クエリの入力を受け付ける。 First, in step S100, the input unit 100 accepts the input of the first query.

ステップＳ１１０において、検索部１２０が、データベース１１０から検索対象となる複数のデータを取得する。 In step S110, the search unit 120 acquires a plurality of data to be searched from the database 110.

ステップＳ１２０において、検索部１２０は、上記ステップＳ１００により受け付けた第１クエリであるデータと、上記ステップＳ１１０により取得した検索対象となる複数のデータとを要素とするデータ集合に含まれるデータの各々について、当該データを第２クエリとしてデータ集合に対する検索を行った場合における、データ集合に含まれるデータの各々の、第２クエリに関連する順位であるランクを付与する。 In step S120, for each data item included in a data set whose elements are the first query received in step S100 and the plurality of data to be searched acquired in step S110, the search unit 120 searches the data set with that item as the second query and gives each data item in the data set a rank, that is, its position in order of relevance to the second query.

ステップＳ１３０において、類似度計算部１３０は、データ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に高いランクであるほど大きい値を取るように計算する。 In step S130, for each pair of data included in the data set, the similarity calculation unit 130 calculates the similarity between one data item and the other so that it takes a larger value the higher both of two ranks are: the rank given to the other item when the one item is used as the second query, and the rank given to the one item when the other item is used as the second query.

ステップＳ１４０において、ランクスコア計算部１４０は、検索対象となる複数のデータの各々について、類似度計算部１３０により求められた類似度に基づいて、当該データが第１クエリに関連する度合いを示すランクスコアを計算する。 In step S140, the rank score calculation unit 140 indicates the degree to which the data is related to the first query based on the similarity obtained by the similarity calculation unit 130 for each of the plurality of data to be searched. Calculate the score.

ステップＳ１５０において、ランクスコア計算部１４０は、上記ステップＳ１４０により得られたランクスコアに基づいて、ランクスコアの高い順に検索対象のデータを並べ替える。 In step S150, the rank score calculation unit 140 sorts the data to be searched in descending order of rank score based on the rank score obtained in step S140.

ステップＳ１６０において、出力部１５０は、上記ステップＳ１５０により得られた並び替え結果を出力する。 In step S160, the output unit 150 outputs the sorting result obtained in step S150.

以上説明したように、本実施形態に係る検索装置によれば、入力された第１クエリであるデータと、検索対象となる複数のデータとを要素とするデータ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に高いランクであるほど大きい値を取るように計算することにより、関係のある検索対象と無関係な検索対象との類似度を低く抑制し、無関係な検索対象のランクスコアをより正確に計算し、リランキングの精度を高めて、検索の精度を高めることができる。 As described above, the search device according to the present embodiment calculates, for each pair of data included in a data set whose elements are the input first query and the plurality of data to be searched, the similarity between one data item and the other so that it takes a larger value the higher both of two ranks are: the rank given to the other item when the one item is used as the second query, and the rank given to the one item when the other item is used as the second query. This keeps the similarity between related and unrelated search targets low, allows the rank scores of unrelated search targets to be calculated more accurately, and improves the accuracy of reranking and therefore of the search.

なお、本実施形態において、類似度計算部１３０は、データ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に所定のランクよりも高いランクである場合に共に高いランクであるほど大きい値を取るように計算し、少なくとも一方が所定のランクよりも低いランクである場合に小さい値を取るように計算してもよい。 In the present embodiment, for each pair of data included in the data set, the similarity calculation unit 130 may instead calculate the similarity between one data item and the other from the rank given to the other item when the one item is used as the second query and the rank given to the one item when the other item is used as the second query, such that it takes a larger value the higher both ranks are when both are higher than a predetermined rank, and takes a small value when at least one of them is lower than the predetermined rank.

具体的には、類似度計算部１３０は、類似度ｓ（ｘ_ｉ，ｘ_ｊ）を、一方のデータｘ_ｉを第２クエリとした場合における他方のデータｘ_ｊに付与されたランクｒ（ｘ_ｉ｜ｘ_ｊ）と、他方のデータｘ_ｊを第２クエリとした場合における一方のデータｘ_ｉに付与されたランクｒ（ｘ_ｊ｜ｘ_ｉ）とが共に所定のランクよりも高いランクである場合に高いランクであるほど大きい値を取るように計算し、共に所定のランクよりも高いランクでない場合に小さい値を取るように計算する。例えば、下記式（７）を用いて計算することができる。 Specifically, the similarity calculation unit 130 calculates the similarity s(x_i, x_j) from the rank r(x_i | x_j) given to the other data item x_j when the one data item x_i is used as the second query and the rank r(x_j | x_i) given to the one data item x_i when the other data item x_j is used as the second query. When both ranks are higher than the predetermined rank, the similarity takes a larger value the higher the ranks are; otherwise it takes a small value. For example, it can be calculated using the following equation (7).

ここで、ｋは所定のランクである正の整数の閾値であり、例えばｋ＝３０とすることができ、ｓ_ｋ（ｘ_ｉ，ｘ_ｊ）は、式（７）を用いた場合の類似度を示す。 Here, k is a positive integer threshold representing the predetermined rank, for example k = 30, and s_k(x_i, x_j) denotes the similarity obtained when equation (7) is used.

式（７）を用いる場合、ｒ（ｘ_ｉ｜ｘ_ｊ）とｒ（ｘ_ｊ｜ｘ_ｉ）とが共に所定のランクよりも高いランクでない場合には、類似度を小さい値を取るように計算することで、関係のある検索対象のデータと無関係な検索対象のデータとの類似度をより低く抑制することができ、より精度の高い類似度を求めることができる。 When equation (7) is used, the similarity is calculated to take a small value unless r(x_i | x_j) and r(x_j | x_i) are both higher than the predetermined rank. This suppresses the similarity between related and unrelated search-target data further, so that a more accurate similarity can be obtained.
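
The thresholded similarity can be sketched as follows. Equation (7) is not reproduced in this text, so the sketch makes several assumptions: "higher than the predetermined rank" is taken to mean a rank value of at most k, the "small value" is taken to be 0, and the reciprocal of the larger mutual rank is used for pairs that pass the threshold. The function name `thresholded_similarity` and the `low_value` argument are hypothetical.

```python
def thresholded_similarity(ranks, k=30, low_value=0.0):
    """Mutual-rank similarity that is kept only when both ranks are within the top k.

    ranks[i][j] is r(x_i | x_j), starting at 1. Pairs in which either rank
    exceeds k receive low_value (assumed to be 0 here).
    """
    n = len(ranks)
    sim = [[low_value] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r_ij, r_ji = ranks[i][j], ranks[j][i]
            if r_ij <= k and r_ji <= k:
                # Larger value the higher (numerically smaller) both ranks are.
                sim[i][j] = 1.0 / max(r_ij, r_ji)
    return sim


ranks = [[1, 2, 3], [2, 1, 3], [3, 2, 1]]
print(thresholded_similarity(ranks, k=2))
```

With k = 2 in this toy input, only the pair (x_0, x_1) keeps a non-zero similarity, because x_2 is ranked third by both of the other items.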

＜本発明の第２実施の形態に係る検索装置の概要＞ <Overview of the search device according to the second embodiment of the present invention>
本発明の第２実施形態の概要について説明する。 An overview of the second embodiment of the present invention will be described.

検索対象リランキング方法の精度は、上位結果のリランキングに対する貢献度の正確性に強く依存する。偽陽性の上位結果のリランキングに対する貢献度が高く計算された場合、当該上位結果との類似度が高い無関係な検索対象は、ランクスコアが高く計算され、リランキングの精度に悪影響を与える。例えば、非特許文献１記載の方法は、全上位結果のリランキングに対する貢献度を一様に扱う。非特許文献２記載の方法は、各上位結果の第１ランクに反比例する加重値を当該上位結果のリランキングに対する貢献度として計算する。しかし、第１ランクの正確性は１回目の検索の精度に強く依存し、１回目の検索の精度が低い場合、偽陽性の上位結果については第１ランクが高く、高い加重値が付与される可能性が高い。こうした第１ランクの高い偽陽性の上位結果は、依然としてリランキングに対する貢献度が高く、リランキングの精度に悪影響を与える場合が多い。 The accuracy of a method for reranking search targets depends strongly on how accurately the contribution of each top result to the reranking is determined. If the contribution of a false-positive top result is calculated as high, an unrelated search target that is highly similar to that top result receives a high rank score, which degrades the accuracy of the reranking. For example, the method described in Non-Patent Document 1 treats the contributions of all top results uniformly. The method described in Non-Patent Document 2 calculates, as the contribution of each top result, a weight inversely proportional to its first rank. However, the accuracy of the first rank depends strongly on the accuracy of the first search; when the first search is inaccurate, a false-positive top result is likely to have a high first rank and thus to be given a large weight. Such false-positive top results with a high first rank still contribute strongly to the reranking and often degrade its accuracy.

また、検索対象リランキング方法の精度は、上位結果と検索対象との類似度の正確性にも強く依存する。真陽性の上位結果と無関係な検索対象との類似度が高く計算された場合、当該検索対象のランクスコアが高く計算され、リランキングの精度が低下する。例えば、非特許文献１、２記載の方法は、２回目の検索で上位結果や検索対象などに付与されたランクに基づいて、前記類似度を計算している。 In addition, the accuracy of the search target reranking method strongly depends on the accuracy of the similarity between the top result and the search target. When the degree of similarity between the high-ranking true positive result and the unrelated search target is calculated to be high, the rank score of the search target is calculated to be high, and the accuracy of reranking is lowered. For example, in the methods described in Non-Patent Documents 1 and 2, the similarity is calculated based on the rank given to the higher-ranked result or the search target in the second search.

しかし、２回目の検索の精度が低い場合、無関係な検索対象でも、真陽性の上位結果との類似度が高く計算され、リランキングの精度に悪影響を与える可能性が高い、という問題があった。 However, if the accuracy of the second search is low, there is a problem that even if the search target is irrelevant, the similarity with the top result of true positive is calculated to be high, and the accuracy of reranking is likely to be adversely affected. ..

次に、検索対象の中から第１クエリにおける上位結果を取得し、第１クエリと全上位結果とを組み合わせた第２データ集合において、第１実施形態と同様の方法を用いて、全データのランクスコア（第１ランクスコア）を計算する。 Next, in the second data set in which the upper results in the first query are acquired from the search targets and the first query and all the upper results are combined, the same method as in the first embodiment is used to obtain all the data. Calculate the rank score (first rank score).

検索対象となる複数のデータに含まれるデータと、第２データ集合に含まれるデータとを組み合わせたペアの各々について、ランクに基づいて、第１実施形態と同様の方法を用いて、類似度を計算し、検索対象の各々について、第２データ集合の全データの第１ランクスコアに基づいて、当該検索対象と全データとの類似度の加重和を計算し、第２ランクスコアとする。 For each of the pairs in which the data included in the plurality of data to be searched and the data included in the second data set are combined, the similarity is determined by using the same method as in the first embodiment based on the rank. For each of the search targets, the weighted sum of the degree of similarity between the search target and all the data is calculated based on the first rank score of all the data in the second data set, and this is used as the second rank score.
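
The weighted sum described above can be written as a short Python sketch. It assumes that the second similarities are held as one row per search target with one column per item of the second data set, and that the first rank scores are given in the same column order; the function name `second_rank_scores` and this layout are hypothetical.

```python
def second_rank_scores(second_similarity, first_scores):
    """Second rank score of each search target.

    second_similarity[i][j]: similarity between search target i and item j
    of the second data set. first_scores[j]: first rank score of item j,
    used as the weight of that item's contribution.
    """
    return [
        sum(s * w for s, w in zip(row, first_scores))
        for row in second_similarity
    ]


scores = second_rank_scores([[1.0, 0.5], [0.2, 0.0]], [2.0, 1.0])
print(scores)
```

Because each top result is weighted by its own first rank score, a false-positive top result with a low first rank score contributes little, even if an unrelated search target happens to be similar to it.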

このような構成により、偽陽性の上位結果のリランキングに対する貢献度と、真陽性の上位結果と無関係な検索対象との類似度とを低く抑制し、リランキングの精度を高めることが可能となる。 With such a configuration, it is possible to suppress the contribution of false positive top results to the reranking and the similarity between the true positive top results and unrelated search targets to a low level, and improve the accuracy of reranking. ..

また、第１実施形態の構成、従来技術の方法（例えば非特許文献３の方法）と比べて、計算量の大きい第１ランクスコアの計算を検索対象のデータの数より数少ない上位結果のみに限定するため、計算量を小さく抑制することができ、検索の速度を高めることが可能となる。 Further, the calculation of the first rank score, which has a large amount of calculation as compared with the configuration of the first embodiment and the method of the prior art (for example, the method of Non-Patent Document 3), is limited to only the higher-ranked results which are less than the number of data to be searched. Therefore, the amount of calculation can be suppressed to a small level, and the search speed can be increased.

＜本発明の第２実施の形態に係る検索装置の構成＞ <Configuration of the search device according to the second embodiment of the present invention>
本発明の第２実施の形態に係る検索装置２０の構成について説明する。なお、第１実施の形態に係る検索装置１０と同様の構成については、同一の符号を付して詳細な説明は省略する。 The configuration of the search device 20 according to the second embodiment of the present invention will be described. Components identical to those of the search device 10 according to the first embodiment are given the same reference numerals, and their detailed description is omitted.

検索装置２０は、ＣＰＵと、ＲＡＭと、後述する検索処理ルーチンを実行するためのプログラムを記憶したＲＯＭとを備えたコンピュータで構成され、機能的には次に示すように構成されている。 The search device 20 is composed of a computer including a CPU, a RAM, and a ROM that stores a program for executing a search processing routine described later, and is functionally configured as shown below.

図３に示すように、本実施形態に係る検索装置２０は、入力部１００と、データベース１１０と、検索部１２０と、第１類似度計算部２３０と、第１ランクスコア計算部２４０と、第２類似度計算部２５０と、第２ランクスコア計算部２６０と、出力部１５０とを備えて構成される。 As shown in FIG. 3, the search device 20 according to the present embodiment includes an input unit 100, a database 110, a search unit 120, a first similarity calculation unit 230, a first rank score calculation unit 240, a second similarity calculation unit 250, a second rank score calculation unit 260, and an output unit 150.

検索部１２０は、更に、生成したランク行列と、生成したデータ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝を第２類似度計算部２５０に渡す。ここで、本実施形態では、データ集合｛ｘ_０，ｘ_１，…，ｘ_ｎ｝を第１データ集合と呼ぶ。 The search unit 120 further passes the generated rank matrix and the generated data set {x_0, x_1, ..., x_n} to the second similarity calculation unit 250. In the present embodiment, the data set {x_0, x_1, ..., x_n} is called the first data set.

第１類似度計算部２３０は、第１クエリであるデータと、ランクが上位Ｎ個の検索対象となるデータを要素とする第２データ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に高いランクであるほど大きい値を取るように計算する。 For each pair of data included in a second data set whose elements are the first query and the N top-ranked search-target data, the first similarity calculation unit 230 calculates the similarity between one data item and the other so that it takes a larger value the higher both of two ranks are: the rank given to the other item when the one item is used as the second query, and the rank given to the one item when the other item is used as the second query.

具体的には、第１類似度計算部２３０は、まず、検索部１２０によって計算されたランク行列に含まれている、第１クエリｑ（ｘ_０）を用いた時に検索対象のデータｘ_１，…，ｘ_ｎに付与されたランクｒ（ｘ_１｜ｘ_０），ｒ（ｘ_２｜ｘ_０），…，ｒ（ｘ_ｎ｜ｘ_０）に基づいて、検索対象のデータのうち第１クエリにおける上位ｍ個のデータを取得する。例えば、上位１００個（ｍ＝１００）のデータを取得する場合、ｒ（ｘ_ｉ｜ｘ_０）≦１００となるｘ_ｉを取得する。ここで、取得した上位結果を、ｚ_１，…，ｚ_ｍとする。 Specifically, the first similarity calculation unit 230 first acquires the top m search-target data for the first query, based on the ranks r(x_1 | x_0), r(x_2 | x_0), ..., r(x_n | x_0) that are contained in the rank matrix calculated by the search unit 120 and were given to the search-target data x_1, ..., x_n when the first query q (x_0) was used. For example, to acquire the top 100 data items (m = 100), the items x_i satisfying r(x_i | x_0) ≦ 100 are acquired. Here, the acquired top results are denoted z_1, ..., z_m.

そして、第１類似度計算部２３０は、第１クエリｑと、取得した上位結果ｚ_１，…，ｚ_ｍとを１つの集合とした第２データ集合｛ｑ，ｚ_１，…，ｚ_ｍ｝を作成する。 Then, the first similarity calculation unit 230 creates a second data set {q, z_1, ..., z_m} that gathers the first query q and the acquired top results z_1, ..., z_m into one set.
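
The selection of the top results and the creation of the second data set can be sketched as follows. The sketch assumes that the ranks obtained with the first query are available as a list indexed like the first data set (index 0 being the first query itself) and that the m best-ranked search targets are selected, excluding the query; the function name `build_second_data_set` is hypothetical, and the sketch returns indices rather than the data items themselves.

```python
def build_second_data_set(first_query_ranks, m):
    """Return indices forming the second data set: the query, then the top m targets.

    first_query_ranks[j] is the rank given to item j when the first query
    (index 0) is used as the second query; smaller numbers are higher ranks.
    """
    # Search targets are indices 1..n; index 0 is the first query itself.
    candidates = list(range(1, len(first_query_ranks)))
    candidates.sort(key=lambda j: first_query_ranks[j])
    return [0] + candidates[:m]


print(build_second_data_set([1, 3, 2, 4], m=2))
```

The returned indices can then be used to pick the corresponding rows and columns of the rank matrix, so that the first similarity is computed only for pairs within the second data set.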

次に、第１類似度計算部２３０は、第２データ集合に含まれるデータのペアの各々について、当該データのペア｛ｚ_ｉ，ｚ_ｊ｝について、類似度ｓ（ｚ_ｉ，ｚ_ｊ）を、一方のデータｚ_ｉを第２クエリとした場合における他方のデータｚ_ｊに付与されたランクｒ（ｚ_ｉ｜ｚ_ｊ）と、他方のデータｚ_ｊを第２クエリとした場合における一方のデータｚ_ｉに付与されたランクｒ（ｚ_ｊ｜ｚ_ｉ）とが共に高いランクであるほど大きい値を取るように計算する。 Next, for each pair of data {z_i, z_j} included in the second data set, the first similarity calculation unit 230 calculates the similarity s(z_i, z_j) so that it takes a larger value the higher both of two ranks are: the rank r(z_i | z_j) given to the other data item z_j when the one data item z_i is used as the second query, and the rank r(z_j | z_i) given to the one data item z_i when the other data item z_j is used as the second query.

例えば、上記式（１）を用いて計算し、また、類似度ｓ（ｚ_ｉ，ｚ_ｊ）を、相乗平均の逆数、相加平均の逆数、最大値の逆数等としてもよい。 For example, the similarity can be calculated using equation (1) above; alternatively, the similarity s(z_i, z_j) may be the reciprocal of the geometric mean, the reciprocal of the arithmetic mean, the reciprocal of the maximum, or the like.

また、第１類似度計算部２３０は、第１クエリであるデータと、ランクが上位Ｎ個の検索対象となるデータとを要素とする第２データ集合に含まれるデータのペアの各々について、一方の当該データと他方の当該データとの類似度を、一方の当該データを第２クエリとした場合における他方の当該データに付与されたランクと、他方の当該データを第２クエリとした場合における一方の当該データに付与されたランクとが、共に所定のランクよりも高いランクである場合に共に高いランクであるほど大きい値を取るように計算し、少なくとも一方が所定のランクよりも低いランクである場合に小さい値を取るように計算してもよい。 Alternatively, for each pair of data included in the second data set whose elements are the first query and the N top-ranked search-target data, the first similarity calculation unit 230 may calculate the similarity between one data item and the other from the rank given to the other item when the one item is used as the second query and the rank given to the one item when the other item is used as the second query, such that it takes a larger value the higher both ranks are when both are higher than a predetermined rank, and takes a small value when at least one of them is lower than the predetermined rank.

具体的には、第１類似度計算部２３０は、第２データ集合に含まれるデータのペアの各々について、当該データのペア｛ｚ_ｉ，ｚ_ｊ｝について、類似度ｓ（ｚ_ｉ，ｚ_ｊ）を、一方のデータｚ_ｉを第２クエリとした場合における他方のデータｚ_ｊに付与されたランクｒ（ｚ_ｉ｜ｚ_ｊ）と、他方のデータｚ_ｊを第２クエリとした場合における一方のデータｚ_ｉに付与されたランクｒ（ｚ_ｊ｜ｚ_ｉ）とが、共に所定のランクよりも高いランクである場合に共に高いランクであるほど大きい値を取るように計算し、少なくとも一方が所定のランクよりも低いランクである場合に小さい値を取るように計算する。 Specifically, for each pair of data {z_i, z_j} included in the second data set, the first similarity calculation unit 230 calculates the similarity s(z_i, z_j) from the rank r(z_i | z_j) given to the other data item z_j when the one data item z_i is used as the second query and the rank r(z_j | z_i) given to the one data item z_i when the other data item z_j is used as the second query. When both ranks are higher than the predetermined rank, the similarity takes a larger value the higher both ranks are; when at least one of them is lower than the predetermined rank, it takes a small value.

例えば、上記式（７）の方法を用いて、類似度ｓ_ｋ（ｚ_ｉ，ｚ_ｊ）を求めることができる。 For example, the similarity s_k(z_i, z_j) can be obtained using the method of equation (7) above.

そして、第１類似度計算部２３０は、得られた全ての類似度を格納する第１類似度行列を求め、第１類似度行列と、第２データ集合とを、第２ランクスコア計算部２６０に渡す。 Then, the first similarity calculation unit 230 obtains a first similarity matrix that stores all the obtained similarities and passes the first similarity matrix and the second data set to the second rank score calculation unit 260.

The first similarity calculation unit 230 also passes the second data set to the second similarity calculation unit 250.

For each item of data in the second data set, the first rank score calculation unit 240 calculates a first rank score, which indicates the degree to which that item is relevant to the first query, based on the similarities obtained by the first similarity calculation unit 230.

Specifically, using the same method as the rank score calculation unit 140 according to the first embodiment, the first rank score calculation unit 240 calculates, with respect to the first query q, a first rank score for each item of data in the second data set, based on the first similarity matrix acquired from the first similarity calculation unit 230.

The first rank score calculation unit 240 then passes a first rank score matrix, which stores the first rank scores, to the second rank score calculation unit 260.

For each pair that combines one of the plurality of data to be searched with one of the data in the second data set, the second similarity calculation unit 250 calculates a second similarity, which is the similarity between one item and the other item of the pair, from two ranks: the rank assigned to the other item when the one item is used as the second query, and the rank assigned to the one item when the other item is used as the second query. The second similarity takes a larger value the higher both ranks are.

Specifically, for each pair {x_i, z_j} that combines data x_i included in the plurality of data to be searched with data z_j included in the second data set, the second similarity calculation unit 250 calculates the second similarity s(x_i, z_j) from the rank r(z_j | x_i), assigned to z_j when x_i is used as the second query, and the rank r(x_i | z_j), assigned to x_i when z_j is used as the second query. The second similarity s(x_i, z_j) takes a larger value the higher both ranks are.

For example, the second similarity s(x_i, z_j) may be calculated using formula (1) above, or it may be taken as the reciprocal of the geometric mean, the reciprocal of the arithmetic mean, the reciprocal of the maximum value, or the like, of the two ranks.
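The three alternatives named above can be sketched as follows. Formula (1) is defined earlier in the document and is not reproduced here, so this sketch covers only the reciprocal-of-mean and reciprocal-of-maximum variants. It assumes that rank 1 is the highest rank; the function name and the `mode` parameter are hypothetical.

```python
import math


def reciprocal_rank_similarity(r_z_given_x: int, r_x_given_z: int,
                               mode: str = "geometric") -> float:
    """Hypothetical second similarity s(x_i, z_j) from the two mutual ranks.

    r_z_given_x: rank given to z_j when x_i is the second query (1 = best).
    r_x_given_z: rank given to x_i when z_j is the second query (1 = best).
    """
    if r_z_given_x < 1 or r_x_given_z < 1:
        raise ValueError("ranks are 1-based")
    if mode == "geometric":
        # Reciprocal of the geometric mean of the two ranks.
        return 1.0 / math.sqrt(r_z_given_x * r_x_given_z)
    if mode == "arithmetic":
        # Reciprocal of the arithmetic mean of the two ranks.
        return 2.0 / (r_z_given_x + r_x_given_z)
    if mode == "max":
        # Reciprocal of the larger (worse) of the two rank numbers.
        return 1.0 / max(r_z_given_x, r_x_given_z)
    raise ValueError(f"unknown mode: {mode}")
```

All three variants equal 1 when both ranks are 1 and decrease as either rank number grows. The maximum-based variant penalizes a one-sided match most strongly, since a single poor rank determines the value.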

The second similarity calculation unit 250 then builds a second similarity matrix that stores all of the obtained second similarities, and passes the second similarity matrix to the second rank score calculation unit 260.

For each of the plurality of data to be searched, the second rank score calculation unit 260 calculates a second rank score, which indicates the degree to which that data is relevant to the first query, based on the first rank score and the second similarity.

Specifically, for each search target, the second rank score calculation unit 260 calculates a weighted sum of the second similarities between that search target and each item of data in the second data set, weighted according to the first rank score of each item in the second data set, and uses this sum as the second rank score.

Here, the first rank score matrix is denoted by its own symbol (omitted in this text), and the second rank score of the search target x_i is denoted p(x_i).

For example, the second rank score is calculated using formula (8), formula (9), or the like.

The second rank score calculation unit 260 then sorts the search targets in descending order of second rank score and passes the sorted result to the output unit 150.
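Formulas (8) and (9) are not reproduced in this text. The sketch below therefore assumes the plainest form of the weighted sum described above, p(x_i) = Σ_j f(z_j) · s(x_i, z_j), where f(z_j) is the first rank score of z_j, followed by the descending sort. The function names and the list-of-lists layout are hypothetical, and the actual formulas may include normalization or other terms.

```python
from typing import List, Sequence


def second_rank_scores(second_sim: Sequence[Sequence[float]],
                       first_scores: Sequence[float]) -> List[float]:
    """Weighted sum of second similarities (assumed form, see lead-in).

    second_sim[i][j]: second similarity s(x_i, z_j).
    first_scores[j]:  first rank score of z_j in the second data set.
    """
    for row in second_sim:
        if len(row) != len(first_scores):
            raise ValueError("each row needs one similarity per z_j")
    return [sum(f * s for f, s in zip(first_scores, row)) for row in second_sim]


def rerank(scores: Sequence[float]) -> List[int]:
    """Indices of the search targets in descending order of second rank score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Three search targets, two items in the second data set.
sim = [[1.0, 0.5], [0.0, 0.25], [1.0, 1.0]]
scores = second_rank_scores(sim, first_scores=[0.5, 0.25])
order = rerank(scores)
```

A search target ends up near the top only if it is mutually close to items of the second data set that themselves have high first rank scores. This is how a false-positive top result with a low first rank score contributes little to the final order.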

<Operation of the search device according to the second embodiment of the present invention>
FIG. 4 is a flowchart showing a search processing routine according to the second embodiment of the present invention. Processing that is the same as in the search processing routine according to the first embodiment is given the same reference numerals, and its detailed description is omitted.

In step S230, for each pair of data in the second data set, whose elements are the data serving as the first query and the N highest-ranked data to be searched, the first similarity calculation unit 230 calculates the similarity between one item and the other item of the pair from two ranks: the rank assigned to the other item when the one item is used as the second query, and the rank assigned to the one item when the other item is used as the second query. The similarity takes a larger value the higher both ranks are.

Alternatively, in step S230, the first similarity calculation unit 230 may calculate the similarity for each pair of data in the second data set so that, when the two ranks are both higher than a predetermined rank, the similarity takes a larger value the higher both ranks are, and, when at least one of them is lower than the predetermined rank, the similarity takes a small value.
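The second data set used in step S230 can be sketched as a simple top-N selection. This is an illustrative sketch only: it assumes items are identified by integer indices, that rank 1 is the highest rank, and that the query itself is excluded when picking the top N search targets. The function name is hypothetical.

```python
from typing import List


def build_second_data_set(ranks_for_query: List[int], query_index: int,
                          n: int) -> List[int]:
    """Return the indices forming the second data set.

    ranks_for_query[j]: rank given to item j when the first query is used
                        as the second query (1 = best).
    The result holds the query first, then the n best-ranked search targets.
    """
    targets = [j for j in range(len(ranks_for_query)) if j != query_index]
    # Smaller rank number means a higher rank, so sort ascending.
    targets.sort(key=lambda j: ranks_for_query[j])
    return [query_index] + targets[:n]


# Item 0 is the first query; items 1 to 4 are search targets.
second_set = build_second_data_set([1, 3, 2, 5, 4], query_index=0, n=2)
```

The first similarity matrix is then computed only over the pairs drawn from this smaller set.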

In step S240, for each item of data in the second data set, the first rank score calculation unit 240 calculates a first rank score, which indicates the degree to which that item is relevant to the first query, based on the similarities obtained in step S230.

In step S242, for each pair that combines one of the plurality of data to be searched with one of the data in the second data set, the second similarity calculation unit 250 calculates a second similarity, which is the similarity between one item and the other item of the pair, from two ranks: the rank assigned to the other item when the one item is used as the second query, and the rank assigned to the one item when the other item is used as the second query. The second similarity takes a larger value the higher both ranks are.

In step S244, for each of the plurality of data to be searched, the second rank score calculation unit 260 calculates a second rank score, which indicates the degree to which that data is relevant to the first query, based on the first rank score and the second similarity obtained in step S242.

In step S250, the second rank score calculation unit 260 sorts the search targets in descending order of the second rank score obtained in step S244.

As described above, the search device according to the present embodiment operates as follows. For each item of data in the first data set, whose elements are the data serving as the input first query and the plurality of data to be searched, the device searches the first data set with that item as the second query and assigns to each item of data in the first data set a rank, that is, its order of relevance to that second query. For each pair of data in the second data set, whose elements are the data serving as the first query and the N highest-ranked data to be searched, the first similarity calculation unit calculates the similarity between one item and the other item so that it takes a larger value the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query. For each item of data in the second data set, the first rank score calculation unit calculates a first rank score indicating the degree to which that item is relevant to the first query, based on the similarities obtained by the first similarity calculation unit. For each pair that combines one of the plurality of data to be searched with one of the data in the second data set, a second similarity is calculated in the same rank-based manner. For each of the plurality of data to be searched, a second rank score indicating the degree to which that data is relevant to the first query is then calculated based on the first rank score and the second similarity. This keeps low both the contribution of false-positive top results to reranking and the similarity between true-positive top results and irrelevant search targets, which improves the accuracy of reranking and, in turn, the accuracy of the search.

In addition, because the computationally expensive calculation of the first rank score is limited to the top results, which are fewer than the total number of search targets, the amount of computation is kept small and the search can be made faster.
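As a rough illustration of the saving, the sketch below counts similarity matrix entries. It assumes M search targets plus one query, so that an approach over the whole data set needs a (M+1) × (M+1) similarity matrix, while the second embodiment needs a (N+1) × (N+1) first similarity matrix and a M × (N+1) second similarity matrix. The function name and the example sizes are hypothetical, and the actual cost also depends on how the first rank score itself is computed, which this text does not specify.

```python
from typing import Dict


def similarity_matrix_entries(num_targets: int, top_n: int) -> Dict[str, int]:
    """Count matrix entries under the assumptions stated in the lead-in."""
    if not 0 <= top_n <= num_targets:
        raise ValueError("top_n must be between 0 and num_targets")
    return {
        # Every pair in the full data set (query + all search targets).
        "full_data_set": (num_targets + 1) ** 2,
        # Pairs within the second data set (query + top-N search targets).
        "first_similarity": (top_n + 1) ** 2,
        # Each search target paired with each item of the second data set.
        "second_similarity": num_targets * (top_n + 1),
    }


counts = similarity_matrix_entries(num_targets=10_000, top_n=100)
```

For these example sizes, the matrix that feeds the first rank score shrinks from 100,020,001 entries to 10,201, and the second similarity matrix adds 1,010,000 entries.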

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, in the second embodiment, the second similarity calculation unit 250 may apply a rank threshold. For each pair that combines one of the plurality of data to be searched with one of the data in the second data set, it may calculate the second similarity from the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, so that, when both ranks are higher than a predetermined rank, the second similarity takes a larger value the higher both ranks are, and, when at least one of them is lower than the predetermined rank, the second similarity takes a small value.

Specifically, for each pair {x_i, z_j} that combines data x_i included in the plurality of data to be searched with data z_j included in the second data set, the second similarity calculation unit 250 calculates the second similarity s(x_i, z_j) from the rank r(z_j | x_i), assigned to z_j when x_i is used as the second query, and the rank r(x_i | z_j), assigned to x_i when z_j is used as the second query. When both ranks are higher than the predetermined rank, s(x_i, z_j) takes a larger value the higher both ranks are; when at least one of them is lower than the predetermined rank, s(x_i, z_j) takes a small value.

For example, the second similarity s_k(x_i, z_j) can be obtained by the method of formula (7) above.

In the second embodiment, the first similarity calculation unit 230 creates the top search results and the second data set. Alternatively, the search unit 120 may generate them and pass them to the first similarity calculation unit 230 and the second similarity calculation unit 250, respectively.

The second similarity calculation unit 250 may also be configured to create the top search results and the second data set.

Although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

10 Search device
20 Search device
100 Input unit
110 Database
120 Search unit
130 Similarity calculation unit
140 Rank score calculation unit
150 Output unit
230 First similarity calculation unit
240 First rank score calculation unit
250 Second similarity calculation unit
260 Second rank score calculation unit

Claims

A search device comprising:
a search unit that, for each item of data in a data set whose elements are data serving as an input first query and a plurality of data to be searched, assigns to each item of data in the data set a rank, the rank being its order of relevance to a second query when the data set is searched with that item as the second query;
a similarity calculation unit that, for each pair of data in the data set, calculates the similarity between one item and the other item of the pair such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the similarity; and
a rank score calculation unit that, for each of the plurality of data to be searched, calculates a rank score indicating the degree to which that data is relevant to the first query, based on the similarity obtained by the similarity calculation unit.

The search device according to claim 1, wherein the similarity calculation unit calculates, for each pair of data in the data set, the similarity between one item and the other item of the pair such that, when the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query are both higher than a predetermined rank, the higher both ranks, the larger the value of the similarity, and, when at least one of them is lower than the predetermined rank, the similarity takes a small value.

A search device comprising:
a search unit that, for each item of data in a first data set whose elements are data serving as an input first query and a plurality of data to be searched, assigns to each item of data in the first data set a rank, the rank being its order of relevance to a second query when the first data set is searched with that item as the second query;
a first similarity calculation unit that, for each pair of data in a second data set whose elements are the data serving as the first query and the N highest-ranked data to be searched, calculates the similarity between one item and the other item of the pair such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the similarity;
a first rank score calculation unit that, for each item of data in the second data set, calculates a first rank score indicating the degree to which that item is relevant to the first query, based on the similarity obtained by the first similarity calculation unit;
a second similarity calculation unit that, for each pair that combines one of the plurality of data to be searched with one of the data in the second data set, calculates a second similarity, which is the similarity between one item and the other item of the pair, such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the second similarity; and
a second rank score calculation unit that, for each of the plurality of data to be searched, calculates a second rank score indicating the degree to which that data is relevant to the first query, based on the first rank score and the second similarity.

The search device according to claim 3, wherein
the first similarity calculation unit calculates, for each pair of data in the second data set, the similarity between one item and the other item of the pair such that, when the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query are both higher than a predetermined rank, the higher both ranks, the larger the value of the similarity, and, when at least one of them is lower than the predetermined rank, the similarity takes a small value, and
the second similarity calculation unit calculates, for each pair that combines one of the plurality of data to be searched with one of the data in the second data set, the second similarity such that, when the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query are both higher than the predetermined rank, the higher both ranks, the larger the value of the second similarity, and, when at least one of them is lower than the predetermined rank, the second similarity takes a small value.

A search method comprising:
a search step in which a search unit, for each item of data in a data set whose elements are data serving as an input first query and a plurality of data to be searched, assigns to each item of data in the data set a rank, the rank being its order of relevance to a second query when the data set is searched with that item as the second query;
a similarity calculation step in which a similarity calculation unit, for each pair of data in the data set, calculates the similarity between one item and the other item of the pair such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the similarity; and
a rank score calculation step in which a rank score calculation unit, for each of the plurality of data to be searched, calculates a rank score indicating the degree to which that data is relevant to the first query, based on the similarity obtained in the similarity calculation step.

The search method according to claim 5, wherein, in the similarity calculation step, for each pair of data in the data set, the similarity between one item and the other item of the pair is calculated such that, when the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query are both higher than a predetermined rank, the higher both ranks, the larger the value of the similarity, and, when at least one of them is lower than the predetermined rank, the similarity takes a small value.

A search method comprising:
a search step in which a search unit, for each item of data in a first data set whose elements are data serving as an input first query and a plurality of data to be searched, assigns to each item of data in the first data set a rank, the rank being its order of relevance to a second query when the first data set is searched with that item as the second query;
a first similarity calculation step in which a first similarity calculation unit, for each pair of data in a second data set whose elements are the data serving as the first query and the N highest-ranked data to be searched, calculates the similarity between one item and the other item of the pair such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the similarity;
a first rank score calculation step in which a first rank score calculation unit, for each item of data in the second data set, calculates a first rank score indicating the degree to which that item is relevant to the first query, based on the similarity obtained in the first similarity calculation step;
a second similarity calculation step in which a second similarity calculation unit, for each pair that combines one of the plurality of data to be searched with one of the data in the second data set, calculates a second similarity, which is the similarity between one item and the other item of the pair, such that the higher both the rank assigned to the other item when the one item is used as the second query and the rank assigned to the one item when the other item is used as the second query, the larger the value of the second similarity; and
a second rank score calculation step in which a second rank score calculation unit, for each of the plurality of data to be searched, calculates a second rank score indicating the degree to which that data is relevant to the first query, based on the first rank score and the second similarity.

A program for causing a computer to function as each part of the search device according to any one of claims 1 to 4.