JP6577922B2 - Search apparatus, method, and program

Info

Publication number: JP6577922B2
Application number: JP2016175052A
Authority: JP
Inventors: 小萌武; 薫平松; 柏野　邦夫; 邦夫柏野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2016-09-07
Filing date: 2016-09-07
Publication date: 2019-09-18
Anticipated expiration: 2036-09-07
Also published as: JP2018041281A

Description

The present invention relates to a search device, method, and program, and more particularly to a search device, method, and program that assign a rank to each search target through a search using a query.

A search is a process in which a user enters a query, a search algorithm selects, from the search target data stored in a database, the data that is highly similar to the query, and the selected data is returned to the user. Hereinafter, search target data is referred to as a search target. Search targets include, for example, documents, images, audio, video, and data recorded in various other media or combinations thereof. Depending on the search conditions, there may be several appropriate search results. To make it easier for the user to identify the appropriate results among the search targets, the search targets must be sorted in descending order of the similarity measure between each search target and the query, as calculated by the search algorithm, and assigned priorities. Hereinafter, the similarity measure is referred to as the similarity, and the priority is referred to as the rank.
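As a minimal illustration of the ranking described above, the following Python sketch sorts search targets in descending order of similarity and assigns rank 1 to the most similar one. The identifiers and the similarity values are hypothetical and are not taken from this publication.

```python
def rank_by_similarity(similarities):
    """Return {target_id: rank}, where rank 1 is the most similar target."""
    ordered = sorted(similarities, key=similarities.get, reverse=True)
    return {target_id: rank for rank, target_id in enumerate(ordered, start=1)}

# Hypothetical similarities between one query and three search targets.
similarities = {"doc_a": 0.12, "doc_b": 0.87, "doc_c": 0.45}
ranks = rank_by_similarity(similarities)
print(ranks)
```

Note that the rank keeps only the order of the targets; the similarity values themselves are discarded.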

However, when little information can be obtained from the query, the similarity between an appropriate search result and the query is low, and the appropriate result is likely to be given a low rank. To address this problem, various search target reranking methods have been proposed that assign a new rank to search targets already ranked by a first search.

For example, the search target reranking device described in Non-Patent Document 1 performs a first search using the input first query, assigns a first rank to each search target, and then identifies the several search targets with the highest first ranks. Hereinafter, these search targets with the highest first ranks are referred to as top results. Next, the device performs a second search using each search target as a second query and assigns second ranks to all search targets. Then, for each search target, it calculates a new rank score based on the second ranks of the top results. Finally, it sorts the search targets in descending order of the new rank score and assigns a third rank to each search target.

The search target reranking device described in Non-Patent Document 2 identifies the top results among the search targets in the same manner as the device of Non-Patent Document 1. Next, it adds the input first query to the database, performs a second search using each top result as a second query, and assigns second ranks to the first query and to all search targets. Then, for each search target, it calculates a new rank score based on the first rank of the top result, the second rank of the first query, and the second rank of the search target. Finally, it sorts the search targets in descending order of the new rank score and assigns a third rank to each search target.

[Non-Patent Document 1]: Danfeng Qin, Stephan Gammeter, Lukas Bossard, Till Quack, Luc J. Van Gool: Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors. CVPR 2011: 777-784.

[Non-Patent Document 2]: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu: Spatially-Constrained Similarity Measure for Large-Scale Object Retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 36(6): 1229-1241 (2014).

The search target reranking devices described in Non-Patent Documents 1 and 2 calculate the new rank score of a search target based on ranks. However, a rank is merely the priority order obtained by sorting the data, and it does not always correlate with the similarity between data. For example, for two data items with low similarity, when one is used as a query for a search, the other may still be given a high rank. When a new rank score is calculated from ranks that do not correlate with the similarity between data, errors are likely to arise in the calculated rank score, which can adversely affect the accuracy of search target reranking.
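The mismatch between rank and similarity can be seen in a small hypothetical case: when every item in the database is dissimilar to the item used as a query, one of them still receives rank 1. The values below are invented for illustration only.

```python
def ranks_from(similarities):
    """Assign rank 1 to the largest similarity, rank 2 to the next, and so on."""
    ordered = sorted(similarities, key=similarities.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Hypothetical similarities of three database items to an item used as a query.
# All values are low, yet the sort still has to put one item first.
sims_to_query = {"x": 0.08, "y": 0.05, "z": 0.02}
ranks = ranks_from(sims_to_query)
top_item = min(ranks, key=ranks.get)
print(top_item, ranks[top_item], sims_to_query[top_item])
```

A method that looks only at the rank of "x" cannot tell this case apart from one in which "x" is nearly identical to the query.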

In addition, the search target reranking device described in Non-Patent Document 1 treats the contributions of all top results to the reranking uniformly. However, irrelevant search targets are often included among the top results as false positives. In that case, other irrelevant search targets that are highly similar to a false-positive top result are likely to receive a high new rank score, which can adversely affect the accuracy of search target reranking.

The search target reranking device described in Non-Patent Document 2 suppresses the contribution of false-positive top results by giving each top result a weight inversely proportional to its first rank. However, the accuracy of the first rank depends strongly on the accuracy of the first search. When the accuracy of the first search is low, a false-positive top result may have a high first rank and therefore be given a high weight. Such highly ranked false-positive top results still contribute heavily, which can adversely affect the accuracy of search target reranking.

The present invention has been made to solve the above problems, and its object is to provide a search device, method, and program capable of more accurate search target reranking.

To achieve the above object, a search device according to the present invention includes: a search unit that assigns ranks to search targets by a search using an input query and acquires a plurality of top results from among the search targets; a weight calculation unit that sets a weight for each of the top results acquired by the search unit; a top-result-to-search-target similarity calculation unit that calculates the similarity between each of the top results and each of all the search targets; and a reranking unit that, based on the similarities calculated by the top-result-to-search-target similarity calculation unit, calculates the rank score of each search target so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight of that top result and the rank score of the search target.

In a search method according to the present invention, a search unit assigns ranks to search targets by a search using an input query and acquires a plurality of top results from among the search targets; a weight calculation unit sets a weight for each of the top results acquired by the search unit; a top-result-to-search-target similarity calculation unit calculates the similarity between each of the top results and each of all the search targets; and a reranking unit, based on the similarities calculated by the top-result-to-search-target similarity calculation unit, calculates the rank score of each search target so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight of that top result and the rank score of the search target.

A program according to the present invention is a program for causing a computer to function as each unit of the above search device.

According to the search device, method, and program of the present invention, the rank score of each search target is calculated, based on the similarity between each of the top results and each of all the search targets, so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight of that top result and the rank score of the search target, and ranks are assigned to the search targets accordingly. This yields the effect that more accurate search target reranking can be performed.

FIG. 1 is a block diagram showing the configuration of the search device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the reranking unit of the search device according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the search processing routine in the search device according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of the weight calculation unit of the search device according to the third embodiment of the present invention.
FIG. 5 is a flowchart showing the flow of the weight calculation process in the search device according to the third embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Configuration of the Search Device According to the First Embodiment>
The configuration of the search device according to the first embodiment of the present invention is described below. As shown in FIG. 1, the search device 100 according to the first embodiment can be implemented as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the search processing routine described later. Functionally, as shown in FIG. 1, the search device 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 receives a query as input.

The calculation unit 20 includes: a search unit 24 that searches a database 22, which stores and manages the search targets, using the query and acquires a plurality of top results; a weight calculation unit 26 that calculates the weight of each top result; a top-result-to-search-target similarity calculation unit 28 that calculates the similarity between each of the top results and each of all the search targets; and a reranking unit 30 that calculates the rank scores of the search targets and assigns ranks.

The search unit 24 receives the query and the database 22 as input, calculates the similarity between the query and every search target, assigns a rank to each search target according to its similarity, and outputs a plurality of top results together with a first similarity matrix that records the similarity between each of the top results and the query. Examples of the similarity calculation method include the vector space model when the search targets are documents, and the bag-of-visual-words method or a spatial verification method such as that of Non-Patent Document 3 when the search targets are images.
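The behavior of the search unit 24 can be sketched as follows. This is only an illustration under stated assumptions: each search target is represented by a small feature vector, cosine similarity stands in for whichever similarity measure is actually used, and the function names `cosine` and `search` and the sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def search(query, database, m):
    """Return the indices of the m top results and their similarities to the query."""
    sims = [cosine(query, vector) for vector in database]
    order = sorted(range(len(database)), key=lambda i: sims[i], reverse=True)
    top_ids = order[:m]
    first_similarity = [sims[i] for i in top_ids]  # one entry per top result
    return top_ids, first_similarity

# Hypothetical database of four two-dimensional feature vectors.
database = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
top_ids, y = search([1.0, 0.0], database, m=2)
print(top_ids, y)
```

Here `y` plays the role of the first similarity matrix: it records, for each top result, its similarity to the query.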

[Non-Patent Document 3]: Xiaomeng Wu, Kunio Kashino: Robust Spatial Matching as Ensemble of Weak Geometric Relations. BMVC 2015: 25.1-25.12

The weight calculation unit 26 receives the top results output by the search unit 24, sets the weight of each top result, and outputs a weight matrix that records the weights. In the present embodiment, the weights of all top results are set to 1.

The top-result-to-search-target similarity calculation unit 28 receives the plurality of top results and the database 22 as input, calculates the similarity between each of the top results and each of all the search targets, and outputs a second similarity matrix that records these similarities. The similarity calculation method may be the same as that used by the search unit 24 or may be a different method.

As shown in FIG. 2, the reranking unit 30 includes a first objective function creation unit 32 that creates an objective function for optimizing the rank scores of the search targets, and a first objective function minimization unit 34 that calculates the rank scores that optimize the objective function. The reranking unit 30 receives the second similarity matrix output by the top-result-to-search-target similarity calculation unit 28 and the weight matrix output by the weight calculation unit 26, and outputs a rank score matrix that records the rank scores of the search targets.

The first objective function creation unit 32 receives the second similarity matrix and the weight matrix as input, and creates and outputs an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight of that top result and the rank score of the search target.

In the present embodiment, the objective function is created as follows. Let m denote the number of top results, s_j the weight of the j-th top result, n the number of search targets, r_i the rank score of the i-th search target, and u_ij the similarity between the i-th search target and the j-th top result. The weights s_j form the weight matrix of all top results, the rank scores r_i form the rank score matrix of all search targets, and the similarities u_ij form the second similarity matrix.

Next, an in-degree matrix and an out-degree matrix of the second similarity matrix are defined, and the second similarity matrix is then normalized. A cost function is defined for the rank score r_i of the i-th search target. Finally, a global objective function for optimizing the rank score matrix of all search targets is defined as Equation (7).

[The equations of this passage, ending with Equation (7), are not reproduced in this text.]
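Because the defining expressions are missing from this text, the following LaTeX block gives one plausible formulation that is consistent with the description above. It is an assumption offered for illustration, not the formulation of this publication: the symbols d^out_i, d^in_j, and \bar{u}_{ij}, the square-root degree normalization, and the exact form of Q are all hypothetical.

```latex
\begin{align}
d^{\mathrm{out}}_i &= \sum_{j=1}^{m} u_{ij}, \qquad
d^{\mathrm{in}}_j = \sum_{i=1}^{n} u_{ij}, \\
\bar{u}_{ij} &= \frac{u_{ij}}{\sqrt{d^{\mathrm{out}}_i \, d^{\mathrm{in}}_j}}, \\
Q(r_1, \dots, r_n) &= \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij}
  \left( \frac{r_i}{\sqrt{d^{\mathrm{out}}_i}} - \frac{s_j}{\sqrt{d^{\mathrm{in}}_j}} \right)^{2}, \\
\frac{\partial Q}{\partial r_i} = 0
  \;&\Longrightarrow\;
  r_i = \sum_{j=1}^{m} \bar{u}_{ij} \, s_j .
\end{align}
```

Under this assumed form, a large u_ij penalizes a gap between the normalized rank score of the i-th search target and the normalized weight of the j-th top result more heavily, which matches the stated property of the objective function. The last line follows because the sum of u_ij over j equals d^out_i.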

The first objective function minimization unit 34 receives the objective function of Equation (7) as input, calculates the rank scores of all search targets that minimize the objective function, and outputs a rank score matrix that records these rank scores.

In the present embodiment, for the objective function of Equation (7), the rank score matrix of all search targets at which the derivative of Equation (7) with respect to that matrix becomes zero is obtained. Specifically, it is calculated according to Equation (9), and the result is taken as the optimized rank score matrix.

[Equation (9) is not reproduced in this text.]
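Setting the derivative of a quadratic objective to zero gives a closed-form minimizer. The Python sketch below does this for an assumed objective, namely the sum over i and j of u_ij * (r_i / sqrt(d_out_i) - s_j / sqrt(d_in_j))^2, where d_out_i and d_in_j are the row and column sums of the second similarity matrix. That form, the function names, and the sample data are assumptions of this sketch, since Equations (7) and (9) are not available here.

```python
import math

def degrees(U):
    """Row sums (out-degrees) and column sums (in-degrees) of the matrix U."""
    d_out = [sum(row) for row in U]
    d_in = [sum(row[j] for row in U) for j in range(len(U[0]))]
    return d_out, d_in

def objective(U, s, r):
    """Value of the assumed degree-normalized objective for rank scores r."""
    d_out, d_in = degrees(U)
    return sum(U[i][j] * (r[i] / math.sqrt(d_out[i]) - s[j] / math.sqrt(d_in[j])) ** 2
               for i in range(len(U)) for j in range(len(s)) if U[i][j] > 0.0)

def rerank_scores(U, s):
    """Rank scores at which the derivative of the assumed objective is zero."""
    d_out, d_in = degrees(U)
    return [sum(U[i][j] * s[j] / math.sqrt(d_out[i] * d_in[j])
                for j in range(len(s)) if U[i][j] > 0.0)
            for i in range(len(U))]

# Hypothetical second similarity matrix: 3 search targets, 2 top results.
U = [[0.9, 0.8], [0.7, 0.1], [0.1, 0.0]]
s = [1.0, 1.0]  # all weights set to 1, as in the first embodiment
r = rerank_scores(U, s)
print(r, objective(U, s, r))
```

Because the assumed objective is a convex quadratic in each rank score, the stationary point is its minimum; perturbing `r` in either direction does not decrease `objective`.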

The output unit 50 sorts the search targets in descending order of rank score, assigns ranks to the search targets, and outputs them.

<Operation of the Search Device According to the First Embodiment of the Present Invention>
Next, the operation of the search device 100 according to the first embodiment of the present invention will be described. When the input unit 10 receives a query, the search device 100 executes the search processing routine shown in FIG. 3.

First, in step S100, the database 22 is searched using the input query, a plurality of top results are acquired, and the first similarity matrix is output.

In step S102, the similarity between each of the top results acquired in step S100 and each of the search targets stored in the database 22 is calculated, and the second similarity matrix is output.

In step S104, a weight is set for each of the top results acquired in step S100, and the weight matrix is output.

In step S106, the second similarity matrix output in step S102 and the weight matrix output in step S104 are taken as input, and the objective function of Equation (7) is created.

In step S108, the objective function created in step S106 is taken as input, the rank scores of all search targets that minimize the objective function are calculated according to Equation (9), the search targets are sorted in descending order of rank score, ranks are assigned to the search targets and output, and the search processing routine ends.
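Steps S100 to S108 can be strung together as in the following Python sketch. It rests on several assumptions that are not taken from this publication: search targets are small feature vectors, a non-negative cosine similarity stands in for the similarity measure, and the minimization step uses an assumed degree-normalized objective whose minimizer is r_i = sum over j of u_ij * s_j / sqrt(d_out_i * d_in_j). All names and data are hypothetical.

```python
import math

def similarity(a, b):
    """Cosine similarity clamped to be non-negative (an assumed stand-in measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, dot / norm) if norm > 0.0 else 0.0

def rerank(query, database, m):
    n = len(database)
    # S100: first search; keep the m search targets most similar to the query.
    first = [similarity(query, x) for x in database]
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:m]
    # S102: second similarity matrix U (n rows, m columns).
    U = [[similarity(x, database[j]) for j in top] for x in database]
    # S104: weights of the top results (all 1, as in the first embodiment).
    s = [1.0] * m
    # S106 and S108: minimizer of the assumed degree-normalized objective.
    d_out = [sum(row) for row in U]
    d_in = [sum(row[j] for row in U) for j in range(m)]
    r = [sum(U[i][j] * s[j] / math.sqrt(d_out[i] * d_in[j])
             for j in range(m) if U[i][j] > 0.0)
         for i in range(n)]
    # S108 (continued): sort the search targets in descending order of rank score.
    order = sorted(range(n), key=lambda i: r[i], reverse=True)
    return order, r

# Hypothetical database of four two-dimensional feature vectors.
database = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
order, r = rerank([1.0, 0.0], database, m=2)
print(order)
```

The rank scores here are computed from similarities to the top results rather than from ranks, which is the point the description emphasizes.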

As described above, the search device according to the first embodiment of the present invention calculates the rank score of each search target, based on the similarity between each of the top results and each of all the search targets, so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight of that top result and the rank score of the search target. By assigning new ranks to the search targets ranked by the first search in this way, more accurate search target reranking can be performed.

In addition, because the rank scores of the search targets are calculated from the similarity between each of the top results and each of all the search targets, more accurate search target reranking can be performed even when ranks do not correlate with the similarity between data.

<Configuration of the Search Device According to the Second Embodiment>
Next, the configuration of the search device according to the second embodiment of the present invention will be described. The configuration of the search device according to the second embodiment is the same as that of the first embodiment, so the same reference numerals are used and the description is omitted.

The second embodiment differs from the first embodiment in that the weight matrix of the top results is set using the first similarity matrix output by the search unit 24.

As shown in FIG. 1, the weight calculation unit 26 of the search device 100 according to the second embodiment receives the top results and the first similarity matrix output by the search unit 24, sets the first similarity matrix as the weight matrix that records the weight of each top result, and outputs it.

The other configurations and operations of the search device according to the second embodiment are the same as those of the first embodiment, and their description is omitted.

In this way, by setting the weight of each top result using the similarity between the query and that top result, and calculating the rank scores of the search targets so as to optimize the objective function, more accurate search target reranking can be performed even when ranks do not correlate with the similarity between data.

<Configuration of the Search Device According to the Third Embodiment>
Next, the configuration of the search device according to the third embodiment of the present invention will be described. Components that are the same as in the first embodiment are given the same reference numerals, and their description is omitted.

The third embodiment differs from the second embodiment in that an objective function is created for the weights and the weights of the top results are calculated so as to optimize that objective function.

As shown in FIG. 4, the weight calculation unit 26 of the search device 100 according to the third embodiment includes an inter-top-result similarity calculation unit 60 that calculates the similarities among all top results, a second objective function creation unit 62 that creates an objective function for optimizing the weights of the top results, and a second objective function minimization unit 64 that calculates the weights. The weight calculation unit 26 receives the plurality of top results and the first similarity matrix output by the search unit 24, and outputs a weight matrix that records the weights of the top results.

The inter-top-result similarity calculation unit 60 receives the plurality of top results as input, calculates the similarities among all top results, and outputs a third similarity matrix that records these similarities. The similarity calculation method may be the same as that used by the search unit 24 and the top-result-to-search-target similarity calculation unit 28, or may be a different method.

The second objective function creation unit 62 receives the third similarity matrix and the first similarity matrix as input, and creates and outputs an objective function expressing that the higher the similarity between two top results, the smaller the difference between the weights of those two top results.

In the present embodiment, the objective function is created as follows. Let y_j denote the similarity between the j-th top result and the query, and let v_ij denote the similarity between the i-th top result and the j-th top result. The similarities y_j form the first similarity matrix, and the similarities v_ij form the third similarity matrix.

Next, an in-degree matrix of the third similarity matrix is defined, and the third similarity matrix is then normalized. A cost function is defined for the weight s_i of the i-th top result. Finally, a global objective function for optimizing the weight matrix of all top results is defined as Equation (15), in which μ > 0 is an external parameter and the identity matrix has size m × m.

[The equations of this passage, ending with Equation (15), are not reproduced in this text.]
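Since the defining expressions are missing from this text, the LaTeX block below shows one plausible formulation that fits the description: a graph-regularization objective with a fitting term weighted by μ. It is an assumption for illustration only. The symbols d_i, \bar{V}, and α, the symmetry of the third similarity matrix, and the exact form of Q are hypothetical and are not taken from this publication.

```latex
\begin{align}
d_i &= \sum_{j=1}^{m} v_{ji}, \qquad
\bar{v}_{ij} = \frac{v_{ij}}{\sqrt{d_i \, d_j}}, \qquad
\bar{V} = [\bar{v}_{ij}] \in \mathbb{R}^{m \times m}, \\
Q(\mathbf{s}) &= \mathbf{s}^{\top} (I - \bar{V}) \, \mathbf{s}
  + \mu \, \lVert \mathbf{s} - \mathbf{y} \rVert^{2}, \\
\nabla Q(\mathbf{s}) = 0
  \;&\Longrightarrow\;
  (I - \bar{V}) \, \mathbf{s} + \mu \, (\mathbf{s} - \mathbf{y}) = 0, \\
\mathbf{s}^{*} &= (1 - \alpha) \, (I - \alpha \bar{V})^{-1} \mathbf{y},
  \qquad \alpha = \frac{1}{1 + \mu} .
\end{align}
```

For a symmetric similarity matrix, the first term of Q equals half the sum over i and j of v_ij (s_i / sqrt(d_i) - s_j / sqrt(d_j))^2, so similar top results are pushed toward similar weights, while the second term keeps each weight close to the similarity of that top result to the query.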

The second objective function minimization unit 64 receives the objective function of Equation (15) as input, calculates the weights of all top results that minimize the objective function, and outputs a weight matrix that records these weights.

In the present embodiment, for the objective function of Equation (15), the weight matrix of all top results at which the derivative of Equation (15) with respect to that matrix becomes zero is obtained. Specifically, it is calculated according to Equation (17), which is accompanied by an auxiliary definition, and the result is taken as the optimized weight matrix.

[Equation (17) and its auxiliary definition are not reproduced in this text.]

The optimized weight matrix may be obtained using Equation (17), but the present embodiment uses the following iterative method as a more efficient solution. The weight matrix at the t-th iteration is first initialized and is then repeatedly updated according to an update equation. The weight matrix obtained when the T-th iteration finishes, where T is an external parameter, is output as the optimized weight matrix.

[The initialization and update equations are not reproduced in this text.]
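An iterative scheme of this kind can be sketched in Python as follows. The update rule used here, s(t+1) = α V̄ s(t) + (1 - α) y with s(0) = y and α = 1 / (1 + μ), is an assumption of this sketch: it is the fixed-point iteration for an assumed graph-regularization objective, not necessarily the update equation of this publication. The function names, the zeroed self-similarities, and the sample data are likewise hypothetical.

```python
import math

def normalize(V):
    """Symmetric degree normalization of a symmetric similarity matrix V."""
    m = len(V)
    d = [sum(V[i][j] for i in range(m)) for j in range(m)]  # column sums (in-degrees)
    return [[V[i][j] / math.sqrt(d[i] * d[j]) if V[i][j] > 0.0 else 0.0
             for j in range(m)] for i in range(m)]

def propagate_weights(V, y, mu=1.0, T=50):
    """Run T iterations of the assumed update and return the resulting weights."""
    alpha = 1.0 / (1.0 + mu)
    V_bar = normalize(V)
    m = len(y)
    s = list(y)  # assumed initialization: the first similarity matrix
    for _ in range(T):
        s = [alpha * sum(V_bar[i][j] * s[j] for j in range(m)) + (1.0 - alpha) * y[i]
             for i in range(m)]
    return s

# Hypothetical data: top results 0 and 1 resemble each other, while top result 2
# resembles neither, even though its similarity to the query is high.
V = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
y = [0.8, 0.7, 0.8]
s = propagate_weights(V, y)
print(s)
```

In this example the isolated top result ends up with the lowest weight although its similarity to the query equals that of top result 0, which illustrates how weights informed by inter-top-result similarity can damp a likely false positive. Each iteration costs a matrix-vector product, whereas the closed form requires a matrix inversion.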

＜本発明の第３の実施の形態に係る検索装置の作用＞
次に、本発明の第３の実施の形態に係る検索装置１００の作用について説明する。入力部１０によって、クエリを受け付けると、検索装置１００は、上記図３に示す検索処理ルーチンと同様の処理ルーチンを実行する。 <Operation of Search Device According to Third Embodiment of the Present Invention>
Next, the operation of the search device 100 according to the third embodiment of the present invention will be described. When the query is received by the input unit 10, the search device 100 executes a processing routine similar to the search processing routine shown in FIG.

このとき、上記ステップＳ１０４は、図５に示す処理ルーチンによって実現される。 At this time, step S104 is realized by the processing routine shown in FIG.

ステップＳ３００では、上記ステップＳ１００で取得した複数の上位結果を入力とし、全上位結果間の類似度を計算し、類似度を記録した第３の類似度行列を出力する。 In step S300, the plurality of higher rank results acquired in step S100 are input, the similarity between all the higher rank results is calculated, and a third similarity matrix in which the similarities are recorded is output.

ステップＳ３０２では、上記ステップＳ３００で出力された第３の類似度行列と上記ステップＳ１００で出力された第１の類似度行列を入力とし、上記式（１５）に示す目的関数を作成し、出力する。 In step S302, the third similarity matrix output in step S300 and the first similarity matrix output in step S100 are input, and the objective function shown in equation (15) is created and output. .

そして、ステップＳ３０４では、上記ステップＳ３００で出力された第３の類似度行列を正規化した行列と、上記ステップＳ１００で出力された第１の類似度行列とに基づいて、上記式（１８）に従って、上記式（１５）に示す目的関数を最適化するように上位結果の各々の加重値を更新することを繰り返すことにより、上位結果の各々の加重値を計算し、加重値行列を出力する。 In step S304, based on the matrix obtained by normalizing the third similarity matrix output in step S300 and the first similarity matrix output in step S100, the above equation (18) is used. Then, by repeatedly updating each weight value of the upper result so as to optimize the objective function shown in the above equation (15), each weight value of the upper result is calculated, and a weight matrix is output.

なお、第３の実施の形態に係る検索装置の他の構成及び作用については、第１の実施の形態と同様であるため、説明を省略する。 Note that other configurations and operations of the search device according to the third embodiment are the same as those of the first embodiment, and thus the description thereof is omitted.

In this way, the similarities between the top results are used to assign each top result a lower weight value the lower its similarity to the other top results. As a result, even when the accuracy of the first search is low or the top results include irrelevant search targets, false-positive top results can be given lower weight values. This suppresses the contribution of false-positive top results to the re-ranking of the search targets and enables more accurate re-ranking.

The present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the embodiments above describe the case where the database 22 is provided in the search device 100, but the invention is not limited to this; the database 22 may be provided outside the search device 100.

DESCRIPTION OF SYMBOLS
10 Input unit
20 Calculation unit
22 Database
24 Search unit
26 Weight value calculation unit
28 Top-result-to-search-target similarity calculation unit
30 Re-ranking unit
32 First objective function creation unit
34 First objective function minimization unit
50 Output unit
60 Inter-top-result similarity calculation unit
62 Second objective function creation unit
64 Second objective function minimization unit
100 Search device

Claims

A search device comprising:
a search unit that assigns ranks to search targets by a search using an input query and obtains a plurality of top results from the search targets;
a weight value calculation unit that sets a weight value for each of the top results obtained by the search unit;
a top-result-to-search-target similarity calculation unit that calculates the similarity between each of the top results and each of all the search targets; and
a re-ranking unit that, based on the similarities calculated by the top-result-to-search-target similarity calculation unit, calculates a rank score for each search target so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight value of that top result and the rank score of that search target.

The search device according to claim 1, wherein the search unit obtains the plurality of top results and obtains the similarity between the query and each of the top results, and
the weight value calculation unit sets the similarity between the query and each of the top results as the weight value for that top result.

The search device according to claim 1, wherein the search unit obtains the plurality of top results and obtains the similarity between the query and each of the top results, and
the weight value calculation unit calculates the similarities between the top results and, based on each of those similarities, calculates the weight value of each of the top results so as to optimize an objective function expressing that the higher the similarity between two top results, the smaller the difference between the weight values of those two top results.

The search device according to claim 3, wherein the weight value calculation unit calculates the weight value of each of the top results by repeatedly updating the weight value of each of the top results so as to optimize the objective function, based on each normalized similarity between the top results and on the similarity between the query and each of the top results.

A search method in which:
a search unit assigns ranks to search targets by a search using an input query and obtains a plurality of top results from the search targets;
a weight value calculation unit sets a weight value for each of the top results obtained by the search unit;
a top-result-to-search-target similarity calculation unit calculates the similarity between each of the top results and each of all the search targets; and
a re-ranking unit, based on the similarities calculated by the top-result-to-search-target similarity calculation unit, calculates a rank score for each search target so as to optimize an objective function expressing that, for each search target, the higher its similarity to a top result, the smaller the difference between the weight value of that top result and the rank score of that search target.

The search method according to claim 5, wherein, in the obtaining by the search unit, the plurality of top results are obtained and the similarity between the query and each of the top results is obtained, and
in the setting by the weight value calculation unit, the similarity between the query and each of the top results is set as the weight value for that top result.

The search method according to claim 5, wherein, in the obtaining by the search unit, the plurality of top results are obtained and the similarity between the query and each of the top results is obtained, and
in the setting by the weight value calculation unit, the similarities between the top results are calculated and, based on each of those similarities, the weight value of each of the top results is calculated so as to optimize an objective function expressing that the higher the similarity between two top results, the smaller the difference between the weight values of those two top results.

A program for causing a computer to function as each unit constituting the search device according to any one of claims 1 to 4.