JP5508232B2

JP5508232B2 - Query recommendation device, query recommendation method, query recommendation program

Info

Publication number: JP5508232B2
Application number: JP2010256059A
Authority: JP
Inventors: 伸二宮原; 典史片渕; 良治片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-11-16
Filing date: 2010-11-16
Publication date: 2014-05-28
Anticipated expiration: 2030-11-16
Also published as: JP2012108661A

Description

The present invention relates to a technique for providing recommended queries to a user based on the search log of a search engine.

In recent years, the amount of information on the Internet has grown rapidly, and search engines that let users find the information they need have become increasingly important. Because a search engine accumulates log files recording users' search activity, attempts have been made to analyze these log files and use them to improve search services.

Table 1 shows a typical log file. It records information for the fields "date", "user ID", "query name", "click URL", and "ranking position". Specifically, the "date" field records the date on which the user clicked a URL in the search results; the "user ID" field records the ID of the user who ran the search; the "query name" field records the query the user searched with; the "click URL" field records the URL the user clicked in the search results; and the "ranking position" field records the search-ranking position of the clicked URL.

Search engines use such log files to make users' searching more efficient, for example by presenting recommended queries. As shown in FIG. 1, recommended queries are determined using a bipartite graph that pairs the queries users submitted to the search engine with the URLs they clicked in the search results. In FIG. 1, q₁ to q₃ on the left are the names of user-submitted queries, u₁ to u₅ on the right are the URLs users clicked in the search results, and the two sides are connected by edges. This bipartite graph is analyzed, and strongly connected queries are used as recommended queries.

However, presenting recommended queries based on the bipartite graph is effective only for frequently used queries, which have many entries in the log file. Infrequently used queries have few log entries and therefore few related queries, so the queries that can be recommended for them are scarce.

Non-Patent Document 1 therefore proposes a query recommendation technique for infrequently used queries. It recommends queries using a click graph (bipartite graph) that links user-submitted queries to the URLs actually clicked, together with a skip graph that links user-submitted queries to URLs that were not actually clicked. FIG. 2 shows an example of the skip graph obtained from the click graph of FIG. 1. In the skip graph, queries that are unconnected in the click graph become connected (for example, q₁, q₂, and q₃ are connected via u₂), which increases the number of queries that can be recommended to the user.

In that technique, the feature value obtained from the click graph is treated as the explicit relevance between queries connected by a clicked URL, and the feature value obtained from the skip graph is treated as the implicit relevance between queries connected via a URL that the user did not click. The queries to recommend to the user are extracted using the sum of the feature value between queries obtained from the click graph and the feature value between queries obtained from the skip graph.

Yang Song, Li-wei He, "Optimal Rare Query Suggestion With Implicit User Feedback", ACM WWW 2010, April 26-30, 2010

However, the method of Non-Patent Document 1 creates skip-graph edges to the unclicked URLs in the search results of the target query and assigns weights to them, so a URL receives a larger weight as the number of users who do not click it grows. If a URL the user did not click is assumed to be unrelated to the query, many unrelated queries end up being recommended, and the user's search efficiency may deteriorate.

In addition, when a skip graph is built from a log file, the log contains many URLs that have no click relationship with a given query, so many query pairs are judged to be related. Here too, many unrelated queries may be recommended.

The present invention has been made to solve the above problems of the prior art. Its object is to provide highly accurate recommended queries to users and thereby contribute to improving their search efficiency.

The present invention therefore sequentially extracts from the search log the log entries of the queries linked by clicked URLs, and calculates the importance of each URL in the extracted entries, that is, each URL lying between queries. By calculating the feature value between queries in the skip graph based on this URL importance, the invention reduces the influence of click conditions in the search results (for example, a large number of unclicked URLs) when calculating the relevance between queries, and extracts more accurate recommended queries. The distance between queries can also be used when calculating the feature value between queries in the skip graph, which makes it possible to reduce the feature value of queries with low relevance.

Specifically, one aspect of the query recommendation device according to the present invention comprises: query-related log extraction means that extracts, from search query storage means, the search logs of the queries linked by clicked URLs and calculates the importance of each URL in the extracted search logs based on its click probability from each query; graph creation means that creates, based on the search logs extracted by the extraction means, a click graph in which each query is connected by edges to the URLs that were clicked and a skip graph in which each query is connected by edges to the URLs that were not clicked; click graph feature extraction means that calculates the feature value between queries in the click graph created by the graph creation means according to the probability of arrival via a URL; skip graph feature extraction means that calculates the feature value between queries in the skip graph created by the graph creation means using the URL importance calculated by the extraction means; and feature synthesis means that combines the feature values calculated by the respective feature extraction means to calculate the final feature value between queries, wherein recommended queries are presented to the user according to the feature values between queries combined by the feature synthesis means.

Another aspect of the query recommendation device according to the present invention comprises: query-related log extraction means that, when extracting from search query storage means the search logs of the queries linked by clicked URLs, sets position information for the queries successively linked by those URLs according to the number of URLs between queries, and calculates the importance of each URL based on its click probability from each query; graph creation means that creates, based on the search logs extracted by the extraction means, a click graph in which each query is connected by edges to the URLs that were clicked and a skip graph in which each query is connected by edges to the URLs that were not clicked; click graph feature extraction means that calculates the feature value between queries in the click graph created by the graph creation means according to the probability of arrival via a URL; skip graph feature extraction means that calculates the feature value between queries in the skip graph created by the graph creation means using the distance between queries, based on the position information set by the extraction means, and the URL importance calculated by the extraction means; and feature synthesis means that combines the feature values calculated by the respective feature extraction means to calculate the final feature value between queries, wherein recommended queries are presented to the user according to the feature values between queries combined by the feature synthesis means.

One aspect of the query recommendation method according to the present invention comprises: a query-related log extraction step of extracting, from search query storage means, the search logs of the queries linked by clicked URLs and calculating the importance of each URL in the extracted search logs based on the click probability of each query; a graph creation step of creating, based on the search logs extracted in the extraction step, a click graph in which each query is connected by edges to the URLs that were clicked and a skip graph in which each query is connected by edges to the URLs that were not clicked; a click graph feature extraction step of calculating the feature value between queries in the click graph created in the graph creation step according to the probability of arrival via a URL; a skip graph feature extraction step of calculating the feature value between queries in the skip graph created in the graph creation step using the URL importance calculated in the extraction step; and a feature synthesis step of combining the feature values calculated in the respective feature extraction steps to calculate the final feature value between queries, wherein recommended queries are presented to the user according to the feature values between queries combined in the feature synthesis step.

Another aspect of the query recommendation method according to the present invention comprises: a query-related log extraction step of, when extracting from search query storage means the search logs of the queries linked by clicked URLs, setting position information for the queries successively linked by those URLs according to the number of URLs between queries, and calculating the importance of each URL based on its click probability from each query; a graph creation step of creating, based on the search logs extracted in the extraction step, a click graph in which each query is connected by edges to the URLs that were clicked and a skip graph in which each query is connected by edges to the URLs that were not clicked; a click graph feature extraction step of calculating the feature value between queries in the click graph created in the graph creation step according to the probability of arrival via a URL; a skip graph feature extraction step of calculating the feature value between queries in the skip graph created in the graph creation step using the distance between queries, based on the position information set in the extraction step, and the URL importance calculated in the extraction step; and a feature synthesis step of combining the feature values calculated in the respective feature extraction steps to calculate the final feature value between queries, wherein recommended queries are presented to the user according to the feature values between queries combined in the feature synthesis step.

The present invention may also take the form of a program that causes a computer to function as the above device. This program can be provided via a network, a recording medium, or the like.

According to the present invention, highly accurate recommended queries can be provided to users, contributing to improved search efficiency.

FIG. 1: Example of a click graph (bipartite graph).
FIG. 2: Example of a skip graph.
FIG. 3: Configuration example of the query recommendation device according to an embodiment of the present invention.
FIG. 4: Flowchart of the query recommendation method of the embodiment.
FIG. 5: Flowchart showing the query position calculation in S01 of FIG. 4.
FIG. 6: Flowchart showing the URL importance calculation.
FIG. 7: Explanatory diagram showing an example of calculating the feature value between queries from the click graph.

≪Device configuration example≫
A configuration example of the query recommendation device according to an embodiment of the present invention is described with reference to FIG. 3. This recommendation device 1 creates a click graph and a skip graph based on the search log of a search engine, and presents recommended queries to the user using the relevance between queries obtained from both graphs, that is, the feature value between queries.

Specifically, the recommendation device 1 works in cooperation with a search engine that provides a search service, and has ordinary computer hardware resources such as a CPU, memory (RAM), a hard disk drive, and a communication device. Through the cooperation of these hardware resources with software resources (OS, applications, and so on), the recommendation device 1 implements a search query storage unit 2, a query-related log extraction unit 3, a click graph creation unit 4, a click graph feature extraction unit 5, a skip graph creation unit 6, a skip graph feature extraction unit 7, and a feature synthesis unit 8.

The storage unit 2 stores the search log (log file), which contains the search queries submitted to the search engine, their search results, and related data. The search log is stored in a storage device such as memory (RAM) or a hard disk drive, kept in step with the search engine's own search log. Table 2 shows an example of the data stored in the storage unit 2. As in Table 1, it records information for the fields "date", "user ID", "query name", "click URL", and "ranking position".

The extraction unit 3 refers to the storage unit 2 and sequentially extracts the search logs of the queries connected by clicked URLs, that is, the queries linked by those URLs. In doing so, it sets the position information of the queries successively linked via clicked URLs according to the number of URLs lying between the queries. It also calculates the importance of each URL in the extracted search logs, that is, each URL lying between the queries. Here, the click probability of a URL is calculated from the usage frequency of the query (the number of clicks on the URL relative to the number of times the query was submitted), and the entropy of the URL is calculated from this click probability. The entropy is normalized by the maximum of the URL entropies, and the normalized entropy is used as the importance of the URL.

The graph creation unit 4 creates, based on the search logs extracted by the extraction unit 3, a click graph in which each query is connected by edges to the URLs that were clicked. The feature extraction unit 5 calculates the feature value between queries in the click graph created by the graph creation unit 4, that is, the explicit relevance between queries, based on the probability of starting from one query and reaching the other query via a URL. For this it uses the ratio of the number of clicks on the URL to the number of times the query was submitted.

The graph creation unit 6 creates, based on the search logs extracted by the extraction unit 3, a skip graph in which each query is connected by edges to the URLs that were not clicked. The feature extraction unit 7 calculates the feature value between queries in the skip graph, that is, the implicit relevance between queries, using the URL importance calculated by the extraction unit 3 and the distance based on the position information of each query.

The synthesis unit 8 combines the feature values calculated by the feature extraction units 5 and 7 to calculate the final feature value between queries. The recommended queries to present to the user are extracted according to these combined feature values. The extracted recommended queries are displayed on the user's terminal (PC, mobile phone, or the like) via the Internet.

≪Processing steps≫
The processing steps of the recommendation device 1 are described below with reference to FIG. 4, taking query q₁ as the query to be processed. Processing starts when query q₁ is input to an input unit (not shown).

S01: When query q₁ is input to the input unit (not shown), the extraction unit 3 accesses the storage unit 2 and extracts the subset of the search log (click log) covering the queries that are linked, starting from query q₁, by successively clicked URLs. The extraction unit 3 also sets the position information of each query within this search log and calculates the importance of each URL. The subset, the position information of each query, and the importance of each URL are output to the graph creation units 4 and 6.

An example of setting the position information of each query is described with reference to FIG. 5. Starting from query q₁, position information is set for the queries that have a URL click relationship with it (so-called related queries). The position of a related query is set according to its minimum distance from q₁, that is, the minimum number of URLs traversed from q₁. At the start of processing, the maximum number of iterations of S12 and S13, N_max, is set. In this example N_max = 2, and processing starts from the initial value N = 0 (S11). The value of N_max can be adjusted as appropriate.

Next, N is incremented (N = N + 1 = 1), the value 1 of N is set as the position of q₁ in the search log (S12), and the first search log extraction, targeting query q₁, is performed. That is, the URLs are extracted from the log entries that have q₁ as their query, and the log entries containing queries linked to those URLs, that is, queries having a click relationship with those URLs, are extracted as well (S13).

After this extraction, it is checked whether N = N_max holds (S14). If it holds, processing proceeds to S15; otherwise it returns to S12. At this stage N = 1, so N = N_max does not hold. Processing therefore returns to S12, N is incremented (N = N + 1 = 2), and the position of each query other than q₁ contained in the extracted log entries (related queries 1) is set to the value 2 of N (S12). The second search log extraction is then performed, targeting these queries other than q₁.

That is, the log entries containing the queries whose position was set to 2 are extracted, together with the queries linked to the URLs in those entries (S13). At this stage N = 2, so the condition in S14 holds and processing proceeds to S15. In S15, N is incremented (N = N + 1 = 3), the position of the queries extracted last in S13 (related queries 2, excluding related queries 1) is set to the value 3 of N, and processing ends. In this way, the position of each query successively linked from q₁ via URLs is set according to the minimum number of URLs traversed. In this example, the position information is set as "N + 1" (number of URLs + 1).
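The position-setting loop of S11 to S15 behaves like a breadth-first traversal that alternates between queries and clicked URLs. The following is a minimal Python sketch under that reading, not the patented implementation: the function name `assign_positions`, the `(query, url)` tuple format of the log, and the sample edges (taken from the click relationships described for FIG. 1) are illustrative assumptions.

```python
from collections import defaultdict


def assign_positions(click_log, start_query, n_max=2):
    """Assign position 1 to the start query and +1 per URL hop (sketch of S11-S15).

    click_log is an iterable of (query, clicked_url) pairs; this record
    format is an assumption made for the example.
    """
    urls_by_query = defaultdict(set)
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        urls_by_query[query].add(url)
        queries_by_url[url].add(query)

    positions = {start_query: 1}      # S12 with N = 1
    frontier = {start_query}
    for n in range(1, n_max + 1):     # S13 is executed N_max times
        reached = set()
        for query in frontier:
            for url in urls_by_query[query]:
                reached |= queries_by_url[url]
        # Keep only queries that have no position yet (minimum distance wins).
        frontier = reached.difference(positions)
        for query in frontier:
            positions[query] = n + 1
    return positions


# Click relationships described for FIG. 1: q1 -> u1, u4; q2 -> u3, u5; q3 -> u1, u5.
click_log = [("q1", "u1"), ("q1", "u4"), ("q2", "u3"),
             ("q2", "u5"), ("q3", "u1"), ("q3", "u5")]
positions = assign_positions(click_log, "q1", n_max=2)
```

With these sample edges, q₃ shares u₁ with q₁ and is one URL away, while q₂ is reachable only through q₃ via u₅ and is two URLs away.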

The method of calculating URL importance is described with reference to FIG. 6. First, at the start of processing, each URL in the subset and the queries that have a click relationship with it are extracted (S21). As an example, the URL of formula (1) is taken as the target, and the queries q_i (i = 1, 2, ..., n) that have a click relationship with the URL of formula (1) are extracted.

Next, the click probability of the URL of formula (1) is calculated from the usage frequency of each query q_i extracted in S21 (S22). That is, the click probability P_ik for the URL of formula (1) is calculated using formula (2), based on the number of times query q_i was submitted within the subset.

The click probability P_ik is calculated using formula (2) for every query q_i (i = 1, 2, ..., n) that has a click relationship with the URL of formula (1). Then, based on the click probabilities P_ik calculated in S22, the entropy E_k of the URL of formula (1) is calculated using formula (3) (S23).

Finally, S21 to S23 are executed for every URL contained in the subset, and each entropy is normalized by formula (4) using the maximum of the URL entropies E_k (S24). The normalized entropy E_k is used as the importance of each URL.
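Formulas (2) to (4) are not reproduced in this text, so the following Python sketch fills them in with explicitly assumed forms: P_ik is taken as the number of clicks on URL k from query q_i divided by the number of submissions of q_i (as the description of the extraction unit 3 suggests), E_k is taken as the Shannon-style sum of −P_ik log P_ik over the queries, and the normalization divides by the largest E_k. The function name and the input dictionaries are hypothetical.

```python
import math


def url_importance(clicks, submissions):
    """Normalized entropy per URL (sketch of S21-S24 under assumed formulas).

    clicks[(query, url)]  -> number of clicks on url after issuing query
    submissions[query]    -> number of times query was submitted
    """
    entropy = {}
    for (query, url), count in clicks.items():
        p_ik = count / submissions[query]            # assumed form of formula (2)
        entropy.setdefault(url, 0.0)
        if p_ik > 0:
            entropy[url] -= p_ik * math.log(p_ik)    # assumed form of formula (3)

    max_entropy = max(entropy.values(), default=0.0)
    if max_entropy == 0.0:
        return {url: 0.0 for url in entropy}
    # Assumed form of formula (4): divide by the largest entropy.
    return {url: e / max_entropy for url, e in entropy.items()}


# Illustrative counts: every query issued twice, each clicked URL clicked once.
clicks = {("q1", "u1"): 1, ("q1", "u4"): 1, ("q2", "u3"): 1,
          ("q2", "u5"): 1, ("q3", "u1"): 1, ("q3", "u5"): 1}
submissions = {"q1": 2, "q2": 2, "q3": 2}
importance = url_importance(clicks, submissions)
```

Under these assumptions, a URL clicked from several queries (u₁, u₅) receives a higher importance than one clicked from a single query (u₃, u₄).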

S02: The graph creation unit 4 creates a click graph based on the subset extracted in S01, which is part of the output information from the extraction unit 3. The click graph is created as follows. If the subset is the search log of Table 2, the URLs u₁ and u₄ have been clicked from query q₁, so edges are connected from q₁ to u₁ and to u₄, as shown in the click graph of FIG. 1. The click graph is created in the same way for q₂ and q₃ based on the search log of Table 2: edges are connected from q₂ to u₃ and u₅, and from q₃ to u₁ and u₅. The click graph created in this way is output to the feature extraction unit 5 together with the output information from the extraction unit 3 (including the click probabilities P_ik).

S03: The feature extraction unit 5 calculates the edge weights in the click graph created in S02, that is, the feature values (relevance) between queries. The feature values from the click graph are calculated using formulas (5) to (7).

Formulas (5) to (7) are used to calculate the feature values between all pairs of queries in the click graph. Here, "click_value_ij" is the feature value representing the relevance between query q_i and query q_j. "k" denotes the URL of formula (1), and "P_ik × P_kj" is the probability of starting from query q_i, passing through the URL of formula (1), and reaching query q_j from that URL. FIG. 7 shows a concrete calculation for the feature value "click_value₁₂", which represents the relevance between query q₁ and query q₂ in the click graph. The calculated feature values between queries are output to the synthesis unit 8.
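Because formulas (5) to (7) are not reproduced in this text, the sketch below only illustrates the stated idea that click_value_ij is built from the path probability P_ik × P_kj. It assumes that P_ik is clicks(q_i, u_k) divided by the submissions of q_i, that P_kj is clicks(q_j, u_k) divided by all clicks on u_k, and that the path probabilities are summed over the URLs the two queries share. These forms, the function name, and the sample counts are assumptions, not the patent's formulas.

```python
from collections import defaultdict


def click_values(clicks, submissions):
    """Feature value between query pairs via shared clicked URLs (assumed formulas)."""
    url_totals = defaultdict(int)
    for (_query, url), count in clicks.items():
        url_totals[url] += count

    # P_ik: probability of going from query i to URL k (assumed definition).
    p_query_to_url = {(q, u): c / submissions[q] for (q, u), c in clicks.items()}
    # P_kj: probability of going from URL k to query j (assumed definition).
    p_url_to_query = {(u, q): c / url_totals[u] for (q, u), c in clicks.items()}

    values = defaultdict(float)
    for (q_i, u_k), p_ik in p_query_to_url.items():
        for (u, q_j), p_kj in p_url_to_query.items():
            if u == u_k and q_j != q_i:
                values[(q_i, q_j)] += p_ik * p_kj   # sum over shared URLs k
    return dict(values)


# Illustrative counts: every query issued twice, each clicked URL clicked once.
clicks = {("q1", "u1"): 1, ("q1", "u4"): 1, ("q2", "u3"): 1,
          ("q2", "u5"): 1, ("q3", "u1"): 1, ("q3", "u5"): 1}
submissions = {"q1": 2, "q2": 2, "q3": 2}
values = click_values(clicks, submissions)
```

In this sample, q₁ and q₂ share no clicked URL, so no click-graph feature value is produced for that pair; this is the gap the skip graph is meant to fill.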

S04: The graph creation unit 6 creates a skip graph based on the subset extracted in S01, which is part of the output information from the extraction unit 3. The skip graph is created as follows. As in S02, the subset is assumed to be the search log of Table 2. Looking at q₁, the URLs u₁ and u₄ have been clicked, but u₂, u₃, and u₅ have not. Edges are therefore connected from q₁ to the unclicked URLs u₂, u₃, and u₅, as shown in FIG. 2.

Likewise for q₂ and q₃, edges are connected from q₂ to the URLs u₁, u₂, and u₄ that were not clicked from it, and from q₃ to the URLs u₂, u₃, and u₄ that were not clicked from it. The skip graph created in this way is output to the feature extraction unit 7 together with the output information from the extraction unit 3.
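The click-graph and skip-graph construction of S02 and S04 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `build_graphs`, the `(query, url)` log format, and the simplifying assumption that every query was shown the same candidate URLs u₁ to u₅ are not taken from the source.

```python
from collections import defaultdict


def build_graphs(click_log, shown_urls):
    """Return (click_graph, skip_graph), each mapping a query to a set of URLs.

    shown_urls is the set of URLs presented in the results; treating it as
    identical for every query is a simplification made for this example.
    """
    click_graph = defaultdict(set)
    for query, url in click_log:
        click_graph[query].add(url)

    # Skip edges connect a query to every shown URL that was not clicked from it.
    skip_graph = {query: set(shown_urls) - clicked
                  for query, clicked in click_graph.items()}
    return dict(click_graph), skip_graph


# Click relationships described for Table 2 / FIG. 1.
click_log = [("q1", "u1"), ("q1", "u4"), ("q2", "u3"),
             ("q2", "u5"), ("q3", "u1"), ("q3", "u5")]
shown_urls = ["u1", "u2", "u3", "u4", "u5"]
click_graph, skip_graph = build_graphs(click_log, shown_urls)
```

With this input, u₂ appears in the skip set of all three queries, which matches the description of FIG. 2 in which q₁, q₂, and q₃ become connected via u₂.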

S05: The feature extraction unit 7 calculates the edge weights in the skip graph created in S04, that is, the feature values between queries (the relevance between queries). In doing so, it calculates the distance between queries (position of q_i − position of q_j) based on the position information of each query contained in the output information from the graph creation unit 6.

Using this inter-query distance and the URL importance E_k contained in the output information, the feature quantity between queries in the skip graph is calculated. Expression (8) is used for this calculation, and the calculated inter-query feature quantities are output to the synthesis unit 8.
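Expression (8) itself is not reproduced in this text, so the following is only a sketch of the two ingredients the description names: the URL importance E_k and a distance factor of the form 1 / (distance + 1), which the later discussion of variations says can be removed from Expression (8). Taking the absolute value of the position difference, the function names, and the way a single URL's term is formed are all assumptions; how the terms are aggregated over URLs is not shown in the source and is not modelled here.

```python
def distance_factor(pos_i, pos_j):
    """1 / (distance + 1). Using the absolute difference is an assumption."""
    return 1.0 / (abs(pos_i - pos_j) + 1)

def skip_term(importance, pos_i, pos_j, use_distance=True):
    """Contribution of one URL with importance E_k (hypothetical form).

    With use_distance=False only E_k is used, mirroring the variant in
    which the distance factor is dropped.
    """
    factor = distance_factor(pos_i, pos_j) if use_distance else 1.0
    return importance * factor
```

In this sketch, two queries at the same position keep the full importance of the URL, and the contribution decays as the queries move further apart.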

S06: The synthesis unit 8 combines the click graph feature quantities and the skip graph feature quantities output from the feature quantity extraction units 5 and 7 in S03 and S05, and calculates the final feature quantity between queries. Expression (9) is used for this calculation.

In Expression (9), α is a real number from 0 to 1.0 and can be set in advance to any value. The following example sets α to 0.7 and considers the queries "love fortune-telling" (恋愛占い), "psychological test" (心理テスト), "tarot fortune-telling" (タロット占い), "name fortune-telling" (姓名判断), and "birthday fortune-telling" (誕生日占い). Assume that the click graph yields the inter-query feature quantities shown in Table 3 and that the skip graph yields those shown in Table 4.

In this case, the inter-query feature quantities for "love fortune-telling" are calculated as follows:
"love fortune-telling" ⇒ "psychological test" = 0.7 × 0.0 + 0.3 × 0.4 = 0.12
"love fortune-telling" ⇒ "tarot fortune-telling" = 0.7 × 0.5 + 0.3 × 0.2 = 0.41
"love fortune-telling" ⇒ "name fortune-telling" = 0.7 × 0.0 + 0.3 × 0.3 = 0.09
"love fortune-telling" ⇒ "birthday fortune-telling" = 0.7 × 0.1 + 0.3 × 0.0 = 0.07

If a calculated feature quantity exceeds the threshold, the query is extracted as a recommended query; otherwise it is not. The threshold can be adjusted as appropriate to the specification. For example, with the threshold set to 0.1, "psychological test" (0.12) and "tarot fortune-telling" (0.41) are extracted as recommended queries. The extracted recommended queries are sent over the Internet, displayed on the user's terminal, and thereby presented to the user.
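The combination and threshold steps can be reproduced with a short sketch. Expression (9) is not shown in this text; the weighted sum α × click + (1 − α) × skip used below is inferred from the worked arithmetic, and the input values are the ones that appear in that arithmetic. The function names `combine` and `recommend` are hypothetical.

```python
ALPHA = 0.7      # weight of the click graph feature quantity
THRESHOLD = 0.1  # queries scoring above this are recommended

# Feature quantities from "love fortune-telling" to each candidate query,
# as they appear in the worked example.
click = {
    "psychological test": 0.0,
    "tarot fortune-telling": 0.5,
    "name fortune-telling": 0.0,
    "birthday fortune-telling": 0.1,
}
skip = {
    "psychological test": 0.4,
    "tarot fortune-telling": 0.2,
    "name fortune-telling": 0.3,
    "birthday fortune-telling": 0.0,
}

def combine(click, skip, alpha):
    """Weighted sum of click graph and skip graph feature quantities."""
    return {q: alpha * click[q] + (1 - alpha) * skip[q] for q in click}

def recommend(scores, threshold):
    """Keep only queries whose combined score exceeds the threshold."""
    return [q for q, s in scores.items() if s > threshold]

scores = combine(click, skip, ALPHA)
recommended = recommend(scores, THRESHOLD)
```

Raising α gives more weight to direct click evidence, while lowering it lets the skip graph dominate; the threshold then controls how many candidates survive.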

With the recommendation device 1 described above, the inter-query feature quantity in the skip graph is calculated using the importance E_k of the URLs linking the queries, as shown in Expression (8), which reduces the influence of the click situation in the search log (for example, a large number of unclicked URLs). Consequently, unlike in Non-Patent Document 1, the feature quantity does not grow as the number of unclicked URLs in the search log increases. Highly accurate recommended queries can thus be presented to the user, which helps improve the user's search efficiency.

Moreover, Non-Patent Document 1 creates edges to every URL that has no click relationship with a query when building the skip graph, so edges are also created between weakly related queries and URLs. Expression (8) of the recommendation device 1, by contrast, takes into account the distance between queries linked by URLs (position of q_i − position of q_j), so the relatedness of such queries can be kept low. In this respect too, accurate recommended queries can be presented to the user.

The present invention is not limited to the above embodiment. For example, the inter-query feature quantity in the skip graph may be calculated using only the URL importance E_k. In that case, the position information need not be set in S12 and S15, and the factor "1 / (position of q_i − position of q_j + 1)" is simply removed from Expression (8). Although the distance between queries linked by URLs is then not taken into account, the effect of reducing the influence of the click situation in the search log (for example, a large number of unclicked URLs) is still obtained.

≪Programs, etc.≫
The present invention can also be configured as a document search program that causes a computer to function as some or all of the units 2 to 8 of the recommendation device 1. Such a program makes it possible for a computer to execute some or all of S01 to S06, S11 to S15, and S21 to S24.

The program can be provided over a network, for example via a website or e-mail. The program can also be recorded on a recording medium such as a CD-ROM, DVD-ROM, CD-R, CD-RW, DVD-R, DVD-RW, MO, HDD, BD-ROM, BD-R, or BD-RE for storage and distribution. The recording medium is read using a recording medium drive, and the program code itself realizes the processing of the above embodiment, so the recording medium also constitutes the present invention.

DESCRIPTION OF SYMBOLS
1 ... Query recommendation device
2 ... Search query storage unit (search query storage means)
3 ... Query-related log extraction unit (query-related log extraction means)
4 ... Click graph creation unit (graph creation means)
5 ... Click graph feature quantity extraction unit (click graph feature quantity extraction means)
6 ... Skip graph creation unit (graph creation means)
7 ... Skip graph feature quantity extraction unit (skip graph feature quantity extraction means)
8 ... Feature quantity synthesis unit (feature quantity synthesis means)

Claims

A query recommendation device for presenting a recommendation query to a user based on a search log stored in a search query storage means,
Query-related log extracting means for extracting a search log of each query linked with the clicked URL from the search query storage means, and calculating the importance of the URL in the extracted search log based on the click probability from each query;
Graph creation means for creating, based on the search log extracted by the extraction means, a click graph in which each query and each clicked URL are connected by an edge, and a skip graph in which each query and each unclicked URL are connected by an edge;
Click graph feature quantity extraction means for calculating the feature quantity between queries in the click graph created by the graph creation means according to the arrival probability via URL;
A skip graph feature amount extraction unit that calculates a feature amount between queries in the skip graph created by the graph creation unit based on the importance of the URL calculated by the extraction unit;
A feature amount combining unit that calculates the feature amount between the final queries by combining the feature amounts calculated by the feature amount extraction units;
A query recommendation device that presents a recommended query to a user according to a result of comparing feature amounts between queries synthesized by a feature amount synthesis unit with a preset threshold value.

A query recommendation device for presenting a recommendation query to a user based on a search log stored in a search query storage means,
Query-related log extraction means for, when extracting from the search query storage means the search log of each query linked by clicked URLs, setting position information of the queries sequentially linked by the URLs according to the number of URLs between the queries, and calculating the importance of each URL based on the click probability from each query;
Graph creation means for creating, based on the search log extracted by the extraction means, a click graph in which each query and each clicked URL are connected by an edge, and a skip graph in which each query and each unclicked URL are connected by an edge;
Click graph feature quantity extraction means for calculating the feature quantity between queries in the click graph created by the graph creation means according to the arrival probability via URL;
Skip graph feature amount extraction means for calculating the feature amount between queries in the skip graph created by the graph creation means, using the distance between queries based on the position information set by the extraction means and the importance of the URL calculated by the extraction means;
A feature amount combining unit that calculates the feature amount between the final queries by combining the feature amounts calculated by the feature amount extraction units;
A query recommendation device that presents a recommended query to a user according to a result of comparing feature amounts between queries synthesized by a feature amount synthesis unit with a preset threshold value.

The extraction means calculates a click probability of each URL based on the number of clicks of the URL with respect to the number of query inputs,
Calculate the entropy of each URL based on the click probability, normalize the entropy of each URL with the maximum value of each entropy,
The query recommendation device according to claim 1, wherein the normalized entropy value is set as the importance of each URL.

The click graph feature quantity extraction means calculates the click probability of each URL based on the number of clicks of the URL with respect to the number of queries input,
The query recommendation device according to any one of claims 1 to 3, wherein the arrival probability of reaching the other query via a URL from one query is calculated based on the click probability.

A query recommendation method executed by an apparatus for presenting a recommendation query to a user based on a search log stored in a search query storage unit,
A query-related log extracting step of extracting a search log of each query linked with the clicked URL from the search query storage unit, and calculating the importance of the URL in the extracted search log based on a click probability of each query;
A graph creation step of creating, based on the search log extracted in the extraction step, a click graph in which each query and each clicked URL are connected by an edge, and a skip graph in which each query and each unclicked URL are connected by an edge;
A click graph feature amount extraction step for calculating a feature amount between queries in the click graph created in the graph creation step according to the arrival probability via URL;
A skip graph feature amount extraction step of calculating a feature amount between queries in the skip graph created in the graph creation step based on the importance of the URL calculated in the extraction step;
A feature amount combining step of combining the feature amounts calculated in each of the feature amount extraction steps to calculate a feature amount between the final queries,
A query recommendation method characterized by presenting a recommended query to a user according to a result of comparing feature amounts between queries combined in a feature amount combining step with a preset threshold value.

A query recommendation method executed by an apparatus for presenting a recommendation query to a user based on a search log stored in a search query storage unit,
A query-related log extraction step of, when extracting from the search query storage unit the search log of each query linked by clicked URLs, setting position information of the queries sequentially linked by the URLs according to the number of URLs between the queries, and calculating the importance of each URL based on the click probability from each query;
A graph creation step of creating, based on the search log extracted in the extraction step, a click graph in which each query and each clicked URL are connected by an edge, and a skip graph in which each query and each unclicked URL are connected by an edge;
A click graph feature amount extraction step for calculating a feature amount between queries in the click graph created in the graph creation step according to the arrival probability via URL;
A skip graph feature amount extraction step of calculating the feature amount between queries in the skip graph created in the graph creation step, using the distance between queries based on the position information set in the extraction step and the importance of the URL calculated in the extraction step;
A feature amount combining step of combining the feature amounts calculated in each of the feature amount extraction steps to calculate a feature amount between the final queries,
A query recommendation method characterized by presenting a recommended query to a user according to a result of comparing feature amounts between queries combined in a feature amount combining step with a preset threshold value.

In the extraction step, a click probability of each URL is calculated based on the number of clicks of the URL with respect to the number of query inputs,
Calculate the entropy of each URL based on the click probability, normalize the entropy of each URL with the maximum value of each entropy,
The query recommendation method according to claim 5, wherein the normalized entropy value is set as the importance of each URL.

In the click graph feature amount extraction step, the click probability of each URL is calculated based on the number of clicks of the URL with respect to the number of queries input,
The query recommendation method according to any one of claims 5 to 7, wherein the arrival probability of reaching the other query via a URL from one query is calculated based on the click probability.

A query recommendation program for causing a computer to function as the query recommendation device according to claim 1.