JP5670867B2

JP5670867B2 - Query location estimation method, apparatus, and program

Info

Publication number: JP5670867B2
Application number: JP2011254229A
Authority: JP
Inventors: 宮原　伸二; 伸二宮原; 義昌小池; 良治片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-11-21
Filing date: 2011-11-21
Publication date: 2015-02-18
Anticipated expiration: 2031-11-21
Also published as: JP2013109583A

Description

The present invention relates to a query location estimation method, apparatus, and program that use the click log of queries in a search service. In particular, it relates to a query location estimation method, apparatus, and program that, when location information has been assigned to only some queries and some URLs, estimate the locations of the queries and URLs that lack location information, calculate a reliability for the query location estimate, and apply the result to graph analysis based on the click distribution of queries.

In recent years, attempts have been made to analyze log files that record users' search activity in a search service and to use them to improve the service. A search log generally contains the items shown in FIG. 1.

1. Date
2. User ID
3. Query name
4. Click URL
5. Ranking position

The date item in FIG. 1 is the date on which the user clicked the URL. The user ID is the ID of the user who searched with the query. The query name is the query the user used for the search. The click URL is the URL of the search result that the user clicked. The ranking position is the search ranking position of the URL that the user clicked.

These logs are used to make users' search behavior more efficient. One such use is query recommendation. Query recommendation uses a bipartite graph, as shown in FIG. 2, in which each query a user entered is paired with the URLs clicked for it. The left side of FIG. 2 shows the queries entered by users, and the right side shows the URLs that users clicked in the search results. An edge connects each query to every URL that a user actually clicked after entering that query. This bipartite graph is analyzed, and strongly connected queries are used as recommended queries.
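The bipartite graph described above can be held as a nested mapping from each query to its click counts per URL. The following is a minimal sketch in Python, not the implementation of the cited technique; the `LogRecord` type, the `build_click_graph` function, and the sample records are hypothetical and are not taken from FIG. 1 or FIG. 2.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """One search-log row with the five items listed for FIG. 1 (field names are illustrative)."""
    date: str
    user_id: str
    query: str
    click_url: str
    rank: int


def build_click_graph(records):
    """Return {query: Counter({url: click_count})}, i.e. the weighted bipartite edges."""
    graph = defaultdict(Counter)
    for r in records:
        graph[r.query][r.click_url] += 1
    return graph


# Hypothetical sample log, not taken from the patent figures.
records = [
    LogRecord("2011-11-01", "u1", "q1", "URL1", 1),
    LogRecord("2011-11-01", "u2", "q2", "URL1", 2),
    LogRecord("2011-11-02", "u1", "q1", "URL2", 3),
    LogRecord("2011-11-02", "u3", "q1", "URL1", 1),
]
graph = build_click_graph(records)
```

Edges exist only between a query and a URL, never between two queries or two URLs, which is what makes the structure bipartite.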

A representative technique for query recommendation based on search logs calculates the relevance between queries and URLs using a bipartite graph (see, for example, Non-Patent Document 1).

There is also a technique that uses these log files to recommend queries that make it easier for a user to find the desired page (see, for example, Non-Patent Document 1). When calculating the relevance between queries and URLs with the bipartite graph, this technique exploits the property that queries with the same search intent access the same group of URLs, and clusters queries that share a search intent. Within each cluster, a query with high relevance is taken as the representative query. To recommend a query to a user, the technique selects, from the cluster to which the user's query belongs, queries that are highly relevant to it. This allows the user to search with a query that gives higher search accuracy.

Non-Patent Document 1: Hongbo Deng, Irwin King, Michael R. Lyu: Entropy-biased Models for Query Representation on the Click Graph, ACM SIGIR, pages 339-346, 2009.

However, suppose the goal is to identify the location a query relates to (for example, the query "Shibuya department store" is a query about Shibuya) and the conventional method is used to borrow the location of other queries that are highly related to it. In that case, associations between queries that have nothing to do with location can become a problem.

For example, suppose that department stores in Yokohama sell accessories, that department stores in Shibuya do not, and that the click graph shown in FIG. 3 exists. Because accessories are sold in Yokohama, users who enter the query "department store accessories" click URL2 and URL3, which relate to Yokohama department stores. However, if the relevance between queries is calculated with the conventional method and the relevance between "department store accessories" and "Shibuya department store" turns out to be high, the query "department store accessories" is judged to be usable in Shibuya as well, even though accessories are not sold there.

As another example, when the query "department store accessories" has low relevance to both the query "Shibuya department store" and the query "Yokohama department store", as in FIG. 4, the location where "department store accessories" can be used cannot be identified. However, URL2 and URL3, which are clicked for "department store accessories", both relate to Yokohama department stores, so it is reasonable to regard it as a query that can be used in Yokohama.

Thus, the conventional method alone cannot be applied to identify the location where a query can be used from the relevance between queries.

The present invention has been made in view of the above points. Its object is to provide a query location estimation method, apparatus, and program capable of estimating the location where a query can be used, by using both the relevance between queries and the relevance to locations of the group of URLs that are in a click relationship with the query.

To solve the above problem, the present invention (claim 1) is a query location estimation method in an apparatus for a search service, the apparatus having: search log storage means that records a search log consisting of queries entered by users and the URLs clicked in the search results for those queries; search log extraction means that extracts, from the search log, the group of queries and the group of URLs related to a target query; and relevance calculation means that creates a bipartite graph consisting of the queries and the clicked URLs and uses the bipartite graph to calculate the relevance between queries and between URLs. The method comprises:
an inter-query transition probability calculation step in which the relevance calculation means weights the edges connecting queries and URLs in the bipartite graph according to the number of clicks and calculates the transition probability between queries based on an inter-query transition probability calculation rule;
an inter-URL transition probability calculation step in which the relevance calculation means calculates the transition probability between URLs based on an inter-URL transition probability calculation rule;
a first location information calculation step in which location information calculation means calculates a location estimate for a query based on the transition probabilities between queries;
a second location information calculation step in which the location information calculation means calculates a location estimate for a URL based on the transition probabilities between URLs; and
a location determination step in which location determination means determines the location of the query using the location estimate of the query and the location estimate of the URL.

In the present invention (claim 2), the inter-query transition probability calculation step uses an inter-query transition probability calculation rule that calculates edge weights based on the number of times a query was entered and the number of clicks on a URL, and calculates the transition probability between queries via the URLs that connect them, based on the transition probability from the query to the URL.

In the present invention (claim 3), the inter-URL transition probability calculation step uses an inter-URL transition probability calculation rule that calculates edge weights based on the number of times a query was entered and the number of clicks on a URL, and calculates the transition probability between URLs via the queries that connect them, based on the transition probability from the URL to the query.

In the present invention (claim 4):
in the first location information calculation step, the location information held by other queries is distributed to the location vector of the query using the transition probabilities between queries;
in the second location information calculation step, the location information held by other URLs is distributed to the location vector of the URL using the transition probabilities between URLs; and
in the location determination step,
the elements of the query's location vector are used as probabilities, and a probability entropy is calculated as the location reliability of the query,
each element of the sum of the location vectors of the URLs in a click relationship with the query is used as a probability, and a probability entropy is calculated as the location reliability of the URLs, and
the product of the query's location reliability and the URLs' location reliability is used as the value for determining the location of the query; if this value is greater than or equal to a threshold, the location corresponding to the largest element of the query's location vector is taken as the location of the query.

In the conventional technique, when the click relationship between queries and URLs is turned into a bipartite graph and relevance is calculated in order to identify locational connections between queries, a query may, depending on the characteristics of the click relationship, show high relevance to a query with which it has no locational connection. In contrast, the present invention uses both the location reliability based on the relevance between queries and the location reliability of the group of URLs in a click relationship with the query. The location can therefore be estimated from the URL group's location reliability even when the inter-query location reliability is low, and from the inter-query location reliability even when the URL group's location reliability is low.

FIG. 1 is an example of a conventional search log.
FIG. 2 is example 1 of a click graph of queries and URLs in the prior art.
FIG. 3 is example 2 of a click graph of queries and URLs in the prior art.
FIG. 4 is example 3 of a click graph of queries and URLs in the prior art.
FIG. 5 is a configuration diagram of a query location estimation apparatus according to an embodiment of the present invention.
FIG. 6 is an example of the search log storage unit according to an embodiment of the present invention.
FIG. 7 is a flowchart of the search log extraction processing according to an embodiment of the present invention.
FIG. 8 is a flowchart of the query location estimation processing according to an embodiment of the present invention.
FIG. 9 is an example of a bipartite graph of queries and URLs according to an embodiment of the present invention.
FIG. 10 is an example of calculating transition probabilities between queries according to an embodiment of the present invention.
FIG. 11 is an example of the transition probabilities between queries calculated by the inter-query transition probability calculation unit according to an embodiment of the present invention.
FIG. 12 is an example of calculating transition probabilities between URLs according to an embodiment of the present invention.
FIG. 13 is an example of the transition probabilities between URLs calculated by the inter-URL transition probability calculation unit according to an embodiment of the present invention.
FIG. 14 is an example of the location keyword storage unit according to an embodiment of the present invention.
FIG. 15 is an example of the location URL storage unit according to an embodiment of the present invention.
FIG. 16 is an example of the location vector of the query "Shibuya department store" calculated by the location information calculation unit according to an embodiment of the present invention.
FIG. 17 is an example of the location vector of URL1 calculated by the location information calculation unit according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 5 shows the configuration of a query location estimation apparatus according to an embodiment of the present invention.

The apparatus shown in FIG. 5 consists of a query input unit 1, a search log extraction unit 2, an inter-query transition probability calculation unit 3, an inter-URL transition probability calculation unit 4, a location information calculation unit 5, a location determination unit 6, a search log storage unit 7, a location keyword storage unit 8, and a location URL storage unit 9.

The search log storage unit 7, the location keyword storage unit 8, and the location URL storage unit 9 are provided on a storage medium such as a hard disk.

As shown in FIG. 6, the search log storage unit 7 stores a search log consisting of a date, a user ID, a query name, and a click URL.

The processing of the apparatus configured as above is described below in two parts: search log extraction processing and query location estimation processing.

<Search log extraction processing>
When a query q1 is input to the query input unit 1, the unit outputs q1 to the search log extraction unit 2. On receiving q1, the search log extraction unit 2 accesses the search log storage unit 7 and extracts the logs related to q1.

The method of extracting the search log is described with reference to the flowchart of FIG. 7.

FIG. 7 is a flowchart of the search log extraction processing according to an embodiment of the present invention.

Step 101) The search log extraction unit 2 first sets the value of N_max. In this embodiment, N_max = 2 and N = 0.

Step 102) The search log extraction unit 2 sets N = N + 1 and searches the search log storage unit 7 for the click URLs of the query q1 acquired from the query input unit 1. Here, the first and third rows of FIG. 6 are found, and URL1 and URL2 are extracted.

Step 103) The search log extraction unit 2 extracts, from the search log storage unit 7, the queries that have URL1 or URL2 as a click URL. Specifically, from the search log of FIG. 6, q2 is extracted as a query whose click URL is URL1, and q3 as a query whose click URL is URL2.

Step 104) If N is less than N_max, the process returns to step 102, where N is incremented, and the click URLs of the queries extracted in step 103 (q2, q3) are extracted. When N equals N_max, the processing ends. In this example, the search log is output when N = 2.

The search log extraction unit 2 treats the search log extracted by the above processing as the search log related to query q1 and outputs it to the inter-query transition probability calculation unit 3 and the inter-URL transition probability calculation unit 4.
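Steps 101 to 104 amount to a bounded expansion over the click graph, alternating between click URLs and the queries that clicked them. The sketch below is one possible reading in Python, assuming that N_max counts full rounds of "URLs, then queries"; the function name `extract_related_log` and the sample log are hypothetical and do not reproduce FIG. 6.

```python
def extract_related_log(log, seed_query, n_max=2):
    """log: list of (query, url) click pairs. Returns the rows related to seed_query."""
    queries = {seed_query}
    urls = set()
    for _ in range(n_max):
        # Step 102: click URLs of the queries collected so far.
        urls |= {u for q, u in log if q in queries}
        # Step 103: queries that have any of those URLs as a click URL.
        queries |= {q for q, u in log if u in urls}
    # Keep only the rows whose query and URL were both collected.
    return [(q, u) for q, u in log if q in queries and u in urls]


# Hypothetical log, loosely modeled on the q1/q2/q3 example in the text.
log = [
    ("q1", "URL1"),
    ("q2", "URL1"),
    ("q1", "URL2"),
    ("q3", "URL2"),
    ("q3", "URL3"),
    ("q4", "URL4"),
]
related = extract_related_log(log, "q1", n_max=2)
```

With this sample, q4 and URL4 are never reached from q1 and are left out, while URL3 is picked up only in the second round through q3.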

<Query location estimation processing>
On acquiring the search log from the search log extraction unit 2, the inter-query transition probability calculation unit 3 and the inter-URL transition probability calculation unit 4 create a bipartite graph of queries and URLs and calculate the transition probabilities between queries and between URLs, respectively.

The method of determining the location of a query from the transition probabilities between queries and between URLs is described below.

FIG. 8 is a flowchart of the query location estimation processing according to an embodiment of the present invention.

Step 201) The inter-query transition probability calculation unit 3 reads the search log from the search log storage unit 7 and, for each query in the search log, extracts the URL_i (i = 1, 2, ..., n) that are in a click relationship with the query.

Step 202) The inter-query transition probability calculation unit 3 creates a bipartite graph of queries and URLs. In this bipartite graph, each query in the search log is connected by an edge to the URLs with which it has a click relationship; no edge connects two queries or two URLs. Here, given the search log of FIG. 6, the bipartite graph of FIG. 9 is created.

Step 203) Based on the click counts in the search log for the queries extracted in step 201, the inter-query transition probability calculation unit 3 calculates the click probability p_ik from query q_i to URL_k by the following equation (1).

p_ik = (number of clicks from q_i to URL_k) / (total number of clicks for q_i) (1)

In equation (1), the number of clicks from q_i to URL_k in the numerator is the number of times, over all users in the search log, that URL_k was clicked in the search results obtained with q_i as the query. The total number of clicks for q_i in the denominator is the number of times, over all users in the search log, that any URL was clicked in the search results obtained with q_i.

FIG. 10 shows a calculation example. With the above equation, the click probability to URL_k is calculated for every query q_i (i = 1, 2, ..., n) that is in a click relationship with URL_k.

Next, the transition probability p_ij between query q_i and query q_j is calculated using the following equation (2).

In equation (2), p_ik and p_kj are obtained, as shown in FIG. 10, from the number of clicks from query q_i to URL_k and from q_j to URL_k. The summation used to calculate p_ij is taken over all URL_k (k = 1, 2, ..., n) that are connected to both query q_i and query q_j. This p_ij is taken as the transition probability between queries; FIG. 11 shows a calculation example. The inter-query transition probability calculation unit 3 outputs p_ij to the location information calculation unit 5.
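The calculation in step 203 can be sketched as follows. Equation (2) is not reproduced in this text, so the sketch assumes the form p_ij = Σ_k p_ik · p_jk, summed over the URLs clicked for both queries, where p_jk is the click probability from q_j to URL_k. That form, the function names, the decision to skip i = j, and the click counts are assumptions, not values from FIG. 10 or FIG. 11.

```python
from collections import defaultdict


def click_probabilities(clicks):
    """Equation (1): p_ik = clicks(q_i -> URL_k) / total clicks for q_i.

    clicks: {query: {url: click_count}}
    """
    probs = {}
    for q, per_url in clicks.items():
        total = sum(per_url.values())
        probs[q] = {u: c / total for u, c in per_url.items()}
    return probs


def query_transition_probabilities(clicks):
    """ASSUMED form of equation (2): p_ij = sum_k p_ik * p_jk over shared URL_k."""
    p = click_probabilities(clicks)
    trans = defaultdict(dict)
    for qi in p:
        for qj in p:
            if qi == qj:
                continue  # self-transitions are skipped in this sketch
            shared = p[qi].keys() & p[qj].keys()
            if shared:
                trans[qi][qj] = sum(p[qi][u] * p[qj][u] for u in shared)
    return dict(trans)


# Hypothetical click counts.
clicks = {
    "q1": {"URL1": 3, "URL2": 1},
    "q2": {"URL1": 2},
    "q3": {"URL2": 1, "URL3": 1},
}
p_click = click_probabilities(clicks)
p_query = query_transition_probabilities(clicks)
```

Under this assumed form, q2 and q3 share no clicked URL and therefore get no transition probability, while q1 is linked to both.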

Step 204) Based on the click counts in the search log for the queries extracted in step 201, the inter-URL transition probability calculation unit 4 calculates the click probability p_ik from URL_i to query q_k by the following equation (3).

p_ik = (number of clicks from URL_i to q_k) / (total number of clicks for URL_i) (3)

In equation (3), the number of clicks from URL_i to q_k in the numerator is the number of times, over all users in the search log, that URL_i was clicked in the search results obtained with q_k as the query. The total number of clicks for URL_i in the denominator is the number of times, over all users in the search log, that URL_i was clicked in the search results obtained with any of the queries q_k.

FIG. 12 shows a calculation example. With equation (3), the click probability to q_k is calculated for every URL_i (i = 1, 2, ..., n) that is in a click relationship with q_k.

Next, the transition probability p_ij between URL_i and URL_j is calculated using the following equation (4).

In equation (4), p_ik and p_kj are obtained, as shown in FIG. 12, from the number of clicks from URL_i to q_k and from URL_j to q_k. The summation used to calculate p_ij is taken over all q_k (k = 1, 2, ..., n) that are connected to both URL_i and URL_j. This p_ij is taken as the transition probability between URLs; FIG. 13 shows a calculation example. The calculated transition probabilities between URLs are output to the location information calculation unit 5.
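Step 204 mirrors the inter-query calculation with the roles of queries and URLs swapped. The sketch below re-indexes the click counts by URL and then applies an assumed form of equation (4), p_ij = Σ_k p_ik · p_jk over the queries shared by URL_i and URL_j. Equation (4) is not reproduced in this text, so that form, the function name, and the click counts are assumptions rather than the values of FIG. 12 or FIG. 13.

```python
from collections import defaultdict


def url_transition_probabilities(clicks):
    """clicks: {query: {url: click_count}}; returns {url_i: {url_j: p_ij}}."""
    # Re-index the same click counts by URL: by_url[url][query] = count.
    by_url = defaultdict(dict)
    for q, per_url in clicks.items():
        for u, c in per_url.items():
            by_url[u][q] = c
    # Equation (3): p_ik = clicks(URL_i, q_k) / total clicks for URL_i.
    p = {}
    for u, per_q in by_url.items():
        total = sum(per_q.values())
        p[u] = {q: c / total for q, c in per_q.items()}
    # ASSUMED form of equation (4): p_ij = sum_k p_ik * p_jk over shared q_k.
    trans = defaultdict(dict)
    for ui in p:
        for uj in p:
            shared = p[ui].keys() & p[uj].keys()
            if ui != uj and shared:
                trans[ui][uj] = sum(p[ui][q] * p[uj][q] for q in shared)
    return dict(trans)


# Hypothetical click counts.
clicks = {
    "q1": {"URL1": 3, "URL2": 1},
    "q2": {"URL1": 2},
    "q3": {"URL2": 1, "URL3": 1},
}
p_url = url_transition_probabilities(clicks)
```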

Step 205) The location information calculation unit 5 acquires the transition probabilities between queries and between URLs calculated in steps 203 and 204 from the inter-query transition probability calculation unit 3 and the inter-URL transition probability calculation unit 4, and assigns location information to each query and each URL they contain.

First, the location information calculation unit 5 retrieves from the location keyword storage unit 8 the location information that matches the text of each query (such as "Shibuya department store"). As shown in FIG. 14, the location keyword storage unit 8 holds, for example, the location information "Shibuya-ku" for the place-name keyword "Shibuya" and the location information "Taito-ku" for the facility-name keyword "Nippon Sky Tree". For example, the query "Shibuya department store" in FIG. 11 obtains the location information "Shibuya-ku", which corresponds to "Shibuya" in the query. When the text of a query contains no keyword registered in the location keyword storage unit 8, location information cannot be assigned to that query.

Next, the location information calculation unit 5 retrieves the location information for each URL from the location URL storage unit 9. The location URL storage unit 9 stores location information for URLs; this embodiment is described on the assumption that it stores the information shown in FIG. 15. Location information cannot be assigned to a URL that is not in the location URL storage unit 9.
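The two lookups in step 205 can be sketched as simple table lookups. The text does not say how a query is matched against the registered keywords, so substring matching is an assumption here, as are the function names and the table contents beyond the "Shibuya", "Nippon Sky Tree", and URL1 entries mentioned in the description.

```python
def lookup_query_location(query, keyword_table):
    """Return the location of the first registered keyword found in the query text, else None."""
    for keyword, location in keyword_table.items():
        if keyword in query:  # ASSUMPTION: plain substring match
            return location
    return None


def lookup_url_location(url, url_table):
    """Return the registered location of the URL, or None if it is not registered."""
    return url_table.get(url)


# Illustrative stand-ins for the location keyword and location URL storage units.
keyword_table = {"Shibuya": "Shibuya-ku", "Nippon Sky Tree": "Taito-ku"}
url_table = {"URL1": "Taito-ku"}

labeled = lookup_query_location("Shibuya department store", keyword_table)
unlabeled = lookup_query_location("department store accessories", keyword_table)
```

Queries and URLs for which the lookup returns `None` are the ones whose location vectors have to be estimated from the transition probabilities in the following steps.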

Step 206) The location information calculation unit 5 calculates a location vector for each query. The dimensions of this location vector are all the pieces of location information (such as "Shibuya-ku") contained in the location keyword storage unit 8 and the location URL storage unit 9. For example, because the location information "Shibuya-ku" is attached to the query "Shibuya department store", its vector is as shown in FIG. 16. The location vector of a query q_i is calculated as follows.

1) When location information is attached to query q_i: the vector element corresponding to that location information is set to 1.0 and all other elements are set to 0.

2) When no location information is attached to query q_i: the vector of query q_i is calculated by the following equation (5), based on the transition probabilities between queries calculated in step 203.

In equation (5), p_ij is the transition probability between q_i and q_j, and q_vector(q_j) is the location vector of query q_j.

Step 207) The location information calculation unit 5 calculates a location vector for each URL. The dimensions of this location vector are all the pieces of location information (such as "Shibuya-ku") contained in the location keyword storage unit 8 and the location URL storage unit 9. For example, because URL1 in FIG. 16 corresponds to the location information "Taito-ku", its vector is as shown in FIG. 17. The location vector of URL_i is calculated as follows.

1) When location information is attached to URL_i: the vector element corresponding to that location information is set to 1.0 and all other elements are set to 0.

2) When no location information is attached to URL_i: the vector of URL_i is calculated by the following equation (6), based on the transition probabilities between URLs calculated in step 204.

In equation (6), p_ij is the transition probability between URL_i and URL_j, and URL_vector(URL_j) is the location vector of URL_j.
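Steps 206 and 207 apply the same rule to queries and to URLs, so one helper can serve both. Equations (5) and (6) are not reproduced in this text; the sketch assumes they are the weighted sum Σ_j p_ij · vector(j) taken over neighbours that already carry location information, in a single pass. The function name `location_vectors`, the single-pass choice, and the sample values are assumptions.

```python
def location_vectors(items, known, trans, places):
    """Build one location vector per item (query or URL).

    items:  list of item names
    known:  {item: place} for items that already carry location information
    trans:  {item_i: {item_j: p_ij}} transition probabilities
    places: ordered list of all place names (the vector dimensions)
    """
    index = {place: n for n, place in enumerate(places)}
    vectors = {}
    # Case 1: labeled items get 1.0 at their place and 0 elsewhere.
    for item in items:
        if item in known:
            vec = [0.0] * len(places)
            vec[index[known[item]]] = 1.0
            vectors[item] = vec
    # Case 2 (ASSUMED form of equations (5)/(6)): sum of the vectors of
    # labeled neighbours, each weighted by the transition probability p_ij.
    for item in items:
        if item not in known:
            vec = [0.0] * len(places)
            for other, p in trans.get(item, {}).items():
                if other in known:
                    for n, v in enumerate(vectors[other]):
                        vec[n] += p * v
            vectors[item] = vec
    return vectors


# Hypothetical inputs: q1 and q3 are labeled, q2 is not.
places = ["Shibuya-ku", "Taito-ku"]
known = {"q1": "Shibuya-ku", "q3": "Taito-ku"}
trans = {"q2": {"q1": 0.75, "q3": 0.125}}
vectors = location_vectors(["q1", "q2", "q3"], known, trans, places)
```

Passing URL names, the URL labels, and the inter-URL transition probabilities to the same function yields the URL location vectors.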

ステップ２０８）場所判定部６は、クエリの場所を特定するため、クエリの場所ベクトルから算出する信頼度と、クエリとクリック関係にあるURLの場所ベクトルから算出される信頼度を算出する。 Step 208) In order to specify the location of the query, the location determination unit 6 calculates the reliability calculated from the location vector of the query and the reliability calculated from the location vector of the URL having a click relationship with the query.

クエリq_iの場所ベクトルから算出する信頼度q_trust(ｑ_i)は下記の式(7)で算出する。 The reliability q_trust (q _i ) calculated from the place vector of the query q _i is calculated by the following equation (7).

ここで、l_kはクエリの場所ベクトルq_vector(q_i)=(l_１，l₂，…,l_k)の要素を示している。上記の式（7）はクエリq_iの場所ベクトルの各要素の場所に対する推定確率としたときのエントロピーである。そのため、q_trust(q_i)の値が大きい場合は、場所を特定できる確率が高いことを示す。

Here, l _k represents an element of the query location vector q_vector (q _i ) = (l ₁ , l ₂ ,..., L _k ). The above equation (7) is entropy when the estimated probability for the location of each element of the location vector of the query q _i is used. Therefore, a large value of q_trust (q _i ) indicates that there is a high probability that the location can be specified.

クエリとクリック関係にあるURLの場所ベクトルから算出する信頼度URL_trust(q_i)は下記の式(8)で算出する。 The reliability URL_trust (q _i ) calculated from the location vector of the URL having a click relationship with the query is calculated by the following equation (8).

Here, l_k denotes an element of the vector ΣURL_vector(URL_i) = (l_1, l_2, ..., l_k), obtained by summing the location vectors of the URLs that have a click relationship with the query. Equation (8) above is the entropy obtained when each element of this summed vector is treated as the estimated probability of the corresponding location. Therefore, a large value of URL_trust(q_i) indicates a high probability that the location can be identified.
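The summation step for URL_trust(q_i) can be sketched in Python as follows: the location vectors of the clicked URLs are added element by element, and the summed vector is then scored. Equation (8) is not reproduced in this text, so the scoring form (one minus normalized Shannon entropy, larger meaning more concentrated) is an assumption, as are the function name and sample data.

```python
import math


def url_trust(clicked_url_vectors):
    """Return (score, summed_vector) for the URLs clicked for one query.

    ASSUMPTION: the score is 1 - H(p) / log(K) on the normalized sum of the
    URL location vectors; this stands in for equation (8).
    """
    dim = len(clicked_url_vectors[0])
    # Element-wise sum of the URL location vectors (the vector called
    # "sum of URL_vector(URL_i)" in the text).
    summed = [sum(vec[k] for vec in clicked_url_vectors) for k in range(dim)]
    total = sum(summed)
    if total <= 0:
        return 0.0, summed
    if dim == 1:
        return 1.0, summed
    probs = [v / total for v in summed]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(dim), summed


# Hypothetical clicked URLs: both agree on the first location.
score, summed = url_trust([[1.0, 0.0], [1.0, 0.0]])
```

When the clicked URLs point to different locations, the summed vector flattens and the assumed score drops toward 0.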

Based on these two reliabilities, the location estimate of query q_i is calculated by equation (9) below.

Query location estimate = q_trust(q_i) × URL_trust(q_i)   (9)

If the location estimate of query q_i from equation (9) above is greater than or equal to a set threshold, the location of query q_i is judged to have been identified. In that case, the values of q_trust(q_i) and URL_trust(q_i) are compared, and a location is assigned to query q_i as follows.

1) When q_trust(q_i) is the larger value: the location corresponding to the largest element of q_vector(q_i) is taken as the location of query q_i.

2) When URL_trust(q_i) is the larger value: the location corresponding to the largest element of ΣURL_vector(URL_i) is taken as the location of query q_i.
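The decision rule built on equation (9) and cases 1) and 2) can be sketched in Python as follows. The two reliabilities are taken as precomputed inputs. The function name, the list of location names, the threshold values, and the handling of a tie between the two reliabilities (the query vector is used) are assumptions, since the text does not specify them.

```python
def decide_location(q_vector, url_vector_sum, q_trust, url_trust,
                    locations, threshold):
    """Return the estimated location name, or None if not identified.

    q_vector:       location vector of the query
    url_vector_sum: element-wise sum of the clicked URLs' location vectors
    locations:      location names, index-aligned with the vectors
    """
    # Equation (9): the location estimate is the product of the reliabilities.
    if q_trust * url_trust < threshold:
        return None  # below threshold: location not identified
    # Use the vector whose reliability is larger (ASSUMPTION: the query
    # vector is used when the two reliabilities are equal).
    source = q_vector if q_trust >= url_trust else url_vector_sum
    best = max(range(len(source)), key=lambda k: source[k])
    return locations[best]


locations = ["Tokyo", "Osaka"]  # hypothetical location labels
from_query = decide_location([0.9, 0.1], [0.4, 0.6], 0.8, 0.5, locations, 0.3)
from_urls = decide_location([0.9, 0.1], [0.4, 0.6], 0.2, 0.9, locations, 0.1)
```

In the first call the query vector decides the result because q_trust is larger; in the second, the summed URL vector decides it.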

Note that implementing the present invention requires location information to be attached to queries and URLs; however, the locations of queries and URLs can be estimated even when location information is attached to only some of the queries and URLs.

The functions of the components of the query location estimation device shown in FIG. 5 can also be built as a program, which can be installed and executed on a computer used as the location estimation device, or distributed via a network.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

1 Query input unit
2 Search log extraction unit
3 Inter-query transition probability calculation unit
4 Inter-URL transition probability calculation unit
5 Location information calculation unit
6 Location determination unit
7 Search log storage unit
8 Location keyword storage unit
9 Location URL storage unit

Claims

A method for estimating the location of a query in a device for a search service, the device having: search log storage means for recording a search log composed of queries input by users and URLs clicked in the search results for those queries; search log extraction means for extracting, from the search log, a query group and a URL group related to a target query; and relevance calculation means for creating a bipartite graph composed of the queries and the clicked URLs and using the bipartite graph to calculate the relevance between queries and between URLs, the method comprising:
an inter-query transition probability calculation step in which the relevance calculation means calculates transition probabilities between queries according to an inter-query transition probability calculation rule, with the weights of the edges connecting queries and URLs in the bipartite graph based on the number of clicks;
an inter-URL transition probability calculation step in which the relevance calculation means calculates transition probabilities between URLs according to an inter-URL transition probability calculation rule;
a first location information calculation step in which the location information calculation means calculates a location estimate of the query based on the transition probabilities between the queries;
a second location information calculation step in which the location information calculation means calculates a location estimate of the URL based on the transition probabilities between the URLs; and
a location determination step of determining the location of the query using the location estimate of the query and the location estimate of the URL.

The query location estimation method according to claim 1, wherein the inter-query transition probability calculation step uses an inter-query transition probability calculation rule that calculates the edge weights based on the number of times each query was input and the number of clicks on each URL, and calculates the transition probability between queries via the URLs connecting them, based on the transition probabilities from queries to URLs.

The query location estimation method according to claim 1, wherein the inter-URL transition probability calculation step uses an inter-URL transition probability calculation rule that calculates the edge weights based on the number of times each query was input and the number of clicks on each URL, and calculates the transition probability between URLs via the queries connecting them, based on the transition probabilities from URLs to queries.

The query location estimation method according to claim 1, wherein:
in the first location information calculation step, the location information held by other queries is allocated to the location vector of the query using the transition probabilities between the queries;
in the second location information calculation step, the location information held by other URLs is allocated to the location vector of the URL using the transition probabilities between the URLs; and
in the location determination step,
probability entropy is calculated as the location reliability of the query, using the elements of the location vector of the query as probabilities,
probability entropy is calculated as the location reliability of the URLs, using each element of the sum of the location vectors of the URLs in a click relationship with the query as a probability, and
the product of the location reliability of the query and the location reliability of the URLs is used as the location determination value of the query, and, if that value is greater than or equal to a threshold, the location corresponding to the largest element of the location vector of the query is taken as the location of the query.

A query location estimation device for estimating the location of a query in a search service, the device having:
search log storage means for recording a search log composed of queries input by users and URLs clicked in the search results for those queries;
search log extraction means for extracting, from the search log, a query group and a URL group related to a target query; and
relevance calculation means for creating a bipartite graph composed of the queries and the clicked URLs and using the bipartite graph to calculate the relevance between queries and between URLs,
wherein the relevance calculation means includes:
inter-query transition probability calculation means for calculating transition probabilities between queries according to an inter-query transition probability calculation rule, with the weights of the edges connecting queries and URLs in the bipartite graph based on the number of clicks; and
inter-URL transition probability calculation means for calculating transition probabilities between URLs according to an inter-URL transition probability calculation rule,
and wherein the device further comprises:
first location information calculation means for calculating a location estimate of the query based on the transition probabilities between the queries;
second location information calculation means for calculating a location estimate of the URL based on the transition probabilities between the URLs; and
location determination means for determining the location of the query using the location estimate of the query and the location estimate of the URL.

The query location estimation device according to claim 5, wherein the inter-query transition probability calculation means uses an inter-query transition probability calculation rule that calculates the edge weights based on the number of times each query was input and the number of clicks on each URL, and calculates the transition probability between queries via the URLs connecting them, based on the transition probabilities from queries to URLs.

The query location estimation device according to claim 5, wherein the inter-URL transition probability calculation means uses an inter-URL transition probability calculation rule that calculates the edge weights based on the number of times each query was input and the number of clicks on each URL, and calculates the transition probability between URLs via the queries connecting them, based on the transition probabilities from URLs to queries.

The query location estimation device according to claim 5, wherein:
the first location information calculation means includes means for allocating the location information held by other queries to the location vector of the query using the transition probabilities between the queries;
the second location information calculation means includes means for allocating the location information held by other URLs to the location vector of the URL using the transition probabilities between the URLs; and
the location determination means includes:
means for calculating probability entropy as the location reliability of the query, using the elements of the location vector of the query as probabilities;
means for calculating probability entropy as the location reliability of the URLs, using each element of the sum of the location vectors of the URLs in a click relationship with the query as a probability; and
means for using the product of the location reliability of the query and the location reliability of the URLs as the location determination value of the query and, if that value is greater than or equal to a threshold, taking the location corresponding to the largest element of the location vector of the query as the location of the query.

A query location estimation program that causes a computer to function as each means of the query location estimation device according to claim 5.