JP2013156876A

JP2013156876A - Recommendation query extraction device, method, and program

Info

Publication number: JP2013156876A
Application number: JP2012017571A
Authority: JP
Inventors: Ryota Imai; 良太今井; Shinji Miyahara; 伸二宮原; Yoshimasa Koike; 義昌小池; Ryoji Kataoka; 良治片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-01-31
Filing date: 2012-01-31
Publication date: 2013-08-15
Anticipated expiration: 2032-01-31
Also published as: JP5589009B2

Abstract

PROBLEM TO BE SOLVED: To enable replacement-type query recommendation even when the click log has a small storage capacity, and to preferentially display candidates that have a high probability of being selected. SOLUTION: The recommendation query extraction device according to the present invention acquires a query input by a user and generates a bipartite graph from the query and the URLs obtained by searching click log storage means (which holds pairs of queries and URLs). When the input query is not present in the click log storage means, the device acquires URLs corresponding to the query via a search engine and adds them to the bipartite graph, together with n-1 queries, where n is the lower limit on the number of queries to be included in the bipartite graph. The device then calculates a score for each query included in the bipartite graph, selects a predetermined number of queries in descending order of score, and outputs them.

Description

The present invention relates to a recommended query extraction device, method, and program, and more particularly to a recommended query extraction device, method, and program for presenting queries that are highly likely to yield the search results the user intends.

A query entered by a user is not always appropriate for obtaining the search results the user intends. By selecting a recommended query, however, the user can easily change the query and retry the search. For example, a user who has entered the query "bean type" in order to search for documents about [types of coffee beans] can be recommended queries such as "coffee bean type" or "bean brand".

Conventional techniques for recommending queries are described below. First, the terms used in this specification are explained.

A "query" is a character string that a user enters into the system in order to search for documents. The documents retrieved are those that contain the keywords in the query; when the query contains multiple keywords, they are the documents that contain all of those keywords. For example, entering the query "coffee bean type" to search for documents about types of coffee beans retrieves documents that contain both of the keywords "coffee bean" and "type".

A "click log" is a record, held in storage means such as a hard disk, of the queries that multiple users entered into the system in the past and the URLs that those users selected from the search results the system output for those queries. For example, when a user A enters query a and selects URL1, and another user B enters query b and selects URL2, the click log holds:
{query a, URL1}
{query b, URL2}
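As an illustration of the click log just defined, the following minimal Python sketch holds the two example pairs and builds lookup tables in both directions. The names `click_log` and `index_click_log` are hypothetical, and the in-memory list stands in for the storage means; the source does not specify a storage layout.

```python
from collections import defaultdict

# Hypothetical in-memory click log: one (query, URL) pair per recorded click.
click_log = [
    ("query a", "URL1"),
    ("query b", "URL2"),
]


def index_click_log(pairs):
    """Build query -> URLs and URL -> queries lookup tables from click pairs."""
    urls_by_query = defaultdict(set)
    queries_by_url = defaultdict(set)
    for query, url in pairs:
        urls_by_query[query].add(url)
        queries_by_url[url].add(query)
    return urls_by_query, queries_by_url


urls_by_query, queries_by_url = index_click_log(click_log)
```

Keeping both directions makes it cheap to ask which URLs a query led to and which queries led to a given URL, which is the access relation the later graph construction relies on.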

A "score" is an index of how likely a user is to select a given query when that query is recommended. For example, if for the input query "bean type" the query "bean knowledge" has a score of 0.9 and the query "coffee bean type" has a score of 1.5, taking the scores into account allows the latter to be recommended preferentially.

As a first conventional technique for recommending queries, there is a technique that uses a click log, consisting of queries entered into the system by multiple past users and the URLs those users selected from the search results output for those queries, to construct a bipartite graph when the input query is contained in the click log, and generates candidate queries for recommendation from it (see, for example, Non-Patent Document 1). Here, a bipartite graph is, in graph theory, a graph whose vertex set can be divided into two subsets such that no edge joins two vertices within the same subset. FIG. 1 shows the configuration of a query extraction device according to this technique. The device shown in FIG. 1 comprises: a query input unit 1 that acquires the query entered by the user; a bipartite graph construction unit 2 that constructs a bipartite graph using the entered query; a click log storage unit 5 that stores the click log; a parameter storage unit 4 that holds the parameters used by the components of the device; a score calculation unit 7 that calculates a score for each query included in the bipartite graph; and a recommended query output unit 8 that, based on the scores, extracts the number of queries specified by the parameters in the parameter storage unit 4.
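The bipartite property can be checked directly on click log data. The sketch below is illustrative only: it treats queries as one vertex subset and URLs as the other, and confirms that every edge joins a query to a URL. The sample pairs and the function name `is_bipartite_split` are assumptions, not part of the source.

```python
# Hypothetical click pairs; each pair is one edge of the graph.
edges = {("query a", "URL1"), ("query b", "URL1"), ("query b", "URL2")}

query_nodes = {query for query, _ in edges}
url_nodes = {url for _, url in edges}


def is_bipartite_split(edges, left, right):
    """True if the two subsets are disjoint and every edge runs from left to right."""
    return left.isdisjoint(right) and all(a in left and b in right for a, b in edges)


split_ok = is_bipartite_split(edges, query_nodes, url_nodes)
```

Because a click always pairs one query with one URL, a graph built this way has no query-to-query or URL-to-URL edges.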

As shown in FIG. 2, the parameter storage unit 4 holds the following parameters.

n: lower limit on the number of queries to be included in the bipartite graph;
m: number of iterations in the score calculation;
k: number of queries to recommend.

FIG. 3 shows the operation of the bipartite graph construction unit 2.

First, the bipartite graph construction unit 2 receives a query q from the query input unit 1 and creates a bipartite graph G containing only q (step 1). Next, it determines whether q exists in the click log storage unit 5 (step 2). If q exists (step 3, Yes), then, starting from q and following the access relations between queries and URLs, it adds n−1 queries and the URLs accessed from those queries to the bipartite graph G (step 4).

As a second conventional technique, there is a technique that recommends candidates even when the input query is not contained in the click log, by using URLs retrieved by a search engine (see, for example, Non-Patent Document 2).

As a third conventional technique, there is a technique that estimates the similarity between queries from the similarity of their search results and recommends to the user queries similar to the input query (see, for example, Non-Patent Document 3).

Qiaozhu Mei, Dengyong Zhou, Kenneth Church: Query suggestion using hitting time, In Proceedings of CIKM '08, pages 469-478, ACM, 2008.
US 2005/0055341 A1: System and method for providing search query refinements, Paul Haahr et al.
Japanese Patent Laid-Open No. 2001-202390

However, although the first conventional technique enables replacement-type query recommendation that does not depend on search engine performance for queries contained in the click log, it needs the click log to generate candidates and cannot generate any candidates when the input query is not contained in the click log. It therefore requires a click log of sufficient capacity.

The second conventional technique can recommend queries for input queries not contained in the click log. However, because it requires URLs retrieved by a search engine, the performance of the search engine strongly affects the recommendation results, and it may fail to recommend candidates that are likely to be selected.

The third conventional technique can recommend queries based on the similarity of the search results for the queries. However, because it does not use results judged by humans (which correspond to the click log in the present invention), the search results strongly affect the recommendation results, and it may fail to recommend candidates that are likely to be selected.

The present invention has been made in view of the above points, and its object is to provide a recommended query extraction device, method, and program that enable replacement-type query recommendation even when the storage capacity of the click log is small, and that can display candidates likely to be selected with priority.

To solve the above problems, the present invention (claim 1) is a recommended query extraction device for presenting, based on a query input by a user, queries from a click log that are highly likely to yield the search results the user intends, the device comprising:
query input means for acquiring the query input by the user;
click log storage means storing a click log consisting of pairs of a query input in the past and a URL accessed through that query;
bipartite graph construction means for generating a bipartite graph from the query and the URLs obtained by searching the click log storage means based on the input query;
score calculation means for calculating a score for each query included in the bipartite graph; and
recommended query output means for selecting a predetermined number of queries in descending order of score and outputting them,
wherein the bipartite graph construction means includes URL search means for, when the input query does not exist in the click log storage means, acquiring a URL corresponding to the query via a search engine and adding it to the bipartite graph.

Further, in the present invention (claim 2), the URL search means includes means for storing the URLs acquired via the search engine in storage means and, starting from the first URL, adding to the bipartite graph n−1 queries, where n is a predetermined lower limit on the number of queries to be included in the bipartite graph, and the URLs accessed from those n−1 queries.

Further, in the present invention (claim 3), the bipartite graph construction means includes means for obtaining an evaluation value of the bipartite graph by dividing the number of queries included in the bipartite graph by a predetermined number k of queries to recommend, and outputting a bipartite graph whose evaluation value is larger than a predetermined lower limit of the evaluation value.

According to the present invention, for an input query not contained in the click log, replacement-type query recommendation, which recommends queries in which part or all of the input query has been replaced, becomes possible. Replacement-type query recommendation is therefore possible even when the storage capacity of the click log is small.

In addition, when the input query is contained in the click log, the scores of the queries to recommend can be calculated using only results judged by humans, without depending on search engine performance, so candidates that are likely to be selected can be displayed first.

FIG. 1 is a configuration diagram of the query extraction device of the first conventional technique.
FIG. 2 is a diagram showing the parameters in the parameter storage unit of the first conventional technique.
FIG. 3 is a flowchart of the bipartite graph construction unit of the first conventional technique.
FIG. 4 is a configuration diagram of the query extraction device in an embodiment of the present invention.
FIG. 5 is an example of the parameter storage unit in an embodiment of the present invention.
FIG. 6 is a storage example of the click log storage unit in an embodiment of the present invention.
FIG. 7 is a flowchart of the bipartite graph construction unit in an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 4 shows the configuration of the query extraction device in an embodiment of the present invention.

The query extraction device shown in FIG. 4 comprises a query input unit 1, a bipartite graph construction unit 20, a search engine 30, a parameter storage unit 40, a click log storage unit 5, an index storage unit 60, a score calculation unit 7, and a recommended query output unit 8.

The query input unit 1, the click log storage unit 5, the score calculation unit 7, and the recommended query output unit 8 are the same as in the first conventional technique.

The query input unit 1 receives the query entered by the user.

The bipartite graph construction unit 20 receives the entered query and generates a bipartite graph from the query and the URLs obtained by searching the click log storage unit 5 based on that query. When the entered query does not exist in the click log storage unit 5, it acquires URLs using the search engine 30 and constructs the bipartite graph from them. The details of this processing are described later.

As shown in FIG. 5, the parameter storage unit 40 stores the lower limit n on the number of queries to be included in the bipartite graph, the lower limit u on the evaluation value of the bipartite graph, the number of iterations m in the score calculation, and the number k of queries to recommend. These parameters are determined in advance according to the computing resources available and the intended use at implementation time, and are stored in the parameter storage unit 40 before a query is input to the query input unit 1.
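The four parameters can be pictured as one immutable record that is fixed before any query arrives. The following is a minimal sketch: the class name `Parameters`, the validation rule, and the example values are all assumptions for illustration, since the source gives no concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    n: int    # lower limit on the number of queries in the bipartite graph
    u: float  # lower limit on the evaluation value of the bipartite graph
    m: int    # number of iterations in the score calculation
    k: int    # number of queries to recommend

    def __post_init__(self):
        # Assumed sanity check; the source does not state valid ranges.
        if min(self.n, self.m, self.k) < 1 or self.u < 0:
            raise ValueError("n, m, k must be at least 1 and u must not be negative")


params = Parameters(n=100, u=2.0, m=10, k=5)  # illustrative values only
```

Freezing the record mirrors the requirement that the parameters are decided in advance and not changed while a query is being processed.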

The index storage unit 60 holds index information for the many documents subject to search so that queries can be matched against documents efficiently, and is referenced by the search engine 30. The index information is created and updated independently of query recommendation and is assumed to have been built in advance. Its creation and updating can be realized, for example, by the method described in Document 1, "S. Brin, L. Page: The anatomy of a large-scale hypertextual Web search engine, 1998". Because the creation and updating of the index information are outside the scope of the present invention, a detailed description is omitted.

The click log storage unit 5 holds the queries that multiple past users entered in order to search for documents and the URLs those users selected from the search results obtained for those queries. For example, when a past user A entered query a and selected URL1, and then a user B entered query b and selected URL2, the click log storage unit 5 holds the pairs (query a, URL1) and (query b, URL2), as shown in FIG. 6.

The search engine 30 uses the index storage unit 60 to search for documents that contain the keywords in the query q, and outputs a list of the URLs of those documents as the search result. It can be realized, for example, by the method shown in Figure 4 of Document 1 above.
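The keyword lookup performed by the search engine can be sketched with a toy inverted index. This is not the method of Document 1; it is a minimal stand-in that assumes whitespace-separated keywords and returns URLs in alphabetical order rather than by any ranking. The documents and the names `build_index` and `search` are hypothetical.

```python
def build_index(documents):
    """Map each keyword to the set of URLs whose text contains it."""
    index = {}
    for url, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs of documents that contain every keyword of the query."""
    keywords = query.lower().split()
    if not keywords:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in keywords))
    return sorted(hits)  # sorted for determinism only; no relevance ranking


# Hypothetical document collection standing in for the index storage unit.
documents = {
    "URL1": "coffee bean type guide",
    "URL2": "bean brand list",
    "URL3": "coffee bean brand",
}
index = build_index(documents)
results = search(index, "coffee bean")
```

A real engine would also rank the results, which matters here because the construction flow tries the returned URLs in order.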

The score calculation unit 7 calculates a score for each query in the bipartite graph G other than the query q. In the present invention, the bipartite graph output from the bipartite graph construction unit 20 has an evaluation value equal to or greater than a predetermined value. Specifically, the score calculation can be realized using the technique of Document 2, "Qiaozhu Mei, Dengyong Zhou, Kenneth Church: Query suggestion using hitting time, In Proceedings of CIKM '08, pages 469-478, ACM, 2008."
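The source defers the score calculation to Document 2 and gives no formula, so the following is only a sketch of one plausible reading: a truncated hitting-time iteration over a random walk that moves from a query to a clicked URL and back to a query, repeated m times. The function name, the click-count weighting, and the sample data are assumptions. The sketch returns hitting times rather than scores; a smaller value means the walk reaches q sooner, and how such a value is converted into the score used for ranking is not stated in the source.

```python
def truncated_hitting_times(clicks, q, m):
    """Return {query: approximate expected number of walk steps needed to reach q}.

    clicks: {query: {url: click_count}}; every query must have at least one click.
    """
    by_url = {}
    for query, urls in clicks.items():
        for url, count in urls.items():
            by_url.setdefault(url, {})[query] = count

    # Query-to-query transition probabilities: query -> clicked URL -> query.
    transition = {}
    for i, urls in clicks.items():
        total_i = sum(urls.values())
        row = {}
        for url, count in urls.items():
            total_url = sum(by_url[url].values())
            for j, back in by_url[url].items():
                row[j] = row.get(j, 0.0) + (count / total_i) * (back / total_url)
        transition[i] = row

    h = {i: 0.0 for i in clicks}
    for _ in range(m):  # m iterations, as set in the parameter storage unit
        h = {
            i: 0.0 if i == q
            else 1.0 + sum(p * h[j] for j, p in transition[i].items() if j != q)
            for i in clicks
        }
    return h


# Hypothetical graph: "a" shares URL1 with q, while "b" is not connected to q.
clicks = {"q": {"URL1": 1}, "a": {"URL1": 1}, "b": {"URL2": 1}}
times = truncated_hitting_times(clicks, "q", m=3)
```

In this toy graph the query that shares a URL with q gets a smaller value than the unconnected one, whose value simply grows with the number of iterations.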

The recommended query output unit 8 selects k queries based on the scores obtained by the score calculation unit 7, where k is the number of queries to recommend, and outputs them as recommended queries. Specifically, it sorts the set Q of scored queries by score, selects k queries in descending order of the likelihood that the user will select them, and outputs them as recommended queries. This can be realized using the technique of Document 2.
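Selecting the k best-scored queries is a plain top-k operation. The sketch below reuses the two example scores given earlier for the input query "bean type" and adds a third, hypothetical entry ("bean brand" at 1.2) purely for illustration; the function name `top_k` is also an assumption.

```python
import heapq


def top_k(scores, k):
    """Return the k queries with the highest scores, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [query for query, _ in best]


scores = {
    "bean knowledge": 0.9,
    "coffee bean type": 1.5,
    "bean brand": 1.2,  # hypothetical value, not from the source
}
recommended = top_k(scores, k=2)
```

`heapq.nlargest` avoids sorting the whole candidate set when k is small relative to the number of scored queries.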

Next, the processing of the bipartite graph construction unit 20 is described in detail.

FIG. 7 is a flowchart of the bipartite graph construction unit in an embodiment of the present invention.

Step 100) The bipartite graph construction unit 20 acquires the query q from the query input unit 1 and creates a bipartite graph G containing only q.

Step 110) The click log storage unit 5 is referenced based on the query q to determine whether the query q exists in it.

Step 120) If the query q exists, the process proceeds to step 130; if not, it proceeds to step 140.

Step 130) Starting from the query q and following the access relations between queries and URLs, n−1 queries and the URLs accessed from them are added to the bipartite graph G in memory (not shown), and the process ends.

Step 140) When the query q is not stored in the click log storage unit 5, the URLs corresponding to the query q are received via the search engine 30, and the URL counter is set to i = 1. The acquired URLs are stored in memory (not shown).

Step 150) Taking the i-th URL in the search results from the search engine 30 as the starting point and following the access relations between queries and URLs, n−1 queries and the URLs accessed from those queries are retrieved from the click log storage unit 5 and added to the bipartite graph G. Here, n is the lower limit on the number of queries to be included in the bipartite graph, stored in the parameter storage unit 40.

Step 160) The evaluation value of the bipartite graph G is calculated as
evaluation value = |Q| / k
where |Q| is the number of queries included in the bipartite graph G and k is the number of queries to recommend, as specified by the parameter stored in the parameter storage unit 40.

Step 170) It is determined whether the evaluation value obtained in step 160 is equal to or greater than the lower limit u of the evaluation value of the graph G stored in the parameter storage unit 40. If it is equal to or greater than u, the process ends; if it is smaller than u, the process proceeds to step 180.

Step 180) The counter is set to i = i + 1, and the process returns to step 150.
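Steps 100 to 180 can be sketched as a single function. This is an illustration under stated assumptions, not the patented implementation: the traversal order (breadth-first, alphabetical within a level), the edge added between q and each search-result URL, and stopping when the search results run out are all choices the source leaves open. The names `build_bipartite_graph`, `expand`, and `fake_search`, and the sample data, are hypothetical.

```python
from collections import deque


def build_bipartite_graph(q, click_log, search, n, k, u):
    """Sketch of steps 100-180. `search` is a hypothetical callable: query -> URL list."""
    urls_by_query, queries_by_url = {}, {}
    for query, url in click_log:
        urls_by_query.setdefault(query, set()).add(url)
        queries_by_url.setdefault(url, set()).add(query)

    queries, edges = {q}, set()  # step 100: G contains only q

    def expand(start_urls):
        """Add up to n-1 queries reachable from start_urls, plus their clicked URLs."""
        queue, seen, added = deque(start_urls), set(), 0
        while queue and added < n - 1:
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            for query in sorted(queries_by_url.get(url, ())):
                if added == n - 1:
                    break
                if query in queries:
                    continue
                queries.add(query)
                added += 1
                for clicked in sorted(urls_by_query[query]):
                    edges.add((query, clicked))
                    queue.append(clicked)

    if q in urls_by_query:  # steps 110-120: q is in the click log
        start = sorted(urls_by_query[q])
        edges.update((q, url) for url in start)
        expand(start)  # step 130
        return queries, edges

    for url in search(q):  # step 140; each pass is i = 1, 2, ... (step 180)
        edges.add((q, url))  # assumed link between q and the i-th result
        expand([url])  # step 150
        if len(queries) / k >= u:  # steps 160-170: evaluation value |Q| / k
            break
    return queries, edges  # assumption: also stop when the results run out


click_log = [
    ("coffee bean type", "URL1"),
    ("bean brand", "URL1"),
    ("bean brand", "URL2"),
    ("tea leaf", "URL3"),
]


def fake_search(query):
    return ["URL9", "URL1", "URL3"]  # stand-in for the search engine 30


queries, edges = build_bipartite_graph("bean type", click_log, fake_search, n=3, k=1, u=3)
```

In this run the first result, URL9, appears in no click, so the evaluation value stays at 1 and the loop moves on to URL1, which brings in two queries and lifts the value to 3, ending the loop before URL3 is tried.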

In the present invention, even when the input query does not exist in the click log storage unit 5, a bipartite graph can be constructed by acquiring a list of URLs related to the input query via the search engine 30 in step 140 and the subsequent steps. Because a click log is generated when a human selects URLs from the search results for an input query, the URLs acquired in step 140 can be regarded as click log candidates for the input query and can therefore substitute for the click log. As a result, a bipartite graph can be constructed even when the click log that the query extraction device can search is small (for example, covering only the most recent month) and the input query is therefore likely to be absent from it. Once a bipartite graph has been constructed, query recommendation becomes possible by performing the processing of the score calculation unit 7 in the same way as in the conventional technique.

On the other hand, when the query exists in the click log storage unit 5, the graph is constructed from the click log alone, so human judgment is reflected and queries that are likely to be selected can be recommended.

Furthermore, by calculating the evaluation value and outputting to the score calculation unit 7 only a bipartite graph whose evaluation value is at or above the predetermined value, candidates that are likely to be selected can be output with priority.

The operations of the query input unit 1, the bipartite graph construction unit 20, the score calculation unit 7, and the recommended query output unit 8 shown in FIG. 4 can be built as a program, which can be installed and executed on a computer used as the query extraction device or distributed via a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
1 Query input unit
2, 20 Bipartite graph construction unit
4, 40 Parameter storage unit
5 Click log storage unit
7 Score calculation unit
8 Recommended query output unit
30 Search engine
60 Index storage unit

Claims

A recommended query extraction device for presenting, based on a query input by a user, queries from a click log that are highly likely to yield the search results the user intends, comprising:
query input means for acquiring the query input by the user;
click log storage means storing a click log consisting of pairs of a query input in the past and a URL accessed through that query;
bipartite graph construction means for generating a bipartite graph from the query and the URLs obtained by searching the click log storage means based on the input query;
score calculation means for calculating a score for each query included in the bipartite graph; and
recommended query output means for selecting a predetermined number of queries in descending order of score and outputting them,
wherein the bipartite graph construction means includes URL search means for, when the input query does not exist in the click log storage means, acquiring a URL corresponding to the query via a search engine and adding it to the bipartite graph.

The recommended query extraction device according to claim 1, wherein the URL search means includes means for storing the URLs acquired via the search engine in storage means and, starting from the first URL, adding to the bipartite graph n−1 queries, where n is a predetermined lower limit on the number of queries to be included in the bipartite graph, and the URLs accessed from those n−1 queries.

The recommended query extraction device according to claim 1, wherein the bipartite graph construction means includes means for obtaining an evaluation value of the bipartite graph by dividing the number of queries included in the bipartite graph by a predetermined number k of queries to recommend, and outputting a bipartite graph whose evaluation value is larger than a predetermined lower limit of the evaluation value.

A recommended query extraction method for presenting, based on a query input by a user, queries from a click log that are highly likely to yield the search results the user intends, wherein, in a device having:
query input means for acquiring the query input by the user;
click log storage means storing a click log consisting of pairs of a query input in the past and a URL accessed through that query;
bipartite graph construction means for generating a bipartite graph from the query and the URLs obtained by searching the click log storage means based on the input query;
score calculation means for calculating a score for each query included in the bipartite graph; and
recommended query output means for selecting a predetermined number of queries in descending order of score and outputting them,
the bipartite graph construction means performs a URL search step of, when the input query does not exist in the click log storage means, acquiring a URL corresponding to the query via a search engine and adding it to the bipartite graph.

The recommended query extraction method according to claim 4, wherein, in the URL search step, the URLs acquired via the search engine are stored in storage means and, starting from the first URL, n−1 queries, where n is a predetermined lower limit on the number of queries to be included in the bipartite graph, and the URLs accessed from those n−1 queries are added to the bipartite graph.

The recommended query extraction method according to claim 4, further comprising a step in which the bipartite graph construction means obtains an evaluation value of the bipartite graph by dividing the number of queries included in the bipartite graph by a predetermined number k of queries to recommend, and outputs a bipartite graph whose evaluation value is larger than a predetermined lower limit of the evaluation value.

A recommended query extraction program for causing a computer to function as each means of the recommended query extraction device according to claim 1.