JP5519406B2

JP5519406B2 - Server apparatus, genre score calculation method, and program

Info

Publication number: JP5519406B2
Application number: JP2010122830A
Authority: JP
Inventors: 隼赤塚; 健吉村; 拓藤本; 大祐鳥居
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2010-05-28
Filing date: 2010-05-28
Publication date: 2014-06-11
Anticipated expiration: 2030-05-28
Also published as: JP2011248730A

Description

The present invention relates to a technique for estimating the genre of a search query.

Search services are widely used on networks. To obtain desired information efficiently, a user of a search service must compose a more appropriate query, for example by combining multiple words. To improve user convenience, techniques that provide recommended queries for an input query are known (for example, Patent Documents 1 and 2).

Patent Document 1 discloses an information search server. In Patent Document 1, the information search server has a database containing users' operation history information (user ID, operation target, operation content, number of operations, and evaluation value). For input request information (the name of a product or genre to be searched), the server extracts a group of web pages, classifies the extracted pages into categories, and transmits them to the user (FIG. 7). Patent Document 2 discloses a technique for calculating scores related to search queries. Specifically, Patent Document 2 links multiple queries via URLs, calculates a score for each combination of queries, and recommends other queries for a given query based on the score (paragraph 0077, FIG. 11).

Patent Document 1: JP 2004-326537 A
Patent Document 2: JP 2009-252070 A

Neither of the techniques described in Patent Documents 1 and 2 can calculate a genre score using the genre scores of other queries.
In contrast, the present invention provides a technique for calculating a genre score using the genre scores of other queries.

The present invention provides a server device comprising: first acquisition means for acquiring a graph in which a plurality of nodes are connected by edges, the nodes including query nodes, each indicating a query used for a search, and relay nodes, each indicating an item related to the search other than the query; second acquisition means for acquiring the genre scores and queries corresponding to a specified genre identifier from a genre dictionary that contains queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre; and calculation means for calculating the genre scores of a plurality of queries. The calculation means takes as a base node a node identified from among target nodes, namely the query nodes in the graph acquired by the first acquisition means whose queries have no genre score in the genre dictionary, and calculates the genre score of the base node's query using, from among the genre scores acquired by the second acquisition means, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes. It repeats this process, updating the base node each time, until an end condition is satisfied.
According to this server device, a genre score can be calculated using the genre scores of other queries.
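The repeated base-node update described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the update rule (an unweighted mean of the known scores of query nodes that share a relay node with the base node), the convergence-based end condition, and the names `propagate_genre_scores`, `max_iter`, and `tol` are all hypothetical.

```python
from collections import defaultdict


def propagate_genre_scores(edges, seed_scores, max_iter=100, tol=1e-9):
    """Hypothetical sketch of iterative genre-score propagation.

    edges:       iterable of (query, relay) pairs, e.g. (query, URL).
    seed_scores: {query: genre score} from the genre dictionary for one
                 genre identifier; these scores are never modified.
    """
    relays_of = defaultdict(set)   # query -> relay nodes it is linked to
    queries_of = defaultdict(set)  # relay node -> queries linked to it
    for query, relay in edges:
        relays_of[query].add(relay)
        queries_of[relay].add(query)

    scores = dict(seed_scores)
    # Target nodes: query nodes in the graph with no dictionary score.
    targets = sorted(q for q in relays_of if q not in seed_scores)

    for _ in range(max_iter):
        max_delta = 0.0
        for base in targets:  # each target node serves as the base node in turn
            # Other query nodes reachable from the base node via one relay node.
            neighbours = {other for relay in relays_of[base]
                          for other in queries_of[relay] if other != base}
            known = [scores[n] for n in neighbours if n in scores]
            if not known:
                continue  # nothing to propagate yet; retry on the next sweep
            new_score = sum(known) / len(known)  # assumed: unweighted mean
            max_delta = max(max_delta, abs(new_score - scores.get(base, 0.0)))
            scores[base] = new_score
        if max_delta < tol:  # assumed end condition: scores have converged
            break
    return scores


# Example on a made-up graph: q1 and q4 are seeded, q2 and q3 are targets.
edges = [("q1", "s1"), ("q2", "s1"), ("q2", "s2"),
         ("q3", "s2"), ("q3", "s3"), ("q4", "s3")]
scores = propagate_genre_scores(edges, {"q1": 1.0, "q4": 0.0})
# q2 converges to about 0.667 and q3 to about 0.333
```

Updating scores in place during a sweep (rather than from a frozen copy) lets newly scored queries feed later base nodes in the same sweep; either choice is compatible with the "repeat until an end condition" description.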

In a preferred aspect, the plurality of nodes may include: first location nodes, each indicating first location information that uses a hierarchical structure to indicate the location of a document viewed by the user as a result of the search; second location nodes, each indicating second location information that is higher in the hierarchy than the first location information; and user nodes, each indicating the name of the user who performed the search. The relay nodes are the first location nodes, and the process may calculate the genre score of the base node's query using the genre scores of queries indicated by other query nodes connected by edges either to the second location nodes that are connected by edges to the relay nodes, or to the user nodes.
According to this server device, the genre score can be calculated using the genre scores of query nodes connected via nodes higher in the hierarchy than the first location nodes.
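To make the two connection paths of this aspect concrete, the following hypothetical Python sketch collects the other queries linked to a base query either through a shared domain (query, URL, domain, URL, query) or through a shared user (query, user, query). The tuple layout of `rows` and the name `neighbour_queries` are assumptions; the domain is assumed to have been derived from the URL beforehand.

```python
from collections import defaultdict


def neighbour_queries(base, rows):
    """Hypothetical sketch: queries linked to `base` via a shared domain
    (query -> URL -> domain -> URL -> query) or a shared user
    (query -> user -> query).

    rows: (user, query, url, domain) tuples derived from the search log.
    """
    urls_of_query, domain_of_url = defaultdict(set), {}
    urls_in_domain, queries_of_url = defaultdict(set), defaultdict(set)
    users_of_query, queries_of_user = defaultdict(set), defaultdict(set)
    for user, query, url, domain in rows:
        urls_of_query[query].add(url)
        domain_of_url[url] = domain
        urls_in_domain[domain].add(url)
        queries_of_url[url].add(query)
        users_of_query[query].add(user)
        queries_of_user[user].add(query)

    found = set()
    # Path 1: from the relay (URL) node up to its domain node and back down.
    for url in urls_of_query[base]:
        for sibling_url in urls_in_domain[domain_of_url[url]]:
            found |= queries_of_url[sibling_url]
    # Path 2: through the user node.
    for user in users_of_query[base]:
        found |= queries_of_user[user]
    found.discard(base)  # the base query is not its own neighbour
    return found


# Made-up log rows for illustration.
rows = [
    ("u1", "q1", "http://a.example/1", "http://a.example"),
    ("u2", "q2", "http://a.example/2", "http://a.example"),
    ("u1", "q3", "http://b.example/1", "http://b.example"),
    ("u3", "q4", "http://c.example/1", "http://c.example"),
]
# q2 shares domain a.example with q1; q3 shares user u1 with q1.
print(sorted(neighbour_queries("q1", rows)))
```

The resulting set could serve as the neighbour set in an iterative update such as a mean over known scores; which combination rule the embodiment uses is not stated in this passage.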

In another preferred aspect, the plurality of nodes may include first location nodes, each indicating first location information that uses a hierarchical structure to indicate the location of a document viewed by the user as a result of the search, and second location nodes, each indicating second location information that is higher in the hierarchy than the first location information. The relay nodes are the first location nodes; the calculation means calculates, for each item of second location information, a parameter indicating the correlation between that second location information and the genre; and the process may calculate the genre score of the base node's query using the parameter and the genre scores of queries indicated by other query nodes related to the second location nodes that are related to the base node.
According to this server device, the genre score can be calculated using a parameter indicating the correlation between the second location information and the genre.
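One way such a per-domain parameter could be computed and used is sketched below. Both formulas are hypothetical: the parameter p_d is taken as the mean dictionary score of the seeded queries linked to domain d, and the base score as the average over the base query's domains of p_d times the mean score of the other scored queries in d. The source does not define either formula at this point.

```python
from collections import defaultdict


def domain_parameters(query_domains, seed_scores):
    """Hypothetical parameter p_d for each domain d: mean dictionary genre
    score of the seeded queries whose viewed documents lie in d.

    query_domains: distinct (query, domain) pairs.
    """
    per_domain = defaultdict(list)
    for query, domain in query_domains:
        if query in seed_scores:
            per_domain[domain].append(seed_scores[query])
    return {d: sum(v) / len(v) for d, v in per_domain.items()}


def score_base(base, query_domains, scores, params):
    """Assumed update: average over the base query's domains of
    p_d * (mean score of the other scored queries in d)."""
    contributions = []
    for domain in {d for q, d in query_domains if q == base}:
        others = [scores[q] for q, d in query_domains
                  if d == domain and q != base and q in scores]
        if others and domain in params:
            contributions.append(params[domain] * sum(others) / len(others))
    # None signals that no scored neighbour exists yet for this base query.
    return sum(contributions) / len(contributions) if contributions else None


pairs = [("q1", "d1"), ("q2", "d1"), ("q2", "d2"), ("q3", "d2"), ("q4", "d2")]
seeds = {"q1": 1.0, "q3": 0.5, "q4": 0.0}
params = domain_parameters(pairs, seeds)  # {"d1": 1.0, "d2": 0.25}
print(score_base("q2", pairs, seeds, params))  # (1.0 * 1.0 + 0.25 * 0.25) / 2 = 0.53125
```

With this choice, a domain whose seeded queries are weakly tied to the genre (small p_d) contributes less to the base score, which is the intuition behind a domain/genre correlation parameter.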

In yet another preferred aspect, the server device may further comprise: third acquisition means for acquiring a search log in which queries used for searches and their search counts (the number of times each search was performed) are accumulated; fourth acquisition means for inputting the queries contained in the search log to a search server, which outputs search results when a query is input, and acquiring the search results for those queries; and estimation means for estimating the weights of the edges using the search log acquired by the third acquisition means and the search results acquired by the fourth acquisition means. The process may then calculate the genre score of the base node's query using the estimated weights and the genre scores of queries indicated by other query nodes related to the second location nodes that are related to the base node.
According to this server device, the genre score can be calculated from a search log in which queries used for searches and their search counts are accumulated.
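The estimation means combines two inputs: search counts from the log and ranked results from the search server. The sketch below shows one hypothetical way to combine them, treating each (query, URL) edge weight as an expected click count under an assumed position model in which the chance of a click is proportional to 1/rank. The model and the name `estimate_edge_weights` are illustrative assumptions, not the estimation method of the embodiment.

```python
def estimate_edge_weights(search_counts, search_results):
    """Hypothetical estimate of (query, URL) edge weights as expected clicks.

    search_counts:  {query: number of times searched}, from the search log.
    search_results: {query: [URL at rank 1, URL at rank 2, ...]}, obtained
                    by submitting each query to the search server.
    Assumes the chance of a click is proportional to 1/rank.
    """
    weights = {}
    for query, count in search_counts.items():
        urls = search_results.get(query, [])
        # Normalise so the click probabilities over the result list sum to 1.
        norm = sum(1.0 / rank for rank in range(1, len(urls) + 1))
        for rank, url in enumerate(urls, start=1):
            weights[(query, url)] = count * (1.0 / rank) / norm
    return weights


w = estimate_edge_weights({"q1": 30}, {"q1": ["s1", "s2", "s4"]})
# The 30 searches are split as 180/11, 90/11 and 60/11 across s1, s2, s4.
```

Because the weights for each query sum to its search count, frequently searched queries carry proportionally more weight when their neighbours' genre scores are combined.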

The present invention further provides a genre score calculation method comprising the steps of: storing, in first storage means, a search log containing a plurality of items, including queries used for searches and first location information indicating, in a hierarchical structure, the locations of documents viewed as a result of the searches; extracting, as nodes, the queries and items other than the queries from among the items contained in the search log stored in the first storage means; generating a graph in which query nodes indicating the extracted queries are connected by edges to relay nodes indicating the extracted other items; acquiring the genre scores and queries corresponding to a specified genre identifier from second storage means storing a genre dictionary that contains queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre; and calculating the genre scores of a plurality of queries by taking as a base node a node identified from among target nodes, namely query nodes in the generated graph that satisfy a predetermined condition, calculating the genre score of the base node's query using, from among the acquired genre scores, the genre scores of queries indicated by other query nodes connected to the base node by edges via the relay nodes, and repeating this process, updating the base node each time, until an end condition is satisfied.
According to this method, a genre score can be calculated using the genre scores of other queries.

In a preferred aspect, the plurality of nodes may include: first location nodes, each indicating first location information that uses a hierarchical structure to indicate the location of a document viewed by the user as a result of the search; second location nodes, each indicating second location information that is higher in the hierarchy than the first location information; and user nodes, each indicating the name of the user who performed the search. The relay nodes are the first location nodes, and the process may calculate the genre score of the base node's query using the genre scores of queries indicated by other query nodes connected by edges either to the second location nodes that are connected by edges to the relay nodes, or to the user nodes.
According to this method, the genre score can be calculated using the genre scores of query nodes connected via nodes higher in the hierarchy than the first location nodes.

In another preferred aspect, the plurality of nodes may include first location nodes, each indicating first location information that uses a hierarchical structure to indicate the location of a document viewed by the user as a result of the search, and second location nodes, each indicating second location information that is higher in the hierarchy than the first location information; the relay nodes are the first location nodes; the method may further comprise a step of calculating, for each item of second location information, a parameter indicating the correlation between that second location information and the genre; and the process may calculate the genre score of the base node's query using the parameter and the genre scores of queries indicated by other query nodes related to the second location nodes that are related to the base node.
According to this method, the genre score can be calculated using a parameter indicating the correlation between the second location information and the genre.

Furthermore, the present invention provides a genre score calculation method comprising the steps of: storing, in first storage means, a search log containing queries used for searches and the number of times each search was performed; inputting the queries contained in the search log stored in the first storage means to a search server, which outputs search results when a query is input, and acquiring the search results for those queries; extracting, as nodes, the queries and items other than the queries from among the items contained in the search log stored in the first storage means; generating a graph in which query nodes indicating the extracted queries are connected by edges to relay nodes indicating the extracted other items; estimating the weights of the edges using the search log stored in the first storage means and the acquired search results; acquiring the genre scores and queries corresponding to a specified genre identifier from second storage means storing a genre dictionary that contains queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre; and calculating the genre scores of a plurality of queries by taking as a base node a node identified from among target nodes, namely query nodes in the generated graph that satisfy a predetermined condition, calculating the genre score of the base node's query using the estimated weights and, from among the acquired genre scores, the genre scores of queries indicated by other query nodes connected to the base node by edges via the relay nodes, and repeating this process, updating the base node each time, until an end condition is satisfied.
According to this method, the genre score can be calculated from a search log in which queries used for searches and their search counts are accumulated.

Furthermore, the present invention provides a program for causing a computer to execute the steps of: storing, in first storage means, a search log containing a plurality of items, including queries used for searches and first location information indicating, in a hierarchical structure, the locations of documents viewed as a result of the searches; extracting, as nodes, the queries and items other than the queries from among the items contained in the search log stored in the first storage means; generating a graph in which query nodes indicating the extracted queries are connected by edges to relay nodes indicating the extracted other items; acquiring the genre scores and queries corresponding to a specified genre identifier from second storage means storing a genre dictionary that contains queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre; and calculating the genre scores of a plurality of queries by taking as a base node a node identified from among target nodes, namely query nodes in the generated graph that satisfy a predetermined condition, calculating the genre score of the base node's query using, from among the acquired genre scores, the genre scores of queries indicated by other query nodes connected to the base node by edges via the relay nodes, and repeating this process, updating the base node each time, until an end condition is satisfied.
According to this program, a genre score can be calculated using the genre scores of other queries.

Furthermore, the present invention provides a program for causing a computer to execute the steps of: storing, in first storage means, a search log containing queries used for searches and the number of times each search was performed; inputting the queries contained in the search log stored in the first storage means to a search server, which outputs search results when a query is input, and acquiring the search results for those queries; extracting, as nodes, the queries and items other than the queries from among the items contained in the search log stored in the first storage means; generating a graph in which query nodes indicating the extracted queries are connected by edges to relay nodes indicating the extracted other items; estimating the weights of the edges using the search log stored in the first storage means and the acquired search results; acquiring the genre scores and queries corresponding to a specified genre identifier from second storage means storing a genre dictionary that contains queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre; and calculating the genre scores of a plurality of queries by taking as a base node a node identified from among target nodes, namely query nodes in the generated graph that satisfy a predetermined condition, calculating the genre score of the base node's query using the estimated weights and, from among the acquired genre scores, the genre scores of queries indicated by other query nodes connected to the base node by edges via the relay nodes, and repeating this process, updating the base node each time, until an end condition is satisfied.
According to this program, a genre score can be calculated using the genre scores of other queries.

FIG. 1 is a diagram showing the configuration of a search system 1 according to an embodiment.
FIG. 2 is a diagram illustrating a search log.
FIG. 3 is a diagram illustrating a graph.
FIG. 4 is a diagram explaining the graph generation process.
FIG. 5 is a diagram illustrating a genre dictionary.
FIG. 6 is a diagram showing the hardware configuration of the genre server 10.
FIG. 7 is a flowchart showing the genre score calculation process.
FIG. 8 is a diagram showing the graph acquired in this example.
FIG. 9 is a diagram showing the accompanying data of the graph of FIG. 8.
FIG. 10 is a flowchart showing an example of the genre score calculation process.
FIG. 11 is a diagram illustrating output data from the genre server 10.
FIG. 12 is a diagram showing a graph as a comparative example.
FIG. 13 is a diagram illustrating a graph according to Modification 1.
FIG. 14 is a diagram showing the data accompanying the graph of FIG. 13.
FIG. 15 is a flowchart showing an operation according to Modification 2.
FIG. 16 is a diagram illustrating a graph according to Modification 2.
FIG. 17 is a diagram showing the data accompanying the graph of FIG. 16.
FIG. 18 is a diagram illustrating a graph according to Modification 3.
FIG. 19 is a diagram showing the configuration of the search system 1 according to Modification 4.
FIG. 20 is a diagram illustrating a search log according to Modification 4.
FIG. 21 is a diagram illustrating search results.
FIG. 22 is a diagram illustrating a table containing the calculated estimated values.

1. Configuration
FIG. 1 is a diagram showing the configuration of a search system 1 according to an embodiment. The search system 1 includes a genre server 10, a search server 20, and a client 30. In the search system 1, when a genre identifier is input from the client 30, the genre server 10 outputs the genre scores of the queries corresponding to that genre identifier.

The genre server 10 has, as functional elements, a storage unit 11, a storage unit 13, a generation unit 12 (an example of the third acquisition means, the fourth acquisition means, and the estimation means), a calling unit 14, a calculation unit 15 (an example of the first acquisition means, the second acquisition means, and the calculation means), and an output unit 16. The storage unit 11 stores a search log, that is, a log in which data for a plurality of items, including the search query and the identifier of the document viewed as a result of the search, are accumulated.

FIG. 2 is a diagram illustrating a search log. In this example, the search log includes a plurality of data sets (records), each containing data divided into a plurality of items: user name, query, URL (Uniform Resource Locator), timestamp, rank, first-ranked URL, second-ranked URL, and third-ranked URL. The documents searched are documents on the Internet. Here, a “document” means a file of any format, such as an HTML (HyperText Markup Language) file, an audio file, or an image file. The location of a document is indicated by a URL (an example of the first location information), which is location information that uses a hierarchical structure. The user name indicates the user who performed the search. The query indicates the word(s) used to search for documents. The URL indicates the location of the document that the user viewed (selected) from among the documents returned by the search. The timestamp indicates the time at which the search was performed. The rank indicates the position of the viewed document in the search results. The first-ranked URL indicates the location of the document ranked first in the search results; the same applies to the second- and third-ranked URLs. In the example of FIG. 2, the data set in the top row indicates that the user with the user name “u1” performed a search using the query “q1” at 00:00:38 on January 25, 2010. In this search, the documents with URLs “s1”, “s2”, and “s4” were presented as the first through third results, and the user viewed the document at URL “s1” (the first-ranked result).
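As a rough illustration of the record layout described above, the following Python sketch models one data set of the search log and parses the top-row example. The tab-separated line format, the timestamp format, and the names `SearchLogRecord` and `parse_record` are assumptions for illustration; the source does not specify how the log is serialized.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchLogRecord:
    user: str            # user who performed the search
    query: str           # word(s) used for the search
    url: str             # URL of the document the user viewed
    timestamp: datetime  # time at which the search was performed
    rank: int            # rank of the viewed document in the results
    top_urls: tuple      # URLs ranked 1st, 2nd and 3rd


def parse_record(line):
    """Parse one tab-separated log line (hypothetical format)."""
    user, query, url, ts, rank, *top = line.rstrip("\n").split("\t")
    return SearchLogRecord(
        user=user,
        query=query,
        url=url,
        timestamp=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        rank=int(rank),
        top_urls=tuple(top),
    )


# Top-row example: u1 searched q1 and viewed s1, the first-ranked result.
rec = parse_record("u1\tq1\ts1\t2010-01-25 00:00:38\t1\ts1\ts2\ts4")
```

The viewed URL always equals `top_urls[rank - 1]` in this example because the user chose one of the top three results; a log in which users click lower-ranked documents would need more than three ranked URLs to keep that invariant.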

Refer again to FIG. 1. The generation unit 12 generates a graph from the search log stored in the storage unit 11. Here, a “graph” means information that represents the connection relationships among a plurality of nodes by edges (links), or a visual representation of such information.

FIG. 3 is a diagram illustrating a graph. Circles indicate nodes and solid lines indicate edges. In this example, queries and domains (an example of the second location information) are used as nodes. Hereinafter, a node for a query is called a “query node” and a node for a domain is called a “domain node” (an example of the second location node); other node types are named in the same way. Viewed from the domain side, for example, query nodes q1 and q2 are connected to domain node d1, and query node q3 is connected to domain node d2. Viewed from the query side, domain nodes d2, d3, and d5 are connected to query node q3. From this graph it can be read that the search log records a user who entered query q1 and viewed a document belonging to domain d1.

FIG. 4 is a diagram explaining the graph generation process, using as an example a graph generated from the search log shown in FIG. 2. The upper table in FIG. 4 corresponds to the search log of FIG. 2. The graph is generated as follows. The generation unit 12 reads the search log from the storage unit 11 and extracts nodes from it. Here, the user name, query, URL, and domain are extracted as nodes. These items are arranged hierarchically, from lowest to highest: user name, query, URL, and domain. Which items are extracted as nodes, and the hierarchical structure of the nodes, are determined in advance. The domain is extracted from the URL. A URL describes a hierarchical structure in a predetermined format, such as “aaa://bbb.ccc.ddd/eee/fff”. The domain is the part of the URL that is higher in this hierarchy, specifically the character string from the beginning of the URL up to (but not including) the first “/” following “://”. In this example, “aaa://bbb.ccc.ddd” is the domain. Because the domain is a concept higher in the hierarchy than the URL, a single URL never yields more than one domain (that is, no single URL corresponds to multiple domains). The generation unit 12 generates edges for each data set. In the example of FIG. 4, the first data set contains the user name “u1”, the query “q1”, and the URL “s1”, from which the domain “d1” is extracted. For the first data set, the generation unit 12 connects nodes u1, q1, s1, and d1 with edges. Edges are placed only between nodes in adjacent levels of the hierarchy. For example, a user node and a query node are in adjacent levels, so they are connected by an edge. A user node and a URL node (an example of the first location node) are not in adjacent levels (a query node lies between them), so they are not directly connected by an edge. The second data set in FIG. 4 contains the user name “u1”, the query “q1”, and the URL “s2”, from which the domain “d2” is extracted. For the second data set, the generation unit 12 connects nodes u1, q1, s2, and d2 with edges. The nodes of every remaining data set are connected by edges in the same way.
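The domain extraction rule and the adjacent-level edge rule described above can be sketched in Python as follows. Node types are kept as tags so that, for example, a user name cannot collide with an identical query string. The names `domain_of` and `build_graph` and the (user, query, url) input layout are illustrative assumptions; the sketch also assumes every URL contains “://”.

```python
def domain_of(url):
    """Prefix of the URL up to (not including) the first '/' after '://'.

    Assumes the URL contains '://'; str.index raises ValueError otherwise.
    """
    host_start = url.index("://") + 3
    slash = url.find("/", host_start)
    return url if slash == -1 else url[:slash]


def build_graph(records):
    """records: (user, query, url) triples from the search log.

    Returns a set of undirected edges; each edge is a frozenset of two
    (node type, value) pairs. Only adjacent hierarchy levels are linked:
    user - query - URL - domain.
    """
    edges = set()
    for user, query, url in records:
        chain = [("user", user), ("query", query),
                 ("url", url), ("domain", domain_of(url))]
        for lower, upper in zip(chain, chain[1:]):
            edges.add(frozenset((lower, upper)))  # duplicates collapse in the set
    return edges


records = [("u1", "q1", "http://d1.example/s1"),
           ("u1", "q1", "http://d2.example/s2")]
edges = build_graph(records)
# 5 edges: u1-q1 (shared), q1-s1, s1-d1, q1-s2, s2-d2
```

Using a set of frozensets makes repeated (user, query) pairs across data sets collapse into a single edge; if edge multiplicity mattered (for example, as a weight), a `collections.Counter` keyed the same way would be a natural alternative.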

The generation unit 12 also generates, from the search log, accompanying data attached to the graph. The accompanying data includes data used to calculate the genre scores. Details of the accompanying data are described later.

Refer again to FIG. 1. The storage unit 13 stores a genre dictionary, a database indicating the relationships between queries and genres.

FIG. 5 is a diagram illustrating a genre dictionary. The genre dictionary includes a plurality of data sets, each containing data divided into a plurality of items: query, genre identifier, genre, and genre score. The genre indicates the classification or type of a document, and the genre identifier identifies the genre. The genre score indicates the degree of relevance between the query and the genre. In this example, genre scores range from 0.00 (lowest) to 1.00 (highest); a genre score of 1.00, for example, means the query is highly relevant to the genre. The first data set in FIG. 5 indicates that the query “q1” is related to the genre “Comedy” (genre identifier “1567”) with a genre score of 1.00. Genre scores are calculated by PageRank, the HITS algorithm, SALSA, TrustRank, or another known algorithm. The calculation may be performed by the genre server 10 (calculation unit 15) or by a device other than the genre server 10, such as the search server 20.

Refer to FIG. 1 again. The calling unit 14 receives a genre identifier as an argument from the client 30. The calling unit 14 extracts, from the queries in the genre dictionary stored in the storage unit 13, the queries corresponding to the received genre identifier, and outputs them to the calculation unit 15.
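The lookup performed by the calling unit 14 can be sketched as follows. This is a minimal illustration that assumes the genre dictionary is held in memory as a list of records; the class and function names are hypothetical, and only the first record (q1, genre “1567”, Comedy, 1.00) comes from FIG. 5. The other records are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """One data set of the genre dictionary (FIG. 5 layout); names are hypothetical."""
    query: str
    genre_id: str
    genre: str
    score: float  # 0.00 (lowest) to 1.00 (highest relevance)


def extract_queries(dictionary, genre_id):
    """Return {query: genre score} for every entry carrying the requested genre identifier."""
    return {e.query: e.score for e in dictionary if e.genre_id == genre_id}


genre_dictionary = [
    DictEntry("q1", "1567", "Comedy", 1.00),  # first data set of FIG. 5
    DictEntry("q3", "1567", "Comedy", 0.80),  # hypothetical value
    DictEntry("q9", "2001", "Sports", 0.60),  # hypothetical entry of another genre
]
print(extract_queries(genre_dictionary, "1567"))  # {'q1': 1.0, 'q3': 0.8}
```

A linear scan keeps the sketch short; a real dictionary would more likely be indexed by genre identifier.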

The calculation unit 15 calculates genre scores for the queries input from the calling unit 14. The output unit 16 outputs the genre scores calculated by the calculation unit 15 to the client 30.

FIG. 6 is a diagram illustrating the hardware configuration of the genre server 10. The genre server 10 includes a control unit 110, a storage unit 120, an input unit 130, a display unit 140, and a communication unit 150. The control unit 110 controls the other components and includes a CPU (Central Processing Unit) 111, a RAM (Random Access Memory) 112, and a ROM (Read Only Memory) 113. The CPU 111 is a device that performs various operations. The RAM 112 is a storage device that serves as a work area when the CPU 111 executes programs. The ROM 113 is a storage device that stores programs and data. The storage unit 120 includes a nonvolatile storage device such as a built-in flash memory, an HDD (Hard Disk Drive), or a removable memory card, and stores various programs and data. The functions shown in FIG. 1 are realized by the control unit 110 executing the genre score calculation program stored in the storage unit 120. Functions that store information, such as the storage units 11 and 13, reserve a storage area in the RAM 112 or the storage unit 120 and store the information in that area.

The input unit 130 inputs information to the control unit 110 and includes input devices such as a keyboard and a mouse. The display unit 140 displays information and includes a display device such as an LCD (Liquid Crystal Display) or an EL display. The communication unit 150 communicates via a network.

The control unit 110 executing the genre score calculation program is an example of the generation unit 12, the calling unit 14, and the calculation unit 15. The communication unit 150 is an example of the output unit 16. The storage unit 120 operating together with the control unit 110 is an example of the storage units 11 and 13.

The search server 20 outputs search results in response to search requests from other devices such as the client 30. A search request includes a query. A search result includes the URLs of the documents retrieved with the query and the ranks of those documents. The search server 20 has hardware elements (CPU, ROM, RAM, storage unit, input/output unit, and so on) and software for realizing these functions (not shown).

The client 30 is a device operated by a user, for example a personal computer or a mobile phone. The client 30 transmits a search request including a query entered by the user to the search server 20 and displays the search results transmitted from the search server 20. The client 30 also transmits a genre score calculation request including a genre identifier to the genre server 10, and proposes queries using the genre scores transmitted from the genre server 10. The client 30 has hardware elements (CPU, ROM, RAM, storage unit, input/output unit, display unit, and so on) and software for realizing these functions (not shown).

2. Operation
FIG. 7 is a flowchart showing the genre score calculation process in the search system 1. In this example, the storage unit 13 already stores the genre dictionary shown in FIG. 5 before the flow of FIG. 7 starts.

In step S100, the genre server 10 accumulates search logs. That is, the storage unit 11 receives a search log from the search server 20 and appends it to the search logs already stored. Search logs are accumulated at a predetermined timing, for example each time the search server 20 performs a search, or when a certain period has elapsed since the previous accumulation.

In step S110, the calling unit 14 acquires a genre identifier from the client 30. The genre identifier is included in the genre score calculation request transmitted from the client 30. As an example, assume that the calling unit 14 acquires the genre identifier “1567”. In this case, the calling unit 14 extracts the queries corresponding to the acquired genre identifier from the data sets in the genre dictionary (FIG. 5) (step S120). In this example, queries q1, q3, and q4 are extracted. The calling unit 14 outputs the data sets containing the extracted queries and their genre scores to the calculation unit 15.

In step S130, the generation unit 12 generates a graph and accompanying data from the search logs stored in the storage unit 11.

In step S140, the calculation unit 15 calculates the genre score of a target query by propagating genre scores from other queries over the graph generated by the generation unit 12. The details are as follows. First, the calculation unit 15 acquires the graph from the generation unit 12.

FIG. 8 is a diagram showing the graph acquired in this example. The graph includes query nodes and domain nodes. The query nodes are the nodes of queries q1, q2, q3, and q4, and the domain nodes are the nodes of domains d1, d2, and d3. Query node q1 is connected to domain nodes d1 and d2. Query node q2 is connected to domain nodes d1 and d2. Query node q3 is connected to domain node d1. Query node q4 is connected to domain nodes d2 and d3.

FIG. 9 is a diagram showing the accompanying data of the graph of FIG. 8. The accompanying data includes a plurality of data sets, each containing a query, a domain, the number of unique users (NUU), the unique user rate (RUU), and the number of searches (Search). The number of unique users is the number of unique users who browsed, among the documents retrieved with the corresponding query, documents belonging to the corresponding domain. Unique users are the distinct people who visited a document within a specific period: even if the same person views the document five times within that period, that person counts as one unique user. The first data set in FIG. 9 indicates that 90 unique users browsed documents belonging to domain d1 among the documents retrieved with query q1. The unique user rate is the proportion of unique users who browsed, among the documents retrieved with the corresponding query, documents belonging to the corresponding domain. The first data set in FIG. 9 indicates that this proportion for query q1 and domain d1 is 0.9 (= 90%): of all unique users (90 + 10 = 100) who browsed documents retrieved with query q1, the 90 who browsed documents in domain d1. The number of searches is the number of times a search was performed with the corresponding query. The first data set in FIG. 9 indicates that 200 searches were performed with query q1.
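NUU and RUU can be derived from browse events as sketched below. This assumes a hypothetical log format of (user_id, query, domain) tuples. Following the 90 + 10 = 100 example, it takes the RUU denominator to be the sum of NUU over the query's domains. Placing the remaining 10 users of q1 on domain d2 is also an assumption.

```python
from collections import defaultdict


def nuu_ruu(clicks):
    """Compute NUU and RUU per (query, domain) from browse events.

    clicks: iterable of (user_id, query, domain) tuples (hypothetical log format).
    """
    users = defaultdict(set)
    for user, query, domain in clicks:
        users[(query, domain)].add(user)  # repeated views by one user count once
    nuu = {key: len(ids) for key, ids in users.items()}
    totals = defaultdict(int)  # per-query total, e.g. 90 + 10 = 100 for q1
    for (query, _domain), n in nuu.items():
        totals[query] += n
    ruu = {(q, d): n / totals[q] for (q, d), n in nuu.items()}
    return nuu, ruu


# 90 users view d1 five times each and 10 users view d2 once (cf. FIG. 9, query q1).
log = [(f"u{i}", "q1", "d1") for i in range(90) for _ in range(5)]
log += [(f"u{i}", "q1", "d2") for i in range(90, 100)]
nuu, ruu = nuu_ruu(log)
print(nuu[("q1", "d1")], ruu[("q1", "d1")])  # 90 0.9
```

Because the denominator sums per-domain counts, a user who browsed two domains for the same query is counted once per domain, matching the worked figure rather than a strict count of distinct users.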

Referring again to FIG. 7, the description of step S140 continues. The calculation unit 15 acquires the data sets, each containing a query and its genre score, from the calling unit 14. In step S140, the calculation unit 15 calculates genre scores for the target queries using the accompanying data of the graph. In this example, the calculation unit 15 calculates a genre score for each of the queries q1, q2, and q3.

FIG. 10 is a flowchart illustrating an example of the genre score calculation process in step S140. In step S200, the calculation unit 15 determines the initial genre score of every query in the graph, as follows. For the queries included in the data sets output in step S120 (queries q1, q3, and q4), the genre scores in those data sets are used as initial values. For a query not included in these data sets (query q2), that is, a query whose genre score is not recorded in the genre dictionary, 0.00 is used as the initial value. In this example, the initial values JS_ini of the genre scores of queries q1 to q4 are determined by the following equation (1) based on the genre dictionary of FIG. 5.

In step S210, the calculation unit 15 identifies the queries to be subjected to genre score propagation (hereinafter “target queries”), namely the queries that satisfy a predetermined condition. In this example, the condition is that “the initial genre score is zero” (= no genre score is recorded in the genre dictionary). In the example of equation (1), only query q2 is identified as a target of the propagation process; queries q1, q3, and q4 are not. Here, “propagation of genre scores” means calculating the genre score of one query from the genre scores of other queries. For example, calculating the genre score of query q2 from the genre scores of queries q1 and q3 is referred to as “propagating the genre scores of queries q1 and q3 to query q2.”

In step S220, the calculation unit 15 selects, from the target queries, the query that serves as the base point. Hereinafter, this query is referred to as the “base query” and its node as the “base node.” The base query is determined by a predetermined algorithm, for example one that selects the target queries one at a time, in order, as the base query. In that case, the target queries are sorted according to a rule such as Japanese syllabary (a-i-u-e-o) order, alphabetical order, character code order, or ID number order. In this example, the only target query is q2, so q2 is selected as the base query.
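Steps S200 to S220 (initial values, target selection, and base-query ordering) can be sketched as follows. This assumes the extracted dictionary scores arrive as a plain mapping. The scores for q3 and q4 are hypothetical, since FIG. 5 is only quoted for q1.

```python
def initial_scores(graph_queries, extracted):
    """Step S200: dictionary score where one exists, otherwise 0.00."""
    return {q: extracted.get(q, 0.0) for q in graph_queries}


def target_queries(js_ini):
    """Steps S210-S220: zero-score queries, sorted in character code order."""
    return sorted(q for q, score in js_ini.items() if score == 0.0)


js_ini = initial_scores(["q1", "q2", "q3", "q4"], {"q1": 1.0, "q3": 0.8, "q4": 0.5})
print(js_ini)                  # {'q1': 1.0, 'q2': 0.0, 'q3': 0.8, 'q4': 0.5}
print(target_queries(js_ini))  # ['q2']
```

Python's `sorted` on strings compares code points, so "q10" sorts before "q2"; ID-number order would need a numeric sort key instead.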

In step S230, the calculation unit 15 calculates the edge weights of the paths that include the base query. Here, a “path” is a connection between two nodes in the graph via edges and nodes. In the example of FIG. 8, query node q2 is connected to query nodes q1 and q3 via domain nodes. This connection includes the following three paths (A) to (C).
(A) Path A, connected to query node q1 with domain node d1 as the relay node.
(B) Path B, connected to query node q3 with domain node d1 as the relay node.
(C) Path C, connected to query node q1 with domain node d2 as the relay node.
These paths are classified into the following two route groups according to their relay node.
(1) Route group 1 (paths A and B), with domain node d1 as the relay node.
(2) Route group 2 (path C), with domain node d2 as the relay node.
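Enumerating the paths that leave the base node and grouping them by relay node could look like the sketch below. It assumes the graph is held as a mapping from each query node to the set of domain nodes it is connected to, which is a hypothetical representation. The adjacency follows the connections listed for FIG. 8.

```python
from collections import defaultdict

# Adjacency of FIG. 8: query node -> domain nodes it is connected to.
EDGES = {"q1": {"d1", "d2"}, "q2": {"d1", "d2"}, "q3": {"d1"}, "q4": {"d2", "d3"}}


def route_groups(base, edges):
    """Group the paths base -> relay domain -> other query by relay domain node."""
    groups = defaultdict(list)
    for domain in sorted(edges[base]):
        for query in sorted(edges):
            if query != base and domain in edges[query]:
                groups[domain].append(query)
    return dict(groups)


print(route_groups("q2", EDGES))
```

Each key of the result corresponds to one route group, and each listed query is the far end of one path through that relay node.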

The calculation unit 15 calculates the edge weight P_{qi,dj,qr} of the path connecting the base node qr, the relay domain node dj, and another query node qi by the following equation (2).

Here, RUU_{qi,dj} is the proportion of unique users who browsed documents belonging to domain dj among the documents retrieved with query qi.

In step S240, the calculation unit 15 calculates the genre score JS(qr) of the base query qr by the following equation (3). That is, the calculation unit 15 propagates the genre scores of other queries to the base query.

Here, Search_qi is the number of searches performed with query qi. In equation (3), the terms for qi exclude the case qi = qr.
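Equations (2) and (3) are not reproduced in this text, so the following is only a sketch of one propagation step under an explicitly assumed form. The edge weight is taken as the product of the two queries' unique user rates at the shared domain, P = RUU_{qi,dj} × RUU_{qr,dj}. JS(qr) is taken as the Search-weighted average of the neighbouring genre scores. Both forms are illustrative assumptions, not the patent's formulas, and the numbers in the example are hypothetical.

```python
def propagate(base, edges, ruu, search, js):
    """One propagation step for base query qr (ASSUMED forms of equations (2) and (3)).

    Assumed: P[qi,dj,qr] = RUU[qi,dj] * RUU[qr,dj]
             JS(qr) = sum(P * Search[qi] * JS(qi)) / sum(P * Search[qi]), qi != qr
    """
    num = den = 0.0
    for dj in edges[base]:
        for qi, domains in edges.items():
            if qi == base or dj not in domains:  # equation (3) excludes qi = qr
                continue
            p = ruu[(qi, dj)] * ruu[(base, dj)]
            num += p * search[qi] * js[qi]
            den += p * search[qi]
    return num / den if den else js[base]  # no usable neighbours: keep current score


# Hypothetical numbers: q1 and q3 share domain d1 with base query q2.
edges = {"q1": {"d1"}, "q2": {"d1"}, "q3": {"d1"}}
ruu = {("q1", "d1"): 1.0, ("q2", "d1"): 1.0, ("q3", "d1"): 1.0}
search = {"q1": 200, "q3": 200}
js = {"q1": 1.0, "q2": 0.0, "q3": 0.0}
print(propagate("q2", edges, ruu, search, js))  # 0.5
```

Normalizing by the weight sum keeps the propagated score inside the 0.00 to 1.00 range used by the genre dictionary.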

A calculation example using equations (2) and (3) is described below for the case where the base query is q2. First, the calculation unit 15 calculates the edge weight P of each path using equation (2).

Next, the calculation unit 15 calculates the genre score JS using equation (3). For reasons of space, route groups 1 and 2 are calculated separately: route group 1 gives equation (5), and route group 2 gives equation (6). Combining equations (5) and (6) yields the genre score JS(q2).

In step S250, the calculation unit 15 determines whether genre score propagation has been performed with every target query as the base query. If some target query has not yet served as the base query (S250: NO), the calculation unit 15 returns to step S220. If propagation has been performed with every target query as the base query (S250: YES), the calculation unit 15 proceeds to step S260.

In step S260, the calculation unit 15 determines whether an end condition is satisfied. If it is (S260: YES), the calculation unit 15 ends the process of FIG. 10. If it is not (S260: NO), the calculation unit 15 returns to step S220. In this example, the end condition is that at least one of the following holds: (1) the genre scores have converged, or (2) steps S220 to S250 have been repeated a fixed number of times (for example, 10 rounds).

For example, when there are ten target queries q1, q2, ..., q10, genre score propagation is first performed with query q1 as the base query, then with query q2 as the base query, and so on, one query at a time, up to query q10. When propagation has been performed once for each of the queries q1, q2, ..., q10, the propagation process is said to have completed one round. After each round, the calculation unit 15 determines whether the end condition is satisfied. If it is not, a second round of propagation is performed, and further rounds (third, fourth, ...) follow until the end condition is satisfied.
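The round structure of steps S220 to S260 can be sketched as a loop with both end conditions. The update function is a hypothetical placeholder for the propagation formula. Two further choices are assumptions: the convergence tolerance, and updating scores in place so that later base queries in a round see earlier results. The 10-round limit comes from the example above.

```python
def run_rounds(targets, update, js, max_rounds=10, tol=1e-6):
    """Repeat rounds of propagation (steps S220-S260) until convergence or max_rounds.

    update(q, js) returns the new score of base query q; it is a placeholder for
    whatever propagation formula is used (e.g. equation (3)).
    """
    for _ in range(max_rounds):          # end condition (2): fixed number of rounds
        max_change = 0.0
        for q in targets:                # one round: each target serves once as base
            new = update(q, js)
            max_change = max(max_change, abs(new - js[q]))
            js[q] = new                  # later base queries in the round see it
        if max_change < tol:             # end condition (1): scores have converged
            break
    return js


# Toy update rule (hypothetical): move halfway towards 1.0 each round.
scores = run_rounds(["q2"], lambda q, js: 0.5 * js[q] + 0.5, {"q2": 0.0})
print(scores["q2"])  # 0.9990234375 (= 1 - 0.5**10) after the 10-round limit
```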

Refer to FIG. 7 again. In step S150, the output unit 16 outputs data including the genre scores to the client 30. The output unit 16 also writes the updated genre scores into the genre dictionary.

FIG. 11 is a diagram illustrating output data from the genre server 10. The output data includes a plurality of data sets, each containing a query and the genre score of that query. The client 30 receives the output data from the genre server 10 and thus obtains genre scores in response to its genre score calculation request. The client 30 displays, as recommended queries, those queries in the received output data that satisfy a predetermined condition (for example, the five with the highest genre scores).
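The client-side selection of recommended queries can be sketched with the standard library. This assumes the output data arrives as a mapping from query to genre score; the scores below are hypothetical.

```python
import heapq


def recommend(output, k=5):
    """Return the k queries with the highest genre scores, best first."""
    top = heapq.nlargest(k, output.items(), key=lambda item: item[1])
    return [query for query, _score in top]


output = {"q1": 1.0, "q2": 0.5, "q3": 0.8, "q4": 0.4, "q5": 0.1, "q6": 0.9}
print(recommend(output))  # ['q1', 'q6', 'q3', 'q2', 'q4']
```

`heapq.nlargest` avoids sorting the whole output when only the top few queries are displayed.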

FIG. 12 is a diagram showing a graph as a comparative example. In this comparative example, the graph includes query nodes and URL nodes. Genre scores can also be calculated from this graph using equations (2) and (3); in that case, the domain-node terms in the equations are replaced with URL-node terms. However, edges are sparser with URL nodes than with domain nodes. This is because, in many cases, the documents retrievable through a search service are sparse, so two different queries rarely lead to the same URL. To propagate a genre score from one query to another, the query nodes of the two queries must be connected to a common URL node; otherwise the score cannot be propagated. That is, with the graph of FIG. 12, genre scores cannot be propagated to query nodes that are not connected to a common URL node. In contrast, the genre server 10 can propagate genre scores to a query node that is connected to a common domain node, even if it is not connected to a common URL node. This allows the graph to be constructed more flexibly. In addition, when genre scores are propagated from a plurality of queries, an index of relevance to all of those queries can be calculated at once.
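The density argument can be illustrated by collapsing URLs to domains. The text does not say how a domain is derived from a URL. This sketch assumes the host name returned by `urllib.parse.urlsplit`, and the clicked URLs are hypothetical.

```python
from urllib.parse import urlsplit


def to_domain(url):
    """Collapse a document URL to its host name, treated here as the domain node."""
    return urlsplit(url).hostname


def shared(nodes_a, nodes_b):
    """Nodes that two queries have in common (candidates for propagation)."""
    return set(nodes_a) & set(nodes_b)


# Hypothetical clicked URLs for two queries.
q1_urls = ["http://www.example.com/comedy/1", "http://www.example.org/a"]
q2_urls = ["http://www.example.com/comedy/2"]

print(shared(q1_urls, q2_urls))  # set(): no common URL node
print(shared(map(to_domain, q1_urls), map(to_domain, q2_urls)))  # {'www.example.com'}
```

With URL nodes the two queries are disconnected, whereas with domain nodes they share one node through which a score could propagate.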

As another comparative example, consider a case where the contents of the search log are restricted. For example, the number of searches per query may be logged, while the search behavior history of each user or the browsing rate of each document is not. Such restrictions arise from, for example, insufficient hardware or software resources or the specifications of the search server. A technique that presupposes per-document browsing rates in the log, uses them to calculate the importance of each document, and then uses that importance to calculate genre scores cannot be applied in such a restricted environment. In contrast, the genre server 10 can propagate genre scores even when neither per-user search behavior history nor per-document browsing rates are logged.

As described above, the genre server 10 can calculate genre scores even for queries whose genre scores could not be calculated conventionally. A graph that includes domain nodes allows genre scores to be propagated more widely than a graph that uses URL nodes.

3. Other Embodiments
The present invention is not limited to the above-described embodiment, and various modifications are possible. Several modifications are described below. Two or more of them may be used in combination.

3-1. Modification 1
FIG. 13 is a diagram illustrating a graph according to Modification 1. The structure of the graph generated by the generation unit 12 is not limited to the one illustrated in FIG. 3. In the example of FIG. 13, the graph includes query nodes, URL nodes, and domain nodes arranged hierarchically, from the bottom: query nodes, URL nodes, then domain nodes. In this example, the calculation unit 15 propagates genre scores while also taking into account the edges between query nodes and URL nodes.

FIG. 14 is a diagram showing the data associated with the graph of FIG. 13. This data includes a plurality of data sets, each containing a query, a URL, a domain, NUU, RUU, the number of searches, and page views (PV). In this example, the number of unique users is the number of unique users who browsed, among the documents retrieved with the corresponding query, the document at the corresponding URL. The first data set in FIG. 14 indicates that 90 unique users browsed the document at URL s1 among the documents retrieved with query q1. Page views are the number of times the document at the corresponding URL was browsed. The first data set in FIG. 14 indicates that the document at URL s1 was viewed 80 times.
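In the three-layer graph, a path runs query → URL → domain → URL → query, which is why the edge weight in this modification carries five subscripts. A minimal sketch of enumerating such paths follows. It assumes mappings from queries to clicked URLs and from URLs to domains (hypothetical structures), and the connectivity is illustrative rather than read off FIG. 13.

```python
def url_domain_paths(base, query_urls, url_domain):
    """Enumerate paths other query -> URL -> domain -> URL -> base query."""
    paths = []
    for s_r in sorted(query_urls[base]):
        d = url_domain[s_r]
        for q_i in sorted(query_urls):
            if q_i == base:
                continue
            for s_i in sorted(query_urls[q_i]):
                if url_domain[s_i] == d:
                    paths.append((q_i, s_i, d, s_r, base))
    return paths


# Hypothetical connectivity: q1 -> s1 and q2 -> s2, both URLs in domain d1.
query_urls = {"q1": {"s1"}, "q2": {"s2"}}
url_domain = {"s1": "d1", "s2": "d1"}
print(url_domain_paths("q2", query_urls, url_domain))
# [('q1', 's1', 'd1', 's2', 'q2')]
```

The tuple order matches the subscripts of P_{q1,s1,d1,s2,q2}. A URL shared by both queries also yields a path, since it trivially lies in the same domain.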

The calculation unit 15 calculates the edge weight P according to the following equation (8) instead of equation (2).

Next, the calculation unit 15 calculates the genre score JS according to equation (3).

A specific calculation example is shown below for the case where query q2 is the base query. From the data of FIG. 14 and equation (8), the edge weight P_{q1,s1,d1,s2,q2} is calculated as follows.

Furthermore, from equation (3), the genre score JS is calculated as follows.

3-2. Modification 2
The formula for calculating the genre score is not limited to equation (3); equation (3) may be modified with a weighting coefficient. In Modification 2, the genre score is calculated by the following equation (11).

Here, the coefficient Weight_dj is a parameter indicating the correlation between domain dj and the target genre (hereinafter the “propagation adjustment parameter”). In this example, Weight_dj is a correlation coefficient β, which is a function of the genre score JS, the number of searches Search, and the unique user rate RUU.

The calculation of the genre score by equation (11) is described below using a specific example.

FIG. 15 is a flowchart illustrating the operation of the search system 1 according to Modification 2. The flow of FIG. 15 differs from that of FIG. 10 in that step S300 is inserted between steps S220 and S230. In step S300, the calculation unit 15 calculates the propagation adjustment parameter of each domain according to equation (12). In step S240, the calculation unit 15 calculates the genre score according to equation (11). A specific calculation example follows.

FIG. 16 is a diagram illustrating a graph according to Modification 2. The graph of this example has the same hierarchical structure as the graph of FIG. 13.

FIG. 17 is a diagram showing the data associated with the graph of FIG. 16. This data includes a plurality of data sets, each containing a query, a URL, a domain, NUU, RUU, and the number of searches.

In step S300, the calculation unit 15 calculates the propagation adjustment parameter Weight. From the data of FIG. 17 and equation (12), the propagation adjustment parameter Weight_d2 is calculated as follows.

The following description assumes that Weight_d2 = 0.001.

In step S230, the calculation unit 15 calculates the edge weight P according to equation (2). A specific calculation example is shown for the case where query q2 is the base query. From the data of FIG. 17 and equation (2), the edge weight P_{q1,q2} is calculated as follows.

In step S240, the calculation unit 15 calculates the genre score JS. From the data of FIG. 17 and equation (11), the genre score JS(q2) is calculated as follows.

In another example, if query nodes q2 and q4 in the graph of FIG. 16 were connected via a URL node belonging to domain node d4, the genre score JS(q2) would be calculated as follows.

Consider a document with little relationship to any specific genre, such as a page on Wikipedia (registered trademark). With a conventional graph that contains only query nodes and URL nodes, the relationship between such a document and a genre could not be evaluated accurately. For example, if only the graph of FIG. 12 is used, without a genre dictionary, it is not easy to determine whether a given URL (that is, the document itself) belongs to a specific genre. A dictionary site such as Wikipedia is one example: it is likely to be linked to queries of many genres, so it may not have a single specific genre. If only the links between URL nodes and query nodes are considered, genre scores are propagated through URL nodes that have no specific genre. In the genre server 10, genre scores are propagated from query nodes associated with a common genre. Furthermore, with the configuration of this modification, the propagation adjustment parameters aggregated per domain suppress the propagated genre scores on a per-domain basis, so that more accurate genre scores can be calculated.
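The damping effect of a per-domain weight can be illustrated as below. Equations (11) and (12) are not reproduced in this text, so the structure here is an assumption: the same assumed Search-weighted average as a plain propagation step, with each contribution through relay domain dj scaled by weight[dj] in the numerator. The weight value 0.001 echoes the Weight_d2 example; everything else is hypothetical.

```python
def propagate_weighted(base, edges, ruu, search, js, weight):
    """Domain-weighted propagation step (ASSUMED form in the spirit of equation (11)).

    Each contribution through relay domain dj is scaled by weight[dj] in the
    numerator only, so a low-weight domain damps the score it propagates.
    """
    num = den = 0.0
    for dj in edges[base]:
        for qi, domains in edges.items():
            if qi == base or dj not in domains:
                continue
            p = ruu[(qi, dj)] * ruu[(base, dj)]  # assumed edge weight
            num += weight.get(dj, 1.0) * p * search[qi] * js[qi]
            den += p * search[qi]
    return num / den if den else js[base]


# Hypothetical: q1 reaches q2 only through a dictionary-like domain d2 with weight 0.001.
edges = {"q1": {"d2"}, "q2": {"d2"}}
ruu = {("q1", "d2"): 1.0, ("q2", "d2"): 1.0}
score = propagate_weighted("q2", edges, ruu, {"q1": 100}, {"q1": 1.0, "q2": 0.0}, {"d2": 0.001})
print(score)  # roughly 0.001: the weak domain passes on only 0.1% of q1's score
```

Domains without an entry in `weight` default to 1.0, so only domains explicitly marked as genre-neutral are damped.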

3-3. Modification 3
FIG. 18 is a diagram illustrating a graph according to the third modification. In the example of FIG. 18, the graph includes query nodes, URL nodes, domain nodes, and user nodes, arranged hierarchically: the query nodes form the lowest layer, the URL nodes the layer above them, and the domain nodes and user nodes share the top layer.

In this example, genre scores are propagated not only along routes through domain nodes but also via URL nodes and user nodes. For example, the genre score of query q1 is propagated to query q2 via URL node s2, user node u2, and URL node s3. As described in Modification 2, the genre score of query q2 is calculated using weighting coefficients based on the domain nodes.

With this configuration, genre scores propagate between query nodes via URL nodes or user nodes, so the resulting graph is more comprehensive than one containing only query nodes and URL nodes. In other words, genre scores can be assigned to more queries.
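The coverage gain from user nodes can be illustrated with a small reachability check in Python. This is a sketch under assumptions: nodes are plain strings whose first letter encodes the node type (a hypothetical convention borrowed from the FIG. 18 labels), and "reachable" simply means a path whose intermediate nodes are all of an allowed relay type.

```python
from collections import deque

# Hypothetical naming convention: the first letter encodes the node type,
# following the labels used in FIG. 18 (q = query, s = URL, u = user, d = domain).
NODE_TYPES = {"q": "query", "s": "url", "u": "user", "d": "domain"}

def node_type(node):
    return NODE_TYPES[node[0]]

def reachable_queries(edges, start, relay_types):
    """Return the query nodes that `start` can pass a genre score to,
    travelling only through intermediate nodes whose type is in `relay_types`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            kind = node_type(nxt)
            if kind == "query":
                found.add(nxt)          # a query node receives the score; stop here
            elif kind in relay_types:
                queue.append(nxt)       # keep walking through allowed relay nodes
    return found

# The route described for FIG. 18: q1 -> s2 -> u2 -> s3 -> q2.
edges = [("q1", "s2"), ("s2", "u2"), ("u2", "s3"), ("s3", "q2")]
print(reachable_queries(edges, "q1", {"url", "user"}))  # {'q2'}
print(reachable_queries(edges, "q1", {"url"}))          # set()
```

With user nodes allowed as relays, q2 becomes reachable from q1; with URL nodes alone it does not, which is the extra coverage the modification describes.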

3-4. Modification 4
FIG. 19 is a diagram illustrating the configuration of the search system 1 according to the fourth modification. In this example, the genre server 10 includes a storage unit 17 in addition to the components shown in FIG. 1. The storage unit 17 stores the graph data generated by the generation unit 12.

FIG. 20 is a diagram illustrating a search log according to the fourth modification. In this example, each data set in the search log contains a query and its search count, but no data on users or viewed documents (URLs). In the embodiment described above, the edge weight P was calculated from the search count of a query and the number of unique users who viewed a document. When the search log lacks user and viewed-document (URL) data, as in this example, the edge weight P cannot be calculated by the same method. Therefore, in this example, the generation unit 12 generates the graph using data acquired from the search server 20 in addition to the search log stored in the storage unit 11, as follows.

The generation unit 12 acquires the search log from the storage unit 11 and extracts the queries from it; in the example of FIG. 20, queries q1 and q2 are extracted. The generation unit 12 then sends the search server 20 a search request containing the extracted queries. On receiving the request from the generation unit 12 (genre server 10), the search server 20 performs the search and returns the results to the genre server 10.

FIG. 21 is a diagram illustrating the search results. In this example, the search results contain multiple data sets, each consisting of a query, a URL, and a rank. In FIG. 21, the first and second data sets indicate that a search with query q1 ranks the document at URL s1 first and the document at URL s2 second.

The generation unit 12 calculates estimates of missing data from the search results. Missing data is data that is needed to calculate the genre score but is absent from the search log. For example, when the genre score is calculated according to equations (2) and (3), the unique user rate RUU is missing. The missing data can be estimated, for example, as follows. A query's search count is apportioned among the returned URLs according to their rank, for example at a ratio of 2:1 between the first-ranked and second-ranked URLs. In the example of FIG. 20, the 158 searches performed with query q1 are apportioned as unique users to URL s1 (the first-ranked document) and URL s2 (the second-ranked document) at a ratio of 2:1; that is, the generation unit 12 assigns 105.33 unique users to URL s1 and 52.67 unique users to URL s2. The generation unit 12 then uses the unique user counts assigned in this way to calculate an estimate of the unique user rate.
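The rank-based apportioning step can be sketched in Python as follows. The function name and the list-of-weights interface are assumptions; the 2:1 ratio and the 158-search figure come from the example above. Equations (2) and (3) are not reproduced in this section, so the sketch stops at the estimated unique user counts and does not compute RUU itself.

```python
def apportion_by_rank(search_count, ranked_urls, rank_weights):
    """Split a query's search count among its ranked result URLs in
    proportion to per-rank weights (e.g. 2:1 for ranks 1 and 2).

    URLs beyond the length of `rank_weights` are ignored.
    """
    weights = rank_weights[:len(ranked_urls)]
    total = sum(weights)
    return {url: search_count * w / total for url, w in zip(ranked_urls, weights)}

# FIG. 20 / FIG. 21 example: 158 searches for q1, results s1 (rank 1) and s2 (rank 2).
estimate = apportion_by_rank(158, ["s1", "s2"], [2, 1])
print({url: round(n, 2) for url, n in estimate.items()})  # {'s1': 105.33, 's2': 52.67}
```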

FIG. 22 is a diagram illustrating a table containing the calculated estimates. The generation unit 12 generates the accompanying data of the graph using the estimated unique user rates calculated in this way.

With this configuration, the genre score can be calculated by estimating the missing data even when the search log does not fully contain the data illustrated in FIG. 2.

3-5. Other Modifications
The search log may be generated by the genre server 10 itself or by a device other than the genre server 10 (for example, the search server 20). When the genre server 10 generates the search log, the search server 20 sends its search results to the genre server 10 each time it performs a search, and the calculation unit 15 of the genre server 10 appends the received results to the search log stored in the storage unit 11. When the search server 20 generates the search log, the search server 20 has a storage unit that stores the search log and appends its search results to the log each time it performs a search. The search server 20 sends the search log to the genre server 10 at a predetermined timing (for example, after a fixed period has elapsed since the log was last sent, or upon a request from the genre server 10), and the storage unit 11 of the genre server 10 stores the received log.

The format of the search log is not limited to that illustrated in FIG. 2. For example, FIG. 2 records the URLs of the documents ranked first through third, but the number of recorded URLs is not limited to three. The number of URLs recorded in the search log may be limited by, for example, the system configuration (such as the amount of hardware resources).

The genre server 10 need not include the storage unit 11. In that case, a device other than the genre server 10 has the storage unit 11, and the genre server 10 accesses the search log via that device.

The method of specifying a genre identifier is not limited to that described in the embodiment; that is, the input to the calling unit 14 is not limited to a genre identifier. A query may be input to the calling unit 14 instead. In this case, the calling unit 14 refers to the genre dictionary and extracts the queries whose genre is the same as that of the input query. In the example of FIG. 5, when query q1 is input, the calling unit 14 extracts queries q3 and q4, which correspond to the same genre ("comedy") as query q1.
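A query-driven lookup of this kind can be sketched in Python as below. The tuple layout of the dictionary rows, the genre IDs, the q2 row, and all scores are hypothetical; only the grouping of q1, q3, and q4 under "comedy" comes from the FIG. 5 example.

```python
# Hypothetical genre-dictionary rows: (query, genre_id, genre, genre_score).
# Only the q1/q3/q4 "comedy" grouping comes from the FIG. 5 example; the IDs,
# the q2 row, and all scores are invented for illustration.
GENRE_DICTIONARY = [
    ("q1", "G1", "comedy", 0.9),
    ("q2", "G2", "music", 0.7),
    ("q3", "G1", "comedy", 0.6),
    ("q4", "G1", "comedy", 0.4),
]

def same_genre_queries(dictionary, query):
    """Return the other queries that share at least one genre with `query`,
    without duplicates and in dictionary order."""
    genre_ids = {gid for q, gid, _, _ in dictionary if q == query}
    return list(dict.fromkeys(
        q for q, gid, _, _ in dictionary if gid in genre_ids and q != query
    ))

print(same_genre_queries(GENRE_DICTIONARY, "q1"))  # ['q3', 'q4']
```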

The documents to be searched are not limited to files on the Internet. The genre score calculation technique described in the embodiment may also be applied to files on a local network or a stand-alone computer. When it is applied to files on a stand-alone computer, for example, a single computer provides all the functions of the genre server 10, the search server 20, and the client 30. In that case, a URL is read as a file name including the drive name and folder (directory) names, and a domain is read as the name of a folder at a higher level of the hierarchy.
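For the stand-alone case, the URL-to-file-name and domain-to-folder substitutions could look like the following Python sketch. The `levels_up` parameter, which picks how many folders above the file count as the "domain", is an assumption, as is the example path; `PureWindowsPath` is used only because the text mentions drive names.

```python
from pathlib import PureWindowsPath

def file_to_nodes(path_str, levels_up=1):
    """Map a local file path to (url, domain) labels for the graph.

    The full path (drive + folders + file name) plays the role of the URL;
    the folder `levels_up` levels above the file plays the role of the domain.
    """
    path = PureWindowsPath(path_str)
    return str(path), str(path.parents[levels_up - 1])

print(file_to_nodes(r"C:\docs\comedy\skit.txt"))     # ('C:\\docs\\comedy\\skit.txt', 'C:\\docs\\comedy')
print(file_to_nodes(r"C:\docs\comedy\skit.txt", 2))  # ('C:\\docs\\comedy\\skit.txt', 'C:\\docs')
```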

The method of extracting a domain from a URL is not limited to that described in the embodiment. For example, the character string from the beginning of the URL up to, but not including, the second "/" after "://" may be extracted as the domain. In this case, for the URL "aaa://bbb.ccc.ddd/eee/fff", the string "aaa://bbb.ccc.ddd/eee" is extracted as the domain.
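The extraction rule above translates directly into string searches. In this minimal Python sketch, the fallback behaviour (returning the whole URL when there is no second "/", and searching from the start of the string when there is no "://") is an assumption not covered by the text.

```python
def extract_domain(url):
    """Return the prefix of `url` that ends just before the second "/"
    following "://" (the whole URL if there is no such "/")."""
    scheme_sep = url.find("://")
    start = scheme_sep + 3 if scheme_sep >= 0 else 0
    first = url.find("/", start)
    if first < 0:
        return url
    second = url.find("/", first + 1)
    return url if second < 0 else url[:second]

print(extract_domain("aaa://bbb.ccc.ddd/eee/fff"))  # aaa://bbb.ccc.ddd/eee
```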

The data structure of the genre dictionary is not limited to that shown in FIG. 5. For example, either the genre identifier or the genre may be omitted. Alternatively, a single query may be associated with multiple genres. In that case, for a query associated with multiple genres, the calling unit 14 extracts in step S120 only the data set that contains the target genre.
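The step S120 filtering for multi-genre queries can be sketched in Python as follows. The three-field row layout and all data are hypothetical; the point is that a query listed under two genres contributes only its score for the target genre.

```python
# Hypothetical dictionary rows (query, genre_id, genre_score); q5 carries two genres.
rows = [
    ("q1", "G1", 0.9),
    ("q5", "G1", 0.3),
    ("q5", "G2", 0.8),
    ("q2", "G2", 0.7),
]

def rows_for_genre(dictionary, target_genre_id):
    """Keep only the data sets whose genre is the target genre, so a
    multi-genre query contributes just its score for that genre."""
    return {query: score for query, genre_id, score in dictionary if genre_id == target_genre_id}

print(rows_for_genre(rows, "G1"))  # {'q1': 0.9, 'q5': 0.3}
```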

The device that inputs the genre identifier (the genre score calculation request) and the device to which the genre score is output need not be the same. For example, the genre server 10 may output the genre score to a device other than the client 30 from which the genre identifier was input. Conversely, the genre server 10 may output the genre score to the client 30 in response to a genre identifier input from a device other than the client 30.

The data structure of the graph is not limited to that described in the embodiment. For example, the tables described as accompanying data in the embodiment may by themselves be processed as the graph.

The condition for identifying target queries is not limited to that described in the embodiment. In the embodiment, the calculation unit 15 treats queries whose initial genre score is zero as target queries; instead, it may treat queries whose initial genre score is at or below a predetermined threshold as target queries. Alternatively, the calculation unit 15 may skip calculating initial genre scores and simply treat queries whose genre score is not listed in the genre dictionary as target queries. In yet another example, the calculation unit 15 may use queries specified by the user as target queries.
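The four alternative target-query conditions can be expressed as simple filters. In this Python sketch, the `mode` strings, keyword arguments, and example data are all hypothetical; only the conditions themselves come from the text.

```python
def select_targets(queries, mode, *, initial_scores=None, threshold=0.0,
                   dictionary=None, user_specified=None):
    """Pick target queries under one of four alternative conditions."""
    if mode == "zero":               # initial genre score is exactly zero
        return [q for q in queries if initial_scores.get(q, 0.0) == 0.0]
    if mode == "threshold":          # initial genre score at or below a threshold
        return [q for q in queries if initial_scores.get(q, 0.0) <= threshold]
    if mode == "not_in_dictionary":  # no genre score listed in the dictionary
        return [q for q in queries if q not in dictionary]
    if mode == "user":               # queries explicitly specified by the user
        return [q for q in queries if q in user_specified]
    raise ValueError(f"unknown mode: {mode}")

queries = ["q1", "q2", "q3", "q4"]
initial = {"q1": 0.9, "q2": 0.0, "q3": 0.05, "q4": 0.4}
print(select_targets(queries, "zero", initial_scores=initial))                      # ['q2']
print(select_targets(queries, "threshold", initial_scores=initial, threshold=0.1))  # ['q2', 'q3']
print(select_targets(queries, "not_in_dictionary", dictionary={"q1": 0.9, "q4": 0.4}))  # ['q2', 'q3']
print(select_targets(queries, "user", user_specified={"q4"}))                       # ['q4']
```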

The relay node is not limited to a domain node; nodes other than domain nodes, such as user nodes or URL nodes, may serve as relay nodes.
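The following Python sketch shows how an iterative calculation can treat every relay node type uniformly. It is a generic weighted-average (label-propagation style) update, not the patent's own equations, which are not reproduced in this section. The edge format, the product-of-weights rule, `tol`, and `max_iter` are assumptions. Base nodes are updated in place one after another, and the loop stops when no score changes by more than `tol`, which stands in for the end condition.

```python
from collections import defaultdict

def propagate(edges, seeds, targets, tol=1e-6, max_iter=100):
    """Iteratively estimate genre scores for `targets`.

    edges:   iterable of (query, relay, weight); the relay may be a URL,
             domain, or user node, and all are treated alike.
    seeds:   genre scores taken from the genre dictionary (kept fixed).
    targets: queries whose scores are to be calculated (the base nodes).
    """
    by_relay, by_query = defaultdict(list), defaultdict(list)
    for q, r, w in edges:
        by_relay[r].append((q, w))
        by_query[q].append((r, w))
    scores = dict(seeds)
    for t in targets:
        scores.setdefault(t, 0.0)
    for _ in range(max_iter):
        max_change = 0.0
        for base in targets:                       # update each base node in turn
            num = den = 0.0
            for r, w_base in by_query[base]:
                for other, w_other in by_relay[r]:  # other queries on the same relay
                    if other == base:
                        continue
                    w = w_base * w_other
                    num += w * scores.get(other, 0.0)
                    den += w
            new = num / den if den else 0.0
            max_change = max(max_change, abs(new - scores[base]))
            scores[base] = new
        if max_change < tol:                       # assumed end condition
            break
    return scores

# q1 is a dictionary seed; q2 shares domain d1 with q1 and user u1 with q3.
edges = [("q1", "d1", 1.0), ("q2", "d1", 1.0), ("q2", "u1", 1.0), ("q3", "u1", 1.0)]
result = propagate(edges, {"q1": 0.8}, ["q2", "q3"])
print({q: round(s, 3) for q, s in result.items()})  # {'q1': 0.8, 'q2': 0.8, 'q3': 0.8}
```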

The algorithm used to calculate edge weights (that is, the strength of the connection between nodes) is not limited to that described in the embodiment. For example, a well-known algorithm used for hyperlink structure analysis, such as PageRank, the HITS algorithm, SALSA, or TrustRank, may be used.
Furthermore, the edge weights may be calculated at the same time as the graph is generated. That is, in the flow of FIG. 10, the processing of step S230 may be placed outside the processing loop of steps S220 to S250.
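As an illustration of one of the named alternatives, here is a minimal power-iteration PageRank in Python over an undirected query-URL graph. The text does not specify how node scores from such an algorithm would be turned into edge weights, so that mapping is left open; the graph, damping factor, and tolerance are assumptions.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over {node: [neighbours]}.

    Every neighbour must also be a key. Nodes with no out-links spread their
    rank uniformly, so the ranks always sum to 1.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = adjacency[v]
            for u in out:
                new[u] += damping * rank[v] / len(out)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical query-URL graph, with each edge listed in both directions.
graph = {"q1": ["s1", "s2"], "q2": ["s2"], "s1": ["q1"], "s2": ["q1", "q2"]}
ranks = pagerank(graph)
print(round(sum(ranks.values()), 6))  # 1.0
print(ranks["s2"] > ranks["s1"])      # True: s2 is reached from both queries
```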

The flow of the genre score calculation process is not limited to that shown in FIG. 7. For example, writing to the genre dictionary in step S150 (that is, updating the genre dictionary) may be omitted. In another example, steps S120 and S130 may be performed in reverse order.

The device that sends a genre score calculation request (genre identifier) may differ from the device that receives the response to that request.

The use of genre scores in the client 30 is not limited to query recommendation. Processing other than query recommendation, such as displaying genre scores or recommending genres, may be performed.

The hardware configuration for realizing the functions shown in FIG. 1 is not limited to that described with reference to FIG. 6. For example, a processor dedicated to specific processing may be used instead of a general-purpose CPU.

The program executed by the CPU 111 in the embodiment described above may be provided stored on a computer-readable recording medium, such as a magnetic recording medium (magnetic tape, magnetic disk (HDD, FD (Flexible Disk)), etc.), an optical recording medium (optical disc (CD (Compact Disc), DVD (Digital Versatile Disc)), etc.), a magneto-optical recording medium, or a semiconductor memory (flash ROM, etc.). The program may also be downloaded via a network such as the Internet.

Description of reference signs: 1 ... search system, 10 ... genre server, 11 ... storage unit, 12 ... generation unit, 13 ... storage unit, 14 ... calling unit, 15 ... calculation unit, 16 ... output unit, 17 ... storage unit, 20 ... search server, 30 ... client, 110 ... control unit, 111 ... CPU, 112 ... RAM, 113 ... ROM, 120 ... storage unit, 130 ... input unit, 140 ... display unit, 150 ... communication unit

Claims

1. A server apparatus comprising:
first acquisition means for acquiring a graph in which a plurality of nodes, including query nodes each indicating a query used for a search and relay nodes each indicating an item related to the search other than the query, are connected by edges;
second acquisition means for acquiring, from a genre dictionary containing queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre, the queries and genre scores corresponding to a specified genre identifier; and
calculation means for calculating genre scores of a plurality of queries by taking, as a base node, a node identified from among target nodes, the target nodes being those query nodes in the graph acquired by the first acquisition means that indicate queries whose genre scores are not described in the genre dictionary, and repeatedly performing, while updating the base node until an end condition is satisfied, a process of calculating the genre score of the query of the base node using, from among the genre scores acquired by the second acquisition means, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes.

2. The server apparatus according to claim 1, wherein the plurality of nodes include a first location node indicating first location information, which indicates by means of a hierarchical structure the location of a document browsed by the user as a result of the search, a second location node indicating second location information at a higher level than the first location information, and a user node indicating the name of the user who performed the search;
the relay node is the first location node; and
the process calculates the genre score of the query of the base node using the genre score of a query indicated by another query node connected by an edge to the second location node or the user node that is connected by an edge to the relay node.

3. The server apparatus according to claim 1, wherein the plurality of nodes include a first location node indicating first location information, which indicates by means of a hierarchical structure the location of a document browsed by the user as a result of the search, and a second location node indicating second location information at a higher level than the first location information;
the relay node is the first location node;
the calculation means calculates, for each item of second location information, a parameter indicating the correlation between that second location information and the genre; and
the process calculates the genre score of the query of the base node using the parameter and the genre score of a query indicated by another query node related to the second location node that is related to the base node.

4. The server apparatus according to claim 1, further comprising: third acquisition means for acquiring a search log in which queries used for searches and search counts, each being the number of times a search was performed, are accumulated;
fourth acquisition means for inputting a query included in the search log to a search server that outputs a search result when a query is input, and acquiring the search result for the query; and
estimation means for estimating the weights of the edges using the search log acquired by the third acquisition means and the search results acquired by the fourth acquisition means,
wherein the process calculates the genre score of the query of the base node using the estimated weights and the genre score of a query indicated by another query node related to the second location node that is related to the base node.

5. A genre score calculation method comprising:
a step in which control means of a computer stores, in first storage means, a search log containing a plurality of items including a query used for a search and first location information indicating, by means of a hierarchical structure, the location of a document browsed as a result of the search;
a step in which the control means extracts, as nodes, the query and items other than the query from among the items included in the search log stored in the first storage means;
a step in which the control means generates a graph in which query nodes indicating the extracted queries and relay nodes indicating the other extracted items are connected by edges;
a step in which the control means acquires, from second storage means storing a genre dictionary containing queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre, the queries and genre scores corresponding to a specified genre identifier; and
a step in which the control means calculates genre scores of a plurality of queries by taking, as a base node, a node identified from among target nodes, the target nodes being those query nodes in the generated graph that satisfy a predetermined condition, and repeatedly performing, while updating the base node until an end condition is satisfied, a process of calculating the genre score of the query of the base node using, from among the acquired genre scores, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes.

6. The genre score calculation method according to claim 5, wherein the plurality of nodes include a first location node indicating first location information, which indicates by means of a hierarchical structure the location of a document browsed by the user as a result of the search, a second location node indicating second location information at a higher level than the first location information, and a user node indicating the name of the user who performed the search;
the relay node is the first location node; and
in the process, the control means calculates the genre score of the query of the base node using the genre score of a query indicated by another query node connected by an edge to the second location node or the user node that is connected by an edge to the relay node.

7. The genre score calculation method according to claim 5, wherein the plurality of nodes include a first location node indicating first location information, which indicates by means of a hierarchical structure the location of a document browsed by the user as a result of the search, and a second location node indicating second location information at a higher level than the first location information;
the relay node is the first location node;
the method further comprises a step in which the control means calculates, for each item of second location information, a parameter indicating the correlation between that second location information and the genre; and
in the process, the control means calculates the genre score of the query of the base node using the parameter and the genre score of a query indicated by another query node related to the second location node that is related to the base node.

8. A genre score calculation method comprising:
a step in which control means of a computer stores, in first storage means, a search log containing queries used for searches and the number of times each search was performed;
a step in which the control means inputs a query included in the search log stored in the first storage means to a search server that outputs a search result when a query is input, and acquires the search result for the query;
a step in which the control means extracts, as nodes, the query and items other than the query from among the items included in the search log stored in the first storage means;
a step in which the control means generates a graph in which query nodes indicating the extracted queries and relay nodes indicating the other extracted items are connected by edges;
a step in which the control means estimates the weights of the edges using the search log stored in the first storage means and the acquired search results;
a step in which the control means acquires, from second storage means storing a genre dictionary containing queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre, the queries and genre scores corresponding to a specified genre identifier; and
a step in which the control means calculates genre scores of a plurality of queries by taking, as a base node, a node identified from among target nodes, the target nodes being those query nodes in the generated graph that satisfy a predetermined condition, and repeatedly performing, while updating the base node until an end condition is satisfied, a process of calculating the genre score of the query of the base node using the estimated weights and, from among the acquired genre scores, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes.

9. A program causing a computer to execute:
a step of storing, in first storage means, a search log containing a plurality of items including a query used for a search and first location information indicating, by means of a hierarchical structure, the location of a document browsed as a result of the search;
a step of extracting, as nodes, the query and items other than the query from among the items included in the search log stored in the first storage means;
a step of generating a graph in which query nodes indicating the extracted queries and relay nodes indicating the other extracted items are connected by edges;
a step of acquiring, from second storage means storing a genre dictionary containing queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre, the queries and genre scores corresponding to a specified genre identifier; and
a step of calculating genre scores of a plurality of queries by taking, as a base node, a node identified from among target nodes, the target nodes being those query nodes in the generated graph that satisfy a predetermined condition, and repeatedly performing, while updating the base node until an end condition is satisfied, a process of calculating the genre score of the query of the base node using, from among the acquired genre scores, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes.

10. A program causing a computer to execute:
a step of storing, in first storage means, a search log containing queries used for searches and the number of times each search was performed;
a step of inputting a query included in the search log stored in the first storage means to a search server that outputs a search result when a query is input, and acquiring the search result for the query;
a step of extracting, as nodes, the query and items other than the query from among the items included in the search log stored in the first storage means;
a step of generating a graph in which query nodes indicating the extracted queries and relay nodes indicating the other extracted items are connected by edges;
a step of estimating the weights of the edges using the search log stored in the first storage means and the acquired search results;
a step of acquiring, from second storage means storing a genre dictionary containing queries, genre identifiers each indicating the genre of a query, and genre scores each indicating the degree of relevance between a query and a genre, the queries and genre scores corresponding to a specified genre identifier; and
a step of calculating genre scores of a plurality of queries by taking, as a base node, a node identified from among target nodes, the target nodes being those query nodes in the generated graph that satisfy a predetermined condition, and repeatedly performing, while updating the base node until an end condition is satisfied, a process of calculating the genre score of the query of the base node using the estimated weights and, from among the acquired genre scores, the genre scores of the queries indicated by other query nodes connected to the base node by edges via the relay nodes.