JP2013088923A

JP2013088923A - Important query extraction device, important query extraction method and important query extraction program

Info

Publication number: JP2013088923A
Application number: JP2011227090A
Authority: JP
Inventors: Kei Uchiumi; 慶内海
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2011-10-14
Filing date: 2011-10-14
Publication date: 2013-05-13
Anticipated expiration: 2031-10-14
Also published as: JP5524160B2

Abstract

PROBLEM TO BE SOLVED: To provide an important query extraction device capable of determining and outputting a query with a high degree of importance in relation to an input query.
SOLUTION: An important query extraction device 10 comprises: a query acquisition part 11 for acquiring queries; a query log storage part 17 for storing query logs; a first node setting part 121 for setting an acquired query as a first node; a second node setting part 122 for extracting, from the query logs, queries containing a word included in the query of the first node and setting the extracted queries as second nodes; a score setting part 13 for adding directed edges between the nodes and setting their scores; and a query output part 14 for identifying an important node on the basis of the scores and outputting its query as an important query. The score setting part 13 sets the score of a directed edge going from one node to another node on the basis of the number of words included in the other node and a numerical value showing the rarity of the words common to the two nodes.

Description

The present invention relates to an important query extraction device, an important query extraction method, and an important query extraction program for extracting an important query from among a plurality of queries.

Search engines with a keyword search function are used as systems for searching information on the Internet.
A user performing a search may, however, be unable to enter appropriate keywords. For this reason, a method has been proposed for presenting a more appropriate query to a user who has entered keywords (see, for example, Patent Document 1).

In Patent Document 1, a query rank is calculated based on the occurrence frequency of a query and a user satisfaction score for that query, and queries with a high query rank are extracted. A query that is highly similar to the input query and has a high query rank is then provided as a modified query.

Patent Document 1: JP 2008-535090 A

In Patent Document 1, however, even a query containing a spelling error is judged to have a high rank if it is entered frequently. The query the user actually wants is therefore not necessarily the one judged to have a high rank.

An object of the present invention is to provide an important query extraction device, an important query extraction method, and an important query extraction program that can determine and output a query of high importance in relation to an input query.

The important query extraction device of the present invention comprises: a query acquisition unit that acquires a query containing words serving as search keywords; a query log storage unit that sequentially stores the queries acquired by the query acquisition unit as a query log; a first node setting unit that sets the query acquired by the query acquisition unit as a first node; a second node setting unit that extracts, from the query log stored in the query log storage unit, queries containing a word included in the query of the first node and sets the extracted queries as second nodes; a score setting unit that adds directed edges between the nodes and sets a score for each directed edge; and a query output unit that identifies an important node based on the scores set by the score setting unit and outputs the query set in that node as an important query. The score setting unit sets the score of a directed edge going from one node to another node based on the number of words contained in the other node and a numerical value representing the rarity of the words common to the two nodes.

According to the present invention, the query log storage unit sequentially stores, as a query log, keyword search queries transmitted from terminal devices via a network such as the Internet. For example, when a query is input from a terminal device to a search server that searches web pages by keyword, the query is transferred from the search server to the important query extraction device of the present invention and stored in the query log storage unit. The important query extraction device of the present invention may therefore be incorporated into a search server for web pages or the like, or may be configured to communicate data with the search server.
When a query is input from a terminal device, the first node setting unit sets the input query as the first node. The second node setting unit extracts queries containing a word included in the query of the first node from the query log storage unit and sets them as second nodes.
For example, suppose the input query consists of two words: the proper noun “A”, which is a singer's name, and “song”, a word related to an attribute of the singer. The first node setting unit sets the query containing the two words “A, song” as the first node. The second node setting unit extracts queries containing the word “A” and queries containing the word “song” and sets them as second nodes. The extracted queries include queries entered in the past that contain the proper noun “A”, such as “A”, “A, image”, and “A, lyrics”, and queries entered in the past that contain “song”, such as “lyrics, song”; a plurality of second nodes are set from these.
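The extraction of second nodes described above can be sketched as follows. This is a minimal illustration, not the implementation of the device: representing a query as a tuple of words, the function name `extract_second_nodes`, the de-duplication of repeated log entries, and the skipping of entries identical to the input query are all assumptions made for the example.

```python
def extract_second_nodes(input_query, query_log):
    """Return distinct logged queries sharing at least one word with the input query.

    Queries are tuples of words (an assumed representation). Matching is by
    exact word equality, so a misspelled word does not match.
    """
    input_words = set(input_query)
    seen = set()
    second_nodes = []
    for logged in query_log:
        key = tuple(logged)
        # Skip the input query itself and queries already collected.
        if key == tuple(input_query) or key in seen:
            continue
        if input_words & set(key):
            seen.add(key)
            second_nodes.append(key)
    return second_nodes


# Hypothetical query log mirroring the "A, song" example in the text.
log = [
    ("A",),
    ("A", "image"),
    ("A", "lyrics"),
    ("lyrics", "song"),
    ("B", "image"),   # shares no word with the input query
    ("A",),           # repeated entry
]
second_nodes = extract_second_nodes(("A", "song"), log)
print(second_nodes)
```

In this sketch the query “B, image” is left out because it shares no word with “A, song”.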

The score setting unit adds directed edges between the nodes and sets a score for each directed edge. When there are a plurality of second nodes, directed edges are set not only between the first node and each second node but also between second nodes. The score setting unit sets the score of a directed edge between two nodes according to the direction of the edge. That is, two directed edges are set between the first node and a second node, a first directed edge from the second node to the first node and a second directed edge in the opposite direction, and a score is set for each.
The score of each directed edge between two nodes becomes higher when the words common to the two nodes are rarer. The numerical value representing rarity is, for example, the number of queries in the query log stored in the query log storage unit that contain the word; the smaller this number, the higher the rarity. For example, the word “A”, a singer's name, appears in fewer logged queries than a common noun such as “song”, because “song” is often entered together with other singers' names as well. Alternatively, a database of words may be prepared in advance, with a numerical value representing rarity set for each word.
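The rarity value described here, the number of logged queries containing a word, can be computed along the following lines. This is a sketch under assumptions: the function name `query_counts`, the tuple-of-words representation, and the sample log are illustrative only.

```python
from collections import Counter


def query_counts(query_log):
    """Count, for each word, how many logged queries contain it.

    A word repeated inside one query is counted once for that query.
    A smaller count means a rarer word.
    """
    counts = Counter()
    for query in query_log:
        counts.update(set(query))
    return counts


# Hypothetical log: "song" co-occurs with several singer names, "A" does not.
log = [
    ("A", "song"),
    ("A",),
    ("B", "song"),
    ("C", "song"),
    ("song", "lyrics"),
]
df = query_counts(log)
print(df["A"], df["song"])
```

Here “A” appears in fewer queries than “song”, so its reciprocal count is larger, which matches the description of a proper noun being rarer than a common noun.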

The score of a directed edge is also set in consideration of the number of words in the node toward which the edge points: the more words, the lower the score, and the fewer words, the higher the score. Suppose the two words “A (singer name), song” are input as the first node, and consider as second nodes a node containing only the singer name “A” and a node containing the three singer names “A, B, C”. For a user who entered singer name A and “song” as the query, the other singer names B and C are unnecessary as search keywords and act as noise. The query containing only the singer name “A” is therefore more important. Because “B” and “C” are both proper nouns, setting the score from word rarity alone would produce little difference between the scores of the two queries. When the number of words is also included in the score calculation, the score rises as the number of words falls, so the query containing only the singer name “A” receives the higher score.
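As a worked illustration of this comparison, assume the score takes the form of formula (1) given later in this description, with $q_1$ denoting the input query “A, song”. The only word shared with either candidate is “A”, so the rarity term is identical and only the word count of the destination node differs:

```latex
\begin{align*}
H_{q_1 \to \text{“A”}}       &= \frac{1}{1} \cdot \frac{1}{DF(\mathrm{A})} \\
H_{q_1 \to \text{“A, B, C”}} &= \frac{1}{3} \cdot \frac{1}{DF(\mathrm{A})}
\end{align*}
```

Under that assumption the edge toward “A” scores three times as high as the edge toward “A, B, C”, whatever the value of $DF(\mathrm{A})$.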

As described above, when a user enters a highly rare word, it can be presumed that the user most wants information related to that word. Even if a query contains a highly rare word, when the query contains many words, the other words may act as noise and prevent appropriate search results from being obtained. Accordingly, among queries containing that word, the one with the fewest words can be judged to be the most important query, and according to the present invention an important query can be accurately extracted from the query log stored in the query log storage unit. The important query extracted in this way is output by the query output unit to the user's terminal device or the like. The user operating the terminal device can thus obtain an important query free of spelling errors and misnotation, for example the correct notation of a named entity.

The user operating the terminal device can also select the important query as a query candidate. Accordingly, when the search results for the entered query contain nothing appropriate, the user can obtain appropriate search results by selecting the important query output from the important query extraction device of the present invention and searching again.

In the important query extraction device of the present invention, it is preferable that, when there is a node consisting only of the rarest word among the words contained in the input query, the score setting unit sets the highest score for the directed edges pointing to that node, and the query output unit outputs, as the important query, the query of the node whose incoming directed edge has the highest score.
The score setting unit raises the edge score for queries with fewer words and for rarer words. Consequently, among the words contained in the input query, the edge pointing to the node of the query that contains only the single rarest word has the highest score. The query output unit therefore outputs the query of that highest-scoring node as the important query.
Thus, if a query consisting only of a proper noun, such as singer name A, exists in the history of the query log storage unit, that query is output as the important query, and a user who searches with it can obtain search results about that proper noun.

In the important query extraction device of the present invention, it is preferable that the score setting unit obtains the edge score by the following formula (1), where $H_{i,j}$ is the score of the edge from node i to node j, $N_{q_j}$ is the number of words constituting the query $q_j$ set in node j, $DF(w)$ is a numerical value indicating the number of queries containing the word w in the query log storage unit, and $\sum_{w \in q_i \cap q_j} 1/DF(w)$ is the sum of the reciprocals of $DF(w)$ over the words common to node i and node j.
[Equation 1]
$$H_{i,j} = \frac{1}{N_{q_j}} \sum_{w \in q_i \cap q_j} \frac{1}{DF(w)} \tag{1}$$

With formula (1), the number of words in node j is in the denominator, so the score decreases as the number of words increases. In addition, for the words common to the two nodes, the reciprocal of DF(w), the number of queries containing the word, is summed over the common words. Hence the fewer queries contain a word, that is, the rarer the word, the larger the score.
The score therefore becomes large when the destination node has few words and the words common to the two nodes are rare, and the larger the score, the higher the importance of the query can be judged to be.
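Formula (1) can be expressed as a short function. The sketch below is illustrative only: the name `edge_score`, the tuple-of-words query representation, and the dictionary of query counts are assumptions, and the counts used in the usage example are placeholder values rather than measured data.

```python
def edge_score(q_i, q_j, df):
    """Score of the directed edge from node i to node j per formula (1).

    q_i, q_j: queries as tuples of words (assumed representation).
    df: mapping from word to the number of logged queries containing it.
    """
    shared_words = set(q_i) & set(q_j)
    rarity_sum = sum(1.0 / df[w] for w in shared_words)
    return rarity_sum / len(q_j)  # divide by N_{q_j}, the word count of node j


# Placeholder counts: the proper noun is rarer than the common noun.
df = {"A": 1000, "song": 10000}
to_short = edge_score(("A", "song"), ("A",), df)   # destination has 1 word
to_long = edge_score(("A",), ("A", "song"), df)    # destination has 2 words
print(to_short, to_long)
```

The two calls share the same common word, so the only difference is the word count of the destination node; the edge toward the one-word query scores twice as high. Queries with no shared word yield a score of zero, which in this sketch corresponds to having no edge.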

The important query extraction method of the present invention is a method of extracting an important query by a computer, wherein the computer includes a query log storage unit in which a history of queries is stored, and executes: a query acquisition step of acquiring a query containing words serving as search keywords; a first node setting step of setting the query acquired in the query acquisition step as a first node; a second node setting step of extracting, from the history of queries stored in the query log storage unit, queries containing a word included in the query of the first node and setting the extracted queries as second nodes; a score setting step of adding directed edges between the nodes and setting a score for each directed edge; and a query output step of identifying an important node based on the scores set in the score setting step and outputting the query set in that node as an important query. In the score setting step, the score of a directed edge going from one node to another node is set based on the number of words contained in the other node and a numerical value representing the rarity of the words common to the two nodes.

The important query extraction program of the present invention causes a computer to execute: a query acquisition step of acquiring a query containing words serving as search keywords; a first node setting step of setting the query acquired in the query acquisition step as a first node; a second node setting step of extracting, from the history of queries stored in a query log storage unit, queries containing a word included in the query of the first node and setting the extracted queries as second nodes; a score setting step of adding directed edges between the nodes and setting a score for each directed edge; and a query output step of identifying an important node based on the scores set in the score setting step and outputting the query set in that node as an important query. In the score setting step, the score of a directed edge going from one node to another node is set based on the number of words contained in the other node and a numerical value representing the rarity of the words common to the two nodes.

The important query extraction method and the important query extraction program provide the same operation and effects as the important query extraction device.

A block diagram showing the configuration of a search system according to an embodiment of the present invention.
A diagram showing the directed graph of the embodiment.
A flowchart showing the important query extraction process of the embodiment.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 shows a search system 1 that uses the important query extraction device 10 of the present invention.
The search system 1 includes the important query extraction device 10, an information search device 20 that collects and searches web page information, and a terminal device 30. The important query extraction device 10 and the information search device 20 can communicate with the terminal device 30 via the Internet (not shown). The important query extraction device 10 and the terminal device 30 together constitute a search assistance system.

The Internet here is a network based on general-purpose protocols such as TCP/IP. Each terminal device 30 is preferably able to connect to the Internet while on the move. To this end, the terminal device 30 connects to the Internet via a mobile phone network or via a wireless LAN (local area network).
The configuration for connecting the terminal device 30 to the important query extraction device 10 and the information search device 20 is not limited to the Internet. Any configuration capable of transmitting and receiving data can be used, such as a network, for example a communication line network or broadcast network in which a plurality of base stations capable of exchanging information over a wireless medium form the network, or the wireless medium itself serving as the medium for receiving data directly.

The important query extraction device 10 and the information search device 20 are each implemented, in terms of hardware, as a server device including storage means (not shown) such as RAM (random access memory) and ROM (read-only memory) and control means such as a CPU (central processing unit). The RAM provides a temporary area for temporarily storing data and the like, and the ROM stores an OS (operating system: basic software), programs for controlling the servers, various applications, various data, and the like. The CPU reads the programs stored in these storage means and performs various processes in accordance with them. In particular, the important query extraction device 10 reads the important query extraction program stored in the storage means and executes the important query extraction method described later.

[Information search device]
As shown in FIG. 1, the information search device 20 includes a crawler unit 21, a web page database (web page DB) 22, and a search engine 23.
The crawler unit 21 is a search robot that crawls web pages on the Internet, collects their contents, and registers them in the web page DB 22.
The search engine 23 performs full-text search and the like. When a search keyword is input from the terminal device 30, the search engine 23 searches the web page DB 22 for the keyword and outputs the search results to the terminal device 30.
Such an information search device 20 is conventionally known, so a detailed description is omitted.

[Important query extraction device]
As shown in FIG. 1, the important query extraction device 10 includes a query acquisition unit 11, a node setting unit 12, a score setting unit 13, a query output unit 14, a query log storage unit 17, and a query relation storage unit 18. The node setting unit 12 includes a first node setting unit 121 and a second node setting unit 122.

The query acquisition unit 11 obtains from the information search device 20 the queries that terminal devices 30 input to the information search device 20, and stores them in the query log storage unit 17. The query log storage unit 17 thus holds the history of past queries input to the information search device 20 from various terminal devices 30.
The query acquisition unit 11 also outputs the query currently input from the information search device 20 to the node setting unit 12.

The node setting unit 12 sets the nodes of a directed graph representing the relationships among the queries acquired by the query acquisition unit 11, and stores them in the query relation storage unit 18.
The processing in the node setting unit 12 is described using a specific example.
First, an example of the directed graph (query relation model) that is set by the node setting unit 12 and stored in the query relation storage unit 18 is described with reference to FIG. 2.
The directed graph shown in FIG. 2 is an example for the case where a query containing two words, the singer name ABCDEF and “song”, is input.

The first node setting unit 121 of the node setting unit 12 sets the input query “ABCDEF, song” as the first node 41.
Next, the second node setting unit 122 extracts from the query log storage unit 17 the queries containing the word “ABCDEF” or the word “song” included in the first node 41, and sets them as second nodes. In the example of FIG. 2, the second nodes are a node 42 containing only the singer name “ABCDEF”, a node 43 consisting of the two words “ABCDEF, image”, a node 44 consisting of the two words “ABCDEF, lyrics”, a node 45 consisting of the two words “lyrics, song”, and a node 46 consisting only of “song”.
A query 47, “ABGDEF”, in which the singer name is misspelled, is also stored in the history. If similar queries were also extracted, as in the conventional approach, query 47 would also be set as a second node and given the edges described below; in this embodiment, however, no edge is set for similar queries. In FIG. 2 the edge portion is therefore marked with an “×”.

Directed edges 51 to 70 are set between the nodes 41 to 46. That is, between two nodes, two edges are set: a directed edge from one node to the other and a directed edge in the opposite direction.
The node setting unit 12 creates a directed graph such as the one shown in FIG. 2 and stores it in the query relation storage unit 18.

The score setting unit 13 sets the score of each of the directed edges 51 to 70. The score setting unit 13 sets the score of a directed edge going from one node to another node based on the number of words contained in the other node and the rarity of the words common to the two nodes.
Specifically, the score setting unit 13 sets the highest score for the edges pointing to the node that consists only of the rarest word among the words contained in the input query.
The rarity of a word is determined by the number of queries in the query history that contain the word: the fewer such queries, the higher the rarity. Comparing a proper noun such as a singer name with common nouns such as “song”, “image”, and “lyrics”, the proper noun usually appears in fewer queries and is therefore rarer. In the graph of FIG. 2, the directed edges 51, 57, and 59 pointing to the node 42, which consists only of the rare word (singer name ABCDEF), consequently have the highest score.

An example of a specific score calculation formula used in the score setting unit 13 is the following formula (2).
[Equation 2]
$$H_{i,j} = \frac{1}{N_{q_j}} \sum_{w \in q_i \cap q_j} \frac{1}{DF(w)} \tag{2}$$

Here, $H_{i,j}$ is the score of the directed edge from node i to node j, and $N_{q_j}$ is the number of words constituting the query $q_j$ set in node j. $DF(w)$ is a numerical value indicating the rarity of the word w; in this embodiment it is the number of queries containing the word w in the query history of the query log storage unit 17. $\sum_{w \in q_i \cap q_j} 1/DF(w)$ is the sum of the reciprocals of $DF(w)$ over the words w common to the query $q_i$ set in node i and the query $q_j$ set in node j.

An example of calculating the score for the directed edges 51 and 52 is as follows.
The directed edge 51 goes from the node 41 (node i in formula (2)) to the node 42 (node j in formula (2)). The query set in the node 42 consists only of the singer name ABCDEF, so its word count $N_{q_j}$ is 1. If the history in the query log storage unit 17 contains 1000 queries that include the singer name ABCDEF, then DF(ABCDEF) = 1000.
The only word common to the nodes 41 and 42 is ABCDEF, so $\sum_{w \in q_i \cap q_j} 1/DF(w) = 1/1000$. The score is therefore $H_{i,j} = 1/1 \times 1/1000 = 0.001$.

The directed edge 52, in contrast, goes from the node 42 to the node 41, and the query set in the node 41 has a word count of 2.
The only word common to the nodes 41 and 42 is ABCDEF, so $\sum_{w \in q_i \cap q_j} 1/DF(w) = 1/1000$. The score is therefore $H_{i,j} = 1/2 \times 1/1000 = 0.0005$, which is lower than that of the directed edge 51.

The directed edges 57 and 59 have the same common word, ABCDEF, and the node they point to also has a word count of 1, so they have the same score as the directed edge 51.
The other directed edges among 53 to 62 likewise have ABCDEF as the common word, and the node they point to has a word count of 2, so they have the same score as the directed edge 52.

For directed edges 63 to 70, on the other hand, the word common to the two nodes is "song" or "lyrics". Common nouns like these appear in a large number of the queries stored in the query log. If 10,000 queries contain the word "song", DF(song) = 10000. The score of directed edges 67 and 69 is therefore 1/1 * 1/10000 = 0.0001, and the score of directed edges 63, 64, 68, and 70 is 1/2 * 1/10000 = 0.00005.
If 8000 queries contain the word "lyrics", DF(lyrics) = 8000, and the score of directed edges 65 and 66 is 1/2 * 1/8000 = 0.0000625.
The score setting unit 13 stores the score of each directed edge in the directed graph held in the query relation storage unit 18. Specifically, the scores are set by treating H_{i,j} as a matrix.
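Treating H_{i,j} as a matrix can be sketched as follows. FIG. 2 is not reproduced in this passage, so the five queries below are hypothetical stand-ins for its nodes; the DF values are those used in the examples above, and the function name `score_matrix` is an assumption.

```python
def score_matrix(queries, df):
    """Build H, where H[i][j] is the score of the edge from node i to node j.

    Entries stay 0.0 where the two queries share no word (no edge) and on
    the diagonal.
    """
    n = len(queries)
    H = [[0.0] * n for _ in range(n)]
    for i, q_i in enumerate(queries):
        for j, q_j in enumerate(queries):
            common = set(q_i) & set(q_j)
            if i != j and common:
                H[i][j] = sum(1.0 / df[w] for w in common) / len(q_j)
    return H

# Hypothetical node contents; DF values taken from the worked examples.
queries = [
    ["ABCDEF", "song"],    # node 0
    ["ABCDEF"],            # node 1
    ["ABCDEF", "lyrics"],  # node 2
    ["song"],              # node 3
    ["song", "lyrics"],    # node 4
]
df = {"ABCDEF": 1000, "song": 10000, "lyrics": 8000}
H = score_matrix(queries, df)
```

In this sketch `H[0][1]` corresponds to the 0.001 case, `H[0][3]` to the 0.0001 case, and `H[2][4]` to the 0.0000625 case. The matrix is not symmetric, because each entry is divided by the word count of the destination node.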

The query output unit 14 finds the directed edge with the highest score among those set by the score setting unit 13 and determines that the node to which that edge points is important. In the example of FIG. 2, directed edges 51, 57, and 59 have the highest score. The query output unit 14 therefore determines that node 42 is important and outputs its query, "ABCDEF", to the terminal device 30 as the important query.
The user can then search with the important query provided by the important query extraction device 10 and obtain the desired information.
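The selection performed by the query output unit 14 amounts to finding the largest entry of the score matrix and returning the query of the destination node. The sketch below assumes a small hand-written matrix and query list; the function name `most_important` and the tie-breaking rule (the first maximum found is kept) are assumptions, since the description does not state how ties between different destination nodes would be resolved.

```python
def most_important(queries, H):
    """Return the query of the node that the highest-scoring edge points to.

    H[i][j] is the score of the edge from node i to node j; 0.0 means no
    edge. Returns None if the graph has no edges.
    """
    best_score, best_j = 0.0, None
    for row in H:
        for j, score in enumerate(row):
            if score > best_score:   # strict comparison keeps the first maximum
                best_score, best_j = score, j
    return None if best_j is None else queries[best_j]

# Hypothetical three-node graph with scores of the kind computed above.
queries = ["ABCDEF song", "ABCDEF", "song"]
H = [
    [0.0,     0.001, 0.0001],
    [0.0005,  0.0,   0.0],
    [0.00005, 0.0,   0.0],
]
important = most_important(queries, H)
```

Here the highest score, 0.001, belongs to the edge pointing to the second node, so that node's query is selected.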

[Operation of the important query extraction device]
Next, the important query extraction process in the important query extraction device 10 is described with reference to the flowchart of FIG. 3. As noted above, the process shown in FIG. 3 is carried out by executing the important query extraction program.
The query acquisition unit 11 of the important query extraction device 10 acquires a query transmitted from the terminal device 30 (step S11). The query acquisition unit 11 stores the acquired query in the query log storage unit 17 and also outputs it to the node setting unit 12.
The first node setting unit 121 of the node setting unit 12 sets the acquired query as the first node (step S12).

Next, the second node setting unit 122 extracts from the query log storage unit 17 the queries that contain a word of the first node and sets them as second nodes (step S13). The second node setting unit 122 also sets directed edges between the nodes, creates a directed graph as a model of the relationships between the queries, and stores the graph in the query relation storage unit 18 (step S14).

Next, the score setting unit 13 calculates the score of each edge and stores it in the query relation storage unit 18 (step S15).
The query output unit 14 finds the highest of the edge scores and extracts the important query (step S16).
The query output unit 14 then outputs the extracted important query to the terminal device 30 (step S17).
This completes the important query extraction process. The important query extraction device 10 executes steps S11 to S17 each time a query is input from the terminal device 30.
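Steps S11 to S16 can be sketched end to end as below. This is an illustration under stated assumptions, not the claimed implementation: the query log is modeled as a plain list of strings, queries are split into words on whitespace, DF(w) is recomputed from the log on every call, and the function name and sample log are hypothetical.

```python
from collections import Counter

def extract_important_query(query, query_log):
    """Return the important query for `query`, or None if no edge exists."""
    query_log.append(query)                        # S11: acquire and log
    first = query.split()                          # S12: first node
    nodes, seen = [first], {tuple(first)}
    for logged in query_log:                       # S13: second nodes
        words = logged.split()
        if set(words) & set(first) and tuple(words) not in seen:
            seen.add(tuple(words))
            nodes.append(words)
    # DF(w): number of logged queries that contain word w
    df = Counter(w for logged in query_log for w in set(logged.split()))
    best_score, best = 0.0, None
    for i, q_i in enumerate(nodes):                # S14: directed edges
        for j, q_j in enumerate(nodes):
            common = set(q_i) & set(q_j)
            if i != j and common:
                # S15: score of the edge from node i to node j
                score = sum(1.0 / df[w] for w in common) / len(q_j)
                if score > best_score:             # S16: keep the maximum
                    best_score, best = score, q_j
    return " ".join(best) if best else None

# Hypothetical query history.
log = ["ABCDEF", "ABCDEF lyrics", "song", "song ranking",
       "new song", "song lyrics"]
important = extract_important_query("ABCDEF song", log)
```

With this sample log, ABCDEF appears in fewer logged queries than "song", so the edge toward the single-word query "ABCDEF" receives the highest score. Step S17, sending the result to the terminal device 30, is outside the scope of the sketch.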

The terminal device 30 displays the important query output from the important query extraction device 10. When the user selects the displayed query, the terminal device 30 sends the query to the search engine 23, then acquires and displays the search results.

[Effects of the embodiment]
The embodiment described above provides the following effects.
The important query extraction device 10 extracts from the query log storage unit 17 the queries that contain a word included in the query input from the terminal device 30 and sets each of them in a node. It then sets the score of each edge between nodes based on a value indicating the rarity of the words common to the two nodes and on the number of words in the node to which the edge points, so that an edge scores higher when the common word is rarer and the destination node has fewer words. Consequently, when a query consisting of a proper noun and a common noun is input, and a query consisting of that proper noun alone is stored in the query log storage unit 17, that query can be extracted as the important query and output to the terminal device 30. In general, when a user enters a highly rare word such as a proper noun, that is, a word that appears in few of the queries stored in the query log, the user can be presumed to want information related to that word most of all. Among the queries containing that word, the one with the fewest words can therefore be judged the most important.
As a result, the user of the terminal device 30 can obtain from the important query extraction device 10 an important query that is free of spelling errors and incorrect notation, such as the correct notation of a named entity. A new search using this important query is more likely to yield the desired information.

In this embodiment, the numerical value representing the rarity of a word is the number of queries stored in the query log storage unit 17. Both the setting of each node and the calculation of the edge scores can therefore be performed using only the query log (history) stored in the query log storage unit 17. In other words, extracting the important query requires only that the query history be kept in the query log storage unit 17; there is no need to prepare, for example, a log of behavioral history recording how each user entered a series of queries. The system load is therefore small, and important queries can be extracted with a simple system.

Furthermore, even when similar queries containing spelling errors are entered frequently, this embodiment sets as nodes only queries that contain a matching word, and no edge is set for words that are merely similar. This prevents a similar query containing a spelling error from being mistakenly judged to be the important query.
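The exact-match condition can be illustrated with a short sketch. The function name `has_edge`, the whitespace tokenization, and the misspelling "ABCDEG" are assumptions made for the example.

```python
def has_edge(query_a, query_b):
    """True if the two queries share at least one identical word.

    Only exact word matches count; similar spellings do not create an edge.
    """
    return bool(set(query_a.split()) & set(query_b.split()))

exact = has_edge("ABCDEF song", "ABCDEF")      # shares the word ABCDEF
misspelt = has_edge("ABCDEF", "ABCDEG")        # similar, but no common word
```

Because the misspelled query shares no word with the correctly spelled one, no edge points to it from that query, however often the misspelling appears in the log.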

[Modifications]
The present invention is not limited to the embodiment described above and includes the following modifications within the scope in which the object of the invention can be achieved.
For example, in the above embodiment the information retrieval device 20 and the important query extraction device 10 are configured as separate server devices, but they may be configured as a single server device.

The formula for calculating the edge score is not limited to Formula (2). Any formula may be used as long as the score becomes higher as the words of the query become rarer and as the number of words in the query becomes smaller.

The present invention can be used as an important query extraction device, an important query extraction method, and an important query extraction program that can determine and output a highly important query related to an input query.

Description of symbols: 1 ... search system, 10 ... important query extraction device, 11 ... query acquisition unit, 12 ... node setting unit, 121 ... first node setting unit, 122 ... second node setting unit, 13 ... score setting unit, 14 ... query output unit, 17 ... query log storage unit, 18 ... query relation storage unit, 20 ... information retrieval device, 30 ... terminal device.

Claims

An important query extraction device comprising:
a query acquisition unit that acquires a query including a word as a search keyword;
a query log storage unit that sequentially stores the queries acquired by the query acquisition unit as a query log;
a first node setting unit that sets a query acquired by the query acquisition unit as a first node;
a second node setting unit that extracts a query including a word included in the query of the first node from the query log stored in the query log storage unit, and sets the extracted query as a second node;
a score setting unit that adds a directed edge between the nodes and sets a score of each directed edge; and
a query output unit that identifies an important node based on the score set by the score setting unit and outputs the query set in that node as an important query,
wherein the score setting unit sets the score of the directed edge from one node to the other node based on the number of words included in the other node and a numerical value representing the rarity of the words common to the two nodes.

The important query extraction device according to claim 1, wherein
when there is a node consisting only of the rarest word among the words included in the input query, the score setting unit sets the highest score for the directed edge toward that node, and
the query output unit outputs, as the important query, the query of the node toward which the directed edge with the highest score points.

The important query extraction device according to claim 1 or 2, wherein
the score setting unit sets the edge score according to
H_{i,j} = 1/N_q_j * (Σ_{w∈q_i∩q_j} 1/DF(w))
where H_{i,j} is the score of the edge from node i to node j, N_q_j is the number of words constituting the query set in node j, DF(w) is a numerical value indicating the number of queries in the query log storage unit that include word w, and Σ_{w∈q_i∩q_j} 1/DF(w) is the sum of the reciprocals of DF(w) over the words included in both node i and node j.

An important query extraction method for extracting an important query by a computer, wherein
the computer, which includes a query log storage unit storing a history of queries, executes:
a query acquisition step of acquiring a query including a word as a search keyword;
a first node setting step of setting the query acquired in the query acquisition step as a first node;
a second node setting step of extracting a query including a word included in the query of the first node from the query history stored in the query log storage unit, and setting the extracted query as a second node;
a score setting step of adding a directed edge between the nodes and setting a score of each directed edge; and
a query output step of identifying an important node based on the score set in the score setting step, and outputting the query set in that node as an important query,
wherein, in the score setting step, the score of the directed edge from one node to the other node is set based on the number of words included in the other node and a numerical value representing the rarity of the words common to the two nodes.

An important query extraction program for causing a computer to execute:
a query acquisition step of acquiring a query including a word as a search keyword;
a first node setting step of setting the query acquired in the query acquisition step as a first node;
a second node setting step of extracting a query including a word included in the query of the first node from a query history stored in a query log storage unit, and setting the extracted query as a second node;
a score setting step of adding a directed edge between the nodes and setting a score of each directed edge; and
a query output step of identifying an important node based on the score set in the score setting step, and outputting the query set in that node as an important query,
wherein, in the score setting step, the score of the directed edge from one node to the other node is set based on the number of words included in the other node and a numerical value representing the rarity of the words common to the two nodes.