JP2017228114A

JP2017228114A - Query analysis device, query analysis method and program

Info

Publication number: JP2017228114A
Application number: JP2016124367A
Authority: JP
Inventors: Takeshi Tamura (田村 健); Shinji Ikemiya (池宮 伸次)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-06-23
Filing date: 2016-06-23
Publication date: 2017-12-28
Anticipated expiration: 2036-06-23
Also published as: JP6779047B2

Abstract

PROBLEM TO BE SOLVED: To provide a query analysis device, a query analysis method, and a program capable of using queries to analyze the needs of users who perform searches and/or changes in those needs over time.
SOLUTION: A query analysis device comprises: a query information acquisition unit that acquires query information in which, for each user, a searched query is associated with the time at which the query was searched; an overlap level score calculation unit that calculates, based on the query information, an overlap level score indicating the degree of overlap between users who searched a first query and users who searched a second query; a search time difference calculation unit that calculates, based on the query information, a search time difference, which is the difference between the time at which the first query was searched and the time at which the second query was searched; and a display information generation unit that generates information for displaying the overlap level score and the search time difference in association with the first query and the second query.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a query analysis device, a query analysis method, and a program.

Conventionally, techniques have been used that create a thesaurus dictionary based on queries entered on a search site. Specifically, a known technique takes search words whose search times lie within a predetermined interval of each other, pairs a search word with an earlier search time with a search word with a later search time to generate paired search words, and generates a thesaurus dictionary from the generated paired search words (see Patent Document 1).

Patent Document 1: JP 2013-109701 A
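The pairing step of the conventional technique summarized above can be illustrated with a short Python sketch. It is not taken from Patent Document 1: it assumes each user's history is a list of `(timestamp, word)` tuples, reads "within a predetermined time" as the gap between two consecutive searches, and uses a hypothetical function name `make_pair_words` and a hypothetical 10-minute window.

```python
from datetime import datetime, timedelta

def make_pair_words(log, window=timedelta(minutes=10)):
    """Pair each search word with the next one searched within `window`.

    `log` is one user's history as (timestamp, word) tuples. It is sorted
    here so that the earlier word always comes first in each pair.
    """
    log = sorted(log)
    pairs = []
    for (t1, w1), (t2, w2) in zip(log, log[1:]):
        if t2 - t1 <= window:
            pairs.append((w1, w2))  # (earlier word, later word)
    return pairs

log = [
    (datetime(2016, 6, 1, 9, 0), "cold"),
    (datetime(2016, 6, 1, 9, 3), "cough medicine"),
    (datetime(2016, 6, 1, 15, 0), "weather"),
]
print(make_pair_words(log))  # [('cold', 'cough medicine')]
```

Only the first two searches fall within the window, so a single pair is produced; a real thesaurus builder would then aggregate such pairs across many users.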

However, although the technique disclosed in Patent Document 1 can convert search words using the generated thesaurus dictionary, it cannot use the queries entered on a search site to analyze the needs of searching users or how those needs change over time.

The present invention has been made in view of these circumstances, and one of its objects is to provide a query analysis device, a query analysis method, and a program capable of using queries to analyze the needs of searching users and changes in those needs over time.

One aspect of the present invention is a query analysis device comprising: a query information acquisition unit that acquires query information in which, for each user, a searched query is associated with the time at which the query was searched; an overlap score calculation unit that calculates, based on the query information acquired by the query information acquisition unit, an overlap score indicating the degree of overlap between users who searched a first query and users who searched a second query; a search time difference calculation unit that calculates, based on the query information acquired by the query information acquisition unit, a search time difference, which is the difference between the time at which the first query was searched and the time at which the second query was searched; and a display information generation unit that generates information for displaying the overlap score calculated by the overlap score calculation unit and the search time difference calculated by the search time difference calculation unit in association with the first query and the second query.

According to one aspect of the present invention, the needs of searching users and changes in those needs over time can be analyzed using queries.

FIG. 1 is a diagram illustrating the configuration of a query analysis system 10 according to an embodiment.
FIG. 2 is a sequence diagram illustrating web page search processing according to the embodiment.
FIG. 3 is a diagram illustrating an example of query information stored in the storage unit 120 according to the embodiment.
FIG. 4 is a diagram for explaining the overlap score calculation according to the embodiment.
FIG. 5 is a diagram for explaining the search time difference calculation according to the embodiment.
FIG. 6 is a diagram illustrating an example of the query analysis window W before analysis starts, according to the embodiment.
FIG. 7 is a diagram illustrating an example of the query analysis window W after analysis ends, according to the embodiment.
FIG. 8 is a diagram for explaining the clustering process according to the embodiment.
FIG. 9 is a diagram illustrating an example of the heat map M according to the embodiment.
FIG. 10 is a flowchart illustrating the query analysis process according to the embodiment.

Embodiments of a query analysis device, a query analysis method, and a program will be described below with reference to the drawings. The query analysis device acquires the history of searches performed via a network or the like, extracts one or more second queries that are highly correlated with a first query, and visualizes the relationship between the first query and the second queries. The query analysis device may be implemented by installing a tool (program) on a computer, or it may be a device that provides analysis results through a cloud service. The query analysis device makes it possible to analyze the needs of searching users and how those needs change over time.

<1. Configuration of query analysis system>
FIG. 1 is a diagram illustrating the configuration of a query analysis system 10 according to the embodiment. The query analysis system 10 of the embodiment includes a web server 100, a query analysis device 200, and a user terminal 300.

The web server 100, the query analysis device 200, and the user terminal 300 are connected to a network NW. The network NW includes, for example, some or all of a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, a provider device, a wireless base station, a dedicated line, and the like.

The web server 100 includes a control unit 110 and a storage unit 120. The control unit 110 may be implemented, for example, by a processor of the web server 100 executing a program, by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware in cooperation.

The storage unit 120 is implemented by, for example, a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or a hybrid storage device combining two or more of these. Part or all of the storage unit 120 may also be an external device accessible by the web server 100, such as a NAS (Network Attached Storage) or an external storage server.

The query analysis device 200 is used by an analyst who analyzes queries and is, for example, a notebook or desktop computer, although it is not limited to these. For example, the query analysis device 200 may be a mobile phone such as a smartphone, a tablet terminal, or a PDA (Personal Digital Assistant).

The query analysis device 200 includes an input unit 210, a display unit 220, a query information acquisition unit 230, an overlap score calculation unit 240, a search time difference calculation unit 250, a clustering unit 260, a heat map generation unit 270, a display information generation unit 280, and a storage unit 290. The input unit 210 is an input device such as a keyboard or a mouse. When the query analysis device 200 is a mobile phone such as a smartphone, or a tablet terminal, the input unit 210 may be an input device such as a touch panel. The display unit 220 is a display device such as a liquid crystal display.

The query information acquisition unit 230, the overlap score calculation unit 240, the search time difference calculation unit 250, the clustering unit 260, the heat map generation unit 270, and the display information generation unit 280 may each be implemented, for example, by a processor of the query analysis device 200 executing a program, by hardware such as an LSI, an ASIC, or an FPGA, or by software and hardware in cooperation.

The storage unit 290 is implemented by, for example, a RAM, a ROM, an HDD, a flash memory, or a hybrid storage device combining two or more of these. Part or all of the storage unit 290 may also be an external device accessible by the query analysis device 200, such as a NAS or an external storage server.

The user terminal 300 is used by a user and is, for example, a mobile phone such as a smartphone, or a tablet terminal, although it is not limited to these. For example, the user terminal 300 may be a notebook computer, a desktop computer, or a PDA.

The user terminal 300 includes a control unit 310, an input unit 320, and a display unit 330. The control unit 310 may be implemented, for example, by a processor of the user terminal 300 executing a program, by hardware such as an LSI, an ASIC, or an FPGA, or by software and hardware in cooperation.

The input unit 320 is an input device such as a touch panel. When the user terminal 300 is a notebook or desktop computer, the input unit 320 may be an input device such as a keyboard or a mouse. The display unit 330 is a display device such as a liquid crystal display.

<2. Web page search processing>
FIG. 2 is a sequence diagram illustrating web page search processing according to the embodiment. First, the user uses the input unit 320 of the user terminal 300 to enter an instruction to display a search page on the display unit 330. The search page is a page of a search site provided by the operator of the web server 100. Based on the entered instruction, the control unit 310 of the user terminal 300 sends an HTTP (Hypertext Transfer Protocol) request to the web server 100 (S10).

On receiving the HTTP request from the user terminal 300, the control unit 110 of the web server 100 reads the search page generation information stored in advance in the storage unit 120 (S11). The control unit 110 then sends the read search page generation information to the user terminal 300 (S12). The search page generation information is, for example, text data written in HTML (HyperText Markup Language) or the like, style sheets, image data, video data, and audio data.

On receiving the search page generation information from the web server 100, the control unit 310 of the user terminal 300 uses it to display the search page on the display unit 330 (S13). The user then uses the input unit 320 of the user terminal 300 to enter a query on the search page displayed on the display unit 330. A query is a single search word, or a combination of multiple search words, entered on the search page.

The control unit 310 of the user terminal 300 generates query information based on the query entered by the user. In addition to the query, the query information includes the user's identification information, the search date and time, and so on. The control unit 310 sends the generated query information to the web server 100 (S14).

On receiving the query information from the user terminal 300, the control unit 110 of the web server 100 stores it in the storage unit 120 (S15). Specifically, the control unit 110 classifies the query information by user, based on the user identification information it contains, and stores it in the storage unit 120.
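Step S15 amounts to grouping incoming records by user. A minimal sketch, assuming each piece of query information arrives as a record with a user ID, the query string, and the search date and time; the field names and the function name `store_query_info` are hypothetical. The resulting dictionary plays the role of the per-user query information Q1 to Qn shown in FIG. 3.

```python
from collections import defaultdict
from datetime import datetime

def store_query_info(records):
    """Group received query information by user ID (step S15).

    Each record is a dict with hypothetical keys "user_id", "query", and
    "searched_at". The result maps a user ID to that user's
    (query, search date/time) entries in arrival order.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user_id"]].append((rec["query"], rec["searched_at"]))
    return dict(by_user)

records = [
    {"user_id": "u1", "query": "car model 1", "searched_at": datetime(2015, 3, 1, 10, 0)},
    {"user_id": "u2", "query": "car model 2", "searched_at": datetime(2015, 3, 1, 11, 0)},
    {"user_id": "u1", "query": "car model 2", "searched_at": datetime(2015, 3, 2, 9, 30)},
]
logs = store_query_info(records)
print(len(logs["u1"]))  # 2
```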

Next, the control unit 110 performs search processing based on the query contained in the received query information (S16). In the search processing, the control unit 110 generates, as the search result, a list of pages that contain the search words indicated by the query. The control unit 110 then sends the generated search result to the user terminal 300 (S17).

On receiving the search result from the web server 100, the control unit 310 of the user terminal 300 displays it on the display unit 330 (S18). In this way, search processing is performed based on the query entered by the user.

In the present embodiment, the web server 100 is given the search function to simplify the explanation, but the configuration is not limited to this. For example, the query analysis system 10 may include a web server and a search server as separate devices.

<3. Overlap score calculation processing>
Next, the process of calculating the overlap score will be described. The overlap score is a value indicating the degree of overlap between users who searched a certain query and users who searched another query, and it is used as an index for query analysis. The overlap score calculation unit 240 of the query analysis device 200 calculates the overlap score using the query information.

FIG. 3 is a diagram illustrating an example of the query information stored in the storage unit 120 according to the embodiment. As shown in FIG. 3, the storage unit 120 stores query information Q1 to Qn (n: total number of users), classified by user. For example, query information Q1 is the query information of user 1, query information Q2 is that of user 2, ..., and query information Qn is that of user n. In each of query information Q1 to Qn, the queries entered by the user are associated with their search dates and times.

FIG. 4 is a diagram for explaining the overlap score calculation according to the embodiment. In FIG. 4, ALLuser denotes the total number of users, Auser the number of users who entered query A, and Buser the number of users who entered query B. Although ALLuser, Auser, and Buser are described here as numbers of users, they may instead be numbers of searches. The values of ALLuser, Auser, and Buser are calculated from the query information.

The query information acquisition unit 230 of the query analysis device 200 sends a query information request to the web server 100. On receiving the request, the control unit 110 of the web server 100 reads query information Q1 to Qn (FIG. 3) from the storage unit 120 and sends it to the query analysis device 200.

The query information acquisition unit 230 acquires query information Q1 to Qn sent from the control unit 110 of the web server 100 and stores it in the storage unit 290. The overlap score calculation unit 240 reads query information Q1 to Qn from the storage unit 290 and, based on it, calculates the total number of users ALLuser, the number of users Auser who searched query A, and the number of users Buser who searched query B.
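The user counts can be derived from per-user logs as in the sketch below. It assumes the logs map a user ID to `(query, searched_at)` entries, takes ALLuser to be the number of users present in the logs, and counts each user at most once per query (the user-count variant, not the search-count variant the text also allows). The function name and the toy data are hypothetical; the search times are placeholders that are not used here.

```python
def count_users(logs, query_a, query_b):
    """Return (ALLuser, Auser, Buser, |Auser ∩ Buser|) from per-user logs.

    `logs` maps a user ID to a list of (query, searched_at) entries.
    A user is counted once per query no matter how often it was searched.
    """
    users_a = {u for u, entries in logs.items() if any(q == query_a for q, _ in entries)}
    users_b = {u for u, entries in logs.items() if any(q == query_b for q, _ in entries)}
    return len(logs), len(users_a), len(users_b), len(users_a & users_b)

logs = {
    "u1": [("car model 1", "2015-03-01"), ("car model 2", "2015-03-02")],
    "u2": [("car model 2", "2015-03-05")],
    "u3": [("car model 3", "2015-04-01")],
}
print(count_users(logs, "car model 1", "car model 2"))  # (3, 1, 2, 1)
```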

The overlap score calculation unit 240 then calculates the overlap score Score(A, B) of query B with respect to query A according to formula (1). That is, the overlap score calculation unit 240 calculates Score(A, B) based on two values: the number of users who searched both query A and query B (Auser∩Buser) divided by the number of users who searched query A (Auser), and the number of users who searched query B (Buser) divided by the total number of users (ALLuser).

The overlap score Score(A, B) indicates the degree of overlap between users who searched query A and users who searched query B. The larger Score(A, B) is, the more closely query A and query B can be considered related; conversely, the smaller it is, the less closely they are related.
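Formula (1) itself is not reproduced in this text. One plausible reading of the description, and it is only an assumption, is the quotient of the two values, which is the "lift" measure familiar from association-rule mining: Score(A, B) = (|Auser ∩ Buser| / |Auser|) / (|Buser| / |ALLuser|). Under that reading, a score above 1 means users who searched query A searched query B more often than users overall. The function name and the counts in the example are hypothetical.

```python
def overlap_score(all_users, a_users, b_users, ab_users):
    """Score(A, B) read as a lift ratio: P(B | A) / P(B).

    ASSUMPTION: formula (1) is only described as being based on
    ab_users / a_users and b_users / all_users; dividing the first ratio
    by the second is one plausible reading, not the confirmed formula.
    """
    if a_users == 0 or b_users == 0:
        return 0.0  # nothing to compare when either query was never searched
    # (ab/a) / (b/all), rearranged so that only one division is performed
    return (ab_users * all_users) / (a_users * b_users)

# Made-up counts: 1,000 users in total, 50 searched A, 40 searched B, 10 searched both.
print(overlap_score(1000, 50, 40, 10))  # 5.0
```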

For example, suppose that car model 1 and car model 2 are kei cars (light vehicles) and car model 3 is a sports car. When query A was “car model 1” and query B was “car model 2”, the overlap score Score(A, B) was large. When query A was “car model 1” and query B was “car model 3”, Score(A, B) was small. This is because kei cars are closely related to one another, whereas kei cars and sports cars are only loosely related. In this way, the overlap score Score(A, B) can be used, for example, to identify competing products and to analyze product needs.

<4. Search time difference calculation processing>
Next, the process of calculating the search time difference will be described. The search time difference is a value indicating the difference between the time at which a certain query was searched and the time at which another query was searched, and it is used as an index for query analysis. The search time difference calculation unit 250 of the query analysis device 200 calculates the search time difference using the query information.

FIG. 5 is a diagram for explaining the search time difference calculation according to the embodiment. In FIG. 5, the horizontal axis indicates the time at which searches were performed, and the vertical axis indicates the number of users who entered the query. Query distribution QA shows the distribution of the number of users of query A, and query distribution QB shows that of query B. Time T1 is the median search time of query distribution QA, and time T2 is the median search time of query distribution QB.

The search time difference calculation unit 250 derives times T1 and T2 from query information Q1 to Qn acquired by the query information acquisition unit 230. Specifically, it aggregates the search dates and times of query A from query information Q1 to Qn and takes their median as time T1; likewise, it aggregates the search dates and times of query B and takes their median as time T2.

The search time difference calculation unit 250 then calculates the search time difference D(A, B) between query distribution QA and query distribution QB by subtracting time T1 from time T2, that is, D(A, B) = T2 - T1. A positive D(A, B) indicates that query B tends to be searched after query A; a negative D(A, B) indicates that query B tends to be searched before query A. The closer D(A, B) is to 0, the more often query B is searched around the same time as query A.

For example, when query A was “cough medicine” and query B was “sore throat”, the search time difference was negative, because a sore throat is an early symptom. On the other hand, when query A was “cough medicine” and query B was “pneumonia”, the search time difference was positive, because pneumonia is a state in which the symptoms have progressed. In this way, the search time difference can be used, for example, to understand how user needs change over time.
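The median-based calculation of D(A, B) can be sketched as follows. It assumes per-user logs of `(query, searched_at)` entries with naive `datetime` values, treats every search event of a query as one sample for its median, and reports the result in days; the unit, the function name, and the toy data (shaped like the cough medicine example, but invented) are assumptions rather than details from the source.

```python
from datetime import datetime
from statistics import median

EPOCH = datetime(1970, 1, 1)

def search_time_difference(logs, query_a, query_b):
    """D(A, B) = T2 - T1 in days, where T1 and T2 are median search times.

    `logs` maps a user ID to (query, searched_at) entries with naive datetimes.
    A positive result means query B tends to be searched later than query A.
    """
    def median_seconds(query):
        seconds = [(t - EPOCH).total_seconds()
                   for entries in logs.values() for q, t in entries if q == query]
        return median(seconds)  # raises StatisticsError if the query never occurs

    t1 = median_seconds(query_a)
    t2 = median_seconds(query_b)
    return (t2 - t1) / 86400  # seconds -> days

logs = {
    "u1": [("sore throat", datetime(2015, 1, 10)), ("cough medicine", datetime(2015, 1, 12))],
    "u2": [("sore throat", datetime(2015, 2, 1)), ("cough medicine", datetime(2015, 2, 4))],
    "u3": [("cough medicine", datetime(2015, 3, 1))],
}
print(search_time_difference(logs, "cough medicine", "sore throat"))  # -14.0
```

Using the median rather than the mean keeps a few very late or very early searches from shifting T1 or T2 much.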

<5. Query analysis window>
FIG. 6 is a diagram illustrating an example of the query analysis window W before analysis starts, according to the embodiment. The display information generation unit 280 of the query analysis device 200 generates display information for the query analysis window W, and the display unit 220 displays the query analysis window W in accordance with that display information. As shown in FIG. 6, the query analysis window W contains a query selection area 221, a data source selection area 222, a threshold input area 223, a start button 224, and a result display area 225.

The query selection area 221 is where an analyst using the query analysis device 200 selects the query to be analyzed. The data source selection area 222 is where the analyst selects the data source of the query information. In the example shown in FIG. 6, query A is selected as the query to be analyzed, and the data source covering January 1, 2015 to December 31, 2015 is selected.

The threshold input area 223 functions as an input unit that accepts a threshold for the overlap score. The start button 224 is used by the analyst to instruct the start of analysis. The result display area 225 displays the analysis results: the query, the number of users, the overlap score, and the search time difference.

The search time difference calculation unit 250 does not calculate a search time difference for any query whose overlap score with respect to query A, as calculated by the overlap score calculation unit 240, is less than the threshold entered in the threshold input area 223. In the example shown in FIG. 6, 4 is entered as the threshold. Consequently, no search time difference is calculated for queries with an overlap score below 4, and their analysis results are omitted from the result display area 225.
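The threshold behavior can be sketched like this. The input shape (a dictionary of precomputed overlap scores), the `time_diff` callback, the function name `analyze`, and the sample numbers are all hypothetical. Because the exclusion condition is "less than the threshold", a score exactly equal to the threshold is kept.

```python
from typing import Callable, Dict, List, Tuple

def analyze(scores: Dict[str, float], threshold: float,
            time_diff: Callable[[str], float]) -> List[Tuple[str, float, float]]:
    """Return (query, overlap score, search time difference) rows.

    Queries whose overlap score is below `threshold` are skipped before
    `time_diff` is called, so no search time difference is computed for
    them and they never reach the result list.
    """
    rows = []
    for query, score in scores.items():
        if score < threshold:
            continue
        rows.append((query, score, time_diff(query)))
    return rows

calls = []

def fake_time_diff(query):
    calls.append(query)  # record which queries actually needed a time difference
    return 3.0

rows = analyze({"car model 2": 6.5, "car model 3": 1.5}, 4, fake_time_diff)
print(rows)   # [('car model 2', 6.5, 3.0)]
print(calls)  # ['car model 2']
```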

When the analyst clicks the start button 224 using the input unit 210 of the query analysis device 200, the query analysis process starts. In the query analysis process, the overlap score calculation unit 240 calculates overlap scores, and the search time difference calculation unit 250 calculates search time differences. The calculated overlap scores and search time differences are displayed in the result display area 225.

FIG. 7 is a diagram illustrating an example of the query analysis window W after analysis ends, according to the embodiment. In the query analysis process, the overlap score calculation unit 240 calculates overlap scores using, out of the query information acquired by the query information acquisition unit 230, the data source indicated in the data source selection area 222. Specifically, the overlap score calculation unit 240 uses formula (1) described above to calculate the overlap score of each other query with respect to the query indicated in the query selection area 221.

However, when a calculated overlap score is below the threshold entered in the threshold input area 223, the overlap score calculation unit 240 excludes the analysis result for that query from the result display area 225. This removes analysis results for queries with little relevance to the query being analyzed, which improves the accuracy of the query analysis.

In the query analysis process, the search time difference calculation unit 250 calculates search time differences using the data source indicated in the data source selection area 222, out of the query information acquired by the query information acquisition unit 230. Specifically, the search time difference calculation unit 250 calculates the search time difference between the query indicated in the query selection area 221 and each other query.
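Claim 3 defines the search time difference as the difference between the median search time of the first query and that of the second. The sketch below follows that definition under stated assumptions: timestamps are plain numbers (for example, days), only users who searched both queries contribute, and a positive result means the second query tends to be searched later. The log layout and function name are hypothetical.

```python
from statistics import median

def search_time_difference(log, first, second):
    """Return D(first, second): median search time of `second` minus that of `first`.

    `log` holds hypothetical (user_id, query, timestamp) tuples with numeric
    timestamps. Assumption: only users who searched both queries contribute.
    Returns None when no user searched both.
    """
    log = list(log)
    users_first = {u for u, q, _t in log if q == first}
    users_second = {u for u, q, _t in log if q == second}
    common = users_first & users_second
    if not common:
        return None
    times_first = [t for u, q, t in log if q == first and u in common]
    times_second = [t for u, q, t in log if q == second and u in common]
    # Positive result: `second` tends to be searched later than `first`.
    return median(times_second) - median(times_first)
```

For example, if u1 searched A at day 0 and B at day 3, and u2 searched A at day 1 and B at day 5, the medians are 0.5 and 4, so D(A, B) is 3.5 days.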

As shown in FIG. 7, the result display area 225 displays, for each query, the redundancy score calculated by the redundancy score calculation unit 240 together with the search time difference calculated by the search time difference calculation unit 250. This lets the analyst easily see the redundancy score and search time difference of each query.

In FIG. 7, the display unit 220 may also display, in the query analysis window W, buttons for sorting the results by number of users, redundancy score, or search time difference in ascending or descending order. Sorting these values lets the analyst grasp the analysis results more easily.
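Each sort button amounts to ordering the result rows by one column. The `ResultRow` record and its field names below are hypothetical stand-ins for the columns shown in FIG. 7.

```python
from dataclasses import dataclass

@dataclass
class ResultRow:
    query: str
    users: int        # number of users who searched the query
    score: float      # redundancy score against the analyzed query
    time_diff: float  # search time difference, e.g. in days

SORTABLE = ("users", "score", "time_diff")

def sort_rows(rows, key, descending=False):
    """Return rows ordered by one column: 'users', 'score', or 'time_diff'."""
    if key not in SORTABLE:
        raise ValueError(f"unsupported sort key: {key}")
    return sorted(rows, key=lambda row: getattr(row, key), reverse=descending)
```

Because `sorted` is stable, rows with equal values keep their previous relative order, so successive clicks on different buttons behave predictably.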

<6. Heat map generation process>
When the query analysis process is completed, a heat map display button 226 is displayed in the query analysis window W. When the analyst clicks the heat map display button 226 using the input unit 210 of the query analysis device 200, the heat map generation unit 270 starts the heat map generation process. In this process, the clustering unit 260 of the query analysis device 200 groups multiple queries into clusters.

FIG. 8 is a diagram for explaining the clustering process according to the embodiment. The clustering unit 260 groups similar queries into clusters based on the query information acquired by the query information acquisition unit 230. In the example of FIG. 8, the clustering unit 260 groups queries A through C into cluster A, queries D through G into cluster B, and queries H through K into cluster C.

For example, the clustering unit 260 may form clusters by grouping queries that contain at least a predetermined number of identical search words. Alternatively, it may use a thesaurus dictionary to determine whether search words are similar and group queries that contain at least a predetermined number of similar search words.
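A minimal sketch of the word-sharing rule, assuming queries are split on whitespace and that linked queries are merged transitively (connected components via union-find); the text does not say how chains of links are handled. The optional `synonyms` mapping (word to canonical word) is a hypothetical stand-in for the thesaurus dictionary.

```python
from itertools import combinations

def cluster_queries(queries, min_shared=1, synonyms=None):
    """Group queries that share at least `min_shared` (normalized) search words."""
    synonyms = synonyms or {}
    # Normalize each word through the synonym map, then keep a set per query.
    words = {q: {synonyms.get(w, w) for w in q.split()} for q in queries}
    parent = {q: q for q in queries}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for a, b in combinations(queries, 2):
        if len(words[a] & words[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for q in queries:
        clusters.setdefault(find(q), []).append(q)
    return list(clusters.values())
```

With `min_shared=2`, "tokyo hotel" and "tokyo hotel cheap" fall into one cluster while "osaka hotel" stays separate; passing `synonyms={"course": "tutorial"}` lets "python course" join "python tutorial".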

The clustering unit 260 outputs the generated clusters to the heat map generation unit 270. The heat map generation unit 270 generates the heat map M based on the query information acquired by the query information acquisition unit 230 and the clusters received from the clustering unit 260.

FIG. 9 is a diagram illustrating an example of the heat map M according to the embodiment. The horizontal axis of the heat map M represents the search time difference, and the vertical axis represents the clusters. In the example of FIG. 9, the search time difference on the horizontal axis is expressed in days, but the unit is not limited to this. For a finer-grained analysis, it may be expressed in hours; when results over a longer period are needed, it may be expressed in months.
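Choosing the unit amounts to choosing the bin width for the horizontal axis. The sketch below assumes time differences arrive in seconds and approximates a month as 30 days; both are assumptions, as are the names `UNIT_SECONDS` and `to_bin`.

```python
import math

# Bin widths in seconds; a "month" is approximated as 30 days (assumption).
UNIT_SECONDS = {"hour": 3600, "day": 86400, "month": 30 * 86400}

def to_bin(diff_seconds, unit="day"):
    """Map a search time difference in seconds to an integer column index.

    Flooring keeps negative differences consistent: with unit="day",
    [0, 1 day) maps to 0 and [-1 day, 0) maps to -1.
    """
    return math.floor(diff_seconds / UNIT_SECONDS[unit])
```

A difference of 90,000 seconds lands in day bin 1 but hour bin 25, which shows how the hourly view spreads the same data over more columns.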

The value in each cell is the proportion of users (searches), where each horizontal row sums to 1. In other words, the heat map M is list information that associates, for each cluster formed by the clustering unit 260, the search time differences calculated by the search time difference calculation unit 250 with the number of searches corresponding to each search time difference. The heat map generation unit 270 thus generates a heat map M that represents the distribution of search time differences along the horizontal axis and, along the vertical axis, values obtained by normalizing the number of users (searches) for each search time difference.
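The row normalization stated for FIG. 9 (each horizontal row sums to 1) can be sketched from a list of `(cluster, time_bin)` pairs, one pair per search or per user. The input shape and the function name `heat_map` are hypothetical.

```python
from collections import Counter

def heat_map(observations):
    """Build a row-normalized heat map from (cluster, time_bin) pairs.

    Returns (bins, rows): `bins` is the sorted list of column labels and
    rows[cluster] is a list of shares, one per bin, summing to 1.
    """
    observations = list(observations)
    counts = Counter(observations)  # (cluster, bin) -> number of searches
    clusters = sorted({c for c, _b in observations})
    bins = sorted({b for _c, b in observations})
    rows = {}
    for c in clusters:
        row = [counts[(c, b)] for b in bins]  # missing cells count as 0
        total = sum(row)                      # > 0: every cluster appears at least once
        rows[c] = [n / total for n in row]
    return bins, rows
```

Passing query strings instead of cluster IDs as the first element of each pair yields the per-query heat map variant mentioned later in this description.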

Each cell of the heat map is hatched with a color corresponding to the number of users (searches). In the example of FIG. 9, cells with a higher share of the total of their vertical column are hatched in a darker color, but this is not limiting; for example, cells with a higher share may instead be hatched in a lighter color.

The display unit 220 of the query analysis device 200 may display the heat map M either in the query analysis window W or in a separate window. Displaying the heat map M lets the analyst easily grasp the search time differences and the number of searches for each cluster.

<7. Query analysis processing>
FIG. 10 is a flowchart illustrating the query analysis processing according to the embodiment. The processing in this flowchart is executed by the query analysis device 200.

First, the display unit 220 displays the query analysis window W shown in FIG. 6 (S20). Next, the query analysis device 200 determines whether the start button 224 has been clicked (S21). If it has, the redundancy score calculation unit 240 calculates a redundancy score for each query based on the query information acquired by the query information acquisition unit 230 (S22). Next, the search time difference calculation unit 250 calculates a search time difference for each query based on the same query information (S23).

Then, as shown in FIG. 7, the display information generation unit 280 generates display information for displaying, in association with each query, the redundancy score calculated by the redundancy score calculation unit 240 and the search time difference calculated by the search time difference calculation unit 250. The display unit 220 displays the display information generated by the display information generation unit 280 (S24). The display unit 220 also displays the heat map display button 226 in the query analysis window W.
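Step S24 pairs each query's score with its time difference. One way to picture the resulting display information is a JSON document with one row per query; the JSON format, field names, and input shapes below are assumptions, not taken from the embodiment.

```python
import json

def build_display_info(target, user_counts, scores, time_diffs):
    """Serialize per-query results for the analyzed query `target` as JSON.

    All three mappings are keyed by query (hypothetical shapes): user_counts
    gives the number of users, scores the redundancy score, and time_diffs
    the search time difference. Queries without a time difference get null.
    """
    rows = [
        {
            "query": query,
            "users": user_counts.get(query, 0),
            "score": score,
            "time_diff": time_diffs.get(query),
        }
        for query, score in scores.items()
    ]
    return json.dumps({"target": target, "rows": rows}, ensure_ascii=False)
```

A serialized form like this would also suit the cloud-service variant, in which the display information generation unit 280 transmits display information in response to an external request.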

Next, the query analysis device 200 determines whether the heat map display button 226 has been clicked (S25). If it has, the heat map generation unit 270 generates the heat map M shown in FIG. 9 (S26). The display unit 220 then displays the heat map M generated by the heat map generation unit 270 (S27), and the processing of this flowchart ends.

As described above, the redundancy score calculation unit 240 calculates, based on the query information, a redundancy score Score(A, B) indicating the degree of overlap between the users who searched query A and the users who searched query B. The search time difference calculation unit 250 calculates, based on the query information, a search time difference D(A, B), which is the difference between the time at which query A was searched and the time at which query B was searched. The display information generation unit 280 generates information for displaying Score(A, B) and D(A, B) in association with queries A and B. This makes it possible to use queries to analyze the needs of searching users and how those needs change over time.

The query analysis device 200 according to the embodiment above includes an internal computer system. Each step of the processing performed by the query analysis device 200 is stored, in the form of a program, on a computer-readable recording medium, and the processing is performed when a computer reads and executes the program. Here, a computer-readable recording medium means a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer over a communication line, and the computer that receives it may execute the program.

Although the query analysis device 200 has been described as including both the display unit 220 and the display information generation unit 280, this is not limiting. For example, when the query analysis device 200 is provided as a cloud service, the display unit 220 can be omitted. In this case, the display information generation unit 280 may generate display information for displaying the analysis results and transmit it in response to an external request.

In the embodiment above, the search time difference calculation unit 250 derives the median of the search times and uses it to calculate the search time difference, but this is not limiting. For example, the search time difference calculation unit 250 may calculate the mean of the search times and use it to calculate the search time difference.
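The median and mean variants differ only in the aggregation applied to each query's search times, so one function with a pluggable aggregator covers both. The function name and the numeric-timestamp inputs are hypothetical.

```python
from statistics import mean, median

def time_difference(times_first, times_second, aggregate=median):
    """Return aggregate(times_second) - aggregate(times_first).

    `aggregate` defaults to the median; pass statistics.mean for the
    mean-based variant.
    """
    return aggregate(times_second) - aggregate(times_first)
```

With first-query times [0, 1, 2] and second-query times [3, 4, 50], the median gives a difference of 3 while the mean gives 18, illustrating that the median is far less sensitive to a single unusually late search.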

In the embodiment above, the heat map generation unit 270 generates a heat map in which search time differences are associated with the number of searches for each cluster formed by the clustering unit 260, but this is not limiting. For example, the heat map generation unit 270 may generate a heat map in which search time differences are associated with the number of searches for each query. This allows the analyst to check a heat map M for each individual query.

Although modes for carrying out the present invention have been described above using embodiments, the present invention is in no way limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

Description of Reference Numerals
10 … Query analysis system
100 … Web server
110 … Control unit
120 … Storage unit
200 … Query analysis device
210 … Input unit
220 … Display unit
230 … Query information acquisition unit
240 … Redundancy score calculation unit
250 … Search time difference calculation unit
260 … Clustering unit
270 … Heat map generation unit
280 … Display information generation unit
290 … Storage unit
300 … User terminal
310 … Control unit
320 … Input unit
330 … Display unit

Claims

1. A query analysis device comprising:
a query information acquisition unit that acquires query information in which, for each user, a searched query is associated with the time at which the query was searched;
a redundancy score calculation unit that calculates, based on the query information acquired by the query information acquisition unit, a redundancy score indicating the degree of overlap between users who searched a first query and users who searched a second query;
a search time difference calculation unit that calculates, based on the query information acquired by the query information acquisition unit, a search time difference, which is the difference between the time at which the first query was searched and the time at which the second query was searched; and
a display information generation unit that generates information for displaying the redundancy score calculated by the redundancy score calculation unit and the search time difference calculated by the search time difference calculation unit in association with the first query and the second query.

2. The query analysis device according to claim 1, wherein the redundancy score calculation unit calculates the redundancy score based on a value obtained by dividing the number of users who searched both the first query and the second query by the number of users who searched the first query, and a value obtained by dividing the number of users who searched the second query by the total number of users.

3. The query analysis device according to claim 1 or 2, wherein the search time difference calculation unit calculates, as the search time difference, the difference between the median of the times at which the first query was searched and the median of the times at which the second query was searched.

4. The query analysis device according to any one of claims 1 to 3, further comprising:
an input unit that receives an input of a threshold for the redundancy score,
wherein the search time difference calculation unit does not calculate the search time difference for a second query whose redundancy score with respect to the first query, as calculated by the redundancy score calculation unit, is less than the threshold.

5. The query analysis device as described above, further comprising a list information generation unit that generates list information in which the search time difference calculated by the search time difference calculation unit is associated with the number of searches corresponding to that search time difference.

6. The query analysis device according to claim 5, further comprising:
a clustering unit that groups a plurality of queries to generate clusters,
wherein the list information generation unit generates list information in which the search time difference and the number of searches are associated with each cluster grouped by the clustering unit.

7. The query analysis device according to claim 5, wherein the list information generation unit generates list information in which the search time difference and the number of searches are associated with each query.

8. The query analysis device according to any one of claims 1 to 7, wherein the list information generation unit generates the list information so that it represents the distribution of search time differences along a first axis and, along a second axis, values obtained by normalizing the number of searches for each search time difference.

9. A query analysis method comprising:
a query information acquisition step of acquiring query information in which, for each user, a searched query is associated with the time at which the query was searched;
a redundancy score calculation step of calculating, based on the query information acquired in the query information acquisition step, a redundancy score indicating the degree of overlap between users who searched a first query and users who searched a second query;
a search time difference calculation step of calculating, based on the query information acquired in the query information acquisition step, a search time difference, which is the difference between the time at which the first query was searched and the time at which the second query was searched; and
a display information generation step of generating information for displaying the redundancy score calculated in the redundancy score calculation step and the search time difference calculated in the search time difference calculation step in association with the first query and the second query.

10. A program that causes a computer to:
acquire query information in which, for each user, a searched query is associated with the time at which the query was searched;
calculate, based on the acquired query information, a redundancy score indicating the degree of overlap between users who searched a first query and users who searched a second query;
calculate, based on the acquired query information, a search time difference, which is the difference between the time at which the first query was searched and the time at which the second query was searched; and
generate information for displaying the calculated redundancy score and the calculated search time difference in association with the first query and the second query.