JP5445071B2

JP5445071B2 - Search information analysis program, search information analysis device, and search information analysis method

Info

Publication number: JP5445071B2
Application number: JP2009269497A
Authority: JP
Inventors: 剛寿安藤; 聡子志賀; 友哉岩倉; 青史岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-11-27
Filing date: 2009-11-27
Publication date: 2014-03-19
Anticipated expiration: 2029-11-27
Also published as: JP2011113333A

Description

The present invention relates to a search information analysis program, a search information analysis device, and a search information analysis method, and in particular to ones that analyze the search log of a search engine.

Services are available that analyze the search logs recorded by Internet search engines (Web sites that provide a keyword search function) and present the words that are currently topics of interest (see, for example, Patent Document 1). Such a service lets a user easily find out which events, things, or people are currently attracting attention.

Patent Document 1: Japanese Laid-open Patent Publication No. 2004-206517 (JP 2004-206517 A)

However, when only a word is presented, it is difficult for the user to understand its meaning and background. For example, when a person's name is presented as a topic word, the user cannot tell why that person has become a topic.

The present invention has been made in view of the above, and aims to provide a search information analysis program, a search information analysis device, and a search information analysis method that can appropriately support the presentation of information indicating the background of a word that has become a topic.

To solve the above problem, the search information analysis program causes a computer to execute a relationship generation procedure and a selection procedure. The relationship generation procedure extracts relationships between the words included in search terms, based on a search log that stores search terms, each including one or more words, in association with attribute information of the search terms, and records each word in association with its related words in a relationship information storage unit. The selection procedure, based on a topic word that is one of the words included in the search terms, extracts peripheral words, which are words having the relationship with the topic word, by referring to the relationship information storage unit; extracts, for each extracted peripheral word and by referring to the relationship information storage unit, first words, which have the relationship with that peripheral word, and second words, which are first words that also correspond to peripheral words other than that peripheral word; counts the number of extracted first words and the number of second words; calculates, from the number of first words and the number of second words, the degree to which the peripheral word indicates the background of the topic word; and extracts some of the peripheral words based on the calculated degree.

According to the disclosed technique, the presentation of information indicating the background of a topic word can be appropriately supported.

A diagram illustrating a hardware configuration example of the search information analysis device according to an embodiment of the present invention.
A diagram illustrating a functional configuration example of the search information analysis device according to the embodiment of the present invention.
A flowchart for explaining the processing procedure performed by the search information analysis device.
A diagram illustrating a configuration example of the search log storage unit.
A diagram for explaining an example of a topic word extraction method.
A diagram illustrating an example of extracting relationships between words based on the commonality of destination URLs.
A diagram illustrating an example of extracting relationships between words based on the commonality of search users.
A diagram illustrating an example of extracting relationships between words based on whether they are included in the same search keyword.
A flowchart for explaining the procedure for extracting relationships between words based on the commonality of destination URLs.
A diagram illustrating an example of the relationship management table.
A flowchart for explaining the procedure for extracting relationships between words based on the commonality of search users.
A flowchart for explaining the procedure for extracting relationships between words based on whether they are included in the same search keyword.
A flowchart for explaining the procedure for selecting representative words.
A diagram illustrating an example of relationship information for words within two steps of a topic word.
A diagram illustrating the statistics for each peripheral word in a specific example of the relationship information.
A diagram illustrating a display example of representative words and the like for a topic word.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a diagram illustrating a hardware configuration example of the search information analysis device according to an embodiment of the present invention. The search information analysis device 10 of FIG. 1 includes a drive device 100, an auxiliary storage device 102, a memory device 103, a CPU 104, an interface device 105, a display device 106, and an input device 107, which are connected to one another via a bus B.

The program that implements the processing of the search information analysis device 10 is provided on a recording medium 101 such as a CD-ROM. When the recording medium 101 on which the program is recorded is set in the drive device 100, the program is installed from the recording medium 101 into the auxiliary storage device 102 via the drive device 100. The program does not have to be installed from the recording medium 101, however; it may instead be downloaded from another computer via a network. The auxiliary storage device 102 stores the installed program together with the necessary files and data.

When an instruction to start the program is given, the memory device 103 reads the program from the auxiliary storage device 102 and stores it. The CPU 104 implements the functions of the search information analysis device 10 in accordance with the program stored in the memory device 103. The interface device 105 is used as an interface for connecting to a network. The display device 106 displays a GUI (Graphical User Interface) and the like provided by the program. The input device 107 consists of a keyboard, a mouse, and the like, and is used to input various operation instructions.

FIG. 2 is a diagram illustrating a functional configuration example of the search information analysis device according to the embodiment of the present invention. In FIG. 2, the search information analysis device 10 includes a search log storage unit 11, a topic word extraction unit 12, a relationship extraction unit 13, a representative word selection unit 14, an output processing unit 15, and the like. Each of these units is implemented by processing that the program installed in the search information analysis device 10 causes the CPU 104 to execute.

The search log storage unit 11 is a storage area in the auxiliary storage device 102 that stores the search log (search history) of a search engine. A search engine is a Web site that provides a service for searching information published on the Internet. A search log is data that includes the URL (Uniform Resource Locator) of the information that was viewed and the search keyword (search term) used to find that information. In other words, a search log includes the entered search keyword and information (the destination URL) indicating which item in the search results was selected for viewing (as the jump or transition destination). The search log storage unit 11 may instead reside in another computer or storage device connected to the search information analysis device 10 via a network.

The topic word extraction unit 12 analyzes the search logs recorded in the search log storage unit 11 and extracts topic words from the search keywords used in past searches. A topic word is a word that is likely to have become a topic of interest. The relationship extraction unit 13 analyzes the search logs recorded in the search log storage unit 11 and extracts relationships between the words contained in the search keywords, based on the information recorded in the search logs together with the search keywords. The relationship extraction unit 13 records information indicating the extracted relationships in the auxiliary storage device 102 or the memory device 103 as a relationship management table.

The representative word selection unit 14 selects representative words from the words contained in the search keywords, based on the topic words extracted by the topic word extraction unit 12 and the relationship management table generated by the relationship extraction unit 13. A representative word is a word expected to indicate the background against which a topic word has become a topic, and is selected from the words related to the topic word according to a predetermined rule. For example, a word indicating the field or category to which the topic word belongs, or a word that abstracts the topic word, is expected to be selected as a representative word. The output processing unit 15 outputs the representative words and related items selected by the representative word selection unit 14. The output destination may be the display device 106, the auxiliary storage device 102, a printer (not shown), or the like.

The processing procedure of the search information analysis device 10 of FIG. 1 is described below. FIG. 3 is a flowchart for explaining the processing procedure performed by the search information analysis device. The processing in FIG. 3 starts, for example, when an operator inputs an instruction to start processing or when a periodic timing is automatically detected.

In step S101, the topic word extraction unit 12 extracts topic words from the search keywords contained in the search logs recorded in the search log storage unit 11.

FIG. 4 is a diagram illustrating a configuration example of the search log storage unit. In FIG. 4, the search log storage unit 11 stores, for each search log, a date and time, a user identifier, a search keyword, a destination URL, and the like. The search keyword is the character string (possibly a single character) entered as the seed of the search, and may contain multiple words or character strings. In FIG. 4, the search keyword of the search log in the second row is "macaron sweets", which indicates that "macaron" and "sweets" were entered together as one search keyword. The destination URL is the URL of the information selected for viewing from the search results (a list of information) retrieved for the search keyword. For example, the search log in the first row indicates that, of the information retrieved for the search keyword "macaron", the information identified by "http://www.xxxxxxx.com/" was selected for viewing.

The date and time is the date and time at which the search log was recorded, that is, the date and time at which the search engine received the request to acquire the information at the destination URL. The user identifier identifies the user who performed the search to which the search log relates. The user identifier is transmitted from the client device (Web browser), for example, based on a cookie corresponding to the search engine. An identifier of the client device (for example, an IP address) may be used as the user identifier instead.
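The search log layout described for FIG. 4 can be sketched as a simple record type. This is only an illustration: the class name `SearchLogEntry`, its field names, the example timestamp and user identifier, and the assumption that the words of a search keyword are separated by whitespace are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class SearchLogEntry:
    """One search log: the four fields named in the description of FIG. 4."""
    timestamp: datetime       # when the request for the destination URL was received
    user_id: str              # e.g. cookie-based identifier, or an IP address
    keyword: str              # the search keyword as entered (may hold several words)
    destination_url: str      # URL of the result the user selected for viewing

    def words(self) -> List[str]:
        # Assumption: words inside a search keyword are whitespace-separated.
        return self.keyword.split()


# Hypothetical entry; only the keyword and URL echo the examples in the text.
entry = SearchLogEntry(
    timestamp=datetime(2009, 11, 25, 10, 0),
    user_id="user1",
    keyword="macaron sweets",
    destination_url="http://www.xxxxxxx.com/",
)
```

Calling `entry.words()` splits the keyword into the individual words on which the later extraction steps operate.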

FIG. 5 is a diagram for explaining an example of a topic word extraction method. It shows an example in which, for each word contained in the search keywords, a score is calculated from the rate of increase in the number of searches over the past three days. Specifically, the score of each word is the sum of the rate of increase in the number of searches from three days ago to two days ago and the rate of increase from two days ago to one day ago. Words are ranked in descending order of score, and words with relatively high ranks (for example, the top N) are treated as topic words. The number of searches for a word is the number of times the word was included in a search keyword.

In the example of FIG. 5, the numbers of searches for "macaron" three days ago, two days ago, and one day ago are 10, 15, and 30, respectively. The rate of increase from three days ago to two days ago is 15/10 = 1.5, and the rate of increase from two days ago to one day ago is 30/15 = 2.0. The score is therefore 1.5 + 2.0 = 3.5. Because this is the highest score, "macaron" is ranked first and is extracted as a topic word together with "ramen" and "PC", which are also in the top three.
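The scoring in this example can be sketched in a few lines of Python. The function names, the handling of a zero count on the earlier day, and every count other than the 10, 15, 30 given for "macaron" are assumptions made for illustration; the text does not give the counts for the other words.

```python
def topic_score(daily_counts):
    """Sum of day-over-day increase rates; counts are ordered oldest first."""
    score = 0.0
    for previous, current in zip(daily_counts, daily_counts[1:]):
        if previous > 0:  # Assumption: skip a rate whose denominator would be zero.
            score += current / previous
    return score


def top_topic_words(counts_by_word, n):
    """Rank words by score in descending order and keep the top n."""
    ranked = sorted(
        counts_by_word,
        key=lambda word: topic_score(counts_by_word[word]),
        reverse=True,
    )
    return ranked[:n]


example_counts = {
    "macaron": [10, 15, 30],  # counts from the example in the text
    "ramen": [20, 30, 45],    # hypothetical counts
    "PC": [40, 50, 60],       # hypothetical counts
    "map": [50, 50, 50],      # hypothetical counts
}

print(topic_score(example_counts["macaron"]))
print(top_topic_words(example_counts, 3))
```

With these counts the score for "macaron" is 1.5 + 2.0 = 3.5, and the hypothetical counts were chosen so that "ramen" and "PC" take the remaining top-three places.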

Other known techniques may be used to extract topic words; the topic word extraction method applicable to this embodiment is not limited to any particular one. Topic words may also be extracted on a computer other than the search information analysis device 10.

Next, the relationship extraction unit 13 extracts relationships between the words contained in the search keywords based on the search logs, and generates a relationship management table in which the extraction results are recorded (S102). The range of search logs from which relationships between words are extracted may be the same as or different from the range of search logs from which the topic words were extracted (for example, the past three days).

A relationship between words is one that can be extracted or derived from the information recorded in the search logs and, as long as it is reasonable, is not limited to any specific kind of relationship. In this embodiment, for example, relationships between words are extracted based on the commonality of destination URLs, the commonality of the entity that performed the search (the search user), or whether the words are included in the same search keyword.

FIG. 6 is a diagram illustrating an example of extracting relationships between words based on the commonality of destination URLs. In this example, the search keywords of the search logs from two days ago whose destination URL is URL1 contain "macaron", "France", or "Tokyo". In this case, "macaron", "France", and "Tokyo" are determined to be related to one another; that is, the mutual relationships among "macaron", "France", and "Tokyo" are extracted.

The search keywords of the search logs from two days ago whose destination URL is URL2 contain "macaron", "Italy", "sweets", or "Ginza". In this case, the mutual relationships among "macaron", "Italy", "sweets", and "Ginza" are extracted.

The search keywords of the search logs from one day ago whose destination URL is URL1 contain "macaron", "sweets", or "Tokyo". In this case, the mutual relationships among "macaron", "sweets", and "Tokyo" are extracted.

Furthermore, the search keywords of the search logs from one day ago whose destination URL is URL2 contain "macaron" or "France". In this case, the relationship between "macaron" and "France" is extracted.

Merging the relationships extracted in these four cases yields relationship information D1. In relationship information D1, a relationship between words is shown as a line segment, and the number attached to a line segment is the number of times the relationship between the words it connects was extracted. For example, the relationship between "macaron" and "France" was extracted twice in total: once in the case of destination URL1 two days ago and once in the case of destination URL2 one day ago. The line segment connecting these two words is therefore labeled "2".

The destination URL indicates the information viewed by the user. Information that was viewed is likely to be the information the user was looking for, and the words that users include in search keywords when looking for the same information are likely to be close in meaning. The relationships extracted based on the commonality of destination URLs are therefore considered to reflect the semantic closeness (similarity) between words.

FIG. 7 is a diagram illustrating an example of extracting relationships between words based on the commonality of search users. In this example, two days ago, user 1 performed searches with search keywords containing "macaron", "map", or "sweets". In this case, "macaron", "map", and "sweets" are determined to be related to one another; that is, the mutual relationships among "macaron", "map", and "sweets" are extracted.

Two days ago, user 2 performed searches with search keywords containing "macaron", "sweets" (スイーツ, a different word from the お菓子 that is also rendered "sweets" in this translation), or "map". In this case, the mutual relationships among "macaron", スイーツ, and "map" are extracted.

One day ago, user 1 performed searches with search keywords containing "macaron" or "map". In this case, the relationship between "macaron" and "map" is extracted.

Furthermore, one day ago, user 2 performed searches with search keywords containing "macaron", "France", "Ginza", or "recipe". In this case, the mutual relationships among "macaron", "France", "Ginza", and "recipe" are extracted.

Merging the relationships extracted in these four cases yields relationship information D2. The notation of relationship information D2 is the same as that of relationship information D1.

When searching for target information, a user may redo the search with various search keywords. The words contained in the search keywords entered in such a case can be regarded as forming, in the user's mind, relationships centered on the target information. The relationships extracted based on the commonality of search users are therefore considered to represent relationships recognized by the search users.

FIG. 8 is a diagram illustrating an example of extracting relationships between words based on whether they are included in the same search keyword. In this example, the search keyword "sweets macaron Tokyo" was entered two days ago. This indicates that "sweets", "macaron", and "Tokyo" were entered together, separated by spaces or the like, as a single search keyword (query). In this case, the mutual relationships among "sweets", "macaron", and "Tokyo" are extracted from the search keyword. The mutual relationships between the words contained in each of the other search keywords from two days ago, and in each search keyword from one day ago, are extracted in the same way. As a result, relationship information D3 is obtained. The notation of relationship information D3 is the same as that of relationship information D1.

Any one of the three methods shown in FIGS. 6 to 8 may be used alone, or two or more of them may be combined.

The processing procedure for implementing each of the three methods is described next. When the extraction method of FIG. 6 is used, the relationship extraction unit 13 executes the processing shown in FIG. 9.

FIG. 9 is a flowchart for explaining the procedure for extracting relationships between words based on the commonality of destination URLs.

In step S201, the relationship extraction unit 13 classifies the search logs within a predetermined period into groups based on the commonality (identity) of their destination URLs. That is, search logs with the same destination URL belong to the same group.

Next, the relationship extraction unit 13 takes one group as the processing target (S202). The group being processed is hereinafter referred to as the "current group". The relationship extraction unit 13 then registers in the relationship management table every pair of words that can be formed from all the words contained in the search keywords of the search logs belonging to the current group (S203).

FIG. 10 is a diagram illustrating an example of the relationship management table. The relationship management table stores the number of extractions for each combination of two words ("word 1" and "word 2"). In step S203, the relationship extraction unit 13 newly registers any combination that is not yet registered in the relationship management table and records "1" as its number of extractions. For a combination that is already registered, it adds 1 to the number of extractions for that combination.

After executing steps S202 and S203 for all groups (Yes in S204), the relationship extraction unit 13 deletes from the relationship management table the records whose number of extractions is below a threshold (S205). This removes noise from the information by excluding weakly related pairs.
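Steps S201 to S205 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the function name `build_relation_table`, the dictionary-based log records, the `group_key` parameter, and the one-word-per-log example data are hypothetical. Following the four cases described for FIG. 6, the example groups logs by day and destination URL; passing a key that returns the user identifier instead would correspond to the user-based grouping of FIG. 7.

```python
from collections import defaultdict
from itertools import combinations


def build_relation_table(logs, group_key, threshold=1):
    """Return {(word1, word2): extraction_count} for the given search logs."""
    # S201: classify the logs into groups and pool the words of each group.
    words_by_group = defaultdict(set)
    for log in logs:
        words_by_group[group_key(log)].update(log["words"])

    # S202 to S204: for every group, register each pair of its words once.
    table = defaultdict(int)
    for words in words_by_group.values():
        for pair in combinations(sorted(words), 2):  # sorted: unordered pair key
            table[pair] += 1

    # S205: drop records whose extraction count is below the threshold.
    return {pair: count for pair, count in table.items() if count >= threshold}


def _logs(day, url, words):
    # Hypothetical helper: one single-word log per word, for brevity.
    return [{"day": day, "url": url, "words": [w]} for w in words]


# Hypothetical logs reproducing the four cases described for FIG. 6
# (words lower-cased; "day" is the number of days ago).
example_logs = (
    _logs(2, "URL1", ["macaron", "france", "tokyo"])
    + _logs(2, "URL2", ["macaron", "italy", "sweets", "ginza"])
    + _logs(1, "URL1", ["macaron", "sweets", "tokyo"])
    + _logs(1, "URL2", ["macaron", "france"])
)


def by_day_and_url(log):
    return (log["day"], log["url"])


relations = build_relation_table(example_logs, by_day_and_url)
strong_relations = build_relation_table(example_logs, by_day_and_url, threshold=2)
```

Here `relations[("france", "macaron")]` is 2, matching the count the text gives for "macaron" and "France", and `strong_relations` keeps only the pairs extracted at least twice.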

The relationship information D1 to D3 in FIGS. 6 to 8 is a visual, easy-to-read representation of the information recorded in the relationship management table.

When the extraction method of FIG. 7 is used, the relationship extraction unit 13 executes the processing shown in FIG. 11. FIG. 11 is a flowchart for explaining the procedure for extracting relationships between words based on the commonality of search users.

In step S211, the relationship extraction unit 13 classifies the search logs within a predetermined period into groups based on the commonality (identity) of their user identifiers. That is, search logs with the same user identifier belong to the same group.

The subsequent steps S212 to S215 follow the same procedure as steps S202 to S205 of FIG. 9.

When the extraction method of FIG. 8 is used, the relationship extraction unit 13 executes the processing shown in FIG. 12. FIG. 12 is a flowchart for explaining the procedure for extracting relationships between words based on whether they are included in the same search keyword.

In step S221, the relationship extraction unit 13 selects, from the search logs within a predetermined period, one search log whose search keyword contains two or more words. The search log selected here is hereinafter referred to as the "current log".

Next, the relationship extraction unit 13 registers in the relationship management table every pair of words that can be formed from the words contained in the search keyword of the current log (S222). The format of the relationship management table may be the same as in FIG. 10. The relationship extraction unit 13 newly registers any combination that is not yet registered and records "1" as its number of extractions. For a combination that is already registered, it adds 1 to the number of extractions for that combination.

After steps S221 and S222 have been executed for all of the applicable search logs within the predetermined period (Yes in S223), the relationship extraction unit 13 deletes from the relationship management table the records whose number of extractions is below the threshold (S224).
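The per-keyword procedure of steps S221 to S224 can be sketched as below. The function name `cooccurrence_table`, the whitespace-based word splitting, the lower-cased words, and the second and third example keywords are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_table(keywords, threshold=1):
    """Return {(word1, word2): extraction_count} from individual search keywords."""
    table = Counter()
    for keyword in keywords:
        # Assumption: words in a keyword are whitespace-separated.
        words = sorted(set(keyword.split()))
        if len(words) < 2:
            continue  # S221: only keywords with two or more words are used
        table.update(combinations(words, 2))  # S222: count every word pair
    # S224: drop records whose extraction count is below the threshold.
    return {pair: count for pair, count in table.items() if count >= threshold}


example_keywords = [
    "sweets macaron tokyo",  # the keyword quoted for FIG. 8, lower-cased
    "macaron sweets",        # hypothetical additional keyword
    "macaron",               # single word: contributes no pair
]

pairs = cooccurrence_table(example_keywords)
frequent_pairs = cooccurrence_table(example_keywords, threshold=2)
```

Unlike the grouping-based procedures of FIGS. 9 and 11, each search log is handled on its own here, so only words typed together in one query become related.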

Returning to FIG. 3, following step S102 the representative word selection unit 14 selects one of the topic words extracted in step S101 as the processing target (S103). There is no particular restriction on the order in which topic words are selected, provided that all topic words are eventually processed. Alternatively, the topic word to be processed may be selected by an instruction input from the operator. The selected topic word is hereinafter referred to as the "current topic word".

Subsequently, the representative word selection unit 14 extracts from the relationship management table the records relating to relationships within two steps of the current topic word (S104). "Within two steps" means within two edges in the relationship information D1 and the like. For example, when "macaron" is the current topic word, "pizza" and "pasta" in the relationship information D3 (FIG. 8) are two steps from "macaron", whereas "confectionery", "France", and "Italy" are one step from "macaron". For convenience, the relationship information D1 to D3 does not show words three or more steps from "macaron".

Records indicating relationships within two steps of the current topic word may be extracted from the relationship management table as follows. First, all records containing the current topic word (hereinafter "one-step records") are extracted from the relationship management table. Next, using each word in the one-step records other than the current topic word as a key, the records containing that word (hereinafter "two-step records") are extracted from the records that remain after excluding the one-step records.
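The two-pass extraction described above can be sketched as follows. This is only an illustration: the function name `records_within_two_steps` and the toy table are assumptions, and the particular edges in the table (for example, which word "pizza" is linked to) are invented for the example rather than taken from FIG. 8.

```python
def records_within_two_steps(table, topic):
    """Split relationship records into one-step and two-step records (S104)."""
    # One-step records: every record that contains the topic word.
    one_step = [rec for rec in table if topic in rec]
    # Keys: the words that share a one-step record with the topic word.
    keys = {word for rec in one_step for word in rec if word != topic}
    # Two-step records: remaining records that contain at least one key.
    two_step = [rec for rec in table
                if rec not in one_step and keys.intersection(rec)]
    return one_step, two_step


# Hypothetical relationship management table as (word 1, word 2) pairs.
table = [("macaron", "france"), ("macaron", "italy"),
         ("italy", "pizza"), ("italy", "pasta"), ("pizza", "cheese")]
one_step, two_step = records_within_two_steps(table, "macaron")
```

In this toy table the record ("pizza", "cheese") is three steps from "macaron" and is therefore left out of both lists.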

Subsequently, the representative word selection unit 14 executes representative word selection processing based on the extracted records (that is, the one-step records and the two-step records) (S105). A representative word is thus selected for each topic word. The output processing unit 15 then outputs the selected representative word.

When selection and output of the representative word have been completed for all topic words (Yes in S107), the processing of FIG. 3 ends.

Next, step S105 is described in detail. FIG. 13 is a flowchart for explaining the procedure of the representative word selection processing.

In step S301, the representative word selection unit 14 extracts the adjacent words of the current topic word from the one-step records. An adjacent word is a word within one step. The words contained in the one-step records (other than the topic word itself) are therefore the adjacent words of the topic word.

For example, FIG. 14 shows an example of relationship information for words within two steps of a topic word. If the current topic word in FIG. 14 is "macaron", step S301 extracts "Italy", "Ginza", "France", "sweets" (スイーツ), "roll cake", and "confectionery" (お菓子) as adjacent words. The adjacent words of the topic word are specifically referred to as "peripheral words".

Subsequently, the representative word selection unit 14 counts the number of adjacent words of each peripheral word based on the relationship management table (S302). Specifically, for each peripheral word, the number of records containing that peripheral word as "word 1" or "word 2" is counted. For example, in FIG. 14, the peripheral word "Ginza" has seven adjacent words: "Italy", "France", "macaron", "confectionery", "sweets", "accessories", and "brand shop". The number of adjacent words for "Ginza" is therefore 7.

Subsequently, the representative word selection unit 14 counts, based on the relationship management table, the number of other peripheral words among the adjacent words of each peripheral word (S303). Specifically, for each peripheral word, the number of records is counted in which that peripheral word appears as one of "word 1" and "word 2" and the other word is a peripheral word other than that peripheral word (that is, an adjacent word of the topic word). For example, in FIG. 14, the peripheral word "Ginza" has seven adjacent words: "Italy", "France", "macaron", "confectionery", "sweets", "accessories", and "brand shop". Of these, five, namely "Italy", "France", "macaron", "confectionery", and "sweets", are counted as peripheral words. The number of peripheral words among the adjacent words of "Ginza" is therefore 5.
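The two counts of steps S302 and S303 can be sketched as follows. The function name `count_rn_wn`, the `include_topic` flag, and the table are assumptions for illustration; the table contains only the edges that the text states for "macaron" and "Ginza", not the full graph of FIG. 14. The flag exists because the worked example for "Ginza" counts "macaron" itself among the five, while the procedural wording refers to other peripheral words only; the sketch lets the reader choose either reading.

```python
def count_rn_wn(table, topic, include_topic=False):
    """Return {peripheral word: (rn, wn)} for one topic word (S302, S303)."""
    # Assumption: each unordered pair appears once in the table, so counting
    # records that contain a word equals counting its distinct neighbours.
    adjacency = {}
    for word1, word2 in table:
        adjacency.setdefault(word1, set()).add(word2)
        adjacency.setdefault(word2, set()).add(word1)
    peripheral = adjacency.get(topic, set())
    counted = (peripheral | {topic}) if include_topic else peripheral
    stats = {}
    for word in peripheral:
        neighbours = adjacency[word]
        rn = len(neighbours)                       # S302: all adjacent words
        wn = len((neighbours & counted) - {word})  # S303: peripheral ones only
        stats[word] = (rn, wn)
    return stats


# Partial table built only from the edges named in the text.
TABLE = [("macaron", w) for w in
         ("italy", "ginza", "france", "sweets", "roll cake", "confectionery")]
TABLE += [("ginza", w) for w in
          ("italy", "france", "confectionery", "sweets",
           "accessories", "brand shop")]

stats = count_rn_wn(TABLE, "macaron", include_topic=True)
```

With `include_topic=True` the result for "ginza" matches the values given in the text (rn = 7, wn = 5); with the default setting the topic word is excluded and wn for "ginza" is one lower.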

Subsequently, the representative word selection unit 14 calculates a score for each peripheral word and selects the peripheral word with the highest score as the representative word (S304). The score is calculated by the following expression and, in the present embodiment, corresponds to an example of the degree to which a peripheral word indicates the background of the topic word.
(wn × wn) / rn
rn: number of adjacent words (value counted in step S302)
wn: number of peripheral words among the adjacent words (value counted in step S303)
For the example relationship information of FIG. 14, the values of rn, wn, and (wn × wn) / rn for each peripheral word are shown in FIG. 15. FIG. 15 shows the statistical values for each peripheral word in this specific example of the relationship information.

In the example of FIG. 15, the peripheral word with the highest score is "sweets", so "sweets" is selected as the representative word for the topic word "macaron". In addition to the peripheral word with the highest score, a plurality of peripheral words whose score is at or above a predetermined value, or whose score rank is at or above a predetermined rank, may be selected for presentation to indicate the background of the topic word.
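The scoring and selection of step S304 can be sketched as follows. The function names and the `top_k` parameter are assumptions. Only the ("ginza": rn = 7, wn = 5) entry comes from the text; `peripheral_a` and `peripheral_b` are placeholder words with invented counts and do not reproduce the values of FIG. 15.

```python
def score(rn, wn):
    """Score from the text: (wn x wn) / rn."""
    # rn is at least 1 for a peripheral word, because every peripheral word
    # is adjacent to the topic word, so the division is safe.
    return (wn * wn) / rn


def select_representative(stats, top_k=1):
    """Return the top_k peripheral words ordered by descending score."""
    ranked = sorted(stats, key=lambda word: score(*stats[word]), reverse=True)
    return ranked[:top_k]


# (rn, wn) per peripheral word; the two placeholder rows are hypothetical.
stats = {"ginza": (7, 5), "peripheral_a": (4, 4), "peripheral_b": (10, 3)}
representative = select_representative(stats)
```

Passing a larger `top_k` corresponds to the variation in which several high-ranking peripheral words are presented alongside the representative word.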

In the case of the example of FIG. 15, step S106 of FIG. 3 produces, for example, a display such as that shown in FIG. 16. FIG. 16 shows a display example of the representative word and other words related to a topic word.

In FIG. 16, the topic word "macaron" is placed at the center, the representative word "sweets" above it, and the high-scoring peripheral words ("Ginza", "France", and "confectionery") below it. A user viewing this screen can see that "macaron" is a topic word and can infer that it is a kind of sweet, that it is a confectionery sold in Ginza, and that it originates from France. Only the representative word may be displayed, but presenting other peripheral words as well allows the user to infer the meaning, content, or background of the topic word more clearly.

Here, the significance of the score expression ((wn × wn) / rn) used to evaluate peripheral words is explained. wn is the number of peripheral words among the adjacent words of a peripheral word; that is, the number of words that are adjacent both to the peripheral word and to the topic word. Each such word can therefore be said to be strongly related to both the peripheral word and the topic word. The more such words a peripheral word has (the larger its wn), the stronger its relationship with the topic word is considered to be. A peripheral word with a large wn should therefore be rated highly when selecting the representative word for the topic word.

However, the value of wn is strongly affected by the breadth of meaning (level of abstraction) of the peripheral word. Specifically, a peripheral word with a broad meaning (high abstraction) is likely to be related to many words, so the absolute strength of its relationship with the topic word, that is, the value of wn, naturally becomes large. Conversely, a peripheral word with a narrow meaning (low abstraction) is related to a limited set of words, so the value of wn naturally becomes small. Comparing peripheral words by wn alone therefore does not fairly evaluate the strength of their relationship with the topic word.

For this reason, the present embodiment uses rn as the divisor. rn is the number of adjacent words of the peripheral word, including words that are not adjacent to the topic word. rn can therefore be regarded as the total number of words strongly related to the peripheral word, that is, as a value indicating the breadth of meaning of the peripheral word.

Dividing wn by rn abstracts away the breadth of meaning of each peripheral word, so that the strength of the relationship between each peripheral word and the topic word can be compared. In other words, each peripheral word is evaluated by the proportion of its relationships with the topic word (strictly speaking, with peripheral words) among all of its relationships with other words. wn is squared because the present embodiment gives extra weight to the absolute strength of the relationship with the topic word; squaring is therefore not essential. To vary the weighting of wn and rn, each may also be multiplied by a weighting coefficient, in which case the score expression becomes (α × wn) / (β × rn), where α is the weighting coefficient for wn and β is the weighting coefficient for rn.
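The effect of squaring wn can be seen by comparing the two expressions on a pair of invented peripheral words. The function names and the (rn, wn) values below are hypothetical and chosen only to make the difference visible.

```python
def squared_score(rn, wn):
    """Score of the embodiment: (wn x wn) / rn."""
    return (wn * wn) / rn


def ratio_score(rn, wn, alpha=1.0, beta=1.0):
    """Weighted variant from the text: (alpha x wn) / (beta x rn)."""
    return (alpha * wn) / (beta * rn)


# Hypothetical (rn, wn) pairs: a broad word related to many words and a
# narrow word related to few words.
broad = (20, 8)
narrow = (4, 3)

# Squaring wn rewards the absolute number of shared peripheral words,
# so the broad word ranks first under the squared expression.
broad_first = squared_score(*broad) > squared_score(*narrow)
# The plain ratio rewards the proportion wn / rn, so the narrow word
# ranks first instead.
narrow_first = ratio_score(*narrow) > ratio_score(*broad)
```

Note that when the same positive α and β are applied to every peripheral word, all scores are scaled by α / β; in that case the coefficients would shift the scores relative to a fixed threshold but would not change the ordering of the peripheral words.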

As described above, according to the present embodiment, the search log is analyzed to extract relationship information between the words contained in search keywords, and the peripheral words and representative word for a topic word extracted from the search keywords can be extracted or determined based on that relationship information. Presentation of information indicating the background of the topic word can therefore be appropriately supported. That is, presenting the extracted or determined peripheral words and representative word allows the user to readily understand the background of the topic word.

The embodiment of the present invention has been described in detail above; however, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

Regarding the above description, the following items are further disclosed.
(Appendix 1)
A search information analysis program that causes a computer to execute:
a relationship generation procedure of extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
a selection procedure of extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are first words and also correspond to peripheral words other than that peripheral word; counting the number of the first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.
(Appendix 2)
The search information analysis program according to appendix 1, wherein the degree calculated by the selection procedure is a value obtained by dividing the number of the second words by the number of the first words.
(Appendix 3)
The search information analysis program according to appendix 1 or 2, wherein the attribute information of the search word is a URL of browsing destination information corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the URLs of the browsing destination information.
(Appendix 4)
The search information analysis program according to appendix 1 or 2, wherein the attribute information of the search word is an identifier of a search subject corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the identifiers of the search subjects.
(Appendix 5)
The search information analysis program according to appendix 1 or 2, wherein the relationship generation procedure extracts the relationships between the words based on whether or not the words are included in the same search word in the search log.
(Appendix 6)
A search information analysis device comprising:
relationship generation means for extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
selection means for extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are first words and also correspond to peripheral words other than that peripheral word; counting the number of the first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.
(Appendix 7)
The search information analysis device according to appendix 6, wherein the degree calculated by the selection means is a value obtained by dividing the number of the second words by the number of the first words.
(Appendix 8)
The search information analysis device according to appendix 6 or 7, wherein the attribute information of the search word is a URL of browsing destination information corresponding to the search word, and the relationship generation means extracts the relationships between the words based on commonality of the URLs of the browsing destination information.
(Appendix 9)
The search information analysis device according to appendix 6 or 7, wherein the attribute information of the search word is an identifier of a search subject corresponding to the search word, and the relationship generation means extracts the relationships between the words based on commonality of the identifiers of the search subjects.
(Appendix 10)
The search information analysis device according to appendix 6 or 7, wherein the relationship generation means extracts the relationships between the words based on whether or not the words are included in the same search word in the search log.
(Appendix 11)
A search information analysis method in which a computer executes:
a relationship generation procedure of extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
a selection procedure of extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are first words and also correspond to peripheral words other than that peripheral word; counting the number of the first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.
(Appendix 12)
The search information analysis method according to appendix 11, wherein the degree calculated by the selection procedure is a value obtained by dividing the number of the second words by the number of the first words.
(Appendix 13)
The search information analysis method according to appendix 11 or 12, wherein the attribute information of the search word is a URL of browsing destination information corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the URLs of the browsing destination information.
(Appendix 14)
The search information analysis method according to appendix 11 or 12, wherein the attribute information of the search word is an identifier of a search subject corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the identifiers of the search subjects.
(Appendix 15)
The search information analysis method according to appendix 11 or 12, wherein the relationship generation procedure extracts the relationships between the words based on whether or not the words are included in the same search word in the search log.

DESCRIPTION OF SYMBOLS
10 Search information analysis device
11 Search log storage unit
12 Topic word extraction unit
13 Relationship extraction unit
14 Representative word selection unit
15 Output processing unit
100 Drive device
101 Recording medium
102 Auxiliary storage device
103 Memory device
104 CPU
105 Interface device
106 Display device
107 Input device
B Bus

Claims

A search information analysis program that causes a computer to execute:
a relationship generation procedure of extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
a selection procedure of extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are among the extracted first words and correspond to peripheral words; counting the number of the extracted first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.

The search information analysis program according to claim 1, wherein the degree calculated by the selection procedure is a value obtained by dividing the number of the second words by the number of the first words.

The search information analysis program according to claim 1, wherein the attribute information of the search word is a URL of browsing destination information corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the URLs of the browsing destination information.

The search information analysis program according to claim 1 or 2, wherein the attribute information of the search word is an identifier of a search subject corresponding to the search word, and the relationship generation procedure extracts the relationships between the words based on commonality of the identifiers of the search subjects.

The search information analysis program according to claim 1 or 2, wherein the relationship generation procedure extracts the relationships between the words based on whether or not the words are included in the same search word in the search log.

A search information analysis device comprising:
relationship generation means for extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
selection means for extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are among the extracted first words and correspond to peripheral words; counting the number of the extracted first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.

A search information analysis method in which a computer executes:
a relationship generation procedure of extracting, based on a search log in which search words each including one or more words are stored in association with attribute information of the search words, relationships between the words included in the search words, and recording each word in association with its related words in relationship information storage means; and
a selection procedure of extracting, with reference to the relationship information storage means, peripheral words that are words having the relationship with a topic word that is one of the words included in the search words; extracting, for each extracted peripheral word and with reference to the relationship information storage means, first words having the relationship with the peripheral word and second words that are among the extracted first words and correspond to peripheral words; counting the number of the extracted first words and the number of the second words; calculating, based on the number of the first words and the number of the second words, a degree to which the peripheral word indicates the background of the topic word; and extracting some of the peripheral words based on the calculated degree.