JP4535765B2

JP4535765B2 - Content navigation program, content navigation method, and content navigation apparatus

Info

Publication number: JP4535765B2
Application number: JP2004128925A
Authority: JP
Inventors: 寛治内野; 敏勝鎌仲; 秀治橋本; 裕一多田; 智也成田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2004-04-23
Filing date: 2004-04-23
Publication date: 2010-09-01
Anticipated expiration: 2024-04-23
Also published as: JP2005309998A

Description

The present invention relates to a content navigation program, a content navigation method, and a content navigation apparatus for supporting information search, and in particular to a content navigation program, method, and apparatus capable of presenting keywords related to a keyword selected by a user.

With the spread of high-bandwidth network environments such as FTTH (Fiber To The Home) and ADSL (Asymmetric Digital Subscriber Line), the Internet has become an indispensable part of the infrastructure of daily life. Techniques for finding information among the vast amount of content on the Internet have therefore become essential, and many service providers offer such functions to users.

Information search functions are provided by many portal sites. Examples include services that classify information into a hierarchical structure to support information search by users (directory search) and services that list information matching an entered keyword (keyword search). These services are used by many users.

The amount of information on the Internet is enormous. In a keyword search, the number of items matching the entered keyword can therefore also be very large. To address this, a technique has been proposed that ranks items based on the citation relationships among them and lists higher-ranked items preferentially (see, for example, Patent Document 1).

When performing a keyword search, it is also important to enter appropriate keywords. If a synonym dictionary is created in advance from keywords having the same meaning, a keyword entered by the user can be supplemented with its synonyms so that the search misses nothing. A technique for automatically creating a synonym dictionary that defines the degree of association between words has accordingly been considered (see, for example, Patent Document 2).
Patent Document 1: US Pat. No. 6,526,440
Patent Document 2: JP 11-312168 A

Today, as the value of the Internet grows as an information medium on a par with television and newspapers, topics and trends are often formed through personal Web pages, blogs (weblogs), and bulletin boards. For operators of e-commerce Web sites and the like, it is important to capture the attention and interests of users on the net quickly and to put them to use in marketing.

However, there is no efficient way to reflect the new content published every day in search results. In directory search, for example, information is classified manually, so directory maintenance cannot keep up with the enormous volume of content.

In keyword search, optimal results cannot be obtained unless a search keyword appropriate to the desired information is entered. For example, when a new technology is developed, it is given a new name. When searching the Internet for content about that technology, unless the user enters that name as a keyword, the desired information may be buried in a huge amount of irrelevant information.

In Patent Document 1, items are ranked by the citation relationships among them so that useful information is presented preferentially. However, content that has only just been published can be expected to have few citations from other content, even if it is important. In that case, even if the content matches the keyword search, its priority is low and the user is more likely to overlook it.

In Patent Document 2, the degree of association between words is defined using only the words entered by users, so the information obtained as search results is not used effectively.
The present invention has been made in view of these points, and its object is to provide a content navigation program, a content navigation method, and a content navigation apparatus that make it easy to search for information in which many users are currently interested.

To solve the above problems, the present invention provides a content navigation program for supporting content search, as shown in FIG. 1. When a computer executes the content navigation program according to the present invention, the computer realizes the following functions.

Each time a user performs a search based on a keyword 6a and selects any content 7b from the search result 6b, the storing unit 1a associates the search keyword 6a with the identification information of the selected content 7b and stores them in the storage unit 1b. The grouping unit 1c determines the relevance between the keywords stored in the storage unit 1b based on the correspondence between the keyword 6a and the selected content 7b, and groups a plurality of related keywords. When any representative keyword 8a is selected, the related keyword output unit 1d outputs the other keywords belonging to the same group as the selected representative keyword.

In a computer executing such a content navigation program, each time a user performs a search based on the keyword 6a and selects any content 7b from the search result 6b, the storing unit 1a stores the search keyword 6a and the selected content 7b in the storage unit 1b in association with each other. The grouping unit 1c then determines the relevance between the keywords stored in the storage unit 1b based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords. When any representative keyword 8a is selected, the related keyword output unit 1d outputs the other keywords belonging to the same group as the selected representative keyword.

To solve the above problems, there is also provided a content navigation method for supporting content search by a computer, characterized in that: a storing unit, each time a user performs a search based on a keyword and selects any content from the search results, associates the search keyword with the identification information of the selected content and stores them in a storage unit; a grouping unit determines the relevance between the keywords stored in the storage unit based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords; and a related keyword output unit, when any representative keyword is selected, outputs the other keywords belonging to the same group as the selected representative keyword.

According to such a content navigation method, each time a user performs a search based on a keyword and selects any content from the search results, the storing unit stores the search keyword and the selected content in the storage unit in association with each other. The grouping unit then determines the relevance between the keywords stored in the storage unit based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords. When any representative keyword is selected, the related keyword output unit outputs the other keywords belonging to the same group as the selected representative keyword.

To solve the above problems, there is further provided a content navigation apparatus for supporting content search, characterized by having: a storing unit that, each time a user performs a search based on a keyword and selects any content from the search results, associates the search keyword with the identification information of the selected content and stores them in a storage unit; a grouping unit that determines the relevance between the keywords stored in the storage unit based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords; and a related keyword output unit that, when any representative keyword is selected, outputs the other keywords belonging to the same group as the selected representative keyword.

According to such a content navigation apparatus, each time a user performs a search based on a keyword and selects any content from the search results, the storing unit stores the search keyword and the selected content in the storage unit in association with each other. The grouping unit then determines the relevance between the keywords stored in the storage unit based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords. When any representative keyword is selected, the related keyword output unit outputs the other keywords belonging to the same group as the selected representative keyword.

As described above, in the present invention keywords are grouped based on the relationship between each keyword and the content that users selected from the results of searching with that keyword. As a result, keywords related to content in which many users are currently interested are grouped together. By outputting the other keywords in the same group as the selected representative keyword, the user can be shown keywords entered by other users whose interests are similar.

Embodiments of the present invention are described below with reference to the drawings.
First, an outline of the invention applied to the embodiment is given, and then the specific content of the embodiment is described.

FIG. 1 is a conceptual diagram of the invention applied to the embodiment. The content navigation apparatus 1 according to the present invention is connected to a client 2 used by a user who performs keyword searches, a search server 3, a content server 4, and a client 5 used by a user of the content navigation service. The content navigation apparatus 1 has a storing unit 1a, a storage unit 1b, a grouping unit 1c, and a related keyword output unit 1d.

Each time a user performs a search based on the keyword 6a and selects any content 7b from the search result 6b, the storing unit 1a associates the search keyword 6a with the selected content 7b and stores them in the storage unit 1b. Specifically, when the keyword 6a is output from the client 2, the search server 3 performs a search based on the keyword 6a and returns the search result 6b to the client 2. When the user of the client 2 selects any content from the search result 6b, a content acquisition request 7a is output to the content server 4. The content server 4 returns the content 7b corresponding to the content acquisition request 7a to the client 2. At this time, the storing unit 1a collects the keyword 6a and the identification information of the content 7b indicated by the content acquisition request 7a, and stores them in the storage unit 1b in association with each other.
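
The recording step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `ClickLog`, its method names, and the example URLs are hypothetical, and an in-memory dictionary stands in for the storage unit 1b.

```python
from collections import defaultdict


class ClickLog:
    """Hypothetical in-memory stand-in for the storage unit 1b."""

    def __init__(self):
        # keyword -> identifiers (here URLs) of content selected after searching it
        self.contents_by_keyword = defaultdict(set)

    def record(self, keyword, content_id):
        """Called once per selection: associate the search keyword with the content."""
        self.contents_by_keyword[keyword].add(content_id)


# Two users' selections (example data, not from the source).
log = ClickLog()
log.record("FTTH", "http://example.com/fiber")
log.record("FTTH", "http://example.com/broadband")
log.record("optical fiber", "http://example.com/fiber")
```

Using a set per keyword means a content item selected repeatedly for the same keyword is stored only once; a counter could be substituted if selection frequency mattered.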

The grouping unit 1c determines the relevance between the keywords stored in the storage unit 1b based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords. For example, the grouping unit 1c groups keywords whose associated content is shared. Specifically, when two keywords are compared, the grouping unit 1c can group them if the number of content items associated with both is equal to or greater than a predetermined value. Alternatively, when two keywords are compared, the grouping unit 1c can group them if, among the content items associated with at least one of the two keywords, the proportion associated with both is equal to or greater than a predetermined value.
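
The two grouping criteria can be illustrated with set operations. In this sketch the function names and the default thresholds are assumptions chosen for illustration; the source only speaks of "a predetermined value". The second criterion, shared items divided by items associated with at least one keyword, is the Jaccard index of the two content sets.

```python
def shared_count(contents_a, contents_b):
    """Number of content items associated with both keywords."""
    return len(contents_a & contents_b)


def shared_ratio(contents_a, contents_b):
    """Shared items as a fraction of items associated with at least one keyword."""
    union = contents_a | contents_b
    return len(contents_a & contents_b) / len(union) if union else 0.0


def should_group(contents_a, contents_b, min_count=2, min_ratio=None):
    """Apply the ratio criterion if min_ratio is given, else the count criterion.

    The threshold values are hypothetical examples.
    """
    if min_ratio is not None:
        return shared_ratio(contents_a, contents_b) >= min_ratio
    return shared_count(contents_a, contents_b) >= min_count


a = {"u1", "u2", "u3"}  # content selected after searching keyword A
b = {"u2", "u3", "u4"}  # content selected after searching keyword B
print(shared_count(a, b), shared_ratio(a, b))  # 2 0.5
```

The count criterion favors popular keywords with many selections, while the ratio criterion normalizes for popularity, which may be why both are offered.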

The related keyword output unit 1d accepts input of a representative keyword from the client 5. When any representative keyword is selected, the related keyword output unit 1d outputs to the client 5 the other keywords belonging to the same group as the selected representative keyword.
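
A minimal sketch of this lookup, assuming groups are held as plain Python sets and each keyword belongs to at most one group (both are assumptions; the function name `related_keywords` is hypothetical):

```python
def related_keywords(representative, groups):
    """Return the other members of the group containing the representative keyword."""
    for group in groups:
        if representative in group:
            # Sorted only to make the output order deterministic.
            return sorted(group - {representative})
    return []  # the keyword has not been grouped with anything


groups = [{"FTTH", "optical fiber", "broadband"}, {"blog", "weblog"}]
print(related_keywords("FTTH", groups))  # ['broadband', 'optical fiber']
```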

According to such a content navigation apparatus, each time a user performs a search based on the keyword 6a and selects any content 7b from the search result 6b, the storing unit 1a stores the search keyword 6a and the selected content 7b in the storage unit 1b in association with each other. The grouping unit 1c then determines the relevance between the keywords stored in the storage unit 1b based on the correspondence between the keywords and the selected content, and groups a plurality of related keywords. When any representative keyword 8a is selected, the related keyword output unit 1d outputs the other related keywords 8b belonging to the same group as the selected representative keyword 8a.

Because keywords are grouped based on the relationship between each keyword and the content that users selected from the results of searching with that keyword, keywords related to content in which many users are currently interested can be grouped together. As a result, by outputting the other keywords in the same group as the selected representative keyword, the user can be shown keywords entered by other users whose interests are similar.

The technique of the present invention as shown in FIG. 1 can be used for content navigation on various networks, such as the Internet and intranets. In particular, applying it to the Internet, where an enormous amount of content is published, can make the Internet more convenient to use.

User identification information can also be used as a criterion for grouping. In that case, in addition to the content identification information, the storing unit 1a stores the user identification information of the user who entered the keyword in the storage unit 1b in association with the keyword. The grouping unit 1c then groups keywords whose associated user identification information is shared. Specifically, when two keywords are compared, the grouping unit 1c groups them if the number of user identifiers associated with both is equal to or greater than a predetermined value. Alternatively, when two keywords are compared, the grouping unit 1c can group them if, among the user identifiers associated with at least one of the two keywords, the proportion associated with both is equal to or greater than a predetermined value. Grouping by user identification information in this way makes it possible to group keywords entered by users who share the same interests.

Furthermore, the scope of a group can be expanded by repeating the grouping. Specifically, the grouping unit 1c takes one keyword in the storage unit 1b as a target word, takes the other keywords related to the target word as corresponding words, and groups the target word with the corresponding words. The grouping unit 1c then also includes in the same group the other keywords related to the corresponding words.
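
The repeated expansion can be sketched as a level-by-level traversal. This assumes the pairwise relatedness has already been computed into a dictionary `related` mapping each keyword to its directly related keywords; that layout, the function name, and the `max_depth` parameter are illustrative assumptions, not details from the source.

```python
def expand_group(target, related, max_depth=2):
    """Grow a group from a target word by following relatedness links.

    Depth 1 yields the target word and its corresponding words; depth 2 also
    adds keywords related to those corresponding words, and so on.
    """
    group = {target}
    frontier = {target}
    for _ in range(max_depth):
        next_frontier = set()
        for word in frontier:
            for other in related.get(word, ()):
                if other not in group:
                    group.add(other)
                    next_frontier.add(other)
        if not next_frontier:
            break  # nothing new was reached; the group is complete
        frontier = next_frontier
    return group


related = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(sorted(expand_group("a", related, max_depth=2)))  # ['a', 'b', 'c']
```

Bounding the depth keeps a group from swallowing every keyword that is connected to the target word only through a long chain of weak links.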

An unnecessary word storing unit, which stores keywords that a user has marked as unneeded in an unnecessary word storage unit as unnecessary words, may also be added to the content navigation apparatus 1, with the related keyword output unit 1d outputting keywords other than the unnecessary words held in the unnecessary word storage unit. Specifically, the unnecessary word storing unit determines the relevance between the unnecessary word selected by the user and the other keywords based on the correspondence between the keywords and the selected content, and stores the other keywords related to the unnecessary word (those associated with the same content) in the unnecessary word storage unit as new unnecessary words. Registering unnecessary words in this way prevents the output of superfluous related keywords (related keywords associated with almost the same content as other related keywords).
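
One way to read this step is sketched below. It assumes, purely for illustration, that a keyword counts as related to an unnecessary word when it is associated with exactly the same set of content; the source does not fix the criterion at this point. The function names are hypothetical.

```python
def propagate_unneeded(seed, contents_by_keyword):
    """Return the seed unnecessary word plus keywords tied to the same content.

    Assumption: "associated with the same content" means identical content sets.
    """
    seed_contents = contents_by_keyword.get(seed, set())
    unneeded = {seed}
    for keyword, contents in contents_by_keyword.items():
        if keyword != seed and contents and contents == seed_contents:
            unneeded.add(keyword)
    return unneeded


def filter_related(related, unneeded):
    """Drop unnecessary words from a list of related keywords before output."""
    return [keyword for keyword in related if keyword not in unneeded]


contents = {
    "cheap": {"u1", "u2"},
    "bargain": {"u1", "u2"},  # same content as "cheap"
    "fiber": {"u1", "u3"},
}
unneeded = propagate_unneeded("cheap", contents)
print(filter_related(["bargain", "fiber", "cheap"], unneeded))  # ['fiber']
```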

The content associated with each of a plurality of grouped keywords can also be consolidated as associations with a single keyword (degeneration processing). Specifically, the grouping unit 1c takes one keyword in the storage unit 1b as a target word and takes the other keywords related to the target word as corresponding words. The grouping unit 1c then registers the content associated with each corresponding word in the storage unit 1b in association with the target word, and deletes the corresponding words from the storage unit 1b. After performing degeneration processing in this way, the grouping unit 1c determines the relevance between the keywords stored in the storage unit 1b and groups a plurality of related keywords. That is, degeneration processing and grouping are performed alternately. As a result, even when grouping is repeated and the grouped range is expanded, a large number of keywords can be prevented from being output at once as related keywords.
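
Degeneration processing as described here amounts to merging content sets and deleting the merged keywords. The sketch below assumes the keyword-to-content associations are held in a dictionary of sets; the function name `degenerate` is hypothetical, and the function returns a new mapping rather than modifying the stored one, which is a design choice of this example only.

```python
def degenerate(contents_by_keyword, target, corresponding):
    """Fold the content of each corresponding word into the target word.

    The corresponding words are removed from the returned mapping, so a later
    grouping pass sees only the target word with the combined content.
    """
    merged = {keyword: set(contents) for keyword, contents in contents_by_keyword.items()}
    merged.setdefault(target, set())
    for word in corresponding:
        if word != target:
            merged[target] |= merged.pop(word, set())
    return merged


contents = {"FTTH": {"u1"}, "optical fiber": {"u1", "u2"}, "ADSL": {"u3"}}
result = degenerate(contents, "FTTH", ["optical fiber"])
print(sorted(result))  # ['ADSL', 'FTTH']
```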

A plurality of groups that have already been generated can also be combined into a tree structure. Specifically, when two groups generated by grouping contain a common keyword, the grouping unit 1c generates a new group in which one group is connected beneath the other in a tree structure via the common keyword.
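
A small sketch of connecting two groups through a common keyword follows. It makes several assumptions that the source does not state: each group is represented as a root keyword (the target word) plus a set of child keywords, the common keyword is a child of the upper group and the root of the lower group, and the resulting tree is a nested dictionary. The function name is hypothetical.

```python
def attach_group(upper, lower):
    """Hang the lower group beneath the upper group via their common keyword.

    Each group is a (root, children) pair. Returns a nested dict tree, or None
    when the lower group's root is not a member of the upper group.
    """
    upper_root, upper_children = upper
    lower_root, lower_children = lower
    if lower_root not in upper_children:
        return None
    tree = {upper_root: {child: {} for child in upper_children}}
    # Exclude the upper root from the subtree so the tree contains no cycle.
    tree[upper_root][lower_root] = {
        child: {} for child in lower_children - {upper_root}
    }
    return tree


tree = attach_group(("a", {"b", "c"}), ("c", {"d", "e"}))
print(tree == {"a": {"b": {}, "c": {"d": {}, "e": {}}}})  # True
```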

Keywords can furthermore be grouped according to the similarity of changes over time in the number of hits obtained when searching with a keyword, or in the number of users who entered the keyword. Specifically, the storing unit 1a stores the number of search hits for a keyword and the user identification information of the users who entered the keyword in the storage unit 1b in association with the keyword. The grouping unit 1c then groups a plurality of keywords whose hit counts change similarly over time. Alternatively, the grouping unit 1c groups a plurality of keywords for which the number of users who entered them changes similarly over time. Grouping with such changes over time taken into account makes it possible to provide users with information that is attracting growing attention.
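
The source does not say how similarity of temporal change is measured. The sketch below substitutes the Pearson correlation coefficient between two equally spaced count series as one plausible measure; the threshold of 0.9 and the function names are likewise assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def similar_trend(xs, ys, threshold=0.9):
    """True when the two series rise and fall together closely enough."""
    return pearson(xs, ys) >= threshold


hits_a = [1, 2, 3, 4]      # daily hit counts for keyword A (example data)
hits_b = [10, 20, 30, 45]  # keyword B rises in step with A
hits_c = [4, 3, 2, 1]      # keyword C declines
print(similar_trend(hits_a, hits_b), similar_trend(hits_a, hits_c))  # True False
```

Correlation compares the shape of the two curves rather than their absolute size, so a niche keyword and a popular one can still be grouped if they rise together.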

If content navigation according to the present invention is used on the Internet, the enormous amount of information that is updated there every day can be used effectively. When the present invention is applied to the Internet, the functions described above are built, for example, into the first entrance that users pass through when accessing the Internet (a portal site server). This portal site server mines the users' search behavior patterns and performs content navigation based on search keywords.

An embodiment of the present invention is described concretely below, taking as an example the case where the invention is applied to a portal site on the Internet. In the following embodiment, grouping keywords is referred to as clustering, and each generated group is referred to as a cluster.

FIG. 2 is a diagram showing an example of a system configuration for realizing the embodiment of the present invention. As shown in FIG. 2, a portal site server 100, a plurality of clients 211, 212, ..., a search server 220, and a plurality of Web servers 231, 232, ... are connected via the Internet 10.

The portal site server 100 holds a history of Web searches and, based on that search history, provides a content navigation service to the clients 211, 212, ....

The search server 220 receives search requests from the clients 211, 212, ... via the portal site server 100 and returns Web page search results. The Web servers 231, 232, ... provide various content, such as Web pages, via the Internet 10.

FIG. 3 is a diagram showing an example of the hardware configuration of the portal site server used in the embodiment of the present invention. The portal site server 100 as a whole is controlled by a CPU (Central Processing Unit) 101. A RAM (Random Access Memory) 102, a hard disk drive (HDD) 103, a graphics processing device 104, an input interface 105, and a communication interface 106 are connected to the CPU 101 via a bus 107.

The RAM 102 temporarily stores at least part of the OS (Operating System) program and the application programs to be executed by the CPU 101. The RAM 102 also stores various data required for processing by the CPU 101. The HDD 103 stores the OS and the application programs.

A monitor 11 is connected to the graphics processing device 104. The graphics processing device 104 displays images on the screen of the monitor 11 in accordance with instructions from the CPU 101. A keyboard 12 and a mouse 13 are connected to the input interface 105. The input interface 105 sends signals received from the keyboard 12 and the mouse 13 to the CPU 101 via the bus 107.

The communication interface 106 is connected to the Internet 10 and exchanges data with other computers via the Internet 10.

The processing functions of this embodiment can be realized with the hardware configuration described above. Although FIG. 3 shows an example of the hardware configuration of the portal site server 100, the clients 211, 212, ..., the search server 220, and the Web servers 231, 232, ... can be realized with a similar hardware configuration.

The functions according to the present invention are provided by the portal site server 100. That is, the portal site server 100 creates two types of keyword-based clusters using the search history recorded at the search site (search keyword, user identification ID, and URL of the destination selected from the search results). From the created clusters, the portal site server 100 then builds a network for navigating according to the user's interests. The processing functions of the portal site server 100 are described in detail below.
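
The three fields of a search history entry named above can be pictured as a small record type. The class, field, and function names below are hypothetical, and the two per-keyword indexes are merely one plausible way to prepare the history for content-based and user-based comparison; the source does not specify a storage layout at this point.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHistoryRecord:
    keyword: str  # search keyword entered by the user
    user_id: str  # user identification ID
    url: str      # URL of the destination selected from the search results


def index_history(records):
    """Build keyword -> URLs and keyword -> user IDs maps from history records."""
    urls_by_keyword = defaultdict(set)
    users_by_keyword = defaultdict(set)
    for record in records:
        urls_by_keyword[record.keyword].add(record.url)
        users_by_keyword[record.keyword].add(record.user_id)
    return dict(urls_by_keyword), dict(users_by_keyword)


records = [
    SearchHistoryRecord("FTTH", "user1", "http://example.com/fiber"),
    SearchHistoryRecord("FTTH", "user2", "http://example.com/broadband"),
    SearchHistoryRecord("ADSL", "user1", "http://example.com/broadband"),
]
urls, users = index_history(records)
print(sorted(users["FTTH"]))  # ['user1', 'user2']
```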

FIG. 4 is a block diagram showing the processing functions of the portal site server. The portal site server 100 is provided with the following databases: a Web page DB 111, a search history DB 112, a basic DB 113, an unnecessary word DB 114, and a cluster DB 115. The Web page DB 111 stores the Web page data provided as the portal site. The search history DB 112 stores the search history of Web searches executed via the portal site server 100. The basic DB 113 stores information indicating the associations between items of information obtained by analyzing the search history. The unnecessary word DB 114 stores information on terms that need not be presented to users in content navigation (unnecessary words). The cluster DB 115 stores information, derived from the search history, that indicates the degree of relevance between keywords and the like.

The portal site server 100 is provided with the following processing functions: a portal site content providing unit 120, a search history recording unit 130, a keyword cluster creation unit 140, an unnecessary word DB creation unit 150, and a navigation unit 160. The portal site content providing unit 120 provides the Web page data stored in the Web page DB 111 to the clients 211, 212, .... The search history recording unit 130 records, in the search history DB 112, the history of Web searches that the clients 211, 212, ... perform using the search server 220. The keyword cluster creation unit 140 builds the basic DB 113 and the cluster DB 115 based on the search history DB 112. The unnecessary word DB creation unit 150 registers terms designated as unnecessary words by the administrator in the unnecessary word DB 114. In response to requests from the clients 211, 212, ..., the navigation unit 160 extracts keywords related to a specified keyword from the cluster DB 115 and transmits them to the clients 211, 212, ....

First, the search history accumulation process in the portal site server 100 is described.

FIG. 5 is a diagram showing the search history accumulation process. The example of FIG. 5 shows the flow from the point at which a user of the client 211 searches for content until the user selects and views content on the Web server 231 from the search results.

In response to an operation input from the user, the client 211 transmits a search page acquisition request 21 to the portal site server 100. The portal site content providing unit 120 of the portal site server 100 responds to the search page acquisition request 21 by transmitting search page data 22 to the client 211. The search page data 22 is, for example, a structured document written in HTML (HyperText Markup Language).

On the client 211, a search page 23 is displayed on the monitor. The search page 23 contains a search keyword input field 23a and a search button 23b. Although omitted from the figure, the search page 23 also displays various other information (news and the like).

The user enters one or more keywords in the search keyword input field 23a and presses the search button 23b. The client 211 then transmits a search request containing the search keywords to the portal site server 100.

The search history recording unit 130 of the portal site server 100 forwards the search request 24 to the search server 220. The search server 220 searches for content on the Internet 10 according to the received search request 24 and transmits a search result 25 to the portal site server 100.

The search history recording unit 130 of the portal site server 100 forwards the search result 25 to the client 211. In doing so, the search history recording unit 130 converts the search result 25 into Web page data. Control information is embedded in the generated Web page data so that, when the user selects a URL in the search results, access to that URL is executed via the portal site server 100. The search history recording unit 130 also stores predetermined information from the search result 25 (search keyword, number of hits, and so on) in the search history DB 112 in association with information that uniquely identifies the client 211 (for example, a cookie).

On receiving the search result 25, the client 211 displays a search result list 26 on the monitor. The search result list 26 contains identification information 28a, 28b, ... for the content matching the search keyword (for example, a title or a URL (Uniform Resource Locator)). When the user selects the identification information of content published on the Web server 231, the client 211 outputs a Web page acquisition request 27 specifying the corresponding URL.

The search history recording unit 130 of the portal site server 100 forwards the Web page acquisition request 27 to the Web server 231 and stores the URL of the accessed Web page in the search history DB 112.
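
The routing of result clicks through the portal can be illustrated with a minimal sketch. The specification does not describe the form of the embedded control information, so the redirect endpoint `PORTAL_REDIRECT`, the query parameter names `kw` and `to`, and the in-memory `history` list below are all assumptions made for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical redirect endpoint on the portal site server (not from the source).
PORTAL_REDIRECT = "http://portal.example/jump"


def wrap_result_url(target_url: str, keyword: str) -> str:
    """Rewrite a search-result URL so that the click passes through the portal."""
    query = urlencode({"kw": keyword, "to": target_url})
    return f"{PORTAL_REDIRECT}?{query}"


def record_click(redirect_url: str, cookie: str, history: list) -> str:
    """Log the selected URL against the client's cookie and return the real target."""
    params = parse_qs(urlparse(redirect_url).query)
    target = params["to"][0]
    history.append(
        {"user": cookie, "keyword": params["kw"][0], "selected_url": target}
    )
    return target


history = []
link = wrap_result_url("http://www.xxx.ne.jp/", "トロイカ")
target = record_click(link, "ck-example", history)
print(target)  # the original destination URL, to which the request is forwarded
```

In this sketch the portal learns which URL was selected because the link in the result page points at the portal rather than at the destination; the destination is carried as a parameter and the request is forwarded after logging.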

On receiving the Web page acquisition request 27, the Web server 231 transmits the corresponding Web page data 28 to the client 211. The client 211 then displays a Web page 29.

FIG. 6 is a diagram showing an example data structure of the search history DB. The search history DB 112 stores search histories 112a, 112b, 112c, ... recorded when Web searches are performed via the portal site server 100.

Each of the search histories 112a, 112b, 112c, ... contains information such as the search date and time, the ID of the session in which the search was performed, the search keyword, the number of hits, the acquisition range of the search results (how many items of page information were acquired, starting from which position), the user ID (user identification information), the URL appearing at the top of the search results, the URL selected by the user from the search results (jump destination), the search type, and the title of the retrieved page.

For example, the search history 112a contains the search date and time "2003/12/12:00:00:34", the session ID "1111111111111111111111111111", the search keyword "ロシア民謡トロイカ" ("Russian folk song Troika"), the number of hits "hn=478", the search result acquisition range "ri=10:21" (10 items starting from the 21st content item), the top URL "GU=""", the cookie "ck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", the selected URL "http://www.xxx.ne.jp/", the search type "b=NORMAL", and the title of the selected page "t="トロイカ"".

In the search history 112a, the page information at the top of the search results was not acquired, so the top URL is blank. In another search history, by contrast, the top page information was acquired, so the top URL "GU="http://www.yyy.co.jp/"" is registered.
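
As an illustration of the fields in the example above, the following sketch parses the `key=value` items of one search history into a typed record. The specification does not state how the items are delimited in storage, so the input is assumed to be an already-split list of tokens; the class name `SearchHit` and the function `parse_fields` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    hit_count: int      # hn: number of hits
    range_count: int    # ri, first part: how many items were acquired
    range_start: int    # ri, second part: position of the first acquired item
    top_url: str        # GU: URL at the top of the results (may be empty)
    search_type: str    # b: search type
    title: str          # t: title of the selected page


def parse_fields(tokens: list[str]) -> SearchHit:
    """Parse tokens such as 'hn=478' or 'ri=10:21' into a SearchHit."""
    fields = {}
    for token in tokens:
        key, _, value = token.partition("=")
        fields[key] = value.strip('"')
    count, _, start = fields["ri"].partition(":")
    return SearchHit(
        hit_count=int(fields["hn"]),
        range_count=int(count),
        range_start=int(start),
        top_url=fields.get("GU", ""),
        search_type=fields.get("b", ""),
        title=fields.get("t", ""),
    )


hit = parse_fields(['hn=478', 'ri=10:21', 'GU=""', 'b=NORMAL', 't="トロイカ"'])
print(hit.range_count, hit.range_start)  # 10 21
```

Reading "ri=10:21" as "10 items starting from the 21st" follows the explanation given for the search history 112a.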

The key point of the present invention is the creation of meaningful keyword clusters (so-called keyword mining) in an environment, such as an ISP, where this kind of history accumulates in large volumes every day (one million records per day or more). The keyword cluster creation process is executed periodically (for example, once a day at a predetermined time).

The process of creating keyword clusters from the search history is described in detail below. A keyword cluster is a group of keywords formed from a particular viewpoint. In the present embodiment, clusters of different levels (viewpoints) are created by the following two types of operation.

Level 1 clusters together keywords for which the content selected from the search results is shared. Keywords in a level 1 cluster are, for the most part, spelling variants or alternative notations of one another. For example, "パーソナルコンピュータ", "パーソナル・コンピュータ", "パソコン", and "PC" (all denoting a personal computer) form a level 1 cluster. A level 1 cluster can be created using the degree to which the URLs selected by users in the search results coincide.

FIG. 7 is a diagram showing level 1 clustering. FIG. 7 shows the relationship between search keywords and the URLs that users selected from the results of searching with those keywords. For example, it shows that users who searched for "Kwd1" selected "URL1", "URL2", and "URL3" from the search results. Keywords whose selected URLs have a high degree of commonality (overlap) can be assumed to have the same or similar meaning. Keywords with a high commonality of selected URLs are therefore clustered to form a level 1 cluster.

Specifically, considering pairs of a search keyword and a URL, the degree of overlap of the selected URLs, dup, is defined as follows.
dup(Kwd1, Kwd2) = (number of URLs common to Kwd1 and Kwd2) / (total number of URLs for Kwd1 and Kwd2)
For all keywords in the search log, the degree of overlap (dup) and the simple number of overlapping URLs are calculated, and a group of keywords for which these values are at or above a certain threshold is taken as a level 1 cluster.

In the example of FIG. 7, "Kwd1" and "Kwd2" are associated with common URLs, so "Kwd1" and "Kwd2" form a cluster 31. Similarly, "Kwd3" and "Kwd4" are associated with common URLs, so "Kwd3" and "Kwd4" form a cluster 32.

FIG. 8 is a diagram showing the level 1 overlap count between two keywords. In the example of FIG. 8, the URLs selected from the search results for "keyword A" are "URL1", "URL2", "URL3", "URL4", and "URL5". The URLs selected from the search results for "keyword B" are "URL4", "URL5", "URL6", and "URL7". The overlapping URLs are therefore "URL4" and "URL5", and the overlap count is 2.
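
The overlap count and dup value for the FIG. 8 example can be worked through with a short sketch. One assumption is made here: the denominator "total number of URLs for Kwd1 and Kwd2" is read as the sum of the two per-keyword URL counts. If it were instead read as the number of distinct URLs (the union), the result would be the Jaccard coefficient; the specification does not say which reading is intended.

```python
def overlap_count(urls_a: set, urls_b: set) -> int:
    """Number of selected URLs common to both keywords."""
    return len(urls_a & urls_b)


def dup(urls_a: set, urls_b: set) -> float:
    """Degree of overlap: common URLs divided by the summed URL counts.

    Assumption: the total is len(urls_a) + len(urls_b), not the union size.
    """
    total = len(urls_a) + len(urls_b)
    return overlap_count(urls_a, urls_b) / total if total else 0.0


# Selected URLs from the FIG. 8 example.
keyword_a = {"URL1", "URL2", "URL3", "URL4", "URL5"}
keyword_b = {"URL4", "URL5", "URL6", "URL7"}

print(overlap_count(keyword_a, keyword_b))  # 2
print(dup(keyword_a, keyword_b))            # 2/9, about 0.222
```

The same two functions apply unchanged to sets of user IDs, which is how the level 2 measure is defined.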

Level 2 clusters together keywords that were entered by overlapping sets of users. Keywords obtained in a level 2 cluster are, for the most part, sibling terms or related terms. For example, keywords such as "○○カメラ", "△△や", "□□電気", "××カメラ", and "凸凸電気" (store names, shown masked) are grouped as a level 2 cluster. A level 2 cluster can be created using the keywords searched for by multiple users.

FIG. 9 is a diagram showing level 2 clustering. FIG. 9 shows the relationship between user IDs and the keywords entered by the users with those IDs. For example, it shows that the user "USR1" entered "Kwd1" and "Kwd2" as search keywords. Keywords with a high commonality (dup value) of the users who entered them can be assumed to belong to the same category.

That is, when a user looks for information by searching, the user generally searches by trial and error, entering various keywords for the target. A group of keywords searched for in the same way by multiple users can be considered to have something in common. Keywords with a high commonality of entering users are therefore grouped to form a level 2 cluster.

Specifically, as with level 1, the degree of overlap dup can be defined as follows.
dup(Kwd1, Kwd2) = (number of users common to Kwd1 and Kwd2) / (total number of users who searched for Kwd1 and for Kwd2)
For all keywords in the search log, the degree of overlap (dup) and the simple number of overlapping users are calculated, and a group of keywords for which these values are at or above a certain threshold is taken as a cluster.

In the example of FIG. 9, "Kwd1" and "Kwd2" are keywords searched for in common by "USR1" and "USR2", so "Kwd1" and "Kwd2" are grouped as a level 2 cluster 33. Similarly, "Kwd3" and "Kwd4" are keywords searched for in common by "USR3" and "USR4", so "Kwd3" and "Kwd4" are grouped as a level 2 cluster 34.
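
The level 2 grouping can be sketched as follows, starting from a map of users to the keywords they entered. The input data mirrors the FIG. 9 description; the threshold `min_common_users=2` and the function names are assumptions chosen for illustration, and the simple overlap count is used in place of the dup value.

```python
from collections import defaultdict


def invert(keywords_by_user: dict) -> dict:
    """Turn user -> keywords into keyword -> set of users."""
    users_by_keyword = defaultdict(set)
    for user, keywords in keywords_by_user.items():
        for keyword in keywords:
            users_by_keyword[keyword].add(user)
    return users_by_keyword


def level2_related(keywords_by_user: dict, min_common_users: int = 2) -> dict:
    """For each keyword, list the keywords sharing enough entering users."""
    users_by_keyword = invert(keywords_by_user)
    related = {}
    for target, target_users in users_by_keyword.items():
        related[target] = sorted(
            other
            for other, other_users in users_by_keyword.items()
            if other != target
            and len(target_users & other_users) >= min_common_users
        )
    return related


fig9 = {
    "USR1": ["Kwd1", "Kwd2"],
    "USR2": ["Kwd1", "Kwd2"],
    "USR3": ["Kwd3", "Kwd4"],
    "USR4": ["Kwd3", "Kwd4"],
}
print(level2_related(fig9))
# {'Kwd1': ['Kwd2'], 'Kwd2': ['Kwd1'], 'Kwd3': ['Kwd4'], 'Kwd4': ['Kwd3']}
```

Comparing every keyword with every other is quadratic in the number of keywords; at the log volumes mentioned earlier, a real implementation would presumably restrict comparisons to keywords that share at least one user.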

FIG. 10 is a diagram showing the level 2 overlap count between two keywords. In the example of FIG. 10, the users who entered "keyword A" as a search keyword are "USR1", "USR2", "USR3", "USR4", and "USR5". The users who entered "keyword B" are "USR4", "USR5", "USR6", and "USR7". The overlapping users are therefore "USR4" and "USR5", and the overlap count is 2.

In this way, the embodiment of the present invention generates two types of cluster, level 1 and level 2.

FIG. 11 is a diagram showing an example of the clusters to which the keyword "コンピュータ" ("computer") belongs. FIG. 11 shows which other keywords are contained in the level 1 cluster 41 and the level 2 cluster 42 to which the keyword "コンピュータ" belongs. In this example, keywords such as "パソコン" and "パーソナルコンピュータ" (both "personal computer") are set in the level 1 cluster 41 for "コンピュータ", and keywords such as "コンピュータウィルス" ("computer virus") and "セキュリティーホール" ("security hole") are set in the level 2 cluster 42.

By treating keywords that appear in common across clusters as links, a network of keywords can thus be built from the search history. Comparing a general directory with this content navigation from the standpoint of user navigation gives the following.

FIG. 12 is a diagram showing the data structures of the directory method and the content navigation method. FIG. 12(A) shows the data structure of the directory method, and FIG. 12(B) shows the data structure of the content navigation method.

In the directory method, the relationship between keywords is a tree structure 51, whereas in the method according to the present embodiment it is a network structure 53. The directory method navigates in the direction of narrowing a concept, whereas the content navigation method navigates in the direction of broadening it. That is, in the directory method, the concept is narrowed by looking for keywords successively from the original keyword 52 down into the lower levels of the tree structure 51. In the content navigation method, the concept can be broadened by following, one after another, other keywords related to the original keyword 54 through the clusters.
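
The broadening navigation over the network structure can be sketched as a breadth-first walk outward from the original keyword. The adjacency map below is illustrative only: the keywords come from the FIG. 11 example, but the link between "computer virus" and "security hole" and the function name `expand` are assumptions.

```python
from collections import deque


def expand(network: dict, start: str, max_hops: int) -> dict:
    """Return each keyword reachable within max_hops, with its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        keyword = queue.popleft()
        if distance[keyword] == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbor in network.get(keyword, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[keyword] + 1
                queue.append(neighbor)
    return distance


# Illustrative keyword network (links are assumed, not taken from the figures).
network = {
    "computer": ["personal computer", "computer virus"],
    "computer virus": ["computer", "security hole"],
}

print(expand(network, "computer", 1))
# {'computer': 0, 'personal computer': 1, 'computer virus': 1}
print(expand(network, "computer", 2))
# {'computer': 0, 'personal computer': 1, 'computer virus': 1, 'security hole': 2}
```

Unlike a tree descent, each additional hop can reach keywords in unrelated branches, which is what lets a user with only a vague idea of the target drift toward neighboring concepts.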

Because of these differences, the directory method suits situations in which the user has a clear picture of the target, whereas the content navigation method suits situations in which the user does not.

To perform this keyword clustering efficiently, the basic DB 113, the unnecessary word DB 114, and the cluster DB 115 are created in advance based on the search history. These DBs are created periodically (for example, late at night every day).

FIG. 13 is a diagram showing an example data structure of the basic DB. The basic DB 113 consists of a keyword-specific URL management table 113a, a URL-specific keyword management table 113b, a keyword-specific selecting user management table 113c, a user-specific input keyword management table 113d, a URL-specific selecting user management table 113e, and a user-specific selected URL management table 113f.

The keyword-specific URL management table 113a has columns for keyword, URL, number of hits, and top URL; the items of information in the same row are associated with one another to form one record. The keyword column holds the keyword entered for the search. The URL column holds the URL of the Web page selected from the results of searching with the corresponding keyword. The number of hits column holds the number of Web pages that matched the corresponding keyword. The top URL column holds the URL of the Web page displayed with the highest priority in the results of searching with the corresponding keyword.

The URL-specific keyword management table 113b has columns for URL, URL title, keyword, and search type; the items of information in the same row are associated with one another to form one record. The URL column holds the URL of the Web page selected from the search results. The URL title column holds the title of the Web page indicated by the corresponding URL. The keyword column holds the search keyword that led to the selection of the corresponding URL. The search type column holds the type of the Web page displayed as a search result (a category such as advertisement or commerce (electronic commerce)).

The keyword-specific selecting user management table 113c has columns for keyword and user; the items of information in the same row are associated with one another to form one record. The keyword column holds the search keyword entered by a user. The user column holds the identification information of the user who entered the corresponding keyword.

The user-specific input keyword management table 113d has columns for user and keyword; the items of information in the same row are associated with one another to form one record. The user column holds the identification information of the user who performed the search. The keyword column holds the search keyword entered by the corresponding user.

The URL-specific selecting user management table 113e has columns for URL and user; the items of information in the same row are associated with one another to form one record. The URL column holds the URL selected from the search results. The user column holds the identification information of the user who selected the corresponding URL.

The user-specific selected URL management table 113f has columns for user and URL; the items of information in the same row are associated with one another to form one record. The user column holds the identification information of the user who performed the search. The URL column holds the URL that the corresponding user selected from the search results.

FIG. 14 is a diagram showing an example data structure of the unnecessary word DB. The unnecessary word DB 114 has columns for unnecessary word, derivation source word, number of repetitions, level 1 overlap count, level 1 dup value, level 2 overlap count, and level 2 dup value; the items of information in the same row are associated with one another to form one record.

The unnecessary word column holds a keyword selected as an unnecessary word. The derivation source word column holds the keyword against which it was compared when it was judged to be an unnecessary word. The number of repetitions column holds a value indicating how many stages of the unnecessary word detection process had been performed when the word was detected as an unnecessary word.

The level 1 overlap count column holds the overlap count between the unnecessary word and the derivation source word used when forming level 1 clusters. The level 1 dup value column holds the dup value between the unnecessary word and the derivation source word used when forming level 1 clusters.

The level 2 overlap count column holds the overlap count between the unnecessary word and the derivation source word used when forming level 2 clusters. The level 2 dup value column holds the dup value between the unnecessary word and the derivation source word used when forming level 2 clusters.

FIG. 15 is a diagram showing an example data structure of the cluster DB. The cluster DB 115 has columns for target word, number of users, number of hits, and corresponding words; the items of information in the same row are associated with one another to form one record.

The target word column holds the keyword used as the reference when looking up a cluster. The number of users column holds the total number of users who entered the target word as a search keyword. The number of hits column holds the number of hits obtained when searching with the target word as the search keyword.

The corresponding words column holds information on the other keywords (corresponding words) compared with the target word. It is further subdivided into columns for notation, overlap count, dup value, number of users, and number of hits.

The notation column holds the character string used to write the corresponding word. The overlap count column is split into upper and lower rows: the upper row holds the level 1 overlap count between the target word and the corresponding word, and the lower row holds the level 2 overlap count between them. The dup value column is likewise split into upper and lower rows: the upper row holds the level 1 dup value between the target word and the corresponding word, and the lower row holds the level 2 dup value between them. The number of users column holds the number of users who entered the corresponding word as a search keyword. The number of hits column holds the number of items hit by a search with the corresponding word as the search keyword.
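
The nested layout of a cluster DB record, with its two-row level 1 / level 2 cells, can be pictured as the following data classes. The class and field names are hypothetical, and the numeric values in the usage lines are made up purely to show the shape of a record.

```python
from dataclasses import dataclass, field


@dataclass
class CorrespondingWord:
    notation: str      # character string of the corresponding word
    overlap_l1: int    # upper row: level 1 overlap count (shared URLs)
    overlap_l2: int    # lower row: level 2 overlap count (shared users)
    dup_l1: float      # upper row: level 1 dup value
    dup_l2: float      # lower row: level 2 dup value
    users: int         # users who entered the corresponding word
    hits: int          # hits when searching with the corresponding word


@dataclass
class ClusterRecord:
    target: str        # target word used as the lookup key
    users: int         # users who entered the target word
    hits: int          # hits when searching with the target word
    corresponding: list[CorrespondingWord] = field(default_factory=list)

    def level1_words(self, min_dup: float) -> list[str]:
        """Notations whose level 1 dup value reaches the given threshold."""
        return [w.notation for w in self.corresponding if w.dup_l1 >= min_dup]


# Made-up values, for shape only.
record = ClusterRecord(target="コンピュータ", users=120, hits=50000)
record.corresponding.append(
    CorrespondingWord("パソコン", 8, 3, 0.25, 0.05, users=90, hits=40000)
)
print(record.level1_words(min_dup=0.2))  # ['パソコン']
```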

Next, the method of creating each database is described. First, the procedure for creating the basic DB 113 is described.

FIG. 16 is a flowchart showing the procedure for creating the basic DB. The process shown in FIG. 16 is described below in order of step number.

[Step S1] The keyword cluster creation unit 140 refers to the search history DB 112, merges multiple search histories that differ in search keyword, jump destination URL, and so on into a single search history using a unique key such as the session ID, and sorts the result in chronological order.

[Step S2] The keyword cluster creation unit 140 reads the search history one record at a time and normalizes the keywords. Normalization includes converting full-width alphanumeric characters and symbols to half-width uppercase, converting half-width katakana to full-width characters, deleting full-width and half-width spaces at the beginning and end of the keyword, and replacing runs of consecutive spaces within the keyword with a single half-width space.

[Step S3] The keyword cluster creation unit 140 creates a hash array for each combination of data and stores the values in the basic DB 113. Specifically, it creates a hash array giving the URLs for each keyword, a hash array giving the users for each keyword, a hash array giving the keywords for each user, a hash array giving the URLs for each user, a hash array giving the keywords for each URL, and a hash array giving the users for each URL.

[Step S4] The keyword cluster creation unit 140 determines whether all records have been processed. If all records have been processed, the process ends; if unprocessed records remain, the process returns to step S2.
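
The keyword normalization of step S2 can be approximated with a short sketch. Note the swapped-in technique: Unicode NFKC normalization is used here as a convenient stand-in for the width conversions, and it also applies other compatibility mappings that the specification does not mention. The function name `normalize_keyword` is hypothetical.

```python
import re
import unicodedata


def normalize_keyword(keyword: str) -> str:
    """Approximate the step S2 normalization of a search keyword."""
    # NFKC folds full-width alphanumerics/symbols to half-width, half-width
    # katakana to full-width, and the full-width space to an ASCII space.
    text = unicodedata.normalize("NFKC", keyword)
    # Drop leading/trailing spaces and collapse internal runs to one space.
    text = re.sub(r"\s+", " ", text.strip())
    # Uppercase the alphabetic characters (kana and kanji are unaffected).
    return text.upper()


print(normalize_keyword("　ｐｃ　　ｶﾒﾗ "))  # PC カメラ
```

Normalizing before the hash arrays are built means that spelling variants differing only in character width or case collapse into a single key.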

FIG. 17 is a diagram showing an example of a hash array that defines the URLs for each keyword. As shown in FIG. 17, the hash array 61 associates each keyword with the list of URLs selected from the search results for that keyword, the number of hits (Hit#), and the top URL (TopURL).

FIG. 18 is a diagram showing an example of a hash array that defines the users for each URL. As shown in FIG. 18, the hash array 62 associates each URL with the list of users who selected that URL.

The hash arrays generated in this way constitute the basic DB 113 shown in FIG. 13. That is, FIG. 13 presents the basic DB 113 in table form for ease of understanding, but inside the actual portal site server 100 the basic DB 113 is managed as hash arrays.
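
A minimal sketch of the six hash arrays follows, using Python dictionaries of sets. The record layout (dictionaries with `keyword`, `user`, and `url` keys) and the function name `build_basic_db` are assumptions; the hit count and top URL that accompany the keyword-to-URL array in FIG. 17 are omitted for brevity.

```python
from collections import defaultdict

FIELDS = ("keyword", "user", "url")


def build_basic_db(records: list) -> dict:
    """Build one mapping per ordered pair of fields, six in total."""
    db = {
        (src, dst): defaultdict(set)
        for src in FIELDS
        for dst in FIELDS
        if src != dst
    }
    for record in records:
        for (src, dst), table in db.items():
            # Skip pairs for which the record lacks a value (e.g. no click).
            if record.get(src) and record.get(dst):
                table[record[src]].add(record[dst])
    return db


records = [
    {"keyword": "Kwd1", "user": "USR1", "url": "URL1"},
    {"keyword": "Kwd1", "user": "USR2", "url": "URL2"},
    {"keyword": "Kwd2", "user": "USR1", "url": "URL1"},
]
db = build_basic_db(records)
print(sorted(db[("keyword", "url")]["Kwd1"]))  # ['URL1', 'URL2']
print(sorted(db[("url", "keyword")]["URL1"]))  # ['Kwd1', 'Kwd2']
```

Keeping both directions of every pair trades memory for lookup speed: the overlap calculations need keyword-to-URL and keyword-to-user sets, while the inverse arrays make it cheap to find which keywords share a given URL or user.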

Next, the procedure for creating the unnecessary word DB 114 is described. The search history of the portal site server 100 contains many keywords that need not be exposed in a service for general users. It is generally impossible to remove every unnecessary word contained in the search history, but they can be removed efficiently by using the keyword cluster creation method described above to create the unnecessary word DB 114. The unnecessary word DB 114 is created by the following steps.

図１９は、不要語ＤＢの作成手順を示すフローチャートの前半である。以下、図１９に示す処理をステップ番号に沿って説明する。 FIG. 19 is the first half of a flowchart showing a procedure for creating an unnecessary word DB. In the following, the process illustrated in FIG. 19 will be described in order of step number.

［ステップＳ１１］不要語ＤＢ作成部１５０は、検索履歴ＤＢ１１２から所定の期間内の検索履歴を取り出す。たとえば、前回のクラスタＤＢ作成処理が実行されてから現在までの期間の検索履歴を取り出す。 [Step S11] The unnecessary word DB creation unit 150 retrieves a search history within a predetermined period from the search history DB 112. For example, a search history for a period from the previous execution of the cluster DB creation process to the present is taken out.

［ステップＳ１２］不要語ＤＢ作成部１５０は、抽出した検索履歴中のキーワード、リンクに対してクリーニングを行う。クリーニングでは、たとえば、キーワードの全角英数記号を半角英数記号に変換、英数小文字を大文字に変換する。また、ＵＲＬに関連付けられているセッションＩＤやユーザＩＤを削除する。 [Step S12] The unnecessary word DB creation unit 150 performs cleaning on keywords and links in the extracted search history. In the cleaning, for example, a full-width alphanumeric symbol of a keyword is converted to a half-width alphanumeric symbol, and an alphanumeric lowercase character is converted to an uppercase character. Also, the session ID and user ID associated with the URL are deleted.

［ステップＳ１３］不要語ＤＢ作成部１５０は、抽出した各検索履歴から所定の情報を抽出し、結果を基本ＤＢ１１３に格納する。具体的には、不要語ＤＢ作成部１５０は、各検索履歴からキーワード、ユーザ、飛び先ＵＲＬ単位で集計（それぞれをキーにして集計）する。そして、不要語ＤＢ作成部１５０は、集計結果を、基本ＤＢ１１３に登録に登録する。 [Step S13] The unnecessary word DB creation unit 150 extracts predetermined information from each extracted search history, and stores the result in the basic DB 113. Specifically, the unnecessary word DB creation unit 150 aggregates each search history by keyword, by user, and by destination URL (using each of them as a key). Then, the unnecessary word DB creation unit 150 registers the aggregation results in the basic DB 113.

［ステップＳ１４］不要語ＤＢ作成部１５０は、ユニークユーザ数が多い順にキーワードをソートする。ここで、ユニークユーザ数とは、各キーワードを入力したユーザの数を示している。なお、ユニークユーザ数を求める際、同一ユーザが同一のキーワードを複数回入力したときには、まとめて１ユーザと換算する。 [Step S14] The unnecessary word DB creation unit 150 sorts the keywords in descending order of the number of unique users. Here, the number of unique users indicates the number of users who input each keyword. When obtaining the number of unique users, if the same user inputs the same keyword a plurality of times, they are collectively converted to one user.

具体的には、キーワード別選択ユーザ管理テーブル１１３ｃの各キーワードに関連付けて登録されているユーザの数（ユーザＩＤの重複を排除後）がカウントされ、その数の大きい順にキーワードが並べ替えられる。また、各キーワードのユニークユーザ数は、時間、日、週、月単位でまとめ、その単位内でソートしてもよい。この際、ユニークユーザ数が少ないもの（１や２など、所定の閾値を超えないもの）はリストから削除してもよい。 Specifically, the number of users registered in association with each keyword in the keyword-specific selected user management table 113c (after eliminating duplicate user IDs) is counted, and the keywords are rearranged in descending order. In addition, the number of unique users of each keyword may be grouped in units of time, day, week, and month, and sorted within the unit. At this time, those with a small number of unique users (such as 1 and 2 that do not exceed a predetermined threshold) may be deleted from the list.

［ステップＳ１５］不要語ＤＢ作成部１５０は、ユニークユーザ数の多い順にキーワードを画面に表示させ、ユーザによって不要語にすべきか否かの検討対象とする１以上のキーワードを選択させる。不要語ＤＢ作成部１５０は、ユーザによって選択された１以上のキーワードを含む不要語の種リストを作成する。たとえば、ユーザは、時間、日、週、月単位でまとめられたときに、常に上位に現れるキーワードを選択する。種リストは、たとえば、ＲＡＭ１０２に格納される。 [Step S15] The unnecessary word DB creation unit 150 displays keywords on the screen in descending order of the number of unique users, and allows the user to select one or more keywords to be examined as to whether or not they should be unnecessary words. The unnecessary word DB creation unit 150 creates a seed list of unnecessary words including one or more keywords selected by the user. For example, the user selects keywords that always appear at the top when they are grouped by hour, day, week, or month. The seed list is stored in the RAM 102, for example.

［ステップＳ１６］不要語ＤＢ作成部１５０は、種リストから未処理のキーワードを取り出し、そのキーワードに対応するユーザとＵＲＬとを、基本ＤＢ１１３を参照して求める。具体的には、不要語ＤＢ作成部１５０は、キーワード別選択ユーザ管理テーブル１１３ｃを参照して、取り出したキーワードに対応するユーザＩＤを取得する。また、不要語ＤＢ作成部１５０は、キーワード別ＵＲＬ管理テーブル１１３ａを参照して、取り出したキーワードに対応するＵＲＬを求める。 [Step S16] The unnecessary word DB creation unit 150 extracts an unprocessed keyword from the seed list, and obtains a user and URL corresponding to the keyword with reference to the basic DB 113. Specifically, the unnecessary word DB creation unit 150 refers to the keyword-specific selected user management table 113c and acquires a user ID corresponding to the extracted keyword. Further, the unnecessary word DB creation unit 150 refers to the keyword-specific URL management table 113a and obtains a URL corresponding to the extracted keyword.

［ステップＳ１７］不要語ＤＢ作成部１５０は、ステップＳ１６で求めたユーザとＵＲＬとに対応するキーワードを求め、そのキーワードのＵＲＬやユーザを更に求める。具体的には、不要語ＤＢ作成部１５０は、ユーザ別入力キーワード管理テーブル１１３ｄを参照し、ステップＳ１６で求めた各ユーザに対応するキーワードを求める。そして、不要語ＤＢ作成部１５０は、キーワード別選択ユーザ管理テーブル１１３ｃやキーワード別ＵＲＬ管理テーブル１１３ａを参照して、求めたキーワードに対応するＵＲＬやユーザを求める。 [Step S17] The unnecessary word DB creation unit 150 obtains a keyword corresponding to the user and URL obtained in Step S16, and further obtains the URL and user of the keyword. Specifically, the unnecessary word DB creation unit 150 refers to the user-specific input keyword management table 113d and obtains a keyword corresponding to each user obtained in step S16. Then, the unnecessary word DB creation unit 150 refers to the keyword-specific selected user management table 113c and the keyword-specific URL management table 113a to obtain a URL and a user corresponding to the obtained keyword.

図２０は、不要語ＤＢの作成手順を示すフローチャートの後半である。以下、図２０に示す処理をステップ番号に沿って説明する。 FIG. 20 is the second half of the flowchart showing the procedure for creating the unnecessary word DB. In the following, the process illustrated in FIG. 20 will be described in order of step number.

［ステップＳ１８］不要語ＤＢ作成部１５０は、ステップＳ１７で求めたキーワードと、種リストから取り出したキーワードとのＵＲＬの重なり数やdup関数とを計算する。これは、レベル１のクラスタの関係の有無を求める処理である。 [Step S18] The unnecessary word DB creation unit 150 calculates the number of URL overlaps and the dup function between the keyword obtained in Step S17 and the keyword extracted from the seed list. This is a process for determining whether or not there is a level 1 cluster relationship.

［ステップＳ１９］不要語ＤＢ作成部１５０は、ステップＳ１７で求めたキーワードと、種リストから取り出したキーワードとのユーザの重なり数やdup関数とを計算する。これは、レベル２のクラスタの関係の有無を求める処理である。 [Step S19] The unnecessary word DB creation unit 150 calculates the number of overlapping users and the dup function between the keyword obtained in step S17 and the keyword extracted from the seed list. This is a process for determining whether or not there is a level 2 cluster relationship.

［ステップＳ２０］不要語ＤＢ作成部１５０は、重なり数またはdup関数の値が所定の閾値以上となるキーワードを、不要語ＤＢ１１４に追加する。 [Step S20] The unnecessary word DB creation unit 150 adds, to the unnecessary word DB 114, keywords whose overlapping number or dup function value is equal to or greater than a predetermined threshold.

［ステップＳ２１］不要語ＤＢ作成部１５０は、ステップＳ２０で追加したキーワードを画面に表示し、ユーザから不要語として不適切なキーワードの選択入力を受け付ける。ユーザからキーワードが選択されると、そのキーワードを不要語ＤＢ１１４から削除する。 [Step S21] The unnecessary word DB creation unit 150 displays the keyword added in step S20 on the screen, and accepts selection input of an inappropriate keyword as an unnecessary word from the user. When a keyword is selected by the user, the keyword is deleted from the unnecessary word DB 114.

［ステップＳ２２］不要語ＤＢ作成部１５０は、不要語の登録処理を所定回数繰り返したか否かを判断する。所定回数繰り返した場合、処理を終了させる。まだ所定回数繰り返していない場合、ステップＳ２０で新たに不要語として追加したキーワード（ステップＳ２１で削除したものを除く）を種リストとして、処理をステップＳ１６に進める。 [Step S22] The unnecessary word DB creation unit 150 determines whether or not the unnecessary word registration process has been repeated a predetermined number of times. If it is repeated a predetermined number of times, the process is terminated. If it has not been repeated a predetermined number of times, the process proceeds to step S16 using the keywords newly added as unnecessary words in step S20 (except for those deleted in step S21) as a seed list.

このようにして、不要語ＤＢ１１４を作成することができる。この際、新たに不要語としたキーワードを種リストとして不要語の判定を繰り返すことにより、不要語の抽出漏れを減らすことができる。 In this way, the unnecessary word DB 114 can be created. At this time, the unnecessary word extraction omission can be reduced by repeating the determination of the unnecessary word using the newly set unnecessary word as a seed list.
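The iterative expansion in steps S16 to S22 can be sketched as follows. This is an illustrative sketch under several assumptions: candidates are gathered through the users of each seed, following the "Specifically" part of step S17; only the raw overlap counts of steps S18 and S19 are compared against the threshold, because the dup function is defined outside this section and is not reproduced here; and the manual review of step S21 is left out. The function name, parameter names, and sample data are hypothetical.

```python
def expand_unnecessary_words(seeds, kw_urls, kw_users, user_kws,
                             threshold=2, rounds=2):
    """Grow a set of unnecessary words from a seed list (steps S16 to S22)."""
    unnecessary = set()
    seen = set(seeds)          # keywords already used as seeds or added
    current = set(seeds)
    for _ in range(rounds):    # S22: repeat a predetermined number of times
        added = set()
        for seed in current:   # S16: users and URLs of the seed keyword
            seed_urls = kw_urls.get(seed, set())
            seed_users = kw_users.get(seed, set())
            candidates = set() # S17: other keywords entered by those users
            for user in seed_users:
                candidates |= user_kws.get(user, set())
            for cand in candidates - seen:
                url_overlap = len(seed_urls & kw_urls.get(cand, set()))     # S18
                user_overlap = len(seed_users & kw_users.get(cand, set()))  # S19
                if url_overlap >= threshold or user_overlap >= threshold:   # S20
                    added.add(cand)
        if not added:
            break
        unnecessary |= added
        seen |= added
        current = added        # newly added words become the next seed list
    return unnecessary


# Hypothetical sample data.
kw_users = {"X": {"u1", "u2"}, "Y": {"u1", "u2"}, "Z": {"u1"}}
kw_urls = {"X": {"a"}, "Y": {"b"}, "Z": {"a"}}
user_kws = {"u1": {"X", "Y", "Z"}, "u2": {"X", "Y"}}
result = expand_unnecessary_words(["X"], kw_urls, kw_users, user_kws)
```

With this sample data, "Y" shares two users with the seed "X" and is added, while "Z" shares only one URL and one user and stays out.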

次に、クラスタＤＢ１１５の作成手順について説明する。 Next, a procedure for creating the cluster DB 115 will be described.

図２１は、クラスタＤＢ作成処理の手順を示すフローチャートの前半である。以下、図２１に示す処理をステップ番号に沿って説明する。 FIG. 21 is the first half of a flowchart showing the procedure of the cluster DB creation process. In the following, the process illustrated in FIG. 21 will be described in order of step number.

［ステップＳ３１］キーワードクラスタ作成部１４０は、検索履歴ＤＢ１１２から所定の期間内の検索履歴を取り出す。たとえば、前回のクラスタＤＢ作成処理が実行されてから現在までの期間の検索履歴を取り出す。 [Step S31] The keyword cluster creation unit 140 retrieves a search history within a predetermined period from the search history DB 112. For example, a search history for a period from the previous execution of the cluster DB creation process to the present is taken out.

［ステップＳ３２］キーワードクラスタ作成部１４０は、抽出した検索履歴中のキーワード、リンクに対してクリーニングを行う。クリーニングでは、たとえば、キーワードの全角英数記号を半角英数記号に変換、英数小文字を大文字に変換する。また、ＵＲＬに関連付けられているセッションＩＤやユーザＩＤを削除する。 [Step S32] The keyword cluster creation unit 140 cleans the keywords and links in the extracted search history. In the cleaning, for example, a full-width alphanumeric symbol of a keyword is converted to a half-width alphanumeric symbol, and an alphanumeric lowercase character is converted to an uppercase character. Also, the session ID and user ID associated with the URL are deleted.

［ステップＳ３３］キーワードクラスタ作成部１４０は、抽出した各検索履歴から所定の情報を抽出し、結果を基本ＤＢ１１３に格納する。具体的には、キーワードクラスタ作成部１４０は、各検索履歴からキーワード、ユーザ、飛び先ＵＲＬ単位で集計（それぞれをキーにして集計）する。そして、キーワードクラスタ作成部１４０は、集計結果を、基本ＤＢ１１３に登録する。 [Step S33] The keyword cluster creation unit 140 extracts predetermined information from each extracted search history, and stores the result in the basic DB 113. Specifically, the keyword cluster creation unit 140 aggregates the keywords, users, and jump destination URLs from each search history (aggregates using each as a key). Then, the keyword cluster creation unit 140 registers the aggregation results in the basic DB 113.

［ステップＳ３４］キーワードクラスタ作成部１４０は、ユニークユーザ数が所定の閾値以下のキーワードや、不要語ＤＢ１１４に含まれるキーワードを基本ＤＢ１１３から削除する。 [Step S34] The keyword cluster creation unit 140 deletes, from the basic DB 113, keywords whose number of unique users is equal to or less than a predetermined threshold and keywords included in the unnecessary word DB 114.

［ステップＳ３５］キーワードクラスタ作成部１４０は、ユニークユーザが多い順にキーワードをソートする。具体的には、キーワードクラスタ作成部１４０は、キーワード別選択ユーザ管理テーブル１１３ｃの各キーワードに関連付けて登録されているユーザの数（ユーザＩＤの重複を排除後）をカウントし、その数の大きい順にキーワードを並べ替える。 [Step S35] The keyword cluster creation unit 140 sorts the keywords in descending order of the number of unique users. Specifically, the keyword cluster creation unit 140 counts the number of users registered in association with each keyword in the keyword-specific selected user management table 113c (after eliminating duplicate user IDs), and rearranges the keywords in descending order of that count.

［ステップＳ３６］キーワードクラスタ作成部１４０は、ソートしたリストの上位からキーワードを取り出し、取り出されたキーワードを対象語とする。そして、キーワードクラスタ作成部１４０は、対象語に対応するユーザとＵＲＬを求める。具体的には、キーワードクラスタ作成部１４０は、キーワード別選択ユーザ管理テーブル１１３ｃを参照して、取り出したキーワードに対応するユーザＩＤを取得する。また、キーワードクラスタ作成部１４０は、キーワード別ＵＲＬ管理テーブル１１３ａを参照して、取り出したキーワードに対応するＵＲＬを求める。 [Step S36] The keyword cluster creation unit 140 extracts keywords from the top of the sorted list, and sets the extracted keywords as target words. Then, the keyword cluster creation unit 140 obtains a user and URL corresponding to the target word. Specifically, the keyword cluster creation unit 140 refers to the keyword-specific selected user management table 113c and acquires a user ID corresponding to the extracted keyword. Further, the keyword cluster creation unit 140 refers to the keyword-specific URL management table 113a to obtain a URL corresponding to the extracted keyword.

［ステップＳ３７］キーワードクラスタ作成部１４０は、ステップＳ３６で求めたユーザとＵＲＬとに対応するキーワード（対応語）を求め、そのキーワードのＵＲＬやユーザを更に求める。具体的には、キーワードクラスタ作成部１４０は、ユーザ別入力キーワード管理テーブル１１３ｄを参照し、ステップＳ３６で求めた各ユーザに対応するキーワードを求める。そして、キーワードクラスタ作成部１４０は、キーワード別選択ユーザ管理テーブル１１３ｃやキーワード別ＵＲＬ管理テーブル１１３ａを参照して、求めたキーワードに対応するＵＲＬやユーザを求める。 [Step S37] The keyword cluster creation unit 140 obtains a keyword (corresponding word) corresponding to the user and URL obtained in Step S36, and further obtains the URL and user of the keyword. Specifically, the keyword cluster creation unit 140 refers to the user-specific input keyword management table 113d and obtains a keyword corresponding to each user obtained in step S36. Then, the keyword cluster creation unit 140 refers to the keyword-specific selected user management table 113c and the keyword-specific URL management table 113a to obtain a URL and a user corresponding to the obtained keyword.

図２２は、クラスタＤＢ作成処理の手順を示すフローチャートの後半である。以下、図２２に示す処理をステップ番号に沿って説明する。 FIG. 22 is the second half of the flowchart showing the procedure of the cluster DB creation process. In the following, the process illustrated in FIG. 22 will be described in order of step number.

［ステップＳ３８］キーワードクラスタ作成部１４０は、ステップＳ３７で求めた対応語と、対象語とのＵＲＬの重なり数やdup関数を計算する。これは、レベル１のクラスタの関係の有無を求める処理である。また、キーワードクラスタ作成部１４０は、ステップＳ３７で求めた対応語と、対象語とのユーザの重なり数やdup関数を計算する。これは、レベル２のクラスタの関係の有無を求める処理である。 [Step S38] The keyword cluster creation unit 140 calculates the number of URL overlaps and dup functions between the corresponding word obtained in Step S37 and the target word. This is a process for determining whether or not there is a level 1 cluster relationship. Further, the keyword cluster creation unit 140 calculates the number of overlapping users and the dup function between the corresponding word obtained in step S37 and the target word. This is a process for determining whether or not there is a level 2 cluster relationship.

［ステップＳ３９］キーワードクラスタ作成部１４０は、ステップＳ３８で求めた重なり数またはdup関数の値の何れかが所定の閾値以上の対応語を、対象語に関連付けてクラスタＤＢ１１５に登録する。この際、対応語のレベル１及びレベル２の重なり数やdup関数の値も合わせて登録される。 [Step S39] The keyword cluster creation unit 140 registers, in the cluster DB 115 and in association with the target word, each corresponding word for which either the number of overlaps or the value of the dup function obtained in step S38 is equal to or greater than a predetermined threshold. At this time, the level 1 and level 2 overlap counts and dup function values of the corresponding word are registered as well.

［ステップＳ４０］キーワードクラスタ作成部１４０は、登録した対応語のＵＲＬやユーザ情報を、対象語の情報として基本ＤＢ１１３に登録する。その後、キーワードクラスタ作成部１４０は、登録した対応語の情報を基本ＤＢ１１３から削除する。この処理をネットワークの縮退と呼ぶ。 [Step S40] The keyword cluster creation unit 140 registers the URL and user information of the registered corresponding word in the basic DB 113 as target word information. Thereafter, the keyword cluster creation unit 140 deletes the registered corresponding word information from the basic DB 113. This process is called network degeneration.

［ステップＳ４１］キーワードクラスタ作成部１４０は、更新された基本ＤＢ１１３を対象に不要語ＤＢ作成処理（図１９、図２０参照）を行い、不要語ＤＢ１１４を更新する。 [Step S41] The keyword cluster creation unit 140 performs unnecessary word DB creation processing (see FIGS. 19 and 20) for the updated basic DB 113, and updates the unnecessary word DB 114.

［ステップＳ４２］キーワードクラスタ作成部１４０は、クラスタＤＢ１１５の更新処理を所定回数実行したか否かを判断する。所定回数実行して入れば所定が終了する。所定回数実行していない場合、処理をステップＳ３１に進める。これにより、同様の処理が所定回数繰り返される。 [Step S42] The keyword cluster creation unit 140 determines whether or not the update processing of the cluster DB 115 has been executed a predetermined number of times. If it has been executed the predetermined number of times, the process ends. If not, the process proceeds to step S31. Thereby, the same processing is repeated a predetermined number of times.

なお、クラスタＤＢ１１５に対応語を追加する場合、それが何回目の処理による処置であるのかを示す情報を追記することもできる。また、対応語の追加日付をクラスタＤＢ１１５に登録することもできる。日付情報を登録することで、時間の経過によるクラスタキーワードの変化を把握することができる。 When a corresponding word is added to the cluster DB 115, information indicating in which iteration of the process it was added can also be recorded. The date on which the corresponding word was added can also be registered in the cluster DB 115. Registering the date information makes it possible to track how the cluster keywords change over time.

このようにして、クラスタＤＢ１１５を作成することができる。この際、ステップＳ４０に示すように、ネットワークの縮退が同時に行われる。ネットワークの縮退では、表記のゆれや概念が特に近いキーワード群をまとめる操作が行われる。特に近い関係を有するキーワード群を纏め１つの代表的なキーワードで表すことによって、キーワード数を減らし、ナビゲーションネットワークの規模を縮退することができる。その結果、ユーザに対してクラスタに含まれるキーワードを提示する際にも、必要最低限のキーワードを効率的に提示することができる。 In this way, the cluster DB 115 can be created. At this time, as shown in step S40, the network is simultaneously degenerated. In the degeneracy of a network, an operation for grouping keyword groups that are particularly similar in notation fluctuation and concept is performed. By expressing a group of keywords that are particularly close to each other and representing them as one representative keyword, the number of keywords can be reduced and the scale of the navigation network can be reduced. As a result, when presenting keywords included in the cluster to the user, it is possible to efficiently present the minimum necessary keywords.
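The preparatory steps S32 to S35 above (cleaning, aggregation, filtering, and sorting) can be sketched with the Python standard library. This is a sketch under stated assumptions: Unicode NFKC normalization stands in for the full-width to half-width conversion and is broader than what the text requires (it also normalizes other compatibility characters); the query-parameter names in `SESSION_PARAMS` are hypothetical, since the text does not say how session IDs and user IDs are attached to URLs; and all function names are made up for illustration.

```python
import unicodedata
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that carry session or user IDs.
SESSION_PARAMS = {"sessionid", "sid", "userid"}


def clean_keyword(keyword):
    """S32: full-width alphanumerics to half-width, lowercase to uppercase."""
    return unicodedata.normalize("NFKC", keyword).upper().strip()


def clean_url(url):
    """S32: drop session-ID and user-ID query parameters from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def rank_keywords(records, unnecessary_words, min_users=1):
    """S33 to S35: count unique users per keyword, filter, sort descending."""
    users_by_keyword = defaultdict(set)
    for user_id, keyword, _url in records:
        users_by_keyword[clean_keyword(keyword)].add(user_id)
    counts = {
        kw: len(users) for kw, users in users_by_keyword.items()
        # S34: drop keywords at or below the threshold and unnecessary words
        if len(users) > min_users and kw not in unnecessary_words
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Because the users are collected in a set per keyword, a user who enters the same keyword several times is counted once, matching the definition of the number of unique users.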

このようにクラスタ化されるべきキーワードの組の抽出、およびそれらのキーワードの縮退処理を交互に繰り返すことで、概念的に関連するキーワードがクラスタから漏れてしまう事態を減らすことができる。 Thus, by alternately extracting the set of keywords to be clustered and the reduction processing of those keywords, it is possible to reduce the situation where conceptually related keywords are leaked from the cluster.

図２３は、クラスタ化とキーワードの縮退とを繰り返した際のクラスタの変化を示す図である。たとえば、「Ａ社」を対象語としたときの最初のレベル１のクラスタ６３に３つの対応語が含まれているものとする。このとき、基本ＤＢ１１３における対象語に対応するユーザの項目に、対応語に登録されているユーザを加える。同様に、対象語に対応するＵＲＬの項目に、対応語に登録されているＵＲＬを加える。 FIG. 23 is a diagram illustrating changes in clusters when clustering and keyword degeneration are repeated. For example, it is assumed that three corresponding words are included in the first level 1 cluster 63 when “Company A” is the target word. At this time, the user registered in the corresponding word is added to the user's item corresponding to the target word in the basic DB 113. Similarly, the URL registered in the corresponding word is added to the URL item corresponding to the target word.

その後、再度レベル１のクラスタ作成処理を実行すると、概念的に拡張された他のキーワードを含むクラスタ６４が生成される。このような処理を、任意の回数繰り返すことで、概念的に近いキーワードを集めたクラスタを形成することができる。 Thereafter, when the level 1 cluster creation process is executed again, a cluster 64 including other keywords expanded conceptually is generated. By repeating such processing an arbitrary number of times, it is possible to form a cluster in which keywords that are conceptually similar are collected.
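The merging described for step S40 and FIG. 23 (network degeneration) amounts to folding the users and URLs of each registered corresponding word into the target word and then removing the corresponding word's own entries. The sketch below assumes the basic DB 113 is held as two dictionaries of sets; the function name, dictionary names, and sample data are hypothetical.

```python
def degenerate(target, registered_words, kw_urls, kw_users):
    """Fold each registered corresponding word into the target word (S40)."""
    for word in registered_words:
        if word == target:
            continue
        # Add the corresponding word's URLs and users to the target word,
        # then delete the corresponding word's own entries.
        kw_urls.setdefault(target, set()).update(kw_urls.pop(word, set()))
        kw_users.setdefault(target, set()).update(kw_users.pop(word, set()))


# Hypothetical sample data.
kw_urls = {"COMPANY A": {"http://a.example/"},
           "A CORP": {"http://a.example/news"}}
kw_users = {"COMPANY A": {"u1"}, "A CORP": {"u2", "u3"}}
degenerate("COMPANY A", ["A CORP"], kw_urls, kw_users)
```

After the merge, the target word carries a larger set of users and URLs, so a second clustering pass can reach keywords that overlapped only with the absorbed corresponding words. This is how repeating the process widens the cluster from 63 to 64.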

以上のように、クラスタＤＢ１１５を作成しておくことにより、ポータルサイトサーバ１００にアクセスしてインターネット上のコンテンツを検索するユーザに対して、コンテンツナビゲーション機能を提供することができる。コンテンツナビゲーションは、ナビゲーション部１６０によって行われる。 As described above, by creating the cluster DB 115, a content navigation function can be provided to a user who accesses the portal site server 100 and searches for content on the Internet. Content navigation is performed by the navigation unit 160.

本実施の形態では、コンテンツナビゲーションのトップページのＵＲＬが予め用意されているものとする。ユーザがクライアント２１１を操作して、そのＵＲＬにアクセスすると、ナビゲーション部１６０によってナビゲーショントップ画面データが作成される。本実施の形態では、ナビゲーション部１６０が基本ＤＢ１１３のキーワード別選択ユーザ管理テーブル１１３ｃを参照し、検索キーワードとして入力したユーザ数の多いキーワードのリストを作成する。そして、そのキーワードのリストを含むナビゲーショントップ画面データが作成される。ナビゲーショントップ画面データはクライアント２１１に転送され、クライアントでナビゲーショントップ画面が表示される。 In the present embodiment, it is assumed that the URL of the top page of content navigation is prepared in advance. When the user operates the client 211 to access the URL, navigation top screen data is created by the navigation unit 160. In the present embodiment, the navigation unit 160 refers to the keyword-specific selected user management table 113c of the basic DB 113 and creates a list of keywords having a large number of users input as search keywords. Then, navigation top screen data including the keyword list is created. The navigation top screen data is transferred to the client 211, and the navigation top screen is displayed on the client.

図２４は、ナビゲーショントップ画面の例を示す図である。ナビゲーショントップ画面７１には、キーワード入力部７１ａ、ジャンプボタン７１ｂ、複数のキーワード候補７１ｃ、キーワード候補７１ｃ毎のユニークユーザ数７１ｄ、および定番ディレクトリリンクオブジェクト７１ｅが表示されている。 FIG. 24 is a diagram illustrating an example of the navigation top screen. The navigation top screen 71 displays a keyword input unit 71a, a jump button 71b, a plurality of keyword candidates 71c, the number of unique users 71d for each keyword candidate 71c, and a standard directory link object 71e.

キーワード入力部７１ａは、任意のキーワードを代表キーワードとして入力するためのテキストボックスである。ジャンプボタン７１ｂは、コンテンツナビゲーションの実行指示を出すためのボタンである。ジャンプボタン７１ｂが押下されると、キーワード入力部７１ａに入力されたキーワードを代表キーワードとしたコンテンツナビゲーション要求がポータルサイトサーバ１００に対して送信される。 The keyword input unit 71a is a text box for inputting an arbitrary keyword as a representative keyword. The jump button 71b is a button for issuing a content navigation execution instruction. When the jump button 71b is pressed, a content navigation request using the keyword input to the keyword input unit 71a as a representative keyword is transmitted to the portal site server 100.

キーワード候補７１ｃは、代表キーワードとして選択するキーワードの候補である。この例では、過去の所定期間内にユーザによって検索キーワードとして入力された回数の多いキーワードがキーワード候補７１ｃとして表示されている。何れかのキーワード候補７１ｃがユーザによって選択されると、そのキーワードを代表キーワードとしたコンテンツナビゲーション要求がポータルサイトサーバ１００に対して送信される。各キーワード候補７１ｃのユニークユーザ数は、そのキーワードを検索キーワードとして入力したユーザの数である。 The keyword candidate 71c is a keyword candidate to be selected as a representative keyword. In this example, keywords that are frequently input as search keywords by a user within a predetermined period in the past are displayed as keyword candidates 71c. When any one of the keyword candidates 71c is selected by the user, a content navigation request using the keyword as a representative keyword is transmitted to the portal site server 100. The number of unique users of each keyword candidate 71c is the number of users who input the keyword as a search keyword.

また、定番ディレクトリリンクオブジェクト７１ｅは、定番ディレクトリ画面に遷移させるためのリンクが定義されたオブジェクトである。定番ディレクトリリンクオブジェクト７１ｅが選択されると、ポータルサイトサーバ１００に対して、定番ディレクトリ画面の表示要求がだされる。 The standard directory link object 71e is an object in which a link for transition to the standard directory screen is defined. When the standard directory link object 71e is selected, a request to display the standard directory screen is issued to the portal site server 100.

ここで、キーワード候補７１ｃの１つがユーザによって選択されたものとする。すると、コンテンツナビゲーション要求を受け取ったポータルサイトサーバ１００において、ナビゲーション部１６０がコンテンツナビゲーションの処理を行い、処理結果をクライアント２１１に対して送信する。具体的には、ナビゲーション部１６０は、代表キーワードとして指定されたキーワードが対象語として登録されているレコードをクラスタＤＢ１１５から検索する。そして、ナビゲーション部１６０は、検出されたレコードの対応語から、レベル１のdupの値（あるいは重なり数）が大きい順に所定数のキーワードを抽出し、同様にレベル２のdupの値（あるいは重なり数）が大きい順に所定数のキーワードを抽出する。ナビゲーション部１６０は、これらのキーワードを含むナビゲーション画面データを作成する。 Here, it is assumed that one of the keyword candidates 71c is selected by the user. Then, in the portal site server 100 that has received the content navigation request, the navigation unit 160 performs content navigation processing and transmits the processing result to the client 211. Specifically, the navigation unit 160 searches the cluster DB 115 for a record in which the keyword designated as the representative keyword is registered as a target word. Then, from the corresponding words of the detected record, the navigation unit 160 extracts a predetermined number of keywords in descending order of the level 1 dup value (or number of overlaps), and similarly extracts a predetermined number of keywords in descending order of the level 2 dup value (or number of overlaps). The navigation unit 160 creates navigation screen data including these keywords.

また、ナビゲーション部１６０は、基本ＤＢ１１３のキーワード別ＵＲＬ管理テーブル１１３ａを参照し、代表キーワードとして指定されたキーワードによる検索結果から選択されたことのあるＵＲＬを抽出する。さらに、ナビゲーション部１６０は、ＵＲＬ別選択ユーザ管理テーブル１１３ｅを参照し、抽出した各ＵＲＬを選択したユーザの数をカウントし、ＵＲＬをそのユーザ数によってソートする。そして、ナビゲーション部１６０は、ソートされたＵＲＬ、およびそのＵＲＬの関連情報をナビゲーション画面データに追加する。 Further, the navigation unit 160 refers to the keyword-specific URL management table 113a of the basic DB 113, and extracts a URL that has been selected from a search result based on the keyword specified as the representative keyword. Furthermore, the navigation unit 160 refers to the URL-specific selected user management table 113e, counts the number of users who have selected each extracted URL, and sorts the URLs according to the number of users. Then, the navigation unit 160 adds the sorted URL and related information of the URL to the navigation screen data.
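The two selections made by the navigation unit 160 can be sketched as simple sorts. This is an illustrative sketch: the record layout (a list of dictionaries with `word`, `level1`, and `level2` fields), the function names, and the sample values are assumptions, and the text does not say whether corresponding words with a low score at one level are hidden, so no extra filtering is applied.

```python
def related_keywords(corresponding_words, level, limit=3):
    """Top `limit` corresponding words by the dup value of the given level."""
    ranked = sorted(corresponding_words, key=lambda e: e[level], reverse=True)
    return [e["word"] for e in ranked[:limit]]


def rank_urls(urls, url_users):
    """Sort URLs by the number of users who selected them, largest first."""
    return sorted(urls, key=lambda u: (-len(url_users.get(u, set())), u))


# Hypothetical cluster DB record for one target word.
record = [
    {"word": "A CORP", "level1": 0.9, "level2": 0.2},
    {"word": "A PRODUCTS", "level1": 0.4, "level2": 0.7},
    {"word": "A STOCK", "level1": 0.6, "level2": 0.5},
]
level1_words = related_keywords(record, "level1", limit=2)
level2_words = related_keywords(record, "level2", limit=2)
```

The same record yields different keyword lists for the two levels, which is why the navigation screen can show level 1 and level 2 cluster keywords side by side.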

生成されたナビゲーション画面はクライアント２１１に対して送信され、クライアント２１１に表示される。 The generated navigation screen is transmitted to the client 211 and displayed on the client 211.

図２５は、ナビゲーション画面の例を示す図である。ナビゲーション画面７２には、代表キーワード７２ａ、レベル１クラスタ内キーワード７２ｂ、レベル２クラスタ内キーワード７２ｃ、及びコンテンツ情報７２ｄが表示されている。 FIG. 25 is a diagram illustrating an example of a navigation screen. On the navigation screen 72, a representative keyword 72a, a level 1 cluster keyword 72b, a level 2 cluster keyword 72c, and content information 72d are displayed.

代表キーワード７２ａの横には、その代表キーワードを検索キーワードとして入力したユニークユーザ数が表示されている。レベル１クラスタ内キーワード７２ｂには、代表キーワードとの間でレベル１のクラスタの関係を有するキーワードが表示されている。レベル２クラスタ内キーワード７２ｃには、代表キーワードとの間でレベル２のクラスタの関係を有するキーワードが表示されている。 Next to the representative keyword 72a, the number of unique users who input the representative keyword as a search keyword is displayed. In the level 1 cluster keyword 72b, a keyword having a level 1 cluster relationship with the representative keyword is displayed. In the level 2 cluster keyword 72c, a keyword having a level 2 cluster relationship with the representative keyword is displayed.

コンテンツ情報７２ｄには、代表キーワードを検索キーワードとしたときの検索結果から選択された回数の多いコンテンツに関する情報が表示される。コンテンツ情報７２ｄには、コンテンツアクセス件数７２ｅ、コンテンツタイトル７２ｆ、対応語７２ｇが含まれる。コンテンツアクセス件数７２ｅは、表示されているコンテンツが検索結果として表示されたときにアクセス対象として選択された回数である。コンテンツタイトル７２ｆは、コンテンツのタイトルである。対応語７２ｇは、そのコンテンツを検索結果として検出することができるキーワードである。 The content information 72d displays information related to content that is frequently selected from the search results when the representative keyword is used as the search keyword. The content information 72d includes a content access number 72e, a content title 72f, and a corresponding word 72g. The content access number 72e is the number of times that the displayed content is selected as an access target when the displayed content is displayed as a search result. The content title 72f is a content title. The corresponding word 72g is a keyword that can detect the content as a search result.

なお、ナビゲーション画面において、コンテンツのタイプを識別できるように表示することもできる。その場合、ナビゲーション部１６０は、ナビゲーション画面データの作成時に、ＵＲＬ別キーワード管理テーブル１１３ｂの検索タイプの欄を参照し、検索タイプ毎に異なる表示属性とする。たとえば、検索タイプ毎に異なる表示色とすることができる。 Note that the navigation screen can be displayed so that the type of content can be identified. In this case, the navigation unit 160 refers to the search type column of the URL-based keyword management table 113b when creating the navigation screen data, and sets different display attributes for each search type. For example, the display color can be different for each search type.

図２６は、タイプ識別可能なナビゲーション画面の例を示す図である。このナビゲーション画面７３では、サービスコンテンツのコンテンツタイトル７３ａや、広告コンテンツのアクセス件数７３ｂが、他のコンテンツと異なる色で表示されている（図２６中では、破線によってハイライト表示部を示している）。サービスコンテンツとは、電子商取引などのサービス提供を行うＷｅｂサイトに設けられたコンテンツである。広告コンテンツとは、企業の商品宣伝等の広告のコンテンツである。コンテンツのタイプを異なる色で表示することにより、ユーザは目的のコンテンツを容易に識別できる。 FIG. 26 is a diagram illustrating an example of a navigation screen in which content types can be identified. In the navigation screen 73, the content title 73a of service content and the access count 73b of advertisement content are displayed in a color different from that of the other content (in FIG. 26, the highlighted portions are indicated by broken lines). Service content is content provided on a website that offers services such as electronic commerce. Advertising content is content for advertisements such as corporate product promotion. By displaying each content type in a different color, the user can easily identify the target content.

以上のように、検索履歴に基づいて、ユーザが検索結果からどのコンテンツを選択したのかをデータベースで管理し、そのデータベースに基づいてキーワードのクラスタ化を行った。そして、コンテンツナビゲーションにおいて指定された代表キーワードとクラスタ化された他のキーワードをユーザに提示するようにした。その結果、実際のユーザの嗜好等を適宜反映させて、ユーザが指定したキーワードに関連するキーワードを提示することができる。 As described above, based on the search history, which content the user has selected from the search results is managed in the database, and keywords are clustered based on the database. Then, the representative keyword designated in the content navigation and other keywords clustered are presented to the user. As a result, the keyword related to the keyword designated by the user can be presented by appropriately reflecting the actual user's preference and the like.

また、ユーザが任意に指定したコンテンツを強調（たとえば、ハイライト）表示することもできる。 Content arbitrarily designated by the user can also be emphasized (for example, highlighted).

図２７は、任意のコンテンツを強調表示したナビゲーション画面の例を示す図である。図２７に示すナビゲーション画面７４には、ハイライト指定部７４ａ，７４ｂが設けられている。ハイライト指定部７４ａ，７４ｂでは、強調表示すべきコンテンツを示す文字列の入力部がある。この入力部に入力された文字列をＵＲＬやタイトルに含むコンテンツがハイライトによって強調表示される。 FIG. 27 is a diagram illustrating an example of a navigation screen in which arbitrary content is highlighted. The navigation screen 74 shown in FIG. 27 is provided with highlight designation parts 74a and 74b. Each of the highlight designation parts 74a and 74b has an input field for a character string that identifies the content to be emphasized. Content whose URL or title contains the character string entered in this field is highlighted.
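The matching rule described for the highlight designation parts 74a and 74b is a substring test against the URL and the title. A minimal sketch, assuming each content item is a dictionary with `url` and `title` fields (a hypothetical layout) and that an empty input highlights nothing:

```python
def should_highlight(content, text):
    """True if the entered string occurs in the content's URL or title."""
    return bool(text) and (text in content["url"] or text in content["title"])


# Hypothetical content items.
contents = [
    {"url": "http://shop.example/cars", "title": "Car catalogue"},
    {"url": "http://news.example/today", "title": "Daily news"},
]
highlighted = [c["title"] for c in contents if should_highlight(c, "shop.example")]
```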

また、ナビゲーショントップ画面７１の定番ディレクトリリンクオブジェクト７１ｅが選択されると、ナビゲーション部１６０によって定番ディレクトリ画面データが作成され、クライアント２１１に送信される。定番ディレクトリ画面データには、定常的に頻繁にアクセスされるコンテンツが含まれる。 When the standard directory link object 71 e on the navigation top screen 71 is selected, the standard directory screen data is created by the navigation unit 160 and transmitted to the client 211. The standard directory screen data includes content that is regularly and frequently accessed.

図２８は、定番ディレクトリ画面の例を示す図である。定番ディレクトリ画面７５には、常時アクセス数の多いコンテンツが表示されている。各コンテンツのタイトルの横には、アクセスしたユニークユーザ数が示されている。 FIG. 28 is a diagram illustrating an example of a standard directory screen. The standard directory screen 75 displays content that is constantly accessed frequently. Next to the title of each content, the number of accessed unique users is shown.

このように、本発明の実施の形態に示すコンテンツナビゲーションを行えば、ユーザは代表キーワードを選択することで、その時点での流行のキーワードを取得することができる。そして、ユーザは、流行のコンテンツを広くブラウジングすることができる。また、キーワードの表記などを気にせず、関連する全てのコンテンツにアクセスすることができる。 As described above, if the content navigation shown in the embodiment of the present invention is performed, the user can acquire the trendy keyword at that time by selecting the representative keyword. Then, the user can browse popular content widely. Also, it is possible to access all related contents without worrying about keyword notation.

たとえば、ある期間（時間、日、週、月など）で集計したキーワードを、キーワードの選択したユニークユーザ数、注目度（平均のユーザ数からの伸び）、キーワードのヒット件数、クラスタの大きさなどの指標を用いてソートし、上位のキーワードをディレクトリ検索のトップのように表示する。ユーザはこれを見ることで現在どのようなことが流行っているのか概観することができる。 For example, keywords aggregated over a certain period (hour, day, week, month, etc.) are sorted using indices such as the number of unique users who selected the keyword, the degree of attention (growth relative to the average number of users), the number of hits for the keyword, and the size of the cluster, and the top-ranked keywords are displayed in the manner of the top page of a directory search. By looking at this, the user can get an overview of what is currently popular.

なお、このようなコンテンツナビゲーションは、ディレクトリ検索とは違ったユーザナビゲーションである。すなわち、コンテンツナビゲーションでは、ネットワークを利用するユーザの嗜好の変化やコンテンツの変化が監視され、ユーザの興味の推移に沿った適当なナビゲーションを行うことができる。 Note that such content navigation is user navigation different from directory search. That is, in content navigation, changes in preferences of users who use the network and changes in contents can be monitored, and appropriate navigation can be performed in accordance with changes in user interests.

また、電子商取引を行う事業者はその結果を自社サイトのＳＥＯ（SearchEngineOptimization：自社サイトがユーザによって的確に検索されるようにサイトのキーワードや構成を最適化する手法）やＳＥＭ（SearchEngineMarketing：検索キーワード広告などを利用して自社サイトの利益を最大化する手法）に利用することもできる。 In addition, a business operator engaged in electronic commerce can use the results for its own site's SEO (Search Engine Optimization: a technique for optimizing a site's keywords and structure so that users can find the site accurately) and SEM (Search Engine Marketing: a technique for maximizing the site's profit by using search keyword advertising and the like).

ところで、本実施の形態は、以下のような応用が可能である。 By the way, this embodiment can be applied as follows.

［コンテンツクラスタの作成］ [Create content cluster]
上記の例では、キーワード間のクラスタを作成したが、クラスタ化されたキーワードに関連するコンテンツ同士をクラスタ化することもできる。具体的には、クラスタＤＢ１１５と基本ＤＢ１１３を利用してコンテンツクラスタを作成することができる。クラスタを構成するキーワードのコンテンツ群はクラスタと考えられる。 In the above example, a cluster between keywords is created. However, contents related to a clustered keyword can be clustered. Specifically, a content cluster can be created using the cluster DB 115 and the basic DB 113. A group of keyword contents constituting a cluster is considered a cluster.

FIG. 29 is a diagram showing an example of a content cluster. As shown in the figure, for each keyword included in a keyword cluster 81 (level 1, level 2, or both), the URLs associated with that keyword in the basic DB 113 (that is, the URLs selected by users from the search results for that keyword) are extracted. The extracted URLs then form a content cluster 82.

By forming the content cluster 82 in this way, when a user selects a content item, a list of the other contents belonging to the same cluster can be displayed on the screen. This lets the user easily access other contents whose subject matter is similar to the selected one.
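The construction of a content cluster from a keyword cluster can be sketched as below. The sketch assumes that the basic DB can be viewed as a mapping from each keyword to the set of URLs users selected for it; the function names and the dictionary layout are hypothetical and are not taken from the embodiment.

```python
def build_content_cluster(keyword_cluster, basic_db):
    """Collect every URL associated with any keyword of the keyword cluster.

    keyword_cluster: iterable of keywords (the cluster between keywords)
    basic_db: dict mapping keyword -> set of selected URLs (assumed layout)
    """
    urls = set()
    for keyword in keyword_cluster:
        urls |= basic_db.get(keyword, set())
    return urls


def related_contents(selected_url, content_cluster):
    """List the other contents in the same cluster as the selected one."""
    if selected_url not in content_cluster:
        return []
    return sorted(content_cluster - {selected_url})
```

When a user selects a URL, `related_contents` gives the list that would be shown alongside it.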

[Expanding a cluster by combining clusters]
The cluster for a representative keyword can be expanded step by step on the basis of the other keywords it contains. For example, suppose that B, C, and D have been extracted as the cluster of representative keyword A. This relationship is written (A: B, C, D). If the cluster for representative keyword B is (B: C, E), the cluster of A can be expanded to (A: (B: C, E), D).

FIG. 30 is a diagram showing cluster expansion processing. As shown in FIG. 30, cluster 91 and cluster 92 are combined to create cluster 93. In this example, the target word of cluster 92, "C Automobile", is one of the corresponding words of cluster 91. The corresponding words of cluster 92 are therefore associated as a substructure under the corresponding word "C Automobile" of cluster 91. Keywords that are already set as corresponding words of cluster 91 are excluded from this association with the substructure of "C Automobile".

Which of the two clusters becomes the lower one can be determined, for example, from the number of search users or the number of search result hits. In the example of FIG. 30, cluster 91 has a larger search count than cluster 92, so cluster 93 is generated by placing cluster 92 under cluster 91.
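The expansion can be sketched as follows, under stated assumptions. Each cluster is represented as a dictionary with a `target` word, a list of corresponding `words`, and a `searches` count; these field names are hypothetical. The sketch follows the exclusion rule described for FIG. 30, so a keyword that is already a corresponding word of the upper cluster is not repeated in the substructure.

```python
def order_by_search_count(c1, c2):
    """Return (upper, lower): the cluster with more searches becomes the upper one."""
    return (c1, c2) if c1["searches"] >= c2["searches"] else (c2, c1)


def expand_cluster(upper, lower):
    """Attach the lower cluster beneath its target word inside the upper cluster."""
    if lower["target"] not in upper["words"]:
        raise ValueError("lower target must be a corresponding word of upper")
    # Keywords already present in the upper cluster are excluded from the substructure.
    already = set(upper["words"]) | {upper["target"]}
    children = [w for w in lower["words"] if w not in already]
    return {
        "target": upper["target"],
        "words": {
            w: (children if w == lower["target"] else [])
            for w in upper["words"]
        },
    }
```

With (A: B, C, D) and (B: C, E) as inputs, this sketch yields B with the single child E, because C is already a corresponding word of A.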

By expanding clusters in this way, a wider range of related keywords can be presented when content navigation is performed for the representative keyword selected by the user.

[Extracting clusters of interest]
Consider how the number of hits and the number of unique users for the search keywords in the basic DB 113 change over time, and the degree of user attention at that time. For example, if the number of users is increasing, the degree of attention is considered high. Likewise, if the number of hits is increasing, the degree of attention is considered high.

FIG. 31 is a diagram showing the degree of attention according to changes in the number of users and the number of hits. In this figure, the degree of attention is rated on a five-level scale; the larger the value, the higher the degree of attention.
The degree of attention for each keyword is set in advance in this way. Then, when keywords belonging to the same cluster as the representative keyword are displayed, keywords with a high degree of attention are displayed preferentially. As a result, among the keywords related to the representative keyword, the user can easily identify those that indicate matters recently attracting the attention of many users.
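The preferential display can be sketched as below. The actual five-level table of FIG. 31 is not reproduced in this text, so the scoring rule here is purely hypothetical: it starts from the middle level 3 and moves one level up or down for each of the two trends. Only the ordering step, which shows higher-attention keywords first, reflects the description directly.

```python
def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def attention_level(user_delta, hit_delta):
    """Hypothetical five-level score (1 to 5) from the two trends.

    user_delta: change in the number of unique users
    hit_delta:  change in the number of hits
    """
    return 3 + sign(user_delta) + sign(hit_delta)


def order_related(keywords, levels):
    """Show keywords with a higher preset attention level first."""
    return sorted(keywords, key=lambda k: levels.get(k, 1), reverse=True)
```

Because `sorted` is stable, keywords with the same level keep their original relative order.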

[Cluster creation method that treats search terms with similar search patterns as the same cluster]
When the hourly number of searches for a search keyword is plotted with the number of searches on the vertical axis and time on the horizontal axis, keywords whose counts rise or fall sharply at the same time are placed in the same cluster. Instead of the number of searches, changes in a quantity such as the keyword's attention level may be examined. The attention level of a keyword is defined, for example, by the following expression.

This is an evaluation formula that gives the attention level AT_t(w_i), where UU_t(w_i) is the number of users of the search word w_i at time t and C_t is the corresponding correction value.
FIG. 32 is a diagram comparing how the number of searches and the attention level change over one day. This example shows a graph 94 for "post office" and a graph 95 for "New Year's card". In graphs 94 and 95, the vertical axis is a numerical value (number of users or attention level) and the horizontal axis is time. Each graph plots, in one-hour units, the change in the number of search users and the change in the attention level for its keyword. The change in the number of search users is shown by polylines 94a and 95a, and the change in the attention level is shown by polylines 94b and 95b.

This example is a record from the day on which the winning numbers of the New Year's lottery postcards were drawn. As FIG. 32 shows, for the two keywords the times at which the number of search users and the attention level begin to rise, and the times at which they peak, are almost synchronized. Keywords whose number of search users, attention level, or both change in the same way are placed in the same cluster. In this way, effective clusters can be created from nothing more than the information on the search keywords that were input.
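One way to test whether two keywords "change in the same way" is sketched below. The text does not specify a criterion, so the rule used here is an assumption: an hour counts as a sharp rise or fall when the value changes by at least `factor` relative to the previous hour, and two series match when they share at least `min_shared` such events. The function names and both parameters are hypothetical.

```python
def sharp_changes(series, factor=2.0):
    """Return the set of (hour index, direction) events with a sharp change."""
    events = set()
    for t in range(1, len(series)):
        prev, cur = series[t - 1], series[t]
        if cur > 0 and cur >= prev * factor:
            events.add((t, "up"))      # count at least multiplied by `factor`
        elif prev > 0 and prev >= cur * factor:
            events.add((t, "down"))    # count at least divided by `factor`
    return events


def same_pattern(a, b, factor=2.0, min_shared=1):
    """True if the two hourly series share enough sharp-change events."""
    shared = sharp_changes(a, factor) & sharp_changes(b, factor)
    return len(shared) >= min_shared
```

The same functions apply unchanged to hourly attention-level series instead of user counts. A correlation coefficient over the whole series would be another reasonable choice; the event-based rule is used here only because it mirrors the "sharp rise or fall at the same time" wording.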

[Implementation of content navigation by a program]
The processing functions described above can be implemented by a server computer in a client-server system. In that case, a server program is provided that describes the processing of the functions the portal site server 100 should have. The server computer executes the server program in response to requests from client computers. The processing functions are thereby implemented on the server computer, and the processing results are provided to the client computers.

The server program describing the processing can be recorded on a recording medium readable by the server computer. Recording media readable by the server computer include magnetic recording devices, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording devices include hard disk drives (HDD), flexible disks (FD), and magnetic tapes. Optical discs include DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the server program, portable recording media such as DVDs and CD-ROMs on which the server program is recorded are sold, for example.
The server computer that executes the server program stores, for example, the server program recorded on the portable recording medium in its own storage device. The server computer then reads the server program from its own storage device and executes processing according to the server program. The server computer can also read the server program directly from the portable recording medium and execute processing according to it.

(Supplementary Note 1) A content navigation program for supporting content search, the program causing a computer to function as:
storing means for storing, each time a user performs a search based on a keyword and selects any content from the search results, the keyword used for the search and the identification information of the selected content in storage means in association with each other;
grouping means for determining the relevance between the keywords stored in the storage means on the basis of the correspondence between the keywords and the selected contents, and grouping a plurality of related keywords; and
related keyword output means for outputting, when an arbitrary representative keyword is selected, the other keywords belonging to the same group as the selected representative keyword.

(Supplementary Note 2) The content navigation program, wherein the grouping means groups keywords that have associated contents in common.
(Supplementary Note 3) The content navigation program according to Supplementary Note 2, wherein, when two keywords are compared, the grouping means groups the two keywords if the number of contents associated with both of them is equal to or greater than a predetermined value.

(Supplementary Note 4) The content navigation program according to Supplementary Note 2, wherein, when two keywords are compared, the grouping means groups the two keywords if, among the contents associated with at least one of the two keywords, the proportion of contents associated with both of them is equal to or greater than a predetermined value.
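The two criteria of Supplementary Notes 3 and 4 can be sketched side by side. The sketch assumes the stored associations are available as a dictionary from keyword to the set of content identifiers selected for it; the function names, the dictionary layout, and the threshold parameters are hypothetical. The ratio of Supplementary Note 4 is computed as the number of shared contents divided by the number of contents associated with at least one of the two keywords.

```python
def shared_count(db, k1, k2):
    """Number of contents associated with both keywords (Supplementary Note 3)."""
    return len(db[k1] & db[k2])


def shared_ratio(db, k1, k2):
    """Shared contents as a proportion of the contents associated with
    at least one of the two keywords (Supplementary Note 4)."""
    union = db[k1] | db[k2]
    return len(db[k1] & db[k2]) / len(union) if union else 0.0


def should_group(db, k1, k2, min_count=None, min_ratio=None):
    """Apply whichever thresholds are given; all given thresholds must be met."""
    if min_count is not None and shared_count(db, k1, k2) < min_count:
        return False
    if min_ratio is not None and shared_ratio(db, k1, k2) < min_ratio:
        return False
    return True
```

The count criterion favors frequently searched keywords, while the ratio criterion normalizes for how many contents each keyword has. The same functions work for Supplementary Notes 6 and 7 if the sets hold user identifiers instead of content identifiers.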

(Supplementary Note 5) The content navigation program, wherein the storing means further stores the user identification information of the user who input the keyword in the storage means in association with the keyword, and
the grouping means groups keywords that have associated user identification information in common.

(Supplementary Note 6) The content navigation program according to Supplementary Note 5, wherein, when two keywords are compared, the grouping means groups the two keywords if the number of pieces of user identification information associated with both of them is equal to or greater than a predetermined value.

(Supplementary Note 7) The content navigation program according to Supplementary Note 5, wherein, when two keywords are compared, the grouping means groups the two keywords if, among the user identification information associated with at least one of the two keywords, the proportion of user identification information associated with both of them is equal to or greater than a predetermined value.

(Supplementary Note 8) The content navigation program according to Supplementary Note 1, wherein the grouping means takes one keyword contained in the storage means as a target word and other keywords related to the target word as corresponding words, groups the target word and the corresponding words, and further includes in the same group other keywords related to the corresponding words.

(Supplementary Note 9) The content navigation program according to Supplementary Note 1, wherein the program further causes the computer to function as unnecessary word storing means for storing a keyword selected by a user as unnecessary in unnecessary word storage means as an unnecessary word, and
the related keyword output means outputs the keywords other than the unnecessary words stored in the unnecessary word storage means.

(Supplementary Note 10) The content navigation program according to Supplementary Note 9, wherein the unnecessary word storing means determines, on the basis of the correspondence between the keywords and the selected contents, the relevance between the unnecessary word selected by the user and the other keywords, and stores other keywords related to the unnecessary word in the unnecessary word storage means as new unnecessary words.

(Supplementary Note 11) The content navigation program according to Supplementary Note 1, wherein the grouping means takes one keyword contained in the storage means as a target word and other keywords related to the target word as corresponding words, registers the contents associated with each of the corresponding words in the storage means in association with the target word and deletes the corresponding words from the storage means, and thereafter determines the relevance between the keywords stored in the storage means and groups a plurality of related keywords.
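The merge-then-regroup procedure of Supplementary Note 11 can be sketched as follows. The dictionary layout (keyword to set of content identifiers), the use of the shared-content proportion as the relevance test, and the threshold values are assumptions made for the example, not details fixed by the text.

```python
def ratio(a, b):
    """Shared items as a proportion of the items in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_to(db, target, min_ratio):
    """Keywords other than the target whose content overlap meets the threshold."""
    return [k for k in db if k != target and ratio(db[target], db[k]) >= min_ratio]


def degenerate(db, target, min_ratio):
    """Fold each corresponding word's contents into the target word and
    delete the corresponding word from the store. Returns the deleted words."""
    corresponding = related_to(db, target, min_ratio)  # decided before merging
    for k in corresponding:
        db[target] |= db[k]
    for k in corresponding:
        del db[k]
    return corresponding
```

After `degenerate`, calling `related_to` again on the same target can surface keywords that were related only indirectly, through a deleted corresponding word, because the target word now carries that word's contents.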

(Supplementary Note 12) The content navigation program according to Supplementary Note 1, wherein, when two groups generated by grouping have a keyword in common, the grouping means generates a new group in which one of the groups is connected beneath the other in a tree structure via the common keyword.

(Supplementary Note 13) The content navigation program according to Supplementary Note 1, wherein the storing means further stores the number of hits of the search by the keyword in the storage means in association with the keyword, and
the grouping means groups a plurality of keywords whose numbers of hits change over time in a similar way.

(Supplementary Note 14) The content navigation program according to Supplementary Note 1, wherein the storing means further stores the user identification information of the user who input the keyword in the storage means in association with the keyword, and
the grouping means groups a plurality of keywords whose numbers of inputting users change over time in a similar way.

(Supplementary Note 15) A content navigation method for supporting content search by a computer, wherein:
storing means stores, each time a user performs a search based on a keyword and selects any content from the search results, the keyword used for the search and the identification information of the selected content in storage means in association with each other;
grouping means determines the relevance between the keywords stored in the storage means on the basis of the correspondence between the keywords and the selected contents, and groups a plurality of related keywords; and
related keyword output means outputs, when an arbitrary representative keyword is selected, the other keywords belonging to the same group as the selected representative keyword.

(Supplementary Note 16) A content navigation apparatus for supporting content search, comprising:
storing means for storing, each time a user performs a search based on a keyword and selects any content from the search results, the keyword used for the search and the identification information of the selected content in storage means in association with each other;
grouping means for determining the relevance between the keywords stored in the storage means on the basis of the correspondence between the keywords and the selected contents, and grouping a plurality of related keywords; and
related keyword output means for outputting, when an arbitrary representative keyword is selected, the other keywords belonging to the same group as the selected representative keyword.

(Supplementary Note 17) A computer-readable recording medium on which is recorded a content navigation program for supporting content search, the program causing a computer to function as:
storing means for storing, each time a user performs a search based on a keyword and selects any content from the search results, the keyword used for the search and the identification information of the selected content in storage means in association with each other;
grouping means for determining the relevance between the keywords stored in the storage means on the basis of the correspondence between the keywords and the selected contents, and grouping a plurality of related keywords; and
related keyword output means for outputting, when an arbitrary representative keyword is selected, the other keywords belonging to the same group as the selected representative keyword.

FIG. 1 is a conceptual diagram of the invention applied to the embodiment.
FIG. 2 is a diagram showing an example of a system configuration for implementing the embodiment of the present invention.
FIG. 3 is a diagram showing an example of the hardware configuration of the portal site server used in the embodiment of the present invention.
FIG. 4 is a block diagram showing the processing functions of the portal site server.
FIG. 5 is a diagram showing search history accumulation processing.
FIG. 6 is a diagram showing an example of the data structure of the search history DB.
FIG. 7 is a diagram showing level 1 clustering.
FIG. 8 is a diagram showing the level 1 overlap frequency between two keywords.
FIG. 9 is a diagram showing level 2 clustering.
FIG. 10 is a diagram showing the level 2 overlap frequency between two keywords.
FIG. 11 is a diagram showing an example of the cluster to which the keyword "computer" belongs.
FIG. 12 is a diagram showing the data structures of the directory scheme and the content navigation scheme. FIG. 12(A) shows the data structure of the directory scheme, and FIG. 12(B) shows the data structure of the content navigation scheme.
FIG. 13 is a diagram showing an example of the data structure of the basic DB.
FIG. 14 is a diagram showing an example of the data structure of the unnecessary word DB.
FIG. 15 is a diagram showing an example of the data structure of the cluster DB.
FIG. 16 is a flowchart showing the procedure for creating the basic DB.
FIG. 17 is a diagram showing an example of a hash array in which the URLs corresponding to keywords are defined.
FIG. 18 is a diagram showing an example of a hash array in which the users corresponding to URLs are defined.
FIG. 19 is the first half of a flowchart showing the procedure for creating the unnecessary word DB.
FIG. 20 is the second half of the flowchart showing the procedure for creating the unnecessary word DB.
FIG. 21 is the first half of a flowchart showing the procedure of the cluster DB creation processing.
FIG. 22 is the second half of the flowchart showing the procedure of the cluster DB creation processing.
FIG. 23 is a diagram showing how clusters change when clustering and keyword degeneration are repeated.
FIG. 24 is a diagram showing an example of the navigation top screen.
FIG. 25 is a diagram showing an example of a navigation screen.
FIG. 26 is a diagram showing an example of a navigation screen in which types can be identified.
FIG. 27 is a diagram showing an example of a navigation screen in which arbitrary content is highlighted.
FIG. 28 is a diagram showing an example of a standard directory screen.
FIG. 29 is a diagram showing an example of a content cluster.
FIG. 30 is a diagram showing cluster expansion processing.
FIG. 31 is a diagram showing the degree of attention according to changes in the number of users and the number of hits.
FIG. 32 is a diagram comparing how the number of searches and the attention level change over one day.

Explanation of symbols

1 Content navigation apparatus
1a Storing means
1b Storage means
1c Grouping means
1d Related keyword output means
2, 5 Client
3 Search server
4 Content server
6a Keyword
6b Search result
7a Content acquisition request
7b Content
8a Representative keyword
8b Related keyword

Claims

1. A content navigation program for supporting content search, the program causing a computer to function as:
storing means for storing, each time a user performs a search based on a keyword and selects any content from the search results, the identification information of the selected content and the identification information of the user who input the keyword in storage means in association with the keyword used for the search;
grouping means for: referring to the storage means and selecting, as a target word, a keyword for which the number of pieces of associated user identification information after deduplication is large; for each non-selected keyword other than the target word stored in the storage means, dividing the number of pieces of content identification information associated in common with the target word and the non-selected keyword by the sum of the content identification information associated with each of the target word and the non-selected keyword, and taking as a corresponding word a non-selected keyword for which the division result is equal to or greater than a predetermined value; grouping the target word and the corresponding word into one group; associating the content identification information associated with the corresponding word with the target word; and, for each ungrouped keyword other than the target word and the corresponding word stored in the storage means, dividing the number of pieces of content identification information associated in common with the target word and the ungrouped keyword by the sum of the content identification information associated with each of the target word and the ungrouped keyword, and including in the one group an ungrouped keyword for which the division result is equal to or greater than a predetermined value; and
related keyword output means for outputting, when an arbitrary representative keyword is input by a user's operation input, the keywords other than the representative keyword that belong to the same group as the input representative keyword.

2. The content navigation program according to claim 1, wherein the grouping means deletes the corresponding word from the storage means.

3. A content navigation method for supporting content search by a computer, wherein the computer:
stores, each time a user performs a search based on a keyword and selects any content from the search results, the identification information of the selected content and the identification information of the user who input the keyword in storage means in association with the keyword used for the search;
refers to the storage means and selects, as a target word, a keyword for which the number of pieces of associated user identification information after deduplication is large; for each non-selected keyword other than the target word stored in the storage means, divides the number of pieces of content identification information associated in common with the target word and the non-selected keyword by the sum of the content identification information associated with each of the target word and the non-selected keyword, and takes as a corresponding word a non-selected keyword for which the division result is equal to or greater than a predetermined value; groups the target word and the corresponding word into one group; associates the content identification information associated with the corresponding word with the target word; and, for each ungrouped keyword other than the target word and the corresponding word stored in the storage means, divides the number of pieces of content identification information associated in common with the target word and the ungrouped keyword by the sum of the content identification information associated with each of the target word and the ungrouped keyword, and includes in the one group an ungrouped keyword for which the division result is equal to or greater than a predetermined value; and
outputs, when an arbitrary representative keyword is input by a user's operation input, the keywords other than the representative keyword that belong to the same group as the input representative keyword.

4. The content navigation method according to claim 3, wherein, when grouping, the corresponding word is deleted from the storage means.

5. A content navigation apparatus for supporting content search, comprising:
storing means for storing, each time a user performs a search based on a keyword and selects any content from the search results, the identification information of the selected content and the identification information of the user who input the keyword in storage means in association with the keyword used for the search;
grouping means for: referring to the storage means and selecting, as a target word, a keyword for which the number of pieces of associated user identification information after deduplication is large; for each non-selected keyword other than the target word stored in the storage means, dividing the number of pieces of content identification information associated in common with the target word and the non-selected keyword by the sum of the content identification information associated with each of the target word and the non-selected keyword, and taking as a corresponding word a non-selected keyword for which the division result is equal to or greater than a predetermined value; grouping the target word and the corresponding word into one group; associating the content identification information associated with the corresponding word with the target word; and, for each ungrouped keyword other than the target word and the corresponding word stored in the storage means, dividing the number of pieces of content identification information associated in common with the target word and the ungrouped keyword by the sum of the content identification information associated with each of the target word and the ungrouped keyword, and including in the one group an ungrouped keyword for which the division result is equal to or greater than a predetermined value; and
related keyword output means for outputting, when an arbitrary representative keyword is input by a user's operation input, the keywords other than the representative keyword that belong to the same group as the input representative keyword.