JP2000172699A

JP2000172699A - Device and method for supporting hypertext structure change and storage medium with hypertext structure change support program recorded therein

Info

Publication number: JP2000172699A
Application number: JP10345071A
Authority: JP
Inventors: Hiroki Kato; 裕樹加藤; Takehiro Nakayama; 雄大中山
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1998-12-04
Filing date: 1998-12-04
Publication date: 2000-06-23
Anticipated expiration: 2018-12-04
Also published as: JP3705330B2

Abstract

PROBLEM TO BE SOLVED: To provide a hypertext structure change support device that grasps users' trends by comparing their access tendencies with the similarity-based distribution tendencies of the content, and that supports changing the content of a Web site in line with those trends. SOLUTION: A log clustering part 2 acquires from a Web server 1 the link structure of the nodes and a log recording users' access history, and forms log clusters by clustering the nodes on the basis of the log. A content clustering part 3 acquires the content from the Web server 1 and forms content clusters by clustering it on the basis of content similarity. An out-of-intention detecting part 4 associates the content clusters with the log clusters and detects any content cluster that cannot be associated with a log cluster as possibly out of line with the intended structure. An out-of-intention presenting part 5 then presents those clusters to, for example, the administrator of the Web server 1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an apparatus and a method that support changes to the hypertext structure of a Web site providing information over the Internet, based on the similarity between users' access tendencies toward the content (hypertext information) published on the site and the distribution tendencies of that content. It also relates to a storage medium on which a support program for implementing the support apparatus on a computer is recorded.

【０００２】[0002]

2. Description of the Related Art

In recent years, demand for providing information on the World Wide Web (hereinafter abbreviated as the Web) has been increasing. Companies also operate Web sites for providing information, and advertising of their own products, as well as buying and selling of goods and services through means such as electronic commerce, is being actively conducted on those sites. Under these circumstances, there is a growing demand for technology that reveals the trends of users who access the Web and quickly changes the structure of the information to be provided in line with those trends.

[0003] Information provided on a Web site is called content and is expressed as hypertext (for example, HTML). Individual pieces of hypertext information (hereinafter referred to as nodes) are connected by links. A user who wants to use a Web site accesses nodes in the site by following links and can obtain a variety of information. The Web server can also record a history of users' accesses to it; the recorded content is hereinafter referred to as a log. For example, the log records the IP address of the computer that accessed the information (the computer the user uses to access the Web server, or a computer relaying the user's traffic), the time of access, and the command sent to the Web server to access the information. The command includes the identifier (for example, a URL) of the accessed node on the server.

[0004] One conventional technique for learning users' trends from access logs is described in, for example, Ming-Syan et al., "Data Mining for Path Traversal Patterns on the Web," IEEE Trans. on Knowledge and Data Engineering, 1998. That technique extracts sequences called MFs (Maximal Forward references). FIG. 15 is an explanatory diagram of an example of a Web site structure and a user's access route. In the example of FIG. 15, nodes A to H are linked as illustrated. Suppose a user browses the nodes in the order A → B → C → D → E → F → G, as shown by the thick line in the figure. In this case, the MFs [ABCD], [ABE], and [AFG] are obtained. By computing MFs for every user who accessed the site and counting how often each subsequence of the MFs appears, the users' access patterns can be obtained.
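The MF extraction described above can be sketched in Python. This is a minimal illustration, not the cited paper's algorithm verbatim. It assumes server-side logs, in which back-button moves are not recorded, so a backtrack is inferred whenever the next page is not linked from the end of the current path. Node names are single letters as in FIG. 15, and the link set in the example (A→B, A→F, B→C, B→E, C→D, F→G) is a hypothetical reading of the figure chosen to reproduce the MFs given in the text. The function names are illustrative.

```python
from collections import Counter

def maximal_forward_references(visits, links):
    """Split one user's visit order into maximal forward references (MFs).

    visits: single-letter node IDs in the order they were requested.
    links:  dict mapping a node to the set of nodes it links to.
    """
    mfs, path = [], []
    for node in visits:
        if path and node not in links.get(path[-1], set()):
            # The user must have gone back, so the current path is maximal.
            mfs.append("".join(path))
            # Retreat until reaching a node that links to `node`.
            while path and node not in links.get(path[-1], set()):
                path.pop()
        path.append(node)
    if path:
        mfs.append("".join(path))
    return mfs

def count_subpaths(all_mfs):
    """Count contiguous subpaths of length >= 2 over all users' MFs."""
    counts = Counter()
    for mf in all_mfs:
        for i in range(len(mf)):
            for j in range(i + 2, len(mf) + 1):
                counts[mf[i:j]] += 1
    return counts

# Hypothetical links consistent with the MFs stated for FIG. 15.
links = {"A": {"B", "F"}, "B": {"C", "E"}, "C": {"D"}, "F": {"G"}}
mfs = maximal_forward_references("ABCDEFG", links)
print(mfs)  # ['ABCD', 'ABE', 'AFG']
```

In practice the MFs of all users would be pooled before calling `count_subpaths`, so that frequently shared subpaths such as "AB" stand out.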

[0005] This technique yields the linear sequences of nodes along which a user followed links. However, when a user returns from a node and resumes browsing from the node returned to, the resulting path is obtained as a separate sequence. It is therefore difficult to obtain the group of nodes accessed by a user as a single unit.

[0006] Another conventional technique is described in, for example, J. Borges et al., "Mining Association Rules in Hypertext Databases," In Proc. of KDD-98. This technique extracts from the log the number of user transitions between nodes connected by links and derives association rules for the node sequences accessed by users. Because it looks only at the number of transitions between nodes, the same user does not necessarily follow a resulting sequence. FIG. 16 is an explanatory diagram of an example of a Web site structure and the number of user transitions between nodes. In the figure, arrows indicate the direction in which links were followed, and numbers indicate how many times they were followed. In the example of FIG. 16, this technique yields, for instance, the association that users who follow the link B → A tend to follow the link A → F. However, this does not show that the same users visited all of nodes B, A, and F, so it cannot be said to reflect users' access trends accurately.
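The limitation can be made concrete with a short Python sketch. The sessions below are hypothetical and are not the data of FIG. 16: two different sets of sessions produce identical per-link transition counts, yet in one of them a user visits B, A, and F, and in the other nobody does. The function names are illustrative.

```python
from collections import Counter

def transition_counts(sessions):
    """Number of times each link (src, dst) was followed, over all sessions."""
    counts = Counter()
    for s in sessions:
        counts.update(zip(s, s[1:]))
    return counts

def users_visiting_all(sessions, nodes):
    """Number of sessions that include every node in `nodes`."""
    return sum(1 for s in sessions if set(nodes) <= set(s))

# Hypothetical sessions: same link counts, different per-user paths.
x = [["B", "A", "F"], ["C", "A", "G"]]
y = [["B", "A", "G"], ["C", "A", "F"]]
print(transition_counts(x) == transition_counts(y))                # True
print(users_visiting_all(x, "BAF"), users_visiting_all(y, "BAF"))  # 1 0
```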

[0007] In addition, although both of the above techniques can identify frequently used access routes by extracting users' access patterns from a log, neither processes the content together with the log in an integrated manner. It has therefore been difficult for them to support the maintenance and management of a Web site, for example by indicating which content should be changed on the basis of the access patterns.

【０００８】[0008]

SUMMARY OF THE INVENTION

The present invention has been made in view of the above circumstances. Its object is to provide a hypertext structure change support apparatus and method that, by comparing users' access tendencies with the similarity-based distribution tendencies of the content, grasp users' trends and support changing the content of a Web site in line with those trends, and a storage medium on which a support program for implementing the support apparatus on a computer is recorded.

【０００９】[0009]

According to the present invention, a hypertext information group managed by an information system is clustered on the basis of access history information for that group to generate log clusters, and the same group is clustered on the basis of the similarity between individual hypertexts to generate content clusters. The content clusters are then associated with the log clusters, and the hypertexts that are elements of any content cluster that could not be associated are presented. When a content cluster cannot be associated with a log cluster, it is possible, for example, that users refer to hypertexts with similar contents at clearly different frequencies, that the Web site structure is inappropriate, or that links are not properly provided. Presenting such hypertexts prompts and supports changes to the hypertext structure.

【００１０】[0010]

FIG. 1 is a block diagram showing an embodiment of the hypertext structure change support apparatus according to the present invention. In the figure, 1 is a Web server, 2 is a log clustering unit, 3 is a content clustering unit, 4 is an intention shift detection unit, 5 is an intention shift presentation unit, 11 is a log recording unit, and 12 is a content providing unit.

[0011] The Web server 1 stores the information (content) to be provided to users and transmits it over a network. The Web server 1 has a log recording unit 11 and a content providing unit 12. The content providing unit 12 provides content in response to user accesses. Each time a user accesses the server, the log recording unit 11 records a user identifier for identifying the user (an IP address), the access time, and the address (URL) of the hypertext accessed by the user (hereinafter referred to as a node). The IP address recorded by the Web server 1 may be that of the client computer actually used by the user, or that of a proxy server that accesses the Web server on behalf of client computers. In the latter case, multiple users share the same IP address. In the following description, users are identified by the IP address of the computer that accessed the Web server 1 together with the date; that is, accesses from the same IP address on different dates are treated as coming from different users.

[0012] The log used in the present invention is not limited to the one described above. For example, as described in Japanese Patent Application Laid-Open No. 10-224349, users may be identified by additionally using the logs of a proxy server. Any log may also be used, such as one recorded by a mechanism with a client-side access notification function, for example the Java applet described in Japanese Patent Application Laid-Open No. 10-207838, provided that users can be identified and that the locations of the accessed information and the order of access are known.

[0013] The log clustering unit 2 acquires from the Web server 1 the log and the link structure formed by the nodes and their links, and forms log clusters by clustering the nodes on the basis of the log. Details of the log clustering unit 2 are described later.

[0014] The content clustering unit 3 acquires the content from the Web server 1 and forms content clusters by clustering it on the basis of, for example, the similarity of its contents. The content clusters are formed without reference to the link structure.

[0015] The intention shift detection unit 4 associates the content clusters obtained by the content clustering unit 3 with the log clusters obtained by the log clustering unit 2. Any content cluster that cannot be associated is detected as possibly having an intention shift.

[0016] The intention shift presentation unit 5 presents the hypertexts that are elements of the content clusters which the intention shift detection unit 4 could not associate and therefore detected as possibly having an intention shift, to, for example, the administrator of the Web server 1.
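Paragraphs [0015] and [0016] do not specify the test used to associate a content cluster with a log cluster. The Python sketch below assumes one plausible rule: a content cluster is associated when its node set is contained in some log cluster's node set (log clusters always include the starting node, so exact equality would rarely hold). The rule, the function names, and the example clusters are all illustrative assumptions.

```python
def detect_intention_shifts(log_clusters, content_clusters):
    """Return the content clusters that cannot be associated with any log cluster.

    Assumed association rule: content cluster C matches log cluster LC
    when every node of C is also in LC.
    """
    unmatched = []
    for c in content_clusters:
        if not any(set(c) <= set(lc) for lc in log_clusters):
            unmatched.append(c)
    return unmatched

def nodes_to_present(unmatched):
    """Hypertexts (nodes) to show the administrator, sorted for display."""
    return sorted(set().union(*unmatched)) if unmatched else []

# Hypothetical clusters loosely modelled on FIGS. 6 and 7.
log = [{"site entrance"},
       {"site entrance", "copy", "model C"},
       {"site entrance", "printer", "model A", "model A function"}]
content = [{"model A", "model A function"}, {"model C", "model D"}]
flagged = detect_intention_shifts(log, content)
print(nodes_to_present(flagged))  # ['model C', 'model D']
```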

[0017] FIG. 2 is a block diagram showing an example of the log clustering unit in an embodiment of the hypertext structure change support apparatus of the present invention. In the figure, 21 is an access information storage unit, 22 is an access count unit, 23 is a cluster construction unit, 24 is a cluster generation acceleration unit, and 25 is a cluster generation restriction unit.

[0018] From the log information, the access information storage unit 21 stores pairs of an identifier and a record, in chronological order, of the nodes accessed by the user with that identifier. As described above, the identifier here is assigned so that accesses from the same IP address on different dates are identified as different users.
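A minimal Python sketch of how the records of the access information storage unit 21 might be built. It assumes the log has already been parsed into (IP address, timestamp, URL) tuples; the parsing step, the function name, and the sample entries are assumptions. The identifier is the pair (IP address, date), so the same address on two dates yields two users.

```python
from collections import defaultdict
from datetime import datetime

def build_access_histories(records):
    """Map (ip, date) identifiers to the URLs they accessed, in time order.

    records: iterable of (ip, timestamp, url) with timestamp a datetime.
    """
    histories = defaultdict(list)
    for ip, ts, url in sorted(records, key=lambda r: r[1]):
        histories[(ip, ts.date())].append(url)
    return dict(histories)

# Hypothetical log entries (documentation-range IP addresses).
records = [
    ("192.0.2.1", datetime(1998, 12, 1, 10, 5), "/printer.html"),
    ("192.0.2.1", datetime(1998, 12, 1, 10, 0), "/index.html"),
    ("192.0.2.1", datetime(1998, 12, 2, 9, 0), "/index.html"),
    ("198.51.100.7", datetime(1998, 12, 1, 11, 0), "/copy.html"),
]
histories = build_access_histories(records)
```

Sorting by timestamp before grouping keeps each identifier's list in access order even if the raw log is not strictly ordered.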

[0019] Using the information stored in the access information storage unit 21, the access count unit 22 calculates, for a given set of hypertexts, the number of identifiers whose history shows access to every hypertext in the set while satisfying a given order constraint.
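The access count unit 22 can be sketched as follows in Python. The `ordered` flag is one assumed reading of the "order constraint" (the nodes must appear in the given order, possibly with other pages in between), and `start` covers the additional requirement of S43 that the first access be to the starting node. The function name, parameters, and sample histories are illustrative.

```python
def count_supporting_users(histories, nodes, ordered=False, start=None):
    """Number of identifiers whose access history covers every node in `nodes`.

    histories: dict mapping identifier -> list of node IDs in time order.
    ordered:   if True, `nodes` must occur in the given order (as a subsequence).
    start:     if given, the history must begin with this node.
    """
    total = 0
    for seq in histories.values():
        if start is not None and (not seq or seq[0] != start):
            continue
        if ordered:
            it = iter(seq)
            # `n in it` consumes the iterator up to n, which enforces order.
            ok = all(n in it for n in nodes)
        else:
            ok = set(nodes) <= set(seq)
        if ok:
            total += 1
    return total

# Hypothetical histories.
h = {"u1": ["top", "printer", "A"],
     "u2": ["top", "A", "printer"],
     "u3": ["printer", "A"]}
print(count_supporting_users(h, ["printer", "A"]))                             # 3
print(count_supporting_users(h, ["printer", "A"], ordered=True))               # 2
print(count_supporting_users(h, ["printer", "A"], ordered=True, start="top"))  # 1
```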

[0020] The cluster construction unit 23 takes as the apex a cluster consisting only of a predetermined starting node. From the set of nodes that can be reached by following one link from the starting node and for which logs have been acquired, it takes out the nodes one at a time and generates, as new cluster candidates one level lower in the hierarchy, clusters each consisting of the starting node and one such node. When the access count unit 22 calculates that a generated cluster candidate has an access count equal to or greater than a predetermined threshold, the candidate is generated as a new cluster, a child cluster of the starting node. For each generated child cluster in turn, the unit takes out, one at a time, the nodes that are reachable by following one link from the node set of the child cluster, for which logs have been acquired, and that are not yet included in the child cluster, and adds each to the child cluster's node set to generate cluster candidates one further level down. Whenever such a candidate is calculated to have an access count equal to or greater than the threshold, a new child cluster is generated with the current child cluster as its parent, and this process is repeated until no new child cluster is generated. Hierarchical clusters are generated in this way.

[0021] When the cluster construction unit 23 generates hierarchical clusters, the cluster generation acceleration unit 24 prevents a new cluster candidate from being generated if a higher-level cluster whose node set is a subset of the candidate's node set has a user access count equal to or less than the predetermined threshold. Cluster generation beyond a candidate that is not generated is thereby cut off, shortening the processing time.

[0022] For predetermined nodes, the cluster generation restriction unit 25 ensures that, when new cluster candidates are generated, the set of nodes reachable by following one link from that node alone is not used to generate candidates. This limits the number of child cluster candidates generated.

[0023] Next, an example of the operation of the embodiment of the hypertext structure change support apparatus of the present invention is described. FIG. 3 is a flowchart showing an example of the operation of the log clustering unit 2, and FIG. 4 is an explanatory diagram of a concrete example of the structure of the content stored in the Web server 1. The following description assumes that the content shown in FIG. 4 is related by the links illustrated there, and refers to FIG. 4 throughout. The main processing shown in FIG. 3 is performed by the cluster construction unit 23.

[0024] First, in S31, the access count threshold, which applies to the number of users who have visited all nodes included in a cluster, is read. In S32, a non-inheritance node set, listing nodes whose links are not followed when generating clusters, is read. In this example, the access count threshold is given in advance as 20 and the non-inheritance node set as [site entrance]. Sets are written enclosed in '[' and ']'.

[0025] In S33, a link table, in which the link relations among the nodes of the Web site have been extracted from the Web server 1 in advance, is read. When this link table is generated, links to nodes on other sites, for which the Web site cannot acquire logs, are removed.

[0026] In S34, the predetermined starting node is also read. In this example, the top page "site entrance", which is the starting node of the Web site with the structure shown in FIG. 4, is used. From the top page, every node published by the Web site can be reached by following links.

[0027] Next, in S35, a cluster consisting only of the starting node is generated as the initial cluster and added to the cluster set. In the example of FIG. 4, the cluster set is [site entrance].

[0028] In S36, it is determined whether the cluster set is empty. An empty cluster set means that processing is complete, so the flow proceeds to S39, outputs the clusters generated so far as log clusters, and ends the processing of the log clustering unit 2. If the cluster set is not empty, the flow proceeds to S37 to generate cluster candidates. In the initial state the initial cluster has already been generated, so the flow moves to the candidate generation of S37.

[0029] In S37, a cluster is taken out of the cluster set, and cluster candidates are generated by combining that cluster with each link destination of the nodes it contains. In the example of FIG. 4, the node "site entrance" links to the nodes "printer" and "copy", so two child clusters, with node sets [site entrance, printer] and [site entrance, copy], are generated as new cluster candidates.

[0030] Next, in S38, each generated cluster candidate is checked against the criteria for becoming a cluster, and those that satisfy them are selected. FIG. 5 is a flowchart showing an example of the process of determining whether a cluster candidate satisfies the cluster criteria. In S41, it is determined whether a cluster with the same node set already exists; if so, the candidate is excluded.

[0031] Next, in S42, it is determined whether a cluster whose node set is a subset of the candidate's has previously been removed for failing the cluster criteria; if so, the candidate is excluded. This processing is performed by the cluster generation acceleration unit 24.

[0032] Next, in S43, the access count unit 22 counts the number of users (the number of user identifiers) who accessed all of the nodes in the cluster's node set and whose first access was to the starting node. If the count is less than the access count threshold read in S31, the candidate is excluded. The users may also be counted, for example, by counting only those whose order of access to the nodes also matches. In that case, the node set of a cluster also carries order information, and two clusters are regarded as the same only when their order information also matches.

[0033] In S38, the cluster candidates that pass the criteria of FIG. 5 are thus added to the cluster set, and the flow returns to S36.

[0034] Back in S36, if the cluster set is not empty, S37 and S38 are repeated. If the newly generated cluster set then contains multiple clusters, the one with the smallest cluster number is taken out first, and cluster candidates are generated from it in the same way. From then on, cluster candidates are generated using the nodes reachable by following one link from any node included in the cluster, not only from the most recently added node. This is because a user may have used the browser's back function to return to an earlier node and then accessed another page.
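Steps S31 to S43, together with the restriction of [0035] and the parent links of [0036], can be combined into one loop. The Python sketch below is an illustrative reading under stated assumptions: clusters are unordered node sets (the order-sensitive variant of [0032] is not shown); the cluster set is processed first in, first out, so lower-numbered clusters are expanded first; and the links of a non-inheritance node are followed only when it is the sole node of the cluster being expanded, matching the example in which [site entrance] still yields clusters 2 and 3. The site, histories, threshold, and function names are hypothetical.

```python
from collections import deque

def count_users(histories, nodes, start):
    """S43: users whose first access is `start` and who accessed every node."""
    return sum(1 for seq in histories.values()
               if seq and seq[0] == start and nodes <= set(seq))

def log_clusters(histories, links, start, threshold, non_inherited=frozenset()):
    """Return (clusters, parents); parents maps each cluster to its parent clusters."""
    logged = {n for seq in histories.values() for n in seq}
    root = frozenset([start])                  # S35: initial cluster
    clusters, parents, rejected = [root], {root: set()}, []
    pending = deque([root])                    # the "cluster set" of S36
    while pending:                             # S36: stop when empty
        cluster = pending.popleft()            # S37: oldest cluster first
        reachable = set()
        for n in cluster:
            # [0035]: a non-inheritance node's links are used only at the root.
            if n not in non_inherited or len(cluster) == 1:
                reachable |= links.get(n, set())
        for n in sorted((reachable & logged) - cluster):
            cand = cluster | {n}
            if cand in parents:                # S41: already a cluster
                parents[cand].add(cluster)     # [0036]: add a parent link
                continue
            if any(r <= cand for r in rejected):                 # S42
                continue
            if count_users(histories, cand, start) < threshold:  # S43
                rejected.append(cand)
                continue
            clusters.append(cand)              # passed the FIG. 5 criteria
            parents[cand] = {cluster}
            pending.append(cand)
    return clusters, parents                   # S39: output log clusters

# Hypothetical site and (ip, date)-keyed histories.
links = {"top": {"printer", "copy"}, "printer": {"A", "B"}, "copy": {"C", "D"}}
histories = {
    ("192.0.2.1", "1998-12-01"): ["top", "printer", "A"],
    ("192.0.2.2", "1998-12-01"): ["top", "printer", "A", "B"],
    ("192.0.2.3", "1998-12-01"): ["top", "copy", "C"],
    ("192.0.2.4", "1998-12-01"): ["top", "copy", "C"],
    ("192.0.2.5", "1998-12-01"): ["top", "printer", "B"],
    ("192.0.2.6", "1998-12-01"): ["top", "printer", "B", "A"],
}
clusters, parents = log_clusters(histories, links, "top", 2, frozenset({"top"}))
```

In this example, the cluster {top, printer, A, B} is reached from both {top, printer, A} and {top, printer, B}, so it ends up with two parents and the result is a DAG rather than a tree.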

[0035] However, for nodes that serve as indexes in the content of the Web server, the cluster generation restriction unit 25 can prevent their link destinations from being used to generate child cluster candidates, which keeps the number of combinations from growing. In the example of FIG. 4, if the node "site entrance" is an index, then when later clusters are generated, the cluster [site entrance] is not expanded into the cluster [copy, printer]. Index nodes can be designated by explicit input from the user so that clusters are not generated from them. In addition, because the log clusters are to be associated with a clustering based on content similarity, nodes whose contents have low mutual similarity will not be assigned a corresponding content cluster by the intention shift detection unit 4 anyway, so clusters containing those nodes need not be generated. It is therefore possible to compute in advance the similarity among the nodes connected by links from a given node, detect nodes for which the average similarity is equal to or less than a threshold, and restrict cluster generation by treating the detected nodes as index nodes.
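The automatic detection of index nodes can be sketched in Python. The sketch assumes one reading of the text: the similarity is averaged over all pairs of pages linked from the candidate node, and nodes with fewer than two link destinations are never flagged. The pairwise similarity function is taken as given, since the profile method of [0041] is not reproduced here. The function name and the example similarity values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def detect_index_nodes(links, similarity, threshold):
    """Nodes whose linked pages have average pairwise similarity <= threshold.

    links:      dict mapping a node to the set of nodes it links to.
    similarity: function (a, b) -> similarity score between two nodes.
    """
    index_nodes = set()
    for node, targets in links.items():
        pairs = list(combinations(sorted(targets), 2))
        if pairs and mean(similarity(a, b) for a, b in pairs) <= threshold:
            index_nodes.add(node)
    return index_nodes

# Hypothetical similarities; unlisted pairs default to 0.0.
sims = {frozenset({"printer", "copy"}): 0.1, frozenset({"A", "B"}): 0.8}
def sim(a, b):
    return sims.get(frozenset({a, b}), 0.0)

links = {"top": {"printer", "copy"}, "printer": {"A", "B"}, "A": {"A function"}}
print(detect_index_nodes(links, sim, 0.3))  # {'top'}
```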

[0036] When cluster candidates are generated in S37 and a child cluster with the same node set has already been generated from another parent cluster, a relationship can be created that makes the existing child cluster also a child of the parent cluster that attempted to generate it.

[0037] FIG. 6 is an explanatory diagram of an example of log clusters. The processing of the log clustering unit 2 described above outputs log clusters such as those in FIG. 6 from the content structure shown in FIG. 4. First, cluster 1 [site entrance] is generated as the initial cluster. Then, from the nodes "printer" and "copy", which can be reached by following one link from the node "site entrance", cluster 2 [site entrance, printer] and cluster 3 [site entrance, copy] are generated. Similarly, clusters 4 and 5 are generated from cluster 2, and cluster 6 is generated from cluster 3. Cluster candidate 1 [site entrance, copy, model D], generated from cluster 3, does not become a cluster because its access count is less than 20.

[0038] Clusters 8 and 9 are generated from cluster 4. At this point, a child cluster candidate identical to cluster 8 is also generated from cluster 5. In this case, as described above, the existing cluster 8 is also given a relationship as a child cluster of cluster 5. Similarly, cluster 10 is generated from cluster 8, and a relationship making it a child cluster of cluster 9 is also created.

[0039] From cluster 6, cluster candidate 2 with the node set [site entrance, copy, model C, model D] could be generated. However, because cluster candidate 1, whose node set is a subset of this one, previously failed the criteria, candidate 2 is removed by the condition of S42 in FIG. 5 and excluded from the cluster candidates.

[0040] The clusters generated in this way form a directed acyclic graph (DAG), as shown in FIG. 6, with relations recording which clusters each cluster has as children and which as parents. Because the generated log clusters also have a tree-like structure of parent-child relations, in the example of FIG. 6 they resemble the structure of FIG. 4. Note, however, that the log clusters are formed as a structure separate from the content structure.

[0041] Next, the process by which the content clustering unit 3 generates content clusters is described. The content clustering unit 3 first generates a profile for each node, using, for example, the method disclosed in Japanese Patent Application No. 9-153387. It then calculates the similarity for every combination of two nodes and stores the results. Clustering is then performed on the calculated similarities by a known method. For example, clustering can be performed using the complete-link method of the hierarchical clustering algorithms described in Ellen M. Voorhees, "Implementing Agglomerative Hierarchic Clustering Algorithms for Use in Document Retrieval", Information Processing & Management, Vol. 22, No. 6, pp. 465-476, 1986.
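The complete-link variant of agglomerative hierarchical clustering mentioned in [0041] can be sketched as below. The sketch makes these assumptions:

- the pairwise node similarities have already been computed (the profile method of the cited application is not reproduced);
- the similarities are supplied as a dict keyed by unordered node pairs;
- the function name and data layout are illustrative only.

At each step, the two clusters whose least similar member pair is most similar are merged. Every merged cluster is recorded, which yields nested clusters of the kind shown in FIG. 7.

```python
from itertools import combinations


def complete_link(nodes, sim):
    """Agglomerative clustering with the complete-link criterion (sketch).

    nodes: list of node names
    sim:   dict mapping frozenset({a, b}) to the similarity of nodes a and b
    Returns the non-singleton clusters in the order in which they are formed.
    """
    clusters = [frozenset([n]) for n in nodes]
    merged = []

    def link(c1, c2):
        # Complete link: two clusters are only as similar as their least similar pair.
        return min(sim[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        new = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [new]
        merged.append(new)
    return merged


if __name__ == "__main__":
    nodes = ["A", "FA", "B", "C", "D"]          # FA stands for "function A of model A"
    sim = {frozenset(p): 0.2 for p in combinations(nodes, 2)}   # made-up baseline
    sim.update({frozenset(("A", "FA")): 0.9, frozenset(("A", "B")): 0.7,
                frozenset(("FA", "B")): 0.6, frozenset(("C", "D")): 0.8})
    for cluster in complete_link(nodes, sim):
        print(sorted(cluster))
```

The similarities in the usage example are invented so that "model A" and its function page are closest, followed by "model C" and "model D". With them, the merges are:

1. [model A, function A of model A]
2. [model C, model D]
3. [model A, function A of model A, model B]
4. all five nodes

These are the four clusters listed in [0042].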

[0042] FIG. 7 is an explanatory diagram of an example of content clusters. When the contents shown in FIG. 4 are stored in the Web server 1, clustering them hierarchically yields the content cluster structure shown in FIG. 7. Because the contents of node "model A" and node "function A of model A" are similar, cluster [model A, function A of model A] is obtained. Because these nodes are also similar to node "model B", cluster [model A, function A of model A, model B] is obtained. On the other hand, because the contents of node "model C" and node "model D" are similar, cluster [model C, model D] is obtained. Finally, because all of these nodes are similar, cluster [model A, function A of model A, model B, model C, model D] is obtained. In this way, four clusters are obtained: [model A, function A of model A], [model A, function A of model A, model B], [model C, model D], and [model A, function A of model A, model B, model C, model D].

[0043] The processing of the content clustering unit 3 does not consider the links between nodes; it clusters them on the basis of, for example, the similarity of their contents. In the example shown in FIG. 4, similar contents happen to be connected by links. In general, however, content clustering may group nodes that are far apart in the link structure into a single cluster. Nodes that have no links for some reason can also be included in a content cluster.

[0044] FIG. 8 is a flowchart showing an example of the operation of the intention shift detection unit 4 and the intention shift presentation unit 5. First, in S51 and S52, the log clusters and content clusters generated as described above are read. Here, it is assumed that the log clusters shown in FIG. 6 and the content clusters shown in FIG. 7 have been read.

[0045] Next, in S53, the content clusters included in the leaf clusters of the log clusters are detected. A leaf cluster is a cluster from which no further cluster was generated by adding a new node to its node set. The leaf clusters of the log clusters shown in FIG. 6 are cluster 6 and cluster 10. From cluster 10, content clusters [model A, function A of model A] and [model A, function A of model A, model B] are detected as the content clusters it includes. Hereinafter, content cluster [model A, function A of model A, model B] is called content cluster 1, and content cluster [model A, function A of model A] is called content cluster 2. No content cluster included in cluster 6 is detected.
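Steps S53 and S54 reduce to two set operations:

- a leaf is a log cluster that is no other cluster's parent;
- a content cluster is "included in" a log cluster when its node set is a subset of the log cluster's node set.

A minimal sketch follows. It assumes clusters are frozensets of node names and that the log clusters are given as a map from each cluster to its parent clusters. The function names are hypothetical.

```python
def leaf_clusters(parents):
    """Return the log clusters that are not the parent of any other cluster (S53).

    parents: dict mapping each log cluster (frozenset of nodes) to its parent clusters.
    """
    used_as_parent = set().union(*parents.values())
    return [c for c in parents if c not in used_as_parent]


def contained_content_clusters(log_cluster, content_clusters):
    """Return the content clusters whose nodes all lie inside the log cluster."""
    return [cc for cc in content_clusters if cc <= log_cluster]
```

The sketch can be applied to node sets reconstructed from the walk-through. These sets are inferred, not stated in the text; for example:

- cluster 10 as [site entrance, printer, model A, function A of model A, model B];
- cluster 6 as [site entrance, copy, model C].

The leaves are then clusters 6 and 10. Cluster 10 yields content clusters 2 and 1, and cluster 6 yields none, consistent with [0045].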

[0046] In S54, the leaf clusters of the log clusters that contain the content clusters detected in S53 are added to set 1. Content clusters 1 and 2 are included in leaf cluster 10 of the log clusters, so cluster 10 is added to set 1.

[0047] In S55, it is determined whether set 1 is empty. If it is, the process ends; otherwise, the following processing is performed.

[0048] In S56, one log cluster is taken out of set 1. Then, in S57, the content cluster is propagated from the extracted cluster to its parent clusters. FIG. 9 is an explanatory diagram of the correspondence between log clusters and content clusters. As described above, set 1 contains cluster 10 of the log clusters, so cluster 10 is taken out of set 1. In FIG. 9, the content cluster contained in cluster 10 is shown as content cluster 1. Then, as indicated by the arrows in FIG. 9, content cluster 1 is propagated from cluster 10 to its parent clusters, cluster 8 and cluster 9.

[0049] At this time, in S58, it is determined whether the content cluster contains a node that is not included in the parent cluster. If it does, the process proceeds to the determination in S59. FIG. 10 is an explanatory diagram of an example of how a content cluster and a log cluster overlap. In this example, content cluster 1 is propagated from cluster 10 to its parent cluster 9. As shown in FIG. 10, node "model B", which belongs to content cluster 1, is not included in cluster 9, so the process proceeds to S59. The determination in S58 thus detects a shift between a log cluster and a content cluster. That is, it detects a portion where users' access to the nodes does not agree with the similarity of the nodes' contents.

[0050] In S59, it is determined whether the relationship between the content cluster and the parent cluster matches a predetermined pattern. FIGS. 11 and 12 are explanatory diagrams of examples of shifts between a content cluster and a log cluster. The patterns shown in FIGS. 11 and 12, for example, can be defined in advance as predetermined patterns.

In the pattern of FIG. 11, node 3 describes part of the contents of node 2 in detail. This is a link pattern in which the site creator expects users to continue from node 2 to node 3; in practice, however, node 3 is not accessed.

In the pattern of FIG. 12, nodes 2 and 3 each describe the contents of node 1 in detail. In this link pattern, the site creator either expects a similar number of accesses to nodes 2 and 3, or relies on which of the two users access in order to learn where users' choices lie. The pattern of FIG. 12 shows that node 2 is accessed while node 3 is hardly accessed at all.

The example shown in FIG. 10 corresponds to the link pattern of FIG. 12. The relationship between the parent cluster and the content cluster is therefore judged to match a predetermined pattern.
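The text does not spell out how the patterns of FIGS. 11 and 12 are tested. One possible encoding, offered purely as an assumption, is shown below. A missing node matches the FIG. 11 pattern (an unvisited detail page) when a node of the parent cluster that also belongs to the content cluster links to it. It matches the FIG. 12 pattern (an unchosen sibling) when its link source lies outside the content cluster but also links to a content-cluster node that users did visit.

The following are hypothetical:

- the function name and the return labels;
- the link structure in the test (entrance → printer, copy; printer → model A, model B; model A → function A of model A), which is a reconstruction of FIG. 4.

```python
def classify_shift(content_cluster, parent, links):
    """Classify how a content cluster overhangs a parent log cluster (S58 and S59).

    Returns (label, missing): label is "FIG11", "FIG12", or None; missing is the set
    of content-cluster nodes absent from the parent cluster.
    """
    missing = content_cluster - parent            # S58: nodes not in the parent
    if not missing:
        return None, missing
    shared = content_cluster & parent             # content nodes users did reach
    for m in missing:
        sources = {p for p in parent if m in links.get(p, ())}
        # Assumed FIG. 11 test: a similar page in the parent links to the unvisited page.
        if sources & content_cluster:
            return "FIG11", missing
        # Assumed FIG. 12 test: a common link source also leads to a visited sibling.
        if any(set(links.get(s, ())) & shared for s in sources):
            return "FIG12", missing
    return None, missing
```

With those assumed links, propagating content cluster 1 to cluster 9 reports "model B" as a FIG. 12 shift. Propagating it to cluster 8 reports "function A of model A" as a FIG. 11 shift. Both results agree with the walk-through in the text.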

[0051] If S59 judges that the relationship matches a predetermined pattern, an intention shift is deemed to have occurred. In S61, an evaluation value is then calculated and presented to the administrator or another responsible person. The evaluation value can be expressed, for example, as the number of accesses classified into the parent cluster divided by the number of accesses classified into the child cluster. A larger evaluation value indicates either a higher possibility that changing the content will increase user accesses, or that users' choices are clearer. As shown in FIG. 6 or FIG. 9, 50 accesses are classified into cluster 9 and 30 accesses into cluster 10, so the evaluation value is 50 ÷ 30 ≈ 1.7.
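The evaluation value of [0051] is a simple ratio. A minimal sketch follows; the function name is hypothetical. It also guards against a child cluster with no recorded accesses, a case the text does not address.

```python
def evaluation_value(parent_accesses, child_accesses):
    """Accesses classified into the parent log cluster divided by those in the child."""
    if child_accesses <= 0:
        raise ValueError("the child cluster must have at least one access")
    return parent_accesses / child_accesses


if __name__ == "__main__":
    value = evaluation_value(50, 30)   # cluster 9 (parent) and cluster 10 (child)
    print(f"{value:.1f}")
```

For the figures in the text, `evaluation_value(50, 30)` is 1.666..., which appears as 1.7 when formatted to one decimal place.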

[0052] FIG. 13 is an explanatory diagram of an example of a presented evaluation result. Using the evaluation value calculated as described above, the result can be presented to the administrator in a format such as that shown in FIG. 13. The points at which the site should be changed are highlighted: the nodes of the content cluster that fell outside the log cluster, and the nodes that link to them. The display thereby shows the relationship between the structure of the nodes included in the log cluster and the nodes that are no longer included in it. Each node is presented either with its contents rendered as in a Web browser or, when display space is limited, by its page title alone. In FIG. 13, highlighting is shown by hatching for convenience of illustration. The display method is arbitrary and is not limited to the form shown in FIG. 13; any easy-to-understand presentation may be used.

[0053] The above description covered the case in which, in S57, the content cluster is propagated to cluster 9 as a parent of cluster 10. The following describes the case in which the content cluster is propagated to cluster 8, the other parent of cluster 10. In this case, node "function A of model A" in content cluster 1 is not included in cluster 8, so the process proceeds from S58 to S59. The relationship between cluster 8 and content cluster 1 corresponds to the link pattern shown in FIG. 11. The evaluation value is therefore calculated in S61, and the evaluation result is presented. FIG. 14 is an explanatory diagram of another example of a presented evaluation result. Here, nodes "model A" and "function A of model A" are highlighted, and the evaluation value is displayed on the link between them. This shows the administrator the point at which the site should be changed. In this case too, the display form is arbitrary.

[0054] In two cases, the process proceeds to S60 without calculating an evaluation value or displaying an evaluation result:

- S58 finds that all nodes of the content cluster are included in the parent cluster;
- S59 finds that the relationship between the parent cluster and the content cluster does not match a predetermined pattern.

The process also proceeds to S60 after the evaluation value has been calculated and the evaluation result displayed in S61.

[0055] In S60, it is determined whether the parent cluster still contains a content cluster. If it does, the parent cluster is added to set 1 in S62. Cluster 9, a parent of cluster 10, contains content cluster 2 [model A, function A of model A], so cluster 9 is added to set 1. Cluster 8, the other parent of cluster 10, contains no content cluster and is therefore not added to set 1.

[0056] The process then returns to S55. In this example, set 1 still contains cluster 9, so the processing from S56 onward is repeated. Content cluster 2 is propagated to cluster 4, the parent of cluster 9, but cluster 4 does not include node "function A of model A". The relationship between the parent cluster and the content cluster in this case has the pattern shown in FIG. 11. The evaluation value is therefore calculated and the evaluation result is displayed in S61.

[0057] Because cluster 4 contains no content cluster, it is not added to set 1. The process returns to S55, finds set 1 empty, and ends.
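Putting S53 through S62 together, the propagation loop of FIG. 8 can be sketched as a worklist algorithm. This is an illustrative reconstruction, not the patent's code, and it rests on these assumptions:

- Each log cluster taken from set 1 propagates the largest content cluster it contains, as content cluster 1 is propagated from cluster 10 in FIG. 9.
- The pattern test of S59 is left as an optional, hypothetical `accept` predicate.
- Evaluation values are omitted, because the text gives access counts only for clusters 9 and 10.

```python
from collections import deque


def detect_intention_shifts(parents, content_clusters, accept=None):
    """Worklist sketch of S53 to S62.

    parents:          dict mapping each log cluster (frozenset) to its parent clusters
    content_clusters: iterable of frozensets of nodes
    accept:           optional predicate (content_cluster, parent) -> bool for S59
    Returns a list of (child, parent, content_cluster, missing_nodes) tuples.
    """
    content_clusters = list(content_clusters)

    def largest_contained(log_cluster):
        inside = [cc for cc in content_clusters if cc <= log_cluster]
        return max(inside, key=len) if inside else None

    used_as_parent = set().union(*parents.values())
    leaves = [c for c in parents if c not in used_as_parent]              # S53
    work = deque(c for c in leaves if largest_contained(c) is not None)   # S54: set 1
    queued = set(work)
    shifts = []
    while work:                                    # S55: stop when set 1 is empty
        child = work.popleft()                     # S56
        cc = largest_contained(child)
        for parent in parents[child]:              # S57: propagate to each parent
            missing = cc - parent                  # S58
            if missing and (accept is None or accept(cc, parent)):   # S59
                shifts.append((child, parent, cc, missing))          # S61
            if largest_contained(parent) is not None and parent not in queued:  # S60
                work.append(parent)                # S62
                queued.add(parent)
    return shifts
```

The sketch can be run on the node sets inferred from the walk-through. The loop then reports exactly the three shifts described in [0049] to [0057]:

- "model B" between clusters 10 and 9;
- "function A of model A" between clusters 10 and 8;
- "function A of model A" between clusters 9 and 4.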

[0058] In this way, the shift between two groups of nodes can be detected and displayed: the nodes that the site creator, judging from their contents, intended to receive similar access, and the nodes actually accessed. For example, by referring to a display such as that of FIG. 13, the administrator can learn that model A is popular. Alternatively, the administrator can take measures to increase access to model B. By referring to a display such as that of FIG. 14, the administrator can take measures to attract more access to function A of model A.

[0059] The embodiment described above can also be implemented by a computer program. In that case, the program and the data it uses can be stored on a computer-readable storage medium. A storage medium is a medium that causes changes in the state of energy, such as magnetism, light, or electricity, in a reading device provided in the hardware resources of a computer, according to the description of a program. It thereby conveys the program description to the reading device in the form of corresponding signals. Examples include magnetic disks, optical disks, CD-ROMs, and memory built into a computer.

[0060]

[Effects of the Invention] As is apparent from the above description, the present invention detects the shift between users' access tendencies and the intentions of the creator of a Web site's content. It thereby supports changes to the Web site's content, so that the Web site can be maintained and managed promptly in line with user trends.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the hypertext structure change support device of the present invention.

FIG. 2 is a block diagram showing an example of the log clustering unit in the embodiment of the hypertext structure change support device of the present invention.

FIG. 3 is a flowchart showing an example of the operation of the log clustering unit 2.

FIG. 4 is an explanatory diagram of a specific example of the structure of the content stored in the Web server 1.

FIG. 5 is a flowchart showing an example of the process of determining whether a cluster candidate satisfies the cluster criterion.

FIG. 6 is an explanatory diagram of an example of log clusters.

FIG. 7 is an explanatory diagram of an example of content clusters.

FIG. 8 is a flowchart showing an example of the operation of the intention shift detection unit 4 and the intention shift presentation unit 5.

FIG. 9 is an explanatory diagram of the correspondence between log clusters and content clusters.

FIG. 10 is an explanatory diagram of an example of how a content cluster and a log cluster overlap.

FIG. 11 is an explanatory diagram of an example of a shift between a content cluster and a log cluster.

FIG. 12 is an explanatory diagram of another example of a shift between a content cluster and a log cluster.

FIG. 13 is an explanatory diagram of an example of a presented evaluation result.

FIG. 14 is an explanatory diagram of another example of a presented evaluation result.

FIG. 15 is an explanatory diagram of an example of a Web site structure and users' access paths.

FIG. 16 is an explanatory diagram of an example of a Web site structure and the number of user transitions between nodes.

[Explanation of symbols]

1 ... Web server, 2 ... log clustering unit, 3 ... content clustering unit, 4 ... intention shift detection unit, 5 ... intention shift presentation unit, 11 ... log recording unit, 12 ... content providing unit, 21 ... access information storage unit, 22 ... access count counting unit, 23 ... cluster construction unit, 24 ... cluster generation speed-up unit, 25 ... cluster generation restriction unit.

Front page continued. F-terms (reference): 5B075 ND04 ND36 NK02 NK32 NK44 NR03 NR06 NR12 PR04 QM08 UU40; 5B082 FA11 GA02 GA15

Claims

1. A hypertext structure change support device, comprising: log clustering means for clustering a group of hypertext information managed by an information system on the basis of access history information for the hypertext information group; content clustering means for clustering the hypertext information group on the basis of the similarity between individual hypertexts; intention shift detection means for associating the content clustering result obtained by the content clustering means with the log clustering result obtained by the log clustering means; and intention shift presentation means for presenting hypertexts that are elements of a content cluster that could not be associated with a log cluster.

2. The hypertext structure change support device according to claim 1, wherein the log clustering means comprises: access information storage means for taking an access history that records the identifier of each user who accessed the hypertext information group, the access time, and the location of the accessed information, assigning a single identifier to accesses made under the same user identifier within a predetermined period, and storing the information accessed under that identifier in chronological order without duplication, paired with the identifier; access count counting means for calculating, using the access information storage means, the number of identifiers having a history of accessing all hypertexts of a given hypertext set in a manner satisfying a given order constraint; and cluster construction means for generating hierarchical clusters from hypertext sets on the basis of the number of identifiers calculated by the access count counting means;
wherein the cluster construction means: takes as the vertex a cluster consisting only of a predetermined starting hypertext; generates, as new cluster candidates of the next lower level, sets each consisting of the starting hypertext and one hypertext taken from the set of hypertexts that are reachable by following one link from the starting hypertext and whose access history has been acquired; when the access count counting means calculates that a cluster candidate has a number of accesses equal to or greater than a predetermined threshold, generates the candidate as a new cluster that is a child cluster of the starting hypertext; for each child cluster, takes one at a time the hypertexts that are reachable by following a link from the hypertext set included in the child cluster, whose access history has been acquired, and that are not included in the child cluster, and adds each to the hypertext set of the child cluster to generate a cluster candidate of the next lower level; and, when such a cluster candidate is calculated to have a number of accesses equal to or greater than the predetermined threshold, generates it as a new child cluster having the child cluster as its parent cluster, repeating this process until no new child cluster is generated, thereby obtaining hierarchical clusters.
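The access information storage and access counting of claim 2 can be sketched as follows. Two rules in the sketch are assumptions chosen for illustration, since the claim leaves both unspecified:

- Session rule: a new identifier starts when a user's access falls more than `period` seconds after that identifier's first access.
- Order constraint: the starting hypertext must be the first member of the set to be visited.

The function names are hypothetical.

```python
def sessionize(records, period):
    """Group (user, time, url) records into per-identifier visit lists.

    A new identifier begins when a user's access is more than `period` seconds
    after the first access of that user's current identifier (assumed rule).
    Each list keeps first visits in chronological order, without duplicates.
    """
    sessions = []
    current = {}                      # user -> (start time, index into sessions)
    for user, t, url in sorted(records, key=lambda r: (r[0], r[1])):
        state = current.get(user)
        if state is None or t - state[0] > period:
            sessions.append([])
            state = (t, len(sessions) - 1)
            current[user] = state
        visited = sessions[state[1]]
        if url not in visited:
            visited.append(url)
    return sessions


def count_accesses(sessions, nodes, start):
    """Count identifiers that visited every node in `nodes`, with `start` first.

    The 'start first' rule is a hypothetical stand-in for the claim's order constraint.
    """
    nodes = set(nodes) | {start}
    count = 0
    for visited in sessions:
        if all(n in visited for n in nodes):
            if visited.index(start) == min(visited.index(n) for n in nodes):
                count += 1
    return count
```

Keeping only first visits in order reflects the claim's "chronological order without duplication". The ordering check then only needs list positions.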

3. The hypertext structure change support device according to claim 2, wherein the log clustering means further comprises: cluster generation speed-up means that does not generate a cluster candidate newly generated while forming the hierarchical clusters when, among the higher-level clusters having a subset of the hypertext set included in that candidate, there is a cluster whose number of accesses from users is equal to or less than a predetermined threshold; and cluster generation restriction means that restricts the number of child cluster candidates generated by not using, when generating cluster candidates, the hypertext set reachable by following links from a predetermined hypertext alone.

4. The hypertext structure change support device according to claim 2, wherein, when a child cluster candidate generated from a parent cluster has already been generated as a child cluster of another parent cluster, the cluster construction means creates a parent-child relationship between the parent cluster from which the candidate was generated and the already generated child cluster.

5. The hypertext structure change support device according to claim 1, wherein:
the intention shift detection means selects, for the set of log clusters having no child cluster among the hierarchically structured clusters obtained by the log clustering means, the content clusters included in each cluster of that log cluster set, from the set of content clusters obtained by the content clustering means on the basis of the similarity between hypertexts; calculates, for the parent log cluster of the log cluster associated with a selected content cluster, whether the hypertexts in the content cluster are included in the parent log cluster; and detects a hypertext as involving an intention shift when that hypertext is not included, has a predetermined link relationship with the hypertext set included in the upper log cluster, and the number of accesses to the upper log cluster and the change in the number of accesses to the corresponding log cluster are in a predetermined relationship;
the intention shift presentation means presents to the user the hypertexts having a predetermined relationship with the detected hypertext; and
when the parent log cluster includes a content cluster, the intention shift detection means and the intention shift presentation means repeat the above processing, so that detection and presentation of intention shifts are repeated until no content cluster is included in the parent log cluster or a predetermined stop condition is satisfied.

6. The hypertext structure change support device according to claim 5, wherein the intention shift detection means uses clusters having a hierarchical structure based on the similarity between hypertexts, and excludes a cluster of a higher level that includes the cluster.

7. The hypertext structure change support device according to claim 3, wherein the cluster generation restriction means calculates the similarity between hypertexts linked from the same link-source hypertext, detects each link-source hypertext for which the average of those similarities is equal to or less than a predetermined threshold, and uses the detected hypertext as the predetermined hypertext.
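Claim 7 defines which hypertexts the cluster generation restriction means of claim 3 declines to expand from alone, for example an index page linking to unrelated topics. That criterion can be sketched as below. The similarity function, the threshold value, and the function name are assumptions. The sketch only computes the average pairwise similarity of each page's link targets and flags the sources at or below the threshold.

```python
from itertools import combinations
from statistics import mean


def low_cohesion_sources(links, similarity, threshold):
    """Return link sources whose targets have average pairwise similarity <= threshold.

    links:      dict mapping a link-source hypertext to the hypertexts it links to
    similarity: function (a, b) -> similarity of two hypertexts (assumed to be given)
    """
    flagged = []
    for source, targets in links.items():
        pairs = list(combinations(sorted(set(targets)), 2))
        if not pairs:            # fewer than two targets: nothing to average
            continue
        if mean(similarity(a, b) for a, b in pairs) <= threshold:
            flagged.append(source)
    return flagged
```

Sources linking to fewer than two pages are skipped here, because an average similarity between their targets is undefined. The claim does not say how such pages are treated.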

8. A hypertext structure change support method, comprising: clustering a group of hypertext information managed by an information system into log clusters on the basis of access history information for the hypertext information group; clustering the hypertext information group into content clusters on the basis of the similarity between individual hypertexts; associating the content clusters with the log clusters; and presenting hypertexts that are elements of a content cluster that could not be associated with a log cluster.

9. The hypertext structure change support method according to claim 8, wherein the clustering into log clusters comprises: taking an access history that records user identifiers, access times, and the locations of the accessed information, and assigning a single identifier to accesses made under the same user identifier within a predetermined period; storing, as access information, the information accessed under that identifier in chronological order without duplication, paired with the identifier; calculating as the number of accesses, using the access information, the number of identifiers having a history of accessing a given hypertext set in a manner satisfying a given order constraint;
taking as the vertex a cluster consisting only of a predetermined starting hypertext; generating, as new cluster candidates of the next lower level, sets each consisting of the starting hypertext and one hypertext taken from the set of hypertexts that are reachable by following one link from the starting hypertext and whose access history has been acquired; when a cluster candidate is calculated to have a number of accesses equal to or greater than a predetermined threshold, generating it as a new cluster that is a child cluster of the starting hypertext; for each child cluster, taking one at a time the hypertexts that are reachable by following a link from the hypertext set included in the child cluster, whose access history has been acquired, and that are not included in the child cluster, and adding each to the hypertext set of the child cluster to generate a cluster candidate of the next lower level; and, when such a cluster candidate is calculated to have a number of accesses equal to or greater than the predetermined threshold, generating a new child cluster having the child cluster as its parent cluster, this process being repeated until no new child cluster is generated, thereby obtaining hierarchical clusters.

10. The hypertext structure change support method according to claim 8, wherein the associating of the content clusters with the log clusters and the presenting of hypertexts that are elements of a content cluster that could not be associated comprise:
- selecting, for the set of log clusters having no child cluster among the hierarchically structured clusters obtained by the log clustering, the content clusters included in each cluster of that log cluster set, from the set of content clusters obtained by the content clustering on the basis of the similarity between hypertexts;
- calculating, for the parent log cluster of the log cluster associated with a selected content cluster, whether the hypertexts in the content cluster are included in the parent log cluster;
- detecting a hypertext as involving an intention shift when that hypertext is not included, has a predetermined link relationship with the hypertext set included in the upper log cluster, and the number of accesses to the upper log cluster and the change in the number of accesses to the corresponding log cluster are in a predetermined relationship, and presenting to the user the hypertexts having a predetermined relationship with the detected hypertext; and
- when the parent log cluster includes a content cluster, repeating the above processing so that detection and presentation of intention shifts are repeated until no content cluster is included in the parent log cluster or a predetermined stop condition is satisfied.

11. A storage medium storing a hypertext structure change support program for causing a computer to execute: a log clustering process for clustering a group of hypertext information managed by an information system on the basis of access history information for the hypertext information group; a content clustering process for clustering the hypertext information group on the basis of the similarity between individual hypertexts; an intention shift detection process for associating the content clustering result obtained by the content clustering process with the log clustering result obtained by the log clustering process; and an intention shift presentation process for presenting hypertexts that are elements of a content cluster that could not be associated with a log cluster.

12. The storage medium according to claim 11, wherein the log clustering process: from an access history recording the user identifier that accessed the hypertext information group, the access time, and the location of the accessed information, assigns a single identifier to the accesses made by the same user identifier within a predetermined period, and stores as access information the pair of that identifier and the set of information accessed under it, recorded in chronological order without duplicates; for any given set of hypertexts, calculates from the access information, as the number of accesses, the number of identifiers whose access history satisfies a given order constraint; takes as the vertex a cluster consisting only of a predetermined starting hypertext; generates, as new cluster candidates in the next lower hierarchy, clusters each consisting of the starting hypertext paired with one hypertext extracted, one at a time, from the set of hypertexts that can be reached by following a single link from the starting hypertext and whose access history has been acquired; when a cluster candidate is calculated to have a number of accesses equal to or greater than a predetermined threshold, generates it as a new child cluster of the starting hypertext; for each child cluster, generates cluster candidates in the next lower hierarchy by adding to it, one at a time, hypertexts that can be reached by following a single link from the hypertexts included in the child cluster, whose access history has been acquired, and that are not yet included in the child cluster; and, when such a cluster candidate is calculated to have a number of accesses equal to or greater than the predetermined threshold, generates a new child cluster with that child cluster as its parent cluster, this process being repeated until no new child cluster is generated, so that a hierarchical cluster is obtained.
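The three stages of claim 12 (session identification, access counting and level-by-level cluster growth) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the "predetermined period" is read as a maximum gap of `timeout` seconds between consecutive accesses by one user, the "order constraint" is read as "every page of the set was visited, with the starting hypertext first among them", and candidate sets already evaluated from another parent are skipped. The names sessionize, access_count, Cluster, grow_clusters, timeout and threshold are hypothetical.

```python
from dataclasses import dataclass, field


def sessionize(records, timeout=1800.0):
    """Group (user_id, timestamp, url) records into sessions.

    Assumption: accesses by the same user belong to one session while the gap
    between consecutive accesses is at most `timeout` seconds. Each session keeps
    its URLs in chronological order of first access, without duplicates.
    """
    sessions = {}     # session_id -> [url, ...]
    last_seen = {}    # user_id -> (session_id, timestamp of last access)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[1] > timeout:
            sid = len(sessions)      # start a new session identifier
            sessions[sid] = []
        else:
            sid = prev[0]
        last_seen[user] = (sid, ts)
        if url not in sessions[sid]:
            sessions[sid].append(url)
    return sessions


def access_count(pages, start, sessions):
    """Count sessions that visited every page in `pages` with `start` first
    among them (an assumed form of the 'order constraint')."""
    count = 0
    for visited in sessions.values():
        if all(p in visited for p in pages):
            first = visited.index(start)
            if all(visited.index(p) >= first for p in pages):
                count += 1
    return count


@dataclass
class Cluster:
    pages: frozenset
    children: list = field(default_factory=list)


def grow_clusters(start, links, sessions, threshold):
    """Build the cluster hierarchy rooted at {start}; `links` maps url -> linked urls."""
    accessed = {u for visited in sessions.values() for u in visited}
    root = Cluster(frozenset([start]))
    frontier, evaluated = [root], {root.pages}
    while frontier:                      # stop when no new child cluster appears
        next_frontier = []
        for cluster in frontier:
            reachable = set()
            for page in cluster.pages:   # pages one link away from the cluster
                reachable |= links.get(page, set())
            for page in sorted((reachable & accessed) - cluster.pages):
                candidate = cluster.pages | {page}
                if candidate in evaluated:   # skip sets already tested elsewhere
                    continue
                evaluated.add(candidate)
                if access_count(candidate, start, sessions) >= threshold:
                    child = Cluster(candidate)
                    cluster.children.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return root
```

Because each child adds exactly one page to its parent, a child cluster's page set always contains its parent's, which is what lets the detection step compare a leaf with the smaller cluster above it. The deduplication of candidate sets is a design choice of this sketch; the claim itself does not address sets reachable from two different parents.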

13. The storage medium storing a hypertext structure change support program according to claim 11, wherein: in the intention shift detection process, for the set of log clusters having no child clusters among the hierarchically structured clusters obtained by the log clustering process, a content cluster included in each cluster of that log cluster set is selected from the set of content clusters obtained by the content clustering process on the basis of the similarity between hypertexts, and, for the parent log cluster of the log cluster associated with the selected content cluster, it is calculated whether each hypertext in the content cluster is included in the parent log cluster; when there is a hypertext that is not included, that hypertext has a predetermined link relationship with the set of hypertexts included in the parent log cluster, and the change from the number of accesses to the parent log cluster to the number of accesses to the corresponding log cluster satisfies a predetermined relationship, the hypertext is detected as an intention shift; the intention shift presentation process presents to the user a hypertext having a predetermined relationship with the detected hypertext; and, in the intention shift detection process and the intention shift presentation process, when a content cluster is included in the parent log cluster, the above processing is repeated, so that the detection and presentation of intention shifts continue until no content cluster remains included in the parent log cluster or a predetermined stop condition is satisfied.