JP5959308B2

JP5959308B2 - ID assigning apparatus, method and program

Info

Publication number: JP5959308B2
Application number: JP2012116893A
Authority: JP
Inventors: 勝本庄; 敦士田上; 長谷川　亨; 亨長谷川
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-05-22
Filing date: 2012-05-22
Publication date: 2016-08-02
Anticipated expiration: 2032-05-22
Also published as: JP2013242804A

Description

The present invention relates to an apparatus, a method, and a program for assigning IDs to websites.

Among the websites published on the Internet, there are personal websites managed by one or more online individuals that an offline individual has set up.
Here, an offline individual is a real user of the network (the Internet), who manages online individuals on the network. An online individual is a virtual user who receives a given group of services through the network. Offline and online individuals stand in a one-to-one or one-to-many relationship.
In recent years, particularly among junior high and high school students, each person often operates several online individuals, creates several personal websites under each of them, and sets up hyperlinks both among their own sites and with other people's sites in order to publish and exchange information and messages.

Techniques for analyzing the link structure of websites that are linked to one another in this way have also been proposed. For example, Patent Document 1 describes analyzing the link structure to determine the boundary of a community.

Patent Document 1: JP 2006-331070 A

The technique of Patent Document 1 extracts the links of a website and recursively collects the linked websites until no links remain, so all websites connected by links are regarded as the same community.

However, although the online individuals described above manage several mutually linked personal websites, these websites contain no information (ID) identifying the online individual they share. Moreover, as noted above, personal websites are also linked to personal websites managed by other online individuals. Consequently, the presence or absence of links alone could not identify the personal websites managed by the same online individual.

One conceivable approach is to assume certain conditions, based on observed cases of how online individuals manage multiple personal websites, and to classify the websites by the online individual who administers them.
However, when a case appears that does not match the assumed conditions, classification accuracy drops.

An object of the present invention is to provide an ID assigning apparatus, method, and program capable of classifying a plurality of personal websites by the online individual who administers them.

The present invention provides the following solutions.

(1) An ID assigning apparatus comprising: a storage unit that stores a plurality of personal websites and link information indicating adjacency relationships by hyperlink among the plurality of personal websites; a calculation unit that calculates, based on the link information, a first index indicating the degree to which two personal websites have adjacent personal websites in common; a first generation unit that generates, based on the first index, a set of clusters each consisting of one or more personal websites; and an assignment unit that assigns to the plurality of personal websites administrator IDs that differ from cluster to cluster and are identical within each cluster.

With this configuration, the ID assigning apparatus generates a set of clusters from a plurality of personal websites based on an index indicating the degree to which two personal websites have adjacent personal websites in common. By assigning a separate administrator ID to each cluster, the apparatus can classify the personal websites by the online individual who administers them. By using a universal index, the apparatus can therefore estimate the personal websites of the same administrator more accurately.
As a result, the ID assigning apparatus makes it easy to obtain information about the personal websites managed by the same online individual, so teachers, guardians, and others can use it to monitor personal websites created by children (particularly junior high and high school students).

(2) The ID assigning apparatus according to (1), wherein the calculation unit calculates the first index by counting each personal website itself among its adjacent personal websites.

With this configuration, the ID assigning apparatus counts each personal website itself among its adjacent personal websites. Personal websites of the same administrator are often adjacent to each other, so this gives extra weight to pairs of adjacent personal websites and lets the apparatus estimate the personal websites of the same administrator more accurately.
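As an illustration of (1) and (2), the following Python sketch computes one possible first index. The passage above does not give a formula, so the Jaccard coefficient of the two neighbor sets is an assumption, as are the function names and the choice to treat a hyperlink in either direction as adjacency.

```python
def neighbors(links, site, include_self=False):
    """Sites adjacent to `site` by a hyperlink in either direction (assumption)."""
    adjacent = {dst for src, dst in links if src == site}
    adjacent |= {src for src, dst in links if dst == site}
    adjacent.discard(site)  # ignore self-links
    if include_self:
        adjacent.add(site)  # variant (2): a site counts as its own neighbor
    return adjacent


def first_index(links, a, b, include_self=False):
    """Jaccard overlap of the two neighbor sets (an assumed formula)."""
    na = neighbors(links, a, include_self)
    nb = neighbors(links, b, include_self)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical sites: "prof" and "gesbu" link to each other and to "friend".
links = [("prof", "gesbu"), ("prof", "friend"), ("gesbu", "friend")]
plain = first_index(links, "prof", "gesbu")
with_self = first_index(links, "prof", "gesbu", include_self=True)
```

In this toy graph the pair scores 1/3 without self-inclusion and 1.0 with it, which shows how counting a site as its own neighbor raises the index for pairs that are directly linked.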

(3) The ID assigning apparatus according to (1) or (2), wherein the first generation unit places in the same cluster any combination of personal websites whose first index is at or above a predetermined value.

With this configuration, the ID assigning apparatus places in the same cluster combinations of personal websites whose first index is at or above a predetermined value, so it can readily generate clusters on the premise that personal websites with sufficiently similar adjacency relationships have the same administrator.
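A minimal sketch of the threshold rule in (3), followed by the ID assignment of (1), might look as follows. It assumes the first index is already available as a dictionary keyed by site pairs, and it merges pairs transitively with a union-find structure. The transitive merging, the function names, and the numbering of IDs from 1 are assumptions made for illustration.

```python
def cluster_by_threshold(sites, index, threshold):
    """Put every pair whose index is >= threshold into the same cluster."""
    parent = {s: s for s in sites}

    def find(s):
        # Follow parent pointers to the cluster representative.
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for (a, b), value in index.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in sites:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())


def assign_ids(clusters):
    """Give each cluster its own ID and every member of a cluster the same ID."""
    return {site: onid
            for onid, cluster in enumerate(clusters, start=1)
            for site in cluster}


# Hypothetical index values for four sites.
sites = ["a", "b", "c", "d"]
index = {("a", "b"): 0.8, ("b", "c"): 0.2, ("c", "d"): 0.6}
clusters = cluster_by_threshold(sites, index, threshold=0.5)
onids = assign_ids(clusters)
```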

(4) The ID assigning apparatus according to (1) or (2), wherein the first generation unit consolidates, according to a predetermined rule, the first indices for combinations of a personal website belonging to a cluster and a personal website outside that cluster into a second index for a combination of two clusters or of a cluster and a personal website, and repeats a process of generating a cluster that merges two personal websites, a personal website and a cluster, or two clusters, based on the first index or the second index, until the first index and the second index fall below a predetermined value.

With this configuration, the ID assigning apparatus consolidates the first index into a second index for combinations of two clusters or of a cluster and a personal website, and can generate clusters that merge these combinations based on the second index. Starting from the first index, the apparatus can thus assign administrator IDs by successively generating and growing clusters that share an administrator.

(5) The ID assigning apparatus according to (4), wherein the first generation unit uses, as the second index, the maximum value of the first index over combinations of a personal website belonging to the cluster and a personal website outside the cluster.

With this configuration, when a cluster is generated, the ID assigning apparatus can easily obtain the second index for a combination involving that cluster as the maximum of the indices for combinations involving the personal websites that belong to it.
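The iterative merging of (4) with the maximum rule of (5) can be sketched as a simple agglomerative loop. This is a naive illustration under stated assumptions, not the patented procedure itself: the first index is given as a dictionary keyed by unordered pairs, missing pairs count as 0, the best-scoring pair is merged first, and all names are hypothetical.

```python
from itertools import combinations


def agglomerate(sites, first_index, threshold):
    """Merge the best-scoring pair of clusters until no score reaches threshold."""
    clusters = [frozenset([s]) for s in sites]

    def second_index(c1, c2):
        # Rule of (5): maximum first index over cross-cluster site pairs.
        return max(first_index.get(frozenset((a, b)), 0.0)
                   for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: second_index(*pair))
        if second_index(c1, c2) < threshold:
            break  # every remaining index is below the predetermined value
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical first-index values.
first_index = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.6,
    frozenset(("c", "d")): 0.3,
}
result = agglomerate(["a", "b", "c", "d"], first_index, threshold=0.5)
```

Here "a" and "b" merge first, "c" then joins them through its 0.6 index with "b", and "d" stays alone because its best index (0.3) is below the threshold.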

(6) The ID assigning apparatus according to any one of (1) to (5), further comprising a second generation unit that generates the clusters by merging combinations of personal websites of predetermined types that stand in a preset input/output relationship, based on a predetermined number of types, representing usage forms, into which the personal websites are classified from their respective URLs, and on the structure of the input/output relationships of the hyperlinks.

With this configuration, the ID assigning apparatus generates clusters by merging combinations of personal websites of predetermined types that stand in a preset input/output relationship. The apparatus can therefore assign administrator IDs more accurately by also taking case-based conditions into account.

(7) The ID assigning apparatus according to any one of (1) to (6), further comprising a third generation unit that generates the clusters by merging personal websites having the same account, based on service provider accounts identifiable from the URLs of the respective personal websites.

With this configuration, the ID assigning apparatus generates clusters by merging personal websites that have the same service provider account, and can therefore assign administrator IDs more accurately.
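The account-based grouping of (7) can be sketched as follows. The text does not say how a provider or account is read from a URL, so the URL layout assumed here (host name as the service provider, first path segment as the account) and the example URLs are purely hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def account_key(url):
    """Return (service provider, account ID) under the assumed URL layout."""
    parts = urlsplit(url)
    account = parts.path.strip("/").split("/")[0]
    return (parts.netloc, account)


def cluster_by_account(urls):
    """Group personal websites that share a provider and an account."""
    groups = defaultdict(set)
    for url in urls:
        groups[account_key(url)].add(url)
    return list(groups.values())


urls = [
    "http://prof.example/alice/profile",
    "http://prof.example/alice/guestbook",
    "http://blog.example/alice/",
]
account_clusters = cluster_by_account(urls)
```

The two sites on `prof.example` fall into one cluster, while the site on `blog.example` stays separate even though the account name is the same, since accounts are only comparable within one service provider.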

(8) The ID assigning apparatus according to any one of (1) to (7), wherein the calculation unit calculates the first index after excluding personal websites whose hyperlink in-degree and out-degree satisfy a predetermined condition, together with the link information relating to those personal websites.

With this configuration, the ID assigning apparatus calculates the index after excluding particular kinds of personal websites, and the hyperlinks adjacent to them, according to a predetermined condition on hyperlink in-degree and out-degree. The apparatus can thereby remove as noise the sites that are not targets of administrator ID assignment, and can assign administrator IDs more accurately.
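The degree-based exclusion of (8) can be sketched as a filter applied before the index is computed. The text leaves the actual condition open, so it is passed in as a predicate. The example predicate (three or more inbound links and no outbound links) and all names are hypothetical.

```python
from collections import Counter


def drop_noisy_sites(links, is_noise):
    """Remove sites whose (in-degree, out-degree) satisfies `is_noise`,
    together with every hyperlink that touches them."""
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    sites = {s for link in links for s in link}
    noisy = {s for s in sites if is_noise(in_deg[s], out_deg[s])}
    kept = [(src, dst) for src, dst in links
            if src not in noisy and dst not in noisy]
    return kept, noisy


links = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "b")]
kept, noisy = drop_noisy_sites(
    links, is_noise=lambda indeg, outdeg: indeg >= 3 and outdeg == 0
)
```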

(9) The ID assigning apparatus according to (8), wherein the in-degree or the out-degree is the number of hyperlinks for which the adjacent personal website is of a predetermined type, representing a usage form, into which it is classified from its URL.

With this configuration, the ID assigning apparatus can use site types to remove more reliably, as noise, the sites that are not targets of administrator ID assignment.

(10) The ID assigning apparatus according to any one of (1) to (9), wherein the calculation unit calculates the first index after excluding, from the link information, hyperlinks that arose outside a predetermined period.

With this configuration, the ID assigning apparatus calculates the index after excluding hyperlinks that arose outside a predetermined period, so it can improve accuracy by restricting the analysis to recent information, a particular period, or the like, while also reducing the processing load.

(11) The ID assigning apparatus according to any one of (1) to (10), wherein, when the number of hyperlinks in the link information that arose between the same pair of personal websites during a predetermined period is less than a predetermined number, the calculation unit calculates the first index after excluding those hyperlinks.

With this configuration, when the number of hyperlinks that arose between the same pair of personal websites during a predetermined period is less than a predetermined number, the ID assigning apparatus calculates the index after excluding those hyperlinks. Because noise is removed and only hyperlinks that connect sites with at least a given strength are considered, the apparatus can judge more reliably whether administrators are the same.

(12) The ID assigning apparatus according to (11), wherein the predetermined number is set for each type, representing a usage form, into which the personal website at the link source of the hyperlink is classified from its URL.

With this configuration, the ID assigning apparatus sets the hyperlink strength threshold for each link-source type, so it can remove noise more reliably in keeping with the way hyperlink occurrence patterns differ by usage form.
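The strength filter of (11) with the per-type thresholds of (12) can be sketched as follows. The sketch assumes the list of hyperlinks has already been restricted to the period of interest, with one `(source, destination)` entry per hyperlink occurrence. The threshold values and site names are hypothetical.

```python
from collections import Counter


def drop_weak_links(links, source_class, min_links):
    """Keep a hyperlink only if its site pair has at least the number of
    hyperlinks required for the class of the link-source site."""
    strength = Counter(links)  # (src, dst) -> number of hyperlinks
    return [(src, dst) for src, dst in links
            if strength[(src, dst)] >= min_links[source_class[src]]]


links = ([("blog1", "prof")]          # one comment link
         + [("gesbu", "prof")] * 3    # three guestbook links
         + [("blog2", "prof")] * 2)   # two comment links
source_class = {"blog1": "C", "blog2": "C", "gesbu": "B"}
min_links = {"B": 3, "C": 2}          # hypothetical per-class thresholds
strong = drop_weak_links(links, source_class, min_links)
```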

(13) An ID assigning method in which a computer assigns an administrator ID to each of a plurality of personal websites, wherein the computer stores the plurality of personal websites and link information indicating adjacency relationships by hyperlink among the plurality of personal websites, and executes: a calculation step of calculating, based on the link information, a first index indicating the degree to which two personal websites have adjacent personal websites in common; a generation step of generating, based on the first index, a set of clusters each consisting of one or more personal websites; and an assignment step of assigning to the plurality of personal websites administrator IDs that differ from cluster to cluster and are identical within each cluster.

With this configuration, the same effect as in (1) can be expected when the computer executes the ID assigning method.

(14) An ID assigning program for causing a computer to assign an administrator ID to each of a plurality of personal websites, wherein the computer stores the plurality of personal websites and link information indicating adjacency relationships by hyperlink among the plurality of personal websites, the program causing the computer to execute: a calculation step of calculating, based on the link information, a first index indicating the degree to which two personal websites have adjacent personal websites in common; a generation step of generating, based on the first index, a set of clusters each consisting of one or more personal websites; and an assignment step of assigning to the plurality of personal websites administrator IDs that differ from cluster to cluster and are identical within each cluster.

With this configuration, the same effect as in (1) can be expected by causing a computer to execute the ID assigning program.

According to the present invention, a plurality of personal websites can be classified by the online individual who administers them.

A diagram showing the relationship between personal websites and their administrators according to the embodiment.
A schematic diagram showing the result of assigning OnIDs according to the embodiment.
A block diagram showing the functional configuration of the ID assigning apparatus according to the embodiment.
A diagram showing the collection history management table according to the embodiment.
A diagram showing the inter-site relation table according to the embodiment.
A diagram showing an example of calculating the Similarity between personal websites according to the embodiment.
A diagram showing an example of calculating the Similarity between clusters according to the embodiment.
A flowchart showing the processing in the control unit according to the embodiment.
A first diagram showing the procedure of the ID assignment process according to the embodiment.
A second diagram showing the procedure of the ID assignment process according to the embodiment.
A third diagram showing the procedure of the ID assignment process according to the embodiment.
A fourth diagram showing the procedure of the ID assignment process according to the embodiment.
A fifth diagram showing the procedure of the ID assignment process according to the embodiment.

An example of an embodiment of the present invention is described below.
The ID assigning apparatus 1 according to the present embodiment assigns, to each personal website managed by an online individual, an online ID that identifies that online individual. The ID assigning apparatus 1 may be any of various information processing devices (computers), such as a server or a PC (personal computer).

FIG. 1 is a diagram showing the relationship between personal websites and their administrators according to the present embodiment.
An offline individual, who is a real person, manages one or more online individuals on the network (the Internet). Each online individual in turn manages one or more personal websites.

Each online individual has some relationship with other people online, for example as a student at the same school or a member of a group with a shared hobby. As a result, personal websites managed by different online individuals often reference one another through hyperlinks.

Here, a personal website is a website on which an online individual publishes information about themselves or exchanges messages with other people online. For example, each of the following types of personal website is offered by several service providers.

Prof (profile): a site for publishing a personal profile.
Gesbu (guestbook): a site where visitors can post comments as a record of their visit.
Real (real time): a site for posting short messages about one's current situation.
Blog: a site for publishing a diary that is updated daily.
My Link: a site for posting links to other people's personal websites.
Hompe (homepage): a personal site.

Because online individuals create these types of personal websites under different accounts at each service provider, the sites are often not tied together (reconciled) under a single ID. For example, the online individual with online ID (OnID) = 1 manages a Prof, a Gesbu, and a Real. These personal websites carry no information indicating "OnID = 1" and are managed under different account IDs (1 and 2).

Through the processing described later, the ID assigning apparatus 1 assigns the same online ID (OnID) to personal websites administered by the same online individual, thereby classifying the personal websites by online individual.

FIG. 2 is a schematic diagram showing the result of online ID assignment by the ID assigning apparatus 1 according to the present embodiment.

In the present embodiment, the personal websites targeted for ID assignment are classified into the following three classes (class A, class B, and class C), which represent how a site is used. These classes and the types listed above can be determined from the URL of the personal website.

Class A (Prof, Hompe): a personal website that an online individual creates to distinguish themselves from others.
Class B (Gesbu, My Link): a personal website that an online individual creates as an adjunct to a class A site.
Class C (Real, Blog): a personal website that an online individual creates to distinguish themselves from others, either on its own or as an adjunct to a class A site.

A single personal website may have several types that include a class A type (for example, Prof and My Link). In that case, the class determined from the URL is class A.
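The mapping from types to classes, including the rule that class A takes precedence, can be sketched as below. How the types are read from a URL is not described at this point, so the sketch starts from an already determined list of types. The lowercase type keys are hypothetical labels, and combinations that the text does not cover raise an error rather than guess.

```python
# Class of each site type, as listed above (keys are hypothetical labels).
CLASS_OF_TYPE = {
    "prof": "A", "hompe": "A",
    "gesbu": "B", "mylink": "B",
    "real": "C", "blog": "C",
}


def site_class(types):
    """Return the class of a site that has one or more types."""
    classes = {CLASS_OF_TYPE[t] for t in types}
    if "A" in classes:
        return "A"  # a class A type takes precedence
    if len(classes) == 1:
        return classes.pop()
    raise ValueError("class for this combination of types is not specified")
```

For example, `site_class(["prof", "mylink"])` yields `"A"`, matching the Prof and My Link case described above.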

FIG. 3 is a block diagram showing the functional configuration of the ID assigning apparatus 1 according to the present embodiment.
The ID assigning apparatus 1 includes a control unit 10, a storage unit 20, a communication unit 30, an input unit 40, and an output unit 50.

The control unit 10 controls the ID assigning apparatus 1 as a whole. By reading and executing the various programs stored in the storage unit 20 as needed, it works together with the hardware to realize the functions of the present embodiment. The control unit 10 may be a CPU (central processing unit). The functions of the individual units of the control unit 10 are described later.

The storage unit 20 is a storage area for the programs and data that make the hardware function as the ID assigning apparatus 1, and may be a hard disk drive (HDD). Specifically, the storage unit 20 stores the program (the ID assigning program) that the control unit 10 executes to realize the functions of the present embodiment.

The storage unit 20 further includes a site storage DB 21 and a site management DB 22. The site storage DB 21 stores the page data (sets of HTML files) of the personal websites acquired by the program. The site management DB 22 stores the collection history management table and the inter-site relation table, described later, which the program creates or edits.

The communication unit 30 is a network adapter through which the ID assigning apparatus 1 exchanges information with other devices. It accesses the server 100 that manages the personal websites via the network (the Internet), acquires the page data of the personal websites, and passes it to the control unit 10.

The input unit 40 is an interface device that accepts instructions from the user to the ID assigning apparatus 1. It consists of, for example, a keyboard, a mouse, and a touch panel.

The output unit 50 includes a display device that shows the user screens for accepting data input and screens presenting the results of processing by the ID assigning apparatus 1. In addition to display devices such as a cathode ray tube (CRT) or liquid crystal display (LCD), the output unit 50 may include other output devices such as a printer.

Next, the functions of the control unit 10 are described in detail.
First, the symbols used in the present embodiment are explained.
- Network of personal websites (directed graph): G = (V, E)
• Personal website (node): v_i ∈ V (i = 1, 2, ..., |V|)
Number of personal websites (size of the graph): |V|
• Hyperlink from v_i to v_j: (v_i, v_j) ∈ E (v_i, v_j ∈ V)
Number of hyperlinks: |E|
- Attributes of a personal website
• URL: URL(v_i)
• Type: TYPE(v_i)
• Class: C(v_i)
• Service provider: SP(v_i)
• Account ID: ACID(v_i)
• Administrator ID (OnID): ONID(v_i)
- Attributes of a hyperlink
• Strength of a hyperlink: |(v_i, v_j)|
Specifically, the number of hyperlinks from v_i to v_j.
• Strength of a hyperlink between times t0 and t1: |(v_i, v_j)|_{t0→t1}
The number of hyperlinks within the specified period (from time t0 to t1) among hyperlinks that carry time information, such as those created when comments are posted on a Blog, Real, or Gesbu. Hyperlinks that carry no time information, such as those appearing on a Prof, Hompe, or My Link, are regarded as present throughout the entire period.
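The two hyperlink strengths defined above can be illustrated with a short Python sketch. The `Hyperlink` record, the use of plain numbers for times, and the inclusive treatment of both ends of the period are assumptions made for illustration. The handling of links without time information follows the rule stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hyperlink:
    src: str                      # URL of the link-source site v_i
    dst: str                      # URL of the link-destination site v_j
    time: Optional[float] = None  # None: the link carries no time information


def strength(links, src, dst, t0=None, t1=None):
    """|(v_i, v_j)|, or |(v_i, v_j)|_{t0->t1} when both t0 and t1 are given."""
    count = 0
    for link in links:
        if (link.src, link.dst) != (src, dst):
            continue
        untimed = link.time is None  # regarded as present in every period
        if t0 is None or t1 is None or untimed or t0 <= link.time <= t1:
            count += 1
    return count


links = [
    Hyperlink("blog", "prof", time=10),
    Hyperlink("blog", "prof", time=50),
    Hyperlink("blog", "prof"),  # untimed link
]
total = strength(links, "blog", "prof")
windowed = strength(links, "blog", "prof", t0=0, t1=20)
```

Here `total` counts all three links, while `windowed` counts the link at time 10 plus the untimed link.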

The control unit 10 includes a site collection unit 11 (collection unit), a metric calculation unit 12 (calculation unit), an account cluster generation unit 13a (third generation unit), a type cluster generation unit 13b (second generation unit), a metric cluster generation unit 13c (first generation unit), an ID assignment unit 14 (assignment unit), and an application unit 15. Each unit is a functional block realized by executing the ID assigning program.

The site collection unit 11 extracts the hyperlinks contained in a personal website and acquires the page data of the other personal websites those hyperlinks point to. It stores the page data in the site storage DB 21 and updates the site management DB 22 by adding entries to the list of personal websites (the collection history management table) and to the list of link information indicating the adjacency and input/output relationships by hyperlink among the personal websites (the inter-site relation table).

For example, the site collection unit 11 first accepts the URL of the root personal website from which collection starts, together with a specification of the collection range, such as how many hyperlinks (<a> links in HTML) to follow from the root personal website (the number of link hops).

次に、サイト収集部１１は、インターネットにアクセスし、ルートの個人ウェブサイトのページデータをサイト保存ＤＢ２１に記憶する。さらに、このルートの個人ウェブサイトのＵＲＬを、収集履歴管理テーブルに追加する。 Next, the site collection unit 11 accesses the Internet and stores the page data of the root personal website in the site storage DB 21. Further, the URL of the personal website of this route is added to the collection history management table.

また、サイト収集部１１は、個人ウェブサイトからハイパーリンクを取得し、リンク先が個人ウェブサイトでないものを除いて、リンク元のＵＲＬとリンク先のＵＲＬとの組合せをサイト間リレーションテーブルに追加する。 Further, the site collection unit 11 acquires a hyperlink from the personal website, and adds a combination of the link source URL and the link destination URL to the inter-site relation table except for a link destination that is not a personal website. .

さらに、サイト収集部１１は、リンク先のＵＲＬから取得した個人ウェブサイトのページデータをサイト保存ＤＢ２１に、このＵＲＬを収集履歴管理テーブルにそれぞれ記憶する。そして、サイト収集部１１は、指定された収集範囲まで、全てのＵＲＬが収集履歴管理テーブルに記憶されると収集処理を終了する。また、サイト収集部１１は、指定された収集範囲までＵＲＬを収集できていない場合には、ハイパーリンクの取得、サイト間リレーションテーブルの更新、ページデータの記憶及び収集履歴管理テーブルの更新を繰り返す。 Further, the site collection unit 11 stores the page data of the personal website acquired from the link destination URL in the site storage DB 21 and this URL in the collection history management table. Then, the site collection unit 11 ends the collection process when all URLs are stored in the collection history management table up to the designated collection range. Further, when the URL has not been collected to the designated collection range, the site collection unit 11 repeats the acquisition of the hyperlink, the update of the inter-site relation table, the storage of the page data, and the update of the collection history management table.
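The collection loop described above can be sketched as a breadth-first traversal bounded by the number of link hops. The following Python sketch is illustrative only: `collect_sites`, `fetch_links`, `is_personal_site`, and the `sp.example` URLs are hypothetical names that do not appear in this description, and plain in-memory structures stand in for the site storage DB 21 and the two tables of the site management DB 22.

```python
from collections import deque


def collect_sites(root_url, max_hops, fetch_links, is_personal_site):
    """Collect personal websites reachable within max_hops link hops of root_url.

    fetch_links(url) returns the link-destination URLs found on a site and
    is_personal_site(url) returns True for personal websites; both are
    hypothetical helpers supplied by the caller.
    Returns (history, relations): history maps each collected URL to its
    site hop count, relations lists (link-source URL, link-destination URL).
    """
    history = {root_url: 0}      # stands in for the collection history management table
    relations = []               # stands in for the inter-site relation table
    queue = deque([root_url])
    while queue:
        src = queue.popleft()
        hops = history[src]
        if hops >= max_hops:
            continue             # sites at the hop limit are stored, but their links are not followed
        for dst in fetch_links(src):
            if not is_personal_site(dst):
                continue         # links to non-personal websites are excluded
            relations.append((src, dst))
            if dst not in history:
                history[dst] = hops + 1
                queue.append(dst)
    return history, relations


# Toy link structure used in place of real page fetches (assumed data).
TOY_WEB = {
    "http://sp.example/a": ["http://sp.example/b", "http://ads.example/x"],
    "http://sp.example/b": ["http://sp.example/c", "http://sp.example/a"],
    "http://sp.example/c": ["http://sp.example/d"],
    "http://sp.example/d": [],
}

history, relations = collect_sites(
    "http://sp.example/a",
    2,
    fetch_links=lambda url: TOY_WEB.get(url, []),
    is_personal_site=lambda url: url.startswith("http://sp.example/"),
)
```

With a limit of two link hops, the sketch collects sites a, b, and c and never reaches d. A real collector would also have to store page data and timestamps, which are omitted here.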

FIG. 4 is a diagram showing the collection history management table stored in the site management DB 22 according to the present embodiment.

The collection history management table stores a collection ID, root URL, link-source URL, personal website URL, personal website class, storage destination, number of site hops, administrator ID (OnID), and collection date and time.

Here, the collection ID is an identification number assigned to each run of the collection process described above. The root URL is the URL of the personal website designated as the starting point of the collection process. The storage destination is a URL indicating where the target personal website is stored in the site storage DB 21. The OnID is an ID that identifies an online individual and is assigned by the ID assigning unit 14 described later.

The number of site hops is the number of hops from the root node, where the designated personal website is the root node and each hyperlink between adjacent linked personal websites counts as one hop.

FIG. 5 is a diagram showing the inter-site relation table stored in the site management DB 22 according to the present embodiment.

The inter-site relation table stores the collection ID; the URL, type, class, service provider identification data (SP), and account ID at the service provider (ACID) of both the link-source and the link-destination personal website; the date and time the hyperlink occurred; and the collection date and time.

The SP and the ACID are identified from the URL of the personal website. Within the same SP, websites with the same ACID can be regarded as having the same administrator (online individual). Across different SPs, however, the same ACID does not necessarily mean the same administrator.

The occurrence date and time of a hyperlink is the date and time at which the hyperlink was written, for example when a comment was posted to a personal website. For a hyperlink with no associated time information, such as one that has existed since the personal website was created, the occurrence date and time cannot be determined; such a hyperlink is treated as having occurred throughout the entire period in the processing described later.

The metric calculation unit 12 calculates, based on the link information in the inter-site relation table, a metric (first index) indicating the degree to which two personal websites share adjacent personal websites.

Specifically, one of the following two types of similarity (Similarity) is used as the metric for v_i and v_j.
(1) Similarity = |Γ(v_i) ∩ Γ(v_j)| / |V|
where
Γ(v_i) = {w | (v_i, w) ∈ E or (w, v_i) ∈ E}
is the set of personal websites adjacent to v_i (adjacent nodes), irrespective of the direction of the hyperlink, and
|Γ(v_i) ∩ Γ(v_j)|
is the number of common adjacent nodes.

(2) Similarity = |Γ⁺(v_i) ∩ Γ⁺(v_j)| / |V|
where
Γ⁺(v_i) = {w | (v_i, w) ∈ E or (w, v_i) ∈ E} ∪ {v_i}
is, like Γ(v_i), the set of adjacent nodes, but also includes the node itself (v_i), and
|Γ⁺(v_i) ∩ Γ⁺(v_j)|
is the number of common adjacent nodes.

FIG. 6 is a diagram illustrating an example of calculating Similarity between personal websites according to the present embodiment.
Seven nodes are adjacent to one another through hyperlinks as shown in the figure. With calculation method (1), the adjacent nodes of node v_1 and node v_2 are
Γ(v_1) = {v_2, v_3, v_4, v_5, v_6}
Γ(v_2) = {v_1, v_3, v_4, v_5}
The common adjacent nodes are then
Γ(v_1) ∩ Γ(v_2) = {v_3, v_4, v_5}
and therefore
Similarity = 3/7

With calculation method (2), the adjacent nodes of node v_1 and node v_2 are
Γ⁺(v_1) = {v_1, v_2, v_3, v_4, v_5, v_6}
Γ⁺(v_2) = {v_1, v_2, v_3, v_4, v_5}
The common adjacent nodes are then
Γ⁺(v_1) ∩ Γ⁺(v_2) = {v_1, v_2, v_3, v_4, v_5}
and therefore
Similarity = 5/7
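As a check on the two worked examples, the following Python sketch computes the adjacent-node sets and both Similarity variants. It is a minimal illustration, not the implementation of the metric calculation unit 12: the function names are hypothetical, and the edge list is the set E given for FIG. 9 in this description, which is assumed to describe the same graph as FIG. 6 because it yields the adjacent-node sets of v_1 and v_2 stated above.

```python
from fractions import Fraction

# Directed hyperlinks (link source, link destination) among v_1..v_7,
# taken from the set E listed for FIG. 9 (assumed to match FIG. 6).
EDGES = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 4), (4, 1), (4, 5),
         (5, 1), (6, 1), (6, 7), (7, 6)]
NUM_NODES = 7  # |V|


def neighbors(v, edges, include_self=False):
    """Adjacent nodes of v regardless of link direction; includes v itself when include_self is True."""
    adjacent = {dst for src, dst in edges if src == v}
    adjacent |= {src for src, dst in edges if dst == v}
    if include_self:
        adjacent.add(v)
    return adjacent


def similarity(vi, vj, edges, num_nodes, include_self=False):
    """Number of common adjacent nodes divided by |V| (method (1), or method (2) with include_self)."""
    common = neighbors(vi, edges, include_self) & neighbors(vj, edges, include_self)
    return Fraction(len(common), num_nodes)


sim_method_1 = similarity(1, 2, EDGES, NUM_NODES)
sim_method_2 = similarity(1, 2, EDGES, NUM_NODES, include_self=True)
print(sim_method_1, sim_method_2)  # 3/7 5/7
```

Exact fractions are used so that the results can be compared directly with the values in the text.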

Both calculation methods rest on the observation that personal websites of the same administrator tend to share many adjacent nodes.
The following description uses calculation method (2), which includes the node itself.

The metric calculation unit 12 calculates Similarity after excluding any personal website whose hyperlink in-degree and out-degree satisfy a predetermined condition, together with the link information relating to that personal website.
Here, the in-degree or out-degree may be restricted to the number of hyperlinks whose adjacent personal website is of a predetermined class or type.

The metric calculation unit 12 also calculates Similarity after excluding, from the link information, hyperlinks that occurred outside a predetermined period (for example, the past year).
Further, when the strength of the hyperlinks that occurred between the same pair of personal websites within the predetermined period is less than a predetermined value, the metric calculation unit 12 excludes those hyperlinks from the link information before calculating Similarity. The predetermined value is set for each class or type of the link source of the hyperlink.

The account cluster generation unit 13a generates clusters by joining personal websites that have the same service provider account ID (ACID), which can be identified from the URL of each personal website.

The type cluster generation unit 13b generates clusters based on the class of each personal website and the structure of the hyperlink input/output relationships, by joining combinations of personal websites of predetermined classes that stand in a preset input/output relationship.

For example, when there is a hyperlink from a class A site to a class B site, the type cluster generation unit 13b joins the link-source and link-destination personal websites of that hyperlink into a cluster, based on the following empirically derived hypotheses (A) and (B).
(A) An administrator does not create hyperlinks from a class A personal website to another person's personal website.
(B) An administrator does not create hyperlinks from his or her own personal website to another person's class B personal website.

Based on the metric, the metric cluster generation unit 13c generates a set of clusters, each consisting of one or more personal websites, by placing combinations of personal websites whose Similarity is at or above a predetermined value in the same cluster.

Specifically, the metric cluster generation unit 13c consolidates, according to a predetermined rule, the Similarity values (first index) for combinations of a personal website belonging to a cluster and a personal website outside that cluster into a Similarity value (second index) for a combination of two clusters or of a cluster and a personal website. The metric cluster generation unit 13c then joins pairs of personal websites, a personal website and a cluster, or two clusters into a cluster in descending order of the first index or the second index, and repeats this process until the first index and the second index all fall below a predetermined value.

In the present embodiment, the second index is the maximum of the first index values over the combinations of a personal website belonging to the cluster and a personal website outside the cluster.

FIG. 7 is a diagram illustrating an example of calculating Similarity between clusters according to the present embodiment.
The Similarity (second index) for the combination of cluster CL(v_1, v_2), consisting of nodes v_1 and v_2, and cluster CL(v_3, v_4), consisting of nodes v_3 and v_4, is obtained from the Similarity (first index) between the individual nodes.

Here, the Similarity values between the nodes are
|Γ⁺(v_1) ∩ Γ⁺(v_3)| / |V| = 4/7
|Γ⁺(v_2) ∩ Γ⁺(v_3)| / |V| = 4/7
|Γ⁺(v_1) ∩ Γ⁺(v_4)| / |V| = 5/7
|Γ⁺(v_2) ∩ Γ⁺(v_4)| / |V| = 5/7
Taking the maximum of these values, the Similarity between the clusters is
Similarity = 5/7
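The rule of taking the maximum over all cross-cluster node pairs can be sketched in Python as follows. This is only an illustration of the rule stated for the present embodiment: `cluster_similarity` and `PAIRWISE` are hypothetical names, and the pairwise values are copied from the FIG. 7 example rather than recomputed.

```python
from fractions import Fraction
from itertools import product

# Node-to-node Similarity values (first index) quoted in the FIG. 7 example.
PAIRWISE = {
    frozenset({1, 3}): Fraction(4, 7),
    frozenset({2, 3}): Fraction(4, 7),
    frozenset({1, 4}): Fraction(5, 7),
    frozenset({2, 4}): Fraction(5, 7),
}


def cluster_similarity(cluster_a, cluster_b, pairwise):
    """Second index: the largest first-index value over all pairs (a, b) with a in cluster_a, b in cluster_b."""
    return max(pairwise[frozenset({a, b})] for a, b in product(cluster_a, cluster_b))


result = cluster_similarity({1, 2}, {3, 4}, PAIRWISE)
print(result)  # 5/7
```

A single node can be passed as a one-element set, so the same function covers the cluster-to-website case.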

The ID assigning unit 14 assigns administrator IDs (OnIDs) to the personal websites such that the OnID is the same within a cluster and differs between clusters, and updates the collection history management table.

The application unit 15 accepts instructions from the user via the input unit 40 and outputs the information accumulated in the site management DB 22 to the output unit 50 for the user. Specifically, based on the collection history management table and the inter-site relation table, the application unit 15 visualizes and outputs the hyperlinks between personal websites or between clusters, along with their strength. This makes the network of online individuals who act as administrators visible.

Next, the procedure for assigning OnIDs is described in detail.
FIG. 8 is a flowchart showing the processing in the control unit 10 according to the present embodiment.
It is assumed that the personal websites in the range to be processed have already been collected and stored in the site management DB 22, and that the OnID field of the collection history management table is still blank.

In step S1, the control unit 10 acquires the data of G(V, E) from the site management DB 22.

In step S2, the control unit 10 (metric calculation unit 12) removes from G(V, E) acquired in step S1 every node whose in-degree is AD_Deg_{in} or more and whose out-degree is AD_Deg_{out} or less, together with the links incident to it.

This step removes sites, such as advertisement sites, that are the link destination of many hyperlinks from unspecified parties but are the source of almost no hyperlinks in the reverse direction. Values such as "AD_Deg_{in} = 1000" and "AD_Deg_{out} = 5" are used, for example. The link source may be limited to class B or class C.

Sites that publish poems, lyrics, and the like are referenced by many class A personal websites of unspecified parties, so values such as "AD_Deg_{in} = 500" and "AD_Deg_{out} = 10" are used, for example. The link source may be limited to class A.

In step S3, the control unit 10 (metric calculation unit 12) removes from G(V, E) acquired in step S1 every node whose out-degree is CELEB_Deg_{out} or more, together with the links incident to it. The condition that the in-degree is CELEB_Deg_{in} or less may be added.

A celebrity's personal website, particularly a class B or class C one, receives a large number of comments from unspecified parties, which produces many hyperlinks of which the website itself is the link source. Values such as "CELEB_Deg_{out} = 10000" and "CELEB_Deg_{in} = 100" are therefore used, for example. The link source may be limited to class B or class C.
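Steps S2 and S3 can be sketched in Python as a single degree-based filter. The sketch rests on several assumptions that the description does not settle: degrees are counted once on the graph acquired in step S1, the optional CELEB_Deg_{in} condition and the optional class restrictions are omitted, and the function name, node names, and scaled-down thresholds are hypothetical.

```python
from collections import Counter


def remove_noise_nodes(nodes, edges, ad_deg_in, ad_deg_out, celeb_deg_out):
    """Drop advertisement-like and celebrity-like nodes and every link incident to them."""
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)
    removed = {
        v for v in nodes
        if (in_deg[v] >= ad_deg_in and out_deg[v] <= ad_deg_out)  # step S2 condition
        or out_deg[v] >= celeb_deg_out                             # step S3 condition
    }
    kept_nodes = [v for v in nodes if v not in removed]
    kept_edges = [(s, d) for s, d in edges if s not in removed and d not in removed]
    return kept_nodes, kept_edges


# Toy graph: "ad" is linked to by everyone, "celeb" links out to everyone.
nodes = ["ad", "celeb", "a", "b", "c"]
edges = [("a", "ad"), ("b", "ad"), ("c", "ad"),
         ("celeb", "a"), ("celeb", "b"), ("celeb", "c"),
         ("a", "b"), ("b", "a")]

kept_nodes, kept_edges = remove_noise_nodes(
    nodes, edges, ad_deg_in=3, ad_deg_out=0, celeb_deg_out=3)
```

In this toy graph only the mutual links between a and b survive. With the example values from the text, the thresholds would be 1000 and 5 for step S2 and 10000 for step S3.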

In step S4, the control unit 10 (metric calculation unit 12) removes from G(V, E) acquired in step S1 every link whose hyperlink strength over the specified period (for example, the past year) is at or below Strength. Strength is set in advance for each class of link source.
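A possible reading of step S4 is sketched below in Python. The assumptions are stated plainly: strength is taken to be the number of hyperlinks from one site to another inside the period, counted per direction; hyperlinks without time information are counted as falling inside every period; and links whose strength is at or below the per-class threshold are dropped, following the wording of step S4. The function name, site names, dates, and thresholds are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def filter_by_strength(links, t0, t1, strength_by_class, source_class):
    """Return the (source, destination) pairs whose strength within [t0, t1] exceeds the threshold.

    links is a list of (source, destination, occurred_at); occurred_at is None
    for hyperlinks that carry no time information.
    """
    in_period = [
        (src, dst) for src, dst, occurred_at in links
        if occurred_at is None or t0 <= occurred_at <= t1
    ]
    strength = Counter(in_period)  # number of hyperlinks per directed pair
    return sorted(
        pair for pair, count in strength.items()
        if count > strength_by_class[source_class[pair[0]]]
    )


links = [
    ("a", "b", datetime(2012, 1, 10)),
    ("a", "b", datetime(2012, 2, 1)),
    ("a", "b", datetime(2010, 1, 1)),   # outside the period, ignored
    ("b", "a", None),                   # no time information, always counted
    ("c", "a", datetime(2012, 3, 1)),
]
source_class = {"a": "B", "b": "A", "c": "B"}
strength_by_class = {"A": 0, "B": 1}    # hypothetical per-class Strength values

strong_links = filter_by_strength(
    links, datetime(2011, 5, 22), datetime(2012, 5, 22),
    strength_by_class, source_class)
```

Here the link from a to b is kept with strength 2, the untimed link from b to a is kept, and the single link from c to a is removed.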

In step S5, the control unit 10 (metric calculation unit 12) calculates Similarity for every pair of nodes in G(V, E).

In step S6, when nodes share the account ID (ACID) identified from the URLs of their personal websites, the control unit 10 (account cluster generation unit 13a) generates a new cluster containing those nodes. If one of the nodes already belongs to a cluster, the control unit 10 (account cluster generation unit 13a) generates a new cluster that contains all nodes belonging to that cluster.

In step S7, when there is a hyperlink from class A to class B, the control unit 10 (type cluster generation unit 13b) generates a new cluster containing the nodes at both ends of the hyperlink. If one of the nodes already belongs to a cluster, the control unit 10 (type cluster generation unit 13b) generates a new cluster that contains all nodes belonging to that cluster.
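Steps S6 and S7 both merge nodes transitively, which a union-find structure handles naturally. The Python sketch below is one way to do it, not the method of the embodiment: union-find is a substituted technique, `seed_clusters` is a hypothetical name, and the account data is invented for illustration. Accounts are keyed by the pair (SP, ACID) because the same ACID at different service providers does not imply the same administrator. The graph and classes are those listed for FIG. 9 in this description.

```python
def seed_clusters(nodes, edges, account_of, class_of):
    """Merge nodes sharing an (SP, ACID) account, then both ends of each class A -> class B link."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}
    for v in nodes:                        # step S6: same account
        key = account_of.get(v)
        if key is None:
            continue
        if key in first_seen:
            union(first_seen[key], v)
        else:
            first_seen[key] = v

    for src, dst in edges:                 # step S7: class A -> class B hyperlink
        if class_of[src] == "A" and class_of[dst] == "B":
            union(src, dst)

    clusters = {}
    for v in nodes:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())


NODES = list(range(1, 8))
EDGES = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 4), (4, 1), (4, 5),
         (5, 1), (6, 1), (6, 7), (7, 6)]
CLASS_OF = {1: "A", 7: "A", 2: "B", 6: "B", 3: "C", 4: "C", 5: "C"}

# Without account information only the class rule applies.
by_class_only = seed_clusters(NODES, EDGES, {}, CLASS_OF)

# Hypothetical accounts: v_3 and v_5 share an account at sp1; v_4 has the same ACID at sp2.
ACCOUNTS = {3: ("sp1", "u1"), 5: ("sp1", "u1"), 4: ("sp2", "u1")}
with_accounts = seed_clusters(NODES, EDGES, ACCOUNTS, CLASS_OF)
```

In the second call v_3 and v_5 are merged, while v_4 stays separate because its account belongs to a different service provider.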

In step S8, the control unit 10 (metric cluster generation unit 13c) joins the pair with the highest Similarity among all pairs of nodes and clusters, generating a new cluster.

In step S9, the control unit 10 (metric cluster generation unit 13c) determines whether the Similarity of every pair of nodes and clusters is below the threshold Th. If the determination is YES, the process proceeds to step S10; if NO, the process returns to step S8 and cluster generation is repeated.

In step S10, the control unit 10 (ID assigning unit 14) assigns a unique OnID to each remaining node and each cluster.

The values of Th, AD_Deg_{in}, AD_Deg_{out}, CELEB_Deg_{out}, CELEB_Deg_{in}, the period, and Strength are received from the user in advance via the input unit 40.

FIGS. 9 to 13 are diagrams showing the procedure of the OnID assignment process according to the present embodiment.
Assume that seven personal websites (nodes) are linked by hyperlinks as shown in FIG. 9.
V = {v_1, v_2, v_3, v_4, v_5, v_6, v_7}
where v_1, v_7: class A; v_2, v_6: class B; v_3, v_4, v_5: class C
E = {(v_1, v_2), (v_2, v_3), (v_2, v_4), (v_2, v_5), (v_3, v_1), (v_3, v_4), (v_4, v_1), (v_4, v_5), (v_5, v_1), (v_6, v_1), (v_6, v_7), (v_7, v_6)}

The Similarity of each of the 21 node pairs is then calculated, as shown in the table, to lie in the range from "0/7" to "5/7".
In the following, the threshold for cluster generation is assumed to be "Th = 5/7".

First, as shown in FIG. 10, for the hyperlink (v_1, v_2) from class A to class B, the cluster CL(v_1, v_2) joining the nodes at both ends is generated.
In the Similarity table, nodes v_1 and v_2 are deleted and replaced by the cluster CL(v_1, v_2).

The Similarity for CL(v_1, v_2) is taken to be the maximum of the Similarity values for v_1 and v_2. For example, the pair of CL(v_1, v_2) and v_3 has Similarity "4/7", and the pair of CL(v_1, v_2) and v_6 has Similarity "2/7", the maximum of "2/7" and "1/7".

Next, as shown in FIG. 11, for the hyperlink (v_7, v_6) from class A to class B, the cluster CL(v_6, v_7) joining the nodes at both ends is generated.
In the Similarity table, nodes v_6 and v_7 are deleted and replaced by the cluster CL(v_6, v_7).

Then, in FIG. 11, the pair with the highest Similarity, CL(v_1, v_2) and v_4, is joined, generating the new cluster CL(v_1, v_2, v_4) as shown in FIG. 12.

As a result, every Similarity value in FIG. 12 is below the threshold Th = 5/7, so cluster generation ends.
Then, as shown in FIG. 13, four OnIDs are assigned in total, one to each remaining node and each cluster.
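The walkthrough of FIGS. 9 to 13 can be reproduced with the short Python sketch below. It is a simplified illustration under stated assumptions, not the embodiment's implementation: the noise-removal steps S2 to S4 and the account-based step S6 are skipped because the example has no such data, the threshold is tested before each merge rather than after it, OnIDs are numbered by the smallest node index in each cluster, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

NODES = list(range(1, 8))
EDGES = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 4), (4, 1), (4, 5),
         (5, 1), (6, 1), (6, 7), (7, 6)]
CLASS_OF = {1: "A", 7: "A", 2: "B", 6: "B", 3: "C", 4: "C", 5: "C"}
TH = Fraction(5, 7)


def gamma_plus(v, edges):
    """Adjacent nodes of v regardless of link direction, plus v itself."""
    return {v} | {d for s, d in edges if s == v} | {s for s, d in edges if d == v}


def node_similarity(vi, vj, edges, n):
    """First index, calculation method (2)."""
    return Fraction(len(gamma_plus(vi, edges) & gamma_plus(vj, edges)), n)


def cluster_similarity(ca, cb, edges, n):
    """Second index: maximum first index over all cross-cluster node pairs."""
    return max(node_similarity(a, b, edges, n) for a in ca for b in cb)


def merge(clusters, ca, cb):
    return [c for c in clusters if c not in (ca, cb)] + [ca | cb]


def assign_on_ids(nodes, edges, class_of, th):
    n = len(nodes)
    clusters = [frozenset({v}) for v in nodes]
    for src, dst in edges:                      # step S7: class A -> class B links
        if class_of[src] == "A" and class_of[dst] == "B":
            ca = next(c for c in clusters if src in c)
            cb = next(c for c in clusters if dst in c)
            if ca != cb:
                clusters = merge(clusters, ca, cb)
    while len(clusters) > 1:                    # steps S8 and S9
        ca, cb = max(combinations(clusters, 2),
                     key=lambda p: cluster_similarity(p[0], p[1], edges, n))
        if cluster_similarity(ca, cb, edges, n) < th:
            break
        clusters = merge(clusters, ca, cb)
    ordered = sorted(clusters, key=min)         # step S10: one OnID per cluster
    return {v: on_id for on_id, c in enumerate(ordered, start=1) for v in c}


on_ids = assign_on_ids(NODES, EDGES, CLASS_OF, TH)
```

The result groups v_1, v_2, and v_4 under one OnID and v_6 and v_7 under another, with v_3 and v_5 each receiving their own, which gives the four OnIDs of FIG. 13.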

As described above, according to the present embodiment, the ID assigning apparatus 1 generates a set of clusters from a plurality of personal websites based on Similarity, an index (metric) indicating the degree to which two personal websites share adjacent personal websites. By assigning a separate administrator ID to each of these clusters, the ID assigning apparatus 1 can classify the personal websites by the online individual who administers them. By using a universal metric, the ID assigning apparatus 1 can thus estimate more accurately which personal websites belong to the same administrator.
As a result, the ID assigning apparatus 1 can easily obtain information on the personal websites managed by the same online individual, so teachers, guardians, and others can use the ID assigning apparatus 1 to help monitor personal websites created by children (especially junior and senior high school students).

The ID assigning apparatus 1 also counts a personal website itself among its adjacent personal websites. Personal websites of the same administrator are often adjacent to each other, so this weights pairs of mutually adjacent personal websites more heavily and lets the ID assigning apparatus 1 estimate the personal websites of the same administrator more accurately.

Because the ID assigning apparatus 1 places combinations of personal websites whose first index is at or above a predetermined value in the same cluster, it can easily generate clusters on the assumption that personal websites with sufficiently similar adjacency relationships have the same administrator.

The ID assigning apparatus 1 consolidates the first index into a second index for combinations of two clusters or of a cluster and a personal website, and can generate clusters that join these combinations based on the second index. Starting from the first index, the ID assigning apparatus 1 can therefore assign OnIDs by successively generating and enlarging clusters whose members have the same administrator.
Further, when a cluster is generated, the ID assigning apparatus 1 can easily obtain the second index for a combination involving that cluster as the maximum of the index values for combinations involving the personal websites that belong to the cluster.

The ID assigning apparatus 1 generates clusters by joining combinations of personal websites of predetermined classes that stand in a preset input/output relationship. It can therefore assign OnIDs more accurately by taking into account predetermined conditions derived from observed cases.

Because the ID assigning apparatus 1 generates clusters by joining personal websites that have the same service provider account, it can assign OnIDs more accurately.

The ID assigning apparatus 1 calculates Similarity after excluding specific kinds of personal websites, and the hyperlinks incident to them, according to predetermined conditions on the in-degree and out-degree of hyperlinks. It can thus remove as noise the sites that are not targets of OnID assignment, and therefore assign OnIDs more accurately.
Further, by taking the class of the sites into account, the ID assigning apparatus 1 can remove such noise sites more reliably.

Because the ID assigning apparatus 1 calculates Similarity after excluding hyperlinks that occurred outside a predetermined period, it can improve accuracy by restricting the data to recent information or to a specific period, and can also reduce the processing load.

When the strength of the hyperlinks that occurred between the same pair of personal websites within a predetermined period is less than the predetermined value Strength, the ID assigning apparatus 1 excludes those hyperlinks before calculating Similarity. Because it removes noise by considering only hyperlinks that connect sites with at least the predetermined strength, the ID assigning apparatus 1 can determine more reliably whether administrators are identical.
Further, because the ID assigning apparatus 1 sets Strength, the threshold for hyperlink strength, for each type of link source, it can remove noise more reliably in accordance with hyperlink occurrence tendencies that differ by usage form.

An embodiment of the present invention has been described above, but the present invention is not limited to this embodiment. The effects described in the embodiment are merely a list of the most favorable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiment.

In the embodiment described above, one of two types of Similarity is used as the metric for nodes v_i and v_j, but the metric is not limited to these. For example, the Jaccard coefficient may be adopted, giving
(3) Similarity = |Γ(v_i) ∩ Γ(v_j)| / |Γ(v_i) ∪ Γ(v_j)|
or
(4) Similarity = |Γ⁺(v_i) ∩ Γ⁺(v_j)| / |Γ⁺(v_i) ∪ Γ⁺(v_j)|
Because the denominator is then the total number of nodes adjacent to v_i or v_j, the ID assigning apparatus 1 can set a threshold that does not depend on the number of collected nodes |V|.
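The Jaccard variants (3) and (4) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it reuses the edge set E listed for FIG. 9 in this description, and it returns 0 when both adjacent-node sets are empty, a case the description does not address.

```python
from fractions import Fraction

EDGES = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 1), (3, 4), (4, 1), (4, 5),
         (5, 1), (6, 1), (6, 7), (7, 6)]


def neighbors(v, edges, include_self=False):
    """Adjacent nodes of v regardless of link direction; includes v itself when include_self is True."""
    adjacent = {d for s, d in edges if s == v} | {s for s, d in edges if d == v}
    if include_self:
        adjacent.add(v)
    return adjacent


def jaccard_similarity(vi, vj, edges, include_self=False):
    """Common adjacent nodes divided by all nodes adjacent to vi or vj (method (3), or (4) with include_self)."""
    a = neighbors(vi, edges, include_self)
    b = neighbors(vj, edges, include_self)
    union = a | b
    if not union:
        return Fraction(0)  # assumption: two isolated nodes are treated as dissimilar
    return Fraction(len(a & b), len(union))


jaccard_3 = jaccard_similarity(1, 2, EDGES)
jaccard_4 = jaccard_similarity(1, 2, EDGES, include_self=True)
print(jaccard_3, jaccard_4)  # 1/2 5/6
```

For v_1 and v_2 the union of adjacent nodes has six members in both variants, so the values become 3/6 and 5/6 instead of 3/7 and 5/7, and they would stay the same if further unrelated nodes were added to V.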

1 ID assigning apparatus
10 Control unit
11 Site collection unit (collection unit)
12 Metric calculation unit (calculation unit)
13a Account cluster generation unit (third generation unit)
13b Type cluster generation unit (second generation unit)
13c Metric cluster generation unit (first generation unit)
14 ID assigning unit (assigning unit)
15 Application unit
20 Storage unit
21 Site storage DB
22 Site management DB
30 Communication unit
40 Input unit
50 Output unit

Claims

An ID assigning device comprising:
a storage unit that stores a plurality of personal websites and link information indicating adjacency relationships by hyperlink between the plurality of personal websites;
a calculation unit that calculates, based on the link information, a first index indicating the degree to which two personal websites have common adjacent personal websites;
a first generation unit that generates, based on the first index, a set of clusters each consisting of one or more personal websites; and
an assigning unit that assigns, to the plurality of personal websites, administrator IDs that are the same within a cluster and differ between clusters.

The ID allocation device according to claim 1, wherein the calculation unit calculates the first index including itself as the adjacent personal website.

The ID allocation device according to claim 1, wherein the first generation unit includes a combination of personal websites in which the first index is greater than a predetermined value in the same cluster.

The ID allocation device according to claim 1 or 2, wherein the first generation unit derives, according to a predetermined rule, a second index for a combination of a cluster and a cluster, or of a cluster and a personal website, from the first index for combinations of a personal website belonging to the cluster and a personal website outside the cluster, and repeats a process of generating, based on the first index or the second index, a cluster that merges a personal website with a personal website, a personal website with a cluster, or a cluster with a cluster, until the first index and the second index are less than a predetermined value.

The ID allocation device according to claim 4, wherein the first generation unit uses, as the second index, the maximum value of the first index over combinations of a personal website belonging to the cluster and a personal website outside the cluster.

The ID allocation device according to claim 1, further comprising a second generation unit that generates a cluster by merging a combination of personal websites of specific types that stand in a predetermined incoming/outgoing link relationship, based on a predetermined number of types representing usage forms, into which each of the plurality of personal websites is classified from its URL, and on the structure of the incoming/outgoing relationships of the hyperlinks.

The ID allocation device according to any one of claims 1 to 6, further comprising a third generation unit that generates the cluster by merging personal websites that share the same service provider account, the account being identifiable from the URLs of the plurality of personal websites.

The ID allocation device according to any one of claims 1 to 7, wherein the calculation unit calculates the first index after excluding any personal website whose hyperlink in-degree and out-degree satisfy a predetermined condition, together with the link information relating to that personal website.

The ID allocation device, wherein the in-degree or the out-degree is the number of hyperlinks for which the adjacent personal website is of a predetermined type representing a usage form classified based on the URL of the personal website.

The ID allocation device according to any one of claims 1 to 9, wherein the calculation unit calculates the first index after excluding, from the link information, hyperlinks created outside a predetermined period.

The ID allocation device according to any one of claims 1 to 10, wherein, when the number of hyperlinks created between the same personal websites within a predetermined period in the link information is less than a predetermined number, the calculation unit calculates the first index after excluding those hyperlinks.

The ID allocation device according to claim 11, wherein the predetermined number is set for each type representing a usage form classified based on the URL of the personal website that is the link source of the hyperlink.

An ID allocation method in which a computer assigns an administrator ID to each of a plurality of personal websites,
wherein the computer stores the plurality of personal websites and link information indicating adjacency relationships by hyperlink among the plurality of personal websites, and
the computer executes:
a calculation step of calculating, based on the link information, a first index indicating the degree to which two personal websites have adjacent personal websites in common;
a generation step of generating a set of clusters, each consisting of one or more personal websites, based on the first index; and
an allocation step of assigning to each of the plurality of personal websites an administrator ID that differs from cluster to cluster and is the same within a cluster.

An ID allocation program for causing a computer to assign an administrator ID to each of a plurality of personal websites,
wherein the computer stores the plurality of personal websites and link information indicating adjacency relationships by hyperlink among the plurality of personal websites, and
the program causes the computer to execute:
a calculation step of calculating, based on the link information, a first index indicating the degree to which two personal websites have adjacent personal websites in common;
a generation step of generating a set of clusters, each consisting of one or more personal websites, based on the first index; and
an allocation step of assigning to each of the plurality of personal websites an administrator ID that differs from cluster to cluster and is the same within a cluster.