JP2008191904A

JP2008191904A - Distributed data management system and method

Info

Publication number: JP2008191904A
Application number: JP2007025226A
Authority: JP
Inventors: Hiroshi Kato; 大志加藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-02-05
Filing date: 2007-02-05
Publication date: 2008-08-21
Anticipated expiration: 2027-02-05
Also published as: JP4952276B2

Abstract

PROBLEM TO BE SOLVED: To reduce the delay required for data acquisition in a distributed data management system in a peer-to-peer network. SOLUTION: A data processing prediction means 206 predicts the data processing pattern of another node based on the history of data processing requests stored in a data processing request history storage means 207, and decides whether to distribute data to that node in advance. On receiving a distribution instruction from the data processing prediction means 206, a data pre-distribution means 208 distributes the instructed data to the other node. A node that has received data in advance can acquire the data immediately from the data storage means 201 of its own node when an acquisition request for the data arises. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a distributed data management system in a peer-to-peer network that has no server, and more particularly to a distributed data management system and method to which distributed hash table technology is applied.

Distributed hash table technology is known as a technique for realizing data sharing in a peer-to-peer network in which there is no server, any node can join at any time, and any node can leave at any time. In a peer-to-peer network, a plurality of nodes 200 are connected via a network 100, as shown in FIG. 4. The network 100 is in many cases an IP (Internet Protocol) network, but is not limited to an IP network.

With respect to a peer-to-peer network such as that of FIG. 4, the Chord system, a representative example of distributed hash table technology, is described in Patent Document 1. SkipNet, an application of distributed hash table technology, is described in Patent Document 2. The following description takes the Chord system as the distributed hash table technology.

In the Chord system, data is stored in association with an ID (identifier). The ID associated with data is generally a hash value of the data itself or a hash value of a key that characterizes the data. The SHA-1 (Secure Hash Algorithm 1) hash function, for example, is used to calculate the hash value. In the Chord system, each node is assigned in advance an ID in the same space as the hash values, and data is managed in a distributed manner by defining a distance between a node ID and a data ID.

That is, the management node for a given data item is the node whose ID has the smallest distance to the ID of that data item. Data is registered with the management node of the data, and data is acquired from the management node of the data. The distance is defined by arithmetic modulo the maximum value of the ID space (for example, 2 to the power of 160). So that a management node can be found efficiently even in a large-scale peer-to-peer network, the Chord system maintains a routing table and thereby forms an overlay network.
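The ID assignment and management-node selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the clockwise (one-directional) ring distance is an assumption, since the text only states that the distance is defined modulo the size of the ID space.

```python
import hashlib

ID_BITS = 160              # SHA-1 produces 160-bit hash values
ID_SPACE = 2 ** ID_BITS    # size of the circular ID space


def make_id(key):
    """Map a key (or a node name) into the ID space using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def distance(from_id, to_id):
    """Clockwise distance from from_id to to_id, modulo the ID space.

    Assumption: a Chord-style one-directional ring distance.
    """
    return (to_id - from_id) % ID_SPACE


def management_node(data_id, node_ids):
    """Return the node ID with the smallest distance from the data ID."""
    return min(node_ids, key=lambda node_id: distance(data_id, node_id))
```

With this distance, the management node of a data item is the first node reached when moving clockwise from the data ID, which is how Chord assigns keys to successor nodes.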

As shown in FIG. 5, a node 200 of the Chord system comprises a data storage means 201, a data processing means 202, a communication processing means 203, a routing table processing means 204, and a routing table storage means 205. The node 200 operates as follows.
First, when data is to be registered or acquired, the routing table processing means 204 uses the routing table stored in the routing table storage means 205 to determine whether the node having the ID closest to the ID of the data is the own node.

If the node having the ID closest to the ID of the data is the own node, the data processing means 202 registers the data in the data storage means 201 or acquires the data from the data storage means 201.
If the node having the ID closest to the ID of the data is not the own node, the communication processing means 203 forwards the registration message or acquisition message for the data to another node whose ID is as close as possible to the ID of the data. By repeating the same forwarding at each subsequent node, the message reaches the node having the ID closest to the ID of the data.

The routing table in the Chord system stores as entries, for example, the ID and address of each node whose ID lies at a distance closest to 2 to the power of i from the ID of the own node. Here the minimum unit of distance is 1, and i is chosen so that 2 to the power of i does not exceed the ID space. For example, if the maximum value of the ID space is 2 to the power of 160, i satisfies 0 ≤ i ≤ 159. With a routing table constructed in this way, the number of hops needed in the Chord system to reach any node from any other node is O(log(N)), where N is the size of the network, that is, the number of nodes participating in the network, and O() denotes the order.
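The routing table construction and hop-by-hop forwarding described above can be sketched on a small ID space. This is an illustrative sketch under stated assumptions: a 6-bit ID space instead of 160 bits, every node knowing the full node list when its table is built, and the entry for each i taken as the first node at or after own ID + 2^i (the usual Chord finger rule). The function names are hypothetical.

```python
ID_BITS = 6                 # small ID space (2**6 = 64) for illustration only
ID_SPACE = 2 ** ID_BITS


def distance(from_id, to_id):
    """Clockwise distance on the ring, modulo the ID space."""
    return (to_id - from_id) % ID_SPACE


def owner(target_id, node_ids):
    """First node at or clockwise after target_id."""
    return min(node_ids, key=lambda n: distance(target_id, n))


def build_routing_table(own_id, node_ids):
    """One entry per i with 0 <= i < ID_BITS: the node for own_id + 2**i."""
    table = []
    for i in range(ID_BITS):
        entry = owner((own_id + 2 ** i) % ID_SPACE, node_ids)
        if entry != own_id and entry not in table:
            table.append(entry)
    return table


def route(start_id, data_id, node_ids):
    """Forward until the management node is reached; return the node path."""
    goal = owner(data_id, node_ids)
    path = [start_id]
    current = start_id
    while current != goal:
        table = build_routing_table(current, node_ids)
        # Pick the entry that ends up closest to the goal without passing it.
        current = min(table, key=lambda n: distance(n, goal))
        path.append(current)
    return path
```

For the ten-node ring `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]`, a request for data ID 54 issued at node 8 is forwarded along 8, 42, 51, 56, so each hop roughly halves the remaining distance.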

In practice, the number of hops has a spread, and the average number of hops is known to be approximately half the maximum number of hops. The base of the logarithm corresponds to the radix of the ID space, which is 2 in the above example. That is, the average number of hops is log(N)/log(2)/2. Because a hop count of O(log(N)) grows relatively slowly as N increases, the Chord system scales efficiently with the size of the network.

Patent Document 1: JP 2005-354364 A
Patent Document 2: JP 2004-266796 A

In the Chord system and in distributed data management systems using distributed hash table technology in general, data acquisition requires O(log(N)) hops, which gives rise to the following problems.
The first problem is that once the network exceeds a certain size, the delay caused by the number of hops becomes too large to ignore. For example, if the number of nodes participating in the network is 1,000,000, that is, N = 1000000, the average number of hops is log(1000000)/log(2)/2 ≈ 9.97. If the network delay per hop is assumed to be 100 milliseconds, the total delay is close to one second, which may be unacceptable for some applications. In other words, because the delay is proportional to the number of hops, a large-scale network suffers from an increased data acquisition delay as the number of hops grows.
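The arithmetic in the example above can be checked with a short calculation. This is only a worked illustration of the formula log(N)/log(2)/2 given in the text; the function names are hypothetical.

```python
import math


def average_hops(n_nodes):
    """Average hop count log(N) / log(2) / 2 for an N-node network."""
    return math.log(n_nodes) / math.log(2) / 2


def expected_delay_ms(n_nodes, per_hop_delay_ms):
    """Expected acquisition delay, assuming delay is proportional to hops."""
    return average_hops(n_nodes) * per_hop_delay_ms


if __name__ == "__main__":
    hops = average_hops(1_000_000)
    delay = expected_delay_ms(1_000_000, 100)
    print(f"average hops: {hops:.2f}, expected delay: {delay:.0f} ms")
```

For N = 1,000,000 and 100 milliseconds per hop this gives just under ten hops and just under one second, in line with the figures quoted in the text.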

The second problem is that a node is unlikely to learn of the existence of data passively. The reason is that, as a rule, data is not distributed unless there is an explicit request to acquire it. Some nodes are involved in forwarding when data is registered, but they are few, so the proportion of nodes that can learn of the existence of data without knowing its key is limited. This problem becomes more pronounced as the network grows.

An object of the present invention is to provide a distributed data management system and method that can reduce the delay in data acquisition in a distributed data management system using distributed hash table technology.
Another object of the present invention is to provide a distributed data management system and method that, in a distributed data management system using distributed hash table technology, can predict future data processing and distribute data even when there is no data acquisition request.

The present invention provides a distributed data management system in a peer-to-peer network in which each node connected to the network comprises: a communication processing means that communicates with other nodes via the network; a data storage means that stores data; a data processing means that processes the data stored in the data storage means in response to a data processing request received by the communication processing means; a data processing request history storage means that stores the history of the data processing requests; a data processing prediction means that predicts the data processing pattern of another node based on the history of the data processing requests and determines whether to distribute data to that node in advance; and a data pre-distribution means that, on receiving a distribution instruction from the data processing prediction means, distributes the instructed data to the other node.
The data processing prediction means stores the history of data processing requests in the data processing request history storage means and predicts future data processing based on the stored history. For example, assuming that data acquisitions are uniformly distributed over time, the number of future acquisition requests for certain data X can be predicted from the number of acquisition requests for data X per unit time. If, based on this prediction, the cost (number of messages or message volume) of the future acquisition requests for data X is judged to be greater than the cost of distributing data X in advance, the data pre-distribution means distributes data X in advance to all or some of the nodes. By adopting this configuration, predicting data processing, and distributing data in advance as necessary, the delay in data acquisition can be reduced and the object of the present invention can be achieved. Moreover, by predicting data processing and distributing data in advance as necessary, data can be distributed regardless of whether a data acquisition request has been made, and the other object of the present invention can be achieved.

One configuration example of the distributed data management system of the present invention further comprises a routing table storage means that stores a routing table, and a routing table processing means that determines the forwarding destination of a message received by the communication processing means based on the routing table information stored in the routing table storage means.
In one configuration example of the distributed data management system of the present invention, the routing table is constructed using distributed hash table technology.
In one configuration example of the distributed data management system of the present invention, the data pre-distribution means distributes data in advance using the routing table information stored in the routing table storage means.
In one configuration example of the distributed data management system of the present invention, the data processing prediction means obtains the data processing pattern from the past to the present based on the history of the data processing requests, predicts the future data processing pattern from it, and decides to distribute data in advance when it judges that the cost of data acquisition without pre-distribution exceeds the cost of pre-distribution.
In one configuration example of the distributed data management system of the present invention, the data processing prediction means takes a specific group of nodes as a pre-distribution range and predicts the data processing pattern only for that range, and the data pre-distribution means distributes data in advance only to that group of nodes.
In one configuration example of the distributed data management system of the present invention, the data processing prediction means assumes that the data acquisition frequency of other nodes from the past to the present and their future data acquisition frequency are linearly related, and predicts the future data acquisition frequency as the data processing pattern.
In one configuration example of the distributed data management system of the present invention, where T1 is the time from data registration to the present, T2 is the time from the present to the expiration of the data, and K is the number of data acquisitions from data registration to the present, the data processing prediction means takes K × T2 / T1 as the predicted number of future data acquisitions.
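The linear prediction K × T2 / T1 and the cost comparison described in the configuration examples above can be sketched as follows. This is a minimal sketch: the function names, the single per-acquisition cost, and the single pre-distribution cost are hypothetical simplifications, since the text only says that cost is measured in number of messages or message volume.

```python
def predict_future_acquisitions(k, t1, t2):
    """Predicted future acquisition count K * T2 / T1.

    k  : acquisitions from data registration to the present
    t1 : time from data registration to the present
    t2 : time from the present to the expiration of the data
    """
    if t1 <= 0:
        return 0.0          # no elapsed time yet, so no basis for prediction
    return k * max(t2, 0) / t1


def should_predistribute(k, t1, t2, cost_per_acquisition, predistribution_cost):
    """True if expected acquisition cost exceeds the pre-distribution cost.

    Both cost parameters are assumed to be expressed in the same unit,
    for example a number of messages.
    """
    expected_cost = predict_future_acquisitions(k, t1, t2) * cost_per_acquisition
    return expected_cost > predistribution_cost
```

For example, with K = 30 acquisitions over T1 = 10 time units and T2 = 20 time units remaining, the predicted number of future acquisitions is 60. If each acquisition costs about 10 messages, the expected cost of 600 messages exceeds a pre-distribution cost of 500 messages, so pre-distribution would be chosen.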

In the distributed data management method of the present invention, each node connected to the network executes: a data processing procedure of processing the data stored in the data storage means of the own node in response to a data processing request received by the communication processing means of the own node; a data processing request history storing procedure of storing the history of the data processing requests; a data processing prediction procedure of predicting the data processing pattern of another node based on the history of the data processing requests and determining whether to distribute data to that node in advance; and a data pre-distribution procedure of distributing to the other node the data for which pre-distribution was decided in the data processing prediction procedure.

According to the present invention, the data processing pattern of another node is predicted based on the history of data processing requests, it is determined whether to distribute data to that node in advance, and the data for which pre-distribution has been decided is distributed to the other node, so the delay in data acquisition can be reduced. The reason is that, by predicting future data processing and distributing data to other nodes in advance as necessary, a node that has received the data in advance can acquire it immediately from its own data storage means when an acquisition request for the data arises. Furthermore, in the present invention, data can also be distributed to nodes that do not issue a data acquisition request. The reason is that, by predicting future data processing and distributing data in advance as necessary, data is actively distributed to all or some of the nodes.

In addition, in the present invention, data is distributed in advance using the routing table information stored in the routing table storage means, so the data pre-distribution means does not need to manage separate information for distribution.

In addition, in the present invention, the data processing pattern from the past to the present is obtained from the history of data processing requests, the future data processing pattern is predicted from it, and the cost of data acquisition without pre-distribution is compared with the cost of pre-distribution, so the data processing prediction means can determine whether to distribute data in advance.

In addition, in the present invention, a specific group of nodes is taken as the pre-distribution range and the data processing pattern is predicted only for that range, so data can be distributed in advance only to the specific group of nodes.

In addition, in the present invention, the data acquisition frequency of other nodes from the past to the present and their future data acquisition frequency are assumed to be linearly related, so the future data acquisition frequency can be predicted as the data processing pattern from the data acquisition frequency from the past to the present.

[First Embodiment]
Next, the best mode for carrying out the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a node in the distributed data management system according to the first embodiment of the present invention. The configuration of the peer-to-peer network to which the distributed data management system of this embodiment is applied is as shown in FIG. 4, so the network is denoted 100 and each node 200. Although FIG. 1 shows only one node 200, a plurality of nodes 200 having the same configuration exist, as shown in FIG. 4.

As shown in FIG. 1, the node 200 comprises a data storage means 201, a data processing means 202, a communication processing means 203, a routing table processing means 204, a routing table storage means 205, a data processing prediction means 206, a data processing request history storage means 207, and a data pre-distribution means 208. Each of these means of the node 200 operates in outline as follows.

The data storage means 201 stores part of the data held in the distributed hash table as a whole, namely the data managed by the own node.
When the data processing means 202 receives a data processing request from the communication processing means 203, it outputs the request to the data processing prediction means 206, executes the data processing according to the request, and, if necessary, returns the result of the data processing to the requesting node 200 or client (not shown) through the communication processing means 203.

When the communication processing means 203 receives a message from another node 200 or a client via the network 100, it outputs the request to the data processing means 202 if the message is a data processing request, and to the routing table processing means 204 if the message is a routing table update request. For either kind of request, it forwards the request to another node 200 if necessary.

When the routing table processing means 204 receives a routing table update request from the communication processing means 203, it updates the routing table stored in the routing table storage means 205 and, if necessary, sends a new routing table update request to another node 200. The routing table processing means 204 also determines whether a message needs to be forwarded and, if so, its forwarding destination.
The routing table storage means 205 stores the routing table managed by the routing table processing means 204. The routing table is constructed using distributed hash table technology.

The data processing prediction means 206 receives data processing requests from the data processing means 202 either one by one or periodically, and stores them as history in the data processing request history storage means 207. The data processing prediction means 206 also predicts future data processing based on the data processing request history stored in the data processing request history storage means 207 and, if it judges that certain data should be distributed in advance, instructs the data pre-distribution means 208 to pre-distribute that data.

When the data pre-distribution means 208 receives a pre-distribution instruction from the data processing prediction means 206, it retrieves the instructed data from the data storage means 201 and pre-distributes it via the network 100 to the data pre-distribution means 208 of all or some of the nodes 200 other than the own node.
In a node 200 that has received pre-distributed data, the data pre-distribution means 208 stores the received data in the data storage means 201.

Reference information for determining whether certain data should be distributed in advance includes, for example, the number of acquisitions of the data per unit time (the number of distributions, as seen from the distributing side) and the linearly predicted number of acquisitions within the remaining validity period of the data. If the data is an updated version of earlier data, the acquisition frequency (distribution frequency) of the old data before the update may be used as reference information, and if the similarity of data can be determined, the acquisition frequency of data similar to the data in question may be used as reference information.

The pre-distribution range of the data, that is, the set of pre-distribution destination nodes, may also be limited in advance, and whether to distribute the data in advance may be determined using the above reference information for the data within this pre-distribution range only.
The data acquisition frequency and the number of data acquisitions are equivalent as reference information. The term data acquisition frequency is used when the number of acquisitions is obtained without fixing a definite period, and the term number of data acquisitions is used when it is obtained for a specific period.

Next, the overall operation of this embodiment will be described in detail with reference to FIG. 1 and the flowchart of FIG. 2.
First, the communication processing means 203 of the node 200 receives a message from another node 200 or a client via the network 100 (step A1). Having received the message, the communication processing means 203 dispatches the request it contains according to the type of the request. That is, if the request contained in the message is a data processing request, it is output to the data processing means 202, and if it is a routing table update request, it is output to the routing table processing means 204 (step A2).
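Steps A1 and A2 above amount to a dispatch on request type. The following is a minimal sketch of that dispatch; the class names, the string labels for the two request kinds, and the handler callbacks are assumptions introduced for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str       # "data" or "routing_update" (labels are assumptions)
    payload: dict


@dataclass
class Message:
    requests: list  # a single message may carry more than one request


class CommunicationProcessor:
    """Hypothetical stand-in for the communication processing means 203."""

    def __init__(self, data_handler, routing_handler):
        self.data_handler = data_handler        # e.g. data processing means 202
        self.routing_handler = routing_handler  # e.g. routing table processing means 204

    def receive(self, message):
        """Step A1: accept a message. Step A2: dispatch each request by kind."""
        for request in message.requests:
            if request.kind == "data":
                self.data_handler(request)
            elif request.kind == "routing_update":
                self.routing_handler(request)
            else:
                raise ValueError(f"unknown request kind: {request.kind}")
```

Passing the handlers in as callables keeps the dispatcher independent of how the data processing and routing table processing are implemented.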

When the data processing means 202 receives a data processing request from the communication processing means 203, it processes data according to the request (step A3). There are two types of data processing request: the data registration request and the data acquisition request.

When the data processing means 202 receives a data registration request, it registers the data contained in the request in the data storage means 201 and, if a reply to the request is required (for example, when it has been determined in advance that a reply should be sent), returns a data processing result indicating that the registration has been completed to the requesting node 200 or client via the communication processing means 203 and the network 100. When the data processing means 202 receives a data acquisition request, it acquires the data corresponding to the request from the data storage means 201 and returns the acquired data to the requesting node 200 or client via the communication processing means 203 and the network 100.

Next, the data processing means 202 outputs the data processing request received from the communication processing means 203 to the data processing prediction means 206, and the data processing prediction means 206 stores the request as history in the data processing request history storage means 207 (step A4).
Subsequently, the data processing prediction means 206 refers to the history of data processing requests stored in the data processing request history storage means 207, obtains from the history the data acquisition pattern (number of data acquisitions or data acquisition frequency) of certain data from the past to the present, and predicts the future data acquisition pattern for that data. If it judges that the cost (number of messages or message volume) of data acquisition without pre-distribution exceeds the cost of pre-distribution, it instructs the data pre-distribution means 208 to pre-distribute the data (step A5).
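Steps A4 and A5 above can be sketched as a small history store plus a selection routine. This is an illustrative sketch only: the class and function names, the use of plain numeric timestamps, the fixed prediction horizon, and the two scalar cost parameters are all assumptions, not details given in the source.

```python
from collections import defaultdict


class RequestHistory:
    """Hypothetical stand-in for the data processing request history storage means 207."""

    def __init__(self):
        self.registered_at = {}                # data ID -> registration time
        self.acquired_at = defaultdict(list)   # data ID -> acquisition times

    def record(self, kind, data_id, now):
        """Step A4: store a data processing request as history."""
        if kind == "register":
            self.registered_at[data_id] = now
        elif kind == "acquire":
            self.acquired_at[data_id].append(now)
        else:
            raise ValueError(f"unknown request kind: {kind}")

    def acquisitions_per_unit_time(self, data_id, now):
        """Past-to-present acquisition frequency of one data item."""
        elapsed = now - self.registered_at[data_id]
        if elapsed <= 0:
            return 0.0
        return len(self.acquired_at[data_id]) / elapsed


def select_for_predistribution(history, now, horizon,
                               cost_per_acquisition, predistribution_cost):
    """Step A5: data IDs whose predicted acquisition cost over the horizon
    exceeds the cost of distributing them in advance."""
    selected = []
    for data_id in history.registered_at:
        predicted = history.acquisitions_per_unit_time(data_id, now) * horizon
        if predicted * cost_per_acquisition > predistribution_cost:
            selected.append(data_id)
    return selected
```

In a node, the IDs returned by `select_for_predistribution` would be the ones handed to the pre-distribution step; the routine could equally be run on a timer rather than on every request.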

On receiving the pre-distribution instruction, the data pre-distribution means 208 pre-distributes the instructed data via the network 100 to all or some of the nodes 200 other than the own node (step A6).
In a node 200 that has received the pre-distributed data, the data pre-distribution means 208 stores the received data in the data storage means 201 of that node.
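The effect of step A6 on later acquisitions can be sketched as follows: once data has been stored locally, a node serves the acquisition from its own store with no network hops. The `Node` class, the `remote_lookup` callback, and the returned hop count are hypothetical constructs used only to make the difference visible.

```python
class Node:
    """Hypothetical node holding a local data store (data storage means 201)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def receive_predistributed(self, data_id, value):
        """Store data that another node distributed in advance."""
        self.store[data_id] = value

    def acquire(self, data_id, remote_lookup):
        """Return (value, hops): local hit costs 0 hops, else ask the network.

        remote_lookup is an assumed callback that performs the usual
        multi-hop lookup and returns a (value, hops) pair.
        """
        if data_id in self.store:
            return self.store[data_id], 0
        return remote_lookup(data_id)


def predistribute(source, targets, data_id):
    """Step A6: send one data item from source to every other target node."""
    value = source.store[data_id]
    for node in targets:
        if node is not source:
            node.receive_predistributed(data_id, value)
```

A node outside the pre-distribution range still falls back to the normal lookup, so the sketch also reflects the option of distributing to only some of the nodes.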

Meanwhile, when the routing table processing means 204 receives a routing table update request from the communication processing means 203, it updates the routing table stored in the routing table storage means 205 according to the request (step A7). A routing table update request is generated, for example, when a new node 200 is added to the network 100. The routing table processing means 204 then determines whether a further routing table update is required (step A8). If the routing table processing means 204 determines that a further update is required, the communication processing means 203 sends a new routing table update request via the network 100 (step A9).

Next, the routing table processing means 204 determines whether the message received by the communication processing means 203 in step A1 needs to be forwarded (step A10). Forwarding is needed, for example, when the message is not addressed to the own node. If forwarding is needed, the routing table processing means 204 determines the forwarding destination based on the routing table stored in the routing table storage means 205, and the communication processing means 203 forwards the message via the network 100 to the destination node 200 (step A11).

Note that in step A2 of FIG. 2, a single message received by the communication processing means 203 may contain multiple requests. Steps A5 and A8 may also be triggered by events other than message reception, for example by a periodic event.

Next, the effect of this embodiment will be described. In this embodiment, the history of data processing requests is stored in the data processing request history storage means 207, and the data processing prediction means 206 uses this history to predict the pattern of future data acquisition requests. When pre-distributing the data is judged to incur a lower cost (number of messages or message volume), the data is pre-distributed. At any node that has received the pre-distributed data, subsequent acquisitions of that data complete immediately, so the overall time required for data acquisition is reduced.

In the Chord system, and in distributed data management systems using distributed hash table techniques in general, a single data acquisition costs O(log(N)), while a typical data distribution costs O(N). Therefore, if a given piece of data is expected to be acquired on the order of O(N / log(N)) times in the future, pre-distributing it does not increase the total cost.
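The break-even count stated above follows from a one-line comparison. As a sketch, write the cost of one lookup as a log N and the cost of one distribution as b N, where the constants a and b are assumptions introduced here for illustration (the text gives only the orders of growth):

```latex
\[
  M \cdot a \log N \;\ge\; b\,N
  \quad\Longleftrightarrow\quad
  M \;\ge\; \frac{b}{a}\cdot\frac{N}{\log N}
\]
```

The left-hand side is the total cost of M lookups without pre-distribution and the right-hand side is the cost of a single pre-distribution, so the threshold on M grows as N / log N.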

[Second Embodiment]
Next, the best mode for carrying out the second invention will be described in detail with reference to the drawings. FIG. 3 is a block diagram showing the configuration of a node in the distributed data management system according to the second embodiment of the present invention; components identical to those in FIG. 1 carry the same reference numerals.
This embodiment differs from the first embodiment shown in FIG. 1 in that the routing table storage means 205 is connected to the data pre-distribution means 208a.

In this embodiment, when pre-distributing data, the data pre-distribution means 208a determines the pre-distribution destinations using the routing table stored in the routing table storage means 205.
In the first embodiment, the data pre-distribution means 208 must itself manage the information needed for pre-distribution. In this embodiment, because the data pre-distribution means 208a uses the routing table stored in the routing table storage means 205, it no longer needs to manage separate distribution information.

In general, distributed hash table techniques build the network by maintaining routing tables such that any node can be reached in O(log(N)) hops, and this network can often be regarded as a tree rooted at an arbitrary node. In this embodiment, the data pre-distribution means 208a pre-distributes data along that tree, so no additional network structure needs to be prepared.
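As a rough illustration of distributing along the tree implied by the routing table, the sketch below simulates a broadcast over Chord-style fingers. It assumes a fully populated identifier ring of 2^m nodes in which node n has fingers at n + 1, n + 2, n + 4, and so on. The function name `broadcast_over_fingers` and the span-splitting rule are illustrative assumptions, not taken from the specification.

```python
def broadcast_over_fingers(m):
    """Simulate a broadcast from node 0 on a full ring of 2**m nodes.

    Assumption: node n's routing table holds fingers n + 2**i (mod 2**m).
    Each node is handed a 'span' of identifiers it is responsible for and
    delegates disjoint sub-spans to its fingers, which yields a tree
    rooted at the sender.
    """
    size = 1 << m
    received = [0] * size      # how many times each node got the data
    received[0] = 1            # the root already holds the data
    messages = 0
    pending = [(0, size)]      # (node, number of ids it must cover, incl. itself)
    while pending:
        node, span = pending.pop()
        step = 1
        while step < span:
            child = (node + step) % size
            # The finger at offset `step` covers offsets [step, 2*step),
            # clipped to the span this node is responsible for.
            child_span = min(2 * step, span) - step
            received[child] += 1
            messages += 1
            pending.append((child, child_span))
            step *= 2
    return received, messages


if __name__ == "__main__":
    counts, sent = broadcast_over_fingers(4)
    print(counts, sent)
```

Under these assumptions every node receives the data exactly once and the number of messages is one per non-root node, which matches the O(N) distribution cost used in the text.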

[Third Embodiment]
Next, the operation of the best mode for carrying out the present invention will be described using a concrete example. This embodiment describes the first embodiment more concretely. It assumes the Chord system as the distributed hash table technique and application-layer multicast with a binary tree structure as the pre-distribution technique. Other known distributed hash table techniques include Kademlia and Pastry. For pre-distribution, IP multicast can be used as well as application-layer multicast.

The data storage means 201, data processing means 202, communication processing means 203, routing table processing means 204, and routing table storage means 205 shown in FIG. 1 are components of the Chord system. Details of the Chord system are disclosed in, for example, Patent Document 1.
In this embodiment, unlike an ordinary Chord system, data processing requests are always passed to the data processing means 202, for two reasons: so that pre-distributed data can be retrieved, and so that the history can be recorded.

The following describes in detail the components that are not part of the Chord system: the data processing prediction means 206, the data processing request history storage means 207, and the data pre-distribution means 208. The data processing request history storage means 207 is a database in memory or on a file system and stores, for example, a file in CSV (Comma Separated Values) format. Each line of the CSV file has the form <time>, <acquisition/registration>, <node information>, <key>, <data>, <expiration>. The <data> and <expiration> fields are needed only for data registration requests, and may be omitted when they duplicate what is held in the data storage means 201. <node information> consists of the IP address together with information such as the ID and group provided by the distributed hash table (routing table).
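The record layout above can be sketched with Python's standard `csv` module. The field encodings used here (the strings "get" and "put" for acquisition and registration, integer timestamps, a bare IP address as node information) and the helper names are assumptions made for illustration; the specification fixes only the order of the six fields.

```python
import csv
import io


def append_record(stream, time, kind, node, key, data="", expires=""):
    """Append one history line: time, kind, node, key, data, expiration.

    `data` and `expires` stay empty for acquisition ("get") requests,
    since the text requires them only for registration requests.
    """
    csv.writer(stream).writerow([time, kind, node, key, data, expires])


def count_acquisitions(stream, key):
    """Count acquisition records for `key` by rereading the whole log."""
    stream.seek(0)
    total = sum(
        1 for row in csv.reader(stream)
        if row and row[1] == "get" and row[3] == key
    )
    stream.seek(0, io.SEEK_END)  # leave the stream ready for further appends
    return total


if __name__ == "__main__":
    log = io.StringIO(newline="")
    append_record(log, 100, "put", "10.0.0.1", "k1", "value", 900)
    append_record(log, 130, "get", "10.0.0.2", "k1")
    append_record(log, 160, "get", "10.0.0.3", "k1")
    print(count_acquisitions(log, "k1"))
```

An in-memory `io.StringIO` stands in for the history file here; a file opened in append-and-read mode could be passed instead.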

When the data processing prediction means 206 receives a data processing request from the data processing means 202, it appends the request to the CSV file of the data processing request history storage means 207, either immediately or periodically. Synchronously or asynchronously with this append, the data processing prediction means 206 also decides whether to pre-distribute either the data corresponding to the request or all data stored in the data storage means 201.

To decide whether data should be pre-distributed, the cost of distribution must be estimated, either beforehand or dynamically. Assuming the data size is sufficiently small and ignoring storage cost, the cost is the volume of messages sent and received. When the data size is fixed, message volume is equivalent to the number of messages, so the number of messages is used as the cost below.

In multicast over a binary tree, every node receives the message exactly once, so the cost of pre-distributing data to N nodes is N. With a distributed hash table, data acquisition generally costs O(log(N)); in the Chord system it costs log(N) / log(2) / 2 on average. If a given piece of data will be acquired M more times (or by M more nodes), the expected total cost of those acquisitions is M × log(N) / log(2) / 2.

The data pre-distribution means 208 decides that the data should be pre-distributed if this expected total cost, M × log(N) / log(2) / 2, is greater than the pre-distribution cost N. The network used by the data pre-distribution means 208 forms a single binary tree over all nodes 200, and data is pre-distributed from the node 200 at the root of the tree. Multiple binary trees may be constructed so that load does not concentrate on the root node 200.
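The decision rule above is simple enough to state as code. The following is a minimal sketch in which the function names are hypothetical; it uses `math.log2(n)` for log(N) / log(2) and counts cost in messages, as the text does.

```python
import math


def expected_lookup_cost(m, n):
    """Expected total cost of m Chord lookups among n nodes: m * log2(n) / 2."""
    return m * math.log2(n) / 2


def should_predistribute(m, n):
    """True when m expected lookups cost more than one multicast to n nodes."""
    return expected_lookup_cost(m, n) > n


if __name__ == "__main__":
    for m in (100, 204, 205, 400):
        print(m, should_predistribute(m, 1024))
```

With N = 1024 a lookup costs 5 messages on average, so under this rule pre-distribution pays off once 205 or more acquisitions are expected (205 × 5 = 1025 > 1024).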

At a node 200 that has received pre-distributed data, the data is stored in the data storage means 201. Thereafter, when an acquisition request for that data arises at this node 200, the data can be read from the data storage means 201, so the acquisition completes immediately.

Next, the methods by which the data pre-distribution means 208 predicts the future data acquisition frequency (number of acquisitions) M will be described. There are two kinds of method. The first assumes a linear relationship between the acquisition frequency from the past to the present and the future acquisition frequency. For example, let T1 be the time from data registration to the present, T2 the time from the present to the data's expiration, and K the number of acquisitions from registration to the present; the number of acquisitions from the present until expiration can then be predicted as M = K × T2 / T1. Alternatively, let L be the average number of acquisitions per unit time T (for example, one minute); with the same T2, the number of acquisitions from the present until expiration can be predicted as M = L × T2 / T.

The predictions M = K × T2 / T1 and M = L × T2 / T require the time T2 from the present to the data's expiration. If T2 is not defined, it can be set, for example, to half of T1. When a piece of data is being updated, M can be predicted by taking its past acquisition frequency as its future acquisition frequency. When similarity between data can be determined, M can be predicted by taking the past acquisition frequency of similar data as the future acquisition frequency of the data in question.
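The two linear predictions, including the fallback of T2 = T1 / 2 when no expiration is defined, can be sketched as follows. The function and parameter names are hypothetical; all times are assumed to be in the same unit (seconds in the example).

```python
def predict_from_total(k, t1, t2=None):
    """M = K * T2 / T1.

    k  : acquisitions observed from registration until now
    t1 : time elapsed from registration until now
    t2 : time remaining until expiration; when undefined, the text
         suggests using half of t1 as one possible choice
    """
    if t2 is None:
        t2 = t1 / 2
    return k * t2 / t1


def predict_from_rate(l, t, t2):
    """M = L * T2 / T, with l the average acquisitions per unit time t."""
    return l * t2 / t


if __name__ == "__main__":
    print(predict_from_total(30, 600, 1200))  # 30 gets in 600 s, 1200 s left
    print(predict_from_total(30, 600))        # no expiration defined
    print(predict_from_rate(3, 60, 1200))     # 3 gets per minute, 1200 s left
```

Both forms extrapolate the observed rate unchanged into the remaining lifetime of the data; they differ only in whether the rate is measured over the whole history or per unit time.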

The second method assumes a non-linear relationship between the acquisition frequency from the past to the present and the future acquisition frequency. For example, if K is the number of acquisitions from data registration to the present, the future number of acquisitions can be predicted as M = K / 2. As another example, assuming the acquisition frequency decreases monotonically, the slope J of the decrease is estimated from the acquisition pattern observed from registration to the present; if the current acquisition frequency is L, the future data acquisition frequency can be predicted as M = L × L / J / 2. As a further example, assuming the acquisition frequency decreases according to a quadratic function, the quadratic can be fitted by the least squares method to the acquisition pattern from registration to the present and used to predict the future data acquisition frequency M.
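The formula M = L × L / J / 2 can be read as the area of a triangle: a frequency L that falls by J per interval reaches zero after L / J intervals, so the remaining acquisitions total L × (L / J) / 2. The sketch below illustrates this reading. Estimating J by an ordinary least-squares line fit over per-interval counts is an assumption made here; the text does not say how the slope is obtained, and the function names are hypothetical.

```python
def estimate_decline(counts):
    """Least-squares slope of per-interval acquisition counts.

    Returns J as a positive number when the counts are falling.
    `counts` must contain at least two intervals.
    """
    n = len(counts)
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return -sxy / sxx


def predict_remaining(current_rate, decline):
    """M = L * L / J / 2 for current rate L and rate of decline J."""
    if decline <= 0:
        raise ValueError("acquisition frequency is not decreasing")
    return current_rate * current_rate / decline / 2


if __name__ == "__main__":
    history = [10, 8, 6, 4]            # acquisitions per interval, oldest first
    j = estimate_decline(history)
    print(j, predict_remaining(history[-1], j))
```

When the fitted slope is not negative the triangle model does not apply, which is why `predict_remaining` rejects a non-positive J rather than returning a value.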

In both the first and second methods, the pre-distribution range may be limited, and the future acquisition frequency (number of acquisitions) M may be predicted using only the acquisition pattern within that range. In this case, the data pre-distribution means 208 decides to pre-distribute if the expected total acquisition cost without pre-distribution, M × log(N) / log(2) / 2, is greater than the cost of pre-distributing to that range alone, and pre-distributes the data to that range.

Examples of a pre-distribution range include a group of nodes 200 formed using IP network subnets, an IP multicast group, and a group of nodes 200 formed using the group information provided by the distributed hash table (routing table).

When grouping by IP network subnet, each IP address is treated as a bit string, and nodes whose IP addresses share the same prefix of a given prefix length are placed in one group. The data pre-distribution means 208 predicts the acquisition frequency M for each group and, if the expected total acquisition cost without pre-distribution, M × log(N') / log(2) / 2, is greater than the pre-distribution cost N' for that range, decides to pre-distribute and pre-distributes the data to that group. Here N' is the number of nodes in the group.
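Prefix-based grouping and the per-group decision can be sketched with Python's standard `ipaddress` module. The helper names, the dictionary of predicted counts keyed by network string, and the sample addresses are all assumptions for illustration; the cost test follows the formula in the text, with N' taken as the group size.

```python
import ipaddress
import math
from collections import defaultdict


def group_by_prefix(addresses, prefix_len):
    """Group IP addresses that share the same leading `prefix_len` bits."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False masks the host bits, giving the enclosing network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(addr)
    return dict(groups)


def groups_to_predistribute(groups, predicted):
    """Return the groups where M * log2(N') / 2 exceeds N'.

    `predicted` maps a group's network string to its predicted M;
    groups without a prediction are treated as M = 0.
    """
    chosen = []
    for net, members in groups.items():
        n = len(members)
        m = predicted.get(net, 0)
        if m * math.log2(n) / 2 > n:
            chosen.append(net)
    return chosen


if __name__ == "__main__":
    nodes = [f"10.0.1.{i}" for i in range(1, 9)] + ["10.0.2.1", "10.0.2.2"]
    groups = group_by_prefix(nodes, 24)
    print(groups_to_predistribute(groups, {"10.0.1.0/24": 6, "10.0.2.0/24": 1}))
```

In this example the eight-node group qualifies (6 × 3 / 2 = 9 > 8) while the two-node group does not (1 × 1 / 2 = 0.5 < 2).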

When pre-distributing data based on IP address prefixes, the application-layer multicast binary tree is constructed so that nodes whose IP addresses share a long common prefix are connected close to one another.

The node 200 in the first to third embodiments can be realized by, for example, a computer having a CPU, a storage device, and an interface, together with a program that controls these hardware resources. The program for operating such a computer is provided recorded on a recording medium such as a flexible disk, CD-ROM, DVD-ROM, or memory card. The CPU writes the loaded program into the storage device and executes the processing described in the first to third embodiments according to the program.

The present invention is applicable to uses such as delivering information from one node to another over a network. It is also applicable to delivering information without specifying the receiving nodes.

A block diagram showing the configuration of a node in the distributed data management system according to the first embodiment of the present invention.
A flowchart showing the operation of the distributed data management system according to the first embodiment of the present invention.
A block diagram showing the configuration of a node in the distributed data management system according to the second embodiment of the present invention.
A block diagram showing a configuration example of a peer-to-peer network.
A block diagram showing the configuration of a node in a conventional Chord system.

Explanation of symbols

100: network; 200: node; 201: data storage means; 202: data processing means; 203: communication processing means; 204: routing table processing means; 205: routing table storage means; 206: data processing prediction means; 207: data processing request history storage means; 208, 208a: data pre-distribution means.

Claims

A distributed data management system in a peer-to-peer network, wherein
each node connected to the network comprises:
communication processing means for communicating with other nodes via the network;
data storage means for storing data;
data processing means for processing data stored in the data storage means in response to a data processing request received by the communication processing means;
data processing request history storage means for storing a history of the data processing requests;
data processing prediction means for predicting a data processing pattern of another node based on the history of the data processing requests and determining whether or not to distribute data to that node in advance; and
data pre-distribution means for distributing the instructed data to the other node upon receiving a distribution instruction from the data processing prediction means.

The distributed data management system according to claim 1, further comprising:
routing table storage means for storing a routing table; and
routing table processing means for determining a transfer destination of a message received by the communication processing means based on the routing table information stored in the routing table storage means.

The distributed data management system according to claim 2, wherein
the routing table is constructed using a distributed hash table technique.

The distributed data management system according to claim 2 or 3, wherein
the data pre-distribution means distributes data in advance using the routing table information stored in the routing table storage means.

The distributed data management system according to any one of claims 1 to 4, wherein
the data processing prediction means obtains the data processing pattern from the past to the present based on the history of the data processing requests, predicts a future data processing pattern from that pattern, and decides to pre-distribute the data when it determines that the cost of data acquisition without pre-distribution exceeds the cost of data pre-distribution.

The distributed data management system according to any one of claims 1 to 5, wherein
the data processing prediction means takes a specific group of nodes as a pre-distribution range and predicts the data processing pattern for that range only, and
the data pre-distribution means distributes data in advance only to that group of nodes.

The distributed data management system according to any one of claims 1 to 6, wherein
the data processing prediction means predicts a future data acquisition frequency as the data processing pattern, on the assumption that the data acquisition frequency of the other node from the past to the present and its future data acquisition frequency have a linear relationship.

The distributed data management system according to claim 7, wherein,
where T1 is the time from data registration to the present, T2 is the time from the present to the data's expiration, and K is the number of data acquisitions from registration to the present, the data processing prediction means takes K × T2 / T1 as the predicted number of data acquisitions.

A distributed data management method in a peer-to-peer network, wherein
each node connected to the network executes:
a data processing procedure of processing data stored in the data storage means of the own node in response to a data processing request received by the communication processing means of the own node;
a data processing request history storage procedure of storing a history of the data processing requests;
a data processing prediction procedure of predicting a data processing pattern of another node based on the history of the data processing requests and determining whether or not to distribute data to that node in advance; and
a data pre-distribution procedure of distributing, to the other node, data determined in the data processing prediction procedure to be distributed in advance.