JP2011095976A - Device, method and program for distributed data management

Info

Publication number: JP2011095976A
Application number: JP2009248873A
Authority: JP
Inventors: Michio Irie; 道生入江
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-10-29
Filing date: 2009-10-29
Publication date: 2011-05-12

Abstract

PROBLEM TO BE SOLVED: To achieve high fault tolerance, without data loss on failure, in any application that uses a distributed data management program to handle data whose loss would cause trouble, and to improve the utilization efficiency of each node.

SOLUTION: A node-specific value common to all of the virtual node IDs of a given physical node is set in those IDs, which are otherwise obtained with a hash function, while the distributed data management program's usual rule for deciding which node manages which data is kept (the node whose node ID is close to a data ID manages that data). In addition, when the data ID of a replica is calculated from a given piece of data, the portion of the data ID corresponding to the node-specific value is changed, which raises the probability that the original and the replica are managed by physically different nodes.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a distributed data management apparatus, method, and program, and more particularly to a distributed data management apparatus, method, and program for managing data distributed across a plurality of nodes (such as servers) on a network.

The distributed data management program that is the subject of the present invention is used to manage data across a plurality of nodes (computers and the like) in various networks that use IP, of which the Internet is representative. It takes the data management technique known in computer science as a hash table, which holds each piece of data as a pair of key information that identifies it and the data content, and applies it to a distributed environment of multiple nodes, so that the divided hash table is managed in a distributed manner by the nodes. As a result, the actual processing is distributed to the nodes while data used in common among them can be shared through the distributed data management program. Its field of application therefore spans the whole of distributed computing, including parallel and distributed processing.

The term "distributed data management program" as used in this specification collectively refers to the schemes known in computer science as consistent hashing (see, for example, Non-Patent Document 1), distributed hash tables (see, for example, Non-Patent Document 2), and distributed key-value stores (see, for example, Non-Patent Document 3), and to programs that implement those schemes. What they have in common is that a data management scheme called a hash table is divided and managed across a plurality of physically distributed nodes.

A hash table holds a piece of data paired with a key that identifies it, and uses as an index the value (hash value) obtained by passing that key to a function called a hash function. A hash function returns output that has no regular relationship to its input. Consequently, even when similar keys occur in sequence, the hash values used as their indexes are unrelated to the sequence of the keys, which prevents data from concentrating in a particular region.

In consistent hashing, this hash table is not held by a single physical node (a unit such as a computer) but is divided among a plurality of nodes. Which node is responsible for which region of the divided hash table is determined by, for example, the distance between the ID set for the node (referred to in this specification as the "node ID") and the hash value obtained by applying the hash function to the data key (referred to in this specification as the "data ID").

In a typical example, a node with a given ID is responsible for data whose data IDs fall in the region between its own ID and the ID of the nearest node among those with smaller IDs.
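The assignment rule in this typical example can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 32-bit ID width, the use of truncated SHA-1 in the hypothetical helper `to_id`, and the wrap-around from the largest node ID back to the smallest are all assumptions made for the example.

```python
import bisect
import hashlib

ID_BITS = 32  # assumed ID width for illustration; the text does not fix one


def to_id(text: str) -> int:
    """Hash a string into the ID space by truncating its SHA-1 digest (assumed choice)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def responsible_node(node_ids, data_id):
    """Return the ID of the node responsible for data_id.

    A node covers the region between the nearest smaller node ID (exclusive)
    and its own ID (inclusive). Data IDs above the largest node ID wrap
    around to the smallest node ID (an assumption of this sketch).
    """
    ring = sorted(node_ids)
    index = bisect.bisect_left(ring, data_id)  # first node ID >= data_id
    return ring[index % len(ring)]
```

For instance, with node IDs 10, 50, and 90, a data ID of 30 falls between 10 and 50 and is therefore handled by node 50.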

A distributed hash table extends the idea of consistent hashing. In consistent hashing, each node manages a table of the IDs and addresses of all other nodes in order to communicate with them. Managing this address table, however, becomes costlier and more difficult as the total number of nodes grows. A distributed hash table therefore simplifies management by having each node manage only a part, rather than the whole, not only of the hash table but also of this address table. Because most of the data management functions are otherwise the same as in consistent hashing, the present invention treats both alike under the single term "distributed data management program".

A distributed key-value store likewise takes a data management scheme distributed along the same lines as consistent hashing and extends it with processing such as data replication, caching, and transaction management. Because the underlying data management scheme is again the same as in consistent hashing, the present invention refers to these three technologies collectively as distributed data management programs.

In a distributed data management program, a piece of data is generally managed by a node whose node ID is close to its data ID, so any skew in node IDs appears directly as a skew in the amount of data managed by each node. The range of values that node IDs and data IDs can take is called the "ID space", and the skew described above can be called a skew of node IDs within the ID space.

Distributed data management programs generally counter this skew of node IDs in the ID space with a probabilistic measure: assigning a plurality of virtual node IDs to each physical node. That is, by the law of large numbers, scattering many virtual node IDs of a physical node over the ID space averages out and reduces the skew. The skew of the distribution over the ID space is reduced in proportion to the number of virtual node IDs assigned to one node.
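The effect of virtual node IDs on skew can be explored with a small sketch like the one below. It computes the fraction of the ID space that each physical node covers for a given number of virtual node IDs per node. The 32-bit ID space, the truncated SHA-1 hash, and the `"name#index"` naming of virtual nodes are assumptions chosen for illustration only.

```python
import hashlib

ID_BITS = 32            # assumed ID width
ID_SPACE = 1 << ID_BITS


def to_id(text: str) -> int:
    """Hash a string into the ID space (truncated SHA-1, an assumed choice)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def ownership_shares(physical_nodes, vnodes_per_node):
    """Return the fraction of the ID space covered by each physical node."""
    # Each virtual node ID is derived from a hypothetical "name#index" label.
    ring = sorted(
        (to_id(f"{name}#{i}"), name)
        for name in physical_nodes
        for i in range(vnodes_per_node)
    )
    widths = {name: 0 for name in physical_nodes}
    if len(ring) == 1:
        widths[ring[0][1]] = ID_SPACE  # a lone virtual node covers everything
    else:
        for index, (vid, name) in enumerate(ring):
            previous_id = ring[index - 1][0]  # index 0 wraps to the last entry
            widths[name] += (vid - previous_id) % ID_SPACE
    return {name: width / ID_SPACE for name, width in widths.items()}
```

Comparing `ownership_shares(nodes, 1)` with `ownership_shares(nodes, 100)` for the same list of node names gives a feel for how the shares move toward an even split as the number of virtual node IDs grows.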

One way of using a distributed data management program is described here. Because a distributed data management program divides a single hash table among a plurality of nodes, when a particular node stops serving as a component of the hash table, for reasons such as a failure or termination of the program, the data that node was managing is generally lost. This is not a problem when the distributed data management program is used as a cache and the original data is managed by a separate device. However, when the original data is managed on the distributed data management program and that data must not be lost, the loss of data that accompanies a component's departure is fatal.

To address this problem, a commonly used approach for data that must not be lost is to generate a replica of the data, give it a different data ID, and register it in the hash table as new data entirely separate from the original. The data ID of the replica is generated so that it can be derived from the original data ID. For example, if the hash function that calculates a data ID from a data key is f(x), the replica's data ID may be calculated by applying a different hash function, g(x), to the original data key. In this arrangement, even when the node that manages the original of some data has left, a node trying to retrieve that data can calculate the data ID of the replica using the same calculation logic and obtain the replica from the node that manages that data ID.
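A minimal sketch of the f(x) and g(x) idea follows. The text only requires that g be a different hash function applied to the original key. Realizing g by hashing the key together with a replica number, as done here, is an assumption of this example, as are the function names and the use of SHA-1.

```python
import hashlib


def _hash_to_id(text: str) -> int:
    """Full 160-bit SHA-1 digest interpreted as an integer ID (assumed choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def original_data_id(key: str) -> int:
    """f(x): data ID of the original, computed from the data key."""
    return _hash_to_id(key)


def replica_data_id(key: str, replica_no: int) -> int:
    """g(x): data ID of a replica, derivable by anyone who knows the key.

    Salting the key with the replica number is a hypothetical way to obtain
    a second hash function; it is not prescribed by the text.
    """
    return _hash_to_id(f"{key}#replica-{replica_no}")
```

Because both IDs depend only on the key, a node that fails to reach the manager of `original_data_id(key)` can compute `replica_data_id(key, 1)` on its own and query the node responsible for that ID instead.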

If the nodes that manage the original and the replica of some data leave at the same time, for example because they fail simultaneously, both the original and the replica are lost and the data is lost completely. To guard against such simultaneous departures, a plurality of replicas is generally generated for each piece of data, which probabilistically avoids losing the original and all replicas at once. In other words, the probability that either the original or some replica of a piece of data remains in the ID space rises in proportion to the number of replicas generated for it.

Non-Patent Document 1: D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web," in Proceedings of the 29th ACM Symposium on Theory of Computing (STOC'97), pp.654-663, May 1997.

Non-Patent Document 2: I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications," in Proceedings of the Annual Conference of the Special Interest Group on Data Communication (SIGCOMM'01), pp.149-160, August 2001.

Non-Patent Document 3: G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store," in Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP'07), pp.205-220, October 2007.

Consider using both measures together: assigning a plurality of virtual node IDs to each physical node to avoid the skew of node IDs in the ID space described above, and using data replicas. In that case, the physical node that manages the original data and the one that manages the replica may turn out to be the same. Because the hash function that generates node and data IDs has no regular relationship to its input, the data IDs of the original and the replica of some data may be mapped to different virtual node IDs of one and the same physical node.

When such a mapping occurs and the physical node that manages both the original and the replica under separate virtual node IDs leaves the distributed data management system formed by the nodes of the distributed data management program, for example because of a failure, no node that manages the data remains in the system. The data is therefore lost with a certain probability even though replicas are being used.
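The size of this "certain probability" can be estimated under a simplifying assumption that is not stated in the text: if the data IDs of the original and the replica are independent and uniformly distributed over the ID space, and physical node i covers a fraction s_i of that space, then both land on the same physical node with probability equal to the sum of the squared fractions. A sketch:

```python
def same_node_probability(shares):
    """Probability that two independent, uniform IDs map to one physical node.

    `shares` lists the fraction of the ID space covered by each physical
    node. Independence and uniformity are assumptions of this sketch.
    """
    return sum(share * share for share in shares)
```

With four physical nodes that each cover a quarter of the ID space, the result is 0.25. Adding virtual node IDs evens out the shares but does not push this value below one over the number of physical nodes, which is why plain hashing alone cannot remove the problem.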

Even when the data is not lost, losing a certain number of the original and replicas from the system at once still raises the probability that the data will be lost when the nodes managing the remaining copies leave.

Simply recalculating the replica's data ID whenever it appears that the replica would be managed by the same physical node is only a stopgap. Nodes may join and leave at any time, and the node that manages the data for a given data ID keeps changing as they do.

There are two kinds of conventional solutions to this problem: probabilistic measures, and measures that impose a fixed regularity on the ID generation rule.

The former (probabilistic measures) increases the number of replicas generated, so that even when one physical node manages several of the original and replicas at once, the probability that some of the remaining replicas are managed by physically different nodes is raised.

Although this approach reduces the probability that particular data is lost from the system completely, the amount of data managed by each node is proportional to the number of replicas. When the amount of data managed by the system as a whole is large relative to its maximum capacity, this replication cost cannot be ignored.

The latter (imposing a fixed regularity on the ID generation rule) uses a value unique to the physical node for the most significant several bits of all of that node's virtual node IDs, and changes the value of those most significant bits between the original and each replica of the data according to a fixed rule. This reduces the probability that one physical node manages both the original and a replica of the data.

For example, if the node-specific value of a physical node is "0101" (in binary), the upper 4 bits of all of that node's virtual node IDs are "0101". If the original data ID of some data begins with "0101", that data is managed with high probability by the node whose node-specific value is "0101". If the rule for calculating the replica's data ID always changes the upper 4 bits, the replica is more likely to be managed by a different physical node that has a different node-specific value.
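This conventional scheme can be sketched as below. The 16-bit ID width is an assumption for readability, and the rule of adding the replica number to the upper 4 bits is only one hypothetical way of "always changing" them. The text does not specify the rule.

```python
ID_BITS = 16          # assumed ID width for illustration
NODE_VALUE_BITS = 4   # the "upper 4 bits" of the example
LOW_BITS = ID_BITS - NODE_VALUE_BITS


def prior_art_virtual_id(node_value: int, low_part: int) -> int:
    """Place the node-specific value in the most significant bits."""
    return (node_value << LOW_BITS) | (low_part & ((1 << LOW_BITS) - 1))


def prior_art_replica_id(data_id: int, replica_no: int) -> int:
    """Change the upper bits of a data ID; the increment rule is an assumption."""
    if replica_no % (1 << NODE_VALUE_BITS) == 0:
        raise ValueError("replica_no must change the upper bits")
    top = data_id >> LOW_BITS
    new_top = (top + replica_no) % (1 << NODE_VALUE_BITS)
    return (new_top << LOW_BITS) | (data_id & ((1 << LOW_BITS) - 1))
```

With node-specific value `0b0101`, every virtual node ID produced by `prior_art_virtual_id` lies in the range 0x5000 to 0x5FFF, so all of a node's virtual node IDs sit in a single slice of the ID space regardless of how many are generated.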

This approach has two advantages: the number of replicas need not be increased beyond ordinary usage, so the amount of data managed by the system as a whole does not change, and the probability of data being lost from the system is reduced.

On the other hand, because the node ID generation rule now has the regularity of a node-specific value unique to each physical node, any skew of node IDs in the ID space is determined entirely by the skew of these node-specific values. Each physical node has only one node-specific value, so no matter how many virtual node IDs are assigned to a node, a skew in the node-specific values makes a skew of node IDs in the ID space unavoidable.

The present invention has been made in view of the above points. Its object is to provide a distributed data management apparatus, method, and program that, when managing data distributed across a plurality of nodes (such as servers) on a network, can suppress the skew of node IDs in the ID space and reduce the probability that a physically identical node manages both the original and a replica of the data.

When such a distributed data management program is used, the present invention probabilistically solves the problem that arises in the technique of placing replicas to improve fault tolerance against node failures, by applying processing beyond a simple hash function to the generation of node IDs and to the calculation of the data IDs of replicas.

FIG. 1 is a principle configuration diagram of the present invention.

The present invention (claim 1) is a distributed data management apparatus in which the node whose node ID is close to a data ID manages the corresponding data, the data ID being a hash value obtained with a hash function, that is, a function that returns output having no regularity related to any regularity in its input values, the apparatus comprising:
hash function storage means 900 that holds the hash function; and
virtual node ID generation means 1000 that generates a plurality of virtual node IDs to be set for a physical node,
wherein the virtual node ID generation means 1000 has
node-specific value insertion means 1010 that generates each virtual node ID by placing, between an upper random value and a lower random value calculated by arbitrary means, a node-specific value calculated by applying the hash function of the hash function storage means 900 to a value unique to each physical node.

The present invention (claim 2) further comprises:
communication means that acquires, from another node, the data ID, key, and data of data for which the apparatus itself is to be responsible; and
replica data ID calculation means that calculates the ID of a replica of the data,
wherein the replica data ID calculation means has
specific-value portion conversion means that converts the portion of the original data ID acquired by the communication means that corresponds to the node-specific value, by combining a node ID value different from the node ID holding the original data with a hash value.

The present invention (claim 3) is a distributed data management method in an apparatus in which the node whose node ID is close to a data ID manages the corresponding data, the data ID being a hash value obtained with a hash function, that is, a function that returns output having no regularity related to any regularity in its input values, wherein
virtual node ID generation means of the apparatus
generates a virtual node ID by placing, between an upper random value and a lower random value calculated by arbitrary means, a node-specific value calculated by applying the hash function to a value unique to each physical node.

In the present invention (claim 4), replica data ID calculation means of the apparatus
acquires, from another node via communication means, the data ID, key, and data of data for which the apparatus itself is to be responsible, and
converts the portion of the original data ID acquired by the communication means that corresponds to the node-specific value.

The present invention (claim 5) is a distributed data management program for causing a computer to function as each of the means of the distributed data management apparatus according to claim 1 or 2.

As described above, in the present invention the remaining bits in the middle of a node ID, between the upper several bits and the lower several bits, hold a value that is unique to each physical node and common to all of that node's virtual node IDs (the node-specific value). The data ID of a replica is calculated by changing the portion of the original data ID that corresponds to the node-specific value. This makes it possible both to improve data redundancy through replication in the distributed data management program and to avoid a skew of IDs in the ID space through the generation of virtual node IDs. Consequently, every application that uses a distributed data management program to handle data that must not be lost can manage data with high fault tolerance, without losing data on failure, and with efficient use of each node.

A principle configuration diagram of the present invention.
A diagram showing the structure of a node ID.
A system configuration diagram according to an embodiment of the present invention.
A configuration diagram of a node according to an embodiment of the present invention.
An example of generating virtual node IDs according to an embodiment of the present invention.
An example (part 1) of generating a replica data ID in the distributed data management program according to an embodiment of the present invention.
An example (part 2) of generating a replica data ID in the distributed data management program according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 shows the structure of a node ID. FIG. 2(A) shows a conventional node ID 200, and FIG. 2(B) shows the structure of a node ID 100 according to the present invention.

To address the problem described above, the present invention sets a node-specific value 120, unique to each physical node, in the remaining bits in the middle of the node ID 100, between the upper several bits 110 and the lower several bits 130 (FIG. 2(B)). Like the node-specific value set in the most significant several bits in the existing solution described above, this node-specific value is unique to each physical node and common to all of that node's virtual node IDs.

For example, if the node-specific value of a node is "0010" and it occupies the 9th to 12th bits of the node ID, the 9th to 12th bits of all of that node's virtual node IDs are the node-specific value "0010".
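The layout in this example can be sketched as follows. Several details are assumptions made for illustration: the ID is taken to be 16 bits wide, bits are counted from the most significant end (so bits 9 to 12 are followed by 4 lower bits), the node-specific value is derived by truncating a SHA-1 digest of a node address, and the upper and lower random parts come from Python's `random` module. The function names are hypothetical.

```python
import hashlib
import random

UPPER_BITS = 8        # bits 1-8: upper random value (assumed width)
NODE_VALUE_BITS = 4   # bits 9-12: node-specific value
LOWER_BITS = 4        # bits 13-16: lower random value (assumed width)


def node_specific_value(node_address: str) -> int:
    """Derive the node-specific value by hashing a value unique to the node."""
    digest = hashlib.sha1(node_address.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - NODE_VALUE_BITS)


def make_virtual_node_id(node_value: int, rng: random.Random) -> int:
    """Insert the node-specific value between upper and lower random values."""
    upper = rng.getrandbits(UPPER_BITS)
    lower = rng.getrandbits(LOWER_BITS)
    return (upper << (NODE_VALUE_BITS + LOWER_BITS)) | (node_value << LOWER_BITS) | lower


def extract_node_value(node_id: int) -> int:
    """Read back the node-specific value field from an ID."""
    return (node_id >> LOWER_BITS) & ((1 << NODE_VALUE_BITS) - 1)
```

Calling `make_virtual_node_id(0b0010, rng)` repeatedly yields IDs whose upper bits differ, so they scatter across the ID space, while `extract_node_value` returns `0b0010` for every one of them.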

The data ID of a replica is likewise calculated by changing the portion of the original data ID that corresponds to the node-specific value.

For example, if the node-specific value occupies the 9th to 12th bits of the node ID, the data ID of a replica takes a different value in its 9th to 12th bits. If the 9th to 12th bits of the original data ID are "1011", the 9th to 12th bits of the replicas' data IDs take values other than "1011", such as "1100", "1101", and "1110". The requirement on how these other values are chosen is that they be "distant" from the corresponding portion of the original data ID. Because the notion of "distant" differs for each distributed data management program to which the invention is applied, this specification does not limit how the value is changed. Representative examples are described later.
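One possible conversion is sketched below. Since the specification leaves the rule open, the choice made here (adding the replica number to the field, modulo 16) is purely an assumption. It happens to reproduce the values "1100", "1101", and "1110" from the example above. The 16-bit layout with 4 lower bits is likewise assumed.

```python
NODE_VALUE_BITS = 4   # width of the field at bits 9-12
LOWER_BITS = 4        # assumed number of bits below the field
FIELD_MASK = (1 << NODE_VALUE_BITS) - 1


def replica_data_id(data_id: int, replica_no: int) -> int:
    """Return a replica data ID that differs only in the node-specific field.

    The increment rule is a hypothetical example; any rule that moves the
    field "far enough" for the target system would fit the description.
    """
    if replica_no % (1 << NODE_VALUE_BITS) == 0:
        raise ValueError("replica_no must change the field value")
    field = (data_id >> LOWER_BITS) & FIELD_MASK
    new_field = (field + replica_no) % (1 << NODE_VALUE_BITS)
    cleared = data_id & ~(FIELD_MASK << LOWER_BITS)
    return cleared | (new_field << LOWER_BITS)
```

For an original data ID of `0b1010011010110101`, whose field is `1011`, replicas 1 to 3 get the fields `1100`, `1101`, and `1110`, and every other bit of the ID is unchanged.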

The distributed data management program of the present invention runs on commonly used computers (nodes) such as personal computers and servers. The instances of the program running on a plurality of nodes cooperate with one another to manage data.

To an application that uses the distributed data management program for data management, it appears as a single hash-table-like data management system, and the application can use it without being aware that the data is actually distributed.

The distributed data management program is intended mainly for use on the Internet and on local area networks based on the Internet Protocol (IP).

FIG. 3 is a system configuration diagram according to an embodiment of the present invention and shows the relationship among the nodes of the distributed data management program connected to a network 1200. There is no master-slave relationship among the nodes (300A to 300F); all nodes are peers. Depending on the skew in the ID space, the amount of data each node is responsible for may differ, but the nodes' roles are equal.

FIG. 4 is a configuration diagram of a node according to an embodiment of the present invention. The functions of the node 300 are described below. The node 300 comprises an own-node ID management unit 400, an other-node management unit 500, a hash table management unit 600, a communication unit 700, a search unit 800, a hash function 900, a virtual node ID generation unit 1000, and a replica data ID calculation unit 1100. Of these, the own-node ID management unit 400, the other-node management unit 500, the hash table management unit 600, the communication unit 700, the search unit 800, and the hash function 900 are common to conventional configurations. The parts specific to the present invention are the node-specific value insertion unit 1010 in the virtual node ID generation unit 1000 and the specific-value portion conversion unit 1110 in the replica data ID calculation unit 1100.

The own-node ID management unit 400 stores own-node IDs 410, the other-node management unit 500 stores node IDs 510 and addresses 520, and the hash table management unit 600 stores data IDs 610, keys 620, and data values 630, each in storage means such as memory. The hash function 900 is likewise assumed to be stored in storage means such as memory.
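The stored items listed above could be modeled with simple in-memory records, as in the following sketch. The class and field names are hypothetical. Only the kinds of items (own-node IDs 410, node ID 510 and address 520 pairs, and data ID 610, key 620, data value 630 triples) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OtherNodeEntry:
    """One row of the other-node management unit 500."""
    node_id: int   # node ID 510
    address: str   # address 520


@dataclass
class HashTableEntry:
    """One row of the hash table management unit 600."""
    data_id: int   # data ID 610
    key: str       # key 620
    value: bytes   # data value 630


@dataclass
class NodeState:
    """In-memory state of a node 300 (a simplified, assumed layout)."""
    own_node_ids: List[int] = field(default_factory=list)           # IDs 410
    other_nodes: List[OtherNodeEntry] = field(default_factory=list)
    hash_table: Dict[int, HashTableEntry] = field(default_factory=dict)
```

Keying `hash_table` by data ID is a design choice of this sketch. It makes lookup by data ID direct, whereas the text only says that the three items are stored together.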

When the node 300 starts up and calculates its own node ID, it uses the communication unit 700 to connect to the distributed data management system formed by the other nodes and acquires the current state of the system, information on the other nodes, and so on.

The communication unit 700 is used by each node running the distributed data management program to communicate and exchange information with other nodes.

A node's own node ID is generally calculated with the hash function 900, taking as input a value unique to the node such as its IP address or another unique identifier. As described above, a hash function here means a function that returns an output with no regularity, irrespective of any regularity in the input values. Typical hash functions include SHA-1 and MD5, but the method of the present invention does not limit the hash function 900 to any specific hash function, nor does it limit the size of the hash value that the hash function 900 produces.
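As a minimal sketch of this calculation, the following Python derives a node ID from a node-specific string. SHA-1 and the 32-bit ID width are chosen only for illustration, since the text fixes neither, and `node_id_from_address` is a hypothetical name.

```python
import hashlib

ID_BITS = 32  # assumed ID width, matching the 32-bit example used for FIG. 5


def node_id_from_address(address: str, id_bits: int = ID_BITS) -> int:
    """Hash a node-specific string (here an IP address) into an integer node ID."""
    digest = hashlib.sha1(address.encode("utf-8")).digest()
    # Keep only the top id_bits bits of the 160-bit SHA-1 digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - id_bits)


# Print the ID as a zero-padded bit string.
print(f"{node_id_from_address('192.0.2.10'):0{ID_BITS}b}")
```

Any other hash function or ID width could be substituted without changing the rest of the scheme.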

The own node ID management unit 400 manages the node's own node ID 410A calculated by such a hash function 900, and also manages the node's own virtual node IDs, described later, as node IDs 410B to 410C.

After the distributed data management program starts, the node 300 calculates its own node ID and receives the current state of the system (information on the other participating nodes and on the data that the newly joined node should manage) from other nodes through the communication unit 700.

The information on the other participating nodes (the node ID of each node and the communication destination information (address) of the node used by the communication unit 700) is managed in the other node management unit 500 of the node 300 as pairs of node ID 510 and address 520, one pair per other node, as node IDs 510A to 510C and addresses 520A to 520C.

When a mechanism such as virtual node IDs is used, the other node management unit 500 could manage the virtual node ID and address pairs grouped by physical node. In the present invention, however, management per physical node is not a requirement, so the description is simplified to pairs of node ID and address. In this case, the multiple virtual node IDs of another physical node appear non-contiguously in the other node management unit 500, and several entries may point to the same address.
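The flat table described above might look like the following sketch. The `NodeEntry` type, the ID values, and the addresses are all hypothetical; the only point illustrated is that entries are kept as plain (node ID, address) pairs, so virtual node IDs of one physical node interleave with those of others.

```python
from typing import NamedTuple


class NodeEntry(NamedTuple):
    node_id: int  # node ID 510 (may be a virtual node ID)
    address: str  # address 520 of the physical node behind that ID


# Hypothetical contents: two physical nodes with two virtual node IDs each,
# kept sorted by node ID rather than grouped by physical node.
other_nodes = sorted([
    NodeEntry(0x9C6ABCDE, "192.0.2.10:7000"),
    NodeEntry(0x1A600000, "192.0.2.10:7000"),
    NodeEntry(0xE0312345, "192.0.2.20:7000"),
    NodeEntry(0x4F300000, "192.0.2.20:7000"),
])

# The set of distinct physical nodes can still be recovered from the addresses.
physical = {entry.address for entry in other_nodes}

for entry in other_nodes:
    print(f"{entry.node_id:08X} -> {entry.address}")
```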

The format of the node IDs in the own node ID management unit 400 and the other node management unit 500 is omitted here, but has a structure corresponding to the node ID 100 in FIG. 2(B).

Through the communication unit 700, the node 300 receives from the other nodes three pieces of information for each item of data it is responsible for: the data ID, the key, and the data value.

In the hash table management unit 600, this information is managed per item of data as a combination of data ID 610, key 620, and data value 630, as data IDs 610A to 610C, keys 620A to 620C, and values 630A to 630C. The data IDs 610A to 610C basically have the same structure as the node ID 100, and for each duplicate of the data the portion corresponding to the node unique value 120 is converted by the unique value part conversion unit 1110 of the duplicate data ID calculation unit 1100, described later.

After the initialization performed at startup of the distributed data management program is complete, the node receives data search requests and storage requests from applications through the communication unit 700 and performs the corresponding processing.

A search request is passed to the search unit 800 of the node 300. A search request retrieves the data corresponding to a key 620. The search unit 800 of the node 300 applies the hash function to the key 620 to obtain the data ID 610, and looks up in the other node management unit 500 the node that manages that data ID. If the node itself manages the data specified by the key 620, it takes the corresponding data out of its own hash table management unit 600 and sends it to the requester. If another node manages the data, it obtains that node's address from the other node management unit 500 and forwards the request to that node.

If the node that should be managing the target data has left the system because of a failure or the like and the data cannot be obtained, the duplicate data ID calculation unit 1100, described later, calculates the data ID 610 of a duplicate from the key 620, and the request is forwarded to the node that manages that duplicate data ID 610. Besides retrieving duplicate data only when such a request fails, another possible arrangement, for speed, is to forward the request to the original and a fixed number of duplicates and adopt whichever data is returned first as the result of the request.
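The search path with fallback to a duplicate can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: responsibility is decided by integer distance, the `fetch` transport is hypothetical and returns `None` for an unreachable node, the duplicate data IDs are passed in ready-made, and the case where the local node itself holds the data is left out.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical transport: returns the stored value, or None if the node is
# unreachable or does not hold the requested data ID.
Fetch = Callable[[str, int], Optional[bytes]]


def responsible_address(data_id: int, other_nodes: Dict[int, str]) -> str:
    """Address of the node whose ID is closest to data_id (integer distance)."""
    nearest = min(other_nodes, key=lambda node_id: abs(node_id - data_id))
    return other_nodes[nearest]


def search(data_id: int, duplicate_ids: Iterable[int],
           other_nodes: Dict[int, str], fetch: Fetch) -> Optional[bytes]:
    """Try the original data ID first, then each duplicate data ID in turn."""
    for candidate in (data_id, *duplicate_ids):
        value = fetch(responsible_address(candidate, other_nodes), candidate)
        if value is not None:
            return value
    return None


# Illustrative state: node-a has failed, node-b holds the duplicate.
other_nodes = {0x12000000: "node-a", 0x12900000: "node-b"}
stores = {"node-b": {0x12A00000: b"value"}}


def fetch(address: str, data_id: int) -> Optional[bytes]:
    return stores.get(address, {}).get(data_id)


result = search(0x12200000, [0x12A00000], other_nodes, fetch)
print(result)
```

The faster variant mentioned in the text would send the request for the original and several duplicates concurrently and keep the first reply, which this sequential sketch does not attempt.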

A data storage request from an application begins by applying the hash function to the key of the data passed by the application to calculate the data ID. The node that should manage the resulting data ID is found by referring to the other node management unit 500, and the data ID, together with the key and value passed by the application, is sent to that node's address. The node that receives this data manages it in its own hash table management unit 600.

When data duplication is used, the duplicate data ID calculation unit 1100 calculates the duplicate data IDs when the storage request is received. A specific example of the calculation performed by the unique value part conversion unit 1110 is described later.

Thereafter, as when storing the original data, the node that manages each generated duplicate data ID is found through the other node management unit 500, and the duplicate data ID, key, and value are sent to that node. The node that receives the duplicate data manages it in its own hash table management unit 600.
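The storage path for the original and its duplicates can be sketched as a function that only plans which record goes to which address. Everything named here is an assumption made for illustration: SHA-1 truncated to 32 bits as the hash, integer distance as the responsibility rule, and a caller-supplied `duplicate_ids_of` function standing in for the duplicate data ID calculation unit 1100.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[int, str, bytes]  # (data ID 610, key 620, data value 630)


def data_id_of(key: str) -> int:
    """32-bit data ID: the first 4 bytes of SHA-1(key) (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


def responsible_address(data_id: int, other_nodes: Dict[int, str]) -> str:
    """Address of the node whose ID is closest to data_id (integer distance)."""
    nearest = min(other_nodes, key=lambda node_id: abs(node_id - data_id))
    return other_nodes[nearest]


def plan_store(key: str, value: bytes, other_nodes: Dict[int, str],
               duplicate_ids_of: Callable[[int], Iterable[int]]
               ) -> List[Tuple[str, Record]]:
    """Return (address, record) pairs for the original and every duplicate."""
    original = data_id_of(key)
    ids = [original, *duplicate_ids_of(original)]
    return [(responsible_address(i, other_nodes), (i, key, value)) for i in ids]


other_nodes = {0x20000000: "node-a", 0xA0000000: "node-b"}
# Hypothetical duplicate rule: flip the top bit of the 4-bit unique value part.
messages = plan_store("user:42", b"alice", other_nodes,
                      lambda d: [d ^ (0b1000 << 20)])
for address, (data_id, key, value) in messages:
    print(address, f"{data_id:08X}", key, value)
```

The order in which the planned messages are actually sent is deliberately left open here.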

To maintain consistency between the original and the duplicates of the data, some transaction management mechanism would generally be applied. The present invention therefore does not limit the order in which the original and duplicate data are written to the order used in this embodiment.

Examples of the operation of the node unique value insertion unit 1010 and the unique value part conversion unit 1110 of the present invention are described below.

FIG. 5 shows an example of virtual node ID generation according to an embodiment of the present invention, performed by the virtual node ID generation unit 1000 and the node unique value insertion unit 1010. Here, as an example, the node ID is a 32-bit value. The upper 8 bits are the upper random value 110, the following 4 bits are the node unique value 120, and the lower 20 bits are the lower random value 130. If node ID a is the original node ID, the node unique value (0110) in node ID a is fixed as the node unique value of this physical node. From then on, whenever a virtual node ID (node ID b, node ID c, ...) is calculated, the node unique value portion (bits 9 to 12 from the top) is always set to this node unique value (0110).

The upper and lower random values used when calculating a virtual node ID are random values calculated by any means. Typically, the value returned by applying a hash function to the node's IP address with the sequence number of the virtual node ID appended is used.
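The generation of virtual node IDs with a fixed node unique value can be sketched as below, using the 8 + 4 + 20 bit layout of the FIG. 5 example. SHA-1 truncated to 32 bits, the `#index` suffix, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import List

UNIQUE_BITS, LOWER_BITS = 4, 20  # bits 9-12 from the top, then 20 lower bits
ID_MASK = 0xFFFFFFFF
UNIQUE_MASK = ((1 << UNIQUE_BITS) - 1) << LOWER_BITS  # 0x00F00000


def hash32(text: str) -> int:
    """First 32 bits of SHA-1(text), used as the source of random bits."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:4], "big")


def unique_value(node_id: int) -> int:
    """Extract the 4-bit node unique value from a 32-bit ID."""
    return (node_id & UNIQUE_MASK) >> LOWER_BITS


def insert_unique_value(random_id: int, unique: int) -> int:
    """Overwrite bits 9-12 of random_id with the given node unique value."""
    return (random_id & ID_MASK & ~UNIQUE_MASK) | (unique << LOWER_BITS)


def virtual_node_ids(address: str, count: int) -> List[int]:
    """Original node ID followed by count - 1 virtual IDs sharing its unique value."""
    original = hash32(address)
    fixed = unique_value(original)
    ids = [original]
    for index in range(1, count):
        ids.append(insert_unique_value(hash32(f"{address}#{index}"), fixed))
    return ids


ids = virtual_node_ids("192.0.2.10", 4)
for node_id in ids:
    print(f"{node_id:032b}")
```

Every printed ID carries the same four bits in positions 9 to 12, while the surrounding 8 and 20 bits vary with the hash input.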

Next, the method by which the unique value part conversion unit 1110 in the duplicate data ID calculation unit 1100 calculates duplicate data IDs is described using the examples of FIG. 6 and FIG. 7.

FIG. 6 shows an example of duplicate data ID generation in the distributed data management program according to an embodiment of the present invention (when IDs are treated as integers and their difference is taken as the distance), and FIG. 7 shows an example of duplicate data ID generation according to an embodiment of the present invention (when the value of the exclusive OR between ID bit strings is taken as the distance). The method of calculating the duplicate data ID depends on the criterion used to decide which node stores which data.

FIG. 6 is an example of duplicate data ID generation in a distributed data management program that uses the most common criterion: node IDs and data IDs are regarded as integers, and the data is stored on the node whose node ID is closest to the data ID, or on the node whose node ID is smaller than the data ID as an integer and closest to it.
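The two integer-based criteria just mentioned can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: a node ID equal to the data ID counts as a match for the second rule, and when no node ID lies below the data ID the lookup wraps around to the largest node ID, as on a ring.

```python
import bisect
from typing import Iterable


def nearest_node(data_id: int, node_ids: Iterable[int]) -> int:
    """Node ID with the smallest absolute integer difference from data_id."""
    return min(node_ids, key=lambda node_id: abs(node_id - data_id))


def nearest_node_below(data_id: int, node_ids: Iterable[int]) -> int:
    """Largest node ID that does not exceed data_id.

    Assumption: if every node ID is larger than data_id, index -1 selects the
    largest node ID, which gives ring-style wrap-around.
    """
    ordered = sorted(node_ids)
    index = bisect.bisect_right(ordered, data_id) - 1
    return ordered[index]


nodes = [100, 200, 300]
print(nearest_node(260, nodes), nearest_node_below(260, nodes))
```

For the data ID 260 the two rules pick different nodes, which is why the duplicate data ID calculation has to be matched to the rule in use.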

In the original data ID, the portion corresponding to the node unique value of a node ID (bits 9 to 12) is (0010). A duplicate data ID is generated by having the unique value part conversion unit 1110 convert this portion. In this example, the unique value part (0010) of the data ID is XORed (exclusive OR) with a bit mask for each duplicate data ID (1000, 0100, 1100, 0010), which converts the unique value part to (1010, 0110, 1110, 0000) respectively.

As a result, each duplicate data ID closely resembles the original data ID in the upper 8 bits, while the 4 bits corresponding to the unique value take a value far from that of the original data ID when distance is measured as the integer difference. This reduces the probability that the physical node responsible for the original data ID through one of its virtual node IDs also manages the duplicate data.
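The FIG. 6 conversion can be reproduced with a few lines of Python. The masks and the unique value part (0010) come from the text; the full 32-bit data ID `0xA5212345` and the function names are made up for illustration.

```python
from typing import List, Sequence

LOWER_BITS = 20   # the unique value part sits above the 20 lower bits
UNIQUE_BITS = 4
INTEGER_DISTANCE_MASKS = (0b1000, 0b0100, 0b1100, 0b0010)  # masks from FIG. 6


def unique_part(data_id: int) -> int:
    """Bits 9-12 (from the top) of a 32-bit data ID."""
    return (data_id >> LOWER_BITS) & ((1 << UNIQUE_BITS) - 1)


def duplicate_data_ids(data_id: int,
                       masks: Sequence[int] = INTEGER_DISTANCE_MASKS) -> List[int]:
    """XOR each mask into the unique value part, leaving all other bits as is."""
    return [data_id ^ (mask << LOWER_BITS) for mask in masks]


original = 0xA5212345  # hypothetical data ID whose unique value part is 0010
duplicates = duplicate_data_ids(original)
print([f"{unique_part(d):04b}" for d in duplicates])
# prints ['1010', '0110', '1110', '0000']
```

Only bits 9 to 12 change, so each duplicate keeps the upper 8 bits and lower 20 bits of the original data ID.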

FIG. 7 is an example of duplicate data ID generation in a distributed data management program that uses a different criterion: the value (as an integer) of the exclusive OR of a node ID and a data ID is regarded as the distance between the IDs, and the data is stored on the node whose node ID has the smallest such distance.

As in the example above, the portion of the original data ID corresponding to the node unique value of a node ID (bits 9 to 12) is (0010). A duplicate data ID is generated by having the unique value part conversion unit 1110 convert this portion. In this example, the unique value part (0010) of the data ID is XORed (exclusive OR) with a bit mask for each duplicate data ID (1111, 0111, 1011, 0011), which converts the unique value part to (1101, 0101, 1001, 0001) respectively.

As a result, each duplicate data ID closely resembles the original data ID in the upper 8 bits, while the 4 bits corresponding to the unique value take a value far from that of the original data ID when distance is measured by exclusive OR. This reduces the probability that the physical node responsible for the original data ID through one of its virtual node IDs also manages the duplicate data.
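The FIG. 7 case can be sketched in the same way, this time with XOR as the distance. The masks and the unique value part (0010) come from the text; the data ID `0xA5212345`, the two-node table, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

LOWER_BITS = 20
XOR_DISTANCE_MASKS = (0b1111, 0b0111, 0b1011, 0b0011)  # masks from FIG. 7


def xor_distance(a: int, b: int) -> int:
    """Distance between two IDs: their exclusive OR read as an integer."""
    return a ^ b


def duplicate_data_ids(data_id: int,
                       masks: Sequence[int] = XOR_DISTANCE_MASKS) -> List[int]:
    """XOR each mask into bits 9-12 of the data ID."""
    return [data_id ^ (mask << LOWER_BITS) for mask in masks]


def responsible_node(data_id: int, nodes: Dict[int, str]) -> str:
    """Name of the node whose ID has the smallest XOR distance to data_id."""
    return nodes[min(nodes, key=lambda node_id: xor_distance(node_id, data_id))]


original = 0xA5212345  # hypothetical data ID whose unique value part is 0010
duplicates = duplicate_data_ids(original)
nodes = {0xA5200000: "node-a", 0xA5D00000: "node-b"}  # unique values 0010 and 1101

print(responsible_node(original, nodes), responsible_node(duplicates[0], nodes))
# prints: node-a node-b
```

Under XOR distance, the distance between the original and each duplicate is exactly the mask shifted into bits 9 to 12, so a mask with high-order bits set moves the duplicate far from the original.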

The method shown in the above embodiment suppresses the bias of node IDs in the ID space described in the problem above, and reduces the probability that the same physical node manages both the original and a duplicate of the data. These two properties are in a trade-off: if the random value placed above the node unique value (a value with no regularity, obtained from the hash function) is small in size, the bias in the ID space becomes large; conversely, if it is made large, the probability that the same node manages both the original and a duplicate increases.

Since the present invention changes only the method of generating node IDs and the method of generating and calculating data IDs, the rest of the specification of a distributed data management program can be carried over unchanged, and the invention can be applied to all of the consistent hashing, distributed hash table, and distributed key-value store schemes mentioned above.

The functions of the components of the node 300 shown in FIG. 4 can also be built as a program, which can be installed and executed on a computer used as a node, or distributed over a network.

The program so built can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

100 Node ID
110 Upper random value
120 Node unique value
130 Lower random value
200 Node ID
210 Random value
300 Node
400 Own node ID management unit
410 Node ID
500 Other node management unit
510 Node ID
520 Address
600 Hash table management unit
610 Data ID
620 Key
630 Data value
700 Communication unit
800 Search unit
900 Hash function
1000 Virtual node ID generation unit
1010 Node unique value insertion unit
1100 Duplicate data ID calculation unit
1110 Unique value part conversion unit
1200 Network

Claims

A distributed data management apparatus in which target data is managed by a node having a node ID close to a data ID, the data ID being a hash value obtained using a hash function, that is, a function that returns an output with no regularity irrespective of any regularity in its input values, the apparatus comprising:
hash function storage means for holding the hash function; and
virtual node ID generating means for generating a plurality of virtual node IDs to be set for a given physical node,
wherein the virtual node ID generating means includes
node unique value insertion means for generating the virtual node IDs by setting a node unique value, calculated using the hash function of the hash function storage means, to a value unique to each physical node between an upper random value and a lower random value calculated by any means.

The distributed data management apparatus according to claim 1, further comprising:
communication means for acquiring, from another node, the data ID, key, and data of the data that the apparatus itself is to handle; and
duplicate data ID calculating means for calculating a duplicate data ID indicating a duplicate of the data,
wherein the duplicate data ID calculating means includes
unique value part conversion means for converting the portion of the original data ID acquired by the communication means that corresponds to the node unique value, by combining a value of a node ID different from the node ID holding the original data with a hash value.

A distributed data management method in an apparatus in which target data is managed by a node having a node ID close to a data ID, the data ID being a hash value obtained using a hash function, that is, a function that returns an output with no regularity irrespective of any regularity in its input values, wherein
virtual node ID generating means of the apparatus
generates virtual node IDs by setting a node unique value, calculated using the hash function, to a value unique to each physical node between an upper random value and a lower random value calculated by any means.

The distributed data management method according to claim 3, wherein
duplicate data ID calculating means of the apparatus
acquires, from another node via communication means, the data ID, key, and data of the data that the apparatus itself is to handle, and
converts the portion of the original data ID acquired by the communication means that corresponds to the node unique value.

A distributed data management program for causing a computer to function as each means of the distributed data management device according to claim 1.