JP5508346B2

JP5508346B2 - Distributed data management system, distributed data management method, and distributed data management program

Info

Publication number: JP5508346B2
Application number: JP2011131524A
Authority: JP
Inventors: 道生入江; 豪生西村; 雅志金子
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-06-13
Filing date: 2011-06-13
Publication date: 2014-05-28
Anticipated expiration: 2031-06-13
Also published as: JP2013003677A

Description

The present invention relates to a distributed data management system, a distributed data management method, and a distributed data management program for managing data by distributing it across a plurality of servers.

Distributed data management systems are used for large-scale data management, such as databases, to obtain high processing performance by connecting a plurality of servers. A distributed data management system is a distributed system in which a plurality of servers operate cooperatively, and each server manages its share of the data managed by the system as a whole. Such systems are used in fields that demand strong real-time behavior, such as medical care and telecommunications, and they are expected to answer a data acquisition request from a client within a fixed time defined for each client application.

Consistent hashing is one technique for managing data distributed across a plurality of servers (see Non-Patent Document 1). As shown in FIG. 7, consistent hashing assigns an identifier to each of a plurality of nodes 130 (130A to 130F), which are servers implementing a distributed data management program, and to each item of data managed by the nodes 130. The range of data identifiers managed by a given node 130 lies between the identifier of that node 130 and the identifier of the node 130 that has the largest identifier among the nodes 130 whose identifiers are smaller than that of the given node. An arrangement of the nodes 130 and the data in order of identifier value, as in FIG. 7, is called an ID space in consistent hashing. Because the identifiers of the nodes 130 and of the data are determined by a method that yields effectively random values, such as a hash function, the amount of data assigned to each node 130 is probabilistically equal, so the distributed data management system can distribute load nearly evenly across the nodes 130.
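The range rule described above can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Document 1: the function names `make_id` and `responsible_node`, the use of SHA-1, the 32-bit identifier width, the inclusive upper bound of each range, and the wrap-around at the top of the ID space are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import List


def make_id(key: str, bits: int = 32) -> int:
    """Map a string to a pseudo-random identifier in [0, 2**bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def responsible_node(node_ids: List[int], data_id: int) -> int:
    """Return the identifier of the node whose range contains data_id.

    Assumption: a node covers identifiers greater than its predecessor's
    identifier up to and including its own, and the ID space wraps around.
    """
    ring = sorted(node_ids)
    index = bisect.bisect_left(ring, data_id)  # first node id >= data_id
    return ring[index % len(ring)]             # wrap to the smallest id
```

For example, with node identifiers 10, 40, and 70, a data identifier of 25 falls in the range of node 40, and a data identifier of 90 wraps around to node 10.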

Each node 130 holds an address table that associates the identifier of every other node 130 with that node's address (such as an IP address). In a distributed data management system using consistent hashing, even when the number of nodes 130 in the system increases or decreases, only a small part of the data shared among the nodes 130 needs to be reassigned. Viewed from the system as a whole, only a fixed amount of data movement occurs regardless of the number of nodes 130.

D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, “Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web,” in Proceedings of the 29th ACM Symposium on Theory of Computing (STOC'97), pp.654-663, May 1997.

Non-Patent Document 1 does not disclose a specific method for transferring data when the number of nodes 130 is changed dynamically. The simplest approach is, when the number of nodes 130 changes, to stop processing all data transmission requests (sending data to a client that needs it) and data storage requests (storing data at the node 130 responsible for it), transfer all data from the previous responsible nodes 130 to the new responsible nodes 130, and then resume processing of transmission and storage requests.

However, in fields that require strong real-time behavior, such as medical care and telecommunications, this approach of stopping transmission and storage requests and transferring all the data is difficult to apply, because the transfer takes a long time and delays the resumption of request processing.

An alternative is to transfer a piece of data from its previous responsible node 130 to the new node 130 only when that data is actually needed on the new node 130, and then process it. Compared with transferring all data in advance, transfers under this approach occur sporadically, so data can be transferred while transmission and storage requests continue to be accepted, and the impact on real-time behavior is small. On the other hand, the transfer takes longer to complete: it does not finish until every piece of data has actually been needed at least once, for example by a transmission or storage request.

With this kind of data transfer, a node 130 newly added to the distributed data management system may receive a data transmission or storage request, confirm that it does not hold the target data, and at that point search for the node 130 that was previously responsible for the data.

In general, as shown in FIG. 8(a), the previous responsible node 130A is likely to be the node 130 whose identifier is larger than, and closest to, that of the newly added node 130G, so the new responsible node 130G can ask node 130A whether it holds the data. Data transfer from the adjacent node 130 in the ID space thus works when the number of nodes 130 added at one time is small (for example, only one).

When many nodes 130 are added at one time, however, one or more other new nodes 130 (130H, 130I, 130J) may lie between the new responsible node 130G and the previous responsible node 130A, as shown in FIG. 8(b). In that case, even if the new responsible node 130G that received the transmission or storage request simply queries the adjacent node 130H, node 130H has itself only just been added and holds no data, so node 130G cannot obtain the desired data.

If querying the adjacent node 130 does not yield the desired data, one approach is to query the next nodes 130 (130I, 130J, ...) one after another. With this approach, finding the actual data takes multiple hops (N hops), in proportion to the number N of nodes 130 added contiguously between the new responsible node 130G and the previous responsible node 130A, so real-time behavior degrades in proportion to N.
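The sequential probing described above can be illustrated with a short sketch. It only models the order of queries in the situation of FIG. 8(b); the function name `probe_sequence`, the representation of the ID space as a list of node names ordered by identifier, and the set of nodes holding the data are assumptions made for the example.

```python
from typing import List, Set


def probe_sequence(ring: List[str], holds_data: Set[str], start: str) -> List[str]:
    """Return the nodes queried, in order, until one holds the data.

    ring: node names ordered by identifier (the ID space).
    holds_data: nodes that currently hold the requested data.
    start: the new responsible node that received the request.
    """
    begin = ring.index(start)
    queried = []
    for step in range(1, len(ring)):
        node = ring[(begin + step) % len(ring)]  # next neighbour, wrapping
        queried.append(node)
        if node in holds_data:
            break
    return queried
```

With the ID space ordered G, H, I, J, A and only node A holding the data, node G queries H, I, and J before reaching A, so the number of fruitless queries grows with the number of newly added nodes that sit between G and A.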

As described above, the identifiers of the nodes 130 are assigned randomly by a hash function or the like, so N is not constant, but it grows in proportion to the ratio of the number of added nodes 130 to the total number of nodes 130 before the addition.

One approach that preserves real-time behavior is for the new responsible node 130G, when it receives a transmission or storage request and does not hold the desired data, to query all the other nodes 130. This does not require multiple hops to find the data, but because the query goes to the entire system, the processing load grows in proportion to the total number of nodes 130 and scalability suffers severely.

The present invention was made in view of these circumstances, and its object is to provide a distributed data management system, a distributed data management method, and a distributed data management program capable of transferring data in real time and without impairing scalability, even when a plurality of servers are added.

To solve the above problem, the distributed data management system of the present invention is a distributed data management system in which a plurality of servers manage data in a distributed manner, wherein each server comprises: a first storage unit that stores data in association with an identifier of the data; a second storage unit that stores an address table associating identifiers of the plurality of servers with their addresses; an address table generation unit that, when the number of servers increases or decreases, newly generates the address table and stores it in the second storage unit together with the generation time of the address table; a request signal acquisition unit that acquires a request signal transmitted from a client, the request signal including an identifier of data and the generation time of the data and requesting transmission or storage of the data; a responsible server identification unit that identifies the server responsible for the data based on the data identifier included in the acquired request signal and the server identifiers in the latest address table stored in the second storage unit; a request signal transfer unit that transfers the request signal to the identified responsible server using an address in the latest address table stored in the second storage unit; a response unit that either refers to the first storage unit based on the data identifier included in a request signal requesting transmission of data, reads the corresponding data, and transmits it to the client, or refers to the first storage unit based on the data identifier included in a request signal requesting storage of data, determines whether the corresponding data was stored successfully, and transmits a storage success/failure signal indicating the result to the client; a previous responsible server identification unit that, when data corresponding to the data identifier included in the request signal is not stored in the first storage unit, identifies the previous responsible server for the data using, among the address tables stored in the second storage unit, the most recent address table whose generation time is older than the data generation time included in the request signal; a data transfer unit that transfers the data and the identifier of the data between the server and the identified previous responsible server; and a data update unit that stores the transferred data and the identifier of the data in the first storage unit.

With this configuration, an address table carrying its generation time is created each time the number of servers changes and is stored in the second storage unit. When the desired data is not stored in the data table of the responsible server's first storage unit, the previous responsible server is identified from a past address table selected based on the data generation time and the address table generation times, and the data is transferred from that previous responsible server to the responsible server. Therefore, even when a plurality of servers are added, data can be transferred in real time and without impairing scalability.

In the distributed data management system described above, it is desirable that the first storage unit store the data, the identifier of the data, and the generation time of the data in association with one another; that the data update unit, when storing the transferred data and its identifier in the first storage unit, store as the generation time of the data either the time at which the data update unit acquired the transferred data and identifier or the time at which it stored them in the first storage unit; and that the response unit attach the generation time of the data to the data or to the storage success/failure signal and transmit it to the client.

With this configuration, when data is transferred from the previous responsible server to the responsible server, the time at which the responsible server acquired the data, or the time at which it stored the data in the data table of its first storage unit, is stored in the first storage unit as the data generation time and is also transmitted to the client. After the transfer, the client can therefore generate request signals based on the new data generation time.

The distributed data management method of the present invention is a distributed data management method in which a plurality of servers manage data in a distributed manner, wherein each server comprises a first storage unit that stores data in association with an identifier of the data and a second storage unit that stores an address table associating identifiers of the plurality of servers with their addresses, and, when the number of servers increases or decreases, newly generates the address table and stores it in the second storage unit together with the generation time of the address table. The method includes: a step in which one of the servers acquires a request signal transmitted from a client, the request signal including an identifier of data and the generation time of the data and requesting transmission or storage of the data; a step in which that server identifies the server responsible for the data based on the data identifier included in the acquired request signal and the server identifiers in the latest address table stored in the second storage unit; a step in which that server transfers the request signal to the identified responsible server using an address in the latest address table stored in the second storage unit; and a step in which the responsible server either refers to the first storage unit based on the data identifier included in a request signal requesting transmission of data, reads the corresponding data, and transmits it to the client, or refers to the first storage unit based on the data identifier included in a request signal requesting storage of data, determines whether the corresponding data was stored successfully, and transmits a storage success/failure signal indicating the result to the client. The method further includes, when data corresponding to the data identifier included in the request signal is not stored in the first storage unit: a step in which the responsible server identifies the previous responsible server for the data using, among the address tables stored in the second storage unit, the most recent address table whose generation time is older than the data generation time included in the request signal; a step in which the responsible server transfers the data and the identifier of the data between itself and the previous responsible server; and a step in which the responsible server stores the transferred data and the identifier of the data in the first storage unit.

In the distributed data management method described above, it is desirable that the first storage unit store the data, the identifier of the data, and the generation time of the data in association with one another; that the responsible server, when storing the transferred data and its identifier in the first storage unit, store as the generation time of the data either the time at which the responsible server acquired the transferred data and identifier or the time at which it stored them in the first storage unit; and that the responsible server attach the generation time of the data to the data or to the storage success/failure signal and transmit it to the client.

The present invention can also be embodied as a distributed data management program that causes a computer to function as each functional unit of a server of the distributed data management system described above.

According to the present invention, data can be transferred in real time and without impairing scalability, even when a plurality of servers are added.

FIG. 1 is a diagram schematically showing a distributed data management system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the server of FIG. 1.
FIG. 3 is a diagram showing an example of a data table.
FIG. 4 is a diagram showing an example of an address table.
FIG. 5 is a diagram showing an ID space, for explaining an operation example of the distributed data management system according to the embodiment of the present invention.
FIG. 6 is a flowchart for explaining an operation example of the distributed data management system according to the embodiment of the present invention.
FIG. 7 is a diagram showing an ID space in the prior art.
FIG. 8 is a diagram for explaining the case where nodes (servers) are added to an ID space in the prior art: (a) shows an ID space to which one node has been added, and (b) shows an ID space to which a plurality of nodes have been added.

Embodiments of the present invention are described below with reference to the drawings as appropriate. As shown in FIG. 1, the distributed data management system 1 according to an embodiment of the present invention is a system that receives request signals from a client 10 via a load balancer 20 and responds to them, and it includes a plurality of servers 30 (30A, 30B, ..., 30N). The servers 30 manage data in a distributed manner and are communicably connected to one another via a network (not shown).

<Client and load balancer>
The client 10 stores data identifiers in association with the generation times of the corresponding data. In response to a user's operation of an input device (keyboard, mouse, or the like), the client 10 generates a request signal requesting either that data stored in the distributed data management system 1 be transmitted to the client 10 or that data be stored in the distributed data management system 1, and transmits the generated request signal to the distributed data management system 1. The request signal includes the identifier (data identifier) and the generation time (data generation time) of the data to be transmitted or stored. The load balancer 20 selects one of the servers 30 at random and forwards the request signal from the client 10 to the selected server 30. Initially, the client 10 generates a request signal that requests storage of data and does not include a data generation time, and transmits it to the distributed data management system 1. The client 10 then obtains the data generation time that the server 30 attaches to its response signal, as described later, and stores it. Thereafter, the client 10 can generate request signals using the data identifier and the data generation time as arguments.
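The client-side bookkeeping described above might look like the following sketch. The class name `ClientState`, the dictionary layout of the request signal, and the field names `kind`, `data_id`, and `data_generated_at` are hypothetical; the source specifies only that a request carries a data identifier and, once known, the data generation time.

```python
from typing import Dict


class ClientState:
    """Remembers the generation time last reported for each data identifier."""

    def __init__(self) -> None:
        self.generation_times: Dict[str, float] = {}

    def build_request(self, kind: str, data_id: str) -> dict:
        """Build a request signal; kind is 'send' or 'store' (assumed names)."""
        if kind not in ("send", "store"):
            raise ValueError("kind must be 'send' or 'store'")
        request = {"kind": kind, "data_id": data_id}
        # The very first storage request has no generation time to include.
        if data_id in self.generation_times:
            request["data_generated_at"] = self.generation_times[data_id]
        return request

    def record_response(self, data_id: str, data_generated_at: float) -> None:
        """Store the generation time attached to a server response."""
        self.generation_times[data_id] = data_generated_at
```

Under these assumptions, the first `build_request("store", ...)` call omits the generation time, and later requests for the same identifier carry whatever time the most recent response reported.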

<Server>
Each of the servers 30 manages the data it is responsible for among the data to be managed by the distributed data management system 1 as a whole. The server 30 is composed of a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read-Only Memory), input/output circuits, and the like. As shown in FIG. 2, it includes the following functional units: a first storage unit 31-1, a second storage unit 31-2, an address table generation unit 32, a request signal acquisition unit 33, a responsible server identification unit 34, a request signal transfer unit 35, a response unit 36, a previous responsible server identification unit 37, a data transfer unit 38, and a data update unit 39.

<First storage unit>
The first storage unit 31-1 stores the data table 31a shown in FIG. 3. The data table 31a associates the data that the server 30 is responsible for with the identifier of that data (data identifier) and the generation time of that data (data generation time).

<Second storage unit>
The second storage unit 31-2 stores the address table 31b shown in FIG. 4. The address table 31b associates the identifiers (server identifiers) of all the servers 30 constituting the distributed data management system 1 with the addresses (such as IP addresses) of those servers 30, and one address table 31b is stored for each address table generation time.

<Address table generation unit>
When the number of servers 30 constituting the distributed data management system 1 increases or decreases, the address table generation unit 32 newly generates an address table 31b and stores it in the second storage unit 31-2 together with its address table generation time. In other words, an address table 31b stored in the second storage unit 31-2 is not deleted when the number of servers 30 changes and a new address table 31b is generated. It is kept as history, so the number of stored address tables grows with each change in the number of servers 30. Known techniques can be used, for example, to assign an address to an added server 30 and to share the change in servers 30 and the newly assigned address with the other servers 30, so a detailed description is omitted here.
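The append-only behavior described above can be sketched as follows. The class name `AddressTableHistory` and its methods are hypothetical, and the sketch assumes that tables are regenerated in chronological order; how membership changes are detected and shared between servers is outside its scope, as in the source.

```python
from typing import Dict, List, Tuple


class AddressTableHistory:
    """Keeps every generated address table together with its generation time."""

    def __init__(self) -> None:
        self._tables: List[Tuple[float, Dict[int, str]]] = []

    def regenerate(self, servers: Dict[int, str], generated_at: float) -> None:
        """Record a new table (server identifier -> address); never delete old ones."""
        self._tables.append((generated_at, dict(servers)))  # copy to freeze the snapshot

    def latest(self) -> Dict[int, str]:
        """Return the most recently generated table."""
        return self._tables[-1][1]

    def all_tables(self) -> List[Tuple[float, Dict[int, str]]]:
        """Return all tables, oldest first."""
        return list(self._tables)
```

Copying the mapping on each call keeps earlier snapshots unchanged even if the caller later mutates its own dictionary, which matters because old tables are consulted again when locating previously responsible servers.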

<Request signal acquisition unit>
The request signal acquisition unit 33 acquires the request signal transmitted from the client 10 via the load balancer 20 and outputs it to the responsible server identification unit 34.

<Responsible server identification unit>
The responsible server identification unit 34 acquires the request signal output from the request signal acquisition unit 33 and identifies the server responsible for the data based on the data identifier included in the request signal and the server identifiers in the latest address table 31b stored in the second storage unit 31-2. The responsible server identification unit 34 can identify the responsible server using an existing technique such as consistent hashing. When its own server 30 is identified as the responsible server, the responsible server identification unit 34 outputs the request signal to the response unit 36. When another server 30 is identified as the responsible server, it reads the address of the responsible server from the address table 31b stored in the second storage unit 31-2 and outputs the request signal and the read address to the request signal transfer unit 35.
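The routing decision described above can be sketched as a single function. The name `route_request`, the return values `"respond_locally"` and `"forward"`, and the use of a sorted-ring lookup for consistent hashing are assumptions made for illustration; the source states only that an existing technique such as consistent hashing can be used.

```python
import bisect
from typing import Dict, Optional, Tuple


def route_request(
    self_id: int, latest_table: Dict[int, str], data_id: int
) -> Tuple[str, Optional[str]]:
    """Decide whether this server answers the request or forwards it.

    latest_table maps server identifier -> address (the newest address table).
    Returns ("respond_locally", None) or ("forward", address_of_owner).
    """
    ring = sorted(latest_table)
    owner = ring[bisect.bisect_left(ring, data_id) % len(ring)]
    if owner == self_id:
        return ("respond_locally", None)
    return ("forward", latest_table[owner])
```

Because every server holds the full address table, the forwarding step in this sketch needs at most one transfer to reach the responsible server.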

<Request signal transfer unit>
The request signal transfer unit 35 acquires the request signal and the address output from the responsible server identification unit 34 and uses the address to transfer the request signal to the response unit 36 of the responsible server.

<Response unit>
When its own server 30 is the responsible server, the response unit 36 acquires either the request signal output from the responsible server identification unit 34 of its own server 30 or the request signal transferred by the request signal transfer unit 35 of another server 30 that is not the responsible server. When the request signal requests transmission of data, the response unit 36 refers to the data table 31a stored in the first storage unit 31-1 based on the data identifier included in the request signal, reads the data corresponding to the data identifier, and transmits it to the client 10. When the request signal requests storage of data, the response unit 36 refers to the data table 31a stored in the first storage unit 31-1 based on the data identifier included in the request signal, determines whether data corresponding to the data identifier is stored in the data table 31a, generates a storage success/failure signal indicating the result, and transmits it to the client 10.

Note that the own server 30, although it is the responsible server, may have only just been added to the distributed data management system 1, so that the desired data is not yet stored in its data table 31a. In this case, the response unit 36 outputs the request signal to the previous responsible server identification unit 37.

<Previous responsible server identification unit>
The previous responsible server identification unit 37 acquires the request signal output from the response unit 36, refers to the address tables 31b stored in the second storage unit 31-2 based on the data generation time included in the acquired request signal, and identifies the address table 31b whose address table generation time is the latest among those older than the data generation time. It then identifies the server that was responsible for the data immediately before (the previous responsible server) based on the data identifier included in the acquired request signal and the server identifiers of the identified address table 31b. Here, like the responsible server identification unit 34, the previous responsible server identification unit 37 can identify the previous responsible server using an existing technique such as consistent hashing. The address table 31b whose generation time is the latest among those older than the data generation time corresponds to the address table in effect before the responsible node for the desired data was updated. Using this address table 31b therefore identifies the node that was responsible before the update (or, if the responsible node has never been updated, the node that was responsible when the data was first stored in the distributed data management system 1), that is, the previous responsible node. The previous responsible server identification unit 37 reads the address of the own server 30 and the address of the previous responsible server from the address table 31b stored in the second storage unit 31-2 and outputs the request signal and the read addresses to the data transfer unit 38.
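The table selection described above can be sketched as a search over the stored address tables ordered by generation time. This is a minimal sketch under stated assumptions: the `AddressTable` class, the function names, the use of floating-point timestamps, and the successor-style lookup on the ID space are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class AddressTable:
    generated_at: float        # address table generation time
    servers: dict[int, str]    # server identifier -> address


def select_table(history: list[AddressTable], data_generated_at: float) -> AddressTable | None:
    """Pick the newest table generated strictly before the data generation time.

    `history` must be sorted by generated_at in ascending order.
    """
    times = [table.generated_at for table in history]
    index = bisect.bisect_left(times, data_generated_at)
    return history[index - 1] if index > 0 else None


def find_previous_responsible(
    history: list[AddressTable], data_id: int, data_generated_at: float
) -> tuple[int, str] | None:
    """Return (server identifier, address) of the previous responsible server."""
    table = select_table(history, data_generated_at)
    if table is None:
        return None  # no table predates the data
    server_ids = sorted(table.servers)
    index = bisect.bisect_left(server_ids, data_id)
    server_id = server_ids[index % len(server_ids)]  # wrap around the ID space
    return server_id, table.servers[server_id]
```

For example, if a table generated at time 10 lists servers 100 and 300 and a table generated at time 20 adds server 200, data with identifier 150 and generation time 15 resolves to server 300 under the older table, even though the latest table would assign it to server 200.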

<Data transfer unit>
When the own server 30 is the responsible server but does not hold the data, its data transfer unit 38 acquires the request signal and the addresses output from the previous responsible server identification unit 37, exchanges data with the data transfer unit 38 of the previous responsible server based on the acquired request signal and addresses, obtains the data and its data identifier from the previous responsible server, and outputs them to the data update unit 39.

<Data update unit>
The data update unit 39 acquires the data and the data identifier output from the data transfer unit 38 and stores the acquired data in the data table 31a of the first storage unit 31-1.

≪Data transfer≫
The following describes in detail the data transfer functions of the data transfer units 38, the data update units 39, and the response unit 36 of the own server 30 and the previous responsible server when the desired data is not stored in the data table 31a of the first storage unit 31-1 of the own server 30.

When the own server 30 is the current responsible server (the new responsible server) and does not hold the data, its data transfer unit 38 transmits a data transfer request signal, which includes the request signal and the address of the own server 30, to the data transfer unit 38 of the previous responsible server using the address of the previous responsible server.

The data transfer unit 38 of the previous responsible server acquires the data transfer request signal transmitted by the data transfer unit 38 of the new responsible server and outputs the data identifier of the request signal included in the acquired data transfer request signal to the data update unit 39.

The data update unit 39 of the previous responsible server acquires the data identifier output from the data transfer unit 38, refers to the data table 31a stored in the first storage unit 31-1 based on the acquired data identifier, reads the data corresponding to the data identifier, and outputs the data identifier and the read data to the data transfer unit 38. At this point, the data update unit 39 deletes the read data and its data identifier (and its data generation time) from the data table 31a.

The data transfer unit 38 of the previous responsible server acquires the data and the data identifier output from the data update unit 39 and transmits them to the data transfer unit 38 of the new responsible server using the address of the new responsible server.

The data transfer unit 38 of the new responsible server acquires the data and the data identifier transmitted by the data transfer unit 38 of the previous responsible server and outputs them to the data update unit 39.

The data update unit 39 of the new responsible server acquires the data and the data identifier output from the data transfer unit 38, stores them in the data table 31a of the first storage unit 31-1, and outputs an update notification signal to the response unit 36. When storing the data and the data identifier transferred from the previous responsible server in the data table 31a, the data update unit 39 of the new responsible server also stores, as the data generation time of the data, either the time at which it acquired the transferred data and data identifier or the time at which it stored them in the data table 31a.

The response unit 36 of the new responsible server acquires the update notification signal output from the data update unit 39 and, in response, transmits the data or a storage success/failure signal to the client 10. In doing so, the response unit 36 can read the data generation time stored in the data table 31a of the first storage unit 31-1 based on the data identifier included in the request signal, attach it to the data or the storage success/failure signal, and transmit it to the client 10.

On acquiring the data or the storage success/failure signal together with the data generation time transmitted by the response unit 36, the client 10 updates the data generation time that it stores in association with the data identifier, based on the acquired data generation time.

If the data update unit 39 of the previous responsible server cannot read the desired data (that is, the desired data is not in the data table 31a of the previous responsible server), it outputs a signal indicating that there is no data to the data transfer unit 38, and the data transfer unit 38 of the previous responsible server transmits that signal to the data transfer unit 38 of the new responsible server. The data transfer unit 38 of the new responsible server acquires the signal and outputs it to the response unit 36 via the data update unit 39 (a configuration that outputs it directly to the response unit 36, bypassing the data update unit 39, is also possible). The response unit 36 of the new responsible server generates a storage success/failure signal indicating that there is no data and transmits it to the client 10.
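The exchange between the two servers can be pictured with the following minimal sketch. It collapses the message passing between the data transfer units 38 and data update units 39 into direct method calls within one process, which is a simplification; the `Record` and `Server` classes, the method names `hand_over` and `pull_from`, and the optional `now` parameter are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class Record:
    data: bytes
    generated_at: float  # data generation time


@dataclass
class Server:
    name: str
    data_table: dict[int, Record] = field(default_factory=dict)

    def hand_over(self, data_id: int) -> bytes | None:
        """Previous responsible server: read the data and delete its entry."""
        record = self.data_table.pop(data_id, None)
        return None if record is None else record.data

    def pull_from(self, previous: Server, data_id: int, now: float | None = None) -> Record | None:
        """New responsible server: fetch the data, store it, and restamp it."""
        data = previous.hand_over(data_id)
        if data is None:
            return None  # stands in for the "no data" signal
        # The arrival time becomes the new data generation time.
        record = Record(data, time.time() if now is None else now)
        self.data_table[data_id] = record
        return record
```

After `new.pull_from(old, data_id)` succeeds, the entry exists only on the new responsible server and carries the new generation time; a second call returns `None`, matching the case where the previous responsible server no longer holds the data.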

<Operation example>
Next, an operation example of the distributed data management system 1 according to the embodiment of the present invention is described with reference to FIGS. 5 and 6 (see also FIGS. 1 to 4 as appropriate). In this operation example, as shown in FIG. 5, new servers 30G, 30H, 30I, and 30J are added between the servers 30F and 30A of the distributed data management system 1 made up of the servers 30A to 30F, and the addition of the servers 30G to 30J has changed the responsible server for a certain piece of data from the server 30A to the server 30H.

First, the client 10 transmits a request signal for a certain piece of data to the server 30F via the load balancer 20. When the request signal acquisition unit 33 of the server 30F acquires the request signal (step S1), the responsible server identification unit 34 of the server 30F identifies the responsible server for the data based on the data identifier included in the request signal and the server identifiers of all the servers 30 stored in the latest address table 31b.

In this operation example, the responsible server is the server 30H rather than the server 30F (No in step S2), so the request signal transfer unit 35 of the server 30F transmits the request signal to the response unit 36 of the server 30H, which is the responsible server (step S3). If the server 30F is the responsible server (Yes in step S2), the responsible server identification unit 34 of the server 30F outputs the request signal to the response unit 36 of the server 30F, and the flow proceeds to step S4.

After step S3, the response unit 36 of the server 30H, the responsible server, acquires the request signal and searches the data table 31a of the first storage unit 31-1 for the data based on the data identifier included in the acquired request signal. In this operation example, the desired data is not stored in the data table 31a of the server 30H (No in step S4), so the previous responsible server identification unit 37 identifies the previous responsible server for the data based on the data identifier included in the request signal and the server identifiers stored in the address table 31b whose address table generation time is the latest among those older than the data generation time included in the request signal (step S5). If the desired data is stored in the data table 31a of the server 30H (Yes in step S4), the flow proceeds to step S7.

After step S5, the data transfer unit 38 and the data update unit 39 of the server 30H (the responsible server) and the data transfer unit 38 and the data update unit 39 of the server 30A (the previous responsible server) transfer the data, and the data update unit 39 of the server 30H updates the data generation time (step S6). As a result of step S6, the data and the data identifier are stored in the data table 31a of the first storage unit 31-1, and the time at which the data and the data identifier were acquired or stored is recorded as the data generation time.

Next, the response unit 36 of the server 30H, the responsible server, responds to the client (step S7). When the request signal requests transmission of data, the response unit 36 transmits the data and the data generation time to the client 10. When the request signal requests storage of data, the response unit 36 transmits the storage success/failure signal and the data generation time to the client 10.
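Steps S1 to S7 of this operation example can be traced with a small in-memory simulation. This is a sketch under stated assumptions: only a read request is modeled, all servers share one process, and the numeric server identifiers (100 to 600 for 30A to 30F, 20 to 80 for 30G to 30J), the data identifier 30, the timestamps, and the function names are invented for illustration.

```python
from __future__ import annotations

import bisect


def successor(table: dict[int, str], data_id: int) -> str:
    """Name of the first server whose identifier is >= data_id, wrapping around."""
    ids = sorted(table)
    return table[ids[bisect.bisect_left(ids, data_id) % len(ids)]]


def handle_read(entry, data_id, data_time, tables, stores, now):
    """Follow steps S1 to S7 for a read request.

    tables: list of (address table generation time, {server id: name}), oldest first.
    stores: {server name: {data id: (data, data generation time)}}.
    Returns (responding server, data, data generation time, visited steps).
    """
    steps = ["S1", "S2"]
    owner = successor(tables[-1][1], data_id)  # S2: consult the latest table
    if owner != entry:
        steps.append("S3")                     # forward to the responsible server
    steps.append("S4")
    if data_id not in stores[owner]:
        older = [table for generated_at, table in tables if generated_at < data_time]
        if older:
            steps.append("S5")
            previous = successor(older[-1], data_id)
            if data_id in stores[previous]:
                steps.append("S6")
                data, _ = stores[previous].pop(data_id)
                stores[owner][data_id] = (data, now)  # restamp on arrival
    steps.append("S7")
    data, generated_at = stores[owner].get(data_id, (None, None))
    return owner, data, generated_at, steps


# Hypothetical identifiers: 30G to 30J sit between 30F (600) and 30A (100) on the ring.
old = {100: "30A", 200: "30B", 300: "30C", 400: "30D", 500: "30E", 600: "30F"}
new = {**old, 20: "30G", 40: "30H", 60: "30I", 80: "30J"}
tables = [(1.0, old), (10.0, new)]
stores = {name: {} for name in new.values()}
stores["30A"][30] = ("payload", 5.0)

first = handle_read("30F", 30, 5.0, tables, stores, now=12.0)
second = handle_read("30F", 30, 12.0, tables, stores, now=13.0)
```

In this setup the first request passes through every step and moves the data from 30A to 30H, while the second request, which carries the updated data generation time, finds the data on 30H and skips steps S5 and S6.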

The distributed data management system 1 according to the embodiment of the present invention generates an address table 31b carrying an address table generation time each time servers 30 are added or removed and stores it in the second storage unit 31-2. When the desired data is not stored in the data table of the first storage unit 31-1 of the responsible server, the system identifies the previous responsible server from a past address table 31b selected based on the data generation time and the address table generation times, and transfers the data from the identified previous responsible server to the new responsible server. Consequently, even when a plurality of servers 30 have been added, there is no need to query adjacent servers 30 one after another or to query all the servers 30 for the data. The presence of the data can be confirmed in one hop with the previous responsible server alone, and the data can be transferred in real time without sacrificing scalability.

In addition, when data is transferred from the previous responsible server to the new responsible server, the distributed data management system 1 according to the embodiment of the present invention stores in the data table 31a, as the data generation time, the time at which the new responsible server acquired the data or the time at which it stored the data in the data table 31a of its first storage unit 31-1, and also transmits that time to the client 10. After the data transfer, the client 10 can therefore generate request signals based on the new data generation time.
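On the client side, this amounts to keeping one data generation time per data identifier and overwriting it with the value returned by the server. The following is a minimal sketch; the `Client` class, the dictionary-shaped request, and the field names are assumptions, since the patent does not specify a message format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Client:
    # data identifier -> data generation time last reported by a server
    generation_times: dict[int, float] = field(default_factory=dict)

    def build_request(self, kind: str, data_id: int) -> dict:
        """Build a request signal carrying the data identifier and generation time."""
        return {
            "kind": kind,  # "get" or "store" (hypothetical labels)
            "data_id": data_id,
            "data_generated_at": self.generation_times[data_id],
        }

    def on_response(self, data_id: int, generated_at: float) -> None:
        """Record the generation time attached to the server's response."""
        self.generation_times[data_id] = generated_at
```

Because later requests carry the updated time, the server-side selection of a past address table starts from the table in effect when the data reached its current responsible server rather than from the original storage time.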

The embodiment of the present invention has been described above with reference to the drawings, but the present invention is not limited to that embodiment and may be modified as appropriate without departing from the gist of the invention. For example, the present invention can also be embodied as a distributed data management program that causes a computer to function as the server 30. The method of determining the identifiers of the servers 30 and the data is not limited to a hash function, and the address of a server 30 is not limited to an IP address. Each of the client 10 and the server 30 may be implemented by a single computer or by a plurality of computers.

1 Distributed data management system
30 Server (node)
31-1 First storage unit
31-2 Second storage unit
32 Address table generation unit
33 Request signal acquisition unit
34 Responsible server identification unit
35 Request signal transfer unit
36 Response unit
37 Previous responsible server identification unit
38 Data transfer unit
39 Data update unit

Claims

A distributed data management system in which a plurality of servers manage data in a distributed manner,
wherein each of the servers comprises:
a first storage unit that stores data and an identifier of the data in association with each other;
a second storage unit that stores an address table in which identifiers and addresses of the plurality of servers are associated with each other;
an address table generation unit that, when the plurality of servers increase or decrease, newly generates the address table and stores it in the second storage unit together with a generation time of the address table;
a request signal acquisition unit that acquires a request signal, transmitted from a client, that requests transmission or storage of the data and includes the identifier of the data and a generation time of the data;
a responsible server identification unit that identifies a responsible server for the data based on the identifier of the data included in the acquired request signal and the identifiers of the plurality of servers in the latest address table stored in the second storage unit;
a request signal transfer unit that transfers the request signal to the identified responsible server using the address in the latest address table stored in the second storage unit;
a response unit that refers to the first storage unit based on the identifier of the data included in the request signal requesting transmission of the data, reads the corresponding data, and transmits it to the client, or that refers to the first storage unit based on the identifier of the data included in the request signal requesting storage of the data, determines whether the corresponding data has been stored, and transmits a storage success/failure signal indicating the determination result to the client;
a previous responsible server identification unit that, when the data corresponding to the identifier of the data included in the request signal is not stored in the first storage unit, identifies a previous responsible server for the data using, among the address tables stored in the second storage unit, the address table whose generation time is the latest among those older than the generation time of the data included in the request signal;
a data transfer unit that transfers the data and the identifier of the data to and from the identified previous responsible server; and
a data update unit that stores the transferred data and the identifier of the data in the first storage unit.

The distributed data management system according to claim 1, wherein
the first storage unit stores the data, the identifier of the data, and the generation time of the data in association with one another,
the data update unit, when storing the transferred data and the identifier of the data in the first storage unit, stores in the first storage unit, as the generation time of the data, the time at which the data update unit acquired the transferred data and the identifier of the data or the time at which the data update unit stored the transferred data and the identifier of the data in the first storage unit, and
the response unit attaches the generation time of the data to the data or the storage success/failure signal and transmits it to the client.

A distributed data management method in which a plurality of servers manage data in a distributed manner,
each of the servers comprising a first storage unit that stores data and an identifier of the data in association with each other and a second storage unit that stores an address table in which identifiers and addresses of the plurality of servers are associated with each other, and, when the plurality of servers increase or decrease, newly generating the address table and storing it in the second storage unit together with a generation time of the address table,
the method comprising:
a step in which one of the servers acquires a request signal, transmitted from a client, that requests transmission or storage of the data and includes the identifier of the data and a generation time of the data;
a step in which the one server identifies a responsible server for the data based on the identifier of the data included in the acquired request signal and the identifiers of the plurality of servers in the latest address table stored in the second storage unit;
a step in which the one server transfers the request signal to the identified responsible server using the address in the latest address table stored in the second storage unit; and
a step in which the responsible server refers to the first storage unit based on the identifier of the data included in the request signal requesting transmission of the data, reads the corresponding data, and transmits it to the client, or refers to the first storage unit based on the identifier of the data included in the request signal requesting storage of the data, determines whether the corresponding data has been stored, and transmits a storage success/failure signal indicating the determination result to the client,
wherein, when the data corresponding to the identifier of the data included in the request signal is not stored in the first storage unit, the method further comprises:
a step in which the responsible server identifies a previous responsible server for the data using, among the address tables stored in the second storage unit, the address table whose generation time is the latest among those older than the generation time of the data included in the request signal;
a step in which the responsible server transfers the data and the identifier of the data to and from the previous responsible server; and
a step in which the responsible server stores the transferred data and the identifier of the data in the first storage unit.

The distributed data management method according to claim 3, wherein
the first storage unit stores the data, the identifier of the data, and the generation time of the data in association with one another,
the responsible server, when storing the transferred data and the identifier of the data in the first storage unit, stores in the first storage unit, as the generation time of the data, the time at which the responsible server acquired the transferred data and the identifier of the data or the time at which the responsible server stored the transferred data and the identifier of the data in the first storage unit, and
the responsible server attaches the generation time of the data to the data or the storage success/failure signal and transmits it to the client.

A distributed data management program used when a plurality of computers manage data in a distributed manner,
the program causing a computer that comprises a first storage unit storing data and an identifier of the data in association with each other and a second storage unit storing an address table in which identifiers and addresses of a plurality of servers are associated with each other to function as:
an address table generation unit that, when the plurality of servers increase or decrease, newly generates the address table and stores it in the second storage unit together with a generation time of the address table;
a request signal acquisition unit that acquires a request signal, transmitted from a client, that requests transmission or storage of the data and includes the identifier of the data and a generation time of the data;
a responsible server identification unit that identifies a responsible server for the data based on the identifier of the data included in the acquired request signal and the identifiers of the plurality of servers in the latest address table stored in the second storage unit;
a request signal transfer unit that transfers the request signal to the identified responsible server using the address in the latest address table stored in the second storage unit;
a response unit that refers to the first storage unit based on the identifier of the data included in the request signal requesting transmission of the data, reads the corresponding data, and transmits it to the client, or that refers to the first storage unit based on the identifier of the data included in the request signal requesting storage of the data, determines whether the corresponding data has been stored, and transmits a storage success/failure signal indicating the determination result to the client;
a previous responsible server identification unit that, when the data corresponding to the identifier of the data included in the request signal is not stored in the first storage unit, identifies a previous responsible server for the data using, among the address tables stored in the second storage unit, the address table whose generation time is the latest among those older than the generation time of the data included in the request signal;
a data transfer unit that transfers the data and the identifier of the data to and from the identified previous responsible server; and
a data update unit that stores the transferred data and the identifier of the data in the first storage unit.

The distributed data management program according to claim 5, wherein
the first storage unit stores the data, the identifier of the data, and the generation time of the data in association with one another,
the data update unit, when storing the transferred data and the identifier of the data in the first storage unit, stores in the first storage unit, as the generation time of the data, the time at which the data update unit acquired the transferred data and the identifier of the data or the time at which the data update unit stored the transferred data and the identifier of the data in the first storage unit, and
the response unit attaches the generation time of the data to the data or the storage success/failure signal and transmits it to the client.