WO2014054173A1

WO2014054173A1 - Distributed data management system and program

Info

Publication number: WO2014054173A1
Application number: PCT/JP2012/075995
Authority: WO
Inventors: 洋中越; 崇利加藤
Original assignee: 株式会社日立製作所
Priority date: 2012-10-05
Filing date: 2012-10-05
Publication date: 2014-04-10

Abstract

[Problem] To construct a distributed database which has anti-disaster performance and enables access to data from any configuration node, and manage data stored in the distributed database. [Solution] A distributed data management system (1) comprises: a plurality of nodes (20) which each store data; and a core node (10) which constructs a hierarchical overlay network configured from the plurality of nodes (20) and manages the data. When constructing the hierarchical overlay network, the core node (10) configures a sub overlay network from a subset of the nodes (20), which is determined on the basis of distances in a geographical space, associates respective networks with each other between layers, generates network configuration information relating to the respective nodes (20) configuring the hierarchical overlay network, and transfers the network configuration information to the respective nodes (20). When writing data, the plurality of nodes (20) each replicate the data in each of overlay networks associated with a network to which the data was written.

Description

Distributed data management system and program

The present invention relates to a distributed data management system and program, and is suitable for application to a distributed data management system and program using a distributed database by an overlay network.

Conventionally, large-scale databases have been constructed by expanding the scale of non-distributed databases. In recent years, however, distributed databases have attracted attention, driven by growing demands to reduce the operation and management costs that have risen with system complexity and to improve tolerance in the event of a failure.

A distributed database is composed of multiple nodes across which data is distributed and stored. A distributed database must manage which node stores which data, and a distributed hash table (DHT) is one method for determining the node at which data is stored. In a general distributed database using a distributed hash table, each node is placed in a ring-shaped hash space according to a node hash value obtained by applying a hash function to the unique identifier of the node. A data hash value is then obtained by applying a hash function to the key assigned to the data, and the data is transferred to and stored in a node whose node hash value is close to the data hash value in the ring-shaped hash space in which the nodes are arranged.
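The placement rule described above can be illustrated with a minimal Python sketch. The source only says the data goes to a node whose hash value is "close" to the data hash value; this sketch assumes "close" means the first node hash at or after the data hash, wrapping around the ring. SHA-1, the function names, and the node identifiers are likewise assumptions made for illustration.

```python
import bisect
import hashlib


def hash_value(text: str) -> int:
    """Hash a string to an integer position on the ring (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def storage_node(node_ids: list[str], data_key: str) -> str:
    """Pick the node that stores data_key.

    Assumption: "close" means the first node hash at or after the data hash,
    wrapping around to the smallest node hash at the end of the ring.
    """
    ring = sorted((hash_value(node_id), node_id) for node_id in node_ids)
    positions = [position for position, _ in ring]
    index = bisect.bisect_left(positions, hash_value(data_key)) % len(ring)
    return ring[index][1]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node identifiers
print(storage_node(nodes, "user:42"))   # prints one of the three identifiers
```

Because the choice depends only on hash values, any node that knows the full node list arrives at the same storage node for a given key.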

However, in such a distributed database, the storage location of data is determined by proximity in the hash space, that is, by the proximity of hash values, so data could not be stored based on proximity in any other space.

To address this problem, for example, Patent Document 1 describes a system that realizes a distributed network that stores data in nodes based on distance in a name space. In the system described in Patent Document 1, each node is given an ID corresponding to a name in addition to a hash value, and a hierarchical network is built by overlaying, on the hash ring network, a network that expresses the proximity of the name IDs in the name space (dictionary space). Hereinafter, a hierarchical overlay network having a structure in which a plurality of hash ring networks are hierarchically overlaid is referred to as a hierarchical hash ring network. According to the hierarchical hash ring network described in Patent Document 1, data can be stored in nodes that are close to each other in the dictionary space.

Patent Document 1: Japanese Patent Application Laid-Open No. 2004-266796

However, in the system described in Patent Document 1, the hash ring network is configured based on proximity in the dictionary space using name IDs, and a name ID represents proximity in the network space. Data is therefore stored in nodes that are close to each other in the network space, but a hash ring network based on proximity in geographic space cannot be configured. For example, in a distributed network such as a hash ring network, when a disaster (or a failure, etc.) leaves some nodes unreachable, building a network from nodes close to the disaster location, that is, nodes close to each other in geographic space, makes it possible to realize a resilient network that can still be reached from nodes near the disaster location. However, since the system described in Patent Document 1 cannot configure a hash ring network based on proximity in geographic space, it is difficult to realize a resilient network having disaster resistance.

Further, the system described in Patent Document 1 provides no means of accessing data across layers of the hierarchical hash ring network. Data written to a node in a certain subnetwork can therefore be referenced only from nodes participating in that subnetwork, and cannot be referenced from a hash ring network in another layer or from another hash ring network in the same layer. For example, when a user moves, the data stored in a node of a distributed database may be accessed from a plurality of locations in geographic space or in network space. If no means of data access across layers is provided, data access from a different hash ring network is denied. As a result, for example, data written to a node in a subnetwork formed in a home network space cannot be referenced from a subnetwork formed in a network space outside the home.

The present invention has been made in consideration of the above points, and proposes a distributed data management system and program that construct a distributed database which has disaster resistance and allows data to be accessed from any of its constituent nodes, and that manage the data stored in the distributed database.

In order to solve this problem, the present invention provides a distributed data management system comprising: a plurality of nodes that are connected to each other on a network and store data; and a core node that is connected to the plurality of nodes, constructs a hierarchical overlay network in which an uppermost overlay network composed of the entire set of the plurality of nodes, a sub-overlay network composed of a subset of the plurality of nodes, and a unit overlay network composed of a unit of the plurality of nodes are hierarchized in order, and manages the data by means of the hierarchical overlay network. Each of the plurality of nodes is assigned a node number that is a stochastically uniform number across the set of the plurality of nodes. Each overlay network included in the hierarchical overlay network is a ring network in which the node that stores data within the network is determined based on the node numbers assigned to the nodes constituting the network and a data number that is a stochastically uniform number obtained from the data to be stored. When constructing the hierarchical overlay network, the core node configures the sub-overlay network from a subset of the plurality of nodes determined based on distance in geographic space, associates the layers with each other in the order of the uppermost overlay network, the sub-overlay network, and the unit overlay network, generates, for each node constituting the hierarchical overlay network, network configuration information having information indicating all overlay networks that include the node and information on all nodes constituting those overlay networks, and transfers the network configuration information to each node. The plurality of nodes hold the transferred network configuration information and, when data is written to the hierarchical overlay network, replicate the data in each overlay network associated with the unit overlay network to which the data was written.

Further, in order to solve this problem, the present invention provides a program for a computer in a core node that is connected to a plurality of nodes storing data, constructs a hierarchical overlay network in which an uppermost overlay network composed of the entire set of the plurality of nodes, a sub-overlay network composed of a subset of the plurality of nodes, and a unit overlay network composed of a unit of the plurality of nodes are hierarchized in order, and manages the data by means of the hierarchical overlay network. Each of the plurality of nodes is assigned a node number that is a stochastically uniform number across the set of the plurality of nodes. Each overlay network included in the hierarchical overlay network is a ring network in which the node that stores data within the network is determined based on the node numbers assigned to the nodes constituting the network and a data number that is a stochastically uniform number obtained from the data to be stored. When the hierarchical overlay network is constructed, the program causes the computer in the core node to execute: a step of configuring the sub-overlay network from a subset of the plurality of nodes determined based on distance in geographic space; a step of associating the layers with each other in the order of the uppermost overlay network, the sub-overlay network, and the unit overlay network; a step of generating, for each node constituting the hierarchical overlay network, network configuration information including information indicating all overlay networks that include the node and information on all nodes constituting those overlay networks; and a step of transferring the generated network configuration information to each of the nodes.

According to the present invention, it is possible to construct a distributed database that has disaster resistance and can access data from any constituent node, and to manage the data stored in the distributed database.

FIG. 1 is a block diagram showing the configuration of the distributed data management system according to the first embodiment.
FIG. 2 is a table showing an example of a metric management table.
FIG. 3 is a schematic diagram showing a screen example of an administrator service setting screen.
FIG. 4 is a flowchart showing a processing procedure for constructing a hierarchical hash ring network.
FIG. 5 is a table showing an example of a clustering policy.
FIG. 6 is a table showing an example of a fixed area information table.
FIG. 7 is an explanatory diagram showing an example of overlapping rectangular areas.
FIG. 8 is a schematic diagram showing a configuration example of the hierarchical hash ring network generated by the processing for constructing the hierarchical hash ring network.
FIG. 9 is a table showing an example of network configuration node information in the hierarchical hash ring network shown in FIG. 8.
FIG. 10 is a table showing an example of network configuration information.
FIG. 11 is a sequence diagram showing an example of a processing procedure for data writing.
FIG. 12 is an explanatory diagram showing an example of the format of a data write request.
FIG. 13 is a sequence diagram showing an example of a processing procedure for data deletion.
FIG. 14 is a schematic diagram for explaining a method of restoring data in a hierarchical hash ring network from which a node has been deleted.
FIG. 15 is a flowchart showing a processing procedure for constructing a hierarchical hash ring network in the distributed data management system according to the second embodiment.
FIG. 16 is a table showing an example of a clustering policy used in the construction processing of the second embodiment.
FIG. 17 is a schematic diagram showing a configuration example of the hierarchical hash ring network in the second embodiment.
FIG. 18 is a table showing an example of network configuration node information in the hierarchical hash ring network shown in FIG. 17.
FIG. 19 is a table showing an example of network configuration information.

In the following description, information of the present invention is described using expressions such as “aaa table”, “aaa list”, “aaaDB”, and “aaa queue”, but such information need not be expressed as a table, list, DB, or queue and may be expressed in a data structure other than these. Therefore, “aaa table”, “aaa list”, “aaaDB”, “aaa queue”, and the like may be referred to as “aaa information” to indicate that they do not depend on the data structure.
Furthermore, in describing the contents of each piece of information, the expressions “identification information”, “identifier”, “name”, and “ID” are used, but these are interchangeable.
In the following description, a “program” may be used as the subject of a sentence. However, because a program performs its defined processing by being executed by a processor using a memory and a communication port (communication control device), the description may equally be read with the processor as the subject. Further, processing disclosed with a program as the subject may be processing performed by a computer such as a management server or an information processing apparatus. Part or all of a program may be realized by dedicated hardware. Various programs may be installed in each computer from a program distribution server or a computer-readable storage medium.

(1) First Embodiment

A distributed data management system according to a first embodiment comprises a plurality of nodes that store data, and a core node that constructs a distributed database by means of a hierarchical overlay network (hierarchical hash ring network) in which the plurality of nodes participate and manages data by means of the constructed hierarchical hash ring network. The core node configures sub-overlay networks (hash ring networks), generates, for each node in the hierarchical hash ring network, configuration information having the networks that include the node and information on all nodes constituting those networks, and transfers the generated configuration information to each node.

(1-1) Configuration of Distributed Data Management System

FIG. 1 is a block diagram showing the configuration of the distributed data management system according to the first embodiment. In the distributed data management system 1 shown in FIG. 1, a core node 10 and a plurality of nodes 20 (individually, nodes 20A to 20N) are connected to each other via a network 30, and the system manages a distributed database in which the plurality of nodes 20 participate. Each of the core node 10 and the plurality of nodes 20 has a central processing unit (CPU), a memory, and an interface, and is realized by an information processing device such as a general computer, a server, or a data center.

(1-1-1) Configuration of Core Node and Functions of Each Unit

The core node 10 is a device that creates the configuration of a hierarchical hash ring network constructed by incorporating a plurality of nodes 20 as a distributed database, and manages that hierarchical hash ring network. More specifically, the core node 10 collects measured values indicating node performance (hereinafter referred to as metrics) obtained by the nodes 20A to 20N, and constructs the hierarchical hash ring network based on the collected metrics. The core node 10 then transfers the configuration information of the constructed hierarchical hash ring network to each node 20. The hierarchical hash ring network is an example of an overlay network, and the process in which the core node 10 constructs a hierarchical hash ring network and transfers the configuration information to the nodes 20 will be described later with reference to FIG. 4.

As shown in FIG. 1, the core node 10 includes a configuration information management unit 11, a communication I/O (Input/Output) 12, and a communication unit 13. The communication I/O 12 is an interface used when the configuration information management unit 11 uses a communication function, and generally corresponds to software such as a communication library or a protocol stack, but is not limited thereto. The communication unit 13 is a device that provides a communication function, and generally corresponds to a NIC (Network Interface Card), but is not limited thereto.

The core node 10 has an input/output device (not shown). Examples of input/output devices include a display, a keyboard, and a pointer device, but other devices may be used. As an alternative, a serial interface or an Ethernet interface may serve as the input/output device: a display computer having a display, a keyboard, or a pointer device is connected to the interface, display information is transmitted to the display computer, and input information is received from the display computer, so that the display computer performs the display and accepts the input in place of the input/output device.

The configuration information management unit 11 is realized by a CPU (not shown) reading and executing a program stored in a memory (not shown). It creates the configuration of the hierarchical hash ring network based on the metrics of the plurality of nodes 20 and distributes the configuration information of the created hierarchical hash ring network to the plurality of nodes 20. The creation of the hierarchical hash ring network configuration and the distribution of the configuration information of the hierarchical hash ring network by the configuration information management unit 11 may also be realized by the CPU executing a program read from a recording medium connected to the core node 10. The configuration information management unit 11 functionally includes a metric management unit 111, a configuration information creation unit 112, and a configuration information transfer control unit 113.

The metric management unit 111 collects, from the plurality of nodes 20, the metrics necessary for configuring the hierarchical hash ring network described later, and manages them. More specifically, the metrics acquired by the metric acquisition units 22 of the nodes 20A to 20N are transmitted over the network and collected by the metric management unit 111 via the communication unit 13 and the communication I/O 12 of the core node 10. The metric management unit 111 organizes the collected metrics into a table, a list, or the like, and manages them by storing them in a storage area.

Here, measured values of various parameters indicating the performance state of the node 20 can be used as the metrics collected by the metric management unit 111. In the present embodiment, as an example, metrics for each part of the node 20 (the CPU 235, the memory 236, etc.) are collected and managed in a metric management table.

FIG. 2 is a table showing an example of the metric management table. The metric management table 141 in FIG. 2A and the metric management table 142 in FIG. 2B store the metrics collected by the metric management unit 111 for each of the plurality of nodes 20 (nodes 20A, 20B, and 20C as an example).

The metric management table 141 shown in FIG. 2A stores metrics indicating the specifications and position information of the nodes 20A, 20B, and 20C. In the metric management table 141, the node column 141A identifies the node 20 from which the metrics were acquired, and the address column 141B gives the address of that node 20. The specification columns, consisting of the CPU column 141C, the memory column 141D, the storage column 141E, and the network column 141F, hold metrics on the specifications of each part (the CPU 235, the memory 236, the storage unit 234, the communication unit 233, etc.). For example, the CPU column 141C holds the static specification performance of the CPU 235 (the operating frequency) and the usage rate indicating its dynamic performance (the CPU usage rate). The memory column 141D holds the static specification performance of the memory 236 (the memory capacity) and the usage rate indicating its dynamic performance (the memory usage rate). The position information column 141G holds position information (address information or coordinate information) indicating the position of the node 20 from which the metrics were acquired.

The metric management table 142 shown in FIG. 2B stores, for the nodes 20A, 20B, and 20C, a metric indicating the distance between nodes on the network (network distance). The type of value indicating the network distance between nodes is not particularly limited. For example, it may be the number of hops obtained by executing Ping between nodes, a throughput or response value, a value based on the numerical closeness of IP addresses, or a value obtained by normalizing a value from some other measurement.
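As a rough illustration of the two metric management tables, the following Python sketch keeps per-node specification and position metrics in one structure and pairwise network distances in another. The field names, units, sample values, and the assumption that network distance is symmetric are all hypothetical; the source fixes neither a data structure nor a distance measure.

```python
from dataclasses import dataclass


@dataclass
class NodeMetric:
    """One row of a table like 141 (field names and units are assumptions)."""
    node: str
    address: str
    cpu_ghz: float               # static CPU specification
    cpu_usage: float             # dynamic CPU usage rate, 0.0 to 1.0
    memory_gb: float             # static memory capacity
    memory_usage: float          # dynamic memory usage rate, 0.0 to 1.0
    position: tuple[float, float]  # coordinate information, e.g. (lat, lon)


class MetricTables:
    """Holds per-node metrics and pairwise network distances."""

    def __init__(self) -> None:
        self.specs: dict[str, NodeMetric] = {}
        self.distances: dict[frozenset, float] = {}

    def record(self, metric: NodeMetric) -> None:
        self.specs[metric.node] = metric

    def set_distance(self, a: str, b: str, value: float) -> None:
        # Assumption: distance is symmetric, so the pair is stored unordered.
        self.distances[frozenset((a, b))] = value

    def get_distance(self, a: str, b: str) -> float:
        return 0.0 if a == b else self.distances[frozenset((a, b))]


tables = MetricTables()
tables.record(NodeMetric("20A", "192.0.2.1", 2.4, 0.30, 16.0, 0.50, (35.68, 139.76)))
tables.set_distance("20A", "20B", 3.0)  # e.g. a hop count, purely illustrative
```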

Note that the metrics stored in the metric management tables 141 and 142 are not limited to metrics acquired by the metric acquisition unit 22 of the node 20, and may be metrics directly input by an administrator, a user, or the like. For example, when the node 20A is configured without the position measurement unit 237, the administrator may input the position information of the node 20A and have it stored in the metric management table.

FIG. 3 is a schematic diagram showing a screen example of the administrator service setting screen. The administrator service setting screen 143 shown in FIG. 3 is provided so that the administrator can directly perform metric setting and network setting. The screen is displayed on, for example, a display unit (not shown) of the core node 10 or of an administrator terminal (not shown) connected to the core node 10, and the setting information that the administrator enters on the screen is notified to the configuration information management unit 11. The administrator service setting screen 143 includes a metric setting field 143A, a hierarchy number setting field 143B, a policy hierarchy setting field 143C, a policy content setting field 143D, and a node duplication setting field 143E. In the metric setting field 143A, position information (address information or coordinate information) can be entered for a designated node 20. In the hierarchy number setting field 143B, the total number of hierarchies in the hierarchical hash ring network can be set. In the policy hierarchy setting field 143C and the policy content setting field 143D, the hierarchy and the content of a policy used when creating a hash ring network based on spatial proximity can be set. In the node duplication setting field 143E, whether duplication of a node 20 between hash ring networks is permitted can be set. The direct setting of position information metrics described above is thus performed by entering values in the metric setting field 143A of the administrator service setting screen 143. The hierarchy number setting field 143B, the policy hierarchy setting field 143C, the policy content setting field 143D, and the node duplication setting field 143E are described later together with the processing in which each setting is used.

The configuration information creation unit 112 creates configuration information of the hierarchical hash ring network in which the node 20 participates based on the metric management tables 141 and 142 managed by the metric management unit 111. The configuration information transfer control unit 113 distributes the hierarchical hash ring network configuration information created by the configuration information creation unit 112 to the plurality of nodes 20.
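The per-node configuration information can be pictured with a small Python sketch: for each node, it lists every overlay network that contains the node together with all members of that network. The function name, the dictionary layout, and the network names are assumptions for illustration; the actual table format is not given in this passage.

```python
def build_configuration(networks: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Map each node to the networks containing it and the members of each.

    `networks` maps a (hypothetical) network name to its member nodes.
    """
    config: dict[str, dict[str, list[str]]] = {}
    for name, members in networks.items():
        for node in members:
            # Each node receives the full member list of every network it joins.
            config.setdefault(node, {})[name] = list(members)
    return config


# Hypothetical two-layer example: one top-level ring and two sub-rings.
networks = {
    "layer0": ["A", "B", "C", "D"],
    "layer1-east": ["A", "B"],
    "layer1-west": ["C", "D"],
}
config = build_configuration(networks)
print(config["A"])
# {'layer0': ['A', 'B', 'C', 'D'], 'layer1-east': ['A', 'B']}
```

With this information held locally, a node can work out the members of any ring it belongs to without asking the core node again.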

(1-1-2) Node Configuration and Functions of Each Part

The node 20 is communicably connected to a user terminal (not shown) via a network such as a mobile phone network or WiFi (registered trademark). The node 20 is a node of the hierarchical hash ring network whose configuration is managed by the core node 10. The node 20 has a function of acquiring the metrics of its own node (for example, the node 20A) and transmitting them to the core node 10, and a function of processing requests that involve data access (data access requests) sent from a user (user terminal) or another node 20 (for example, the node 20B).

Here, a data access request sent from the user terminal or another node 20 is, for example, a request to write data to or read data from the distributed data. More specifically, a data access request sent from the user terminal requests ordinary data storage or reference. A data access request sent from another node 20 (for example, the node 20B) is, in the case of a write request, a request to replicate the data and, in the case of a read request, a request in which the node 20B, to which the user terminal is connected, asks on behalf of the user terminal to read the data that the user terminal requested. Data access is not limited to data write requests and read requests, and also includes an add request for requesting node addition processing, a delete request for requesting deletion of data or a node, and the like.

Hereinafter, the detailed configuration of the node 20 will be described. However, since all of the nodes 20A to 20N have the same configuration, only the node 20A will be described for the sake of brevity. The number of nodes 20 is not particularly limited, and it is assumed that there are n nodes 20 in FIG.

The node 20A has a data control unit 21, a metric acquisition unit 22, and a management device 23. For the node 20A, FIG. 1 shows the components related to the processing of a data access request from a user terminal.

The data control unit 21 includes a request processing unit 211, a data write control unit 212, a data read control unit 213, and a table management unit 214, and controls processing related to a data access request from a user terminal.

The request processing unit 211 has a function of receiving a data access request sent from a user terminal or another node 20B to 20N, analyzing the request and calculating a hash value, and outputting the analysis result and the calculation result. It includes a request analysis unit 215, a hash calculation unit 216, and a node derivation unit 217. The request analysis unit 215 receives a data access request via the communication unit 233 and the communication I/O 231 and analyzes the received request. Through this analysis, the request analysis unit 215 determines, for example, whether the data access request is a data write request or a read request, and extracts the key information for accessing the data that is included in the request (hereinafter referred to as a data access key or a data key). The hash calculation unit 216 derives a hash value from the data key obtained by the request analysis unit 215. The node derivation unit 217 refers to the hash value derived by the hash calculation unit 216 and the table managed by the table management unit 214, and determines the node 20 that holds the data accessed by the request (the data write destination or the data read destination).
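The three stages of request processing (analysis, hash calculation, node derivation) can be sketched in Python as follows. The request format, the use of SHA-1, and the rule that the target is the first node hash at or after the key hash are assumptions made for illustration; the source does not specify them here.

```python
import bisect
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAccessRequest:
    kind: str                      # "write" or "read" (assumed encoding)
    key: str                       # the data key
    value: Optional[bytes] = None  # payload for writes


def analyze(request: DataAccessRequest) -> tuple[str, str]:
    """Request analysis: classify the request and extract the data key."""
    if request.kind not in ("write", "read"):
        raise ValueError(f"unsupported request kind: {request.kind}")
    return request.kind, request.key


def calc_hash(key: str) -> int:
    """Hash calculation: derive a hash value from the data key (SHA-1 assumed)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


def derive_node(key_hash: int, table: list[tuple[int, str]]) -> str:
    """Node derivation: `table` is a list of (node_hash, node) sorted by hash.

    Assumption: the target is the first node hash at or after key_hash,
    wrapping to the first entry when key_hash exceeds every node hash.
    """
    hashes = [node_hash for node_hash, _ in table]
    return table[bisect.bisect_left(hashes, key_hash) % len(table)][1]


def process(request: DataAccessRequest, table: list[tuple[int, str]]) -> tuple[str, str]:
    """Return (request kind, target node), which is then passed to write or read control."""
    kind, key = analyze(request)
    return kind, derive_node(calc_hash(key), table)
```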

Finally, the request processing unit 211 notifies the data write control unit 212 of the data write request and the data write destination when the request is a data write request according to the analysis result by the request analysis unit 215. If it is a data read request, the data read control unit 213 is notified of the data read request and the data read destination.

When the data write control unit 212 receives a data write request from the request processing unit 211 and the data write destination is another node (for example, the node 20B), it requests the other node 20B to write the data via the communication I/O 231 and the communication unit 233. When the data write control unit 212 receives a data write request from the request processing unit 211 and the data write destination is its own node 20A, it writes the data to the storage unit 234 via the storage I/O 232.

When the data read control unit 213 receives a data read request from the request processing unit 211 and the data read destination is another node (for example, the node 20B), it requests the other node 20B to read the data via the communication I/O 231 and the communication unit 233. When the data read control unit 213 receives a data read request from the request processing unit 211 and the data read destination is its own node 20A, it reads the data from the storage unit 234 via the storage I/O 232.
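The local-versus-remote branching of the write and read control units can be sketched as follows. This is a simplified in-process model: a dictionary stands in for the storage unit 234 and direct method calls stand in for network communication, and the class and attribute names are hypothetical.

```python
class Node:
    """Minimal model of a node's write and read control (names are assumptions)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.storage: dict[str, bytes] = {}   # stands in for the storage unit 234
        self.peers: dict[str, "Node"] = {}    # stands in for network communication

    def write(self, key: str, value: bytes, target: str) -> None:
        if target == self.name:
            self.storage[key] = value         # destination is this node: write locally
        else:
            self.peers[target].write(key, value, target)  # ask the other node to write

    def read(self, key: str, target: str):
        if target == self.name:
            return self.storage.get(key)      # destination is this node: read locally
        return self.peers[target].read(key, target)       # ask the other node to read


node_a, node_b = Node("20A"), Node("20B")
node_a.peers["20B"] = node_b
node_a.write("photo", b"data", target="20B")  # forwarded to node 20B
print(node_a.read("photo", target="20B"))     # b'data'
```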

The table management unit 214 manages a table in which configuration information of the hierarchical hash ring network sent from the core node 10 is described, and receives a reference to the table from the node deriving unit 217.

The metric acquisition unit 22 acquires one or more types of metrics from the management device 23 and transmits the acquired metrics to the core node 10. As an example, the metric acquisition unit 22 transmits regularly acquired metrics to the core node 10 and, when a failure occurs in the node 20A, transmits metrics collected in an event-driven manner in response to the failure to the core node 10.

The management device 23 includes a communication I/O 231, a storage I/O 232, a communication unit 233, a storage unit 234, a CPU 235, a memory 236, and a position measurement unit 237.

The communication I/O 231 is an interface used when the data control unit 21 uses a communication function, and generally corresponds to software such as a communication library or a protocol stack, but is not limited thereto. The storage I/O 232 is an interface used when the data control unit 21 requests storage access. The communication unit 233 is a device that provides a communication function to the node 20A, and generally corresponds to a NIC, but is not limited thereto.

The storage unit 234 generally represents the local storage in the node 20A. The storage unit 234 provides a storage function to the node 20A and corresponds to, for example, an auxiliary storage device or a hard disk drive (HDD). The CPU 235 reads the program and data stored in the storage unit 234 and held in the memory 236, and processes the data according to the instruction code of the program. The memory 236 is a main storage device that temporarily stores programs and data stored in the storage unit 234 and the like.

The position measurement unit 237 is a device that measures the geographical position of the node 20A and outputs the position information as a numerical value, for example, and is generally realized by GPS (Global Positioning System). The position measurement unit 237 facilitates the description of the distributed data management system according to the present embodiment, but is not necessarily required. For example, when the node 20A does not move from the installation location, the geographical position information indicating the installation position of the node 20A may be stored in the node 20A or the core node 10 when the node 20A is installed.

(1-2) Construction of Hierarchical Hash Ring Network

The following describes an example of a method by which the configuration information creation unit 112 constructs a hierarchical hash ring network in the distributed data management system 1 according to the present embodiment. In the distributed data management system 1, one or more pieces of replicated data are created for each hierarchy, and the number of hierarchies of the hierarchical network is set according to the number of pieces of replicated data. The number of hierarchies may be set in advance in the core node 10, or the administrator may set the desired number of hierarchies in the hierarchy number setting field 143B of the administrator service setting screen 143. In the following description, as an example, the hierarchical hash ring network has four hierarchies. Therefore, the number of pieces of replicated data in the hierarchical hash ring network managed by the distributed data management system 1 is at least four.

FIG. 4 is a flowchart showing a processing procedure for constructing a hierarchical hash ring network. With reference to FIG. 4, a processing procedure for constructing a hierarchical hash ring network in which the total number of hierarchies is four and nodes 20A to 20M participate will be described.

(1-2-1) Derivation of Hash Value

First, the configuration information creation unit 112 inputs the unique identifier assigned to each node 20 participating in the distributed database (here, the nodes 20A to 20M, with identifiers such as “A”, “B”, ...) into a hash function to derive the hash values of the nodes 20A to 20M (step S101). The identifiers assigned to the nodes 20A to 20M only need to identify each of the nodes 20A to 20M uniquely, and may be either character strings or numerical values. The type of hash function is not limited. For example, a hash function such as SHA-1 (Secure Hash Algorithm 1), SHA-256, or MD5 (Message Digest Algorithm 5) can be used.
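As a minimal sketch of step S101, the following Python code derives an integer hash value for each node identifier. The function name `node_hash`, the use of Python's `hashlib`, and the conversion of the digest to an integer are illustrative assumptions; the embodiment only requires that a unique identifier be input to some hash function.

```python
import hashlib


def node_hash(identifier: str, algorithm: str = "md5") -> int:
    """Map a node identifier to an integer hash value (hypothetical helper)."""
    digest = hashlib.new(algorithm, identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Identifiers "A" .. "M" stand in for the nodes 20A to 20M.
node_ids = [chr(code) for code in range(ord("A"), ord("M") + 1)]
node_hashes = {node_id: node_hash(node_id) for node_id in node_ids}
```

Because the same function is applied regardless of layer, a node keeps the same hash value in every hash ring network it joins.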

In this embodiment, the nodes 20A to 20M are arranged across a plurality of layers, but the hash value space is the same for every node 20. That is, all the hash ring networks constituting the hierarchical hash ring network have the same hash space size, and each of the nodes 20A to 20M is uniquely identified by the same hash value in every hash ring network in which it participates.

(1-2-2) Creation of Layer 0 Network and Bottom Layer Network

Next, the configuration information creation unit 112 creates a layer 0 network including all the nodes 20 based on the hash values derived in step S101 (step S102). Specifically, the configuration information creation unit 112 maps the range from the minimum value (for example, “0”) to the maximum value (for example, “HASH_MAX”) that the hash function can take onto a one-dimensional hash ring space, and arranges the nodes 20A to 20M in the one-dimensional hash ring space according to their hash values. The size of the hash space and the arrangement method in the hash space are not particularly limited.
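The arrangement in step S102 can be sketched as a sorted list of (hash value, node) pairs. In the sketch below, the class name `HashRing`, the MD5-based `node_hash` helper, and the successor lookup rule (first node at or after a given hash value, wrapping around at `HASH_MAX`) are assumptions borrowed from common consistent-hashing practice; the embodiment leaves the arrangement method open.

```python
import bisect
import hashlib

HASH_MAX = 2 ** 128 - 1  # largest MD5 value; assumed size of the hash space


def node_hash(identifier: str) -> int:
    """Hypothetical helper: MD5 digest of the identifier as an integer."""
    return int.from_bytes(hashlib.md5(identifier.encode("utf-8")).digest(), "big")


class HashRing:
    """Nodes arranged on a one-dimensional ring in hash-value order."""

    def __init__(self, node_ids):
        self._entries = sorted((node_hash(n), n) for n in node_ids)
        self._hashes = [h for h, _ in self._entries]

    def nodes_in_ring_order(self):
        return [n for _, n in self._entries]

    def successor(self, key_hash: int) -> str:
        """Return the first node at or after key_hash, wrapping around the ring."""
        index = bisect.bisect_left(self._hashes, key_hash)
        return self._entries[index % len(self._entries)][1]


layer0 = HashRing("ABCDEFGHIJKLM")  # layer 0 network with all nodes
```

A bottom-layer network as created in step S103 would, under the same assumptions, simply be `HashRing(["A"])`, a ring containing one node.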

Next, the configuration information creation unit 112 creates, for each of the nodes 20A to 20M, a hash ring network that includes only that node, so that as many such networks are created as there are nodes (step S103). These hash ring networks are the lowest-layer networks; when the total number of layers is four, they correspond to the third-layer networks. Specifically, as in step S102, the configuration information creation unit 112 maps the range from the minimum value to the maximum value that the hash function can take onto a one-dimensional hash ring space, and arranges each of the nodes 20A to 20M in its one-dimensional hash ring space according to its hash value.

(1-2-3) Creation of Neighborhood Network

Next, the configuration information creation unit 112 creates m layers (m is a variable) of hash ring networks based on spatial proximity (hereinafter referred to as neighborhood networks) (steps S104 to S110). Here, since the total number of layers is four, subtracting the two layers already accounted for by the 0th layer network created in step S102 and the lowest-layer networks created in step S103 gives m = 2. The spatial proximity may be proximity in geographic space or proximity in network space. Hereinafter, the processing of steps S104 to S110 for creating the neighborhood networks will be described in detail.

(1-2-3-1) Node Clustering

In step S104, for the nodes 20 assigned to each layer, the configuration information creation unit 112 clusters each node 20 together with the nodes 20 that are close to it in geographic space or network space. Hereinafter, a network created from the nodes clustered in step S104 is also referred to as a neighborhood network.

FIG. 5 is a table showing an example of a clustering policy. The policy table 144 shown in FIG. 5 includes a layer column 144A in which the layer number of the network is described, a policy (geospace) column 144B in which the policy applied when the geographic space is used as a reference is described, and a policy (network space) column 144C in which the policy applied when the network space is used as a reference is described.

For example, when clustering is performed with the network space as a reference, the policy (network space) column 144C in FIG. 5 shows that, in the first layer, a node 20 is clustered with the nodes 20 whose network distance from it is within “100”, and that, in the second layer, it is clustered with the nodes 20 whose network distance from it is within “20”. Note that, when the policies described in the policy (network space) column 144C in FIG. 5 are followed, a node that satisfies a plurality of policies may belong to a plurality of clusters as an overlapping node. This is handled by the deduplication processing for overlapping nodes (step S107).

Further, for example, when clustering is performed with the geographic space as a reference, the policy (geospace) column 144B in FIG. 5 shows that the nodes 20 are clustered by prefecture in the first layer and by municipality in the second layer. Here, the second-layer policy (geospace) does not simply cluster by municipality: policies that permit overlapping areas can also be set, such as “but include adjacent municipalities” and “but, for municipalities adjacent to the 23 wards of Tokyo, include the 23 wards”. Consider, for example, a user who lives in a municipality adjacent to the 23 wards of Tokyo and frequently visits the 23 wards. With such an overlapping-area setting, the nodes 20 in the 23 wards that the user frequently visits are clustered together with the nodes 20 in the municipality where the user resides, rather than only the latter being clustered, which can improve convenience for the user.

One example of a method for determining the nodes 20 that are actually to be clustered is as follows. First, the position coordinates in the geographic space of each of the nodes 20A to 20M are stored in the position information column 141G of the metric management table 141. In addition, the configuration information creation unit 112 can refer to a fixed area information table, held in the core node 10, that stores the coordinate information of rectangular areas corresponding to prefectures or municipalities. FIG. 6 is a table showing an example of the fixed area information table. The fixed area information table 145 shown in FIG. 6 includes a prefecture name field 145A in which the prefecture name is described and a rectangular area field 145B in which the coordinate information of the corresponding rectangular area is described, so that a rectangular area is defined for each prefecture. Similarly, a fixed area information table in which rectangular areas corresponding to municipalities are defined may be prepared. The fixed area information table 145 may be created from data input by an administrator or by automatic processing. Specifically, for example, the network layer for which a policy is to be set is specified in the policy hierarchy setting field 143C of the administrator service setting screen 143 shown in FIG. 3, and the policy is selected and set in the policy content setting field 143D.

In such a case, the configuration information creation unit 112 refers, for each of the nodes 20A to 20M, to the position coordinates stored in the position information column 141G of the metric management table 141 and determines whether the node 20 belongs to a “prefecture” rectangular area held in the fixed area information table 145. In this way it can determine whether the first-layer policy is satisfied and which nodes 20 are to be clustered. The target prefecture or municipality may be set directly by the administrator, or a rectangular area of a predetermined range may be derived and set from address information input by the administrator. In particular, if the position coordinates of the nodes 20A to 20M are managed in an R-tree keyed by the rectangular areas in the geographic space specified by the policy, clustering can be completed with O(log N) computation owing to the properties of the R-tree.
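The rectangular-area test described above can be sketched as follows. The coordinates, the area names "X" and "Y", and the function names are hypothetical, and the sketch uses a plain linear scan rather than the R-tree mentioned in the text, so it illustrates only the membership decision, not the O(log N) lookup.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]


def contains(rect: Rect, point: Point) -> bool:
    """True if the point lies inside the rectangle (borders included)."""
    min_x, min_y, max_x, max_y = rect
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def cluster_by_area(positions: Dict[str, Point],
                    areas: Dict[str, Rect]) -> Dict[str, List[str]]:
    """Assign every node to each area whose rectangle contains its position."""
    clusters: Dict[str, List[str]] = {name: [] for name in areas}
    for node, position in sorted(positions.items()):
        for name, rect in areas.items():
            if contains(rect, position):
                clusters[name].append(node)
    return clusters


# Hypothetical data: two overlapping rectangles and three node positions.
areas = {"X": (0.0, 0.0, 10.0, 10.0), "Y": (8.0, 0.0, 20.0, 10.0)}
positions = {"A": (1.0, 1.0), "B": (9.0, 5.0), "C": (15.0, 5.0)}
clusters = cluster_by_area(positions, areas)
```

Node "B" lies in the overlap of the two rectangles and therefore appears in both clusters, which is the kind of overlapping node that the later deduplication step evaluates.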

Note that the clustering policy may be set directly by the administrator from the administrator service setting screen 143 shown in FIG. 3. In such a case, the administrator specifies the layer for which the policy is to be set in the policy hierarchy setting field 143C of the administrator service setting screen 143, and sets the specific contents of the policy (for example, the area by which the geographic space is divided or the distance by which the network space is divided) in the policy content setting field 143D. In this way, the same settings as those of the policy table 144 shown in FIG. 5 can be realized. Further, when the address information of the node 20 is described in the metric setting field 143A on the administrator service setting screen 143 shown in FIG. 3, the coordinate information of the node 20 can be derived from the address information with reference to FIG.

Furthermore, on the administrator service setting screen 143, the administrator may be able to set, in the node duplication setting field 143E, whether or not to permit duplication of the nodes 20 between hash ring networks. FIG. 7 is an explanatory diagram illustrating an example of overlapping rectangular areas. In FIG. 7, nodes [1] to [4] exist in the A ward, node [5] in the B ward, node [6] in the C ward, and node [7] in the D ward. The “A ward cluster” to “D ward cluster” indicated by broken lines are rectangular areas set by the policy. Here, if duplication of the nodes [1] to [7] is not permitted, each of the nodes [1] to [7] is assigned only to the cluster of the ward in which it exists. On the other hand, if duplication is permitted, one node may be assigned to a plurality of clusters. Specifically, for example, the node [3] is assigned to both the A ward cluster and the B ward cluster, and the node [6] is assigned to both the C ward cluster and the D ward cluster. By permitting such duplication, a node 20 included in the hash ring network defined by the rectangular area of one policy can also be included in another hash ring network defined by the rectangular area of another policy.

Note that, in the hierarchical hash ring network constructed in the distributed data management system 1 of the present embodiment, clustering by geographic space and clustering by network space may both be specified in the policy table 144. That is, for example, the policy table may describe a policy for clustering by geographic space in the first layer and a policy for clustering by network space in the second layer. Further, a policy described in the policy table may require that both a geographic-space policy and a network-space policy be satisfied. However, a policy table described in this way can be expected to have only the same effect as one in which the geographic-space policy and the network-space policy are described separately, as in the policy table 144 of FIG. 5, and merely complicates the policy description. That is, setting the policy “Tokyo and network distance within 100” in the first layer has the same effect on the overlay network as setting the policy “Tokyo” in the first layer and the policy “network distance within 100” in the second layer.

(1-2-3-2) Association between hierarchies

In step S105, for the plurality of layers of neighboring networks created in step S104, the configuration information creation unit 112 defines as many related lines, each passing through all layers, as there are nodes, and associates the hash ring networks of the different layers with one another for each node 20. Since one related line is defined for each of the nodes 20A to 20M, the association always covers the 0th layer network, which includes all of the nodes 20A to 20M, and the third-layer networks, each of which includes only one of the nodes 20A to 20M. In addition, the first-layer and second-layer hash ring networks that include the corresponding node 20 are also associated.

For example, when the node 20A is included in the third-layer network “R0000”, the second-layer network “R000”, the first-layer network “R00”, and the 0th layer network “R0”, the related line A corresponding to the node 20A is expressed as “R0000-R000-R00-R0”. When the node 20A is assigned to a plurality of hash ring networks in the same layer, all of those hash ring networks are associated with the related line A. Specifically, for example, if the node 20A is included not only in the second-layer network “R000” but also in the second-layer network “R001”, the related line A is expressed as “R0000-R000, R001-R00-R0”.
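The related-line notation can be reproduced with a short sketch. The function name `related_line` and the data layout (a list of layers ordered from the lowest layer to the 0th layer, each mapping a network name to its member nodes) are assumptions made for illustration; the membership of node D is likewise hypothetical.

```python
from typing import Dict, List, Set


def related_line(node: str, layers: List[Dict[str, Set[str]]]) -> str:
    """Build the related line of a node, lowest layer first.

    Networks of the same layer that both contain the node are joined
    with ", ", and layers are joined with "-".
    """
    parts = []
    for networks in layers:
        names = sorted(name for name, members in networks.items() if node in members)
        if names:  # a node may be absent from an intermediate layer
            parts.append(", ".join(names))
    return "-".join(parts)


# Layers ordered from the third (lowest) layer up to the 0th layer.
layers = [
    {"R0000": {"A"}},
    {"R000": {"A", "B", "C"}, "R001": {"A", "D"}},  # node A overlaps two networks
    {"R00": {"A", "B", "C", "D"}},
    {"R0": {"A", "B", "C", "D"}},
]
line_a = related_line("A", layers)
```

With the overlapping membership above, `line_a` matches the second example in the text.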

(1-2-3-3) Derivation of Allowable Lowermost Layer in Each Node

In step S106, the configuration information creation unit 112 refers to the metric management table 141 and determines, for each node 20, the lowest layer of the neighboring networks in which the node is allowed to exist (the allowable lowest layer), thereby determining which layers of neighboring networks the node is included in. The determination of the allowable lowest layer in step S106 does not cover the lowest-layer network (the third-layer network in this description); it covers the layers down to the one immediately above the lowest-layer network.

Here, in the distributed data management system 1, at least one piece of replicated data is created in each layer, as described above. In general, when sub-networks are constructed in consideration of spatial proximity, whether in geographic space or network space, a lower-layer sub-network contains fewer nodes than an upper-layer one. Therefore, the more layers a node is incorporated into, and the lower those layers are, the more likely the node is to become a replication target. If the computing resources of a certain node 20A perform worse than those of the other nodes 20B to 20N and the node 20A is incorporated in all layers, the load is concentrated on the node 20A relative to the other nodes 20B to 20N. Therefore, in step S106, it is preferable that a node 20A with low computing resource performance be kept out of the lower-layer hash ring networks as far as possible.

From this point of view, in this embodiment, when the allowable lowest layer of a node 20 is determined, a low-performance node 20 is not included in the lower layers; in other words, a low-performance node 20 is incorporated only in the upper layers. The method for confining a low-performance node 20 to the upper layers is not particularly limited, but an example will be described below.

First, referring to the metric management table 141, let the index of a node 20 (corresponding, for example, to the contents of the node column 141A) be “i” and the index of a metric of the node (corresponding, for example, to the contents of the CPU column 141C to the network column 141F) be “j”; the value of metric j of node i is then represented by x [i, j]. For x [i, j], the metric value described in the metric management table 141 may be used as it is, or a separately normalized value may be used. Using such x [i, j], the value Li for deriving the allowable lowest layer of node i can be determined by Expression (1) or Expression (2).

In equation (1), S indicates the total number of metrics of node i. Expression (1) obtains the logarithm of all the metrics in the node i and sets the minimum value to Li. When formula (1) is used, the calculated Li or a value obtained by normalizing Li is set as the allowable lowest layer value.

In equation (2), mj represents the average value of all nodes of metric j, and σj represents the standard deviation of metric j. Expression (2) is an expression for deriving a ratio with the standard deviation with respect to the metric j of the node i, and the allowable lowest layer is determined using Li obtained by Expression (2). For example, if Li is a negative number, the node i is assigned only to the first hierarchy, and if Li is 0 or more, the node i is assigned to the first hierarchy and the second hierarchy.
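The following sketch shows one reading of the two expressions based only on the verbal descriptions above. For Expression (1) it takes the minimum of the logarithms of a node's metrics. For Expression (2) it standardizes each metric as (x[i, j] - mj) / σj and, as an assumption not stated in the text, aggregates over the metrics by taking the minimum; the use of the population standard deviation and all function names and sample values are also assumptions.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def li_log_min(metrics: List[float]) -> float:
    """Reading of Expression (1): minimum of the logarithms of the metrics."""
    return min(math.log(x) for x in metrics)


def li_zscore_min(table: Dict[str, List[float]], node: str) -> float:
    """Assumed reading of Expression (2): smallest standardized metric."""
    scores = []
    for j, value in enumerate(table[node]):
        column = [row[j] for row in table.values()]
        m_j, sigma_j = mean(column), pstdev(column)
        if sigma_j == 0:
            continue  # metric is identical on all nodes and cannot rank them
        scores.append((value - m_j) / sigma_j)
    return min(scores) if scores else 0.0


def allowable_lowest_layer(li: float) -> int:
    """Negative Li: first layer only; otherwise first and second layers."""
    return 1 if li < 0 else 2


# Hypothetical metric values (two metrics per node).
table = {"A": [4.0, 8.0], "B": [2.0, 8.0], "F": [1.0, 2.0]}
layers = {n: allowable_lowest_layer(li_zscore_min(table, n)) for n in table}
```

Under these assumptions a node whose weakest metric is below the all-node average is confined to the first layer.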

(1-2-3-4) Neighboring network adjustment

In steps S107 to S108, the configuration information creation unit 112 adjusts the nodes in the neighboring networks by performing deduplication processing for duplicate nodes (step S107) and by adjusting the hash space (step S108).

First, the deduplication processing for duplicate nodes performed in step S107 will be described. Duplicate nodes, that is, nodes assigned to a plurality of neighboring networks in the same layer, can arise when the policy described in the policy table 144 permits duplication of the nodes 20. In the deduplication processing, the configuration information creation unit 112 evaluates whether each duplicate node is a node for which duplication should be permitted, and cancels the duplication when it determines that duplication should not be permitted. The method for evaluating whether duplication should be permitted for a duplicate node is not particularly limited, but an example will be described below.

For example, the configuration information creation unit 112 applies the above equation (2) to the metric j of the node i that is a duplicate node, and determines that duplication is permitted if the obtained value Li is larger than a predetermined threshold (for example, 2).

Further, for example, when the node i overlaps three or more neighboring networks, the configuration information creation unit 112 may determine the number of networks in which duplication is permitted according to the value Li obtained by applying the above equation (2) to the metric j of the node i. Specifically, for example, if the obtained value Li is 3 or more, duplication in three or more neighboring networks is permitted; if it is 2 or more and less than 3, duplication in two neighboring networks is permitted; otherwise, duplication is not permitted.
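The example thresholds above can be written as a small decision function. The function name `permitted_overlap` and its `requested` parameter (the number of same-layer neighboring networks the node was clustered into) are hypothetical, and values of Li below 2 are treated as "duplication not permitted", which is how the final case of the example is read here.

```python
def permitted_overlap(li: float, requested: int) -> int:
    """Number of same-layer neighboring networks the node may remain in."""
    if li >= 3:
        return requested          # duplication in three or more networks allowed
    if li >= 2:
        return min(requested, 2)  # duplication in at most two networks allowed
    return 1                      # no duplication: a single network only


examples = [permitted_overlap(3.2, 4), permitted_overlap(2.5, 4), permitted_overlap(1.0, 3)]
```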

Note that a node 20 for which duplication is not permitted as a result of the above evaluation is incorporated into only one of the neighboring networks. Examples of methods for determining the neighboring network to which such a node 20 is assigned will be described below.

As an example, when duplication is not permitted for a node 20 that was assigned to two neighboring networks as a result of the clustering in step S104, the configuration information creation unit 112 assigns the node to whichever of the two neighboring networks has the smaller number of constituent nodes.

As another example, the configuration information creation unit 112 calculates, for each of the two neighboring networks, the distance in the hash space from the node located immediately before the node to the node located immediately after it, which is the gap that would result if the node were deleted from that network, together with the average distance between nodes in the hash space of that network. It then assigns the node to the neighboring network in which this distance is more biased with respect to the calculated average distance between nodes.

The above two examples describe a node 20 that overlaps two neighboring networks. However, when duplication is not permitted for a node 20 that overlaps three or more neighboring networks, the configuration information creation unit 112 can determine the neighboring network to which the node is assigned by the same processing.
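The first of the two assignment methods (keeping the node in the neighboring network with the fewest constituent nodes) can be sketched as follows. The function name, the in-place update of the membership sets, the tie-break by network name, and the sample memberships are assumptions for illustration.

```python
from typing import Dict, List, Set


def resolve_duplicate(node: str,
                      networks: Dict[str, Set[str]],
                      candidates: List[str]) -> str:
    """Keep the node only in the candidate network with the fewest members.

    Ties are broken by network name (an assumption; the text does not say).
    Works for two or more candidate networks.
    """
    keep = min(candidates, key=lambda name: (len(networks[name]), name))
    for name in candidates:
        if name != keep:
            networks[name].discard(node)
    return keep


# Hypothetical memberships: node C was clustered into both networks.
networks = {"R00": {"A", "B", "C", "D"}, "R10": {"C", "H"}}
kept_in = resolve_duplicate("C", networks, ["R00", "R10"])
```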

Next, the adjustment of the hash space performed in step S108 will be described. In this adjustment, for each neighboring network, the distances between nodes in the hash space are derived for all the nodes 20 included in that neighboring network, and, based on the deviation of those distances, a node of another neighboring network is added to the neighboring network so as to adjust the hash space corresponding to it. The distance between nodes in the hash space means the distance from a certain node (start node) to the node located immediately after it (end node). Specifically, the configuration information creation unit 112 derives the distances between nodes in the hash space for all the nodes 20 included in the neighboring network and, if the deviation of those distances (for example, the standard deviation) is equal to or greater than a predetermined threshold, acquires the hash values of the start node and the end node. The configuration information creation unit 112 then searches the upper neighboring network for a node whose hash value lies between the acquired hash value of the start node and that of the end node, and adds the node found in the upper neighboring network to the neighboring network for which the distances were derived. Note that the nodes to be additionally incorporated may instead be searched for among, for example, the nodes of an adjacent neighboring network in the same layer. However, since this may invalidate the result of the deduplication processing for duplicate nodes in step S107, it is preferable to search among the nodes included in the upper neighboring network.
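A minimal sketch of the hash space adjustment is given below. It assumes a small integer hash space, uses the population standard deviation as the deviation measure, and, as a further assumption, fills only the single widest gap; the text does not specify which start and end nodes are chosen when the threshold is reached. All names and values are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List, Tuple


def ring_gaps(ordered: List[int], space: int) -> List[Tuple[int, int, int]]:
    """Return (distance, start_hash, end_hash) for each pair of adjacent nodes."""
    count = len(ordered)
    gaps = []
    for i, start in enumerate(ordered):
        end = ordered[(i + 1) % count]
        gaps.append(((end - start) % space, start, end))
    return gaps


def adjust_hash_space(lower: Dict[str, int], upper: Dict[str, int],
                      space: int, threshold: float) -> Dict[str, int]:
    """Add upper-network nodes that fall inside the widest gap of the lower network."""
    result = dict(lower)
    if len(lower) < 2:
        return result
    gaps = ring_gaps(sorted(lower.values()), space)
    if pstdev([distance for distance, _, _ in gaps]) < threshold:
        return result  # distances are even enough; nothing to adjust
    distance, start, _end = max(gaps)
    for node, value in upper.items():
        # Strictly between the start node and the end node, going clockwise.
        if node not in result and 0 < (value - start) % space < distance:
            result[node] = value
    return result


lower = {"A": 0, "B": 10, "C": 20}
upper = {"A": 0, "B": 10, "C": 20, "D": 50, "E": 15, "F": 90}
adjusted = adjust_hash_space(lower, upper, space=100, threshold=20)
```

In this example the gap from hash value 20 around to 0 is far wider than the others, so the upper-network nodes D and F, whose hash values fall inside it, are added, while E is not.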

In steps S109 to S110, the configuration information creation unit 112 confirms the number of nodes in the hash ring networks formed by the neighboring networks of each layer (step S109), and corrects the adjustment values of the neighboring networks as necessary (step S110).

Specifically, in step S109, for every layer of the hierarchical hash ring network configured by the processes in steps S101 to S108, the configuration information creation unit 112 determines whether the total number of nodes in a hash ring network of that layer (the upper-layer hash ring network) is at least two more than the total number of nodes in the hash ring network one layer below (the lower-layer hash ring network) that is associated with it by the related line determined in step S105. A difference of at least two nodes is required in order to secure one node as the replication destination needed to replicate data reliably, and one further node for maintaining the replicated data when a node is deleted. Processing related to data replication will be described later with reference to FIGS. 11 and 12, and processing related to node deletion will be described later with reference to FIG.

If, in step S109, the total number of nodes in the upper-layer hash ring network exceeds the total number of nodes in the lower-layer hash ring network by one or less (NO in step S109), the configuration information creation unit 112 corrects the adjustment values used in the adjustment of the neighborhood networks in step S106 and step S107 (step S110), returns to the process of step S104, and constructs the neighborhood networks again. Specifically, the correction of the adjustment values in step S110 corresponds to loosening the restriction imposed by the thresholds by lowering the threshold values used in step S106 and step S107. If, in step S109, the total number of nodes in the upper-layer hash ring network exceeds the total number of nodes in the lower-layer hash ring network by two or more (YES in step S109), the process proceeds to step S111.
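The check in step S109 reduces to comparing the sizes of each pair of networks linked by a related line. The sketch below uses hypothetical function and variable names; the node counts are taken from the memberships given in the text for the networks of FIG. 8.

```python
from typing import Iterable, Tuple


def node_counts_ok(pairs: Iterable[Tuple[int, int]], margin: int = 2) -> bool:
    """True if every upper network has at least `margin` more nodes than
    the lower network associated with it."""
    return all(upper - lower >= margin for upper, lower in pairs)


# (upper size, lower size) for associated networks of FIG. 8:
# R0/R00, R0/R10, R00/R000, R00/R001, R10/R100, and R000/one third-layer network.
fig8_pairs = [(13, 7), (13, 7), (7, 3), (7, 3), (7, 4), (3, 1)]
proceed_to_s111 = node_counts_ok(fig8_pairs)
```

If any pair fails the check, the flow described above corrects the adjustment values and returns to step S104.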

(1-2-4) Structure of Hierarchical Hash Ring Network

Through the processing in steps S101 to S110, the configuration information creation unit 112 generates a hierarchical hash ring network in which a plurality of nodes 20 participate.

FIG. 8 is a schematic diagram illustrating a configuration example of a hierarchical hash ring network generated by the process illustrated in FIG. 4. In FIG. 8, the nodes 20A to 20M incorporated in the hierarchical hash ring network are indicated by “A” to “M” for simplicity. In the following description, the nodes 20A to 20M may likewise be referred to simply as “node A” to “node M” or “A” to “M”.

FIG. 9 is a table showing an example of network configuration node information for the hierarchical hash ring network shown in FIG. 8. The network configuration node information 146 shown in FIG. 9 stores information about each node incorporated in the hierarchical hash ring network shown in FIG. 8. Conversely, the hierarchical hash ring network shown in FIG. 8 is configured by the nodes A to M shown in FIG. 9. The network configuration node information 146 includes a node column 146A, a node hash column 146B, an allowable lowest layer network column 146C, a neighboring node column 146D, and an address column 146E.

In the node column 146A, node names incorporated in the hierarchical hash ring network are described. In FIG. 9, “A” to “M” are described corresponding to the nodes A to M. In the node hash column 146B, hash values (node hash values) corresponding to the nodes A to M are described. In FIG. 9, MD5 values obtained by applying MD5, which is one of hash functions, to unique IDs (for example, “a” to “m”) possessed by nodes A to M, respectively, are described.

In the allowable lowest layer network column 146C, the allowable lowest layer determined for each node in step S106 of FIG. 4 is described. In FIG. 9, the allowable lowest layer network column 146C describes the allowable lowest layer “1” only for the node F, and the allowable lowest layer “2” for the other nodes A to E and G to M. Therefore, in FIG. 8, the node F is incorporated in the networks of the first layer and above, but is not incorporated in any second-layer network. The other nodes A to E and G to M can be incorporated in both first-layer and second-layer networks.

In the neighborhood node column 146D, the neighborhood nodes clustered with the node described in the node column 146A by the processing of steps S104 to S110 in FIG. 4 are described. In FIG. 8, it can be seen that each network is formed from a combination of the neighboring nodes described in the neighboring node column 146D. In the address column 146E, the addresses (for example, IP addresses) of the nodes A to M are described. The network configuration node information 146 may instead describe a host name, or other network location information, in place of the address column 146E.

Hereinafter, the structure of the hierarchical hash ring network shown in FIG. 8 will be described with reference to FIG.

In the 0th layer of the hierarchical hash ring network shown in FIG. 8, one network “R0” including all nodes (nodes A to M) is generated. In the ring network “R0”, the nodes A to M are sequentially arranged according to the respective node hash values described in the node hash column 146B of the network configuration node information 146. In other ring networks described below, the nodes A to M are arranged according to the node hash value described in the node hash column 146B.

In the first layer of the hierarchical hash ring network shown in FIG. 8, the neighboring networks “R00” and “R10” are generated from the node groups clustered in step S104 of FIG. 4 based on the first-layer policy described in the policy table 144 of FIG. 5. The network “R00” is composed of the nodes A to G, and the network “R10” is composed of the nodes C and H to M. The node C is a node that overlaps the networks “R00” and “R10”. Note that the first-layer clustering policy described in FIG. 5 covers a wider range (distance) than the second-layer clustering policy and, in particular, is based on a wide neighborhood in the geographic space. Therefore, each of the first-layer networks “R00” and “R10” is a neighborhood network that includes distant neighbors.

In the second layer of the hierarchical hash ring network shown in FIG. 8, the neighboring networks “R000”, “R001”, and “R100” are generated from the node groups clustered in step S104 of FIG. 4 based on the second-layer policy described in the policy table 144 of FIG. 5. The network “R000” is composed of the nodes A to C, the network “R001” is composed of the nodes D, E, and G, and the network “R100” is composed of the nodes C, H, K, and M. Note that the nodes F, I, J, and L are not incorporated in any second-layer network, either because the node has low performance or because it does not satisfy the second-layer clustering condition described in the policy table 144. In addition, the second-layer clustering policy described in FIG. 5 covers a narrower range than the first-layer clustering policy and, in particular, is based on a narrow neighborhood in the geographic space. Therefore, each of the second-layer networks “R000”, “R001”, and “R100” is a neighborhood network composed of a narrow neighborhood in the geographic space.

In the third layer (lowermost layer) of the hierarchical hash ring network shown in FIG. 8, the hash ring networks “R0000” to “R1101” created in step S103 of FIG. 4, each including only one node, are generated.

(1-2-5) Distribution of Configuration Table of Hierarchical Hash Ring Network

After constructing the hierarchical network as described above by the processing of steps S101 to S110, the configuration information creation unit 112 generates a configuration table that collects the configuration information of the nodes included in the hierarchical network. The configuration information creation unit 112 then passes the generated configuration table to the configuration information transfer control unit 113, and the configuration information transfer control unit 113 transfers network configuration information based on the configuration table to the table management unit 214 of each of the nodes A to M (step S111).

The contents and table items of the configuration table are not particularly limited. For example, as in a general distributed hash table, a table may be used that describes, for a designated node in a hash ring network, the hash value and the destination address (for example, an IP address or a host name) of the node located immediately before or after it.

In particular, when the number of nodes 20 included in the hierarchical hash ring network is small (for example, up to about several tens of thousands of nodes), the configuration table may be a table that describes the hash values and addresses of all the nodes 20 included in each hash ring network, for use when a node 20 is actually searched for by referring to the network configuration information in the subsequent processing.

For example, when the configuration information creation unit 112 creates the configuration table as described above, the configuration information transfer control unit 113 extracts from the original configuration table, for the table management unit 214 of each of the nodes A to M, the portion required by that node, and generates and transfers it as network configuration information. The portion of the configuration table required by each of the nodes A to M corresponds to the configuration information of the hash ring networks of each layer that belong to the related line of that node 20. Taking the node A shown in FIG. 8 as a specific example, the hash ring networks that include the node A are “R0”, “R00”, “R000”, and “R0000”, so the configuration information transfer control unit 113 transfers only the configuration tables of those networks to the table management unit 214 of the node A as network configuration information.
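The extraction of the per-node portion can be sketched as a filter over the configuration table. The table layout (network name mapped to node entries), the placeholder hash strings, and the addresses from the 192.0.2.0/24 documentation range are all hypothetical; only the set of networks containing the node A follows the text.

```python
from typing import Dict, Tuple

# network name -> node -> (node hash, address); values are placeholders.
ConfigTable = Dict[str, Dict[str, Tuple[str, str]]]


def entry(node: str, last_octet: int) -> Tuple[str, str]:
    """Hypothetical table row: placeholder hash string and documentation address."""
    return (f"hash({node})", f"192.0.2.{last_octet}")


def extract_for_node(table: ConfigTable, node: str) -> ConfigTable:
    """Keep only the hash ring networks in which the node participates."""
    return {name: dict(rows) for name, rows in table.items() if node in rows}


rows = {n: entry(n, i + 1) for i, n in enumerate("ABCDEFGHIJKLM")}
config_table: ConfigTable = {
    "R0": dict(rows),
    "R00": {n: rows[n] for n in "ABCDEFG"},
    "R10": {n: rows[n] for n in "CHIJKLM"},
    "R000": {n: rows[n] for n in "ABC"},
    "R001": {n: rows[n] for n in "DEG"},
    "R0000": {"A": rows["A"]},
}
config_for_a = extract_for_node(config_table, "A")
```

The same filter leaves out any layer in which a node has no network, so a node such as L would simply receive no second-layer entry.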

FIG. 10 is a table showing an example of network configuration information. The network configuration information 147 illustrated in FIG. 10 is an example of the network configuration information transferred from the configuration information transfer control unit 113 to the node A. The network configuration information 147 includes a network name column 147A describing the name or ID of a hash ring network, a node hash column 147B describing the node hash value of a node incorporated in that hash ring network, and an address column 147C describing the address of that node. Since the network configuration information 147 shown in FIG. 10 describes only the configuration information of the nodes included in the networks “R0”, “R00”, “R000”, and “R0000”, it can be seen that, in the hierarchical hash ring network, the node A belongs to each of these networks. Note that the configuration table on which the network configuration information 147 is based has the same table items as the network configuration information 147.

For a node 20 that has a network layer to which it does not belong, such as the node L, which is not incorporated in any second-layer network in FIG. 8, the configuration information transfer control unit 113 refers to the related line created in step S105, targets only the network layers to which the node 20 belongs, extracts from the configuration table the configuration information of the nodes included in the hash ring network in which the node 20 is incorporated in each of those layers, and transfers it to the table management unit 214 of the node 20.
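The extraction of per-node network configuration information from the full configuration table can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the `Row` type, the `extract_network_config` function, the placeholder hashes, and the addresses are all hypothetical, and only a fragment of the ring membership is modelled.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    network: str    # name or ID of the hash ring network (column 147A)
    node_hash: str  # node hash value (column 147B)
    address: str    # node address (column 147C)


def extract_network_config(table: List[Row], address: str) -> List[Row]:
    """Return the rows of every ring that contains the node at `address`."""
    member_rings = {row.network for row in table if row.address == address}
    return [row for row in table if row.network in member_rings]


# Hypothetical fragment of a configuration table (hashes and addresses are made up).
CONFIG_TABLE = [
    Row("R0000", "hash-a", "node-a"),
    Row("R0001", "hash-b", "node-b"),
    Row("R000", "hash-a", "node-a"),
    Row("R000", "hash-b", "node-b"),
    Row("R000", "hash-c", "node-c"),
    Row("R00", "hash-a", "node-a"),
    Row("R00", "hash-d", "node-d"),
    Row("R0", "hash-a", "node-a"),
    Row("R0", "hash-m", "node-m"),
    Row("R1101", "hash-m", "node-m"),
]

config_for_a = extract_network_config(CONFIG_TABLE, "node-a")
print(sorted({row.network for row in config_for_a}))
```

Because a node that skips a layer simply has no row in any ring of that layer, the same filter would also cover the case of the node L without special handling.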

(1-3) Data Writing to Distributed Database The following describes a method by which the distributed data management system 1 according to the present embodiment writes data to a distributed database that uses a hierarchical hash ring network. FIG. 11 is a sequence diagram illustrating an example of a data write processing procedure. As a specific example, FIG. 11 shows the processing performed when data having the data key “q” is written from a user terminal (not shown) connected to the node A, using the hierarchical hash ring network of FIG. 8 generated from nodes having the network configuration node information 146 shown in FIG. 9.

In a hash ring network, the nodes constituting the network are arranged in a ring based on their node hash values. When data is stored in the hash ring network, the hash value of the data key included in the data is compared with the hash value of each node, and the node in charge of storing the data is determined according to a predetermined determination rule. The predetermined determination rule is not particularly limited. For example, the node positioned before or after the hash value of the data key, or the node closest to the hash value of the data key, can be set as the node in charge of the data. In the hierarchical hash ring network used in the present embodiment, in principle, the node located immediately after the hash value of the data key is determined to be the node in charge of the data.

Note that the MD5 hash value of the data key “q” is “7694f4a66316e53c8cdd9d9954bd611d”. The configuration of each of the nodes A to M is the same as that of the node 20A shown in FIG. 1, and the table management unit 214 of each of the nodes A to M is assumed to hold network configuration information equivalent to the network configuration information 147 shown in FIG. 10.
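The lookup of the node in charge can be illustrated with a short sketch. It assumes that the determination rule means "the first node whose hash is at or after the key hash, wrapping around the ring", and that each node hash is the MD5 of the lowercase node letter. The latter is an assumption made for illustration only, chosen because it reproduces the node hash values quoted in this description for the nodes B and D. The function names are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Dict


def md5_hex(text: str) -> str:
    """MD5 digest of `text` as a 32-character lowercase hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def responsible_node(ring: Dict[str, str], key_hash: str) -> str:
    """Return the node whose hash is the first one at or after `key_hash`.

    `ring` maps node hash -> node name. Equal-length lowercase hex strings
    sort in the same order as the numbers they encode, so plain string
    comparison is enough. The lookup wraps around to the smallest hash.
    """
    hashes = sorted(ring)
    index = bisect_left(hashes, key_hash) % len(hashes)
    return ring[hashes[index]]


# Assumption: node hash = MD5 of the lowercase node letter.
ring_r000 = {md5_hex(name.lower()): name for name in ("A", "B", "C")}

key_hash = md5_hex("q")
print(key_hash)
print(responsible_node(ring_r000, key_hash))
```

Keeping the ring as a sorted list of hashes makes the lookup a binary search, which is one common way to implement this kind of rule; the description itself does not prescribe a data structure.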

(1-3-1) Data Write Request Before the data write processing shown in FIG. 11 is explained, the data write request transmitted when a data write is requested from a user terminal or between nodes will be explained. FIG. 12 is an explanatory diagram showing a format example of a data write request. The data write request 148 shown in FIG. 12 is a format example for the case where a data write is requested in the REST (Representational State Transfer) style over HTTP (Hypertext Transfer Protocol). The request type column 148A describes the type of the data write request, and the request format example column 148B describes a format example of the data write request.

The data write request 148 shows three request examples 148C to 148E. The request 148C is a format example of a data write request transmitted from the user terminal to the distributed data management system 1. The request 148D is a format example of a data write request transmitted when the node 20 of the distributed data management system 1 that has received the request 148C from the user terminal (for example, the node A) requests another node 20 constituting the distributed data management system 1 to write data for the purpose of data replication. The request 148E is a format example of a data write request transmitted when the node 20 that has received the request 148D cannot create the replication data and therefore re-requests data replication from another node 20. The replication data creation process will be described later with reference to FIG. 11.

The request 148C, the request 148D, and the request 148E have the same format structure. The contents of the request format are first described in detail using the request 148C as an example.

The first line of the request 148C consists of the HTTP command “PUT”, which means data transmission, the character string “/ServiceID/UserID/FileID”, which indicates the URI (Uniform Resource Identifier) of the data, and the character string “HTTP 1.1”, which indicates the HTTP version of the request. The second line of the request 148C is a TimeStamp header that presents the time stamp “T1” of the data to be transmitted and describes the creation date and time of the file write request. The data URI and the time stamp are used to determine the uniqueness of the transmitted data, and the same contents are described in the request 148D and the request 148E.

The third line of the request 148C is a Network header indicating from which layer of the hierarchical hash ring network the request was transmitted. Since the request 148C is transmitted from the user terminal and does not belong to any layer, “Null” is described. The Network header is used when performing re-replication within the same hash ring network, described later. The fourth line of the request 148C is a Replication header indicating whether the request requires replication of the data, and “true” is described there. This indicates that the request is a primary data write request rather than a request whose main purpose is data replication, and that the node receiving the write request needs to create replication data of the write target data.

The fifth line of the request 148C is a Primary header indicating the location of the primary data when primary and secondary are distinguished as a data attribute (described later for step S205 in FIG. 11). Since this attribute has not yet been created at the stage of the request 148C, “Null” is described in the Primary header. The value specified in the Primary header only needs to uniquely identify the node 20 that holds the primary data, and may be given as a network address (IP address), a host name, or the like. The sixth line of the request 148C indicates whether any other header is described in the request; none is specified in particular in the data write request 148. The seventh line of the request 148C describes the data to be written by the request.

Next, the portions of the requests 148D and 148E that differ from the request 148C will be described. The Network header (third line) of the request 148D describes the network name “R0000”, indicating that the request 148D was transmitted from the third-layer network “R0000”, which consists only of the node A. The Network header (third line) of the request 148E describes the network name “R00”, indicating that the request 148E was transmitted from the first-layer network “R00”.

In addition, “false” is described in the Replication header (fourth line) of the requests 148D and 148E, indicating that neither request requires replication processing of the data to be written. In the Primary header (fifth line) of the requests 148D and 148E, “node A” is described, indicating that the node storing the primary data of the data transmitted by the requests 148D and 148E is the node A.
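To make the three request variants concrete, the sketch below assembles them as text. FIG. 12 is not reproduced here, so the exact wire syntax is an assumption: the `Name: value` header form, the standard `HTTP/1.1` request-line token, the blank separator line, and the `build_write_request` helper are all hypothetical. Only the header names and values come from the description above.

```python
from typing import Optional


def build_write_request(uri: str, timestamp: str, network: Optional[str],
                        replication: bool, primary: Optional[str],
                        body: str) -> str:
    """Assemble a write request carrying the headers described for FIG. 12."""
    lines = [
        f"PUT {uri} HTTP/1.1",
        f"TimeStamp: {timestamp}",
        f"Network: {network if network is not None else 'Null'}",
        f"Replication: {'true' if replication else 'false'}",
        f"Primary: {primary if primary is not None else 'Null'}",
        "",    # assumed separator between the headers and the data
        body,
    ]
    return "\r\n".join(lines)


URI = "/ServiceID/UserID/FileID"

# 148C: user terminal -> system; no layer yet, replication still to be done.
request_148c = build_write_request(URI, "T1", None, True, None, "<data>")
# 148D: node A -> replication destination; sent from the ring "R0000".
request_148d = build_write_request(URI, "T1", "R0000", False, "node A", "<data>")
# 148E: re-replication request; sent from the ring "R00".
request_148e = build_write_request(URI, "T1", "R00", False, "node A", "<data>")

print(request_148d)
```

The URI and time stamp are identical in all three requests, which is what allows a receiving node to recognise them as the same data.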

(1-3-2) Data Write Processing The data write processing will be described below with reference to FIG. 11. In step S201, the node A receives a data write request having the data key “q” (for example, the request 148C in FIG. 12) from the user terminal. The processing of step S201 in the node A is described in more detail as follows. First, the data read control unit 213 of the node A receives the write request from the user terminal via the communication unit 233 and the communication I/O 231. Next, the request analysis unit 215 analyzes the received write request and determines that the request was transmitted from the user terminal. Subsequently, the hash calculation unit 216 calculates a hash value from the data key “q”, and the node derivation unit 217 derives the nodes 20 to which data replication is to be requested, based on the network configuration information transferred from the configuration information transfer control unit 113 and managed by the table management unit 214. The data write control unit 212 then requests each replication destination node derived by the node derivation unit 217 to write the data, using the request 148D in FIG. 12, via the communication I/O 231 and the communication unit 233 (steps S202, S204, S206, and S208).

Specifically, according to the network configuration information 147 shown in FIG. 10, the hash ring networks to which the node A belongs are “R0000”, “R000”, “R00”, and “R0”, and the hash value of the data key “q” is “7694f4a66316e53c8cdd9d9954bd611d”. Therefore, the node derivation unit 217 sets the node in charge of the data key “q” in each of the networks “R0000”, “R000”, “R00”, and “R0” as a replication request destination.

The network “R0000” is a hash ring network consisting only of the node A, so the data write request destination in that network is the node A. Therefore, the data control unit 21 (specifically, the data write control unit 212) requests the node A to write the data (step S202). Because the node A requests the data write from itself in step S202, the data write control unit 212 of the node A writes the data to the storage unit 234 of its own node (step S203).

The network “R000” is a hash ring network consisting of the nodes A, B, and C. According to the network configuration node information 146 shown in FIG. 9, the data key “q”, whose hash value is “7694f4a66316e53c8cdd9d9954bd611d”, is assigned to the node B, whose node hash is “92eb5ffee6ae2fec3ad71c777531578f”. Therefore, the data control unit 21 of the node A requests the node B to write the data (step S204). The node B, having received the data write request, writes the data to the storage unit 234 of its own node (step S205).

The processing of step S205 is described in detail as follows. The data read control unit 213 of the node B receives the write request from the node A via the communication unit 233 and the communication I/O 231. Next, the request analysis unit 215 analyzes the received write request and determines that it is a replication request from the node A, which received the original request from the user terminal. To write the data as a replica, the request processing unit 211 passes the write target data attached to the request to the data write control unit 212, and the data write control unit 212 writes the data to the storage unit 234 via the storage I/O 232.

When primary and secondary are to be distinguished for replication of the write target data, the location of the primary data is written as metadata together with the data, at least when secondary data is written. The method of writing the metadata is not particularly limited, and a fork of an existing file system may be used. Known forks include the resource fork in HFS (Hierarchical File System), extended attributes (xattr) in the file systems of Unix (registered trademark) and Unix-like systems (ext2, ext3, JFS, ReiserFS, XFS, etc.), and the alternate data stream in NTFS (NT File System).

In the network “R00”, as in the case of the network “R000” described above, the data key “q” is assigned to the node D, whose node hash value is “8277e0910d750195b448797616e091ad”. Accordingly, the data control unit 21 of the node A requests the node D to write the data (step S206). The node D, having received the data write request, writes the data to the storage unit 234 of the node D in the same way as in step S205 (step S207).

In the network “R0”, as in the case of the network “R000” described above, the data key “q” is likewise assigned to the node D, whose node hash value is “8277e0910d750195b448797616e091ad”. Therefore, the data control unit 21 of the node A requests the node D to write the data (step S208). The node D, having received the data write request, attempts to write the data to the storage unit 234 of the node D as in step S205, but the data write control unit 212 detects that the same data has already been written to the node D by the processing of step S207 for the network “R00” (step S209). In this case, to maintain the number of replicas, the node D reselects a data replication destination node (for example, the node I) (step S210) and requests the selected replication destination node I to write the data, using the request 148E in FIG. 12 (step S211). The node I, having been requested to write the data, writes the data to its own node (step S212). When primary and secondary are to be distinguished for replication of the write target data, the location of the primary data is written as metadata together with the data, at least when secondary data is written.

The processing of steps S208 to S212 illustrates one characteristic of the present embodiment. In the distributed database constructed by the distributed data management system 1, the same node is assigned across multiple layers, so the same node may be selected as the data write destination in more than one network. In such a case, the data is re-replicated to another appropriate node.
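The selection of write destinations in steps S202 to S212, including the re-selection when the node in charge already holds the data, can be sketched as below. Several things are assumptions made for illustration: node hashes are taken to be the MD5 of the lowercase node letter (which reproduces the hashes quoted for the nodes B and D), the membership of the ring “R00” is guessed because FIG. 8 is not reproduced here, and re-selection is taken to mean "step to the next node on the same ring". The `plan_replicas` function is hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def plan_replicas(rings: Dict[str, List[str]], key: str) -> List[str]:
    """Pick one write destination per ring, in the order the rings are given.

    If the node in charge already holds the data (it was chosen in an
    earlier ring), step to the next node on the same ring until a node
    without the data is found.
    """
    key_hash = md5_hex(key)
    holders: List[str] = []
    for members in rings.values():
        # Assumption: node hash = MD5 of the lowercase node letter.
        ordered = sorted(members, key=lambda name: md5_hex(name.lower()))
        hashes = [md5_hex(name.lower()) for name in ordered]
        start = bisect_left(hashes, key_hash)
        for step in range(len(ordered)):
            candidate = ordered[(start + step) % len(ordered)]
            if candidate not in holders:
                holders.append(candidate)
                break
    return holders


# Rings containing node A, lowest layer first. The members of "R00" are an
# assumed subset; "R0" contains all nodes A to M.
RINGS_OF_A = {
    "R0000": ["A"],
    "R000": ["A", "B", "C"],
    "R00": ["A", "B", "C", "D", "F", "G"],
    "R0": list("ABCDEFGHIJKLM"),
}

write_plan = plan_replicas(RINGS_OF_A, "q")
print(write_plan)
```

Under these assumptions the plan comes out as the nodes A, B, D, and I, in line with steps S203, S205, S207, and S212.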

Note that the method of detecting in step S209 that the same data has been written is not particularly limited. For example, when the data write request has the request format shown in FIG. 12, the data write control unit 212 may check the time stamp described in the TimeStamp header of the request, and detect that the same data has been written when data having the same hash value as the hash value of the data key “q” and the same time stamp is stored in the storage unit 234. Further, if the data to be written is a file, the data write control unit 212 may use the property information stored in the file format of the file, in particular the time stamp of the file, to detect whether the same data has been written.
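One way to picture the detection based on the key hash and time stamp is a store that indexes its contents by that pair. The `NodeStore` class below is a hypothetical in-memory stand-in for the storage unit 234, offered only as a sketch of the check and not as the embodiment's storage layout.

```python
from typing import Dict, Tuple


class NodeStore:
    """In-memory stand-in for the storage unit of one node (hypothetical)."""

    def __init__(self) -> None:
        # (key hash, time stamp) -> stored data
        self._items: Dict[Tuple[str, str], bytes] = {}

    def has_same_data(self, key_hash: str, timestamp: str) -> bool:
        """True if data with the same key hash and time stamp is stored."""
        return (key_hash, timestamp) in self._items

    def write(self, key_hash: str, timestamp: str, data: bytes) -> bool:
        """Store the data; return False if the same data is already present."""
        if self.has_same_data(key_hash, timestamp):
            return False
        self._items[(key_hash, timestamp)] = data
        return True


store_d = NodeStore()
KEY_HASH_Q = "7694f4a66316e53c8cdd9d9954bd611d"
first = store_d.write(KEY_HASH_Q, "T1", b"payload")   # write for ring "R00"
second = store_d.write(KEY_HASH_Q, "T1", b"payload")  # request for ring "R0"
print(first, second)
```

A `False` result corresponds to the situation in step S209, where the receiving node would go on to reselect a replication destination.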

In step S210, the method of reselecting the data replication destination node is also not limited. For example, on the premise that replication data is not created on the same node in more than one network regardless of the internal configuration of each node, another node uniquely determined from the node D, which was requested as the data write destination, is reselected as the data replication destination node within the network “R0” in which the writing of the same data was detected, and the data is replicated there. A node uniquely determined from the node D is, for example, a node located near the node D in the network “R0”; more specifically, candidates include nodes located within two positions before or after the node D, such as the nodes M and C or the nodes I and K located behind the node D. However, in consideration of the processing efficiency at the time of data recovery, described later, it is preferable to reselect the node I, located one node behind the node D, as the data replication destination node.

As another example of reselecting the data replication destination node in step S210, the data replication destination node may be reselected from a network other than the network “R0” in which the writing of the same data was detected. In that case, however, the node reselected from the other network may itself be a node to which the same data has already been written, such as the nodes A, B, and D to which the data was written in steps S202 to S207, so the data write control unit 212 of that node must again perform detection of the same data, which lowers the processing efficiency.

Through the processing of steps S201 to S212, the data having the data key “q”, which the user terminal requested the node A to write, is written to the nodes A, B, D, and I among the nodes A to M constituting the distributed database in the distributed data management system 1.

After step S212 is completed, in at least the highest-layer network, that is, the zeroth-layer network “R0” to which all of the nodes A to M are assigned, the node D, which is the write destination in the network “R0”, requests another node uniquely determined from the node D (for example, the node I) to replicate the data as a backup, for use in data recovery accompanying node deletion in the node deletion processing described later (step S213). In FIG. 11, however, the same data has already been written to the node I in step S212, so the node I reselects the replication destination in the same manner as in steps S209 to S210 and requests the reselected node (for example, the node K) to replicate the data (step S214). The node (node K) to which the same data has not yet been written then writes the data to the storage unit 234 of its own node to create the backup (step S215). When primary and secondary are to be distinguished for replication of the write target data, the location of the primary data is written as metadata together with the data, at least when secondary data is written, as before.

In FIG. 11, the execution order of the data replication requests issued by the node A (steps S202, S204, S206, and S208) is not limited, and they may be executed in parallel. However, so that data replication proceeds from the nodes nearest to the node A, to which the user terminal first requested the data write, it is preferable to execute the data replication requests starting from the lower-layer networks.

By performing the processing of steps S201 to S215 described above, the distributed data management system 1 can write data to the distributed database using the hierarchical hash ring network from whichever of the nodes A to M the user terminal is connected to.

(1-4) Reading Data from Distributed Database The following describes a method of reading data from the distributed database in the distributed data management system 1. Here, it is assumed that data having the data key “q” from the user terminal has been stored in the distributed database using the hierarchical hash ring network shown in FIG. 8 by the data write processing shown in FIG. 11. Specifically, the data has been written to the nodes A, B, D, I, and K. The processing for reading the data having the data key “q” from the node M in this situation is described below.

First, the data read control unit 213 of the node M requests the node M, which is included in the network “R1101”, to search for the data stored with the data key “q”. Since the destination of the read request is its own node, the data read control unit 213 checks whether the data is stored in the storage unit 234. Because the data having the data key “q” does not exist in the node M, the data read control unit 213 returns a failure notification indicating that the search was unsuccessful. The format of the read request is not particularly limited. For example, if a format similar to the data write request 148 shown in FIG. 12 is used, the read request need not include a Primary header or data; it only needs to have an HTTP command meaning a read, a Network header indicating from which layer of the hierarchical hash ring network the request was transmitted, and the like.

Next, the data control unit 21 of the node M traces the related line that includes the node M, refers to the second-layer network “R100” one layer above, and requests the node in charge of the hash value of the data key “q” (the node K; the derivation is omitted) to search for whether the data having the data key “q” is stored. Since the data having the data key “q” is stored in the node K, the data is transferred from the node K to the node M, and the reading of the data by the node M is completed.

If the data is not found in the second-layer network either, the search simply continues to the first-layer network “R10” one layer above, and then to the highest-layer zeroth-layer network “R0”. If the search is carried out up to the zeroth-layer network “R0”, the target data can always be found and read.
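The upward search can be sketched as a walk over the rings on the requesting node's related line. In the sketch below, the `find_data` function and its argument layout are hypothetical; the nodes in charge for “R1101”, “R100”, and “R0” are those stated in this description, and “R10” is left out because the description does not say which node is in charge of the key there.

```python
from typing import List, Optional, Set, Tuple


def find_data(lookup_chain: List[Tuple[str, str]],
              holders: Set[str]) -> Optional[Tuple[str, str]]:
    """Walk the rings from the lowest layer upward and return the first
    (network, node) pair whose node in charge actually holds the data."""
    for network, node in lookup_chain:
        if node in holders:
            return network, node
    return None


# (network, node in charge of key "q") along node M's related line.
CHAIN_FROM_M = [("R1101", "M"), ("R100", "K"), ("R0", "D")]
# Nodes holding the data after the write processing of steps S201 to S215.
HOLDERS_OF_Q = {"A", "B", "D", "I", "K"}

print(find_data(CHAIN_FROM_M, HOLDERS_OF_Q))
```

Because the node in charge in the zeroth-layer network always receives a copy during writing, the walk is guaranteed to terminate with a hit once it reaches “R0”.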

(1-5) Deletion of Data from Distributed Database The following describes a method by which the distributed data management system 1 according to the present embodiment deletes data stored in a node from the distributed database using the hierarchical hash ring network. FIG. 13 is a sequence diagram illustrating an example of a data deletion processing procedure. In FIG. 13, it is assumed that data having the data key “q” from the user terminal has been stored in the distributed database using the hierarchical hash ring network shown in FIG. 8 by the data write processing shown in FIG. 11. Specifically, the data has been written to the nodes A, B, D, I, and K. The processing for deleting the data having the data key “q” in response to a deletion request from a user terminal connected to the node M in this situation is described below.

In the data deletion processing, the target data is first searched for in the networks in order from the lower layer to the upper layer, in the same procedure as the data reading described above. Specifically, the data read control unit 213 of the node M, having received from the user terminal a request to delete the data having the data key “q” (the target data), requests its own node in the network “R1101” to read the target data (step S301). Since the purpose of the processing shown in FIG. 13 is to delete data, the body of the target data need not be read. Therefore, the data read request transmitted in step S301 and in step S303, described later, may be a request that reads only the header of the target data or a request that checks only whether the target data exists. As a result of the read request in step S301, a failure notification is returned because no data having the data key “q” is stored in the node M (step S302).

Next, the data control unit 21 of the node M traces the related line that includes the node M, refers to the second-layer network “R100” one layer above, and requests the node K, which is in charge of the hash value of the data key “q”, to read the data (step S303). Since the data is stored in the node K, the data read control unit 213 of the node K reads the header information of the data and transmits it to the data control unit 21 of the node M (step S304). On receiving the header information of the target data in step S304, the data control unit 21 of the node M requests the node K, the transmission source, to read the metadata of the target data (step S305). On receiving the metadata read request, the data control unit 21 of the node K reads the metadata of the target data and returns it to the node M (step S306).

In the node M, which received the metadata of the target data in step S306, the data control unit 21 refers to the received metadata, identifies the node storing the primary data of the target data, and requests that node to delete the target data (step S307). The node storing the primary data of the data having the data key “q” is identified as the node A by referring, for example, to the Primary header included in the metadata. Accordingly, in step S307, the data deletion request is transmitted to the node A.

The data control unit 21 of the node A, having received the data deletion request, then requests all the nodes to which the target data has been replicated, that is, the nodes A, B, D, I, and K, to delete the target data, in the same procedure as the data writing described with reference to FIG. 11 (step S308), and the target data is deleted at each of the nodes A, B, D, I, and K (step S309). When the deletion of the target data is completed, each of the nodes A, B, D, I, and K transmits a deletion completion notification to the node M (step S310).

When the data having the data key “q” was written, the data was replicated to the node I instead of the node D in step S212, and to the node K instead of the node I in step S215. The same applies to data deletion. For example, when the target data of the node D is deleted in the first-layer network “R00” and deletion is then attempted in the zeroth-layer network “R0”, the fact that the target data has already been deleted from the node D makes it possible to recognize that the target data is also stored in the node I, which is uniquely determined from the node D, and to delete it. In other words, consider deleting data that was replicated not to the node found directly from the hash value and originally in charge of storing the data (for example, the node D) but to a node uniquely determined from that node (for example, the node I for the node D, or the node K for the node I). In that case, the fact that the target data has already been deleted in a lower-layer network from the node found directly from the hash value serves as a trigger for detecting that the target data was also replicated, in the upper-layer network, to the node uniquely determined from that node, so that the target data can be deleted there.

The distributed data management system 1 can delete the data stored in the distributed database from any of the nodes constituting the distributed database by the processing of steps S301 to S310 described above.
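The overall deletion flow of steps S301 to S310 can be condensed into the sketch below. It is deliberately simplified and rests on assumptions: the metadata is modelled as a dictionary with a single `"primary"` entry, the `delete_data` function is hypothetical, and the replica nodes are found by scanning every store, whereas in the embodiment the primary node re-derives them through the same procedure as the write path.

```python
from typing import Dict, List, Optional, Tuple

# stores[node][key] -> metadata; the {"primary": ...} layout is assumed.
Stores = Dict[str, Dict[str, Dict[str, str]]]


def delete_data(key: str, lookup_chain: List[Tuple[str, str]],
                stores: Stores) -> Tuple[Optional[str], List[str]]:
    """Locate one copy, read its metadata, then delete every copy.

    Returns the primary node named in the metadata and the nodes from
    which the data was deleted.
    """
    # Steps S301 to S304: search upward until a node in charge holds the key.
    holder = next((node for _, node in lookup_chain
                   if key in stores.get(node, {})), None)
    if holder is None:
        return None, []
    # Steps S305 to S307: the metadata names the node storing the primary data.
    primary = stores[holder][key]["primary"]
    # Steps S308 to S309 (simplified): scan all stores instead of having the
    # primary node re-derive the replica nodes.
    deleted = [node for node in sorted(stores) if key in stores[node]]
    for node in deleted:
        del stores[node][key]
    return primary, deleted


stores: Stores = {name: {} for name in "ABCDEFGHIJKLM"}
for name in "ABDIK":
    stores[name]["q"] = {"primary": "A"}

primary_node, deleted_nodes = delete_data(
    "q", [("R1101", "M"), ("R100", "K")], stores)
print(primary_node, deleted_nodes)
```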

(1-6) Addition of Node to Distributed Database The following describes a method of adding a node to the distributed database using the hierarchical hash ring network in the distributed data management system 1 according to the present embodiment. The node addition processing of a conventional distributed data management system using a distributed hash table can be applied as the processing for newly adding a node to such a hierarchical hash ring network. Specifically, for example, the node to be added is added to the target networks, and in each network of the hierarchical hash ring network, the data in the range of hash values handled by the added node is moved to the added node from the node positioned after it. Processing similar to that described for step S107 or step S108 of FIG. 4 can be applied to the handling of overlapping nodes that arise from adding the node to each network and to the adjustment of the hash space.
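The data movement for a single ring can be sketched as follows, assuming the rule that a key belongs to the first node at or after its hash. Small integers stand in for hash values purely for readability, and the `Ring` layout and the `in_arc` and `add_node` helpers are hypothetical names introduced for this illustration.

```python
from bisect import bisect_left
from typing import Dict

Ring = Dict[int, Dict[int, str]]  # node hash -> {key hash -> value}


def in_arc(h: int, start: int, end: int) -> bool:
    """True if h lies in the ring interval (start, end], with wrap-around."""
    if start < end:
        return start < h <= end
    return h > start or h <= end


def add_node(ring: Ring, new_hash: int) -> None:
    """Insert a node and move over the keys it is now in charge of."""
    hashes = sorted(ring)
    if not hashes:
        ring[new_hash] = {}
        return
    index = bisect_left(hashes, new_hash)
    successor = hashes[index % len(hashes)]          # node after the new one
    predecessor = hashes[(index - 1) % len(hashes)]  # node before the new one
    moved = {k: v for k, v in ring[successor].items()
             if in_arc(k, predecessor, new_hash)}
    for k in moved:
        del ring[successor][k]
    ring[new_hash] = moved


# Toy ring: node 10 holds keys 5 and 60, node 50 holds keys 20 and 45.
ring: Ring = {10: {5: "a", 60: "b"}, 50: {20: "c", 45: "d"}}
add_node(ring, 30)  # takes key 20 from node 50
add_node(ring, 70)  # takes key 60 from node 10 (wrap-around successor)
print(ring)
```

In the hierarchical case the same step would be repeated in every ring the new node joins, with overlap handling as mentioned above.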

(1-7) Deletion of Node of Distributed Database and Data Recovery The following describes a method by which the distributed data management system 1 according to the present embodiment recovers the data of a node when that node is deleted from the distributed database using the hierarchical hash ring network. FIG. 14 is a schematic diagram for explaining a method of recovering data in a hierarchical hash ring network from which a node is deleted. The hierarchical hash ring network shown in FIG. 14 is the hierarchical hash ring network shown in FIG. 8 from which the node B has been deleted for failure handling or replacement. The broken-line arrows in FIG. 14 indicate the direction in which the data stored in the node B is restored. In FIG. 14, the deleted node B is shown surrounded by a broken line.

First, the procedure for deleting the node B is outlined. When the configuration information management unit 11 of the core node 10 detects, based on the metrics acquired by the metric acquisition units 22 of the nodes A to M, a situation in which the node B needs to be withdrawn (for example, the occurrence of a failure), it reconstructs the hierarchical hash ring network with the node B deleted. The configuration information management unit 11 of the core node 10 may also delete a designated node B in response to a deletion instruction from the user terminal or a direct input to the core node. Then, in the hierarchical hash ring network reconstructed by the configuration information management unit 11 with the node B deleted, the data is restored in order from the lower-layer networks, in the order shown in FIG. 14, and the deletion of the node B is completed.

A specific processing procedure for restoring the data of the node B is described below. In the hierarchical hash ring network before the node B is deleted, all data in the lowest-layer (third-layer) network “R0001” is written to the node B, and the data written to the node B is replicated to the nodes A and C in the network “R000”. The data of the node B is not replicated to the node B itself in the network “R000” because the same data has already been written to the node B in the lower-layer network “R0001”; data that would have been written to the node B is instead replicated to the node A.

Therefore, to recover the data written to the node B in the network “R0001”, the configuration information management unit 11 moves or copies to the node C the data that was written to the node B in the network “R0001” and replicated to the nodes A and C in the network “R000” (steps S1 and S2). In addition to the data replicated from the node B, the nodes A and C also hold the data written in the lowest-layer networks “R0000” and “R0010”, respectively, the data that the node B before deletion should have been in charge of in the network “R000”, and so on. To find the data written in the network “R0001” among the data written from these multiple sources, the configuration information management unit 11 uses the metadata written together with the data. That is, in steps S1 and S2, the configuration information management unit 11 refers to the metadata and moves or copies from the nodes A and C to the node C only the data whose metadata describes information on the node B, treating that data as the recovery target.

Note that the destination node to which the data is moved or copied in steps S1 and S2 is not limited to the node C, and may be any node included in the network “R000”. When primary and secondary are to be distinguished for replication of the write target data, information indicating that the data is primary data is written as metadata for the data moved or copied to the node C, or its metadata is deleted to indicate that it is not secondary data.

Next, consider the upper-layer network “R00”. The data written to the node B in the network “R000” is replicated to the nodes D, F, and G in the upper-layer network “R00”. The nodes D, F, and G correspond to the nodes positioned before and after the node B before its deletion, or to the nodes uniquely determined when the same data had already been written to the node B. The details are the same as in the case of the network “R000” described above, and their description is omitted.

Therefore, in the network “R00”, in order to recover the data written to node B in the network “R000”, the configuration information management unit 11 moves or copies to node A the data that was written to node B in the network “R000” and replicated to nodes D, F, and G in the network “R00” (steps S3 to S5). The replication destination in steps S3 to S5 is node A because, after the deletion of node B, node A is the node that handles the data node B should have handled. As with nodes A and C in steps S1 and S2, nodes D, F, and G hold, in addition to the data written to node B in the network “R000” and replicated, data written in other networks and data that node B should have handled in the network “R00” before its deletion. To pick out the data written in the network “R000” from data written by multiple writing nodes in this way, the configuration information management unit 11 uses the metadata written together with the data. That is, in steps S3 to S5 the configuration information management unit 11 refers to the metadata and moves or copies from nodes D, F, and G to node A only the data whose metadata describes information on nodes A, B, and C, treating that data as the recovery target. In other words, among the lower-layer hash ring networks of the network requesting data recovery, the configuration information management unit 11 targets for recovery only the data whose metadata describes information on the nodes included in the hash ring network that contained the deleted node.
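The metadata-based selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Record` type, its `written_at` metadata field, and the function name `select_recovery_targets` are hypothetical, and the sketch assumes the metadata simply names the node to which the data was originally written.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    written_at: str  # hypothetical metadata field: node the data was first written to


def select_recovery_targets(records: Iterable[Record],
                            lower_ring_members: Iterable[str]) -> List[Record]:
    """Keep only records whose metadata names a node of the lower-layer ring
    that contained the deleted node; everything else is left untouched."""
    members = set(lower_ring_members)
    return [r for r in records if r.written_at in members]


# Data held by node D in network "R00" (contents are made up for illustration).
held_by_d = [
    Record("k1", "v1", written_at="B"),
    Record("k2", "v2", written_at="D"),
    Record("k3", "v3", written_at="A"),
]

# Recovery in "R00": the lower-layer ring "R000" consisted of nodes A, B, and C.
targets = select_recovery_targets(held_by_d, ["A", "B", "C"])
```

With the same helper, the selection in steps S1 and S2 corresponds to passing only `["B"]`, the single member of the lowest-layer network “R0001”.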

Next, consider the higher-layer network “R0”. Data written to node B in the network “R00” is replicated to node G in the uppermost network “R0”. Therefore, in the network “R0”, in order to recover the data written to node B in the network “R00”, the configuration information management unit 11 moves or copies to node G the data that was written to node B in the network “R00” and replicated to node G in the network “R0” (step S6). When the data source and destination are the same node, as in this example, no migration or replication actually takes place.

Note that the data replication destination in step S6 is node G because, after the deletion of node B, node G is the node that handles the data node B should have handled. As with nodes A and C in steps S1 and S2 and nodes D, F, and G in steps S3 to S5, node G holds, in addition to the data written to node B in the network “R00” and replicated, data written in other networks and data that node B should have handled in the network “R0” before its deletion. To pick out the data written in the network “R00” from data written by multiple writing nodes in this way, the configuration information management unit 11 uses the metadata written together with the data. That is, in step S6 the configuration information management unit 11 refers to the metadata and moves or copies from node G only the data whose metadata describes information on nodes A to G, treating that data as the recovery target. In other words, among the lower-layer hash ring networks of the network requesting data recovery, the configuration information management unit 11 targets for recovery only the data whose metadata describes information on the nodes included in the hash ring network that contained the deleted node.

Here, as described with reference to FIG. 11, at least in the highest-layer network “R0”, the replica of the data stored in node B is placed on a node uniquely determined within the same network (for example, node G) (see steps S213 to S215 in FIG. 11). Therefore, in order to recover the data written to node B in the network “R0”, the configuration information management unit 11 moves or copies to node G the data that was written to node B and replicated to node G in the network “R0” (step S7). When the data source and destination are the same node, as in this example, no migration or replication actually takes place for that step. However, because the same data has already been written to node G in step S6, replication is performed from node G to a uniquely determined node (for example, node E).

Note that the data replication destination in step S7 is node G because, after the deletion of node B, node G is the node that handles the data node B should have handled. As with the other nodes A, C, D, F, and G described above, node G holds, in addition to the data written to node B in the network “R0” and replicated, data written in other networks. To pick out the data written in the network “R0” from data written by multiple writing nodes in this way, the configuration information management unit 11 uses the metadata written together with the data. That is, in step S7 the configuration information management unit 11 refers to the metadata and moves or copies from node G only the data whose metadata describes information on all of nodes A to M, treating that data as the recovery target. In other words, among the lower-layer hash ring networks of the network requesting data recovery, the configuration information management unit 11 targets for recovery only the data whose metadata describes information on the nodes included in the hash ring network that contained the deleted node.

Finally, at least in the highest-layer network “R0”, the data moved or copied in step S7 needs to be replicated (as in the replication described with FIG. 11) so that it can be used for data recovery when a node is deleted. Therefore, the configuration information management unit 11 replicates the data moved or copied in step S7 to node E, which is uniquely determined from node G (step S8). In this example, however, the data has already been replicated to node E in step S7, so the data is actually replicated to node A, which is uniquely determined from node E.
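The behavior in steps S7 and S8, where the replica moves on to the next uniquely determined node when the first candidate already holds the data, can be sketched as below. This is only an illustration under stated assumptions: it assumes the "uniquely determined" node is the next node in ring order, and the ring order `["G", "E", "A", "C"]` and the function name `next_replica_target` are hypothetical, chosen so that E follows G and A follows E as in the example.

```python
from typing import List, Optional, Set


def next_replica_target(ring: List[str], start: str,
                        holders: Set[str]) -> Optional[str]:
    """Walk the ring from `start` and return the first node that does not
    yet hold the data, or None if every node already holds it."""
    i = ring.index(start)
    for step in range(1, len(ring)):
        candidate = ring[(i + step) % len(ring)]
        if candidate not in holders:
            return candidate
    return None


# Hypothetical ring order for network "R0" (assumption for illustration only).
ring_r0 = ["G", "E", "A", "C"]

first = next_replica_target(ring_r0, "G", {"G"})        # data held only by G
second = next_replica_target(ring_r0, "G", {"G", "E"})  # E already holds it
```

Under these assumptions the first call selects node E and the second call skips E and selects node A, which mirrors the sequence described for steps S7 and S8.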

Through the processing of steps S1 to S8 described above, when a node is deleted from the distributed database based on the hierarchical hash ring in the distributed data management system 1, the data that was stored in the deleted node can be reliably retained on the remaining nodes constituting the distributed database.

(1-8) Effects According to this Embodiment According to such a distributed data management system 1, in constructing the hierarchical hash ring network, the configuration information management unit 11 clusters the nodes based on the policy table 144, which holds a geospatial clustering policy for each layer, so that a hierarchical hash ring network including hash ring networks based on geospatial proximity can be constructed. In a hierarchical hash ring network constructed in this way, even if, for example, a disaster strikes a certain node and makes it unreachable, the data can still be accessed by connecting, from a neighboring node included in the same hash ring network, to the network to which the affected node belongs, so disaster resistance can be ensured.

Further, according to such a distributed data management system 1, in constructing the hierarchical hash ring network, as shown in the process of step S106 in FIGS., the configuration information management unit 11 collects metrics indicating the performance (specifications) of the computer constituting each node, determines the allowable lowest layer for each node based on those metrics, and decides the placement of each node in the hierarchical hash ring network in consideration of the allowable lowest layer. This keeps nodes 20 with low-performance computing resources out of the lower-layer hash ring networks as far as possible. Conventionally, when a hierarchical hash ring network was built without considering the performance of the computers constituting the nodes, data writes could concentrate on nodes made up of low-performance computers, increasing processing time. The distributed data management system 1 avoids such a loss of accessibility and, by taking the specifications of the computers constituting the nodes into account, has the effect of reducing the unevenness in processing time among the nodes in the hierarchical hash ring network.

Further, according to such a distributed data management system 1, in constructing the hierarchical hash ring network, as shown in the process of step S107 in FIG. 4, the configuration information management unit 11 deduplicates a node that could belong to multiple sub-networks in the same layer by moving the node to another sub-network in the same layer. Conventionally, when sub-networks that take spatial proximity into account were constructed as part of a hierarchical hash ring network, a node could be incorporated redundantly into multiple sub-networks in the same layer. Because the distributed data management system 1 adjusts such overlapping nodes as described above when constructing the hierarchical hash ring network, it can be expected to avoid a situation in which the load is biased toward a specific node.

Also, conventionally, when a sub-network that takes spatial proximity into account was constructed as part of a hierarchical hash ring network, the sub-network was a hash ring network sorted in a space of two or more dimensions, in which the one-dimensional hash space formed by hashes alone is extended with space (geography and network). Because sorting in a hash space of two or more dimensions biases the hash space compared with sorting in a one-dimensional hash space, there was a problem that the range of data stored by each node became uneven and the load on the nodes was biased. In contrast, in the distributed data management system 1 described above, in constructing the hierarchical hash ring network, as shown in the process of step S108 in FIG., the hash space corresponding to the neighborhood network can be adjusted, so that the load is prevented from being biased toward a specific node.

In such a distributed data management system 1, the configuration information management unit 11 constructs the hierarchical hash ring network so as to maintain a number of replicas sufficient for executing data replication, so the effect of data replication can be realized. As for the specific processing relating to data replication, for example, FIG. 11 sets out the procedure by which the data control unit 21, when writing data, replicates the data within the same network and in the associated upper-layer networks. FIG. 14 likewise makes clear that the number of nodes necessary for replication is secured when a node is deleted.

Further, according to such a distributed data management system 1, in constructing the hierarchical hash ring network, the network configuration node information 146 shown in FIG. makes the network configuration within the hierarchical hash ring network explicit, and the network configuration information 147 makes explicit the configuration of the networks in which each node participates, so data can be accessed across layers. As a result, when the distributed data management system 1 performs processing such as writing data to the distributed database (FIG. 11), reading data, deleting data (FIG. 13), adding a node, or deleting a node (FIG. 14), the data can be accessed from any node constituting the distributed database and the processing can be executed. Thus, according to the distributed data management system 1, a distributed database that is disaster resistant and allows data access from any constituent node can be constructed by means of a hierarchical hash ring network, and the data stored in the distributed database can be managed.

(2) Second Embodiment A distributed data management system according to the second embodiment constructs and manages a distributed database based on a hierarchical hash ring network and, in particular, applies to the hierarchical hash ring network not only proximity in geospace but also its opposite, geographical remoteness, thereby constructing a hierarchical hash ring network suited to disaster recovery.

(2-1) Configuration of Distributed Data Management System The configuration of the distributed data management system 2 according to the second embodiment is the same as the configuration of the distributed data management system 1 shown in FIG. In the distributed data management system 2, one or more replicas of the data are created for each layer, and the number of layers is set according to the number of replicas. In this embodiment, as an example, the hierarchical hash ring network has four layers. Therefore, the number of replicas in the hierarchical hash ring network managed by the distributed data management system 2 is at least four.

(2-2) Construction of Hierarchical Hash Ring Network FIG. 15 is a flowchart showing a processing procedure for constructing a hierarchical hash ring network in the distributed data management system according to the second embodiment. With reference to FIG. 15, a processing procedure for constructing a hierarchical hash ring network in which nodes A to M participate in the distributed data management system 2 according to the second embodiment will be described.

First, the processing in which the configuration information creation unit 112 derives the hash values of nodes A to M (step S401), creates the 0th-layer network based on the derived hash values (step S402), and creates the lowest-layer (third-layer) networks (step S403) is the same as the processing of steps S101 to S103 shown in FIG.

Next, the configuration information creation unit 112 creates m layers (where m is a variable) of neighborhood networks based on proximity in geospace (steps S404 to S410). Here, m = 1. In step S404, the configuration information creation unit 112 clusters the nodes 20 assigned to each layer on the basis of the geospatial policy for that layer.

FIG. 16 is a table showing an example of the clustering policy used in the processing shown in FIG. 15. In the construction processing of the hierarchical hash ring network shown in FIG. 15, the policy table 149 shown in FIG. 16 is applied. The policy table 149 shown in FIG. 16 is composed of a layer column 149A, in which the layer number of the network is described, and a policy (geospace) column 149B, in which a policy based on geospace is described.

The clustering policy shown in FIG. 16 differs from the clustering policy of the first embodiment shown in FIG. 5 in that it has no policy based on the network space and in the content of the first-layer policy (geospace). According to the policy table 149 shown in FIG. 16, the first layer is clustered under the condition “the network farthest from the second layer network” in addition to the condition “clustering by municipality”. That is, in the construction processing of the hierarchical hash ring network shown in FIG. 15, neighborhood networks clustered on the basis of geospatial proximity are formed in the second layer, and networks that are formed on the basis of geospatial proximity and are also remote from the second-layer neighborhood networks (distant neighborhood networks) are formed in the first layer. Note that the policy table 149 shown in FIG. 16 may also be configured to have a policy in the network space, like the policy (network space) column 144C of the policy table 144 shown in FIG.

Next, in step S405, the configuration information creation unit 112 associates each cluster with the neighborhood networks created in step S404 (including the distant neighborhood networks). Specifically, based on the clustering policy described in the policy table 149 of FIG. 16, the configuration information creation unit 112 associates with each second-layer hash ring network the cluster best suited to disaster recovery, namely the cluster at the maximum distance from that second-layer hash ring network.

In step S405, the method for associating the hash ring networks across layers is not limited to a particular one, and the same applies when making associations for disaster recovery in particular. In addition, to associate a first-layer hash ring network on the basis of its remoteness from a second-layer hash ring network, the distance between hash ring networks in different layers may have to be determined. One way to obtain this distance is, for example, to derive the distance between clusters using a coordinate defined uniformly for each cluster. Here, the coordinate defined uniformly for each cluster corresponds, for example, to a predetermined position (a vertex coordinate, the center coordinate, or the like) of the cluster's rectangular area.
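One way to picture the distance-based association is the sketch below, which represents each cluster by the center of its rectangular area and picks the candidate farthest from a given second-layer cluster. It is an illustration only: the planar coordinates, the Euclidean distance, the rectangle values, and the names `center` and `farthest_cluster` are assumptions, and the description does not prescribe a particular distance measure.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]  # two opposite corners of a cluster's rectangular area


def center(rect: Rect) -> Point:
    """Representative coordinate of a cluster: the center of its rectangle."""
    (x1, y1), (x2, y2) = rect
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def farthest_cluster(base: Rect, candidates: Dict[str, Rect]) -> str:
    """Return the name of the candidate cluster whose center is farthest
    from the center of the base (second-layer) cluster."""
    bx, by = center(base)

    def distance(name: str) -> float:
        cx, cy = center(candidates[name])
        return math.hypot(cx - bx, cy - by)

    return max(candidates, key=distance)


# Hypothetical rectangles in an arbitrary planar coordinate system.
second_layer_cluster = ((0.0, 0.0), (2.0, 2.0))
first_layer_candidates = {
    "near": ((3.0, 0.0), (5.0, 2.0)),
    "far": ((40.0, 30.0), (42.0, 32.0)),
}
chosen = farthest_cluster(second_layer_cluster, first_layer_candidates)
```

A vertex coordinate could be used instead of the center by changing `center`. For real geographic coordinates a great-circle distance would be a more suitable measure than the planar one assumed here.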

Next, the configuration information creation unit 112 refers to the metric table 141 and, so that nodes with low-performance computing resources are kept out of the lower-layer hash ring networks as far as possible, determines the allowable lowest layer for each node 20, excluding the lowest-layer network that contains only the node itself (step S406). The method example described for step S106 of FIG. 4 can be applied as a specific method for determining the allowable lowest layer, so a detailed description is omitted.

Thereafter, the processing in which the configuration information creation unit 112 adjusts the neighborhood networks (steps S407 and S408), compares the number of nodes included in the upper-layer network with the number of nodes included in the lower-layer network (step S409), and corrects the adjustment value of the network as necessary (step S410) is the same as the processing shown in steps S107 to S110 of FIG.

Finally, the configuration information creation unit 112 generates a configuration table for the hierarchical network generated by the processing in steps S401 to S410, and the configuration information transfer control unit 113 transfers the network configuration information based on the configuration table to the table management unit 214 of each node (step S411). Since the processing of step S411 is the same as the processing shown in step S111 of FIG. 4, a detailed description is omitted. Through the processing in steps S401 to S411, the distributed data management system 2 builds a distributed database using a hierarchical hash ring network.

(2-3) Structure of Hierarchical Hash Ring Network FIG. 17 is a schematic diagram illustrating a configuration example of a hierarchical hash ring network according to the second embodiment. FIG. 18 is an example of a table showing network configuration node information in the hierarchical hash ring network shown in FIG. 17. The network configuration node information 150 shown in FIG. 18 has the same table items as the network configuration node information 146 shown in FIG. 9 (node column 150A, node hash column 150B, allowable lowest layer network column 150C, neighboring node column 150D, and address column 150E) and, in addition, a remote node column 150F, which lists the nodes that are geospatially far from the node described in the node column 150A.
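As a rough picture of one row of the network configuration node information 150, the columns 150A to 150F can be modeled as a small record type. This is a sketch only: the class name, the field names, and all sample values (hash placeholder, address, neighbor and remote node lists) are hypothetical and are not taken from FIG. 18.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NetworkConfigNodeInfo:
    node: str                            # column 150A
    node_hash: str                       # column 150B
    allowable_lowest_layer_network: str  # column 150C
    neighboring_nodes: List[str] = field(default_factory=list)  # column 150D
    address: str = ""                    # column 150E
    remote_nodes: List[str] = field(default_factory=list)       # column 150F


# Illustrative row; every value below is a placeholder, not data from FIG. 18.
row_a = NetworkConfigNodeInfo(
    node="A",
    node_hash="<hash of node A>",
    allowable_lowest_layer_network="R0000",
    neighboring_nodes=["B", "C"],
    address="192.0.2.1",
    remote_nodes=["H"],
)
```

Keeping the remote nodes in their own field, separate from the neighboring nodes, reflects the only structural difference from the first embodiment's table that the description mentions.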

Hereinafter, the structure of the hierarchical hash ring network shown in FIG. 17 will be described for each hierarchy.

In the 0th layer of the hierarchical hash ring network shown in FIG. 17, one network “R0” including all nodes (nodes A to M) is generated. In the network “R0”, nodes A to M are arranged in a ring on the basis of hash values derived from the IDs of nodes A to M using a hash function (for example, MD5). Here, FIG. 19 is a table showing an example of network configuration information. FIG. 19 shows the network configuration information that the configuration information transfer control unit 113 transfers to the table management unit 214 of node A in step S411 of FIG. 15 after the configuration information creation unit 112 generates the hierarchical hash ring network shown in FIG. The table configuration of the network configuration information 151 illustrated in FIG. 19 is the same as that of the network configuration information 147 illustrated in FIG. 10, and describes a network name, a node hash value, and an address. From the network configuration information 151 of FIG. 19, the networks that include node A (specifically, the networks named “R0”, “R00”, “R000”, and “R0000”) and the node hash values and addresses of the nodes participating in those networks can be read.
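The ring arrangement by MD5 hash can be sketched as follows. This is a minimal illustration, not the embodiment's procedure: it assumes node IDs are the plain strings "A" to "M", and it assumes the conventional consistent-hashing rule that a datum is stored on the first node whose hash is greater than or equal to the datum's hash, wrapping around the ring. The function names are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple


def md5_hex(text: str) -> str:
    """MD5 digest of a string as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_ring(node_ids: List[str]) -> List[Tuple[str, str]]:
    """Order nodes around the ring by the MD5 hash of their IDs."""
    return sorted((md5_hex(node_id), node_id) for node_id in node_ids)


def responsible_node(ring: List[Tuple[str, str]], data_key: str) -> str:
    """Assumed rule: first node clockwise from the data hash (with wrap-around)."""
    hashes = [h for h, _ in ring]
    index = bisect_left(hashes, md5_hex(data_key))
    return ring[index % len(ring)][1]


# Ring "R0" made of nodes A to M (node IDs assumed to be single letters).
ring_r0 = build_ring([chr(c) for c in range(ord("A"), ord("M") + 1)])
owner = responsible_node(ring_r0, "q")
```

Equal-length lowercase hex digests sort in the same order as the numbers they encode, so comparing the strings directly is sufficient here. The description itself cites “7694f4a66316e53c8cdd9d9954bd611d” as the MD5 hash value for “q”, which is what `md5_hex("q")` computes.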

Also, in the second layer of the hierarchical hash ring network shown in FIG. 17, neighborhood networks “R000” and “R100”, made up of nodes in nearby areas within a narrow geographical range, are generated. The network “R000” is a second-layer network including nodes A, B, and C, and the network “R100” is a second-layer network including nodes D, E, and G. Here, nodes D, I, J, and L are assumed not to belong to any second-layer network, because the node has low performance or does not conform to the clustering policy described in the policy table 149 of FIG. 16.

In addition, in the first layer of the hierarchical hash ring network shown in FIG. 17, distant neighborhood networks “R00” and “R10”, each made up of a remote area within a wide geographical range relative to the second-layer network “R000” or “R100”, are generated. The network “R00” is a first-layer network including nodes H, I, J, and K, and the network “R10” is a first-layer network including nodes L and M.

Further, in the third layer (lowermost layer) of the hierarchical hash ring network shown in FIG. 17, networks “R0000” to “R1101” including only their own nodes are generated for the nodes A to M.

As described above, in the hierarchical hash ring network generated in the distributed data management system 2, the second-layer networks, which are neighborhood networks, are associated with the first-layer networks, which are distant neighborhood networks remote from the second-layer networks, so that data can be stored at geospatially remote locations.

In the distributed data management system 2 according to the second embodiment, the methods for writing, reading, and deleting data in the distributed database based on the hierarchical hash ring network, for adding a node, for deleting a node, and for recovering the data of a deleted node can use the same processing as described for the distributed data management system 1 according to the first embodiment, so their description is omitted.

(2-4) Effects According to this Embodiment According to the distributed data management system 2 described above, in constructing the hierarchical hash ring network, a neighborhood network configured on the basis of geospatial proximity (for example, the second-layer network in the hierarchical hash ring network shown in FIG. 17) is associated with a distant neighborhood network that is geospatially remote (for example, the first-layer network in the hierarchical hash ring network shown in FIG. 17). As a result, data can be stored at geospatially remote locations. Thus, the distributed data management system 2 can construct a distributed database that is better suited to disaster recovery than that of the distributed data management system 1 according to the first embodiment, and can realize a distributed database that is more resilient in the event of a disaster.

More specifically, because the distributed data management system 2 has neighborhood networks based on geospatial proximity, like the second-layer networks, it can ensure disaster resistance in the same way as the distributed data management system 1 in the first embodiment. Furthermore, according to the distributed data management system 2, even if a large-scale disaster makes it impossible to connect to any of the nodes in one neighborhood network, the data stored in the nodes of that neighborhood network can still be reached, because a distant neighborhood network that is geospatially remote from that neighborhood network is associated with it. In addition, since the data is also stored in the remote nodes constituting the distant neighborhood network, a data protection effect that prevents the data from being lost in a large-scale disaster can be expected.

Further, such a distributed data management system 2 has the same configuration as the distributed data management system 1, and the same processing as the distributed data management system 1 is applied to the processing related to data access to the hierarchical hash ring network. Thus, the effect of the first embodiment can also be realized.

(3) Other Embodiments In the distributed data management systems 1 and 2 according to the first and second embodiments described above, the case where the core node 10 and the nodes 20A to 20N are explicitly separated has been described, but the present invention is not limited to this. For example, the function of the core node 10 may be placed in any of the nodes 20A to 20N, or the core node 10 itself may store data in the same manner as the node 20 and participate in the hierarchical hash ring network. In such a case, the distributed data management systems 1 and 2 according to the first or second embodiment can be realized without using a dedicated device for the core node 10, and an improvement in versatility can be expected.

In the distributed data management systems 1 and 2 according to the first and second embodiments described above, each of the nodes 20A to 20N is a single information processing apparatus, but the present invention is not limited to this. For example, each of the nodes 20A to 20N may be a data center made up of a plurality of nodes. In such a case, the metrics of each data center may list the node metrics of each of the nodes in the data center, as in the metric management tables 141 and 142 shown in FIG., may be a value rounded to the average of the metrics for each data center, or may indicate the minimum and maximum values of the metrics for each data center. In a distributed data management system configured in this way, a distributed database based on a hierarchical hash ring network can be built by incorporating a plurality of data centers, so more data and more nodes can be handled and large-scale distributed data management can be realized.
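The data-center-level metric options mentioned above (a rounded average, or a minimum and maximum) can be sketched as below. The function name, the choice of CPU usage rate as the metric, the rounding to one decimal place, and the sample values are all assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List


def summarize_datacenter(node_metrics: List[float]) -> Dict[str, float]:
    """Collapse the per-node metric values of one data center into a
    rounded average plus the minimum and maximum."""
    if not node_metrics:
        raise ValueError("a data center needs at least one node metric")
    return {
        "average": round(mean(node_metrics), 1),
        "min": min(node_metrics),
        "max": max(node_metrics),
    }


# Hypothetical CPU usage rates (percent) for the nodes in one data center.
summary = summarize_datacenter([20.0, 40.0, 90.0])
```

Reporting the minimum and maximum alongside the average keeps a single overloaded or idle node visible, which the average alone would hide.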

Further, in the distributed data management systems 1 and 2 according to the first and second embodiments described above, the metric management table 141, which acquires the current “usage rate” of internal components such as the CPU, has been described as an example of a dynamic metric, but the present invention is not limited to this. For example, a moving average generated from the latest usage-rate data and past data may be used as the metric, or past data may be referred to at the same time. In such a distributed data management system, using various types of metrics allows the metrics of the node 20 to be grasped in more detail, so a distributed database that reflects the specification metrics can be constructed on the basis of a more detailed analysis of the situation.
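A moving average over the latest and past usage-rate samples could look like the following sketch. The class name, the simple (unweighted) average, and the window size are assumptions; the description does not specify the kind of moving average.

```python
from collections import deque


class UsageMovingAverage:
    """Simple moving average over the most recent `window` usage-rate samples."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def add(self, usage_rate: float) -> float:
        """Record the latest sample and return the current moving average."""
        self._samples.append(usage_rate)
        return sum(self._samples) / len(self._samples)


# Hypothetical CPU usage rates (percent) sampled over time.
tracker = UsageMovingAverage(window=3)
history = [tracker.add(v) for v in (10.0, 20.0, 30.0, 40.0)]
```

Compared with the instantaneous usage rate, the averaged value changes more slowly, so a short spike is less likely to change which layer a node is allowed to join.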

In the distributed data management systems 1 and 2 according to the first and second embodiments described above, the configuration of the hierarchical hash ring network generated by the processing shown in FIG. may be provided to the administrator, in the manner of the administrator service setting screen 143, by displaying it on the display unit of the core node 10 or of an administrator terminal connected to the core node. In such a case, displaying the configuration of the hierarchical hash ring network as shown in FIG. 8 or FIG. 17 on the display unit lets the administrator grasp the structure of the distributed database visually, so the distributed data management system becomes easier to use.

Further, in the distributed data management systems 1 and 2 according to the first and second embodiments described above, the 0th-layer network “R0” illustrated in FIG. 8 is an example of a top overlay network composed of the full set of a plurality of nodes; the first-layer networks “R00” and “R10” and the second-layer networks “R000”, “R001”, and “R100” are examples of a sub-overlay network composed of a subset of the plurality of nodes; and the lowest-layer (third-layer) networks “R0000” to “R1101” are examples of a unit overlay network composed of a unit of the plurality of nodes. The hierarchical hash ring network constructed by the core node 10 is an example of a hierarchical overlay network in which a top overlay network, a sub-overlay network, and a unit overlay network are layered in that order. Further, the node hash value described in the node hash field 146B of the network configuration node information 146 illustrated in FIG. 9 is an example of a node number, that is, a number that is stochastically uniform over the set of the plurality of nodes. The MD5 hash value “7694f4a66316e53c8cdd9d9954bd611d” for “q” is an example of a data number, that is, a stochastically uniform number obtained from the data stored in a node.

Note that the present invention is not limited to the above-described embodiments and includes various modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to one having all the configurations described. Further, a part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. It is also possible to add, delete, or replace other configurations for a part of the configuration of each embodiment.

Information such as programs, tables, and files for realizing the above functions can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD. Further, the control lines and information lines shown are those considered necessary for the explanation, and not all the control lines and information lines of a product are necessarily shown. In practice, almost all components may be considered to be connected to each other.

1, 2 Distributed data management system
10 Core node
11 Configuration information management unit
111 Metric management unit
112 Configuration information creation unit
113 Configuration information transfer control unit
20 (20A to 20N) Node
21 Data control unit
212 Data write control unit
213 Data read control unit
214 Table management unit
22 Metric acquisition unit
234 Storage unit
235 CPU
236 Memory
30 Network

Claims

1. A distributed data management system comprising:
a plurality of nodes that are connected to one another over a network and store data; and
a core node that is connected to the plurality of nodes, constructs a hierarchical overlay network in which a top-level overlay network composed of the entire set of the plurality of nodes, a sub-overlay network composed of a subset of the plurality of nodes, and a unit overlay network composed of an individual node among the plurality of nodes are hierarchized in this order, and manages the data by means of the hierarchical overlay network,
wherein each of the plurality of nodes is assigned a node number that is a probabilistically uniform number across the set of the plurality of nodes,
each overlay network included in the hierarchical overlay network is a ring-type network in which the node that stores data is determined within the network on the basis of the node numbers assigned to the nodes constituting the overlay network and a data number that is a probabilistically uniform number obtained from the data to be stored,
the core node, when constructing the hierarchical overlay network,
configures the sub-overlay network from a subset of the plurality of nodes determined on the basis of distance in geographic space,
associates the layers with one another in the order of the top-level overlay network, the sub-overlay network, and the unit overlay network, and
generates, for each node constituting the hierarchical overlay network, network configuration information including information indicating all overlay networks that include the node and information on all nodes constituting those overlay networks, and transfers the network configuration information to each node,
the plurality of nodes hold the transferred network configuration information, and
the plurality of nodes, when writing data to the hierarchical overlay network, replicate the data in each overlay network associated with the unit overlay network to which the data has been written.
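As an informal illustration that is not part of the claims, the ring-type placement and the per-layer replication recited above can be sketched as follows. The hash function (SHA-1 truncated to 32 bits), the rule that data goes to the first node whose node number is not smaller than the data number, and all node and overlay names are assumptions made only for this sketch.

```python
import hashlib
from bisect import bisect_left


def uniform_number(key: str) -> int:
    """Map a key to a 32-bit number (SHA-1 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class RingOverlay:
    """One ring-type overlay network: nodes ordered by their node numbers."""

    def __init__(self, name, node_names):
        self.name = name
        self.ring = sorted((uniform_number(n), n) for n in node_names)
        self.numbers = [number for number, _ in self.ring]

    def responsible_node(self, data_key):
        # Assumed rule: first node number >= data number, wrapping around the ring.
        idx = bisect_left(self.numbers, uniform_number(data_key)) % len(self.ring)
        return self.ring[idx][1]


def replicate_on_write(data_key, hierarchy):
    """Return the storing node in every overlay associated with the writer's unit overlay."""
    return {overlay.name: overlay.responsible_node(data_key) for overlay in hierarchy}


# Hypothetical hierarchy for a writer node "tokyo-1": unit, sub, and top-level overlays.
nodes = ["tokyo-1", "tokyo-2", "osaka-1", "osaka-2"]
hierarchy = [
    RingOverlay("unit", ["tokyo-1"]),
    RingOverlay("sub-kanto", ["tokyo-1", "tokyo-2"]),
    RingOverlay("top", nodes),
]
placements = replicate_on_write("file-42", hierarchy)
```

Because every node holds the configuration of each overlay it belongs to, any node can compute the same placement locally without asking the core node.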
2. The distributed data management system according to claim 1, wherein each of the plurality of nodes acquires a predetermined parameter value indicating the performance of that node, and
the core node collects the parameter values acquired by the plurality of nodes and, when constructing the hierarchical overlay network, restricts, on the basis of the collected parameter values, the layers of the sub-overlay network in which each node can participate.
3. The distributed data management system according to claim 2, wherein the core node evaluates a node assigned redundantly to sub-overlay networks in the same layer, and eliminates the redundant assignment of the node when the evaluation result does not permit the duplication.
4. The distributed data management system according to claim 3, wherein the core node evaluates the spatial distance, with respect to the node numbers, between the nodes constituting the sub-overlay network, and, if the spatial distance is equal to or greater than a predetermined threshold, adds to the sub-overlay network a node assigned to the sub-overlay network or the top-level overlay network of a higher layer associated with that sub-overlay network.
5. The distributed data management system according to claim 4, wherein the hierarchical overlay network is constructed such that, between the layers of associated overlay networks, the number of nodes constituting the upper-layer overlay network exceeds the number of nodes constituting the lower-layer overlay network by two or more.
6. The distributed data management system according to claim 5, wherein the core node constructs the hierarchical overlay network so as to include a first sub-overlay network composed of a subset of the plurality of nodes determined on the basis of proximity in geographic space, and a second sub-overlay network associated with a layer above the first sub-overlay network, and
the nodes constituting the second sub-overlay network are a subset of the plurality of nodes that is determined on the basis of proximity in geographic space and that is located at a predetermined distance or more from the first sub-overlay network in geographic space.
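As an informal illustration that is not part of the claims, choosing a nearby subset and a distant subset by geographic distance can be sketched as follows. The haversine formula, the thresholds, the approximate coordinates, and the function names are all assumptions of this sketch; for brevity, the distant subset is selected only by its separation from the first subset, and the mutual proximity of its members is not checked.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def near_subset(positions, anchor, radius_km):
    """Nodes within radius_km of the anchor node (candidate first sub-overlay)."""
    return sorted(n for n, p in positions.items()
                  if haversine_km(positions[anchor], p) <= radius_km)


def far_subset(positions, first_subset, min_km):
    """Nodes at least min_km away from every node of the first subset."""
    return sorted(n for n, p in positions.items()
                  if all(haversine_km(p, positions[m]) >= min_km for m in first_subset))


# Hypothetical node sites with approximate coordinates.
positions = {
    "tokyo": (35.68, 139.77),
    "yokohama": (35.44, 139.64),
    "osaka": (34.69, 135.50),
    "kobe": (34.69, 135.20),
}
first = near_subset(positions, "tokyo", 50)
second = far_subset(positions, first, 300)
```

Keeping the second subset far from the first is what gives the replicas tolerance against a disaster confined to one region.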
7. The distributed data management system, wherein, when writing data to the sub-overlay network or the top-level overlay network, if the same data has already been written to a first node determined on the basis of the node numbers and the data number, the plurality of nodes replicate the data to a node that is uniquely determined from the determined first node and to which the data has not been written.
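As an informal illustration that is not part of the claims, the fallback to a node "uniquely determined from the first node" can be sketched as a clockwise walk around the ring. Using the next node in node-number order as the uniquely determined node is an assumption of this sketch, as are the function and variable names.

```python
def choose_replica_node(ring_nodes, first_index, holders):
    """Pick the node that should receive a replica.

    ring_nodes:  node names ordered by node number (the ring).
    first_index: index of the first node determined from the data number.
    holders:     set of nodes that already store the data.
    Returns the first node, walking clockwise from first_index, that does not
    yet hold the data, or None if every node already holds it.
    """
    for step in range(len(ring_nodes)):
        candidate = ring_nodes[(first_index + step) % len(ring_nodes)]
        if candidate not in holders:
            return candidate
    return None


ring = ["a", "b", "c", "d"]
target = choose_replica_node(ring, 1, {"b"})  # "b" already holds the data
```

Since the walk depends only on the ring order and the first node, every node that knows the network configuration arrives at the same replica location.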
8. The distributed data management system according to claim 7, wherein, when writing data to the hierarchical overlay network, the plurality of nodes replicate the data, at least in the top-level overlay network, to a node that is uniquely determined from the first node determined on the basis of the node numbers and the data number and to which the data has not been written.
9. The distributed data management system, wherein, when replicating data, the plurality of nodes include, in metadata, information indicating the location of the node storing the primary data of that data, and write the metadata to the replication destination node.
10. The distributed data management system according to claim 9, wherein, when data is deleted from the hierarchical overlay network, a node requesting the deletion refers to the metadata to find the location of the node storing the primary data of the data to be deleted, and requests the node storing the primary data to delete the data.
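As an informal illustration that is not part of the claims, recording the primary location in replica metadata and routing a deletion to the primary can be sketched as follows. The record layout, the list of replica locations kept at the primary, and the propagation of the deletion from the primary to its replicas are assumptions of this sketch; the claims state only that the deletion is requested of the node storing the primary data.

```python
def write_primary(stores, node, key, value):
    """Store primary data; the primary also tracks its replicas (assumption)."""
    stores[node][key] = {"value": value, "meta": {"primary": node, "replicas": []}}


def replicate(stores, source, destination, key):
    """Copy data to destination, carrying the primary's location in the metadata."""
    record = stores[source][key]
    primary = record["meta"]["primary"]
    stores[destination][key] = {"value": record["value"], "meta": {"primary": primary}}
    stores[primary][key]["meta"]["replicas"].append(destination)


def request_delete(stores, requester, key):
    """Look up the primary via the requester's metadata and delete through it."""
    primary = stores[requester][key]["meta"]["primary"]
    replicas = stores[primary][key]["meta"]["replicas"]
    for node in [primary, *replicas]:
        stores[node].pop(key, None)
    return primary


# Hypothetical three-node store: n1 holds the primary, n2 and n3 hold replicas.
stores = {"n1": {}, "n2": {}, "n3": {}}
write_primary(stores, "n1", "doc", b"payload")
replicate(stores, "n1", "n2", "doc")
replicate(stores, "n2", "n3", "doc")
handled_by = request_delete(stores, "n3", "doc")
```

Routing every deletion through the primary gives a single point of decision per data item, so replicas in different overlay networks do not need to coordinate among themselves.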
11. The distributed data management system according to claim 10, wherein, when a node constituting the hierarchical overlay network is deleted, the plurality of nodes, in each overlay network associated with the node to be deleted, move or replicate the data stored in the node to be deleted to another node of that overlay network, from a node that is included in the overlay network associated as the higher layer of that overlay network and that is obtained on the basis of the data number corresponding to the data.
12. A program for a computer in a core node that is connected to a plurality of nodes storing data, constructs a hierarchical overlay network in which a top-level overlay network composed of the entire set of the plurality of nodes, a sub-overlay network composed of a subset of the plurality of nodes, and a unit overlay network composed of an individual node among the plurality of nodes are hierarchized in this order, and manages the data by means of the hierarchical overlay network, wherein
each of the plurality of nodes is assigned a node number that is a probabilistically uniform number across the set of the plurality of nodes,
each overlay network included in the hierarchical overlay network is a ring-type network in which the node that stores data is determined within the network on the basis of the node numbers assigned to the nodes constituting the overlay network and a data number that is a probabilistically uniform number obtained from the data to be stored, and
the program causes the computer in the core node to execute,
when constructing the hierarchical overlay network:
a step of configuring the sub-overlay network from a subset of the plurality of nodes determined on the basis of distance in geographic space;
a step of associating the layers with one another in the order of the top-level overlay network, the sub-overlay network, and the unit overlay network;
a step of generating, for each node constituting the hierarchical overlay network, network configuration information including information indicating all overlay networks that include the node and information on all nodes constituting those overlay networks; and
a step of transferring the generated network configuration information to each node.