WO2012115194A1 - Distributed data base system and data structure for distributed data base - Google Patents

Distributed data base system and data structure for distributed data base

Info

Publication number
WO2012115194A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
registration request
unit
dsn
crx
Prior art date
Application number
PCT/JP2012/054432
Other languages
French (fr)
Japanese (ja)
Inventor
Koji Ito
Satoshi Kimura
Yohei Hizume
Original Assignee
Digital Works Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Works Co., Ltd.
Priority to CN2012800097453A
Priority to EP12749629.7A
Priority to US14/001,284
Publication of WO2012115194A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • the present invention relates to a distributed database system that includes a master node that collectively manages a plurality of slave nodes, and that stores key values in a plurality of slave nodes in a distributed manner, and a data structure of the distributed database.
  • a distributed database that stores data in a distributed manner includes a master node that centrally manages a plurality of slave nodes, and stores key value data in the plurality of slave nodes in a distributed manner.
  • Cited Document 1 discloses a database management apparatus that stores data in a distributed manner by combining horizontal and vertical distribution.
  • the database management device includes a plurality of database devices having a hash function calculation unit and a distribution function calculation unit, and a global server having a load information collection unit and an access right management unit.
  • the global server statistically processes the load information to determine the access right including the database device with the lowest load and the access period to the database device. Access to the plurality of database devices is permitted based on the access right determined by the global server.
  • in the conventional technique, key values are distributed and stored without considering whether or not any of the key values have the same value.
  • the time delay of the communication that occurs between multiple nodes when key values having the same value, distributed and stored across those nodes, must reference one another becomes a bottleneck, and it becomes difficult to efficiently improve the processing capacity of the entire system.
  • the present invention has been made to solve the above-described problems, and an object of the present invention is to efficiently improve the processing capacity of the entire distributed data system.
  • a key value that is a real value and a key value identifier (NID), which takes a unique value within the range of the data type of the key value across the entire distributed database, are distributed and stored across the plurality of slave nodes.
  • DSN distributed shared NID
  • each of the plurality of slave nodes performs data operations such as join operations in parallel based on commands from the master node, operating so as to use, instead of the key value that is the actual value, the key value identifier (NID) obtained by referring to the DSN distributedly stored in its own node.
  • NID key value identifier
  • the key value identifier (NID) takes a unique value within the range of the data type of the key value in the entire distributed database. That is, if the key value is the same, the same key value identifier (NID) value is taken.
  • the storage location of the distributed shared NID (DSN) data as information about the key value is determined based on the key value. Then, information on the same key value is collected in the same slave node.
  • information related to the same key value is intentionally distributed and stored so as to be aggregated in the same slave node. For this reason, in contrast to the conventional example in which key values having the same value are randomly distributed and stored across multiple slave nodes, no communication between slave nodes for mutually referring to key values having the same value occurs when, for example, a slave node performs a data operation such as a join operation, and the processing overhead of the entire system is suppressed.
  • the processing capability of the entire distributed database system can be improved efficiently.
  • D-CRX distributed compression and decompression index
  • D-CRS distributed compression result set cache
  • D-RIX distributed row identification index
  • FIG. 8 is a diagram showing an example of the D-RIX for the internal table shown in FIG. 7, and a diagram showing an RID comparison table that shows the correspondence between external table RIDs and internal table RIDs.
  • FIG. 1 is a configuration diagram showing an overview of a distributed database system according to an embodiment of the present invention.
  • the distributed relational database (hereinafter, “relational database” is abbreviated as “RDB” and “database” as “DB”) system 11 according to this embodiment basically includes a master node 13 and first to third slave nodes 15, 17, and 19 connected via a first communication network 21.
  • the master node 13 centrally manages the plurality of slave nodes 15, 17, and 19.
  • Nodes 13, 15, 17, and 19 are computers having an information processing function.
  • a plurality of client terminals 25 a, 25 b and 25 c are connected to the master node 13 via the second communication network 23.
  • when the master node 13 accepts a key value registration request or a data operation request such as a table join operation issued from any one of the plurality of client terminals 25a, 25b, and 25c, the master node 13 executes processing according to the request in cooperation with the first to third slave nodes 15, 17, and 19, and returns the obtained processing result as a response to the requesting client terminal.
  • the master node 13 has a master data storage unit 13a for storing master data.
  • the master data includes DB metadata and DB management data.
  • the DB metadata includes a physical configuration table regarding where and how many slave nodes 15, 17, and 19 are installed, a configuration table of table attributes, and the like.
  • the DB management data includes sharing management data such as the latest shared NID described later.
  • the master node 13 only manages the distributed storage, in the first to third slave nodes 15, 17, and 19, of the key values that are the original management targets and of the information about the key values used to identify them; neither the master node 13 itself nor the master data storage unit 13a holds the key values or the information about the key values.
  • Each of the first to third slave nodes 15, 17, and 19 has first to third local data storage units 15a, 17a, and 19a for storing first to third local data, respectively.
  • the configurations of the first to third slave nodes 15, 17, 19 and of the first to third local data storage units 15a, 17a, 19a are identical to one another. Therefore, in order to avoid duplication of description, the first slave node 15, the first local data, and the first local data storage unit 15a will be described below as representatives, and the description of the second and third slave nodes 17 and 19, the second and third local data, and the second and third local data storage units 17a and 19a is replaced by that description.
  • Each of the first to third local data storage units 15a, 17a, and 19a corresponds to a DSN storage unit, a D-CRX storage unit, a D-CRS storage unit, and a D-RIX storage unit.
  • the first local data includes four types of index data. That is, a first distributed shared NID (hereinafter referred to as “DSN”), a first distributed compression and decompression index (hereinafter referred to as “D-CRX”), a first distributed compression result set cache (hereinafter referred to as “D-CRS”). And a first distributed row identification index (hereinafter referred to as “D-RIX”). These will be described in detail later.
  • DSN first distributed shared NID
  • D-CRX first distributed compression and decompression index
  • D-CRS first distributed compression result set cache
  • D-RIX first distributed row identification index
  • FIG. 1 Example of sales management table transaction in multiple slave nodes
  • An example of the transaction of the sales management table in the plurality of slave nodes 15, 17, and 19 generated by the distributed RDB system 11 according to the embodiment of the present invention is described.
  • 2A and 2B are tables showing an example of a transaction of the sales management table in the plurality of slave nodes 15, 17, and 19.
  • when table data in which tuples (rows) and columns are arranged two-dimensionally as shown in FIG. 2A and FIG. 2B is input, the distributed RDB system 11 creates the four types of index data of its distributed relational data model.
  • the input line numbers shown in FIGS. 2A and 2B are signs for uniquely identifying the input lines, and are assigned sequentially from the first in ascending order. This input line number is a mark given by the master node 13 and is not given to actual input data.
  • the column item names are displayed in the first row of the table shown in FIGS. 2A and 2B.
  • the data type is, for example, a character string type, a numeric type, a date type, etc.
  • 1 to 15 are assigned as input row numbers to the respective tuples.
  • the RID corresponds to the “distributed row identifier having a unique value for each column in the table constituting the distributed database” described in the claims. Its value is equal to the input row number.
  • the attribute value of each tuple is entered in the lower column of each tuple.
  • the attribute value of each tuple is entered for convenience of explanation and is not included in the actual input data.
  • FIG. 2A and FIG. 2B also show: a key value identifier (hereinafter referred to as “NID”) issued by referring to the DSN data with the key value as an input; the storage destination node number of the DSN data determined by the consistent hash method (hereinafter referred to as “DSN-CNN”); the storage destination node number of the D-CRX data determined by the consistent hash method (hereinafter referred to as “D-CRX-CNN”); and the D-CRX block number (hereinafter referred to as “D-CRX-BN”) used to determine the D-CRX-CNN.
  • NID key value identifier issued by referring to the DSN data with the key value as an input
  • DSN-CNN Storage destination node number
  • D-CRX-CNN storage destination node number of D-CRX data determined by the consistent hash method
  • D-CRX-BF a blocking coefficient
  • in this example, the value of D-CRX-BF used to determine D-CRX-BN is “7”, and the number of storage destination nodes is three, corresponding to the first to third slave nodes 15, 17, and 19.
  • a symbol {i, j, k} in which elements are enclosed by braces { } indicates a set having i, j, and k as elements.
  • NID is an indicator for uniquely identifying a key value. This NID is assigned to each key value so that it takes a unique value within the range of the data type of the key value related to the registration request in the entire distributed database.
  • the same NID is assigned to key values having the same value. This will be verified in the tables shown in FIGS. 2A and 2B.
  • in the example of the tuple with input row number “1” in FIG. 2A, “2” is assigned as the NID to “Hokkaido Tohoku”, the key value entered in the region name column.
  • in the tuple with input row number “5”, the same NID “2” is assigned to the same key value “Hokkaido Tohoku”.
  • in the tables, key values appearing for the second or later time, key values appearing for the first time, and key values corresponding to neither can be distinguished at a glance, because the two kinds of boxes are given different shading.
  • as a distribution method for distributing the information about key values (DSN, D-CRX, D-CRS, D-RIX) to the plurality of slave nodes 15, 17, and 19, a known consistent hash method, for example, can be adopted.
  • the method is not limited to the consistent hash method as long as the information about the key value can be redistributed at a sufficiently small cost when the number of slave nodes (storage nodes) increases or decreases.
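As one concrete illustration of such a distribution method (a minimal sketch, not the patent's exact algorithm; the node labels and the choice of hash are assumptions), a hash of the key value can select the storage destination node, so that equal key values always land on the same slave node:

```python
import hashlib

NODES = ["a", "b", "c"]  # storage destination node numbers (the values a to c in the text)

def storage_node(key_value: str) -> str:
    """Pick a storage destination node from a hash of the key value.

    Equal key values always hash to the same node, so all information
    about one key value is aggregated on one slave node. (Simple modulo
    hashing is shown here; the patent allows any scheme, such as
    consistent hashing, that redistributes cheaply when nodes are
    added or removed.)
    """
    digest = hashlib.md5(key_value.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Equal key values map to the same node, regardless of which tuple they came from.
assert storage_node("Hokkaido Tohoku") == storage_node("Hokkaido Tohoku")
```

With modulo hashing, adding a node would reshuffle most keys; that is exactly why the text prefers consistent hashing, which moves only a small fraction of keys when the node count changes.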
  • FIG. 3A shows an example of the DSN index data.
  • the DSN is an index in which NIDs are distributed and stored in a plurality of slave nodes 15, 17, and 19 by a consistent hash method using a key value as a distributed key.
  • the DSN is referenced when an NID is obtained for a key value related to a registration request.
  • the DSN is an index related to the correspondence between the key value related to the registration request and the NID assigned to the key value.
  • the DSN is stored separately for each data type of the key value related to the registration request. According to the index of the DSN, the corresponding NID can be obtained by inputting the key value.
  • the DSN is generated according to the following rules as shown in FIG. 3A.
  • a common NID is given to the same set of key values of the same data type within the range of the entire distributed database.
  • the key value and NID pair are distributed and stored in a plurality of slave nodes 15, 17, and 19 (DSN-CNN values a to c) by a consistent hash method using the key value as a distributed key.
  • the management unit of the DSN is the entire distributed database.
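The DSN rules above can be sketched as a toy in-memory model (hypothetical class and method names; the real DSN is partitioned across slave nodes by hashing the key value, which this single-process sketch ignores):

```python
class DSN:
    """Sketch of the distributed shared NID index.

    Stores (key value -> NID) pairs, partitioned per data type,
    assigning a fresh NID only when the key value does not yet exist
    (the "existing confirmation" step in the text).
    """

    def __init__(self):
        self.partitions = {}       # data type -> {key value: NID}
        self.latest_shared_nid = 0

    def register(self, data_type: str, key_value: str) -> int:
        part = self.partitions.setdefault(data_type, {})
        if key_value not in part:            # key value does not exist yet
            self.latest_shared_nid += 1      # next shared NID = latest + 1
            part[key_value] = self.latest_shared_nid
        return part[key_value]

dsn = DSN()
nid1 = dsn.register("string", "Hokkaido Tohoku")
nid2 = dsn.register("string", "Hokkaido Tohoku")
assert nid1 == nid2  # same key value of the same data type -> same common NID
```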
  • FIG. 3B shows an example of D-CRX in the case of extracting three columns of area name, price, and order date.
  • the D-CRX is an index in which NIDs are distributedly stored in a plurality of slave nodes 15, 17, and 19 by a consistent hash method using an NID function (including NID itself) as a distributed key. This is used when examining the corresponding NID at the time of searching, or when converting the NID back to the key value.
  • the D-CRX is an index relating to a one-to-one correspondence between a key value related to a registration request and an NID assigned to the key value.
  • the D-CRX is stored separately for each column to which the key value related to the registration request belongs.
  • a corresponding key value can be obtained by using NID as an input, and a corresponding NID can be obtained by using the key value as an input.
  • the difference between the DSN and the D-CRX is that the key value is converted to the NID in one direction only in the DSN, whereas the key value and the NID are converted bidirectionally in the D-CRX.
  • a corresponding NID set can be obtained (value range search) by inputting the value range (start price and end price) of the key value. This value range search will be described later.
  • the D-CRX is generated according to the following rules.
  • a one-to-one correspondence between key values and NIDs is given using the same column of the same table in the distributed database as a management unit.
  • the block number (D-CRX-BN) is a quotient obtained by dividing the NID by the blocking coefficient (D-CRX-BF). From the above equation, it can be said that D-CRX-BN is a function of NID.
  • D-CRX-BF is a constant and takes a value of an arbitrary positive integer (in this example, “7”). Thus, D-CRX-BN takes a positive integer value.
  • D-CRX-BN NID function
  • pairs of an NID and a key value (in one-to-one correspondence) are distributed and stored in the plurality of slave nodes 15, 17, 19 (storage destination nodes; D-CRX-CNN values a to c).
  • the management unit of D-CRX is a column.
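A minimal sketch of the D-CRX bookkeeping under the definitions above (illustrative names; the blocking coefficient value “7” follows the example in the text):

```python
D_CRX_BF = 7  # blocking coefficient (the text's example value)

def d_crx_block_number(nid: int) -> int:
    """D-CRX-BN is the quotient of the NID divided by the blocking
    coefficient D-CRX-BF, so it is a function of the NID."""
    return nid // D_CRX_BF

class DCRX:
    """Per-column bidirectional key value <-> NID index (sketch)."""

    def __init__(self):
        self.by_nid = {}   # NID -> key value
        self.by_key = {}   # key value -> NID

    def put(self, nid: int, key_value: str) -> None:
        # One-to-one correspondence, so both directions are kept.
        self.by_nid[nid] = key_value
        self.by_key[key_value] = nid

assert d_crx_block_number(13) == 1   # 13 // 7
assert d_crx_block_number(14) == 2   # 14 // 7
```

The block number, not the NID itself, is what the consistent hash method takes as the distribution key, so NIDs in the same block land on the same storage destination node.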
  • FIG. 3C shows an example of a D-CRS in the case of extracting three columns of area name, price, and order date.
  • the D-CRS is an index in which NIDs are distributedly stored in a plurality of slave nodes 15, 17, and 19 by a consistent hash method using a RID function (including RID itself) as a distribution key. This is used when creating an RID set as a search result, or when creating tuples as a join result.
  • the D-CRS is an index related to the correspondence between RID and NID that takes a unique value for each column in the table constituting the distributed database.
  • the D-CRS is stored separately for each column to which the key value related to the registration request belongs.
  • the correspondence between RIDs and NIDs is described on a one-to-one basis, and has a data structure that allows duplicate appearances of NIDs in the same column.
  • a corresponding NID can be obtained by using RID as an input.
  • a corresponding RID set (represented as {RID}) can be obtained using an NID set (represented as {NID}) as an input. That is, the RID set corresponding to the NID set can be obtained by repeating a full scan, in which the NID to be searched is checked against all the NIDs belonging to the column one by one, once for each element of the NID set.
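The full scan just described can be sketched as follows (a single-node toy; per the input row numbers in FIG. 2A and 2B, RIDs are taken to start at 1, and the NID values are made up for illustration):

```python
def rid_set_for(nid_set, column_nid_array):
    """Full scan over a column-unit NID array: for each NID in the
    input set, check every (RID, NID) entry of the column and collect
    the RIDs whose NID matches."""
    rids = set()
    for target in nid_set:
        for rid, nid in enumerate(column_nid_array, start=1):
            if nid == target:
                rids.add(rid)
    return rids

# Column-unit NID array for one column; duplicate NIDs are allowed in D-CRS.
column = [2, 5, 2, 7, 5]                     # entries for RIDs 1..5
assert rid_set_for({2}, column) == {1, 3}
assert rid_set_for({2, 5}, column) == {1, 2, 3, 5}
```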
  • the D-CRS is generated according to the following rules.
  • the NID corresponding to the RID is stored in the same column unit in the distributed database. This is called a column unit NID array.
  • the block number (D-CRS-BN) is a quotient obtained by dividing the RID by the blocking coefficient (D-CRS-BF). From the above formula, it can be said that D-CRS-BN is a function of RID.
  • D-CRS-BF is a constant and takes a value of an arbitrary positive integer (in this example, “7”). Therefore, D-CRS-BN takes a positive integer value.
  • column unit NID arrays are distributed and stored in the plurality of slave nodes 15, 17, 19 (D-CRS data storage destination nodes; D-CRS-CNN values a to c).
  • the management unit of D-CRS is a column.
  • FIG. 3D shows an example of D-RIX in the case of extracting three columns of area name, price, and order date.
  • D-RIX is an index in which NIDs are distributed and stored in a plurality of slave nodes 15, 17, and 19 by a consistent hash method using NID functions (including NIDs themselves) as distribution keys. This is used to look up the corresponding RIDs when searching for a key value in a data operation.
  • D-RIX is an index related to a one-to-N correspondence between NID and RID set.
  • This D-RIX is stored separately for each column to which the key value related to the registration request belongs.
  • the difference between D-CRS and D-RIX is that duplicate NIDs can occur in the same column in D-CRS, whereas duplicate NIDs cannot occur in the same column in D-RIX.
  • This difference is caused by using RID as a distribution key in D-CRS and using NID as a distribution key in D-RIX.
  • a corresponding RID set can be obtained using NID as an input, and a corresponding RID set can be obtained using NID set as an input.
  • the D-RIX is generated according to the following rules.
  • the correspondence between the NID and the RID set is given using the same column of the same table in the distributed database as a management unit.
  • the block number (D-RIX-BN) is a quotient obtained by dividing the NID by the blocking coefficient (D-RIX-BF). From the above equation, it can be said that D-RIX-BN is a function of NID.
  • D-RIX-BF is a constant and takes a value of an arbitrary positive integer (in this example, “7”). Therefore, D-RIX-BN takes a positive integer value.
  • D-RIX-BN NID function
  • pairs of an NID and an RID set (in 1-to-N correspondence) are distributed and stored in the plurality of slave nodes 15, 17, 19 (storage destination nodes; D-RIX-CNN values a to c).
  • the management unit of D-RIX is a column.
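For contrast with the D-CRS full scan, here is a sketch of the D-RIX lookup, where each NID maps directly to its RID set (illustrative names; the sample NID values are made up):

```python
class DRIX:
    """Per-column index from NID to its RID set (1-to-N, sketch).

    Unlike D-CRS, each NID appears at most once per column here,
    because the NID (not the RID) is the distribution key.
    """

    def __init__(self):
        self.rid_sets = {}  # NID -> set of RIDs

    def add(self, nid: int, rid: int) -> None:
        self.rid_sets.setdefault(nid, set()).add(rid)

    def lookup(self, nid_set) -> set:
        """Union of the RID sets of every NID in the input set
        (no per-row scan is needed, unlike the D-CRS full scan)."""
        rids = set()
        for nid in nid_set:
            rids |= self.rid_sets.get(nid, set())
        return rids

idx = DRIX()
for rid, nid in enumerate([2, 5, 2, 7, 5], start=1):  # same column as the D-CRS example
    idx.add(nid, rid)
assert idx.lookup({2}) == {1, 3}
assert idx.lookup({2, 5}) == {1, 2, 3, 5}
```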
  • FIG. 4 is a functional block diagram showing the internal configuration of the master node 13 and the first slave node 15.
  • the master node 13 includes a master reception unit 31, an NID allocation unit 33, an index generation unit 35, a node determination unit 37, a distributed overall management unit 39 consisting of a request issue unit 41 and a processing result integration unit 43, and an update management unit 45.
  • the master reception unit 31 corresponding to the registration request reception unit receives the key value and the data type information related to the registration request.
  • the key value registration request is generally input to the master reception unit 31 in units of tuples in which each key value and data type information are associated for each of a plurality of columns.
  • the key value registration request may be input in the form of table data composed of a set of a plurality of tuples.
  • the master reception unit 31 that has received input data in units of tuples processes, one at a time, each key value associated with one of the plurality of columns included in the tuple (hereinafter referred to as the “processing target key value”).
  • processing target key value a key value associated with any one of a plurality of columns included in the tuple
  • a pair of a processing target key value and its data type information is the minimum unit by which the sequential processing advances. For this reason, in the present embodiment, the description assumes that the master reception unit 31 has received, as this minimum unit, a set of a processing target key value and information on its data type.
  • the master reception unit 31 that has received input data in units of tuples assigns a common RID that takes a unique value for each column in the table to all key values belonging to the tuple.
  • for the RID, it is preferable to use natural numbers as ordinal numbers, as with the NID. This is because a unique RID can then be assigned for each column in the table by the very simple procedure of incrementing the latest value of the RID.
  • the master reception unit 31 accepts a key value registration request (including information on the data type) issued from any one of the plurality of client terminals 25a, 25b, and 25c, or a data operation request such as a table join operation. Further, the master reception unit 31 accepts an existing confirmation result, described later, transmitted from any slave node. The master reception unit 31 passes a key value registration request to the NID assignment unit 33, and passes a data operation request to the node determination unit 37.
  • the NID allocation unit 33 refers to the latest shared NID in the DB management data stored in the master data storage unit 13a and assigns a unique NID to the key value related to the registration request. Further, when the master reception unit 31 receives an existing confirmation result (information indicating whether or not the key value related to the registration request already exists in the storage destination node), the NID assignment unit 33 generates, based on that result, a latest shared NID update control signal indicating whether or not to update the latest shared NID, and sends it to the update management unit 45.
  • when an NID is assigned to the key value related to the registration request by the NID assigning unit 33, the index generation unit 35 generates DSN, D-CRX, D-CRS, and D-RIX data based on the key value related to the registration request, the data type of the key value, the NID assigned to the key value, and the RID assigned by the master node 13.
  • the DSN data means at least the minimum unit data (a pair of one key value and one NID) that is a component of the DSN.
  • D-CRX data means at least the minimum unit data (a pair of one key value and one NID) that is a component of the D-CRX.
  • D-CRS data means at least the minimum unit data (a pair of one RID and one NID) that is a component of the D-CRS.
  • D-RIX data means at least the minimum unit data (a pair of one NID and one RID set) that is a component of the D-RIX.
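The four minimum-unit records above can be written down as simple record types (the field names and types are illustrative, not from the patent):

```python
from dataclasses import dataclass, field

@dataclass
class DSNData:        # DSN: one key value paired with one NID
    key_value: str
    nid: int

@dataclass
class DCRXData:       # D-CRX: one key value paired with one NID, per column
    key_value: str
    nid: int

@dataclass
class DCRSData:       # D-CRS: one RID paired with one NID
    rid: int
    nid: int

@dataclass
class DRIXData:       # D-RIX: one NID paired with a set of RIDs
    nid: int
    rid_set: set = field(default_factory=set)
```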
  • the four types of index data generated in this way are referred to, for example, when a node is determined as described below, or when a data operation request such as a table join operation occurs.
  • the index generation unit 35 corresponds to a DSN generation unit, a D-CRX generation unit, a D-CRS generation unit, and a D-RIX generation unit.
  • the node determination unit 37 determines the slave nodes that are the distributed storage destinations of the DSN, D-CRX, D-CRS, and D-RIX data generated by the index generation unit 35, by a consistent hash method using, as the distribution key, one of the key value, an NID function, and an RID function.
  • the node determination unit 37 corresponds to a DSN storage node determination unit, a D-CRX storage node determination unit, a D-CRS storage node determination unit, and a D-RIX storage node determination unit.
  • when a data operation request occurs, the node determination unit 37 determines all of the first to third slave nodes 15, 17, and 19 under the control of the master node 13 as the nodes that execute the data operation request in a distributed manner. In response, the first to third slave nodes 15, 17, and 19 execute the given data operation request in parallel. The procedure by which the first to third slave nodes 15, 17, and 19 execute the data operation request in a distributed manner will be described in detail later.
  • the request issuing unit 41 belonging to the distributed overall management unit 39 sends the DSN, D-CRX, D-CRS, and D-RIX data to the respective slave nodes determined by the node determination unit 37 among the first to third slave nodes 15, 17, 19 under the control of the master node 13, and issues data registration requests.
  • the request issuing unit 41 issues a processing request to the node determined by the node determining unit 37 among the first to third slave nodes 15, 17, and 19.
  • the request issuing unit 41 corresponds to a DSN registration request issuing unit, a D-CRX registration request issuing unit, a D-CRS registration request issuing unit, and a D-RIX registration request issuing unit.
  • the processing result integration unit 43 belonging to the distributed overall management unit 39 receives the data operation results distributed in each of the first to third slave nodes 15, 17, 19 and integrates these processing results.
  • the update management unit 45 controls the update of the latest shared NID in the DB management data in accordance with the latest shared NID update control signal transmitted from the NID assigning unit 33. Specifically, the update management unit 45 performs control so that the latest shared NID is updated to the next shared NID when an existing confirmation result indicating that the key value related to the registration request does not yet exist in the DSN storage unit of the storage destination node is received.
  • the first slave node 15 includes a first reception unit 51, an existing determination unit 53, a registration management unit 55, a first distributed processing unit 57, and a first response unit 59.
  • the first reception unit 51 accepts registration requests for the DSN, D-CRX, D-CRS, and D-RIX data (hereinafter collectively referred to as “index data”) sent from the request issuing unit 41 of the master node 13, as well as data operation requests such as table join operations.
  • the first reception unit 51 passes the request to the existing determination unit 53 when an index data registration request occurs, and passes the request to the first distributed processing unit 57 when a data operation request occurs.
  • when an index data registration request occurs, the existing determination unit 53 refers to the first DSN in the first local data storage unit 15a, confirms whether or not data with the same value as the processing target key value included in the DSN data related to the registration request already exists in the first DSN, and sends the existing confirmation result to the first response unit 59. Further, based on the existing confirmation result, the existing determination unit 53 sends to the registration management unit 55 a registration command for the DSN data relating to the combination of the processing target key value included in the DSN data related to the registration request and the unique NID for that key value.
  • the registration management unit 55 performs registration management in which the DSN data related to the registration request is additionally stored in the first local data storage unit 15a (corresponding to the DSN storage unit) in accordance with the registration command sent from the existing determination unit 53.
  • the registration management unit 55 also performs registration management in which the D-CRX, D-CRS, and D-RIX data related to the registration request are stored in the first local data storage unit 15a (corresponding to the D-CRX storage unit, D-CRS storage unit, and D-RIX storage unit) in accordance with the registration commands sent from the master node 13. This completes the registration of all index data related to the registration request.
  • the registration management unit 55 corresponds to a DSN registration management unit, a D-CRX registration management unit, a D-CRS registration management unit, and a D-RIX registration management unit.
  • when a data operation request related to its own node occurs, the first distributed processing unit 57, corresponding to the data operation execution unit, executes the distributed processing related to the request in parallel with the other nodes, referring as appropriate to the first DSN, first D-CRX, first D-CRS, and first D-RIX stored in the first local data storage unit 15a. The first distributed processing unit 57 passes the obtained distributed processing result to the first response unit 59.
  • the first response unit 59 responds to the master reception unit 31 of the master node 13 with the existing confirmation result sent from the existing determination unit 53.
  • the master reception unit 31 passes the received existing confirmation result to the NID allocation unit 33.
  • the NID allocation unit 33 generates the latest shared NID update control signal based on the existing confirmation result, and sends this to the update management unit 45.
  • the update management unit 45 controls the update of the latest shared NID in the DB management data according to the latest shared NID update control signal transmitted from the NID assigning unit 33.
  • the first response unit 59 responds to the processing result integration unit 43 of the master node 13 with the distributed processing result sent from the first distributed processing unit 57.
  • FIG. 5A is a flowchart showing the cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when a DSN data registration request based on the key value registration request is generated.
  • the key value registration request is generally input to the master reception unit 31 in units of tuples.
  • It is assumed that the master reception unit 31, having received input data in units of tuples, proceeds with the processing in units of a key value associated with any one of the plurality of columns included in the tuple, together with its data type information.
  • In step S11, the master reception unit 31 of the master node 13 receives a key value registration request issued from any one of the plurality of client terminals 25a, 25b, and 25c, and passes the request to the NID allocation unit 33.
  • In step S12, the NID assigning unit 33 that has received the key value registration request refers to the latest shared NID in the DB management data stored in the master data storage unit 13a, and assigns a next shared NID appropriate for the key value related to the registration request (for example, a value obtained by incrementing the value of the latest shared NID by “1”).
  • Information on the next shared NID assigned to the key value related to the registration request by the NID assigning unit 33 is sent to the index generating unit 35.
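The NID assignment rule described in steps S11 and S12 (committed or cancelled in the later steps) can be sketched as follows. This is an illustrative, single-process sketch: the class and method names are hypothetical, and in the real system the DSN is distributed across the slave nodes rather than held in one dictionary.

```python
# Hypothetical sketch of the shared-NID assignment rule: a tentative
# "next shared NID" (latest shared NID + 1) is assigned to the key value,
# and it is committed only if the key value is not yet registered.

class SharedNidAllocator:
    def __init__(self):
        self.latest_shared_nid = 0
        self.dsn = {}  # key value -> NID (stands in for the distributed DSN)

    def register(self, key_value):
        next_nid = self.latest_shared_nid + 1  # step S12: tentative assignment
        if key_value in self.dsn:              # steps S23/S24: already registered
            return self.dsn[key_value]         # keep the unique NID; cancel update
        self.dsn[key_value] = next_nid         # step S25: additional storage
        self.latest_shared_nid = next_nid      # step S19: commit latest shared NID
        return next_nid

alloc = SharedNidAllocator()
assert alloc.register("Kanto") == 1
assert alloc.register("Kansai") == 2
assert alloc.register("Kanto") == 1   # same key value -> same NID
assert alloc.latest_shared_nid == 2   # the cancelled NID is not consumed
```

This mirrors the guarantee stated above that the same key value is always mapped to the same unique NID.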
  • In step S13, the index generation unit 35 of the master node 13 generates DSN data based on the key value related to the registration request, the data type of the key value, and the next shared NID assigned to the key value by the NID assignment unit 33.
  • step S14 the node determination unit 37 of the master node 13 determines the slave node that is the distributed storage destination of the DSN data generated by the index generation unit 35 by the consistent hash method using the key value as the distribution key. Then, the decision content is sent to the distributed management unit 39.
  • In step S15, the request issuing unit 41 belonging to the distributed overall management unit 39 of the master node 13 issues the DSN data registration request to the slave node determined by the node determination unit 37 from among the first to third slave nodes 15, 17, and 19 under the control of the master node 13.
  • In step S23, if it is determined that the processing target key value related to the registration request has already been registered, then in step S24 the registration management unit 55 maintains as it is the DSN data relating to the correspondence between the processing target key value related to the registration request and the registered NID, without following a registration command from the existing determination unit 53. This ensures the unique NID assignment for the same key value. In this case, the registration management unit 55 cancels the DSN data registration request, because the DSN data relating to the correspondence between the processing target key value related to the registration request and the registered NID has already been registered and it is not necessary to register the DSN data additionally.
  • On the other hand, if it is determined that the processing target key value related to the registration request has not yet been registered, then in step S25, in accordance with the registration command sent from the existing determination unit 53, the registration management unit 55 additionally stores in the first local data storage unit 15a the DSN data in which the next shared NID is assigned as appropriate to the processing target key value related to the registration request.
  • the additional storage of the DSN data to which the next shared NID is assigned means that the data of the DSN to which the next shared NID is assigned is added without rewriting the already accumulated DSN data.
  • In step S26, after the processing in step S24 or S25, the first response unit 59 of the first slave node 15 returns the NID actually assigned to the processing target key value related to the registration request, together with the existing confirmation result, to the master reception unit 31 of the master node 13, and completes the processing on the slave node side.
  • In step S16, the master reception unit 31 of the master node 13 receives the existing confirmation result transmitted from the first slave node 15 and the NID actually assigned to the processing target key value related to the registration request, and passes them to the NID allocation unit 33.
  • In step S17, the NID allocation unit 33 determines whether or not the processing target key value related to the registration request has already been registered.
  • In step S18, when the NID assigning unit 33 receives the existing confirmation result indicating that the processing target key value related to the registration request already exists in the first slave node 15 that is the storage destination, the NID assigning unit 33 generates a control signal for prohibiting the update of the latest shared NID and sends the control signal to the update management unit 45.
  • the update management unit 45 prohibits the update of the latest shared NID according to the latest shared NID update control signal transmitted from the NID assigning unit 33. Thereby, the next shared NID assigned to the processing target key value related to the registration request in step S12 is canceled, and the value of the latest shared NID is maintained as it is without being updated.
  • In step S19, when the NID assigning unit 33 receives the existing confirmation result indicating that the processing target key value related to the registration request does not yet exist in the first slave node 15 that is the storage destination, the NID assigning unit 33 generates a control signal for updating the latest shared NID and sends it to the update management unit 45.
  • Accordingly, the update management unit 45 updates the value of the latest shared NID to the value of the next shared NID assigned to the key value related to the registration request in step S12. After this update, the NID allocation unit 33 advances the process flow to step S31 in FIG. 5B.
  • FIG. 5B is a flowchart showing the cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when a D-CRX / D-RIX data registration request based on the key value registration request is generated.
  • In step S31, the NID assigning unit 33 of the master node 13 refers to the NID actually assigned to the processing target key value related to the registration request, and obtains the block numbers (D-CRX-BN and D-RIX-BN), which are functions of the NID. Specifically, “D-CRX-BN is the quotient obtained by dividing NID by D-CRX-BF” and “D-RIX-BN is the quotient obtained by dividing NID by D-RIX-BF” are each obtained by calculation.
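The block-number calculations quoted above (and the D-CRS-BN calculation in step S51 below) are plain integer divisions. A minimal sketch, with hypothetical blocking-factor (BF) values of 4 chosen purely for illustration:

```python
# Block numbers as integer quotients of an identifier by a blocking factor.
# The BF values below are illustrative assumptions, not taken from the patent.

D_CRX_BF = 4   # hypothetical blocking factor for D-CRX
D_RIX_BF = 4   # hypothetical blocking factor for D-RIX
D_CRS_BF = 4   # hypothetical blocking factor for D-CRS

def d_crx_bn(nid):
    return nid // D_CRX_BF   # quotient of NID divided by D-CRX-BF

def d_rix_bn(nid):
    return nid // D_RIX_BF   # quotient of NID divided by D-RIX-BF

def d_crs_bn(rid):
    return rid // D_CRS_BF   # quotient of RID divided by D-CRS-BF

assert d_crx_bn(9) == 2   # NIDs 8..11 fall into block 2 when BF = 4
assert d_crs_bn(3) == 0   # RIDs 0..3 fall into block 0 when BF = 4
```

Because consecutive identifiers share a block number, index entries for nearby NIDs or RIDs land on the same slave node, which is what makes the block number usable as a distribution key.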
  • In step S32, the index generation unit 35 of the master node 13 generates D-CRX data based on the processing target key value related to the registration request, the NID actually assigned to the key value, and the column name to which the processing target key value related to the registration request belongs.
  • In step S33, the index generation unit 35 of the master node 13 generates D-RIX data based on the NID actually assigned to the processing target key value related to the registration request, the RID set corresponding to the NID, and the column name to which the processing target key value related to the registration request belongs.
  • In step S34, the node determination unit 37 of the master node 13 determines the slave nodes that are the distributed storage destinations of the D-CRX and D-RIX data generated by the index generation unit 35 by a consistent hash method using the block numbers (D-CRX-BN and D-RIX-BN), which are functions of the NID obtained in step S31, as distribution keys, and sends the determined content to the distributed overall management unit 39.
  • In step S35, the request issuing unit 41 belonging to the distributed overall management unit 39 of the master node 13 issues the D-CRX and D-RIX data registration requests to the slave nodes determined by the node determination unit 37 from among the first to third slave nodes 15, 17, and 19 under the control of the master node 13.
  • the distributed overall management unit 39 of the master node 13 advances the processing flow to step S51 in FIG. 5C.
  • The cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when the D-CRS data registration request is made after the registration of the D-CRX and D-RIX data is completed will be described with reference to FIG. 5C.
  • FIG. 5C is a flowchart showing a cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when a D-CRS data registration request based on the key value registration request is generated.
  • In step S51, the NID allocation unit 33 of the master node 13 refers to the RID corresponding to the NID actually allocated to the processing target key value related to the registration request, and obtains the block number (D-CRS-BN), which is a function of the RID. Specifically, “D-CRS-BN is the quotient obtained by dividing RID by D-CRS-BF” is obtained by calculation.
  • In step S52, the index generating unit 35 of the master node 13 generates D-CRS data based on the NID actually assigned to the processing target key value related to the registration request, the RID to which the processing target key value belongs, and the column name to which the processing target key value belongs.
  • In step S53, the node determination unit 37 of the master node 13 determines the slave node that is the distributed storage destination of the D-CRS data generated by the index generation unit 35 by a consistent hash method using the block number (D-CRS-BN), a function of the RID obtained in step S51, as the distribution key, and sends the determined content to the distributed overall management unit 39.
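The storage-destination determinations in steps S34 and S53 hash a block number onto a ring of slave nodes. The sketch below is a generic consistent-hash ring, not the patent's exact method: the node names are invented, and MD5 is an assumed hash function, since the patent does not specify one.

```python
# Generic consistent-hash ring mapping a distribution key (e.g. a block
# number) to a storage-destination slave node. Node names and the choice
# of MD5 are illustrative assumptions.
import bisect
import hashlib

def _h(s):
    # Hash a string to an integer position on the ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._keys = [k for k, _ in self._ring]

    def node_for(self, distribution_key):
        # Walk clockwise from the key's position to the next node.
        i = bisect.bisect(self._keys, _h(str(distribution_key))) % len(self._keys)
        return self._ring[i][1]

ring = ConsistentHashRing(["slave1", "slave2", "slave3"])
# The same block number always maps to the same slave node:
assert ring.node_for(("D-CRS-BN", 7)) == ring.node_for(("D-CRS-BN", 7))
assert ring.node_for(("D-CRX-BN", 2)) in {"slave1", "slave2", "slave3"}
```

The same `node_for` computation is what lets a slave node later re-derive D-CRX-CNN or D-CRS-CNN locally from a block number, without asking the master node.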
  • In step S54, the request issuing unit 41 belonging to the distributed overall management unit 39 of the master node 13 issues the D-CRS data registration request to the slave node determined by the node determination unit 37 from among the first to third slave nodes 15, 17, and 19 under the control of the master node 13.
  • the distributed overall management unit 39 of the master node 13 terminates the series of processing flow.
  • The four types of index data registered as described above demonstrate their power when data operations such as table join operations on a distributed RDB consisting of a huge amount of data are distributed among the first to third slave nodes 15, 17, and 19 and executed in parallel.
  • In the distributed RDB system 11, which provides a distributed RDB service via, for example, the WWW (World Wide Web), even when the number of nodes in the system is increased to respond flexibly to a sudden increase in demand, a linear scale-out property can be realized: the overall processing capacity of the system improves linearly before and after the expansion, even when data operations such as table join operations are performed on the data distributed and stored in the multiple nodes after the expansion.
  • FIG. 6 is a process diagram showing the flow of distributed query processing.
  • the distributed query process shown in FIG. 6 includes a distributed search process in step S71, a distributed table join process in step S72, a distributed result tuple creation process for aggregation in step S73, and the like.
  • The distributed search process in step S71, the distributed table join process in step S72, and the distributed result tuple creation process in step S73 can each be executed in parallel by the first to third slave nodes 15, 17, and 19.
  • However, the process of step S72, which is executed using the result of the upstream phase, cannot be executed until the upstream process (step S71) is completed in all nodes.
  • the search expression includes a search term, a logical operator, and parentheses that control the priority of operations.
  • a search expression is constituted by any combination of these.
  • the search term includes a left side term, a comparison operator, and a right side term.
  • the left-hand side consists of a column name or a literal (actual value).
  • the right-hand side consists of a column name or a literal (actual value).
  • the logical operator consists of AND (“&”) and OR (“|”).
  • the parentheses consist of an opening parenthesis “(” and a closing parenthesis “)”.
  • D-CRS is searched using the NID set extracted by this search as the search key, and the RID set as the search result is acquired.
  • In a range search (for example, a search for extracting key values belonging to a specified range defined by a start value and an end value, for numeric data), a D-CRX search is performed by giving the specified range of key values to all slave nodes. A D-CRS search is then performed using the NID set extracted by this search as a search key, and the RID set as the search result is obtained.
  • In a partial match search (for example, a search for extracting key values containing at least a part of a specified character string, for character string type data), the specified character string of the key value is given to all slave nodes and the D-CRX is searched. A D-CRS search is then performed using the NID set extracted by this search as a search key, and the RID set as the search result is acquired.
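The two-stage search pattern shared by the procedures above — resolve key values to NIDs via D-CRX, then scan D-CRS for matching NIDs to collect RIDs — can be sketched as follows. The table contents are invented for illustration.

```python
# Two-stage search sketch: D-CRX maps key value -> NID (no duplicate key
# values within a column), and D-CRS maps row position (RID) -> NID.
# All data values below are illustrative.

d_crx = {"Kanto": 1, "Kansai": 2, "Tohoku": 3}   # key value -> NID
d_crs = [1, 2, 1, 3, 1, 2]                        # index = RID, value = NID

def complete_match_search(key_value):
    # Stage 1: resolve the key value to its NID via D-CRX.
    nid = d_crx.get(key_value)
    if nid is None:
        return set()
    # Stage 2: full scan of D-CRS over fixed-width NIDs, which is cheaper
    # than comparing actual (e.g. string) key values row by row.
    return {rid for rid, n in enumerate(d_crs) if n == nid}

def partial_match_search(substring):
    # Stage 1 for partial match: collect every NID whose key value contains
    # the specified character string.
    nids = {n for v, n in d_crx.items() if substring in v}
    return {rid for rid, n in enumerate(d_crs) if n in nids}

assert complete_match_search("Kanto") == {0, 2, 4}
assert complete_match_search("Kyushu") == set()
assert partial_match_search("Kan") == {0, 1, 2, 4, 5}   # Kanto and Kansai
```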
  • Table join means a table join operation between an outer table and an inner table.
  • An external table is a table that serves as a basis for table join.
  • the inner table is a table that becomes a partner of the table join to the outer table.
  • the outer table and inner table are joined by the value of the join column.
  • the joined column is a column that exists in common between the outer table and the inner table, and has a role of allowing the outer table and the inner table to be joined through the column.
  • The join column of the outer table is called the outer table foreign key column, and the join column of the inner table is called the inner table primary key column. Multiple tables can be joined by repeating the table join operation.
  • Next, the content of the distributed search process in step S71 shown in FIG. 6 will be described with specific examples, with reference to FIGS. 2A and 2B.
  • a key value whose “region name” is “Kanto” is extracted as an RID set.
  • the search can be stopped when the key value matching the search condition is hit without requiring a full scan collation.
  • D-CRX employs a data structure that does not allow duplication of key values within a certain column. Therefore, according to the data structure of D-CRX, it is possible to contribute to shortening the search time (the same applies hereinafter).
  • the value search of the NID set for D-CRS can be completed in a relatively short time, although full scan verification is required.
  • the actual key value is replaced with an NID (for example, a natural number) by a search using D-CRX in the previous stage, and full scan collation is performed using the replaced NID value.
  • In the computer, the NID is expressed by a fixed-width binary integer value, so that it is much more efficient in search and reference than the actual key value. Therefore, the data search based on the combination of D-CRX and D-CRS can contribute to shortening the search time (the same applies hereinafter).
  • For each of the search conditions in each single term, a complete match search in a single term is executed according to the procedure in the first embodiment, and the target RID set can be obtained by performing a logical sum (OR) operation between the obtained RID sets.
  • The distributed processing units of the first to third slave nodes 15, 17, and 19 apply the NID sets of the first and second elements to the search formula and perform a logical product (AND) operation between the NID sets, whereby the NID set {8, 11, 22, 30} as the search result is obtained.
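Once each single term has been evaluated, the multi-term searches above reduce to ordinary set algebra over NID/RID sets. A minimal sketch with invented sets, chosen so that the AND result matches the {8, 11, 22, 30} example:

```python
# Multi-term search results combine with plain set operators:
# AND = logical product (intersection), OR = logical sum (union).
# The sets below are illustrative, not taken from the figures.

term1 = {1, 3, 5, 8, 11, 22, 30}
term2 = {2, 8, 11, 22, 30, 41}

assert term1 & term2 == {8, 11, 22, 30}                    # AND
assert term1 | term2 == {1, 2, 3, 5, 8, 11, 22, 30, 41}    # OR
```

Because the operands are small integer identifiers rather than actual key values, these combinations stay cheap even for complex search expressions with parentheses.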
  • In a partial match search with multiple terms (for example, a search for key values whose “region name” contains the character string “Kan” or contains the character string “to”), a partial match search in a single term is executed according to the procedure in the fourth embodiment for each of the search conditions in each single term, and the RID set as the search result can be obtained by performing a logical sum (OR) operation between the obtained RID sets.
  • FIG. 7 is a diagram showing an internal table of the number of customers by region that are distributed and stored in the plurality of slave nodes 15, 17, and 19.
  • FIG. 8 is a diagram showing an example of D-RIX in the internal table shown in FIG.
  • the distributed processing units of the first to third slave nodes 15, 17, and 19 obtain the result of joining the outer table and the inner table with reference to the D-RIX of the joined column.
  • FIGS. 2A and 2B showing sales management tables (transactions) distributedly stored in a plurality of slave nodes 15, 17, and 19 are positioned as external tables.
  • FIG. 7 showing the number of customers by region distributedly stored in the plurality of slave nodes 15, 17 and 19 is positioned as an internal table.
  • the combined column is “area name”.
  • The distributed processing units of the first to third slave nodes 15, 17, and 19 that have received the table join operation request from the master node 13 each acquire the NID set (hereinafter abbreviated as the “OTFK-NID” set) of the outer table foreign key column from the D-RIX of the outer table (hereinafter, the “OTFK-D-RIX”).
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 each search the D-RIX of the inner table primary key column (hereinafter, the “ITPK-D-RIX”) using the elements (NIDs) of the OTFK-NID set as search conditions.
  • Further, the distributed processing units of the first to third slave nodes 15, 17, and 19 each acquire, from the target column (outer table foreign key column) of the OTFK-D-RIX, the outer table RID (hereinafter abbreviated as “OTRID”) set corresponding to the OTFK-NID set.
  • For example, the OTRID set corresponding to the OTFK-NID set {2, 6, 25} is acquired as {2, 3, 5, 7, 8, 9, 10, 14}.
  • Similarly, the distributed processing units of the first to third slave nodes 15, 17, and 19 each acquire the inner table RID (hereinafter abbreviated as “ITRID”) set corresponding to the ITPK-NID set from the target column (inner table primary key column) of the ITPK-D-RIX.
  • the distributed processing units of the first to third slave nodes 15, 17, and 19 respectively create a comparison table (hereinafter abbreviated as “REF-OTRID-ITRID”) of the internal table RID corresponding to the external table RID.
  • REF-OTRID-ITRID plays a role of connecting the external table RID and the corresponding internal table RID with the common OTFK-NID and ITPK-NID in between. Thereby, the RID comparison table as shown in FIG. 9 is obtained.
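The construction of REF-OTRID-ITRID can be sketched as a join of the two D-RIX mappings through their common NIDs. The D-RIX contents below are invented for illustration.

```python
# Sketch of building the REF-OTRID-ITRID comparison table: outer table RIDs
# and inner table RIDs are linked through the common NID of the join column.
# All table contents are illustrative assumptions.

otfk_d_rix = {1: {0, 2}, 2: {1}, 3: {4}}   # NID -> OTRID set (outer FK column)
itpk_d_rix = {1: {10}, 2: {11}}            # NID -> ITRID set (inner PK column)

ref_otrid_itrid = [
    (otrid, itrid)
    for nid in sorted(otfk_d_rix)
    if nid in itpk_d_rix                # rows pair up only on common NIDs
    for otrid in sorted(otfk_d_rix[nid])
    for itrid in sorted(itpk_d_rix[nid])
]

# NID 3 has no inner-table partner, so OTRID 4 produces no pair.
assert ref_otrid_itrid == [(0, 10), (2, 10), (1, 11)]
```

Note that no key value ever appears in the table: the pairing rides entirely on NIDs, which is what lets the join avoid creating a huge real-value join table.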
  • Further, for each of the plurality of join conditions, the distributed processing units of the first to third slave nodes 15, 17, and 19 each create the REF-OTRID-ITRID according to the procedure of the sixth embodiment.
  • The RID comparison table as the join result according to the sixth embodiment is expressed by the REF-OTRID-ITRID distributedly stored in each of the plurality of slave nodes 15, 17, and 19.
  • the data structure of the RID comparison table as a result of this combination greatly affects the storage efficiency and processing efficiency of data in the RDB. This is because a function equivalent to that of a join table can be achieved without creating a join table that tends to be huge on a real value basis.
  • By sequentially tracing the REF-OTRID-ITRID with the outer table as the reference, the RID of the target column (the outer table foreign key column in the outer table, or the inner table primary key column in the inner table) is pointed to. The corresponding NID can then be obtained by referring to the D-CRS using the RID as a pointer, and once the NID is obtained, the corresponding key value can be obtained by referring to the D-CRX using the NID as a pointer.
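The pointer chase described above (RID → D-CRS → NID → D-CRX → key value) can be sketched as follows, with invented table contents; the actual key value is materialized only at the final step.

```python
# Sketch of resolving an actual key value only at the last step: the RID
# points into D-CRS to get the NID, and the NID points into D-CRX to get
# the key value. Table contents are illustrative assumptions.

d_crs = [2, 1, 2]                        # RID -> NID
d_crx_rev = {1: "Kanto", 2: "Kansai"}    # NID -> key value (reverse D-CRX view)

def key_value_for(rid):
    nid = d_crs[rid]        # D-CRS lookup with the RID as a pointer
    return d_crx_rev[nid]   # D-CRX lookup with the NID as a pointer

assert key_value_for(0) == "Kansai"
assert key_value_for(1) == "Kanto"
```

Deferring this resolution until result tuples are assembled is what keeps all intermediate join and search work on compact integer identifiers.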
  • the data structure of the search result is expressed as an RID set of external tables.
  • the data structure of the join result is expressed as a chain of REF-OTRID-ITRID (RID comparison table) with reference to the external table.
  • Their common point is that both have RID sets of external tables. Therefore, by performing a logical operation between the RID sets of the external tables of the search result and the join result, a logical operation related to the combination of the search result and the join result can be efficiently realized.
  • A complicated operation for the table join operation can thus be replaced with a simple set operation, so that a significant reduction in calculation processing time can be realized. Further, according to the sixth embodiment, when performing a table join operation, it becomes unnecessary to match key values between the join column of the outer table and the join column of the inner table. This is because the adoption of the DSN guarantees that the same NID is assigned to the same key value, and because there exists an RID comparison table that links the outer table RID and the corresponding inner table RID with the common NID interposed between them.
  • Information (NID) related to the same key value is intentionally distributed and stored so as to be aggregated in the same slave node. For this reason, in contrast to the conventional example in which key values having the same value are randomly distributed and stored across multiple slave nodes, no communication between slave nodes occurs for mutually referring to key values having the same value when, for example, a slave node performs a data operation such as a join operation. Therefore, according to the sixth embodiment, the processing overhead of the entire system can be suppressed, so that the processing capability of the distributed RDB system 11 as a whole can be improved efficiently.
  • Even when the number of slave nodes in the system is increased in order to cope flexibly with a sudden increase in demand, and data operations such as table join operations are performed on the data distributed and stored in each of the multiple slave nodes after expansion, a linear scale-out property can be realized in which the processing capacity of the entire system improves linearly before and after the expansion.
  • An overview of the distributed result tuple creation process for aggregation in step S73 shown in FIG. 6 will be described separately for the case where there is no table join operation and the case where there is a table join operation.
  • the process of step S73 is executed in parallel in each of the first to third slave nodes 15, 17, and 19.
  • The distributed processing units of the first to third slave nodes 15, 17, and 19 acquire, from the RID sets obtained as the respective search results, each RID that is the source of a result tuple.
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 specify, based on each acquired RID, which node holds the NID data corresponding to that RID.
  • the distributed processing units of the first to third slave nodes 15, 17, and 19 specify the data storage destination node number D-CRS-CNN by performing the following calculation. That is, the distributed processing unit determines the D-CRS block number D-CRS-BN based on the RID. Then, D-CRS-CNN is determined by performing a hash operation by the consistent hash method using the determined D-CRS-BN.
  • When the specified node is a node other than its own node, the own node acquires the data from that node.
  • the own node acquires the NID by referring to the D-CRS of the target column constituting the tuple using the acquired RID as a pointer.
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 specify, based on the acquired NID, which node holds the key value data corresponding to the NID. Specifically, the distributed processing units of the first to third slave nodes 15, 17, and 19 perform the following calculation to identify the data storage node number D-CRX-CNN. That is, the distributed processing unit determines the D-CRX block number D-CRX-BN based on the NID, and then determines D-CRX-CNN by performing a hash operation by the consistent hash method using the determined D-CRX-BN.
  • When the specified node is a node other than its own node, the own node acquires the data from that node.
  • the node uses the acquired NID as a pointer, and acquires a key value that is an actual value by referring to the D-CRX of the target column that forms the tuple.
  • The distributed processing units of the first to third slave nodes 15, 17, and 19 acquire the RIDs of the outer table from the RID sets obtained as the respective search results.
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 specify, based on each acquired RID of the outer table, which node holds the NID data corresponding to that RID.
  • the distributed processing units of the first to third slave nodes 15, 17, and 19 perform the following calculation to specify the data storage node number D-CRS-CNN. That is, the distributed processing unit determines the D-CRS block number D-CRS-BN based on the RID of the external table. Then, D-CRS-CNN is determined by performing a hash operation by the consistent hash method using the determined D-CRS-BN.
  • When the specified node is a node other than its own node, the own node acquires the data from that node.
  • The own node then uses the acquired RID of the outer table as a pointer, refers to the REF-OTRID-ITRID chain, and acquires the RID of the target inner table from the REF-OTRID-ITRID of the target column constituting the tuple.
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 specify, based on each acquired RID of the inner table, which node holds the NID data corresponding to that RID.
  • the distributed processing units of the first to third slave nodes 15, 17, and 19 perform the following calculation to specify the data storage node number D-CRS-CNN. That is, the distributed processing unit determines the D-CRS block number D-CRS-BN based on the RID in the internal table. Then, D-CRS-CNN is determined by performing a hash operation by the consistent hash method using the determined D-CRS-BN.
  • When the specified node is a node other than its own node, the own node acquires the data from that node.
  • the node acquires the NID by referring to the D-CRS of the target column constituting the tuple using the acquired RID of the internal table as a pointer.
  • Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 specify, based on the acquired NID, which node holds the key value data corresponding to the NID. Specifically, the distributed processing units of the first to third slave nodes 15, 17, and 19 perform the following calculation to identify the data storage node number D-CRX-CNN. That is, the distributed processing unit determines the D-CRX block number D-CRX-BN based on the NID, and then determines D-CRX-CNN by performing a hash operation by the consistent hash method using the determined D-CRX-BN.
  • When the specified node is a node other than its own node, the own node acquires the data from that node.
  • the node uses the acquired NID as a pointer, and acquires a key value that is an actual value by referring to the D-CRX of the target column that forms the tuple.
  • In the above description, the distributed processing units of the first to third slave nodes 15, 17, and 19 themselves perform the hash operation by the consistent hash method to obtain D-CRX-CNN and D-CRS-CNN, but the present invention is not limited to this. For example, the master node 13 may hold D-CRX-CNN and D-CRS-CNN as the master data 13a, and the distributed processing units of the first to third slave nodes 15, 17, and 19 may inquire of the master node 13.
  • However, it is preferable that each slave node performs the calculation by itself rather than inquiring of the master node 13.
  • The index generation unit 35 of the master node 13 creates the index data (DSN, D-CRX, D-CRS, and D-RIX) to be distributed and stored in the first to third slave nodes 15, 17, and 19, the created index data is collectively transmitted to the nodes determined by the node determination unit 37, and the index data is collectively processed on each determined node.
  • As the distribution key, the DSN uses the key value, D-CRX and D-RIX use a function of the NID (the block number), and D-CRS uses a function of the RID (the block number).
  • Since the DSN constrains the NID and the key value to have a one-to-one correspondence, the NID (a natural number serving as an ordinal number) is preferably used instead of the key value in processing stages before the key value is required as a meaningful value. Thereby, all operations can be reduced to numerical operations. Since the NID is represented in the computer by a fixed-width binary integer value, it is more efficient in search and reference scenes than the actual key value. Therefore, it is possible to contribute to shortening the arithmetic processing time.
  • the first to third slave nodes 15, 17, and 19 have been described as a plurality of slave nodes.
  • the present invention is not limited to this example.
  • the number of slave nodes may be adjusted to an appropriate number in accordance with the increase or decrease in the amount of data to be processed.
  • one master node 13 is exemplified as the master node, but the present invention is not limited to this example.
  • A replica of the master node may be provided. Similarly, replicas of the slave nodes may be provided.
  • D-RIX index data is described side by side with DSN, D-CRX, and D-CRS.
  • D-RIX is not an essential data structure. This is because D-RIX can realize processing efficiency at the time of a table join operation, but even without this, the function can be replaced by full scan collation referring to D-CRS.
  • In the embodiments, the index generation unit 35, the node determination unit 37, and the update management unit 45 are provided in the master node 13, but these functional configurations may instead be provided in the slave nodes 15, 17, and 19.
  • In that case, the processing efficiency can be improved by executing the processes related to the index generation unit 35, the node determination unit 37, and the update management unit 45 in parallel in the plurality of slave nodes 15, 17, and 19.
  • the present invention can be used for a distributed database system including a plurality of slave nodes and a master node.

Abstract

A distributed data base system that distributes and stores distributed shared NID (DSN) data to a plurality of slave nodes, separated into data type, said data relating to the correspondence between a key value that is an actual value and a key value identifier (NID) that takes any value within a range for the data types for key values within a whole distributed data base. When distributing and storing the DSN data to the plurality of slave nodes (15, 17, 19), the storage destination slave node is determined from among the plurality of slave nodes (15, 17, 19) on the basis of the key value relating to the registration request.

Description

Distributed database system and data structure of distributed database
 The present invention relates to a distributed database system that includes a master node collectively managing a plurality of slave nodes and stores key values in the plurality of slave nodes in a distributed manner, and to a data structure for the distributed database.
 From the viewpoint of the physical arrangement of the nodes that perform data processing, databases are classified into centralized databases and distributed databases. Among distributed databases, one known configuration includes a master node that collectively manages a plurality of slave nodes and stores key value data across the plurality of slave nodes in a distributed manner.
 As an example of such a distributed database, Patent Document 1 discloses a database management apparatus that stores data in a distributed manner by combining horizontal and vertical partitioning. This database management apparatus includes a plurality of database devices, each having a hash function calculation unit and a distribution function calculation unit, and a global server having a load information collection unit and an access right management unit. The global server statistically processes the load information to determine the database device with the lowest load and an access right including the access period for that device. Access to the plurality of database devices is permitted based on the access right determined by the global server.
Patent Document 1: JP 2006-350741 A
 However, in the distributed database apparatus of Patent Document 1, in which key values are distributed and stored across a plurality of nodes by combining horizontal and vertical partitioning, each key value is randomly distributed across the nodes without regard to whether key values share the same value. When a data operation is executed under such random distributed storage, the communication delay incurred between nodes in order to cross-reference key values of the same value stored on different nodes becomes a bottleneck, making it difficult to efficiently improve the processing capacity of the system as a whole.
 The present invention has been made to solve the above problem, and its object is to efficiently improve the processing capacity of a distributed database system as a whole.
 In order to solve the above problem, the present invention distributes and stores, in each of a plurality of slave nodes, distributed shared NID (DSN) data describing the correspondence between a key value, which is an actual value, and a key value identifier (NID) that takes a unique value within the range of the key value's data type across the whole distributed database.
 When distributing and storing the DSN data to the plurality of slave nodes, the storage-destination slave node is determined from among the plurality of slave nodes on the basis of the key value relating to the registration request.
 According to the present invention configured as described above, when the plurality of slave nodes execute data operations such as join operations in parallel based on commands from the master node, each slave node operates using the key value identifier (NID) obtained by referring to the DSN stored in its own node, instead of the actual key value.
 Here, the key value identifier (NID) takes a unique value within the range of the key value's data type across the whole distributed database; that is, identical key values take the identical NID value. Meanwhile, the storage destination of the distributed shared NID (DSN) data, which is information about a key value, is determined based on that key value. Consequently, information about the same key value is aggregated on the same slave node.
 In short, in the present invention, information about the same key value is deliberately distributed and stored so as to be aggregated on the same slave node. Therefore, in contrast to the conventional example in which key values having the same value are randomly distributed across a plurality of slave nodes, no inter-node communication for cross-referencing key values of the same value occurs when, for example, a slave node executes a data operation such as a join operation, and the processing overhead of the system as a whole is suppressed.
 Therefore, according to the present invention, the processing capacity of the distributed database system as a whole can be improved efficiently.
A configuration diagram showing an overview of the distributed database system according to an embodiment of the present invention. Two diagrams showing an example of the transactions of the sales management table in the first to third slave nodes. A diagram showing an example of the distributed shared NID (DSN). A diagram showing an example of the distributed compression/restoration index (D-CRX). A diagram showing an example of the distributed compression result set cache (D-CRS). A diagram showing an example of the distributed row identification index (D-RIX). A functional block diagram showing the internal configurations of the master node and the first slave node. A flow diagram showing the cooperative operation of the master node 13 and the slave nodes 15, 17, and 19 when a DSN data registration request occurs. A flow diagram showing the cooperative operation of the master node 13 and the slave nodes 15, 17, and 19 when a D-CRX/D-RIX data registration request occurs. A flow diagram showing the cooperative operation of the master node 13 and the slave nodes 15, 17, and 19 when a D-CRS data registration request occurs.
A process diagram showing the flow of the distributed query processing executed in the distributed database system according to the present embodiment. A diagram showing the number of customers by region (inner table) distributed and stored across a plurality of local nodes. A diagram showing an example of the D-RIX of the inner table shown in FIG. 7. A diagram showing an RID cross-reference table indicating the correspondence between outer-table RIDs and inner-table RIDs.
 Hereinafter, a distributed database system according to an embodiment of the present invention will be described in detail with reference to the drawings.
(Overview of the distributed database system according to the embodiment of the present invention)
 First, an overview of the distributed database system according to the embodiment of the present invention will be described. FIG. 1 is a configuration diagram showing the overview of the distributed database system according to the embodiment. A distributed relational database system 11 according to this embodiment (hereinafter, as a rule, "relational database" is abbreviated as "RDB" and "database" as "DB") is configured by connecting one master node 13 and first to third slave nodes 15, 17, and 19 via a first communication network 21. The master node 13 collectively manages the plurality of slave nodes 15, 17, and 19. The nodes 13, 15, 17, and 19 are computers having information processing functions.
 As components outside the distributed RDB system 11, a plurality of client terminals 25a, 25b, and 25c are connected to the master node 13 via a second communication network 23. When the master node 13 accepts a key value registration request or a data operation request, such as a table join operation, issued from any one of the client terminals 25a, 25b, and 25c, it executes the processing required by the request in cooperation with the first to third slave nodes 15, 17, and 19, and returns the obtained processing result as a response to the requesting client terminal.
 The master node 13 has a master data storage unit 13a that stores master data. The master data includes DB metadata and DB management data. The DB metadata includes a physical configuration table describing where and how many slave nodes 15, 17, and 19 are installed, a table of table attributes, and the like. The DB management data includes shared management data such as the latest shared NID, described later. Here, as a main feature of the present invention, the master node 13 only performs management for distributing and storing information about key values, including the key values themselves that are the original management targets, to the first to third slave nodes 15, 17, and 19; neither the master node 13 itself nor the master data storage unit 13a holds any key value or any information about key values.
 The first to third slave nodes 15, 17, and 19 have first to third local data storage units 15a, 17a, and 19a that store first to third local data, respectively. The configurations of the first to third slave nodes 15, 17, and 19 and of the first to third local data storage units 15a, 17a, and 19a are identical. Therefore, to avoid redundant description, the first slave node 15, the first local data, and the first local data storage unit 15a are representatively described below in place of descriptions of the second and third slave nodes 17 and 19, the second and third local data, and the second and third local data storage units 17a and 19a. Each of the first to third local data storage units 15a, 17a, and 19a corresponds to a DSN storage unit, a D-CRX storage unit, a D-CRS storage unit, and a D-RIX storage unit.
 The first local data includes four types of index data: a first distributed shared NID (hereinafter "DSN"), a first distributed compression/restoration index (hereinafter "D-CRX"), a first distributed compression result set cache (hereinafter "D-CRS"), and a first distributed row identification index (hereinafter "D-RIX"). These are described in detail later.
(Example of transactions of the sales management table in the plurality of slave nodes)
 An example of the transactions of the sales management table in the plurality of slave nodes 15, 17, and 19 generated by the distributed RDB system 11 according to the embodiment of the present invention will be described. FIGS. 2A and 2B are tables showing an example of the transactions of the sales management table in the plurality of slave nodes 15, 17, and 19.
 When table data in which tuples (rows) and columns are arranged two-dimensionally as shown in FIGS. 2A and 2B is input, the distributed RDB system 11 creates the four types of index data of the distributed relational data model. The input row numbers shown in FIGS. 2A and 2B are labels for uniquely identifying the input rows, assigned as consecutive numbers in ascending order starting from 1. The input row number is a label attached by the master node 13 and is not included in the actual input data.
 The first row of the table shown in FIGS. 2A and 2B contains the column item names, and the second row contains the data type of each column (for example, character string type, numeric type, or date type). In the tables of FIGS. 2A and 2B, input row numbers 1 to 15 are assigned to the respective tuples. In these tables, the "distributed row identifier (hereinafter 'RID') taking a unique value for each column in the tables constituting the distributed database" recited in the claims has the same value as the input row number.
 In the tables shown in FIGS. 2A and 2B, the attribute values of each tuple are entered in the lower part of each tuple's row. The attribute values are entered only for convenience of explanation and are not included in the actual input data. These attribute values include, for example, the key value identifier (hereinafter "NID") issued by referring to the DSN data with the key value as input, the storage destination node number of the DSN data determined by consistent hashing (hereinafter "DSN-CNN"), the storage destination node number of the D-CRX data likewise determined by consistent hashing (hereinafter "D-CRX-CNN"), and the D-CRX block number used to determine D-CRX-CNN (hereinafter "D-CRX-BN"). A blocking factor (hereinafter "D-CRX-BF") is used when determining D-CRX-BN. NID, DSN, DSN-CNN, D-CRX-CNN, D-CRX-BN, and D-CRX-BF are described in detail later.
 In the tables shown in FIGS. 2A and 2B, the value of D-CRX-BF used to determine D-CRX-BN is 7, and the number of storage destination nodes is 3, corresponding to the first to third slave nodes 15, 17, and 19. The first to third slave nodes 15, 17, and 19 are given the CNN identifiers a to c, respectively. That is, the following description assumes that the first slave node 15 corresponds to the storage destination node with CNN = a, the second slave node 17 to CNN = b, and the third slave node 19 to CNN = c. In the following description, the notation {i, j, k}, in which elements are enclosed in braces, denotes the set whose elements are i, j, and k.
 An element of the set of values to be managed (hereinafter "value set") is called a key value. An NID is a label for uniquely identifying a key value. The NID is assigned to each key value so as to take a unique value within the range of the data type of the key value relating to the registration request across the entire distributed database.
 In short, key values having the same value are assigned the same NID. This can be verified in the tables shown in FIGS. 2A and 2B. In the tuple with input row number 1 in FIG. 2A, the NID 2 is assigned to "Hokkaido-Tohoku", the key value entered in the region name column. Likewise, in the tuple with input row number 5 in FIG. 2A, the key value "Hokkaido-Tohoku" entered in the region name column is assigned the NID 2, just as in the tuple with input row number 1. In the tables of FIGS. 2A and 2B, different shading is applied to the cells of key values whose value appears again later and of key values whose value has appeared earlier, so that these and the remaining key values can be distinguished at a glance.
 The NID preferably takes natural number values. This is because, when data operations such as table join operations are performed using NIDs instead of key values, the processing load of the computation can be kept low and the computation can be sped up; the details of this reason are described later. The NID also preferably takes ordinal values, because a unique NID can then be issued for the key value of a registration request by an extremely simple procedure, namely incrementing the value of the latest shared NID. In the tables shown in FIGS. 2A and 2B, NID = 0 is defined as an invalid value, NID = 1 as the NULL value, and NID = 2 or greater as valid values.
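As a minimal sketch (not the patent's actual implementation), issuing ordinal NIDs by incrementing a shared counter can look like the following; the reserved values 0 (invalid) and 1 (NULL) follow FIGS. 2A and 2B, and the dictionary `assigned` is a hypothetical stand-in for the DSN lookup:

```python
NID_INVALID = 0   # reserved: invalid value
NID_NULL = 1      # reserved: NULL value

class NidAllocator:
    """Issues NIDs as ordinals so that identical key values get identical NIDs."""
    def __init__(self):
        self.latest_shared_nid = NID_NULL   # valid NIDs start at 2
        self.assigned = {}                  # (data_type, key_value) -> NID

    def allocate(self, data_type, key_value):
        if key_value is None:
            return NID_NULL
        key = (data_type, key_value)
        if key not in self.assigned:        # first occurrence of this key value
            self.latest_shared_nid += 1     # increment the latest shared NID
            self.assigned[key] = self.latest_shared_nid
        return self.assigned[key]

alloc = NidAllocator()
first = alloc.allocate("string", "Hokkaido-Tohoku")   # -> 2
second = alloc.allocate("string", "Kanto")            # -> 3
again = alloc.allocate("string", "Hokkaido-Tohoku")   # -> 2 (same key, same NID)
```

Re-registering "Hokkaido-Tohoku" returns the NID issued the first time, mirroring the behavior of input rows 1 and 5 in FIG. 2A.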
 As a distribution method for distributing the key value information (DSN, D-CRX, D-CRS, D-RIX) to the plurality of slave nodes 15, 17, and 19, a known method such as consistent hashing can be adopted. However, the method is not limited to consistent hashing as long as the key value information can be redistributed at sufficiently small cost when slave nodes (storage destination nodes) are added or removed.
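The patent does not prescribe a particular consistent-hashing implementation, but a minimal ring (with hypothetical helper names) illustrates the property relied on here: when a node is added, only the keys that fall into the new node's arc of the ring move, so redistribution cost stays small:

```python
import bisect
import hashlib

def _h(s):
    # deterministic hash of a string onto the ring
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # each node is placed at many virtual positions for balance
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def node_for(self, key):
        # walk clockwise from the key's hash to the first virtual node
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring3 = HashRing(["a", "b", "c"])
ring4 = HashRing(["a", "b", "c", "d"])   # one node added
keys = [f"key{i}" for i in range(1000)]
moved = sum(ring3.node_for(k) != ring4.node_for(k) for k in keys)
# roughly a quarter of the keys move to the new node; the rest stay put
```

With a naive `hash(key) % node_count` scheme, almost every key would change owner when the node count changes, which is exactly the cost consistent hashing avoids.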
(Example of the DSN)
 Next, an example of the DSN generated by the distributed RDB system 11 according to the embodiment of the present invention will be described. FIG. 3A shows index data as an example of the DSN. The DSN is an index in which NIDs are distributed and stored across the plurality of slave nodes 15, 17, and 19 by consistent hashing using the key value as the distribution key; it is referred to when obtaining the NID corresponding to a given key value.
 More specifically, the DSN is an index describing the correspondence between the key value of a registration request and the NID assigned to that key value. The DSN is stored separately for each data type of the key values. With the DSN index, the corresponding NID can be obtained with a key value as input.
 The DSN is generated according to the following rules, as shown in FIG. 3A.
(1) Within the entire distributed database, a common NID is given to the set of identical key values of the same data type.
(2) Key value/NID pairs are distributed and stored across the plurality of slave nodes 15, 17, and 19 (DSN-CNN values a to c) by consistent hashing using the key value as the distribution key.
(3) The management unit of the DSN is the distributed database.
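A DSN registration under these rules — choosing the storage node from the key value itself, so that identical key values always land on the same node — can be sketched as follows. This is a simplified stand-in (plain modulo hashing instead of consistent hashing, hypothetical names):

```python
import hashlib

NODES = ["a", "b", "c"]              # DSN-CNN identifiers
dsn = {n: {} for n in NODES}         # per-node DSN: (data_type, key_value) -> NID
next_nid = 2                         # NID 0 = invalid, 1 = NULL

def dsn_node(data_type, key_value):
    # the distribution key is the key value itself,
    # so equal key values always map to the same node
    digest = hashlib.md5(f"{data_type}:{key_value}".encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def register(data_type, key_value):
    global next_nid
    node = dsn_node(data_type, key_value)
    entries = dsn[node]
    if (data_type, key_value) not in entries:
        entries[(data_type, key_value)] = next_nid   # issue a fresh ordinal NID
        next_nid += 1
    return node, entries[(data_type, key_value)]

n1, nid1 = register("string", "Hokkaido-Tohoku")
n2, nid2 = register("string", "Hokkaido-Tohoku")   # re-registering the same key
# same key value -> same storage node and same NID
```

Because the node is a pure function of the key value, the lookup "key value → NID" never requires asking more than one slave node.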
(Example of the D-CRX)
 Next, an example of the D-CRX generated by the distributed RDB system 11 according to the embodiment of the present invention will be described. FIG. 3B shows an example of the D-CRX when the three columns region name, price, and order date are extracted. The D-CRX is an index in which NIDs are distributed and stored across the plurality of slave nodes 15, 17, and 19 by consistent hashing using a function of the NID (including the NID itself) as the distribution key; it is used to look up the NID corresponding to a key value during a search, or to convert an NID back into its key value.
 More specifically, the D-CRX is an index describing the one-to-one correspondence between the key value of a registration request and the NID assigned to that key value. The D-CRX is stored separately for each column to which the key values belong. With the D-CRX index, the key value corresponding to an NID, and conversely the NID corresponding to a key value, can be obtained. The difference between the DSN and the D-CRX is that the DSN converts a key value into an NID in one direction only, whereas the D-CRX converts between key value and NID in both directions. Furthermore, with the D-CRX index, the NID set corresponding to a value range (start value and end value) of key values can be obtained (value range search). This value range search is described later.
 The D-CRX is generated according to the following rules, as shown in FIG. 3B.
(1) A one-to-one correspondence between key values and NIDs is given, with the same column of the same table in the distributed database as the management unit.
(2) The block number (D-CRX-BN) is the quotient of the NID divided by the blocking factor (D-CRX-BF). From this formula, D-CRX-BN is a function of the NID.
(3) D-CRX-BF is a constant taking an arbitrary positive integer value (7 in this example). D-CRX-BN therefore takes integer values.
(4) NID/key value pairs (one-to-one correspondence) are distributed and stored across the plurality of slave nodes 15, 17, and 19 (storage destination nodes; D-CRX-CNN values a to c) by consistent hashing using D-CRX-BN (a function of the NID) as the distribution key.
(5) The management unit of the D-CRX is the column.
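The block-number rule can be illustrated numerically. With D-CRX-BF = 7 as in the example, NIDs 0 to 6 fall into block 0, NIDs 7 to 13 into block 1, and so on, so runs of consecutive NIDs share a storage node. The helper names below are hypothetical, and plain modulo over the block number stands in for consistent hashing:

```python
D_CRX_BF = 7             # blocking factor (constant, positive integer)
NODES = ["a", "b", "c"]  # D-CRX-CNN identifiers

def d_crx_bn(nid):
    # block number = quotient of the NID divided by the blocking factor
    return nid // D_CRX_BF

def d_crx_cnn(nid):
    # storage node chosen from the block number
    # (modulo used here as a stand-in for consistent hashing)
    return NODES[d_crx_bn(nid) % len(NODES)]

# NIDs 7..13 all belong to block 1 and therefore land on the same node,
# which keeps value-range scans over consecutive NIDs on few nodes
same_node = {d_crx_cnn(nid) for nid in range(7, 14)}
```

The D-CRS and D-RIX rules below follow the same quotient-of-identifier-by-blocking-factor pattern, with the RID and the NID as the respective numerators.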
(Example of the D-CRS)
 Next, an example of the D-CRS generated by the distributed RDB system 11 according to the embodiment of the present invention will be described. FIG. 3C shows an example of the D-CRS when the three columns region name, price, and order date are extracted. The D-CRS is an index in which NIDs are distributed and stored across the plurality of slave nodes 15, 17, and 19 by consistent hashing using a function of the RID (including the RID itself) as the distribution key; it is used when creating the RID set of a search result or when creating the tuples of a join result.
 More specifically, the D-CRS is an index describing the correspondence between the NID and the RID, which takes a unique value for each column in the tables constituting the distributed database. The D-CRS is stored separately for each column to which the key values belong. In the D-CRS, the correspondence between RID and NID is described one-to-one, in a data structure that permits duplicate occurrences of an NID within the same column. With the D-CRS, the NID corresponding to an RID can be obtained. The RID set (denoted {RID}) corresponding to an NID set (denoted {NID}) can also be obtained: the RID set is produced by repeating, once per element of the NID set, a full scan that collates the searched NID against all NIDs belonging to the column.
 The D-CRS is generated according to the following rules, as shown in FIG. 3C.
(1) The NIDs corresponding to the RIDs are stored per column of the distributed database. This is called a column-unit NID array.
(2) The block number (D-CRS-BN) is the quotient of the RID divided by the blocking factor (D-CRS-BF). From this formula, D-CRS-BN is a function of the RID.
(3) D-CRS-BF is a constant taking an arbitrary positive integer value (7 in this example). D-CRS-BN therefore takes integer values.
(4) The column-unit NID array is distributed and stored across the plurality of slave nodes 15, 17, and 19 (storage destination nodes for D-CRS data; D-CRS-CNN values a to c) by consistent hashing using D-CRS-BN (a function of the RID) as the distribution key.
(5) The management unit of the D-CRS is the column.
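The full-scan collation described above — obtaining {RID} from {NID} by checking every entry of the column-unit NID array — can be sketched as follows. The sample values are illustrative only (RIDs start at 1 to match the input row numbers; duplicate NIDs within the column are permitted):

```python
# column-unit NID array for one column: maps RID -> NID
# (NID 2 appears twice, which the D-CRS data structure allows)
d_crs_region = {1: 2, 2: 3, 3: 4, 4: 3, 5: 2}

def nid_for_rid(rid):
    # direct lookup: RID in, NID out
    return d_crs_region[rid]

def rids_for_nids(nid_set):
    # full scan: collate every stored NID against the searched NID set
    return {rid for rid, nid in d_crs_region.items() if nid in nid_set}

result = rids_for_nids({2})   # all rows whose column value has NID 2
```

This full scan is what the D-RIX, described next, is designed to avoid during table join operations.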
(Example of the D-RIX)
 Next, an example of the D-RIX generated by the distributed RDB system 11 according to the embodiment of the present invention will be described. FIG. 3D shows an example of the D-RIX when the three columns region name, price, and order date are extracted. The D-RIX is an index in which NIDs are distributed and stored across the plurality of slave nodes 15, 17, and 19 by consistent hashing using a function of the NID (including the NID itself) as the distribution key; it is used to look up the RIDs corresponding to a key value searched during a table join operation.
 More specifically, the D-RIX is an index describing the one-to-N correspondence between an NID and an RID set. The D-RIX is stored separately for each column to which the key values belong. The difference between the D-CRS and the D-RIX is that duplicate occurrences of an NID can arise within the same column in the D-CRS, whereas they cannot in the D-RIX. This difference arises because the D-CRS uses the RID as the distribution key, whereas the D-RIX uses the NID. With the D-RIX index, the RID set corresponding to an NID, and the RID set corresponding to an NID set, can be obtained. Furthermore, by performing data operations such as table join operations using the D-RIX, data movement between the plurality of slave nodes 15, 17, and 19 (storage destination nodes; DSN-CNN values a to c) can be avoided, as can full scans during searches in table join operations. The reasons are described later.
As shown in FIG. 3D, the D-RIX is generated according to the following rules.
(1) The correspondence between an NID and a set of RIDs is given with the same column of the same table in the distributed database as the management unit.
(2) The block number (D-RIX-BN) is the quotient of the NID divided by the blocking factor (D-RIX-BF). From this formula, D-RIX-BN is a function of the NID.
(3) D-RIX-BF is a constant that takes an arbitrary positive integer value ("7" in this example). Accordingly, D-RIX-BN takes a positive integer value.
(4) Pairs of an NID and its RID set (a one-to-N correspondence) are distributed and stored across the plurality of slave nodes 15, 17, 19 (storage destination nodes; DSN-CNN values a to c) by a consistent hash method using D-RIX-BN (a function of the NID) as the distribution key.
(5) The management unit of the D-RIX is the column.
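A minimal sketch of rules (1) to (5), assuming the blocking factor of 7 from the example and three storage nodes named a, b, c as in the figure. The hash placement below is illustrative only; the embodiment's actual consistent hash ring is not specified here:

```python
import hashlib

D_RIX_BF = 7  # blocking factor (rule 3); any positive integer

def d_rix_bn(nid: int) -> int:
    """Block number: quotient of the NID divided by the blocking factor (rule 2)."""
    return nid // D_RIX_BF

def storage_node(block_number: int, nodes=("a", "b", "c")) -> str:
    """Pick a storage node from the block number (rule 4). A real consistent
    hash ring places hashed node tokens and keys on the same circle; a stable
    hash reduced modulo the node count stands in for it in this sketch."""
    digest = hashlib.md5(str(block_number).encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# One D-RIX entry: NID 15 of a column maps to the set of rows holding that value.
nid, rid_set = 15, {3, 8, 21}
node = storage_node(d_rix_bn(nid))   # all entries of block 2 land on one node
assert d_rix_bn(nid) == 2
assert node in ("a", "b", "c")
```

Because the block number, not the raw NID, is the distribution key, consecutive NIDs in the same block are co-located on the same slave node.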
(Internal configuration of the master node and the first slave node)
Next, the internal configurations of the master node 13 and the first slave node 15, which play key roles in the distributed RDB system 11 according to this embodiment, will be described. FIG. 4 is a functional block diagram showing the internal configurations of the master node 13 and the first slave node 15.
First, the internal configuration of the master node 13 will be described. The master node 13 comprises a master reception unit 31, an NID allocation unit 33, an index generation unit 35, a node determination unit 37, a distributed overall management unit 39 having a request issuing unit 41 and a processing result integration unit 43, and an update management unit 45.
The master reception unit 31, which corresponds to the registration request reception unit, receives the key value of a registration request together with its data type information. In practice, a key value registration request is generally input to the master reception unit 31 in units of tuples, in which a key value and its data type information are associated with each of a plurality of columns. However, a key value registration request may also be input in the form of table data consisting of a set of tuples. In either case, the master reception unit 31, having received input data in tuple units, processes it sequentially with the key value associated with one of the columns contained in the tuple (hereinafter sometimes called the "processing target key value") and its data type information as the minimum unit. Accordingly, the following description assumes that the master reception unit 31 has received one such minimum unit, namely a processing target key value and its data type information.
The master reception unit 31, having received input data in tuple units, assigns to all key values belonging to the tuple a common RID that takes a value unique per column within the table. Like the NID, the RID preferably takes natural-number, ordinal values, because a unique RID can then be assigned for each column in the table by the very simple procedure of incrementing the latest RID value.
The master reception unit 31 accepts a key value registration request (including its data type information) issued by any one of the plurality of client terminals 25a, 25b, 25c, or a data operation request such as a table join operation. The master reception unit 31 also accepts the existence confirmation result (described later) sent from a slave node. When a key value registration request occurs, the master reception unit 31 passes the request to the NID allocation unit 33; when a data operation request occurs, it passes the request to the node determination unit 37.
When a key value registration request occurs, the NID allocation unit 33 refers to the latest shared NID in the DB management data stored in the master data storage unit 13a and assigns a unique NID to the key value of the registration request. When the master reception unit 31 receives an existence confirmation result (information indicating whether the key value of the registration request already exists at its storage destination node), the NID allocation unit 33 generates, based on that result, a latest-shared-NID update control signal indicating whether the latest shared NID should be updated, and sends it to the update management unit 45.
When the NID allocation unit 33 assigns an NID to the key value of a registration request, the index generation unit 35 generates DSN, D-CRX, D-CRS, and D-RIX data based on the key value of the registration request, its data type, the NID assigned to it, and the RID assigned by the master node 13. Here, DSN data means at least the minimum unit of data constituting the DSN (a pair of one key value and one NID). Similarly, D-CRX data means at least the minimum unit of data constituting the D-CRX (a pair of one key value and one NID); D-CRS data means at least the minimum unit of data constituting the D-CRS (a pair of one RID and one NID); and D-RIX data means at least the minimum unit of data constituting the D-RIX (a pair of one NID and one set of RIDs). The four types of index data generated in this way are referred to, for example, during the node determination described next and when a data operation request such as a table join occurs. The index generation unit 35 corresponds to the DSN generation unit, the D-CRX generation unit, the D-CRS generation unit, and the D-RIX generation unit.
When a key value registration request occurs, the node determination unit 37 determines the slave node that will serve as the distributed storage destination of the DSN, D-CRX, D-CRS, and D-RIX data generated by the index generation unit 35, by a consistent hash method using one of the key value, a function of the NID, or a function of the RID as the distribution key. The node determination unit 37 corresponds to the DSN storage node determination unit, the D-CRX storage node determination unit, the D-CRS storage node determination unit, and the D-RIX storage node determination unit.
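As a non-authoritative sketch of this per-index choice of distribution key, the dispatch below hashes the key value itself for the DSN, a block number derived from the NID for the D-CRX and D-RIX, and a function of the RID for the D-CRS. The ring implementation, the key-derivation prefixes, and the blocking factors are assumptions for illustration only:

```python
import hashlib

NODES = ("a", "b", "c")  # DSN-CNN values of the three slave nodes

def ring_node(distribution_key: str) -> str:
    """Stand-in for a consistent hash ring: hash the key, map it to a node."""
    digest = hashlib.sha1(distribution_key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def determine_node(index_type: str, key_value=None, nid=None, rid=None,
                   d_crx_bf=7, d_rix_bf=7) -> str:
    """DSN hashes the key value itself; D-CRX/D-RIX hash a function of the
    NID (their block numbers); D-CRS hashes a function of the RID."""
    if index_type == "DSN":
        return ring_node(str(key_value))
    if index_type == "D-CRX":
        return ring_node(f"crx:{nid // d_crx_bf}")
    if index_type == "D-RIX":
        return ring_node(f"rix:{nid // d_rix_bf}")
    if index_type == "D-CRS":
        return ring_node(f"crs:{rid}")
    raise ValueError(index_type)

# The same input always lands on the same node, which is what lets every
# node recompute a datum's location without a central lookup table.
assert determine_node("DSN", key_value="Tokyo") in NODES
```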
When a data operation request such as a table join operation occurs, the node determination unit 37 designates all of the first to third slave nodes 15, 17, 19 under the control of the master node 13 as the nodes that process the data operation request in a distributed manner. In response, the first to third slave nodes 15, 17, 19 execute the given data operation request in parallel. The procedure by which the first to third slave nodes 15, 17, 19 process a data operation request in a distributed manner will be described in detail later.
When a key value registration request occurs, the request issuing unit 41 belonging to the distributed overall management unit 39 sends the DSN, D-CRX, D-CRS, and D-RIX data to the slave node determined by the node determination unit 37, from among the first to third slave nodes 15, 17, 19 under the control of the master node 13, thereby issuing data registration requests. When a data operation request occurs, the request issuing unit 41 issues a processing request to the nodes determined by the node determination unit 37 among the first to third slave nodes 15, 17, 19. The request issuing unit 41 corresponds to the DSN registration request issuing unit, the D-CRX registration request issuing unit, the D-CRS registration request issuing unit, and the D-RIX registration request issuing unit.
The processing result integration unit 43 belonging to the distributed overall management unit 39 receives the data operation results processed in a distributed manner by each of the first to third slave nodes 15, 17, 19 and integrates these processing results.
The update management unit 45 controls the updating of the latest shared NID in the DB management data in accordance with the latest-shared-NID update control signal sent from the NID allocation unit 33. Specifically, when an existence confirmation result indicates that the key value of the registration request does not yet exist in the DSN storage unit of its storage destination node, the update management unit 45 updates the latest shared NID to the next shared NID.
Next, the internal configuration of the first slave node 15 will be described. The first slave node 15 comprises a first reception unit 51, an existence determination unit 53, a registration management unit 55, a first distributed processing unit 57, and a first response unit 59.
The first reception unit 51 accepts a registration request for DSN, D-CRX, D-CRS, and D-RIX data (hereinafter "index data") sent from the request issuing unit 41 of the master node 13, or a data operation request such as a table join operation. When an index data registration request occurs, the first reception unit 51 passes the request to the existence determination unit 53; when a data operation request occurs, it passes the request to the first distributed processing unit 57.
When an index data registration request occurs, the existence determination unit 53 refers to the first DSN in the first local data storage unit 15a, checks whether data with the same value as the processing target key value contained in the DSN data of the registration request already exists in the first DSN, and sends the existence confirmation result to the first response unit 59. Based on the existence confirmation result, the existence determination unit 53 also sends the registration management unit 55 a registration command for DSN data consisting of the processing target key value of the registration request combined with an NID unique to that key value.
In accordance with the registration command sent from the existence determination unit 53, the registration management unit 55 performs registration management that additionally stores the DSN data of the registration request in the first local data storage unit 15a (corresponding to the DSN storage unit). In accordance with a registration command sent from the master node 13, the registration management unit 55 also performs registration management that stores the D-CRX, D-CRS, and D-RIX data of the registration request in the first local data storage unit 15a (corresponding to the D-CRX storage unit, the D-CRS storage unit, and the D-RIX storage unit). This completes the registration of all index data of the registration request. The registration management unit 55 corresponds to the DSN registration management unit, the D-CRX registration management unit, the D-CRS registration management unit, and the D-RIX registration management unit.
When a data operation request related to its own node occurs, the first distributed processing unit 57, which corresponds to the data operation execution unit, refers as needed to the first DSN, the first D-CRX, the first D-CRS, and the first D-RIX stored in the first local data storage unit 15a and executes the distributed processing for the request in parallel with the other nodes. The first distributed processing unit 57 passes the obtained distributed processing result to the first response unit 59.
The first response unit 59 returns the existence confirmation result sent from the existence determination unit 53 to the master reception unit 31 of the master node 13. In the master node 13, as described above, the master reception unit 31 passes the received existence confirmation result to the NID allocation unit 33. The NID allocation unit 33 generates a latest-shared-NID update control signal based on the existence confirmation result and sends it to the update management unit 45. The update management unit 45 controls the updating of the latest shared NID in the DB management data in accordance with that signal. The first response unit 59 also returns the distributed processing result sent from the first distributed processing unit 57 to the processing result integration unit 43 of the master node 13.
(Cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a key value registration request occurs)
Next, the cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a key value registration request occurs will be described. First, the cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a DSN data registration request occurs will be described with reference to FIG. 5A. FIG. 5A is a flow diagram showing the cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a DSN data registration request based on a key value registration request occurs. As described above, a key value registration request is generally input to the master reception unit 31 in tuple units; the master reception unit 31, having received tuple-unit input data, proceeds with the key value associated with one of the columns contained in the tuple and its data type information as the processing unit.
In step S11, the master reception unit 31 of the master node 13 accepts a key value registration request issued by any one of the plurality of client terminals 25a, 25b, 25c and passes the request to the NID allocation unit 33.
In step S12, the NID allocation unit 33, having received the key value registration request, refers to the latest shared NID in the DB management data stored in the master data storage unit 13a and assigns to the key value of the registration request an appropriate next shared NID (for example, the value of the latest shared NID incremented by "1"). The information on the next shared NID assigned by the NID allocation unit 33 to the key value of the registration request is sent to the index generation unit 35.
In step S13, the index generation unit 35 of the master node 13 generates DSN data based on the key value of the registration request, its data type, and the next shared NID assigned to that key value by the NID allocation unit 33.
In step S14, the node determination unit 37 of the master node 13 determines the slave node that will serve as the distributed storage destination of the DSN data generated by the index generation unit 35, by a consistent hash method using the key value as the distribution key, and sends the decision to the distributed overall management unit 39. Here, assume that the node with CNN = x (where x is one of a to c) is determined as the slave node for distributed storage of the DSN, and that the node with CNN = x is the first slave node 15 in the description that follows.
In step S15, the request issuing unit 41 belonging to the distributed overall management unit 39 of the master node 13 sends the DSN data generated in step S13 to the first slave node 15 with CNN = x determined by the node determination unit 37, from among the first to third slave nodes 15, 17, 19 under the control of the master node 13, thereby issuing a data registration request.
Here, although the description of the processing flow in the master node 13 is not yet complete, for the convenience of smoothly explaining the cooperative operation between the master node 13 and the first slave node 15, the processing in the first slave node 15 with CNN = x is described next. In step S21, the first reception unit 51 of the first slave node 15 with CNN = x accepts the DSN data registration request sent from the request issuing unit 41 of the master node 13 and passes the request to the existence determination unit 53.
In step S22, the existence determination unit 53 of the first slave node 15 with CNN = x refers to the first DSN in the first local data storage unit 15a and checks whether data with the same value as the processing target key value contained in the DSN data of the registration request already exists in the first DSN. Based on this existence confirmation result, in step S23 it determines whether the processing target key value of the registration request is already registered. The existence determination unit 53 then, based on the determination result, issues to the registration management unit 55 a registration command for DSN data consisting of the processing target key value of the registration request combined with an NID unique to that key value.
If the determination in step S23 finds that the processing target key value of the registration request is already registered, then in step S24 the registration management unit 55 does not follow the registration command sent from the existence determination unit 53 and instead keeps, as it is, the existing DSN data on the correspondence between the processing target key value of the registration request and the already registered NID. This guarantees that a unique NID is assigned to each distinct key value. In this case, the registration management unit 55 cancels the DSN data registration request, because the DSN data on the correspondence between the processing target key value of the registration request and the registered NID is already registered and there is no need to register it again.
If, on the other hand, the determination in step S23 finds that the processing target key value of the registration request is an unregistered value, then in step S25 the registration management unit 55 follows the registration command sent from the existence determination unit 53 and additionally stores, in the first local data storage unit 15a, the DSN data in which the next shared NID is assigned as appropriate to the processing target key value of the registration request. Here, additionally storing the DSN data to which the next shared NID is assigned means appending that DSN data without rewriting the DSN data that has already been accumulated.
In step S26, after the processing of step S24 or S25, the first response unit 59 of the first slave node 15 returns the NID actually assigned to the processing target key value of the registration request, together with the existence confirmation result, to the master reception unit 31 of the master node 13, ending this series of processing.
The description now returns to the processing flow in the master node 13. In step S16, the master reception unit 31 of the master node 13 receives the existence confirmation result sent from the first slave node 15 and the NID actually assigned to the processing target key value of the registration request, and passes them to the NID allocation unit 33. In step S17, the NID allocation unit 33 determines whether the processing target key value of the registration request is already registered.
If the determination in step S17 finds that the processing target key value of the registration request is already registered, then in step S18 the NID allocation unit 33, having received the existence confirmation result that the processing target key value of the registration request already exists at its storage destination, the first slave node 15, generates a control signal prohibiting the update of the latest shared NID and sends it to the update management unit 45. In accordance with this latest-shared-NID update control signal, the update management unit 45 prohibits the update of the latest shared NID. As a result, the next shared NID assigned to the processing target key value of the registration request in step S12 is canceled, and the value of the latest shared NID is maintained without being updated.
If, on the other hand, the determination in step S17 finds that the key value of the registration request is an unregistered value, then in step S19 the NID allocation unit 33, having received the existence confirmation result that the processing target key value of the registration request does not yet exist at its storage destination, the first slave node 15, generates a control signal directing an update of the latest shared NID and sends it to the update management unit 45. In accordance with this latest-shared-NID update control signal, the update management unit 45 updates the value of the latest shared NID to the value of the next shared NID assigned to the key value of the registration request in step S12. After this update, the NID allocation unit 33 advances the processing flow to step S31 in FIG. 5B.
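Steps S11 to S19 on the master and S21 to S26 on the slave can be condensed into one hedged sketch. The class and method names are invented for illustration, and the network exchange between the nodes is elided into direct method calls:

```python
class SlaveNode:
    """Holds the local DSN fragment: key value -> NID (append-only)."""
    def __init__(self):
        self.dsn = {}

    def register(self, key_value, candidate_nid):
        # S22-S25: existence check, then keep the existing entry or append.
        if key_value in self.dsn:
            return True, self.dsn[key_value]     # already registered: reuse NID
        self.dsn[key_value] = candidate_nid      # append without rewriting
        return False, candidate_nid              # S26: report back to the master

class MasterNode:
    def __init__(self, slave):
        self.latest_shared_nid = 0
        self.slave = slave

    def register_key_value(self, key_value):
        next_nid = self.latest_shared_nid + 1            # S12: tentative next NID
        existed, actual_nid = self.slave.register(key_value, next_nid)
        if not existed:
            self.latest_shared_nid = next_nid            # S19: commit the update
        # S18: if it already existed, next_nid is simply discarded (no update)
        return actual_nid

master = MasterNode(SlaveNode())
assert master.register_key_value("Tokyo") == 1
assert master.register_key_value("Osaka") == 2
assert master.register_key_value("Tokyo") == 1   # duplicate reuses NID 1
assert master.latest_shared_nid == 2             # no gap left by the duplicate
```

The sketch shows why the tentative-then-confirm protocol keeps NIDs both unique per key value and gap-free: the counter only advances once the slave confirms the key value is new.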
Next, the cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a D-CRX/D-RIX data registration request occurs after the DSN data registration has been completed will be described with reference to FIG. 5B. FIG. 5B is a flow diagram showing the cooperative operation of the master node 13 and the slave nodes 15, 17, 19 when a D-CRX/D-RIX data registration request based on a key value registration request occurs.
In step S31, the NID allocation unit 33 of the master node 13 refers to the NID actually assigned to the processing target key value of the registration request and computes the block numbers (D-CRX-BN and D-RIX-BN), which are functions of that NID. Specifically, it computes "D-CRX-BN = the quotient of the NID divided by D-CRX-BF" and "D-RIX-BN = the quotient of the NID divided by D-RIX-BF".
In step S32, the index generation unit 35 of the master node 13 generates D-CRX data based on the processing target key value of the registration request, the NID actually assigned to that key value, and the name of the column to which the processing target key value of the registration request belongs.
In step S33, the index generation unit 35 of the master node 13 generates D-RIX data based on the NID actually assigned to the processing target key value of the registration request, the RID set corresponding to that NID, and the name of the column to which the processing target key value of the registration request belongs.
In step S34, the node determination unit 37 of the master node 13 determines the slave node that will serve as the distributed storage destination of the D-CRX and D-RIX data generated by the index generation unit 35, by a consistent hash method using the block numbers (D-CRX-BN and D-RIX-BN) obtained in step S31, which are functions of the NID, as the distribution keys, and sends the decision to the distributed overall management unit 39. Here, assume that the node with CNN = y (where y is one of a to c) is determined as the slave node for distributed storage of the D-CRX and D-RIX data, and that the node with CNN = y is the first slave node 15 in the description that follows.
 In step S35, the request issuing unit 41, belonging to the distributed overall management unit 39 of the master node 13, sends the D-CRX data generated in step S32 and the D-RIX data generated in step S33 to the first slave node 15 with CNN = y, determined by the node determination unit 37 from among the first to third slave nodes 15, 17, and 19 under the master node's control, and issues a data registration request. After issuing this data registration request, the distributed overall management unit 39 of the master node 13 advances the processing flow to step S51 in FIG. 5C.
 Although the description of the processing flow in the master node 13 is not yet complete, for the sake of smoothly explaining the cooperative operation between the master node 13 and the first slave node 15, the processing in the first slave node 15 with CNN = y is described next. In step S41, the first receiving unit 51 of the first slave node 15 with CNN = y accepts the registration request for the D-CRX and D-RIX data sent from the request issuing unit 41 of the master node 13 and passes the request to the registration management unit 55 via the existence determination unit 53.
 In steps S42 and S43, the registration management unit 55 of the first slave node 15 with CNN = y, in response to the registration request sent from the request issuing unit 41 of the master node 13, stores the D-CRX and D-RIX data, separated by column, in the first local data storage unit 15a. After storing the D-CRX and D-RIX data in steps S42 and S43, the registration management unit 55 of the first slave node 15 ends this sequence of processing.
 Next, the cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when a D-CRS data registration request arises after the D-CRX and D-RIX data registration has been completed is described with reference to FIG. 5C. FIG. 5C is a flowchart showing the cooperative operation between the master node 13 and the slave nodes 15, 17, and 19 when a D-CRS data registration request based on a key value registration request arises.
 In step S51, the NID allocation unit 33 of the master node 13 refers to the RID corresponding to the NID actually assigned to the key value to be processed under the registration request, and computes the block number (D-CRS-BN), which is a function of that RID. Specifically, D-CRS-BN is computed as the quotient of the RID divided by D-CRS-BF.
 In step S52, the index generation unit 35 of the master node 13 generates D-CRS data based on the NID actually assigned to the key value to be processed under the registration request, the RID to which that key value belongs, and the name of the column to which that key value belongs.
 In step S53, the node determination unit 37 of the master node 13 determines the slave node that will serve as the distributed storage destination for the D-CRS data generated by the index generation unit 35, using the consistent hash method with the block number (D-CRS-BN) obtained in step S51, a function of the RID, as the distribution key, and sends the result to the distributed overall management unit 39. Here, assume that the node with CNN = z (where z is one of a to c) is determined as the slave node serving as the distributed storage destination for the D-CRS data, and that the node with CNN = z is the first slave node 15; the description proceeds on this assumption.
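 The block number computation of step S51 and the consistent-hash node determination of step S53 can be sketched as follows. The quotient formula for D-CRS-BN follows the text; the hash function (MD5), the number of virtual nodes per physical node, and the blocking factor value are assumptions made for illustration, not details fixed by the embodiment.

```python
import bisect
import hashlib

# Hypothetical blocking factor: D-CRS rows are grouped into blocks of this size.
D_CRS_BF = 4

def d_crs_bn(rid: int) -> int:
    """Step S51: the block number is the quotient of the RID divided by D-CRS-BF."""
    return rid // D_CRS_BF

class ConsistentHashRing:
    """Toy consistent-hash ring mapping a distribution key to a node name (CNN = a to c)."""
    def __init__(self, nodes, replicas=16):
        self.ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            for i in range(replicas):  # virtual nodes smooth the distribution
                h = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's hash to the first virtual node.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["a", "b", "c"])
rid = 9
bn = d_crs_bn(rid)                   # step S51: block number derived from the RID
cnn = ring.node_for(f"D-CRS:{bn}")   # step S53: storage node chosen by consistent hashing
```

Because the node is chosen from the block number rather than from the raw RID, all rows of the same block land on the same slave node, and the mapping is stable for a fixed node set.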
 In step S54, the request issuing unit 41, belonging to the distributed overall management unit 39 of the master node 13, sends the D-CRS data generated in step S52 to the first slave node 15 with CNN = z, determined by the node determination unit 37 from among the first to third slave nodes 15, 17, and 19 under the master node's control, and issues a data registration request. After issuing this data registration request, the distributed overall management unit 39 of the master node 13 ends this sequence of processing.
 Next, in step S61, the first receiving unit 51 of the first slave node 15 with CNN = z accepts the registration request for the D-CRS data sent from the request issuing unit 41 of the master node 13 and passes the request to the registration management unit 55 via the existence determination unit 53.
 In step S62, the registration management unit 55 of the first slave node 15 with CNN = z, in response to the registration request sent from the request issuing unit 41 of the master node 13, stores the D-CRS data, separated by column, in the first local data storage unit 15a. After storing the D-CRS data in step S62, the registration management unit 55 of the first slave node 15 ends this sequence of processing.
 The four types of index data registered as described above demonstrate their power when the first to third slave nodes 15, 17, and 19 execute processing such as table join operations in parallel on a distributed RDB consisting of an enormous amount of data. In particular, when the distributed RDB system 11 according to the present embodiment provides a distributed RDB service via, for example, the WWW (World Wide Web), it can achieve linear scale-out: even when the number of nodes in the system is increased to respond flexibly to a sudden surge in demand, and data operations such as table joins are then executed on the data distributed and stored across the enlarged set of nodes, the processing capacity of the system as a whole improves linearly between before and after the expansion. The following uses distributed query processing as an example to explain how introducing the four types of index data makes it possible to improve the processing capacity of the system as a whole efficiently and to achieve this linear scale-out.
 FIG. 6 is a process diagram showing the flow of distributed query processing. The distributed query processing shown in FIG. 6 consists of the distributed search processing in step S71, the distributed table join processing in step S72, the distributed result tuple creation processing for aggregation in step S73, and so on.
 The distributed search processing in step S71, the distributed table join processing in step S72, and the distributed result tuple creation processing for aggregation in step S73 can each be executed in parallel by the first to third slave nodes 15, 17, and 19. However, a step that operates on the results of the upstream phase, such as step S72, cannot begin its processing until the upstream step (step S71) has completed on all nodes.
 Before describing the flow of distributed query processing, the terms used in the description are defined here.
 A search expression consists of search terms, logical operators, and parentheses that control the precedence of operations; any combination of these constitutes a search expression.
 A search term consists of a left-hand term, a comparison operator, and a right-hand term. The left-hand term is a column name or a literal (actual value), and so is the right-hand term. The comparison operators are equal "=", not equal "≠", greater than ">", greater than or equal "≧", less than "<", and less than or equal "≦".
 The logical operators are AND "&", OR "|", and NOT "¬". AND performs an intersection (product) operation, OR a union (sum) operation, and NOT a negation operation. The parentheses are the opening parenthesis "(" and the closing parenthesis ")".
 In a search, all slave nodes first search the D-CRX using the key value itself as the search key, then search the D-CRS using the NID set extracted by that search as the search key, and obtain an RID set as the search result. In a range search (for example, a search on numeric data that extracts the key values belonging to a range specified by start and end values), all slave nodes search the D-CRX with the specified range of key values, then search the D-CRS using the NID set extracted by that search as the search key, and obtain an RID set as the search result. In a partial match search (for example, a search on string data that extracts the key values containing a specified character string at least in part), all slave nodes search the D-CRX with the specified character string of the key value, then search the D-CRS using the NID set extracted by that search as the search key, and obtain an RID set as the search result.
 A table join means a join operation between an outer table and an inner table. The outer table is the table that serves as the basis of the join; the inner table is the table joined to the outer table. The outer table and the inner table are joined by the values of the join column. The join column is a column that exists in both the outer table and the inner table and serves to join the two tables through that column. The join column of the outer table is called the outer table foreign key column, and the join column of the inner table is called the inner table primary key column. Multiple tables can be joined by repeating the join operation.
 Next, the content of the distributed search processing in step S71 shown in FIG. 6 is described with concrete examples, referring to FIGS. 2A and 2B. As Example 1, an exact match search on a single term, consider extracting as an RID set the key values whose "region name" is "Kanto". In Example 1, the distributed processing units of the first to third slave nodes 15, 17, and 19, having received the search request from the master node 13, first extract the search terms from the search expression and obtain the search term set {region name = "Kanto"}. Then, targeting the D-CRX (CNN = a to c) whose column name is "region name", the distributed processing units of the first to third slave nodes 15, 17, and 19 simultaneously execute a search using the element {region name = "Kanto"} of the search term set. This search yields the NID set = {6} from the element {region name = "Kanto"}.
 In this simultaneous search over the D-CRX (CNN = a to c), the search can be stopped as soon as a key value matching the search condition is hit, without requiring a full-scan comparison. This is because the D-CRX adopts a data structure that does not allow duplicate key values within a given column. The D-CRX data structure therefore contributes to shortening the search time (the same applies hereinafter).
 Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 perform a full-scan comparison of the D-CRS (CNN = a to c) whose column name is "region name", element by element, against the NID set = {6}, thereby obtaining the RID set that matches the values in the NID set. By performing this processing simultaneously on the distributed processing units of the first to third slave nodes 15, 17, and 19, the RID set = {2, 3, 7, 9} is obtained. This RID set = {2, 3, 7, 9} is the answer sought.
 Although the value search of an NID set over the D-CRS requires a full-scan comparison, it can be completed in a relatively short time. This is because the preceding search using the D-CRX replaces the actual key values with NIDs (for example, natural numbers), and the full-scan comparison is performed using the substituted NID values. In the distributed processing units of the first to third slave nodes 15, 17, and 19, an NID is represented as a fixed-width binary integer value, which is far more efficient to search and reference than an actual key value. Data retrieval combining the D-CRX and the D-CRS therefore contributes to shortening the search time (the same applies hereinafter).
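 The two-phase search of Example 1 can be sketched with toy in-memory structures. The single-node dictionaries below are hypothetical; the real D-CRX and D-CRS are columnar and distributed across the slave nodes, and the NID values other than 6 are assumed for illustration.

```python
# Toy per-column indexes for the "region name" column.
# D-CRX: key value -> NID (no duplicate key values within a column).
d_crx_region = {"Hokkaido": 2, "Kanto": 6, "Kansai": 25}
# D-CRS: RID -> NID for each row of the column.
d_crs_region = {1: 2, 2: 6, 3: 6, 5: 25, 7: 6, 9: 6}

def exact_match_search(key_value):
    # Phase 1: D-CRX lookup; no full scan is needed because keys are unique.
    nid = d_crx_region.get(key_value)
    nid_set = {nid} if nid is not None else set()
    # Phase 2: full scan of the D-CRS, comparing fixed-width NIDs
    # rather than raw key values.
    return {rid for rid, n in d_crs_region.items() if n in nid_set}

rids = exact_match_search("Kanto")
```

Because the D-CRX holds each key value at most once per column, phase 1 is a direct lookup; only phase 2 scans, and it compares NIDs, not key values.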
 As an exact match search with multiple terms (a combination of single terms), consider Example 2, which extracts as an RID set the key values whose "region name" is "Kanto" or whose "region name" is "Kansai". In Example 2, an exact match search on each single term is executed by the procedure of Example 1 using the respective search conditions, and the target RID set is obtained by applying a logical OR operation to the resulting RID sets.
 As Example 3, a range search on a single term, consider extracting the RID set whose "price" is at least 500000 and at most 800000. In Example 3, the distributed processing units of the first to third slave nodes 15, 17, and 19, having received the search request from the master node 13, first extract the search terms from the search expression and obtain the search term set {[price ≧ 500000, price ≦ 800000]}. Then, targeting the D-CRX (CNN = a to c) whose column name is "price", the distributed processing units of the first to third slave nodes 15, 17, and 19 simultaneously execute a search using the element {[price ≧ 500000, price ≦ 800000]} of the search term set. This search yields the NID set = {5, 8, 11, 14, 22, 30} from the first element {[price ≧ 500000]} and the NID set = {2, 8, 11, 17, 22, 30} from the second element {[price ≦ 800000]}. Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 apply the NID set of each element to the search expression and take the logical AND of the NID sets, obtaining the NID set = {8, 11, 22, 30} as the search result. Then, a full-scan comparison of the D-CRS (CNN = a to c) whose column name is "price", element by element, against the NID set = {8, 11, 22, 30} yields the RID set that matches the values in that NID set.
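 Under the same toy assumptions, the range search can be sketched as follows. The price values and the D-CRS contents are hypothetical, chosen only so that the per-term NID sets reproduce those quoted in Example 3.

```python
# Toy D-CRX for the "price" column: key value (price) -> NID.
d_crx_price = {900000: 5, 500000: 8, 600000: 11, 850000: 14,
               650000: 22, 800000: 30, 300000: 2, 400000: 17}
# Toy D-CRS for the "price" column: RID -> NID.
d_crs_price = {1: 2, 2: 8, 3: 11, 4: 5, 5: 22, 6: 30, 7: 17, 8: 14}

def range_search(lo, hi):
    # Evaluate each bound against the D-CRX to get one NID set per search term.
    nids_ge = {nid for price, nid in d_crx_price.items() if price >= lo}
    nids_le = {nid for price, nid in d_crx_price.items() if price <= hi}
    nid_set = nids_ge & nids_le  # logical AND of the per-term NID sets
    # Full-scan comparison of the D-CRS against the combined NID set.
    return {rid for rid, nid in d_crs_price.items() if nid in nid_set}

rids = range_search(500000, 800000)
```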
 As Example 4, a partial match search on a single term, consider extracting as an RID set the key values that match the search condition ""region name" = LIKE "%関%"" (following SQL notation, LIKE is the keyword for a fuzzy search instruction and % is a wildcard symbol; in this example, the search is for key values whose "region name" contains the character string "関"). In Example 4, the distributed processing units of the first to third slave nodes 15, 17, and 19, having received the search request from the master node 13, first extract the search terms from the search expression and obtain the search term set {"region name" = LIKE "%関%"}. Then, targeting the D-CRX (CNN = a to c) whose column name is "region name", the distributed processing units of the first to third slave nodes 15, 17, and 19 simultaneously execute a search using the element {"region name" = LIKE "%関%"} of the search term set. This search yields the NID set = {6, 33} from the element {"region name" = LIKE "%関%"}.
 Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 perform a full-scan comparison of the D-CRS (CNN = a to c) whose column name is "region name", element by element, against the NID set = {6, 33}, thereby obtaining the RID set that matches the values in the NID set. By performing this processing simultaneously on the distributed processing units of the first to third slave nodes 15, 17, and 19, the RID set = {2, 3, 7, 9, 12, 15} is obtained. This RID set = {2, 3, 7, 9, 12, 15} is the answer sought.
 As a partial match search with multiple terms (a combination of single terms), consider Example 5, which extracts as an RID set the key values whose "region name" contains the character string "関" or whose "region name" contains the character string "東". In Example 5, a partial match search on each single term is executed by the procedure of Example 4 using the respective search conditions, and the RID set as the search result is obtained by applying a logical OR operation to the resulting RID sets.
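 The partial match search of Examples 4 and 5 follows the same two-phase pattern, sketched below under the same toy assumptions. The entry "東北" with NID 40 and RID 20 is a hypothetical addition used only to make the OR case non-trivial; the other values follow the sets quoted above.

```python
# Toy D-CRX for the "region name" column: key value -> NID.
d_crx_region = {"関東": 6, "関西": 33, "東北": 40}
# Toy D-CRS for the "region name" column: RID -> NID.
d_crs_region = {2: 6, 3: 6, 7: 6, 9: 6, 12: 33, 15: 33, 20: 40}

def partial_match(substring):
    # D-CRX scan: collect the NIDs of key values containing the substring
    # (the containment test stands in for SQL LIKE "%...%").
    nid_set = {nid for key, nid in d_crx_region.items() if substring in key}
    # D-CRS full scan with the combined NID set.
    return {rid for rid, nid in d_crs_region.items() if nid in nid_set}

rids_seki = partial_match("関")                       # Example 4
rids_or = partial_match("関") | partial_match("東")   # Example 5: OR of RID sets
```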
 Next, the procedure of the distributed table join processing in step S72 shown in FIG. 6 is described with a concrete example, referring to FIGS. 2A, 2B, 7, and 8. FIG. 7 shows an inner table of the number of customers by region, distributed and stored across the plurality of slave nodes 15, 17, and 19. FIG. 8 shows an example of the D-RIX for the inner table shown in FIG. 7. In Example 6, the distributed processing units of the first to third slave nodes 15, 17, and 19 obtain the join result of the outer table and the inner table by referring to the D-RIX of the join column. In Example 6, the sales management table (transactions) shown in FIGS. 2A and 2B, distributed and stored across the plurality of slave nodes 15, 17, and 19, is positioned as the outer table, and the number of customers by region shown in FIG. 7, likewise distributed and stored across the plurality of slave nodes 15, 17, and 19, is positioned as the inner table. In Example 6, the join column is "region name".
 In Example 6, the distributed processing units of the first to third slave nodes 15, 17, and 19, having received the table join operation request from the master node 13, first each obtain the NID set of the outer table foreign key column (hereinafter abbreviated "OTFK-NID") from the D-RIX of the outer table (hereinafter abbreviated "OTFK-D-RIX"). Specifically, for example, the first slave node (CNN = a) 15 obtains the OTFK-NID set {2, 6, 25} from the "region name" column shown in FIG. 3D.
 Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 each search the D-RIX of the inner table primary key column (hereinafter abbreviated "ITPK-D-RIX"), using the elements (NIDs) of the OTFK-NID set as search conditions. Specifically, for example, the first slave node (CNN = a) 15 uses the elements {2, 6, 25} of the OTFK-NID set as search conditions to search the "region name" column shown in FIG. 8 for the NID set of the inner table primary key column (hereinafter abbreviated "ITPK-NID") matching the elements {2, 6, 25} of the OTFK-NID set. This search yields the ITPK-NID set {2, 6, 25}.
 If the search for the ITPK-NID set succeeds, the distributed processing units of the first to third slave nodes 15, 17, and 19 each obtain, from the target column (outer table foreign key column) of the OTFK-D-RIX, the outer table RID set (hereinafter abbreviated "OTRID") corresponding to the OTFK-NID set. Specifically, for example, the first slave node (CNN = a) 15 obtains from the "region name" column shown in FIG. 3D the OTRID set {1, 2, 3, 5, 7, 8, 9, 10, 14} corresponding to the OTFK-NID set {2, 6, 25}.
 Next, the distributed processing units of the first to third slave nodes 15, 17, and 19 each obtain, from the target column (inner table primary key column) of the ITPK-D-RIX, the inner table RID set (hereinafter abbreviated "ITRID") corresponding to the ITPK-NID set. Specifically, for example, the first slave node (CNN = a) 15 obtains from the "region name" column shown in FIG. 8 the ITRID set {1, 2, 7} corresponding to the ITPK-NID set {2, 6, 25}.
 The distributed processing units of the first to third slave nodes 15, 17, and 19 then each create a comparison table of the inner table RIDs corresponding to the outer table RIDs (hereinafter abbreviated "REF-OTRID-ITRID"). The REF-OTRID-ITRID serves to link each outer table RID to its corresponding inner table RID via the mutually common OTFK-NID and ITPK-NID between them. This yields an RID comparison table as shown in FIG. 9.
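 The construction of the REF-OTRID-ITRID in Example 6 can be sketched as follows, with toy D-RIX structures shaped as NID -> RID set per column (the grouping of the outer table RIDs per NID is an assumption consistent with the sets quoted above). Because the shared DSN assigns the same NID to the same key value in both tables, the join reduces to an intersection on NIDs followed by pairing the RID sets.

```python
# Toy D-RIX for the join column "region name" on one slave node:
# NID -> RID set, for the outer table (sales) and the inner table
# (customers by region).
otfk_d_rix = {2: {1, 5, 10}, 6: {2, 3, 7, 9}, 25: {8, 14}}
itpk_d_rix = {2: {1}, 6: {2}, 25: {7}}

def join_rid_table(outer_rix, inner_rix):
    """Build the REF-OTRID-ITRID pairs: for every NID present in both D-RIX
    structures, pair each outer table RID with each matching inner table RID."""
    common_nids = outer_rix.keys() & inner_rix.keys()
    return {(ot_rid, it_rid)
            for nid in common_nids
            for ot_rid in outer_rix[nid]
            for it_rid in inner_rix[nid]}

ref = join_rid_table(otfk_d_rix, itpk_d_rix)
```

No key values are compared at any point; the join works entirely on NID and RID sets.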
 When there are multiple join conditions, the distributed processing units of the first to third slave nodes 15, 17, and 19 create a REF-OTRID-ITRID for each of the join conditions according to the procedure of Example 6, and by applying logical operations between the resulting REF-OTRID-ITRID tables, the REF-OTRID-ITRID (RID comparison table) as the join result can be obtained on each of the plurality of slave nodes 15, 17, and 19.
 The RID comparison table as the join result according to Example 6 is represented by the REF-OTRID-ITRID tables distributed and stored on each of the plurality of slave nodes 15, 17, and 19. The data structure of this RID comparison table has a major effect on the storage efficiency and processing efficiency of data in the RDB, because it can fulfill a function equivalent to that of a join table without creating, on an actual-value basis, a join table that tends to become enormous. In the distributed processing units of the first to third slave nodes 15, 17, and 19, by following the REF-OTRID-ITRID chain in order with the outer table as the base, the RID of the target column (outer table foreign key column) in the outer table can be used as a pointer to efficiently reference the RID of the target column (inner table primary key column) in the inner table. Once the RID of a base table (outer table or inner table) belonging to a target column (outer table foreign key column or inner table primary key column) is obtained, the corresponding NID can be obtained by referring to the D-CRS using that RID as a pointer. Once the NID is obtained, the corresponding key value can be obtained by referring to the D-CRX using that NID as a pointer.
 The data structure of a search result is expressed as an RID set of the outer table. In contrast, the data structure of a join result is expressed as a chain of REF-OTRID-ITRID tables (RID comparison tables) based on the outer table. What the two have in common is that both hold an RID set of the outer table. Therefore, by performing logical operations between the outer table RID sets of a search result and a join result, a logical operation combining the search result and the join result can be realized efficiently.
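 A minimal sketch of such a combination, with hypothetical values: intersecting the outer table RID set of a search result with the outer table RIDs of a join result filters the RID comparison table without touching any actual key values.

```python
# Outer table RID set from a search result (e.g. region name = "Kanto").
search_rids = {2, 3, 7, 9}
# REF-OTRID-ITRID pairs (outer table RID, inner table RID) from a join result.
join_pairs = {(1, 1), (2, 2), (3, 2), (8, 7), (9, 2)}

# AND-combination: keep only the pairs whose outer table RID also
# satisfies the search condition.
combined = {(o, i) for (o, i) in join_pairs if o in search_rids}
```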
 According to Example 6, the complex computation required for a table join operation can be replaced with simple set operations, realizing a substantial reduction in computation time. Furthermore, according to Example 6, performing a table join operation requires no matching of key values between the join column of the outer table and the join column of the inner table. This is based on the facts that the adoption of the DSN guarantees that the same NID is assigned to the same key value, and that an RID comparison table exists that links each outer table RID to its corresponding inner table RID via the common NID between them.
 Furthermore, in Example 6, information (NIDs) about the same key value is deliberately distributed and stored so as to be concentrated on the same slave node. Therefore, in contrast to the conventional example in which key values having the same value are randomly distributed and stored across multiple slave nodes, no inter-node communication for cross-referencing key values having the same value occurs at all when, for example, a slave node executes a data operation such as a join. According to Example 6, the processing overhead of the system as a whole can thus be suppressed, so the processing capacity of the distributed RDB system 11 as a whole can be improved efficiently.
 In short, according to the sixth embodiment, when providing a distributed RDB service via the WWW (World Wide Web), for example, linear scale-out can be achieved: even if the number of slave nodes in the system is increased to cope flexibly with a sudden surge in demand, and data operations such as table joins are then executed on the data distributed across the enlarged set of slave nodes, the processing capacity of the system as a whole improves linearly before and after the expansion.
 Next, an overview of the distributed result tuple creation process for aggregation in step S73 of FIG. 6 will be described, first for the case without a table join operation and then for the case with one. The process of step S73 is executed in parallel on each of the first to third slave nodes 15, 17, and 19. In the case without a table join operation, the distributed processing units of the first to third slave nodes 15, 17, and 19 each obtain, from the RID set constituting their respective search results, the RIDs that form the basis of that set.
 Next, based on each obtained RID, the distributed processing units of the first to third slave nodes 15, 17, and 19 identify which node holds the NID data corresponding to that RID. Specifically, the distributed processing units identify the storage destination node number D-CRS-CNN of the data by the following calculation: the distributed processing unit determines the D-CRS block number D-CRS-BN from the RID, and then determines D-CRS-CNN by performing a hash computation on the determined D-CRS-BN using the consistent hashing method. If a node other than the local node holds the NID data corresponding to the obtained RID, the local node fetches that data from the other node. The local node then uses the obtained RID as a pointer into the D-CRS of the target column constituting the tuple to obtain the NID.
 Next, based on each obtained NID, the distributed processing units of the first to third slave nodes 15, 17, and 19 identify which node holds the key-value data corresponding to that NID. Specifically, the distributed processing units identify the storage destination node number D-CRX-CNN of the data by the following calculation: the distributed processing unit determines the D-CRX block number D-CRX-BN from the NID, and then determines D-CRX-CNN by performing a hash computation on the determined D-CRX-BN using the consistent hashing method. If a node other than the local node holds the key-value data corresponding to the obtained NID, the local node fetches that data from the other node. The local node then uses the obtained NID as a pointer into the D-CRX of the target column constituting the tuple to obtain the key value, i.e., the actual value.
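 The two-step resolution described above (RID → D-CRS → NID, then NID → D-CRX → key value) can be sketched as follows. This is a toy illustration only: the block size, node list, modulo-based placement standing in for the consistent-hash ring, and the dictionary layouts of the D-CRS and D-CRX tables are all assumptions for the sketch, not the concrete implementation.

```python
import hashlib

NODES = ["slave1", "slave2", "slave3"]
BLOCK_SIZE = 1024  # assumed number of entries per D-CRS / D-CRX block


def node_for_block(block_no: int) -> str:
    # Stand-in for the consistent-hash computation: hash the block
    # number and map it onto the node list.
    h = int(hashlib.md5(str(block_no).encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]


def resolve_key_value(rid, d_crs, d_crx):
    # Step 1: RID -> D-CRS block (D-CRS-BN) -> holding node
    # (D-CRS-CNN) -> NID.  d_crs maps node -> {rid: nid}.
    crs_block = rid // BLOCK_SIZE
    crs_node = node_for_block(crs_block)
    nid = d_crs[crs_node][rid]  # fetched remotely if not local

    # Step 2: NID -> D-CRX block (D-CRX-BN) -> holding node
    # (D-CRX-CNN) -> actual key value.  d_crx maps node -> {nid: key}.
    crx_block = nid // BLOCK_SIZE
    crx_node = node_for_block(crx_block)
    return d_crx[crx_node][nid]
```

Because the placement function is deterministic, every node computes the same D-CRS-CNN and D-CRX-CNN for a given RID or NID without consulting the master node.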
 Next, in the distributed result tuple creation process for aggregation in the case with a table join operation, the distributed processing units of the first to third slave nodes 15, 17, and 19 each obtain the outer-table RIDs from the RID set constituting their respective search results.
 Next, based on each obtained outer-table RID, the distributed processing units of the first to third slave nodes 15, 17, and 19 identify which node holds the NID data corresponding to that RID. Specifically, the distributed processing units identify the storage destination node number D-CRS-CNN of the data by the following calculation: the distributed processing unit determines the D-CRS block number D-CRS-BN from the outer-table RID, and then determines D-CRS-CNN by performing a hash computation on the determined D-CRS-BN using the consistent hashing method. If a node other than the local node holds the NID data corresponding to the obtained outer-table RID, the local node fetches that data from the other node. The local node then uses the obtained outer-table RID as a pointer and, following the REF-OTRID-ITRID chain, obtains the target inner-table RID from the REF-OTRID-ITRID of the target column constituting the tuple.
 Next, based on each obtained inner-table RID, the distributed processing units of the first to third slave nodes 15, 17, and 19 identify which node holds the NID data corresponding to that RID. Specifically, the distributed processing units identify the storage destination node number D-CRS-CNN of the data by the following calculation: the distributed processing unit determines the D-CRS block number D-CRS-BN from the inner-table RID, and then determines D-CRS-CNN by performing a hash computation on the determined D-CRS-BN using the consistent hashing method. If a node other than the local node holds the NID data corresponding to the obtained inner-table RID, the local node fetches that data from the other node. The local node then uses the obtained inner-table RID as a pointer into the D-CRS of the target column constituting the tuple to obtain the NID.
 Next, based on each obtained NID, the distributed processing units of the first to third slave nodes 15, 17, and 19 identify which node holds the key-value data corresponding to that NID. Specifically, the distributed processing units identify the storage destination node number D-CRX-CNN of the data by the following calculation: the distributed processing unit determines the D-CRX block number D-CRX-BN from the NID, and then determines D-CRX-CNN by performing a hash computation on the determined D-CRX-BN using the consistent hashing method. If a node other than the local node holds the key-value data corresponding to the obtained NID, the local node fetches that data from the other node. The local node then uses the obtained NID as a pointer into the D-CRX of the target column constituting the tuple to obtain the key value, i.e., the actual value.
 Note that, in this embodiment, an example has been described in which the distributed processing units of the first to third slave nodes 15, 17, and 19 themselves perform the hash computation by the consistent hashing method to determine D-CRX-CNN and D-CRS-CNN, but the present invention is not limited to this. For example, the master node 13 may hold D-CRX-CNN and D-CRS-CNN as part of the master data 13a, and the distributed processing units of the first to third slave nodes 15, 17, and 19 may query the master node 13. However, it is more efficient, and therefore preferable, for each slave node to perform the computation itself rather than to query the master node 13.
 As described above, the index generation unit 35 of the master node 13 creates the index data (DSN, D-CRX, D-CRS, D-RIX) to be distributed and stored on the first to third slave nodes 15, 17, and 19, then transmits the created index data in a batch to the nodes determined by the node determination unit 37, and the index data is processed in a batch on each of those determined nodes.
 When determining the storage destination node of index data by the consistent hashing method, the key value is used as the distribution key for the DSN, a function of the NID for the D-CRX and D-RIX, and a function of the RID for the D-CRS. As a result, when a slave node executes a data operation such as a join, no communication between slave nodes arises for cross-referencing identical key values, so the index data can be processed efficiently.
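 The choice of distribution key per index structure can be sketched as below. This is a hedged illustration under stated assumptions: the node names, the `"DSN:"`-style key prefixes, and the modulo placement (a stand-in for a true consistent-hash ring) are all invented for the sketch.

```python
import hashlib

NODES = ["slave1", "slave2", "slave3"]


def ring_node(dist_key: str, nodes):
    """Toy placement: the structure-specific distribution key alone
    decides which slave node stores the entry."""
    h = int(hashlib.sha1(dist_key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]


# DSN entries hash on the key value itself, so identical key values
# always land on the same node; D-CRX and D-RIX hash on the NID;
# D-CRS hashes on the RID.
def dsn_node(key_value):
    return ring_node("DSN:" + str(key_value), NODES)

def d_crx_node(nid):
    return ring_node("D-CRX:" + str(nid), NODES)

def d_rix_node(nid):
    return ring_node("D-RIX:" + str(nid), NODES)

def d_crs_node(rid):
    return ring_node("D-CRS:" + str(rid), NODES)
```

Because `dsn_node` depends only on the key value, every occurrence of the same key value, no matter which table or row it comes from, is aggregated on one node, which is what eliminates inter-node key-value cross-referencing during joins.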
 Further, because the DSN constrains NIDs and key values to correspond one to one, it is preferable to use the NID (which takes natural, ordinal number values) in place of the key value in any processing performed before the key value is needed as a meaningful value. All operations can thereby be reduced to numeric operations. Since an NID is represented inside the computer as a fixed-width binary integer, it is more efficient than the actual key value in search and reference situations, which contributes to shortening the computation time.
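 The one-to-one key-value/NID mapping described above behaves like a dictionary encoding, and can be sketched as follows. The class name and method names are invented for illustration; only the behavior (each distinct key value receives one ordinal NID, and a repeated key value keeps its existing NID) follows the text.

```python
class NidDictionary:
    """Toy sketch of the DSN's one-to-one mapping: each distinct key
    value is assigned a 1-based ordinal NID, and looking up a key
    value that was seen before returns the NID already assigned."""

    def __init__(self):
        self._by_key = {}   # key value -> NID
        self._by_nid = []   # position nid-1 -> key value

    def nid_for(self, key_value):
        nid = self._by_key.get(key_value)
        if nid is None:
            # Unseen key value: assign the next natural, ordinal NID.
            self._by_nid.append(key_value)
            nid = len(self._by_nid)
            self._by_key[key_value] = nid
        return nid

    def key_for(self, nid):
        # Restore the actual key value from a fixed-width integer NID.
        return self._by_nid[nid - 1]
```

Until the actual value is needed, comparisons and set operations can then run entirely on the fixed-width integer NIDs instead of on variable-length key values.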
 The embodiments described above are examples of how the present invention may be realized. The technical scope of the present invention must therefore not be interpreted restrictively on their account, since the present invention can be implemented in various forms without departing from its gist or principal features.
 For example, although the first to third slave nodes 15, 17, and 19 have been described in this embodiment as an example of the plurality of slave nodes, the present invention is not limited to this example. The number of slave nodes may be adjusted as appropriate according to increases and decreases in the amount of data to be processed.
 Also, although a single master node 13 has been described in this embodiment as an example of the master node, the present invention is not limited to this example. Replicas of the master node may be provided for the purposes of load distribution and improved fault tolerance, and the same applies to the slave nodes.
 Also, although the D-RIX index data has been described in this embodiment alongside the DSN, D-CRX, and D-CRS, the D-RIX is not an essential data structure in the present invention. Although the D-RIX makes processing during table join operations more efficient, even in its absence its function can be substituted by full-scan matching against the D-CRS.
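 The substitutability just mentioned can be illustrated with toy stand-ins. The data layouts below (a D-CRS as a plain RID-indexed array of NIDs, a D-RIX as an NID-to-RID-set mapping) are assumptions for the sketch, not the concrete on-disk structures.

```python
# Toy D-CRS for one column: d_crs[rid] == nid.
d_crs = [7, 3, 7, 5, 3, 7]

# Toy D-RIX for the same column, built once: nid -> set of RIDs.
d_rix = {}
for rid, nid in enumerate(d_crs):
    d_rix.setdefault(nid, set()).add(rid)


def rids_via_rix(nid):
    # With a D-RIX, "which rows hold this NID" is a direct lookup.
    return d_rix.get(nid, set())


def rids_via_full_scan(nid):
    # Without a D-RIX, the same answer is recovered by scanning the
    # entire D-CRS and collecting matching RIDs.
    return {rid for rid, n in enumerate(d_crs) if n == nid}
```

Both functions return the same RID set; the D-RIX simply trades storage and maintenance cost for avoiding the full scan at join time.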
 Also, although an example has been described in the above embodiment in which the index generation unit 35, the node determination unit 37, and the update management unit 45 are provided in the master node 13, the present invention is not limited to this. For example, these functional components may be provided in the slave nodes 15, 17, and 19. When a large amount of data is registered, processing efficiency can be raised by having the plurality of slave nodes 15, 17, and 19 execute the processing relating to the index generation unit 35, the node determination unit 37, and the update management unit 45 in parallel.
 The present invention is applicable to a distributed database system comprising a plurality of slave nodes and a master node.

Claims (15)

  1. A distributed database system comprising a master node that centrally manages a plurality of slave nodes, wherein key values are distributed and stored across the plurality of slave nodes, and the plurality of slave nodes execute, in parallel, data operations based on commands from the master node using the distributed and stored key values, the system comprising:
     a registration request reception unit that receives a key value related to a registration request and information on its data type;
     an NID allocation unit that allocates, to the key value related to the registration request received by the registration request reception unit, a key value identifier (hereinafter, "NID") that takes a unique value, across the entire distributed database, within the range of the data type of the key value related to the registration request;
     a DSN generation unit that generates data of a distributed shared NID (hereinafter, "DSN") relating to the correspondence between the key value related to the registration request and the NID allocated by the NID allocation unit; and
     a DSN storage node determination unit that determines, from among the plurality of slave nodes, one slave node to store the DSN data generated by the DSN generation unit, based on the key value related to the registration request.
  2. The distributed database system according to claim 1, further comprising:
     a D-CRX generation unit that generates data of a distributed compression/decompression index (hereinafter, "D-CRX") relating to the correspondence between the key value related to the registration request and the NID allocated by the NID allocation unit;
     a D-CRX storage node determination unit that determines, from among the plurality of slave nodes, one slave node to store the D-CRX data generated by the D-CRX generation unit, based on a function of the NID;
     a D-CRS generation unit that generates data of a distributed compression result set cache (hereinafter, "D-CRS") relating to the correspondence between a distributed row identifier (hereinafter, "RID"), which takes a unique value for each column in the tables constituting the distributed database, and the NID allocated by the NID allocation unit; and
     a D-CRS storage node determination unit that determines, from among the plurality of slave nodes, one slave node to store the D-CRS data generated by the D-CRS generation unit, based on a function of the RID.
  3. The distributed database system according to claim 2, further comprising:
     a D-RIX generation unit that generates data of a distributed row identification index (hereinafter, "D-RIX") relating to the correspondence between the NID allocated to the key value related to the registration request and a set of the RIDs; and
     a D-RIX storage node determination unit that determines, from among the plurality of slave nodes, one slave node to store the D-RIX generated by the D-RIX generation unit, based on a function of the NID.
  4. The distributed database system according to any one of claims 1 to 3, wherein
     the registration request reception unit, the NID allocation unit, the DSN generation unit, and the DSN storage node determination unit are provided in the master node,
     the master node further comprises a DSN registration request issuing unit that issues a registration request for the DSN data by sending, to the one slave node determined by the DSN storage node determination unit, the DSN data and information on the data type of the key value related to the registration request,
     each of the plurality of slave nodes comprises:
     a DSN registration management unit that, in response to a registration request for the DSN data from the DSN registration request issuing unit, performs registration management in which the DSN data is stored in a DSN storage unit separately for each data type of the key value related to the registration request; and
     an existence determination unit that determines whether the key value related to the registration request already exists in the DSN storage unit, and
     the DSN registration management unit, when the existence determination unit determines that the key value related to the registration request is already registered in the DSN storage unit, maintains as-is the registered content of the DSN to which the NID already allocated to that key value belongs, and, when the existence determination unit determines that the key value related to the registration request does not yet exist in the DSN storage unit, performs registration management in which the DSN data relating to the correspondence between the key value related to the registration request and the newly allocated NID is stored in the DSN storage unit.
  5. The distributed database system according to any one of claims 1 to 3, wherein the DSN generation unit and the DSN storage node determination unit are provided in the plurality of slave nodes.
  6. The distributed database system according to claim 2, wherein
     the DSN generation unit, the DSN storage node determination unit, the D-CRX generation unit, the D-CRX storage node determination unit, the D-CRS generation unit, and the D-CRS storage node determination unit are provided in the master node,
     the master node further comprises:
     a DSN registration request issuing unit that issues a registration request for the DSN data by sending, to the one slave node determined by the DSN storage node determination unit, the DSN data and information on the data type of the key value related to the registration request;
     a D-CRX registration request issuing unit that issues a registration request for the D-CRX data by sending, to the one slave node determined by the D-CRX storage node determination unit, the D-CRX data and information on the column to which the key value related to the registration request belongs; and
     a D-CRS registration request issuing unit that issues a registration request for the D-CRS data by sending, to the one slave node determined by the D-CRS storage node determination unit, the D-CRS data and the column information,
     each of the plurality of slave nodes further comprises:
     a DSN registration management unit that, in response to a registration request for the DSN data from the DSN registration request issuing unit, performs registration management in which the DSN data is stored in a DSN storage unit separately for each data type of the key value related to the registration request;
     a D-CRX registration management unit that, in response to a registration request for the D-CRX data from the D-CRX registration request issuing unit, performs registration management in which the D-CRX data is stored in a D-CRX storage unit separately for each column to which the key value related to the registration request belongs;
     a D-CRS registration management unit that, in response to a registration request for the D-CRS data from the D-CRS registration request issuing unit, performs registration management in which the D-CRS data is stored in a D-CRS storage unit separately for each column; and
     a data operation execution unit that executes data operations based on commands from the master node in parallel, using the information stored in the DSN storage unit, the D-CRX storage unit, and the D-CRS storage unit, and
     the master node further comprises a processing result integration unit that integrates the processing results executed in parallel by the data operation execution units of the plurality of slave nodes.
  7. The distributed database system according to claim 2, wherein the DSN generation unit, the DSN storage node determination unit, the D-CRX generation unit, the D-CRX storage node determination unit, the D-CRS generation unit, and the D-CRS storage node determination unit are provided in the plurality of slave nodes.
  8. The distributed database system according to claim 3, wherein
     the DSN generation unit, the DSN storage node determination unit, the D-CRX generation unit, the D-CRX storage node determination unit, the D-CRS generation unit, the D-CRS storage node determination unit, the D-RIX generation unit, and the D-RIX storage node determination unit are provided in the master node,
     the master node further comprises:
     a DSN registration request issuing unit that issues a registration request for the DSN data by sending, to the one slave node determined by the DSN storage node determination unit, the DSN data and information on the data type of the key value related to the registration request;
     a D-CRX registration request issuing unit that issues a registration request for the D-CRX data by sending, to the one slave node determined by the D-CRX storage node determination unit, the D-CRX data and information on the column to which the key value related to the registration request belongs;
     a D-CRS registration request issuing unit that issues a registration request for the D-CRS data by sending, to the one slave node determined by the D-CRS storage node determination unit, the D-CRS data and the column information; and
     a D-RIX registration request issuing unit that issues a registration request for the D-RIX data by sending, to the one slave node determined by the D-RIX storage node determination unit, the D-RIX data and the column information,
     each of the plurality of slave nodes further comprises:
     a DSN registration management unit that, in response to a registration request for the DSN data from the DSN registration request issuing unit, performs registration management in which the DSN data is stored in a DSN storage unit separately for each data type of the key value related to the registration request;
     a D-CRX registration management unit that, in response to a registration request for the D-CRX data from the D-CRX registration request issuing unit, performs registration management in which the D-CRX data is stored in a D-CRX storage unit separately for each column to which the key value related to the registration request belongs;
     a D-CRS registration management unit that, in response to a registration request for the D-CRS data from the D-CRS registration request issuing unit, performs registration management in which the D-CRS data is stored in a D-CRS storage unit separately for each column;
     a D-RIX registration management unit that, in response to a registration request for the D-RIX data from the D-RIX registration request issuing unit, performs registration management in which the D-RIX data is stored in a D-RIX storage unit separately for each column; and
     a data operation execution unit that executes data operations based on commands from the master node in parallel, using the information stored in the DSN storage unit, the D-CRX storage unit, the D-CRS storage unit, and the D-RIX storage unit, and
     the master node further comprises a processing result integration unit that integrates the processing results executed in parallel by the data operation execution units of the plurality of slave nodes.
  9. The distributed database system according to claim 3, wherein the DSN generation unit, the DSN storage node determination unit, the D-CRX generation unit, the D-CRX storage node determination unit, the D-CRS generation unit, the D-CRS storage node determination unit, the D-RIX generation unit, and the D-RIX storage node determination unit are provided in the plurality of slave nodes.
  10. The distributed database system according to any one of claims 1 to 9, wherein the NID allocation unit allocates, to the key value related to the registration request, an NID that takes natural, ordinal number values.
  11. The distributed database system according to claim 1, wherein the DSN storage node determination unit determines the one slave node to store the DSN data by a consistent hashing method that uses the key value related to the registration request as a distribution key.
  12. The distributed database system according to claim 2, wherein
     the D-CRX storage node determination unit determines the one slave node that stores the D-CRX data by consistent hashing using a function of the NID as the distribution key, and
     the D-CRS storage node determination unit determines the one slave node that stores the D-CRS data by consistent hashing using a function of the RID as the distribution key.
  13. The distributed database system according to claim 3, wherein
     the D-RIX storage node determination unit determines the one slave node that stores the D-RIX data by consistent hashing using a function of the NID as the distribution key.
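Claims 11 through 13 all select the storage slave node by consistent hashing, differing only in the distribution key (the key value itself, a function of the NID, or a function of the RID). A minimal hash-ring sketch follows; it is illustrative only — the claims do not specify the hash function, the number of virtual nodes, or the `ConsistentHashRing` API, all of which are assumptions here:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps a distribution key to one slave node on a hash ring, so
    that adding or removing a node remaps only neighboring keys."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed at `vnodes` points for smoother balance.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def node_for(self, distribution_key):
        """Return the first node clockwise from the key's position."""
        h = self._hash(distribution_key)
        idx = bisect.bisect(self._ring, (h,))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

Under this scheme the master needs no central placement table: any node can recompute, from the distribution key alone, which slave holds the DSN, D-CRX, D-CRS, or D-RIX entry.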
  14. A data structure for a distributed database system that comprises a master node managing a plurality of slave nodes, stores key values distributed across the plurality of slave nodes, and uses the distributed key values so that the plurality of slave nodes execute, in parallel, data operations based on commands from the master node, the data structure comprising, as information for identifying a key value:
     information indicating, separately for each data type of the key value related to a registration request, a distributed shared NID (DSN) representing the correspondence between the key value related to the registration request and a key value identifier (NID) that takes a unique value, within the range of that data type, across the entire distributed database;
     information indicating, separately for each column to which the key value related to the registration request belongs, a distributed compression/decompression index (D-CRX) representing the correspondence between the key value related to the registration request and the NID; and
     information indicating, separately for each column, a distributed compression result set cache (D-CRS) representing the correspondence between the NID and a distributed row identifier (RID) that takes a unique value for each column in the tables constituting the distributed database.
  15. The data structure of a distributed database according to claim 14, further comprising, as information related to the key value, information indicating, separately for each column to which the key value related to the registration request belongs, a distributed row identification index (D-RIX) representing the correspondence between the NID and a set of RIDs.
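Claims 14 and 15 name four cooperating structures. The following in-memory model is a sketch under stated assumptions — the `KeyValueIndex` class, its method names, and the dict-based layout are illustrative, not the patented on-disk format:

```python
class KeyValueIndex:
    """Illustrative model of the structures in claims 14-15:
      DSN:   data type -> {key value: NID}    (shared NID dictionary)
      D-CRX: column    -> {key value: NID}    (compression/decompression index)
      D-CRS: column    -> {RID: NID}          (compressed result set cache)
      D-RIX: column    -> {NID: set of RIDs}  (row identification index)
    """

    def __init__(self):
        self.dsn, self.d_crx, self.d_crs, self.d_rix = {}, {}, {}, {}

    def register(self, data_type, column, rid, key_value):
        """Register one key value, updating all four structures."""
        type_map = self.dsn.setdefault(data_type, {})
        nid = type_map.setdefault(key_value, len(type_map) + 1)
        self.d_crx.setdefault(column, {})[key_value] = nid
        self.d_crs.setdefault(column, {})[rid] = nid
        self.d_rix.setdefault(column, {}).setdefault(nid, set()).add(rid)
        return nid

    def rows_for(self, column, key_value):
        """D-CRX resolves the key value to its NID; D-RIX then yields
        every RID holding that NID, without scanning raw key values."""
        nid = self.d_crx.get(column, {}).get(key_value)
        return self.d_rix.get(column, {}).get(nid, set())
```

Queries and joins can thus operate entirely on compact NIDs and RIDs, touching the raw key values only at the final decompression step via the D-CRX.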
PCT/JP2012/054432 2011-02-25 2012-02-23 Distributed data base system and data structure for distributed data base WO2012115194A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN2012800097453A CN103384878A (en) 2011-02-25 2012-02-23 Distributed data base system and data structure for distributed data base
EP12749629.7A EP2680151A4 (en) 2011-02-25 2012-02-23 Distributed data base system and data structure for distributed data base
US14/001,284 US20140074774A1 (en) 2011-02-25 2012-02-23 Distributed data base system and data structure for distributed data base

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011-040257 2011-02-25
JP2011040257A JP5727258B2 (en) 2011-02-25 2011-02-25 Distributed database system

Publications (1)

Publication Number Publication Date
WO2012115194A1 true WO2012115194A1 (en) 2012-08-30

Family

ID=46720968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/054432 WO2012115194A1 (en) 2011-02-25 2012-02-23 Distributed data base system and data structure for distributed data base

Country Status (5)

Country Link
US (1) US20140074774A1 (en)
EP (1) EP2680151A4 (en)
JP (1) JP5727258B2 (en)
CN (1) CN103384878A (en)
WO (1) WO2012115194A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016534456A (en) * 2013-08-29 2016-11-04 華為技術有限公司Huawei Technologies Co.,Ltd. Method and apparatus for storing data
JP2018026158A (en) * 2017-10-05 2018-02-15 華為技術有限公司Huawei Technologies Co.,Ltd. Method and device for storing data

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005102307A2 (en) * 2004-04-19 2005-11-03 Strategic Science & Technologies, Llc Beneficial effects of increasing local blood flow
WO2013157099A1 (en) * 2012-04-18 2013-10-24 株式会社Murakumo Database management method, database system and program
JP5907902B2 (en) * 2013-01-21 2016-04-26 日本電信電話株式会社 Table equijoin system by secret calculation, method
US9681003B1 (en) * 2013-03-14 2017-06-13 Aeris Communications, Inc. Method and system for managing device status and activity history using big data storage
KR101642072B1 (en) * 2014-05-08 2016-07-22 주식회사 알티베이스 Method and Apparatus for Hybrid storage
GB2532469A (en) 2014-11-20 2016-05-25 Ibm Self-optimizing table distribution with transparent replica cache
CN104794162B (en) * 2015-03-25 2018-02-23 中国人民大学 Real-time data memory and querying method
CN105282045B (en) * 2015-11-17 2018-11-16 高新兴科技集团股份有限公司 A kind of distributed computing and storage method based on consistency hash algorithm
JP6697158B2 (en) 2016-06-10 2020-05-20 富士通株式会社 Information management program, information management method, and information management device
JP6253725B1 (en) * 2016-07-12 2017-12-27 株式会社東芝 Database system, data coupling method, integrated server, data coupling program, database system linkage method, and database system linkage program
CN106294538B (en) 2016-07-19 2019-07-16 浙江大华技术股份有限公司 A kind of moving method and device from the data record in node
CN106940715B (en) * 2017-03-09 2019-11-15 星环信息科技(上海)有限公司 A kind of method and apparatus of the inquiry based on concordance list
US11030204B2 (en) 2018-05-23 2021-06-08 Microsoft Technology Licensing, Llc Scale out data storage and query filtering using data pools

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09508734A (en) * 1994-02-08 1997-09-02 テレフオンアクチーボラゲツト エル エム エリクソン Distributed database system
JP2002197099A (en) * 2000-12-26 2002-07-12 Degital Works Kk Processing method of database
JP2003157249A (en) * 2001-11-21 2003-05-30 Degital Works Kk Document compressing and storing method
JP2006350741A (en) 2005-06-16 2006-12-28 Nippon Telegr & Teleph Corp <Ntt> Database management device, distributed database management system, distributed database management method, and distributed database management program
JP2007122302A (en) * 2005-10-27 2007-05-17 Hitachi Ltd Information retrieval system, index management method, and program

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5471617A (en) * 1991-06-24 1995-11-28 Compaq Computer Corporation Computer management system and associated management information base
US6105053A (en) * 1995-06-23 2000-08-15 Emc Corporation Operating system for a non-uniform memory access multiprocessor system
US5757924A (en) * 1995-09-18 1998-05-26 Digital Secured Networks Techolognies, Inc. Network security device which performs MAC address translation without affecting the IP address
US5913217A (en) * 1997-06-30 1999-06-15 Microsoft Corporation Generating and compressing universally unique identifiers (UUIDs) using counter having high-order bit to low-order bit
US6167427A (en) * 1997-11-28 2000-12-26 Lucent Technologies Inc. Replication service system and method for directing the replication of information servers based on selected plurality of servers load
DE602004030013D1 (en) * 2003-07-16 2010-12-23 Skype Ltd DISTRIBUTED DATABASE SYSTEM
US20060036633A1 (en) * 2004-08-11 2006-02-16 Oracle International Corporation System for indexing ontology-based semantic matching operators in a relational database system
EP1851662A2 (en) * 2005-02-24 2007-11-07 Xeround Systems Ltd. Method and apparatus for distributed data management in a switching network
US7840774B2 (en) * 2005-09-09 2010-11-23 International Business Machines Corporation Compressibility checking avoidance
US20070136476A1 (en) * 2005-12-12 2007-06-14 Isaac Rubinstein Controlled peer-to-peer network
US9325802B2 (en) * 2009-07-16 2016-04-26 Microsoft Technology Licensing, Llc Hierarchical scale unit values for storing instances of data among nodes of a distributed store
CN101916261B (en) * 2010-07-28 2013-07-17 北京播思软件技术有限公司 Data partitioning method for distributed parallel database system
US9805108B2 (en) * 2010-12-23 2017-10-31 Mongodb, Inc. Large distributed database clustering systems and methods
US8843441B1 (en) * 2012-01-17 2014-09-23 Amazon Technologies, Inc. System and method for maintaining a master replica for reads and writes in a data store
WO2014030235A1 (en) * 2012-08-23 2014-02-27 ディジタル・ワークス株式会社 Distributed database system


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016534456A (en) * 2013-08-29 2016-11-04 華為技術有限公司Huawei Technologies Co.,Ltd. Method and apparatus for storing data
US10331642B2 (en) 2013-08-29 2019-06-25 Huawei Technologies Co., Ltd. Data storage method and apparatus
JP2018026158A (en) * 2017-10-05 2018-02-15 華為技術有限公司Huawei Technologies Co.,Ltd. Method and device for storing data

Also Published As

Publication number Publication date
US20140074774A1 (en) 2014-03-13
EP2680151A4 (en) 2015-11-25
EP2680151A1 (en) 2014-01-01
CN103384878A (en) 2013-11-06
JP2012178025A (en) 2012-09-13
JP5727258B2 (en) 2015-06-03

Similar Documents

Publication Publication Date Title
JP5727258B2 (en) Distributed database system
JP5238915B1 (en) Distributed database system
US10013456B2 (en) Parallel processing database system with a shared metadata store
US8943059B2 (en) Systems and methods for merging source records in accordance with survivorship rules
JP6050272B2 (en) Low latency query engine for APACHE HADOOP
JP5719323B2 (en) Distributed processing system, dispatcher and distributed processing management device
CN107004013A (en) System and method for providing distributed tree traversal using hardware based processing
JPWO2010098034A1 (en) Distributed database management system and distributed database management method
CN102640151A (en) High throughput, reliable replication of transformed data in information systems
CN107506464A (en) A kind of method that HBase secondary indexs are realized based on ES
CN108287886A (en) The method and device of synchrodata modification information
CN103514229A (en) Method and device used for processing database data in distributed database system
CN103246749A (en) Matrix data base system for distributed computing and query method thereof
CN102495853A (en) Aspect-oriented cloud storage engine construction method
US20200341965A1 (en) Data Tokenization System Maintaining Data Integrity
JP2004110219A (en) Data processing system and join processing method
JP6111186B2 (en) Distributed information linkage system and data operation method and program thereof
WO2013175611A1 (en) Data distributed search system, data distributed search method, and administrative computer
WO2013136442A1 (en) Data usage system, history management system for timed data and data processing system
JP5608633B2 (en) Data utilization system
CN103377236B (en) A kind of Connection inquiring method and system for distributed data base
CN112905601A (en) Routing method and device for database sub-tables
US10963426B1 (en) Method of providing access controls and permissions over relational data stored in a hadoop file system
JP2014032530A (en) Distribution processing system and distribution processing method
JP6549537B2 (en) Service providing system and service providing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12749629

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2012749629

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012749629

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE