WO2018028797A1 - Methods and systems for bulk loading of data into a distributed database - Google Patents

Methods and systems for bulk loading of data into a distributed database

Info

Publication number
WO2018028797A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
distributed database
temporary
node
bulk
Prior art date
Application number
PCT/EP2016/069283
Other languages
French (fr)
Inventor
Jan Karlsson
Per OTTERSTRÖM
Marcus OLSSON
Tommy STENDAHL
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2016/069283
Publication of WO2018028797A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/23: Updating
    • G06F 16/2379: Updates performed during online database operations; commit processing
    • G06F 16/2386: Bulk updating operations

Abstract

A method for bulk loading of data into a distributed database is described. A plurality of temporary tables are created, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some of the client applications of the distributed database. The data to be bulk loaded is inserted into each of the temporary tables whose corresponding node owns a shard of that data. The bulk loaded data is committed to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some of the client applications.

Description

METHODS AND SYSTEMS FOR BULK LOADING OF DATA
INTO A DISTRIBUTED DATABASE
TECHNICAL FIELD
[0001] The present invention generally relates to distributed and replicated databases and, more specifically, to bulk inserting data into a distributed database.
BACKGROUND
[0002] In distributed databases, data is synchronized across multiple nodes and stored in collections of data called tables. Figure 1 depicts an example of a distributed database having a plurality of nodes 1...N. A node in a distributed database, e.g., Node 2, can have a data storage device (DB) 100, a database manager (DBM) 102 and a request coordinator (RC) 104. Client applications, like client application 106, make requests or queries of the distributed database system. The RC 104 is a software module which assists in translating and formatting client queries into commands which will be understood by the DBMs, each of which is responsible for reading data from, and writing data to, a corresponding DB. Note that, in some implementations, a node may have all three components, i.e., RC, DBM, and DB, or only subsets of those components as illustrated in Figure 1. However, Figure 1 should be considered purely as an illustrative example of some of the components of a distributed database.
[0003] Each node in a distributed database like that of Figure 1 may have its own copy of data of one or several data shards. As will be appreciated by those skilled in the art, a shard is a horizontal partition of data in the database. Describing some of the terminology associated with distributed database implementations may be useful for the reader of this description. For example, a table, as that term is used in this technology, is both a logical unit in the cluster and a physical unit at each node shard. When a request is made by a client application, an algorithm is used to determine which nodes are responsible for which shard of the relevant table. There are different kinds of algorithms that can be used to divide the full key range into shards. For example, the complete key range can be divided into equally sized shards, one for each node. Each node would then be the primary owner of one such shard. Another strategy may be to divide the complete key range into a large number of randomly sized shards. Each node will then be the primary owner of several such shards. In order to handle all kinds of simple or complex data types as a key, it is common to calculate a hash value from the actual key, and then associate sequences of hash values with each shard.
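The hash-based key-to-shard mapping described above can be sketched as follows. This is a minimal, illustrative example only; the node names, shard count, and replication factor are assumptions and are not taken from the patent.

```python
import hashlib
from typing import List

NODES = ["node-1", "node-2", "node-3", "node-4"]   # illustrative cluster
NUM_SHARDS = 16                                    # equally sized shards, one of the strategies above
REPLICATION_FACTOR = 3                             # illustrative replica count

def shard_of(key: str) -> int:
    """Hash the actual key and map the hash value onto one of the shards."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def replica_nodes(shard: int) -> List[str]:
    """The primary owner of the shard plus the next nodes in order hold replicas."""
    primary = shard % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

if __name__ == "__main__":
    key = "customer:42"
    s = shard_of(key)
    print(f"{key!r} -> shard {s} -> replicas {replica_nodes(s)}")
```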
[0004] Each node is also responsible for synchronizing its data with other nodes in the distributed database that have a replica of its shard(s) so that the data is consistent. This means that if a node is unable to receive a write request, another node will create a hint which will be replayed on the target node, once the node is available again.
[0005] Figure 2 illustrates an example of a conventional technique for writing a single statement to a distributed database. When writing data to the cluster, the write instruction is often expressed in a statement 200 which modifies a specific unit of data. This unit of data is identified with a key. When a client 202 starts a write operation, it will typically begin by choosing a node in the cluster of nodes which will act as the coordinator for the operation. There are different techniques which can be used by the client 202 to select an appropriate coordinator node. For example, a coordinator can be selected randomly or using a round-robin approach, the latter of which often gives an even load in the system. However, it is also possible for a client to choose a coordinator which it knows to be close to one of the shard replicas affected by the request which is being sent by the client.
[0006] Regardless of how the coordinator is selected, the client 202 sends the statement to the coordinator 204, which in turn will calculate which shard the key associated with that statement is part of. The coordinator 204 will then forward the statement to the nodes 206, 208, 210 which are determined to have a replica of that shard. Each replica node 206, 208 and 210 will then apply the modification to the table specified by the statement, an example of which is illustrated for node 208 and table 212. Typically, the modification is persisted in a local commit log and eventually flushed to one of several structured data files 214 which hold the content of that table.
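A short sketch of this single-statement write path, under assumed names and a simplified in-memory cluster model, is shown below: the coordinator derives the shard from the statement's key and forwards the statement to each replica node, where it is appended to a local commit log and recorded in a data file.

```python
import hashlib

def write_statement(key: str, row: dict, cluster: dict, num_shards: int = 16,
                    replication_factor: int = 3) -> None:
    """Illustrative coordinator logic: route one statement to the replica nodes of its shard."""
    node_names = sorted(cluster)
    shard = int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards
    for i in range(replication_factor):                 # forward to each replica node
        node = cluster[node_names[(shard + i) % len(node_names)]]
        node["commit_log"].append((key, row))           # persist in the local commit log first
        node["data_file"][key] = row                    # eventually flushed to a data file

if __name__ == "__main__":
    cluster = {f"node-{i}": {"commit_log": [], "data_file": {}} for i in range(4)}
    write_statement("customer:42", {"name": "alice"}, cluster)
    print({name: node["data_file"] for name, node in cluster.items()})
```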
[0007] Figure 3 illustrates a conventional batch statement write operation. When loading a distributed cluster with volumes of data, it is common to group data together into a single statement. This is known as a batch statement and guarantees that if the batch is accepted, the entire batch will be executed eventually. Batch statements are typically used to make multiple non-atomic statements into one atomic statement represented by arrow 300 being sent as a request from client 302. This one atomic statement will either fail as one unit or be committed as one unit.
[0008] When writing batch statements in a distributed database, one node 304 will typically act as a coordinator for the entire batch. The coordinator node 304 will take that single batch and hold it in memory as indicated by log file 306. The coordinator node 304 will also write this batch statement to a log 307 on at least one other node. If the batch statement is successfully written to the logs 306 and 307, then the data is considered to be persistent and will not be lost if the coordinator node 304 goes down. The coordinator node 304 will then write the individual statements in the batch from its memory (as indicated by the arrows coming from node 304 to the other nodes), in order to replicate the data to the tables in the corresponding nodes which are responsible for the shard(s) affected by this particular batch statement. For example, node 308 might have two statements from the batch written to it by node 304, while other nodes in Figure 3 might only have one statement written thereto, depending upon which nodes own which shards that are impacted by the batch. If the coordinator node 304 goes down, it will read from the batch log 306 or 307 and pick up where it left off in processing the batch write operation.
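A toy, self-contained model of this conventional batch path is given below. All names (Statement, Node, shard_owner) are illustrative assumptions, not from the patent; the point is only to show that the batch is persisted twice (to batch logs) before the individual statements are replayed to the shard owners.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Statement:
    key: str
    value: str

@dataclass
class Node:
    name: str
    batch_log: List[List[Statement]] = field(default_factory=list)  # durable batch log
    table: Dict[str, str] = field(default_factory=dict)             # the "real" table

def shard_owner(key: str, nodes: List[Node]) -> Node:
    # toy placement; a real cluster would use the hash-based shard lookup
    return nodes[sum(key.encode()) % len(nodes)]

def conventional_batch_write(batch: List[Statement], coordinator: Node,
                             peer: Node, nodes: List[Node]) -> None:
    # 1) write the whole batch to a log on the coordinator and on one peer,
    #    so the batch survives a coordinator crash (the "double write")
    coordinator.batch_log.append(list(batch))
    peer.batch_log.append(list(batch))
    # 2) replay the individual statements from memory to the nodes owning each shard
    for stmt in batch:
        shard_owner(stmt.key, nodes).table[stmt.key] = stmt.value
    # 3) once everything is applied, the batch log entries can be discarded
    coordinator.batch_log.pop()
    peer.batch_log.pop()

if __name__ == "__main__":
    cluster = [Node("node-1"), Node("node-2"), Node("node-3")]
    conventional_batch_write([Statement("a", "1"), Statement("b", "2")],
                             coordinator=cluster[0], peer=cluster[1], nodes=cluster)
    print([n.table for n in cluster])
```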
[0009] A problem with loading batches of data into a distributed database system in the manner described above is that the coordinator 304 will manage the batch until the batch has been completed. To maintain good performance the coordinator 304 will typically hold the complete batch in memory through the whole batch process. Depending on how many clients 302 run batches at the same time and how big the batches are, the coordinator 304 will have a lot of memory reserved for these statements, which causes heap pressure. This effectively sets a threshold on how big a batch can be without causing performance issues. Another problem is that the batch procedure is slow, as it writes all data twice to ensure atomicity: first to the batch log and then to the real tables. As the batch is expressed as one operation towards the database cluster, it is impractical to handle huge batches spanning millions of records using this conventional technique.
[0010] Another option available is to bulk load data into the distributed database by streaming data files directly to the nodes that are responsible. This is achieved by preparing a data set and streaming it from an external source into the database. The external source will be responsible for recovering from errors occurring during this process.
[0011] The streaming bulk load method is also problematic. While it can handle batches of any size, the data is inserted non-atomically because the progress is not stored anywhere in the database, in contrast to batch statements where the complete batch is written to the database for persistence before it is committed. Therefore, with the streaming bulk load method a crash provides no insight into how much of the data was inserted into the database, or where to restart the process. This effectively puts the recovery burden on the client. When using this method it is also virtually impossible to perform a rollback on a batch that has partial failures, which could leave the data in an inconsistent state. Finally, while this method supports huge batches, it has poor isolation characteristics as other clients will see some of the data before the batch is completed.
[0012] Thus, there is a need to provide methods and systems that overcome the above-described drawbacks of the conventional approaches to implementing bulk loading of data into distributed databases.
SUMMARY
[0013] Embodiments allow for bulk or batch loading of data into a distributed database using temporary tables. During the bulk loading the temporary tables are not accessible to those client applications which are not involved in the bulk loading. After the batch is committed, the data in the temporary tables becomes available to all of the client applications. Temporary tables are periodically consolidated with corresponding target tables as part of a compaction process.
[0014] According to an embodiment, there is a method for bulk loading of data into a distributed database. A plurality of temporary tables are created, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some of the client applications of the distributed database. The data to be bulk loaded is inserted into each of the temporary tables whose corresponding node owns a shard of that data. The bulk loaded data is committed to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some of the client applications.
[0015] According to an embodiment, there is a distributed database system. The system includes a plurality of nodes, at least some of the nodes each including a communications interface configured to receive and to transmit instructions and data to be bulk loaded into the distributed database system, at least one storage device configured to store data; and at least one processor configured to create a temporary table, the temporary table being temporarily unavailable to at least some of the client applications of the distributed database, further configured to insert data into the temporary table when the node owns a shard of that data; and further configured to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary table becomes available to the at least some of the client applications.
[0016] According to an embodiment, an apparatus is adapted to create a plurality of temporary tables, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some client applications of the distributed database, to insert the data to be bulk loaded into each of the temporary tables whose corresponding node owns a shard of that data, and to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some client applications.
[0017] According to an embodiment, there is a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the methods described herein.
[0018] According to an embodiment, there is a carrier containing the computer program described above, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium.
[0019] According to an embodiment, there is an apparatus comprising a first module configured to create a plurality of temporary tables, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some client applications of the distributed database; a second module configured to insert the data to be bulk loaded into each of the temporary tables whose corresponding node owns a shard of that data; and a third module configured to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some client applications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate one or more embodiments and, together with the description, explain these embodiments. In the drawings:
[0021] Figure 1 illustrates a distributed database system;
[0022] Figure 2 shows a conventional single statement write to a distributed database;
[0023] Figure 3 shows a conventional batch statement write to a distributed database;
[0024] Figure 4 depicts bulk loading of data into a distributed database according to an embodiment;
[0025] Figures 5-7 show various aspects of bulk loading of data into a distributed database according to an embodiment;
[0026] Figure 8 illustrates a flowchart of a method for bulk loading data into a distributed database according to an embodiment;
[0027] Figure 9 illustrates a node in a distributed database; and
[0028] Figure 10 illustrates a carrier on which a computer program product according to an embodiment resides.
DETAILED DESCRIPTION
[0029] The following description of the embodiments refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. The following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims. The embodiments to be discussed next are not limited to the configurations described below, but may be extended to other arrangements as discussed later.
[0030] Reference throughout the specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with an embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout the specification are not necessarily all referring to the same embodiment. Further, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
[0031] As described in the Background, there are problems associated with current methods for bulk loading data into a distributed database. According to an embodiment, those problems are addressed by bulk loading data using a plurality of temporary tables and inserting the data to be bulk loaded into each of the temporary tables whose corresponding node in the distributed database owns a shard of that data. Unless a rollback is performed during the bulk load process, once all of the data is loaded into the appropriate temporary tables, it can be committed to the distributed database and made available to the client applications.
[0032] An overview of an embodiment is illustrated in Figure 4, with more details later illustrated and described with respect to Figures 5-7. Therein, a client 400 initiates the process by sending an instruction or command 402 to start the bulk loading of data to a node 404 which it has selected as the coordinator node. A coordinator can be selected to start the bulk loading either randomly or using a round-robin approach. Node 404 creates its own hidden or temporary table at step 406. In this context, a temporary table is "hidden" in the sense that only a subset of the client applications which access the distributed database, e.g., the client (or clients) which initiated the bulk loading process, are aware of the temporary tables and can access those temporary tables while the bulk loading is being performed, as will be described below. According to an embodiment, the temporary table which is created will have an identical structure (same distribution/same number of replicas/same schema) as that of a corresponding target table to facilitate the later merging of these tables, as will be described in more detail below.
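A minimal sketch of this "start bulk load" step, under assumed names (Table, Node, start_bulk_load), is given below: the selected coordinator creates its own hidden temporary table with the same schema as the target table and instructs the other nodes to do the same.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Table:
    schema: List[str]
    data_files: List[dict] = field(default_factory=list)

@dataclass
class Node:
    name: str
    target: Table
    temp: Optional[Table] = None   # hidden table, visible only to the bulk-loading client

    def create_temporary_table(self) -> None:
        # the temporary table mirrors the target's structure so the two can later be merged
        self.temp = Table(schema=list(self.target.schema))

def start_bulk_load(coordinator: Node, other_nodes: List[Node]) -> None:
    coordinator.create_temporary_table()    # the coordinator creates its own temporary table
    for node in other_nodes:                # then fans the instruction out to the other nodes
        node.create_temporary_table()

if __name__ == "__main__":
    nodes = [Node(f"node-{i}", Table(schema=["id", "name"])) for i in range(3)]
    start_bulk_load(coordinator=nodes[0], other_nodes=nodes[1:])
    print([n.temp.schema for n in nodes])
```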
[0033] Node 404 also instructs the other nodes 408, 410 to create their own temporary tables as indicated by signals 412 and 414, respectively. Those skilled in the art will appreciate that although only one client and three nodes are illustrated in Figure 4, a typical implementation of a distributed database will have many more clients and nodes. In some embodiments, the coordinator node 404 will send an instruction to all of the other nodes in the distributed database to create their own temporary tables. However according to other embodiments, the coordinator node 404 will only send the instruction to create a temporary table to a subset of all of the nodes in the distributed database system, e.g., omitting those nodes which do not have a corresponding DBM and DB such as node N in Figure 1.
[0034] After the temporary tables are created, the sub-statements in the bulk data to be processed can be sent to their own coordinator nodes as seen in Figure 4. In this example, sub-statement 416 is transmitted to node 404, where it is inserted 418 into the temporary table previously created by node 404. Since node 404 is acting as the coordinator node for this sub-statement, node 404 also forwards this sub-statement to those other nodes which own a replica of the shard associated with that sub-statement, in this example node 408 via signal 420, which then inserts that sub-statement into its temporary table. Similarly, sub-statement 422 is sent to coordinator node 408 for insertion into its temporary table 424 and forwarding to other nodes, in this example node 410, which own a replica of the shard associated with that particular sub-statement, for insertion 426 into its temporary table.
[0035] Although only two sub-statements are illustrated as being bulk loaded into the distributed database in the embodiment of Figure 4, it will be appreciated by those skilled in the art that many more sub-statements may be processed in a single batch using the foregoing techniques, e.g., hundreds of thousands or millions of sub-statements.
Once all of the sub-statements have been inserted into the temporary tables associated with nodes which own replicas of the shards corresponding to those sub-statements, the client 400 can commit the batch to the distributed database by issuing a commit batch command 428 to a selected coordinator node 410 (assuming that no previous decision was taken to roll back the bulk loading process, as discussed below). The node which is selected to be the coordinator node for the commit batch command 428, in this example node 410, begins the process of merging its temporary and target table and also forwards the same instruction to the other nodes (or a subset of the other nodes) as indicated by signals 430 and 432. Once the batch is committed, each node will be responsible for merging the data files in the temporary table with the data files of the corresponding target table, as described in more detail below.
[0036] Figures 5-7 depict various aspects of the embodiment of Figure 4 in more detail. For example, Figure 5 illustrates some of the aspects associated with the creation of the temporary tables and explicitly shows an example of a temporary table and corresponding target table at a node according to an embodiment. First, the client 500 (or clients) initiating the bulk loading of data will request, via signal 502, that the cluster create a temporary table (or equivalently start the batch processing as depicted in Figure 4). This request is coordinated by one node 504 which will distribute the request to all nodes N in the cluster or, as mentioned previously, potentially to a subset thereof. A representation of the temporary table is created at each node which receives an instruction to do so, an example of which is illustrated in dotted lines by temporary table 506 for node 508. According to an embodiment, the (currently empty) temporary table 506 has the same format (e.g., same rows and columns) as the target or original table 510 with which it will later be merged after the data is committed, and which target table may have some other data already stored therein. The temporary table 506 is hidden in the sense that read operations which are issued before the commit will not see data from the temporary table 506, and write operations from clients which are not involved in the bulk loading process will not affect the temporary table 506.
[0037] Figure 6 shows certain aspects of an embodiment which occur after the creation of the temporary tables. Then, as part of the batch operation, the client 600 writes data to the temporary tables by issuing the individual sub-statements. For example, each sub-statement 602 is sent to a node 604 which will act as a coordinator for that sub-statement. The coordinator node 604 will share this sub-statement with other relevant nodes (illustrated in Figure 6 by arrows emanating from node 604 to the other nodes N) which are holding a replica of the relevant shard for that data. At each relevant node, an example of which is illustrated as node 606, that sub-statement is then written to the temporary table 608 and is placed in files 610 which belong to this specific table 608. This procedure is repeated for all sub-statements in the batch, where each sub-statement can be routed to different nodes depending on the key in the sub-statement. The same or different nodes in the cluster may act as the coordinator for each individual sub-statement in the batch. During the processing of the batch, read and write operations from client applications towards the target table 612 and its files 614 can be processed normally, but those operations will only see and affect the data in the target table 612 and not the data in the temporary table 608.
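A hedged sketch of this sub-statement insertion step follows: each sub-statement is routed to the nodes holding a replica of its shard and written only to the temporary table's files, never to the target table's files. The placement function and field names are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Node:
    name: str
    temp_files: List[Dict[str, str]] = field(default_factory=list)    # files of the temporary table (cf. 610)
    target_files: List[Dict[str, str]] = field(default_factory=list)  # files of the target table (cf. 614)

def replicas_for(key: str, nodes: List[Node], rf: int = 3) -> List[Node]:
    # hypothetical placement: hash the key and take rf consecutive nodes
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return [nodes[(h + i) % len(nodes)] for i in range(rf)]

def insert_sub_statement(key: str, row: Dict[str, str], nodes: List[Node]) -> None:
    """Route one sub-statement to every replica of its shard and write it to the temporary table only."""
    for node in replicas_for(key, nodes):
        node.temp_files.append({**row, "_key": key})   # isolated from normal reads until commit

if __name__ == "__main__":
    cluster = [Node(f"node-{i}") for i in range(4)]
    insert_sub_statement("customer:42", {"name": "alice"}, cluster)
    print([(n.name, n.temp_files) for n in cluster])
```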
[0038] Once all of the sub-statements are processed as shown in Figure 6, the batch can be committed. As indicated in Figure 4, the client 400 sends a commit statement 428 to one node in the cluster which will act as the coordinator for the commit process. The coordinator node will then forward the commit operation to the other nodes. The commit operation is persisted in the local commit log at each node, ensuring that the operation will complete atomically on all nodes. The commit operation also changes the ownership of the data files from the temporary table 608 to the target table 612. This will effectively introduce the temporary table 608 into the read path of the original table 612. According to an embodiment, this part of the process does not move or copy those data files 610 from the temporary table 608 into the target table 612; instead, the ownership of the data files 610 is transferred from the temporary table 608 to the target table 612. Once this part of the process is completed, the temporary table 608 can be completely removed, as described in more detail below. If one of the nodes N is not able to receive the commit request, a hint is created which will be replayed when that node is available again.
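A minimal sketch of the commit step, with assumed names, is shown below: the commit is recorded in the node's commit log and the data files owned by the temporary table are handed over to the target table. Only the ownership changes; the file contents are not moved or copied.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Table:
    name: str
    data_files: List[str] = field(default_factory=list)   # names of immutable files owned by this table

@dataclass
class Node:
    temp: Table
    target: Table
    commit_log: List[str] = field(default_factory=list)

def commit_bulk_load(node: Node) -> None:
    node.commit_log.append(f"commit bulk load into {node.target.name}")  # persisted for atomicity
    node.target.data_files.extend(node.temp.data_files)   # transfer ownership of the file references...
    node.temp.data_files.clear()                          # ...the file contents themselves are untouched

if __name__ == "__main__":
    n = Node(temp=Table("users_tmp", ["batch-0001.db"]), target=Table("users", ["base-0000.db"]))
    commit_bulk_load(n)
    print(n.target.data_files, n.temp.data_files)
```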
[0039] Figure 7 illustrates the status of the distributed database after the batch has been committed according to an embodiment. All subsequent read operations will see data from the target table as well as data from the batch operation. This is accomplished by, for example, sending a read operation 700 from a client 702 to one of the nodes in the cluster which will act as the coordinator for the read request. That node 704 will then forward the read operation towards all or some of the nodes 706, 708, 710 holding a replica for the relevant shard. Each such node will then retrieve data from the table, exemplified in Figure 7 by table 712 stored by node 710, by scanning both the original data files 714 as well as the data files created by the batch operation 716. The result is consolidated before it is sent back 718 to the coordinator node 704, and back to the client via signal 720. As part of a compaction process, the newly committed files 716, as well as those files 714 in the target table 712 will eventually be consolidated together, and temporary table 718 will be deleted.
[0040] Briefly, regarding the compaction process, those skilled in the art will recognize that data is stored in distributed databases in immutable files for persistence. This gives the system the ability to avoid in-place updates, which improves performance. However, immutability means that eventually the disk will be filled by immutable files. This is typically solved by compaction. Compaction takes multiple immutable files and merges them together into one new file. The old files are then deleted to free space. The compaction process is used in these embodiments to also remove the temporary tables which were created to aid in the bulk loading, after the batch is committed.
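A hypothetical compaction pass, consistent with the description above, is sketched below: several immutable data files (modelled here as JSON files of key-to-row mappings) are merged into one new file and the old files are then deleted to free space. The file layout and format are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import List

def compact(data_files: List[Path], out_file: Path) -> Path:
    """Merge several immutable data files into one new file, then delete the old files."""
    merged = {}
    for f in sorted(data_files):              # later files win on key collisions
        merged.update(json.loads(f.read_text()))
    out_file.write_text(json.dumps(merged))   # write the single new immutable file first
    for f in data_files:                      # only then delete the old files to free space
        f.unlink()
    return out_file

if __name__ == "__main__":
    a, b = Path("sstable-1.json"), Path("sstable-2.json")
    a.write_text(json.dumps({"k1": {"v": 1}}))
    b.write_text(json.dumps({"k1": {"v": 2}, "k2": {"v": 3}}))
    print(compact([a, b], Path("sstable-3.json")).read_text())
```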
[0041] From the foregoing description, it will be apparent that the embodiments provided herein enable the loading of batches in a distributed database system by, among other things, merging two identically structured tables. The merge takes place without moving any data, since compaction, which runs on the immutable files regardless, will take care of the data movement by automatically merging temporary data files with other target data files. Some embodiments thus do not have the same memory constraints as existing batch solutions while still being an atomic operation on the cluster of nodes.
[0042] In addition to distributed database systems, some embodiments can be expressed as methods for performing the bulk loading of data into a distributed database, an example of which is the method 800 illustrated in the flowchart of Figure 8. Therein, at step 802, a plurality of temporary tables are created, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some of the client applications of the distributed database. The data to be bulk loaded is inserted into each of the temporary tables whose corresponding node owns a shard of that data at step 804. The bulk loaded data is committed to the distributed database at step 806, after which the data inserted into the temporary tables becomes available to the at least some of the client applications.
[0043] Each of the steps described above and illustrated in Figure 8 can be performed by modules. For example, step 802 can be performed by a first module that is appropriately configured, step 804 can be performed by a second module that is appropriately configured, and step 806 can be performed by a third module that is appropriately configured.
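A compact, hypothetical mirror of method 800 is given below: create the temporary tables (step 802), insert each row into the temporary table of every node that owns a shard of it (step 804), then commit so the temporary data becomes part of the target tables (step 806). Shard ownership is deliberately simplified here so that every node receives every row; nothing in this sketch is taken verbatim from the patent.

```python
from typing import Dict, List

def bulk_load(rows: List[Dict[str, str]], nodes: Dict[str, dict]) -> None:
    for node in nodes.values():                 # step 802: one hidden temporary table per node
        node["temp"] = []
    for row in rows:                            # step 804: insert into the temporary tables
        for node in nodes.values():
            node["temp"].append(row)
    for node in nodes.values():                 # step 806: commit; the data becomes visible
        node["target"].extend(node.pop("temp"))

if __name__ == "__main__":
    cluster = {"node-1": {"target": []}, "node-2": {"target": []}}
    bulk_load([{"id": "1", "name": "alice"}], cluster)
    print(cluster)
```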
[0044] Such embodiments have various benefits. For example, batches of data are atomic and can be loaded into the database without memory constraints, since the batch can be created using several small statements. This feature allows batches of any size to be bulk loaded, as embodiments are not bound by memory in a single node. Embodiments also enable rollback of the batch before it has been committed. Furthermore, the data being loaded will be stored in the correct shard right away using a single write operation for each statement, albeit isolated from some clients until a commit is issued.
[0045] For completeness, some elements which may be present in an exemplary node 900 of a distributed database are shown in Figure 9. Therein, the node includes one or more processors 902 and one or more memory devices 904, e.g., RAM, as well as secondary memory 906 which can be used to store the database data in node 900. These (and potentially other) elements of node 900 can communicate with one another via interconnect 908, and with other nodes via interface 910. These elements can interact to send, receive and process the signaling described above with respect to Figures 4-8.
[0046] The disclosed embodiments provide methods and devices for the bulk loading of data in a distributed database using temporary tables. It should be understood that this description is not intended to limit the invention. On the contrary, the embodiments are intended to cover alternatives, modifications and equivalents, which are included in the spirit and scope of the invention. Further, in the detailed description of the embodiments, numerous specific details are set forth in order to provide a comprehensive understanding of the claimed invention. However, one skilled in the art would understand that various embodiments may be practiced without such specific details.
[0047] As also will be appreciated by one skilled in the art, the embodiments may take the form of an entirely hardware embodiment or an embodiment combining hardware and software aspects. Further, the embodiments, e.g., the creation of temporary tables and loading of data into the temporary tables, can be implemented using a computer program product stored on a computer-readable storage medium having computer-readable instructions embodied in the medium. Any suitable computer-readable medium or carrier may be utilized, including RAM, hard disks, CD-ROMs (an example of which is illustrated as CD-ROM 1000 in Figure 10), digital versatile disc (DVD), optical storage devices, or magnetic storage devices such as floppy disk or magnetic tape. Other non-limiting examples of computer-readable media include flash-type memories or other known memories.
[0048] Although the features and elements of the present embodiments are described in the embodiments in particular combinations, each feature or element can be used alone without the other features and elements of the embodiments or in various combinations with or without other features and elements disclosed herein. The methods or flowcharts provided in the present application may be implemented in a computer program, software or firmware tangibly embodied in a computer-readable storage medium for execution by a specifically programmed computer or processor.

Claims

WHAT IS CLAIMED IS:
1. A method for bulk loading of data into a distributed database comprising:
creating a plurality of temporary tables, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some client applications of the distributed database;
inserting the data to be bulk loaded into each of the temporary tables whose corresponding node owns a shard of that data; and
committing the bulk loaded data to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some client applications.
2. The method of claim 1, wherein the temporary tables are accessible by at least one client application which is involved with the bulk loading of data, but not by other client applications until after the step of committing is performed.
3. The method of claims 1-2, further comprising:
merging the temporary tables with corresponding target tables; and
deleting the temporary tables.
4. The method of claim 3, wherein the temporary tables and their corresponding target tables have identical table formats.
5. The method of claims 1-4, wherein the step of creating the plurality of temporary tables further comprises:
receiving, at one of the plurality of different nodes selected as a coordinator node, an instruction to start the bulk loading of data;
creating one of the temporary tables at the coordinator node; and
sending, by the coordinator node, an instruction to each of the other nodes to create a temporary table.
6. The method of claim 5, further comprising:
receiving the data to be bulk loaded into the distributed database after receiving the instruction to start the bulk loading of data.
7. The method of claim 6, wherein the data to be bulk loaded is received at different ones of the plurality of nodes.
8. The method of claim 3, wherein the step of merging is performed as part of a compaction process in the distributed database.
9. The method of claims 1-8, wherein the distributed database includes the plurality of different nodes, each node having at least one processor and at least one memory device for storing data in the distributed database, and wherein the data associated with a single, logical table structure in the distributed database does not all reside in one of the plurality of different nodes but is instead distributed across multiple nodes.
10. The method of claims 1-9, further comprising:
prior to committing the bulk loaded data to the distributed database, cancelling the bulk loading operation by deleting the plurality of temporary tables.
11. A distributed database system comprising:
a plurality of nodes, at least some of said nodes each including:
a communications interface configured to receive and to transmit instructions and data to be bulk loaded into the distributed database system;
at least one storage device configured to store data; and
at least one processor configured to create a temporary table, the temporary table being temporarily unavailable to at least some of the client applications of the distributed database, further configured to insert data into the temporary table when the node owns a shard of that data; and further configured to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary table becomes available to the at least some of the client applications.
12. The system of claim 11, wherein the temporary table is accessible by at least one client application which is involved with the bulk loading of data, but not to other client applications until after the at least one processor commits the bulk loaded data to the distributed database.
13. The system of claims 11-12, wherein the at least one processor is further configured to merge the temporary table with a corresponding target table and to delete the temporary table.
14. The system of claim 13, wherein the temporary table and its corresponding target table have identical table formats.
15. The system of claims 11-14, wherein the processor is further configured to create the plurality of temporary tables by:
receiving, when the node is selected as a coordinator node, an instruction to start the bulk loading of data;
creating the temporary tables; and
sending an instruction to each of the other at least some of the nodes to create a temporary table.
16. The system of claim 15, wherein the communications interface receives data to be bulk loaded into the distributed database after receiving the instruction to start the bulk loading of data.
17. The system of claim 16, wherein the data to be bulk loaded is received at different ones of the plurality of nodes.
18. The system of claim 13, wherein the at least one processor is configured to merge the temporary table with the target table as part of a compaction process in the distributed database.
19. The system of claims 11-18, wherein the data associated with a single, logical table structure in the distributed database does not all reside in one of the plurality of different nodes but is instead distributed across multiple nodes.
20. The system of claims 11-19, wherein the at least one processor is further configured to, prior to committing the bulk loaded data to the distributed database, cancel the bulk loading operation by deleting the plurality of temporary tables.
21. An apparatus adapted to create a plurality of temporary tables, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some client applications of the distributed database, to insert the data to be bulk loaded into each of the temporary tables whose corresponding node owns a shard of that data, and to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some client applications.
22. A computer program, comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any one of claims 1-10.
23. A carrier containing the computer program of claim 22, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, or a computer readable storage medium.
24. An apparatus comprising:
a first module configured to create a plurality of temporary tables, each temporary table being associated with a different node in the distributed database, which temporary tables are temporarily unavailable to at least some client applications of the distributed database;
a second module configured to insert the data to be bulk loaded into each of the temporary tables whose corresponding node owns a shard of that data; and
a third module configured to commit the bulk loaded data to the distributed database, after which the data inserted into the temporary tables becomes available to the at least some client applications.
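As a further hedged, non-limiting sketch (not part of the claims themselves), the coordinator-driven creation of temporary tables recited in claims 5-7 and 15-17 and the pre-commit cancellation of claims 10 and 20 could take roughly the following shape. The PeerNode and CoordinatorNode classes, their method names, and the in-process "remote" calls are hypothetical stand-ins invented for this illustration; a real cluster would use its own RPC or messaging mechanism.

```python
# Hypothetical sketch: the node that receives the "start bulk load" instruction
# creates its own temporary table and instructs every other node to do the same;
# cancelling before commit simply deletes the temporary tables everywhere.
from typing import Dict, Iterable, List


class PeerNode:
    """Minimal stand-in for a remote node reachable over the cluster's messaging layer."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.temp_tables: Dict[str, List[dict]] = {}

    def create_temp_table(self, temp_name: str) -> None:
        self.temp_tables[temp_name] = []

    def drop_temp_table(self, temp_name: str) -> None:
        self.temp_tables.pop(temp_name, None)


class CoordinatorNode(PeerNode):
    """The node selected as coordinator for a given bulk-load operation."""

    def __init__(self, name: str, peers: Iterable[PeerNode]) -> None:
        super().__init__(name)
        self.peers = list(peers)

    def start_bulk_load(self, temp_name: str) -> None:
        # Create a temporary table locally first ...
        self.create_temp_table(temp_name)
        # ... then instruct each of the other nodes to create one as well.
        for peer in self.peers:
            peer.create_temp_table(temp_name)

    def cancel_bulk_load(self, temp_name: str) -> None:
        # Before commit, the operation can be abandoned simply by deleting the
        # temporary tables on every node; the target tables are never touched.
        self.drop_temp_table(temp_name)
        for peer in self.peers:
            peer.drop_temp_table(temp_name)


if __name__ == "__main__":
    peers = [PeerNode("n2"), PeerNode("n3")]
    coordinator = CoordinatorNode("n1", peers)
    coordinator.start_bulk_load("users__bulk_tmp")
    coordinator.cancel_bulk_load("users__bulk_tmp")
    print(all(not n.temp_tables for n in [coordinator, *peers]))  # True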

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/069283 WO2018028797A1 (en) 2016-08-12 2016-08-12 Methods and systems for bulk loading of data into a distributed database

Publications (1)

Publication Number Publication Date
WO2018028797A1 true WO2018028797A1 (en) 2018-02-15

Family

ID=56800274

Country Status (1)

Country Link
WO (1) WO2018028797A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489403A (en) * 2019-08-27 2019-11-22 江苏华库数据技术有限公司 A kind of method of the pretreatment and storage of high-volume data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050187977A1 (en) * 2004-02-21 2005-08-25 Datallegro, Inc. Ultra-shared-nothing parallel database
US20080120304A1 (en) * 2006-11-21 2008-05-22 Calio Robert J Method and system for providing high performance data modification of relational database tables

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Database tuning - Wikipedia", 28 July 2016 (2016-07-28), XP055343533, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Database_tuning&oldid=731895697> [retrieved on 20170208] *

Legal Events

Code Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 16756984; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 16756984; country of ref document: EP; kind code of ref document: A1)