US20130318059A1 - Transfer of data from transactional data sources to partitioned databases in restartable environment - Google Patents

Transfer of data from transactional data sources to partitioned databases in restartable environment Download PDF

Info

Publication number
US20130318059A1
US20130318059A1 US13/956,159 US201313956159A US2013318059A1 US 20130318059 A1 US20130318059 A1 US 20130318059A1 US 201313956159 A US201313956159 A US 201313956159A US 2013318059 A1 US2013318059 A1 US 2013318059A1
Authority
US
United States
Prior art keywords
partition
message
records
messages
transactional data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/956,159
Inventor
Jing-Song Jang
James M. McArdle
Michael J.E. Spicer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ServiceNow Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US13/956,159 priority Critical patent/US20130318059A1/en
Publication of US20130318059A1 publication Critical patent/US20130318059A1/en
Assigned to SERVICENOW, INC. reassignment SERVICENOW, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Abandoned legal-status Critical Current

Links

Images

Classifications

    • G06F17/30371
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1474Saving, restoring, recovering or retrying in transactions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating

Definitions

  • Transferring large volumes of data from a transactional data source to a partitioned database in a restartable environment requires high data throughput, no data loss, and no data duplication.
  • High data throughput allows for real-time transfer of the large volumes of data. Additionally, if the transfer of the large volumes of data fails before all of the data have been transferred and is later restarted, then there needs to be some assurances that any data not yet transferred is not lost and any data already transferred is not transferred again.
  • Method, system, and computer program product for transferring data from transactional data sources to partitioned databases are provided.
  • one or more messages from a transactional data source are read.
  • Each message includes one or more records to be inserted into a database comprising a plurality of partitions.
  • One of the partitions of the database in which the one or more records of each message are to be inserted is then identified.
  • a transaction is initiated to insert the one or more records of each message into the one partition and to delete the one or more messages from the transactional data source.
  • the transaction is committed only when insertion of the one or more records of each message into the one partition and deletion of the one or more messages from the transactional data source succeed.
  • FIG. 1 depicts a process for transferring data from transactional data sources to partitioned databases according to an implementation.
  • FIG. 2 illustrates a system according to an implementation.
  • FIG. 3 is a block diagram of a data processing system with which implementations of this disclosure can be implemented.
  • This disclosure generally relates to transferring data from transactional data sources to partitioned databases in restartable environments.
  • the following description is provided in the context of a patent application and its requirements. Accordingly, this disclosure is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features described herein.
  • a transactional data source may be, for instance, a queue storing messages or other types of data packets.
  • the types of data that may be stored in a transactional data source include, for instance, stock prices, GPS (Global Positioning System) locations, RFID (Radio Frequency Identification) tags, or the like.
  • a partitioned database is a database that has been divided into multiple partitions based on values in one or more database table columns. Each column upon which partitioning of a database is based is also referred to as a partitioning key.
  • a restartable environment is an environment that can restart where it left off when failure occurred.
  • High data throughput is required because the type and amount of data being transferred need to be transferred in real-time. No data loss is required because if data transfer fails as a result of, for instance, system shut down, application error, loss of connection, or the like, and is later restarted, then data that has not been successfully transferred cannot be lost. No data duplication is required because in the event of a failure and restart, any data that has already been successfully transferred cannot be transferred again. Otherwise, databases will have duplicated data.
  • FIG. 1 Depicted in FIG. 1 is a process 100 for transferring data from transactional data sources to partitioned databases according to an implementation.
  • one or more messages are read from a transactional data source.
  • Each message includes one or more records to be inserted into a database that comprises a plurality of partitions.
  • One of the plurality of partitions of the database in which the one or more records of each message are to be inserted is identified at 104 .
  • a single transaction is then initiated at 106 to insert the one or more records of each message into the one partition and to delete the one or more messages from the transactional data source. Combining the insertion of the one or more records and the deletion of the one or more messages into the single transaction ensures that neither operation can commit (e.g., written to disk) without the other.
  • the order in which the insertion of the one or more records and the deletion of the one or more messages are carried out does not matter.
  • Failure of an insertion into the database or deletion from the transactional data source may be caused by any number of things, such as, the database crashing, the transactional data source crashing, connection to the database being lost, connection to the transactional data source being lost, and so forth.
  • the insertion of record(s) and the deletion of message(s) are part of the same transaction, the insertion cannot commit without the deletion committing, and vice versa. This ensures no data loss and no data duplication when data transfer is restarted after failure.
  • System 200 includes a transactional data source 202 , a data stream engine 204 , and a database 206 .
  • Transactional data source 202 includes a plurality of messages 208 a - 208 c. Each message 208 includes one or more records 210 .
  • transactional data source 202 is a queue.
  • Database 206 includes a plurality of partitions 212 a - 212 b.
  • system 200 may include additional entities, such as, database server(s), other data source(s), other database(s), or the like.
  • transactional data source 202 may include other messages (not illustrated).
  • Further database 206 may include other partitions (not illustrated).
  • aspects of the present invention may be embodied as a system, method or computer program product.
  • aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.”
  • aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Data stream engine 204 includes a feed handler 214 , a sorter 216 , and flushers 218 a - 218 b.
  • Feed handler 214 reads messages 208 a - 208 c from transactional data source 202 .
  • Messages 208 a - 208 c may be read together (e.g., around the same time) or separately (e.g., at different times).
  • sorter 216 identifies one of partitions 212 a - 212 b of database 206 in which to insert the one or more records 210 of the message 208 and directs the one or more records 210 of the message 208 to the flusher 218 servicing the one partition 212 .
  • sorter 216 has identified partition 212 a of database 206 for messages 208 a and 208 b and partition 212 b of database 206 for message 208 c.
  • record 210 a of message 208 a and records 210 b - 210 d of message 208 b have been directed to flusher 218 a, which services partition 212 a
  • records 210 e - 210 f of message 208 c have been directed to flusher 218 b, which services partition 212 b.
  • Each flusher 218 is directly connected to the partition 212 serviced by the flusher 218 .
  • Flusher 218 a then initiates a transaction to insert records 210 a - 210 d into partition 212 a and to delete messages 208 a - 208 b from transactional data source 202 .
  • separate transactions may be initiated by flusher 218 a for records 210 of each message.
  • Flusher 218 b initiates a separate transaction to insert records 210 e - 210 f into partition 212 b and to delete message 208 c from transactional data source 202 .
  • the transaction initiated by flusher 218 b may be running in parallel with the transaction initiated by flusher 218 a.
  • data stream engine 204 may be included in data stream engine 204 .
  • additional flushers may be added to data stream engine 204 to service other partitions (not illustrated) of database 206 and/or to have multiple flushers servicing each partition 212 of database 206 .
  • message 208 a may be directed to one flusher
  • message 208 b may be directed to another flusher.
  • the number of flushers 218 servicing each partition 212 need not be the same.
  • flusher utilization scales linearly. For example, if data throughput to database 206 is 40,000 records/second (i.e., 20,000 records/second to each partition 212 ), then adding a new partition (not illustrated) to database 206 and a new flusher (not illustrated) to data stream engine 204 to service the new partition will increase data throughput to 60,000 records/second.
  • Messages 208 included in each record 210 may be destined for more than one partition 212 of database 206 .
  • record 210 c of message 208 b may be destined for partition 212 b rather than partition 212 a.
  • sorter 216 may identify the partition 212 in which to insert the records 210 of the message 208 based on, for instance, the partition 212 in which most of the records 210 of the message 208 are destined, the partition 212 in which the first record 210 of the message 208 is destined, or the like.
  • the partition 212 in which each record 210 of the message 208 is destined may be determined by comparing one or more partitioning key values of the record 210 to a partitioning key lookup table for database 206 .
  • a partitioning key value of a record is the value in a field of the record that corresponds to a partitioning key of a database.
  • Data stream engine 204 may be implemented through one or more processors, memories, and/or other computer hardware. Thus, when feed handler 214 reads messages 208 from transactional data source 202 , those messages 208 may be stored in memory. Feed handler 214 , sorter 216 , and each flusher 218 can be implemented as a separate thread. To give an example, a first thread acting as feed handler 214 browses transactional data source 202 and reads messages 208 from transactional data source 202 without removing messages 208 from transactional data source 202 .
  • the first thread submits records 210 from messages 208 to a second thread acting as sorter 216 .
  • the second thread determines which partition 212 in database 206 records 210 from each message 208 should be written to and directs records 210 of each message 208 to a third thread acting as flusher 218 a or a fourth thread acting as flusher 218 b, which are connected to partition 212 a and partition 212 b, respectively.
  • a transaction combining deletion of messages 208 a - 208 b from transactional data source 202 and insertion of records 210 a - 210 d into partition 212 a of database 206 is initiated by the third thread.
  • a separate transaction combining deletion of message 208 c from transactional data source 202 and insertion of records 210 e - 210 f into partition 212 b of database 206 is initiated by the fourth thread.
  • the delete of the message from the transactional data source and the insert of records in the message to the database are performed by the same thread because all operations in the same transaction must be from the same thread.
  • the third thread may initiate a separate transaction for each message 208 a and 208 b. However, all records of a single message are part of the same transaction.
  • the third and fourth threads may manage their own transactions using, for instance, the XA 2-phase commit protocol defined by the X/Open Distributed Transaction Processing (DTP) group. Consequently, there may be multiple parallel transactions running, one or more for each database partition with transaction data specifically directed to that database partition, which minimizes crosstalk (e.g., re-routing records from one partition to another).
  • Each thread may have an associated memory queue in which data to be processed by a thread is placed on the memory queue associated with the thread.
  • any messages involved in transactions not yet committed will still be in transactional data source 202 and available for processing again. If connection to transactional data source 202 or database 206 is lost, then all records in memory from the data source or destined for the database are discarded, any remaining connections are disconnected, all connections are then re-established, and all discarded data will be on transactional data source 202 for re-processing. This allows all data to be processed and inserted into database 206 once without duplicates and prevents data from being lost even with failures and restarts.
  • a failure e.g., system crash
  • the disclosure can take the form of an entirely hardware implementation, an entirely software implementation, or an implementation containing both hardware and software elements.
  • this disclosure is implemented in software, which includes, but is not limited to, application software, firmware, resident software, microcode, etc.
  • this disclosure can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • the medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk.
  • Current examples of optical disks include DVD, compact disk-read-only memory (CD-ROM), and compact disk-read/write (CD-R/W).
  • FIG. 3 depicts a data processing system 300 suitable for storing and/or executing program code.
  • Data processing system 300 includes a processor 302 coupled to memory elements 304 a - b through a system bus 306 .
  • data processing system 300 may include more than one processor and each processor may be coupled directly or indirectly to one or more memory elements through a system bus.
  • Memory elements 304 a - b can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution.
  • I/O devices 308 a - b are coupled to data processing system 300 .
  • I/O devices 308 a - b may be coupled to data processing system 300 directly or indirectly through intervening I/O controllers (not shown).
  • a network adapter 310 is coupled to data processing system 300 to enable data processing system 300 to become coupled to other data processing systems or remote printers or storage devices through communication link 312 .
  • Communication link 312 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

Abstract

Method, system, and computer program product for transferring data from transactional data sources to partitioned databases are provided. One or more messages from a transactional data source are read. Each message includes one or more records to be inserted into a database comprising a plurality of partitions. One of the partitions of the database in which the one or more records of each message are to be inserted is then identified. A transaction is initiated to insert the one or more records of each message into the one partition and to delete the one or more messages from the transactional data source. The transaction is committed only when insertion of the one or more records of each message into the one partition and deletion of the one or more messages from the transactional data source succeed.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a divisional of co-pending U.S. patent application Ser. No. 12/016,092, filed Jan. 17, 2008. The aforementioned related patent application is herein incorporated by reference in its entirety.
  • BACKGROUND
  • Transferring large volumes of data from a transactional data source to a partitioned database in a restartable environment requires high data throughput, no data loss, and no data duplication. High data throughput allows for real-time transfer of the large volumes of data. Additionally, if the transfer of the large volumes of data fails before all of the data have been transferred and is later restarted, then there needs to be some assurances that any data not yet transferred is not lost and any data already transferred is not transferred again.
  • SUMMARY
  • Method, system, and computer program product for transferring data from transactional data sources to partitioned databases are provided. In one implementation, one or more messages from a transactional data source are read. Each message includes one or more records to be inserted into a database comprising a plurality of partitions. One of the partitions of the database in which the one or more records of each message are to be inserted is then identified. A transaction is initiated to insert the one or more records of each message into the one partition and to delete the one or more messages from the transactional data source. The transaction is committed only when insertion of the one or more records of each message into the one partition and deletion of the one or more messages from the transactional data source succeed.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 depicts a process for transferring data from transactional data sources to partitioned databases according to an implementation.
  • FIG. 2 illustrates a system according to an implementation.
  • FIG. 3 is a block diagram of a data processing system with which implementations of this disclosure can be implemented.
  • DETAILED DESCRIPTION
  • This disclosure generally relates to transferring data from transactional data sources to partitioned databases in restartable environments. The following description is provided in the context of a patent application and its requirements. Accordingly, this disclosure is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features described herein.
  • High data throughput, no data loss, and no data duplication are required when transferring large volumes of data from transactional data sources to partitioned databases in restartable environments. A transactional data source may be, for instance, a queue storing messages or other types of data packets. The types of data that may be stored in a transactional data source include, for instance, stock prices, GPS (Global Positioning System) locations, RFID (Radio Frequency Identification) tags, or the like. A partitioned database is a database that has been divided into multiple partitions based on values in one or more database table columns. Each column upon which partitioning of a database is based is also referred to as a partitioning key. A restartable environment is an environment that can restart where it left off when failure occurred.
  • High data throughput is required because the type and amount of data being transferred need to be transferred in real-time. No data loss is required because if data transfer fails as a result of, for instance, system shut down, application error, loss of connection, or the like, and is later restarted, then data that has not been successfully transferred cannot be lost. No data duplication is required because in the event of a failure and restart, any data that has already been successfully transferred cannot be transferred again. Otherwise, databases will have duplicated data.
  • Depicted in FIG. 1 is a process 100 for transferring data from transactional data sources to partitioned databases according to an implementation. At 102, one or more messages are read from a transactional data source. Each message includes one or more records to be inserted into a database that comprises a plurality of partitions.
  • One of the plurality of partitions of the database in which the one or more records of each message are to be inserted is identified at 104. A single transaction is then initiated at 106 to insert the one or more records of each message into the one partition and to delete the one or more messages from the transactional data source. Combining the insertion of the one or more records and the deletion of the one or more messages into the single transaction ensures that neither operation can commit (e.g., written to disk) without the other.
  • A determination is made at 108 as to whether insertion of the one or more records of each message into the one partition succeeded. If insertion of the one or more records of each message into the one partition succeeded, then a determination is made at 110 as to whether deletion of the one or more messages from the transactional data source succeeded. If deletion of the one or more messages from the transactional data source also succeeded, then the single transaction is committed at 112. The order in which the insertion of the one or more records and the deletion of the one or more messages are carried out does not matter.
  • On the other hand, if it is determined at 108 that insertion of the one or more records of each message into the one partition failed (e.g., at least one record of at least one message was not successfully inserted into the one partition) or if it is determined at 110 that deletion of the one or more messages from the transactional data source failed (e.g., at least one message was not successfully deleted from the transactional data source), then the single transaction is rolled back at 114. Data loss is thus avoided when record insertion fails because message deletion will be rolled back. In addition, data duplication is avoided when message deletion fails because record insertion will be rolled back.
  • Failure of an insertion into the database or deletion from the transactional data source may be caused by any number of things, such as, the database crashing, the transactional data source crashing, connection to the database being lost, connection to the transactional data source being lost, and so forth. However, because the insertion of record(s) and the deletion of message(s) are part of the same transaction, the insertion cannot commit without the deletion committing, and vice versa. This ensures no data loss and no data duplication when data transfer is restarted after failure.
  • Illustrated in FIG. 2 is a system 200 according to an implementation. System 200 includes a transactional data source 202, a data stream engine 204, and a database 206. Transactional data source 202 includes a plurality of messages 208 a-208 c. Each message 208 includes one or more records 210. In one implementation, transactional data source 202 is a queue. Database 206 includes a plurality of partitions 212 a-212 b.
  • Although not illustrated in FIG. 2, system 200 may include additional entities, such as, database server(s), other data source(s), other database(s), or the like. In addition, transactional data source 202 may include other messages (not illustrated). Further database 206 may include other partitions (not illustrated). As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Data stream engine 204 includes a feed handler 214, a sorter 216, and flushers 218 a-218 b. Feed handler 214 reads messages 208 a-208 c from transactional data source 202. Messages 208 a-208 c may be read together (e.g., around the same time) or separately (e.g., at different times). For each message 208 read by feed handler 214, sorter 216 identifies one of partitions 212 a-212 b of database 206 in which to insert the one or more records 210 of the message 208 and directs the one or more records 210 of the message 208 to the flusher 218 servicing the one partition 212.
  • In the implementation, sorter 216 has identified partition 212 a of database 206 for messages 208 a and 208 b and partition 212 b of database 206 for message 208 c. As a result, record 210 a of message 208 a and records 210 b-210 d of message 208 b have been directed to flusher 218 a, which services partition 212 a, and records 210 e-210 f of message 208 c have been directed to flusher 218 b, which services partition 212 b. Each flusher 218 is directly connected to the partition 212 serviced by the flusher 218.
  • Flusher 218 a then initiates a transaction to insert records 210 a-210 d into partition 212 a and to delete messages 208 a-208 b from transactional data source 202. Rather than initiating a single transaction for records 210 of both messages 208 a and 208 b, separate transactions may be initiated by flusher 218 a for records 210 of each message. Flusher 218 b initiates a separate transaction to insert records 210 e-210 f into partition 212 b and to delete message 208 c from transactional data source 202. The transaction initiated by flusher 218 b may be running in parallel with the transaction initiated by flusher 218 a.
  • Other components (not shown) may be included in data stream engine 204. For example, additional flushers (not illustrated) may be added to data stream engine 204 to service other partitions (not illustrated) of database 206 and/or to have multiple flushers servicing each partition 212 of database 206. Hence, if more than one flusher 218 is servicing partition 212 a, then message 208 a may be directed to one flusher, while message 208 b may be directed to another flusher. The number of flushers 218 servicing each partition 212 need not be the same.
  • By pre-sorting messages based which partition records of each message are destined and having a direct connection between each flusher and corresponding partition, higher data throughput is possible because database 206 will no longer have to worry about routing records 210 to the correct partition 212. In addition, flusher utilization scales linearly. For example, if data throughput to database 206 is 40,000 records/second (i.e., 20,000 records/second to each partition 212), then adding a new partition (not illustrated) to database 206 and a new flusher (not illustrated) to data stream engine 204 to service the new partition will increase data throughput to 60,000 records/second.
  • Messages 208 included in each record 210 may be destined for more than one partition 212 of database 206. For example, record 210 c of message 208 b may be destined for partition 212 b rather than partition 212 a. If a message 208 includes records 210 destined for more than one partition 212 of database 206, then sorter 216 may identify the partition 212 in which to insert the records 210 of the message 208 based on, for instance, the partition 212 in which most of the records 210 of the message 208 are destined, the partition 212 in which the first record 210 of the message 208 is destined, or the like.
  • The partition 212 in which each record 210 of the message 208 is destined may be determined by comparing one or more partitioning key values of the record 210 to a partitioning key lookup table for database 206. A partitioning key value of a record is the value in a field of the record that corresponds to a partitioning key of a database. Once all records 210 of the message 208 are inserted into a particular partition 212 of database 206, those records 210 not destined for the particular partition 212 will have to be re-routed to the correct partition 212. However, this should be a rare occurrence as records in the same message will usually cluster around the same partitioning key values.
  • Data stream engine 204 may be implemented through one or more processors, memories, and/or other computer hardware. Thus, when feed handler 214 reads messages 208 from transactional data source 202, those messages 208 may be stored in memory. Feed handler 214, sorter 216, and each flusher 218 can be implemented as a separate thread. To give an example, a first thread acting as feed handler 214 browses transactional data source 202 and reads messages 208 from transactional data source 202 without removing messages 208 from transactional data source 202.
  • The first thread submits records 210 from messages 208 to a second thread acting as sorter 216. The second thread then determines which partition 212 in database 206 records 210 from each message 208 should be written to and directs records 210 of each message 208 to a third thread acting as flusher 218 a or a fourth thread acting as flusher 218 b, which are connected to partition 212 a and partition 212 b, respectively.
  • A transaction combining deletion of messages 208 a-208 b from transactional data source 202 and insertion of records 210 a-210 d into partition 212 a of database 206 is initiated by the third thread. A separate transaction combining deletion of message 208 c from transactional data source 202 and insertion of records 210 e-210 f into partition 212 b of database 206 is initiated by the fourth thread. The delete of the message from the transactional data source and the insert of records in the message to the database are performed by the same thread because all operations in the same transaction must be from the same thread. The third thread may initiate a separate transaction for each message 208 a and 208 b. However, all records of a single message are part of the same transaction.
  • The third and fourth threads may manage their own transactions using, for instance, the XA 2-phase commit protocol defined by the X/Open Distributed Transaction Processing (DTP) group. Consequently, there may be multiple parallel transactions running, one or more for each database partition with transaction data specifically directed to that database partition, which minimizes crosstalk (e.g., re-routing records from one partition to another). Each thread may have an associated memory queue in which data to be processed by a thread is placed on the memory queue associated with the thread.
  • Upon a restart of data stream engine 204 after a failure (e.g., system crash), any messages involved in transactions not yet committed will still be in transactional data source 202 and available for processing again. If connection to transactional data source 202 or database 206 is lost, then all records in memory from the data source or destined for the database are discarded, any remaining connections are disconnected, all connections are then re-established, and all discarded data will be on transactional data source 202 for re-processing. This allows all data to be processed and inserted into database 206 once without duplicates and prevents data from being lost even with failures and restarts.
  • By pre-sorting messages, combining record insertion and message deletion into a single transaction, and having a dedicated connection to each database partition, higher data throughput, no data loss, and no data duplication are achieved when transferring large volumes of data from transactional data sources to partitioned databases in restartable environments.
  • The disclosure can take the form of an entirely hardware implementation, an entirely software implementation, or an implementation containing both hardware and software elements. In one implementation, this disclosure is implemented in software, which includes, but is not limited to, application software, firmware, resident software, microcode, etc.
  • Furthermore, this disclosure can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include DVD, compact disk-read-only memory (CD-ROM), and compact disk-read/write (CD-R/W).
  • FIG. 3 depicts a data processing system 300 suitable for storing and/or executing program code. Data processing system 300 includes a processor 302 coupled to memory elements 304 a-b through a system bus 306. In other implementations, data processing system 300 may include more than one processor and each processor may be coupled directly or indirectly to one or more memory elements through a system bus.
  • Memory elements 304 a-b can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times the code must be retrieved from bulk storage during execution. As shown, input/output or I/O devices 308 a-b (including, but not limited to, keyboards, displays, pointing devices, etc.) are coupled to data processing system 300. I/O devices 308 a-b may be coupled to data processing system 300 directly or indirectly through intervening I/O controllers (not shown).
  • In the implementation, a network adapter 310 is coupled to data processing system 300 to enable data processing system 300 to become coupled to other data processing systems or remote printers or storage devices through communication link 312. Communication link 312 can be a private or public network. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
  • While various implementations for transferring data from transactional data sources to partitioned databases in restartable environments have been described, the technical scope of this disclosure is not limited thereto. For example, this disclosure is described in terms of particular systems having certain components and particular methods having certain steps in a certain order. One of ordinary skill in the art, however, will readily recognize that the methods described herein can, for instance, include additional steps and/or be in a different order, and that the systems described herein can, for instance, include additional or substitute components. Hence, various modifications or improvements can be added to the above implementations and those modifications or improvements fall within the technical scope of this disclosure.
  • While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (7)

What is claimed is:
1. A system for transferring data from transactional data sources to partitioned database, the method comprising a feed handler reading a first set of one or more messages from a transactional data source, each message in the first set comprising one or more records to be inserted into a database comprising a plurality of partitions a sorter in communication with the feed handler, the sorter identifying a first of the plurality of partitions of the database in which to insert the one or more records of each message in the first set; and a first flusher in communication with the sorter and the first partition, the first flusher initiating a first transaction to insert the one or more records of each message in the first set into the first partition and to delete the one or more messages in the first set from the transactional data source, and committing the first transaction only when insertion of the one or more records of each message in the first set into the first partition and deletion of the one or more messages in the first set from the transactional data source succeed.
2. The system of claim 1, wherein the first flusher further rolls back the first transaction when insertion of the one or more records of each message in the first set into the first partition or deletion of the one or more messages in the first set from the transactional data source fails.
3. The system of claim 1, wherein the feed handler further reads a second set of one or more messages from the transactional data source, each message in the second set comprising one or more records to be inserted into the database, the sorter further identifies a second of the plurality of partitions of the database in which to insert the one or more records of each message in the second set, and the system further comprises a second flusher in communication with the sorter and the second partition, the second flusher initiating a second transaction to insert the one or more records of each message in the second set into the second partition and to delete the one or more messages in the second set from the transactional data source, and committing the second transaction only when insertion of the one or more records of each message in the second set into the second partition and deletion of the one or more messages in the second set from the transactional data source succeed.
4. The system of claim 3, wherein the second flusher further rolls back the second transaction when insertion of the one or more records of each message in the second set into the second partition or deletion of the one or more messages in the second set from the transactional data source fails
5. The system of claim 3, wherein the first transaction and the second transaction are running in parallel.
6. The system of claim 3, wherein the second flusher is in communication with the second partition via at least one dedicated connection to the second partition.
7. The system of claim 1, wherein the first flusher is in communication with the first partition via at least one dedicated connection to the first partition.
US13/956,159 2008-01-17 2013-07-31 Transfer of data from transactional data sources to partitioned databases in restartable environment Abandoned US20130318059A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/956,159 US20130318059A1 (en) 2008-01-17 2013-07-31 Transfer of data from transactional data sources to partitioned databases in restartable environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/016,092 US8521682B2 (en) 2008-01-17 2008-01-17 Transfer of data from transactional data sources to partitioned databases in restartable environments
US13/956,159 US20130318059A1 (en) 2008-01-17 2013-07-31 Transfer of data from transactional data sources to partitioned databases in restartable environment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/016,092 Division US8521682B2 (en) 2008-01-17 2008-01-17 Transfer of data from transactional data sources to partitioned databases in restartable environments

Publications (1)

Publication Number Publication Date
US20130318059A1 true US20130318059A1 (en) 2013-11-28

Family

ID=40877483

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/016,092 Active 2030-11-06 US8521682B2 (en) 2008-01-17 2008-01-17 Transfer of data from transactional data sources to partitioned databases in restartable environments
US13/956,159 Abandoned US20130318059A1 (en) 2008-01-17 2013-07-31 Transfer of data from transactional data sources to partitioned databases in restartable environment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/016,092 Active 2030-11-06 US8521682B2 (en) 2008-01-17 2008-01-17 Transfer of data from transactional data sources to partitioned databases in restartable environments

Country Status (1)

Country Link
US (2) US8521682B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209726A (en) * 2014-08-15 2017-09-26 微软技术许可有限责任公司 Washing away in file system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933873B2 (en) * 2008-01-17 2011-04-26 International Business Machines Corporation Handling transfer of bad data to database partitions in restartable environments
US8521682B2 (en) 2008-01-17 2013-08-27 International Business Machines Corporation Transfer of data from transactional data sources to partitioned databases in restartable environments
US9110938B2 (en) * 2010-10-18 2015-08-18 International Business Machines Corporation Parallelizing I/O processing of index insertions during insertion of data into a database
US8874510B2 (en) * 2011-01-05 2014-10-28 Lenovo (Singapore) Pte. Ltd. Synchronizing files between base and detachable device
US11544258B2 (en) * 2020-02-05 2023-01-03 Ebay Inc. Selecting a host based on quality of stored data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040107389A1 (en) * 2002-11-30 2004-06-03 International Business Machines Corporation System and method for handling out-of-order data supplied by a real-time feed
US7653665B1 (en) * 2004-09-13 2010-01-26 Microsoft Corporation Systems and methods for avoiding database anomalies when maintaining constraints and indexes in presence of snapshot isolation

Family Cites Families (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6662307B1 (en) * 1993-06-14 2003-12-09 Unisys Corporation Disk recovery/reconstruction
US5960194A (en) * 1995-09-11 1999-09-28 International Business Machines Corporation Method for generating a multi-tiered index for partitioned data
US5838910A (en) * 1996-03-14 1998-11-17 Domenikos; Steven D. Systems and methods for executing application programs from a memory device linked to a server at an internet site
US6289474B1 (en) * 1998-06-24 2001-09-11 Torrent Systems, Inc. Computer system and process for checkpointing operations on data in a computer system by partitioning the data
US6473774B1 (en) * 1998-09-28 2002-10-29 Compaq Computer Corporation Method and apparatus for record addressing in partitioned files
US6480941B1 (en) * 1999-02-23 2002-11-12 International Business Machines Corporation Secure partitioning of shared memory based multiprocessor system
US6571282B1 (en) * 1999-08-31 2003-05-27 Accenture Llp Block-based communication in a communication services patterns environment
US20020049815A1 (en) * 2000-04-14 2002-04-25 Kayshav Dattatri System for monitoring and managing information and information transfers in a computer network
US7028056B1 (en) * 2000-04-14 2006-04-11 Microsoft Corporation Method and arrangements for generating debugging information following software failures
US6681348B1 (en) * 2000-12-15 2004-01-20 Microsoft Corporation Creation of mini dump files from full dump files
US20020161778A1 (en) * 2001-02-24 2002-10-31 Core Integration Partners, Inc. Method and system of data warehousing and building business intelligence using a data storage model
US6687798B1 (en) * 2001-05-31 2004-02-03 Oracle International Corporation Methods for intra-partition parallelism for inserts
US20030115207A1 (en) * 2001-09-25 2003-06-19 Bowman David M. Hierarchical hybrid OLAP analytics generators
US7403985B2 (en) * 2001-11-01 2008-07-22 Hewlett-Packard Development Company, L.P. Method and system for analyzing electronic service execution
US6898608B2 (en) * 2002-04-26 2005-05-24 Oracle International Corporation Techniques for managing what-if analysis of data managed by a relational database system
AU2003275005A1 (en) * 2002-09-18 2004-04-08 Netezza Corporation Intelligent storage device controller
JP4301849B2 (en) * 2003-03-31 2009-07-22 株式会社日立製作所 Information processing method and its execution system, its processing program, disaster recovery method and system, storage device for executing the processing, and its control processing method
US7174345B2 (en) * 2003-05-30 2007-02-06 Oracle International Corporation Methods and systems for auto-partitioning of schema objects
US7113942B2 (en) * 2003-06-27 2006-09-26 Microsoft Corporation Scalable storage and processing of hierarchical documents
US8095511B2 (en) * 2003-06-30 2012-01-10 Microsoft Corporation Database data recovery system and method
US6845384B2 (en) * 2003-08-01 2005-01-18 Oracle International Corporation One-phase commit in a shared-nothing database system
US7237139B2 (en) * 2003-08-07 2007-06-26 International Business Machines Corporation Services heuristics for computer adapter placement in logical partitioning operations
US7149929B2 (en) * 2003-08-25 2006-12-12 Hewlett-Packard Development Company, L.P. Method of and apparatus for cross-platform core dumping during dynamic binary translation
US7730489B1 (en) * 2003-12-10 2010-06-01 Oracle America, Inc. Horizontally scalable and reliable distributed transaction management in a clustered application server environment
US7490268B2 (en) * 2004-06-01 2009-02-10 The Trustees Of Columbia University In The City Of New York Methods and systems for repairing applications
US7779404B2 (en) * 2004-06-10 2010-08-17 Cisco Technology, Inc. Managing network device configuration using versioning and partitioning
US20060123204A1 (en) * 2004-12-02 2006-06-08 International Business Machines Corporation Method and system for shared input/output adapter in logically partitioned data processing system
US7415487B2 (en) * 2004-12-17 2008-08-19 Amazon Technologies, Inc. Apparatus and method for data warehousing
US8166059B2 (en) * 2005-07-08 2012-04-24 Oracle International Corporation Optimization of queries on a repository based on constraints on how the data is stored in the repository
US7702959B2 (en) * 2005-08-02 2010-04-20 Nhn Corporation Error management system and method of using the same
US7689565B1 (en) * 2006-06-28 2010-03-30 Emc Corporation Methods and apparatus for synchronizing network management data
US7933935B2 (en) * 2006-10-16 2011-04-26 Oracle International Corporation Efficient partitioning technique while managing large XML documents
US8521682B2 (en) 2008-01-17 2013-08-27 International Business Machines Corporation Transfer of data from transactional data sources to partitioned databases in restartable environments
US8156084B2 (en) * 2008-01-17 2012-04-10 International Business Machines Corporation Transfer of data from positional data sources to partitioned databases in restartable environments
US7933873B2 (en) * 2008-01-17 2011-04-26 International Business Machines Corporation Handling transfer of bad data to database partitions in restartable environments

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040107389A1 (en) * 2002-11-30 2004-06-03 International Business Machines Corporation System and method for handling out-of-order data supplied by a real-time feed
US7653665B1 (en) * 2004-09-13 2010-01-26 Microsoft Corporation Systems and methods for avoiding database anomalies when maintaining constraints and indexes in presence of snapshot isolation

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209726A (en) * 2014-08-15 2017-09-26 微软技术许可有限责任公司 Washing away in file system

Also Published As

Publication number Publication date
US8521682B2 (en) 2013-08-27
US20090187917A1 (en) 2009-07-23

Similar Documents

Publication Publication Date Title
US9779128B2 (en) System and method for massively parallel processing database
US20130318059A1 (en) Transfer of data from transactional data sources to partitioned databases in restartable environment
EP2474919B1 (en) System and method for data replication between heterogeneous databases
US9798792B2 (en) Replication for on-line hot-standby database
US7290015B1 (en) High availability via data services
US8150812B2 (en) Methods, apparatus and computer programs for data replication
CN110442560B (en) Log replay method, device, server and storage medium
US20170339238A1 (en) Large message support for a publish-subscribe messaging system
CN113396407A (en) System and method for augmenting database applications using blockchain techniques
CN110941502B (en) Message processing method, device, storage medium and equipment
US7801997B2 (en) Asynchronous interconnect protocol for a clustered DBMS
US7330860B2 (en) Fault tolerant mechanism to handle initial load of replicated object in live system
CN103092903A (en) Database Log Parallelization
US20110137874A1 (en) Methods to Minimize Communication in a Cluster Database System
US8156084B2 (en) Transfer of data from positional data sources to partitioned databases in restartable environments
CN112163038A (en) Cross-cluster data synchronization method, device, equipment and storage medium
US20060282524A1 (en) Apparatus, system, and method for facilitating communication between an enterprise information system and a client
US20140250326A1 (en) Method and system for load balancing a distributed database providing object-level management and recovery
CN101802791B (en) Dynamic address tracking
WO2020253314A1 (en) Transaction monitoring method, apparatus and system for distributed database, and storage medium
CN113905054B (en) RDMA (remote direct memory access) -based Kudu cluster data synchronization method, device and system
US7933873B2 (en) Handling transfer of bad data to database partitions in restartable environments
CN112559628A (en) Multi-cluster message synchronization method, device, medium and electronic equipment
CN112087501A (en) Transmission method and system for keeping data consistency
WO2023193495A1 (en) Method for processing read request, distributed database and server

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE

AS Assignment

Owner name: SERVICENOW, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:055117/0022

Effective date: 20210201