WO2020224374A1 - Data replication method, apparatus, computer device and storage medium - Google Patents

Data replication method, apparatus, computer device and storage medium

Info

Publication number
WO2020224374A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
transaction
historical
historical state
buffer
Application number
PCT/CN2020/084085
Other languages
English (en)
French (fr)
Inventor
李海翔 (LI, Haixiang)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to JP2021532087A (patent JP7271670B2)
Priority to EP20802129.5A (patent EP3968175B1)
Publication of WO2020224374A1
Priority to US17/330,276 (patent US11921746B2)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation

Definitions

  • This application relates to the field of database technology, in particular to data replication technology.
  • In the copy process, there are usually two devices: the host and the standby machine.
  • For current databases (such as Oracle, MySQL, InnoDB, etc.), the host can periodically copy the data files in the database to the standby machine to achieve data-file-based active-standby synchronization. Further, to avoid data inconsistency between the host and the standby machine caused by damage to a data file during the copy process, a redo log (REDO LOG) is synchronized between the two databases after the communication connection between the host and the standby machine is established. If an exception occurs during the copy process, the standby machine can clear the abnormal data by replaying the redo log.
  • The embodiments of the present application provide a data replication method, apparatus, computer device, and storage medium, which can solve the problems of long time consumption, complicated log parsing and playback work, and reduced data replication efficiency when data is replicated based on redo logs.
  • The technical solutions are as follows:
  • In one aspect, a data replication method is provided, which is executed by a computer device (node device), and the method includes:
  • when a commit operation of a transaction is detected, adding the historical state data of the transaction to a data queue, the data queue being used to cache historical state data;
  • adding at least one piece of historical state data in the data queue to a sending buffer, the sending buffer being used to buffer historical state data to be copied;
  • when a first preset condition is met, copying the at least one piece of historical state data in the sending buffer to a cluster device.
  • In another aspect, a data replication method is provided, which is executed by a computer device (cluster device), and the method includes:
  • receiving, through a receiving buffer, at least one piece of historical state data sent by a node device, the receiving buffer being used to buffer received historical state data;
  • adding the at least one piece of historical state data in the receiving buffer to a forwarding buffer, and converting the at least one piece of historical state data into data conforming to the tuple format through the forwarding buffer to obtain at least one data item, the forwarding buffer being used for data format conversion of historical state data;
  • storing the at least one data item in at least one target data table of a cluster database, one target data table corresponding to one original data table in which one data item is located in the node device.
  • In another aspect, a data replication device is provided, which includes:
  • an adding module, used to add the historical state data of a transaction to a data queue when a commit operation of the transaction is detected, the data queue being used to cache historical state data;
  • the adding module being also used to add at least one piece of historical state data in the data queue to a sending buffer, the sending buffer being used to buffer historical state data to be copied;
  • a copying module, configured to copy the at least one piece of historical state data in the sending buffer to a cluster device when a first preset condition is met.
  • In another aspect, a data replication device is provided, which includes:
  • a receiving module, configured to receive, through a receiving buffer, at least one piece of historical state data sent by a node device, the receiving buffer being used to buffer received historical state data;
  • an adding module, used to add the at least one piece of historical state data in the receiving buffer to a forwarding buffer and convert the at least one piece of historical state data into data conforming to the tuple format through the forwarding buffer to obtain at least one data item, the forwarding buffer being used for data format conversion of historical state data;
  • a storage module, configured to store the at least one data item in at least one target data table of a cluster database, one target data table corresponding to one original data table in which one data item is located in the node device.
  • In another aspect, a computer device is provided, which includes a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the data replication method in any of the foregoing possible implementations.
  • In another aspect, a computer-readable storage medium is provided; at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the data replication method in any of the foregoing possible implementations.
  • In another aspect, a computer program product including instructions is provided, which, when run on a computer, causes the computer to execute the data replication method in any of the foregoing possible implementations.
  • In the above solutions, when a commit operation of a transaction is detected, the historical state data of the transaction is added to the data queue so as to buffer the historical state data of the transaction in the data queue; at least one piece of historical state data in the data queue is then added to the sending buffer, so that the sending process or sending thread can execute based on the sending buffer; and when the first preset condition is met, the at least one piece of historical state data in the sending buffer is copied to the cluster device, so that the node device copies the historical state data in at least one sending buffer to the cluster device each time the first preset condition is met.
  • During the whole process, the node device does not need to convert the original format of the historical state data into a log format, and the cluster device does not need to parse logs back into the original data format before storing, so there is no need to replay logs for the historical state data during data replication.
  • The cumbersome playback process is thereby avoided, the time taken by the redo log playback process is saved, and the efficiency of the data copy process is improved.
  • FIG. 1 is a schematic diagram of an implementation environment of a data replication method provided by an embodiment of the present application
  • FIG. 2 is an interactive flowchart of a data replication method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a principle for obtaining historical state data according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a principle for obtaining historical data according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a streaming replication technology provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a streaming replication technology provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an original data table provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a target data table provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a data query process provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a transaction consistency point provided by an embodiment of the present application.
  • FIG. 11 is an interaction flowchart of a data system provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a data replication device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a data replication device provided by an embodiment of the present application.
  • Fig. 14 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the database involved in the embodiment of the present application stores multiple data tables, and each data table can be used to store tuples.
  • The database may be any type of database based on MVCC (multi-version concurrency control). In the embodiments of the present application, the type of the database is not specifically limited.
  • The data in the above database can be divided into three states based on state attributes: current state, transitional state, and historical state. These three states are collectively called "full state data", or full-state data for short. Each of the different state attributes in the full-state data can be used to identify the state of the data in its life-cycle track.
  • Current state: the data of the latest version of a tuple is the data in the current stage. The state of the data in the current stage is called the current state.
  • Transitional state: data that is neither the latest version of the tuple nor a historical-state version, being in the process of transitioning from the current state to the historical state, is also called half-decay data.
  • Historical state: a state of a tuple in its history, whose value is an old value rather than the current value. The state of the data in the historical stage is called the historical state.
  • Under a blocking concurrent access control mechanism, the data can only exist in the historical state and the current state: the new value of the data after a transaction is committed is in the current state, and the value of the data before the commit becomes a historical-state value once the transaction is committed, that is, the old value of the tuple is in the historical state.
  • Under the MVCC mechanism, the data generated by transactions earlier than the smallest transaction in the current active transaction list is in the historical state.
  • Under the MVCC mechanism, when the latest related transaction has modified the value of a tuple, the latest value is in the current state, while a value that is still being read is, relative to the latest value, already historical; its data state therefore lies between the current state and the historical state, and is called the transitional state.
  • For example, in a user table the balance of account A is changed from 10 yuan to 20 yuan by a recharge, and then 15 yuan is consumed so that the balance becomes 5 yuan. Suppose financial institution B starts to read the data for inspection at this moment, and afterwards A recharges 20 yuan so that the balance becomes 25 yuan. Then 25 yuan is the current state data, the 5 yuan that B is reading is the transitional state data, and the remaining two values, 20 and 10, are states that existed in history and are historical state data.
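  • For illustration only (not part of the original description), the following is a minimal sketch of how the three states of full-state data could be modeled for the account-A example above; the type and field names are assumptions introduced here, not terms from this application.

```go
package main

import "fmt"

// VersionState labels where one committed value of a tuple sits in its life cycle.
type VersionState string

const (
	CurrentState      VersionState = "current"      // the latest committed value
	TransitionalState VersionState = "transitional" // a value still being read while a newer value exists
	HistoricalState   VersionState = "historical"   // old values superseded by later commits
)

// TupleVersion is one committed value of a tuple together with its state label.
type TupleVersion struct {
	Balance int
	State   VersionState
}

func main() {
	// The account-A example: 10 -> 20 -> 5 -> 25 yuan.
	versions := []TupleVersion{
		{Balance: 10, State: HistoricalState},
		{Balance: 20, State: HistoricalState},
		{Balance: 5, State: TransitionalState}, // the value institution B is still reading
		{Balance: 25, State: CurrentState},     // the latest committed value
	}
	for _, v := range versions {
		fmt.Printf("balance=%d state=%s\n", v.Balance, v.State)
	}
}
```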
  • FIG. 1 is a schematic diagram of an implementation environment of a data replication method provided by an embodiment of the present application.
  • The implementation environment can be collectively referred to as an HTAC (hybrid transaction/analytical cluster) architecture. The HTAC architecture may include a TP (transaction processing) cluster 101 and an AP (analytical processing) cluster 102.
  • the TP cluster 101 is used to provide transaction processing services.
  • the TP cluster may include multiple node devices 103. During data replication, the multiple node devices are used to provide historical data to be replicated.
  • Each node device may be configured with a node database, and each node device may be a stand-alone device, or a cluster device with one active and two backups.
  • the embodiment of the present application does not specifically limit the type of the node device.
  • the AP cluster 102 is used to provide historical data query and analysis services.
  • the AP cluster may include a cluster device, and a cluster database may be configured on the cluster device.
  • The cluster device is used to replicate and store, in the cluster database, the historical state data sent by each node device, and to provide query and analysis services based on the historical state data stored in the cluster database.
  • The cluster database can be a local database, or a distributed file system that the cluster device accesses through a storage interface, so that the distributed file system can provide an unlimited storage function to the TP cluster; for example, the distributed file system can be HDFS (Hadoop distributed file system), Ceph (a distributed file system under Linux), Alluxio (a memory-based distributed file system), etc.
  • The cluster device may be composed of one or more stand-alone devices, or of one-master-two-standby cluster devices, with online communication realized between the devices.
  • the embodiment of the present application does not specifically limit the type of the cluster device.
  • Since the multiple node devices in the TP cluster 101 provide transaction processing services, at the moment any transaction is committed, new current state data is generated and the corresponding historical state data is generated at the same time. Because historical state data occupies considerable storage space while still having preservation value, the multiple node devices can copy the historical state data to the cluster device based on the data replication method provided in the embodiments of this application.
  • The cluster device stores the historical state data in data tables based on a local executor (LE). When the copy is completed, the copied historical state data on the node device may be deleted (of course, it does not have to be deleted), so that historical state data is dumped from the TP cluster to the AP cluster. This ensures that the HTAC architecture can not only store current state data and transitional state data, but also properly store historical state data, realizing a complete storage mechanism for full-state data.
  • Further, the metadata of the historical state data copied each time can also be registered with the metadata (MD) manager of the cluster device, which makes it convenient for the cluster device to keep statistics on the metadata of the stored historical state data based on the metadata manager.
  • Based on the query statement, the semantics of the query operation, and the metadata provided in the SQL routing (structured query language router, SQL Router, SR) layer, a user's query can be routed to the data stored in the TP cluster 101 or the AP cluster 102, where the TP cluster 101 mainly provides query services for current state data, and the AP cluster 102 mainly provides query services for historical state data.
  • The semantics of the query operation is the operation intention obtained from analyzing the query statement; for example, the condition of a WHERE clause can express the intention of the query.
  • In some embodiments, a transaction may involve not only data modification in the node database of a single node device but usually also data modification in the node database of at least one other node device; in this case, a distributed consistency algorithm (such as two-phase commit, 2PC) can be used to ensure the consistency of the data modifications.
  • In some embodiments, one or more node databases corresponding to each node device in the TP cluster 101 can form a database instance set, which can be called a SET (set).
  • For example, if a node device is a stand-alone device, the database instance of the stand-alone device is a SET; if a node device is a one-master-two-standby cluster device, the SET of the node device is the set of the master database instance and the two standby database instances, and in this case the strong synchronization technology of a cloud database can be used to ensure consistency between the data of the master and the replica data of the standby machines.
  • Optionally, each SET can be linearly expanded to meet business processing requirements in big-data scenarios.
  • In some embodiments, the TP cluster 101 can also support the management of the multiple node devices 103 through a distributed coordination system (such as ZooKeeper); for example, ZooKeeper can mark a certain node device as failed (that is, remove that node device from the TP cluster 101).
  • FIG. 2 is an interaction flowchart of a data replication method provided by an embodiment of the present application.
  • Referring to FIG. 2, this embodiment is applied to the interaction process between the node device and the cluster device, and includes the following steps:
  • 201. When a node device detects a commit operation of a transaction, the node device adds the historical state data of the transaction to a data queue, where the data queue is used to cache historical state data.
  • The node device can be any node device in the TP cluster, and a node database can be configured on the node device. Whenever a transaction commits a modification to the data, historical state data and new current state data are generated accordingly.
  • A deletion transaction (DELETE operation) has a similar process: a deletion flag is added to the original tuple, and after the deletion transaction is committed the tuple completes the effective deletion process. Until then, the original tuple still appears "readable"; that is, only after the deletion transaction is committed can the user discover that the tuple has been deleted.
  • In some embodiments, when the node device provides transaction processing services and the commit operation of any transaction is detected, the node database obtains the historical state data of the transaction. If the node database is a database that does not support storing historical state data, the node device can obtain the historical state data at the moment the transaction commit completes and perform the operation of adding the historical state data to the data queue in step 201 above, so that the transaction commit operation and the data-queue addition operation are performed synchronously.
  • In some embodiments, some types of node databases support temporarily storing transitional state data or historical state data in a rollback segment. In this case, the transaction commit operation and the data-queue addition operation are asynchronous: since the node database can only temporarily store the historical state data, the database engine periodically cleans up the data stored in the rollback segment. When the database engine performs the cleanup operation on the rollback segment, the node device acquires the historical state data stored in the rollback segment and performs the operation of adding the historical state data to the data queue in step 201, so that the transaction commit operation and the data-queue addition operation are implemented asynchronously.
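  • As an illustrative sketch only (assumed types and names, not the application's own code), the two capture paths described above can be pictured as a data queue with a synchronous enqueue called at commit time and an asynchronous enqueue called when the engine cleans up a rollback segment or data page:

```go
package replication

import "sync"

// HistoricalRow is one piece of historical state data; the fields shown are
// illustrative, but commit timestamp and transaction ID must travel with the
// data so that it can be ordered later (see step 203).
type HistoricalRow struct {
	TxID       uint64
	CommitTime int64
	Table      string
	OldValues  map[string]string
}

// DataQueue caches historical state data before it is moved to a sending buffer.
type DataQueue struct {
	mu   sync.Mutex
	rows []HistoricalRow
}

// EnqueueOnCommit is the synchronous path: called at the moment a transaction
// commit completes, for node databases that do not retain historical versions.
func (q *DataQueue) EnqueueOnCommit(row HistoricalRow) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.rows = append(q.rows, row)
}

// EnqueueOnCleanup is the asynchronous path: called when the database engine
// purges a rollback segment (or vacuums a data page), handing over every
// historical version that is about to be discarded.
func (q *DataQueue) EnqueueOnCleanup(rows []HistoricalRow) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.rows = append(q.rows, rows...)
}
```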
  • FIG. 3 is a schematic diagram of a principle for obtaining historical state data according to an embodiment of the present application. Referring to FIG. 3, the initial balance of user A is 100 yuan; 100 yuan is recharged at a first moment and the balance becomes 200 yuan; another 100 yuan is recharged at a second moment and the balance becomes 300 yuan; a financial institution then reads the node database, and at a third moment during the reading operation user A recharges 100 yuan again and the balance becomes 400 yuan. At this time, the current state data corresponding to user A is 400, the transitional state data is 300, and the historical state data includes 100 and 200.
  • In such a node database, the cleanup of the rollback segment can be performed by executing a PURGE operation. When the node device detects the PURGE operation, the node device adds the historical state data that the PURGE operation acts on (100 and 200 corresponding to user A) to the data queue.
  • The above description takes the historical state data of user A as an example; the same applies to user B, user C, and user D, and details are not repeated here.
  • In some embodiments, some types of node databases support recording current state data, transitional state data, and historical state data in data pages, and periodically clean up the historical state data in the data pages. In this case, when the database engine performs the cleanup operation on a data page, the node device obtains the historical state data stored in the data page and performs the operation of adding the historical state data to the data queue in step 201 above, so that the transaction commit operation and the data-queue addition operation are implemented asynchronously.
  • FIG. 4 is a schematic diagram of the principle of obtaining historical state data provided by an embodiment of the present application.
  • Referring to FIG. 4, taking PostgreSQL as an example of the node database, the node database records current state data, transitional state data, and historical state data in data pages, and the tuple information of the multiple tuples can also be recorded in the data pages.
  • The node database cleans a data page by executing a VACUUM operation. When the node device detects the VACUUM operation, the node device adds the historical state data that the VACUUM operation acts on to the data queue, after which the data generated by transactions earlier than the current smallest active transaction in the data page is cleaned up.
  • In the above process, the node device may include a data buffer. The data buffer caches the historical state data in the form of a data queue; that is, the historical state data taken from the original data tables of the node database is added to the data queue in the data buffer.
  • 202. At intervals of a first preset duration, the node device obtains at least one piece of historical state data that has been added to the data queue within the first preset duration before the current moment.
  • The first preset duration may be any value greater than or equal to 0; for example, the first preset duration may be 0.5 milliseconds.
  • In the above process, the node device obtains historical state data from the data queue at intervals of the first preset duration. However, since the historical state data in the data queue may be out of order, the following step 203 needs to be performed to sort the data before it is added to the sending buffer, so that the historical state data is written into the sending buffer asynchronously.
  • In some embodiments, the node device can also write historical state data into the sending buffer synchronously. The synchronous process is that whenever a new piece of historical state data is added to the data queue, that historical state data is synchronously added to the sending buffer. In other words, the node device writes the historical state data into the data queue at the moment the transaction is committed and, at the same time, writes the historical state data into the sending buffer.
  • In this case, steps 202-204 of this embodiment can be replaced with: when it is detected that any historical state data is added to the data queue, adding that historical state data to the sending buffer; and when it is detected that any historical state data is added to the sending buffer, copying the at least one piece of historical state data in the sending buffer to the cluster device. This realizes synchronous replication of the historical state data and ensures that, when the historical state data is written into the sending buffer, it is written in the order of transaction commit timestamps and transaction identifiers, so that the sorting operation in step 203 below does not need to be performed and step 204 below is performed directly.
  • It should be noted that using the PURGE operation to clean up historical state data in node databases such as MySQL/InnoDB involved in step 201 above, or using the VACUUM operation to clean up historical state data in a node database such as PostgreSQL, causes the historical state data cached in the data queue to be out of order. Even if the historical state data is written to the sending buffer synchronously, there is still no guarantee that the historical state data is written into the sending buffer in order. Therefore, in this scenario, the following step 203 needs to be performed.
  • FIG. 5 is a schematic diagram of the principle of a streaming replication technology provided by an embodiment of the present application. Referring to FIG. 5, after the cleanup operation, the original current state data is converted into historical state data, but the historical state data obtained in this way is out of order, so the following step 203 needs to be performed.
  • 203. The node device sorts the at least one piece of historical state data in ascending order of transaction commit timestamp; when multiple pieces of historical state data have the same transaction commit timestamp, the node device sorts them in ascending order of transaction identifier, obtains at least one piece of ordered historical state data, and adds the at least one piece of ordered historical state data to the sending buffer.
  • In the above process, each piece of historical state data corresponds to a transaction. The transaction identifier (transaction ID) is used to uniquely identify a transaction, and the transaction identifier increases monotonically with the transaction generation timestamp. Optionally, the transaction identifier may be the transaction generation timestamp itself, or a monotonically increasing value assigned according to the transaction generation timestamp.
  • It should be noted that a transaction usually corresponds to two timestamps, namely the transaction generation timestamp and the transaction commit timestamp, which correspond to the generation time and the commit time of the transaction respectively.
  • The sending buffer is a buffer that is used cyclically in the data replication process; it is the buffer invoked when the sending process or sending thread executes the sending task (sending historical state data from the node device to the cluster device). The number of sending processes or sending threads may be one or more, and the number of sending buffers may also be one or more. The above step 203 only takes writing the ordered historical state data into one sending buffer as an example.
  • In the above process, before the node device asynchronously writes the historical state data into the sending buffer, it first sorts the historical state data: first in ascending order of transaction commit timestamp, and then, for historical state data with the same transaction commit timestamp, in ascending order of transaction identifier. The ordered historical state data is then written into the sending buffer, which ensures that the historical state data in the sending buffer is strictly ordered.
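  • The ordering rule of step 203 (ascending commit timestamp, ties broken by ascending transaction identifier) can be sketched as follows; this is an illustrative comparator, not the application's own implementation:

```go
package replication

import "sort"

// orderedRow carries only the two ordering keys used in step 203.
type orderedRow struct {
	commitTime int64  // transaction commit timestamp
	txID       uint64 // transaction identifier
}

// sortForSendBuffer orders historical state data in ascending order of transaction
// commit timestamp, breaking ties by ascending transaction identifier, before the
// rows are appended to the sending buffer.
func sortForSendBuffer(rows []orderedRow) {
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].commitTime != rows[j].commitTime {
			return rows[i].commitTime < rows[j].commitTime
		}
		return rows[i].txID < rows[j].txID
	})
}
```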
  • FIG. 6 is a schematic diagram of the principle of a streaming replication technology provided by an embodiment of the present application. Referring to FIG. 6, the number of sending buffers may be more than one; the manner in which each sending buffer acquires historical state data is similar to the implementation described in steps 202-203 above, and is not repeated here.
  • In this case, the node device adds at least one piece of historical state data in the data queue to any one of the at least one sending buffer. In a scenario with a large amount of data, the number of sending buffers can be increased so that the historical state data in the data queue is written into the sending buffers more quickly.
  • Optionally, the node device can evenly distribute the historical state data from the same original data table in the data queue across the multiple sending buffers, which improves the utilization of the multiple sending buffers and also increases the sending rate of the historical state data of that original data table.
  • Optionally, after the node device adds historical state data from the data queue to the sending buffer, it can, according to actual needs, mark that historical state data in the data queue as being in a reusable state, so that the node device dumps the historical state data locally.
  • 204. When a first preset condition is met, the node device copies the at least one piece of historical state data in the sending buffer to the cluster device.
  • In some embodiments, the first preset condition may be that the node device detects that any historical state data is added to the sending buffer, where the sending buffer is used to buffer the historical state data to be copied. While the node device is acquiring historical state data from the data queue, once a piece of historical state data is successfully added to the sending buffer, the sending buffer copies the historical state data to the cluster device, so that historical state data can be continuously copied to the cluster device. This data replication technique is called streaming replication.
  • In some embodiments, the first preset condition may also be that the node device detects that the ratio of the used data amount of the sending buffer to the capacity of the sending buffer reaches a ratio threshold. While the node device is acquiring historical state data from the data queue, once the ratio of the used data amount of the sending buffer to its total capacity reaches the ratio threshold, the sending buffer copies the historical state data it has cached to the cluster device, so that historical state data can be continuously copied to the cluster device. Optionally, the ratio threshold may be any value greater than 0 and less than or equal to 1; for example, the ratio threshold may be 100% or 75%.
  • In some embodiments, the first preset condition may also be that the duration since the sending buffer last copied historical state data to the cluster device reaches a second preset duration. While the node device is acquiring historical state data from the data queue, each time the second preset duration elapses, the sending buffer copies the historical state data to the cluster device, so that historical state data can be continuously copied to the cluster device.
  • Optionally, the second preset duration may be any value greater than or equal to the first preset duration. For example, if the first preset duration is 0.5 milliseconds, the second preset duration may be 1 millisecond; that is, the sending buffer performs data replication to the cluster device every 1 millisecond, and within that 1-millisecond interval, every 0.5 milliseconds the sending buffer obtains from the data queue the historical state data (one or more pieces) newly added to the data queue in the previous 0.5 milliseconds.
  • In some embodiments, the first preset condition may also be that the duration since the sending buffer last copied historical state data to the cluster device reaches a third preset duration, where the third preset duration is the same preset duration configured for each of the multiple node devices, and the third preset duration is greater than the second preset duration; that is, all node devices copy data at intervals of the third preset duration.
  • In some embodiments, the first preset condition may also be that the node device detects that the ratio of the used data amount of the sending buffer to the capacity of the sending buffer reaches the ratio threshold, or that the duration since the sending buffer last copied historical state data to the cluster device reaches the second preset duration. In other words, during data copying, once the ratio of the used data amount of the sending buffer to its total capacity reaches the ratio threshold, a data copy task is executed; and even if that ratio has not yet reached the ratio threshold, a data copy task is also executed when the duration since the sending buffer last copied historical state data to the cluster device reaches the second preset duration.
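  • The combined form of the first preset condition can be sketched as a simple check on the sending buffer (illustrative field names and thresholds only):

```go
package replication

import "time"

// sendBuffer buffers ordered historical state data awaiting replication.
type sendBuffer struct {
	used, capacity int           // bytes currently buffered vs. total capacity
	lastCopy       time.Time     // when this buffer last copied data to the cluster device
	ratioThreshold float64       // e.g. 0.75 or 1.0
	copyInterval   time.Duration // the "second preset duration", e.g. 1 * time.Millisecond
}

// shouldCopy reports whether a data copy task should be executed now: either the
// fill ratio has reached the ratio threshold, or the time since the last copy to
// the cluster device has reached the second preset duration.
func (b *sendBuffer) shouldCopy(now time.Time) bool {
	fillRatio := float64(b.used) / float64(b.capacity)
	return fillRatio >= b.ratioThreshold || now.Sub(b.lastCopy) >= b.copyInterval
}
```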
  • In the above process, the node device may send the at least one piece of historical state data in the sending buffer to the cluster device based on the sending process or sending thread. Optionally, when the first preset condition is met, the node device may also send all the historical state data cached in the sending buffer to the cluster device at once.
  • The above steps 202-204 constitute a cyclic process, enabling the node device to continuously copy historical state data to the cluster device based on the streaming replication technology.
  • In some embodiments, each piece of historical state data sent by the sending buffer to the cluster device may include at least one of the transaction identifier of the transaction corresponding to the historical state data, the node identifiers of one or more node devices corresponding to one or more sub-transactions of the transaction, or the full data of the historical state data.
  • A transaction may include at least one sub-transaction, each sub-transaction corresponds to a node device, and each node device has a unique node identifier. The node identifier may be the IP address (Internet protocol address) of the node device, or an identification number of the node device, where the identification number has a one-to-one mapping relationship with the IP address; any node device in the TP cluster can store the mapping relationship, and the cluster device of the AP cluster can also store the mapping relationship.
  • Optionally, bitmap encoding or dictionary compression can be used to encode the node identifiers of the one or more node devices, so that the length of the historical state data sent by the node device is shortened and the data transmission resources occupied are reduced.
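  • The bitmap-encoding idea mentioned above can be sketched as follows, assuming (purely for illustration) that node identification numbers are small integers in the range 0-63 with a one-to-one mapping to IP addresses:

```go
package replication

// encodeNodeBitmap packs the identification numbers of the node devices that
// participate in a transaction's sub-transactions into a single 64-bit bitmap,
// shortening the historical state data that has to be transmitted.
func encodeNodeBitmap(nodeIDs []uint8) uint64 {
	var bitmap uint64
	for _, id := range nodeIDs {
		bitmap |= 1 << id // set one bit per participating node device
	}
	return bitmap
}

// decodeNodeBitmap recovers the identification numbers on the cluster-device side.
func decodeNodeBitmap(bitmap uint64) []uint8 {
	var ids []uint8
	for id := uint8(0); id < 64; id++ {
		if bitmap&(1<<id) != 0 {
			ids = append(ids, id)
		}
	}
	return ids
}
```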
  • In some embodiments, the above data replication process can be implemented by the Checkpoint operation of the TP cluster, and the operation frequency of the Checkpoint operation of the TP cluster can also be set, where the operation frequency indicates how often the TP cluster performs the Checkpoint operation; for example, the Checkpoint operation can be performed once per second.
  • Each time a Checkpoint operation is performed, each node device in the TP cluster performs the data replication process in step 204 above, so that the newly generated historical state data in the TP cluster can be dumped to the AP cluster at once; that is, the Checkpoint operation frequency actually corresponds to the third preset duration.
  • Further, a "micro Checkpoint" operation can be performed on a single node device. The operation frequency of the micro Checkpoint is higher than that of the Checkpoint, which allows the historical state data of the node device to be dumped to the AP cluster faster, meets the AP cluster's demand for obtaining historical state data, ensures the replication efficiency of historical state data, and improves the real-time availability of the AP cluster.
  • For example, the period of the micro Checkpoint operation can be set to one thousandth of the period of the Checkpoint operation; that is, if the Checkpoint operation is performed once per second, the micro Checkpoint operation is performed once per millisecond. The embodiment of the present application does not specifically limit the ratio between the operation frequency of the micro Checkpoint and the operation frequency of the Checkpoint.
  • In this case, the operation frequency of the micro Checkpoint actually corresponds to the second preset duration. Different node devices can be set with different micro Checkpoint operation frequencies; optionally, the micro Checkpoint operation frequency can be positively correlated with the number of active transactions per second of the node device. For example, for the node devices whose number of active transactions per second ranks in the top 10 in the TP cluster, a higher micro Checkpoint operation frequency can be set.
  • In addition, the ratio of the used data amount of the sending buffer to its total capacity will usually not reach the ratio threshold on different node devices at the same time, which causes the micro Checkpoint operations of different node devices to be unsynchronized. Therefore, all node devices in the TP cluster can also be forced to perform a Checkpoint operation periodically, which prevents the unsynchronized micro Checkpoint operations of different node devices in the TP cluster from causing excessive data delay and affecting the real-time availability of the AP cluster. For example, each node device performs a micro Checkpoint operation every 1 millisecond, while the TP cluster traverses all node devices and performs a Checkpoint operation every 1 second, ensuring that the data delay with which the AP cluster receives historical state data does not exceed 1 second (does not exceed the period of the Checkpoint operation).
  • In some embodiments, the data replication process in the above step 204 can also be divided into synchronous replication and asynchronous replication.
  • In synchronous replication, data replication is closely tied to the cleanup operation on historical state data: the cleanup transaction corresponding to each cleanup operation (such as a PURGE operation or a VACUUM operation) initiates a streaming replication of historical state data in its commit phase; that is, the node device synchronizes all of the cleaned historical state data to the cluster device before the cleanup operation is completed. Optionally, the cluster device plays back the redo log (REDO LOG) of the data replication process based on the ARIES algorithm, and the node device waits for the playback to complete before setting the status of the cleanup transaction to "committed", so that the historical state data is copied to the cluster device as early as possible, which greatly guarantees the security of the historical state data.
  • Optionally, redo log recording and playback can be performed only on the metadata of this data replication, to realize re-verification and proofreading between the node device and the cluster device, which ensures the safety of the data replication process to a greater extent. In this case, it is still possible to avoid executing redo log playback one by one for the historical state data cleaned from the original data tables, which reduces the amount of data in the playback process, shortens the time consumed by playback, and improves the efficiency of data replication.
  • The data replication process can also be asynchronous. In asynchronous replication, data replication is not tied to the commit of the cleanup transaction: the cleanup transaction of the node device does not initiate streaming replication of historical state data during its commit phase. Instead, streaming replication between the node device and the cluster device is initiated according to the second preset duration specified in the first preset condition, and the historical state data modified on the node device during the interval between two streaming replications is copied to the cluster device, which saves the data transmission resources occupied by the data replication process.
  • In some embodiments, the data replication process also involves confirming the completion of data copying, which can be divided into three confirmation levels, namely the confirm-playback level, the confirm-reception level, and the confirm-transmission level, described in detail below:
  • At the confirm-playback level, the node device considers a data replication task complete only when it receives the replication success response from the cluster device, which realizes strong synchronization of the data replication process. Strong synchronization ensures that each data replication is atomic; that is, the overall process of data replication either succeeds or fails, with no intermediate state. Once an abnormality occurs in any link, the data replication is considered failed and the entire data replication is redone, ensuring the safety of the data replication process. Optionally, the replication success response may be an "Applied" instruction.
  • At the confirm-reception level, the node device considers a data replication task complete when it receives the data reception response from the cluster device, which realizes weak synchronization of the data replication process. Weak synchronization ensures that, apart from the metadata playback on the cluster device, the remaining operations of the data replication process are atomic; if the metadata playback fails, the entire data replication does not have to be redone, so the security of the data replication process is guaranteed to a certain extent while the efficiency of data replication is improved. Optionally, the data reception response may be a "Received" instruction.
  • At the confirm-transmission level, the node device considers a data replication task complete as soon as it finishes the data transmission operation. Although the data replication process cannot be guaranteed to be atomic in this case, the node device and the cluster device do not affect each other: when the cluster device encounters an abnormal condition such as downtime, it does not block the node device from initiating data replication again, and when the cluster device consists of more than one stand-alone device, the data replication process for the remaining stand-alone devices can still proceed normally even if one stand-alone device fails, which ensures the efficiency of data replication.
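  • The three confirmation levels can be summarized with the following sketch; the names and boolean flags are illustrative assumptions, as the application describes the levels in prose only:

```go
package replication

// AckLevel selects when the node device treats one data replication task as complete.
type AckLevel int

const (
	ConfirmPlayback     AckLevel = iota // wait for the cluster device's "Applied" response (strong synchronization)
	ConfirmReception                    // wait for the cluster device's "Received" response (weak synchronization)
	ConfirmTransmission                 // complete as soon as the send finishes (no wait)
)

// taskComplete reports whether a replication task is finished, given which
// responses have arrived so far, under the chosen confirmation level.
func taskComplete(level AckLevel, sent, received, applied bool) bool {
	switch level {
	case ConfirmPlayback:
		return applied
	case ConfirmReception:
		return received
	default: // ConfirmTransmission
		return sent
	}
}
```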
  • 205. The cluster device receives, through a receiving buffer, the at least one piece of historical state data sent by the node device, where the receiving buffer is used to buffer the received historical state data.
  • The receiving buffer is a buffer that is used cyclically in the data copying process; it is the buffer invoked when the receiving process or receiving thread executes the receiving task (receiving the historical state data sent by the node device). The number of receiving processes or receiving threads may be one or more, and the number of receiving buffers may also be one or more. The embodiment of the present application takes one receiving buffer as an example for description; each receiving buffer has a similar process of receiving historical state data, which is not repeated here.
  • In some embodiments, one receiving buffer may correspond to one node device. In this case, the above step 205 is: the cluster device determines, from at least one receiving buffer, the receiving buffer corresponding to the node device, and caches, based on the receiving process or receiving thread, the at least one piece of historical state data sent by the node device into that receiving buffer, so that one receiving buffer receives historical state data from the same node device in a targeted manner.
  • In some embodiments, the cluster device may instead allocate the data receiving task according to the storage space currently available in the receiving buffers. In this case, the above step 205 is: among the at least one receiving buffer, the receiving buffer with the largest currently available storage space is determined, and the at least one piece of historical state data sent by the node device is cached into that receiving buffer based on the receiving process or receiving thread, so that the cluster device adds the historical state data to the receiving buffer with the largest currently available storage space, making rational use of cache resources.
  • 206. The cluster device adds the at least one piece of historical state data in the receiving buffer to a forwarding buffer, and converts the at least one piece of historical state data into data conforming to the tuple format through the forwarding buffer to obtain at least one data item, where the forwarding buffer is used for data format conversion of historical state data.
  • The process by which the receiving buffer adds (that is, copies) historical state data to the forwarding buffer can be synchronous replication or asynchronous replication.
  • In synchronous replication, whenever the cluster device receives historical state data in the receiving buffer (one or more pieces, but belonging to one transmission by the node device), it immediately copies the historical state data to the forwarding buffer.
  • In asynchronous replication, the cluster device receives historical state data in the receiving buffer and, every fourth preset duration, copies all the historical state data in the receiving buffer to the forwarding buffer, where the fourth preset duration is any value greater than or equal to 0.
  • The above description takes the cluster device receiving, through the receiving buffer, the at least one piece of historical state data sent by one node device as an example; the same applies by analogy to any node device, and the second preset durations of different node devices may be the same or different. In addition, when the cluster device receives, through the receiving buffer, the historical state data sent simultaneously by multiple node devices at a Checkpoint, it is ensured that the data delay between different node devices of the TP cluster does not exceed the third preset duration, which improves the real-time availability of the historical state data stored in the AP cluster.
  • Optionally, after the copy, the historical state data of this copy is cleared from the receiving buffer, so that the buffer space is freed in time to store newly received historical state data, thereby speeding up data transmission.
  • Since the format of the at least one piece of historical state data sent by the node device may be a compressed data format, it is necessary to restore the at least one piece of historical state data in the forwarding buffer to the original data conforming to the tuple format. Optionally, the data conforming to the tuple format may be data in row format.
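  • The forwarding buffer's conversion step can be sketched as below. The wire format (gzip-compressed JSON) is purely an assumption made for this sketch; the application only requires that compressed historical state data be restored to row-format data items:

```go
package replication

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"io"
)

// DataItem is one tuple-format (row-format) data item produced by the forwarding buffer.
type DataItem struct {
	Table  string            `json:"table"`
	TxID   uint64            `json:"tx_id"`
	Fields map[string]string `json:"fields"`
}

// restoreToTuples decompresses the received payload and decodes it into
// row-format data items ready to be stored in the target data tables.
func restoreToTuples(compressed []byte) ([]DataItem, error) {
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	var items []DataItem
	if err := json.Unmarshal(raw, &items); err != nil {
		return nil, err
	}
	return items, nil
}
```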
  • 207. The cluster device stores the at least one data item in at least one target data table of the cluster database, where one target data table corresponds to one original data table in which one data item is located in the node device.
  • In some embodiments, the target data table can adopt two storage formats according to different business requirements; correspondingly, when the cluster device stores the at least one data item into the target data table, there are also two storage procedures, which are described in detail below.
  • In the first storage procedure, for any data item, the cluster device may store the data item into the target data table corresponding to the original data table according to the storage format of the original data table in which the data item is located, so that the storage format of the target data table is exactly the same as that of the original data table, which is convenient for tracking the life cycle of a tuple in the general case.
  • In this case, a target data table corresponding to each original data table in the node device is created in the cluster device: the original data table is used to store the current state data of multiple tuples, and the target data table corresponding to the original data table is used to store the historical state data of those tuples.
  • Among them, a BinLog (binary log, also known as a logical log) describes database transaction operations such as data changes and table-structure changes in a specific format and records them in the BinLog; the recorded transaction operations are usually commits or rollbacks.
  • The following takes the logical replication technology of the MySQL database as an example for description. The node device may start one or more Dump-Thread (dumping) threads, where one Dump-Thread thread is used to communicate with one connected cluster device. During logical replication between the node device and the cluster device, the following steps can be performed:
  • First, the cluster device sends the already-synchronized BinLog information (including the data file name and the position within the data file) to the node device; the node database determines the currently synchronized position based on the synchronized BinLog information, and the node device's Dump-Thread thread sends the BinLog data of the not-yet-synchronized metadata to the cluster device.
  • Then, the cluster device receives the BinLog data synchronized by the node device through an IO-Thread (input/output thread), and writes the BinLog data into a Relay-Log (relay log).
  • Finally, the cluster device reads the BinLog data from the Relay-Log file through an SQL-Thread (SQL thread) and executes the SQL statements obtained by decoding the BinLog data, so that the metadata of the node device is incrementally copied to the cluster device.
  • In the second storage procedure, for any data item, the cluster device may store the data item into the target data table corresponding to the original data table according to a key-value storage format. In this way, not only is the information originally carried by the data item retained, but the change history of any field can also be tracked in a customized manner through the key-value storage format.
  • In this case, the cluster device determines at least one of the key name of the data item in the original data table and the generation time of the data item as the key name of the data item in the target data table. For example, if the original data table has a key name, the key name of the data item in the original data table can be determined as the key name in the target data table; the generation time of the data item can also be directly determined as the key name in the target data table, which intuitively records the generation time of the data item.
  • Further, the cluster device determines the modified fields of the data item in the original data table as the key value of the data item in the target data table, where each modified field is stored in a string-like format; the storage format of each modified field can be "key name: old value, new value". Optionally, there may be one or more modified fields, and if multiple fields are modified at the same time, the modified fields can be separated by semicolons.
  • FIG. 7 is a schematic structural diagram of an original data table provided by an embodiment of the present application, in which a data item whose fields have changed is taken as an example for description. FIG. 8 is a schematic structural diagram of a target data table provided by an embodiment of the present application. Referring to FIG. 8, in the target data table the dynamic changes of "server status" and "department", together with the operation time, can be seen intuitively; the storage format of each key value in the target data table can be "server status: service provided, service interrupted; department: department A, department B".
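  • A sketch of the key-value storage format described above follows; the ":" and ";" separators mirror the example in the text, while everything else, including the key-name layout, is an assumption made for illustration:

```go
package replication

import (
	"fmt"
	"strings"
	"time"
)

// FieldChange records one modified field as "key name: old value, new value".
type FieldChange struct {
	Name     string
	OldValue string
	NewValue string
}

// keyValueRecord builds the key name and key value stored in the target data
// table for one data item: the key name combines the original key name with the
// generation time of the data item, and the key value lists the modified fields
// separated by semicolons.
func keyValueRecord(originalKey string, generatedAt time.Time, changes []FieldChange) (key, value string) {
	key = fmt.Sprintf("%s@%s", originalKey, generatedAt.Format(time.RFC3339))

	parts := make([]string, 0, len(changes))
	for _, c := range changes {
		parts = append(parts, fmt.Sprintf("%s: %s, %s", c.Name, c.OldValue, c.NewValue))
	}
	return key, strings.Join(parts, "; ")
}
```

  • For the example of FIG. 8, two FieldChange entries ("server status" and "department") would yield the key value "server status: service provided, service interrupted; department: department A, department B".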
  • In some embodiments, the cluster device can also upload the data items in the forwarding buffer to the distributed file system through the storage interface, by means of the storage process or storage thread, for persistent storage, so as to achieve unlimited storage of historical state data.
  • Take the distributed file system being Ceph and the cluster database of the cluster device being MySQL as an example. There are two ways to mount Ceph for MySQL. In the first way, the configuration can be completed by mounting CephFS. Assuming the cluster includes one monitoring (Monitor) device (node1) and two stand-alone devices (node2 and node3), the following steps can be performed:
  • First, the cluster device creates a directory and prepares the bootstrap keyring file, which can be achieved with the command "sudo mkdir -p /var/lib/ceph/mds/ceph-localhost"; Ceph automatically generates the bootstrap keyring file on node1, where the monitoring device is located.
  • The above takes a cluster device including two stand-alone devices as an example; if the cluster device includes more than two stand-alone devices, CephFS needs to be mounted on the other stand-alone devices as well, and the bootstrap keyring file needs to be copied to those stand-alone devices.
  • Second, the cluster device generates the done file and the sysvinit file; for example, the done file can be generated with the statement "sudo touch /var/lib/ceph/mds/ceph-mon1/done" and the sysvinit file with the statement "sudo touch /var/lib/ceph/mds/ceph-mon1/sysvinit".
  • Third, the cluster device generates the mds keyring file; for example, the keyring file can be generated with the statement "sudo ceph auth get-or-create mds.mon1 osd 'allow rwx' mds 'allow' mon 'allow profile mds' -o /var/lib/ceph/mds/ceph-mon1/keyring".
  • Fourth, the cluster device creates the CephFS pools; for example, the data pool can be created with the statement "ceph osd pool create cephfs_data 300" and the metadata pool with the statement "ceph osd pool create cephfs_metadata 300".
  • Fifth, the cluster device starts the MDS (metadata server); for example, the MDS can be started with the statement "sudo /etc/init.d/ceph start".
  • Sixth, the cluster device creates CephFS and mounts it; for example, CephFS can be created with the statement "ceph fs new cephfs cephfs_metadata cephfs_data", and the mounting of CephFS can be completed with the statement "mount -t ceph [mon monitoring device IP address]:6789:/ /mnt/mycephfs".
  • In the second way, the cluster device can also be configured by mounting a Ceph RBD (block device image). Specifically, the following steps can be performed:
  • First, the cluster device creates an RBD pool, for example, with the statement "ceph osd pool create rbd 256".
  • Second, the cluster device creates the RBD block device myrbd (that is, applies for a block storage space), for example, with the statement "rbd create rbd/myrbd --size 204800 -m [mon monitoring device IP address] -k /etc/ceph/ceph.client.admin.keyring".
  • Third, the cluster device creates an RBD mapping to obtain the device name, that is, maps the RBD to the monitoring device, for example, with the statement "sudo rbd map rbd/myrbd --name client.admin -m [mon monitoring device IP address] -k /etc/ceph/ceph.client.admin.keyring". It should be noted that mapping to the monitoring device is taken as an example here; whichever stand-alone device the RBD is actually mounted on, the operation of mapping the RBD to that stand-alone device and obtaining the name of that stand-alone device is performed.
  • Fourth, the cluster device creates a file system based on the obtained device name and mounts the RBD; for example, the file system can be created with the statement "sudo mkfs.xfs /dev/rbd1", and the RBD can be mounted with the statement "sudo mount /dev/rbd1 /mnt/myrbd".
  • It should be noted that, in some embodiments, not only can the cluster device access the distributed file system through the storage interface, but any node device in the TP cluster can also access the distributed file system through the storage interface, and each of them can complete the configuration with a mounting method similar to the above, which will not be repeated here.
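  • The CephFS configuration sequence above can also be scripted. The following is a minimal sketch (not part of the patent text) that simply runs the commands listed above in order via Python's subprocess module; the monitor IP address is a placeholder, and the MDS start command is run in its "start" form.

```python
# Minimal sketch: run the CephFS setup commands listed above in order.
# Assumes Ceph is installed; the monitor IP address is a placeholder.
import subprocess

MON_IP = "192.0.2.10"  # hypothetical mon (monitor) device IP address

commands = [
    "sudo mkdir -p /var/lib/ceph/mds/ceph-localhost",
    "sudo touch /var/lib/ceph/mds/ceph-mon1/done",
    "sudo touch /var/lib/ceph/mds/ceph-mon1/sysvinit",
    "sudo ceph auth get-or-create mds.mon1 osd 'allow rwx' mds 'allow' "
    "mon 'allow profile mds' -o /var/lib/ceph/mds/ceph-mon1/keyring",
    "ceph osd pool create cephfs_data 300",
    "ceph osd pool create cephfs_metadata 300",
    "sudo /etc/init.d/ceph start mds.localhost",
    "ceph fs new cephfs cephfs_metadata cephfs_data",
    f"sudo mount -t ceph {MON_IP}:6789:/ /mnt/mycephfs",
]

for cmd in commands:
    # check=True aborts the sequence as soon as any step fails
    subprocess.run(cmd, shell=True, check=True)
```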
  • 208. The cluster device sends a replication success response to the node device.
  • In the above process, after the cluster device successfully stores the historical state data in the target data table, the cluster device can send ACK data (acknowledgement, a confirmation character) to the node device. The ACK data is a transmission control character used to indicate that the historical state data sent by the node device has been replicated successfully.
  • 209. When the node device receives the replication success response sent by the cluster device, it clears the sending buffer corresponding to the replication success response.
  • In the above process, the node device is allowed to clear the sending buffer only after receiving the replication success response, which ensures strong synchronization between the node device and the cluster device and guarantees the security of the data replication process.
  • In the method provided by the embodiments of this application, when the commit operation of a transaction is detected, the historical state data of the transaction is added to the data queue so as to buffer it there; at least one historical state data in the data queue is added to the sending buffer so that a sending process or sending thread can be executed based on the sending buffer; and when the first preset condition is met, the at least one historical state data in the sending buffer is copied to the cluster device, so that the node device copies the historical state data in the sending buffer to the cluster device whenever the first preset condition is met.
  • In this way, the node device does not need to convert the original historical state data into a log format, and the cluster device does not need to parse logs back into the original data format before storing the data, so there is no need to replay redo logs for the historical state data during data replication, which avoids the tedious replay process, shortens the time consumed by redo log replay, and improves the efficiency of the data replication process.
  • Further, when the node device replicates the historical state data synchronously, the historical state data is guaranteed to be copied to the cluster device in the order in which transactions are committed, which avoids the step of sorting the historical state data and simplifies the flow of the streaming replication process.
  • Of course, the node device can also copy the historical state data in the data queue to the sending buffer asynchronously, thereby adding the historical state data in the data queue to the sending buffer in batches, avoiding frequent copy operations on the historical state data and thus avoiding an impact on the processing efficiency of the node device. Before asynchronous replication, however, the historical state data needs to be sorted to ensure that it is added to the sending buffer in order, which facilitates the subsequent acquisition of the minimum transaction identifier by the cluster device.
  • Further, if the first preset condition is met, the sending buffer copies the historical state data to the cluster device and is cleared after the copy succeeds; after that, the sending buffer cyclically executes the process of adding historical state data and sending historical state data, so that the historical state data of the node device is continuously copied to the cluster device, redo log replay of the historical state data is avoided, and the efficiency of the data replication process is improved.
  • Further, when there are multiple sending buffers, the node device can evenly add the historical state data coming from the same original data table in the data queue to the multiple sending buffers, thereby increasing the utilization of the multiple sending buffers and increasing the sending rate of the historical state data of that original data table.
  • When the first preset condition is that the node device detects that any historical state data is added to the sending buffer, synchronous replication can be realized, which ensures the real-time performance of the historical state data replication process.
  • When the first preset condition is that the node device detects that the ratio of the used data volume of the sending buffer to the capacity of the sending buffer reaches the ratio threshold, the ratio of the used data volume of the sending buffer to its capacity can be effectively kept within the ratio threshold, which improves the efficiency of the data replication process.
  • When the first preset condition is that the current time is a second preset duration away from the time when the sending buffer last copied historical state data to the cluster device, the maximum time interval between two data copies can be controlled, which ensures the real-time performance of the historical state data replication process.
  • When the first preset condition is that the current time is a third preset duration away from the time when the sending buffer last copied historical state data to the cluster device, since the third preset duration is the same preset duration configured for each node device in the TP cluster, the delay between the data replication processes of different node devices in the TP cluster can be controlled.
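  • As an illustration of the node-side flow summarized above, the following minimal sketch (an illustrative assumption, not the patent's implementation; field names, thresholds, and the send_to_cluster()/wait_for_ack() helpers are hypothetical) buffers historical state data on commit, moves it into the sending buffer in sorted order, flushes the buffer to the cluster device when one of the first-preset-condition variants is met, and clears the buffer only after the replication success response is received.

```python
# Minimal sketch of the node-side streaming replication loop described above.
# Field names, thresholds, and the send_to_cluster()/wait_for_ack() helpers
# are hypothetical; they only illustrate the first-preset-condition variants.
import time
from collections import deque

RATIO_THRESHOLD = 0.75          # used-capacity ratio threshold
SECOND_PRESET_DURATION = 0.001  # per-node flush interval, in seconds
BUFFER_CAPACITY = 1024          # maximum items held in the sending buffer

data_queue = deque()            # historical state data buffered on commit
send_buffer = []                # historical state data waiting to be copied
last_flush = time.monotonic()

def on_transaction_commit(historical_rows):
    """Called when the commit operation of a transaction is detected."""
    data_queue.extend(historical_rows)

def should_flush():
    """Any one of the first-preset-condition variants triggers a flush."""
    ratio_reached = len(send_buffer) / BUFFER_CAPACITY >= RATIO_THRESHOLD
    interval_reached = time.monotonic() - last_flush >= SECOND_PRESET_DURATION
    return bool(send_buffer) and (ratio_reached or interval_reached)

def move_and_flush(send_to_cluster, wait_for_ack):
    """Move queued data into the sending buffer, then flush when required."""
    global last_flush
    # sort by (commit timestamp, transaction identifier) before buffering,
    # mirroring the ordering described for step 203
    batch = sorted(data_queue, key=lambda r: (r["commit_ts"], r["txn_id"]))
    data_queue.clear()
    send_buffer.extend(batch)
    if should_flush():
        send_to_cluster(list(send_buffer))  # copy to the cluster device
        wait_for_ack()                      # strong sync: wait for the ACK
        send_buffer.clear()                 # clear only after the success response
        last_flush = time.monotonic()
```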
  • Further, after receiving the at least one historical state data sent by the node device from the receiving buffer, the cluster device may add the at least one historical state data in the receiving buffer to the forwarding buffer and, through the forwarding buffer, convert the at least one historical state data into data that conforms to the tuple format to obtain at least one data item, thereby restoring the format of the compressed historical state data. Since historical state data that retains its original format is obtained directly, obtaining the historical state data through log parsing can be avoided, and the at least one data item is stored in at least one target data table of the cluster database, so that the historical state data can be properly preserved.
  • Further, depending on the business requirements, the target data table of the cluster device can support two storage formats. For data items in units of tuples, the cluster device can store them according to the storage format of the original data table, which is convenient for tracking the life cycle of a tuple in the general case. For data items representing field changes, the cluster device can store them in the key-value pair storage format; in this way, not only can the information originally carried by the data item be retained, but the changes of the historical state data of any field can also be tracked in a customized manner.
  • Further, in the process of storing in the key-value pair format, the cluster device may determine at least one of the key name of the data item in the original data table and the generation time of the data item as the key name of the data item in the target data table, so as to track the changes of the historical state data from different dimensions and intuitively record the generation time of the data item. Further, the cluster device determines the field of the data item that was modified in the original data table as the key value of the data item in the target data table, so that the modified field can be viewed intuitively and the changes of the historical state data of any field can be tracked.
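  • As a small illustration of this key-value format (the record layout and helper name below are hypothetical), the key can combine the data item's key name in the original data table with its generation time, and the value can list every modified field as "key name: old value, new value", with multiple fields separated by semicolons:

```python
# Minimal sketch of the key-value storage format for field-change data items.
# The record layout and helper name are hypothetical illustrations.
def to_key_value(item):
    """item describes one historical-state change of a tuple."""
    # key name: the item's key name in the original data table plus its generation time
    key = f"{item['primary_key']}@{item['generated_at']}"
    # key value: "key name: old value, new value" for every modified field,
    # with multiple modified fields separated by semicolons
    value = "; ".join(
        f"{field}: {old}, {new}"
        for field, (old, new) in item["modified_fields"].items()
    )
    return key, value

# Example loosely following the server table of FIG. 7 and FIG. 8:
change = {
    "primary_key": "server-01",
    "generated_at": "2019-05-05 10:00:00",
    "modified_fields": {
        "server status": ("serving", "service interrupted"),
        "department": ("department A", "department B"),
    },
}
print(to_key_value(change))
```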
  • The foregoing embodiment provides a data replication method in which, when the first preset condition is met, the node device can replicate the historical state data to the cluster device based on the streaming replication technology, which improves the security of the historical state data; after the cluster device has properly stored the historical state data, it can also provide external services such as querying or analyzing the historical state data.
  • As mentioned in the foregoing embodiment, a transaction can include one or more sub-transactions, and different sub-transactions can correspond to different node devices. Although each node device can perform the data replication process once every second preset duration, the starting time points of different node devices may differ, so data replication between node devices may be asynchronous.
  • Therefore, in some scenarios, for the same committed transaction, because the node devices corresponding to the one or more sub-transactions of the transaction are not synchronized during data replication, some node devices may have already copied the historical state data corresponding to their sub-transactions to the cluster device while other node devices have not yet done so. As a result, the cluster device cannot completely read all the historical state data affected by the same transaction, and an "inconsistency" problem arises when the AP cluster reads data.
  • To solve this "inconsistency" problem in reads from the cluster device, this application further provides a data query method. FIG. 9 is a flowchart of a data query process provided by an embodiment of this application. Referring to FIG. 9, the steps for reading historical state data on the cluster device are as follows:
  • 901. The cluster device sorts the at least one historical state data in ascending order of transaction commit timestamp; when there are multiple historical state data with the same transaction commit timestamp, it sorts those historical state data in ascending order of transaction identifier to obtain the target data sequence.
  • The above sorting process refers to sorting in ascending order of the values assigned to transaction identifiers: if one transaction precedes another, the value assigned to the transaction identifier of the former is smaller than that of the latter. Among the transaction identifiers of different transactions, the later a transaction's commit time, the larger the value assigned to its transaction identifier; therefore, transaction identifier values actually increase monotonically with the commit timestamp.
  • the sorting process in the above step 901 is similar to the above step 203, and will not be repeated here.
  • In the above process, since the TP cluster periodically performs a Checkpoint operation, whenever the cluster device receives from at least one receiving buffer the at least one historical state data sent by a Checkpoint operation, it can sort the received at least one historical state data to obtain a target data sequence ordered by transaction commit timestamp and by transaction identifier. At this point, to guarantee read consistency, the following steps 902-903 are executed.
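  • The ordering used in step 901 can be expressed compactly as a sort key (a sketch with assumed field names): order by transaction commit timestamp first, then by transaction identifier for equal timestamps.

```python
# Minimal sketch of the step-901 ordering; field names are assumptions.
def build_target_sequence(historical_rows):
    # ascending commit timestamp, ties broken by ascending transaction identifier
    return sorted(historical_rows, key=lambda r: (r["commit_ts"], r["txn_id"]))
```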
  • 902. The cluster device traverses the target data sequence, performs a bitwise AND operation on the bitmap code of each historical state data, and determines that a transaction whose historical state data yields a true output meets the second preset condition.
  • In the above step 204, when any node device sends historical state data to the cluster device, since the one or more sub-transactions of a transaction correspond to one or more node devices, bitmap encoding or dictionary compression is usually used to encode the node identifiers of the one or more node devices in order to record the node devices related to the transaction (that is, the node devices corresponding to the sub-transactions), thereby compressing the length of the historical state data and reducing the resources occupied by data transmission.
  • In step 902, the cluster device obtains from the target data sequence at least one transaction that meets the second preset condition, where the second preset condition is used to indicate that the data items corresponding to all sub-transactions of the transaction have been stored in the cluster database, and the manner of obtaining the at least one transaction is determined by the method the node device used to compress the historical state data.
  • The above step 902 gives a way to determine the at least one transaction that meets the second preset condition when the node device uses bitmap encoding for data compression: a bitwise AND operation is performed on the bitmap code of each historical state data in the target data sequence, and if all bits are 1 (true), the transaction corresponding to that historical state data meets the second preset condition, because the data items corresponding to all sub-transactions of the transaction have been stored in the cluster database.
  • The at least one transaction may be referred to as a "candidate consistency point".
  • In some embodiments, the above step 902 can also be replaced in the following manner: the cluster device traverses the target data sequence, decodes the compression dictionary of each historical state data, and obtains the global transaction identifier corresponding to each historical state data; when it is determined that the data items of the sub-transactions corresponding to a global transaction identifier have all been stored in the cluster database, it is determined that the transaction corresponding to that global transaction identifier meets the second preset condition. In this way, candidate consistency points can also be determined in the case of dictionary compression, and the "complete minimum transaction ID" can then be found among the candidate consistency points through the following step 903.
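  • One plausible reading of the bitmap-encoded check in step 902 is sketched below (illustrative only; the bitmap layout, with one bit per node device involved in the transaction, and the field names are assumptions): a transaction qualifies as a candidate consistency point once the bits of the node devices whose data has been received cover every participant bit.

```python
# Minimal sketch of the step-902 bitmap check. The bitmap layout (one bit per
# node device involved in the transaction) and field names are assumptions.
from collections import defaultdict

def candidate_consistency_points(target_sequence):
    participants = {}            # txn_id -> bitmap of node devices involved
    received = defaultdict(int)  # txn_id -> bitmap of node devices seen so far
    for row in target_sequence:
        txn = row["txn_id"]
        participants[txn] = row["participant_bitmap"]
        received[txn] |= row["sender_bit"]
    # a transaction meets the second preset condition when AND-ing against the
    # participant bitmap leaves every participant bit set ("all bits are 1")
    return [txn for txn, mask in participants.items()
            if received[txn] & mask == mask]
```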
  • A global transaction means that the multiple sub-transactions involved in the transaction correspond to multiple node devices. Any global transaction can include two types of transaction identifiers, namely a global transaction identifier and local transaction identifiers. The global transaction identifier is used to indicate the unique identification information of the transaction among all global transactions in the entire TP cluster, and a local transaction identifier is used to indicate the unique identification information of a sub-transaction among all transactions in its node device. For the same global transaction, all sub-transactions have the same global transaction identifier, and each sub-transaction also has its own local transaction identifier.
  • In some embodiments, the process of determining that the data items of the sub-transactions corresponding to a global transaction identifier have been stored in the cluster database may be as follows: the cluster device obtains, according to the global transaction identifier, the data items that are stored in the cluster database and carry that global transaction identifier; when the obtained data items together with the decoded historical state data correspond to all sub-transactions of the transaction, it is determined that the data items of the sub-transactions corresponding to the global transaction identifier have been stored in the cluster database.
  • 903. The cluster device determines the transaction identifier of the first-ranked transaction among the at least one transaction as the minimum transaction identifier.
  • Since the cluster device has sorted the historical state data in ascending order of transaction identifier in step 901, it can directly take the transaction identifier of the first-ranked transaction among the at least one transaction, which is the minimum transaction identifier among the transaction identifiers of the at least one transaction. Because transaction identifiers in the node devices increase monotonically with the timestamp, obtaining this minimum transaction identifier means obtaining, among the historical state data received by this Checkpoint operation, the transaction that is complete (that is, meets the second preset condition) and has the smallest timestamp. This minimum transaction identifier can be called the "complete minimum transaction ID", and it can be regarded as a "micro consistency point".
  • Through the above steps 901-903, the cluster device determines, among the transaction identifiers of the at least one historical state data, the minimum transaction identifier that meets the second preset condition, where the second preset condition is used to indicate that the data items corresponding to all sub-transactions of the transaction have been stored in the cluster database, so that the complete minimum transaction ID of this Checkpoint operation is found. In some embodiments, if no new minimum transaction identifier larger than the one determined in the previous Checkpoint operation can be found in this Checkpoint operation, the minimum transaction identifier is not updated for the time being; steps 901-903 are executed again in the next Checkpoint operation of the TP cluster until a new minimum transaction identifier is determined, and then the following step 904 is performed. Since new transactions are continuously committed in the TP cluster, historical state data with larger transaction identifiers keeps being generated and is dumped to the AP cluster through Checkpoint operations, so the AP cluster can continuously update the value of the complete minimum transaction ID, making it larger and larger in a forward-rolling manner, which ensures the real-time performance of the data query services provided by the AP cluster.
  • 904. The cluster device determines visible data items according to the minimum transaction identifier and provides a data query service based on the visible data items, where the transaction identifier of a visible data item is less than or equal to the minimum transaction identifier.
  • In the above process, the cluster device can rely on the tuple visibility judgment algorithm of MVCC technology to make data items whose transaction identifier is less than or equal to the minimum transaction identifier visible to the outside, thereby ensuring the read consistency of the AP cluster under the micro Checkpoint operation mechanism.
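  • Building on the candidate consistency points sketched after step 902, steps 903-904 can be illustrated as follows (again a sketch with assumed names): take the first-ranked qualifying transaction in the target data sequence as the complete minimum transaction ID, roll it only forward across Checkpoint operations, and expose only data items whose transaction identifier does not exceed it.

```python
# Minimal sketch of steps 903-904: pick the complete minimum transaction ID
# and apply the MVCC-style visibility rule. Names are assumptions.
def complete_minimum_txn_id(target_sequence, qualifying_txns, previous_min=None):
    """target_sequence is already ordered by (commit_ts, txn_id) from step 901."""
    for row in target_sequence:
        if row["txn_id"] in qualifying_txns:
            new_min = row["txn_id"]
            # only roll the consistency point forward, never backward
            if previous_min is None or new_min > previous_min:
                return new_min
            break
    return previous_min  # keep the old value until a larger one is found

def visible_items(data_items, min_txn_id):
    """A data item is visible when its transaction identifier is less than or
    equal to the complete minimum transaction ID (the "micro consistency point")."""
    return [item for item in data_items if item["txn_id"] <= min_txn_id]
```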
  • In this way, read consistency can be achieved for any read operation on the full-temporal data, because this read consistency can essentially be regarded as transaction consistency constructed on the basis of historical state data; therefore, achieving read consistency also ensures that the historical state data at any point in time read from the AP cluster is at a transaction consistency point.
  • FIG. 10 is a schematic diagram of a transaction consistency point provided by an embodiment of the present application.
  • Referring to FIG. 10, the initial data state is represented by white circles, in which the data items r1, r2, and r3 are in a consistent state (represented by solid lines). Transaction T1 commits at t1 and modifies the data item r1, generating a new version of r1, which is represented by a black circle in the figure; transaction T2 commits at t2 and modifies the data items r2 and r3, generating new versions of r2 and r3, which are represented by slashed circles in the figure; transaction T3 commits at t3 and modifies the data items r1 and r3, generating new versions of r1 and r3, which are represented by grid circles in the figure; transaction T4 commits at t4 and modifies the data item r2, generating a new version of r2, which is represented by a dotted circle in the figure.
  • In some embodiments, based on the query statement, the semantics of the query operation, and the metadata provided by the SR (SQL Router) layer in FIG. 1, a user query can be routed to any data stored in the TP cluster or the AP cluster; the TP cluster mainly provides query services for current-state data, while the AP cluster mainly provides query services for historical state data.
  • A distributed concurrent access control algorithm can be used to ensure the transaction consistency of current-state (or transitional-state) data. The distributed concurrent access control algorithm can be a concurrent access control algorithm based on locking technology, a concurrent access control algorithm based on OCC (optimistic concurrency control) technology, or a concurrent access control algorithm based on TO (time ordering) technology; the embodiments of the present application do not specifically limit the type of the distributed concurrent access control algorithm.
  • the historical state data that meets the consistency condition may be read based on the above-mentioned transaction consistency.
  • In addition, the HTAC architecture as a whole can also provide hybrid query services, that is, a single query operation is used to query both the current-state data and the historical state data of a tuple. Such a query operation usually specifies a historical time point; the historical state data of the tuple is read starting from that time point until the current-state data at the current moment is queried.
  • In some embodiments, a hybrid query can be implemented based on the following statement, where table_references can be of the form: tbl_name [[AS] alias] [index_hint] [SNAPSHOT START snapshot_name [TO snapshot_name2] [WITH type]].
  • Here, SNAPSHOT is a transaction snapshot (different from a data snapshot of a data block), which can be referred to as a snapshot for short. "[SNAPSHOT [START snapshot_name] [TO snapshot_name2] [WITH type]]" means that a snapshot interval is specified for the "tbl_name" object; it is new content added on the basis of DQL (data query language). When the statement includes all of the clauses (SNAPSHOT, START, TO), it means a "snapshot difference read", that is, reading from one snapshot until another snapshot is read.
  • The data query process provided in the embodiments of this application ensures overall read consistency under the HTAC architecture, covering not only the read consistency of the TP cluster but also the read consistency of the AP cluster. Each time the historical state data of a Checkpoint operation is received, the cluster device tries to obtain a new minimum transaction ID (the complete minimum transaction ID), that is, tries to update the value of the minimum transaction ID, and, based on the tuple visibility judgment algorithm of MVCC technology, makes the data items whose transaction ID is less than or equal to the minimum transaction ID visible, ensuring transaction-level consistency of the historical state data stored in the AP cluster. When HTAC also supports external consistency (including linear consistency, causal consistency, etc.), the combination of external consistency and transaction consistency can be regarded as global consistency, so that any read operation initiated under the HTAC architecture can satisfy global consistency, although the Checkpoint operation introduces a certain data delay. Even so, the AP cluster meets the query requirements and computing requirements of analytical services for data correctness and real-time performance.
  • The above embodiment provides a process of performing data query based on the data replication method. The node device can copy the historical state data to the cluster device based on the streaming replication technology, so that the cluster device can provide services such as query and analysis of the historical state data, which improves the security and usability of the historical state data.
  • When the number of node devices in the TP cluster is large, if every node device in the TP cluster is still traversed for each Checkpoint operation, the time consumed when the TP cluster replicates data to the AP cluster increases greatly, and the performance of HTAC may also fluctuate, which affects the stability and robustness of HTAC; the micro Checkpoint operation is therefore introduced.
  • FIG. 11 is an interaction flowchart of a data system provided by an embodiment of the present application. Referring to FIG. 11, the data system includes the cluster device of the AP cluster and the multiple node devices of the TP cluster; the micro Checkpoint operation and the Checkpoint operation are described in detail below:
  • 1101. Each node device in the TP cluster performs a micro Checkpoint operation at intervals of the second preset duration, copying at least one historical state data on that node device to the cluster device.
  • The second preset duration is the same as that in the above step 202, the micro Checkpoint operation has been described in detail in the above step 204, and the data replication process is similar to the above steps 201-209, which will not be repeated here.
  • At intervals of a third preset duration, the multiple node devices simultaneously copy at least one historical state data of their own to the cluster device, where the third preset duration is greater than the second preset duration and may be any value greater than the second preset duration. The second preset duration corresponds to the operation frequency of the micro Checkpoint, and the third preset duration corresponds to the operation frequency of the Checkpoint.
  • 1102. The TP cluster traverses each node device of the TP cluster at intervals of the third preset duration, performs a Checkpoint operation, and copies at least one historical state data of all the node devices in the TP cluster to the cluster device; the data replication process is similar to the above steps 201-209 and will not be repeated here.
  • 1103. The cluster device determines, among the transaction identifiers of all the historical state data sent by the multiple node devices, the minimum transaction identifier that meets the second preset condition, where the second preset condition is used to indicate that the data items corresponding to all sub-transactions of the transaction have been stored in the cluster database; according to the minimum transaction identifier, the cluster device determines the visible data items and provides data query services based on the visible data items, where the transaction identifier of a visible data item is less than or equal to the minimum transaction identifier.
  • the foregoing step 1103 is similar to the foregoing steps 901-904, and will not be repeated here.
  • Through the interaction process between the TP cluster and the AP cluster, the data system provided by the embodiments of the application reflects at the system level that each node device in the TP cluster executes a micro Checkpoint operation at intervals of the second preset duration, while the TP cluster as a whole has all node devices perform a Checkpoint operation at intervals of the third preset duration. This not only meets the AP cluster's requirement for real-time updates of the historical state data and ensures the real-time availability of the AP cluster, but also reduces, through the micro Checkpoint operation, the traversal and confirmation time spent in the data replication process, which improves the efficiency of data replication.
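  • The two-level schedule described above can be sketched as follows (the durations and the replicate() callback are illustrative placeholders, not values given by the patent): a per-node micro Checkpoint runs every second preset duration, and a cluster-wide Checkpoint runs every third preset duration to bound the skew between node devices.

```python
# Minimal sketch of the micro Checkpoint / Checkpoint scheduling described
# above. The durations and the replicate() callback are placeholders, not
# values given by the patent.
import time

SECOND_PRESET_DURATION = 0.001   # per-node micro Checkpoint interval (e.g. 1 ms)
THIRD_PRESET_DURATION = 1.0      # cluster-wide Checkpoint interval (e.g. 1 s)

def run_tp_cluster(node_devices, replicate, run_for=5.0):
    """replicate(node) copies that node's pending historical state data to the
    cluster device, as in steps 201-209."""
    start = time.monotonic()
    last_micro = {node: start for node in node_devices}
    last_full = start
    while time.monotonic() - start < run_for:
        now = time.monotonic()
        # per-node micro Checkpoint: frequent, keeps the AP cluster up to date
        for node in node_devices:
            if now - last_micro[node] >= SECOND_PRESET_DURATION:
                replicate(node)
                last_micro[node] = now
        # cluster-wide Checkpoint: bounds the replication skew between nodes
        if now - last_full >= THIRD_PRESET_DURATION:
            for node in node_devices:
                replicate(node)
                last_micro[node] = now
            last_full = now
        time.sleep(SECOND_PRESET_DURATION / 2)
```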
  • FIG. 12 is a schematic structural diagram of a data copying device provided by an embodiment of the present application.
  • the device includes an adding module 1201 and a copying module 1202, which will be described in detail below:
  • the adding module 1201 is used for adding the historical state data of the transaction to the data queue when the commit operation of the transaction is detected, and the data queue is used for buffering the historical state data;
  • the adding module 1201 is also used to add at least one historical state data in the data queue to a sending buffer, where the sending buffer is used to buffer the historical state data to be copied;
  • the copy module 1202 is configured to copy the at least one historical state data in the sending buffer to the cluster device when the first preset condition is met.
  • When the commit operation of a transaction is detected, the device provided by the embodiments of the present application adds the historical state data of the transaction to the data queue so as to buffer it there, adds at least one historical state data in the data queue to the sending buffer so that a sending process or sending thread can be executed based on the sending buffer, and, when the first preset condition is met, copies the at least one historical state data in the sending buffer to the cluster device, so that the node device copies the historical state data in the sending buffer to the cluster device whenever the first preset condition is met.
  • The node device does not need to convert the original historical state data into a log format, and the cluster device does not need to parse logs back into the original data format before storing the data, so there is no need to replay redo logs for the historical state data during data replication, which avoids the tedious replay process, shortens the redo log replay time, and improves the efficiency of the data replication process.
  • In some embodiments, the adding module 1201 is used to: when it is detected that any historical state data is added to the data queue, add that historical state data to the sending buffer;
  • the copy module 1202 is used to: when it is detected that any historical state data is added to the sending buffer, copy the at least one historical state data in the sending buffer to the cluster device.
  • In some embodiments, the adding module 1201 is used to:
  • at intervals of the first preset duration, acquire at least one historical state data that was added to the data queue within the first preset duration before the current moment;
  • sort the at least one historical state data in ascending order of transaction commit timestamp and, when there are multiple historical state data with the same transaction commit timestamp, sort those historical state data in ascending order of transaction identifier to obtain at least one ordered historical state data, and add the at least one ordered historical state data to the sending buffer.
  • In some embodiments, the first preset condition is that it is detected that any historical state data is added to the sending buffer; or,
  • the first preset condition is that it is detected that the ratio of the used data volume of the sending buffer to the capacity of the sending buffer reaches a ratio threshold; or,
  • the first preset condition is that the current time is a second preset duration away from the time when the sending buffer last copied historical state data to the cluster device; or,
  • the first preset condition is that the current time is a third preset duration away from the time when the sending buffer last copied historical state data to the cluster device, where the third preset duration is the same preset duration configured for each of the multiple node devices and is greater than the second preset duration.
  • the device further includes:
  • a clearing module, used to clear the sending buffer corresponding to the replication success response when the replication success response sent by the cluster device is received.
  • the adding module 1201 is also used to:
  • It should be noted that when the data replication device provided in the above embodiment replicates data, the division into the above functional modules is only used as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the node device is divided into different functional modules to complete all or part of the functions described above.
  • the data copying device provided in the foregoing embodiment and the data copying method embodiment belong to the same concept, and the specific implementation process is detailed in the data copying method embodiment, which will not be repeated here.
  • FIG. 13 is a schematic structural diagram of a data replication device provided by an embodiment of the present application.
  • Referring to FIG. 13, the device includes a receiving module 1301, an adding module 1302, and a storage module 1303, which are described in detail below:
  • the receiving module 1301 is configured to receive at least one historical state data sent by a node device from a receiving buffer, where the receiving buffer is used to buffer the received historical state data;
  • the adding module 1302 is configured to add the at least one historical state data in the receiving buffer to the forwarding buffer, and convert the at least one historical state data into data conforming to the tuple format through the forwarding buffer to obtain at least one Data item, the forwarding buffer is used for data format conversion of historical data;
  • the storage module 1303 is configured to store the at least one data item in at least one target data table of the cluster database, and one target data table corresponds to an original data table where one data item is located in the node device.
  • After receiving the at least one historical state data sent by the node device from the receiving buffer, the apparatus provided by the embodiments of the present application adds the at least one historical state data in the receiving buffer to the forwarding buffer and, through the forwarding buffer, converts the at least one historical state data into data conforming to the tuple format to obtain at least one data item, so that the format of the compressed historical state data is restored. Because historical state data that retains its original format is obtained directly, the operation of parsing logs to obtain historical state data can be avoided; the at least one data item is then stored in at least one target data table of the cluster database, so that the historical state data is properly preserved.
  • the storage module 1303 includes:
  • a first storage unit, used to store, for a data item in units of tuples, the data item in the target data table corresponding to the original data table according to the storage format of the original data table where the data item is located; or,
  • a second storage unit, used to store, for a data item representing a field change, the data item in the target data table corresponding to the original data table according to the key-value pair storage format.
  • the second storage unit is used for:
  • At least one of the key name of the data item in the original data table and the generation time of the data item is determined as the key name of the data item in the target data table;
  • the modified field of the data item in the original data table is determined as the key value of the data item in the target data table.
  • the device further includes:
  • the determining module is configured to determine the smallest transaction identifier that meets the second preset condition among the transaction identifiers of the at least one historical state data, and the second preset condition is used to indicate that the data items corresponding to all sub-transactions of the transaction have been stored In the cluster database;
  • the query module is used to determine the visible data item according to the minimum transaction identifier, and provide data query services based on the visible data item, wherein the transaction identifier of the visible data item is less than or equal to the minimum transaction identifier.
  • the determining module includes:
  • a sorting unit, used to sort the at least one historical state data in ascending order of transaction commit timestamp and, when there are multiple historical state data with the same transaction commit timestamp, sort those historical state data in ascending order of transaction identifier to obtain the target data sequence;
  • An obtaining unit configured to obtain at least one transaction that meets the second preset condition from the target data sequence
  • a determining unit, configured to determine the transaction identifier of the first-ranked transaction among the at least one transaction as the minimum transaction identifier.
  • the acquiring unit includes:
  • a traversal determining subunit, used to traverse the target data sequence, perform a bitwise AND operation on the bitmap code of each historical state data, and determine that a transaction whose historical state data yields a true output meets the second preset condition; or,
  • the traversal determining subunit is also used to traverse the target data sequence, decode the compression dictionary of each historical state data, and obtain the global transaction identifier corresponding to each historical state data; and when it is determined that the data items of the sub-transactions corresponding to the global transaction identifier have all been stored in the cluster database, determine that the transaction corresponding to the global transaction identifier meets the second preset condition.
  • the traversal determination subunit is also used for:
  • according to the global transaction identifier, obtain the data items that are stored in the cluster database and carry the global transaction identifier; and
  • when the obtained data items together with the decoded historical state data correspond to all sub-transactions of the transaction, determine that the data items of the sub-transactions corresponding to the global transaction identifier have been stored in the cluster database.
  • the receiving module 1301 is used to:
  • at intervals of the third preset duration, receive from the receiving buffer at least one historical state data sent simultaneously by the multiple node devices.
  • It should be noted that when the data replication device provided in the above embodiment replicates data, the division into the above functional modules is only used as an example for illustration; in practical applications, the above functions can be allocated to different functional modules as required, that is, the internal structure of the cluster device is divided into different functional modules to complete all or part of the functions described above.
  • the data copying device provided in the foregoing embodiment and the data copying method embodiment belong to the same concept, and the specific implementation process is detailed in the data copying method embodiment, which will not be repeated here.
  • The computer device 1400 may vary greatly in configuration or performance, and may include one or more processors (central processing units, CPU) 1401 and one or more memories 1402, where at least one instruction is stored in the memory 1402, and the at least one instruction is loaded and executed by the processor 1401 to implement the data replication methods provided by the foregoing method embodiments.
  • Of course, the computer device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may also include other components for implementing device functions, which will not be repeated here.
  • In an exemplary embodiment, a computer-readable storage medium is also provided, for example, a memory including at least one instruction, and the at least one instruction can be executed by a processor in a terminal to complete the data replication method in the foregoing embodiments.
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • The above program can be stored in a computer-readable storage medium, and the storage medium can be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

一种数据复制方法、装置、计算机设备及存储介质,属于数据库技术领域。该方法包括:当检测到任一事务的提交操作时,将该事务的历史态数据添加到数据队列中;将该数据队列中的至少一个历史态数据添加到至少一个发送缓冲区中的任一发送缓冲区;当符合第一预设条件时,将该发送缓冲区中的至少一个历史态数据复制至集群设备。节点设备不用把原本的历史态数据格式转化为日志格式,集群设备也不必将日志解析为数据原始格式后再进行存储,也就避免了繁琐的日志回放流程,提高了数据复制过程的效率。

Description

数据复制方法、装置、计算机设备及存储介质
本申请要求于2019年05月05日提交中国专利局、申请号为201910368297X、申请名称为“数据复制方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及数据库技术领域,特别涉及数据复制技术。
背景技术
在数据库技术中,尤其是OLAP(online analytical processing,联机实时分析)处理系统、数据仓库、大数据分析等场景中,经常需要从数据库中复制数据,对已有数据进行及时备份。
在复制过程中通常涉及到主机和备机两种设备,对于目前的数据库(例如Oracle、MySQL或者InnoDB等)而言,主机可以定期将数据库中的数据文件复制到备机上,实现基于数据文件的主备同步。进一步地,为了避免因在复制过程中对数据文件造成损坏,而导致主机与备机数据不一致,主机和备机建立通信连接后,两者的数据库之间同步一个重做日志(REDO LOG),如果复制过程出现异常,备机可以通过回放重做日志清除掉异常的数据。
然而,重做日志的解析工作和回放工作较为复杂,在大数据量的场景下,备机回放重做日志会消耗较长时间,影响数据复制过程的效率。
发明内容
本申请实施例提供了一种数据复制方法、装置、计算机设备及存储介质,能够解决基于重做日志复制数据时,消耗时间长,解析回放工作复杂,影响数据复制效率的问题。该技术方案如下:
一方面,提供了一种数据复制方法,由计算机设备(节点设备)执行,该方法包括:
当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,该数据队列用于缓存历史态数据;
将该数据队列中的至少一个历史态数据添加到发送缓冲区,该发送缓冲区用于缓存待复制的历史态数据;
当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备。
一方面,提供一种数据复制方法,由计算机设备(集群设备)执行,该方法包括:
从接收缓冲区接收节点设备发送的至少一个历史态数据,该接收缓冲区用于缓存接收的历史态数据;
将该接收缓冲区中的该至少一个历史态数据添加到转发缓冲区,通过该 转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,该转发缓冲区用于对历史态数据进行数据格式转换;
将该至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在该节点设备中所在的一个原始数据表。
一方面,提供了一种数据复制装置,该装置包括:
添加模块,用于当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,该数据队列用于缓存历史态数据;
该添加模块,还用于将该数据队列中的至少一个历史态数据添加到发送缓冲区,该发送缓冲区用于缓存待复制的历史态数据;
复制模块,用于当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备。
一方面,提供了一种数据复制装置,该装置包括:
接收模块,用于从接收缓冲区接收节点设备发送的至少一个历史态数据,该接收缓冲区用于缓存接收的历史态数据;
添加模块,用于将该接收缓冲区中的该至少一个历史态数据添加到转发缓冲区,通过该转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,该转发缓冲区用于对历史态数据进行数据格式转换;
存储模块,用于将该至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在该节点设备中所在的一个原始数据表。
一方面,提供了一种计算机设备,该计算机设备包括处理器和存储器,该存储器中存储有至少一条指令,该至少一条指令由该处理器加载并执行以实现如上述任一种可能实现方式的数据复制方法。
一方面,提供了一种计算机可读存储介质,该存储介质中存储有至少一条指令,该至少一条指令由处理器加载并执行以实现如上述任一种可能实现方式的数据复制方法。
一方面,提供了一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行以实现如上述任一种可能实现方式的数据复制方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,以将事务的历史态数据缓存到数据队列中,将该数据队列中的至少一个历史态数据添加发送缓冲区,以便基于发送缓冲区执行发送进程或发送线程,当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备,使得节点设备能够每当符合第一预设条件,就将至少一个发送缓冲区中的历史态数据复制到集群设备。如此,节点设备不必把原本的历史态数据格式转化为日志格式,集群设备也不必将日志解析为数据原始格式后再进 行存储,从而在数据复制时无需对历史态数据进行重做日志的回放,避免了繁琐的回放流程,缩短了重做日志回放过程的时长,提高了数据复制过程的效率。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种数据复制方法的实施环境示意图;
图2是本申请实施例提供的一种数据复制方法的交互流程图;
图3是本申请实施例提供的一种获取历史态数据的原理性示意图;
图4是本申请实施例提供的一种获取历史态数据的原理性示意图;
图5是本申请实施例提供的一种流复制技术的原理性示意图;
图6是本申请实施例提供的一种流复制技术的原理性示意图;
图7是本申请实施例提供的一种原始数据表的结构示意图;
图8是本申请实施例提供的一种目标数据表的结构示意图;
图9是本申请实施例提供的一种数据查询过程的流程图;
图10是本申请实施例提供的一种事务一致性点的原理性示意图;
图11是本申请实施例提供的一种数据系统的交互流程图;
图12是本申请实施例提供的一种数据复制装置的结构示意图;
图13是本申请实施例提供的一种数据复制装置的结构示意图;
图14是本申请实施例提供的计算机设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在介绍本申请实施例之前,首先介绍一些数据库技术中的基本概念:
本申请实施例涉及的数据库存储有多个数据表,每个数据表可以用于存储元组。其中,该数据库可以为基于MVCC(multi-version concurrency control,多版本并发控制)的任一类型的数据库。在本申请实施例中,对该数据库的类型不作具体限定。
需要说明的是,基于状态属性可以将上述数据库中的数据划分为三种状态:当前态、过渡态和历史态,该三种状态合称为“数据全态(full state)”,简称全态数据,全态数据中的各个不同状态属性,可以用于标识数据在其生命周期轨迹中所处的状态。
当前态(current state):元组的最新版本的数据,是处于当前阶段的数据。处于当前阶段的数据的状态,称为当前态。
过渡态(transitional state):不是元组的最新的版本也不是历史态版本,处于从当前态向历史态转变的过程中,处于过渡态的数据,称为半衰数据。
历史态(historical state):元组在历史上的一个状态,其值是旧值,不是当前值。处于历史阶段的数据的状态,称为历史态。一个元组的历史态,可以有多个,反映了数据的状态变迁的过程。处于历史态的数据,只能被读取而不能被修改或删除。
需要说明的是,在MVCC机制下,数据的上述三种状态均存在,在非MVCC机制下,数据可以只存在历史态和当前态。在MVCC或封锁并发访问控制机制下,事务提交后的数据的新值处于当前态。在MVCC机制下,当前活跃事务列表中最小的事务之前的事务生成的数据,其状态处于历史态;在封锁并发访问控制机制下,事务提交后,提交前的数据的值变为历史态的值,即元组的旧值处于历史态。而被读取的版本上尚有活跃事务(非最新相关事务)在使用,由于最新相关事务修改了元组的值,其最新值已经处于一个当前态,被读取到的值相对当前态已经处于一个历史状态,因此,其数据状态介于当前态和历史态之间,所以称为过渡态。
例如,在MVCC机制下,用户表(user表)的A账户余额从10元充值变为20元,然后消费了15元变为5元,若从此时起金融B机构开始读取数据做检查事务,A之后又充值20元变为25元,则25元为当前态数据,B正在读取到的5元为过渡态,其余的两个值20、10是历史上存在过的状态,都是历史态数据。
基于上述名词解释,图1是本申请实施例提供的一种数据复制方法的实施环境示意图。参见图1,该实施环境可以统称为HTAC(hybrid transaction/analytical cluster,混合事务/分析集群)架构,在HTAC架构内,可以包括TP(transaction processing,事务处理)集群101和AP(analytical processing,分析处理)集群102。
其中,该TP集群101用于提供事务处理服务,在TP集群中可以包括多个节点设备103,在数据复制过程中,该多个节点设备用于提供待复制的历史态数据,TP集群101的每个节点设备上可以配置有节点数据库,每个节点设备可以是单机设备,也可以是一主两备集群设备,本申请实施例不对节点设备的类型进行具体限定。
其中,该AP集群102用于提供历史态数据的查询及分析服务,在AP集群中可以包括集群设备,集群设备上可以配置有集群数据库,在数据复制过程中,该集群设备用于将该多个节点设备发送的历史态数据复制存储到集群数据库,基于集群数据库中已存储的历史态数据提供查询及分析服务。其中,该集群数据库可以是本地的数据库,也可以是该集群设备通过存储接口接入的分布式文件系统,从而可以通过该分布式文件系统对TP集群提供无限存储 功能,例如,该分布式文件系统可以是HDFS(Hadoop distributed file system,Hadoop分布式文件系统)、Ceph(一种Linux系统下的分布式文件系统)、Alluxio(一种基于内存的分布式文件系统)等。
当然,该集群设备可以由一个或多个单机设备或者一主两备集群设备组合而成,各个设备之间实现联机通信,本申请实施例不对集群设备的类型进行具体限定。
在一些实施例中,由于TP集群101中的多个节点设备可以提供事务处理服务,在任一事务提交完成的时刻,在生成新的当前态数据的同时,也会生成与该当前态数据对应的历史态数据,而由于历史态数据会占用较多存储空间,但历史态数据又具有保存价值,因此该多个节点设备可以基于本申请实施例提供的数据复制方法,将历史态数据复制至集群设备,由集群设备基于本地执行器(local executor,LE)将历史态数据存储到数据表中,当复制完成后支持在该节点设备上删除已复制的历史态数据(当然也可以不删除),将历史态数据从TP集群转储到AP集群,从而保证HTAC架构不仅可以存储当前态数据和过渡态数据,对历史态数据也能实现妥善存储,实现完备的全态数据的存储机制。
在上述过程中,当该多个节点设备将历史态数据成功复制到集群设备上之后,还可以将本次复制的历史态数据的元数据注册到集群设备的元数据(metadata,MD)管理器中,便于集群设备基于该元数据管理器统计已储备的历史态数据的元信息。
在一些实施例中,用户可以基于SQL路由(structured query language router,SQL Router,SR)层中提供的查询语句、查询操作的语义和元数据,路由查询到TP集群101或AP集群102内存储的任一数据,当然,TP集群101主要提供对当前态数据的查询服务,AP集群102主要提供对历史态数据的查询服务。其中,查询操作的语义是根据查询语句分析得到的操作意图,例如,WHERE子句的条件可以表示WHERE子句的意图。
在一些实施例中,尤其是大数据场景下,一个事务不仅会涉及到对单个节点设备的节点数据库进行数据修改,还通常会涉及到对另外的至少一个节点设备的节点数据库进行数据修改,此时可以基于分布式一致性算法(例如two-phase commit,2PC)执行跨节点的写事务,以保证对数据操作的事务的原子性和一致性。
在上述架构中,TP集群101中的每个节点设备对应的一个或多个节点数据库可以组成数据库实例集合,可被称为一个SET(集合)。当然,如果该节点设备为单机设备,那么该单机设备的数据库实例为一个SET;如果该节点设备为一主两备集群设备,那么该节点设备的SET为主机数据库实例以及两个备机数据库实例的集合,此时可以基于云数据库(cloud database)的强同步技术来保证主机的数据与备机的副本数据之间的一致性。可选地,每个 SET可以进行线性扩容,以应付大数据场景下的业务处理需求。
在一些实施例中,TP集群101还可以支持通过分布式协调系统(例如ZooKeeper)实现对该多个节点设备103的管理,例如可以通过ZooKeeper使得某一个节点设备失效(也即是将该节点设备从TP集群101中删除)。
基于上述实施环境,图2是本申请实施例提供的一种数据复制方法的交互流程图。参见图2,以TP集群中多个节点设备中的任一节点设备、AP集群中的集群设备作为交互执行主体为例进行说明,该实施例应用于节点设备与集群设备的交互过程,该实施例包括:
201、当节点设备检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,该数据队列用于缓存历史态数据。
在上述过程中,该节点设备可以是TP集群内的任一节点设备,该节点设备上可以配置有节点数据库,在该节点数据库内部,伴随着任一事务的提交,会相应地生成历史态数据和新的当前态数据。
以更新事务(UPDATE操作)为例进行说明,对一个元组执行更新事务时可以划分为两个步骤:其一是在更新前的元组上添加删除标识,其二是生成一个新的元组存放修改后的数据内容。当更新事务提交完成后,该更新前的元组和该新的元组对外呈现“可被读到”的状态,也即是只有在更新事务提交完成后,元组才完成更新有效的过程,数据库引擎才支持对该更新前的元组和该新的元组执行读操作,从而用户能够发现这个元组被修改了。
另一方面,删除事务(DELETE操作)也具有类似的过程,对一个元组执行删除事务时,在原本的元组上添加删除标识,当删除事务提交完成后,元组才完成删除有效的过程,该原本的元组才对外呈现“可被读到”的状态,也即是只有在删除事务提交完成后,用户才能够发现这个元组被删除了。
基于上述情况,节点设备在提供事务处理服务时,当检测到任一事务的提交操作时,节点数据库会得到该事务的历史态数据,如果该节点数据库是不支持存储历史态数据的数据库,那么节点设备可以在事务提交完成的时刻,同时获取到历史态数据,执行上述步骤201中的将该历史态数据添加到数据队列中的操作,从而达到了事务的提交操作与数据队列的添加操作同步实现。
在一些实施例中,一些类型的节点数据库(例如Oracle、MySQL/InnoDB等)支持以回滚段的方式暂时性的存储过渡态数据或历史态数据,在这种情况下,事务的提交操作与数据队列的添加操作则是异步的,由于节点数据库仅能暂时性的存储历史态数据,数据库引擎会定时清理回滚段中存储的数据,此时节点设备可以在数据库引擎执行回滚段的清理操作时获取回滚段中存储的历史态数据,执行上述步骤201中的将该历史态数据添加到数据队列中的操作,从而达到了事务的提交操作与数据队列的添加操作异步实现。
例如,图3是本申请实施例提供的一种获取历史态数据的原理性示意图, 参见图3,假设用户A的初始余额为100元,在第一时刻充值了100元,余额变为200元,在第二时刻之后的某一时刻又充值了100元,余额变为300元,此时金融机构对节点数据库执行读操作,而在读操作进行过程中的第三时刻用户A又充值了100元,余额变为400元,此时用户A对应的当前态数据为400,过渡态数据为300,历史态数据包括100和200,以节点数据库为MySQL为例,可以通过执行PURGE操作进行回滚段的清理,而当节点设备检测到PURGE操作时,节点设备将PURGE操作所作用的历史态数据(用户A所对应的100和200)添加到数据队列中,这里仅以用户A的历史态数据为例进行说明,对于用户B、用户C以及用户D都同理使用,这里不做赘述。
在一些实施例中,一些类型的节点数据库(例如PostgreSQL)支持把当前态数据、过渡态数据和历史态数据记录在数据页面中,定时清理数据页面中的历史态数据,此时节点设备可以在数据库引擎执行数据页面的清理操作时获取数据页面中存储的历史态数据,执行上述步骤201中的将该历史态数据添加到数据队列中的操作,从而达到了事务的提交操作与数据队列的添加操作异步实现。
例如,图4是本申请实施例提供的一种获取历史态数据的原理性示意图,参见图4,以节点数据库为PostgreSQL为例,节点数据库将多个元组的当前态数据、过渡态数据和历史态数据记录在数据页面中,在数据页面中还可以记录该多个元组的元组信息,节点数据库通过执行VACUUM操作进行数据页面的清理,当节点设备检测到VACUUM操作时,将VACUUM操作所作用的历史态数据添加到数据队列中,然后清理数据页面中当前最小活跃事务之前的事务所生成的数据。
在上述任一种情况下的将历史态数据添加到数据队列过程中,节点设备上可以包括数据缓冲区(buffer),该数据缓冲区以数据队列的形式缓存历史态数据,将历史态数据从节点数据库的原始数据表中添加到该数据缓冲区的数据队列中。
202、节点设备每间隔第一预设时长,获取该数据队列中在当前时刻之前的该第一预设时长内增加的至少一个历史态数据。
其中,该第一预设时长可以是任一大于或等于0的数值,例如,该第一预设时长可以是0.5毫秒。
在上述过程中,节点设备每间隔第一预设时长就从数据队列中获取一次历史态数据,但由于数据队列中的历史态数据是无序的,因此需要执行下述步骤203,对历史态数据进行排序后再添加到发送缓冲区中,从而实现将历史态数据异步写入发送缓冲区。
在一些实施例中,节点设备还可以将历史态数据同步写入发送缓冲区,同步过程也即是每当数据队列中新增一个历史态数据,就将该历史态数据同步地添加到发送缓冲区中。基于上述同步写入发送缓冲区的情况,如果节点 数据库是不支持存储历史态数据的数据库,那么节点设备在事务提交完成的时刻,即可以将历史态数据写入数据队列,在同一时刻又将历史态数据写入发送缓冲区。
在上述过程中,本实施例中的步骤202-204可以被替换为:当检测到该数据队列中增加任一历史态数据时,将该历史态数据添加到发送缓冲区;当检测到发送缓冲区中增加任一历史态数据时,将该发送缓冲区中的该至少一个历史态数据复制至该集群设备,从而实现了历史态数据的同步复制,保证将历史态数据在写入发送缓冲区时,按照事务提交时间戳的顺序以及事务标识的顺序写入,从而无需执行上述步骤203中的排序操作,直接执行下述步骤204。
在一些场景中,如果历史态数据的产生与历史态数据被添加到数据队列的过程是异步的,例如上述步骤201中所涉及到的MySQL/InnoDB等类型的节点数据库中采用PURGE操作清理历史态数据,或者例如PostgreSQL等类型的节点数据库中采用VACUUM操作清理历史态数据,此时会导致数据队列中本身缓存的历史态数据是无序的,那么即使历史态数据是同步发送缓冲区的,仍然无法保证历史态数据被有序地写入发送缓冲区,因此,在这种场景下需要执行下述步骤203。
图5是本申请实施例提供的一种流复制技术的原理性示意图,参见图5,当检测到任一事务的提交操作后,可以使原本的当前态数据转换为历史态数据,此时先将历史态数据从原始数据表添加到数据队列中,然后基于上述步骤202中的操作,将历史态数据从数据队列异步添加到发送缓冲区,但由于间隔了第一预设时长,从数据队列中获取到的历史态数据是无序的,为了保证历史态数据能够有序地写入发送缓冲区,需要执行下述步骤203。
203、节点设备按照事务提交时间戳从小到大的顺序,对该至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对该多个历史态数据进行排序,得到至少一个有序排列的历史态数据,将该至少一个有序排列的历史态数据添加到发送缓冲区。
其中,每个历史态数据对应于一个事务,该事务标识(identification,事务ID)用于唯一标识一个事务,事务标识按照事务产生时间戳(timestamp)呈单调递增,例如该事务标识可以就是事务产生时间戳,当然,该事务标识也可以是按照事务产生时间戳赋值的呈单调递增趋势的数值,需要说明的是,一个事务通常对应于两个时间戳,分别是事务产生时间戳和事务提交时间戳,这两个时间戳分别对应于事务的产生时刻和提交时刻。
其中,该发送缓冲区可以是一个在数据复制过程中循环使用的部分,该发送缓冲区可以是发送进程或发送线程执行发送任务(将历史态数据从节点设备发送至集群设备)时调用的缓冲区,可选地,由于发送进程或者发送线程的数量可以是一个或多个,因此该发送缓冲区的数量也可以是一个或多个, 在上述步骤203中仅以将有序的历史态数据写入任一个发送缓冲区为例进行说明。
在上述过程中,节点设备将历史态数据异步写入发送缓冲区之前,可以先对历史态数据排序,在排序时先按照事务提交时间戳从小到大的顺序进行排序,再针对事务提交时间戳相同的历史态数据,按照事务标识从小到大的顺序进行排序,进而将有序排列的历史态数据写入该发送缓冲区,保证了在发送缓冲区内的历史态数据是绝对有序的。
图6是本申请实施例提供的一种流复制技术的原理性示意图,参见图6,当发送缓冲区的数量为多个时,每个发送缓冲区获取历史态数据的方法与上述步骤202-203介绍的实现方式类似,在此不做赘述。在上述步骤202-203中,节点设备将数据队列中的至少一个历史态数据添加到至少一个发送缓冲区中的任一发送缓冲区,在大数据量的场景中,可以通过增加发送缓冲区的数量,更加快速地将数据队列中的历史态数据写入发送缓冲区。
可选地,当发送缓冲区的数量为多个时,节点设备可以将数据队列中来自于同一个原始数据表的历史态数据均匀地添加到多个发送缓冲区中,从而能够提高多个发送缓冲区的利用率,也能提升对该原始数据表中历史态数据的发送速率。
在一些实施例中,节点设备将历史态数据从数据队列中添加到发送缓冲区之后,可以根据实际需求,在数据队列中将该历史态数据标记为可复用的状态,从而使得节点设备将该历史态数据转储在本地。
204、当符合第一预设条件时,节点设备将该发送缓冲区中的该至少一个历史态数据复制至集群设备。
在一些实施例中,该第一预设条件可以为节点设备检测到发送缓冲区中增加任一历史态数据,其中该发送缓冲区用于缓存待复制的历史态数据,在节点设备执行从数据队列中获取历史态数据的过程中,一旦向发送缓冲区中成功添加了一个历史态数据,则发送缓冲区会向集群设备复制该历史态数据,从而能够源源不断地将历史态数据复制到集群设备中,这种数据复制的技术称为流复制技术。
在一些实施例中,该第一预设条件还可以是节点设备检测到该发送缓冲区的已用数据量占该发送缓冲区的容量的比例达到比例阈值,在节点设备执行从数据队列中获取历史态数据的过程中,一旦发送缓冲区的已用数据量占发送缓冲区总容量的比例达到比例阈值,则发送缓冲区会向集群设备复制自身缓存的该历史态数据,从而能够源源不断地将历史态数据复制到集群设备中。
其中,该比例阈值可以是任一大于0且小于等于1的数值,例如该比例阈值可以是100%或75%等数值。
在一些实施例中,该第一预设条件还可以是当前时刻距离该发送缓冲区 上一次向集群设备复制历史态数据的时刻达到第二预设时长,在节点设备执行从数据队列中获取历史态数据的过程中,一旦节点设备距离上次历史态数据复制的时刻达到第二预设时长,则发送缓冲区会向集群设备复制该历史态数据,从而能够源源不断地将历史态数据复制到集群设备中。
其中,该第二预设时长可以是任一大于或等于第一预设时长的数值,例如当第一预设时长为0.5毫秒时,该第二预设时长可以是1毫秒,此时发送缓冲区每隔1毫秒向集群设备执行一次数据复制,而在这1毫秒的间隔内,发送缓冲区每间隔0.5毫秒,就会从数据队列中获取前0.5毫秒内数据队列中新增的历史态数据(可以是一个或多个)。
在一些实施例中,该第一预设条件还可以是当前时刻距离该发送缓冲区上一次向集群设备复制历史态数据的时刻达到第三预设时长,其中,该第三预设时长为针对多个节点设备中每一个节点设备配置有相同的预设时长,该第三预设时长大于第二预设时长,在多个节点设备各自执行数据复制的过程中,每间隔第三预设时长,多个节点设备同时执行一次数据复制任务,以控制各个节点设备执行数据复制操作时相互之间的延时最大不超过该第三预设时长。
在一些实施例中,该第一预设条件还可以是当节点设备检测到发送缓冲区的已用数据量占该发送缓冲区的容量的比例达到比例阈值,或者当前时刻距离该发送缓冲区上一次向集群设备复制历史态数据的时刻达到第二预设时长。上述情况也即是:在数据复制的过程中,一旦发送缓冲区的已用数据量占发送缓冲区总容量的比例达到了比例阈值,则执行一次数据复制任务,或者即使发送缓冲区的已用数据量占发送缓冲区容量的比例还没有达到比例阈值,但是当前时刻距离该发送缓冲区上一次向集群设备复制历史态数据的时刻达到了第二预设时长,也执行一次数据复制任务。
在上述过程中,节点设备可以基于发送进程或者发送线程,将该发送缓冲区中的该至少一个历史态数据发送至集群设备,可选地,节点设备还可以当符合第一预设条件时,一次性将该发送缓冲区中缓存的所有历史态数据发送至该集群设备,上述步骤202-204构成了一个循环过程,使得节点设备能够基于一种流复制的技术,持续地将历史态数据复制到集群设备中。
在一些实施例中,发送缓冲区向集群设备发送的每个历史态数据中可以包括该历史态数据对应的事务的事务标识、该事务的一个或多个子事务对应的一个或多个节点设备的节点标识或者该历史态数据的全量数据中的至少一项。
其中,一个事务可以包括至少一个子事务,每个子事务对应于一个节点设备,每个节点设备具有唯一的节点标识,该节点标识可以是节点设备的IP地址(internet protocol address,互联网协议地址),也可以是节点设备的标识号码,该标识号码与IP地址具有一一对应的映射关系,TP集群中的任一 节点设备中可以存储有该映射关系,当然AP集群的集群设备中也可以存储有该映射关系。
在一些实施例中,可以采用位图编码或者字典压缩等方式对上述一个或多个节点设备的节点标识进行编码,使得节点设备发送的历史态数据的长度变短,进一步地压缩数据传输所占用的资源。
在一些实施例中,上述数据复制过程可以通过TP集群的Checkpoint(检查点)操作实现,节点设备还可以设置TP集群的Checkpoint操作频度,其中,该操作频度用于表示TP集群执行Checkpoint操作的频率,例如该操作频度可以是1秒执行1次,在一次Checkpoint操作中,TP集群中的每一个节点设备都执行一次上述步骤204中的数据复制过程,使得TP集群中新产生的历史态数据可以一次性地转储到AP集群中,也即是Checkpoint操作频度实际上是对应于上述第三预设时长。
在一些实施例中,当TP集群内节点设备的数量较多时,如果仍然对TP集群中每一个节点设备遍历执行一次Checkpoint操作,可能会导致TP集群向AP集群进行数据复制时消耗的时长大量增加,还会导致HTAC的性能出现颠簸,影响了HTAC的稳定性和鲁棒性,因此对于TP集群内的每一个节点设备,可以执行“微Checkpoint”操作,微Checkpoint的操作频度快于Checkpoint的操作频度,从而能够使得节点设备的历史态数据更快地转储到AP集群,满足AP集群对历史态数据的获取需求,保障了历史态数据的复制效率,提高了AP集群的实时可用性。
例如,可以将微Checkpoint的操作频度设置为Checkpoint的操作频度的千分之一个时间单位,也即是如果Checkpoint操作1秒执行1次,则微Checkpoint操作1毫秒执行1次,当然,这里仅仅是对微Checkpoint的操作频度进行示例性描述,本申请实施例不对微Checkpoint的操作频度与Checkpoint的操作频度之间的比例进行具体限定。
需要说明的是,在上述情况中,微Checkpoint的操作频度实际上是对应于上述第二预设时长,不同的节点设备可以设置有不同的微Checkpoint的操作频度,微Checkpoint操作频度可以与节点设备每秒钟活跃事务数量呈正相关,例如,对于每秒钟活跃事务数量占TP集群前10的节点设备,可以设置较高的微Checkpoint操作频度。当然,即使不同的节点设备设置了相同的微Checkpoint操作频度,由于不同节点设备的发送缓冲区的已用数据量占发送缓冲区总容量的比例通常不会同时达到比例阈值,也会导致不同节点设备之间的微Checkpoint操作不同步。
在一些实施例中,在TP集群不同节点设备分别执行微Checkpoint操作的同时,还可以强制TP集群中所有节点设备定时地执行一次Checkpoint操作,从而避免TP集群内部不同节点设备由于微Checkpoint操作的不同步造成数据延时过大,影响AP集群的实时可用性。例如,各个节点设备每1毫 秒执行一次微Checkpoint操作,而TP集群每1秒遍历所有的节点设备,执行一次Checkpoint操作,保证了AP集群接收历史态数据的数据延时最大不超过1秒(不超过Checkpoint操作频度)。
进一步地,在上述步骤204中的数据复制过程也可以分为同步和异步,在同步复制中,数据复制与历史态数据的清理操作紧密相关,每次清理操作(例如PRUGE操作或者VACUUM操作)对应的清理事务在提交阶段都会发起一次历史态数据流复制,也即是节点设备在清理操作完成之前先将被清理的所有历史态数据同步到集群设备,集群设备基于ARIES算法对数据复制过程的元数据的重做日志(REDO LOG)进行回放,节点设备等待回放完成后才将清理事务的状态置为“已提交”,使得历史态数据能够尽快地复制到集群设备上,极大地保证了历史态数据的安全性。
需要说明的是,对于原始数据表中清理出的历史态数据,基于流复制技术就能实现数据复制,但是在一些实施例中,可以仅对本次数据复制的元数据进行重做日志的记录和回放,实现节点设备与集群设备之间的再次核验和校对,能够更大程度地保证本次数据复制过程的安全性,在这种情况下,仍然可以避免对原始数据表中清理出的历史态数据一一执行重做日志的回放,也就简化了回放过程的数据量,缩短了回放过程消耗的时长,提升了数据复制的效率。
在一些实施例中,数据复制过程还可以是异步的,此时数据复制与清理事务的提交是不相关的,节点设备的清理事务在提交阶段不会发起历史态数据流复制,节点设备与集群设备之间的流复制是按照第一预设条件所规定的第二预设时长发起的,将两次流复制之间的时间间隔内节点设备上发生修改的历史态数据复制到集群设备,节约了数据复制过程所占用的数据传输资源。
在上述步骤204中,还涉及到对数据复制的完成事项的确认,此时可以分为三种确认级别,分别是确认回放级别、确认接收级别、确认发送级别,下面进行详述:
在确认回放级别中,当节点设备接收到集群设备的复制成功响应后,节点设备才认为一次数据复制任务完成,实现了数据复制过程的强同步。强同步能够保证每次的数据复制是原子的,也即是数据复制的整体过程要么成功,要么失败,不存在中间状态,一旦任一环节出现异常,则认为本次数据复制失败,需要对本次数据复制整体进行重做,保证了数据复制过程的安全性。可选地,该复制成功响应可以是一个“Applied”指令。
在确认接收级别中,当节点设备接收到集群设备的数据接收响应后,节点设备就认为一次数据复制任务完成,实现了数据复制过程的弱同步。弱同步能够保证除了集群设备的元数据回放工作之外,数据复制过程中的其余操作都是原子的,此时如果元数据回放失败,也不会使得本次数据复制整体重做,在兼顾了数据复制效率的同时,在一定程度上保证了数据复制过程的安 全性。可选地,该数据接收响应可以是一个“Received”指令。
在确认发送级别中,当节点设备完成数据发送操作后,节点设备就认为一次数据复制任务完成,此时虽然不能保证数据复制过程是原子的,但节点设备与集群设备之间互不影响,当集群设备产生响应发生宕机等异常情况时,不会阻塞节点设备再次发起数据复制,当集群设备具有的单机设备数量不止一个时,即使一个单机设备故障,对于其余单机设备的数据复制过程仍可以正常进行,保障了数据复制的效率。
205、集群设备从接收缓冲区接收节点设备发送的至少一个历史态数据,该接收缓冲区用于缓存接收的历史态数据。
其中,该接收缓冲区可以是一个在数据复制过程中循环使用的部分,该接收缓冲区可以是接收进程或接收线程执行接收任务(接收节点设备发送的历史态数据)时调用的缓冲区,可选地,由于接收进程或者接收线程的数量可以是一个或多个,因此该接收缓冲区的数量也可以是一个或多个,本申请实施例以一个接收缓冲区为例进行说明,对于其他的接收缓冲区具有类似的接收历史态数据的过程,在此不作赘述。
在一些实施例中,一个接收缓冲区可以对应于一个节点设备,此时上述步骤205即是:集群设备从至少一个接收缓冲区中确定与节点设备对应的接收缓冲区,基于接收进程或接收线程,将该节点设备发送的至少一个历史态数据缓存至该接收缓冲区,使得一个接收缓冲区能够有针对性地接收来自同一节点设备的历史态数据。
当然,该接收缓冲区与节点设备之间也可以不存在对应关系,由集群设备根据接收缓冲区当前可用的存储空间,进行数据接收任务的分配,此时上述步骤205即是:集群设备从至少一个接收缓冲区中确定当前可用的存储空间最大的接收缓冲区,基于接收进程或接收线程,将节点设备发送的至少一个历史态数据缓存至该接收缓冲区,使得集群设备能够将历史态数据添加到当前可用的存储空间最大的接收缓冲区,实现缓存资源的合理利用。
206、集群设备将该接收缓冲区中的该至少一个历史态数据添加到转发缓冲区,通过该转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,该转发缓冲区用于对历史态数据进行数据格式转换。
在上述过程中,接收缓冲区将历史态数据添加到(也即是复制)转发缓冲区的过程可以包括同步复制和异步复制两种方式。
在同步复制的过程中,每当集群设备从接收缓冲区中接收到历史态数据(可能是一个或多个,但属于节点设备一次性发送的),即立刻对将该历史态数据复制到转发缓冲区。
在异步复制的过程中,集群设备从接收缓冲区中接收历史态数据,每间隔第四预设时长,将该接收缓冲区中的所有历史态数据复制到转发缓冲区。 其中,第四预设时长为任一大于或等于0的数值。
在一些实施例中,如果节点设备每间隔第二预设时长执行一次微Checkpoint操作,那么上述步骤206即是:每间隔第二预设时长,集群设备从接收缓冲区中接收节点设备发送的至少一个历史态数据,当然对于任一个节点设备都以此类推,不同的节点设备的第二预设时长可以相同也可以不同。
在一些实施例中,如果TP集群所有的节点设备每间隔第三预设时长执行一次Checkpoint操作,那么上述步骤206即是:每间隔第三预设时长,集群设备从接收缓冲区中接收多个节点设备同时发送的至少一个历史态数据,保证了TP集群不同节点设备之间的数据延时不超过第三预设时长,提升了AP集群存储历史态数据的实时可用性。
在一些实施例中,不管是同步复制还是异步复制,当历史态数据成功复制到转发缓冲区之后,在接收缓冲区中清空本次复制的历史态数据,从而能够及时清理出缓存空间来存储新的历史态数据,从而加快数据传输的速度。
在上述步骤206中,由于节点设备发送的至少一个历史态数据的格式是经过压缩后的数据格式,因此,需要在转发缓冲区中将该至少一个历史态数据还原为原本的符合元组格式的数据,以便于执行下述步骤207,在一些实施例中,该符合元组格式的数据可以是行格式的数据。
207、集群设备将该至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在该节点设备中所在的一个原始数据表。
在上述步骤中,根据业务需求的不同,目标数据表可以包括两种存储格式,因此,集群设备将该至少一个数据项存入目标数据表时也存在两种相应的存储过程,下面进行详述:
在一些实施例中,对以元组为单位的数据项,集群设备可以按照该数据项所在的原始数据表中的存储格式,将该数据项存储在与该原始数据表对应的目标数据表中,从而使得目标数据表与原始数据表的存储格式完全相同,便于在通用的情况下跟踪一个元组的生命周期。
在上述过程中,为了保证原始数据表与目标数据表的格式一致,当任一节点设备与集群设备建立连接后,可用基于逻辑复制技术(例如MySQL的BinLog技术)或者物理复制技术(例如PostgreSQL的基于REDO LOG的复制技术),在集群设备中创建与节点设备中各个原始数据表对应的各个目标数据表,其中,原始数据表用于存储多个元组的当前态数据,与该原始数据表对应的目标数据表用于存储该多个元组的历史态数据。
在上述BinLog(二进制日志,也称逻辑日志)技术中,BinLog用于记录数据库中的操作,在BinLog中以特定的格式来描述数据改动、表结构改动等数据库事务操作,能够记录在BinLog中的事务操作通常是已经完成提交或完成回滚的。下面以MySQL数据库的逻辑复制技术为例进行说明,当节点设 备与集群设备建立连接后,节点设备上可以维护一个或多个Dump-Thread线程(倾倒线程),一个Dump-Thread线程用于与一个集群设备进行对接,在节点设备与集群设备进行逻辑复制时,可以执行下述步骤:
集群设备向节点设备发送已经同步的BinLog的信息(包括数据文件名和数据文件内的位置),节点数据库根据已经同步的BinLog的信息,确定当前已同步的位置;节点设备的Dump-Thread线程将未同步元数据的BinLog数据发送到集群设备,集群设备通过IO-Thread(input/output thread,输入输出线程)接收节点设备同步过来的BinLog数据,将BinLog数据写入Relay-Log(转接日志)所在的文件中,集群设备通过SQL-Thread(SQL线程)从Relay-Log文件中读取BinLog数据,执行BinLog数据解码后得到的SQL语句,从而可以增量的将节点设备的元数据复制到集群设备中。
在一些实施例中,对表示字段变更情况的数据项,集群设备可以按照键值对(key-value)的存储格式,将该数据项存储在与该原始数据表对应的目标数据表中,从而不仅可以保留数据项原本承载的信息,还可以通过键值对的存储格式,定制化地跟踪任一字段的历史态数据的变更情况。
在上述以键值对格式存储的过程中,需要确定目标数据表的键名(key)和键值(value),在一些实施例中,具体可以进行如下操作来确定键名:集群设备将数据项在原始数据表中的键名和该数据项的生成时间中的至少一项,确定为该数据项在该目标数据表中的键名,可选地,当原始数据表存在键名时,可以将原始数据表中的键名以及该数据项的生成时间确定为目标数据表中的键名,从而能够从不同的维度来跟踪历史态数据的变更情况,当然,如果原始数据表不存在键名时,则可以直接将数据项的生成时间确定为目标数据表中的键名,能够直观地记录数据项的生成时间。
在一些实施例中,还可以进行如下操作来确定键值:集群设备将数据项在原始数据表中被修改的字段,确定为该数据项在目标数据表中的键值,其中,被修改的字段类似于一个字符串的格式,每个被修改的字段的存储格式可以为“键名:旧值,新值”,可选地,被修改的字段可以是一个或多个,如果有多个字段同时被修改,则被修改的字段之间可以用分号隔开。
例如,图7是本申请实施例提供的一种原始数据表的结构示意图,参见图7,以一个表示字段变更情况的数据项为例进行说明,在原始数据表中存在有4个键名,分别是服务器号、服务器状态、所属部门和地区。假设在一次事务操作中,对该数据项的“服务器状态”和“所属部门”进行了修改,图8是本申请实施例提供的一种目标数据表的结构示意图,参见图8,在目标数据表中可以直观地看到“服务器状态”和“所属部门”以及操作时间的动态变更情况,由于数据项的“地区”没有被修改,因此在目标数据表中不需要展示“地区”的变更情况,此时,在目标数据表中各个键值的存储格式可以是“服务器状态:提供服务,服务中断;所属部门:部门A,部门B”。
在一些实施例中,集群设备还可以通过存储进程或者存储线程,将转发缓冲区中的数据项通过存储接口(storage interface)上传至分布式文件系统进行持久化存储,以实现历史态数据的无限存储。
以分布式文件系统为Ceph,集群设备的集群数据库为MySQL为例进行说明,在MySQL上可以通过两种方式来挂载Ceph,例如,可以通过挂载CephFS来完成配置,此时假设集群设备中包括一个监管(Monitor)设备(node1)以及两个单机设备(node2和node3),具体可以执行下述步骤:
首先,集群设备创建目录并准备bootstrap keyring文件,可以通过“sudo mkdir-p/var/lib/ceph/mds/ceph-localhost”命令实现,在创建目录后Ceph会在监管设备所在的node1上自动生成bootstrap keyring文件,此时需要将bootstrap keyring文件复制到node2和node3上,可以通过“/var/lib/ceph/bootstrap-osd/ceph.keyring”命令进行复制,需要说明的是,此处是以集群设备中包括2个单机设备为例进行说明,如果集群设备中包括2个以上的单机设备,还需要在其他的单机设备上挂载CephFS时,并将bootstrap keyring文件复制到该单机设备上。
其次,集群设备生成done文件和sysvinit文件,在一些实施例中,集群设备可以通过语句“sudo touch/var/lib/ceph/mds/ceph-mon1/done”生成done文件,以及可以通过语句“sudo touch/var/lib/ceph/mds/ceph-mon1/sysvinit”生成sysvinit文件。
其次,集群设备生成mds的keyring文件,在一些实施例中,集群设备可以通过语句“sudo ceph auth get-or-create mds.mon1osd'allow rwx'mds'allow'mon'allow profile mds'-o/var/lib/ceph/mds/ceph-mon1/keyring”生成keyring文件。
其次,集群设备创建Cephfs的pool,在一些实施例中,集群设备可以通过语句“ceph osd pool create cephfs_data 300”创建Cephfs的pool的数据,通过语句“ceph osd pool create cephfs_metadata 300”创建Cephfs的pool的元数据。
其次,集群设备启动MDS文件(一种镜像文件),在一些实施例中,集群设备可以通过语句“sudo/etc/init.d/ceph start|stop mds.localhost”来启动MDS。
最后,集群设备创建Cephfs和挂载Cephfs,在一些实施例中,集群设备可以通过语句“ceph fs new cephfs cephfs_metadata cephfs_data”来创建Cephfs,当创建完成后,集群设备可以通过语句“mount-t ceph[mon监管设备ip地址]:6789://mnt/mycephfs”来完成Cephfs的挂载。
可选地,集群设备还可以通过挂载Ceph的RBD(一种镜像文件)的方式来完成配置,具体可以执行下述步骤:
首先,集群设备创建RBD的pool,例如可以通过语句“ceph osd pool create  rbd 256”进行创建。
其次,集群设备创建RBD块设备myrbd(也即是申请一个块存储空间),例如可以通过语句“rbd create rbd/myrbd--size 204800-m[mon监管设备ip地址]-k/etc/ceph/ceph.client.admin.keyring”进行创建。
其次,集群设备创建RBD映射,获取设备名称,也即是将RBD映射到监管设备上,例如可以通过语句“sudo rbd map rbd/myrbd --name client.admin -m [mon监管设备ip地址] -k /etc/ceph/ceph.client.admin.keyring”来进行映射,同时获取到映射后的设备名称,需要说明的是,这里以挂载到监管设备为例进行说明,实际上要挂载到哪个单机设备,就将RBD映射到该单机设备上,并获取映射后的设备名称。
最后,集群设备根据获取到的设备名称创建文件系统,挂载RBD,例如可以通过语句“sudo mkfs.xfs /dev/rbd1”进行创建文件系统,通过语句“sudo mount /dev/rbd1 /mnt/myrbd”进行RBD挂载。
需要说明的是,在一些实施例中,不仅集群设备可以通过存储接口接入分布式文件系统,TP集群中的任一节点设备也可以通过存储接口接入分布式文件系统,均可以执行上述类似的挂载方式完成配置,这里不再赘述。
208、集群设备向节点设备发送复制成功响应。
在上述过程中,当集群设备将历史态数据成功存储到目标数据表后,集群设备可以向节点设备发送一个ACK数据(acknowledgement,确认字符),该ACK数据是一种传输类控制字符,用于表示已对节点设备发送的历史态数据复制成功。
209、当节点设备接收到该集群设备发送的复制成功响应时,清空与该复制成功响应所对应的发送缓冲区。
在上述过程中,节点设备接收到复制成功响应后,才允许清空发送缓冲区,保证了节点设备与集群设备之间的强同步,保证了数据复制过程的安全性。
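下面用一段简化的Python代码示意上述复制、确认、清空发送缓冲区的强同步流程,其中网络交互以函数调用代替,各名称均为示意性假设:

```python
class SendBuffer:
    # 节点设备侧发送缓冲区的极简示意
    def __init__(self):
        self.items = []

    def add(self, record):
        self.items.append(record)

    def flush_to_cluster(self, cluster):
        # 将缓冲区中的历史态数据整体复制至集群设备
        response = cluster.replicate(list(self.items))
        if response == "ACK":
            # 仅在收到复制成功响应后才清空发送缓冲区,保证强同步
            self.items.clear()

class Cluster:
    # 集群设备侧的极简示意
    def __init__(self):
        self.stored = []

    def replicate(self, records):
        self.stored.extend(records)   # 对应步骤206-207:还原格式并存入目标数据表
        return "ACK"                  # 对应步骤208:返回复制成功响应

buf, cluster = SendBuffer(), Cluster()
buf.add({"txn_id": 1, "data": "历史态数据1"})
buf.flush_to_cluster(cluster)
print(len(cluster.stored), len(buf.items))  # 1 0
```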
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
本申请实施例提供的方法,当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,以将该事务的历史态数据缓存到数据队列中,将该数据队列中的至少一个历史态数据添加到发送缓冲区,以便基于发送缓冲区执行发送进程或发送线程,当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备,使得节点设备能够每当符合第一预设条件,就将发送缓冲区中的历史态数据复制到集群设备,节点设备不用把原本的历史态数据格式转化为日志格式,集群设备也不必将日志解析为数据原始格式后再进行存储,从而在数据复制时无需对历史态数据进行重做日志的回放,避免了繁琐的回放流程,缩短了重做日志回放过程的时长,提高了数据复制过程的效率。
进一步地,节点设备对历史态数据进行同步复制,保证了历史态数据可以按照事务提交的先后顺序被复制到集群设备,避免执行对历史态数据进行排序的步骤,简化了流复制过程的流程。当然,节点设备还可以将数据队列中的历史态数据异步复制到发送缓冲区,从而批量地将数据队列中的历史态数据添加到发送缓冲区中,避免频繁执行历史态数据的复制操作,进而避免影响节点设备的处理效率,但在异步复制之前需要对历史态数据进行排序,以保证历史态数据有序地被添加到发送缓冲区中,方便了后续集群设备获取最小事务标识。
进一步地,如果符合第一预设条件,发送缓冲区会向集群设备复制历史态数据,复制成功后清空发送缓冲区,此后发送缓冲区循环执行添加历史态数据和发送历史态数据的过程,能够源源不断地将节点设备的历史态数据复制到集群设备中,避免对历史态数据进行重做日志的回放,提高了数据复制过程的效率。
进一步地,当发送缓冲区的数量为多个时,节点设备可以将数据队列中来自于同一个原始数据表的历史态数据均匀地添加到多个发送缓冲区中,从而提高多个发送缓冲区的利用率,并且提升对该原始数据表中历史态数据的发送速率。
进一步地,第一预设条件为节点设备检测到发送缓冲区中增加任一历史态数据时,能够实现数据复制的同步复制,保障了历史态数据复制过程的实时性。第一预设条件为节点设备检测到发送缓冲区的已用数据量占该发送缓冲区的容量的比例达到比例阈值时,能够有效地将发送缓冲区的已用数据量占发送缓冲区容量的比例控制在比例阈值内,提升了数据复制过程的效率。第一预设条件为当前时刻距离该发送缓冲区上一次向集群设备复制历史态数据的时刻达到第二预设时长时,能够控制两次数据复制之间的最大时间间隔,保障了历史态数据复制过程的实时性。第一预设条件为当前时刻距离该发送缓冲区上一次向集群设备复制历史态数据的时刻达到第三预设时长时,由于第三预设时长是TP集群各个节点设备所具有的相同的预设时长,从而能够控制TP集群不同节点设备数据复制过程的延时。
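针对上述几种第一预设条件,下面给出一个判断是否触发一次复制的Python示意函数,其中的比例阈值与各预设时长均为假设的示例值:

```python
import time

def meets_first_condition(mode, *, new_data_arrived=False, buffer_used=0,
                          buffer_capacity=1, last_copy_time=0.0,
                          ratio_threshold=0.8, second_preset=1.0,
                          third_preset=5.0, now=None):
    # 按所配置的某一种第一预设条件判断是否触发复制(示意性实现)
    now = time.time() if now is None else now
    if mode == "on_new_data":      # 检测到发送缓冲区中增加任一历史态数据
        return new_data_arrived
    if mode == "ratio":            # 已用数据量占缓冲区容量的比例达到比例阈值
        return buffer_used / buffer_capacity >= ratio_threshold
    if mode == "second_preset":    # 距上一次复制达到第二预设时长(微Checkpoint)
        return now - last_copy_time >= second_preset
    if mode == "third_preset":     # 距上一次复制达到第三预设时长(各节点相同,对应Checkpoint)
        return now - last_copy_time >= third_preset
    raise ValueError(mode)

print(meets_first_condition("ratio", buffer_used=85, buffer_capacity=100))  # True
```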
进一步地,集群设备从接收缓冲区接收节点设备发送的至少一个历史态数据后,可以将该接收缓冲区中的该至少一个历史态数据添加到转发缓冲区,通过该转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,从而对压缩后的历史态数据的格式进行还原,由于直接得到了保留有原本格式的历史态数据,因此可以避免通过对日志解析来获得历史态数据,将该至少一个数据项存储到集群数据库的至少一个目标数据表中,能够实现对历史态数据的妥善保存。
进一步地,根据业务需求的不同,集群设备的目标数据表中可以支持两种存储格式,对以元组为单位的数据项,集群设备可以按照原始数据表中的存储格式进行存储,便于在通用的情况下跟踪一个元组的生命周期。对表示字段变更情况的数据项,集群设备可以按照键值对的存储格式进行存储,如此,不仅可以保留数据项原本承载的信息,还可以定制化地跟踪任一字段的历史态数据的变更情况。
进一步地,在以键值对格式存储的过程中,集群设备可以将数据项在原始数据表中的键名和该数据项的生成时间中的至少一项,确定为该数据项在该目标数据表中的键名,以从不同的维度跟踪历史态数据的变更情况,直观地记录数据项的生成时间,进一步地,集群设备将数据项在原始数据表中被修改的字段,确定为该数据项在目标数据表中的键值,能够直观地查看被修改的字段,跟踪任一字段的历史态数据的变更情况。
上述实施例提供了一种数据复制方法,当符合第一预设条件时,节点设备能够基于流复制技术,将历史态数据复制到集群设备上,提高了历史态数据的安全性,集群设备妥善存储历史态数据之后,还可以对外提供历史态数据的查询或者分析等服务。
在上述实施例中,已经提到过一个事务可以包括一个或多个子事务,而不同的子事务可以对应于不同的节点设备,虽然节点设备可以每间隔第二预设时长执行一次数据复制过程,但不同的节点设备的起始时间点可以不同,导致节点设备之间的数据复制有可能是异步的。因此,在一些场景中,对于同一个已经提交的事务而言,由于该事务的一个或多个子事务所对应的节点设备在数据复制过程中不同步,因此,可能导致有的节点设备已经将子事务对应的历史态数据复制至集群设备,而有的节点设备可能还没有将子事务对应的历史态数据复制到集群设备的情况发生,进而导致集群设备不能够完整地读取同一个事务影响的所有历史态数据,在AP集群进行数据读取时会出现“不一致性”的问题。
为解决集群设备读取的“不一致性”的问题,本申请还提供了一种数据查询方法,图9是本申请实施例提供的一种数据查询过程的流程图,参见图9,在集群设备上读取历史态数据的步骤如下:
901、集群设备按照事务提交时间戳从小到大的顺序,对至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对该多个历史态数据进行排序,得到目标数据序列。
其中,上述排序过程是指按照事务标识赋值的从小到大的顺序进行排序,若一个事务的提交时刻在另一个事务之前,则该事务的事务标识的赋值小于另一个事务的事务标识的赋值,在不同事务的事务标识中,一个事务的提交时刻越晚,该事务的事务标识赋值越大,因此,事务标识的赋值实际上是按照提交时刻的时间戳递增的。上述步骤901中的排序过程与上述步骤203类似,这里不做赘述。
在上述过程中,虽然每个节点设备的至少一个发送缓冲区在数据发送前都进行过排序,但是由于集群设备中可以设置有至少一个接收缓冲区,因此,虽然每一个接收缓冲区中接收到的历史态数据是有序的(可以视为一种分段有序的情况),但无法保证所有的接收缓冲区的历史态数据综合起来是有序的,因此集群设备需要执行上述步骤901,对各个接收缓冲区接收到的至少一个历史态数据进行排序,其中,该至少一个历史态数据是多个节点设备发送的历史态数据。
在上述过程中,由于TP集群定时执行一次Checkpoint操作,因此每当集群设备从至少一个接收缓冲区中接收Checkpoint操作发送的至少一个历史态数据时,可以对所接收的至少一个历史态数据进行排序,得到按照事务提交时间戳有序且按照事务标识有序的目标数据序列,此时为了保证读取的一致性,执行下述步骤902-903。
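由于每个接收缓冲区内部的历史态数据是分段有序的,集群设备可以用归并的方式得到整体有序的目标数据序列,下面是一段基于heapq.merge的Python示意代码,其中的字段与数值均为示意性假设:

```python
import heapq

# 两个接收缓冲区,各自内部已按(事务提交时间戳, 事务标识)有序
buffer_1 = [(100, 1, "r1的历史态数据"), (103, 4, "r2的历史态数据")]
buffer_2 = [(100, 2, "r3的历史态数据"), (102, 3, "r1的历史态数据")]

# 归并得到整体有序的目标数据序列:先按事务提交时间戳,相同时再按事务标识
target_sequence = list(heapq.merge(buffer_1, buffer_2,
                                   key=lambda x: (x[0], x[1])))
for commit_ts, txn_id, data in target_sequence:
    print(commit_ts, txn_id, data)
# 依次输出事务1、2、3、4对应的历史态数据
```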
902、集群设备遍历该目标数据序列,对每个历史态数据的位图编码执行按位与操作,确定输出为真的历史态数据对应的事务符合第二预设条件。
在上述步骤204中已经提到过,任一节点设备向集群设备发送历史态数据时,由于一个事务的一个或多个子事务对应于一个或多个节点设备,为了记录与该事务相关的节点设备(也即是子事务对应的节点设备),通常可以采用位图编码或者字典压缩等方式对该一个或多个节点设备的节点标识进行编码,从而对历史态数据的长度进行压缩,减少数据传输所占用的资源。
其中,该至少一个事务为符合第二预设条件的事务,该第二预设条件用于表示事务的所有子事务对应的数据项均已经存储在集群数据库中。
在上述步骤902中,集群设备从该目标数据序列中获取符合该第二预设条件的至少一个事务,获取至少一个事务的方式是由节点设备对历史态数据的压缩方式而决定的。
上述过程给出的是当节点设备采用位图编码进行数据压缩时,确定符合第二预设条件的至少一个事务的方法,也即是对目标数据序列中每个历史态数据进行按位与操作,如果所有的bit位都为1(真),那么表示该历史态数据对应的事务符合第二预设条件,因为该事务的所有子事务对应的数据项都已经被存储至集群数据库中,此时可以将该至少一个事务称为“备选一致性点”。
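下面给出一段判断备选一致性点的Python示意代码,这里假设每个事务对应两个等长的位图,一个标记事务涉及的节点,另一个标记数据项已入库的节点,位图的具体编码方式以实际实现为准:

```python
def is_candidate_consistency_point(involved_bitmap, arrived_bitmap):
    # 事务涉及的每个节点对应的bit位在"已入库"位图中均为1时,该事务是一个备选一致性点
    return (involved_bitmap & arrived_bitmap) == involved_bitmap

# 假设共有4个节点设备,事务涉及节点0和节点2(自低位起)
involved = 0b0101
print(is_candidate_consistency_point(involved, 0b0101))  # True,两个节点的数据项均已存储
print(is_candidate_consistency_point(involved, 0b0001))  # False,节点2的数据项尚未存储
```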
在一些实施例中,如果节点设备采用字典压缩的方式进行数据压缩时,上述步骤902还可以采用下述方式进行替换:集群设备遍历该目标数据序列,对每个历史态数据的压缩字典进行解码,得到与每个历史态数据对应的全局事务标识,当确定该全局事务标识对应的子事务的数据项均已经存储在该集群数据库中时,确定该全局事务标识对应的事务符合该第二预设条件,从而对于字典压缩的情况也能够确定出备选一致性点,并通过下述步骤903可以从备选一致性点中找到“完备的最小的事务ID”。
其中,如果一个事务包括多个子事务,那么该事务可以称为一个“全局事务”,一个全局事务意味着这个事务中涉及到的多个子事务对应于多个节点设备,那么对任一全局事务而言可以包括两种类型的事务标识,分别是全局事务标识和局部事务标识,全局事务标识用于表示整个TP集群中所有全局事务中的唯一标识信息,而局部事务标识则用于表示在各自的节点设备中所有事务中的唯一标识信息,对于一个全局事务而言,所有的子事务具有相同的全局事务标识,并且各个子事务还具有各自的局部事务标识。
基于上述情况,确定全局事务标识对应的子事务的数据项均已存储在该集群数据库中的过程可以是如下:集群设备根据该全局事务标识,获取在该集群数据库中已存储的且具有该全局事务标识的数据项,当获取到的该数据项以及解码得到的该历史态数据与事务的所有子事务对应时,确定该全局事务标识对应的子事务的数据项均已经存储在该集群数据库中。
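对于字典压缩的情况,下面给出一段根据全局事务标识判断其所有子事务的数据项是否均已入库的Python示意代码,其中的数据结构均为示意性假设:

```python
def all_sub_transactions_stored(global_txn_id, involved_nodes, stored_items):
    # 集群数据库中已存储的、具有该全局事务标识的数据项覆盖了事务涉及的所有节点时,
    # 判定该全局事务标识对应的子事务的数据项均已存储(示意性实现)
    stored_nodes = {item["node_id"] for item in stored_items
                    if item["global_txn_id"] == global_txn_id}
    return involved_nodes <= stored_nodes   # 集合包含关系

stored_items = [
    {"global_txn_id": "G1", "node_id": "node1"},
    {"global_txn_id": "G1", "node_id": "node2"},
]
print(all_sub_transactions_stored("G1", {"node1", "node2"}, stored_items))           # True
print(all_sub_transactions_stored("G1", {"node1", "node2", "node3"}, stored_items))  # False
```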
903、集群设备将该至少一个事务中排序最靠前的事务对应的事务标识确定为最小事务标识。
在上述过程中,由于集群设备已经在步骤901中对各个历史态数据按照事务标识从小到大的顺序进行排序,因此可以直接获取至少一个事务中排序最靠前的事务对应的事务标识,也就获取到了至少一个事务的事务标识中的最小事务标识,在节点设备中事务标识是按照时间戳递增的,因此获取到了最小事务标识,也就意味着得到了本次Checkpoint操作接收到的历史态数据中,最完备(指符合第二预设条件)并且时间戳最小的事务,该最小事务标识可以称作“完备的最小的事务ID”,对于事务ID小于该最小事务标识的数据项,可以视为一个“微一致性点”。
在上述步骤901-903中,集群设备在至少一个历史态数据的事务标识中,确定了符合第二预设条件的最小事务标识,该第二预设条件用于表示事务的所有子事务对应的数据项均已经存储在集群数据库中,从而找到了本次Checkpoint操作中的完备的最小的事务ID,在一些实施例中,如果在本次Checkpoint操作中不能找到一个比上一次Checkpoint操作中确定的最小事务标识更大的新一轮的最小事务标识,那么将暂时不对最小事务标识进行更新,而是在TP集群的下一次Checkpoint操作中执行上述步骤901-903所执行的操作,直到确定新的最小事务标识之后再执行下述步骤904,从而可以保证在TP集群中新事务不断提交的过程中,不断产生事务标识更大的历史态数据,这些历史态数据通过Checkpoint操作被转储到AP集群中,同时AP集群可以不断地对最小事务标识的数值进行更新,使得完备的最小的事务ID取值越来越大,类似于一个向前滚动的过程,保障了AP集群提供数据查询服务的实时性。
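结合步骤901-903,下面给出一段在一次Checkpoint操作中确定完备的最小的事务ID、并且仅在取值变大时才更新的Python示意代码,数据结构与函数名均为示意性假设:

```python
def update_min_txn_id(target_sequence, is_complete, current_min):
    # target_sequence已按(事务提交时间戳, 事务标识)从小到大排序;
    # 取其中排序最靠前且符合第二预设条件的事务的事务标识作为候选,
    # 仅当候选值大于当前的最小事务标识时才更新(示意性实现)
    for _commit_ts, txn_id, _data in target_sequence:
        if is_complete(txn_id):
            if current_min is None or txn_id > current_min:
                return txn_id
            break
    return current_min   # 本轮未找到更大的取值,等待下一次Checkpoint操作再尝试更新

sequence = [(100, 3, "..."), (101, 5, "..."), (102, 7, "...")]
print(update_min_txn_id(sequence, is_complete=lambda t: t >= 5, current_min=3))  # 5
```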
904、集群设备根据该最小事务标识,确定可见数据项,基于该可见数据项,提供数据查询服务,其中,该可见数据项的事务标识小于或等于该最小事务标识。
在上述步骤904中,集群设备可以基于MVCC技术的元组可见性判断算法,使得事务标识小于或等于该最小事务标识的数据项是对外可见的,从而保障了在微Checkpoint操作机制下AP集群的读取一致性。
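基于所确定的最小事务标识,可见性判断可以简化为如下Python示意代码,实际的MVCC元组可见性判断会结合更多的版本信息,此处仅示意与最小事务标识的比较:

```python
def visible_items(data_items, min_txn_id):
    # 事务标识小于或等于最小事务标识的数据项对外可见(示意性实现)
    return [item for item in data_items if item["txn_id"] <= min_txn_id]

items = [{"txn_id": 4, "data": "a"}, {"txn_id": 6, "data": "b"}]
print(visible_items(items, min_txn_id=5))  # [{'txn_id': 4, 'data': 'a'}]
```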
在一些实施例中,当集群设备基于可见数据项提供数据查询服务时,可以实现全时态数据的任何读操作的读取一致性,因为读取一致性在本质上可以认为是基于历史态数据所构建的事务一致性,因此,实现了读取一致性也就确保了从AP集群读取到的任何时间点的历史态数据都是处于一个事务一致性点。
例如,图10是本申请实施例提供的一种事务一致性点的原理性示意图,参见图10,假设集群数据库中存在三个数据项,分别为r1、r2、r3(三者可分布在AP集群中不同的单机设备上)。初始的数据状态用白色圆圈表示,此时r1、r2、r3处于一个一致性状态(用实线表示),当新的事务发生时,会致使数据的版本发生改变,如T1事务在t1时刻提交,修改了数据项r1,生成一个r1的新版本,在图中用黑色圆圈表示;后来,事务T2在t2时刻提交,修改了数据项r2和r3,产生r2和r3的新版本,在图中用斜线圆圈表示;后来,T3事务在t3时刻提交,修改了数据项r1和r3,产生r1和r3的新版本,在图中用网格圆圈表示;后来,T4事务在t4时刻提交,修改了数据项r2,产生r2的新版本,在图中用网点圆圈表示。经过T1~T4这一系列事务的操作,在全时态数据的维度上进行观察,产生了图中所示的实线、长划线、短划线、点划线、点线共5个一致性状态,每一条线段均可以代表一个一致性状态。那么,如果需要查询t1.5、t3.5等历史时刻的历史态数据,也即是可以通过图中曲线所示的数据版本所处的符合一致性状态的历史态数据(都满足了事务一致性)提供数据查询服务。
在一些实施例中,用户可以基于图1中SR层提供的查询语句、查询操作的语义和元数据,路由查询到TP集群或AP集群内存储的任一数据,当然,TP集群主要提供对当前态数据的查询服务,AP集群则主要提供对历史态数据的查询服务。
可选地,当TP集群提供对当前态(或过渡态)数据的查询服务时,可以基于分布式并发访问控制算法来保证当前态(或过渡态)数据的事务一致性,例如,该分布式并发访问控制算法可以是基于封锁技术的并发访问控制算法、基于OCC(optimistic concurrency control,乐观并发控制)技术的并发访问控制算法、基于TO(time ordering,时间序列)技术的并发访问控制算法、基于MVCC技术的并发访问控制算法等,本申请实施例不对分布式并发访问控制算法的类型进行具体限定。
可选地,当AP集群提供对历史态数据的查询服务时,可以基于上述事务一致性的基础来读取满足一致性条件的历史态数据。
在一些实施例中,HTAC架构整体还可以提供混合查询的服务,也即是一个查询操作同时用于查询元组的当前态数据和历史态数据,该查询操作通常是指定一个历史的时间点,从该时间点起一直读取元组的历史态数据,直到查询到当前时刻的当前态数据。
例如,可以基于下述语句实现混合查询:
SELECT
[ALL | DISTINCT | DISTINCTROW]
[HIGH_PRIORITY]
[STRAIGHT_JOIN]
[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
[SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
select_expr [, select_expr ...]
[FROM table_references
[PARTITION partition_list]
[WHERE where_condition]
[GROUP BY {col_name | expr | position}
[ASC | DESC], ... [WITH ROLLUP]]
[HAVING where_condition]
[ORDER BY {col_name | expr | position}
[ASC | DESC], ...]
[LIMIT {[offset,] row_count | row_count OFFSET offset}]
[PROCEDURE procedure_name(argument_list)]
[INTO OUTFILE 'file_name'
[CHARACTER SET charset_name]
export_options
| INTO DUMPFILE 'file_name'
| INTO var_name [, var_name]]
[FOR UPDATE | LOCK IN SHARE MODE]]
在上述语句中,table_references的格式可以为如下格式:tbl_name [[AS] alias] [index_hint] [SNAPSHOT START snapshot_name [TO snapshot_name2] [WITH type]]。
其中,SNAPSHOT为事务快照(不同于数据块的数据快照),可以简称为快照,“[SNAPSHOT [START snapshot_name] [TO snapshot_name2] [WITH type]]”表示为一个“tbl_name”对象指定一个快照区间,是在DQL(data query language,数据查询语言)的基础上新增加的内容,当语句包括所有的子句(SNAPSHOT、START、TO)时,表示“快照差读”,也即是从一个快照开始读取直到读取到另一个快照。
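按照上述扩展语法,一条“快照差读”查询语句可以形如下面Python示意代码所拼接的字符串,其中的表名t1与快照名s1、s2均为假设的示例,具体语法以实际实现为准:

```python
# 按照上述扩展语法拼接一条"快照差读"查询语句(表名、快照名均为假设示例)
table, snap_start, snap_end = "t1", "s1", "s2"
query = f"SELECT * FROM {table} SNAPSHOT START {snap_start} TO {snap_end}"
print(query)  # SELECT * FROM t1 SNAPSHOT START s1 TO s2
```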
本申请实施例提供的数据查询过程,保证了HTAC架构下整体的读取一致性,既保证了TP集群的读取一致性,也保证了AP集群的读取一致性,在AP集群内部,每当接收一次Checkpoint操作所作用的历史态数据,均尝试获取一个新的最小事务标识(完备的最小的事务ID),也即是尝试更新最小事务标识的取值,基于MVCC技术的元组可见性判断算法,使得事务标识小于最小事务标识的事务所对应数据项可见,保证了AP集群存储的历史态数据在事务层面的事务一致性,而当HTAC还支持外部一致性(包括线性一致性、因果一致性等)时,外部一致性和事务一致性整体可以视为全局一致性,使得基于HTAC架构发起的任一项读操作能够满足全局一致性,虽然由于Checkpoint操作会造成一定的数据延时,但仍然可以认为AP集群近似实时地满足了分析类业务对于数据正确性和实时性的查询需求以及计算需求。
上述实施例提供了一种基于数据复制方法后执行数据查询的过程,当符合第一预设条件时,节点设备能够基于流复制技术,将历史态数据复制到集群设备上,使得集群设备能够提供历史态数据的查询、分析等服务,提高了历史态数据的安全性和可用性。
在一些实施例中,如果对TP集群中每一个节点设备遍历执行一次Checkpoint操作,会导致TP集群向AP集群进行数据复制时消耗的时长大量增加,还会导致HTAC的性能出现颠簸,影响了HTAC的稳定性和鲁棒性,因此引入了微Checkpoint操作。
图11是本申请实施例提供的一种数据系统的交互流程图,参见图11,该数据系统包括AP集群的集群设备和TP集群的多个节点设备,下面将对TP集群向AP集群执行微Checkpoint操作和Checkpoint操作的过程进行详述:
1101、每间隔第二预设时长,对该多个节点设备中的任一节点设备,将该节点设备的至少一个历史态数据复制至该集群设备。
在上述步骤1101中,TP集群中的每个节点设备都每间隔第二预设时长执行一次微Checkpoint操作,将该节点设备上的至少一个历史态数据复制到集群设备。
其中,该第二预设时长与上述步骤202中相同,该微Checkpoint操作在上述步骤204中已经作出详述,数据复制的过程与上述步骤201-209类似,这里不做赘述。
1102、每间隔第三预设时长,该多个节点设备将自身的至少一个历史态数据同时复制至该集群设备,该第三预设时长大于该第二预设时长。
其中,该第三预设时长可以是大于第二预设时长的任一数值。可选地,第二预设时长与微Checkpoint的操作频度相对应,第三预设时长与Checkpoint的操作频度相对应。
在上述步骤1102中,TP集群每间隔第三预设时长,遍历TP集群的每一个节点设备,执行一次Checkpoint操作,将TP集群中所有节点设备的至少一个历史态数据复制到集群设备,数据复制的过程与上述步骤201-209类似,这里不做赘述。
1103、每间隔该第三预设时长,该集群设备在该多个节点设备发送的所有历史态数据的事务标识中,确定符合第二预设条件的最小事务标识,该第二预设条件用于表示事务的所有子事务所对应的数据项均已经存储在集群数据库中;根据该最小事务标识,确定可见数据项,基于该可见数据项,提供数据查询服务,其中,该可见数据项的事务标识小于或者等于该最小事务标识。
上述步骤1103与上述步骤901-904类似,这里不做赘述。
本申请实施例提供的数据系统,通过在TP集群与AP集群之间的交互过程,在系统层面体现了TP集群中各个节点设备各自分别每间隔第二预设时长执行微Checkpoint操作,而TP集群整体所有节点设备则每间隔第三预设时长执行一次Checkpoint操作,从而既能满足AP集群对历史态数据的实时更新需求,保证了AP集群的实时可用性,也通过微Checkpoint操作减少了数据复制过程中所耗费的遍历确认时长,提高了数据复制的效率。
图12是本申请实施例提供的一种数据复制装置的结构示意图,参见图12,该装置包括添加模块1201和复制模块1202,下面进行详述:
添加模块1201,用于当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,该数据队列用于缓存历史态数据;
该添加模块1201,还用于将该数据队列中的至少一个历史态数据添加到发送缓冲区,该发送缓冲区用于缓存待复制的历史态数据;
复制模块1202,用于当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备。
本申请实施例提供的装置,当检测到事务的提交操作时,将该事务的历史态数据添加到数据队列中,以将该事务的历史态数据缓存到数据队列中,将该数据队列中的至少一个历史态数据添加到发送缓冲区,以便基于发送缓冲区执行发送进程或发送线程,当符合第一预设条件时,将该发送缓冲区中的该至少一个历史态数据复制至集群设备,使得节点设备能够每当符合第一预设条件时,就将发送缓冲区中的历史态数据复制到集群设备,节点设备不用把原本的历史态数据格式转化为日志格式,集群设备也不必将日志解析为数据原始格式后再进行存储,从而在数据复制时无需对历史态数据进行重做日志的回放,避免了繁琐的回放流程,缩短了重做日志回放过程的时长,提高了数据复制过程的效率。
在一种可能实施方式中,该添加模块1201用于:
当检测到该数据队列中增加历史态数据时,将该历史态数据添加到该发送缓冲区;
该复制模块1202用于:
当检测到该发送缓冲区中增加历史态数据时,将该发送缓冲区中的该至少一个历史态数据复制至该集群设备。
在一种可能实施方式中,该添加模块1201用于:
每间隔第一预设时长,获取该数据队列中在当前时刻之前的该第一预设时长内增加的至少一个历史态数据;
按照事务提交时间戳从小到大的顺序,对该至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对该多个历史态数据进行排序,得到至少一个有序排列的历史态数据,将该至少一个有序排列的历史态数据添加到该发送缓冲区。
在一种可能实施方式中,该第一预设条件为检测到该发送缓冲区中增加任一历史态数据;或,
该第一预设条件为当检测到该发送缓冲区的已用数据量占该发送缓冲区的容量的比例达到比例阈值;或,
该第一预设条件为当前时刻距离该发送缓冲区上一次向该集群设备复制历史态数据的时刻达到第二预设时长;或,
该第一预设条件为当前时刻距离该发送缓冲区上一次向该集群设备复制历史态数据的时刻达到第三预设时长,该第三预设时长为针对多个节点设备中每一个节点设备配置的相同的预设时长,该第三预设时长大于该第二预设时长。
在一种可能实施方式中,基于图12的装置组成,该装置还包括:
清空模块,用于当接收到该集群设备发送的复制成功响应时,清空与该复制成功响应对应的发送缓冲区。
在一种可能实施方式中,该添加模块1201还用于:
将该数据队列中来自于同一个原始数据表的历史态数据均匀地添加到多个发送缓冲区中。
需要说明的是:上述实施例提供的数据复制装置在复制数据时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将节点设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的数据复制装置与数据复制方法实施例属于同一构思,其具体实现过程详见数据复制方法实施例,这里不再赘述。
图13是本申请实施例提供的一种数据复制装置的结构示意图,参见图13,该装置包括接收模块1301、添加模块1302和存储模块1303,下面进行详述:
接收模块1301,用于从接收缓冲区接收节点设备发送的至少一个历史态数据,该接收缓冲区用于缓存接收的历史态数据;
添加模块1302,用于将该接收缓冲区中的该至少一个历史态数据添加到转发缓冲区,通过该转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,该转发缓冲区用于对历史态数据进行数据格式转换;
存储模块1303,用于将该至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在该节点设备中所在的一个原始数据表。
本申请实施例提供的装置,从接收缓冲区接收节点设备发送的至少一个历史态数据后,将该接收缓冲区中的至少一个历史态数据添加到转发缓冲区,通过该转发缓冲区,将该至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,从而能够对压缩后的历史态数据的格式进行还原,由于直接得到了保留有原本格式的历史态数据,因此,可以避免执行解析日志获得历史态数据的操作,进而将该至少一个数据项存储到集群数据库的至少一个目标数据表中,实现对历史态数据的妥善保存。
在一种可能实施方式中,基于图13的装置组成,该存储模块1303包括:
第一存储单元,用于对以元组为单位的数据项,按照该数据项所在的原始数据表中的存储格式,将该数据项存储在与该原始数据表对应的目标数据表中;或,
第二存储单元,用于对表示字段变更情况的数据项,按照键值对的存储格式,将该数据项存储在与该原始数据表对应的目标数据表中。
在一种可能实施方式中,该第二存储单元用于:
将该数据项在该原始数据表中的键名和该数据项的生成时间中的至少一项,确定为该数据项在该目标数据表中的键名;
将该数据项在该原始数据表中被修改的字段,确定为该数据项在该目标数据表中的键值。
在一种可能实施方式中,基于图13的装置组成,该装置还包括:
确定模块,用于在该至少一个历史态数据的事务标识中,确定符合第二预设条件的最小事务标识,该第二预设条件用于表示事务的所有子事务对应的数据项均已经存储在该集群数据库中;
查询模块,用于根据该最小事务标识,确定可见数据项,基于该可见数据项,提供数据查询服务,其中,该可见数据项的事务标识小于或者等于该最小事务标识。
在一种可能实施方式中,基于图13的装置组成,该确定模块包括:
排序单元,用于按照事务提交时间戳从小到大的顺序,对该至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对该多个历史态数据进行排序,得到目标数据序列;
获取单元,用于从该目标数据序列中获取符合该第二预设条件的至少一个事务;
确定单元,用于将该至少一个事务中排序最靠前的事务的事务标识确定为该最小事务标识。
在一种可能实施方式中,该获取单元包括:
遍历确定子单元,用于遍历该目标数据序列,对每个历史态数据的位图编码执行按位与操作,确定输出为真的历史态数据对应的事务符合第二预设条件;或,
遍历确定子单元,还用于遍历该目标数据序列,对每个历史态数据的压缩字典进行解码,得到与每个历史态数据对应的全局事务标识;当确定该全局事务标识对应的子事务的数据项均已经存储在该集群数据库中时,确定该全局事务标识对应的事务符合该第二预设条件。
在一种可能实施方式中,遍历确定子单元还用于:
根据该全局事务标识,获取在该集群数据库中已存储的且具有该全局事务标识的数据项,当获取到的该数据项以及解码得到的该历史态数据与事务的所有子事务对应时,确定该全局事务标识对应的子事务的数据项均已经存储在该集群数据库中。
在一种可能实施方式中,该接收模块1301用于:
每间隔第二预设时长,从该接收缓冲区中接收任一节点设备发送的至少一个历史态数据;或,
每间隔第三预设时长,从该接收缓冲区中接收多个节点设备同时发送的至少一个历史态数据。
需要说明的是:上述实施例提供的数据复制装置在复制数据时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将集群设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的数据复制装置与数据复制方法实施例属于同一构思,其具体实现过程详见数据复制方法实施例,这里不再赘述。
图14是本申请实施例提供的计算机设备的结构示意图,该计算机设备1400可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(central processing units,CPU)1401和一个或一个以上的存储器1402,其中,该存储器1402中存储有至少一条指令,该至少一条指令由该处理器1401加载并执行以实现上述各个数据复制方法实施例提供的数据复制方法。当然,该计算机设备还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该计算机设备还可以包括其他用于实现设备功能的部件,在此不做赘述。
在示例性实施例中,还提供了一种计算机可读存储介质,例如包括至少一条指令的存储器,上述至少一条指令可由终端中的处理器执行以完成上述实施例中数据复制方法。例如,该计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,该程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (19)

  1. 一种数据复制方法,由计算机设备执行,所述方法包括:
    当检测到事务的提交操作时,将所述事务的历史态数据添加到数据队列中,所述数据队列用于缓存历史态数据;
    将所述数据队列中的至少一个历史态数据添加到发送缓冲区,所述发送缓冲区用于缓存待复制的历史态数据;
    当符合第一预设条件时,将所述发送缓冲区中的所述至少一个历史态数据复制至集群设备。
  2. 根据权利要求1所述的方法,所述将所述数据队列中的至少一个历史态数据添加到发送缓冲区,包括:
    当检测到所述数据队列中增加历史态数据时,将所述历史态数据添加到所述发送缓冲区;
    所述当符合第一预设条件时,将所述发送缓冲区中的所述至少一个历史态数据复制至集群设备,包括:
    当检测到所述发送缓冲区中增加历史态数据时,将所述发送缓冲区中的所述至少一个历史态数据复制至所述集群设备。
  3. 根据权利要求1所述的方法,所述将所述数据队列中的至少一个历史态数据添加到发送缓冲区,包括:
    每间隔第一预设时长,获取所述数据队列中在当前时刻之前的所述第一预设时长内增加的至少一个历史态数据;
    按照事务提交时间戳从小到大的顺序,对所述至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对所述多个历史态数据进行排序,得到至少一个有序排列的历史态数据,将所述至少一个有序排列的历史态数据添加到所述发送缓冲区。
  4. 根据权利要求1至3中任一项所述的方法,所述第一预设条件包括以下任一种或多种:
    检测到所述发送缓冲区中增加任一历史态数据;
    检测到所述发送缓冲区的已用数据量占所述发送缓冲区的容量的比例达到比例阈值;
    当前时刻距离所述发送缓冲区上一次向所述集群设备复制历史态数据的时刻达到第二预设时长;
    当前时刻距离所述发送缓冲区上一次向所述集群设备复制历史态数据的时刻达到第三预设时长,所述第三预设时长为针对多个节点设备中每一个节点设备配置的相同的预设时长,所述第三预设时长大于所述第二预设时长。
  5. 根据权利要求1所述的方法,所述方法还包括:
    当接收到所述集群设备发送的复制成功响应时,清空与所述复制成功响应对应的发送缓冲区。
  6. 根据权利要求1所述的方法,当所述发送缓冲区的数量为多个时,所述方法还包括:
    将所述数据队列中来自于同一个原始数据表的历史态数据均匀地添加到多个所述发送缓冲区中。
  7. 一种数据复制方法,由计算机设备执行,所述方法包括:
    从接收缓冲区接收节点设备发送的至少一个历史态数据,所述接收缓冲区用于缓存接收的历史态数据;
    将所述接收缓冲区中的所述至少一个历史态数据添加到转发缓冲区,通过所述转发缓冲区,将所述至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,所述转发缓冲区用于对历史态数据进行数据格式转换;
    将所述至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在所述节点设备中所在的一个原始数据表。
  8. 根据权利要求7所述的方法,所述将所述至少一个数据项存储到集群数据库的至少一个目标数据表中,包括:
    对以元组为单位的数据项,按照所述数据项所在的原始数据表中的存储格式,将所述数据项存储在与所述原始数据表对应的目标数据表中;
    或,对表示字段变更情况的数据项,按照键值对的存储格式,将所述数据项存储在与所述原始数据表对应的目标数据表中。
  9. 根据权利要求8所述的方法,所述按照键值对的存储格式,将所述数据项存储在与所述原始数据表所对应的目标数据表中,包括:
    将所述数据项在所述原始数据表中的键名和所述数据项的生成时间中的至少一项,确定为所述数据项在所述目标数据表中的键名;
    将所述数据项在所述原始数据表中被修改的字段,确定为所述数据项在所述目标数据表中的键值。
  10. 根据权利要求7所述的方法,所述方法还包括:
    在所述至少一个历史态数据的事务标识中,确定符合第二预设条件的最小事务标识,所述第二预设条件用于表示事务的所有子事务对应的数据项均已经存储在所述集群数据库中;
    根据所述最小事务标识,确定可见数据项,所述可见数据项的事务标识小于或者等于所述最小事务标识;
    基于所述可见数据项,提供数据查询服务。
  11. 根据权利要求10所述的方法,所述在所述至少一个历史态数据的事务标识中,确定符合第二预设条件的最小事务标识,包括:
    按照事务提交时间戳从小到大的顺序,对所述至少一个历史态数据进行排序,当存在事务提交时间戳相同的多个历史态数据时,按照事务标识从小到大的顺序对所述多个历史态数据进行排序,得到目标数据序列;
    从所述目标数据序列中获取符合所述第二预设条件的至少一个事务;
    将所述至少一个事务中排序最靠前的事务的事务标识确定为所述最小事务标识。
  12. 根据权利要求11所述的方法,所述从所述目标数据序列中获取符合所述第二预设条件的至少一个事务,包括:
    遍历所述目标数据序列,对每个历史态数据的位图编码执行按位与操作,确定输出为真的历史态数据对应的事务符合所述第二预设条件;或,
    遍历所述目标数据序列,对每个历史态数据的压缩字典进行解码,得到与每个历史态数据对应的全局事务标识;当确定所述全局事务标识对应的子事务的数据项均已经存储在所述集群数据库中时,确定所述全局事务标识对应的事务符合所述第二预设条件。
  13. 根据权利要求12所述的方法,所述确定所述全局事务标识对应的子事务的数据项均已经存储在所述集群数据库中,包括:
    根据所述全局事务标识,获取在所述集群数据库中已存储的且具有所述全局事务标识的数据项,当获取到的所述数据项以及解码得到的所述历史态数据与事务的所有子事务对应时,确定所述全局事务标识对应的子事务的数据项均已经存储在所述集群数据库中。
  14. 根据权利要求7所述的方法,所述从接收缓冲区接收节点设备发送的至少一个历史态数据,包括:
    每间隔第二预设时长,从所述接收缓冲区中接收任一节点设备发送的至少一个历史态数据;或,
    每间隔第三预设时长,从所述接收缓冲区中接收多个节点设备同时发送的至少一个历史态数据。
  15. 一种数据复制装置,所述装置包括:
    添加模块,用于当检测到事务的提交操作时,将所述事务的历史态数据添加到数据队列中,所述数据队列用于缓存历史态数据;
    所述添加模块,还用于将所述数据队列中的至少一个历史态数据添加到发送缓冲区,所述发送缓冲区用于缓存待复制的历史态数据;
    复制模块,用于当符合第一预设条件时,将所述发送缓冲区中的所述至少一个历史态数据复制至集群设备。
  16. 一种数据复制装置,所述装置包括:
    接收模块,用于从接收缓冲区接收节点设备发送的至少一个历史态数据,所述接收缓冲区用于缓存接收的历史态数据;
    添加模块,用于将所述接收缓冲区中的所述至少一个历史态数据添加到转发缓冲区,通过所述转发缓冲区,将所述至少一个历史态数据转换为符合元组格式的数据,得到至少一个数据项,所述转发缓冲区用于对历史态数据进行数据格式转换;
    存储模块,用于将所述至少一个数据项存储到集群数据库的至少一个目标数据表中,一个目标数据表对应于一个数据项在所述节点设备中所在的一个原始数据表。
  17. 一种计算机设备,其特征在于,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如权利要求1至权利要求6或权利要求7至权利要求14任一项所述的数据复制方法。
  18. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有至少一条指令,所述至少一条指令由处理器加载并执行以实现如权利要求1至权利要求6或权利要求7至权利要求14任一项所述的数据复制方法。
  19. 一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行权利要求1至权利要求6或权利要求7至权利要求14任一项所述的数据复制方法。