WO2021169955A1 - Database replication system, method, source device, and destination device - Google Patents

Database replication system, method, source device, and destination device

Info

Publication number
WO2021169955A1
Authority
WO
WIPO (PCT)
Prior art keywords
transaction
transaction log
log
logs
database
Prior art date
Application number
PCT/CN2021/077476
Other languages
English (en)
French (fr)
Inventor
孟小珍
马剑涛
黄凯耀
李志学
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21759531.3A (EP4095714B1)
Publication of WO2021169955A1
Priority to US17/894,352 (US20220405306A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation

Definitions

  • This application relates to the field of storage technology, and in particular to a database replication system, method, source device, and destination device.
  • The data in a source database is usually copied to a destination database through a database replication scheme, so that when the source database fails, the data from before the failure can be recovered from the destination database.
  • a database replication solution based on transaction logs usually includes three stages: change data capture, change data transmission, and change data replay.
  • change data capture refers to identifying the changed data in the source database through the transaction log recorded in the log file of the source database, and obtaining the transaction log corresponding to the changed data.
  • the transmission of changed data refers to the transfer of the transaction log corresponding to the changed data from the source database to the destination database.
  • the replay of changed data means that the destination database parses and processes the transaction log corresponding to the received changed data, and updates the changed data to the destination database.
  • Dependencies may exist between transaction logs. For example, transaction log 2 may be replayed only after transaction log 1 has been replayed.
  • The reason is that transaction log 1 and transaction log 2 both record operations performed on the same operation object in the source database.
  • For example, both record a write operation on the primary key of the same row of a data table in the source database, and the write operation recorded in transaction log 1 precedes the write operation recorded in transaction log 2. Therefore, to respect such dependencies, a database replication scheme can usually process transaction logs only serially.
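  • The ordering constraint described above can be illustrated with a minimal sketch. The record layout `(number, primary_key, value)` and the `replay` helper are hypothetical and are not taken from the application; they only show that two writes to the same row give different results when replayed in a different order.

```python
# Hypothetical record layout: (log_number, primary_key, value).
log1 = (1, "row-7", "A")  # earlier write to row-7
log2 = (2, "row-7", "B")  # later write to the same row


def replay(logs):
    """Apply the write operations in the given order and return the table."""
    table = {}
    for _number, key, value in logs:
        table[key] = value
    return table


in_order = replay([log1, log2])      # the later write (log2) wins
out_of_order = replay([log2, log1])  # log1 wrongly overwrites log2
```

  • In this sketch only the in-order replay leaves `row-7` holding the value written last in the source database, which is why logs that touch the same operation object cannot be replayed in arbitrary order.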
  • This application provides a database replication system, method, source-end equipment, and destination-end equipment to improve the efficiency of data replication in a database replication scheme.
  • In a first aspect, a database replication system is provided, which is used to perform replay at a destination database based on at least two sets of transaction logs included in a log file of a source database.
  • The system includes a source device and a destination device, wherein:
  • the source device is used to obtain at least two sets of transaction logs in parallel from the log file of the source database and to send the at least two sets of transaction logs, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs;
  • the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent;
  • the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent;
  • the generation time of the second transaction log is earlier than the generation time of the third transaction log; here, "adjacent" can be understood as meaning that the generation times of the transaction logs are consecutive.
  • the destination device is used to receive the at least two sets of transaction logs and to perform transaction replay in the destination database of the destination device according to them. For example, it performs transaction replay in the destination database according to the first transaction log and the second transaction log in the first set of transaction logs and the dependency between them, and then performs transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second set of transaction logs and the dependency between them, so that the data stored in the destination database is consistent with the data stored in the source database.
  • Before the transaction logs of the source database reach the destination device, they are grouped in order of generation time, so that multiple sets of transaction logs can be obtained and sent in parallel, which can improve the processing efficiency of the database replication system.
  • In addition, since the dependencies between transaction logs need not be considered before replay, there is no need to analyze and process the transaction logs centrally, which can reduce the processing complexity at the source database and improve the processing efficiency of the system.
  • Moreover, because the transaction logs of different groups are replayed on the destination device according to the dependencies between the transaction logs and the order of generation time, the accuracy of the data obtained in the destination database can be guaranteed, ensuring that the data in the destination database is consistent with the data in the source database.
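  • As a rough sketch of grouping by generation order and handling the groups in parallel, the following assumes that each transaction log carries a `number` that increases with generation time, and uses a placeholder `send_group` in place of real network transmission. Both are illustrative assumptions, as is the use of a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def group_by_generation_order(logs, group_size):
    """Sort logs by number (assumed to follow generation time) and cut
    them into consecutive groups of at most group_size logs."""
    ordered = sorted(logs, key=lambda log: log["number"])
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def send_group(group):
    """Placeholder for transmission: returns the numbers that were sent."""
    return [log["number"] for log in group]


def fetch_and_send_in_parallel(logs, group_size):
    """Handle every group in its own worker; results keep group order."""
    groups = group_by_generation_order(logs, group_size)
    with ThreadPoolExecutor(max_workers=len(groups) or 1) as pool:
        return list(pool.map(send_group, groups))


sent = fetch_and_send_in_parallel([{"number": n} for n in (4, 1, 3, 2)], 2)
```

  • Each group holds logs whose generation times are consecutive, and every log in an earlier group was generated before every log in a later group, which matches the relationship stated between the second and third transaction logs.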
  • The source device is also used to: after confirming that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • after confirming that the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • In this way, the source device records the dependency relationships between transaction logs in the corresponding transaction logs, so that the destination device can replay the transaction logs directly according to the dependency relationship recorded in each transaction log, which can improve the efficiency of the transaction log replay process.
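  • One way to picture recording a dependency inside the later transaction log is the sketch below. The `TransactionLog` fields, including `depends_on`, are hypothetical names, and the sketch assumes that a smaller log number means an earlier operation time; the application does not prescribe this data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionLog:
    number: int                       # assumed to follow operation time
    operation_object: str             # e.g. the primary key of a row
    depends_on: Optional[int] = None  # number of the log this one depends on


def record_dependencies(group):
    """Within one group, store in each log the number of the previous log
    that operated on the same operation object, if there is one."""
    last_seen = {}
    for log in sorted(group, key=lambda item: item.number):
        previous = last_seen.get(log.operation_object)
        if previous is not None:
            log.depends_on = previous
        last_seen[log.operation_object] = log.number
    return group


group = record_dependencies([
    TransactionLog(1, "row-7"),
    TransactionLog(2, "row-7"),
    TransactionLog(3, "row-9"),
])
```

  • Here log 2 ends up carrying the number of log 1 because both operate on `row-7`, while logs 1 and 3 carry no number and can be replayed without waiting.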
  • Alternatively, the destination device is also used to: after confirming that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • after confirming that the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • Having the destination device determine the dependency relationships between transaction logs reduces the processing load of the source device and can improve the efficiency of the transaction log extraction process.
  • For the first set of transaction logs, when the destination device obtains the second transaction log, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log; after confirming that the transaction replay performed according to the first transaction log is completed, it performs transaction replay according to the second transaction log.
  • In this way, the destination device can determine whether a transaction log depends on another transaction log by checking whether it carries the number of another transaction log. If it does, the destination device waits until the transaction log it depends on has been replayed and only then replays this transaction log, which ensures the accuracy of the data obtained in the destination database.
  • When the destination device obtains the first transaction log in the first set of transaction logs, it confirms that the first transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the first transaction log.
  • For the second set of transaction logs, when the destination device obtains the fourth transaction log, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log; after confirming that the transaction replay performed according to the third transaction log is completed, it performs transaction replay according to the fourth transaction log.
  • When the destination device obtains the third transaction log in the second set of transaction logs, it confirms that the third transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the third transaction log.
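  • The replay rule on the destination side can be sketched as follows. The dictionary keys `number` and `depends_on` and the `apply` callback are illustrative assumptions; a real destination device would apply the logged operation to the destination database and might run several replay workers, whereas this sketch is single-threaded and simply defers a log until the log it depends on has been replayed.

```python
def replay_group(group, apply, replayed=None):
    """Replay each log once the log it depends on (if any) has been replayed.

    group:    list of dicts with 'number' and optional 'depends_on'
    apply:    callback that performs the actual replay of one log
    replayed: set of log numbers that have already been replayed
    """
    replayed = set() if replayed is None else replayed
    pending = list(group)
    while pending:
        progressed = False
        for log in list(pending):
            dependency = log.get("depends_on")
            if dependency is None or dependency in replayed:
                apply(log)
                replayed.add(log["number"])
                pending.remove(log)
                progressed = True
        if not progressed:
            # No log could be replayed: a dependency never arrived.
            raise RuntimeError("unresolved dependency")
    return replayed


order = []
replay_group(
    [{"number": 2, "depends_on": 1}, {"number": 1, "depends_on": None}],
    lambda log: order.append(log["number"]),
)
```

  • Even though the second transaction log arrives first in this example, it is replayed only after the first transaction log, as the text requires.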
  • the source device and the source database are set in the first area
  • the destination device and the destination database are set in the second area
  • the first area and the second area are connected remotely.
  • the source device and the destination device may be set in different regions or different data centers, and then send transaction logs through remote connections between different regions or different data centers.
  • the source device and the destination device can also be located in the same area or the same data center, and there is no restriction here.
  • the source-end device is used to obtain at least two sets of transaction logs in parallel from the source-end database according to the number range of the transaction log.
  • each transaction log group can be assigned a corresponding transaction log number range in advance, and the source device can perform transaction log extraction according to each number range to improve processing efficiency.
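  • A minimal sketch of assigning number ranges in advance is shown below. The function name `split_number_ranges` and the even-split policy are assumptions made for illustration; the application only states that each transaction log group is assigned a corresponding number range.

```python
def split_number_ranges(first, last, groups):
    """Split the inclusive number interval [first, last] into at most
    `groups` consecutive, non-overlapping (start, end) ranges."""
    total = last - first + 1
    base, extra = divmod(total, groups)
    ranges = []
    start = first
    for index in range(groups):
        size = base + (1 if index < extra else 0)
        if size == 0:
            continue  # fewer logs than groups: skip empty ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


ranges = split_number_ranges(1, 10, 3)
```

  • Each extraction worker can then be handed one range and collect only the transaction logs whose numbers fall inside it.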
  • the source device is also used to: read log summary record information from the source database, where the log summary record information records the number of each transaction log generated by the source database and its record position, length, and quantity in the log file; and then obtain the at least two sets of transaction logs in parallel from the log file according to the log summary record information.
  • The locations at which transaction logs are stored in the log file may be discontinuous.
  • The source database can therefore store log summary record information. When the source device needs to extract transaction logs, it first reads the log summary record information of the source database, finds the records of the transaction logs that need to be collected, and determines the storage location of each transaction log in the log file according to the recorded position, length, and quantity. The transaction logs can thus be obtained without traversing all the transaction logs in the log file, which can improve the processing efficiency of the extraction module.
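  • The lookup described above might look like the following sketch. It assumes one summary entry per transaction log with hypothetical keys `number`, `position` (byte offset), and `length` (bytes); the `quantity` field mentioned in the text is left out because its exact meaning is not specified here. An in-memory `io.BytesIO` object stands in for the log file.

```python
import io


def read_logs(log_file, summary, first, last):
    """Read only the logs whose numbers lie in [first, last], seeking
    directly to each recorded position instead of scanning the file."""
    result = {}
    for entry in summary:
        if first <= entry["number"] <= last:
            log_file.seek(entry["position"])
            result[entry["number"]] = log_file.read(entry["length"])
    return result


# Three logs stored at non-contiguous positions in a stand-in log file.
log_file = io.BytesIO(b"AAAA--BBB-CC")
summary = [
    {"number": 1, "position": 0, "length": 4},
    {"number": 2, "position": 6, "length": 3},
    {"number": 3, "position": 10, "length": 2},
]
selected = read_logs(log_file, summary, 2, 3)
```

  • Because every read starts from a recorded position, the bytes between transaction logs and the logs outside the requested range are never touched.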
  • In a second aspect, a database replication method is provided.
  • In this method, the source device first obtains at least two sets of transaction logs in parallel from the log file of the source database, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs, the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent, the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent, and the generation time of the second transaction log is earlier than the generation time of the third transaction log; then, the source device sends the at least two sets of transaction logs to the destination device.
  • When the source device confirms that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, the source device records the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • when the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, the source device records the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • the source-end device may obtain the at least two sets of transaction logs in parallel from the source-end database according to the number range of the transaction log.
  • The source-end device may first read the log summary record information from the source-end database, where the log summary record information records the number of each transaction log generated by the source-end database and its record position, length, and quantity in the log file, and then obtain the at least two sets of transaction logs in parallel from the log file according to the log summary record information.
  • In a third aspect, a database replication method is provided, in which a destination device first receives at least two sets of transaction logs from a source device, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs, the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent, the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent, and the generation time of the second transaction log is earlier than the generation time of the third transaction log.
  • Then, according to the at least two sets of transaction logs, the destination device performs transaction replay in the destination database of the destination device, so that the data stored in the destination database is consistent with the data stored in the source database. Specifically, after performing transaction replay in the destination database according to the first transaction log and the second transaction log in the first set of transaction logs and the dependency relationship between them, the destination device performs transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second set of transaction logs and the dependency relationship between them.
  • When the destination device confirms that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, the destination device records the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • when the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, the destination device records the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • When the destination device obtains the first transaction log, it confirms that the first transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the first transaction log.
  • When the destination device obtains the second transaction log, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log; after confirming that the transaction replay performed according to the first transaction log is completed, it performs transaction replay according to the second transaction log.
  • When the destination device obtains the third transaction log, it confirms that the third transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the third transaction log.
  • When the destination device obtains the fourth transaction log, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log; after confirming that the transaction replay performed according to the third transaction log is completed, it performs transaction replay according to the fourth transaction log.
  • In a fourth aspect, a source device is provided, which includes a processing module and a sending module. These modules can perform the corresponding functions in any of the design examples of the second aspect, specifically:
  • the processing module is configured to obtain at least two sets of transaction logs in parallel from the log file of the source database, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs;
  • the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent;
  • the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent;
  • the generation time of the second transaction log is earlier than the generation time of the third transaction log;
  • the sending module is used to send the at least two sets of transaction logs to the destination device.
  • the processing module is also used to:
  • when the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • when the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • the processing module is specifically used for:
  • the at least two sets of transaction logs are obtained in parallel from the source database according to the number range of the transaction logs.
  • the processing module is specifically used for:
  • read log summary record information from the source-end database, where the log summary record information records the number of each transaction log generated by the source-end database and its record position, length, and quantity in the log file;
  • the at least two sets of transaction logs are obtained in parallel in the log file according to the log summary record information.
  • In a fifth aspect, a destination device is provided, which includes a receiving module and a processing module. These modules can perform the corresponding functions in any of the design examples of the third aspect, specifically:
  • the receiving module is configured to receive at least two sets of transaction logs from the source device, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs, the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent, the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent, and the generation time of the second transaction log is earlier than the generation time of the third transaction log;
  • the processing module is used to perform transaction replay in the destination database according to the first transaction log and the second transaction log in the first set of transaction logs and the dependency relationship between them, and then to perform transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second set of transaction logs and the dependency relationship between them, so that the data stored in the destination database is consistent with the data stored in the source database.
  • the processing module is also used to:
  • when the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • when the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • the processing module is specifically used for:
  • when the first transaction log is obtained, confirming that the first transaction log does not record the number of any transaction log on which it depends, and performing transaction replay according to the first transaction log.
  • the processing module is specifically used for:
  • when the second transaction log is obtained, confirming that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log, and, after confirming that the transaction replay based on the first transaction log is completed, performing transaction replay based on the second transaction log.
  • the processing module is specifically used for:
  • when the third transaction log is obtained, confirming that the third transaction log does not record the number of any transaction log on which it depends, and performing transaction replay according to the third transaction log.
  • the processing module is specifically used for:
  • when the fourth transaction log is obtained, confirming that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log, and, after confirming that the transaction replay based on the third transaction log is completed, performing transaction replay based on the fourth transaction log.
  • In a sixth aspect, a source-end device is provided, which includes a processor for implementing the method described in the second aspect.
  • the source device may also include a memory for storing program instructions and data.
  • the memory is coupled with the processor, and the processor can call and execute the program instructions stored in the memory to implement any one of the methods described in the second aspect.
  • the source device may also include a communication interface, and the communication interface is used for the source device to communicate with other devices. Exemplarily, the other device is the destination device.
  • the source device includes a processor and a communication interface, where:
  • the processor is configured to obtain at least two sets of transaction logs in parallel from the log file of the source database, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs;
  • the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent;
  • the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent;
  • the generation time of the second transaction log is earlier than the generation time of the third transaction log;
  • the communication interface is used to send the at least two sets of transaction logs to the destination device.
  • the processor is also used for:
  • when the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, recording the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
  • when the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, recording the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • the processor is specifically used for:
  • obtain the at least two sets of transaction logs in parallel from the source database according to the number ranges of the transaction logs.
  • the processor is specifically used for:
  • obtain log summary record information from the source database, where the log summary record information records, for the transaction logs generated by the source database, their numbers, their recording positions and lengths in the log file, and their quantity;
  • obtain the at least two sets of transaction logs in parallel from the log file according to the log summary record information.
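  • As an illustration of how recorded positions and lengths allow parallel reads, the following is a minimal sketch and not the claimed implementation. The `summary` mapping, the `|` separators, and the `read_group` helper are hypothetical; the application does not specify a log file layout.

```python
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log file: four transaction logs separated by "|".
log_file = b"insert A|update A|insert B|delete B|"

# Hypothetical log summary record: log number -> (offset, length) in the log file.
summary = {1: (0, 8), 2: (9, 8), 3: (18, 8), 4: (27, 8)}

def read_group(numbers):
    """Read one group of logs by seeking directly to each recorded position."""
    stream = io.BytesIO(log_file)  # each worker uses its own handle
    group = []
    for number in numbers:
        offset, length = summary[number]
        stream.seek(offset)
        group.append(stream.read(length))
    return group

# Two groups of adjacent logs are read in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    first_group, second_group = pool.map(read_group, [[1, 2], [3, 4]])
```

  • Because every worker knows where its logs start and how long they are, no worker has to scan the logs that belong to another group.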
  • In a seventh aspect, a destination device is provided. The destination device includes a processor for implementing the method described in the third aspect.
  • the destination device may also include a memory for storing program instructions and data.
  • the memory is coupled with the processor, and the processor can call and execute the program instructions stored in the memory to implement any one of the methods described in the third aspect.
  • the destination device may also include a communication interface, and the communication interface is used for the destination device to communicate with other devices. Exemplarily, the other device is the source device.
  • the destination device includes a processor and a communication interface, where:
  • the communication interface is configured to receive at least two sets of transaction logs from the source device, the at least two sets of transaction logs including a first set of transaction logs and a second set of transaction logs, where the first set of transaction logs includes at least a first transaction log and a second transaction log that are adjacent, the second set of transaction logs includes at least a third transaction log and a fourth transaction log that are adjacent, and the generation time of the second transaction log is earlier than the generation time of the third transaction log;
  • the processor is configured to perform transaction replay in the destination database according to the first transaction log and the second transaction log in the first set of transaction logs and the dependency relationship between the first transaction log and the second transaction log, and then perform transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second set of transaction logs and the dependency relationship between the third transaction log and the fourth transaction log, so that the data stored in the destination database is consistent with the data stored in the source database.
  • the processor is also used for:
  • in a case where the operation object of the first transaction operation recorded in the first transaction log is the same as the operation object of the second transaction operation recorded in the second transaction log, and the first transaction operation is performed in the source database earlier than the second transaction operation, record the number of the first transaction log in the second transaction log, where the number of the first transaction log indicates the dependency relationship between the first transaction log and the second transaction log; and
  • in a case where the operation object of the third transaction operation recorded in the third transaction log is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log, and the third transaction operation is performed in the source database earlier than the fourth transaction operation, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log indicates the dependency relationship between the third transaction log and the fourth transaction log.
  • the processor is specifically used for:
  • in the case of obtaining the first transaction log, confirm that the first transaction log does not record the number of any transaction log on which it depends, and perform transaction replay according to the first transaction log.
  • the processor is specifically used for:
  • in the case of obtaining the second transaction log, confirm that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log, and, after confirming that the transaction replay based on the first transaction log is complete, perform transaction replay according to the second transaction log.
  • the processor is specifically used for:
  • in the case of obtaining the fourth transaction log, confirm that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log, and, after confirming that the transaction replay based on the third transaction log is complete, perform transaction replay according to the fourth transaction log.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute the method described in any one of the second aspect or the third aspect.
  • an embodiment of the present application provides a computer program product that stores a computer program, and the computer program includes program instructions that, when executed by a computer, cause the computer to execute the method described in any one of the second aspect or the third aspect.
  • the present application provides a chip system, which includes a processor and may also include a memory, for implementing the method described in the second aspect or the third aspect.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of an example of an application scenario of an embodiment of the application
  • FIG. 2 is a schematic diagram of a database replication scheme based on transaction logs;
  • FIG. 3 is a structural block diagram of a database replication system 300 provided by an embodiment of the application.
  • FIG. 4 is a structural block diagram of an example of a database replication system 300
  • FIG. 5 is a structural block diagram of another example of the database replication system 300.
  • FIG. 6 is a structural block diagram of another example of the database replication system 300.
  • FIG. 7 is a structural block diagram of another example of the database replication system 300.
  • FIG. 8 is a schematic diagram of an example of data replication performed by the database replication system 300 shown in FIG. 7;
  • FIG. 9 is a structural block diagram of another example of the database replication system 300.
  • FIG. 10 is a structural block diagram of another example of the database replication system 300.
  • FIG. 11 is a structural block diagram of another example of the database replication system 300.
  • FIG. 12 is a flowchart of an example of a database replication method provided by an embodiment of the application.
  • FIG. 13 is a flowchart of another example of a database replication method provided by an embodiment of the application.
  • FIG. 14 is a flowchart of the initialization setting of each module provided by an embodiment of the application.
  • FIG. 15 is a flowchart of database replication performed by various modules provided in an embodiment of the application.
  • FIG. 16 is a flowchart of failure recovery of each module provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of an example of a source device provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of an example of a destination device provided by an embodiment of this application.
  • the source device refers to a device used to store data, which may be a standalone device such as a server, or a device cluster such as a storage system that includes a management device and multiple storage devices, where the management device can be a server, and each storage device can be a hard disk drive (HDD) disk device, a solid state drive (SSD) disk device, a serial advanced technology attachment (SATA) disk device, etc.
  • the destination device is similar to the source device, and is not described again here.
  • the source database refers to a collection of multiple data stored in the source device according to a certain storage method and managed uniformly.
  • the source device can perform operations such as adding, querying, updating, and deleting data in the database.
  • the source database can be a relational database or a non-relational database. It can also be another type of database, which is not limited here.
  • a source device can include one source database or multiple source databases. If multiple source databases are included, each source database can be numbered, and the source device can access each source database according to its number.
  • the destination database is similar to the source database, and is not described again here.
  • the operation object refers to each piece of data stored in the source database.
  • the operation object may refer to the data of a row determined by the row primary key or the row unique key in any data table in the source database.
  • the dependency relationship refers to the relationship between multiple transaction logs generated for the same operation object of the source database, which must be replayed in the order of their generation time. For example, at a first moment, a modification operation on an operation object in the source database generates transaction log 1, and at a second moment after the first moment, the same operation object is modified again, generating transaction log 2. Since transaction log 2 is generated after transaction log 1, transaction log 2 must be replayed after transaction log 1. There is therefore a dependency between transaction log 1 and transaction log 2, which can also be stated as: transaction log 2 depends on transaction log 1.
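  • The dependency relationship defined above can be illustrated with a minimal sketch. The `TransactionLog` class, its field names, and the `mark_dependencies` helper are hypothetical; the sketch only assumes that each log carries a number reflecting its generation order and an identifier of its operation object.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransactionLog:
    number: int                       # logical generation order in the log file
    operation_object: str             # e.g. a row primary key (hypothetical form)
    depends_on: Optional[int] = None  # number of the log that must be replayed first

def mark_dependencies(logs):
    """Record in each log the number of the latest earlier log on the same object."""
    last_seen = {}
    for log in sorted(logs, key=lambda item: item.number):
        log.depends_on = last_seen.get(log.operation_object)
        last_seen[log.operation_object] = log.number
    return logs

logs = mark_dependencies([
    TransactionLog(1, "row:A"),
    TransactionLog(2, "row:A"),  # same object as log 1, generated later
    TransactionLog(3, "row:B"),  # different object, no dependency
])
```

  • Here transaction log 2 ends up with `depends_on == 1`, while logs 1 and 3 depend on nothing and could be replayed independently.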
  • an area refers to a physical region with independent power and network. Each area can be used to provide computing resources, such as virtual machines, or storage resources, such as storage systems, which is not limited here. When an area is used to provide storage resources, the area can also be called a data center. Different areas or data centers are connected remotely, for example, through a wireless network.
  • the generation time of the transaction log refers to the logical time of the transaction log in the log file, rather than a specific timestamp. Logical time can be understood as the order among multiple transaction logs. For example, it indicates that transaction log 1 is generated before transaction log 2, but it does not indicate that a transaction log was produced at a particular moment (for example, 10:39:00).
  • multiple refers to two or more than two. In view of this, “multiple” can also be understood as “at least two” in the embodiments of the present application. "At least one” can be understood as one or more, for example, one, two or more. For example, including at least one means including one, two or more, and does not limit which ones are included. For example, including at least one of A, B, and C, then the included may be A, B, C, A and B, A and C, B and C, or A and B and C.
  • ordinal numbers such as “first” and “second” mentioned in the embodiments of the present application are used to distinguish multiple objects, and are not used to limit the order, timing, priority, or importance of multiple objects.
  • data can be stored through a database storage system.
  • the data in the storage system is usually copied.
  • As shown in FIG. 1, taking a database storage system as an example of the storage system, the data in the source database can be copied to the destination database through the network, so that when the source database fails, the data can be restored from the destination database.
  • data can be copied at the application layer, at the database level, and so on. The embodiments of the present application are mainly aimed at schemes that copy data at the database level.
  • such a scheme can be referred to as a database replication scheme for short.
  • a database replication scheme based on transaction logs is taken as an example to describe the database replication scheme.
  • When the data stored in the source database changes, the source database generates a transaction log corresponding to the changed data.
  • the transaction log can record information such as the operation performed on the operation object, the content of the operation object, the start and end positions of the operation object, and so on.
  • the specific content included in the transaction log is not restricted here. For example, at time 1, new data, such as data A, is written to the source database through a write operation, so data A is the changed data in the source database. The source database therefore generates a transaction log corresponding to data A and stores it in a log file. The transaction log can record information such as the operation performed on data A (in this example, a write operation), the content of data A, and the start and end positions of data A.
  • the transaction log corresponding to data A is marked as transaction log 1. Then, at time 2 after time 1, data A is modified through a modification operation, and the source database generates another transaction log corresponding to data A, transaction log 2, and stores it in the log file.
  • a database replication scheme based on transaction logs can be used.
  • One of the main principles of the transaction log-based database replication scheme is that when the transaction log reaches the destination database, it must be replayed in accordance with the dependency between the transaction logs. For example, the aforementioned transaction log 2 must be replayed after the transaction log 1, so as to obtain the same data as the source database.
  • Figure 2 is a schematic diagram of a database replication scheme based on transaction logs.
  • the database replication scheme includes 4 modules, namely the transaction extraction module, the cross-domain transmission module, the pre-replay parallelization module, and the transaction replay module.
  • the transaction extraction module and the cross-domain transmission module are set in the source device, and the pre-replay parallelization module and the transaction replay module are set in the destination device.
  • the pre-replay parallelization module and the transaction replay modules have a one-to-many relationship.
  • there is one transaction extraction module, one cross-domain transmission module, and one pre-replay parallelization module, while there are multiple transaction replay modules.
  • the specific number of transaction replay modules can be set according to actual usage requirements. In FIG. 2, K transaction replay modules are taken as an example.
  • the transaction extraction module in the source device first obtains the transaction logs corresponding to the changed data in the source database. Specifically, transaction logs are stored in the log file of the source database in order of generation time, and the transaction extraction module reads them serially from the log file and then passes them to the cross-domain transmission module. After receiving the transaction logs, the cross-domain transmission module serially transmits them to the pre-replay parallelization module, thereby sending the transaction logs corresponding to the changed data to the destination device. It should be noted that, in this scheme, serial reading means that the transaction extraction module reads transaction logs serially from the log file corresponding to a given source database, and each source database includes only one log file.
  • after receiving multiple transaction logs, the pre-replay parallelization module in the destination device first identifies the dependency relationships among the transaction logs, and then transmits the transaction logs to the transaction replay modules according to the identified dependency relationships.
  • for example, the pre-replay parallelization module can assume by default that the first transaction log it receives does not depend on any other transaction log, and transmit the first transaction log to one of the K transaction replay modules, for example, to transaction replay module 1.
  • the pre-replay parallelization module then checks the dependency between the received second transaction log and the first transaction log. If it determines that the second transaction log and the first transaction log operate on the same operation object in the source database, it confirms that there is a dependency between them.
  • in that case, the pre-replay parallelization module needs to wait for transaction replay module 1 to finish replaying the first transaction log before transferring the second transaction log to transaction replay module 1. If the pre-replay parallelization module determines that there is no dependency between the second transaction log and the first transaction log, it directly transmits the second transaction log to one of the K transaction replay modules other than transaction replay module 1, for example, to transaction replay module 2. In this way, transaction replay module 1 and transaction replay module 2 can process different transaction logs in parallel, so that transaction logs without dependencies are processed in parallel. After each transaction replay module receives a transaction log, it executes the transaction log in the destination database. After the execution is completed, the destination database contains the data that changed in the source database. The transaction replay module then feeds back the execution result of the transaction log to the pre-replay parallelization module, so that the pre-replay parallelization module can determine whether a given transaction log has been replayed according to whether it has received the execution result for that transaction log.
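  • The dispatching behaviour described above can be approximated with a minimal sketch. The `dispatch` function and its queue model are hypothetical simplifications: instead of waiting for an execution result, a dependent log is simply queued behind its predecessor on the same replay module, and independent logs are spread round-robin over the K modules.

```python
def dispatch(logs, k):
    """Assign (number, operation_object) logs to k replay queues.

    A log that touches an already-seen object is queued on the same replay
    module, behind its predecessor; other logs go to the next module in turn.
    """
    queues = [[] for _ in range(k)]
    owner = {}        # operation object -> index of the replay module handling it
    next_module = 0
    for number, operation_object in logs:
        if operation_object in owner:
            module = owner[operation_object]
        else:
            module = next_module
            owner[operation_object] = module
            next_module = (next_module + 1) % k
        queues[module].append(number)
    return queues

queues = dispatch([(1, "row:A"), (2, "row:A"), (3, "row:B")], k=2)
```

  • In this example, logs 1 and 2 share an operation object and land on the same queue in order, while log 3 goes to the other queue and can be replayed in parallel. Note that the single `dispatch` loop itself is serial, which is the bottleneck this scheme suffers from.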
  • in this scheme, multiple transaction replay modules can be set to replay transaction logs in parallel. However, the other processing stages (for example, the transaction extraction stage, the cross-domain transmission stage, and the pre-replay parallelization stage) can only be processed serially, which leads to low efficiency when copying data through this database replication scheme.
  • embodiments of the present application provide a database replication system that can improve the efficiency of data replication.
  • the database replication system provided by the embodiments of the present application will be described with reference to the accompanying drawings.
  • FIG. 3 is a structural block diagram of a database replication system 300 provided by an embodiment of this application.
  • the database replication system 300 is used to perform replay in the destination database according to at least two sets of transaction logs included in the log file of the source database.
  • the database replication system 300 includes a source device 301 and a destination device 302 communicatively connected to the source device 301, where:
  • the source device 301 is configured to obtain at least two sets of transaction logs in parallel from the log files of the source database, and the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs.
  • each group of transaction logs includes at least two transaction logs. For example, the first group of transaction logs includes at least a first transaction log and a second transaction log, and the second group of transaction logs includes at least a third transaction log and a fourth transaction log.
  • the transaction logs in each group are adjacent, that is, the first transaction log is adjacent to the second transaction log, and the third transaction log is adjacent to the fourth transaction log.
  • adjacent transaction logs can be understood as transaction logs whose generation times are consecutive in the log file. For example, if each line in the log file records one transaction log and the source device 301 stores the transaction logs in the log file in order of generation time, then the first transaction log and the second transaction log are stored in two consecutive lines of the log file, and the third transaction log and the fourth transaction log are stored in another two consecutive lines of the log file.
  • the generation time of each transaction log in the first group of transaction logs is earlier than the generation time of any transaction log in the second group of transaction logs, that is, the generation time of the last transaction log in the first group is earlier than the generation time of the first transaction log in the second group.
  • for example, the first group of transaction logs includes the first transaction log and the second transaction log, where the second transaction log is the last transaction log in the first group when sorted by generation time.
  • the second group of transaction logs includes the third transaction log, where the third transaction log is the first transaction log in the second group when sorted by generation time, and the generation time of the second transaction log is earlier than the generation time of the third transaction log.
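  • The two grouping properties above (logs within a group are adjacent, and every log in an earlier group precedes every log in a later group) can be checked with a minimal sketch. The helper names and the use of plain integers as log numbers are assumptions for illustration.

```python
def split_into_groups(log_numbers, group_size):
    """Split log numbers into groups of adjacent logs, in generation order."""
    ordered = sorted(log_numbers)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

def groups_are_ordered(groups):
    """True if the last log of each group precedes the first log of the next."""
    return all(earlier[-1] < later[0] for earlier, later in zip(groups, groups[1:]))

groups = split_into_groups(range(1, 9), group_size=4)
```

  • With eight logs and a group size of four, the first group holds logs 1 to 4 and the second group holds logs 5 to 8, so comparing only the boundary logs (4 and 5) is enough to establish the ordering between the groups.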
  • the number of transaction log groups obtained in parallel by the source device 301 is not limited. For example, three sets of transaction logs can be obtained in parallel, or five sets of transaction logs can be obtained in parallel, which is not limited here.
  • in the following description, the case in which the source device 301 obtains two sets of transaction logs in parallel, marked as the first set of transaction logs and the second set of transaction logs respectively, is taken as an example.
  • the source device 301 may include at least one source database, and the at least two sets of transaction logs may be obtained from the log file corresponding to one source database, or from the log files corresponding to different source databases, which is not limited here.
  • the source device 301 After the source device 301 obtains the first group of transaction logs and the second group of transaction logs, it sends the first group of transaction logs and the second group of transaction logs to the destination device. It should be noted that the source device 301 can send the first group of transaction logs and the second group of transaction logs in any manner, specifically, it can be sent in parallel or asynchronously, which is not limited here.
  • the destination device 302 is configured to receive the first group of transaction logs and the second group of transaction logs, and then, according to the at least two transaction logs included in each group and the dependency relationships between the transaction logs in each group, replay each group of transaction logs in the destination database of the destination device 302, so that the data stored in the destination database is consistent with the data stored in the source database.
  • specifically, the destination device 302 first performs replay in the destination database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency relationship between the first transaction log and the second transaction log, and then performs replay in the destination database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency relationship between the third transaction log and the fourth transaction log.
  • in the above technical solution, before the transaction logs in the source database reach the destination device, the transaction logs are grouped in order of generation time, so that multiple sets of transaction logs can be obtained and sent in parallel, which can improve the processing efficiency of the database replication system.
  • in addition, since the dependencies between transaction logs do not need to be considered before replay, there is no need to analyze and process the transaction logs centrally, which can reduce the processing complexity at the source database and improve the processing efficiency of the system.
  • furthermore, because the transaction logs of different groups are replayed on the destination device according to the dependencies between the transaction logs and the order of generation time, the data obtained in the destination database is accurate, which ensures that the data in the destination database is consistent with the data in the source database.
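  • The replay ordering on the destination device can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the embodiment's implementation: `replay_order` is a hypothetical helper, groups are keyed by a hypothetical group index, and each log is a `(number, depends_on)` pair in which `depends_on` is the number recorded in the log, or `None`.

```python
def replay_order(groups):
    """Return the order in which logs are applied at the destination.

    groups maps a group index to a list of (number, depends_on) pairs.
    Groups are replayed in index order; within a group, a log is applied
    only after the log whose number it records has been applied.
    """
    applied = []
    done = set()
    for index in sorted(groups):
        pending = list(groups[index])
        while pending:
            progressed = False
            for entry in list(pending):
                number, depends_on = entry
                if depends_on is None or depends_on in done:
                    applied.append(number)
                    done.add(number)
                    pending.remove(entry)
                    progressed = True
            if not progressed:
                raise ValueError("a recorded dependency was never replayed")
    return applied

# Groups and logs may arrive out of order because they are sent in parallel.
order = replay_order({2: [(4, 3), (3, None)], 1: [(2, 1), (1, None)]})
```

  • Even though group 2 and the dependent logs arrive first in this example, the logs are applied as 1, 2, 3, 4, which matches their generation order in the source database.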
  • the source device and destination device in the above system can also process more sets of transaction logs in parallel, such as processing three sets of transaction logs, four sets of transaction logs, etc. in parallel.
  • the process of processing the multiple sets of transaction logs by the source-end device and the destination-end device is the same as that of the foregoing two sets of transaction logs.
  • the source device and the source database can be integrated into one device, or two independent devices, and the destination device and the destination database can also be integrated into one device or two separate devices.
  • the source device and the source database can be set in a first area or a first data center, and the destination device and the destination database can be set in a second area or a second data center remotely connected to the first area or the first data center. Alternatively, the source device, the source database, the destination device, and the destination database may be located in the same area or the same data center, which is not limited here.
  • FIG. 4 is a structural block diagram of an example of a database replication system 300.
  • the source device 301 can be provided with two extraction modules (a first extraction module 3011 and a second extraction module 3012) and two sending modules (a first sending module 3013 and a second sending module 3014), and the two extraction modules are connected to the two sending modules in a one-to-one relationship.
  • the first extraction module 3011 is connected with the first sending module 3013
  • the second extraction module 3012 is connected with the second sending module 3014.
  • Two receiving modules (respectively the first receiving module 3021 and the second receiving module 3022) and one replay module 3023 can be set in the destination device 302, and the two receiving modules are connected in a one-to-one relationship with the two sending modules.
  • the first sending module 3013 is connected to the first receiving module 3021
  • the second sending module 3014 is connected to the second receiving module 3022
  • the two receiving modules are connected to the replay module 3023 respectively.
  • the number of extraction modules, sending modules, and receiving modules can be related to the number of transaction log groups that the source device 301 needs to extract. For example, if the source device 301 needs to obtain two sets of transaction logs, you can set Two extraction modules, two sending modules, and two receiving modules; if the source device 301 needs to obtain three sets of transaction logs, three extraction modules, three sending modules, and three receiving modules can be set up, and so on. I will not list them all here.
  • each extraction module, each sending module, each receiving module, and the replay module can be a functional module, an application, or a thread implemented by program code in a server. If the source device 301 and the destination device 302 are cluster systems, for example, cluster systems composed of at least one virtual machine, each extraction module, each sending module, each receiving module, and the replay module can be a virtualization function instance or a container deployed on a virtual machine.
  • the above modules can also be implemented in other ways, which are not limited here.
  • each extraction module is used to obtain one set of transaction logs from the log file of the source database. For example, the first extraction module 3011 is used to obtain the first set of transaction logs from the log file, and the second extraction module 3012 is used to obtain the second set of transaction logs from the log file.
  • the manner in which the extraction module obtains the transaction log may include, but is not limited to, the following three:
  • Each extraction module first needs to determine the extraction range in which a set of transaction logs should be extracted, and then obtain the first set of transaction logs and the second set of transaction logs according to their respective extraction ranges.
  • the extraction range can be preset.
  • each line in the log file can be used to store a transaction log. Specifically, a certain line in the log file can be indicated by a start identifier and an end identifier of the transaction log.
  • for example, the first extraction module 3011 extracts the transaction logs stored in lines 1-100 of the log file, and the second extraction module 3012 extracts the transaction logs stored in lines 101-200 of the log file.
  • each extraction module then obtains its group of transaction logs from the corresponding position in the log file, in parallel with the other extraction modules, according to its preset extraction range.
  • the time at which transaction logs are extracted can be set for each extraction module. For example, extraction can be set to start one hour after the source device 301 is turned on, so that once the running time of the source device 301 reaches one hour, each extraction module obtains its group of transaction logs in the above-mentioned manner.
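  • Extraction by preset line range can be sketched as follows. The in-memory `log_file` list, the `txlog-N` contents, and the `extract_range` helper are hypothetical; the line ranges 1-100 and 101-200 follow the example in the text, and threads stand in for the extraction modules.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_range(lines, first_line, last_line):
    """Return the logs stored in lines first_line..last_line (1-based, inclusive)."""
    return lines[first_line - 1:last_line]

# Hypothetical log file with one transaction log per line.
log_file = [f"txlog-{n}" for n in range(1, 201)]

# Preset extraction ranges for the first and second extraction modules.
ranges = [(1, 100), (101, 200)]

with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
    futures = [pool.submit(extract_range, log_file, first, last)
               for first, last in ranges]
    first_group, second_group = [future.result() for future in futures]
```

  • Because the ranges do not overlap, the two workers never need to coordinate with each other while reading.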
  • Each extraction module obtains the first group of transaction logs and the second group of transaction logs in parallel from the source database according to the number range of the transaction log.
  • The number of a transaction log may be assigned by the extraction module according to the order of the generation times of the transaction logs.
  • For example, the number of the first transaction log generated is 1, the number of the second transaction log generated is 2, and so on.
  • The starting number and the quantity of transaction logs to be extracted by each extraction module can be set in advance. For example, if each extraction module extracts 5000 transaction logs, the starting number for the first extraction module 3011 is 1 and the extraction quantity is 5000, that is, it extracts the transaction logs numbered 1-5000; the starting number for the second extraction module 3012 is 5001 and the extraction quantity is 5000, that is, it extracts the transaction logs numbered 5001-10000.
  • In this way, each extraction module obtains a set of transaction logs according to the preset number range.
  • Here, the starting number of the transaction logs is taken to be 1 for the purpose of description. In actual use, the starting number can also be 0, which is not limited here.
  • After each extraction module determines its own number range, it obtains a set of transaction logs from the source database according to that number range.
  • Each transaction log in the log file can include two parts: a header and a body.
  • The header records the storage location of the transaction log, and the body records the type of the transaction log, the operation corresponding to the transaction log, the content of the processed data, and other information, which will not be described in detail here.
  • The extraction module can read each transaction log from the source database in turn.
  • When reading a transaction log, the extraction module first determines the number of the transaction log according to its generation time. If the number falls within the number range corresponding to the extraction module, the module goes on to read the header and the body of the transaction log to obtain it. If the number does not fall within that range, the module skips the transaction log and reads the next one. This continues until all transaction logs in the number range have been obtained, which yields the set of transaction logs corresponding to the extraction module.
  • After the extraction module determines the number of a transaction log according to its generation time, it may also add the number to the header of the transaction log once the transaction log has been obtained.
  • The extraction module can also filter the transaction logs first and then number the filtered transaction logs. For example, if the extraction module only needs to obtain the transaction logs of data table A, it can filter out the transaction logs that do not belong to data table A. Alternatively, the extraction module can filter according to the type of the transaction log, for example, filter out the transaction logs that create a data table and the transaction logs that modify the structure of a data table. There are many specific filtering methods, which will not be described here.
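  • The number-range extraction described above can be sketched as follows. The `TxLog` class and `extract_by_number` function are hypothetical names, and sorting an in-memory list by generation time stands in for reading the logs from the source database in turn.

```python
from dataclasses import dataclass, field


@dataclass
class TxLog:
    generated_at: float                      # generation time in the source database
    table: str                               # data table the log applies to
    body: str
    header: dict = field(default_factory=dict)


def extract_by_number(logs, first, last, keep=lambda log: True):
    """Filter, number by generation time (starting at 1), keep numbers in [first, last]."""
    candidates = sorted((log for log in logs if keep(log)),
                        key=lambda log: log.generated_at)
    selected = []
    for number, log in enumerate(candidates, start=1):
        if number > last:
            break                            # every log in the range has been obtained
        if number < first:
            continue                         # outside this module's range: skip it
        log.header["number"] = number        # record the number in the header
        selected.append(log)
    return selected


logs = [TxLog(3.0, "A", "c"), TxLog(1.0, "A", "a"),
        TxLog(2.0, "B", "x"), TxLog(4.0, "A", "d")]
# Keep only logs of data table A, then take the logs numbered 2-3.
group = extract_by_number(logs, 2, 3, keep=lambda log: log.table == "A")
```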
  • The locations at which transaction logs are stored in the log file may be discontinuous.
  • For example, transaction log 1 is stored in the first line of the log file, and transaction log 2 is stored in the fourth line of the log file.
  • In this case, each extraction module may need to traverse all transaction logs in the log file to obtain the transaction logs within its number range.
  • When the source database of the source device 301 stores transaction logs, it can generate log summary record information corresponding to each transaction log.
  • The log summary record information records the number of each transaction log generated by the source database, together with its recording position, length, and quantity in the log file. It can of course also include other information, which is not listed item by item here.
  • The number of the transaction log in the log summary record information is generated by the source database.
  • The source database may number the transaction logs according to their generation times.
  • The log summary record information is stored at a specified location in the source database. In this way, when the extraction module needs to obtain a transaction log, it can first go to the specified location to obtain the log summary record information, and then obtain the transaction log from the log file according to the log summary record information.
  • For example, to obtain the transaction log numbered 2, the extraction module first obtains the log summary record information of the source database and finds the record numbered 2 in it. According to the position, length, and quantity in that record, it determines the storage location of the transaction log numbered 2 in the log file, and then goes to that location to obtain the transaction log. It therefore does not need to traverse all transaction logs in the log file.
  • The processing efficiency of the extraction module can thus be improved.
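  • A minimal sketch of lookup through log summary records. The byte layout of `log_file` and the `summary` mapping are invented for illustration, and only the position and length fields are modelled; the quantity field mentioned in the text is omitted.

```python
# Hypothetical log file content: transaction logs stored at non-contiguous offsets.
log_file = b"AAAA....BBBBBB..CC"

# Hypothetical log summary records: transaction log number -> (offset, length).
summary = {1: (0, 4), 2: (8, 6), 3: (16, 2)}


def fetch(number):
    """Read one transaction log directly, without traversing the whole log file."""
    offset, length = summary[number]
    return log_file[offset:offset + length]


def fetch_range(first, last):
    """Read every transaction log whose number lies in [first, last]."""
    return [fetch(number) for number in range(first, last + 1)]
```

  • Each lookup costs one dictionary access and one slice, regardless of how many unrelated transaction logs sit between the entries in the file.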
  • In the above examples, the number of transaction logs included in the extraction range of each extraction module is the same. In actual use, the numbers of transaction logs included in the extraction ranges of different extraction modules can also be different.
  • For example, the first extraction module can extract 5000 transaction logs, and the second extraction module can extract 4000 transaction logs.
  • Each transaction log can record a variety of content; for example, it can include information such as the operation object that was processed, the content of the operation object, and the size of the operation object (which can be understood as data). Given the purpose of transmitting the transaction log in the embodiments of the present application (that is, to be able to obtain the changed data in the source database), not every item of content in the transaction log is necessary for obtaining the changed data. For example, even if the transaction log does not include the size of the processed data, the corresponding data can still be obtained when the transaction log is replayed in the destination database.
  • Therefore, after each extraction module obtains the transaction logs, it can also parse them and filter each transaction log according to preset filter conditions to obtain the filtered transaction logs.
  • Specifically, the transaction log is first parsed to obtain its content. Then, according to the operation type of the transaction log and the filter condition corresponding to each operation type, part of the content of the transaction log is filtered out, and the remaining content is copied and combined to obtain the filtered transaction log.
  • Filtering out part of the content can be understood as deleting that part of the content.
  • The operation type corresponding to the transaction log may include, but is not limited to, adding data, modifying data, deleting data, adding a database table, and deleting a database table. Those skilled in the art can set the filter conditions corresponding to different operation types according to actual needs.
  • For example, for one operation type the corresponding filter condition can be to filter out information other than the storage location of the processed data and the content of the data, and for another operation type the corresponding filter condition can be to filter out information other than the processed operation object, so that different transaction logs can be filtered flexibly.
  • The filter conditions here can be the same as or different from the conditions used by the extraction module when filtering before numbering the transaction logs, and there is no restriction here.
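  • A minimal sketch of per-operation-type content filtering. The operation type names, the field names, and the pairing of filter conditions with operation types in `KEEP_FIELDS` are assumptions made for illustration; the text leaves these to the implementer.

```python
# Hypothetical filter conditions: for each operation type, the fields to keep.
KEEP_FIELDS = {
    "insert": {"op", "location", "content"},
    "drop_table": {"op", "object"},
}


def filter_log(parsed):
    """Copy only the fields allowed for this transaction log's operation type."""
    keep = KEEP_FIELDS.get(parsed["op"])
    if keep is None:
        return dict(parsed)      # no filter condition configured: keep everything
    return {key: value for key, value in parsed.items() if key in keep}


filtered = filter_log({"op": "insert", "location": "A:1", "content": "x", "size": 10})
```

  • Here the `size` field is dropped because it is not needed to reproduce the changed data at the destination.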
  • Each extraction module can be adapted to support the transaction logs of different types of databases, that is, each extraction module can support multiple ways of parsing transaction logs.
  • Before parsing a transaction log, the extraction module can determine the type of the source database and then use a parsing method that matches that type to parse the transaction log.
  • After extracting a set of transaction logs, each extraction module can also automatically calculate the extraction range of the next set of transaction logs to be extracted.
  • Multiple extraction modules can interact with each other, and each extraction module can acquire the extraction ranges of the other modules.
  • For example, the number range of the first extraction module is 1-5000, and the first extraction module learns that the number range of each of the other extraction modules contains 5000 transaction logs, so the first extraction module can calculate that the number range of the next set of transaction logs to be extracted is 20001-25000.
  • Alternatively, a calculation strategy can be preset in each extraction module. For example, after a set of transaction logs is extracted, 20,000 is automatically added to the current number range to obtain the number range of the next set of transaction logs.
  • In this way, the extraction module can extract the next set of transaction logs without waiting for the replay module to replay the transaction logs it has already extracted, which improves processing efficiency.
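  • The range calculation can be sketched as below. The function name `next_range` is hypothetical, and the assumption of four extraction modules with 5000 logs each is inferred from the 20001-25000 figure in the text rather than stated by it.

```python
def next_range(current, group_sizes):
    """Advance a number range by the total number of logs all modules extract per round."""
    first, last = current
    stride = sum(group_sizes)
    return (first + stride, last + stride)


# Assumed configuration: four extraction modules, 5000 transaction logs each.
following = next_range((1, 5000), [5000, 5000, 5000, 5000])
```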
  • After the first extraction module 3011 and the second extraction module 3012 each obtain their set of transaction logs, they send their respective sets to the sending modules connected to them.
  • For example, the first extraction module 3011 sends the acquired first set of transaction logs to the first sending module 3013, and the second extraction module 3012 sends the acquired second set of transaction logs to the second sending module 3014. The first sending module 3013 then sends the first set of transaction logs to the first receiving module 3021, and, in parallel, the second sending module 3014 sends the second set of transaction logs to the second receiving module 3022.
  • After the first receiving module 3021 and the second receiving module 3022 respectively receive the first set of transaction logs and the second set of transaction logs, they send both sets to the replay module 3023, and the replay module 3023 replays the two sets of transaction logs in the destination database according to the dependencies between the transaction logs.
  • The replay module 3023 first determines the dependency relationships between the transaction logs included in the first set of transaction logs. For example, the replay module 3023 determines whether the operation object of the first transaction log in the first set is the same as the operation object of the second transaction log. If they are the same, it determines whether the generation time of the first transaction log is before the generation time of the second transaction log. If so, the second transaction log depends on the first transaction log, and the replay module 3023 replays the first transaction log before the second transaction log to ensure the accuracy of the data obtained in the destination database.
  • The replay module 3023 may include multiple replay queues, and may divide the transaction logs included in the first set among the replay queues according to the dependency relationships between them. For example, if the first transaction log and the second transaction log have a dependency relationship, they are placed in the same replay queue; if other transaction logs have no dependency on the first or second transaction log, they are placed in other replay queues. This continues until every transaction log in the first set has been placed in a replay queue, after which the transaction logs in each replay queue are replayed in turn, completing the replay of the first set of transaction logs.
  • The replay module 3023 determines the dependencies between the transaction logs included in the second set of transaction logs in the same way and, after replaying the first set, replays all transaction logs in the second set according to the dependencies among them.
  • The specific process is similar to that for the first set of transaction logs and is not repeated here.
  • A transaction log may include multiple transaction operations.
  • For example, the multiple transaction operations may include adding, modifying, or deleting data in different rows or columns of different data tables; that is, a transaction log can include multiple operation objects. In this case, when determining the dependency between this transaction log and another transaction log, a dependency is deemed to exist between the two as long as the operation object of the other transaction log is the same as any one of the multiple operation objects of this transaction log.
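  • A minimal sketch of dividing one set of transaction logs into replay queues, treating two logs as dependent when their operation objects share at least one element. The `partition` function and the object labels are hypothetical, and merging two queues when a later log touches both is an assumption added here to keep dependent logs together; the text does not spell out that case.

```python
def partition(logs):
    """logs: list of (number, set_of_operation_objects), ordered by generation time."""
    queues = []                      # each queue: {"objects": set, "logs": [numbers]}
    for number, objects in logs:
        hits = [q for q in queues if q["objects"] & objects]
        if not hits:
            # No shared operation object with any queue: start an independent queue.
            queues.append({"objects": set(objects), "logs": [number]})
            continue
        target = hits[0]
        for other in hits[1:]:       # this log links several queues: merge them
            target["objects"] |= other["objects"]
            target["logs"] = sorted(target["logs"] + other["logs"])
            queues.remove(other)
        target["objects"] |= objects
        target["logs"].append(number)
    return [q["logs"] for q in queues]


queues = partition([(1, {"A:1"}), (2, {"A:1", "B:2"}), (3, {"C:1"}), (4, {"B:2"})])
```

  • Logs 1, 2, and 4 end up in one queue and are replayed in generation order, while log 3 sits in its own queue.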
  • There may also be more than one replay module.
  • FIG. 5 is a structural block diagram of another example of the database replication system 300. The difference from FIG. 4 is that, in the example shown in FIG. 5, the number of replay modules can be the same as the number of receiving modules; for example, two replay modules are included, namely the first replay module 3024 and the second replay module 3025.
  • The first replay module 3024 is connected to the first receiving module 3021 to receive the first set of transaction logs, and the second replay module 3025 is connected to the second receiving module 3022 to receive the second set of transaction logs. The first replay module 3024 and the second replay module 3025 then replay the transaction logs they received in a preset order.
  • For example, the preset order is that the first replay module 3024 replays first, and after the transaction logs in the first replay module 3024 have been replayed, the second replay module 3025 replays.
  • The manner in which each replay module replays the set of transaction logs it receives is similar to that of the replay module 3023 in FIG. 4 and is not described again here.
  • The receiving module can also be integrated into the replay module.
  • For example, the first receiving module 3021 is integrated into the first replay module 3024, and the second receiving module 3022 is integrated into the second replay module 3025.
  • In this way, the system can be simplified.
  • The source device 301 in the database replication system 300 shown in FIG. 3 is used to obtain the first set of transaction logs and the second set of transaction logs in parallel, and is also used to provide the dependencies between the multiple transaction logs included in each of the two sets.
  • When the source device 301 confirms that the operation object, in the source database, of the first transaction operation recorded in the first transaction log is the same as the operation object, in the source database, of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation, it records the number of the first transaction log in the second transaction log. The number of the first transaction log carried in the second transaction log indicates the dependency relationship between the two, that is, the second transaction log depends on the first transaction log.
  • Similarly, when the source device 301 confirms that the operation object, in the source database, of the third transaction operation recorded in the third transaction log is the same as the operation object, in the source database, of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation, it records the number of the third transaction log in the fourth transaction log. The number of the third transaction log carried in the fourth transaction log indicates the dependency relationship between the two, that is, the fourth transaction log depends on the third transaction log.
  • The destination device 302 can replay the multiple transaction logs in each group according to whether each transaction log carries the numbers of other transaction logs.
  • For example, when the destination device 302 obtains the first transaction log in the first set of transaction logs, it confirms that the first transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the first transaction log. Then, when the destination device 302 obtains the second transaction log in the first set, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the two; after confirming that the transaction replay performed according to the first transaction log is complete, it performs transaction replay according to the second transaction log, and so on, until all transaction logs included in the first set have been replayed.
  • When the destination device 302 obtains the third transaction log in the second set of transaction logs, it confirms that the third transaction log does not record the number of any transaction log on which it depends, and performs transaction replay according to the third transaction log.
  • When the destination device 302 obtains the fourth transaction log in the second set of transaction logs, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the two; after confirming that the transaction replay performed according to the third transaction log is complete, it performs transaction replay according to the fourth transaction log, and so on, until all transaction logs included in the second set have been replayed.
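  • A minimal sketch of replay at the destination when each transaction log may carry the number of the log it depends on. The dictionary layout, the `depends_on` key, and the `apply` callback are hypothetical; a real destination device would apply the change to the destination database instead.

```python
def replay_group(group, apply):
    """group: list of dicts with 'number' and an optional 'depends_on' number."""
    done = set()                 # numbers of transaction logs already replayed
    pending = list(group)
    while pending:
        progressed = False
        for log in list(pending):
            dep = log.get("depends_on")
            if dep is None or dep in done:   # no dependency, or dependency replayed
                apply(log)
                done.add(log["number"])
                pending.remove(log)
                progressed = True
        if not progressed:
            raise RuntimeError("unresolved dependency in group")
    return done


order = []
replay_group([{"number": 2, "depends_on": 1}, {"number": 1}],
             lambda log: order.append(log["number"]))
```

  • Even though log 2 arrives first, it is held back until log 1 has been replayed.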
  • FIG. 6 is a structural block diagram of another example of the database replication system 300.
  • Each parallelization module is connected to an extraction module and to a sending module; for example, the first parallelization module 3015 is connected to the first extraction module 3011 and the first sending module 3013, and the second parallelization module 3016 is connected to the second extraction module 3012 and the second sending module 3014.
  • Each parallelization module is used to determine the dependency relationship between multiple transaction logs included in each set of transaction logs, and add the dependency relationship to the corresponding transaction log.
  • The parallelization modules are implemented similarly to the aforementioned extraction modules, sending modules, receiving modules, and replay module, which is not repeated here.
  • The first extraction module 3011, the first sending module 3013, the second extraction module 3012, the second sending module 3014, the first receiving module 3021, the second receiving module 3022, and the replay module 3023 are respectively similar to the corresponding modules in FIG. 4 and are not described again here.
  • The parallelization module is mainly explained below.
  • After the first parallelization module 3015 receives the first set of transaction logs sent by the first extraction module 3011 connected to it, it adds a dependency relationship to each transaction log in the set in sequence.
  • For example, the first parallelization module 3015 obtains the first transaction log in the first set of transaction logs. Obviously, the first transaction log has no dependency.
  • The first parallelization module 3015 can add a field to the header of the first transaction log that indicates the number of the transaction log on which this transaction log depends. Since the first transaction log has no dependency, this field can be left empty for the first transaction log, or the first parallelization module 3015 can write 0 in it.
  • In this case, the starting number of the transaction logs is 1, so a value of 0 means that the transaction log does not depend on any other transaction log. Then, the first parallelization module 3015 determines the dependency of the second transaction log in the first set by determining whether the transaction operation recorded in the second transaction log has the same operation object in the source database as the transaction operation recorded in the preceding transaction log. For example, the first parallelization module 3015 determines that the transaction operation recorded in the first transaction log processes the first row of data in data table A, and that the transaction operation recorded in the second transaction log also processes the first row of data table A.
  • In that case, the first parallelization module 3015 determines that the transaction operations recorded in the first transaction log and the second transaction log have the same operation object in the source database. Alternatively, when the source database is a key-value (KV) database, the first parallelization module 3015 can determine whether the operation objects are the same by determining whether the operation objects recorded in the two transaction logs have at least one identical key value. If there is an identical key value, the transaction operations recorded in the two transaction logs have the same operation object in the source database; if there is no identical key value, their operation objects in the source database are different. Of course, this can also be judged in other ways, which is not limited here.
  • Next, the first parallelization module 3015 determines whether the operation time in the source database of the transaction operation recorded in the first transaction log (which can be understood as the time when the first transaction log was generated in the log file) is before the operation time in the source database of the transaction operation recorded in the second transaction log (which can be understood as the time when the second transaction log was generated in the log file). If so, the second transaction log depends on the first transaction log, and the number of the first transaction log is added to the new field in the header of the second transaction log.
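  • A minimal sketch of how a parallelization module might fill in the dependency field. The `keys` sets stand in for operation objects (or key values in a KV database), the `depends_on` header field name is hypothetical, and choosing the most recent earlier log when several match is an assumption; the text only describes a single dependency number.

```python
def add_dependencies(group):
    """group: logs ordered by generation time, each with 'number' and 'keys' (a set)."""
    last_writer = {}     # operation object -> number of the latest log that touched it
    for log in group:
        earlier = [last_writer[key] for key in log["keys"] if key in last_writer]
        # 0 means "depends on no other transaction log" (numbers start at 1).
        log["header"] = {"depends_on": max(earlier, default=0)}
        for key in log["keys"]:
            last_writer[key] = log["number"]
    return group


group = add_dependencies([
    {"number": 1, "keys": {"A:1"}},
    {"number": 2, "keys": {"A:1"}},
    {"number": 3, "keys": {"B:7"}},
])
deps = [log["header"]["depends_on"] for log in group]
```

  • Log 2 touches the same object as log 1 and was generated later, so its header carries 1; logs 1 and 3 carry 0.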
  • After each parallelization module has processed a set of transaction logs, it can cache the processed transaction logs locally and then create a new processing queue to receive and process another set of transaction logs sent by the extraction module connected to it. Because the module does not have to wait for the processed transaction logs to be successfully transmitted to the destination database, its processing efficiency is improved.
  • The parallelization module can also add a new field to the header of each transaction log to indicate the group to which the transaction log belongs. For example, if the first transaction log and the second transaction log belong to the first group of transaction logs, the number 1 is added to the headers of the first and second transaction logs; if the third transaction log and the fourth transaction log belong to the second group of transaction logs, the number 2 is added to the headers of the third and fourth transaction logs.
  • After each extraction module has extracted a set of transaction logs, it can also automatically calculate the extraction range of the next set of transaction logs to be extracted. For example, after extracting the transaction logs numbered 1-5000, the first extraction module does not need to wait for their replay to complete; once it determines that the number range of the next set to be extracted is 20001-25000, it can extract the transaction logs numbered 20001-25000. Obviously, the transaction logs numbered 20001-25000 also belong to group 1.
  • In this case, the parallelization module can also add to each transaction log an identifier indicating the extraction round. For example, if the transaction logs numbered 1-5000 are in the group extracted for the first time by extraction module 1, the number 11 can be added to each of them, where the first digit 1 indicates that the transaction log was extracted for the first time and the second digit 1 indicates that the transaction log belongs to the first group. Of course, the first digit 1 can instead indicate the group to which the transaction log belongs and the second digit 1 indicate that the transaction log was extracted for the first time; there is no restriction here.
  • The transaction logs numbered 20001-25000 are in the group extracted for the second time by extraction module 1, so the number 21 can be added to each of them, where the first digit 2 indicates that the transaction log was extracted for the second time and the second digit 1 indicates that the transaction log belongs to the first group. Alternatively, the number 12 can be added, where the first digit 1 indicates the group to which the transaction log belongs and the second digit 2 indicates that the transaction log was extracted for the second time.
  • In this way, when the replay module receives the transaction logs, it first executes the transaction logs extracted for the first time by each extraction module, then the transaction logs extracted for the second time, and so on.
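  • A minimal sketch of ordering replay by extraction round and group, using the convention in which the first digit is the extraction round and the second digit is the group. The `tag` and `replay_order` helpers are hypothetical, and the two-character encoding only works while both values stay below 10.

```python
def tag(extraction_round, group):
    """Two-digit identifier: first digit = extraction round, second digit = group."""
    return f"{extraction_round}{group}"


def replay_order(logs):
    """Sort by (round, group, number) so that earlier extraction rounds replay first."""
    return sorted(logs, key=lambda log: (int(log["tag"][0]),
                                         int(log["tag"][1]),
                                         log["number"]))


logs = [
    {"tag": tag(2, 1), "number": 20001},   # second extraction by module 1
    {"tag": tag(1, 2), "number": 5001},    # first extraction by module 2
    {"tag": tag(1, 1), "number": 1},       # first extraction by module 1
]
ordered = [log["number"] for log in replay_order(logs)]
```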
  • FIG. 7 is a structural block diagram of another example of the database replication system 300.
  • There can be multiple replay modules in the destination device 302, and the number of replay modules can differ from the number of receiving modules.
  • For example, there are three replay modules, namely a third replay module 3026, a fourth replay module 3027, and a fifth replay module 3028.
  • Each replay module is connected to both the first receiving module 3021 and the second receiving module 3022; that is, each receiving module can send transaction logs to any replay module, and each replay module can receive transaction logs from different groups.
  • Each receiving module can randomly distribute each transaction log in a received set of transaction logs to any replay module.
  • Alternatively, each receiving module can distribute each transaction log in a received set of transaction logs to the replay modules in a preset order.
  • For example, the first receiving module 3021 distributes the first transaction log in the received first set of transaction logs to the third replay module 3026, the second transaction log to the fourth replay module 3027, the third transaction log to the fifth replay module 3028, the fourth transaction log to the third replay module 3026, the fifth transaction log to the fourth replay module 3027, the sixth transaction log to the fifth replay module 3028, and so on.
  • Alternatively, each replay module can be numbered.
  • For example, the number of the third replay module 3026 is 1, the number of the fourth replay module 3027 is 2, and the number of the fifth replay module 3028 is 3. Based on the principle of load balancing, a hash calculation can be performed on the number of each transaction log, and the result of the hash calculation is the number of the replay module to which the transaction log should be distributed, so that the transaction log is distributed to the corresponding replay module.
  • For example, the first receiving module 3021 hashes the number 1 of transaction log 1. If the calculated value is 1, transaction log 1 should be distributed to the replay module numbered 1, so the first receiving module 3021 distributes transaction log 1 to the third replay module 3026, and so on, until every received transaction log has been distributed to a replay module.
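  • The two distribution strategies can be sketched as follows. The modulo-based `by_hash` function is only one possible hash, chosen here for illustration; the text does not specify which hash calculation is used.

```python
from itertools import cycle

REPLAY_MODULES = [1, 2, 3]   # numbers assigned to replay modules 3026, 3027, 3028


def round_robin(log_numbers):
    """Distribute transaction logs to replay modules in a fixed, repeating order."""
    return {number: module for number, module in zip(log_numbers, cycle(REPLAY_MODULES))}


def by_hash(log_number):
    """Hypothetical hash: modulo over the module count, mapped onto module numbers 1..3."""
    return (log_number - 1) % len(REPLAY_MODULES) + 1


assignment = round_robin([1, 2, 3, 4, 5, 6])
```

  • With this particular hash, both strategies send transaction log 1 to the replay module numbered 1, which matches the example in the text.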
  • each transaction log may also carry the transaction log group identifier described in the transaction log.
  • each transaction log in the first group of transaction logs carries the identifier of the first group
  • each transaction log in the second group of transaction logs carries the identifier of the second group, and so on, which will not be described here.
  • each replay module After each replay module receives the transaction log sent by each receiving module, it caches the transaction log in a different replay queue to wait for replay.
  • a replay module can set up multiple replay queues. These replay queues buffer transaction logs sent by different receiving modules. For example, in this example, the number of receiving modules is 2, then each replay module can There are two replay queues, and the transaction logs sent by different receiving modules are buffered in different replay queues according to the sequence of the generation time of the transaction log received from each receiving module.
  • the third replay module 3026 sequentially buffers the transaction logs received from the first receiving module 3021 in the first replay queue, and sequentially caches the transaction logs received from the second receiving module 3022 in the second replay queue. middle.
  • the processing process of other replay modules is the same as that of the third replay module 3026, and will not be repeated here. Or, you can set up multiple replay queues to be associated with different transaction log groups, for example, cache all transaction logs in the first set of transaction logs in the first replay queue, and store all transaction logs in the second set of transaction logs Cached in the second replay queue, so that transaction log replay can be performed according to different replay queues.
  • Each replay module replays the transaction logs in its queues in the order of the replay queues.
  • For example, the third replay module 3026 first processes the transaction logs in the first replay queue and, after all transaction logs in that queue have been replayed, replays all transaction logs in the second replay queue.
  • The other replay modules work in the same way, which is not repeated here.
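  • The queue handling described above can be sketched as follows. The class and method names are hypothetical, and the sketch ignores dependencies between transaction logs; it only shows one replay queue per receiving module, drained queue by queue.

```python
from collections import deque

class ReplayModule:
    def __init__(self, receiver_ids):
        # One FIFO replay queue per receiving module (assumed layout).
        self.queues = {rid: deque() for rid in receiver_ids}

    def cache(self, receiver_id, log_number):
        """Buffer a log in the queue that belongs to the receiving module it came from."""
        self.queues[receiver_id].append(log_number)

    def drain_in_queue_order(self):
        """Yield all logs of the first queue, then all logs of the second queue, and so on."""
        for rid in sorted(self.queues):
            while self.queues[rid]:
                yield self.queues[rid].popleft()

module = ReplayModule(receiver_ids=[1, 2])
module.cache(2, 5)
module.cache(1, 1)
module.cache(1, 4)
module.cache(2, 8)
order = list(module.drain_in_queue_order())
print(order)
```

  • Keying the queues by transaction log group identifier instead of receiving module identifier would give the alternative arrangement mentioned above.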
  • When a replay module determines that the first transaction log waiting to be processed in that replay module meets the replay condition, it replays that transaction log.
  • The first transaction log waiting to be processed can be understood as the transaction log at the head of the replay queue that the replay module is currently processing.
  • For example, the third replay module 3026 first processes the transaction logs in the first replay queue. The first transaction log in the first replay queue is the first transaction log in the first set of transaction logs, so the third replay module 3026 determines whether the first transaction log depends on other transaction logs.
  • The third replay module 3026 determines that the first transaction log does not carry the numbers of other transaction logs in the first set of transaction logs, and therefore that the first transaction log does not depend on other transaction logs. It thus determines that the first transaction log meets the replay condition, and replays the first transaction log in the destination database.
  • The specific replay process is similar to the example shown in Figure 2 and is not limited here.
  • After the third replay module 3026 has replayed the first transaction log, it sends the replay result to the other replay modules, that is, to the fourth replay module 3027 and the fifth replay module 3028, where the replay result indicates that the first transaction log has completed replay.
  • The fourth replay module 3027 and the fifth replay module 3028 are also processing the transaction logs in their first replay queues in parallel.
  • The fourth replay module 3027 determines that the second transaction log depends on the first transaction log in the first set of transaction logs. Until the fourth replay module 3027 receives, from another replay module, the replay result indicating that the first transaction log of the first set has completed replay, the fourth replay module 3027 cannot replay the second transaction log.
  • The same applies to the fifth replay module 3028. That is to say, at any given moment in this example, only one of the multiple replay modules is replaying a transaction log, while the other replay modules are in the waiting state.
  • After the fourth replay module 3027 receives the replay result of the first transaction log in the first set of transaction logs sent by the third replay module 3026, it determines that this is exactly the replay result of the transaction log on which its pending transaction log (that is, the second transaction log in the first set of transaction logs) depends. The fourth replay module 3027 therefore determines that its pending transaction log meets the replay condition, replays it in the destination database, and sends the replay result of the second transaction log to the third replay module 3026 and the fifth replay module 3028.
  • After the fifth replay module 3028 receives the replay result of the first transaction log in the first set of transaction logs sent by the third replay module 3026, it determines that its pending transaction log is the fourth transaction log in the first set of transaction logs, which depends on the second transaction log and the third transaction log. The received replay result is not the replay result of a transaction log on which its pending transaction log depends, so the fifth replay module 3028 keeps waiting and can start replaying only after it receives the replay results of the second transaction log and the third transaction log of the first group.
  • If the fourth replay module 3027 and the fifth replay module 3028 do not receive, within a preset time period, the replay result of the transaction log on which their pending transaction log depends, they may also send a query request for the replay result to the other replay modules. The replay module that executes the transaction log in question responds to the query request, and the response indicates whether replay of that transaction log is complete. In this way, the fourth replay module 3027 and the fifth replay module 3028 can determine from the response whether they need to remain in the waiting state.
  • The destination database thereby obtains the same data as the source database.
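  • The replay condition and the exchange of replay results can be illustrated with a small single-threaded simulation. It is a sketch under stated assumptions: the function name is hypothetical, the modules take turns instead of running in parallel, and the third transaction log is placed on module 3026 only so that the example is complete (the text does not say which module holds it).

```python
from collections import deque

def simulate_replay(module_queues, depends_on):
    """Return the global replay order.

    module_queues: {module name: [log numbers in queue order]}
    depends_on:    {log number: set of log numbers it depends on}
    """
    queues = {m: deque(q) for m, q in module_queues.items()}
    done = {m: set() for m in queues}        # replay results known to each module
    order = []
    while any(queues.values()):
        progressed = False
        for name, queue in queues.items():
            if not queue:
                continue
            head = queue[0]
            # Replay condition: every log the head depends on has completed replay.
            if depends_on.get(head, set()) <= done[name]:
                queue.popleft()
                order.append(head)           # placeholder for replay in the destination database
                for known in done.values():  # send the replay result to every module
                    known.add(head)
                progressed = True
        if not progressed:
            raise RuntimeError("no module can proceed; a dependency is never satisfied")
    return order

order = simulate_replay(
    {"3026": [1, 3], "3027": [2], "3028": [4]},
    {2: {1}, 4: {2, 3}},
)
print(order)
```

  • In this run, the log held by module 3028 is replayed last, because it has to wait for the replay results of both logs it depends on.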
  • Each sending module may further include a buffer unit for buffering the transaction logs that have not yet been sent to the receiving module.
  • The source device 301 can clear from the log file the transaction logs whose storage duration exceeds a threshold.
  • The sending module can use the buffer unit to store the transaction logs that have not been sent to the receiving module, so that they can be re-sent to the receiving module when transmission resumes.
  • The sending module can also use other methods to ensure the reliability of transaction log transmission.
  • For example, the sending module can store the transaction logs that have not been sent to the receiving module directly in a permanent storage device, which is not limited here.
  • Each sending module can create multiple processing queues at the same time.
  • Each processing queue is used to process one set of transaction logs received from the parallelization module, and the transaction logs in each processing queue are sent to the receiving module in turn according to a preset processing sequence.
  • Each processing queue can be executed independently. That is, while the transaction logs in one processing queue have not yet all been sent to the receiving module, another processing queue can receive the next set of transaction logs from the parallelization module connected to it, which reduces the delay spent waiting for transmission.
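  • The buffer unit can be pictured with the following sketch. The `SendingModule` class, the `send` callable, and the `link` dictionary that simulates an interrupted connection are all hypothetical; the sketch only shows unsent logs staying buffered and being re-sent in order once transmission resumes.

```python
from collections import deque

class SendingModule:
    def __init__(self, send):
        self.send = send          # callable that returns True when delivery succeeds (assumed)
        self.buffer = deque()     # buffer unit: logs not yet delivered to the receiving module

    def submit(self, log):
        self.buffer.append(log)
        self.flush()

    def flush(self):
        """Send buffered logs in order; stop at the first failure and keep the rest."""
        while self.buffer:
            if not self.send(self.buffer[0]):
                return False
            self.buffer.popleft()
        return True

link = {"up": False}              # simulated connection state
delivered = []

def send(log):
    if link["up"]:
        delivered.append(log)
    return link["up"]

sender = SendingModule(send)
sender.submit("T1")
sender.submit("T2")               # link is down, so both logs stay in the buffer
link["up"] = True
sender.flush()                    # transmission resumes and the buffered logs are re-sent
print(delivered)
```

  • A log is removed from the buffer only after delivery succeeds, so a log cleared from the source log file in the meantime is not lost.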
  • FIG. 8 is a schematic diagram of an example of data replication performed by the database replication system 300 shown in FIG. 7.
  • The source database of the source device 301 generates a log file including n transaction logs within a preset time period.
  • The n transaction logs are transaction log 1 to transaction log n.
  • The source device 301 includes three extraction modules (extraction module 1 to extraction module 3), three parallelization modules (parallelization module 1 to parallelization module 3), and three sending modules (sending module 1 to sending module 3).
  • The modules are connected one to one: parallelization module 1 is connected to extraction module 1 and sending module 1, parallelization module 2 is connected to extraction module 2 and sending module 2, and parallelization module 3 is connected to extraction module 3 and sending module 3.
  • The destination device 302 includes three receiving modules (receiving module 1 to receiving module 3) and three replay modules (replay module 1 to replay module 3), and the three receiving modules are connected one to one with the three replay modules.
  • Each sending module in the source device 301 is connected to each of the three receiving modules in the destination device 302.
  • Each extraction module extracts one group of transaction logs from the log file of the source database in parallel, according to its number range.
  • For example, group 1, extracted by extraction module 1, includes transaction log 1 to transaction log 3; group 2, extracted by extraction module 2, includes transaction log 4 to transaction log 6; and group 3, extracted by extraction module 3, includes transaction log 7 to transaction log 9. Each extraction module then sends the group of transaction logs it extracted to the parallelization module connected to it.
  • In Figure 8, T1 to T9 denote transaction log 1 to transaction log 9.
  • After parallelization module 1 receives the transaction logs of group 1 sent by extraction module 1, it determines that transaction log 2 depends on transaction log 1, and adds the number 1 to the header of transaction log 2 to indicate the dependency relationship between transaction log 1 and transaction log 2. It determines that transaction log 1 and transaction log 3 do not depend on other transaction logs, adds the number 0 to the headers of transaction log 1 and transaction log 3, and then sends the numbered transaction log 1 to transaction log 3 to sending module 1.
  • After parallelization module 2 receives the transaction logs of group 2 sent by extraction module 2, it determines that transaction log 6 depends on transaction log 4 and adds the number 4 to the header of transaction log 6. It determines that transaction log 4 and transaction log 5 do not depend on other transaction logs, adds the number 0 to the headers of transaction log 4 and transaction log 5, and then sends the numbered transaction log 4 to transaction log 6 to sending module 2.
  • After parallelization module 3 receives the transaction logs of group 3 sent by extraction module 3, it determines that transaction log 7 to transaction log 9 do not depend on other transaction logs, adds the number 0 to the headers of transaction log 7 to transaction log 9, and then sends the numbered transaction log 7 to transaction log 9 to sending module 3.
  • After sending module 1 to sending module 3 each receive their corresponding group of transaction logs, they send each transaction log to a receiving module according to the number of the transaction log. For example, sending module 1 performs a hash calculation on the number 1 and obtains the result 1, so it sends the numbered transaction log 1 to receiving module 1. Sending module 1 also adds the transaction log group number to transaction log 1 to indicate to receiving module 1 that transaction log 1 belongs to group 1. All transaction logs in group 1 are sent in the same manner, which is not explained further here.
  • In this way, sending module 1 sends transaction log 1 to transaction log 3 to receiving module 1 to receiving module 3 respectively, sending module 2 sends transaction log 4 to transaction log 6 to receiving module 1 to receiving module 3 respectively, and sending module 3 sends transaction log 7 to transaction log 9 to receiving module 1 to receiving module 3 respectively. As a result, receiving module 1 receives transaction log 1 of group 1, transaction log 4 of group 2, and transaction log 7 of group 3; receiving module 2 receives transaction log 2 of group 1, transaction log 5 of group 2, and transaction log 8 of group 3; and receiving module 3 receives transaction log 3 of group 1, transaction log 6 of group 2, and transaction log 9 of group 3.
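  • The FIG. 8 walkthrough up to the receiving modules can be reproduced with a short sketch. The function names and the dictionary used as a header are hypothetical, and `receiver_for` is an assumed stand-in for the unspecified hash calculation, chosen so that number 1 maps to receiving module 1 as in the example.

```python
def extract_groups(first, last, group_size):
    """Split log numbers first..last into consecutive number ranges (groups)."""
    numbers = list(range(first, last + 1))
    return [numbers[i:i + group_size] for i in range(0, len(numbers), group_size)]

def receiver_for(number, receiver_count=3):
    # Assumed hash: 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 1, ...
    return (number - 1) % receiver_count + 1

# Dependencies from the FIG. 8 walkthrough; 0 in the header means "no dependency".
depends_on = {2: 1, 6: 4}

received = {1: [], 2: [], 3: []}
for group_id, group in enumerate(extract_groups(1, 9, 3), start=1):
    for number in group:
        header = {"number": number, "group": group_id,
                  "depends_on": depends_on.get(number, 0)}
        received[receiver_for(number)].append(header)

for receiver, headers in received.items():
    print(receiver, [(h["group"], h["number"], h["depends_on"]) for h in headers])
```

  • Each receiving module ends up with exactly one transaction log from each group, which is the distribution described above.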
  • the receiving module sends the received transaction log to the replay module connected to it.
  • After a replay module receives transaction logs, it replays them in the destination database in turn, according to the dependencies between the transaction logs and the group to which each transaction log belongs. For example, replay module 1 first executes the transaction log in group 1, that is, transaction log 1. It determines that transaction log 1 does not depend on other transaction logs, directly replays transaction log 1 in the destination database, and then sends the result that transaction log 1 has completed replay to replay module 2 and replay module 3. In parallel, replay module 2 first executes transaction log 2 in group 1. Because transaction log 2 depends on transaction log 1, replay module 2 first waits for another replay module to send the result that transaction log 1 has completed replay.
  • After replay module 2 receives that result from replay module 1, it replays transaction log 2 in the destination database, and then sends the result that transaction log 2 has completed replay to replay module 1 and replay module 3.
  • Replay module 3 first executes transaction log 3 in group 1. Because transaction log 3 does not depend on other transaction logs, replay module 3 directly replays transaction log 3 in the destination database, and then sends the result that transaction log 3 has completed replay to replay module 1 and replay module 2.
  • Replay module 1 can determine whether all the transaction logs of group 1 have been replayed; once they have, it replays the transaction logs of group 2. For example, all transaction logs of group 1 are cached in the first replay queue. If all transaction logs in the first replay queue have been replayed, replay module 1 can determine that all transaction logs of group 1 have completed replay, and it then replays the transaction logs in the second replay queue. The replay process for the transaction logs of group 2 is similar to that for group 1 and is not repeated here. When each replay module has replayed all the transaction logs it received, the destination database holds the same data as the source database, and the data in the source database has been copied to the destination database.
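  • The parallel replay in the FIG. 8 example can be sketched with threads. This is only an illustration: the shared `completed` set stands in for the replay results that the replay modules send to each other, appending to `replayed` stands in for replaying a log in the destination database, and each log carries at most one dependency number as in the walkthrough.

```python
import threading

completed = set()                 # replay results visible to all replay modules
replayed = []                     # global replay order, kept for inspection
cond = threading.Condition()

def replay_module(queue, depends_on):
    """Replay one module's logs in queue order, waiting while a dependency is unmet."""
    for number in queue:
        dep = depends_on.get(number, 0)      # 0 means the log has no dependency
        with cond:
            cond.wait_for(lambda: dep == 0 or dep in completed)
            replayed.append(number)          # placeholder for replay in the destination database
            completed.add(number)
            cond.notify_all()                # tell waiting modules that this log is done

depends_on = {2: 1, 6: 4}                    # from the FIG. 8 walkthrough
queues = {1: [1, 4, 7], 2: [2, 5, 8], 3: [3, 6, 9]}
threads = [threading.Thread(target=replay_module, args=(q, depends_on))
           for q in queues.values()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(replayed)
```

  • The exact interleaving varies between runs, but transaction log 1 is always replayed before transaction log 2 and transaction log 4 before transaction log 6.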
  • The function of the parallelization module in the source device 301 can be migrated to the destination device 302. That is, the source device 301 does not need to provide the dependencies between the multiple transaction logs included in each set of transaction logs; these dependencies are determined by the destination device 302.
  • If the destination device 302 confirms that the operation object, in the source database, of the first transaction operation recorded in the first transaction log is the same as the operation object, in the source database, of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, it records the number of the first transaction log in the second transaction log. The number of the first transaction log carried in the second transaction log indicates the dependency relationship between the first transaction log and the second transaction log, that is, the second transaction log depends on the first transaction log.
  • Similarly, if the destination device 302 confirms that the operation object, in the source database, of the third transaction operation recorded in the third transaction log is the same as the operation object, in the source database, of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, it records the number of the third transaction log in the fourth transaction log. The number of the third transaction log carried in the fourth transaction log indicates the dependency relationship between the third transaction log and the fourth transaction log, that is, the fourth transaction log depends on the third transaction log.
  • The manner in which the destination device 302 determines the dependency relationships between the multiple transaction logs included in each set of transaction logs is similar to that of the source device 301 in Example 2 and is not repeated here.
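  • The dependency rule (same operation object, earlier operation time) can be sketched as follows. The `LogEntry` fields and the string used to represent an operation object are assumptions, and for brevity the sketch records only the most recent earlier log on the same object, whereas a transaction log in the text may depend on more than one log.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    number: int
    operation_object: str     # what the transaction operated on (assumed representation)
    operation_time: float     # operation time in the source database
    depends_on: int = 0       # 0 means no dependency

def mark_dependencies(group):
    """Record in each log the number of the latest earlier log on the same object."""
    latest_for_object = {}
    for entry in sorted(group, key=lambda e: e.operation_time):
        earlier = latest_for_object.get(entry.operation_object)
        if earlier is not None:
            entry.depends_on = earlier.number
        latest_for_object[entry.operation_object] = entry
    return group

group2 = [
    LogEntry(4, "row A", 4.0),
    LogEntry(5, "row B", 5.0),
    LogEntry(6, "row A", 6.0),
]
mark_dependencies(group2)
print([(e.number, e.depends_on) for e in group2])
```

  • With these hypothetical operation objects, transaction log 6 is marked as depending on transaction log 4, as in group 2 of the FIG. 8 example.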
  • FIG. 9 is a structural block diagram of another example of the database replication system 300.
  • Figure 9 includes a third parallelization module 30209 and a fourth parallelization module 30210. Each parallelization module is connected to one receiving module to receive a set of transaction logs from that receiving module, and each parallelization module is connected to the replay module, that is, each parallelization module can send transaction logs to the replay module.
  • The third parallelization module 30209 is similar to the first parallelization module 3015 shown in FIG. 6, the fourth parallelization module 30210 is similar to the second parallelization module 3016 shown in FIG. 6, and the other modules are similar to the corresponding modules shown in the earlier figures, which is not repeated here.
  • In this example, the number of replay modules is one. Of course, there can also be multiple replay modules. FIG. 10 includes three replay modules, namely the third replay module 3026, the fourth replay module 3027, and the fifth replay module 3028, and each parallelization module is connected to each replay module so that it can send transaction logs to any replay module.
  • In this case, each parallelization module is also used to distribute each transaction log in the received set of transaction logs among the multiple replay modules. For example, each transaction log in the received set can be randomly distributed to any replay module, or the transaction logs in the received set can be distributed to the replay modules in a preset order. Alternatively, each replay module can be numbered and, following the principle of load balancing, a hash calculation can be performed on the number of each transaction log; the result of the hash calculation is the number of the replay module to which that transaction log should be distributed, and the transaction log is distributed to the corresponding replay module.
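  • The three distribution options can be compared in a short sketch. The function names are hypothetical, a simple round-robin cycle is assumed for the "preset order", and the modulo rule again stands in for the unspecified hash calculation.

```python
import itertools
import random

def random_strategy(module_ids, rng=random):
    """Send each log to a randomly chosen replay module."""
    return lambda log_number: rng.choice(module_ids)

def preset_order_strategy(module_ids):
    """Send logs to the replay modules in a fixed, repeating order."""
    cycle = itertools.cycle(module_ids)
    return lambda log_number: next(cycle)

def hash_strategy(module_ids):
    """Derive the replay module from the log number (assumed modulo hash)."""
    return lambda log_number: module_ids[(log_number - 1) % len(module_ids)]

modules = [1, 2, 3]
by_order = preset_order_strategy(modules)
by_hash = hash_strategy(modules)
print([by_order(n) for n in (10, 11, 12, 13)])
print([by_hash(n) for n in (1, 2, 3, 4)])
```

  • Only the hash option makes the target replay module a function of the log number alone, so any module that knows the number can recompute where a given transaction log was sent.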
  • FIG. 11 is a structural block diagram of another example of the database replication system 300.
  • FIG. 11 also includes management devices, which are the source management device 303 and the destination management device 304 respectively.
  • The other modules are similar to those in FIG. 10 and are not repeated here.
  • the source management device 303 may allocate a number range for acquiring a set of transaction logs for each extraction module, and each extraction module extracts the corresponding transaction log according to the number range allocated by the source management device 303. And/or, the source management device 303 can also be used to monitor the operating status of each module in the source device 301, and dynamically adjust the number and range of transaction logs extracted by each extraction module. For example, if a failure of a certain extraction module is detected, the transaction log that the extraction module needs to obtain can be allocated to other extraction modules.
  • the destination management device 304 is used to monitor the operating status of each module in the destination device 302, and dynamically adjust the number of transaction logs processed by each receiving module and each replay module. For example, when a failure of a replay module is detected, each receiving module can be notified not to send transaction logs to the failed replay module, and the transaction logs that the failed replay module needs to replay are distributed to other replay modules.
  • For example, when the destination management device 304 detects a failure of a replay module, it collects relevant information, such as the current transaction number processed by the failed replay module and the number of the failed replay module, and sends the collected information to the other replay modules. Each replay module then redistributes the transaction logs whose transaction generation time is after the current transaction number processed by the failed replay module to the other non-faulty replay modules, so that the other non-faulty replay modules replay those transaction logs in the destination database.
  • The destination management device 304 needs to send the collected information to the source management device 303, and the source management device 303 forwards the information to each sending module, so that the transaction logs whose transaction generation time is after the current transaction number processed by the failed replay module are redistributed to the other non-faulty replay modules.
  • A retransmission flag can be added to the redistributed transaction logs, for example a "second hash retransmission" flag. The replay module that receives the first transaction log carrying the "second hash retransmission" flag can then immediately replay that transaction and complete the recovery. For example, the number of the transaction log being processed by the failed replay module is 3, and the first transaction log carrying the "second hash retransmission" flag is transaction log 4.
  • When a replay module receives the transaction log that carries the "second hash retransmission" flag and is numbered 4, it replays that transaction log directly in the destination database and then sends the replay result to the other replay modules. The remaining transaction logs are replayed in the destination database in the manner described above, according to the transaction log group and the dependencies of each transaction log, thereby recovering the entire replay process.
  • the source management device 303 and the destination management device 304 can also re-allocate the tasks of the modules that have not failed in a similar manner to ensure the stability of the system.
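  • The redistribution after a replay module failure can be sketched as follows. The function name, its parameters, and the modulo rule used as the second hash over the non-faulty replay modules are assumptions; only the flag text and the "logs after the current number" rule come from the description above.

```python
RETRANSMIT_FLAG = "second hash retransmission"

def redistribute(pending_numbers, current_number, all_modules, failed_module):
    """Plan the re-sending of logs that were destined for a failed replay module."""
    healthy = [m for m in all_modules if m != failed_module]
    plan = []
    for number in sorted(pending_numbers):
        if number <= current_number:
            continue                               # not after the number being processed at failure
        target = healthy[number % len(healthy)]    # assumed second hash over healthy modules
        plan.append({"number": number, "target": target, "flag": RETRANSMIT_FLAG})
    return plan

plan = redistribute([3, 4, 7], current_number=3, all_modules=[1, 2, 3], failed_module=3)
for item in plan:
    print(item)
```

  • In this hypothetical run the failed module was processing transaction log 3, so the first log that carries the flag is transaction log 4, as in the example above.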
  • modules in the above examples can also be freely combined, and are not limited to the several combinations in the above examples.
  • The dependencies between the transaction logs are considered only when the transaction logs are replayed. Before replay, the transaction logs can therefore be divided into multiple groups for parallel extraction and parallel transmission without considering the dependencies between them, which can improve the processing efficiency of the database replication system.
  • Because the transaction logs are replayed in the destination database according to the dependency relationships between them, the destination database obtains the same data as the source database, which ensures data consistency.
  • FIG. 12 is a flowchart of an example of this method. The flowchart is described as follows:
  • the source-end device obtains at least two sets of transaction logs in parallel from the log files of the source-end database, where the at least two sets of transaction logs include a first set of transaction logs and a second set of transaction logs.
  • the number of transaction log groups is not limited.
  • the following takes the at least two groups of transaction logs including the first group of transaction logs and the second group of transaction logs as an example.
  • The first group of transaction logs includes at least the adjacent first transaction log and second transaction log, the second group of transaction logs includes at least the adjacent third transaction log and fourth transaction log, and the second transaction log is generated earlier than the third transaction log.
  • the source device obtains the first set of transaction logs and the second set of transaction logs in parallel from the source database, which may include but not limited to the following three methods:
  • the first group of transaction logs and the second group of transaction logs are obtained in parallel from the source database according to the number range of the transaction log.
  • When the source device stores a transaction log, it can generate log summary record information corresponding to that transaction log.
  • The log summary record information records information such as the number of the transaction log generated by the source database and its record position in the log file. When the source device needs to obtain transaction logs, it first reads the log summary record information and then, according to that information, obtains the first set of transaction logs and the second set of transaction logs from the log file in parallel.
  • Here the source device obtaining two sets of transaction logs is used as an example for description.
  • The number of transaction log groups that the source device obtains in parallel is not limited. For example, it may obtain three sets of transaction logs, four sets of transaction logs, or even more sets of transaction logs in parallel.
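  • Parallel extraction by number range can be sketched as follows. The in-memory `log_file`, the layout of `summary` (log number mapped to an offset and a length), and the use of a thread pool are all assumptions made for illustration; the text only states that the summary records the number and the record position of each transaction log.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log file contents and log summary record information.
log_file = b"AAAABBBBBBCCDDDDD"
summary = {1: (0, 4), 2: (4, 6), 3: (10, 2), 4: (12, 5)}   # number -> (offset, length)

def extract_group(number_range):
    """Read the logs whose numbers fall in number_range, using the recorded positions."""
    group = []
    for number in number_range:
        offset, length = summary[number]
        group.append((number, log_file[offset:offset + length]))
    return group

# Two extraction tasks run in parallel, one per number range.
with ThreadPoolExecutor(max_workers=2) as pool:
    first_group, second_group = pool.map(extract_group, [range(1, 3), range(3, 5)])

print(first_group)
print(second_group)
```

  • Because every record position is known in advance from the summary, the extraction tasks do not have to scan the log file sequentially or coordinate with one another.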
  • the source device sends the first set of transaction logs and the second set of transaction logs in parallel, and the destination device receives the first set of transaction logs and the second set of transaction logs.
  • The source device can send the first set of transaction logs and the second set of transaction logs to the destination device through a remote connection.
  • The destination device first needs to determine the dependencies between the multiple transaction logs included in each set of transaction logs, for example, the dependency relationship between the first transaction log and the second transaction log included in the first set of transaction logs, and the dependency relationship between the third transaction log and the fourth transaction log included in the second set of transaction logs.
  • replay according to the group of transaction logs in the destination database.
  • determining the dependency relationship between the first transaction log and the second transaction log included in the first group of transaction logs includes:
  • the serial number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log.
  • After confirming that the operation object, in the source database, of the third transaction operation recorded in the third transaction log is the same as the operation object, in the source database, of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, the number of the third transaction log is recorded in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • each transaction log is replayed in the destination database.
  • For example, when the destination device obtains the first transaction log, it confirms that the first transaction log does not record the number of any transaction log indicating a dependency relationship with the first transaction log, and performs transaction replay in the destination database according to the first transaction log. Then, when it obtains the second transaction log, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log, and, after confirming that the transaction replay according to the first transaction log is complete, performs transaction replay in the destination database according to the second transaction log.
  • When the destination device obtains the third transaction log, it confirms that the third transaction log does not record the number of any transaction log indicating a dependency relationship with the third transaction log, and performs transaction replay in the destination database according to the third transaction log. Then, when it obtains the fourth transaction log, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log, and, after confirming that the transaction replay according to the third transaction log is complete, performs transaction replay in the destination database according to the fourth transaction log.
  • When the destination device has replayed all the transaction logs in the destination database in the above manner, the destination database holds the data of the source database and is consistent with the source database.
  • In the above example, the destination device needs to determine the dependencies within each set of transaction logs before replaying the transactions.
  • Alternatively, the source device may determine the dependencies within each set of transaction logs, which can reduce the amount of computation on the destination device.
  • FIG. 13 is a flowchart of another example of this method. The flowchart is described as follows:
  • the source device obtains the first group of transaction logs and the second group of transaction logs in parallel from the log files of the source database.
  • the source device determines the dependency relationship among multiple transaction logs included in each set of transaction logs.
  • the source device confirms that the first transaction operation recorded in the first transaction log is in the source database and the second transaction operation recorded in the second transaction log is in the source database. If the operation object in the first transaction log is the same, and the operation time of the first transaction operation recorded in the first transaction log in the source database is earlier than the operation time of the second transaction operation recorded in the second transaction log in the source database, the The number of the first transaction log is recorded in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log.
  • the source device confirms that the operation object of the third transaction operation recorded in the third transaction log in the source database is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log in the source database , And the operation time of the third transaction operation recorded in the third transaction log in the source database is earlier than the operation time of the fourth transaction operation recorded in the fourth transaction log in the source database, the third transaction log The number is recorded in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
  • For how the source device determines the dependency relationships between transaction logs, refer to the description of the parallelization module in Example 2; details are not repeated here.
  • the source device sends the first group of transaction logs and the second group of transaction logs in parallel, and the destination device receives the first group of transaction logs and the second group of transaction logs.
  • A transaction log sent by the source device carries the number of the transaction log on which it depends. If a transaction log does not depend on any other transaction log, it may carry no number of another transaction log, or the number it carries may be 0.
  • Step S133 and step S134 are similar to step S122 and step S123, respectively, and will not be repeated here.
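  • The dependency marking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TransactionLog` fields and the `mark_dependencies` helper are hypothetical names, the sketch assumes distinct operation times within a group, and it assumes each log records only the most recent earlier log on the same operation object.

```python
from dataclasses import dataclass


@dataclass
class TransactionLog:
    number: int          # transaction log number (assumed to start at 1)
    object_key: str      # operation object, e.g. the primary key of a table row
    op_time: float       # operation time in the source database
    depends_on: int = 0  # 0 means "no dependency", as in the text above


def mark_dependencies(group):
    """Record in each log the number of the latest earlier log on the same object."""
    last_seen = {}  # object_key -> number of the most recent earlier log
    for log in sorted(group, key=lambda item: item.op_time):
        log.depends_on = last_seen.get(log.object_key, 0)
        last_seen[log.object_key] = log.number
    return group


group = [
    TransactionLog(1, "orders:42", 10.0),
    TransactionLog(2, "orders:42", 11.0),
    TransactionLog(3, "users:7", 12.0),
]
mark_dependencies(group)
```

  • In this sketch, log 2 ends up carrying the number of log 1 because both touch the same object, while logs 1 and 3 keep the value 0.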
  • the functions of the source device and the destination device introduced in the above embodiments can all be implemented by function modules, applications, threads, virtualized function instances, or containers implemented by program code.
  • a source management module, multiple extraction modules, multiple parallelization modules, and multiple sending modules can be set in the source device.
  • the source management module is used to monitor the operating status of other modules in the source device.
  • Each extraction module obtains one group of transaction logs from the log file of the source database, and the multiple extraction modules obtain multiple groups of transaction logs in parallel. Each parallelization module receives a group of transaction logs from one extraction module, determines the dependency relationships among the transaction logs in that group, and passes the group, now carrying the dependency relationships, to a sending module. The sending module then sends the group of transaction logs to the destination device.
  • A destination management module and multiple replay modules can be set in the destination device. The destination management module monitors the running status of the other modules in the destination device. Each replay module receives transaction logs from the source device, and the multiple replay modules cooperate with each other to complete the replay of all transaction logs.
  • Each module needs to be initialized. Refer to FIG. 14 for the flowchart of initializing the modules. The flowchart is described as follows:
  • the source management module reads local configuration data.
  • The configuration data may be preset by a technician. For example, it may include the topology and network connection information of the extraction module, the parallelization module, and the sending module. The source management module also listens for connection establishment requests from the extraction module, the parallelization module, the sending module, and the destination side.
  • In FIG. 14, one extraction module, one parallelization module, and one sending module are taken as an example for illustration. When there are multiple modules of a kind, each of them is processed in the same way as the corresponding module in FIG. 14.
  • the destination management module reads the local configuration data.
  • The configuration data may be preset by a technician. For example, it may include the network connection information of the replay modules. The destination management module also listens for connection establishment requests sent by the replay modules.
  • In FIG. 14, one replay module is taken as an example for illustration. When there are multiple replay modules, each of them is processed in the same way as the replay module shown in FIG. 14.
  • the extraction module, the parallelization module, and the sending module respectively send a connection establishment request to the source management module, and establish a connection with the source management module.
  • the replay module sends a connection establishment request to the destination management module, and establishes a connection with the destination management module.
  • the destination management module sends a connection establishment request to the source management module, and establishes a connection with the source management module.
  • The destination management module may also send to the source management module the number of the transaction log last replayed by the destination device and/or the information of the replay modules connected to the destination management module. When there are multiple replay modules, the information of all replay modules is sent.
  • The source management module sends a connection confirmation message, together with the connection information corresponding to each module, to the parallelization module and the sending module.
  • For example, the source management module feeds back to each parallelization module the information of the extraction module and the sending module connected to it, and sends to each sending module the information of the parallelization module and the replay module connected to it.
  • the source management module calculates the number range of a set of transaction logs to be extracted by each extraction module according to the number of pairs of extraction modules, parallelization modules, and sending modules, and feeds back its corresponding number range to the extraction module.
  • For example, if there are 4 extraction modules and each extraction module extracts 2500 transaction logs, the number range of the group of transaction logs obtained by the first extraction module is 1 to 2500, the number range of the group obtained by the second extraction module is 2501 to 5000, and so on.
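  • The range calculation in this example can be sketched as below. The function name `number_ranges`, its parameters, and the total of 10000 transaction logs (implied by 4 modules of 2500 logs each) are assumptions made for illustration; the text does not specify how a remainder is distributed, so the sketch simply gives the first modules one extra log each.

```python
def number_ranges(total_logs, num_extractors, first_number=1):
    """Split total_logs consecutive log numbers into contiguous inclusive ranges."""
    base, extra = divmod(total_logs, num_extractors)
    ranges = []
    start = first_number
    for index in range(num_extractors):
        size = base + (1 if index < extra else 0)  # spread any remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges


ranges = number_ranges(10000, 4)
```

  • With these inputs the first two ranges are 1 to 2500 and 2501 to 5000, matching the example in the text.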
  • the source management module feeds back to the destination management module the information of all sending modules and the number range of a set of transaction logs that each extraction module needs to extract.
  • Each extraction module saves its number range, and returns a confirmation message to the source management module.
  • the initial configuration of the database replication system is completed.
  • the database replication system can perform database replication.
  • Each extraction module initiates a transaction log acquisition request to the source database.
  • the transaction log acquisition request is used to acquire the transaction log. Since the processing flow of each extraction module is the same, in the example shown in FIG. 15, only one extraction module is used as an example for description.
  • The extraction module determines, according to the header information of a transaction log it has read, whether the transaction log falls within the number range corresponding to the extraction module. If so, it continues to read the body information of the transaction log, parses and filters the transaction log, and finally obtains the transaction log; if not, it discards the transaction log and continues to read the next transaction log.
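  • A minimal sketch of this header check follows. The record layout (a header dictionary with a `number` field plus a callable that reads the body on demand) and the name `extract_group` are hypothetical; the parsing and filtering of the body mentioned above is omitted.

```python
def extract_group(records, number_range):
    """Keep only the logs whose header number lies in the inclusive range."""
    low, high = number_range
    kept = []
    for header, read_body in records:
        if not (low <= header["number"] <= high):
            continue  # outside this module's range: discard, read the next log
        kept.append((header["number"], read_body()))  # body is read only when needed
    return kept


# Six fake logs; the body reader is a closure so bodies are read lazily.
records = [({"number": n}, (lambda n=n: f"body-{n}")) for n in range(1, 7)]
extracted = extract_group(records, (3, 4))
```

  • Reading the body only after the header matches is what lets several extraction modules scan the same log file in parallel without each one parsing every log in full.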
  • the extraction module sends the acquired transaction log to the parallelization module connected to it.
  • The parallelization module identifies the transaction log group to which the received transaction log belongs, determines the dependency relationship between the transaction log and other transaction logs, and carries the dependency relationship in the transaction log.
  • the parallelization module sends the transaction log carrying the dependency to the sending module connected to it.
  • The sending module performs a hash calculation on the number of the transaction log. The result of the hash calculation is the number of the replay module that is to receive the transaction log, and the sending module sends the transaction log to that replay module.
  • Take K replay modules as an example.
  • Suppose the sending module determines that the transaction log is to be sent to replay module 1.
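  • The text does not state which hash function is used. As a hedged illustration, the sketch below assumes a simple modulo over the log number and replay modules numbered 1 to K; `replay_module_for` is a hypothetical name.

```python
def replay_module_for(log_number, k):
    """Map a transaction log number to a replay module number in 1..k."""
    if k < 1:
        raise ValueError("at least one replay module is required")
    return (log_number - 1) % k + 1  # assumed hash: modulo over the log number


targets = [replay_module_for(n, 4) for n in range(1, 9)]
```

  • Because the mapping depends only on the log number and K, any module that knows a log's number can work out which replay module is responsible for it, which the later query and fault-detection steps rely on.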
  • the replay module 1 identifies the transaction log group of the received transaction log, and determines whether the transaction log belongs to the replay queue currently being processed according to the group, and if so, stores the transaction log in the replay queue, If not, create a new replay queue to store the transaction log.
  • the transaction logs of the same group are stored in the same replay queue.
  • the replay module 1 determines whether the transaction log meets the replay condition, and if it meets the replay condition, replay is performed in the destination database according to the transaction log.
  • If the transaction log does not carry the number of any other transaction log, it is determined that the transaction log meets the replay condition. Alternatively, if the transaction log carries the number of another transaction log and that other transaction log has already been replayed, it is determined that the transaction log meets the replay condition.
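  • The replay condition can be sketched as follows, assuming (as an illustration only) that each queued entry is a pair of the log number and the number it depends on, with 0 meaning no dependency, and that the set of already replayed numbers is known locally. `replay_ready` is a hypothetical helper.

```python
def replay_ready(queue, replayed):
    """Replay every queued log whose dependency is satisfied; return the rest."""
    applied, pending = [], []
    for number, depends_on in queue:
        if depends_on == 0 or depends_on in replayed:
            replayed.add(number)   # stands in for the actual replay in the database
            applied.append(number)
        else:
            pending.append((number, depends_on))  # dependency not replayed yet
    return applied, pending


replayed = {1}
applied, pending = replay_ready([(2, 1), (4, 3), (5, 0)], replayed)
```

  • Here log 2 is replayed because log 1 is done, log 5 is replayed because it has no dependency, and log 4 stays pending until log 3 has been replayed.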
  • Replay module 1 may wait for the other replay modules to send the replay result of the transaction log on which the transaction log depends. If the replay result is not received within a preset time, replay module 1 can perform a hash calculation on the number of the depended-on transaction log, determine the replay module responsible for that transaction log, and send a query request to that replay module. The query request is used to obtain the replay result.
  • If the response message corresponding to the query request indicates that the depended-on transaction log has been replayed, replay module 1 performs replay in the destination database according to the transaction log; if the response message indicates that the depended-on transaction log has not been replayed yet, replay module 1 continues to wait.
  • It is also possible that the replay module responsible for the depended-on transaction log has failed, so that replay module 1 does not receive a response message corresponding to the query request.
  • In this case, replay module 1 can determine that the replay module responsible for the depended-on transaction log is faulty and report the situation to the destination management module. It should be noted that this situation is not shown in FIG. 15.
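  • The three outcomes of the query can be summarized in a small decision function. This is only a sketch: the `Action` values and the encoding of the response (`True`, `False`, or `None` for no response) are assumptions, not part of the described protocol.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLAY = "replay"
    KEEP_WAITING = "keep_waiting"
    REPORT_FAULT = "report_fault"


def after_query(response: Optional[bool]) -> Action:
    """Decide the next step once the owner of the dependency has been queried.

    response is True if the dependency is reported as replayed, False if it is
    reported as not yet replayed, and None if no response message arrived.
    """
    if response is None:
        return Action.REPORT_FAULT  # report to the destination management module
    return Action.REPLAY if response else Action.KEEP_WAITING


outcomes = [after_query(r) for r in (True, False, None)]
```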
  • the replay module 1 notifies other replay modules of the replay result of the transaction log.
  • each replay module repeats the steps performed by replay module 1 until the replay of all transaction logs is completed.
  • The replay module that completes the replay of the last transaction log feeds back the replay result to the sending module and the other replay modules. After the sending module determines that the last transaction log has been replayed, it can clear its cache of transaction logs.
  • A replay module may fail while replaying according to a transaction log.
  • In this case, in this embodiment of the application, the source management module and the destination management module may also be used to perform fault recovery processing. Refer to FIG. 16 for the flowchart of fault recovery for the modules. The flowchart is described as follows:
  • The destination management module determines that replay module m is faulty.
  • the method for the destination management module to determine the failure of the replay module m may include but is not limited to the following methods:
  • Each replay module can send a heartbeat to the destination management module according to a preset period. If the destination management module does not receive the heartbeat sent by the replay module m within a certain period, it can be determined that the replay module m is faulty.
  • Other replay modules send query requests to replay module m but do not receive a response message from replay module m, and therefore report the situation to the destination management module, so that the destination management module can determine that replay module m is faulty.
  • The destination management module can actively query the replay status of the destination database and determine the number of the last replayed transaction log in the destination database. If the number of the last replayed transaction log is not the number of the last transaction log received by the destination device, a hash calculation is performed on the queried number of the last replayed transaction log; if the result of the hash calculation is m, the destination management module can determine that replay module m is faulty.
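  • The heartbeat-based method can be sketched as a timeout check. The data layout (a dictionary from replay module number to the time of its last heartbeat), the function name `faulty_modules`, and the choice of one heartbeat period as the tolerance are assumptions for illustration.

```python
def faulty_modules(last_heartbeat, now, period, periods_allowed=1):
    """Return replay modules whose last heartbeat is older than the tolerance."""
    tolerance = period * periods_allowed
    return sorted(
        module
        for module, seen_at in last_heartbeat.items()
        if now - seen_at > tolerance
    )


# Module 2 last reported 10 s ago, which exceeds the 5 s heartbeat period.
suspects = faulty_modules({1: 100.0, 2: 91.0, 3: 99.5}, now=101.0, period=5.0)
```

  • A real deployment would likely tolerate more than one missed period to avoid false alarms from network jitter; `periods_allowed` is included for that purpose.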
  • the destination management module sends a failure notification message to the source management module and each replay module.
  • the failure notification message may include the number of the replay module that has failed, for example, m, and the number of the last replayed transaction log in the replay module that has failed, for example, transaction log n.
  • Each replay module identifies the replay module m as a faulty module.
  • the source management module sends the failure notification message to each sending module.
  • the sending module redistributes the transaction log.
  • After the sending module receives the failure notification message, it performs a second hash calculation on the transaction logs that have already been distributed to replay module m and distributes them to other replay modules.
  • the redistributed transaction log may carry a "second hash retransmission" flag.
  • Each replay module creates a new replay queue according to the group number of the transaction log and the number of the faulty replay module, caches the retransmitted transaction logs, and feeds back a confirmation message to the sending module.
  • Each replay module performs replay in the destination database according to the transaction log after retransmission and the transaction log before retransmission.
  • Each replay module can use the same method as the foregoing to replay each transaction log in the destination database, which will not be repeated here. If a certain replay module receives the first transaction log carrying the "second hash retransmission" identifier, for example, it receives the transaction log numbered n with the "second hash retransmission" identifier, then the replay module can immediately Replay based on the transaction log without waiting.
  • After the sending module redistributes the transaction logs that had been sent to replay module m, it sends other transaction logs in the normal way. Of course, if it is determined that a certain transaction log would need to be distributed to replay module m, a second hash calculation still needs to be performed on that transaction log so that it is distributed to another replay module. Although the sending module performs a second hash calculation, this is the first time a replay module receives the transaction log. Therefore, in this case, the transaction log on which the second hash calculation has been performed does not need to carry the "second hash retransmission" flag.
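  • The redistribution rule can be sketched as follows. The text does not define the second hash; this illustration assumes a modulo over the remaining healthy replay modules. `redistribute` and its parameters are hypothetical names, and the returned boolean stands for the "second hash retransmission" flag.

```python
def redistribute(log_number, k, faulty, already_sent):
    """Return (target replay module, retransmission flag) for one transaction log."""
    primary = (log_number - 1) % k + 1  # assumed first hash: modulo over 1..k
    if primary not in faulty:
        return primary, False           # normal distribution, no flag
    healthy = [m for m in range(1, k + 1) if m not in faulty]
    if not healthy:
        raise RuntimeError("no healthy replay module left")
    target = healthy[(log_number - 1) % len(healthy)]  # assumed second hash
    # Only logs that had already been sent to the faulty module carry the flag.
    return target, already_sent


resent = redistribute(2, 4, {2}, already_sent=True)
fresh = redistribute(6, 4, {2}, already_sent=False)
```

  • Log 2 had already been sent to the faulty module 2, so it is rerouted with the flag set; log 6 is rerouted as well but, being sent for the first time, carries no flag.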
  • The storage system may include a hardware structure and/or a software module, and achieve the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a certain function among the above-mentioned functions is executed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraint conditions of the technical solution.
  • the division of modules in the embodiments shown in Fig. 3 to Fig. 11 is illustrative, and is only a logical function division. In actual implementation, there may be other division methods.
  • The functional modules in the various embodiments of the present application can be integrated in one processor, can exist alone physically, or two or more modules can be integrated in one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
  • FIG. 17 shows a source device 1700 provided by an embodiment of this application, where the source device 1700 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the source device 1700 includes at least one processor 1720, which is configured to implement or support the source device 1700 to implement the function of the source device in the method provided in the embodiment of the present application.
  • the processor 1720 may obtain at least two sets of transaction logs in parallel from the log file of the source database. For details, please refer to the detailed description in the method example, which will not be repeated here.
  • the source device 1700 may also include at least one memory 1730 for storing program instructions and/or data.
  • the memory 1730 and the processor 1720 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1720 may operate in cooperation with the memory 1730.
  • the processor 1720 may execute program instructions stored in the memory 1730. At least one of the at least one memory may be included in the processor.
  • the source device 1700 may further include a communication interface 1710 for communicating with other devices through a transmission medium, so that the source device 1700 can communicate with other devices.
  • the other device may be a storage client or a storage device.
  • the processor 1720 can use the communication interface 1710 to send and receive data.
  • the embodiment of the present application does not limit the specific connection medium between the communication interface 1710, the processor 1720, and the memory 1730.
  • the memory 1730, the processor 1720, and the communication interface 1710 are connected by a bus 1740.
  • the bus is represented by a thick line in FIG. 17.
  • The connection mode between other components is only for schematic illustration and is not limited.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 17, but it does not mean that there is only one bus or one type of bus.
  • The processor 1720 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • The memory 1730 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this.
  • the memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • FIG. 18 shows a destination device 1800 provided in an embodiment of this application, where the destination device 1800 may be a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the destination device 1800 includes at least one processor 1820, which is used to implement or support the destination device 1800 to implement the function of the destination device in the method provided in the embodiment of the present application.
  • the processor 1820 may obtain at least two sets of transaction logs from the source device, and perform transaction log replay in the destination database according to the transaction logs. For details, please refer to the detailed description in the method example, which will not be repeated here.
  • the destination device 1800 may also include at least one memory 1830 for storing program instructions and/or data.
  • the memory 1830 and the processor 1820 are coupled.
  • the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the processor 1820 may operate in cooperation with the memory 1830.
  • the processor 1820 may execute program instructions stored in the memory 1830. At least one of the at least one memory may be included in the processor.
  • the destination device 1800 may further include a communication interface 1810 for communicating with other devices through a transmission medium, so that the destination device 1800 can communicate with other devices.
  • the other device may be a storage client or a storage device.
  • the processor 1820 can use the communication interface 1810 to send and receive data.
  • the embodiment of the present application does not limit the specific connection medium between the communication interface 1810, the processor 1820, and the memory 1830.
  • the memory 1830, the processor 1820, and the communication interface 1810 are connected by a bus 1840.
  • the bus is represented by a thick line in FIG. 18.
  • The connection mode between other components is only for schematic illustration and is not limited.
  • the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used to represent in FIG. 18, but it does not mean that there is only one bus or one type of bus.
  • The processor 1820 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • The memory 1830 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or a volatile memory, such as a random-access memory (RAM).
  • The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to this.
  • the memory in the embodiments of the present application may also be a circuit or any other device capable of realizing a storage function for storing program instructions and/or data.
  • An embodiment of the present application also provides a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the method executed by the server in the embodiments shown in FIGS. 12-16.
  • the embodiments of the present application also provide a computer program product, including instructions, which when run on a computer, cause the computer to execute the method executed by the server in the embodiments shown in FIGS. 12-16.
  • the embodiment of the present application provides a chip system, which includes a processor and may also include a memory, which is used to implement the functions of the source device or the destination device in the foregoing method.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • the methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • software When implemented by software, it can be implemented in the form of a computer program product in whole or in part.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by the computer or a data storage device such as a server, data center, etc. integrated with one or more available media.
  • The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, an SSD).

Abstract

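A database replication system and method, a source device, and a destination device. In the database replication system, before the transaction logs in the source database are delivered to the destination device, the transaction logs are grouped according to the chronological order in which they were generated, so that multiple groups of transaction logs are obtained and sent in parallel, which can improve the processing efficiency of the database replication system. Further, because the dependency relationships between transaction logs do not need to be considered before the transaction logs are replayed, the transaction logs do not need to be analyzed and processed centrally, which can reduce the processing complexity of the source database and improve the processing efficiency of the system. In addition, because transaction logs of different groups are replayed on the destination device according to the dependency relationships between the transaction logs and the chronological order in which they were generated, the accuracy of the data obtained in the destination database can be guaranteed, ensuring that the data in the destination database is consistent with the data in the source database.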

Description

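A database replication system and method, a source device, and a destination device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202010129105.2, filed with the China Patent Office on February 28, 2020 and entitled "Data replication method and system", which is incorporated herein by reference in its entirety; this application also claims priority to Chinese Patent Application No. 202010383462.1, filed with the China Patent Office on May 8, 2020 and entitled "Database replication system and method, source device, and destination device", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of storage technologies, and in particular, to a database replication system and method, a source device, and a destination device.
Background
With the development of technology, more and more data needs to be stored in databases. To ensure the reliability of the data in a database, a database replication solution is usually used to replicate the data in a source database to a destination database, so that when the data in the source database becomes faulty, the data as it was before the fault can be restored from the destination database.
As an example, a transaction-log-based database replication solution usually includes three phases: change data capture, change data transmission, and change data replay. Change data capture means identifying the changed data in the source database by using the transaction logs recorded in the log file of the source database, and obtaining the transaction logs corresponding to the changed data. Change data transmission means transmitting the transaction logs corresponding to the changed data from the source database to the destination database. Change data replay means that the destination database parses and processes the received transaction logs corresponding to the changed data, and updates the changed data to the destination database.
Dependency relationships may exist among multiple transaction logs in a database. For example, transaction log 2 must be replayed only after the replay of transaction log 1 is completed. The reason is that transaction log 1 and transaction log 2 both record operations, for example write operations, performed on the same operation object in the source database, such as write operations on the primary key of one row of a data table in the source database, and the write operation on that row's primary key recorded in transaction log 1 precedes the write operation on that row's primary key recorded in transaction log 2. Therefore, given such dependency relationships, a database replication solution can only process transaction logs serially. For example, transaction log 1 is first obtained from the log file of the source database and sent to the destination database, then transaction log 2 is obtained from the log file of the source database and sent to the destination database, and the destination database replays transaction log 1 first and then transaction log 2. Because the order must be strictly followed while obtaining the transaction logs in this solution, replicating data through the database replication solution is inefficient. How to improve the efficiency of replicating data according to a database replication solution is therefore a technical problem that urgently needs to be solved.
Summary
This application provides a database replication system and method, a source device, and a destination device, to improve the efficiency of replicating data with a database replication solution.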
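According to a first aspect, a database replication system is provided. The system is configured to perform replay in a destination database according to at least two groups of transaction logs included in the log file of a source database. The system includes a source device and a destination device, where:
The source device is configured to obtain at least two groups of transaction logs in parallel from the log file of the source database, where the at least two groups of transaction logs include a first group of transaction logs and a second group of transaction logs, and to send the at least two groups of transaction logs. For example, the first group of transaction logs includes at least an adjacent first transaction log and second transaction log, the second group of transaction logs includes at least an adjacent third transaction log and fourth transaction log, and the generation time of the second transaction log is earlier than the generation time of the third transaction log. Here, "adjacent" can be understood as meaning that the generation times of the transaction logs are consecutive.
The destination device is configured to receive the at least two groups of transaction logs and perform transaction replay in the destination database of the destination device according to the at least two groups of transaction logs. For example, the destination device first performs transaction replay in the destination database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency relationship between the first transaction log and the second transaction log, and then performs transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency relationship between the third transaction log and the fourth transaction log, so that the data stored in the destination database is consistent with the data stored in the source database.
In the foregoing technical solution, before the transaction logs in the source database are delivered to the destination device, the transaction logs are grouped according to the chronological order in which they were generated, so that multiple groups of transaction logs are obtained and sent in parallel, which can improve the processing efficiency of the database replication system. Further, because the dependency relationships between transaction logs do not need to be considered before the transaction logs are replayed, the transaction logs do not need to be analyzed and processed centrally, which can reduce the processing complexity of the source database and improve the processing efficiency of the system. In addition, because transaction logs of different groups are replayed on the destination device according to the dependency relationships between the transaction logs and the chronological order in which they were generated, the accuracy of the data obtained in the destination database can be guaranteed, ensuring that the data in the destination database is consistent with the data in the source database.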
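In a possible design, the source device is further configured to: when it is confirmed that the operation object of the first transaction operation recorded in the first transaction log in the source database is the same as the operation object of the second transaction operation recorded in the second transaction log in the source database, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
when it is confirmed that the operation object of the third transaction operation recorded in the third transaction log in the source database is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log in the source database, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
In the foregoing technical solution, the source device can record the dependency relationships between transaction logs in the corresponding transaction logs, so that the destination device can replay the transaction logs directly according to the dependency relationship recorded in each transaction log, which can improve the efficiency of the transaction log replay process.
In a possible design, the destination device is further configured to: when it is confirmed that the operation object of the first transaction operation recorded in the first transaction log in the source database is the same as the operation object of the second transaction operation recorded in the second transaction log in the source database, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, record the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
when it is confirmed that the operation object of the third transaction operation recorded in the third transaction log in the source database is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log in the source database, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
In the foregoing technical solution, the dependency relationships between the transaction logs may alternatively be determined after the destination device receives the transaction logs, which can reduce the processing load on the source device and improve the efficiency of extracting transaction logs.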
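In a possible design, the first group of transaction logs and the second group of transaction logs are used as examples to describe the process in which the destination device replays transaction logs in the destination database.
For the first group of transaction logs, when the destination device obtains the second transaction log in the first group of transaction logs, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log, and performs transaction replay according to the second transaction log only after confirming that the transaction replay performed according to the first transaction log is completed.
In the foregoing technical solution, the destination device can determine whether a transaction log depends on other transaction logs by checking whether the transaction log carries the number of another transaction log. If a dependency relationship exists, the destination device needs to wait until the replay of the depended-on transaction log is completed and then perform replay according to this transaction log, to ensure the accuracy of the data obtained in the destination database.
In a possible design, when the destination device obtains the first transaction log in the first group of transaction logs and confirms that the first transaction log does not record the number of any transaction log on which the first transaction log depends, the destination device performs transaction replay according to the first transaction log.
In the foregoing technical solution, if a transaction log does not include the number of any other transaction log, it is determined that no dependency relationship exists between this transaction log and other transaction logs, and transaction replay can be performed directly according to this transaction log without waiting for other transaction logs.
For the second group of transaction logs, when the destination device obtains the fourth transaction log in the second group of transaction logs, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log, and performs transaction replay according to the fourth transaction log after confirming that the transaction replay performed according to the third transaction log is completed.
In a possible design, when the destination device obtains the third transaction log in the second group of transaction logs and confirms that the third transaction log does not record the number of any transaction log on which the third transaction log depends, the destination device performs transaction replay according to the third transaction log.
For the technical effects that can be achieved by replaying the second group of transaction logs, refer to the foregoing description of the technical effects of replaying the first group of transaction logs; details are not repeated here.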
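In a possible design, the source device and the source database are deployed in a first region, the destination device and the destination database are deployed in a second region, and the first region and the second region are remotely connected.
In the foregoing technical solution, the source device and the destination device can be deployed in different regions or different data centers, and transaction logs are then sent over the remote connection between the different regions or data centers. Of course, the source device and the destination device may also be deployed in the same region or the same data center, which is not limited here.
In a possible design, the source device is configured to obtain the at least two groups of transaction logs in parallel from the source database according to number ranges of the transaction logs.
In the foregoing technical solution, a corresponding number range of transaction logs can be allocated in advance to each transaction log group, and the source device can then extract transaction logs according to the number ranges, which improves processing efficiency.
In a possible design, the source device is further configured to: read log summary record information from the source database, where the log summary record information records the numbers of the transaction logs generated by the source database and their recording positions in the log file, lengths, and quantity; and then obtain the at least two groups of transaction logs in parallel from the log file according to the log summary record information.
In the source database, the positions at which transaction logs are stored in the log file may be non-contiguous. In this case, the source database can store log summary record information. When the source device needs to extract a transaction log, it first reads the log summary record information of the source database, finds in it the record of the transaction log to be extracted, and determines the storage position of the transaction log in the log file according to the position, length, and quantity in that record. The transaction log can thus be obtained without traversing all transaction logs in the log file, which can improve the processing efficiency of the extraction module.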
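The lookup through the log summary record information can be sketched as follows. The record layout is an assumption made for illustration: each hypothetical `SummaryRecord` is taken to cover `count` consecutively numbered transaction logs stored as one contiguous span of `length` bytes at `offset` in the log file. The text only states that the number, position, length, and quantity are recorded.

```python
from dataclasses import dataclass


@dataclass
class SummaryRecord:
    first_number: int  # number of the first transaction log covered by the record
    offset: int        # recording position of the span in the log file, in bytes
    length: int        # length of the span, in bytes
    count: int         # quantity of consecutively numbered logs in the span


def locate(summary, low, high):
    """Return the (offset, length) spans that hold logs numbered low..high."""
    spans = []
    for record in summary:
        last_number = record.first_number + record.count - 1
        if record.first_number <= high and last_number >= low:
            spans.append((record.offset, record.length))  # span overlaps the range
    return spans


summary = [
    SummaryRecord(1, 0, 4096, 100),
    SummaryRecord(101, 8192, 2048, 50),
    SummaryRecord(151, 20480, 1024, 25),
]
spans = locate(summary, 120, 160)
```

Only the returned spans need to be read from the log file, so an extraction module responsible for logs 120 to 160 skips the first span entirely instead of scanning it.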
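According to a second aspect, a database replication method is provided. In this method, a source device first obtains at least two groups of transaction logs in parallel from the log file of a source database, where the at least two groups of transaction logs include a first group of transaction logs and a second group of transaction logs, the first group of transaction logs includes at least an adjacent first transaction log and second transaction log, the second group of transaction logs includes at least an adjacent third transaction log and fourth transaction log, and the generation time of the second transaction log is earlier than the generation time of the third transaction log. The source device then sends the at least two groups of transaction logs to a destination device.
In a possible design, when the source device confirms that the operation object of the first transaction operation recorded in the first transaction log in the source database is the same as the operation object of the second transaction operation recorded in the second transaction log in the source database, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, the source device records the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
when it is confirmed that the operation object of the third transaction operation recorded in the third transaction log in the source database is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log in the source database, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, the source device records the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
In a possible design, the source device can obtain the at least two groups of transaction logs in parallel from the source database according to number ranges of the transaction logs.
In a possible design, the source device can first read log summary record information from the source database, where the log summary record information records the numbers of the transaction logs generated by the source database and their recording positions in the log file, lengths, and quantity, and then obtain the at least two groups of transaction logs in parallel from the log file according to the log summary record information.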
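According to a third aspect, a database replication method is provided. In this method, a destination device first receives at least two groups of transaction logs from a source device, where the at least two groups of transaction logs include a first group of transaction logs and a second group of transaction logs, the first group of transaction logs includes at least an adjacent first transaction log and second transaction log, the second group of transaction logs includes at least an adjacent third transaction log and fourth transaction log, and the generation time of the second transaction log is earlier than the generation time of the third transaction log. The destination device then performs transaction replay in the destination database of the destination device according to the at least two groups of transaction logs, so that the data stored in the destination database is consistent with the data stored in the source database, where after performing transaction replay in the destination database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency relationship between the first transaction log and the second transaction log, the destination device performs transaction replay in the destination database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency relationship between the third transaction log and the fourth transaction log.
In a possible design, when the destination device confirms that the operation object of the first transaction operation recorded in the first transaction log in the source database is the same as the operation object of the second transaction operation recorded in the second transaction log in the source database, and that the operation time of the first transaction operation in the source database is earlier than the operation time of the second transaction operation in the source database, the destination device records the number of the first transaction log in the second transaction log, where the number of the first transaction log is used to indicate the dependency relationship between the first transaction log and the second transaction log; and,
when it is confirmed that the operation object of the third transaction operation recorded in the third transaction log in the source database is the same as the operation object of the fourth transaction operation recorded in the fourth transaction log in the source database, and that the operation time of the third transaction operation in the source database is earlier than the operation time of the fourth transaction operation in the source database, the destination device records the number of the third transaction log in the fourth transaction log, where the number of the third transaction log is used to indicate the dependency relationship between the third transaction log and the fourth transaction log.
In a possible design, when the destination device obtains the first transaction log and confirms that the first transaction log does not record the number of any transaction log on which the first transaction log depends, the destination device performs transaction replay according to the first transaction log.
In a possible design, when the destination device obtains the second transaction log, it confirms that the second transaction log records the number of the first transaction log, which indicates the dependency relationship between the first transaction log and the second transaction log, and performs transaction replay according to the second transaction log after confirming that the transaction replay performed according to the first transaction log is completed.
In a possible design, when the destination device obtains the third transaction log and confirms that the third transaction log does not record the number of any transaction log on which the third transaction log depends, the destination device performs transaction replay according to the third transaction log.
In a possible design, when the destination device obtains the fourth transaction log, it confirms that the fourth transaction log records the number of the third transaction log, which indicates the dependency relationship between the fourth transaction log and the third transaction log, and performs transaction replay according to the fourth transaction log after confirming that the transaction replay performed according to the third transaction log is completed.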
第四方面,提供一种源端设备,该源端设备包括处理模块和发送模块,这些模块可以执行上述第二方面任一种设计示例中的所执行的相应功能,具体的:
处理模块,用于从源端数据库的日志文件中并行获取至少两组事务日志,所述至少两组事务日志包括第一组事务日志和第二组事务日志,并发送所述至少两组事务日志,其中,第一组事务日志至少包括相邻的第一事务日志和第二事务日志,所述第二组事务日志至少包括相邻的第三事务日志和第四事务日志,所述第二事务日志的产生时间早于所述第三事务日志的产生时间;
发送模块,用于向目的端设备发送所述至少两组事务日志。
在一种可能的设计中,所述处理模块还用于:
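when it is confirmed that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than that of the second transaction operation, record the number of the first transaction log in the second transaction log, where the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log; and
when it is confirmed that the operation object in the source database of the third transaction operation recorded in the third transaction log is the same as the operation object in the source database of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source database is earlier than that of the fourth transaction operation, record the number of the third transaction log in the fourth transaction log, where the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.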
在一种可能的设计中,所述处理模块具体用于:
根据事务日志的编号范围从源端数据库中并行获取所述至少两组事务日志。
在一种可能的设计中,所述处理模块具体用于:
从所述源端数据库中读取日志概要记录信息,所述日志概要记录信息记录有所述源端数据库产生的事务日志的编号、在所述日志文件中的记录位置、长度以及数量;
根据所述日志概要记录信息在所述日志文件中并行获取所述至少两组事务日志。
第五方面,提供一种目的端设备,该目的端设备包括接收模块和处理模块,这些模块可以执行上述第三方面任一种设计示例中的所执行的相应功能,具体的:
接收模块,用于从源端设备接收至少两组事务日志,所述至少两组事务日志包括第一组事务日志以及第二组事务日志,所述第一组事务日志至少包括相邻的第一事务日志和第二事务日志,所述第二组事务日志至少包括相邻的第三事务日志和第四事务日志,所述第二事务日志的产生时间早于所述第三事务日志的产生时间;
处理模块,用于在根据所述第一组事务日志中的所述第一事务日志、所述第二事务日志以及所述第一事务日志与所述第二事务日志的依赖关系在目的端数据库进行事务重演之后,根据所述第二组事务日志中的所述第三事务日志、所述第四事务日志以及所述第三事务日志与所述第四事务日志的依赖关系在所述目的端数据库进行事务重演,使得所述目的端数据库与所述源端数据库存储的数据一致。
在一种可能的设计中,所述处理模块还用于:
在确认所述第一事务日志记录的第一事务操作在所述源端数据库中的操作对象与所述第二事务日志记录的第二事务操作在所述源端数据库中的操作对象相同,且所述第一事务日志记录的第一事务操作在所述源端数据库中的操作时刻早于所述第二事务日志记录的第二事务操作在所述源端数据库中的操作时刻的情况下,将所述第一事务日志的编号记录到所述第二事务日志中,其中,所述第一事务日志的编号用于指示所述第一事务日志与所述第二事务日志的依赖关系;以及,
在确认所述第三事务日志记录的第三事务操作在所述源端数据库中的操作对象与所述第四事务日志记录的第四事务操作在所述源端数据库中的操作对象相同,且所述第三事务日志记录的第三事务操作在所述源端数据库中的操作时刻早于所述第四事务日志记录的第四事务操作在所述源端数据库中的操作时刻的情况下,将所述第三事务日志的编号记录到所述第四事务日志中,其中,所述第三事务日志的编号用于指示所述第三事务日志与所述第四事务日志的依赖关系。
在一种可能的设计中,所述处理模块具体用于:
在获取到所述第一事务日志的情况下,确认所述第一事务日志没有记录有用于指示与所述第一事务日志存在依赖关系的事务日志的编号,根据所述第一事务日志进行事务重演。
在一种可能的设计中,所述处理模块具体用于:
在获取到所述第二事务日志的情况下,确认所述第二事务日志记录有用于指示所述第一事务日志与所述第二事务日志的依赖关系的所述第一事务日志的编号,在确认根据所述第一事务日志进行的事务重演完成之后,根据所述第二事务日志进行事务重演。
在一种可能的设计中,所述处理模块具体用于:
在获取到所述第三事务日志的情况下,确认所述第三事务日志没有记录有用于指示与所述第三事务日志存在依赖关系的事务日志的编号,根据所述第三事务日志进行事务重演。
在一种可能的设计中,所述处理模块具体用于:
在获取到所述第四事务日志的情况下,确认所述第四事务日志记录有用于指示所述第四事务日志与所述第三事务日志的依赖关系的所述第三事务日志的编号,在确认根据所述第三事务日志进行的事务重演完成之后,根据所述第四事务日志进行事务重演。
第六方面,提供一种源端设备,该源端设备包括处理器,用于实现上述第二方面描述的方法。该源端设备还可以包括存储器,用于存储程序指令和数据。该存储器与该处理器耦合,该处理器可以调用并执行该存储器中存储的程序指令,用于实现上述第二方面描述的方法中的任意一种方法。该源端设备还可以包括通信接口,该通信接口用于该源端设备与其它设备进行通信。示例性地,该其它设备为目的端设备。
在一种可能的设计中,该源端设备包括处理器和通信接口,其中:
处理器,用于从源端数据库的日志文件中并行获取至少两组事务日志,所述至少两组事务日志包括第一组事务日志和第二组事务日志,并发送所述至少两组事务日志,其中,第一组事务日志至少包括相邻的第一事务日志和第二事务日志,所述第二组事务日志至少包括相邻的第三事务日志和第四事务日志,所述第二事务日志的产生时间早于所述第三事务日志的产生时间;
通信接口,用于向目的端设备发送所述至少两组事务日志。
在一种可能的设计中,所述处理器还用于:
在确认所述第一事务日志记录的第一事务操作在所述源端数据库中的操作对象与所述第二事务日志记录的第二事务操作在所述源端数据库中的操作对象相同,且所述第一事务日志记录的第一事务操作在所述源端数据库中的操作时刻早于所述第二事务日志记录的第二事务操作在所述源端数据库中的操作时刻的情况下,将所述第一事务日志的编号记录到所述第二事务日志中,其中,所述第一事务日志的编号用于指示所述第一事务日志与所述第二事务日志的依赖关系;以及,
在确认所述第三事务日志记录的第三事务操作在所述源端数据库中的操作对象与所述第四事务日志记录的第四事务操作在所述源端数据库中的操作对象相同,且所述第三事务日志记录的第三事务操作在所述源端数据库中的操作时刻早于所述第四事务日志记录的第四事务操作在所述源端数据库中的操作时刻的情况下,将所述第三事务日志的编号记录到所述第四事务日志中,其中,所述第三事务日志的编号用于指示所述第三事务日志与所述第四事务日志的依赖关系。
在一种可能的设计中,所述处理器具体用于:
根据事务日志的编号范围从源端数据库中并行获取所述至少两组事务日志。
在一种可能的设计中,所述处理器具体用于:
从所述源端数据库中读取日志概要记录信息,所述日志概要记录信息记录有所述源端数据库产生的事务日志的编号、在所述日志文件中的记录位置、长度以及数量;
根据所述日志概要记录信息在所述日志文件中并行获取所述至少两组事务日志。
第七方面,提供一种目的端设备,该目的端设备包括处理器,用于实现上述第三方面描述的方法。该目的端设备还可以包括存储器,用于存储程序指令和数据。该存储器与该处理器耦合,该处理器可以调用并执行该存储器中存储的程序指令,用于实现上述第三方面描述的方法中的任意一种方法。该目的端设备还可以包括通信接口,该通信接口用于该目的端设备与其它设备进行通信。示例性地,该其它设备为源端设备。
在一种可能的设计中,该目的端设备包括处理器和通信接口,其中:
通信接口,用于从源端设备接收至少两组事务日志,所述至少两组事务日志包括第一组事务日志以及第二组事务日志,所述第一组事务日志至少包括相邻的第一事务日志和第二事务日志,所述第二组事务日志至少包括相邻的第三事务日志和第四事务日志,所述第二事务日志的产生时间早于所述第三事务日志的产生时间;
处理器,用于在根据所述第一组事务日志中的所述第一事务日志、所述第二事务日志以及所述第一事务日志与所述第二事务日志的依赖关系在目的端数据库进行事务重演之后,根据所述第二组事务日志中的所述第三事务日志、所述第四事务日志以及所述第三事务日志与所述第四事务日志的依赖关系在所述目的端数据库进行事务重演,使得所述目的端数据库与所述源端数据库存储的数据一致。
在一种可能的设计中,所述处理器还用于:
在确认所述第一事务日志记录的第一事务操作在所述源端数据库中的操作对象与所述第二事务日志记录的第二事务操作在所述源端数据库中的操作对象相同,且所述第一事务日志记录的第一事务操作在所述源端数据库中的操作时刻早于所述第二事务日志记录的第二事务操作在所述源端数据库中的操作时刻的情况下,将所述第一事务日志的编号记录到所述第二事务日志中,其中,所述第一事务日志的编号用于指示所述第一事务日志与所述第二事务日志的依赖关系;以及,
在确认所述第三事务日志记录的第三事务操作在所述源端数据库中的操作对象与所述第四事务日志记录的第四事务操作在所述源端数据库中的操作对象相同,且所述第三事务日志记录的第三事务操作在所述源端数据库中的操作时刻早于所述第四事务日志记录的第四事务操作在所述源端数据库中的操作时刻的情况下,将所述第三事务日志的编号记录到所述第四事务日志中,其中,所述第三事务日志的编号用于指示所述第三事务日志与所述第四事务日志的依赖关系。
在一种可能的设计中,所述处理器具体用于:
在获取到所述第一事务日志的情况下,确认所述第一事务日志没有记录有用于指示与所述第一事务日志存在依赖关系的事务日志的编号,根据所述第一事务日志进行事务重演。
在一种可能的设计中,所述处理器具体用于:
在获取到所述第二事务日志的情况下,确认所述第二事务日志记录有用于指示所述第一事务日志与所述第二事务日志的依赖关系的所述第一事务日志的编号,在确认根据所述第一事务日志进行的事务重演完成之后,根据所述第二事务日志进行事务重演。
在一种可能的设计中,所述处理器具体用于:
在获取到所述第三事务日志的情况下,确认所述第三事务日志没有记录有用于指示与所述第三事务日志存在依赖关系的事务日志的编号,根据所述第三事务日志进行事务重演。
在一种可能的设计中,所述处理器具体用于:
在获取到所述第四事务日志的情况下,确认所述第四事务日志记录有用于指示所述第四事务日志与所述第三事务日志的依赖关系的所述第三事务日志的编号,在确认根据所述第三事务日志进行的事务重演完成之后,根据所述第四事务日志进行事务重演。
第八方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被计算机执行时,使所述计算机执行第二方面或第三方面中任意一项所述的方法。
第九方面,本申请实施例提供一种计算机程序产品,所述计算机程序产品存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被计算机执行时,使所述计算机执行第二方面或第三方面中任意一项所述的方法。
第十方面,本申请提供了一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现第二方面或第三方面中所述的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
上述第二方面至第十方面及其实现方式的有益效果可以参考对第一方面的系统及其实现方式的有益效果的描述。
附图说明
图1为本申请实施例的应用场景的一种示例的示意图;
图2为一种基于事务日志的数据库复制方案的示意图;
图3为本申请实施例提供的数据库复制系统300的结构框图;
图4为数据库复制系统300的一种示例的结构框图;
图5为数据库复制系统300的另一种示例的结构框图;
图6为数据库复制系统300的另一种示例的结构框图;
图7为数据库复制系统300的另一种示例的结构框图;
图8为图7所示的数据库复制系统300进行数据复制的一种示例的示意图;
图9为数据库复制系统300的另一种示例的结构框图;
图10为数据库复制系统300的另一种示例的结构框图;
图11为数据库复制系统300的另一种示例的结构框图;
图12为本申请实施例提供的数据库复制方法的一种示例的流程图;
图13为本申请实施例提供的数据库复制方法的另一种示例的流程图;
图14为本申请实施例提供的各个模块进行初始化设置的流程图;
图15为本申请实施例提供的各个模块进行数据库复制的流程图;
图16为本申请实施例提供的各个模块进行故障恢复的流程图;
图17为本申请实施例提供的源端设备的一种示例的结构示意图;
图18为本申请实施例提供的目的端设备的一种示例的结构示意图。
具体实施方式
为了使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施例作进一步地详细描述。
为便于本领域技术人员理解本申请提供的技术方案,下面对本申请所涉及的技术术语进行说明。
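1) A source device is a device used to store data independently, such as a server. Alternatively, it may be a cluster of devices used to store data, for example a storage system that includes a management device and multiple storage devices, where the management device may be a server and a storage device may be a hard disk drive (HDD) device, a solid state drive (SSD) device, a serial advanced technology attachment (SATA) disk device, or the like.
The destination device is similar to the source device and is not described again here.
2) A source database is a collection of data that is stored in the source device in a particular storage manner and managed in a unified way. The source device can insert, query, update, and delete data in the database. Depending on the storage manner, the source database may be a relational database or a non-relational database, or another type of database; this is not limited here. A source device may include one source database or multiple source databases. If it includes multiple source databases, each database can be numbered, and the source device can access each source database by its number.
The destination database is similar to the source database and is not described again here.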
3)操作对象,是指源端数据库中存储的每个数据。例如,源端数据库采用数据表的方式存储数据,则操作对象可以是指该源端数据库中的任意一个数据表中,由行主键或者行唯一键确定的一行的数据。
4)依赖关系,是指针对源端数据库的同一操作对象生成的多个事务日志之间必须按照事务日志的产生时间的先后顺序进行重演的关系。例如,在第一时刻对源端数据库中的某一个操作对象进行修改操作生成了事务日志1,在第一时刻之后的第二时刻,对该操作对象再次进行修改操作,生成了事务日志2,且由于事务日志2的产生时间在事务日志1之后,因此,事务日志2必须在事务日志1之后进行重演,则事务日志1和事务日志2之间存在依赖关系,也可以称为事务日志2依赖事务日志1。
5)区域,是指电力和网络相互独立的物理区域,每个区域可以用于提供相应的计算资源,例如虚拟机等,或每个区域也可以用于提供相应的存储资源,例如存储系统,在此不作限制。在每个区域用来提供存储资源时,也可以将区域称为数据中心。不同的区域或者数据中心之间远程连接,例如,可以通过无线网络进行连接等。
6)事务日志的产生时间,是指事务日志在日志文件中的逻辑时间,而不是一个具体的时间戳。逻辑时间可以理解为,多个事务日志之间的先后顺序,例如,事务日志1是在事务日志2之前产生的,但是并不能指示事务日志是在某一个时刻(例如10时39分00秒)产生的。
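7) In the embodiments of this application, "multiple" means two or more, so "multiple" can also be understood as "at least two". "At least one" can be understood as one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included; if at least one of A, B, and C is included, then what is included may be A, B, C, A and B, A and C, B and C, or A and B and C. "And/or" describes a relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, unless otherwise stated, the character "/" generally indicates an "or" relationship between the objects before and after it.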
除非有相反的说明,本申请实施例提及“第一”、“第二”等序数词用于对多个对象进行区分,不用于限定多个对象的顺序、时序、优先级或者重要程度。
首先,对本申请实施例的应用场景进行说明。
随着大数据分析技术、物联网技术等技术的发展,数据成为了推动相关技术进步的核心要素,因此,各行各业的数据需要进行存储,以便用来分析和指导业务。例如,可以通过数据库存储系统将数据进行存储。为了保证数据的可靠性,通常会将存储系统中的数据进行复制。请参考图1,以存储系统为数据库存储系统为例,可以通过网络,将源端数据库中的数据复制到目的端数据库中,从而当源端数据库发生故障时,可以从目的端数据库中恢复数据。通过网络将源端数据库中的数据复制到目的端数据库的方案有多种,例如,可以基于应用层复制数据,可以基于数据库复制数据等,在本申请实施例中,主要针对基于数据库复制数据的方案,可以简称为数据库复制方案。
下面,以基于事务日志的数据库复制方案为例,对数据库复制方案进行说明。
当源端数据库中存储的数据发生变化时,源端数据库会生成与变化的数据对应的事务日志。事务日志可以记录对操作对象进行的操作、操作对象的内容,操作对象的起止位置等信息,在此不对事务日志所包括的具体内容进行限制。例如,在时刻1,通过写入操作,向源端数据库中写入一个新数据,例如为数据A,则数据A为源端数据库中发生变化的数据,因此,源端数据库中会生成与数据A对应的事务日志存储在日志文件中,该事务日志可以记录对数据A进行的操作(在该示例中该操作为写入操作)、数据A的内容、数据A的起止位置等信息。为方便说明,将与数据A对应的事务日志标记为事务日志1。然后,在时刻1之后的时刻2,通过修改操作,对数据A进行修改,则源端数据库再次生成与数据A对应的事务日志2并存储在日志文件中。
当需要将源端数据库中的数据复制到目的端数据库时,则可以采用基于事务日志的数据库复制方案。基于事务日志的数据库复制方案的一个主要原则就是:要求事务日志到达目的端数据库时,必须按照事务日志之间的依赖关系进行重演。例如,前述所述的事务日志2必须在事务日志1之后进行重演,从而才能获取与源端数据库相同的数据。
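Based on the above principle, refer to FIG. 2, which is a schematic diagram of a transaction-log-based database replication scheme. As shown in FIG. 2, the scheme includes four modules: a transaction extraction module, a cross-domain transmission module, a pre-replay parallelization module, and a transaction replay module. The transaction extraction module and the cross-domain transmission module are located in the source device, and the pre-replay parallelization module and the transaction replay module are located in the destination device. The transaction extraction module, the cross-domain transmission module, and the pre-replay parallelization module are in a one-to-one relationship, while the pre-replay parallelization module and the transaction replay modules are in a one-to-many relationship. That is, in this scheme there is one transaction extraction module, one cross-domain transmission module, and one pre-replay parallelization module, but multiple transaction replay modules, whose exact number can be set according to actual needs. FIG. 2 uses K transaction replay modules as an example.
The transaction extraction module in the source device first obtains the transaction logs corresponding to the changed data in the source database. Specifically, it reads transaction logs serially, in order of generation time, from the log file that the source database uses to store transaction logs, and passes the obtained transaction logs to the cross-domain transmission module. After receiving the transaction logs, the cross-domain transmission module transmits them serially, one after another, to the pre-replay parallelization module, thereby sending the transaction logs corresponding to the changed data to the destination device. Note that in this scheme, serial reading by the transaction extraction module means reading transaction logs one after another from the single log file of a given source database; each source database includes only one log file.
After receiving the multiple transaction logs, the pre-replay parallelization module in the destination device first identifies the dependencies among them and passes the transaction logs to the transaction replay modules according to the identified dependencies. For example, the pre-replay parallelization module can assume by default that the first transaction log it receives does not depend on any other transaction log, and passes it to one of the K transaction replay modules, for example transaction replay module 1. It then determines the dependency between the second received transaction log and the first. If the second transaction log and the first transaction log operate on the same operation object in the source database, a dependency exists between them; in that case the pre-replay parallelization module must wait until transaction replay module 1 has finished replaying the first transaction log before passing the second transaction log to transaction replay module 1. If no dependency exists between the second transaction log and the first, the pre-replay parallelization module passes the second transaction log directly to one of the K transaction replay modules other than transaction replay module 1, for example transaction replay module 2. Transaction replay module 1 and transaction replay module 2 can then process different transaction logs in parallel, so that transaction logs without dependencies are processed in parallel. After receiving a transaction log, each transaction replay module executes it in the destination database; once execution completes, the data that changed in the source database is obtained. The transaction replay module then feeds the execution result back to the pre-replay parallelization module, which determines whether a given transaction log has been replayed by whether it has received the execution result for that log.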
可见,在上述数据库复制方案中,除了可以在事务重演阶段,通过设置多个事务重演模块实现并行重演事务日志,在其他处理阶段中,例如,事务抽取阶段、跨域传输阶段以及重演前并行化阶段,都只能采用串行处理方式,从而导致通过数据库复制方案复制数据时的效率较低。
由上述过程可知,正是因为考虑到事务日志之间的依赖关系,从而导致事务日志在进行重演之前,只能采用串行方式进行处理。但是,若想要通过传输事务日志的方式,在目的端数据库获取源端数据库中发生变化的数据,其实只要保证事务日志在重演时按照其依赖关系进行重演即可,而在事务日志重演之前可以不用考虑该依赖关系,这样就可以实现并行传输多个事务日志,从而可以提高复制数据的效率。
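In view of this, the embodiments of this application provide a database replication system that can improve the efficiency of data replication. The database replication system provided in the embodiments of this application is described below with reference to the accompanying drawings.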
请参考图3,为本申请实施例提供的数据库复制系统300的结构框图。数据库复制系统300用于根据源端数据库的日志文件中包括的至少两组事务日志,在目的端数据库进行重演,如图3所示,数据库复制系统300包括源端设备301,以及与源端设备301通信连接的目的端设备302,其中:
源端设备301,用于从源端数据库的日志文件中并行获取至少两组事务日志,该至少两组事务日志包括第一组事务日志和第二组事务日志。其中,每组事务日志中包括至少两个事务日志,例如,第一组事务日志至少包括第一事务日志和第二事务日志,第二组事务日志至少包括第三事务日志和第四事务日志。
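Note that the transaction logs in each group are adjacent: the first transaction log is adjacent to the second, and the third is adjacent to the fourth. Adjacent here means that the generation times of the transaction logs of a group are consecutive in the log file. For example, if each line of the log file records one transaction log and the source device 301 stores the transaction logs in the log file according to their generation time, then the first and second transaction logs are stored in two consecutive lines of the log file, and the third and fourth transaction logs are stored in another two consecutive lines. Moreover, in the embodiments of this application, every transaction log in the first group is generated earlier than any transaction log in the second group; in other words, the last transaction log of the first group is generated earlier than the first transaction log of the second group. As described above, the first group includes the first and second transaction logs, the second transaction log being the last in the first group when ordered by generation time, and the second group includes the third and fourth transaction logs, the third transaction log being the first in the second group when ordered by generation time. The second transaction log is therefore generated earlier than the third transaction log.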
在本申请实施例中,不限制源端设备301并行获取的事务日志组的数量,例如,可以并行获取3组事务日志,或者并行获取5组事务日志等,在此不作限制。在本申请实施例中,为方便说明,下文中,以源端设备301并行获取两组事务日志,分别标记为第一组事务日志和第二组事务日志为例,对本申请实施例中的系统进行说明。
另外,需要说明的是,源端设备301中可以包括至少一个源端数据库,上述获取的至少两组事务日志,是从某一个源端数据库对应的日志文件中获取的,也可以是从不同的源端数据库对应的日志文件中获取的,在此不作限制。
当源端设备301获取该第一组事务日志和第二组事务日志后,则将该第一组事务日志和该第二组事务日志发送给目的端设备。需要说明的是,源端设备301可以任何方式发送该第一组事务日志和第二组事务日志,具体可以通过并行发送或者异步发送的方式,在此不作限制。
目的端设备302,用于接收该第一组事务日志和该第二组事务日志,然后根据每组事务日志中所包括的至少两个事务日志以及每组事务日志中的事务日志之间的依赖关系,在目的端设备302的目的端数据库中重演每组事务日志,使得目的端数据库与源端数据库存储的数据一致。
具体来讲,由于第一组事务日志中的最后一个事务日志的产生时间早于第二组事务日志的第一个事务日志的产生时间,因此,目的端设备302需要先根据所述第一组事务日志中的第一事务日志、第二事务日志以及第一事务日志与第二事务日志的依赖关系,在目的端数据库进行事务重演之后,再根据第二组事务日志中的第三事务日志、第四事务日志以及第三事务日志与第四事务日志的依赖关系在目的端数据库进行事务重演。
在上述技术方案中,在源端数据库中的事务日志到目的端设备之前,通过将事务日志按照产生时间的先后顺序进行分组,实现并行化获取并发送多组事务日志的过程,可以提高数据库复制系统的处理效率。进一步,由于在事务日志重演之前不需要考虑事务日志之间的依赖关系,从而不需要对事务日志进行集中分析处理,可以降低源端数据库的处理复杂度,可以提高系统的处理效率。且,由于不同组的事务日志在目的端设备会根据事务日志之间的依赖关系以及产生时间的先后顺序进行重演,从而可以保证在目的端数据库中获取的数据的准确性,以确保目的端数据库中的数据与源端数据库中的数据的一致性。
需要说明的是,在实际使用过程中,上述系统中的源端设备和目的端设备还可以并行处理更多组事务日志,例如并行处理三组事务日志、四组事务日志等,当获取更多组事务日志时,源端设备和目的端设备对该多组事务日志的处理过程与前述两组事务日志的处理过程相同。
另外,在本申请实施例中,源端设备和源端数据库可以集成在一个设备中,也可以是两个独立的设备,目的端设备和目的端数据库也可以集成在一个设备或者也可以是两个独立的设备。且,源端设备和源端数据库可以设置在第一区域或第一数据中心,目的端设备和目的端数据库可以设置在与第一区域或第一数据中心远程连接的第二区域或第二数据中心,或者,源端设备、源端数据库、目的端设备以及目的端数据库也可以是设置在同一区域或者同一数据中心,在此不作限制。
下面,将以不同的示例对数据库复制系统300的源端设备301和目的端设备302的具体实现方式进行说明。
示例一
请参考图4,为数据库复制系统300的一种示例的结构框图。如图4所示,源端设备301中可以设置两个抽取模块(分别为第一抽取模块3011和第二抽取模块3012)和两个发送模块(分别为第一发送模块3013和第二发送模块3014),两个抽取模块和两个发送模块为一对一连接的关系,例如,第一抽取模块3011与第一发送模块3013连接,第二抽取模块3012与第二发送模块3014连接。目的端设备302中可以设置两个接收模块(分别为第一接收模块3021和第二接收模块3022)和一个重演模块3023,两个接收模块分别与两个发送模块为一对一连接的关系,例如,第一发送模块3013与第一接收模块3021连接,第二发送模块3014与第二接收模块3022连接,两个接收模块分别与重演模块3023连接。
需要说明的是,抽取模块、发送模块以及接收模块的数量可以与源端设备301所需要抽取的事务日志组的数量相关联,例如,若源端设备301需要获取两组事务日志,则可以设置两个抽取模块、两个发送模块以及两个接收模块;若源端设备301需要获取三组事务日志,则可以设置三个抽取模块、三个发送模块以及三个接收模块,以此类推,在此不一一列举。
具体来讲,若源端设备301和目的端设备302为独立的装置,例如可以是独立的服务器,则每个抽取模块、每个发送模块、每个接收模块以及重演模块可以是服务器中由程序代码实现的功能模块或者应用程序或者线程等。若源端设备301和目的端设备302为集群系统,例如,由至少一个虚拟机组成的集群系统,则每个抽取模块、每个发送模块、每个接收模块以及重演模块可以是部署在虚拟机上的虚拟化功能实例或者容器等。当然,上述各个模块也可以通过其他方式实现,在此不作限制。
在本示例中,每个抽取模块用于从源端数据库的日志文件中获取一组事务日志,例如,第一个抽取模块3011用于从日志文件中获取第一组事务日志,第二个抽取模块3012用于从日志文件中获取第二组事务日志。在本申请实施例中,抽取模块获取事务日志的方式可以包括但不限于如下三种:
第一种获取方式:
每个抽取模块首先需要确定其应该抽取的一组事务日志所在的抽取范围,然后根据各自的抽取范围,获取第一组事务日志和第二组事务日志。
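The extraction range may be preset. For example, each line of the log file may store one transaction log; specifically, a line of the log file can be indicated by the start identifier and end identifier of a transaction log. The first extraction module 3011 then always extracts the transaction logs stored in lines 1-100 of the log file, and the second extraction module 3012 always extracts the transaction logs stored in lines 101-200. When it is determined that transaction logs need to be extracted, the extraction modules obtain their groups in parallel from the corresponding positions in the log file according to the preset extraction ranges. Note that a time for extracting transaction logs can be set for each extraction module. For example, extraction can be set to start one hour after the source device 301 is powered on; once the source device 301 has been running for one hour, the extraction modules obtain their groups of transaction logs as described above.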
第二种获取方式:
每个抽取模块根据事务日志的编号范围从源端数据库中并行获取第一组事务日志和第二组事务日志。
具体来讲,事务日志的编号可以是抽取模块根据事务日志的产生时间先后顺序进行编号得到的。例如,第一个产生的事务日志的编号为1,第二个产生的事务日志的编号为2,以此类推。在这种情况下,可以预先设置每个抽取模块用于抽取的事务日志的起始编号和抽取个数,例如,每个抽取模块均抽取5000个事务日志,则第一个抽取模块3011需要抽取的事务日志的起始编号为1,抽取个数为5000,即抽取编号为1-5000的事务日志,第二个抽取模块3012需要抽取的事务日志的起始编号为5001,抽取个数为5000,即抽取编号为5001-10000的事务日志。则每个抽取模块则按照预设的编号范围,获取一组事务日志。需要说明的是,在上述示例中,是以事务日志的起始编号为1进行说明的,在实际使用过程中,事务日志的起始编号也可以为0,在此不作限制。
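The range assignment described above can be sketched as follows. This is a minimal illustration, not the implementation of the application; the function names `number_ranges` and `owns` are hypothetical, and ranges are assumed to be inclusive at both ends.

```python
def number_ranges(num_modules, batch_size, first_number=1):
    """Return one inclusive (start, end) number range per extraction module."""
    ranges = []
    for index in range(num_modules):
        start = first_number + index * batch_size
        ranges.append((start, start + batch_size - 1))
    return ranges


def owns(number_range, log_number):
    """True if a transaction log number falls inside a module's inclusive range."""
    start, end = number_range
    return start <= log_number <= end


# Two extraction modules, 5000 logs each, numbering starting at 1 (as in the text).
ranges = number_ranges(num_modules=2, batch_size=5000)
print(ranges)
```

Passing `first_number=0` covers the case mentioned in the text where numbering starts at 0.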
然后,在每个抽取模块确定各自的编号范围后,则按照该编号范围,从源端数据库中获取一组事务日志。例如,日志文件中的每个事务日志可以包括头部和主体两部分,头部用于记录该事务日志的存储位置等信息,主体用于记录该事务日志的类型、该事务日志对应的操作以及所处理的数据的内容等信息,在此不一一说明。抽取模块可以依次从源端数据库中读取每个事务日志,在读取事务日志时,首先根据事务日志的产生时间确定该事务日志的编号,若该事务日志的编号属于该抽取模块对应的编号范围,则进一步读取事务日志的头部和该事务日志的主体,以获取该事务日志;若该事务日志的编号不属于该抽取模块对应的编号范围,则跳过该事务日志读取下一个事务日志,直至获取该编号范围对应的所有的事务日志,最终获取与该抽取模块对应的一组事务日志。另外,抽取模块在根据事务日志的产生时间确定事务日志的编号后,也可以在获取该事务日志之后,将该事务日志的编号添加在事务日志的头部中。
另外,需要说明的是,抽取模块在获取每个事务日志之后,可以对事务日志进行过滤,然后再对过滤之后的事务日志进行编号。过滤方式例如,抽取模块只需要获取数据表A的事务日志,则抽取模块可以过滤掉不属于数据表A的事务日志。或者,抽取模块也可以根据事务日志的类型进行过滤,例如,过滤掉对创建数据表的事务日志以及修改数据表结构的事务日志等。具体的过滤方式有多种,在此不一一说明。
然而在源端数据库中,事务日志在日志文件中存储的位置可能是不连续的,例如,事务日志1存储在日志文件的第一行,事务日志2存储在日志文件的第四行,在这种情况下,每个抽取模块可能需要遍历日志文件中所有的事务日志,才能获取其编号范围内对应的事务日志。
因此,为了进一步提高抽取模块的处理效率,提供第三种获取方式:
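When the source database of the source device 301 stores transaction logs, it can generate log summary record information corresponding to each transaction log. The log summary record information records the number of each transaction log generated by the source database and its recording position, length, and count in the log file; it may also include other information, which is not enumerated here. Note that the transaction log numbers in the log summary record information are generated by the source database; for example, the source database can number the transaction logs by generation time. The log summary record information is stored at a designated position in the source database. When an extraction module needs to obtain transaction logs, it first fetches the log summary record information from the designated position and then uses it to obtain the first and second groups of transaction logs from the log file in parallel. For example, to obtain the transaction log numbered 2, the extraction module first fetches the log summary record information of the source database, finds the record numbered 2, determines from the position, length, and count in that record where the transaction log numbered 2 is stored in the log file, and then reads that transaction log from the corresponding position. The extraction module therefore does not need to traverse all transaction logs in the log file, which improves its processing efficiency.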
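A small sketch of the third acquisition manner follows. It assumes the log file is a byte sequence and that each summary record holds a number, a byte offset, and a byte length; the count field named in the text is omitted for brevity. `SummaryRecord`, `append_log`, and `read_log` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryRecord:
    number: int   # transaction log number assigned by the source database
    offset: int   # byte position of the log inside the log file
    length: int   # byte length of the log


def append_log(log_file, summary, number, payload):
    """Append one log and record where it landed, as the source database would."""
    summary[number] = SummaryRecord(number, len(log_file), len(payload))
    log_file.extend(payload)


def read_log(log_file, summary, number):
    """Fetch one log directly by offset and length, without scanning the file."""
    record = summary[number]
    return bytes(log_file[record.offset:record.offset + record.length])


log_file, summary = bytearray(), {}
append_log(log_file, summary, 1, b"insert row 1 of table A")
log_file.extend(b"\x00" * 16)  # unrelated bytes: logs need not be contiguous
append_log(log_file, summary, 2, b"update row 1 of table A")
print(read_log(log_file, summary, 2))
```

Because the lookup is a dictionary access followed by a slice, the cost of fetching one log does not grow with the number of logs in the file.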
需要说明的是,在上述示例中,每个抽取模块对应的抽取范围中所包括的事务日志的数量是相同,在实际使用过程中,不同的抽取模块所对应的抽取范围中包括的事务日志的数量也可以是不同。例如,第一个抽取模块可以抽取5000个事务日志,第二个抽取模块可以抽取4000个事务日志,本领域技术人员可以根据实际使用需求进行设置,在此不作限制。
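As described above, a transaction log can record several kinds of content, for example the operation object it processes, the content of the operation object, and the size of the operation object (that is, of the data). Given the purpose of transmitting transaction logs in the embodiments of this application (namely, to obtain the data that changed in the source database), not every item in a transaction log is indispensable for obtaining that data. For example, even if a transaction log does not include the size of the processed data, the corresponding data can still be obtained once the transaction log is replayed in the destination database. Therefore, to reduce the transmission resources occupied by transaction logs and to improve transmission efficiency, in the embodiments of this application each extraction module can, after obtaining the transaction logs, parse them and filter each transaction log according to preset filter conditions to obtain filtered transaction logs.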
作为一种示例,在获取事务日志后,首先对事务日志进行解析,获取该事务日志所包括的内容,然后根据该事务日志对应的操作类型,以及与每个操作类型对应的过滤条件,将事务日志中的部分内容过滤掉,然后将剩余内容进行复制并组合,得到过滤后的事务日志。其中,将部分内容过滤掉,可以理解为将部分内容删除。事务日志对应的操作类型可以包括但不限于增加数据的操作类型、修改数据的操作类型、删除数据的操作类型、增加数据库表的操作类型以及删除数据库表的操作类型等,本领域技术人员可以根据实际使用需求设置与不同的操作类型对应的过滤条件,例如,针对增加数据的操作类型,其对应的过滤条件可以为过滤掉除所处理的数据的存储位置以及数据的内容之外的信息,又例如,针对删除数据的操作类型,其对应的过滤条件可以为过滤掉所处理的操作对象之外的信息,这样,可以灵活地对不同的事务日志进行过滤。此处进行过滤的条件与前述抽取模块在对事务日志进行编号之前进行过滤的条件可以相同,也可以不同,在此不作限制。
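The per-operation-type filtering described above might look like the sketch below. The field names (`location`, `content`, `object`, `size`) and the table `KEEP_FIELDS` are assumptions made for illustration; the sketch also keeps the log number and operation type in every case, which the text does not state but which later steps would need.

```python
# Hypothetical filter table: operation type -> fields worth transmitting.
KEEP_FIELDS = {
    "insert": {"number", "op", "location", "content"},
    "delete": {"number", "op", "object"},
}


def filter_log(log):
    """Copy only the fields that the filter condition for this operation type keeps."""
    keep = KEEP_FIELDS.get(log["op"])
    if keep is None:
        # No filter condition configured for this operation type: keep everything.
        return dict(log)
    return {field: value for field, value in log.items() if field in keep}


parsed = {"number": 7, "op": "delete", "object": "A.row1",
          "content": "old value", "size": 128}
print(filter_log(parsed))
```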
当然,当源端数据库的类型不同时,事务日志的结构可能会不同。例如,当源端数据库采用数据库A时,事务日志包括5个字段,当源端数据库采用数据库B时,事务日志包括7个字段,因此,为了保证能够准确地解析事务日志所包括的内容,每个抽取模块可以适配支持不同类型的数据库对应的事务日志,也就是说,每个抽取模块可以对应有多种对事务日志进行解析的方式,抽取模块在解析事务日志之前,可以先确定源端数据库的类型,然后采用与源端数据库的类型相匹配的解析方式,对事务日志进行解析。
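In addition, in the embodiments of this application, after an extraction module finishes extracting a group of transaction logs, it can automatically calculate the extraction range of the next group to extract. For example, the extraction modules can interact with one another so that each one learns the extraction ranges of the others. If the first extraction module has the number range 1-5000 and learns that the number ranges of the other extraction modules each contain 5000 transaction logs, it can calculate that its next group has the number range 20001-25000 (a figure consistent with four extraction modules in total). Alternatively, a calculation policy can be preset in each extraction module, for example adding 20000 to the current number range after a group has been extracted to obtain the number range of the next group. In this way an extraction module can extract the next group without waiting for the replay module to finish replaying the logs it has already extracted, which speeds up processing.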
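The next-range calculation can be sketched as a fixed stride. The sketch assumes every extraction module handles the same number of logs per round, so the stride is the module count times the batch size; four modules are assumed here because that reproduces the 20001-25000 figure in the text. `next_range` is a hypothetical name.

```python
def next_range(current, num_modules, batch_size):
    """Shift an inclusive (start, end) range past the ranges of all modules."""
    stride = num_modules * batch_size
    start, end = current
    return (start + stride, end + stride)


print(next_range((1, 5000), num_modules=4, batch_size=5000))
```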
当第一抽取模块3011和第二抽取模块3012分别获取对应的一组事务日志后,则将各自获取的一组事务日志发送到与其连接的发送模块,例如,第一抽取模块3011将获取的第一组事务日志发送给第一发送模块3013,第二抽取模块3012将获取的第二组事务日志发送给第二发送模块3014,然后通过第一发送模块3013将第一组事务日志发送给第一接收模块3021,通过第二发送模块3014将第二组事务日志并行发送给第二接收模块3022。
第一接收模块3021和第二接收模块3022分别接收到该第一组事务日志和第二组事务日志之后,则将该第一组事务日志和第二组事务日志发送给重演模块3023,重演模块3023根据事务日志之间的依赖关系在目的端数据库重演这两组事务日志。
作为一种示例,重演模块3023首先判断第一组事务日志中包括的事务日志之间的依赖关系,例如,重演模块3023确定第一组事务日志中的第一事务日志的操作对象与第二事务日志的操作对象是否相同,若相同,则判断第一事务日志的产生时间是否在第二事务日志的产生时间之前,若是,则确定第二事务日志依赖第一事务日志,则重演模块3023先重演第一事务日志之后,再重演第二事务日志,以保证在目的端数据库获取的数据的准确性。具体来讲,重演模块3023中可以包括多个重演队列,重演模块3023可以根据事务日志之间的依赖关系,将第一组事务日志中包括的多个事务日志划分到多个重演队列中。例如,第一事务日志与第二事务日志具有依赖关系,则将第一事务日志和第二事务日志划分到同一个重演队列中,其他的事务日志均与第一事务日志和第二事务日志之间不存在依赖关系,则将其他事务日志划分到其他重演队列中,直至将第一组事务日志中所有的事务日志都划分到对应的重演队列中,然后依次重演各个重演队列中的每个事务日志,完成对第一组事务日志的重演。重演模块3023采用与上述相同的方式,确定第二组事务日志中包括的事务日志之间的依赖关系,并在重演完第一组事务日志之后,按照第二组事务日志中各个事务日志之间的依赖关系,重演第二组事务日志中的所有事务日志,具体过程与重演第一组事务日志相似,在此不再赘述。
需要说明的是,一个事务日志中可以包括多个事务操作,例如,该多个事务操作可以包括增加、修改、删除不同的数据表的不同行或者不同列中的数据,也就是说,一个事务日志可以包括多个操作对象,则在这种情况下,确定该事务日志与其他事务日志之间的依赖关系时,只要其他事务日志的操作对象,与该事务日志的多个操作对象中的其中一个操作对象相同,则确定这两个事务日志之间存在依赖关系。
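The dependency rule and the partition into replay queues can be sketched as follows. Two logs are treated as dependent when their sets of operation objects share at least one element, as stated above. The data layout (a list of `(number, objects)` pairs in generation order) and the function name `build_replay_queues` are assumptions for illustration; the sketch also merges queues when a log touches objects held in several queues, a case the text does not spell out.

```python
def build_replay_queues(group):
    """Partition one group into queues of mutually dependent logs.

    group: list of (number, objects) pairs in generation order, where objects
    is the set of operation objects touched by that transaction log.
    """
    queues = []
    for number, objects in group:
        objs = set(objects)
        touching = [q for q in queues if q["objects"] & objs]
        merged = {"objects": set(objs), "logs": [number]}
        for q in touching:
            merged["objects"] |= q["objects"]
            merged["logs"] += q["logs"]
        merged["logs"].sort()  # numbers follow generation order
        queues = [q for q in queues if not any(q is t for t in touching)]
        queues.append(merged)
    return [q["logs"] for q in queues]


group = [
    (1, {"A.row1"}),
    (2, {"A.row1", "B.row7"}),  # one log with two operation objects
    (3, {"C.row2"}),
    (4, {"B.row7"}),
]
queues = build_replay_queues(group)
print(queues)
```

Logs inside one queue must be replayed in order; logs in different queues share no operation object and can be replayed independently.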
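Of course, in other embodiments there may be multiple replay modules. Refer to FIG. 5, which is a structural block diagram of another example of the database replication system 300. Unlike FIG. 4, in the example shown in FIG. 5 the number of replay modules can equal the number of receiving modules. For example, there are two replay modules, a first replay module 3024 and a second replay module 3025. The first replay module 3024 is connected to the first receiving module 3021 and receives the first group of transaction logs; the second replay module 3025 is connected to the second receiving module 3022 and receives the second group of transaction logs. The first replay module 3024 and the second replay module 3025 then replay the transaction logs they received in a preset order: the first replay module 3024 replays first, and after its transaction logs have all been replayed, the second replay module 3025 replays. Each replay module replays its group of transaction logs in a manner similar to the replay module 3023 in FIG. 4, and this is not limited here.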
需要说明的是,在图5所示的示例中,接收模块也可以集成到重演模块中,例如,将第一接收模块3021集成到第一重演模块3024中,将第二接收模块3022集成到第二重演模块3025中,从而可以简化系统。
示例二
如图3所示的数据库复制系统300中的源端设备301除用于并行获取第一组事务日志和第二组事务日志之外,还用于提供第一组事务日志和第二组事务日志中每一组事务日志包括的多个事务日志之间的依赖关系。
具体来讲,针对第一组事务日志,源端设备301在确认第一事务日志记录的第一事务操作在源端数据库中的操作对象与第二事务日志记录的第二事务操作在源端数据库中的操作对象相同,且第一事务日志记录的第一事务操作在源端数据库中的操作时刻早于第二事务日志记录的第二事务操作在源端数据库中的操作时刻的情况下,将第一事务日志的编号记录到第二事务日志中,通过在第二事务日志中携带的第一事务日志的编号,来指示第一事务日志与第二事务日志的依赖关系,即第二事务日志依赖第一事务日志。
针对第二组事务日志,源端设备301在确认第三事务日志记录的第三事务操作在源端数据库中的操作对象与第四事务日志记录的第四事务操作在源端数据库中的操作对象相同,且第三事务日志记录的第三事务操作在源端数据库中的操作时刻早于第四事务日志记录的第四事务操作在源端数据库中的操作时刻的情况下,将第三事务日志的编号记录到第四事务日志中,通过在第四事务日志中携带的第三事务日志的编号,来指示为第三事务日志与第四事务日志的依赖关系,即第四事务日志依赖第三事务日志。
在这种情况下,目的端设备302则可以根据每一组事务日志中是否携带其他事务日志的编号来重演每一组事务日志中的多个事务日志。
具体来讲,若第一事务日志中不包括用于指示与该第一事务日志存在依赖关系的事务日志的编号,则目的端设备302在获取到该第一事务日志的情况下,确认根据该第一事务日志进行事务重演。然后,目的端设备302在获取到第一组事务日志中的第二事务日志的情况下,确认该第二事务日志记录有用于指示该第一事务日志与该第二事务日志的依赖关系的第一事务日志的编号,在确认根据第一事务日志进行的事务重演完成之后,再根据第二事务日志进行事务重演,依次类推,直至重演完第一组事务日志中包括的所有事务日志。然后,目的端设备302在获取到第二组事务日志中的第三事务日志的情况下,确认该第三事务日志没有记录有用于指示与该第三事务日志存在依赖关系的事务日志的编号,在根据该第三事务日志进行事务重演。在获取到第二组事务日志中的第四事务日志的情况下,确认该第四事务日志记录有用于指示第四事务日志与第三事务日志的依赖关系的第三事务日志的编号,在确认根据第三事务日志进行的事务重演完成之后,再根据第四事务日志进行事务重演,依次类推,直至重演完第二组事务日志中包括的所有事务日志。
在这种情况下,请参考图6,为数据库复制系统300的另一种示例的结构框图。
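Unlike the structure shown in FIG. 4, the source device 301 is additionally provided with two parallelization modules, a first parallelization module 3015 and a second parallelization module 3016. Each parallelization module is connected to one extraction module and one sending module: the first parallelization module 3015 is connected to the first extraction module 3011 and the first sending module 3013, and the second parallelization module 3016 is connected to the second extraction module 3012 and the second sending module 3014. Each parallelization module determines the dependencies among the transaction logs in a group and adds those dependencies to the corresponding transaction logs.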
需要说明的是,并行化模块的数量以及实现方式,与前述抽取模块、每个发送模块、每个接收模块以及重演模块相似,在此不再赘述。另外,第一抽取模块3011、第一发送模块3013、第二抽取模块3012、第二发送模块3014、第一接收模块3021、第二接收模块3022以及重演模块3023分别与图4中相应的模块相似,在此不再赘述。在本示例中,主要对并行化模块进行说明。
具体来讲,第一并行化模块3015接收到与其连接的第一抽取模块3011发送的第一组事务日志后,则依次为该组事务日志中的每个事务日志添加依赖关系。首先,第一并行化模块3015获取第一组事务日志中的第一个事务日志,很显然第一个事务日志不存在依赖关系,然后,第一并行化模块3015可以在第一个事务日志的头部添加一个字段,该字段用于指示该事务日志所依赖的事务日志的编号,由于第一个事务日志不存在依赖关系,则第一个事务日志对应的字段中可以为空,或者,第一并行化模块3015也可以在该字段中写入0,在这种情况下,事务日志的起始编号为1,若为0,则说明该事务日志不依赖其他事务日志。然后,第一并行化模块3015判断第一组事务日志中的第二个事务日志的依赖关系,确定该第二个事务日志记录的事务操作在源端数据库中的操作对象是否与在前的事务日志记录的事务操作在源端数据库的操作对象相同。例如,第一并行化模块3015确定第一个事务日志记录的事务操作用于处理数据表A的第一行数据,第二个事务日志记录的事务操作也用于处理数据表A的第一行数据,则第一并行化模块3015确定第一个事务日志和第二个事务日志分别记录的事务操作在源端数据库中的操作对象相同;或者,当源端数据库为KV键值数据库,则第一并行化模块3015可以通过确定两个事务日志中记录的操作对象是否存在至少一个相同的键值来确定两个事务日志分别记录的事务操作在源端数据库中的操作对象是否相同,若存在相同的键值,则说明两个事务日志分别记录的事务操作在源端数据库中的操作对象相同,若不存在任意一个相同的键值,则说明两个事务日志分别记录的事务操作在源端数据库中的操作对象不同。当然,也可以通过其他方式判断,在此不作限制。
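Then the first parallelization module 3015 determines whether the operation time in the source database of the transaction operation recorded in the first transaction log (which can be understood as the generation time of the first transaction log in the log file) is earlier than the operation time in the source database of the transaction operation recorded in the second transaction log (which can be understood as the generation time of the second transaction log in the log file). If so, the second transaction log depends on the first, and the number of the first transaction log is added to the newly added field in the header of the second transaction log. Next, the module determines the dependencies of the third transaction log in the first group on the first and second transaction logs, in the same way as for the second and first transaction logs, which is not described again here. If the third transaction log depends on both the first and the second transaction logs, the numbers of the first and second transaction logs are added to the newly added field in the header of the third transaction log. In the same way, the dependency between each transaction log in the group and the earlier transaction logs is determined and added to the corresponding transaction log, yielding a group of transaction logs that carry their dependencies. This group is then sent to the first sending module 3013, which sends it to the replay module 3023 in the destination device 302.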
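The annotation step performed by a parallelization module can be sketched as below. Each log is assumed to be a dictionary with a `number` and a set of `objects`, listed in generation order; the added header field is modeled as a `depends_on` list, with `[0]` standing for "no dependency" as in the text (which presumes numbering starts at 1). These names are illustrative only.

```python
def annotate_dependencies(group):
    """Return copies of the logs with a 'depends_on' field listing earlier
    logs in the same group that touch at least one common operation object."""
    annotated = []
    for log in group:
        deps = [
            earlier["number"]
            for earlier in annotated
            if set(earlier["objects"]) & set(log["objects"])
        ]
        annotated.append({**log, "depends_on": deps or [0]})
    return annotated


group = [
    {"number": 1, "objects": {"A.row1"}},
    {"number": 2, "objects": {"A.row1"}},
    {"number": 3, "objects": {"A.row1", "B.row2"}},
    {"number": 4, "objects": {"C.row9"}},
]
annotated = annotate_dependencies(group)
print([(log["number"], log["depends_on"]) for log in annotated])
```

Since the list is processed in generation order, every earlier log with a common object is by construction generated before the current one, so no separate time comparison is needed in this sketch.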
需要说明的是,每个并行化处理模块处理完一组事务日志后,可以将已经处理完的事务日志先缓存在本地,然后再创建新的处理队列,接收并处理与其连接的抽取模块发送的另一组事务日志,而不用等待前述已经处理完成的事务日志成功传输到目的端数据库,可以提高并行化处理模块的处理效率。
另外,由前述内容可知,在重演模块根据事务日志进行重演时,需要先重演完成第一组事务日志后,才能根据第二组事务日志进行重演,因此,为了便于重演模块区分出不同组的事务日志,并行化处理模块也可以在每个事务日志的头部中新增字段,用于指示该事务日志所属的组别的信息。例如,第一事务日志和第二事务日志属于第一组事务日志,则在第一事务日志和第二事务日志的头部中增加编号1,第三事务日志和第四事务日志属于第二组事务日志,则在第三事务日志和第四事务日志的头部中增加编号2。
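In addition, as described above for the extraction modules, after an extraction module finishes extracting a group of transaction logs it can automatically calculate the extraction range of the next group. For example, after extracting the transaction logs numbered 1-5000, extraction module 1 does not need to wait for them to be replayed; once it has determined that the next group has the number range 20001-25000, it can extract the transaction logs numbered 20001-25000. Clearly, the transaction logs numbered 20001-25000 also belong to group 1. To distinguish the transaction logs numbered 1-5000 from those numbered 20001-25000, the parallelization module can also add to each transaction log an identifier showing the extraction round. For example, the transaction logs numbered 1-5000 belong to the group extracted by extraction module 1 in its first round, so the identifier 11 can be added to each of them, where the first digit 1 indicates that the transaction log was extracted in the first round and the second digit 1 indicates that it belongs to the first group. The roles can also be swapped, with the first digit 1 indicating the group and the second digit 1 indicating the first round; this is not limited here. The transaction logs numbered 20001-25000 belong to the group extracted by extraction module 1 in its second round, so the identifier 21 can be added to each of them, where the first digit 2 indicates the second round and the second digit 1 indicates the first group; alternatively, the identifier 12 can be added, where the first digit 1 indicates the group and the second digit 2 indicates the second round. When the replay module receives the transaction logs, it first executes the logs that each extraction module extracted in its first round, then those extracted in the second round, and so on.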
示例三
请参考图7,为数据库复制系统300的另一种示例的结构框图。
与图6所示的结构不同的是,目的端设备302中的重演模块的数量可以有多个,且重演模块的数量可以与接收模块的数量不同。例如,在图7中,包括三个重演模块,分别为第三重演模块3026、第四重演模块3027以及第五重演模块3028,其中,每个重演模块分别与第一接收模块3021和第二接收模块3022连接,也就是说,每个接收模块可以向任意一个重演模块发送事务日志,且每个重演模块可以用来接收来自不同组的事务日志。
第一种示例,每个接收模块可以将接收的一组事务日志中的每个事务日志随机分发给任意一个重演模块。
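In a second example, each receiving module can distribute the transaction logs of a received group to the replay modules in a preset order. For example, the first receiving module 3021 distributes the first transaction log of the received first group to the third replay module 3026, the second to the fourth replay module 3027, the third to the fifth replay module 3028, the fourth to the third replay module 3026, the fifth to the fourth replay module 3027, the sixth to the fifth replay module 3028, and so on.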
第三种示例,可以给每个重演模块进行编号,例如,第三重演模块3026的编号为1,第四重演模块3027的编号为2,第五重演模块3028的编号为3,可以按照负载均衡的原则,根据每个事务日志的编号进行哈希计算,哈希计算的结果即每个事务日志应该分发到的重演模块的编号,从而将该事务日志分发给对应的重演模块。例如,第一组事务日志中的第一个事务日志为事务日志1,则第一接收模块3021对编号1进行哈希计算,得到计算值,例如为1,则说明事务日志1应该分发给编号为1的重演模块,则第一接收模块3021将事务日志1分发给第三重演模块3026,以此类推,直至完成将接收到的每个事务日志分发给重演模块。
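The hash-based distribution in the third example can be sketched as below. The text does not specify the hash function; a simple modulo mapping is assumed here purely as a stand-in, chosen because it sends log number 1 to replay module 1 as in the example. `target_module` is a hypothetical name.

```python
def target_module(log_number, num_modules):
    """Map a transaction log number to a replay module number in 1..num_modules.

    Stand-in hash: the text leaves the actual hash function unspecified.
    """
    return (log_number - 1) % num_modules + 1


# Three replay modules numbered 1, 2, 3 (modules 3026, 3027, 3028 in the text).
assignment = [target_module(n, 3) for n in range(1, 10)]
print(assignment)
```

Any deterministic function with a roughly even spread would serve the same load-balancing purpose.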
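Note that in the above examples, when a receiving module distributes a received group of transaction logs to multiple replay modules, each transaction log can also carry the identifier of the transaction log group to which it belongs. For example, each transaction log in the first group carries the identifier of the first group, each transaction log in the second group carries the identifier of the second group, and so on.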
每个重演模块在接收到各个接收模块发送的事务日志之后,则将事务日志缓存到不同的重演队列中等待重演。具体来讲,一个重演模块可以设置多个重演队列,这些重演队列分别缓存不同的接收模块发送的事务日志,例如,在本实例中,接收模块的数量为2个,则每个重演模块中可以包括有2个重演队列,按照从每个接收模块接收的事务日志的产生时间的先后顺序,将不同接收模块发送的事务日志缓存在不同的重演队列中。例如,第三重演模块3026将从第一接收模块3021接收的事务日志,依次缓存在第一个重演队列中,将从第二接收模块3022中接收的事务日志,依次缓存在第二个重演队列中。其他重演模块的处理过程与第三重演模块3026的处理方式相同,在此不再赘述。或者,可以设置多个重演队列与不同的事务日志组相关联,例如,将第一组事务日志中的所有事务日志缓存在第一个重演队列中,将第二组事务日志中的所有事务日志缓存在第二个重演队列中,这样,可以根据不同的重演队列来进行事务日志重演。
然后,每个重演模块则按照重演队列的顺序,依次重演完不同队列中的事务日志,例如,第三重演模块3026先处理第一个重演队列中的事务日志,在重演该队列中所有的事务日志之后,再重演第二个重演队列中所有的事务日志。其他重演模块的处理方式也是一样,在此不再赘述。
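Each replay module replays the first transaction log waiting to be processed in that module once it determines that the log satisfies the replay condition. The first transaction log waiting to be processed can be understood as the first pending transaction log in the replay queue that the module is currently processing. For example, the third replay module 3026 first processes the transaction logs in its first replay queue, whose first transaction log is the first transaction log of the first group. The third replay module 3026 then determines whether this log depends on other transaction logs. If it finds that the log carries no number of any other transaction log in the first group, it concludes that the log does not depend on other transaction logs and therefore satisfies the replay condition, and it performs replay in the destination database according to that log; the replay procedure is similar to that in the example shown in FIG. 2 and is not limited here. After the third replay module 3026 finishes replaying the first transaction log, it sends the replay result, namely that the first transaction log has been replayed, to the other replay modules, that is, to the fourth replay module 3027 and the fifth replay module 3028.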
其中,当第三重演模块3026在处理其第一个重演队列中的事务日志时,第四重演模块3027以及第五重演模块3028也在并行处理各自的第一个重演队列中的事务日志。只不过,由于第四重演模块3027中第一个重演队列的第一个等待处理的事务日志为第一组事务日志中的第二个事务日志,第四重演模块3027确定该第二个事务日志依赖第一组事务日志中的第一个事务日志,从而,在第四重演模块3027未从其他重演模块中接收到第一组事务日志的第一个事务日志完成重演的重演结果之前,第四重演模块3027不能重演该事务日志。第五重演模块3028也是如此。也就是说,在同一时刻,多个重演模块中只有一个重演模块在重演事务日志,而其他的重演模块处于等待状态中。
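After the fourth replay module 3027 receives from the third replay module 3026 the replay result of the first transaction log of the first group, it determines that this is exactly the replay result of the transaction log on which its own pending log (the second transaction log of the first group) depends. The fourth replay module 3027 therefore determines that its pending transaction log satisfies the replay condition, performs replay in the destination database according to that log, and sends the replay result of the second transaction log to the third replay module 3026 and the fifth replay module 3028.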
当第五重演模块3028接收到第三重演模块3026发送的第一组事务日志中的第一个事务日志的重演结果之后,第五重演模块3028判断其在等待处理的事务日志为第一组事务日志中的第四个事务日志,而第四个事务日志依赖第二个事务日志和第三个事务日志,该重演结果不是其正在等待处理的事务日志所依赖的事务日志的重演结果,因此,保持继续等待的状态,直至其接收到第一组的第二个事务日志和第三个事务日志的重演结果之后,才能开始重演。
需要说明的是,当第四重演模块3027以及第五重演模块3028在预设时长内未接收到其正在等待处理的事务日志所依赖的事务日志的重演结果时,第四重演模块3027以及第五重演模块3028也可以向其他重演模块发送用于获取该重演结果的询问请求,并由执行该事务日志的重演模块对该询问请求进行应答,应答结果为该事务日志是否重演完成。这样,第四重演模块3027以及第五重演模块3028也可以根据应答结果确定是否需要继续保持等待状态。
按照上述过程,当三个重演模块重演完所有重演队列中的事务日志之后,则在目的端数据库得到了与源端数据库相同的数据。
在本申请实施例中,为了保证传输事务日志的可靠性,每个发送模块中还可以包括缓存单元,用于缓存未发送给接收模块的事务日志。源端设备301可以对日志文件中存储时长超过阈值的事务日志进行清除,然而,当发送模块与接收模块的传输发生异常,例如,传输中断,在这种情况下,发送模块可以通过缓存单元,将还未发送给接收模块的事务日志进行存储,以便于当传输恢复后,重新向该接收模块发送该事务日志。或者,发送模块也可以采用其他方式保证事务日志传输的可靠性,例如,发送模块也可以直接将还未发送给接收模块的事务日志存储到永久性存储设备中,在此不作限制。
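In addition, each sending module can create several processing queues at the same time. Each processing queue handles one group of transaction logs received from the parallelization module, and the transaction logs in the queues are sent to the receiving module in a preset processing order. The processing queues can run independently: while the transaction logs in one queue have not all been sent to the receiving module, another queue can receive the next group of transaction logs from the parallelization module connected to it, which reduces the transmission waiting delay.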
为了能够更加直观地了解该示例中各个模块的处理流程,下面以一个具体的例子,对图7所示的数据库复制系统300进行数据复制的处理流程进行说明。请参考图8,为图7所示的数据库复制系统300进行数据复制的一种示例的示意图。
如图8所示,源端设备301的源端数据库在预设时长内产生了包括n个事务日志的日志文件,n个事务日志分别为事务日志1~事务日志n,且源端设备301包括3个抽取模块,分别为抽取模块1~抽取模块3,3个并行化模块,分别为并行化模块1~并行化模块3,以及3个发送模块,分别为发送模块1~发送模块3,且各个模块之间一一连接,即并行化模块1分别与抽取模块1和发送模块1连接,并行化模块2分别与抽取模块2和发送模块2连接,并行化模块3分别与抽取模块3和发送模块3连接,相应地,在目的端设备302中包括3个接收模块和3个重演模块,分别为接收模块1~接收模块3,重演模块1~重演模块3,3个接收模块和3个重演模块一一连接。源端设备301中的每个发送模块分别与目的端设备302中的3个接收模块连接。
首先,各个抽取模块按照各自的编号范围从源端数据库的日志文件中并行抽取一组事务日志,例如,抽取模块1抽取的组1的事务日志包括事务日志1~事务日志3,抽取模块2抽取的组2的事务日志包括事务日志4~事务日志6,抽取模块3抽取的组3的事务日志包括事务日志7~事务日志9,然后各个抽取模块将各自抽取的一组事务日志发送给与其连接的并行化模块。在图8中,以T1~T9为例标记事务日志1~事务日志9。
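After receiving the group 1 transaction logs from extraction module 1, parallelization module 1 determines that transaction log 2 depends on transaction log 1 and adds the number 1 to the header of transaction log 2 to indicate that dependency. It also determines that transaction log 1 and transaction log 3 do not depend on other transaction logs and adds the number 0 to the headers of transaction log 1 and transaction log 3. It then sends the annotated transaction logs 1 to 3 to sending module 1. After receiving the group 2 transaction logs from extraction module 2, parallelization module 2 determines that transaction log 6 depends on transaction log 4 and adds the number 4 to the header of transaction log 6; it also determines that transaction log 4 and transaction log 5 do not depend on other transaction logs and adds the number 0 to their headers. It then sends the annotated transaction logs 4 to 6 to sending module 2. After receiving the group 3 transaction logs from extraction module 3, parallelization module 3 determines that none of transaction logs 7 to 9 depends on other transaction logs, adds the number 0 to their headers, and sends the annotated transaction logs 7 to 9 to sending module 3.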
发送模块1~发送模块3在接收到对应的一组事务日志后,则根据事务日志的编号,将各个事务日志发送给接收模块。例如,发送模块1对编号1进行哈希计算,得到的计算结果为1,从而将添加编号的事务日志1发送给接收模块1,且,发送模块1在事务日志1中添加事务日志组的编号,以向接收模块1指示事务日志1所属的事务日志组为组1,采用相同的方式发送组1中所有的事务日志,在此不一一说明。在图8中,发送模块1将事务日志1~事务日志3分别发送给接收模块1~接收模块3,发送模块2将事务日志4~事务日志6分别发送给接收模块1~接收模块3,发送模块3将事务日志7~事务日志9分别发送给接收模块1~接收模块3,从而接收模块1接收到组1的事务日志1、组2的事务日志4以及组3的事务日志7,接收模块2接收到组1的事务日志2、组2的事务日志5以及组3的事务日志8,接收模块3接收到组1的事务日志3、组2的事务日志6以及组3的事务日志9。接收模块将接收到的事务日志发送给与其连接的重演模块。
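After receiving the transaction logs, the replay modules replay them one after another in the destination database according to the dependencies between the transaction logs and the groups to which they belong. For example, replay module 1 first executes its group 1 transaction log, namely transaction log 1. Having determined that transaction log 1 does not depend on other transaction logs, it performs replay in the destination database according to transaction log 1 directly and then sends the result that transaction log 1 has been replayed to replay module 2 and replay module 3. In parallel, replay module 2 first executes transaction log 2 of group 1. Because transaction log 2 depends on transaction log 1, replay module 2 first waits for another replay module to send the result that transaction log 1 has been replayed; once it receives that result from replay module 1, it performs replay in the destination database according to transaction log 2 and then sends the result that transaction log 2 has been replayed to replay module 1 and replay module 3. In parallel, replay module 3 first executes transaction log 3 of group 1. Because transaction log 3 does not depend on other transaction logs, replay module 3 performs replay in the destination database according to transaction log 3 directly and then sends the result that transaction log 3 has been replayed to replay module 1 and replay module 2.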
然后,重演模块1可以判断组1的事务日志是否全部完成重演,在组1的事务日志全部完成重演后,则根据组2的事务日志进行重演。例如,组1的所有事务日志均缓存在第一个重演队列中,若重演队列1中的所有事务日志均已完成重演,则重演模块1可以判断组1的事务日志全部完成重演,则根据第二个重演队列中的事务日志进行重演。组2的事务日志的重演过程与组1的事务日志的重演过程相似,在此不再赘述。当每个重演模块重演完其接收到的所有的事务日志时,在目的端数据库中则得到了与源端数据库相同的数据,实现了将源端数据库中的数据复制到目的端数据库中。
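The replay order of the FIG. 8 example can be checked with a small round-based simulation. This is a sketch under stated assumptions, not the replay modules' actual protocol: logs are assigned to replay modules by the stand-in rule `(number - 1) % 3 + 1`, each module handles its logs strictly in number order, replay results become visible to other modules at the end of a round, and a log is replayable only when every log of every earlier group and all of its own dependencies are done. `simulate_replay` and `module_of` are hypothetical names.

```python
def module_of(number, num_modules):
    """Stand-in distribution rule mapping a log number to a module in 1..num_modules."""
    return (number - 1) % num_modules + 1


def simulate_replay(logs, num_modules):
    """logs: {number: (group, [dependency numbers])}. Returns the logs replayed per round."""
    pending = {m: [] for m in range(1, num_modules + 1)}
    for number in sorted(logs):
        pending[module_of(number, num_modules)].append(number)
    done, rounds = set(), []
    while any(pending.values()):
        ready = []
        for queue in pending.values():
            if not queue:
                continue
            number = queue[0]  # each module only looks at its first waiting log
            group, deps = logs[number]
            earlier_groups_done = all(
                n in done for n, (g, _) in logs.items() if g < group
            )
            if earlier_groups_done and all(d in done for d in deps):
                ready.append(number)
        if not ready:
            raise RuntimeError("no log is replayable; dependencies cannot be met")
        for number in ready:
            pending[module_of(number, num_modules)].pop(0)
        done.update(ready)  # results are broadcast to the other modules
        rounds.append(sorted(ready))
    return rounds


# FIG. 8: T2 depends on T1, T6 depends on T4; groups are T1-T3, T4-T6, T7-T9.
figure8 = {
    1: (1, []), 2: (1, [1]), 3: (1, []),
    4: (2, []), 5: (2, []), 6: (2, [4]),
    7: (3, []), 8: (3, []), 9: (3, []),
}
rounds = simulate_replay(figure8, num_modules=3)
print(rounds)
```

In this run T1 and T3 are replayed together, T2 waits for T1, and no group 2 log starts before group 1 is complete, matching the order described above.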
示例四
与示例二中的数据库复制系统300不同的是,可以将设置在源端设备301中的并行化模块的功能,迁移到目的端设备302中,即源端设备301中不用提供每一组事务日志包括的多个事务日志之间的依赖关系,而通过目的端设备302确定出每一组事务日志包括的多个事务日志之间的依赖关系。
具体来讲,针对第一组事务日志,目的端设备302在确认第一事务日志记录的第一事务操作在源端数据库中的操作对象与第二事务日志记录的第二事务操作在源端数据库中的操作对象相同,且第一事务日志记录的第一事务操作在源端数据库中的操作时刻早于第二事务日志记录的第二事务操作在源端数据库中的操作时刻的情况下,将第一事务日志的编号记录到第二事务日志中,通过在第二事务日志中携带的第一事务日志的编号,来指示第一事务日志与第二事务日志的依赖关系,即第二事务日志依赖第一事务日志。
针对第二组事务日志,目的端设备302在确认第三事务日志记录的第三事务操作在源端数据库中的操作对象与第四事务日志记录的第四事务操作在源端数据库中的操作对象相同,且第三事务日志记录的第三事务操作在源端数据库中的操作时刻早于第四事务日志记录的第四事务操作在源端数据库中的操作时刻的情况下,将第三事务日志的编号记录到第四事务日志中,通过在第四事务日志中携带的第三事务日志的编号,来指示为第三事务日志与第四事务日志的依赖关系,即第四事务日志依赖第三事务日志。
目的端设备302确定每一组事务日志包括的多个事务日志之间的依赖关系的方式,与示例二中源端设备301相似,在此不再赘述。
在这种情况下,请参考图9,为数据库复制系统300的另一种示例的结构框图。
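Unlike the structure shown in FIG. 6, the two parallelization modules are relocated from the source device 301, where they are placed in FIG. 6, to the destination device 302. FIG. 9 includes a third parallelization module 30209 and a fourth parallelization module 30210. Each parallelization module is connected to one receiving module, from which it receives a group of transaction logs, and each parallelization module is also connected to the replay module; in other words, each parallelization module can send transaction logs to the replay module.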
其中,第三并行化模块30209与图6所示的第一并行化模块3015相似,第四并行化模块30210与图6所示的第二并行化模块3016相似,其他模块分别与图6所示的相应的模块相似,在此不再赘述。
在图9中,重演模块的数量为一个,当然,重演模块的数量也可以是多个,如图10所示,包括三个重演模块,分别为第三重演模块3026、第四重演模块3027以及第五重演模块3028,从而每一个并行化模块分别与每一个重演模块连接,用于向任意一个重演模块发送事务日志。在这种情况下,各个并行化模块还用于执行将接收到的一组事务日志中的每个事务日志发送给多个重演模块的过程,例如,可以将接收的一组事务日志中的每个事务日志随机分发给任意一个重演模块,或者,可以按照预设的顺序将接收的一组事务日志中的每个事务日志分发给重演模块,或者,可以给每个重演模块进行编号,按照负载均衡的原则,根据每个事务日志的编号进行哈希计算,哈希计算的结果即每个事务日志应该分发到的重演模块的编号,从而将该事务日志分发给对应的重演模块。具体过程可以参照图7所示的示例中接收模块的分发事务日志的第一种~第三种示例,在此不再赘述。
示例五
请参考图11,为数据库复制系统300的另一种示例的结构框图。
与图10所示的结构不同的是,图11中还包括管理设备,分别为源端管理设备303以及目的端管理设备304。其他模块与图10中相似,在此不再赘述。
其中,源端管理设备303可以为每个抽取模块分配用于获取一组事务日志的编号范围,各个抽取模块根据源端管理设备303分配的编号范围抽取对应的事务日志。和/或,源端管理设备303还可以用于监测源端设备301中各个模块的运行状态,以及动态调整各个抽取模块抽取事务日志的数量以及范围。例如,监测到某一个抽取模块发生故障,则可以将该抽取模块需要获取的事务日志分配给其他抽取模块。
目的端管理设备304用于监测目的端设备302中各个模块的运行状态,以及动态调整各个接收模块和各个重演模块处理的事务日志的数量。例如,当监测到某个重演模块发生故障,则可以通知各个接收模块不向该故障的重演模块发送事务日志,且将该故障的重演模块需要重演的事务日志分发到其他重演模块中。
作为一种示例,目的端管理设备304在监测到某个重演模块出现故障时,收集相关信息,例如,发生故障的重演模块所处理的当前事务编号,故障的重演模块的编号等,然后将收集的相关信息发送给其他重演模块,然后,每个重演模块将事务的产生时间位于发生故障的重演模块所处理的当前事务编号之后的事务日志重新分发给其他未发生故障的重演模块,从而通过其他未发生故障的重演模块在目的端数据库重演事务日志。
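If the transaction logs received by the replay modules are distributed by the sending modules in the source device 301, the destination management device 304 needs to send the collected information to the source management device 303, which forwards it to the sending modules so that the transaction logs generated after the transaction number currently being processed by the failed replay module are redistributed to the other, non-failed replay modules.
When a receiving module in the destination device 302 or a sending module in the source device 301 redistributes transaction logs, it can add a resend flag to the redistributed logs, for example a "secondary hash resend" flag. The replay module that receives the first transaction log carrying the "secondary hash resend" flag can replay it immediately to complete recovery. For example, if the failed replay module was processing the transaction log numbered 3, the first transaction log carrying the "secondary hash resend" flag is transaction log 4. When a replay module receives the transaction log that carries the "secondary hash resend" flag and is numbered 4, it performs replay in the destination database according to that log directly and then sends the replay result to the other replay modules. The remaining transaction logs are replayed in the destination database as described above, according to the transaction log group to which each log belongs and its dependencies, which restores the whole replay process.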
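The redistribution after a replay module failure can be sketched as below. The sketch assumes that every log numbered after the one the failed module was processing is re-hashed across the surviving replay modules and tagged with the resend flag; the modulo rule, the flag spelling `RESEND_FLAG`, and the function name `redistribute` are illustrative assumptions. How the log that was in progress at the time of failure is handled is not covered here.

```python
RESEND_FLAG = "secondary-hash-resend"  # hypothetical spelling of the resend flag


def redistribute(log_numbers, current_number, surviving_modules):
    """Plan the resend of logs generated after current_number.

    Returns (log number, target replay module, flag) tuples in number order.
    """
    plan = []
    for number in sorted(log_numbers):
        if number <= current_number:
            continue  # generated no later than the log the failed module was handling
        target = surviving_modules[number % len(surviving_modules)]
        plan.append((number, target, RESEND_FLAG))
    return plan


# Replay module 3 fails while processing log 3; modules 1 and 2 survive.
plan = redistribute(range(1, 10), current_number=3, surviving_modules=[1, 2])
print(plan[0])
```

The first entry of the plan is log 4, which corresponds to the first flagged log that a surviving replay module can replay immediately in the example above.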
当然,若其他模块发生故障,源端管理设备303和目的端管理设备304也可以采用相似的方式重新分配未发生故障的模块的任务,以保证系统的稳定性。
另外,上述各个示例中的各个模块也可以进行自由组合,不限于上述示例中的几种组合方式。
在上述技术方案中,只在事务日志进行重演时,才考虑事务日志之间的依赖关系,这样,在事务日志进行重演之前,在不考虑事务日志之间的依赖关系的前提下,可以将事务日志分为多组并行抽取以及并行发送,从而可以提高数据库复制系统的处理效率。且,由于事务日志在目的端数据库会根据事务日志之间的依赖关系进行重演,因此,可以保证目的端数据库能够获取与源端数据库相同的数据,保证数据的一致性。
基于同一发明构思,本申请实施例提供一种数据库复制方法,该方法可以应用在如图3~图11所示的数据库复制系统中。请参考图12,为该方法的一种示例的流程图,该流程图描述如下:
S121、源端设备从源端数据库的日志文件中并行获取至少两组事务日志,该至少两组事务日志包括第一组事务日志和第二组事务日志。
在本申请实施例中,不限制事务日志组的数量,为方便说明,下文中以该至少两组事务日志包括第一组事务日志和第二组事务日志为例。其中,第一组事务日志至少包括相邻的第一事务日志和第二事务日志,第二组事务日志至少包括相邻的第三事务日志和第四事务日志,第二事务日志的产生时间早于第三事务日志的产生时间。对第一组事务日志和第二组事务日志的说明,可以参照前述示例一中的相应内容,在此不再赘述。
在本申请实施例中,源端设备从源端数据库中并行获取第一组事务日志和第二组事务日志,可以包括但不限于如下三种方式:
第一种获取方式:
首先需要确定其应该抽取的一组事务日志所在的抽取范围,然后根据各自的抽取范围,获取第一组事务日志和第二组事务日志。
第二种获取方式:
根据事务日志的编号范围从源端数据库中并行获取第一组事务日志和第二组事务日志。
第三种获取方式:
源端设备在存储事务日志时,可以生成与每个事务日志对应的日志概要记录信息,该日志概要记录信息记录有源端数据库产生的事务日志的编号、在所述日志文件中的记录位置、长度以及数量,然后,当源端设备需要获取事务日志时,则首先读取该日志概要记录信息,根据该日志概要记录信息在所述日志文件中并行获取所述第一组事务日志和所述第二组事务日志。
对上述三种获取方式的描述,可以参照前述任意一个示例中对抽取模块的说明,在此不再赘述。
需要说明的是,在本申请实施例中,以源端设备获取两组事务日志为例进行说明,在实际使用过程中,不限制源端设备并行获取的事务日志的组数,例如,可以并行获取三组事务日志、四组事务日志甚至更多组事务日志,在此不作限制。
S122、源端设备并行发送该第一组事务日志和该第二组事务日志,目的端设备接收该第一组事务日志以及该第二组事务日志。
若源端设备与目的端设备设置在不同的区域或者不同的数据中心,则源端设备可以通过与目的端设备之间的远程连接,将该第一组事务日志和第二组事务日志发送给目的端设备。
具体实现方式可以参照前述任意一个示例中对发送模块的说明,在此不再赘述。
S123、目的端设备在根据该第一组事务日志中的第一事务日志、第二事务日志以及第一事务日志与第二事务日志的依赖关系在目的端数据库进行事务重演之后,根据第二组事务日志中的第三事务日志、第四事务日志以及第三事务日志与第四事务日志的依赖关系在目的端数据库进行事务重演,使得目的端数据库与源端设备的源端数据库存储的数据一致。
具体来讲,目的端设备首先需要确定每一组事务日志所包括的多个事务日志之间的依赖关系,例如,确定第一组事务日志包括的第一事务日志和第二事务日志之间的依赖关系,以及,确定第二组事务日志包括的第三事务日志和第四事务日志之间的依赖关系。然后根据每一组事务日志的依赖关系,在目的端数据库中根据该组事务日志进行重演。
在本申请实施例中,确定第一组事务日志包括的第一事务日志和第二事务日志之间的依赖关系,包括:
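When it is confirmed that the operation object in the source database of the first transaction operation recorded in the first transaction log is the same as the operation object in the source database of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source database is earlier than that of the second transaction operation, the number of the first transaction log is recorded in the second transaction log, where the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log.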
确定第二组事务日志包括的第三事务日志和第四事务日志之间的依赖关系,包括:
在确认第三事务日志记录的第三事务操作在源端数据库中的操作对象与第四事务日志记录的第四事务操作在源端数据库中的操作对象相同,且第三事务日志记录的第三事务操作在源端数据库中的操作时刻早于第四事务日志记录的第四事务操作在源端数据库中的操作时刻的情况下,将第三事务日志的编号记录到第四事务日志中,其中,第三事务日志的编号用于指示第三事务日志与第四事务日志的依赖关系。
具体实现方式可以参照前述示例四中,对目的端设备302中的并行化处理模块的说明,在此不再赘述。
在确认每一组事务日志之间的依赖关系之后,则根据各个事务日志之间的依赖关系,在目的端数据库重演各个事务日志。
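As an example, when the destination device obtains the first transaction log and confirms that it records no number of any transaction log on which it depends, the destination device performs transaction replay in the destination database according to the first transaction log. Then, when the destination device obtains the second transaction log and confirms that it records the number of the first transaction log, which indicates the dependency between the first and second transaction logs, the destination device performs transaction replay in the destination database according to the second transaction log after confirming that the replay according to the first transaction log has completed.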
目的端设备在获取到第三事务日志的情况下,确认第三事务日志没有记录有用于指示与该第三事务日志存在依赖关系的事务日志的编号,根据第三事务日志在目的端数据库进行事务重演。然后,在获取到第四事务日志的情况下,确认第四事务日志记录有用于指示第四事务日志与第三事务日志的依赖关系的第三事务日志的编号,在确认根据第三事务日志进行的事务重演完成之后,根据第四事务日志在目的端数据库进行事务重演。
具体实现方式可以参照前述任意一个示例中对重演模块的说明,在此不再赘述。
当目的端设备按照上述方式,在目的端数据库重演完所有的事务日志,则可以获取源端数据库中的数据,从而与源端数据库保持一致。
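The replay order described in this example can be sketched as a small Python loop. The pair layout `(number, depends_on)` and the `apply` callback are assumptions for illustration; a real replay module would execute the logged operations against the destination end database.

```python
def replay_in_dependency_order(logs, apply):
    """logs: iterable of (number, depends_on) pairs; depends_on is None when
    the log carries no dependency number. apply(number) performs the replay."""
    done = set()
    pending = list(logs)
    while pending:
        still_waiting = []
        for number, depends_on in pending:
            if depends_on is None or depends_on in done:
                apply(number)
                done.add(number)
            else:
                still_waiting.append((number, depends_on))
        if len(still_waiting) == len(pending):
            raise RuntimeError("unsatisfied dependency; cannot make progress")
        pending = still_waiting
    return done

order = []
# The first group is replayed before the second group, as in step S123.
for group in ([(2, 1), (1, None)], [(4, 3), (3, None)]):
    replay_in_dependency_order(group, order.append)
```

Even though log 2 arrives before log 1 in this sketch, it is held back until log 1 has been replayed.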
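In the foregoing technical solution, the destination end device needs to determine the dependencies within each group of transaction logs before performing transaction replay. In some other embodiments, the source end device may instead determine the dependencies within each group of transaction logs, which reduces the computation load of the destination end device.
Refer to FIG. 13, which is a flowchart of another example of the method. The flowchart is described as follows:
S131. The source end device obtains the first group of transaction logs and the second group of transaction logs in parallel from the log file of the source end database.
S132. The source end device determines the dependencies among the transaction logs included in each group of transaction logs.
As an example, for the first group of transaction logs, when the source end device determines that the operation object, in the source end database, of the first transaction operation recorded in the first transaction log is the same as the operation object, in the source end database, of the second transaction operation recorded in the second transaction log, and that the operation time of the first transaction operation in the source end database is earlier than the operation time of the second transaction operation in the source end database, the source end device records the number of the first transaction log in the second transaction log, where the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log.
For the second group of transaction logs, when the source end device determines that the operation object, in the source end database, of the third transaction operation recorded in the third transaction log is the same as the operation object, in the source end database, of the fourth transaction operation recorded in the fourth transaction log, and that the operation time of the third transaction operation in the source end database is earlier than the operation time of the fourth transaction operation in the source end database, the source end device records the number of the third transaction log in the fourth transaction log, where the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.
For the manner in which the source end device determines the dependencies among transaction logs, refer to the description of the parallelization module in Example 2. Details are not repeated here.
S133. The source end device sends the first group of transaction logs and the second group of transaction logs in parallel, and the destination end device receives the first group of transaction logs and the second group of transaction logs.
It should be noted that, in this case, each transaction log sent by the source end device carries the number of the transaction log on which it depends. If a transaction log does not depend on any other transaction log, it may carry no number of another transaction log, or it may carry the number 0.
S134. After performing transaction replay in the destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency between the first transaction log and the second transaction log, the destination end device performs transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency between the third transaction log and the fourth transaction log, so that the data stored in the destination end database is consistent with the data stored in the source end database of the source end device.
Steps S133 and S134 are similar to steps S122 and S123, respectively. Details are not repeated here.
In the foregoing technical solution, the source end device obtains and sends multiple groups of transaction logs in parallel, which improves the efficiency of data replication. In addition, the destination end device replays the groups of transaction logs sent in parallel according to the dependencies among the transaction logs, which preserves the correctness of the obtained data and keeps the data in the destination end database consistent with the data in the source end database.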
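The functions of the source end device and the destination end device described in the foregoing embodiments may all be implemented by functional modules implemented in program code, applications, threads, virtualized function instances, or containers. For example, a source end management module, multiple extraction modules, multiple parallelization modules, and multiple sending modules may be provided in the source end device. The source end management module monitors the running status of the other modules in the source end device. Each extraction module obtains one group of transaction logs from the log file of the source end database, and the multiple extraction modules obtain multiple groups of transaction logs in parallel. Each parallelization module obtains one group of transaction logs from one extraction module, determines the dependencies among the transaction logs in that group, and then sends the group of transaction logs including the dependencies to one sending module, which sends the group of transaction logs including the dependencies to the destination end device. A destination end management module and multiple replay modules may be provided in the destination end device. The destination end management module monitors the running status of the other modules in the destination end device. Each replay module receives transaction logs from the source end device, and the multiple replay modules cooperate to complete the replay of all transaction logs.
The following describes the database replication method provided in the embodiments of this application by using the foregoing modules as an example.
Before the database replication method in the embodiments of this application is implemented by the foregoing modules, the modules need to be initialized. Refer to FIG. 14, which is a flowchart of initializing the modules. The flowchart is described as follows:
S1401. The source end management module reads local configuration data.
The configuration data may be preset by a technician and may include, for example, the topology relationship and network connection information among the extraction module, the parallelization module, and the sending module. The source end management module also listens for connection establishment requests sent by the extraction module, the parallelization module, the sending module, and the destination end management module.
For ease of description, FIG. 14 uses one extraction module, one parallelization module, and one sending module as an example. When there are multiple extraction modules, parallelization modules, and sending modules, the processing of each module is the same as that of the corresponding module in FIG. 14.
S1402. The destination end management module reads local configuration data.
The configuration data may be preset by a technician and may include, for example, network connection information among the replay modules. The destination end management module also listens for connection establishment requests sent by the replay modules to the destination end management module.
For ease of description, FIG. 14 uses one replay module as an example. When there are multiple replay modules, the processing of each replay module is the same as that of the replay module shown in FIG. 14.
S1403. The extraction module, the parallelization module, and the sending module each send a connection establishment request to the source end management module and establish a connection with the source end management module.
S1404. Each replay module sends a connection establishment request to the destination end management module and establishes a connection with the destination end management module.
S1405. The destination end management module sends a connection establishment request to the source end management module and establishes a connection with the source end management module.
The destination end management module may further send, to the source end management module, the number of the transaction log last replayed by the destination end device and/or information about the replay modules connected to the destination end management module. When there are multiple replay modules, information about all replay modules is sent.
S1406. The source end management module sends a connection confirmation message and the connection information corresponding to each module to the parallelization module and the sending module.
For example, the source end management module sends, to each parallelization module, information about the extraction module and the sending module connected to it, and sends, to each sending module, information about the parallelization module and the replay modules connected to it.
S1407. The extraction module, the parallelization module, the sending module, and the replay module establish connections with one another.
S1408. The source end management module calculates, according to the number of pairings of extraction modules, parallelization modules, and sending modules, the number range of the group of transaction logs that each extraction module needs to extract, and sends each extraction module its corresponding number range.
For example, there are four extraction modules, and each extraction module extracts 2500 transaction logs. That is, the number range of the group of transaction logs obtained by the first extraction module is 1 to 2500, the number range of the group obtained by the second extraction module is 2501 to 5000, and so on.
S1409. The source end management module sends, to the destination end management module, information about all sending modules and the number range of the group of transaction logs that each extraction module needs to extract.
S1410. Each extraction module stores its number range and returns a confirmation message to the source end management module.
The foregoing steps complete the initial configuration of the database replication system. After the initial configuration is complete, the database replication system can perform database replication.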
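The number range calculation in step S1408 can be sketched in Python as follows. The function name and parameters are assumptions for illustration; the figures match the four-module, 2500-log example given for that step.

```python
def number_ranges(first_number, logs_per_extractor, extractor_count):
    """Return one inclusive (start, end) number range per extraction module."""
    ranges = []
    for index in range(extractor_count):
        start = first_number + index * logs_per_extractor
        ranges.append((start, start + logs_per_extractor - 1))
    return ranges

ranges = number_ranges(first_number=1, logs_per_extractor=2500, extractor_count=4)
```

With these inputs the first two ranges are 1 to 2500 and 2501 to 5000, and the remaining two continue the same pattern.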
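Refer to FIG. 15, which is a flowchart of database replication performed by the modules. The flowchart is described as follows:
S1501. Each extraction module sends a transaction log obtaining request to the source end database.
The transaction log obtaining request is used to obtain transaction logs. Because every extraction module follows the same processing procedure, the example shown in FIG. 15 is described using only one extraction module.
S1502. The extraction module determines, according to the header information of a transaction log that it reads, whether the transaction log falls within the number range corresponding to the extraction module. If it does, the extraction module goes on to obtain the body information of the transaction log, parses and filters the transaction log, and finally obtains the transaction log. If it does not, the extraction module discards the transaction log and reads the next transaction log.
S1503. The extraction module sends the obtained transaction log to the parallelization module connected to it.
S1504. The parallelization module identifies the transaction log group to which the received transaction log belongs, determines the dependency between the transaction log and other transaction logs, and carries the dependency in the transaction log.
S1505. The parallelization module sends the transaction log carrying the dependency to the sending module connected to it.
S1506. The sending module performs a hash calculation on the number of the transaction log. The result of the hash calculation is the number of the replay module that is to receive the transaction log, and the sending module sends the transaction log to that replay module.
In this example, the number of replay modules is K, and the sending module determines that the transaction log is to be sent to replay module 1.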
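The routing in step S1506 can be sketched in Python as follows. The application does not name a hash function, so the use of SHA-256 and the mapping onto module numbers 1 to K are assumptions made for illustration.

```python
import hashlib

def replay_module_for(log_number, module_count):
    """Map a transaction log number to a replay module number in 1..module_count."""
    digest = hashlib.sha256(str(log_number).encode()).digest()
    return int.from_bytes(digest[:8], "big") % module_count + 1
```

Because the mapping depends only on the log number, any module that knows the number of a log can recompute which replay module is responsible for it.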
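S1507. Replay module 1 identifies the transaction log group to which the received transaction log belongs and determines, according to the group, whether the transaction log belongs to the replay queue currently being processed. If it does, replay module 1 stores the transaction log in that replay queue. If it does not, replay module 1 creates a new replay queue to store the transaction log.
In this example, transaction logs of the same group are stored in the same replay queue.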
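The queue handling in step S1507 can be sketched in Python as follows. The class and method names are hypothetical, and the sketch keeps one queue per group identifier, which is one straightforward reading of the step.

```python
from collections import deque

class ReplayModule:
    def __init__(self):
        self.queues = {}  # group id -> replay queue for that group

    def enqueue(self, group_id, log_number):
        """Store the log in its group's queue; return True if a new queue
        had to be created for this group."""
        created = group_id not in self.queues
        if created:
            self.queues[group_id] = deque()
        self.queues[group_id].append(log_number)
        return created

module = ReplayModule()
created_flags = [module.enqueue(1, 10), module.enqueue(1, 11), module.enqueue(2, 2501)]
```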
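S1508. Replay module 1 determines whether the transaction log meets a replay condition. If the replay condition is met, replay module 1 performs replay in the destination end database according to the transaction log.
If the transaction log does not carry the number of another transaction log, the transaction log meets the replay condition. Alternatively, if the transaction log carries the number of another transaction log and that other transaction log has already been replayed, the transaction log meets the replay condition.
It should be noted that, if replay module 1 determines that the transaction log depends on another transaction log, replay module 1 may wait for another replay module to send the replay result of the transaction log on which it depends. If the replay result is still not received within a preset duration, replay module 1 may perform a hash calculation on the number of the transaction log on which it depends, determine the replay module corresponding to that transaction log, and send a query request to that replay module to obtain the replay result. If the response message corresponding to the query request indicates that the replay of that transaction log is complete, replay module 1 performs replay in the destination end database according to the transaction log. If the response message indicates that the replay of that transaction log is not yet complete, replay module 1 continues to wait.
In one case, the replay module responsible for replaying the transaction log that is depended on fails, so replay module 1 does not receive a response message corresponding to the query request. In this case, replay module 1 may determine that the replay module responsible for replaying that transaction log has failed and report this to the destination end management module. It should be noted that this case is not shown in FIG. 15.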
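The waiting and querying behaviour described for step S1508 can be sketched in Python as follows. The `query_peer` callback, the return labels, and the bound on the number of queries are assumptions for illustration; the application itself only says that the module keeps waiting while the dependency is unfinished.

```python
import threading

class DependencyWaiter:
    def __init__(self, query_peer, wait_seconds=0.05):
        # query_peer(number) is a hypothetical callback that returns True
        # (replayed), False (not yet replayed), or None (no response).
        self.query_peer = query_peer
        self.wait_seconds = wait_seconds
        self.finished = set()
        self.cond = threading.Condition()

    def notify_finished(self, number):
        """Called when another replay module reports a finished replay."""
        with self.cond:
            self.finished.add(number)
            self.cond.notify_all()

    def wait_for(self, number, max_queries=3):
        """Return 'ready', 'peer_failed', or 'pending' for the dependency."""
        for _ in range(max_queries):
            with self.cond:
                arrived = self.cond.wait_for(
                    lambda: number in self.finished, timeout=self.wait_seconds)
            if arrived:
                return "ready"
            answer = self.query_peer(number)
            if answer is None:
                return "peer_failed"  # to be reported to the management module
            if answer:
                return "ready"
        return "pending"
```

A caller would replay the dependent log on `'ready'`, report the failure on `'peer_failed'`, and call `wait_for` again on `'pending'`.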
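S1509. Replay module 1 notifies the other replay modules of the replay result of the transaction log.
After another replay module receives the replay result, replay of subsequent transaction logs is triggered. The specific process is similar to step S1508 and is not repeated here. Each replay module repeats the steps performed by replay module 1 until all transaction logs have been replayed. When the last transaction log received by the destination end device has been replayed, the replay module that completed the replay of the last transaction log feeds the replay result back to the sending module and the other replay modules. After determining that the last transaction log has been replayed, the sending module may clear its cache of transaction logs.
As the example shown in FIG. 15 indicates, a replay module may fail while replaying transaction logs. To ensure that all transaction logs can be replayed, the embodiments of this application may further perform fault recovery through the source end management module and the destination end management module. Refer to FIG. 16, which is a flowchart of fault recovery performed by the modules. The flowchart is described as follows:
S161. The destination end management module determines that replay module m has failed.
The destination end management module may determine that replay module m has failed in, but not limited to, the following manners:
First determining manner:
Each replay module may send a heartbeat to the destination end management module at a preset period. If the destination end management module does not receive a heartbeat from replay module m within a period, it may determine that replay module m has failed.
Second determining manner:
Another replay module sends a query request to replay module m but receives no response message from replay module m, and reports this to the destination end management module. In this case, the destination end management module may determine that replay module m has failed.
Third determining manner:
The destination end management module may actively query the replay status of the destination end database and determine the number of the last transaction log replayed in the destination end database. If the number of the last replayed transaction log is not the number of the last transaction log received by the destination end device, the destination end management module performs a hash calculation on the number of the last replayed transaction log. If the result of the hash calculation is m, the destination end management module may determine that replay module m has failed.
Certainly, the failure of replay module m may also be determined in other manners, which are not limited here.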
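The first and third determining manners can be sketched in Python as follows. The function names, the dictionary of heartbeat times, and the `route` callback standing in for the hash calculation are assumptions for illustration.

```python
def failed_by_heartbeat(last_heartbeat, now, period):
    """last_heartbeat maps a replay module number to the time its last
    heartbeat arrived. A module with no heartbeat within one period is
    treated as failed."""
    return sorted(m for m, seen in last_heartbeat.items() if now - seen > period)

def suspect_from_replay_state(last_replayed, last_received, route):
    """Third manner: if replay stopped before the last received log, hash the
    number of the last replayed log to obtain the suspected module number."""
    if last_replayed == last_received:
        return None
    return route(last_replayed)

failed = failed_by_heartbeat({1: 10.0, 2: 4.0}, now=10.5, period=5.0)
suspect = suspect_from_replay_state(7, 9, route=lambda n: n % 3 + 1)
```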
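S162. The destination end management module sends a fault notification message to the source end management module and to each replay module.
The fault notification message may include the number of the failed replay module, for example, m, and the number of the transaction log last replayed by the failed replay module, for example, transaction log n.
S163. Each replay module marks replay module m as a failed module.
S164. The source end management module sends the fault notification message to each sending module.
S165. The sending module redistributes transaction logs.
After receiving the fault notification message, the sending module performs a secondary hash calculation on the transaction logs that have already been distributed to replay module m and distributes them to other replay modules. The redistributed transaction logs may carry a "secondary hash resend" flag.
Because every sending module follows the same processing procedure, FIG. 16 is described using one sending module as an example.
S166. Each replay module creates a new replay queue according to the number of the group to which a transaction log belongs and the number of the failed replay module, caches the resent transaction logs, and feeds back a confirmation message to the sending module.
S167. Each replay module performs replay in the destination end database according to the resent transaction logs and the transaction logs received before the resend.
Each replay module may replay the transaction logs in the destination end database in the same manner as described above. Details are not repeated here. If a replay module receives the first transaction log carrying the "secondary hash resend" flag, for example, the transaction log that carries the flag and is numbered n, the replay module may replay that transaction log immediately without waiting.
After redistributing the transaction logs that had already been sent to replay module m, the sending module sends other transaction logs in the normal manner. Certainly, if a transaction log would be distributed to replay module m, the sending module still needs to perform the secondary hash calculation on it so that it is distributed to another replay module. Although the sending module performs the secondary hash calculation, the replay module receives such a transaction log for the first time. Therefore, in this case, a transaction log on which the secondary hash calculation is performed does not need to carry the "secondary hash resend" flag.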
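The redistribution in step S165 and the later routing around the failed module can be sketched in Python as follows. The modulo arithmetic is only a placeholder for the primary and secondary hash calculations, which the application does not specify, and the `rehash_resend` key is a hypothetical encoding of the "secondary hash resend" flag. The sketch assumes that at least one replay module is still alive.

```python
def route(number, module_count, failed=frozenset()):
    """The primary hash picks a module; if that module is marked failed, a
    secondary hash over the surviving modules is used instead."""
    primary = number % module_count + 1
    if primary not in failed:
        return primary
    alive = [m for m in range(1, module_count + 1) if m not in failed]
    return alive[(number // module_count) % len(alive)]

def redistribute(sent_to_failed, module_count, failed):
    """Re-send logs already delivered to a failed module, flagged as resent."""
    return [(n, route(n, module_count, failed), {"rehash_resend": True})
            for n in sent_to_failed]

resent = redistribute([5, 9], module_count=4, failed={2})
```

Logs routed around the failed module for the first time would go through `route` alone, without the flag, in line with the last paragraph above.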
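In the foregoing embodiments provided in this application, to implement the functions in the method provided in the embodiments of this application, the storage system may include a hardware structure and/or a software module, and implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
The division into modules in the embodiments shown in FIG. 3 to FIG. 11 is illustrative and is merely a division by logical function. Other divisions are possible in an actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may exist physically on their own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
FIG. 17 shows a source end device 1700 provided in an embodiment of this application. The source end device 1700 may be a chip system. In the embodiments of this application, a chip system may consist of a chip, or may include a chip and other discrete devices.
The source end device 1700 includes at least one processor 1720, configured to implement, or to support the source end device 1700 in implementing, the functions of the source end device in the method provided in the embodiments of this application. For example, the processor 1720 may obtain at least two groups of transaction logs in parallel from the log file of the source end database. For details, refer to the detailed description in the method examples, which is not repeated here.
The source end device 1700 may further include at least one memory 1730, configured to store program instructions and/or data. The memory 1730 is coupled to the processor 1720. Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules. It may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules. The processor 1720 may operate in cooperation with the memory 1730. The processor 1720 may execute the program instructions stored in the memory 1730. At least one of the at least one memory may be included in the processor.
The source end device 1700 may further include a communication interface 1710, configured to communicate with other devices over a transmission medium, so that the source end device 1700 can communicate with the other devices. For example, the other device may be a storage client or a storage device. The processor 1720 may send and receive data through the communication interface 1710.
The specific connection medium among the communication interface 1710, the processor 1720, and the memory 1730 is not limited in the embodiments of this application. In FIG. 17, the memory 1730, the processor 1720, and the communication interface 1710 are connected by a bus 1740, which is represented by a thick line. The manner of connection between other components is merely illustrative and is not limiting. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 17, but this does not mean that there is only one bus or only one type of bus.
In the embodiments of this application, the processor 1720 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
In the embodiments of this application, the memory 1730 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, and is configured to store program instructions and/or data.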
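FIG. 18 shows a destination end device 1800 provided in an embodiment of this application. The destination end device 1800 may be a chip system. In the embodiments of this application, a chip system may consist of a chip, or may include a chip and other discrete devices.
The destination end device 1800 includes at least one processor 1820, configured to implement, or to support the destination end device 1800 in implementing, the functions of the destination end device in the method provided in the embodiments of this application. For example, the processor 1820 may obtain at least two groups of transaction logs from the source end device and perform transaction log replay in the destination end database according to the transaction logs. For details, refer to the detailed description in the method examples, which is not repeated here.
The destination end device 1800 may further include at least one memory 1830, configured to store program instructions and/or data. The memory 1830 is coupled to the processor 1820. Coupling in the embodiments of this application is an indirect coupling or communication connection between apparatuses, units, or modules. It may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules. The processor 1820 may operate in cooperation with the memory 1830. The processor 1820 may execute the program instructions stored in the memory 1830. At least one of the at least one memory may be included in the processor.
The destination end device 1800 may further include a communication interface 1810, configured to communicate with other devices over a transmission medium, so that the destination end device 1800 can communicate with the other devices. For example, the other device may be a storage client or a storage device. The processor 1820 may send and receive data through the communication interface 1810.
The specific connection medium among the communication interface 1810, the processor 1820, and the memory 1830 is not limited in the embodiments of this application. In FIG. 18, the memory 1830, the processor 1820, and the communication interface 1810 are connected by a bus 1840, which is represented by a thick line. The manner of connection between other components is merely illustrative and is not limiting. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 18, but this does not mean that there is only one bus or only one type of bus.
In the embodiments of this application, the processor 1820 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in the embodiments of this application may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
In the embodiments of this application, the memory 1830 may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, such as a random-access memory (RAM). The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of this application may also be a circuit or any other apparatus capable of implementing a storage function, and is configured to store program instructions and/or data.
An embodiment of this application further provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method performed by the server side in the embodiments shown in FIG. 12 to FIG. 16.
An embodiment of this application further provides a computer program product including instructions that, when run on a computer, cause the computer to perform the method performed by the server side in the embodiments shown in FIG. 12 to FIG. 16.
An embodiment of this application provides a chip system. The chip system includes a processor and may further include a memory, and is configured to implement the functions of the source end device or the destination end device in the foregoing method. The chip system may consist of a chip, or may include a chip and other discrete devices.
The method provided in the embodiments of this application may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When software is used, the method may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or in a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, an SSD), or the like.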

Claims (34)

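  1. A database replication system, comprising a source end device and a destination end device, wherein the database replication system is configured to perform replay in a destination end database according to at least two groups of transaction logs included in a log file of a source end database, wherein:
    the source end device is configured to obtain the at least two groups of transaction logs in parallel from the log file of the source end database, the at least two groups of transaction logs comprising a first group of transaction logs and a second group of transaction logs, and to send the at least two groups of transaction logs, wherein the first group of transaction logs comprises at least a first transaction log and a second transaction log that are adjacent, the second group of transaction logs comprises at least a third transaction log and a fourth transaction log that are adjacent, and the second transaction log is generated earlier than the third transaction log; and
    the destination end device is configured to receive the at least two groups of transaction logs and, after performing transaction replay in the destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and a dependency between the first transaction log and the second transaction log, to perform transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and a dependency between the third transaction log and the fourth transaction log, so that data stored in the destination end database is consistent with data stored in the source end database.
  2. The database replication system according to claim 1, wherein
    the source end device is further configured to: when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, record a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log; and
    the source end device is further configured to: when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, record a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.
  3. The database replication system according to claim 1, wherein
    the destination end device is further configured to: when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, record a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log; and
    the destination end device is further configured to: when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, record a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.
  4. The database replication system according to claim 2 or 3, wherein
    when obtaining the second transaction log in the first group of transaction logs, the destination end device determines that the second transaction log records the number of the first transaction log indicating the dependency between the first transaction log and the second transaction log, and, after determining that the transaction replay according to the first transaction log is complete, performs transaction replay according to the second transaction log.
  5. The database replication system according to claim 2 or 3, wherein
    when obtaining the first transaction log in the first group of transaction logs, the destination end device determines that the first transaction log does not record a number of a transaction log on which the first transaction log depends, and performs transaction replay according to the first transaction log.
  6. The database replication system according to claim 2 or 3, wherein
    when obtaining the fourth transaction log in the second group of transaction logs, the destination end device determines that the fourth transaction log records the number of the third transaction log indicating the dependency between the fourth transaction log and the third transaction log, and, after determining that the transaction replay according to the third transaction log is complete, performs transaction replay according to the fourth transaction log.
  7. The database replication system according to claim 2 or 3, wherein
    when obtaining the third transaction log in the second group of transaction logs, the destination end device determines that the third transaction log does not record a number of a transaction log on which the third transaction log depends, and performs transaction replay according to the third transaction log.
  8. The database replication system according to any one of claims 1 to 7, wherein the source end device and the source end database are located in a first region, the destination end device and the destination end database are located in a second region, and the first region and the second region are remotely connected.
  9. The database replication system according to any one of claims 1 to 8, wherein
    the source end device is configured to obtain the at least two groups of transaction logs in parallel from the source end database according to number ranges of transaction logs.
  10. The database replication system according to claim 9, wherein the source end device is further configured to:
    read log summary record information from the source end database, wherein the log summary record information records the number of each transaction log generated by the source end database, its recording position in the log file, its length, and the quantity of transaction logs; and
    obtain the at least two groups of transaction logs in parallel from the log file according to the log summary record information.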
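  11. A database replication method, comprising:
    obtaining at least two groups of transaction logs in parallel from a log file of a source end database, the at least two groups of transaction logs comprising a first group of transaction logs and a second group of transaction logs, and sending the at least two groups of transaction logs, wherein the first group of transaction logs comprises at least a first transaction log and a second transaction log that are adjacent, the second group of transaction logs comprises at least a third transaction log and a fourth transaction log that are adjacent, and the second transaction log is generated earlier than the third transaction log; and
    sending the at least two groups of transaction logs to a destination end device.
  12. The method according to claim 11, further comprising:
    when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, recording a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates a dependency between the first transaction log and the second transaction log; and
    when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, recording a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates a dependency between the third transaction log and the fourth transaction log.
  13. The method according to claim 11 or 12, wherein obtaining the at least two groups of transaction logs in parallel from the source end database comprises:
    obtaining the at least two groups of transaction logs in parallel from the source end database according to number ranges of transaction logs.
  14. The method according to claim 13, wherein obtaining the at least two groups of transaction logs in parallel from the source end database according to number ranges of transaction logs comprises:
    reading log summary record information from the source end database, wherein the log summary record information records the number of each transaction log generated by the source end database, its recording position in the log file, its length, and the quantity of transaction logs; and
    obtaining the at least two groups of transaction logs in parallel from the log file according to the log summary record information.
  15. A database replication method, comprising:
    receiving at least two groups of transaction logs from a source end device, the at least two groups of transaction logs comprising a first group of transaction logs and a second group of transaction logs, wherein the first group of transaction logs comprises at least a first transaction log and a second transaction log that are adjacent, the second group of transaction logs comprises at least a third transaction log and a fourth transaction log that are adjacent, and the second transaction log is generated earlier than the third transaction log; and
    after performing transaction replay in a destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and a dependency between the first transaction log and the second transaction log, performing transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and a dependency between the third transaction log and the fourth transaction log, so that data stored in the destination end database is consistent with data stored in the source end database.
  16. The method according to claim 15, further comprising:
    when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, recording a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log; and
    when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, recording a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.
  17. The method according to claim 15 or 16, wherein performing transaction replay in the destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency between the first transaction log and the second transaction log comprises:
    when obtaining the first transaction log, determining that the first transaction log does not record a number of a transaction log on which the first transaction log depends, and performing transaction replay according to the first transaction log.
  18. The method according to claim 15 or 16, wherein performing transaction replay in the destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and the dependency between the first transaction log and the second transaction log comprises:
    when obtaining the second transaction log, determining that the second transaction log records the number of the first transaction log indicating the dependency between the first transaction log and the second transaction log, and, after determining that the transaction replay according to the first transaction log is complete, performing transaction replay according to the second transaction log.
  19. The method according to claim 15 or 16, wherein performing transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency between the third transaction log and the fourth transaction log comprises:
    when obtaining the third transaction log, determining that the third transaction log does not record a number of a transaction log on which the third transaction log depends, and performing transaction replay according to the third transaction log.
  20. The method according to claim 15 or 16, wherein performing transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and the dependency between the third transaction log and the fourth transaction log comprises:
    when obtaining the fourth transaction log, determining that the fourth transaction log records the number of the third transaction log indicating the dependency between the fourth transaction log and the third transaction log, and, after determining that the transaction replay according to the third transaction log is complete, performing transaction replay according to the fourth transaction log.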
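  21. A source end device, comprising:
    a processing module, configured to obtain at least two groups of transaction logs in parallel from a log file of a source end database, the at least two groups of transaction logs comprising a first group of transaction logs and a second group of transaction logs, and to send the at least two groups of transaction logs, wherein the first group of transaction logs comprises at least a first transaction log and a second transaction log that are adjacent, the second group of transaction logs comprises at least a third transaction log and a fourth transaction log that are adjacent, and the second transaction log is generated earlier than the third transaction log; and
    a sending module, configured to send the at least two groups of transaction logs to a destination end device.
  22. The device according to claim 21, wherein the processing module is further configured to:
    when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, record a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates a dependency between the first transaction log and the second transaction log; and
    when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, record a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates a dependency between the third transaction log and the fourth transaction log.
  23. The device according to claim 21 or 22, wherein the processing module is specifically configured to:
    obtain the at least two groups of transaction logs in parallel from the source end database according to number ranges of transaction logs.
  24. The device according to claim 23, wherein the processing module is specifically configured to:
    read log summary record information from the source end database, wherein the log summary record information records the number of each transaction log generated by the source end database, its recording position in the log file, its length, and the quantity of transaction logs; and
    obtain the at least two groups of transaction logs in parallel from the log file according to the log summary record information.
  25. A destination end device, comprising:
    a receiving module, configured to receive at least two groups of transaction logs from a source end device, the at least two groups of transaction logs comprising a first group of transaction logs and a second group of transaction logs, wherein the first group of transaction logs comprises at least a first transaction log and a second transaction log that are adjacent, the second group of transaction logs comprises at least a third transaction log and a fourth transaction log that are adjacent, and the second transaction log is generated earlier than the third transaction log; and
    a processing module, configured to: after performing transaction replay in a destination end database according to the first transaction log and the second transaction log in the first group of transaction logs and a dependency between the first transaction log and the second transaction log, perform transaction replay in the destination end database according to the third transaction log and the fourth transaction log in the second group of transaction logs and a dependency between the third transaction log and the fourth transaction log, so that data stored in the destination end database is consistent with data stored in the source end database.
  26. The device according to claim 25, wherein the processing module is further configured to:
    when determining that an operation object, in the source end database, of a first transaction operation recorded in the first transaction log is the same as an operation object, in the source end database, of a second transaction operation recorded in the second transaction log, and that an operation time of the first transaction operation in the source end database is earlier than an operation time of the second transaction operation in the source end database, record a number of the first transaction log in the second transaction log, wherein the number of the first transaction log indicates the dependency between the first transaction log and the second transaction log; and
    when determining that an operation object, in the source end database, of a third transaction operation recorded in the third transaction log is the same as an operation object, in the source end database, of a fourth transaction operation recorded in the fourth transaction log, and that an operation time of the third transaction operation in the source end database is earlier than an operation time of the fourth transaction operation in the source end database, record a number of the third transaction log in the fourth transaction log, wherein the number of the third transaction log indicates the dependency between the third transaction log and the fourth transaction log.
  27. The device according to claim 25 or 26, wherein the processing module is specifically configured to:
    when obtaining the first transaction log, determine that the first transaction log does not record a number of a transaction log on which the first transaction log depends, and perform transaction replay according to the first transaction log.
  28. The device according to claim 25 or 26, wherein the processing module is specifically configured to:
    when obtaining the second transaction log, determine that the second transaction log records the number of the first transaction log indicating the dependency between the first transaction log and the second transaction log, and, after determining that the transaction replay according to the first transaction log is complete, perform transaction replay according to the second transaction log.
  29. The device according to claim 25 or 26, wherein the processing module is specifically configured to:
    when obtaining the third transaction log, determine that the third transaction log does not record a number of a transaction log on which the third transaction log depends, and perform transaction replay according to the third transaction log.
  30. The device according to claim 25 or 26, wherein the processing module is specifically configured to:
    when obtaining the fourth transaction log, determine that the fourth transaction log records the number of the third transaction log indicating the dependency between the fourth transaction log and the third transaction log, and, after determining that the transaction replay according to the third transaction log is complete, perform transaction replay according to the fourth transaction log.
  31. A source end device, comprising:
    a memory, configured to store instructions; and
    a processor, configured to execute the instructions in the memory, so that the source end device performs the method according to any one of claims 11 to 14.
  32. A destination end device, comprising:
    a memory, configured to store instructions; and
    a processor, configured to execute the instructions in the memory, so that the destination end device performs the method according to any one of claims 15 to 20.
  33. A computer storage medium, wherein the computer storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 11 to 14 or 15 to 20.
  34. A computer program product, wherein the computer program product stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 11 to 14 or 15 to 20.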
PCT/CN2021/077476 2020-02-28 2021-02-23 Database replication system and method, source end device, and destination end device WO2021169955A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21759531.3A EP4095714B1 (en) 2020-02-28 2021-02-23 Database replication system and method, and source device and destination device
US17/894,352 US20220405306A1 (en) 2020-02-28 2022-08-24 Database replication system and method, source end device, and destination end device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010129105.2 2020-02-28
CN202010129105 2020-02-28
CN202010383462.1 2020-05-08
CN202010383462.1A CN113326315A (zh) 2020-02-28 2020-05-08 Database replication system and method, source end device, and destination end device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/894,352 Continuation US20220405306A1 (en) 2020-02-28 2022-08-24 Database replication system and method, source end device, and destination end device

Publications (1)

Publication Number Publication Date
WO2021169955A1 (zh) 2021-09-02

Family

ID=77413055

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/077476 WO2021169955A1 (zh) 2020-02-28 2021-02-23 Database replication system and method, source end device, and destination end device

Country Status (4)

Country Link
US (1) US20220405306A1 (zh)
EP (1) EP4095714B1 (zh)
CN (1) CN113326315A (zh)
WO (1) WO2021169955A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777270A (zh) * 2016-12-28 2017-05-31 中国民航信息网络股份有限公司 一种基于提交点时间线同步的异构数据库复制并行执行系统及方法
CN107678888A (zh) * 2017-09-30 2018-02-09 北京九桥同步软件有限公司 数据库数据备份方法及装置
CN109189608A (zh) * 2018-08-13 2019-01-11 武汉达梦数据库有限公司 一种保证复制事务一致性的方法以及相应的复制装置
US20190303470A1 (en) * 2018-04-03 2019-10-03 Sap Se Database change capture with transaction-consistent order

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9727625B2 (en) * 2014-01-16 2017-08-08 International Business Machines Corporation Parallel transaction messages for database replication
US9959178B2 (en) * 2014-11-25 2018-05-01 Sap Se Transactional and parallel log replay for asynchronous table replication
US10762107B2 (en) * 2016-11-29 2020-09-01 Sap Se Synchronization mechanism for serialized data log replay in database systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4095714A4 *

Also Published As

Publication number Publication date
US20220405306A1 (en) 2022-12-22
CN113326315A (zh) 2021-08-31
EP4095714B1 (en) 2024-05-15
EP4095714A1 (en) 2022-11-30
EP4095714A4 (en) 2023-05-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21759531

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021759531

Country of ref document: EP

Effective date: 20220825

NENP Non-entry into the national phase

Ref country code: DE