WO2023116419A1 - Data synchronization method, device and computer-readable storage medium - Google Patents


Info

Publication number
WO2023116419A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
machine
redo
standby
redo log
Application number
PCT/CN2022/136956
Inventor
周亚运
盛夏
付裕
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2023116419A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning

Definitions

  • The present application relates to, but is not limited to, the technical field of data processing, and in particular to a data synchronization method, a device, and a computer-readable storage medium.
  • Database replication generally takes one of two forms: the first is logical replication based on logical logs, and the second is physical replication based on physical logs.
  • The data nodes of the current GoldenDB (financial-grade transactional) distributed database are built on an open-source database such as MySQL (a relational database management system).
  • These data nodes use logical replication for data synchronization, but the synchronization performance between the active and standby data nodes is usually low, which lowers the overall performance of the distributed database. In addition, the high replay delay of the standby node lengthens the active/standby switchover time and reduces the high availability of the distributed database.
  • Embodiments of the present application provide a data synchronization method, a device, and a computer-readable storage medium.
  • In a first aspect, an embodiment of the present application provides a data synchronization method applied to a master machine that communicates with multiple standby machines, the standby machines including synchronous standby machines and asynchronous standby machines.
  • The method includes: receiving a database request from a terminal; generating a redo log according to the database request and synchronously replicating the redo log to multiple synchronous standby machines; generating a binary log according to the database request and asynchronously replicating the binary log to multiple asynchronous standby machines; and, after receiving an acknowledgment based on the redo log from each synchronous standby machine, receiving the next database request from the terminal.
  • In a second aspect, an embodiment of the present application further provides a data synchronization method applied to multiple standby machines that communicate with the master machine, the standby machines including synchronous standby machines and asynchronous standby machines.
  • The method includes: the synchronous standby machine receives the redo log from the master machine and replays it, the redo log having been generated by the master machine based on a database request from a terminal; the synchronous standby machine sends an acknowledgment to the master machine so that the master machine can receive the next database request from the terminal; and the asynchronous standby machine receives the binary log from the master machine and replays it, the binary log having been generated by the master machine based on the terminal's database request.
  • An embodiment of the present application further provides a data synchronization device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements at least one of the following: the data synchronization method of any embodiment of the first aspect, or the data synchronization method of any embodiment of the second aspect.
  • An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, which are used to execute at least one of the following: the data synchronization method of any embodiment of the first aspect, or the data synchronization method of any embodiment of the second aspect.
  • FIG. 1 is a schematic flowchart of the master-machine side of the data synchronization method according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of generating a redo log according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of generating the last redo log according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of synchronous replication to the synchronous standby machines according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of the acknowledgment response according to an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of generating a binary log according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of the standby-machine side of the data synchronization method according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of replaying redo logs according to an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of replaying binary logs according to an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a data synchronization method according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the overall architecture of the distributed database according to an embodiment of the present application.
  • MySQL is a relational database management system originally developed by the Swedish company MySQL AB and now owned by Oracle.
  • MySQL, which uses SQL (Structured Query Language, the most commonly used standard language for accessing databases), mainly adopts the first type, logical replication based on logical logs; its logical log is the binlog (i.e., the binary log).
  • In this logical replication method, a transaction's logical log is generated at commit time, written to the binlog (binary log) file and flushed to disk, and then synchronized to the standby node.
  • PostgreSQL is a full-featured, free-software object-relational database management system based on POSTGRES version 4.2, which was developed by the Department of Computer Science at the University of California, Berkeley.
  • PostgreSQL adopts the second type, physical replication based on physical logs, where the physical log is the redo log.
  • In this physical replication method, redo logs are generated incrementally and continuously during transaction execution and written to the redo log file; when the transaction commits, the last redo log is written and flushed to disk. Because the physical log is synchronized to the standby node continuously while it is being generated, the overall synchronization performance is relatively high.
  • However, this physical replication method supports limited scenarios: replication compatibility between different versions is poor, and physical replication cannot be performed between different database systems.
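To make the timing difference between the two methods concrete, here is a minimal Python sketch that counts how many log bytes are still unshipped at the moment a transaction commits. It assumes, purely for illustration, that a transaction's logical and physical records have the same sizes and that every physical record except the commit-time one has already been streamed; the function name and record sizes are hypothetical.

```python
def bytes_pending_at_commit(record_sizes: list[int], mode: str) -> int:
    """Bytes that still have to reach the standby when the transaction commits.

    record_sizes lists the size of each log record the transaction produces,
    in order; the last entry is the record written during commit.
    """
    if mode == "logical":
        # Binlog-style: the transaction's logical log is written and shipped
        # only at commit, so every byte is still pending.
        return sum(record_sizes)
    if mode == "physical":
        # Redo-style: earlier records were streamed while the transaction ran,
        # so only the commit-time record is still pending.
        return record_sizes[-1] if record_sizes else 0
    raise ValueError(f"unknown mode: {mode!r}")


sizes = [4096, 4096, 8192, 512]  # hypothetical record sizes in bytes
print(bytes_pending_at_commit(sizes, "logical"))   # 16896
print(bytes_pending_at_commit(sizes, "physical"))  # 512
```

The gap grows with transaction size, which is why streaming the physical log during execution shortens the wait at commit.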
  • The data nodes of the current GoldenDB (financial-grade transactional) distributed database are built on an open-source database such as MySQL and use logical replication based on logical logs (i.e., binlogs) to synchronize data between the active and standby data nodes.
  • As a result, the synchronization performance between the active and standby data nodes is usually low, which in turn lowers the overall performance of the distributed database.
  • In addition, the high replay delay of the standby node lengthens the master/standby switchover time of the distributed database, reducing its high availability.
  • An embodiment of the present application provides a data synchronization method that can improve the synchronization performance between the active and standby data nodes and reduce the replay delay of the standby nodes.
  • The embodiment provides a data synchronization method applied to the master machine, which communicates with multiple standby machines.
  • The standby machines include synchronous standby machines and asynchronous standby machines.
  • The method includes, but is not limited to, the following steps:
  • Step S100: receiving a database request from a terminal;
  • Step S110: generating a redo log according to the database request, and synchronously replicating the redo log to multiple synchronous standby machines;
  • Step S120: generating a binary log according to the database request, and asynchronously replicating the binary log to multiple asynchronous standby machines;
  • Step S130: after receiving an acknowledgment based on the redo log from each synchronous standby machine, receiving the next database request from the terminal.
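Steps S100 to S130 can be sketched as a single-process Python model. This is a minimal illustration, not the embodiment's implementation: the class names, the string log format, and the in-memory `binlog_outbox` standing in for the asynchronous sender are all hypothetical. It shows the key property that the next request is only taken after every synchronous standby has acknowledged, while asynchronous standbys are allowed to lag.

```python
from dataclasses import dataclass, field


@dataclass
class SyncStandby:
    name: str
    redo: list[str] = field(default_factory=list)

    def receive_redo(self, record: str) -> str:
        self.redo.append(record)  # a real node would persist and replay here
        return "ACK"


@dataclass
class AsyncStandby:
    name: str
    binlog: list[str] = field(default_factory=list)


@dataclass
class Master:
    sync_standbys: list[SyncStandby]
    async_standbys: list[AsyncStandby]
    binlog_outbox: list[str] = field(default_factory=list)  # drained in the background

    def handle(self, request: str) -> None:
        # S110: generate redo and replicate it to every synchronous standby.
        redo = f"redo({request})"
        acks = [s.receive_redo(redo) for s in self.sync_standbys]
        # S120: generate the binlog and queue it for asynchronous replication.
        self.binlog_outbox.append(f"binlog({request})")
        # S130: the next request may be taken only after every standby has ACKed.
        if any(a != "ACK" for a in acks):
            raise RuntimeError("a synchronous standby did not acknowledge")

    def serve(self, requests: list[str]) -> None:
        for request in requests:  # S100: receive the next database request
            self.handle(request)

    def drain_binlog(self) -> None:
        # Runs independently of serve(): asynchronous standbys may lag behind.
        while self.binlog_outbox:
            entry = self.binlog_outbox.pop(0)
            for s in self.async_standbys:
                s.binlog.append(entry)


db2, db4 = SyncStandby("DB2-slave"), SyncStandby("DB4-slave")
db3 = AsyncStandby("DB3-slave")
master = Master([db2, db4], [db3])
master.serve(["INSERT 1", "INSERT 2"])
print(db2.redo)    # ['redo(INSERT 1)', 'redo(INSERT 2)']
print(db3.binlog)  # []
master.drain_binlog()
print(db3.binlog)  # ['binlog(INSERT 1)', 'binlog(INSERT 2)']
```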
  • The database request in this embodiment may be an SQL request.
  • The database request is sent to the master machine by the service end, and the master machine receives it from the service end.
  • The terminal may be a mobile terminal device or a non-mobile terminal device.
  • Mobile terminal devices include mobile phones, tablet computers, notebook computers, handheld computers, vehicle-mounted terminal devices, wearable devices, ultra-mobile personal computers, netbooks, personal digital assistants, CPE, and UFI (wireless hotspot) devices; non-mobile terminal devices include personal computers, televisions, teller machines, and self-service kiosks. This is not specifically limited in this embodiment.
  • The master machine generates redo logs and binary logs (i.e., binlogs) according to the database request.
  • The redo logs are generated incrementally and continuously during transaction execution, and the generated redo logs are synchronously replicated to multiple synchronous standby machines; the transaction's logical log (i.e., the binary log) is generated at transaction commit and then asynchronously replicated to multiple asynchronous standby machines.
  • After receiving the redo logs from the master machine, each synchronous standby machine returns an acknowledgment to the master machine based on those redo logs. Once the master machine has received acknowledgments from every synchronous standby machine, it receives the next database request from the terminal, i.e., returns to step S100 and executes steps S100 to S130 again to synchronize the data for the next database request.
  • This embodiment is applied to the master machine, which communicates with multiple standby machines to implement the data synchronization method.
  • On the one hand, synchronously replicating the redo log to multiple synchronous standby machines effectively improves the synchronization performance between the active and standby data nodes, and thus the overall performance of the distributed database; on the other hand, asynchronously replicating the binary log to multiple asynchronous standby machines effectively reduces the replay delay of the standby nodes and improves the high availability of the distributed database.
  • Generating the redo log according to the database request includes, but is not limited to, the following step:
  • Step S111: executing the database request, and generating redo logs during its execution.
  • This embodiment implements data synchronization between the master machine and multiple synchronous standby machines through synchronous replication based on the physical log (i.e., the redo log).
  • After receiving the database request from the terminal, the master machine executes it and continuously generates redo logs during execution, so that the generated redo logs can be synchronously replicated to multiple synchronous standby machines. Because the redo log is synchronized to the synchronous standby machines continuously while it is being generated, the synchronization performance between the active and standby data nodes is relatively high.
  • After step S111 (executing the database request and generating redo logs during its execution), the method further includes, but is not limited to, the following steps:
  • Step S112: committing the transaction after execution of the database request is complete;
  • Step S113: generating the last redo log during the transaction commit.
  • The master machine generates redo logs during execution of the database request and synchronously replicates them to multiple synchronous standby machines.
  • After execution of the database request completes, the transaction commit begins, and during the commit the last redo log is written, generating the last redo log. In this way, the master machine can synchronously replicate the redo log to the synchronous standby machines.
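Steps S111 to S113 map naturally onto a generator: redo records are yielded (and can be shipped) while statements execute, and exactly one final record is yielded at commit. This is a minimal sketch; the `(txn_id, payload, is_last)` record format and the function name are hypothetical, not the embodiment's on-disk format.

```python
from typing import Iterator

RedoRecord = tuple[int, str, bool]  # (transaction id, payload, is_last)


def execute_with_redo(txn_id: int, statements: list[str]) -> Iterator[RedoRecord]:
    for stmt in statements:
        # S111: each statement emits redo while it executes, so the record can be
        # shipped to the synchronous standbys before the transaction finishes.
        yield (txn_id, f"change:{stmt}", False)
    # S112/S113: committing the transaction writes the last redo record.
    yield (txn_id, "commit", True)


shipped: list[RedoRecord] = []
for record in execute_with_redo(7, ["UPDATE a", "UPDATE b"]):
    shipped.append(record)  # stand-in for sending to each synchronous standby
print(shipped[-1])  # (7, 'commit', True)
```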
  • After steps S112 and S113, and with reference to FIG. 4, synchronously replicating the redo log to multiple synchronous standby machines includes, but is not limited to, the following steps:
  • Step S114: synchronously replicating the last redo log to multiple synchronous standby machines.
  • The master machine synchronously replicates the last redo log to multiple synchronous standby machines; after receiving it, each synchronous standby machine returns an acknowledgment to the master machine based on the last redo log. After receiving the acknowledgment from every synchronous standby machine, the master machine executes steps S100 to S130 again.
  • Step S115: synchronously waiting for an acknowledgment based on the redo log from each synchronous standby machine.
  • This embodiment adopts synchronous replication based on the physical log (i.e., the redo log): the master machine synchronously replicates the redo logs generated for the terminal's database request to multiple synchronous standby machines, and each synchronous standby machine returns an acknowledgment based on the redo log. During this process, the master machine waits synchronously for the acknowledgment from each synchronous standby machine and, after receiving all of them, executes steps S100 to S130 again. This makes data synchronization between the active and standby data nodes simpler and allows it to be processed nearby, improving the synchronization performance between them.
  • In step S115, the master machine may also wait synchronously for acknowledgments based on the last redo log from each synchronous standby machine.
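Steps S114 and S115 amount to "send to all, wait for all". The sketch below uses Python's `concurrent.futures` to send the last redo record to every synchronous standby in parallel and returns only once each has answered `"ACK"`. The transport is simulated: `fake_standby` and the `SendFn` callable type are hypothetical stand-ins for a network send, and the sketch assumes at least one synchronous standby (a `ThreadPoolExecutor` cannot have zero workers).

```python
import concurrent.futures as cf
import time
from typing import Callable

SendFn = Callable[[bytes], str]  # hypothetical transport: returns "ACK" on success


def replicate_and_wait(record: bytes, standbys: dict[str, SendFn]) -> list[str]:
    """S114/S115 sketch: send one redo record to all synchronous standbys in
    parallel and block until every one of them has answered."""
    with cf.ThreadPoolExecutor(max_workers=len(standbys)) as pool:
        futures = {name: pool.submit(send, record) for name, send in standbys.items()}
        cf.wait(futures.values())  # S115: wait for all replies, not just the first
    failed = [
        name
        for name, fut in futures.items()
        if fut.exception() is not None or fut.result() != "ACK"
    ]
    if failed:
        raise RuntimeError(f"no ACK from: {sorted(failed)}")
    return sorted(futures)


def fake_standby(delay_s: float, reply: str = "ACK") -> SendFn:
    def send(record: bytes) -> str:
        time.sleep(delay_s)  # simulated network round trip plus log flush
        return reply
    return send


acked = replicate_and_wait(
    b"last-redo",
    {"DB2-slave": fake_standby(0.01), "DB4-slave": fake_standby(0.02), "DB5-slave": fake_standby(0.03)},
)
print(acked)  # ['DB2-slave', 'DB4-slave', 'DB5-slave']
```

Because the sends run in parallel, the commit waits roughly as long as the slowest synchronous standby rather than the sum of all of them.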
  • Generating the binary log according to the database request includes, but is not limited to, the following steps:
  • Step S121: generating binary data according to the database request;
  • Step S122: copying the binary data into the global log cache;
  • Step S123: asynchronously writing the binary data in the global log cache into the binary log file to generate the binary log.
  • After step S110 (generating the redo log according to the database request and synchronously replicating it to multiple synchronous standby machines), the master machine triggers asynchronous writing of the binary log, i.e., executes steps S121 to S123.
  • Binary data is generated according to the database request and copied into the global log cache; a background thread of the master machine then asynchronously writes the binary data in the global log cache into the binary log file to generate the binary log, so that the binary log is replicated asynchronously to multiple asynchronous standby machines.
  • After step S121 (generating binary data according to the database request), the size of this transaction's binary data can also be obtained.
  • Log space corresponding to that size is reserved in the global log cache, and the current binary data is copied into the global log cache according to the size of this transaction's binary data.
  • Step S123 may also be executed asynchronously, which effectively improves data processing efficiency.
  • Alternatively, the binary data may be copied directly into the global log cache, overwriting the previous binary data; this is not specifically limited here.
  • The master machine then finishes processing and returns a response to the client.
  • The master machine in this embodiment uses asynchronous replication based on a lock-free logical log (i.e., binary log).
  • Implementing asynchronous binary log writing on the master node in a lock-free manner releases occupied resources such as threads, avoids blocking, and improves response efficiency.
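Steps S121 to S123 can be sketched as a shared cache that committing threads append to and return from immediately, while one background thread moves the cached bytes into the binary log file. This is a simplified sketch under a stated swap-in: the text describes a lock-free global log cache, whereas this Python version guards the cache with a `threading.Condition` for clarity. Reserving `len(data)` bytes plays the role of the size-based space reservation, and the overwrite variant is not modeled. The class name is hypothetical.

```python
import io
import threading


class GlobalLogCache:
    """Sketch of S121-S123: committers copy binary data into a shared cache and
    return immediately; one background thread appends the cache to the binlog file."""

    def __init__(self, binlog_file: io.BufferedIOBase):
        self._buf = bytearray()
        self._file = binlog_file
        self._closed = False
        # Swap-in: a condition variable replaces the lock-free scheme the text describes.
        self._cond = threading.Condition()
        self._writer = threading.Thread(target=self._write_loop, daemon=True)
        self._writer.start()

    def append(self, data: bytes) -> None:
        with self._cond:
            # S121/S122: len(data) bytes are reserved and the data is copied in.
            self._buf += data
            self._cond.notify()

    def _write_loop(self) -> None:
        while True:
            with self._cond:
                while not self._buf and not self._closed:
                    self._cond.wait()
                if not self._buf:  # closed and nothing left to write
                    return
                chunk = bytes(self._buf)
                self._buf.clear()
            # S123: the file write happens outside the lock, off the commit path.
            self._file.write(chunk)
            self._file.flush()

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._writer.join()


out = io.BytesIO()
cache = GlobalLogCache(out)
cache.append(b"txn1;")
cache.append(b"txn2;")
cache.close()  # waits for the background writer to drain the cache
print(out.getvalue())  # b'txn1;txn2;'
```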
  • An embodiment of the present application further provides a data synchronization method applied to multiple standby machines, which communicate with the master machine and include synchronous standby machines and asynchronous standby machines. The method includes, but is not limited to, the following steps:
  • Step S200: the synchronous standby machine receives the redo log from the master machine and replays it, the redo log having been generated by the master machine based on the terminal's database request;
  • Step S210: the synchronous standby machine sends an acknowledgment to the master machine, so that the master machine can receive the next database request from the terminal;
  • Step S220: the asynchronous standby machine receives the binary log from the master machine and replays it, the binary log having been generated by the master machine based on the terminal's database request.
  • This embodiment is applied to multiple standby machines: the master machine communicates with multiple synchronous standby machines and multiple asynchronous standby machines respectively to implement the data synchronization method.
  • The master machine generates redo logs and binary logs according to database requests. Correspondingly, the synchronous standby machine receives the redo logs synchronously replicated from the master machine, writes them into the redo log file, replays them, and then sends an acknowledgment to the master machine based on the redo logs.
  • The master machine receives the redo-log-based acknowledgments from the synchronous standby machines and then receives the next database request, such as an SQL request, from the terminal.
  • The asynchronous standby machine receives the binary log asynchronously replicated from the master machine and replays it.
  • Multiple synchronous standby machines and multiple asynchronous standby machines communicate with the master machine respectively, combining synchronous replication based on physical logs (i.e., redo logs) with asynchronous replication based on logical logs (i.e., binary logs).
  • Compared with related technologies, this data synchronization method effectively improves the synchronization performance between the active and standby data nodes, and thus the overall performance of the distributed database, while also reducing the replay delay of the standby nodes and improving the high availability of the distributed database.
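The two standby roles differ in what they replay: the synchronous standby applies physical changes to data pages and acknowledges, while the asynchronous standby applies logical row events. The sketch below is a minimal Python illustration of steps S200 to S220; representing a redo record as a full page image and a binlog event as an `(op, key, value)` triple are hypothetical simplifications, as are the class names.

```python
from dataclasses import dataclass, field


@dataclass
class SyncStandbyNode:
    redo_file: list[tuple[int, bytes]] = field(default_factory=list)
    pages: dict[int, bytes] = field(default_factory=dict)

    def on_redo(self, page_id: int, page_image: bytes) -> str:
        self.redo_file.append((page_id, page_image))  # S200: write to the redo log file
        self.pages[page_id] = page_image              # S200: replay onto the data page
        return "ACK"                                  # S210: acknowledge to the master


@dataclass
class AsyncStandbyNode:
    rows: dict[str, str] = field(default_factory=dict)

    def on_binlog(self, op: str, key: str, value: str = "") -> None:
        # S220: replay a logical row event rather than a page change.
        if op == "DELETE":
            self.rows.pop(key, None)
        else:  # INSERT or UPDATE
            self.rows[key] = value


sync_node, async_node = SyncStandbyNode(), AsyncStandbyNode()
print(sync_node.on_redo(42, b"page-42-v2"))  # ACK
async_node.on_binlog("INSERT", "k1", "v1")
async_node.on_binlog("DELETE", "k1")
print(async_node.rows)  # {}
```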
  • Replaying the redo log includes, but is not limited to, the following steps:
  • Step S201: the synchronous standby machine parses the current redo log block to obtain a redo parsing result, where the current redo log block is a redo log block of a preset parsing size;
  • Step S202: the synchronous standby machine distributes the redo parsing result to different worker threads;
  • Step S203: parsing of the current redo log block is completed.
  • After step S203, it is determined whether replay of the previous redo log block has completed:
  • if replay of the previous redo log block has not completed, the synchronous standby machine waits until it completes and then parses the next redo log block.
  • The synchronous standby machine in this embodiment uses parallel replay based on physical log blocks (i.e., redo log blocks).
  • After receiving the redo logs from the master machine, the synchronous standby machine's parsing thread parses them in redo log blocks of the preset parsing size; that is, it parses the current redo log block to obtain the redo parsing result.
  • The preset parsing size may be 2M, i.e., the current redo log block is 2M in size.
  • Redo log blocks of other parsing sizes may also be used; this is not specifically limited here.
  • The distribution thread of the synchronous standby machine distributes the redo parsing results to different worker threads; for example, the results are distributed to worker threads by page for application.
  • Multiple worker threads each take application log tasks and apply the corresponding physical log (i.e., redo log) to the data pages. For example, worker thread 1 takes an application log task and applies redo logs, worker thread 2 does the same, and so on up to worker thread n (n is an integer).
  • Once the parsing thread of the synchronous standby machine finishes a redo log block of the preset parsing size (for example, 2M), i.e., once parsing of the current redo log block is complete, it can parse the next redo log block without waiting for replay of the current block to complete.
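Distributing parsed redo records "by page" means every record for a given page goes to the same worker, so workers never touch the same page and per-page log order is preserved. The sketch below routes records with `page_id % n_workers`, which is one simple choice the text does not specify; the record tuple format and function names are hypothetical. Each worker builds its own result dictionary, and the results are merged afterwards because the page sets are disjoint.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

Record = tuple[int, str]  # (page id, change); hypothetical parsed redo record


def dispatch_by_page(records: list[Record], n_workers: int) -> dict[int, list[Record]]:
    """S202 sketch: route each parsed record to a worker chosen by its page id.

    All records for one page land on the same worker, in log order."""
    queues: dict[int, list[Record]] = defaultdict(list)
    for page_id, change in records:
        queues[page_id % n_workers].append((page_id, change))
    return dict(queues)


def apply_in_parallel(records: list[Record], n_workers: int) -> dict[int, list[str]]:
    def work(queue: list[Record]) -> dict[int, list[str]]:
        local: dict[int, list[str]] = {}
        for page_id, change in queue:  # apply this worker's log tasks in order
            local.setdefault(page_id, []).append(change)
        return local

    pages: dict[int, list[str]] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for local in pool.map(work, dispatch_by_page(records, n_workers).values()):
            pages.update(local)  # page sets are disjoint across workers
    return pages


result = apply_in_parallel([(1, "a"), (2, "b"), (1, "c"), (3, "d")], n_workers=2)
print(result[1])  # ['a', 'c']
```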
  • Parsing of subsequent redo log blocks can thus continue in a loop.
  • Before the current redo log block is parsed, parsing of the previous redo log block must have completed.
  • If the previous redo log block has been parsed, the current redo log block can be parsed; if the previous redo log block is still being parsed, the current redo log block cannot be parsed yet.
  • After step S203 (parsing of the current redo log block is completed), there is no need to wait for replay of the current redo log block; however, it is still necessary to determine whether replay of the previous redo log block, which precedes the current block, has completed.
  • If it has, the synchronous standby machine parses the next redo log block, i.e., returns to step S201 and processes the next redo log block in a loop.
  • If parsing of the current redo log block has completed while the previous redo log block is still being replayed, then after replay of the previous block completes, replay of the current block and parsing of the next block can proceed; parsing and replay of the redo log continue in this cycle.
  • A redo log block can be replayed only after it has been parsed; that is, the redo log is first parsed and then replayed to obtain the log data required by the database.
  • Step S201 is then executed again to process the next redo log block in a loop.
  • If parsing of the current redo log block has completed but replay of the previous redo log block has not, the current block has been parsed and is waiting to be replayed.
  • This example allows at most one redo log block being parsed and one redo log block being replayed at any time.
  • By using parallel replay based on physical log blocks (i.e., redo log blocks), the synchronous standby machine can effectively improve data processing efficiency.
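The parse/replay overlap described above (parse block i while block i-1 replays, with at most one block in each stage) is a two-stage pipeline. A minimal Python sketch follows, assuming records never straddle a block boundary; `parse` and `replay` are caller-supplied placeholders rather than the embodiment's routines, and the tiny block size in the usage example is only for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

BLOCK_SIZE = 2 * 1024 * 1024  # preset parsing size from the example ("2M")


def split_blocks(log: bytes, block_size: int) -> list[bytes]:
    # Assumes records never straddle a block boundary.
    return [log[i : i + block_size] for i in range(0, len(log), block_size)]


def pipelined_replay(
    log: bytes,
    parse: Callable[[bytes], list],
    replay: Callable[[list], None],
    block_size: int = BLOCK_SIZE,
) -> None:
    """S201-S203 sketch: parse block i on this thread while block i-1 replays on
    a single replay thread, so at most one block is parsing and one replaying."""
    with ThreadPoolExecutor(max_workers=1) as replayer:
        previous: Optional[Future] = None
        for block in split_blocks(log, block_size):
            parsed = parse(block)   # S201: overlaps with the previous block's replay
            if previous is not None:
                previous.result()   # S203: wait for the previous block's replay to finish
            previous = replayer.submit(replay, parsed)  # then loop on to the next block
        if previous is not None:
            previous.result()       # drain the final block


applied: list[str] = []
log = b"".join(f"{i:03d};".encode() for i in range(10))  # ten 4-byte records
pipelined_replay(
    log,
    parse=lambda block: [r for r in block.decode().split(";") if r],
    replay=applied.extend,
    block_size=8,  # tiny blocks so the example has several of them
)
print(applied[:3])  # ['000', '001', '002']
```

Waiting on the previous replay before submitting the next one is what keeps the replay queue bounded to a single block, matching the "at most one parsing, one replaying" rule.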
  • The asynchronous standby machines include a first asynchronous standby machine and multiple second asynchronous standby machines.
  • Step S220 (the asynchronous standby machine receiving the binary log from the master machine and replaying it) includes, but is not limited to, the following steps:
  • Step S221: multiple second asynchronous standby machines receive the binary log from the master machine and replay it in parallel;
  • Step S222: one of the second asynchronous standby machines synchronizes the binary log it has replayed asynchronously to the first asynchronous standby machine.
  • The asynchronously replayed binary log is then synchronously replicated to the first asynchronous standby machine, completing the asynchronous and then synchronous handling of the binary log and implementing data synchronization between the asynchronous standby machines,
  • which improves the high availability of the distributed database.
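The cascade in steps S221 and S222 can be modeled as nodes that replay a binlog event locally and then forward it downstream. This minimal sketch uses the DB3-slave, DB6-slave, DB7-slave, and DB8-slave names from this document's deployment example; the `BinlogNode` class is hypothetical, and the parallel replay of S221 is shown sequentially for simplicity.

```python
class BinlogNode:
    """Hypothetical node that replays binlog events and forwards them downstream."""

    def __init__(self, name: str):
        self.name = name
        self.applied: list[str] = []
        self.downstream: list["BinlogNode"] = []

    def receive(self, event: str) -> None:
        self.applied.append(event)      # replay locally first
        for node in self.downstream:    # then cascade the replayed event
            node.receive(event)


# The master ships binlog to three second asynchronous standbys,
# and DB7-slave relays what it has replayed to DB8-slave.
db3, db6, db7, db8 = (BinlogNode(n) for n in ("DB3-slave", "DB6-slave", "DB7-slave", "DB8-slave"))
db7.downstream.append(db8)
master_targets = [db3, db6, db7]

for event in ["binlog#1", "binlog#2"]:
    for node in master_targets:   # S221 (sequential here; parallel in the text)
        node.receive(event)

print(db8.applied)  # ['binlog#1', 'binlog#2']
```

The master only ships to three asynchronous standbys; the fourth is fed by a relay, which keeps the master's fan-out unchanged as remote replicas are added.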
  • In this example, the standby machines include three synchronous standby machines (DB2-slave, DB4-slave, DB5-slave) and four asynchronous standby machines (DB3-slave, DB6-slave, DB7-slave, DB8-slave).
  • DB1-master receives the database request from the terminal and generates redo logs and binary logs according to it;
  • DB1-master replicates the redo logs in real time to the three synchronous standby machines (DB2-slave, DB4-slave, DB5-slave) and waits for their acknowledgments based on the redo logs;
  • DB1-master asynchronously replicates the binary logs to three asynchronous standby machines (DB3-slave, DB6-slave, DB7-slave);
  • The three synchronous standby machines receive the redo logs from DB1-master, replay them, and send acknowledgments (i.e., ACK responses) to DB1-master;
  • The three asynchronous standby machines receive the binary logs from DB1-master and replay them;
  • DB7-slave (one of the second asynchronous standby machines) synchronizes the binary log it has replayed to DB8-slave (the first asynchronous standby machine);
  • After DB1-master receives the redo-log-based acknowledgments (i.e., ACK responses) from the three synchronous standby machines, it receives the next database request from the terminal, i.e., continues with subsequent transaction commit operations.
  • This improves the synchronization performance between the active and standby data nodes of the distributed database, by 50% in actual tests, and thus increases the processing capacity of the data nodes under high concurrency. It also improves the replay performance of the standby machines and reduces their replay delay.
  • In actual tests the replay delay is 1 second, which shortens the master/standby switchover delay under high concurrency and improves the high availability of the distributed database.
  • The master DB1-master synchronously replicates redo logs to the three synchronous standby machines DB2-slave, DB4-slave, and DB5-slave, and asynchronously replicates binary logs (binlogs) to the three asynchronous standby machines DB3-slave, DB6-slave, and DB7-slave.
  • The master DB1-master, the synchronous standby DB2-slave, and the asynchronous standby DB3-slave may be located in the local data center; the synchronous standbys DB4-slave and DB5-slave and the asynchronous standby DB6-slave in a same-city data center; and the asynchronous standbys DB7-slave and DB8-slave in remote data centers.
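The deployment above can be written down as data to check which data centers hold synchronous copies. This is a minimal sketch; the dictionary layout, the room labels, and the helper function are hypothetical. Under this layout, a transaction the master has acknowledged already has redo copies in both the local and the same-city data center, while the remote data center is fed only asynchronously.

```python
TOPOLOGY = {
    "DB1-master": {"role": "master", "room": "local"},
    "DB2-slave": {"role": "sync", "room": "local"},
    "DB3-slave": {"role": "async", "room": "local"},
    "DB4-slave": {"role": "sync", "room": "same-city"},
    "DB5-slave": {"role": "sync", "room": "same-city"},
    "DB6-slave": {"role": "async", "room": "same-city"},
    "DB7-slave": {"role": "async", "room": "remote"},
    "DB8-slave": {"role": "async", "room": "remote"},  # fed by DB7-slave, not the master
}


def rooms_with_role(topology: dict[str, dict[str, str]], role: str) -> set[str]:
    """Return the data centers that host at least one node with the given role."""
    return {node["room"] for node in topology.values() if node["role"] == role}


print(sorted(rooms_with_role(TOPOLOGY, "sync")))   # ['local', 'same-city']
print(sorted(rooms_with_role(TOPOLOGY, "async")))  # ['local', 'remote', 'same-city']
```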
  • This embodiment can be applied to distributed database scenarios, where its data synchronization method achieves high-performance fused synchronization between the active and standby data nodes.
  • Data synchronization between the active and standby data nodes of the distributed database in this embodiment mainly uses a fused approach: real-time synchronous replication based on physical logs (i.e., redo logs) combined with asynchronous replication based on logical logs (i.e., binlogs).
  • The distributed database mainly includes the following modules: the client access layer, the computing node cluster, the management node, the global transaction management center, the back-end middleware, and the data node cluster.
  • The data synchronization method of this embodiment is mainly implemented by the data node cluster.
  • The service end, i.e., the client access layer, includes multiple applications (APPs) and supports the common ODBC and JDBC interfaces; users access the distributed database through the client access layer;
  • The computing node cluster includes multiple DBProxy middleware instances, and the computing nodes perform the basic processing and distribution of SQL statements;
  • The management node includes multiple components, such as OMM Server, MDS, PM, and CM, which are mainly used to manage and safeguard the distributed database;
  • GTM, the global transaction management center, is mainly used to generate and maintain the global transaction IDs of distributed transactions;
  • The data node cluster includes multiple DB-GROUPs, each consisting of one master machine and multiple standby machines, which are mainly used for data reading and writing, storage, and synchronization;
  • The back-end middleware mainly monitors the DB data nodes and manages high availability.
  • the data synchronization method of the embodiments of the present application includes, but is not limited to:
  • a terminal, such as the business side (i.e., the client access layer), initiates a database request, such as an SQL request;
  • the DBProxy middleware in the computing node cluster receives the SQL request from the business side, computes and routes it, and distributes it to the corresponding DB data node (i.e., the corresponding data node cluster);
  • the DB data node receives the SQL processing task sent by DBProxy and processes it;
  • the master machine executes the SQL commands of the SQL request and continuously generates redo logs during execution;
  • the master machine replicates the generated redo logs to some of the synchronous standby machines in real time;
  • after the SQL request (i.e., the transaction) finishes executing, the master machine starts the transaction commit, generates the last redo log during the commit, and synchronously replicates the last redo log to the synchronous standby machines;
  • the master machine triggers an asynchronous binary-log write according to the SQL request and asynchronously sends the binary log to multiple asynchronous standby machines;
  • the master machine synchronously waits for an acknowledgment from each synchronous standby machine based on the redo log;
  • the synchronous standby machine receives the redo log sent by the master machine and writes it into the redo log file;
  • the synchronous standby machine replays the redo log concurrently in redo log blocks of the preset parsing size;
  • the synchronous standby machine receives the last redo log sent by the master machine and, based on it, sends an acknowledgment (ACK response) to the master machine;
  • the asynchronous standby machine receives the binary log generated by the master machine from the business side's SQL request and writes it into the binary log file;
  • the asynchronous standby machine replays the binary log in parallel;
  • the master machine receives the acknowledgment based on the last redo log from the synchronous standby machines and performs the final transaction commit;
  • after completing the commit, the master machine returns a response to the client, and processing of the task by the distributed database ends.
  • the present application also provides a data synchronization device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, it performs one of the following:
  • the data synchronization method applied to the master machine; or the data synchronization method applied to the standby machines.
  • the processor and memory can be connected by a bus or other means.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory may include memory located remotely from the processor, which remote memory may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the non-transitory software programs and instructions required to implement the data synchronization method of the above embodiments are stored in the memory and, when executed by the processor, perform the data synchronization method of the above embodiments, for example, method steps S100 to S130 in Fig. 1, step S111 in Fig. 2, steps S112 to S113 in Fig. 3, step S114 in Fig. 4, step S115 in Fig. 5, steps S121 to S123 in Fig. 6, steps S200 to S220 in Fig. 7, steps S201 to S203 in Fig. 8, and steps S221 to S222 in Fig. 9 described above.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the present application also provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform at least one of the following:
  • the data synchronization method applied to the master machine; or the data synchronization method applied to the standby machines.
  • when the computer-executable instructions of the present application are executed by a processor or controller, for example by a processor in the above device embodiment, the processor can be caused to perform the data synchronization method of the above embodiments, for example, method steps S100 to S130 in Fig. 1, step S111 in Fig. 2, steps S112 to S113 in Fig. 3, step S114 in Fig. 4, step S115 in Fig. 5, steps S121 to S123 in Fig. 6, steps S200 to S220 in Fig. 7, steps S201 to S203 in Fig. 8, and steps S221 to S222 in Fig. 9 described above.
  • the embodiments of the present application include: receiving, by the master machine, a database request from a terminal; generating redo logs and binary logs according to the database request; synchronously replicating the redo logs to multiple synchronous standby machines and asynchronously replicating the binary logs to multiple asynchronous standby machines; and, after receiving the acknowledgment from each synchronous standby machine based on the redo log, receiving a database request from the terminal again. By combining logical replication based on logical logs with physical replication based on physical logs, the embodiments achieve fused synchronization between the active and standby data nodes. This effectively improves their synchronization performance and thus the overall performance of the distributed database. It also reduces the replay delay of standby nodes, which improves the high availability of the distributed database.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

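The present application provides a data synchronization method, a device, and a computer-readable storage medium. The method includes the following steps. A master machine receives a database request from a terminal (S100). It generates a redo log and a binary log according to the database request and synchronously replicates the redo log to multiple synchronous standby machines (S110). It asynchronously replicates the binary log to multiple asynchronous standby machines (S120). After receiving an acknowledgment from each synchronous standby machine based on the redo log, it receives a database request from the terminal again (S130).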

Description

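Data synchronization method, device, and computer-readable storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202111602799.8 filed on December 24, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to, but is not limited to, the technical field of data processing, and in particular to a data synchronization method, a device, and a computer-readable storage medium.
Background
Currently, both distributed databases and standalone databases synchronize data between active and standby data nodes in one of two main ways: logical replication based on logical logs, or physical replication based on physical logs.
For example, the data nodes of the current GoldenDB (a financial-grade transactional) distributed database are developed on MySQL (a relational database management system), an open-source database. They use logical replication based on logical logs to synchronize data between the active and standby data nodes. However, the synchronization performance between the active and standby data nodes is usually low, which results in low overall performance of the distributed database. In addition, the high replay delay of the standby nodes lengthens the active/standby switchover time of the distributed database, which reduces its high availability.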
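Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present application provide a data synchronization method, a device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a data synchronization method applied to a master machine. The master machine communicates with multiple standby machines, which include synchronous standby machines and asynchronous standby machines. The method includes:
- receiving a database request from a terminal;
- generating a redo log according to the database request, and synchronously replicating the redo log to the multiple synchronous standby machines;
- generating a binary log according to the database request, and asynchronously replicating the binary log to the multiple asynchronous standby machines;
- after receiving an acknowledgment from each synchronous standby machine based on the redo log, receiving a database request from the terminal again.
In a second aspect, an embodiment of the present application further provides a data synchronization method applied to multiple standby machines. The standby machines communicate with a master machine and include synchronous standby machines and asynchronous standby machines. The method includes:
- the synchronous standby machine receives a redo log from the master machine and replays it, where the redo log is generated by the master machine based on a database request from a terminal;
- the synchronous standby machine sends an acknowledgment to the master machine, so that the master machine receives a database request from the terminal again;
- the asynchronous standby machine receives a binary log from the master machine and replays it, where the binary log is generated by the master machine based on the database request from the terminal.
In a third aspect, an embodiment of the present application further provides a data synchronization device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor performs one of the following: the data synchronization method according to any one of the first aspect; or the data synchronization method according to any one of the second aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. The computer-executable instructions are used to perform at least one of the following: the data synchronization method according to any one of the first aspect; or the data synchronization method according to any one of the second aspect.
Other features and advantages of the present application will be set forth in the following description. Some of them will be apparent from the description, and others will be understood by implementing the present application. The objectives and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.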
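Brief description of the drawings
The drawings are provided for a further understanding of the technical solutions of the present application and constitute a part of the specification. Together with the embodiments, they serve to explain the technical solutions of the present application and do not limit them.
FIG. 1 is a schematic flowchart of the master-machine side of the data synchronization method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of generating a redo log according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of generating the last redo log according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of synchronous replication to the synchronous standby machines according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of the acknowledgment according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of generating a binary log according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of the standby-machine side of the data synchronization method according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of replaying the redo log according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of replaying the binary log according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the overall architecture of a distributed database according to an embodiment of the present application.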
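Detailed description
The present application is described in further detail below with reference to the drawings and embodiments, to make its objectives, technical solutions, and advantages clearer. It should be understood that the embodiments described herein are merely intended to explain the present application and not to limit it.
It should be noted that the device schematic diagrams divide functional modules and the flowcharts show logical orders. In some cases, however, the steps shown or described may be performed with a different module division, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
Currently, both distributed databases and standalone databases synchronize data between active and standby data nodes in one of two main ways: logical replication based on logical logs, or physical replication based on physical logs.
MySQL is an example of the first way. It is a relational database management system developed by the Swedish company MySQL AB and now part of Oracle's product line, and the SQL (Structured Query Language) it uses is the most commonly used standardized language for accessing databases. MySQL mainly uses logical replication based on logical logs, and its logical log is the binlog (binary log). In this method, a transaction's logical log is generated at commit time, written to the binlog file and flushed to disk, and then synchronized to the standby nodes. Under heavy request load, this causes blocking at transaction commit, at the binlog write, and while waiting for the standby nodes to acknowledge, so the overall performance of the database is low. Moreover, when a large volume of transaction logical logs is written in a short burst, the standby nodes incur a large replay delay when replaying them.
PostgreSQL is an example of the second way. It is a full-featured, free-software object-relational database management system based on POSTGRES version 4.2, developed by the computer science department of the University of California. PostgreSQL mainly uses physical replication based on physical logs, and its physical log is the redo log. In this method, redo logs are generated incrementally and continuously during transaction execution and written to the redo log file. The final redo log write and flush are performed when the transaction commits. Because the physical log is synchronized to the standby nodes continuously as it is generated, overall synchronization performance is high. However, this method supports only a limited range of scenarios. Its replication compatibility between different versions is poor, and physical replication is not possible between different database systems.
In the related art, the data nodes of the current GoldenDB (a financial-grade transactional) distributed database are developed on MySQL (a relational database management system), an open-source database. They use logical replication based on logical logs (binlogs) to synchronize data between the active and standby data nodes. As described above, this logical replication method usually gives low synchronization performance between the active and standby data nodes, which results in low overall performance of the distributed database. In addition, the high replay delay of the standby nodes lengthens the active/standby switchover time, which reduces the high availability of the distributed database.
Based on this, an embodiment of the present application provides a data synchronization method that improves the synchronization performance between active and standby data nodes and reduces the replay delay of standby nodes.
The embodiments of the present application are further described below with reference to the drawings.
Referring to FIG. 1, an embodiment of the present application provides a data synchronization method applied to a master machine. The master machine communicates with multiple standby machines, which include synchronous standby machines and asynchronous standby machines. The method includes, but is not limited to, the following steps:
Step S100: receive a database request from a terminal;
Step S110: generate a redo log according to the database request, and synchronously replicate the redo log to multiple synchronous standby machines;
Step S120: generate a binary log according to the database request, and asynchronously replicate the binary log to multiple asynchronous standby machines;
Step S130: after receiving an acknowledgment from each synchronous standby machine based on the redo log, receive a database request from the terminal again.
It should be noted that the database request in the embodiments of the present application may be an SQL request.
In an embodiment, the business side sends a database request to the master machine, and the master machine receives it. The terminal may be a mobile terminal device or a non-mobile terminal device. Mobile terminal devices include a mobile phone, a tablet computer, a laptop, a handheld computer, an in-vehicle terminal, a wearable device, an ultra-mobile personal computer, a netbook, a personal digital assistant, a CPE, a UFI (portable Wi-Fi hotspot device), and the like. Non-mobile terminal devices include a personal computer, a television, an automated teller machine, a self-service kiosk, and the like. The embodiments of the present application do not specifically limit the terminal.
The master machine then generates a redo log and a binary log (binlog) according to the database request. In the embodiments of the present application, redo logs are generated incrementally and continuously during transaction execution and synchronously replicated to multiple synchronous standby machines. A transaction logical log (i.e., the binary log) is generated at transaction commit time and asynchronously replicated to multiple asynchronous standby machines. After receiving the redo log from the master machine, each synchronous standby machine returns an acknowledgment to the master machine based on the redo log. Once the master machine has received the acknowledgment from every synchronous standby machine, it receives a database request from the terminal again. That is, it returns to step S100 and performs steps S100 to S130 again to synchronize data for the next database request.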
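The master-side cycle of steps S100 to S130 can be sketched in Python as follows. This is a minimal single-process illustration under stated assumptions, not the GoldenDB implementation. The `Standby` and `Master` classes, the string-valued log records, and the use of a thread pool for shipping are all hypothetical. The sketch models only the ordering described above:
- redo records reach every synchronous standby as they are produced;
- the binlog is handed to background threads without being waited on;
- the call returns, so that the next request can be accepted, only after every synchronous standby has acknowledged the last redo record.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Standby:
    """Hypothetical standby that keeps every record it receives in memory."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def receive(self, record):
        self.received.append(record)
        return ("ACK", self.name, record)  # acknowledgment tied to this record


class Master:
    """Hypothetical master that only models the ordering of steps S100-S130."""

    def __init__(self, sync_standbys, async_standbys):
        self.sync_standbys = sync_standbys
        self.async_standbys = async_standbys
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.binlog_futures = []  # async shipments that S130 does not wait for

    def handle_request(self, txn_id, statements):
        # S110: redo is produced while the request executes and is shipped
        # to every synchronous standby as soon as it exists.
        for i, stmt in enumerate(statements):
            for s in self.sync_standbys:
                s.receive(f"redo:{txn_id}:{i}:{stmt}")
        # Commit (S112-S114): the last redo record goes to all sync standbys.
        last = f"redo:{txn_id}:commit"
        ack_futures = [self.pool.submit(s.receive, last) for s in self.sync_standbys]
        # S120: binlog shipping is handed to background threads.
        binlog = f"binlog:{txn_id}:" + ";".join(statements)
        for s in self.async_standbys:
            self.binlog_futures.append(self.pool.submit(s.receive, binlog))
        # S115/S130: block until every synchronous standby has acknowledged;
        # only then may the caller accept the next request.
        wait(ack_futures)
        return [f.result() for f in ack_futures]
```

The binlog futures are deliberately not awaited inside `handle_request`. As a result, an asynchronous standby that falls behind never delays the response to the terminal, which is the trade-off the embodiment relies on.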
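This embodiment is applied to the master machine, which communicates with multiple standby machines. It implements the data synchronization method through synchronous replication based on the physical log (the redo log) and asynchronous replication based on the logical log (the binary log). Compared with the related art, this brings two benefits. Synchronously replicating the redo log to multiple synchronous standby machines effectively improves the synchronization performance between active and standby data nodes, and thus the overall performance of the distributed database. Asynchronously replicating the binary log to multiple asynchronous standby machines effectively reduces the replay delay of standby nodes and improves the high availability of the distributed database.
Referring to FIG. 2, generating the redo log according to the database request includes, but is not limited to, the following step:
Step S111: execute the database request, and generate redo logs while the database request is being executed.
In the embodiments of the present application, data synchronization between the master machine and multiple synchronous standby machines is implemented by synchronous replication based on the physical log (the redo log). After receiving the database request from the terminal, the master machine executes it. During execution it continuously generates redo logs and synchronously replicates them to multiple synchronous standby machines. Because the redo logs reach the synchronous standby machines continuously as they are generated, synchronization performance between the active and standby data nodes is high.
Referring to FIG. 3, after step S111 (executing the database request and generating redo logs during its execution), the method further includes, but is not limited to, the following steps:
Step S112: commit the transaction after the database request has been executed;
Step S113: generate the last redo log during the transaction commit.
It should be noted that after the master machine receives the database request from the terminal, redo logs are generated during execution of the request and synchronously replicated to multiple synchronous standby machines. After the database request finishes executing, the transaction commit starts. During the commit, the final redo log write is completed, producing the last redo log. In this way, the master machine synchronously replicates the redo logs to the synchronous standby machines.
After steps S112 and S113, referring to FIG. 4, synchronously replicating the redo log to multiple synchronous standby machines includes, but is not limited to, the following step:
Step S114: synchronously replicate the last redo log to multiple synchronous standby machines.
In this embodiment, the master machine synchronously replicates the last redo log to multiple synchronous standby machines. On receiving it, each synchronous standby machine returns an acknowledgment to the master machine based on the last redo log. After receiving this acknowledgment from every synchronous standby machine, the master machine performs steps S100 to S130 again.
Referring to FIG. 5, after synchronously replicating the redo log to multiple synchronous standby machines, the method includes, but is not limited to, the following step:
Step S115: synchronously wait for an acknowledgment from each synchronous standby machine based on the redo log.
It should be noted that the embodiments of the present application use synchronous replication based on the physical log (the redo log). The master machine generates the redo log from the terminal's database request and synchronously replicates it to multiple synchronous standby machines, and each synchronous standby machine returns an acknowledgment based on the redo log. During this process, the master machine synchronously waits for the acknowledgment from every synchronous standby machine. Only after receiving all of them does it perform steps S100 to S130 again. This makes data synchronization between the active and standby data nodes simpler and allows it to be handled nearby, improving synchronization performance between the active and standby data nodes.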
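The text does not say how the master tracks acknowledgments. One common way to implement the synchronous wait of step S115 is a condition variable keyed on a log position, as sketched below. The `AckTracker` class, the integer log positions, and the timeout parameter are assumptions for illustration. The embodiment only requires that the master proceed once every synchronous standby has acknowledged.

```python
import threading


class AckTracker:
    """Hypothetical helper for step S115: the master blocks until every
    synchronous standby has acknowledged redo up to a given position."""

    def __init__(self, standby_names):
        if not standby_names:
            raise ValueError("at least one synchronous standby is required")
        self._acked = {name: 0 for name in standby_names}  # highest ACKed position
        self._cond = threading.Condition()

    def on_ack(self, name, position):
        # Called when standby `name` reports redo received up to `position`.
        with self._cond:
            if position > self._acked[name]:  # ignore stale or duplicate ACKs
                self._acked[name] = position
            self._cond.notify_all()

    def wait_all(self, position, timeout=None):
        # True once *all* standbys have ACKed `position`; False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: min(self._acked.values()) >= position, timeout=timeout)


if __name__ == "__main__":
    tracker = AckTracker(["DB2-slave", "DB4-slave", "DB5-slave"])
    tracker.on_ack("DB2-slave", 42)
    tracker.on_ack("DB4-slave", 42)
    print(tracker.wait_all(42, timeout=0.1))  # False: DB5-slave is still missing
    threading.Timer(0.05, tracker.on_ack, args=("DB5-slave", 42)).start()
    print(tracker.wait_all(42, timeout=2.0))  # True once DB5-slave's ACK arrives
```

Tracking the highest acknowledged position per standby, rather than counting ACK messages, keeps duplicate or out-of-order acknowledgments from releasing a commit early.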
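For example, after the master machine performs steps S112, S113, and S114, step S115 may instead have the master machine synchronously wait for an acknowledgment from each synchronous standby machine based on the last redo log.
Referring to FIG. 6, generating the binary log according to the database request includes, but is not limited to, the following steps:
Step S121: generate binary data according to the database request;
Step S122: copy the binary data into a global log cache;
Step S123: asynchronously write the binary data in the global log cache to a binary log file to generate the binary log.
In the embodiments of the present application, step S110 generates the redo log according to the database request and synchronously replicates it to multiple synchronous standby machines. After step S110, the master machine triggers the asynchronous binary-log write, i.e., performs steps S121 to S123. Binary data is generated according to the database request and copied into the global log cache. A background thread of the master machine then asynchronously writes the binary data in the global log cache to the binary log file to generate the binary log. The binary log is then asynchronously replicated to multiple asynchronous standby machines.
After step S121 (generating the binary data according to the database request), the master machine can also obtain the size of the binary data for the current transaction. It then reserves log space of that size in the global log cache and copies the current binary data into that space. Step S123 can run asynchronously while this copy is in progress, which effectively improves data processing efficiency.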
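Steps S121 to S123 can be illustrated with the hypothetical `GlobalLogBuffer` below: reserve space sized to the transaction's binary data, copy the data into the global log cache, and let a background thread write it to the binary log file. The embodiment describes this path as lock-free. Python has no user-level atomic fetch-and-add, so this sketch uses a small lock around the offset increment in place of one (an assumption of the sketch). It shows that only the reservation is serialized, while the copies of different transactions and the flush to the file proceed independently. Buffer wrap-around and real `fsync` calls are omitted.

```python
import io
import threading


class GlobalLogBuffer:
    """Hypothetical global binlog cache for steps S121-S123."""

    def __init__(self, capacity, out):
        self.buf = bytearray(capacity)
        self.out = out                    # file-like stand-in for the binlog file
        self._reserve = threading.Lock()  # stand-in for an atomic fetch-and-add
        self._next = 0                    # next free offset in the buffer
        self._meta = threading.Lock()     # protects _done and _flushed
        self._done = {}                   # start offset -> end offset of finished copies
        self._flushed = 0                 # bytes before this offset are persisted

    def append(self, data: bytes) -> int:
        if not data:
            raise ValueError("empty binlog record")
        # Reserve exactly len(data) bytes; only this offset bump is serialized.
        with self._reserve:
            start = self._next
            if start + len(data) > len(self.buf):
                raise BufferError("global log buffer full")  # wrap-around omitted
            self._next = start + len(data)
        # The copy itself can overlap with other transactions' copies.
        self.buf[start:start + len(data)] = data
        with self._meta:
            self._done[start] = start + len(data)
        return start

    def flush_ready(self) -> int:
        # Background flusher (a single thread is assumed): persist the
        # contiguous prefix whose copies have all finished, stopping at a gap.
        with self._meta:
            end = self._flushed
            while end in self._done:
                end = self._done.pop(end)
            chunk = bytes(self.buf[self._flushed:end])
            self._flushed = end
        self.out.write(chunk)
        return len(chunk)


def run_flusher(log, stop, interval=0.01):
    # Body of the background thread that performs step S123.
    while not stop.is_set():
        log.flush_ready()
        stop.wait(interval)
    log.flush_ready()  # final drain after the writers have stopped


if __name__ == "__main__":
    out = io.BytesIO()
    log = GlobalLogBuffer(1 << 16, out)
    stop = threading.Event()
    flusher = threading.Thread(target=run_flusher, args=(log, stop))
    flusher.start()
    writers = [threading.Thread(target=log.append, args=(f"txn{i};".encode(),))
               for i in range(8)]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    stop.set()
    flusher.join()
    print(len(out.getvalue()))  # 40: eight 5-byte records, in reservation order
```

The flusher writes only the contiguous prefix whose copies are complete. A slow writer holding an early reservation therefore delays persistence of later records but never causes them to be written out of order.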
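Alternatively, after step S121 the binary data may be copied directly into the global log cache, overwriting the previous binary data; this is not specifically limited here.
After the transaction commit completes, the master machine finishes and returns a response to the client.
The master machine in the embodiments of the present application thus uses a lock-free asynchronous replication method based on the logical log (the binary log). Because the master node writes the binary log asynchronously and without locks, occupied resources such as threads are released, blocking is avoided, and response efficiency improves.
Referring to FIG. 7, an embodiment of the present application provides a data synchronization method applied to multiple standby machines. The standby machines communicate with the master machine and include synchronous standby machines and asynchronous standby machines. The method includes, but is not limited to, the following steps:
Step S200: the synchronous standby machine receives the redo log from the master machine and replays it, where the redo log is generated by the master machine based on a database request from a terminal;
Step S210: the synchronous standby machine sends an acknowledgment to the master machine, so that the master machine receives a database request from the terminal again;
Step S220: the asynchronous standby machine receives the binary log from the master machine and replays it, where the binary log is generated by the master machine based on the database request from the terminal.
This embodiment is applied to multiple standby machines. The master machine in the embodiments of the present application can communicate separately with multiple synchronous standby machines and multiple asynchronous standby machines to implement the data synchronization method.
In some embodiments, the master machine generates a redo log and a binary log according to the database request. The synchronous standby machine receives the redo log synchronously replicated by the master machine, writes it into the redo log file, and replays it. It then sends an acknowledgment to the master machine based on the redo log. After receiving this acknowledgment, the master machine receives a database request from the terminal again, for example an SQL request. Meanwhile, the asynchronous standby machine receives the binary log asynchronously replicated by the master machine and replays it.
In the embodiments of the present application, multiple synchronous standby machines and multiple asynchronous standby machines each communicate with the master machine. Together they implement a data synchronization method that combines synchronous replication based on the physical log (the redo log) with asynchronous replication based on the logical log (the binary log). Compared with the related art, this effectively improves the synchronization performance between the active and standby data nodes, and thus the overall performance of the distributed database. It also effectively reduces the replay delay of standby nodes, improving the high availability of the distributed database.
Referring to FIG. 8, replaying the redo log includes, but is not limited to, the following steps:
Step S201: the synchronous standby machine parses the current redo log block to obtain a redo parsing result, where the current redo log block is a redo log block of a preset parsing size;
Step S202: the synchronous standby machine dispatches the redo parsing result to different worker threads;
Step S203: parsing of the current redo log block is completed;
After step S203, determine whether replay of the previous redo log block has completed:
if replay of the previous redo log block has completed, the synchronous standby machine parses the next redo log block;
if replay of the previous redo log block has not completed, the synchronous standby machine waits until it completes and then parses the next redo log block.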
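The control flow of steps S201 to S203 is a two-stage pipeline. The parsing thread works on block k while a replay stage applies block k-1, and parsing of block k+1 begins only after block k-1 has finished replaying. A minimal sketch follows, assuming hypothetical `parse` and `replay` callables. In the embodiment, `replay` would correspond to dispatching the parsed records to the worker threads and waiting for them. The sketch enforces the rule, stated later in this description, that at most one block is being parsed and one block is being replayed at any time.

```python
from concurrent.futures import ThreadPoolExecutor


def replay_redo(blocks, parse, replay):
    """Hypothetical pipeline for steps S201-S203: the calling thread parses
    block k while a single replay thread applies block k-1. Parsing of
    block k+1 starts only after block k-1 has finished replaying, so at
    most one block is being parsed and one is being replayed at any time."""
    with ThreadPoolExecutor(max_workers=1) as replayer:
        previous = None                   # future of the block being replayed
        for block in blocks:
            parsed = parse(block)         # S201/S202: parse the current block
            if previous is not None:
                previous.result()         # wait for the previous block's replay
            previous = replayer.submit(replay, parsed)
        if previous is not None:
            previous.result()             # drain the final block
```

With a single replay worker, blocks are also replayed strictly in log order. This is a simplification: the embodiment parallelizes within a block through the worker threads of step S202.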
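The synchronous standby machine in the embodiments of the present application uses a parallel replay method based on physical log blocks (redo log blocks).
In some embodiments, after receiving the redo log from the master machine, the synchronous standby machine uses a parsing thread to parse it in redo log blocks of the preset parsing size. That is, it parses the current redo log block to obtain a redo parsing result. The preset parsing size may be 2 MB, i.e., the current redo log block is 2 MB in size, which effectively improves parsing efficiency. In the related art, by contrast, the redo log might be parsed serially in units of tens or hundreds of bytes, which makes the whole serial process slow. Parsing the redo log in 2 MB blocks allows the previous 2 MB block and the next 2 MB block to be replayed in parallel, effectively improving data processing efficiency. The previous, current, and next redo log blocks may all be 2 MB blocks. In other embodiments, redo log blocks of other parsing sizes may be used; this is not specifically limited here.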
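Cutting the received redo stream into fixed-size parse blocks is straightforward. The sketch below takes the 2 MB example size as 2 × 1024 × 1024 bytes (an assumption). Unlike a real redo format, it ignores record boundaries, which a production parser would have to respect. For scale, a 5 MiB stream yields three blocks, whereas parsing it in 100-byte units would take 52,429 separate serial steps.

```python
PARSE_BLOCK_SIZE = 2 * 1024 * 1024  # hypothetical constant for the 2 MB example


def split_into_blocks(redo: bytes, block_size: int = PARSE_BLOCK_SIZE):
    """Cut a received redo byte stream into fixed-size parse blocks.

    The last block may be shorter. Record boundaries are ignored here
    (an assumption of this sketch, not how a real redo parser would work).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [redo[i:i + block_size] for i in range(0, len(redo), block_size)]
```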
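The dispatch thread of the synchronous standby machine then dispatches the redo parsing result to different worker threads, for example by page, for application. In an embodiment, the worker threads obtain log-application tasks and, according to these tasks, apply the corresponding physical logs (redo logs) to data pages. For example, the dispatch thread distributes the redo parsing result by page: worker thread 1 accepts a log-application task and applies redo logs, worker thread 2 does the same, and so on up to worker thread n (n being an integer).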
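Dispatching "by page" can be read as partitioning parsed records on their page identifier. Under that reading, all changes to a page are applied by one worker in log order, while different pages are applied in parallel. The sketch below assumes records of the form `(page_id, change)` and a `page_id % n_workers` mapping. Both, and the `apply_by_page` function itself, are illustrative assumptions rather than the embodiment's actual scheme.

```python
import queue
import threading


def apply_by_page(records, n_workers, apply):
    """Hypothetical dispatcher for step S202: every parsed redo record
    (page_id, change) goes to worker page_id % n_workers, so all changes to
    one page are applied by one thread in log order, while different pages
    are applied in parallel."""
    queues = [queue.Queue() for _ in range(n_workers)]

    def worker(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more records for this worker
                return
            apply(*item)

    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for page_id, change in records:  # the dispatch thread's role
        queues[page_id % n_workers].put((page_id, change))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```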
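After the parsing thread of the synchronous standby machine finishes one redo log block of the preset size (for example, 2 MB), it can move on to the next redo log block without waiting for the current block to finish replaying.
The parsing thread can also continue with the next redo log block whenever no previous redo log block (i.e., the previous 2 MB block) is still being parsed. For example, when the current redo log block is due to be parsed, it can be parsed if the previous block has finished parsing. If the previous block is still being parsed, the current block cannot be parsed yet.
Completion of parsing of the current redo log block in step S203 can also be understood to mean that the current block need not wait. In addition, the standby must check whether replay of the previous redo log block has completed.
When parsing of the current redo log block and replay of the previous redo log block have both completed, the synchronous standby machine parses the next redo log block. That is, it returns to step S201 and repeats the steps for the next block. In practice, the current block is being parsed while the previous block is being replayed. Once both are done, the current block can be replayed and the next block parsed, and this cycle repeats until the redo log has been parsed and replayed. A redo log block can be replayed only after it has been parsed: the redo log is first parsed and then replayed, which yields the log data the database requires.
When parsing of the current redo log block has completed but replay of the previous block has not, the synchronous standby machine waits for the previous block's replay to complete before parsing the next block. It then returns to step S201 and repeats the steps for the next block. In this situation, the current block has been parsed and is ready to be replayed. However, the embodiments of the present application allow at most one redo log block being parsed and one being replayed at any time. The synchronous standby machine therefore waits for the previous block's replay to complete before parsing the next block, at which point the current block can be replayed. This cycle repeats until the redo log has been parsed and replayed.
The synchronous standby machine in the embodiments of the present application effectively improves data processing efficiency by using a parallel replay method based on physical log blocks (redo log blocks).
Referring to FIG. 9, the asynchronous standby machines include a first asynchronous standby machine and multiple second asynchronous standby machines. Step S220 (the asynchronous standby machines receive the binary log from the master machine and replay it) includes, but is not limited to, the following steps:
Step S221: the multiple second asynchronous standby machines receive the binary log from the master machine and replay it in parallel;
Step S222: one of the second asynchronous standby machines synchronizes the asynchronously replayed binary log to the first asynchronous standby machine.
One of the second asynchronous standby machines, after asynchronously replaying the binary log, synchronously replicates it to the first asynchronous standby machine. This completes the asynchronous synchronization of the binary log and synchronizes data between the asynchronous standby machines, improving the high availability of the distributed database.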
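The cascade in steps S221 and S222 amounts to a relay: a second asynchronous standby replays each binlog event and then passes it on to the first asynchronous standby. The hypothetical `AsyncStandby` class below shows only that ordering. In a real deployment the hand-off would be a network transfer; here it is a direct method call for brevity.

```python
class AsyncStandby:
    """Hypothetical asynchronous standby. A relay (a second asynchronous
    standby such as DB7-slave) forwards each binlog event downstream only
    after replaying it; a leaf (the first asynchronous standby, such as
    DB8-slave) only replays."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.applied = []

    def on_binlog(self, event):
        self.applied.append(event)            # replay locally first (S221)
        if self.downstream is not None:
            self.downstream.on_binlog(event)  # then pass it on (S222)
```

In the DB1-master example, DB7-slave plays the relay role and DB8-slave the downstream role. DB8-slave therefore sees each event only after DB7-slave has applied it.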
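The data synchronization method of the embodiments of the present application is described below with an example. The example uses a master machine, DB1-master, and multiple standby machines: three synchronous standby machines (DB2-slave, DB4-slave, DB5-slave) and four asynchronous standby machines (DB3-slave, DB6-slave, DB7-slave, DB8-slave).
DB1-master receives a database request from a terminal and generates a redo log and a binary log according to the database request;
DB1-master synchronously replicates the redo log in real time to the three synchronous standby machines (DB2-slave, DB4-slave, DB5-slave) and synchronously waits for their acknowledgments based on the redo log;
DB1-master asynchronously replicates the binary log to three asynchronous standby machines (DB3-slave, DB6-slave, DB7-slave);
the three synchronous standby machines receive the redo log from DB1-master, replay it, and send acknowledgments (ACK responses) to DB1-master;
the three asynchronous standby machines (the second asynchronous standby machines) receive the binary log from DB1-master and replay it;
DB7-slave (one of the second asynchronous standby machines) synchronizes the asynchronously replayed binary log to DB8-slave (the first asynchronous standby machine);
after DB1-master receives the acknowledgments (ACK responses) from the three synchronous standby machines based on the redo log, it receives a database request from the terminal again, i.e., continues with the subsequent transaction commit operations.
The embodiments of the present application improve the synchronization performance between active and standby data nodes in the distributed database. In actual tests, performance improved by 50%, raising the processing capacity of the distributed database's data nodes under high concurrency. The embodiments also improve the replay performance of the standby machines and reduce their replay delay. In actual tests, the replay delay was 1 second, which shortens the active/standby switchover delay under high concurrency and improves the high availability of the distributed database.
For example, referring to FIG. 10, the master machine DB1-master synchronously replicates the redo log to three synchronous standby machines (DB2-slave, DB4-slave, DB5-slave). It also asynchronously replicates the binary log (binlog) to three asynchronous standby machines (DB3-slave, DB6-slave, DB7-slave). The machines can be placed as follows:
- local machine room: the master machine DB1-master, the synchronous standby machine DB2-slave, and the asynchronous standby machine DB3-slave;
- same-city machine room: the synchronous standby machines DB4-slave and DB5-slave and the asynchronous standby machine DB6-slave;
- remote machine room: the asynchronous standby machines DB7-slave and DB8-slave.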
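Writing the FIG. 10 deployment down as data makes the replication roles easy to check. The dictionary layout, field names, and `replication_plan`/`rooms_for` helpers below are hypothetical. The node names, roles, and machine rooms come from the example above. In this example, no synchronous standby is placed in the remote machine room.

```python
# Hypothetical encoding of the FIG. 10 deployment; field names are illustrative.
TOPOLOGY = {
    "DB1-master": {"role": "master", "room": "local"},
    "DB2-slave": {"role": "sync", "room": "local"},
    "DB3-slave": {"role": "async", "room": "local"},
    "DB4-slave": {"role": "sync", "room": "same-city"},
    "DB5-slave": {"role": "sync", "room": "same-city"},
    "DB6-slave": {"role": "async", "room": "same-city"},
    "DB7-slave": {"role": "async", "room": "remote"},
    "DB8-slave": {"role": "async", "room": "remote", "upstream": "DB7-slave"},
}


def replication_plan(topology):
    """Return (redo targets, binlog targets, cascade edges) for the master.

    Synchronous standbys receive redo from the master; asynchronous standbys
    without an upstream receive binlog from the master; an asynchronous
    standby with an upstream receives binlog from that standby instead.
    """
    redo = sorted(n for n, c in topology.items() if c["role"] == "sync")
    binlog = sorted(n for n, c in topology.items()
                    if c["role"] == "async" and "upstream" not in c)
    cascade = {c["upstream"]: n for n, c in topology.items() if "upstream" in c}
    return redo, binlog, cascade


def rooms_for(topology, role):
    # Which machine rooms host nodes of the given role.
    return sorted({c["room"] for c in topology.values() if c["role"] == role})
```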
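In another embodiment, the embodiments of the present application can be applied in a distributed database scenario, where the data synchronization method achieves high-performance fused synchronization between active and standby data nodes. In this embodiment, data synchronization between the active and standby data nodes of the distributed database mainly uses a fused method. It combines real-time synchronous replication based on the physical log (the redo log) with asynchronous replication based on the logical log (the binlog).
The distributed database mainly includes the following modules: a client access layer, a computing node cluster, management nodes, a global transaction management center, back-end middleware, and a data node cluster. The data synchronization method of the embodiments of the present application is mainly implemented by the data node cluster.
The embodiments of the present application can be applied to the GoldenDB distributed database, whose overall architecture is shown in FIG. 11.
1. The business side, i.e., the client access layer, includes multiple applications (APPs) and supports the common ODBC and JDBC interfaces; users access the distributed database through the client access layer;
2. The computing node cluster includes multiple DBProxy middleware instances; the computing nodes in the cluster perform the basic processing and distribution of SQL statements;
3. The management node includes multiple components, such as OMM Server, MDS, PM, and CM, and is mainly used to manage and safeguard the distributed database;
4. The global transaction management center (GTM) is mainly used to generate and maintain the global transaction IDs of distributed transactions;
5. The data node cluster includes multiple DB-GROUPs, each consisting of one master machine and multiple standby machines, mainly used for data reading and writing, storage, synchronization, and so on;
6. The back-end middleware mainly monitors the DB data nodes and performs high-availability management.
In some embodiments, as shown in FIG. 11, the data synchronization method of the embodiments of the present application includes, but is not limited to:
1) A terminal, such as the business side (i.e., the client access layer), initiates a database request, for example an SQL request;
2) The DBProxy middleware in the computing node cluster receives the SQL request from the business side, computes and routes it, and distributes it to the corresponding DB data node (i.e., the corresponding data node cluster);
3) The DB data node receives the SQL processing task sent by DBProxy and processes it;
4) The master machine executes the SQL commands of the SQL request and continuously generates redo logs during execution;
5) The master machine replicates the generated redo logs to some of the synchronous standby machines in real time;
6) After the SQL request (i.e., the transaction) finishes executing, the master machine starts the transaction commit, generates the last redo log during the commit, and synchronously replicates the last redo log to the synchronous standby machines;
7) The master machine triggers an asynchronous binary-log write according to the SQL request and asynchronously sends the binary log to multiple asynchronous standby machines;
8) The master machine synchronously waits for an acknowledgment from each synchronous standby machine based on the redo log;
9) The synchronous standby machine receives the redo log sent by the master machine and writes it into the redo log file;
10) The synchronous standby machine replays the redo log concurrently in redo log blocks of the preset parsing size;
11) The synchronous standby machine receives the last redo log sent by the master machine and, based on it, sends an acknowledgment (ACK response) to the master machine;
12) The asynchronous standby machine receives the binary log generated by the master machine from the business side's SQL request and writes it into the binary log file;
13) The asynchronous standby machine replays the binary log in parallel;
14) The master machine receives the acknowledgment based on the last redo log from the synchronous standby machines and performs the final transaction commit;
15) After completing the transaction commit, the master machine finishes and returns a response to the client;
16) After the client receives the response from the master machine, task processing in the whole distributed database ends.
The present application further provides a data synchronization device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor performs one of the following:
the data synchronization method applied to the master machine; or
the data synchronization method applied to the standby machines.
The processor and the memory may be connected by a bus or in other ways.
As a non-transitory computer-readable storage medium, the memory can store non-transitory software programs and non-transitory computer-executable programs. The memory may include high-speed random access memory. It may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some implementations, the memory may include memory located remotely from the processor and connected to it via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the data synchronization method of the above embodiments are stored in the memory. When executed by the processor, they perform the data synchronization method of the above embodiments, for example, method steps S100 to S130 in FIG. 1, step S111 in FIG. 2, steps S112 to S113 in FIG. 3, step S114 in FIG. 4, step S115 in FIG. 5, steps S121 to S123 in FIG. 6, steps S200 to S220 in FIG. 7, steps S201 to S203 in FIG. 8, and steps S221 to S222 in FIG. 9 described above.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate: they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform at least one of the following:
the data synchronization method applied to the master machine; or
the data synchronization method applied to the standby machines.
The computer-executable instructions of the present application may be executed by a processor or controller, for example by a processor in the above device embodiment. They then cause the processor to perform the data synchronization method of the above embodiments, for example, method steps S100 to S130 in FIG. 1, step S111 in FIG. 2, steps S112 to S113 in FIG. 3, step S114 in FIG. 4, step S115 in FIG. 5, steps S121 to S123 in FIG. 6, steps S200 to S220 in FIG. 7, steps S201 to S203 in FIG. 8, and steps S221 to S222 in FIG. 9 described above.
The embodiments of the present application include the following. The master machine receives a database request from a terminal and generates a redo log and a binary log according to the request. It synchronously replicates the redo log to multiple synchronous standby machines and asynchronously replicates the binary log to multiple asynchronous standby machines. After receiving an acknowledgment from each synchronous standby machine based on the redo log, it receives a database request from the terminal again. The embodiments combine two methods of data synchronization between active and standby data nodes: logical replication based on logical logs and physical replication based on physical logs. This achieves fused synchronization between the active and standby data nodes and effectively improves their synchronization performance, and thus the overall performance of the distributed database. It also reduces the replay delay of standby nodes, improving the high availability of the distributed database.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor. They may also be implemented as hardware or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Some implementations of the present application have been described above, but the present application is not limited to them. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application. All such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.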

Claims (11)

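  1. A data synchronization method, applied to a master machine, the master machine communicating with multiple standby machines, the standby machines comprising synchronous standby machines and asynchronous standby machines, the method comprising:
    receiving a database request from a terminal;
    generating a redo log according to the database request, and synchronously replicating the redo log to multiple synchronous standby machines;
    generating a binary log according to the database request, and asynchronously replicating the binary log to multiple asynchronous standby machines;
    after receiving an acknowledgment from each synchronous standby machine based on the redo log, receiving a database request from the terminal again.
  2. The method according to claim 1, wherein after synchronously replicating the redo log to the multiple synchronous standby machines, the method comprises:
    synchronously waiting for an acknowledgment from each synchronous standby machine based on the redo log.
  3. The method according to claim 1, wherein generating the redo log according to the database request comprises:
    executing the database request, and generating the redo log during execution of the database request.
  4. The method according to claim 3, wherein after executing the database request and generating the redo log during execution of the database request, the method further comprises:
    committing a transaction after execution of the database request is completed;
    generating a last redo log during the transaction commit.
  5. The method according to claim 4, wherein synchronously replicating the redo log to the multiple synchronous standby machines comprises:
    synchronously replicating the last redo log to the multiple synchronous standby machines.
  6. The method according to claim 1, wherein generating the binary log according to the database request comprises:
    generating binary data according to the database request;
    copying the binary data into a global log cache;
    asynchronously writing the binary data in the global log cache to a binary log file to generate the binary log.
  7. A data synchronization method, applied to multiple standby machines, the standby machines communicating with a master machine, the standby machines comprising synchronous standby machines and asynchronous standby machines, the method comprising:
    receiving, by the synchronous standby machine, a redo log from the master machine, and replaying the redo log, wherein the redo log is generated by the master machine based on a database request from a terminal;
    sending, by the synchronous standby machine, an acknowledgment to the master machine, so that the master machine receives a database request from the terminal again;
    receiving, by the asynchronous standby machine, a binary log from the master machine, and replaying the binary log, wherein the binary log is generated by the master machine based on the database request from the terminal.
  8. The method according to claim 7, wherein replaying the redo log comprises:
    parsing, by the synchronous standby machine, a current redo log block to obtain a redo parsing result, wherein the current redo log block is a redo log block of a preset parsing size;
    dispatching, by the synchronous standby machine, the redo parsing result to different worker threads;
    when parsing of the current redo log block has completed and replay of a previous redo log block has completed, parsing, by the synchronous standby machine, a next redo log block;
    when parsing of the current redo log block has completed and replay of the previous redo log block has not completed, waiting for replay of the previous redo log block to complete, and then parsing, by the synchronous standby machine, the next redo log block.
  9. The method according to claim 7, wherein the asynchronous standby machines comprise a first asynchronous standby machine and multiple second asynchronous standby machines, and receiving, by the asynchronous standby machines, the binary log from the master machine and replaying the binary log comprises:
    receiving, by the multiple second asynchronous standby machines, the binary log from the master machine, and replaying the binary log in parallel;
    synchronizing, by one of the second asynchronous standby machines, the asynchronously replayed binary log to the first asynchronous standby machine.
  10. A data synchronization device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when executing the computer program, the processor performs one of the following:
    the data synchronization method according to any one of claims 1 to 6; or
    the data synchronization method according to any one of claims 7 to 9.
  11. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform at least one of the following:
    the data synchronization method according to any one of claims 1 to 6; or
    the data synchronization method according to any one of claims 7 to 9.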
PCT/CN2022/136956 2021-12-24 2022-12-06 Data synchronization method, device, and computer-readable storage medium WO2023116419A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111602799.8 2021-12-24
CN202111602799.8A CN113987078B (zh) 2021-12-24 2021-12-24 Data synchronization method, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023116419A1 true WO2023116419A1 (zh) 2023-06-29

Family

ID=79734313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136956 WO2023116419A1 (zh) 2021-12-24 2022-12-06 Data synchronization method, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113987078B (zh)
WO (1) WO2023116419A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
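CN113987078B (zh) * 2021-12-24 2022-04-19 中兴通讯股份有限公司 Data synchronization method, device, and computer-readable storage medium
CN115905270B (zh) * 2023-01-06 2023-06-09 金篆信科有限责任公司 Method and apparatus for determining the master data node in a database, and storage medium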

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198159A (zh) * 2013-04-27 2013-07-10 国家计算机网络与信息安全管理中心 Multi-replica consistency maintenance method for heterogeneous clusters based on transaction redo
US20160147614A1 (en) * 2014-11-25 2016-05-26 Kaushal MITTAL Synchronized Backup and Recovery of Database Systems
US20180081956A1 (en) * 2013-11-04 2018-03-22 Guangdong Electronics Industry Institute Ltd. Method for automatically synchronizing multi-source heterogeneous data resources
CN108170768A (zh) * 2017-12-25 2018-06-15 腾讯科技(深圳)有限公司 Database synchronization method and apparatus, and readable medium
CN113987078A (zh) * 2021-12-24 2022-01-28 中兴通讯股份有限公司 Data synchronization method, device, and computer-readable storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
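CN102289469B (zh) * 2011-07-26 2013-01-30 国电南瑞科技股份有限公司 Method for enabling general-purpose databases to synchronize data via physical isolation devices
US9092475B2 (en) * 2011-11-07 2015-07-28 Sap Se Database log parallelization
US20150310044A1 (en) * 2014-02-03 2015-10-29 Codefutures Corporation Database device and processing of data in a database
CN106933703B (zh) * 2015-12-30 2021-04-02 阿里巴巴集团控股有限公司 Database data backup method and apparatus, and electronic device
CN108376142B (zh) * 2018-01-10 2021-05-14 北京思特奇信息技术股份有限公司 Data synchronization method and system for distributed in-memory databases
CN109992628B (zh) * 2019-04-15 2022-10-25 深圳市腾讯计算机系统有限公司 Data synchronization method and apparatus, server, and computer-readable storage medium
CN110442560B (zh) * 2019-08-14 2022-03-08 上海达梦数据库有限公司 Log replay method and apparatus, server, and storage medium
CN110851527B (zh) * 2019-09-24 2022-12-06 福建星网智慧科技有限公司 Data synchronization method for active and standby servers
CN110716828B (zh) * 2019-10-09 2023-05-23 宏为物联网科技(苏州)有限公司 Real-time database backup method
CN112131237A (zh) * 2020-09-28 2020-12-25 京东数字科技控股股份有限公司 Data synchronization method, apparatus, and device, and computer-readable medium
CN112632017A (zh) * 2020-11-05 2021-04-09 北京乐学帮网络技术有限公司 Database log processing method and apparatus, electronic device, and storage medium
CN113297159B (zh) * 2021-02-08 2024-03-08 阿里巴巴集团控股有限公司 Data storage method and apparatus
CN113535665B (zh) * 2021-07-16 2022-07-22 北京元年科技股份有限公司 Method and apparatus for synchronizing log files between a primary database and a standby database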

Also Published As

Publication number Publication date
CN113987078A (zh) 2022-01-28
CN113987078B (zh) 2022-04-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909734

Country of ref document: EP

Kind code of ref document: A1