EP4170509A1 - Method for replaying a log on a data node, data node, and system - Google Patents

Method for replaying a log on a data node, data node, and system

Info

Publication number
EP4170509A1
Authority
EP
European Patent Office
Prior art keywords
page
log
version
logs
transaction commit
Prior art date
Legal status
Pending
Application number
EP21831988.7A
Other languages
English (en)
French (fr)
Other versions
EP4170509A4 (de)
Inventor
Qiong Zhang
Xuli LI
Heting LI
Geyong YAO
Xiang Hu
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP4170509A1
Publication of EP4170509A4
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F11/2023 Failover techniques
    • G06F16/2358 Change logging, detection, and notification
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • G06F11/2064 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring while ensuring consistency
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/2379 Updates performed during online database operations; commit processing

Definitions

  • Embodiments of this application relate to the field of database technologies, and in particular, to a method for replaying a log on a data node, a data node, and a system.
  • At least two data nodes are usually established on the server.
  • One data node is used as a primary server, and other data nodes are used as secondary servers.
  • The primary server provides data services.
  • The secondary server replays the transaction logs of the primary server to keep its data consistent with the primary server. If the primary server becomes faulty or is shut down for another reason, the secondary server becomes the primary server and provides data services.
  • The secondary server needs to replay all the logs before it can provide data services. Therefore, to enable the secondary server to provide data services as soon as possible, the secondary server randomly allocates the logs to be replayed to a plurality of threads, so that the threads can replay the logs concurrently and increase the log replay speed.
  • However, the logs of the next transaction can be replayed only after all logs of the current transaction have been replayed. Therefore, the log replay speed is still not fast enough.
  • Embodiments of this application provide a method for replaying a log on a data node, a data node, and a database system.
  • The method can improve the log replay speed.
  • A first aspect of this application provides a method for replaying a log on a data node, where a plurality of threads run on the data node. The method includes the following steps.
  • The data node obtains a plurality of logs, where the plurality of logs include at least one transaction commit log and at least one page operation log, each transaction commit log includes a transaction commit operation, and each page operation log includes one or more operations on one page.
  • For example, the page operation log may include only an operation of inserting one row into a specific page.
  • Alternatively, the page operation log may further include an operation of deleting another row from the specific page.
  • Alternatively, the page operation log may include an operation of inserting two rows into the specific page and an operation of deleting one row from the specific page.
  • The data node replays the at least one transaction commit log by using a first thread of the plurality of threads.
  • The first thread replays the at least one transaction commit log sequentially in order of log sequence number (LSN).
  • The data node replays the at least one page operation log by using at least one second thread of the plurality of threads, where all page operation logs that include an operation on the same page are replayed by the same second thread.
  • The same second thread may also replay page operation logs that include operations on different pages.
  • The process in which the first thread replays the at least one transaction commit log is independent of the process in which the at least one second thread replays the at least one page operation log.
  • Because all page operation logs on the same page are replayed by the same second thread, log replay progress does not need to be synchronized between threads, and the log replay speed is improved compared with a case in which a plurality of second threads replay the page operation logs on the same page.
  • In addition, because the process in which the first thread replays a transaction commit log is independent of the process in which any second thread replays a page operation log, the page operation logs of the next transaction can be replayed even if replay of the transaction commit log of the current transaction is not yet completed. This further increases the replay speed.
  • Optionally, obtaining the plurality of logs by the data node includes: obtaining the at least one transaction commit log and at least one operation log from a buffer, where each operation log includes an operation on one or more pages; and parsing, by the data node, the at least one operation log on a per-page basis to obtain the at least one page operation log. The at least one page operation log includes a first page operation log, and the LSN of the first page operation log is the same as the LSN of the operation log from which it is parsed.
  • Reading the plurality of logs from the buffer reduces the number of input/output (I/O) operations on the data memory and increases the log reading speed, which in turn improves the log replay speed.
  • This implementation provides a feasible way to obtain page operation logs.
  • Optionally, the method further includes: the data node determines a maximum visible LSN, where the maximum visible LSN indicates log replay progress, all transaction commit logs in the at least one transaction commit log whose LSNs are less than or equal to the maximum visible LSN have been replayed, and all page operation logs in the at least one page operation log whose LSNs are less than or equal to the maximum visible LSN have been replayed.
  • Suppose the LSN of one of the plurality of logs is a first LSN. The first LSN is taken as the maximum visible LSN only if all transaction commit logs whose LSNs are less than or equal to the first LSN and all page operation logs whose LSNs are less than or equal to the first LSN have been replayed. This prevents the maximum visible LSN from being incorrect because of discontinuous log replay. Therefore, in this implementation, the maximum visible LSN accurately indicates log replay progress, and data consistency can be ensured based on this progress.
  • Optionally, replaying the at least one transaction commit log by using the first thread includes: the data node replays, by using the first thread of the plurality of threads, the at least one transaction commit log in LSN order on an initial version of a first page to obtain a plurality of versions of the first page, where each version of the first page is associated with at least one of the at least one transaction commit log.
  • The number of versions of the first page is not specifically limited in this embodiment of this application.
  • The plurality of versions of the first page include a first version of the first page and a second version of the first page generated based on the first version, where the second version is obtained by replaying, on the first version in LSN order, the transaction commit logs associated with the first version.
  • The content of the first page after any transaction commit log has been replayed can be viewed based on the plurality of versions of the first page.
  • Optionally, the method further includes: after obtaining the plurality of versions of the first page, the data node writes them into a double write file, and then writes the plurality of versions of the first page from the double write file into a data file.
  • Because the data node first writes the plurality of versions of the first page into the double write file and only then writes them from the double write file into the data file, the plurality of versions of the first page are protected from damage.
  • Optionally, starting from the initial version of the first page, a new version of the first page is generated each time a specific number of transaction commit logs have been replayed. In this case, each version of the first page is associated with the same number of transaction commit logs.
  • Alternatively, starting from the initial version of the first page, a new version of the first page is generated each time transaction commit logs have been replayed for a specific period of time. In this case, the transaction commit logs associated with each version of the first page take the same replay time.
  • Optionally, the method further includes: in response to a first indication, the data node replays, on a reference version of the first page in LSN order, the transaction commit logs associated with the reference version until the first transaction commit log is replayed, to obtain a target version of the first page, where the first indication is used to query the log replay progress of the at least one transaction commit log.
  • The first indication may take a plurality of forms.
  • The reference version of the first page is the version, among the plurality of versions of the first page, that is associated with the first transaction commit log, and the first transaction commit log is the log with the largest LSN among the transaction commit logs whose LSNs are less than or equal to the maximum visible LSN.
  • The method further includes: determining the log replay progress of the at least one transaction commit log based on the target version of the first page.
  • In this way, the log replay progress can be reported accurately and in real time based on the plurality of versions of the first page and the maximum visible LSN, which ensures data consistency.
  • Optionally, replaying the at least one page operation log by using the at least one second thread includes: the data node replays, by using one second thread of the plurality of threads, all page operation logs that include an operation on a second page in LSN order on an initial version of the second page, to obtain a plurality of versions of the second page, where each version of the second page is associated with at least one of the page operation logs that include an operation on the second page.
  • The number of versions of the second page is not specifically limited in this embodiment of this application.
  • The plurality of versions of the second page include a first version of the second page and a second version of the second page generated based on the first version, where the second version is obtained by replaying, on the first version in LSN order, the page operation logs associated with the first version.
  • The content of the second page after any page operation log has been replayed can be viewed based on the plurality of versions of the second page.
  • Optionally, the method further includes: after obtaining the plurality of versions of the second page, the data node writes them into a double write file, and then writes the plurality of versions of the second page from the double write file into a data file.
  • Because the data node first writes the plurality of versions of the second page into the double write file and only then writes them from the double write file into the data file, the plurality of versions of the second page are protected from damage.
  • Optionally, starting from the initial version of the second page, a new version of the second page is generated each time a specific number of page operation logs have been replayed. In this case, each version of the second page is associated with the same number of page operation logs.
  • Alternatively, starting from the initial version of the second page, a new version of the second page is generated each time page operation logs have been replayed for a specific period of time. In this case, the page operation logs associated with each version of the second page take the same replay time.
  • Optionally, the method further includes: in response to a second indication, the data node replays, on a reference version of the second page in LSN order, the page operation logs associated with the reference version until the first page operation log is replayed, to obtain a target version of the second page, where the second indication is used to query the log replay progress of the at least one page operation log.
  • The reference version of the second page is the version, among the plurality of versions of the second page, that is associated with the first page operation log.
  • The first page operation log is the log with the largest LSN among the page operation logs on the second page whose LSNs are less than or equal to the maximum visible LSN.
  • The method further includes: determining the log replay progress of the at least one page operation log based on the target version of the second page.
  • In this way, the log replay progress can be reported accurately and in real time based on the plurality of versions of the second page and the maximum visible LSN, which ensures data consistency.
  • A second aspect of this application provides a data node on which a plurality of threads run. The data node includes an obtaining unit, a first replay unit, and a second replay unit.
  • The process in which the first thread replays the at least one transaction commit log is independent of the process in which the at least one second thread replays the at least one page operation log.
  • Optionally, the obtaining unit is configured to: obtain the at least one transaction commit log and the at least one operation log from a buffer, where each operation log includes an operation on one or more pages; and parse the at least one operation log to obtain the at least one page operation log. The at least one page operation log includes a first page operation log, and the LSN of the first page operation log is the same as the LSN of the operation log from which it is parsed.
  • Optionally, the data node further includes a determining unit, configured to determine a maximum visible LSN, where the maximum visible LSN indicates log replay progress, all transaction commit logs in the at least one transaction commit log whose LSNs are less than or equal to the maximum visible LSN have been replayed, and all page operation logs in the at least one page operation log whose LSNs are less than or equal to the maximum visible LSN have been replayed.
  • The first replay unit is configured to replay, by using a first thread of the plurality of threads, the at least one transaction commit log in LSN order on an initial version of a first page to obtain a plurality of versions of the first page, where each version of the first page is associated with at least one of the at least one transaction commit log. The plurality of versions of the first page include a first version of the first page and a second version of the first page generated based on the first version, where the second version is obtained by replaying, on the first version in LSN order, the transaction commit logs associated with the first version.
  • Optionally, the data node further includes a writing unit, configured to write the plurality of versions of the first page into a double write file, and to write the plurality of versions of the first page from the double write file into a data file.
  • Optionally, each version of the first page is associated with the same number of transaction commit logs, or the transaction commit logs associated with each version of the first page take the same replay time.
  • The first replay unit is further configured to replay, in response to a first indication, on a reference version of the first page in LSN order, the transaction commit logs associated with the reference version until the first transaction commit log is replayed, to obtain a target version of the first page, where the first indication is used to query the log replay progress of the at least one transaction commit log.
  • The reference version of the first page is the version, among the plurality of versions of the first page, that is associated with the first transaction commit log, and the first transaction commit log is the log with the largest LSN among the transaction commit logs whose LSNs are less than or equal to the maximum visible LSN.
  • The first replay unit is further configured to determine the log replay progress of the at least one transaction commit log based on the target version of the first page.
  • The second replay unit is configured to replay, by using one second thread of the plurality of threads, all page operation logs that include an operation on a second page in LSN order on an initial version of the second page, to obtain a plurality of versions of the second page, where each version of the second page is associated with at least one of the page operation logs that include an operation on the second page.
  • The plurality of versions of the second page include a first version of the second page and a second version of the second page generated based on the first version, where the second version is obtained by replaying, on the first version in LSN order, the page operation logs associated with the first version.
  • Optionally, the data node further includes a writing unit, configured to write the plurality of versions of the second page into a double write file, and to write the plurality of versions of the second page from the double write file into a data file.
  • Optionally, each version of the second page is associated with the same number of page operation logs, or the page operation logs associated with each version of the second page take the same replay time.
  • The second replay unit is further configured to replay, in response to a second indication, on a reference version of the second page in LSN order, the page operation logs associated with the reference version until the first page operation log is replayed, to obtain a target version of the second page, where the second indication is used to query the log replay progress of the at least one page operation log.
  • The reference version of the second page is the version, among the plurality of versions of the second page, that is associated with the first page operation log.
  • The first page operation log is the log with the largest LSN among the page operation logs on the second page whose LSNs are less than or equal to the maximum visible LSN.
  • The second replay unit is further configured to determine the log replay progress of the at least one page operation log based on the target version of the second page.
  • A third aspect of this application provides a data node, including at least one processor and a memory, where the memory stores computer-executable instructions that can run on the processor, and when the computer-executable instructions are executed by the processor, the data node performs the method for replaying a log on a data node according to any one of the implementations of the first aspect.
  • A fourth aspect of this application provides a chip or a chip system.
  • The chip or chip system includes at least one processor and a communication interface.
  • The communication interface and the at least one processor are interconnected through a line.
  • The at least one processor is configured to run a computer program or instructions to perform the method for replaying a log on a data node according to any one of the implementations of the first aspect.
  • A fifth aspect of this application provides a computer storage medium.
  • The computer storage medium is configured to store computer software instructions used by the foregoing terminal device, and the computer software instructions include a program designed for a data node, where the data node may be the data node described in the third aspect.
  • A sixth aspect of this application provides a computer program product, where the computer program product includes computer software instructions, and the computer software instructions may be loaded by a processor to implement the method for replaying a log on a data node according to any one of the implementations of the first aspect.
  • A seventh aspect of this application provides a database system, including a primary server and at least one secondary server.
  • The primary server is configured to send logs to the at least one secondary server.
  • Each secondary server is configured to replay the logs according to the method of any one of the implementations of the first aspect.
  • The same second thread replays all page operation logs that include an operation on the same page. Therefore, log replay progress does not need to be synchronized between threads, and the log replay speed is improved compared with a case in which a plurality of second threads replay the page operation logs on the same page.
  • A database system includes the following three parts.
  • A database (DB).
  • DB database
  • Hardware including a data memory required for data storage, for example, a memory and/or a magnetic disk.
  • FIG. 1A is a schematic diagram of a cluster database system using a shared-nothing architecture.
  • Each data node has an exclusive hardware resource (such as a data memory), an exclusive operating system, and an exclusive database, and data nodes communicate with each other by using a network.
  • one data node is usually used as a primary server to provide a database service, and other data nodes are used as secondary servers.
  • the primary server is faulty or a load on the primary server is heavy, one or more secondary servers are selected as the primary server to provide a database service.
  • the primary server needs to continuously synchronize logs to the secondary servers through streaming replication and replay the logs on the secondary servers. Based on this, logs may be replayed on the secondary servers by using the method provided in embodiments of this application. In addition, because the primary server may also be restarted after the primary server is shut down, logs may also be replayed on the primary server by using the method provided in embodiments of this application.
  • the data node herein may be a physical machine or a virtual machine.
  • a coordinator node 1 and a coordinator node 2 serve as entrances for clients to connect to the database system
  • a transaction management node is configured to provide a database transaction management service
  • a cluster management node is configured to provide a cluster management service.
  • a data memory (data storage) of the database system includes but is not limited to a solid-state drive (solid state drives, SSD), a disk array, or another type of non-transitory computer-readable medium.
  • Although a database is not shown in FIG. 1A, it should be understood that the database is stored in the data memory.
  • a database system may include fewer or more components than those shown in FIG. 1B, or include components different from those shown in FIG. 1A.
  • FIG. 1A shows only the components more closely related to the implementations disclosed in embodiments of this application.
  • a cluster database system may include any quantity of data nodes. Functions of a database management system of each data node may be implemented by a proper combination of software, hardware, and/or firmware that are running on each data node.
  • the database system in FIG. 1A includes a data node.
  • a data node includes at least one processor 104, a non-transitory computer-readable medium 106 storing executable code, and a database management system 108.
  • When the executable code is executed by the at least one processor 104, it implements the components and functions of the database management system 108.
  • the non-transitory computer-readable medium 106 may include one or more non-volatile memories.
  • the non-volatile memory includes semiconductor storage devices such as an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and a flash memory; and disks such as an internal hard disk, a removable disk, a magneto-optical disk, a CD-ROM, and a DVD-ROM.
  • the non-transitory computer-readable medium 106 may further include any device configured as a main memory.
  • the at least one processor 104 may include any type of general-purpose computing circuit or application-specific logic circuit, for example, a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • the at least one processor 104 may alternatively be one or more processors coupled to one or more semiconductor substrates, such as a CPU.
  • the database management system 108 may be a relational database management system (RDBMS).
  • the database management system 108 supports the structured query language (SQL).
  • SQL is a domain-specific programming language for managing data stored in a relational database.
  • SQL may be any type of data-related language, including a data definition language and a data manipulation language; its functions may include data insertion, querying, updating, and deletion, schema creation and modification, and data access control.
  • SQL may include descriptions of various language elements, including clauses, expressions, predicates, and query statements.
  • a query statement is usually referred to as a "query" for short.
  • clauses are constituent components of statements and queries; in some cases, a clause may be optional.
  • an expression may be used to produce a scalar value and/or a table consisting of data columns and/or rows.
  • a predicate specifies a condition that is used to adjust the effect of statements and queries.
  • the query statement is a request to view, access, and/or manipulate data stored in the database.
  • the database management system 108 may receive a query in an SQL format (referred to as an SQL query) from a database client 102.
  • the SQL query may also be referred to as an SQL statement.
  • the database management system 108 usually generates, by accessing related data in the database and manipulating the related data, a query result corresponding to the query, and returns the query result to the database client 102.
  • a database is a set of data organized, described, and stored based on a mathematical model.
  • the database may include one or more database structures or formats, such as row storage and column storage.
  • the database is typically stored in a data memory, such as an external data memory 120 in FIG. 1B, or in the non-transitory computer-readable medium 106. When the database is stored in the non-transitory computer-readable medium 106, the database management system 108 is an in-memory database management system.
  • the database client 102 may include any type of device or application program configured to interact with the database management system 108.
  • the database client 102 includes one or more application servers.
  • the database management system 108 includes an SQL engine 110, an execute engine 122, and a storage engine 134.
  • the SQL engine 110 generates a corresponding execution plan based on an SQL statement submitted by the client 102, such as a query.
  • the execute engine 122 performs an operation based on the execution plan of the statement, to generate a query result.
  • the storage engine 134 is responsible for managing the actual content of data and the indexes of tables in a file system, and also manages data such as the cache, buffers, transactions, and logs while it runs.
  • the storage engine 134 may write an execution result of the execute engine 122 into the data memory 120 through physical I/O.
  • the SQL engine 110 includes a parser 112 and an optimizer 114.
  • the parser 112 is configured to: perform syntax and semantic analysis on the SQL statement, expand query views, and divide the query into smaller query blocks.
  • the optimizer 114 generates a group of candidate execution plans for the statement, estimates the cost of each plan, compares the costs, and finally selects the execution plan with the lowest cost.
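The final step of the optimizer 114, choosing among candidate plans, amounts to picking the minimum-cost candidate. The following Python sketch shows only that comparison; the `Plan` class, the plan descriptions, and the cost figures are hypothetical, and a real optimizer derives costs from statistics rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution plan with an estimated cost (hypothetical units)."""
    description: str
    estimated_cost: float


def choose_plan(candidates: list[Plan]) -> Plan:
    """Return the candidate plan with the lowest estimated cost."""
    if not candidates:
        raise ValueError("no candidate execution plans")
    return min(candidates, key=lambda plan: plan.estimated_cost)


# Hypothetical candidate plans and cost estimates, for illustration only.
plans = [
    Plan("sequential scan + hash join", 120.0),
    Plan("index scan + nested loop join", 45.5),
    Plan("index scan + merge join", 60.0),
]
print(choose_plan(plans).description)  # index scan + nested loop join
```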
  • RTO (recovery time objective)
  • the RTO refers to a maximum tolerable time for a computer, a system, a network, or an application to stop working after a fault or disaster occurs.
  • logs are replayed on the secondary server
  • logs are accumulated on the secondary server if the log replay speed is not fast enough.
  • after the secondary server becomes the primary server, it needs to wait a long time for the accumulated logs to be replayed before it can provide an external service, and the waiting time exceeds the RTO.
  • an embodiment of this application provides a method for replaying a log on a data node.
  • logs are divided into transaction commit logs and logs related to page operations, and different threads are used to replay the transaction commit logs and the logs related to page operations separately. Therefore, logs do not need to be replayed with a transaction as the unit.
  • the logs related to page operations may be replayed by a plurality of threads, to further increase the log replay speed.
  • a log may be a redo log. It should be noted that the method provided in this embodiment of this application may be adaptively adjusted, so that other logs, such as undo logs, can be replayed by using the adjusted method.
  • an embodiment of this application provides an embodiment of a method for replaying a log on a data node.
  • a plurality of threads run on the data node. Based on this, the method in this embodiment of this application includes the following steps.
  • Step 101 The data node obtains a plurality of logs.
  • the plurality of logs include at least one transaction commit log and at least one page operation log, each of the at least one transaction commit log includes a transaction commit operation, and each of the at least one page operation log includes one or more operations on one page.
  • the page operation log may include only an operation of inserting one row into a specific page.
  • the page operation log may further include an operation of deleting another row from the specific page.
  • the page operation log may include an operation of inserting two rows into a specific page and an operation of deleting one row from the specific page.
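A minimal Python model of the two kinds of logs, assuming that rows can be represented as strings and that a page operation log carries an ordered tuple of insert/delete operations on a single page. The class names `TransactionCommitLog` and `PageOperationLog`, the `xid` field, and `apply_page_log` are illustrative only, not the data node's actual log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionCommitLog:
    lsn: int
    xid: int  # hypothetical field: id of the committed transaction


@dataclass(frozen=True)
class PageOperationLog:
    lsn: int
    page_id: int
    # Every operation targets page_id: ("insert", row) or ("delete", row).
    operations: tuple[tuple[str, str], ...]


def apply_page_log(rows: list[str], log: PageOperationLog) -> list[str]:
    """Apply all operations of one page operation log to a copy of a page's rows."""
    result = list(rows)
    for op, row in log.operations:
        if op == "insert":
            result.append(row)
        elif op == "delete":
            result.remove(row)  # raises ValueError if the row is absent
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result


# One page operation log that inserts two rows into page 11 and deletes one row.
log = PageOperationLog(lsn=7, page_id=11,
                       operations=(("insert", "r1"), ("insert", "r2"), ("delete", "r0")))
print(apply_page_log(["r0"], log))  # ['r1', 'r2']
```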
  • Step 102 The data node replays the at least one transaction commit log by using a first thread of the plurality of threads.
  • After the plurality of logs are obtained, the first thread replays the transaction commit logs among them.
  • a transaction commit operation in a transaction commit log is performed on an initial page, to update the initial page. Then, on the updated initial page, a transaction commit operation in the next transaction commit log is performed, to further update the page. This process is repeated. In this way, the transaction commit logs are replayed, and finally the latest updated version of the initial page is obtained.
  • the initial page may be a blank page, or may be an existing non-blank page in the database.
  • the first thread usually replays the transaction commit logs in ascending order of their log sequence numbers (LSNs), to ensure data consistency.
  • transaction commit logs may also be replayed in another manner, which is described in detail in the following.
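The sequential replay by the first thread can be sketched as follows. The page model (a map from a hypothetical transaction id to a commit status) is an assumption made only to give the commit operation something concrete to update; the point is that each log is applied, in ascending LSN order, to the page produced by the previous log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionCommitLog:
    lsn: int
    xid: int  # hypothetical transaction id recorded by the commit log


def replay_commit_logs(initial_page: dict[int, str],
                       logs: list[TransactionCommitLog]) -> dict[int, str]:
    """Replay commit logs in ascending LSN order on a copy of the initial page.

    By assumption, the page maps transaction ids to a status, and each commit
    operation marks its transaction as committed.
    """
    page = dict(initial_page)  # a blank page ({}) or an existing non-blank page
    for entry in sorted(logs, key=lambda log: log.lsn):
        page[entry.xid] = "committed"  # each step updates the previously updated page
    return page


logs = [TransactionCommitLog(lsn=5, xid=2), TransactionCommitLog(lsn=2, xid=1)]
print(replay_commit_logs({}, logs))  # {1: 'committed', 2: 'committed'}
```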
  • Step 103 The data node replays the at least one page operation log by using at least one second thread of the plurality of threads.
  • a plurality of second threads may be used for parallel replay, to improve a replay speed of the page operation logs.
  • all page operation logs that are in the at least one page operation log and that include an operation on a same page are replayed by using a same second thread.
  • the same second thread may replay page operation logs that include operations on different pages. For example, all page operation logs including an operation on a page 11 and all page operation logs including an operation on a page 22 may be replayed by the same second thread.
  • Similar to replaying transaction commit logs, replaying page operation logs is also performed on an initial page, which may be a blank page or an existing non-blank page in the database. Take all page operation logs that include an operation on one initial page as an example. If these logs are replayed by one second thread, only that second thread needs to obtain the initial page. If they are replayed separately by two second threads, both second threads need to obtain the initial page. Therefore, when all page operation logs including an operation on a same page are replayed by a same second thread, the initial page does not need to be obtained by more than one second thread.
  • the second thread usually replays the page operation logs in ascending LSN order, to ensure data consistency. Therefore, if all page operation logs including an operation on a same page were replayed by a plurality of second threads, log replay progress would need to be synchronized between those second threads to guarantee that the logs are replayed in LSN order. If all page operation logs including an operation on a same page are replayed by a same second thread, no such synchronization between second threads is needed.
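One way to guarantee that all page operation logs for a page reach the same second thread is to choose the thread from the page id alone. This sketch assumes a simple `page_id % num_workers` mapping and a hypothetical `route_page_logs` helper; any deterministic page-to-thread mapping would serve the same purpose.

```python
from collections import defaultdict


def route_page_logs(page_logs: list[tuple[int, int]],
                    num_workers: int) -> dict[int, list[tuple[int, int]]]:
    """Assign (lsn, page_id) page operation logs to second-thread queues.

    The worker depends only on the page id (page_id % num_workers, an assumed
    mapping), so every log for a given page lands in the same queue. Logs are
    enqueued in ascending LSN order, so each queue is already in replay order.
    """
    queues: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for lsn, page_id in sorted(page_logs):
        queues[page_id % num_workers].append((lsn, page_id))
    return dict(queues)


# Pages 11 and 33 share worker 1; page 22 goes to worker 0.
print(route_page_logs([(3, 11), (1, 11), (2, 22), (4, 33)], num_workers=2))
# {1: [(1, 11), (3, 11), (4, 33)], 0: [(2, 22)]}
```

As in the page 11 and page 22 example, one second thread may serve several pages; what matters is that a page is never split across threads.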
  • a process in which the first thread replays the at least one transaction commit log is independent of a process in which the at least one second thread replays the at least one page operation log. It may also be understood that a process in which the first thread replays the at least one transaction commit log and a process in which the at least one second thread replays the at least one page operation log are parallel and do not interfere with each other.
  • a process in which the first thread replays the transaction commit log and a process in which the second thread replays the page operation log are performed independently, and the first thread does not need to synchronize log replay progress of the transaction commit log to the second thread
  • the second thread does not need to synchronize log replay progress of the page operation log to the first thread either. Therefore, the first thread can replay the transaction commit log without waiting for the second thread to finish replaying the page operation log.
  • a process in which the first thread replays the transaction commit log and a process in which the second thread replays the page operation log are also performed independently.
  • the second thread can replay a page operation log corresponding to a next transaction without waiting for completion of replaying of a transaction commit log corresponding to a current transaction by the first thread.
  • a log of a first transaction includes a page operation log 1 and a transaction commit log 2
  • a log of a second transaction includes a page operation log 3, a page operation log 4, and a transaction commit log 5
  • the page operation log 1 and the page operation log 3 correspond to a same page.
  • Log replay is performed by using the method provided in this embodiment of this application.
  • the page operation log 1 and the page operation log 3 are replayed by using a second thread 1
  • the page operation log 4 is replayed by using a second thread 2.
  • the transaction commit log 2 and the transaction commit log 5 are replayed by a first thread.
  • a process in which the first thread replays the transaction commit log 2 is independent of a process in which the second thread 1 replays the page operation log 1, and the transaction commit log 2 can be replayed without waiting for completion of replaying the page operation log 1. Therefore, the log replay progress does not need to be synchronized between the first thread and the second thread 1.
  • a process in which the first thread replays the transaction commit log 5 is independent of a process in which the second thread 1 replays the page operation log 3
  • a process in which the first thread replays the transaction commit log 5 is independent of a process in which the second thread 2 replays the page operation log 4. Therefore, the log replay progress does not need to be synchronized between the first thread and the second thread 1, and the log replay progress does not need to be synchronized between the first thread and the second thread 2.
  • log replay between the second thread 1 and the second thread 2 is also independent, and the log replay progress does not need to be synchronized between the second thread 1 and the second thread 2.
  • a same second thread replays all page operation logs including an operation on a same page. Therefore, log replay progress does not need to be synchronized between threads, and a log replay speed is improved compared with a case in which a plurality of second threads replay all page operation logs including an operation on a same page.
  • a process in which the first thread replays a transaction commit log is independent of a process in which any second thread replays a page operation log. Therefore, even if replaying of a transaction commit log corresponding to a current transaction is not completed, a page operation log corresponding to a next transaction can be replayed. This further increases a replay speed.
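The independence of the first thread and the second threads can be illustrated with Python's `threading` and `queue` modules by reproducing the two-transaction example (page operation logs 1, 3, and 4; transaction commit logs 2 and 5). This is a simplified sketch: replay is reduced to recording LSNs, the page ids "P" and "Q" are hypothetical, and each thread drains its own queue, so no thread waits on another thread's progress.

```python
import queue
import threading


def _worker(inbox: queue.Queue, replayed: list) -> None:
    """Replay logs from one queue, in the order received, until a None sentinel."""
    while True:
        log = inbox.get()
        if log is None:
            return
        lsn, _kind, _page = log
        # A real implementation would modify a page here; the sketch records the LSN.
        replayed.append(lsn)


def replay(logs, num_page_workers=2):
    """Replay commit logs on one first thread and page logs on second threads."""
    commit_queue = queue.Queue()
    page_queues = [queue.Queue() for _ in range(num_page_workers)]
    progress = {"first": []}
    progress.update({f"second-{i}": [] for i in range(num_page_workers)})
    threads = [threading.Thread(target=_worker, args=(commit_queue, progress["first"]))]
    threads += [threading.Thread(target=_worker, args=(q, progress[f"second-{i}"]))
                for i, q in enumerate(page_queues)]
    for t in threads:
        t.start()
    page_to_worker = {}
    for log in sorted(logs, key=lambda entry: entry[0]):
        _lsn, kind, page = log
        if kind == "commit":
            commit_queue.put(log)
        else:
            # All logs for one page go to one second thread (round-robin on first sight).
            worker = page_to_worker.setdefault(page, len(page_to_worker) % num_page_workers)
            page_queues[worker].put(log)
    for q in [commit_queue, *page_queues]:
        q.put(None)  # sentinel: no more logs
    for t in threads:
        t.join()
    return progress


example_logs = [(1, "page", "P"), (2, "commit", None), (3, "page", "P"),
                (4, "page", "Q"), (5, "commit", None)]
print(replay(example_logs))
# {'first': [2, 5], 'second-0': [1, 3], 'second-1': [4]}
```

Because the threads never exchange progress, the first thread may finish commit log 2 before second thread 1 has replayed page operation log 1; the interleaving varies between runs, while each thread's own order stays fixed.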
  • the following describes a specific process of obtaining a plurality of logs.
  • the data node may read the plurality of logs from a data memory or a buffer.
  • the buffer is a piece of memory, and is configured to interact with the data memory to read and write a log.
  • When the primary server synchronizes logs to the secondary server through streaming replication, the logs are synchronized to a buffer of the secondary server and then flushed from the buffer to the data memory. It should be noted that logs in the buffer are cleared only after they are read during log replay. Therefore, flushing the logs to the data memory does not clear them from the buffer.
  • When the primary server and the secondary server are in the streaming replication state, the plurality of logs may be read from the buffer; when they are not in the streaming replication state, for example, when the primary server is restarted after being shut down, the plurality of logs may be read from the data memory.
  • reading the plurality of logs from the buffer reduces the number of input/output (I/O) operations on the data memory and increases the log reading speed, which in turn increases the log replay speed.
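The buffer behavior described above can be modeled in a few lines. `ReplayLogBuffer` is a hypothetical class; the `streaming` flag stands in for whether the primary and secondary servers are in the streaming replication state, and flushing before clearing is a simplifying assumption so that the sketch never loses a log.

```python
class ReplayLogBuffer:
    """Simplified model of a secondary server's log buffer (hypothetical class).

    Flushing copies logs to the data memory without clearing the buffer;
    logs leave the buffer only when log replay reads them.
    """

    def __init__(self):
        self.buffer = []       # logs received but not yet read by replay
        self.data_memory = []  # stand-in for the log files in the data memory
        self._flushed = 0      # number of buffered logs already flushed

    def receive(self, log):
        """Streaming replication delivers a log from the primary server."""
        self.buffer.append(log)

    def flush(self):
        """Write not-yet-flushed logs to the data memory; the buffer stays intact."""
        self.data_memory.extend(self.buffer[self._flushed:])
        self._flushed = len(self.buffer)

    def read_for_replay(self, streaming):
        """Return logs to replay: from the buffer when streaming, else from disk."""
        if not streaming:
            # e.g. restart after shutdown: read from the data memory (costs I/O).
            return list(self.data_memory)
        self.flush()  # simplifying assumption: persist before clearing
        logs, self.buffer, self._flushed = self.buffer, [], 0
        return logs


buf = ReplayLogBuffer()
buf.receive("L1")
buf.receive("L2")
buf.flush()
print(buf.buffer)  # ['L1', 'L2'], still buffered after the flush
```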
  • the plurality of logs include a transaction commit log and an operation log.
  • the operation log may be understood as a log related to a table, and specifically includes an operation on one or more pages in a table.
  • each of the plurality of logs corresponds to one LSN.
  • That the data node obtains a plurality of logs includes: The data node parses at least one operation log to obtain at least one page operation log, where each of the at least one operation log includes an operation on one or more pages.
  • the data node parses an operation log on a per-page basis. For example, if an operation log includes a total of 15 operations on 10 pages, the operation log is parsed into 10 page operation logs; that is, all operations on a same page are included in one page operation log.
  • An LSN of the operation log is the same as an LSN of a page operation log obtained by parsing the operation log.
  • the at least one page operation log includes a first page operation log, and an LSN of the first page operation log is the same as an LSN of an operation log which is parsed to obtain the first page operation log.
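  • for example, if an operation log is parsed into five page operation logs, the LSNs of the five page operation logs are all the same as the LSN of that operation log.
  • if log replay progress were synchronized between threads, the data node could obtain the log replay progress based on the synchronization result, which allows a user to view the log replay progress.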
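Per-page parsing can be sketched as grouping an operation log's operations by page id and stamping every resulting page operation log with the operation log's LSN. The `(page_id, operation)` input format and the function name `parse_operation_log` are assumptions; the example mirrors the case of 15 operations on 10 pages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageOperationLog:
    lsn: int
    page_id: int
    operations: tuple  # every operation on page_id taken from one operation log


def parse_operation_log(lsn, operations):
    """Split one operation log into one page operation log per distinct page.

    `operations` is a list of (page_id, operation) pairs, an assumed format.
    Every resulting page operation log keeps the LSN of the operation log.
    """
    by_page = defaultdict(list)
    for page_id, op in operations:
        by_page[page_id].append(op)  # operations on one page keep their order
    return [PageOperationLog(lsn, page_id, tuple(ops)) for page_id, ops in by_page.items()]


# 15 operations spread over 10 pages (pages 0-4 get two operations each).
ops = [(i % 10, f"op{i}") for i in range(15)]
page_logs = parse_operation_log(42, ops)
print(len(page_logs), {log.lsn for log in page_logs})  # 10 {42}
```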
  • the log replay progress does not need to be synchronized between the first thread and the second thread, and the log replay progress does not need to be synchronized between the second threads either. Therefore, the log replay progress cannot be obtained through synchronization between threads.
  • the method further includes the following steps.
  • Step 104 The data node determines a maximum visible LSN, where the maximum visible LSN indicates log replay progress.
  • All transaction commit logs in the at least one transaction commit log whose LSNs are less than or equal to the maximum visible LSN are replayed, and all page operation logs in the at least one page operation log whose LSNs are less than or equal to the maximum visible LSN are replayed.
  • log replay may be discontinuous.
  • an LSN of a transaction commit log currently replayed by the first thread is 100
  • an LSN of a page operation log currently replayed by the second thread is 90.
  • some logs whose LSNs are between 90 and 100 may not yet have been replayed. In this case, 100 cannot be used as the maximum visible LSN; only 90 can be used as the maximum visible LSN.
  • an LSN corresponding to one of the plurality of logs is a first LSN
  • all transaction commit logs whose LSNs are less than or equal to the first LSN in the at least one transaction commit log are replayed
  • all page operation logs whose LSNs are less than or equal to the first LSN in the at least one page operation log are replayed
  • the first LSN is used as the maximum visible LSN. This avoids a case in which the maximum visible LSN is incorrect because of discontinuous log replay. Therefore, the maximum visible LSN in this embodiment of this application can accurately indicate the log replay progress, and data consistency can be ensured by using the log replay progress.
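Determining the maximum visible LSN can be sketched as scanning the LSNs of all obtained logs in ascending order and stopping at the first one not yet replayed. The function below is a minimal illustration with hypothetical LSNs modeled on the 90/100 example; how each thread reports finished LSNs is left to the caller.

```python
def max_visible_lsn(all_lsns, replayed_lsns):
    """Return the largest LSN L such that every log with LSN <= L is replayed.

    all_lsns: LSNs of every obtained log (transaction commit logs and page
    operation logs). replayed_lsns: LSNs whose replay has finished on any thread;
    an LSN shared by several page operation logs is assumed to be reported only
    when all of them are done. Returns None if the smallest LSN is not replayed.
    """
    visible = None
    for lsn in sorted(set(all_lsns)):
        if lsn not in replayed_lsns:
            break  # a gap: nothing at or beyond this LSN is guaranteed visible
        visible = lsn
    return visible


# The commit thread has reached LSN 100 and the page thread LSN 90, while the
# page operation log with LSN 95 is still pending.
all_lsns = [80, 85, 90, 95, 100]
replayed = {80, 85, 90, 100}
print(max_visible_lsn(all_lsns, replayed))  # 90
```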
  • the foregoing describes a method for replaying the at least one transaction commit log by the first thread.
  • the following describes another method for replaying the at least one transaction commit log by the first thread.
  • that the data node replays the at least one transaction commit log by using a first thread of the plurality of threads includes: The data node replays the at least one transaction commit log in an LSN sequence on an initial version of a first page by using the first thread of the plurality of threads, to obtain a plurality of versions of the first page, and each of the plurality of versions of the first page is associated with at least one of the at least one transaction commit log.
  • the plurality of versions of the first page include a first version of the first page and a second version of the first page that is generated based on the first version of the first page, where the second version of the first page is obtained by replaying, on the first version of the first page in the LSN sequence, transaction commit logs associated with the first version of the first page.
  • a process in which the first thread replays the at least one transaction commit log to obtain the plurality of versions of the first page may be as follows.
  • the first thread in the data node sequentially associates a specific quantity of transaction commit logs with a version A (that is, the initial version) of the first page in an ascending order of respective LSNs of the at least one transaction commit log.
  • the data node generates a version B of the first page based on the version A of the first page and the transaction commit logs associated with the version A of the first page. In this case, two versions of the first page are obtained.
  • the first thread in the data node may continue to associate a specific quantity of transaction commit logs with the version B of the first page in an ascending order of respective LSNs of the at least one transaction commit log.
  • the data node generates a version C of the first page based on the version B of the first page and the transaction commit logs associated with the version B of the first page. In this case, three versions of the first page are obtained.
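The generation of versions A, B, and C can be sketched with a fixed number of transaction commit logs per version. The page model (a frozenset of committed transaction ids), the `(lsn, xid)` log tuples, and the helper names `build_versions` and `mark_committed` are hypothetical; `apply` returns a new page so that earlier versions remain intact.

```python
def build_versions(initial_page, commit_logs, logs_per_version, apply):
    """Replay (lsn, payload) logs in ascending LSN order, starting a new page
    version after every `logs_per_version` logs (fixed-count policy).

    Returns [(version_page, associated_logs), ...]; the final entry is the newest
    version, which has no associated logs yet. `apply(page, log)` must return a
    new page object so that earlier versions are left unchanged.
    """
    ordered = sorted(commit_logs, key=lambda log: log[0])
    versions = []
    page = initial_page
    for start in range(0, len(ordered), logs_per_version):
        batch = ordered[start:start + logs_per_version]
        versions.append((page, batch))  # associate this batch with the current version
        for log in batch:
            page = apply(page, log)
    versions.append((page, []))
    return versions


def mark_committed(page, log):
    """Hypothetical page model: the frozenset of committed transaction ids."""
    return page | {log[1]}


logs = [(8, 3), (2, 1), (5, 2), (18, 6), (12, 4), (15, 5)]
for page, batch in build_versions(frozenset(), logs, 3, mark_committed):
    print(sorted(page), [lsn for lsn, _ in batch])
# [] [2, 5, 8]
# [1, 2, 3] [12, 15, 18]
# [1, 2, 3, 4, 5, 6] []
```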
  • After the plurality of versions of the first page, including the initial version of the first page, are obtained, they are usually stored in a memory.
  • the plurality of versions of the first page are generally written into a data memory.
  • a plurality of methods may be used to write the plurality of versions of the first page into the data memory.
  • the data node may directly write the plurality of versions of the first page into a data file in the data memory. However, directly writing them into the data file carries a risk of damaging them, so the data node may instead write the plurality of versions of the first page into a double write file first, and then write them from the double write file into the data file. In this way, the plurality of versions of the first page are protected from damage.
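The double write sequence can be sketched in Python with ordinary file I/O: the page image is made durable in the double write file before the data file is overwritten. The 8 KiB page size, the 8-byte page-number header, the file names, and `write_page_via_double_write` are assumptions, and recovery from the double write file is not shown.

```python
import os
import tempfile

PAGE_SIZE = 8192  # hypothetical page size in bytes


def write_page_via_double_write(dw_path, data_path, page_no, page_bytes):
    """Persist one page version: first to a double write file, then in place.

    If a crash tears the in-place write, an intact copy of the page remains in
    the double write file and can be used to repair the data file on recovery
    (the recovery step and double-write-file recycling are not shown).
    """
    if len(page_bytes) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    # Step 1: append page number + page image to the double write file; sync it.
    with open(dw_path, "ab") as dw:
        dw.write(page_no.to_bytes(8, "little") + page_bytes)
        dw.flush()
        os.fsync(dw.fileno())
    # Step 2: only after that, overwrite the page's slot in the data file; sync it.
    mode = "r+b" if os.path.exists(data_path) else "w+b"
    with open(data_path, mode) as data:
        data.seek(page_no * PAGE_SIZE)
        data.write(page_bytes)
        data.flush()
        os.fsync(data.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        dw_file = os.path.join(workdir, "double_write")
        data_file = os.path.join(workdir, "data")
        write_page_via_double_write(dw_file, data_file, 2, b"v" * PAGE_SIZE)
        print(os.path.getsize(dw_file), os.path.getsize(data_file))  # 8200 24576
```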
  • a quantity of versions of the first page is not specifically limited in this embodiment of this application.
  • the plurality of versions of the first page may include two versions, or may include three or more versions.
  • a quantity of transaction commit logs associated with each of the plurality of versions of the first page is not specifically limited in this embodiment of this application either.
  • a quantity of transaction commit logs associated with each of the plurality of versions of the first page is the same.
  • one version of the first page is generated each time a specific quantity of transaction commit logs are replayed.
  • alternatively, one version of the first page is generated, starting from the initial version of the first page, each time transaction commit logs have been replayed for a fixed period of time.
  • replay time of transaction commit logs associated with each of the plurality of versions of the first page is the same.
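The two version-generation policies (a fixed quantity of logs, or a fixed replay time) can be expressed as a small decision helper called after each replayed log. `VersionCutPolicy` and its parameters are hypothetical names; the injectable clock exists only so that the time-based policy can be exercised deterministically.

```python
import time


class VersionCutPolicy:
    """Decide when replay should start a new page version (hypothetical helper).

    Exactly one policy is active: after `max_logs` replayed logs, or after
    `max_seconds` of replay since the current version was started.
    """

    def __init__(self, max_logs=None, max_seconds=None, clock=time.monotonic):
        if (max_logs is None) == (max_seconds is None):
            raise ValueError("set exactly one of max_logs or max_seconds")
        self.max_logs = max_logs
        self.max_seconds = max_seconds
        self._clock = clock
        self._count = 0
        self._started = clock()

    def after_replay(self):
        """Call once per replayed log; returns True when a new version is due."""
        self._count += 1
        if self.max_logs is not None:
            cut = self._count >= self.max_logs
        else:
            cut = self._clock() - self._started >= self.max_seconds
        if cut:  # the next version starts counting from here
            self._count = 0
            self._started = self._clock()
        return cut


policy = VersionCutPolicy(max_logs=3)
print([policy.after_replay() for _ in range(6)])
# [False, False, True, False, False, True]
```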
  • a transaction commit operation in each transaction commit log may be continuously performed on the initial page, to continuously update the initial page.
  • if the at least one transaction commit log is replayed in that way, only the latest updated version of the initial page is finally obtained.
  • with the versioned replay described above, by contrast, a plurality of versions of the first page are finally obtained, and the content of the first page after any transaction commit log is replayed can be viewed based on these versions.
  • the method further includes: the data node replays, in response to a first indication, on a reference version of the first page in an LSN sequence, transaction commit logs associated with the reference version of the first page until the first transaction commit log is replayed, to obtain a target version of the first page, where the first indication is used to query log replay progress of the at least one transaction commit log.
  • the reference version of the first page is a version that is associated with the first transaction commit log and that is in the plurality of versions of the first page, and the first transaction commit log is a log whose LSN is the largest and whose LSN is less than or equal to the maximum visible LSN in the at least one transaction commit log.
  • the method further includes determining the log replay progress of the at least one transaction commit log based on the target version of the first page.
  • the plurality of versions of the first page include a first version of the first page, a second version of the first page, and a third version of the first page, and each version of the first page is associated with three transaction commit logs.
  • LSNs of three transaction commit logs associated with the first version of the first page are respectively 2, 5, and 8.
  • LSNs of three transaction commit logs associated with the second version of the first page are respectively 12, 15, and 18.
  • LSNs of three transaction commit logs associated with the third version of the first page are respectively 22, 25, and 28.
  • a transaction commit log whose LSN is 18 is the first transaction commit log
  • the reference version of the first page is a version associated with the transaction commit log whose LSN is 18, that is, the second version of the first page.
  • the transaction commit logs whose LSNs are respectively 12, 15, and 18 are sequentially replayed on the second version of the first page, to finally obtain the target version of the first page.
  • the target version of the first page may indicate the log replay progress.
  • a transaction commit log whose LSN is 25 is the first transaction commit log
  • the reference version of the first page is a version associated with the transaction commit log whose LSN is 25, that is, the third version of the first page.
  • the transaction commit logs whose LSNs are 22 and 25 are sequentially replayed on the third version of the first page, to finally obtain the target version of the first page.
  • the target version of the first page may indicate the log replay progress.
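The reference-version lookup in this example can be sketched as follows. Each page version is paired with its associated logs; the lookup keeps the last version whose first log is visible and replays that version's logs up to the maximum visible LSN. The page model (the tuple of replayed LSNs) and the names `target_version` and `record` are hypothetical, chosen only to make the resulting progress easy to read.

```python
def target_version(versions, max_visible, apply):
    """Rebuild the page as of the newest log with LSN <= max_visible.

    versions: [(version_page, associated_logs), ...] in ascending LSN order,
    where associated_logs are (lsn, payload) tuples sorted by LSN.
    Returns None if no associated log is visible yet.
    """
    reference = None
    for page, logs in versions:
        if logs and logs[0][0] <= max_visible:
            reference = (page, logs)  # last version whose first log is visible
    if reference is None:
        return None
    page, logs = reference
    for log in logs:
        if log[0] > max_visible:
            break  # stop after the "first transaction commit log"
        page = apply(page, log)
    return page


def record(page, log):
    """Hypothetical page model: the tuple of LSNs replayed so far."""
    return page + (log[0],)


versions = [
    ((), [(2, None), (5, None), (8, None)]),
    ((2, 5, 8), [(12, None), (15, None), (18, None)]),
    ((2, 5, 8, 12, 15, 18), [(22, None), (25, None), (28, None)]),
]
print(target_version(versions, 18, record)[-1])  # 18
print(target_version(versions, 26, record)[-1])  # 25
```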
  • the foregoing describes a method for replaying the at least one page operation log by the second thread.
  • the following describes another method for replaying the at least one page operation log by the second thread.
  • that the data node replays the at least one page operation log by using at least one second thread of the plurality of threads includes: The data node replays, on an initial version of a second page in an LSN sequence by using one second thread of the plurality of threads, all page operation logs that are in the at least one page operation log and that include an operation on the second page, to obtain a plurality of versions of the second page, and each of the plurality of versions of the second page is associated with at least one of all the page operation logs including the operation on the second page.
  • the plurality of versions of the second page include a third version and a fourth version generated based on the third version, and the fourth version is obtained by replaying, on the third version in the LSN sequence, the page operation logs associated with the third version.
  • a process in which the second thread replays the at least one page operation log to obtain the plurality of versions of the second page may be as follows.
  • the second thread in the data node sequentially associates a specific quantity of page operation logs with a version D (that is, the initial version) of the second page in an ascending order of respective LSNs of the at least one page operation log.
  • the data node generates a version E of the second page based on the version D of the second page and the page operation logs associated with the version D of the second page. In this case, two versions of the second page are obtained.
  • the second thread in the data node may continue to associate a specific quantity of page operation logs with the version E of the second page in an ascending order of respective LSNs of the at least one page operation log.
  • the data node then generates a version F of the second page based on the version E of the second page and the page operation logs associated with the version E of the second page. In this case, three versions of the second page are obtained.
  • the plurality of versions of the second page are usually stored in a memory.
  • the plurality of versions of the second page are usually written into a data memory.
  • a plurality of methods may be used to write the plurality of versions of the second page into the data memory.
  • the data node may directly write the plurality of versions of the second page into a data file in the data memory. However, directly writing them into the data file carries a risk of damaging them, so the data node may instead write the plurality of versions of the second page into a double write file first, and then write them from the double write file into the data file. In this way, the plurality of versions of the second page are protected from damage.
  • a quantity of versions of the second page is not specifically limited in this embodiment of this application.
  • the plurality of versions of the second page may include two versions, or may include three or more versions.
  • a quantity of page operation logs associated with each of the plurality of versions of the second page is not specifically limited in this embodiment of this application either.
  • one version of the second page is generated, starting from the initial version of the second page, each time a specific quantity of page operation logs are replayed. Therefore, a quantity of page operation logs associated with each of the plurality of versions of the second page is the same.
  • alternatively, one version of the second page is generated, starting from the initial version of the second page, each time page operation logs have been replayed for a fixed period of time; that is, replay time of page operation logs associated with each of the plurality of versions of the second page is the same.
  • the method further includes: The data node replays, in response to a second indication, on a reference version of the second page in the LSN sequence, page operation logs associated with the reference version of the second page until a first page operation log is replayed, to obtain a target version of the second page, where the second indication is used to query log replay progress of the at least one page operation log.
  • the reference version of the second page is a version that is associated with the first page operation log and that is in the plurality of versions of the second page
  • the first page operation log is a log whose LSN is the largest and whose LSN is less than or equal to the maximum visible LSN in all the page operation logs including the operation on the second page.
  • the method further includes determining the log replay progress of the at least one page operation log based on the target version of the second page.
  • the plurality of versions of the second page include a first version of the second page, a second version of the second page, and a third version of the second page, and each version of the second page is associated with three page operation logs.
  • LSNs of three page operation logs associated with the first version of the second page are respectively 1, 3, and 4.
  • LSNs of three page operation logs associated with the second version of the second page are respectively 6, 7, and 9.
  • LSNs of three page operation logs associated with the third version of the second page are respectively 10, 12, and 14.
  • if the maximum visible LSN is 7 or 8, the page operation log whose LSN is 7 is the first page operation log.
  • the reference version of the second page is a version associated with the page operation log whose LSN is 7, that is, the second version of the second page.
  • Page operation logs whose LSNs are 6 and 7 are sequentially replayed on the second version of the second page, to finally obtain the target version of the second page.
  • the target version of the second page may indicate the log replay progress.
  • if the maximum visible LSN is 12 or 13, the page operation log whose LSN is 12 is the first page operation log.
  • the reference version of the second page is a version associated with the page operation log whose LSN is 12, that is, the third version of the second page.
  • Page operation logs whose LSNs are 10 and 12 are sequentially replayed on the third version of the second page, to finally obtain the target version of the second page.
  • the target version of the second page may indicate the log replay progress.
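The lookup in the two examples above can be sketched as follows, using the same LSNs (1, 3, 4 / 6, 7, 9 / 10, 12, 14). The page contents, the `(lsn, key, value)` log shape, and the function names are hypothetical. Each version stores the page image it starts from and the logs associated with it.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Log = Tuple[int, str, str]                  # hypothetical: (lsn, key, value)
Version = Tuple[Dict[str, str], List[Log]]  # (starting page image, associated logs)

# The three versions from the example; page contents are made up for illustration.
EXAMPLE_VERSIONS: List[Version] = [
    ({}, [(1, "a", "1"), (3, "b", "3"), (4, "a", "4")]),
    ({"a": "4", "b": "3"}, [(6, "c", "6"), (7, "a", "7"), (9, "b", "9")]),
    ({"a": "7", "b": "9", "c": "6"}, [(10, "d", "10"), (12, "a", "12"), (14, "c", "14")]),
]


def first_page_log_lsn(versions: List[Version], max_visible_lsn: int) -> int:
    """LSN of the first page operation log: the largest LSN <= max_visible_lsn."""
    lsns = sorted(lsn for _, logs in versions for lsn, _, _ in logs)
    i = bisect_right(lsns, max_visible_lsn)
    if i == 0:
        raise ValueError("no page operation log on this page is visible yet")
    return lsns[i - 1]


def target_version(versions: List[Version], max_visible_lsn: int) -> Dict[str, str]:
    first = first_page_log_lsn(versions, max_visible_lsn)
    for content, logs in versions:
        if any(lsn == first for lsn, _, _ in logs):
            # Reference version found: replay its logs in LSN order up to and including `first`.
            page = dict(content)
            for lsn, key, value in sorted(logs):
                if lsn > first:
                    break
                page[key] = value
            return page
    raise LookupError("first page operation log is not associated with any version")
```

With a maximum visible LSN of 8, `first_page_log_lsn` returns 7 and the sketch replays LSNs 6 and 7 on the second version, matching the first example. With 13, it replays LSNs 10 and 12 on the third version.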
  • in the log replay process, the data node generates a plurality of versions of the first page and a plurality of versions of the second page, and the log replay progress can be viewed based on these versions.
  • the content of the first page after any transaction commit log is replayed can be viewed, and the content of the second page after any page operation log is replayed can be viewed. Therefore, if the log replay method provided in this embodiment of this application is applied to a secondary server of a cluster database system, the secondary server can serve reads. The primary server then handles writes and the secondary server handles reads, which reduces the load that the primary server would otherwise bear by reading and writing data at the same time.
  • an application scenario of this example is that a primary server synchronizes a log to a secondary server through streaming replication, and then the secondary server replays the log.
  • the example includes the following steps.
  • Step 201 The primary server sends the log to the secondary server.
  • the primary server may send the logs it generates to the secondary server by using a walsender thread. The secondary server receives these logs by using a walreceiver thread and stores them in a buffer.
  • Step 202 The secondary server stores the received log in a data memory by using the walreceiver thread, and sends a message to the primary server, where the message indicates that the log is written into the data memory.
  • Step 203 A read thread in the data node reads a byte stream from the buffer.
  • the data node may also read the byte stream from the data memory.
  • Step 203 and step 202 are not performed in a fixed order.
  • step 202 and step 203 may be performed at the same time.
  • step 202 is performed first, and then step 203 is performed.
  • Step 204 A decode thread in the data node decodes the byte stream into a plurality of logs.
  • Step 205 A dispatcher thread in the data node sends the transaction commit logs among the plurality of logs to a trxn manager thread, and sends the operation logs among the plurality of logs to a parse redo record thread.
  • the read thread, the decode thread, and the dispatcher thread all belong to a log distribution module.
  • a process of decoding the byte stream is not shown in FIG. 4A.
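Steps 204 and 205 can be sketched as follows. The record framing (8-byte LSN, 1-byte kind, 4-byte payload length, payload), the kind codes, and the function names are hypothetical, since the log format is not specified here. The routing follows steps 206 and 211: operation logs go to the parse redo record thread, and transaction commit logs go to the trxn manager thread.

```python
import struct
from dataclasses import dataclass
from queue import Queue
from typing import Iterable, Iterator

# Hypothetical framing of one log record in the byte stream (little-endian).
HEADER = struct.Struct("<QBI")  # LSN, kind, payload length
COMMIT, OPERATION = 1, 2        # hypothetical kind codes


@dataclass(frozen=True)
class Log:
    lsn: int
    kind: int
    payload: bytes


def encode(logs: Iterable[Log]) -> bytes:
    return b"".join(HEADER.pack(log.lsn, log.kind, len(log.payload)) + log.payload for log in logs)


def decode(stream: bytes) -> Iterator[Log]:
    """Decode thread (step 204): split the byte stream into individual logs."""
    offset = 0
    while offset < len(stream):
        if offset + HEADER.size > len(stream):
            raise ValueError("truncated log header")
        lsn, kind, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError(f"truncated log record at LSN {lsn}")
        offset += length
        yield Log(lsn, kind, payload)


def dispatch(logs: Iterable[Log], trxn_manager_q: Queue, parse_redo_q: Queue) -> None:
    """Dispatcher thread (step 205): route each log by its kind."""
    for log in logs:
        if log.kind == COMMIT:
            trxn_manager_q.put(log)  # later replayed by the trxn worker thread
        elif log.kind == OPERATION:
            parse_redo_q.put(log)    # split into page operation logs in step 206
        else:
            raise ValueError(f"unknown log kind {log.kind} at LSN {log.lsn}")
```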
  • Step 206 The parse redo record thread in the data node parses the operation logs into page operation logs on a per-page basis, and then sends the page operation logs to a page redo manager thread.
  • Step 207 Because the operation logs are table-related, the page redo manager thread establishes a hash table by using a table and a page as a key-value pair.
  • a page operation log under a label 1 is an operation log corresponding to one page
  • a page operation log under a label 2 is an operation log corresponding to one page
  • a page operation log under a label 3 is an operation log corresponding to another page.
  • the parse redo record thread and the page redo manager thread belong to a parsing and classification module.
  • a process in which the parse redo record thread parses an operation log is not shown in FIG. 4A.
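The per-page parsing in step 206 can be sketched as follows. An operation log that touches several pages is split into one page operation log per page, and each page operation log keeps the LSN of the operation log it was parsed from, as the description of the obtaining unit 301 in this embodiment also states. The field names and the representation of a change as (page number, change) pairs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OperationLog:
    lsn: int
    table: str
    changes: Tuple[Tuple[int, str], ...]  # hypothetical: (page number, change description) pairs


@dataclass(frozen=True)
class PageOperationLog:
    lsn: int  # same LSN as the parent operation log
    table: str
    page_no: int
    changes: Tuple[str, ...]


def split_by_page(op: OperationLog) -> List[PageOperationLog]:
    """Parse one operation log into page operation logs, one per touched page."""
    per_page: Dict[int, List[str]] = {}
    for page_no, change in op.changes:
        per_page.setdefault(page_no, []).append(change)  # keeps first-seen page order
    return [
        PageOperationLog(op.lsn, op.table, page_no, tuple(changes))
        for page_no, changes in per_page.items()
    ]
```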
  • Step 208 The page redo manager thread adds the page operation logs to a queue of a page redo worker thread based on the hash table.
  • All page operation logs including an operation on the same page are located in the queue of the same page redo worker thread and are arranged in LSN order. It should be noted that, within the queue of a page redo worker thread, only the page operation logs on the same page need to be arranged in LSN order; page operation logs on different pages may be in any relative order.
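Steps 207 and 208 can be sketched as follows. The sketch keys each page by (table, page number) and hashes that key to pick a page redo worker queue, so every log for one page lands in one queue. It sorts all logs globally by LSN, which is stronger than the per-page ordering the text requires. The use of `zlib.crc32` (chosen because, unlike Python's built-in `hash` for strings, it is stable across processes), the tuple layout of a page operation log, and the worker count are assumptions.

```python
import zlib
from typing import Iterable, List, Tuple

PageLog = Tuple[int, str, int, str]  # hypothetical: (lsn, table, page_no, change)


def worker_for(table: str, page_no: int, n_workers: int) -> int:
    """Map a (table, page) key to a page redo worker index; stable across processes."""
    return zlib.crc32(f"{table}:{page_no}".encode()) % n_workers


def distribute(page_logs: Iterable[PageLog], n_workers: int) -> List[List[PageLog]]:
    """Build one queue per page redo worker; logs for the same page share a queue in LSN order."""
    queues: List[List[PageLog]] = [[] for _ in range(n_workers)]
    for log in sorted(page_logs, key=lambda entry: entry[0]):
        _, table, page_no, _ = log
        queues[worker_for(table, page_no, n_workers)].append(log)
    return queues
```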
  • Step 209 The page redo worker thread replays the page operation logs in its queue.
  • Step 210 The page redo worker thread generates a plurality of versions of each page while replaying the page operation logs.
  • Step 211 The trxn manager thread adds the transaction commit logs to a queue of the trxn worker thread.
  • the transaction commit logs are sequentially arranged in the queue of the trxn worker thread according to LSNs.
  • Step 212 The trxn worker thread replays the transaction commit logs in the queue.
  • the page redo worker thread and the trxn worker thread belong to a log replay module.
  • Step 213 The trxn worker thread generates a plurality of versions of a page while replaying the transaction commit logs.
  • Step 214 A data write thread writes the plurality of versions of the page into a double write file and a data file, where the data write thread belongs to a data write module.
  • the plurality of versions of a page can be read on the secondary server; that is, the secondary server can provide a read service for clients.
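The independence between the trxn worker thread and the page redo worker threads in the steps above can be sketched with Python threads. Everything here is a simplified, hypothetical model: a transaction commit log is `(lsn, xid)`, a page operation log is `(lsn, page_no, key, value)`, the first page is a dict of transaction states, and each worker replays into its own page dict because the hashing step gives each page to exactly one worker. Version generation is omitted. In CPython the global interpreter lock serializes Python bytecode, so the sketch shows the structure of the concurrency rather than a real speedup.

```python
import threading
from typing import Dict, List, Tuple

CommitLog = Tuple[int, int]          # hypothetical: (lsn, xid)
PageLog = Tuple[int, int, str, str]  # hypothetical: (lsn, page_no, key, value)


def replay_commit_logs(logs: List[CommitLog], commit_page: Dict[int, str]) -> None:
    # First thread: replays transaction commit logs in LSN order, touching no data pages.
    for _, xid in sorted(logs):
        commit_page[xid] = "committed"


def replay_page_logs(queue: List[PageLog], pages: Dict[int, Dict[str, str]]) -> None:
    # Second thread: its queue is already in LSN order for every page it owns.
    for _, page_no, key, value in queue:
        pages.setdefault(page_no, {})[key] = value


def parallel_replay(commit_logs: List[CommitLog], worker_queues: List[List[PageLog]]):
    commit_page: Dict[int, str] = {}
    worker_pages: List[Dict[int, Dict[str, str]]] = [{} for _ in worker_queues]
    threads = [threading.Thread(target=replay_commit_logs, args=(commit_logs, commit_page))]
    threads += [
        threading.Thread(target=replay_page_logs, args=(queue, pages))
        for queue, pages in zip(worker_queues, worker_pages)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged: Dict[int, Dict[str, str]] = {}
    for pages in worker_pages:
        merged.update(pages)  # page sets are disjoint across workers
    return commit_page, merged
```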
  • the logs may further include a checkpoint log, a tablespace creation log, a tablespace deletion log, a data space creation log, and a data space deletion log.
  • the checkpoint log includes an operation of periodically flushing data in the buffer to the data memory.
  • the tablespace creation log includes an operation of creating a new table, and correspondingly the tablespace deletion log includes an operation of deleting a table.
  • the data space creation log includes an operation of creating a database, and correspondingly the data space deletion log includes an operation of deleting a database.
  • the first thread may be used to replay the checkpoint log, that is, the first thread is used to replay the transaction commit log and the checkpoint log.
  • a process in which the first thread replays the checkpoint log may interact with a process in which the second thread replays the page operation log.
  • a process in which the first thread replays a transaction commit log is independent of a process in which the second thread replays a page operation log.
  • Another thread may be used to replay the tablespace creation log, the tablespace deletion log, the data space creation log, and the data space deletion log.
  • the page redo manager thread may be used to replay the tablespace creation log, the tablespace deletion log, the data space creation log, and the data space deletion log. It may be understood that operations in the tablespace creation log, the tablespace deletion log, the data space creation log, and the data space deletion log cause a page change. Therefore, a process of replaying the tablespace creation log, the tablespace deletion log, the data space creation log, and the data space deletion log may interact with a process in which the second thread replays the page operation log.
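The assignment of the other log types described above can be summarized as a routing table. The enum names, the string labels for the replay components, and the helper functions are hypothetical. The table records which component replays each log type and whether that replay may interact with page operation log replay by the second threads, per the text above. Routing tablespace and data space logs to the page redo manager thread reflects the example given above, not the only option.

```python
from enum import Enum, auto


class LogType(Enum):
    TRANSACTION_COMMIT = auto()
    CHECKPOINT = auto()
    TABLESPACE_CREATE = auto()
    TABLESPACE_DELETE = auto()
    DATA_SPACE_CREATE = auto()
    DATA_SPACE_DELETE = auto()


FIRST_THREAD = "first thread (trxn worker)"
PAGE_REDO_MANAGER = "page redo manager thread"

# log type -> (replaying component, may interact with page operation log replay)
ROUTES = {
    LogType.TRANSACTION_COMMIT: (FIRST_THREAD, False),  # independent of page replay
    LogType.CHECKPOINT: (FIRST_THREAD, True),
    LogType.TABLESPACE_CREATE: (PAGE_REDO_MANAGER, True),  # these operations change pages
    LogType.TABLESPACE_DELETE: (PAGE_REDO_MANAGER, True),
    LogType.DATA_SPACE_CREATE: (PAGE_REDO_MANAGER, True),
    LogType.DATA_SPACE_DELETE: (PAGE_REDO_MANAGER, True),
}


def replayer_for(log_type: LogType) -> str:
    return ROUTES[log_type][0]


def interacts_with_page_replay(log_type: LogType) -> bool:
    return ROUTES[log_type][1]
```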
  • the log replay method provided in embodiments of this application may be applied to a simple database with one primary server and a plurality of secondary servers.
  • when the primary server runs normally, the primary server synchronizes logs to the two secondary servers.
  • as shown in FIG. 6(b), after the primary server shuts down abnormally, one of the two secondary servers becomes the primary server, and logs are synchronized to the other secondary server.
  • the log replay method provided in embodiments of this application may be used to replay logs on both secondary servers. This can increase the log replay speed, avoid accumulating a large quantity of logs on the secondary servers, and shorten the time a secondary server needs to start working after it becomes the primary server.
  • the log replay method provided in embodiments of this application may be applied to a terminal cloud platform.
  • the terminal cloud platform includes production equipment rooms (AZs), coordinator nodes (CNs), transaction management nodes (GTMs), and data nodes (DNs). All machines in the same equipment room belong to the same AZ.
  • the terminal cloud platform includes two AZs. A plurality of CNs can be deployed in each equipment room as entry points for client connections. In FIG. 7, two CNs are disposed in each equipment room.
  • the terminal cloud platform includes six GTMs, where one GTM is an active GTM, and other GTMs are standby GTMs.
  • the terminal cloud platform includes 24 DNs, of which four are primary servers, and each primary server corresponds to five DNs used as secondary servers.
  • on each DN used as a secondary server, logs may be replayed by using the log replay method provided in embodiments of this application. This can increase the log replay speed, avoid accumulating a large quantity of logs on the secondary servers, and shorten the time a secondary server needs to start working after it becomes the primary server.
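The DN count in the terminal cloud platform deployment above is consistent with the stated primary/secondary split:

```latex
\underbrace{4}_{\text{primary DNs}} + \underbrace{4 \times 5}_{\text{secondary DNs}} = 4 + 20 = 24 \ \text{DNs}
```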
  • the log replay method provided in embodiments of this application may be applied to a public cloud platform.
  • the architecture of the public cloud platform is similar to that of the terminal cloud platform shown in FIG. 7.
  • the quantity of components in the public cloud platform shown in FIG. 8 may be determined based on the scale of the public cloud platform.
  • on a DN used as a secondary server, logs may be replayed by using the log replay method provided in embodiments of this application. This can increase the log replay speed, avoid accumulating a large quantity of logs on the secondary server, and shorten the time a secondary server needs to start working after it becomes the primary server.
  • FIG. 9 is a schematic diagram of a structure of a data node according to an embodiment of this application. As shown in FIG. 9, a plurality of threads run on the data node.
  • the data node includes an obtaining unit 301, a first replay unit 302, and a second replay unit 303.
  • a process in which the first thread replays the at least one transaction commit log is independent of a process in which the at least one second thread replays the at least one page operation log.
  • the obtaining unit 301 is configured to: obtain the at least one transaction commit log and at least one operation log from a buffer; and parse the at least one operation log to obtain at least one page operation log, where each of the at least one operation log includes an operation on one or more pages, the at least one page operation log includes a first page operation log, and the LSN of the first page operation log is the same as the LSN of the operation log from which the first page operation log is parsed.
  • the data node further includes a determining unit 304, configured to determine a maximum visible LSN, where the maximum visible LSN indicates log replay progress: all transaction commit logs in the at least one transaction commit log whose LSNs are less than or equal to the maximum visible LSN have been replayed, and all page operation logs in the at least one page operation log whose LSNs are less than or equal to the maximum visible LSN have been replayed.
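One way to compute a maximum visible LSN that satisfies the condition above is sketched below: scan the LSNs of all dispatched transaction commit logs and page operation logs in order, and stop at the first one that has not been replayed. The function name and the convention that 0 means nothing is visible yet are hypothetical, and the source does not say how the determining unit 304 tracks replayed logs. Returning the LSN of the last log in the fully replayed prefix is a conservative choice: every log up to and including that LSN is guaranteed to have been replayed.

```python
from typing import Iterable, Set


def max_visible_lsn(dispatched: Iterable[int], replayed: Set[int]) -> int:
    """Return the LSN of the last log in the longest fully replayed prefix.

    dispatched: LSNs of all transaction commit logs and page operation logs handed to
    replay threads. replayed: LSNs whose replay has finished; because page operation
    logs inherit the LSN of their parent operation log, an LSN should be added here
    only once every page operation log carrying it has been replayed.
    Returns 0 (hypothetical convention) when the lowest-LSN log is still pending.
    """
    visible = 0
    for lsn in sorted(set(dispatched)):
        if lsn not in replayed:
            break  # an earlier log is still pending, so nothing beyond it is visible
        visible = lsn
    return visible
```

For example, if LSN 10 has already been replayed by another thread but LSN 9 has not, the visible point stays at 7.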
  • the first replay unit 302 is configured to replay, by using the first thread of the plurality of threads, the at least one transaction commit log on an initial version of a first page in an LSN sequence, to obtain a plurality of versions of the first page, and each of the plurality of versions of the first page is associated with at least one of the at least one transaction commit log.
  • the plurality of versions of the first page include a first version of the first page and a second version of the first page that is generated based on the first version of the first page, where the second version of the first page is obtained by replaying, on the first version of the first page in the LSN sequence, transaction commit logs associated with the first version of the first page.
  • the data node further includes a writing unit 305, configured to: write the plurality of versions of the first page into a double write file; and write the plurality of versions of the first page in the double write file into a data file.
  • a quantity of transaction commit logs associated with each of the plurality of versions of the first page is the same, or replay time of transaction commit logs associated with each of the plurality of versions of the first page is the same.
  • the first replay unit 302 is further configured to replay, in response to a first indication, on a reference version of the first page in the LSN sequence, transaction commit logs associated with the reference version of the first page until a first transaction commit log is replayed, to obtain a target version of the first page, where the first indication is used to query log replay progress of the at least one transaction commit log.
  • the reference version of the first page is the version, among the plurality of versions of the first page, that is associated with the first transaction commit log, and the first transaction commit log is the log with the largest LSN that is less than or equal to the maximum visible LSN in the at least one transaction commit log.
  • the first replay unit 302 is further configured to determine the log replay progress of the at least one transaction commit log based on the target version of the first page.
  • the second replay unit 303 is configured to: replay, by using one second thread of the plurality of threads, on an initial version of a second page in an LSN sequence, all page operation logs that are in the at least one page operation log and that include an operation on the second page, to obtain a plurality of versions of the second page, and each of the plurality of versions of the second page is associated with at least one of all the page operation logs including the operation on the second page.
  • the plurality of versions of the second page include a first version of the second page and a second version of the second page that is generated based on the first version of the second page, where the second version of the second page is obtained by replaying, on the first version of the second page in the LSN sequence, page operation logs associated with the first version of the second page.
  • the data node further includes a writing unit 305, configured to: write the plurality of versions of the second page into a double write file; and write the plurality of versions of the second page in the double write file into a data file.
  • a quantity of page operation logs associated with each of the plurality of versions of the second page is the same, or replay time of page operation logs associated with each of the plurality of versions of the second page is the same.
  • the second replay unit 303 is further configured to: replay, in response to a second indication, on a reference version of the second page in the LSN sequence, page operation logs associated with the reference version of the second page until a first page operation log is replayed, to obtain a target version of the second page, where the second indication is used to query log replay progress of the at least one page operation log.
  • the reference version of the second page is the version, among the plurality of versions of the second page, that is associated with the first page operation log.
  • the first page operation log is the log with the largest LSN that is less than or equal to the maximum visible LSN among all the page operation logs including the operation on the second page.
  • the second replay unit 303 is further configured to determine the log replay progress of the at least one page operation log based on the target version of the second page.
  • An embodiment of a data node in embodiments of this application may include one or more processors 401, a memory 402, and a communication interface 403.
  • the memory 402 may be transitory storage or persistent storage. Further, the processor 401 may be configured to communicate with the memory 402 and execute, on the data node, a series of instruction operations stored in the memory 402.
  • the processor 401 may perform the operations performed by the data node in the embodiment shown in FIG. 9. Details are not described herein again.
  • specific function module division in the processor 401 may be similar to the function module division manner described in FIG. 9. Details are not described herein again.
  • An embodiment of this application further provides a chip or a chip system.
  • the chip or the chip system includes at least one processor and a communication interface.
  • the communication interface is interconnected to the at least one processor by using a line.
  • the at least one processor is configured to run a computer program or instructions, to perform operations performed by the data node in the embodiment shown in FIG. 9. Details are not described herein again.
  • the communication interface in the chip may be an input/output interface, a pin, a circuit, or the like.
  • An embodiment of this application further provides a first implementation of a chip or a chip system.
  • the chip or the chip system described above in this application further includes at least one memory, and the at least one memory stores instructions.
  • the memory may be a storage unit inside the chip, for example, a register or a cache, or may be a storage unit located outside the chip (for example, a read-only memory or a random access memory).
  • An embodiment of this application further provides a computer storage medium.
  • the computer storage medium is configured to store computer software instructions used by the foregoing control device, and the computer software instructions include a program designed for a data node.
  • the data node may be the data node described in FIG. 9.
  • An embodiment of this application further provides a computer program product.
  • the computer program product includes computer software instructions, and the computer software instructions may be loaded by a processor to implement a procedure in the method provided in any one of FIG. 2, FIG. 5A, and FIG. 5B.
  • An embodiment of this application further provides a database system, including a primary server 501 and at least one secondary server 502.
  • the primary server 501 is configured to send a log to the at least one secondary server 502.
  • Each of the at least one secondary server 502 is configured to replay the log according to the method provided in any one of FIG. 2, FIG. 5A, and FIG. 5B.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the described apparatus embodiment is merely an example.
  • division into the units is merely logical function division and may be other division in actual implementation.
  • a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented by using some interfaces.
  • the indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
  • the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one location, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of embodiments.
  • each of the units may exist alone physically, or two or more units may be integrated into one unit.
  • the integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software functional unit.
  • when the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods described in embodiments of this application.
  • the storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP21831988.7A 2020-06-30 2021-06-30 Method for replaying a log on a data node, data node, and system Pending EP4170509A4 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010615640.9A CN113868028A (zh) 2020-06-30 2020-06-30 Method for replaying a log on a data node, data node, and system
PCT/CN2021/103376 WO2022002103A1 (zh) 2020-06-30 2021-06-30 Method for replaying a log on a data node, data node, and system

Publications (2)

Publication Number Publication Date
EP4170509A1 EP4170509A1 (de) 2023-04-26
EP4170509A4 EP4170509A4 (de) 2023-12-06

Family

ID=78981473

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21831988.7A Pending EP4170509A4 (de) 2020-06-30 2021-06-30 Method for replaying a log on a data node, data node, and system

Country Status (4)

Country Link
US (1) US20230137119A1 (de)
EP (1) EP4170509A4 (de)
CN (1) CN113868028A (de)
WO (1) WO2022002103A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868028A (zh) * 2020-06-30 2021-12-31 华为技术有限公司 Method for replaying a log on a data node, data node, and system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
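CN114510494B (zh) * 2022-04-18 2022-08-02 成方金融科技有限公司 Log replay method and apparatus, and storage medium
CN115905270B (zh) * 2023-01-06 2023-06-09 金篆信科有限责任公司 Method and apparatus for determining a primary data node in a database, and storage medium
CN117194566B (zh) * 2023-08-21 2024-04-19 泽拓科技(深圳)有限责任公司 Multi-storage-engine data replication method, system, and computer device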

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
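CN101201768A (zh) * 2006-12-11 2008-06-18 北京北大方正电子有限公司 Data saving method and module, and data recovery method and module
CN103729442B (zh) * 2013-12-30 2017-11-24 华为技术有限公司 Method for recording a transaction log, and database engine
CN106855822A (zh) * 2015-12-08 2017-06-16 阿里巴巴集团控股有限公司 Method and device for distributed transaction processing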
US10387275B2 (en) * 2016-07-26 2019-08-20 Hewlett Packard Enterprise Development Lp Resume host access based on transaction logs
US20180144015A1 (en) * 2016-11-18 2018-05-24 Microsoft Technology Licensing, Llc Redoing transaction log records in parallel
CN106776136B (zh) * 2016-12-12 2019-10-22 网易(杭州)网络有限公司 Database processing method and apparatus
CN106919679B (zh) * 2017-02-27 2019-12-13 北京小米移动软件有限公司 Log replay method and apparatus applied to a distributed file system, and terminal
US11573947B2 (en) * 2017-05-08 2023-02-07 Sap Se Adaptive query routing in a replicated database environment
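CN110019063B (zh) * 2017-08-15 2022-07-05 厦门雅迅网络股份有限公司 Method for disaster-recovery replay of computing node data, terminal device, and storage medium
CN110045912B (zh) * 2018-01-16 2021-06-01 华为技术有限公司 Data processing method and apparatus
CN110442560B (zh) * 2019-08-14 2022-03-08 上海达梦数据库有限公司 Log replay method and apparatus, server, and storage medium
CN113868028A (zh) * 2020-06-30 2021-12-31 华为技术有限公司 Method for replaying a log on a data node, data node, and system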

Also Published As

Publication number Publication date
WO2022002103A1 (zh) 2022-01-06
EP4170509A4 (de) 2023-12-06
US20230137119A1 (en) 2023-05-04
CN113868028A (zh) 2021-12-31

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20230120

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Free format text: PREVIOUS MAIN CLASS: G06F0016180000

Ipc: G06F0016270000

A4 Supplementary search report drawn up and despatched

Effective date: 20231108

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 16/23 20190101ALI20231102BHEP

Ipc: G06F 16/27 20190101AFI20231102BHEP