CN115658391A - Backup recovery method of WAL mechanism based on QianBase MPP database - Google Patents

Publication number
CN115658391A
Authority
CN
China
Prior art keywords
wal
backup
file
qianbase
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211507825.3A
Other languages
Chinese (zh)
Inventor
黄江伟
杨永芳
李建衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Esgyn Information Technology Co Ltd
Original Assignee
Guizhou Esgyn Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Esgyn Information Technology Co Ltd filed Critical Guizhou Esgyn Information Technology Co Ltd
Priority to CN202211507825.3A priority Critical patent/CN115658391A/en
Publication of CN115658391A publication Critical patent/CN115658391A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a backup recovery method of a WAL mechanism based on a QianBase MPP database. The transaction data of each transaction successfully executed in the QianBase MPP database is written into a WAL file, and each node stores its own WAL file together with WAL files of other nodes; the transaction data of each successfully executed transaction has a timestamp. During recovery, the current WAL file of the node is first backed up to obtain a second WAL backup file; the metadata of a first WAL backup file is then read, and the WAL file is copied to write back data until the backup data is recovered. The WAL mechanism ensures the consistency and accuracy of the backup and recovery system: because each node in the QianBase MPP database holds its own complete WAL file as well as WAL files of other nodes, the normal operation of the database and the integrity of the data can still be ensured when a node fails.

Description

Backup recovery method of WAL mechanism based on QianBase MPP database
Technical Field
The invention relates to a data backup and recovery method, in particular to a backup and recovery method of a WAL mechanism based on a QianBase MPP database.
Background
Database data security has always been a major concern for vendors and customers. A running database faces many threats that put its data at risk, and backup and recovery technology guards against them by periodically storing the database data offline, so that data is not lost when the database or the computer is damaged. Current mainstream backup and recovery schemes are mainly those of MySQL, Oracle, GaussDB, SQL Server and the like. Most of them either back up the data directly to generate SQL files, or apply different backup modes, such as full backup and differential backup, to the logs and files in the database. However, in a distributed database scenario such as QianBase MPP, these schemes cannot guarantee service continuity in a multi-machine environment, do not address failures of backup or recovery in a distributed database, and do not support recovery to an arbitrary point in time. It is therefore necessary to study and improve data backup and recovery methods for distributed databases.
Disclosure of Invention
One of the objectives of the present invention is to provide a backup recovery method for a WAL mechanism based on a QianBase MPP database, so as to solve the technical problems that the existing data backup recovery scheme cannot ensure the continuity of services in a multi-computer environment, and the normal use of the database is affected after the database backup or recovery fails.
In order to solve the above technical problems, the invention adopts the following technical scheme:
The invention provides a backup recovery method of a WAL mechanism based on a QianBase MPP database, which comprises the following steps.
Step A, writing the transaction data of each transaction successfully executed in the QianBase MPP database into a WAL file, wherein each node in the QianBase MPP database stores its own WAL file and the WAL files of other nodes; the transaction data of each successfully executed transaction has a timestamp.
Step B, reading the metadata of a first WAL backup file, and copying the WAL file to write back data until the backup data is recovered; before the metadata of the first WAL backup file is read, the current WAL file of the node in the QianBase MPP database is first backed up to obtain a second WAL backup file.
Preferably, a further technical scheme is as follows: when the data recovery from the first WAL backup file fails, the metadata of the second WAL backup file is read, and the WAL file is copied to write back the data until the backup data is recovered.
A further technical scheme is as follows: after the backup is started, the WAL file of the current node is closed, the WAL file is transferred to a backup directory, and the metadata of the WAL file is recorded.
A further technical scheme is as follows: in the method, the WAL of the same node in the QianBase MPP database is stored on at least two nodes and is updated synchronously.
The present invention also provides a computer-readable storage medium having stored thereon instructions which, when executed by a computer, cause the computer to perform the above-described method.
Compared with the prior art, the invention has the following beneficial effects: the WAL mechanism ensures the consistency and accuracy of the backup and recovery system; each node in the QianBase MPP database holds its own complete WAL file as well as WAL files of other nodes, so that the normal operation of the database and the integrity of the data can still be ensured when a node fails.
Drawings
FIG. 1 is a schematic block diagram illustrating the structure of the QianBase MPP database in one embodiment of the present invention.
FIG. 2 is a logic flow diagram illustrating one embodiment of the present invention.
Detailed Description
The QianBase MPP mentioned in the present invention is a relational distributed database for data warehouse applications; its structure is shown in fig. 1. It has outstanding advantages in data storage, high concurrency, high availability, linear scalability, response speed, ease of use, cost performance, and the like. The database architecture comprises three layers: a client service layer, an SQL database service layer and a storage engine layer.
The first layer is the client service layer, where the application resides. The application may be written by the user or implemented through a third-party ISV tool or solution. The QianBase database service layer can be accessed through a standard ODBC/JDBC interface using a Windows or Linux client driver provided by QianBase. QianBase supports type2JDBC, type4JDBC and ado. The appropriate driver type can be selected according to the particular requirements (response time, number of connections, security requirements, and other factors).
The second layer is the SQL database service layer. This layer includes all the QianBase services, encapsulating the services that manage QianBase objects and efficiently execute SQL database requests. These services include connection management, SQL statement compilation and creation of optimal execution plans, SQL execution (serial and parallel), transaction management and workload management.
The third layer is the storage engine layer, which includes the standard Hadoop services used by QianBase (HDFS and Zookeeper). QianBase objects are stored in native Hadoop database structures, including HBase, cache text files and key-value sequence files. QianBase handles SQL requests coming from applications and transparently translates them into the native interface calls required by the underlying data format.
QianBase provides a relational schema abstraction over HBase, so it can support traditional relational database objects (tables, views, secondary indexes) using the familiar DDL/DML syntax (object naming, column definition, and data type support). In addition, QianBase supports native HBase and Hive tables as external tables of QianBase.
The WAL (write-ahead log) is a log used by the RegionServer of HBase to record the operations performed while processing data insertions and deletions; when a node goes down, data can be recovered from this log.
The invention is further elucidated below with reference to the drawings.
Referring to fig. 1 and 2, one embodiment of the present invention is a backup recovery method for WAL mechanism based on QianBase MPP database, which includes the following steps.
S1, writing the transaction data of each successfully executed transaction into a WAL file in the QianBase MPP database. Importantly, each node in the QianBase MPP database stores its own WAL file as well as WAL files of other nodes; that is, the WAL file of any given node is stored on at least two nodes and updated synchronously. In addition, the transaction data of each successfully executed transaction has a timestamp, through which the data of the current node can be restored to any previous point in time.
S2, reading the metadata of the first WAL backup file, and copying the WAL file to write back data until the backup data is recovered. Note that before the metadata of the first WAL backup file is read in this step, the current WAL file of the node in the QianBase MPP database is first backed up to obtain a second WAL backup file, so that the database can be rolled back to its state before the recovery action if the recovery fails.
When data recovery from the first WAL backup file fails, the metadata of the second WAL backup file is read, and the WAL file is then copied to write back the data until the backup data is recovered. As for the backup used in the above steps: after the backup is started, the WAL file of the current node is closed, the WAL file is then transferred to the backup directory, and the metadata of the WAL file is recorded.
Building on the above description of the method, in more detail. Storing the WAL: in the distributed database QianBase MPP, as shown in fig. 1, each transaction is first written into the WAL, which ensures that transactions do not become inconsistent when the database fails or the computer loses power. In the distributed environment, each node stores its own WAL and cross-stores part of the WALs of other nodes, so that a complete WAL is still available when some nodes are down.
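The cross-storage idea can be illustrated with a small sketch. The ring placement used here, in which each node's WAL is also kept on the next node(s) in the list, is an assumption made for illustration only; the text does not say how QianBase chooses the holding nodes, and the names `wal_placement` and `survives` are hypothetical.

```python
def wal_placement(nodes: list[str], copies: int = 2) -> dict[str, list[str]]:
    """Map each node to the nodes holding its WAL: itself plus its ring successors.

    Ring placement is an illustrative assumption, not QianBase's documented policy.
    """
    if not 2 <= copies <= len(nodes):
        raise ValueError("need at least two copies and no more copies than nodes")
    count = len(nodes)
    return {
        owner: [nodes[(i + k) % count] for k in range(copies)]
        for i, owner in enumerate(nodes)
    }


def survives(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """Return True if every node's WAL still has a holder that has not failed."""
    return all(
        any(holder not in failed for holder in holders)
        for holders in placement.values()
    )


placement = wal_placement(["n1", "n2", "n3"])
print(placement)
print(survives(placement, {"n2"}))        # a single node is down
print(survives(placement, {"n1", "n2"}))  # a node and its only replica are down
```

With two copies per WAL, any single node failure leaves every WAL readable, while losing a node together with its only replica does not; raising `copies` trades storage for tolerance of more simultaneous failures.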
Each piece of transaction data carries an accurate timestamp: each transaction is assigned an accurate timestamp when it succeeds, which ensures that the recovered data does not differ in quantity from the original data, a difference that would otherwise leave the final data inconsistent.
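A minimal sketch of stamping a transaction at commit time and appending it to every copy of the owner's WAL might look as follows. The `Node` class, the `commit` function and the JSON record layout are hypothetical stand-ins, and the use of a local wall clock is an assumption; the text does not describe how QianBase generates or synchronizes timestamps across nodes.

```python
import json
import time
from typing import Optional


class Node:
    """Hypothetical in-memory stand-in for a database node."""

    def __init__(self, name: str) -> None:
        self.name = name
        # WAL records held on this node, keyed by the name of the owning node.
        self.wal_store: dict = {}

    def append(self, owner: str, line: str) -> None:
        self.wal_store.setdefault(owner, []).append(line)


def commit(owner: Node, replicas: list, txn_id: int, payload: dict,
           ts_ms: Optional[int] = None) -> dict:
    """Stamp a successfully executed transaction and append it to every WAL copy."""
    if not replicas:
        raise ValueError("the WAL must be kept on at least two nodes")
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000  # local wall clock, in milliseconds
    entry = {"txn_id": txn_id, "ts_ms": ts_ms, "payload": payload}
    line = json.dumps(entry, sort_keys=True)
    for node in [owner, *replicas]:
        node.append(owner.name, line)
    return entry


n1, n2 = Node("n1"), Node("n2")
commit(n1, [n2], txn_id=1, payload={"key": "a", "value": 1}, ts_ms=1000)
print(n2.wal_store["n1"])
```

Because the same serialized line is appended to the owner and to each replica, the copies stay identical record for record, which is what later allows any surviving copy to be used for recovery.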
The size of the WAL file is adjustable: the WAL file size can be tuned to the actual situation. Otherwise, an oversized single file degrades the performance of data backup and transfer and, in severe cases, affects the operating efficiency of the database.
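One common way to keep individual WAL files small is to roll over to a new file once a size cap would be exceeded. The sketch below shows that idea under stated assumptions: the class name `RollingWal`, the `max_bytes` parameter and the numbered file names are hypothetical and are not taken from QianBase or HBase configuration.

```python
import tempfile
from pathlib import Path


class RollingWal:
    """Append-only log that starts a new file once a size cap would be exceeded."""

    def __init__(self, directory: Path, max_bytes: int) -> None:
        self.directory = directory
        self.max_bytes = max_bytes
        self.seq = 0
        self.directory.mkdir(parents=True, exist_ok=True)
        self._open()

    def _open(self) -> None:
        self.path = self.directory / f"{self.seq:06d}.wal"
        self.fh = open(self.path, "ab")
        self.size = self.path.stat().st_size

    def append(self, record: bytes) -> None:
        line = record + b"\n"
        # Roll before writing so that no file grows past the cap
        # (a single record larger than the cap still gets a file of its own).
        if self.size > 0 and self.size + len(line) > self.max_bytes:
            self.fh.close()
            self.seq += 1
            self._open()
        self.fh.write(line)
        self.fh.flush()
        self.size += len(line)

    def close(self) -> None:
        self.fh.close()


wal = RollingWal(Path(tempfile.mkdtemp()), max_bytes=10)
for record in (b"aaaa", b"bbbb", b"cccc"):
    wal.append(record)
wal.close()
print(sorted(p.name for p in wal.directory.glob("*.wal")))
```

Smaller files mean that a backup only ever has to close and move a short active segment, at the cost of more files to track in the backup metadata.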
Transferring the WAL file during backup: when a backup is performed, as shown in fig. 2, only the WAL file that is currently open needs to be closed and its information recorded; the closed WAL files are then transferred directly to the backup directory, which has no impact on the services of the database. If there is concern that copying the data will consume too much CPU and I/O on the physical machine, an upper limit on the resources used by the backup can also be set, so that the backup proceeds as unobtrusively as possible.
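The backup step described above (close the open WAL, record its information, move the closed files to the backup directory, optionally under a resource cap) can be sketched as below. Everything named here is hypothetical: the function `backup_wals`, the `metadata.json` file and its fields, and the `max_bytes_per_sec` parameter. The sleep-based throttle is a deliberately crude stand-in for whatever resource limit a real system would apply.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def backup_wals(active_fh, wal_dir: Path, backup_dir: Path,
                max_bytes_per_sec: Optional[float] = None) -> dict:
    """Close the open WAL, move all WAL files to backup_dir and record metadata."""
    active_fh.close()  # only the currently open WAL needs closing
    backup_dir.mkdir(parents=True, exist_ok=True)
    files = []
    for src in sorted(wal_dir.glob("*.wal")):
        size = src.stat().st_size
        shutil.move(str(src), str(backup_dir / src.name))
        files.append({"name": src.name, "size": size})
        if max_bytes_per_sec:
            # Crude throttle: pause in proportion to the bytes just moved.
            time.sleep(size / max_bytes_per_sec)
    meta = {"created_ms": time.time_ns() // 1_000_000, "files": files}
    (backup_dir / "metadata.json").write_text(json.dumps(meta))
    return meta


root = Path(tempfile.mkdtemp())
wal_dir = root / "wal"
wal_dir.mkdir()
(wal_dir / "000000.wal").write_bytes(b"closed segment\n")
active = open(wal_dir / "000001.wal", "ab")
active.write(b"open segment\n")
meta = backup_wals(active, wal_dir, root / "backup")
print([item["name"] for item in meta["files"]])
```

Moving a closed file within one file system is a rename rather than a copy, which is one reason this style of backup can leave running services largely undisturbed; the throttle only matters when the backup directory is on a different device.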
WAL files are not deleted directly during recovery: when the database needs to be restored, it may still be running, and each node still holds a large number of WAL files. As shown in fig. 2, the backed-up WAL files need to be copied back to the node during restoration, and the WAL files already on the node must be backed up first, which ensures that the database can be rolled back to its state before the restoration if the restoration fails.
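The restore-with-rollback flow can be sketched as follows. This is a minimal illustration under several assumptions: the helper names (`write_backup`, `copy_back`, `restore`) and the `metadata.json` layout are hypothetical, a restore is assumed to replace the node's current WAL files, and a failure is modelled simply as an exception raised while reading metadata or copying files.

```python
import json
import shutil
import tempfile
from pathlib import Path


def write_backup(wal_dir: Path, target: Path) -> None:
    """Copy the node's current WAL files into target and record their metadata."""
    target.mkdir(parents=True, exist_ok=True)
    files = []
    for src in sorted(wal_dir.glob("*.wal")):
        shutil.copy2(src, target / src.name)
        files.append({"name": src.name, "size": src.stat().st_size})
    (target / "metadata.json").write_text(json.dumps({"files": files}))


def copy_back(backup_dir: Path, wal_dir: Path) -> None:
    """Replace the WAL files in wal_dir with those listed in the backup metadata."""
    meta = json.loads((backup_dir / "metadata.json").read_text())
    for old in list(wal_dir.glob("*.wal")):
        old.unlink()
    for item in meta["files"]:
        shutil.copy2(backup_dir / item["name"], wal_dir / item["name"])


def restore(first_backup: Path, wal_dir: Path, second_backup: Path) -> str:
    """Restore from first_backup; fall back to the pre-restore copy on failure."""
    write_backup(wal_dir, second_backup)  # the "second WAL backup file"
    try:
        copy_back(first_backup, wal_dir)
        return "restored"
    except (OSError, KeyError, ValueError):
        copy_back(second_backup, wal_dir)
        return "rolled_back"


root = Path(tempfile.mkdtemp())
wal_dir = root / "wal"
wal_dir.mkdir()
(wal_dir / "current.wal").write_text("current state")
broken = root / "broken_backup"
broken.mkdir()
# The metadata lists a file that is missing, so copying back will fail.
(broken / "metadata.json").write_text(json.dumps({"files": [{"name": "lost.wal"}]}))
print(restore(broken, wal_dir, root / "pre_restore"))
print((wal_dir / "current.wal").read_text())
```

Taking the second backup before touching the node's files is what makes the failure path safe: whatever state the failed copy leaves behind, the pre-restore files can be put back.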
Because current data backup and recovery schemes copy the data itself or copy data blocks, it is difficult for them to ensure the integrity and consistency of the data in the distributed database QianBase MPP. The method provided by the present invention effectively solves this problem and has the following characteristics:
First, data consistency can be guaranteed in a distributed environment: in the distributed database QianBase MPP, every piece of business data is written in full into the WAL and marked with a time point and with whether it succeeded or failed. When a backup is initiated, the WAL files on disk are simply closed and then copied to the specified backup directory. The backed-up data may therefore be more than what the backup strictly requires; during recovery, the data is filtered by time point, which ensures data consistency, and running services are not affected during the backup.
Second, handling of backup or recovery failures: when a backup or recovery operation is executed, its success cannot be guaranteed because unexpected conditions may occur, but a failed backup or recovery must still have no negative effect on the operations of the running database. Because the backup operates only on WAL files, a failed backup does not affect running business operations. Before a recovery is executed, a backup is performed first, and the recovery starts only after this backup succeeds; if the recovery fails, this backup is restored, which ensures the consistency of the business data.
Third, point-in-time (PIT) recovery of the distributed database: with backup and recovery based on the WAL mechanism, each WAL file records the timestamp of every piece of business data. Recovering to an arbitrary point in time amounts to filtering the WAL files for the data that falls within the required time period, which ensures both recovery to any point in time and the consistency of the recovered data.
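The timestamp filtering behind point-in-time recovery can be shown with a short sketch. The record layout (`txn_id`, `ts_ms`, `op`, `key`, `value`) and the function name `replay_until` are hypothetical and chosen only for illustration; the actual WAL format is not described in the text.

```python
import json


def replay_until(wal_lines, target_ms: int) -> dict:
    """Rebuild key/value state from WAL records stamped at or before target_ms."""
    entries = [json.loads(line) for line in wal_lines]
    entries.sort(key=lambda e: (e["ts_ms"], e["txn_id"]))
    state = {}
    for entry in entries:
        if entry["ts_ms"] > target_ms:
            break  # everything after the target point in time is filtered out
        if entry["op"] == "put":
            state[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            state.pop(entry["key"], None)
    return state


wal = [
    json.dumps({"txn_id": 1, "ts_ms": 100, "op": "put", "key": "a", "value": 1}),
    json.dumps({"txn_id": 2, "ts_ms": 200, "op": "put", "key": "a", "value": 2}),
    json.dumps({"txn_id": 3, "ts_ms": 300, "op": "delete", "key": "a"}),
]
print(replay_until(wal, 250))
```

Replaying to 250 ms keeps the second write and ignores the later delete, so the backup may safely contain more records than the chosen recovery point needs.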
Based on the general form of the computer software product, a further embodiment of the present invention provides a computer-readable storage medium, in which instructions are stored, and when the instructions are executed by a computer, the computer is enabled to execute the backup and restore method based on the WAL mechanism of the QianBase MPP database in the foregoing embodiment.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a register, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, any suitable combination of the foregoing, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). In embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In addition to the foregoing, it should be noted that reference throughout this specification to "one embodiment," "another embodiment," "an embodiment," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment described generally throughout this application. The appearances of the same phrase in various places in the specification are not necessarily all referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with any embodiment, it is submitted that it is within the scope of the invention to effect such feature, structure, or characteristic in connection with other embodiments.
Although the invention has been described herein with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More specifically, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.

Claims (5)

1. A backup recovery method of a WAL mechanism based on a QianBase MPP database is characterized by comprising the following steps:
writing transaction data of each transaction successfully executed in a QianBase MPP database into a WAL file, wherein each node in the QianBase MPP database stores the WAL file of the node and the WAL files of other nodes; each transaction data of the successfully executed transaction has a time stamp;
reading metadata of a first WAL backup file, and copying the WAL file to write back data until the backup data is recovered; before reading the metadata of the WAL backup file, firstly backing up the current WAL file of the node in the QianBase MPP database to obtain a second WAL backup file.
2. The QianBase MPP database-based backup recovery method for the WAL mechanism according to claim 1, characterized in that: when the data recovery of the first WAL backup file fails, the metadata of the second WAL backup file is read, and the WAL file is copied to write back the data until the backup data is recovered.
3. The QianBase MPP database-based backup recovery method for WAL mechanism according to claim 1 or 2, characterized in that: after starting backup, closing the WAL file of the current node, transferring the WAL file to a backup directory, and recording the metadata of the WAL file.
4. The QianBase MPP database-based backup recovery method for the WAL mechanism according to claim 1, characterized in that: in the method, the WAL of the same node in the QianBase MPP database is stored in at least two nodes and is synchronously updated.
5. A computer-readable storage medium, characterized in that: the computer-readable storage medium has stored therein instructions that, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 4.
CN202211507825.3A 2022-11-29 2022-11-29 Backup recovery method of WAL mechanism based on QianBase MPP database Pending CN115658391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211507825.3A CN115658391A (en) 2022-11-29 2022-11-29 Backup recovery method of WAL mechanism based on QianBase MPP database

Publications (1)

Publication Number Publication Date
CN115658391A true CN115658391A (en) 2023-01-31

Family

ID=85020142

Country Status (1)

Country Link
CN (1) CN115658391A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858252A (en) * 2023-02-21 2023-03-28 浙江智臾科技有限公司 Data recovery method, device and storage medium
CN115858252B (en) * 2023-02-21 2023-06-02 浙江智臾科技有限公司 Data recovery method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination