CN118069430A - Automatic recovery method for distributed database data

Info

Publication number: CN118069430A
Application number: CN202410458526.8A
Authority: CN
Inventors: 阳远健
Original assignee: Tianjin Nankai University General Data Technologies Co ltd
Current assignee: Tianjin Nankai University General Data Technologies Co ltd
Priority date: 2024-04-17
Filing date: 2024-04-17
Publication date: 2024-05-24

Abstract

The invention provides an automatic recovery method for distributed database data. The database system continuously monitors the state of the node processes; when data is damaged because of a node fault, the database automatically recovers the data by means of the automatic data recovery method, which comprises triggering the automatic recovery service, checking the table operation, table structure recovery, table data recovery, and combined table structure and data recovery. The beneficial effects of the invention are that data recovery is imperceptible to the user, data loss caused by node process downtime is repaired automatically, and high availability of the system is ensured.

Description

Automatic recovery method for distributed database data

Technical Field

The invention belongs to the technical field of databases, and particularly relates to an automatic recovery method for distributed database data.

Background

In distributed database systems, data loss is a common problem that may arise from data write failures caused by node process failures, network failures, or other causes. To address this problem, existing methods mainly include recording redo logs of DDL and DML operations, serializing DDL and DML operations, and data backup and restore mechanisms.

First, a distributed database system typically records the redo logs of DDL and DML operations. When a node process failure occurs, the system can recover the data by replaying the operations in the redo log, so as to maintain the consistency and integrity of the data. In addition, for DDL operations, the system may record a redo log of the DDL statements and re-execute those statements after a node failure to synchronize changes in the table structure. Second, some systems record and recover data by serializing DDL and DML operations. By serializing the operations and recording them in a log file, the system can perform data recovery by parsing the log file after a node failure, ensuring the consistency and integrity of the data. In addition, some distributed database systems implement data backup and restore mechanisms to further increase the reliability of the system. By periodically backing up data and storing the backup on a backup node, the system can recover data from the backup node when a node failure occurs, thereby ensuring the integrity of the data.

However, these existing methods have limitations. For example, a recovery mechanism based on redo logs may suffer from oversized logs and long recovery times; serializing DDL and DML operations can affect system performance, and log parsing can become a performance bottleneck. In addition, data recovery based on backup and restore mechanisms may face data synchronization delays and high cost.

Disclosure of Invention

In view of the above, the present invention aims to provide an automatic recovery method for distributed database data, so as to achieve data recovery that is imperceptible to the user, automatically repair data loss caused by node process downtime, and ensure high availability of the system.

In order to achieve the above purpose, the technical scheme of the invention is realized as follows:

An automatic recovery method for distributed database data.

Further, the database system continuously monitors the process state of the nodes, and when data is damaged due to a node failure, the database automatically recovers the data by means of an automatic data recovery method, which comprises the following steps:

T1, triggering the automatic recovery service: if a node process is found to be faulty while the user is operating the database, the automatic recovery service is started and the flow proceeds to T2;

T2, checking the table operation: the user's operation on the database table is checked; if it is a table structure operation, the flow proceeds to T3; if it is a table data operation, the flow proceeds to T4; if it includes both table structure and table data operations, the flow proceeds to T5;

T3, table structure recovery: the system restores the table structure from the recorded serialized DDL event log so that it matches the original table structure, and then ends the automatic data recovery flow;

T4, table data recovery: the system restores the table data from the recorded DML data loss events so that it is synchronized with the original table data, and then ends the automatic data recovery flow;

T5, table structure and data recovery: the system executes T3 and T4 in sequence and then ends the automatic data recovery flow.
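The branching in steps T1 to T5 can be sketched as a small dispatcher. This is a minimal illustration only: the names `OpKind`, `recover_structure`, `recover_data` and `auto_recover` are hypothetical and do not come from the patent, and the two recovery functions are placeholders that merely record which step ran.

```python
from enum import Enum, auto


class OpKind(Enum):
    """Kind of user operation checked in T2 (hypothetical names)."""
    STRUCTURE = auto()  # table structure (DDL) operation only
    DATA = auto()       # table data (DML) operation only
    BOTH = auto()       # both structure and data operations


def recover_structure(node, actions):
    # T3 placeholder: a real system would redo the logged DDL statements here.
    actions.append(f"T3 structure recovery on {node}")


def recover_data(node, actions):
    # T4 placeholder: a real system would copy lost blocks from a backup node.
    actions.append(f"T4 data recovery on {node}")


def auto_recover(node, node_failed, op_kind):
    """Return the ordered list of recovery actions taken for one node."""
    actions = []
    if not node_failed:
        # T1: the recovery service is only triggered by a faulty node process.
        return actions
    if op_kind is OpKind.STRUCTURE:    # T2 -> T3
        recover_structure(node, actions)
    elif op_kind is OpKind.DATA:       # T2 -> T4
        recover_data(node, actions)
    else:                              # T2 -> T5: T3 first, then T4
        recover_structure(node, actions)
        recover_data(node, actions)
    return actions


if __name__ == "__main__":
    print(auto_recover("node-2", True, OpKind.BOTH))
```

Running T3 before T4 in the combined case reflects the ordering stated in T5: the table structure has to be aligned before table data is written back.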

Further, the T3 table structure recovery includes the following sub-steps:

S1, when a user issues a DDL statement to a database table, the database system distributes the DDL operation to each data node;

S2, when the process of a data node in the database fails, the database system records the DDL event log of that data node, serializes and compresses it, and stores it in the ddl_fevent_log file;

S3, after the data node fault of S2 is repaired, the system decompresses and deserializes the DDL event log recorded in the ddl_fevent_log file in S2, and then performs DDL redo on the failed data node to complete table structure recovery.
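Sub-steps S1 to S3 can be illustrated with a toy cluster. The sketch below makes several assumptions that are not in the patent: each data node is an in-memory SQLite database, a Python list stands in for the ddl_fevent_log file, events are serialized as JSON and compressed with zlib, and the names `DataNode`, `distribute_ddl` and `redo_ddl` are hypothetical.

```python
import json
import sqlite3
import zlib


class DataNode:
    """Hypothetical data node backed by an in-memory SQLite database."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.db = sqlite3.connect(":memory:")

    def execute(self, sql):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.db.execute(sql)

    def tables(self):
        rows = self.db.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        return [row[0] for row in rows]


def distribute_ddl(nodes, sql, fevent_log):
    """S1/S2: send the DDL to every node and log an event per failed node."""
    for node in nodes:
        try:
            node.execute(sql)
        except ConnectionError:
            event = {"node": node.name, "statement": sql}
            # Serialize first, then compress, before storing the event.
            fevent_log.append(zlib.compress(json.dumps(event).encode("utf-8")))


def redo_ddl(node, fevent_log):
    """S3: decompress, deserialize and re-execute events for a repaired node."""
    remaining = []
    for blob in fevent_log:
        event = json.loads(zlib.decompress(blob).decode("utf-8"))
        if event["node"] == node.name:
            node.execute(event["statement"])
        else:
            remaining.append(blob)
    fevent_log[:] = remaining  # drop the events that were replayed


if __name__ == "__main__":
    n1, n2 = DataNode("n1"), DataNode("n2")
    log = []
    n2.alive = False                       # simulate a node process failure
    distribute_ddl([n1, n2], "CREATE TABLE t (id INTEGER)", log)
    n2.alive = True                        # the node has been repaired
    redo_ddl(n2, log)
    print(n1.tables(), n2.tables())
```

Only the statements that a node actually missed are logged and replayed, which is what keeps the event log small compared with a full redo log.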

Further, the serialized DDL event log has the format head+content, wherein the head comprises a serialization flag, a redolog flag and a version number, and the content comprises Lockinfo, dbtablename, nodeinfo and statement; wherein Lockinfo is lock-related information containing the lock id and the lock string, dbtablename is the database table name, nodeinfo is node information containing the IP address of the node, and statement is the ddl_sql statement to be executed.
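One possible byte layout for the head+content format is sketched below. The patent names the fields but not their widths or encodings, so everything concrete here is an assumption: a 4-byte head (`<BBH`), the flag values, a 32-bit lock id, length-prefixed UTF-8 strings, and zlib compression. The function names are hypothetical.

```python
import struct
import zlib

# Head: serialization flag, redolog flag, version number (assumed widths).
HEAD = struct.Struct("<BBH")
SERIALIZED_FLAG = 0x01  # assumed value
REDOLOG_FLAG = 0x01     # assumed value
VERSION = 1


def _pack_str(text):
    """Encode a string as a 2-byte length followed by UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def _unpack_str(buf, offset):
    """Decode a length-prefixed string; return (text, next offset)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length


def serialize_event(lock_id, lock_str, dbtablename, node_ip, statement):
    """Build head+content, then compress the whole record."""
    head = HEAD.pack(SERIALIZED_FLAG, REDOLOG_FLAG, VERSION)
    content = struct.pack("<I", lock_id)
    for field in (lock_str, dbtablename, node_ip, statement):
        content += _pack_str(field)
    return zlib.compress(head + content)


def deserialize_event(blob):
    """Decompress a record, validate the head and parse the content fields."""
    raw = zlib.decompress(blob)
    ser_flag, redo_flag, version = HEAD.unpack_from(raw, 0)
    if ser_flag != SERIALIZED_FLAG or redo_flag != REDOLOG_FLAG:
        raise ValueError("not a serialized DDL redo event")
    offset = HEAD.size
    (lock_id,) = struct.unpack_from("<I", raw, offset)
    offset += 4
    fields = []
    for _ in range(4):
        value, offset = _unpack_str(raw, offset)
        fields.append(value)
    lock_str, dbtablename, node_ip, statement = fields
    return {
        "version": version,
        "Lockinfo": {"id": lock_id, "lock": lock_str},
        "dbtablename": dbtablename,
        "nodeinfo": node_ip,
        "statement": statement,
    }


if __name__ == "__main__":
    blob = serialize_event(7, "X-lock", "db1.t1", "10.0.0.2",
                           "CREATE TABLE t1 (id INT)")
    print(len(blob), deserialize_event(blob))
```

Carrying the version number in the head lets a reader reject or adapt to records written by a different release before it attempts to parse the content.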

Further, the T4 table data recovery includes the steps of:

A1, when a user issues a DML statement to a database table, the database system distributes the DML operation to the associated data nodes;

A2, when the process of a data node in the database fails, the database system records the DML data loss events of the corresponding table on the failed data node and stores them in the dml_fevent_log;

A3, after the data node of A2 is repaired, the system acquires backup data from the other associated data nodes that are running normally, according to the DML data loss events in the dml_fevent_log of A2;

A4, a synchronous write of the data blocks is performed on the failed data node using the backup data, completing table data recovery.

Further, the content of the dml_fevent_log includes tableinfo, noddeinfo and datainfo, where tableinfo is the name of the table that lost data, noddeinfo is the IP of the node that lost data and the IP of the backup data node, and datainfo is the information on the lost data blocks.
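Steps A1 to A4, together with the three dml_fevent_log fields, can be sketched with in-memory nodes. The field names `tableinfo`, `noddeinfo` and `datainfo` follow the text; everything else is an assumption made for illustration, including the `Node` class, a single backup replica per write, block ids as integers, and a Python list standing in for the dml_fevent_log.

```python
from dataclasses import dataclass, field


@dataclass
class DmlLossEvent:
    tableinfo: str   # name of the table that lost data
    noddeinfo: dict  # {"failed": ip, "backup": ip}
    datainfo: list   # ids of the lost data blocks


@dataclass
class Node:
    ip: str
    alive: bool = True
    blocks: dict = field(default_factory=dict)  # (table, block_id) -> bytes


def write_block(primary, backup, table, block_id, data, fevent_log):
    """A1/A2: write to both replicas; log a loss event if the primary is down."""
    backup.blocks[(table, block_id)] = data
    if primary.alive:
        primary.blocks[(table, block_id)] = data
    else:
        fevent_log.append(
            DmlLossEvent(table, {"failed": primary.ip, "backup": backup.ip},
                         [block_id])
        )


def recover(node, nodes_by_ip, fevent_log):
    """A3/A4: after repair, copy every logged block from its backup node."""
    pending = [e for e in fevent_log if e.noddeinfo["failed"] == node.ip]
    for event in pending:
        backup = nodes_by_ip[event.noddeinfo["backup"]]
        for block_id in event.datainfo:
            key = (event.tableinfo, block_id)
            node.blocks[key] = backup.blocks[key]
        fevent_log.remove(event)  # the event has been handled


if __name__ == "__main__":
    a, b = Node("10.0.0.1"), Node("10.0.0.2")
    log = []
    write_block(a, b, "t1", 0, b"row0", log)
    a.alive = False                      # node process failure
    write_block(a, b, "t1", 1, b"row1", log)
    a.alive = True                       # node repaired
    recover(a, {a.ip: a, b.ip: b}, log)
    print(a.blocks == b.blocks)
```

The log records where the lost data can be found rather than the data itself, so its size does not grow with the volume of data written during the outage.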

Further, in T2, the table structure operations include creating a table, modifying a table, deleting a table, creating an index, modifying an index and deleting an index, and the table data operations include data insertion, data update, data deletion and data query.
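The check in T2 could, for instance, classify statements by their leading SQL keyword. The mapping below is an assumption for illustration (the patent lists the scenarios but not how they are detected), and `classify` and `next_step` are hypothetical names.

```python
# Leading keywords assumed to identify each scenario listed for T2.
STRUCTURE_KEYWORDS = {"CREATE", "ALTER", "DROP"}            # tables and indexes
DATA_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "SELECT"}    # table data


def classify(statement):
    """Return 'structure', 'data' or 'unknown' for one SQL statement."""
    words = statement.strip().split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword in STRUCTURE_KEYWORDS:
        return "structure"
    if keyword in DATA_KEYWORDS:
        return "data"
    return "unknown"


def next_step(statements):
    """Map the user's statements to the recovery step chosen in T2."""
    kinds = {classify(s) for s in statements} - {"unknown"}
    if kinds == {"structure"}:
        return "T3"
    if kinds == {"data"}:
        return "T4"
    if kinds == {"structure", "data"}:
        return "T5"
    return None  # nothing recognisable to recover


if __name__ == "__main__":
    print(next_step(["create table t (id int)", "insert into t values (1)"]))
```

A production system would more likely take the operation type from its SQL parser than from the first keyword; the keyword check is used here only to keep the example short.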

Further, in step A4, the data block is a differential data block, and the synchronous write operation rewrites the differential data to the failed data node, so as to improve the data recovery speed.
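The benefit of writing only differential data can be shown with a small sketch. Note the swapped-in technique: the patent identifies the lost blocks from the recorded loss events, whereas this example finds them by comparing SHA-256 digests of each block, which is an assumption chosen to keep the example self-contained. The function names are hypothetical.

```python
import hashlib


def block_digest(data):
    """Digest used to compare a block on two nodes without shipping it."""
    return hashlib.sha256(data).hexdigest()


def diff_blocks(backup_blocks, failed_blocks):
    """Return ids of blocks that are missing or stale on the failed node."""
    return sorted(
        block_id
        for block_id, data in backup_blocks.items()
        if block_id not in failed_blocks
        or block_digest(failed_blocks[block_id]) != block_digest(data)
    )


def sync_differences(backup_blocks, failed_blocks):
    """Rewrite only the differing blocks; return how many were written."""
    changed = diff_blocks(backup_blocks, failed_blocks)
    for block_id in changed:
        failed_blocks[block_id] = backup_blocks[block_id]
    return len(changed)


if __name__ == "__main__":
    backup = {0: b"a", 1: b"b", 2: b"c"}
    failed = {0: b"a", 1: b"stale"}
    print(sync_differences(backup, failed), failed == backup)
```

Blocks that already match are never rewritten, so the recovery cost scales with the amount of lost data rather than with the size of the table.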

Further, an electronic device comprises a processor and a memory which is communicatively connected to the processor and stores instructions executable by the processor, wherein the processor, when executing the instructions, implements the automatic recovery method for distributed database data.

Further, a server comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor, the instructions executable by the processor to cause the at least one processor to perform the method of automatically recovering distributed database data.

Further, a computer readable storage medium stores a computer program which when executed by a processor implements the method for automatically recovering distributed database data.

Compared with the prior art, the distributed database data automatic recovery method has the following beneficial effects:

The automatic data recovery method of the invention records events of DDL and DML operations and performs synchronous data writes from a backup node, which effectively addresses the problems of oversized logs, performance bottlenecks, data synchronization delay and the like, thereby improving the reliability and performance of the system.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:

FIG. 1 is a schematic diagram of a main flow chart of a method for automatically recovering data according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a table structure restoration flow according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a table structure restoration flow according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a head structure according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a content structure according to an embodiment of the present invention;

FIG. 6 is a table data recovery flow diagram illustrating an embodiment of the present invention;

FIG. 7 is a table data recovery flow diagram illustrating an embodiment of the present invention;

FIG. 8 is a diagram of a dml_fevent_log structure according to an embodiment of the present invention.

Detailed Description

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

The invention will be described in detail below with reference to the drawings in connection with embodiments.

The application is applied to a distributed database, and the automatic data recovery method aims to repair table data lost through a node process fault without the user's service perceiving it at all. As shown in fig. 1, the database system continuously monitors the process state of the nodes, and when data is damaged due to a node failure, the database automatically recovers the data by means of the automatic data recovery method, during which the user can continue to operate the database. The method comprises steps T1 to T5 as set out in the Disclosure of Invention above.

Optionally, in T2, the scenario of the table structure operation includes creating a table, modifying a table, deleting a table, creating an index, modifying an index, deleting an index, and the scenario of the table data operation includes data insertion, data update, data deletion, and data query.

Example 1:

If a node process fault occurs while a table is being created or a table structure is being modified, the table structure recovery of step T3 can be started. As shown in fig. 2-3, it comprises the following sub-steps:

S1, when a user issues a DDL statement to a database table, the database system distributes the DDL operation to each data node;

S2, when the process of a data node in the database fails, the database system records the DDL event log of that data node, serializes the DDL event log (namely ddl_redo_log), compresses it, and stores it in the ddl_fevent_log file;

S3, after the data node fault of S2 is repaired, the system decompresses and deserializes the DDL event log recorded in the ddl_fevent_log file in S2, and then performs DDL redo on the failed data node to complete table structure recovery.

The serialized DDL event log has the format head+content, wherein the head comprises a serialization flag, a redolog flag and a version number as shown in fig. 4, and the content comprises Lockinfo, dbtablename, nodeinfo and statement as shown in fig. 5; wherein Lockinfo is lock-related information containing the lock id and the lock string, dbtablename is the database table name, nodeinfo is node information containing the IP address of the node, and statement is the ddl_sql statement to be executed. The advantage is that this information is lightweight compared with traditional log records: an event can often be described in under 0.5 KB, and because the record is compressed, the space actually required is often below 100 bytes.

Specifically, after the DDL event log is deserialized in step S3, its fields are parsed according to the format, and DDL redo is performed on the failed node simply by re-executing the SQL statement, for example a table creation statement (create table) or a table modification statement (alter table), so as to ensure the integrity of the table structure.

Example 2:

If a data node process fault occurs while data is being inserted into a table, table data is lost, and the table data recovery of step T4 can be performed. As shown in fig. 6-7, it comprises the following steps:

A1, when a user issues a DML statement to a database table, the database system distributes the DML operation to each data node;

A2, when the process of a data node in the database fails, the database system records the DML data loss events of the corresponding table on the failed data node and stores them in the dml_fevent_log;

A3, after the data node of A2 is repaired, the system acquires backup data from the other data nodes that are running normally, according to the DML data loss events in the dml_fevent_log of A2;

A4, a synchronous write of the data blocks is performed on the failed data node using the backup data, completing table data recovery and ensuring that the data is brought back into alignment; the data block is a differential data block, and the synchronous write operation rewrites the differential data to the failed data node, so as to improve the data recovery speed.

Specifically, as shown in fig. 8, the content of the dml_fevent_log includes tableinfo, noddeinfo and datainfo, where tableinfo is the name of the table that lost data, noddeinfo is the IP of the node that lost data and the IP of the backup data node, and datainfo is the information on the lost data blocks. Advantageously, this information can be recorded in less than 100 bytes.
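To give a feel for how such a record can stay under 100 bytes, the sketch below packs one loss event into a compact binary form. The layout is entirely an assumption, since the patent does not specify one: a 1-byte length plus UTF-8 table name, two 4-byte IPv4 addresses, a 1-byte block count, and 8-byte block ids. The function names and the sample values are hypothetical.

```python
import ipaddress
import struct


def encode_loss_event(tableinfo, failed_ip, backup_ip, block_ids):
    """Pack tableinfo, noddeinfo (two IPv4 addresses) and datainfo (block ids)."""
    name = tableinfo.encode("utf-8")
    return b"".join([
        struct.pack("<B", len(name)), name,
        ipaddress.IPv4Address(failed_ip).packed,
        ipaddress.IPv4Address(backup_ip).packed,
        struct.pack("<B", len(block_ids)),
        struct.pack(f"<{len(block_ids)}Q", *block_ids),
    ])


def decode_loss_event(buf):
    """Inverse of encode_loss_event."""
    name_len = buf[0]
    offset = 1
    tableinfo = buf[offset:offset + name_len].decode("utf-8")
    offset += name_len
    failed_ip = str(ipaddress.IPv4Address(buf[offset:offset + 4]))
    offset += 4
    backup_ip = str(ipaddress.IPv4Address(buf[offset:offset + 4]))
    offset += 4
    count = buf[offset]
    offset += 1
    block_ids = list(struct.unpack_from(f"<{count}Q", buf, offset))
    return tableinfo, failed_ip, backup_ip, block_ids


if __name__ == "__main__":
    record = encode_loss_event("sales.orders", "10.0.0.1", "10.0.0.2",
                               [17, 18, 19])
    # 1 + 12 (name) + 4 + 4 (IPs) + 1 + 3 * 8 (block ids) = 46 bytes
    print(len(record))  # prints 46
    print(decode_loss_event(record))
```

With this layout the record size is 10 bytes plus the table name plus 8 bytes per lost block, so a short table name and a handful of blocks fit well within 100 bytes.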

Example 3:

If the user performs both a table structure (DDL) operation and a table data (DML) operation and a data node process fails, step T5 is performed. Specifically, Example 1 is carried out first to recover the table structure (DDL); once the table structures are aligned, Example 2 is carried out to recover the table data (DML). Data recovery is thereby completed, and the consistency of the table data and table structure on the failed node is ensured.

Those of ordinary skill in the art will appreciate that the elements and method steps of each example described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the elements and steps of each example have been described generally in terms of functionality in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the several embodiments provided in the present application, it should be understood that the disclosed methods and systems may be implemented in other ways. For example, the above-described division of units is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The units may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. An automatic recovery method for distributed database data is characterized in that: the database system continuously monitors the process state of the nodes, when the data is damaged due to node faults, the database automatically restores the data through a data automatic restoration method, and the data automatic restoration method comprises the following steps:

2. The method for automatically recovering data from a distributed database according to claim 1, wherein: the T3 table structure recovery includes the following sub-steps:

3. The method for automatically recovering data from a distributed database according to claim 2, wherein: the serialized DDL event log has the format head+content, wherein the head comprises a serialization flag, a redolog flag and a version number, and the content comprises Lockinfo, dbtablename, nodeinfo and statement;

wherein Lockinfo is lock-related information containing the lock id and the lock string, dbtablename is the database table name, nodeinfo is node information containing the IP address of the node, and statement is the ddl_sql statement to be executed.

4. The method for automatically recovering data from a distributed database according to claim 1, wherein: the T4 table data recovery includes the steps of:

5. The method for automatically recovering data from a distributed database according to claim 4, wherein: the content of the dml_fevent_log includes tableinfo, noddeinfo and datainfo, where tableinfo is the name of the table that lost data, noddeinfo is the IP of the node that lost data and the IP of the backup data node, and datainfo is the information on the lost data blocks.

6. The method for automatically recovering data from a distributed database according to claim 1, wherein: in T2, the table structure operation includes creating a table, modifying a table, deleting a table, creating an index, modifying an index, deleting an index, and the table data operation includes data insertion, data update, data deletion, and data query.

7. The method for automatically recovering data from a distributed database according to claim 4, wherein: in step A4, the data block is a differential data block, and the synchronous write operation rewrites the differential data to the failed data node so as to improve the data recovery speed.

8. An electronic device comprising a processor and a memory communicatively coupled to the processor for storing processor-executable instructions, characterized in that: the processor is configured to perform a distributed database data auto-recovery method as claimed in any one of claims 1 to 7.

9. A server, characterized by: comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform a distributed database data auto-recovery method as claimed in any one of claims 1-7.

10. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements a distributed database data automatic restoration method as claimed in any one of claims 1-7.