CN118069430A - Automatic recovery method for distributed database data - Google Patents


Info

Publication number
CN118069430A
Authority
CN
China
Legal status
Pending
Application number
CN202410458526.8A
Other languages
Chinese (zh)
Inventor
阳远健
Current Assignee
Tianjin Nankai University General Data Technologies Co ltd
Original Assignee
Tianjin Nankai University General Data Technologies Co ltd
Priority date
2024-04-17
Filing date
2024-04-17
Publication date
2024-05-24
Application filed by Tianjin Nankai University General Data Technologies Co ltd
Priority to CN202410458526.8A
Publication of CN118069430A

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic recovery method for distributed database data. The database system continuously monitors the state of node processes; when data is damaged by a node fault, the database recovers the data automatically through the data automatic recovery method, which comprises triggering an automatic recovery service, looking up table operations, table structure recovery, table data recovery, and combined table structure and data recovery. The beneficial effects of the invention are that the method achieves data recovery that is imperceptible to users, automatically repairs data loss caused by node process downtime, and ensures high availability of the system.

Description

Automatic recovery method for distributed database data
Technical Field
The invention belongs to the technical field of databases, and particularly relates to an automatic recovery method for distributed database data.
Background
In distributed database systems, data loss is a common problem; it may occur when data writes fail because of node process failures, network failures, or other causes. Existing methods for addressing it mainly include recording redo logs of DDL and DML operations, serializing DDL and DML operations, and data backup and restore mechanisms. First, a distributed database system typically records redo logs of DDL and DML operations. When a node process fails, the system can recover the data by replaying the operations in the redo log, maintaining data consistency and integrity. For DDL operations, the system may also record a redo log of DDL statements and re-execute those statements after a node failure to synchronize table structure changes. Second, some systems record and recover data by serializing DDL and DML operations: the operations are serialized and written to a log file, and after a node failure the system parses the log file to perform data recovery, ensuring data consistency and integrity. In addition, to further increase reliability, some distributed database systems implement data backup and restore mechanisms: data is backed up periodically and the backups are stored on a backup node, so that the system can recover data from the backup node when a node fails. However, these existing methods also have limitations. A recovery mechanism based on redo logs may suffer from excessively large logs and long recovery times; serializing DDL and DML operations can impact system performance and create bottlenecks during log parsing; and recovery based on backup and restore mechanisms may face data synchronization delays and high costs.
Disclosure of Invention
In view of the above, the present invention aims to provide an automatic recovery method for distributed database data that achieves data recovery imperceptible to users, automatically repairs data loss caused by node process downtime, and ensures high availability of the system.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
An automatic recovery method for distributed database data.
Further, the database system continuously monitors the state of node processes, and when data is damaged by a node failure, the database restores the data automatically through a data automatic recovery method comprising the following steps:
T1, triggering an automatic recovery service: if a node process fault is detected while the user is operating the database, start the automatic recovery service and go to T2;
T2, looking up the table operation: check the user's operation on the database table; if it is a table structure operation, go to T3; if it is a table data operation, go to T4; if it includes both table structure and table data operations, go to T5;
T3, table structure recovery: the system restores the table structure from the recorded serialized ddl event log and, once the structure is aligned with the original table structure, ends the automatic data recovery flow;
T4, table data recovery: the system restores the table data from the recorded dml data loss events and, once the data is synchronized and aligned with the original table data, ends the automatic data recovery flow;
T5, table structure and data recovery: the system executes T3 and then T4 in sequence, then ends the automatic data recovery flow.
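The T1 to T5 flow above is essentially a small dispatcher. The sketch below models it in Python under stated assumptions: `OpKind`, `auto_recover`, `recover_structure`, and `recover_data` are hypothetical names, and the two recovery functions are stubs that only record which step would run.

```python
from enum import Enum, auto


class OpKind(Enum):
    """Kind of user operation observed in step T2 (hypothetical names)."""
    STRUCTURE = auto()   # ddl: table/index definition changes
    DATA = auto()        # dml: row-level changes
    BOTH = auto()        # ddl and dml in the same workload


def recover_structure(node: str, steps: list) -> None:
    # Stub for T3: a real system would replay the recorded ddl events here.
    steps.append(f"T3: ddl redo on {node}")


def recover_data(node: str, steps: list) -> None:
    # Stub for T4: a real system would resync lost data blocks here.
    steps.append(f"T4: data resync on {node}")


def auto_recover(node: str, node_failed: bool, op: OpKind) -> list:
    """Walk the T1 -> T2 -> T3/T4/T5 flow and return the steps taken."""
    steps: list = []
    # T1: the recovery service only starts when a node process fault is detected.
    if not node_failed:
        return steps
    steps.append("T1: recovery service started")
    # T2: route according to the kind of table operation.
    if op is OpKind.STRUCTURE:
        recover_structure(node, steps)
    elif op is OpKind.DATA:
        recover_data(node, steps)
    else:
        # T5: structure first, then data, so rows land in the right schema.
        steps.append("T5: structure then data")
        recover_structure(node, steps)
        recover_data(node, steps)
    return steps
```

Keeping T5 as "call T3, then T4" rather than a separate code path mirrors the text, which defines T5 purely as the sequential execution of the other two steps.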
Further, the T3 table structure recovery includes the following sub-steps:
S1, when a user issues a ddl statement to a database table, the database system distributes the ddl operation to each data node;
S2, when the process of a data node in the database fails, the database system records the ddl event log for that data node, serializes and compresses it, and stores it in a ddl_fevent_log file;
S3, after the data node fault of S2 is repaired, the system decompresses and deserializes the ddl event logs recorded in the ddl_fevent_log file in S2, then performs ddl redo on the failed data node, completing table structure recovery.
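A minimal sketch of S1 to S3, assuming a hypothetical in-process `DdlCoordinator` that knows which data nodes are down. It keeps the per-node ddl events in memory as plain strings; the serialization and compression that the method applies before writing ddl_fevent_log are deliberately omitted here.

```python
from collections import defaultdict


class DdlCoordinator:
    """Hypothetical coordinator: fans out ddl and buffers it for nodes that are down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()                    # nodes whose process has failed
        self.applied = defaultdict(list)     # ddl actually executed, per node
        self.fevent_log = defaultdict(list)  # in-memory stand-in for ddl_fevent_log

    def issue_ddl(self, sql):
        # S1: distribute the ddl statement to every data node.
        for node in self.nodes:
            if node in self.down:
                # S2: the node is down, so record the event instead of executing it.
                self.fevent_log[node].append(sql)
            else:
                self.applied[node].append(sql)

    def repair(self, node):
        # S3: once the node is back, redo the recorded ddl in its original order.
        self.down.discard(node)
        for sql in self.fevent_log.pop(node, []):
            self.applied[node].append(sql)
```

Replaying in recorded order matters: an `ALTER TABLE` recorded after a `CREATE TABLE` cannot succeed if the two are swapped.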
Further, the ddl event log is serialized in a head+content format, wherein the head comprises a serialization flag, a redolog flag, and a version number, and the content comprises Lockinfo, dbtablename, nodeinfo, and statement; Lockinfo is lock-related information containing the lock id and lock string, dbtablename is the database table name, nodeinfo is node information containing the node's ip address, and statement is the ddl_sql statement to be executed.
Further, the T4 table data recovery includes the steps of:
A1, when a user issues dml statements to a database table, the database system distributes the dml operations to the associated data nodes;
A2, when the process of a data node in the database fails, the database system records the dml data loss events for the corresponding table on the failed data node and stores them in a dml_fevent_log;
A3, after the data node of A2 is repaired, the system obtains backup data from the other associated data nodes that are operating normally, according to the dml data loss events in the dml_fevent_log of A2;
A4, the system uses the backup data to perform a synchronous write of the data blocks on the failed data node, completing table data recovery.
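The A1 to A4 steps can be sketched with an in-memory replica set. `Cluster`, `DmlLossEvent`, and the node -> table -> block layout are hypothetical; a real system would persist the loss events and transfer blocks over the network rather than reading another node's dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DmlLossEvent:
    tableinfo: str                              # table whose data was lost
    failed_node: str                            # noddeinfo: node that lost data
    backup_nodes: list                          # noddeinfo: nodes holding a copy
    datainfo: set = field(default_factory=set)  # ids of lost data blocks


class Cluster:
    """Hypothetical replica set: node -> table -> {block_id: bytes}."""

    def __init__(self, nodes):
        self.store = {n: {} for n in nodes}
        self.down = set()
        self.fevent_log = []  # in-memory stand-in for dml_fevent_log

    def write_block(self, table, block_id, data, nodes):
        # A1: the dml write is distributed to the associated data nodes.
        healthy = [n for n in nodes if n not in self.down]
        for n in nodes:
            if n in self.down:
                # A2: record a data loss event for the failed node instead of writing.
                self._loss_event(table, n, healthy).datainfo.add(block_id)
            else:
                self.store[n].setdefault(table, {})[block_id] = data

    def _loss_event(self, table, node, backups):
        # One event per (table, failed node); lost block ids accumulate in it.
        for ev in self.fevent_log:
            if ev.tableinfo == table and ev.failed_node == node:
                return ev
        ev = DmlLossEvent(table, node, list(backups))
        self.fevent_log.append(ev)
        return ev

    def repair(self, node):
        # A3 + A4: fetch each lost block from a running backup node and write it.
        self.down.discard(node)
        for ev in [e for e in self.fevent_log if e.failed_node == node]:
            # Raises StopIteration if no backup node is currently running.
            src = next(b for b in ev.backup_nodes if b not in self.down)
            for block_id in sorted(ev.datainfo):
                block = self.store[src][ev.tableinfo][block_id]
                self.store[node].setdefault(ev.tableinfo, {})[block_id] = block
            self.fevent_log.remove(ev)
```

Because only the block ids listed in the loss event are copied, the repaired node receives just the data it missed rather than a full table copy.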
Further, the dml_fevent_log content includes tableinfo, noddeinfo, and datainfo, where tableinfo is the name of the table whose data was lost, noddeinfo holds the ip of the node that lost the data and the ips of the backup data nodes, and datainfo is information on the lost data blocks.
Further, in T2, table structure operations include creating, modifying, and deleting tables and creating, modifying, and deleting indexes, and table data operations include data insertion, data update, data deletion, and data query.
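A rough classifier for step T2, assuming statements arrive as SQL text. The regular expressions are a deliberately simple assumption: they recognise `CREATE/ALTER/DROP TABLE|INDEX` and `INSERT/UPDATE/DELETE/SELECT` prefixes only, and would miss forms such as `CREATE UNIQUE INDEX`. `SELECT` is counted as a data operation only because the text lists data query among them.

```python
import re

# Prefix patterns for the two operation families listed in T2 (hypothetical rules).
STRUCTURE_OPS = re.compile(r"^\s*(CREATE|ALTER|DROP)\s+(TABLE|INDEX)\b", re.IGNORECASE)
DATA_OPS = re.compile(r"^\s*(INSERT|UPDATE|DELETE|SELECT)\b", re.IGNORECASE)


def classify(statements: list) -> str:
    """Return 'T3', 'T4', 'T5', or 'none' for a batch of SQL statements."""
    has_ddl = any(STRUCTURE_OPS.match(s) for s in statements)
    has_dml = any(DATA_OPS.match(s) for s in statements)
    if has_ddl and has_dml:
        return "T5"
    if has_ddl:
        return "T3"
    if has_dml:
        return "T4"
    return "none"
```

A production system would more likely take this decision from its own parsed statement type than from text matching.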
Further, in step A4, the data blocks contain only the difference data, and the synchronous write operation rewrites this difference data onto the failed data node, which improves data recovery speed.
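One way to find the difference data for A4 is to compare per-block digests between the backup node and the recovered node and copy only the blocks that are missing or differ. This is a sketch assuming blocks are addressable by an integer id; `diff_blocks` and `sync_difference` are hypothetical names, and SHA-256 from `hashlib` is an arbitrary choice of digest.

```python
import hashlib


def block_digests(blocks: dict) -> dict:
    # One digest per block lets two nodes compare contents without shipping data.
    return {bid: hashlib.sha256(data).hexdigest() for bid, data in blocks.items()}


def diff_blocks(backup: dict, failed: dict) -> list:
    """Ids of blocks that are missing or different on the failed node."""
    theirs = block_digests(failed)
    return sorted(bid for bid, d in block_digests(backup).items() if theirs.get(bid) != d)


def sync_difference(backup: dict, failed: dict) -> int:
    """Rewrite only the differing blocks onto the failed node; return how many."""
    changed = diff_blocks(backup, failed)
    for bid in changed:
        failed[bid] = backup[bid]
    return len(changed)
```

This sketch never deletes blocks that exist only on the failed node; whether such blocks should be removed depends on the system's write semantics.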
Further, an electronic device comprises a processor and a memory communicatively connected to the processor for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the automatic recovery method for distributed database data.
Further, a server comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor, the instructions executable by the processor to cause the at least one processor to perform the method of automatically recovering distributed database data.
Further, a computer readable storage medium stores a computer program which when executed by a processor implements the method for automatically recovering distributed database data.
Compared with the prior art, the distributed database data automatic recovery method has the following beneficial effects:
By recording events of DDL and DML operations and having a backup node perform synchronous data writes, the automatic data recovery method of the invention can effectively address excessively large logs, performance bottlenecks, and data synchronization delays, thereby improving the reliability and performance of the system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a main flow chart of a method for automatically recovering data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a table structure restoration flow according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a table structure restoration flow according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a head structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a content structure according to an embodiment of the present invention;
FIG. 6 is a table data recovery flow diagram illustrating an embodiment of the present invention;
FIG. 7 is a table data recovery flow diagram illustrating an embodiment of the present invention;
FIG. 8 is a diagram of a dml_fevent_log structure according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
The application is applied to a distributed database, and the data automatic recovery method aims to repair table data lost through node process faults in a way that is completely imperceptible to user services. As shown in fig. 1, the database system continuously monitors the state of node processes, and when data is damaged by a node failure, the database restores the data automatically through the data automatic recovery method; the user can continue to operate the database while the method runs. The method comprises the following steps:
T1, triggering an automatic recovery service: if a node process fault is detected while the user is operating the database, start the automatic recovery service and go to T2;
T2, looking up the table operation: check the user's operation on the database table; if it is a table structure operation, go to T3; if it is a table data operation, go to T4; if it includes both table structure and table data operations, go to T5;
T3, table structure recovery: the system restores the table structure from the recorded serialized ddl event log and, once the structure is aligned with the original table structure, ends the automatic data recovery flow;
T4, table data recovery: the system restores the table data from the recorded dml data loss events and, once the data is synchronized and aligned with the original table data, ends the automatic data recovery flow;
T5, table structure and data recovery: the system executes T3 and then T4 in sequence, then ends the automatic data recovery flow.
Optionally, in T2, table structure operations include creating, modifying, and deleting tables and creating, modifying, and deleting indexes, and table data operations include data insertion, data update, data deletion, and data query.
Example 1:
If a node process fault occurs while a table is being created or its structure modified, the table structure recovery of step T3 can be started, which, as shown in figs. 2-3, specifically comprises the following sub-steps:
S1, when a user issues a ddl statement to a database table, the database system distributes the ddl operation to each data node;
S2, when the process of a data node in the database fails, the database system records the ddl event log for that data node, serializes it (i.e., as ddl_redox_log), compresses it, and stores it in a ddl_fevent_log file;
S3, after the data node fault of S2 is repaired, the system decompresses and deserializes the ddl event logs recorded in the ddl_fevent_log file in S2, then performs ddl redo on the failed data node, completing table structure recovery.
The ddl event log is serialized in a head+content format, wherein the head comprises a serialization flag, a redolog flag, and a version number, as shown in fig. 4, and the content comprises Lockinfo, dbtablename, nodeinfo, and statement, as shown in fig. 5; Lockinfo is lock-related information containing the lock id and lock string, dbtablename is the database table name, nodeinfo is node information containing the node's ip address, and statement is the ddl_sql statement to be executed. The advantage is that this record is lightweight compared with traditional log records: it can usually be described in under 0.5 KB, and after compression the space actually required is often under 100 bytes.
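One possible byte layout for the head+content record, sketched with Python's standard `struct` and `zlib` modules. The flag value `b"DE"`, the field widths, the names `DdlEvent`, `serialize`, and `deserialize`, and the choice to compress only the content (leaving the head readable) are assumptions for illustration, not the layout shown in figs. 4-5.

```python
import struct
import zlib
from dataclasses import dataclass

MAGIC = b"DE"                   # hypothetical serialization flag
REDOLOG_FLAG = 1                # hypothetical "this is a redo record" marker
VERSION = 1                     # hypothetical format version
HEAD = struct.Struct(">2sBB")   # head: flag, redolog flag, version


@dataclass
class DdlEvent:
    lock_id: int        # Lockinfo: lock id
    lock_str: str       # Lockinfo: lock string
    dbtablename: str    # database table name
    node_ip: str        # nodeinfo: ip of the failed node
    statement: str      # ddl_sql statement to redo


def _pack_str(s: str) -> bytes:
    # Length-prefixed UTF-8 string (2-byte big-endian length).
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def serialize(ev: DdlEvent) -> bytes:
    content = struct.pack(">Q", ev.lock_id) + b"".join(
        _pack_str(s) for s in (ev.lock_str, ev.dbtablename, ev.node_ip, ev.statement)
    )
    # The head stays uncompressed so a reader can check flag and version first.
    return HEAD.pack(MAGIC, REDOLOG_FLAG, VERSION) + zlib.compress(content)


def deserialize(blob: bytes) -> DdlEvent:
    magic, redo, version = HEAD.unpack_from(blob)
    if magic != MAGIC or redo != REDOLOG_FLAG or version != VERSION:
        raise ValueError("not a ddl event record this reader understands")
    content = zlib.decompress(blob[HEAD.size:])
    (lock_id,) = struct.unpack_from(">Q", content)
    pos, fields = 8, []
    for _ in range(4):
        (n,) = struct.unpack_from(">H", content, pos)
        fields.append(content[pos + 2:pos + 2 + n].decode("utf-8"))
        pos += 2 + n
    return DdlEvent(lock_id, *fields)
```

For very short records zlib may save little or nothing, so whether a given record lands under 100 bytes depends on its actual content.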
Specifically, after the ddl event log is deserialized in step S3, ddl redo is performed on the failed node (which only requires re-executing the sql statement) using the fields parsed according to the format: for example, the table creation statement create table or the table modification statement alter table is re-executed on the failed node to ensure the integrity of the table structure.
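Ddl redo can be tried out with SQLite standing in for a data node. In this sketch, which uses only the standard `sqlite3` module, the failed node missed the last two ddl statements; the hypothetical helper `redo_ddl` replays them in recorded order, and the hypothetical helper `schema` compares the resulting definitions in `sqlite_master`.

```python
import sqlite3


def redo_ddl(conn: sqlite3.Connection, statements: list) -> None:
    """Re-execute recorded ddl statements on a node, oldest first."""
    for sql in statements:
        conn.execute(sql)
    conn.commit()


def schema(conn: sqlite3.Connection) -> list:
    """Table and index definitions, used to check that structures are aligned."""
    return conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()


# The healthy node ran every ddl; the failed node went down after the first one.
ddl_history = [
    "CREATE TABLE t (id INTEGER PRIMARY KEY)",
    "ALTER TABLE t ADD COLUMN name TEXT",
    "CREATE INDEX idx_t_name ON t(name)",
]
healthy = sqlite3.connect(":memory:")
failed = sqlite3.connect(":memory:")
redo_ddl(healthy, ddl_history)
redo_ddl(failed, ddl_history[:1])

# ddl events recorded while the node was down (already deserialized).
missed = ddl_history[1:]
redo_ddl(failed, missed)
```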
Example 2:
If a data node process fault occurs while data is being inserted into a table, the table's data is lost, and the table data recovery of step T4 can be performed, which, as shown in figs. 6-7, comprises the following steps:
A1, when a user issues dml statements to a database table, the database system distributes the dml operations to each data node;
A2, when the process of a data node in the database fails, the database system records the dml data loss events for the corresponding table on the failed data node and stores them in a dml_fevent_log;
A3, after the data node of A2 is repaired, the system obtains backup data from the other normally running data nodes according to the dml data loss events in the dml_fevent_log of A2;
A4, the system uses the backup data to perform a synchronous write of the data blocks on the failed data node, completing table data recovery and ensuring the data are aligned; the data blocks contain only the difference data, and the synchronous write operation rewrites this difference data onto the failed data node to improve data recovery speed.
Specifically, as shown in fig. 8, the dml_fevent_log content includes tableinfo, noddeinfo, and datainfo, where tableinfo is the name of the table whose data was lost, noddeinfo holds the ip of the node that lost the data and the ips of the backup data nodes, and datainfo is information on the lost data blocks. Advantageously, this information can be recorded in under 100 bytes.
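To illustrate how a dml_fevent_log entry might stay under 100 bytes, here is a hypothetical compact encoding using the standard `struct` and `ipaddress` modules: IPv4 addresses as 4 packed bytes, and datainfo as (start block, block count) ranges. The field order, the widths, and the helper names `encode_dml_event` and `decode_dml_event` are assumptions, not the structure in fig. 8.

```python
import ipaddress
import struct


def encode_dml_event(table: str, failed_ip: str, backup_ips: list,
                     block_ranges: list) -> bytes:
    name = table.encode("utf-8")
    out = struct.pack(">B", len(name)) + name                 # tableinfo
    out += ipaddress.IPv4Address(failed_ip).packed             # noddeinfo: failed node
    out += struct.pack(">B", len(backup_ips))                  # noddeinfo: backup count
    out += b"".join(ipaddress.IPv4Address(ip).packed for ip in backup_ips)
    out += struct.pack(">B", len(block_ranges))                # datainfo: range count
    out += b"".join(struct.pack(">II", start, count) for start, count in block_ranges)
    return out


def decode_dml_event(blob: bytes):
    pos = 0
    n = blob[pos]
    pos += 1
    table = blob[pos:pos + n].decode("utf-8")
    pos += n
    failed_ip = str(ipaddress.IPv4Address(blob[pos:pos + 4]))
    pos += 4
    k = blob[pos]
    pos += 1
    backups = [str(ipaddress.IPv4Address(blob[pos + 4 * i:pos + 4 * i + 4])) for i in range(k)]
    pos += 4 * k
    m = blob[pos]
    pos += 1
    ranges = [struct.unpack_from(">II", blob, pos + 8 * i) for i in range(m)]
    return table, failed_ip, backups, ranges
```

With a 10-character table name, two backup nodes, and one block range, this layout takes 33 bytes; describing lost blocks as ranges rather than individual ids keeps the entry small even when many consecutive blocks are lost.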
Example 3:
If the user performs both table structure ddl operations and table data dml operations and a data node process fails, step T5 can be performed: Example 1 is carried out first to recover the table structure (ddl), and once the table structures are aligned, Example 2 is carried out to recover the table data (dml). This completes data recovery and ensures that the data and structure of the failed node's tables are consistent.
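The ordering in T5 matters because rows written after a schema change may not fit the old schema. The sketch below, again with SQLite standing in for data nodes, replays the missed `ALTER TABLE` first and then copies only the rows whose key is absent on the failed node. `recover_structure_then_data` is a hypothetical helper; it assumes the first column is an integer primary key named `id` and that the table name comes from trusted metadata, since it is interpolated into the SQL.

```python
import sqlite3


def recover_structure_then_data(healthy, failed, missed_ddl, table):
    # Structure first (Example 1): without the new column the rows cannot land.
    for sql in missed_ddl:
        failed.execute(sql)
    # Then data (Example 2): copy only rows whose key is absent on the failed node.
    have = {row[0] for row in failed.execute(f"SELECT id FROM {table}")}
    rows = [r for r in healthy.execute(f"SELECT * FROM {table} ORDER BY id")
            if r[0] not in have]
    if rows:
        marks = ", ".join("?" * len(rows[0]))
        failed.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    failed.commit()
    return len(rows)


healthy = sqlite3.connect(":memory:")
failed = sqlite3.connect(":memory:")
for node in (healthy, failed):
    node.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    node.execute("INSERT INTO t VALUES (1)")
# While the failed node is down, the healthy node applies both ddl and dml.
missed_ddl = ["ALTER TABLE t ADD COLUMN name TEXT"]
for sql in missed_ddl:
    healthy.execute(sql)
healthy.execute("INSERT INTO t VALUES (2, 'b')")
healthy.commit()

copied = recover_structure_then_data(healthy, failed, missed_ddl, "t")
```

Running the row copy before the `ALTER TABLE` would fail here, because the failed node's copy of `t` would still have only one column while the rows supply two values.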
Those of ordinary skill in the art will appreciate that the elements and method steps of each example described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the elements and steps of each example have been described generally in terms of functionality in the foregoing description to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and systems may be implemented in other ways. For example, the above-described division of units is merely a logical function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. The units may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. An automatic recovery method for distributed database data, characterized in that: the database system continuously monitors the state of node processes, and when data is damaged by a node fault, the database restores the data automatically through a data automatic recovery method comprising the following steps:
T1, triggering an automatic recovery service: if a node process fault is detected while the user is operating the database, start the automatic recovery service and go to T2;
T2, looking up the table operation: check the user's operation on the database table; if it is a table structure operation, go to T3; if it is a table data operation, go to T4; if it includes both table structure and table data operations, go to T5;
T3, table structure recovery: the system restores the table structure from the recorded serialized ddl event log and, once the structure is aligned with the original table structure, ends the automatic data recovery flow;
T4, table data recovery: the system restores the table data from the recorded dml data loss events and, once the data is synchronized and aligned with the original table data, ends the automatic data recovery flow;
T5, table structure and data recovery: the system executes T3 and then T4 in sequence, then ends the automatic data recovery flow.
2. The method for automatically recovering data from a distributed database according to claim 1, wherein: the T3 table structure recovery includes the following sub-steps:
S1, when a user issues a ddl statement to a database table, the database system distributes the ddl operation to each data node;
S2, when the process of a data node in the database fails, the database system records the ddl event log for that data node, serializes and compresses it, and stores it in a ddl_fevent_log file;
S3, after the data node fault of S2 is repaired, the system decompresses and deserializes the ddl event logs recorded in the ddl_fevent_log file in S2, then performs ddl redo on the failed data node, completing table structure recovery.
3. The method for automatically recovering data from a distributed database according to claim 2, wherein: the serialized format of the ddl event log is head+content, wherein the head comprises a serialization flag, a redolog flag, and a version number, and the content comprises Lockinfo, dbtablename, nodeinfo, and statement;
wherein Lockinfo is lock-related information containing the lock id and lock string, dbtablename is the database table name, nodeinfo is node information containing the node's ip address, and statement is the ddl_sql statement to be executed.
4. The method for automatically recovering data from a distributed database according to claim 1, wherein: the T4 table data recovery includes the steps of:
A1, when a user issues dml statements to a database table, the database system distributes the dml operations to the associated data nodes;
A2, when the process of a data node in the database fails, the database system records the dml data loss events for the corresponding table on the failed data node and stores them in a dml_fevent_log;
A3, after the data node of A2 is repaired, the system obtains backup data from the other associated data nodes that are operating normally, according to the dml data loss events in the dml_fevent_log of A2;
A4, the system uses the backup data to perform a synchronous write of the data blocks on the failed data node, completing table data recovery.
5. The method for automatically recovering data from a distributed database according to claim 4, wherein: the dml_fevent_log content includes tableinfo, noddeinfo, and datainfo, where tableinfo is the name of the table whose data was lost, noddeinfo holds the ip of the node that lost the data and the ips of the backup data nodes, and datainfo is information on the lost data blocks.
6. The method for automatically recovering data from a distributed database according to claim 1, wherein: in T2, the table structure operation includes creating a table, modifying a table, deleting a table, creating an index, modifying an index, deleting an index, and the table data operation includes data insertion, data update, data deletion, and data query.
7. The method for automatically recovering data from a distributed database according to claim 4, wherein: in step A4, the data blocks contain only the difference data, and the synchronous write operation rewrites this difference data onto the failed data node to improve data recovery speed.
8. An electronic device comprising a processor and a memory communicatively coupled to the processor for storing processor-executable instructions, characterized in that: the processor is configured to perform a distributed database data auto-recovery method as claimed in any one of claims 1 to 7.
9. A server, characterized by: comprising at least one processor and a memory communicatively coupled to the processor, the memory storing instructions executable by the at least one processor to cause the at least one processor to perform a distributed database data auto-recovery method as claimed in any one of claims 1-7.
10. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the automatic recovery method for distributed database data as claimed in any one of claims 1-7.
CN202410458526.8A 2024-04-17 2024-04-17 Automatic recovery method for distributed database data Pending CN118069430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410458526.8A CN118069430A (en) 2024-04-17 2024-04-17 Automatic recovery method for distributed database data

Publications (1)

Publication Number Publication Date
CN118069430A true CN118069430A (en) 2024-05-24

Family

ID=91107528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410458526.8A Pending CN118069430A (en) 2024-04-17 2024-04-17 Automatic recovery method for distributed database data

Country Status (1)

Country Link
CN (1) CN118069430A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198159A (en) * 2013-04-27 2013-07-10 国家计算机网络与信息安全管理中心 Transaction-redo-based multi-copy consistency maintaining method for heterogeneous clusters
US20180203771A1 (en) * 2017-01-19 2018-07-19 Sap Se Database Redo Log Optimization by Skipping MVCC Redo Log Records
CN109656911A (en) * 2018-12-11 2019-04-19 江苏瑞中数据股份有限公司 Distributed variable-frequencypump Database Systems and its data processing method
CN111488243A (en) * 2020-03-19 2020-08-04 北京金山云网络技术有限公司 MongoDB database backup and recovery method and device, electronic equipment and storage medium
CN114756408A (en) * 2022-04-28 2022-07-15 泽拓科技(深圳)有限责任公司 Metadata backup recovery method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination