CN117520055A - Data recovery method, device, equipment and medium based on data cluster - Google Patents

Info

Publication number
CN117520055A
Authority
CN
China
Prior art keywords
data
node
backup
cluster
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311597302.7A
Other languages
Chinese (zh)
Inventor
赵武清
柏姗姗
耿新
李承钊
李科德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd
Original Assignee
China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1448: Management of the data involved in backup or backup restore
    • G06F11/1453: Management of the data involved in backup or backup restore using de-duplication of the data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process
    • G06F11/1469: Backup restoration techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data recovery method, a device, equipment and a medium based on a data cluster. The method comprises: acquiring a query statement input by a user, determining the execution condition of each node in the data cluster based on the query statement, and generating a data backup instruction when every node has executed successfully; determining data to be backed up of the data cluster based on the data backup instruction, and performing data backup according to the data to be backed up to generate a backup set; acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction; and performing data recovery of the data cluster according to the restored data. Because the data backup instruction is generated only when every node has executed successfully, the consistency of data across the nodes in the data cluster is ensured; the data to be backed up is determined from the difference between key values before and after backup, which ensures its accuracy; and duplicate data is deleted in the backup and recovery stages, which improves data processing efficiency and reduces system overhead.

Description

Data recovery method, device, equipment and medium based on data cluster
Technical Field
The present invention relates to the field of data backup technologies, and in particular, to a data recovery method, apparatus, device, and medium based on a data cluster.
Background
As business systems depend more and more on information technology, the mass data they accumulate has become an increasingly important intangible asset of enterprises. To ensure the validity and security of production business data, the production data needs to be backed up and managed in a timely manner.
In the prior art, data backup is performed through a storage interface. In this backup mode, the storage interface first completes a full backup of the database data, generates a full backup snapshot and records the log sequence number (LSN); each subsequent incremental backup acquires incremental data based on the LSN recorded by the previous backup and sends the incremental data to storage through the interface to generate an incremental backup snapshot. This scheme is time-consuming to restore, because the full backup set must be downloaded from the storage server.
Disclosure of Invention
The invention provides a data recovery method, a device, equipment and a medium based on a data cluster, which are used for realizing data backup through the data cluster and recovering the data when faults occur, so that data loss or errors are avoided.
According to an aspect of the present invention, there is provided a data recovery method based on a data cluster, the method comprising:
acquiring a query statement input by a user, determining the execution condition of each node in the data cluster based on the query statement, and generating a data backup instruction when every node has executed successfully;
determining data to be backed up of the data cluster based on the data backup instruction, and performing data backup according to the data to be backed up to generate a backup set;
acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction;
and performing data recovery of the data cluster according to the restored data.
Optionally, determining the execution condition of each node in the data cluster based on the query statement includes: acquiring a data query result of the data cluster according to the query statement, and displaying the data query result to a user; acquiring an execution statement based on the data query result, and editing data according to the execution statement to generate editing data; and determining the execution condition according to the editing data by each node, wherein the execution condition comprises execution success and execution failure.
Optionally, obtaining the data query result of the data cluster according to the query statement includes: taking the node connected with the user terminal in the data cluster as the master node and the other nodes as slave nodes; acquiring the query statement input by the user through the master node, generating a parallel query plan according to the query statement, and distributing the parallel query plan to the slave nodes; querying the local log directly according to the query statement through the master node to obtain a master node query result; querying the local log through each slave node according to the parallel query plan to obtain a slave node query result, and feeding back each slave node query result to the master node; and taking, through the master node, the master node query result and each slave node query result as the data query result.
Optionally, acquiring the execution statement based on the data query result, and performing data editing according to the execution statement to generate editing data, includes: acquiring, through the master node, an execution statement input by the user based on the data query result, wherein the execution statement comprises insertion, modification and deletion; determining a target node corresponding to the execution statement through the master node; judging, through the master node, whether the target node is the local node, and if so, directly editing local data according to the execution statement to generate editing data; otherwise, sending the execution statement to each slave node, and editing data based on the execution statement through each slave node to generate editing data.
Optionally, determining the data to be backed up of the data cluster includes: acquiring a global lock of the data cluster and locking it, wherein the global lock is used for locking the data writing state of the data cluster; issuing the data backup instruction to each slave node through the master node, wherein the data backup instruction comprises a backup time stamp; determining a current key value of the local log through each slave node according to the backup time stamp, and returning the current key value to the master node; acquiring a history key value through the master node, calculating the difference value between the history key value and the current key value, and determining the data to be backed up according to the difference value; and unlocking the global lock.
Optionally, performing data backup according to the data to be backed up to generate a backup set includes: sending the data to be backed up to a memory through the master node to form a backup set, storing the backup set through the memory, and generating a storage result that is fed back to the master node; and acquiring, through the master node, a change log of each node according to the storage result, and generating an archive log according to the change log.
Optionally, downloading the restored data from the backup set according to the data restoration instruction includes: determining the restoring time corresponding to the data restoring instruction through the master node; determining a target address of a backup set from the archive log according to the restoring time by the master node; transmitting the target address to each slave node through the master node; and downloading the restored data from the backup set according to the target address through the master node and each slave node.
Optionally, performing data recovery according to the restored data includes: acquiring current data of the data cluster, and comparing the current data with the restored data to determine a data similarity; and when the data similarity is smaller than a preset threshold value, overwriting the current data with the restored data so as to recover the data.
Optionally, the method further comprises: partitioning the data to be backed up or the restored data according to a specified data volume to generate data blocks; establishing newly added fingerprint information corresponding to each data block through a hash algorithm; and acquiring historical fingerprint information and judging whether the newly added fingerprint information duplicates the historical fingerprint information; if so, deleting the newly added fingerprint information; otherwise, retaining the newly added fingerprint information.
According to another aspect of the present invention, there is provided a data recovery apparatus based on a data cluster, the apparatus comprising:
the data backup instruction generation module is used for acquiring query sentences input by a user, determining the execution conditions of all nodes in the data cluster based on the query sentences, and generating a data backup instruction when the execution conditions of all the nodes are successful in execution;
the backup set generation module is used for determining data to be backed up of the data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up so as to generate a backup set;
the restored data downloading module is used for acquiring a data restoration instruction and downloading restored data from the backup set according to the data restoration instruction;
and the data recovery module is used for carrying out data recovery of the data cluster according to the restored data.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data cluster-based data recovery method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a data recovery method based on a data cluster according to any one of the embodiments of the present invention when executed.
According to the technical scheme, the data backup instruction is generated only when every node has executed successfully, which ensures the consistency of data across the nodes in the data cluster; the data to be backed up is determined from the difference between key values before and after backup, which ensures its accuracy; and duplicate data is deleted in the backup and recovery stages, which improves data processing efficiency and reduces system overhead.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data recovery method based on a data cluster according to a first embodiment of the present invention;
FIG. 2 is a flowchart of another data recovery method based on a data cluster according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of another data recovery device based on a data cluster according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device implementing a data recovery method based on a data cluster according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data cluster-based data recovery method according to an embodiment of the present invention, where the method may be performed by a data cluster-based data recovery device, and the data cluster-based data recovery device may be implemented in hardware and/or software, and the data cluster-based data recovery device may be configured in a data cluster. As shown in fig. 1, the method includes:
S110, acquiring a query statement input by a user, determining the execution condition of each node in the data cluster based on the query statement, and generating a data backup instruction when every node has executed successfully.
The query statement is a statement written in Structured Query Language (SQL), a database query and programming language used to access, query, update and manage data. The query statement retrieves the data content that the user wants to query, and a data editing operation is then executed on that content. When every node in the data cluster has successfully executed the corresponding editing operation, the data cluster system automatically generates the data backup instruction.
In a specific implementation, this embodiment is applied to a data cluster system that includes a plurality of server nodes and a database cluster, where the database cluster includes a plurality of data server nodes. The database cluster is a shared storage space that can be accessed by every server node. A user terminal initiates a service access request to a server node, and the server node responds based on the data stored in the database cluster; at the same time, the processing procedure is stored locally on the server node to form a local log. A shared log, namely the archive log, is kept in the database cluster and records the data read and write operations. The database cluster also stores the backup data, namely the backup set.
S120, determining data to be backed up of the data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up to generate a backup set.
Optionally, determining the data to be backed up of the data cluster includes: acquiring a global lock of the data cluster and locking it, wherein the global lock is used for locking the data writing state of the data cluster; issuing the data backup instruction to each slave node through the master node, wherein the data backup instruction comprises a backup time stamp; determining a current key value of the local log through each slave node according to the backup time stamp, and returning the current key value to the master node; acquiring a history key value through the master node, calculating the difference value between the history key value and the current key value, and determining the data to be backed up according to the difference value; and unlocking the global lock.
It should be noted that the global lock is used for locking the data writing state of the data cluster. When the data to be backed up is obtained, the global lock may be locked first, for example by setting its state to locked, so that the log key value of the current backup period can be recorded; after the recording is completed, the state of the global lock may be set to unlocked so that normal operation can continue. In this way, new data is prevented from being written while the key values are recorded, which improves the accuracy of the backup method.
In a specific embodiment, when the data to be backed up of the data cluster is determined, the global lock of the data cluster is acquired first and its state is set to locked, so that no new data is written while the data to be backed up is being determined. The data backup instruction, which comprises a backup time stamp, is then issued to each slave node through the master node. Each node determines the current key value of its local log according to the backup time stamp, and each slave node feeds its current key value back to the master node. The master node then calculates the difference value between the current key value and the history key value of each node and determines the data to be backed up according to the difference value. When the difference value is 0, no changed data has been generated and no subsequent backup operation is needed. After the data to be backed up has been determined, the state of the global lock may be set to unlocked, at which point new data is allowed to be written.
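The lock-then-compare procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not define the key value, so the sketch assumes it is a monotonically increasing log position, and the names `Node`, `current_key` and `determine_backup_ranges` are hypothetical.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical cluster node; `log` maps a timestamp to the log key value at that time."""
    name: str
    log: Dict[int, int] = field(default_factory=dict)

    def current_key(self, backup_ts: int) -> int:
        # Latest key value recorded at or before the backup time stamp.
        eligible = [key for ts, key in self.log.items() if ts <= backup_ts]
        return max(eligible, default=0)


def determine_backup_ranges(nodes, history_keys, backup_ts, global_lock):
    """Return {node name: (history key, current key)} for nodes whose log advanced."""
    ranges = {}
    with global_lock:  # assumption: writers take the same lock, so writes are blocked here
        for node in nodes:
            current = node.current_key(backup_ts)
            previous = history_keys.get(node.name, 0)
            if current - previous != 0:  # a difference of 0 means nothing to back up
                ranges[node.name] = (previous, current)
    return ranges  # the lock is released when the `with` block exits
```

Releasing the lock through the `with` statement mirrors the final unlocking step and guarantees the cluster becomes writable again even if a node raises an error.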
Optionally, performing data backup according to the data to be backed up to generate a backup set includes: sending the data to be backed up to a memory through the master node to form a backup set, storing the backup set through the memory, and generating a storage result that is fed back to the master node; and acquiring, through the master node, a change log of each node according to the storage result, and generating an archive log according to the change log.
Backup means that the valid data pages of the local logs of all nodes in the data cluster are backed up so that the data can be recovered when a fault occurs. According to the content backed up, backup can be divided into data backup and archive log backup. Data backup targets the content of the data files and comprises library backup, tablespace backup and table backup, which together form the backup set. Log backup records the data read and write operations of each node to form the archive log.
Specifically, the data to be backed up can be sent to the memory through the master node to form a backup set, where the memory is the shared storage space in the data cluster. After receiving feedback that the memory has stored the backup set successfully, the master node can further acquire the log changes generated by each node to form the archive log.
S130, acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction.
Optionally, downloading the restored data from the backup set according to the data restoration instruction includes: determining the restoring time corresponding to the data restoring instruction through the master node; determining a target address of a backup set from the archive log according to the restoring time by the master node; transmitting the target address to each slave node through the master node; and downloading the restored data from the backup set according to the target address through the master node and each slave node.
The restore is the reverse process of backup, that is, the process of reading restore data from the backup set and writing the restore data into the corresponding position of the data file corresponding to the target database.
Specifically, when the data cluster detects that a node fails, a data recovery instruction is generated, at this time, the master node determines a recovery time corresponding to the data recovery instruction, and then determines a target address of the backup set corresponding to the recovery time from the archive log. The target address can be issued to each slave node through the master node, and each node can download the restored data from the backup set according to the target address.
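A minimal sketch of the address lookup follows. The text only states that the target address is determined from the archive log according to the restore time; choosing the newest backup set taken at or before that time is an assumption of this sketch, and `ArchiveEntry` and `find_backup_address` are hypothetical names.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveEntry:
    """Hypothetical archive-log record written after a backup set is stored."""
    backup_ts: int  # time at which the backup set was taken
    address: str    # location of the backup set in shared storage


def find_backup_address(archive_log, restore_ts):
    """Return the address of the newest backup set taken at or before restore_ts."""
    entries = sorted(archive_log, key=lambda e: e.backup_ts)
    timestamps = [e.backup_ts for e in entries]
    idx = bisect.bisect_right(timestamps, restore_ts)
    if idx == 0:
        raise LookupError("no backup set exists at or before the restore time")
    return entries[idx - 1].address
```

The master node would then send the returned address to each slave node so that every node downloads from the same backup set.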
S140, performing data recovery of the data cluster according to the restored data.
Optionally, performing data recovery according to the restored data includes: acquiring current data of the data cluster, and comparing the current data with the restored data to determine a data similarity; and when the data similarity is smaller than a preset threshold value, overwriting the current data with the restored data so as to recover the data.
Specifically, once the restored data has been determined, the method further comprises a process of determining whether data recovery is needed. The restored data is compared with the current data to determine the data similarity. The preset threshold may be, for example, 98%. When the data similarity reaches 98%, the data is basically unchanged and data recovery is not needed. When the data similarity is less than 98%, the data has changed substantially and the data recovery operation is required.
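The threshold decision can be illustrated with a short sketch. The text does not say how the data similarity is measured; as an assumption, the sketch treats both data sets as key-value mappings and uses the share of keys whose values match. The function names are hypothetical.

```python
def similarity(current: dict, restored: dict) -> float:
    """Share of keys (from either side) whose values are identical in both data sets."""
    keys = current.keys() | restored.keys()
    if not keys:
        return 1.0  # two empty data sets are treated as identical
    same = sum(
        1 for k in keys
        if k in current and k in restored and current[k] == restored[k]
    )
    return same / len(keys)


def recover_if_needed(current: dict, restored: dict, threshold: float = 0.98) -> dict:
    """Overwrite current data with restored data only when similarity is below the threshold."""
    if similarity(current, restored) < threshold:
        return dict(restored)  # data changed substantially: restore
    return current             # data basically unchanged: keep as is
```

With 100 records of which one differs, the similarity is 0.99 and the current data is kept; with three differing records it drops to 0.97 and the restored data replaces the current data.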
Optionally, the method further comprises: partitioning the data to be backed up or the restored data according to a specified data volume to generate data blocks; establishing newly added fingerprint information corresponding to each data block through a hash algorithm; and acquiring historical fingerprint information and judging whether the newly added fingerprint information duplicates the historical fingerprint information; if so, deleting the newly added fingerprint information; otherwise, retaining the newly added fingerprint information.
Specifically, the method further comprises a process of deleting duplicate data during data backup and data recovery. The data to be backed up or the restored data is partitioned according to the specified data volume to generate data blocks, and newly added fingerprint information corresponding to each data block is established through a hash algorithm. The historically stored data and the corresponding historical fingerprint information are acquired from the memory. Whether the newly added fingerprint information duplicates the historical fingerprint information is then judged: if so, only the historical fingerprint information is kept and the newly added fingerprint information is not stored; otherwise, the data block is retained, and its relevant information is extracted and stored. Comparing exact data fingerprints of the data blocks reduces data duplication among multiple backup sets, keeps the host resources that source-side deduplication occupies on the user terminal to a minimum, maximizes the deletion of duplicate data, and greatly improves the effective use of storage space.
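The block fingerprinting step can be sketched as below. The text names only "a hash algorithm" and "a specified data volume"; SHA-256 and fixed-size blocks are assumptions of this sketch, and `split_blocks` and `deduplicate` are hypothetical names.

```python
import hashlib


def split_blocks(data: bytes, block_size: int):
    """Partition data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data: bytes, block_size: int, known_fingerprints: set):
    """Return (recipe, new_blocks).

    recipe lists the fingerprint of every block in order, so the data can be
    rebuilt later; new_blocks holds only blocks whose fingerprint was not
    already known, which are the only ones that need to be stored.
    """
    recipe, new_blocks = [], {}
    for block in split_blocks(data, block_size):
        fingerprint = hashlib.sha256(block).hexdigest()
        recipe.append(fingerprint)
        if fingerprint not in known_fingerprints:
            known_fingerprints.add(fingerprint)  # becomes historical fingerprint information
            new_blocks[fingerprint] = block
    return recipe, new_blocks
```

Keeping the ordered recipe separate from the stored blocks is what allows a duplicate block to be dropped without losing the ability to reassemble the original data.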
According to the technical scheme, the data backup instruction is generated only when every node has executed successfully, which ensures the consistency of data across the nodes in the data cluster; the data to be backed up is determined from the difference between key values before and after backup, which ensures its accuracy; and duplicate data is deleted in the backup and recovery stages, which improves data processing efficiency and reduces system overhead.
Example two
Fig. 2 is a flowchart of a data recovery method based on a data cluster according to a second embodiment of the present invention, where a specific process of determining an execution condition of each node in the data cluster based on a query statement is added on the basis of the first embodiment. As shown in fig. 2, the method includes:
S210, acquiring a query statement input by a user.
S220, acquiring a data query result of the data cluster according to the query statement, and displaying the data query result to a user.
Optionally, obtaining the data query result of the data cluster according to the query statement includes: taking the node connected with the user terminal in the data cluster as the master node and the other nodes as slave nodes; acquiring the query statement input by the user through the master node, generating a parallel query plan according to the query statement, and distributing the parallel query plan to the slave nodes; querying the local log directly according to the query statement through the master node to obtain a master node query result; querying the local log through each slave node according to the parallel query plan to obtain a slave node query result, and feeding back each slave node query result to the master node; and taking, through the master node, the master node query result and each slave node query result as the data query result.
Specifically, a node connected with the user terminal in the data cluster can be used as a master node, and the master node can respond to the request externally and generate a scheduling plan corresponding to the request to issue to the slave node. When a user performs data query, a query sentence can be input first. After the master node acquires the query statement, a parallel query plan is generated according to the query statement, the parallel query plan is used for each slave node to simultaneously execute the query task in parallel, and the parallel query plan comprises the query statement. Each server node can query the local log according to the query statement to acquire a query result, the slave node feeds back the query result of the slave node to the master node, and the master node gathers all the query results to generate a data query result and displays the data query result to the user.
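The distribute-and-gather pattern described above can be sketched as follows. This is a simplified illustration: a Python predicate stands in for the SQL query statement, in-memory lists stand in for the local logs, and threads stand in for the slave nodes. The name `run_parallel_query` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_query(master_log, slave_logs, predicate):
    """Master queries its own log directly; slaves evaluate the same query in parallel."""
    def query(log):
        return [row for row in log if predicate(row)]

    master_result = query(master_log)
    # One worker per slave; max(1, ...) keeps the pool valid when there are no slaves.
    with ThreadPoolExecutor(max_workers=max(1, len(slave_logs))) as pool:
        slave_results = list(pool.map(query, slave_logs))  # results keep slave order

    merged = list(master_result)
    for result in slave_results:
        merged.extend(result)  # the master gathers every slave result
    return merged
```

The merged list corresponds to the data query result that the master node displays to the user.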
S230, acquiring an execution statement based on the data query result, and editing data according to the execution statement to generate editing data.
Optionally, acquiring the execution statement based on the data query result, and performing data editing according to the execution statement to generate editing data, includes: acquiring, through the master node, an execution statement input by the user based on the data query result, wherein the execution statement comprises insertion, modification and deletion; determining a target node corresponding to the execution statement through the master node; judging, through the master node, whether the target node is the local node, and if so, directly editing local data according to the execution statement to generate editing data; otherwise, sending the execution statement to each slave node, and editing data based on the execution statement through each slave node to generate editing data.
Specifically, the master node acquires the execution statement input by the user, where the execution statement comprises insertion, modification and deletion, and determines the target node corresponding to the execution statement. The master node then judges whether the target node is the local node. If so, the data is modified directly on the local node; otherwise, an execution plan is generated according to the execution statement and distributed to the slave nodes, and each slave node edits the data based on the execution statement to generate the editing data.
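The local-versus-remote decision can be illustrated with the sketch below. It is a simplification under stated assumptions: an execution statement is modeled as an `(operation, key, value)` tuple, each node's data is a dictionary, and the statement is forwarded only to the target node rather than distributed to every slave node as an execution plan. All names are hypothetical.

```python
def apply_statement(store: dict, statement: tuple) -> None:
    """Apply one insert, modify or delete statement to a node's data."""
    op, key, value = statement
    if op == "insert":
        if key in store:
            raise KeyError(f"{key!r} already exists")
        store[key] = value
    elif op == "modify":
        if key not in store:
            raise KeyError(f"{key!r} does not exist")
        store[key] = value
    elif op == "delete":
        del store[key]
    else:
        raise ValueError(f"unsupported operation: {op}")


def route_statement(master_name, target_name, statement, stores):
    """Edit locally when the master is the target; otherwise forward to the target node."""
    where = "local" if target_name == master_name else "forwarded"
    apply_statement(stores[target_name], statement)
    return where
```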
S240, determining the execution condition according to the editing data through each node, wherein the execution condition comprises execution success and execution failure.
Specifically, after each slave node has executed the editing operation and generated the editing data, it returns a result of successful execution to the master node. The master node then judges whether all slave nodes have returned a result of successful execution within the specified time. If so, it returns a success result to the client and the data modification is complete; otherwise, it returns a failure result to the client and rolls back the modifications already made by the other slave nodes, so as to ensure the consistency of the data.
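The all-or-nothing behavior can be sketched as follows. This is a simplified, single-process illustration: the time limit mentioned above is omitted, a failure is simulated with a flag, and `EditNode` and `execute_on_all` are hypothetical names.

```python
class EditNode:
    """Hypothetical node holding key-value data; `fail` simulates an execution failure."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.data = {}

    def apply(self, key, value):
        if self.fail:
            raise RuntimeError(f"{self.name} failed to execute")
        undo_record = (key, key in self.data, self.data.get(key))
        self.data[key] = value
        return undo_record

    def undo(self, undo_record):
        key, had_key, previous = undo_record
        if had_key:
            self.data[key] = previous  # put the old value back
        else:
            self.data.pop(key, None)   # the key did not exist before the edit


def execute_on_all(nodes, key, value):
    """Apply the edit on every node; roll back and return False if any node fails."""
    applied = []
    for node in nodes:
        try:
            applied.append((node, node.apply(key, value)))
        except RuntimeError:
            for done_node, record in reversed(applied):
                done_node.undo(record)  # restore nodes that had already executed
            return False
    return True  # only in this case would the data backup instruction be generated
```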
S250, when the execution conditions of all the nodes are successful, generating a data backup instruction.
S260, determining data to be backed up of the data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up to generate a backup set.
Optionally, determining the data to be backed up of the data cluster includes: acquiring a global lock of the data cluster, and locking the global lock, wherein the global lock is used for locking a data writing state of the data cluster; issuing a data backup instruction to each slave node through the master node, wherein the data backup instruction comprises a backup time stamp; determining a current key value of the local log through each slave node according to the backup time stamp, and returning the current key value to the master node; acquiring a history key value through a main node, calculating a difference value between the history key value and a current key value, and determining data to be backed up according to the difference value; unlocking the global lock.
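The steps above can be sketched as follows, under a strong simplifying assumption: the "key value" of a local log is taken to be the number of log entries written at or before the backup time stamp, so the difference between the history key value and the current key value identifies the entries added since the last backup. The `Slave` class, `data_to_back_up` and the use of `threading.Lock` as the global lock are illustrative choices, not details given in the source.

```python
import threading


class Slave:
    """Hypothetical slave node with a local log of (timestamp, record) pairs."""

    def __init__(self, name, log):
        self.name = name
        self.log = log  # assumed to be ordered by timestamp

    def current_key(self, backup_ts):
        """Count the log entries written at or before the backup time stamp."""
        return sum(1 for ts, _ in self.log if ts <= backup_ts)


def data_to_back_up(slaves, history_keys, backup_ts, global_lock):
    """Return, per slave, the records added since the previous backup."""
    with global_lock:  # writes are blocked while the keys are compared
        pending = {}
        for s in slaves:
            current = s.current_key(backup_ts)
            previous = history_keys.get(s.name, 0)
            if current - previous > 0:
                pending[s.name] = [rec for _, rec in s.log[previous:current]]
        return pending
```

The `with` block releases the lock on exit, which corresponds to the final "unlocking the global lock" step.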
Optionally, performing data backup according to the data to be backed up to generate a backup set, including: the data to be backed up is sent to a memory through a main node to form a backup set, the backup set is stored through the memory, and a storage result is generated and fed back to the main node; and acquiring a change log of each node through the master node according to the storage result, and generating an archiving log according to the change log.
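As a rough illustration of this step, the sketch below stores the data to be backed up in a dictionary standing in for the memory, derives a storage address from a SHA-256 digest of the payload, and appends an archive log entry that records the backup time, the address and the change logs. The function name, the entry layout and the content-derived address are all assumptions made for the example.

```python
import hashlib
import json


def store_backup_set(storage, archive_log, backup_ts, pending, change_logs):
    """Store one backup set and record it in the archive log."""
    payload = json.dumps(pending, sort_keys=True).encode("utf-8")
    address = hashlib.sha256(payload).hexdigest()  # hypothetical address scheme
    storage[address] = payload  # the storage result is the returned address
    archive_log.append(
        {"time": backup_ts, "address": address, "changes": change_logs}
    )
    return address
```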
S270, acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction.
Optionally, downloading the restored data from the backup set according to the data restoration instruction includes: determining the restoring time corresponding to the data restoring instruction through the master node; determining a target address of a backup set from the archive log according to the restoring time by the master node; transmitting the target address to each slave node through the master node; and downloading the restored data from the backup set according to the target address through the master node and each slave node.
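The lookup of the target address can be sketched as below. It assumes the archive log is a list of `(backup_time, address)` pairs sorted by time and that the backup set to use is the latest one taken at or before the restoring time; the source does not state this selection rule, and the `store://` addresses in the usage note are made up.

```python
import bisect


def find_backup_address(archive_log, restore_time):
    """Return the address of the latest backup set not newer than restore_time."""
    times = [t for t, _ in archive_log]
    i = bisect.bisect_right(times, restore_time)
    if i == 0:
        raise LookupError("no backup set at or before the requested restoring time")
    return archive_log[i - 1][1]
```

For example, with an archive log of `[(10, "store://a"), (20, "store://b")]`, a restoring time of 15 resolves to `"store://a"`, which the master node would then send to each slave node.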
S280, carrying out data recovery of the data cluster according to the restored data.
Optionally, performing data recovery according to the restored data includes: acquiring current data of a data cluster, and comparing the current data with restored data to determine the similarity degree of the data; and when the data similarity is smaller than a preset threshold value, the current data is covered by the restored data so as to restore the data.
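The source does not specify how the data similarity is measured. As one possible reading, the sketch below uses the Jaccard index over key/value pairs and overwrites the current data only when the similarity falls below the preset threshold; `similarity` and `recover` are hypothetical names, and values are assumed to be hashable.

```python
def similarity(current, restored):
    """Jaccard index of the (key, value) pairs of two dictionaries."""
    a, b = set(current.items()), set(restored.items())
    if not a and not b:
        return 1.0  # two empty data sets are treated as identical
    return len(a & b) / len(a | b)


def recover(current, restored, threshold):
    """Overwrite the current data with the restored data when they differ enough."""
    if similarity(current, restored) < threshold:
        return dict(restored)
    return current
```

Skipping the overwrite when the two data sets are already nearly identical avoids rewriting data that does not need to be restored.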
Optionally, the method further comprises: partitioning the data to be backed up or the restored data into data blocks of a specified data volume; establishing newly added fingerprint information corresponding to each data block through a hash algorithm; acquiring historical fingerprint information and judging whether the newly added fingerprint information duplicates the historical fingerprint information; if so, deleting the newly added fingerprint information; otherwise, retaining the newly added fingerprint information.
According to this technical scheme, the data backup instruction is generated only when the execution conditions of all the nodes are successful, which ensures the consistency of the data across all nodes in the data cluster. The data to be backed up is determined from the difference between the key values before and after backup, which ensures the accuracy of the data to be backed up. The data is deduplicated in both the backup and recovery stages, which improves data processing efficiency and reduces system overhead.
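The fingerprint comparison can be sketched as follows. SHA-256 is used as the hash algorithm and fixed-size blocks as the partitioning rule; both are assumptions, as is the decision to skip a block whose fingerprint is already known. The names `chunk` and `deduplicate` are hypothetical.

```python
import hashlib


def chunk(data: bytes, size: int):
    """Split data into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, size: int, known: set):
    """Return the blocks whose fingerprint is new, recording each new fingerprint."""
    fresh = []
    for block in chunk(data, size):
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint in known:
            continue  # duplicate: the newly computed fingerprint is discarded
        known.add(fingerprint)  # new: the fingerprint is retained
        fresh.append(block)
    return fresh
```

Calling `deduplicate` repeatedly with the same `known` set plays the role of comparing against historical fingerprint information across backups.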
Example III
Fig. 3 is a schematic structural diagram of a data recovery device based on a data cluster according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes: the data backup instruction generating module 310 is configured to obtain a query statement input by a user, determine an execution condition of each node in the data cluster based on the query statement, and generate a data backup instruction when the execution condition of each node is successful;
the backup set generating module 320 is configured to determine data to be backed up of the data cluster based on the data backup instruction, and perform data backup according to the data to be backed up to generate a backup set;
the restored data downloading module 330 is configured to obtain a data restoration instruction, and download restored data from the backup set according to the data restoration instruction;
the data recovery module 340 is configured to perform data recovery of the data cluster according to the restored data.
Optionally, the data backup instruction generating module 310 specifically includes: a query result acquisition unit, configured to: acquiring a data query result of the data cluster according to the query statement, and displaying the data query result to a user; an edit data generation unit configured to: acquiring an execution statement based on the data query result, and editing data according to the execution statement to generate editing data; an execution condition determining unit configured to: and determining the execution condition according to the editing data by each node, wherein the execution condition comprises execution success and execution failure.
Optionally, the query result obtaining unit is specifically configured to: taking nodes connected with the user terminals in the data cluster as master nodes and other nodes as slave nodes; acquiring a query statement input by a user through a master node, generating a parallel query plan according to the query statement, and distributing the parallel query plan to other slave nodes; directly inquiring a local log according to the inquiry statement through the main node to obtain a main node inquiry result; inquiring a local log through each slave node according to the parallel inquiry plan to obtain a slave node inquiry result, and feeding back each slave node inquiry result to the master node; and taking the master node query result and each slave node query result as data query results through the master node.
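A minimal sketch of the parallel query is given below. Local logs are modelled as plain lists, the query statement as a predicate function, and the parallel query plan as a scan submitted to a thread pool; `parallel_query` and these modelling choices are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_query(master_log, slave_logs, predicate):
    """Query the master's log directly and the slaves' logs in parallel."""

    def scan(log):
        return [row for row in log if predicate(row)]

    with ThreadPoolExecutor(max_workers=max(1, len(slave_logs))) as pool:
        slave_futures = [pool.submit(scan, log) for log in slave_logs]
        master_result = scan(master_log)  # the master queries its own log
        slave_results = [f.result() for f in slave_futures]
    # The master combines its own result with each slave's result.
    return master_result + [row for part in slave_results for row in part]
```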
Optionally, the edit data generating unit is specifically configured to: acquiring an execution statement input by a user based on a data query result through a main node, wherein the execution statement comprises insertion, modification and deletion; determining a target node corresponding to the execution statement through the master node; judging whether the target node is a local node or not through the main node, if so, directly editing local data according to the execution statement to generate editing data; otherwise, the execution statement is sent to each slave node, and the data is edited based on the execution statement through each slave node to generate editing data.
Optionally, the backup set generating module 320 specifically includes: a data to be backed up determining unit, configured to: acquiring a global lock of the data cluster, and locking the global lock, wherein the global lock is used for locking a data writing state of the data cluster; issuing a data backup instruction to each slave node through the master node, wherein the data backup instruction comprises a backup time stamp; determining a current key value of the local log through each slave node according to the backup time stamp, and returning the current key value to the master node; acquiring a history key value through a main node, calculating a difference value between the history key value and a current key value, and determining data to be backed up according to the difference value; unlocking the global lock.
Optionally, the backup set generating module 320 specifically includes: a data backup unit for: the data to be backed up is sent to a memory through a main node to form a backup set, the backup set is stored through the memory, and a storage result is generated and fed back to the main node; and acquiring a change log of each node through the master node according to the storage result, and generating an archiving log according to the change log.
Optionally, the restore data download module 330 is specifically configured to: determining the restoring time corresponding to the data restoring instruction through the master node; determining a target address of a backup set from the archive log according to the restoring time by the master node; transmitting the target address to each slave node through the master node; and downloading the restored data from the backup set according to the target address through the master node and each slave node.
Optionally, the data recovery module 340 is specifically configured to: acquiring current data of a data cluster, and comparing the current data with restored data to determine the similarity degree of the data; and when the data similarity is smaller than a preset threshold value, the current data is covered by the restored data so as to restore the data.
Optionally, the apparatus further comprises: a data backup module, configured to: partition the data to be backed up or the restored data into data blocks of a specified data volume; establish newly added fingerprint information corresponding to each data block through a hash algorithm; acquire historical fingerprint information and judge whether the newly added fingerprint information duplicates the historical fingerprint information; if so, delete the newly added fingerprint information; otherwise, retain the newly added fingerprint information.
According to this technical scheme, the data backup instruction is generated only when the execution conditions of all the nodes are successful, which ensures the consistency of the data across all nodes in the data cluster. The data to be backed up is determined from the difference between the key values before and after backup, which ensures the accuracy of the data to be backed up. The data is deduplicated in both the backup and recovery stages, which improves data processing efficiency and reduces system overhead.
The data recovery device based on the data cluster provided by the embodiment of the invention can execute the data recovery method based on the data cluster provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 4 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data recovery method based on a data cluster. Namely: acquiring a query statement input by a user, determining the execution condition of each node in the data cluster based on the query statement, and generating a data backup instruction when the execution condition of each node is successful; determining data to be backed up of the data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up to generate a backup set; acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction; and carrying out data recovery of the data cluster according to the restored data.
In some embodiments, a data cluster-based data recovery method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more of the steps of a data cluster-based data recovery method described above may be performed when a computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform a data cluster-based data recovery method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include a user terminal and a server. The user terminal and the server are typically remote from each other and typically interact through a communication network. The relationship of user terminals and servers arises by virtue of computer programs running on the respective computers and having a user terminal-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (12)

1. A data recovery method based on a data cluster, comprising:
acquiring a query statement input by a user, determining the execution condition of each node in a data cluster based on the query statement, and generating a data backup instruction when the execution condition of each node is successful;
determining data to be backed up of a data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up to generate a backup set;
acquiring a data recovery instruction, and downloading restored data from the backup set according to the data recovery instruction;
and carrying out data recovery of the data cluster according to the restored data.
2. The method of claim 1, wherein determining execution of each node in the data cluster based on the query statement comprises:
acquiring a data query result of a data cluster according to the query statement, and displaying the data query result to a user;
acquiring an execution statement based on the data query result, and editing data according to the execution statement to generate editing data;
and determining the execution condition according to the editing data through each node, wherein the execution condition comprises execution success and execution failure.
3. The method according to claim 2, wherein the obtaining the data query result of the data cluster according to the query statement comprises:
taking nodes connected with the user terminals in the data cluster as master nodes and other nodes as slave nodes;
acquiring a query statement input by a user through the master node, generating a parallel query plan according to the query statement, and distributing the parallel query plan to other slave nodes;
directly inquiring a local log according to an inquiry statement through the master node to obtain a master node inquiry result;
inquiring a local log through each slave node according to the parallel inquiry plan to obtain a slave node inquiry result, and feeding back each slave node inquiry result to the master node;
and taking the master node query result and each slave node query result as the data query result through the master node.
4. The method of claim 3, wherein the obtaining the execution statement based on the data query result and performing data editing according to the execution statement to generate the edited data comprises:
acquiring an execution statement input by a user based on the data query result through the main node, wherein the execution statement comprises insertion, modification and deletion;
determining a target node corresponding to the execution statement through the master node;
judging whether the target node is a local node or not through the main node, if so, directly editing local data according to the execution statement to generate the editing data;
otherwise, the execution statement is sent to each slave node, and the data editing is carried out on the basis of the execution statement through each slave node to generate the editing data.
5. The method of claim 3, wherein the determining data to be backed up of the data cluster comprises:
acquiring a global lock of a data cluster, and locking the global lock, wherein the global lock is used for locking a data writing state of the data cluster;
issuing the data backup instruction to each slave node through the master node, wherein the data backup instruction comprises a backup time stamp;
determining a current key value of a local log through each slave node according to the backup time stamp, and returning the current key value to the master node;
acquiring a history key value through the master node, calculating a difference value between the history key value and the current key value, and determining the data to be backed up according to the difference value;
unlocking the global lock.
6. The method of claim 5, wherein the performing data backup according to the data to be backed up to generate a backup set comprises:
the data to be backed up is sent to a memory through a main node to form the backup set, the backup set is stored through the memory, and a storage result is generated and fed back to the main node;
and acquiring a change log of each node through the master node according to the storage result, and generating an archiving log according to the change log.
7. The method of claim 6, wherein the downloading restored data from the backup set according to the data restoration instruction comprises:
determining the restoring time corresponding to the data restoring instruction through the master node;
determining a target address of a backup set from the archive log according to the restoration time by the master node;
transmitting the target address to each slave node through the master node;
and downloading restored data from the backup set according to the target address through the master node and each slave node.
8. The method of claim 1, wherein the recovering data from the restored data comprises:
acquiring current data of a data cluster, and comparing the current data with the restored data to determine the similarity degree of the data;
and when the data similarity is smaller than a preset threshold, covering the current data by the restored data so as to restore the data.
9. The method of claim 7, wherein the method further comprises:
partitioning the data to be backed up or the restored data according to the designated data volume to generate data blocks;
establishing newly added fingerprint information corresponding to each data block through a hash algorithm;
acquiring historical fingerprint information, judging whether the newly added fingerprint information duplicates the historical fingerprint information, and if yes, deleting the newly added fingerprint information;
otherwise, the newly added fingerprint information is reserved.
10. A data recovery apparatus based on a data cluster, comprising:
the data backup instruction generation module is used for acquiring query sentences input by a user, determining the execution conditions of all nodes in the data cluster based on the query sentences, and generating a data backup instruction when the execution conditions of all the nodes are successful in execution;
the backup set generation module is used for determining data to be backed up of the data cluster based on the data backup instruction, and carrying out data backup according to the data to be backed up so as to generate a backup set;
the restored data downloading module is used for acquiring a data restoration instruction and downloading restored data from the backup set according to the data restoration instruction;
and the data recovery module is used for carrying out data recovery of the data cluster according to the restored data.
11. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
12. A computer storage medium storing computer instructions for causing a processor to perform the method of any one of claims 1-9 when executed.
CN202311597302.7A 2023-11-24 2023-11-24 Data recovery method, device, equipment and medium based on data cluster Pending CN117520055A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311597302.7A CN117520055A (en) 2023-11-24 2023-11-24 Data recovery method, device, equipment and medium based on data cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311597302.7A CN117520055A (en) 2023-11-24 2023-11-24 Data recovery method, device, equipment and medium based on data cluster

Publications (1)

Publication Number Publication Date
CN117520055A true CN117520055A (en) 2024-02-06

Family

ID=89747517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311597302.7A Pending CN117520055A (en) 2023-11-24 2023-11-24 Data recovery method, device, equipment and medium based on data cluster

Country Status (1)

Country Link
CN (1) CN117520055A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination