CN112181722A - Data backup and recovery method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN112181722A
Authority
CN
China
Prior art keywords
backup
data
storage cluster
druid
recovery
Legal status
Pending
Application number
CN202010973108.4A
Other languages
Chinese (zh)
Inventor
宋文豪
Current Assignee
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN202010973108.4A
Publication of CN112181722A
Legal status: Pending



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application discloses a data backup and recovery method, device, equipment and computer-readable storage medium. The method comprises: acquiring a data change instruction and executing it in the Druid storage cluster; updating backup data in a backup storage cluster according to the data change condition of the Druid storage cluster, the backup data comprising backup metadata and backup data blocks; and, if the Druid storage cluster fails, performing data recovery on the Druid storage cluster using the backup data. Whenever data in the Druid storage cluster changes, the method updates the corresponding backup data in the backup storage cluster according to the change. Because the backup of the whole Druid storage cluster includes the metadata information, lost metadata information can be obtained from the backup data, which makes data recovery possible and improves the reliability of disaster recovery backup.

Description

Data backup and recovery method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of data disaster recovery technologies, and in particular, to a data backup and recovery method, a data backup and recovery device, a data backup and recovery apparatus, and a computer-readable storage medium.
Background
The importance and vulnerability of information resources make backup and restore capability an important criterion for measuring the availability of big data components in a production environment. Data reliability is the lifeline of a business system, and data must be backed up to ensure its high reliability. Apache Druid is an analytical data platform that combines characteristics of a time-series database, a data warehouse and a full-text retrieval system. Druid data is mainly divided into two parts: segment data blocks and metadata. The data blocks are generally stored in storage systems such as HDFS and S3, the metadata is generally stored in MySQL or a similar database, and the storage systems and the databases belong to the same storage cluster. The related art typically performs disaster recovery backup within a single storage cluster. For Druid data, although HDFS and similar systems have relatively mature disaster recovery mechanisms, Druid's data backup and recovery logic is tightly coupled to Druid's design framework, and consistency between the metadata information and the segment data must be ensured, which greatly increases the complexity of backing up Druid data. Moreover, because Druid's metadata information is stored in MySQL or PostgreSQL, lost metadata information cannot be retrieved, which seriously affects the reliability of disaster recovery backup. The related technology therefore has poor reliability and cannot be applied directly to Druid data backup and recovery, which results in poor usability.
Therefore, how to solve the problem of poor reliability and usability of the related art is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data backup and recovery method, a data backup and recovery device, and a computer-readable storage medium, which improve reliability and usability of disaster recovery backup.
In order to solve the above technical problem, the present application provides a data backup and recovery method, including:
acquiring a data change instruction, and executing the data change instruction in the Druid storage cluster;
updating backup data in a backup storage cluster according to the data change condition of the Druid storage cluster; wherein the backup data comprises backup metadata and backup data blocks;
and if the Druid storage cluster fails, performing data recovery on the Druid storage cluster by using the backup data.
Optionally, the updating the backup data in the backup storage cluster according to the data change condition of the Druid storage cluster includes:
determining backup information according to a current backup mechanism;
and updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism.
Optionally, the determining backup information according to the current backup mechanism includes:
if the current backup mechanism is hot backup, determining the data change instruction as the backup information;
correspondingly, the updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism includes:
and executing, on the backup data in real time, the update corresponding to the data change instruction.
Optionally, the determining backup information according to the current backup mechanism includes:
if the current backup mechanism is cold backup, determining the backup log of the Druid storage cluster as the backup information;
correspondingly, the updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism includes:
according to a preset period, generating cold backup information by using the backup log;
and executing the update corresponding to the cold backup information on the backup data.
Optionally, the method further comprises:
after the data change instruction is executed in the Druid storage cluster, updating the backup log according to the data change instruction.
Optionally, the performing data recovery on the Druid storage cluster by using the backup data includes:
acquiring the backup data by using a Distcp technology;
analyzing the backup data to obtain backup metadata and backup data blocks;
performing data recovery on the MySQL database in the Druid storage cluster by using the backup metadata;
and performing data recovery on the distributed file system in the Druid storage cluster by using the backup data blocks.
Optionally, the obtaining a data change instruction includes:
acquiring an instruction requiring the Druid storage cluster to respond;
judging whether the instruction can cause data change or not;
and if so, determining the instruction as the data change instruction.
The present application further provides a data backup and recovery device, including:
the execution module is used for acquiring a data change instruction and executing the data change instruction in the Druid storage cluster;
the backup updating module is used for updating backup data in the backup storage cluster according to the data change condition of the Druid storage cluster; wherein the backup data comprises backup metadata and backup data blocks;
and the data recovery module is used for performing data recovery on the Druid storage cluster by using the backup data if the Druid storage cluster fails.
The application also provides data backup and recovery equipment, comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the data backup and recovery method.
The present application also provides a computer readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the data backup and recovery method described above.
In the data backup and recovery method provided by the present application, a data change instruction is obtained and executed in the Druid storage cluster; the backup data in the backup storage cluster, which comprises backup metadata and backup data blocks, is updated according to the data change condition of the Druid storage cluster; and if the Druid storage cluster fails, data recovery is performed on the Druid storage cluster using the backup data.
As can be seen, the method uses the backup storage cluster to back up the Druid storage cluster, i.e., it performs cross-domain backup. Once a data change instruction has been acquired and executed in the Druid storage cluster, the data in the Druid storage cluster has changed; the change may concern data blocks or metadata information. The backup data in the backup storage cluster, which comprises backup metadata and backup data blocks, is therefore updated according to the data change condition, so that updating the backup data completes a backup of the whole Druid storage cluster, covering both the data blocks and the metadata information. After a failure, the data in the source storage cluster can be restored using the backup data. In other words, whenever data in the Druid storage cluster changes, the corresponding backup data is updated, so that all data in the Druid storage cluster is backed up in the backup storage cluster. Because this backup includes the metadata information, data can be recovered even when the metadata information is lost, which improves the reliability and usability of disaster recovery backup and solves the problem of poor reliability and usability in the related art.
In addition, the present application also provides a data backup and recovery device, data backup and recovery equipment and a computer-readable storage medium, which have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a data backup and recovery method according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of a data recovery system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a data backup process according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a data recovery process according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data backup and recovery device according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of data backup and recovery equipment according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a data backup and recovery method according to an embodiment of the present disclosure. The method comprises the following steps:
S101: Acquire a data change instruction and execute it in the Druid storage cluster.
The Druid storage cluster is a storage cluster for storing Druid data and comprises a storage system and a database. The storage system may be, for example, HDFS (Hadoop Distributed File System) or S3 (Simple Storage Service), and the database may be, for example, a MySQL or PostgreSQL database. A data change instruction is an instruction that changes data in the Druid storage cluster; it may specifically be a data ingestion instruction, a data update instruction, a data merge instruction, a data delete instruction, or the like. A data change instruction may change the metadata information in the Druid data, or a segment data block (referred to simply as a data block) in the Druid data. After the data change instruction is obtained, it is executed in the Druid storage cluster to operate on the data there.
Specifically, since not all of the instructions that the Druid storage cluster needs to process are instructions that cause data change, in one possible embodiment, the process of obtaining the data change instruction may include:
Step 11: Acquire an instruction that requires a response from the Druid storage cluster.
Step 12: Determine whether the instruction causes a data change.
Step 13: If so, determine the instruction to be a data change instruction.
In this embodiment, every instruction that requires a response from the Druid storage cluster is screened, that is, after an instruction is acquired it is determined whether it will cause a data change. Specifically, it may be determined whether the instruction operates on data content; if so, the instruction is determined to be a data change instruction. For example, a data delete instruction or a data update instruction deletes or updates data content and is therefore a data change instruction, whereas a data read instruction does not change any data content and is therefore not a data change instruction, even though the Druid storage cluster must respond to it. Screening the instructions in this way avoids spending computing resources on a synchronous backup operation after every acquired instruction, and enables intelligent data backup for Druid. This embodiment does not limit how instructions that are not data change instructions are executed; for example, such an instruction may be executed directly once it is detected not to be a data change instruction, with reference to the related art.
In another embodiment, because an instruction may fail to execute normally, the instruction status is also checked when determining whether an instruction is a data change instruction: after it is determined that the instruction will operate on data content, its status is used to determine whether it executed successfully. If the instruction executed successfully, it has actually operated on the data content and is determined to be a data change instruction; if it failed, it has not operated on the data content and is not determined to be a data change instruction. If the instruction is still executing, the determination may be made after execution finishes. Checking the instruction status in this way further supports intelligent data backup for Druid.
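The screening in steps 11 to 13, combined with the status check just described, can be pictured as a small classifier. This is a minimal sketch under stated assumptions: the instruction kinds (`ingest`, `update`, `merge`, `delete`, `read`), the `Status` values and the `Instruction` type are hypothetical names chosen for illustration, not part of Druid's API or of the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical instruction kinds that modify data; anything else
# (for example "read") is treated as non-modifying.
DATA_CHANGING_KINDS = {"ingest", "update", "merge", "delete"}


class Status(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Instruction:
    kind: str
    status: Status


def is_data_change(instr: Instruction) -> Optional[bool]:
    """Return True if the instruction changed data, False if it did not,
    or None if it is still running and the decision must be deferred."""
    if instr.kind not in DATA_CHANGING_KINDS:
        return False  # e.g. a read: executed normally, no backup triggered
    if instr.status is Status.RUNNING:
        return None  # decide once execution has finished
    # Only a successful instruction actually operated on the data content.
    return instr.status is Status.SUCCESS
```

Returning `None` for a running instruction keeps the "decide after execution" case explicit instead of guessing either way.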
S102: Update the backup data in the backup storage cluster according to the data change condition of the Druid storage cluster.
After the data of the Druid storage cluster changes, the backup data in the backup storage cluster is updated according to the data change condition. The specific form of the backup storage cluster is not limited: for example, it may include both a storage system and a database, or only a storage system such as HDFS, in which case the metadata information from the database of the Druid storage cluster is also stored in the storage system. The data change condition may be represented in different ways, for example by the data change instruction itself or by records in a backup log corresponding to the Druid cluster. If the backup storage cluster already holds backup data before the current update, the update replaces the old backup data with new backup data; if it does not, the update adds the corresponding backup data to the backup storage cluster. In this embodiment, the backup data covers both the metadata information and the data blocks, that is, it comprises backup metadata and backup data blocks. This embodiment does not limit how the backup metadata is stored: when the backup storage cluster has a database, the backup metadata may be stored in that database; when it has none, the backup metadata may be generated in a preset format and stored in the storage system. Because the data in the Druid storage cluster is backed up across domains, i.e., into a separate backup storage cluster, the backup process is more reliable.
Further, the backup storage cluster and the Druid storage cluster may be located in different data centers. This avoids the situation, which arises when backing up between clusters of the same data center, in which data cannot be recovered because the whole data center is down.
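The update rule described above (replace existing backup data, otherwise add it) can be illustrated for a backup storage cluster that has only a storage system. This is an in-memory sketch, assuming that the "preset format" for backup metadata is JSON and that paths of the form `segments/<id>` and `metadata/<table>.json` are used; both are hypothetical choices, since the patent specifies neither. The key `druid_segments` in the test is the name of Druid's default segments metadata table, used here only as an example.

```python
import json
from typing import Any, Dict, List


class FileOnlyBackupStore:
    """In-memory stand-in for a backup storage cluster without a database:
    both segment data blocks and backup metadata are stored as files."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def put_block(self, segment_id: str, payload: bytes) -> str:
        """Store a backup data block; report whether it replaced old backup data."""
        path = f"segments/{segment_id}"
        action = "replaced" if path in self.files else "added"
        self.files[path] = payload
        return action

    def put_metadata(self, table: str, rows: List[Dict[str, Any]]) -> str:
        """Serialize metadata rows in a fixed format (JSON here) and store them as a file."""
        path = f"metadata/{table}.json"
        action = "replaced" if path in self.files else "added"
        self.files[path] = json.dumps(rows, sort_keys=True).encode("utf-8")
        return action

    def get_metadata(self, table: str) -> List[Dict[str, Any]]:
        """Read back and deserialize the stored metadata rows for one table."""
        return json.loads(self.files[f"metadata/{table}.json"].decode("utf-8"))
```

When the backup cluster does have a database, the `put_metadata` path would instead write rows into that database; the block path stays the same.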
To back up data flexibly, several different backup mechanisms may be employed, and the backup data is then updated according to the corresponding backup mechanism. Specifically, step S102 may include:
Step 21: Determine backup information according to the current backup mechanism.
Step 22: Update the backup data in the backup storage cluster using the backup information, based on the current backup mechanism.
The current backup mechanism is the currently selected backup mechanism, a plurality of candidate backup mechanisms are provided, and the current backup mechanism can be any one of the candidate backup mechanisms. After the current backup mechanism is determined, corresponding backup information can be determined according to the current backup mechanism, and the backup information is used for updating backup data in the backup storage cluster. When different backup mechanisms are adopted, the corresponding backup information is also different, and the embodiment does not limit the specific content and form of the backup information, and can be set according to the requirements of the current backup mechanism.
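Step 21 amounts to a dispatch on the current backup mechanism. The sketch below uses the two mechanisms this embodiment goes on to describe (hot and cold); representing instructions and log records as plain dictionaries is an assumption made only for illustration.

```python
from enum import Enum
from typing import Any, Dict, List, Union


class BackupMechanism(Enum):
    HOT = "hot"
    COLD = "cold"


def determine_backup_info(
    mechanism: BackupMechanism,
    change_instruction: Dict[str, Any],
    backup_log: List[Dict[str, Any]],
) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
    """Step 21: choose what drives the backup update under the current mechanism."""
    if mechanism is BackupMechanism.HOT:
        # Hot backup: the data change instruction itself is the backup information.
        return change_instruction
    if mechanism is BackupMechanism.COLD:
        # Cold backup: the accumulated backup log is the backup information
        # (a copy, so later log appends do not alter what this run processes).
        return list(backup_log)
    raise ValueError(f"unsupported backup mechanism: {mechanism}")
```

Adding another candidate mechanism would mean one more enum member and one more branch, which matches the text's point that the form of the backup information depends on the mechanism.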
In a possible implementation, a hot backup mechanism may be used so that data in the Druid storage cluster is backed up promptly and can still be restored if the Druid storage cluster fails. Specifically, step 21 may include:
Step 31: If the current backup mechanism is hot backup, determine the data change instruction as the backup information.
Accordingly, step 22 may include:
Step 32: Execute, on the backup data in real time, the update corresponding to the data change instruction.
Hot backup is real-time backup. If the current backup mechanism is hot backup, the data change instruction can be used directly as the backup information, since it reflects the data change condition of the Druid storage cluster. During backup, the update corresponding to the data change instruction is performed in the backup storage cluster, so that the backup data undergoes the same change as the data in the Druid storage cluster. Note that the backup storage cluster does not execute the data change instruction itself; it applies to its own data the change that the instruction caused in the Druid storage cluster. Specifically, when the data change instruction changes a data block, the corresponding data block in the backup data is changed, for example added, updated or deleted; when the data change instruction changes metadata information, the backup metadata is changed correspondingly, for example by adding, updating or deleting entries. Executing these updates in the backup storage cluster in real time keeps the backup data up to date.
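The key point of hot backup, that the backup side applies the effect of a change rather than re-running the instruction, can be sketched as follows. The change record shape (`target`, `op`, `key`, `value`) and the split of backup data into `blocks` and `metadata` dictionaries are hypothetical simplifications, not structures defined by the patent or by Druid.

```python
from typing import Any, Dict


def apply_change(backup: Dict[str, Dict[str, Any]], change: Dict[str, Any]) -> None:
    """Hot backup: mirror the effect of one data change onto the backup copy.

    The backup side does not re-execute the instruction; it applies the
    resulting add/update/delete to the affected part of the backup data.
    """
    part_name = "blocks" if change["target"] == "block" else "metadata"
    part = backup[part_name]
    op = change["op"]
    if op in ("add", "update"):
        part[change["key"]] = change["value"]
    elif op == "delete":
        part.pop(change["key"], None)  # deleting an absent entry is a no-op
    else:
        raise ValueError(f"unknown operation: {op}")
```

Calling this once per successful data change instruction, as it is observed, is what keeps the backup copy in step with the source in real time.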
Note that, in an embodiment where the backup storage cluster does not include a database, an instruction that modifies metadata information cannot be mirrored by directly updating backup metadata in a database of the backup storage cluster. In this case, the backup metadata may be updated by another suitable method according to the data change instruction; alternatively, when it is detected that the backup metadata needs updating, new backup metadata may be obtained directly from the database of the Druid storage cluster to replace the old backup metadata, thereby completing the update of the backup metadata and hence of the backup data.
In another possible implementation manner, in order to reduce the influence of data backup on normal service performance, a cold backup manner may be used to update backup data. Specifically, step 21 may include:
Step 41: If the current backup mechanism is cold backup, determine the backup log of the Druid storage cluster as the backup information.
Accordingly, step 22 may include:
Step 42: Generate cold backup information from the backup log according to a preset period.
Step 43: Execute, on the backup data, the update corresponding to the cold backup information.
Note that cold backup does not update the backup data in real time but according to a preset period. The backup log records the data change condition of the Druid storage cluster and can therefore serve as the backup information. Under the cold backup mechanism, cold backup information is generated from the backup log once per preset period. The cold backup information is not a sequence of data change instructions but a summary of the net data changes in the Druid storage cluster during the period. For example, if the Druid storage cluster ingests (i.e., writes) a piece of data A and later deletes it within the same period, neither the write nor the deletion appears in the cold backup information, and data A is treated as never having existed. The cold backup information thus represents the net data change of the Druid storage cluster over a period of time. Generating it and executing the corresponding update in the backup storage cluster updates the backup data efficiently without affecting normal service performance.
Further, when the backup mechanism of cold backup is adopted, after the data change instruction is executed in the Druid storage cluster, the backup log needs to be updated according to the data change instruction, so as to record each data change in the Druid storage cluster.
In an embodiment, the backup log may record data change instructions regardless of whether the cold backup mechanism is adopted; when the cold backup mechanism is adopted, the backup log may also record the cold backup information, so that a backup task can be canceled or rolled back.
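The summarization step of cold backup (step 42) can be sketched as collapsing one period's backup log into net changes, so that data written and deleted inside the period, like data A in the example above, disappears entirely. This is an illustrative sketch: the `(key, op, value)` record shape and the `existed_before` set (keys present in the backup at the start of the period) are assumptions, not the patent's log format.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical backup-log record: (key, op, value) with op in {"add", "update", "delete"}.
LogRecord = Tuple[str, str, Optional[bytes]]


def summarize_period(
    log: List[LogRecord], existed_before: Set[str]
) -> Dict[str, Tuple[str, Optional[bytes]]]:
    """Collapse one period's backup log into net changes (cold backup information)."""
    final: Dict[str, Optional[bytes]] = {}
    for key, op, value in log:  # records are assumed to be in execution order
        final[key] = None if op == "delete" else value
    summary: Dict[str, Tuple[str, Optional[bytes]]] = {}
    for key, value in final.items():
        if value is not None:
            summary[key] = ("put", value)
        elif key in existed_before:
            summary[key] = ("delete", None)
        # else: created and deleted within the period, so treated as never having existed
    return summary
```

Only the last state of each key matters, which is why the summary is usually much smaller than the raw log and cheaper to apply to the backup cluster.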
S103: If the Druid storage cluster fails, perform data recovery on the Druid storage cluster using the backup data.
A failure refers to a situation in which data in the Druid storage cluster is lost, in conflict, or the like. If the Druid storage cluster fails, its data can be restored using the backup data in the backup storage cluster. This embodiment does not limit how a failure of the Druid storage cluster is determined; for example, the Druid storage cluster may be monitored for conflicting or lost data, and a failure is determined when either is found. The data recovery may be full recovery, i.e., recovering all data in the Druid storage cluster, or partial recovery, e.g., recovering only the metadata information or only the data blocks.
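One possible monitoring check, offered only as a hypothetical example since the patent leaves the detection method open, is to compare the segment IDs recorded in the metadata store with the segments actually present in deep storage. A mismatch in either direction points to lost data blocks or lost metadata; a production monitor would need more nuance (for instance, tolerating segments that are legitimately in transit).

```python
from typing import Dict, Set


def check_consistency(metadata_segments: Set[str], stored_segments: Set[str]) -> Dict[str, Set[str]]:
    """Compare segment IDs known to the metadata store with those present in storage."""
    return {
        # Referenced by metadata but missing from storage: data blocks were lost.
        "missing_blocks": metadata_segments - stored_segments,
        # Present in storage but unknown to metadata: metadata may have been lost.
        "unreferenced_blocks": stored_segments - metadata_segments,
    }


def has_failed(report: Dict[str, Set[str]]) -> bool:
    """Treat any inconsistency as a failure that should trigger recovery."""
    return any(report.values())
```

The report also indicates which kind of partial recovery would apply: missing blocks call for restoring data blocks, unreferenced blocks for restoring metadata.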
Specifically, the step S103 may include:
Step 51: Acquire the backup data using the Distcp technology.
Step 52: Parse the backup data to obtain backup metadata and backup data blocks.
Step 53: Perform data recovery on the MySQL database in the Druid storage cluster using the backup metadata.
Step 54: Perform data recovery on the distributed file system in the Druid storage cluster using the backup data blocks.
Distcp is a technology for copying large amounts of data between clusters or within a cluster, and enables cross-cluster and cross-region network transmission of data. During data recovery, the backup data is acquired using Distcp and then parsed. Parsing may include decompression and may also include decryption; correspondingly, the backup data may be compressed and encrypted when it is generated and updated, which improves transmission speed and security. Parsing yields the two parts of the Druid data, the backup metadata and the backup data blocks, which are used to recover the MySQL database and the distributed file system in the Druid storage cluster respectively, completing the data recovery of the Druid storage cluster. Instructions that are not executed during data recovery may be logged and executed after the recovery.
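Steps 51 and 52 imply a backup package that bundles metadata and data blocks and can be compressed for transfer. The sketch below assumes a hypothetical package layout (gzip-compressed JSON with `metadata` and base64-encoded `blocks` fields) and omits encryption, which the text mentions as optional; it shows only how parsing splits a package back into the two parts used in steps 53 and 54.

```python
import base64
import gzip
import json
from typing import Any, Dict, List, Tuple


def pack_backup(metadata: List[Dict[str, Any]], blocks: Dict[str, bytes]) -> bytes:
    """Bundle backup metadata and data blocks into one compressed package."""
    doc = {
        "metadata": metadata,
        # JSON cannot hold raw bytes, so blocks are base64-encoded.
        "blocks": {sid: base64.b64encode(data).decode("ascii") for sid, data in blocks.items()},
    }
    return gzip.compress(json.dumps(doc).encode("utf-8"))


def parse_backup(package: bytes) -> Tuple[List[Dict[str, Any]], Dict[str, bytes]]:
    """Step 52: decompress a package and split it into backup metadata and data blocks."""
    doc = json.loads(gzip.decompress(package).decode("utf-8"))
    blocks = {sid: base64.b64decode(enc) for sid, enc in doc["blocks"].items()}
    return doc["metadata"], blocks
```

Real segment files are large binaries, so a practical layout would more likely keep blocks as separate files and compress them individually; the single-package form just keeps the example short.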
Note that when recovering from backup data produced by the cold backup mechanism, the data changes made between the time point of the backup data and the current time point must then be recovered using the backup log; otherwise the data may become inconsistent.
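The two-phase recovery for cold backups (restore the snapshot, then replay newer log records) can be sketched as below. The timestamped record shape `(timestamp, key, op, value)` is a hypothetical stand-in for the backup log's "time point" field, and integer timestamps are assumed for simplicity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical backup-log record: (timestamp, key, op, value).
TimedRecord = Tuple[int, str, str, Optional[bytes]]


def recover(snapshot: Dict[str, bytes], snapshot_time: int, log: List[TimedRecord]) -> Dict[str, bytes]:
    """Restore a cold-backup snapshot, then replay log records newer than it."""
    state = dict(snapshot)
    for ts, key, op, value in sorted(log, key=lambda rec: rec[0]):
        if ts <= snapshot_time:
            continue  # already reflected in the cold backup
        if op == "delete":
            state.pop(key, None)
        else:
            state[key] = value  # add/update records carry the new value
    return state
```

Skipping records at or before the snapshot time avoids applying a change twice, which is exactly the inconsistency the text warns about.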
By applying the data backup and recovery method provided by the embodiments of the present application, the Druid storage cluster is backed up using the backup storage cluster, i.e., cross-domain backup is performed. Once a data change instruction has been acquired and executed in the Druid storage cluster, the data in the Druid storage cluster has changed; the change may concern data blocks or metadata information. The backup data in the backup storage cluster is therefore updated according to the data change condition, completing a backup of the whole Druid storage cluster, covering both the data blocks and the metadata information. After a failure, the data in the source storage cluster can be restored using the backup data. In other words, whenever data in the Druid storage cluster changes, the corresponding backup data is updated, so that all data in the Druid storage cluster is backed up in the backup storage cluster. Because this backup includes the metadata information, data can be recovered even when the metadata information is lost, which improves the reliability and usability of disaster recovery backup and solves the problem of poor reliability and usability in the related art.
Based on the above embodiments, this embodiment describes a specific data backup and recovery process. Referring to fig. 2, fig. 2 is a schematic diagram of a data recovery system according to an embodiment of the present disclosure. The source Druid cluster is the Druid storage cluster, and the source data center HDFS is its storage system; the backup data center HDFS is the backup storage cluster. The data supervision module sends a recovery instruction to the recovery execution module after detecting a fault. The task supervision module obtains, in real time, the type and execution state of each task executed by the Druid cluster. For a data ingestion task, a segment-level backup (that is, a backup of each newly generated segment) is performed during task execution according to the metadata information in MySQL. For a data update/merge task, the backup operation is performed according to the segment changes after the task succeeds. For a kill (delete) task, after the task succeeds, the related segments can be deleted directly in the backup cluster according to the change of the metadata information in MySQL. When a task fails, no backup operation is performed. The task supervision module also maintains a backup system log, that is, the backup log, which may contain task information (type, state, task instruction content and time point) and backup operation instructions. For cold backup, that is, when the cold backup mechanism is used, the information in the backup log is collected periodically (for example, daily), the data changes are analyzed, and the corresponding cold backup information is generated and sent to the backup execution module; the cold backup information may also be kept in the backup system log.
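The task supervision rules above amount to a small decision table, sketched here in Python. The task-type labels (ingestion, update, compact, kill) and state labels (running, success, failed) are taken from the description of fig. 3; the `Action` enum and `backup_action` function are hypothetical, and the sketch assumes an ingestion task's segments are backed up incrementally while it runs, with nothing extra issued on success.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    BACKUP_NEW_SEGMENTS = "backup_new_segments"          # ingestion, while running
    BACKUP_CHANGED_SEGMENTS = "backup_changed_segments"  # update/compact, on success
    DELETE_IN_BACKUP = "delete_in_backup"                # kill, on success


def backup_action(task_type: str, state: str) -> Action:
    """Map a (task type, execution state) pair to the backup operation to issue."""
    if state == "failed":
        return Action.NONE  # failed tasks never trigger a backup
    if task_type == "ingestion" and state == "running":
        return Action.BACKUP_NEW_SEGMENTS
    if task_type in ("update", "compact") and state == "success":
        return Action.BACKUP_CHANGED_SEGMENTS
    if task_type == "kill" and state == "success":
        return Action.DELETE_IN_BACKUP
    return Action.NONE
```

Keeping the rules in one pure function makes them easy to log alongside each task in the backup system log.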
Combined with the backup system log, the system can be extended to support cancelling and rolling back backup tasks. The backup execution module compresses and encrypts the data of the Druid storage cluster and then sends it to the backup storage cluster using a cross-domain transmission technology such as Distcp. The recovery execution module pulls the backup data from the backup storage cluster across domains, decompresses and decrypts it, and uses the result to recover the data of the Druid storage cluster.
Referring to fig. 3, fig. 3 is a schematic diagram of a data backup process according to an embodiment of the present disclosure. In the figure, ingestion denotes a data ingestion (intake) instruction, update an update instruction, compact a merge instruction, and kill a delete instruction; running, success and failed denote the executing, successfully executed and failed states, respectively. Depending on the backup rule (that is, the backup mechanism), a hot backup operation instruction is obtained from the task instruction, or a cold backup operation instruction is obtained from the backup log, and the hot or cold backup operation is then performed.
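Generating cold backup information from one period of the backup log can be sketched as collapsing the logged changes into a net operation per segment. This is a minimal illustration, assuming the log has already been reduced to (timestamp, segment id, operation) records; `SegmentChange` and `cold_backup_plan` are hypothetical names, and "put" and "delete" are illustrative operation labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentChange:
    timestamp: int
    segment_id: str
    op: str  # "put" (segment created or rewritten) or "delete"


def cold_backup_plan(changes, period_start, period_end):
    """Collapse one period's logged changes into the net operation per segment.

    Only the last change to a segment inside [period_start, period_end) counts,
    so a segment rewritten several times is copied once.
    """
    latest = {}
    for change in sorted(changes, key=lambda c: c.timestamp):
        if period_start <= change.timestamp < period_end:
            latest[change.segment_id] = change.op
    to_copy = sorted(s for s, op in latest.items() if op == "put")
    to_delete = sorted(s for s, op in latest.items() if op == "delete")
    return to_copy, to_delete
```

A segment created and deleted within the same period ends up only in `to_delete`; the sketch assumes deleting a segment that never reached the backup cluster is a harmless no-op.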
The backup execution module executes the backup operations sent by the task supervision module. For ingestion and update tasks, the data is first compressed and encrypted and then transferred across domains using Distcp; for delete tasks, the corresponding data is deleted directly in the backup cluster. Data compression supports the LZ4, Snappy, Bzip2 and zlib algorithms, and data encryption supports the AES algorithm. Compressing and encrypting the data reduces network bandwidth consumption during cross-domain transmission and improves the security of the transferred data.
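The compression and transfer steps can be sketched with the Python standard library. Only zlib and Bzip2 are shown, because LZ4 and Snappy need third-party packages, and AES encryption is omitted for the same reason; it would be applied after compression, since encrypted data is essentially incompressible. The function names and the HDFS paths in the test are hypothetical; `-update` is an existing DistCp option.

```python
import bz2
import zlib

# Standard-library codecs only. LZ4 and Snappy need third-party packages,
# and AES encryption (applied after compression) is omitted here.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def compress_segment(data: bytes, codec: str = "zlib") -> bytes:
    compress, _ = CODECS[codec]
    return compress(data)


def decompress_segment(blob: bytes, codec: str = "zlib") -> bytes:
    _, decompress = CODECS[codec]
    return decompress(blob)


def distcp_command(src: str, dst: str) -> list:
    """Argument list for a Hadoop DistCp copy from the source to the backup HDFS.

    -update copies only files that are missing at the destination or differ
    from it, instead of recopying everything.
    """
    return ["hadoop", "distcp", "-update", src, dst]
```

On a host with the Hadoop client installed, the command could be run with `subprocess.run(distcp_command(src, dst), check=True)`; the recovery side simply calls `decompress_segment` with the same codec name.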
The data supervision module monitors in real time whether the segment data of the source Druid cluster is consistent with the metadata. As soon as an inconsistency occurs (for example, a segment in HDFS or metadata information in MySQL is deleted by mistake), it immediately notifies the recovery execution module to pull the backup data and complete the data recovery. If it cannot be determined whether the metadata information or the segment information has been lost, data recovery is performed using the data at the latest time point recorded in the backup system log as the reference; after this data is restored, the recovery is completed by re-executing in the source Druid cluster the task instructions recorded in the log after the time point of that backup data. The recovery execution module pulls the corresponding data from the backup cluster, decompresses and decrypts it, and then imports it into the specified HDFS location or updates the specified MySQL tables, which completes the data recovery. Referring to fig. 4, fig. 4 is a schematic diagram of a data recovery process according to an embodiment of the present application. After pulling the backup data across domains, the recovery execution module decrypts and decompresses it, then uploads the segment data blocks and imports the MySQL metadata to complete the data recovery.
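The consistency check performed by the data supervision module can be reduced to two set differences. This minimal sketch assumes the segment ids recorded in MySQL and the segment ids present in HDFS have already been listed; it treats each segment as simply present or missing and ignores, for example, segments that Druid has marked unused but not yet killed. `check_consistency` and `is_consistent` are hypothetical names.

```python
def check_consistency(metadata_segments, hdfs_segments):
    """Compare the segment ids recorded in MySQL with those present in HDFS.

    Returns what the recovery execution module would need to restore.
    """
    in_metadata = set(metadata_segments)
    in_hdfs = set(hdfs_segments)
    return {
        # recorded in metadata but the block is gone: upload it from backup
        "restore_blocks": sorted(in_metadata - in_hdfs),
        # block exists but its metadata row is gone: re-import the metadata
        "restore_metadata": sorted(in_hdfs - in_metadata),
    }


def is_consistent(metadata_segments, hdfs_segments) -> bool:
    report = check_consistency(metadata_segments, hdfs_segments)
    return not report["restore_blocks"] and not report["restore_metadata"]
```

A non-empty report corresponds to the moment at which the data supervision module would notify the recovery execution module.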
The data backup and recovery apparatus provided in the embodiments of the present application is introduced below; the apparatus described below and the data backup and recovery method described above may be cross-referenced.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a data backup and recovery apparatus according to an embodiment of the present application, including:
the execution module 110 is configured to obtain a data change instruction, and execute the data change instruction in the Druid storage cluster;
the backup updating module 120 is configured to update the backup data in the backup storage cluster according to the data change condition of the Druid storage cluster; the backup data comprises backup metadata and backup data blocks;
and the data recovery module 130 is configured to perform data recovery on the Druid storage cluster by using the backup data if the Druid storage cluster fails.
Optionally, the backup update module 120 includes:
the backup information determining unit is used for determining backup information according to the current backup mechanism;
and the updating unit is used for updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism.
Optionally, the backup information determining unit includes:
the hot backup information determining subunit is used for determining the data change instruction as the backup information if the current backup mechanism is the hot backup;
accordingly, an update unit comprises:
and the hot backup updating subunit is used for executing the updating corresponding to the data change instruction on the backup data in real time.
Optionally, the backup information determining unit includes:
the cold backup information determining subunit is configured to determine, if the current backup mechanism is a cold backup, a backup log of the Druid storage cluster as backup information;
accordingly, an update unit comprises:
the cold backup information generation subunit is used for generating cold backup information by using the backup log according to a preset period;
and the cold backup updating subunit is used for executing the update corresponding to the cold backup information on the backup data.
Optionally, the apparatus further comprises:
and the backup log updating module is used for updating the backup log according to the data change instruction after the data change instruction is executed in the Druid storage cluster.
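The backup log maintained by this module can be sketched as an append-only list of records carrying the fields named in the description (task type, state, task instruction content, time point and backup operation instruction). `BackupLogRecord` and `BackupLog` are hypothetical names, and the JSON-lines export is only one assumed way of persisting the log.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BackupLogRecord:
    task_type: str    # ingestion / update / compact / kill
    state: str        # running / success / failed
    instruction: str  # task instruction content
    timestamp: float  # time point of the task
    backup_op: str    # backup operation instruction derived from the task


class BackupLog:
    """Append-only backup log kept alongside the Druid storage cluster."""

    def __init__(self):
        self._records = []

    def append(self, record: BackupLogRecord) -> None:
        self._records.append(record)

    def since(self, t: float) -> list:
        """Records after time t, e.g. the input to one cold backup period."""
        return [r for r in self._records if r.timestamp > t]

    def to_jsonl(self) -> str:
        """One JSON object per line, a convenient format for persisting the log."""
        return "\n".join(json.dumps(asdict(r)) for r in self._records)
```

Recording failed tasks as well keeps the log complete for auditing, while the cold backup and replay steps can filter on `state`.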
Optionally, the data recovery module 130 includes:
the backup data acquisition unit is used for acquiring backup data by using a Distcp technology;
the backup data analysis unit is used for analyzing the backup data to obtain backup metadata and backup data blocks;
the metadata information recovery unit is used for performing data recovery on the MySQL database in the Druid storage cluster by using the backup metadata;
and the data block recovery unit is used for performing data recovery on the distributed file system in the Druid storage cluster by using the backup data blocks.
Optionally, the execution module 110 includes:
the instruction acquisition unit is used for acquiring an instruction that requires a response from the Druid storage cluster;
the judging unit is used for judging whether the instruction causes a data change;
and the data change instruction determining unit is used for determining the instruction as a data change instruction if the instruction causes a data change.
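The judging unit can be sketched as a membership test on the instruction type. The labels follow the task types named in this description (ingestion, update, compact, kill), not necessarily the type strings Druid itself uses, and read-only requests such as queries are assumed to fall outside the set; the dictionary shape of an instruction and the function names are hypothetical.

```python
# Labels follow the task types named in the description; read-only requests
# such as queries are assumed to fall outside this set.
DATA_CHANGING_TYPES = frozenset({"ingestion", "update", "compact", "kill"})


def is_data_change_instruction(instruction: dict) -> bool:
    """True if executing the instruction would change segments or metadata."""
    return instruction.get("type") in DATA_CHANGING_TYPES


def split_instructions(instructions):
    """Separate data change instructions (which drive backups) from the rest."""
    changes, others = [], []
    for instruction in instructions:
        target = changes if is_data_change_instruction(instruction) else others
        target.append(instruction)
    return changes, others
```

Only the first list would be passed on to the backup updating module.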
The data backup and recovery device provided in the embodiments of the present application is introduced below; the device described below and the data backup and recovery method described above may be cross-referenced.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data backup and recovery device according to an embodiment of the present disclosure. The data backup and recovery device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an input/output (I/O) interface 104, and a communication component 105.
The processor 101 is configured to control the overall operation of the data backup and recovery device 100 to complete all or some of the steps of the data backup and recovery method. The memory 102 is used to store various types of data to support the operation of the data backup and recovery device 100; such data may include, for example, instructions for any application or method running on the device 100, as well as application-related data. The memory 102 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may be further stored in the memory 102 or transmitted through the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, such as a keyboard, a mouse or buttons, which may be virtual or physical buttons. The communication component 105 is used for wired or wireless communication between the data backup and recovery device 100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 105 may include a Wi-Fi module, a Bluetooth module and an NFC module.
The data backup and recovery device 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, and is configured to perform the data backup and recovery method according to the above embodiments.
The computer-readable storage medium provided in the embodiments of the present application is described below; the computer-readable storage medium described below and the data backup and recovery method described above may be cross-referenced.
The present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data backup and recovery method described above.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described briefly; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another entity or action, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include" and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A data backup and recovery method is characterized by comprising the following steps:
acquiring a data change instruction, and executing the data change instruction in the Druid storage cluster;
updating backup data in a backup storage cluster according to the data change condition of the Druid storage cluster; wherein the backup data comprises backup metadata and backup data blocks;
and if the Druid storage cluster fails, performing data recovery on the Druid storage cluster by using the backup data.
2. The data backup and recovery method according to claim 1, wherein the updating the backup data in the backup storage cluster according to the data change condition of the Druid storage cluster comprises:
determining backup information according to a current backup mechanism;
and updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism.
3. The method for backing up and recovering data according to claim 2, wherein the determining backup information according to the current backup mechanism comprises:
if the current backup mechanism is hot backup, determining the data change instruction as the backup information;
correspondingly, the updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism includes:
and executing the updating corresponding to the data change instruction on the backup data in real time.
4. The method for backing up and recovering data according to claim 2, wherein the determining backup information according to the current backup mechanism comprises:
if the current backup mechanism is cold backup, determining the backup log of the Druid storage cluster as the backup information;
correspondingly, the updating the backup data in the backup storage cluster by using the backup information based on the current backup mechanism includes:
according to a preset period, generating cold backup information by using the backup log;
and executing the update corresponding to the cold backup information on the backup data.
5. The data backup and recovery method according to claim 4, further comprising:
after the data change instruction is executed in the Druid storage cluster, updating the backup log according to the data change instruction.
6. The data backup and recovery method according to any one of claims 1 to 5, wherein the performing data recovery on the Druid storage cluster by using the backup data includes:
acquiring the backup data by using a Distcp technology;
analyzing the backup data to obtain backup metadata and backup data blocks;
performing data recovery on the MySQL database in the Druid storage cluster by using the backup metadata;
and performing data recovery on the distributed file system in the Druid storage cluster by using the backup data blocks.
7. The data backup and recovery method according to claim 1, wherein the obtaining the data change instruction comprises:
acquiring an instruction requiring the Druid storage cluster to respond;
judging whether the instruction can cause data change or not;
and if so, determining the instruction as the data change instruction.
8. A data backup and recovery apparatus, comprising:
the execution module is used for acquiring a data change instruction and executing the data change instruction in the Druid storage cluster;
the backup updating module is used for updating backup data in the backup storage cluster according to the data change condition of the Druid storage cluster; wherein the backup data comprises backup metadata and backup data blocks;
and the data recovery module is used for performing data recovery on the Druid storage cluster by using the backup data if the Druid storage cluster fails.
9. A data backup and recovery device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the data backup and recovery method according to any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the data backup and restore method according to any one of claims 1 to 7.
CN202010973108.4A 2020-09-16 2020-09-16 Data backup and recovery method, device, equipment and readable storage medium Pending CN112181722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010973108.4A CN112181722A (en) 2020-09-16 2020-09-16 Data backup and recovery method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112181722A 2021-01-05

Family

ID=73920759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010973108.4A Pending CN112181722A (en) 2020-09-16 2020-09-16 Data backup and recovery method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112181722A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112965858A (en) * 2021-03-04 2021-06-15 电信科学技术第五研究所有限公司 Method for realizing conflict processing of networking distributed storage data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101977124A (en) * 2010-11-05 2011-02-16 山东中创软件工程股份有限公司 Service clustering method and system based on ZooKeeper technology
CN103699548A (en) * 2012-09-27 2014-04-02 阿里巴巴集团控股有限公司 Method and equipment for recovering database data by using logs
CN107357681A (en) * 2017-06-26 2017-11-17 杭州铭师堂教育科技发展有限公司 Zookeeper backup management systems and method based on salt
CN107678883A (en) * 2017-09-22 2018-02-09 郑州云海信息技术有限公司 A kind of cluster recovery method and apparatus based on storage system
CN109144792A (en) * 2018-10-08 2019-01-04 郑州云海信息技术有限公司 Data reconstruction method, device and system and computer readable storage medium
CN110543386A (en) * 2019-09-16 2019-12-06 上海达梦数据库有限公司 Data storage method, device, equipment and storage medium
CN110633168A (en) * 2018-06-22 2019-12-31 北京东土科技股份有限公司 Data backup method and system for distributed storage system
CN111625401A (en) * 2020-05-29 2020-09-04 浪潮电子信息产业股份有限公司 Data backup method and device based on cluster file system and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105