CN117520056A - Hbase data backup method, hbase data backup system, electronic equipment and storage medium - Google Patents

Hbase data backup method, hbase data backup system, electronic equipment and storage medium

Info

Publication number
CN117520056A
Authority
CN
China
Prior art keywords
backup
file
data
file list
full
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410021488.XA
Other languages
Chinese (zh)
Inventor
宋培毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Ecloud Technology Co ltd
Original Assignee
Nanjing Ecloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Ecloud Technology Co ltd
Priority to CN202410021488.XA
Publication of CN117520056A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G06F11/1458 Management of the backup or restore process
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques
    • G06F2201/84 Using snapshots, i.e. a logical point-in-time copy of the data

Abstract

The invention provides an HBase data backup method and system, an electronic device, and a storage medium, and relates to the technical field of data management. The method comprises: for a data table to be backed up in an HBase database, generating full backup tasks or incremental backup tasks, each corresponding to at least one region of the data table; and executing the full/incremental backup tasks. Executing a full backup task includes parsing the current snapshot file, backing up files according to the file list obtained by parsing, and recording metadata information of the backed-up files to a database. Executing an incremental backup task includes parsing the current snapshot file, comparing the resulting file list with the synthesized full file information of the previous backup to obtain a list of changed files, backing up those files, and synthesizing the backup file metadata and recording it to the database. The method reduces both the backup time and the recovery time of the HBase database.

Description

Hbase data backup method, hbase data backup system, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data management, and in particular to an HBase data backup method and system, an electronic device, and a storage medium.
Background
Data reliability is the lifeline of business systems and one of the core values of distributed storage systems. Ensuring high data reliability requires processes such as data backup and restoration. Backup of an HBase database generally consists of two parts: full backup and incremental backup.
A typical HBase backup scheme works as follows. A full backup uses the HBase snapshot mechanism to take a snapshot of the data at a given point in time and then copies the full snapshot data to the target storage. An incremental backup processes the HBase logs, periodically parsing and backing up the log data generated since the previous period. In this scheme, however, an incremental backup must regenerate and export data files from the logs of the incremental period, which consumes substantial computing and storage resources during backup and lengthens the backup time.
Disclosure of Invention
The object of the invention is to provide an HBase data backup method and system, an electronic device, and a storage medium that reduce both the backup time and the recovery time of an HBase database.
The invention is realized in the following way:
In a first aspect, the present application provides an HBase data backup method, comprising the following steps:
For a data table to be backed up in the HBase database, generating full/incremental backup tasks, each corresponding to at least one region of the data table.
Executing the full/incremental backup tasks, wherein executing a full backup task includes: parsing the acquired snapshot file to obtain a first file list; backing up the corresponding files based on the first file list; and recording metadata information of the backed-up files to a database. Executing an incremental backup task includes: parsing the acquired snapshot file to obtain a second file list; comparing the second file list with the synthesized full information of the previous backup to obtain a changed-file list; backing up the corresponding files based on the changed-file list; and synthesizing the backup file metadata and recording it to the database.
Further, based on the foregoing scheme, the method further includes a data recovery step:
in response to a data recovery instruction, determining the HBase data table to be recovered and the recovery time point of the data table; reading the database to obtain the backup file list corresponding to the data table at the recovery time point; and, based on the backup file list, uploading the HFiles and the corresponding snapshot information files from the storage system back to the HBase database to restore the table data.
In a second aspect, the present application provides a system for Hbase data backup, including:
a data transmission module configured to: when a full/incremental backup task is executed, download the newly added/modified data files in the file list from the DataNodes of the HBase database and upload them to the storage system according to the backup file path information; a snapshot parsing module configured to: parse, from the acquired snapshot file, all file list information required by the current snapshot according to the HBase interface document format; a file list processing module configured to: execute the full/incremental backup task to compute the list of files to be processed by the data transmission module; and a database module configured to: at each backup, store all file information of the backup set independently in a corresponding metadata table by means of data synthesis.
Further, based on the foregoing, the file list processing module is further configured to:
when executing a full backup task, obtain the full file list set corresponding to the snapshot from the snapshot parsing module and pass it directly to the data transmission module for processing.
Further, based on the foregoing, the file list processing module is further configured to:
when executing an incremental backup task, obtain the full file list set S corresponding to the snapshot from the snapshot parsing module and the full data set L of the previous backup from the database module; based on S and L, obtain the file list to be processed in the current backup, which comprises a newly added file set N, a deleted file set D, and a changed file set M; and provide this file list to the data transmission module for processing.
Further, based on the foregoing solution, obtaining the file list to be processed in the current backup based on the full file list set S and the full data set L includes:
taking the intersection of the full file list set S and the full data set L to obtain a set R, the set R being the files retained until the current backup; subtracting R from S to obtain the newly added file set N; subtracting R from L to obtain the deleted file set D; and traversing the file metadata in R and identifying the files whose size has changed to obtain the changed file set M.
In a third aspect, the present application provides a system for Hbase data backup, including:
a backup task generation module configured to: for a data table to be backed up in the HBase database, generate full/incremental backup tasks, each corresponding to at least one region of the data table;
a backup task execution module configured to execute the full/incremental backup tasks, the backup task execution module including: a full backup task execution unit configured to parse the acquired snapshot file to obtain a first file list, back up the corresponding files based on the first file list, and record metadata information of the backed-up files to a database; and an incremental backup task execution unit configured to parse the acquired snapshot file to obtain a second file list, compare the second file list with the synthesized full information of the previous backup to obtain a changed-file list, back up the corresponding files based on the changed-file list, and synthesize the backup file metadata and record it to the database.
In a fourth aspect, the present application provides a system for Hbase data backup, including:
a data backup management module and a plurality of data backup task execution modules.
The data backup management module is configured to generate, for a data table to be backed up in the HBase database, full/incremental backup tasks, each corresponding to at least one region of the data table, and to distribute these tasks to the data backup task execution modules according to a task distribution algorithm.
Each data backup task execution module comprises: a first execution unit configured to parse the acquired snapshot file to obtain a first file list, back up the corresponding files based on the first file list, and record metadata information of the backed-up files to a database; and a second execution unit configured to parse the acquired snapshot file to obtain a second file list, compare the second file list with the synthesized full information of the previous backup to obtain a changed-file list, back up the corresponding files based on the changed-file list, and synthesize the backup file metadata and record it to the database.
In a fifth aspect, the present application provides an electronic device comprising at least one processor, at least one memory, and a data bus, wherein the processor and the memory communicate with each other through the data bus, and the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any implementation of the first aspect.
In a sixth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any implementation of the first aspect.
Compared with the prior art, the invention has at least the following advantages or beneficial effects:
With the technical solution of the invention, incremental backup of the HBase database does not require parsing log files; instead, the incremental files are backed up directly, and at recovery time the snapshot data of the backup time point can be restored directly. This reduces resource consumption during both backup and recovery, shortens the time required, and simplifies operation.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for Hbase data backup according to an embodiment of the present invention;
FIG. 2 is a flowchart of another embodiment of a method for Hbase data backup according to the present invention;
FIG. 3 is a block diagram illustrating an embodiment of a Hbase data backup system according to the present invention;
FIG. 4 is a block diagram illustrating another embodiment of a Hbase data backup system according to the present invention;
FIG. 5 is a block diagram illustrating another embodiment of a Hbase data backup system according to the present invention;
FIG. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 1, data transmission module; 2, snapshot parsing module; 3, file list processing module; 4, database module; 5, backup task generation module; 6, backup task execution module; 7, data backup management module; 8, data backup task execution module; 9, processor; 10, memory; 11, data bus.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, but not all, embodiments of the present application. The components of the embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of configurations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments below and their features may be combined with one another where no conflict arises. In the description of the present application, the terms "first," "second," and the like are used only to distinguish descriptions and are not to be construed as indicating or implying relative importance.
Example 1
This embodiment provides an HBase data backup method that does not need to parse log files during backup, which reduces backup time. In addition, during restoration the snapshot data at the backup time point can be restored directly, which also reduces restoration time.
Referring to FIG. 1, the HBase data backup method includes the following steps:
Step S101: For a data table to be backed up in the HBase database, generate full/incremental backup tasks, each corresponding to at least one region of the data table.
Step S102: Execute the full/incremental backup tasks. Executing a full backup task includes: parsing the acquired snapshot file to obtain a first file list; backing up the corresponding files based on the first file list; and recording metadata information of the backed-up files to a database. Executing an incremental backup task includes: parsing the acquired snapshot file to obtain a second file list; comparing the second file list with the synthesized full information of the previous backup to obtain a changed-file list; backing up the corresponding files based on the changed-file list; and synthesizing the backup file metadata and recording it to the database.
When a data table in the HBase database is backed up, only some regions of the table may need backing up, so the full/incremental backup tasks are generated for the corresponding regions of the table to be backed up. Because HBase, when generating a snapshot, records all file information of that snapshot in the snapshot's manifest file, parsing the manifest file according to its format yields the file list.
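A minimal Python sketch of turning parsed manifest contents into a file list. HBase snapshot manifests are protobuf-encoded; this sketch assumes they have already been decoded into simple (region, column family, HFile name, size) records. The `ManifestEntry` type and `build_file_list` function are hypothetical, and the `<table dir>/<region>/<family>/<hfile>` path layout is used only as an illustration of HBase's on-disk arrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """One HFile reference, assumed to be already decoded from a snapshot manifest."""
    region: str   # encoded region name
    family: str   # column family
    hfile: str    # HFile name
    size: int     # file size in bytes


def build_file_list(table_dir: str, entries: list[ManifestEntry]) -> dict[str, int]:
    """Flatten manifest entries into {absolute HFile path: size in bytes}.

    The <table_dir>/<region>/<family>/<hfile> layout is illustrative.
    """
    root = table_dir.rstrip("/")
    return {f"{root}/{e.region}/{e.family}/{e.hfile}": e.size for e in entries}
```

Keying the list by absolute path makes it directly comparable across backups, which the incremental comparison relies on.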
A file snapshot records, stores, or copies the files of an entire file system for later replay, recovery, or analysis. It is a technique that system administrators use to manage and monitor file systems, and it effectively captures the state and information of the files, folders, and directories stored in the file system. A snapshot differs from a backup in that it is generated while the system is running: snapshot generation is extremely fast and does not require stopping the system, so normal operation is preserved. Snapshots also save storage space compared with backups. Because it is generated online and saves storage space, a file snapshot can support high-quality, robust recovery compared with more complex backup-and-recovery approaches.
Specifically, in a full backup, the current snapshot manifest file is parsed first; all files required for the backup are then backed up according to the resulting first file list, and the metadata of the backed-up files is recorded to the database. In an incremental backup, the current snapshot manifest file is likewise parsed first, and the resulting second file list is compared with the synthesized full information of the previous backup to obtain a changed-file list. The corresponding files are then backed up according to the comparison result, and the backup file metadata is synthesized and recorded to the database. Note that "first" and "second" in "first file list" and "second file list" merely distinguish the lists obtained by parsing the snapshot file in the two different backup processes.
In summary, in the above embodiment the HBase snapshot taken at each backup covers the full data, but an incremental backup only needs to copy the files that changed, and subsequent recovery directly restores the full data of the snapshot. Because the HBase data backup is built on the HBase snapshot mechanism, unlike existing HBase backup schemes, no log files need to be parsed during backup, and recovery only needs to restore the snapshot data of the backup time point directly; the whole process is fast and easy to operate.
Referring to FIG. 2, based on the foregoing scheme, in some embodiments of the present invention the method further includes the following data recovery steps:
Step S201: In response to a data recovery instruction, determine the HBase data table to be recovered and the recovery time point of the data table.
Step S202: Read the database to obtain the backup file list corresponding to the data table at the recovery time point.
Step S203: Based on the backup file list, upload the HFiles and the corresponding snapshot information files from the storage system back to the HBase database to restore the table data.
In the prior art, an incremental backup must regenerate and export data files from the logs, which occupies substantial computing and storage resources and increases backup time. In addition, recovery must first restore the full data and then the incremental data, which increases both the amount of data to recover and the recovery time. In the above embodiment, once the HBase data table to be restored and the specific recovery time point are determined, all file list information of the HBase table snapshot is obtained from the database according to the backup set. The HFiles and the corresponding snapshot information files are then uploaded from the storage system to the HBase database based on that information, and the HBase interface is invoked to restore the table data directly by cloning the snapshot.
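Because each backup set in this scheme holds a synthesized full file list, recovering to a time point needs only one backup set rather than a full backup followed by a chain of increments. The hypothetical sketch below picks the latest backup set completed at or before the requested recovery point; `BackupSet`, its fields, and the epoch-seconds timestamps are assumptions, not the patent's implementation. In HBase itself, the final cloning step corresponds to the shell's `clone_snapshot` command once the files are back in place.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class BackupSet:
    backup_id: str
    finished_at: int  # completion time, e.g. Unix epoch seconds (assumed)
    files: dict[str, int] = field(default_factory=dict)  # synthesized full list: path -> size


def backup_for_point(sets: list[BackupSet], recovery_point: int) -> BackupSet:
    """Return the latest backup set that finished at or before recovery_point."""
    ordered = sorted(sets, key=lambda s: s.finished_at)
    times = [s.finished_at for s in ordered]
    idx = bisect_right(times, recovery_point)  # index just past all times <= point
    if idx == 0:
        raise LookupError("no backup exists at or before the requested point")
    return ordered[idx - 1]
```

The returned set's `files` would then be the backup file list handed to the transfer step.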
Example 2
Referring to FIG. 3, an embodiment of the present application provides an HBase data backup system, which includes:
a data transmission module 1 configured to: when a full/incremental backup task is executed, download the newly added/modified data files in the file list from the DataNodes of the HBase database and upload them to the storage system according to the backup file path information; a snapshot parsing module 2 configured to: parse, from the acquired snapshot file, all file list information required by the current snapshot according to the HBase interface document format; a file list processing module 3 configured to: execute the full/incremental backup task to compute the list of files to be processed by the data transmission module 1; and a database module 4 configured to: at each backup, store all file information of the backup set independently in a corresponding metadata table by means of data synthesis.
Depending on the data volume of the HBase database, the data transmission module 1 may use a standalone or a distributed architecture. During backup, it downloads the newly added/modified data files in the file list from the HBase DataNodes and uploads them to the storage system according to the backup file path information, and it cleans deleted files from the storage system; during recovery, it downloads data files from the storage system according to a specified path and uploads them to the HBase database. The snapshot parsing module 2 obtains the manifest file from the snapshot information and parses all file list information required by the current snapshot according to the HBase interface document format. The file list processing module 3 computes the list of files to be processed by the data transmission module 1. The database module 4 is not limited to a relational database; a non-relational database may also be used, chosen according to actual conditions. At each backup, all file information of the backup set can be stored independently in a metadata table through data synthesis, and the row key may be the HBase data file name plus its absolute path. The metadata includes, but is not limited to, the storage path in the storage system, the data file size, the time, and attributes.
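A sketch of one backup set's metadata table using the key described above (file name plus absolute path). The `FileMeta` fields mirror the listed metadata: storage path, size, time, and attributes. An in-memory dict stands in for the relational or non-relational database, and the rule of mirroring the HBase path under a storage root, as well as all names here, are assumptions for illustration.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    storage_path: str  # location of the copy in the backup storage system
    size: int          # file size in bytes
    mtime: int         # modification time (epoch milliseconds, assumed)
    attrs: str = ""    # other attributes; the format is hypothetical


def metadata_table(
    files: dict[str, tuple[int, int]], storage_root: str
) -> dict[tuple[str, str], FileMeta]:
    """Build one backup set's metadata table, keyed by (file name, absolute path).

    `files` maps an absolute HBase file path to (size, mtime); mirroring the
    HBase path under `storage_root` is an assumed naming scheme.
    """
    table: dict[tuple[str, str], FileMeta] = {}
    for abs_path, (size, mtime) in files.items():
        name = posixpath.basename(abs_path)
        storage_path = posixpath.join(storage_root, abs_path.lstrip("/"))
        table[(name, abs_path)] = FileMeta(storage_path, size, mtime)
    return table
```

Storing one such table per backup set is what lets a later recovery read a complete file list without consulting earlier sets.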
Based on the foregoing, in some embodiments of the present invention, the file list processing module 3 is further configured to:
when executing a full backup task, obtain the full file list set corresponding to the snapshot from the snapshot parsing module 2 and pass it directly to the data transmission module 1 for processing.
Based on the foregoing, in some embodiments of the present invention, the file list processing module 3 is further configured to:
when executing an incremental backup task, obtain the full file list set S corresponding to the snapshot from the snapshot parsing module 2 and the full data set L of the previous backup from the database module 4;
based on S and L, obtain the file list to be processed in the current backup, which comprises a newly added file set N, a deleted file set D, and a changed file set M; and
provide the file list to be processed in the current backup to the data transmission module 1 for processing.
Based on the foregoing solutions, in some embodiments of the present invention, obtaining the file list to be processed in the current backup based on the full file list set S and the full data set L includes:
taking the intersection of the full file list set S and the full data set L to obtain a set R, the set R being the files retained from the previous backup until the end of the current backup; subtracting R from S to obtain the newly added file set N; subtracting R from L to obtain the deleted file set D; and traversing the file metadata in R and identifying the files whose size has changed to obtain the changed file set M.
In the above embodiment, during a full backup, the file list processing module 3 obtains the full file list set S of the current snapshot from the snapshot parsing module 2 and passes S directly to the data transmission module 1 for processing. During an incremental backup, the file list processing module 3 obtains not only the full file list set S of the snapshot from the snapshot parsing module 2 but also the full data set L of the previous backup from the database module 4. The intersection R of S and L contains the files retained until the current backup; subtracting R from S gives the newly added file set N; subtracting R from L gives the deleted file set D; and traversing the file metadata in R, identifying modified files by a change in file size as suits HBase's file characteristics, gives the changed file set M. The sets N, D, and M then form the file list to be processed in the current backup and are provided to the data transmission module 1, which greatly reduces the amount of data to process compared with the full list. Correspondingly, during recovery, the file list set R to be restored is read from the database according to the backup set and provided directly to the data transmission module 1 for processing.
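The set computation can be sketched with Python set operations on dictionary key views. Here `S` and `L` map absolute file paths to sizes in bytes, and `plan_incremental` is a hypothetical helper name. R is taken as the intersection of S and L, since only then does S - R yield the new files and L - R the deleted ones.

```python
def plan_incremental(
    S: dict[str, int], L: dict[str, int]
) -> tuple[set[str], set[str], set[str]]:
    """Return (N, D, M) for an incremental backup.

    S: current snapshot's full file list, path -> size in bytes.
    L: previous backup's full data set, path -> size in bytes.
    """
    R = S.keys() & L.keys()             # files retained since the last backup
    N = S.keys() - R                    # newly added files
    D = L.keys() - R                    # deleted files
    M = {p for p in R if S[p] != L[p]}  # retained files whose size changed
    return N, D, M
```

Only the files in N and M need to be copied; D identifies the copies that are no longer part of the current snapshot.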
In addition, the full backup may be triggered as follows:
First, an HBase interface command is called to generate a full data snapshot of the table; for a namespace backup, all tables under the namespace are traversed in a loop and a snapshot is generated for each. The snapshot parsing module 2 then traverses all snapshots, reads each snapshot's manifest file, and obtains the snapshot's HFile list. Next, the data transmission module 1 transfers the HFiles of all snapshots and the corresponding snapshot information files to the backup storage system. Finally, the metadata of all HFiles of each table snapshot is recorded, per HBase table, into the table corresponding to the backup set in the backup system database.
The incremental backup may be triggered as follows:
First, an HBase interface command is called to generate a full data snapshot of the table; for a namespace backup, all tables under the namespace are traversed and a snapshot is generated for each. The snapshot parsing module 2 then traverses all snapshots, reads each snapshot's manifest file, and obtains the snapshot's HFile list. Next, the file list processing module 3 obtains from the backup system database the file list synthesized from the HBase table and the metadata of the previous backup, compares all snapshot HFiles with that list, and obtains the lists of files added and deleted in this incremental backup. The data transmission module 1 then transfers the added HFiles of all snapshots and the corresponding snapshot information files to the backup storage system and handles the deleted files. Finally, the added and deleted file list information is synthesized with the file list metadata of the previous backup set, and the metadata of all HFiles of the current table snapshot is recorded into the table corresponding to the backup set in the backup system database.
Recovery from backup data may be triggered as follows:
First, the file list processing module 3 obtains all file list information of the HBase table snapshot from the database module 4 according to the backup set. Next, the data transmission module 1 uploads the HFiles and the corresponding snapshot information files from the storage system back to the HBase big data platform according to the file list information. Finally, the HBase interface is called to restore the table data directly by cloning the snapshot.
Example 3
Referring to FIG. 4, an embodiment of the present application provides an HBase data backup system, which includes:
a backup task generation module 5 configured to: for a data table to be backed up in the HBase database, generate full/incremental backup tasks, each corresponding to at least one region of the data table; and
a backup task execution module 6 configured to execute the full/incremental backup tasks.
The backup task execution module 6 includes: a full backup task execution unit configured to parse the acquired snapshot file to obtain a first file list, back up the corresponding files based on the first file list, and record metadata information of the backed-up files to a database; and an incremental backup task execution unit configured to parse the acquired snapshot file to obtain a second file list, compare the second file list with the synthesized full information of the previous backup to obtain a changed-file list, back up the corresponding files based on the changed-file list, and synthesize the backup file metadata and record it to the database.
For the specific implementation of the above system, refer to the HBase data backup method provided in Example 1; details are not repeated here.
Example 4
Referring to fig. 5, an embodiment of the present application provides a system for Hbase data backup, which includes:
a data backup management module 7 and a plurality of data backup task execution modules 8.
The data backup management module 7 is configured to generate, for a data table to be backed up in the Hbase database, a full or incremental backup task for each of at least one region of the data table, and to distribute these tasks to the data backup task execution modules 8 according to a task distribution algorithm. Distributing the tasks in this way balances the task load across the data backup task execution modules 8.
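The task distribution algorithm itself is not specified. One common load-balancing choice, shown here as an assumption rather than as the algorithm of this application, is greedy longest-processing-time-first assignment: sort the tasks by an estimated cost (for example, the total size of a region's files) and give each task to the currently least-loaded executor. The function name, task identifiers, and cost values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def distribute(tasks: List[Tuple[str, int]],
               executors: List[str]) -> Dict[str, List[str]]:
    """Greedy longest-processing-time-first assignment.

    tasks: (task_id, estimated_cost) pairs; the cost model is an assumption.
    Each task, largest first, goes to the executor with the lowest load.
    """
    if not executors:
        raise ValueError("at least one executor is required")
    assignment: Dict[str, List[str]] = {e: [] for e in executors}
    heap = [(0, e) for e in executors]  # (current load, executor id)
    heapq.heapify(heap)
    for task_id, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, executor = heapq.heappop(heap)
        assignment[executor].append(task_id)
        heapq.heappush(heap, (load + cost, executor))
    return assignment


if __name__ == "__main__":
    plan = distribute([("r1", 10), ("r2", 7), ("r3", 5), ("r4", 3)], ["e1", "e2"])
    print(plan)  # {'e1': ['r1', 'r4'], 'e2': ['r2', 'r3']}
```

In this example the two executors end up with loads of 13 and 12; placing the largest tasks first is what keeps the final loads close.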
The data backup task execution module 8 includes:
a first execution unit configured to acquire a snapshot file and parse it to obtain a first file list, back up the corresponding files based on the first file list, and record the metadata information of the backed-up files to a database; and a second execution unit configured to acquire a snapshot file and parse it to obtain a second file list, compare the second file list with the synthesized full file information of the last backup to obtain a changed-file list, back up the corresponding files based on the changed-file list, and synthesize the backup file metadata information and record it to the database.
Building on the foregoing, in some embodiments of the present invention the system further includes a data recovery module configured to: determine, in response to a data recovery instruction, the Hbase data table to be recovered and the recovery time point of the data table; read the database to obtain the backup file list corresponding to the data table at the recovery time point; and, based on the backup file list, upload the hFiles and the corresponding snapshot information files from the storage system back to the Hbase database to restore the corresponding table data.
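Resolving "the backup file list at the recovery time point" amounts to finding the latest backup set taken at or before that point. A minimal Python sketch, assuming each backup set carries a completion timestamp and its synthesized full file list; `BackupSet`, `files_for_recovery`, and the sample values are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackupSet:
    backup_id: str
    timestamp: int  # completion time, e.g. epoch seconds (assumed)
    files: Dict[str, int] = field(default_factory=dict)  # synthesized full list


def files_for_recovery(backup_sets: List[BackupSet],
                       recovery_time: int) -> Dict[str, int]:
    """Return the file list of the latest backup set taken at or before
    recovery_time. Because each set stores a synthesized full list, no
    chain of incremental sets has to be replayed."""
    ordered = sorted(backup_sets, key=lambda b: b.timestamp)
    times = [b.timestamp for b in ordered]
    idx = bisect.bisect_right(times, recovery_time) - 1
    if idx < 0:
        raise ValueError("no backup set exists at or before the recovery time")
    return ordered[idx].files


if __name__ == "__main__":
    sets = [
        BackupSet("set-001", 1_700_000_000, {"cf/hfile-a": 100}),
        BackupSet("set-002", 1_700_086_400, {"cf/hfile-a": 100, "cf/hfile-c": 50}),
    ]
    print(files_for_recovery(sets, 1_700_050_000))  # {'cf/hfile-a': 100}
```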
Example 5
Referring to fig. 6, an embodiment of the present application provides an electronic device comprising at least one processor 9, at least one memory 10, and a data bus 11, wherein the processor 9 and the memory 10 communicate with each other through the data bus 11, and the memory 10 stores program instructions executable by the processor 9, which the processor 9 invokes to perform the Hbase data backup method, for example by:
generating, for a data table to be backed up in the Hbase database, a full or incremental backup task for each of at least one region of the data table; and
executing the full/incremental backup task. Executing the full backup task includes: acquiring a snapshot file and parsing it to obtain a first file list; backing up the corresponding files based on the first file list; and recording the metadata information of the backed-up files to the database. Executing the incremental backup task includes: acquiring a snapshot file and parsing it to obtain a second file list; comparing the second file list with the synthesized full file information of the last backup to obtain a changed-file list; backing up the corresponding files based on the changed-file list; and synthesizing the backup file metadata information and recording it to the database.
The memory 10 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.
The processor 9 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP), or it may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6. The components shown in fig. 6 may be implemented in hardware, software, or a combination thereof.
Example 6
The present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor 9, implements the Hbase data backup method, for example by:
generating, for a data table to be backed up in the Hbase database, a full or incremental backup task for each of at least one region of the data table; and
executing the full/incremental backup task. Executing the full backup task includes: acquiring a snapshot file and parsing it to obtain a first file list; backing up the corresponding files based on the first file list; and recording the metadata information of the backed-up files to the database. Executing the incremental backup task includes: acquiring a snapshot file and parsing it to obtain a second file list; comparing the second file list with the synthesized full file information of the last backup to obtain a changed-file list; backing up the corresponding files based on the changed-file list; and synthesizing the backup file metadata information and recording it to the database.
If the above functions are implemented as software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A method for Hbase data backup, comprising the steps of:
generating, for a data table to be backed up in an Hbase database, full/incremental backup tasks respectively corresponding to at least one region of the data table;
executing a full backup task/incremental backup task;
wherein executing the full backup task includes: acquiring a snapshot file and parsing it to obtain a first file list; backing up the corresponding files based on the first file list; and recording the backup file metadata information to a database;
executing the incremental backup task includes: acquiring a snapshot file and parsing it to obtain a second file list; comparing the second file list with the synthesized full file information of the last backup to obtain a changed-file list; backing up the corresponding files based on the changed-file list; and synthesizing the backup file metadata information and recording it to the database.
2. The Hbase data backup method according to claim 1, further comprising a data recovery step of:
determining, in response to a data recovery instruction, the Hbase data table to be recovered and a recovery time point of the data table;
reading the database to obtain the backup file list corresponding to the data table at the recovery time point; and
uploading, based on the backup file list, the hFiles and the corresponding snapshot information files from the storage system back to the Hbase database to restore the corresponding table data.
3. A system for Hbase data backup according to claim 1, comprising:
a data transmission module configured to: in response to a full/incremental backup task, download the newly added/modified data files in the file list from the DataNodes of the Hbase database and upload them to the storage system according to the backup file path information;
a snapshot parsing module configured to: parse, based on the acquired snapshot file and according to the Hbase interface file format, all the file list information required by the current snapshot;
a file list processing module configured to: execute a full/incremental backup task to calculate the list of files to be processed and pass it to the data transmission module; and
a database module configured to: at each backup, store all file information of the backup set independently in a corresponding metadata table in synthesized form.
4. The Hbase data backup system of claim 3 wherein said file list processing module is further configured to:
when executing the full backup task, obtain the full file list set corresponding to the snapshot from the snapshot parsing module and pass it directly to the data transmission module for processing.
5. The Hbase data backup system of claim 3 wherein said file list processing module is further configured to:
when executing the incremental backup task, obtain the full file list set S corresponding to the snapshot from the snapshot parsing module, and obtain the full data set L of the last backup from the database module;
obtain, based on the full file list set S and the full data set L, the list of files to be processed in the current backup, the list comprising a newly added file set N, a deleted file set D, and a changed file set M; and
provide the list of files to be processed in the current backup to the data transmission module for processing.
6. The Hbase data backup system according to claim 5, wherein obtaining the list of files to be processed in the current backup based on the full file list set S and the full data set L comprises:
intersecting the full file list set S with the full data set L to obtain a set R, the set R being the set of files retained from the last backup through the end of the current backup;
subtracting the set R from the full file list set S to obtain the newly added file set N;
subtracting the set R from the full data set L to obtain the deleted file set D; and
traversing the file metadata in the set R and identifying files whose size has changed to obtain the changed file set M.
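The set arithmetic of claim 6 can be checked with a short Python sketch. Here R is computed as the intersection of S and L, since N = S − R and D = L − R only yield the added and deleted files when R is the set of files common to both lists. File lists are modeled, as an assumption, as dictionaries mapping an hFile name to its size, which is the metadata used to detect changes; the function name is hypothetical.

```python
from typing import Dict, Set, Tuple

FileMeta = Dict[str, int]  # hFile name -> file size in bytes (assumed)


def compute_changes(S: FileMeta, L: FileMeta) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (N, D, M) for the current incremental backup.

    S: full file list of the current snapshot.
    L: synthesized full file list of the last backup.
    """
    R = S.keys() & L.keys()   # files retained since the last backup
    N = set(S.keys() - R)     # N = S - R: newly added files
    D = set(L.keys() - R)     # D = L - R: deleted files
    M = {name for name in R if S[name] != L[name]}  # size changed
    return N, D, M


if __name__ == "__main__":
    S = {"a": 1, "b": 2, "c": 3}
    L = {"b": 2, "c": 4, "d": 5}
    print(compute_changes(S, L))  # ({'a'}, {'d'}, {'c'})
```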
7. A system for Hbase data backup, comprising:
a backup task generation module configured to: generate, for a data table to be backed up in an Hbase database, full/incremental backup tasks respectively corresponding to at least one region of the data table;
a backup task execution module configured to: executing a full backup task/incremental backup task;
the backup task execution module comprises:
a full backup task execution unit configured to: acquire a snapshot file and parse it to obtain a first file list; back up the corresponding files based on the first file list; and record the backup file metadata information to a database;
an incremental backup task execution unit configured to: acquire a snapshot file and parse it to obtain a second file list; compare the second file list with the synthesized full file information of the last backup to obtain a changed-file list; back up the corresponding files based on the changed-file list; and synthesize the backup file metadata information and record it to the database.
8. A system for Hbase data backup, comprising:
a data backup management module and a plurality of data backup task execution modules;
the data backup management module is configured to generate, for a data table to be backed up in the Hbase database, full/incremental backup tasks respectively corresponding to at least one region of the data table, and to distribute the full/incremental backup tasks to the data backup task execution modules according to a task distribution algorithm;
the data backup task execution module comprises:
a first execution unit configured to: acquire a snapshot file and parse it to obtain a first file list; back up the corresponding files based on the first file list; and record the backup file metadata information to a database;
a second execution unit configured to: acquire a snapshot file and parse it to obtain a second file list; compare the second file list with the synthesized full file information of the last backup to obtain a changed-file list; back up the corresponding files based on the changed-file list; and synthesize the backup file metadata information and record it to the database.
9. An electronic device, comprising at least one processor, at least one memory, and a data bus, wherein the processor and the memory communicate with each other through the data bus, and the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method according to any one of claims 1-2.
10. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-2.
CN202410021488.XA 2024-01-08 2024-01-08 Hbase data backup method, hbase data backup system, electronic equipment and storage medium Pending CN117520056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410021488.XA CN117520056A (en) 2024-01-08 2024-01-08 Hbase data backup method, hbase data backup system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117520056A true CN117520056A (en) 2024-02-06

Family

ID=89742459

Country Status (1)

Country Link
CN (1) CN117520056A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104714858A (en) * 2013-12-13 2015-06-17 中国移动通信集团公司 Data backup method, data recovery method and device
CN111221678A (en) * 2018-11-27 2020-06-02 阿里巴巴集团控股有限公司 Hbase data backup/recovery system, method and device and electronic equipment
CN114546728A (en) * 2022-02-28 2022-05-27 浪潮云信息技术股份公司 Method suitable for data backup and recovery of HBase
US20220188194A1 (en) * 2020-12-10 2022-06-16 Coupang Corp. Cloud-based database backup and recovery
CN115098473A (en) * 2022-07-13 2022-09-23 重庆长安汽车股份有限公司 Incremental data migration method and device for database, electronic equipment and storage medium
CN115344428A (en) * 2022-08-12 2022-11-15 广州鼎甲计算机科技有限公司 Data processing method, data processing apparatus, computer device, storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
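HBase Technical Community (HBASE技术社区): "HBase in Practice | How to Migrate Data Across CDP Clusters Using HBase Snapshots" (HBase 实践 | 如何跨CDP集群通过HBase快照迁移数据), pages 1 - 14, Retrieved from the Internet <URL:https://www.qinglite.cn/doc/17746476d380bc85b> *
Xu Hongjun (许红军): "Backup and Cluster Replication of HBase" (Hbase的备份和群集复制), Network Security and Informatization (《网络安全和信息化》), no. 3, 5 March 2018 (2018-03-05), pages 72 - 76 *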

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination