CN112214357A - HDFS data backup and recovery system and backup and recovery method - Google Patents

Info

Publication number
CN112214357A
CN112214357A CN202011188471.1A CN202011188471A CN112214357A CN 112214357 A CN112214357 A CN 112214357A CN 202011188471 A CN202011188471 A CN 202011188471A CN 112214357 A CN112214357 A CN 112214357A
Authority
CN
China
Prior art keywords
backup
recovery
hdfs
data
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011188471.1A
Other languages
Chinese (zh)
Other versions
CN112214357B (en)
Inventor
朱拓之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eisoo Information Technology Co Ltd
Original Assignee
Shanghai Eisoo Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eisoo Information Technology Co Ltd filed Critical Shanghai Eisoo Information Technology Co Ltd
Priority to CN202011188471.1A priority Critical patent/CN112214357B/en
Publication of CN112214357A publication Critical patent/CN112214357A/en
Application granted granted Critical
Publication of CN112214357B publication Critical patent/CN112214357B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an HDFS data backup and recovery system and a backup and recovery method. In the system, each HDFS client in an HDFS unit is connected to a corresponding agent client; the agent clients are jointly connected to a virtual client; the virtual client is connected to a backup server; and the agent clients are also connected to the backup server, in which a storage medium is arranged. The backup server creates backup and recovery tasks, exchanges data with the agent clients, and manages the data on the storage medium. The virtual client locates the agent clients and dispatches the backup and recovery tasks to them. Each agent client executes a backup or recovery task, reading backup objects or writing recovery objects. Each HDFS client receives and responds to the read or write operations of its agent client. Compared with the prior art, the invention supports a variety of backup and recovery requirements, manages backup data effectively, and improves backup and recovery efficiency through concurrent task execution.

Description

HDFS data backup and recovery system and backup and recovery method
Technical Field
The invention relates to the technical field of data backup and recovery, and in particular to an HDFS (Hadoop Distributed File System) data backup and recovery system and a backup and recovery method.
Background
FusionInsight HD is a distributed data processing system that provides large-capacity data storage, query, and analysis capabilities. HDFS (Hadoop Distributed File System) is the underlying storage of FusionInsight HD and provides fault-tolerant, high-throughput storage for upper-layer applications. How to protect the day-to-day data of FusionInsight HD efficiently, and how to restore data promptly when the system becomes abnormal or a major operation does not produce the expected result, so that the impact on services is minimized, has become a pressing task for current HDFS applications.
The existing HDFS backup scheme is based on the snapshot technology provided by HDFS; backup data is either kept in the HDFS file system or stored on external storage. This approach has the following drawbacks:
1. Backup data cannot be managed and used effectively;
2. In some scenarios only full backup is supported, and selective recovery according to user requirements is not possible;
3. When there are multiple backup or recovery objects, backup and recovery efficiency is low.
Disclosure of Invention
The present invention aims to overcome the defects of the prior art by providing an HDFS data backup and recovery system and a backup and recovery method that manage backup data effectively, support a variety of backup and recovery requirements, and improve backup and recovery efficiency.
The purpose of the invention can be realized by the following technical scheme: an HDFS data backup and recovery system comprises an HDFS unit provided with a plurality of HDFS clients, wherein the HDFS clients are correspondingly connected with a plurality of agent clients respectively, the agent clients are connected with a virtual client together, the virtual client is connected with a backup server, the agent clients are also connected with the backup server respectively, a storage medium used for storing backup data is arranged in the backup server, and the backup server is used for creating a backup and recovery task, performing data interaction with the agent clients and performing data management on the storage medium;
the virtual client is used for locating the plurality of agent clients connected with it and dispatching the backup and recovery tasks to them;
the proxy client is used for executing a backup recovery task so as to read an HDFS backup object or write an HDFS recovery object;
the HDFS client is used for receiving and responding to read or write operations provided by the proxy client.
Further, the HDFS client and the proxy client are both located on the same device.
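The topology described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class names `AgentClient`, `VirtualClient`, and `BackupServer`, the `execute`/`dispatch`/`run_task` methods, and the use of a thread pool for concurrency are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class AgentClient:
    """Hypothetical agent co-located with one HDFS client."""
    name: str

    def execute(self, task: str) -> str:
        # A real agent would read or write HDFS objects through its HDFS client.
        return f"{self.name}: {task} done"


@dataclass
class VirtualClient:
    """Hypothetical grouping of the physical agent clients bound to it."""
    agents: list = field(default_factory=list)

    def dispatch(self, task: str) -> list:
        # Run the same task on every bound agent concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(self.agents))) as pool:
            return list(pool.map(lambda agent: agent.execute(task), self.agents))


@dataclass
class BackupServer:
    """Hypothetical server that creates tasks and owns the storage medium."""
    virtual_client: VirtualClient
    storage: dict = field(default_factory=dict)

    def run_task(self, task: str) -> list:
        results = self.virtual_client.dispatch(task)
        self.storage[task] = results  # stand-in for writing to the storage medium
        return results


server = BackupServer(VirtualClient([AgentClient("agent-1"), AgentClient("agent-2")]))
results = server.run_task("full-backup")
```

The server only talks to the virtual client to start a task, while each agent does its own work; this mirrors the division of roles in the text, where task data flows between the agent clients and the backup server.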
An HDFS data backup method comprises the following steps:
a1, according to the data source, advanced backup parameters, and backup type selected by the user, the backup server initiates the backup task and sends the corresponding backup instruction to the plurality of agent clients connected with the virtual client;
a2, the multiple proxy clients respectively obtain the current time of the HDFS from the corresponding HDFS clients;
a3, confirming the backup mode by a plurality of agent clients according to the received backup instruction;
a4, according to the backup mode, a plurality of agent clients respectively obtain backup time objects through corresponding HDFS clients, and transmit the current time point information of the HDFS to a backup server and write the information into a storage medium;
a5, each agent client generates a backup object list by analyzing the data source in the backup task;
a6, according to the backup object list, each agent client checks each backup object in turn to judge whether it has already been backed up, whether it is filtered out, and whether it is incremental data;
a7, each agent client transmits the backup object judged as incremental data to the corresponding HDFS client to read the file block of the backup object, transmits the file block to the backup server, writes the file block into a storage medium, and simultaneously stores the corresponding HDFS connection information and the time point copy complete mark information to complete the backup task.
Further, the data source to be protected in step a1 is specifically an HDFS file or directory.
Further, the step a4 specifically includes the following steps:
a41, if a full backup task is initiated, executing step A44;
a42, if an incremental backup task or a permanent incremental backup task is initiated, the agent client queries the backup server for the types of the existing time points according to the backup task; if a full backup time point is found and the time point copies between that full backup time point and the current time of the HDFS are complete, step A44 is executed; otherwise, the backup type is converted into full backup and then step A44 is executed;
a43, if a differential backup task is initiated, the agent client queries the backup server for the types of the existing time points according to the backup task; if the latest time point is a full backup time point and its time point copy is complete, step A44 is executed; otherwise, the backup type is converted into full backup and then step A44 is executed;
a44, backing up the time object, transmitting the current time point information of the HDFS to the backup server, and writing it into the storage medium.
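The decision in steps A41 to A43 can be sketched as a small function. This is a minimal illustration, assuming the backup server returns the existing time points ordered from oldest to newest; the `TimePoint` record, the type names, and the reading of "copies between the full backup time point and the current time" as "every time point from the most recent full backup onward" are assumptions of this example, not details given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimePoint:
    kind: str        # "full", "incremental", "differential", or "permanent_incremental"
    complete: bool   # whether the time point copy was marked complete


def resolve_backup_type(requested: str, history: List[TimePoint]) -> str:
    """Return the backup type to run; history is ordered oldest to newest."""
    if requested == "full":
        return "full"  # A41: nothing to check
    if requested in ("incremental", "permanent_incremental"):
        # A42: need a full time point and complete copies from it up to now.
        full_indexes = [i for i, tp in enumerate(history) if tp.kind == "full"]
        if full_indexes and all(tp.complete for tp in history[full_indexes[-1]:]):
            return requested
        return "full"
    if requested == "differential":
        # A43: the latest time point must be a complete full backup.
        if history and history[-1].kind == "full" and history[-1].complete:
            return requested
        return "full"
    raise ValueError(f"unknown backup type: {requested}")
```

In every branch the function falls back to a full backup when the required base is missing or incomplete, which is the conversion the steps describe before step A44 runs.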
Further, in step A6, the backup object is passed through a load balancer to determine whether it has already been backed up, and through a file filter to determine whether it is filtered out.
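The text does not say how the load balancer or the file filter work internally. One possible sketch, purely as an assumption, partitions objects among the concurrent agent clients with a stable hash of the path (so that no object is handled twice) and filters with shell-style exclude patterns. The function names and the pattern syntax are hypothetical.

```python
import fnmatch
import zlib


def assigned_to(path: str, agent_index: int, agent_count: int) -> bool:
    """Hypothetical load balancer: each object belongs to exactly one agent."""
    return zlib.crc32(path.encode("utf-8")) % agent_count == agent_index


def is_filtered(path: str, exclude_patterns: list) -> bool:
    """Hypothetical file filter: True if the object matches an exclude pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in exclude_patterns)


def select_objects(paths, agent_index, agent_count, exclude_patterns):
    """Objects this agent should handle after both checks."""
    return [
        p for p in paths
        if assigned_to(p, agent_index, agent_count)
        and not is_filtered(p, exclude_patterns)
    ]


paths = ["/data/a.csv", "/data/b.tmp", "/data/c.csv", "/logs/d.log"]
per_agent = [select_objects(paths, i, 2, ["*.tmp"]) for i in range(2)]
```

Because the hash is deterministic, every agent can evaluate the same rule independently and the selections never overlap; a real load balancer might instead coordinate through the backup server.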
Further, the step a7 specifically includes the following steps:
a71, the agent client transmits the backup object to the HDFS client, reads the file block of the backup object, transmits the file block to the backup server, and writes the file block into the storage medium;
a72, if the backup object is backed up successfully, marking the time point copy as complete, otherwise, marking the time point copy as incomplete;
and A73, after all the backup objects in the backup object list complete the backup and time point copy marking operation, storing the HDFS connection information and the time point copy complete marking information corresponding to the backup objects at the same time to complete the backup task.
An HDFS data recovery method comprises the following steps:
b1, according to the recovery time, the recovery data and the recovery position selected by the user, initiating a recovery task by the backup server, and sending corresponding recovery instructions to a plurality of proxy clients connected with the virtual client;
b2, determining time availability and data information needing to be recovered by a plurality of proxy clients through analyzing parameters according to the received recovery instructions;
b3, each agent client generates a recovery object list by analyzing the data source in the recovery task;
b4, according to the recovery object list, each proxy client respectively and sequentially judges whether each recovery object is recovered or not and whether the recovery object is filtered or not;
and B5, obtaining recovery data through data analysis and new path synthesis, and transmitting the recovery data to the HDFS clients by each agent client to complete recovery tasks.
Further, in step B4, the recovery object is passed through a load balancer to determine whether it has already been recovered, and through a file filter to determine whether it is filtered out.
Further, the step B5 specifically includes the following steps:
b51, obtaining a data file name needing to be recovered through data analysis;
b52, according to the set recovery task, if the recovery task requires to recover to the new path, splicing the new path and the data file name needing to be recovered into recovery data, otherwise, taking the data file name needing to be recovered as the recovery data;
and B53, each agent client transmits the corresponding recovery data to the HDFS client, which writes the recovery data into the HDFS according to the overwrite rule to complete the recovery task.
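The path synthesis in step B52 can be sketched as follows. This is a minimal illustration; the function name `build_restore_path` and the handling of leading slashes are assumptions of this example.

```python
import posixpath
from typing import Optional


def build_restore_path(file_name: str, new_path: Optional[str] = None) -> str:
    """Return the path to write for one recovered object (step B52)."""
    if new_path:
        # Restore to a new location: splice the new path and the file name.
        return posixpath.join(new_path, file_name.lstrip("/"))
    # No new path requested: the stored file name is used unchanged.
    return file_name
```

`posixpath` is used rather than `os.path` because HDFS paths always use forward slashes, whatever platform the agent runs on.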
Compared with the prior art, the invention has the following advantages:
First, the backup server is responsible for creating backup tasks, issuing backup/recovery instructions to the agent clients, receiving the data returned by the agent clients, interacting with the storage medium to read and write data, and managing the data in the storage medium, so that backup tasks and backup data are managed effectively and periodically.
Second, the virtual client is connected with the plurality of agent clients, and the backup and recovery tasks created by the backup server are located and distributed to those agent clients through the virtual client, so that concurrent backup/recovery by multiple clients is supported, the backup/recovery window is shortened, and backup/recovery efficiency is effectively improved.
Third, by acquiring the last modification time of each backup object and the current time of the HDFS and combining them with the backup time point, the invention provides not only full backup but also incremental backup, differential backup, and permanent incremental backup, thereby supporting a variety of backup requirements.
Fourth, the invention strips the file path when backing up data and backs up only the file name and the attributes of the backup object; when recovering data, only the recovery path and the file name need to be spliced to restore the file content and file attributes, which keeps the recovered file attributes consistent with those at backup time and supports recovery to the original location, a different location, the original machine, a different machine, or even a different file system.
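How the last modification time, the HDFS current time, and the backup time point might combine into an incremental check can be sketched as below. The text does not give the exact comparison, so this example assumes the conventional semantics: an incremental backup compares against the most recent backup of any type, a differential backup compares against the last full backup, and objects modified after the captured HDFS time are left for the next run. The function names and the millisecond timestamps are hypothetical.

```python
def reference_time_point(backup_type: str, last_full: int, last_backup: int) -> int:
    """Pick the time point that object modification times are compared against."""
    if backup_type == "differential":
        return last_full       # changes since the last full backup
    if backup_type in ("incremental", "permanent_incremental"):
        return last_backup     # changes since the most recent backup of any type
    return 0                   # full backup: every object qualifies


def is_incremental_data(modified_at: int, reference: int, hdfs_now: int) -> bool:
    """True if the object changed after the reference point and up to the HDFS current time.

    All values are timestamps on the HDFS clock (milliseconds, an assumption).
    """
    return reference < modified_at <= hdfs_now
```

Taking the current time from HDFS rather than from the agent's local clock keeps the comparison on a single clock, which appears to be why the method fetches the HDFS time before doing anything else.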
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a data backup process according to the present invention;
FIG. 3 is a schematic diagram of a data recovery process according to the present invention;
FIG. 4 is a flow diagram illustrating the installation of a proxy client in an embodiment;
FIG. 5 is a flow of creating a virtual client in an embodiment;
FIG. 6 is a schematic diagram of an embodiment of a data backup process;
FIG. 7 is a diagram illustrating a data recovery process according to an embodiment;
Reference numerals in the figures: 1, HDFS unit; 11, HDFS client; 2, agent client; 3, virtual client; 4, backup server; 41, storage medium.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
As shown in fig. 1, an HDFS data backup and recovery system includes an HDFS unit 1 having a plurality of HDFS clients 11, the plurality of HDFS clients 11 are respectively and correspondingly interconnected with a plurality of agent clients 2, the plurality of agent clients 2 are commonly interconnected with a virtual client 3, the virtual client 3 is interconnected with a backup server 4, the plurality of agent clients 2 are also respectively interconnected with the backup server 4, a storage medium 41 for storing backup data is disposed in the backup server 4, the backup server 4 is configured to create a backup and recovery task, perform data interaction with the agent clients 2, and perform data management on the storage medium 41;
the virtual client 3 is used for locating the plurality of agent clients 2 connected with it and dispatching the backup and recovery tasks to them;
the agent client 2 is used for executing a backup recovery task to read an HDFS backup object or write an HDFS recovery object;
the HDFS client 11 is used for receiving and responding to the read or write operation provided by the proxy client 2;
the HDFS client 11 and the proxy client 2 are both located on the same device.
Specifically, the backup server 4 serves as the management console of the backup software and manages all resources, including the virtual client 3, the agent clients 2, and the storage medium 41. It is responsible for creating backup tasks, issuing backup/recovery instructions to the agent clients 2, receiving the data returned by the agent clients 2, interacting with the storage medium 41 to read and write data, and clearing expired data from the storage medium 41. By setting copy retention policies in the backup server, such as the number of copies and the retention time, copies exceeding the set values are cleared automatically, which improves the utilization of the backup storage space; unneeded copies can also be deleted manually;
the storage medium 41 is a data storage unit of backup software for storing backup data;
the virtual client 3 is a set of physical agent clients and ensures that a backup/recovery task can be executed concurrently by multiple clients. The virtual client 3 is used for initiating backup and recovery tasks: it is the virtual client with which a backup or recovery task is associated, and it manages the initiation and the execution result of the whole task. The corresponding agent clients 2 are found through the virtual client 3 to start the task, while the task's data interaction takes place between the agent clients 2 and the backup server 4;
the agent client 2 is used as an agent of backup software on the client, is responsible for interacting with the backup server 4, receiving and responding to commands issued by the backup server 4, and returns execution results to the backup server 4; interacting with an HDFS client 11, reading an HDFS backup object, and writing an HDFS recovery object;
the HDFS client 11 is implemented based on the Hadoop client provided by FusionInsight HD, is located on the same device as the agent client 2, receives and responds to the read/write operations issued by the agent client 2, forwards the corresponding operations to the HDFS, and returns the HDFS responses to the agent client 2.
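The copy retention policy mentioned for the backup server (a maximum number of copies and a retention time) can be sketched as follows. This is an illustrative assumption about how the two limits combine; the `Copy` record and the function name `expired_copies` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    copy_id: str
    created_at: float  # seconds since the epoch


def expired_copies(copies: List[Copy], now: float, max_copies: int,
                   max_age_seconds: float) -> List[Copy]:
    """Return the copies a retention policy would remove, newest copies kept first."""
    newest_first = sorted(copies, key=lambda c: c.created_at, reverse=True)
    expired = []
    for rank, copy in enumerate(newest_first):
        too_many = rank >= max_copies                       # beyond the copy-count limit
        too_old = now - copy.created_at > max_age_seconds   # beyond the retention time
        if too_many or too_old:
            expired.append(copy)
    return expired
```

A production policy would also need to keep any full backup that later incremental copies still depend on; that dependency check is left out of this sketch.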
When the above system is applied in practice, the data backup process is as shown in fig. 2 and includes the following steps:
a1, according to the data source to be protected (an HDFS file or directory), the advanced backup parameters, and the backup type selected by the user, the backup server initiates the backup task and sends the corresponding backup instruction to the plurality of agent clients connected with the virtual client;
a2, the multiple proxy clients respectively obtain the current time of the HDFS from the corresponding HDFS clients;
a3, confirming the backup mode by a plurality of agent clients according to the received backup instruction;
a4, according to the backup mode, the multiple agent clients respectively obtain backup time objects through the corresponding HDFS clients, and transmit the current time point information of the HDFS to the backup server and write the information into the storage medium, specifically:
a41, if a full backup task is initiated, executing step A44;
a42, if an incremental backup task or a permanent incremental backup task is initiated, the agent client queries the backup server for the types of the existing time points according to the backup task; if a full backup time point is found and the time point copies between that full backup time point and the current time of the HDFS are complete, step A44 is executed; otherwise, the backup type is converted into full backup and then step A44 is executed;
a43, if a differential backup task is initiated, the agent client queries the backup server for the types of the existing time points according to the backup task; if the latest time point is a full backup time point and its time point copy is complete, step A44 is executed; otherwise, the backup type is converted into full backup and then step A44 is executed;
a44, backing up the time object, transmitting the current time point information of the HDFS to the backup server, and writing it into the storage medium;
a5, each agent client generates a backup object list by analyzing the data source in the backup task;
a6, according to the backup object list, each agent client checks each backup object in turn to judge whether it has already been backed up, whether it is filtered out, and whether it is incremental data; the backup object is passed through a load balancer to judge whether it has already been backed up, and through a file filter to judge whether it is filtered out;
a7, each agent client transmits the backup object judged as incremental data to the corresponding HDFS client to read the file block of the backup object, transmits the file block to the backup server, writes the file block into the storage medium, and simultaneously stores the corresponding HDFS connection information and the time point copy complete mark information to complete the backup task, specifically:
a71, the agent client transmits the backup object to the HDFS client, reads the file block of the backup object, transmits the file block to the backup server, and writes the file block into the storage medium;
a72, if the backup object is backed up successfully, marking the time point copy as complete, otherwise, marking the time point copy as incomplete;
and A73, after all the backup objects in the backup object list complete the backup and time point copy marking operation, storing the HDFS connection information and the time point copy complete marking information corresponding to the backup objects at the same time to complete the backup task.
When the above system is applied in practice, the data recovery process is as shown in fig. 3 and includes the following steps:
b1, according to the recovery time, the recovery data and the recovery position selected by the user, initiating a recovery task by the backup server, and sending corresponding recovery instructions to a plurality of proxy clients connected with the virtual client;
b2, determining time availability and data information needing to be recovered by a plurality of proxy clients through analyzing parameters according to the received recovery instructions;
b3, each agent client generates a recovery object list by analyzing the data source in the recovery task;
b4, according to the recovery object list, each agent client checks each recovery object in turn to judge whether it has already been recovered and whether it is filtered out; as in the backup process, the recovery object is passed through a load balancer to judge whether it has already been recovered, and through a file filter to judge whether it is filtered out;
b5, obtaining recovery data through data analysis and new path synthesis, and transmitting the recovery data to the HDFS client by each proxy client to complete recovery tasks, specifically:
b51, obtaining a data file name needing to be recovered through data analysis;
b52, according to the set recovery task, if the recovery task requires to recover to the new path, splicing the new path and the data file name needing to be recovered into recovery data, otherwise, taking the data file name needing to be recovered as the recovery data;
and B53, each agent client transmits the corresponding recovery data to the HDFS client, which writes the recovery data into the HDFS according to the overwrite rule to complete the recovery task.
In the invention, the whole backup and recovery system consists of the agent clients, the storage medium, the backup server, and the HDFS clients; the HDFS unit and the backup service communicate over TCP/IP.
The backup/recovery task execution result is determined by all the agent clients associated with the virtual client, and when all the agent clients fail, the task fails, otherwise, the task is successful or partially successful.
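The aggregation rule just stated can be written directly. The function name `task_result` and the status strings are hypothetical; the rule itself (all agents fail means failure, all succeed means success, anything else is partial success) follows the sentence above.

```python
from typing import List


def task_result(agent_succeeded: List[bool]) -> str:
    """Combine per-agent outcomes into the overall task result."""
    if not agent_succeeded:
        raise ValueError("a task needs at least one agent client")
    if all(agent_succeeded):
        return "success"
    if not any(agent_succeeded):
        return "failed"
    return "partial_success"
```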
To build the backup and recovery system, the agent client must be installed and a virtual client must be created. The agent client installation is shown in fig. 4, the virtual client creation in fig. 5, and the specific backup and recovery execution flows in fig. 6 and fig. 7, respectively. The following prerequisites apply: the HDFS client and the agent client are on the same machine; the Hadoop client provided by FusionInsight HD is installed in advance; the NameNode IP of the HDFS cluster is the IP of the primary NameNode, and a stand-alone NameNode is in active mode; the user has the corresponding management permission for the agent client and usage permission for the storage medium; and the correct NameNode IP, user name, Kerberos settings, and similar information are configured for the HDFS to be backed up.
As shown in fig. 4, the FusionInsight HD option must be selected when the agent client is installed:
1. The user starts the client installation program;
2. Select the installation option that supports FusionInsight HD;
3. Enter the locations of the Hadoop Client and of the component_env_C_example script;
4. Run the installation with the parameters provided in step 3, generate the environment variable file, and go to step 5;
5. The installation finishes; if it succeeded, backup of FusionInsight HD HDFS is supported, and if it failed, backup of FusionInsight HD HDFS is not supported.
As shown in fig. 5, the virtual client creation flow is as follows:
1. Create a virtual client, enter the NameNode IP and user name, and go to step 2;
2. Select the physical clients to be bound, set the kerbtickcachepath of each client, and go to step 3;
3. Submit the parameters and go to step 4;
4. Check the validity of the parameters and connect to the HDFS; if the check passes, go to step 5, otherwise go to step 6;
5. Creation succeeds;
6. Creation fails, and an error is reported.
As shown in fig. 6, the backup process is as follows:
1. The user selects the data source to protect (HDFS file or directory), selects the advanced backup parameters and the backup type, and initiates the backup; the backup instruction is sent to the backup agent clients bound to the virtual client;
2. Each backup agent obtains the current time of the HDFS and goes to step 3;
3. Each backup agent receives the backup instruction and confirms the backup type:
3.1 If a full backup is initiated, go to step 4;
3.2 If an incremental backup is initiated, the backup agent queries the backup service for the types of the existing time points according to the task parameters; if a full backup time point is found and the time point copies between that full backup time point and the current time are complete, go to step 4; otherwise convert the backup type to full backup and go to step 4;
3.3 If a differential backup is initiated, the backup agent queries the backup service for the types of the existing time points according to the task parameters; if the latest time point is a full backup time point and its time point copy is complete, go to step 4; otherwise convert the backup type to full backup and go to step 4;
3.4 If a permanent incremental backup is initiated, proceed as in step 3.2;
4. Back up the time object, write the time point information to the storage medium, and go to step 5;
5. Parse the data source, generate the backup object list, and go to step 6;
6. Pass the backup object through the load balancer (when multiple clients run concurrently) and the file filter (when file filtering is enabled); if it passes, go to step 7, otherwise return to step 5;
7. Back up the object: for an incremental backup, judge whether the object has incremental data; if it does, go to step 8, otherwise return to step 5;
8. Pass the backup object to the HDFS Client, read the file blocks of the backup object through the HDFS, write them to the backup storage, and go to step 9;
9. If the backup object is backed up successfully, mark the copy as complete and return to step 5; otherwise mark the copy as incomplete and go to step 11;
10. When all backup objects have been backed up, go to step 11;
11. Save the special metadata (HDFS connection information) and the copy completeness, and end the task of the current backup agent client.
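Steps 5 to 11 of this flow amount to a per-agent loop, sketched below. This is not the patented implementation: the HDFS read and the write to backup storage are injected as plain callables (`read_blocks`, `write_block`), the combined check of steps 6 and 7 is a single `should_back_up` callable, and treating `OSError` as the failure signal is an assumption of this example.

```python
from typing import Callable, Iterable, List


def run_agent_backup(
    objects: List[str],
    should_back_up: Callable[[str], bool],
    read_blocks: Callable[[str], Iterable[bytes]],
    write_block: Callable[[str, bytes], None],
) -> dict:
    """Back up each selected object; stop and mark the copy incomplete on failure."""
    copy_complete = True
    backed_up = []
    for path in objects:                      # step 5: next object in the list
        if not should_back_up(path):          # steps 6-7: balancer, filter, incremental check
            continue
        try:
            for block in read_blocks(path):   # step 8: read file blocks via the HDFS client
                write_block(path, block)      #         and write them to backup storage
        except OSError:
            copy_complete = False             # step 9: mark the copy incomplete
            break                             #         and go straight to step 11
        backed_up.append(path)
    # step 11: the caller would persist the HDFS connection info with this result
    return {"backed_up": backed_up, "copy_complete": copy_complete}
```

Injecting the I/O functions keeps the control flow testable without a cluster; in a real agent they would wrap calls to the HDFS client and to the backup server.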
As shown in fig. 7, the recovery flow is as follows:
1. The user selects the time point to recover, the files or directories to recover, and the recovery location, and initiates the recovery;
2. The agent client receives the recovery instruction, parses the parameters, determines the availability of the time point and the data to be recovered, and goes to step 3;
3. Start the data source reader, parse the data source, generate the recovery objects, and go to step 4;
4. Generate the recovery object list, take out the objects one by one, and go to step 5;
5. The recovery object is checked by the load balancer (when multiple clients run concurrently) and the file filter (when file filtering is enabled); if it passes both, go to step 6, otherwise return to step 4;
6. According to the recovery destination category, synthesize the new path, recover the data, and go to step 7; in this embodiment, the recovery destination categories include the HDFS file system and extX file systems under Linux;
7. The agent client sends the data to the HDFS client, which writes the data into the HDFS according to the overwrite rule; if the write succeeds, go to step 5, and if the write fails, go to step 8;
8. The current agent client ends the recovery task.
In summary, based on the JNI interface provided by HDFS, the invention provides full backup, incremental backup, differential backup, and permanent incremental backup, and because it does not rely on snapshot technology, backup objects can be configured flexibly.

Claims (10)

1. The HDFS data backup and recovery system is characterized by comprising an HDFS unit (1) provided with a plurality of HDFS clients (11), wherein the HDFS clients (11) are respectively and correspondingly connected with a plurality of agent clients (2), the agent clients (2) are commonly connected with a virtual client (3), the virtual client (3) is connected with a backup server (4), the agent clients (2) are also respectively connected with the backup server (4), a storage medium (41) for storing backup data is arranged in the backup server (4), and the backup server (4) is used for creating a backup and recovery task, performing data interaction with the agent clients (2) and performing data management on the storage medium (41);
the virtual client (3) is used for locating the plurality of agent clients (2) connected with it and dispatching the backup and recovery tasks to them;
the proxy client (2) is used for executing a backup recovery task to read an HDFS backup object or write an HDFS recovery object;
the HDFS client (11) is used for receiving and responding to read or write operations provided by the proxy client (2).
2. The HDFS data backup and recovery system according to claim 1, wherein the HDFS client (11) and the proxy client (2) are located on the same device.
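The topology described in claims 1 and 2 can be pictured with a few plain data classes. This is only an illustrative sketch: the class names, the `host` field, and the `dispatch` and `start_task` methods are hypothetical, and the patent does not specify how the virtual client forwards a task to its proxy clients.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HdfsClient:
    host: str  # device on which the HDFS client runs


@dataclass
class ProxyClient:
    host: str                 # same device as its HDFS client (claim 2)
    hdfs_client: HdfsClient
    received: List[str] = field(default_factory=list)

    def handle(self, instruction: str) -> None:
        # A real proxy client would read or write HDFS objects here.
        self.received.append(instruction)


@dataclass
class VirtualClient:
    proxies: List[ProxyClient]

    def dispatch(self, instruction: str) -> int:
        # Forward the task to every connected proxy client.
        for proxy in self.proxies:
            proxy.handle(instruction)
        return len(self.proxies)


@dataclass
class BackupServer:
    virtual_client: VirtualClient
    storage_medium: Dict[str, bytes] = field(default_factory=dict)

    def start_task(self, instruction: str) -> int:
        # Create a task and hand it to the virtual client.
        return self.virtual_client.dispatch(instruction)
```

For example, a server built over two proxy clients would deliver one instruction to each of them when `start_task("backup")` is called.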
3. An HDFS data backup method using the system of claim 1, comprising the following steps:
A1, according to the data source, advanced backup parameters, and backup type selected by the user, the backup server initiates a backup task and sends the corresponding backup instruction to the plurality of proxy clients connected to the virtual client;
A2, the proxy clients each obtain the current time of the HDFS from their corresponding HDFS clients;
A3, the proxy clients confirm the backup mode according to the received backup instruction;
A4, according to the backup mode, the proxy clients each obtain a backup time object through their corresponding HDFS clients, transmit the current time point information of the HDFS to the backup server, and write it into the storage medium;
A5, each proxy client generates a backup object list by parsing the data source in the backup task;
A6, according to the backup object list, each proxy client determines in turn, for each backup object, whether it has already been backed up, whether it is filtered out, and whether it is incremental data;
A7, each proxy client passes each backup object determined to be incremental data to the corresponding HDFS client, which reads the file blocks of the backup object; the file blocks are transmitted to the backup server and written into the storage medium, and the corresponding HDFS connection information and time point copy completeness mark are stored at the same time to complete the backup task.
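The claim does not state how a backup object is recognized as incremental data. One plausible criterion, shown here purely as an assumption, is to compare each file's modification time with the time point recorded by the previous backup. The names `FileStatus`, `modification_time_ms`, and `select_incremental` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FileStatus:
    path: str
    modification_time_ms: int  # assumed to come from the HDFS client


def select_incremental(
    files: Iterable[FileStatus],
    last_time_point_ms: Optional[int],
) -> List[str]:
    """Return paths changed after the last time point (all paths if there is none)."""
    selected: List[str] = []
    for status in files:
        # With no earlier time point, every object counts as new data,
        # which corresponds to a full backup.
        if last_time_point_ms is None or status.modification_time_ms > last_time_point_ms:
            selected.append(status.path)
    return selected
```

Under this assumption, taking the time point from the HDFS itself (step A2) keeps the comparison independent of the clocks on the proxy client devices.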
4. The HDFS data backup method according to claim 3, wherein the data source to be protected in step A1 is specifically an HDFS file or directory.
5. The HDFS data backup method according to claim 3, wherein step A4 specifically includes the following steps:
A41, if a full backup task is initiated, step A44 is executed;
A42, if an incremental backup task or a permanent incremental backup task is initiated, the proxy client queries the backup server for the existing time point types according to the backup task; if a full backup time point is found and the time point copies between the full backup time point and the current time of the HDFS are complete, step A44 is executed; otherwise, the backup type is converted into full backup, and then step A44 is executed;
A43, if a differential backup task is initiated, the proxy client queries the backup server for the existing time point types according to the backup task; if the latest time point is a full backup time point and its time point copy is complete, step A44 is executed; otherwise, the backup type is converted into full backup, and then step A44 is executed;
A44, the time object is backed up: the current time point information of the HDFS is transmitted to the backup server and written into the storage medium.
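The fallback logic of steps A41 to A43 can be sketched as a small decision function. This is an illustrative reading rather than the patented implementation: the `TimePoint` record and the function name are hypothetical, the history is assumed to be ordered from oldest to newest, the incremental case is assumed to look at the most recent full backup time point, and the differential case follows the literal wording of A43.

```python
from dataclasses import dataclass
from typing import List

FULL = "full"
INCREMENTAL = "incremental"
PERMANENT_INCREMENTAL = "permanent_incremental"
DIFFERENTIAL = "differential"


@dataclass
class TimePoint:
    kind: str       # backup type that produced this time point
    complete: bool  # whether its time point copy is marked complete


def resolve_backup_type(requested: str, history: List[TimePoint]) -> str:
    """Return the backup type to run, falling back to full backup if needed."""
    if requested == FULL:
        return FULL  # A41: nothing to check
    if requested in (INCREMENTAL, PERMANENT_INCREMENTAL):
        # A42: find the most recent full backup time point and require every
        # time point copy from there onward to be complete.
        for index in range(len(history) - 1, -1, -1):
            if history[index].kind == FULL:
                chain_complete = all(tp.complete for tp in history[index:])
                return requested if chain_complete else FULL
        return FULL  # no full backup time point exists yet
    if requested == DIFFERENTIAL:
        # A43 (literal reading): the latest time point must be a complete full backup.
        if history and history[-1].kind == FULL and history[-1].complete:
            return DIFFERENTIAL
        return FULL
    raise ValueError(f"unknown backup type: {requested}")
```

Whatever type is returned, the caller would then proceed to step A44 and record the current HDFS time point.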
6. The HDFS data backup method according to claim 3, wherein step A6 specifically comprises: passing the backup object through a load balancer to determine whether the backup object has already been backed up;
and passing the backup object through a file filter to determine whether the backup object is filtered out.
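The two checks in this claim can be illustrated as follows. This is a sketch under stated assumptions: the patent does not describe the internals of the load balancer or the file filter, so the shared "claimed" set, the glob-style exclusion patterns, and all class and function names here are hypothetical.

```python
import fnmatch
import threading
from typing import Iterable, List, Set


class LoadBalancer:
    """Lets exactly one concurrent client claim each object."""

    def __init__(self) -> None:
        self._claimed: Set[str] = set()
        self._lock = threading.Lock()

    def try_claim(self, path: str) -> bool:
        # Returns False if another client already handled this object.
        with self._lock:
            if path in self._claimed:
                return False
            self._claimed.add(path)
            return True


class FileFilter:
    """Excludes objects whose path matches any glob pattern."""

    def __init__(self, exclude_patterns: Iterable[str]) -> None:
        self._patterns: List[str] = list(exclude_patterns)

    def is_filtered(self, path: str) -> bool:
        return any(fnmatch.fnmatchcase(path, p) for p in self._patterns)


def should_process(path: str, balancer: LoadBalancer, file_filter: FileFilter) -> bool:
    # Filter first so that excluded objects are never claimed.
    return not file_filter.is_filtered(path) and balancer.try_claim(path)
```

In a real deployment the claimed set would have to be shared across proxy clients, for instance through the backup server; here it lives in one process only to keep the example self-contained.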
7. The HDFS data backup method according to claim 5, wherein step A7 specifically includes the following steps:
A71, the proxy client passes the backup object to the HDFS client, which reads the file blocks of the backup object; the file blocks are transmitted to the backup server and written into the storage medium;
A72, if the backup object is backed up successfully, the time point copy is marked as complete; otherwise, the time point copy is marked as incomplete;
A73, after the backup and the time point copy marking have been completed for all backup objects in the backup object list, the HDFS connection information and the time point copy completeness marks corresponding to the backup objects are stored to complete the backup task.
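Steps A71 to A73 amount to recording a success flag per backup object and persisting the result once the list is exhausted. The sketch below is one possible reading: the function name `run_backup`, the `backup_one` callback, and the aggregation of per-object marks into a single `time_point_complete` flag are assumptions, not details given in the claim.

```python
from typing import Callable, Dict, Iterable


def run_backup(
    objects: Iterable[str],
    backup_one: Callable[[str], bool],
) -> Dict[str, object]:
    """Back up each object and return the completeness marks to be stored."""
    marks: Dict[str, bool] = {}
    for obj in objects:
        try:
            ok = backup_one(obj)   # A71: read file blocks and send them on
        except OSError:
            ok = False             # an I/O error counts as a failed backup
        marks[obj] = ok            # A72: mark complete or incomplete
    # A73: after every object is processed, the marks are stored together.
    # Assumption: the time point as a whole is complete only if every object succeeded.
    return {"marks": marks, "time_point_complete": all(marks.values())}
```

A later incremental or differential task could consult the stored flag when deciding whether to fall back to a full backup.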
8. An HDFS data recovery method using the system of claim 1, comprising the following steps:
B1, according to the recovery time, recovery data, and recovery location selected by the user, the backup server initiates a recovery task and sends the corresponding recovery instructions to the plurality of proxy clients connected to the virtual client;
B2, the proxy clients parse the parameters of the received recovery instructions to determine the time availability and the data that needs to be recovered;
B3, each proxy client generates a recovery object list by parsing the data source in the recovery task;
B4, according to the recovery object list, each proxy client determines in turn, for each recovery object, whether it has already been recovered and whether it is filtered out;
B5, the recovery data is obtained through data parsing and new path synthesis, and each proxy client transmits the recovery data to the HDFS client to complete the recovery task.
9. The HDFS data recovery method according to claim 8, wherein step B4 specifically comprises: passing the recovery object through a load balancer to determine whether the recovery object has already been recovered;
and passing the recovery object through a file filter to determine whether the recovery object is filtered out.
10. The HDFS data recovery method according to claim 8, wherein step B5 specifically includes the following steps:
B51, the name of the data file to be recovered is obtained through data parsing;
B52, according to the configured recovery task, if the task requires recovery to a new path, the new path and the name of the data file to be recovered are spliced together as the recovery data; otherwise, the name of the data file to be recovered is used as the recovery data;
B53, each proxy client transmits the corresponding recovery data to the HDFS client, which writes it into the HDFS according to the overwrite rule to complete the recovery task.
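The new path synthesis of step B52 and the overwrite (coverage) rule of step B53 can be illustrated with two small helpers. This is a hedged sketch: the claim does not say whether the full original path or only the file name is appended to the new path, nor what the rule's options are, so preserving the original directory structure and using a single boolean `overwrite` flag are assumptions. The function names are hypothetical.

```python
import posixpath
from typing import Optional


def synthesize_target(original_path: str, new_root: Optional[str]) -> str:
    """Return the path to restore to, optionally re-rooted under new_root."""
    if new_root is None:
        return original_path  # restore in place
    # HDFS uses POSIX-style paths; strip the leading slash so that
    # join() appends the original path instead of replacing new_root.
    return posixpath.join(new_root, original_path.lstrip("/"))


def should_write(target_exists: bool, overwrite: bool) -> bool:
    """Apply a simple overwrite rule: skip existing files unless overwriting."""
    return overwrite or not target_exists
```

For instance, restoring `/data/logs/a.log` under the new root `/restore` would give `/restore/data/logs/a.log` in this scheme.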

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011188471.1A CN112214357B (en) 2020-10-30 2020-10-30 HDFS data backup and recovery system and backup and recovery method

Publications (2)

Publication Number Publication Date
CN112214357A (en) 2021-01-12
CN112214357B (en) 2022-12-30

Family

ID=74057663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011188471.1A Active CN112214357B (en) 2020-10-30 2020-10-30 HDFS data backup and recovery system and backup and recovery method

Country Status (1)

Country Link
CN (1) CN112214357B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101741846A (en) * 2009-12-22 2010-06-16 联想网御科技(北京)有限公司 File downloading method, file downloading device and file downloading system
US20130346709A1 (en) * 2012-06-21 2013-12-26 Ca, Inc. Data recovery using conversion of backup to virtual disk
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)
CN106156359A (en) * 2016-07-28 2016-11-23 四川新环佳科技发展有限公司 A kind of data synchronization updating method under cloud computing platform
CN107613026A (en) * 2017-10-31 2018-01-19 四川仕虹腾飞信息技术有限公司 Distributed file management system based on cloud storage system
CN110851302A (en) * 2019-10-31 2020-02-28 上海爱数信息技术股份有限公司 Database information backup method and database information recovery method
CN111352700A (en) * 2020-02-29 2020-06-30 苏州浪潮智能科技有限公司 Method, system, terminal and storage medium for online migration of virtual machine across clouds

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800019A (en) * 2021-03-03 2021-05-14 国网甘肃省电力公司 Data backup method and system based on Hadoop distributed file system
CN113112023A (en) * 2021-06-15 2021-07-13 苏州浪潮智能科技有限公司 Inference service management method, device, system and medium of inference platform
CN113112023B (en) * 2021-06-15 2021-08-31 苏州浪潮智能科技有限公司 Inference service management method and device of AIStation inference platform
US11994958B2 (en) 2021-06-15 2024-05-28 Inspur Suzhou Intelligent Technology Co., Ltd. Inference service management method, apparatus and system for inference platform, and medium
CN114153660A (en) * 2021-11-29 2022-03-08 平安壹账通云科技(深圳)有限公司 Database backup method, device, server and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant