
CN108984345B - Big data backup method based on virtual shared directory

Info

Publication number: CN108984345B
Application number: CN201810776448.0A
Authority: CN
Inventors: 匙凯; 于富东; 胡建华; 杨林; 崔明阳
Original assignee: Jilin Jlu Communication Design Institute Co ltd
Current assignee: Jilin Jlu Communication Design Institute Co ltd
Priority date: 2018-07-11
Filing date: 2018-07-11
Publication date: 2020-06-23
Anticipated expiration: 2038-07-11
Also published as: CN108984345A

Abstract

A big data backup method based on a virtual shared directory. Local storage on a media server exposes a file sharing protocol interface to the outside, and a virtual shared directory is established. The interface is provided to a big data platform A that needs to be backed up; when platform A needs a backup, it mounts the partition locally and thereby obtains share rights to the virtual directory. After the backup is finished, the partition is disconnected and returned to the media server, which can then provide the shared directory service to another storage server.

Description

Big data backup method based on virtual shared directory

Technical Field

The invention belongs to the technical field of data backup, and particularly relates to a big data backup method for improving big data backup efficiency.

Background

In the big data era, data is increasingly valuable, and the security of data running on big data platforms must be guaranteed. A faster and more universal backup technology is therefore needed to back up data from various big data platforms while ensuring backup efficiency and compatibility.

At present, data backup for big data platforms generally follows an architecture consisting of three parts: backup agents, media servers and storage media.

The details of the specific implementation can be roughly divided into the following two types:

(1) Client agent → (HTTP) → Media server → (iSCSI) → Storage medium

The backup agent is installed on the big data host to be backed up, collects the backup data and transmits it to the media server over the network using HTTP. The media server is usually deployed independently; it collects data from each backup agent, performs deduplication and compression, and then transmits the data through an iSCSI interface to a storage medium (such as disk) for storage.

(2) Client agent → (HTTP) → Media server → (HTTP) → Storage medium

The backup agent is installed on the big data host to be backed up, collects the backup data and transmits it to the media server over HTTP. The independently deployed media server collects data from each backup agent, performs deduplication and compression, and then transmits the data through an HTTP interface to a storage medium (such as object storage) for storage.

In prior art (1), a dedicated collection client is required for each type of backup object. The agent must first transfer data from the real data source, such as Hadoop, to a temporary directory on the host; the data in that directory is then cut into blocks (for example, one 64 KB block at a time), and each block is transmitted to the media server over HTTP. After receiving the data, the media server performs a series of deduplication and compression operations and then transmits the data to a dedicated storage medium (such as disk) over an FC network using the iSCSI protocol. Throughout the process the data passes through four key time-consuming steps: local temporary storage on the agent, local block splitting, network transmission to the media server, and network transmission from the media server to the storage medium. Backup efficiency is therefore difficult to guarantee, and the many links in the chain increase the operational risk of the system.
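To make the per-block cost of prior art (1) concrete, the following Python sketch models the agent-side block cutting and the media-server-side deduplication and compression. It is an illustration under stated assumptions, not a description of any particular product: SHA-256 fingerprints, zlib compression and an in-memory dict standing in for the storage medium are choices made here, and the HTTP and iSCSI transfers are omitted.

```python
import hashlib
import zlib
from typing import Dict, List

BLOCK_SIZE = 64 * 1024  # 64 KB, the example block size given in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Agent side: cut a byte stream into fixed-size blocks (last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def media_server_ingest(blocks: List[bytes], store: Dict[str, bytes]) -> int:
    """Media server side: deduplicate by SHA-256 digest, compress new blocks.

    `store` is a stand-in for the storage medium (assumption). Returns the
    number of blocks actually written.
    """
    written = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # a block already seen is skipped
            store[digest] = zlib.compress(block)
            written += 1
    return written
```

Every byte is touched at least three times here (staged, split, fingerprinted and compressed) before any network transfer, which is the overhead the invention aims to avoid by copying files into a shared directory instead.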

Prior art (2) differs from (1) in that, after the data reaches the media server, it is not sent directly to the storage medium via iSCSI; instead it is cut into blocks again and transmitted to object storage over HTTP. Since the only difference is the back-end storage protocol, the overall efficiency and risk problems remain. Moreover, corresponding client agents still have to be developed to collect data from each type of big data platform, so the complexity and compatibility of the backup system are not improved. A new solution to this problem is therefore needed in the art.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a big data backup method based on a virtual shared directory that improves the compatibility of the data backup system across heterogeneous big data platforms, simplifies the backup process of the big data platform backup system, and improves backup efficiency.

A big data backup method based on a virtual shared directory comprises the following steps:

step one, establishing a virtual shared data storage backup system comprising a big data platform, a backup medium layer, a medium service layer and a storage medium;

step two, the big data platform initiates a backup request to the system; the backup medium layer remotely mounts the NFS (Network File System) agent onto the big data platform and provides the big data platform with a virtual shared directory based on the NFS protocol, and the data is temporarily stored in the internal directory of the NFS agent;
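The patent does not specify how the NFS agent's directory is exported or mounted. On Linux, one plausible realization of step two uses the standard `/etc/exports` file on the agent host and `mount -t nfs` on the platform host. The Python sketch below only builds those strings; the host names, paths and export options (`rw,sync,no_subtree_check`) are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NfsShare:
    """A virtual shared directory exported by the NFS agent (hypothetical names)."""

    server: str       # host running the NFS agent, e.g. the media server
    export_path: str  # directory the NFS agent exports
    client: str       # address of the big data platform host allowed to mount it

    def exports_line(self) -> str:
        """Line for the agent host's /etc/exports (standard Linux NFS syntax)."""
        return f"{self.export_path} {self.client}(rw,sync,no_subtree_check)"

    def mount_command(self, mount_point: str) -> List[str]:
        """Command the platform host would run to mount the share remotely."""
        return ["mount", "-t", "nfs", f"{self.server}:{self.export_path}", mount_point]

    def umount_command(self, mount_point: str) -> List[str]:
        """Command that disconnects the virtual sharing link (step three)."""
        return ["umount", mount_point]
```

After the line is added to `/etc/exports`, `exportfs -ra` re-exports all directories; the lists returned by `mount_command` and `umount_command` could then be passed to `subprocess.run` on the platform host with root privileges. Keeping the command construction separate from execution makes the sequence easy to review before anything is mounted.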

step three, after the NFS agent provided by the backup medium layer finishes the temporary storage, the virtual sharing link is disconnected, and the data of the big data platform now resides in the backup medium layer;

step four, after the data is processed in the backup medium layer, the NFS agent is sent to the storage medium, and the data of the big data platform is retained on the storage medium;

step five, the big data platform initiates a data recovery request; the corresponding data on the storage medium is brought back to the backup medium layer, a shared virtual directory is established through the NFS agent, and the shared virtual directory is sent to the medium service layer;

step six, the medium service layer mounts the NFS agent onto the big data platform again, and the big data platform obtains file-level access rights to the data;

step seven, the big data platform restores the data to the production environment, completing the data restoration and the big data backup based on the virtual shared directory.
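The seven steps can be read as a sequence of mount and unmount events on a single NFS share. The toy Python model below records those events instead of performing I/O; the class name, state labels and log strings are hypothetical, and the "processing" in the backup medium layer is reduced to a log entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualShareWorkflow:
    """Toy model of steps two to seven: records events instead of doing I/O."""

    platform: str
    mounted_on: Optional[str] = None  # current holder of the NFS agent's share
    data_at: str = "platform"         # where the platform's backup data resides
    log: List[str] = field(default_factory=list)

    def _mount(self, target: str) -> None:
        self.mounted_on = target
        self.log.append(f"mount share -> {target}")

    def _unmount(self) -> None:
        self.log.append(f"unmount share <- {self.mounted_on}")
        self.mounted_on = None

    def backup(self) -> None:
        self._mount(self.platform)       # step two: first remote mount
        self.data_at = "nfs_agent"       # platform copies files into the share
        self._unmount()                  # step three: disconnect the link
        self.log.append("process data in backup medium layer")
        self._mount("storage_medium")    # step four: second remote mount
        self.data_at = "storage_medium"  # data persisted on the storage medium

    def restore(self) -> None:
        if self.data_at != "storage_medium":
            raise RuntimeError("no backup on the storage medium to restore")
        self.log.append("rebuild shared directory from storage medium")  # step five
        self._unmount()                  # hand the share to the medium service layer
        self._mount(self.platform)       # step six: mount on the platform again
        self.data_at = "platform"
        self.log.append("restore data to production")  # step seven
```

In this model the backup mounts the share on the platform first and the storage medium second, while the restore starts from the storage side and ends on the platform, which matches the reversed order of the two shares described for recovery.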

The storage medium is a physical terminal device that actually stores the data; it can be partitioned automatically and can hold the backup data of more than one big data platform at the same time.

The backup medium layer serves as the data receiving layer that adapts the NFS agent to the storage medium, and is used for temporary storage and processing of the data.
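How the storage medium partitions itself is not described. As a hedged illustration of "one storage medium, several platforms at once", the sketch below keeps a simple per-platform partition table with capacity checks; the GB units, the one-partition-per-platform rule and the partition naming scheme are all assumptions.

```python
from typing import Dict


class StorageMedium:
    """Hypothetical model of a storage medium split into per-platform partitions."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        self.partitions: Dict[str, int] = {}  # platform name -> partition size (GB)

    def free_gb(self) -> int:
        """Capacity not yet assigned to any partition."""
        return self.capacity_gb - sum(self.partitions.values())

    def allocate(self, platform: str, size_gb: int) -> str:
        """Carve out a new partition for one platform's backup data."""
        if platform in self.partitions:
            raise ValueError(f"{platform} already has a partition")
        if size_gb <= 0 or size_gb > self.free_gb():
            raise ValueError("not enough free space on the storage medium")
        self.partitions[platform] = size_gb
        return f"part-{len(self.partitions):02d}-{platform}"
```

A real device would also have to handle releasing and resizing partitions; those operations are left out because the text says nothing about them.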

Through the above design, the invention achieves the following beneficial effects: the big data backup method based on a virtual shared directory improves the compatibility of the data backup system across heterogeneous big data platforms, simplifies the backup process of the big data platform backup system, and improves backup efficiency.

The invention further achieves the following beneficial effects: it creates the virtual shared directory through two remote mounts, which eliminates the complexity caused by the repeated processing and transmission in existing backup software and improves backup and recovery efficiency.

The remote mounting in the invention is supported by the NFS protocol. Because NFS is a universal file protocol that can be adapted to many big data platforms, the invention improves the compatibility of big data platform backup without requiring traditional backup software to provide a separate client for each platform.

Drawings

The invention is further described with reference to the following figures and detailed description:

Fig. 1 is a schematic block diagram of the process of the big data backup method based on a virtual shared directory according to the invention.

Detailed Description

step four, after the data is processed in the backup medium layer, the NFS agent is sent to the storage medium, and the data of the big data platform is retained on the storage medium. The virtual shared directory is provided on the storage medium by remote mounting, which persists the backup data to disk on the storage medium; that is, the shared directory is kept on the storage medium as storage. If another big data platform needs to be backed up at this point, a new partition is created on the storage medium to store the new backup data;
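Since the text states that the backup is achieved simply by copying files, persisting the NFS agent's staged directory onto a storage-medium partition can be sketched as a directory copy followed by a content check. The directory layout below is hypothetical (in practice the partition would be the remotely mounted directory on the storage medium), and the SHA-256 verification is an added safeguard, not something the patent describes.

```python
import hashlib
import shutil
from pathlib import Path


def persist_share(staging_dir: Path, partition_dir: Path) -> int:
    """Copy the NFS agent's staged files onto a storage-medium partition.

    Both paths are hypothetical. Returns the number of files copied and
    raises OSError if any copy does not match its source.
    """
    target = partition_dir / staging_dir.name
    shutil.copytree(staging_dir, target)  # raises if `target` already exists
    count = 0
    for src in staging_dir.rglob("*"):
        if src.is_file():
            dst = target / src.relative_to(staging_dir)
            # Verify the copy by comparing SHA-256 digests of both files.
            if hashlib.sha256(src.read_bytes()).digest() != hashlib.sha256(dst.read_bytes()).digest():
                raise OSError(f"copy mismatch for {src}")
            count += 1
    return count
```

Reading whole files into memory keeps the sketch short; large backup files would be hashed in chunks instead.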

The invention exposes a file sharing protocol interface to the outside through local storage on the media server and establishes a virtual shared directory. The interface is provided to a big data platform A that needs to be backed up: when platform A needs a backup, it mounts the partition locally and obtains share rights to the virtual directory. After the backup is finished, the partition is disconnected and returned to the media server, which can then provide the shared directory service to another storage server. The big data files are thus backed up simply by copying files.

The recovery process mirrors the backup process, except that the two data-sharing (mount) operations are performed in the opposite order.
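The hand-off of a partition (mounted by platform A, disconnected, returned to the media server, then offered to another storage server) can be modelled as a single-holder lease. The sketch below assumes that only one host holds a partition at a time; the text implies this ordering but does not state that concurrent mounts are forbidden, and the class and host names are hypothetical.

```python
from typing import Optional


class SharedPartition:
    """Hypothetical lease model: a partition is mounted by one host at a time."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.holder: Optional[str] = None  # None: partition is back on the media server

    def attach(self, host: str) -> None:
        """Mount the partition on `host`, granting it share rights."""
        if self.holder is not None:
            raise RuntimeError(f"{self.name} is still mounted on {self.holder}")
        self.holder = host

    def detach(self, host: str) -> None:
        """Disconnect after the backup; the partition returns to the media server."""
        if self.holder != host:
            raise RuntimeError(f"{host} does not hold {self.name}")
        self.holder = None
```

Refusing a second `attach` until the first holder detaches prevents two hosts from writing to the same partition mid-backup.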

Claims

1. A big data backup method based on a virtual shared directory, characterized by comprising the following steps:

2. The big data backup method based on a virtual shared directory according to claim 1, characterized in that the storage medium is a disk that actually stores the data; it can be partitioned automatically and can hold the backup data of more than one big data platform at the same time.

3. The big data backup method based on a virtual shared directory according to claim 1, characterized in that the backup medium layer serves as the data receiving layer that adapts the NFS agent to the storage medium, and is used for temporary storage and processing of the data.