CN108804253B - Parallel operation backup method for mass data backup - Google Patents

Publication number
CN108804253B
CN108804253B CN201710301054.5A CN201710301054A CN108804253B CN 108804253 B CN108804253 B CN 108804253B CN 201710301054 A CN201710301054 A CN 201710301054A CN 108804253 B CN108804253 B CN 108804253B
Authority
CN
China
Prior art keywords
backup
job
file
directory
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710301054.5A
Other languages
Chinese (zh)
Other versions
CN108804253A (en)
Inventor
姚秋玲
陈德清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of High Energy Physics of CAS
Original Assignee
Institute of High Energy Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of High Energy Physics of CAS filed Critical Institute of High Energy Physics of CAS
Priority to CN201710301054.5A priority Critical patent/CN108804253B/en
Publication of CN108804253A publication Critical patent/CN108804253A/en
Application granted granted Critical
Publication of CN108804253B publication Critical patent/CN108804253B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1461Backup scheduling policy

Abstract

The invention discloses a parallel job backup method for mass data backup. The method comprises the following steps: 1) a plurality of backup nodes are selected to form a backup cluster, each backup node having a uniform configuration; 2) a terminal selects one backup node as the backup management server and starts the backup strategy of the object to be backed up; 3) the backup management server selects one backup node as the job scheduler, acquires the directory structure of the backup object layer by layer, and generates a scan job each time it acquires a directory; 4) the backup management server submits each scan job and its job path to the job scheduler, which sends the scan job to a backup node that scans the job's target directory; 5) the backup management server selects the files to be backed up, generates a plurality of file sub-lists, generates a copy job from each sub-list, and sends the copy jobs to the job scheduler; 6) the job scheduler sends different copy jobs to different backup nodes, which copy the files to be backed up to the corresponding locations.

Description

Parallel operation backup method for mass data backup
Technical Field
The present invention relates to a data backup method, and more particularly, to a parallel job backup method for mass data backup.
Background
Data is vital to an enterprise, department, organization, or individual. Data can be lost or damaged for many reasons, such as equipment failure, hacker attacks, viruses, and human error, so data backup is very important. Data backup is a data security policy: a copy of key data is made so that the data can be restored through backup software when a fault occurs, avoiding the losses caused by data loss.
With the continuous development of information technology, emerging technologies such as cloud computing, the Internet of Things, and social networks have caused the types and scale of data in human society to grow explosively worldwide. By 2012, data volumes had climbed from the TB level (1 TB = 1024 GB) to the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), and even ZB (1 ZB = 1024 EB) level. The arrival of the big data era has also driven a rapid increase in backup demand, and mass data at the TB scale and beyond poses new challenges for data backup.
In addition, data storage is becoming more diverse: there are structured traditional relational databases, unstructured non-relational databases, and distributed file systems typified by GFS and HDFS. As the volume and variety of data increase, backing up such data becomes more complex and time-consuming.
In the face of mass data, the main goal in designing and researching a backup system is to make full use of software and hardware resources, meet different backup requirements, and complete data backup and recovery quickly and effectively. Existing backup software has several problems:
1. It is not designed for backing up mass data. The most important part of the backup process is copying the backup object to another machine. Much backup software copies and transmits data as a single data stream, so backup speed and capacity are capped by the bandwidth of one server or network link. Performance is good when thousands or tens of thousands of files are backed up. However, mass data containing tens of millions or even hundreds of millions of files takes days or even weeks, and the backup task cannot be completed within an acceptable time.
2. The backup system has potential single points of failure. Some backup systems deploy multiple backup servers, but different backup servers are responsible for different backup services. Once a server fails, the backup and restore services defined on that server cannot proceed.
3. For security reasons, backup software often adopts a proprietary storage format, so the backup files depend on the backup software. When the software fails, the backup files cannot be used, which is equivalent to having no backup at all.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a parallel job backup method for mass data backup. The method builds a backup cluster, obtains the list of mass data to be backed up by submitting parallel jobs, submits parallel backup jobs according to a customized backup strategy, and stores the backup files in a standard Linux file format.
The invention comprises the following structural layers:
1. Equipment layer: all hardware resources, including the backup cluster (backup nodes, that is, multiple computers), storage resources (a distributed file system built on multiple disk arrays), network resources, and so on. To remove single points of failure, every node in the backup cluster is installed with the same operating system, backup system, and job scheduling and checking software, is configured with the same parameters, and carries the same definitions of all backup strategies and server lists. Any backup node can act as the backup management server or the job scheduling server. The disks distributed across the disk arrays are virtualized into one logical storage space by means of a distributed file system, and this logical storage space is mounted on every backup node as a shared directory. Each backup node in the backup cluster uses the underlying disk arrays by accessing the shared directory. All data that must be saved during backup, including the MySQL database, jobs and job results, backup files, and logs, is stored in the shared directory. One backup node serves as the backup management server and is responsible for the overall backup management task; when it fails, another backup node is started as the management server. Another backup node serves as the job scheduling server, which receives and schedules jobs. Because all backup nodes in the cluster carry the same information and share resources, the failure of a single machine does not affect the normal operation of the whole system.
2. Management layer: this layer comprises all software and applications, including backup strategies, database management, job scheduling, job checking, and access authorization. Every backup node carries this software and these applications.
1) Customizable backup strategies. Different data has different backup requirements; the backup capacity, backup frequency, backup mode (full, incremental, or differential backup), retention time, access level, and so on can be customized as needed.
2) Database management. The MySQL database service is installed and dedicated MySQL tables are created for backup. They record the directory information and file information of each backup object and support file query and recovery.
3) Job scheduling. To speed up the backup of mass data, the extraction of the file list and the copying of the backup object are completed in parallel across the backup cluster in the form of jobs. Jobs with the same function are scripts that contain the same program but different target objects; they are generated in a fixed format and sent to the job scheduler. The job scheduler selects the backup node that executes each job according to the job's execution time and the state of the backup nodes.
4) Job checking. Job check scripts run periodically against the jobs running on the backup nodes, and failed jobs are returned to the job scheduler to be executed again.
5) Access authorization. In the backup system, backup objects are defined by directories, and one backup object can consist of several directories. A list of authorized read-only machines is defined on the backup node, in the format: directory + machine name or IP address where the backup object is located. An authorized machine can query, on the backup node, the information of the backup files related to it, but it cannot delete backup files on the backup node and cannot query or restore other directories.
3. Service layer: all services and API interfaces that the backup system provides externally, including applying for data backup, requesting data recovery, and viewing and retrieving backed-up data. A Linux command-line tool and an API are also provided for querying the backed-up directory information and file information.
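The read-only authorization list described in item 5) of the management layer can be pictured as a set of (directory, machine) pairs. The following is a minimal sketch of one possible reading, not the patented implementation. The names `AUTHORIZED` and `may_query`, the second sample entry, and the rule that a grant on a directory also covers the paths beneath it are all assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical authorization list: each entry pairs a backed-up directory
# with the machine name or IP address where the backup object lives.
AUTHORIZED = {
    ("/home", "login"),
    ("/data/exp", "192.168.1.20"),
}


def may_query(machine: str, path: str) -> bool:
    """Return True if `machine` may query or restore backups of `path`.

    Assumption: a grant on a directory also covers everything beneath it.
    """
    target = PurePosixPath(path)
    for directory, host in AUTHORIZED:
        root = PurePosixPath(directory)
        if host == machine and (target == root or root in target.parents):
            return True
    return False
```

Comparing path components rather than string prefixes keeps a grant on /home from accidentally matching a sibling such as /homework.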
In order to achieve the purpose, the invention adopts the following technical scheme:
A parallel backup method for processing mass data, comprising the following steps (as shown in FIG. 1):
1. Check the backup strategy. A backup strategy is defined for each backup object before backup. After the backup plan starts, all definitions of the backup process are obtained from the backup strategy, including: whether the backup object exists and read permission has been granted; whether the directories involved in the backup process are normal; which backup form is used (full, incremental, or differential backup); in which directory or on which medium the backup files are stored; whether the backup management server and the job scheduling server are normal; by which threshold the file list is split (data capacity or number of files); and which logs and information must be recorded and submitted after the backup finishes. The method also checks whether the MySQL database already holds the directory information table and the file information table of the backup object; if the object is being backed up for the first time, these two tables are created first.
2. Generate parallel directory scan jobs (as shown in FIG. 2). Because each backup object is defined as a directory, its directory structure is acquired layer by layer with a depth-first traversal algorithm. Each time a directory is acquired, a directory record is inserted into the directory information table of the MySQL database and a scan job is generated. The scan job is an executable script whose content is the scan program name and the target directory name. The scan job and its job path are then submitted to the job scheduler.
3. Schedule and run the scan jobs (as shown in FIG. 3). The job scheduler evaluates the overall state of each node in the backup cluster, including the remaining memory, load value, total CPU occupancy, and number of running processes, selects idle backup nodes as execution nodes, and sends the scan job names and job paths to them. Each execution node runs the scan job under the job path, scans the target directory defined in the job script, records the information of all files in that directory, and inserts one file record per file into the MySQL table. When all scan jobs are complete, the MySQL database holds the complete directory information and file information of the backup object.
4. Run the file copy as parallel jobs (as shown in FIG. 4). Depending on whether the backup is full, incremental, or differential, the files to be backed up are extracted from the directory information table and the file information table in the MySQL database and recorded in a text file. For a full backup, every file in the file information table is written to the file list text one entry at a time, each entry containing the file name with its path and the file size. For an incremental or differential backup, the required files are extracted according to file modification time and written to the file list text in the same way. The file list is then split according to the split threshold (data capacity or number of files) and one copy job is generated per part. A copy job consists of tar + file names with paths + the backup file name with its path. The copy jobs are submitted to the job scheduler, which assigns them to idle backup nodes. The backup node executes the tar program and generates a backup file at the path defined by the job (that is, the backup file name with its path). Different jobs copy different files, generate different backup files, and run in parallel on different backup nodes.
5. Check and summarize the backup task. Job check: each scan job or copy job has a set time threshold; once a job times out, failure information is returned to the job scheduler. The job checker finds all failed jobs, generates jobs with the same content, and resubmits them to the job scheduler. A job with the same content is resubmitted at most twice; once this limit is exceeded, an error message is generated and written to the backup log. Resource check: the load value, memory usage, CPU occupancy, disk array usage ratio, and so on of each backup node in the backup cluster are checked and recorded. When a machine is found to be down, the job scheduler is notified to delete that machine name. When a new machine is added, the job scheduler is notified to add its name. When part of the storage has no remaining space or reports an input/output fault, an alarm is generated and written to the backup log. Log summary: as defined by the backup strategy, useful information about the backup run is recorded, including the backup server name, backup object, total backup capacity, backup time, backup mode, and the errors and alarms raised during the backup.
6. Provide query, backup, and restore services. If a user needs to recover lost data, the backup system first checks whether the user's machine is an authorized machine and checks the corresponding directory name. The recovery service is implemented mainly by an interactive script. The user starts the script with a recovery command and, following the prompts of the character interface, enters the directory name or file name to be recovered, the point in time to recover to, the destination path, and so on. The system uses these parameters to locate the backup file that contains the requested file, unpacks it with tar, and copies the requested file to the path specified by the user. The user can also call the API in command + parameter form to view backup file information.
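Step 3 above has the job scheduler pick an idle node from a combined evaluation of remaining memory, load value, CPU occupancy, and number of running processes, but the text does not give the formula. The sketch below shows one way such an evaluation could work. The `NodeState` class, the `score` function, and every weight in it are illustrative assumptions, not values from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NodeState:
    name: str
    free_mem_mb: float   # remaining memory
    load: float          # load value
    cpu_percent: float   # total CPU occupancy, 0-100
    processes: int       # number of running processes


def score(node: NodeState) -> float:
    """Combined state evaluation; lower means more idle.

    The weights are arbitrary assumptions chosen for illustration only.
    """
    return (
        node.load * 10.0
        + node.cpu_percent * 0.5
        + node.processes * 0.05
        - node.free_mem_mb / 1024.0
    )


def pick_node(nodes: Sequence[NodeState]) -> NodeState:
    """Choose the node with the lowest score as the execution node."""
    if not nodes:
        raise ValueError("no backup nodes available")
    return min(nodes, key=score)
```

A scheduler built this way would refresh the `NodeState` values before each assignment, since the resource check in step 5 adds and removes machines over time.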
The invention has the following positive effects:
The invention completes the scanning and data copying of mass data with parallel jobs, makes full use of the performance of a computer cluster, and quickly completes backup tasks over huge data volumes. The backup result is stored in multiple backup files, so only some of them need to be extracted when data is restored, which speeds up file recovery. All backup nodes in the backup system are configured identically and share resources, which eliminates system paralysis caused by the failure of a single machine. Backup and data copying are performed by the standard Linux tar command, and the generated backup files are stored in tar format, which can be read by the tar program shipped with a Linux or Windows operating system. Data can therefore be recovered without depending on the backup system, which improves the availability of the backup system. Selecting compression and encryption parameters for the tar command also reduces network consumption and the risk of data theft. Automatic task checking and automatic log submission after each backup improve the robustness of the backup system and reduce the management burden.
Drawings
FIG. 1 is a flow chart of a parallel backup method of the present invention;
FIG. 2 is a flow chart of a method of generating a scan job;
FIG. 3 is a flowchart of a scan job scheduling and running method;
FIG. 4 is a flow chart of a method of generating a copy job.
Detailed Description
The present invention is further described below with reference to specific examples.
Take the backup of the /home directory on a machine named login as an example. The machine named bak01 is the server responsible for this backup. It first starts a service self-check process, which verifies that the state of the machine is normal, that the configuration related to /home can be obtained, and that each backup process exists. The backup daemon backup_agent for /home is then started. backup_agent starts a checkconf script, which reads the backup strategy defined for the /home directory and returns the required parameters:
1. Backup source directory: login:/home;
2. Backup frequency: one backup per day;
3. Backup level: 0 (full backup);
4. Access level: private (non-public; only the root user on the login machine can retrieve data);
5. Storage directory: bak01:/cluster/day/login_home;
6. Log directory: bak01:/$date/cluster;
7. Task running server: bak01;
8. Scheduling server: bak06;
9. Split threshold: default (by default, one copy process is cut for every 20000 files);
10. Encryption: yes;
11. Retention time: one month;
12. Log mail address: heguans@hotmail.com;
13. Log record selection: all information, including the backup summary, job error reports, and alarm information.
The script then checks whether the source directory exists and can be read, whether the shared storage directory exists and can be written, whether the log directory exists and can be written, and whether the job scheduling server is available.
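The parameters returned by checkconf and the checks that follow can be modeled as a small record plus a validation function. This is a hedged sketch only: the `BackupPolicy` class, its field names, and `check_policy` are hypothetical, the retention period of one month is approximated as 30 days, and the check that the job scheduling server is reachable is left out because the text does not say how it is done.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class BackupPolicy:
    source_dir: str                # e.g. /home on the machine being backed up
    level: int                     # 0 means full backup
    storage_dir: str               # shared directory that receives backup files
    log_dir: str
    split_threshold: int = 20000   # files per copy job (default in the example)
    encrypt: bool = True
    retention_days: int = 30       # "one month", approximated


def check_policy(policy: BackupPolicy) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    src = policy.source_dir
    if not (os.path.isdir(src) and os.access(src, os.R_OK)):
        problems.append(f"source directory not readable: {src}")
    for label, path in (("storage", policy.storage_dir), ("log", policy.log_dir)):
        if not (os.path.isdir(path) and os.access(path, os.W_OK)):
            problems.append(f"{label} directory not writable: {path}")
    return problems
```

Returning a list of problems instead of raising on the first one lets all failures be written to the backup log in a single pass.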
The backup_agent process then starts the baklog process and the finddir process on bak01. baklog generates a log file named /cluster/20170218/log_20170218000300, which stores the information already known, such as the backup source directory, the running server, the backup level, and the storage directory, and continues to record the information generated by each of the processes below, such as job check results, job resubmission results, the data copying process, and alarms. The finddir process filters out files and records only the information of each subdirectory under /home: the unique ID of the directory, the absolute path of the directory, the relative depth of the directory (relative to the /home directory; for example, /home/a is recorded with depth 2 and /home/a/b with depth 3), the parent directory name, and the directory creation time. One record is inserted into the MySQL database for each directory. For example, for the /home/a directory, the ID number in the database is 46821131, the absolute path is /home/a, the directory depth is 2, the parent directory name is /home, and the directory creation time is 2015/08/10. At the same time, a scan job for scanning the /home/a directory is generated, named scanjob + a random number, such as scanjob21; it is written to the shared storage directory /cluster/tmp/job/20170218/login_home and sent to the job scheduler bak06.
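The directory records that finddir produces can be illustrated with a depth-first walk that skips files and counts depth from the backup root, so that the root is 1, a child is 2, and a grandchild is 3, matching the /home example. This is a sketch under assumptions: `directory_records` and its dictionary keys are hypothetical names, `st_ctime` is only a stand-in for the creation time (on Linux it is the last status change time), and the random job names are not checked for collisions.

```python
import os
import random
from pathlib import Path


def directory_records(root: str):
    """Yield one record per directory under `root`, depth-first, files skipped."""
    root_path = Path(os.path.abspath(root))
    stack = [root_path]
    while stack:
        path = stack.pop()
        yield {
            "path": str(path),
            # Depth relative to the backup root: root is 1, root/a is 2, ...
            "depth": 1 + len(path.relative_to(root_path).parts),
            "parent": str(path.parent),
            "ctime": path.stat().st_ctime,
            # Job name follows the "scanjob + random number" pattern.
            "scan_job": f"scanjob{random.randint(0, 999)}",
        }
        subdirs = sorted(
            p for p in path.iterdir() if p.is_dir() and not p.is_symlink()
        )
        # Push in reverse so the alphabetically first subdirectory is visited first.
        stack.extend(reversed(subdirs))
```

Each yielded record corresponds to one row in the directory information table and one scan job to submit.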
After receiving the job submission, the bak06 machine starts the periodic job check process job_handle. It obtains the state evaluation value of every machine in the current backup cluster according to the algorithm and selects the bak03 machine to execute the /cluster/tmp/job/20170218/login_home/scanjob21 script, which scans the files in /home/a and filters out the subdirectories of /home/a. For each file scanned, a file record is inserted into the MySQL database, containing the unique file ID number, file name, absolute file path, user name of the file owner, group name, file size, last modification time, and last status change time. After the scanjob21 job on bak03 completes, the job scheduler bak06 is notified, and a scanjob21.e file and a scanjob21.o file are generated in the /cluster/tmp/job/20170218/login_home directory. If the size of the scanjob21.e file is larger than 0, the scanjob21 job ran into a problem and did not return a correct value.
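What a single scan job does can be sketched as follows: list the files directly inside one target directory, leave subdirectories to their own scan jobs, and insert one row per file. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for the MySQL database named in the text. The table name `file_info`, its columns, and the function `scan_directory` are assumptions, and numeric uid and gid are stored in place of the owner's user and group names.

```python
import os
import sqlite3


def scan_directory(conn: sqlite3.Connection, target: str) -> int:
    """Record every regular file directly under `target`; return the file count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_info ("
        "id INTEGER PRIMARY KEY, name TEXT, path TEXT, uid INTEGER, "
        "gid INTEGER, size INTEGER, mtime REAL, ctime REAL)"
    )
    count = 0
    with os.scandir(target) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # subdirectories are covered by their own scan jobs
            st = entry.stat(follow_symlinks=False)
            conn.execute(
                "INSERT INTO file_info "
                "(name, path, uid, gid, size, mtime, ctime) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (entry.name, os.path.abspath(entry.path), st.st_uid,
                 st.st_gid, st.st_size, st.st_mtime, st.st_ctime),
            )
            count += 1
    conn.commit()
    return count
```

Because each job touches exactly one directory, many such jobs can run on different nodes at once without scanning the same files twice.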
The periodically executed job_handle process checks the .e result files of all jobs, resubmits failed jobs, and records the number of resubmissions of each job, until all job checks and job resubmissions are finished. Once the number of jobs in the job scheduler reaches 0, the job_handle process stops running and returns a value to the backup_agent process on bak01.
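The check that job_handle performs on the .e files, combined with the limit of two resubmissions stated earlier, can be sketched as below. The function `check_jobs`, the constant `MAX_RESUBMITS`, and the in-memory counter dictionary are hypothetical; the text does not say where the resubmission counts are stored or how a resubmitted job is named.

```python
import os
from typing import Dict, List, Tuple

MAX_RESUBMITS = 2  # a job with the same content is resubmitted at most twice


def check_jobs(job_dir: str, resubmits: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Classify jobs by their .e files.

    Returns (to_resubmit, given_up). `resubmits` maps a job name to the
    number of resubmissions so far and is updated in place.
    """
    to_resubmit, given_up = [], []
    for name in sorted(os.listdir(job_dir)):
        if not name.endswith(".e"):
            continue
        if os.path.getsize(os.path.join(job_dir, name)) == 0:
            continue  # empty error file: the job finished cleanly
        job = name[:-2]
        if resubmits.get(job, 0) < MAX_RESUBMITS:
            resubmits[job] = resubmits.get(job, 0) + 1
            to_resubmit.append(job)
        else:
            given_up.append(job)  # caller would write an error to the backup log
    return to_resubmit, given_up
```

A periodic checker would call this function on each pass, hand the first list back to the scheduler, and log the second.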
According to the backup level 0 obtained in the first step, the backup_agent process extracts the list of all files under the /home directory from the file information table of the database and, according to the split threshold, generates one copy job for every 20000 files. All copy jobs, such as dumpjob155, are submitted to the job scheduler on the bak06 machine. bak06 starts the job_handle process a second time, selects bak19 according to the current state evaluation values to execute /cluster/tmp/job/20170218/login_home/dumpjob155, copies the files listed in dumpjob155 with the Linux tar program, and stores the generated backup file in the /cluster/file/login_home/20170218 directory. After each copy job has been executed, a dumpjob155.e file and a dumpjob155.o file are likewise generated for the job_handle process to check. In addition, an index155 file is generated in /cluster/file/login_home/20170218/index to record the names and sizes of those 20000 files. After the job_handle process stops running, the backup_agent process on bak01 is again notified.
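Splitting the file list by a file-count threshold and turning each part into a tar archive with a matching index file can be sketched with Python's `tarfile` module, used here in place of the tar command line so the example is self-contained. The archive name `dump<N>.tar` and the tab-separated index layout are assumptions; the text names only the index file (such as index155) and does not give the backup file name. Encryption and compression are omitted.

```python
import os
import tarfile


def split_file_list(files, threshold=20000):
    """Split a list of (path, size) pairs into sub-lists of at most `threshold` files."""
    return [files[i:i + threshold] for i in range(0, len(files), threshold)]


def run_copy_job(job_no: int, sublist, backup_dir: str) -> str:
    """Archive one sub-list into its own tar file and write a matching index file."""
    index_dir = os.path.join(backup_dir, "index")
    os.makedirs(index_dir, exist_ok=True)  # also creates backup_dir if missing
    archive = os.path.join(backup_dir, f"dump{job_no}.tar")  # hypothetical name
    with tarfile.open(archive, "w") as tar:
        for path, _size in sublist:
            tar.add(path)
    # One line per file: path and size, so a restore can find the right archive.
    with open(os.path.join(index_dir, f"index{job_no}"), "w") as index:
        for path, size in sublist:
            index.write(f"{path}\t{size}\n")
    return archive
```

Because each sub-list goes to its own archive, a restore only needs to open the one archive whose index mentions the requested file.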
Finally, the backup_agent process starts the main process, which calls the Linux sendmail program and mails the content of /cluster/20170218/log_20170218000300 to the responsible administrator at heguans@hotmail.com. At the same time, the backup_agent process starts a garpage process to clean up all directories and files under /cluster/tmp/job, so that no junk files are left behind. At this point, the backup of the /home directory on login is complete, and the backup_agent process stops running.
If a user needs to recover some files under /home, the user must log in to the login machine as root, execute the recovery command, and, following the prompts, submit step by step the directory name or file name to be recovered, the date of the backup to recover from (such as 20170218), and the location to recover to (such as /tmp). bak01 then searches the index files under /cluster/file/login_home/20170218/index, extracts the files the user needs, and copies them to the login:/tmp directory.
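The index lookup during restore can be sketched as follows. The example assumes a hypothetical layout in which each index file `index<N>` holds tab-separated path and size lines and pairs with an archive named `dump<N>.tar` in the parent directory; neither detail is given in the text. The authorization check and the interactive prompts are left out.

```python
import os
import shutil
import tarfile
from typing import Optional


def find_archive(backup_dir: str, wanted: str) -> Optional[str]:
    """Search the index files for `wanted`; return the tar file that holds it."""
    index_dir = os.path.join(backup_dir, "index")
    for name in sorted(os.listdir(index_dir)):
        with open(os.path.join(index_dir, name)) as index:
            for line in index:
                if line.split("\t", 1)[0] == wanted:
                    job_no = name[len("index"):]
                    return os.path.join(backup_dir, f"dump{job_no}.tar")
    return None


def restore_file(backup_dir: str, wanted: str, dest_dir: str) -> str:
    """Copy one backed-up file out of its archive into `dest_dir`."""
    archive = find_archive(backup_dir, wanted)
    if archive is None:
        raise FileNotFoundError(wanted)
    member = wanted.lstrip("/")  # tar stores paths without the leading slash
    target = os.path.join(dest_dir, os.path.basename(wanted))
    with tarfile.open(archive) as tar:
        source = tar.extractfile(member)
        if source is None:
            raise FileNotFoundError(f"{wanted} is not a regular file")
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target
```

Reading the index first means only one archive is opened per requested file, which reflects the point made earlier that only part of the backup files need to be extracted on restore.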

Claims (7)

1. A parallel job backup method for mass data backup, comprising the following steps:
1) selecting a plurality of computers as backup nodes to form a backup cluster, wherein each backup node has a uniform configuration; each disk array is connected to each backup node as a logical volume, and a backup database is built on the logical volume;
2) a terminal requiring backup selects one backup node as the backup management server and starts the backup strategy of the object to be backed up on the backup management server; backup objects are defined in directory form, that is, each backup object corresponds to a directory;
3) the backup management server selects one backup node as the job scheduling server according to the backup strategy and checks whether the directory information table and the file information table of the backup object exist in the backup database, creating them if they do not; it then acquires the directory structure of the backup object layer by layer and, each time a directory is acquired, inserts a piece of directory information into the directory information table and generates a scan job; the scan job comprises a scan program name and a target directory name;
4) the backup management server submits each scan job and its corresponding job path to the job scheduling server; the job scheduling server selects a plurality of backup nodes as execution nodes and sends each scan job and its job path to one execution node; each execution node scans the target directory in the received scan job, records all file information under the target directory, and inserts one piece of file information into the file information table for each scanned file;
5) the backup management server selects the files to be backed up according to the backup strategy, the directory information table, and the file information table to generate a file list, and splits the file list according to a split threshold to obtain a plurality of sub-lists;
6) the backup management server generates one copy job from each sub-list and sends each copy job to the job scheduling server; the copy job comprises a copy program name, file names with paths, and a backup file name with a path;
7) the job scheduling server sends different copy jobs to different backup nodes, and each backup node copies the corresponding files to be backed up to the corresponding location in the logical volume according to the received copy job.
2. The method of claim 1, wherein the information in the backup strategy comprises the read permission of the backup object, the backup form, the directory or medium in which the backup files are stored, the backup node serving as the job scheduling server, the split threshold, and the logs and information to be recorded and submitted after the backup finishes.
3. The method according to claim 2, wherein when the backup cluster receives a request for restoring the backup object from a terminal, the backup cluster first checks whether the terminal is an authorized terminal in the backup policy; if the terminal is an authorized terminal, prompting the terminal to input a directory name or a file name to be recovered, a time point of recovering data and a recovery destination path; and then searching a backup file where the file needing to be restored is located according to the input information, and copying the backup file to a specified path.
4. The method according to claim 1, 2 or 3, wherein the scan job and the copy job each have a set time threshold, and if the execution time exceeds the time threshold, failure information is returned to the job scheduling server; the job scheduling server selects a backup node to execute the failed scan job or copy job again; and if the number of executions of the same scan job or copy job exceeds a set threshold, execution of that job stops and an error message is generated and written to the backup log.
5. The method according to claim 1, 2 or 3, wherein the job scheduling server selects the backup node that executes a job according to the execution time of the job and the state of the backup nodes; wherein the job is the scan job or the copy job.
6. The method according to claim 1, 2 or 3, wherein the disk arrays are mounted in a distributed file system and provide a uniform name.
7. The method of claim 1, 2 or 3, wherein the disks distributed on a plurality of disk arrays are virtualized into a logical storage space in a distributed file system manner, and the logical storage space is mounted to each backup node in a shared directory; each backup node in the backup cluster uses the underlying disk array by accessing the shared directory.
CN201710301054.5A 2017-05-02 2017-05-02 Parallel operation backup method for mass data backup Active CN108804253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710301054.5A CN108804253B (en) 2017-05-02 2017-05-02 Parallel operation backup method for mass data backup

Publications (2)

Publication Number Publication Date
CN108804253A CN108804253A (en) 2018-11-13
CN108804253B CN108804253B (en) 2021-08-06

Family

ID=64053876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710301054.5A Active CN108804253B (en) 2017-05-02 2017-05-02 Parallel operation backup method for mass data backup

Country Status (1)

Country Link
CN (1) CN108804253B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522160B (en) * 2018-11-29 2020-05-05 上海英方软件股份有限公司 Method and system for comparing and backing up file directory by saving file information abstract
CN109558215B (en) * 2018-12-10 2021-09-07 深圳市木浪云数据有限公司 Backup method, recovery method and device of virtual machine and backup server cluster
CN109976945A (en) * 2019-02-26 2019-07-05 深圳市买买提信息科技有限公司 A kind of method and device of Log backup
CN109901951A (en) * 2019-03-05 2019-06-18 山东浪潮云信息技术有限公司 A kind of storage system and method for ceph company-data
CN110688430B (en) * 2019-08-22 2023-01-10 创新先进技术有限公司 Method and device for obtaining data bypass and electronic equipment
CN110618898A (en) * 2019-09-11 2019-12-27 厦门鑫朗软件有限公司 Method for forced saving file to appointed directory synchronous backup according to process
CN110795404B (en) * 2019-10-31 2023-04-07 京东方科技集团股份有限公司 Hadoop distributed file system and operation method and repair method thereof
CN110968463B (en) * 2019-12-19 2022-08-30 北京五八信息技术有限公司 Method and device for determining types of data nodes in group
CN111159313B (en) * 2019-12-31 2020-11-13 广州鼎甲计算机科技有限公司 Method, system, device and storage medium for database rapid synthesis backup
CN111339037B (en) * 2020-02-14 2023-06-09 西安奥卡云数据科技有限公司 Efficient parallel replication method for parallel distributed file system
CN113157645B (en) * 2021-04-21 2023-12-19 平安科技(深圳)有限公司 Cluster data migration method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1567198A (en) * 2003-06-30 2005-01-19 联想(北京)有限公司 Method for mirror backup of cluster platform cross parallel system
CN105302667A (en) * 2015-10-12 2016-02-03 国家计算机网络与信息安全管理中心 Cluster architecture based high-reliability data backup and recovery method
US9600487B1 (en) * 2014-06-30 2017-03-21 EMC IP Holding Company LLC Self healing and restartable multi-steam data backup
CN106648967A (en) * 2016-10-14 2017-05-10 曙光信息产业(北京)有限公司 File scanning method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN103761162B (en) * 2014-01-11 2016-12-07 深圳清华大学研究院 The data back up method of distributed file system

Non-Patent Citations (3)

Title
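The Amanda Network Backup Manager; James da Silva et al.; LISA; 1993-11-30; full text *
企业级开源备份软件在图书馆数据中心的应用 (Application of enterprise-grade open-source backup software in a library data center); 张媛 (Zhang Yuan); 《图书馆学刊》 (Journal of Library Science); 2014-09-30; full text *
基于数据库的文件系统管理工具设计与实现 (Design and implementation of a database-based file system management tool); 石京燕 (Shi Jingyan) et al.; 《计算机工程》 (Computer Engineering); 2015-05-31; full text *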

Also Published As

Publication number Publication date
CN108804253A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108804253B (en) Parallel operation backup method for mass data backup
US7933872B2 (en) Database backup, refresh and cloning system and method
KR101658964B1 (en) System and method for datacenter workflow automation scenarios using virtual databases
KR101617339B1 (en) Virtual database system
US10992676B2 (en) Leveraging blockchain technology for auditing cloud service for data protection compliance
US20190196919A1 (en) Maintaining files in a retained file system
US11675741B2 (en) Adaptable multi-layered storage for deduplicating electronic messages
US11194669B2 (en) Adaptable multi-layered storage for generating search indexes
CN110209653B (en) HBase data migration method and device
US20240095380A1 (en) Blockchain technology for regulatory compliance of data management systems
US11392460B2 (en) Adaptable multi-layer storage with controlled restoration of protected data
US11003364B2 (en) Write-once read-many compliant data storage cluster
US8832030B1 (en) Sharepoint granular level recoveries
US20230273864A1 (en) Data management system with limited control of external compute and storage resources
US11080142B2 (en) Preservation of electronic messages between snapshots
US11966297B2 (en) Identifying database archive log dependency and backup copy recoverability
US11436089B2 (en) Identifying database backup copy chaining
US11436193B2 (en) System and method for managing data using an enumerator
US20230214511A1 (en) Database management engine for a database management system
US20240004712A1 (en) Fencing off cluster services based on shared storage access keys
US20240005017A1 (en) Fencing off cluster services based on access keys for shared storage
US20230306129A1 (en) Sensitive data discovery for databases
Verma A peer to peer System for Data Management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant