CN115543688B - Backup method, backup device, proxy terminal and storage medium - Google Patents

Info

Publication number
CN115543688B
CN115543688B CN202211197326.9A CN202211197326A CN115543688B CN 115543688 B CN115543688 B CN 115543688B CN 202211197326 A CN202211197326 A CN 202211197326A CN 115543688 B CN115543688 B CN 115543688B
Authority
CN
China
Prior art keywords
backup
fingerprint
fingerprints
backed
storage server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211197326.9A
Other languages
Chinese (zh)
Other versions
CN115543688A (en
Inventor
马立珂
王贤达
王子骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dingjia Computer Technology Co ltd
Original Assignee
Guangzhou Dingjia Computer Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Dingjia Computer Technology Co ltd filed Critical Guangzhou Dingjia Computer Technology Co ltd
Priority to CN202211197326.9A priority Critical patent/CN115543688B/en
Publication of CN115543688A publication Critical patent/CN115543688A/en
Application granted granted Critical
Publication of CN115543688B publication Critical patent/CN115543688B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1451Management of the data involved in backup or backup restore by selection of backup contents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a backup method, a backup device, an agent end and a storage medium that speed up fingerprint queries and improve backup efficiency. Before a backup, fingerprints are downloaded from a backup storage server according to the type of the backup, a modulo operation is performed on each downloaded fingerprint, and each fingerprint is stored in the local fingerprint file whose number equals its modulo result. During the backup, a fingerprint is calculated for each data block to be backed up, the same modulo operation is applied, and the local fingerprint file whose number equals the result is searched for an identical fingerprint. The search result is used to determine whether the backup storage server already holds that fingerprint, and accordingly either the data block together with its fingerprint, or the fingerprint alone, is transmitted to the backup storage server.

Description

Backup method, backup device, proxy terminal and storage medium
Technical Field
The present invention relates to the field of backup technologies, and in particular, to a backup method, apparatus, agent side, storage medium, and computer program product.
Background
In a deduplicating backup system, an agent end performing a backup must frequently ask the backup storage server whether the fingerprint of each data block to be backed up already exists. If it does not, the agent end transmits the data block and its fingerprint to the backup storage server; if it does, the agent end transmits only the fingerprint.
To speed up fingerprint queries, the agent end can download, before the backup, the fingerprints of data blocks that it is likely to need and store them in local fingerprint files in its local file system. However, when there are multiple local fingerprint files, there has been no practical scheme for deciding which local fingerprint file a downloaded fingerprint should be stored in, so the query speed cannot be improved further.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a backup method, apparatus, agent side, storage medium, and computer program product.
The application provides a backup method, which comprises the following steps:
before the current backup, if the type of the current backup is a full backup, downloading from a backup storage server the fingerprints of the data blocks of the last full backup and of the backups performed since the last full backup; where a differential backup and an incremental backup have been performed in sequence after the last full backup and before the current backup, if the type of the current backup is a differential backup, downloading from the backup storage server the fingerprints of the data blocks of the last differential backup and of the incremental backups performed since the last differential backup; where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, downloading from the backup storage server the fingerprints of the data blocks of the last full backup and of the incremental backups other than the last incremental backup;
taking the number of local fingerprint files in the local file system as the divisor, performing a modulo operation on each downloaded fingerprint, numbering each local fingerprint file with a modulo result, and storing each downloaded fingerprint in the local fingerprint file whose number equals that fingerprint's modulo result;
during the backup, calculating a fingerprint for each data block to be backed up, performing a modulo operation on the calculated fingerprint with the number of local fingerprint files in the local file system as the divisor, and determining the local fingerprint file whose number equals the modulo result;
searching the local fingerprint file whose number equals the modulo result for a fingerprint identical to the fingerprint of the data block to be backed up;
if the search result shows that the local fingerprint file contains a fingerprint identical to the fingerprint of the data block to be backed up, transmitting only the fingerprint of the data block to be backed up to the backup storage server;
if the search result shows that the local fingerprint file does not contain a fingerprint identical to the fingerprint of the data block to be backed up, and it is determined that the backup storage server holds a fingerprint identical to the fingerprint of the data block to be backed up, transmitting only the fingerprint of the data block to be backed up to the backup storage server;
and if the search result shows that the local fingerprint file does not contain a fingerprint identical to the fingerprint of the data block to be backed up, and it is determined that the backup storage server does not hold a fingerprint identical to the fingerprint of the data block to be backed up, transmitting the data block to be backed up and its fingerprint to the backup storage server.
The application provides a backup device, the device includes:
a module configured to, before the current backup: if the type of the current backup is a full backup, download from the backup storage server the fingerprints of the data blocks of the last full backup and of the backups performed since the last full backup; if the type of the current backup is a differential backup, download from the backup storage server the fingerprints of the data blocks of the last differential backup and of the incremental backups performed since the last differential backup; if the current backup is an incremental backup, download from the backup storage server the fingerprints of the data blocks of the last full backup and of the incremental backups other than the last incremental backup;
a fingerprint storage module, configured to take the number of local fingerprint files in the local file system as the divisor, perform a modulo operation on each downloaded fingerprint, number each local fingerprint file with a modulo result, and store each downloaded fingerprint in the local fingerprint file whose number equals that fingerprint's modulo result;
a fingerprint file determining module, configured to calculate, during the backup, a fingerprint for each data block to be backed up, perform a modulo operation on the calculated fingerprint with the number of local fingerprint files in the local file system as the divisor, and determine the local fingerprint file whose number equals the modulo result;
a fingerprint searching module, configured to search the local fingerprint file whose number equals the modulo result for a fingerprint identical to the fingerprint of the data block to be backed up;
a transmission module, configured to transmit only the fingerprint of the data block to be backed up to the backup storage server if the search result shows that the local fingerprint file contains a fingerprint identical to the fingerprint of the data block to be backed up;
the transmission module being further configured to transmit only the fingerprint of the data block to be backed up to the backup storage server if the search result shows that the local fingerprint file does not contain a fingerprint identical to the fingerprint of the data block to be backed up, and it is determined that the backup storage server holds such a fingerprint;
the transmission module being further configured to transmit the data block to be backed up and its fingerprint to the backup storage server if the search result shows that the local fingerprint file does not contain a fingerprint identical to the fingerprint of the data block to be backed up, and it is determined that the backup storage server does not hold such a fingerprint.
The application provides an agent end (proxy terminal) comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the above method when executing the computer program.
The present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the above method.
The present application provides a computer program product storing a computer program which, when executed by a processor, implements the above method.
Before the current backup, fingerprints are downloaded from the backup storage server according to the type of the backup. If the current backup is a full backup, the fingerprints of the data blocks of the last full backup and of the backups performed since the last full backup are downloaded. Where a differential backup and an incremental backup have been performed in sequence after the last full backup and before the current backup, if the current backup is a differential backup, the fingerprints of the data blocks of the last differential backup and of the incremental backups performed since the last differential backup are downloaded. Where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, the fingerprints of the data blocks of the last full backup and of the incremental backups other than the last incremental backup are downloaded. The number of local fingerprint files in the local file system is then taken as the divisor, a modulo operation is performed on each downloaded fingerprint, each local fingerprint file is numbered with a modulo result, and each downloaded fingerprint is stored in the local fingerprint file whose number equals its modulo result, so that the downloaded fingerprints are distributed across the local fingerprint files by modulo result.
During the backup, a fingerprint is calculated for each data block to be backed up, a modulo operation is performed on it with the number of local fingerprint files in the local file system as the divisor, and the local fingerprint file whose number equals the modulo result is searched for a fingerprint identical to the fingerprint of the data block to be backed up. If the search result shows that the local fingerprint file contains such a fingerprint, only the fingerprint of the data block is transmitted to the backup storage server. If the local fingerprint file does not contain such a fingerprint but it is determined that the backup storage server holds one, only the fingerprint of the data block is transmitted to the backup storage server. If the local fingerprint file does not contain such a fingerprint and it is determined that the backup storage server does not hold one either, the data block and its fingerprint are transmitted to the backup storage server.
In this way, the fingerprints of data blocks that may be needed during the backup are downloaded and distributed into local fingerprint files by the modulo operation. During the backup, the fingerprint of each data block to be backed up is calculated and reduced by the same modulo operation, and only the local fingerprint file whose number equals the result is searched for an identical fingerprint in order to judge whether the fingerprint exists on the backup storage server. This makes fingerprint lookup fast, reduces network round-trip delay, and improves backup efficiency.
Drawings
FIG. 1 is an application environment diagram of a backup method in one embodiment;
FIG. 2 is a flow diagram of a backup method in one embodiment;
FIG. 3 is a flow diagram of a backup method in one embodiment;
FIG. 4 is a block diagram of a backup device in one embodiment;
FIG. 5 is an internal structural diagram of the agent end in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly understand that the embodiments described herein may be combined with other embodiments.
The backup method provided by the application can be applied to an application environment shown in fig. 1.
Before a backup, the agent end downloads from the backup storage server to its local file system the fingerprints of data blocks that may be needed during the backup, and stores the downloaded fingerprints in different local fingerprint files by category. During the backup, a fingerprint of a given category only needs to be looked up in the local fingerprint file for that category in order to judge whether the fingerprint exists on the backup storage server. If it does not exist, the agent end transmits the data block and its fingerprint to the backup storage server; if it exists, the agent end transmits only the fingerprint.
In one embodiment, as shown in fig. 2, a backup method is provided, which may be performed by the agent end, and includes the following steps:
Step S201, before the current backup, if the type of the current backup is a full backup, downloading from a backup storage server the fingerprints of the data blocks of the last full backup and of the backups performed since the last full backup; where a differential backup and an incremental backup have been performed in sequence after the last full backup and before the current backup, if the type of the current backup is a differential backup, downloading from the backup storage server the fingerprints of the data blocks of the last differential backup and of the incremental backups performed since the last differential backup; and where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, downloading from the backup storage server the fingerprints of the data blocks of the last full backup and of the incremental backups other than the last incremental backup.
Further, where a differential backup has been performed after the last full backup and before the current backup, if the type of the current backup is a differential backup, the fingerprints of the data blocks of the last differential backup are downloaded from the backup storage server. Where incremental backups, or no incremental backup, have been performed after the last full backup and before the current backup, if the type of the current backup is a differential backup, the fingerprints of the data blocks of the incremental backups performed since the last full backup are downloaded from the backup storage server. Where a differential backup has been performed after the last full backup and before the current backup, if the current backup is an incremental backup, the fingerprints of the data blocks of the last full backup, of the last differential backup, and of the incremental backups performed after the last full backup are downloaded from the backup storage server.
In this embodiment, the purpose of the download is to bring into the local file system those fingerprints on the backup storage server that may duplicate the fingerprints of the data blocks of the current backup. A full backup backs up all data blocks at a given point in time, an incremental backup backs up the data blocks that changed relative to the previous backup, and a differential backup backs up the data blocks that changed relative to the last full backup. The fingerprints of the corresponding backups are therefore downloaded to the local file system according to the type of the current backup and the backups performed since the last full backup.
Step S202, taking the number of the local fingerprint files in the local file system as a divisor, performing modulo calculation on the downloaded fingerprints, numbering each local fingerprint file according to the modulo calculation result, and storing the downloaded fingerprints in the local fingerprint files with the same number as the modulo calculation result of the fingerprints.
In this embodiment, let a downloaded fingerprint be $F_i$ and let the local file system of the agent end hold $C$ local fingerprint files. A modulo operation is performed on every downloaded fingerprint according to the formula $Q_i = F_i \bmod C$. Each local fingerprint file is numbered with a modulo result $Q_i$, and fingerprints with the same modulo result are stored in the same local fingerprint file. For example, if the downloaded fingerprints have modulo results 2, 3 and 4, separate local fingerprint files are numbered 2, 3 and 4; the fingerprints whose modulo result is 2 are stored in the local fingerprint file numbered 2, those whose result is 3 in the file numbered 3, and those whose result is 4 in the file numbered 4.
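The grouping in step S202 can be illustrated with a minimal sketch. It assumes fingerprints are already available as non-negative integers and models each local fingerprint file as an in-memory list; the function name `shard_fingerprints` and the sample values are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def shard_fingerprints(fingerprints, file_count):
    """Group integer fingerprints F_i by Q_i = F_i mod C.

    file_count plays the role of C, the number of local fingerprint files.
    The returned dict maps a file number Q_i to the fingerprints stored there.
    """
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    shards = defaultdict(list)
    for fp in fingerprints:
        shards[fp % file_count].append(fp)
    return dict(shards)


# Hypothetical fingerprints with C = 5:
# 12 % 5 = 2, 8 % 5 = 3, 9 % 5 = 4, 17 % 5 = 2
shards = shard_fingerprints([12, 8, 9, 17], file_count=5)
print(shards)  # {2: [12, 17], 3: [8], 4: [9]}
```

Because the divisor is fixed, a fingerprint always maps to the same file number, which is what lets the later lookup touch a single file.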
Step S203, during the backup, calculating a fingerprint for each data block to be backed up, performing a modulo operation on the calculated fingerprint with the number of local fingerprint files in the local file system as the divisor, and determining the local fingerprint file whose number equals the modulo result.
Step S204, searching the local fingerprint file whose number equals the modulo result for a fingerprint identical to the fingerprint of the data block to be backed up.
In this embodiment, during the current backup, the fingerprint of each data block to be backed up is calculated and a modulo operation is performed on it with the number of local fingerprint files at the agent end as the divisor, that is, $C$ in the modulo formula. The local fingerprint file whose number equals the modulo result is then determined, and that file is searched for a fingerprint identical to the fingerprint of the data block to be backed up.
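As a rough sketch of step S203, the snippet below computes a fingerprint for a data block and derives the number of the local fingerprint file to search. The application does not name a hash function, so the use of SHA-1 and the interpretation of the digest as a big-endian integer are assumptions, as are the function names and the value of `C`.

```python
import hashlib


def block_fingerprint(block: bytes) -> int:
    """Return a fingerprint for a data block as an integer.

    Assumption: SHA-1 is used and the digest is read as a big-endian integer.
    """
    return int.from_bytes(hashlib.sha1(block).digest(), "big")


def fingerprint_file_number(fingerprint: int, file_count: int) -> int:
    """Return the number of the local fingerprint file: F mod C."""
    return fingerprint % file_count


C = 8  # hypothetical number of local fingerprint files
fp = block_fingerprint(b"example data block")
number = fingerprint_file_number(fp, C)
print(f"search local fingerprint file number {number}")
```

The same divisor must be used at download time and at lookup time; otherwise a fingerprint could be stored in one file and searched for in another.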
Step S205, if the search result shows that the local fingerprint file contains a fingerprint identical to the fingerprint of the data block to be backed up, transmitting only the fingerprint of the data block to be backed up to the backup storage server; if the search result shows that the local fingerprint file does not contain such a fingerprint, and it is determined that the backup storage server holds such a fingerprint, transmitting only the fingerprint of the data block to be backed up to the backup storage server; and if the search result shows that the local fingerprint file does not contain such a fingerprint, and it is determined that the backup storage server does not hold such a fingerprint, transmitting the data block to be backed up and its fingerprint to the backup storage server.
Further, if the type of deduplication is global deduplication, the backup storage server is queried to determine whether it holds a fingerprint identical to the fingerprint of the data block to be backed up. If the type of deduplication is local deduplication, then when the local fingerprint file does not contain a fingerprint identical to the fingerprint of the data block to be backed up, it is directly determined that the backup storage server does not hold such a fingerprint either.
In this embodiment, once it is determined that the fingerprint of the data block to be backed up already exists in the local fingerprint file at the agent end, it follows that the fingerprint also exists on the backup storage server, because the fingerprints in the local fingerprint file were downloaded from the fingerprint database of the backup storage server. In a deduplicating backup system the fingerprint of each data block is uniquely determined, so data blocks with the same fingerprint are duplicate data blocks. When the fingerprint of the data block to be backed up exists on the backup storage server, the data block is therefore a duplicate that the backup storage server already stores; it does not need to be transmitted again, and only its fingerprint needs to be transmitted to the backup storage server.
When it is determined that the fingerprint of the data block to be backed up does not exist in the local fingerprint file, whether the fingerprint exists on the backup storage server can be determined in combination with the type of deduplication.
Specifically, if the deduplicating backup system performs local deduplication, only the agent end currently performing the backup deduplicates data, and the fingerprints downloaded from the backup storage server and stored in the local fingerprint files were all provided by this agent end's own earlier deduplicated backups. If the fingerprint of the data block to be backed up does not exist in the local fingerprint file, it does not exist on the backup storage server either, so the data block is a new data block, and the data block and its fingerprint are transmitted to the backup storage server during the backup.
If the deduplicating backup system performs global deduplication, multiple agent ends deduplicate data together, and the fingerprints stored on the backup storage server are provided by multiple agent ends. Several agent ends may back up at the same time: while one agent end is downloading fingerprints from the backup storage server, or after the download has finished but before its backup starts, other agent ends may transmit new data blocks and fingerprints to the backup storage server. Consequently, when the fingerprint of the data block to be backed up is not in the local fingerprint file, it cannot be concluded directly that the fingerprint is absent from the backup storage server. The agent end must query the backup storage server remotely for a fingerprint identical to the fingerprint of the data block to be backed up. If one exists, the data block is a duplicate and only its fingerprint is transmitted to the backup storage server; if none exists, the data block and its fingerprint are transmitted to the backup storage server.
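The branching described for step S205 under local and global deduplication can be summarized in a small sketch. The names `Action`, `decide` and `server_has_fingerprint` are hypothetical; the remote query is modeled as a callable so that it is only invoked when the local lookup misses under global deduplication.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    SEND_FINGERPRINT_ONLY = "fingerprint only"
    SEND_BLOCK_AND_FINGERPRINT = "data block and fingerprint"


def decide(found_locally: bool,
           global_dedup: bool,
           server_has_fingerprint: Callable[[], bool]) -> Action:
    """Choose what to transmit for one data block.

    server_has_fingerprint stands in for a remote query to the backup
    storage server (a hypothetical interface, not defined in the source).
    """
    if found_locally:
        # Local files were downloaded from the server, so the server has it.
        return Action.SEND_FINGERPRINT_ONLY
    if not global_dedup:
        # Local deduplication: a local miss means the block is new.
        return Action.SEND_BLOCK_AND_FINGERPRINT
    # Global deduplication: another agent end may have uploaded the block.
    if server_has_fingerprint():
        return Action.SEND_FINGERPRINT_ONLY
    return Action.SEND_BLOCK_AND_FINGERPRINT


print(decide(True, True, lambda: False).value)
print(decide(False, False, lambda: True).value)
print(decide(False, True, lambda: True).value)
```

Passing the remote check as a callable keeps the network round trip off the common path, which matches the stated goal of reducing round-trip delay.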
In the above backup method, before the current backup, fingerprints are downloaded from the backup storage server according to the type of the backup. If the current backup is a full backup, the fingerprints of the data blocks of the last full backup and of the backups performed since the last full backup are downloaded. Where a differential backup and an incremental backup have been performed in sequence after the last full backup and before the current backup, if the current backup is a differential backup, the fingerprints of the data blocks of the last differential backup and of the incremental backups performed since the last differential backup are downloaded. Where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, the fingerprints of the data blocks of the last full backup and of the incremental backups other than the last incremental backup are downloaded. The number of local fingerprint files in the local file system is then taken as the divisor, a modulo operation is performed on each downloaded fingerprint, each local fingerprint file is numbered with a modulo result, and each downloaded fingerprint is stored in the local fingerprint file whose number equals its modulo result, so that the downloaded fingerprints are distributed across the local fingerprint files by modulo result.
During the backup, a fingerprint is calculated for each data block to be backed up, a modulo operation is performed on it with the number of local fingerprint files in the local file system as the divisor, and the local fingerprint file whose number equals the modulo result is searched for a fingerprint identical to the fingerprint of the data block to be backed up. If the search result shows that the local fingerprint file contains such a fingerprint, only the fingerprint of the data block is transmitted to the backup storage server. If the local fingerprint file does not contain such a fingerprint but it is determined that the backup storage server holds one, only the fingerprint of the data block is transmitted to the backup storage server. If the local fingerprint file does not contain such a fingerprint and it is determined that the backup storage server does not hold one either, the data block and its fingerprint are transmitted to the backup storage server.
In this way, the fingerprints of data blocks that may be needed during the backup are downloaded and distributed into local fingerprint files by the modulo operation. During the backup, the fingerprint of each data block to be backed up is calculated and reduced by the same modulo operation, and only the local fingerprint file whose number equals the result is searched for an identical fingerprint in order to judge whether the fingerprint exists on the backup storage server. This makes fingerprint lookup fast, reduces network round-trip delay, and improves backup efficiency.
In one embodiment, searching the local fingerprint file whose number equals the modulo result for a fingerprint identical to the fingerprint of the data block to be backed up comprises:
acquiring the sorted fingerprints in the local fingerprint file whose number equals the modulo result; and determining, by binary search, whether a fingerprint identical to the fingerprint of the data block to be backed up exists among the sorted fingerprints.
Binary search, also called half-interval search, is an efficient search method, but it requires the searched sequence to be ordered. Therefore, after each local fingerprint file is mapped read-write into virtual memory through the mmap system call, the fingerprints in each local fingerprint file must be quick-sorted; performing binary search over the sorted fingerprints greatly improves lookup performance.
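A binary search over sorted, fixed-width fingerprint records might look like the sketch below. It assumes 16-byte fingerprints (the size the text gives later for its cache-line estimate) stored back to back in one buffer and compared as raw byte strings; the function name `contains` is hypothetical.

```python
FP_SIZE = 16  # bytes per fingerprint (assumed, e.g. an MD5 digest)


def contains(buf, target: bytes) -> bool:
    """Binary search for `target` in `buf`, a buffer of sorted FP_SIZE-byte records."""
    lo, hi = 0, len(buf) // FP_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        record = bytes(buf[mid * FP_SIZE:(mid + 1) * FP_SIZE])
        if record < target:
            lo = mid + 1      # target can only be in the upper half
        elif record > target:
            hi = mid          # target can only be in the lower half
        else:
            return True
    return False
```

The buffer can be a `bytes` object, a `bytearray`, or a memory-mapped file, since only slicing and `len` are used.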
Based on this, before finding whether there is a fingerprint identical to the fingerprint of the data block to be backed up, the method further comprises:
mapping each local fingerprint file read-write into virtual memory through the mmap system call, and sorting the fingerprints of each local fingerprint file in the virtual memory; after each local fingerprint file has been mapped read-write into virtual memory and sorted, closing the read-write mapping through the munmap system call.
mmap is a method of mapping a file into memory. Once a local fingerprint file is mapped into virtual memory, the process can access it like ordinary memory, without calling read(), write(), or similar operations, which greatly improves fingerprint query efficiency.
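The map, sort, unmap sequence described above can be sketched with Python's standard `mmap` module. This is an illustration under assumptions: fingerprints are 16-byte records, the helper names are hypothetical, and the sort copies records through Python objects instead of sorting in place as a native quicksort over the mapping would.

```python
import mmap
import os

FP_SIZE = 16  # bytes per fingerprint (assumed)


def sort_fingerprint_file(path: str) -> None:
    """Map the file read-write, sort its fixed-width records, then close the mapping."""
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be mapped
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            records = sorted(m[i:i + FP_SIZE] for i in range(0, len(m), FP_SIZE))
            m[:] = b"".join(records)  # write the sorted records back through the mapping
            m.flush()
        # leaving the inner `with` closes the read-write mapping


def open_read_only(path: str) -> mmap.mmap:
    """Map an already sorted, non-empty fingerprint file read-only for lookups."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Separating the read-write phase (sorting) from the read-only phase (lookup) means the lookup mapping is never modified, so the operating system can treat its pages as clean.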
In order to better understand the above method, an application example of the backup method of the present application is described in detail below.
Currently, in a deduplication system, a fingerprint is the cryptographic hash value (e.g., MD5 or SHA-1) of a data block (chunk) and is stored in the fingerprint database of the backup storage server. During a backup, the agent has to query the backup storage server (storage) frequently to ask whether the fingerprint of a data block exists: if it does not, the agent transmits both the data block and the fingerprint to the backup storage server; if it does, the agent transmits only the fingerprint.
As the data volume of business systems across industries grows year by year, so does the volume of data that must be processed within a backup window. Making a deduplicating backup system fast enough to fit the backup window is a technical challenge.
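As a small illustration of what a fingerprint is, the sketch below splits data into blocks and computes an MD5 digest for each. Fixed-size chunking and the function name `block_fingerprints` are assumptions made for the example; the text does not say how blocks are delimited.

```python
import hashlib
from typing import Iterator, Tuple

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks (assumed for illustration)


def block_fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (block, fingerprint) pairs, where the fingerprint is the block's MD5 digest."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield block, hashlib.md5(block).digest()  # 16-byte digest
```

Two blocks with identical content yield identical fingerprints, which is what lets the backup storage server keep a single copy of the block.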
The main content of this embodiment is described with reference to fig. 3:
Step S301 to step S302: before the current backup, the agent determines, according to the type of the current backup, the backup sets that contain the fingerprints to be downloaded:
if the type of the current backup is full backup, the fingerprints related to the data blocks of the last full backup and of the backups performed since the last full backup are downloaded from the backup storage server;
where a differential backup and incremental backups have been performed in sequence after the last full backup and before the current backup, if the type of the current backup is differential backup, the fingerprints related to the data blocks of the last differential backup and of the incremental backups performed since the last differential backup are downloaded from the backup storage server;
where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, the fingerprints related to the data blocks of the last full backup and of the incremental backups other than the last incremental backup are downloaded from the backup storage server;
where a differential backup has been performed after the last full backup and before the current backup, if the type of the current backup is differential backup, the fingerprints related to the data blocks of the last differential backup are downloaded from the backup storage server;
where incremental backups have been performed, or no incremental backup has been performed, after the last full backup and before the current backup, if the type of the current backup is differential backup, the fingerprints related to the data blocks of the incremental backups performed since the last full backup are downloaded from the backup storage server;
where a differential backup has been performed after the last full backup and before the current backup, if the current backup is an incremental backup, the fingerprints related to the data blocks of the last full backup, of the last differential backup since the last full backup, and of the incremental backups performed since the last differential backup are downloaded from the backup storage server;
Step S303 to step S307: for each fingerprint $F_i$ downloaded from the backup storage server, the modulus $Q_i = F_i \bmod C$ is computed, and the fingerprint is stored, by partition, into one of C files in the agent's local file system, each file being named after its value of $Q_i$;
Step S308 to step S311: the agent maps each local fingerprint file read-write into virtual memory through the mmap system call, quick-sorts the fingerprints of each local fingerprint file, and closes the mapping after sorting is completed;
Step S312 to step S315: the agent maps each sorted fingerprint file read-only into virtual memory through mmap; during the backup, for each data block the agent calculates the fingerprint $F_i$, computes the modulus $Q_i = F_i \bmod C$, and locates file $Q_i$;
Step S316 to step S321: the agent performs a binary search for fingerprint $F_i$ in file $Q_i$. If $F_i$ is found, it is known to exist in the backup storage server; if it is not found, then according to the type of deduplication:
if the type of the deduplication is global deduplication, remotely searching a fingerprint database of the backup storage server to determine whether the fingerprint exists in the fingerprint database;
if the type of the deduplication is local deduplication, the fingerprint database of the backup storage server is not required to be searched remotely, and the fact that the fingerprint does not exist in the fingerprint database can be determined;
if the fingerprint exists in the fingerprint database of the backup storage server, the proxy transmits the fingerprint to the backup storage server only; otherwise, the agent transmits the data block and the fingerprint to the backup storage server.
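The decision in steps S316 to S321 can be sketched as a small function. This is a reading of the steps combined with the method description given earlier (a local hit means only the fingerprint is sent), not a definitive implementation; `Dedup`, `plan_transfer`, and the `server_has` callback (a stand-in for the remote fingerprint database query) are hypothetical names.

```python
from enum import Enum
from typing import Callable


class Dedup(Enum):
    GLOBAL = "global"
    LOCAL = "local"


def plan_transfer(found_locally: bool, dedup: Dedup,
                  server_has: Callable[[], bool]) -> str:
    """Return what to send for one data block: "fingerprint" or "block+fingerprint"."""
    if found_locally:
        # Local hit: no network round trip is needed.
        return "fingerprint"
    if dedup is Dedup.GLOBAL and server_has():
        # Local miss under global deduplication: the server is queried remotely.
        return "fingerprint"
    # Local miss under local deduplication, or the server does not have it either.
    return "block+fingerprint"
```

Passing the remote query as a callback makes it explicit that the network is only touched on a local miss under global deduplication.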
In this embodiment, before the current backup, the agent downloads to its local file system the fingerprints related to the backup set generated from the source data in the last full backup, stores them by modulo partitioning (Round-robin partitioning) in a plurality of mmap files (local fingerprint files mapped read-write into virtual memory through mmap), and quick-sorts each mmap file. During the backup, the agent determines which local fingerprint file (partition) a fingerprint belongs to and performs a binary search for the fingerprint in that file. Compared with the prior art, this has at least the following advantages:
1. The fingerprint cache files are only read during lookup, so the querying threads access them lock-free (all threads can keep running without blocking).
2. When the fingerprint exists in the local cache file, it can be determined to exist in the fingerprint database without remotely searching the fingerprint database, which avoids network round-trip delay.
3. When the fingerprint does not exist in the local cache file: under global deduplication, the fingerprint database is searched remotely to determine whether the fingerprint exists in it; under local deduplication, the fingerprint is determined not to exist in the fingerprint database without a remote search.
4. Both quicksort and binary search operate on contiguous memory. Each fingerprint is 16 bytes, and a cache line on the x86_64 architecture is 64 bytes, so one cache line holds exactly 4 fingerprints. mmap maps files into memory aligned to memory pages (typically 4 KiB), so each page holds exactly 256 fingerprints. Fingerprint lookup therefore uses cache lines efficiently, and lookup performance is greatly improved relative to a tree or hash structure.
5. When mmap puts system memory under pressure, the operating system automatically reclaims the read-only memory occupied by the fingerprints, which avoids instability of the business system caused by OOM (out of memory).
6. Modulo partitioning combined with binary search reduces the time complexity of a lookup from $O(\lg N)$ for plain binary search to $O(\lg(N/C))$.
Assume that the source data is 1 TiB, each data block is 64 KiB, there are 128 partitions, the TCP round-trip delay on a 1 GbE network is 0.2 ms (the agent interacts with the backup storage server over TCP), and the memory latency is 16 ns:
$N = 1\,\mathrm{TiB} / 64\,\mathrm{KiB} = 2^{40} / (64 \times 2^{10}) = 2^{24}$
$M = N \log_2(N/C) = 2^{24} \times \log_2(2^{24}/128) = 17 \times 2^{24}$
where $N$ is the total number of blocks and $M = 17 \times 2^{24}$ is the number of local fingerprint lookups performed; the estimated time is $16 \times 17 \times 2^{24}$ ns ≈ 4.6 seconds. By contrast:
If there is no local cache and the backup storage server looks up 64 fingerprints per batch, with a server-side lookup taken as equivalent to a local lookup (in practice it should be slower), the time is $0.2/64 \times 17 \times 2^{24}$ ms + 4.6 s ≈ 895.9 s. The performance of this embodiment is thus improved by approximately 19300%.
If the local cache uses a tree structure, memory is not contiguous and cache lines cannot be used efficiently, so the time consumed is at least $16 \times N \log_2 N$ ns $= 16 \times 2^{24} \times \log_2(2^{24})$ ns ≈ 6.4 seconds. The performance of this embodiment is improved by 39% in comparison. This embodiment therefore achieves fast fingerprint lookup, reduced network round-trip delay, and improved backup efficiency.
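The arithmetic of this estimate can be reproduced with a short script. It only restates the cost model given in the text (16 ns per memory access, 0.2 ms per round trip amortized over batches of 64, 128 partitions); the variable names are illustrative.

```python
import math

source_bytes = 2 ** 40        # 1 TiB of source data
block_bytes = 64 * 2 ** 10    # 64 KiB per data block
partitions = 128              # C, number of local fingerprint files
rtt_ms = 0.2                  # TCP round-trip delay on the 1 GbE network
batch = 64                    # fingerprints per remote lookup batch
mem_ns = 16                   # memory latency per lookup step

n_blocks = source_bytes // block_bytes                  # N = 2**24
steps_partitioned = math.log2(n_blocks / partitions)    # log2(N/C) = 17
steps_plain = math.log2(n_blocks)                       # log2(N) = 24
m_lookups = n_blocks * steps_partitioned                # M = 17 * 2**24

local_s = mem_ns * m_lookups / 1e9                      # local sorted cache
remote_s = (rtt_ms / batch) * m_lookups / 1e3 + local_s # no local cache
tree_s = mem_ns * n_blocks * steps_plain / 1e9          # tree-structured cache

print(f"local cache: {local_s:.2f} s")
print(f"no cache:    {remote_s:.2f} s")
print(f"tree cache:  {tree_s:.2f} s")
```

Changing `partitions` shows how the local lookup cost scales with $\log_2(N/C)$ under this model.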
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with at least some of the other steps or stages.
In one embodiment, as shown in fig. 4, there is provided a backup device, applied to a proxy, including:
the fingerprint downloading module 401 is configured to download, from the backup storage server, the fingerprint related to the last full-size backup and the data block of the backup from the last full-size backup, if the current backup type is full-size backup before the current backup; if the current backup type is differential backup, downloading fingerprints related to the last differential backup and the data blocks of the incremental backup from the last differential backup from a backup storage server; if the backup is the incremental backup, downloading fingerprints related to the data blocks of the last full-size backup and the incremental backup except the last incremental backup from a backup storage server;
the fingerprint storage module 402 is configured to perform modulo calculation on the downloaded fingerprints by taking the number of the local fingerprint files in the local file system as a divisor, number each local fingerprint file with a modulo calculation result, and store the downloaded fingerprints in the local fingerprint files with the same number as the modulo calculation result of the fingerprints;
the fingerprint file determining module 403 is configured to calculate fingerprints for each data block to be backed up in the current backup process, perform modulo on the calculated fingerprints by taking the number of the local fingerprint files in the local file system as a divisor, and determine the local fingerprint files with the same number as the modulo result according to the modulo result;
The fingerprint searching module 404 is configured to search whether a fingerprint identical to a fingerprint of the data block to be backed up exists in the local fingerprint file with the same number as the modulo result;
a transmission module 405, configured to, if it is determined, according to the result of the searching, that the local fingerprint file has a fingerprint that is the same as a fingerprint of the data block to be backed up, transmit only the fingerprint of the data block to be backed up to a backup storage server;
the transmission module 405 is further configured to, if it is determined, according to the result of the searching, that the local fingerprint file does not store a fingerprint identical to the fingerprint of the data block to be backed up, and if the backup storage server stores a fingerprint identical to the fingerprint of the data block to be backed up, only transmit the fingerprint of the data block to be backed up to the backup storage server;
and the transmission module 405 is further configured to, if it is determined, according to the result of the searching, that the local fingerprint file does not have a fingerprint identical to the fingerprint of the data block to be backed up, and if the backup storage server does not have a fingerprint identical to the fingerprint of the data block to be backed up, transmit the data block to be backed up and the fingerprint thereof to the backup storage server.
In one embodiment, the fingerprint downloading module 401 is further configured to, in a case where a differential backup is performed after the last full-size backup and before the current backup, download, from the backup storage server, a fingerprint related to a data block of the last differential backup if the current backup type is the differential backup.
In one embodiment, the fingerprint downloading module 401 is further configured to, when the incremental backup is performed after the previous full backup until before the current backup, or the incremental backup is not performed, download, from the backup storage server, the fingerprint related to the data block of the incremental backup from the previous full backup if the current backup type is the differential backup.
In one embodiment, the fingerprint downloading module 401 is further configured to, in a case where a differential backup has been performed after the last full backup and before the current backup, if the current backup is an incremental backup, download from the backup storage server the fingerprints related to the data blocks of the last full backup, of the last differential backup since the last full backup, and of the incremental backups performed since the last differential backup.
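The selection of backup sets performed by the fingerprint downloading module can be sketched as below. This is one literal reading of the translated conditions, not an authoritative statement of the claimed behavior: the function name and the list-of-types representation are hypothetical, and the inclusion and exclusion choices (for example, leaving out the last incremental backup) simply mirror the wording of the text.

```python
from typing import List


def backup_sets_to_download(current: str, history: List[str]) -> List[int]:
    """Return indices into `history` of the backup sets whose fingerprints to download.

    `history` lists backup types oldest first, starting with the last full backup,
    e.g. ["full", "differential", "incremental"]. `current` is the type of the
    backup about to run: "full", "differential", or "incremental".
    """
    if not history or history[0] != "full":
        raise ValueError("history must start with the last full backup")
    later = list(range(1, len(history)))
    diffs = [i for i in later if history[i] == "differential"]

    if current == "full":
        return [0] + later                      # last full + everything since
    if current == "differential":
        if diffs:
            return list(range(diffs[-1], len(history)))  # last differential + later incrementals
        return later                            # incrementals since the last full (may be empty)
    if current == "incremental":
        if diffs:
            return [0] + list(range(diffs[-1], len(history)))
        return [0] + later[:-1]                 # last full + incrementals except the last one
    raise ValueError(f"unknown backup type: {current}")
```

For example, with a history of full, incremental, differential, incremental and a pending incremental backup, the sketch selects the full backup, the differential backup, and the incremental backup that followed it.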
In one embodiment, the fingerprint searching module 404 is further configured to acquire the sorted fingerprints in the local fingerprint file whose number equals the modulo result, and to determine, by binary search, whether a fingerprint identical to the fingerprint of the data block to be backed up exists among the sorted fingerprints.
In one embodiment, the device further comprises an mmap system call module, configured to map each local fingerprint file read-write into virtual memory through the mmap system call and sort the fingerprints of each local fingerprint file in the virtual memory; and, after each local fingerprint file has been mapped read-write into virtual memory and sorted, to close the read-write mapping through the munmap system call.
In one embodiment, the transmission module 405 is further configured to search a backup storage server if the type of deduplication is global deduplication, so as to determine whether the backup storage server has a fingerprint that is the same as a fingerprint of the data block to be backed up;
if the type of the deduplication is the local deduplication, when the local fingerprint file does not store the fingerprint identical to the fingerprint of the data block to be backed up, the backup storage server is directly determined to not store the fingerprint identical to the fingerprint of the data block to be backed up.
For specific limitations of the backup device, reference may be made to the above limitations of the backup method, and no further description is given here. The modules in the backup device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a proxy end is provided, the internal structure of which may be as shown in fig. 5. The proxy side includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the proxy is configured to provide computing and control capabilities. The memory of the proxy end comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the proxy side is used for storing backup data. The network interface of the proxy end is used for communicating with an external terminal through network connection. The agent terminal also comprises an input/output interface, wherein the input/output interface is a connecting circuit for exchanging information between the processor and external equipment, and the input/output interface is connected with the processor through a bus and is called as an I/O interface for short. The computer program, when executed by a processor, implements a backup method.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a proxy terminal is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the above method embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the respective method embodiments described above.
In one embodiment, a computer program product is provided, on which a computer program is stored, which computer program is executed by a processor for performing the steps of the various method embodiments described above.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The foregoing examples represent only a few embodiments of the present application. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A backup method, applied to a proxy, the method comprising:
before the current backup, if the type of the current backup is full backup, downloading, from a backup storage server, fingerprints related to the data blocks of the last full backup and of the backups performed since the last full backup; in a case where a differential backup and incremental backups have been performed in sequence after the last full backup and before the current backup, if the type of the current backup is differential backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last differential backup and of the incremental backups performed since the last differential backup; in a case where multiple incremental backups have been performed after the last full backup and before the current backup, if the current backup is an incremental backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last full backup and of the incremental backups other than the last incremental backup;
Taking the number of the local fingerprint files in the local file system as a divisor, carrying out modulo calculation on the downloaded fingerprints, numbering each local fingerprint file according to the modulo calculation result, and storing the downloaded fingerprints to the local fingerprint files with the same number as the modulo calculation result of the fingerprints;
in the backup process, calculating fingerprints for each data block to be backed up, taking the number of the local fingerprint files in the local file system as a divisor, performing modulo calculation on the calculated fingerprints, and determining the local fingerprint files with the same numbers as the modulo calculation result according to the modulo calculation result;
searching whether the fingerprints which are the same as the fingerprints of the data blocks to be backed up exist in the local fingerprint file with the same number as the modulo result;
if the fact that the local fingerprint file has the same fingerprints as the fingerprints of the data blocks to be backed up is determined according to the searching result, the fingerprints of the data blocks to be backed up are only transmitted to a backup storage server;
if the local fingerprint file is determined to not store the fingerprints identical to the fingerprints of the data blocks to be backed up according to the searching result, and if the backup storage server is stored with the fingerprints identical to the fingerprints of the data blocks to be backed up, transmitting only the fingerprints of the data blocks to be backed up to the backup storage server;
And if the local fingerprint file is determined to not store the fingerprint identical to the fingerprint of the data block to be backed up according to the searching result, and if the backup storage server is determined to not store the fingerprint identical to the fingerprint of the data block to be backed up, transmitting the data block to be backed up and the fingerprint thereof to the backup storage server.
2. The method according to claim 1, wherein the method further comprises:
in a case where a differential backup has been performed after the last full backup and before the current backup, if the type of the current backup is differential backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last differential backup.
3. The method according to claim 1, wherein the method further comprises:
in a case where incremental backups have been performed, or no incremental backup has been performed, after the last full backup and before the current backup, if the type of the current backup is differential backup, downloading, from the backup storage server, fingerprints related to the data blocks of the incremental backups performed since the last full backup.
4. The method according to claim 1, wherein the method further comprises:
in a case where a differential backup has been performed after the last full backup and before the current backup, if the current backup is an incremental backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last full backup, of the last differential backup, and of the incremental backups performed after the last full backup.
5. The method of claim 1, wherein searching for whether there is a fingerprint identical to a fingerprint of the block of data to be backed up in the local fingerprint file having the same number as the modulo result comprises:
acquiring the sorted fingerprints in the local fingerprint file whose number equals the modulo result;
and determining, by binary search, whether a fingerprint identical to the fingerprint of the data block to be backed up exists among the sorted fingerprints.
6. The method of claim 5, wherein before searching for whether there is a fingerprint identical to the fingerprint of the data block to be backed up in the local fingerprint file having the same number as the modulo result, the method further comprises:
mapping each local fingerprint file read-write into virtual memory through the mmap system call, and sorting the fingerprints of each local fingerprint file in the virtual memory;
after each local fingerprint file has been mapped read-write into virtual memory and sorted, closing the read-write mapping through the munmap system call.
7. The method of claim 1, wherein determining whether the backup storage server has the same fingerprint as the fingerprint of the block of data to be backed up comprises:
If the type of the deduplication is global deduplication, searching a backup storage server to determine whether the backup storage server has fingerprints identical to fingerprints of the data blocks to be backed up;
the method further comprises the steps of:
if the type of the deduplication is the local deduplication, when the local fingerprint file does not store the fingerprint identical to the fingerprint of the data block to be backed up, the backup storage server is directly determined to not store the fingerprint identical to the fingerprint of the data block to be backed up.
8. A backup apparatus for use at a proxy, the apparatus comprising:
the fingerprint downloading module is used for, before the current backup, if the type of the current backup is full backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last full backup and of the backups performed since the last full backup; if the type of the current backup is differential backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last differential backup and of the incremental backups performed since the last differential backup; if the current backup is an incremental backup, downloading, from the backup storage server, fingerprints related to the data blocks of the last full backup and of the incremental backups other than the last incremental backup;
The fingerprint storage module is used for taking the number of the local fingerprint files in the local file system as a divisor, carrying out modulo calculation on the downloaded fingerprints, numbering each local fingerprint file according to the modulo calculation result, and storing the downloaded fingerprints into the local fingerprint files with the same number as the modulo calculation result of the fingerprints;
the fingerprint file determining module is used for calculating fingerprints for each data block to be backed up in the backup process, taking the number of the local fingerprint files in the local file system as a divisor, performing modulo calculation on the calculated fingerprints, and determining the local fingerprint files with the same numbers as the modulo result according to the modulo result;
the fingerprint searching module is used for searching whether the fingerprints which are the same as the fingerprints of the data blocks to be backed up exist in the local fingerprint file with the same number as the modulo result;
the transmission module is used for only transmitting the fingerprints of the data blocks to be backed up to the backup storage server if the fingerprints which are the same as the fingerprints of the data blocks to be backed up are determined to exist in the local fingerprint file according to the searching result;
the transmission module is further configured to, if it is determined, according to the result of the searching, that the local fingerprint file does not store a fingerprint identical to the fingerprint of the data block to be backed up, and if the backup storage server stores a fingerprint identical to the fingerprint of the data block to be backed up, only transmit the fingerprint of the data block to be backed up to the backup storage server;
And the transmission module is also used for transmitting the data block to be backed up and the fingerprint thereof to the backup storage server if the local fingerprint file is determined to not have the fingerprint which is the same as the fingerprint of the data block to be backed up according to the searching result and the backup storage server does not have the fingerprint which is the same as the fingerprint of the data block to be backed up.
9. A proxy comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1 to 7.
CN202211197326.9A 2022-09-29 2022-09-29 Backup method, backup device, proxy terminal and storage medium Active CN115543688B (en)

Publications (2)

Publication Number Publication Date
CN115543688A (en) 2022-12-30
CN115543688B (en) 2023-06-09

Family

ID=84732128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211197326.9A Active CN115543688B (en) 2022-09-29 2022-09-29 Backup method, backup device, proxy terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115543688B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302675A (en) * 2015-11-25 2016-02-03 上海爱数信息技术股份有限公司 Method and device for data backup
CN110032383A (en) * 2019-04-08 2019-07-19 网易(杭州)网络有限公司 Software updating method, device and storage medium
US11593225B2 (en) * 2019-05-01 2023-02-28 EMC IP Holding Company LLC Method and system for live-mounting database backups
US11449392B2 (en) * 2020-01-21 2022-09-20 Druva Inc. Data backup system with block size optimization
CN114153652A (en) * 2020-09-07 2022-03-08 华为云计算技术有限公司 Directory backup integrity detection method and device
CN114090337A (en) * 2021-11-11 2022-02-25 上海英方软件股份有限公司 Quick synthesis backup and recovery method based on snapshot
CN114661521A (en) * 2022-02-28 2022-06-24 锐掣(杭州)科技有限公司 Offline duplicate removal method, offline duplicate removal device, electronic equipment, storage medium and program product
CN114691430A (en) * 2022-04-24 2022-07-01 北京科技大学 Incremental backup method and system for CAD (computer-aided design) engineering data files
CN114816856A (en) * 2022-04-29 2022-07-29 济南浪潮数据技术有限公司 Data backup method, device and equipment and readable storage medium

Also Published As

Publication number Publication date
CN115543688A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
US10592348B2 (en) System and method for data deduplication using log-structured merge trees
US8751763B1 (en) Low-overhead deduplication within a block-based data storage
US10938961B1 (en) Systems and methods for data deduplication by generating similarity metrics using sketch computation
US7269689B2 (en) System and method for sharing storage resources between multiple files
US7725437B2 (en) Providing an index for a data store
US8914338B1 (en) Out-of-core similarity matching
US8402063B2 (en) Restoring data backed up in a content addressed storage (CAS) system
US10579593B2 (en) Techniques for selectively deactivating storage deduplication
WO2014067063A1 (en) Duplicate data retrieval method and device
CN103581331A (en) Virtual machine on-line transfer method and system
CN109976669B (en) Edge storage method, device and storage medium
US11995050B2 (en) Systems and methods for sketch computation
CN111950025A (en) File distributed storage method based on block chain intelligent contract
CN116578746A (en) Object de-duplication method and device
WO2021127245A1 (en) Systems and methods for sketch computation
US7949630B1 (en) Storage of data addresses with hashes in backup systems
CN114490060A (en) Memory allocation method and device, computer equipment and computer readable storage medium
CN114064984A (en) Sparse array linked list-based world state increment updating method and device
CN112748877A (en) File integration uploading method and device and file downloading method and device
CN115543688B (en) Backup method, backup device, proxy terminal and storage medium
US20210191640A1 (en) Systems and methods for data segment processing
CN115168499B (en) Database table fragmentation method and device, computer equipment and storage medium
US20230029728A1 (en) Per-service storage of attributes
CN116192395A (en) Trusted system for distributed data storage
CN112395256B (en) Data reading method, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant