WO2020037985A1 - Method and apparatus for calculating backup file size - Google Patents

Method and apparatus for calculating backup file size

Info

Publication number
WO2020037985A1
Authority
WO
WIPO (PCT)
Prior art keywords
backup file
data block
backup
target
file
Prior art date
Application number
PCT/CN2019/079390
Other languages
French (fr)
Chinese (zh)
Inventor
古文武
刘继朋
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020037985A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065 Replication mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Definitions

  • The embodiments of the present application relate to the technical field of data protection, and in particular to a method and a device for calculating the size of a backup file.
  • Incremental backup technology is usually used to back up user data on cloud resources.
  • The principle of incremental backup technology is as follows: when the user's data is backed up for the first time, a full backup is performed and all data blocks are backed up. For every later backup, an incremental backup is performed: only the data blocks whose data has changed are backed up.
  • FIG. 1 illustrates the incremental backup technology and shows the chain structure of an incremental backup.
  • A group of data blocks, data block 1 to data block 6, is backed up four times.
  • In the first backup, all six data blocks are backed up.
  • Subsequent backups back up only the data blocks whose data has changed; data blocks whose data has not changed are not backed up again. Whether the data has changed is judged relative to the previous backup.
  • For example, in the second backup, the data in data block 1, data block 2, and data block 4 has changed compared with the first backup, while the data in data block 3, data block 5, and data block 6 has not. Therefore, the second backup backs up only data block 1, data block 2, and data block 4; data block 3, data block 5, and data block 6 are not backed up again.
  • Each backup performed for a data block group generates a metadata record file.
  • The metadata record file stores the identifier of each data block in the data block group and the storage path of each data block.
  • The data blocks in the data block group corresponding to each backup are considered to be the data blocks contained in the backup file.
  • For example, the first backup generates a metadata record file (identified as backup 1) that stores the identifiers of data block 1 to data block 6 (Block1 ... Block6) and their storage paths (File1 ... File6).
  • The second backup generates another metadata record file (identified as backup 2), referred to here as record file 2, which also stores the identifiers and storage paths of data block 1 to data block 6.
  • The difference is that the second backup backs up only data block 1, data block 2, and data block 4, whose content has changed, so record file 2 stores new storage paths for them (File1', File2', and File4'). The storage paths of data block 3, data block 5, and data block 6, whose content has not changed, are the same as in record file 1 (File3, File5, and File6). In other words, the backup file generated by the second backup references data block 3, data block 5, and data block 6 from the previous backup file.
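  • The prior-art metadata record files described above can be sketched as two mappings from data block identifier to storage path. This is only an illustration: the dictionary layout and the helper name changed_blocks are assumptions, not part of the application.

```python
# Hypothetical in-memory form of the two metadata record files described above.
# Each maps a data block identifier to the storage path of that block.
backup1 = {f"Block{i}": f"File{i}" for i in range(1, 7)}

# Second backup: blocks 1, 2 and 4 changed and were written to new paths;
# blocks 3, 5 and 6 still point at the data stored by the first backup.
backup2 = dict(backup1)
backup2.update({"Block1": "File1'", "Block2": "File2'", "Block4": "File4'"})


def changed_blocks(previous, current):
    """Return identifiers whose storage path differs from the previous backup."""
    return sorted(b for b in current if current[b] != previous.get(b))


print(changed_blocks(backup1, backup2))  # ['Block1', 'Block2', 'Block4']
```

  • Note that these records hold only paths, not sizes or ownership, so computing the size of a backup file from them requires looking up every block at its storage path.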
  • the embodiments of the present application provide a method and a device for calculating the size of a backup file, which are used to solve the problem of slow calculation speed when calculating the size of a backup file in the prior art.
  • the specific technical solutions provided in the embodiments of the present application are as follows:
  • In a first aspect, a method for calculating the size of a backup file is provided.
  • The method can be applied to a backup server, specifically:
  • The backup server determines a target backup file whose size is to be calculated, and determines a first backup file. The first backup file is the backup file that was created before the target backup file, whose creation time is closest to that of the target backup file, and that has not been deleted.
  • the metadata of the backup file records the reference information of the data block and the size of the data block.
  • the reference information of the data block is used to indicate the backup file to which the data block belongs.
  • The backup server queries the metadata of the target backup file for the reference information of each of the multiple data blocks included in the target backup file, to obtain the first data block belonging to the target backup file.
  • The backup server then calculates the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file.
  • The sum of the sizes of the first data block and the second data block may be determined as the size of the target backup file.
  • Because the metadata of the target backup file and the metadata of the first backup file each record the reference information of every data block included in the corresponding backup file, the data blocks belonging to the target backup file can be identified quickly, and the recorded sizes of those data blocks can be added up quickly to obtain the size of the target backup file.
  • When determining the second data block belonging to the target backup file, the data blocks belonging to a second backup file may be obtained according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file. The second backup file was created before the target backup file and after the first backup file, and has been deleted. The data blocks attributed to the second backup file are determined as the second data block attributed to the target backup file.
  • When the second backup file is deleted, the data blocks that belonged to it are reattributed to the target backup file, so the data blocks that originally belonged to the second backup file may be determined as the second data block of the target backup file. This allows the size of the backup file to be calculated quickly and accurately.
  • the backup server may determine a target backup file after receiving a query request, and the query request includes an identifier of the target backup file, and the backup server determines the target backup file according to the identifier.
  • the query request may be sent by the tenant or sent by the billing system.
  • The method described in the first aspect may further include: recording, in the metadata of the target backup file, the reference information and the size of each data block included in the target backup file; and recording, in the metadata of the first backup file, the reference information of each data block included in the first backup file.
  • In a second aspect, a method for calculating the size of a backup file is provided.
  • The method can be applied to a backup server, specifically:
  • The backup server obtains a saved first total size of the multiple data blocks included in a target backup file, where the target backup file is a backup file of a source file. The backup server then obtains a saved second total size of the data blocks, among the multiple data blocks included in the target backup file, that do not belong to the target backup file.
  • The difference between the first total size and the second total size is determined as the size of the target backup file.
  • In this way, subtraction is used to quickly calculate the size of the target backup file.
  • The process of pre-saving the first total size of the multiple data blocks included in the target backup file includes: the backup server determines the target backup file, queries the metadata of the target backup file for the size of each data block included in the target backup file, determines the sum of those sizes as the first total size, and saves it.
  • The process of pre-saving the second total size of the data blocks that do not belong to the target backup file includes: the backup server determines the target backup file and, according to the reference information of each data block recorded in the metadata of the target backup file, obtains the third data blocks, that is, the data blocks among those included in the target backup file that do not belong to the target backup file. The reference information of each data block indicates the backup file to which the data block belongs.
  • According to the sizes of the third data blocks recorded in the metadata of the target backup file, the second total size is determined as the sum of the sizes of the third data blocks.
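  • The subtraction approach of the second aspect can be sketched as follows. This is a minimal illustration under stated assumptions: the metadata is modeled as a dictionary from block identifier to a (size, owner) pair, where "owner" stands in for the reference information, and all identifiers and sizes are made up for the example.

```python
# Hypothetical metadata of a target backup file: block id -> (size, owner).
# "owner" stands in for the reference information; sizes are illustrative.
target_id = "backup3"
target_meta = {
    "Block1": (4, "backup2"),
    "Block2": (4, "backup2"),
    "Block3": (8, "backup3"),
    "Block4": (4, "backup2"),
    "Block5": (2, "backup3"),
    "Block6": (4, "backup1"),
}

# First total size: every data block the target backup file contains.
first_total = sum(size for size, _ in target_meta.values())

# Second total size: the "third data blocks", which belong to other backup files.
second_total = sum(size for size, owner in target_meta.values() if owner != target_id)

# Size of the target backup file is the difference of the two saved totals.
target_size = first_total - second_total
print(first_total, second_total, target_size)  # 26 16 10
```

  • Because both totals can be saved ahead of time, answering a size query reduces to a single subtraction.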
  • the backup server may determine a target backup file after receiving a query request, and the query request includes an identifier of the target backup file, and the backup server determines the target backup file according to the identifier.
  • the query request may be sent by the tenant or sent by the billing system.
  • The method described in the second aspect may further include: recording, in the metadata of the target backup file, the reference information and the size of each data block included in the target backup file.
  • In a third aspect, a method for calculating the size of data that can be deleted from a backup file is provided.
  • The method can be applied to a backup server, specifically:
  • The backup server determines a target backup file.
  • The target backup file includes multiple data blocks.
  • The target backup file is a backup file of a source file.
  • The source file also has a first backup file and a third backup file.
  • The first backup file includes a plurality of data blocks.
  • The third backup file includes a plurality of data blocks.
  • The first backup file is the backup file that was created before the target backup file, whose creation time is closest to that of the target backup file, and that has not been deleted.
  • The third backup file is the backup file that was created after the target backup file, whose creation time is closest to that of the target backup file, and that has not been deleted.
  • The backup server determines the first backup file and, according to the reference information of each data block included in the target backup file, recorded in the metadata of the target backup file, and the reference information of each data block included in the first backup file, recorded in the metadata of the first backup file, obtains a second data block belonging to the target backup file. The reference information of each data block indicates the backup file to which the data block belongs.
  • The backup server determines the third backup file and, according to the reference information of each data block included in the third backup file, recorded in the metadata of the third backup file, obtains the third data block, that is, the data block among the first data block and the second data block belonging to the target backup file that is referenced by the third backup file.
  • A fourth data block that can be deleted from the target backup file is determined, and the size of data that can be deleted from the target backup file is determined according to the size of each fourth data block recorded in the metadata of the target backup file.
  • The sum of the sizes of the fourth data blocks is determined as the size of data that can be deleted from the target backup file.
  • Because the metadata of the target backup file, the metadata of the first backup file, and the metadata of the third backup file each record the reference information of every data block included in the corresponding backup file, the data blocks that can be deleted from the target backup file can be identified quickly, and the size of the data that can be deleted can be calculated quickly from the recorded sizes of those data blocks.
  • When determining the fourth data block that can be deleted from the target backup file, the data blocks among the first data block and the second data block other than the third data block may be determined as the fourth data block.
  • When determining the second data block belonging to the target backup file, the data blocks belonging to a second backup file may be obtained according to the reference information recorded in the metadata of the target backup file and in the metadata of the first backup file. The creation time of the second backup file is before that of the target backup file and after that of the first backup file, and the second backup file has been deleted. The data blocks attributed to the second backup file are then determined as the second data block attributed to the target backup file.
  • When the second backup file is deleted, the data blocks that belonged to it are reattributed to the target backup file, so the data blocks that originally belonged to the second backup file may be determined as the second data block belonging to the target backup file.
  • the backup server may determine a target backup file after receiving the delete request, the delete request includes an identifier of the target backup file, and the backup server determines the target backup file according to the identifier.
  • the deletion request may be sent by the tenant or by another system.
  • The method described in the third aspect may further include: recording, in the metadata of the target backup file, the reference information and the size of each data block included in the target backup file; recording, in the metadata of the first backup file, the reference information of each data block included in the first backup file; and recording, in the metadata of the third backup file, the reference information of each data block included in the third backup file.
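  • The third aspect can be sketched as set arithmetic over block identifiers. The sketch below is an illustration under several assumptions that the application does not spell out: metadata is modeled as a dictionary from block identifier to a (size, owner) pair, a block counts as "referenced by the third backup file" when the third backup file records the same owner for it as the target backup file does, and the function names are hypothetical.

```python
def owned_blocks(target_id, target_meta, first_meta):
    """Blocks attributed to the target: first data blocks plus second data blocks."""
    owned = set()
    for block, (_size, owner) in target_meta.items():
        if owner == target_id:
            owned.add(block)  # first data block: backed up by the target itself
        elif owner != first_meta[block][1]:
            owned.add(block)  # second data block: inherited from a deleted backup
    return owned


def deletable_size(target_id, target_meta, first_meta, third_meta):
    """Sum the sizes of the fourth data blocks (owned but no longer referenced)."""
    owned = owned_blocks(target_id, target_meta, first_meta)
    # Third data blocks: owned blocks that the later backup still references
    # (assumed here to mean that it records the same owner for the block).
    third_blocks = {b for b in owned if third_meta[b][1] == target_meta[b][1]}
    fourth_blocks = owned - third_blocks
    return sum(target_meta[b][0] for b in fourth_blocks)


# Chain: backup1 (first), backup2 (deleted), backup3 (target), backup4 (third).
first_meta = {"Block1": (4, "backup1"), "Block2": (4, "backup1"), "Block3": (8, "backup1")}
target_meta = {"Block1": (4, "backup2"), "Block2": (4, "backup1"), "Block3": (8, "backup3")}
third_meta = {"Block1": (4, "backup2"), "Block2": (4, "backup1"), "Block3": (8, "backup4")}

print(deletable_size("backup3", target_meta, first_meta, third_meta))  # 8
```

  • In this example Block1 is inherited from the deleted backup2 but is still referenced by backup4, so only Block3 can be deleted together with the target backup file.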
  • The present application provides a device for calculating the size of a backup file, which is used for a backup server or a chip of a backup server, and includes units or means for performing each step of any of the above aspects.
  • The present application provides a device for calculating the size of a backup file, which is used for a backup server or a chip of a backup server, and includes at least one processing element and at least one storage element, where the at least one storage element is used to store a program and data, and the at least one processing element is configured to perform the method provided by any aspect of the present application.
  • The present application provides a device for calculating the size of a backup file, which is used for a backup server and includes at least one processing element (or chip) for performing the method of any of the above aspects.
  • the present application provides a computer program product including computer instructions that, when executed by a computer, cause the computer to execute the method of any of the above aspects.
  • the present application provides a computer-readable storage medium that stores computer instructions, and when the computer instructions are executed by a computer, the computer is caused to execute the method of any of the above aspects.
  • FIG. 2 is a schematic diagram of a metadata record file of a backup file in the prior art;
  • FIG. 3 is a structural diagram of a backup system according to an embodiment of the present application;
  • FIG. 4 is a flowchart of a method for calculating the size of a backup file according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a metadata record file of a backup file in an embodiment of the present application;
  • FIG. 6 is a flowchart of a method for calculating the size of a backup file according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a backup process in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of a process of deleting a backup file in an embodiment of the present application;
  • FIG. 9 is a device for calculating the size of a backup file in an embodiment of the present application;
  • FIG. 10 is an apparatus for calculating the size of deleted data in a backup file according to an embodiment of the present application;
  • FIG. 11 is an apparatus for calculating the size of a backup file in an embodiment of the present application;
  • FIG. 12 is an apparatus for calculating the size of deleted data in a backup file according to an embodiment of the present application.
  • FIG. 3 shows a backup system 300 according to an embodiment of the present application.
  • The system includes a user 301, a backup server 302, a production storage server 303, and a backup storage server 304.
  • Users can back up, restore, and delete backup files through the backup server 302 in the backup system. This application applies to the backup and deletion scenarios.
  • The user 301 can periodically send a backup request to the backup server 302 to back up the data that needs to be backed up.
  • The user 301 initiates a backup task to the backup server 302.
  • The backup server 302 reads the data to be backed up from the production storage server 303 and stores the data in the backup storage server 304.
  • The backup server 302 also has a local database for managing backup task data. Users can be tenants or other systems.
  • The storage space rented by a tenant on the cloud resource is limited.
  • The tenant may keep a record of the storage space occupied by each backup file so that, when the rented storage space is insufficient, some backup files can be deleted. The tenant can therefore query the backup server 302 for the size of a backup file and record it.
  • The billing system can also query the backup server 302 about the size of a backup file.
  • When the billing system performs billing, it charges for each backup file.
  • The billing system therefore needs to know the size of each backup file in order to perform billing.
  • In the prior art, when the backup server calculates the size of a backup file, it needs to query the size of each underlying data block according to the storage path of the data block in the backup file, so the calculation is slow.
  • This application provides a method for calculating the size of a backup file that can increase the calculation speed.
  • When calculating the size of a backup file, a backup file adjacent to that backup file may be involved. The multiple backup files discussed in this application are located in the same backup chain, that is, they are backup files of the same source file.
  • The words "first" and "second" are used only to distinguish the items described, and cannot be understood as indicating or implying relative importance or order.
  • the present application discloses a flowchart of a method for calculating the size of a backup file.
  • the backup server in this process is specifically the backup server 302 shown in FIG. 3. It can be understood that, in this application, the function of the backup server may also be implemented by a chip applied to the backup server.
  • the process is specifically:
  • Step S401: The backup server receives a query request, and the query request includes an identifier of a target backup file.
  • Step S402: The backup server determines the target backup file.
  • When a tenant or the billing system wants to know the size of a backup file, it may send a query request to the backup server to query the size of the backup file.
  • The query request includes the identifier of the backup file to be queried, and the backup file to be queried is referred to as the target backup file.
  • After receiving the query request, the backup server can determine the target backup file according to the identifier contained in the query request.
  • A backup file generated by a backup is a backup file of the source file.
  • A data block that is backed up in a backup can be regarded as a data block belonging to the backup file generated by that backup. Regardless of whether a data block is backed up in the backup or is not backed up (that is, it references a data block in a backup file generated by a previous backup), it can be considered a data block included in the backup file generated by the backup.
  • For example, in the third backup, data block 1, data block 2, data block 4, and data block 6 reference data blocks in backup files generated by previous backups, while data block 3 and data block 5 are the data blocks backed up in the third backup.
  • Data blocks 1 to 6 are all considered data blocks contained in the backup file generated by the third backup, but only data block 3 and data block 5 are considered to belong to the backup file generated by the third backup.
  • Data block 1, data block 2, data block 4, and data block 6 do not belong to the backup file generated by the third backup.
  • The target backup file is a backup file generated when performing an incremental backup, and is a backup file of the source file.
  • Each backup file has its own metadata.
  • The metadata includes the identifier of the backup file, the backup sequence number of the backup file, and information about each data block contained in the backup file. The backup sequence number increases in the order in which backup files are generated, and the information about a data block includes the size of the data block, the reference information of the data block, and so on.
  • The reference information of a data block indicates the backup file to which the data block belongs, that is, the backup file generated when the data block was backed up.
  • The reference information of a data block recorded in the metadata can be the identifier and the backup sequence number of the backup file to which the data block belongs.
  • The backup sequence number of a backup file indicates the order in which the backup file was generated.
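  • One possible in-memory shape for the metadata described above is sketched below. The class names, field names, and the size unit are assumptions made for illustration; the application only states which pieces of information the metadata holds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockInfo:
    size: int        # size of the data block (unit is an assumption)
    owner_id: str    # reference info: identifier of the backup file it belongs to
    owner_seq: int   # reference info: backup sequence number of that backup file


@dataclass
class BackupMetadata:
    backup_id: str   # identifier of the backup file
    seq: int         # backup sequence number, increasing with generation order
    blocks: dict = field(default_factory=dict)  # block identifier -> BlockInfo


# Metadata of a third backup: Block3 was backed up by it, Block6 is referenced
# from the backup file generated by the first backup.
meta3 = BackupMetadata("backup3", 3, {
    "Block3": BlockInfo(size=8, owner_id="backup3", owner_seq=3),
    "Block6": BlockInfo(size=4, owner_id="backup1", owner_seq=1),
})

print(meta3.blocks["Block3"].owner_id == meta3.backup_id)  # True
```

  • Keeping the size and the reference information next to each block identifier is what lets the size calculation avoid visiting the storage path of every block.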
  • After determining the target backup file, the backup server may further perform step 403: query the metadata of the target backup file for the reference information of each of the multiple data blocks included in the target backup file, to obtain the first data block belonging to the target backup file.
  • The backup file to which each data block belongs can be determined from the reference information found.
  • The data blocks belonging to the target backup file are determined as the first data block based only on the reference information recorded in the metadata of the target backup file.
  • The first data block is a data block that originally belongs to the target backup file.
  • The data blocks belonging to the target backup file may be determined from the identifier of the backup file in the reference information. Specifically, the backup server reads the identifier of the backup file to which each data block belongs, as recorded in the metadata of the target backup file, and determines the data blocks whose recorded identifier is that of the target backup file as the first data block.
  • Alternatively, the determination may be based on the backup sequence number in the reference information. Specifically, the backup server reads the backup sequence number of the backup file to which each data block belongs, as recorded in the metadata of the target backup file, and determines the data blocks whose recorded sequence number is that of the target backup file as the first data block belonging to the target backup file.
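  • Step 403 can be sketched with either form of reference information. The dictionary layout and the two function names below are hypothetical; the example data mirrors the third backup discussed earlier, in which only data block 3 and data block 5 belong to the backup file.

```python
# Hypothetical metadata of the target backup file (generated by the third backup).
target_meta = {
    "id": "backup3",
    "seq": 3,
    "blocks": {
        "Block1": {"size": 4, "owner_id": "backup2", "owner_seq": 2},
        "Block2": {"size": 4, "owner_id": "backup2", "owner_seq": 2},
        "Block3": {"size": 8, "owner_id": "backup3", "owner_seq": 3},
        "Block4": {"size": 4, "owner_id": "backup2", "owner_seq": 2},
        "Block5": {"size": 2, "owner_id": "backup3", "owner_seq": 3},
        "Block6": {"size": 4, "owner_id": "backup1", "owner_seq": 1},
    },
}


def first_blocks_by_id(meta):
    """First data blocks: recorded owner identifier equals the target's identifier."""
    return sorted(b for b, info in meta["blocks"].items()
                  if info["owner_id"] == meta["id"])


def first_blocks_by_seq(meta):
    """Same result, matching on the backup sequence number instead."""
    return sorted(b for b, info in meta["blocks"].items()
                  if info["owner_seq"] == meta["seq"])


print(first_blocks_by_id(target_meta))   # ['Block3', 'Block5']
print(first_blocks_by_seq(target_meta))  # ['Block3', 'Block5']
```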
  • The source file also has a first backup file.
  • The first backup file is the backup file that was created before the target backup file, whose creation time is closest to that of the target backup file, and that has not been deleted.
  • For example, suppose the target backup file is the backup file generated during the fourth backup.
  • If no backup file has been deleted, the first backup file is the backup file generated during the third backup. If the backup file generated during the third backup has been deleted, the first backup file is the backup file generated during the second backup, and the data blocks that originally belonged to the backup file generated during the third backup now belong to the target backup file. If the backup files generated during both the third backup and the second backup have been deleted, the first backup file is the backup file generated during the first backup, and the data blocks that originally belonged to the backup files generated during the third and second backups now belong to the target backup file.
  • The process by which the backup server determines the first backup file may be as follows:
  • The backup server determines the backup sequence number of the target backup file from the recorded backup sequence number of each backup file.
  • The backup sequence number of the target backup file is referred to as the target backup sequence number.
  • The backup server determines the first backup sequence number, which is adjacent to and smaller than the target backup sequence number;
  • The backup server determines the first backup file corresponding to the first backup sequence number according to the backup sequence number of each backup file.
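  • The lookup of the first backup file can be sketched as a search over the recorded sequence numbers. The list-of-dictionaries layout, the "deleted" flag, and the function name are assumptions; skipping deleted backup files follows the earlier statement that the first backup file has not been deleted.

```python
def find_first_backup(backups, target_seq):
    """Return the undeleted backup with the largest sequence number below target_seq."""
    candidates = [b for b in backups if b["seq"] < target_seq and not b["deleted"]]
    return max(candidates, key=lambda b: b["seq"]) if candidates else None


# Hypothetical backup chain in which the third backup has been deleted.
chain = [
    {"id": "backup1", "seq": 1, "deleted": False},
    {"id": "backup2", "seq": 2, "deleted": False},
    {"id": "backup3", "seq": 3, "deleted": True},
    {"id": "backup4", "seq": 4, "deleted": False},
]

print(find_first_backup(chain, target_seq=4)["id"])  # backup2
print(find_first_backup(chain, target_seq=1))        # None
```

  • With the third backup deleted, the first backup file for the fourth backup is the one generated during the second backup, matching the example given earlier.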
  • After the first data block is determined, step 404 may also be performed: the backup server determines the first backup file and, according to the reference information of each data block included in the target backup file, recorded in the metadata of the target backup file, and the reference information of each data block included in the first backup file, recorded in the metadata of the first backup file, obtains the data blocks belonging to the second backup file; the data blocks belonging to the second backup file are determined as the second data block belonging to the target backup file. The creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted.
  • Specifically, for each data block in the target backup file other than the first data block, the backup server identifies the corresponding data block in the first backup file.
  • A data block and its corresponding data block in the first backup file are backups of the same data block in the source file.
  • The backup server determines whether the reference information of the data block is the same as the reference information of its corresponding data block in the first backup file. If they are different, the data block is determined to be a data block that originally belonged to the deleted second backup file and that currently belongs to the target backup file, that is, a second data block.
  • The corresponding data block in the first backup file may be identified as follows: the backup server records, in the metadata of each backup file, the identifier of each data block contained in that backup file, and the backup data blocks of the same data block in the source file may have the same identifier.
  • For each data block in the target backup file other than the first data block, the backup server may identify the data block in the first backup file that has the same identifier, and determine it as the corresponding data block of that data block in the first backup file.
  • The backup server may then calculate the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file. Specifically, step 405 may be performed: determining the sum of the sizes of the first data block and the second data block as the size of the target backup file.
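  • The addition method of steps 404 and 405 can be sketched as follows. The metadata layout (a mapping from block identifier to the serial number of the owning backup file and the block size) and the name `size_by_addition` are assumptions made for illustration only. Treating a block that has no counterpart in the first backup file as a second data block is also an assumption; the text does not address that case.

```python
from typing import Dict, Tuple

# Hypothetical metadata layout:
# block identifier -> (serial number of the owning backup file, size).
Metadata = Dict[str, Tuple[int, int]]


def size_by_addition(target_serial: int,
                     target_meta: Metadata,
                     first_meta: Metadata) -> int:
    """Sum the sizes of the first and second data blocks of the target."""
    total = 0
    for block_id, (owner, size) in target_meta.items():
        if owner == target_serial:
            # First data block: recorded as belonging to the target itself.
            total += size
            continue
        previous = first_meta.get(block_id)
        # Second data block: its reference information differs from that of
        # the corresponding block in the first backup file, so it belonged
        # to a deleted backup file between the two (assumption: a missing
        # counterpart is treated the same way).
        if previous is None or previous[0] != owner:
            total += size
    return total


# Backup 3 was deleted; the target is backup 4 and the first backup file is 2.
target_meta = {"Block1": (2, 10), "Block2": (3, 20),
               "Block3": (4, 30), "Block4": (1, 5)}
first_meta = {"Block1": (2, 10), "Block2": (1, 20),
              "Block3": (2, 30), "Block4": (1, 5)}
print(size_by_addition(4, target_meta, first_meta))  # Block2 + Block3
```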
  • Because the metadata of the target backup file and the metadata of the first backup file each record the reference information of each data block contained in the corresponding backup file, the data blocks belonging to the target backup file can be quickly identified, and the size of the target backup file can be quickly calculated by addition from the recorded size of each data block that belongs to it.
  • The reference information of each data block contained in the backup file and the size of each data block, both recorded in the metadata of the backup file, can specifically be recorded in the metadata record file of the backup file. In addition to the identifier (Block) and storage path (File) of each data block shown in FIG. 2, the metadata record file of the backup file also records the size of the data block, the identifier of the backup file to which the data block belongs, and the backup serial number of that backup file. For details, see the metadata record file shown in FIG. 5.
  • The metadata record file shown in FIG. 5 only indicates what information is recorded in the metadata record file of a backup file; the specific record format can be the format shown in FIG. 5 or another format set by the backup server.
  • The third column is the backup serial number of the backup file to which the data block belongs. Here the third column serves as an ordered identifier; of course, the identifier of the backup file and the backup serial number can also be recorded separately. The fourth column is the size of the data block.
  • Taking the third backup as an example, the metadata record file of the backup file generated by the third backup is backup 3 in FIG. 5. The number “2” corresponding to data block 1 (Block1) in backup 3 indicates that the backup file to which data block 1 belongs is the backup file generated during the second backup. The number “3” corresponding to data block 5 (Block5) indicates that the backup file to which data block 5 belongs is the backup file generated during the third backup.
  • The target backup file is not the backup file generated by the first backup, so the multiple data blocks it includes may contain data blocks that do not belong to the target backup file.
  • The size of the target backup file can be calculated by adding the sizes of the data blocks that belong to the target backup file, or by subtracting the size of each data block that does not belong to the target backup file from the total size of the multiple data blocks contained in the target backup file.
  • For the subtraction method, the backup server may save in advance, according to the backup serial number of the backup file, the reference information of each data block, and the size of each data block recorded in the metadata of the backup file, the total size of the multiple data blocks contained in the backup file and the total size of the data blocks, among the multiple data blocks contained in the target backup file, that do not belong to the target backup file.
  • For convenience of description, the total size of the data blocks included in the backup file is referred to as the first total size, and the total size of the data blocks, among the multiple data blocks included in the target backup file, that do not belong to the target backup file is referred to as the second total size.
  • The backup server receives a query request that includes an identifier of the target backup file.
  • The backup server determines the target backup file according to the identifier and obtains the saved first total size of the multiple data blocks contained in the target backup file, where the target backup file is a backup file of the source file.
  • The backup server then obtains the saved second total size of the data blocks, among the multiple data blocks contained in the target backup file, that do not belong to the target backup file, and finally calculates the size of the target backup file according to the first total size and the second total size.
  • Specifically, the difference between the first total size and the second total size is determined as the size of the target backup file.
  • In this way, the size of the target backup file is quickly calculated by subtraction.
  • The total size of the data blocks contained in the first backup file that are referenced by the target backup file is the total size of the data blocks included in the target backup file that do not belong to the target backup file.
  • Therefore, the total size of the data blocks contained in the first backup file that are referenced by the target backup file may be recorded.
  • For each backup file, the total size of the multiple data blocks it contains, and the total size of the data blocks it references in each backup file generated before it, can be stored in the backup chain reference relationship record file.
  • As shown in Table 1, the information recorded in the backup chain reference relationship record file includes: the identifier (ID) of the backup file, for example backup 1, backup 2, ..., backup n; the backup serial number (SnapIndex) of the backup file, for example 1, 2, ..., n; the total size (Totasize) of the multiple data blocks contained in the backup file; and the total size (Reference) of the data blocks that the backup file references in each backup file generated before it. The size of a backup file can subsequently be calculated from the information recorded in the backup chain reference relationship record file.
  • For example, T3 represents the total size of the multiple data blocks contained in the backup file with backup serial number 3, R(3, 1) represents the total size of the data blocks contained in the backup file with backup serial number 1 that are referenced by the backup file with backup serial number 3, and R(3, 2) represents the total size of the data blocks contained in the backup file with backup serial number 2 that are referenced by the backup file with backup serial number 3.
  • The backup server may record in advance, in the backup chain reference relationship record file, the information used for calculating the size of a backup file, according to the backup serial number of the backup file, the reference information of each data block, and the size of each data block recorded in the metadata of the backup file.
  • When the backup server records information in the backup chain reference relationship record file, it generally does so in real time as backup files are generated; that is, when a backup file is generated, the information to be recorded in the backup chain reference relationship record file is determined from the information recorded in the metadata of that backup file. When recording information in the backup chain reference relationship record file, there is no need to consider the case of a deleted backup file.
  • The process by which the backup server pre-saves the first total size of the multiple data blocks included in the target backup file includes: the backup server determines the target backup file, queries the size of each data block included in the target backup file from the metadata of the target backup file, determines the sum of the sizes of the data blocks included in the target backup file as the first total size, and saves the first total size.
  • The process by which the backup server pre-saves the second total size of the data blocks that do not belong to the target backup file includes: the backup server determines the target backup file and, according to the reference information of each data block included in the target backup file recorded in the metadata of the target backup file, obtains the third data blocks, that is, the data blocks among the multiple data blocks included in the target backup file that do not belong to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs. According to the size of each third data block recorded in the metadata of the target backup file, the backup server determines the second total size as the sum of the sizes of the third data blocks.
  • For the backup server to calculate the size of the target backup file, the backup chain reference relationship record file records at least the first total size of the multiple data blocks contained in the target backup file and the total size of the data blocks contained in the first backup file that are referenced by the target backup file, that is, the second total size of the data blocks, among the multiple data blocks included in the target backup file, that do not belong to the target backup file.
  • The process of recording, in the backup chain reference relationship record file, the total size of the multiple data blocks contained in the target backup file includes: the backup server determines the target backup file and, according to the size of each data block recorded in the metadata of the target backup file, determines the sum of those sizes as the total size of the multiple data blocks contained in the target backup file that is recorded in the backup chain reference relationship record file.
  • The process of recording, in the backup chain reference relationship record file, the second total size of the data blocks contained in the first backup file that are referenced by the target backup file includes: the backup server determines the target backup file and, according to the reference information of each data block included in the target backup file recorded in the metadata of the target backup file, obtains the third data blocks, that is, the data blocks among the multiple data blocks included in the target backup file that do not belong to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs. According to the size of each third data block recorded in the metadata of the target backup file, the second total size is determined as the sum of the sizes of the third data blocks.
  • The third data blocks may be determined according to the identifier of the backup file in the reference information: the backup server identifies the identifier of the backup file to which each data block recorded in the metadata of the target backup file belongs, and determines the data blocks whose identifier is not that of the target backup file as third data blocks that do not belong to the target backup file. A third data block is thus a data block contained in the first backup file that the target backup file references.
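  • The pre-saving of the two totals and the subtraction that follows can be sketched as below. The metadata layout (block identifier mapped to the identifier of the owning backup file and the block size) and the function names are hypothetical and chosen for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical metadata layout:
# block identifier -> (identifier of the owning backup file, size).
Metadata = Dict[str, Tuple[str, int]]


def precompute_totals(target_id: str, metadata: Metadata) -> Tuple[int, int]:
    """Return (first total size, second total size) for one backup file."""
    # First total size: every data block contained in the backup file.
    first_total = sum(size for _, size in metadata.values())
    # Second total size: the third data blocks, that is, blocks whose
    # reference information names a backup file other than the target.
    second_total = sum(size for owner, size in metadata.values()
                       if owner != target_id)
    return first_total, second_total


def size_by_subtraction(first_total: int, second_total: int) -> int:
    """Size of the target backup file as the difference of the totals."""
    return first_total - second_total


metadata = {"Block1": ("backup2", 10), "Block5": ("backup3", 7),
            "Block6": ("backup1", 4)}
first_total, second_total = precompute_totals("backup3", metadata)
print(size_by_subtraction(first_total, second_total))
```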
  • For example, the total size, recorded in the backup chain reference relationship record file, of the data blocks contained in the backup file with backup serial number 1 that are referenced by the backup file with backup serial number 3, that is, R(3, 1), is the size of data block 6 (Block6), namely size6.
  • Because data block 6 (Block6) contained in the backup file with backup serial number 2 is itself a reference to a data block contained in the backup file with backup serial number 1, the total size, recorded in the backup chain reference relationship record file, of the data blocks contained in the backup file with backup serial number 2 that are referenced by the backup file with backup serial number 3, that is, R(3, 2), is the sum of the sizes of data block 1, data block 2, data block 4, and data block 6, namely the sum of size1', size2', size4', and size6.
  • Consider any backup file a that references another backup file b, where the generation order of backup file b precedes that of backup file a.
  • The total size of the data blocks contained in backup file b that are referenced by backup file a can be calculated as follows: identify the backup serial number of the backup file to which each data block contained in backup file a belongs, as recorded in the metadata of backup file a, and the backup serial number of backup file b, as recorded in the metadata of backup file b; among the backup serial numbers of the backup files to which the data blocks contained in backup file a belong, identify the specific backup serial numbers that are less than or equal to the backup serial number of backup file b; determine the data blocks corresponding to the specific backup serial numbers as the specific data blocks, that is, the data blocks contained in backup file b that backup file a references; determine the size of each specific data block according to the size of each data block included in backup file a; and determine the sum of the sizes of the specific data blocks as the total size of the data blocks contained in backup file b that are referenced by backup file a.
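  • The calculation of R(a, b) can be sketched as follows. The metadata layout and the concrete block sizes are made up for illustration; only the ownership of Block1, Block2, Block4, Block5, and Block6 follows the backup 3 example in the text, and the ownership of Block3 is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical metadata layout:
# block identifier -> (serial number of the owning backup file, size).
Metadata = Dict[str, Tuple[int, int]]


def referenced_size(meta_a: Metadata, serial_b: int) -> int:
    """R(a, b): total size of the blocks in backup a whose owning backup
    serial number is less than or equal to the serial number of backup b."""
    return sum(size for owner, size in meta_a.values() if owner <= serial_b)


# Backup 3: Block6 belongs to backup 1; Block1, Block2 and Block4 belong to
# backup 2; Block5 (and, by assumption, Block3) belong to backup 3 itself.
backup3 = {"Block1": (2, 11), "Block2": (2, 12), "Block3": (3, 13),
           "Block4": (2, 14), "Block5": (3, 15), "Block6": (1, 16)}
print(referenced_size(backup3, 1))  # size6
print(referenced_size(backup3, 2))  # size1' + size2' + size4' + size6
```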
  • When the size of the target backup file is queried, it can be calculated from the information contained in the backup chain reference relationship record file, as the total size (Totasize) of the multiple data blocks included in the target backup file minus the size of the data blocks that do not belong to the target backup file.
  • The size of the data blocks that do not belong to the target backup file is the size of the data blocks contained in the first backup file that are referenced by the target backup file.
  • If the total size of the multiple data blocks contained in the target backup file is Ta and the size of the data blocks in the first backup file that are referenced by the target backup file is R(a, p), the size of the target backup file is Ta - R(a, p), where a is the target backup serial number of the target backup file and p is the first backup serial number of the first backup file.
  • For example, the target backup file whose size is to be calculated is the backup file identified as backup 4. If no backup file has been deleted, the backup chain reference relationship record file can be queried according to the backup serial number of each backup file, and the first backup file is found to be the backup file identified as backup 3.
  • The total size of the multiple data blocks contained in the target backup file is T4, and the total size of the data blocks among them that do not belong to the target backup file is R(4, 3), so the size of the target backup file is T4 - R(4, 3).
  • If the backup file identified as backup 3 has been deleted, the backup chain reference relationship record file does not contain information related to backup 3, and querying the backup chain reference relationship record file finds that the first backup file is the backup file identified as backup 2.
  • In that case, the total size of the multiple data blocks contained in the target backup file is T4, and the total size of the data blocks among them that do not belong to the target backup file is R(4, 2), so the size of the target backup file is T4 - R(4, 2).
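  • A query against the backup chain reference relationship record file, including the case where backup 3 has been deleted, can be sketched as follows. The `ChainRecord` structure, the function name, and all sizes are hypothetical; the sketch assumes that the record of a deleted backup file is removed from the chain while the references recorded by later backup files are kept.

```python
from typing import Dict, NamedTuple


class ChainRecord(NamedTuple):
    total_size: int            # Totasize: all data blocks in the backup file
    reference: Dict[int, int]  # earlier serial b -> R(this backup, b)


def backup_size(chain: Dict[int, ChainRecord], target_serial: int) -> int:
    """Return Ta - R(a, p), where p is the nearest surviving predecessor."""
    record = chain[target_serial]
    earlier = [s for s in chain if s < target_serial]
    if not earlier:
        # Backup file generated by the first backup: it owns all its blocks.
        return record.total_size
    first_serial = max(earlier)
    return record.total_size - record.reference[first_serial]


full_chain = {
    1: ChainRecord(100, {}),
    2: ChainRecord(120, {1: 80}),
    3: ChainRecord(140, {1: 70, 2: 100}),
    4: ChainRecord(150, {1: 60, 2: 90, 3: 110}),
}
print(backup_size(full_chain, 4))  # T4 - R(4, 3)

# After backup 3 is deleted, its record is no longer in the chain.
chain_without_3 = {s: r for s, r in full_chain.items() if s != 3}
print(backup_size(chain_without_3, 4))  # T4 - R(4, 2)
```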
  • FIG. 6 is a flowchart of calculating the size of a backup file.
  • The user sends a query request to the backup server to query the size of the target backup file.
  • The backup server calculates the size of the target backup file according to the total size of the data blocks included in the target backup file and the total size of the data blocks in the first backup file that are referenced by the target backup file, both recorded in advance in the backup chain reference relationship record file, and feeds the size of the target backup file back to the user.
  • the method further includes:
  • a backup file is generated, and for the backup file, reference information of each data block included in the backup file is recorded in a metadata block of the backup file.
  • the tenant may send a backup request to the backup server.
  • other systems may also send a backup request to the backup server when the period arrives according to the backup period of the data block group.
  • The backup server backs up the data blocks in the target data block group according to the identifier of the target data block group, generates a target backup file, and records in the metadata of the target backup file the identifier of the target backup file, the backup serial number of the target backup file, the reference information of each data block contained in the target backup file, and the size of each data block.
  • The process by which the backup server performs the backup and records the backup file identifier, backup serial number, data block reference information, and data block size is as follows:
  • When the backup server receives the backup request sent by the user, it creates a backup task and generates the ID of the backup file.
  • The generated ID of the backup file is called the target ID.
  • The backup server creates a new metadata record file based on the target ID, and determines the target backup serial number from the largest backup serial number currently stored in the backup server's database. Generally, the largest backup serial number plus 1 is used as the target backup serial number, and the target identifier and the target backup serial number are stored in the newly created metadata record file.
  • The backup server cyclically reads data from the production storage server, obtains each data block in the target data block group for backup, and writes the backed-up data block to the backup storage server to implement data block backup. It identifies the size of each backed-up data block, records that size in the newly created metadata record file, and records, for each backed-up data block, the target identifier of the backup file to which it belongs.
  • Both the data blocks that are backed up and the data blocks that are not backed up are data blocks included in the backup file of the target identifier, and the data blocks that are backed up belong to the backup file of the target identifier. After the above process, the backup task can be considered complete, and the user can be notified that the backup is complete.
  • In addition, to facilitate calculating the size of the backup file, the following can also be calculated from the backup serial number recorded in the newly created metadata record file, the size of each data block, and the identifier (that is, the backup serial number) of the backup file to which each data block belongs: the total size of the data blocks contained in the backup file generated by this backup, and the total size of the data blocks that this backup file references in each backup file generated before it. These totals are then recorded in the backup chain reference relationship record file. After this process, the backup task is complete.
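  • The bookkeeping performed at backup time can be sketched as follows: choosing the next backup serial number and deriving the entry for the backup chain reference relationship record file from the new metadata record file. The function names, the dictionary keys, and the metadata layout are hypothetical; starting the numbering at 1 for an empty chain is also an assumption.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical metadata layout:
# block identifier -> (serial number of the owning backup file, size).
Metadata = Dict[str, Tuple[int, int]]


def next_backup_serial(stored_serials: Iterable[int]) -> int:
    """Largest stored serial number plus 1 (assumed to be 1 when empty)."""
    return max(stored_serials, default=0) + 1


def build_chain_entry(serial: int, metadata: Metadata,
                      earlier_serials: Iterable[int]) -> dict:
    """Derive the chain record entry for a newly generated backup file."""
    # Total size of all data blocks contained in the new backup file.
    total = sum(size for _, size in metadata.values())
    # For each earlier backup b, the total size of the blocks whose owning
    # backup serial number is less than or equal to b, that is, R(serial, b).
    reference = {
        b: sum(size for owner, size in metadata.values() if owner <= b)
        for b in earlier_serials if b < serial
    }
    return {"serial": serial, "total_size": total, "reference": reference}


serial = next_backup_serial([1, 2])
new_meta = {"Block1": (2, 11), "Block2": (2, 12), "Block3": (3, 13),
            "Block4": (2, 14), "Block5": (3, 15), "Block6": (1, 16)}
print(build_chain_entry(serial, new_meta, [1, 2]))
```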
  • If the target backup file whose size is to be calculated is the backup file generated by the first backup, the sizes of all data blocks included in it may be added directly to obtain the size of the target backup file.
  • On receiving the query request, the backup server may determine whether the target backup file whose size is to be calculated is the backup file generated by the first backup. Specifically, the backup server may determine, according to the pre-saved backup serial number of each backup file, whether the backup serial number of the target backup file is the smallest backup serial number in the backup chain where the target backup file is located. If so, the target backup file is the backup file generated by the first backup; otherwise, it is not.
  • The identifier of the backup file, its backup serial number, and the identifier and backup serial number of the backup file to which each data block contained in the backup file belongs can be stored in the local database of the backup server shown in FIG. 3.
  • the storage space rented by the tenant on the cloud resource is limited.
  • the tenant can delete some backup files.
  • the tenant can send a delete request to the backup server to delete the backup file.
  • the size of the data in the deleted backup file can be calculated.
  • However, the entire deletion process takes a long time, and the size of the deleted data cannot be fed back to the tenant in time.
  • A method for calculating the size of data deleted from a backup file is therefore provided, which can quickly obtain the size of the data that can be deleted from the backup file.
  • the methods include:
  • the backup server determines a target backup file.
  • the target backup file includes multiple data blocks.
  • the target backup file is a backup file of a source file.
  • The source file also has a first backup file and a third backup file.
  • The first backup file is the backup file whose creation time is before that of the target backup file and closest to it; the third backup file is the backup file whose creation time is after that of the target backup file and closest to it. Neither the first backup file nor the third backup file has been deleted.
  • A first data block belonging to the target backup file is obtained.
  • The backup server determines the first backup file and, according to the reference information of each data block included in the target backup file recorded in the metadata of the target backup file and the reference information of each data block included in the first backup file recorded in the metadata of the first backup file, obtains a second data block belonging to the target backup file. The reference information of each data block indicates the backup file to which the data block belongs.
  • When the backup server determines the second data block belonging to the target backup file, it may obtain the data block belonging to the second backup file according to the reference information of the data blocks recorded in the metadata of the target backup file and in the metadata of the first backup file, where the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted. The data block belonging to the second backup file is then determined as the second data block belonging to the target backup file.
  • The backup server determines the third backup file and, according to the reference information of each data block included in the third backup file recorded in the metadata of the third backup file, obtains a third data block belonging to the target backup file.
  • A fourth data block that can be deleted is determined according to the first data block, the second data block, and the third data block. Specifically, when determining the fourth data block that can be deleted from the target backup file, the data blocks among the first data block and the second data block other than the third data block may be determined as fourth data blocks that can be deleted.
  • the size of the data that can be deleted in the target backup file is determined according to the size of each fourth data block recorded in the metadata of the target backup file. Specifically, the sum of the sizes of the fourth data blocks is determined as the size of data that can be deleted in the target backup file.
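  • The selection of the fourth data blocks and the summation of their sizes can be sketched with set operations. The block identifiers, sizes, and the name `deletable_size` are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable


def deletable_size(first_blocks: Iterable[str],
                   second_blocks: Iterable[str],
                   third_blocks: Iterable[str],
                   sizes: Dict[str, int]) -> int:
    """Sum the sizes of the fourth data blocks: blocks that belong to the
    target backup file (first and second data blocks) and are not still
    referenced by the third backup file (third data blocks)."""
    fourth = (set(first_blocks) | set(second_blocks)) - set(third_blocks)
    return sum(sizes[block] for block in fourth)


sizes = {"Block2": 20, "Block3": 30, "Block7": 8}
# Block3 is still referenced by the third backup file, so only Block7 and
# Block2 can be deleted.
print(deletable_size({"Block3", "Block7"}, {"Block2"}, {"Block3"}, sizes))
```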
  • the tenant when deleting a backup file, may send a delete request to the backup server to delete data that can be deleted from the backup file.
  • the delete request includes an identifier of the backup file to be deleted, and the backup file to be deleted is referred to as a target backup file.
  • After receiving the delete request, the backup server can determine the target backup file according to the identifier contained in the delete request.
  • the process of the backup server determining the third backup file may be as follows:
  • the backup server determines the backup serial number of the target backup file according to the recorded backup serial number of each backup file.
  • the backup serial number of the target backup file is referred to as the target backup serial number.
  • The backup server determines the third backup serial number, which is adjacent to the target backup serial number and greater than the target backup serial number.
  • the third backup file corresponding to the third backup serial number is determined according to the backup serial number of each backup file.
  • According to the reference information of each data block included in the third backup file recorded in the metadata of the third backup file, the backup server obtains the third data block among the first data block and the second data block belonging to the target backup file.
  • The backup server judges whether the reference information of the data block is the same as the reference information of the corresponding data block in the third backup file. If the reference information is the same, the data block is determined to be a third data block referenced by the third backup file, and the third data block cannot be deleted.
  • Because the metadata of the target backup file, the metadata of the first backup file, and the metadata of the third backup file each record the reference information of each data block included in the corresponding backup file, the data blocks that can be deleted from the target backup file can be quickly identified, and the size of the data that can be deleted from the target backup file can be quickly calculated from the recorded size of each deletable data block.
  • The backup server may pre-save a first total size of the multiple data blocks included in the target backup file, a second total size of the data blocks contained in the first backup file that are referenced by the target backup file, a third total size of the data blocks contained in the target backup file that are referenced by the third backup file, and a fourth total size of the data blocks contained in the first backup file that are referenced by the third backup file.
  • The data blocks contained in the first backup file that are referenced by the target backup file can be understood as the data blocks included in the target backup file that do not belong to the target backup file.
  • The data blocks contained in the target backup file that are referenced by the third backup file can be understood as the data blocks included in the third backup file that do not belong to the third backup file.
  • The data blocks contained in the first backup file that are referenced by the third backup file can be understood as the data blocks included in the third backup file that belong neither to the third backup file nor to the target backup file.
  • The third total size minus the fourth total size is therefore the total size of the data blocks, among those included in the third backup file, that belong to the target backup file. These data blocks are referenced by the third backup file and will not be deleted.
  • When calculating the size of the data that can be deleted from the target backup file, the total size of the data blocks belonging to the target backup file may be calculated first, that is, the first difference obtained by subtracting the second total size from the first total size. Next, the total size of the data blocks belonging to the target backup file that are referenced by the third backup file is calculated, that is, the second difference obtained by subtracting the fourth total size from the third total size. Finally, the second difference is subtracted from the first difference. In other words, the first total size minus the second total size, minus the third total size, plus the fourth total size is the total size of the data that can be deleted from the target backup file.
  • The backup chain reference relationship record file can record information such as the total size of the data blocks contained in each backup file and the total size of the data blocks that the backup file references in each backup file generated before it.
  • If the total size of the multiple data blocks contained in the target backup file is Ta, the size of the data blocks in the first backup file that are referenced by the target backup file is R(a, p), the size of the data blocks in the target backup file that are referenced by the third backup file is R(b, a), and the size of the data blocks in the first backup file that are referenced by the third backup file is R(b, p), then the size of the data deleted from the target backup file is Ta - R(a, p) - [R(b, a) - R(b, p)], where a is the target backup serial number of the target backup file, p is the first backup serial number of the first backup file, and b is the third backup serial number of the third backup file.
  • the target backup file to be deleted is the backup file identified as backup 4.
  • the first backup file is the backup file identified as backup 3
  • the third backup file is the backup file identified as backup 5.
  • the total size of the multiple data blocks contained in the target backup file is T4, and the total size of the data blocks that do not belong to the target backup file among the multiple data blocks contained in the target backup file is R(4, 3).
  • the total size of the data blocks belonging to the target backup file among the multiple data blocks contained in the third backup file is R(5, 4) - R(5, 3), and the size of the data deleted from the target backup file is T4 - R(4, 3) - [R(5, 4) - R(5, 3)].
  • the backup chain reference relationship record file does not contain information related to backup 3. By querying the backup chain reference relationship record file, the first backup file is found to be the backup file identified as backup 2, and the third backup file is the backup file identified as backup 5.
  • the total size of the multiple data blocks contained in the target backup file is T4, and the total size of the data blocks that do not belong to the target backup file among the multiple data blocks contained in the target backup file is R(4, 2).
  • the total size of the data blocks belonging to the target backup file among the multiple data blocks contained in the third backup file is R(5, 4) - R(5, 2), and the size of the data deleted from the target backup file is T4 - R(4, 2) - [R(5, 4) - R(5, 2)].
  • the backup chain reference relationship record file does not contain information related to backup 3 and backup 5, and the backup chain reference relationship record file can be queried according to the backup serial number of each backup file.
  • the query finds that the first backup file is the backup file identified as backup 2 and the third backup file is the backup file identified as backup 6.
  • the total size of the multiple data blocks contained in the target backup file is T4, and the total size of the data blocks that do not belong to the target backup file among the multiple data blocks contained in the target backup file is R(4, 2).
  • the total size of the data blocks belonging to the target backup file among the multiple data blocks contained in the third backup file is R(6, 4) - R(6, 2), and the size of the data deleted from the target backup file is T4 - R(4, 2) - [R(6, 4) - R(6, 2)].
  • the target backup file to be deleted is a backup file generated by the first backup
  • the total size of the multiple data blocks contained in the target backup file to be deleted, minus the total size of the data blocks of the target backup file that are referenced in the third backup file, may be taken directly as the size of the data deleted from the target backup file.
  • the total size of the multiple data blocks contained in the target backup file to be deleted, minus the total size of the data blocks of the first backup file that are referenced in the target backup file, may be taken directly as the size of the data deleted from the target backup file.
  • the backup server may determine, when receiving the deletion request, whether the target backup file to be deleted is the backup file generated by the first backup or the backup file generated by the last backup.
  • the process of determining whether a backup file is generated for the first backup is described in the above embodiment for calculating the size of the backup file, and will not be described here.
  • the target backup file to be deleted is the target backup file to be deleted.
  • FIG. 8 is a flowchart of deleting a target backup file.
  • a user sends a delete request to a backup server to delete the target backup file.
  • the backup server uses the following values recorded in the pre-saved backup chain reference relationship record file: the total size of the multiple data blocks contained in the target backup file, the total size of the data blocks of the first backup file referenced in the target backup file, the total size of the data blocks of the target backup file referenced in the third backup file, and the total size of the data blocks of the first backup file referenced in the third backup file.
  • from these values the backup server calculates the size of the data that can be deleted from the target backup file and feeds that size back to the user.
  • the backup server can also delete data that can be deleted from the target backup file saved on the production storage server.
  • the backup server can also update the backup chain reference relationship record file. Specifically, it deletes the related information of the target backup file recorded in the backup chain reference relationship record file.
  • the related information includes the ID of the target backup file, its backup serial number, the total size of the multiple data blocks contained in the target backup file, and the total size of the data blocks it references in any other backup file.
  • this application provides a device 900 for calculating the size of a backup file.
  • the device may include a processing unit 901, a transceiver unit 902, and a storage unit 903.
  • the device 900 for calculating the size of a backup file may be applied to a backup server or a chip in the backup server.
  • the storage unit 903 may be configured to store a backup file.
  • the processing unit 901 may be configured to determine a target backup file whose size is to be calculated, and determine a first backup file, where the first backup file is the backup file whose creation time is before, and closest to, the creation time of the target backup file.
  • the metadata of the backup file records the backup serial number of the backup file, the reference information of the data block, and the size of the data block.
  • the reference information of the data block is used to indicate the backup file to which the data block belongs;
  • query, from the metadata of the target backup file, the reference information of each data block contained in the target backup file to obtain a first data block belonging to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs;
  • Determining the size of the target backup file according to the size of the first data block and the second data block recorded in the metadata of the target backup file.
  • the processing unit 901 may be specifically configured to obtain, according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, a data block belonging to a second backup file.
  • the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted;
  • processing unit 901 may be specifically configured to determine the sum of the sizes of the first data block and the second data block as the size of the target backup file.
  • the transceiver unit 902 may be configured to receive a query request, where the query request includes an identifier of the target backup file.
  • the storage unit 903 may be configured to record, in the metadata of the target backup file, the reference information and the size of each data block contained in the target backup file, and to record, in the metadata of the first backup file, the reference information of each data block contained in the first backup file.
  • the functions of the above units may be executed by the processor of the backup server or executed by the processor calling a program in the memory.
  • the present application provides a device 1000 for calculating the size of deleted data in a backup file.
  • the device may include a processing unit 1001, a transceiving unit 1002, and a storage unit 1003.
  • the device 1000 for calculating the size of deleted data in a backup file may be applied to a backup server or a chip in the backup server.
  • the processing unit 1001 may be configured to determine a target backup file, where the target backup file includes multiple data blocks, the target backup file is a backup file of a source file, and the source file further includes a first backup file and a third backup file
  • the first backup file includes multiple data blocks
  • the third backup file includes multiple data blocks
  • the first backup file is the backup file whose creation time is before, and closest to, the creation time of the target backup file, and the first backup file has not been deleted
  • the third backup file is the backup file whose creation time is after, and closest to, the creation time of the target backup file, and the third backup file has not been deleted;
  • determining the first backup file, and obtaining, according to the reference information of each data block contained in the target backup file recorded in the metadata of the target backup file and the reference information of each data block contained in the first backup file recorded in the metadata of the first backup file, a second data block belonging to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs;
  • the processing unit 1001 may be specifically configured to determine the sum of the sizes of the fourth data blocks as the size of the data that can be deleted from the target backup file.
  • the processing unit 1001 may be specifically configured to determine a data block other than the third data block among the first data block and the second data block as a fourth data block that can be deleted.
  • the processing unit 1001 may be specifically configured to obtain, according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, a data block belonging to a second backup file.
  • the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted;
  • the data block attributed to the second backup file is determined as the second data block attributed to the target backup file.
  • the transceiving unit 1002 may be configured to receive a deletion request, where the deletion request includes an identifier of the target backup file.
  • the storage unit 1003 may be configured to record, in the metadata of each of the target backup file, the second backup file, and the first backup file, the backup serial number of that backup file, the reference information of each data block it contains, and the size of each data block.
  • the functions of the above units may be executed by the processor of the backup server or executed by the processor calling a program in the memory.
  • the present application further provides a device 1100 for calculating the size of a backup file.
  • the device 1100 for calculating the size of a backup file can be applied to a backup server or a chip in the backup server.
  • the device 1100 for calculating the size of a backup file may include a processor 1101 and a memory 1102.
  • the memory 1102 is configured to store computer instructions
  • the processor 1101 is configured to execute computer instructions stored in the memory 1102, so that the device for calculating the size of a backup file implements any one of the foregoing methods for calculating the size of a backup file.
  • the present application further provides a device 1200 for calculating the size of deleted data in a backup file.
  • the device 1200 for calculating the size of deleted data in a backup file may be applied to a backup server or a chip in the backup server.
  • the device 1200 for calculating the size of the deleted data in the backup file may include a processor 1201 and a memory 1202.
  • the memory 1202 is configured to store computer instructions
  • the processor 1201 is configured to execute computer instructions stored in the memory 1202, so that the device for calculating the size of deleted data in a backup file implements any one of the foregoing methods for calculating the size of deleted data in a backup file.
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process.
  • the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and / or one or more blocks of the block diagrams.
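
As a rough illustration of the deletable-size expression Ta - R(a, p) - [R(b, a) - R(b, p)] described in the list above, the following Python sketch looks up the nearest surviving predecessor p and successor b in a backup chain reference relationship record. The dictionary layout, the function name `deletable_size`, and all sizes are assumptions made for this example and are not taken from the application.

```python
def deletable_size(record, a):
    """Size of data that could be deleted together with backup `a`.

    record: hypothetical layout {serial: {"total": T, "refs": {earlier: R}}},
    where refs[y] stands for R(serial, y), the size of the blocks contained
    in backup y that this backup references.
    """
    serials = sorted(record)
    earlier = [s for s in serials if s < a]
    later = [s for s in serials if s > a]
    p = earlier[-1] if earlier else None  # nearest surviving earlier backup
    b = later[0] if later else None       # nearest surviving later backup

    def R(x, y):
        # A missing neighbour contributes nothing (first or last backup).
        if x is None or y is None:
            return 0
        return record[x]["refs"].get(y, 0)

    return record[a]["total"] - R(a, p) - (R(b, a) - R(b, p))


# Made-up record in which backup 3 has already been deleted.
record = {
    2: {"total": 600, "refs": {}},
    4: {"total": 700, "refs": {2: 300}},
    5: {"total": 800, "refs": {2: 200, 4: 500}},
}
```

With backup 3 absent from the record, the target backup 4 uses backup 2 as p and backup 5 as b, giving 700 - 300 - (500 - 200) = 100. Treating a missing predecessor or successor as contributing zero reproduces the first-backup and last-backup cases.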

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of data protection and discloses a method and an apparatus for calculating backup file size, which address the slow calculation speed of prior-art approaches to calculating the size of a backup file. The method comprises: determining a target backup file and a first backup file, the first backup file being the undeleted backup file whose creation time is before, and closest to, the creation time of the target backup file; on the basis of reference information of the data blocks recorded in the metadata of the target backup file and reference information of the data blocks recorded in the metadata of the first backup file, acquiring the data blocks belonging to the target backup file; and, on the basis of the data blocks belonging to the target backup file, calculating the size of the target backup file. Rapid calculation of the size of a backup file is thereby implemented.

Description

Method and device for calculating backup file size
Technical field
The embodiments of the present application relate to the technical field of data protection, and in particular, to a method and a device for calculating the size of a backup file.
Background
With the rapid development of public, private, and hybrid clouds, using cloud resources to back up user data has become mainstream. Incremental backup is usually used to back up user data on cloud resources. Its principle is as follows: the first backup of the user's data is a full backup that backs up all data blocks. Every later backup is an incremental backup that backs up only the data blocks whose data has changed; data blocks whose data has not changed are not backed up.
The following example illustrates incremental backup. FIG. 1 shows a chain structure of incremental backups in which a group of data blocks, data block 1 to data block 6, is backed up four times. The first backup backs up all six data blocks. Each subsequent backup backs up only the data blocks whose data has changed and does not back up unchanged data blocks again, where change is judged relative to the previous backup. Take the second backup as an example: relative to the first backup, the data in data block 1, data block 2, and data block 4 has changed, while the data in data block 3, data block 5, and data block 6 has not. Therefore, the second backup backs up only data block 1, data block 2, and data block 4, and does not back up data block 3, data block 5, and data block 6 again.
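
The incremental backup example above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: block contents are represented as short strings, and the function name `incremental_backup` and the sample values are hypothetical.

```python
def incremental_backup(previous, current):
    """Return the ids of the blocks that this backup has to store.

    previous: {block_id: content} as of the previous backup, or None for
    the first backup; current: {block_id: content} at backup time.
    """
    if previous is None:
        # First backup: full backup of every data block.
        return sorted(current)
    # Later backups: only blocks whose content differs from the previous backup.
    return sorted(b for b, data in current.items() if previous.get(b) != data)


first = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}
second = {1: "a2", 2: "b2", 3: "c", 4: "d2", 5: "e", 6: "f"}

full = incremental_backup(None, first)     # all six blocks
delta = incremental_backup(first, second)  # blocks 1, 2 and 4 only
```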
In the prior art, each backup of a data block group generates a metadata record file, which stores the identifier and the storage path of every data block in the group. For each backup, both the data blocks in the group that were backed up and those that were not are regarded as data blocks contained in the backup file.
For example, as shown in FIG. 2, the first backup generates a metadata record file (the metadata record file identified as backup 1) that stores the identifiers of data block 1 to data block 6 (Block1 to Block6, respectively) and their storage paths (File1 to File6, respectively). The second backup generates another metadata record file (the metadata record file identified as backup 2), which likewise stores the identifiers and storage paths of data block 1 to data block 6. The difference is that the second backup backs up only data block 1, data block 2, and data block 4, whose contents have changed, so the metadata record file identified as backup 2 stores the new storage paths File1', File2', and File4' for them. The storage paths of data block 3, data block 5, and data block 6, whose contents have not changed, are the same as in the metadata record file of backup 1 (still File3, File5, and File6), which indicates that data block 3, data block 5, and data block 6 are references to data blocks in a backup file generated before this backup file.
In the prior art, calculating the storage space occupied by the data blocks of a backup file only requires calculating the storage space occupied by the changed data blocks contained in that backup file. Using the storage paths of the changed data blocks recorded in the backup file's metadata record file, the corresponding data blocks are looked up in the stored data, and the size of each data block found is determined to obtain the size of the backup file. For example, to calculate the storage space occupied by the data blocks of the backup file produced by the second backup, only the three data blocks corresponding to File1', File2', and File4' are looked up, and the total size of these three data blocks is taken as the storage space occupied. Calculating the size of a backup file by looking up data blocks at the underlying data layer in this way is slow.
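
To make the prior-art procedure concrete, here is a minimal Python sketch, assuming the metadata record files are simple mappings from block identifier to storage path and that a dictionary stands in for the storage layer. The names `changed_paths`, `prior_art_size`, and `storage`, and all byte sizes, are hypothetical.

```python
# Hypothetical metadata record files: block identifier -> storage path.
backup1 = {f"Block{i}": f"File{i}" for i in range(1, 7)}
backup2 = dict(backup1, Block1="File1'", Block2="File2'", Block4="File4'")

# Stand-in for the storage layer: storage path -> size in bytes (made up).
storage = {path: 4096 for path in backup1.values()}
storage.update({"File1'": 4096, "File2'": 4096, "File4'": 8192})


def changed_paths(prev_meta, cur_meta):
    """Storage paths that differ from the previous metadata record file."""
    return [path for block, path in cur_meta.items() if prev_meta.get(block) != path]


def prior_art_size(prev_meta, cur_meta, storage):
    """Sum the sizes of the changed blocks by looking each one up in storage.

    Each storage[...] access models a lookup in the underlying data,
    which is the step the background section describes as slow.
    """
    return sum(storage[path] for path in changed_paths(prev_meta, cur_meta))


size_backup2 = prior_art_size(backup1, backup2, storage)
```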
Summary of the Invention
The embodiments of the present application provide a method and a device for calculating the size of a backup file, to solve the prior-art problem of slow calculation when calculating the size of a backup file. The specific technical solutions provided in the embodiments of the present application are as follows:
In a first aspect, a method for calculating the size of a backup file is provided. The method can be applied to a backup server and is specifically as follows:
First, the backup server determines a target backup file whose size is to be calculated and determines a first backup file. The first backup file is the backup file whose creation time is before, and closest to, the creation time of the target backup file, and the first backup file has not been deleted. The metadata of a backup file records the reference information of its data blocks and the size of each data block; the reference information of a data block indicates the backup file to which the data block belongs. Next, the backup server queries the metadata of the target backup file for the reference information of each of the multiple data blocks contained in the target backup file, and obtains a first data block belonging to the target backup file. Then, according to the reference information of each data block contained in the target backup file, recorded in the metadata of the target backup file, and the reference information of each data block contained in the first backup file, recorded in the metadata of the first backup file, the backup server obtains a second data block belonging to the target backup file. Finally, the backup server calculates the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file.
In a possible implementation, when the size of the target backup file is calculated, the sum of the sizes of the first data block and the second data block may be determined as the size of the target backup file.
In the embodiments of the present application, because the metadata of the target backup file and the metadata of the first backup file each record the reference information of every data block contained in the corresponding backup file, the data blocks belonging to the target backup file can be identified quickly, and the size of the target backup file can be calculated quickly by adding up the recorded sizes of the data blocks belonging to the target backup file.
In a possible implementation, the second data block belonging to the target backup file may be determined as follows: according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, a data block belonging to a second backup file is obtained, where the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted; the data block belonging to the second backup file is then determined as the second data block belonging to the target backup file.
In the embodiments of the present invention, after the second backup file is deleted, the data blocks belonging to the second backup file are updated to belong to the target backup file. The data blocks that originally belonged to the second backup file can therefore be determined as the second data block belonging to the target backup file, so that the size of the backup file is calculated quickly and accurately.
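
The first-aspect calculation can be sketched as follows. This is a simplified model, not the application's implementation: it assumes the reference information of a data block is the backup serial number of the backup file it belongs to, and it approximates the comparison with the first backup file's metadata by treating any block whose owner lies strictly between the first backup file and the target backup file as belonging to a deleted second backup file. `BlockRecord`, `target_backup_size`, and the sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    owner: int  # serial number of the backup file the block belongs to (assumed form of the reference information)
    size: int   # block size in bytes, as recorded in the metadata


def target_backup_size(target_serial, first_serial, target_meta):
    """Sum of the first and second data blocks of the target backup file."""
    # First data blocks: recorded as belonging to the target backup file itself.
    first_blocks = [b for b in target_meta if b.owner == target_serial]
    # Second data blocks: owned by a backup created after the first backup
    # file and before the target, which is assumed here to have been deleted.
    second_blocks = [b for b in target_meta if first_serial < b.owner < target_serial]
    return sum(b.size for b in first_blocks) + sum(b.size for b in second_blocks)


# Target is backup 4, backup 3 has been deleted, backup 2 is the first backup file.
meta4 = [
    BlockRecord(owner=1, size=100),
    BlockRecord(owner=2, size=200),
    BlockRecord(owner=3, size=300),
    BlockRecord(owner=4, size=400),
    BlockRecord(owner=4, size=50),
]
size4 = target_backup_size(target_serial=4, first_serial=2, target_meta=meta4)
```

No storage lookup is needed in this model: every size comes from the metadata, which is the source of the speed-up the text describes.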
In a possible implementation, the backup server may determine the target backup file after receiving a query request. The query request includes an identifier of the target backup file, and the backup server determines the target backup file according to the identifier.
The query request may be sent by a tenant or by a billing system.
In a possible implementation, the method of the first aspect may further include: recording, in the metadata of the target backup file, the reference information and the size of each data block contained in the target backup file, and recording, in the metadata of the first backup file, the reference information of each data block contained in the first backup file.
This allows the size of the backup file to be calculated later from the information recorded in the metadata.
In a second aspect, a method for calculating the size of a backup file is provided. The method can be applied to a backup server and is specifically as follows:
First, the backup server obtains the saved first total size of the multiple data blocks contained in a target backup file, where the target backup file is a backup file of a source file. Then, the backup server obtains the saved second total size of the data blocks that do not belong to the target backup file among the multiple data blocks contained in the target backup file. Finally, the size of the target backup file is calculated according to the first total size and the second total size.
In a possible implementation, the difference between the first total size and the second total size is determined as the size of the target backup file.
In the embodiments of the present invention, because the first total size of the target backup file and the second total size of the third data blocks, which do not belong to the target backup file, are saved in advance, the size of the target backup file can be calculated quickly by subtraction.
In a possible implementation, the process of saving in advance the first total size of the multiple data blocks contained in the target backup file includes: the backup server determines the target backup file, queries the metadata of the target backup file for the size of each data block contained in the target backup file, determines the sum of these sizes as the first total size of the multiple data blocks contained in the target backup file, and saves it.
In a possible implementation, the process of saving in advance the second total size of the data blocks that do not belong to the target backup file includes: the backup server determines the target backup file and, according to the reference information of each data block contained in the target backup file recorded in the metadata of the target backup file, obtains the third data blocks, which are the data blocks among the multiple data blocks contained in the target backup file that do not belong to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs; then, according to the sizes of the third data blocks recorded in the metadata of the target backup file, the backup server determines the second total size, which is the sum of the sizes of the third data blocks.
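
A minimal sketch of the second-aspect approach, assuming the metadata can be read as a list of (owner serial number, size) pairs: both totals are computed once and saved, and a later size query is a single subtraction. The function names and sample values are hypothetical.

```python
def precompute_totals(target_serial, blocks):
    """Compute the two totals that are saved in advance.

    blocks: list of (owner_serial, size) pairs read from the target backup
    file's metadata; owner_serial is the assumed form of the reference
    information.
    """
    # First total size: every data block contained in the target backup file.
    first_total = sum(size for _, size in blocks)
    # Second total size: the third data blocks, which do not belong to the target.
    second_total = sum(size for owner, size in blocks if owner != target_serial)
    return first_total, second_total


def size_from_totals(first_total, second_total):
    """Size of the target backup file as the difference of the saved totals."""
    return first_total - second_total


blocks = [(1, 100), (1, 200), (2, 300), (2, 50)]
totals = precompute_totals(2, blocks)
size2 = size_from_totals(*totals)
```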
In a possible implementation, the backup server may determine the target backup file after receiving a query request. The query request includes an identifier of the target backup file, and the backup server determines the target backup file according to the identifier.
The query request may be sent by a tenant or by a billing system.
In a possible implementation, the method of the second aspect may further include: recording, in the metadata of the target backup file, the reference information and the size of each data block contained in the target backup file.
This allows the size of the backup file to be calculated later from the information recorded in the metadata.
In a third aspect, a method for calculating the size of the data deleted from a backup file is provided. The method may be applied to a backup server and proceeds as follows:
First, the backup server determines a target backup file. The target backup file contains a plurality of data blocks and is a backup file of a source file. The source file also has a first backup file and a third backup file, each of which contains a plurality of data blocks. The first backup file is the backup file that was created before the target backup file and whose creation time is closest to that of the target backup file, and the first backup file has not been deleted. The third backup file is the backup file that was created after the target backup file and whose creation time is closest to that of the target backup file, and the third backup file has not been deleted;
Then, based on the reference information of each data block contained in the target backup file, as recorded in the metadata of the target backup file, the backup server obtains the first data blocks belonging to the target backup file;
Next, the backup server determines the first backup file and, based on the reference information of each data block contained in the target backup file, as recorded in the metadata of the target backup file, and the reference information of each data block contained in the first backup file, as recorded in the metadata of the first backup file, obtains the second data blocks belonging to the target backup file, where the reference information of each data block indicates the backup file to which that data block belongs;
Next, the backup server determines the third backup file and, based on the reference information of each data block contained in the third backup file, as recorded in the metadata of the third backup file, obtains the third data blocks, that is, those of the first data blocks and second data blocks belonging to the target backup file that are referenced by the third backup file;
Finally, based on the first data blocks, the second data blocks, and the third data blocks, the backup server determines the fourth data blocks, which are the data blocks that can be deleted, and, based on the size of each fourth data block recorded in the metadata of the target backup file, determines the size of the data that can be deleted from the target backup file.
In a possible implementation, the sum of the sizes of the fourth data blocks is determined as the size of the data that can be deleted from the target backup file.
In the embodiment of the present invention, because the metadata of the target backup file, the metadata of the first backup file, and the metadata of the third backup file each record the reference information of every data block contained in the corresponding backup file, the data blocks that can be deleted from the target backup file can be identified quickly, and the size of the data that can be deleted from the target backup file can be calculated quickly by adding up the recorded sizes of those data blocks.
In a possible implementation, when the fourth data blocks that can be deleted from the target backup file are determined, the data blocks among the first data blocks and the second data blocks other than the third data blocks may be determined as the fourth data blocks that can be deleted.
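The selection of the fourth data blocks amounts to a set difference followed by a sum. The sketch below assumes the first, second, and third data blocks have already been identified and are given as sets of block identifiers; the function names, block identifiers, and sizes are hypothetical.

```python
def deletable_blocks(first, second, third):
    """Fourth data blocks: first and second data blocks minus the third data blocks."""
    return (set(first) | set(second)) - set(third)


def deletable_size(first, second, third, sizes):
    """Sum the recorded sizes of the fourth data blocks."""
    return sum(sizes[b] for b in deletable_blocks(first, second, third))


# Hypothetical sizes as they might be recorded in the target backup file's metadata.
sizes = {"Block3": 4096, "Block4": 8192, "Block5": 4096}

first = {"Block4"}             # belong to the target backup file from the start
second = {"Block3", "Block5"}  # inherited from a deleted second backup file
third = {"Block5"}             # still referenced by the third backup file

fourth = deletable_blocks(first, second, third)
freed = deletable_size(first, second, third, sizes)
print(sorted(fourth), freed)  # ['Block3', 'Block4'] 12288
```

Block5 is kept because the later backup file still references it, so only Block3 and Block4 count toward the data that can be deleted.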
In a possible implementation, the second data blocks belonging to the target backup file are determined as follows: first, based on the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, the data blocks belonging to a second backup file are obtained, where the second backup file was created before the target backup file and after the first backup file, and the second backup file has been deleted; then, the data blocks belonging to the second backup file are determined as the second data blocks belonging to the target backup file.
In the embodiment of the present invention, after the second backup file is deleted, the data blocks that belonged to the second backup file are updated to belong to the target backup file, so the data blocks that originally belonged to the second backup file can be determined as the second data blocks belonging to the target backup file.
In a possible implementation, the backup server may determine the target backup file after receiving a deletion request, where the deletion request contains the identifier of the target backup file and the backup server determines the target backup file according to that identifier.
The deletion request may be sent by a tenant or by another system.
In a possible implementation, the method of the third aspect may further include: recording, in the metadata of the target backup file, the reference information of each data block contained in the target backup file and the size of each data block; recording, in the metadata of the first backup file, the reference information of each data block contained in the first backup file; and recording, in the metadata of the third backup file, the reference information of each data block contained in the third backup file.
This allows the size of the data that can be deleted from the backup file to be calculated later from the information recorded in the metadata.
In a fourth aspect, the present application provides an apparatus for calculating the size of a backup file, for use in a backup server or in a chip of a backup server, including units or means for performing the steps of any of the above aspects.
In a fifth aspect, the present application provides an apparatus for calculating the size of a backup file, for use in a backup server or in a chip of a backup server, including at least one processing element and at least one storage element, where the at least one storage element is configured to store programs and data, and the at least one processing element is configured to perform the method provided by any aspect of the present application.
In a sixth aspect, the present application provides an apparatus for calculating the size of a backup file, for use in a backup server, including at least one processing element (or chip) for performing the method of any of the above aspects.
In a seventh aspect, the present application provides a computer program product that includes computer instructions which, when executed by a computer, cause the computer to perform the method of any of the above aspects.
In an eighth aspect, the present application provides a computer-readable storage medium that stores computer instructions which, when executed by a computer, cause the computer to perform the method of any of the above aspects.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of a chain structure of incremental backups in the prior art;
FIG. 2 is a schematic diagram of a metadata record file of a backup file in the prior art;
FIG. 3 is a structural diagram of a backup system according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for calculating the size of a backup file according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a metadata record file of a backup file according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for calculating the size of a backup file according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a backup process according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a process of deleting a backup file according to an embodiment of the present application;
FIG. 9 shows an apparatus for calculating the size of a backup file according to an embodiment of the present application;
FIG. 10 shows an apparatus for calculating the size of the data deleted from a backup file according to an embodiment of the present application;
FIG. 11 shows an apparatus for calculating the size of a backup file according to an embodiment of the present application;
FIG. 12 shows an apparatus for calculating the size of the data deleted from a backup file according to an embodiment of the present application.
DETAILED DESCRIPTION
The embodiments of the present application are described in detail below with reference to the drawings.
FIG. 3 shows a backup system 300 provided by an embodiment of the present application. The system includes a user 301, a backup server 302, a production storage server 303, and a backup storage server 304.
Through the backup server 302 in the backup system, a user can back up data blocks, restore them, and delete backup files. The present application applies to the backup and deletion scenarios. To prevent data loss caused by a service failure, the user 301 may periodically send backup requests to the backup server 302 to back up the data that needs to be backed up. The user 301 initiates a backup task to the backup server 302; the backup server 302 reads the data to be backed up from the production storage server 303 and stores it in the backup storage server 304. The backup server 302 also has a local database for managing backup task data. The user may be a tenant or another system.
The storage space that a tenant rents on cloud resources is limited. The tenant can keep a record of how much storage space each backup file occupies so that, when the rented storage space runs short, some backup files can be deleted. The tenant can therefore query the backup server 302 for the size of a backup file and record it.
Besides the tenant, a billing system can also query the backup server 302 for the size of a backup file. The billing system charges per backup file, so it needs to know the size of each backup file in order to bill for it.
In the prior art, when the backup server calculates the size of a backup file, it has to follow the storage path of each data block in the backup file down to the underlying storage to query the size of that data block, which makes the calculation slow. The present application provides a method for calculating the size of a backup file that can speed up the calculation. Calculating the size of a backup file may involve the backup files adjacent to it; the backup files referred to in the present application are all located in the same backup chain, that is, they are backup files of the same source file. It should also be understood that, in the description of the present application, words such as "first" and "second" are used only to distinguish the items described; they neither indicate or imply relative importance nor indicate or imply an order.
FIG. 4 is a flowchart of a method for calculating the size of a backup file disclosed in the present application. The backup server in this flow is the backup server 302 shown in FIG. 3. It can be understood that, in the present application, the functions of the backup server may also be implemented by a chip used in the backup server. The flow is as follows:
Step S401: The backup server receives a query request that contains the identifier of a target backup file.
Step S402: The backup server determines the target backup file.
In the embodiment of the present application, when a tenant or a billing system wants to know the size of a backup file, it can send a query request to the backup server. The query request contains the identifier of the backup file to be queried, which is referred to as the target backup file. After receiving the query request, the backup server determines the target backup file according to the identifier contained in the request.
In incremental backup, the first backup of a source file is a full backup of the source file. Each subsequent backup (the second, the third, ..., the n-th) is an incremental backup relative to the backup file generated by the previous backup: only the data blocks whose data has changed are backed up, while the data blocks whose data has not changed are not backed up again and instead reference the data blocks in the backup file generated by the previous backup. Whether or not it is the first backup, the backup file generated by a backup is a backup file of the source file.
In each backup, the data blocks that are actually backed up are regarded as the data blocks belonging to the backup file generated by that backup. Both the data blocks that are backed up and the data blocks that are not backed up (those that reference data blocks in the backup file generated by the previous backup) are regarded as data blocks contained in the backup file generated by that backup.
As shown in FIG. 1, in the third backup, data block 1, data block 2, data block 4, and data block 6 reference data blocks in the backup file generated by the previous backup, that is, the second backup, while data block 3 and data block 5 are the data blocks backed up in the third backup. Data blocks 1 to 6 of the third backup are therefore all regarded as data blocks contained in the backup file generated by the third backup; data block 3 and data block 5 are regarded as belonging to the backup file generated by the third backup, and data block 1, data block 2, data block 4, and data block 6 do not belong to it.
In the embodiment of the present application, the target backup file is a backup file generated by an incremental backup and is a backup file of the source file. Every backup file has metadata, which includes the identifier of the backup file, the backup sequence number of the backup file, and information about each data block contained in the backup file. Backup sequence numbers increase in the order in which the backup files are generated, so the backup sequence number of a backup file indicates its position in that order. The information about a data block includes the size of the data block, the reference information of the data block, and so on. The reference information of a data block indicates the backup file to which the data block belongs, that is, the backup file generated by the backup in which the data block was backed up; the reference information recorded in the metadata may specifically be the identifier and the backup sequence number of the backup file to which the data block belongs.
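The metadata described above can be modeled with two small record types. This is only a sketch of one possible in-memory layout; the class and field names (`BlockInfo`, `BackupMetadata`, `owner_seq`, and so on) and the sizes are assumptions, and the application does not prescribe a format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class BlockInfo:
    size: int       # size of the data block in bytes
    owner_id: str   # identifier of the backup file the block belongs to
    owner_seq: int  # backup sequence number of that backup file


@dataclass
class BackupMetadata:
    backup_id: str   # identifier of this backup file
    backup_seq: int  # increases in the order backup files are generated
    blocks: Dict[str, BlockInfo] = field(default_factory=dict)


# Hypothetical metadata for the backup file generated by the third backup.
backup3 = BackupMetadata(backup_id="backup-3", backup_seq=3)
backup3.blocks["Block1"] = BlockInfo(size=4096, owner_id="backup-2", owner_seq=2)
backup3.blocks["Block5"] = BlockInfo(size=4096, owner_id="backup-3", owner_seq=3)
```

Here Block1 is contained in the third backup file but belongs to the second, while Block5 both is contained in and belongs to the third.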
After determining the target backup file, the backup server may further perform step S403: query the metadata of the target backup file for the reference information of each of the plurality of data blocks contained in the target backup file, and obtain the first data blocks belonging to the target backup file.
Once the reference information of each of the plurality of data blocks contained in the target backup file has been read from the metadata of the target backup file, the data blocks belonging to the target backup file can be determined from it. The data blocks determined as belonging to the target backup file solely from the reference information recorded in the metadata of the target backup file are referred to as the first data blocks; the first data blocks have belonged to the target backup file from the start.
The data blocks belonging to the target backup file may be determined from the backup file identifier in the reference information: the backup server reads, for each data block recorded in the metadata of the target backup file, the identifier of the backup file to which the data block belongs, and determines the data blocks whose identifier is that of the target backup file as the first data blocks belonging to the target backup file. Alternatively, they may be determined from the backup sequence number in the reference information: the backup server reads, for each data block recorded in the metadata of the target backup file, the backup sequence number of the backup file to which the data block belongs, and determines the data blocks whose backup sequence number is that of the target backup file as the first data blocks belonging to the target backup file.
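Both ways of identifying the first data blocks reduce to a filter over the reference information. The sketch below uses the third backup of FIG. 1 as an example; the function names, the identifier strings, and the owners assumed for data blocks 2, 4, and 6 are hypothetical.

```python
def first_blocks_by_id(target_id, refs):
    """Blocks whose reference information names the target backup file by identifier."""
    return sorted(b for b, (owner_id, _) in refs.items() if owner_id == target_id)


def first_blocks_by_seq(target_seq, refs):
    """Blocks whose reference information names the target backup file by sequence number."""
    return sorted(b for b, (_, owner_seq) in refs.items() if owner_seq == target_seq)


# refs maps block identifier -> (owner backup identifier, owner backup sequence number).
refs_of_backup_3 = {
    "Block1": ("backup-2", 2),
    "Block2": ("backup-1", 1),  # assumed owner
    "Block3": ("backup-3", 3),
    "Block4": ("backup-2", 2),  # assumed owner
    "Block5": ("backup-3", 3),
    "Block6": ("backup-1", 1),  # assumed owner
}

print(first_blocks_by_id("backup-3", refs_of_backup_3))  # ['Block3', 'Block5']
print(first_blocks_by_seq(3, refs_of_backup_3))          # ['Block3', 'Block5']
```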
In the embodiment of the present invention, the source file also has a first backup file. The first backup file was created earlier than the target backup file, and its creation time is the closest to that of the target backup file; in other words, the first backup file is the backup file that was created before the target backup file and whose creation time is closest to that of the target backup file, and the first backup file has not been deleted.
For example, suppose the target backup file is the backup file generated by the fourth backup. If no backup file has been deleted, the first backup file is the backup file generated by the third backup. If the backup file generated by the third backup has been deleted, the first backup file is the backup file generated by the second backup, and the data blocks that originally belonged to the backup file generated by the third backup now belong to the target backup file. If the backup files generated by the third backup and the second backup have both been deleted, the first backup file is the backup file generated by the first backup, and the data blocks that originally belonged to the backup files generated by the third backup and the second backup now belong to the target backup file.
The backup server may determine the first backup file as follows:
The backup server determines, from the recorded backup sequence number of each backup file, the backup sequence number of the target backup file, referred to as the target backup sequence number, and determines the first backup sequence number, which is adjacent to and smaller than the target backup sequence number;
and, from the recorded backup sequence number of each backup file, determines the first backup file corresponding to the first backup sequence number.
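Finding the first backup sequence number can be sketched as picking the largest recorded sequence number below the target. The sketch assumes the backup server records sequence numbers only for backup files that have not been deleted; the function name `find_first_backup_seq` is hypothetical.

```python
def find_first_backup_seq(target_seq, existing_seqs):
    """Return the largest recorded sequence number smaller than target_seq.

    existing_seqs is assumed to hold only backup files that still exist.
    Returns None when the target is the earliest remaining backup file.
    """
    earlier = [s for s in existing_seqs if s < target_seq]
    return max(earlier) if earlier else None


print(find_first_backup_seq(4, [1, 2, 3, 4]))  # 3: nothing deleted
print(find_first_backup_seq(4, [1, 2, 4]))     # 2: third backup deleted
print(find_first_backup_seq(4, [1, 4]))        # 1: second and third backups deleted
```

The three calls mirror the three cases of the example in which the target backup file is the one generated by the fourth backup.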
To make the determined size of the backup file more accurate, step S404 may also be performed: determine the first backup file; based on the reference information of each data block contained in the target backup file, as recorded in the metadata of the target backup file, and the reference information of each data block contained in the first backup file, as recorded in the metadata of the first backup file, obtain the data blocks belonging to the second backup file; and determine the data blocks belonging to the second backup file as the second data blocks belonging to the target backup file, where the second backup file was created before the target backup file and after the first backup file, and the second backup file has been deleted.
Specifically, the backup server may obtain the second data blocks as follows. For each data block in the target backup file other than the first data blocks, the backup server identifies the corresponding data block in the first backup file; the data block and its corresponding data block in the first backup file are both backup data blocks of the same data block in the source file. The backup server then checks whether the reference information of the data block is the same as the reference information of the corresponding data block in the first backup file. If it is not, the data block is determined to have originally belonged to the deleted second backup file, which makes it a second data block that now belongs to the target backup file.
To identify, for each data block in the target backup file other than the first data blocks, the corresponding data block in the first backup file, the backup server may record in the metadata of each backup file the identifier of every data block contained in that backup file, using the same identifier for all backup data blocks of the same data block in the source file. For each data block in the target backup file other than the first data blocks, the backup server can then look up the data block in the first backup file that has the same identifier and determine it as the corresponding data block in the first backup file.
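The comparison against the first backup file can be sketched as below. Reference information is reduced to the owner's backup sequence number, and a block with no counterpart in the first backup file is treated as having different reference information; both simplifications, like the function name and sample data, are assumptions of this sketch.

```python
def second_blocks(target_seq, target_owners, first_owners):
    """Blocks of the target that were owned by a deleted intermediate backup file.

    target_owners / first_owners map block identifier -> owner sequence number,
    taken from the metadata of the target and the first backup file respectively.
    """
    result = set()
    for block_id, owner in target_owners.items():
        if owner == target_seq:
            continue  # a first data block, handled separately
        if first_owners.get(block_id) != owner:
            result.add(block_id)  # reference information differs
    return result


# Hypothetical case: target is backup 4, backup 3 was deleted, first backup file is backup 2.
target_owners = {"Block1": 2, "Block2": 1, "Block3": 3, "Block4": 4, "Block5": 3, "Block6": 1}
first_owners = {"Block1": 2, "Block2": 1, "Block3": 1, "Block4": 2, "Block5": 1, "Block6": 1}

print(sorted(second_blocks(4, target_owners, first_owners)))  # ['Block3', 'Block5']
```

Block3 and Block5 carry reference information that the first backup file does not share, so they must have been backed up by the deleted third backup.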
After determining the first data blocks and the second data blocks belonging to the target backup file, the backup server can calculate the size of the target backup file from the sizes of the first data blocks and the second data blocks recorded in the metadata of the target backup file. Specifically, step S405 may be performed: determine the sum of the sizes of the first data blocks and the second data blocks as the size of the target backup file.
Because the metadata of the target backup file and the metadata of the first backup file each record the reference information of every data block contained in the corresponding backup file, the data blocks belonging to the target backup file can be identified quickly, and the size of the target backup file can be calculated quickly by adding up the recorded sizes of those data blocks.
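Steps S403 to S405 can be combined into one pass over the metadata of the target backup file. The sketch below is illustrative only: the function name, the (size, owner sequence number) layout, and the sample values are assumptions, and a block missing from the first backup file is treated as having different reference information.

```python
def backup_file_size(target_seq, target_blocks, first_backup_blocks):
    """Sum the sizes of the first and second data blocks of the target backup file.

    Both arguments map block identifier -> (size in bytes, owner sequence number).
    """
    total = 0
    for block_id, (size, owner) in target_blocks.items():
        is_first = owner == target_seq
        prev = first_backup_blocks.get(block_id)
        is_second = not is_first and (prev is None or prev[1] != owner)
        if is_first or is_second:
            total += size
    return total


# Hypothetical case: target is backup 4, backup 3 was deleted, first backup file is backup 2.
target_blocks = {
    "Block1": (4096, 2), "Block3": (4096, 3),
    "Block4": (8192, 4), "Block5": (2048, 3),
}
first_backup_blocks = {
    "Block1": (4096, 2), "Block3": (4096, 1),
    "Block4": (8192, 2), "Block5": (2048, 1),
}

print(backup_file_size(4, target_blocks, first_backup_blocks))  # 14336
```

Block4 is a first data block and Block3 and Block5 are second data blocks, giving 8192 + 4096 + 2048 bytes; Block1 is shared with the first backup file and is not counted.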
The reference information and the size of each data block contained in a backup file, which are recorded in the metadata of the backup file, may specifically be recorded in the metadata record file of the backup file. In addition to the existing fields shown in FIG. 2, namely the identifier of each data block (Block) and the storage path of the data block (File), the metadata record file also records the size of the data block (size), the identifier of the backup file to which the data block belongs, and the backup sequence number of that backup file; see the metadata record file shown in FIG. 5. FIG. 5 only illustrates which information is recorded in the metadata record file of a backup file; the actual record format may be the one shown in FIG. 5 or another format set by the backup server. In FIG. 5, the third column is the backup sequence number of the backup file to which the data block belongs, which can be understood as an ordered identifier (the identifier and the backup sequence number of the backup file may of course also be recorded separately), and the fourth column is the size of the data block. Taking the third backup as an example, the metadata record file of the backup file generated by the third backup is backup 3 in FIG. 5. The number "2" corresponding to data block 1 (Block1) in backup 3 indicates that data block 1 belongs to the backup file generated by the second backup, and the number "3" corresponding to data block 5 (Block5) indicates that data block 5 belongs to the backup file generated by the third backup.
在该实施例中,目标备份文件不是第一次备份生成的备份文件,则目标备份文件中包含的多个数据块中可能存在不归属于该目标备份文件的数据块。在确定目标备份文件的大小时,可以采用将归属于该目标备份文件的数据块的大小相加的方式计算目标备份文件的大小,也可以采用将目标备份文件中包含的多个数据块的总大小减去不归属于该目标备份文件的每个数据块的大小的方式,计算目标备份文件的大小。In this embodiment, the target backup file is not the backup file generated by the first backup, so there may be data blocks that do not belong to the target backup file among the multiple data blocks included in the target backup file. When determining the size of the target backup file, the size of the target backup file can be calculated by adding the sizes of the data blocks belonging to the target backup file, or the total of multiple data blocks contained in the target backup file can be used. The size of the target backup file is calculated by subtracting the size of each data block that does not belong to the target backup file from the size.
In another embodiment of the present application, to calculate the size of a backup file even faster, the total size of the data blocks included in the backup file, and the total size of the data blocks included in the target backup file that do not belong to the target backup file, may be saved in advance. Both values are derived from the backup sequence number of the backup file, the reference information of each data block, and the size of each data block recorded in the metadata of the backup file. To distinguish the two, the total size of the data blocks included in the backup file is called the first total size, and the total size of the data blocks included in the target backup file that do not belong to the target backup file is called the second total size.
The backup server receives a query request that contains the identifier of the target backup file. The backup server determines the target backup file according to the identifier and obtains the saved first total size of the data blocks included in the target backup file, where the target backup file is a backup file of a source file. The backup server then obtains the saved second total size of the data blocks included in the target backup file that do not belong to the target backup file. Finally, it calculates the size of the target backup file from the first total size and the second total size.
Specifically, the difference between the first total size and the second total size is determined as the size of the target backup file.
In this embodiment of the present invention, because the first total size of the target backup file and the second total size of the third data blocks, which do not belong to the target backup file, are saved in advance, the size of the target backup file can be calculated quickly by subtraction.
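A minimal sketch of this save-in-advance approach follows. The dictionary `saved_totals`, the function names, and the identifier string are hypothetical; the application does not prescribe a storage format at this point.

```python
from typing import Dict, List, Tuple

# Hypothetical store: backup file identifier -> (first total size, second total size)
saved_totals: Dict[str, Tuple[int, int]] = {}


def save_totals(backup_id: str, target_seq: int,
                blocks: List[Tuple[int, int]]) -> None:
    """Save both totals in advance; blocks is a list of (owner_seq, size) pairs."""
    first_total = sum(size for _, size in blocks)
    second_total = sum(size for owner, size in blocks if owner != target_seq)
    saved_totals[backup_id] = (first_total, second_total)


def handle_query(backup_id: str) -> int:
    """Answer a size query using only the saved totals (one subtraction)."""
    first_total, second_total = saved_totals[backup_id]
    return first_total - second_total


# Hypothetical backup with sequence number 3 that contains three data blocks.
save_totals("backup-3", 3, [(2, 10), (3, 20), (1, 30)])
```

At query time no per-block metadata is read, which is where the speed-up described above comes from.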
For the target backup file, the total size of the data blocks included in the first backup file that the target backup file references is the total size of the data blocks included in the target backup file that do not belong to the target backup file. To record the latter, the total size of the data blocks included in the first backup file that the target backup file references may be recorded. Alternatively, for each backup file, the total size of the data blocks it references may be recorded separately for every backup file created before it.
Information such as the total size of the data blocks included in a backup file, and the total size of the data blocks that the backup file references in each backup file created before it, may be saved in a backup chain reference relationship record file. As shown in Table 1, the information recorded in this file includes: the identifier (ID) of the backup file, for example Backup 1, Backup 2, ..., Backup n; the backup sequence number (Snaplndex) of the backup file, for example 1, 2, ..., n; the total size (Totasize) of the data blocks included in the backup file; and the total size (Reference) of the data blocks that the backup file references in each backup file generated before it. The size of a backup file can subsequently be calculated from the information recorded in the backup chain reference relationship record file.
ID          Snaplndex   Totasize   Reference
Backup 1    1           T1         -
Backup 2    2           T2         R(2,1)
Backup 3    3           T3         R(3,1); R(3,2)
Backup 4    4           T4         R(4,1); R(4,2); R(4,3)
...         ...         ...        ...
Backup n    n           Tn         R(n,1); R(n,2); R(n,3); ...; R(n,n-1)
Table 1
Take the row identified as Backup 3 in Table 1 as an example: T3 is the total size of the data blocks included in the backup file with backup sequence number 3; R(3,1) is the total size of the data blocks included in the backup file with backup sequence number 1 that the backup file with backup sequence number 3 references; and R(3,2) is the total size of the data blocks included in the backup file with backup sequence number 2 that the backup file with backup sequence number 3 references.
Based on the backup sequence number of the backup file, the reference information of each data block, and the size of each data block recorded in the metadata of the backup file, the backup server may record in advance, in the backup chain reference relationship record file, the information needed to calculate the size of the backup file. The backup server generally records this information in real time as backup files are generated: once a backup file has been generated, the information to be recorded in the backup chain reference relationship record file is determined from the information recorded in the metadata of that backup file. While recording information in the backup chain reference relationship record file, the possibility that a backup file has been deleted need not be considered.
The backup server saves the first total size of the data blocks included in the target backup file in advance as follows: the backup server determines the target backup file, queries the metadata of the target backup file for the size of each data block included in the target backup file, determines the sum of these sizes as the first total size, and saves the first total size.
The backup server saves the second total size of the data blocks that do not belong to the target backup file in advance as follows: the backup server determines the target backup file and, according to the reference information of each data block included in the target backup file, as recorded in the metadata of the target backup file, obtains the third data blocks, that is, the data blocks included in the target backup file that do not belong to it. The reference information of each data block indicates the backup file to which that data block belongs. According to the sizes of the third data blocks recorded in the metadata of the target backup file, the backup server determines the second total size, which is the sum of the sizes of the third data blocks.
For the backup server to calculate the size of the target backup file, the backup chain reference relationship record file must record at least the first total size of the data blocks included in the target backup file, and the total size of the data blocks included in the first backup file that the target backup file references, that is, the second total size of the data blocks included in the target backup file that do not belong to the target backup file.
The process of recording, in the backup chain reference relationship record file, the total size of the data blocks included in the target backup file, based on the reference information and the size of each data block recorded in the metadata of the target backup file, includes:
The backup server determines the target backup file and determines the sum of the sizes of the data blocks recorded in the metadata of the target backup file as the total size of the data blocks included in the target backup file, which is recorded in the backup chain reference relationship record file.
The process of recording, in the backup chain reference relationship record file, the second total size of the data blocks included in the first backup file that the target backup file references, based on the reference information and the size of each data block recorded in the metadata of the target backup file, includes:
The backup server determines the target backup file and, according to the reference information of each data block included in the target backup file, as recorded in the metadata of the target backup file, obtains the third data blocks, that is, the data blocks included in the target backup file that do not belong to it. The reference information of each data block indicates the backup file to which that data block belongs. According to the sizes of the third data blocks recorded in the metadata of the target backup file, the backup server determines the second total size of the data blocks included in the target backup file that do not belong to it, which is the sum of the sizes of the third data blocks.
The third data blocks, which are included in the target backup file but do not belong to it, may be obtained from the reference information in either of two ways. The first uses the backup file identifier in the reference information: the backup server reads, from the metadata of the target backup file, the identifier of the backup file to which each data block belongs, and determines the data blocks whose identifier is not that of the target backup file as the third data blocks. The second uses the backup sequence number in the reference information: the backup server reads, from the metadata of the target backup file, the backup sequence number of the backup file to which each data block belongs, and determines the data blocks whose backup sequence number is not that of the target backup file as the third data blocks, that is, the data blocks included in the first backup file that the target backup file references.
With reference to the information contained in the metadata record file of the backup file identified as Backup 3 in FIG. 5, the total size, recorded in the backup chain reference relationship record file, of the data blocks included in the backup file with backup sequence number 1 that the backup file with backup sequence number 3 references, namely R(3,1), is the size of data block 6 (Block 6), namely size6.
Data block 6 (Block 6), which is included in the backup file with backup sequence number 2, is itself a reference to a data block included in the backup file with backup sequence number 1. The total size, recorded in the backup chain reference relationship record file, of the data blocks included in the backup file with backup sequence number 2 that the backup file with backup sequence number 3 references, namely R(3,2), is therefore the sum of the sizes of data block 1, data block 2, data block 4, and data block 6, that is, size1' + size2' + size4' + size6.
Where no backup file has been deleted and backup file b was generated before backup file a, the total size of the data blocks included in backup file b that backup file a references may be determined as follows. The backup server reads, from the metadata of backup file a, the backup sequence number of the backup file to which each data block included in backup file a belongs, and reads the backup sequence number of backup file b from the metadata of backup file b. Among the sequence numbers read for the data blocks of backup file a, it identifies the specific backup sequence numbers, namely those less than or equal to the backup sequence number of backup file b. The data blocks corresponding to the specific backup sequence numbers (the specific data blocks) are determined as the data blocks included in backup file b that backup file a references. The size of each specific data block is taken from the data block sizes recorded for backup file a, and the sum of the sizes of the specific data blocks is determined as the total size of the data blocks included in backup file b that backup file a references.
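The rule for R(a,b) can be sketched in Python as below. The function name and the (owner sequence number, size) pair layout are assumptions; the block sizes are invented, and only the ownership pattern (blocks 1, 2, and 4 owned by backup 2, block 5 by backup 3, block 6 by backup 1) is modeled on the R(3,1) and R(3,2) example above.

```python
from typing import List, Tuple


def referenced_size(blocks_of_a: List[Tuple[int, int]], seq_b: int) -> int:
    """Return R(a, b): total size of the blocks of backup a whose owner
    sequence number is less than or equal to the sequence number of backup b.

    blocks_of_a holds one (owner_seq, size) pair per data block of backup a.
    """
    return sum(size for owner_seq, size in blocks_of_a if owner_seq <= seq_b)


# Hypothetical metadata of the backup file with backup sequence number 3.
blocks_of_backup_3 = [
    (2, 10),  # data block 1, owned by backup 2
    (2, 20),  # data block 2, owned by backup 2
    (2, 40),  # data block 4, owned by backup 2
    (3, 50),  # data block 5, owned by backup 3 itself
    (1, 60),  # data block 6, owned by backup 1
]

r_3_1 = referenced_size(blocks_of_backup_3, 1)  # only data block 6
r_3_2 = referenced_size(blocks_of_backup_3, 2)  # data blocks 1, 2, 4 and 6
```

Because the comparison is "less than or equal to", R(a,b) also counts blocks that backup b itself only references from earlier backups, which is why data block 6 contributes to R(3,2).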
When the size of the target backup file is queried, it can be calculated from the information in the backup chain reference relationship record file by subtracting the size of the data blocks that do not belong to the target backup file from the total size (Totasize) of the data blocks included in the target backup file. The size of the data blocks that do not belong to the target backup file is the size of the data blocks included in the first backup file that the target backup file references. If the total size of the data blocks included in the target backup file is Ta, and the size of the data blocks of the first backup file referenced by the target backup file is R(a,p), the size of the target backup file is Ta - R(a,p), where a is the target backup sequence number of the target backup file and p is the first backup sequence number of the first backup file.
Suppose the target backup file whose size is to be calculated is the backup file identified as Backup 4. If no backup file has been deleted, the backup sequence numbers in the backup chain reference relationship record file show that the first backup file is the backup file identified as Backup 3. The total size of the data blocks included in the target backup file is T4, the total size of the data blocks included in the target backup file that do not belong to it is R(4,3), and the size of the target backup file is T4 - R(4,3).
If the backup file identified as Backup 3 has been deleted, the backup chain reference relationship record file no longer contains the information for Backup 3, and the backup sequence numbers in that file show that the first backup file is the backup file identified as Backup 2. The total size of the data blocks included in the target backup file is T4, the total size of the data blocks included in the target backup file that do not belong to it is R(4,2), and the size of the target backup file is T4 - R(4,2).
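The lookup described in the two cases above can be sketched as follows. `ChainEntry`, `backup_size`, and all numeric values are hypothetical; the sketch assumes the backup chain reference relationship record file has been loaded into a dictionary keyed by backup sequence number, and that deleting a backup file removes its row from that dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ChainEntry:
    """One row of the record file: Totasize plus the Reference column."""
    total_size: int                                            # T(a)
    references: Dict[int, int] = field(default_factory=dict)   # p -> R(a, p)


def backup_size(chain: Dict[int, ChainEntry], target_seq: int) -> int:
    entry = chain[target_seq]
    # Sequence numbers of earlier backups that still exist in the chain.
    earlier = [seq for seq in chain if seq < target_seq]
    if not earlier:
        # The target is the earliest remaining backup: every block counts.
        return entry.total_size
    first_seq = max(earlier)  # nearest existing predecessor (first backup file)
    return entry.total_size - entry.references[first_seq]


# Hypothetical record file contents for Backup 1 to Backup 4.
chain = {
    1: ChainEntry(100),
    2: ChainEntry(120, {1: 80}),
    3: ChainEntry(150, {1: 40, 2: 110}),
    4: ChainEntry(160, {1: 30, 2: 90, 3: 140}),
}

size_before_delete = backup_size(chain, 4)   # T4 - R(4,3)
del chain[3]                                 # Backup 3 is deleted
size_after_delete = backup_size(chain, 4)    # T4 - R(4,2)
```

Keeping R(a,p) for every earlier p, rather than only for the immediate predecessor, is what allows the size to be recomputed after a deletion without rereading block metadata.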
FIG. 6 is a flowchart of calculating the size of a backup file. The user sends a query request to the backup server to query the size of the target backup file. Based on the total size of the data blocks included in the target backup file and the size of the data blocks referenced from the first backup file, both recorded in the previously saved backup chain reference relationship record file, the backup server calculates the size of the target backup file and returns it to the user.
Whether the size of the target backup file is determined by adding up the sizes of the data blocks that belong to the target backup file, or by subtracting the total size of the data blocks that do not belong to the target backup file from the total size of the data blocks included in it, the reference information and the size of each data block used by the backup server must be recorded in the metadata in advance. Another embodiment of the present application therefore further includes:
Recording, in the metadata of the target backup file, the reference information and the size of each data block included in the target backup file, and recording, in the metadata of the first backup file, the reference information of each data block included in the first backup file.
Generally, a backup file is generated when a backup is performed, and the reference information of each data block included in the backup file is recorded in the metadata block of that backup file.
In this embodiment of the present application, to prevent data loss caused by a service failure, a tenant may send a backup request to the backup server. Alternatively, another system may send the backup request to the backup server whenever the backup period of a data block group elapses. The backup server backs up the data blocks in the target data block group according to the identifier of the target data block group, generates the target backup file, and records in the metadata of the target backup file the identifier of the target backup file, its backup sequence number, the reference information of each data block included in the target backup file, and the size of each data block.
FIG. 7 illustrates in detail how the backup server performs a backup and records the identifier of the backup file, the backup sequence number, the reference information of the data blocks, and the sizes of the data blocks:
On receiving a backup request from the user, the backup server creates a backup task and generates an identifier for the backup file, referred to as the target identifier. The backup server creates a new metadata record file according to the target identifier and determines the target backup sequence number from the largest backup sequence number currently saved on the backup server, generally by adding 1 to that number. It then saves the target identifier and the target backup sequence number in the new metadata record file.
The backup server reads data from the production storage server in a loop, obtains each data block in the target data block group that is to be backed up, and writes these data blocks to the backup storage server to back them up. It identifies the size of each backed-up data block, records that size in the new metadata record file, and records, for each backed-up data block, the target identifier of the backup file to which it belongs;
The backup server identifies each data block in the target data block group that is not backed up and, for these data blocks, identifies the other identifier, that is, the identifier of the backup file with the largest backup sequence number, and finds the corresponding metadata record file according to the other identifier;
From the metadata record file corresponding to the other identifier, the backup server reads the size of each data block that is not backed up in this backup, together with the identifier, that is, the backup sequence number, of the backup file to which each such data block belongs, and records them in the new metadata record file.
The backed-up data blocks and the data blocks not backed up, as described above, are the data blocks included in the backup file with the target identifier; the backed-up data blocks belong to that backup file. At this point the backup task can be regarded as complete, and the user can be notified that the backup is complete.
However, so that the size of the backup file can later be calculated quickly, the backup server may additionally use the backup sequence number, the size of each data block, and the identifier (backup sequence number) of the backup file to which each data block belongs, as recorded in the new metadata record file, to calculate two values: the total size of the data blocks included in the backup file generated by this backup, and, for each backup file generated before it, the total size of the data blocks of that earlier backup file that the new backup file references. Both values are recorded in the backup chain reference relationship record file. With this, the backup task is complete.
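The recording steps of FIG. 7 can be sketched as follows. All names, the dictionary layout (block identifier mapped to an owner sequence number and a size), and the sample values are assumptions for illustration; reading from the production storage server and writing to the backup storage server are omitted.

```python
from typing import Dict, Iterable, List, Tuple

Metadata = Dict[int, Tuple[int, int]]  # block_id -> (owner_seq, size)


def next_sequence_number(existing: Iterable[int]) -> int:
    # Target backup sequence number = largest existing sequence number + 1.
    return max(existing, default=0) + 1


def build_metadata(new_seq: int, backed_up: Dict[int, int],
                   not_backed_up: List[int], previous: Metadata) -> Metadata:
    """backed_up maps block_id -> size for blocks written in this backup;
    not_backed_up lists blocks inherited from the previous backup."""
    meta: Metadata = {bid: (new_seq, size) for bid, size in backed_up.items()}
    for bid in not_backed_up:
        meta[bid] = previous[bid]  # copy owner sequence number and size
    return meta


def chain_row(new_seq: int, meta: Metadata,
              existing: Iterable[int]) -> Tuple[int, Dict[int, int]]:
    """Compute Totasize and the Reference column for the new backup."""
    total = sum(size for _, size in meta.values())
    refs = {p: sum(size for owner, size in meta.values() if owner <= p)
            for p in existing if p < new_seq}
    return total, refs


# Hypothetical state: backups 1 and 2 exist; this is the metadata of backup 2.
previous_meta: Metadata = {1: (2, 10), 2: (1, 20)}
new_seq = next_sequence_number([1, 2])
new_meta = build_metadata(new_seq, {1: 15, 3: 5}, [2], previous_meta)
total_size, reference_column = chain_row(new_seq, new_meta, [1, 2])
```

In this sample, block 2 was not rewritten, so the new metadata keeps backup 1 as its owner and both R(3,1) and R(3,2) equal its size.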
In another embodiment of the present application, if the target backup file whose size is to be calculated is the backup file generated by the first backup, its size can be obtained directly by adding up the sizes of all the data blocks it includes.
On receiving the query request, the backup server may determine whether the target backup file is the backup file generated by the first backup. Specifically, based on the previously saved backup sequence number of each backup file, it checks whether the backup sequence number of the target backup file is the smallest backup sequence number in the backup chain to which the target backup file belongs. If so, the target backup file is the backup file generated by the first backup; otherwise, it is not.
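A short sketch of this check, with hypothetical function names and sample values, is:

```python
from typing import Iterable, List


def is_first_backup(target_seq: int, chain_seqs: Iterable[int]) -> bool:
    # The target is the first backup if its sequence number is the smallest
    # among the backups in its backup chain.
    return target_seq == min(chain_seqs)


def first_backup_size(block_sizes: List[int]) -> int:
    # For the first backup, every included data block counts toward the size.
    return sum(block_sizes)
```

For example, `is_first_backup(1, [1, 2, 3])` holds, and `first_backup_size([10, 20, 30])` adds the three block sizes directly.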
The identifier of each backup file, its backup sequence number, and the identifier and backup sequence number of the backup file to which each included data block belongs may be stored in the local database of the backup server shown in FIG. 3.
The storage space that a tenant rents on cloud resources is limited. When the rented space runs short, the tenant can delete some backup files by sending a delete request to the backup server. In the prior art, the backup server can calculate the size of the deleted data only after it has deleted the data in the backup file. The whole deletion process takes a long time, so the size of the deleted data cannot be reported to the tenant promptly. Moreover, once the tenant has sent the delete request, the deletable data in the backup file should no longer be billed; continuing to bill for that data during the lengthy deletion process is unreasonable.
For this reason, an embodiment of the present application provides a method for calculating the size of the data deleted from a backup file. The method quickly calculates the size of the data that can be deleted from the backup file, without waiting for the deletion task to finish. The method includes:
First, the backup server determines the target backup file. The target backup file includes multiple data blocks and is a backup file of a source file. The source file also has a first backup file and a third backup file. The first backup file is the backup file created before the target backup file whose creation time is closest to that of the target backup file. The third backup file is the backup file created after the target backup file whose creation time is closest to that of the target backup file. Neither the first backup file nor the third backup file has been deleted.
Second, according to the reference information of each data block included in the target backup file, as recorded in the metadata of the target backup file, the backup server obtains the first data blocks, which belong to the target backup file.
The process by which the backup server determines the first data blocks belonging to the target backup file has been described in the above embodiment for calculating the size of a backup file and is not repeated here.
Next, the backup server determines the first backup file. According to the reference information of each data block included in the target backup file, as recorded in the metadata of the target backup file, and the reference information of each data block included in the first backup file, as recorded in the metadata of the first backup file, it obtains the second data blocks, which belong to the target backup file. The reference information of each data block indicates the backup file to which that data block belongs.
The process by which the backup server determines the first backup file has been described in the above embodiment for calculating the size of a backup file and is not repeated here.
To determine the second data blocks belonging to the target backup file, the backup server may use the reference information of the data blocks recorded in the metadata of the target backup file and in the metadata of the first backup file to obtain the data blocks that belong to a second backup file. The second backup file was created before the target backup file and after the first backup file, and has been deleted. The data blocks that belong to the second backup file are then determined as the second data blocks belonging to the target backup file. The specific process has been described in the above embodiment for calculating the size of a backup file and is not repeated here.
然后,备份服务器确定第三目标备份文件,根据所述第三备份文件的元数据中记录第三备份文件中包含的每个数据块的引用信息,以获得归属于所述目标备份文件的第一数据块和第二数据块中被第三备份文件引用的第三数据块;Then, the backup server determines a third target backup file, and according to the metadata of the third backup file, records reference information of each data block included in the third backup file to obtain a first attribute belonging to the target backup file. A third data block referenced by a third backup file in the data block and the second data block;
再然后,根据所述第一数据块、第二数据块,以及所述第三数据块,确定能够被删除的第四数据块,具体为在确定所述目标备份文件中能够删除的第四数据块时,可以是将第一数据块和第二数据块中除第三数据块外的数据块确定为能够被删除的第四数据块。Then, a fourth data block that can be deleted is determined according to the first data block, the second data block, and the third data block, specifically, the fourth data that can be deleted in determining the target backup file In the block, the data block other than the third data block among the first data block and the second data block may be determined as a fourth data block that can be deleted.
最后,根据目标备份文件的元数据中记录的每个第四数据块的大小,确定所述目标备份文件中能够被删除的数据的大小。具体为将第四数据块的大小之和,确定为所述目标备份文件中能够被删除的数据的大小。Finally, the size of the data that can be deleted in the target backup file is determined according to the size of each fourth data block recorded in the metadata of the target backup file. Specifically, the sum of the sizes of the fourth data blocks is determined as the size of data that can be deleted in the target backup file.
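The block-level steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that each metadata record maps a block index to the backup sequence number of the backup file the block belongs to plus the block size, and that a block belongs to a deleted second backup file when its owner's sequence number lies strictly between those of the first backup file and the target backup file. The names `BlockRef`, `Metadata`, and `deletable_size` are hypothetical.

```python
from typing import Dict, NamedTuple


class BlockRef(NamedTuple):
    owner: int  # assumed: backup sequence number of the backup file the block belongs to
    size: int   # block size in bytes


# Assumed metadata layout: block index -> BlockRef
Metadata = Dict[int, BlockRef]


def deletable_size(target_seq: int, first_seq: int,
                   target_meta: Metadata, third_meta: Metadata) -> int:
    """Sum the sizes of the fourth data blocks of the target backup file."""
    total = 0
    for index, ref in target_meta.items():
        is_first = ref.owner == target_seq             # first data block
        is_second = first_seq < ref.owner < target_seq  # second data block (owner deleted)
        if not (is_first or is_second):
            continue  # belongs to the first backup file or an earlier one
        other = third_meta.get(index)
        if other is not None and other.owner == ref.owner:
            continue  # third data block: still referenced by the third backup file
        total += ref.size  # fourth data block: can be deleted
    return total


# Hypothetical chain: backup 3 was deleted, so the first backup file is backup 2.
target_meta = {0: BlockRef(2, 10), 1: BlockRef(3, 20),
               2: BlockRef(4, 30), 3: BlockRef(4, 40)}
third_meta = {0: BlockRef(2, 10), 1: BlockRef(3, 20),
              2: BlockRef(5, 30), 3: BlockRef(4, 40)}
print(deletable_size(4, 2, target_meta, third_meta))
```

In this example only block 2 is deletable: block 0 belongs to backup 2, and blocks 1 and 3 are still referenced by backup 5.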
In this embodiment of the present application, a tenant who wants to delete a backup file may send a delete request to the backup server in order to delete the data in the backup file that can be deleted. The delete request contains the identifier of the backup file to be deleted, which is referred to as the target backup file. After receiving the delete request, the backup server can determine the target backup file from the identifier contained in the request.
The backup server may determine the third backup file as follows:
From the recorded backup sequence number of each backup file, the backup server determines the backup sequence number of the target backup file, referred to as the target backup sequence number. The backup server then determines the third backup sequence number, which is adjacent to and greater than the target backup sequence number;
and, from the recorded backup sequence number of each backup file, determines the third backup file corresponding to the third backup sequence number.
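The sequence-number lookup just described can be illustrated with a short sketch. It assumes the backup server holds the sequence numbers of all backup files that still exist in the chain (deleted backups are absent); the function name `neighbours` is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def neighbours(existing_seqs: Iterable[int],
               target_seq: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (first, third): the nearest surviving sequence numbers
    below and above the target backup sequence number, or None if absent."""
    seqs = list(existing_seqs)
    earlier = [s for s in seqs if s < target_seq]
    later = [s for s in seqs if s > target_seq]
    first = max(earlier) if earlier else None
    third = min(later) if later else None
    return first, third


# Backup 3 has been deleted, so backup 2 is the first backup file for target 4.
print(neighbours([1, 2, 4, 5, 6], 4))
```

A `None` in either position corresponds to the target backup file being the first or the last backup in the chain.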
When the backup server uses the reference information of each data block contained in the third backup file, as recorded in the metadata of the third backup file, to obtain the third data blocks (those of the first data blocks and second data blocks belonging to the target backup file that are referenced by the third backup file), it may proceed as follows. For each first data block and each second data block in the target backup file, the backup server identifies the corresponding data block in the third backup file and checks whether the reference information of the data block is the same as the reference information of that corresponding data block. If it is the same, the data block is determined to be a third data block referenced by the third backup file, and that third data block cannot be deleted.
In this embodiment of the present invention, because the metadata of the target backup file, of the first backup file, and of the third backup file each record the reference information of every data block contained in the corresponding backup file, the data blocks in the target backup file that can be deleted can be identified quickly, and the size of the data that can be deleted from the target backup file can be computed quickly by summing the recorded sizes of those data blocks.
In another embodiment of the present application, the backup server may store in advance four values: the first total size, of the multiple data blocks contained in the target backup file; the second total size, of the data blocks in the target backup file that reference data blocks contained in the first backup file; the third total size, of the data blocks in the third backup file that reference data blocks contained in the target backup file; and the fourth total size, of the data blocks in the third backup file that reference data blocks contained in the first backup file.
Here, the data blocks in the target backup file that reference data blocks contained in the first backup file can be understood as the data blocks contained in the target backup file that do not belong to the target backup file.
The data blocks in the third backup file that reference data blocks contained in the target backup file can be understood as the data blocks contained in the third backup file that do not belong to the third backup file.
The data blocks in the third backup file that reference data blocks contained in the first backup file can be understood as the data blocks contained in the third backup file that belong neither to the third backup file nor to the target backup file.
The third total size minus the fourth total size is the total size of the data blocks contained in the third backup file that belong to the target backup file. These data blocks belong to the target backup file but are referenced by the third backup file, so they are not deleted.
To calculate the size of the data that can be deleted from the target backup file, the backup server may first calculate the total size of the data blocks belonging to the target backup file, which is the first difference: the first total size minus the second total size. It then calculates the total size of those data blocks that are referenced by the third backup file, which is the second difference: the third total size minus the fourth total size. Data blocks that belong to the target backup file but are referenced by the third backup file are not deleted. Finally, it calculates the total size of the data blocks belonging to the target backup file other than those referenced by the third backup file, which is the first difference minus the second difference. In other words, the first total size, minus the second total size, minus the third total size, plus the fourth total size is the total size of the data that can be deleted from the target backup file.
As shown in Table 1, the backup chain reference relationship record file may record, for each backup file, the total size of the data blocks it contains and the total size of the data blocks it references in each backup file generated before it, among other information.
Assume that the total size of the multiple data blocks contained in the target backup file is Ta, the size of the data blocks in the target backup file that reference the first backup file is R(a,p), the size of the data blocks in the third backup file that reference the target backup file is R(b,a), and the size of the data blocks in the third backup file that reference the first backup file is R(b,p). The size of the data deleted from the target backup file is then Ta-R(a,p)-[R(b,a)-R(b,p)], where a is the target backup sequence number of the target backup file, p is the first backup sequence number of the first backup file, and b is the third backup sequence number of the third backup file.
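Written out with the bracket expanded, the expression makes the sign of each term explicit. The symbol $D_a$ for the deletable size is introduced here only for illustration and does not appear in the original text:

```latex
\[
D_a \;=\; T_a - R(a,p) - \bigl[\,R(b,a) - R(b,p)\,\bigr]
    \;=\; T_a - R(a,p) - R(b,a) + R(b,p)
\]
```

The term $R(b,p)$ is added back because $R(b,a)$ counts every data block in the third backup file that does not belong to the third backup file, including the blocks that belong to the first backup file or earlier backups rather than to the target backup file.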
Suppose the target backup file to be deleted is the backup file identified as backup 4. If no backup file has been deleted, a lookup in the backup chain reference relationship record file by backup sequence number finds that the first backup file is the backup file identified as backup 3 and the third backup file is the backup file identified as backup 5. The total size of the multiple data blocks contained in the target backup file is T4, the total size of the data blocks contained in the target backup file that do not belong to the target backup file is R(4,3), and the total size of the data blocks contained in the third backup file that belong to the target backup file is R(5,4)-R(5,3). The size of the data deleted from the target backup file is T4-R(4,3)-[R(5,4)-R(5,3)].
If the backup file identified as backup 3 has been deleted, the backup chain reference relationship record file contains no information about backup 3. A lookup in the record file by backup sequence number then finds that the first backup file is the backup file identified as backup 2 and the third backup file is the backup file identified as backup 5. The total size of the multiple data blocks contained in the target backup file is T4, the total size of the data blocks contained in the target backup file that do not belong to the target backup file is R(4,2), and the total size of the data blocks contained in the third backup file that belong to the target backup file is R(5,4)-R(5,2). The size of the data deleted from the target backup file is T4-R(4,2)-[R(5,4)-R(5,2)].
If the backup files identified as backup 3 and backup 5 have both been deleted, the backup chain reference relationship record file contains no information about backup 3 or backup 5. A lookup in the record file by backup sequence number then finds that the first backup file is the backup file identified as backup 2 and the third backup file is the backup file identified as backup 6. The total size of the multiple data blocks contained in the target backup file is T4, the total size of the data blocks contained in the target backup file that do not belong to the target backup file is R(4,2), and the total size of the data blocks contained in the third backup file that belong to the target backup file is R(6,4)-R(6,2). The size of the data deleted from the target backup file is T4-R(4,2)-[R(6,4)-R(6,2)].
In another embodiment of the present application, if the target backup file to be deleted is the backup file generated by the first backup, the size of the data deleted from the target backup file can be calculated directly as the total size of the multiple data blocks contained in the target backup file minus the total size of the data blocks in the third backup file that reference the target backup file.
In another embodiment of the present application, if the target backup file to be deleted is the backup file generated by the last backup, the size of the data deleted from the target backup file can be calculated directly as the total size of the multiple data blocks contained in the target backup file minus the total size of the data blocks in the target backup file that reference the first backup file.
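The calculation from pre-saved totals, including the first-backup and last-backup cases, can be sketched as below. The dictionaries `total` and `ref` stand in for the backup chain reference relationship record file; their layout, the function name, and all numbers are assumptions made for illustration. `ref[(x, y)]` plays the role of R(x,y).

```python
from typing import Dict, List, Tuple


def deletable_size_from_totals(target: int, seqs: List[int],
                               total: Dict[int, int],
                               ref: Dict[Tuple[int, int], int]) -> int:
    """Compute T_a - R(a,p) - [R(b,a) - R(b,p)], dropping the terms
    that do not exist when the target is the first or last backup."""
    earlier = [s for s in seqs if s < target]
    later = [s for s in seqs if s > target]
    p = max(earlier) if earlier else None  # first backup file
    b = min(later) if later else None      # third backup file
    size = total[target]
    if p is not None:
        size -= ref[(target, p)]           # blocks not belonging to the target
    if b is not None:
        size -= ref[(b, target)]           # blocks in b not belonging to b
        if p is not None:
            size += ref[(b, p)]            # ...of which these are not the target's
    return size


# Hypothetical record: backup 3 has been deleted from the chain.
seqs = [1, 2, 4, 5]
total = {1: 100, 2: 120, 4: 150, 5: 160}
ref = {(2, 1): 80, (4, 2): 90, (5, 4): 110, (5, 2): 70}

print(deletable_size_from_totals(4, seqs, total, ref))  # general case
print(deletable_size_from_totals(1, seqs, total, ref))  # first backup
print(deletable_size_from_totals(5, seqs, total, ref))  # last backup
```

For target 4 the result is 150 - 90 - (110 - 70) = 20. When the target is the only backup in the chain, the sketch returns the full total; that case is not discussed in the text and is an assumption.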
On receiving a delete request, the backup server may determine whether the target backup file to be deleted is the backup file generated by the first backup and whether it is the backup file generated by the last backup. The process of determining whether it is the backup file generated by the first backup is described in the above embodiment for calculating the size of a backup file and is not repeated here.
To determine whether the target backup file is the backup file generated by the last backup, the backup server may check, using the pre-saved backup sequence number of each backup file, whether the backup sequence number of the target backup file to be deleted is the largest backup sequence number in the backup chain to which the target backup file belongs. If it is, the target backup file to be deleted is the backup file generated by the last backup; otherwise, it is not.
FIG. 8 is a flowchart of deleting the target backup file. The user sends a delete request to the backup server in order to delete the target backup file. The backup server calculates the size of the data that can be deleted from the target backup file from four values recorded in the pre-saved backup chain reference relationship record file: the total size of the multiple data blocks contained in the target backup file, the total size of the data blocks in the target backup file that reference the first backup file, the total size of the data blocks in the third backup file that reference the target backup file, and the total size of the data blocks in the third backup file that reference the first backup file. It then reports this size back to the user. The backup server may also delete the deletable data of the target backup file stored on the production storage server. After deleting that data, the backup server may also update the backup chain reference relationship record file by deleting the information about the target backup file recorded in it, namely the identifier of the target backup file, its backup sequence number, the total size of the multiple data blocks contained in the target backup file, and the total size of the data blocks in any backup file that reference the target backup file.
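The record-file update at the end of this flow might look like the following sketch. The record layout (`ids`, `total`, `ref`) and the function name are hypothetical. Dropping the target's own R(target, y) entries along with the listed R(x, target) entries is an assumption, made because the target no longer has a row in the record file.

```python
from typing import Any, Dict


def remove_backup(record: Dict[str, Dict[Any, Any]], target: int) -> Dict[str, Dict[Any, Any]]:
    """Return a copy of the record with all information about `target` removed."""
    return {
        # identifier keyed by backup sequence number
        "ids": {seq: name for seq, name in record["ids"].items() if seq != target},
        # total size of the data blocks contained in each backup file
        "total": {seq: size for seq, size in record["total"].items() if seq != target},
        # R(x, y) entries; drop every pair that mentions the target
        "ref": {pair: size for pair, size in record["ref"].items() if target not in pair},
    }


record = {
    "ids": {2: "backup 2", 4: "backup 4", 5: "backup 5"},
    "total": {2: 120, 4: 150, 5: 160},
    "ref": {(4, 2): 90, (5, 4): 110, (5, 2): 70},
}
updated = remove_backup(record, 4)
print(updated["ref"])
```

After the update, backup 5 is adjacent to backup 2 in the chain, and only R(5,2) remains.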
Based on the above concept, as shown in FIG. 9, the present application provides a device 900 for calculating the size of a backup file. The device may include a processing unit 901, a transceiver unit 902, and a storage unit 903. The device 900 for calculating the size of a backup file may be applied to a backup server or to a chip in a backup server.
The storage unit 903 may be configured to store backup files.
The processing unit 901 may be configured to determine the target backup file whose size is to be calculated and to determine the first backup file, the first backup file being the backup file created before the target backup file whose creation time is closest to that of the target backup file. The metadata of a backup file records the backup sequence number of the backup file, the reference information of its data blocks, and the sizes of its data blocks; the reference information of a data block indicates the backup file to which the data block belongs. The processing unit 901 may query the metadata of the target backup file for the reference information of each data block contained in the target backup file to obtain the first data blocks belonging to the target backup file;
obtain the second data blocks belonging to the target backup file based on the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file;
and determine the size of the target backup file from the sizes of the first data blocks and the second data blocks recorded in the metadata of the target backup file.
Further, the processing unit 901 may be specifically configured to obtain, based on the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, the data blocks belonging to a second backup file, where the second backup file was created before the target backup file and after the first backup file and has already been deleted;
and to determine the data blocks belonging to the second backup file to be the second data blocks belonging to the target backup file.
Further, the processing unit 901 may be specifically configured to determine the sum of the sizes of the first data blocks and the second data blocks to be the size of the target backup file.
The receiving unit 902 may be configured to receive a query request, where the query request contains the identifier of the target backup file.
The storage unit 903 may be configured to record, in the metadata of the target backup file, the reference information of each data block and the size of each data block, and to record, in the metadata of the first backup file, the reference information of each data block contained in the first backup file.
It should be noted that the functions of the above units may be performed by the processor of the backup server, or by the processor calling a program in the memory.
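The size calculation attributed to the processing unit 901 can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: metadata is modelled as a mapping from block index to an (owner sequence number, size) pair, and a block is treated as belonging to a deleted second backup file when its owner's sequence number lies strictly between those of the first backup file and the target backup file. The function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed metadata layout: block index -> (owner backup sequence number, size in bytes)
Metadata = Dict[int, Tuple[int, int]]


def backup_file_size(target_seq: int, first_seq: Optional[int],
                     target_meta: Metadata) -> int:
    """Sum the sizes of the first and second data blocks of the target backup file."""
    # With no surviving earlier backup, every older owner counts as deleted.
    lower = first_seq if first_seq is not None else float("-inf")
    total = 0
    for owner, size in target_meta.values():
        is_first = owner == target_seq          # belongs to the target itself
        is_second = lower < owner < target_seq  # belongs to a deleted second backup
        if is_first or is_second:
            total += size
    return total


# Backup 3 was deleted, so the first backup file for target 4 is backup 2.
meta = {0: (2, 10), 1: (3, 20), 2: (4, 30), 3: (4, 40)}
print(backup_file_size(4, 2, meta))
```

Here block 0 still belongs to backup 2 and is excluded, while block 1, inherited from the deleted backup 3, is counted toward backup 4.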
Based on the above concept, as shown in FIG. 10, the present application provides a device 1000 for calculating the size of data deleted from a backup file. The device may include a processing unit 1001, a transceiver unit 1002, and a storage unit 1003. The device 1000 for calculating the size of data deleted from a backup file may be applied to a backup server or to a chip in a backup server.
The storage unit 1003 may be configured to store backup files.
The processing unit 1001 may be configured to determine the target backup file. The target backup file contains multiple data blocks and is a backup file of a source file; the source file also has a first backup file and a third backup file, each of which contains multiple data blocks. The first backup file is the backup file created before the target backup file whose creation time is closest to that of the target backup file, and it has not been deleted. The third backup file is the backup file created after the target backup file whose creation time is closest to that of the target backup file, and it has not been deleted. The processing unit 1001 may further be configured to:
obtain the first data blocks belonging to the target backup file based on the reference information of each data block contained in the target backup file, as recorded in the metadata of the target backup file;
determine the first backup file and, based on the reference information of each data block contained in the target backup file, as recorded in the metadata of the target backup file, and the reference information of each data block contained in the first backup file, as recorded in the metadata of the first backup file, obtain the second data blocks belonging to the target backup file, where the reference information of each data block indicates the backup file to which the data block belongs;
determine the third backup file and, based on the reference information of each data block contained in the third backup file, as recorded in the metadata of the third backup file, obtain the third data blocks, that is, those of the first data blocks and second data blocks belonging to the target backup file that are referenced by the third backup file;
and determine, from the first data blocks, the second data blocks, and the third data blocks, the fourth data blocks that can be deleted, and determine, from the size of each fourth data block recorded in the metadata of the target backup file, the size of the data that can be deleted from the target backup file.
The processing unit 1001 may be specifically configured to determine the sum of the sizes of the fourth data blocks to be the size of the data that can be deleted from the target backup file.
The processing unit 1001 may be specifically configured to determine the data blocks among the first data blocks and second data blocks other than the third data blocks to be the fourth data blocks that can be deleted.
The processing unit 1001 may be specifically configured to obtain, based on the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, the data blocks belonging to a second backup file, where the second backup file was created before the target backup file and after the first backup file and has already been deleted; and to determine the data blocks belonging to the second backup file to be the second data blocks belonging to the target backup file.
The receiving unit 1002 may be configured to receive a delete request, where the delete request contains the identifier of the target backup file.
The storage unit 1003 may be configured to record, in the metadata of each of the target backup file, the second backup file, and the first backup file, the backup sequence number of that backup file, the reference information of each data block it contains, and the size of each data block.
It should be noted that the functions of the above units may be performed by the processor of the backup server, or by the processor calling a program in the memory.
Based on the above concept, as shown in FIG. 11, the present application further provides a device 1100 for calculating the size of a backup file. The device 1100 may be applied to a backup server or to a chip in a backup server.
The device 1100 for calculating the size of a backup file may include a processor 1101 and a memory 1102.
The memory 1102 is configured to store computer instructions.
The processor 1101 is configured to execute the computer instructions stored in the memory 1102, so that the device for calculating the size of a backup file implements any one of the methods for calculating the size of a backup file described above.
For a description of the processor 1101 and the memory 1102, refer to the description of the process for calculating the size of a backup file above; it is not repeated here.
Based on the above concept, as shown in FIG. 12, the present application further provides a device 1200 for calculating the size of data deleted from a backup file. The device 1200 may be applied to a backup server or to a chip in a backup server.
The device 1200 for calculating the size of data deleted from a backup file may include a processor 1201 and a memory 1202.
The memory 1202 is configured to store computer instructions.
The processor 1201 is configured to execute the computer instructions stored in the memory 1202, so that the device for calculating the size of data deleted from a backup file implements any one of the methods for calculating the size of data deleted from a backup file described above.
For a description of the processor 1201 and the memory 1202, refer to the description of the process for calculating the size of a backup file above; it is not repeated here.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。This application is described with reference to flowcharts and / or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each process and / or block in the flowcharts and / or block diagrams, and combinations of processes and / or blocks in the flowcharts and / or block diagrams can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions generated by the processor of the computer or other programmable data processing device are used to generate instructions Means for implementing the functions specified in one or more flowcharts and / or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner such that the instructions stored in the computer-readable memory produce a manufactured article including an instruction device, the instructions The device implements the functions specified in one or more flowcharts and / or one or more blocks of the block diagram.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device, so that a series of steps can be performed on the computer or other programmable device to produce a computer-implemented process, which can be executed on the computer or other programmable device. The instructions provide steps for implementing the functions specified in one or more flowcharts and / or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Clearly, those skilled in the art can make various changes and variations to the embodiments of this application without departing from their spirit and scope. If such changes and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to cover them as well.

Claims (13)

  1. A method for calculating the size of a backup file, comprising:
    determining, by a backup server, a target backup file, wherein the target backup file contains a plurality of data blocks, the target backup file is a backup file of a source file, the source file further has a first backup file, the first backup file contains a plurality of data blocks, the first backup file is the backup file whose creation time is before, and closest to, the creation time of the target backup file, and the first backup file has not been deleted;
    querying, from metadata of the target backup file, reference information of each data block contained in the target backup file to obtain a first data block belonging to the target backup file, wherein the reference information of each data block indicates the backup file to which the data block belongs;
    obtaining a second data block belonging to the target backup file according to the reference information of the data blocks recorded in the metadata of the target backup file and reference information of data blocks recorded in metadata of the first backup file; and
    determining the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file.
  2. The method according to claim 1, wherein obtaining the second data block belonging to the target backup file according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file comprises:
    obtaining, according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, a data block belonging to a second backup file, wherein the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted; and
    determining the data block belonging to the second backup file as the second data block belonging to the target backup file.
  3. The method according to claim 1, wherein determining the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file comprises:
    determining the sum of the sizes of the first data block and the second data block as the size of the target backup file.
  4. The method according to claim 1, further comprising:
    recording, in the metadata of the target backup file, the reference information of each data block contained in the target backup file and the size of each data block; and
    recording, in the metadata of the first backup file, the reference information of each data block contained in the first backup file.
  5. The method according to claim 1, wherein, before the backup server determines the target backup file, the method further comprises:
    receiving a query request, wherein the query request contains an identifier of the target backup file.
  6. An apparatus for calculating the size of a backup file, comprising a processing unit and a storage unit, wherein:
    the storage unit is configured to store backup files; and
    the processing unit is configured to: determine a target backup file, wherein the target backup file contains a plurality of data blocks, the target backup file is a backup file of a source file, the source file further has a first backup file, the first backup file contains a plurality of data blocks, the first backup file is the backup file whose creation time is before, and closest to, the creation time of the target backup file, and the first backup file has not been deleted;
    query, from metadata of the target backup file, reference information of each data block contained in the target backup file to obtain a first data block belonging to the target backup file, wherein the reference information of each data block indicates the backup file to which the data block belongs;
    obtain a second data block belonging to the target backup file according to the reference information of the data blocks recorded in the metadata of the target backup file and reference information of data blocks recorded in metadata of the first backup file; and
    determine the size of the target backup file according to the sizes of the first data block and the second data block recorded in the metadata of the target backup file.
  7. The apparatus according to claim 6, wherein the processing unit is specifically configured to: obtain, according to the reference information of the data blocks recorded in the metadata of the target backup file and the reference information of the data blocks recorded in the metadata of the first backup file, a data block belonging to a second backup file, wherein the creation time of the second backup file is before the creation time of the target backup file and after the creation time of the first backup file, and the second backup file has been deleted; and
    determine the data block belonging to the second backup file as the second data block belonging to the target backup file.
  8. The apparatus according to claim 6, wherein the processing unit is specifically configured to determine the sum of the sizes of the first data block and the second data block as the size of the target backup file.
  9. The apparatus according to claim 6, wherein the storage unit is further configured to: record, in the metadata of the target backup file, the reference information of each data block contained in the target backup file and the size of each data block; and
    record, in the metadata of the first backup file, the reference information of each data block contained in the first backup file.
  10. The apparatus according to claim 6, further comprising:
    a receiving unit, configured to receive a query request, wherein the query request contains an identifier of the target backup file.
  11. An apparatus for calculating the size of a backup file, comprising a processor and a memory, wherein:
    the memory is configured to store computer instructions; and
    the processor is configured to execute the computer instructions stored in the memory, so that the apparatus for calculating the size of a backup file implements the method according to any one of claims 1 to 5.
  12. A computer-readable storage medium, wherein the storage medium stores computer instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.
  13. A computer program product, wherein the computer program product comprises computer instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.
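
The size calculation recited in claims 1 to 3 can be illustrated with a minimal sketch. It is not the applicant's implementation. The names `BlockRecord` and `backup_size` and the layout of the metadata are hypothetical. The rule used to identify a second data block is also an assumption, since the claims do not spell it out: a block listed in the target's metadata is treated as belonging to a deleted intermediate backup when its reference information names a backup other than the target and the block does not appear in the first backup's metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    """One metadata entry for a data block (hypothetical layout)."""
    block_id: str
    owner: str  # reference information: ID of the backup file the block belongs to
    size: int   # block size in bytes


def backup_size(target_id, target_meta, first_meta):
    """Return the size attributed to the target backup file.

    first data blocks:  blocks whose reference information names the target.
    second data blocks: blocks owned by another backup that are absent from
                        the first (nearest earlier, undeleted) backup's
                        metadata. Assumption: such blocks came from a deleted
                        backup created between the first backup and the target.
    """
    first_backup_ids = {rec.block_id for rec in first_meta}
    first_blocks = [rec for rec in target_meta if rec.owner == target_id]
    second_blocks = [
        rec for rec in target_meta
        if rec.owner != target_id and rec.block_id not in first_backup_ids
    ]
    # Claim 3: the size is the sum of the first and second data block sizes.
    return sum(r.size for r in first_blocks) + sum(r.size for r in second_blocks)


# Example: B1 is the first backup, B2 was created later and then deleted,
# B3 is the target backup file.
first_meta = [
    BlockRecord("a", "B1", 4096),
    BlockRecord("b", "B1", 4096),
]
target_meta = [
    BlockRecord("a", "B1", 4096),  # shared with B1, not counted
    BlockRecord("c", "B2", 2048),  # inherited from deleted B2, counted
    BlockRecord("d", "B3", 1024),  # owned by B3, counted
]
example_size = backup_size("B3", target_meta, first_meta)
```

In this example only blocks `c` and `d` are attributed to `B3`. Block `a` stays with `B1`, so summing the attributed sizes over all undeleted backups would not count any block twice.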
PCT/CN2019/079390 2018-08-23 2019-03-23 Method and apparatus for calculating backup file size WO2020037985A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810966824.2A CN110858123B (en) 2018-08-23 2018-08-23 Method and device for calculating size of backup file
CN201810966824.2 2018-08-23

Publications (1)

Publication Number Publication Date
WO2020037985A1 (en) 2020-02-27

Family

ID=69592250

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/079390 WO2020037985A1 (en) 2018-08-23 2019-03-23 Method and apparatus for calculating backup file size

Country Status (2)

Country Link
CN (1) CN110858123B (en)
WO (1) WO2020037985A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11531488B2 (en) 2017-08-07 2022-12-20 Kaseya Limited Copy-on-write systems and methods

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581013A (en) * 2020-03-18 2020-08-25 宁波送变电建设有限公司永耀科技分公司 System information backup and reconstruction method based on metadata and shadow files
US20240281338A1 (en) * 2023-02-22 2024-08-22 Bank Of America Corporation Systems, methods, and apparatuses for determining and applying a backup file attribution to files in an electronic network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317676A (en) * 2014-11-21 2015-01-28 四川智诚天逸科技有限公司 Data backup disaster tolerance method
CN104714864A (en) * 2015-03-20 2015-06-17 成都云祺科技有限公司 Intelligent computer data backup method
CN104866391A (en) * 2015-05-13 2015-08-26 三星电子(中国)研发中心 Terminal information backup method and apparatus based on incremental information system
CN105373452A (en) * 2015-12-11 2016-03-02 上海爱数信息技术股份有限公司 Data backup method
CN106973099A (en) * 2017-03-28 2017-07-21 广东欧珀移动通信有限公司 A kind of data-updating method, apparatus and system
US20170357553A1 (en) * 2016-06-14 2017-12-14 EMC IP Holding Company LLC Method and device for data backup

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3856855B2 (en) * 1995-10-06 2006-12-13 三菱電機株式会社 Differential backup method
CN102236586A (en) * 2010-04-21 2011-11-09 雷州 Local and network multiple incremental data backup and recovery method of computer
CN102436408B (en) * 2011-10-10 2014-02-19 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102436478B (en) * 2011-10-12 2013-06-19 浪潮(北京)电子信息产业有限公司 System and method for accessing massive data
US9514000B2 (en) * 2014-01-31 2016-12-06 Western Digital Technologies, Inc. Backup of baseline installation
US10503604B2 (en) * 2014-06-26 2019-12-10 Hewlett Packard Enterprise Development Lp Virtual machine data protection
CN104375905A (en) * 2014-11-07 2015-02-25 北京云巢动脉科技有限公司 Incremental backing up method and system based on data block
CN106155843B (en) * 2016-07-13 2019-03-12 袁凌 A kind of backup of virtual machine and backward recovery method
CN108268344B (en) * 2017-12-26 2021-05-18 华为技术有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN110858123A (en) 2020-03-03
CN110858123B (en) 2021-06-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19852084

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19852084

Country of ref document: EP

Kind code of ref document: A1