WO2023000674A1 - Cloud hard disk data compression backup and recovery method, apparatus, device, and storage medium - Google Patents

Cloud hard disk data compression backup and recovery method, apparatus, device, and storage medium

Info

Publication number
WO2023000674A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
backup
data
hard disk
data block
Prior art date
Application number
PCT/CN2022/078491
Other languages
English (en)
French (fr)
Inventor
海鑫
亓开元
轩艳东
马翱宇
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023000674A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of cloud platform technology, and in particular to a cloud hard disk data compression backup and recovery method, a cloud hard disk data compression backup and recovery device, electronic equipment, and a computer-readable storage medium.
  • Cloud computing platform also known as cloud platform, refers to services based on hardware resources and software resources, providing computing, network and storage capabilities.
  • A cloud hard disk is a device that can be mounted to a cloud host and used like a physical hard disk. To keep data secure and reliable, the cloud hard disk usually needs to be backed up. When the cloud hard disk fails, or the data in it suffers a logical error (such as accidental deletion, a hacker attack, or virus damage), the backup data can be used to recover the data quickly.
  • The data in the source cloud hard disk is usually written directly to the backup volume, so the actual storage capacity occupied by the backup volume equals that occupied by the source cloud hard disk. The backup data therefore occupies a large amount of storage space, which increases the cost of backup operations.
  • the related technologies have the problems of large storage space occupation and high service costs, which are technical problems to be solved by those skilled in the art.
  • The purpose of this application is to provide a cloud hard disk data compression backup and recovery method, a cloud hard disk data compression backup and recovery device, electronic equipment, and a computer-readable storage medium that reduce storage space occupation and business costs while ensuring correct data recovery.
  • the application provides a cloud hard disk data compression backup and recovery method, including:
  • performing data restoration by using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request includes:
  • the target backup information includes several preset data volumes and corresponding preset start offsets;
  • writing the target non-zero data block into the target cloud hard disk based on whether the target preset start offset matches the current write position of the target cloud hard disk.
  • the writing the target non-zero data block into the target cloud hard disk based on the match between the target preset start offset and the current write position of the target cloud hard disk includes:
  • if the target preset start offset matches the current write position, writing the target non-zero data block into the target cloud hard disk according to the target preset start offset;
  • if the target preset start offset does not match the current write position, writing the target non-zero data block into the target cloud hard disk according to the target preset start offset, and clearing the data in the target cloud hard disk between the current write position, as it was before the target non-zero data block was written, and the target preset start offset.
  • the method further includes:
  • the determining the preset sequence corresponding to the backup information includes:
  • the compressed data blocks are sorted according to the size relationship of the start offset corresponding to each compressed data block, and the order of the compressed data blocks is determined as the preset order.
  • the compressing the non-zero data blocks in the initial data blocks to obtain compressed data blocks includes:
  • determining each initial data block whose detection result is non-zero as a non-zero data block, and compressing it to obtain the compressed data block.
  • performing zero data block detection on each of the initial data blocks to obtain a detection result includes:
  • said dividing the source cloud hard disk to obtain several initial data blocks including:
  • obtaining a segmentation granularity that evenly divides 1 GB;
  • dividing the source cloud hard disk evenly according to the segmentation granularity to obtain the initial data blocks.
  • the generating corresponding backup information using the starting offset and the data volume corresponding to the compressed data block includes:
  • the key-value pair sequence is identified by using the hard disk identifier of the source cloud disk and the volume identifier of the backup volume to obtain the backup information.
  • the generating corresponding backup information using the starting offset and the data volume corresponding to the compressed data block includes:
  • the initial backup information is identified by using the compression identifier to obtain the backup information.
  • the present application also provides a cloud hard disk data compression backup and recovery device, including:
  • a segmentation module configured to segment the source cloud hard disk to obtain several initial data blocks, and determine the initial offset of each of the initial data blocks in the source cloud hard disk;
  • a compression module configured to compress non-zero data blocks in the initial data blocks to obtain compressed data blocks, and calculate the data volume of each compressed data block;
  • an information generation module configured to generate corresponding backup information by using the starting offset and the data volume corresponding to the compressed data block, and determine a preset sequence corresponding to the backup information;
  • a writing module configured to write the compressed data blocks into the backup volume;
  • the recovery module is configured to, when a recovery request is detected, perform data recovery using the target backup information specified by the recovery request, the target backup volume and the corresponding target preset sequence.
  • The cloud hard disk data compression backup and recovery method provided by this application divides the source cloud hard disk to obtain several initial data blocks and determines the starting offset of each initial data block in the source cloud hard disk; compresses the non-zero data blocks among the initial data blocks to obtain compressed data blocks and calculates the data volume of each compressed data block; generates corresponding backup information using the starting offset and data volume corresponding to each compressed data block and determines the preset sequence corresponding to the backup information; writes the compressed data blocks into the backup volume; and, when a recovery request is detected, performs data recovery using the target backup information specified by the recovery request and the corresponding target preset sequence.
  • A non-zero data block is a data block that records non-zero data. Unlike a zero data block, its content cannot be inferred during data recovery, so it must be compressed and saved for use in recovery. Because different non-zero data blocks compress to different sizes and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read back correctly.
  • The starting offset corresponding to each compressed data block (that is, the starting offset of the corresponding non-zero data block) and the corresponding data volume are used to generate the backup information, and the corresponding preset order is determined.
  • the preset order is used to indicate the order in which compressed data blocks are selected during data recovery.
  • a compressed backup of the source cloud disk can be done by writing the compressed data blocks to the backup volume.
  • The target backup information, target backup volume, and target preset sequence specified by the recovery request can be used to accurately read out the compressed data blocks, decompress them, splice the data blocks, and complete data recovery.
  • the storage space required for compressed backup can be greatly reduced, and the utilization efficiency of storage space can be improved. It solves the problems of relatively large storage space occupation and high service cost in related technologies.
  • the present application also provides a cloud hard disk data compression backup and recovery device, electronic equipment, and computer-readable storage medium, which also have the above-mentioned beneficial effects.
  • Fig. 1 is a flowchart of a cloud hard disk data compression backup and recovery method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of a specific cloud hard disk backup process provided by an embodiment of the present application.
  • Fig. 3 is a backup time-consuming comparison chart provided by the embodiment of the present application.
  • Fig. 4 is a backup volume capacity comparison chart provided by the embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a cloud hard disk data compression backup and recovery device provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a flow chart of a cloud hard disk data compression backup and recovery method provided by an embodiment of the present application. The method includes:
  • S101 Segment the source cloud hard disk to obtain several initial data blocks, and determine the starting offset of each initial data block in the source cloud hard disk.
  • The split can be even or uneven.
  • The entire source cloud hard disk can be divided evenly. Since the size of the source cloud disk is measured in GB (gigabytes), the splitting granularity should evenly divide 1 GB. When splitting, a granularity that evenly divides 1 GB is obtained, and the source cloud disk is split evenly at that granularity to obtain the initial data blocks.
  • The starting offset refers to the position offset, within the source cloud hard disk, of the first data in the initial data block.
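  • The even split in S101 can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_offsets` is hypothetical, 1 GB is taken as 2**30 bytes, and the disk size is assumed to be a whole multiple of the granularity.

```python
GIB = 1024 ** 3  # assumption: "1 GB" in the text is treated as 2**30 bytes


def split_offsets(disk_size: int, granularity: int) -> list[tuple[int, int]]:
    """Return (start_offset, length) for each initial data block (hypothetical helper)."""
    if granularity <= 0 or GIB % granularity != 0:
        raise ValueError("granularity must evenly divide 1 GB")
    if disk_size % granularity != 0:
        raise ValueError("disk size must be a multiple of the granularity")
    # Each block starts where the previous one ends.
    return [(offset, granularity) for offset in range(0, disk_size, granularity)]


# A 2 GB disk split at 512 MB granularity yields four blocks.
blocks = split_offsets(2 * GIB, 512 * 1024 ** 2)
```

  • Requiring the granularity to divide 1 GB means any disk whose size is a whole number of GB splits into blocks with no remainder.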
  • S102 Compress non-zero data blocks in the initial data block to obtain compressed data blocks, and calculate the data volume of each compressed data block.
  • Since there may be zero data blocks among the initial data blocks obtained by splitting the source cloud hard disk, this application stores only the non-zero data blocks in order to improve storage space utilization. A zero data block records no valid data and its content is uniquely determined, so it does not need to be backed up.
  • the process of obtaining the compressed data block includes the following steps:
  • Step 11 Perform zero data block detection on each of the initial data blocks to obtain a detection result.
  • Step 12 Determining the initial data block whose detection result is non-zero as a non-zero data block and performing compression to obtain the compressed data block.
  • Before compressing the non-zero data blocks, the present application must determine which data blocks are non-zero data blocks and which are zero data blocks.
  • The corresponding data block identity information may be acquired, that is, information indicating whether each data block is a zero data block once the splitting manner has been determined.
  • zero data block detection may be performed, and an initial data block that fails the zero data block detection is determined as a non-zero data block. This embodiment does not limit the specific detection method of zero data blocks.
  • the zero data block detection is performed on each initial data block, and the step of obtaining the detection result may include:
  • Step 21 Read the data content of the initial data block, and compare the data content with the binary empty flag.
  • Step 22 If there is any data content that is not a binary empty flag, determine that the detection result corresponding to the initial data block is non-zero.
  • The binary empty flag is "\x00", and the data content refers to the specific content recorded in the initial data block. By comparing it with the binary empty flag, it can be determined whether it is all empty. If any data content is not a binary empty flag, it means that the initial data block is not all zeros, so it can be determined that it is a non-zero data block, that is, the detection result corresponding to the initial data block is determined to be non-zero.
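  • Steps 21 and 22 can be sketched as a byte-level check. The function name `is_zero_block` is hypothetical, and the embodiment does not prescribe this particular method.

```python
def is_zero_block(data: bytes) -> bool:
    """Return True when every byte of the block equals the binary empty flag (0x00)."""
    # any() stops at the first non-zero byte, so a non-zero block is detected early.
    return not any(data)
```

  • A block is reported as non-zero as soon as a single byte differs from the empty flag, matching the rule that any non-empty content makes the detection result non-zero.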
  • the non-zero data block After the non-zero data block is determined, it can be compressed to obtain a compressed data block. Specifically, a compression algorithm such as gzip, zip, or snappy can be used to compress the non-zero data block to obtain a compressed data block. In addition, since the compressed volume of each data block is usually different, in order to correctly read each compressed data block, it is necessary to count the data volume of each compressed data block, so as to generate backup information later for correct data recovery .
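  • A minimal sketch of S102 using Python's standard `gzip` module, one of the algorithms the text names. The function name `compress_non_zero_blocks` and the tuple layout are assumptions made for illustration.

```python
import gzip


def compress_non_zero_blocks(blocks):
    """blocks: iterable of (start_offset, data) pairs.

    Returns a list of (start_offset, compressed_bytes, data_volume) tuples,
    skipping zero data blocks entirely.
    """
    result = []
    for offset, data in blocks:
        if not any(data):  # zero data block: nothing to back up
            continue
        compressed = gzip.compress(data)
        # The data volume is recorded because each block compresses to a different size.
        result.append((offset, compressed, len(compressed)))
    return result
```

  • The recorded data volume is what later allows each compressed block to be located inside the contiguous backup volume.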
  • S103 Generate corresponding backup information using the starting offset and data volume corresponding to the compressed data block, and determine a preset sequence corresponding to the backup information.
  • The execution order of step S103 and step S104 is not limited. For example, step S103 may be executed first and then step S104; step S104 may be executed first and then step S103; or steps S103 and S104 may be executed simultaneously.
  • the process of generating the backup information may include the following steps:
  • Step 31 Use the starting offset corresponding to the compressed data block and the data volume to form a key-value pair.
  • Step 32 Sort each key-value pair according to the size of the starting offset to obtain a sequence of key-value pairs.
  • Step 33 Use the hard disk identifier of the source cloud disk and the volume identifier of the backup volume to identify the sequence of key-value pairs to obtain backup information.
  • a key-value pair may be used to represent the correlation between the starting offset and the data volume, and a corresponding key-value pair is obtained. After the key-value pairs are obtained, they can be sorted according to the size of the starting offset to obtain a sequence of key-value pairs.
  • The order of the key-value pairs in the key-value pair sequence can be used as the aforementioned preset order. That is, the preset order in this embodiment is the order of starting offset size, which is also the positional order of the non-zero data blocks in the source cloud hard disk.
  • UUID is the abbreviation of Universally Unique Identifier, a standard for software construction. Its purpose is to allow every element in a distributed system to have unique identification information without that information being assigned by a central control terminal.
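  • Steps 31 to 33 can be sketched as below. The dictionary layout, the field names `disk_id`, `volume_id`, and `entries`, and the use of random UUIDs as identifiers are assumptions for illustration, not the format used by the application.

```python
import uuid


def build_backup_info(disk_id: str, volume_id: str, sizes_by_offset: dict[int, int]) -> dict:
    """Build backup information from {start_offset: data_volume} (hypothetical layout)."""
    # Sorting by starting offset yields the key-value pair sequence,
    # whose order doubles as the preset order.
    entries = sorted(sizes_by_offset.items())
    return {"disk_id": disk_id, "volume_id": volume_id, "entries": entries}


info = build_backup_info(
    str(uuid.uuid4()),  # stand-in for the source cloud disk identifier
    str(uuid.uuid4()),  # stand-in for the backup volume identifier
    {2048: 310, 0: 120, 1024: 95},
)
```

  • Keeping the pairs sorted means the metadata alone fixes both where each block belongs on the disk and the order in which blocks are stored and read.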
  • The system can use different backup policies for different source cloud hard disks. For example, some source cloud hard disks need to be backed up with the method above, while other source cloud hard disks do not need it and are backed up by direct copy. Therefore, to indicate the backup method, the process of generating backup information can include:
  • Step 41 Generate initial backup information using the starting offset and data volume corresponding to the compressed data block.
  • Step 42 Use the compression identifier to mark the initial backup information to obtain the backup information.
  • the initial backup information is directly generated by using the starting offset and the data volume.
  • the compression flag refers to the flag that can indicate the backup method, and its specific form is not limited.
  • the state flag bit compress can be set for the initial backup information. If the flag bit is set to true, then true is the compression flag.
  • the initial backup information is identified by using the compression identifier, and the backup information that can represent the backup mode can be obtained.
  • The generated backup information covers only the non-zero data blocks. Therefore, when performing data recovery, a judgment rule needs to be preset in order to determine accurately where zero data blocks must be inserted to obtain a correct and complete source cloud hard disk. The judgment rule usually relates the writing position of the hard disk to the offset of the non-zero data block: when the two do not match, it is determined that a zero data block needs to be added. Each time a non-zero data block is written into the cloud hard disk, a specific non-zero data block must be selected in a certain order, and its corresponding starting offset is matched against the writing position of the hard disk.
  • The preset order can change as the matching rule changes.
  • The source cloud hard disk is restored sequentially from the beginning of the data to the end.
  • In this case, the rule can be to determine whether the starting offset is greater than the current writing position of the hard disk and adjacent to it; if so, the two match, and otherwise they do not.
  • the process of generating the preset sequence may include the following steps:
  • Step 51 Sort the compressed data blocks according to the size relationship of the start offsets corresponding to each compressed data block, and determine the sequence of the compressed data blocks as a preset sequence.
  • The compressed data blocks are sorted by starting offset, and the resulting sequence is determined as the preset sequence. Following this order, compressed data blocks with successively larger starting offsets are selected, that is, non-zero data blocks with successively larger starting offsets are obtained. During writing, the matching rule above is used to judge whether zero data blocks need to be supplemented.
  • The backup volume is the storage volume used to store the compressed data blocks.
  • A restoration request is a request to restore the data in a specified cloud hard disk; its specific form and content are not limited. Based on the recovery request, it must be possible to determine which cloud disk's data needs to be recovered, so that the data required for recovery can be further determined, including the target backup information (also called target information), the target preset sequence, and the target backup volume.
  • the target backup volume refers to the data volume storing the backup data specified by the recovery request, where the backup data generated when the source cloud hard disk (that is, the backed up cloud hard disk) is backed up is stored.
  • The target information refers to the backup information of the backup data specified by the recovery request. When the source cloud hard disk is backed up, the data usually needs to be divided into blocks, and the blocks are written contiguously to the target backup volume. Therefore, the target backup information should at least indicate the volume of each data block in the target backup volume, so that the backup data can be read accurately, and should also indicate the location of each piece of backup data in the source cloud hard disk, so that the data in the source cloud disk can be correctly reconstructed.
  • step S105 may further include:
  • Step 61 If a recovery request is detected, determine the target backup information and the target backup volume specified by the recovery request.
  • Step 62 Using each preset data volume, read the corresponding target compressed data blocks from the target backup volume according to the target preset sequence.
  • Step 63 Decompress the target compressed data block to obtain a candidate non-zero data block.
  • Step 64 Determine the target non-zero data block among the candidate non-zero data blocks according to the target preset sequence, and determine the target preset starting offset corresponding to the target non-zero data block.
  • Step 65 Write the target non-zero data block into the target cloud hard disk based on the match between the target preset start offset and the current writing position of the target cloud hard disk.
  • the target information includes several preset data volumes and corresponding preset starting offsets.
  • the preset data volume refers to the data volume of each compressed data block in the target backup volume.
  • The preset starting offset refers to the position, in the source cloud disk, of the non-zero data block corresponding to each compressed data block in the target backup volume. The preset data volumes and preset starting offsets are in one-to-one correspondence, and each pair corresponds to one compressed data block in the target backup volume.
  • the restoration request may include the source cloud hard disk information, and after obtaining the source cloud hard disk information, use the above correspondence to determine the corresponding target backup information and target backup volume.
  • the restoration request may directly specify the target backup information and the target backup volume.
  • the target information After the target information is obtained, it is analyzed to obtain the preset data volume and the preset start offset.
  • The numbers of preset data volumes and preset start offsets are equal and are usually more than one, although there may also be only one of each.
  • each data block obtained by splitting is compressed when the source cloud hard disk is backed up to obtain compressed data blocks. Therefore, the preset data volume is the volume of the target compressed data block, and the target compressed data block refers to the compressed data block stored in the target backup volume.
  • The target preset order may be the storage order of the target compressed data blocks in the target backup volume. Usually, it is also the order of the corresponding non-zero data blocks in the source cloud hard disk, that is, the order of the corresponding starting offsets. According to the target preset sequence, it is possible to determine which target compressed data block to read at a given stage of data recovery, and therefore which preset data volume the read should be based on.
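  • Reading the target compressed data blocks in the target preset order can be sketched as below. It assumes the backup volume is exposed as a binary file-like object and that the entries are (preset start offset, preset data volume) pairs already in storage order; the name `read_compressed_blocks` is hypothetical.

```python
import io


def read_compressed_blocks(volume, entries):
    """Yield (start_offset, compressed_bytes) by reading the volume sequentially."""
    for offset, size in entries:
        chunk = volume.read(size)  # blocks are stored back to back
        if len(chunk) != size:
            raise EOFError("backup volume is shorter than the recorded data volumes")
        yield offset, chunk


# Three blocks of 3, 2 and 4 bytes stored contiguously.
demo_volume = io.BytesIO(b"aaabbcccc")
demo_blocks = list(read_compressed_blocks(demo_volume, [(0, 3), (8, 2), (16, 4)]))
```

  • Because the blocks are contiguous, the recorded data volumes are the only information needed to find the boundary of each block.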
  • a zero data block refers to a data block including only zero data
  • a non-zero data block refers to a data block including non-zero data.
  • The decompression method needs to correspond to the compression method of the target compressed data block.
  • the specific content of the compression method and decompression method is not limited in this embodiment, and any reversible compression method and corresponding decompression method can be selected. Among them, reversible means that the data content will not change after being compressed and decompressed.
  • the target non-zero data block refers to the data block that needs to be written to the target cloud disk in the current stage.
  • the target non-zero data blocks need to be determined according to a preset order. After the target non-zero data block is determined, its corresponding preset start offset is the target preset start offset, which can represent the data position of the target non-zero data block in the source cloud hard disk.
  • When writing the target non-zero data block, it is necessary to judge whether there is zero data between it and the candidate non-zero data block written last time, and hence whether zero data must be supplemented at the same time, so that the data of the source cloud disk is recovered accurately. That is, it is necessary to write the target non-zero data block into the target cloud hard disk based on the match between the target preset starting offset and the current writing position of the target cloud hard disk.
  • the process of writing the target non-zero data block into the target cloud hard disk includes the following steps:
  • Step 71 If the target preset start offset matches the current writing position, write the target non-zero data block into the target cloud hard disk according to the target preset start offset.
  • Step 72 If the target preset start offset does not match the current write position, write the target non-zero data block into the target cloud disk according to the target preset start offset, and clear the data in the target cloud disk between the current write position, as it was before the target non-zero data block was written, and the target preset start offset.
  • the current writing position of the target cloud hard disk refers to the position specified by the data pointer after the data was last written to the target cloud hard disk.
  • The position pointed to by the data pointer changes as data is written, and it always points to the location of the most recently written data. If no data has been written to the target cloud disk, the data pointer points to the initial starting offset of the target cloud disk.
  • If the target preset start offset matches the current writing position, the candidate non-zero data block written last time is connected end to end with the target non-zero data block, with no zero data between them. In this case, the target non-zero data block can be written directly and sequentially to the target cloud disk.
  • This embodiment does not limit the specific method of detecting whether the target preset start offset matches the current writing position. For example, it can be determined whether the current writing position is smaller than the target preset start offset and adjacent to it; if so, the two match. Alternatively, it can be judged whether the current write position and the target preset start offset are both the initial start offset, that is, the first storage position of the entire cloud hard disk; if so, the two match.
  • If the target preset start offset does not match the current write position, there are zero data blocks between the candidate non-zero data block written last time and the target non-zero data block, or the first few data blocks of the source cloud disk are zero data blocks.
  • the target non-zero data block needs to be written to the target cloud disk according to the target preset start offset, and zero data needs to be supplemented.
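  • Steps 71 and 72 can be sketched with a seekable binary stream standing in for the target cloud hard disk. This sketch assumes that the current write position is the end of the data written so far and that "matching" means the target preset start offset equals that position; the name `write_block` is hypothetical.

```python
import io


def write_block(target, offset: int, data: bytes) -> None:
    """Write data at offset, zero-filling any gap after the current write position."""
    position = target.tell()
    if offset < position:
        raise ValueError("blocks must arrive in increasing offset order")
    if offset > position:
        # Mismatch: zero data lies between the last written block and this one.
        target.write(bytes(offset - position))
    target.write(data)


demo_disk = io.BytesIO()
write_block(demo_disk, 0, b"ab")  # matches the initial write position
write_block(demo_disk, 4, b"cd")  # gap of two bytes is cleared first
```

  • Explicitly writing zeros, rather than seeking past the gap, ensures that stale data on a reused target disk cannot survive in regions that were zero in the source.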
  • Step 81 Determine whether the target backup information has a compression flag.
  • Step 82 If there is a compression flag, determine to execute the step of reading the corresponding target compressed data blocks from the target backup volume according to the target preset order by using each preset data volume.
  • Step 83 If there is no compression flag, use each preset data volume to read the corresponding target data blocks from the target backup volume according to the target preset sequence, and splice the target data blocks to complete data recovery.
  • Target data blocks may include all-zero data blocks and non-zero data blocks.
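  • The branch in Steps 81 to 83 can be sketched as below. It assumes the backup information is a dictionary with a boolean `compress` flag and an `entries` list of (start offset, data volume) pairs, and that gzip was the compression method; these names and the layout are illustrative only.

```python
import gzip


def load_blocks(volume: bytes, info: dict) -> list[tuple[int, bytes]]:
    """Read blocks from the backup volume, decompressing only if the flag is set."""
    compressed = info.get("compress", False)
    blocks, pos = [], 0
    for offset, size in info["entries"]:
        raw = volume[pos:pos + size]
        pos += size
        # Without the compression flag the stored bytes are the data block itself.
        blocks.append((offset, gzip.decompress(raw) if compressed else raw))
    return blocks
```

  • In the uncompressed case the returned blocks can simply be spliced in order, since zero data blocks were stored along with the non-zero ones.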
  • A non-zero data block is a data block that records non-zero data. Unlike a zero data block, its content cannot be inferred during data recovery, so it must be compressed and saved for use in recovery. Because different non-zero data blocks compress to different sizes and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read back correctly.
  • The starting offset corresponding to each compressed data block (that is, the starting offset of the corresponding non-zero data block) and the corresponding data volume are used to generate the backup information, and the corresponding preset order is determined.
  • the preset order is used to indicate the order in which compressed data blocks are selected during data recovery.
  • a compressed backup of the source cloud disk can be done by writing the compressed data blocks to the backup volume.
  • The target backup information, target backup volume, and target preset sequence specified by the recovery request can be used to accurately read out the compressed data blocks, decompress them, splice the data blocks, and complete data recovery.
  • the storage space required for compressed backup can be greatly reduced, and the utilization efficiency of storage space can be improved. It solves the problems of relatively large storage space occupation and high service cost in related technologies.
  • FIG. 2 is a specific cloud hard disk backup flowchart provided by the embodiment of the present application.
  • the source cloud disk backup process is described as follows:
  • For a non-empty chunk, use the gzip tool to compress it and obtain the compressed data block, then calculate its size (that is, the data volume of the compressed data block). Specifically, after chunk1 is compressed, its size is reduced to size1. Because chunk2 is identified as an empty chunk (that is, a zero data block), the compression step is skipped and no compressed size is calculated for it. After chunk3 is compressed, the resulting data block has size size3. Subsequent chunks are handled in the same way.
  • The key of the key-value pair is the starting offset offset1 at which this chunk was read from the source cloud disk; its value is size1, the size of compressed data block 1 obtained from the chunk.
  • Because chunk2 is identified as an empty chunk, the steps of compressing it, writing it to the backup volume, and recording a key-value pair in the database are all skipped. All subsequent empty chunks are handled in this way.
  • After a compressed backup, empty chunks are not written to the backup volume and non-empty chunks are compressed, which greatly reduces the capacity occupied by the backup volume. The database holds a complete record of the starting offset at which each chunk was read from the source volume and of the size of the data block obtained by compressing each chunk.
  • The recovery process for a compressed cloud disk backup is as follows:
  • Because chunk2 was an empty data block, no key-value pair was recorded for it in the database, and the recovery of chunk3 starts directly.
  • To process offset3:size3, the backup volume is read starting from read offset size1, a data block of size size3 is read, and it is decompressed to obtain a new data block. offset3 is necessarily greater than the current write position offset2 of the restored cloud disk (writing the decompressed chunk1 earlier advanced the write position from offset1 to offset2). Therefore, before this data block (the decompressed chunk3) is written into the restored cloud disk, the space from offset2 to offset3 must be zeroed, so that the restored data is consistent with the data as it was at backup time.
  • Source cloud disk imageA: create an empty cloud disk with a quota of 10G, mount it to a virtual machine and format it with an ext4 file system, and use the dd command to create a 2G all-zero file in the file system.
  • Source cloud disk imageB: create an empty cloud disk with a quota of 10G, mount it to a virtual machine and format it with an ext4 file system, and use the dd command to create a 5G all-zero file in the file system.
  • Source cloud disk imageC: create an image volume with a quota of 10G; the cloud disk contains a 39MB system image, which is a minimal Linux installation.
  • Source cloud disk imageD: create an image volume with a quota of 10G; the cloud disk contains a 2404MB system image, which is a CentOS 7 installation.
  • Source cloud disk imageE: create an image volume with a quota of 10G; the cloud disk contains a 396MB system image, which is a minimal Windows installation.
  • The above five cloud disks are each backed up in the following three scenarios, and the time consumed and the actual capacity occupied by the backup volume are recorded after each backup completes.
  • The scenarios are as follows:
  • Scenario 1: the community backup driver is used, with empty chunk detection and backup compression both disabled.
  • Scenario 2: the driver optimized by the present application is used, with only empty chunk detection enabled and backup compression disabled.
  • Scenario 3: the driver optimized by the present application is used, with both empty chunk detection and backup compression enabled.
  • FIG. 3 is a backup time-consuming comparison chart provided by the embodiment of the present application
  • FIG. 4 is a backup volume capacity comparison chart provided by the embodiment of the present application.
  • Analysis of FIG. 3 shows that the cloud disk backup mechanism of the related art takes a relatively long time to back up; after empty chunk detection (backup acceleration) is enabled, the time needed to back up the same volume drops significantly; after backup compression is also enabled, the time is generally higher than with empty chunk detection alone, because of the added chunk compression time, but still lower than with the backup mechanism of the related art.
  • The additional capacity saved in Scenario 3 compared with Scenario 2 depends on how sparse the data in the source cloud disk is (for example, imageA and imageB hold all-zero files generated by the dd command, so they are highly sparse), and also on the compression algorithm used.
  • the embodiment of the present application adopts the gzip compression algorithm to obtain the test results in Fig. 3 and Fig. 4 .
  • the cloud hard disk data compression backup and restoration device provided by the embodiment of the present application is introduced below.
  • the cloud hard disk data compression backup and restoration device described below and the cloud hard disk data compression backup and restoration method described above can be referred to in correspondence.
  • FIG. 5 is a schematic structural diagram of a cloud hard disk data compression backup and recovery device provided in the embodiment of the present application, including:
  • the segmentation module 110 is used to obtain several initial data blocks by segmenting the source cloud hard disk, and determine the starting offset of each initial data block in the source cloud hard disk;
  • the compression module 120 is used to compress the non-zero data blocks in the initial data block to obtain compressed data blocks, and calculate the data volume of each compressed data block;
  • the information generation module 130 is used to generate corresponding backup information using the starting offset and data volume corresponding to the compressed data block, and determine the corresponding preset order of the backup information;
  • the writing module 140 is used to write the compressed data blocks to the backup volume;
  • the restoration module 150 is configured to, when a restoration request is detected, perform data restoration using the target backup information, the target backup volume, and the corresponding target preset order specified by the restoration request.
  • the restoration module 150 includes:
  • the determination unit is configured to determine the target backup information and the target backup volume specified by the recovery request if the recovery request is detected; the target backup information includes several preset data volumes and corresponding preset start offsets;
  • the reading unit is configured to use each preset data volume to read the corresponding target compressed data blocks from the target backup volume according to the target preset sequence;
  • a decompression unit configured to decompress the target compressed data block to obtain a candidate non-zero data block
  • a target determining unit configured to determine a target non-zero data block among candidate non-zero data blocks according to a target preset sequence, and determine a target preset starting offset corresponding to the target non-zero data block;
  • the writing unit is configured to write the target non-zero data block into the target cloud hard disk based on the match between the preset start offset of the target and the current write position of the target cloud hard disk.
  • the writing unit includes:
  • the first writing subunit is used to write the target non-zero data block into the target cloud disk at the target preset starting offset if the target preset starting offset matches the current write position;
  • the second writing subunit is used, if the target preset starting offset does not match the current write position, to write the target non-zero data block into the target cloud disk at the target preset starting offset, and to zero the data of the target cloud disk between the current write position as it was before the target non-zero data block was written and the target preset starting offset.
  • a compression judging unit configured to judge whether the target backup information has a compression flag
  • an execution determining unit configured to, if there is a compression flag, proceed to the step of using each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
  • the splicing recovery unit is configured to read the corresponding target data blocks from the target backup volume according to the target preset order by using each preset data volume if there is no compression flag, and splice the target data blocks to complete data recovery.
  • the information generating module 130 includes:
  • the sorting unit is configured to sort the compressed data blocks according to the size relationship of the starting offset corresponding to each compressed data block, and determine the sequence of the compressed data blocks as a preset sequence.
  • the compression module 120 includes:
  • a zero data block detection unit is used to perform zero data block detection on each initial data block to obtain a detection result
  • the compression unit is configured to determine an initial data block whose detection result is non-zero as a non-zero data block and perform compression to obtain a compressed data block.
  • the zero data block detection unit includes:
  • the content matching subunit is used to read the data content of the initial data block, and compare the data content with the binary empty flag;
  • the non-zero determining subunit is used to determine that the detection result corresponding to the initial data block is non-zero if there is any data content that is not a binary empty flag.
  • the segmentation module 110 includes:
  • the granularity acquisition unit is used to obtain the segmentation granularity; the segmentation granularity evenly divides 1GB;
  • the average segmentation unit is used to averagely segment the source cloud disk according to the segmentation granularity to obtain initial data blocks.
  • the information generating module 130 includes:
  • a key-value pair generating unit configured to form a key-value pair using the starting offset and data volume corresponding to the compressed data block
  • the key-value pair sorting unit is used to sort each key-value pair according to the size of the starting offset to obtain a sequence of key-value pairs;
  • the identification unit is configured to identify the key-value pair sequence by using the hard disk identification of the source cloud hard disk and the volume identification of the backup volume to obtain backup information.
  • the information generating module 130 includes:
  • an initial generation unit configured to generate initial backup information using the starting offset and data volume corresponding to the compressed data block
  • the compression identification unit is configured to use the compression identification to identify the initial backup information to obtain the backup information.
  • the electronic device 100 may include a processor 101 and a memory 102 , and may further include one or more of a multimedia component 103 , an information input/information output (I/O) interface 104 and a communication component 105 .
  • the processor 101 is used to control the overall operation of the electronic device 100, so as to complete all or part of the steps in the above cloud hard disk data compression backup and recovery method;
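  • the memory 102 is used to store various types of data to support operation of the electronic device 100; these data may include, for example, instructions for any application program or method operating on the electronic device 100, as well as data related to the application program.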
  • the memory 102 can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • Multimedia components 103 may include screen and audio components.
  • the screen can be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals.
  • an audio component may include a microphone for receiving external audio signals.
  • the received audio signal may be further stored in the memory 102 or sent via the communication component 105 .
  • the audio component also includes at least one speaker for outputting audio signals.
  • the I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons can be virtual buttons or physical buttons.
  • the communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices.
  • Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi part, a Bluetooth part, and an NFC part.
  • the electronic device 100 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the cloud hard disk data compression backup and recovery method given in the above embodiments.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above cloud hard disk data compression backup and restoration method are realized.
  • the computer-readable storage medium may include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disk.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
  • because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for related details, refer to the description of the method.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

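The present application discloses a cloud disk data compression backup and recovery method, apparatus, electronic device, and computer-readable storage medium. The method includes: splitting a source cloud disk into several initial data blocks and determining the starting offset of each initial data block in the source cloud disk; compressing the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block; generating corresponding backup information from the starting offset and data volume corresponding to each compressed data block, and determining the preset order corresponding to the backup information; writing the compressed data blocks to a backup volume; and, when a recovery request is detected, performing data recovery using the target backup information specified by the recovery request and the corresponding target preset order. The method reduces storage space usage while ensuring that data is recovered correctly.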

Description

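Cloud disk data compression backup and recovery method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202110838010.2, entitled "Cloud disk data compression backup and recovery method, apparatus, device, and storage medium", filed with the China Patent Office on July 23, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of cloud platforms, and in particular to a cloud disk data compression backup and recovery method, a cloud disk data compression backup and recovery apparatus, an electronic device, and a computer-readable storage medium.
Background
A cloud computing platform, also called a cloud platform, provides computing, network, and storage capabilities as services built on hardware and software resources. A cloud disk is a device that can be mounted on a cloud host and used as a physical hard disk. To make data safer and more reliable, cloud disks usually need to be backed up; when a cloud disk fails or the data on it suffers a logical error (such as accidental deletion, a hacker attack, or virus damage), the backup can be used to restore the data quickly. When backing up a cloud disk, the related art usually writes the data in the source cloud disk directly to the backup volume, so the storage capacity actually occupied by the backup volume equals that occupied by the source cloud disk. Backup data therefore takes up a large amount of storage space, which raises the cost of the backup service.
Therefore, the large storage footprint and high service cost of the related art are technical problems that those skilled in the art need to solve.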
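Summary
In view of this, the purpose of the present application is to provide a cloud disk data compression backup and recovery method, a cloud disk data compression backup and recovery apparatus, an electronic device, and a computer-readable storage medium that reduce storage space usage and lower service cost while ensuring that data is recovered correctly.
To solve the above technical problem, the present application provides a cloud disk data compression backup and recovery method, including:
splitting a source cloud disk into several initial data blocks, and determining the starting offset of each initial data block in the source cloud disk;
compressing the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculating the data volume of each compressed data block;
generating corresponding backup information from the starting offset and the data volume corresponding to each compressed data block, and determining the preset order corresponding to the backup information;
writing the compressed data blocks to a backup volume;
when a recovery request is detected, performing data recovery using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request.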
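Optionally, performing data recovery, when a recovery request is detected, using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request includes:
if a recovery request is detected, determining the target backup information and the target backup volume specified by the recovery request, the target backup information including several preset data volumes and several corresponding preset starting offsets;
using each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
decompressing the target compressed data blocks to obtain candidate non-zero data blocks;
determining a target non-zero data block among the candidate non-zero data blocks according to the target preset order, and determining the target preset starting offset corresponding to the target non-zero data block;
writing the target non-zero data block into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk.
Optionally, writing the target non-zero data block into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk includes:
if the target preset starting offset matches the current write position, writing the target non-zero data block into the target cloud disk at the target preset starting offset;
if the target preset starting offset does not match the current write position, writing the target non-zero data block into the target cloud disk at the target preset starting offset, and zeroing the data of the target cloud disk between the current write position as it was before the target non-zero data block was written and the target preset starting offset.
Optionally, after determining the target backup information and the target backup volume specified by the recovery request, the method further includes:
judging whether the target backup information has a compression flag;
if it has the compression flag, proceeding to the step of using each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
if it does not have the compression flag, using each preset data volume to read the corresponding target data blocks from the target backup volume in the target preset order, and splicing the target data blocks to complete the data recovery.
Optionally, determining the preset order corresponding to the backup information includes:
sorting the compressed data blocks by the magnitude of their corresponding starting offsets, and taking the resulting sequence of the compressed data blocks as the preset order.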
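Optionally, compressing the non-zero data blocks among the initial data blocks to obtain compressed data blocks includes:
performing zero data block detection on each initial data block to obtain a detection result;
determining each initial data block whose detection result is non-zero to be a non-zero data block and compressing it to obtain a compressed data block.
Optionally, performing zero data block detection on each initial data block to obtain a detection result includes:
reading the data content of the initial data block, and comparing the data content with the binary null flag;
if any part of the data content is not the binary null flag, determining that the detection result corresponding to the initial data block is non-zero.
Optionally, splitting the source cloud disk into several initial data blocks includes:
obtaining a split granularity, the split granularity evenly dividing 1GB;
splitting the source cloud disk evenly according to the split granularity to obtain the initial data blocks.
Optionally, generating corresponding backup information from the starting offset and the data volume corresponding to each compressed data block includes:
forming a key-value pair from the starting offset and the data volume corresponding to each compressed data block;
sorting the key-value pairs by starting offset to obtain a key-value pair sequence;
labeling the key-value pair sequence with the disk identifier of the source cloud disk and the volume identifier of the backup volume to obtain the backup information.
Optionally, generating corresponding backup information from the starting offset and the data volume corresponding to each compressed data block includes:
generating initial backup information from the starting offset and the data volume corresponding to each compressed data block;
labeling the initial backup information with a compression flag to obtain the backup information.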
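The present application also provides a cloud disk data compression backup and recovery apparatus, including:
a splitting module, used to split a source cloud disk into several initial data blocks and determine the starting offset of each initial data block in the source cloud disk;
a compression module, used to compress the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculate the data volume of each compressed data block;
an information generation module, used to generate corresponding backup information from the starting offset and the data volume corresponding to each compressed data block, and determine the preset order corresponding to the backup information;
a writing module, used to write the compressed data blocks to a backup volume;
a recovery module, used to perform data recovery, when a recovery request is detected, using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request.
The cloud disk data compression backup and recovery method provided by the present application splits a source cloud disk into several initial data blocks and determines the starting offset of each initial data block in the source cloud disk; compresses the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculates the data volume of each compressed data block; generates corresponding backup information from the starting offset and data volume corresponding to each compressed data block, and determines the preset order corresponding to the backup information; writes the compressed data blocks to a backup volume; and, when a recovery request is detected, performs data recovery using the target backup information specified by the recovery request and the corresponding target preset order.
It can be seen that, when backing up a cloud disk, the method splits it and compresses the non-zero data blocks in it. A non-zero data block is a data block that records non-zero data. Unlike a zero data block, its content cannot be inferred during data recovery, so it must be compressed and saved so that recovery can be based on it. Because different non-zero data blocks have different sizes after compression, and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read back correctly. To represent the position of each data block in the source cloud disk, backup information is generated from the starting offset corresponding to the compressed data block (that is, the starting offset of the corresponding non-zero data block) and the corresponding data volume, and the corresponding preset order is determined. The preset order indicates the order in which compressed data blocks are selected during data recovery. Writing the compressed data blocks to the backup volume completes the compressed backup of the source cloud disk. A detected recovery request means that some source cloud disk is to be restored; the target backup information, target backup volume, and target preset order specified by the recovery request can then be used to read out the compressed data blocks accurately, decompress them, splice the data blocks, and complete data recovery. By removing all-zero data blocks, compressing and storing the non-zero data blocks, and generating the corresponding backup information and preset order, the storage space required for backup is greatly reduced and storage utilization is improved, which solves the problems of large storage footprint and high service cost in the related art.
In addition, the present application also provides a cloud disk data compression backup and recovery apparatus, an electronic device, and a computer-readable storage medium, which have the same beneficial effects.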
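Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of a cloud disk data compression backup and recovery method provided by an embodiment of the present application;
FIG. 2 is a specific cloud disk backup flowchart provided by an embodiment of the present application;
FIG. 3 is a backup time comparison chart provided by an embodiment of the present application;
FIG. 4 is a backup volume capacity comparison chart provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a cloud disk data compression backup and recovery apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.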
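Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Please refer to FIG. 1, which is a flowchart of a cloud disk data compression backup and recovery method provided by an embodiment of the present application. The method includes:
S101: Split the source cloud disk into several initial data blocks, and determine the starting offset of each initial data block in the source cloud disk.
Before a source cloud disk is backed up, it must first be split. The split may be even or uneven. To improve the compression effect and reduce storage usage as far as possible, the whole source cloud disk can be split evenly. Because the size of a source cloud disk is expressed in GB (gigabytes), the split granularity should evenly divide 1GB. When splitting, a split granularity that evenly divides 1GB is obtained, and the source cloud disk is split evenly at that granularity to obtain the initial data blocks.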
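The granularity constraint in S101 can be illustrated with a short Python sketch. The names `GIB`, `validate_granularity`, and `chunk_offsets` are hypothetical, and the sketch assumes that 1GB means 2^30 bytes; the application itself does not fix either point.

```python
GIB = 1024 ** 3  # assumption: 1 GB is taken as 2**30 bytes


def validate_granularity(chunk_size: int) -> None:
    """Raise ValueError unless chunk_size evenly divides 1 GB."""
    if chunk_size <= 0 or GIB % chunk_size != 0:
        raise ValueError(f"chunk size {chunk_size} does not evenly divide 1 GB")


def chunk_offsets(disk_size: int, chunk_size: int) -> list[int]:
    """Return the starting offset of every initial data block on the disk."""
    validate_granularity(chunk_size)
    # Offsets are 0, chunk_size, 2 * chunk_size, ... up to the end of the disk.
    return list(range(0, disk_size, chunk_size))
```

Because the disk size is a whole number of GB and the granularity divides 1GB, every initial data block produced this way has the same length.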
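After the initial data blocks are obtained, the starting offset of each initial data block in the source cloud disk must be recorded. The starting offset is the position offset, within the source cloud disk, of the first piece of data in the initial data block.
S102: Compress the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculate the data volume of each compressed data block.
The initial data blocks obtained by splitting the source cloud disk may include zero data blocks. To improve storage utilization, the present application stores only the non-zero data blocks. A zero data block records no valid data and its content is uniquely determined, so it does not need to be backed up.
To achieve this, the process of obtaining compressed data blocks includes the following steps:
Step 11: Perform zero data block detection on each initial data block to obtain a detection result.
Step 12: Determine each initial data block whose detection result is non-zero to be a non-zero data block and compress it to obtain a compressed data block.
It can be understood that before compressing non-zero data blocks, the present application must determine which data blocks are non-zero and which are zero. In one implementation, corresponding data block identity information can be obtained, namely identity information that, once the splitting method is determined, indicates whether each data block is a zero data block. In another implementation, zero data block detection can be performed, and initial data blocks that do not pass the zero data block detection are determined to be non-zero data blocks. This embodiment does not limit the specific zero data block detection method. In one implementation, the step of performing zero data block detection on each initial data block to obtain a detection result may include:
Step 21: Read the data content of the initial data block, and compare the data content with the binary null flag.
Step 22: If any part of the data content is not the binary null flag, determine that the detection result corresponding to the initial data block is non-zero.
Here, the binary null flag is "\x00", and the data content is the specific content recorded in the initial data block. Comparing it with the binary null flag determines whether it is entirely empty. If any part of the data content is not the binary null flag, the initial data block is not all zeros, so it is a non-zero data block; that is, the detection result corresponding to the initial data block is non-zero.
After the non-zero data blocks are determined, they can be compressed to obtain compressed data blocks. Specifically, a compression algorithm such as gzip, zip, or snappy can be used. In addition, because data blocks usually have different sizes after compression, the data volume (that is, the size) of each compressed data block must be recorded so that each compressed data block can be read out correctly; it is later used to generate the backup information for correct data recovery.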
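Steps 11, 12, 21 and 22 can be sketched as follows. This is a minimal illustration using gzip from the Python standard library; the function names and the `(offset, block)` input layout are assumptions made for the example, not something the application prescribes.

```python
import gzip


def is_zero_block(block: bytes) -> bool:
    """Return True when every byte of the block equals the null flag 0x00."""
    return not any(block)


def compress_non_zero(blocks):
    """Compress only the non-zero blocks.

    `blocks` is an iterable of (starting_offset, raw_bytes) pairs (assumed layout).
    Returns a list of (starting_offset, compressed_bytes, data_volume) tuples.
    """
    result = []
    for offset, block in blocks:
        if is_zero_block(block):
            continue  # zero data blocks are neither compressed nor stored
        compressed = gzip.compress(block)
        result.append((offset, compressed, len(compressed)))
    return result
```

The recorded data volume is the length of the compressed bytes, which is what a reader later needs in order to cut the contiguous backup volume back into blocks.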
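S103: Generate corresponding backup information from the starting offset and data volume corresponding to each compressed data block, and determine the preset order corresponding to the backup information.
After the starting offsets and data volumes are obtained, they can be used to generate backup information, which is then stored. In addition, the compressed data blocks must be written to the backup volume to back up the source cloud disk. Note that the execution order of step S103 and step S104 is not limited: S103 may be executed before S104, S104 may be executed before S103, or the two may be executed at the same time.
This embodiment does not limit the specific form and content of the backup information. In one implementation, the process of generating backup information may include the following steps:
Step 31: Form a key-value pair from the starting offset and data volume corresponding to each compressed data block.
Step 32: Sort the key-value pairs by starting offset to obtain a key-value pair sequence.
Step 33: Label the key-value pair sequence with the disk identifier of the source cloud disk and the volume identifier of the backup volume to obtain the backup information.
In this implementation, key-value pairs represent the relationship between starting offsets and data volumes. Once the key-value pairs are obtained, they are sorted by starting offset to obtain a key-value pair sequence. The order of the key-value pairs in the sequence can serve as the preset order mentioned above; that is, the preset order in this implementation is the ascending order of starting offsets, which is also the positional order of the non-zero data blocks in the source cloud disk. The key-value pair sequence is then labeled with the disk identifier of the source cloud disk and the volume identifier of the backup volume, which establishes the correspondence among the source cloud disk, the backup volume, and the backup information, and yields the backup information. Note that this embodiment does not limit the specific form of the disk identifier and the volume identifier; for example, UUIDs may be used. UUID stands for Universally Unique Identifier, a standard for software construction. Its purpose is to give every element in a distributed system unique identifying information without requiring a central controller to assign it.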
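Steps 31 to 33 amount to building a small, sorted record. The sketch below is one possible shape for it; the dictionary layout and the field names `disk_id`, `volume_id`, and `pairs` are hypothetical, and the identifiers are generated with `uuid.uuid4()` only because the text mentions UUIDs as an example.

```python
import uuid


def build_backup_info(sizes_by_offset: dict[int, int], disk_id: str, volume_id: str) -> dict:
    """Build backup information from {starting_offset: data_volume} entries.

    The pairs are sorted by starting offset, so their order doubles as the
    preset order used during recovery.
    """
    pairs = sorted(sizes_by_offset.items())
    return {"disk_id": disk_id, "volume_id": volume_id, "pairs": pairs}


# Example: two compressed blocks recorded out of order.
info = build_backup_info(
    {2048: 310, 0: 120},
    disk_id=str(uuid.uuid4()),
    volume_id=str(uuid.uuid4()),
)
```

Keeping the disk and volume identifiers next to the pair sequence is what lets a later recovery request name only the source disk and still find the matching backup volume.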
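In another implementation, the system may back up different source cloud disks with different backup strategies; for example, some source cloud disks are backed up in the manner described above, while others are not and are simply copied. To indicate how a backup was made, the process of generating backup information may include:
Step 41: Generate initial backup information from the starting offset and data volume corresponding to each compressed data block.
Step 42: Label the initial backup information with a compression flag to obtain the backup information.
In this implementation, what is generated directly from the starting offsets and data volumes is the initial backup information. The compression flag is a flag that indicates the backup method; its specific form is not limited. For example, a status flag named compress can be set on the initial backup information; if it is set to true, then true is the compression flag. Labeling the initial backup information with the compression flag yields backup information that indicates the backup method.
Because the compressed data blocks in the present application are the result of compressing non-zero data blocks, the generated backup information likewise covers only the non-zero data blocks. Therefore, to determine accurately where zero data blocks must be inserted during data recovery so that the source cloud disk is reconstructed correctly and completely, a judgment rule must be preset. This rule usually relates the write position of the disk to the offset of the non-zero data block: when the two do not match, zero data blocks must be supplemented. Each time a non-zero data block is written to the cloud disk, a specific non-zero data block must be selected in a certain order and its starting offset matched against the write position of the disk; only the result of matching that specific block's starting offset against the write position indicates whether zero data blocks need to be inserted. This order necessarily depends on the judgment rule. Once the rule for judging whether a starting offset matches the write position has been fixed, a preset order for the compressed data blocks must be generated at backup time so that the non-zero data blocks can be selected one by one in that order during recovery.
This embodiment does not limit the specific content of the preset order; the preset order can change as the matching rule changes. In one specific implementation, the source cloud disk is restored sequentially from the head of the data to the tail. To keep the matching rule simple, it can be set to check whether the starting offset is greater than the current write position of the disk and immediately adjacent to it; if so, the two match, otherwise they do not. In this case, the process of generating the preset order may include the following step:
Step 51: Sort the compressed data blocks by the magnitude of their corresponding starting offsets, and take the resulting sequence of the compressed data blocks as the preset order.
The compressed data blocks are sorted by starting offset, and their sequence after sorting is taken as the preset order. Following this order selects compressed data blocks with successively larger starting offsets, which yields non-zero data blocks with successively larger starting offsets. During writing, the matching rule above is then used to judge whether zero data blocks need to be supplemented.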
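S104: Write the compressed data blocks to the backup volume.
The backup volume is the volume used to store the compressed data blocks.
S105: When a recovery request is detected, perform data recovery using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request.
A recovery request is a request to restore the data of a specified cloud disk; its specific form and content are not limited. The recovery request necessarily makes it possible to determine which cloud disks need to be restored, and therefore which data is needed for the recovery, including the target backup information (also called target information), the target preset order, and the target backup volume.
The target backup volume is the data volume, specified by the recovery request, that stores the backup data generated when the source cloud disk (that is, the cloud disk that was backed up) was backed up. The target information is the backup information, specified by the recovery request, that describes the backup data. Because backing up a source cloud disk usually involves splitting the data into blocks and writing the blocks contiguously to the target backup volume, the target backup information must at least indicate the size of each data block in the target backup volume, so that the backup data can be read out accurately, and must also indicate the position of each piece of backup data in the source cloud disk, so that the data of the source cloud disk can be reconstructed correctly.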
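Specifically, step S105 may further include:
Step 61: If a recovery request is detected, determine the target backup information and the target backup volume specified by the recovery request.
Step 62: Use each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order.
Step 63: Decompress the target compressed data blocks to obtain candidate non-zero data blocks.
Step 64: Determine a target non-zero data block among the candidate non-zero data blocks according to the target preset order, and determine the target preset starting offset corresponding to the target non-zero data block.
Step 65: Write the target non-zero data block into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk.
In this embodiment, the target information includes several preset data volumes and several corresponding preset starting offsets. A preset data volume is the data volume of a compressed data block in the target backup volume. A preset starting offset is the position, in the source cloud disk, of the non-zero data block corresponding to a compressed data block in the target backup volume. The two are in one-to-one correspondence, and each pair corresponds to one compressed data block in the target backup volume.
It can be understood that different source cloud disks produce different backup volumes and backup information. To indicate the relationship between a backup and its source cloud disk, the correspondence among the source cloud disk, the backup volume, and the backup information can also be generated and saved. Thus, in one feasible implementation, the recovery request may include source cloud disk information; once that information is obtained, the correspondence is used to determine the target backup information and the target backup volume. In another implementation, if no such correspondence exists, the recovery request may specify the target backup information and the target backup volume directly.
After the target information is obtained, it is parsed to obtain the preset data volumes and preset starting offsets. Usually there are the same number of each and there are several of them, although there may also be just one of each. In this embodiment, to make the best possible use of backup storage, each data block obtained by splitting is compressed when the source cloud disk is backed up, yielding compressed data blocks. A preset data volume is therefore the size of a target compressed data block, where a target compressed data block is a compressed data block stored in the target backup volume.
In this embodiment, the target preset order may be the order in which the target compressed data blocks are stored in the target backup volume. Usually this is also the positional order of the corresponding blocks in the source cloud disk, that is, the ascending order of the corresponding starting offsets. The target preset order determines which target compressed data block is to be read at each stage of recovery, and therefore which preset data volume the read is based on.
After each target compressed data block has been read out accurately, it is decompressed to obtain the corresponding candidate non-zero data block. A zero data block is a data block that contains only zero data; correspondingly, a non-zero data block is a data block that contains non-zero data. The decompression method must correspond to the compression method used for the target compressed data blocks. This embodiment does not limit the specific compression and decompression methods; any reversible compression method and its corresponding decompression method may be chosen, where reversible means that the data content is unchanged after being compressed and then decompressed.
It can be understood that, because only non-zero data blocks were compressed and backed up, there may be zero data between adjacent candidate non-zero data blocks in the source cloud disk. During recovery, therefore, the candidate non-zero data blocks cannot all be written to the target cloud disk back to back; they must be written one at a time, which requires determining a target non-zero data block among the candidates. The target non-zero data block is the data block that needs to be written to the target cloud disk at the current stage, and in this embodiment it is determined according to the preset order. Once the target non-zero data block is determined, its preset starting offset is the target preset starting offset, which indicates the position of the target non-zero data block's data in the source cloud disk.
When the target non-zero data block is written, it must be judged whether there is zero data between it and the candidate non-zero data block written previously, and hence whether zero data must be supplemented at the same time, so that the data of the source cloud disk is restored accurately. That is, the target non-zero data block is written into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk.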
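Specifically, the process of writing the target non-zero data block into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk includes the following steps:
Step 71: If the target preset starting offset matches the current write position, write the target non-zero data block into the target cloud disk at the target preset starting offset.
Step 72: If the target preset starting offset does not match the current write position, write the target non-zero data block into the target cloud disk at the target preset starting offset, and zero the data of the target cloud disk between the current write position as it was before the target non-zero data block was written and the target preset starting offset.
Specifically, the current write position of the target cloud disk is the position indicated by the data pointer after the previous write. The position of the data pointer changes as data is written, and it always points to where data was last written. If no data has been written to the target cloud disk, the data pointer points to the initial starting offset of the target cloud disk.
If the target preset starting offset matches the current write position, the previously written candidate non-zero data block and the target non-zero data block are end to end: they are immediately adjacent and there is no blank data between them. In this case, the target non-zero data block can simply be written to the target cloud disk in sequence. This embodiment does not limit how the match between the target preset starting offset and the current write position is checked. For example, it may be judged whether the current write position is smaller than the target preset starting offset and immediately adjacent to it; if so, the two match. Alternatively, it may be judged whether the current write position and the target preset starting offset are both the initial starting offset, that is, the first storage position of the whole cloud disk; if so, the two match.
If the target preset starting offset does not match the current write position, there are zero data blocks between the previously written candidate non-zero data block and the target non-zero data block, or the first several data blocks of the source cloud disk are zero data blocks. In this case, the target non-zero data block must be written into the target cloud disk at the target preset starting offset, and zero data must also be supplemented. Because writing the target non-zero data block changes the current write position, the current write position before the write is taken as the start of the interval and the target preset starting offset as its end, and the data in that interval is zeroed to supplement the zero data.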
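Steps 71 and 72 can be sketched with any seekable binary stream standing in for the target cloud disk. The sketch below simplifies the matching rule: it treats the current write position as the offset of the next byte to be written, so "match" becomes plain equality. That modelling choice and the name `write_block` are assumptions of the example.

```python
import io


def write_block(target, block: bytes, preset_offset: int) -> None:
    """Write one decompressed block at preset_offset, zero-filling any gap before it."""
    current = target.tell()  # position just after the previously written data
    if preset_offset < current:
        raise ValueError("blocks must be restored in ascending offset order")
    if preset_offset > current:
        # Step 72: offsets do not match, so zero the interval [current, preset_offset).
        # A real driver would probably zero large gaps in bounded pieces.
        target.write(bytes(preset_offset - current))
    # Step 71 (and the tail of step 72): the write position now equals preset_offset.
    target.write(block)


disk = io.BytesIO()
write_block(disk, b"AA", 0)   # offsets match, plain sequential write
write_block(disk, b"BB", 4)   # two-byte gap is zeroed first
```

Zeroing the gap explicitly, rather than seeking over it, matters when the target disk may hold stale data from an earlier use.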
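Based on the above embodiment, because different backup strategies may have been used, the following steps may also be included after the target backup information and the target backup volume specified by the recovery request are determined:
Step 81: Judge whether the target backup information has a compression flag.
Step 82: If it has the compression flag, proceed to the step of using each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order.
Step 83: If it does not have the compression flag, use each preset data volume to read the corresponding target data blocks from the target backup volume in the target preset order, and splice the target data blocks to complete data recovery.
If there is no compression flag, the backup was not made with the backup method provided by the present application and was not compressed, so each preset data volume can be used to read the corresponding target data block directly from the target backup volume, and the blocks can then be spliced. The target data blocks may include all-zero data blocks and non-zero data blocks.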
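The branch in steps 81 to 83 can be sketched as a single read loop that decompresses only when the flag is present. The `compress` key follows the status flag example given earlier in the description; the `pairs` layout and the function name are hypothetical, and gzip is used only because the embodiment names it.

```python
import gzip
import io


def read_target_blocks(info: dict, volume) -> list[tuple[int, bytes]]:
    """Read blocks from the backup volume in the preset order.

    `info["pairs"]` is a list of (preset_starting_offset, preset_data_volume)
    already sorted in the target preset order (assumed layout).
    """
    compressed = bool(info.get("compress"))  # step 81: check the compression flag
    blocks = []
    for offset, size in info["pairs"]:
        raw = volume.read(size)
        if len(raw) != size:
            raise ValueError("backup volume is shorter than the backup information claims")
        # Step 82 decompresses; step 83 takes the bytes as they are.
        blocks.append((offset, gzip.decompress(raw) if compressed else raw))
    return blocks
```

For an uncompressed backup the returned blocks can simply be joined in order; for a compressed one they still have to be placed at their preset starting offsets.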
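With the cloud disk data compression backup and recovery method provided by the embodiments of the present application, a cloud disk is split when it is backed up and the non-zero data blocks in it are compressed. A non-zero data block is a data block that records non-zero data. Unlike a zero data block, its content cannot be inferred during data recovery, so it must be compressed and saved so that recovery can be based on it. Because different non-zero data blocks have different sizes after compression, and the compressed data blocks are stored contiguously, the data volume of each compressed data block is recorded so that the compressed data blocks can be read back correctly. To represent the position of each data block in the source cloud disk, backup information is generated from the starting offset corresponding to the compressed data block (that is, the starting offset of the corresponding non-zero data block) and the corresponding data volume, and the corresponding preset order is determined. The preset order indicates the order in which compressed data blocks are selected during data recovery. Writing the compressed data blocks to the backup volume completes the compressed backup of the source cloud disk. A detected recovery request means that some source cloud disk is to be restored; the target backup information, target backup volume, and target preset order specified by the recovery request can then be used to read out the compressed data blocks accurately, decompress them, splice the data blocks, and complete data recovery. By removing all-zero data blocks, compressing and storing the non-zero data blocks, and generating the corresponding backup information and preset order, the storage space required for backup is greatly reduced and storage utilization is improved, which solves the problems of large storage footprint and high service cost in the related art.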
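Based on the above embodiment, please refer to FIG. 2, which is a specific cloud disk backup flowchart provided by an embodiment of the present application. The source cloud disk backup flow is as follows:
1) Divide the source cloud disk into n chunks, where n is a positive integer. A chunk is an initial data block.
2) Compare each chunk with the binary null flag "\x00" to identify the empty chunks. In this embodiment, assume that chunks 1, 3, 4, ..., n-1 are non-empty chunks (that is, non-zero data blocks) and that chunks 2 and n are empty chunks.
3) For each non-empty chunk, compress it with the gzip tool to obtain the compressed data block, and calculate its size (that is, the data volume of the compressed data block). Specifically, after chunk1 is compressed, its size is reduced to size1. Because chunk2 is identified as an empty chunk (that is, a zero data block), the compression step is skipped and no compressed size is calculated for it. After chunk3 is compressed, the resulting data block has size size3. Subsequent chunks are handled in the same way.
4) Write data block 1, obtained by compressing chunk1, to the backup volume, and record a key-value pair for this data block in the database. The key is the starting offset offset1 at which this chunk was read from the source cloud disk; the value is size1, the size of data block 1 obtained by compressing the chunk.
5) Because chunk2 is identified as an empty chunk, the steps of compressing it, writing it to the backup volume, and recording a key-value pair in the database are all skipped. All subsequent empty chunks are handled in this way.
6) Start processing data block 3. Data block 1 has already been written to the backup volume, so the current write offset of the backup volume has moved from its initial value 0 to size1 (because data block 1 has size size1). Data block 3 is written starting at offset size1 until data block 3, the compressed form of chunk3, has been written completely. After that, the offset of the backup volume becomes size1+size3, which is where the next data block will start. Once data block 3 has been written, a new key-value pair is added to the database: the key is the starting offset at which chunk3 was read from the source cloud disk, and the value is size3, the size of data block 3 obtained by compressing chunk3.
7) All subsequent data blocks are handled in the same way, completing the data writes and the key-value records in the database.
After a compressed backup, empty chunks are not written to the backup volume and non-empty chunks are compressed, which greatly reduces the capacity occupied by the backup volume. The database holds a complete record of the starting offset at which each chunk was read from the source volume and of the size of the data block obtained by compressing each chunk.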
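The backup flow of FIG. 2 can be sketched end to end with in-memory streams standing in for the source cloud disk, the backup volume, and the database. The function name `backup` and the use of a plain dictionary for the database records are assumptions of this sketch.

```python
import gzip
import io


def backup(source, chunk_size: int, volume) -> dict[int, int]:
    """Back up `source` to `volume`, returning {source_offset: compressed_size}."""
    records: dict[int, int] = {}
    offset = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of the source disk
        if any(chunk):  # non-empty chunk: at least one byte differs from 0x00
            data = gzip.compress(chunk)
            volume.write(data)           # appended right after the previous block
            records[offset] = len(data)  # key: source offset, value: compressed size
        # Empty chunks are skipped entirely: no compression, no write, no record.
        offset += len(chunk)
    return records


# Three 8-byte chunks; the middle one is empty.
source = io.BytesIO(b"A" * 8 + bytes(8) + b"B" * 8)
volume = io.BytesIO()
records = backup(source, 8, volume)
```

Because each compressed block is appended where the previous one ended, the sizes in `records`, taken in offset order, are enough to find every block in the volume again.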
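The recovery flow for a compressed cloud disk backup is as follows:
1) First read the recorded key-value pairs (that is, the target information) from the database, and process the data block corresponding to each key-value pair in turn.
2) Taking FIG. 2 as an example, first process offset1:size1: read a data block of size size1 from the backup volume starting at read offset 0, then decompress it with the gzip tool to obtain the decompressed data block (that is, a candidate non-zero data block).
3) Position the restored cloud disk (that is, the target cloud disk) at starting offset offset1, then write the decompressed data block.
4) Because chunk2 was an empty data block, no key-value pair was recorded for it in the database, and the recovery of chunk3 starts directly.
5) Process offset3:size3: read the backup volume starting at read offset size1, read a data block of size size3, and decompress it to obtain a new data block. offset3 is necessarily greater than the current write position offset2 of the restored cloud disk (writing the decompressed chunk1 earlier advanced the write position from offset1 to offset2). Therefore, before this data block (the decompressed chunk3) is written into the restored cloud disk, the space from offset2 to offset3 must be zeroed, so that the restored data is consistent with the data as it was at backup time.
6) Subsequent processing follows the same pattern until all key-value pairs in the database have been processed, at which point the recovery of the cloud disk backup is complete.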
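The recovery flow can be sketched as the mirror image of the backup loop, again with in-memory streams. The `disk_size` parameter, used here to zero the tail after the last non-empty chunk, is an assumption of the sketch; the flow above only describes zeroing the gaps before each restored block.

```python
import gzip
import io


def restore(records: dict[int, int], volume, target, disk_size: int) -> None:
    """Rebuild the disk in `target` from `volume` and {source_offset: compressed_size}."""
    for offset in sorted(records):  # preset order: ascending source offset
        # Blocks are stored contiguously, so sequential reads line up with the sizes.
        block = gzip.decompress(volume.read(records[offset]))
        gap = offset - target.tell()
        if gap > 0:
            target.write(bytes(gap))  # zero the space left by skipped empty chunks
        target.write(block)
    tail = disk_size - target.tell()
    if tail > 0:
        target.write(bytes(tail))  # assumption: trailing empty chunks are zeroed too


# Backup volume holding chunk1 and chunk3 of a four-chunk, 32-byte disk.
c1 = gzip.compress(b"A" * 8)
c3 = gzip.compress(b"B" * 8)
restored = io.BytesIO()
restore({0: len(c1), 16: len(c3)}, io.BytesIO(c1 + c3), restored, disk_size=32)
```

Iterating over the sorted offsets is what ties the read position in the backup volume to the correct size, which is why the preset order has to agree with the order in which blocks were written during backup.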
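In actual tests, five source cloud disks were compared. The source cloud disks were as follows:
A. Source cloud disk imageA: create an empty cloud disk with a quota of 10G, mount it to a virtual machine and format it with an ext4 file system, and use the dd command to create a 2G all-zero file in the file system.
B. Source cloud disk imageB: create an empty cloud disk with a quota of 10G, mount it to a virtual machine and format it with an ext4 file system, and use the dd command to create a 5G all-zero file in the file system.
C. Source cloud disk imageC: create an image volume with a quota of 10G; the cloud disk contains a 39MB system image, which is a minimal Linux installation.
D. Source cloud disk imageD: create an image volume with a quota of 10G; the cloud disk contains a 2404MB system image, which is a CentOS 7 installation.
E. Source cloud disk imageE: create an image volume with a quota of 10G; the cloud disk contains a 396MB system image, which is a minimal Windows installation.
The above five cloud disks were each backed up in the following three scenarios, and the time consumed and the actual capacity occupied by the backup volume were recorded after each backup completed. The scenarios are as follows:
1. The community backup driver is used, with empty chunk detection and backup compression both disabled.
2. The driver optimized by the present application is used, with only empty chunk detection enabled and backup compression disabled.
3. The driver optimized by the present application is used, with both empty chunk detection and backup compression enabled.
Please refer to FIG. 3 and FIG. 4. FIG. 3 is a backup time comparison chart provided by an embodiment of the present application, and FIG. 4 is a backup volume capacity comparison chart provided by an embodiment of the present application. Analysis of FIG. 3 shows that the cloud disk backup mechanism of the related art takes a relatively long time to back up; after empty chunk detection (backup acceleration) is enabled, the time needed to back up the same volume drops significantly; after backup compression is also enabled, the time is generally higher than with empty chunk detection alone, because of the added chunk compression time, but still lower than with the mechanism of the related art.
Note that the vertical axis of FIG. 4 uses an exponential scale. FIG. 4 shows that with the cloud disk backup logic of the related art (Scenario 1), the capacity occupied by the backup volume equals the quota of the source cloud disk, 10G in every case, which consumes a great deal of backup storage. After empty chunk detection is enabled (Scenario 2), the capacity occupied by the backup volume drops substantially. After backup compression is also enabled (Scenario 3), the capacity occupied by the backup volume drops further. The additional capacity saved in Scenario 3 compared with Scenario 2 depends on how sparse the data in the source cloud disk is (for example, imageA and imageB hold all-zero files generated by the dd command, so they are highly sparse), and also on the compression algorithm used. The embodiments of the present application used the gzip compression algorithm to obtain the test results in FIG. 3 and FIG. 4.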
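The cloud disk data compression backup and recovery apparatus provided by the embodiments of the present application is introduced below. The apparatus described below and the cloud disk data compression backup and recovery method described above may be referred to in correspondence with each other.
Please refer to FIG. 5, which is a schematic structural diagram of a cloud disk data compression backup and recovery apparatus provided by an embodiment of the present application, including:
a splitting module 110, used to split a source cloud disk into several initial data blocks and determine the starting offset of each initial data block in the source cloud disk;
a compression module 120, used to compress the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculate the data volume of each compressed data block;
an information generation module 130, used to generate corresponding backup information from the starting offset and data volume corresponding to each compressed data block, and determine the preset order corresponding to the backup information;
a writing module 140, used to write the compressed data blocks to the backup volume;
a recovery module 150, used to perform data recovery, when a recovery request is detected, using the target backup information, the target backup volume, and the corresponding target preset order specified by the recovery request.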
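Optionally, the recovery module 150 includes:
a determining unit, used to determine, if a recovery request is detected, the target backup information and the target backup volume specified by the recovery request, the target backup information including several preset data volumes and several corresponding preset starting offsets;
a reading unit, used to use each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
a decompression unit, used to decompress the target compressed data blocks to obtain candidate non-zero data blocks;
a target determining unit, used to determine a target non-zero data block among the candidate non-zero data blocks according to the target preset order, and determine the target preset starting offset corresponding to the target non-zero data block;
a writing unit, used to write the target non-zero data block into the target cloud disk based on whether the target preset starting offset matches the current write position of the target cloud disk.
Optionally, the writing unit includes:
a first writing subunit, used to write the target non-zero data block into the target cloud disk at the target preset starting offset if the target preset starting offset matches the current write position;
a second writing subunit, used, if the target preset starting offset does not match the current write position, to write the target non-zero data block into the target cloud disk at the target preset starting offset, and to zero the data of the target cloud disk between the current write position as it was before the target non-zero data block was written and the target preset starting offset.
Optionally, the apparatus further includes:
a compression judging unit, used to judge whether the target backup information has a compression flag;
an execution determining unit, used, if there is a compression flag, to proceed to the step of using each preset data volume to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
a splicing recovery unit, used, if there is no compression flag, to use each preset data volume to read the corresponding target data blocks from the target backup volume in the target preset order, and splice the target data blocks to complete data recovery.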
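Optionally, the information generation module 130 includes:
a sorting unit, used to sort the compressed data blocks by the magnitude of their corresponding starting offsets, and take the resulting sequence of the compressed data blocks as the preset order.
Optionally, the compression module 120 includes:
a zero data block detection unit, used to perform zero data block detection on each initial data block to obtain a detection result;
a compression unit, used to determine each initial data block whose detection result is non-zero to be a non-zero data block and compress it to obtain a compressed data block.
Optionally, the zero data block detection unit includes:
a content matching subunit, used to read the data content of the initial data block and compare the data content with the binary null flag;
a non-zero determining subunit, used to determine that the detection result corresponding to the initial data block is non-zero if any part of the data content is not the binary null flag.
Optionally, the splitting module 110 includes:
a granularity acquisition unit, used to obtain the split granularity, the split granularity evenly dividing 1GB;
an even splitting unit, used to split the source cloud disk evenly according to the split granularity to obtain the initial data blocks.
Optionally, the information generation module 130 includes:
a key-value pair generating unit, used to form a key-value pair from the starting offset and data volume corresponding to each compressed data block;
a key-value pair sorting unit, used to sort the key-value pairs by starting offset to obtain a key-value pair sequence;
a labeling unit, used to label the key-value pair sequence with the disk identifier of the source cloud disk and the volume identifier of the backup volume to obtain the backup information.
Optionally, the information generation module 130 includes:
an initial generation unit, used to generate initial backup information from the starting offset and data volume corresponding to each compressed data block;
a compression labeling unit, used to label the initial backup information with the compression flag to obtain the backup information.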
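The electronic device provided by the embodiments of the present application is introduced below. The electronic device described below and the cloud disk data compression backup and recovery method described above may be referred to in correspondence with each other.
Please refer to FIG. 6, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
The processor 101 is used to control the overall operation of the electronic device 100 so as to complete all or some of the steps of the above cloud disk data compression backup and recovery method. The memory 102 is used to store various types of data to support operation of the electronic device 100; these data may include, for example, instructions for any application program or method operating on the electronic device 100, as well as data related to the application programs. The memory 102 can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 102 or sent via the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual buttons or physical buttons. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi part, a Bluetooth part, and an NFC part.
The electronic device 100 may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the cloud disk data compression backup and recovery method given in the above embodiments.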
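The computer-readable storage medium provided by the embodiments of this application is introduced below. The computer-readable storage medium described below and the cloud disk data compression backup and recovery method described above may be cross-referenced.
This application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the cloud disk data compression backup and recovery method described above are implemented.
The computer-readable storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.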
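The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
Specific examples have been used herein to explain the principles and implementations of this application. The description of the above embodiments is intended only to help in understanding the method of this application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.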

Claims (13)

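  1. A cloud disk data compression backup and recovery method, characterized by comprising:
    splitting a source cloud disk into a plurality of initial data blocks, and determining the start offset of each initial data block in the source cloud disk;
    compressing the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and calculating the data size of each compressed data block;
    generating corresponding backup information using the start offsets and the data sizes corresponding to the compressed data blocks, and determining a preset order corresponding to the backup information;
    writing the compressed data blocks to a backup volume;
    upon detecting a recovery request, performing data recovery using the target backup information and target backup volume specified by the recovery request and the corresponding target preset order.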
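  2. The cloud disk data compression backup and recovery method according to claim 1, characterized in that performing data recovery, upon detecting a recovery request, using the target backup information and target backup volume specified by the recovery request and the corresponding target preset order comprises:
    if the recovery request is detected, determining the target backup information and the target backup volume specified by the recovery request, the target backup information comprising a plurality of preset data sizes and a corresponding plurality of preset start offsets;
    using each preset data size to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
    decompressing the target compressed data blocks to obtain candidate non-zero data blocks;
    determining a target non-zero data block among the candidate non-zero data blocks according to the target preset order, and determining the target preset start offset corresponding to the target non-zero data block;
    writing the target non-zero data block to the target cloud disk based on whether the target preset start offset matches the current write position of the target cloud disk.
  3. The cloud disk data compression backup and recovery method according to claim 2, characterized in that writing the target non-zero data block to the target cloud disk based on whether the target preset start offset matches the current write position of the target cloud disk comprises:
    if the target preset start offset matches the current write position, writing the target non-zero data block to the target cloud disk at the target preset start offset;
    if the target preset start offset does not match the current write position, writing the target non-zero data block to the target cloud disk at the target preset start offset, and zeroing the data on the target cloud disk between the current write position as it stood before the target non-zero data block was written and the target preset start offset.
  4. The cloud disk data compression backup and recovery method according to claim 2, characterized by further comprising, after determining the target backup information and the target backup volume specified by the recovery request:
    judging whether the target backup information carries a compression flag;
    if the compression flag is present, proceeding to the step of using each preset data size to read the corresponding target compressed data blocks from the target backup volume in the target preset order;
    if the compression flag is absent, using each preset data size to read the corresponding target data blocks from the target backup volume in the target preset order, and splicing the target data blocks to complete the data recovery.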
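  5. The cloud disk data compression backup and recovery method according to claim 1, characterized in that determining the preset order corresponding to the backup information comprises:
    sorting the compressed data blocks by the magnitude of their corresponding start offsets, and taking the resulting sequence of compressed data blocks as the preset order.
  6. The cloud disk data compression backup and recovery method according to claim 1, characterized in that compressing the non-zero data blocks among the initial data blocks to obtain compressed data blocks comprises:
    performing zero-data-block detection on each initial data block to obtain a detection result;
    treating each initial data block whose detection result indicates non-zero as a non-zero data block and compressing it to obtain the compressed data block.
  7. The cloud disk data compression backup and recovery method according to claim 6, characterized in that performing zero-data-block detection on each initial data block to obtain a detection result comprises:
    reading the data content of the initial data block, and comparing the data content against the binary null flag;
    if any part of the data content is not the binary null flag, determining that the detection result for the initial data block indicates non-zero.
  8. The cloud disk data compression backup and recovery method according to claim 1, characterized in that splitting the source cloud disk into a plurality of initial data blocks comprises:
    obtaining a splitting granularity, where the splitting granularity divides 1 GB evenly;
    splitting the source cloud disk evenly according to the splitting granularity to obtain the initial data blocks.
  9. The cloud disk data compression backup and recovery method according to claim 1, characterized in that generating corresponding backup information using the start offsets and the data sizes corresponding to the compressed data blocks comprises:
    forming key-value pairs from the start offsets and the data sizes corresponding to the compressed data blocks;
    sorting the key-value pairs by the magnitude of their start offsets to obtain a key-value pair sequence;
    tagging the key-value pair sequence with the disk identifier of the source cloud disk and the volume identifier of the backup volume to obtain the backup information.
  10. The cloud disk data compression backup and recovery method according to claim 1, characterized in that generating corresponding backup information using the start offsets and the data sizes corresponding to the compressed data blocks comprises:
    generating initial backup information from the start offsets and the data sizes corresponding to the compressed data blocks;
    tagging the initial backup information with a compression flag to obtain the backup information.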
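  11. A cloud disk data compression backup and recovery apparatus, characterized by comprising:
    a splitting module, configured to split a source cloud disk into a plurality of initial data blocks, and to determine the start offset of each initial data block in the source cloud disk;
    a compression module, configured to compress the non-zero data blocks among the initial data blocks to obtain compressed data blocks, and to calculate the data size of each compressed data block;
    an information generation module, configured to generate corresponding backup information using the start offsets and the data sizes corresponding to the compressed data blocks, and to determine a preset order corresponding to the backup information;
    a writing module, configured to write the compressed data blocks to a backup volume;
    a recovery module, configured to, upon detecting a recovery request, perform data recovery using the target backup information and target backup volume specified by the recovery request and the corresponding target preset order.
  12. An electronic device, characterized by comprising a memory and a processor, wherein:
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program to implement the cloud disk data compression backup and recovery method according to any one of claims 1 to 10.
  13. A computer-readable storage medium, characterized by being configured to store a computer program, wherein the computer program, when executed by a processor, implements the cloud disk data compression backup and recovery method according to any one of claims 1 to 10.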
PCT/CN2022/078491 2021-07-23 2022-02-28 Cloud disk data compression backup and recovery method, apparatus, device, and storage medium WO2023000674A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110838010.2 2021-07-23
CN202110838010.2A CN113722150B (zh) 2021-07-23 2021-07-23 Cloud disk data compression backup and recovery method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023000674A1 (zh) 2023-01-26

Family

ID=78673874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078491 WO2023000674A1 (zh) 2021-07-23 2022-02-28 Cloud disk data compression backup and recovery method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113722150B (zh)
WO (1) WO2023000674A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865097A (zh) * 2023-02-17 2023-03-28 浪潮电子信息产业股份有限公司 Data compression method, system, device, and computer-readable storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
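CN113722150B (zh) * 2021-07-23 2023-08-22 苏州浪潮智能科技有限公司 Cloud disk data compression backup and recovery method, apparatus, device, and storage medium
CN115982398B (zh) * 2023-03-13 2023-05-16 苏州浪潮智能科技有限公司 Graph-structured data processing method, system, computer device, and storage medium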

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542397B1 (en) * 2013-03-14 2017-01-10 EMC IP Holding Company LLC File block addressing for backups
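CN109582653A (zh) * 2018-11-14 2019-04-05 网易(杭州)网络有限公司 File compression and decompression methods and devices
CN109597717A (zh) * 2018-12-07 2019-04-09 北京金山云网络技术有限公司 Data backup and recovery method, apparatus, electronic device, and storage medium
CN111104258A (zh) * 2019-12-23 2020-05-05 北京金山云网络技术有限公司 MongoDB database backup method, apparatus, and electronic device
CN111104063A (zh) * 2019-12-06 2020-05-05 浪潮电子信息产业股份有限公司 Data storage method, apparatus, electronic device, and storage medium
CN111723053A (zh) * 2020-06-24 2020-09-29 北京航天数据股份有限公司 Data compression method and apparatus, and decompression method and apparatus
CN112214359A (zh) * 2020-10-30 2021-01-12 上海爱数信息技术股份有限公司 Backup and recovery system for Oracle databases and method thereof
CN113064760A (zh) * 2021-04-06 2021-07-02 广州鼎甲计算机科技有限公司 Database synthetic backup method, apparatus, computer device, and storage medium
CN113722150A (zh) * 2021-07-23 2021-11-30 苏州浪潮智能科技有限公司 Cloud disk data compression backup and recovery method, apparatus, device, and storage medium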

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614268A (zh) * 2018-12-10 2019-04-12 浪潮(北京)电子信息产业有限公司 Cloud backup data recovery method, apparatus, and system

Also Published As

Publication number Publication date
CN113722150B (zh) 2023-08-22
CN113722150A (zh) 2021-11-30

Similar Documents

Publication Publication Date Title
WO2023000674A1 (zh) Cloud disk data compression backup and recovery method, apparatus, device, and storage medium
US9275067B2 (en) Apparatus and method to sequentially deduplicate data
US10666435B2 (en) Multi-tenant encryption on distributed storage having deduplication and compression capability
CN107229420B (zh) Data storage method, reading method, deletion method, and data operating system
EP3896564A1 (en) Data processing method and device, and computer readable storage medium
US10783145B2 (en) Block level deduplication with block similarity
US8719240B2 (en) Apparatus and method to sequentially deduplicate groups of files comprising the same file name but different file version numbers
CN106844102B (zh) Data recovery method and apparatus
US20160034201A1 (en) Managing de-duplication using estimated benefits
CN111125033B (zh) Space reclamation method and system based on an all-flash array
US10656860B2 (en) Tape drive library integrated memory deduplication
US10581602B2 (en) End-to-end checksum in a multi-tenant encryption storage system
US8909606B2 (en) Data block compression using coalescion
WO2015096847A1 (en) Method and apparatus for context aware based data de-duplication
CN111124940B (zh) Space reclamation method and system based on an all-flash array
CN111338759A (zh) Virtual disk check code generation method, apparatus, device, and storage medium
US20220398220A1 (en) Systems and methods for physical capacity estimation of logical space units
CN113761059A (zh) Data processing method and apparatus
CN111061428B (zh) Data compression method and apparatus
CN115470040A (zh) Snapshot-based deduplication fingerprint threshold testing method, apparatus, device, and medium
CN115328696A (zh) Data backup method in a database
CN111125012A (zh) Snapshot generation method, apparatus, device, and readable storage medium
CN109086172B (zh) Data processing method and related apparatus
US11977525B2 (en) Method to optimize ingest in dedupe systems by using compressibility hints
CN114528258B (zh) Asynchronous file processing method, apparatus, server, medium, product, and system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application Ref document number: 22844847; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase Ref country code: DE